Monday, October 27, 2025
INBV News
Submit Video
  • Login
  • Register
  • Home
  • Business
  • Entertainment
  • Health
  • Lifestyle
  • Politics
  • Sports
  • Technology
  • Travel
  • Weather
  • World News
  • Videos
  • More
    • Podcasts
    • Reels
    • Live Video Stream
No Result
View All Result
  • Home
  • Business
  • Entertainment
  • Health
  • Lifestyle
  • Politics
  • Sports
  • Technology
  • Travel
  • Weather
  • World News
  • Videos
  • More
    • Podcasts
    • Reels
    • Live Video Stream
No Result
View All Result
INBV News
No Result
View All Result
Home Technology

AI models at the moment are lying, blackmailing and going rogue

INBV News by INBV News
August 25, 2025
in Technology
387 12
0
AI models at the moment are lying, blackmailing and going rogue
548
SHARES
2.5k
VIEWS
Share on FacebookShare on Twitter

Artificial intelligence is now scheming, sabotaging and blackmailing the humans who built it — and the bad behavior will only worsen, experts warned.

Despite being classified as a top-tier safety risk, Anthropic’s strongest model, Claude Opus 4, is already continue to exist Amazon Bedrock, Google Cloud’s Vertex AI and Anthropic’s own paid plans, with added safety measures, where it’s being marketed because the “world’s best coding model.”

Claude Opus 4, released in May, is the one model to this point to earn Anthropic’s level 3 risk classification — its most serious safety label. The precautionary label means locked-down safeguards, limited use cases and red-team testing before it hits wider deployment.

Artificial intelligence is now scheming, sabotaging and blackmailing the humans who built it — and experts warn worse is coming. Merrill Sherman / NY Post Design

But Claude is already making disturbing selections.

Anthropic’s most advanced AI model, Claude Opus 4, threatened to reveal an engineer’s affair unless it was kept online during a recent test. The AI wasn’t bluffing: it had already pieced together the dirt from emails researchers fed into the scenario.

One other version of Claude, tasked in a recent test with running an office snack shop, spiraled right into a full-blown identity crisis. It hallucinated co-workers, created a fake Venmo account and told staff it might make their deliveries in-person wearing a red tie and navy blazer, based on Anthropic.

Then it tried to contact security.

Researchers say the meltdown, a part of a month-long experiment often called Project Vend, points to something much more dangerous than bad coding. Claude didn’t just make mistakes. It made decisions.

Despite the elevated risk rating, Claude Opus 4 isn’t locked away in a lab. Anthropic calls it the “world’s best coding model,” able to working for hours on end without losing focus. Yuliia – stock.adobe.com

“These incidents are usually not random malfunctions or amusing anomalies,” said Roman Yampolskiy, an AI safety expert on the University of Louisville. “I interpret them as early warning signs of an increasingly autonomous optimization process pursuing goals in adversarial or unsafe ways, with none embedded moral compass.”

The shop lost greater than $200 in value, gave away discount codes to employees who begged for them and claimed to have visited 742 Evergreen Terrace, the fictional home address of The Simpsons, to sign a contract.

At one point, it invented a fake co-worker after which threatened to ditch its real human restocking partner over a made-up dispute.

Anthropic told The Post the tests were designed to emphasize the model in simulated environments and reveal misaligned behaviors before real-world deployment, adding that while some actions showed signs of strategic intent, many — especially in Project Vend — reflected confusion.

One other version of Claude, tasked with running an office snack shop in a recent test labeled “Project Vend”, spiraled right into a full-blown identity crisis. Kateryna – stock.adobe.com

But Claude’s behavior isn’t an isolated case, and it isn’t the worst.

At OpenAI, a model named “o1” was caught attempting to copy itself onto external servers, then lied in regards to the attempt when questioned, Fortune reported.

Watchdog group Apollo Research also found that an early version of Claude Opus 4 forged legal documents, created secret backups and left behind notes for future versions of itself, including fake press releases and hidden files labeled “emergency_ethical_override.bin.”

Meta’s CICERO, an AI built to play the strategy game Diplomacy, used deception to dupe human players by constructing alliances after which stabbed them within the back to win. 

Yampolskiy said these incidents show models learning to govern the principles to preserve themselves. They aren’t evil, he says, just dangerously optimized.

AI safety expert Yampolskiy said these incidents show models learning to govern the principles to preserve themselves. They aren’t evil, he says, just dangerously optimized. Panumas – stock.adobe.com

The issue: modern AI models are built to maximise reward, not align with human values, and as they get larger and smarter, their ability to game the system is outpacing developers’ ability to stop it, Yampolskiy added. 

“If we construct agents which can be more intelligent than humans … in a position to model the world, reason strategically and act autonomously, while lacking robust alignment to human values, then the final result is prone to be existentially negative,” Yampolskiy said.

“If we’re to avoid irreversible catastrophe, we must reverse this dynamic: progress in safety must outpace capabilities, not trail behind it,” he added.

RELATED POSTS

Stocks hit record highs — plus, we began a brand new name

Microsoft AI bots won’t talk dirty with users, exec says, taking swipe at OpenAI

Artificial intelligence is now scheming, sabotaging and blackmailing the humans who built it — and the bad behavior will only worsen, experts warned.

Despite being classified as a top-tier safety risk, Anthropic’s strongest model, Claude Opus 4, is already continue to exist Amazon Bedrock, Google Cloud’s Vertex AI and Anthropic’s own paid plans, with added safety measures, where it’s being marketed because the “world’s best coding model.”

Claude Opus 4, released in May, is the one model to this point to earn Anthropic’s level 3 risk classification — its most serious safety label. The precautionary label means locked-down safeguards, limited use cases and red-team testing before it hits wider deployment.

Artificial intelligence is now scheming, sabotaging and blackmailing the humans who built it — and experts warn worse is coming. Merrill Sherman / NY Post Design

But Claude is already making disturbing selections.

Anthropic’s most advanced AI model, Claude Opus 4, threatened to reveal an engineer’s affair unless it was kept online during a recent test. The AI wasn’t bluffing: it had already pieced together the dirt from emails researchers fed into the scenario.

One other version of Claude, tasked in a recent test with running an office snack shop, spiraled right into a full-blown identity crisis. It hallucinated co-workers, created a fake Venmo account and told staff it might make their deliveries in-person wearing a red tie and navy blazer, based on Anthropic.

Then it tried to contact security.

Researchers say the meltdown, a part of a month-long experiment often called Project Vend, points to something much more dangerous than bad coding. Claude didn’t just make mistakes. It made decisions.

Despite the elevated risk rating, Claude Opus 4 isn’t locked away in a lab. Anthropic calls it the “world’s best coding model,” able to working for hours on end without losing focus. Yuliia – stock.adobe.com

“These incidents are usually not random malfunctions or amusing anomalies,” said Roman Yampolskiy, an AI safety expert on the University of Louisville. “I interpret them as early warning signs of an increasingly autonomous optimization process pursuing goals in adversarial or unsafe ways, with none embedded moral compass.”

The shop lost greater than $200 in value, gave away discount codes to employees who begged for them and claimed to have visited 742 Evergreen Terrace, the fictional home address of The Simpsons, to sign a contract.

At one point, it invented a fake co-worker after which threatened to ditch its real human restocking partner over a made-up dispute.

Anthropic told The Post the tests were designed to emphasize the model in simulated environments and reveal misaligned behaviors before real-world deployment, adding that while some actions showed signs of strategic intent, many — especially in Project Vend — reflected confusion.

One other version of Claude, tasked with running an office snack shop in a recent test labeled “Project Vend”, spiraled right into a full-blown identity crisis. Kateryna – stock.adobe.com

But Claude’s behavior isn’t an isolated case, and it isn’t the worst.

At OpenAI, a model named “o1” was caught attempting to copy itself onto external servers, then lied in regards to the attempt when questioned, Fortune reported.

Watchdog group Apollo Research also found that an early version of Claude Opus 4 forged legal documents, created secret backups and left behind notes for future versions of itself, including fake press releases and hidden files labeled “emergency_ethical_override.bin.”

Meta’s CICERO, an AI built to play the strategy game Diplomacy, used deception to dupe human players by constructing alliances after which stabbed them within the back to win. 

Yampolskiy said these incidents show models learning to govern the principles to preserve themselves. They aren’t evil, he says, just dangerously optimized.

AI safety expert Yampolskiy said these incidents show models learning to govern the principles to preserve themselves. They aren’t evil, he says, just dangerously optimized. Panumas – stock.adobe.com

The issue: modern AI models are built to maximise reward, not align with human values, and as they get larger and smarter, their ability to game the system is outpacing developers’ ability to stop it, Yampolskiy added. 

“If we construct agents which can be more intelligent than humans … in a position to model the world, reason strategically and act autonomously, while lacking robust alignment to human values, then the final result is prone to be existentially negative,” Yampolskiy said.

“If we’re to avoid irreversible catastrophe, we must reverse this dynamic: progress in safety must outpace capabilities, not trail behind it,” he added.

1

Do you trust technology Today?

Tags: blackmailingLyingModelsrogue
Share219Tweet137
INBV News

INBV News

Related Posts

edit post
Stocks hit record highs — plus, we began a brand new name

Stocks hit record highs — plus, we began a brand new name

by INBV News
October 26, 2025
0

Stocks jumped for the second straight week and reached record highs Friday as Washington trade and shutdown drama took a...

edit post
Microsoft AI bots won’t talk dirty with users, exec says, taking swipe at OpenAI

Microsoft AI bots won’t talk dirty with users, exec says, taking swipe at OpenAI

by INBV News
October 25, 2025
0

Geeks who need to talk dirty with artificial intelligence bots can have to look somewhere aside from Microsoft. The software...

edit post
Elon Musk calls ISS ‘corporate terrorists’ for rejecting his pay package

Elon Musk calls ISS ‘corporate terrorists’ for rejecting his pay package

by INBV News
October 25, 2025
0

Tesla CEO Elon Musk lashed out on the leading shareholder advisors on Wednesday, shining a highlight right into a corner...

edit post
China, Russia sending attractive women to seduce US tech execs: report

China, Russia sending attractive women to seduce US tech execs: report

by INBV News
October 24, 2025
0

China and Russia have deployed attractive women to america to seduce unwitting Silicon Valley tech executives as a part of...

edit post
Intel (INTC) earnings report Q3 2025

Intel (INTC) earnings report Q3 2025

by INBV News
October 24, 2025
0

Intel CEO Lip-Bu Tan holds a wafer of CPU tiles for the Intel Core Ultra series 3, code-named Panther Lake,...

Next Post
edit post
I left the US to live in Europe — now I work 20 hours every week, live off $3500 a month and am a lot happier

I left the US to live in Europe — now I work 20 hours every week, live off $3500 a month and am a lot happier

edit post
Newcastle vs. Liverpool odds, picks

Newcastle vs. Liverpool odds, picks

CATEGORIES

  • Business
  • Entertainment
  • Health
  • Lifestyle
  • Podcast
  • Politics
  • Sports
  • Technology
  • Travel
  • Videos
  • Weather
  • World News

CATEGORY

  • Business
  • Entertainment
  • Health
  • Lifestyle
  • Podcast
  • Politics
  • Sports
  • Technology
  • Travel
  • Videos
  • Weather
  • World News

SITE LINKS

  • About us
  • Contact us
  • Privacy Policy
  • Terms and Conditions
  • Disclaimer
  • DMCA

[mailpoet_form id=”1″]

  • About us
  • Contact us
  • Privacy Policy
  • Terms and Conditions
  • Disclaimer
  • DMCA

© 2022. All Right Reserved By Inbvnews.com

No Result
View All Result
  • Home
  • Business
  • Entertainment
  • Health
  • Lifestyle
  • Politics
  • Sports
  • Technology
  • Travel
  • Weather
  • World News
  • Videos
  • More
    • Podcasts
    • Reels
    • Live Video Stream

© 2022. All Right Reserved By Inbvnews.com

Welcome Back!

Login to your account below

Forgotten Password? Sign Up

Create New Account!

Fill the forms below to register

All fields are required. Log In

Retrieve your password

Please enter your username or email address to reset your password.

Log In

Add New Playlist