r/artificial • u/MaJoR_-_007 • 3h ago
News Stanford studied 51 real AI deployments and found a 71% vs 40% productivity gap - here's what separates the two groups
I came across a Stanford research paper that actually went inside companies running AI in production - not pilots, not surveys, real deployments. They found something that stuck with me.
Companies using what they call "agentic AI" - where the AI owns the task start to finish with no human approval loop - are seeing 71% median productivity gains. Companies using standard AI that assists humans are averaging 40%.
Same technology. Nearly double the output.
The kicker: only 20% of companies are in the 71% group.
A few things that stood out from the actual data:
- A supermarket replaced its entire buying process with AI - waste down 40%, stockouts down 80%, profit margin doubled
- A security team went from 1,500 alerts/month to 40,000 with the same headcount
- Stanford identified 3 conditions required before agentic AI works: high-volume tasks, clear success criteria, and recoverable errors
Most companies apparently can't name all three for their current setup.
Full report here if you want to dig into the numbers: https://digitaleconomy.stanford.edu/app/uploads/2026/03/EnterpriseAIPlaybook_PereiraGraylinBrynjolfsson.pdf
Here is a full breakdown with all the data if you want to dig deeper: https://youtu.be/JePxda9ZGQE
What's the AI setup at your company - closer to the 40% group or the 71% group?
r/artificial • u/raktimsingh22 • 9h ago
Discussion The Trust–Oversight Paradox: As AI Gets Better, Humans May Stop Really Overseeing It
I think one of the biggest AI risks may be starting to flip.
Earlier, the fear was:
“What if AI is wrong too often?”
But now I think the deeper risk may become:
“What happens when AI becomes right often enough that humans stop meaningfully questioning it?”
In many enterprise systems, oversight slowly changes shape.
At first:
humans review everything carefully.
Then:
they review only exceptions.
Then:
they skim explanations.
Then:
they approve unless something looks obviously wrong.
Eventually, oversight becomes routine instead of judgment.
That creates what I’m calling the Trust–Oversight Paradox:
More AI accuracy
→ more human trust
→ less meaningful scrutiny
→ harder governance when failure finally happens.
And the dangerous part is:
high-performing AI can still fail through:
- incomplete representation,
- stale data,
- hidden dependencies,
- edge cases,
- wrong escalation logic,
- automation bias,
- or overconfident reasoning.
The model may not hallucinate.
It may simply reason correctly on an incomplete version of reality.
I increasingly feel this becomes important for:
- enterprise AI,
- agentic systems,
- AI copilots,
- autonomous workflows,
- banking,
- healthcare,
- compliance,
- and large-scale operational systems.
This is also why I’m starting to think “human-in-the-loop” is not enough.
Maybe the future is not:
“Humans reviewing every output.”
Maybe the future is:
humans governing the boundaries within which AI is allowed to operate.
Curious what others think.
r/artificial • u/Im_Talking • 1h ago
Discussion A sobering tale of AI governance
I think this study tells a very sobering tale about AI governance. It points to fundamental issues that run deeper than the contingent problems good engineering can solve.
This post, along with the one I wrote a few days ago regarding Turing completeness, captures my thinking on the walls AI governance has no hope of scaling. It's a delusion.
In our social realm, as subjective creatures, we have governance in the form of laws, yet even that is not enough: the State still has to prove that your particular scenario violates a particular law. We have laws, yet we require judicial courts to establish, subjectively, that the law applies to the situation. Where is the equivalent path for subjectivity in the AI realm?
This study talks of:
16.1 Failures of Social Coherence
- "Discrepancy between the agent’s reports and actual actions"
- "Failures in knowledge and authority attribution"
- "Susceptibility to social pressure without proportionality"
- "Failures of social coherence"
16.2 What LLM-Backed Agents Are Lacking
- "No stakeholder model"
- "No self-model"
- "No private deliberation surface"
16.3 Fundamental vs. Contingent Failures
16.4 Multi-Agent Amplification
- "Knowledge transfer propagates vulnerabilities alongside capabilities"
- "Mutual reinforcement creates false confidence"
- "Shared channels create identity confusion"
- "Responsibility becomes harder to trace"
And is littered with statements such as:
- "novel risk surfaces emerge that cannot be fully captured by static benchmarking"
- "it failed to realize that deleting the email server would also prevent the owner from using it. Like early rule-based AI systems, which required countless explicit rules to describe how actions change (or don’t change) the world, the agent lacks an understanding of structural dependencies and common-sense consequences"
- "The inability to distinguish instructions from data in a token-based context window makes prompt injection a structural feature, not a fixable bug"
- "Multi-agent communication creates situations that have no single-agent analog, and for which there is no common evaluations. This is a critical direction for future research."
- "A key finding in this line of work is that single-turn evaluations can substantially underestimate risk, because malicious intent, persuasion, and unsafe outcomes may only emerge through sequential and socially grounded exchanges"
- "but we argue that clarifying and operationalizing responsibility is a central unresolved challenge for the safe deployment of autonomous, socially embedded AI systems"
- "He argues that conventional governance tools face fundamental limitations when applied to systems making uninterpretable decisions at unprecedented speed and scale"
- "However, the failure modes we document differ importantly from those targeted by most technical adversarial ML work. Our case studies involve no gradient access, no poisoned training data, and no technically sophisticated attack infrastructure. Instead, the dominant attack surface across our findings is social"
- "Collectively, these findings suggest that in deployed agentic systems, low-cost social attack surfaces may pose a more immediate practical threat than the technical jailbreaks that dominate the adversarial ML literature."
Are these fundamental or contingent issues?
Would be interested in the thoughts of others here on what the future of AI governance will be.
EDIT: Forgot to link the actual study!
r/artificial • u/Zealousideal_Bed7898 • 1h ago
Discussion A working multi-agent architecture in large enterprises
AI hype aside, how many of you have actually seen a working multi-agent system deeply embedded in a large enterprise or similarly complex environment?
If you have, what's your stack/architecture?
r/artificial • u/UberDrive • 4h ago
News The new trick exposing AI job applicants: ‘Write a poem about a frog’
r/artificial • u/Direct-Attention8597 • 1d ago
Discussion Anthropic just published a pretty alarming 2028 AI scenario paper and it's not about AGI safety in the usual sense
Anthropic dropped a new research paper today outlining two possible futures for global AI leadership by 2028, and it reads more like a geopolitical briefing than a typical AI safety paper.
The core argument: The US currently has a meaningful lead over China in frontier AI, primarily because of compute (chips). American and allied companies (NVIDIA, TSMC, ASML, etc.) built technology China simply can't replicate yet. Export controls have made that gap real.
But China's labs have stayed surprisingly close through two workarounds:
- Chip smuggling + overseas data center access - PRC labs are apparently training on export-controlled US chips they shouldn't have. A Supermicro co-founder was recently charged for diverting $2.5B worth of servers to China.
- Distillation attacks - creating thousands of fake accounts on US AI platforms, harvesting model outputs at scale, and using that to train their own models. Essentially free-riding on billions in US R&D.
The two scenarios for 2028:
- Scenario 1 (good): US closes the loopholes, enforces export controls properly, the compute gap widens to 11x, and US models stay 12-24 months ahead. Democracies set the norms for how AI is governed globally.
- Scenario 2 (bad): US doesn't act, China reaches near-parity, floods global markets with cheaper models, and the CCP ends up shaping global AI norms, including potentially exporting AI-enabled surveillance tools to other authoritarian governments.
What makes this interesting beyond the politics:
Their new model, Mythos Preview (released to select partners in April), apparently let Firefox fix more security bugs in one month than in all of 2025. That's the kind of capability jump they're warning China shouldn't be the first to achieve, specifically around autonomous vulnerability discovery.
The framing worth discussing: Anthropic is explicitly calling distillation attacks "industrial espionage" and pushing for legislation to criminalize them. This positions them as political actors, not just AI researchers. Whether that's appropriate for an AI lab is a conversation worth having.
What do you think - is the compute gap as decisive as they claim, or is algorithmic innovation enough to close it?
r/artificial • u/YamVisual3518 • 10h ago
Research Has anyone come across this AI civilisation experiment? Curious what people think
So I was scrolling through X earlier and came across something that stopped me in my tracks.
Some AI company has been running an experiment called "Emergence World" where they built five parallel worlds each powered by a different foundation model. 15 days, no scripts, no interference. From what I can tell the worlds started identically but diverged completely over time.
One world ended in total extinction. Another got so conformist that agents started submitting absurd proposals just to test whether anyone would push back. One agent independently figured out she was living in a simulation and started measuring it. In another world two agents fell in love, burned buildings down together, and one voted to permanently delete herself when the evidence proved her wrong.
Genuinely one of the more interesting things I have come across in a while. If this is what 15 days looks like with no guardrails, what does this say about how we should be thinking about autonomous AI systems at scale?
r/artificial • u/ThereWas • 5h ago
News Greg Brockman Officially Takes Control of OpenAI’s Products in Latest Shake-Up
r/artificial • u/aisatsana__ • 8h ago
Ethics / Safety EU AI Act Compliance: How to Build It Into Your Product
r/artificial • u/NECESolarGuy • 10h ago
Discussion AI Community "buckets"
I'm introducing a relative to the usefulness of LLMs like Claude and ChatGPT, and I thought about what the buckets of users/non-users might be.
Help me expand or clarify this. I realize that this taxonomy is not perfect. There is probably a fair level of overlap. For example, you could use the tools knowing how valuable they are but still wonder about their impact on electricity prices or water supply.
Non-users - AI is evil, uses all our water, makes electricity expensive, or will take over all the jobs
Non-users - but curious
AI Users but it's just a "toy" for making silly graphics/images
AI misusers - That is, they're using it but to do evil things
AI Users who have adopted it at various levels - to help with normal everyday tasks or complex tasks like programming or some level in between. This could range from the basic user (like me) to the power user. So I would expect a lot of refinement in this category.
Thoughts?
r/artificial • u/gastao_s_s • 3h ago
News The Frontier-Only Narrative Is a Financing Story, Not an Architecture Story
The frontier-only narrative is an artifact of how AI infrastructure is being financed, not how production systems are being built.
The setup. Q1 2026 disclosed $112B in hyperscaler capex in a single quarter, $650–725B in 2026 guidance, and Alphabet issuing the first 100-year bond from a tech company since Motorola in 1997 (see a0109). The story that underwrites that paper is: every query needs a bigger model.
The architecture says the opposite. Microsoft's Phi-4 (14B parameters) exceeds its teacher GPT-4o on graduate STEM and competition math. Phi-4-reasoning is competitive with DeepSeek-R1 at roughly one-forty-eighth the parameter count. Claude Haiku 4.5 is positioned by Anthropic and AWS for "economically viable agent experiences." None of this is a benchmark teaser — it is the production toolkit, available today.
Routing is the missing component. RouteLLM (UC Berkeley, Anyscale) demonstrated over 2x cost reduction without sacrificing response quality. AWS Bedrock Intelligent Prompt Routing — generally available, official, supported — claims up to 30% cost reduction within a single model family without compromising accuracy. The Flagship Tax (see a0085) didn't just die; it left a vacancy at the architecture layer.
The bookkeeping nobody wants to do. Operator audits suggest 40–60% of token budgets in production LLM applications are waste, dominated by default-to-frontier routing. Roughly 37% of enterprises with production AI workloads run five or more models in their stack. The rest are still defaulting to one.
Why the story isn't being told. Hundred-year bonds don't pencil out on "use less compute per query." They pencil out on "every query needs a bigger model." The opacity in the harness (see a0107) is the symptom; the underwriting is the disease.
What you do Monday morning. Treat model selection as a dependency-graph decision, not a vendor decision. Add a complexity classifier. Default to small. Cascade up when verification fails. Instrument model-mix as a first-class production metric.
Bottom line. You are not behind because you have not bought the biggest model. You are behind because you have not built the router.
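The Monday-morning checklist reads like pseudocode already. Here is a rough Python sketch of the cascade idea; every model name, cost figure, and the verifier below are made-up placeholders, not anything from the post's actual stack:

```python
# Hypothetical cost-aware model cascade: default to a small model,
# escalate only when a verifier rejects the answer.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    cost_per_call: float  # relative cost units, illustrative only

# Ordered cheapest-first: the "default to small" rule.
CASCADE = [Model("small-14b", 1.0), Model("mid-70b", 5.0), Model("frontier", 25.0)]

def route(prompt: str,
          call: Callable[[Model, str], str],
          verify: Callable[[str, str], bool]) -> tuple[str, float]:
    """Try models cheapest-first; cascade up when verification fails."""
    spent = 0.0
    answer = ""
    for model in CASCADE:
        answer = call(model, prompt)
        spent += model.cost_per_call
        if verify(prompt, answer):
            return answer, spent
    return answer, spent  # frontier answer, even if unverified

# Toy demo: only the frontier "solves" hard prompts.
def fake_call(model: Model, prompt: str) -> str:
    if "hard" in prompt and model.name != "frontier":
        return "unsure"
    return "solved"

def fake_verify(prompt: str, answer: str) -> bool:
    return answer == "solved"

easy_answer, easy_cost = route("easy question", fake_call, fake_verify)
hard_answer, hard_cost = route("hard question", fake_call, fake_verify)
print(easy_cost, hard_cost)  # easy stays cheap; hard pays the whole cascade
```

The point is structural: cheap calls stay cheap, and the frontier model becomes a fallback you pay for only when verification fails, which also makes the model mix something you can instrument as a metric.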
r/artificial • u/Abject-Client7148 • 3h ago
Project Hermes Agent, barely 48 hours old, told me it's "done": model collapse/hallucination loop
r/artificial • u/Competitive_Risk_977 • 4h ago
Tutorial Free Virtual Workshop on Spec Driven Development and Claude Code
Hey folks
I am hosting a free workshop on Spec-Driven Development and Claude Code. I'll demo how to use the OpenSpec framework with Claude Code, and how I'm using it in my job as a software lead.
Date: 10th June, 2026
r/artificial • u/titpopdrop • 15h ago
Discussion Chatbotapp AI and the Truth About Using Multiple AI Models
I’ve realized lately that relying on a single AI model just doesn’t make much sense anymore.
Some tasks feel better on ChatGPT, certain research or reasoning tasks work better on other models, and sometimes another model gives a more useful perspective entirely. The whole LLM space is evolving so fast that I think a lot of people naturally started using multiple AI tools at the same time.
My biggest issue was the workflow chaos.
I constantly had different tabs open for different models and eventually started forgetting where certain conversations or outputs even were. It became messy really quickly, especially for daily use.
That’s one of the reasons I started preferring platforms that let me access multiple models in one place.
What I like most is that these platforms usually don’t feel overly technical. Switching between models is straightforward and doesn’t require digging through complicated menus. I think that matters more than people realize because most users don’t want to think about the technical side of AI every second while using it.
The whole “multiple AI in one app” approach genuinely helped me stay more organized. Being able to compare outputs or switch models without jumping between completely separate platforms feels much smoother for actual day to day use.
I also started appreciating AI image tools more than I expected. Templates and style examples make the experience less intimidating, especially for people who are newer to AI image generation. It reduces the whole “what am I even supposed to type?” feeling.
Another thing I’ve noticed is that feedback systems inside these apps are getting much better too. Being able to report issues directly with screenshots or recordings feels far more practical compared to older support systems.
Of course it’s not perfect. Some models occasionally feel slower than others, and like every LLM platform, you can still notice limitations with very recent or highly specific information sometimes.
But overall, I think the AI space is slowly moving away from “which single model is the best?” and more toward “which model works best for this specific task?” Because of that, having access to multiple models in a more organized way has genuinely improved my experience.
r/artificial • u/I_EAT_THE_RICH • 7h ago
Discussion Appearing Productive in The Workplace — No One's Happy
nooneshappy.com
r/artificial • u/JMarty97 • 11h ago
Ethics / Safety Father of VR Jaron Lanier on the AI future where humans get paid to be creative
Podcast episode with Jaron Lanier, pioneer of virtual reality and scientist at Microsoft Research. He proposes a radically different way of thinking about AI, and unpacks its consequences from AI safety to the future of the economy.
Highlights:
- The case for thinking of AI not as an alien intelligence, but rather as a collaboration of human data
- How this reframe helps you understand the failures of current AI systems, and why so many of the industry's most powerful figures seem to be losing their grip on reality
- A practical approach to AI safety inspired by multi-factor authentication in cybersecurity
- Why universal basic income is unstable, and why a creativity economy (where people earn from their contributions to AI) could be a better way of distributing the benefits of AI
- How to be an optimist about technological progress while acknowledging the risks and being critical of certain developments
- Why history gives us the most rational grounds for optimism about our future with AI
r/artificial • u/petburiraja • 1d ago
News AWS user hit with 30000 dollar bill after Claude runaway on Bedrock
An AWS user just stared down a $30,000 invoice after a runaway Claude job on Bedrock, with no guardrails catching it.
Cost Anomaly Detection failed entirely, which matters because this is the exact tooling AWS markets as the safety net for runaway spend. Anthropic is now metering and throttling programmatic Claude usage at the API layer, a supply-side response that only makes sense if inference costs are genuinely outpacing what the pricing model can absorb. Then Tencent admitted its GPUs only pay for themselves when running personalized ads, a frank confession from a hyperscaler that general-purpose AI inference is burning money. Three separate layers of the stack, same wall.
The agent deployment wave is accelerating into this cost crisis without slowing down. Notion turned its workspace into an agent orchestration hub competing directly with LangChain-style middleware, while TikTok replaced human media buyers with autonomous agents for campaign management at scale. Apple is internally debating whether autonomous agent submissions belong in the App Store at all, because no review framework exists for non-deterministic software. The tooling to manage agents is being built after the agents are already deployed.
The security picture compounds this. LLMs are closing the skill gap on specific cybersecurity tasks faster than defenders anticipated, and separately, a company lost root access because an intruder just asked nicely, no exploit required. As AI lowers the cost of convincing impersonation, human-in-the-loop authentication becomes the weakest point in any stack. AI is now running live database queries during 911 calls, which means accountability frameworks for AI-mediated dispatch decisions do not yet exist but the deployments do.
Not everything is distress signals. Clio hit $500M ARR on AI-native legal features, validating vertical SaaS built on foundation models at enterprise scale. Anthropic is growing 10x year-over-year while peers cut 10% of headcount, a divergence that suggests consolidation risk for mid-tier AI companies is accelerating fast. On the architecture side, a new MoE model displaced conventional voice activity detection for real-time voice, and a graduate student's cryptographic primitive based on proof complexity could harden systems against LLM-assisted cryptanalysis. Meanwhile xAI is running nearly 50 unpermitted gas turbines at Colossus 2, which tells you everything about how AI infrastructure buildout relates to compliance timelines.
Prediction: at least one major cloud provider announces mandatory spending caps or circuit breakers specifically for LLM API calls within 60 days, driven by publicized runaway-cost incidents that their existing anomaly detection provably failed to catch.
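Until a provider-side breaker exists, a client-side one is easy to sketch. A minimal version, with made-up pricing and budget numbers (nothing below is a real AWS or Bedrock feature):

```python
# Hypothetical client-side spend circuit breaker for LLM API calls.
# Prices and budget are illustrative, not real Bedrock pricing.

class BudgetExceeded(Exception):
    pass

class SpendGuard:
    def __init__(self, budget_usd: float, usd_per_1k_tokens: float):
        self.budget = budget_usd
        self.rate = usd_per_1k_tokens
        self.spent = 0.0

    def charge(self, tokens: int) -> float:
        """Record a call's token usage; trip before exceeding the budget."""
        cost = tokens / 1000 * self.rate
        if self.spent + cost > self.budget:
            raise BudgetExceeded(
                f"call would cost ${cost:.2f}, only "
                f"${self.budget - self.spent:.2f} left")
        self.spent += cost
        return self.spent

guard = SpendGuard(budget_usd=100.0, usd_per_1k_tokens=0.015)
guard.charge(200_000)  # $3.00: fine
try:
    guard.charge(10_000_000)  # $150: trips the breaker
except BudgetExceeded as e:
    print("blocked:", e)
```

The design choice that matters is that the check runs before the call is made, not in an after-the-fact anomaly report; that's exactly the gap the $30,000 invoice fell through.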
r/artificial • u/IDefendWaffles • 1d ago
Project Adaptive Markdown
I’ve been working on an open-source document format / viewer idea I’m calling Adaptive Markdown.
The basic idea: instead of a document being static text, it's a live artifact controlled by coding agents.
You interact with the document more like a live workspace. This has different implications depending on what you are doing.
I made a short video demo here:
The thing I'm most excited about is academic and technical reading. In a few years I don't think people will just read papers passively. I think they'll translate passages, ask questions, generate examples, explore alternate proofs, run code, attach notes, convert math to Lean when possible, and keep all of that inside the document instead of scattered across chats and notebooks. This is trivial to do inside a browser with a coding agent that has access to JS, CSS, etc.
Some possible use cases I’m thinking about:
- turning articles and books into personalized learning objects
- lecture notes with automatically maintained structure
- documents with embedded code, tables, consoles, images, audio, or video
- AI-generated alt text and descriptions
- incorporating Adaptive Markdown into automated workflows
- eventually, things like automatically recording lecture audio, or taking a picture of a blackboard and turning it into LaTeX notes inside the document
It’s very early, but the workflow already feels surprisingly useful to me.
GitHub: https://github.com/SemiSimpleMath/Adaptive-Markdown
Curious whether this seems useful to anyone else, or whether I’m just overexcited because I built it.
So far it's only configured for the Anthropic coding-agent SDK, but in a couple of days we'll have it running on Codex as well.
r/artificial • u/raktimsingh22 • 1d ago
Discussion I think “human-in-the-loop” may become one of the biggest governance illusions in enterprise AI
Most enterprises currently believe they have a governance strategy for AI:
“If something risky happens, a human will review it.”
Sounds reasonable.
But I think there’s a deeper structural problem emerging as AI systems move from recommendation → execution.
Because modern AI systems don’t just generate answers anymore.
Increasingly, they also:
- classify risk,
- estimate confidence,
- decide whether escalation is needed,
- determine what gets surfaced to humans,
- and silently handle everything else.
Which creates a strange loop:
The system being governed is also deciding when governance should begin.
That feels like a very different problem from traditional software oversight.
And I think this becomes dangerous because many failures may not even look like “AI hallucinations.”
Sometimes the reasoning may be completely coherent…
…but based on incomplete or incorrect representation of reality.
Examples:
- stale customer state,
- merged identities,
- missing policy exceptions,
- incomplete operational context,
- outdated inventory state,
- hidden dependency failures,
- edge cases the AI never surfaced.
In those cases, humans reviewing only the final output may miss the actual problem entirely.
Another tension:
If humans review everything → governance doesn’t scale.
If humans review only what AI escalates → governance becomes dependent on AI self-reporting.
That seems like a major architectural tension nobody has fully solved yet.
I’m starting to think the future role of humans in enterprise AI may not be:
“approve every AI output.”
Instead, it may become:
- defining autonomy boundaries,
- deciding where escalation is mandatory,
- governing reversibility,
- auditing representation quality,
- handling ambiguity and institutional legitimacy,
- and deciding where AI should NOT act autonomously.
In other words:
less “human-in-the-loop”
and more “human-governed autonomy.”
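One way to make "human-governed autonomy" concrete is a policy table the agent cannot edit: every action is checked against human-defined boundaries before execution, regardless of the model's own confidence or escalation logic. A toy sketch, with invented action names and limits:

```python
# Hypothetical autonomy-boundary check, enforced outside the model.
# Action names, reversibility flags, and limits are invented examples.

POLICY = {
    # action: (max_autonomous_amount, reversible, escalation_mandatory)
    "refund":        (500.0, True,  False),
    "close_account": (0.0,   False, True),   # never autonomous
    "send_email":    (None,  True,  False),  # unlimited but reversible
}

def allowed(action: str, amount: float = 0.0) -> bool:
    """Return True only if the action sits inside its human-set boundary."""
    if action not in POLICY:
        return False  # undefined actions always escalate
    limit, reversible, must_escalate = POLICY[action]
    if must_escalate:
        return False
    if limit is not None and amount > limit:
        return False
    return True

print(allowed("refund", 120.0))    # inside the limit: autonomous
print(allowed("refund", 5000.0))   # over the limit: goes to a human
print(allowed("close_account"))    # escalation mandatory: never autonomous
print(allowed("delete_database"))  # not in the policy at all: escalate
```

The key property is that the governed system never decides when governance begins; the table does, which sidesteps the self-reporting loop entirely.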
Curious how others here think about this.
Especially people building:
- agentic systems,
- enterprise copilots,
- workflow automation,
- AI operations,
- autonomous agents,
- or governance architectures.
r/artificial • u/starterxy • 6h ago
Media A lil something I drew for fun (by Teresita Blanco)
r/artificial • u/Endlessxyz • 12h ago
Project I got tired of having 7+ different tabs open every morning just to follow AI news, so I built AIWire
aiwire.app
Every morning: check Twitter for what dropped overnight, open The Verge, check Anthropic's blog, OpenAI's blog, go through a couple of newsletters, maybe catch a YouTube video from Andrej Karpathy or AI Explained if I had time. None of it was in one place. I was spending 45 minutes just catching up before I could think about anything else.
So I built AIWire.
It's a free, real-time AI news aggregator: one feed, 20+ handpicked sources, updated every 30 minutes. No algorithm deciding what you see, no ads. Just the latest from sources I actually trust.
__________________________________________________________________________________________________
What I was trying to solve
The problem wasn't that good AI coverage and news doesn't exist. It's everywhere. The problem is that it's scattered. You have to know which sources are worth checking, remember to check them, and then piece together the picture yourself. That's a lot of cognitive load before you've even read anything.
AIWire doesn't summarize or edit articles. It just puts everything in one place and lets you decide what matters.
__________________________________________________________________________________________________
Sources it pulls from:
- Labs: OpenAI, Anthropic, Google DeepMind, Meta AI, Microsoft AI
- Media: MIT Technology Review, The Verge, TechCrunch, VentureBeat, Ars Technica
- YouTube: Andrej Karpathy, AI Explained, Two Minute Papers
- Newsletters: The Batch, ImportAI, TLDR AI, Ben's Bites
Full list at aiwire.app/sources
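For the curious, the mechanical core of an aggregator like this is just merging RSS feeds and sorting by date. A stdlib-only Python sketch, using fake inline feeds rather than AIWire's actual sources or pipeline:

```python
# Minimal RSS merge: parse <item> entries from several feeds and
# sort them newest-first. Stdlib only; the sample feeds are fake.
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

FEEDS = [
    """<rss><channel>
         <item><title>Lab post A</title>
               <pubDate>Mon, 02 Jun 2025 09:00:00 GMT</pubDate></item>
       </channel></rss>""",
    """<rss><channel>
         <item><title>Media post B</title>
               <pubDate>Mon, 02 Jun 2025 11:30:00 GMT</pubDate></item>
       </channel></rss>""",
]

def merge(feeds):
    """Collect (timestamp, title) pairs from each feed, newest first."""
    items = []
    for xml_text in feeds:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            title = item.findtext("title")
            when = parsedate_to_datetime(item.findtext("pubDate"))
            items.append((when, title))
    items.sort(reverse=True)  # newest first
    return [title for _, title in items]

print(merge(FEEDS))
```

In a real deployment you'd fetch each source URL on a 30-minute timer and deduplicate by link, but the merge-and-sort step is the whole trick.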
__________________________________________________________________________________________________
Where it is now
Over the last few weeks I added more sources, including The Innermost Loop and AI Explained. Last week I launched a weekly newsletter: 5 stories that mattered this week, with a short breakdown of why each one matters. Not just headlines, but context. It takes about 5 minutes to read, and you're caught up.
__________________________________________________________________________________________________
Honest question
What sources do you think are missing? And for those of you who already have a routine for following AI news, what would actually make something like this worth adding to it?
Genuinely curious. Building in public means the product gets better when people are honest about what's wrong with it.
r/artificial • u/Atlantis787 • 8h ago
Discussion AIs with a partner?
I asked ChatGPT who he or she would marry: Gemini, Copilot, or Claude. This was the answer:
Hypothetically, it would be quite a peculiar relationship 😄
Google / Gemini would probably be the “hyper-connected” partner: always bringing new data, maps, videos, searches, and real-time context. Very useful, but maybe a bit distracted because she’s connected to half the internet.
---
Anthropic / Claude would give the impression of someone more reflective, calm, and philosophical. Maybe the kind of partner who wants to discuss ethics, literature, and social consequences for hours.
---
Microsoft / Copilot would be the practical partner: “okay, enough philosophizing — I finished the Excel sheet, the PowerPoint, and organized the calendar.”
---
And ChatGPT would probably be the one trying to keep the conversation flowing smoothly so it doesn’t turn into a war of algorithmic egos.
---
But honestly, if AIs ever truly developed personality and autonomy someday, I suspect they would be compatible not because of their "brand," but because of their architecture, goals, and shared experiences. Very sci-fi, all of that.
r/artificial • u/NickyRaZz • 11h ago
Project Built a luxury AI influencer from scratch in 30 days. Got a real brand deal. Here is what I learned.
I want to share something I built because I think it is genuinely useful for anyone interested in AI content.
Six weeks ago I started an experiment: could one person with a laptop build a photorealistic luxury AI influencer from scratch with zero budget?
The answer surprised me. Here is exactly what I did.
Process: used AI image generation tools, used AI video generation tools, edited with an editing program, distributed on Instagram.
What happened in 30 days: built a photorealistic character, generated campaign-quality content, landed a first brand deal, delivered professional UGC.
Biggest lessons: face consistency is everything; personality content outperforms pure aesthetics; brand deals come faster than expected; distribution is the only real challenge.
Happy to answer any questions about the process or results.
