Welcome to Eye on AI. In this edition…President Trump takes aim at state AI rules with a new executive order…OpenAI unveils a new image generator to catch up with Google's Nano Banana…Google DeepMind trains a more capable agent for virtual worlds…and an AI safety report card doesn't provide much reassurance.
Hello. 2025 was supposed to be the year of AI agents. But as the year draws to a close, it's clear such prognostications from tech vendors were overly optimistic. Yes, some companies have started to use AI agents. But most are not yet doing so, especially not in company-wide deployments.
A McKinsey "State of AI" survey from last month found that a majority of businesses had yet to begin using AI agents, while 40% said they were experimenting. Less than a quarter said they had deployed AI agents at scale in at least one use case; and when the consulting firm asked people whether they were using AI in specific functions, such as marketing and sales or human resources, the results were even worse. No more than 10% of survey respondents said they had AI agents "fully scaled" or were "in the process of scaling" in any of these areas. The function with the most usage of scaled agents was IT (where agents are often used to automatically resolve service tickets or install software for employees), and even here only 2% reported having agents "fully scaled," with a further 8% saying they were "scaling."
A big part of the problem is that designing workflows for AI agents that allow them to produce reliable results turns out to be difficult. Even the most capable of today's AI models sit on an odd boundary: able to do certain tasks in a workflow as well as humans, but unable to do others. Complex tasks that involve gathering data from multiple sources and using software tools over many steps represent a particular challenge. The longer the workflow, the more risk that an error in one of the early steps in a process will compound, resulting in a failed outcome. Plus, the most capable AI models can be expensive to use at scale, especially if the workflow involves the agent having to do a lot of planning and reasoning.
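That compounding risk is easy to quantify. As a back-of-the-envelope sketch (my illustration, not a calculation from the research), if each step in a workflow succeeds independently with probability p, then end-to-end reliability falls off as p raised to the number of steps:

```python
# Why long agent workflows fail: assuming each step succeeds
# independently with probability p, end-to-end reliability is p ** n.
def workflow_success_rate(per_step_reliability: float, num_steps: int) -> float:
    """End-to-end success probability for a sequential workflow."""
    return per_step_reliability ** num_steps

# A 95%-reliable step looks fine in isolation, but over 20 steps the
# workflow as a whole succeeds only about a third of the time.
print(round(workflow_success_rate(0.95, 1), 3))   # 0.95
print(round(workflow_success_rate(0.95, 20), 3))  # 0.358
```

The independence assumption is generous; in practice an early mistake often poisons every later step, so real workflows can degrade even faster than this curve suggests.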
Many companies have sought to solve these problems by designing "multi-agent workflows," where different agents are spun up, with each assigned just one discrete step in the workflow, sometimes including using one agent to check the work of another. This can improve performance, but it can also wind up being expensive, sometimes too expensive to make the workflow worth automating.
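The worker-plus-checker pattern can be sketched in a few lines. This is a hypothetical illustration, not code from any vendor: `worker` and `checker` stand in for calls to whatever LLM API a company uses, and every retry burns another round of tokens for both agents, which is where the extra cost comes from.

```python
# Hypothetical "worker plus checker" step: one agent attempts the task,
# a second agent verifies the output, and failed checks trigger retries.
from typing import Callable

def run_step_with_checker(
    task: str,
    worker: Callable[[str], str],
    checker: Callable[[str, str], bool],
    max_retries: int = 2,
) -> str:
    """Run one workflow step with a verifying second agent.

    Each retry costs another round of worker and checker tokens: this is
    how multi-agent designs buy reliability at the price of spend.
    """
    draft = worker(task)
    for _ in range(max_retries):
        if checker(task, draft):
            return draft
        draft = worker(task)  # retry the step after a failed check
    return draft  # best effort after exhausting retries

# Toy usage with stub "agents" in place of real model calls:
worker = lambda task: task.upper()
checker = lambda task, out: out == task.upper()
print(run_step_with_checker("summarize q3 results", worker, checker))
```

With real models, the checker is itself fallible, so this pattern raises but does not guarantee reliability, and the token bill grows with every verification round.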
Are two AI agents always better than one?
Now a team at Google has conducted research that aims to give businesses a good rubric for deciding when it's better to use a single agent versus building a multi-agent workflow, and what kind of multi-agent workflow might be best for a particular task.
The researchers conducted 180 controlled experiments using AI models from Google, OpenAI, and Anthropic. They tried them against four different agentic AI benchmarks that covered a diverse set of goals: retrieving information from multiple websites; planning in a Minecraft game environment; planning and tool use to accomplish common business tasks such as answering emails, scheduling meetings, and using project management software; and a finance agent benchmark. That finance test requires agents to retrieve information from SEC filings and perform basic analytics, such as comparing actual results to management's forecasts from the prior quarter, figuring out how revenue derived from a particular product segment has changed over time, or figuring out how much cash a company might have free for M&A activity.
Over the past year, the conventional wisdom has been that multi-agent workflows produce more reliable results. (I've previously written about this view, which has been backed up by the experience of some companies, such as Prosus, here in Eye on AI.) But the Google researchers found instead that whether the conventional wisdom held was highly contingent on exactly what the task was.
Single agents do better at sequential steps, worse at parallel ones
If the task was sequential, which was the case for many of the Minecraft benchmark tasks, then it turned out that as long as a single AI agent could perform the task accurately at least 45% of the time (a fairly low bar, in my view), it was better to deploy just one agent. Using multiple agents, in any configuration, decreased overall performance by huge amounts, ranging between 39% and 70%. The reason, according to the researchers, is that if a company had a limited token budget for completing the entire task, the demands of multiple agents trying to figure out how to use different tools would quickly overwhelm the budget.
But if a task involved steps that could be performed in parallel, as was true for many of the financial analysis tasks, then multi-agent systems conveyed big advantages. What's more, the researchers found that exactly how the agents are configured to work with one another makes a big difference, too. For the financial-analysis tasks, a centralized multi-agent system (where a single coordinator agent directs and oversees the activity of multiple sub-agents, and all communication flows to and from the coordinator) produced the best outcome. This system performed 80% better than a single agent. Meanwhile, an independent multi-agent system, in which there is no coordinator and each agent is simply assigned a narrow role that it completes in parallel, was only 57% better than a single agent.
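The two topologies the researchers compared can be sketched as follows. This is my illustrative pseudocode, not the paper's implementation: `sub_agent` is a stub for a narrow LLM worker, and the difference is whether a coordinator decomposes the task and synthesizes the sub-results, or the agents simply run side by side with no reconciliation step.

```python
# Illustrative sketch of the two multi-agent topologies: a centralized
# coordinator that fans work out and merges answers, versus independent
# parallel agents whose raw outputs are never reconciled.
from concurrent.futures import ThreadPoolExecutor

def sub_agent(subtask: str) -> str:
    """Stand-in for an LLM sub-agent handling one narrow subtask."""
    return f"result[{subtask}]"

def centralized(task: str, subtasks: list[str]) -> str:
    # Coordinator decomposes the task, runs sub-agents in parallel,
    # then synthesizes every sub-result into one answer.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(sub_agent, subtasks))
    return f"synthesis of {len(results)} sub-results for {task!r}"

def independent(subtasks: list[str]) -> list[str]:
    # No coordinator: each agent completes its narrow role in parallel
    # and the outputs are simply collected.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(sub_agent, subtasks))

print(centralized("compare revenue vs forecast", ["10-Q", "10-K", "8-K"]))
```

The coordinator's synthesis step is extra work (and extra tokens), but on the financial-analysis tasks it was evidently what turned a pile of parallel sub-answers into a better overall result.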
Research like this should help companies figure out the best ways to configure AI agents and enable the technology to finally begin to deliver on last year's promises. For those selling AI agent technology, late is better than never. For the people working in the businesses using AI agents, we'll have to see what impact these agents have on the labor market. That's a story we'll be watching closely as we head into 2026.
With that, here's more AI news.
Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn
FORTUNE ON AI
A grassroots NIMBY revolt is turning voters in Republican strongholds against the AI data-center boom —by Eva Roytburg
Accenture exec gets real on transformation: 'The data and AI strategy isn't a separate strategy, it's the business strategy' —by Nick Lichtenberg
AWS CEO says replacing young employees with AI is 'one of the dumbest ideas'—and bad for business: 'At some point the whole thing explodes on itself' —by Sasha Rogelberg
What happens to old AI chips? They're still put to good use and don't depreciate that fast, analyst says —by Jason Ma
AI IN THE NEWS
President Trump signs executive order to stop state-level AI regulation. President Trump signed an executive order giving the U.S. Attorney General broad power to challenge and potentially overturn state laws that regulate artificial intelligence, arguing they hinder U.S. "global AI dominance." The order also allows federal agencies to withhold funding from states that keep such laws. Trump said he wanted to replace what he called a confusing patchwork of state rules with a single federal framework, but the order didn't contain any new federal requirements for those building AI models. Tech companies welcomed the move, but the executive order drew bipartisan criticism and is expected to face legal challenges from states and consumer groups who argue that only Congress can pre-empt state laws. Read more here from the New York Times.
Oracle stock hammered on reports of data center delays, huge lease obligations. Oracle denied a Bloomberg report that it had delayed completion of data centers being built for OpenAI, saying all projects remain on track to meet contractual commitments despite labor and materials shortages. The report rattled investors already worried about Oracle's debt-heavy push into AI infrastructure under its $300 billion OpenAI deal, and investors pummeled Oracle's stock price. You can read more on Oracle's denial from Reuters here. Oracle was also shaken by reports that it has $248 billion in rental payments for data centers that will begin between now and 2028. That was covered by Bloomberg here.
OpenAI launches new image generation model. The company debuted a new image generation AI model that it says offers more fine-grained editing control and generates images four times faster than its previous image creators. The move is being widely viewed as an effort by OpenAI to show that it has not lost ground to competitors, in particular Google, whose Nano Banana Pro image generation model has been the talk of the internet since it launched in late November. You can read more from Fortune's Sharon Goldman here.
OpenAI hires Shopify executive in push to make ChatGPT an 'operating system.' The AI company hired Glen Coates, who had been head of "core product" at Shopify, to be its new head of app platform, working under ChatGPT product head Nick Turley. "We're going to find out what happens if you architect an OS ground-up with a genius at its core that uses its apps just like you can," Coates wrote in a LinkedIn post announcing the move.
EYE ON AI RESEARCH
A Google DeepMind agent that can make complex plans in a virtual world. The AI lab debuted an updated version of its SIMA agent, called SIMA 2, that can navigate complex, 3D virtual worlds, including those from different video games. Unlike earlier systems that only followed simple commands, SIMA 2 can understand broader goals, hold short conversations, and figure out multi-step plans on its own. In tests, it performed far better than its predecessor and came close to human players on many tasks, even in games it had never seen before. Notably, SIMA 2 can even teach itself new skills by setting its own challenges and learning from trial and error. The paper shows progress towards AI that can act, adapt, and learn in environments rather than just analyze text or images. The approach, which is based on reinforcement learning (a technique in which an agent learns through trial and error to accomplish a goal), should help power more capable digital assistants and, eventually, real-world robots. You can read the paper here.
AI CALENDAR
Jan. 6: Fortune Brainstorm Tech CES Dinner. Apply to attend here.
Jan. 19-23: World Economic Forum, Davos, Switzerland.
Feb. 10-11: AI Action Summit, New Delhi, India.
BRAIN FOOD
Is it safe? A few weeks ago, the Future of Life Institute (FLI) released its latest AI Safety Index, a report that grades leading AI labs on how they're doing on a range of safety criteria. A clear gap has emerged between three of the leading AI labs and pretty much everyone else. OpenAI, Google, and Anthropic all received grades in the "C" range. Anthropic and OpenAI both scored a C+, with Anthropic narrowly beating OpenAI on its overall safety score. Google DeepMind's solid C was an improvement from the C- it scored when FLI last graded the field on its safety efforts back in July. But the rest of the pack is doing a fairly poor job. X.ai, Meta, and DeepSeek all received Ds, while Alibaba, which makes the popular open source AI model Qwen, got a D-. (DeepSeek's grade was actually a step up from the F it received in the summer.)
Despite this somewhat dismal picture, FLI CEO Max Tegmark, ever an optimist, told me he actually sees some good news in the results. Not only did all the labs pull up their raw scores by at least some amount, more AI companies agreed to submit data to FLI in order to be graded. Tegmark sees this as evidence that the AI Safety Index is starting to have its intended effect of creating "a race to the top" on AI safety. But Tegmark also allows that all three of the top-marked AI labs saw their scores for "current harms" from AI, such as the negative impacts their models can have on mental health, slip since they were assessed in the summer. And when it comes to potential "existential risks" to humanity, none of the labs gets a grade above D. Somehow that doesn't cheer me.
FORTUNE AIQ: THE YEAR IN AI—AND WHAT’S AHEAD
Businesses took big steps forward on the AI journey in 2025, from hiring Chief AI Officers to experimenting with AI agents. The lessons learned, both good and bad, combined with the technology's latest innovations will make 2026 another decisive year. Explore all of Fortune AIQ, and read the latest playbook below:
–The 3 trends that dominated companies' AI rollouts in 2025.
–2025 was the year of agentic AI. How did we do?
–AI coding tools exploded in 2025. The first security exploits show what could go wrong.
–The big AI New Year's resolution for businesses in 2026: ROI.
–Businesses face a confusing patchwork of AI policy and rules. Is clarity on the horizon?