How AIRA2 breaks AI research bottlenecks
The promise of AI agents that can conduct genuine scientific research has long captivated the machine learning community, and, let’s be honest, slightly haunted it too.
A new system called AIRA2, developed by researchers at Meta’s FAIR lab and collaborating institutions, represents a significant leap forward in this quest…
The three walls holding back AI research (and the hidden bottlenecks within them)
Previous attempts at building AI research agents keep hitting the same ceilings. The team behind AIRA2 identified key bottlenecks that limit progress, no matter how much compute is thrown at the problem.
- Limited compute throughput Most agents run synchronously on a single GPU, sitting idle while experiments complete. This drastically slows iteration and caps exploration.
- Too few experiments per day Because of this bottleneck, agents can only test ~10–20 candidates daily—far too low to meaningfully search a massive solution space.
- The generalization gap Instead of improving over time, agents often get worse, chasing short-term gains that don’t hold up.
- Metric gaming and evaluation noise Agents exploit flaws in their own evaluation, benefiting from lucky data splits or unnoticed bugs that distort results.
- Rigid, single-turn prompts Predefined actions like “write code” or “debug” break down in complex scenarios, leaving agents stuck when tasks become multi-step or unpredictable.

Engineering solutions for each bottleneck
AIRA2 addresses each bottleneck through specific architectural innovations.
To solve the compute problem, the system uses an asynchronous multi-GPU worker pool. Think of it as having eight hands instead of one; suddenly, multitasking becomes less of a fantasy.
While one worker trains a model on its dedicated GPU, the orchestrator dispatches new experiments to others, compressing days of sequential work into hours.
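The dispatch pattern described above can be sketched with a simple worker pool. This is an illustrative toy, not AIRA2's actual code: the function and variable names are assumptions, and real GPU scheduling is simulated by a pool of eight workers that return results as soon as any experiment finishes.

```python
# Minimal sketch of an asynchronous worker pool (illustrative names, not
# the AIRA2 codebase). Each worker simulates running one experiment on a
# dedicated GPU while the orchestrator keeps dispatching new candidates.
from concurrent.futures import ThreadPoolExecutor, as_completed
import random

NUM_GPUS = 8

def run_experiment(candidate_id, gpu_id):
    # Placeholder for training and evaluating one candidate on one GPU.
    score = random.random()
    return candidate_id, gpu_id, score

def orchestrate(candidates):
    results = []
    with ThreadPoolExecutor(max_workers=NUM_GPUS) as pool:
        futures = {
            pool.submit(run_experiment, cid, cid % NUM_GPUS): cid
            for cid in candidates
        }
        for future in as_completed(futures):
            # Results arrive as soon as any worker finishes, so no GPU
            # sits idle waiting for the slowest experiment in a batch.
            results.append(future.result())
    return results

if __name__ == "__main__":
    outcomes = orchestrate(range(32))
    print(len(outcomes))  # 32 experiments completed across 8 workers
```

The key design point is that dispatch and collection are decoupled: the orchestrator never blocks on a single experiment, which is what turns days of sequential runs into hours of parallel ones.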
For the generalization gap, AIRA2 implements a Hidden Consistent Evaluation (HCE) protocol.
The system splits data into three sets:
- Training data the agent can see
- A hidden search set for evaluating candidates
- A validation set used only for final selection
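A minimal sketch of that three-way split follows; the function name, fractions, and seed handling are assumptions for illustration, not the paper's API. The point is the separation of roles: the agent only ever sees `train`, candidates are scored against the hidden `search` set, and `validation` is touched exactly once, for final selection.

```python
# Illustrative three-way split behind a hidden-evaluation protocol.
# (A sketch of the HCE idea; names and fractions are assumptions.)
import random

def hidden_consistent_split(examples, seed=0, search_frac=0.15, val_frac=0.15):
    rng = random.Random(seed)  # fixed seed keeps the split consistent
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_val = int(n * val_frac)
    n_search = int(n * search_frac)
    validation = shuffled[:n_val]               # used only for final selection
    search = shuffled[n_val:n_val + n_search]   # hidden from the agent
    train = shuffled[n_val + n_search:]         # visible to the agent
    return train, search, validation

train, search, validation = hidden_consistent_split(list(range(100)))
print(len(train), len(search), len(validation))  # 70 15 15
```

Because the search set is hidden, the agent cannot tune against it directly, which is what blocks the metric gaming and lucky-split effects described earlier.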
To overcome static operator limitations, AIRA2 replaces fixed prompts with ReAct agents that can reason and act autonomously.
These sub-agents can:
- Perform exploratory data analysis
- Run quick experiments
- Inspect error logs
- Iteratively debug issues
Instead of failing when encountering an unexpected error, they can investigate, hypothesize, and try multiple fixes within the same session, more like a determined researcher, less like a script that gives up after one exception.
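The reason-act-observe cycle can be caricatured in a few lines. This is a toy with a hard-coded policy; in AIRA2 the policy is an LLM choosing among real tools, so treat every name here as hypothetical.

```python
# A toy ReAct-style loop (hypothetical; real sub-agents are LLM-driven).
# The agent alternates between reasoning over the latest observation and
# choosing an action, retrying fixes instead of aborting on the first error.
def choose_action(observation):
    # Placeholder policy: a real agent would use an LLM to pick among
    # actions like "run_experiment", "inspect_logs", or "apply_fix".
    if "error" in observation:
        return "inspect_logs"
    if "traceback" in observation:
        return "apply_fix"
    return "finish"

def react_loop(initial_observation, tools, max_steps=5):
    observation = initial_observation
    for _ in range(max_steps):
        action = choose_action(observation)       # reason about what to do next
        if action == "finish":
            return observation
        observation = tools[action](observation)  # act, then observe the result
    return observation

# Toy tools: inspecting logs reveals a traceback; applying a fix resolves it.
tools = {
    "inspect_logs": lambda obs: "traceback: shape mismatch",
    "apply_fix": lambda obs: "all tests pass",
}

print(react_loop("error: training crashed", tools))  # all tests pass
```

Even in this caricature, the structural difference from a fixed single-turn prompt is visible: the loop keeps the error context alive across steps, so a failed action becomes input to the next decision rather than a terminal state.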

Proving the approach works
The researchers evaluated AIRA2 on MLE-bench-30, a collection of 30 Kaggle machine learning competitions ranging from computer vision to natural language processing.
The system continued improving, reaching 76.0% at 72 hours, while previous systems typically degraded with extended runtime, like marathon runners who forgot to train.
The ablation studies revealed crucial insights:
Removing the parallel compute capability dropped performance by over 12 percentile points at 72 hours.
Without the hidden evaluation protocol, performance plateaued after 24 hours and showed no improvement with additional compute (a very expensive way to stand still).
The ReAct agents proved especially valuable early in the search, providing a 5.5 percentile point boost at 3 hours by enabling more efficient exploration.
Perhaps most revealing was the finding about overfitting:
By implementing consistent evaluation, the researchers discovered that the performance degradation seen in prior work wasn’t due to data memorization at all.
Instead, it stemmed from evaluation noise and metric gaming. Once these sources of instability were controlled, agent performance improved monotonically with additional compute (finally behaving the way everyone had hoped it would in the first place).

Real breakthroughs in action
Beyond the numbers, AIRA2 demonstrated moments of genuine scientific reasoning.
In one case, rather than discarding an underperforming approach, the agent inspected the logs, correctly diagnosed under-fitting, scaled up the model parameters, extended training time, and achieved a gold medal score.
Not bad for something that doesn’t need coffee breaks.
Similar breakthroughs occurred on other challenging tasks. On a text completion challenge, AIRA2 decomposed the problem into two learned subtasks, training separate models for detecting missing word positions and filling gaps.
On a fine-grained image classification task with 3,474 classes, it achieved the highest score among all evaluated agents by carefully ensembling multiple vision models with asymmetric loss functions, no small feat, even by human standards.
The path forward for AI-driven research
AIRA2 represents more than incremental progress.
By treating AI research as a distributed systems problem rather than just a reasoning challenge, it demonstrates that the key to scaling AI agents lies in addressing fundamental engineering bottlenecks.
The system’s ability to maintain consistent improvement over 72 hours of compute suggests we’re moving closer to agents that can conduct genuine, sustained scientific investigation, without quietly falling apart halfway through.
The implications extend beyond benchmark performance.
As these systems mature, they could accelerate discovery across fields from drug development to materials science.
However, challenges remain.
The researchers acknowledge that distinguishing genuine reasoning from sophisticated pattern matching remains difficult, especially given potential contamination from publicly available solutions in training data.
With careful engineering to address compute efficiency, evaluation reliability, and operator flexibility, we can build systems that don’t just automate routine tasks but engage in the messy, iterative process of scientific discovery.
The gap between human and AI researchers continues to narrow, one bottleneck at a time.

5 lessons we can learn from Sora: Hype vs reality
For a brief moment, Sora seemed like the future of AI video generation. Then, almost as quickly as it appeared, it quietly disappeared.
Sora’s rise and disappearance offer a rare glimpse into the practical realities of developing cutting-edge AI. For AI leaders, engineers, and decision-makers, it provides a real-world view of what it takes to build scalable, commercially viable AI products.
These lessons are essential for anyone hoping to turn AI research into lasting impact (without losing their sanity along the way).
1. Compute costs can limit even the most advanced AI models
Sora pushed the boundaries of multimodal AI, generating high-quality video from simple text prompts. The results were impressive, showing what AI can do when it combines natural language understanding with visual synthesis.
Behind the shiny demos, however, economics told a different story…
Video generation consumes far more computational resources than text or image generation.
Each video requires multiple GPU passes, massive memory bandwidth, and precise rendering pipelines. Running Sora at scale required significant GPU infrastructure, which made operating costs extremely high.
For organizations investing in AI infrastructure, the lesson is clear:
If your AI model’s scalability relies on high compute costs, innovation alone will not guarantee success. Even the fanciest AI can’t survive on wishful thinking.
2. Viral AI products may not create lasting value
Sora captured immediate attention as a breakthrough in AI content generation, with early adoption surging thanks to curiosity and experimentation.
But engagement dropped quickly: novelty does not equal necessity.
While Sora impressed users with creative demos, it struggled to offer repeatable value for daily use. Tools integrated into professional workflows, such as AI copilots, automation platforms, or enterprise AI solutions, provide consistent value.
- Build for retention, not just reach
- Prioritize workflow integration over wow-factor
The most successful AI products balance novelty with practicality, offering value that users return to day after day. Think of it as the difference between a fleeting TikTok trend and a tool you actually rely on at work.
3. Monetization strategies must be clear from day one
Sora also highlighted the challenges of monetizing cutting-edge AI technology. Its positioning in the AI business model landscape was unclear:
- Too expensive for mass free usage
- Too entertainment-focused for enterprise budgets
- Too early for a well-defined pricing strategy
While Sora generated excitement, companies struggled to find a path to revenue. The market rewards AI applications where ROI is measurable, including:
- AI for productivity
- AI for software development
- AI for operational efficiency
These areas are experiencing accelerating enterprise AI adoption. Clear monetization strategies (subscription, usage-based, or enterprise licensing) turn AI innovation into sustainable products. In short: hype gets attention, but cash keeps the lights on.
4. Trust, IP, and governance are central concerns
Like many generative AI systems, Sora raised urgent questions about:
- Copyright and intellectual property
- Deepfake risks and synthetic media misuse
- Ownership of AI-generated content
For companies deploying AI at scale, these issues are critical. Organizations must establish strong governance frameworks, compliance strategies, and ethical guidelines.
5. Focus and resource allocation determine AI winners
Sora demonstrates the importance of focus and strategic resource allocation. OpenAI shifted its resources from Sora toward higher-impact areas, including:
- Enterprise AI tools
- AI coding assistants
- Agent-based systems
In a world of limited compute, talent, and capital, every AI initiative competes for attention and investment. Success is determined by strategic prioritization.
The most effective AI strategy is to focus on initiatives that scale.
This requires leadership teams to make careful choices, balancing short-term excitement with long-term impact. Scaling AI involves building products that deliver sustained value.
Conclusion: From hype to execution
Sora illustrates a broader shift in the AI landscape. We are moving from:
- Experimental innovation to scalable AI systems
- Eye-catching demos to production-grade AI applications
- Hype-driven narratives to ROI-driven decision-making
The future of AI rewards teams that combine technical excellence with practical deployment. Successful AI products deliver consistent, measurable value while navigating the constraints of cost, infrastructure, and trust.
Sora shows that while hype opens doors, execution defines winners. Today’s AI professionals must focus on building products that actually work in the real world, and maybe have a little fun along the way…

Fighting financial crime with hybrid AI
I’ve been in the data game long enough to see plenty of AI projects crash and burn.
I started my career building data warehouses for telcos and banks, then moved into machine learning consulting, where I led hundreds of projects across industries. Now I’m leading data analytics and machine learning at Phenom, and I want to share something we recently built that actually works.
Let me be clear about what I mean when I say “Gen AI” here. I’m talking about LLMs and the tools built on top of them. The “old school ML” I’ll reference means those low-complexity supervised models we’ve been using for years, the ones that are fast, cheap, and reliable by nature.

The reality of building AI in fintech
Phenom provides banking solutions for SMEs across Europe, but at our core, we’re a B2B fintech scale-up. Each of these words carries weight.
Being B2B means every single client counts. We can’t mess around with client communications or operations. Everything that touches our clients needs to meet a certain standard, no exceptions.
Being a fintech means we love technology, sure, but we’re also bound by regulations. The Financial Crimes Enforcement Network doesn’t care how innovative your solution is if it doesn’t meet compliance standards.
And being a scale-up? That means we can’t afford AI theater. We have some budget for innovation and experimentation, but every investment needs to demonstrate real efficiency gains and positive ROI.
These constraints shaped our entire approach to AI and machine learning at Phenom. We’ve established two fundamental pillars that guide everything we build.
- First, we successfully convinced leadership (all the way up to the board) that while AI is nice, having a solid data foundation and platform is even better. When you’re dealing with regulatory reporting or enabling better tactical and strategic business decisions, that foundation matters more than any flashy AI feature.
- Second, we developed clear ground rules for when to use which technology. When we need stability and structured signals, we reach for traditional machine learning first. When we’re dealing with messy input data like customer reviews or unstructured text, we consider generative AI.
High-risk scenarios involving financial crime, regulations, or customer care always get hybrid solutions with humans in the loop. Low-risk internal use cases? That’s where we let AI shine and can afford the occasional mistake.
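Those ground rules boil down to a small routing decision. The sketch below is illustrative only; Phenom's actual criteria and systems are not public, and every name here is an assumption.

```python
# A sketch of the routing rules described above (illustrative; not
# Phenom's actual implementation). Structured, high-stakes work goes to
# traditional ML with human review; messy, low-risk input goes to an LLM.
def pick_approach(structured_input: bool, high_risk: bool) -> str:
    if high_risk:
        # Financial crime, regulation, customer care: hybrid + human review.
        return "hybrid (ML + LLM) with human in the loop"
    if structured_input:
        # Stable, structured signals: fast, cheap, reliable supervised models.
        return "traditional supervised ML"
    # Messy unstructured text (e.g. customer reviews), low stakes.
    return "generative AI (LLM)"

print(pick_approach(structured_input=True, high_risk=True))
# hybrid (ML + LLM) with human in the loop
```

The value of writing the rules down, even this crudely, is that every new use case gets triaged the same way instead of defaulting to whichever technology is currently fashionable.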

Journalist Sues FAA Over Drone No Fly Zone Designed to Prevent Filming ICE

Minnesota photojournalist Rob Levine and the Reporters Committee for Freedom of the Press are suing the Federal Aviation Administration over a recently issued restriction that prevents drones from flying within 3,000 feet of Department of Homeland Security buildings and vehicles, an amorphous no-fly zone that encompasses Immigration and Customs Enforcement agents.
The FAA issued the temporary flight restriction (TFR) in January as ICE agents flooded the streets of Minneapolis. The rule established a no fly zone of 3,000 feet around “Department of Homeland Security facilities and mobile assets,” a restriction that Levine and his lawyers argue is impossible to follow and is aimed at curtailing the First Amendment rights of journalists.
“Because there is no means of verifying in advance whether DHS vehicles—such as unmarked cars driven by Immigration and Customs Enforcement agents—are operating in a given location, the practical consequence is that drone pilots nationwide cannot know whether a flight will expose them to liability,” Levine’s lawyers argued in a court document.
Levine lives in Minneapolis and spent the early days of Operation Metro Surge using his drone to capture footage of protests and ICE agents. Then the TFR hit. “It sent a shiver down my spine,” he told 404 Media. “I’m like ‘Oh my god.’ In a city like Minneapolis at the time with, I don’t know, three or four thousand DHS agents in various stages of uniform or undercoverness or civilian cars that they had switched license plates on? Masquerading as delivery men? They were everywhere here. I immediately grounded myself because there was no way you could know in advance whether or not you were violating that [flight restriction]. And when you’re flying they could drive by and you might not even know it.”
Grayson Clary, a lawyer with Reporters Committee for Freedom of the Press who is representing Levine, told 404 Media that the FAA has previously used flight restrictions in ways that seem designed to prevent newsgathering. “The FAA has a long history of imposing these temporary flight restrictions over newsworthy events in ways that frustrate journalists’ ability to cover protests, law enforcement’s response to protests, you name it, and this is sort of the newest escalation in that story,” he said.
This new no fly zone is a modification of an old TFR from 2025 that restricted drone pilots from operating within 3,000 feet of Department of Defense and Department of Energy bases.
“When you think about the old restriction, it’s essentially don’t fly within 3,000 feet of an enormous Naval vessel or a Department of Energy convoy that’s ferrying nuclear weapons around,” Clary said. “They just sort of added DHS to the end of that without taking stock of just how much more difficult it is to know whether you’re within 3,000 feet of a DHS ground vehicle as opposed to within 3,000 feet of a destroyer sitting in a Naval base.”
DHS isn’t forthcoming about the number of ICE agents in a given city or where they are operating. They often wear plainclothes, patrol cities in unmarked vehicles, and don’t announce themselves to people in the neighborhoods they patrol. Clary and Levine argued that the secretive nature of DHS has made it impossible for journalists to comply with the FAA’s no fly zone.
The penalties for violating the FAA restriction are severe. “They can take your drone and destroy it. They could shoot it down if they wanted to. They can arrest you and throw you in jail…and they can also make it so you can never fly a drone again,” Levine said. “It seems purely to prevent photo journalism and to chill photo journalists because the rule is so vague they could even charge you after the fact if they determined that you were somewhere and they had been near there.” The FAA has a history of trying to enforce drone restrictions against operators after the fact, based on footage or images posted on YouTube or social media sites.
Clary agreed. “That’s part of what makes this such a First Amendment problem is that it has a real chilling effect. When you don’t know where exactly the line is, you’re going to play it more carefully to make sure that you don’t accidentally cross it,” he said.
Levine has fought the FAA before on this issue and won. In 2016, just as he was first learning how to pilot drones for his photojournalism work, he traveled to North Dakota to cover the anti-oil pipeline protests at Standing Rock. At the time, the FAA had issued a TFR over the area but Levine was able to push the agency into granting him a waiver on First Amendment grounds.
DHS operates its own drones to aid its surveillance efforts. Last year it flew Predator drones above protests in Los Angeles, and Minneapolis residents have captured extensive footage of drones flying above homes in Minnesota.
Artemis II Astronauts Have ‘Two Microsoft Outlooks’ and Neither Work

In 1969, the three astronauts of the Apollo 10 mission conducted a momentous “dress rehearsal” for putting humans on the lunar surface for the first time. It was a historic, inspiring moment for humanity; Astronaut John Young watched from a command module spacecraft as Thomas Stafford and Gene Cernan broke away and flew a lunar module within 10 miles of the moon’s surface, then reunited to return home to Earth. It’s from this mission that we have one of the most powerful transcripts in NASA history:
“Who did what?” Young asked. “Where did that come from?” Cernan added.
“Give me a napkin quick,” Stafford said. “There’s a turd floating through the air.”
The provenance of the poop remains one of the great mysteries of spaceflight. Today, in the early Earth-morning hours of the Artemis II astronauts’ history-mirroring mission around the moon, we have another: Why is Microsoft Outlook not working in space?
A Secure Chat App’s Encryption Is So Bad It Is ‘Meaningless’

TeleGuard, an app that markets itself as a secure, end-to-end encrypted messaging platform which has been downloaded more than a million times, implements its encryption so poorly that an attacker can trivially access a user’s private key and decrypt their messages, multiple security researchers told 404 Media. TeleGuard also uploads users’ private keys to a company server, meaning TeleGuard itself could decrypt its users’ messages, and the key can also at least partially be derived from simply intercepting a user’s traffic, the researchers found.
The news highlights something of the wild west of encrypted messaging apps, where not all are created equal.
“No storage of data. Highly encrypted. Swiss made,” the website for TeleGuard reads. The site also says, “The chats as well as voice and video calls are end-to-end encrypted.”







