MIT researchers use AI to uncover atomic defects in materials

In biology, defects are generally bad. But in materials science, defects can be intentionally tuned to give materials useful new properties. Today, atomic-scale defects are carefully introduced during the manufacturing process of products like steel, semiconductors, and solar cells to help improve strength, control electrical conductivity, optimize performance, and more.

But even as defects have become a powerful tool, accurately measuring different types of defects and their concentrations in finished products has been challenging, especially without cutting open or damaging the final material. Without knowing what defects are in their materials, engineers risk making products that perform poorly or have unintended properties.

Now, MIT researchers have built an AI model capable of classifying and quantifying certain defects using data from a noninvasive neutron-scattering technique. The model, which was trained on 2,000 different semiconductor materials, can detect up to six kinds of point defects in a material simultaneously, something that would be impossible using conventional techniques alone.

“Existing techniques can’t accurately characterize defects in a universal and quantitative way without destroying the material,” says lead author Mouyang Cheng, a PhD candidate in the Department of Materials Science and Engineering. “For conventional techniques without machine learning, detecting six different defects is unthinkable. It’s something you can’t do any other way.”

The researchers say the model is a step toward harnessing defects more precisely in products like semiconductors, microelectronics, solar cells, and battery materials.

“Right now, detecting defects is like the saying about seeing an elephant: Each technique can only see part of it,” says senior author and associate professor of nuclear science and engineering Mingda Li. “Some see the nose, others the trunk or ears. But it is extremely hard to see the full elephant. We need better ways of getting the full picture of defects, because we have to understand them to make materials more useful.”

Joining Cheng and Li on the paper are postdoc Chu-Liang Fu, undergraduate researcher Bowen Yu, master’s student Eunbi Rha, PhD student Abhijatmedhi Chotrattanapituk ’21, and Oak Ridge National Laboratory staff members Douglas L. Abernathy PhD ’93 and Yongqiang Cheng. The paper appears today in the journal Matter.

Detecting defects

Manufacturers have gotten good at tuning defects in their materials, but measuring precise quantities of defects in finished products is still largely a guessing game.

“Engineers have many ways to introduce defects, like through doping, but they still struggle with basic questions like what kind of defect they’ve created and in what concentration,” Fu says. “Sometimes they also have unwanted defects, like oxidation. They don’t always know if they introduced some unwanted defects or impurity during synthesis. It’s a longstanding challenge.”

The result is that there are often multiple defects in each material. Unfortunately, each method for understanding defects has its limits. Techniques like X-ray diffraction and positron annihilation characterize only some types of defects. Raman spectroscopy can discern the type of defect but can’t directly infer the concentration. Another technique, transmission electron microscopy, requires cutting thin slices from samples for imaging.

In a few previous papers, Li and collaborators applied machine learning to experimental spectroscopy data to characterize crystalline materials. For the new paper, they wanted to apply that technique to defects.

For their experiment, the researchers built a computational database of 2,000 semiconductor materials. They made sample pairs of each material, with one doped for defects and one left without defects, then used a neutron-scattering technique that measures the different vibrational frequencies of atoms in solid materials. They trained a machine-learning model on the results.

“That built a foundational model that covers 56 elements in the periodic table,” Cheng says. “The model leverages the multihead attention mechanism, just like what ChatGPT is using. It similarly extracts the difference in the data between materials with and without defects and outputs a prediction of what dopants were used and in what concentrations.”

The researchers fine-tuned their model, verified it on experimental data, and showed it could measure defect concentrations in an alloy commonly used in electronics and in a separate superconductor material.

The researchers also doped the materials multiple times to introduce multiple point defects and test the limits of the model, ultimately finding it can make predictions about up to six defects in materials simultaneously, with defect concentrations as low as 0.2 percent.

“We were really surprised it worked that well,” Cheng says. “It’s very challenging to decode the mixed signals from two different types of defects — let alone six.”

A model approach

Typically, manufacturers of things like semiconductors run invasive tests on a small percentage of products as they come off the manufacturing line, a slow process that limits their ability to detect every defect.

“Right now, people largely estimate the quantities of defects in their materials,” Yu says. “It is a painstaking experience to check the estimates by using each individual technique, which only offers local information in a single grain anyway. It creates misunderstandings about what defects people think they have in their material.”

The results were exciting for the researchers, but they note that their technique, which measures vibrational frequencies with neutrons, would be difficult for companies to deploy quickly in their own quality-control processes.

“This method is very powerful, but its availability is limited,” Rha says. “Vibrational spectra is a simple idea, but in certain setups it’s very complicated. There are some simpler experimental setups based on other approaches, like Raman spectroscopy, that could be more quickly adopted.”

Li says companies have already expressed interest in the approach and asked when it will work with Raman spectroscopy, a widely used technique that measures the scattering of light. Li says the researchers’ next step is training a similar model based on Raman spectroscopy data. They also plan to expand their approach to detect features that are larger than point defects, like grains and dislocations.

For now, though, the researchers believe their study demonstrates the inherent advantage of AI techniques for interpreting defect data.

“To the human eye, these defect signals would look essentially the same,” Li says. “But the pattern recognition of AI is good enough to discern different signals and get to the ground truth. Defects are this double-edged sword. There are many good defects, but if there are too many, performance can degrade. This opens up a new paradigm in defect science.”

The work was supported, in part, by the Department of Energy and the National Science Foundation.


How AIRA2 breaks AI research bottlenecks


The promise of AI agents that can conduct genuine scientific research has long captivated the machine learning community, and, let’s be honest, slightly haunted it too. 

A new system called AIRA2, developed by researchers at Meta’s FAIR lab and collaborating institutions, represents a significant leap forward in this quest…

The three walls holding back AI research (and the hidden bottlenecks within them)

Previous attempts at building AI research agents keep hitting the same ceilings. The team behind AIRA2 identified key bottlenecks that limit progress, no matter how much compute is thrown at the problem.

  • Limited compute throughput: Most agents run synchronously on a single GPU, sitting idle while experiments complete. This drastically slows iteration and caps exploration.
  • Too few experiments per day: Because of this bottleneck, agents can only test ~10–20 candidates daily, far too few to meaningfully search a massive solution space.
  • The generalization gap: Instead of improving over time, agents often get worse, chasing short-term gains that don’t hold up.
  • Metric gaming and evaluation noise: Agents exploit flaws in their own evaluation, benefiting from lucky data splits or unnoticed bugs that distort results.
  • Rigid, single-turn prompts: Predefined actions like “write code” or “debug” break down in complex scenarios, leaving agents stuck when tasks become multi-step or unpredictable.

Engineering solutions for each bottleneck

AIRA2 addresses each bottleneck through specific architectural innovations.

To solve the compute problem, the system uses an asynchronous multi-GPU worker pool. Think of it as having eight hands instead of one; suddenly, multitasking becomes less of a fantasy. 

While one worker trains a model on its dedicated GPU, the orchestrator dispatches new experiments to others, compressing days of sequential work into hours.
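
To make the orchestration idea concrete, here is a minimal Python sketch of that kind of asynchronous worker pool. It is an illustration under stated assumptions, not AIRA2’s code: the train.py script, the run_experiment helper, and the candidate configurations are hypothetical stand-ins.

```python
# Minimal sketch of an asynchronous multi-GPU worker pool (illustrative only):
# train.py, run_experiment, and the candidate configs are hypothetical stand-ins,
# not AIRA2's actual interfaces.
import os
import queue
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

NUM_GPUS = 8
free_gpus = queue.Queue()
for gpu_id in range(NUM_GPUS):
    free_gpus.put(gpu_id)

# Hypothetical candidate experiments produced by the search.
candidate_configs = [{"lr": 3e-4 * (0.5 ** i)} for i in range(16)]

def run_experiment(config):
    """Grab a free GPU, run one training job pinned to it, then hand the GPU back."""
    gpu_id = free_gpus.get()  # blocks until a GPU frees up
    try:
        proc = subprocess.run(
            ["python", "train.py", f"--lr={config['lr']}"],
            env={**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu_id)},
            capture_output=True, text=True,
        )
        return config, proc.returncode
    finally:
        free_gpus.put(gpu_id)  # release the GPU for the next experiment

# The orchestrator keeps all eight GPUs busy: as soon as one experiment ends,
# the next candidate is dispatched instead of waiting for a full batch.
with ThreadPoolExecutor(max_workers=NUM_GPUS) as pool:
    futures = [pool.submit(run_experiment, cfg) for cfg in candidate_configs]
    for fut in as_completed(futures):
        config, exit_code = fut.result()
        print(config, "finished with exit code", exit_code)
```

The key design choice is the queue of free GPU IDs: experiments of different lengths never block one another, so the pool stays saturated.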

For the generalization gap, AIRA2 implements a Hidden Consistent Evaluation (HCE) protocol. 

The system splits data into three sets:

  • Training data the agent can see
  • A hidden search set for evaluating candidates
  • A validation set used only for final selection
Crucially, the agent never sees the labels for the search or validation sets, preventing it from gaming the metrics or getting too clever for its own good. All evaluation happens externally in isolated containers, with fixed data splits throughout the search.
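
As a rough illustration of what such a split-and-hide setup could look like, the sketch below uses a synthetic dataset and scikit-learn helpers as stand-ins; the split sizes, the agent_view and score_candidate helpers, and the classifier are assumptions rather than the paper’s exact protocol.

```python
# Minimal sketch of a hidden, consistent evaluation setup on a synthetic dataset.
# Split sizes, helper names, and the classifier are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in for real competition data.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Fixed splits, created once and reused for the entire search, so every candidate
# is judged on exactly the same data.
X_train, X_hidden, y_train, y_hidden = train_test_split(X, y, test_size=0.4, random_state=0)
X_search, X_valid, y_search, y_valid = train_test_split(X_hidden, y_hidden, test_size=0.5, random_state=0)

def agent_view():
    """The agent only ever receives labeled training data and unlabeled search inputs."""
    return X_train, y_train, X_search

def score_candidate(model):
    """Runs externally, outside the agent's sandbox, so labels stay hidden from it."""
    search_score = accuracy_score(y_search, model.predict(X_search))  # drives the search
    valid_score = accuracy_score(y_valid, model.predict(X_valid))     # used once, for final selection
    return search_score, valid_score

# Toy usage: the "agent" trains a model on what it can see; the orchestrator scores it.
X_tr, y_tr, _ = agent_view()
candidate = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(score_candidate(candidate))
```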

To overcome static operator limitations, AIRA2 replaces fixed prompts with ReAct agents that can reason and act autonomously. 

These sub-agents can:

  • Perform exploratory data analysis
  • Run quick experiments
  • Inspect error logs
  • Iteratively debug issues

Instead of failing when encountering an unexpected error, they can investigate, hypothesize, and try multiple fixes within the same session, more like a determined researcher, less like a script that gives up after one exception.
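
The sketch below shows the general shape of such a ReAct-style loop; the call_llm stub, the single run_python tool, and the stopping rule are placeholders rather than AIRA2’s actual operators.

```python
# Bare-bones sketch of a ReAct-style operator loop. The call_llm stub, the single
# run_python tool, and the stopping rule are placeholders, not AIRA2's actual agents.
import contextlib
import io

def run_python(code: str) -> str:
    """Tool: execute a snippet and capture its output (a real system would sandbox this)."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})
    except Exception as exc:  # surface errors so the agent can inspect and debug them
        return f"ERROR: {exc!r}"
    return buf.getvalue() or "(no output)"

TOOLS = {"run_python": run_python}

def call_llm(history):
    """Placeholder for the reasoning model; a real agent would send `history` to an LLM."""
    return {"thought": "Check that the data pipeline runs.",
            "action": "run_python",
            "input": "print(2 + 2)"}

def react_loop(task, max_steps=5):
    """Alternate thought -> action -> observation until the agent decides to stop."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        step = call_llm(history)
        observation = TOOLS[step["action"]](step["input"])
        history += [f"Thought: {step['thought']}",
                    f"Action: {step['action']}({step['input']!r})",
                    f"Observation: {observation}"]
        if not observation.startswith("ERROR"):  # a real agent reasons about when to stop
            break
    return history

for line in react_loop("verify the training data loads"):
    print(line)
```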


Proving the approach works

The researchers evaluated AIRA2 on MLE-bench-30, a collection of 30 Kaggle machine learning competitions ranging from computer vision to natural language processing.

Using 8 NVIDIA H200 GPUs and Google’s Gemini 3.0 Pro model, AIRA2 achieved a mean percentile rank of 71.8% at 24 hours, surpassing the previous best of 69.9%.

More impressively, it continued improving to 76.0% at 72 hours, while previous systems typically degraded with extended runtime, like marathon runners who forgot to train.
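
For readers unfamiliar with the metric, here is one plausible way a mean percentile rank could be computed across competitions; the leaderboard scores below are invented purely for illustration.

```python
# One plausible reading of "mean percentile rank": the fraction of leaderboard
# entries the agent matches or beats, averaged across competitions. The scores
# below are invented purely for illustration.
import numpy as np

def percentile_rank(agent_score, leaderboard):
    """Percentage of leaderboard entries the agent's score ties or beats (higher is better)."""
    return float(np.mean(np.asarray(leaderboard) <= agent_score)) * 100.0

# Hypothetical leaderboards for three competitions (higher score = better).
leaderboards = [[0.61, 0.70, 0.74, 0.80],
                [0.10, 0.35, 0.55],
                [0.90, 0.91, 0.95, 0.97, 0.99]]
agent_scores = [0.78, 0.50, 0.96]

mean_rank = np.mean([percentile_rank(s, lb) for s, lb in zip(agent_scores, leaderboards)])
print(f"mean percentile rank: {mean_rank:.1f}%")
```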

The ablation studies revealed crucial insights

Removing the parallel compute capability dropped performance by over 12 percentile points at 72 hours.

Without the hidden evaluation protocol, performance plateaued after 24 hours and showed no improvement with additional compute (a very expensive way to stand still).

The ReAct agents proved especially valuable early in the search, providing a 5.5 percentile point boost at 3 hours by enabling more efficient exploration.

Perhaps most revealing was the finding about overfitting

By implementing consistent evaluation, the researchers discovered that the performance degradation seen in prior work wasn’t due to data memorization at all.

Instead, it stemmed from evaluation noise and metric gaming. Once these sources of instability were controlled, agent performance improved monotonically with additional compute (finally behaving the way everyone had hoped it would in the first place).


Real breakthroughs in action

Beyond the numbers, AIRA2 demonstrated moments of genuine scientific reasoning.

On a molecular prediction task where all other agents failed to achieve any medal, AIRA2 noticed that a poorly performing model was training suspiciously fast, a red flag in machine learning if there ever was one.

Rather than discarding the approach, the agent inspected the logs, correctly diagnosed under-fitting, scaled up the model parameters, extended training time, and achieved a gold medal score.

Not bad for something that doesn’t need coffee breaks.

Similar breakthroughs occurred on other challenging tasks. On a text completion challenge, AIRA2 decomposed the problem into two learned subtasks, training separate models for detecting missing word positions and filling gaps.

On a fine-grained image classification task with 3,474 classes, it achieved the highest score among all evaluated agents by carefully ensembling multiple vision models with asymmetric loss functions, no small feat, even by human standards.
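
For a rough sense of what an asymmetric loss looks like in this kind of many-class setting, here is a short PyTorch sketch of a focal-style asymmetric loss; the hyperparameters and shapes are assumptions, not the agent’s actual configuration.

```python
# Short PyTorch sketch of a focal-style asymmetric loss for multi-label output,
# the general idea behind "asymmetric loss functions". Hyperparameters and shapes
# are assumptions, not the agent's actual configuration.
import torch

def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0, clip=0.05, eps=1e-8):
    """Down-weights easy negatives far more aggressively than positives."""
    p = torch.sigmoid(logits)
    p_neg = (p - clip).clamp(min=0)  # probability shifting: ignore very easy negatives
    loss_pos = targets * (1 - p).pow(gamma_pos) * torch.log(p.clamp(min=eps))
    loss_neg = (1 - targets) * p_neg.pow(gamma_neg) * torch.log((1 - p_neg).clamp(min=eps))
    return -(loss_pos + loss_neg).mean()

# Toy usage: a batch of 4 samples over 3,474 classes, one positive label each.
logits = torch.randn(4, 3474)
targets = torch.zeros(4, 3474)
targets[torch.arange(4), torch.randint(0, 3474, (4,))] = 1.0
print(asymmetric_loss(logits, targets).item())
```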


The path forward for AI-driven research

AIRA2 represents more than incremental progress.

By treating AI research as a distributed systems problem rather than just a reasoning challenge, it demonstrates that the key to scaling AI agents lies in addressing fundamental engineering bottlenecks.

The system’s ability to maintain consistent improvement over 72 hours of compute suggests we’re moving closer to agents that can conduct genuine, sustained scientific investigation, without quietly falling apart halfway through.

The implications extend beyond benchmark performance

As these systems mature, they could accelerate discovery across fields from drug development to materials science.

However, challenges remain.

The researchers acknowledge that distinguishing genuine reasoning from sophisticated pattern matching remains difficult, especially given potential contamination from publicly available solutions in training data.

What AIRA2 proves definitively is that the barriers to effective AI research agents aren’t insurmountable.

With careful engineering to address compute efficiency, evaluation reliability, and operator flexibility, we can build systems that don’t just automate routine tasks but engage in the messy, iterative process of scientific discovery.

The gap between human and AI researchers continues to narrow, one bottleneck at a time.


5 lessons we can learn from Sora: Hype vs reality


For a brief moment, Sora seemed like the future of AI video generation. Then, almost as quickly as it appeared, it quietly disappeared.

Sora’s rise and disappearance offer a rare glimpse into the practical realities of developing cutting-edge AI. For AI leaders, engineers, and decision-makers, it provides a real-world view of what it takes to build scalable, commercially viable AI products. 

These lessons are essential for anyone hoping to turn AI research into lasting impact (without losing their sanity along the way).


1. Compute costs can limit even the most advanced AI models

Sora pushed the boundaries of multimodal AI, generating high-quality video from simple text prompts. The results were impressive, showing what AI can do when it combines natural language understanding with visual synthesis. 

Behind the shiny demos, however, economics told a different story…

Video generation consumes far more computational resources than text or image generation. 

Each video requires multiple GPU passes, massive memory bandwidth, and precise rendering pipelines. Running Sora at scale required significant GPU infrastructure, which made operating costs extremely high.

For organizations investing in AI infrastructure, the lesson is clear:

If your AI model’s scalability relies on high compute costs, innovation alone will not guarantee success. Even the fanciest AI can’t survive on wishful thinking.


2. Viral AI products don’t always create lasting value

Sora captured immediate attention as a breakthrough in AI content generation, with early adoption surging thanks to curiosity and experimentation.

Then engagement dropped quickly; novelty does not equal necessity.

While Sora impressed users with creative demos, it struggled to offer repeatable value for daily use. By contrast, tools integrated into professional workflows, such as AI copilots, automation platforms, or enterprise AI solutions, provide consistent value.

For product teams, the takeaway is straightforward: building viral demos is exciting, but retention drives long-term success. Products must solve recurring problems or integrate seamlessly into user workflows.
  • Build for retention, not just reach
  • Prioritize workflow integration over wow-factor

The most successful AI products balance novelty with practicality, offering value that users return to day after day. Think of it as the difference between a fleeting TikTok trend and a tool you actually rely on at work.


3. Monetization strategies must be clear from day one

Sora also highlighted the challenges of monetizing cutting-edge AI technology. Its positioning in the AI business model landscape was unclear:

  • Too expensive for mass free usage
  • Too entertainment-focused for enterprise budgets
  • Too early for a well-defined pricing strategy

While Sora generated excitement, companies struggled to find a path to revenue. The market rewards AI applications where ROI is measurable, including:

  • AI for productivity
  • AI for software development
  • AI for operational efficiency

These areas are experiencing accelerating enterprise AI adoption. Clear monetization strategies (subscription, usage-based, or enterprise licensing) turn AI innovation into sustainable products. In short: hype gets attention, but cash keeps the lights on.


4. Trust, IP, and governance are central concerns

Like many generative AI systems, Sora raised urgent questions about:

  • Copyright and intellectual property
  • Deepfake risks and synthetic media misuse
  • Ownership of AI-generated content

For companies deploying AI at scale, these issues are critical. Organizations must establish strong governance frameworks, compliance strategies, and ethical guidelines. 

Trust is a core part of product design. Users and enterprises expect AI outputs to be compliant. Addressing governance can improve adoption and reduce legal or operational risks. Think of governance as the seatbelt of AI: you might be able to drive without it, but do you really want to test that theory?

5. Focus and resource allocation determine AI winners

Sora demonstrates the importance of focus and strategic resource allocation. OpenAI ultimately shifted its resources from Sora toward higher-impact areas.

In a world of limited compute, talent, and capital, every AI initiative competes for attention and investment. Success is determined by strategic prioritization.

The most effective AI strategy is to focus on initiatives that scale.

This requires leadership teams to make careful choices, balancing short-term excitement with long-term impact. Scaling AI involves building products that deliver sustained value.


Conclusion: From hype to execution

Sora illustrates a broader shift in the AI landscape. We are moving from:

  • Experimental innovation to scalable AI systems
  • Eye-catching demos to production-grade AI applications
  • Hype-driven narratives to ROI-driven decision-making

The future of AI rewards teams that combine technical excellence with practical deployment. Successful AI products deliver consistent, measurable value while navigating the constraints of cost, infrastructure, and trust.

Sora shows that while hype opens doors, execution defines winners. Today’s AI professionals must focus on building products that actually work in the real world, and maybe have a little fun along the way…


Fighting financial crime with hybrid AI


I’ve been in the data game long enough to see plenty of AI projects crash and burn. 

I started my career building data warehouses for telcos and banks, then moved into machine learning consulting, where I led hundreds of projects across industries. Now I’m leading data analytics and machine learning at Phenom, and I want to share something we recently built that actually works.

Let me be clear about what I mean when I say “Gen AI” here. I’m talking about LLMs and the tools built on top of them. The “old school ML” I’ll reference means those low-complexity supervised models we’ve been using for years, the ones that are fast, cheap, and reliable by nature.


The reality of building AI in fintech

Phenom provides banking solutions for SMEs across Europe, but at our core, we’re a B2B fintech scale-up. Each of these words carries weight.

Being B2B means every single client counts. We can’t mess around with client communications or operations. Everything that touches our clients needs to meet a certain standard, no exceptions.

Being a fintech means we love technology, sure, but we’re also bound by regulations. The Financial Crimes Enforcement Network doesn’t care how innovative your solution is if it doesn’t meet compliance standards.

And being a scale-up? That means we can’t afford AI theater. We have some budget for innovation and experimentation, but every investment needs to demonstrate real efficiency gains and positive ROI.

These constraints shaped our entire approach to AI and machine learning at Phenom. We’ve established two fundamental pillars that guide everything we build.

  • First, we successfully convinced leadership (all the way up to the board) that while AI is nice, having a solid data foundation and platform is even better. When you’re dealing with regulatory reporting or enabling better tactical and strategic business decisions, that foundation matters more than any flashy AI feature.
  • Second, we developed clear ground rules for when to use which technology. When we need stability and structured signals, we reach for traditional machine learning first. When we’re dealing with messy input data like customer reviews or unstructured text, we consider generative AI. 

High-risk scenarios involving financial crime, regulations, or customer care always get hybrid solutions with humans in the loop. Low-risk internal use cases? That’s where we let AI shine and can afford the occasional mistake.



Behind the Blog: Systems As Designed



This is Behind the Blog, where we share our behind-the-scenes thoughts about how a few of our top stories of the week came together. This week, we discuss crypto, journalists using AI, and a cool photo of Earth.

JOSEPH: I can’t talk about the story just yet, but recently I had to acquire some cryptocurrency quickly for research purposes. I was not anticipating quite how dramatically the world of cryptocurrency, and the process of acquiring it, has changed.

I first became aware of cryptocurrency, or more specifically Bitcoin, when I was an intern at VICE. Someone at my table (they put all the unpaid interns on a medium-sized table in the London office) was talking about it. They were pretty deep into it as I recall, and covered it a fair bit. I was then asked to work on a collaborative documentary between VICE, Raw, and the BBC about the Silk Road drug marketplace because I already knew more than most about message encryption. I then had to learn more about Bitcoin.