AI News - Precision Talent

admin
Jun, Thu, 2026

AI News

MIT affiliates win 2026 Hertz Foundation Fellowships

The Hertz Foundation announced that it awarded 2026 fellowships to three current MIT students as well as an incoming graduate student. They are: Annika Marschner, Alvin Q. Meng, Zachary S. Siegel, and Matthew Wanta.

The prestigious science and technology award provides each recipient with five years of financial support — a stipend and full tuition equivalent — which gives them an unusual measure of autonomy to pursue ground-breaking research in their graduate work.

“What particularly impresses me about this cohort is their fearlessness in taking on new challenges and advancing the frontiers of science,” says Philip Welkhoff, a Hertz Fellow and director of the malaria program at the Gates Foundation, who co-led the selection process. “Each has exhibited tremendous creativity, grit, and vision, and I cannot wait to see what each accomplishes with the freedom to innovate provided by the Hertz Fellowship.”

In addition to funding, fellows receive lifelong access to Hertz Foundation programs, including events, mentoring, and networking opportunities with the more than 1,300 fellows named since the fellowship was established in 1963. The connections forged among these individuals have sparked collaborative startups, research, and commercialization in a range of technology, science, and engineering fields. Hertz Fellows have contributed to breakthroughs in such areas as advanced medical therapies, global defense networks, and the James Webb Space Telescope.

This year’s MIT-affiliated recipients are among a total of 19 Hertz Fellows selected from across the United States.

Annika Marschner ’26 majored in mechanical engineering and will begin her PhD at MIT in the fall. Her undergraduate research centered on the development of novel technologies for both biointerfacing and bio-inspired systems, including a custom benchtop stereoscope-compatible incubator and extrusion-based desktop bioprinter for MIT’s Raman Lab, a light-based filamented bioprinting system for ETH Zürich’s Tissue Engineering and Biofabrication Lab, and large-scale hardware designs for robotic systems in MIT’s Biomimetic Robotics Lab. Marschner’s undergraduate thesis focused on improving the speed and dexterity of dynamic motions in bio-inspired robotic limbs. As a graduate student, she plans to continue her work on both hardware and control system design in biologically relevant settings, especially in the areas of assistive medical technology and surgical robotics.

Alvin Q. Meng is a doctoral student in inorganic chemistry focusing on understanding the fundamental interactions underlying chemical structure and reactivity. He is currently studying iron-sulfur clusters under the guidance of Professor Daniel L.M. Suess. Born in Tianjin, China, Meng immigrated to the United States at the age of 10. He received undergraduate degrees in chemistry and mathematics from the University of Virginia, where he worked in the research group of Professor W. Dean Harman. His research involved the synthesis and characterization of dihapto-coordinated tungsten complexes of cyclopentadiene, focusing on a class of unusual binuclear species containing a carbon–carbon bond linking two metal-bound five-membered rings.

Zachary S. Siegel is an electrical engineering and computer science graduate student pursuing a PhD in the Computer Science and Artificial Intelligence Laboratory, where he works at the intersection of robotics, cognitive science, and artificial intelligence. He graduated summa cum laude from Princeton University with a BSE in computer science and a minor in philosophy, receiving honors including Tau Beta Pi, Sigma Xi and the Outstanding Computer Science Independent Work Prize. His senior thesis, advised by Tom Griffiths and Jacob Andreas, investigated how humans infer the goals of others in open-ended, real-world environments. Siegel demonstrated how Bayesian inference serves as an accurate model of people’s goal predictions by comparing partial observations to a learned library of possible plans weighted by their prior likelihood. His doctoral research goal is to build machines that learn and reason more like people — systems that can learn from limited data and generalize to new situations by combining robot planning and Bayesian inference. Siegel is particularly interested in combinatorial generalization: the human capacity to compose known skills in novel ways to solve previously unseen problems without additional demonstrations. At MIT, he is advised by Leslie P. Kaelbling, Tomás Lozano-Pérez, and Joshua B. Tenenbaum.

Matthew Wanta is an incoming doctoral student who will begin studying operations research at MIT in the fall. He is a class of 2026 graduate of the United States Military Academy at West Point with a bachelor’s degree in computer science and mathematical sciences, both with honors. His work centered on machine learning for autonomous systems, integrating probabilistic modeling and computer vision into cooperative drone search and swarm control frameworks. In collaboration with DEVCOM Armaments Center, Wanta developed computer vision models for detecting energetic defects in artillery munitions, enabling rapid, nonintrusive quality control in defense manufacturing. His work with U.S. Special Operations Command and Army C5ISR organizations focused on autonomous aerial search and sensing, where he built simulation architectures for probabilistic target localization and multi-agent coordination. Wanta served as company commander for Bravo Company, 2nd Regiment; president of Upsilon Pi Epsilon; and vice president of Phi Kappa Phi. He is an Astronaut Scholar and Sapper School graduate, and was commissioned as an Army officer in the Cyber Corps.

admin
Jun, Thu, 2026

AI News

OpenAI to acquire Ona

OpenAI plans to acquire Ona to expand Codex with secure, persistent cloud environments, enabling long-running AI agents across enterprise workflows.

admin
Jun, Thu, 2026

AI News

Amazon Data Centers In Mississippi Have Already Raised Electricity Rates for Local Customers, Report Suggests

The AI industry has been pushing a narrative that the technology is a “black box” whose inner workings are so complex that they remain unknown even to the people making it. But another black box of AI is the underlying cost of the technology, and, specifically, what the AI boom is costing people who live near massive data centers. The data centers and energy plants that power large language models and other generative AI tools are subject to contracts cloaked in non-disclosure agreements and in many cases shielded from public scrutiny on the pretext that they contain competitive information.

A new report written by consultancy Synapse and commissioned by advocacy groups Earthjustice and Environmental Advocates Mississippi attempts to calculate the cost of three planned Amazon data centers to Entergy Mississippi customers, who share an energy utility with the centers. These hidden costs may offer a window into the broader burden borne by residents living near data centers around the country. The report estimates that residential customers of Entergy Mississippi, one of the state’s regional energy monopolies, have paid $38 million as of March 2026 for infrastructure and other costs related to data centers and will have paid $74 million by the end of the year.

The average Entergy Mississippi customer is now paying at least an extra $10.60 a month to finance the data centers, the report says. It amounts to a 7 percent bill increase at a time when gas prices, choked supply chains and cuts to federal benefits are already hurting Americans. Entergy customers do not see costs for data centers highlighted separately in their bills.

According to report author Ben Havumaki, this only represents the costs that Entergy Mississippi customers have paid so far, and bills will likely rise.

“We know as a matter of fact that Entergy has made far more investments in service of data centers already and that the total ... will be far in excess of that amount,” Havumaki told 404 Media.

The assessment was made by examining public dockets filed by Entergy Mississippi as well as the company’s Securities and Exchange Commission (SEC) filings. While Mississippi law makes a specific cost breakdown of energy bills difficult to uncover, the authors traced a line item used to specify costs of large load energy infrastructure to make their assessment.

In 2024, Amazon announced it was building two new data centers in Madison County and in 2025 announced plans for a data center in Warren County.

To power the data centers, Entergy announced three new gas-fired plants in 2025 in Greenville, Ridgeland, and Vicksburg, two of which are replacing existing gas plants, as well as two solar facilities for a total cost of nearly $4 billion.

Yolanda Daniel is a member of Environmental Advocates Mississippi, which helped commission the report and opposes the data center. Daniel says that the home she grew up in is steps from the proposed gas-powered plant Entergy is building in Ridgeland. Daniel, who spent 30 years out of the state before returning to the area last year, first learned about the power plant while driving down the road dividing Madison and Hinds counties, where she saw a sign notifying residents of a zoning board hearing. She said she and others helped pack the hearing in opposition.

“We named all the harms, all the studies, all the science,” Daniel says. While the Ridgeland zoning board initially voted down Entergy’s permit to examine the land, the Board of Aldermen went ahead with the plans anyway. Ridgeland Mayor Gene McGee said, “Nobody will even know it’s there, no pollution that sort of thing, and it’ll bring a lot of business to Ridgeland and Madison County,” according to the Magnolia Tribune.

Four homeowners associations, including one Daniel belongs to, filed an administrative complaint against the gas plant.

Entergy’s public messaging about the data centers focuses on the company using its newfound revenue from Amazon to make grid improvements that will lower customers’ bills in the long term. Haley Fisackerly, the company’s CEO, has argued that though energy bills are going up, they are going up at a slower pace than if the data centers were not built.

In a June 8 press release, Fisackerly touted the company’s previously announced “Superpower Mississippi” plan, which includes $300 million of grid improvements he says will save customers money by “improving reliability and reducing power outages through stronger materials, tree trimming measures and technology-driven distribution network upgrades.” He says the improvements are funded by Amazon and Avaio, which constructs data centers. Fisackerly says that this is in addition to $600 million in grid improvements the company already had planned.

The announcement assumes that Entergy would have replaced the two power plants regardless and makes hard-to-prove assertions about energy efficiency.

But the fact that Entergy Mississippi is already charging its customers for the construction of those energy plants is more straightforward, according to the Synapse report.

While Entergy Mississippi’s rate increases are typically restricted to 4 percent a year under state law, a 2024 law called SB2001 allows the company to raise rates in excess of that to fund the construction of energy plants that power data centers.

The fees show up in public dockets as an “interim facilities rate adjustment,” which is how Synapse reached its calculation of costs to residential customers. $8.7 million in fees associated with the Delta Blues Advanced Power Station were charged to residential customers, as were $46.7 million in costs related to data center projects whose specifics are unknown.

While in theory costs other than data center infrastructure could be present in this line item, “We see no evidence that that is occurring,” the author of the report, Ben Havumaki, told 404 Media. That’s because this line item was zeroed out before Entergy Mississippi began its data center energy buildout, he said.

Entergy Mississippi shares a parent company with Entergy Louisiana, which approved three new gas plants last year to power Meta’s data center in Richland Parish, Louisiana. Entergy Louisiana has now pitched an additional seven gas plants to serve Meta’s facility.

The report also takes issue with a March claim by Entergy that agreements with data centers will actually be saving customers in three states (Arkansas, Mississippi and Louisiana) $5 billion over the next two decades. Synapse says “it is possible that data centers could be offsetting some or all of their incremental costs through separate financial arrangements with Entergy,” but there is no way of confirming this because filings between Entergy and data center operators are kept confidential.

Mississippi is a uniquely difficult state in which to verify Entergy’s claims that customers’ bills are being subsidized by Amazon or other tech companies. SB2001 cloaks the public service commission’s review of energy contracts from public view, designating them “a trade secret” and exempting them from the state’s freedom of information law. The law limits the Mississippi Public Service Commission’s role in making sure that data centers are distributing their costs evenly to energy utility customers. It also exempts state agencies from competitive bidding requirements when courting data centers. This means “they can just put the shovel in the ground and start building themselves immediately without proving that they are the least costly option,” Havumaki says.

When Entergy says Amazon’s data center is saving customers money, “It’s basically [saying] trust us, we’ve done the math and know that it works out better for you,” Havumaki says. Havumaki also notes that infrastructure costs related to data centers have skyrocketed, so Amazon has an incentive to hide costs.

The 2024 law also makes it impossible for the public service commission to adjust how much Amazon pays for its energy bills later on.

According to the law, public utilities can enter into agreements with a large customer “without reference to the rates” set according to the state’s public utilities statute. SB2001 also says the utility can’t alter or edit the agreement between Entergy and the data center customer later on.

According to the report, this means “once the Entergy-[Amazon] contract sets a cost allocation, that allocation is locked in. The Commission cannot revisit it even if future rate proceedings reveal that it is unfair to other customers.”

While the commission can’t change rates that Amazon or other tech companies pay for energy, it still has the ability to stop charging residents for energy plant construction related to data centers. But Havumaki is skeptical this would happen.

“It’s highly unlikely that any commissioner would disallow recovery of any of these investments, because there is so much momentum behind this whole process,” he says.

When reached for comment about the Synapse report, a spokesperson for Entergy sent a statement saying that, “Entergy Mississippi customers are not subsidizing data centers — they’re benefitting from them. Independent regulators in Mississippi, Arkansas, and Louisiana confirm that data centers are paying their fair share, plus additional benefits for customers.”

When it comes to Entergy’s hidden contracts with Amazon and other tech companies, the spokesperson said, “Customer confidentiality doesn’t reduce accountability. The facts are clear: Technology investment is making power in Mississippi more reliable, more affordable, and more competitive.” The company did not answer any specific questions about the interim facilities rate adjustment that shows residential customers are paying for data center infrastructure.

Amazon commissioned a report on the costs of its data centers to customers. The report found that Amazon was paying, “sufficient or surplus net revenue,” meaning that Entergy could be using its profits to subsidize other customers, but that “the use of this additional margin is at the utility’s discretion.”

The Synapse report ends with a recommendation that Entergy commit that data centers’ energy needs not be subsidized by other customers. To make the process more transparent, Entergy should have a standard contract with customer protection provisions that it uses for data center customers.

To prevent “stranded assets,” or costs incurred by customers for infrastructure that ends up abandoned or unused, the report recommends charging a minimum rate to the data center regardless of use, as well as “exit fees” if the data center closes.

“These are really uncontroversial, widely adopted provisions to ensure a baseline of customer protection, a baseline of transparency, and actually hold Entergy’s feet to the fire,” Havumaki said.

admin
Jun, Thu, 2026

AI News

The benchmark gap, explained: What AI leaderboards measure and what they miss

Somewhere out there, a model changelog is promising “significant reasoning improvements.” And somewhere else, an engineering team is staring at a production incident that the benchmark scores completely missed.

These two things are related.

Every frontier model now scores above 88% on MMLU. GPT-5.3 Codex sits at 93%.

At that ceiling, score differences between models are statistical noise, and the benchmark that defined AI progress for years has become functionally useless for comparing top-tier systems.

Research published in late 2025 found a 37% gap between lab benchmark scores and real-world deployment performance for enterprise agentic AI systems.

Production had other ideas…

This is benchmark theater: evaluation performed as spectacle, with the substance stripped out. If you have ever watched a model ace every eval you threw at it and then hallucinate its way through a production workflow on day one, you already know exactly what this article is about.

Pull up a chair and let’s begin…

How benchmarks became a leaderboard sport

The origin story

The original purpose of benchmarks like MMLU, GSM8K, and HumanEval was genuinely reasonable. Standardized tests let researchers compare models across institutions, track progress over time, and surface capability gaps.

Good stuff.

The problem arrived when benchmark scores became the primary currency for model marketing, at which point “measuring capability” became “winning the leaderboard.”

Where the incentives went wrong

Once scores started driving funding decisions, press coverage, and enterprise procurement, the incentive to optimize for the test rather than underlying capability became structurally inevitable.

Labs are staffed with brilliant researchers who understand exactly which training decisions move benchmark numbers. Some of that optimization reflects genuine improvement.

Some of it is, if we are being honest, just very well-compensated teaching to the test.

The contamination problem runs deeper than most teams realize

Data contamination is the most documented failure mode in benchmark evaluation, and also the most politely ignored one. LLMs are trained on web-scale corpora, and those corpora routinely include benchmark questions, answer keys, and worked solutions.

Empirical audits have found contamination levels ranging from 1% to 45% across popular QA benchmarks, with rates growing as benchmarks age. Turns out the internet is a terrible place to keep your test answers private.

Why mitigation strategies fall short

The standard fixes are less effective than assumed:

Paraphrasing questions provides minimal protection: research at ACL 2025 found LLMs often circumvent these transformations because they have already been trained on the obfuscated formats
Translation and context tweaks face the same problem: a model that has seen a paraphrased version of a GSM8K problem during pretraining is still a contaminated model. Just a more devious one
N-gram overlap and hash-based matching catch the obvious cases, but semantic similarity and cross-lingual leakage are substantially harder to detect at scale
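To make the last point concrete, here is a minimal sketch of what n-gram overlap with hash-based matching can look like. The function names, the 8-token window, and the SHA-1 choice are illustrative assumptions, not any particular lab's pipeline.

```python
import hashlib
import re


def normalize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_hashes(text: str, n: int = 8) -> set[str]:
    """Hash every n-token window so documents can be compared by set lookup."""
    tokens = normalize(text)
    return {
        hashlib.sha1(" ".join(tokens[i:i + n]).encode("utf-8")).hexdigest()
        for i in range(len(tokens) - n + 1)
    }


def overlap_ratio(benchmark_item: str, corpus_doc: str, n: int = 8) -> float:
    """Fraction of the benchmark item's n-grams that also occur in the corpus document."""
    item = ngram_hashes(benchmark_item, n)
    if not item:  # item shorter than n tokens: nothing to compare
        return 0.0
    return len(item & ngram_hashes(corpus_doc, n)) / len(item)


# Hypothetical benchmark question and two hypothetical corpus documents.
question = (
    "A farmer has 12 crates and each crate holds 30 apples "
    "how many apples does the farmer have in total"
)
verbatim_leak = "Homework answers: " + question + " Answer: 360"
paraphrased_leak = (
    "Each of a grower's dozen boxes contains thirty pieces of fruit; "
    "what is the overall count?"
)

print(overlap_ratio(question, verbatim_leak))     # verbatim copy is flagged
print(overlap_ratio(question, paraphrased_leak))  # paraphrase slips through
```

The verbatim copy scores a full overlap while the paraphrase scores zero, which is exactly the blind spot described above: exact matching only sees surface form.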

The deeper issue is that training corpora are so large that labs themselves have limited certainty about what is inside them. Nobody loves admitting that, but there it is.

What the numbers actually measure

Here is what benchmark saturation looks like in practice as of early 2026:

MMLU and MMLU-Pro: functionally saturated above 88% for frontier models, making score differences at the top statistically meaningless for procurement decisions
GSM8K: frontier models now reach 99% (GPT-5.3 Codex), rendering it useful only for evaluating smaller or fine-tuned models against base variants
MATH-500: at 96% for leading models, approaching the same ceiling that made MMLU uninformative
GPQA Diamond: sitting at 94.3% for frontier models despite being designed as a graduate-level science benchmark just two years ago.
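One rough way to check whether a gap near the ceiling means anything is to put a confidence interval around each score. The sketch below uses the normal approximation to a binomial proportion, which is itself shaky close to 100%, so treat it as a sanity check only. The 198-question test size and the second model's 92% score are assumptions for illustration; plug in your benchmark's real item count.

```python
import math


def score_interval(accuracy: float, n_questions: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for an accuracy measured on n_questions items."""
    se = math.sqrt(accuracy * (1.0 - accuracy) / n_questions)
    return accuracy - z * se, accuracy + z * se


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two intervals share any point, i.e. the scores are hard to separate."""
    return a[0] <= b[1] and b[0] <= a[1]


# Assumed inputs: a 198-item test set, one model at 94.3%, a hypothetical rival at 92%.
n_items = 198
leader = score_interval(0.943, n_items)
rival = score_interval(0.92, n_items)

print(leader, rival, intervals_overlap(leader, rival))
```

Under those assumptions each interval spans roughly three points either side of the score, so a two-point lead sits inside the noise. On a benchmark with thousands of items the intervals tighten considerably, which is why the item count belongs next to any headline number.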

Enter Humanity’s Last Exam

Humanity’s Last Exam (HLE), developed by the Center for AI Safety and Scale AI and published in Nature in January 2026, was specifically designed to resist this saturation.

Built from 2,500 questions sourced from nearly 1,000 subject-matter experts across 500 institutions, it was filtered down to problems that stumped GPT-4o and Claude 3.5 Sonnet at launch.

The results are clarifying. The best frontier models currently score around 35% on HLE. Human domain experts average 90%.

That 55-point gap is a far more honest picture of where these models actually sit on genuinely hard reasoning tasks, and a useful corrective the next time a model changelog promises “significant reasoning improvements.”

The structural mismatch between benchmarks and production

Even a perfectly uncontaminated benchmark has a deeper problem: it measures a model in isolation on a fixed task, which is rarely how AI systems actually get used. A model evaluated on clean, well-formed prompts in a controlled environment is essentially a driver who only ever practiced in an empty parking lot.

Confident.

Fast.

Completely unprepared for the school run.

As MIT Technology Review has argued, AI systems are almost always deployed in ways that differ fundamentally from how they are benchmarked.

What production actually throws at your model

Production environments introduce variables that static benchmarks are structurally unable to capture:

Prompt injection attacks and adversarial inputs from real users (who are creative, bored, and occasionally out to cause chaos)
Latency constraints and SLA requirements that affect which responses are actually usable in practice
Cost variation: the CLEAR framework research found 50x cost variation across enterprise agentic systems achieving similar accuracy scores
Reliability degradation at volume: consistency dropping from 60% to 25% under production load conditions, per the same research
Compliance and policy requirements that standard benchmarks leave entirely unaddressed

The 37% lab-to-production gap in agentic systems is a direct consequence of benchmarks optimizing for task completion accuracy while enterprises need holistic performance across all of the above.

A model that posts a leading score on SWE-bench Verified may still stumble on the prompt injection, access control, and error recovery requirements of an actual production coding agent. The leaderboard has yet to add a column for “falls over when a user pastes something unexpected.”

The emerging evaluation stack

The research community has been building toward more defensible evaluation for several years.

The approaches gaining traction in 2026 share a common logic: make the benchmark harder to game by making it harder to predict.

Benchmarks designed to stay ahead:

LiveBench refreshes tasks on a rolling schedule, sourcing from recent publications and events that fall after model training cutoffs
LiveCodeBench continuously collects newly released programming problems, so score increases must reflect genuine improvement rather than memorization
SWE-bench Verified moved from isolated function generation to real GitHub issues requiring working patches validated by unit tests. As of March 2026, Claude Opus 4.5 leads at 80.9%.
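The core idea behind the rolling benchmarks is simple enough to sketch: only score a model on problems it could not have seen. The `Problem` type, its fields, and the example dates below are hypothetical and not taken from LiveBench or LiveCodeBench.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    problem_id: str
    released: date  # when the problem first appeared publicly


def post_cutoff(problems: list[Problem], training_cutoff: date) -> list[Problem]:
    """Keep only problems released strictly after the model's training cutoff."""
    return [p for p in problems if p.released > training_cutoff]


# Hypothetical problem pool and cutoff date.
pool = [
    Problem("old-001", date(2025, 3, 1)),
    Problem("new-001", date(2026, 2, 10)),
    Problem("new-002", date(2026, 4, 22)),
]
fresh = post_cutoff(pool, training_cutoff=date(2025, 12, 31))
print([p.problem_id for p in fresh])
```

The filter is only as trustworthy as the cutoff date a lab discloses, so in practice teams often add a safety margin of a few months.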

The layered enterprise approach

For enterprise teams, the Kili Technology benchmark guide published in May 2026 recommends stacking evaluation in three layers: automated metrics for coverage, LLM-as-a-judge for screening, and human expert review for domain-specific correctness.

The human expert layer is the part most teams skip in the interest of speed. It is also the part that most reliably catches the failures that matter. Skipping it is roughly the evaluation equivalent of skipping the last mile of a marathon because you are almost there.

What rigorous evaluation actually looks like

An eval program that predicts production performance requires shifting the question from “what score does this model achieve?” to “does this model behave reliably under the conditions we will actually run it in?” That reframe sounds small. It changes everything about how you build your eval suite.

What a production-grade eval suite covers

A production-grade eval suite covers:

Task-specific evals built from your own data distribution, covering the edge cases and adversarial inputs that generic benchmarks ignore
Latency, cost-per-task, and failure mode tracking alongside accuracy, giving a picture that maps to real decisions
Multi-step task completion evaluated under realistic tool constraints for agentic systems, with human-in-the-loop checkpoints that reflect how the system will actually be operated
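As a starting point, the first two items in that list can be sketched in a few dozen lines. Everything here is an assumption for illustration: the `EvalCase` and `EvalReport` types, exact-match scoring, and a model callable that returns an answer plus a cost in USD. Real suites usually need fuzzier scoring and richer failure labels.

```python
import time
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected: str
    tag: str = "typical"  # e.g. "edge_case" or "adversarial"


@dataclass
class EvalReport:
    accuracy: float
    p95_latency_s: float
    mean_cost_usd: float
    failures_by_tag: Counter


def run_suite(
    cases: list[EvalCase],
    model: Callable[[str], tuple[str, float]],  # returns (answer, cost in USD)
) -> EvalReport:
    """Run every case once, recording correctness, latency, cost, and failure modes."""
    if not cases:
        raise ValueError("eval suite is empty")
    latencies: list[float] = []
    costs: list[float] = []
    failures: Counter = Counter()
    correct = 0
    for case in cases:
        start = time.perf_counter()
        try:
            answer, cost = model(case.prompt)
        except Exception:
            # A crash is its own failure mode, tracked separately from a wrong answer.
            cost = 0.0
            failures[f"{case.tag}:error"] += 1
        else:
            if answer.strip() == case.expected:  # exact match: an assumed scoring rule
                correct += 1
            else:
                failures[f"{case.tag}:wrong_answer"] += 1
        latencies.append(time.perf_counter() - start)
        costs.append(cost)
    latencies.sort()
    # Rough 95th percentile: index into the sorted latencies, clamped to the last element.
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
    return EvalReport(correct / len(cases), p95, sum(costs) / len(cases), failures)
```

Breaking failures down by tag is the part that pays off: an overall accuracy number hides whether the misses cluster in adversarial inputs, edge cases, or outright errors.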

The teams making the most of enterprise AI in 2026 are running automated evaluations on every prompt, model, or tool change before deployment, according to AI agent adoption research published by Digital Applied in April 2026.

That discipline is tedious, unglamorous, and completely invisible to anyone who writes analyst reports about AI adoption.

It is also what separates the 14% of enterprises that have successfully scaled agents to production from the 78% still running pilots and wondering why things keep breaking.

Final thoughts

Benchmark scores are a useful starting point for model selection. The problem is the industry has spent years treating them as a finishing point, and the gap between leaderboard performance and production reality is the bill coming due.

The good news: rigorous evaluation is a solvable problem. The tooling is maturing, the frameworks exist, and the teams who have done the work are seeing the results.

The honest ask is committing the time and resources to build eval programs that reflect your actual deployment conditions rather than the idealized ones that happen to match the standard benchmarks.

“The benchmark said it was fine” is an answer that production environments will test, patiently, every single day. The better answer is knowing exactly where your model stands before it ever gets there.

admin
Jun, Thu, 2026

AI News

How an astrophysicist uses Codex to help simulate black holes

Discover how astrophysicist Chi-kwan Chan uses Codex to build black hole simulations, helping scientists study extreme physics and test Einstein’s theory of general relativity.

admin
Jun, Wed, 2026

AI News

Scientists Just Accidentally Discovered a Strange, Hidden Rule of Human Nature

🌘

Subscribe to 404 Media to get The Abstract, our newsletter about the most exciting and mind-boggling science news and studies of the week.

Scientists have discovered that people walking in crowds tend to spontaneously turn counterclockwise—regardless of the environment, from schoolyards to busy settings—a surprise finding that “may represent a manifestation of a deeper biological principle of symmetry breaking,” according to a study published in Nature Communications on Wednesday.

The bizarre finding was made essentially by accident; during the Covid-19 pandemic, researchers led by Iñaki Echeverría Huarte, a professor who studies pedestrian dynamics at the University of Navarra in Spain, studied the movements of pedestrians as part of a project to inform public health guidance on social distancing measures. But the videos revealed something unexpected—a consistent pattern of people turning counterclockwise when switching direction.

“The discovery was a serendipitous one (as sometimes happens in science),” Huarte told 404 Media in an email exchange that also included study co-author Claudio Feliciani, a professor who studies crowd dynamics at the University of Tokyo. “Since then, we have completed a series of experiments in Spain to test several hypotheses.”

“Curiously, during a conference where I was presenting the first part of this story, Claudio and I got talking and thought together: why not run an experiment in Japan?” he continued. “We were convinced the rotation would flip there, for several reasons (cultural ones, and the different type of avoidance behaviour that exists in Japan compared with Spain). However…it did not.”

Indeed, over the course of several experiments that took place in different environments in Spain and Japan, the counterclockwise bias persisted, suggesting that the team may have stumbled on a hidden rule of behavior. This preference showed up whether people were walking alone, or as part of a group, suggesting that it emerges from individuals, rather than as a collective phenomenon that is only present in crowds.

*Overhead shot of schoolyard in Spain. Image: ©2026 Echeverría-Huarte et al. CC-BY-ND*

“We are now only sure that it is not a collective but an individual bias, and that is very, very robust,” said Feliciani. However, the team stopped short of describing the bias as a “universal law” until more research is conducted, especially in more complex scenarios, such as emergency evacuations or dense crowds.

For this study, the researchers analyzed the movements of hundreds of participants, including adults who were instructed to move freely in different settings, teenagers playing in their schoolyard in Spain, and children at a nursery school in Japan. They accounted for individual variations such as handedness (left or right) and age, as well as local social etiquette about expected behavior in crowds.

In each situation, the participants displayed a clear counterclockwise bias in the rotation of their bodies as they moved to a new direction. Each group also contained people who turned predominantly clockwise or showed no rotational bias, but they were fewer in number than the counterclockwise turners. The nursery school children showed an even stronger bias toward counterclockwise turns, suggesting that it may not be a learned behavior, but something biologically rooted.

“It is likely biomechanical, but exactly why is hard to tell,” said Feliciani. He added that this symmetry-breaking motion appears to be unusual in animals, and that “most animals show no bias, and humans are probably the exception or, for sure, a rare case.”

That said, the study outlined a few exceptions, including Temnothorax ants, which tend to turn left while exploring, and budgies, which show preferences in certain lateral directions during flight.

Huarte is working on follow-up studies that use virtual reality to shed light on the bias, but for now, this weird pattern remains unexplained. A better understanding of its origins could be useful for applications in busy settings like airports, museums, shopping centres, and other public spaces. It’s also an example of how unexpected behavior can be hidden in plain sight.

“I believe the real value of our discoveries lies in the fact that it can lead to other discoveries on how we process locomotor information and use them to move,” Feliciani concluded.

admin
Jun, Wed, 2026

AI News

New framework for auditing machine unlearning

Algorithms & Theory

admin
Jun, Wed, 2026

AI News

From data to decisions: how LSEG is scaling trusted AI

See how LSEG uses OpenAI to scale trusted AI across its global business, accelerating insights, shrinking release cycles, and empowering 4,000 employees.

admin
Jun, Wed, 2026

AI News

Startup’s nuclear-inspired cooling system could make data centers more sustainable

The rise of artificial intelligence is riding on the back of an enormous data center expansion. Data centers are projected to account for anywhere from 9 to 17 percent of total electricity usage in the U.S. by the end of the decade. Today, around a third of data center electricity is devoted to cooling the chips that run AI models.

That’s the process Ferveret is working to make more efficient. The startup, founded by Reza Azizian, a former MIT postdoc in nuclear engineering, and Matteo Bucci, MIT’s Esther and Harold E. Edgerton Associate Professor in the Department of Nuclear Science and Engineering, is adapting an approach from nuclear reactors to cool chips using no water and significantly less electricity.

The company’s cooling system submerges computer servers in a specialized liquid that absorbs heat much more efficiently than air from a fan. What makes the solution different from other liquid cooling systems are the bubbles: Ferveret’s Adaptive Phase Cooling (APC) solution produces much smaller bubbles at the surface of the server, which detach more frequently, accelerating the heat transfer process.

Ferveret is already testing its solutions with companies including CleanSpark, the data center developer and operator, as well as FuriosaAI, an AI accelerator company, and Switch, one of the largest data center operators in the U.S.

In a recent study in collaboration with the Samueli Computer Science Department at the University of California at Los Angeles, Ferveret found its APC solution led to a 15 percent improvement in computational power efficiency compared to state-of-the-art liquid cooling solutions. By combining those savings with Ferveret’s power control system to optimize operating conditions, the company says it allows data centers to get 35 percent more tokens — small pieces of text or data — from their AI models with the same amount of power.
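The 35 percent figure can also be read as a statement about energy per token. A minimal sketch of that arithmetic follows, assuming the power budget is held fixed as the company describes. The function name is hypothetical, and nothing here comes from Ferveret's own tooling.

```python
def energy_per_token_reduction(token_gain: float) -> float:
    """Fractional drop in energy per token when token output rises by
    `token_gain` (0.35 means 35 percent) at the same power draw.

    Assumption: power is constant, so energy per token scales as
    1 / (1 + token_gain).
    """
    if token_gain <= -1.0:
        raise ValueError("token_gain must be greater than -1")
    return 1.0 - 1.0 / (1.0 + token_gain)


# 35 percent more tokens from the same power, as quoted in the article.
reduction = energy_per_token_reduction(0.35)
print(f"Energy per token falls by about {reduction:.1%}")
```

Under that assumption, 35 percent more tokens from the same power corresponds to roughly 26 percent less energy per token, not 35 percent less, because the gain sits in the denominator.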

“Our goal is to make data centers as sustainable as possible and help them use every single watt of power to generate tokens, which are the most useful outputs,” Azizian says. “Our system enables the operation of more powerful chips, it helps data centers waste a lot less energy, and it accomplishes all that with zero water consumption.”

From nuclear reactors to AI

Azizian was a postdoc at MIT in 2013 when he met Bucci, who was then a research scientist. They worked on heat transfer in nuclear reactors before Azizian went into industry, where he shifted his focus to cooling chips. Azizian first worked on Microsoft’s HoloLens augmented reality headset and then joined Nvidia, which produces the graphics processing units companies use to train and run the latest AI models. Meanwhile, Bucci continued conducting research at MIT, becoming an assistant professor in 2016.

Azizian walked into his first data center in 2017, where he was struck by the massive, noisy fans that filled the building to keep the servers cool.

“I thought, ‘Holy crap, this is not how you cool facilities,’” Azizian recalls, noting air cooling can still take up 40 percent of the power going into a data center. “It was not an efficient way of doing things, but since it wasn’t hurting the performance, no one cared that the cooling technology was 50 years old.”

Azizian began talking with Bucci about applying their knowledge around optimizing heat transfer in nuclear reactors to data centers. Scientists have spent decades finding better ways to move heat in nuclear reactors.

“Heat transfer determines how much energy you can extract from the reactor core, which translates directly to revenue,” Azizian explains.

The founders started Ferveret in 2021. A lot has changed since Azizian walked into his first data center. Chip companies have packed more and more components onto their chips as the explosion in artificial intelligence has put a premium on squeezing as much computing capacity as possible out of limited power supplies.

That has driven data center operators to use liquid to cool chips — often through a technique known as immersion cooling that submerges chips in liquid. The most effective form of immersion cooling brings the liquid to a boil.

“Liquid is a better heat transfer medium than air. That’s why when you stick your hand into room temperature water it still feels cold,” Bucci explains. “When liquid is boiling, it becomes even better at removing heat because the phase change requires a lot of energy, which is the energy you remove from the chip. That lets you transfer large quantities of heat with minimal temperature differences between the chips and the liquid.”
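Bucci's point about phase change can be made concrete with a back-of-the-envelope comparison. The sketch below uses textbook property values for water at atmospheric pressure purely as a stand-in. That is an assumption for illustration: Ferveret's coolant is a different, low-boiling-point liquid whose properties the article does not give, and the function names are hypothetical.

```python
# Illustrative values for water at atmospheric pressure.
# Assumption: water stands in for the coolant; the actual fluid differs.
SPECIFIC_HEAT_KJ_PER_KG_K = 4.18  # energy to warm 1 kg of liquid water by 1 K
LATENT_HEAT_KJ_PER_KG = 2257.0    # energy to vaporize 1 kg of water at 100 °C


def sensible_heat_kj(mass_kg: float, delta_t_k: float) -> float:
    """Heat absorbed by warming the liquid without any phase change."""
    return mass_kg * SPECIFIC_HEAT_KJ_PER_KG_K * delta_t_k


def latent_heat_kj(mass_kg: float) -> float:
    """Heat absorbed by boiling the liquid at constant temperature."""
    return mass_kg * LATENT_HEAT_KJ_PER_KG


def equivalent_temperature_rise_k() -> float:
    """Temperature rise the liquid would need in order to absorb, without
    boiling, as much heat as vaporizing the same mass."""
    return LATENT_HEAT_KJ_PER_KG / SPECIFIC_HEAT_KJ_PER_KG_K


print(f"Warming 1 kg by 10 K absorbs {sensible_heat_kj(1.0, 10.0):.1f} kJ")
print(f"Boiling 1 kg absorbs {latent_heat_kj(1.0):.0f} kJ")
print(f"Equivalent temperature rise: {equivalent_temperature_rise_k():.0f} K")
```

With these stand-in values, boiling a kilogram of liquid absorbs as much heat as warming it by several hundred kelvin, which is why a boiling coolant can remove large amounts of heat while staying close to the chip's temperature.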

Unfortunately, boiling liquid adds complexity to the system because it forces operators to capture and reliquefy the bubbles while controlling for pressure, temperature, and fluid inventory.

Ferveret’s system is adapted from a process in nuclear reactors called subcooled boiling. It uses a liquid with a low boiling point and none of the toxic PFAS “forever chemicals” that other approaches rely on. At the surface of the chip, Ferveret’s liquid produces smaller bubbles than other immersion cooling approaches. Those bubbles detach more frequently and quickly recondense in the surrounding liquid, accelerating the bubble-rewetting cycle at the surface of the chip to hasten heat transfer.

Ferveret delivers its APC system in small boxes, each of which houses one server. The founders say their modular systems make it easier to deploy the system and simplify maintenance.

“The physics enable us to get to form factors that weren’t possible in the past,” Azizian says. “Most immersion cooling solutions are large tanks that people submerge the servers in. We have a smaller, modular rack-mounted solution that makes it adaptable to the current infrastructure, so it’s easier for people to deploy our technology.”

Ferveret also offers control software that adjusts the power going to each server in real-time to further improve efficiency.

“We deliver full-stack systems that include the cooling box, the rack, the cooling distribution units, and sensors that measure the temperature and pressure,” Bucci says. “Our software monitors those sensors and optimizes the operating condition inside each box to ensure that energy consumption is minimized in the system.”

AI with fewer resources

In addition to helping data centers run more efficiently, Ferveret is improving sustainability by making it easier to operate them in remote regions where renewable energy is more plentiful.

“The sun shines in places where you don’t have much water, so the advantage of us being water-free is we allow you to build data centers where you have solar energy but nothing to cool the data center down,” Bucci says. “This technology can help deploy data centers in regions where normally you wouldn’t have the resources to do so, including Africa, the Middle East, and of course parts of America. It’s a huge unlock.”

Ferveret is in talks with the large cloud computing companies known as hyperscalers, and is currently part of Nvidia’s Inception program for startups. The company plans to announce expanded partnerships later this year. From there, the founders plan to quickly scale their technology to help the AI industry continue to grow without further straining the planet.

“The computing industry is facing a huge challenge in the form of access to power, and they have a problem with access to water in many regions,” Azizian says. “That will only become more limiting as the industry grows. The main goal for these data center operators would be to get more tokens from the power they have. We’ve shown we can do that.”

admin
Jun, Wed, 2026

AI News

The consequences of relying on AI for accurate news

It’s no secret that the last few years have seen a massive explosion in the use of artificial intelligence for general information-gathering. An even more recent trend, though, is how large language models (LLMs) like ChatGPT, Claude, and Gemini are increasingly being used for verifying and consuming news; reports from the Pew Research Center over the last year found that one-in-five U.S. teens regularly use LLMs to get their news, while one-in-four young adults have reported using them for that purpose at least once.

A new open-access study from the MIT Media Lab should give some of those users pause: Researchers found that, over the course of a month, participants who relied on AI systems to verify facts actually got worse at detecting misinformation on their own when their chatbots were taken away.

This phenomenon, often referred to as the “AI dependency paradox,” has been observed in a wide range of knowledge domains; a 2025 study, for example, found that doctors who used AI got worse at detecting cancer on their own. The dynamic mirrors broader tech trends around so-called “deskilling” (or “cognitive offloading”) that have been well documented for decades, from calculators weakening our math skills to Global Positioning System (GPS) technologies impacting our natural sense of direction.

In the new Media Lab study, which tracked 67 people over four weeks as they evaluated news headline-image pairs, participants were 21 percent more accurate in detecting fake news when assisted by an AI chatbot during a session — confirming previous research out of the MIT Sloan School of Management demonstrating that AI can be an effective tool in reducing people’s beliefs in false information.

However, the study showed that a new wrinkle emerged when the AI was no longer present: By week four, participants’ unassisted performance on new news items declined by 15 percentage points compared to before the study started. (Roughly a quarter of all participants actually reported feeling that they were getting better at detection, even as their performance declined.)
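The two results above are reported in different units: one in percent and the other in percentage points. The sketch below shows how those units differ. The baseline accuracy is hypothetical, chosen only for illustration and not taken from the study, and reading the 21 percent figure as a relative change is likewise an assumption.

```python
def apply_relative_change(baseline: float, percent: float) -> float:
    """Scale a rate by a relative percent change (e.g. +21 percent)."""
    return baseline * (1.0 + percent / 100.0)


def apply_point_change(baseline: float, points: float) -> float:
    """Shift a rate by an absolute number of percentage points."""
    return baseline + points / 100.0


# Hypothetical baseline accuracy; NOT a figure from the study.
baseline_accuracy = 0.60

assisted = apply_relative_change(baseline_accuracy, 21.0)
unassisted_later = apply_point_change(baseline_accuracy, -15.0)

print(f"Baseline:                  {baseline_accuracy:.1%}")
print(f"+21 percent (relative):    {assisted:.1%}")
print(f"-15 percentage points:     {unassisted_later:.1%}")
```

A percentage-point change shifts the rate by the same amount whatever the baseline, while a relative change scales with it, so the two figures cannot be compared directly without knowing the starting accuracy.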

Dunning-Kruger creeps in

“Users get excited about these ‘magical’ LLMs, but forget that they’re just statistical models that predict the next ‘token’ in a sequence [of letters/words],” says MIT media arts and sciences (MAS) PhD student Anku Rani, co-lead author of a new paper about the research, alongside fellow MAS PhD student Valdemar Danry. “Many impressive behaviors emerge from scaling this, but it comes with real limitations, both in what the model can reliably generate and in its broader impact on the people using it.”

Qualitative analysis identified distinct behavioral patterns, with the team labeling one-fifth of all participants as “Dependency Developers” who gradually shifted from active self-reliance to passive acceptance of AI guidance.

In the post-experiment survey, one respondent explicitly acknowledged this transition, noting their passive role in the process. “While [the chatbots] did emphasize that you must check across multiple sources to make sure a story is true, they didn’t teach me much about exploring the context of the images themselves,” the participant said.

The research team said that these AI models are particularly vulnerable to mistakes in the midst of emotionally charged breaking news, as exhibited by the widespread misinformation that accompanied the recent assassination attempt on President Trump and major events during the Iranian war. (The authors also point out that the original human-created news content that’s used to train the AI models is increasingly unreliable and/or biased, further exacerbating the problem.)

The paper, which Danry and Rani presented at the 2026 CHI Conference on Human Factors in Computing Systems, was co-authored by Assistant Professor Paul Pu Liang, Senior Research Scientist Andrew Lippman, and senior author Pattie Maes, the Germeshausen Professor of Media Arts and Sciences.

The solution: Being a coach, not a crutch

The researchers say that the results of their project suggest that the specific way in which an AI interacts with a user determines whether its impact will be “as a coach, versus as a crutch.” The study found a clear distinction between conversational strategies that simply help in the moment and those that actually support active learning and skill development.

For the latter, the Media Lab team uncovered several strategies associated with stronger independent detection later on, even if the strategies initially slowed down performance during the interaction. These included the Socratic method, in which the AI asks guided questions, as well as so-called “deep probing,” where the system provides gently persuasive statements if the user appears to be veering away from the correct response.

“AIs that ‘tell’ by providing direct answers are more likely to foster reliance, while those that ‘ask’ via Socratic questioning are better at engaging someone to actually learn how to discern the truth on their own,” says Danry. “But it’s very much a trade-off between speed and effort.”

Rani noted a few key limitations to the one-month study, from the small dataset of roughly 50 validated news items to the demographic focus on the United States and the United Kingdom. In the future, she says that the team hopes to do similar experiments with more geographically diverse cohorts, including low-resource communities, and is also eager to explore whether other multi-modal interaction strategies — like interacting with culturally adaptive digital twins instead of text-based chatbots — help people improve their abilities to detect misinformation.

At a higher level, the researchers hope that the project will be something that educators can examine as they develop teaching plans that incorporate AI tools into their school curricula.

“It’s especially important to raise awareness in our schools and academic communities about the shortcomings of using AI as learning tools,” says Maes. “People need to know that if they ‘delegate’ their thinking, they’re not going to get better at that particular brand of problem-solving. Ultimately, the ability to question and analyze information is important for everyone, because it empowers us to solve problems and form our own independent opinions about the world.”

Danry adds that the rapidly evolving fields of machine learning and deep learning will require continuous education on the benefits and drawbacks of LLMs.

“There’s a lot of work to do in making sure that we don’t just fully offload critical tasks that we want to be able to keep on doing to these models,” he says. “We need to develop a new kind of AI literacy.”

The research project was supported, in part, by the Media Lab Consortium, an MIT Tata Center Technology and Design Fellowship, and a Google PhD Fellowship in Human–Computer Interaction.

How benchmarks became a leaderboard sport

Where the incentives went wrong

The contamination problem runs deeper than most teams realize

What the numbers actually measure

Enter Humanity’s Last Exam

The structural mismatch between benchmarks and production

What production actually throws at your model

The emerging evaluation stack

The layered enterprise approach

What rigorous evaluation actually looks like