Solving the “Whac-a-mole dilemma”: A smarter way to debias AI vision models

In today’s hospitals and clinics, a dermatologist may use an artificial intelligence model to classify skin lesions and assess whether a lesion is benign or at risk of developing into cancer. But if the model is biased toward certain skin tones, it could fail to identify a high-risk patient.

Perhaps one of the best known and most persistent challenges that AI research continues to reckon with is bias. Bias is often discussed in relation to training data, but model architecture can also contain and amplify bias, negatively influencing model performance in real-world settings. In high-stakes medical scenarios, the very real consequences of poor performance have made bias into a quintessential safety issue.

A new paper from researchers at MIT, Worcester Polytechnic Institute, and Google that was accepted to the 2026 International Conference on Learning Representations proposes a novel debiasing approach called “Weighted Rotational DebiasING” (WRING) that can be applied to vision-language models (VLMs), such as OpenCLIP, an open-source implementation of OpenAI’s CLIP.

VLMs are multi-modal models that can understand and interpret different data modalities like video, image, and text simultaneously. While debiasing approaches for VLMs do exist, the most commonly used approach is known as “projection debiasing,” which leads to what has been termed the “Whac-A-Mole dilemma”, an empirical observation that was formally introduced to AI research in 2023.

Projection debiasing is a post-processing approach that removes undesirable, biased information from model embeddings by “projecting out” the subspace associated with the bias from the model’s representation space, thereby cutting out the bias. But this approach has its drawbacks.

“When you do that, you inadvertently squish everything around,” says Walter Gerych, the paper’s first author, who conducted this research last year as a postdoc at MIT. “All the other relationships that the model learns change when you do that.”

Gerych, who is now an assistant professor of computer science at Worcester Polytechnic Institute, is joined on the paper by MIT graduate students Cassandra Parent and Quinn Perian; Google’s Rafiya Javed; and MIT associate professors of electrical engineering and computer science Justin Solomon and Marzyeh Ghassemi, who is an affiliate of the Abdul Latif Jameel Clinic for Machine Learning and Health and the Laboratory for Information and Decision Systems.

While projection debiasing stops the model from acting upon the bias that’s been projected out of the subspace, it can end up amplifying and creating other biases, hence the Whac-A-Mole dilemma. According to Ghassemi, the unintended amplification of model biases is “both a technical and practical challenge. For instance, when debiasing a VLM that retrieves images of clinical staff — if racial bias is removed — it could have the unintended consequence of amplifying gender bias.” 

WRING works by moving certain coordinates within the high-dimensional space of a model — the ones that appear to be responsible for bias — to a different angle, so the model can no longer distinguish between different groups within a certain concept. This changes the representation within a specific space while leaving the model’s other relationships intact. And like projection debiasing, WRING is a post-processing approach, which means it can be applied “on the fly” to a pre-trained VLM. 
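To make the contrast concrete, here is a minimal sketch — not the authors’ implementation — of the two ideas on toy embeddings. Projection debiasing deletes the component along an estimated bias direction, which also shifts every embedding, while a rotation-style correction re-orients the biased coordinates and, being an orthogonal transform, preserves norms and most other relationships. The embeddings, bias direction, and rotation angle below are illustrative placeholders.

```python
import numpy as np

# Toy embeddings: rows are items, columns are embedding dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))

# Hypothetical unit vector capturing the biased attribute (in practice it might be
# estimated from group differences); a placeholder here.
bias_dir = np.zeros(8)
bias_dir[0] = 1.0

# --- Projection debiasing: remove the component along the bias direction. ---
# This deletes the biased information but also shifts every embedding, which can
# distort unrelated relationships (the "Whac-A-Mole" effect described above).
X_proj = X - np.outer(X @ bias_dir, bias_dir)

# --- Rotation-style correction (illustrative): rotate the plane spanned by the ---
# bias direction and a second direction so groups are no longer separable along
# the original bias axis, while norms and pairwise angles are preserved.
theta = np.pi / 2  # placeholder angle; WRING itself learns weighted rotations
R = np.eye(8)
R[0, 0], R[0, 1] = np.cos(theta), -np.sin(theta)
R[1, 0], R[1, 1] = np.sin(theta), np.cos(theta)
X_rot = X @ R.T

# Rotations preserve embedding norms exactly; projection does not.
print(np.allclose(np.linalg.norm(X, axis=1), np.linalg.norm(X_rot, axis=1)))   # True
print(np.allclose(np.linalg.norm(X, axis=1), np.linalg.norm(X_proj, axis=1)))  # False
```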

“People already spent a lot of resources, a lot of money, training these huge models, and we don’t really want to go in and modify something during training because then you have to start from scratch,” Gerych explains. “[WRING is] very efficient. It doesn’t require more training of the model and it’s minimally invasive.”

In their results, the researchers found that WRING significantly reduced bias for a target concept without increasing bias in other areas. But for now, the approach is somewhat limited to Contrastive Language-Image Pre-training (CLIP) models, a type of VLM that connects images to language for search or classification.

“Extending this to ChatGPT-style generative language models is the reasonable next step for us,” says Gerych.

This work was supported, in part, by a National Science Foundation CAREER Award, an AI2050 Early Career Fellowship, a Sloan Research Fellowship, a Gordon and Betty Moore Foundation Award, and an MIT-Google Computing Innovation Award.

The MIT-IBM Computing Research Lab launches to shape the future of AI and quantum computing

The following is a joint announcement by the MIT Schwarzman College of Computing and IBM.

IBM and MIT today announced the launch of the MIT-IBM Computing Research Lab, advancing their long-standing collaboration to shape the next era of computing. The new lab expands its scope to include quantum computing, alongside foundational artificial intelligence research, with the goal of unlocking new computational approaches that go beyond the limits of today’s classical systems.

The MIT-IBM Computing Research Lab builds on a distinguished history of scientific excellence at the intersection of research and academia. Evolving from the MIT-IBM Watson AI Lab, which originated in 2017 on MIT’s campus, the new lab reflects a transformed technology landscape — one in which AI has entered mainstream deployment, and quantum computing is rapidly advancing toward practical impact. Together, MIT and IBM aim to help lead research in AI and quantum and to redefine mathematical foundations across both domains.

“We expect the MIT-IBM Computing Research Lab to emerge as one of the world’s premier academic and industrial hubs accelerating the future of computing,” says Jay Gambetta, director of IBM Research and IBM Fellow, and IBM chair of the MIT-IBM Computing Research Lab. “Together, the brightest minds at MIT and IBM will rethink how models, algorithms, and systems are designed for an era that will be defined by the sum of what’s possible when AI and quantum computing come together.”

“For a decade, the collaboration between MIT and IBM has produced leading-edge research and innovation, and provided mentorship and supported the professional growth of researchers both at MIT and IBM,” says Anantha Chandrakasan, MIT’s provost, who, as then-dean of the School of Engineering, spearheaded the creation of the MIT-IBM Watson AI Lab and will continue as MIT chair of the lab. “The incredible technical achievements set the bar high for our work together over the next 10 years. I look forward to another decade of impact.”

Addressing the next frontiers in computation

The MIT-IBM Computing Research Lab will serve as a focal point for joint research between MIT and IBM in AI, algorithms, and quantum computing, as well as the integration of these technologies into hybrid computing systems. The lab is designed to accelerate progress toward powerful new computational approaches that take advantage of rapid advances in AI and quantum-centric supercomputing, including those that combine maturing quantum hardware with classical systems and advanced AI methods.

This research initiative will include improving capabilities and integrating AI with traditional computing, alongside pursuing advances in small, efficient, modular language model architectures, novel AI computing paradigms, and enterprise-focused AI systems designed for deployment in real-world environments, where reliability, transparency, and trust are essential.

In parallel, the lab will rethink the mathematical and algorithmic foundations that underpin the next era of computing by accelerating the development of novel quantum algorithms for complex problems, with impacts in areas such as materials science, chemistry, and biology.

Additionally, the lab will investigate mathematical and algorithmic foundations of machine learning, optimization, Hamiltonian simulation, and partial differential equations, which are used to approximate the behavior of dynamical systems that classical systems can currently handle only at limited scales and accuracy. Innovations from the lab could have wide implications for global industries, from more accurate weather and air turbulence prediction to better forecasts of financial market performance. Similarly, with improved optimization approaches, research from the lab could help lower risks in areas like finance, predict protein structures for more targeted medicine, and streamline global supply chains.

With its focus on AI, algorithms, and quantum, the MIT-IBM Computing Research Lab will complement and enhance the work of two of MIT’s strategic initiatives, the MIT Generative AI Impact Consortium and the MIT Quantum Initiative. MIT President Sally Kornbluth launched these strategic initiatives to broaden and deepen MIT’s impact in developing solutions to serious global challenges. The MIT-IBM Computing Research Lab will also leverage IBM’s longtime leadership and expertise in quantum computing. As part of its ambitious roadmap, IBM has laid out a clear path to delivering the world’s first fault-tolerant quantum computer by 2029, and is working across industries to drive value from quantum-centric supercomputing, tightly integrating quantum computers with high-performance computing and AI accelerators to solve the world’s toughest problems.

Deep integration with scientific domains

The MIT-IBM Computing Research Lab will also continue to serve as a foundation for training the next generation of computational scientists and innovators. It will do so by engaging faculty and students across MIT departments, enabling new computational approaches to accelerate discoveries in the physical and life sciences.

The lab will continue to be co-directed by Aude Oliva, senior research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory, and David Cox, vice president of AI Foundations at IBM Research. MIT and IBM have appointed leads for each of the lab’s three focus areas — AI, algorithms, and quantum. Jacob Andreas, associate professor in the Department of Electrical Engineering and Computer Science (EECS), and Kenney Ng, principal research scientist at IBM Research and the MIT-IBM science program manager, will co-lead AI; Vinod Vaikuntanathan, the Ford Foundation Professor of Engineering in EECS, and Vasileios Kalantzis, IBM Research senior research scientist, will co-lead algorithms; and Aram Harrow, professor of physics, and Hanhee Paik, IBM director of Quantum Algorithm Centers, will co-lead quantum.

“The MIT-IBM Computing Research Lab reflects an important expansion of the collaboration between MIT and IBM and the increasing connections across AI, algorithms, and quantum. This deepened focus also underscores a strong alignment with the MIT Schwarzman College of Computing’s mission to advance the forefront of computing and its integration across disciplines,” says Dan Huttenlocher, dean of the MIT Schwarzman College of Computing and MIT co-chair of the lab. “I’m excited about what this next chapter will enable in these three areas, and their impact broadly.”

Building on nearly a decade of collaboration

The MIT-IBM Watson AI Lab helped pioneer a model for academic-industry research collaboration, aligning long-term scientific inquiry with real-world impact. Since its inception, the lab has funded over 210 research projects involving over 150 MIT faculty members and over 200 IBM researchers. Collectively, the projects have led to over 1,500 peer-reviewed articles. The lab also helped shape the career growth of a number of MIT students and junior researchers, funding more than 500 students and postdocs.

“The true measure of this lab is not just innovation, but transformation of a field. Hundreds of students have contributed to thousands of publications in top conferences and journals, demonstrating their capabilities to address meaningful problems,” says Oliva. “The MIT-IBM Computing Research Lab builds on an extraordinary legacy of impact to advance a trusted collaboration that will redefine the future of AI and quantum computing in a way never seen before.”

“By coupling academic rigor with industrial scale, the lab aims to define the computational foundations that will power the next generation of AI, quantum, and scientific breakthroughs,” says Cox. “By bringing together advances in AI, algorithms, and quantum computing under one integrated research effort, we’re creating the conditions to rethink the mathematical and computational foundations of science and engineering.”

The MIT-IBM Computing Research Lab will capitalize on this foundation, expanding both the scientific scope and the ecosystem of collaborators across the Cambridge-Boston region and beyond.

Enabling privacy-preserving AI training on everyday devices

A new method developed by MIT researchers can accelerate a privacy-preserving artificial intelligence training method by about 81 percent. This advance could enable a wider array of resource-constrained edge devices, like sensors and smartwatches, to deploy more accurate AI models while keeping user data secure.

The MIT researchers boosted the efficiency of a technique known as federated learning, which involves a network of connected devices that work together to train a shared AI model.

In federated learning, the model is broadcast from a central server to wireless devices. Each device trains the model using its local data and then transfers model updates back to the server. Data are kept secure because they remain on each device.
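For reference, a bare-bones synchronous federated-averaging round might look like the sketch below; the function names and the synthetic gradient are ours, standing in for real on-device training.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_train(global_weights, local_data, lr=0.1, steps=5):
    """Stand-in for on-device training: each device refines its copy of the model
    using its own data (replaced here by a synthetic gradient for brevity)."""
    w = global_weights.copy()
    for _ in range(steps):
        synthetic_grad = rng.normal(size=w.shape) * 0.01  # placeholder for a real gradient
        w -= lr * synthetic_grad
    return w

def federated_round(global_weights, device_datasets):
    """One synchronous round: broadcast the model, train locally, average the results.
    Only model updates travel back to the server; raw data stays on each device."""
    updates = [local_train(global_weights, data) for data in device_datasets]
    return np.mean(updates, axis=0)

global_weights = np.zeros(10)
device_datasets = [None] * 8  # stand-ins for eight devices' private datasets
global_weights = federated_round(global_weights, device_datasets)
```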

But not all devices in the network have enough capacity, computational capability, and connectivity to store, train, and transfer the model back and forth with the server in a timely manner. This causes delays that worsen training performance.

The MIT researchers developed a technique to overcome these memory constraints and communication bottlenecks. Their method is designed to handle a heterogeneous network of wireless devices with varied limitations.

This new approach could make it more feasible for AI models to be used in high-stakes applications with strict security and privacy standards, like health care and finance.

“This work is about bringing AI to small devices where it is not currently possible to run these kinds of powerful models. We carry these devices around with us in our daily lives. We need AI to be able to run on these devices, not just on giant servers and GPUs, and this work is an important step toward enabling that,” says Irene Tenison, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this technique.

Her co-authors include Anna Murphy ’25, a machine-learning engineer at Lincoln Laboratory; Charles Beauville, a visiting student from Ecole Polytechnique Fédérale de Lausanne (EPFL) in Switzerland and a machine-learning engineer at Flower Labs; and senior author Lalana Kagal, a principal research scientist in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT. The research will be presented at the IEEE International Joint Conference on Neural Networks. 

Reducing lag time

Many federated learning approaches assume all devices in the network have enough memory to train the full AI model, and stable connectivity to transmit updates back to the server quickly.

But these assumptions fall short with a network of heterogeneous devices, like smartwatches, wireless sensors, and mobile phones. These edge devices have limited memory and computational power, and often face intermittent network connectivity.

The central server usually waits to receive model updates from all devices, then averages them to complete the training round. This process repeats until training is complete.

“This lag time can slow down the training procedure or even cause it to fail,” Tenison says.

To overcome these limitations, the MIT researchers developed a new framework called FTTE (Federated Tiny Training Engine) that reduces the memory and communication overhead needed by each mobile device.

Their framework involves three main innovations.

First, rather than broadcasting the entire model to all devices, FTTE sends a smaller subset of model parameters instead, reducing the memory requirement for each device. Parameters are internal variables the model adjusts during training.

FTTE uses a special search procedure to identify parameters that will maximize the model’s accuracy while staying within a certain memory budget. That limit is set based on the most memory-constrained device.

Second, the server updates the model using an asynchronous approach. Rather than waiting for responses from all devices, the server accumulates incoming updates until it reaches a fixed capacity, then proceeds with the training round.

Third, the server weights updates from each device based on when it received them. In this way, older updates don’t contribute as much to the training process. These outdated data can hold the model back, slowing the training process and reducing accuracy.
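A rough sketch of how the second and third ideas could fit together — aggregating once a fixed number of updates has arrived and down-weighting stale ones — might look like the following. This is a simplified illustration under our own assumptions, not the FTTE code, and the exponential decay schedule is a placeholder.

```python
import numpy as np

def semi_async_aggregate(incoming_updates, buffer_size=4, decay=0.5):
    """Aggregate as soon as `buffer_size` updates have arrived, weighting each
    update by its staleness so that outdated contributions count less.

    `incoming_updates` is a list of (weights, rounds_stale) pairs, where
    `rounds_stale` counts how many rounds old the device's copy of the model was."""
    buffered = incoming_updates[:buffer_size]            # don't wait for every device
    staleness = np.array([s for _, s in buffered], dtype=float)
    mix = decay ** staleness                             # placeholder decay schedule
    mix /= mix.sum()
    stacked = np.stack([w for w, _ in buffered])
    return (mix[:, None] * stacked).sum(axis=0)

# Example: four updates arrive; two were computed against stale global models.
updates = [(np.full(3, 1.0), 0), (np.full(3, 1.2), 0), (np.full(3, 0.6), 2), (np.full(3, 0.5), 3)]
new_global = semi_async_aggregate(updates)
print(new_global)
```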

“We use this semi-asynchronous approach because we want to involve the least powerful devices in the training process so they can contribute their data to the model, but we don’t want the more powerful devices in the network to stay idle for a long time and waste resources,” Tenison says.

Achieving acceleration

The researchers tested their framework in simulations with hundreds of heterogeneous devices and a variety of models and datasets. On average, FTTE enabled the training procedure to reach completion 81 percent faster than standard federated learning approaches.

Their method reduced the on-device memory overhead by 80 percent and the communication payload by 69 percent, while attaining nearly the same accuracy as other techniques.

“Because we want the model to train as fast as possible to save the battery life of these resource-constrained devices, we do have a tradeoff in accuracy. But a small drop in accuracy could be acceptable in some applications, especially since our method performs so much faster,” she says.

FTTE also demonstrated effective scalability and delivered higher performance gains for larger groups of devices.

In addition to these simulations, the researchers tested FTTE on a small network of real devices with varying computational capabilities.

“Not everyone has the latest Apple iPhone. In many developing countries, for instance, users might have less powerful mobile phones. With our technique, we can bring the benefits of federated learning to these settings,” she says.

In the future, the researchers want to study how their method could be used to increase the personalized performance of AI models on each device, rather than focusing on the average performance of the model. They also want to conduct larger experiments on real hardware.

This work was funded, in part, by a Takeda PhD Fellowship.

A faster way to estimate AI power consumption

Due to the explosive growth of artificial intelligence, data centers are estimated to consume up to 12 percent of total U.S. electricity by 2028, according to the Lawrence Berkeley National Laboratory. Improving data center energy efficiency is one way scientists are striving to make AI more sustainable.

Toward that goal, researchers from MIT and the MIT-IBM Watson AI Lab developed a rapid prediction tool that tells data center operators how much power will be consumed by running a particular AI workload on a certain processor or AI accelerator chip.

Their method produces reliable power estimates in a few seconds, unlike traditional modeling techniques that can take hours or even days to yield results. Moreover, their prediction tool can be applied to a wide range of hardware configurations — even emerging designs that haven’t been deployed yet.

Data center operators could use these estimates to effectively allocate limited resources across multiple AI models and processors, improving energy efficiency. In addition, this tool could allow algorithm developers and model providers to assess potential energy consumption of a new model before they deploy it.

“The AI sustainability challenge is a pressing question we have to answer. Because our estimation method is fast, convenient, and provides direct feedback, we hope it makes algorithm developers and data center operators more likely to think about reducing energy consumption,” says Kyungmi Lee, an MIT postdoc and lead author of a paper on this technique.

She is joined on the paper by Zhiye Song, an electrical engineering and computer science (EECS) graduate student; Eun Kyung Lee and Xin Zhang, research managers at IBM Research and the MIT-IBM Watson AI Lab; Tamar Eilam, IBM Fellow, chief scientist of sustainable computing at IBM Research, and a member of the MIT-IBM Watson AI Lab; and senior author Anantha P. Chandrakasan, MIT provost, Vannevar Bush Professor of Electrical Engineering and Computer Science, and a member of the MIT-IBM Watson AI Lab. The research is being presented this week at the IEEE International Symposium on Performance Analysis of Systems and Software.

Expediting energy estimation

Inside a data center, thousands of powerful graphics processing units (GPUs) perform operations to train and deploy AI models. The power consumption of a particular GPU will vary based on its configuration and the workload it is handling.

Many traditional methods used to predict energy consumption involve breaking a workload into individual steps and emulating how each module inside the GPU is being utilized one step at a time. But AI workloads like model training and data preprocessing are extremely large and can take hours or even days to simulate in this manner.

“As an operator, if I want to compare different algorithms or configurations to find the most energy-efficient manner to proceed, if a single emulation is going to take days, that is going to become very impractical,” Lee says.

To speed up the prediction process, the MIT researchers sought to use less-detailed information that could be estimated faster. They found that AI workloads often have many repeatable patterns. They could use these patterns to generate the information needed for reliable but quick power estimation.

In many cases, algorithm developers write programs to run as efficiently as possible on a GPU. For instance, they use well-structured optimizations to distribute the work across parallel processing cores and move chunks of data around in the most efficient manner.

“These optimizations that software developers use create a regular structure, and that is what we are trying to leverage,” explains Lee.

The researchers developed a lightweight estimation model, called EnergAIzer, that captures the power usage pattern of a GPU from those optimizations.

An accurate assessment

But while their estimation was fast, the researchers found that it didn’t take all energy costs into account. For instance, every time a GPU runs a program, there is a fixed energy cost required for setting up and configuring that program. Then each time the GPU runs an operation on a chunk of data, an additional energy cost must be paid.

Due to fluctuations in the hardware or conflicts in accessing or moving data, a GPU might not be able to use all available bandwidth, slowing operations down and drawing more energy over time.

To include these additional costs and variances, the researchers gathered real measurements from GPUs to generate correction terms they applied to their estimation model.
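In spirit, the resulting estimate combines a fixed setup cost, a per-operation cost scaled by how often the workload’s patterns repeat, and measured correction terms. A toy version of that arithmetic — our own illustrative numbers and names, not EnergAIzer’s actual model — might look like this:

```python
def estimate_energy_joules(num_ops, energy_per_op, setup_energy, utilization_correction=1.0):
    """Toy analytic model: a fixed setup/configuration cost plus a per-operation cost,
    scaled by a correction factor fit from real GPU measurements (for example,
    greater than 1.0 when bandwidth contention slows operations and draws extra energy)."""
    return setup_energy + num_ops * energy_per_op * utilization_correction

# Illustrative numbers only: two million repetitions of a common operation pattern.
estimate = estimate_energy_joules(
    num_ops=2_000_000,
    energy_per_op=1.5e-4,       # joules per operation (placeholder)
    setup_energy=50.0,          # fixed program setup cost in joules (placeholder)
    utilization_correction=1.15,
)
print(f"Estimated energy: {estimate:.1f} J")
```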

“This way, we can get a fast estimation that is also very accurate,” she says.

In the end, a user can provide their workload information, like the AI model they want to run and the number and length of user inputs to process, and EnergAIzer will output an energy consumption estimation in a matter of seconds.

The user can also change the GPU configuration or adjust the operating speed to see how such design choices impact the overall power consumption.

When the researchers tested EnergAIzer using real AI workload information from actual GPUs, it could estimate the power consumption with only about 8 percent error, which is comparable to traditional methods that can take hours to produce results.

Their method could also be used to predict the power consumption of future GPUs and emerging device configurations, as long as the hardware doesn’t change drastically in a short amount of time.

In the future, the researchers want to test EnergAIzer on the newest GPU configurations and scale the model up so it can be applied to many GPUs that are collaborating to run a workload.

“To really make an impact on sustainability, we need a tool that can provide a fast energy estimation solution across the stack, for hardware designers, data center operators, and algorithm developers, so they can all be more aware of power consumption. With this tool, we’ve taken one step toward that goal,” Lee says.

This research was funded, in part, by the MIT-IBM Watson AI Lab.

MIT scientists build the world’s largest collection of Olympiad-level math problems, and open it to everyone

Every year, the countries competing in the International Mathematical Olympiad (IMO) arrive with a booklet of their best, most original problems. Those booklets get shared among delegations, then quietly disappear. No one had ever collected them systematically, cleaned them, and made them available, not for AI researchers testing the limits of mathematical reasoning, and not for the students around the world training for these competitions largely on their own.

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), King Abdullah University of Science and Technology (KAUST), and the company HUMAIN have now done exactly that.

MathNet is the largest high-quality dataset of proof-based math problems ever created. Comprising more than 30,000 expert-authored problems and solutions spanning 47 countries, 17 languages, and 143 competitions, it is five times larger than the next-biggest dataset of its kind. The work will be presented at the International Conference on Learning Representations (ICLR) in Brazil later this month.

What makes MathNet different is not only its size, but its breadth. Previous Olympiad-level datasets draw almost exclusively from competitions in the United States and China. MathNet spans dozens of countries across six continents, covers 17 languages, includes both text- and image-based problems and solutions, and draws on four decades of competition mathematics. The goal is to capture the full range of mathematical perspectives and problem-solving traditions that exist across the global math community, not just the most visible ones.

“Every country brings a booklet of its most novel and most creative problems,” says Shaden Alshammari, an MIT PhD student and lead author on the paper. “They share the booklets with each other, but no one had made the effort to collect them, clean them, and upload them online.”

Building MathNet required tracking down 1,595 PDF volumes totaling more than 25,000 pages, spanning digital documents and decades-old scans in more than a dozen languages. A significant portion of that archive came from an unlikely source: Navid Safaei, a longtime IMO community figure and co-author who had been collecting and scanning those booklets by hand since 2006. His personal archive formed much of the backbone of the dataset.

The sourcing matters as much as the scale. Where most existing math datasets pull problems from community forums like Art of Problem Solving (AoPS), MathNet draws exclusively from official national competition booklets. The solutions in those booklets are expert-written and peer-reviewed, and they often run to multiple pages, with authors walking through several approaches to the same problem. That depth gives AI models a far richer signal for learning mathematical reasoning than the shorter, informal solutions typical of community-sourced datasets. It also means the dataset is genuinely useful for students: Anyone preparing for the IMO or a national competition now has access to a centralized, searchable collection of high-quality problems and worked solutions from traditions around the world.

“I remember so many students for whom it was an individual effort. No one in their country was training them for this kind of competition,” says Alshammari, who competed in the IMO as a student herself. “We hope this gives them a centralized place with high-quality problems and solutions to learn from.”

The team has deep roots in the IMO community. Sultan Albarakati, a co-author, currently serves on the IMO board, and the researchers are working to share the dataset with the IMO foundation directly. To validate the dataset, they assembled a grading group of more than 30 human evaluators from countries including Armenia, Russia, Ukraine, Vietnam, and Poland, who coordinated together to verify thousands of solutions.

“The MathNet database has the potential to be an excellent resource for both students and leaders seeking new problems to work on or looking for the solution to a difficult question,” says Tanish Patil, deputy leader of Switzerland’s IMO team. “Whilst other archives of Olympiad problems do exist (notably, the Contest Collections forums on AoPS), these resources lack a standardized formatting system, verified solutions, and important problem metadata such as topics and required theory. It will also be interesting to see how this dataset is used to improve the performance of reasoning models, and whether we will soon be able to reliably answer an important question when creating novel Olympiad problems: determining if a problem is truly original.”

MathNet also functions as a rigorous benchmark for AI performance, and the results reveal a more complicated picture than recent headlines about AI math prowess might suggest. Frontier models have made extraordinary progress: Some have reportedly achieved gold-medal performance at the IMO, and on standard benchmarks they now solve problems that would stump most humans. But MathNet shows that progress is uneven. Even GPT-5, the top-performing model tested, averaged around 69.3 percent on MathNet’s main benchmark of 6,400 problems, failing nearly one-in-three Olympiad-level problems. And when problems include figures, performance drops significantly across the board, exposing visual reasoning as a consistent weak point for even the most capable models.

Several open-source models scored 0 percent on Mongolian-language problems, highlighting another dimension where current AI systems fall short despite their overall strength.

“GPT models are equally good in English and other languages,” Alshammari says. “But many of the open-source models fail completely at less-common languages, such as Mongolian.”

The diversity of MathNet is also designed to address a deeper limitation in how AI models learn mathematics. When training data skews toward English and Chinese problems, models absorb a narrow slice of mathematical culture. A Romanian combinatorics problem or a Brazilian number theory problem may approach the same underlying concept from a completely different angle. Exposure to that range, the researchers argue, makes both humans and AI systems better mathematical thinkers.

Beyond problem-solving, MathNet introduces a retrieval benchmark that asks whether models can recognize when two problems share the same underlying mathematical structure, a capability that matters both for AI development and for the math community itself. Near-duplicate problems have appeared in real IMO exams over the years because finding mathematical equivalences across different notations, languages, and formats is genuinely hard, even for expert human committees. Testing eight state-of-the-art embedding models, the researchers found that even the strongest identified the correct match only about 5 percent of the time on the first try, with models frequently ranking structurally unrelated problems as more similar than equivalent ones.
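For context, evaluating that kind of retrieval typically amounts to embedding each query problem, ranking candidates by cosine similarity, and checking whether the annotated structural match is ranked first. The generic sketch below is not the paper’s code; the embedding vectors and labels are random placeholders.

```python
import numpy as np

def top1_retrieval_accuracy(query_embs, candidate_embs, true_match_idx):
    """Fraction of queries whose most-similar candidate, by cosine similarity,
    is the annotated structurally equivalent problem."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    similarities = q @ c.T                    # cosine similarity matrix
    predicted = similarities.argmax(axis=1)   # top-ranked candidate for each query
    return float(np.mean(predicted == np.asarray(true_match_idx)))

# Placeholder vectors standing in for an off-the-shelf embedding model's output.
rng = np.random.default_rng(1)
queries = rng.normal(size=(20, 64))
candidates = rng.normal(size=(200, 64))
labels = rng.integers(0, 200, size=20)
print(top1_retrieval_accuracy(queries, candidates, labels))
```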

The dataset also includes a retrieval-augmented generation benchmark, testing whether giving a model a structurally related problem before asking it to solve a new one improves performance. It does, but only when the retrieved problem is genuinely relevant. DeepSeek-V3.2-Speciale gained up to 12 percentage points with well-matched retrieval, while irrelevant retrieval degraded performance in roughly 22 percent of cases.

Alshammari wrote the paper with Safaei, HUMAIN AI engineer Abrar Zainal, KAUST Academy Director Sultan Albarakati, and MIT CSAIL colleagues: master’s student Kevin Wen SB ’25; Microsoft Principal Engineering Manager Mark Hamilton SM ’22, PhD ’25; and professors William Freeman and Antonio Torralba. Their work was funded, in part, by the Schwarzman College of Computing Fellowship and the National Science Foundation.

MathNet is publicly available at mathnet.csail.mit.edu.

Teaching AI models to say “I’m not sure”

Confidence is persuasive. In artificial intelligence systems, it is often misleading.

Today’s most capable reasoning models share a trait with the loudest voice in the room: They deliver every answer with the same unshakable certainty, whether they’re right or guessing. Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have now traced that overconfidence to a specific flaw in how these models are trained, and developed a method that fixes it without giving up any accuracy.

The technique, called RLCR (Reinforcement Learning with Calibration Rewards), trains language models to produce calibrated confidence estimates alongside their answers. In addition to coming up with an answer, the model thinks about its uncertainty in that answer, and outputs a confidence score. In experiments across multiple benchmarks, RLCR reduced calibration error by up to 90 percent while maintaining or improving accuracy, both on the tasks the model was trained on and on entirely new ones it had never seen. The work will be presented at the International Conference on Learning Representations later this month.

The problem traces to a surprisingly simple source. The reinforcement learning (RL) methods behind recent breakthroughs in AI reasoning, including the training approach used in systems like OpenAI’s o1, reward models for getting the right answer, and penalize them for getting it wrong. Nothing in between. A model that arrives at the correct answer through careful reasoning receives the same reward as one that guesses correctly by chance. Over time, this trains models to confidently answer every question they are asked, whether they have strong evidence or are effectively flipping a coin.

That overconfidence has consequences. When models are deployed in medicine, law, finance, or any setting where users make decisions based on AI outputs, a system that expresses high confidence regardless of its actual certainty becomes unreliable in ways that are difficult to detect from the outside. A model that says “I’m 95 percent sure” when it is right only half the time is more dangerous than one that simply gets the answer wrong, because users have no signal to seek a second opinion.

“The standard training approach is simple and powerful, but it gives the model no incentive to express uncertainty or say I don’t know,” says Mehul Damani, an MIT PhD student and co-lead author on the paper. “So the model naturally learns to guess when it is unsure.” 

RLCR addresses this by adding a single term to the reward function: a Brier score, a well-established measure that penalizes the gap between a model’s stated confidence and its actual accuracy. During training, models learn to reason about both the problem and their own uncertainty, producing an answer and a confidence estimate together. Confidently wrong answers are penalized. So are unnecessarily uncertain correct ones.
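Concretely, a calibration-aware reward of this kind can be written as the usual correctness reward minus a Brier penalty on the stated confidence. The sketch below is our own simplified rendering of that idea, not the paper’s exact reward function.

```python
def rlcr_style_reward(correct: bool, confidence: float) -> float:
    """Correctness reward minus a Brier penalty on the stated confidence.

    The penalty (confidence - correctness)^2 is near zero when the model is
    confident and right, or unconfident and wrong, and largest when it is
    confidently wrong."""
    y = 1.0 if correct else 0.0
    brier_penalty = (confidence - y) ** 2
    return y - brier_penalty

print(rlcr_style_reward(correct=True, confidence=0.95))    # ~0.9975: confident and right
print(rlcr_style_reward(correct=False, confidence=0.95))   # ~-0.9025: confidently wrong
print(rlcr_style_reward(correct=False, confidence=0.10))   # ~-0.01: wrong but hedged
```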

The math backs it up: the team proved formally that this type of reward structure guarantees models that are both accurate and well-calibrated. They then tested the approach on a 7-billion-parameter model across a range of question-answering and math benchmarks, including six datasets the model had never been trained on.

The results showed a consistent pattern. Standard RL training actively degraded calibration compared to the base model, making models worse at estimating their own uncertainty. RLCR reversed that effect, substantially improving calibration with no loss in accuracy. The method also outperformed post-hoc approaches, in which a separate classifier is trained to assign confidence scores after the fact. “What’s striking is that ordinary RL training doesn’t just fail to help calibration. It actively hurts it,” says Isha Puri, an MIT PhD student and co-lead author. “The models become more capable and more overconfident at the same time.”

The team also demonstrated that the confidence estimates produced by RLCR are practically useful at inference time. When models generate multiple candidate answers, selecting the one with the highest self-reported confidence, or weighting votes by confidence in a majority-voting scheme, improves both accuracy and calibration as compute scales.
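A minimal sketch of the confidence-weighted voting idea — with names and sample values of our own invention — could look like this:

```python
from collections import defaultdict

def confidence_weighted_vote(candidates):
    """Return the answer whose samples carry the most total self-reported confidence.
    `candidates` is a list of (answer, confidence) pairs generated by the model."""
    totals = defaultdict(float)
    for answer, confidence in candidates:
        totals[answer] += confidence
    return max(totals, key=totals.get)

samples = [("42", 0.9), ("41", 0.55), ("42", 0.8), ("17", 0.2)]
print(confidence_weighted_vote(samples))  # "42"
```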

An additional finding suggests that the act of reasoning about uncertainty itself has value. The researchers trained classifiers on model outputs and found that including the model’s explicit uncertainty reasoning in the input improved the classifier’s performance, particularly for smaller models. The model’s self-reflective reasoning about what it does and doesn’t know contains real information, not just decoration.

In addition to Damani and Puri, other authors on the paper are Stewart Slocum, Idan Shenfeld, Leshem Choshen, and senior authors Jacob Andreas and Yoon Kim.

Jacob Andreas and Brett McGuire named Edgerton Award winners

MIT Associate Professor Jacob Andreas of the Department of Electrical Engineering and Computer Science [EECS] and MIT Associate Professor Brett McGuire of the Department of Chemistry have been selected as the winners of the 2026 Harold E. Edgerton Faculty Achievement Award. Established in 1982 as a permanent tribute to Institute Professor Emeritus Harold E. Edgerton’s great and enduring support for younger faculty members, this award is given annually in recognition of exceptional distinction in teaching, research, and service.

“The Department of Chemistry is extremely delighted to see Brett recognized for science that has changed how we think about carbon in space,” says Class of 1942 Professor of Chemistry and Department Head Matthew D. Shoulders. “Brett’s lab combines laboratory spectroscopy, radio astronomy, and sophisticated signal-analysis methods to pull definitive molecular fingerprints out of extraordinarily faint data. His discovery of polycyclic aromatic hydrocarbons in the cold interstellar medium has opened a powerful new window on astrochemistry. Moreover, Brett is inventing the creative and unique tools that make discoveries like this possible.”

“Jacob Andreas represents the very best of MIT EECS,” says Asu Ozdaglar, EECS department head. “He is an innovative researcher whose work combines computational and linguistically informed approaches to build foundations of language learning. He is an extraordinary educator who has brought these forefront ideas into our core classes in natural language processing and machine learning. His ability to bridge foundational theory with real-world impact, while also advancing the social and ethical dimensions of computing, makes him truly deserving of the Edgerton Faculty Achievement Award.”

Andreas joined the MIT faculty in July 2019, and is affiliated with the Computer Science and Artificial Intelligence Laboratory. His work is in natural language processing (NLP), and more broadly in AI. He aims to understand the computational foundations of language learning, and to build intelligent systems that can learn from human guidance. Among other honors, Andreas has received Samsung’s AI Researcher of the Year award, MIT’s Kolokotrones and Junior Bose teaching awards, a 2024 Sloan Research Fellowship, and paper awards at the International Conference on Machine Learning and the Association for Computational Linguistics.

Andreas received his BS from Columbia University, his MPhil from Cambridge University (where he studied as a Churchill scholar), and his PhD in natural language processing from the University of California at Berkeley. His work in natural language processing has taken on thorny problems in the capability gap between humans and computers. “The defining feature of human language use is our capacity for compositional generalization,” explains Antonio Torralba, Delta Electronics Professor and faculty head of Artificial Intelligence and Decision-Making in the Department of EECS. “Many of the core challenges in natural language processing are addressed by simply training larger and larger neural models, but this kind of compositional generalization remains a persistent difficulty, and without the ability to generalize compositionally, the deep learning toolkit will never be robust enough for the most challenging real-world NLP tasks. Jacob’s work on compositional modeling draws new connections between NLP and work in computer vision and physics aimed at modeling systems governed by symmetries and other algebraic structures and, using them, they have been able to build NLP models exhibiting a number of new, human-like language acquisition behaviors, including one-shot word learning, learning via mutual exclusivity constraints, and learning of grammatical rules in extremely low-resource settings.”

Within EECS, Andreas has developed multiple advanced courses in natural language processing, as well as new exercises designed to get students to grapple with important social and ethical considerations in machine learning deployment. “Jacob has taken a leading role in completely modernizing and extending our course offerings in natural language processing,” says award nominator Leslie Pack Kaelbling, Panasonic Professor in the Department of EECS. “He has led the development of a modern two-course sequence, which is a cornerstone of the new AI+D [artificial intelligence and decision-making] major, routinely enrolling several hundred students each semester. His command of the area is broad and deep, and his classes integrate classical structural understanding of language with the most modern learning-based approaches. He has put MIT EECS on the worldwide map as a place to study natural language at every level.”

Brett McGuire joined the MIT faculty in 2020 and was promoted to associate professor in 2025. His research operates at the intersection of physical chemistry, molecular spectroscopy, and observational astrophysics, where he seeks to uncover how the chemical building blocks of life evolve alongside and help shape the birth of stars and planets. A former Jansky Fellow and then Hubble Postdoctoral Fellow at the National Radio Astronomy Observatory, McGuire has a BS in chemistry from the University of Illinois and a PhD in physical chemistry from Caltech. His honors include a 2026 Sloan Fellowship, the Beckman Young Investigator Award, the Helen B. Warner Prize for Astronomy, and the MIT Award for Teaching with Digital Technology.

The faculty who nominated McGuire for this award praised his extraordinary public outreach, his immediate willingness to take on teaching class 5.111 (Principles of Chemical Science), a General Institute Requirement (GIR) course with 150–500 students, and his service to both the MIT and astrochemical communities.

“Brett is at the very top of astrochemical scientists in his age group due to his discovery of fused carbon ring compounds in the cold region of the ISM [interstellar medium], an observation that provides a route for carbon incorporation in planets,” says Sylvia Ceyer, the John C. Sheehan Professor of Chemistry in her nomination statement. “His extensive involvement in service-oriented activities within the astrochemical/physical community is highly unusual for a junior scientist, and is testament to the value that the astronomical community places in his wisdom and judgement. His phenomenal organizational skills have made his contributions to graduate admission protocols and seminar administration at MIT the envy of the department. And most importantly, Brett is a superb teacher, who cares deeply about students’ understanding and success, not only in his course, but in their future endeavors.”

“As an assistant professor, Brett volunteered to teach 5.111, a large GIR course with 150–500 students, and has received some of the best teaching evaluations among all faculty who have led the subject,” says Mei Hong, the David A. Leighty Professor of Chemistry. “He has a natural talent in explaining abstract physical chemistry concepts in an engaging manner. His slides, which he prepared from scratch instead of modifying from previous years’ material from other professors, are clear, and … the combination of lucid explanation and humor has generated great enthusiasm and interest in chemistry among students.”

Subject evaluations from McGuire’s courses praised his humor, the clarity of his explanations, and his ability to transform a lecture into a “science show.” “I haven’t felt this sort of desire for the depth of understanding in a subject beyond just a straight grade [in some time],” says one student. “Brett definitely stimulated that love of learning for me.” 

“Brett is an outstanding faculty member who is dedicated to fostering student learning and success,” says Jennifer Weisman, assistant director of academic programs in chemistry. “He is thoughtful, caring, and goes above and beyond to help his colleagues, students, and staff.”

“I’m thrilled to be selected for the Edgerton Award this year,” says McGuire. “The award is nominally for teaching, research, and service; MIT and the chemistry department in particular have been an incredible place to learn and grow in all these areas. I’m incredibly grateful for the mentorship, enthusiasm, and support I have received from my colleagues, from my students both in the lab and in the classroom, and from the MIT community during my time here. I look forward to many more years of exciting discovery together with this one-of-a-kind community.”

Bringing AI-driven protein-design tools to biologists everywhere

Artificial intelligence is already proving it can accelerate drug development and improve our understanding of disease. But to turn AI into novel treatments we need to get the latest, most powerful models into the hands of scientists.

The problem is that most scientists aren’t machine-learning experts. Now the company OpenProtein.AI is helping scientists stay on the cutting edge of AI with a no-code platform that gives them access to powerful foundation models and a suite of tools for designing proteins, predicting protein structure and function, and training models.

The company, founded by Tristan Bepler PhD ’20 and former MIT associate professor Tim Lu PhD ’07, is already equipping researchers in pharmaceutical and biotech companies of all sizes with its tools, including internally developed foundation models for protein engineering. OpenProtein.AI also offers its platform to scientists in academia for free.

“It’s a really exciting time right now because these models can not only make protein engineering more efficient — which shortens development cycles for therapeutics and industrial uses — they can also enhance our ability to design new proteins with specific traits,” Bepler says. “We’re also thinking about applying these approaches to non-protein modalities. The big picture is we’re creating a language for describing biological systems.”

Advancing biology with AI

Bepler came to MIT in 2014 as part of the Computational and Systems Biology PhD Program, studying under Bonnie Berger, MIT’s Simons Professor of Applied Mathematics. It was there that he realized how little we understand about the molecules that make up the building blocks of biology.

“We hadn’t characterized biomolecules and proteins well enough to create good predictive models of what, say, a whole genome circuit will do, or how a protein interaction network will behave,” Bepler recalls. “It got me interested in understanding proteins at a more fine-grained level.”

Bepler began exploring ways to predict the chains of amino acids that make up proteins by analyzing evolutionary data. This was before Google DeepMind released AlphaFold, a powerful prediction model for protein structure. The work led to one of the first generative AI models for understanding and designing proteins — what the team calls a protein language model.

“I was really excited about the classical framework of proteins and the relationships between their sequence, structure, and function. We don’t understand those links well,” Bepler says. “So how could we use these foundation models to skip the ‘structure’ component and go straight from sequence to function?”

After earning his PhD in 2020, Bepler entered Lu’s lab in MIT’s Department of Biological Engineering as a postdoc.

“This was around the time when the idea of integrating AI with biology was starting to pick up,” Lu recalls. “Tristan helped us build better computational models for biologic design. We also realized there’s a disconnect between the most cutting-edge tools available and the biologists, who would love to use these things but don’t know how to code. OpenProtein came from the idea of broadening access to these tools.”

Bepler had worked at the forefront of AI as part of his PhD. He knew the technology could help scientists accelerate their work.

“We started with the idea to build a general-purpose platform for doing machine learning-in-the-loop protein engineering,” Bepler says. “We wanted to build something that was user friendly because machine-learning ideas are kind of esoteric. They require implementation, GPUs, fine-tuning, designing libraries of sequences. Especially at that time, it was a lot for biologists to learn.”

OpenProtein’s platform, in contrast, features an intuitive web interface for biologists to upload data and conduct protein engineering work with machine learning. It features a range of open-source models, including PoET, OpenProtein’s flagship protein language model.

PoET, short for Protein Evolutionary Transformer, was trained on protein groups to generate sets of related proteins. Bepler and his collaborators showed it could generalize about evolutionary constraints on proteins and incorporate new information on protein sequences without retraining, allowing other researchers to add experimental data to improve the model.

“Researchers can use their own data to train models and optimize protein sequences, and then they can use our other tools to analyze those proteins,” Bepler says. “People are generating libraries of protein sequences in silico [on computers] and then running them through predictive models to get validation and structural predictors. It’s basically a no-code front-end, but we also have APIs for people who want to access it with code.”

The models help researchers design proteins faster, then decide which ones are promising enough for further lab testing. Researchers can also input proteins of interest, and the models can generate new ones with similar properties.

Since its founding, OpenProtein’s team has continued to add tools to its platform for researchers regardless of their lab size or resources.

“We’ve tried really hard to make the platform an open-ended toolbox,” Bepler says. “It has specific workflows, but it’s not tied specifically to one protein function or class of proteins. One of the great things about these models is they are very good at understanding proteins broadly. They learn about the whole space of possible proteins.”

Enabling the next generation of therapies

The large pharmaceutical company Boehringer Ingelheim began using OpenProtein’s platform in early 2025. Recently, the companies announced an expanded collaboration that will see OpenProtein’s platform and models embedded into Boehringer Ingelheim’s work as it engineers proteins to treat diseases like cancer and autoimmune or inflammatory conditions.

Last year, OpenProtein also released a new version of its protein language model, PoET-2, that outperforms much larger models while using a small fraction of the computing resources and experimental data.

“We really want to solve the question of how we describe proteins,” Bepler says. “What’s the meaningful, domain-specific language of protein constraints we use as we generate them? How can we bring in more evolutionary constraints? How can we describe an enzymatic reaction a protein carries out such that a model can generate sequences to do that reaction?”

Moving forward, the founders are hoping to make models that factor in the changing, interconnected nature of protein function.

“The area I am excited about is going beyond protein binding events to use these models to predict and design dynamic features, where the protein has to engage two, three, or four biological mechanisms at the same time, or change its function after binding,” says Lu, who currently serves in an advisory role for the company.

As progress in AI races forward, OpenProtein continues to see its mission as giving scientists the best tools to develop new treatments faster.

“As work gets more complex, with approaches incorporating things like protein logic and dynamic therapies, the existing experimental toolsets become limiting,” Lu says. “It’s really important to create open ecosystems around AI and biology. There’s a risk that AI resources could get so concentrated that the average researcher can’t use them. Open access is super important for the scientific field to make progress.”

Human-machine teaming dives underwater

The electricity to an island goes out. To find the break in the underwater power cable, a ship pulls up the entire line or deploys remotely operated vehicles (ROVs) to traverse the line. But what if an autonomous underwater vehicle (AUV) could map the line and pinpoint the location of the fault for a diver to fix?

Such underwater human-robot teaming is the focus of an MIT Lincoln Laboratory project funded through an internally administered R&D portfolio on autonomous systems and carried out by the Advanced Undersea Systems and Technology Group. The project seeks to leverage the respective strengths of humans and robots to optimize maritime missions for the U.S. military, including critical infrastructure inspection and repair, search and rescue, harbor entry, and countermine operations.

“Divers and AUVs generally don’t team at all underwater,” says principal investigator Madeline Miller. “Underwater missions requiring humans typically do so because they involve some sort of manipulation a robot can’t do, like repairing infrastructure or deactivating a mine. Even ROVs are challenging to work with underwater in very skilled manipulation tasks because the manipulators themselves aren’t agile enough.”

Beyond their superior dexterity, humans excel at recognizing objects underwater. But humans working underwater can’t perform complex computations or move very quickly, especially if they are carrying heavy equipment; robots have an edge over humans in processing power, high-speed mobility, and endurance. To combine these strengths, Miller and her team are developing hardware and algorithms for underwater navigation and perception — two key capabilities for effective human-robot teaming.

As Miller explains, divers may only have a compass and fin-kick counts to guide them. With few landmarks and potentially murky conditions caused by a lack of light at depth or the presence of biological matter in the water column, they can easily become disoriented and lost. For robots to help divers navigate, they need to perceive their environment. However, in the presence of darkness and turbidity, optical sensors (cameras) cannot generate images, while acoustic sensors (sonar) generate images that lack color and only show the shapes and shadows of objects in the scene. The historical lack of large, labeled sonar image datasets has hindered training of underwater perception algorithms. Even if data were available, the dynamic ocean can obscure the true nature of objects, confusing artificial intelligence. For instance, a downed aircraft broken into multiple pieces, or a tire covered in an overgrowth of mussels, may no longer resemble an aircraft or tire, respectively.

“Ultimately, we want to devise solutions for navigation and perception in expeditionary environments,” Miller says. “For the missions we’re thinking about, there is limited or no opportunity to map out the area in advance. For the harbor entry mission, maybe you have a satellite map but no underwater map, for example.”

On the navigation side, Miller’s team picked up on work started by the MIT Marine Robotics Group, led by John Leonard, to develop diver-AUV teaming algorithms. With their navigation algorithms, Leonard’s group ran simulations under optimal conditions and performed field testing in calm waters using human-paddled kayaks as proxies for both divers and AUVs. Miller’s team then integrated these algorithms into a mission-relevant AUV and began testing them under more realistic ocean conditions, initially with a support boat acting as a diver surrogate, and then with actual divers.

“We quickly learned that you need more sensing capabilities on the diver when you factor in ocean currents,” Miller explains. “With the algorithms demonstrated by MIT, the vehicle only needed to calculate the distance, or range, to the diver at regular intervals to solve the optimization problem of estimating the positions of both the vehicle and diver over time. But with the real ocean forces pushing everything around, this optimization problem blows up quickly.”
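The optimization Miller refers to can be pictured as a smoothing problem: every acoustic range between the AUV and the diver adds one constraint, and dead-reckoned motion (propulsion odometry for the vehicle, fin-kick counts for the diver) supplies the rest. The sketch below is a simplified illustration of that idea rather than the team's actual algorithm; it jointly estimates both 2D trajectories from noisy ranges and noisy odometry with a nonlinear least-squares solver, and every number in it is an assumption chosen for illustration.

```python
# Illustrative sketch of diver-AUV range-only smoothing (not the Lincoln
# Laboratory implementation). Both 2D trajectories are estimated jointly from
# noisy dead-reckoning and noisy AUV-to-diver acoustic ranges.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
T = 20                                           # number of time steps

# Ground-truth motion, used here only to simulate measurements.
auv_steps = np.tile([2.0, 0.0], (T - 1, 1))      # AUV moves 2 m east per step
diver_steps = np.tile([1.5, 0.5], (T - 1, 1))    # diver drifts northeast
auv_true = np.cumsum(np.vstack([[0.0, 0.0], auv_steps]), axis=0)
diver_true = np.cumsum(np.vstack([[0.0, 10.0], diver_steps]), axis=0)

# Simulated measurements: noisy odometry and noisy acoustic ranges.
auv_odom = auv_steps + rng.normal(0, 0.05, auv_steps.shape)
diver_odom = diver_steps + rng.normal(0, 0.30, diver_steps.shape)   # fin kicks drift more
ranges = np.linalg.norm(auv_true - diver_true, axis=1) + rng.normal(0, 0.3, T)

def residuals(x):
    auv = x[:2 * T].reshape(T, 2)
    diver = x[2 * T:].reshape(T, 2)
    return np.concatenate([
        (np.diff(auv, axis=0) - auv_odom).ravel(),        # AUV odometry factors
        (np.diff(diver, axis=0) - diver_odom).ravel(),    # diver odometry factors
        np.linalg.norm(auv - diver, axis=1) - ranges,     # range factors
        auv[0] - auv_true[0],                             # known entry points anchor
        diver[0] - diver_true[0],                         #   the otherwise ambiguous solution
    ])

# Initialize with pure dead reckoning from the entry points, then optimize.
x0 = np.concatenate([
    np.cumsum(np.vstack([auv_true[0], auv_odom]), axis=0).ravel(),
    np.cumsum(np.vstack([diver_true[0], diver_odom]), axis=0).ravel(),
])
sol = least_squares(residuals, x0)
diver_est = sol.x[2 * T:].reshape(T, 2)
print("mean diver position error (m):", np.linalg.norm(diver_est - diver_true, axis=1).mean())
```

In this toy version the dead-reckoned initial guess is already reasonable and the ranges mostly tighten it; the ocean forces Miller mentions show up as rapidly growing odometry error, which is what makes the real problem so much harder.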

On the perception side, Miller’s team has been developing an AI classifier that can process both optical and sonar data mid-mission and solicit human input for any objects classified with uncertainty.

“The idea is for the classifier to pass along some information — say, a bounding box around an image — to the diver and indicate, ‘I think this is a tire, but I’m not sure. What do you think?’ Then, the diver can respond, ‘Yes, you’ve got it right,’ or ‘No, look over here in the image to improve your classification,’” Miller says.
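In software terms, this interaction is a confidence-gated triage rule: detections above a confidence threshold are accepted automatically, while uncertain ones are packaged into a compact query for the diver. The sketch below illustrates that rule only; the detection format, class names, and threshold are assumptions, not the project's actual interface.

```python
# Minimal sketch of confidence-gated human-in-the-loop triage (illustrative
# assumptions throughout; not the laboratory's actual system).
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # classifier's best guess, e.g. "tire"
    confidence: float   # probability assigned to that guess
    bbox: tuple         # (x, y, width, height) in image pixels

CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff for asking the diver

def triage(detections):
    """Auto-accept confident detections; queue uncertain ones as diver queries."""
    accepted, queries = [], []
    for det in detections:
        if det.confidence >= CONFIDENCE_THRESHOLD:
            accepted.append(det)
        else:
            # Only a small crop and a short question would go over the acoustic
            # link, not the full-resolution image.
            queries.append({"question": f"I think this is a {det.label}. What do you think?",
                            "bbox": det.bbox})
    return accepted, queries

accepted, queries = triage([Detection("tire", 0.95, (40, 60, 32, 32)),
                            Detection("aircraft debris", 0.55, (120, 30, 80, 50))])
print(len(accepted), "auto-accepted;", len(queries), "queued for the diver")
```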

This feedback loop requires an underwater acoustic modem to support diver-AUV communication. At state-of-the-art data rates for underwater acoustic communications, sending an uncompressed image from the AUV to the diver would take tens of minutes. So one aspect the team is investigating is how to compress information down to the minimum needed to be useful, working within the low bandwidth and high latency of underwater communications and the low size, weight, and power of the commercial off-the-shelf (COTS) hardware they’re using. For their prototype system, the team procured mostly COTS sensors and built a sensor payload designed to integrate easily into an AUV routinely employed by the U.S. Navy, with the goal of facilitating technology transition. Beyond sonar and optical sensors, the payload features an acoustic modem for ranging to the diver and several data-processing and compute boards.
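The “tens of minutes” figure is easy to sanity-check with a back-of-envelope calculation. The numbers below are illustrative assumptions (a link on the order of 10 kilobits per second and an uncompressed frame of roughly 3 megabytes), not measurements of the team's modem, but they show why sending a small bounding-box crop instead of a full image changes the picture entirely:

```python
# Back-of-envelope transfer times over an acoustic modem (assumed figures).
image_bits = 3e6 * 8        # ~3 MB uncompressed camera frame
link_rate_bps = 10e3        # ~10 kbit/s acoustic link
print(image_bits / link_rate_bps / 60, "minutes")   # -> 40.0 minutes for the full frame

crop_bits = 64 * 64 * 8     # one 8-bit grayscale bounding-box crop
print(crop_bits / link_rate_bps, "seconds")         # -> ~3.3 seconds for a crop
```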

Miller’s team has tested the sensor-equipped AUV and algorithms around coastal New England — including in the open ocean near Portsmouth, New Hampshire, with the University of New Hampshire’s (UNH) Gulf Surveyor and Gulf Challenger coastal research vessels as diver surrogates, and on the Boston-area Charles River, with an MIT Sailing Pavilion skiff as the surrogate.

“The UNH boats are well-equipped and can access realistic ocean conditions. But pretending to be a diver with a large boat is hard. With the skiff, we can move more slowly and get the relative motion in tune with how a diver and AUV would navigate together.”

Last summer, the team started testing equipment with human divers at Michigan Technological University’s Great Lakes Research Center. Although the divers lacked an interface to feed back information to the AUV, each swam holding the team’s tube-shaped prototype tablet, dubbed a “tube-let.” The tube-let was equipped with a pressure and depth sensor, inertial measurement unit (to track relative motion), and ranging modem — all necessary components for the navigation algorithms to solve the optimization problem.

“A challenge during testing was coordinating the motion of the diver and vehicle, because they don’t yet collaborate,” Miller says. “Once the divers go underwater, there is no communication with the team on the surface. So, you have to plan where to put the diver and vehicle so they don’t collide.”

The team also worked on the perception problem. The water clarity of the Great Lakes at that time of year allowed for underwater imaging with an optical sensor. Caroline Keenan, a Lincoln Scholars Program PhD student jointly working in the laboratory’s Advanced Undersea Systems and Technology Group and Leonard’s research group at MIT, took the opportunity to advance her work on knowledge transfer from optical sensors to sonar sensors. She is exploring whether optical classifiers can train sonar classifiers to recognize objects for which sonar data doesn’t exist. The motivation is to reduce the human operator load associated with labeling sonar data and training sonar classifiers.
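One common way to realize this kind of optical-to-sonar knowledge transfer is teacher-student distillation on co-registered image pairs: a trained optical classifier pseudo-labels each scene, and a sonar classifier is trained to reproduce those labels from the sonar view alone. The sketch below shows a generic version of that loop with placeholder models and random stand-in data; it is an illustration under stated assumptions, not Keenan's method.

```python
# Generic cross-modal distillation sketch (illustrative; not the actual
# research code). A frozen optical "teacher" pseudo-labels co-registered
# scenes; a sonar "student" learns to reproduce those labels from sonar data.
import torch
import torch.nn.functional as F

NUM_CLASSES = 4  # assumed object classes, e.g. tire, cable, debris, clutter

# Placeholder models: a 3-channel optical teacher and a 1-channel sonar student.
optical_teacher = torch.nn.Sequential(torch.nn.Flatten(),
                                      torch.nn.Linear(3 * 64 * 64, NUM_CLASSES)).eval()
sonar_student = torch.nn.Sequential(torch.nn.Flatten(),
                                    torch.nn.Linear(1 * 64 * 64, NUM_CLASSES))
optimizer = torch.optim.Adam(sonar_student.parameters(), lr=1e-3)

def distillation_step(optical_batch, sonar_batch):
    """Train the sonar student on soft pseudo-labels from the optical teacher."""
    with torch.no_grad():
        teacher_probs = F.softmax(optical_teacher(optical_batch), dim=1)
    student_log_probs = F.log_softmax(sonar_student(sonar_batch), dim=1)
    loss = -(teacher_probs * student_log_probs).sum(dim=1).mean()  # soft cross-entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# One step on random stand-ins for a co-registered optical/sonar batch.
loss = distillation_step(torch.randn(8, 3, 64, 64), torch.randn(8, 1, 64, 64))
print("distillation loss:", loss)
```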

With the internally funded research program coming to an end, Miller’s team is now seeking external sponsorship to refine and transition the technology to military or commercial partners.

“The modern world runs on undersea telecommunication and power cables, which are vulnerable to attack by disruptive actors. The undersea domain is becoming increasingly contested as more nations develop and advance the capabilities of autonomous maritime systems. Maintaining global economic security and U.S. strategic advantage in the undersea domain will require leveraging and combining the best of AI and human capabilities,” Miller says.

Q&A: MIT SHASS and the future of education in the age of AI

The MIT School of Humanities, Arts, and Social Sciences (SHASS) was founded in 1950 in response to “a new era emerging from social upheaval and the disasters of war,” as outlined in the 1949 Lewis Committee Report.

The report’s findings emphasized MIT’s role and responsibility in the new nuclear age, which called for doubling down on genuine “integration” of scientific and technical topics with humanistic scholarship and teaching. Only that way, the committee wrote, could MIT tackle “the most difficult and complicated problems confronting our generation.”

As SHASS marks its 75th anniversary, Dean Agustín Rayo answers questions about why developing students with broad minds and human understanding is as urgent as ever amid the pressing challenges of a new technological revolution.

Q: Many universities are responding to artificial intelligence by launching new technical programs or updating curricula. You’ve suggested the change is deeper than that. Why?

A: Artificial intelligence isn’t just changing the way students learn — it’s transforming every aspect of society. The labor market is experiencing a dramatic shift, upending traditional paths to financial stability. And AI is changing the ways we bring meaning to our lives: the ways we build relationships, the ways we pay attention, and the things we enjoy doing.

The upshot is that the most important question universities need to ask is not how to adapt our pedagogy to AI — although we certainly need to address that. The most important question we need to ask is how to provide an education that brings real value to students in the age of AI. 

We need to ensure that universities provide students with the tools they need to find a path to financial security and to build meaningful lives.

We need to produce students with minds that are both nimble and broad. We need our students to not only be able to execute tasks effectively, but also have the judgment to determine which tasks are worth executing. We need students who have a moral compass, and who understand how the world works, in all of its political, economic, and human complexity. We need students who know how to think critically, and who have excellent communication and leadership skills.

Q: What role do the humanities, arts, and social sciences play in preparing MIT students for that future?

A: They’re essential, and are rightly a core part of an MIT education: MIT has long required its undergraduates to take at least eight courses in HASS disciplines to graduate.

Fields like philosophy, political science, economics, literature, history, music, and anthropology are crucial to developing the parts of our lives that are essentially human — the parts that will not be replaced by AI.

They are crucial to developing critical thinking and a moral compass. They are crucial to understanding people — our values, institutions, cultures, and ways of thinking. They are crucial to creating students who are broad thinkers who understand the way the world works. They are crucial to developing students who are excellent communicators and are able to describe their projects — and their lives — in a way that endows them with meaning.

Our students understand this. Here is how one of them put the point: “Engineering gives me the tools to measure the world; the humanities teach me how to interpret it. That balance has shaped both how I do science and why I do it.”

Q: Some people worry that emphasizing humanistic study could dilute MIT’s technological edge. How do you respond to that concern?

A: I think the opposite is true. 

MIT is an important engine for social mobility in the United States, and a catalyst for entrepreneurship, which has added billions of dollars to the American economy. That cannot be separated from the fact that we are a technical institution, which brings together the country’s most talented undergraduates — regardless of socioeconomic background — and transforms them into the next generation of our country’s top scientific and engineering leaders. 

MIT plays an incredibly important role in our country. So, the last thing I want to do is mess with our secret sauce.

But I also think that the age of AI is forcing us to rethink what it means to be a top engineer. 

Think about artificial intelligence itself. The challenges we face are not just technical. Issues like bias, accountability, governance, and the societal impact of automation are no less important. Understanding those dimensions helps technologists design better systems and anticipate real-world consequences.

Strengthening the humanities at MIT isn’t a departure from our core mission — it’s a way of ensuring that our technical leadership continues to matter in the world.

Q: What kinds of changes is MIT SHASS pursuing to support this vision?

A: There’s a lot going on! 

We’ve launched the MIT Human Insight Collaborative (MITHIC) as a way of strengthening research in the humanities, arts, and social sciences, and of deepening collaboration with colleagues across MIT.

We’re shaping the undergraduate experience to ensure that every MIT student engages with the big societal questions shaping our time, from democratic resilience to climate change to the ethics of new technologies.

We’re building stronger connections through initiatives like the creation of shared faculty positions with the MIT Schwarzman College of Computing (SCC). And we recently launched a new Music Technology and Computation Graduate Program with the School of Engineering.

We’re partnering with SERC (the SCC’s Social and Ethical Responsibilities of Computing) to design new classes on the intersection of computing and human-centered issues, such as ethics.

And we’re elevating the humanities — for their own sake, and as a space for experimentation, bringing together students, faculty, and partners to explore new forms of research, teaching, and public engagement.

This is a very exciting time for SHASS.