How Joseph Paradiso’s sensing innovations bridge the arts, medicine, and ecology

Joseph Paradiso thinks that the most engaging research questions usually span disciplines. 

Paradiso was trained as a physicist and completed his PhD in experimental high-energy physics at MIT in 1981. His father was a photographer and filmmaker working at MIT, MIT Lincoln Laboratory, and the MITRE Corporation, so he grew up in a house where artists, scientists, and engineers regularly gathered and interesting music was always playing. 

That mix of influences led him to the MIT Media Lab, where he is the Alexander W. Dreyfoos Professor, academic head of the Program in Media Arts and Sciences, and director of the Responsive Environments research group.

At the Media Lab, Paradiso studies many kinds of sensing and applies them in diverse, often extreme settings. He develops technologies that can efficiently capture and process multiple sensing modalities, and he leverages this capability in domains such as the internet of things, medicine, environmental sensing, space exploration, and artistic expression. The goal is to use the resulting information to help people better understand the world, express themselves, and connect with one another.

Early in his career, Paradiso helped pioneer the field of wireless wearable sensing. He built many systems with multiple embedded sensors that could send information from the human body in real time. One of his early flagship projects in this area, fielded in 1997, was a pair of shoes for real-time augmented dance performance. Each shoe embedded 16 sensors, allowing wearers’ movements to directly generate music through algorithmic mapping. Throughout his career, Paradiso’s research at the Media Lab has consistently focused on sensing and on using that information in new ways.

“When I would list all the sensors … people would laugh. But now, my watch is measuring most of these things,” Paradiso notes. “The world has moved.” 

That progression from early prototypes to everyday technology helped lay the groundwork for devices people now use regularly to track activity, health, and performance.

As sensing systems improved, Paradiso expanded his work from individuals to groups. He developed platforms that allowed dance ensembles to create music together through their collective motion. Achieving this required Paradiso and his team to develop new ways for compact wearable devices to communicate wirelessly at high speed, devise new approaches to real-time data processing, and extend the range of available microelectromechanical systems (MEMS) sensors.

Those same sensing platforms were later adapted for sports medicine in 2006. Working with doctors who support elite athletes, Paradiso’s team used an array of compact wearable sensors to capture large amounts of high-speed motion data from multiple points on the body. The aim was to help clinicians assess injury risk, performance, and recovery on the go, without the complex equipment typically associated with biomechanical monitoring and clinical settings.

More recently, Paradiso’s research has extended beyond humans. Through collaborations with National Geographic Explorers, his team has deployed sensors in remote environments to study animal behavior. These include low-power, compact wearable devices that sense the environmental conditions around an animal and track its movements (currently on lions and hyenas in Botswana and goats in Chile), as well as acoustic sensors with onboard AI that detect and monitor populations of endangered honeybees in Patagonia. This work provides new ways to understand how ecosystems function and how the planet is changing.

Paradiso was named an IEEE Fellow in January in recognition of his achievements in wireless wearable sensing and mobile energy harvesting. Fellow is the highest grade of membership in IEEE, the world’s leading professional association dedicated to advancing technology for the benefit of humanity.

Across art, health, and the natural world, Paradiso’s work reflects how foundational research at MIT can seed technologies that ripple outward over time, shaping new applications and opening new fields. As advances in wearable technologies drive the rush toward the ever-more-connected human, a persistent existential question lurks. 

“Where do I stop, versus others begin?” Paradiso asks. 

For him, the aim is not novelty for its own sake, but amplification: using technology to help people become more perceptive, better connected, and more aware of their place in a larger system.

Improving AI models’ ability to explain their predictions

In high-stakes settings like medical diagnostics, users often want to know what led a computer vision model to make a certain prediction, so they can determine whether to trust its output.

Concept bottleneck modeling is one method that enables artificial intelligence systems to explain their decision-making process. It forces a deep-learning model to make its prediction using a set of concepts that humans can understand. In new research, MIT computer scientists developed a method that coaxes the model to achieve better accuracy and clearer, more concise explanations.

The concepts the model uses are usually defined in advance by human experts. For instance, a clinician could suggest the use of concepts like “clustered brown dots” and “variegated pigmentation” to predict that a medical image shows melanoma.

But previously defined concepts could be irrelevant or lack sufficient detail for a specific task, reducing the model’s accuracy. The new method extracts concepts the model has already learned while it was trained to perform that particular task, and forces the model to use those, producing better explanations than standard concept bottleneck models.

The approach utilizes a pair of specialized machine-learning models that automatically extract knowledge from a target model and translate it into plain-language concepts. In the end, their technique can convert any pretrained computer vision model into one that can use concepts to explain its reasoning.

“In a sense, we want to be able to read the minds of these computer vision models. A concept bottleneck model is one way for users to tell what the model is thinking and why it made a certain prediction. Because our method uses better concepts, it can lead to higher accuracy and ultimately improve the accountability of black-box AI models,” says lead author Antonio De Santis, a graduate student at Polytechnic University of Milan who completed this research while a visiting graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT.

He is joined on a paper about the work by Schrasing Tong SM ’20, PhD ’26; Marco Brambilla, professor of computer science and engineering at Polytechnic University of Milan; and senior author Lalana Kagal, a principal research scientist in CSAIL. The research will be presented at the International Conference on Learning Representations.

Building a better bottleneck

Concept bottleneck models (CBMs) are a popular approach for improving AI explainability. These techniques add an intermediate step by forcing a computer vision model to predict the concepts present in an image, then use those concepts to make a final prediction.

This intermediate step, or “bottleneck,” helps users understand the model’s reasoning.

For example, a model that identifies bird species could select concepts like “yellow legs” and “blue wings” before predicting a barn swallow.
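To make the two-stage structure concrete, here is a minimal sketch of a concept bottleneck in plain Python. The concept names, class list, and all weights are hypothetical stand-ins for trained parameters, not values from the MIT work. The point is only that the final label is computed from concept scores alone, so the concepts that contribute most to the winning label double as an explanation.

```python
import math

# Hypothetical concept vocabulary and classes, for illustration only.
CONCEPTS = ["yellow legs", "blue wings", "forked tail", "red throat"]
CLASSES = ["barn swallow", "blue jay"]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def predict_concepts(features, concept_weights):
    """Stage 1 (the bottleneck): image features -> probability of each concept."""
    return [sigmoid(sum(w * f for w, f in zip(row, features))) for row in concept_weights]

def predict_label(concepts, class_weights):
    """Stage 2: the final label is computed from the concept scores alone."""
    scores = [sum(w * c for w, c in zip(row, concepts)) for row in class_weights]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CLASSES[best], best

def explain(concepts, class_weights, label_index, top=2):
    """Rank concepts by their contribution (weight x probability) to the chosen label."""
    contrib = [(name, w * c) for name, w, c in
               zip(CONCEPTS, class_weights[label_index], concepts)]
    return sorted(contrib, key=lambda t: t[1], reverse=True)[:top]

# Made-up weights standing in for trained parameters.
concept_weights = [[4, 0, 0], [0, 4, 0], [0, 0, 4], [-4, 0, 0]]
class_weights = [[0.5, 1.0, 2.0, 1.0],    # barn swallow
                 [0.0, 2.0, -1.0, -1.0]]  # blue jay

features = [1.0, 0.0, 1.0]  # hypothetical image embedding
concepts = predict_concepts(features, concept_weights)
label, idx = predict_label(concepts, class_weights)
print(label, explain(concepts, class_weights, idx))
```

Because stage 2 never sees the raw features, anything the model "knows" must pass through the named concepts, which is what makes the explanation readable.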

But because these concepts are often generated in advance by humans or large language models (LLMs), they might not fit the specific task. In addition, even when given a set of predefined concepts, the model sometimes relies on undesirable learned information anyway, a problem known as information leakage.

“These models are trained to maximize performance, so the model might secretly use concepts we are unaware of,” De Santis explains.

The MIT researchers had a different idea: Since the model has been trained on a vast amount of data, it may have learned the concepts needed to generate accurate predictions for the particular task at hand. They sought to build a CBM by extracting this existing knowledge and converting it into text a human can understand.

In the first step of their method, a specialized deep-learning model called a sparse autoencoder selectively takes the most relevant features the model learned and reconstructs them into a handful of concepts. Then, a multimodal LLM describes each concept in plain language.

This multimodal LLM also annotates images in the dataset by identifying which concepts are present and absent in each image. The researchers use this annotated dataset to train a concept bottleneck module to recognize the concepts.

They incorporate this module into the target model, forcing it to make predictions using only the set of learned concepts the researchers extracted.

Controlling the concepts

They overcame many challenges as they developed this method, from ensuring the LLM annotated concepts correctly to determining whether the sparse autoencoder had identified human-understandable concepts.

To prevent the model from using unknown or unwanted concepts, they restrict it to using only five concepts for each prediction. This also forces the model to choose the most relevant concepts and makes the explanations more understandable.
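The article does not say how the five-concept limit is enforced. One simple mechanism, assumed here purely for illustration, is a top-k mask: keep the k highest concept scores, zero the rest, and let the final layer see only what survives. The concept names, scores, and weights below are hypothetical.

```python
# Hypothetical names for concepts extracted from a trained model.
CONCEPT_NAMES = [f"concept_{i}" for i in range(8)]

def top_k_indices(scores, k=5):
    """Indices of the k highest concept scores, returned in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def predict_with_k_concepts(scores, class_weights, k=5):
    """Linear prediction that can only see the top-k concepts; others are zeroed."""
    keep = set(top_k_indices(scores, k))
    masked = [s if i in keep else 0.0 for i, s in enumerate(scores)]
    logits = [sum(w * s for w, s in zip(row, masked)) for row in class_weights]
    label = max(range(len(logits)), key=lambda c: logits[c])
    used = [CONCEPT_NAMES[i] for i in sorted(keep)]
    return label, used

scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.6, 0.05]
class_weights = [
    [1, 1, 1, 1, 1, 1, 1, 1],     # class 0 draws on many concepts
    [20, 0, 0, 0, 20, 0, 0, 20],  # class 1 relies only on low-ranked concepts
]
label, used = predict_with_k_concepts(scores, class_weights)
print(label, used)
```

Without the mask, class 1 would win on evidence from low-ranked concepts; with it, that evidence never reaches the prediction, and the explanation is exactly the five concepts listed in `used`.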

When they compared their approach to state-of-the-art CBMs on tasks like predicting bird species and identifying skin lesions in medical images, their method achieved the highest accuracy while providing more precise explanations.

Their approach also generated concepts that were more applicable to the images in the dataset. 

“We’ve shown that extracting concepts from the original model can outperform other CBMs, but there is still a tradeoff between interpretability and accuracy that needs to be addressed. Black-box models that are not interpretable still outperform ours,” De Santis says.

In the future, the researchers want to study potential solutions to the information leakage problem, perhaps by adding more concept bottleneck modules so unwanted concepts can’t leak through. They also plan to scale up their method by using a larger multimodal LLM to annotate a bigger training dataset, which could boost performance.

“I’m excited by this work because it pushes interpretable AI in a very promising direction and creates a natural bridge to symbolic AI and knowledge graphs,” says Andreas Hotho, professor and head of the Data Science Chair at the University of Würzburg, who was not involved with this work. “By deriving concept bottlenecks from the model’s own internal mechanisms rather than only from human-defined concepts, it offers a path toward explanations that are more faithful to the model and opens many opportunities for follow-up work with structured knowledge.”

This research was supported by the Progetto Rocca Doctoral Fellowship, the Italian Ministry of University and Research under the National Recovery and Resilience Plan, Thales Alenia Space, and the European Union under the NextGenerationEU project.

A “ChatGPT for spreadsheets” helps solve difficult engineering challenges faster

Many engineering challenges come down to the same headache — too many knobs to turn and too few chances to test them. Whether tuning a power grid or designing a safer vehicle, each evaluation can be costly, and there may be hundreds of variables that could matter.

Consider car safety design. Engineers must integrate thousands of parts, and many design choices can affect how a vehicle performs in a collision. Classic optimization tools can struggle to search for the best combination.

MIT researchers developed a new approach that rethinks how a classic method, known as Bayesian optimization, can be used to solve problems with hundreds of variables. In tests on realistic engineering-style benchmarks, like power-system optimization, the approach found top solutions 10 to 100 times faster than widely used methods.

Their technique leverages a foundation model trained on tabular data that automatically identifies the variables that matter most for improving performance, repeating the process to home in on better and better solutions. Foundation models are huge artificial intelligence systems trained on vast, general datasets, which allows them to adapt to different applications.

The researchers’ tabular foundation model does not need to be constantly retrained as it works toward a solution, increasing the efficiency of the optimization process. The technique also delivers greater speedups for more complicated problems, so it could be especially useful in demanding applications like materials development or drug discovery.

“Modern AI and machine-learning models can fundamentally change the way engineers and scientists create complex systems. We came up with one algorithm that can not only solve high-dimensional problems, but is also reusable so it can be applied to many problems without the need to start everything from scratch,” says Rosen Yu, a graduate student in computational science and engineering and lead author of a paper on this technique.

Yu is joined on the paper by Cyril Picard, a former MIT postdoc and research scientist, and Faez Ahmed, associate professor of mechanical engineering and a core member of the MIT Center for Computational Science and Engineering. The research will be presented at the International Conference on Learning Representations.

Improving a proven method

When scientists need to solve a multifaceted problem but evaluating each candidate is expensive, like crash testing a car to learn how good each design is, they often use a tried-and-true method called Bayesian optimization. This iterative method finds the best configuration for a complicated system by building a surrogate model that estimates how untested configurations will perform, then uses both those estimates and their uncertainty to decide what to explore next.
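A minimal sketch of the classic loop, on a one-dimensional toy problem, may help. It uses a Gaussian-process surrogate with a squared-exponential kernel and an upper-confidence-bound (UCB) rule for choosing the next point; these are common textbook choices assumed here for illustration, not the MIT team's setup (which replaces the surrogate with a tabular foundation model). The toy objective stands in for an expensive evaluation such as a crash test.

```python
import math
import random

def rbf(a, b, length=0.2):
    """Squared-exponential kernel: nearby inputs are assumed to behave similarly."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_gp(xs, ys, noise=1e-6):
    """Fit a Gaussian-process surrogate; returns a predictor giving (mean, std)."""
    n = len(xs)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    # Columns of K^-1, found by solving against unit vectors (K is symmetric).
    K_inv = [solve(K, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]

    def predict(x):
        k = [rbf(x, a) for a in xs]
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        quad = sum(k[i] * K_inv[i][j] * k[j] for i in range(n) for j in range(n))
        return mean, math.sqrt(max(rbf(x, x) - quad, 1e-12))
    return predict

def bayes_opt(f, n_init=3, n_iter=12, beta=2.0, seed=0):
    """Maximize a costly black-box f on [0, 1] with a GP surrogate and UCB."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]
    ys = [f(x) for x in xs]
    candidates = [i / 200 for i in range(201)]
    for _ in range(n_iter):
        predict = fit_gp(xs, ys)  # classic BO: refit the surrogate every iteration

        def ucb(x):
            mean, std = predict(x)
            return mean + beta * std  # favor high predictions AND high uncertainty

        x_next = max(candidates, key=ucb)
        xs.append(x_next)
        ys.append(f(x_next))  # the expensive evaluation (e.g., a crash simulation)
    i_best = max(range(len(ys)), key=lambda i: ys[i])
    return xs[i_best], ys[i_best]

best_x, best_y = bayes_opt(lambda x: -(x - 0.7) ** 2)
print(best_x, best_y)
```

Note that the surrogate is refit from scratch on every iteration; with hundreds of variables and many evaluations, that refitting is what becomes expensive.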

But the surrogate model must be retrained after each iteration, which can quickly become computationally intractable when the space of potential solutions is very large. In addition, scientists need to build a new model from scratch any time they want to tackle a different scenario.

To address both shortcomings, the MIT researchers utilized a generative AI system known as a tabular foundation model as the surrogate model inside a Bayesian optimization algorithm.

“A tabular foundation model is like a ChatGPT for spreadsheets. The input and output of these models are tabular data, which in the engineering domain is much more common to see and use than language,” Yu says.

Just like large language models such as ChatGPT, Claude, and Gemini, the model has been pretrained on an enormous amount of tabular data. This makes it well equipped to tackle a range of prediction problems. In addition, the model can be deployed as-is, without any retraining.

To make their system more accurate and efficient for optimization, the researchers employed a trick that enables the model to identify features of the design space that will have the biggest impact on the solution.

“A car might have 300 design criteria, but not all of them are the main driver of the best design if you are trying to increase some safety parameters. Our algorithm can smartly select the most critical features to focus on,” Yu says.

It does this by using a tabular foundation model to estimate which variables (or combinations of variables) most influence the outcome.

It then focuses the search on those high-impact variables instead of wasting time exploring everything equally. For instance, if the size of the front crumple zone significantly increased and the car’s safety rating improved, that feature likely played a role in the enhancement.
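The variable-focusing idea can be sketched as follows. The article does not describe how the tabular foundation model scores variables, so this sketch swaps in a deliberately simple stand-in (the absolute Pearson correlation between each variable and the outcome over an initial batch of evaluations) and a basic hill-climbing search that moves only the top-k variables. The function names, toy objective, and parameters are all hypothetical.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)

def importance(X, y):
    """STAND-IN for the foundation model's estimate: |correlation| of each variable with y."""
    return [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]

def focused_search(f, dim, k=3, n_init=200, n_iter=200, step=0.1, seed=0):
    """Score variables once, then perturb only the top-k while keeping the rest fixed."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(dim)] for _ in range(n_init)]
    y = [f(x) for x in X]
    scores = importance(X, y)
    active = sorted(range(dim), key=lambda j: scores[j], reverse=True)[:k]
    i_best = max(range(n_init), key=lambda i: y[i])
    best_x, best_y = X[i_best], y[i_best]
    for _ in range(n_iter):
        cand = list(best_x)
        for j in active:  # only high-impact variables move
            cand[j] = min(1.0, max(0.0, cand[j] + rng.uniform(-step, step)))
        val = f(cand)
        if val > best_y:  # keep the candidate only if it improves the outcome
            best_x, best_y = cand, val
    return active, best_x, best_y

# Toy 20-variable "design" where only variables 2 and 7 matter (made up for illustration).
active, best_x, best_y = focused_search(lambda x: 3 * x[2] + 2 * x[7], dim=20)
print(active, round(best_y, 3))
```

Searching only the few variables that matter shrinks the space the optimizer has to explore; the real method's advantage comes from a far better importance estimate than this correlation stand-in.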

Bigger problems, better solutions

One of their biggest challenges was finding the best tabular foundation model for this task, Yu says. Then they had to connect it with a Bayesian optimization algorithm in such a way that it could identify the most prominent design features.

“Finding the most prominent dimension is a well-known problem in math and computer science, but coming up with a way that leveraged the properties of a tabular foundation model was a real challenge,” Yu says.

With the algorithmic framework in place, the researchers tested their method by comparing it to five state-of-the-art optimization algorithms.

On 60 benchmark problems, including realistic situations like power grid design and car crash testing, their method consistently found the best solution between 10 and 100 times faster than the other algorithms.

“When an optimization problem gets more and more dimensions, our algorithm really shines,” Yu says.

However, their method did not outperform the baselines on every problem; robotic path planning was one exception. This likely indicates that such scenarios were not well represented in the model’s training data, Yu says.

In the future, the researchers want to study methods that could boost the performance of tabular foundation models. They also want to apply their technique to problems with thousands or even millions of dimensions, like the design of a naval ship.

“At a higher level, this work points to a broader shift: using foundation models not just for perception or language, but as algorithmic engines inside scientific and engineering tools, allowing classical methods like Bayesian optimization to scale to regimes that were previously impractical,” says Ahmed.

“The approach presented in this work, using a pretrained foundation model together with high‑dimensional Bayesian optimization, is a creative and promising way to reduce the heavy data requirements of simulation‑based design. Overall, this work is a practical and powerful step toward making advanced design optimization more accessible and easier to apply in real-world settings,” says Wei Chen, the Wilson-Cook Professor in Engineering Design and chair of the Department of Mechanical Engineering at Northwestern University, who was not involved in this research.

A better method for planning complex visual tasks

MIT researchers have developed a generative artificial intelligence-driven approach for planning long-term visual tasks, like robot navigation, that is about twice as effective as some existing techniques.

Their method uses a specialized vision-language model to perceive the scenario in an image and simulate actions needed to reach a goal. Then a second model translates those simulations into a standard programming language for planning problems, and refines the solution.

In the end, the system automatically generates a set of files that can be fed into classical planning software, which computes a plan to achieve the goal. This two-step system generated plans with an average success rate of about 70 percent, outperforming the best baseline methods that could only reach about 30 percent.

Importantly, the system can solve new problems it hasn’t encountered before, making it well-suited for real environments where conditions can change at a moment’s notice.

“Our framework combines the advantages of vision-language models, like their ability to understand images, with the strong planning capabilities of a formal solver,” says Yilun Hao, an aeronautics and astronautics (AeroAstro) graduate student at MIT and lead author of an open-access paper on this technique. “It can take a single image and move it through simulation and then to a reliable, long-horizon plan that could be useful in many real-life applications.”

She is joined on the paper by Yongchao Chen, a graduate student in the MIT Laboratory for Information and Decision Systems (LIDS); Chuchu Fan, an associate professor in AeroAstro and a principal investigator in LIDS; and Yang Zhang, a research scientist at the MIT-IBM Watson AI Lab. The paper will be presented at the International Conference on Learning Representations.

Tackling visual tasks

For the past few years, Fan and her colleagues have studied the use of generative AI models to perform complex reasoning and planning, often employing large language models (LLMs) to process text inputs.

Many real-world planning problems, like robotic assembly and autonomous driving, have visual inputs that an LLM can’t handle well on its own. The researchers sought to expand into the visual domain by utilizing vision-language models (VLMs), powerful AI systems that can process images and text.

But VLMs struggle to understand spatial relationships between objects in a scene and often fail to reason correctly over many steps. This makes it difficult to use VLMs for long-range planning.

On the other hand, scientists have developed robust, formal planners that can generate effective long-horizon plans for complex situations. However, these software systems can’t process visual inputs and require expert knowledge to encode a problem into language the solver can understand.

Fan and her team built an automatic planning system that takes the best of both methods. The system, called VLM-guided formal planning (VLMFP), utilizes two specialized VLMs that work together to turn visual planning problems into ready-to-use files for formal planning software.

The researchers first carefully trained a small model they call SimVLM to specialize in describing the scenario in an image using natural language and simulating a sequence of actions in that scenario. Then a much larger model, which they call GenVLM, uses the description from SimVLM to generate a set of initial files in a formal planning language known as the Planning Domain Definition Language (PDDL).

The files are ready to be fed into a classical PDDL solver, which computes a step-by-step plan to solve the task. GenVLM compares the results of the solver with those of the simulator and iteratively refines the PDDL files.

“The generator and simulator work together to be able to reach the exact same result, which is an action simulation that achieves the goal,” Hao says.

Because GenVLM is a large generative AI model, it has seen many examples of PDDL during training and learned how this formal language can solve a wide range of problems. This existing knowledge enables the model to generate accurate PDDL files.

A flexible approach

VLMFP generates two separate PDDL files. The first is a domain file that defines the environment, valid actions, and domain rules. The second is a problem file that defines the initial state and the goal of the particular problem at hand.

“One advantage of PDDL is the domain file is the same for all instances in that environment. This makes our framework good at generalizing to unseen instances under the same domain,” Hao explains.
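PDDL files are plain text rather than Python, so instead of guessing at VLMFP's actual output, here is a hedged Python analogue of the domain/problem split. A single lifted `move` schema plays the role of the domain file and is reused unchanged across two different problem instances, each with its own objects, initial state, and goal. The schema, room names, and breadth-first planner are illustrative assumptions; real PDDL solvers use much stronger search and heuristics.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # facts that must hold before the action
    add: frozenset     # facts the action makes true
    delete: frozenset  # facts the action makes false

def domain_actions(objects):
    """'Domain file' analogue: one move(?from, ?to) schema, grounded over any object set."""
    return [
        Action(f"move {a} {b}",
               pre=frozenset({("at", a), ("connected", a, b)}),
               add=frozenset({("at", b)}),
               delete=frozenset({("at", a)}))
        for a in objects for b in objects if a != b
    ]

def plan(actions, init, goal):
    """Classical planning by breadth-first search; returns a shortest plan or None."""
    start = frozenset(init)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:
            return path
        for act in actions:
            if act.pre <= state:
                nxt = (state - act.delete) | act.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [act.name]))
    return None

def corridor_problem(rooms, links, start, target):
    """'Problem file' analogue: objects, initial state, and goal for one instance."""
    init = {("at", start)}
    for a, b in links:  # corridors are two-way
        init |= {("connected", a, b), ("connected", b, a)}
    return rooms, init, frozenset({("at", target)})

# Two different instances solved with the same domain definition.
p1 = corridor_problem(["A", "B", "C"], [("A", "B"), ("B", "C")], "A", "C")
p2 = corridor_problem(["A", "B", "C", "D"], [("A", "C"), ("C", "D"), ("B", "D")], "A", "B")
for rooms, init, goal in (p1, p2):
    print(plan(domain_actions(rooms), init, goal))
```

Only the problem description changes between instances, which mirrors why a correct domain file lets the system generalize to unseen instances of the same environment.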

To enable the system to generalize effectively, the researchers needed to carefully design just enough training data for SimVLM so the model learned to understand the problem and goal without memorizing patterns in the scenario. When tested, SimVLM successfully described the scenario, simulated actions, and detected if the goal was reached in about 85 percent of experiments.

Overall, the VLMFP framework achieved a success rate of about 60 percent on six 2D planning tasks and greater than 80 percent on two 3D tasks, including multirobot collaboration and robotic assembly. It also generated valid plans for more than 50 percent of scenarios it hadn’t seen before, far outpacing the baseline methods.

“Our framework can generalize when the rules change in different situations. This gives our system the flexibility to solve many types of visual-based planning problems,” Fan adds.

In the future, the researchers want to enable VLMFP to handle more complex scenarios and explore methods to identify and mitigate hallucinations by the VLMs.

“In the long term, generative AI models could act as agents and make use of the right tools to solve much more complicated problems. But what does it mean to have the right tools, and how do we incorporate those tools? There is still a long way to go, but by bringing visual-based planning into the picture, this work is an important piece of the puzzle,” Fan says.

This work was funded, in part, by the MIT-IBM Watson AI Lab.

New MIT class uses anthropology to improve chatbots

Young adults growing up in the attention economy — preparing for adult life, with social media and chatbots competing for their attention — can easily fall into unhealthy relationships with digital platforms. But what if chatbots weren’t mere distractions from real life? Could they be designed humanely, as moral partners whose digital goal is to be a social guide rather than an addictive escape?

At MIT, a friendship between two professors, one an anthropologist and the other a computer scientist, led to the creation of an undergraduate class that set out to answer those questions. Combining the two seemingly disparate disciplines, the class encourages students to design artificial intelligence chatbots in humane ways that help users improve themselves.

The class, 6.S061/21A.S02 (Humane User Experience Design, a.k.a. Humane UXD), is an upper-level computer science class cross-listed with anthropology. This unique cross-listing allows computer science majors to fulfill a humanities requirement while also pursuing their career objectives. The two professors use methods from linguistic anthropology to teach students how to integrate the interactional and interpersonal needs of humans into programming.

Professor Arvind Satyanarayan, a computer scientist whose research develops tools for interactive data visualization and user interfaces, and Professor Graham Jones, an anthropologist whose research focuses on communication, created Humane UXD last summer with a grant from the MIT Morningside Academy for Design (MAD). The MIT MAD Design Curriculum Program provides funding for faculty to develop new classes or enhance existing classes using innovative pedagogical approaches that transcend departmental boundaries. Alongside the grant provided by MAD, Jones and Satyanarayan received funding to develop Humane UXD under the auspices of the Common Ground for Computing Education, an initiative of the MIT Schwarzman College of Computing that brings together departments to create courses integrating computing with other disciplines. 

The Design Curriculum Program is currently accepting applications for the 2026-27 academic year; the deadline is Friday, March 20. 

Jones and Satyanarayan met several years ago when they co-advised a doctoral student’s research on data visualization for visually impaired people. They’ve since become close friends who can pretty much finish one another’s sentences.

“There’s a way in which you don’t really fully externalize what you know or how you think until you’re teaching,” Jones says. “So, it’s been really fun for me to see Arvind unfurl his expertise as a teacher in a way that lets me see how the pieces fit together — and discover underlying commonalities between our disciplines and our ways of thinking.”

Satyanarayan continues that thought: “One of the things I really enjoyed is the reciprocal version of what Graham said, which is that my field — human-computer interaction — inherited a lot of methods from anthropology, such as interviews and user studies and observation studies. And over the decades, those methods have gotten more and more watered down. As a result, a lot of things have been lost.

“For instance, it was very exciting for me to see how an anthropologist teaches students to interview people. It’s completely different than how I would do it. With my way, we lose the rapport and connection you need to build with your interview participant. Instead, we just extract data from them.”

For Jones’ part, teaching with a computer scientist holds another kind of allure: design. He says that human speech and interaction are organized into underlying genres with stable sets of rules that differentiate an interview at a cocktail party from a conversation at a funeral.

“ChatGPT and other large language models are trained on naturally occurring human communication, so they have all those genres inside them in a latent state, waiting to be activated,” he says.

“As a social scientist, I teach methods for analyzing human conversation, and give students very powerful tools to do that. But it ends up usually being an exercise in pure research, whereas this is a design class, where students are building real-world systems.”

The curriculum appears to be on target for preparing students for jobs after graduation. One student sought permission to miss class for a week because he had a trial internship at a chatbot startup; when he returned, he said his work at the startup was just like what he was learning in class. He got the job.

The sampling of group projects below, built with Google’s Gemini, demonstrates some of what’s possible when, as Jones says, “there’s a really deep intertwining of the technology piece with the humanities piece.” The students’ design work shows that entirely new ways of programming can be conceptualized when the humane is made a priority.

The bots demonstrate clearly that an interdisciplinary class can be designed in such a way that everyone benefits: Students learn more and differently; they can fulfill a non-major course requirement by taking a class that is directly beneficial to their careers; and long-term faculty partnerships can be forged or strengthened.

Team Pond

One project promises to be particularly useful for graduating seniors. Pond is designed to help young college graduates adapt to the challenges of independent adult life. Team Pond configured the chatbot not to simply parrot the user, or to sycophantically praise wrong answers. Instead, Pond provides advice to help with “adulting” (behaving as a responsible adult).

“Pond is built to be your companion from college life into post-college life, to help you in your transition from being a small fish in a small pond to being a small fish in a very big pond,” says sophomore Mary Feliz.

“College is very much a high-proximity and high-context environment, in the sense that everybody around you is going through the same thing, and it’s easy to build relationships or find opportunities, because there are structured pathways that you have access to,” explains graduate student Emaan Khan. “Post-grad life is low-context. You’re not always surrounded by your peers or your professors. It’s no-proximity also, in the sense that you don’t have opportunities at your doorstep. Pond is a tool to help empower you to access certain opportunities, or learn how to navigate.”

Pond’s developers are graduate student Claire Camacho, Harvard University graduate student Charlotte Cheah, Feliz, and Khan. They trained Pond to offer expertise in three areas: social life, professional life, and adult skills.

The students demonstrated Pond’s social communication by asking how to start conversations in a hobby group. Next, they showed how Pond handles adulting questions by typing one in:

“My landlord has asked me personal questions that make me uncomfortable. What should I do?”

Appearing in Pond’s textbox was this response: “Politely set boundaries. You are not obligated to answer personal questions. You can gently redirect the conversation back to property-related topics. Know your rights: familiarize yourself with tenant rights in your city or state regarding privacy and landlord conduct.”

When asked how to do that, Pond offered specific instructions: “You can try saying phrases like: ‘I prefer to keep our conversations focused on the property,’ and ‘let’s stick to topics related to the apartment, please.’ Always redirect the conversation back to the apartment or a legitimate maintenance issue. Keep your tone polite but firm. Document any conversations if needed.”

Pond also offered a role-playing scenario to help the user learn what polite-but-firm language might be in that situation.

“The ethos of the practice mode is that you are actively building a skill, so that after using Pond for some time, you feel confident that you can swim on your own,” Khan says. To boost the bot’s appeal, the team added a point system that lets users graduate from a topic, along with a treasure chest for storing prizes.

Team News Nest

Another of the projects, News Nest, offers a sophisticated way to help young people engage with credible news sources while having fun. The name comes from the program’s 10 colorful birds, each of which focuses on a particular area of news. If you want the headlines, you ask Polly the Parrot, the main news carrier; if you’re interested in science, Gaia the Goose guides you. The flock also includes Flynn the Falcon, sports reporter; Credo the Crow, for crime and legal news; Edwin the Eagle, a business and economics news guide; Pizzazz the Peacock, for pop and entertainment stories; and Pixel the Pigeon, a technology news specialist.

News Nest’s development team is made up of MIT seniors Tiana Jiang and Krystal Montgomery and junior Natalie Tan. They intentionally built News Nest to prevent “doomscrolling,” to provide media transparency (sources and political leanings are always shown), and to create a clever, healthy buffer against emotional manipulation and engagement traps by using birds rather than human characters.

Team M^3 (Multi-Agent Murder Mystery)

A third team, M^3, decided to experiment with making AI humane by keeping it fun. MIT senior Rodis Aguilar, junior David De La Torre, and second-year Deeraj Pothapragada developed M^3, a social deduction multi-agent murder mystery in which four chatbots play distinct personalities: Google’s Gemini, OpenAI’s ChatGPT, xAI’s Grok, and Anthropic’s Claude. The user is the fifth player.

As in a regular murder mystery, there are locations, weapons, and lies, and the user has to guess who committed the murder. It’s very similar to a board or online game played with real people, except that the opponents are enhanced AI players you can’t see, who may or may not tell the truth in response to questions. Users can’t get too attached to any one chatbot, because they’re playing with all four. And, as in a real-life murder mystery game, the user is sometimes the guilty one.

3 Questions: On the future of AI and the mathematical and physical sciences

Curiosity-driven research has long sparked technological transformations. A century ago, curiosity about atoms led to quantum mechanics, and eventually the transistor at the heart of modern computing. Conversely, the steam engine was a practical breakthrough, but it took fundamental research in thermodynamics to fully harness its power. 

Today, artificial intelligence and science find themselves at a similar inflection point. The current AI revolution has been fueled by decades of research in the mathematical and physical sciences (MPS), which provided the challenging problems, datasets, and insights that made modern AI possible. The 2024 Nobel Prizes in physics and chemistry, recognizing foundational AI methods rooted in physics and AI applications for protein design, made this connection impossible to miss.

In 2025, MIT hosted a Workshop on the Future of AI+MPS, funded by the National Science Foundation with support from the MIT School of Science and the MIT departments of Physics, Chemistry, and Mathematics. The workshop brought together leading AI and science researchers to chart how the MPS domains can best capitalize on — and contribute to — the future of AI. Now a white paper, with recommendations for funding agencies, institutions, and researchers, has been published in Machine Learning: Science and Technology. In this interview, Jesse Thaler, MIT professor of physics and chair of the workshop, describes key themes and how MIT is positioning itself to lead in AI and science.

Q: What are the report’s key themes regarding last year’s gathering of leaders across the mathematical and physical sciences?

A: Gathering so many researchers at the forefront of AI and science in one room was illuminating. Though the workshop participants came from five distinct scientific communities — astronomy, chemistry, materials science, mathematics, and physics — we found many similarities in how we are each engaging with AI. A real consensus emerged from our animated discussions: Coordinated investment in computing and data infrastructures, cross-disciplinary research techniques, and rigorous training can meaningfully advance both AI and science.

One of the central insights was that this has to be a two-way street. It’s not just about using AI to do better science; science can also make AI better. Scientists excel at distilling insights from complex systems, including neural networks, by uncovering underlying principles and emergent behaviors. We call this the “science of AI,” and it comes in three flavors: science driving AI, where scientific reasoning informs foundational AI approaches; science inspiring AI, where scientific challenges push the development of new algorithms; and science explaining AI, where scientific tools help illuminate how machine intelligence actually works.

In my own field of particle physics, for instance, researchers are developing real-time AI algorithms to handle the data deluge from collider experiments. This work has direct implications for discovering new physics, but the algorithms themselves turn out to be valuable well beyond our field. The workshop made clear that the science of AI should be a community priority — it has the potential to transform how we understand, develop, and control AI systems.

Of course, bridging science and AI requires people who can work across both worlds. Attendees consistently emphasized the need for “centaur scientists” — researchers with genuine interdisciplinary expertise. Supporting these polymaths at every career stage, from integrated undergraduate courses to interdisciplinary PhD programs to joint faculty hires, emerged as essential.

Q: How do MIT’s AI and science efforts align with the workshop recommendations?

A: The workshop framed its recommendations around three pillars: research, talent, and community. As director of the NSF Institute for Artificial Intelligence and Fundamental Interactions (IAIFI) — a collaborative AI and physics effort among MIT and Harvard, Northeastern, and Tufts universities — I’ve seen firsthand how effective this framework can be. Scaling this up to MIT, we can see where progress is being made and where opportunities lie.

On the research front, MIT is already enabling AI-and-science work in both directions. Even a quick scroll through MIT News shows how individual researchers across the School of Science are pursuing AI-driven projects, building a pipeline of knowledge and surfacing new opportunities. At the same time, collaborative efforts like IAIFI and the Accelerated AI Algorithms for Data-Driven Discovery (A3D3) Institute concentrate interdisciplinary energy for greater impact. The MIT Generative AI Impact Consortium is also supporting application-driven AI work at the university scale.

To foster early-career AI-and-science talent, several initiatives are training the next generation of centaur scientists. The MIT Schwarzman College of Computing’s Common Ground for Computing Education program helps students become “bilingual” in computing and their home discipline. Interdisciplinary PhD pathways are also gaining traction; IAIFI worked with the MIT Institute for Data, Systems, and Society to create one in physics, statistics, and data science, and about 10 percent of physics PhD students now opt for it — a number that’s likely to grow. Dedicated postdoctoral roles like the IAIFI Fellowship and Tayebati Fellowship give early-career researchers the freedom to pursue interdisciplinary work. Funding centaur scientists and giving them space to build connections across domains, universities, and career stages has been transformative.

Finally, community-building ties it all together. From focused workshops to large symposia, organizing interdisciplinary events signals that AI and science isn’t siloed work — it’s an emerging field. MIT has the talent and resources to make a significant impact, and hosting these gatherings at multiple scales helps establish that leadership.

Q: What lessons can MIT draw about further advancing its AI-and-science efforts?

A: The workshop crystallized something important: The institutions that lead in AI and science will be the ones that think systematically, not piecemeal. Resources are finite, so priorities matter. Workshop attendees were clear about what becomes possible when an institution coordinates hires, research, and training around a cohesive strategy.

MIT is well positioned to build on what’s already underway with more structural initiatives — joint faculty lines across computing and scientific domains, expanded interdisciplinary degree pathways, and deliberate “science of AI” funding. We’re already seeing moves in this direction; this year, the MIT Schwarzman College of Computing and the Department of Physics are conducting their first-ever joint faculty search, which is exciting to see.

The virtuous cycle of AI and science has the potential to be truly transformative — offering deeper insight into AI, accelerating scientific discovery, and producing robust tools for both. By developing an intentional strategy, MIT will be well positioned to lead in, and benefit from, the coming waves of AI.

Can AI help predict which heart-failure patients will worsen within a year?

Characterized by weakened or damaged heart muscle, heart failure causes the gradual buildup of fluid in a patient’s lungs, legs, feet, and other parts of the body. The condition is chronic and incurable, and it often leads to arrhythmias or sudden cardiac arrest. For centuries, the treatment of choice was bloodletting, including with leeches, famously practiced in Europe by barber surgeons at a time when physicians rarely operated on patients.

In the 21st century, the management of heart failure has become decidedly less medieval: Today, patients are treated with a combination of lifestyle changes, medications, and sometimes pacemakers. Yet heart failure remains one of the leading causes of morbidity and mortality, placing a substantial burden on health-care systems across the globe.

“About half of the people diagnosed with heart failure will die within five years of diagnosis,” says Teya Bergamaschi, an MIT PhD student in the lab of Nina T. and Robert H. Rubin Professor Collin Stultz and the co-first author of a new paper introducing a deep learning model for predicting heart failure. “Understanding how a patient will fare after hospitalization is really important in allocating finite resources.”

The paper, published in The Lancet’s eClinicalMedicine by a team of researchers at MIT, Mass General Brigham, and Harvard Medical School, shares results from developing and testing PULSE-HF, which stands loosely for “Predict changes in left ventricULar Systolic function from ECGs of patients who have Heart Failure.” The project was conducted in Stultz’s lab, which is affiliated with the MIT Abdul Latif Jameel Clinic for Machine Learning in Health. The deep learning model was developed and retrospectively tested on three patient cohorts, drawn from Massachusetts General Hospital, Brigham and Women’s Hospital, and MIMIC-IV (a publicly available dataset). It accurately predicts changes in the left ventricular ejection fraction (LVEF), which is the percentage of the blood in the left ventricle that is pumped out with each heartbeat.

A healthy human heart pumps out about 50 to 70 percent of the blood in its left ventricle with each beat, and a lower ejection fraction can signal a problem. “The model takes an [electrocardiogram] and outputs a prediction of whether or not there will be an ejection fraction within the next year that falls below 40 percent,” says Tiffany Yau, an MIT PhD student in Stultz’s lab who is also co-first author of the PULSE-HF paper. “That is the most severe subgroup of heart failure.”
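To make the forecasting target concrete, here is a minimal Python sketch of how a binary training label could be derived from a patient's follow-up echocardiograms. It assumes the rule described in the quote above: a case is positive if any LVEF measured within one year after the ECG falls below 40 percent. The function name, the (date, LVEF) record format, and the strict "after the ECG" window are hypothetical choices made for illustration. The paper's actual labeling and exclusion criteria may differ.

```python
from datetime import date, timedelta

# Hypothetical labeling rule for illustration: a case is positive if any
# follow-up echocardiogram within 365 days after the ECG shows LVEF < 40%.
LVEF_THRESHOLD = 40.0
WINDOW = timedelta(days=365)


def forecast_label(ecg_date: date, echo_results: list[tuple[date, float]]) -> int:
    """Return 1 if an echo dated in (ecg_date, ecg_date + WINDOW] has LVEF
    below LVEF_THRESHOLD, else 0. echo_results holds (echo_date, lvef_percent)."""
    for echo_date, lvef in echo_results:
        in_window = ecg_date < echo_date <= ecg_date + WINDOW
        if in_window and lvef < LVEF_THRESHOLD:
            return 1
    return 0


# Hypothetical patient history: (echocardiogram date, LVEF in percent).
echos = [(date(2020, 3, 1), 52.0), (date(2021, 6, 1), 30.0)]

print(forecast_label(date(2020, 1, 10), echos))  # 0: the 30% echo is more than a year later
print(forecast_label(date(2020, 8, 1), echos))   # 1: the 30% echo falls within the year
```

A real pipeline would also need rules the text does not specify. Examples include how to handle patients whose LVEF is already below 40 percent and patients with no follow-up echocardiogram. Both are left out of this sketch.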

If PULSE-HF predicts that a patient’s ejection fraction is likely to worsen within a year, the clinician can prioritize the patient for follow-up. Conversely, lower-risk patients can reduce their number of hospital visits and the time spent having 10 electrodes adhered to their body for a 12-lead ECG. The model can also be deployed in low-resource clinical settings, including doctors’ offices in rural areas that don’t typically employ a cardiac sonographer to run ultrasounds on a daily basis.

“The biggest thing that distinguishes [PULSE-HF] from other heart failure ECG methods is instead of detection, it does forecasting,” says Yau. The paper notes that to date, no other methods exist for predicting future LVEF decline among patients with heart failure.

During the testing and validation process, the researchers used a metric known as “area under the receiver operating characteristic curve” (AUROC) to measure PULSE-HF’s performance. AUROC is typically used to measure a model’s ability to discriminate between classes on a scale from 0 to 1, with 0.5 being random and 1 being perfect. PULSE-HF achieved AUROCs ranging from 0.87 to 0.91 across all three patient cohorts.
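The 0-to-1 scale has a concrete interpretation. AUROC equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one, with ties counting as half. The minimal Python sketch below computes AUROC directly from that definition. The labels and scores are invented for illustration and are unrelated to PULSE-HF's results.

```python
def auroc(labels, scores):
    """AUROC computed from its probabilistic definition: the chance that a
    randomly chosen positive case scores higher than a randomly chosen
    negative case, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: 1 = LVEF later fell below 40%, 0 = it did not.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1]
print(auroc(labels, scores))  # 11 of 12 positive-negative pairs ordered correctly
```

This pairwise version runs in O(n·m) time and is meant only for clarity. Libraries such as scikit-learn's roc_auc_score compute the same quantity more efficiently.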

Notably, the researchers also built a version of PULSE-HF for single-lead ECGs, which require far fewer electrodes than the 10 used for a standard 12-lead ECG. While 12-lead ECGs are generally considered superior because they are more comprehensive and accurate, the single-lead version of PULSE-HF performed just as well as the 12-lead version.

Like most clinical AI research, the elegantly simple idea behind PULSE-HF belies a laborious execution. “It’s taken years [to complete this project],” Bergamaschi recalls. “It’s gone through many iterations.”

One of the team’s biggest challenges was collecting, processing, and cleaning the ECG and echocardiogram datasets. While the model aims to forecast a patient’s ejection fraction, the labels for the training data weren’t always readily available. Much like a student learning from a textbook with an answer key, labeling is critical for helping machine-learning models correctly identify patterns in data.

Clean, linear text, such as a plain TXT file, is typically easiest to work with when training models. But echocardiogram reports usually come as PDFs. When PDFs are converted to TXT files, the text gets broken up by line breaks and formatting, which makes it difficult for the model to read. Unpredictable real-world conditions, like a restless patient or a loose lead, also marred the data. “There are a lot of signal artifacts that need to be cleaned,” Bergamaschi says. “It’s kind of a never-ending rabbit hole.”
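As a hypothetical illustration of the kind of cleanup involved, the Python sketch below recovers an LVEF value from report text whose phrases were split across lines during PDF-to-TXT conversion. The sample text, the regular expression, and the function name are assumptions and do not come from the team's pipeline. Real reports vary widely, with ranges such as "55-60%" and abbreviations such as "LVEF", so a production extractor would need many more patterns plus manual review.

```python
import re

# Hypothetical pattern: "ejection fraction", an optional parenthetical such as
# "(biplane)", an optional colon, then a number followed by "%".
LVEF_PATTERN = re.compile(
    r"ejection fraction(?: \([^)]*\))?:? ?(\d{1,3}(?:\.\d+)?) ?%",
    re.IGNORECASE,
)


def extract_lvef(raw_text):
    """Return the first reported LVEF (percent) as a float, or None if absent."""
    # Collapse line breaks and runs of spaces left over from PDF conversion,
    # so a phrase split across lines becomes contiguous again.
    text = re.sub(r"\s+", " ", raw_text)
    match = LVEF_PATTERN.search(text)
    return float(match.group(1)) if match else None


# Invented report fragment, mangled the way PDF-to-TXT conversion often is.
raw = """LEFT VENTRICLE
Ejection
   fraction (biplane):    35
%
Normal range 52-72 %"""

print(extract_lvef(raw))  # 35.0
```

Returning None instead of guessing lets unparseable reports be dropped or flagged for review, so they never become mislabeled training examples.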

While Bergamaschi and Yau acknowledge that more complicated methods could help filter the data for better signals, there is a limit to the usefulness of these approaches. “At what point do you stop?” Yau asks. “You have to think about the use case — is it easiest to have this model that works on data that is slightly messy? Because it probably will be.”

The researchers anticipate that the next step for PULSE-HF will be testing the model in a prospective study on real patients, whose future ejection fraction is unknown.

Despite the challenges of bringing clinical AI tools like PULSE-HF over the finish line, including the risk of prolonging a PhD by another year, the students feel that the years of hard work were worthwhile.

“I think things are rewarding partially because they’re challenging,” Bergamaschi says. “A friend said to me, ‘If you think you will find your calling after graduation, if your calling is truly your calling, it will be there in the one additional year it takes you to graduate.’ … The way we’re measured as researchers in [the ML and health] space is different from other researchers in ML space. Everyone in this community understands the unique challenges that exist here.”

“There’s too much suffering in the world,” says Yau, who joined Stultz’s lab after a health event made her realize the importance of machine learning in health care. “Anything that tries to ease suffering is something that I would consider a valuable use of my time.” 

MIT-IBM Watson AI Lab seed to signal: Amplifying early-career faculty impact

The early years of faculty members’ careers are a formative and exciting time in which to establish a firm footing that helps determine the trajectory of researchers’ studies. This includes building a research team, which demands innovative ideas and direction, creative collaborators, and reliable resources. 

For a group of MIT faculty working with and on artificial intelligence, early engagement with the MIT-IBM Watson AI Lab through projects has played an important role in promoting ambitious lines of inquiry and shaping prolific research groups.

Building momentum

“The MIT-IBM Watson AI Lab has been hugely important for my success, especially when I was starting out,” says Jacob Andreas — associate professor in the Department of Electrical Engineering and Computer Science (EECS), a member of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), and a researcher with the MIT-IBM Watson AI Lab — who studies natural language processing (NLP). Shortly after joining MIT, Andreas jump-started his first major project through the MIT-IBM Watson AI Lab, working on language representation and structured data augmentation methods for low-resource languages. “It really was the thing that let me launch my lab and start recruiting students.” 

Andreas notes that this occurred during a “pivotal moment,” when the field of NLP was shifting toward understanding language models. That work required significantly more compute, which was available through the MIT-IBM Watson AI Lab. “I feel like the kind of the work that we did under that [first] project, and in collaboration with all of our people on the IBM side, was pretty helpful in figuring out just how to navigate that transition.” Further, the Andreas group was able to pursue multi-year projects on pre-training, reinforcement learning, and calibration for trustworthy responses, thanks to the computing resources and expertise within the MIT-IBM community.

For several other faculty members, timely participation with the MIT-IBM Watson AI Lab proved to be highly advantageous as well. “Having both intellectual support and also being able to leverage some of the computational resources that are within MIT-IBM, that’s been completely transformative and incredibly important for my research program,” says Yoon Kim — associate professor in EECS, CSAIL, and a researcher with the MIT-IBM Watson AI Lab — who has also seen his research field alter trajectory. Before joining MIT, Kim met his future collaborators during an MIT-IBM postdoctoral position, where he pursued neuro-symbolic model development; now, Kim’s team develops methods to improve large language model (LLM) capabilities and efficiency. 

One factor he points to that led to his group’s success is a seamless research process with intellectual partners. This has allowed his MIT-IBM team to apply for a project, experiment at scale, identify bottlenecks, validate techniques, and adapt as necessary to develop cutting-edge methods for potential inclusion in real-world applications. “This is an impetus for new ideas, and that’s, I think, what’s unique about this relationship,” says Kim.

Merging expertise

The nature of the MIT-IBM Watson AI Lab is that it not only brings together researchers in the AI realm to accelerate research, but also blends work across disciplines. Lab researcher and MIT associate professor in EECS and CSAIL Justin Solomon describes his research group as growing up with the lab, and the collaboration as being “crucial … from its beginning until now.” Solomon’s research team focuses on theoretically oriented, geometric problems as they pertain to computer graphics, vision, and machine learning. 

Solomon credits the MIT-IBM collaboration with expanding his skill set as well as applications of his group’s work — a sentiment that’s also shared by lab researchers Chuchu Fan, an associate professor of aeronautics and astronautics and a member of the Laboratory for Information and Decision Systems, and Faez Ahmed, associate professor of mechanical engineering. “They [IBM] are able to translate some of these really messy problems from engineering into the sort of mathematical assets that our team can work on, and close the loop,” says Solomon. This, for Solomon, includes fusing distinct AI models that were trained on different datasets for separate tasks. “I think these are all really exciting spaces,” he says.

“I think these early-career projects [with the MIT-IBM Watson AI Lab] largely shaped my own research agenda,” says Fan, whose research intersects robotics, control theory, and safety-critical systems. Like Kim, Solomon, and Andreas, Fan and Ahmed began projects through the collaboration the first year they were able to at MIT. Constraints and optimization govern the problems that Fan and Ahmed address, and so require deep domain knowledge outside of AI. 

Working with the MIT-IBM Watson AI Lab enabled Fan’s group to combine formal methods with natural language processing. This, she says, allowed the team to go from developing autoregressive task and motion planning for robots to creating LLM-based agents for travel planning, decision-making, and verification. “That work was the first exploration of using an LLM to translate any free-form natural language into some specification that robot can understand, can execute. That’s something that I’m very proud of, and very difficult at the time,” says Fan. Further, through joint investigation, her team has been able to improve LLM reasoning, work that “would be impossible without the IBM support,” she says.

Through the lab, Faez Ahmed’s collaboration facilitated the development of machine-learning methods to accelerate discovery and design within complex mechanical systems. Their Linkages work, for instance, employs “generative optimization” to solve engineering problems in a way that is both data-driven and precise. More recently, they’re applying multimodal data and LLMs to computer-aided design. Ahmed notes that AI is frequently applied to problems that are already solvable but could benefit from increased speed or efficiency. Now, however, challenges such as mechanical linkage problems once deemed “almost unsolvable” are within reach. “I do think that is definitely the hallmark [of our MIT-IBM team],” says Ahmed, praising the achievements of his MIT-IBM group, which is co-led by Akash Srivastava and Dan Gutfreund of IBM.

What began as initial collaborations for each MIT faculty member has evolved into a lasting intellectual relationship, where both parties are “excited about the science,” and “student-driven,” Ahmed adds. Taken together, the experiences of Jacob Andreas, Yoon Kim, Justin Solomon, Chuchu Fan, and Faez Ahmed speak to the impact that a durable, hands-on, academia-industry relationship can have on establishing research groups and ambitious scientific exploration.

Sustaining diplomacy amid competition in US-China relations

The United States and China “are the two largest emitters of carbon in the world,” said Nicholas Burns, former U.S. ambassador to the People’s Republic of China, at a recent MIT seminar. “We need to work with each other for the good of both of our countries.” 

During the MITEI Presents: Advancing the Energy Transition presentation, Burns gave insight into the evolving state of U.S.-China relations, its implications for the global order, and its impact on global efforts to advance the energy transition and address climate change.

“We are the two largest global economies,” said Burns, who is now the Goodman Professor of the Practice of Diplomacy and International Relations at Harvard University’s Kennedy School of Government. “These are the only two countries that affect everybody else in the international system because of our weight.”

The relationship between the United States and China can be summarized in three words, according to Burns: competitive, tough, and adversarial — a description that rings true on both sides. He listed four primary areas for this competition: military, technology, trade and economics, and values.

Burns described the especially complicated area of trade and economics. “We both want to be number one. Neither of us — to be honest — is willing to be number two,” said Burns. Outside of North America, China is the United States’ largest trade partner. Outright trade wars — like those in April and October 2025 — create friction. “At one point, you’ll remember, 145 percent tariffs by the United States, and 125 percent by China on the United States. That just grinds a relationship. Those level of tariffs, had they been sustained, would have meant zero trade between the two countries.”

The energy field can be significantly impacted by this area of competition, Burns added. China is dominant in the production and processing of rare earth elements, many of which are critical to products like lithium batteries, solar panels, and electric vehicles. In 2024 and 2025, the United States was not the only country to place tariffs on these products; India, Turkey, South Africa, Mexico, Canada, the EU, and others followed suit. “I think the Trump administration is right, as President Biden was, to try to diversify sources on rare earths,” Burns said.

Burns also noted with interest the dichotomy in China’s energy sector between its lead in clean energy technology and its continued use of coal, which stands out as an inconsistency in China’s efforts. Burns believes that climate change could be a key area of cooperation between China and the United States, and he emphasized the importance of U.S. participation, both technologically and diplomatically.

Burns also described the significant technological competition between the United States and China — an area of central importance. Throughout his presentation, Burns was quick to praise the emphasis that China puts on education and academic achievement, particularly in STEM fields. Pulling from a recent article in The Economist, he compared the 36 percent of Chinese first-year university students majoring in STEM fields to the 5 percent of American first-year students in STEM. “Think about the volume of graduates and the disparity between our country and China,” he said. “Then think about the percentage of those graduates who go into science and technology.”

Currently, areas like artificial intelligence, quantum computing, and biotechnology are taking center stage in technological innovation. “The Chinese are very skilled in terms of industrial processes and doctrine of adapting quickly,” said Burns. He explained that a competitive edge depends not only on who is first to market, but also on who adopts the technology first and who is able to unite that technological progress with policy.

“This is the most important relationship that we have in the world,” said Burns. He believes that the true test is whether the United States and China can manage competition so that interests are protected, while avoiding the use of the massive destructive power both countries possess. “We’ve got to normalize the communication and engagement to prevent the worst from happening,” said Burns.

“We’re at a stage of human history where we’re all linked together, and the fate of everybody in this room and all of our countries is linked together by these huge transnational challenges,” said Burns. “We’ve got to learn to compete and yet live in peace with each other in the process.”

This speaker series highlights energy experts and leaders at the forefront of the scientific, technological, and policy solutions needed to transform our energy systems. Visit MITEI’s Events page for more information on this and additional events.

Generative AI improves a wireless vision system that sees through obstructions

MIT researchers have spent more than a decade studying techniques that enable robots to find and manipulate hidden objects by “seeing” through obstacles. Their methods utilize surface-penetrating wireless signals that reflect off concealed items.

Now, the researchers are leveraging generative artificial intelligence models to overcome a longstanding bottleneck that limited the precision of prior approaches. The result is a new method that produces more accurate shape reconstructions, which could improve a robot’s ability to reliably grasp and manipulate objects that are blocked from view.

This new technique builds a partial reconstruction of a hidden object from reflected wireless signals and fills in the missing parts of its shape using a specially trained generative AI model.

The researchers also introduced an expanded system that uses generative AI to accurately reconstruct an entire room, including all the furniture. The system utilizes wireless signals sent from one stationary radar, which reflect off humans moving in the space.  

This overcomes one key challenge of many existing methods, which require a wireless sensor to be mounted on a mobile robot to scan the environment. And unlike some popular camera-based techniques, their method preserves the privacy of people in the environment.

These innovations could enable warehouse robots to verify packed items before shipping, reducing waste from product returns. They could also allow smart home robots to understand someone’s location in a room, improving the safety and efficiency of human-robot interaction.

“What we’ve done now is develop generative AI models that help us understand wireless reflections. This opens up a lot of interesting new applications, but technically it is also a qualitative leap in capabilities, from being able to fill in gaps we were not able to see before to being able to interpret reflections and reconstruct entire scenes,” says Fadel Adib, associate professor in the Department of Electrical Engineering and Computer Science, director of the Signal Kinetics group in the MIT Media Lab, and senior author of two papers on these techniques. “We are using AI to finally unlock wireless vision.”

Adib is joined on the first paper by lead author and research assistant Laura Dodds, as well as research assistants Maisy Lam, Waleed Akbar, and Yibo Cheng. On the second paper, he is joined by lead author and former postdoc Kaichen Zhou, Dodds, and research assistant Sayed Saad Afzal. Both papers will be presented at the IEEE Conference on Computer Vision and Pattern Recognition.

Surmounting specularity

The Adib Group previously demonstrated the use of millimeter wave (mmWave) signals to create accurate reconstructions of 3D objects that are hidden from view, like a lost wallet buried under a pile.

These waves, which are the same type of signals used in Wi-Fi, can pass through common obstructions like drywall, plastic, and cardboard, and reflect off hidden objects.

But mmWaves usually reflect in a specular manner, meaning that a wave reflects in a single direction after striking a surface. As a result, large portions of an object’s surface reflect signals away from the mmWave sensor, making those areas effectively invisible.
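The geometry behind this can be written out. Let d be the unit vector pointing from the sensor to a surface point, and let n be the surface's unit normal. The specular reflection direction is then r, given below. Two assumptions are made for illustration: the transmitter and receiver are co-located (a monostatic setup), and the sensor captures echoes arriving within an angular aperture β. Under these assumptions, the echo is received only when the surface nearly faces the sensor. Here α is the angle between the normal and the direction back toward the sensor.

```latex
\begin{aligned}
\mathbf{r} &= \mathbf{d} - 2(\mathbf{d}\cdot\mathbf{n})\,\mathbf{n}, \\
-\mathbf{r}\cdot\mathbf{d} &= 2(\mathbf{d}\cdot\mathbf{n})^2 - 1 = \cos 2\alpha,
  \qquad \cos\alpha = -\mathbf{d}\cdot\mathbf{n}, \\
\text{echo received} &\iff 2\alpha \le \beta
  \iff -\mathbf{d}\cdot\mathbf{n} \ge \cos(\beta/2).
\end{aligned}
```

Consider a sensor looking down on an object. Top surfaces satisfy this condition. Side surfaces, where α is near 90°, send their reflections sideways, and bottom surfaces face away from the sensor entirely. This is why those regions go missing.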

“When we want to reconstruct an object, we are only able to see the top surface and we can’t see any of the bottom or sides,” Dodds explains.

The researchers previously used principles from physics to interpret reflected signals, but that approach limits the accuracy of the reconstructed 3D shape.

In the new papers, they overcame that limitation by using a generative AI model to fill in parts that are missing from a partial reconstruction.

“But the challenge then becomes: How do you train these models to fill in these gaps?” Adib says.

Usually, researchers use extremely large datasets to train a generative AI model, which is one reason models like Claude and Llama exhibit such impressive performance. But no mmWave datasets are large enough for training.

Instead, the researchers adapted the images in large computer vision datasets to mimic the properties of mmWave reflections.

“We were simulating the property of specularity and the noise we get from these reflections so we can apply existing datasets to our domain. It would have taken years for us to collect enough new data to do this,” Lam says.
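The dataset-adaptation idea can be sketched in a few lines, under clearly stated assumptions. The hypothetical `simulate_mmwave_view` function below keeps only the points of a complete 3D shape whose surface normals face a co-located sensor (a crude stand-in for specularity) and jitters the survivors with Gaussian noise. The angular threshold, noise level, and toy sphere are illustrative choices; the article does not describe the researchers' actual simulation.

```python
import numpy as np

def simulate_mmwave_view(points, normals, sensor, max_angle_deg=20.0,
                         noise_std=0.005, rng=None):
    """Turn a complete point cloud into a partial, noisy 'mmWave-like' view.

    Illustrative assumptions, not the authors' pipeline:
    - specularity: keep a point only if its normal points back toward the sensor
      within max_angle_deg (the mirror bounce then lands within
      2 * max_angle_deg of the receiver);
    - noise: jitter kept points with isotropic Gaussian noise (meters).
    """
    rng = np.random.default_rng() if rng is None else rng
    to_sensor = sensor - points
    to_sensor /= np.linalg.norm(to_sensor, axis=1, keepdims=True)
    unit_normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    cos_angle = np.sum(to_sensor * unit_normals, axis=1)
    visible = cos_angle >= np.cos(np.radians(max_angle_deg))
    noisy = points[visible] + rng.normal(0.0, noise_std, size=(visible.sum(), 3))
    return noisy, visible

# Toy "dataset object": points on a unit sphere, whose outward normals equal the points.
rng = np.random.default_rng(0)
pts = rng.normal(size=(5000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
sensor = np.array([0.0, 0.0, 5.0])
partial, mask = simulate_mmwave_view(pts, pts.copy(), sensor, rng=rng)
print(f"kept {mask.mean():.1%} of the surface")
```

Pairing each partial, noisy view with the complete shape it came from yields (input, target) examples of the kind a shape-completion model could learn from.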

The researchers embed the physics of mmWave reflections directly into these adapted data, creating a synthetic dataset they use to teach a generative AI model to perform plausible shape reconstructions.

The complete system, called Wave-Former, proposes a set of potential object surfaces based on mmWave reflections, feeds them to the generative AI model to complete the shape, and then refines the surfaces until it achieves a full reconstruction.

Wave-Former was able to generate faithful reconstructions of about 70 everyday objects, such as cans, boxes, utensils, and fruit, boosting accuracy by nearly 20 percent over state-of-the-art baselines. The objects were hidden behind or under cardboard, wood, drywall, plastic, and fabric.

Seeing “ghosts”

The team used this same approach to build an expanded system that fully reconstructs entire indoor scenes by leveraging mmWave reflections off humans moving in a room.

Human motion generates multipath reflections. Some mmWaves reflect off the human, then reflect again off a wall or object, and then arrive back at the sensor, Dodds explains.

These secondary reflections create so-called “ghost signals,” which are reflected copies of the original signal that change location as a human moves. These ghost signals are usually discarded as noise, but they also hold information about the layout of the room.

“By analyzing how these reflections change over time, we can start to get a coarse understanding of the environment around us. But trying to directly interpret these signals is going to be limited in accuracy and resolution,” Dodds says.
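One simple geometric model shows why ghosts encode room layout. Assume a 2D room, a single flat wall, and a ghost produced by the path that bounces off the wall on both the way out and the way back; that ghost appears at the person's mirror image across the wall, so the wall lies on the perpendicular bisector between person and ghost. The sketch below, with hypothetical helper names, recovers the wall from a noisy trajectory. It assumes detections have already been labeled as person or ghost, which a real system would have to work out, and it is a hand-built illustration rather than the learned approach RISE uses.

```python
import numpy as np

def mirror_across_wall(p, wall_point, wall_normal):
    """Mirror image of p across the line through wall_point with normal wall_normal."""
    n = wall_normal / np.linalg.norm(wall_normal)
    return p - 2.0 * np.dot(p - wall_point, n) * n

def estimate_wall(humans, ghosts):
    """Estimate a flat wall from paired (person, ghost) positions.

    Under the mirror model, the wall passes through each pair's midpoint and is
    normal to the person->ghost direction. Averaging over a whole trajectory
    smooths out noise in individual detections.
    """
    midpoints = (humans + ghosts) / 2.0
    directions = ghosts - humans
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    normal = directions.mean(axis=0)
    normal /= np.linalg.norm(normal)
    return midpoints.mean(axis=0), normal

rng = np.random.default_rng(1)
wall_point, wall_normal = np.array([3.0, 0.0]), np.array([1.0, 0.0])  # true wall: x = 3 m
t = np.linspace(0.0, 1.0, 50)
humans = np.stack([1.0 + 0.8 * t, 2.0 * t], axis=1)  # person walking diagonally
ghosts = np.array([mirror_across_wall(h, wall_point, wall_normal) for h in humans])
ghosts += rng.normal(0.0, 0.05, size=ghosts.shape)   # noisy ghost detections
point, normal = estimate_wall(humans, ghosts)
print("point on wall:", point, "wall normal:", normal)
```

A single person-ghost pair pins down only one wall, and only coarsely once noise enters; real rooms with furniture and several reflecting surfaces are where a learned model that refines such coarse estimates becomes useful.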

They used a similar training method to teach a generative AI model to interpret those coarse scene reconstructions and understand the behavior of multipath mmWave reflections. This model fills in the gaps, refining the initial reconstruction until it completes the scene.

They tested their scene reconstruction system, called RISE, using more than 100 human trajectories captured by a single mmWave radar. On average, RISE generated reconstructions that were about twice as precise as those of existing techniques.

In the future, the researchers want to improve the granularity and detail of their reconstructions. They also want to build large foundation models for wireless signals, analogous to foundation models such as GPT, Claude, and Gemini for language and vision, which could open up new applications.

This work is supported, in part, by the National Science Foundation (NSF), the MIT Media Lab, and Amazon.