Vision · May 2026 · 8 min read

AI-native manufacturing for 2D materials

Closed-loop digital twins, surrogate models, and edge metrology turn a wafer line into a self-tuning research instrument.

Most semiconductor fabs treat process recipes as folklore. A senior engineer locks in a sequence (temperature ramp, precursor flow, growth time, anneal) and that recipe propagates by tribal knowledge from tool to tool, sometimes from lab to lab. When something drifts, a small group of people troubleshoot it from memory.

That model worked for silicon. It does not scale to 2D materials, where every monolayer of MoS₂, WS₂, or WSe₂ is a different beast depending on substrate, sulfur partial pressure, and how clean the upstream chamber happened to be that week. The defect chemistry is sensitive enough that two nominally identical CVD runs can produce films an order of magnitude apart in vacancy density (Komsa et al. PRL 2012; Hong et al. Nat. Commun. 2015).

We think the way out is to build the line as if it were a research instrument: instrumented, simulated, and closed-loop from the start.

The control loop

Each run feeds back into the model that picks the next one.

[Diagram: the Growth tool (deposition step) sends a wafer to In-situ metrology (optical, non-destructive); the spectrum extends the Calibration library (paired records), which retrains the Predictor (trained offline); the predicted outcome feeds the Planner (proposes next setpoint), which optimizes and commits the Next recipe (target-shaped) back to the growth tool.]
A closed-loop view of the same primitives that already exist as offline pieces in most labs.

The strategic objective

The goal is autonomous robotic synthesis driven by closed-loop digital twins.

Concretely, that means a CVD or ALD tool that:

  • Carries a live model of its own state (chamber chemistry, thermal field, residual contamination).
  • Predicts the wafer-level outcome of the next run before it commits to running it.
  • Updates that prediction from on-tool metrology, not from a binder of paper SOPs.

A digital twin is only useful if it closes the loop. A model that gets compared to reality at the end of the week is a dashboard. A model that gets compared every wafer, and that gets to argue back about the next setpoint, is a controller.
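The per-wafer loop is small enough to sketch. Everything here is illustrative: `Setpoint` and the `predict`/`measure`/`update`/`propose` callables are hypothetical stand-ins for the twin, the metrology head, the retraining step, and the planner.

```python
from dataclasses import dataclass

@dataclass
class Setpoint:
    temp_c: float
    flow_sccm: float

def run_loop(predict, measure, update, propose, setpoint, n_wafers):
    """Per-wafer closed loop: forecast, run, compare, retrain, re-plan."""
    history = []
    for _ in range(n_wafers):
        expected = predict(setpoint)   # the twin's forecast for this run
        observed = measure(setpoint)   # in-situ readout after the run
        update(setpoint, observed)     # fold the (recipe, outcome) pair back in
        history.append((setpoint, expected, observed))
        setpoint = propose(history)    # the planner argues back
    return history
```

The point of the shape is that `propose` runs after every wafer, not at the end of the week; swap it out for a weekly review and the same code degrades into the dashboard.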

This is the same architectural bet behind the new generation of self-driving labs (Häse, Roch & Aspuru-Guzik, Trends Chem. 2019; A-Lab at LBNL, Nature 2023). The difference is that we want the loop to run continuously on a wafer line, not a synthesis well-plate.

Autonomous synthesis with surrogate models

DFT and MLIP-driven simulations of phonon spectra, defect formation energies, and Raman fingerprints are the ground truth. They're also too slow to live inside a control loop. A run that takes hours to converge can't tell a tool what to do in the next thirty seconds.

The surrogate-model layer collapses this. The slow simulations — phonon calculations across the relevant defect populations, run on a modern MLIP relaxed against DFT references — are used to build a calibration library offline. That library then trains compact predictors that are fast enough to call inside a control loop. Given a process condition, the predictor returns an expected defect distribution and an expected spectral signature.
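A minimal sketch of that predictor tier, with an invented two-feature calibration library (growth temperature and sulfur partial pressure mapped to a Raman peak shift) standing in for the real MLIP-derived one. Plain ridge regression stands in for whatever compact model a real line would use; the numbers are illustrative, not measured.

```python
import numpy as np

# Hypothetical calibration library: each row pairs a process condition
# (temperature in C, sulfur partial pressure) with a simulated spectral
# feature (a Raman peak shift in cm^-1) from the offline MLIP/DFT tier.
conditions = np.array([[750.0, 0.10], [800.0, 0.12], [850.0, 0.15],
                       [900.0, 0.18], [950.0, 0.22]])
peak_shift = np.array([1.8, 1.5, 1.1, 0.9, 0.6])

# Compact predictor: ridge regression on standardized features + bias.
mean, std = conditions.mean(0), conditions.std(0)
X = np.hstack([(conditions - mean) / std, np.ones((len(conditions), 1))])
w = np.linalg.solve(X.T @ X + 1e-3 * np.eye(3), X.T @ peak_shift)

def predict(cond):
    """Expected spectral signature for a proposed process condition."""
    x = (np.asarray(cond) - mean) / std
    return float(np.append(x, 1.0) @ w)
```

The fit takes microseconds to evaluate, which is the entire job description of this tier: not to be as accurate as the phonon calculation, but to be callable between wafers.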

From DFT to the controller

Each tier distills the one above into something fast enough to live in a tighter loop.

[Diagram, slow to fast: DFT references (slow, ground truth) → Machine-learning interatomic potentials (offline, trained on DFT) → Compact predictors (real-time, distilled from MLIPs) → Inference at the measurement (line-rate, dedicated hardware).]
The control loop only works if the model layer runs in milliseconds. The layers above stay offline.

The cost gradient between these layers is the whole reason the architecture works. DFT runs are the ground truth and take hours per structure. MLIP runs build the calibration library on a GPU. The trained predictors run on commodity hardware fast enough to live in a planner. Inference at the optical head runs faster still. Each tier handles a different question on a different timescale.

That gives the planner a tractable optimization target. Instead of asking what recipe minimizes vacancy density across this wafer (intractable), it asks what recipe minimizes a measurable spectral feature the predictor knows how to forecast, which is well-posed and can be validated against the next measurement. Bayesian optimization frameworks like Dragonfly and BoTorch have already shown this works for chemical reaction optimization (Shields et al. Nature 2021); a wafer line is a higher-dimensional version of the same problem.
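The planner's acquisition step can be sketched in a few lines. This is a generic lower-confidence-bound rule over a candidate grid, not the Dragonfly or BoTorch API; `predict` and `uncertainty` are assumed to come from the trained predictor tier.

```python
import numpy as np

def propose_next(predict, uncertainty, candidates, beta=1.0):
    """Pick the candidate recipe that minimizes the predicted spectral
    feature, minus an exploration bonus for conditions the predictor is
    unsure about (lower-confidence-bound acquisition)."""
    scores = [predict(c) - beta * uncertainty(c) for c in candidates]
    return candidates[int(np.argmin(scores))]
```

With `beta = 0` this is pure exploitation; a real planner would keep `beta > 0` so the loop occasionally spends a wafer reducing model uncertainty instead of chasing the current optimum.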

The simulation pipeline does not go away. It moves up the stack, running offline to extend the calibration library and to re-train the predictors when they disagree with metrology. MLIP accuracy itself is a moving target (Pota et al. on κSRME, NeurIPS 2024), so the same architecture has to absorb model upgrades without a process re-qualification.

Edge computing for high-fidelity feedback

The bottleneck in any closed loop is the latency and fidelity of the measurement. Pulling a wafer out, sending it across campus for AFM or TEM, and waiting two days for an answer is not feedback. It's a postmortem.

In-situ Raman and PL mapping is the right primitive. It's fast, non-destructive, sensitive to the same defect chemistry that controls device performance (Mignuzzi et al. PRB 2015; Berkdemir et al. Sci. Rep. 2013), and it can be built directly into the tool. The relationships between vibrational mode shape, peak shifts, and defect chemistry are well-characterized in the literature for the common TMDs and for graphene; turning those relationships into a quantitative readout is engineering, not science.

The constraint is throughput. A hyperspectral map is megabytes per second of structured data, and the controller needs answers in milliseconds. That pushes inference toward dedicated hardware close to the measurement, running the same family of models a researcher would run in the workspace but continuously and at line rate. The output isn't a spectrum. It's a per-wafer estimate of defect concentration, a confidence interval, and a flag if the spectrum looks anomalous compared to the calibration library.
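A sketch of that controller-facing reduction, assuming a nearest-neighbor comparison against the calibration library; the function name, the three-neighbor estimate, and the distance threshold are all illustrative choices, not the production algorithm.

```python
import numpy as np

def summarize(spectrum, library, threshold=3.0):
    """Reduce a measured spectrum to the controller-facing output:
    a defect estimate, a rough confidence interval, and an anomaly
    flag. `library` pairs reference spectra with known defect values."""
    ref_spectra, ref_defects = library
    d = np.linalg.norm(ref_spectra - spectrum, axis=1)
    nearest = np.argsort(d)[:3]                    # closest calibration entries
    estimate = float(ref_defects[nearest].mean())
    ci = 1.96 * float(ref_defects[nearest].std())  # spread of the neighbors
    anomalous = bool(d.min() > threshold)          # far from everything seen
    return {"defect_density": estimate, "ci": ci, "anomalous": anomalous}
```

The anomaly flag is the cheap but important part: it is what lets the loop refuse to act on a spectrum the calibration library has never seen.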

Sensors get cheaper and faster every year. Inference at the measurement is the part that has to be designed in.

What the loop changes

Relative to a manual recipe-development cycle, a closed loop of this kind reshapes a handful of things at once.

The most obvious is iteration rate. Self-driving labs in chemistry have already demonstrated order-of-magnitude accelerations once every experiment runs through a planner instead of through a grad-student-week (Burger et al. Nature 2020). 2D-material growth is a harder optimization problem, but the bottleneck is the same: measurement loop latency. Replace days-long ex-situ characterization with seconds-long in-situ readout, and the planner gets to act on every wafer.

The less obvious change is what a recipe even is. Recipes expressed as setpoints (temperatures, flows, dwell times) don't transfer between tools, because a temperature is a chamber wall and a flow is a meter that drifts. Recipes expressed as targets in measurable physical features do transfer, because every tool has its own controller that can solve for the local setpoints needed to hit them. That moves the unit of reusable knowledge from a tool-specific procedure to a target-specific specification.
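The difference is easy to see as data. A hypothetical target-shaped recipe is a set of bands on measurable features, plus a check any tool can run against its own metrology; every field name here is invented for illustration.

```python
# Target recipe: acceptance bands on measurable features. Unlike a list
# of setpoints, this transfers between tools, because each tool's own
# controller solves for the local setpoints needed to land inside the bands.
target_recipe = {
    "raman_a1g_shift_cm1": (-0.5, 0.5),      # peak shift vs pristine reference
    "pl_peak_ev": (1.82, 1.86),              # photoluminescence peak position
    "log10_defect_density_cm2": (0.0, 12.0), # upper bound on vacancy density
}

def meets_spec(measured, spec):
    """Check a wafer's measured features against a target recipe."""
    return all(lo <= measured[key] <= hi for key, (lo, hi) in spec.items())
```

The setpoint version of the same recipe (a temperature, a flow, a dwell time) would only be meaningful on the chamber it was tuned on; this one is meaningful anywhere the metrology is.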

The other shifts are smaller and follow from the same primitives. Per-wafer fault detection becomes possible because the anomaly model is already running. Growth times get shorter because the planner can stop when the in-situ readout says the film is good. Drift becomes a leading indicator of contamination or wear instead of something noticed at end-of-batch.

None of these are speculative individually. The bet is that they compound when they live inside the same loop.

Where the durable advantage lives

Anyone can buy a CVD tool. Anyone can hire a process engineer. The thing that compounds, and that doesn't transfer, is the experimental record: paired measurements and process metadata, accumulated over years of operation, validated against the simulation pipeline that generated the calibration in the first place. Every wafer that runs through a closed-loop tool extends that record, which in turn tightens the predictors the next wafer is grown against.

The same dynamic is what makes foundation-model lab automation work for protein engineering: the durable asset is the labeled experimental dataset, not the model architecture. Whoever runs the loop longest, against the best metrology, ends up with a calibration library no one else can replicate without doing the same work.

What this means for Matter42

The platform we're building today is the research-side counterpart to the production loop above. The same kinds of analyses a scientist runs on a single sample in a workspace are the kinds of analyses a production controller has to run continuously on a line. The earlier those two surfaces share calibration and model assumptions, the shorter the path between them.

The lab buildout we sketched in Designing the Matter42 lab is the first physical step in that direction. A closed-loop production tool is the eventual one.

What AI-native manufacturing looks like from where we sit is not a robot that replaces an engineer. It's a tool that writes its own recipes in a language the engineer and the model can both read.

Copyright © 2026 Matter42. All rights reserved.