Product note · May 2026 · 6 min read

Toward a research agent for 2D materials

Why a calibrated research agent — not a generic chatbot — is the right interface to spectroscopy, simulation, documents, and project memory for 2D-material labs.

A 2D-materials lab is a strange knowledge graph. A single sample produces a hyperspectral Raman map, a PL map, a stack of microscopy images, a few pages of growth notes, a slide of process conditions, and eventually a paragraph in a paper. Each artifact lives in a different tool, and the thing that connects them is the scientist's memory.

That works at one sample a week. It does not work at the throughput AI-assisted research is starting to demand.

The Matter42 research agent is the part of the platform meant to close that gap. It isn't a chatbot bolted onto a file viewer. It's a calibrated assistant that holds project context, calls a small set of well-defined scientific tools, and writes back into a memory the next session can read.

What a research agent has to be

There are two failure modes for "AI for science" products. One is a generic LLM that confidently makes up phonon assignments. The other is a pile of disconnected scripts that each do one calculation correctly but lose the context tying them together.

We want neither. The bar we hold ourselves to:

  • Calibrated, not improvised. Every quantitative claim (defect density, cluster boundary, peak assignment) comes from a tool whose calibration is explicit and version-controlled, not from the language model's arithmetic.
  • Context-preserving. Sample identity, growth conditions, prior analyses, and the scientist's own notes follow the project across sessions. The agent should not need to be re-briefed every time it opens.
  • Auditable. Every tool call leaves a trace: inputs, outputs, the figures it produced. A reviewer six months later should be able to reconstruct what was done.
  • Honest about uncertainty. When a measurement is ambiguous or a region is artifact-prone, the agent should say so, surface the relevant figure, and ask. Silent confidence is dangerous.
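
One way to make these constraints concrete is the shape of a tool's return value. Here is a minimal Python sketch; the type and its field names are our illustration, not the platform's actual schema:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolResult:
    value: float                   # the quantitative answer, e.g. a defect density
    uncertainty: float             # the tool's error bar, never the model's arithmetic
    calibration_version: str       # which versioned calibration produced the number
    inputs: dict = field(default_factory=dict)    # dataset handles and parameters used
    figures: list = field(default_factory=list)   # figures the call produced
    caveats: list = field(default_factory=list)   # e.g. "region overlaps a boundary halo"
```

A result that cannot fill in calibration_version or uncertainty is, by construction, not something the agent should quote.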

These constraints rule out a lot of designs that look good in a demo and fail in a working lab.

The architecture in one paragraph

The agent itself is small. Most of the value lives in three layers underneath it: a set of domain tools that each do one calibrated thing, a project memory that's durable across sessions, and a parsing layer that turns whatever the user uploads into a typed dataset the tools know how to consume. The agent's job is mostly to choose the right tool, pass it the right handles, and stitch the results back together for the scientist.
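
As a sketch of how thin that agent can be, here is the whole routing loop in Python. The registry, the router, and the memory API are assumptions made for illustration, not the real interfaces:

```python
# Hypothetical tool registry: name -> calibrated, single-purpose function.
TOOLS = {
    "defect_density_map": lambda dataset: {"value": 1.2, "unit": "%", "calibration": "ws2-v3"},
}


def choose_tool(question: str, memory: dict):
    # Stub router; in practice this is the language model's one real job,
    # made with the project context already loaded.
    return "defect_density_map", {"dataset": memory.get("active_dataset")}


def answer(question: str, memory: dict) -> dict:
    tool_name, args = choose_tool(question, memory)
    result = TOOLS[tool_name](**args)
    # Write the call back so the next session can read it.
    memory.setdefault("history", []).append(
        {"tool": tool_name, "args": args, "result": result}
    )
    return result
```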

A thin agent on top of three substrates

[Figure: the scientist's questions go to the research agent, which routes calls across three substrates: domain tools (calibrated and single-purpose, drawing versioned reference data from the calibration library and project store, with provenance for every call), project memory (sample and file context, pinned analysis choices, durable notes, an auditable history), and file ingestion (multiple input modalities, pre-parsed at upload into typed, tool-aware dataset handles). The agent is the surface; the work happens in the substrates underneath.]

That structure is deliberate. The hard parts of materials science aren't the language. The hard parts are the calibration libraries, the reference data, and the disciplined plumbing between them. The agent is the surface; the science is in the tools.

Domain tools, not domain prompts

A general-purpose model can describe a Raman spectrum in plausible English. It can't, on its own, tell you the vacancy percentage in a WS₂ monolayer from the E₂g linewidth. That requires:

  • Reference data for the material: peak positions, intrinsic widths, temperature coefficients.
  • A calibration curve relating linewidth to defect concentration, derived from first-principles simulation.
  • A spatial-statistics step to separate clean material from boundary halos and instrument artifacts.
  • An honest error bar that accounts for instrument resolution.

Each is a small, well-defined function. Wrapped as a tool, the agent can call it and get back a quantitative answer with provenance. Wrapped as a prompt, it would be a hallucination waiting to happen.
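
To make that concrete, here is what the core of such a tool might look like in Python. The calibration points, the version tag, and the Gaussian-in-quadrature resolution correction are placeholders and simplifying assumptions, not Matter42's actual numbers or method:

```python
import numpy as np

# Hypothetical versioned calibration: intrinsic linewidth (cm^-1) -> vacancy %.
CAL_LINEWIDTH = np.array([2.0, 3.0, 4.5, 6.5, 9.0])    # placeholder values
CAL_VACANCY_PCT = np.array([0.0, 0.5, 1.0, 2.0, 4.0])  # placeholder values
CAL_VERSION = "ws2-linewidth-v3"                        # hypothetical tag


def vacancy_pct(measured_fwhm: float, instrument_fwhm: float) -> dict:
    # Remove instrument broadening (quadrature subtraction, valid for Gaussian profiles).
    intrinsic = np.sqrt(max(measured_fwhm**2 - instrument_fwhm**2, 0.0))
    pct = float(np.interp(intrinsic, CAL_LINEWIDTH, CAL_VACANCY_PCT))
    # Propagate the instrument resolution through the local slope of the curve.
    slope = np.gradient(CAL_VACANCY_PCT, CAL_LINEWIDTH)
    err = float(abs(np.interp(intrinsic, CAL_LINEWIDTH, slope)) * instrument_fwhm)
    return {"vacancy_pct": pct, "uncertainty_pct": err, "calibration": CAL_VERSION}
```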

The same principle applies across the rest of the workflow. Parsing vendor formats and handling bad pixels is deterministic; reproducible plotting of hyperspectral data has to be pixel-accurate; unsupervised separation of high- and low-defect regions needs exact grid alignment; inverting a measured peak shape into a defect concentration is regression against a calibrated curve, not arithmetic in prose. None of these are creative tasks. They're the kind of thing where you want an answer, a confidence interval, and a record of how the answer was produced.
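
The region-separation step, for instance, might reduce to something like the following sketch, assuming a per-pixel linewidth map and a validity mask on the same grid, with scikit-learn's KMeans standing in for whatever the real tool uses:

```python
import numpy as np
from sklearn.cluster import KMeans


def split_regions(linewidth_map: np.ndarray, valid_mask: np.ndarray) -> np.ndarray:
    """Return an int map: -1 = invalid pixel, 0 = low-defect, 1 = high-defect."""
    labels = np.full(linewidth_map.shape, -1, dtype=int)
    values = linewidth_map[valid_mask].reshape(-1, 1)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(values)
    # Relabel so cluster 1 is always the broader-linewidth (high-defect) one.
    rank = np.argsort(km.cluster_centers_.ravel()).argsort()
    labels[valid_mask] = rank[km.labels_]
    return labels
```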

The agent's role is to know which of these to reach for and to keep their outputs consistent with each other. When a scientist asks "how defective is this sample?", the right answer is a quantitative map produced by the calibrated tool, not a paragraph guessing at it.

Project memory

Tools matter, but the thing that makes a research workspace feel different from a chat window is memory.

A project carries the context a scientist would otherwise have to re-explain at the top of every chat: what samples and files belong to it, the analysis choices that have been pinned, durable notes about sample history and recurring preferences, and a record of which analyses produced which figures.

This isn't a transcript. It's a structured store the next session reads before asking its first question. A scientist who comes back two weeks later doesn't have to re-explain how this batch was grown, which regions of the map are artifact-prone, or what "low-defect" means in the context of this project. The agent already knows.
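
One hypothetical shape for that store, with field names that are ours rather than the product's:

```python
from dataclasses import dataclass, field


@dataclass
class ProjectMemory:
    samples: dict = field(default_factory=dict)   # sample id -> growth conditions, history
    datasets: dict = field(default_factory=dict)  # file id -> typed dataset handle
    pinned: dict = field(default_factory=dict)    # e.g. {"low_defect_threshold_pct": 0.5}
    notes: list = field(default_factory=list)     # durable, scientist-authored notes
    history: list = field(default_factory=list)   # tool calls: inputs, outputs, figures
```

A new session would read this before asking its first question.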

That memory is also what makes review tractable. A collaborator opening the project can see, in one place, what the sample is, what was measured, what was concluded, and which figures justify the conclusion. The lab notebook stops being a parallel artifact someone has to maintain by hand.

How files become first-class

Most "upload a file and chat with it" products treat the file as an opaque blob the model has to wrestle with at chat time. That fails on hyperspectral data: a single Raman map can be tens of millions of intensity values, and asking a language model to reason over the raw bytes is both wrong and expensive.

So uploaded files are parsed before the conversation starts. Parsing decides what kind of dataset it is and converts it into a form the tools know how to read. The agent doesn't see the raw bytes. It sees a structured reference to the parsed dataset, knows what kind of data is in it, and knows which tools apply.

The result is that the same chat can fluently talk about a Raman map, a PL map, a TEM image, a process log, and a PDF of a related paper, because each one has been pre-translated into something a tool can consume.
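
A sketch of that ingestion step, with made-up format-to-tool mappings (the .wdf suffix is Renishaw's Raman format; the tool names are hypothetical):

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DatasetHandle:
    dataset_id: str
    kind: str      # "raman_map", "pl_map", "tem_image", "process_log", "pdf"
    shape: tuple   # e.g. (ny, nx, n_wavenumbers) for a hyperspectral map
    tools: tuple   # names of tools that know how to consume this kind


PARSERS = {
    ".wdf": ("raman_map", ("defect_density_map", "split_regions")),
    ".tif": ("tem_image", ("drift_correct",)),
    ".pdf": ("pdf", ("extract_text",)),
}


def ingest(path: Path) -> DatasetHandle:
    kind, tools = PARSERS[path.suffix.lower()]
    # The real parser would read the vendor format, fix bad pixels, and record
    # the true shape here; this sketch only assigns the typed handle.
    return DatasetHandle(dataset_id=path.stem, kind=kind, shape=(), tools=tools)
```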

Where the tools come from

The calibration data behind the spectroscopy tools isn't improvised. It's grounded in first-principles simulation, validated against measured spectra from materials with well-understood defect chemistry, and extended every time a new calibration run goes through. The numbers the agent quotes at chat time are the numbers that came out of that pipeline.

This shares an architectural bet with AI-native manufacturing for 2D materials: research analysis and production-line analysis should rest on the same calibration assumptions, so that what a scientist learns about a sample in a workspace is portable to the kind of in-line decisions a closed-loop tool eventually has to make.

Why this is the right shape for the moment

The pace of progress in 2D materials is, increasingly, set by how fast a lab can iterate on the loop of grow → measure → interpret → adjust. Every step in that loop has been getting faster: precursor delivery, in-situ optics, simulation throughput. The bottleneck has been the interpretation step, which traditionally lives in a scientist's head and a folder of half-organized files.

A research agent that's calibrated, context-preserving, and honest about its limits is a direct attack on that bottleneck. It doesn't replace the scientist; it removes the friction between asking a scientific question and getting a quantitative answer that someone else can review.

That feels like the right surface area for AI in materials science right now. Not a model that promises to discover the next semiconductor on its own, but a workspace where a domain expert and a calibrated assistant can move through a project at the speed the underlying physics actually allows.

Where we go from here

The current research agent already covers the spectroscopy and simulation surface a TMD lab uses day to day. The roadmap from here is mostly about depth, not breadth: better region-aware analyses, more calibrated defect species, tighter integration with growth process logs, and eventually a live link to the closed-loop production tool the manufacturing post describes.

We'll keep posting notes here as those pieces land.
