Methods

Matter42's analysis tools are designed around interpretable spectroscopy features. The agent can discuss results in natural language, but quantitative conclusions should rest on the tool outputs, read within their calibration limits.

Parsing And Exploration

parse_data converts uploaded files into normalized datasets. For hyperspectral maps, that means a spatial grid, spectral axis, spectra cube, quality mask, and metadata when the parser can recover it.
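To illustrate what a normalized map might hold, here is a container with those five parts. The class name HyperspectralMap and its field names are hypothetical and are not Matter42's schema. The sketch only shows the shape relationships a parser would need to keep consistent.

```python
from dataclasses import dataclass, field


@dataclass
class HyperspectralMap:
    """Hypothetical container mirroring the parts parse_data is described as producing."""
    x_um: list[float]               # spatial grid, x coordinates (micrometres)
    y_um: list[float]               # spatial grid, y coordinates (micrometres)
    spectral_axis: list[float]      # e.g. Raman shift (cm^-1) or PL energy (eV)
    cube: list[list[list[float]]]   # cube[iy][ix][k] = intensity at spectral_axis[k]
    quality_mask: list[list[bool]]  # True where the pixel passed parse-time checks
    metadata: dict = field(default_factory=dict)  # whatever the parser could recover

    def __post_init__(self) -> None:
        ny, nx, nk = len(self.y_um), len(self.x_um), len(self.spectral_axis)
        if len(self.cube) != ny or any(len(row) != nx for row in self.cube):
            raise ValueError("cube must have shape (len(y_um), len(x_um), ...)")
        if any(len(spec) != nk for row in self.cube for spec in row):
            raise ValueError("every spectrum must match the spectral axis length")
        if len(self.quality_mask) != ny or any(len(r) != nx for r in self.quality_mask):
            raise ValueError("quality_mask must match the spatial grid")

    def valid_pixels(self) -> list[tuple[int, int]]:
        """(iy, ix) indices of parse-time valid pixels."""
        return [(iy, ix) for iy in range(len(self.y_um)) for ix in range(len(self.x_um))
                if self.quality_mask[iy][ix]]
```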

explore_data is the first scientific pass. It shows the mean spectrum, pixel-level spectra, spatial maps, and feature statistics so you can check whether peaks are resolved, masks are sensible, and gradients are real before asking for density or classification.
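The two simplest outputs of such a first pass are a mask-aware mean spectrum and a robust summary of a per-pixel feature. They can be sketched as follows. The function names are hypothetical, and using the median and interquartile range is an illustrative choice, not a description of explore_data's internals.

```python
import statistics


def mean_spectrum(cube, mask):
    """Average spectrum over pixels where mask[iy][ix] is True."""
    spectra = [spec for row_s, row_m in zip(cube, mask)
               for spec, ok in zip(row_s, row_m) if ok]
    if not spectra:
        raise ValueError("no valid pixels to average")
    n = len(spectra)
    # zip(*spectra) walks the spectral axis, one tuple of pixel values per channel
    return [sum(channel) / n for channel in zip(*spectra)]


def feature_summary(values):
    """Robust summary of a per-pixel feature such as E2g FWHM (needs >= 2 values)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"n": len(values), "mean": statistics.fmean(values),
            "median": median, "iqr": q3 - q1}
```

A wide IQR relative to the median is one quick hint that a gradient or a second population may be present and worth mapping before any density or classification step.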

Raman Observables

For Raman maps, the most important channels are:

E2g center and FWHM.
A1g center and FWHM.
E2g and A1g intensities.
A1g/E2g ratio.
Peak asymmetry when measurable.

Linewidth broadening is sensitive to disorder and finite phonon lifetime. Peak shifts can also reflect strain, doping, temperature, thickness, and defect chemistry, so they should be interpreted with sample context.
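Here is a model-free extraction of center, FWHM, and the A1g/E2g intensity ratio from one sampled spectrum. It assumes an ascending Raman-shift axis and takes the window minimum as the baseline. The peak windows are illustrative values for monolayer MoS2, with E2g near 385 cm^-1 and A1g near 405 cm^-1. Production pipelines usually fit Lorentzian or Voigt profiles instead, and peak asymmetry is not estimated here. All function names are hypothetical.

```python
def peak_features(axis, spectrum, lo, hi):
    """Center, FWHM and baseline-corrected height of the strongest peak in [lo, hi].

    Baseline = window minimum; center = sampled maximum; FWHM = distance between
    linearly interpolated half-maximum crossings. Assumes `axis` is ascending.
    """
    idx = [i for i, x in enumerate(axis) if lo <= x <= hi]
    if len(idx) < 3:
        raise ValueError("window contains too few samples")
    first, last = idx[0], idx[-1]
    base = min(spectrum[i] for i in idx)
    ipk = max(idx, key=lambda i: spectrum[i])
    height = spectrum[ipk] - base
    if height <= 0:
        raise ValueError("no peak above baseline in window")
    half = base + height / 2.0

    def crossing(step):
        i = ipk
        # walk outward while the next sample is still above half maximum
        while first <= i + step <= last and spectrum[i + step] > half:
            i += step
        j = i + step
        if not first <= j <= last:
            return axis[i]  # peak runs off the window: width is underestimated
        t = (spectrum[i] - half) / (spectrum[i] - spectrum[j])
        return axis[i] + t * (axis[j] - axis[i])

    return {"center": axis[ipk], "fwhm": crossing(+1) - crossing(-1), "height": height}


def raman_summary(axis, spectrum, e2g_window=(370.0, 395.0), a1g_window=(395.0, 415.0)):
    """E2g/A1g features plus the A1g/E2g height ratio (windows are placeholders)."""
    e2g = peak_features(axis, spectrum, *e2g_window)
    a1g = peak_features(axis, spectrum, *a1g_window)
    return {"e2g": e2g, "a1g": a1g, "a1g_e2g_ratio": a1g["height"] / e2g["height"]}
```

Overlapping tails from the neighboring mode raise the window-minimum baseline. This is one reason overlapping peaks are better handled by fitting.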

PL Observables

For PL maps, the tools focus on:

Integrated PL intensity and quenching.
Peak position and FWHM.
Spectral asymmetry.
Trion/exciton balance.
Sub-gap emission fraction.

PL is most useful for electronic activity: whether defects act as traps, non-radiative recombination centers, or charged scattering sites. It complements Raman, which is more directly tied to lattice disorder.
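Several of the PL channels above reduce to band integrals over an energy axis, sketched below. Everything in the sketch is an assumption: the band edges are placeholder values loosely placed around the monolayer MoS2 A exciton and its lower-energy trion, the sub-gap cutoff is a placeholder, and quenching is measured against a caller-supplied reference intensity. Real trion/exciton balance is usually obtained by fitting two peaks rather than integrating fixed bands.

```python
def trapezoid(x, y):
    """Trapezoidal integral of y over x (x ascending)."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1))


def band_integral(x, y, lo, hi):
    """Integral restricted to samples with lo <= x <= hi."""
    pts = [(xi, yi) for xi, yi in zip(x, y) if lo <= xi <= hi]
    if len(pts) < 2:
        return 0.0
    xs, ys = zip(*pts)
    return trapezoid(xs, ys)


def pl_features(energy_ev, spectrum, subgap_below_ev=1.75,
                exciton_band=(1.86, 1.92), trion_band=(1.80, 1.86)):
    """Integrated intensity, sub-gap fraction and trion/exciton ratio (placeholder bands)."""
    total = trapezoid(energy_ev, spectrum)
    subgap = band_integral(energy_ev, spectrum, energy_ev[0], subgap_below_ev)
    exciton = band_integral(energy_ev, spectrum, *exciton_band)
    trion = band_integral(energy_ev, spectrum, *trion_band)
    return {
        "integrated": total,
        "subgap_fraction": subgap / total if total > 0 else 0.0,
        "trion_exciton_ratio": trion / exciton if exciton > 0 else float("nan"),
    }


def quenching(integrated, reference):
    """1 - I/I_ref clipped to [0, 1]; the reference (e.g. a pristine region) is an assumption."""
    return min(1.0, max(0.0, 1.0 - integrated / reference))
```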

Clustering Method

cluster_defects extracts Raman and PL features, standardizes them, projects them with PCA for visualization, and clusters valid pixels with KMeans. If you do not specify n_clusters, the tool tries 2, 3, and 4 clusters and keeps the count with the highest silhouette score.
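The standardize, cluster, and select-by-silhouette loop can be sketched in plain Python. This is an illustration, not the tool's code. The sketch uses Lloyd's KMeans with a deterministic farthest-first initialization instead of a library implementation, and it omits the PCA projection because that step only affects visualization. The function names are hypothetical.

```python
import math


def standardize(rows):
    """Z-score each feature column; constant columns are only centered."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with farthest-first initialization (deterministic)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new == labels:
            break  # assignments stopped changing
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean); -1 if fewer than two clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        if labels.count(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        mean_d = {}
        for c in clusters:
            ds = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            mean_d[c] = sum(ds) / len(ds)
        a = mean_d[own]
        b = min(d for c, d in mean_d.items() if c != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def cluster_pixels(features, n_clusters=None, candidates=(2, 3, 4)):
    """Return (k, labels); when n_clusters is None, pick k by highest silhouette."""
    pts = standardize(features)
    if n_clusters is not None:
        return n_clusters, kmeans(pts, n_clusters)
    trials = [(k, kmeans(pts, k)) for k in candidates if k < len(pts)]
    return max(trials, key=lambda t: silhouette(pts, t[1]))
```

Standardizing first matters because the raw channels have very different units and scales. For example, a linewidth in cm^-1 and an integrated PL intensity in counts would otherwise weight the distance metric very unequally.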

Cluster labels are ordered for interpretability: low-defect clusters tend to be brighter in PL or lower in Raman damage proxies, while high-defect clusters tend to show stronger quenching, broader Raman modes, or elevated A1g/E2g ratios.
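One way to impose such an ordering is to relabel clusters by the mean of a damage proxy, so that label 0 is the least damaged population. The sketch below uses a single proxy per pixel, such as E2g FWHM. Which proxy or composite Matter42 actually uses is not specified here, so both the proxy and the function name are assumptions.

```python
def order_clusters(labels, damage_proxy):
    """Relabel so cluster 0 has the lowest mean damage proxy and the last label the highest."""
    groups = {}
    for lab, value in zip(labels, damage_proxy):
        groups.setdefault(lab, []).append(value)
    ranked = sorted(groups, key=lambda lab: sum(groups[lab]) / len(groups[lab]))
    remap = {old: new for new, old in enumerate(ranked)}
    return [remap[lab] for lab in labels]
```

For a proxy where higher means healthier, such as PL intensity, negate it before passing it in.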

Use clustering as a segmentation of spectral populations. It answers where distinct populations are, not which atomistic defect is present.

Density Inversion

estimate_defect_density converts Raman E2g linewidth into an equivalent defect concentration. The tool:

Measures E2g FWHM per valid pixel.
Subtracts instrumental broadening in quadrature when instrument_fwhm is known or estimated.
Compares the intrinsic linewidth to MLIP calibration curves.
Returns a per-pixel defect density map and candidate calibration summaries.
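The first three steps can be sketched as a quadrature correction followed by a piecewise-linear inversion of a calibration curve. Quadrature subtraction is exact for Gaussian line shapes and only an approximation for other profiles. PLACEHOLDER_CURVE contains made-up numbers, not MLIP calibration data. All function names are hypothetical.

```python
import math

# Made-up (concentration %, intrinsic E2g FWHM in cm^-1) pairs, for illustration only.
PLACEHOLDER_CURVE = [(0.0, 2.5), (1.0, 3.5), (2.0, 4.6), (4.0, 7.0)]


def intrinsic_fwhm(measured, instrument):
    """Quadrature subtraction, floored at zero when the instrument width dominates."""
    return math.sqrt(max(measured ** 2 - instrument ** 2, 0.0))


def invert_calibration(fwhm, curve):
    """Map an intrinsic FWHM to an equivalent concentration on a monotonic curve.

    Returns (concentration, in_range); values outside the curve are clamped and flagged.
    """
    pts = sorted(curve)
    if fwhm < pts[0][1]:
        return pts[0][0], False
    if fwhm > pts[-1][1]:
        return pts[-1][0], False
    for (c0, w0), (c1, w1) in zip(pts, pts[1:]):
        if w0 <= fwhm <= w1:
            if w1 == w0:
                return c0, True
            return c0 + (fwhm - w0) / (w1 - w0) * (c1 - c0), True
    raise ValueError("calibration FWHM must increase with concentration")


def density_map(fwhm_map, instrument_fwhm, curve):
    """Per-pixel (concentration, in_range); None where the pixel is masked."""
    return [[None if w is None else invert_calibration(intrinsic_fwhm(w, instrument_fwhm), curve)
             for w in row] for row in fwhm_map]
```

Returning an in_range flag with each value keeps out-of-calibration pixels visible as boundary warnings rather than silently extrapolating them.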

The result is strongest when:

The material is MoS2 or WS2.
E2g is well resolved.
Instrumental broadening is known.
The sample is inside the 0-4% calibrated range.
The chosen region excludes obvious artifacts.

The reported percentage is an equivalent concentration under the calibration model. It is not a universal concentration measurement for every possible defect, strain state, charge state, or multilayer condition.

Defect Type Ranking

classify_defect_type ranks calibrated defect families by matching measured Raman fingerprints against simulated series. It first uses E2g linewidth to estimate concentration on each candidate curve, then compares predicted and observed E2g/A1g centers, linewidths, and asymmetry terms.

Current calibrated families for MoS2 and WS2 include:

Chalcogen vacancy.
Metal vacancy.
S->O substitution.
S->C substitution.

The ranking is a hypothesis over known candidates. A poor residual, mixed spatial map, or strong PL disagreement can indicate strain, doping, thickness changes, unmodelled defect chemistry, or multiple defect populations.
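The two-step matching (estimate concentration from E2g linewidth on each candidate's curve, then score the remaining observables) can be sketched with a normalized squared residual. The sketch makes several assumptions. Curves are sampled on a shared concentration grid, and per-observable tolerances (sigmas) set the weighting. Asymmetry terms are left out. PLACEHOLDER_FAMILIES holds made-up numbers for two of the families, not simulated data, and all names are hypothetical.

```python
def interp(x, xs, ys):
    """Linear interpolation on ascending xs, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("xs must be ascending")


# Made-up curves on a shared concentration grid (%), for illustration only.
PLACEHOLDER_FAMILIES = {
    "chalcogen_vacancy": {"conc": [0.0, 2.0, 4.0], "e2g_fwhm": [2.5, 4.0, 5.5],
                          "e2g_center": [385.0, 384.0, 383.0], "a1g_center": [405.0, 404.5, 404.0]},
    "metal_vacancy": {"conc": [0.0, 2.0, 4.0], "e2g_fwhm": [2.5, 3.5, 4.5],
                      "e2g_center": [385.0, 384.6, 384.2], "a1g_center": [405.0, 403.5, 402.0]},
}


def rank_families(observed, families, sigmas):
    """Return (family, concentration, chi2) tuples sorted best-first."""
    results = []
    for name, curve in families.items():
        # Step 1: concentration from E2g linewidth on this family's own curve.
        conc = interp(observed["e2g_fwhm"], curve["e2g_fwhm"], curve["conc"])
        # Step 2: compare the other observables predicted at that concentration.
        chi2 = sum(((observed[key] - interp(conc, curve["conc"], curve[key])) / s) ** 2
                   for key, s in sigmas.items())
        results.append((name, conc, chi2))
    return sorted(results, key=lambda r: r[2])
```

A large best-case chi2, or a small gap between the top two candidates, is the kind of weak-discrimination signal the paragraph above warns about.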

PL Activity Score

estimate_defect_activity combines three PL-derived channels into a 0-1 score:

Quenching, weighted most strongly, captures PL yield loss from non-radiative recombination.
Trion enhancement captures charged-defect behavior.
Sub-gap emission captures deep states below the band edge.

Use this score to distinguish structural disorder from electronically harmful defects. A sample can have measurable Raman disorder without the same regions being strongly PL-active.
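A minimal form of such a score is a weighted average of the three channels, each already scaled to [0, 1]. The weights below are placeholders. The only constraint taken from the text is that quenching carries the largest weight. How each channel is normalized upstream is also an assumption.

```python
def activity_score(quenching, trion_enhancement, subgap_fraction, weights=(0.5, 0.25, 0.25)):
    """Weighted combination of three PL channels, each clipped to [0, 1] (weights are placeholders)."""
    channels = (quenching, trion_enhancement, subgap_fraction)
    clipped = [min(1.0, max(0.0, c)) for c in channels]
    return sum(w * c for w, c in zip(weights, clipped)) / sum(weights)
```

Because the score uses only PL inputs, a pixel can sit in a high-disorder Raman cluster and still score near zero here.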

Region Logic

Region-aware tools share the same spatial vocabulary:

quality: all parse-time valid pixels.
interior: quality pixels eroded away from damaged or missing boundaries.
transition: boundary halo pixels.
damaged: masked or damaged pixels.
all: every pixel.

boundary_buffer_um controls the erosion distance. selection_policy="largest_component" keeps only the largest connected usable island, which is often appropriate for PFIB-processed samples or fragmented flakes. segment_regions is the preview tool for checking these assumptions before running quantitative analysis.
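The region vocabulary can be sketched on a boolean grid. The sketch rests on several assumptions. The buffer is converted to whole pixels by rounding up, erosion uses a disc, and pixels off the grid count as missing. Transition is taken as quality minus interior, the largest-component filter is applied before erosion, and damaged is taken from the original parse mask. build_regions and its helpers are hypothetical names, not segment_regions itself.

```python
import math
from collections import deque


def largest_component(mask):
    """Keep only the largest 4-connected island of True pixels."""
    ny, nx = len(mask), len(mask[0])
    seen = [[False] * nx for _ in range(ny)]
    best = []
    for sy in range(ny):
        for sx in range(nx):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            comp, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                comp.append((y, x))
                for yy, xx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= yy < ny and 0 <= xx < nx and mask[yy][xx] and not seen[yy][xx]:
                        seen[yy][xx] = True
                        queue.append((yy, xx))
            if len(comp) > len(best):
                best = comp
    out = [[False] * nx for _ in range(ny)]
    for y, x in best:
        out[y][x] = True
    return out


def erode(mask, radius_px):
    """Binary erosion with a disc; pixels off the grid count as missing."""
    ny, nx = len(mask), len(mask[0])
    disc = [(dy, dx) for dy in range(-radius_px, radius_px + 1)
            for dx in range(-radius_px, radius_px + 1) if dy * dy + dx * dx <= radius_px ** 2]

    def usable(y, x):
        return 0 <= y < ny and 0 <= x < nx and mask[y][x]

    return [[mask[y][x] and all(usable(y + dy, x + dx) for dy, dx in disc)
             for x in range(nx)] for y in range(ny)]


def build_regions(parse_mask, pixel_um, boundary_buffer_um, selection_policy=None):
    """Return the five named region masks as nested lists of bools."""
    quality = largest_component(parse_mask) if selection_policy == "largest_component" else parse_mask
    interior = erode(quality, math.ceil(boundary_buffer_um / pixel_um))
    ny, nx = len(parse_mask), len(parse_mask[0])

    def grid(f):
        return [[f(y, x) for x in range(nx)] for y in range(ny)]

    return {
        "quality": quality,
        "interior": interior,
        "transition": grid(lambda y, x: quality[y][x] and not interior[y][x]),
        "damaged": grid(lambda y, x: not parse_mask[y][x]),
        "all": grid(lambda y, x: True),
    }
```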

Paired PL And Raman Alignment

When you provide both PL and Raman maps, Matter42 aligns valid coordinates with nearest-neighbor matching. This is intended for maps acquired on the same sample with slightly different grids or cropped extents.

Do not manually merge spectra unless you have a separate registration workflow. Keep the raw maps separate, parse both, and pass the companion dataset ID to the analysis tool.
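Nearest-neighbor pairing of the two coordinate sets can be sketched as a brute-force search with a distance cutoff. The max_distance_um tolerance and the function name are assumptions, not documented parameters. For large maps, a k-d tree such as scipy.spatial.cKDTree is the usual replacement for the O(N*M) loop.

```python
import math


def align_nearest(pl_coords, raman_coords, max_distance_um):
    """Pair each PL pixel with its nearest Raman pixel.

    Returns (pl_index, raman_index, distance_um) for pairs within max_distance_um.
    """
    pairs = []
    for i, p in enumerate(pl_coords):
        j = min(range(len(raman_coords)), key=lambda k: math.dist(p, raman_coords[k]))
        d = math.dist(p, raman_coords[j])
        if d <= max_distance_um:
            pairs.append((i, j, d))
    return pairs
```

The cutoff is what lets cropped extents work: PL pixels with no Raman counterpart nearby are dropped rather than matched to a distant edge pixel.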

Calibration Limits

Quantitative Raman tools depend on calibration data from MLIP Raman projections. The current density and defect-type workflows are calibrated for MoS2 and WS2 over roughly 0-4% defect concentration.

Important caveats:

Neutral-lattice simulations do not capture all charge, doping, or Fano effects.
Strain, temperature, multilayers, alloying, and substrate interactions can shift modes independently of defects.
Instrumental linewidth must be corrected for reliable density estimates.
Extrapolation beyond the calibration range should be treated as a boundary warning, not a precise value.
PL activity is phenomenological and should be interpreted with device geometry, excitation conditions, and sample history.

App Architecture

The web app runs in the browser and calls the Matter42 backend over HTTP. Authenticated requests carry a Supabase session token. Assistant turns stream back as tool calls, tool results, figures, and final text. Plotly renders interactive figures in the browser; parsing, model calls, storage, and tool execution run on backend services.
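For readers wiring up their own client, here is a sketch of the request and stream handling. Every concrete detail in it is an assumption: the /chat route, the JSON payload, sending the Supabase session token as an Authorization: Bearer header, and newline-delimited JSON framing with one type field per event. None of these are Matter42's documented API.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def build_request(base_url: str, access_token: str, message: str) -> urllib.request.Request:
    """Hypothetical chat request; route, payload and auth header form are assumptions."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",  # hypothetical route
        data=body,
        headers={"Authorization": f"Bearer {access_token}", "Content-Type": "application/json"},
        method="POST",
    )


def iter_events(lines: Iterable[bytes]) -> Iterator[dict]:
    """Parse newline-delimited JSON events (assumed framing), skipping blank lines."""
    for raw in lines:
        line = raw.strip()
        if line:
            yield json.loads(line)


def dispatch(events: Iterable[dict]) -> dict:
    """Group events by their (assumed) 'type' field: tool calls, results, figures, text."""
    buckets = {"tool_call": [], "tool_result": [], "figure": [], "text": []}
    for ev in events:
        buckets.setdefault(ev.get("type", "unknown"), []).append(ev)
    return buckets
```

With a real endpoint, the stream would be consumed as `with urllib.request.urlopen(req) as resp: buckets = dispatch(iter_events(resp))`, since an HTTP response iterates line by line.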
