Data Guide

Matter42 treats every upload as a project file first. When the agent parses a file, it creates a normalized dataset with a dataset_id; every later tool call should refer to that ID rather than re-reading the raw file.

Supported Uploads

| File type | Examples | What Matter42 creates |
| --- | --- | --- |
| LabSpec hyperspectral maps | .txt Raman or PL maps with spatial coordinates and spectra | hyperspectral_map dataset with x/y grid, spectral axis, spectra cube, quality mask, and measurement metadata when available |
| Single spectra | LabSpec .txt, two-column .csv or .tsv | single_spectrum dataset with spectral axis and intensity |
| Numeric tables | .csv, .tsv, .xlsx, .xls | tabular dataset with numeric columns and optional primary x/y columns |
| Documents | .pdf, .md, .markdown, .mdx, .rst, plain .txt | document dataset for notes, paper excerpts, protocols, or metadata |
| Images | .png, .jpg, .jpeg, .tif, .tiff, .bmp, .gif | image dataset with optional modality and pixel-size metadata |
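
Several extensions in the table map to more than one dataset type (a .txt file can be a hyperspectral map, a single spectrum, or a plain document), so the extension alone cannot settle the type. The sketch below is only an illustration of that ambiguity: the lookup table and the function name candidate_dataset_types are hypothetical and are not part of Matter42. The parser presumably has to inspect file contents to choose among the candidates.

```python
from pathlib import Path

# Hypothetical lookup built from the table above; not Matter42's actual code.
CANDIDATES = {
    ".txt": ["hyperspectral_map", "single_spectrum", "document"],
    ".csv": ["single_spectrum", "tabular"],
    ".tsv": ["single_spectrum", "tabular"],
    ".xlsx": ["tabular"],
    ".xls": ["tabular"],
    ".pdf": ["document"],
    ".md": ["document"],
    ".markdown": ["document"],
    ".mdx": ["document"],
    ".rst": ["document"],
    ".png": ["image"],
    ".jpg": ["image"],
    ".jpeg": ["image"],
    ".tif": ["image"],
    ".tiff": ["image"],
    ".bmp": ["image"],
    ".gif": ["image"],
}


def candidate_dataset_types(filename: str) -> list:
    """Return the dataset types the table allows for this file extension.

    An empty list means the extension is not in the table, so the file
    would need the guided parse_upload path instead.
    """
    ext = Path(filename).suffix.lower()
    return list(CANDIDATES.get(ext, []))


if __name__ == "__main__":
    for name in ("map_01.txt", "anneal_series.xlsx", "cube.npy"):
        print(name, candidate_dataset_types(name))
```

Where more than one candidate comes back, a data_type hint or a short description in the request can remove the guesswork.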

Unsupported or exotic formats, such as .mat, .npy, custom HDF5, multimodal bundles, or OCR-required images, need the guided parse_upload path. That path inspects the file outside the model context, writes a canonical HDF5 envelope, and registers it with register_dataset.

Parser Hints

The agent can pass hints to parse_data when the file or sample needs extra context:

| Hint | Use it when |
| --- | --- |
| data_type="raman" or data_type="pl" | The spectral axis is ambiguous or the filename does not identify the measurement |
| boundary_buffer_um=2.0 | PFIB milling, etched holes, tears, or damaged boundaries should be excluded from interior analysis |
| description="..." | You want the project file to carry sample prep, dose, anneal, or instrument notes |
| primary_x / primary_y | A table has obvious plotting axes |
| modality="sem", "tem", "optical", or "afm" | An image should be interpreted as a known microscopy modality |
| pixel_size_um=... | Image measurements need a physical scale |
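
As a rough illustration of how these hints fit together, the sketch below collects them into a dictionary and rejects values the table does not list. The helper name build_hints and its validation rules are assumptions made for this example. The guide does not document the actual parse_data signature, so treat this as a way to sanity-check hints before asking the agent to use them, not as the tool's API.

```python
# Hint names and allowed values are taken from the table above.
# build_hints itself is a hypothetical helper, not a Matter42 function.
KNOWN_HINTS = {
    "data_type", "boundary_buffer_um", "description",
    "primary_x", "primary_y", "modality", "pixel_size_um",
}
ALLOWED_DATA_TYPES = {"raman", "pl"}
ALLOWED_MODALITIES = {"sem", "tem", "optical", "afm"}


def build_hints(**hints):
    """Validate parser hints and drop any that were left as None."""
    unknown = set(hints) - KNOWN_HINTS
    if unknown:
        raise ValueError(f"unknown hints: {sorted(unknown)}")

    cleaned = {key: value for key, value in hints.items() if value is not None}

    if "data_type" in cleaned and cleaned["data_type"] not in ALLOWED_DATA_TYPES:
        raise ValueError("data_type must be 'raman' or 'pl'")
    if "modality" in cleaned and cleaned["modality"] not in ALLOWED_MODALITIES:
        raise ValueError("modality must be 'sem', 'tem', 'optical', or 'afm'")
    # Assumption: a zero buffer is allowed, a negative one is not.
    if "boundary_buffer_um" in cleaned and cleaned["boundary_buffer_um"] < 0:
        raise ValueError("boundary_buffer_um must be non-negative")
    # Assumption: a physical pixel size must be strictly positive.
    if "pixel_size_um" in cleaned and cleaned["pixel_size_um"] <= 0:
        raise ValueError("pixel_size_um must be positive")
    return cleaned


if __name__ == "__main__":
    print(build_hints(data_type="raman", boundary_buffer_um=2.0,
                      description="PFIB-milled flake, 532 nm laser"))
```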

For Raman density estimates, also tell the agent your instrumental linewidth if you know it:

"Use instrument_fwhm=1.6 cm^-1 for the density and classification tools."

If you do not provide it, the backend can sometimes infer an instrument contribution from the Si peak or from instrument metadata, but explicit calibration is better for quantitative work.

What Gets Extracted

For hyperspectral maps, Matter42 stores the spectra as a spatial cube and derives maps from physics-relevant windows. Raman maps focus on E2g and A1g features: intensity, peak center, FWHM, ratio, and asymmetry where measurable. PL maps focus on total intensity, peak position, FWHM, asymmetry, quenching, trion/exciton balance, and sub-gap emission.

The parser also creates a quality mask. Downstream region-aware tools can further split pixels into:

  • quality: parse-time valid pixels.
  • interior: valid pixels eroded away from damaged boundaries.
  • transition: boundary halo pixels.
  • damaged: masked or artifact regions.
  • all: every pixel, including masked pixels.
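
The region split above can be pictured with a small sketch. It assumes the quality mask is a 2D grid of booleans, that "interior" means valid pixels with no masked pixel within a square neighbourhood of buffer_px pixels, and that "transition" is whatever valid pixels remain. Those definitions, the function name split_regions, and the choice not to erode at the outer map edge are assumptions for illustration. The guide does not specify how the backend computes these regions.

```python
def split_regions(quality, buffer_px):
    """Split a 2D boolean quality mask into named region masks.

    quality:   list of rows, True where the pixel was valid at parse time.
    buffer_px: erosion distance in pixels (for example, boundary_buffer_um
               divided by the map step size, rounded up).
    """
    rows, cols = len(quality), len(quality[0])

    def near_masked(r, c):
        # True if any masked pixel lies within buffer_px in row and column.
        for dr in range(-buffer_px, buffer_px + 1):
            for dc in range(-buffer_px, buffer_px + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and not quality[rr][cc]:
                    return True
        return False

    interior = [[quality[r][c] and not near_masked(r, c) for c in range(cols)]
                for r in range(rows)]
    transition = [[quality[r][c] and not interior[r][c] for c in range(cols)]
                  for r in range(rows)]
    damaged = [[not quality[r][c] for c in range(cols)] for r in range(rows)]
    everything = [[True] * cols for _ in range(rows)]

    return {
        "quality": [row[:] for row in quality],
        "interior": interior,
        "transition": transition,
        "damaged": damaged,
        "all": everything,
    }


if __name__ == "__main__":
    # 5x5 map with one masked pixel in the centre, eroded by one pixel.
    mask = [[True] * 5 for _ in range(5)]
    mask[2][2] = False
    regions = split_regions(mask, buffer_px=1)
    for name, grid in regions.items():
        print(name, sum(sum(row) for row in grid))
```

Under these assumptions, interior and transition together make up quality, and quality and damaged together make up all, which is a useful check when comparing statistics across regions.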

Data Hygiene

Upload raw files when possible. Do not paste large spectra or maps into chat; attach the file so bytes stay in storage and only paths, URLs, dataset IDs, and summaries enter the conversation.

Keep matched Raman and PL maps as separate uploads. Ask the agent to parse both and pass the second dataset as an auxiliary input. The analysis tools handle grid overlap and nearest-neighbor alignment.
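
To show what nearest-neighbor alignment between two maps involves, here is a minimal brute-force sketch. It pairs each pixel position in a primary map with the closest position in an auxiliary map and leaves the pixel unmatched when nothing lies within a tolerance. The function name align_nearest, the max_dist_um parameter, and the coordinate lists are hypothetical. The analysis tools' actual alignment method is not described in this guide.

```python
import math


def align_nearest(primary_xy, aux_xy, max_dist_um):
    """Match each primary (x, y) position to the nearest auxiliary position.

    Returns one entry per primary point: the index into aux_xy of the
    closest point, or None when no auxiliary point lies within max_dist_um
    (that is, the two grids do not overlap there).
    """
    matches = []
    for px, py in primary_xy:
        best_idx, best_dist = None, None
        for i, (ax, ay) in enumerate(aux_xy):
            dist = math.hypot(ax - px, ay - py)
            if dist <= max_dist_um and (best_dist is None or dist < best_dist):
                best_idx, best_dist = i, dist
        matches.append(best_idx)
    return matches


if __name__ == "__main__":
    # Hypothetical stage coordinates in micrometres.
    raman_xy = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
    pl_xy = [(0.1, 0.0), (0.9, 0.1), (2.0, 2.0)]
    print(align_nearest(raman_xy, pl_xy, max_dist_um=0.5))
```

The double loop is quadratic in the number of pixels, which is fine for a small illustration. A spatial index would be the usual choice for full-size maps.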

For maps with sample damage, add experimental context up front: PFIB dose, boundary buffer, known etched regions, multilayer regions, laser wavelength, grating, and whether the map includes a Si calibration peak.

Good Data Questions

  • "Is this Raman map high-defect or low-defect, and where are the damaged regions?"
  • "Estimate vacancy percentage from E2g broadening, using only the interior region."
  • "Which defect family best matches the Raman fingerprint?"
  • "Do PL-quenched regions spatially overlap with Raman-broadened regions?"
  • "Show me the mean spectrum and annotate peaks."
  • "Compare these two annealing conditions using the same region settings."
  • "For this image/table/document, summarize the usable metadata and suggest what analysis is possible."

Copyright © 2026 Matter42. All rights reserved.