Open data platform for biology
Enable learning at scale with an open-source API for everything: lakehouse, lineage, feature store, ontologies, LIMS, and ELN.

Trace data & code
Know where a dataset or model came from and what it's used for. Track lineage with a single line of code in notebooks, scripts & workflows across Python, R, Nextflow, and the shell.
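To make "where a dataset came from and what it's used for" concrete, here is a minimal, hypothetical sketch of the data structure behind lineage: runs of notebooks, scripts, or workflows that consume input artifacts and produce output artifacts. The `Run` and `LineageGraph` names and the file names are illustrative assumptions, not the platform's API. Here every run is recorded by hand; the platform describes capturing this with a single line of code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Run:
    """One execution of a notebook, script, or workflow (hypothetical model)."""
    name: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


class LineageGraph:
    """Hypothetical lineage store: which run produced each artifact, and which runs consumed it."""

    def __init__(self) -> None:
        self.runs: list[Run] = []
        self.produced_by: dict[str, Run] = {}

    def record(self, run: Run) -> None:
        self.runs.append(run)
        for out in run.outputs:
            self.produced_by[out] = run

    def upstream(self, artifact: str) -> set[str]:
        """Every artifact that `artifact` was derived from, transitively ("where it came from")."""
        seen: set[str] = set()
        stack = [artifact]
        while stack:
            run = self.produced_by.get(stack.pop())
            if run is None:
                continue  # raw input: nothing recorded upstream of it
            for inp in run.inputs:
                if inp not in seen:
                    seen.add(inp)
                    stack.append(inp)
        return seen

    def downstream(self, artifact: str) -> set[str]:
        """Every artifact derived from `artifact`, transitively ("what it's used for")."""
        seen: set[str] = set()
        stack = [artifact]
        while stack:
            current = stack.pop()
            for run in self.runs:
                if current in run.inputs:
                    for out in run.outputs:
                        if out not in seen:
                            seen.add(out)
                            stack.append(out)
        return seen


# Usage: a QC notebook filters raw data, then a training script fits a model.
g = LineageGraph()
g.record(Run("qc.ipynb", inputs=["raw.h5ad"], outputs=["filtered.h5ad"]))
g.record(Run("train.py", inputs=["filtered.h5ad", "labels.parquet"], outputs=["model.pt"]))
```

With these two runs recorded, `g.upstream("model.pt")` walks back to `filtered.h5ad`, `labels.parquet`, and `raw.h5ad`, while `g.downstream("raw.h5ad")` finds `filtered.h5ad` and `model.pt`.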

Query datasets at scale
Query and batch-load datasets with lakehouse support for parquet, AnnData, SpatialData, zarr, tiledbsoma, and more. Manage their features & schemas with Postgres or SQLite.

Lakehouse-native sheets
Build sheets using the same features and schemas that manage your datasets in storage. Use one Python/R class for your LIMS: experiments, samples, datasets, models, notes, reports, and more. Ontologies and change management are built in.

Validate & annotate datasets
Use schemas to enforce consistency across sheets, parquet, AnnData, SpatialData, zarr, tiledbsoma, and more. Annotate datasets with a single line of code.
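As a rough illustration of what enforcing a schema means, here is a minimal, hypothetical sketch. Each feature declares a name, an expected type, and optionally a controlled vocabulary (the role an ontology would play). Rows that violate the schema are reported instead of being silently accepted. `Feature` and `validate` are illustrative assumptions, not the platform's API, and the plain list of dicts stands in for a sheet or DataFrame.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A named column with an expected Python type and optional allowed values (hypothetical)."""
    name: str
    dtype: type
    allowed: frozenset | None = None


def validate(rows: list[dict], schema: list[Feature]) -> list[str]:
    """Return human-readable errors; an empty list means every row conforms to the schema."""
    errors: list[str] = []
    for i, row in enumerate(rows):
        for feature in schema:
            if feature.name not in row:
                errors.append(f"row {i}: missing '{feature.name}'")
                continue
            value = row[feature.name]
            if not isinstance(value, feature.dtype):
                errors.append(
                    f"row {i}: '{feature.name}' should be {feature.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
            elif feature.allowed is not None and value not in feature.allowed:
                errors.append(f"row {i}: '{feature.name}' has unknown value {value!r}")
    return errors


# Usage: a small controlled vocabulary for cell types plus an integer QC metric.
schema = [
    Feature("cell_type", str, frozenset({"T cell", "B cell", "monocyte"})),
    Feature("n_genes", int),
]
rows = [
    {"cell_type": "T cell", "n_genes": 1200},
    {"cell_type": "T-cell", "n_genes": 950},   # spelling not in the vocabulary
    {"cell_type": "B cell", "n_genes": "800"},  # number stored as text
]
errors = validate(rows, schema)
```

Here `errors` flags the misspelled cell type in row 1 and the string-typed count in row 2. Those are the kinds of inconsistencies that drift across sheets and files when nothing enforces a shared schema.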

Administer with ease while staying in control
Manage fine-grained access with SaaS-like simplicity while keeping direct admin control over your Postgres database and object storage.

Build your organization's long-term memory
Your data, models, and reports are connected automatically during regular operations so that your team and agents can learn & improve.
