Open data platform for biology
Enable learning at scale with an open-source lakehouse built for biology and data lineage.

Trace data & code
Always know where a dataset came from and what it's used for. Capture data lineage in interactive analyses & scripts with a simple function call.
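
As a rough illustration of what capturing lineage with a single call can look like, here is a minimal in-memory sketch using only the Python standard library. It is not the platform's API: `LineageTracker`, `Run`, `track`, `produced_by`, `used_by`, and the file names are all hypothetical, chosen only to show how recording inputs and outputs per run answers "where did this come from" and "what is it used for".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Run:
    """One execution of a script or notebook (hypothetical record type)."""

    script: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


class LineageTracker:
    """Toy in-memory lineage store; every name here is illustrative."""

    def __init__(self) -> None:
        self.runs: list[Run] = []

    def track(self, script: str) -> Run:
        # The single call that opens a run; reads and writes are attached to it.
        run = Run(script)
        self.runs.append(run)
        return run

    def produced_by(self, dataset: str) -> list[str]:
        # Answers: where did this dataset come from?
        return [r.script for r in self.runs if dataset in r.outputs]

    def used_by(self, dataset: str) -> list[str]:
        # Answers: what is this dataset used for?
        return [r.script for r in self.runs if dataset in r.inputs]


tracker = LineageTracker()

run = tracker.track("normalize_counts.py")
run.inputs.append("raw_counts.parquet")
run.outputs.append("normalized_counts.parquet")

run = tracker.track("train_model.py")
run.inputs.append("normalized_counts.parquet")
run.outputs.append("model.pt")

print(tracker.produced_by("normalized_counts.parquet"))
print(tracker.used_by("normalized_counts.parquet"))
```

A real system would persist these records and attach them automatically on load and save; the sketch only shows the bookkeeping.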
Manage datasets at scale
Query flexibly across storage and databases with a biology-aware lakehouse that goes beyond tables.
Manage registries
One Python class handles everything: experiments, samples, datasets, models, and more. Built on the Django ORM with ontology support.
Validate & annotate datasets
Use schemas to enforce consistency. Annotate datasets with a few lines of code.
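
To make schema enforcement concrete, the following is a small standard-library sketch, not the platform's own validation API. The `SCHEMA` layout, the `validate` helper, and the column names are assumptions made up for this example; the idea is only that a schema declares expected types and permitted values, and validation reports every record that deviates.

```python
from __future__ import annotations

# Hypothetical schema: column name -> (expected Python type, allowed values or None).
SCHEMA = {
    "sample_id": (str, None),
    "cell_type": (str, {"T cell", "B cell", "monocyte"}),
    "concentration_nM": (float, None),
}


def validate(records: list[dict], schema: dict) -> list[str]:
    """Return human-readable errors; an empty list means the records conform."""
    errors = []
    for i, record in enumerate(records):
        for column, (expected_type, allowed) in schema.items():
            if column not in record:
                errors.append(f"row {i}: missing column '{column}'")
                continue
            value = record[column]
            if not isinstance(value, expected_type):
                errors.append(f"row {i}: '{column}' should be {expected_type.__name__}")
            elif allowed is not None and value not in allowed:
                errors.append(f"row {i}: '{column}' has unknown value '{value}'")
    return errors


records = [
    {"sample_id": "S1", "cell_type": "T cell", "concentration_nM": 10.0},
    {"sample_id": "S2", "cell_type": "T-cell", "concentration_nM": 5.0},
]

errors = validate(records, SCHEMA)
print(errors)
```

Collecting all errors instead of stopping at the first lets a curator fix a dataset in one pass, for example by mapping the misspelled "T-cell" to the permitted term.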
Leverage distributed infrastructure
Create & connect databases with ease. Transfer data between them with zero-copy.
Build your organization's long-term memory
Transform artifacts into more useful representations: queryable datasets, predictive models, and analytical insights.
