lamindb

Open-source data platform for biology.

LaminDB assumes that data is stored in scalable storage backends: typically array formats like parquet, zarr, HDF5, TileDB, or DuckDB, or simple files in object storage.

LaminDB helps you manage these data using registries for metadata.

The two most important registries are:

File()

Files: immutable data batches.

Dataset()

Datasets: mutable collections of data batches.
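To make the distinction between immutable batches and mutable collections concrete, here is a minimal conceptual sketch in plain Python. It does not use the lamindb API: `FileSketch` and `DatasetSketch` are hypothetical stand-ins, and the field names (`key`, `checksum`, `name`, `files`) are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field, FrozenInstanceError


# Hypothetical stand-ins for illustration only; these are NOT the lamindb classes.
@dataclass(frozen=True)
class FileSketch:
    """An immutable data batch: once created, its fields cannot change."""
    key: str       # assumed: a storage key such as a path inside a bucket
    checksum: str  # assumed: a content hash identifying the batch


@dataclass
class DatasetSketch:
    """A mutable collection of data batches."""
    name: str
    files: list = field(default_factory=list)

    def add(self, file: FileSketch) -> None:
        # The collection changes; the batches it holds do not.
        self.files.append(file)


batch1 = FileSketch(key="raw/batch1.parquet", checksum="a1b2")
batch2 = FileSketch(key="raw/batch2.parquet", checksum="c3d4")

dataset = DatasetSketch(name="experiment-1")
dataset.add(batch1)
dataset.add(batch2)

# Attempting to modify a batch raises, because the dataclass is frozen.
try:
    batch1.checksum = "changed"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

In this sketch, appending to the dataset succeeds while modifying a batch fails, which mirrors the immutable/mutable split described above.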

Four registries track provenance of data batches:

Transform()

Transforms of files & datasets.

Run()

Runs of transforms.

User()

Users: humans and bots.

Storage()

Storage locations: S3/GCP buckets or local directories.
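One way to picture how these four registries fit together is as linked records: a data batch points to the run that produced it, the run points to its transform and user, and the batch lives in a storage location. The sketch below is plain Python with hypothetical classes and field names; it is not the lamindb API, and the real registries may relate to each other differently.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


# Hypothetical stand-ins for illustration only; NOT the lamindb classes.
@dataclass(frozen=True)
class UserSketch:
    handle: str  # a human or a bot


@dataclass(frozen=True)
class StorageSketch:
    root: str  # e.g. a bucket URL or a local directory


@dataclass(frozen=True)
class TransformSketch:
    name: str
    version: str


@dataclass(frozen=True)
class RunSketch:
    transform: TransformSketch
    run_by: UserSketch
    started_at: datetime


@dataclass(frozen=True)
class TrackedFileSketch:
    key: str
    storage: StorageSketch
    run: RunSketch


def describe_provenance(file: TrackedFileSketch) -> str:
    """Summarize where a batch lives and which run of which transform made it."""
    run = file.run
    return (
        f"{file.storage.root}/{file.key} <- "
        f"{run.transform.name} v{run.transform.version} "
        f"(run by {run.run_by.handle})"
    )


user = UserSketch(handle="testuser1")
storage = StorageSketch(root="s3://my-bucket")
transform = TransformSketch(name="normalize-counts", version="1")
run = RunSketch(transform=transform, run_by=user, started_at=datetime.now(timezone.utc))
file = TrackedFileSketch(key="processed/batch1.parquet", storage=storage, run=run)

summary = describe_provenance(file)
```

Following the links from a batch back through its run answers the usual provenance questions: what produced it, who ran it, and where it is stored.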

Four registries validate & contextualize data batches:

ULabel()

Universal label ontology.

Feature()

Dimensions of measurement.

FeatureSet()

Jointly measured sets of features.

Modality()

Measurement types of features.
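As a rough illustration of what validating a data batch against registries can mean, the following plain-Python sketch checks a batch's column names against a set of registered features and its label values against a set of registered labels. Everything here is hypothetical (`FeatureSketch`, `validate_columns`, `validate_labels`, and the example names); it does not show the lamindb API or its actual validation logic.

```python
from dataclasses import dataclass


# Hypothetical stand-in for illustration only; NOT the lamindb Feature class.
@dataclass(frozen=True)
class FeatureSketch:
    name: str  # a dimension of measurement, e.g. a column name
    type: str  # assumed: a simple type tag such as "category" or "float"


def validate_columns(columns, features):
    """Split column names into those matching a registered feature and the rest."""
    known = {f.name for f in features}
    validated = [c for c in columns if c in known]
    unknown = [c for c in columns if c not in known]
    return validated, unknown


def validate_labels(values, labels):
    """Return the values that are not in the registered label set."""
    return [v for v in values if v not in labels]


# Assumed registry contents for the example.
features = [
    FeatureSketch(name="cell_type", type="category"),
    FeatureSketch(name="expression", type="float"),
]
labels = {"T cell", "B cell"}

validated, unknown = validate_columns(["cell_type", "expression", "batch_id"], features)
unregistered = validate_labels(["T cell", "B cell", "T-cell"], labels)

# A jointly measured set of features can be modeled as the set of validated names.
feature_set = frozenset(validated)
```

Here the unknown column and the misspelled label are flagged rather than silently accepted, which is the general idea behind validating and contextualizing batches with shared registries.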

Functional tools:

track([transform, new_run, reference, ...])

Track global Transform & Run for a notebook or pipeline.

view([n, schema, registries])

View data.

save(records, **kwargs)

Bulk save to registries & storage.

Static classes & modules:

settings

Global Settings.

setup

Setup & configure LaminDB.

schema

Schema tools & overview.

dev

Developer API.