Siloed object stores, SQL databases & ELN/LIMS systems often accumulate inaccessible, hard-to-integrate data, which degrades the analytical insights derived from it.

LaminDB’s features aim to address key problems underlying this tendency, taking inspiration from a number of data tools.

For data users

  • Unify access to data & metadata across storage (arrays, files) & SQL database backends:

    • Query by & search for anything: filter, search

    • Stage, load or stream files & datasets: stage, load, backed

    • Model data schema-less or schema-full, mount custom schema plug-ins & manage schema migrations (schemas)

    • Organize data around learning: Feature, FeatureSet, ULabel, Modality

    • Leverage support for common array formats in memory & storage: DataFrame, AnnData, MuData, pyarrow.Table backed by parquet, zarr, TileDB, HDF5, h5ad, DuckDB

    • Bridge immutable data artifacts (File) and data warehousing (Dataset)

  • Track data flow across notebooks, pipelines & UI: track(), Transform & Run

  • Manage registries for experimental metadata & ontologies in a simple database

  • Validate, standardize & annotate data batches

  • Create DB instances within seconds and share data across a mesh of instances (setup)
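
To make "validate, standardize & annotate data batches" concrete, here is a minimal plain-Python sketch of the idea. It does not use LaminDB's API: the in-memory `REGISTRY` and the `validate` and `standardize` helpers are hypothetical names chosen for illustration, and a real registry would live in the SQL database.

```python
# Hypothetical in-memory registry: canonical name -> known synonyms.
# This is an illustration only, not how LaminDB stores registries.
REGISTRY = {
    "T cell": {"T-cell", "T lymphocyte"},
    "B cell": {"B-cell", "B lymphocyte"},
}


def validate(values):
    """Return one boolean per value: True if it is a canonical registry name."""
    return [v in REGISTRY for v in values]


def standardize(values):
    """Map known synonyms to canonical names; leave unknown values unchanged."""
    canonical = {name.lower(): name for name in REGISTRY}
    synonym_to_name = {
        syn.lower(): name for name, syns in REGISTRY.items() for syn in syns
    }
    out = []
    for v in values:
        key = v.lower()
        out.append(canonical.get(key) or synonym_to_name.get(key) or v)
    return out


batch = ["T cell", "B-cell", "t lymphocyte", "Tcell"]
print(validate(batch))               # which values are already canonical
standardized = standardize(batch)
print(standardized)                  # synonyms mapped to canonical names
print(validate(standardized))        # remaining failures need manual inspection
```

In this sketch, a value that is neither a canonical name nor a known synonym (such as "Tcell") passes through unchanged, so it still fails validation and can be inspected by hand.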

For platform builders

  • Zero lock-in: LaminDB runs on generic backends server-side and is not a client for “Lamin Cloud”

    • Flexible storage backends (local, S3, GCP, anything fsspec supports)

    • Currently two SQL backends for managing metadata: SQLite & Postgres

  • Scalable: metadata tables support 100s of millions of entries

  • Access management:

    • High-level access management through Lamin’s collaborator roles

    • Fine-grained access management via embedded storage & SQL roles

  • Secure: embedded in your infrastructure (Lamin has no access to your data & metadata)

  • Idempotent & ACID operations

  • File, dataset & transform versioning

  • Safeguards against typos & duplications when populating registries

  • Tested & typed (except for Django Model fields, for which typing is still to come)
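
The "idempotent & ACID" and "safeguards against typos & duplications" points can be illustrated with a small standard-library sketch. It is not LaminDB's implementation: the table layout, the `register_file` and `add_label` helpers and the 0.85 similarity cutoff are all assumptions made for this example, using SQLite as the metadata backend.

```python
import difflib
import hashlib
import sqlite3

# Hypothetical two-table schema, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (hash TEXT PRIMARY KEY, description TEXT)")
conn.execute("CREATE TABLE label (name TEXT PRIMARY KEY)")


def register_file(content: bytes, description: str) -> str:
    """Register content under its SHA-256 hash; repeating the call is a no-op."""
    digest = hashlib.sha256(content).hexdigest()
    with conn:  # one transaction: commit on success, roll back on exception
        conn.execute(
            "INSERT OR IGNORE INTO file (hash, description) VALUES (?, ?)",
            (digest, description),
        )
    return digest


def add_label(name: str) -> str:
    """Insert a label unless a near-identical one exists; return the name kept."""
    existing = [row[0] for row in conn.execute("SELECT name FROM label")]
    close = difflib.get_close_matches(name, existing, n=1, cutoff=0.85)
    if close:
        return close[0]  # likely a typo or duplicate: reuse the existing record
    with conn:
        conn.execute("INSERT INTO label (name) VALUES (?)", (name,))
    return name


h1 = register_file(b"ACGT", "toy sequence")
h2 = register_file(b"ACGT", "toy sequence, registered again")
n_files = conn.execute("SELECT COUNT(*) FROM file").fetchone()[0]
first = add_label("fibroblast")
second = add_label("fibroblats")  # typo of the label above
n_labels = conn.execute("SELECT COUNT(*) FROM label").fetchone()[0]
print(h1 == h2, n_files, first, second, n_labels)
```

Keying the record on a content hash makes registration idempotent, while the fuzzy lookup before insertion is one simple way to catch near-duplicates; a production system would more likely warn the user than silently reuse the existing record.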


Unlike GitHub & most SaaS platforms, LaminHub by default hosts neither data nor metadata; instead, it connects to distributed storage locations & databases through LaminDB.

Public demo instances to explore in the UI or load using the CLI via lamin load owner/instance (you need an account):

See validated files & arrays in context of ontologies & experimental metadata:

Track data flow through pipelines, notebooks & UI:

Upload & register files:

Browse files: