Key problems of data-heavy R&D.

The complexity of modern R&D data often keeps teams from realizing the scientific progress promised by high-resolution readouts and computation.

Here we list key problems we see and how we think about solving them.

Data cannot be accessed at all




Object storage.

Data in object storage can’t be queried.

Index observations and variables and link them in a query database.
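As an illustration of indexing observations and variables in a query database, here is a minimal sketch using Python's built-in `sqlite3`. It is not LaminDB's API: the table layout, the `build_index` and `objects_measuring` helpers, and the storage keys are hypothetical and chosen only for this example.

```python
import sqlite3


def build_index(catalog):
    """Index observations and variables of stored objects in SQLite.

    `catalog` maps a (hypothetical) object-storage key to a dict with
    "obs" and "var" lists of identifiers found in that object.
    """
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE observation (storage_key TEXT, obs_id TEXT);
        CREATE TABLE variable (storage_key TEXT, var_id TEXT);
        CREATE INDEX idx_obs ON observation (obs_id);
        CREATE INDEX idx_var ON variable (var_id);
        """
    )
    for key, content in catalog.items():
        con.executemany(
            "INSERT INTO observation VALUES (?, ?)",
            [(key, obs_id) for obs_id in content["obs"]],
        )
        con.executemany(
            "INSERT INTO variable VALUES (?, ?)",
            [(key, var_id) for var_id in content["var"]],
        )
    con.commit()
    return con


def objects_measuring(con, var_id):
    """Return the storage keys of all objects that contain a given variable."""
    rows = con.execute(
        "SELECT DISTINCT storage_key FROM variable "
        "WHERE var_id = ? ORDER BY storage_key",
        (var_id,),
    )
    return [row[0] for row in rows]


# Hypothetical catalog of two objects sitting in object storage.
catalog = {
    "s3://bucket/run1.h5ad": {"obs": ["sample_a", "sample_b"], "var": ["CD8A", "CD4"]},
    "s3://bucket/run2.h5ad": {"obs": ["sample_c"], "var": ["CD4", "FOXP3"]},
}
con = build_index(catalog)
print(objects_measuring(con, "CD4"))
```

Once the index exists, the question "which stored objects measured this variable?" becomes a database query instead of a scan over every file.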

Pile of data.

Data can’t be accessed because it is unstructured and siloed in fragmented infrastructure.

Structure data both by biological entities and by provenance with one interface across storage and database backends.

Data cannot be accessed at scale




Anecdotal data.

Data can’t be accessed at scale as no viable programmatic interfaces exist.

API-first platform.

Cross-storage integration.

Molecular (high-dimensional) data can’t be efficiently integrated with phenotypic (low-dimensional) data.

Index molecular data with the same biological entities as phenotypic data. Provide connectors for low-dimensional data management systems (ELNs and LIMS).

Scientific results are not solid




Stand on solid ground.

Key analytics results cannot be linked to supporting data as too many processing steps are involved.

Provide full data provenance.
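One way to picture full data provenance is a lineage record that stores, for every derived object, the transform that produced it and its inputs. The sketch below is plain Python under that assumption, not LaminDB's implementation; the `lineage` mapping, the `trace_back` helper, and all file and script names are hypothetical.

```python
def trace_back(lineage, result):
    """Walk from a result back to raw data.

    `lineage` maps an object name to (transform, inputs). Objects without
    an entry are treated as raw data with no recorded producer.
    Returns the processing steps in the order they are encountered.
    """
    steps, stack, seen = [], [result], set()
    while stack:
        name = stack.pop()
        if name in seen or name not in lineage:
            continue  # already visited, or raw data
        seen.add(name)
        transform, inputs = lineage[name]
        steps.append((name, transform, inputs))
        stack.extend(reversed(inputs))
    return steps


# Hypothetical lineage: a figure derived from raw counts in three steps.
lineage = {
    "umap.png": ("plot_umap.py", ["clusters.parquet"]),
    "clusters.parquet": ("cluster.py", ["normalized.h5ad"]),
    "normalized.h5ad": ("normalize.py", ["raw_counts.h5ad"]),
}

for output, transform, inputs in trace_back(lineage, "umap.png"):
    print(f"{output} <- {transform} <- {inputs}")
```

With such a record, a key result can be linked to its supporting data no matter how many processing steps lie in between.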

Collaborative science across organizations is hard




Siloed infrastructure.

Data can’t be easily shared across organizations.

Federated collaboration hub on distributed infrastructure.

Siloed semantics.

External data can’t be mapped onto in-house data and vice versa.

Provide a curation and ingestion API and operate on open-source data models that any organization can adopt.

R&D misses opportunities for higher effectiveness




Optimal decision making.

There is no framework for tracking decision making in complex R&D teams.

A graph of the data flow in an R&D team, including scientists, computation, decisions, and predictions. Unlike workflow frameworks, LaminDB creates an emergent graph.

Dry lab is not integrated.

Data platforms offer no adequate interface for the dry lab.

API-first with data scientist needs in mind.

Support learning.

There is no support for the learning-from-data cycle.

Support data models across the full lab cycle, including measured → relevant → derived features. Manage knowledge through rich semantic models that map high-dimensional data.

Standard data platforms lack support for basic R&D operations




Development data.

Data associated with assay development can’t be ingested as data models are too rigid.

Allow partial integrity in LaminDB’s implementation of a data lakehouse: ingest data of any curation level and label it with corresponding QC flags.
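The idea of partial integrity can be sketched as an ingestion step that never rejects a record but attaches QC flags to it. This is a minimal plain-Python illustration, not LaminDB's API; the `KNOWN_CELL_TYPES` vocabulary, the `ingest` helper, and the flag names are assumptions made for the example.

```python
# Hypothetical controlled vocabulary used for validation.
KNOWN_CELL_TYPES = {"T cell", "B cell"}


def ingest(records, registry):
    """Ingest records of any curation level and label each with QC flags."""
    for record in records:
        flags = []
        cell_type = record.get("cell_type")
        if cell_type is None:
            flags.append("missing_cell_type")
        elif cell_type not in KNOWN_CELL_TYPES:
            flags.append("unmapped_cell_type")
        # Store the record either way; only its flags differ.
        registry.append({**record, "qc_flags": flags, "validated": not flags})
    return registry


registry = ingest(
    [
        {"id": "s1", "cell_type": "T cell"},  # fully curated
        {"id": "s2", "cell_type": "t-cell"},  # not in the vocabulary
        {"id": "s3"},                         # annotation missing
    ],
    [],
)
for entry in registry:
    print(entry["id"], entry["validated"], entry["qc_flags"])
```

Downstream queries can then filter on `validated` or on specific flags, so early assay-development data lives next to fully curated data without being confused with it.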

Corrupted data.

Data is often corrupted.

Full provenance allows tracing corruption back to its origin and writing a simple fix, typically in the form of an ingestion constraint.
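An ingestion constraint of this kind could look like the sketch below: once a corruption has been traced to its origin (here, hypothetically, negative counts written by a faulty export), a check at ingestion stops it from recurring. The `non_negative_counts` constraint and the `ingest_checked` helper are illustrative names, not part of LaminDB.

```python
def non_negative_counts(record):
    """Hypothetical constraint: measured counts must never be negative."""
    return all(value >= 0 for value in record["counts"])


CONSTRAINTS = [non_negative_counts]


def ingest_checked(record, registry, constraints=CONSTRAINTS):
    """Append the record to the registry only if every constraint holds."""
    for constraint in constraints:
        if not constraint(record):
            raise ValueError(f"{record['id']} violates {constraint.__name__}")
    registry.append(record)
    return registry


checked = []
ingest_checked({"id": "r1", "counts": [0, 3, 12]}, checked)
try:
    ingest_checked({"id": "r2", "counts": [5, -1]}, checked)
except ValueError as error:
    print(error)
```

Each fix becomes one more entry in the list of constraints, so the platform accumulates guards against the specific corruptions it has already seen.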

Building an R&D data platform is hard




Aligning data models.

Data models are hard to align across interdisciplinary stakeholders.

Lamin’s data model templates cover 90% of cases; the remaining 10% can be configured.


Existing platforms lock organizations into specific cloud infrastructure.

Open-source and multi-cloud stack with zero lock-in danger.

Migrations are a pain.

Migrating data models in a fast-paced R&D environment can be prohibitive.

LaminDB’s schema modules migrate automatically.