Manage biological knowledge#

Important

LaminDB speaks biology through the Bionty schema module.

A basic wetlab schema maps R&D operations.

Let’s review one last concept: knowledge-derived entities.

If tracked data isn’t curated against knowledge-derived standards, data integration often becomes a pain.

LaminDB with the Bionty extension can manage knowledge by providing lookups, curation and knowledge-coupled SQL tables.

lnschema-bionty contains the schema tables for all entities that are available in Bionty.

Let us map data on knowledge to illustrate the process.

import lamindb as ln
import lnschema_bionty as bt

ln.track()
ℹ️ Instance: testuser1/lnbionty-test
ℹ️ User: testuser1
ℹ️ Added notebook: Transform(id='f7F0c2n2Ft1s', v='0', name='knowledge', type=notebook, title='Manage biological knowledge', created_by='DzTjkKse', created_at=datetime.datetime(2023, 3, 30, 23, 17, 34))
ℹ️ Added run: Run(id='XE0faLyeY9bXU4hLhSlV', transform_id='f7F0c2n2Ft1s', transform_v='0', created_by='DzTjkKse', created_at=datetime.datetime(2023, 3, 30, 23, 17, 34))

Tip

You can view the reference database and version currently used for each biological entity in LaminDB:

ln.select(bt.dev.BiontyVersions).join(bt.dev.CurrentBiontyVersions).df()
id  entity      database    database_v   database_url                                        created_by  created_at           updated_at
0   Species     ensembl     release-108  https://ftp.ensembl.org/pub/release-108/mysql/      DzTjkKse    2023-03-30 23:17:09  None
1   Gene        ensembl     release-108  https://ftp.ensembl.org/pub/release-108/mysql/      DzTjkKse    2023-03-30 23:17:09  None
2   Protein     uniprot     2022-04      https://ftp.uniprot.org/pub/databases/uniprot/...   DzTjkKse    2023-03-30 23:17:09  None
3   CellMarker  cellmarker  2.0          http://bio-bigdata.hrbmu.edu.cn/CellMarker/Cel...   DzTjkKse    2023-03-30 23:17:09  None
4   CellLine    clo         2022-03-21   https://data.bioontology.org/ontologies/CLO/su...   DzTjkKse    2023-03-30 23:17:09  None
5   CellType    cl          2023-02-15   http://purl.obolibrary.org/obo/cl/releases/202...   DzTjkKse    2023-03-30 23:17:09  None
6   Tissue      uberon      2023-02-14   http://purl.obolibrary.org/obo/uberon/releases...   DzTjkKse    2023-03-30 23:17:09  None
7   Disease     mondo       2023-02-06   http://purl.obolibrary.org/obo/mondo/releases/...   DzTjkKse    2023-03-30 23:17:09  None
8   Readout     efo         3.48.0       http://www.ebi.ac.uk/efo/releases/v3.48.0/efo.owl   DzTjkKse    2023-03-30 23:17:09  None
9   Phenotype   hp          2023-01-27   https://github.com/obophenotype/human-phenotyp...   DzTjkKse    2023-03-30 23:17:09  None
10  Pathway     pw          7.74         https://data.bioontology.org/ontologies/PW/dow...   DzTjkKse    2023-03-30 23:17:09  None

Lookup ontology ids#

For instance, you can retrieve the cell type ontology id of “gamma delta T cell” through the lookup of the Bionty entity CellType:

ct_lookup = bt.CellType.lookup
ct_lookup.gamma_delta_T_cell
cell_type(ontology_id='CL:0000798', name='gamma-delta T cell')
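
To make the attribute-style access above more concrete, here is a minimal sketch of how such a lookup could be built from ontology term names. It is not Bionty's implementation: the helpers to_attribute and build_lookup and the three-term CELL_TYPES excerpt are assumptions made for illustration, using only ids and names that appear on this page.

```python
import re
from collections import namedtuple

# Toy excerpt of a cell type ontology (ids and names taken from this page).
CELL_TYPES = {
    "CL:0000798": "gamma-delta T cell",
    "CL:0000860": "classical monocyte",
    "CL:0000787": "memory B cell",
}


def to_attribute(name: str) -> str:
    """Turn a term name into a valid Python attribute name (hypothetical helper)."""
    return re.sub(r"\W+", "_", name).strip("_")


def build_lookup(terms: dict):
    """Return a namedtuple with one field per sanitized term name (hypothetical helper)."""
    Term = namedtuple("cell_type", ["ontology_id", "name"])
    Lookup = namedtuple("Lookup", [to_attribute(n) for n in terms.values()])
    return Lookup(*(Term(i, n) for i, n in terms.items()))


toy_lookup = build_lookup(CELL_TYPES)
hit = toy_lookup.gamma_delta_T_cell
print(hit.ontology_id, hit.name)
```

Because the fields are ordinary attributes, tab completion in a notebook can list the available terms, which is the main convenience of this style of lookup.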

See also

See the Bionty documentation for gene name aliasing and other types of lookups.

Create knowledge-derived records#

You can also directly create a record for the CellType table:

bt.CellType(name=ct_lookup.gamma_delta_T_cell.name)
CellType(id='CL:0000798', ontology_id='CL:0000798', name='gamma-delta T cell')

Curate metadata by linking it against knowledge#

Let us link all the biological samples in a cross-tissue scRNA-seq dataset:

adata = ln.dev.datasets.anndata_human_immune_cells()
meta = adata.obs.drop_duplicates(subset=adata.obs.columns)
meta.head()
(index)                                         donor_id  tissue                 cell_type                                          assay      tissue_ontology_term_id  cell_type_ontology_term_id  assay_ontology_term_id
CZINY-0109_CTGGTCTAGTCTGTAC                     D496      blood                  classical monocyte                                 10x 3' v3  UBERON:0000178           CL:0000860                  EFO:0009922
CZI-IA10244332+CZI-IA10244434_CCTTCGACATACTCTT  621B      thoracic lymph node    T follicular helper cell                           10x 5' v2  UBERON:0007644           CL:0002038                  EFO:0009900
Pan_T7935491_CTGGTCTGTACATGTC                   A29       spleen                 memory B cell                                      10x 5' v1  UBERON:0002106           CL:0000787                  EFO:0011025
Pan_T7980367_GGGCATCCAGGTGGAT                   A36       lung                   alveolar macrophage                                10x 5' v1  UBERON:0002048           CL:0000583                  EFO:0011025
Pan_T7935494_ATCATGGTCTACCTGC                   A29       mesenteric lymph node  naive thymus-derived CD4-positive, alpha-beta ...  10x 5' v1  UBERON:0002509           CL:0000895                  EFO:0011025
meta.shape
(514, 7)

Let’s first add all the cell types. The .curate method lets you check whether the passed ids are present in the knowledge table.

It returns a new DataFrame indexed by the curated ids, with a boolean __curated__ column.

celltype_curate = bt.CellType.curate(meta, column="cell_type_ontology_term_id")
✅ 514 terms (100.0%) are mapped.
🔶 0 terms (0.0%) are not mapped.

All 514 terms could be mapped. 🎉
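
As a rough mental model of the check that .curate performs above, the sketch below reimplements it with plain pandas. It is a stand-in under stated assumptions, not the actual .curate code: the function curate_sketch, the set KNOWN_CELL_TYPES and the unmapped id CL:9999999 are hypothetical and exist only for this example.

```python
import pandas as pd

# Toy knowledge table: a few cell type ontology ids that appear on this page.
KNOWN_CELL_TYPES = {"CL:0000860", "CL:0002038", "CL:0000787"}


def curate_sketch(df: pd.DataFrame, column: str, known_ids: set) -> pd.DataFrame:
    """Index df by `column` and flag which ids exist in `known_ids` (hypothetical helper)."""
    curated = df.set_index(column)  # returns a new frame, df stays untouched
    curated["__curated__"] = curated.index.isin(known_ids)
    n_total = len(curated)
    n_mapped = int(curated["__curated__"].sum())
    print(f"{n_mapped} terms ({n_mapped / n_total:.1%}) are mapped.")
    print(f"{n_total - n_mapped} terms ({(n_total - n_mapped) / n_total:.1%}) are not mapped.")
    return curated


toy_meta = pd.DataFrame(
    {
        "cell_type": ["classical monocyte", "memory B cell", "made-up cell"],
        # CL:9999999 is a hypothetical id that is deliberately absent from the toy table.
        "cell_type_ontology_term_id": ["CL:0000860", "CL:0000787", "CL:9999999"],
    }
)
toy_curated = curate_sketch(toy_meta, "cell_type_ontology_term_id", KNOWN_CELL_TYPES)

# Rows that failed the check are the ones to fix before creating records.
unmapped = toy_curated[~toy_curated["__curated__"]]
```

Filtering on the boolean column, as in the last line, is a convenient way to list the ids that still need attention when not every term maps.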

Update the content of the knowledge-managed tables in your SQL database#

Assume there are cell types in this new dataset that are tracked in the ontology (bionty.CellType) but not yet in the corresponding table in your database (bt.CellType).

Let us fix that!

We can go ahead and create records of the CellType table:

celltype_records = [bt.CellType(ontology_id=i) for i in celltype_curate.index.unique()]
celltype_records[:3]
[CellType(id='0r8seCQT', ontology_id='CL:0000860', name='classical monocyte'),
 CellType(id='mflMyS4P', ontology_id='CL:0002038', name='T follicular helper cell'),
 CellType(id='6D2sGpoW', ontology_id='CL:0000787', name='memory B cell')]
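
The list comprehension above iterates over the unique index values because the curated frame has one row per sample, so the same ontology id occurs many times. The following sketch shows the effect on a toy index; toy_index and toy_records are hypothetical names, and plain dicts stand in for the bt.CellType records.

```python
import pandas as pd

# Toy stand-in for celltype_curate.index: one entry per sample, so ids repeat.
toy_index = pd.Index(
    ["CL:0000860", "CL:0002038", "CL:0000860", "CL:0000787", "CL:0002038"],
    name="cell_type_ontology_term_id",
)

# unique() keeps each id once, in order of first appearance.
unique_ids = toy_index.unique()

# Plain dicts stand in for the bt.CellType(ontology_id=...) records.
toy_records = [{"ontology_id": i} for i in unique_ids]
print(len(toy_index), "rows ->", len(toy_records), "records")
```

Deduplicating before creating records avoids building several records for the same ontology term.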

We can do the same for tissues:

tissue_curate = bt.Tissue.curate(meta, column="tissue_ontology_term_id")
✅ 514 terms (100.0%) are mapped.
🔶 0 terms (0.0%) are not mapped.
tissue_records = [bt.Tissue(ontology_id=i) for i in tissue_curate.index.unique()]
tissue_records[:3]
[Tissue(id='oOynQVrG', ontology_id='UBERON:0000178', name='blood'),
 Tissue(id='3PjYAKnU', ontology_id='UBERON:0007644', name='thoracic lymph node'),
 Tissue(id='27IniEkO', ontology_id='UBERON:0002106', name='spleen')]

Finally, let’s add them to the database:

ln.add(celltype_records + tissue_records)  # add all records in one transaction
[CellType(id='0r8seCQT', ontology_id='CL:0000860', name='classical monocyte'),
 CellType(id='mflMyS4P', ontology_id='CL:0002038', name='T follicular helper cell'),
 CellType(id='6D2sGpoW', ontology_id='CL:0000787', name='memory B cell'),
 CellType(id='mBHciTPF', ontology_id='CL:0000583', name='alveolar macrophage'),
 CellType(id='HSkf3Q1R', ontology_id='CL:0000895', name='naive thymus-derived CD4-positive, alpha-beta T cell'),
 CellType(id='4ChG7mFe', ontology_id='CL:0001062', name='effector memory CD8-positive, alpha-beta T cell, terminally differentiated'),
 CellType(id='7kDaS5Ou', ontology_id='CL:0000789', name='alpha-beta T cell'),
 CellType(id='rtDPx6Hx', ontology_id='CL:0000492', name='CD4-positive helper T cell'),
 CellType(id='9XsNkGyV', ontology_id='CL:0000900', name='naive thymus-derived CD8-positive, alpha-beta T cell'),
 CellType(id='hY2x3J1e', ontology_id='CL:0000235', name='macrophage'),
 CellType(id='0PZas9yO', ontology_id='CL:0000940', name='mucosal invariant T cell'),
 CellType(id='oXERjFK4', ontology_id='CL:0001071', name='group 3 innate lymphoid cell'),
 CellType(id='BOpRP2Fh', ontology_id='CL:0000788', name='naive B cell'),
 CellType(id='Bgz4LhGa', ontology_id='CL:0000548', name='animal cell'),
 CellType(id='TWgOU9nt', ontology_id='CL:0000938', name='CD16-negative, CD56-bright natural killer cell, human'),
 CellType(id='NJd24u0f', ontology_id='CL:0000786', name='plasma cell'),
 CellType(id='XbHAnqxj', ontology_id='CL:0000909', name='CD8-positive, alpha-beta memory T cell'),
 CellType(id='gHrHTy92', ontology_id='CL:0000939', name='CD16-positive, CD56-dim natural killer cell, human'),
 CellType(id='gVr0YFoD', ontology_id='CL:0000798', name='gamma-delta T cell'),
 CellType(id='XdmAIFHE', ontology_id='CL:0000990', name='conventional dendritic cell'),
 CellType(id='gAfMcxGY', ontology_id='CL:0001203', name='CD8-positive, alpha-beta memory T cell, CD45RO-positive'),
 CellType(id='3nr16uCT', ontology_id='CL:0000905', name='effector memory CD4-positive, alpha-beta T cell'),
 CellType(id='8NjaB9BF', ontology_id='CL:0000875', name='non-classical monocyte'),
 CellType(id='ZKZVJGmU', ontology_id='CL:0000097', name='mast cell'),
 CellType(id='s5P8y64X', ontology_id='CL:0000815', name='regulatory T cell'),
 CellType(id='8J77wB5e', ontology_id='CL:0011026', name='progenitor cell'),
 CellType(id='ZsMcdB6W', ontology_id='CL:0001056', name='dendritic cell, human'),
 CellType(id='tgTN23uh', ontology_id='CL:0000980', name='plasmablast'),
 CellType(id='j1cCEIlt', ontology_id='CL:0000784', name='plasmacytoid dendritic cell'),
 CellType(id='5H3ZzPO0', ontology_id='CL:0000542', name='lymphocyte'),
 CellType(id='4sdPZTzG', ontology_id='CL:0000844', name='germinal center B cell'),
 CellType(id='jZVwZhZ5', ontology_id='CL:0000556', name='megakaryocyte'),
 Tissue(id='oOynQVrG', ontology_id='UBERON:0000178', name='blood'),
 Tissue(id='3PjYAKnU', ontology_id='UBERON:0007644', name='thoracic lymph node'),
 Tissue(id='27IniEkO', ontology_id='UBERON:0002106', name='spleen'),
 Tissue(id='RqsZxxU5', ontology_id='UBERON:0002048', name='lung'),
 Tissue(id='oacWHW6m', ontology_id='UBERON:0002509', name='mesenteric lymph node'),
 Tissue(id='5hkBkUOD', ontology_id='UBERON:0000030', name='lamina propria'),
 Tissue(id='fDaUnIP1', ontology_id='UBERON:0002107', name='liver'),
 Tissue(id='TLoVnCfT', ontology_id='UBERON:0000400', name='jejunal epithelium'),
 Tissue(id='n2XwDJRH', ontology_id='UBERON:0003688', name='omentum'),
 Tissue(id='npJZcG4r', ontology_id='UBERON:0002371', name='bone marrow'),
 Tissue(id='ZgaNJ7qR', ontology_id='UBERON:0002116', name='ileum'),
 Tissue(id='iM008Cay', ontology_id='UBERON:0001153', name='caecum'),
 Tissue(id='nPwtH6Tb', ontology_id='UBERON:0002370', name='thymus'),
 Tissue(id='vgPMrvDI', ontology_id='UBERON:0001134', name='skeletal muscle tissue'),
 Tissue(id='MXtPqLkD', ontology_id='UBERON:0002114', name='duodenum'),
 Tissue(id='Yje0RkZC', ontology_id='UBERON:0001159', name='sigmoid colon'),
 Tissue(id='UQvkthge', ontology_id='UBERON:0001157', name='transverse colon')]

Check that they are in the database:

ln.select(bt.Tissue, name="blood").one()
Tissue(id='oOynQVrG', ontology_id='UBERON:0000178', name='blood')
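
To illustrate what adding all records in one transaction and then selecting exactly one of them amounts to at the SQL level, here is a self-contained sketch using Python's built-in sqlite3 module. It does not show how LaminDB stores records: the tissue table layout, the TissueRecord class and the select_one helper are assumptions for this example, and it assumes that .one() means "exactly one matching row".

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class TissueRecord:
    """Hypothetical stand-in for a tissue record."""
    ontology_id: str
    name: str


# Ids and names as listed on this page.
records = [
    TissueRecord("UBERON:0000178", "blood"),
    TissueRecord("UBERON:0002106", "spleen"),
    TissueRecord("UBERON:0002048", "lung"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tissue (ontology_id TEXT PRIMARY KEY, name TEXT)")

# Used as a context manager, the connection commits on success and rolls back
# on error, so either all records are written or none.
with conn:
    conn.executemany("INSERT INTO tissue VALUES (?, ?)", [astuple(r) for r in records])


def select_one(connection: sqlite3.Connection, name: str) -> TissueRecord:
    """Return the single row matching `name`, or raise if there is not exactly one."""
    rows = connection.execute(
        "SELECT ontology_id, name FROM tissue WHERE name = ?", (name,)
    ).fetchall()
    if len(rows) != 1:
        raise LookupError(f"expected exactly one row, got {len(rows)}")
    return TissueRecord(*rows[0])


blood = select_one(conn, "blood")
n_rows = conn.execute("SELECT COUNT(*) FROM tissue").fetchone()[0]
conn.close()
```

Writing everything in a single transaction means a failure halfway through, for example a duplicate primary key, leaves the table unchanged instead of partially filled.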

Now, some of the foundational knowledge is in place! 😌