Manage biological knowledge#
Important
LaminDB speaks biology through the Bionty schema module.
A basic wetlab schema maps R&D operations.
Let’s review one last concept: knowledge-derived entities.
If tracked data isn’t curated against knowledge-derived standards, data integration often becomes a pain.
LaminDB with the Bionty extension can manage knowledge by providing lookups, curation and knowledge-coupled SQL tables.
lnschema-bionty contains the schema tables for all entities that are available in Bionty.
Let us map data on knowledge to illustrate the process.
import lamindb as ln
import lnschema_bionty as bt
ln.track()
ℹ️ Instance: testuser1/lnbionty-test
ℹ️ User: testuser1
ℹ️ Added notebook: Transform(id='f7F0c2n2Ft1s', v='0', name='knowledge', type=notebook, title='Manage biological knowledge', created_by='DzTjkKse', created_at=datetime.datetime(2023, 3, 30, 23, 17, 34))
ℹ️ Added run: Run(id='XE0faLyeY9bXU4hLhSlV', transform_id='f7F0c2n2Ft1s', transform_v='0', created_by='DzTjkKse', created_at=datetime.datetime(2023, 3, 30, 23, 17, 34))
Tip
You can view the reference databases and versions currently used for each biological entity in LaminDB:
ln.select(bt.dev.BiontyVersions).join(bt.dev.CurrentBiontyVersions).df()
id | entity | database | database_v | database_url | created_by | created_at | updated_at |
---|---|---|---|---|---|---|---|
0 | Species | ensembl | release-108 | https://ftp.ensembl.org/pub/release-108/mysql/ | DzTjkKse | 2023-03-30 23:17:09 | None |
1 | Gene | ensembl | release-108 | https://ftp.ensembl.org/pub/release-108/mysql/ | DzTjkKse | 2023-03-30 23:17:09 | None |
2 | Protein | uniprot | 2022-04 | https://ftp.uniprot.org/pub/databases/uniprot/... | DzTjkKse | 2023-03-30 23:17:09 | None |
3 | CellMarker | cellmarker | 2.0 | http://bio-bigdata.hrbmu.edu.cn/CellMarker/Cel... | DzTjkKse | 2023-03-30 23:17:09 | None |
4 | CellLine | clo | 2022-03-21 | https://data.bioontology.org/ontologies/CLO/su... | DzTjkKse | 2023-03-30 23:17:09 | None |
5 | CellType | cl | 2023-02-15 | http://purl.obolibrary.org/obo/cl/releases/202... | DzTjkKse | 2023-03-30 23:17:09 | None |
6 | Tissue | uberon | 2023-02-14 | http://purl.obolibrary.org/obo/uberon/releases... | DzTjkKse | 2023-03-30 23:17:09 | None |
7 | Disease | mondo | 2023-02-06 | http://purl.obolibrary.org/obo/mondo/releases/... | DzTjkKse | 2023-03-30 23:17:09 | None |
8 | Readout | efo | 3.48.0 | http://www.ebi.ac.uk/efo/releases/v3.48.0/efo.owl | DzTjkKse | 2023-03-30 23:17:09 | None |
9 | Phenotype | hp | 2023-01-27 | https://github.com/obophenotype/human-phenotyp... | DzTjkKse | 2023-03-30 23:17:09 | None |
10 | Pathway | pw | 7.74 | https://data.bioontology.org/ontologies/PW/dow... | DzTjkKse | 2023-03-30 23:17:09 | None |
Lookup ontology ids#
For instance, you can retrieve the cell type ontology id of “gamma delta T cell” by accessing the cell type ontology through the bionty.Entity CellType:
ct_lookup = bt.CellType.lookup
ct_lookup.gamma_delta_T_cell
cell_type(ontology_id='CL:0000798', name='gamma-delta T cell')
See also
See the Bionty documentation for gene name aliasing and other types of lookups.
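To make the attribute-style lookup less mysterious, here is a minimal sketch of how term names could be turned into Python attributes. It is not Bionty's implementation: `Term`, `to_attribute_name`, and `build_lookup` are hypothetical names chosen for illustration, and the two terms are copied from outputs shown on this page.

```python
import re
from collections import namedtuple
from types import SimpleNamespace

# Hypothetical record type for illustration; not a Bionty class.
Term = namedtuple("Term", ["ontology_id", "name"])


def to_attribute_name(name: str) -> str:
    """Turn a free-text term name into a valid Python identifier."""
    # Replace every run of non-alphanumeric characters with one underscore.
    attr = re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_")
    # Identifiers cannot start with a digit, so prefix those with an underscore.
    if attr and attr[0].isdigit():
        attr = "_" + attr
    return attr


def build_lookup(terms):
    """Expose terms as attributes so that tab completion can find them."""
    return SimpleNamespace(**{to_attribute_name(t.name): t for t in terms})


terms = [
    Term("CL:0000798", "gamma-delta T cell"),
    Term("CL:0000860", "classical monocyte"),
]
lookup = build_lookup(terms)
print(lookup.gamma_delta_T_cell.ontology_id)
```

The design choice here is to normalize names once, when the lookup is built, so that the hyphen and spaces in “gamma-delta T cell” do not get in the way of attribute access.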
Create knowledge-derived records#
You can also directly create a record for the CellType table:
bt.CellType(name=ct_lookup.gamma_delta_T_cell.name)
CellType(id='CL:0000798', ontology_id='CL:0000798', name='gamma-delta T cell')
Curate metadata by linking it against knowledge#
Let us link all the biological samples in a cross-tissue scRNA-seq dataset:
adata = ln.dev.datasets.anndata_human_immune_cells()
meta = adata.obs.drop_duplicates(subset=adata.obs.columns)
meta.head()
 | donor_id | tissue | cell_type | assay | tissue_ontology_term_id | cell_type_ontology_term_id | assay_ontology_term_id |
---|---|---|---|---|---|---|---|
CZINY-0109_CTGGTCTAGTCTGTAC | D496 | blood | classical monocyte | 10x 3' v3 | UBERON:0000178 | CL:0000860 | EFO:0009922 |
CZI-IA10244332+CZI-IA10244434_CCTTCGACATACTCTT | 621B | thoracic lymph node | T follicular helper cell | 10x 5' v2 | UBERON:0007644 | CL:0002038 | EFO:0009900 |
Pan_T7935491_CTGGTCTGTACATGTC | A29 | spleen | memory B cell | 10x 5' v1 | UBERON:0002106 | CL:0000787 | EFO:0011025 |
Pan_T7980367_GGGCATCCAGGTGGAT | A36 | lung | alveolar macrophage | 10x 5' v1 | UBERON:0002048 | CL:0000583 | EFO:0011025 |
Pan_T7935494_ATCATGGTCTACCTGC | A29 | mesenteric lymph node | naive thymus-derived CD4-positive, alpha-beta ... | 10x 5' v1 | UBERON:0002509 | CL:0000895 | EFO:0011025 |
meta.shape
(514, 7)
Let’s first add all the cell types. .curate lets you check whether the passed ids are present in the knowledge table. It returns a new DataFrame indexed with the curated ids and a boolean __curated__ column.
celltype_curate = bt.CellType.curate(meta, column="cell_type_ontology_term_id")
✅ 514 terms (100.0%) are mapped.
🔶 0 terms (0.0%) are not mapped.
All terms can be linked. 🎉
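Conceptually, this kind of check is a membership test of each observed id against a reference set. The sketch below illustrates that idea in plain Python, with a list standing in for the DataFrame column. It is not how `bt.CellType.curate` is implemented; `curate_ids` is a hypothetical helper, and `CL:9999999` is a deliberately made-up id used to show an unmapped term.

```python
def curate_ids(ids, known_ids):
    """Flag each id as mapped (present in the reference set) or not."""
    known = set(known_ids)  # set membership tests are O(1) on average
    flagged = [(term_id, term_id in known) for term_id in ids]
    n_total = len(flagged)
    n_mapped = sum(is_known for _, is_known in flagged)
    pct = 100.0 * n_mapped / n_total if n_total else 0.0
    summary = f"{n_mapped} of {n_total} ids ({pct:.1f}%) are mapped"
    return flagged, summary


# Reference ids taken from the table above; the last observed id is invented.
reference = {"CL:0000860", "CL:0002038", "CL:0000787"}
observed = ["CL:0000860", "CL:0000787", "CL:0000860", "CL:9999999"]

flagged, summary = curate_ids(observed, reference)
print(summary)
unmapped = [term_id for term_id, is_known in flagged if not is_known]
print(unmapped)
```

Keeping one flag per row, rather than per unique id, mirrors the per-row counts reported in the log messages above.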
Update the content of the knowledge-managed tables in your SQL database#
Assume there are cell types in this new dataset that are tracked in the ontology (bionty.CellType), but are not yet tracked in the DB table (lnschema_bionty.CellType).
Let us fix that!
We can go ahead and create records of the CellType table:
celltype_records = [bt.CellType(ontology_id=i) for i in celltype_curate.index.unique()]
celltype_records[:3]
[CellType(id='0r8seCQT', ontology_id='CL:0000860', name='classical monocyte'),
CellType(id='mflMyS4P', ontology_id='CL:0002038', name='T follicular helper cell'),
CellType(id='6D2sGpoW', ontology_id='CL:0000787', name='memory B cell')]
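If some ids were already present in the DB table, you would only want to create records for the missing ones. A minimal sketch of that filtering step, assuming you already hold the ids stored in the table as a plain list (`ids_to_add` is a hypothetical helper, not part of LaminDB):

```python
def ids_to_add(curated_ids, ids_in_table):
    """Return unique curated ids that are not yet in the table, in first-seen order."""
    existing = set(ids_in_table)
    # dict.fromkeys removes duplicates while preserving insertion order.
    unique_ids = dict.fromkeys(curated_ids)
    return [term_id for term_id in unique_ids if term_id not in existing]


curated = ["CL:0000860", "CL:0002038", "CL:0000860", "CL:0000787"]
already_tracked = ["CL:0002038"]

missing = ids_to_add(curated, already_tracked)
print(missing)
```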
We can do the same for tissues:
tissue_curate = bt.Tissue.curate(meta, column="tissue_ontology_term_id")
✅ 514 terms (100.0%) are mapped.
🔶 0 terms (0.0%) are not mapped.
tissue_records = [bt.Tissue(ontology_id=i) for i in tissue_curate.index.unique()]
tissue_records[:3]
[Tissue(id='oOynQVrG', ontology_id='UBERON:0000178', name='blood'),
Tissue(id='3PjYAKnU', ontology_id='UBERON:0007644', name='thoracic lymph node'),
Tissue(id='27IniEkO', ontology_id='UBERON:0002106', name='spleen')]
Finally, let’s add them to the database:
ln.add(celltype_records + tissue_records) # add all records in one transaction
[CellType(id='0r8seCQT', ontology_id='CL:0000860', name='classical monocyte'),
CellType(id='mflMyS4P', ontology_id='CL:0002038', name='T follicular helper cell'),
CellType(id='6D2sGpoW', ontology_id='CL:0000787', name='memory B cell'),
CellType(id='mBHciTPF', ontology_id='CL:0000583', name='alveolar macrophage'),
CellType(id='HSkf3Q1R', ontology_id='CL:0000895', name='naive thymus-derived CD4-positive, alpha-beta T cell'),
CellType(id='4ChG7mFe', ontology_id='CL:0001062', name='effector memory CD8-positive, alpha-beta T cell, terminally differentiated'),
CellType(id='7kDaS5Ou', ontology_id='CL:0000789', name='alpha-beta T cell'),
CellType(id='rtDPx6Hx', ontology_id='CL:0000492', name='CD4-positive helper T cell'),
CellType(id='9XsNkGyV', ontology_id='CL:0000900', name='naive thymus-derived CD8-positive, alpha-beta T cell'),
CellType(id='hY2x3J1e', ontology_id='CL:0000235', name='macrophage'),
CellType(id='0PZas9yO', ontology_id='CL:0000940', name='mucosal invariant T cell'),
CellType(id='oXERjFK4', ontology_id='CL:0001071', name='group 3 innate lymphoid cell'),
CellType(id='BOpRP2Fh', ontology_id='CL:0000788', name='naive B cell'),
CellType(id='Bgz4LhGa', ontology_id='CL:0000548', name='animal cell'),
CellType(id='TWgOU9nt', ontology_id='CL:0000938', name='CD16-negative, CD56-bright natural killer cell, human'),
CellType(id='NJd24u0f', ontology_id='CL:0000786', name='plasma cell'),
CellType(id='XbHAnqxj', ontology_id='CL:0000909', name='CD8-positive, alpha-beta memory T cell'),
CellType(id='gHrHTy92', ontology_id='CL:0000939', name='CD16-positive, CD56-dim natural killer cell, human'),
CellType(id='gVr0YFoD', ontology_id='CL:0000798', name='gamma-delta T cell'),
CellType(id='XdmAIFHE', ontology_id='CL:0000990', name='conventional dendritic cell'),
CellType(id='gAfMcxGY', ontology_id='CL:0001203', name='CD8-positive, alpha-beta memory T cell, CD45RO-positive'),
CellType(id='3nr16uCT', ontology_id='CL:0000905', name='effector memory CD4-positive, alpha-beta T cell'),
CellType(id='8NjaB9BF', ontology_id='CL:0000875', name='non-classical monocyte'),
CellType(id='ZKZVJGmU', ontology_id='CL:0000097', name='mast cell'),
CellType(id='s5P8y64X', ontology_id='CL:0000815', name='regulatory T cell'),
CellType(id='8J77wB5e', ontology_id='CL:0011026', name='progenitor cell'),
CellType(id='ZsMcdB6W', ontology_id='CL:0001056', name='dendritic cell, human'),
CellType(id='tgTN23uh', ontology_id='CL:0000980', name='plasmablast'),
CellType(id='j1cCEIlt', ontology_id='CL:0000784', name='plasmacytoid dendritic cell'),
CellType(id='5H3ZzPO0', ontology_id='CL:0000542', name='lymphocyte'),
CellType(id='4sdPZTzG', ontology_id='CL:0000844', name='germinal center B cell'),
CellType(id='jZVwZhZ5', ontology_id='CL:0000556', name='megakaryocyte'),
Tissue(id='oOynQVrG', ontology_id='UBERON:0000178', name='blood'),
Tissue(id='3PjYAKnU', ontology_id='UBERON:0007644', name='thoracic lymph node'),
Tissue(id='27IniEkO', ontology_id='UBERON:0002106', name='spleen'),
Tissue(id='RqsZxxU5', ontology_id='UBERON:0002048', name='lung'),
Tissue(id='oacWHW6m', ontology_id='UBERON:0002509', name='mesenteric lymph node'),
Tissue(id='5hkBkUOD', ontology_id='UBERON:0000030', name='lamina propria'),
Tissue(id='fDaUnIP1', ontology_id='UBERON:0002107', name='liver'),
Tissue(id='TLoVnCfT', ontology_id='UBERON:0000400', name='jejunal epithelium'),
Tissue(id='n2XwDJRH', ontology_id='UBERON:0003688', name='omentum'),
Tissue(id='npJZcG4r', ontology_id='UBERON:0002371', name='bone marrow'),
Tissue(id='ZgaNJ7qR', ontology_id='UBERON:0002116', name='ileum'),
Tissue(id='iM008Cay', ontology_id='UBERON:0001153', name='caecum'),
Tissue(id='nPwtH6Tb', ontology_id='UBERON:0002370', name='thymus'),
Tissue(id='vgPMrvDI', ontology_id='UBERON:0001134', name='skeletal muscle tissue'),
Tissue(id='MXtPqLkD', ontology_id='UBERON:0002114', name='duodenum'),
Tissue(id='Yje0RkZC', ontology_id='UBERON:0001159', name='sigmoid colon'),
Tissue(id='UQvkthge', ontology_id='UBERON:0001157', name='transverse colon')]
Check that they are in the database:
ln.select(bt.Tissue, name="blood").one()
Tissue(id='oOynQVrG', ontology_id='UBERON:0000178', name='blood')
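To illustrate what adding all records in one transaction and then selecting exactly one of them means at the SQL level, here is a sketch using Python's built-in `sqlite3` module. The table layout is an assumption made for this example and does not reflect the actual LaminDB schema. The single-match check assumes that `.one()` expects exactly one result, as its name suggests.

```python
import sqlite3

# In-memory database with a simplified, assumed table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tissue (ontology_id TEXT PRIMARY KEY, name TEXT NOT NULL)"
)

records = [
    ("UBERON:0000178", "blood"),
    ("UBERON:0007644", "thoracic lymph node"),
    ("UBERON:0002106", "spleen"),
]

# Used as a context manager, the connection commits if the block succeeds
# and rolls back if it raises, so the rows are added all together or not at all.
with conn:
    conn.executemany(
        "INSERT INTO tissue (ontology_id, name) VALUES (?, ?)", records
    )

rows = conn.execute(
    "SELECT ontology_id, name FROM tissue WHERE name = ?", ("blood",)
).fetchall()

# Insist on exactly one match before using the result.
if len(rows) != 1:
    raise LookupError(f"expected exactly one row, found {len(rows)}")
blood = rows[0]
print(blood)

n_rows = conn.execute("SELECT COUNT(*) FROM tissue").fetchone()[0]
```

Because `ontology_id` is the primary key in this sketch, a duplicate id anywhere in the batch would raise an error and roll back the whole insert.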
Now, some of the foundational knowledge is in place! 😌