Bionty schema#

!lamin init --storage lnbionty-test --schema bionty
ℹ️ Not registering instance on hub, if you want, call `lamin register`
ℹ️ Loading schema modules: core==0.34.0 bionty==0.18.0 
✅ Loaded instance: testuser1/lnbionty-test
✅ Created & loaded instance: testuser1/lnbionty-test
import lamindb as ln
import lnschema_bionty as bt
import pandas as pd
✅ Loaded instance: testuser1/lnbionty-test
ln.schema.view()
[Figure: schema diagram rendered by ln.schema.view()]

Create a SQL record from scratch#

This works like any other ORM:

bt.Species(name="new species")
Species(id='C82a', name='new species', created_by_id='DzTjkKse')
bt.Gene(symbol="synTCF7")
Gene(id='aJNGiVfn', symbol='synTCF7', created_by_id='DzTjkKse')
bt.CellType(name="my T cell", ontology_id="my_ontology_id")
CellType(id='4u7zBVY2', name='my T cell', ontology_id='my_ontology_id', created_by_id='DzTjkKse')

Create a SQL record from knowledge#

Once a database and version are configured for each entity (see Bionty Configuration), a single field value is enough to create a record. All other fields are populated automatically from the reference.

bt.Species.from_bionty(name="mouse")


Species(id='vado', name='mouse', taxon_id=10090, scientific_name='mus_musculus', created_by_id='DzTjkKse')
bt.CellType.from_bionty(name="T cell")


CellType(id='QvYE8bIq', name='T cell', synonyms='T-lymphocyte|T lymphocyte|T-cell', ontology_id='CL:0000084', definition='A Type Of Lymphocyte Whose Defining Characteristic Is The Expression Of A T Cell Receptor Complex.', created_by_id='DzTjkKse')
bt.Gene.from_bionty(symbol="TCF7")
Gene(id='geGss49r', ensembl_gene_id='ENSG00000081059', symbol='TCF7', gene_type='protein_coding', description='transcription factor 7 [Source:HGNC Symbol;Acc:HGNC:11639]', ncbi_gene_id=6932, hgnc_id='HGNC:11639', omim_id=189908, synonyms='TCF-1', version='Ens107', created_by_id='DzTjkKse')

Note

For the feature entities Gene, Protein, and CellMarker, make sure you pass the correct species (the default is “human”):

bt.Gene.from_bionty(symbol="Ap5b1", species="mouse")
Gene(id='rLsAaitp', ensembl_gene_id='ENSMUSG00000049562', symbol='Ap5b1', gene_type='protein_coding', description='adaptor-related protein complex 5, beta 1 subunit [Source:MGI Symbol;Acc:MGI:2685808]', ncbi_gene_id=381201, mgi_id='MGI:2685808', synonyms='Gm962', version='Ens107', created_by_id='DzTjkKse')
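
To make the idea of populating a record from one field value more concrete, here is a minimal conceptual sketch. It does not use or reproduce the lnschema_bionty implementation: the REFERENCE table and the fill_from_reference helper are hypothetical names introduced only for illustration, and the rows are copied from the outputs shown above. It also mirrors the species default described in the note.

```python
# Toy reference table (hypothetical); values copied from the outputs above.
REFERENCE = [
    {
        "species": "human",
        "symbol": "TCF7",
        "ensembl_gene_id": "ENSG00000081059",
        "gene_type": "protein_coding",
    },
    {
        "species": "mouse",
        "symbol": "Ap5b1",
        "ensembl_gene_id": "ENSMUSG00000049562",
        "gene_type": "protein_coding",
    },
]


def fill_from_reference(species="human", **field):
    """Return the full reference row matching exactly one field=value pair.

    Hypothetical helper: it only illustrates the lookup idea, it is not
    how from_bionty is implemented.
    """
    if len(field) != 1:
        raise ValueError("pass exactly one field=value pair")
    ((key, value),) = field.items()
    for row in REFERENCE:
        if row["species"] == species and row.get(key) == value:
            return dict(row)  # copy, so callers cannot mutate the reference
    return None  # nothing matched for this species


human_gene = fill_from_reference(symbol="TCF7")
missing = fill_from_reference(symbol="Ap5b1")  # searched under the default species
mouse_gene = fill_from_reference(symbol="Ap5b1", species="mouse")
print(human_gene)
print(missing)
print(mouse_gene)
```

In this sketch, looking up a mouse symbol without setting the species returns nothing, which is why the species argument matters for feature entities.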

Bionty object#

Note

You can access the full Bionty functionality through the Bionty object: lnschema_bionty.{Entity}.bionty()

For example, bionty.Gene() is the same as lnschema_bionty.Gene.bionty().

species_bionty = bt.Species.bionty()
gene_bionty = bt.Gene.bionty(species="mouse")

Create a SQL record from a bionty lookup#

sp_lookup = species_bionty.lookup()
sp_lookup.giant_panda
species(index=2, id='NCBI_9646', name='giant panda', scientific_name='ailuropoda_melanoleuca', division='EnsemblVertebrates', taxon_id=9646, assembly='ASM200744v2', assembly_accession='GCA_002007445.2', genebuild='2020-05-Ensembl/2020-06', variation='N', microarray='N', pan_compara='N', peptide_compara='Y', genome_alignments='Y', other_alignments='Y', core_db='ailuropoda_melanoleuca_core_108_2', species_id=1)
bt.Species(sp_lookup.giant_panda)
Species(id='GtCe', name='giant panda', taxon_id=9646, scientific_name='ailuropoda_melanoleuca', created_by_id='DzTjkKse')
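
The attribute-style access used above (sp_lookup.giant_panda) can be pictured with a small stand-alone sketch. This is only an assumption about the general idea, not the Bionty implementation: make_lookup and to_identifier are hypothetical helpers, and the two rows are taken from the species table shown in this guide.

```python
import re
from types import SimpleNamespace


def to_identifier(name):
    """Turn a display name such as 'giant panda' into 'giant_panda'."""
    return re.sub(r"\W+", "_", name.strip().lower())


def make_lookup(rows, field="name"):
    """Build an object exposing each row as an attribute named after `field`.

    Hypothetical helper for illustration only.
    """
    return SimpleNamespace(
        **{to_identifier(row[field]): SimpleNamespace(**row) for row in rows}
    )


# Rows copied from the species reference shown in this guide.
rows = [
    {
        "id": "NCBI_9646",
        "name": "giant panda",
        "scientific_name": "ailuropoda_melanoleuca",
        "taxon_id": 9646,
    },
    {
        "id": "NCBI_80966",
        "name": "spiny chromis",
        "scientific_name": "acanthochromis_polyacanthus",
        "taxon_id": 80966,
    },
]

lookup = make_lookup(rows)
print(lookup.giant_panda.scientific_name)
print(lookup.spiny_chromis.taxon_id)
```

The practical benefit of such an object is tab completion in an interactive session: typing a prefix is enough to discover the available terms.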

Other bionty functionalities#

df = species_bionty.df()
df.head()
id name scientific_name division taxon_id assembly assembly_accession genebuild variation microarray pan_compara peptide_compara genome_alignments other_alignments core_db species_id
0 NCBI_80966 spiny chromis acanthochromis_polyacanthus EnsemblVertebrates 80966 ASM210954v1 GCA_002109545.1 2018-05-Ensembl/2020-03 N N N Y Y Y acanthochromis_polyacanthus_core_108_1 1
1 NCBI_211598 eurasian sparrowhawk accipiter_nisus EnsemblVertebrates 211598 Accipiter_nisus_ver1.0 GCA_004320145.1 2019-07-Ensembl/2019-09 N N N N N Y accipiter_nisus_core_108_1 1
2 NCBI_9646 giant panda ailuropoda_melanoleuca EnsemblVertebrates 9646 ASM200744v2 GCA_002007445.2 2020-05-Ensembl/2020-06 N N N Y Y Y ailuropoda_melanoleuca_core_108_2 1
3 NCBI_241587 yellow-billed parrot amazona_collaria EnsemblVertebrates 241587 ASM394721v1 GCA_003947215.1 2019-07-Ensembl/2019-09 N N N N N Y amazona_collaria_core_108_1 1
4 NCBI_61819 midas cichlid amphilophus_citrinellus EnsemblVertebrates 61819 Midas_v5 GCA_000751415.1 2018-05-Ensembl/2018-07 N N N Y Y Y amphilophus_citrinellus_core_108_5 1
species_bionty.curate(pd.DataFrame(index=["human", "mouse"]))
✅ 2 terms (100.0%) are mapped.
🔶 0 terms (0.0%) are not mapped.
       orig_index  __curated__
human       human         True
mouse       mouse         True
df = gene_bionty.df()
df.head()
id ensembl_gene_id symbol gene_type description ncbi_gene_id mgi_id synonyms version
0 Epd98t ENSMUSG00000064336 mt-Tf Mt_tRNA mitochondrially encoded tRNA phenylalanine [So... None MGI:102487 tRNA|tRNA-Phe|TrnF tRNA Ens107
1 RiOxA6 ENSMUSG00000064337 mt-Rnr1 Mt_rRNA mitochondrially encoded 12S rRNA [Source:MGI S... None MGI:102493 12S ribosomal RNA|12S rRNA|12SrRNA|Rnr1 s-rRNA Ens107
2 cMIElg ENSMUSG00000064338 mt-Tv Mt_tRNA mitochondrially encoded tRNA valine [Source:MG... None MGI:102472 tRNA|tRNA-Val|TrnaV tRNA Ens107
3 DbiNNA ENSMUSG00000064339 mt-Rnr2 Mt_rRNA mitochondrially encoded 16S rRNA [Source:MGI S... None MGI:102492 16S ribosomal RNA|16S rRNA|16SrRNA|Rnr2 16S ri... Ens107
4 NO6NBF ENSMUSG00000064340 mt-Tl1 Mt_tRNA mitochondrially encoded tRNA leucine 1 [Source... None MGI:102482 tRNA|tRNA Leu|tRNA Leu_1|TrnrL1 tRNA Ens107
gene_bionty.curate(pd.DataFrame(index=["Ap5b1", "Gm6713"]), reference_id="symbol")
✅ 2 terms (100.0%) are mapped.
🔶 0 terms (0.0%) are not mapped.
        orig_index  __curated__
Ap5b1        Ap5b1         True
Gm6713      Gm6713         True
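
The curate calls above report how many input terms map to a reference field. As a rough illustration of that bookkeeping, the sketch below checks terms against a set of reference values using exact matching. It is a simplification under stated assumptions: curate_terms is a hypothetical function, exact string matching is assumed, and the real curate method may match differently.

```python
def curate_terms(terms, reference_values):
    """Flag each term as mapped (True) or not (False) by exact match.

    Hypothetical helper for illustration; duplicate terms collapse into
    a single entry because the result is a dict.
    """
    reference = set(reference_values)
    flags = {term: term in reference for term in terms}
    total = len(flags)
    mapped = sum(flags.values())
    for count, label in ((mapped, "mapped"), (total - mapped, "not mapped")):
        pct = 100.0 * count / total if total else 0.0
        print(f"{count} terms ({pct:.1f}%) are {label}.")
    return flags


# Reference symbols taken from the mouse gene example above.
mouse_symbols = ["Ap5b1", "Gm6713"]
flags = curate_terms(["Ap5b1", "Gm6713", "not_a_symbol"], mouse_symbols)
print(flags)
```

Returning the per-term flags, rather than only the summary, makes it easy to filter out the unmapped terms for manual review.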