Bionty schema
!lamin init --storage lnbionty-test --schema bionty
ℹ️ Not registering instance on hub, if you want, call `lamin register`
ℹ️ Loading schema modules: core==0.34.0 bionty==0.18.0
✅ Loaded instance: testuser1/lnbionty-test
✅ Created & loaded instance: testuser1/lnbionty-test
import lamindb as ln
import lnschema_bionty as bt
import pandas as pd
✅ Loaded instance: testuser1/lnbionty-test
ln.schema.view()
Create a SQL record from scratch
This works like any other ORM: pass field values to the model class to create a record.
bt.Species(name="new species")
Species(id='C82a', name='new species', created_by_id='DzTjkKse')
bt.Gene(symbol="synTCF7")
Gene(id='aJNGiVfn', symbol='synTCF7', created_by_id='DzTjkKse')
bt.CellType(name="my T cell", ontology_id="my_ontology_id")
CellType(id='4u7zBVY2', name='my T cell', ontology_id='my_ontology_id', created_by_id='DzTjkKse')
Create a SQL record from knowledge
Given the database and version configured for each entity (see Bionty Configuration), from_bionty() auto-populates all other fields from a single field value.
bt.Species.from_bionty(name="mouse")
Species(id='vado', name='mouse', taxon_id=10090, scientific_name='mus_musculus', created_by_id='DzTjkKse')
bt.CellType.from_bionty(name="T cell")
CellType(id='QvYE8bIq', name='T cell', synonyms='T-lymphocyte|T lymphocyte|T-cell', ontology_id='CL:0000084', definition='A Type Of Lymphocyte Whose Defining Characteristic Is The Expression Of A T Cell Receptor Complex.', created_by_id='DzTjkKse')
bt.Gene.from_bionty(symbol="TCF7")
Gene(id='geGss49r', ensembl_gene_id='ENSG00000081059', symbol='TCF7', gene_type='protein_coding', description='transcription factor 7 [Source:HGNC Symbol;Acc:HGNC:11639]', ncbi_gene_id=6932, hgnc_id='HGNC:11639', omim_id=189908, synonyms='TCF-1', version='Ens107', created_by_id='DzTjkKse')
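Conceptually, auto-population amounts to finding the one reference row that matches the given field and copying its remaining columns. The sketch below illustrates this idea with a small pandas table. The table and the helper fields_from_reference are hypothetical stand-ins, not the lnschema_bionty implementation. The mouse row reuses values from the record above, and 9606 is the NCBI taxon ID for human.

```python
import pandas as pd

# Hypothetical reference table standing in for a Bionty source;
# the mouse values mirror the Species record shown above.
reference = pd.DataFrame(
    {
        "name": ["mouse", "human"],
        "taxon_id": [10090, 9606],
        "scientific_name": ["mus_musculus", "homo_sapiens"],
    }
)


def fields_from_reference(table: pd.DataFrame, **query) -> dict:
    """Return every field of the single row matching one field=value pair (hypothetical helper)."""
    if len(query) != 1:
        raise TypeError("pass exactly one field=value pair")
    field, value = next(iter(query.items()))
    matches = table[table[field] == value]
    if len(matches) != 1:
        raise LookupError(f"expected one match for {field}={value!r}, found {len(matches)}")
    # Convert the matching row into a plain dict of field values
    return matches.iloc[0].to_dict()


print(fields_from_reference(reference, name="mouse"))
```

Requiring exactly one match keeps the sketch from silently picking an arbitrary row when a value is missing or ambiguous.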
Note
For the feature entities Gene, Protein and CellMarker, make sure you configure the correct species (the default is “human”):
bt.Gene.from_bionty(symbol="Ap5b1", species="mouse")
Gene(id='rLsAaitp', ensembl_gene_id='ENSMUSG00000049562', symbol='Ap5b1', gene_type='protein_coding', description='adaptor-related protein complex 5, beta 1 subunit [Source:MGI Symbol;Acc:MGI:2685808]', ncbi_gene_id=381201, mgi_id='MGI:2685808', synonyms='Gm962', version='Ens107', created_by_id='DzTjkKse')
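The species matters because each species has its own gene reference, so a symbol is resolved only within the selected species. Here is a minimal sketch of that behavior. It assumes a hypothetical per-species dictionary (gene_tables, ensembl_id) populated with the two genes shown above, and it does not reflect how lnschema_bionty actually stores its references.

```python
from typing import Optional

# Hypothetical per-species gene references, using the records shown above
gene_tables = {
    "human": {"TCF7": "ENSG00000081059"},
    "mouse": {"Ap5b1": "ENSMUSG00000049562"},
}


def ensembl_id(symbol: str, species: str = "human") -> Optional[str]:
    """Resolve a gene symbol within one species; defaults to human, like the note above."""
    return gene_tables[species].get(symbol)


print(ensembl_id("Ap5b1"))                   # searched in human: None
print(ensembl_id("Ap5b1", species="mouse"))  # ENSMUSG00000049562
```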
Bionty object
Note
You can access the full bionty functionality through the Bionty object returned by lnschema_bionty.{entity}.bionty().
For example, bionty.Gene() is the same as lnschema_bionty.Gene.bionty().
species_bionty = bt.Species.bionty()
gene_bionty = bt.Gene.bionty(species="mouse")
Create a SQL record from a bionty lookup
sp_lookup = species_bionty.lookup()
sp_lookup.giant_panda
species(index=2, id='NCBI_9646', name='giant panda', scientific_name='ailuropoda_melanoleuca', division='EnsemblVertebrates', taxon_id=9646, assembly='ASM200744v2', assembly_accession='GCA_002007445.2', genebuild='2020-05-Ensembl/2020-06', variation='N', microarray='N', pan_compara='N', peptide_compara='Y', genome_alignments='Y', other_alignments='Y', core_db='ailuropoda_melanoleuca_core_108_2', species_id=1)
bt.Species(sp_lookup.giant_panda)
Species(id='GtCe', name='giant panda', taxon_id=9646, scientific_name='ailuropoda_melanoleuca', created_by_id='DzTjkKse')
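The lookup object exposes each term as an attribute. This requires turning names such as "giant panda" into valid Python identifiers. The sketch below shows one way to do this, using a namedtuple record type and types.SimpleNamespace. The normalization rule and the helpers to_attribute and make_lookup are assumptions for illustration, not Bionty's actual code.

```python
import re
from collections import namedtuple
from types import SimpleNamespace

# Record type mirroring a few fields of the lookup result above
species = namedtuple("species", ["name", "taxon_id", "scientific_name"])

records = [
    species("giant panda", 9646, "ailuropoda_melanoleuca"),
    species("yellow-billed parrot", 241587, "amazona_collaria"),
]


def to_attribute(term: str) -> str:
    """Lowercase and replace runs of non-word characters with '_' (hypothetical rule)."""
    attr = re.sub(r"\W+", "_", term.strip().lower()).strip("_")
    # Identifiers cannot be empty or start with a digit
    if not attr or attr[0].isdigit():
        attr = "_" + attr
    return attr


def make_lookup(rows, field="name"):
    """Map each normalized term to its record; later duplicates overwrite earlier ones."""
    return SimpleNamespace(**{to_attribute(getattr(r, field)): r for r in rows})


lookup = make_lookup(records)
print(lookup.yellow_billed_parrot.taxon_id)  # 241587
```

A term that normalizes to a Python keyword (for example "class") can still be stored this way, but it is reachable only through getattr(lookup, "class").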
Other bionty functionalities
df = species_bionty.df()
df.head()
| | id | name | scientific_name | division | taxon_id | assembly | assembly_accession | genebuild | variation | microarray | pan_compara | peptide_compara | genome_alignments | other_alignments | core_db | species_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NCBI_80966 | spiny chromis | acanthochromis_polyacanthus | EnsemblVertebrates | 80966 | ASM210954v1 | GCA_002109545.1 | 2018-05-Ensembl/2020-03 | N | N | N | Y | Y | Y | acanthochromis_polyacanthus_core_108_1 | 1 |
| 1 | NCBI_211598 | eurasian sparrowhawk | accipiter_nisus | EnsemblVertebrates | 211598 | Accipiter_nisus_ver1.0 | GCA_004320145.1 | 2019-07-Ensembl/2019-09 | N | N | N | N | N | Y | accipiter_nisus_core_108_1 | 1 |
| 2 | NCBI_9646 | giant panda | ailuropoda_melanoleuca | EnsemblVertebrates | 9646 | ASM200744v2 | GCA_002007445.2 | 2020-05-Ensembl/2020-06 | N | N | N | Y | Y | Y | ailuropoda_melanoleuca_core_108_2 | 1 |
| 3 | NCBI_241587 | yellow-billed parrot | amazona_collaria | EnsemblVertebrates | 241587 | ASM394721v1 | GCA_003947215.1 | 2019-07-Ensembl/2019-09 | N | N | N | N | N | Y | amazona_collaria_core_108_1 | 1 |
| 4 | NCBI_61819 | midas cichlid | amphilophus_citrinellus | EnsemblVertebrates | 61819 | Midas_v5 | GCA_000751415.1 | 2018-05-Ensembl/2018-07 | N | N | N | Y | Y | Y | amphilophus_citrinellus_core_108_5 | 1 |
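The output of df() is a pandas DataFrame, so standard pandas indexing and filtering apply. The sketch below rebuilds three of the rows above by hand, using a subset of the columns, so that it runs without a database. With the real table you would start from df instead of the hypothetical species_df.

```python
import pandas as pd

# A few rows and columns copied from the species table above
species_df = pd.DataFrame(
    {
        "name": ["spiny chromis", "eurasian sparrowhawk", "giant panda"],
        "taxon_id": [80966, 211598, 9646],
        "peptide_compara": ["Y", "N", "Y"],
    }
)

# Look up a single value by common name
panda_taxon = species_df.set_index("name").loc["giant panda", "taxon_id"]

# Keep only species flagged 'Y' for peptide_compara
with_peptide_compara = species_df[species_df["peptide_compara"] == "Y"]["name"].tolist()
print(panda_taxon, with_peptide_compara)
```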
species_bionty.curate(pd.DataFrame(index=["human", "mouse"]))
✅ 2 terms (100.0%) are mapped.
🔶 0 terms (0.0%) are not mapped.
| | orig_index | __curated__ |
|---|---|---|
| human | human | True |
| mouse | mouse | True |
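The report above counts how many index terms match the reference, and the __curated__ column flags each row. Below is a simplified sketch of that bookkeeping. It assumes exact matching against a single reference column. The helper curate_sketch and its field parameter are hypothetical, and the real curate() may do more than exact matching.

```python
import pandas as pd


def curate_sketch(df: pd.DataFrame, reference: pd.DataFrame, field: str = "name"):
    """Flag index terms found in reference[field] and summarize the mapping (hypothetical)."""
    known = set(reference[field])
    out = df.copy()
    out["__curated__"] = out.index.isin(known)
    n_total = len(out)
    n_mapped = int(out["__curated__"].sum())
    n_unmapped = n_total - n_mapped

    def pct(n: int) -> float:
        # Guard against division by zero for an empty input
        return 100 * n / n_total if n_total else 0.0

    summary = (
        f"{n_mapped} terms ({pct(n_mapped):.1f}%) are mapped.\n"
        f"{n_unmapped} terms ({pct(n_unmapped):.1f}%) are not mapped."
    )
    return out, summary


reference = pd.DataFrame({"name": ["human", "mouse", "giant panda"]})
curated, summary = curate_sketch(pd.DataFrame(index=["human", "mouse", "unicorn"]), reference)
print(summary)
```

Passing field="symbol" would match against a symbol column instead. This is loosely analogous to the reference_id="symbol" argument used in the gene example.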
df = gene_bionty.df()
df.head()
| | id | ensembl_gene_id | symbol | gene_type | description | ncbi_gene_id | mgi_id | synonyms | version |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Epd98t | ENSMUSG00000064336 | mt-Tf | Mt_tRNA | mitochondrially encoded tRNA phenylalanine [So... | None | MGI:102487 | tRNA\|tRNA-Phe\|TrnF tRNA | Ens107 |
| 1 | RiOxA6 | ENSMUSG00000064337 | mt-Rnr1 | Mt_rRNA | mitochondrially encoded 12S rRNA [Source:MGI S... | None | MGI:102493 | 12S ribosomal RNA\|12S rRNA\|12SrRNA\|Rnr1 s-rRNA | Ens107 |
| 2 | cMIElg | ENSMUSG00000064338 | mt-Tv | Mt_tRNA | mitochondrially encoded tRNA valine [Source:MG... | None | MGI:102472 | tRNA\|tRNA-Val\|TrnaV tRNA | Ens107 |
| 3 | DbiNNA | ENSMUSG00000064339 | mt-Rnr2 | Mt_rRNA | mitochondrially encoded 16S rRNA [Source:MGI S... | None | MGI:102492 | 16S ribosomal RNA\|16S rRNA\|16SrRNA\|Rnr2 16S ri... | Ens107 |
| 4 | NO6NBF | ENSMUSG00000064340 | mt-Tl1 | Mt_tRNA | mitochondrially encoded tRNA leucine 1 [Source... | None | MGI:102482 | tRNA\|tRNA Leu\|tRNA Leu_1\|TrnrL1 tRNA | Ens107 |
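The synonyms column packs several aliases into one pipe-separated string, and several genes can share the same alias: every mitochondrial tRNA above lists "tRNA". The sketch below uses rows copied from the table above. It splits the strings with pandas, expands them into one row per alias, and keeps only unambiguous aliases in a synonym-to-symbol map. The variable names are illustrative only.

```python
import pandas as pd

# Rows copied from the mouse gene table above
genes = pd.DataFrame(
    {
        "symbol": ["mt-Tf", "mt-Rnr1", "mt-Tv", "mt-Tl1"],
        "synonyms": [
            "tRNA|tRNA-Phe|TrnF tRNA",
            "12S ribosomal RNA|12S rRNA|12SrRNA|Rnr1 s-rRNA",
            "tRNA|tRNA-Val|TrnaV tRNA",
            "tRNA|tRNA Leu|tRNA Leu_1|TrnrL1 tRNA",
        ],
    }
)

# One row per (symbol, synonym) pair; a single-character "|" is split literally
pairs = (
    genes.assign(synonym=genes["synonyms"].str.split("|"))
    .explode("synonym")[["symbol", "synonym"]]
)

# A synonym pointing to more than one symbol cannot be mapped unambiguously
symbols_per_synonym = pairs.groupby("synonym")["symbol"].nunique()
ambiguous = sorted(symbols_per_synonym[symbols_per_synonym > 1].index)

synonym_to_symbol = (
    pairs[~pairs["synonym"].isin(ambiguous)]
    .set_index("synonym")["symbol"]
    .to_dict()
)
print(ambiguous)                      # ['tRNA']
print(synonym_to_symbol["tRNA-Phe"])  # mt-Tf
```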
gene_bionty.curate(pd.DataFrame(index=["Ap5b1", "Gm6713"]), reference_id="symbol")
✅ 2 terms (100.0%) are mapped.
🔶 0 terms (0.0%) are not mapped.
| | orig_index | __curated__ |
|---|---|---|
| Ap5b1 | Ap5b1 | True |
| Gm6713 | Gm6713 | True |