
Manage biological registries

Registries manage the formalized knowledge & experimental design that anchor dry-lab & wet-lab work.

In LaminDB, registries are standard SQL tables, equipped with mechanisms that avoid typos & duplicated data.

In addition, LaminDB makes it easy to import records from public ontologies via the lnschema_bionty plug-in.

In this notebook, you’ll see how to manage an in-house ontology anchored in public knowledge.

(If you also manage experimental design through registries, you can access all metadata through one API and store it in one simple SQL database.)

Setup

Let us create an instance that has lnschema_bionty mounted:

!lamin init --storage ./test-registries --schema bionty
✅ saved: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-26 15:20:54)
✅ saved: Storage(id='9v3TTOwG', root='/home/runner/work/lamindb/lamindb/docs/test-registries', type='local', updated_at=2023-09-26 15:20:54, created_by_id='DzTjkKse')
💡 loaded instance: testuser1/test-registries
💡 did not register local instance on hub (if you want, call `lamin register`)

import lamindb as ln
import lnschema_bionty as lb
💡 loaded instance: testuser1/test-registries (lamindb 0.54.2)
ln.settings.verbosity = "info"

Let’s pre-populate the cell type registry with a few records:

lb.Species.from_bionty(name="human").save()
lb.CellType.from_bionty(name="T cell").save()
lb.CellType(name="my T cell subtype").save()
💡 downloading Species source file from: https://ftp.ensembl.org/pub/release-110/species_EnsemblVertebrates.txt
✅ created 1 Species record from Bionty matching name: 'human'
✅ created 1 CellType record from Bionty matching name: 'T cell'
💡 also saving parents of CellType(id='BxNjby0x', name='T cell', ontology_id='CL:0000084', synonyms='T-lymphocyte|T-cell|T lymphocyte', description='A Type Of Lymphocyte Whose Defining Characteristic Is The Expression Of A T Cell Receptor Complex.', updated_at=2023-09-26 15:21:01, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000542'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 also saving parents of CellType(id='g8slxY8X', name='lymphocyte', ontology_id='CL:0000542', description='A Lymphocyte Is A Leukocyte Commonly Found In The Blood And Lymph That Has The Characteristics Of A Large Nucleus, A Neutral Staining Cytoplasm, And Prominent Heterochromatin.', updated_at=2023-09-26 15:21:02, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000738'
💡 also saving parents of CellType(id='MkrH0gsX', name='leukocyte', ontology_id='CL:0000738', synonyms='white blood cell|leucocyte', description='An Achromatic Cell Of The Myeloid Or Lymphoid Lineages Capable Of Ameboid Movement, Found In Blood Or Other Tissue.', updated_at=2023-09-26 15:21:03, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000988'
💡 also saving parents of CellType(id='Q0aQr5JB', name='hematopoietic cell', ontology_id='CL:0000988', synonyms='haematopoietic cell|hemopoietic cell|haemopoietic cell', description='A Cell Of A Hematopoietic Lineage.', updated_at=2023-09-26 15:21:05, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ created 2 CellType records from Bionty matching ontology_id: 'CL:0002371', 'CL:0000548'
💡 also saving parents of CellType(id='QMAH6IlS', name='somatic cell', ontology_id='CL:0002371', description='A Cell Of An Organism That Does Not Pass On Its Genetic Material To The Organism'S Offspring (I.E. A Non-Germ Line Cell).', updated_at=2023-09-26 15:21:06, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: 'CL:0000548'
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000003'
💡 also saving parents of CellType(id='VT73gpK2', name='native cell', ontology_id='CL:0000003', description='A Cell That Is Found In A Natural Setting, Which Includes Multicellular Organism Cells 'In Vivo' (I.E. Part Of An Organism), And Unicellular Organisms 'In Environment' (I.E. Part Of A Natural Environment).', updated_at=2023-09-26 15:21:07, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000000'
💡 also saving parents of CellType(id='H0taCt24', name='animal cell', ontology_id='CL:0000548', synonyms='metazoan cell', description='A Native Cell That Is Part Of Some Metazoa.', updated_at=2023-09-26 15:21:06, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000255'
❗ records with similar names exist! did you mean to load one of them?
id synonyms __ratio__
name
T cell BxNjby0x T-lymphocyte|T-cell|T lymphocyte 90.0
cell Ry0JGwSD 90.0

Access records in public ontologies

We start with a public ontology for cell types.

Bionty - short for “biological entity” - is a class for accessing public ontologies.

Bionty provides simple access to curated public ontologies that Lamin hosts for reliable and performant access. For most Bionty objects, you can access the underlying ontology through Pronto.

(If you don’t need to manage in-house registries, you can also use the bionty package standalone.)

Let’s create a Bionty object:

bionty = lb.CellType.bionty()
bionty
CellType
Species: all
Source: cl, 2023-04-20
#terms: 2862

📖 CellType.df(): ontology reference table
🔎 CellType.lookup(): autocompletion of terms
🎯 CellType.search(): free text search of terms
✅ CellType.validate(): strictly validate values
🧐 CellType.inspect(): full inspection of values
👽 CellType.standardize(): convert to standardized names
🪜 CellType.diff(): difference between two versions
🔗 CellType.ontology: Pronto.Ontology object

We can use it to search the public ontology for cell types:

bionty.search("gamma delta T cell").head(3)
ontology_id definition synonyms parents __ratio__
name
gamma-delta T cell CL:0000798 A T Cell That Expresses A Gamma-Delta T Cell R... gamma-delta T lymphocyte|gamma-delta T-lymphoc... [CL:0000084] 100.0
mature gamma-delta T cell CL:0000800 A Gamma-Delta T Cell That Has A Mature Phenoty... mature gamma-delta T-cell|mature gamma-delta T... [CL:0000798] 95.0
CD27-negative gamma-delta T cell CL:0002125 A Circulating Gamma-Delta T Cell That Expresse... gammadelta-17 cells [CL:0000800] 90.0
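To build intuition for the __ratio__ column, here is a minimal sketch of ranked free-text search using Python's standard-library difflib. It is not how Bionty computes its scores (the scorer behind __ratio__ isn't described on this page), so the numbers it prints will differ from the table above. The toy names list and the search helper are hypothetical.

```python
from difflib import SequenceMatcher

# Toy stand-in for the name column of an ontology table (hypothetical subset).
names = ["gamma-delta T cell", "mature gamma-delta T cell", "T cell", "hepatocyte"]


def search(query: str, candidates: list[str], limit: int = 3) -> list[tuple[str, float]]:
    """Rank candidates by string similarity to the query, best match first."""
    scored = [
        (name, round(100 * SequenceMatcher(None, query.lower(), name.lower()).ratio(), 1))
        for name in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


results = search("gamma delta T cell", names)
for name, ratio in results:
    print(f"{name}: {ratio}")
```

Even with this simple scorer, the hyphenated name ranks first for the unhyphenated query, which is the behavior that makes free-text search useful for messy inputs.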

And we can also use it to look up cell types with auto-complete:

lookup = bionty.lookup()
lookup.gamma_delta_t_cell
CellType(ontology_id='CL:0000798', name='gamma-delta T cell', definition='A T Cell That Expresses A Gamma-Delta T Cell Receptor Complex.', synonyms='gamma-delta T lymphocyte|gamma-delta T-lymphocyte|gammadelta T cell|gamma-delta T-cell', parents=array(['CL:0000084'], dtype=object))
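How a free-text name like "gamma-delta T cell" can become a tab-completable attribute is easy to picture with a small sketch. The helper names below (to_attribute, make_lookup) are hypothetical and this is only an assumption about the general idea, not Bionty's implementation.

```python
import re
from types import SimpleNamespace

# Toy ontology table: name -> ontology_id (values taken from the outputs above).
terms = {
    "gamma-delta T cell": "CL:0000798",
    "T cell": "CL:0000084",
}


def to_attribute(name: str) -> str:
    """Turn a free-text name into a lowercase Python identifier."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()


def make_lookup(table: dict[str, str]) -> SimpleNamespace:
    """Expose every term as an attribute so that editors can tab-complete it."""
    return SimpleNamespace(**{to_attribute(name): oid for name, oid in table.items()})


toy_lookup = make_lookup(terms)
print(toy_lookup.gamma_delta_t_cell)  # CL:0000798
```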

Create records in in-house ontologies

We can now create a record for our in-house SQL registry by passing the result of the lookup in the public ontology to the CellType constructor:

gdt_cell = lb.CellType(lookup.gamma_delta_t_cell)
❗ records with similar names exist! did you mean to load one of them?
id synonyms __ratio__
name
T cell BxNjby0x T-lymphocyte|T-cell|T lymphocyte 90.0
cell Ry0JGwSD 90.0

(Alternatively, we could construct the gamma delta T cell via from_bionty(), which is synonyms-aware.)

gdt_cell
CellType(id='64kIG7So', name='gamma-delta T cell', ontology_id='CL:0000798', synonyms='gamma-delta T lymphocyte|gamma-delta T-lymphocyte|gammadelta T cell|gamma-delta T-cell', bionty_source_id='7npi', created_by_id='DzTjkKse')

When we save this record to the registry, logging informs us that we’re also saving parent ontological terms:

gdt_cell.save()
💡 also saving parents of CellType(id='64kIG7So', name='gamma-delta T cell', ontology_id='CL:0000798', synonyms='gamma-delta T lymphocyte|gamma-delta T-lymphocyte|gammadelta T cell|gamma-delta T-cell', updated_at=2023-09-26 15:21:11, bionty_source_id='7npi', created_by_id='DzTjkKse')
Will I always see parents being saved?

No, this only happens a single time.

  • If we accidentally save the same record again, it will be recognized that the record and all parents are already in the registry.

  • If we save another record that has overlapping parents, only new parents will be saved.
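The two cases above can be modelled with a toy registry. This is a minimal sketch, not LaminDB's implementation: the parent map is a simplified excerpt of the hierarchy shown in the logs, save_with_parents is a hypothetical helper, and it assumes that a record already in the registry has all of its ancestors registered as well.

```python
# Toy parent map (child -> parents), a simplified excerpt of the hierarchy in the logs.
PARENTS = {
    "gamma-delta T cell": ["T cell"],
    "T cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
    "leukocyte": [],
}


def save_with_parents(name: str, registry: set[str]) -> list[str]:
    """Add `name` and any missing ancestors to `registry`; return what was newly saved."""
    if name in registry:
        return []  # already registered, so its ancestors are assumed to be registered too
    registry.add(name)
    newly_saved = [name]
    for parent in PARENTS.get(name, []):
        newly_saved += save_with_parents(parent, registry)
    return newly_saved


registry: set[str] = set()
first = save_with_parents("T cell", registry)               # saves the record and its ancestors
second = save_with_parents("T cell", registry)              # accidental re-save: nothing to do
third = save_with_parents("gamma-delta T cell", registry)   # overlapping parents are skipped
print(first, second, third)
```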

View the ontological hierarchy:

gdt_cell.view_parents()
[Figure: ontological hierarchy of the parents of gamma-delta T cell]

Or access the parents directly:

gdt_cell.parents.df()
name ontology_id abbr synonyms description bionty_source_id updated_at created_by_id
id
BxNjby0x T cell CL:0000084 None T-lymphocyte|T-cell|T lymphocyte A Type Of Lymphocyte Whose Defining Characteri... 7npi 2023-09-26 15:21:01 DzTjkKse

You can construct custom hierarchies of terms by specifying parents:

my_celltype = lb.CellType.filter(name="my T cell subtype").one()
my_celltype.parents.add(gdt_cell)
gdt_cell.view_parents(distance=2, with_children=True)
[Figure: hierarchy around gamma-delta T cell, with parents up to distance 2 and children]

This cell type and all its parents can now be queried & searched in the registry using lb.CellType.filter and lb.CellType.search.

Load records for values in data sources

When accessing data sources, one often encounters bulk references to entities that might be corrupted or curated using different standardization schemes.

Let’s consider an example based on an AnnData object:

adata = ln.dev.datasets.anndata_with_obs()

In the cell_type annotations of this AnnData object, we find 4 references to cell types:

adata.obs.cell_type.value_counts()
T cell                     10
hematopoietic stem cell    10
hepatocyte                 10
my new cell type           10
Name: cell_type, dtype: int64

We’d like to load the corresponding records in our in-house ontology to annotate the batch of data.

To this end, you’ll typically use from_values, which will both validate & load records that match the values.

cell_types = lb.CellType.from_values(adata.obs.cell_type)

cell_types
✅ loaded 1 CellType record matching name: 'T cell'
✅ created 2 CellType records from Bionty matching name: 'hematopoietic stem cell', 'hepatocyte'
did not create CellType record for 1 non-validated name: 'my new cell type'
[CellType(id='BxNjby0x', name='T cell', ontology_id='CL:0000084', synonyms='T-lymphocyte|T-cell|T lymphocyte', description='A Type Of Lymphocyte Whose Defining Characteristic Is The Expression Of A T Cell Receptor Complex.', updated_at=2023-09-26 15:21:01, bionty_source_id='7npi', created_by_id='DzTjkKse'),
 CellType(id='m91LZBDZ', name='hematopoietic stem cell', ontology_id='CL:0000037', synonyms='blood forming stem cell|hemopoietic stem cell|HSC', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', bionty_source_id='7npi', created_by_id='DzTjkKse'),
 CellType(id='J7hHC8SK', name='hepatocyte', ontology_id='CL:0000182', description='The Main Structural Component Of The Liver. They Are Specialized Epithelial Cells That Are Organized Into Interconnected Plates Called Lobules. Majority Of Cell Population Of Liver, Polygonal In Shape, Arranged In Plates Or Trabeculae Between Sinusoids; May Have Single Nucleus Or Binucleated.', bionty_source_id='7npi', created_by_id='DzTjkKse')]

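Logging informed us that 3 of the 4 cell types are validated: one was loaded from the registry and two were created from Bionty. No record was created for the non-validated name 'my new cell type'.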

And because we loaded these records at the same time, we could readily use them to annotate a batch of data.

What happened under the hood?

.from_values() performs the following lookups:

  1. If registry records match the values, load these records

  2. If values match synonyms of registry records, load these records

  3. (lnschema_bionty-only) If no record in the registry matches, attempt to load records from a public reference through Bionty

  4. (lnschema_bionty-only) Same as 3. but based on synonyms

No records will be returned if input field values aren’t mappable.

Example:

celltype_names = [
    "gamma-delta T cell",  # existing record with the same name
    "T lymphocyte",  # existing record with synonym
    "hepatocyte",  # Bionty record with the same name
    "HSC",  # Bionty record with synonym
    "my new cell type",  # exists neither in the registry nor in Bionty
]
lb.CellType.from_values(celltype_names)

This returns records for all names except “my new cell type”.
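The lookup order described above can be sketched with two plain dictionaries. This is a toy model under stated assumptions, not LaminDB's implementation: the registry and public tables, the find helper, and toy_from_values are hypothetical, and the order of the returned records is not meant to mirror the real API.

```python
# Toy data; names and synonyms are taken from the outputs above.
registry = {  # records already in the in-house registry: name -> synonyms
    "gamma-delta T cell": ["gamma-delta T lymphocyte"],
    "T cell": ["T-lymphocyte", "T-cell", "T lymphocyte"],
}
public = {  # records only available in the public reference
    "hepatocyte": [],
    "hematopoietic stem cell": ["blood forming stem cell", "hemopoietic stem cell", "HSC"],
}


def find(value, table):
    """Return the record name matching `value` by name first, then by synonym."""
    if value in table:
        return value
    for name, synonyms in table.items():
        if value in synonyms:
            return name
    return None


def toy_from_values(values):
    """Map values to record names; values that cannot be mapped are dropped."""
    records = []
    for value in values:
        # Steps 1 & 2 consult the registry (name, then synonym);
        # steps 3 & 4 do the same against the public reference.
        match = find(value, registry) or find(value, public)
        if match is not None and match not in records:
            records.append(match)
    return records


celltype_names = ["gamma-delta T cell", "T lymphocyte", "hepatocyte", "HSC", "my new cell type"]
print(toy_from_values(celltype_names))
```

In this toy, "my new cell type" matches nothing in either table and is silently dropped, which mirrors the rule that unmappable values yield no records.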

If you’d like to add this new value to the registry, do it like so:

my_celltype = lb.CellType(name="my new cell type")
my_celltype.save()

Alternatively, we can create entries based on ontology ids:

adata.obs.cell_type_id.unique().tolist()
['CL:0000084', 'CL:0000037', 'CL:0000182', '']
lb.CellType.from_values(adata.obs.cell_type_id, field=lb.CellType.ontology_id)
✅ loaded 1 CellType record matching ontology_id: 'CL:0000084'
✅ created 2 CellType records from Bionty matching ontology_id: 'CL:0000037', 'CL:0000182'
[CellType(id='BxNjby0x', name='T cell', ontology_id='CL:0000084', synonyms='T-lymphocyte|T-cell|T lymphocyte', description='A Type Of Lymphocyte Whose Defining Characteristic Is The Expression Of A T Cell Receptor Complex.', updated_at=2023-09-26 15:21:01, bionty_source_id='7npi', created_by_id='DzTjkKse'),
 CellType(id='m91LZBDZ', name='hematopoietic stem cell', ontology_id='CL:0000037', synonyms='blood forming stem cell|hemopoietic stem cell|HSC', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', bionty_source_id='7npi', created_by_id='DzTjkKse'),
 CellType(id='J7hHC8SK', name='hepatocyte', ontology_id='CL:0000182', description='The Main Structural Component Of The Liver. They Are Specialized Epithelial Cells That Are Organized Into Interconnected Plates Called Lobules. Majority Of Cell Population Of Liver, Polygonal In Shape, Arranged In Plates Or Trabeculae Between Sinusoids; May Have Single Nucleus Or Binucleated.', bionty_source_id='7npi', created_by_id='DzTjkKse')]

If we’re happy with the cell_types records, we save them to the registry:

ln.save(cell_types)
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 also saving parents of CellType(id='J7hHC8SK', name='hepatocyte', ontology_id='CL:0000182', description='The Main Structural Component Of The Liver. They Are Specialized Epithelial Cells That Are Organized Into Interconnected Plates Called Lobules. Majority Of Cell Population Of Liver, Polygonal In Shape, Arranged In Plates Or Trabeculae Between Sinusoids; May Have Single Nucleus Or Binucleated.', updated_at=2023-09-26 15:21:15, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: 'CL:0002371'
✅ created 2 CellType records from Bionty matching ontology_id: 'CL:0000066', 'CL:0000417'
💡 also saving parents of CellType(id='AOy0Et6k', name='endopolyploid cell', ontology_id='CL:0000417', updated_at=2023-09-26 15:21:16, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000412'
💡 also saving parents of CellType(id='eqVIXZEb', name='polyploid cell', ontology_id='CL:0000412', description='A Cell Whose Nucleus, Or Nuclei, Each Contain More Than Two Haploid Genomes.', updated_at=2023-09-26 15:21:17, bionty_source_id='7npi', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='P6E7yrc7', name='epithelial cell', ontology_id='CL:0000066', synonyms='epitheliocyte', description='A Cell That Is Usually Found In A Two-Dimensional Sheet With A Free Surface. The Cell Has A Cytoskeleton That Allows For Tight Cell To Cell Contact And For Cell Polarity Where Apical Part Is Directed Towards The Lumen And The Basal Part To The Basal Lamina.', updated_at=2023-09-26 15:21:16, bionty_source_id='7npi', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='m91LZBDZ', name='hematopoietic stem cell', ontology_id='CL:0000037', synonyms='blood forming stem cell|hemopoietic stem cell|HSC', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', updated_at=2023-09-26 15:21:15, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: 'CL:0000988'
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0008001'
💡 also saving parents of CellType(id='0d3ym06W', name='hematopoietic precursor cell', ontology_id='CL:0008001', description='Any Hematopoietic Cell That Is A Precursor Of Some Other Hematopoietic Cell Type.', updated_at=2023-09-26 15:21:18, bionty_source_id='7npi', created_by_id='DzTjkKse')

Now, let’s inspect our in-house registry:

lb.CellType.filter().df()
name ontology_id abbr synonyms description bionty_source_id updated_at created_by_id
id
BxNjby0x T cell CL:0000084 None T-lymphocyte|T-cell|T lymphocyte A Type Of Lymphocyte Whose Defining Characteri... 7npi 2023-09-26 15:21:01 DzTjkKse
g8slxY8X lymphocyte CL:0000542 None None A Lymphocyte Is A Leukocyte Commonly Found In ... 7npi 2023-09-26 15:21:02 DzTjkKse
MkrH0gsX leukocyte CL:0000738 None white blood cell|leucocyte An Achromatic Cell Of The Myeloid Or Lymphoid ... 7npi 2023-09-26 15:21:03 DzTjkKse
Q0aQr5JB hematopoietic cell CL:0000988 None haematopoietic cell|hemopoietic cell|haemopoie... A Cell Of A Hematopoietic Lineage. 7npi 2023-09-26 15:21:05 DzTjkKse
QMAH6IlS somatic cell CL:0002371 None None A Cell Of An Organism That Does Not Pass On It... 7npi 2023-09-26 15:21:06 DzTjkKse
H0taCt24 animal cell CL:0000548 None metazoan cell A Native Cell That Is Part Of Some Metazoa. 7npi 2023-09-26 15:21:06 DzTjkKse
VT73gpK2 native cell CL:0000003 None None A Cell That Is Found In A Natural Setting, Whi... 7npi 2023-09-26 15:21:07 DzTjkKse
Ry0JGwSD cell CL:0000000 None None A Material Entity Of Anatomical Origin (Part O... 7npi 2023-09-26 15:21:08 DzTjkKse
igNGxgJT eukaryotic cell CL:0000255 None None None 7npi 2023-09-26 15:21:09 DzTjkKse
uUhIkxU7 my T cell subtype None None None None None 2023-09-26 15:21:09 DzTjkKse
64kIG7So gamma-delta T cell CL:0000798 None gamma-delta T lymphocyte|gamma-delta T-lymphoc... None 7npi 2023-09-26 15:21:11 DzTjkKse
J7hHC8SK hepatocyte CL:0000182 None None The Main Structural Component Of The Liver. Th... 7npi 2023-09-26 15:21:15 DzTjkKse
m91LZBDZ hematopoietic stem cell CL:0000037 None blood forming stem cell|hemopoietic stem cell|HSC A Stem Cell From Which All Cells Of The Lympho... 7npi 2023-09-26 15:21:15 DzTjkKse
AOy0Et6k endopolyploid cell CL:0000417 None None None 7npi 2023-09-26 15:21:16 DzTjkKse
P6E7yrc7 epithelial cell CL:0000066 None epitheliocyte A Cell That Is Usually Found In A Two-Dimensio... 7npi 2023-09-26 15:21:16 DzTjkKse
eqVIXZEb polyploid cell CL:0000412 None None A Cell Whose Nucleus, Or Nuclei, Each Contain ... 7npi 2023-09-26 15:21:17 DzTjkKse
0d3ym06W hematopoietic precursor cell CL:0008001 None None Any Hematopoietic Cell That Is A Precursor Of ... 7npi 2023-09-26 15:21:18 DzTjkKse

Access records in in-house ontologies

Search:

lb.CellType.search("gamma delta T cell").head(2)
id synonyms __ratio__
name
gamma-delta T cell 64kIG7So gamma-delta T lymphocyte|gamma-delta T-lymphoc... 100.0
T cell BxNjby0x T-lymphocyte|T-cell|T lymphocyte 90.0

Or look up with auto-complete:

cell_types = lb.CellType.lookup()
hsc_record = cell_types.hematopoietic_stem_cell

hsc_record
CellType(id='m91LZBDZ', name='hematopoietic stem cell', ontology_id='CL:0000037', synonyms='blood forming stem cell|hemopoietic stem cell|HSC', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', updated_at=2023-09-26 15:21:15, bionty_source_id='7npi', created_by_id='DzTjkKse')

Validate & standardize

Simple validation of an iterable of values works like so:

lb.CellType.validate(["HSC", "blood forming stem cell"])
2 terms (100.00%) are not validated for name: HSC, blood forming stem cell
array([False, False])

Because these values are synonyms rather than names of records in the registry, they’re not validated!

You can easily convert these values to validated standardized names based on synonyms like so:

lb.CellType.standardize(["HSC", "blood forming stem cell"])
💡 standardized 2/2 terms
['hematopoietic stem cell', 'hematopoietic stem cell']

Alternatively, you can use .from_values(), which will only ever create validated records and automatically standardizes under the hood:

lb.CellType.from_values(["HSC", "blood forming stem cell"])
✅ loaded 2 CellType records matching synonyms: 'HSC', 'blood forming stem cell'
[CellType(id='m91LZBDZ', name='hematopoietic stem cell', ontology_id='CL:0000037', synonyms='blood forming stem cell|hemopoietic stem cell|HSC', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', updated_at=2023-09-26 15:21:15, bionty_source_id='7npi', created_by_id='DzTjkKse')]

We can also add new synonyms to a record like so:

hsc_record.add_synonym("HSCs")

And when we encounter this synonym as a value, it will now be standardized via synonym lookup and mapped to the correct registry record:

lb.CellType.standardize(["HSCs"])
💡 standardized 1/1 terms
['hematopoietic stem cell']
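The difference between validating, standardizing, and adding a synonym can be illustrated with a toy model that stores synonyms as a pipe-delimited string, as the records above do. The three functions below are hypothetical stand-ins and not the lnschema_bionty implementation. In particular, leaving unknown values untouched in standardize is an assumption of this sketch.

```python
# Toy registry: standardized name -> pipe-delimited synonyms, as in the records above.
synonyms_by_name = {
    "hematopoietic stem cell": "blood forming stem cell|hemopoietic stem cell|HSC",
    "T cell": "T-lymphocyte|T-cell|T lymphocyte",
}


def validate(values):
    """A value validates only if it equals a registered name exactly."""
    return [value in synonyms_by_name for value in values]


def standardize(values):
    """Replace known synonyms with their standardized name; leave other values untouched."""
    synonym_to_name = {
        synonym: name
        for name, synonyms in synonyms_by_name.items()
        for synonym in synonyms.split("|")
    }
    return [synonym_to_name.get(value, value) for value in values]


def add_synonym(name, synonym):
    """Append a synonym to the pipe-delimited string of an existing record."""
    synonyms_by_name[name] += f"|{synonym}"


print(validate(["HSC", "blood forming stem cell"]))     # [False, False]
print(standardize(["HSC", "blood forming stem cell"]))  # both map to 'hematopoietic stem cell'
add_synonym("hematopoietic stem cell", "HSCs")
print(standardize(["HSCs"]))                            # ['hematopoietic stem cell']
```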

A special synonym is .abbr (short for abbreviation), which has its own field and can be assigned via:

hsc_record.set_abbr("HSC")

You can create a lookup object from the .abbr field:

cell_types = lb.CellType.lookup("abbr")
hsc = cell_types.hsc
hsc
CellType(id='m91LZBDZ', name='hematopoietic stem cell', ontology_id='CL:0000037', abbr='HSC', synonyms='HSCs|blood forming stem cell|HSC|hemopoietic stem cell', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', updated_at=2023-09-26 15:21:19, bionty_source_id='7npi', created_by_id='DzTjkKse')

The same workflow works for all of lnschema_bionty’s registries.

Manage registries across species

Most registries are species-aware, for instance, Gene:

lb.Gene.from_bionty(symbol="TCF7", species="human")
✅ created 1 Gene record from Bionty matching symbol: 'TCF7'
Gene(id='0StEa7eEhivb', symbol='TCF7', ensembl_gene_id='ENSG00000081059', ncbi_gene_ids='6932', biotype='protein_coding', description='transcription factor 7 [Source:HGNC Symbol;Acc:HGNC:11639]', synonyms='TCF-1', species_id='uHJU', bionty_source_id='LIw3', created_by_id='DzTjkKse')

Similarly, API calls that interact with multi-species registries accept a species argument, e.g.:

lb.Gene.validate(["TCF7", "ABC1"], species="human")
2 terms (100.00%) are not validated for symbol: TCF7, ABC1
array([False, False])

You can also pass species for validating features upon registering data, e.g., in ln.File.from_anndata(..., field=lb.Gene.ensembl_gene_id, species=...).

And when working with the same species throughout your analysis/workflow, you can omit the species argument by configuring it globally:

lb.settings.species = "mouse"
lb.Gene.from_bionty(symbol="Ap5b1")
✅ created 1 Gene record from Bionty matching symbol: 'Ap5b1'
Gene(id='B6eFOVlpj5om', symbol='Ap5b1', ensembl_gene_id='ENSMUSG00000049562', ncbi_gene_ids='381201', biotype='protein_coding', description='adaptor-related protein complex 5, beta 1 subunit [Source:MGI Symbol;Acc:MGI:2685808]', synonyms='Gm962', species_id='vado', bionty_source_id='YoR2', created_by_id='DzTjkKse')
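The pattern of a global default that an explicit argument can override can be sketched as follows. The Settings class and resolve_species helper are hypothetical and only loosely analogous to lb.settings. Raising an error when neither is set is an assumption of this sketch, not documented LaminDB behavior.

```python
from typing import Optional


class Settings:
    """Hypothetical global settings holder, loosely analogous to lb.settings."""

    species: Optional[str] = None


settings = Settings()


def resolve_species(species: Optional[str] = None) -> str:
    """Prefer the explicit argument, fall back to the global setting, else fail."""
    resolved = species or settings.species
    if resolved is None:
        raise ValueError("pass species=... or set settings.species")
    return resolved


settings.species = "mouse"
print(resolve_species())         # mouse
print(resolve_species("human"))  # human
```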

Track underlying ontology versions

Under the hood, source ontology versions are automatically tracked:

lb.BiontySource.filter(currently_used=True).df()
entity species currently_used source source_name version url md5 source_website updated_at created_by_id
id
kuTH Species vertebrates True ensembl Ensembl release-110 https://ftp.ensembl.org/pub/release-110/specie... f3faf95648d3a2b50fd3625456739706 https://www.ensembl.org 2023-09-26 15:20:54 DzTjkKse
dkvi Species bacteria True ensembl Ensembl release-57 https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacte... ee28510ed5586ea7ab4495717c96efc8 https://www.ensembl.org 2023-09-26 15:20:54 DzTjkKse
4T1v Species fungi True ensembl Ensembl release-57 http://ftp.ensemblgenomes.org/pub/fungi/releas... dbcde58f4396ab8b2480f7fe9f83df8a https://www.ensembl.org 2023-09-26 15:20:54 DzTjkKse
2WAU Species metazoa True ensembl Ensembl release-57 http://ftp.ensemblgenomes.org/pub/metazoa/rele... 424636a574fec078a61cbdddb05f9132 https://www.ensembl.org 2023-09-26 15:20:54 DzTjkKse
cDTn Species plants True ensembl Ensembl release-57 https://ftp.ensemblgenomes.ebi.ac.uk/pub/plant... eadaa1f3e527e4c3940c90c7fa5c8bf4 https://www.ensembl.org 2023-09-26 15:20:54 DzTjkKse
LIw3 Gene human True ensembl Ensembl release-110 s3://bionty-assets/df_human__ensembl__release-... 832f3947e83664588d419608a469b528 https://www.ensembl.org 2023-09-26 15:20:54 DzTjkKse
YoR2 Gene mouse True ensembl Ensembl release-110 s3://bionty-assets/df_mouse__ensembl__release-... fa4ce130f2929aefd7ac3bc8eaf0c4de https://www.ensembl.org 2023-09-26 15:20:54 DzTjkKse
tep3 Gene saccharomyces cerevisiae True ensembl Ensembl release-110 s3://bionty-assets/df_saccharomyces cerevisiae... 2e59495a3e87ea6575e408697dd73459 https://www.ensembl.org 2023-09-26 15:20:54 DzTjkKse
NSV4 Protein human True uniprot Uniprot 2023-03 s3://bionty-assets/df_human__uniprot__2023-03_... 1c46e85c6faf5eff3de5b4e1e4edc4d3 https://www.uniprot.org 2023-09-26 15:20:54 DzTjkKse
NkNa Protein mouse True uniprot Uniprot 2023-03 s3://bionty-assets/df_mouse__uniprot__2023-03_... 9d5e9a8225011d3218e10f9bbb96a46c https://www.uniprot.org 2023-09-26 15:20:54 DzTjkKse
TbT9 CellMarker human True cellmarker CellMarker 2.0 s3://bionty-assets/human_cellmarker_2.0_CellMa... d565d4a542a5c7e7a06255975358e4f4 http://bio-bigdata.hrbmu.edu.cn/CellMarker 2023-09-26 15:20:54 DzTjkKse
Fnl7 CellMarker mouse True cellmarker CellMarker 2.0 s3://bionty-assets/mouse_cellmarker_2.0_CellMa... 189586732c63be949e40dfa6a3636105 http://bio-bigdata.hrbmu.edu.cn/CellMarker 2023-09-26 15:20:54 DzTjkKse
MLRc CellLine all True clo Cell Line Ontology 2022-03-21 https://data.bioontology.org/ontologies/CLO/su... ea58a1010b7e745702a8397a526b3a33 https://bioportal.bioontology.org/ontologies/CLO 2023-09-26 15:20:54 DzTjkKse
7npi CellType all True cl Cell Ontology 2023-04-20 http://purl.obolibrary.org/obo/cl/releases/202... 58cdc1545f0d35e6fce76a65331b00fb https://obophenotype.github.io/cell-ontology 2023-09-26 15:20:54 DzTjkKse
Xcgu Tissue all True uberon Uberon multi-species anatomy ontology 2023-04-19 http://purl.obolibrary.org/obo/uberon/releases... 5611dd1375d5a95ac7d7de8e25e6016f http://obophenotype.github.io/uberon 2023-09-26 15:20:54 DzTjkKse
DXsf Disease all True mondo Mondo Disease Ontology 2023-04-04 http://purl.obolibrary.org/obo/mondo/releases/... 700c43dd9ba51aecc7a8edfc3bc2dab1 https://mondo.monarchinitiative.org 2023-09-26 15:20:54 DzTjkKse
qmRy Disease human True doid Human Disease Ontology 2023-03-31 http://purl.obolibrary.org/obo/doid/releases/2... 64f083a1e47867c307c8eae308afc3bb https://disease-ontology.org 2023-09-26 15:20:54 DzTjkKse
oGI0 ExperimentalFactor all True efo The Experimental Factor Ontology 3.48.0 http://www.ebi.ac.uk/efo/releases/v3.48.0/efo.owl 3367e9a9ae3dee9113024e5108c49091 https://bioportal.bioontology.org/ontologies/EFO 2023-09-26 15:20:54 DzTjkKse
22jy Phenotype human True hp Human Phenotype Ontology 2023-06-17 https://github.com/obophenotype/human-phenotyp... 65e8d96bc81deb893163927063b10c06 https://hpo.jax.org 2023-09-26 15:20:54 DzTjkKse
Flv4 Phenotype mammalian True mp Mammalian Phenotype Ontology 2023-05-31 https://github.com/mgijax/mammalian-phenotype-... be89052cf6d9c0b6197038fe347ef293 https://github.com/mgijax/mammalian-phenotype-... 2023-09-26 15:20:54 DzTjkKse
cVc8 Phenotype zebrafish True zp Zebrafish Phenotype Ontology 2022-12-17 https://github.com/obophenotype/zebrafish-phen... 03430b567bf153216c0fa4c3440b3b24 https://github.com/obophenotype/zebrafish-phen... 2023-09-26 15:20:54 DzTjkKse
J3tT Phenotype all True pato Phenotype And Trait Ontology 2023-05-18 http://purl.obolibrary.org/obo/pato/releases/2... bd472f4971492109493d4ad8a779a8dd https://github.com/pato-ontology/pato 2023-09-26 15:20:54 DzTjkKse
VPpX Pathway all True go Gene Ontology 2023-05-10 https://data.bioontology.org/ontologies/GO/sub... e9845499eadaef2418f464cd7e9ac92e http://geneontology.org 2023-09-26 15:20:54 DzTjkKse
gXBr BFXPipeline all True lamin Bioinformatics Pipeline 1.0.0 s3://bionty-assets/bfxpipelines.json a7eff57a256994692fba46e0199ffc94 https://lamin.ai 2023-09-26 15:20:54 DzTjkKse
Ry7e Drug all True dron Drug Ontology 2023-03-10 https://data.bioontology.org/ontologies/DRON/s... 75e86011158fae76bb46d96662a33ba3 https://bioportal.bioontology.org/ontologies/DRON 2023-09-26 15:20:54 DzTjkKse
utju DevelopmentalStage human True hsapdv Human Developmental Stages 2020-03-10 http://purl.obolibrary.org/obo/hsapdv.owl 0423f338c50161880df4d5d1523d24ed https://github.com/obophenotype/developmental-... 2023-09-26 15:20:54 DzTjkKse
tTTN DevelopmentalStage mouse True mmusdv Mouse Developmental Stages 2020-03-10 http://purl.obolibrary.org/obo/mmusdv.owl 6342b59cf3082b10c54f90a8c3336b72 https://github.com/obophenotype/developmental-... 2023-09-26 15:20:54 DzTjkKse
1sSG Ethnicity human True hancestro Human Ancestry Ontology 2023-07-313.0 http://purl.obolibrary.org/obo/hancestro.owl af731447e95b4ca341a91b018edd4885 https://github.com/EBISPOT/hancestro 2023-09-26 15:20:54 DzTjkKse

Each record is linked to a versioned Bionty source (if it was created from Bionty):

hepatocyte = lb.CellType.filter(name="hepatocyte").one()
hepatocyte.bionty_source
BiontySource(id='7npi', entity='CellType', species='all', currently_used=True, source='cl', source_name='Cell Ontology', version='2023-04-20', url='http://purl.obolibrary.org/obo/cl/releases/2023-04-20/cl-base.owl', md5='58cdc1545f0d35e6fce76a65331b00fb', source_website='https://obophenotype.github.io/cell-ontology', updated_at=2023-09-26 15:20:54, created_by_id='DzTjkKse')

Create records from specific public ontologies

By default, records are created from the "currently_used" Bionty sources, which are configured during instance initialization, e.g.:

lb.Phenotype.bionty()
Phenotype
Species: human
Source: hp, 2023-06-17
#terms: 17653

📖 Phenotype.df(): ontology reference table
🔎 Phenotype.lookup(): autocompletion of terms
🎯 Phenotype.search(): free text search of terms
✅ Phenotype.validate(): strictly validate values
🧐 Phenotype.inspect(): full inspection of values
👽 Phenotype.standardize(): convert to standardized names
🪜 Phenotype.diff(): difference between two versions
🔗 Phenotype.ontology: Pronto.Ontology object

Sometimes, the default source doesn’t contain the ontology term you are looking for.

You can then create a record from a non-default source by passing bionty_source:

bionty_source = lb.BiontySource.filter(entity="Phenotype", source="pato").one()
age = lb.Phenotype.from_bionty(name="age", bionty_source=bionty_source)
age
✅ created 1 Phenotype record from Bionty matching name: 'age'
Phenotype(id='he7RPQN5', name='age', ontology_id='PATO:0000011', description='A Time Quality Inhering In A Bearer By Virtue Of How Long The Bearer Has Existed.', bionty_source_id='J3tT', created_by_id='DzTjkKse')
age.bionty_source
BiontySource(id='J3tT', entity='Phenotype', species='all', currently_used=True, source='pato', source_name='Phenotype And Trait Ontology', version='2023-05-18', url='http://purl.obolibrary.org/obo/pato/releases/2023-05-18/pato.owl', md5='bd472f4971492109493d4ad8a779a8dd', source_website='https://github.com/pato-ontology/pato', updated_at=2023-09-26 15:20:54, created_by_id='DzTjkKse')

Analogously, you can pass bionty_source to bulk-create records from a non-default source:

records = lb.Phenotype.from_values(["age", "life span"], bionty_source=bionty_source)
records
✅ created 2 Phenotype records from Bionty matching name: 'age', 'life span'
[Phenotype(id='he7RPQN5', name='age', ontology_id='PATO:0000011', description='A Time Quality Inhering In A Bearer By Virtue Of How Long The Bearer Has Existed.', bionty_source_id='J3tT', created_by_id='DzTjkKse'),
 Phenotype(id='rEuNj69a', name='life span', ontology_id='PATO:0000050', description='A Time Quality Inhering In A Bearer By Virtue Of The Bearer'S Expected Maximum Age.', bionty_source_id='J3tT', created_by_id='DzTjkKse')]
!lamin delete --force test-registries
!rm -r test-registries
💡 deleting instance testuser1/test-registries
✅     deleted instance settings file: /home/runner/.lamin/instance--testuser1--test-registries.env
✅     instance cache deleted
✅     deleted '.lndb' sqlite file
❗     consider manually deleting your stored data: /home/runner/work/lamindb/lamindb/docs/test-registries