Manage biological registries#
Registries manage the formalized knowledge & experimental design that anchor dry & wetlab work.
In LaminDB, registries are standard SQL tables, equipped with mechanisms that avoid typos & duplicated data.
In addition, LaminDB makes it easy to import records from public ontologies through the lnschema_bionty plug-in.
In this notebook, you’ll see how to manage an in-house ontology anchored in public knowledge.
(If you also manage experimental design through registries, you can access all metadata through one API and store it in one simple SQL database.)
Setup#
Let us create an instance that has lnschema_bionty mounted:
!lamin init --storage ./test-registries --schema bionty
✅ saved: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-26 15:20:54)
✅ saved: Storage(id='9v3TTOwG', root='/home/runner/work/lamindb/lamindb/docs/test-registries', type='local', updated_at=2023-09-26 15:20:54, created_by_id='DzTjkKse')
💡 loaded instance: testuser1/test-registries
💡 did not register local instance on hub (if you want, call `lamin register`)
import lamindb as ln
import lnschema_bionty as lb
💡 loaded instance: testuser1/test-registries (lamindb 0.54.2)
ln.settings.verbosity = "info"
Let’s pre-populate the cell type registry with a few records:
lb.Species.from_bionty(name="human").save()
lb.CellType.from_bionty(name="T cell").save()
lb.CellType(name="my T cell subtype").save()
💡 downloading Species source file from: https://ftp.ensembl.org/pub/release-110/species_EnsemblVertebrates.txt
✅ created 1 Species record from Bionty matching name: 'human'
✅ created 1 CellType record from Bionty matching name: 'T cell'
💡 also saving parents of CellType(id='BxNjby0x', name='T cell', ontology_id='CL:0000084', synonyms='T-lymphocyte|T-cell|T lymphocyte', description='A Type Of Lymphocyte Whose Defining Characteristic Is The Expression Of A T Cell Receptor Complex.', updated_at=2023-09-26 15:21:01, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000542'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 also saving parents of CellType(id='g8slxY8X', name='lymphocyte', ontology_id='CL:0000542', description='A Lymphocyte Is A Leukocyte Commonly Found In The Blood And Lymph That Has The Characteristics Of A Large Nucleus, A Neutral Staining Cytoplasm, And Prominent Heterochromatin.', updated_at=2023-09-26 15:21:02, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000738'
💡 also saving parents of CellType(id='MkrH0gsX', name='leukocyte', ontology_id='CL:0000738', synonyms='white blood cell|leucocyte', description='An Achromatic Cell Of The Myeloid Or Lymphoid Lineages Capable Of Ameboid Movement, Found In Blood Or Other Tissue.', updated_at=2023-09-26 15:21:03, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000988'
💡 also saving parents of CellType(id='Q0aQr5JB', name='hematopoietic cell', ontology_id='CL:0000988', synonyms='haematopoietic cell|hemopoietic cell|haemopoietic cell', description='A Cell Of A Hematopoietic Lineage.', updated_at=2023-09-26 15:21:05, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ created 2 CellType records from Bionty matching ontology_id: 'CL:0002371', 'CL:0000548'
💡 also saving parents of CellType(id='QMAH6IlS', name='somatic cell', ontology_id='CL:0002371', description='A Cell Of An Organism That Does Not Pass On Its Genetic Material To The Organism'S Offspring (I.E. A Non-Germ Line Cell).', updated_at=2023-09-26 15:21:06, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: 'CL:0000548'
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000003'
💡 also saving parents of CellType(id='VT73gpK2', name='native cell', ontology_id='CL:0000003', description='A Cell That Is Found In A Natural Setting, Which Includes Multicellular Organism Cells 'In Vivo' (I.E. Part Of An Organism), And Unicellular Organisms 'In Environment' (I.E. Part Of A Natural Environment).', updated_at=2023-09-26 15:21:07, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000000'
💡 also saving parents of CellType(id='H0taCt24', name='animal cell', ontology_id='CL:0000548', synonyms='metazoan cell', description='A Native Cell That Is Part Of Some Metazoa.', updated_at=2023-09-26 15:21:06, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000255'
❗ records with similar names exist! did you mean to load one of them?
name | id | synonyms | __ratio__ |
---|---|---|---|
T cell | BxNjby0x | T-lymphocyte|T-cell|T lymphocyte | 90.0 |
cell | Ry0JGwSD | | 90.0 |
Access records in public ontologies#
We start with a public ontology for cell types.
Bionty - short for “biological entity” - is a class for accessing public ontologies.
Bionty provides simple access to curated public ontologies that Lamin hosts for reliable and performant access. For most Bionty objects, you can access the underlying ontology through Pronto.
(If you don’t need to manage in-house registries, you can also use the bionty package standalone.)
Let’s create a Bionty object:
bionty = lb.CellType.bionty()
bionty
CellType
Species: all
Source: cl, 2023-04-20
#terms: 2862
📖 CellType.df(): ontology reference table
🔎 CellType.lookup(): autocompletion of terms
🎯 CellType.search(): free text search of terms
✅ CellType.validate(): strictly validate values
🧐 CellType.inspect(): full inspection of values
👽 CellType.standardize(): convert to standardized names
🪜 CellType.diff(): difference between two versions
🔗 CellType.ontology: Pronto.Ontology object
We can use it to search the public ontology for cell types:
bionty.search("gamma delta T cell").head(3)
name | ontology_id | definition | synonyms | parents | __ratio__ |
---|---|---|---|---|---|
gamma-delta T cell | CL:0000798 | A T Cell That Expresses A Gamma-Delta T Cell R... | gamma-delta T lymphocyte|gamma-delta T-lymphoc... | [CL:0000084] | 100.0 |
mature gamma-delta T cell | CL:0000800 | A Gamma-Delta T Cell That Has A Mature Phenoty... | mature gamma-delta T-cell|mature gamma-delta T... | [CL:0000798] | 95.0 |
CD27-negative gamma-delta T cell | CL:0002125 | A Circulating Gamma-Delta T Cell That Expresse... | gammadelta-17 cells | [CL:0000800] | 90.0 |
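To make the `__ratio__` column more concrete, here is a minimal sketch of ranked free-text search. It is not Bionty's implementation: the scoring function (Python's `difflib.SequenceMatcher`), the `TERMS` dictionary, and the helper names are assumptions chosen for illustration, so the scores will differ from the table above.

```python
from difflib import SequenceMatcher

# Hypothetical mini-ontology: name -> ontology_id (ids copied from outputs on this page).
TERMS = {
    "gamma-delta T cell": "CL:0000798",
    "mature gamma-delta T cell": "CL:0000800",
    "CD27-negative gamma-delta T cell": "CL:0002125",
    "hepatocyte": "CL:0000182",
}


def normalize(text: str) -> str:
    """Lowercase and treat hyphens as spaces so 'gamma delta' matches 'gamma-delta'."""
    return text.lower().replace("-", " ")


def similarity(query: str, name: str) -> float:
    """Return a 0-100 similarity score between the query and a term name."""
    return 100 * SequenceMatcher(None, normalize(query), normalize(name)).ratio()


def search(query: str, top: int = 3) -> list:
    """Rank all terms by similarity to the query, best match first."""
    scored = [(name, oid, similarity(query, name)) for name, oid in TERMS.items()]
    return sorted(scored, key=lambda row: row[2], reverse=True)[:top]


results = search("gamma delta T cell")
for name, ontology_id, score in results:
    print(f"{name:35s} {ontology_id}  {score:5.1f}")
```

The point of the sketch is only that search returns a ranked list of candidates with a score, rather than a single exact match.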
And we can also use it to look up cell types with auto-complete:
lookup = bionty.lookup()
lookup.gamma_delta_t_cell
CellType(ontology_id='CL:0000798', name='gamma-delta T cell', definition='A T Cell That Expresses A Gamma-Delta T Cell Receptor Complex.', synonyms='gamma-delta T lymphocyte|gamma-delta T-lymphocyte|gammadelta T cell|gamma-delta T-cell', parents=array(['CL:0000084'], dtype=object))
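The attribute `gamma_delta_t_cell` above suggests that term names are converted into valid Python identifiers. The following sketch shows one way such a lookup object could be built. The normalization rule and the helper names `to_attribute` and `make_lookup` are assumptions, not the library's actual code.

```python
import re
from types import SimpleNamespace


def to_attribute(name: str) -> str:
    """Turn a term name into a lowercase identifier usable after a dot."""
    key = re.sub(r"\W+", "_", name.strip().lower()).strip("_")
    # Identifiers cannot start with a digit, so prefix an underscore if needed.
    return f"_{key}" if key[:1].isdigit() else key


def make_lookup(names: list) -> SimpleNamespace:
    """Build an object whose attributes map back to the original names."""
    return SimpleNamespace(**{to_attribute(n): n for n in names})


lookup_sketch = make_lookup(["gamma-delta T cell", "T cell", "hepatocyte"])
print(lookup_sketch.gamma_delta_t_cell)
```

Because the attributes are real Python names, an editor or notebook can offer them for tab completion.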
Create records in in-house ontologies#
We can now create a record for our in-house SQL registry by passing the result of the lookup in the public ontology to the CellType constructor:
gdt_cell = lb.CellType(lookup.gamma_delta_t_cell)
❗ records with similar names exist! did you mean to load one of them?
name | id | synonyms | __ratio__ |
---|---|---|---|
T cell | BxNjby0x | T-lymphocyte|T-cell|T lymphocyte | 90.0 |
cell | Ry0JGwSD | | 90.0 |
(Alternatively, we could construct the gamma delta T cell via from_bionty(), which is synonyms-aware.)
gdt_cell
CellType(id='64kIG7So', name='gamma-delta T cell', ontology_id='CL:0000798', synonyms='gamma-delta T lymphocyte|gamma-delta T-lymphocyte|gammadelta T cell|gamma-delta T-cell', bionty_source_id='7npi', created_by_id='DzTjkKse')
When we save this record to the registry, logging informs us that we’re also saving parent ontological terms:
gdt_cell.save()
💡 also saving parents of CellType(id='64kIG7So', name='gamma-delta T cell', ontology_id='CL:0000798', synonyms='gamma-delta T lymphocyte|gamma-delta T-lymphocyte|gammadelta T cell|gamma-delta T-cell', updated_at=2023-09-26 15:21:11, bionty_source_id='7npi', created_by_id='DzTjkKse')
Will I always see parents being saved?
No, this only happens a single time.
If we accidentally save the same record again, it will be recognized that the record and all parents are already in the registry.
If we save another record that has overlapping parents, only new parents will be saved.
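The behavior described in this answer can be sketched in plain Python. This is a toy model rather than LaminDB's code: the `PARENTS` mapping is a truncated chain taken from the log output earlier on this page, and `save_with_parents` is a hypothetical helper.

```python
# Hypothetical, truncated parent links (names taken from the log output on this page).
PARENTS = {
    "gamma-delta T cell": ["T cell"],
    "T cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
    "leukocyte": ["hematopoietic cell"],
    "hematopoietic cell": [],
}


def save_with_parents(name: str, registry: set) -> list:
    """Save a term plus any ancestors missing from the registry.

    Returns the names that were newly saved, in the order they were added.
    """
    if name in registry:
        return []  # already saved, so its ancestors are assumed to be saved too
    registry.add(name)
    newly_saved = [name]
    for parent in PARENTS.get(name, []):
        newly_saved += save_with_parents(parent, registry)
    return newly_saved


registry = set()
first = save_with_parents("T cell", registry)               # saves the whole chain
second = save_with_parents("gamma-delta T cell", registry)  # only the new term
third = save_with_parents("gamma-delta T cell", registry)   # nothing left to save
print(first, second, third, sep="\n")
```

The recursion stops as soon as it reaches a term that is already in the registry, which is why overlapping parents are not saved twice.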
View the ontological hierarchy:
gdt_cell.view_parents()
Or access the parents directly:
gdt_cell.parents.df()
id | name | ontology_id | abbr | synonyms | description | bionty_source_id | updated_at | created_by_id |
---|---|---|---|---|---|---|---|---|
BxNjby0x | T cell | CL:0000084 | None | T-lymphocyte|T-cell|T lymphocyte | A Type Of Lymphocyte Whose Defining Characteri... | 7npi | 2023-09-26 15:21:01 | DzTjkKse |
You can construct custom hierarchies of terms by specifying parents:
my_celltype = lb.CellType.filter(name="my T cell subtype").one()
my_celltype.parents.add(gdt_cell)
gdt_cell.view_parents(distance=2, with_children=True)
This cell type and all its parents can now be queried & searched in the registry using lb.CellType.filter and lb.CellType.search.
Load records for values in data sources#
When accessing data sources, one often encounters bulk references to entities that might be corrupted or curated using different standardization schemes.
Let’s consider an example based on an AnnData object:
adata = ln.dev.datasets.anndata_with_obs()
In the cell_type annotations of this AnnData object, we find 4 references to cell types:
adata.obs.cell_type.value_counts()
T cell 10
hematopoietic stem cell 10
hepatocyte 10
my new cell type 10
Name: cell_type, dtype: int64
We’d like to load the corresponding records in our in-house ontology to annotate the batch of data.
To this end, you’ll typically use from_values, which will both validate & load records that match the values.
cell_types = lb.CellType.from_values(adata.obs.cell_type)
cell_types
✅ loaded 1 CellType record matching name: 'T cell'
✅ created 2 CellType records from Bionty matching name: 'hematopoietic stem cell', 'hepatocyte'
❗ did not create CellType record for 1 non-validated name: 'my new cell type'
[CellType(id='BxNjby0x', name='T cell', ontology_id='CL:0000084', synonyms='T-lymphocyte|T-cell|T lymphocyte', description='A Type Of Lymphocyte Whose Defining Characteristic Is The Expression Of A T Cell Receptor Complex.', updated_at=2023-09-26 15:21:01, bionty_source_id='7npi', created_by_id='DzTjkKse'),
CellType(id='m91LZBDZ', name='hematopoietic stem cell', ontology_id='CL:0000037', synonyms='blood forming stem cell|hemopoietic stem cell|HSC', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', bionty_source_id='7npi', created_by_id='DzTjkKse'),
CellType(id='J7hHC8SK', name='hepatocyte', ontology_id='CL:0000182', description='The Main Structural Component Of The Liver. They Are Specialized Epithelial Cells That Are Organized Into Interconnected Plates Called Lobules. Majority Of Cell Population Of Liver, Polygonal In Shape, Arranged In Plates Or Trabeculae Between Sinusoids; May Have Single Nucleus Or Binucleated.', bionty_source_id='7npi', created_by_id='DzTjkKse')]
Logging informed us that 3 of the 4 cell types were validated; "my new cell type" was not, so no record was created for it.
And because we loaded these records at the same time, we could readily use them to annotate a batch of data.
What happened under-the-hood?
.from_values() performs the following lookups:
1. If registry records match the values, load these records.
2. If values match synonyms of registry records, load these records.
3. (lnschema_bionty-only) If no record in the registry matches, attempt to load records from a public reference through Bionty.
4. (lnschema_bionty-only) Same as 3., but based on synonyms.
No records will be returned if input field values aren’t mappable.
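As a rough illustration of this lookup order, consider the following self-contained sketch. It is a toy model under stated assumptions: `Term`, `REGISTRY`, `PUBLIC`, `resolve`, and `from_values_sketch` are hypothetical names, and exact string matching stands in for whatever matching the library performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    name: str
    synonyms: tuple = ()


# Toy stand-ins for the in-house registry and the public ontology.
REGISTRY = [
    Term("gamma-delta T cell"),
    Term("T cell", ("T-lymphocyte", "T-cell", "T lymphocyte")),
]
PUBLIC = [
    Term("hepatocyte"),
    Term("hematopoietic stem cell", ("blood forming stem cell", "HSC")),
]


def resolve(value: str) -> Optional[tuple]:
    """Return (term, how_it_matched) following the lookup steps in order, or None."""
    steps = [
        ("registry name", REGISTRY, lambda t: value == t.name),
        ("registry synonym", REGISTRY, lambda t: value in t.synonyms),
        ("public name", PUBLIC, lambda t: value == t.name),
        ("public synonym", PUBLIC, lambda t: value in t.synonyms),
    ]
    for label, terms, matches in steps:
        for term in terms:
            if matches(term):
                return term, label
    return None


def from_values_sketch(values) -> list:
    """Map values to terms, dropping values that cannot be mapped."""
    resolved = [resolve(v) for v in values]
    return [hit for hit in resolved if hit is not None]


names = ["gamma-delta T cell", "T lymphocyte", "hepatocyte", "HSC", "my new cell type"]
hits = from_values_sketch(names)
for term, how in hits:
    print(f"{term.name!r} matched via {how}")
```

In this toy model, the unmappable value is simply left out of the result, mirroring the statement that no record is returned for it.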
Example:
celltype_names = [
"gamma-delta T cell", # existing record with the same name
"T lymphocyte", # existing record with synonym
"hepatocyte", # Bionty record with the same name
"HSC", # Bionty record with synonym
"my new cell type", # exists neither in the registry nor in Bionty
]
lb.CellType.from_values(celltype_names)
This returns records for all names except for “my new cell type”.
If you’d like to add this new value to the registry, do it like so:
my_celltype = lb.CellType(name="my new cell type")
my_celltype.save()
Alternatively, we can create entries based on ontology ids:
adata.obs.cell_type_id.unique().tolist()
['CL:0000084', 'CL:0000037', 'CL:0000182', '']
lb.CellType.from_values(adata.obs.cell_type_id, field=lb.CellType.ontology_id)
✅ loaded 1 CellType record matching ontology_id: 'CL:0000084'
✅ created 2 CellType records from Bionty matching ontology_id: 'CL:0000037', 'CL:0000182'
[CellType(id='BxNjby0x', name='T cell', ontology_id='CL:0000084', synonyms='T-lymphocyte|T-cell|T lymphocyte', description='A Type Of Lymphocyte Whose Defining Characteristic Is The Expression Of A T Cell Receptor Complex.', updated_at=2023-09-26 15:21:01, bionty_source_id='7npi', created_by_id='DzTjkKse'),
CellType(id='m91LZBDZ', name='hematopoietic stem cell', ontology_id='CL:0000037', synonyms='blood forming stem cell|hemopoietic stem cell|HSC', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', bionty_source_id='7npi', created_by_id='DzTjkKse'),
CellType(id='J7hHC8SK', name='hepatocyte', ontology_id='CL:0000182', description='The Main Structural Component Of The Liver. They Are Specialized Epithelial Cells That Are Organized Into Interconnected Plates Called Lobules. Majority Of Cell Population Of Liver, Polygonal In Shape, Arranged In Plates Or Trabeculae Between Sinusoids; May Have Single Nucleus Or Binucleated.', bionty_source_id='7npi', created_by_id='DzTjkKse')]
If we’re happy with the cell_types records, we save them to the registry:
ln.save(cell_types)
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 also saving parents of CellType(id='J7hHC8SK', name='hepatocyte', ontology_id='CL:0000182', description='The Main Structural Component Of The Liver. They Are Specialized Epithelial Cells That Are Organized Into Interconnected Plates Called Lobules. Majority Of Cell Population Of Liver, Polygonal In Shape, Arranged In Plates Or Trabeculae Between Sinusoids; May Have Single Nucleus Or Binucleated.', updated_at=2023-09-26 15:21:15, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: 'CL:0002371'
✅ created 2 CellType records from Bionty matching ontology_id: 'CL:0000066', 'CL:0000417'
💡 also saving parents of CellType(id='AOy0Et6k', name='endopolyploid cell', ontology_id='CL:0000417', updated_at=2023-09-26 15:21:16, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000412'
💡 also saving parents of CellType(id='eqVIXZEb', name='polyploid cell', ontology_id='CL:0000412', description='A Cell Whose Nucleus, Or Nuclei, Each Contain More Than Two Haploid Genomes.', updated_at=2023-09-26 15:21:17, bionty_source_id='7npi', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='P6E7yrc7', name='epithelial cell', ontology_id='CL:0000066', synonyms='epitheliocyte', description='A Cell That Is Usually Found In A Two-Dimensional Sheet With A Free Surface. The Cell Has A Cytoskeleton That Allows For Tight Cell To Cell Contact And For Cell Polarity Where Apical Part Is Directed Towards The Lumen And The Basal Part To The Basal Lamina.', updated_at=2023-09-26 15:21:16, bionty_source_id='7npi', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='m91LZBDZ', name='hematopoietic stem cell', ontology_id='CL:0000037', synonyms='blood forming stem cell|hemopoietic stem cell|HSC', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', updated_at=2023-09-26 15:21:15, bionty_source_id='7npi', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: 'CL:0000988'
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0008001'
💡 also saving parents of CellType(id='0d3ym06W', name='hematopoietic precursor cell', ontology_id='CL:0008001', description='Any Hematopoietic Cell That Is A Precursor Of Some Other Hematopoietic Cell Type.', updated_at=2023-09-26 15:21:18, bionty_source_id='7npi', created_by_id='DzTjkKse')
Now, let’s inspect our in-house registry:
lb.CellType.filter().df()
id | name | ontology_id | abbr | synonyms | description | bionty_source_id | updated_at | created_by_id |
---|---|---|---|---|---|---|---|---|
BxNjby0x | T cell | CL:0000084 | None | T-lymphocyte|T-cell|T lymphocyte | A Type Of Lymphocyte Whose Defining Characteri... | 7npi | 2023-09-26 15:21:01 | DzTjkKse |
g8slxY8X | lymphocyte | CL:0000542 | None | None | A Lymphocyte Is A Leukocyte Commonly Found In ... | 7npi | 2023-09-26 15:21:02 | DzTjkKse |
MkrH0gsX | leukocyte | CL:0000738 | None | white blood cell|leucocyte | An Achromatic Cell Of The Myeloid Or Lymphoid ... | 7npi | 2023-09-26 15:21:03 | DzTjkKse |
Q0aQr5JB | hematopoietic cell | CL:0000988 | None | haematopoietic cell|hemopoietic cell|haemopoie... | A Cell Of A Hematopoietic Lineage. | 7npi | 2023-09-26 15:21:05 | DzTjkKse |
QMAH6IlS | somatic cell | CL:0002371 | None | None | A Cell Of An Organism That Does Not Pass On It... | 7npi | 2023-09-26 15:21:06 | DzTjkKse |
H0taCt24 | animal cell | CL:0000548 | None | metazoan cell | A Native Cell That Is Part Of Some Metazoa. | 7npi | 2023-09-26 15:21:06 | DzTjkKse |
VT73gpK2 | native cell | CL:0000003 | None | None | A Cell That Is Found In A Natural Setting, Whi... | 7npi | 2023-09-26 15:21:07 | DzTjkKse |
Ry0JGwSD | cell | CL:0000000 | None | None | A Material Entity Of Anatomical Origin (Part O... | 7npi | 2023-09-26 15:21:08 | DzTjkKse |
igNGxgJT | eukaryotic cell | CL:0000255 | None | None | None | 7npi | 2023-09-26 15:21:09 | DzTjkKse |
uUhIkxU7 | my T cell subtype | None | None | None | None | None | 2023-09-26 15:21:09 | DzTjkKse |
64kIG7So | gamma-delta T cell | CL:0000798 | None | gamma-delta T lymphocyte|gamma-delta T-lymphoc... | None | 7npi | 2023-09-26 15:21:11 | DzTjkKse |
J7hHC8SK | hepatocyte | CL:0000182 | None | None | The Main Structural Component Of The Liver. Th... | 7npi | 2023-09-26 15:21:15 | DzTjkKse |
m91LZBDZ | hematopoietic stem cell | CL:0000037 | None | blood forming stem cell|hemopoietic stem cell|HSC | A Stem Cell From Which All Cells Of The Lympho... | 7npi | 2023-09-26 15:21:15 | DzTjkKse |
AOy0Et6k | endopolyploid cell | CL:0000417 | None | None | None | 7npi | 2023-09-26 15:21:16 | DzTjkKse |
P6E7yrc7 | epithelial cell | CL:0000066 | None | epitheliocyte | A Cell That Is Usually Found In A Two-Dimensio... | 7npi | 2023-09-26 15:21:16 | DzTjkKse |
eqVIXZEb | polyploid cell | CL:0000412 | None | None | A Cell Whose Nucleus, Or Nuclei, Each Contain ... | 7npi | 2023-09-26 15:21:17 | DzTjkKse |
0d3ym06W | hematopoietic precursor cell | CL:0008001 | None | None | Any Hematopoietic Cell That Is A Precursor Of ... | 7npi | 2023-09-26 15:21:18 | DzTjkKse |
Access records in in-house ontologies#
Search:
lb.CellType.search("gamma delta T cell").head(2)
name | id | synonyms | __ratio__ |
---|---|---|---|
gamma-delta T cell | 64kIG7So | gamma-delta T lymphocyte|gamma-delta T-lymphoc... | 100.0 |
T cell | BxNjby0x | T-lymphocyte|T-cell|T lymphocyte | 90.0 |
Or look up with auto-complete:
cell_types = lb.CellType.lookup()
hsc_record = cell_types.hematopoietic_stem_cell
hsc_record
CellType(id='m91LZBDZ', name='hematopoietic stem cell', ontology_id='CL:0000037', synonyms='blood forming stem cell|hemopoietic stem cell|HSC', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', updated_at=2023-09-26 15:21:15, bionty_source_id='7npi', created_by_id='DzTjkKse')
Validate & standardize#
Simple validation of an iterable of values works like so:
lb.CellType.validate(["HSC", "blood forming stem cell"])
❗ 2 terms (100.00%) are not validated for name: HSC, blood forming stem cell
array([False, False])
Because these values are synonyms rather than the standardized names stored in the registry, they’re not validated!
You can easily convert these values to validated standardized names based on synonyms like so:
lb.CellType.standardize(["HSC", "blood forming stem cell"])
💡 standardized 2/2 terms
['hematopoietic stem cell', 'hematopoietic stem cell']
Alternatively, you can use .from_values(), which will only ever create validated records and automatically standardize under-the-hood:
lb.CellType.from_values(["HSC", "blood forming stem cell"])
✅ loaded 2 CellType records matching synonyms: 'HSC', 'blood forming stem cell'
[CellType(id='m91LZBDZ', name='hematopoietic stem cell', ontology_id='CL:0000037', synonyms='blood forming stem cell|hemopoietic stem cell|HSC', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', updated_at=2023-09-26 15:21:15, bionty_source_id='7npi', created_by_id='DzTjkKse')]
We can also add new synonyms to a record like so:
hsc_record.add_synonym("HSCs")
When we now encounter this synonym as a value, it is standardized via synonym lookup and mapped to the correct registry record:
lb.CellType.standardize(["HSCs"])
💡 standardized 1/1 terms
['hematopoietic stem cell']
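The difference between validating and standardizing can be sketched with a small dictionary. This is an illustrative toy and not the library's implementation: the `SYNONYMS` mapping and the three helper functions are assumptions, and leaving unknown values unchanged is a choice made for this sketch only.

```python
# Toy registry: standardized name -> pipe-separated synonyms, as in the records above.
SYNONYMS = {
    "hematopoietic stem cell": "blood forming stem cell|hemopoietic stem cell|HSC",
    "T cell": "T-lymphocyte|T-cell|T lymphocyte",
}


def validate(values) -> list:
    """True only for values that exactly equal a standardized name."""
    return [v in SYNONYMS for v in values]


def standardize(values) -> list:
    """Replace known synonyms by their standardized name; leave other values as-is."""
    by_synonym = {
        syn: name for name, syns in SYNONYMS.items() for syn in syns.split("|") if syn
    }
    return [by_synonym.get(v, v) for v in values]


def add_synonym(name: str, synonym: str) -> None:
    """Append a synonym to the pipe-separated string of an existing name."""
    existing = SYNONYMS[name].split("|") if SYNONYMS[name] else []
    if synonym not in existing:
        SYNONYMS[name] = "|".join(existing + [synonym])


raw = ["HSC", "blood forming stem cell", "HSCs"]
before = standardize(raw)  # "HSCs" is still unknown at this point
add_synonym("hematopoietic stem cell", "HSCs")
after = standardize(raw)   # now every value maps to the standardized name
print(validate(raw), before, after, validate(after), sep="\n")
```

Validation is strict and only accepts standardized names, whereas standardization is the step that turns synonyms into names that pass validation.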
A special synonym is .abbr (short for abbreviation), which has its own field and can be assigned via:
hsc_record.set_abbr("HSC")
You can create a lookup object from the .abbr field:
cell_types = lb.CellType.lookup("abbr")
hsc = cell_types.hsc
hsc
CellType(id='m91LZBDZ', name='hematopoietic stem cell', ontology_id='CL:0000037', abbr='HSC', synonyms='HSCs|blood forming stem cell|HSC|hemopoietic stem cell', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', updated_at=2023-09-26 15:21:19, bionty_source_id='7npi', created_by_id='DzTjkKse')
The same workflow works for all of lnschema_bionty’s registries.
Manage registries across species#
Most registries are species-aware, for instance, Gene:
lb.Gene.from_bionty(symbol="TCF7", species="human")
✅ created 1 Gene record from Bionty matching symbol: 'TCF7'
Gene(id='0StEa7eEhivb', symbol='TCF7', ensembl_gene_id='ENSG00000081059', ncbi_gene_ids='6932', biotype='protein_coding', description='transcription factor 7 [Source:HGNC Symbol;Acc:HGNC:11639]', synonyms='TCF-1', species_id='uHJU', bionty_source_id='LIw3', created_by_id='DzTjkKse')
Similarly, API calls that interact with multi-species registries accept a species argument, e.g.:
lb.Gene.validate(["TCF7", "ABC1"], species="human")
❗ 2 terms (100.00%) are not validated for symbol: TCF7, ABC1
array([False, False])
You can also pass species for validating features upon registering data, e.g., in ln.File.from_anndata(..., field=lb.Gene.ensembl_gene_id, species=...).
And when working with the same species throughout your analysis/workflow, you can omit the species argument by configuring it globally:
lb.settings.species = "mouse"
lb.Gene.from_bionty(symbol="Ap5b1")
✅ created 1 Gene record from Bionty matching symbol: 'Ap5b1'
Gene(id='B6eFOVlpj5om', symbol='Ap5b1', ensembl_gene_id='ENSMUSG00000049562', ncbi_gene_ids='381201', biotype='protein_coding', description='adaptor-related protein complex 5, beta 1 subunit [Source:MGI Symbol;Acc:MGI:2685808]', synonyms='Gm962', species_id='vado', bionty_source_id='YoR2', created_by_id='DzTjkKse')
Track underlying ontology versions#
Under-the-hood, source ontology versions are automatically tracked:
lb.BiontySource.filter(currently_used=True).df()
id | entity | species | currently_used | source | source_name | version | url | md5 | source_website | updated_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|---|
kuTH | Species | vertebrates | True | ensembl | Ensembl | release-110 | https://ftp.ensembl.org/pub/release-110/specie... | f3faf95648d3a2b50fd3625456739706 | https://www.ensembl.org | 2023-09-26 15:20:54 | DzTjkKse |
dkvi | Species | bacteria | True | ensembl | Ensembl | release-57 | https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacte... | ee28510ed5586ea7ab4495717c96efc8 | https://www.ensembl.org | 2023-09-26 15:20:54 | DzTjkKse |
4T1v | Species | fungi | True | ensembl | Ensembl | release-57 | http://ftp.ensemblgenomes.org/pub/fungi/releas... | dbcde58f4396ab8b2480f7fe9f83df8a | https://www.ensembl.org | 2023-09-26 15:20:54 | DzTjkKse |
2WAU | Species | metazoa | True | ensembl | Ensembl | release-57 | http://ftp.ensemblgenomes.org/pub/metazoa/rele... | 424636a574fec078a61cbdddb05f9132 | https://www.ensembl.org | 2023-09-26 15:20:54 | DzTjkKse |
cDTn | Species | plants | True | ensembl | Ensembl | release-57 | https://ftp.ensemblgenomes.ebi.ac.uk/pub/plant... | eadaa1f3e527e4c3940c90c7fa5c8bf4 | https://www.ensembl.org | 2023-09-26 15:20:54 | DzTjkKse |
LIw3 | Gene | human | True | ensembl | Ensembl | release-110 | s3://bionty-assets/df_human__ensembl__release-... | 832f3947e83664588d419608a469b528 | https://www.ensembl.org | 2023-09-26 15:20:54 | DzTjkKse |
YoR2 | Gene | mouse | True | ensembl | Ensembl | release-110 | s3://bionty-assets/df_mouse__ensembl__release-... | fa4ce130f2929aefd7ac3bc8eaf0c4de | https://www.ensembl.org | 2023-09-26 15:20:54 | DzTjkKse |
tep3 | Gene | saccharomyces cerevisiae | True | ensembl | Ensembl | release-110 | s3://bionty-assets/df_saccharomyces cerevisiae... | 2e59495a3e87ea6575e408697dd73459 | https://www.ensembl.org | 2023-09-26 15:20:54 | DzTjkKse |
NSV4 | Protein | human | True | uniprot | Uniprot | 2023-03 | s3://bionty-assets/df_human__uniprot__2023-03_... | 1c46e85c6faf5eff3de5b4e1e4edc4d3 | https://www.uniprot.org | 2023-09-26 15:20:54 | DzTjkKse |
NkNa | Protein | mouse | True | uniprot | Uniprot | 2023-03 | s3://bionty-assets/df_mouse__uniprot__2023-03_... | 9d5e9a8225011d3218e10f9bbb96a46c | https://www.uniprot.org | 2023-09-26 15:20:54 | DzTjkKse |
TbT9 | CellMarker | human | True | cellmarker | CellMarker | 2.0 | s3://bionty-assets/human_cellmarker_2.0_CellMa... | d565d4a542a5c7e7a06255975358e4f4 | http://bio-bigdata.hrbmu.edu.cn/CellMarker | 2023-09-26 15:20:54 | DzTjkKse |
Fnl7 | CellMarker | mouse | True | cellmarker | CellMarker | 2.0 | s3://bionty-assets/mouse_cellmarker_2.0_CellMa... | 189586732c63be949e40dfa6a3636105 | http://bio-bigdata.hrbmu.edu.cn/CellMarker | 2023-09-26 15:20:54 | DzTjkKse |
MLRc | CellLine | all | True | clo | Cell Line Ontology | 2022-03-21 | https://data.bioontology.org/ontologies/CLO/su... | ea58a1010b7e745702a8397a526b3a33 | https://bioportal.bioontology.org/ontologies/CLO | 2023-09-26 15:20:54 | DzTjkKse |
7npi | CellType | all | True | cl | Cell Ontology | 2023-04-20 | http://purl.obolibrary.org/obo/cl/releases/202... | 58cdc1545f0d35e6fce76a65331b00fb | https://obophenotype.github.io/cell-ontology | 2023-09-26 15:20:54 | DzTjkKse |
Xcgu | Tissue | all | True | uberon | Uberon multi-species anatomy ontology | 2023-04-19 | http://purl.obolibrary.org/obo/uberon/releases... | 5611dd1375d5a95ac7d7de8e25e6016f | http://obophenotype.github.io/uberon | 2023-09-26 15:20:54 | DzTjkKse |
DXsf | Disease | all | True | mondo | Mondo Disease Ontology | 2023-04-04 | http://purl.obolibrary.org/obo/mondo/releases/... | 700c43dd9ba51aecc7a8edfc3bc2dab1 | https://mondo.monarchinitiative.org | 2023-09-26 15:20:54 | DzTjkKse |
qmRy | Disease | human | True | doid | Human Disease Ontology | 2023-03-31 | http://purl.obolibrary.org/obo/doid/releases/2... | 64f083a1e47867c307c8eae308afc3bb | https://disease-ontology.org | 2023-09-26 15:20:54 | DzTjkKse |
oGI0 | ExperimentalFactor | all | True | efo | The Experimental Factor Ontology | 3.48.0 | http://www.ebi.ac.uk/efo/releases/v3.48.0/efo.owl | 3367e9a9ae3dee9113024e5108c49091 | https://bioportal.bioontology.org/ontologies/EFO | 2023-09-26 15:20:54 | DzTjkKse |
22jy | Phenotype | human | True | hp | Human Phenotype Ontology | 2023-06-17 | https://github.com/obophenotype/human-phenotyp... | 65e8d96bc81deb893163927063b10c06 | https://hpo.jax.org | 2023-09-26 15:20:54 | DzTjkKse |
Flv4 | Phenotype | mammalian | True | mp | Mammalian Phenotype Ontology | 2023-05-31 | https://github.com/mgijax/mammalian-phenotype-... | be89052cf6d9c0b6197038fe347ef293 | https://github.com/mgijax/mammalian-phenotype-... | 2023-09-26 15:20:54 | DzTjkKse |
cVc8 | Phenotype | zebrafish | True | zp | Zebrafish Phenotype Ontology | 2022-12-17 | https://github.com/obophenotype/zebrafish-phen... | 03430b567bf153216c0fa4c3440b3b24 | https://github.com/obophenotype/zebrafish-phen... | 2023-09-26 15:20:54 | DzTjkKse |
J3tT | Phenotype | all | True | pato | Phenotype And Trait Ontology | 2023-05-18 | http://purl.obolibrary.org/obo/pato/releases/2... | bd472f4971492109493d4ad8a779a8dd | https://github.com/pato-ontology/pato | 2023-09-26 15:20:54 | DzTjkKse |
VPpX | Pathway | all | True | go | Gene Ontology | 2023-05-10 | https://data.bioontology.org/ontologies/GO/sub... | e9845499eadaef2418f464cd7e9ac92e | http://geneontology.org | 2023-09-26 15:20:54 | DzTjkKse |
gXBr | BFXPipeline | all | True | lamin | Bioinformatics Pipeline | 1.0.0 | s3://bionty-assets/bfxpipelines.json | a7eff57a256994692fba46e0199ffc94 | https://lamin.ai | 2023-09-26 15:20:54 | DzTjkKse |
Ry7e | Drug | all | True | dron | Drug Ontology | 2023-03-10 | https://data.bioontology.org/ontologies/DRON/s... | 75e86011158fae76bb46d96662a33ba3 | https://bioportal.bioontology.org/ontologies/DRON | 2023-09-26 15:20:54 | DzTjkKse |
utju | DevelopmentalStage | human | True | hsapdv | Human Developmental Stages | 2020-03-10 | http://purl.obolibrary.org/obo/hsapdv.owl | 0423f338c50161880df4d5d1523d24ed | https://github.com/obophenotype/developmental-... | 2023-09-26 15:20:54 | DzTjkKse |
tTTN | DevelopmentalStage | mouse | True | mmusdv | Mouse Developmental Stages | 2020-03-10 | http://purl.obolibrary.org/obo/mmusdv.owl | 6342b59cf3082b10c54f90a8c3336b72 | https://github.com/obophenotype/developmental-... | 2023-09-26 15:20:54 | DzTjkKse |
1sSG | Ethnicity | human | True | hancestro | Human Ancestry Ontology | 2023-07-313.0 | http://purl.obolibrary.org/obo/hancestro.owl | af731447e95b4ca341a91b018edd4885 | https://github.com/EBISPOT/hancestro | 2023-09-26 15:20:54 | DzTjkKse |
Each record is linked to a versioned bionty source (if it was created from bionty):
hepatocyte = lb.CellType.filter(name="hepatocyte").one()
hepatocyte.bionty_source
BiontySource(id='7npi', entity='CellType', species='all', currently_used=True, source='cl', source_name='Cell Ontology', version='2023-04-20', url='http://purl.obolibrary.org/obo/cl/releases/2023-04-20/cl-base.owl', md5='58cdc1545f0d35e6fce76a65331b00fb', source_website='https://obophenotype.github.io/cell-ontology', updated_at=2023-09-26 15:20:54, created_by_id='DzTjkKse')
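Since registries are SQL tables, the link between a record and its versioned source can be pictured as a foreign key. The sketch below uses Python's built-in `sqlite3` with a deliberately simplified, hypothetical schema; the table and column names are modeled on the outputs above but are not LaminDB's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE bionty_source (
        id TEXT PRIMARY KEY,
        entity TEXT NOT NULL,
        source TEXT NOT NULL,
        version TEXT NOT NULL,
        currently_used INTEGER NOT NULL
    );
    CREATE TABLE cell_type (
        id TEXT PRIMARY KEY,
        name TEXT NOT NULL UNIQUE,
        ontology_id TEXT,
        bionty_source_id TEXT REFERENCES bionty_source (id)
    );
    """
)
con.execute(
    "INSERT INTO bionty_source VALUES ('7npi', 'CellType', 'cl', '2023-04-20', 1)"
)
con.executemany(
    "INSERT INTO cell_type VALUES (?, ?, ?, ?)",
    [
        ("J7hHC8SK", "hepatocyte", "CL:0000182", "7npi"),
        ("uUhIkxU7", "my T cell subtype", None, None),  # in-house term, no public source
    ],
)

# Join each cell type to the ontology version it was created from, if any.
rows = con.execute(
    """
    SELECT c.name, s.source, s.version
    FROM cell_type AS c
    LEFT JOIN bionty_source AS s ON s.id = c.bionty_source_id
    ORDER BY c.name
    """
).fetchall()
for row in rows:
    print(row)
```

A record created from a public ontology carries the id of its source row, while an in-house record such as "my T cell subtype" has no source.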
Create records from specific public ontologies#
By default, records are created from the "currently_used" Bionty sources, which are configured during instance initialization, e.g.:
lb.Phenotype.bionty()
Phenotype
Species: human
Source: hp, 2023-06-17
#terms: 17653
📖 Phenotype.df(): ontology reference table
🔎 Phenotype.lookup(): autocompletion of terms
🎯 Phenotype.search(): free text search of terms
✅ Phenotype.validate(): strictly validate values
🧐 Phenotype.inspect(): full inspection of values
👽 Phenotype.standardize(): convert to standardized names
🪜 Phenotype.diff(): difference between two versions
🔗 Phenotype.ontology: Pronto.Ontology object
Sometimes, the default source doesn’t contain the ontology term you are looking for.
In that case, you can create a record from a non-default source:
bionty_source = lb.BiontySource.filter(entity="Phenotype", source="pato").one()
age = lb.Phenotype.from_bionty(name="age", bionty_source=bionty_source)
age
✅ created 1 Phenotype record from Bionty matching name: 'age'
Phenotype(id='he7RPQN5', name='age', ontology_id='PATO:0000011', description='A Time Quality Inhering In A Bearer By Virtue Of How Long The Bearer Has Existed.', bionty_source_id='J3tT', created_by_id='DzTjkKse')
age.bionty_source
BiontySource(id='J3tT', entity='Phenotype', species='all', currently_used=True, source='pato', source_name='Phenotype And Trait Ontology', version='2023-05-18', url='http://purl.obolibrary.org/obo/pato/releases/2023-05-18/pato.owl', md5='bd472f4971492109493d4ad8a779a8dd', source_website='https://github.com/pato-ontology/pato', updated_at=2023-09-26 15:20:54, created_by_id='DzTjkKse')
Analogously, you can pass bionty_source to bulk-create records from a non-default source:
records = lb.Phenotype.from_values(["age", "life span"], bionty_source=bionty_source)
records
✅ created 2 Phenotype records from Bionty matching name: 'age', 'life span'
[Phenotype(id='he7RPQN5', name='age', ontology_id='PATO:0000011', description='A Time Quality Inhering In A Bearer By Virtue Of How Long The Bearer Has Existed.', bionty_source_id='J3tT', created_by_id='DzTjkKse'),
Phenotype(id='rEuNj69a', name='life span', ontology_id='PATO:0000050', description='A Time Quality Inhering In A Bearer By Virtue Of The Bearer'S Expected Maximum Age.', bionty_source_id='J3tT', created_by_id='DzTjkKse')]
!lamin delete --force test-registries
!rm -r test-registries
💡 deleting instance testuser1/test-registries
✅ deleted instance settings file: /home/runner/.lamin/instance--testuser1--test-registries.env
✅ instance cache deleted
✅ deleted '.lndb' sqlite file
❗ consider manually deleting your stored data: /home/runner/work/lamindb/lamindb/docs/test-registries