Annotate an h5ad file based on CELLxGENE schema#

This guide shows how to validate and curate an AnnData object using the metadata registries of laminlabs/cellxgene, based on the CELLxGENE schema version 5.1.0.

The validated object can be subsequently registered as an artifact in your LaminDB instance.

Note

The Annotate class is primarily designed to validate that all metadata adheres to the ontologies. It does not reimplement all rules of the CELLxGENE schema, so we recommend running cellxgene-schema if full adherence beyond metadata is required.

Set up#

Initialize (or load) the LaminDB instance in which the annotated AnnData will be registered:

!lamin init --storage ./test-cellxgene-annotate --schema bionty
💡 connected lamindb: testuser1/test-cellxgene-annotate
import lamindb as ln
import lnschema_bionty as lb
from cellxgene_lamin import Annotate, datasets, CellxGeneFields

ln.settings.verbosity = "hint"
lb.settings.organism = "human"
💡 connected lamindb: testuser1/test-cellxgene-annotate
❗ Full backed capabilities are not available for this version of anndata, please install anndata>=0.9.1.

An h5ad file#

Let's start with an AnnData object that we'd like to inspect and curate:

adata = datasets.anndata_human_immune_cells(populate_registries=True)
adata
AnnData object with n_obs × n_vars = 1626 × 36503
    obs: 'donor', 'tissue', 'cell_type', 'assay', 'sex_ontology_term_id'
    var: 'feature_is_filtered'
    uns: 'default_embedding'
    obsm: 'X_umap'
adata.write_h5ad("anndata_human_immune_cells.h5ad")
!cellxgene-schema validate anndata_human_immune_cells.h5ad
Loading dependencies
Loading validator modules
Starting validation...
WARNING: Validation of raw layer was not performed due to current errors, try again after fixing current errors.
ERROR: Add labels error: Column 'cell_type' is a reserved column name of 'obs'. Remove it from h5ad and try again.
ERROR: Add labels error: Column 'assay' is a reserved column name of 'obs'. Remove it from h5ad and try again.
ERROR: Add labels error: Column 'tissue' is a reserved column name of 'obs'. Remove it from h5ad and try again.
ERROR: 'title' in 'uns' is not present.
ERROR: 'ENSG00000269933' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000261737' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000259834' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000256374' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000263464' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000203812' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272196' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272880' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000270188' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000287116' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000237133' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000224739' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000227902' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000239467' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272551' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000280374' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000236886' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000229352' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000286601' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000227021' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000259855' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000273301' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000271870' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000237838' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000286996' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000269028' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000286699' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000273370' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000261490' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272567' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000270394' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272370' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272354' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000251044' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272040' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000182230' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000204092' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000261068' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000236740' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000236996' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000232295' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000271734' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000236673' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000227220' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000236166' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000112096' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000285162' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000286228' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000237513' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000285106' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000226380' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000270672' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000225932' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000244693' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000268955' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272267' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000253878' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000259820' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000226403' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000233776' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000269900' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000261534' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000237548' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000239665' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000256892' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000249860' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000271409' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000224745' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000261438' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000231575' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000260461' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000255823' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000254740' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000254561' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000282080' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000256427' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000287388' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000276814' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000280710' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000215271' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000258414' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000258808' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000277050' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000273888' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000258861' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000259444' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000244952' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000273923' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000262668' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000232196' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000256618' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000221995' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000226377' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000273576' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000267637' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000282965' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000273837' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000286949' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000256222' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000280095' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000278927' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000278955' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000277352' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000239446' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000256045' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000228906' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000228139' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000261773' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000278198' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000273496' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000277666' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000278782' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000277761' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000269933' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000261737' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000259834' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000256374' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000263464' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000203812' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272196' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272880' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000270188' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000287116' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000237133' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000224739' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000227902' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000239467' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272551' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000280374' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000236886' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000229352' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000286601' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000227021' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000259855' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000273301' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000271870' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000237838' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000286996' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000269028' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000286699' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000273370' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000261490' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272567' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000270394' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272370' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272354' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000251044' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272040' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000182230' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000204092' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000261068' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000236740' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000236996' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000232295' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000271734' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000236673' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000227220' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000236166' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000112096' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000285162' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000286228' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000237513' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000285106' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000226380' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000270672' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000225932' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000244693' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000268955' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272267' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000253878' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000259820' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000226403' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000233776' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000269900' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000261534' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000237548' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000239665' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000256892' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000249860' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000271409' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000224745' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000261438' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000231575' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000260461' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000255823' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000254740' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000254561' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000282080' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000256427' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000287388' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000276814' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000280710' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000215271' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000258414' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000258808' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000277050' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000273888' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000258861' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000259444' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000244952' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000273923' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000262668' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000232196' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000256618' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000221995' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000226377' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000273576' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000267637' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000282965' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000273837' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000286949' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000256222' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000280095' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000278927' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000278955' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000277352' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000239446' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000256045' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000228906' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000228139' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000261773' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000278198' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000273496' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000277666' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000278782' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000277761' is not a valid feature ID in 'raw.var'.
ERROR: Dataframe 'obs' is missing column 'cell_type_ontology_term_id'.
ERROR: Dataframe 'obs' is missing column 'assay_ontology_term_id'.
ERROR: Dataframe 'obs' is missing column 'disease_ontology_term_id'.
ERROR: Dataframe 'obs' is missing column 'organism_ontology_term_id'.
ERROR: Dataframe 'obs' is missing column 'tissue_ontology_term_id'.
ERROR: Dataframe 'obs' is missing column 'self_reported_ethnicity_ontology_term_id'.
ERROR: Dataframe 'obs' is missing column 'development_stage_ontology_term_id'.
ERROR: Dataframe 'obs' is missing column 'is_primary_data'.
ERROR: Dataframe 'obs' is missing column 'donor_id'.
ERROR: Dataframe 'obs' is missing column 'suspension_type'.
ERROR: Dataframe 'obs' is missing column 'tissue_type'.
Validation complete in 0:00:00.323915 with status is_valid=False
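
The validator output above is long but falls into only a few kinds of error. As a minimal sketch (not part of cellxgene-schema or cellxgene_lamin), the hypothetical helper `summarize_errors` below groups `ERROR:` lines by message template, replacing every quoted value with a placeholder. `SAMPLE_OUTPUT` holds a handful of lines copied from the output above.

```python
import re
from collections import Counter

# A few lines copied from the validator output above.
SAMPLE_OUTPUT = """\
Starting validation...
ERROR: Add labels error: Column 'cell_type' is a reserved column name of 'obs'. Remove it from h5ad and try again.
ERROR: 'title' in 'uns' is not present.
ERROR: 'ENSG00000269933' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000261737' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000269933' is not a valid feature ID in 'raw.var'.
ERROR: Dataframe 'obs' is missing column 'donor_id'.
ERROR: Dataframe 'obs' is missing column 'tissue_type'.
"""


def summarize_errors(output: str) -> Counter:
    """Count ERROR lines by message template (hypothetical helper)."""
    counts = Counter()
    for line in output.splitlines():
        if not line.startswith("ERROR: "):
            continue  # skip progress and warning lines
        message = line[len("ERROR: "):]
        # Replace each quoted value with a placeholder so similar errors group together.
        template = re.sub(r"'[^']*'", "<x>", message)
        counts[template] += 1
    return counts


summary = summarize_errors(SAMPLE_OUTPUT)
for template, count in summary.most_common():
    print(count, template)
```

Grouped this way, the output reduces to four kinds of problem: reserved column names, a missing `title` in `uns`, invalid feature IDs, and missing `obs` columns.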

Validate and curate metadata#

Validate the AnnData object:

try:
    annotate = Annotate(adata)
except Exception as e:
    print(e)
Columns {'organism', 'donor_id', 'suspension_type', 'tissue_type', 'disease', 'self_reported_ethnicity', 'development_stage'} are not found in the data object!
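
The message above can be read as a set difference between the columns the annotation flow expects and the columns present in `adata.obs`. The sketch below reproduces it with plain Python sets. Treating the keys of `annotate.categoricals` (shown further down in this guide) as the expected set is an assumption made for illustration, and `missing_columns` is a hypothetical helper, not a cellxgene_lamin function.

```python
# Assumption: the expected columns are the keys of `annotate.categoricals`
# as printed later in this guide.
EXPECTED_COLUMNS = {
    "assay", "cell_type", "development_stage", "disease", "donor_id",
    "self_reported_ethnicity", "sex_ontology_term_id", "suspension_type",
    "tissue", "tissue_type", "organism",
}

# Columns of adata.obs as shown in the AnnData summary above.
obs_columns = ["donor", "tissue", "cell_type", "assay", "sex_ontology_term_id"]


def missing_columns(present, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent (hypothetical helper)."""
    return sorted(set(expected) - set(present))


missing_before = missing_columns(obs_columns)

# Renaming "donor" to "donor_id" removes one entry from the missing list.
renamed = ["donor_id" if col == "donor" else col for col in obs_columns]
missing_after = missing_columns(renamed)
print(missing_before)
print(missing_after)
```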

Let's fix the "donor_id" column name:

adata.obs.rename(columns={"donor": "donor_id"}, inplace=True)

For the missing columns, we can pass the default values suggested by CELLxGENE:

CellxGeneFields.OBS_FIELD_DEFAULTS
{'disease': 'normal',
 'development_stage': 'unknown',
 'self_reported_ethnicity': 'unknown',
 'suspension_type': 'cell',
 'donor_id': 'na',
 'tissue_type': 'tissue',
 'cell_type': 'native_cell',
 'sex': 'unknown'}
annotate = Annotate(adata, organism="human", **CellxGeneFields.OBS_FIELD_DEFAULTS)
💡 added defaults to the AnnData object: {'organism': 'human', 'disease': 'normal', 'development_stage': 'unknown', 'self_reported_ethnicity': 'unknown', 'suspension_type': 'cell', 'tissue_type': 'tissue'}
✅ added 1 record with Feature.name for columns: ['sex_ontology_term_id']
✅ added 10 records from laminlabs/cellxgene with Feature.name for columns: ['assay', 'cell_type', 'development_stage', 'disease', 'donor_id', 'self_reported_ethnicity', 'tissue', 'organism', 'tissue_type', 'suspension_type']
✅ added 123 records from laminlabs/cellxgene with Gene.ensembl_gene_id for var_index: ['ENSG00000112096', 'ENSG00000182230', 'ENSG00000203812', 'ENSG00000204092', 'ENSG00000215271', 'ENSG00000221995', 'ENSG00000224739', 'ENSG00000224745', 'ENSG00000225932', 'ENSG00000226377', 'ENSG00000226380', 'ENSG00000226403', 'ENSG00000227021', 'ENSG00000227220', 'ENSG00000227902', 'ENSG00000228139', 'ENSG00000228906', 'ENSG00000229352', 'ENSG00000231575', 'ENSG00000232196', 'ENSG00000232295', 'ENSG00000233776', 'ENSG00000236166', 'ENSG00000236673', 'ENSG00000236740', 'ENSG00000236886', 'ENSG00000236996', 'ENSG00000237133', 'ENSG00000237513', 'ENSG00000237548', 'ENSG00000237838', 'ENSG00000239446', 'ENSG00000239467', 'ENSG00000239665', 'ENSG00000244693', 'ENSG00000244952', 'ENSG00000249860', 'ENSG00000251044', 'ENSG00000253878', 'ENSG00000254561', 'ENSG00000254740', 'ENSG00000255823', 'ENSG00000256045', 'ENSG00000256222', 'ENSG00000256374', 'ENSG00000256427', 'ENSG00000256618', 'ENSG00000256892', 'ENSG00000258414', 'ENSG00000258808', 'ENSG00000258861', 'ENSG00000259444', 'ENSG00000259820', 'ENSG00000259834', 'ENSG00000259855', 'ENSG00000260461', 'ENSG00000261068', 'ENSG00000261438', 'ENSG00000261490', 'ENSG00000261534', 'ENSG00000261737', 'ENSG00000261773', 'ENSG00000262668', 'ENSG00000263464', 'ENSG00000267637', 'ENSG00000268955', 'ENSG00000269028', 'ENSG00000269900', 'ENSG00000269933', 'ENSG00000270188', 'ENSG00000270394', 'ENSG00000270672', 'ENSG00000271409', 'ENSG00000271734', 'ENSG00000271870', 'ENSG00000272040', 'ENSG00000272196', 'ENSG00000272267', 'ENSG00000272354', 'ENSG00000272370', 'ENSG00000272551', 'ENSG00000272567', 'ENSG00000272880', 'ENSG00000273301', 'ENSG00000273370', 'ENSG00000273496', 'ENSG00000273554', 'ENSG00000273576', 'ENSG00000273837', 'ENSG00000273888', 'ENSG00000273923', 'ENSG00000274175', 'ENSG00000274792', 'ENSG00000275249', 'ENSG00000275869', 'ENSG00000276017', 'ENSG00000276814', 'ENSG00000277050', 'ENSG00000277196', 'ENSG00000277352', 
'ENSG00000277666', 'ENSG00000277761', 'ENSG00000277836', 'ENSG00000278198', 'ENSG00000278633', 'ENSG00000278782', 'ENSG00000278817', 'ENSG00000278927', 'ENSG00000278955', 'ENSG00000280095', 'ENSG00000280374', 'ENSG00000280710', 'ENSG00000282080', 'ENSG00000282965', 'ENSG00000285106', 'ENSG00000285162', 'ENSG00000286228', 'ENSG00000286601', 'ENSG00000286699', 'ENSG00000286949', 'ENSG00000286996', 'ENSG00000287116', 'ENSG00000287388']
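
Only six of the passed defaults appear in the "added defaults" log line above. One reading, sketched below as an assumption rather than the actual cellxgene_lamin logic, is that a default is applied only when its column is expected and not already present. `defaults_to_add` is a hypothetical helper, and the expected column set is taken from the keys of `annotate.categoricals`.

```python
# Defaults as printed by CellxGeneFields.OBS_FIELD_DEFAULTS above.
OBS_FIELD_DEFAULTS = {
    "disease": "normal",
    "development_stage": "unknown",
    "self_reported_ethnicity": "unknown",
    "suspension_type": "cell",
    "donor_id": "na",
    "tissue_type": "tissue",
    "cell_type": "native_cell",
    "sex": "unknown",
}

# Assumption: expected columns are the keys of `annotate.categoricals`.
EXPECTED_COLUMNS = {
    "assay", "cell_type", "development_stage", "disease", "donor_id",
    "self_reported_ethnicity", "sex_ontology_term_id", "suspension_type",
    "tissue", "tissue_type", "organism",
}

# obs columns after renaming "donor" to "donor_id".
present_columns = ["donor_id", "tissue", "cell_type", "assay", "sex_ontology_term_id"]


def defaults_to_add(present, expected, defaults):
    """Keep defaults whose column is expected but not yet present (hypothetical)."""
    return {
        column: value
        for column, value in defaults.items()
        if column in expected and column not in present
    }


added = defaults_to_add(
    present_columns,
    EXPECTED_COLUMNS,
    {"organism": "human", **OBS_FIELD_DEFAULTS},
)
print(added)
```

Under this reading, `donor_id` and `cell_type` keep their existing values because those columns already exist, and `sex` is skipped because the object carries `sex_ontology_term_id` instead.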
annotate.categoricals
{'assay': FieldAttr(ExperimentalFactor.name),
 'cell_type': FieldAttr(CellType.name),
 'development_stage': FieldAttr(DevelopmentalStage.name),
 'disease': FieldAttr(Disease.name),
 'donor_id': FieldAttr(ULabel.name),
 'self_reported_ethnicity': FieldAttr(Ethnicity.name),
 'sex_ontology_term_id': FieldAttr(Phenotype.ontology_id),
 'suspension_type': FieldAttr(ULabel.name),
 'tissue': FieldAttr(Tissue.name),
 'tissue_type': FieldAttr(ULabel.name),
 'organism': FieldAttr(Organism.name)}
validated = annotate.validate()
💡 validating metadata using registries of instance laminlabs/cellxgene
✅ var_index is validated against Gene.ensembl_gene_id
💡 mapping assay on ExperimentalFactor.name
❗    found 3 terms validated terms: ["10x 3' v3", "10x 5' v2", "10x 5' v1"]
      → save terms via .add_validated_from('assay')
✅ assay is validated against ExperimentalFactor.name
✅ cell_type is validated against CellType.name
💡 mapping development_stage on DevelopmentalStage.name
❗    found 1 terms validated terms: ['unknown']
      → save terms via .add_validated_from('development_stage')
✅ development_stage is validated against DevelopmentalStage.name
💡 mapping disease on Disease.name
❗    found 1 terms validated terms: ['normal']
      → save terms via .add_validated_from('disease')
✅ disease is validated against Disease.name
💡 mapping donor_id on ULabel.name
❗    12 terms are not validated: '640C-1', 'A52-1', 'A37-1', 'A31-1', '637C-1', 'A35-1', '582C-1', '621B-1', 'D496-1', 'D503-1', 'A36-1', 'A29-1'
      → save terms via .add_new_from('donor_id')
💡 mapping self_reported_ethnicity on Ethnicity.name
❗    found 1 terms validated terms: ['unknown']
      → save terms via .add_validated_from('self_reported_ethnicity')
✅ self_reported_ethnicity is validated against Ethnicity.name
💡 mapping sex_ontology_term_id on Phenotype.ontology_id
❗    found 1 terms validated terms: ['PATO:0000384']
      → save terms via .add_validated_from('sex_ontology_term_id')
✅ sex_ontology_term_id is validated against Phenotype.ontology_id
💡 mapping suspension_type on ULabel.name
❗    found 1 terms validated terms: ['cell']
      → save terms via .add_validated_from('suspension_type')
✅ suspension_type is validated against ULabel.name
💡 mapping tissue on Tissue.name
❗    found 16 terms validated terms: ['blood', 'thoracic lymph node', 'spleen', 'mesenteric lymph node', 'lamina propria', 'liver', 'jejunal epithelium', 'omentum', 'bone marrow', 'ileum', 'caecum', 'thymus', 'skeletal muscle tissue', 'duodenum', 'sigmoid colon', 'transverse colon']
      → save terms via .add_validated_from('tissue')
❗    1 terms is not validated: 'lungg'
      → save terms via .add_new_from('tissue')
💡 mapping tissue_type on ULabel.name
❗    found 1 terms validated terms: ['tissue']
      → save terms via .add_validated_from('tissue_type')
✅ tissue_type is validated against ULabel.name
✅ organism is validated against Organism.name
validated
False
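
Conceptually, the validation step checks each column's unique values against the names known to the mapped registry, and the overall result is `False` as soon as one column has unknown terms. The following is a simplified sketch of that idea under stated assumptions: `split_terms` is a hypothetical helper and `tissue_registry` is an illustrative subset of names, not the contents of the real Tissue registry.

```python
def split_terms(values, registry_names):
    """Return (validated, not_validated) unique terms in first-seen order."""
    unique_values = list(dict.fromkeys(values))  # de-duplicate, keep order
    validated = [v for v in unique_values if v in registry_names]
    not_validated = [v for v in unique_values if v not in registry_names]
    return validated, not_validated


# Illustrative subset of tissue names; the real registry is far larger.
tissue_registry = {"blood", "spleen", "liver", "lung", "thymus"}

# A toy obs["tissue"] column containing the typo from this guide.
observed = ["blood", "lungg", "spleen", "blood", "liver"]

validated_terms, unvalidated_terms = split_terms(observed, tissue_registry)
is_valid = not unvalidated_terms
print(validated_terms, unvalidated_terms, is_valid)
```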

Register new metadata labels#

Follow the suggestions above to register genes and labels that aren't yet present in the current instance.

(Note that our instance is rather empty. Once you have filled up the registries, registering new labels will rarely be needed.)

annotate.add_validated_from("all")
💡 saving labels for 'assay'
✅ added 3 records from laminlabs/cellxgene with ExperimentalFactor.name for assay: ["10x 5' v1", "10x 5' v2", "10x 3' v3"]
💡 saving labels for 'cell_type'
💡 saving labels for 'development_stage'
✅ added 1 record from laminlabs/cellxgene with DevelopmentalStage.name for development_stage: ['unknown']
💡 saving labels for 'disease'
✅ added 1 record from laminlabs/cellxgene with Disease.name for disease: ['normal']
💡 saving labels for 'donor_id'
❗ 12 non-validated categories are not saved in ULabel.name: ['D496-1', '621B-1', 'A29-1', 'A36-1', 'A35-1', '637C-1', 'A52-1', 'A37-1', 'D503-1', '640C-1', 'A31-1', '582C-1']!
      → to lookup categories, use lookup().donor_id
      → to save, run .add_new_from('donor_id')
💡 saving labels for 'self_reported_ethnicity'
✅ added 1 record from laminlabs/cellxgene with Ethnicity.name for self_reported_ethnicity: ['unknown']
💡 saving labels for 'sex_ontology_term_id'
✅ added 1 record from laminlabs/cellxgene with Phenotype.ontology_id for sex_ontology_term_id: ['PATO:0000384']
💡 saving labels for 'suspension_type'
✅ added 1 record from laminlabs/cellxgene with ULabel.name for suspension_type: ['cell']
💡 saving labels for 'tissue'
❗ 1 non-validated categories are not saved in Tissue.name: ['lungg']!
      → to lookup categories, use lookup().tissue
      → to save, run .add_new_from('tissue')
✅ added 16 records from laminlabs/cellxgene with Tissue.name for tissue: ['spleen', 'sigmoid colon', 'jejunal epithelium', 'bone marrow', 'skeletal muscle tissue', 'transverse colon', 'thymus', 'liver', 'duodenum', 'lamina propria', 'mesenteric lymph node', 'caecum', 'omentum', 'blood', 'ileum', 'thoracic lymph node']
💡 saving labels for 'tissue_type'
✅ added 1 record from laminlabs/cellxgene with ULabel.name for tissue_type: ['tissue']
💡 saving labels for 'organism'

For donors, we register the new labels:

annotate.add_new_from("donor_id")
✅ added 12 records with ULabel.name for donor_id: ['D496-1', '621B-1', 'A29-1', 'A36-1', 'A35-1', '637C-1', 'A52-1', 'A37-1', 'D503-1', '640C-1', 'A31-1', '582C-1']
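
The logs above suggest two distinct paths. Terms already known to the reference instance are imported from it, while terms unknown to it (such as donor IDs) must be created explicitly. The toy model below illustrates that distinction only. `ToyRegistry`, `import_known`, and `create_new` are hypothetical names that loosely mirror `add_validated_from` and `add_new_from`; they are not the LaminDB API.

```python
class ToyRegistry:
    """Toy stand-in for a label registry; not the LaminDB API."""

    def __init__(self, source_names):
        self.source = set(source_names)  # terms known to the reference instance
        self.local = set()               # terms saved in the working instance

    def import_known(self, values):
        """Save terms the reference source knows; return the ones left unsaved."""
        values = set(values)
        self.local |= values & self.source
        return sorted(values - self.source)

    def create_new(self, values):
        """Create records for terms not yet saved locally; return what was created."""
        new_terms = set(values) - self.local
        self.local |= new_terms
        return sorted(new_terms)


# Tissue: "blood" is known to the source, the typo "lungg" is not.
tissues = ToyRegistry({"blood", "lung"})
tissue_leftover = tissues.import_known(["blood", "lungg"])

# Donor IDs are study-specific, so the source knows none of them.
donors = ToyRegistry(set())
donor_leftover = donors.import_known(["A29-1", "A36-1"])
donors_created = donors.create_new(donor_leftover)
print(tissue_leftover, donor_leftover, donors_created)
```

Keeping the two paths separate means a typo like "lungg" is reported instead of being silently saved as a new label.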

The tissue label "lungg" is reported as not validated: it is a typo and should be "lung". Let's fix it:

tissues = annotate.lookup().tissue
# using a lookup object to find the correct term
tissues.lung
Tissue(uid='7Tt4iEKc', name='lung', ontology_id='UBERON:0002048', synonyms='pulmo', description='Respiration Organ That Develops As An Outpocketing Of The Esophagus.', updated_at=2024-01-08 15:22:49 UTC, public_source_id=47, created_by_id=1)
adata.obs["tissue"] = adata.obs["tissue"].cat.rename_categories(
    {"lungg": tissues.lung.name}
)
annotate.add_validated_from("tissue")
✅ added 1 record from laminlabs/cellxgene with Tissue.name for tissue: ['lung']
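
The guide finds the correct term with a lookup object. When many labels need checking, fuzzy matching against the known names can suggest candidates. This is a different technique from the lookup used above, sketched here with Python's standard `difflib`. `suggest_term` is a hypothetical helper and `known_tissues` is an illustrative subset of names.

```python
import difflib


def suggest_term(value, known_names, cutoff=0.8):
    """Return the closest known name for `value`, or None if nothing is close."""
    matches = difflib.get_close_matches(value, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Illustrative subset of tissue names; in practice these would come from the registry.
known_tissues = ["blood", "spleen", "liver", "lung", "thymus", "ileum"]

suggestion = suggest_term("lungg", known_tissues)
no_suggestion = suggest_term("xyz", known_tissues)
print(suggestion, no_suggestion)
```

A suggestion like this should still be confirmed by a person before renaming categories, since a close string match does not guarantee the intended ontology term.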

Let's validate the object again:

validated = annotate.validate()
💡 validating metadata using registries of instance laminlabs/cellxgene
✅ var_index is validated against Gene.ensembl_gene_id
✅ assay is validated against ExperimentalFactor.name
✅ cell_type is validated against CellType.name
✅ development_stage is validated against DevelopmentalStage.name
✅ disease is validated against Disease.name
✅ donor_id is validated against ULabel.name
✅ self_reported_ethnicity is validated against Ethnicity.name
✅ sex_ontology_term_id is validated against Phenotype.ontology_id
✅ suspension_type is validated against ULabel.name
✅ tissue is validated against Tissue.name
✅ tissue_type is validated against ULabel.name
✅ organism is validated against Organism.name
validated
True
adata.obs.head()
| | donor_id | tissue | cell_type | assay | sex_ontology_term_id | organism | disease | development_stage | self_reported_ethnicity | suspension_type | tissue_type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CZINY-0109_CTGGTCTAGTCTGTAC | D496-1 | blood | classical monocyte | 10x 3' v3 | PATO:0000384 | human | normal | unknown | unknown | cell | tissue |
| CZI-IA10244332+CZI-IA10244434_CCTTCGACATACTCTT | 621B-1 | thoracic lymph node | T follicular helper cell | 10x 5' v2 | PATO:0000384 | human | normal | unknown | unknown | cell | tissue |
| Pan_T7935491_CTGGTCTGTACATGTC | A29-1 | spleen | memory B cell | 10x 5' v1 | PATO:0000384 | human | normal | unknown | unknown | cell | tissue |
| Pan_T7980367_GGGCATCCAGGTGGAT | A36-1 | lung | alveolar macrophage | 10x 5' v1 | PATO:0000384 | human | normal | unknown | unknown | cell | tissue |
| Pan_T7935494_ATCATGGTCTACCTGC | A29-1 | mesenteric lymph node | naive thymus-derived CD4-positive, alpha-beta ... | 10x 5' v1 | PATO:0000384 | human | normal | unknown | unknown | cell | tissue |

Register file#

Now we are ready to register the artifact to the working instance:

# track the current notebook
ln.transform.stem_uid = "WOK3vP0bNGLx"
ln.transform.version = "0"
ln.track()
💡 Assuming editor is Jupyter Lab.
💡 notebook imports: cellxgene_lamin==0.2.1 lamindb==0.71.0 lnschema_bionty==0.41.9
💡 saved: Transform(uid='WOK3vP0bNGLx6K79', name='Annotate an h5ad file based on CELLxGENE schema', key='cellxgene-annotate', version='0', type='notebook', updated_at=2024-05-01 18:51:46 UTC, created_by_id=1)
💡 saved: Run(uid='NIYRMzYX9HVmKHRyBYpk', transform_id=1, created_by_id=1)
💡 tracked pip freeze > /home/runner/.cache/lamindb/run_env_pip_NIYRMzYX9HVmKHRyBYpk.txt
# this will modify the AnnData object by adding required columns and categories
artifact = annotate.save_artifact(description="test h5ad file")
💡 path content will be copied to default storage upon `save()` with key `None` ('.lamindb/grwhQApI0RNbtcwIy6JF.h5ad')
✅ storing artifact 'grwhQApI0RNbtcwIy6JF' at '/home/runner/work/cellxgene-lamin/cellxgene-lamin/docs/test-cellxgene-annotate/.lamindb/grwhQApI0RNbtcwIy6JF.h5ad'
💡 parsing feature names of X stored in slot 'var'
✅    36503 terms (100.00%) are validated for ensembl_gene_id
✅    linked: FeatureSet(uid='hRXlpnSWZ6YytVwncnth', n=36503, type='number', registry='bionty.Gene', hash='xtVNbbhs3ty63qs-rwKZ', created_by_id=1)
💡 parsing feature names of slot 'obs'
✅    11 terms (100.00%) are validated for name
✅    linked: FeatureSet(uid='70HYmM82B5sexTzXLrwH', n=11, registry='core.Feature', hash='rDPm9fsZP1Ur28L9E8S-', created_by_id=1)
✅ saved 2 feature sets for slots: 'var','obs'
✅ linked feature 'sex_ontology_term_id' to registry 'bionty.Phenotype'

View the registered artifact with metadata:

artifact.describe()
Artifact(uid='grwhQApI0RNbtcwIy6JF', suffix='.h5ad', accessor='AnnData', description='test h5ad file', size=54727155, hash='5esmrdu-DFv9nKyK4ZFA0G', hash_type='sha1-fl', n_observations=1626, visibility=1, key_is_virtual=True, updated_at=2024-05-01 18:51:50 UTC)

Provenance:
  📎 storage: Storage(uid='2AiCunADFPVb', root='/home/runner/work/cellxgene-lamin/cellxgene-lamin/docs/test-cellxgene-annotate', type='local', instance_uid='1Dd1nk1DP8Uy')
  📎 transform: Transform(uid='WOK3vP0bNGLx6K79', name='Annotate an h5ad file based on CELLxGENE schema', key='cellxgene-annotate', version='0', type='notebook')
  📎 run: Run(uid='NIYRMzYX9HVmKHRyBYpk', started_at=2024-05-01 18:51:46 UTC, is_consecutive=True)
  📎 created_by: User(uid='DzTjkKse', handle='testuser1', name='Test User1')
Features:
  var: FeatureSet(uid='hRXlpnSWZ6YytVwncnth', n=36503, type='number', registry='bionty.Gene')
    'CCDC117', 'PLOD1', 'SLC26A8', 'XRCC5', 'SMOC1', 'PELATON', 'HDGFL2', 'A2ML1-AS1', 'LINC00028', 'RIPK2-DT', 'PSMB2', 'RECQL', 'SLC1A2', 'MEIOSIN', 'SMAD5', 'ESR2', 'TPBGL', 'SLC2A10', 'ZXDA', 'PHACTR3', ...
  obs: FeatureSet(uid='70HYmM82B5sexTzXLrwH', n=11, registry='core.Feature')
    🔗 assay (3, bionty.ExperimentalFactor): '10x 5' v2', '10x 3' v3', '10x 5' v1'
    🔗 cell_type (31, bionty.CellType): 'naive thymus-derived CD4-positive, alpha-beta T cell', 'CD8-positive, alpha-beta memory T cell', 'progenitor cell', 'effector memory CD8-positive, alpha-beta T cell, terminally differentiated', 'CD16-positive, CD56-dim natural killer cell, human', 'dendritic cell, human', 'CD16-negative, CD56-bright natural killer cell, human', 'conventional dendritic cell', 'group 3 innate lymphoid cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', ...
    🔗 development_stage (1, bionty.DevelopmentalStage): 'unknown'
    🔗 disease (1, bionty.Disease): 'normal'
    🔗 donor_id (12, core.ULabel): '640C-1', 'A52-1', 'A37-1', 'A31-1', '637C-1', 'A35-1', '582C-1', '621B-1', 'D496-1', 'D503-1', ...
    🔗 self_reported_ethnicity (1, bionty.Ethnicity): 'unknown'
    🔗 tissue (17, bionty.Tissue): 'mesenteric lymph node', 'jejunal epithelium', 'transverse colon', 'thymus', 'caecum', 'sigmoid colon', 'bone marrow', 'duodenum', 'skeletal muscle tissue', 'blood', ...
    🔗 organism (1, bionty.Organism): 'human'
    🔗 tissue_type (1, core.ULabel): 'tissue'
    🔗 suspension_type (1, core.ULabel): 'cell'
    🔗 sex_ontology_term_id (1, bionty.Phenotype): 'male'
Labels:
  📎 organism (1, bionty.Organism): 'human'
  📎 tissues (17, bionty.Tissue): 'mesenteric lymph node', 'jejunal epithelium', 'transverse colon', 'thymus', 'caecum', 'sigmoid colon', 'bone marrow', 'duodenum', 'skeletal muscle tissue', 'blood', ...
  📎 cell_types (31, bionty.CellType): 'naive thymus-derived CD4-positive, alpha-beta T cell', 'CD8-positive, alpha-beta memory T cell', 'progenitor cell', 'effector memory CD8-positive, alpha-beta T cell, terminally differentiated', 'CD16-positive, CD56-dim natural killer cell, human', 'dendritic cell, human', 'CD16-negative, CD56-bright natural killer cell, human', 'conventional dendritic cell', 'group 3 innate lymphoid cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', ...
  📎 diseases (1, bionty.Disease): 'normal'
  📎 phenotypes (1, bionty.Phenotype): 'male'
  📎 experimental_factors (3, bionty.ExperimentalFactor): '10x 5' v2', '10x 3' v3', '10x 5' v1'
  📎 developmental_stages (1, bionty.DevelopmentalStage): 'unknown'
  📎 ethnicities (1, bionty.Ethnicity): 'unknown'
  📎 ulabels (14, core.ULabel): '640C-1', 'A52-1', 'tissue', 'A37-1', 'A31-1', '637C-1', 'A35-1', 'cell', '582C-1', '621B-1', ...

Register collection#

Register a new collection for the registered artifact:

# register a new collection
collection = annotate.save_collection(
    artifact,  # registered artifact above, can also pass a list of artifacts
    name=(  # title of the publication
        "Cross-tissue immune cell analysis reveals tissue-specific features in humans"
        " (for test demo only)"
    ),
    description="10.1126/science.abl5197",  # DOI of the publication
    reference="E-MTAB-11536",  # accession number (e.g. GSE#, E-MTAB#, etc.)
    reference_type="ArrayExpress",  # source type (e.g. GEO, ArrayExpress, SRA, etc.)
)
✅ loaded: FeatureSet(uid='hRXlpnSWZ6YytVwncnth', n=36503, type='number', registry='bionty.Gene', hash='xtVNbbhs3ty63qs-rwKZ', updated_at=2024-05-01 18:51:49 UTC, created_by_id=1)
✅ loaded: FeatureSet(uid='70HYmM82B5sexTzXLrwH', n=11, registry='core.Feature', hash='rDPm9fsZP1Ur28L9E8S-', updated_at=2024-05-01 18:51:50 UTC, created_by_id=1)
collection.artifact

Return an input h5ad file for cellxgene-schema#

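Convert the annotated object to an AnnData whose obs columns follow the CELLxGENE naming, with ontology term IDs in place of labels:

adata_cxg = annotate.to_cellxgene(is_primary_data=True)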
adata_cxg
AnnData object with n_obs × n_vars = 1626 × 36503
    obs: 'donor_id', 'sex_ontology_term_id', 'suspension_type', 'tissue_type', 'tissue_ontology_term_id', 'cell_type_ontology_term_id', 'assay_ontology_term_id', 'organism_ontology_term_id', 'disease_ontology_term_id', 'development_stage_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'is_primary_data'
    var: 'feature_is_filtered'
    uns: 'default_embedding', 'title', 'cxg_lamin_schema_reference', 'cxg_lamin_schema_version'
    obsm: 'X_umap'
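
Before running the validator, it can help to confirm that the converted object carries the obs columns the schema asks for. The sketch below is only an illustration: the REQUIRED_OBS set reflects my reading of the curator-supplied obs fields in CELLxGENE schema 5.x and should be checked against the schema document, and missing_obs_columns is a hypothetical helper, not part of cellxgene_lamin. The column list is copied from the output above.

```python
# Assumption: curator-supplied obs columns in CELLxGENE schema 5.x.
# Verify against the official schema document before relying on this list.
REQUIRED_OBS = {
    "assay_ontology_term_id",
    "cell_type_ontology_term_id",
    "development_stage_ontology_term_id",
    "disease_ontology_term_id",
    "donor_id",
    "is_primary_data",
    "organism_ontology_term_id",
    "self_reported_ethnicity_ontology_term_id",
    "sex_ontology_term_id",
    "suspension_type",
    "tissue_ontology_term_id",
    "tissue_type",
}


def missing_obs_columns(obs_columns, required=REQUIRED_OBS):
    """Return the required column names that are absent, sorted alphabetically."""
    return sorted(set(required) - set(obs_columns))


# obs columns copied from the adata_cxg output above
obs_columns = [
    "donor_id",
    "sex_ontology_term_id",
    "suspension_type",
    "tissue_type",
    "tissue_ontology_term_id",
    "cell_type_ontology_term_id",
    "assay_ontology_term_id",
    "organism_ontology_term_id",
    "disease_ontology_term_id",
    "development_stage_ontology_term_id",
    "self_reported_ethnicity_ontology_term_id",
    "is_primary_data",
]

print(missing_obs_columns(obs_columns))
```

With a live object, the same helper could be called as missing_obs_columns(adata_cxg.obs.columns).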
adata_cxg.write_h5ad("anndata_human_immune_cells_cxg.h5ad")
!cellxgene-schema validate anndata_human_immune_cells_cxg.h5ad
Loading dependencies
Loading validator modules
Starting validation...
WARNING: Validation of raw layer was not performed due to current errors, try again after fixing current errors.
ERROR: 'ENSG00000269933' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000261737' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000259834' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000256374' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000263464' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000203812' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272196' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272880' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000270188' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000287116' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000237133' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000224739' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000227902' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000239467' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272551' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000280374' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000236886' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000229352' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000286601' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000227021' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000259855' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000273301' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000271870' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000237838' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000286996' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000269028' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000286699' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000273370' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000261490' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272567' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000270394' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272370' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272354' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000251044' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272040' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000182230' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000204092' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000261068' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000236740' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000236996' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000232295' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000271734' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000236673' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000227220' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000236166' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000112096' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000285162' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000286228' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000237513' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000285106' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000226380' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000270672' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000225932' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000244693' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000268955' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000272267' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000253878' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000259820' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000226403' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000233776' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000269900' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000261534' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000237548' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000239665' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000256892' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000249860' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000271409' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000224745' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000261438' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000231575' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000260461' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000255823' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000254740' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000254561' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000282080' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000256427' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000287388' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000276814' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000280710' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000215271' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000258414' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000258808' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000277050' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000273888' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000258861' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000259444' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000244952' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000273923' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000262668' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000232196' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000256618' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000221995' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000226377' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000273576' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000267637' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000282965' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000273837' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000286949' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000256222' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000280095' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000278927' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000278955' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000277352' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000239446' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000256045' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000228906' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000228139' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000261773' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000278198' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000273496' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000277666' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000278782' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000277761' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000269933' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000261737' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000259834' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000256374' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000263464' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000203812' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272196' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272880' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000270188' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000287116' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000237133' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000224739' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000227902' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000239467' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272551' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000280374' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000236886' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000229352' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000286601' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000227021' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000259855' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000273301' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000271870' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000237838' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000286996' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000269028' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000286699' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000273370' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000261490' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272567' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000270394' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272370' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272354' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000251044' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272040' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000182230' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000204092' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000261068' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000236740' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000236996' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000232295' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000271734' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000236673' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000227220' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000236166' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000112096' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000285162' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000286228' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000237513' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000285106' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000226380' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000270672' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000225932' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000244693' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000268955' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000272267' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000253878' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000259820' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000226403' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000233776' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000269900' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000261534' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000237548' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000239665' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000256892' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000249860' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000271409' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000224745' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000261438' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000231575' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000260461' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000255823' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000254740' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000254561' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000282080' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000256427' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000287388' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000276814' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000280710' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000215271' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000258414' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000258808' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000277050' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000273888' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000258861' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000259444' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000244952' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000273923' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000262668' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000232196' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000256618' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000221995' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000226377' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000273576' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000267637' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000282965' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000273837' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000286949' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000256222' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000280095' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000278927' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000278955' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000277352' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000239446' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000256045' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000228906' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000228139' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000261773' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000278198' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000273496' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000277666' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000278782' is not a valid feature ID in 'raw.var'.
ERROR: 'ENSG00000277761' is not a valid feature ID in 'raw.var'.
Validation complete in 0:00:03.130235 with status is_valid=False
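
The obs metadata no longer raises errors; the remaining failures all concern feature IDs in var and raw.var that the validator does not accept, which is why the run ends with is_valid=False. A minimal sketch for collecting the flagged IDs from the validator output follows. It assumes the log has been captured as text; collect_invalid_feature_ids and sample_log are hypothetical names introduced for illustration, and the regular expression is written against the message format shown above.

```python
import re
from collections import defaultdict

# Matches validator lines such as:
# ERROR: 'ENSG00000269933' is not a valid feature ID in 'raw.var'.
INVALID_ID = re.compile(
    r"^ERROR: '([^']+)' is not a valid feature ID in '([^']+)'\.$"
)


def collect_invalid_feature_ids(log_text):
    """Group flagged feature IDs by location (e.g. 'var', 'raw.var').

    IDs keep their order of first appearance and are not repeated.
    """
    flagged = defaultdict(list)
    for line in log_text.splitlines():
        match = INVALID_ID.match(line.strip())
        if match:
            feature_id, location = match.groups()
            if feature_id not in flagged[location]:
                flagged[location].append(feature_id)
    return dict(flagged)


# A few lines copied from the validator output above
sample_log = """\
Starting validation...
ERROR: 'ENSG00000269933' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000261737' is not a valid feature ID in 'var'.
ERROR: 'ENSG00000269933' is not a valid feature ID in 'raw.var'.
Validation complete in 0:00:03.130235 with status is_valid=False
"""

flagged = collect_invalid_feature_ids(sample_log)
for location, ids in flagged.items():
    print(location, len(ids), ids)
```

Assuming var is indexed by these Ensembl IDs, the collected list could then be used to subset the object, for example adata_cxg[:, ~adata_cxg.var_names.isin(flagged["var"])], before validating again. Whether dropping or remapping the genes is appropriate depends on the dataset.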