CELLxGENE: scRNA-seq#

CZ CELLxGENE hosts the largest standardized collection of scRNA-seq datasets worldwide.

LaminDB makes it easy to query the CELLxGENE data and integrate it with in-house data of any kind (omics, phenotypes, pdfs, notebooks, ML models, …).

You can use the CELLxGENE data in three ways:

  1. In the current guide, you’ll see how to query metadata and data based on AnnData objects.

  2. If you want to use these datasets in your own LaminDB instance, see the transfer guide.

  3. If you’d like to leverage the TileDB-SOMA API for the data subset of CELLxGENE Census, see the Census guide.

If you are interested in building similar data assets in-house:

  1. See the scRNA guide for how to create a growing, versioned, queryable scRNA-seq dataset.

  2. Reach out if you are interested in a full zero-copy clone of laminlabs/cellxgene to accelerate building your in-house LaminDB instances.

Setup#

On the CLI, load the public LaminDB instance that mirrors CELLxGENE:

!lamin load laminlabs/cellxgene
💡 loaded instance: laminlabs/cellxgene

import lamindb as ln
import lnschema_bionty as lb
💡 lamindb instance: laminlabs/cellxgene

Query & understand metadata#

Auto-complete metadata#

You can create look-up objects for any registry in LaminDB, including basic biological entities and things like users or storage locations.

Let’s use auto-complete to look up cell types:

cell_types = lb.CellType.lookup()
cell_types.effector_t_cell
CellType(uid='yvHkIrVI', name='effector T cell', ontology_id='CL:0000911', synonyms='effector T-cell|effector T-lymphocyte|effector T lymphocyte', description='A Differentiated T Cell With Ability To Traffic To Peripheral Tissues And Is Capable Of Mounting A Specific Immune Response.', updated_at=2023-11-28 22:30:57 UTC, bionty_source_id=48, created_by_id=1)

You can also arbitrarily chain filters and create lookups from them:

organisms = lb.Organism.lookup()  # species
genes = lb.Gene.filter(organism=organisms.human).lookup()  # ~60k human genes
features = ln.Feature.lookup()  # non-gene features, like `cell_type`, `assay`, etc.
experimental_factors = lb.ExperimentalFactor.lookup()  # labels for experimental factors
tissues = lb.Tissue.lookup()  # tissue labels
ulabels = ln.ULabel.lookup()  # universal labels, e.g. dataset collections
suspension_types = ulabels.is_suspension_type.children.all().lookup()
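The attribute names on a lookup object are derived from record names: `effector T cell` becomes `effector_t_cell`, and the assay `10x 3' v2` is reached as `ln_10x_3_v2` in the file query later in this guide. The following is a minimal sketch of how such a mapping could work, not LaminDB's actual implementation; the helpers `to_lookup_key` and `make_lookup`, the `ln_` prefix rule, and the use of a namedtuple are assumptions made for illustration.

```python
import re
from collections import namedtuple


def to_lookup_key(name: str) -> str:
    """Turn a record name into a valid Python attribute name (illustrative rule)."""
    # Lowercase, then collapse every run of non-alphanumeric characters into "_".
    key = re.sub(r"[^0-9a-zA-Z]+", "_", name.lower()).strip("_")
    # Identifiers cannot start with a digit, so prefix those (assumed convention).
    if key and key[0].isdigit():
        key = "ln_" + key
    return key


def make_lookup(names):
    """Build an object whose attributes map normalized keys back to the names."""
    keys = [to_lookup_key(n) for n in names]
    Lookup = namedtuple("Lookup", keys)
    return Lookup(*names)


lookup = make_lookup(["effector T cell", "dendritic cell", "10x 3' v2"])
print(lookup.effector_t_cell)
print(lookup.ln_10x_3_v2)
```

Because the keys are plain attributes, an interactive shell or notebook can offer them through tab completion.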

Search & filter metadata#

We can use search & filters for metadata:

lb.CellType.search("effector T cell")
name                   uid       synonyms                                           score
effector T cell        yvHkIrVI  effector T-cell|effector T-lymphocyte|effector...  100.0
ectodermal cell        e2QmwdvB  ectoderm cell                                      71.4
helper T cell          TwCkoWgT  helper T-lymphocyte|T-helper cell|helper T lym...  71.4
memory T cell          Re00kg0W  memory T-cell|memory T lymphocyte|memory T-lym...  71.4
sensory receptor cell  j0WdHDdi  receptor cell                                      71.4
excretory cell         AA00OTcM                                                     69.0
secretory cell         wVT2qeb9                                                     69.0
neurectodermal cell    KjesToYa  neurectoderm cell                                  68.8
pro-T cell             XaWRfcwg  pro-T lymphocyte|progenitor T cell                 68.8
regulatory T cell      Z7uMAWUF  regulatory T lymphocyte|Treg|regulatory T-lymp...  68.8
Kupffer cell           YN0gzDt3  hepatic macrophage|macrophagocytus stellatus|l...  66.7
chemoreceptor cell     9lDVTP4o                                                     66.7
follicular B cell      FMTngXKK  Fo B cell|follicular B lymphocyte|follicular B...  66.7
lb.CellType.search("CD8-positive cytokine effector T cell")
name                                                         uid       synonyms                                           score
CD8-positive, alpha-beta cytokine secreting effector T cell  pam4JjkW  CD8-positive, alpha-beta cytokine secreting ef...  77.1
CD4-positive helper T cell                                   oyjZhi4K  CD4-positive T-helper cell|CD4-positive helper...  69.8
CD8-positive, alpha-beta T cell                              VnKkQsME  CD8-positive, alpha-beta T-cell|CD8-positive, ...  67.6
CD8-positive, alpha-beta cytotoxic T cell                    baEuJabx  CD8-positive, alpha-beta cytotoxic T-cell|CD8-...  66.7
CD8-positive, alpha-beta memory T cell                       9FR0LnTI  CD8-positive, alpha-beta memory T lymphocyte|C...  66.7
CD1c-positive myeloid dendritic cell                         gXOMeVM0                                                     65.8
CD141-positive myeloid dendritic cell                        dRUgw2Fo                                                     64.9
CD4-positive, alpha-beta T cell                              05vQoepH  CD4-positive, alpha-beta T lymphocyte|CD4-posi...  64.7
CD4-positive, alpha-beta cytotoxic T cell                    3sKh2cA7  CD4-positive, alpha-beta cytotoxic T-cell|CD4-...  64.1
CD34-positive, CD38-negative hematopoietic stem cell         Tf2NM0hD  CD133-positive hematopoietic stem cell             64.0
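The scores in these tables rank names by string similarity to the query. As a rough sketch of that idea only, the snippet below ranks a few of the cell type names shown above with Python's standard-library `difflib`. This is an assumption made for illustration: it is not the scorer LaminDB uses, so the numbers it produces will differ from the scores above. The helper name `rank` is hypothetical.

```python
from difflib import SequenceMatcher

# A few names taken from the search output above.
names = [
    "Kupffer cell",
    "helper T cell",
    "effector T cell",
    "memory T cell",
]


def rank(query: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Score each candidate against the query (0 to 100) and sort best-first."""
    scored = []
    for name in candidates:
        ratio = SequenceMatcher(None, query.lower(), name.lower()).ratio()
        scored.append((name, round(100 * ratio, 1)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, score in rank("effector T cell", names):
    print(f"{name}: {score}")
```

An exact match scores 100.0 and therefore sorts first, which mirrors the top hit in the first table.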

And use a uid to filter exactly one metadata record:

effector_t_cell = lb.CellType.filter(uid="yvHkIrVI").one()
effector_t_cell
CellType(uid='yvHkIrVI', name='effector T cell', ontology_id='CL:0000911', synonyms='effector T-cell|effector T-lymphocyte|effector T lymphocyte', description='A Differentiated T Cell With Ability To Traffic To Peripheral Tissues And Is Capable Of Mounting A Specific Immune Response.', updated_at=2023-11-28 22:30:57 UTC, bionty_source_id=48, created_by_id=1)

Understand ontologies#

View the surrounding ontology terms:

effector_t_cell.view_parents(distance=2, with_children=True)
_images/fbfb23ee2547e35ae5582e7af59abd1ce59dcb18bd469282c20f01cc4e4ae6de.svg

Or access them programmatically:

effector_t_cell.children.df()
uid name ontology_id abbr synonyms description bionty_source_id updated_at created_by_id
id
931 o9T53Uso effector CD8-positive, alpha-beta T cell CL:0001050 None effector CD8-positive, alpha-beta T lymphocyte... A Cd8-Positive, Alpha-Beta T Cell With The Phe... 48 2023-11-28 22:27:55.565981+00:00 1
1088 tQZFurra effector CD4-positive, alpha-beta T cell CL:0001044 None effector CD4-positive, alpha-beta T lymphocyte... A Cd4-Positive, Alpha-Beta T Cell With The Phe... 48 2023-11-28 22:27:55.569832+00:00 1
1229 7roaTzhI exhausted T cell CL:0011025 None Tex cell|An effector T cell that displays impa... None 48 2023-11-28 22:27:55.572884+00:00 1
1309 OxsmyL44 cytotoxic T cell CL:0000910 None cytotoxic T lymphocyte|cytotoxic T-lymphocyte|... A Mature T Cell That Differentiated And Acquir... 48 2023-11-28 22:27:55.575444+00:00 1
1331 TwCkoWgT helper T cell CL:0000912 None helper T-lymphocyte|T-helper cell|helper T lym... A Effector T Cell That Provides Help In The Fo... 48 2023-11-28 22:27:55.575955+00:00 1

Query files#

Unlike in the SOMA guide, here we’ll query sets of h5ad files, each of which corresponds to an AnnData object.

To access them, we query the Dataset record that links the latest LTS set of h5ad files:

dataset = ln.Dataset.filter(name="cellxgene-census", version="2023-07-25").one()
dataset
Dataset(uid='OirHTWDrudY2TYltvIX1', name='cellxgene-census', version='2023-07-25', hash='pEJ9uvIeTLvHkZW2TBT5', visibility=1, updated_at=2023-11-28 21:46:40 UTC, transform_id=11, run_id=16, created_by_id=1)

You can get all linked files as a DataFrame; there are 850 files in cellxgene-census version 2023-07-25.

dataset.files.df().head()  # not tracking run & transform because read-only instance
❗ no run & transform get linked, consider passing a `run` or calling ln.track()
uid storage_id key suffix accessor description version size hash hash_type transform_id run_id initial_version_id visibility key_is_virtual updated_at created_by_id
id
1029 6IYilXiyiTxZYMCJ2TnY 2 cell-census/2023-07-25/h5ads/2fb24a91-55b9-4cc... .h5ad AnnData High Resolution Slide-seqV2 Spatial Transcript... None 8856712 BXH-IIW1Et1CyugN0DMroQ-2 md5-n 11 16 None 1 False 2023-11-28 22:45:38.554629+00:00 1
872 vEw6vGy47Zi0Qj6T6YJr 2 cell-census/2023-07-25/h5ads/0041b9c3-6a49-4bf... .h5ad AnnData Tabula Sapiens None 198592773 0tEolD_cGXenPjobh1M8Gw-24 md5-n 11 16 None 1 False 2023-11-28 22:44:26.759174+00:00 1
873 dptYEcjH6o3p9Vy1qZAp 2 cell-census/2023-07-25/h5ads/00476f9f-ebc1-4b7... .h5ad AnnData Human Brain Cell Atlas v1.0 None 131643578 HCQOV1VHonILymJHLkcNdg-16 md5-n 11 16 None 1 False 2023-11-28 22:44:27.440055+00:00 1
875 bittNWi0gJTdcJ0pm9Jo 2 cell-census/2023-07-25/h5ads/00ff600e-6e2e-4d7... .h5ad AnnData Single-cell analysis of human B cell maturatio... None 5919670 PxGgTrFmiCh6AMwiu1fHWw md5 11 16 None 1 False 2023-11-28 22:44:28.116237+00:00 1
876 HQPT59lX80spJyfKXDC5 2 cell-census/2023-07-25/h5ads/01209dce-3575-4be... .h5ad AnnData Single-cell transcriptomics of human T cells r... None 312536917 zAlluOa2WUIWvs2jkXKvkQ-38 md5-n 11 16 None 1 False 2023-11-28 22:44:28.784784+00:00 1

You can query across files by arbitrary metadata combinations, for instance:

query = dataset.files.filter(
    organism=organisms.human,
    cell_types__in=[cell_types.dendritic_cell, cell_types.neutrophil],
    tissues=tissues.kidney,
    ulabels=suspension_types.cell,
    experimental_factors=experimental_factors.ln_10x_3_v2,
)
query = query.order_by("size").distinct()  # order by size, drop duplicates
query.df().head()  # convert to DataFrame
uid storage_id key suffix accessor description version size hash hash_type transform_id run_id initial_version_id visibility key_is_virtual updated_at created_by_id
id
983 WwmBIhBNLTlRcSoBky88 2 cell-census/2023-07-25/h5ads/20d87640-4be8-487... .h5ad AnnData Spatiotemporal immune zonation of the human ki... None 44647761 dAApZI2IZr64F5b1jDMgtA-6 md5-n 11 16 None 1 False 2023-11-28 22:45:31.292961+00:00 1
1019 gHlQ5Muwu3G9pvFC4GDV 2 cell-census/2023-07-25/h5ads/2d31c0ca-0233-41c... .h5ad AnnData Spatiotemporal immune zonation of the human ki... None 64056560 YjLm7iPkIFIEYimgQEfJSA-8 md5-n 11 16 None 1 False 2023-11-28 22:45:52.133169+00:00 1
1382 P4Oai3OLGAzRwoicQ5HD 2 cell-census/2023-07-25/h5ads/9ea768a2-87ab-46b... .h5ad AnnData Spatiotemporal immune zonation of the human ki... None 192484358 odAyLe_6uoRCQV5eJRijqQ-23 md5-n 11 16 None 1 False 2023-11-28 22:49:46.348257+00:00 1
932 DSpevwaIl5E2jIWHp0uR 2 cell-census/2023-07-25/h5ads/105c7dad-0468-462... .h5ad AnnData Single-cell transcriptomes from human kidneys ... None 232722706 3yOOhI-gP3TlpyLcDNUTBA-28 md5-n 11 16 None 1 False 2023-11-28 22:53:48.624548+00:00 1
1579 11HQaMeIUaOwyHoOjEVN 2 cell-census/2023-07-25/h5ads/d7dcfd8f-2ee7-438... .h5ad AnnData Spatiotemporal immune zonation of the human ki... None 341214674 R8-G4h5ztVfX29r58T4g_Q-41 md5-n 11 16 None 1 False 2023-11-28 22:52:00.821957+00:00 1
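To make the semantics of this query concrete, here is a plain-Python sketch. It assumes that the keyword arguments are combined with AND, that `cell_types__in` matches a file carrying any of the listed labels, and that a file linked to several matching labels can show up more than once before `.distinct()` is applied. The records and the `filter_files` helper are hypothetical and only mimic that behavior; they are not LaminDB code.

```python
# Hypothetical file records: each file carries sets of labels.
files = [
    {"uid": "A", "size": 300, "organism": "human",
     "cell_types": {"dendritic cell", "neutrophil"}, "tissues": {"kidney"}},
    {"uid": "B", "size": 100, "organism": "human",
     "cell_types": {"neutrophil"}, "tissues": {"kidney", "renal pelvis"}},
    {"uid": "C", "size": 200, "organism": "mouse",
     "cell_types": {"dendritic cell"}, "tissues": {"kidney"}},
]


def filter_files(records, organism, cell_types_in, tissue):
    """Return one row per (file, matching cell type) pair, as a join would."""
    rows = []
    for rec in records:
        # All conditions must hold (AND).
        if rec["organism"] != organism or tissue not in rec["tissues"]:
            continue
        # One row for every label in the `__in` list that the file carries.
        for _ in rec["cell_types"] & set(cell_types_in):
            rows.append(rec)
    return rows


rows = filter_files(files, "human", ["dendritic cell", "neutrophil"], "kidney")
# Drop duplicates by uid, then order by size (ascending).
distinct = sorted({r["uid"]: r for r in rows}.values(), key=lambda r: r["size"])

print([r["uid"] for r in rows])      # file A appears twice
print([r["uid"] for r in distinct])  # each file once, smallest first
```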

Query arrays#

Each file stores an array in the form of an annotated data matrix, an AnnData object.

Let’s look at the first array in the file query and show metadata using .describe():

file = query.first()
file.describe()
File(uid='WwmBIhBNLTlRcSoBky88', key='cell-census/2023-07-25/h5ads/20d87640-4be8-487f-93d4-dce38378d00f.h5ad', suffix='.h5ad', accessor='AnnData', description='Spatiotemporal immune zonation of the human kidney', size=44647761, hash='dAApZI2IZr64F5b1jDMgtA-6', hash_type='md5-n', visibility=1, key_is_virtual=False, updated_at=2023-11-28 22:45:31 UTC)

Provenance:
  🗃️ storage: Storage(uid='oIYGbD74', root='s3://cellxgene-data-public', type='s3', region='us-west-2', updated_at=2023-10-16 15:04:08 UTC, created_by_id=1)
  📔 transform: Transform(uid='pNa7RdI26sp4z8', name='Register files from Census release 2023-07-25', short_name='census-release-2023-07-25', version='0', type='notebook', updated_at=2023-11-29 13:53:43 UTC, latest_report_id=1724, source_file_id=1723, created_by_id=1)
  👣 run: Run(uid='ZYgsnqK5v2hPmFlS0kfG', run_at=2023-11-29 13:52:08 UTC, is_consecutive=False, transform_id=11, created_by_id=1, report_id=1724)
  👤 created_by: User(uid='kmvZDIX9', handle='sunnyosun', name='Sunny Sun', updated_at=2023-11-28 21:14:48 UTC)
  ⬇️ input_of (core.Run): ['2023-11-29 12:51:05 UTC']
Features:
  obs: FeatureSet(uid='kwKICViF5O3QjHdg0nov', name='obs features', n=9, type='category', registry='core.Feature', hash='Bx10EzvDxdlAVjqVKdKC', updated_at=2023-11-29 09:28:28 UTC, created_by_id=1)
    🔗 assay (1, bionty.ExperimentalFactor): '10x 3' v2'
    🔗 cell_type (12, bionty.CellType): 'CD8-positive, alpha-beta T cell', 'mature NK T cell', 'CD4-positive, alpha-beta T cell', 'natural killer cell', 'non-classical monocyte', 'plasmacytoid dendritic cell', 'neutrophil', 'B cell', 'kidney resident macrophage', 'dendritic cell', ...
    🔗 development_stage (12, bionty.DevelopmentalStage): '2-year-old human stage', '4-year-old human stage', '12-year-old human stage', '44-year-old human stage', '49-year-old human stage', '53-year-old human stage', '63-year-old human stage', '64-year-old human stage', '67-year-old human stage', '70-year-old human stage', ...
    🔗 disease (1, bionty.Disease): 'normal'
    🔗 donor_id (13, core.ULabel): 'TxK2', 'Wilms1', 'TxK4', 'TTx', 'RCC3', 'RCC1', 'VHL', 'TxK3', 'TxK1', 'Wilms3', ...
    🔗 self_reported_ethnicity (1, bionty.Ethnicity): 'unknown'
    🔗 sex (2, bionty.Phenotype): 'male', 'female'
    🔗 suspension_type (1, core.ULabel): 'cell'
    🔗 tissue (5, bionty.Tissue): 'renal medulla', 'kidney blood vessel', 'renal pelvis', 'cortex of kidney', 'kidney'
  external: FeatureSet(uid='zIgncie4AywRKgLmKHUW', name='external features', n=2, type='category', registry='core.Feature', hash='5E4xD6tOhDB5EOnLx3tv', updated_at=2023-11-29 09:28:20 UTC, created_by_id=1)
    🔗 organism (1, bionty.Organism): 'human'
    🔗 collection (1, core.ULabel): 'Spatiotemporal immune zonation of the human kidney'
  var: FeatureSet(uid='8AAiWbuUrP2DI1MpuPD0', n=32922, type='number', registry='bionty.Gene', hash='fHMWMViqV_PilN1PWrgF', updated_at=2023-11-29 13:28:55 UTC, created_by_id=1)
    'MIR1302-2HG', 'FAM138A', 'OR4F5', 'None', 'None', 'None', 'None', 'DDX11L17', 'WASH9P', 'None', 'None', 'None', 'None', 'None', 'None', 'LINC01409', 'FAM87B', 'LINC00115', 'FAM41C', 'None', ...
Labels:
  🏷️ organism (1, bionty.Organism): 'human'
  🏷️ tissues (5, bionty.Tissue): 'renal medulla', 'kidney blood vessel', 'renal pelvis', 'cortex of kidney', 'kidney'
  🏷️ cell_types (12, bionty.CellType): 'CD8-positive, alpha-beta T cell', 'mature NK T cell', 'CD4-positive, alpha-beta T cell', 'natural killer cell', 'non-classical monocyte', 'plasmacytoid dendritic cell', 'neutrophil', 'B cell', 'kidney resident macrophage', 'dendritic cell', ...
  🏷️ diseases (1, bionty.Disease): 'normal'
  🏷️ phenotypes (2, bionty.Phenotype): 'male', 'female'
  🏷️ experimental_factors (1, bionty.ExperimentalFactor): '10x 3' v2'
  🏷️ developmental_stages (12, bionty.DevelopmentalStage): '2-year-old human stage', '4-year-old human stage', '12-year-old human stage', '44-year-old human stage', '49-year-old human stage', '53-year-old human stage', '63-year-old human stage', '64-year-old human stage', '67-year-old human stage', '70-year-old human stage', ...
  🏷️ ethnicities (1, bionty.Ethnicity): 'unknown'
  🏷️ ulabels (15, core.ULabel): 'Spatiotemporal immune zonation of the human kidney', 'TxK2', 'Wilms1', 'TxK4', 'TTx', 'RCC3', 'RCC1', 'VHL', 'TxK3', 'TxK1', ...
More ways of accessing metadata

Access just features:

file.features

Or get labels given a feature:

file.labels.get(features.tissue).df()
file.labels.get(features.collection).one()

If you want to query a slice of the array data, you have two options:

  1. Cache & load the entire array into memory via file.load() -> AnnData (caches the h5ad on disk, so that you only download once)

  2. Stream the array from the cloud using a cloud-backed accessor file.backed() -> AnnDataAccessor

Both options run much faster close to the data, which is hosted on AWS S3 on the US West Coast; consider logging into hosted compute there.
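One way to choose between the two options is by file size, which is stored on each file record. The helper below is a sketch only: `open_array` and the 500 MB threshold are assumptions, not part of LaminDB, and the stand-in class `FakeFile` exists solely so that the snippet runs without a database connection. Only `size`, `load()`, and `backed()` mirror names used in this guide.

```python
from dataclasses import dataclass


@dataclass
class FakeFile:
    """Stand-in for a file record; it only mimics the three names used below."""

    size: int  # size in bytes

    def load(self):
        return "in-memory AnnData"

    def backed(self):
        return "cloud-backed accessor"


def open_array(file, max_bytes=500_000_000):
    """Load small files fully and stream large ones (the threshold is arbitrary)."""
    if file.size <= max_bytes:
        return file.load()
    return file.backed()


print(open_array(FakeFile(size=44_647_761)))     # small file: loaded
print(open_array(FakeFile(size=2_000_000_000)))  # large file: streamed
```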

1. Cache & load#

Let us first consider option 1:

adata = file.load()
adata
AnnData object with n_obs × n_vars = 7803 × 32922
    obs: 'donor_id', 'donor_age', 'self_reported_ethnicity_ontology_term_id', 'organism_ontology_term_id', 'sample_uuid', 'tissue_ontology_term_id', 'development_stage_ontology_term_id', 'suspension_uuid', 'suspension_type', 'library_uuid', 'assay_ontology_term_id', 'mapped_reference_annotation', 'is_primary_data', 'cell_type_ontology_term_id', 'author_cell_type', 'disease_ontology_term_id', 'reported_diseases', 'sex_ontology_term_id', 'compartment', 'Experiment', 'Project', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage'
    var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype'
    uns: 'default_embedding', 'schema_version', 'title'
    obsm: 'X_umap'

We now have an AnnData object whose .obs slot stores the observation annotations that matched our file-level query, so we can reuse almost the same query at the array level:

See the file-level query for comparison
query = dataset.files.filter(
    organism=organisms.human,
    cell_types__in=[cell_types.dendritic_cell, cell_types.neutrophil],
    tissues=tissues.kidney,
    ulabels=suspension_types.cell,
    experimental_factors=experimental_factors.ln_10x_3_v2,
)

AnnData uses pandas to manage metadata and the syntax differs slightly. However, the same metadata records are used.

adata_slice = adata[
    adata.obs.cell_type.isin(
        [cell_types.dendritic_cell.name, cell_types.neutrophil.name]
    )
    & (adata.obs.tissue == tissues.kidney.name)
    & (adata.obs.suspension_type == suspension_types.cell.name)
    & (adata.obs.assay == experimental_factors.ln_10x_3_v2.name)
]
adata_slice
View of AnnData object with n_obs × n_vars = 199 × 32922
    obs: 'donor_id', 'donor_age', 'self_reported_ethnicity_ontology_term_id', 'organism_ontology_term_id', 'sample_uuid', 'tissue_ontology_term_id', 'development_stage_ontology_term_id', 'suspension_uuid', 'suspension_type', 'library_uuid', 'assay_ontology_term_id', 'mapped_reference_annotation', 'is_primary_data', 'cell_type_ontology_term_id', 'author_cell_type', 'disease_ontology_term_id', 'reported_diseases', 'sex_ontology_term_id', 'compartment', 'Experiment', 'Project', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage'
    var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype'
    uns: 'default_embedding', 'schema_version', 'title'
    obsm: 'X_umap'
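The expression inside `adata[...]` above is an ordinary pandas boolean mask built from columns of `.obs`. The sketch below shows the same pattern on a toy DataFrame that stands in for `adata.obs`; the four rows and their values are made up for illustration and do not come from the file.

```python
import pandas as pd

# Toy stand-in for `adata.obs`; the values are invented for illustration.
obs = pd.DataFrame(
    {
        "cell_type": ["dendritic cell", "neutrophil", "B cell", "neutrophil"],
        "tissue": ["kidney", "renal pelvis", "kidney", "kidney"],
        "suspension_type": ["cell", "cell", "cell", "cell"],
    },
    index=["c1", "c2", "c3", "c4"],
)

# Each comparison yields a boolean Series; `&` combines them element-wise.
# The parentheses matter because `&` binds more tightly than `==`.
mask = (
    obs.cell_type.isin(["dendritic cell", "neutrophil"])
    & (obs.tissue == "kidney")
    & (obs.suspension_type == "cell")
)

print(mask.tolist())
print(obs[mask].index.tolist())
```

Only rows for which every condition holds survive, which is why the slice above is much smaller than the full array.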

2. Stream#

Let us now consider option 2:

adata_backed = file.backed()
adata_backed
AnnDataAccessor object with n_obs × n_vars = 7803 × 32922
  constructed for the AnnData object 20d87640-4be8-487f-93d4-dce38378d00f.h5ad
    obs: ['Experiment', 'Project', '_index', 'assay', 'assay_ontology_term_id', 'author_cell_type', 'cell_type', 'cell_type_ontology_term_id', 'compartment', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_age', 'donor_id', 'is_primary_data', 'library_uuid', 'mapped_reference_annotation', 'organism', 'organism_ontology_term_id', 'reported_diseases', 'sample_uuid', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'suspension_uuid', 'tissue', 'tissue_ontology_term_id']
    obsm: ['X_umap']
    raw: ['X', 'var', 'varm']
    uns: ['default_embedding', 'schema_version', 'title']
    var: ['_index', 'feature_biotype', 'feature_is_filtered', 'feature_name', 'feature_reference']

We now have an AnnDataAccessor object, which behaves much like an AnnData, and the query looks the same:

adata_backed_slice = adata_backed[
    adata_backed.obs.cell_type.isin(
        [cell_types.dendritic_cell.name, cell_types.neutrophil.name]
    )
    & (adata_backed.obs.tissue == tissues.kidney.name)
    & (adata_backed.obs.suspension_type == suspension_types.cell.name)
    & (adata_backed.obs.assay == experimental_factors.ln_10x_3_v2.name)
]

adata_backed_slice.to_memory()
AnnData object with n_obs × n_vars = 199 × 32922
    obs: 'donor_id', 'donor_age', 'self_reported_ethnicity_ontology_term_id', 'organism_ontology_term_id', 'sample_uuid', 'tissue_ontology_term_id', 'development_stage_ontology_term_id', 'suspension_uuid', 'suspension_type', 'library_uuid', 'assay_ontology_term_id', 'mapped_reference_annotation', 'is_primary_data', 'cell_type_ontology_term_id', 'author_cell_type', 'disease_ontology_term_id', 'reported_diseases', 'sex_ontology_term_id', 'compartment', 'Experiment', 'Project', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage'
    var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype'
    uns: 'default_embedding', 'schema_version', 'title'
    obsm: 'X_umap'

3. Concatenate slices#

If we want to concatenate these individual file-level slices, loop over all files in query and concatenate the results.

What would this look like?
adata_slices = []
for file in query:
    adata_backed = file.backed()
    adata_slice = adata_backed[
        adata_backed.obs.cell_type.isin(
            [cell_types.dendritic_cell.name, cell_types.neutrophil.name]
        )
        & (adata_backed.obs.tissue == tissues.kidney.name)
        & (adata_backed.obs.suspension_type == suspension_types.cell.name)
        & (adata_backed.obs.assay == experimental_factors.ln_10x_3_v2.name)
    ]
    adata_slices.append(adata_slice.to_memory())

import anndata as ad

adata_query = ad.concat(adata_slices)

(LaminDB will track data lineage if we store the concatenated result as a new File or Dataset.)

Train an ML model#

See Train an ML model on a dataset.

Exploring data by collection#

Alternatively, let’s search the collections from CELLxGENE:

ulabels.is_collection.search("immune human kidney", limit=10)
name                                                uid       score
Spatiotemporal immune zonation of the human kidney  iBsTRZPg  55.1
mouse_HAKYY                                         1tpv6c10  53.3
mouse_HKIEN                                         gvzn29mX  53.3
Human-WT-D                                          kTlCuYMA  48.3
mouse_EUNBK                                         vva626sq  46.7
mouse_HBTAE                                         hSuaL0T3  46.7
mouse_KFMKE                                         6k4F0ucU  46.7
mouse_SQUNI                                         25VbKfwE  46.7
mouse_UAOAE                                         1H5vbbjE  46.7
mouse_WANEU                                         oetUq9Ie  46.7

Let’s get the full metadata record of the top hit collection:

collection_iBsTRZPg = ln.ULabel.filter(uid="iBsTRZPg").one()

collection_iBsTRZPg
ULabel(uid='iBsTRZPg', name='Spatiotemporal immune zonation of the human kidney', description='10.1126/science.aat5031', reference='120e86b4-1195-48c5-845b-b98054105eec', reference_type='collection_id', updated_at=2023-11-28 21:50:41 UTC, created_by_id=1)

We see it’s a Science paper and we could find more information using the DOI or CELLxGENE collection id.

Each collection has at least one File associated with it. Let’s query them for this collection:

ln.File.filter(ulabels=collection_iBsTRZPg).df()
uid storage_id key suffix accessor description version size hash hash_type transform_id run_id initial_version_id visibility key_is_virtual updated_at created_by_id
id
1579 11HQaMeIUaOwyHoOjEVN 2 cell-census/2023-07-25/h5ads/d7dcfd8f-2ee7-438... .h5ad AnnData Spatiotemporal immune zonation of the human ki... None 341214674 R8-G4h5ztVfX29r58T4g_Q-41 md5-n 11 16 None 1 False 2023-11-28 22:52:00.821957+00:00 1
1513 6mnZ3SeQFhffr3wTiMEZ 2 cell-census/2023-07-25/h5ads/c52de62a-058d-4d7... .h5ad AnnData Spatiotemporal immune zonation of the human ki... None 109942751 Pqa4Ln0Xt7xmTN5IMiU4OA-14 md5-n 11 16 None 1 False 2023-11-28 22:51:15.788096+00:00 1
1382 P4Oai3OLGAzRwoicQ5HD 2 cell-census/2023-07-25/h5ads/9ea768a2-87ab-46b... .h5ad AnnData Spatiotemporal immune zonation of the human ki... None 192484358 odAyLe_6uoRCQV5eJRijqQ-23 md5-n 11 16 None 1 False 2023-11-28 22:49:46.348257+00:00 1
1030 USUgRVwrCMquHiImAnnJ 2 cell-census/2023-07-25/h5ads/2fc9c59f-3cfd-48d... .h5ad AnnData Spatiotemporal immune zonation of the human ki... None 39294782 rXmzfcuICx72PcUvYHsOiA-5 md5-n 11 16 None 1 False 2023-11-28 22:45:39.238212+00:00 1
1019 gHlQ5Muwu3G9pvFC4GDV 2 cell-census/2023-07-25/h5ads/2d31c0ca-0233-41c... .h5ad AnnData Spatiotemporal immune zonation of the human ki... None 64056560 YjLm7iPkIFIEYimgQEfJSA-8 md5-n 11 16 None 1 False 2023-11-28 22:45:52.133169+00:00 1
983 WwmBIhBNLTlRcSoBky88 2 cell-census/2023-07-25/h5ads/20d87640-4be8-487... .h5ad AnnData Spatiotemporal immune zonation of the human ki... None 44647761 dAApZI2IZr64F5b1jDMgtA-6 md5-n 11 16 None 1 False 2023-11-28 22:45:31.292961+00:00 1
906 b2x19Eg28GGSNnXWVa1m 2 cell-census/2023-07-25/h5ads/08073b32-d389-41f... .h5ad AnnData Spatiotemporal immune zonation of the human ki... None 159545411 e8gqdcJCy_gsp6sZ_8OI7Q-20 md5-n 11 16 None 1 False 2023-11-28 22:44:43.041536+00:00 1
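Before loading every file of a collection, it can be useful to estimate the total download. The sketch below sums the `size` column (in bytes) of the seven files listed above. The values are copied from that table; in a live session the same number could presumably be computed from the query result itself, which is an assumption and not shown in this guide.

```python
# File sizes in bytes, copied from the `size` column of the table above.
sizes = {
    "11HQaMeIUaOwyHoOjEVN": 341_214_674,
    "6mnZ3SeQFhffr3wTiMEZ": 109_942_751,
    "P4Oai3OLGAzRwoicQ5HD": 192_484_358,
    "USUgRVwrCMquHiImAnnJ": 39_294_782,
    "gHlQ5Muwu3G9pvFC4GDV": 64_056_560,
    "WwmBIhBNLTlRcSoBky88": 44_647_761,
    "b2x19Eg28GGSNnXWVa1m": 159_545_411,
}

total_bytes = sum(sizes.values())
print(f"{len(sizes)} files, {total_bytes / 1e9:.2f} GB in total")
```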