
CELLxGENE: scRNA-seq#

CZ CELLxGENE hosts the world's largest standardized collection of scRNA-seq datasets.

LaminDB makes it easy to query the CELLxGENE data and integrate it with in-house data of any kind (omics, phenotypes, pdfs, notebooks, ML models, …).

You can use the CELLxGENE data in three ways:

  1. In the current guide, you’ll see how to query metadata and data based on AnnData objects.

  2. If you want to use these in your own LaminDB instance, see the transfer guide.

  3. If you’d like to leverage the TileDB-SOMA API for the data subset of CELLxGENE Census, see the Census guide.

If you are interested in building similar data assets in-house:

  1. See the scRNA guide for how to create a growing versioned queryable scRNA-seq dataset.

  2. See the cellxgene-lamin-validator for validating, curating and registering your own AnnData objects.

  3. Reach out if you are interested in a full zero-copy clone of laminlabs/cellxgene to accelerate building your in-house LaminDB instances.

Setup#

On the CLI, load the public LaminDB instance that mirrors CELLxGENE:

!lamin load laminlabs/cellxgene
💡 last migration: lamindb==0.67.0 <> your env: lamindb==0.68.0
💡 loaded instance: laminlabs/cellxgene
import lamindb as ln
import bionty as bt
💡 lamindb instance: laminlabs/cellxgene

Query & understand metadata#

Auto-complete metadata#

You can create look-up objects for any registry in LaminDB, including basic biological entities and things like users or storage locations.

Let’s use auto-complete to look up cell types:

cell_types = bt.CellType.lookup()
cell_types.effector_t_cell
CellType(uid='3nfZTVV4', name='effector T cell', ontology_id='CL:0000911', synonyms='effector T-cell|effector T-lymphocyte|effector T lymphocyte', description='A Differentiated T Cell With Ability To Traffic To Peripheral Tissues And Is Capable Of Mounting A Specific Immune Response.', updated_at=2023-11-28 22:30:57 UTC, public_source_id=48, created_by_id=1)

You can also arbitrarily chain filters and create lookups from them:

organisms = bt.Organism.lookup()  # species
genes = bt.Gene.filter(organism=organisms.human).lookup()  # ~60k human genes
features = ln.Feature.lookup()  # non-gene features, like `cell_type`, `assay`, etc.
experimental_factors = bt.ExperimentalFactor.lookup()  # labels for experimental factors
tissues = bt.Tissue.lookup()  # tissue labels
ulabels = ln.ULabel.lookup()  # universal labels, e.g. dataset collections
suspension_types = (
    ulabels.is_suspension_type.children.all().lookup()
)  # suspension types
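
To make the naming of look-up attributes more concrete, here is a minimal sketch of how record names could be turned into auto-completable keys. The mapping is inferred from identifiers that appear in this guide (`effector_t_cell`, `ln_10x_3_v2`) and is not taken from LaminDB's source; `to_attribute_key` and `build_lookup` are hypothetical helpers.

```python
import re
from types import SimpleNamespace


def to_attribute_key(name: str, prefix: str = "ln_") -> str:
    """Turn a record name into a valid, lower-case Python attribute name.

    Assumption: runs of non-alphanumeric characters become underscores and
    names that start with a digit get a prefix, mirroring `ln_10x_3_v2`.
    """
    key = re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")
    if key and key[0].isdigit():
        key = prefix + key
    return key


def build_lookup(names):
    """Build a simple namespace whose attributes map back to the names."""
    return SimpleNamespace(**{to_attribute_key(n): n for n in names})


lookup = build_lookup(["effector T cell", "dendritic cell", "10x 3' v2"])
print(lookup.effector_t_cell)
print(lookup.ln_10x_3_v2)
```

In the real look-up objects the attributes resolve to full records rather than plain strings; the sketch only illustrates the naming idea.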

Search & filter metadata#

We can use search & filters for metadata:

bt.CellType.search("effector T cell")
uid synonyms score
name
effector T cell 3nfZTVV4 effector T-cell|effector T-lymphocyte|effector... 100.0
ectodermal cell 2rFEBLPn ectoderm cell 71.4
sensory receptor cell 6GkjRSiR receptor cell 71.4
memory T cell 1oa5G2Mq memory T-cell|memory T lymphocyte|memory T-lym... 71.4
helper T cell 43cBCa7s helper T-lymphocyte|T-helper cell|helper T lym... 71.4
excretory cell 5teqLp2U 69.0
secretory cell 4eEkKmdU 69.0
regulatory T cell 6IELBVIu regulatory T lymphocyte|Treg|regulatory T-lymp... 68.8
neurectodermal cell 1eJqfkLq neurectoderm cell 68.8
pro-T cell 4twkhtZN pro-T lymphocyte|progenitor T cell 68.8
Kupffer cell 5fdXwyLs hepatic macrophage|macrophagocytus stellatus|l... 66.7
chemoreceptor cell 6wMTbaYL 66.7
follicular B cell 2EhFTUoZ Fo B cell|follicular B lymphocyte|follicular B... 66.7
bt.CellType.search("CD8-positive cytokine effector T cell")
uid synonyms score
name
CD8-positive, alpha-beta cytokine secreting effector T cell 6JD5JCZC CD8-positive, alpha-beta cytokine secreting ef... 77.1
CD4-positive helper T cell 531hEapj CD4-positive T-helper cell|CD4-positive helper... 69.8
CD8-positive, alpha-beta T cell 6IC9NGJE CD8-positive, alpha-beta T-cell|CD8-positive, ... 67.6
CD8-positive, alpha-beta cytotoxic T cell Mv6woHvO CD8-positive, alpha-beta cytotoxic T-cell|CD8-... 66.7
CD8-positive, alpha-beta memory T cell 7MuNkhO9 CD8-positive, alpha-beta memory T lymphocyte|C... 66.7
CD1c-positive myeloid dendritic cell 5fo7nTlc 65.8
Tc1 cell 4AQr9CRo Tc1 T lymphocyte|Tc1 T-cell|Tc1 T cell|T-cytot... 65.5
CD141-positive myeloid dendritic cell CAwwMhIV 64.9
CD4-positive, alpha-beta T cell 4PSMdO3I CD4-positive, alpha-beta T lymphocyte|CD4-posi... 64.7
CD4-positive, alpha-beta cytotoxic T cell 5zRXDnpu CD4-positive, alpha-beta cytotoxic T-cell|CD4-... 64.1
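
The ranked scores above come from fuzzy string matching. As a rough illustration of the idea, the sketch below ranks a few of the names shown above using `difflib` from the Python standard library. This is a stand-in technique: it is not the scorer LaminDB uses, so the numbers will differ from those in the tables; `fuzzy_rank` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def fuzzy_rank(query, names, limit=3):
    """Rank names by similarity to the query on a 0-100 scale."""
    scored = [
        (
            name,
            round(100 * SequenceMatcher(None, query.lower(), name.lower()).ratio(), 1),
        )
        for name in names
    ]
    # Highest score first; ties keep their input order because sorted() is stable.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


names = ["memory T cell", "effector T cell", "helper T cell", "Kupffer cell"]
hits = fuzzy_rank("effector T cell", names)
for name, score in hits:
    print(f"{name}: {score}")
```

An exact match scores 100 and near matches follow, which is the same qualitative behavior as in the search results above.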

And use a uid to filter exactly one metadata record:

effector_t_cell = bt.CellType.filter(uid="3nfZTVV4").one()
effector_t_cell
CellType(uid='3nfZTVV4', name='effector T cell', ontology_id='CL:0000911', synonyms='effector T-cell|effector T-lymphocyte|effector T lymphocyte', description='A Differentiated T Cell With Ability To Traffic To Peripheral Tissues And Is Capable Of Mounting A Specific Immune Response.', updated_at=2023-11-28 22:30:57 UTC, public_source_id=48, created_by_id=1)

Understand ontologies#

View the related ontology terms:

effector_t_cell.view_parents(distance=2, with_children=True)
_images/6cdfc2f61da5a14e92b8512c8b1af5865ee670a550a55ae2659acf11ebca5fbc.svg

Or access them programmatically:

effector_t_cell.children.df()
uid name ontology_id abbr synonyms description public_source_id created_at updated_at created_by_id
id
931 2VQirdSp effector CD8-positive, alpha-beta T cell CL:0001050 None effector CD8-positive, alpha-beta T lymphocyte... A Cd8-Positive, Alpha-Beta T Cell With The Phe... 48 2023-11-28 22:27:55.565976+00:00 2023-11-28 22:27:55.565981+00:00 1
1088 490Xhb24 effector CD4-positive, alpha-beta T cell CL:0001044 None effector CD4-positive, alpha-beta T lymphocyte... A Cd4-Positive, Alpha-Beta T Cell With The Phe... 48 2023-11-28 22:27:55.569828+00:00 2023-11-28 22:27:55.569832+00:00 1
1229 69TEBGqb exhausted T cell CL:0011025 None Tex cell|An effector T cell that displays impa... None 48 2023-11-28 22:27:55.572880+00:00 2023-11-28 22:27:55.572884+00:00 1
1309 5s4gCMdn cytotoxic T cell CL:0000910 None cytotoxic T lymphocyte|cytotoxic T-lymphocyte|... A Mature T Cell That Differentiated And Acquir... 48 2023-11-28 22:27:55.575440+00:00 2023-11-28 22:27:55.575444+00:00 1
1331 43cBCa7s helper T cell CL:0000912 None helper T-lymphocyte|T-helper cell|helper T lym... A Effector T Cell That Provides Help In The Fo... 48 2023-11-28 22:27:55.575949+00:00 2023-11-28 22:27:55.575955+00:00 1

Query artifacts#

Unlike in the SOMA guide, here we query sets of h5ad files, which correspond to AnnData objects.

To access them, we query the Collection record that links the latest LTS set of h5ad files:

collection = ln.Collection.filter(name="cellxgene-census", version="2023-07-25").one()
collection
Collection(uid='dMyEX3NTfKOEYXyMKDAQ', name='cellxgene-census', version='2023-07-25', hash='pEJ9uvIeTLvHkZW2TBT5', visibility=1, updated_at=2024-01-30 09:06:05 UTC, transform_id=18, run_id=23, created_by_id=1)

You can get all linked files as a dataframe - there are 850 files in cellxgene-census version 2023-07-25.

collection.artifacts.df().head()  # not tracking run & transform because read-only instance
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
1717 6GM0siRkIUISUwpFb54A 2 cell-census/2023-07-25/h5ads/ff12e239-9292-4d2... .h5ad AnnData Pla_HDBR9518710 2023-07-25 21993955 PFgOHr86dGrEOrPELpKvrQ-3 md5-n None 1216.0 11 16 1 False 2023-11-28 21:46:39.659067+00:00 2024-01-24 07:05:41.771446+00:00 1
1613 qnlBU5a6OR46Zkku1gEl 2 cell-census/2023-07-25/h5ads/dfe8e072-94a7-415... .h5ad AnnData WS_PLA_S9101764 2023-07-25 39725919 -2OE3CsXIxegNGOQO7cJVw-5 md5-n None 3568.0 11 16 1 False 2023-11-28 21:46:33.371255+00:00 2024-01-24 07:05:43.770581+00:00 1
1497 w6UNsyZp0eTrl3Uz15Wl 2 cell-census/2023-07-25/h5ads/c1568274-2af2-4cd... .h5ad AnnData WS_PLA_S9101769 2023-07-25 34328778 oaxWjROHsHneVVf7VgcoaA-5 md5-n None 3130.0 11 16 1 False 2023-11-28 21:46:26.369149+00:00 2024-01-24 07:05:45.247888+00:00 1
1434 txst3MWWFtvPGcYOR145 2 cell-census/2023-07-25/h5ads/ab326369-b63c-48d... .h5ad AnnData primary_trophoblast_organoid 2023-07-25 530799215 2i7cuX3r562-mfIK8GYAyA-64 md5-n None 26853.0 11 16 1 False 2023-11-28 21:46:22.575516+00:00 2024-01-24 07:05:45.699596+00:00 1
979 Vd2uGOTQOwleGdFRlEOp 2 cell-census/2023-07-25/h5ads/1fe63353-9e75-482... .h5ad AnnData Heart - A single-cell transcriptomic atlas cha... 2023-07-25 148387585 4keMLCOrr6OP70fPvE0TIQ-18 md5-n None 8613.0 11 16 1 False 2023-11-28 21:45:55.034851+00:00 2024-01-24 07:15:09.559452+00:00 1

You can query across files by arbitrary metadata combinations, for instance:

query = collection.artifacts.filter(
    organism=organisms.human,
    cell_types__in=[cell_types.dendritic_cell, cell_types.neutrophil],
    tissues=tissues.kidney,
    ulabels=suspension_types.cell,
    experimental_factors=experimental_factors.ln_10x_3_v2,
)
query = query.order_by("size")  # order by size
query.df().head()  # convert to DataFrame
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
983 WwmBIhBNLTlRcSoBky88 2 cell-census/2023-07-25/h5ads/20d87640-4be8-487... .h5ad AnnData Mature kidney dataset: immune 2023-07-25 44647761 dAApZI2IZr64F5b1jDMgtA-6 md5-n None 7803 11 16 1 False 2023-11-28 21:45:55.276010+00:00 2024-01-24 07:12:02.486527+00:00 1
983 WwmBIhBNLTlRcSoBky88 2 cell-census/2023-07-25/h5ads/20d87640-4be8-487... .h5ad AnnData Mature kidney dataset: immune 2023-07-25 44647761 dAApZI2IZr64F5b1jDMgtA-6 md5-n None 7803 11 16 1 False 2023-11-28 21:45:55.276010+00:00 2024-01-24 07:12:02.486527+00:00 1
1019 gHlQ5Muwu3G9pvFC4GDV 2 cell-census/2023-07-25/h5ads/2d31c0ca-0233-41c... .h5ad AnnData Fetal kidney dataset: immune 2023-07-25 64056560 YjLm7iPkIFIEYimgQEfJSA-8 md5-n None 6847 11 16 1 False 2023-11-28 21:45:57.452209+00:00 2024-01-24 07:12:01.949491+00:00 1
1382 P4Oai3OLGAzRwoicQ5HD 2 cell-census/2023-07-25/h5ads/9ea768a2-87ab-46b... .h5ad AnnData Mature kidney dataset: full 2023-07-25 192484358 odAyLe_6uoRCQV5eJRijqQ-23 md5-n None 40268 11 16 1 False 2023-11-28 21:46:19.442984+00:00 2024-01-24 07:12:01.026486+00:00 1
1382 P4Oai3OLGAzRwoicQ5HD 2 cell-census/2023-07-25/h5ads/9ea768a2-87ab-46b... .h5ad AnnData Mature kidney dataset: full 2023-07-25 192484358 odAyLe_6uoRCQV5eJRijqQ-23 md5-n None 40268 11 16 1 False 2023-11-28 21:46:19.442984+00:00 2024-01-24 07:12:01.026486+00:00 1
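
Some artifacts appear twice in the output above. A plausible explanation, stated here as an assumption rather than confirmed LaminDB behavior, is that filtering on a many-to-many label with `__in` joins through a link table and returns one row per matching label. The sketch below reproduces that effect with `sqlite3` on toy tables; all table names and rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE artifact (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE cell_type (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE artifact_cell_type (artifact_id INTEGER, cell_type_id INTEGER);
    INSERT INTO artifact VALUES (1, 'toy dataset A'), (2, 'toy dataset B');
    INSERT INTO cell_type VALUES
        (1, 'dendritic cell'), (2, 'neutrophil'), (3, 'B cell');
    -- dataset A is linked to both queried cell types, dataset B to only one
    INSERT INTO artifact_cell_type VALUES (1, 1), (1, 2), (2, 1), (2, 3);
    """
)

join_sql = """
SELECT {select} a.id, a.description
FROM artifact AS a
JOIN artifact_cell_type AS link ON link.artifact_id = a.id
JOIN cell_type AS ct ON ct.id = link.cell_type_id
WHERE ct.name IN ('dendritic cell', 'neutrophil')
ORDER BY a.id
"""

# One row per matching (artifact, cell type) pair: dataset A shows up twice.
rows = con.execute(join_sql.format(select="")).fetchall()
# DISTINCT collapses the repeats to one row per artifact.
unique_rows = con.execute(join_sql.format(select="DISTINCT")).fetchall()

print(rows)
print(unique_rows)
```

If the query set follows Django semantics, calling `.distinct()` on it may remove such repeats; this has not been verified against the instance used in this guide.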

Query arrays#

Each file stores an array in the form of an annotated data matrix, an AnnData object.

Let’s look at the first array in the file query and show metadata using .describe():

artifact = query.first()
artifact.describe()
Artifact(uid='WwmBIhBNLTlRcSoBky88', key='cell-census/2023-07-25/h5ads/20d87640-4be8-487f-93d4-dce38378d00f.h5ad', suffix='.h5ad', accessor='AnnData', description='Mature kidney dataset: immune', version='2023-07-25', size=44647761, hash='dAApZI2IZr64F5b1jDMgtA-6', hash_type='md5-n', n_observations=7803, visibility=1, key_is_virtual=False, updated_at=2024-01-24 07:12:02 UTC)

Provenance:
  🗃️ storage: Storage(uid='oIYGbD74', root='s3://cellxgene-data-public', type='s3', region='us-west-2', updated_at=2023-10-16 15:04:08 UTC, created_by_id=1)
  📔 transform: Transform(uid='pNa7RdI26sp4z8', name='Register files from Census release 2023-07-25', short_name='census-release-2023-07-25', version='0', type='notebook', updated_at=2023-11-29 13:53:43 UTC, latest_report_id=1724, source_code_id=1723, created_by_id=1)
  👣 run: Run(uid='ZYgsnqK5v2hPmFlS0kfG', run_at=2023-11-29 13:52:08 UTC, is_consecutive=False, transform_id=11, created_by_id=1, report_id=1724)
  👤 created_by: User(uid='kmvZDIX9', handle='sunnyosun', name='Sunny Sun', updated_at=2023-12-13 16:23:44 UTC)
  ⬇️ input_of (core.Run): ['2023-11-29 12:51:05 UTC', '2024-01-30 09:03:47 UTC']
Features:
  var: FeatureSet(uid='8AAiWbuUrP2DI1MpuPD0', n=32922, type='number', registry='bionty.Gene', hash='fHMWMViqV_PilN1PWrgF', updated_at=2023-11-29 13:28:55 UTC, created_by_id=1)
    'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', ...
  obs: FeatureSet(uid='zAQ6WnmIMDLslhfgdIOt', name='obs metadata', n=11, type='category', registry='core.Feature', hash='CFxuf-VqTFbrkbqPHiY-', updated_at=2024-01-24 06:43:51 UTC, created_by_id=1)
    🔗 tissue (5, bionty.Tissue): 'renal medulla', 'kidney blood vessel', 'renal pelvis', 'cortex of kidney', 'kidney'
    🔗 tissue_type (1, core.ULabel): 'tissue'
    🔗 assay (1, bionty.ExperimentalFactor): '10x 3' v2'
    🔗 cell_type (12, bionty.CellType): 'classical monocyte', 'plasmacytoid dendritic cell', 'natural killer cell', 'dendritic cell', 'CD4-positive, alpha-beta T cell', 'mast cell', 'neutrophil', 'non-classical monocyte', 'CD8-positive, alpha-beta T cell', 'B cell', ...
    🔗 development_stage (12, bionty.DevelopmentalStage): '2-year-old human stage', '4-year-old human stage', '12-year-old human stage', '44-year-old human stage', '49-year-old human stage', '53-year-old human stage', '63-year-old human stage', '64-year-old human stage', '67-year-old human stage', '70-year-old human stage', ...
    🔗 disease (1, bionty.Disease): 'normal'
    🔗 donor_id (13, core.ULabel): 'TxK2', 'Wilms1', 'TxK4', 'TTx', 'RCC3', 'RCC1', 'VHL', 'TxK3', 'TxK1', 'Wilms3', ...
    🔗 self_reported_ethnicity (1, bionty.Ethnicity): 'unknown'
    🔗 sex (2, bionty.Phenotype): 'male', 'female'
    🔗 suspension_type (1, core.ULabel): 'cell'
    🔗 organism (1, bionty.Organism): 'human'
Labels:
  🏷️ organism (1, bionty.Organism): 'human'
  🏷️ tissues (5, bionty.Tissue): 'renal medulla', 'kidney blood vessel', 'renal pelvis', 'cortex of kidney', 'kidney'
  🏷️ cell_types (12, bionty.CellType): 'classical monocyte', 'plasmacytoid dendritic cell', 'natural killer cell', 'dendritic cell', 'CD4-positive, alpha-beta T cell', 'mast cell', 'neutrophil', 'non-classical monocyte', 'CD8-positive, alpha-beta T cell', 'B cell', ...
  🏷️ diseases (1, bionty.Disease): 'normal'
  🏷️ phenotypes (2, bionty.Phenotype): 'male', 'female'
  🏷️ experimental_factors (1, bionty.ExperimentalFactor): '10x 3' v2'
  🏷️ developmental_stages (12, bionty.DevelopmentalStage): '2-year-old human stage', '4-year-old human stage', '12-year-old human stage', '44-year-old human stage', '49-year-old human stage', '53-year-old human stage', '63-year-old human stage', '64-year-old human stage', '67-year-old human stage', '70-year-old human stage', ...
  🏷️ ethnicities (1, bionty.Ethnicity): 'unknown'
  🏷️ ulabels (15, core.ULabel): 'TxK2', 'Wilms1', 'TxK4', 'TTx', 'RCC3', 'RCC1', 'VHL', 'TxK3', 'TxK1', 'Wilms3', ...
More ways of accessing metadata

Access just features:

artifact.features

Or get labels given a feature:

artifact.labels.get(features.tissue).df()
artifact.labels.get(features.collection).one()

If you want to query a slice of the array data, you have two options:

  1. Cache & load the entire array into memory via artifact.load() -> AnnData (caches the h5ad on disk, so that you only download once)

  2. Stream the array from the cloud using a cloud-backed accessor artifact.backed() -> AnnDataAccessor

Both options will run much faster if you run them close to the data (AWS S3 on the US West Coast, consider logging into hosted compute there).
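
Whether caching or streaming is preferable depends largely on file size. The `size` values in the tables above appear to be in bytes (an assumption based on the magnitudes shown), so a small helper can make them easier to read. `human_size` is a hypothetical helper, not part of LaminDB.

```python
def human_size(n_bytes: int) -> str:
    """Format a byte count using binary units (1 KiB = 1024 B)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(n_bytes)
    for unit in units:
        # Stop at the first unit where the value drops below 1024,
        # or at the largest unit available.
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


# Sizes copied from the `size` column of the query output above.
sizes = {
    "Mature kidney dataset: immune": 44647761,
    "Fetal kidney dataset: immune": 64056560,
    "Mature kidney dataset: full": 192484358,
}
for description, n_bytes in sizes.items():
    print(f"{description}: {human_size(n_bytes)}")
```

Files in the tens of MiB are cheap to cache in full, while for files of several hundred MiB streaming a slice may be the lighter option.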

1. Cache & load#

Let us first consider option 1:

adata = artifact.load()
adata
AnnData object with n_obs × n_vars = 7803 × 32922
    obs: 'donor_id', 'donor_age', 'self_reported_ethnicity_ontology_term_id', 'organism_ontology_term_id', 'sample_uuid', 'tissue_ontology_term_id', 'development_stage_ontology_term_id', 'suspension_uuid', 'suspension_type', 'library_uuid', 'assay_ontology_term_id', 'mapped_reference_annotation', 'is_primary_data', 'cell_type_ontology_term_id', 'author_cell_type', 'disease_ontology_term_id', 'reported_diseases', 'sex_ontology_term_id', 'compartment', 'Experiment', 'Project', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage'
    var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype'
    uns: 'default_embedding', 'schema_version', 'title'
    obsm: 'X_umap'

Now we have an AnnData object, which stores observation annotations matching our file-level query in the .obs slot, and we can re-use almost the same query on the array-level:

See the file-level query for comparison
query = collection.artifacts.filter(
    organism=organisms.human,
    cell_types__in=[cell_types.dendritic_cell, cell_types.neutrophil],
    tissues=tissues.kidney,
    ulabels=suspension_types.cell,
    experimental_factors=experimental_factors.ln_10x_3_v2,
)

AnnData uses pandas to manage metadata and the syntax differs slightly. However, the same metadata records are used.

adata_slice = adata[
    adata.obs.cell_type.isin(
        [cell_types.dendritic_cell.name, cell_types.neutrophil.name]
    )
    & (adata.obs.tissue == tissues.kidney.name)
    & (adata.obs.suspension_type == suspension_types.cell.name)
    & (adata.obs.assay == experimental_factors.ln_10x_3_v2.name)
]
adata_slice
View of AnnData object with n_obs × n_vars = 199 × 32922
    obs: 'donor_id', 'donor_age', 'self_reported_ethnicity_ontology_term_id', 'organism_ontology_term_id', 'sample_uuid', 'tissue_ontology_term_id', 'development_stage_ontology_term_id', 'suspension_uuid', 'suspension_type', 'library_uuid', 'assay_ontology_term_id', 'mapped_reference_annotation', 'is_primary_data', 'cell_type_ontology_term_id', 'author_cell_type', 'disease_ontology_term_id', 'reported_diseases', 'sex_ontology_term_id', 'compartment', 'Experiment', 'Project', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage'
    var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype'
    uns: 'default_embedding', 'schema_version', 'title'
    obsm: 'X_umap'

2. Stream#

Let us now consider option 2:

adata_backed = artifact.backed()
adata_backed
AnnDataAccessor object with n_obs × n_vars = 7803 × 32922
  constructed for the AnnData object 20d87640-4be8-487f-93d4-dce38378d00f.h5ad
    obs: ['Experiment', 'Project', '_index', 'assay', 'assay_ontology_term_id', 'author_cell_type', 'cell_type', 'cell_type_ontology_term_id', 'compartment', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_age', 'donor_id', 'is_primary_data', 'library_uuid', 'mapped_reference_annotation', 'organism', 'organism_ontology_term_id', 'reported_diseases', 'sample_uuid', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'suspension_uuid', 'tissue', 'tissue_ontology_term_id']
    obsm: ['X_umap']
    raw: ['X', 'var', 'varm']
    uns: ['default_embedding', 'schema_version', 'title']
    var: ['_index', 'feature_biotype', 'feature_is_filtered', 'feature_name', 'feature_reference']

We now have an AnnDataAccessor object, which behaves much like an AnnData, and the query looks the same:

adata_backed_slice = adata_backed[
    adata_backed.obs.cell_type.isin(
        [cell_types.dendritic_cell.name, cell_types.neutrophil.name]
    )
    & (adata_backed.obs.tissue == tissues.kidney.name)
    & (adata_backed.obs.suspension_type == suspension_types.cell.name)
    & (adata_backed.obs.assay == experimental_factors.ln_10x_3_v2.name)
]

adata_backed_slice.to_memory()
AnnData object with n_obs × n_vars = 199 × 32922
    obs: 'donor_id', 'donor_age', 'self_reported_ethnicity_ontology_term_id', 'organism_ontology_term_id', 'sample_uuid', 'tissue_ontology_term_id', 'development_stage_ontology_term_id', 'suspension_uuid', 'suspension_type', 'library_uuid', 'assay_ontology_term_id', 'mapped_reference_annotation', 'is_primary_data', 'cell_type_ontology_term_id', 'author_cell_type', 'disease_ontology_term_id', 'reported_diseases', 'sex_ontology_term_id', 'compartment', 'Experiment', 'Project', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage'
    var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype'
    uns: 'default_embedding', 'schema_version', 'title'
    obsm: 'X_umap'

3. Concatenate slices#

If we want to concatenate these individual file-level slices, loop over all files in query and concatenate the results.

What would this look like?
adata_slices = []
for file in query:
    adata_backed = file.backed()  # stream the current file, not the earlier `artifact`
    adata_slice = adata_backed[
        adata_backed.obs.cell_type.isin(
            [cell_types.dendritic_cell.name, cell_types.neutrophil.name]
        )
        & (adata_backed.obs.tissue == tissues.kidney.name)
        & (adata_backed.obs.suspension_type == suspension_types.cell.name)
        & (adata_backed.obs.assay == experimental_factors.ln_10x_3_v2.name)
    ]
    adata_slices.append(adata_slice.to_memory())

import anndata as ad

adata_query = ad.concat(adata_slices)

Train an ML model#

See Train a machine learning model on a collection.

Exploring data by collection#

Alternatively, you can explore the data by collection. Let’s search the collections from CELLxGENE:

ln.Collection.search("immune human kidney", limit=10)
uid score
name
Spatiotemporal immune zonation of the human kidney kqiPjpzpK9H9rdtnHWas 55.1
Spatiotemporal immune zonation of the human kidney kqiPjpzpK9H9rdtnV67f 55.1
The integrated Human Lung Cell Atlas FaJmPleTV3HjPBTdyqup 43.6
The integrated Human Lung Cell Atlas FaJmPleTV3HjPBTdFyOZ 43.6
Asian Immune Diversity Atlas (AIDA) ZzUntpjOt8v7Awqdkpja 41.5
Single cell derived mRNA signals across human kidney tumors Yed6da6CsPXaGmLQIFCu 41.0
Single cell derived mRNA signals across human kidney tumors Yed6da6CsPXaGmLQDTBi 41.0
Live Human Microglia Single-cell RNA-seq olY10cghAPIz2oGrQvRC 40.7
Human Brain Cell Atlas v1.0 kDJ9Xb8d11d93LAHLr2V 39.1
Human Brain Cell Atlas v1.0 kDJ9Xb8d11d93LAHZLTC 39.1

Let’s get the record of the top hit collection:

collection = ln.Collection.filter(uid="kqiPjpzpK9H9rdtnHWas").one()

collection
Collection(uid='kqiPjpzpK9H9rdtnHWas', name='Spatiotemporal immune zonation of the human kidney', description='10.1126/science.aat5031', version='2023-07-25', hash='w_VZE7n841ktaA9FjdLh', reference='120e86b4-1195-48c5-845b-b98054105eec', reference_type='CELLxGENE Collection ID', visibility=1, updated_at=2024-01-08 12:01:20 UTC, created_by_id=1)

We see it’s a Science paper and we could find more information using the DOI or CELLxGENE collection id.

Each collection has at least one Artifact file associated with it. Let’s get the associated artifacts:

collection.artifacts.df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
1579 11HQaMeIUaOwyHoOjEVN 2 cell-census/2023-07-25/h5ads/d7dcfd8f-2ee7-438... .h5ad AnnData Fetal kidney dataset: full 2023-07-25 341214674 R8-G4h5ztVfX29r58T4g_Q-41 md5-n None 27197 11 16 1 False 2023-11-28 21:46:31.314438+00:00 2024-01-24 07:11:59.856863+00:00 1
1513 6mnZ3SeQFhffr3wTiMEZ 2 cell-census/2023-07-25/h5ads/c52de62a-058d-4d7... .h5ad AnnData Fetal kidney dataset: stroma 2023-07-25 109942751 Pqa4Ln0Xt7xmTN5IMiU4OA-14 md5-n None 8345 11 16 1 False 2023-11-28 21:46:27.340888+00:00 2024-01-24 07:12:00.372957+00:00 1
1382 P4Oai3OLGAzRwoicQ5HD 2 cell-census/2023-07-25/h5ads/9ea768a2-87ab-46b... .h5ad AnnData Mature kidney dataset: full 2023-07-25 192484358 odAyLe_6uoRCQV5eJRijqQ-23 md5-n None 40268 11 16 1 False 2023-11-28 21:46:19.442984+00:00 2024-01-24 07:12:01.026486+00:00 1
1030 USUgRVwrCMquHiImAnnJ 2 cell-census/2023-07-25/h5ads/2fc9c59f-3cfd-48d... .h5ad AnnData Mature kidney dataset: non PT parenchyma 2023-07-25 39294782 rXmzfcuICx72PcUvYHsOiA-5 md5-n None 4620 11 16 1 False 2023-11-28 21:45:58.120307+00:00 2024-01-24 07:12:01.481325+00:00 1
1019 gHlQ5Muwu3G9pvFC4GDV 2 cell-census/2023-07-25/h5ads/2d31c0ca-0233-41c... .h5ad AnnData Fetal kidney dataset: immune 2023-07-25 64056560 YjLm7iPkIFIEYimgQEfJSA-8 md5-n None 6847 11 16 1 False 2023-11-28 21:45:57.452209+00:00 2024-01-24 07:12:01.949491+00:00 1
983 WwmBIhBNLTlRcSoBky88 2 cell-census/2023-07-25/h5ads/20d87640-4be8-487... .h5ad AnnData Mature kidney dataset: immune 2023-07-25 44647761 dAApZI2IZr64F5b1jDMgtA-6 md5-n None 7803 11 16 1 False 2023-11-28 21:45:55.276010+00:00 2024-01-24 07:12:02.486527+00:00 1
906 b2x19Eg28GGSNnXWVa1m 2 cell-census/2023-07-25/h5ads/08073b32-d389-41f... .h5ad AnnData Fetal kidney dataset: nephron 2023-07-25 159545411 e8gqdcJCy_gsp6sZ_8OI7Q-20 md5-n None 10790 11 16 1 False 2023-11-28 21:45:50.629303+00:00 2024-01-24 07:12:02.954217+00:00 1