Linking flow cytometry data against CellMarker

Let us now consider a flow cytometry example dataset:

import lamindb as ln
import lnschema_bionty as bt

ln.track()
ℹ️ Instance: testuser1/mydata
ℹ️ User: testuser2
ℹ️ Added notebook: Transform(id='OWuTtS4SApon', v='0', name='12-flow', type=notebook, title='Linking flow cytometry data against `CellMarker`', created_by='bKeW4T6E', created_at=datetime.datetime(2023, 3, 30, 23, 17, 53))
ℹ️ Added run: Run(id='kXZegnhfZcQNtvYGY6C3', transform_id='OWuTtS4SApon', transform_v='0', created_by='bKeW4T6E', created_at=datetime.datetime(2023, 3, 30, 23, 17, 53))
filepath = ln.dev.datasets.file_fcs()
filepath
PosixPath('example.fcs')

Configure a CellMarker reference for parsing features

Because the file is a standard .fcs file, ln.File can parse it under the hood using readfcs.

Alternatively, we can load it into memory as an AnnData object: adata = readfcs.read_fcs(filepath).
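To show what parsing an .fcs file involves, here is a minimal, stdlib-only sketch that reads the HEADER and TEXT segments defined by the FCS 3.x standard. The TEXT segment is where per-channel short names ($PnN) and optional stain or marker names ($PnS) are stored. This is not readfcs's implementation, and the helper names are hypothetical. The sketch also assumes that segment offsets fit in the HEADER and that no keyword or value contains an escaped (doubled) delimiter.

```python
# Minimal sketch of reading an FCS 3.x HEADER and TEXT segment (stdlib only).
# Assumptions: offsets fit in the 8-byte HEADER fields (large files store them as 0
# and move them to TEXT keywords instead), and no keyword or value contains an
# escaped (doubled) delimiter. readfcs and other real parsers handle more cases.
from pathlib import Path


def parse_fcs_header(raw: bytes) -> dict:
    """Return the version string and segment byte offsets from the HEADER."""
    version = raw[0:6].decode("ascii")  # e.g. "FCS3.0" or "FCS3.1"
    # Bytes 10-57 hold six right-justified ASCII integers (8 bytes each).
    offsets = [int(raw[i:i + 8].decode("ascii").strip() or 0) for i in range(10, 58, 8)]
    names = ["text_start", "text_end", "data_start", "data_end",
             "analysis_start", "analysis_end"]
    return {"version": version, **dict(zip(names, offsets))}


def parse_fcs_text(raw: bytes, header: dict) -> dict:
    """Split the TEXT segment into an upper-cased keyword -> value dict."""
    # FCS end offsets are inclusive, hence the + 1.
    text = raw[header["text_start"]:header["text_end"] + 1].decode("utf-8", errors="replace")
    delim = text[0]  # the first character of TEXT is the delimiter
    tokens = text[1:].split(delim)
    if tokens and tokens[-1] == "":
        tokens = tokens[:-1]  # drop the empty token after a trailing delimiter
    return {tokens[i].upper(): tokens[i + 1] for i in range(0, len(tokens) - 1, 2)}


def channel_markers(keywords: dict) -> list:
    """Per-parameter names: the stain name $PnS when present, else $PnN."""
    n = int(keywords["$PAR"])
    return [keywords.get(f"$P{i}S", keywords[f"$P{i}N"]) for i in range(1, n + 1)]


def read_markers(path: Path) -> list:
    """Read an .fcs file and return its channel/marker names."""
    raw = path.read_bytes()
    return channel_markers(parse_fcs_text(raw, parse_fcs_header(raw)))
```

Consider a toy TEXT segment such as /$PAR/2/$P1N/FSC-A/$P2N/BL1-A/$P2S/CD14/. For it, channel_markers returns ["FSC-A", "CD14"]: the stain name is used whenever it is present. Names like these are the terms a marker reference can then try to match.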

We’ll link features against a reference based on the CellMarker ontology:

reference = bt.CellMarker(species="human")
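Conceptually, such a reference behaves like a lookup table from marker names to standardized records. The toy sketch below illustrates only that idea. MarkerRecord, normalize, build_index, and lookup are hypothetical names and are not part of lnschema_bionty. The two records reuse field values from the CellMarker records printed in this notebook.

```python
# Toy, in-memory stand-in for a marker reference (hypothetical; not the bionty API).
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class MarkerRecord:
    id: str
    name: str
    gene_symbol: str
    ncbi_gene_id: str


def normalize(term: str) -> str:
    """Case- and surrounding-whitespace-insensitive lookup key."""
    return term.strip().upper()


def build_index(records: Iterable[MarkerRecord]) -> Dict[str, MarkerRecord]:
    """Index each record under both its marker name and its gene symbol."""
    index: Dict[str, MarkerRecord] = {}
    for rec in records:
        for key in (rec.name, rec.gene_symbol):
            index.setdefault(normalize(key), rec)
    return index


def lookup(index: Dict[str, MarkerRecord], term: str) -> Optional[MarkerRecord]:
    """Return the matching record, or None if the term is not in the reference."""
    return index.get(normalize(term))


# Two records with values taken from the CellMarker output in this notebook.
REFERENCE = [
    MarkerRecord("CM_CD19", "CD19", "CD19", "930"),
    MarkerRecord("CM_CCR7", "CCR7", "CCR7", "1236"),
]
index = build_index(REFERENCE)
```

Indexing each record under both its name and its gene symbol lets either spelling resolve to the same record. A full reference likely also handles synonyms, which this sketch omits.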

Parse features

features = ln.Features(filepath, reference)
✅ 14 terms (87.5%) are mapped.
🔶 2 terms (12.5%) are not mapped.
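The log reports matches as a share of all parsed terms: 14 of 16 terms (87.5%) map to the reference and 2 of 16 (12.5%) do not. Below is a minimal sketch of that bookkeeping. It uses a hypothetical map_terms helper and assumes case-insensitive exact matching as the matching rule, which may differ from what lamindb actually does.

```python
# Minimal sketch of summarizing how many parsed terms match a reference vocabulary.
# map_terms and summary are hypothetical helpers, not lamindb functions.
from typing import Iterable, List, Tuple


def map_terms(terms: Iterable[str], reference: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split terms into (mapped, unmapped), comparing case-insensitively."""
    known = {r.upper() for r in reference}
    mapped: List[str] = []
    unmapped: List[str] = []
    for term in terms:
        (mapped if term.upper() in known else unmapped).append(term)
    return mapped, unmapped


def summary(mapped: List[str], unmapped: List[str]) -> List[str]:
    """Format counts and percentages of mapped and unmapped terms."""
    total = len(mapped) + len(unmapped)
    if total == 0:
        return ["no terms to map."]
    return [
        f"{len(mapped)} terms ({100 * len(mapped) / total:.1f}%) are mapped.",
        f"{len(unmapped)} terms ({100 * len(unmapped) / total:.1f}%) are not mapped.",
    ]
```

With 16 input terms of which 14 are in the reference, summary yields "14 terms (87.5%) are mapped." and "2 terms (12.5%) are not mapped.".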
features
Features(id='IV2WkLDyi2WNBMvLrtiG', type='cell_marker', created_by='bKeW4T6E')
features.cell_markers[:3]
[CellMarker(id='CM_CD19', name='CD19', ncbi_gene_id='930', gene_symbol='CD19', gene_name='CD19 molecule', uniprotkb_id='P15391', species_id='NCBI_9606'),
 CellMarker(id='CM_CD14', name='CD14', ncbi_gene_id='4695', gene_symbol='CD14', gene_name='NADH:ubiquinone oxidoreductase subunit A2', uniprotkb_id='O43678', species_id='NCBI_9606'),
 CellMarker(id='CM_CCR7', name='CCR7', ncbi_gene_id='1236', gene_symbol='CCR7', gene_name='C-C motif chemokine receptor 7', uniprotkb_id='P32248', species_id='NCBI_9606')]

Track data with features (cell markers)

See also: Basic queries

file = ln.File(filepath, features=features)
ln.add(file)
File(id='tsjjl9Tl837wKMUmxkXC', name='example', suffix='.fcs', size=6785467, hash='KCEXRahJ-Ui9Y6nksQ8z1A', source_id='kXZegnhfZcQNtvYGY6C3', storage_id='8Pj12JLb', created_at=datetime.datetime(2023, 3, 30, 23, 17, 57))
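The hash field is 22 characters long. That is exactly the length of a 16-byte MD5 digest encoded with URL-safe base64 and stripped of its = padding. The sketch below assumes that scheme purely for illustration. It is not taken from lamindb and may not reproduce the hash values lamindb stores.

```python
# Sketch of a 22-character content hash: MD5 digest -> URL-safe base64 -> "=" stripped.
# Assumption: illustrative only; lamindb's own hashing scheme may differ.
import base64
import hashlib
from pathlib import Path


def _encode(digest: bytes) -> str:
    """URL-safe base64 without padding (16 bytes -> 22 characters)."""
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def hash_bytes(data: bytes) -> str:
    """Hash an in-memory byte string."""
    return _encode(hashlib.md5(data).digest())


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large .fcs files need not fit in memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return _encode(md5.digest())
```

Streaming in fixed-size chunks keeps memory use flat regardless of file size, which is convenient for files like this 6,785,467-byte example.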

Querying data by features

We can now query datasets by cell markers:

files = (
    ln.select(ln.File)
    .join(ln.File.features)
    .join(ln.Features.cell_markers)
    .where(bt.CellMarker.gene_symbol == "CD14")
)
files.df().head()
                         name  suffix     size                    hash             source_id  storage_id           created_at  updated_at
id
tsjjl9Tl837wKMUmxkXC  example    .fcs  6785467  KCEXRahJ-Ui9Y6nksQ8z1A  kXZegnhfZcQNtvYGY6C3    8Pj12JLb  2023-03-30 23:17:57        None
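Relationally, a query like this amounts to joins across link tables. The following sqlite3 sketch mirrors the File → Features → CellMarker joins and the gene_symbol filter on a toy schema. The table and column names are hypothetical and do not reflect lamindb's actual database layout. The inserted rows reuse identifiers printed in this notebook.

```python
# Toy relational sketch of the query above (hypothetical schema, not lamindb's tables):
# file -> features -> cell_marker, connected through link tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE file (id TEXT PRIMARY KEY, name TEXT, suffix TEXT);
    CREATE TABLE features (id TEXT PRIMARY KEY, type TEXT);
    CREATE TABLE cell_marker (id TEXT PRIMARY KEY, gene_symbol TEXT);
    CREATE TABLE file_features (file_id TEXT, features_id TEXT);
    CREATE TABLE features_cell_marker (features_id TEXT, cell_marker_id TEXT);
    """
)
# Toy rows reusing identifiers printed in this notebook.
conn.execute("INSERT INTO file VALUES ('tsjjl9Tl837wKMUmxkXC', 'example', '.fcs')")
conn.execute("INSERT INTO features VALUES ('IV2WkLDyi2WNBMvLrtiG', 'cell_marker')")
conn.executemany(
    "INSERT INTO cell_marker VALUES (?, ?)",
    [("CM_CD19", "CD19"), ("CM_CD14", "CD14"), ("CM_CCR7", "CCR7")],
)
conn.execute("INSERT INTO file_features VALUES ('tsjjl9Tl837wKMUmxkXC', 'IV2WkLDyi2WNBMvLrtiG')")
conn.executemany(
    "INSERT INTO features_cell_marker VALUES ('IV2WkLDyi2WNBMvLrtiG', ?)",
    [("CM_CD19",), ("CM_CD14",), ("CM_CCR7",)],
)


def files_with_marker(conn: sqlite3.Connection, gene_symbol: str) -> list:
    """Return (id, name, suffix) of files whose feature set contains the marker."""
    query = """
        SELECT DISTINCT file.id, file.name, file.suffix
        FROM file
        JOIN file_features ON file_features.file_id = file.id
        JOIN features ON features.id = file_features.features_id
        JOIN features_cell_marker ON features_cell_marker.features_id = features.id
        JOIN cell_marker ON cell_marker.id = features_cell_marker.cell_marker_id
        WHERE cell_marker.gene_symbol = ?
    """
    return conn.execute(query, (gene_symbol,)).fetchall()
```

SELECT DISTINCT keeps a file from appearing once per matching row when its feature set links to the same marker through several paths.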