Linking flow cytometry data against CellMarker#
Let us now consider a flow cytometry example dataset:
import lamindb as ln
import lnschema_bionty as bt
ln.track()
ℹ️ Instance: testuser1/mydata
ℹ️ User: testuser2
ℹ️ Added notebook: Transform(id='OWuTtS4SApon', v='0', name='12-flow', type=notebook, title='Linking flow cytometry data against `CellMarker`', created_by='bKeW4T6E', created_at=datetime.datetime(2023, 3, 30, 23, 17, 53))
ℹ️ Added run: Run(id='kXZegnhfZcQNtvYGY6C3', transform_id='OWuTtS4SApon', transform_v='0', created_by='bKeW4T6E', created_at=datetime.datetime(2023, 3, 30, 23, 17, 53))
filepath = ln.dev.datasets.file_fcs()
filepath
PosixPath('example.fcs')
Configure a CellMarker reference for parsing features#
Because the file is a standard .fcs file, ln.File can parse it under the hood using readfcs.
Alternatively, we can load it into memory as an AnnData object with readfcs.read_fcs(filepath).
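To give a rough idea of what parsing an .fcs file involves, here is a minimal sketch that reads the header and TEXT segment of an FCS 3.x stream using only the standard library. It is not how readfcs is implemented. The helper names (`read_text_keywords`, `channel_names`, `make_demo_stream`) and the synthetic channel labels are assumptions made for illustration, and escaped delimiters inside values are deliberately ignored.

```python
import io

HEADER_SIZE = 58  # version (6) + padding (4) + six 8-byte offset fields


def read_text_keywords(stream):
    """Return the TEXT-segment keywords of an FCS 3.x byte stream as a dict."""
    header = stream.read(HEADER_SIZE).decode("ascii")
    if not header.startswith("FCS"):
        raise ValueError("stream does not start with an FCS version string")
    # Offsets are right-justified ASCII integers; the end offset is inclusive.
    text_start = int(header[10:18])
    text_end = int(header[18:26])
    stream.seek(text_start)
    text = stream.read(text_end - text_start + 1).decode("utf-8")
    delimiter = text[0]  # the first character of TEXT defines the delimiter
    # Simplification: escaped (doubled) delimiters inside values are not handled.
    tokens = text[1:].split(delimiter)
    if tokens and tokens[-1] == "":
        tokens.pop()  # drop the empty token after the trailing delimiter
    return dict(zip(tokens[0::2], tokens[1::2]))


def channel_names(keywords):
    """Return (short name, stain name) for each channel from $PnN and $PnS."""
    n_channels = int(keywords["$PAR"])
    return [
        (keywords[f"$P{i}N"], keywords.get(f"$P{i}S", ""))
        for i in range(1, n_channels + 1)
    ]


def make_demo_stream():
    """Build a tiny synthetic stream (hypothetical data, header and TEXT only)."""
    text = "|$PAR|2|$P1N|FSC-A|$P2N|B515-A|$P2S|CD14|"
    text_end = HEADER_SIZE + len(text) - 1
    header = f"FCS3.0    {HEADER_SIZE:>8}{text_end:>8}" + f"{0:>8}" * 4
    return io.BytesIO((header + text).encode("ascii"))


keywords = read_text_keywords(make_demo_stream())
print(channel_names(keywords))
```

The stain names stored under `$PnS` are the labels that typically carry marker names such as CD14, which is why they are the natural candidates for linking against a reference.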
We’ll use the CellMarker ontology-based reference to link features:
reference = bt.CellMarker(species="human")
Parse features#
features = ln.Features(filepath, reference)
✅ 14 terms (87.5%) are mapped.
🔶 2 terms (12.5%) are not mapped.
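The percentages in this report follow from simple counting: 14 of 16 terms is 87.5%, and the remaining 2 are 12.5%. The sketch below shows that bookkeeping with exact string matching against a reference set. It does not claim to reproduce how lamindb matches terms; the function names and the toy labels are hypothetical.

```python
def split_by_reference(terms, reference):
    """Split terms into those present in the reference and those absent."""
    known = set(reference)
    mapped = [t for t in terms if t in known]
    unmapped = [t for t in terms if t not in known]
    return mapped, unmapped


def mapping_report(terms, reference):
    """Return report lines with counts and percentages of mapped terms."""
    mapped, unmapped = split_by_reference(terms, reference)
    total = len(terms)
    lines = []
    for group, label in ((mapped, "mapped"), (unmapped, "not mapped")):
        share = 100 * len(group) / total if total else 0.0
        lines.append(f"{len(group)} terms ({share:.1f}%) are {label}.")
    return lines


# Hypothetical toy data: three marker labels and one scatter channel.
reference_names = {"CD19", "CD14", "CCR7"}
channel_labels = ["CD19", "CD14", "CCR7", "FSC-A"]
for line in mapping_report(channel_labels, reference_names):
    print(line)
```

Unmapped terms are often channels that are not markers at all (for example scatter or time channels), so a mapping rate below 100% is not necessarily a problem.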
features
Features(id='IV2WkLDyi2WNBMvLrtiG', type='cell_marker', created_by='bKeW4T6E')
features.cell_markers[:3]
[CellMarker(id='CM_CD19', name='CD19', ncbi_gene_id='930', gene_symbol='CD19', gene_name='CD19 molecule', uniprotkb_id='P15391', species_id='NCBI_9606'),
CellMarker(id='CM_CD14', name='CD14', ncbi_gene_id='4695', gene_symbol='CD14', gene_name='NADH:ubiquinone oxidoreductase subunit A2', uniprotkb_id='O43678', species_id='NCBI_9606'),
CellMarker(id='CM_CCR7', name='CCR7', ncbi_gene_id='1236', gene_symbol='CCR7', gene_name='C-C motif chemokine receptor 7', uniprotkb_id='P32248', species_id='NCBI_9606')]
Track data with features (cell markers)#
file = ln.File(filepath, features=features)
ln.add(file)
File(id='tsjjl9Tl837wKMUmxkXC', name='example', suffix='.fcs', size=6785467, hash='KCEXRahJ-Ui9Y6nksQ8z1A', source_id='kXZegnhfZcQNtvYGY6C3', storage_id='8Pj12JLb', created_at=datetime.datetime(2023, 3, 30, 23, 17, 57))
Querying data by features#
We can now query datasets by cell markers:
files = (
ln.select(ln.File)
.join(ln.File.features)
.join(ln.Features.cell_markers)
.where(bt.CellMarker.gene_symbol == "CD14")
)
files.df().head()
id | name | suffix | size | hash | source_id | storage_id | created_at | updated_at |
---|---|---|---|---|---|---|---|---|
tsjjl9Tl837wKMUmxkXC | example | .fcs | 6785467 | KCEXRahJ-Ui9Y6nksQ8z1A | kXZegnhfZcQNtvYGY6C3 | 8Pj12JLb | 2023-03-30 23:17:57 | None |
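Conceptually, the query walks from files to their feature sets and from feature sets to cell markers, then filters on the marker's gene symbol. The sketch below reproduces that chain of joins in plain SQL with the standard-library sqlite3 module. The table names, column names, and rows are hypothetical and chosen only for illustration; they are not the actual lamindb schema.

```python
import sqlite3

# Hypothetical schema: two entity-to-entity link tables connect
# files to feature sets and feature sets to cell markers.
SCHEMA = """
CREATE TABLE file (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE features (id TEXT PRIMARY KEY, type TEXT);
CREATE TABLE cell_marker (id TEXT PRIMARY KEY, gene_symbol TEXT);
CREATE TABLE file_features (file_id TEXT, features_id TEXT);
CREATE TABLE features_cell_marker (features_id TEXT, cell_marker_id TEXT);
"""

QUERY = """
SELECT DISTINCT file.id, file.name
FROM file
JOIN file_features ON file_features.file_id = file.id
JOIN features ON features.id = file_features.features_id
JOIN features_cell_marker ON features_cell_marker.features_id = features.id
JOIN cell_marker ON cell_marker.id = features_cell_marker.cell_marker_id
WHERE cell_marker.gene_symbol = ?
ORDER BY file.id
"""


def build_demo_db():
    """Create an in-memory database with two files and their markers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO file VALUES (?, ?)", [("f1", "example"), ("f2", "other")]
    )
    conn.executemany(
        "INSERT INTO features VALUES (?, ?)",
        [("s1", "cell_marker"), ("s2", "cell_marker")],
    )
    conn.executemany(
        "INSERT INTO cell_marker VALUES (?, ?)",
        [("CM_CD14", "CD14"), ("CM_CD19", "CD19"), ("CM_CCR7", "CCR7")],
    )
    conn.executemany(
        "INSERT INTO file_features VALUES (?, ?)", [("f1", "s1"), ("f2", "s2")]
    )
    conn.executemany(
        "INSERT INTO features_cell_marker VALUES (?, ?)",
        [("s1", "CM_CD14"), ("s1", "CM_CD19"), ("s2", "CM_CCR7")],
    )
    return conn


def files_with_marker(conn, gene_symbol):
    """Return (id, name) of every file linked to the given gene symbol."""
    return conn.execute(QUERY, (gene_symbol,)).fetchall()


conn = build_demo_db()
print(files_with_marker(conn, "CD14"))
```

Storing the markers once per feature set, rather than once per file, means files measured with the same panel can share a single feature set.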