RxRx: cell imaging

rxrx.ai hosts high-throughput cell imaging datasets generated by Recursion.

These datasets contain large numbers of fluorescence microscopy images that characterize cellular phenotypes in vitro by morphology and protein expression (5-10 stains) across a range of conditions.

In this guide, you’ll see how to query some of these data using LaminDB: laminlabs/rxrx.
If you’d like to transfer data into your own LaminDB instance, see the transfer guide.
If you’d like to understand how the laminlabs/rxrx instance was curated, see this repository.

Setup

import lamindb as ln
import bionty as bt
import wetlab as wl

ln.connect("laminlabs/lamindata")

Search & look up metadata

We’ll find all treatments in the Treatment registry:

df = wl.Treatment.df()
df.shape

(1139, 13)

Let’s create a lookup object for siRNAs so that queries involving them can be auto-completed:

sirnas = wl.Treatment.filter(system="siRNA").lookup(return_field="name")
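
A lookup object exposes each record as an attribute whose name is a Python-identifier version of the record's name, which is what lets an editor auto-complete it. For example, `cell_lines.hep_g2_cell` in the query further down presumably resolves a cell-line record named something like "Hep G2 cell" to its `abbr` value. The following is a minimal sketch of that idea. It assumes a simple lowercase-and-underscore normalization. `to_identifier`, `make_lookup`, and the example records are hypothetical and are not LaminDB's implementation:

```python
import re
from types import SimpleNamespace


def to_identifier(name: str) -> str:
    """Turn a free-text record name into a valid Python attribute name (hypothetical rule)."""
    # Replace every run of non-alphanumeric characters with a single underscore.
    key = re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()
    # Attribute names cannot start with a digit, so prefix those with "_".
    if key and key[0].isdigit():
        key = "_" + key
    return key


def make_lookup(records: list[dict], return_field: str) -> SimpleNamespace:
    """Build an object whose attributes map normalized names to one field of each record."""
    return SimpleNamespace(**{to_identifier(r["name"]): r[return_field] for r in records})


# Hypothetical records, shaped like rows of a cell-line registry.
records = [
    {"name": "Hep G2 cell", "abbr": "HEPG2"},
    {"name": "HUVEC cell", "abbr": "HUVEC"},
]
cell_lines = make_lookup(records, return_field="abbr")
print(cell_lines.hep_g2_cell)  # HEPG2
```

Returning a single field (here `abbr`) rather than the whole record is convenient when the value is compared directly against a DataFrame column, as in the query below.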

We’re also interested in features, cell lines & wells:

ln.Feature.df()

	uid	name	type	unit	description	registries	synonyms	created_at	updated_at	created_by_id
id
135	UPnuN18Vro7T	sirna	float	None	None	wetlab.Treatment	None	2023-07-12 12:54:25.605932+00:00	2024-03-26 13:23:37.050138+00:00	2
134	RFz9tVF39RXJ	well_type	float	None	None	core.ULabel	None	2023-07-12 12:54:25.605879+00:00	2024-03-26 13:20:59.284093+00:00	2
132	ghhC57uNYQhD	well	float	None	None	wetlab.Well	None	2023-07-12 12:54:25.605769+00:00	2024-03-26 13:20:57.352241+00:00	2
131	gUecWT2bNsch	plate	float	None	None	core.ULabel	None	2023-07-12 12:54:25.605717+00:00	2024-03-26 13:20:13.255028+00:00	2
303	4ycwa8er0EB2	experiment	category	None	None	core.ULabel|wetlab.Experiment	None	2023-07-12 12:54:25.605663+00:00	2024-03-26 13:20:11.349207+00:00	2
...	...	...	...	...	...	...	...	...	...	...
5	b1oB0I2Nxx7w	feature_4	float	None	None	None	None	2023-07-12 12:54:24.401456+00:00	2023-10-14 15:42:03.557973+00:00	2
4	qehni2DU75bT	feature_3	float	None	None	None	None	2023-07-12 12:54:24.401441+00:00	2023-10-14 15:42:03.431243+00:00	2
3	cANjhBnEosz7	feature_2	float	None	None	None	None	2023-07-12 12:54:24.401425+00:00	2023-10-14 15:42:03.306655+00:00	2
2	RhHNXlP1jpqi	feature_1	float	None	None	None	None	2023-07-12 12:54:24.401408+00:00	2023-10-14 15:42:03.181750+00:00	2
1	UwWDQLrCTdks	feature_0	float	None	None	None	None	2023-07-12 12:54:24.401373+00:00	2023-10-14 15:42:03.055457+00:00	2

311 rows × 10 columns
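
In the `registries` column, a feature that links to more than one registry lists them separated by `|`. The `experiment` feature is an example: it points to both `core.ULabel` and `wetlab.Experiment`. The sketch below splits such a value into (module, registry) pairs. `parse_registries` is a hypothetical helper, and treating the string "None" as "no registry" is an assumption based on how the table above is rendered:

```python
from typing import Optional


def parse_registries(value: Optional[str]) -> list[tuple[str, str]]:
    """Split e.g. 'core.ULabel|wetlab.Experiment' into [('core', 'ULabel'), ('wetlab', 'Experiment')]."""
    # Features such as feature_0 are not linked to any registry.
    if not value or value == "None":
        return []
    # Each entry has the form "<module>.<Registry>"; split on the first dot only.
    return [tuple(entry.split(".", 1)) for entry in value.split("|")]


print(parse_registries("core.ULabel|wetlab.Experiment"))
# [('core', 'ULabel'), ('wetlab', 'Experiment')]
```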

cell_lines = bt.CellLine.lookup(return_field="abbr")
wells = wl.Well.lookup(return_field="name")

Load the collection

This is RxRx1: 125k images of 1138 siRNA perturbations across 4 cell lines, reading out 5 stains, with an image dimension of 512x512x6.
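
To put the image dimension into perspective, here is a back-of-the-envelope estimate of the uncompressed size of one 512x512x6 image. It assumes 8-bit pixel values (one byte per value per channel) and that each of the ~125k images has that full shape. The actual bit depth and the PNG compression of the files are not stated here, so treat the numbers as rough:

```python
# Image dimensions as stated above: height x width x channels.
height, width, channels = 512, 512, 6
# Assumption: 8-bit images, i.e. one byte per pixel per channel.
bytes_per_value = 1

bytes_per_image = height * width * channels * bytes_per_value
print(bytes_per_image)            # 1572864
print(bytes_per_image / 2**20)    # 1.5 (MiB)

# Rough uncompressed total, assuming ~125k images of this shape.
total_gib = 125_000 * bytes_per_image / 2**30
print(round(total_gib))           # 183
```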

Let’s retrieve the corresponding collection, view its lineage, and describe it:

collection = ln.Collection.filter(uid="KMEQhAvRQDXLvNTNWlsT").one()
collection.view_lineage()
collection.describe()

The dataset consists of a metadata file and a folder path pointing to the image files. Here are the first rows of the metadata file:

collection.artifact.load().head()

	site_id	well_id	cell_line	split	experiment	plate	well	site	well_type	sirna	sirna_id	path
0	HEPG2-08_1_B02_1	HEPG2-08_1_B02	HEPG2	test	HEPG2-08	1	B02	1	negative_control	EMPTY	1138	images/test/HEPG2-08/Plate1/B02_s1_w1.png
1	HEPG2-08_1_B02_1	HEPG2-08_1_B02	HEPG2	test	HEPG2-08	1	B02	1	negative_control	EMPTY	1138	images/test/HEPG2-08/Plate1/B02_s1_w2.png
2	HEPG2-08_1_B02_1	HEPG2-08_1_B02	HEPG2	test	HEPG2-08	1	B02	1	negative_control	EMPTY	1138	images/test/HEPG2-08/Plate1/B02_s1_w3.png
3	HEPG2-08_1_B02_1	HEPG2-08_1_B02	HEPG2	test	HEPG2-08	1	B02	1	negative_control	EMPTY	1138	images/test/HEPG2-08/Plate1/B02_s1_w4.png
4	HEPG2-08_1_B02_1	HEPG2-08_1_B02	HEPG2	test	HEPG2-08	1	B02	1	negative_control	EMPTY	1138	images/test/HEPG2-08/Plate1/B02_s1_w5.png
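
In this metadata table, each row appears to point to a single-channel PNG. One imaged site therefore spans several rows that differ only in the `w<N>` suffix of `path`. The sketch below uses the five rows shown above as plain dicts. It groups paths per `site_id` and orders them by channel, which is the order you would need to stack them into one multi-channel image. `channel_of` is a hypothetical helper, and reading the channel index from the file name is an assumption based on the naming pattern:

```python
import re
from collections import defaultdict

# The rows printed above, reduced to the two columns used here.
rows = [
    {"site_id": "HEPG2-08_1_B02_1", "path": f"images/test/HEPG2-08/Plate1/B02_s1_w{c}.png"}
    for c in range(1, 6)
]


def channel_of(path: str) -> int:
    """Extract the channel index from a file name such as 'B02_s1_w3.png'."""
    match = re.search(r"_w(\d+)\.png$", path)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    return int(match.group(1))


# Collect all channel files that belong to the same imaged site.
by_site: dict[str, list[str]] = defaultdict(list)
for row in rows:
    by_site[row["site_id"]].append(row["path"])

# Sort each site's files by channel so that index 0 is w1, index 1 is w2, and so on.
stacks = {site: sorted(paths, key=channel_of) for site, paths in by_site.items()}
print(len(stacks["HEPG2-08_1_B02_1"]))  # 5
```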

Query image files

Because we chose not to register each image as a separate record in the Artifact registry, we query the images through the dataset’s metadata file:

# df = collection.artifact.load()

We can select a subset of images by combining the lookup objects defined above with pandas boolean indexing:

# query = df[
#     (df.cell_line == cell_lines.hep_g2_cell)
#     & (df.sirna == sirnas.s15652)
#     & (df.well == wells.m15)
#     & (df.plate == 1)
#     & (df.site == 2)
# ]
# query
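
Each comparison in the query above, such as `df.plate == 1`, yields one boolean per row, and `&` keeps only the rows where every condition holds. The parentheses are required because `&` binds more tightly than `==` in Python. The sketch below applies the same AND-of-equalities logic to plain Python dicts so that it runs without pandas. It uses the five rows shown earlier, and its filter values are chosen to match those rows rather than the siRNA and well used in the query. `select` is a hypothetical helper:

```python
# The five rows shown earlier, keeping only the columns used for filtering.
rows = [
    {"cell_line": "HEPG2", "sirna": "EMPTY", "well": "B02", "plate": 1, "site": 1,
     "path": f"images/test/HEPG2-08/Plate1/B02_s1_w{c}.png"}
    for c in range(1, 6)
]


def select(rows: list[dict], **conditions) -> list[dict]:
    """Keep rows where every named column equals its requested value (logical AND)."""
    return [row for row in rows if all(row[col] == val for col, val in conditions.items())]


hits = select(rows, cell_line="HEPG2", well="B02", plate=1, site=1)
print(len(hits))                           # 5
print(len(select(rows, plate=1, site=2)))  # 0
```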

To build the full paths of the individual images in this query result:

# images = [collection.artifact.path.parent / key for key in query.path]
# images
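
The list comprehension resolves each relative `path` entry against the directory that contains the metadata file. This works because keys such as `images/test/HEPG2-08/Plate1/B02_s1_w1.png` are relative to that directory. The sketch below performs the same join with `pathlib.PurePosixPath`, whose `parent` and `/` behave like those of the path objects returned by `artifact.path`. The location `rxrx1/metadata.parquet` is a hypothetical placeholder, not the real storage location:

```python
from pathlib import PurePosixPath

# Hypothetical location of the metadata file inside the storage bucket.
metadata_path = PurePosixPath("rxrx1/metadata.parquet")

# Relative keys as they appear in the `path` column of the metadata table.
keys = [
    "images/test/HEPG2-08/Plate1/B02_s1_w1.png",
    "images/test/HEPG2-08/Plate1/B02_s1_w2.png",
]

# Resolve every key against the folder that holds the metadata file.
images = [metadata_path.parent / key for key in keys]
print(images[0])  # rxrx1/images/test/HEPG2-08/Plate1/B02_s1_w1.png
```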

Download an image to disk:

# path = ln.UPath(images[1])
# path.download_to(".")

# from IPython.display import Image
# Image(f"./{path.name}")