Nextflow#

Nextflow is a workflow management system used for executing scientific workflows across platforms scalably, portably, and reproducibly.

Here, we’ll run a demo of the microscopy pipeline mcmicro to correct uneven illumination.

Note

This notebook serves as a demo for Python scripting that you could run before and after Nextflow runs.

Typically, you’d run the workflows from the command line or Nextflow Tower and register data from a Python script and not necessarily a notebook.

Setup#

Let’s load an instance that already has example data.

!lamin load nextflow-mcmicro
2023-12-12 11:59:10,357:INFO - HTTP Request: GET https://hub.lamin.ai/rest/v1/instance?select=%2A%2C%20account%21inner%21fk_instance_account_id_account%28%2A%29&account.handle=eq.testuser1&name=eq.nextflow-mcmicro "HTTP/1.1 200 OK"
2023-12-12 11:59:10,413:INFO - HTTP Request: GET https://hub.lamin.ai/rest/v1/account?select=%2A&handle=eq.testuser1 "HTTP/1.1 200 OK"
2023-12-12 11:59:10,469:INFO - HTTP Request: GET https://hub.lamin.ai/rest/v1/instance?select=%2A&account_id=eq.29cff183-c34d-445f-b6cf-31fb3b566158&name=eq.nextflow-mcmicro "HTTP/1.1 200 OK"
💡 found cached instance metadata: /home/runner/.lamin/instance--testuser1--nextflow-mcmicro.env
💡 loaded instance: testuser1/nextflow-mcmicro
import lamindb as ln
from subprocess import getoutput
💡 lamindb instance: testuser1/nextflow-mcmicro

Track the Nextflow run#

Track the Nextflow workflow & run:

transform = ln.Transform(
    name="mcmicro",
    version="1.0.0",
    type="pipeline",
    reference="https://github.com/labsyspharm/mcmicro",
)
ln.track(transform)
# grab the run of the global run context
run = ln.dev.run_context.run
💡 saved: Transform(uid='1wMHZ0TpEytcvj', name='mcmicro', version='1.0.0', type='pipeline', reference='https://github.com/labsyspharm/mcmicro', updated_at=2023-12-12 11:59:12 UTC, created_by_id=1)
💡 saved: Run(uid='4CpfEVfiEQlGaN2sE0pg', run_at=2023-12-12 11:59:12 UTC, transform_id=2, created_by_id=1)

If we now stage input files, they’ll be tracked as inputs for the global run:

# fetch the example input files that are already stored in the instance
mcmicro_input = ln.Artifact.filter(key__startswith="exemplar-001")
# stage each artifact locally; staging registers it as an input of the tracked run
input_paths = [input_artifact.stage() for input_artifact in mcmicro_input]

Run the Nextflow pipeline:

!nextflow run https://github.com/labsyspharm/mcmicro --in exemplar-001 --start-at illumination --stop-at registration -name "lamin_{run.uid}"
N E X T F L O W  ~  version 23.10.0
Launching `https://github.com/labsyspharm/mcmicro` [lamin_4CpfEVfiEQlGaN2sE0pg] DSL2 - revision: 049baedbe8 [master]
[45/fb6429] Submitted process > illumination (2)
[6b/e0ff87] Submitted process > illumination (3)
[d3/bd16e1] Submitted process > illumination (1)
[78/1bba26] Submitted process > registration:ashlar

Here, we passed the LaminDB run uid to Nextflow as part of the run name so that we can query it from within Nextflow.
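
Outside a notebook, the same launch could be done from a Python script via subprocess. The following is a minimal sketch, assuming the nextflow executable is on the PATH; the helper names build_nextflow_command and run_nextflow are hypothetical and not part of LaminDB or Nextflow, and the flags are simply the ones used in the cell above.

```python
import subprocess


def build_nextflow_command(run_uid: str) -> list[str]:
    """Assemble the same command line as the notebook cell above."""
    return [
        "nextflow", "run", "https://github.com/labsyspharm/mcmicro",
        "--in", "exemplar-001",
        "--start-at", "illumination",
        "--stop-at", "registration",
        "-name", f"lamin_{run_uid}",  # run name carries the LaminDB run uid
    ]


def run_nextflow(run_uid: str) -> None:
    """Launch the pipeline; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_nextflow_command(run_uid), check=True)


# This only builds and prints the command; call run_nextflow(run.uid) to launch it.
print(" ".join(build_nextflow_command("4CpfEVfiEQlGaN2sE0pg")))
```

Passing the arguments as a list avoids shell quoting issues, and check=True makes a failed pipeline run stop the script before any outputs are registered.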

Register outputs#

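# register the registered image written by the pipeline; it is linked to the tracked run as an output
output = ln.Artifact("exemplar-001/registration/exemplar-001.ome.tif")
output.save()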
❗ file has more than one suffix (path.suffixes), using only last suffix: '.tif' - if you want your file format to be recognized, make an issue: https://github.com/laminlabs/lamindb/issues/new
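
The warning mentions path.suffixes, which presumably refers to the pathlib attribute of the same name. As a small illustration using only the standard library (not LaminDB itself), this is how the two attributes differ for the output file:

```python
from pathlib import Path

path = Path("exemplar-001/registration/exemplar-001.ome.tif")

# All suffixes of the file name, in order; more than one entry triggers the warning.
print(path.suffixes)  # ['.ome', '.tif']

# Only the last suffix; this matches the suffix recorded for the artifact.
print(path.suffix)  # .tif
```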

Track Nextflow ID#

Let us look at the Nextflow log:

!nextflow log
TIMESTAMP          	DURATION	RUN NAME                  	STATUS	REVISION ID	SESSION ID                          	COMMAND                                                                                                                                                
2023-12-12 11:59:02	5.1s    	irreverent_volhard        	OK    	049baedbe8 	aa75b235-0430-43bc-8dd3-c402e2770326	nextflow run labsyspharm/mcmicro/exemplar.nf --name exemplar-001                                                                                       
2023-12-12 11:59:15	1m 43s  	lamin_4CpfEVfiEQlGaN2sE0pg	OK    	049baedbe8 	40e0b3af-7534-4ba5-823e-0fdb8ea3e412	nextflow run 'https://github.com/labsyspharm/mcmicro' --in exemplar-001 --start-at illumination --stop-at registration -name lamin_4CpfEVfiEQlGaN2sE0pg

Let us add the information about the session ID to our run record:

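# look up the session ID of our run by its run name, which contains run.uid
# (run.id is a small integer and would also match unrelated rows, e.g. via the timestamps)
# note: $8 assumes the DURATION column prints as two tokens, as in "1m 43s" above
nextflow_id = getoutput(f"nextflow log | awk '/{run.uid}/{{print $8}}'")
run.reference = nextflow_id
run.reference_type = "nextflow_id"
run.save()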
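
Picking a column by position with awk depends on how many whitespace-separated tokens precede the session ID, which varies with the duration format ("5.1s" vs. "1m 43s"). A more defensive alternative is to parse the log text in Python. The sketch below is an assumption-laden illustration: the helper find_session_id is hypothetical, and it relies on session IDs being UUIDs, as they are in the log shown above.

```python
import re
from typing import Optional

# Session IDs in the log above are UUIDs, e.g. 40e0b3af-7534-4ba5-823e-0fdb8ea3e412.
UUID_RE = re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b")


def find_session_id(log_text: str, run_name: str) -> Optional[str]:
    """Return the session ID of the log row with the given run name, else None."""
    for line in log_text.splitlines():
        # Split on whitespace so the run name has to match a whole token.
        if run_name in line.split():
            match = UUID_RE.search(line)  # first UUID on the row is the session ID
            if match:
                return match.group(0)
    return None


# The two rows of the log output shown above.
sample_log = (
    "2023-12-12 11:59:02\t5.1s    \tirreverent_volhard        \tOK    \t049baedbe8 \t"
    "aa75b235-0430-43bc-8dd3-c402e2770326\t"
    "nextflow run labsyspharm/mcmicro/exemplar.nf --name exemplar-001\n"
    "2023-12-12 11:59:15\t1m 43s  \tlamin_4CpfEVfiEQlGaN2sE0pg\tOK    \t049baedbe8 \t"
    "40e0b3af-7534-4ba5-823e-0fdb8ea3e412\t"
    "nextflow run 'https://github.com/labsyspharm/mcmicro' --in exemplar-001 "
    "--start-at illumination --stop-at registration -name lamin_4CpfEVfiEQlGaN2sE0pg\n"
)

print(find_session_id(sample_log, "lamin_4CpfEVfiEQlGaN2sE0pg"))
```

In the notebook, this could be called as find_session_id(getoutput("nextflow log"), f"lamin_{run.uid}"), and a None result would signal that the run name was not found.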

Data lineage#

View data lineage:

output.view_flow()
[Figure: data lineage graph of the output artifact]

View the database content:

ln.view()
Artifact
id | uid | storage_id | key | suffix | accessor | description | version | size | hash | hash_type | transform_id | run_id | initial_version_id | visibility | key_is_virtual | updated_at | created_by_id
11 | yOMveO4HNooYv4OVluoc | 1 | exemplar-001/registration/exemplar-001.ome.tif | .tif | None | None | None | 427834084 | 1cfP16_ovpQ9MOsyyhZVjh | sha1-fl | 2 | 2 | None | 1 | False | 2023-12-12 12:00:58.365075+00:00 | 1
10 | avaZg26Bm0CBEFyhbZJl | 1 | exemplar-001/raw/exemplar-001-cycle-08.ome.tiff | .tiff | None | None | None | 66638496 | QA47PmEBICqvn7QGPVeqwO | sha1-fl | 1 | 1 | None | 1 | False | 2023-12-12 11:59:07.788181+00:00 | 1
9 | pDVHmCdRO6vO0kYUJFSA | 1 | exemplar-001/raw/exemplar-001-cycle-06.ome.tiff | .tiff | None | None | None | 66638705 | 2l1yiKDiRNGVSDd1oNIJL2 | sha1-fl | 1 | 1 | None | 1 | False | 2023-12-12 11:59:07.787770+00:00 | 1
8 | BkvbMAGxI3Km9AOlS2nf | 1 | exemplar-001/raw/exemplar-001-cycle-07.ome.tiff | .tiff | None | None | None | 66638686 | nvJOq4dPjuWc-dwhb9YwVb | sha1-fl | 1 | 1 | None | 1 | False | 2023-12-12 11:59:07.787363+00:00 | 1
7 | 2MwjRyDLNV2H6sKKK1fu | 1 | exemplar-001/illumination/exemplar-001-cycle-0... | .tif | None | None | None | 22119019 | Yw4DJkg2QQ7ez4j2_qWN_Q | md5 | 1 | 1 | None | 1 | False | 2023-12-12 11:59:07.786952+00:00 | 1
6 | OruRzbfnH0DrBouYKuRI | 1 | exemplar-001/illumination/exemplar-001-cycle-0... | .tif | None | None | None | 22119019 | idW8uRMTLfXNJHnboZy8GQ | md5 | 1 | 1 | None | 1 | False | 2023-12-12 11:59:07.786527+00:00 | 1
5 | 59OyeefuiqbzYKFWSMav | 1 | exemplar-001/illumination/exemplar-001-cycle-0... | .tif | None | None | None | 22119019 | qpmIHKbuxwe2sE_rdcPqfA | md5 | 1 | 1 | None | 1 | False | 2023-12-12 11:59:07.786075+00:00 | 1
Run
id | uid | transform_id | run_at | created_by_id | report_id | is_consecutive | reference | reference_type
1 | S3omjGXoU9x8MKtrDG7y | 1 | 2023-12-12 11:58:52.934498+00:00 | 1 | None | None | https://github.com/nf-core/mcmicro | url
2 | 4CpfEVfiEQlGaN2sE0pg | 2 | 2023-12-12 11:59:12.235257+00:00 | 1 | None | None | nextflow\n40e0b3af-7534-4ba5-823e-0fdb8ea3e412 | nextflow_id
Storage
id | uid | root | type | region | updated_at | created_by_id
1 | vtZUJZ8D | /home/runner/work/nextflow-lamin-usecases/next... | local | None | 2023-12-12 11:58:51.661689+00:00 | 1
Transform
id | uid | name | short_name | version | type | latest_report_id | source_code_id | reference | reference_type | initial_version_id | updated_at | created_by_id
2 | 1wMHZ0TpEytcvj | mcmicro | None | 1.0.0 | pipeline | None | None | https://github.com/labsyspharm/mcmicro | None | None | 2023-12-12 11:59:12.227290+00:00 | 1
1 | KCBIgXAWpOXaLO | Download | None | None | notebook | None | None | None | None | None | 2023-12-12 11:58:52.930165+00:00 | 1
User
id | uid | handle | name | updated_at
1 | DzTjkKse | testuser1 | Test User1 | 2023-12-12 11:58:51.657620+00:00

Clean up the test instance:

!lamin delete --force nextflow-mcmicro
💡 deleting instance testuser1/nextflow-mcmicro
✅     deleted instance settings file: /home/runner/.lamin/instance--testuser1--nextflow-mcmicro.env
✅     instance cache deleted
✅     deleted '.lndb' sqlite file
❗     consider manually deleting your stored data: /home/runner/work/nextflow-lamin-usecases/nextflow-lamin-usecases/docs

If you are interested in registering bulk RNA-seq data with Lamin, you can have a look at our nf-core/rnaseq example.