
Redun

Here, we’ll see how to track redun workflow runs with LaminDB.

Note

This use case is based on github.com/ricomnl/bioinformatics-pipeline-tutorial.

Setup

!lamin init --storage . --name redun-lamin-fasta
✅ saved: User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at=2024-03-04 13:54:33 UTC)
✅ saved: Storage(uid='hbefSKd0', root='/home/runner/work/redun-lamin-fasta/redun-lamin-fasta/docs', type='local', updated_at=2024-03-04 13:54:33 UTC, created_by_id=1)
💡 loaded instance: testuser1/redun-lamin-fasta
💡 did not register local instance on lamin.ai

Register the workflow

import lamindb as ln
import json
💡 lamindb instance: testuser1/redun-lamin-fasta

Register the workflow in the Transform registry:

ln.Transform(
    name="lamin-redun-fasta",
    type="pipeline",
    version="0.1.0",
    reference="https://github.com/laminlabs/redun-lamin-fasta",
).save()
How do we amend a redun workflow.py to register input and output files in LaminDB?

To query input files via LaminDB, we added the following lines:

# register input files in lamindb
ln.save(ln.Artifact.from_dir(input_dir))
# query & track this pipeline
transform = ln.Transform.filter(name="lamin-redun-fasta", version="0.1.0").one()
ln.track(transform)
# query input files
input_filepaths = [
    file.stage() for file in ln.Artifact.filter(key__startswith="fasta/")
]

To register the output file via LaminDB, we added the following line to the last task:

ln.Artifact(output_path).save()

Run redun

Let’s see what the input files are:

!ls ./fasta
KLF4.fasta  MYC.fasta  PO5F1.fasta  SOX2.fasta

And call the workflow:

!redun run workflow.py main --input-dir ./fasta --tag run=test-run 1> redun_stdout.txt 2> redun_stderr.txt
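
Outside a notebook, the same invocation can be scripted. The following is a minimal sketch, not part of the original tutorial: `run_and_capture` is a hypothetical helper that uses Python's standard `subprocess` module to mirror the `1>` and `2>` redirections and returns the exit code, so a failed run can be detected before inspecting the logs.

```python
import subprocess


def run_and_capture(command, stdout_path, stderr_path):
    """Run a command, writing stdout and stderr to separate files.

    `run_and_capture` is an illustrative name, not a redun or LaminDB API.
    Returns the exit code of the process.
    """
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        # check=False: report the exit code instead of raising on failure
        completed = subprocess.run(command, stdout=out, stderr=err, check=False)
    return completed.returncode


# Example call with the same arguments as the notebook cell
# (requires redun to be installed and workflow.py to be present):
#
# exit_code = run_and_capture(
#     ["redun", "run", "workflow.py", "main",
#      "--input-dir", "./fasta", "--tag", "run=test-run"],
#     "redun_stdout.txt",
#     "redun_stderr.txt",
# )
```

Passing the command as a list avoids shell quoting issues. A non-zero return value indicates that the process did not finish successfully.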

Inspect the output:

!cat redun_stdout.txt
💡 lamindb instance: testuser1/redun-lamin-fasta
❗ this creates one artifact per file in the directory - you might simply call ln.Artifact(dir) to get one artifact for the entire directory
❗ no run & transform get linked, consider passing a `run` or calling ln.track()
❗ no run & transform get linked, consider passing a `run` or calling ln.track()
❗ no run & transform get linked, consider passing a `run` or calling ln.track()
❗ no run & transform get linked, consider passing a `run` or calling ln.track()
💡 loaded: Transform(uid='LfeqokOTMp2O5tMk', name='lamin-redun-fasta', version='0.1.0', type='pipeline', reference='https://github.com/laminlabs/redun-lamin-fasta', updated_at=2024-03-04 13:54:34 UTC, created_by_id=1)
💡 saved: Run(uid='fm2u2O3tOZjooUazkL2v', run_at=2024-03-04 13:54:48 UTC, transform_id=1, created_by_id=1)
File(path=/home/runner/work/redun-lamin-fasta/redun-lamin-fasta/docs/data/results.tgz, hash=ba82929a)

And the last line of the error log:

!tail -1 redun_stderr.txt
2024-03-04 13:54:49,963:INFO - Execution duration: 2.00 seconds

View data lineage:

artifact = ln.Artifact.filter(key="data/results.tgz").one()  # query by key
artifact.view_lineage()
[Figure: data lineage graph of the results.tgz artifact]

Register the redun execution ID

To be able to query LaminDB by redun execution ID, we can export the ID from redun and store it on the run record:

# export the run information from redun
!redun log --exec --exec-tag run=test-run --format json --no-pager > redun_exec.json
# load the redun execution id from the JSON and store it in the LaminDB run record
with open("redun_exec.json") as f:
    redun_exec = json.load(f)
artifact.run.reference = redun_exec["id"]
artifact.run.reference_type = "redun_id"
artifact.run.save()
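
If the export is empty or has an unexpected shape, indexing with `["id"]` fails with an unhelpful error. The following is a minimal sketch of a more defensive reader. It assumes, as the cell above implies, that the exported file holds a single JSON object with an `id` field. `read_execution_id` is a hypothetical helper name, not part of redun or LaminDB.

```python
import json


def read_execution_id(path):
    """Return the `id` field of a redun execution record exported as JSON.

    Assumes the file contains one JSON object with an "id" key;
    raises ValueError with a readable message otherwise.
    """
    with open(path) as handle:
        record = json.load(handle)
    if not isinstance(record, dict) or "id" not in record:
        raise ValueError(f"no execution id found in {path}")
    return record["id"]
```

The returned value could then be assigned to `artifact.run.reference` as in the cell above.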

View the database content

ln.view()
Artifact
id uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
5 Ot4a58zU1sYtcot0k7cK 1 data/results.tgz .tgz None None None 83702 v8UqGGktM-_nDhG_hK1TCA md5 None None 1.0 1.0 1 False 2024-03-04 13:54:49.905090+00:00 2024-03-04 13:54:49.905125+00:00 1
4 tE7KZbY5v3lwPOHbmr09 1 fasta/KLF4.fasta .fasta None None None 609 LyuoYkWs4SgYcH7P7JLJtA md5 None None NaN NaN 1 False 2024-03-04 13:54:48.001609+00:00 2024-03-04 13:54:48.001629+00:00 1
3 NlAJbU5LKCNMijjJN06N 1 fasta/PO5F1.fasta .fasta None None None 477 -7iJgveFO9ia0wE1bqVu6g md5 None None NaN NaN 1 False 2024-03-04 13:54:48.001038+00:00 2024-03-04 13:54:48.001060+00:00 1
2 A2yqvFCDQT8j0EzLxclV 1 fasta/SOX2.fasta .fasta None None None 414 C5q_yaFXGk4SAEpfdqBwnQ md5 None None NaN NaN 1 False 2024-03-04 13:54:48.000260+00:00 2024-03-04 13:54:48.000283+00:00 1
1 o4qkhFtiYeGkjoMZnEQR 1 fasta/MYC.fasta .fasta None None None 536 WGbEtzPw-3bQEGcngO_pHQ md5 None None NaN NaN 1 False 2024-03-04 13:54:47.999124+00:00 2024-03-04 13:54:47.999156+00:00 1
Run
id uid transform_id run_at created_by_id report_id environment_id is_consecutive reference reference_type created_at
1 fm2u2O3tOZjooUazkL2v 1 2024-03-04 13:54:48.006654+00:00 1 None None None 52d1a3a2-1700-4f53-b27f-809bdc75dbdf redun_id 2024-03-04 13:54:48.006768+00:00
Storage
id uid root description type region created_at updated_at created_by_id
1 hbefSKd0 /home/runner/work/redun-lamin-fasta/redun-lami... None local None 2024-03-04 13:54:33.293272+00:00 2024-03-04 13:54:33.293295+00:00 1
Transform
id uid name short_name version type latest_report_id source_code_id reference reference_type created_at updated_at created_by_id
1 LfeqokOTMp2O5tMk lamin-redun-fasta None 0.1.0 pipeline None None https://github.com/laminlabs/redun-lamin-fasta None 2024-03-04 13:54:34.822971+00:00 2024-03-04 13:54:34.823002+00:00 1
User
id uid handle name created_at updated_at
1 DzTjkKse testuser1 Test User1 2024-03-04 13:54:33.288741+00:00 2024-03-04 13:54:33.288767+00:00

Delete the test instance:

!lamin delete --force redun-lamin-fasta
💡 deleting instance testuser1/redun-lamin-fasta
✅     deleted instance settings file: /home/runner/.lamin/instance--testuser1--redun-lamin-fasta.env
✅     instance cache deleted
✅     deleted '.lndb' sqlite file
❗     consider manually deleting your stored data: /home/runner/work/redun-lamin-fasta/redun-lamin-fasta/docs