Redun

Here, we’ll see how to track redun workflow runs with LaminDB.

Note

This use case is based on github.com/ricomnl/bioinformatics-pipeline-tutorial.

Setup

!lamin init --storage . --name redun-lamin-fasta
✅ saved: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-26 15:21:39)
✅ saved: Storage(id='yEGjlxac', root='/home/runner/work/redun-lamin-fasta/redun-lamin-fasta/docs', type='local', updated_at=2023-09-26 15:21:39, created_by_id='DzTjkKse')
💡 loaded instance: testuser1/redun-lamin-fasta
💡 did not register local instance on hub (if you want, call `lamin register`)

Register the workflow

import lamindb as ln
import json
💡 loaded instance: testuser1/redun-lamin-fasta (lamindb 0.54.2)

Register the workflow in the Transform registry:

ln.Transform(
    name="lamin-redun-fasta",
    type="pipeline",
    version="0.1.0",
    reference="https://github.com/laminlabs/redun-lamin-fasta",
).save()

How do we amend a redun workflow.py to register input and output files in LaminDB?

To query input files via LaminDB, we added the following lines:

# register input files in lamindb
ln.save(ln.File.from_dir(input_dir))
# query & track this pipeline
transform = ln.Transform.filter(name="lamin-redun-fasta", version="0.1.0").one()
ln.track(transform)
# query input files
input_filepaths = [
    file.stage() for file in ln.File.filter(key__startswith="fasta/")
]
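
The `key__startswith="fasta/"` filter above selects files whose key begins with the given prefix. As a plain-Python illustration of that matching rule (a sketch of the intent, not of how LaminDB evaluates the query), using the keys that end up registered in this example; the helper name `keys_with_prefix` is hypothetical:

```python
# Keys of the files registered in this example (taken from the database view).
registered_keys = [
    "data/results.tgz",
    "fasta/MYC.fasta",
    "fasta/SOX2.fasta",
    "fasta/KLF4.fasta",
    "fasta/PO5F1.fasta",
]


def keys_with_prefix(keys, prefix):
    """Return the keys that start with prefix, preserving order."""
    return [key for key in keys if key.startswith(prefix)]


# Only the four FASTA inputs match; the results archive does not.
input_keys = keys_with_prefix(registered_keys, "fasta/")
print(input_keys)
```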

To register the output file via LaminDB, we added the following line to the last task:

ln.File(output_path).save()

Run redun

Let’s see what the input files are:

!ls ./fasta
KLF4.fasta  MYC.fasta  PO5F1.fasta  SOX2.fasta

And call the workflow:

!redun run workflow.py main --input-dir ./fasta --tag run=test-run 1> redun_stdout.txt 2> redun_stderr.txt
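
Outside a notebook, the same redirection of stdout and stderr into separate files can be done from Python. The following is a minimal sketch using the standard library; the helper name `run_and_capture` is hypothetical, and a stand-in command replaces the redun call so that the snippet runs without redun installed:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_capture(cmd, stdout_path, stderr_path):
    """Run cmd with stdout and stderr written to separate files; return the exit code."""
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        completed = subprocess.run(cmd, stdout=out, stderr=err, check=False)
    return completed.returncode


# The redun call from this section, as an argument list (not executed here).
redun_cmd = ["redun", "run", "workflow.py", "main",
             "--input-dir", "./fasta", "--tag", "run=test-run"]

# Stand-in command so the sketch runs without redun installed.
demo_cmd = [sys.executable, "-c",
            "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]

log_dir = Path(tempfile.mkdtemp())
exit_code = run_and_capture(demo_cmd, log_dir / "stdout.txt", log_dir / "stderr.txt")
print(exit_code)
print((log_dir / "stdout.txt").read_text().strip())
```

Passing `redun_cmd` instead of `demo_cmd` would run the workflow itself, assuming `redun` is on the path and `workflow.py` is in the working directory.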

Inspect the output:

!cat redun_stdout.txt
💡 loaded instance: testuser1/redun-lamin-fasta (lamindb 0.54.2)
❗ no run & transform get linked, consider passing a `run` or calling ln.track()
❗ no run & transform get linked, consider passing a `run` or calling ln.track()
❗ no run & transform get linked, consider passing a `run` or calling ln.track()
❗ no run & transform get linked, consider passing a `run` or calling ln.track()
💡 Transform(id='pZAgEF9uyFfxDG', name='lamin-redun-fasta', version='0.1.0', type='pipeline', reference='https://github.com/laminlabs/redun-lamin-fasta', updated_at=2023-09-26 15:21:41, created_by_id='DzTjkKse')
💡 Run(id='rFgycCGE75FgOkW48gkW', run_at=2023-09-26 15:21:46, transform_id='pZAgEF9uyFfxDG', created_by_id='DzTjkKse')
File(path=/home/runner/work/redun-lamin-fasta/redun-lamin-fasta/docs/data/results.tgz, hash=99435ffd)

And the last line of the stderr log:

!tail -1 redun_stderr.txt
2023-09-26 15:21:47,768:INFO - Execution duration: 1.74 seconds
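
To check from Python that the run finished, the final stderr line can be parsed for the reported duration. This is a sketch under the assumption that log lines follow the layout of the single line shown above; the patterns and the helper name `parse_duration` are hypothetical and not part of redun:

```python
import re

# Assumed layout, inferred from the single log line shown above:
# "<date> <time>,<millis>:<LEVEL> - <message>"
LOG_LINE = re.compile(r"^(?P<timestamp>\S+ \S+):(?P<level>[A-Z]+) - (?P<message>.*)$")
DURATION = re.compile(r"Execution duration: (?P<seconds>\d+(?:\.\d+)?) seconds")


def parse_duration(line):
    """Return the execution duration in seconds, or None if the line has none."""
    entry = LOG_LINE.match(line.strip())
    if entry is None:
        return None
    found = DURATION.search(entry.group("message"))
    return float(found.group("seconds")) if found else None


last_line = "2023-09-26 15:21:47,768:INFO - Execution duration: 1.74 seconds"
duration = parse_duration(last_line)
print(duration)
```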

View data lineage:

file = ln.File.filter(key="data/results.tgz").one()  # query by key
file.view_flow()
[Figure: data lineage graph rendered by file.view_flow()]

Register the redun execution id

If we want to query LaminDB by redun execution ID, we can export the ID from redun and store it on the run record:

# export the run information from redun
!redun log --exec --exec-tag run=test-run --format json --no-pager > redun_exec.json
# load the redun execution id from the JSON and store it in the LaminDB run record
with open("redun_exec.json") as f:
    redun_exec = json.load(f)
file.run.reference = redun_exec["id"]
file.run.reference_type = "redun_id"
file.run.save()
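
The lookup `redun_exec["id"]` assumes the export holds exactly one execution record. A slightly more defensive read could look like the sketch below; the helper name `read_execution_id` is hypothetical, and the sample file contains only the `id` field (with the value recorded in this example), since the remaining fields of redun's JSON output are not shown here:

```python
import json
import tempfile
from pathlib import Path


def read_execution_id(path):
    """Return the "id" field of a JSON file holding one execution record."""
    with open(path) as f:
        record = json.load(f)
    if not isinstance(record, dict) or "id" not in record:
        raise ValueError(f"{path} does not hold a single record with an 'id' field")
    return record["id"]


# Hypothetical minimal export; a real file is assumed to have more fields.
sample = {"id": "dd067ed4-de2a-4525-b0ac-842fd6a80328"}
sample_path = Path(tempfile.mkdtemp()) / "redun_exec.json"
sample_path.write_text(json.dumps(sample))

execution_id = read_execution_id(sample_path)
print(execution_id)
```

Failing early with a clear message helps when the tag filter matches no execution, or more than one, and the file is not the single object the lookup expects.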

View the database content

ln.view()
File
id                    storage_id  key                suffix  accessor  description  version  size   hash                    hash_type  transform_id    run_id                initial_version_id  updated_at           created_by_id
SH1sHZ8HzZM9RtpPl4CI  yEGjlxac    data/results.tgz   .tgz    None      None         None     83510  xZLyfXAEplMaQTDzDFn9cA  md5        pZAgEF9uyFfxDG  rFgycCGE75FgOkW48gkW  None                2023-09-26 15:21:47  DzTjkKse
XvsABV9ZYKUWes3akuSu  yEGjlxac    fasta/MYC.fasta    .fasta  None      None         None     536    WGbEtzPw-3bQEGcngO_pHQ  md5        None            None                  None                2023-09-26 15:21:46  DzTjkKse
pGeH4U2pAO0rmKH7g0tF  yEGjlxac    fasta/SOX2.fasta   .fasta  None      None         None     414    C5q_yaFXGk4SAEpfdqBwnQ  md5        None            None                  None                2023-09-26 15:21:46  DzTjkKse
kMjPH9gpMJDKF9UmDw8D  yEGjlxac    fasta/KLF4.fasta   .fasta  None      None         None     609    LyuoYkWs4SgYcH7P7JLJtA  md5        None            None                  None                2023-09-26 15:21:46  DzTjkKse
bNvVDWUeZZlwdKWSAVTQ  yEGjlxac    fasta/PO5F1.fasta  .fasta  None      None         None     477    -7iJgveFO9ia0wE1bqVu6g  md5        None            None                  None                2023-09-26 15:21:46  DzTjkKse
Run
id                    transform_id    run_at               created_by_id  reference                             reference_type
rFgycCGE75FgOkW48gkW  pZAgEF9uyFfxDG  2023-09-26 15:21:46  DzTjkKse       dd067ed4-de2a-4525-b0ac-842fd6a80328  redun_id
Storage
id        root                                               type   region  updated_at           created_by_id
yEGjlxac  /home/runner/work/redun-lamin-fasta/redun-lami...  local  None    2023-09-26 15:21:39  DzTjkKse
Transform
id              name               short_name  version  type      reference                                       reference_type  initial_version_id  updated_at           created_by_id
pZAgEF9uyFfxDG  lamin-redun-fasta  None        0.1.0    pipeline  https://github.com/laminlabs/redun-lamin-fasta  None            None                2023-09-26 15:21:47  DzTjkKse
User
id        handle     email               name        updated_at
DzTjkKse  testuser1  testuser1@lamin.ai  Test User1  2023-09-26 15:21:39

Delete the test instance:

!lamin delete --force redun-lamin-fasta
💡 deleting instance testuser1/redun-lamin-fasta
✅     deleted instance settings file: /home/runner/.lamin/instance--testuser1--redun-lamin-fasta.env
✅     instance cache deleted
✅     deleted '.lndb' sqlite file
❗     consider manually deleting your stored data: /home/runner/work/redun-lamin-fasta/redun-lamin-fasta/docs