Redun#

Here, we’ll see how to track redun workflow runs with LaminDB.

Note

This use case is based on github.com/ricomnl/bioinformatics-pipeline-tutorial.

Setup#

!lamin init --storage . --name redun-lamin-fasta
💡 connected lamindb: testuser1/redun-lamin-fasta

Register the workflow#

import lamindb as ln
import json
💡 connected lamindb: testuser1/redun-lamin-fasta

Register the workflow in the Transform registry:

ln.Transform(
    name="lamin-redun-fasta",
    type="pipeline",
    version="0.1.0",
    reference="https://github.com/laminlabs/redun-lamin-fasta",
).save()
Transform(uid='salPyO83O2dRxYr3', name='lamin-redun-fasta', version='0.1.0', type='pipeline', reference='https://github.com/laminlabs/redun-lamin-fasta', updated_at=2024-05-01 18:50:14 UTC, created_by_id=1)
How to amend a redun workflow.py to register input & output files in LaminDB?

To register and query input files via LaminDB, we added the following lines to workflow.py:

# register input files in lamindb
ln.save(ln.Artifact.from_dir(input_dir))
# query & track this pipeline
transform = ln.Transform.filter(name="lamin-redun-fasta", version="0.1.0").one()
ln.track(transform=transform)
# query input files
input_filepaths = [
    file.stage() for file in ln.Artifact.filter(key__startswith="fasta/")
]
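
The `key__startswith="fasta/"` filter selects artifacts by key prefix. As a rough plain-Python analogy (not LaminDB code), the selection behaves like the sketch below; the list of keys is taken from the artifacts that appear in this notebook, and `select_by_prefix` is a hypothetical helper name.

```python
# Plain-Python analogy of the key prefix filter; this is not LaminDB code.
# The keys are the artifact keys that appear in this notebook.
artifact_keys = [
    "fasta/PO5F1.fasta",
    "fasta/KLF4.fasta",
    "fasta/MYC.fasta",
    "fasta/SOX2.fasta",
    "data/results.tgz",
]


def select_by_prefix(keys: list[str], prefix: str) -> list[str]:
    """Return the keys that start with `prefix`, preserving order."""
    return [key for key in keys if key.startswith(prefix)]


# Only the FASTA inputs match; the results archive is excluded.
input_keys = select_by_prefix(artifact_keys, "fasta/")
print(input_keys)
```

Filtering on the `fasta/` prefix keeps the pipeline from picking up artifacts stored under other keys, such as the results archive under `data/`.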

To register the output file via LaminDB, we added the following line to the last task:

ln.Artifact(output_path).save()

Run redun#

Let’s see what the input files are:

!ls ./fasta
KLF4.fasta  MYC.fasta  PO5F1.fasta  SOX2.fasta

And call the workflow:

!redun run workflow.py main --input-dir ./fasta --tag run=test-run 1> redun_stdout.txt 2> redun_stderr.txt
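
Outside a notebook, the same call could be made from Python. The sketch below is one possible way to do this with the standard library, writing stdout and stderr to separate files like the shell redirections above. The helper names `build_redun_command` and `run_and_capture` are made up for illustration, and the actual execution is left commented out because it needs redun and workflow.py in the working directory.

```python
import subprocess
from pathlib import Path


def build_redun_command(input_dir: str, run_tag: str) -> list[str]:
    """Assemble the argument list for the `redun run` call shown above."""
    return [
        "redun", "run", "workflow.py", "main",
        "--input-dir", input_dir,
        "--tag", f"run={run_tag}",
    ]


def run_and_capture(command: list[str], stdout_path: str, stderr_path: str) -> int:
    """Run `command`, writing stdout and stderr to separate files.

    Mirrors the shell redirections `1> file` and `2> file`.
    """
    with Path(stdout_path).open("w") as out, Path(stderr_path).open("w") as err:
        completed = subprocess.run(command, stdout=out, stderr=err, check=False)
    return completed.returncode


command = build_redun_command("./fasta", "test-run")
# To execute (requires redun and workflow.py in the working directory):
# exit_code = run_and_capture(command, "redun_stdout.txt", "redun_stderr.txt")
```

Passing the command as a list avoids shell quoting issues, and returning the exit code lets the caller decide how to react to a failed run.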

Inspect the output:

!cat redun_stdout.txt
💡 connected lamindb: testuser1/redun-lamin-fasta
❗ this creates one artifact per file in the directory - you might simply call ln.Artifact(dir) to get one artifact for the entire directory
❗ no run & transform get linked, consider calling ln.track()
❗ no run & transform get linked, consider calling ln.track()
❗ no run & transform get linked, consider calling ln.track()
❗ no run & transform get linked, consider calling ln.track()
💡 loaded: Transform(uid='salPyO83O2dRxYr3', name='lamin-redun-fasta', version='0.1.0', type='pipeline', reference='https://github.com/laminlabs/redun-lamin-fasta', updated_at=2024-05-01 18:50:14 UTC, created_by_id=1)
💡 saved: Run(uid='xf99FUoCijyXUxELh2In', transform_id=1, created_by_id=1)
File(path=/home/runner/work/redun-lamin-fasta/redun-lamin-fasta/docs/data/results.tgz, hash=124ab42c)

And the last line of the stderr log:

!tail -1 redun_stderr.txt
[redun] Execution duration: 2.05 seconds

View data lineage:

artifact = ln.Artifact.filter(key="data/results.tgz").one()  # query by key
artifact.view_lineage()
_images/80b98f39a38382f2c5cd9e6e424c3f60e9946b1e17d3bdc54a0b54d35836147d.svg

Register the redun execution id#

To make the redun execution ID queryable in LaminDB, export it from redun and store it on the run record:

# export the run information from redun
!redun log --exec --exec-tag run=test-run --format json --no-pager > redun_exec.json
# load the redun execution id from the JSON and store it in the LaminDB run record
with open("redun_exec.json") as f:
    redun_exec = json.load(f)
artifact.run.reference = redun_exec["id"]
artifact.run.reference_type = "redun_id"
artifact.run.save()
Run(uid='xf99FUoCijyXUxELh2In', started_at=2024-05-01 18:50:18 UTC, reference='a3496344-4278-446c-a958-6ed9adf3467b', reference_type='redun_id', transform_id=1, created_by_id=1)
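
The lookup `redun_exec["id"]` assumes the export is a JSON object with an `id` field. A slightly more defensive variant might look like the sketch below; `extract_execution_id` is a hypothetical helper, and the sample payload is trimmed to the single field used here, so a real export will contain more.

```python
import json


def extract_execution_id(raw_json: str) -> str:
    """Return the "id" field of a redun execution record given as JSON text."""
    record = json.loads(raw_json)
    if not isinstance(record, dict) or "id" not in record:
        raise ValueError("expected a JSON object with an 'id' field")
    return str(record["id"])


# Hypothetical, heavily trimmed payload; a real export contains more fields.
sample = '{"id": "a3496344-4278-446c-a958-6ed9adf3467b"}'
execution_id = extract_execution_id(sample)
print(execution_id)
```

Failing early with a clear message avoids writing an empty or wrong reference to the run record if the export is not in the expected shape.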

View the database content#

ln.view()
Artifact
id uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
5 0uoaYughGjhB1IhjXjcA 1 data/results.tgz .tgz None None None 83761 2spQe47CPa3IF2m3l9saag md5 None None 1.0 1.0 1 False 2024-05-01 18:50:20.322037+00:00 2024-05-01 18:50:20.322075+00:00 1
4 O2fVeND4d6aQhR83ZxaS 1 fasta/SOX2.fasta .fasta None None None 414 C5q_yaFXGk4SAEpfdqBwnQ md5 None None NaN NaN 1 False 2024-05-01 18:50:18.369007+00:00 2024-05-01 18:50:18.369030+00:00 1
3 WNahnRHhvirQGAQkmJP1 1 fasta/MYC.fasta .fasta None None None 536 WGbEtzPw-3bQEGcngO_pHQ md5 None None NaN NaN 1 False 2024-05-01 18:50:18.368418+00:00 2024-05-01 18:50:18.368441+00:00 1
2 OZHZC6xQQbIUqf0BWFDN 1 fasta/KLF4.fasta .fasta None None None 609 LyuoYkWs4SgYcH7P7JLJtA md5 None None NaN NaN 1 False 2024-05-01 18:50:18.367661+00:00 2024-05-01 18:50:18.367685+00:00 1
1 geRxHhFts9MKKvbOz54L 1 fasta/PO5F1.fasta .fasta None None None 477 -7iJgveFO9ia0wE1bqVu6g md5 None None NaN NaN 1 False 2024-05-01 18:50:18.366452+00:00 2024-05-01 18:50:18.366487+00:00 1

Run
id uid transform_id started_at finished_at created_by_id json report_id environment_id is_consecutive reference reference_type created_at
1 xf99FUoCijyXUxELh2In 1 2024-05-01 18:50:18.374249+00:00 None 1 None None None None a3496344-4278-446c-a958-6ed9adf3467b redun_id 2024-05-01 18:50:18.374384+00:00

Storage
id uid root description type region instance_uid created_at updated_at created_by_id
1 yO09IJOQw3di /home/runner/work/redun-lamin-fasta/redun-lami... None local None 8SgWe7slTFKk 2024-05-01 18:50:13.129498+00:00 2024-05-01 18:50:13.129524+00:00 1

Transform
id uid name key version description type latest_report_id source_code_id reference reference_type created_at updated_at created_by_id
1 salPyO83O2dRxYr3 lamin-redun-fasta None 0.1.0 None pipeline None None https://github.com/laminlabs/redun-lamin-fasta None 2024-05-01 18:50:14.666261+00:00 2024-05-01 18:50:14.666293+00:00 1

User
id uid handle name created_at updated_at
1 DzTjkKse testuser1 Test User1 2024-05-01 18:50:13.124749+00:00 2024-05-01 18:50:13.124780+00:00

Delete the test instance:

!lamin delete --force redun-lamin-fasta
💡 deleting instance testuser1/redun-lamin-fasta
💡 not deleting instance from hub as instance not found there
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.14/x64/bin/lamin", line 8, in <module>
    sys.exit(main())
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/rich_click/rich_command.py", line 360, in __call__
    return super().__call__(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/rich_click/rich_command.py", line 152, in main
    rv = self.invoke(ctx)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/lamin_cli/__main__.py", line 103, in delete
    return delete(instance, force=force)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/lamindb_setup/_delete.py", line 176, in delete
    isettings.storage.root.rmdir()
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/pathlib.py", line 1215, in rmdir
    self._accessor.rmdir(self)
OSError: [Errno 39] Directory not empty: '/home/runner/work/redun-lamin-fasta/redun-lamin-fasta/docs'
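
The traceback above ends in `pathlib.Path.rmdir`, which only removes empty directories. Since the instance was initialized with `--storage .`, the storage root is presumably the notebook directory itself, which still contains other files. The sketch below reproduces that behavior with the standard library only; it does not reflect how lamin handles deletion, and the temporary directory and file name are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path

# Build a throwaway directory that contains one file.
root = Path(tempfile.mkdtemp())
(root / "notebook.ipynb").write_text("{}")

try:
    root.rmdir()  # only succeeds for empty directories
    rmdir_failed = False
except OSError as error:
    rmdir_failed = True
    print(f"rmdir failed: {error}")

# shutil.rmtree removes the directory together with its contents.
shutil.rmtree(root)
print(root.exists())
```

Recursive removal is only appropriate for a directory dedicated to the instance; pointing `--storage` at a separate, dedicated directory would likely avoid the problem in the first place.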