Track the redun run

LaminDB tracks only the inputs, outputs, and execution ID of a redun run.

More fine-grained information is tracked by redun itself.

import lamindb as ln
import lamindb.schema as lns
import os
import json

ln.nb.header()
author: Test User1 (testuser1)
id: CKhlMCFA52oD
version: 0
time_init: 2022-11-13 21:36
time_run: 2023-03-09 17:02
pypackage: lamindb==0.30.3
ℹ️ Instance: testuser1/fasta
ℹ️ Added notebook: CKhlMCFA52oD v0
ℹ️ Added run: tP9qjkt7COobBAKeBCTY

Query input data and create a run

This is an artificial situation in which we pretend our input data comes from a notebook with ID 0ymQDuqM5Lwq:

input_dobjects = (
    ln.select(ln.DObject).join(lns.Run).join(lns.Notebook, id="0ymQDuqM5Lwq")
)

input_dobjects.df()
id                    name   suffix  size  hash                    source_id             storage_id  created_at           updated_at
KqkKXJ457k00kO010gve  MYC    .fasta  535   yT6x3fflTrhfifJIZqpNGQ  NsKuQJ3EYpITaGUS03jt  23mKzOkS    2023-03-09 17:02:39  None
dsgk069xk49S3Dc2RcqE  PO5F1  .fasta  476   q-HnUbqF7Z5qCeTUv9xDig  NsKuQJ3EYpITaGUS03jt  23mKzOkS    2023-03-09 17:02:39  None
Ed25tCL5xDLDAChkBJ7x  SOX2   .fasta  413   rhr_rPQ9hb3bDGAIDNJj4Q  NsKuQJ3EYpITaGUS03jt  23mKzOkS    2023-03-09 17:02:39  None
iM7uwupZurkXw1us8Fsg  KLF4   .fasta  608   52g58tOwGbdCohjFCkfcpA  NsKuQJ3EYpITaGUS03jt  23mKzOkS    2023-03-09 17:02:39  None

Let’s create a LaminDB run record and link it against the input files:

pipeline = ln.select(lns.Pipeline, name="lamin-redun-fasta").one()  # load a pipeline
run = lns.Run(
    name="Test run", pipeline=pipeline
)  # create a run record that is linked against the pipeline
run.inputs = input_dobjects.all()  # link inputs to run
run = ln.add(run)  # add run to DB
run
Run(id='1Sn4xFNvlGCiPVD95aIk', name='Test run', pipeline_id='R8QwchFP', pipeline_v='0.1.0', created_by='DzTjkKse', created_at=datetime.datetime(2023, 3, 9, 17, 2, 44))

Execute redun

To pass the run ID to the redun CLI, export run.id as an environment variable:

os.environ["LNDB_RUN_ID"] = run.id

Now call the workflow. Tagging the execution with run.id is optional, but it makes the execution easy to query via redun log later:

!redun run workflow.py main --run-id $LNDB_RUN_ID --tag run-id=$LNDB_RUN_ID 1> redun_stdout.txt 2> redun_stderr.txt
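
The same invocation can also be assembled in Python, which avoids relying on shell expansion of $LNDB_RUN_ID inside the notebook. The following is a minimal sketch that only mirrors the arguments of the shell line above; build_redun_command and run_redun are hypothetical helpers, not part of redun or LaminDB:

```python
import subprocess


def build_redun_command(run_id, script="workflow.py", task="main"):
    """Assemble the `redun run` invocation from this guide as an argument list.

    Hypothetical helper for illustration; the flags mirror the shell line above.
    """
    return [
        "redun", "run", script, task,
        "--run-id", run_id,            # workflow argument, as in the shell line
        "--tag", f"run-id={run_id}",   # optional tag for later `redun log` queries
    ]


def run_redun(run_id):
    """Run the workflow, writing stdout and stderr to the same files as above."""
    with open("redun_stdout.txt", "w") as out, open("redun_stderr.txt", "w") as err:
        return subprocess.run(
            build_redun_command(run_id), stdout=out, stderr=err, check=True
        )


# Usage in this notebook (not executed here):
# run_redun(os.environ["LNDB_RUN_ID"])
```

Passing the arguments as a list means the run ID is never re-parsed by a shell, and check=True raises an exception if redun exits with a non-zero status.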

Inspect the output:

!cat redun_stdout.txt
File(path=/home/runner/work/redun-lamin-fasta/redun-lamin-fasta/docs/guide/data/results.tgz, hash=0d9727b7)
!tail -1 redun_stderr.txt
2023-03-09 17:02:50,732:INFO - Execution duration: 1.74 seconds
!redun log --exec --exec-tag run-id=$LNDB_RUN_ID --format json --no-pager > redun_exec.json
with open("redun_exec.json") as f:  # close the file handle after loading
    redun_exec = json.load(f)

redun_exec
{'_type': 'Execution',
 '_version': 1,
 'args': '["/home/runner/work/redun-lamin-fasta/redun-lamin-fasta/.nox/build-3-9/bin/redun", "run", "workflow.py", "main", "--run-id", "1Sn4xFNvlGCiPVD95aIk", "--tag", "run-id=1Sn4xFNvlGCiPVD95aIk"]',
 'id': '97b0ca53-4b09-470c-b97f-085463abd45d',
 'job_id': 'f6263b60-03fc-4361-a0a2-57e1889889b0'}
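
Note that the args field of this record is itself a JSON-encoded string rather than a list. As a sanity check before writing the execution ID to LaminDB, one could decode it and confirm that the execution was launched with the expected run ID. The sketch below assumes the record has the shape shown above; run_id_from_exec is a hypothetical helper:

```python
import json


def run_id_from_exec(exec_record):
    """Return the value following `--run-id` in a redun execution record.

    Assumes `exec_record["args"]` is a JSON-encoded list of CLI arguments,
    as in the output above. Returns None if `--run-id` is absent.
    """
    args = json.loads(exec_record["args"])
    if "--run-id" in args:
        idx = args.index("--run-id")
        if idx + 1 < len(args):
            return args[idx + 1]
    return None


# A record shaped like the one above (argument list abbreviated).
example = {
    "_type": "Execution",
    "args": json.dumps(
        ["redun", "run", "workflow.py", "main", "--run-id", "1Sn4xFNvlGCiPVD95aIk"]
    ),
    "id": "97b0ca53-4b09-470c-b97f-085463abd45d",
}
print(run_id_from_exec(example))  # prints 1Sn4xFNvlGCiPVD95aIk
```

In this notebook, comparing run_id_from_exec(redun_exec) with run.id would catch the case where the tag query returned an execution that belongs to a different run.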

Track redun outputs and execution ID

run = ln.select(lns.Run, id=run.id).one()
run.external_id = redun_exec["id"]
ln.add(run)
Run(id='1Sn4xFNvlGCiPVD95aIk', name='Test run', external_id='97b0ca53-4b09-470c-b97f-085463abd45d', pipeline_id='R8QwchFP', pipeline_v='0.1.0', created_by='DzTjkKse', created_at=datetime.datetime(2023, 3, 9, 17, 2, 44))

There is just a single output file to track here:

dobject = ln.DObject(data="data/results.tgz", source=run)
ln.add(dobject)
DObject(id='7Yq6PwJoF6OmnyGUrHWw', name='results', suffix='.tgz', size=83769, hash='-2S5ssheWzZo5ykRz5-K8g', source_id='1Sn4xFNvlGCiPVD95aIk', storage_id='23mKzOkS', created_at=datetime.datetime(2023, 3, 9, 17, 2, 53))
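
Before registering an archive like this, it can be useful to check what the workflow actually wrote into it. A minimal sketch using only the Python standard library; list_archive_members is a hypothetical helper, and the contents of results.tgz are not shown in this guide:

```python
import tarfile


def list_archive_members(path):
    """Return the sorted names of regular files in a (possibly compressed) tar archive."""
    # Mode "r:*" lets tarfile detect the compression (e.g. gzip for .tgz).
    with tarfile.open(path, "r:*") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


# Usage in this notebook (not executed here):
# list_archive_members("data/results.tgz")
```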

View the database content

ln.view()
****************
* module: core *
****************
DObject
id                    name     suffix  size   hash                    source_id             storage_id  created_at           updated_at
KqkKXJ457k00kO010gve  MYC      .fasta  535    yT6x3fflTrhfifJIZqpNGQ  NsKuQJ3EYpITaGUS03jt  23mKzOkS    2023-03-09 17:02:39  None
dsgk069xk49S3Dc2RcqE  PO5F1    .fasta  476    q-HnUbqF7Z5qCeTUv9xDig  NsKuQJ3EYpITaGUS03jt  23mKzOkS    2023-03-09 17:02:39  None
Ed25tCL5xDLDAChkBJ7x  SOX2     .fasta  413    rhr_rPQ9hb3bDGAIDNJj4Q  NsKuQJ3EYpITaGUS03jt  23mKzOkS    2023-03-09 17:02:39  None
iM7uwupZurkXw1us8Fsg  KLF4     .fasta  608    52g58tOwGbdCohjFCkfcpA  NsKuQJ3EYpITaGUS03jt  23mKzOkS    2023-03-09 17:02:39  None
7Yq6PwJoF6OmnyGUrHWw  results  .tgz    83769  -2S5ssheWzZo5ykRz5-K8g  1Sn4xFNvlGCiPVD95aIk  23mKzOkS    2023-03-09 17:02:53  None

Notebook
id            v  name         title                  created_by  created_at           updated_at
0ymQDuqM5Lwq  0  1-redun      Track redun workflows  DzTjkKse    2023-03-09 17:02:38  None
CKhlMCFA52oD  0  2-redun-run  Track the redun run    DzTjkKse    2023-03-09 17:02:44  None

Pipeline
id        v      name               reference                                       created_by  created_at           updated_at
R8QwchFP  0.1.0  lamin-redun-fasta  https://github.com/laminlabs/redun-lamin-fasta  DzTjkKse    2023-03-09 17:02:38  None

Run
id                    name      external_id                           pipeline_id  pipeline_v  notebook_id   notebook_v  created_by  created_at
NsKuQJ3EYpITaGUS03jt  None      None                                  None         None        0ymQDuqM5Lwq  0           DzTjkKse    2023-03-09 17:02:38
tP9qjkt7COobBAKeBCTY  None      None                                  None         None        CKhlMCFA52oD  0           DzTjkKse    2023-03-09 17:02:44
1Sn4xFNvlGCiPVD95aIk  Test run  97b0ca53-4b09-470c-b97f-085463abd45d  R8QwchFP     0.1.0       None          None        DzTjkKse    2023-03-09 17:02:44

Storage
id        root                                                type   region  created_at           updated_at
23mKzOkS  /home/runner/work/redun-lamin-fasta/redun-lami...   local  None    2023-03-09 17:02:34  None

User
id        email               handle     name        created_at           updated_at
DzTjkKse  testuser1@lamin.ai  testuser1  Test User1  2023-03-09 17:02:34  None