Track redun workflows#
Note
This use case builds on Rico Meinl’s GitHub repository and the accompanying blog post.
Tip
Source notebooks are in the redun-lamin-fasta repository.
While redun focuses on managing workflows for data pipelines, LaminDB offers a provenance-aware data lake.
redun schedules, executes, and tracks pipeline runs with fine-grained control and detailed metadata.
LaminDB’s data lake complements redun with:
data lineage across computational pipelines, interactive analyses (notebooks), and UI-submitted data
curating, querying & structuring data by biological entities
extensible & modular Python ORM for queries & data access
Track the workflow as a pipeline#
!lamin login testuser1@lamin.ai --password cEvcwMJFX4OwbsYVaMt2Os6GxxGgDUlBGILs2RyS
ℹ️ Your handle is testuser1 and your id is DzTjkKse.
!lamin init --storage ./fasta
ℹ️ Loading schema modules: core==0.29.1
ℹ️ Created instance testuser1/fasta
import lamindb as ln
import lamindb.schema as lns
from pathlib import Path
import redun_lamin_fasta
ln.nb.header()
author | Test User1 (testuser1) |
id | 0ymQDuqM5Lwq |
version | 0 |
time_init | 2022-11-13 21:29 |
time_run | 2023-03-09 17:02 |
pypackage_store | lamindb==0.10.0 |
pypackage_live | lamindb==0.30.3 redun_lamin_fasta==0.1.0 |
ℹ️ Instance: testuser1/fasta
ℹ️ Added notebook: 0ymQDuqM5Lwq v0
ℹ️ Added run: NsKuQJ3EYpITaGUS03jt
Create a pipeline record:
pipeline = lns.Pipeline(
    name="lamin-redun-fasta",
    v=redun_lamin_fasta.__version__,
    reference="https://github.com/laminlabs/redun-lamin-fasta",
)
Add the record to the database:
ln.add(pipeline)
Pipeline(id='R8QwchFP', v='0.1.0', name='lamin-redun-fasta', reference='https://github.com/laminlabs/redun-lamin-fasta', created_by='DzTjkKse', created_at=datetime.datetime(2023, 3, 9, 17, 2, 38))
Register the input files#
Let’s first register input files for processing with the redun pipeline.
!ls ./fasta
KLF4.fasta MYC.fasta PO5F1.fasta SOX2.fasta fasta.lndb
for filepath in Path("./fasta/").glob("*.fasta"):
    dobject = ln.DObject(filepath)
    ln.add(dobject)
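Before registering, it can help to check which files the glob pattern actually matches, since the storage directory also holds `fasta.lndb`. The following is a minimal sketch using only the standard library; the helper name `find_fasta_files` is hypothetical and not part of LaminDB or redun. It sorts the matches because `Path.glob` does not guarantee an order.

```python
from pathlib import Path


def find_fasta_files(directory):
    """Return the *.fasta files directly inside `directory`, sorted by path.

    Hypothetical helper for illustration; it is not part of lamindb or redun.
    """
    # Path.glob yields matches in no guaranteed order, so sort them to make
    # the registration order reproducible across runs and platforms.
    return sorted(p for p in Path(directory).glob("*.fasta") if p.is_file())


if __name__ == "__main__":
    # Print the names of the files that the registration loop would pick up.
    for filepath in find_fasta_files("./fasta"):
        print(filepath.name)
```

The returned list could then be iterated in place of the raw glob in the registration loop above, so that non-FASTA files such as `fasta.lndb` are visibly excluded before anything is added.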