Errors for constructing File from data

import lamindb as ln
import pandas as pd
import pytest
import re
✅ Loaded instance: testuser1/mydata
df = pd.DataFrame({"a": [0, 1], "b": [2, 3]})

No data source

If we construct a File without linking it to a source run, a ValueError is raised:

with pytest.raises(ValueError):
    file = ln.File(df)
💬 No run & transform get linked to this file
💡 Consider using the `run` argument or ln.track()

The hint in the message points to two fixes: pass a run record explicitly via the run argument, or call ln.track().

Fix using manually created run record

Let’s create a run record that matches a pipeline that we’re running:

transform = ln.Transform(name="My test pipeline")
run = ln.Run(transform=transform)
run
Run(id='UWDd66nPoKcB8x1KO7ty', created_by_id='DzTjkKse')
run.transform
Transform(name='My test pipeline', type=notebook, created_by_id='DzTjkKse')
file = ln.File(df, run=run, name="My test data")
💡 file will be copied to storage upon `ln.add()` using storage key = NW4Y99ftx3Szmouiogm1.parquet

We see that the manually created run is linked to the file.

file.run
Run(id='UWDd66nPoKcB8x1KO7ty', created_by_id='DzTjkKse')

Fix using automatically created run record from notebook

Alternatively, we can call ln.track(), which auto-assigns the notebook run as the data source:

ln.track()
💬 Instance: testuser1/mydata
💬 User: testuser1
✅ Added: Transform(id='jlxQlPT93Htf', version='0', name='02-ingest', type=notebook, title='Errors for constructing `File` from data', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 25, 48))
✅ Added: Run(id='cS2YVpHTHhaBU8h28BpW', transform_id='jlxQlPT93Htf', transform_version='0', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 25, 48))
file = ln.File(df, name="My test data")
💡 file will be copied to storage upon `ln.add()` using storage key = fajjAJ2e57WoIg4UKMly.parquet
file.run
Run(id='cS2YVpHTHhaBU8h28BpW', transform_id='jlxQlPT93Htf', transform_version='0', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 25, 48))

You can also access the current notebook run directly:

ln.context.run
Run(id='cS2YVpHTHhaBU8h28BpW', transform_id='jlxQlPT93Htf', transform_version='0', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 25, 48))
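
The two fixes above follow a simple lookup pattern, which can be sketched in plain Python. This is a toy model, not lamindb's implementation: Run, Context, track, and resolve_run are hypothetical stand-ins, the error text is made up, and the rule that an explicitly passed run takes precedence over the tracked one is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    """Hypothetical stand-in for a run record (not lamindb's Run)."""

    id: str


class Context:
    """Hypothetical global context, playing the role of ln.context."""

    run: Optional[Run] = None


def track(run_id: str = "notebook-run") -> Run:
    """Stand-in for ln.track(): register a run in the global context."""
    Context.run = Run(id=run_id)
    return Context.run


def resolve_run(run: Optional[Run] = None) -> Run:
    """Pick the run to link to a new file.

    Assumption: an explicitly passed run wins over the tracked one.
    """
    if run is not None:
        return run
    if Context.run is not None:
        return Context.run
    # Made-up message; the real library words its error differently.
    raise ValueError("No run linked: pass `run` or call track() first")


# Neither an explicit run nor a tracked run: the lookup fails.
try:
    resolve_run()
except ValueError as error:
    print(f"ValueError: {error}")

# Fix 1: pass a manually created run.
manual = Run(id="manual-run")
print(resolve_run(run=manual).id)

# Fix 2: track once, then omit the argument.
track()
print(resolve_run().id)
```

In this model, tracking once at the top of a notebook spares every later constructor call the explicit argument, while the run argument stays available for pipelines that manage their own run records.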

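No name

Even with a run tracked, constructing a File from an in-memory DataFrame without a name raises a ValueError: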

with pytest.raises(ValueError):
    file = ln.File(df)