Errors for constructing File from data#
import lamindb as ln
import pandas as pd
import pytest
import re
✅ Loaded instance: testuser1/mydata
df = pd.DataFrame({"a": [0, 1], "b": [2, 3]})
No data source#
If we construct a File without providing a source run, a ValueError is raised:
with pytest.raises(ValueError):
    file = ln.File(df)
💬 No run & transform get linked to this file
💡 Consider using the `run` argument or ln.track()
The error message suggests two fixes: pass a run explicitly via the `run` argument, or call ln.track().
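The two fixes correspond to two places a run can come from: an explicit `run` argument or a globally tracked context. The following is a toy model of that lookup in plain Python, not lamindb's actual implementation. `ToyRun`, `ToyContext`, and `resolve_run` are hypothetical names introduced only for illustration, and the precedence of the explicit argument over the tracked run is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyRun:
    """Hypothetical stand-in for a run record."""
    transform_name: str


@dataclass
class ToyContext:
    """Hypothetical stand-in for a global tracking context."""
    run: Optional[ToyRun] = None


context = ToyContext()


def resolve_run(run: Optional[ToyRun] = None) -> ToyRun:
    """Return the explicit run if given, else the tracked run, else raise."""
    if run is not None:
        return run  # fix 1: a run passed explicitly
    if context.run is not None:
        return context.run  # fix 2: a run set by a tracking call
    raise ValueError("No run is linked: pass `run` or track the context first")


# Without any run, resolution fails, mirroring the error shown above.
try:
    resolve_run()
except ValueError as error:
    print(f"ValueError: {error}")

# Fix 1: an explicit run is used as-is (assumed precedence in this sketch).
manual = ToyRun(transform_name="My test pipeline")
assert resolve_run(manual) is manual

# Fix 2: a tracked run is used when no explicit run is passed.
context.run = ToyRun(transform_name="02-ingest")
assert resolve_run() is context.run
```

The point of the sketch is only the fallback order: an argument, then a global context, then a hard failure rather than a silently unlinked file.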
Fix using manually created run record#
Let’s create a run record that matches a pipeline that we’re running:
transform = ln.Transform(name="My test pipeline")
run = ln.Run(transform=transform)
run
Run(id='UWDd66nPoKcB8x1KO7ty', created_by_id='DzTjkKse')
run.transform
Transform(name='My test pipeline', type=notebook, created_by_id='DzTjkKse')
file = ln.File(df, run=run, name="My test data")
💡 file will be copied to storage upon `ln.add()` using storage key = NW4Y99ftx3Szmouiogm1.parquet
The manually created run is now linked to the file:
file.run
Run(id='UWDd66nPoKcB8x1KO7ty', created_by_id='DzTjkKse')
Fix using automatically created run record from notebook#
Alternatively, we can call ln.track(), which auto-assigns the notebook run as the data source:
ln.track()
💬 Instance: testuser1/mydata
💬 User: testuser1
✅ Added: Transform(id='jlxQlPT93Htf', version='0', name='02-ingest', type=notebook, title='Errors for constructing `File` from data', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 25, 48))
✅ Added: Run(id='cS2YVpHTHhaBU8h28BpW', transform_id='jlxQlPT93Htf', transform_version='0', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 25, 48))
file = ln.File(df, name="My test data")
💡 file will be copied to storage upon `ln.add()` using storage key = fajjAJ2e57WoIg4UKMly.parquet
file.run
Run(id='cS2YVpHTHhaBU8h28BpW', transform_id='jlxQlPT93Htf', transform_version='0', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 25, 48))
You can also fetch the current notebook run directly:
ln.context.run
Run(id='cS2YVpHTHhaBU8h28BpW', transform_id='jlxQlPT93Htf', transform_version='0', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 25, 48))
No name#
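Even with a run tracked, constructing a File from in-memory data without a name raises a ValueError:
with pytest.raises(ValueError):
    file = ln.File(df)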
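One plausible reason for this requirement, stated here as an assumption rather than as documented lamindb behavior, is that in-memory data has no file path from which a name could be derived. The sketch below models that rule in plain Python. `resolve_name` is a hypothetical helper, and a plain dict stands in for the DataFrame so that the example has no dependencies.

```python
from pathlib import Path
from typing import Any, Optional


def resolve_name(data: Any, name: Optional[str] = None) -> str:
    """Pick a name: explicit argument first, then the filename, else raise."""
    if name is not None:
        return name
    if isinstance(data, (str, Path)):
        # Path-like input carries a filename that can serve as the name.
        return Path(data).name
    # In-memory objects have no filename to fall back on.
    raise ValueError("In-memory data has no filename: pass `name`")


in_memory = {"a": [0, 1], "b": [2, 3]}  # stand-in for the DataFrame above

# An explicit name works for in-memory data.
assert resolve_name(in_memory, name="My test data") == "My test data"

# A path supplies its own name.
assert resolve_name("data/measurements.parquet") == "measurements.parquet"

# In-memory data without a name fails, as in the cell above.
try:
    resolve_name(in_memory)
except ValueError as error:
    print(f"ValueError: {error}")
```

In the notebook itself, the fix is the one already used earlier: pass `name="My test data"` when constructing the File.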