Loading relationships: Session
import lamindb as ln
import lamindb.schema as lns
import pytest
from sqlalchemy.orm.exc import DetachedInstanceError
✅ Loaded instance: testuser1/mydata
ln.track()
💬 Instance: testuser1/mydata
💬 User: testuser1
✅ Added: Transform(id='KGYsQOIpS43O', version='0', name='01-session', type=notebook, title='Loading relationships: `Session`', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 25, 44))
✅ Added: Run(id='8nAL2b2CwjGL5q4fFY7D', transform_id='KGYsQOIpS43O', transform_version='0', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 25, 44))
Let’s create related sample data records and add them to the database:
transform = ln.Transform(name="Transform A")
run = ln.Run(name="Solve Problem X", transform=transform)
transform
Transform(name='Transform A', type=notebook, created_by_id='DzTjkKse')
run
Run(id='5ejiAJDM21F5BxGqMEVW', name='Solve Problem X', created_by_id='DzTjkKse')
run.transform
Transform(name='Transform A', type=notebook, created_by_id='DzTjkKse')
ln.add(run)
Run(id='5ejiAJDM21F5BxGqMEVW', name='Solve Problem X', transform_id='9I9th7fJqaX3', transform_version='0', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 25, 44))
Both records were just added to the database.
In the background, a Session object was created, which connected to the database, inserted the records, and closed the connection.
Query results without session
run_queried = ln.select(ln.Run, name="Solve Problem X").first()
Here too, a session was created and closed in the background. This is good enough if we only need simple properties of the returned record, for instance, the transform id (transform_id):
run_queried
Run(id='5ejiAJDM21F5BxGqMEVW', name='Solve Problem X', transform_id='9I9th7fJqaX3', transform_version='0', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 25, 44))
However, if we try to access a lazily loaded relationship such as inputs, we get a DetachedInstanceError (it tells us that the “lazy load operation of attribute ‘inputs’ cannot proceed”):
with pytest.raises(DetachedInstanceError):
    run_queried.inputs
The queried run would need an open connection to the database in order to load the related records automatically: under the hood, accessing the attribute triggers a query.
But when ln.select(...).first() completed its execution, the database connection was closed.
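To make the mechanism concrete, here is a minimal sketch that uses only Python's built-in sqlite3 module. It is not how lamindb or SQLAlchemy implement lazy loading; the ToyRun class, the table names, and the file name are hypothetical. It only shows why a simple attribute survives a closed connection while a lazily loaded one does not.

```python
import sqlite3

# Hypothetical toy schema; these are not lamindb's actual tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE run (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE run_input (run_id TEXT, file_name TEXT)")
conn.execute("INSERT INTO run VALUES ('r1', 'Solve Problem X')")
conn.execute("INSERT INTO run_input VALUES ('r1', 'raw.csv')")


class ToyRun:
    """Holds simple columns in memory; loads `inputs` only on access."""

    def __init__(self, conn, id, name):
        self._conn = conn
        self.id = id
        self.name = name

    @property
    def inputs(self):
        # The lazy load: a second query, issued at attribute access time.
        rows = self._conn.execute(
            "SELECT file_name FROM run_input WHERE run_id = ?", (self.id,)
        )
        return [file_name for (file_name,) in rows]


row = conn.execute(
    "SELECT id, name FROM run WHERE name = ?", ("Solve Problem X",)
).fetchone()
toy_run = ToyRun(conn, *row)
inputs_while_open = toy_run.inputs  # works: the connection is still open

conn.close()
name_after_close = toy_run.name  # simple attribute: already in memory
try:
    toy_run.inputs  # needs a query, but the connection is closed
    lazy_load_failed = False
except sqlite3.ProgrammingError:
    lazy_load_failed = True
```

In this toy version the failure surfaces as sqlite3.ProgrammingError; in the notebook, the analogous failure is the DetachedInstanceError.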
Note
Some relationships can be pre-configured to always load: ln.Run.transform is such a case, see here.
Hence, you'll be able to access run_queried.transform even though the session is closed.
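The idea behind an always-loaded relationship can be sketched with plain sqlite3 and dataclasses. This is an illustration under assumptions, not lamindb's configuration mechanism: ToyRun, ToyTransform, and the table layout are hypothetical. The related record is fetched in the same query as the run, so nothing remains to be loaded once the connection is closed.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ToyTransform:
    id: str
    name: str


@dataclass
class ToyRun:
    id: str
    name: str
    transform: ToyTransform  # filled at query time, not on first access


# Hypothetical toy schema; these are not lamindb's actual tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transform (id TEXT PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE run (id TEXT PRIMARY KEY, name TEXT, transform_id TEXT)"
)
conn.execute("INSERT INTO transform VALUES ('t1', 'Transform A')")
conn.execute("INSERT INTO run VALUES ('r1', 'Solve Problem X', 't1')")

# A single JOIN fetches the run and its transform together.
row = conn.execute(
    "SELECT run.id, run.name, transform.id, transform.name "
    "FROM run JOIN transform ON run.transform_id = transform.id "
    "WHERE run.name = ?",
    ("Solve Problem X",),
).fetchone()
conn.close()

toy_run = ToyRun(row[0], row[1], ToyTransform(row[2], row[3]))
transform_name = toy_run.transform.name  # no query needed after close
```

The trade-off is that every query for a run pays for the join, which is presumably why only selected relationships are configured this way.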
The Session object
To lazily load related data records, we need to use a Session object:
ss = ln.Session()
The Session object comes with add, delete and select, just like the global namespace. They are equivalent to the global versions, with the only difference being that all data records they handle are bound to an open session.
run_session = ss.select(ln.Run, name="Solve Problem X").first()
run_session
[session open] Run(id='5ejiAJDM21F5BxGqMEVW', name='Solve Problem X', transform_id='9I9th7fJqaX3', transform_version='0', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 25, 44))
We don't need the open session for simple attributes, but we do need it for lazily loaded relationships:
run_session.inputs
[]
Let us close the session.
ss.close()
Because we already loaded the inputs, they're still available in memory.
run_session.inputs
[]
But we can't access the outputs relationship, as the session is now closed.
with pytest.raises(DetachedInstanceError):
    run_session.outputs
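The difference between the two relationships above comes down to caching: whatever was loaded while the session was open stays on the object, and everything else would still need a query. A minimal sketch of that behavior follows, using functools.cached_property. ToySession and ToyRun are hypothetical stand-ins, and RuntimeError stands in for DetachedInstanceError.

```python
from functools import cached_property


class ToySession:
    """Hypothetical stand-in for a database session."""

    def __init__(self):
        self.is_open = True

    def load(self, relationship):
        if not self.is_open:
            raise RuntimeError(f"lazy load of {relationship!r} cannot proceed")
        return []  # pretend the query returned no related records

    def close(self):
        self.is_open = False


class ToyRun:
    def __init__(self, session):
        self._session = session

    @cached_property
    def inputs(self):
        return self._session.load("inputs")

    @cached_property
    def outputs(self):
        return self._session.load("outputs")


session = ToySession()
toy_run = ToyRun(session)
toy_run.inputs  # loaded while the session is open, then cached on the instance
session.close()

inputs_after_close = toy_run.inputs  # served from the cache, no query
try:
    toy_run.outputs  # never loaded, so it would still need the session
    outputs_failed = False
except RuntimeError:
    outputs_failed = True
```

Note that a failed load caches nothing, so the attribute keeps raising until it is accessed through an open session.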
The Session object in a context manager
We can also use Session as a context manager:
with ln.Session() as ss:
    run_session2 = ss.select(ln.Run, name="Solve Problem X").first()
    print(run_session2.outputs)
[]
Because we loaded the outputs, they're still in memory and available:
run_session2.outputs
[]
Accessing another relationship, however, will error:
with pytest.raises(DetachedInstanceError):
    run_session2.inputs
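The context-manager behavior in this section relies on Python's with protocol: the session is closed when the block exits, whether it exits normally or through an exception. Here is a minimal sketch of that protocol, with a hypothetical ToySession class that is not lamindb's Session implementation.

```python
class ToySession:
    """Hypothetical session that closes itself on leaving a `with` block."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised inside the block


with ToySession() as ss:
    open_inside_block = ss.is_open  # the session is usable here

open_after_block = ss.is_open  # closed automatically on exit

try:
    with ToySession() as failing_ss:
        raise ValueError("something went wrong mid-block")
except ValueError:
    pass
closed_after_error = not failing_ss.is_open  # closed despite the exception
```

Compared with calling close() by hand, the with form cannot leave a session open by accident, which is why any relationship we need later has to be accessed inside the block.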