ACID upload of data objects
Here we explore the ACID behavior of LaminDB’s upload API.
import lamindb as ln
from upath import UPath
import pytest
from laminci.db import setup_local_test_postgres
🔶 No row in the versions table
✅ Loaded instance: testuser1/sqlapi
pgurl = setup_local_test_postgres()
ln.setup.init(name="acidtests", storage="./acidtests", db=pgurl)
💬 Created Postgres test instance: 'postgresql://postgres:pwd@0.0.0.0:5432/pgtest'
It runs in docker container 'pgtest'
💬 Not registering instance on hub, if you want, call `lamin register`
💬 Loading schema modules: core==0.34.0
✅ Loaded instance: testuser1/acidtests
✅ Created & loaded instance: testuser1/acidtests
ln.track()
💬 Instance: testuser1/acidtests
💬 User: testuser1
✅ Added: Transform(id='zz23msudiiQR', version='0', name='07-ingest-acid', type=notebook, title='ACID upload of data objects', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 26, 18, 125463))
✅ Added: Run(id='eJ49ZBDx0sPDdSJrV0k3', transform_id='zz23msudiiQR', transform_version='0', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 26, 18, 144732))
Ingestion failure due to failed upload to storage
Let’s try to ingest a data object into a storage location we don't have write permission for.
# Create data object
adata = ln.dev.datasets.anndata_mouse_sc_lymph_node()
file = ln.File(adata, name="Mouse Lymph Node scRNA-seq")
💡 file will be copied to storage upon `ln.add()` using storage key = jfYJrtvdYPn9s8iTBg7T.h5ad
# Point the default storage root at an S3 bucket we cannot write to
ln.setup.settings.instance.storage._root = UPath("s3://nf-core-awsmegatests")
ln.setup.settings.storage.root
S3Path('s3://nf-core-awsmegatests/')
# Ingesting the data object should fail and raise a RuntimeError
with pytest.raises(RuntimeError) as e:
    ln.add(file)
print(e.exconly())
💡 storing object Mouse Lymph Node scRNA-seq with key jfYJrtvdYPn9s8iTBg7T.h5ad
RuntimeError: No entries were uploaded or committed to the database. See error message:
Install s3fs to access S3
NoneType: None
Let’s check that no metadata records were added to the database.
files = ln.select(ln.File).all()
assert len(files) == 0
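The behavior checked above (no database record survives a failed upload) can be pictured with a minimal standard-library sketch. This is not LaminDB's implementation: an in-memory sqlite3 database stands in for the Postgres instance, a plain dict stands in for the S3 bucket, and the helpers `upload` and `add_file` are hypothetical. The metadata row is staged in an open transaction, the object is uploaded, and the transaction is committed only if the upload succeeds; otherwise it is rolled back and a RuntimeError modeled on the message above is raised.

```python
import sqlite3


class UploadError(Exception):
    """Hypothetical error raised by the stand-in storage backend."""


def upload(key, data, storage, fail=False):
    # Hypothetical object store: a dict mapping storage keys to bytes.
    if fail:
        raise UploadError(f"could not write {key}")
    storage[key] = data


def add_file(conn, storage, key, data, fail_upload=False):
    """Stage the metadata row, upload the object, and commit only if both succeed."""
    try:
        # sqlite3 implicitly opens a transaction before this INSERT.
        conn.execute("INSERT INTO file (key) VALUES (?)", (key,))
        upload(key, data, storage, fail=fail_upload)
        conn.commit()
    except Exception as exc:
        # Discard the staged row so no record points to a missing object.
        conn.rollback()
        raise RuntimeError(
            "No entries were uploaded or committed to the database. "
            f"See error message:\n{exc}"
        ) from exc


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (key TEXT PRIMARY KEY)")
storage = {}

try:
    add_file(conn, storage, "example.h5ad", b"payload", fail_upload=True)
except RuntimeError as err:
    print(err)

# Neither a metadata row nor a stored object remains.
print(conn.execute("SELECT COUNT(*) FROM file").fetchone()[0], len(storage))  # 0 0
```

One gap this sketch leaves open: if the upload succeeds but `conn.commit()` then fails, the object stays in storage without a record. A fuller version would also delete the uploaded object in the `except` branch as a compensating action.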
Ingestion failure due to failed database transaction
Let’s try to add a second Project record that reuses the id of an existing one, violating the primary key's unique constraint.
added_project = ln.add(ln.Project(name="test-project"))
# Adding a second Project with the same id should fail and raise a RuntimeError
with pytest.raises(RuntimeError) as e:
    ln.add(ln.Project, id=added_project.id, name="conflict-project")
print(e.exconly())
RuntimeError: No entries were uploaded or committed to the database. See error message:
(psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "pk_lnschema_core_project"
DETAIL: Key (id)=(ZnbRPo7R) already exists.
[SQL: INSERT INTO lnschema_core_project (id, name, created_by_id, updated_at) VALUES (%(id)s, %(name)s, %(created_by_id)s, %(updated_at)s)]
[parameters: {'id': 'ZnbRPo7R', 'name': 'conflict-project', 'created_by_id': 'DzTjkKse', 'updated_at': None}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
NoneType: None
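A database-side failure can be sketched the same way, again with an in-memory sqlite3 database standing in for Postgres and a hypothetical `add_projects` helper (not a LaminDB function). All rows of one call go into a single transaction, so a primary-key conflict on any row rolls back the entire batch, mirroring the unique-constraint violation shown above. Note that sqlite3 reports the conflict as `sqlite3.IntegrityError`, whereas the Postgres driver above raises `UniqueViolation`.

```python
import sqlite3


def add_projects(conn, rows):
    """Insert all rows in a single transaction: either every row is committed or none is."""
    try:
        # Used as a context manager, a sqlite3 connection commits on success
        # and rolls back if the block raises.
        with conn:
            conn.executemany("INSERT INTO project (id, name) VALUES (?, ?)", rows)
    except sqlite3.IntegrityError as exc:
        raise RuntimeError(
            "No entries were uploaded or committed to the database. "
            f"See error message:\n{exc}"
        ) from exc


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
add_projects(conn, [("p1", "test-project")])

try:
    # The second row reuses the primary key "p1", so the whole batch is rejected,
    # including the otherwise valid first row.
    add_projects(conn, [("p2", "other-project"), ("p1", "conflict-project")])
except RuntimeError as err:
    print(err)

print(conn.execute("SELECT id, name FROM project ORDER BY id").fetchall())
# [('p1', 'test-project')]
```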
Ingestion failure during list-based ingestion
If a list of data objects is passed to ln.add() and the upload of one of them fails, the uploads that already succeeded are kept and a RuntimeError is raised that lists the data objects uploaded successfully up to that point.
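The list-based behavior described here, where earlier uploads are kept rather than rolled back, can be sketched as a loop that stops at the first failure. `upload_all`, `flaky_upload`, and the error-message wording are hypothetical; they illustrate the described semantics, not LaminDB's code or its exact message.

```python
def upload_all(keys, upload):
    """Upload keys in order, stopping at the first failure.

    Earlier uploads are kept (not rolled back); the raised RuntimeError lists them.
    """
    uploaded = []
    for key in keys:
        try:
            upload(key)
        except Exception as exc:
            raise RuntimeError(
                f"Upload of {key!r} failed ({exc}). "
                f"Successfully uploaded before the failure: {uploaded}"
            ) from exc
        uploaded.append(key)
    return uploaded


store = {}


def flaky_upload(key):
    # Hypothetical backend that rejects one particular key.
    if key == "c.h5ad":
        raise OSError("permission denied")
    store[key] = b"payload"


try:
    upload_all(["a.h5ad", "b.h5ad", "c.h5ad", "d.h5ad"], flaky_upload)
except RuntimeError as err:
    print(err)
# Upload of 'c.h5ad' failed (permission denied). Successfully uploaded before the failure: ['a.h5ad', 'b.h5ad']

print(sorted(store))  # ['a.h5ad', 'b.h5ad']
```

Because earlier uploads are not undone, this path is not atomic across the whole batch; the reported list tells the caller which objects to skip when retrying.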