ACID upload of data objects

Here we explore the ACID behavior of LaminDB’s upload API.

import lamindb as ln
from upath import UPath
import pytest
from laminci.db import setup_local_test_postgres
🔶 No row in the versions table
✅ Loaded instance: testuser1/sqlapi
pgurl = setup_local_test_postgres()
ln.setup.init(name="acidtests", storage="./acidtests", db=pgurl)
💬 Created Postgres test instance: 'postgresql://postgres:pwd@0.0.0.0:5432/pgtest'
It runs in docker container 'pgtest'
💬 Not registering instance on hub, if you want, call `lamin register`
💬 Loading schema modules: core==0.34.0 
✅ Loaded instance: testuser1/acidtests
✅ Created & loaded instance: testuser1/acidtests
ln.track()
💬 Instance: testuser1/acidtests
💬 User: testuser1
✅ Added: Transform(id='zz23msudiiQR', version='0', name='07-ingest-acid', type=notebook, title='ACID upload of data objects', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 26, 18, 125463))
✅ Added: Run(id='eJ49ZBDx0sPDdSJrV0k3', transform_id='zz23msudiiQR', transform_version='0', created_by_id='DzTjkKse', created_at=datetime.datetime(2023, 5, 30, 20, 26, 18, 144732))

Ingestion failure due to failed upload to storage

Let’s try to ingest a data object into a storage location that we have no permission to write to.

# Create data object
adata = ln.dev.datasets.anndata_mouse_sc_lymph_node()
file = ln.File(adata, name="Mouse Lymph Node scRNA-seq")
💡 file will be copied to storage upon `ln.add()` using storage key = jfYJrtvdYPn9s8iTBg7T.h5ad
# Point the instance at a storage location we cannot write to
ln.setup.settings.instance.storage._root = UPath("s3://nf-core-awsmegatests")
ln.setup.settings.storage.root
S3Path('s3://nf-core-awsmegatests/')
# Ingest data object
with pytest.raises(RuntimeError) as error:
    ln.add(file)
print(error.exconly())
💡 storing object Mouse Lymph Node scRNA-seq with key jfYJrtvdYPn9s8iTBg7T.h5ad
RuntimeError: No entries were uploaded or committed to the database. See error message:

Install s3fs to access S3

NoneType: None

Let’s check that no metadata records were added to the database.

files = ln.select(ln.File).all()
assert len(files) == 0
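
The all-or-nothing behavior checked above can be illustrated without LaminDB. The following is a minimal sketch, not LaminDB's implementation: it assumes an in-memory SQLite database in place of Postgres, a hypothetical `ingest` helper, and a hypothetical `failing_upload` callable that stands in for a storage backend that rejects writes. The metadata row is inserted and the upload attempted inside one transaction, and any failure rolls the transaction back.

```python
import sqlite3


def failing_upload(key: str, payload: bytes) -> None:
    """Hypothetical storage backend that rejects every write."""
    raise PermissionError(f"cannot write {key}")


def ingest(conn, upload, key, payload):
    """Hypothetical helper: insert metadata and upload the payload as one unit."""
    try:
        conn.execute("INSERT INTO file (key) VALUES (?)", (key,))
        upload(key, payload)  # may raise
        conn.commit()  # only reached if the upload succeeded
    except Exception as exc:
        conn.rollback()  # discard the pending metadata row
        raise RuntimeError(
            "No entries were uploaded or committed to the database. "
            f"See error message:\n\n{exc}"
        ) from exc


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (key TEXT PRIMARY KEY)")
conn.commit()

try:
    ingest(conn, failing_upload, "example.h5ad", b"data")
except RuntimeError as err:
    message = str(err)

# The failed upload left no metadata behind.
n_rows = conn.execute("SELECT COUNT(*) FROM file").fetchone()[0]
```

Committing only after the upload returns is the design choice that keeps the database free of records that point to missing objects.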

Ingestion failure due to failed database transaction

Let’s try to add a second Project record that reuses the id of an existing one, violating the unique constraint on the primary key.

added_project = ln.add(ln.Project(name="test-project"))
with pytest.raises(RuntimeError) as error:
    ln.add(ln.Project, id=added_project.id, name="conflict-project")
print(error.exconly())
RuntimeError: No entries were uploaded or committed to the database. See error message:

(psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "pk_lnschema_core_project"
DETAIL:  Key (id)=(ZnbRPo7R) already exists.

[SQL: INSERT INTO lnschema_core_project (id, name, created_by_id, updated_at) VALUES (%(id)s, %(name)s, %(created_by_id)s, %(updated_at)s)]
[parameters: {'id': 'ZnbRPo7R', 'name': 'conflict-project', 'created_by_id': 'DzTjkKse', 'updated_at': None}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)

NoneType: None
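
To see why a constraint violation leaves the database unchanged, here is a small sketch using Python's built-in `sqlite3` module rather than Postgres. The table name, ids, and the second project are illustrative assumptions. Two inserts run in one transaction; the second one conflicts with an existing primary key, so the whole transaction is rolled back, including the insert that would have succeeded on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO project VALUES (?, ?)", ("abc123", "test-project"))
conn.commit()

try:
    # As a context manager, the connection commits on success
    # and rolls back if the block raises.
    with conn:
        conn.execute("INSERT INTO project VALUES (?, ?)", ("def456", "second-project"))
        conn.execute("INSERT INTO project VALUES (?, ?)", ("abc123", "conflict-project"))
except sqlite3.IntegrityError as exc:
    error_message = str(exc)

# Only the record committed before the failed transaction remains.
rows = conn.execute("SELECT id, name FROM project ORDER BY id").fetchall()
```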

Ingestion failure during list-based ingestion

If a list of data objects is passed to ln.add() and the upload of one of them fails, the uploads that already succeeded are kept, and a RuntimeError is raised that lists the data objects uploaded successfully up to that point.
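
The behavior described above can be sketched in plain Python. Everything here is a hypothetical stand-in: `upload_all`, the dict-backed `store`, the `flaky_upload` callable, and the wording of the error message are assumptions for illustration and do not reflect LaminDB's internals.

```python
def upload_all(items, upload):
    """Upload items in order; on failure, report what was already uploaded."""
    uploaded = []
    for key, payload in items:
        try:
            upload(key, payload)
        except Exception as exc:
            raise RuntimeError(
                f"Upload of {key!r} failed: {exc}. "
                f"Successfully uploaded before the failure: {uploaded}"
            ) from exc
        uploaded.append(key)
    return uploaded


store = {}


def flaky_upload(key, payload):
    """Hypothetical backend that rejects one specific key."""
    if key == "b.txt":
        raise PermissionError("write denied")
    store[key] = payload


items = [("a.txt", b"1"), ("b.txt", b"2"), ("c.txt", b"3")]
try:
    upload_all(items, flaky_upload)
except RuntimeError as err:
    message = str(err)

# "a.txt" stays in the store; "b.txt" failed and "c.txt" was never attempted.
remaining_keys = sorted(store)
```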

Ingestion failure unrelated to upload to storage or DB transaction

Let’s now restore the storage location.

ln.setup.load("acidtests")
💬 Found cached instance metadata: /home/runner/.lamin/testuser1-instance-acidtests.env
✅ Loaded instance: testuser1/acidtests
'migrate-unnecessary'

Errors that are unrelated to the database transaction or the upload to storage are raised as their original exception rather than wrapped in a RuntimeError.

In this case, too, no entries are committed to the database or uploaded to storage.

from sqlalchemy.orm.exc import UnmappedInstanceError

filepath = ln.dev.datasets.file_jpg_paradisi05()
file = ln.File(filepath)
files = [file, "this is not a data object"]
with pytest.raises(UnmappedInstanceError) as exception:
    ln.add(files)
print(exception.exconly())
💡 file will be copied to storage upon `ln.add()` using storage key = ATQg4KD8FupjhiNnwG5i.jpg
sqlalchemy.orm.exc.UnmappedInstanceError: Class 'builtins.str' is not mapped
files = ln.select(ln.File).all()
assert len(files) == 0
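
One way to obtain this kind of behavior is to validate every input before producing any side effect, so that an invalid item raises its own exception type while storage and database are still untouched. The sketch below assumes a hypothetical `Record` class, `add_all` helper, and dict-backed `store`. It illustrates the idea only and is not how LaminDB or SQLAlchemy implement it.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a data object record."""
    key: str


def add_all(records, store):
    """Validate all inputs first, then write; invalid input raises unwrapped."""
    for record in records:
        if not isinstance(record, Record):
            raise TypeError(f"Class {type(record).__qualname__!r} is not a record")
    # Reached only if every item passed validation.
    for record in records:
        store[record.key] = record
    return records


store = {}
try:
    add_all([Record("image.jpg"), "this is not a data object"], store)
except TypeError as exc:
    original_error = exc

# The valid first item was not written either.
n_stored = len(store)
```
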
!docker stop pgtest && docker rm pgtest
pgtest
pgtest
ln.setup.delete("acidtests")
💬 Deleting instance testuser1/acidtests
💬     instance settings '.env' deleted
💬     current instance settings /home/runner/.lamin/current_instance.env deleted
💬     consider deleting your stored data manually: /home/runner/work/lamindb/lamindb/docs/faq/acidtests