Will data & metadata stay in sync?

Here, we walk through errors that can occur while saving artifacts & metadata records, and show that a failed save leaves neither dangling metadata nor dangling artifacts in the LaminDB instance. In this sense, saving behaves like an ACID transaction that spans data & metadata.

Setup

from laminci.db import setup_local_test_postgres

pgurl = setup_local_test_postgres()
💡 Created Postgres test instance: 'postgresql://postgres:pwd@0.0.0.0:5432/pgtest'
It runs in docker container 'pgtest'
!lamin init --db {pgurl} --storage ./test-acid
❗ using the sql database name for the instance name
✅ saved: User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at=2024-03-04 13:54:24 UTC)
✅ saved: Storage(uid='3PTiXklD', root='/home/runner/work/lamindb/lamindb/docs/faq/test-acid', type='local', updated_at=2024-03-04 13:54:24 UTC, created_by_id=1)
💡 loaded instance: testuser1/pgtest
💡 did not register local instance on lamin.ai
import pytest
import lamindb as ln
from upath import UPath
💡 lamindb instance: testuser1/pgtest

Save error due to failed upload

Let’s try to save an artifact to a storage location for which we lack write permission.

artifact = ln.Artifact.from_anndata(
    ln.core.datasets.anndata_mouse_sc_lymph_node(),
    description="Mouse Lymph Node scRNA-seq",
)
❗ no run & transform get linked, consider passing a `run` or calling ln.track()

Because the public API only allows you to set a default storage for which you have permission, we bypass it by setting the private _root attribute directly:

ln.setup.settings.storage._root = UPath("s3://nf-core-awsmegatests")
ln.settings.storage
S3Path('s3://nf-core-awsmegatests/')

Saving the artifact now raises a RuntimeError:

with pytest.raises(RuntimeError) as error:
    artifact.save()
print(error.exconly())
❗ could not upload artifact: Artifact(uid='Y4vjp6wMWyPvaWWZmzXj', suffix='.h5ad', accessor='AnnData', description='Mouse Lymph Node scRNA-seq', size=17177479, hash='JYwvsCMVfGYZLiZPSdHZNw', hash_type='md5', visibility=1, key_is_virtual=True, updated_at=2024-03-04 13:54:26 UTC, storage_id=1, created_by_id=1)
RuntimeError: Access Denied

Let’s now check that no metadata records were added to the database:

assert len(ln.Artifact.filter().all()) == 0
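
The outcome above (a failed upload leaves no metadata row behind) can be illustrated with a minimal, generic sketch. It is not LaminDB's implementation: the `artifact` table, `save_artifact`, and `denied_upload` are hypothetical names, and an in-memory SQLite database stands in for the metadata store. The idea is to perform the upload inside the database transaction so that an upload error rolls the metadata row back.

```python
import sqlite3

# Hypothetical stand-in for the metadata database (not LaminDB's schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifact (uid TEXT PRIMARY KEY, description TEXT)")
conn.commit()


def denied_upload(uid: str) -> None:
    """Hypothetical upload that always fails, like a bucket without write access."""
    raise PermissionError("Access Denied")


def save_artifact(conn: sqlite3.Connection, uid: str, description: str, upload) -> None:
    """Write the metadata row and upload the data as one unit of work."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "INSERT INTO artifact (uid, description) VALUES (?, ?)",
                (uid, description),
            )
            upload(uid)
    except OSError as exc:
        raise RuntimeError(f"could not upload artifact {uid}: {exc}") from exc


try:
    save_artifact(conn, "abc123", "Mouse Lymph Node scRNA-seq", denied_upload)
except RuntimeError as err:
    print(err)

# Count the metadata rows that survived the failed save.
n_rows = conn.execute("SELECT COUNT(*) FROM artifact").fetchone()[0]
print(n_rows)
```

In this sketch the row count stays at zero after the failed save, because the insert and the upload share one transaction. A real system also has to handle the opposite ordering, where the upload succeeds but the commit fails, which this sketch does not cover.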

Save error during bulk creation

filepath = ln.core.datasets.file_jpg_paradisi05()
artifact = ln.Artifact(filepath, description="My image")
artifacts = [artifact, "this is not a record"]
❗ no run & transform get linked, consider passing a `run` or calling ln.track()

Saving this list raises an exception, because its second element is a string rather than a record:

with pytest.raises(Exception) as error:
    ln.save(artifacts)
print(error.exconly())
AttributeError: 'str' object has no attribute 'pk'

Nothing got saved, not even the valid artifact:

artifacts = ln.Artifact.filter().all()
assert len(artifacts) == 0
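
The all-or-nothing outcome above can be reproduced in a small, generic sketch. It assumes a hypothetical `Record` dataclass, a hypothetical `record` table, and a hypothetical `bulk_save` helper, and it uses SQLite rather than LaminDB's actual bulk-creation code. All rows are inserted within a single transaction, so an invalid element anywhere in the list undoes the rows inserted before it.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical metadata record, not a LaminDB class."""

    uid: str
    description: str


def bulk_save(conn: sqlite3.Connection, records: list) -> None:
    """Insert all records in a single transaction: all rows or none."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        for record in records:
            conn.execute(
                "INSERT INTO record (uid, description) VALUES (?, ?)",
                (record.uid, record.description),
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (uid TEXT PRIMARY KEY, description TEXT)")
conn.commit()

try:
    # The second element is a str and has no `uid` attribute.
    bulk_save(conn, [Record("a1", "My image"), "this is not a record"])
except AttributeError as err:
    print(f"AttributeError: {err}")

# The valid first record was rolled back together with the failed one.
n_saved = conn.execute("SELECT COUNT(*) FROM record").fetchone()[0]
print(n_saved)
```

Validating every element before the first write would give the same result without relying on a rollback; the single transaction is used here because it also covers errors that only surface during the insert itself.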

If a list of data objects is passed to ln.save() and the upload of one of them fails, the uploads that already succeeded are kept, and a RuntimeError is raised that lists the data objects uploaded successfully up to that point.
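
The partial-failure behavior described above can be sketched generically as follows. This is an illustration under stated assumptions, not the code behind ln.save(): `upload_all`, `flaky_upload`, the file names, and the in-memory `storage` dict are all hypothetical.

```python
def upload_all(items: dict, upload) -> list:
    """Upload items in order; on failure, report what was already uploaded."""
    uploaded = []
    for name, payload in items.items():
        try:
            upload(name, payload)
        except OSError as exc:
            # Earlier uploads are kept; the error names them for the caller.
            raise RuntimeError(
                f"upload of {name!r} failed ({exc}); "
                f"already uploaded: {uploaded}"
            ) from exc
        uploaded.append(name)
    return uploaded


storage = {}  # hypothetical in-memory stand-in for a storage location


def flaky_upload(name: str, payload: bytes) -> None:
    """Hypothetical upload that rejects one specific object."""
    if name == "b.jpg":
        raise PermissionError("Access Denied")
    storage[name] = payload


message = ""
try:
    upload_all({"a.jpg": b"1", "b.jpg": b"2", "c.jpg": b"3"}, flaky_upload)
except RuntimeError as err:
    message = str(err)
    print(message)

# Objects uploaded before the failure remain in storage.
print(sorted(storage))
```

Reporting the successful uploads in the error lets the caller decide whether to retry only the remaining objects or to clean up the ones already stored.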

!docker stop pgtest && docker rm pgtest
!lamin delete --force pgtest
!rm -r ./test-acid
pgtest
pgtest
💡 deleting instance testuser1/pgtest
✅     deleted instance settings file: /home/runner/.lamin/instance--testuser1--pgtest.env
✅     instance cache deleted
❗     consider manually deleting your stored data: /home/runner/work/lamindb/lamindb/docs/faq/test-acid