What happens if I save the same artifacts & records twice?#

LaminDB’s operations are idempotent in the sense defined in this document.

This allows you to re-run a notebook or script without erroring or duplicating data. Similar behavior holds for human data entry.

Summary#

Metadata records#

If you try to create any metadata record (Registry) and upon_create_search_names is True (the default):

  1. LaminDB will warn you if a record with a similar name exists and display a table of similar existing records.

  2. You can then decide whether you’d like to save a record to the database or rather query an existing one from the table.

  3. If a name already has an exact match in a registry, LaminDB will return the existing record instead of creating a new one. For versioned entities, the version must also be passed.

If you set upon_create_search_names to False, LaminDB skips this search, and every record you create and save goes directly into the database.
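The lookup flow described above can be pictured with a toy in-memory registry. This is a minimal sketch, not LaminDB's implementation: `ToyRegistry`, its methods, and the `threshold` parameter are hypothetical, and the similarity score comes from `difflib.SequenceMatcher` in the standard library as a stand-in for whatever scoring LaminDB uses.

```python
from difflib import SequenceMatcher


class ToyRegistry:
    """Hypothetical in-memory stand-in for a registry; not LaminDB code."""

    def __init__(self, search_names=True, threshold=0.8):
        self.records = []  # saved records, each a dict with a "name"
        self.search_names = search_names  # plays the role of upon_create_search_names
        self.threshold = threshold  # assumed cutoff for calling two names similar

    def similar(self, name):
        """Return (record, score) pairs whose name is close to `name`."""
        scored = [
            (rec, SequenceMatcher(None, name, rec["name"]).ratio())
            for rec in self.records
        ]
        return [(rec, score) for rec, score in scored if score >= self.threshold]

    def create(self, name):
        """Return the saved record on an exact name match, else a new unsaved one."""
        if self.search_names:
            for rec in self.records:
                if rec["name"] == name:
                    return rec  # exact match: reuse instead of duplicating
            for rec, score in self.similar(name):
                print(f"similar record exists: {rec['name']!r} (score {score:.2f})")
        return {"name": name}

    def save(self, record):
        """Store the record once; saving the same object again is a no-op."""
        if not any(record is rec for rec in self.records):
            self.records.append(record)
        return record


registry = ToyRegistry()
first = registry.save(registry.create("My project 1"))
again = registry.create("My project 1")  # exact match returns the saved record
registry.save(again)  # saving again does not add a second entry
registry.save(registry.create("My project 2"))  # prints a similarity hint
```

With `search_names=False`, the toy registry skips both checks, so creating and saving the same name twice produces two entries.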

Artifacts#

If you try to create an Artifact object from content that is already stored, the outcome depends on upon_artifact_create_if_hash_exists:

  • you’ll get an existing object, if upon_artifact_create_if_hash_exists = "warn_return_existing" (the default)

  • you’ll get an error, if upon_artifact_create_if_hash_exists = "error"

  • you’ll get a warning and a new object, if upon_artifact_create_if_hash_exists = "warn_create_new"
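The three settings above can be modeled with a small content-addressed store. This is a sketch under stated assumptions rather than LaminDB's code: `ToyArtifactStore` is hypothetical, it merges object creation and saving into one `create` call for brevity, and it compares plain MD5 hex digests.

```python
import hashlib
import warnings

POLICIES = {"warn_return_existing", "error", "warn_create_new"}


class ToyArtifactStore:
    """Hypothetical store that detects duplicate content by its MD5 hash."""

    def __init__(self, if_hash_exists="warn_return_existing"):
        if if_hash_exists not in POLICIES:
            raise ValueError(f"unknown policy: {if_hash_exists}")
        self.if_hash_exists = if_hash_exists
        self.artifacts = []  # each artifact is a dict with "id" and "hash"

    def create(self, content: bytes):
        digest = hashlib.md5(content).hexdigest()
        existing = next((a for a in self.artifacts if a["hash"] == digest), None)
        if existing is not None:
            if self.if_hash_exists == "error":
                raise RuntimeError("artifact with same hash exists")
            if self.if_hash_exists == "warn_return_existing":
                warnings.warn("returning existing artifact with same hash")
                return existing
            # Remaining policy: warn_create_new
            warnings.warn("creating new artifact despite existing hash")
        artifact = {"id": len(self.artifacts) + 1, "hash": digest}
        self.artifacts.append(artifact)
        return artifact


store = ToyArtifactStore()
a1 = store.create(b"same bytes")
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # keep the demo output quiet
    a2 = store.create(b"same bytes")  # default policy: existing artifact returned
    store.if_hash_exists = "warn_create_new"
    a3 = store.create(b"same bytes")  # new artifact that shares the hash
```

Under the default policy the second call is idempotent, while `"warn_create_new"` deliberately gives that up and stores a second entry with the same hash.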

Examples#

!lamin init --storage ./test-idempotency
✅ saved: User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at=2024-03-04 13:54:42 UTC)
✅ saved: Storage(uid='yz4jmIi2', root='/home/runner/work/lamindb/lamindb/docs/faq/test-idempotency', type='local', updated_at=2024-03-04 13:54:42 UTC, created_by_id=1)
💡 loaded instance: testuser1/test-idempotency
💡 did not register local instance on lamin.ai
import lamindb as ln
import pytest

ln.settings.verbosity = "hint"
💡 lamindb instance: testuser1/test-idempotency

Metadata records#

assert ln.settings.upon_create_search_names

Let us add a first record to the ULabel registry:

label = ln.ULabel(name="My project 1")
label.save()

If we create a new record, we’ll automatically get search results that indicate whether we are about to duplicate an existing entry:

label = ln.ULabel(name="My project 2")
❗ record with similar name exist! did you mean to load it?
                   uid  score
name
My project 1  uXUANKsp   91.7
label.save()

In case we match an existing name directly, we’ll get the existing object:

label = ln.ULabel(name="My project 1")
✅ loaded ULabel record with exact same name: 'My project 1'

If we save it again, it will not create a new entry in the registry:

label.save()

Now, if we create a third record, we’ll get two alternatives:

label = ln.ULabel(name="My project 3")
❗ records with similar names exist! did you mean to load one of them?
                   uid  score
name
My project 1  uXUANKsp   91.7
My project 2  hlmmgCBl   91.7

If we prefer not to perform a search, e.g. for performance reasons or because the logging is too noisy, we can switch it off:

ln.settings.upon_create_search_names = False
label = ln.ULabel(name="My project 3")

In this walkthrough, switch it back on:

ln.settings.upon_create_search_names = True
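Switching a setting off and back on by hand is easy to forget. One way to make the restore automatic is a small context manager. The sketch below is an illustration only: `override` is a hypothetical helper that is not part of LaminDB, and a `SimpleNamespace` stands in for `ln.settings` so the example runs without an instance. Whether it behaves the same on the real settings object is an assumption.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def override(settings, name, value):
    """Temporarily set settings.<name> to value, restoring it on exit."""
    previous = getattr(settings, name)
    setattr(settings, name, value)
    try:
        yield settings
    finally:
        # Runs even if the body raises, so the old value always comes back.
        setattr(settings, name, previous)


# Stand-in for ln.settings so the sketch runs without a LaminDB instance.
settings = SimpleNamespace(upon_create_search_names=True)

with override(settings, "upon_create_search_names", False):
    inside = settings.upon_create_search_names  # False within the block
after = settings.upon_create_search_names  # restored once the block exits
```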

Artifacts#

Warn upon trying to re-ingest an existing artifact#

assert ln.settings.upon_artifact_create_if_hash_exists == "warn_return_existing"
filepath = ln.core.datasets.file_fcs()

Create an Artifact object:

artifact = ln.Artifact(filepath, description="My fcs artifact")
artifact.save()
❗ no run & transform get linked, consider passing a `run` or calling ln.track()
💡 path content will be copied to default storage upon `save()` with key `None` ('.lamindb/cGXTQALUZ14ThLUFiLa9.fcs')
✅ storing artifact 'cGXTQALUZ14ThLUFiLa9' at '/home/runner/work/lamindb/lamindb/docs/faq/test-idempotency/.lamindb/cGXTQALUZ14ThLUFiLa9.fcs'
assert artifact.hash == "KCEXRahJ-Ui9Y6nksQ8z1A"

Create an Artifact object from the same path:

artifact2 = ln.Artifact(filepath)
❗ no run & transform get linked, consider passing a `run` or calling ln.track()
❗ returning existing artifact with same hash: Artifact(uid='cGXTQALUZ14ThLUFiLa9', suffix='.fcs', description='My fcs artifact', size=6785467, hash='KCEXRahJ-Ui9Y6nksQ8z1A', hash_type='md5', visibility=1, key_is_virtual=True, updated_at=2024-03-04 13:54:44 UTC, storage_id=1, created_by_id=1)

It gives us the existing object:

assert artifact.id == artifact2.id

If you save it again, nothing will happen (the operation is idempotent):

artifact2.save()
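The existing artifact is found because identical content yields an identical hash. The log output in this walkthrough reports `hash_type='md5'` and a 22-character hash. One encoding that produces a string of that shape is the URL-safe base64 form of an MD5 digest with the padding stripped. The sketch below assumes that encoding and does not claim to reproduce LaminDB's exact hashing; `content_hash` is a hypothetical name.

```python
import base64
import hashlib


def content_hash(data: bytes) -> str:
    """URL-safe base64 of the MD5 digest, without '=' padding (assumed encoding)."""
    digest = hashlib.md5(data).digest()  # 16 raw bytes
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


h1 = content_hash(b"example content")
h2 = content_hash(b"example content")  # same bytes give the same hash
h3 = content_hash(b"different content")  # different bytes give a different hash
```

Because the hash depends only on the bytes, a new description or file name would not change it, which is why the second object above resolves to the first.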

Error upon trying to re-ingest an existing artifact#

ln.settings.upon_artifact_create_if_hash_exists = "error"

In this case, you’ll not be able to create an object from the same content:

with pytest.raises(RuntimeError):
    artifact3 = ln.Artifact(filepath, description="My new fcs artifact")
❗ no run & transform get linked, consider passing a `run` or calling ln.track()

Warn and create a new artifact#

Lastly, let us discuss the following setting:

ln.settings.upon_artifact_create_if_hash_exists = "warn_create_new"

In this case, you’ll create a new object:

artifact4 = ln.Artifact(filepath, description="My new fcs artifact")
artifact4.save()
❗ no run & transform get linked, consider passing a `run` or calling ln.track()
❗ creating new Artifact object despite existing artifact with same hash: Artifact(uid='cGXTQALUZ14ThLUFiLa9', suffix='.fcs', description='My fcs artifact', size=6785467, hash='KCEXRahJ-Ui9Y6nksQ8z1A', hash_type='md5', visibility=1, key_is_virtual=True, updated_at=2024-03-04 13:54:44 UTC, storage_id=1, created_by_id=1)
💡 path content will be copied to default storage upon `save()` with key `None` ('.lamindb/vktklCCIccbrnTc66Buq.fcs')
✅ storing artifact 'vktklCCIccbrnTc66Buq' at '/home/runner/work/lamindb/lamindb/docs/faq/test-idempotency/.lamindb/vktklCCIccbrnTc66Buq.fcs'

You can verify that it’s a new entry by comparing the ids:

assert artifact4.id != artifact.id
ln.Artifact.filter(hash="KCEXRahJ-Ui9Y6nksQ8z1A").df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
1 cGXTQALUZ14ThLUFiLa9 1 None .fcs None My fcs artifact None 6785467 KCEXRahJ-Ui9Y6nksQ8z1A md5 None None None None 1 True 2024-03-04 13:54:44.379587+00:00 2024-03-04 13:54:44.433364+00:00 1
2 vktklCCIccbrnTc66Buq 1 None .fcs None My new fcs artifact None 6785467 KCEXRahJ-Ui9Y6nksQ8z1A md5 None None None None 1 True 2024-03-04 13:54:44.494056+00:00 2024-03-04 13:54:44.494083+00:00 1
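When duplicates are allowed, it can help to group records by hash and list the ones that share content. The following is a plain-Python sketch over the two (uid, hash) pairs shown in the table above. It does not query LaminDB, and the variable names are illustrative.

```python
from collections import defaultdict

# (uid, hash) pairs copied from the table above.
rows = [
    ("cGXTQALUZ14ThLUFiLa9", "KCEXRahJ-Ui9Y6nksQ8z1A"),
    ("vktklCCIccbrnTc66Buq", "KCEXRahJ-Ui9Y6nksQ8z1A"),
]

uids_by_hash = defaultdict(list)
for uid, content_hash in rows:
    uids_by_hash[content_hash].append(uid)

# Keep only hashes that are shared by more than one artifact.
duplicates = {h: uids for h, uids in uids_by_hash.items() if len(uids) > 1}
```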
assert len(ln.Artifact.filter(hash="KCEXRahJ-Ui9Y6nksQ8z1A").list()) == 2
!lamin delete --force test-idempotency
!rm -r test-idempotency
💡 deleting instance testuser1/test-idempotency
✅     deleted instance settings file: /home/runner/.lamin/instance--testuser1--test-idempotency.env
✅     instance cache deleted
✅     deleted '.lndb' sqlite file
❗     consider manually deleting your stored data: /home/runner/work/lamindb/lamindb/docs/faq/test-idempotency