Query & search registries

Find & access data using registries.

Setup

!lamin init --storage ./mydata
✅ saved: User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at=2024-03-04 14:09:08 UTC)
✅ saved: Storage(uid='NebTDuEg', root='/home/runner/work/lamindb/lamindb/docs/mydata', type='local', updated_at=2024-03-04 14:09:08 UTC, created_by_id=1)
💡 loaded instance: testuser1/mydata
💡 did not register local instance on lamin.ai
import lamindb as ln
💡 lamindb instance: testuser1/mydata
ln.settings.verbosity = "info"
ln.transform.stem_uid = "vldHzF3aTAiW"
ln.transform.version = "1"
ln.track()
💡 Assuming editor is Jupyter Lab.
💡 Attaching notebook metadata
💡 notebook imports: Django==5.0.3 lamindb==0.68.0
💡 saved: Transform(uid='vldHzF3aTAiW5zKv', name='Query & search registries', short_name='meta', version='1', type=notebook, updated_at=2024-03-04 14:09:11 UTC, created_by_id=1)
💡 saved: Run(uid='6LiySRQxbNv9JIe7xoy5', run_at=2024-03-04 14:09:11 UTC, transform_id=1, created_by_id=1)
💡 tracked pip freeze > /home/runner/.cache/lamindb/run_env_pip_6LiySRQxbNv9JIe7xoy5.txt

We’ll need some toy data:

ln.Artifact(ln.core.datasets.file_jpg_paradisi05(), description="My image").save()
ln.Artifact.from_df(ln.core.datasets.df_iris(), description="The iris collection").save()
ln.Artifact(ln.core.datasets.file_fastq(), description="My fastq").save()
✅ storing artifact 'AvN7bCZ5k7L2mexSpqmq' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/AvN7bCZ5k7L2mexSpqmq.jpg'
✅ storing artifact 'UeXgKSR31XGYxiSFsZKL' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/UeXgKSR31XGYxiSFsZKL.parquet'
✅ storing artifact 'gd72jMtbIMDYaL4l19mG' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/gd72jMtbIMDYaL4l19mG.fastq.gz'

Look up metadata

For registries that hold no more than ~100k records, a lookup object is a convenient way to select a record.

Consider the User registry:

users = ln.User.lookup(field="handle")

With auto-complete, we find a user:

user = users.testuser1
user
User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at=2024-03-04 14:09:08 UTC)

Note

You can also auto-complete in a dictionary:

users_dict = ln.User.lookup().dict()
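To make the idea concrete, here is a minimal, hypothetical stand-in for a lookup object in plain Python. It indexes records by one field and exposes them both as attributes (which is what enables auto-complete) and as a dict. It is not lamindb's implementation. `make_lookup` and the `User` dataclass are made up for illustration, and values that are not valid Python identifiers would need extra handling, which this sketch skips.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class User:
    # Hypothetical minimal record; the real User registry has more fields.
    handle: str
    name: str


def make_lookup(records, field):
    """Index `records` by the value of `field` (hypothetical helper).

    Values become attribute names, so they must be valid Python identifiers.
    """
    by_value = {getattr(record, field): record for record in records}
    lookup = SimpleNamespace(**by_value)
    # Mirror the .dict() accessor: return a plain copy of the mapping.
    lookup.dict = lambda: dict(by_value)
    return lookup


users = make_lookup(
    [User("testuser1", "Test User1"), User("testuser2", "Test User2")],
    field="handle",
)
user = users.testuser1      # attribute access, which IDEs can auto-complete
users_dict = users.dict()   # plain dict keyed by handle
```

Attribute access is what makes auto-complete possible; the dict form is handy when the key is only known at runtime.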

Filter by metadata

Filter for all artifacts created by a user:

ln.Artifact.filter(created_by=user).df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
1 AvN7bCZ5k7L2mexSpqmq 1 None .jpg None My image None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 1 1 True 2024-03-04 14:09:12.205005+00:00 2024-03-04 14:09:12.205033+00:00 1
2 UeXgKSR31XGYxiSFsZKL 1 None .parquet DataFrame The iris collection None 5629 EwCwz5PD3uhjhYt-QAa0EQ md5 None None 1 1 1 True 2024-03-04 14:09:12.296446+00:00 2024-03-04 14:09:12.296478+00:00 1
3 gd72jMtbIMDYaL4l19mG 1 None .fastq.gz None My fastq None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 1 1 True 2024-03-04 14:09:12.303173+00:00 2024-03-04 14:09:12.303200+00:00 1

To access the results of a filter statement, call one of the following on its return value:

  • .df(): A pandas DataFrame with each record stored as a row.

  • .all(): An indexable Django QuerySet.

  • .list(): A list of records.

  • .one(): Exactly one record. Raises an error if there is no result or more than one.

  • .one_or_none(): Either one record, or None if there is no query result. Raises an error if there is more than one.
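The contract of .one() and .one_or_none() can be sketched in plain Python. This is a hypothetical illustration over a list, modeled on SQLAlchemy's methods of the same names. The exception classes are local stand-ins that borrow SQLAlchemy's exception names; they are not imported from lamindb.

```python
class NoResultFound(Exception):
    """Local stand-in, named after SQLAlchemy's exception."""


class MultipleResultsFound(Exception):
    """Local stand-in, named after SQLAlchemy's exception."""


def one(results):
    """Return the single result; raise if there are zero or several."""
    results = list(results)
    if not results:
        raise NoResultFound("expected exactly one result, got none")
    if len(results) > 1:
        raise MultipleResultsFound(f"expected exactly one result, got {len(results)}")
    return results[0]


def one_or_none(results):
    """Return None for an empty result; otherwise behave like one()."""
    results = list(results)
    if not results:
        return None
    return one(results)


first = one(["only record"])   # "only record"
missing = one_or_none([])      # None
```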

Note

filter() returns a QuerySet.

The registries (ORMs) in LaminDB are Django models, so any Django query works. LaminDB extends Django’s API for data scientists.

Under the hood, any .filter() call translates into a SQL SELECT statement.

.one() and .one_or_none() are borrowed from SQLAlchemy’s API.
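To see what such a SELECT looks like, here is a hypothetical, heavily simplified `artifact` table in an in-memory SQLite database, queried with roughly the SQL that `ln.Artifact.filter(created_by=user)` stands for. The table and its columns are made up; lamindb's real schema differs. On an actual Django QuerySet, `print(queryset.query)` shows the SQL that Django generates.

```python
import sqlite3

# Hypothetical, simplified table; lamindb's real artifact table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artifact ("
    "id INTEGER PRIMARY KEY, description TEXT, suffix TEXT, "
    "size INTEGER, created_by_id INTEGER)"
)
conn.executemany(
    "INSERT INTO artifact (description, suffix, size, created_by_id) "
    "VALUES (?, ?, ?, ?)",
    [
        ("My image", ".jpg", 29358, 1),
        ("The iris collection", ".parquet", 5629, 1),
        ("My fastq", ".fastq.gz", 20, 1),
    ],
)

# Roughly the SELECT behind ln.Artifact.filter(created_by=user), with user.id == 1.
# The ? placeholder passes the value separately from the SQL string.
rows = conn.execute(
    "SELECT id, description FROM artifact WHERE created_by_id = ? ORDER BY id",
    (1,),
).fetchall()
```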

Search for metadata

ln.Artifact.search("iris")
key description score
uid
UeXgKSR31XGYxiSFsZKL The iris collection 90.0
AvN7bCZ5k7L2mexSpqmq My image 34.2
gd72jMtbIMDYaL4l19mG My fastq 25.7
ln.Artifact.search("iris", return_queryset=True).first()
Artifact(uid='UeXgKSR31XGYxiSFsZKL', suffix='.parquet', accessor='DataFrame', description='The iris collection', size=5629, hash='EwCwz5PD3uhjhYt-QAa0EQ', hash_type='md5', visibility=1, key_is_virtual=True, updated_at=2024-03-04 14:09:12 UTC, storage_id=1, transform_id=1, run_id=1, created_by_id=1)
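Search ranks records by a similarity score rather than requiring an exact match. As a rough illustration only, the sketch below scores each description against the query with the standard library's `difflib.SequenceMatcher`, taking the best match over windows of the query's length. lamindb's own search uses different scoring, so these numbers will not match the table above. `partial_score` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def partial_score(query, text):
    """Score 0-100: best SequenceMatcher ratio of `query` against any
    equally long window of `text`, ignoring case (hypothetical helper)."""
    q, t = query.lower(), text.lower()
    if len(q) > len(t):
        q, t = t, q  # always slide the shorter string over the longer one
    n = len(q)
    best = max(
        SequenceMatcher(None, q, t[i:i + n]).ratio()
        for i in range(len(t) - n + 1)
    )
    return round(100 * best, 1)


descriptions = ["My image", "The iris collection", "My fastq"]
ranked = sorted(descriptions, key=lambda d: partial_score("iris", d), reverse=True)
```

Scoring against windows rather than the whole string keeps a short query from being penalized by a long description.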

Let's create 500 notebook transforms with fake titles and save them:

ln.save(
    [
        ln.Transform(name=title, type="notebook")
        for title in ln.core.datasets.fake_bio_notebook_titles(n=500)
    ]
)

We can now search for any combination of terms:

ln.Transform.search("intestine").head()
uid score
name
Ameloblast IgG4 intestine Monocyte IgG Proprioceptive. 5HJjuVaCP82FwjMZ 90.0
Blue-Sensitive Cone Cells intestine IgD IgY Ameloblast IgA candidate. 4rpoYmoWdCq01ZJH 90.0
Candidate IgG Melanotropes IgY Ameloblast IgD intestine IgG. LBW5cemHwm9SJ8TW 90.0
Choroid Plexus intestine IgG4 candidate IgE. p06cQEsWhIgdM5LO 90.0
Igd IgG Ameloblast Vagina IgG4 intestine Melanotropes. JMM140EgXLjUIaSX 90.0

Leverage relations

Django uses a double-underscore syntax to filter on fields of related tables.

This syntax enables you to traverse several layers of relations:

ln.Artifact.filter(run__created_by__handle__startswith="testuse").df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
1 AvN7bCZ5k7L2mexSpqmq 1 None .jpg None My image None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 1 1 True 2024-03-04 14:09:12.205005+00:00 2024-03-04 14:09:12.205033+00:00 1
2 UeXgKSR31XGYxiSFsZKL 1 None .parquet DataFrame The iris collection None 5629 EwCwz5PD3uhjhYt-QAa0EQ md5 None None 1 1 1 True 2024-03-04 14:09:12.296446+00:00 2024-03-04 14:09:12.296478+00:00 1
3 gd72jMtbIMDYaL4l19mG 1 None .fastq.gz None My fastq None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 1 1 True 2024-03-04 14:09:12.303173+00:00 2024-03-04 14:09:12.303200+00:00 1

This filter selects all artifacts whose generating run was created by a user with a handle starting with "testuse".

(Under the hood, in the SQL database, it’s joining the artifact table with the run and the user table.)
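The join can be written out by hand. The following sketch uses three hypothetical, stripped-down tables in an in-memory SQLite database and expresses `run__created_by__handle__startswith="testuse"` as two JOINs plus a LIKE prefix match. The table names and columns are made up; lamindb's real schema differs.

```python
import sqlite3

# Hypothetical, stripped-down stand-ins for the user, run, and artifact tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE runs (id INTEGER PRIMARY KEY,
                       created_by_id INTEGER REFERENCES users(id));
    CREATE TABLE artifacts (id INTEGER PRIMARY KEY, description TEXT,
                            run_id INTEGER REFERENCES runs(id));
    INSERT INTO users VALUES (1, 'testuser1'), (2, 'otheruser');
    INSERT INTO runs VALUES (1, 1), (2, 2);
    INSERT INTO artifacts VALUES
        (1, 'My image', 1), (2, 'The iris collection', 1), (3, 'Other file', 2);
    """
)

# artifact -> run -> user, mirroring run__created_by__handle__startswith="testuse"
rows = conn.execute(
    """
    SELECT a.id, a.description
    FROM artifacts AS a
    JOIN runs AS r ON a.run_id = r.id
    JOIN users AS u ON r.created_by_id = u.id
    WHERE u.handle LIKE ?
    ORDER BY a.id
    """,
    ("testuse%",),
).fetchall()
```

Each double underscore in the filter corresponds to one hop along a foreign key, which is why two relations produce two JOINs.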

Beyond __startswith, Django supports about two dozen field lookups of the form field__lookup=value.

Here are some of them.

and

ln.Artifact.filter(suffix=".jpg", created_by=user).df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
1 AvN7bCZ5k7L2mexSpqmq 1 None .jpg None My image None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 1 1 True 2024-03-04 14:09:12.205005+00:00 2024-03-04 14:09:12.205033+00:00 1

less than / greater than

Or subset to artifacts smaller than 10 kB (size is stored in bytes) using the __lt lookup; __lte, __gt, and __gte work analogously.

ln.Artifact.filter(created_by=user, size__lt=1e4).df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
2 UeXgKSR31XGYxiSFsZKL 1 None .parquet DataFrame The iris collection None 5629 EwCwz5PD3uhjhYt-QAa0EQ md5 None None 1 1 1 True 2024-03-04 14:09:12.296446+00:00 2024-03-04 14:09:12.296478+00:00 1
3 gd72jMtbIMDYaL4l19mG 1 None .fastq.gz None My fastq None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 1 1 True 2024-03-04 14:09:12.303173+00:00 2024-03-04 14:09:12.303200+00:00 1

or

from django.db.models import Q

ln.Artifact.filter().filter(Q(suffix=".jpg") | Q(suffix=".fastq.gz")).df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
1 AvN7bCZ5k7L2mexSpqmq 1 None .jpg None My image None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 1 1 True 2024-03-04 14:09:12.205005+00:00 2024-03-04 14:09:12.205033+00:00 1
3 gd72jMtbIMDYaL4l19mG 1 None .fastq.gz None My fastq None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 1 1 True 2024-03-04 14:09:12.303173+00:00 2024-03-04 14:09:12.303200+00:00 1
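Q objects compose with `|` (or), `&` (and), and `~` (not) through Python operator overloading. A minimal, hypothetical imitation over plain dicts shows the idea. It supports only exact matches and is not Django's implementation. For a single field, the `__in` lookup expresses the same "or" more compactly.

```python
from typing import Any, Callable, Dict, Optional

Record = Dict[str, Any]


class Q:
    """Hypothetical mini version of django.db.models.Q for dict records.

    Keyword arguments are exact-match conditions that are ANDed together.
    """

    def __init__(self, _predicate: Optional[Callable[[Record], bool]] = None, **conditions: Any):
        if _predicate is None:
            def _predicate(record: Record) -> bool:
                return all(record.get(k) == v for k, v in conditions.items())
        self.predicate = _predicate

    def __or__(self, other: "Q") -> "Q":
        return Q(lambda r: self.predicate(r) or other.predicate(r))

    def __and__(self, other: "Q") -> "Q":
        return Q(lambda r: self.predicate(r) and other.predicate(r))

    def __invert__(self) -> "Q":
        return Q(lambda r: not self.predicate(r))


def filter_records(records, q):
    """Keep the records for which the composed condition holds."""
    return [r for r in records if q.predicate(r)]


artifacts = [
    {"uid": "AvN7bCZ5k7L2mexSpqmq", "suffix": ".jpg"},
    {"uid": "UeXgKSR31XGYxiSFsZKL", "suffix": ".parquet"},
    {"uid": "gd72jMtbIMDYaL4l19mG", "suffix": ".fastq.gz"},
]
jpg_or_fastq = filter_records(artifacts, Q(suffix=".jpg") | Q(suffix=".fastq.gz"))
not_jpg = filter_records(artifacts, ~Q(suffix=".jpg"))
```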

in

ln.Artifact.filter(suffix__in=[".jpg", ".fastq.gz"]).df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
1 AvN7bCZ5k7L2mexSpqmq 1 None .jpg None My image None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 1 1 True 2024-03-04 14:09:12.205005+00:00 2024-03-04 14:09:12.205033+00:00 1
3 gd72jMtbIMDYaL4l19mG 1 None .fastq.gz None My fastq None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 1 1 True 2024-03-04 14:09:12.303173+00:00 2024-03-04 14:09:12.303200+00:00 1

order by

ln.Artifact.filter().order_by("-updated_at").df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
3 gd72jMtbIMDYaL4l19mG 1 None .fastq.gz None My fastq None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 1 1 True 2024-03-04 14:09:12.303173+00:00 2024-03-04 14:09:12.303200+00:00 1
2 UeXgKSR31XGYxiSFsZKL 1 None .parquet DataFrame The iris collection None 5629 EwCwz5PD3uhjhYt-QAa0EQ md5 None None 1 1 1 True 2024-03-04 14:09:12.296446+00:00 2024-03-04 14:09:12.296478+00:00 1
1 AvN7bCZ5k7L2mexSpqmq 1 None .jpg None My image None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 1 1 True 2024-03-04 14:09:12.205005+00:00 2024-03-04 14:09:12.205033+00:00 1
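The leading `-` in `"-updated_at"` requests descending order. A hypothetical plain-Python helper with the same convention shows how several such keys combine. Sorting by the last key first and relying on the stability of Python's sort yields a correct multi-key ordering, even with mixed directions.

```python
from datetime import datetime
from operator import itemgetter


def order_by(records, *fields):
    """Sort dict records Django-style (hypothetical helper): "name" sorts
    ascending, "-name" descending; earlier fields take precedence."""
    result = list(records)
    # Python's sort is stable (also with reverse=True), so applying the keys
    # from last to first gives a correct multi-key ordering.
    for field in reversed(fields):
        descending = field.startswith("-")
        name = field[1:] if descending else field
        result.sort(key=itemgetter(name), reverse=descending)
    return result


artifacts = [
    {"id": 1, "updated_at": datetime(2024, 3, 4, 14, 9, 12, 205033)},
    {"id": 2, "updated_at": datetime(2024, 3, 4, 14, 9, 12, 296478)},
    {"id": 3, "updated_at": datetime(2024, 3, 4, 14, 9, 12, 303200)},
]
newest_first = order_by(artifacts, "-updated_at")  # ids 3, 2, 1
```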

contains

ln.Transform.filter(name__contains="search").df().head(10)
uid name short_name version type latest_report_id source_code_id reference reference_type created_at updated_at created_by_id
id
1 vldHzF3aTAiW5zKv Query & search registries meta 1 notebook None None None None 2024-03-04 14:09:11.292177+00:00 2024-03-04 14:09:11.292206+00:00 1
4 ia2aWbnfZ6iPsYVm Study Olfactory epithelium supporting cells re... None None notebook None None None None 2024-03-04 14:09:16.111053+00:00 2024-03-04 14:09:16.111068+00:00 1
33 pPmgd3KQbhFXAloP Research IgE Parotid glands Ameloblast IgG int... None None notebook None None None None 2024-03-04 14:09:16.115325+00:00 2024-03-04 14:09:16.115340+00:00 1
35 vvbOuCL1LzC74lL7 Intestinal research IgG4 intestinal investigat... None None notebook None None None None 2024-03-04 14:09:16.115616+00:00 2024-03-04 14:09:16.115631+00:00 1
41 iWGbT5iCUEWkNnfQ Igd research Ameloblast Eosinophil granulocyte... None None notebook None None None None 2024-03-04 14:09:16.116517+00:00 2024-03-04 14:09:16.116533+00:00 1
68 3wdg1tfkghXKEFb0 Planum Semilunatum Epithelial Cell Of Vestibul... None None notebook None None None None 2024-03-04 14:09:16.120621+00:00 2024-03-04 14:09:16.120637+00:00 1
71 3WE6XDLXSziIsD7D Proprioceptive Proprioceptive Choroid plexus r... None None notebook None None None None 2024-03-04 14:09:16.121071+00:00 2024-03-04 14:09:16.121087+00:00 1
72 MbB9paVEBXe7VD43 Glucagon-Like Peptide-1-Secreting L Cell intes... None None notebook None None None None 2024-03-04 14:09:16.121221+00:00 2024-03-04 14:09:16.121237+00:00 1
75 8Zw2l7kOOibZPh78 Rank Skin Monocyte Melanotropes research. None None notebook None None None None 2024-03-04 14:09:16.121669+00:00 2024-03-04 14:09:16.121685+00:00 1
87 k5Y5MpUNqHSXp4nq Igg2 intestine research blue-sensitive cone ce... None None notebook None None None None 2024-03-04 14:09:16.127812+00:00 2024-03-04 14:09:16.127829+00:00 1

The case-insensitive variant is __icontains:

ln.Transform.filter(name__icontains="Search").df().head(10)
uid name short_name version type latest_report_id source_code_id reference reference_type created_at updated_at created_by_id
id
1 vldHzF3aTAiW5zKv Query & search registries meta 1 notebook None None None None 2024-03-04 14:09:11.292177+00:00 2024-03-04 14:09:11.292206+00:00 1
4 ia2aWbnfZ6iPsYVm Study Olfactory epithelium supporting cells re... None None notebook None None None None 2024-03-04 14:09:16.111053+00:00 2024-03-04 14:09:16.111068+00:00 1
33 pPmgd3KQbhFXAloP Research IgE Parotid glands Ameloblast IgG int... None None notebook None None None None 2024-03-04 14:09:16.115325+00:00 2024-03-04 14:09:16.115340+00:00 1
35 vvbOuCL1LzC74lL7 Intestinal research IgG4 intestinal investigat... None None notebook None None None None 2024-03-04 14:09:16.115616+00:00 2024-03-04 14:09:16.115631+00:00 1
41 iWGbT5iCUEWkNnfQ Igd research Ameloblast Eosinophil granulocyte... None None notebook None None None None 2024-03-04 14:09:16.116517+00:00 2024-03-04 14:09:16.116533+00:00 1
68 3wdg1tfkghXKEFb0 Planum Semilunatum Epithelial Cell Of Vestibul... None None notebook None None None None 2024-03-04 14:09:16.120621+00:00 2024-03-04 14:09:16.120637+00:00 1
71 3WE6XDLXSziIsD7D Proprioceptive Proprioceptive Choroid plexus r... None None notebook None None None None 2024-03-04 14:09:16.121071+00:00 2024-03-04 14:09:16.121087+00:00 1
72 MbB9paVEBXe7VD43 Glucagon-Like Peptide-1-Secreting L Cell intes... None None notebook None None None None 2024-03-04 14:09:16.121221+00:00 2024-03-04 14:09:16.121237+00:00 1
75 8Zw2l7kOOibZPh78 Rank Skin Monocyte Melanotropes research. None None notebook None None None None 2024-03-04 14:09:16.121669+00:00 2024-03-04 14:09:16.121685+00:00 1
87 k5Y5MpUNqHSXp4nq Igg2 intestine research blue-sensitive cone ce... None None notebook None None None None 2024-03-04 14:09:16.127812+00:00 2024-03-04 14:09:16.127829+00:00 1
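The two outputs above are identical. Django's documentation notes that on SQLite, `contains` behaves like `icontains` because SQLite's LIKE operator ignores case for ASCII letters. This instance uses SQLite, as the '.lndb' sqlite file in the clean-up output shows. The following sketch uses a made-up one-column table to show the underlying SQLite behavior, with the case-sensitive GLOB operator for contrast.

```python
import sqlite3

# Hypothetical one-column table; the names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transform (name TEXT)")
conn.executemany(
    "INSERT INTO transform (name) VALUES (?)",
    [("Query & search registries",), ("Search benchmarks",), ("Unrelated notebook",)],
)

# By default, SQLite's LIKE ignores the case of ASCII letters.
like_rows = conn.execute(
    "SELECT name FROM transform WHERE name LIKE ? ORDER BY name", ("%search%",)
).fetchall()

# GLOB is case-sensitive and uses * rather than % as its wildcard.
glob_rows = conn.execute(
    "SELECT name FROM transform WHERE name GLOB ? ORDER BY name", ("*search*",)
).fetchall()
```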

startswith

ln.Transform.filter(name__startswith="Query").df()
uid name short_name version type latest_report_id source_code_id reference reference_type created_at updated_at created_by_id
id
1 vldHzF3aTAiW5zKv Query & search registries meta 1 notebook None None None None 2024-03-04 14:09:11.292177+00:00 2024-03-04 14:09:11.292206+00:00 1
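All of the lookups above follow the same field__lookup=value convention. The hypothetical `to_where` helper below sketches how such keyword arguments could map onto a parameterized SQL WHERE clause. It is only an illustration. Django's real query compiler also handles joins across relations, escaping of LIKE wildcards, and per-database operators.

```python
def to_where(**lookups):
    """Translate Django-style field__lookup=value keyword arguments into a
    parameterized SQL WHERE clause (hypothetical helper).

    Conditions are ANDed, like multiple keyword arguments to filter().
    Not handled here: relation traversal (joins), escaping of % and _ in
    LIKE patterns, and database-specific operators.
    """
    templates = {
        "exact": ("{} = ?", lambda v: v),
        "lt": ("{} < ?", lambda v: v),
        "lte": ("{} <= ?", lambda v: v),
        "gt": ("{} > ?", lambda v: v),
        "gte": ("{} >= ?", lambda v: v),
        "contains": ("{} LIKE ?", lambda v: f"%{v}%"),
        "startswith": ("{} LIKE ?", lambda v: f"{v}%"),
    }
    clauses, params = [], []
    for key, value in lookups.items():
        # "size__lt" -> ("size", "lt"); a bare "suffix" means an exact match
        field, _, lookup = key.partition("__")
        lookup = lookup or "exact"
        if lookup == "in":
            placeholders = ", ".join("?" for _ in value)
            clauses.append(f"{field} IN ({placeholders})")
            params.extend(value)
        else:
            template, convert = templates[lookup]
            clauses.append(template.format(field))
            params.append(convert(value))
    return " AND ".join(clauses), params


where, params = to_where(suffix=".jpg", size__lt=10_000)
# where == "suffix = ? AND size < ?", params == [".jpg", 10000]
```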
# clean up test instance
!lamin delete --force mydata
!rm -r mydata
💡 deleting instance testuser1/mydata
✅     deleted instance settings file: /home/runner/.lamin/instance--testuser1--mydata.env
✅     instance cache deleted
✅     deleted '.lndb' sqlite file
❗     consider manually deleting your stored data: /home/runner/work/lamindb/lamindb/docs/mydata