Query & search registries#

Find & access data using registries.

Setup#

!lamin init --storage ./mydata
💡 connected lamindb: testuser1/mydata
import lamindb as ln

ln.settings.verbosity = "info"
💡 connected lamindb: testuser1/mydata

We’ll need some toy data:

ln.Artifact(ln.core.datasets.file_jpg_paradisi05(), description="My image").save()
ln.Artifact.from_df(ln.core.datasets.df_iris(), description="The iris collection").save()
ln.Artifact(ln.core.datasets.file_fastq(), description="My fastq").save()
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact 'cwFNANqriYscAAc0AYuj' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/cwFNANqriYscAAc0AYuj.jpg'
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact 'y4w9pTiR4PlWjLHhPHes' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/y4w9pTiR4PlWjLHhPHes.parquet'
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact 'PbQB9MUHDnsgorxmWe4K' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/PbQB9MUHDnsgorxmWe4K.fastq.gz'
Artifact(uid='PbQB9MUHDnsgorxmWe4K', suffix='.fastq.gz', description='My fastq', size=20, hash='hi7ZmAzz8sfMd3vIQr-57Q', hash_type='md5', visibility=1, key_is_virtual=True, updated_at=2024-05-01 18:48:53 UTC, storage_id=1, created_by_id=1)

Look up metadata#

For entities where we don’t store more than 100k records, a lookup object is a convenient way to select a record.

Consider the User registry:

users = ln.User.lookup(field="handle")

With auto-complete, we find a user:

user = users.testuser1
user
User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at=2024-05-01 18:48:51 UTC)

Note

You can also auto-complete in a dictionary:

users_dict = ln.User.lookup().dict()
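
To make the idea of a lookup object concrete, here is a minimal pure-Python sketch that turns a list of records into an object with one attribute per record. The `make_lookup` helper, the `User` tuple, and the second sample record are hypothetical; this is not how LaminDB implements `lookup()`.

```python
from collections import namedtuple

# Hypothetical records standing in for rows of a registry.
User = namedtuple("User", ["uid", "handle", "name"])
records = [
    User("DzTjkKse", "testuser1", "Test User1"),
    User("aBcDeFgH", "testuser2", "Test User2"),  # made-up record
]


def make_lookup(records, field):
    """Build an object with one attribute per value of `field`."""
    keys = [getattr(record, field) for record in records]
    Lookup = namedtuple("Lookup", keys)
    return Lookup(*records)


users = make_lookup(records, field="handle")
user = users.testuser1  # attribute access, so editors can auto-complete
users_dict = users._asdict()  # the same records keyed by handle
print(user.name)
```

Because every record becomes an attribute, this approach only stays practical for registries of moderate size, which matches the 100k guideline above.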

Filter by metadata#

Filter for all artifacts created by a user:

ln.Artifact.filter(created_by=user).df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
1 cwFNANqriYscAAc0AYuj 1 None .jpg None My image None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None None None 1 True 2024-05-01 18:48:53.050313+00:00 2024-05-01 18:48:53.050360+00:00 1
2 y4w9pTiR4PlWjLHhPHes 1 None .parquet DataFrame The iris collection None 5629 h9S873DqkeBVN8PcwcYdgA md5 None None None None 1 True 2024-05-01 18:48:53.159355+00:00 2024-05-01 18:48:53.159387+00:00 1
3 PbQB9MUHDnsgorxmWe4K 1 None .fastq.gz None My fastq None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None None None 1 True 2024-05-01 18:48:53.168151+00:00 2024-05-01 18:48:53.168178+00:00 1

To access the results encoded in a filter statement, execute its return value with one of:

  • .df(): A pandas DataFrame with each record stored as a row.

  • .all(): An indexable django QuerySet.

  • .list(): A list of records.

  • .one(): Exactly one record. Raises an error if there is none or more than one.

  • .one_or_none(): Either one record or None if there is no query result.
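
The difference between the last two accessors can be sketched in plain Python. The helpers below are hypothetical stand-ins that operate on a list and raise a generic `ValueError`; LaminDB's own methods operate on a QuerySet and may raise more specific exceptions.

```python
def one(records):
    """Return the single record; raise if there are zero or several."""
    if len(records) != 1:
        raise ValueError(f"expected exactly one record, got {len(records)}")
    return records[0]


def one_or_none(records):
    """Return the single record, or None if the query result is empty."""
    if not records:
        return None
    return one(records)


print(one(["only record"]))
print(one_or_none([]))
```

Use the strict variant when a missing record indicates a bug, and the lenient one when "no match" is an expected outcome that the caller handles.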

Note

filter() returns a QuerySet.

The ORMs in LaminDB are Django Models and any Django query works. LaminDB extends Django’s API for data scientists.

Under the hood, any .filter() call translates into a SQL select statement.

.one() and .one_or_none() are two parts of LaminDB’s API that are borrowed from SQLAlchemy.
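
To illustrate the statement that a filter call becomes a SQL select, the following sketch runs a comparable query against an in-memory SQLite database. The table and column names are simplified assumptions and do not match the schema LaminDB actually creates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE artifacts ("
    "id INTEGER PRIMARY KEY, suffix TEXT, created_by_id INTEGER)"
)
con.executemany(
    "INSERT INTO artifacts (suffix, created_by_id) VALUES (?, ?)",
    [(".jpg", 1), (".parquet", 1), (".fastq.gz", 2)],
)

# Roughly what filtering on created_by with a user of id 1 boils down to.
rows = con.execute(
    "SELECT id, suffix FROM artifacts WHERE created_by_id = ? ORDER BY id",
    (1,),
).fetchall()
print(rows)
```

Each keyword argument of the filter corresponds to one condition in the `WHERE` clause.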

Search for metadata#

ln.Artifact.search("iris")
key description score
uid
y4w9pTiR4PlWjLHhPHes The iris collection 90.0
cwFNANqriYscAAc0AYuj My image 34.2
PbQB9MUHDnsgorxmWe4K My fastq 25.7
ln.Artifact.search("iris", return_queryset=True).first()
Artifact(uid='y4w9pTiR4PlWjLHhPHes', suffix='.parquet', accessor='DataFrame', description='The iris collection', size=5629, hash='h9S873DqkeBVN8PcwcYdgA', hash_type='md5', visibility=1, key_is_virtual=True, updated_at=2024-05-01 18:48:53 UTC, storage_id=1, created_by_id=1)

Let us create 500 notebook objects with fake titles and save them:

ln.save(
    [
        ln.Transform(name=title, type="notebook")
        for title in ln.core.datasets.fake_bio_notebook_titles(n=500)
    ]
)

We can now search for any combination of terms:

ln.Transform.search("intestine").head()
uid score
name
Basal Cells Of Olfactory Epithelium candidate Tonsils intestine study. rab0XWvwZgiihYjA 90.0
Classify IgD intestine IgG4 IgY research. LztqGlCWtv4uRE8D 90.0
Cluster candidate intestine Astrocytes cluster. TjU1r0nhCje7Uv9W 90.0
Efficiency study intestine IgM. sY3KFCt1IBAxjHMY 90.0
Ganglia intestine IgM Cold-sensitive sensory neurons IgY. 61XXLg1RxKm00FtW 90.0

Leverage relations#

Django has a double-underscore syntax to filter based on related tables.

This syntax enables you to traverse several layers of relations:

ln.Artifact.filter(run__created_by__handle__startswith="testuse").df()
uid key suffix accessor description version size hash hash_type n_objects n_observations visibility key_is_virtual created_at updated_at storage_id transform_id run_id created_by_id
id

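The filter selects all artifacts whose generating run was created by a user whose handle starts with "testuse". The result is empty here because the toy artifacts were saved without a tracked run.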

(Under the hood, in the SQL database, it’s joining the artifact table with the run and the user table.)
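
The join described above can be sketched with an in-memory SQLite database. The three tables are simplified assumptions, not LaminDB's real schema. The example also shows why an artifact without a run drops out of the result: the inner join finds no matching run row for it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE runs (id INTEGER PRIMARY KEY, created_by_id INTEGER);
    CREATE TABLE artifacts (
        id INTEGER PRIMARY KEY, description TEXT, run_id INTEGER
    );
    INSERT INTO users VALUES (1, 'testuser1');
    INSERT INTO runs VALUES (1, 1);
    INSERT INTO artifacts VALUES (1, 'linked to a run', 1);
    INSERT INTO artifacts VALUES (2, 'saved without a run', NULL);
    """
)

# Approximate SQL for run__created_by__handle__startswith="testuse".
query = """
SELECT artifacts.id, artifacts.description
FROM artifacts
JOIN runs ON artifacts.run_id = runs.id
JOIN users ON runs.created_by_id = users.id
WHERE users.handle LIKE 'testuse%'
ORDER BY artifacts.id
"""
rows = con.execute(query).fetchall()
print(rows)
```

Each double underscore that crosses a relation adds one `JOIN`; the final segment (`startswith`) becomes the comparison in the `WHERE` clause.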

Beyond __startswith, Django supports about two dozen field comparators, written as field__comparator=value.

Here are some of them.
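
Before the individual comparators, here is a rough sketch of how a `field__comparator=value` keyword can be read: the part before the double underscore names the field, the part after it names the comparison, and a missing comparator means equality. The `filter_records` helper and the `COMPARATORS` table are hypothetical and work on plain dictionaries; Django compiles such expressions to SQL instead, and this sketch does not handle related tables.

```python
import operator

# Hypothetical mapping from comparator name to a Python predicate.
COMPARATORS = {
    "exact": operator.eq,
    "lt": operator.lt,
    "gt": operator.gt,
    "in": lambda value, options: value in options,
    "contains": lambda value, part: part in value,
    "startswith": lambda value, prefix: value.startswith(prefix),
}


def filter_records(records, **expressions):
    """Keep records that satisfy every field__comparator=value expression."""
    result = []
    for record in records:
        keep = True
        for key, expected in expressions.items():
            # "size__lt" -> ("size", "lt"); "suffix" -> ("suffix", "")
            field, _, comparator = key.partition("__")
            compare = COMPARATORS[comparator or "exact"]
            if not compare(record[field], expected):
                keep = False
                break
        if keep:
            result.append(record)
    return result


artifacts = [
    {"suffix": ".jpg", "size": 29358},
    {"suffix": ".parquet", "size": 5629},
    {"suffix": ".fastq.gz", "size": 20},
]
small = filter_records(artifacts, size__lt=1e4)
images = filter_records(artifacts, suffix=".jpg")
print([record["suffix"] for record in small])
```

Passing several keyword arguments combines them with a logical and, which is the behavior the first comparator section relies on.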

and#

ln.Artifact.filter(suffix=".jpg", created_by=user).df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
1 cwFNANqriYscAAc0AYuj 1 None .jpg None My image None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None None None 1 True 2024-05-01 18:48:53.050313+00:00 2024-05-01 18:48:53.050360+00:00 1

less than / greater than#

Or subset to artifacts smaller than 10 kB with the __lt comparator (use __gt for greater than):

ln.Artifact.filter(created_by=user, size__lt=1e4).df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
2 y4w9pTiR4PlWjLHhPHes 1 None .parquet DataFrame The iris collection None 5629 h9S873DqkeBVN8PcwcYdgA md5 None None None None 1 True 2024-05-01 18:48:53.159355+00:00 2024-05-01 18:48:53.159387+00:00 1
3 PbQB9MUHDnsgorxmWe4K 1 None .fastq.gz None My fastq None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None None None 1 True 2024-05-01 18:48:53.168151+00:00 2024-05-01 18:48:53.168178+00:00 1

or#

from django.db.models import Q

ln.Artifact.filter().filter(Q(suffix=".jpg") | Q(suffix=".fastq.gz")).df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
1 cwFNANqriYscAAc0AYuj 1 None .jpg None My image None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None None None 1 True 2024-05-01 18:48:53.050313+00:00 2024-05-01 18:48:53.050360+00:00 1
3 PbQB9MUHDnsgorxmWe4K 1 None .fastq.gz None My fastq None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None None None 1 True 2024-05-01 18:48:53.168151+00:00 2024-05-01 18:48:53.168178+00:00 1

in#

ln.Artifact.filter(suffix__in=[".jpg", ".fastq.gz"]).df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
1 cwFNANqriYscAAc0AYuj 1 None .jpg None My image None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None None None 1 True 2024-05-01 18:48:53.050313+00:00 2024-05-01 18:48:53.050360+00:00 1
3 PbQB9MUHDnsgorxmWe4K 1 None .fastq.gz None My fastq None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None None None 1 True 2024-05-01 18:48:53.168151+00:00 2024-05-01 18:48:53.168178+00:00 1
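
The `or` and `in` queries return the same rows because, for a single field, a chain of equality tests joined by `OR` is equivalent to an `IN` list. A small sketch with an in-memory SQLite table (simplified, assumed schema) shows the two SQL forms side by side:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE artifacts (id INTEGER PRIMARY KEY, suffix TEXT)")
con.executemany(
    "INSERT INTO artifacts (suffix) VALUES (?)",
    [(".jpg",), (".parquet",), (".fastq.gz",)],
)

# Two equality tests combined with OR, as in the Q-object query.
with_or = con.execute(
    "SELECT id FROM artifacts "
    "WHERE suffix = '.jpg' OR suffix = '.fastq.gz' ORDER BY id"
).fetchall()

# The same selection expressed as a membership test, as in suffix__in.
with_in = con.execute(
    "SELECT id FROM artifacts "
    "WHERE suffix IN ('.jpg', '.fastq.gz') ORDER BY id"
).fetchall()

print(with_or == with_in)
```

`Q` objects remain the more general tool, since they can combine conditions on different fields, which an `IN` list cannot express.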

order by#

ln.Artifact.filter().order_by("-updated_at").df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
3 PbQB9MUHDnsgorxmWe4K 1 None .fastq.gz None My fastq None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None None None 1 True 2024-05-01 18:48:53.168151+00:00 2024-05-01 18:48:53.168178+00:00 1
2 y4w9pTiR4PlWjLHhPHes 1 None .parquet DataFrame The iris collection None 5629 h9S873DqkeBVN8PcwcYdgA md5 None None None None 1 True 2024-05-01 18:48:53.159355+00:00 2024-05-01 18:48:53.159387+00:00 1
1 cwFNANqriYscAAc0AYuj 1 None .jpg None My image None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None None None 1 True 2024-05-01 18:48:53.050313+00:00 2024-05-01 18:48:53.050360+00:00 1

contains#

ln.Transform.filter(name__contains="search").df().head(10)
uid name key version description type latest_report_id source_code_id reference reference_type created_at updated_at created_by_id
id
11 NWERWfHYaDFWmLrv Iga Descending colon research IgG IgG3 Ganglia... None None None notebook None None None None 2024-05-01 18:48:54.074138+00:00 2024-05-01 18:48:54.074152+00:00 1
19 LHYHshfneMWTbJD1 Igg4 IgG3 IgG4 research. None None None notebook None None None None 2024-05-01 18:48:54.075376+00:00 2024-05-01 18:48:54.075390+00:00 1
36 hY8oNbFSouy6ONsN Igd visualize research IgG IgG4 IgD efficiency. None None None notebook None None None None 2024-05-01 18:48:54.078042+00:00 2024-05-01 18:48:54.078056+00:00 1
37 aR0JBcvRcPRP5jK0 Igg3 IgD Uterus Uterus study research. None None None notebook None None None None 2024-05-01 18:48:54.078197+00:00 2024-05-01 18:48:54.078211+00:00 1
38 4MNYX4vA0yIcic3z Basal Cells Of Olfactory Epithelium Basal cell... None None None notebook None None None None 2024-05-01 18:48:54.078352+00:00 2024-05-01 18:48:54.078366+00:00 1
41 AXdB1oyYNcJimf3L Igg3 classify investigate rank IgG research IgD. None None None notebook None None None None 2024-05-01 18:48:54.078817+00:00 2024-05-01 18:48:54.078831+00:00 1
43 FPNdj4tFs8CuE7PK Intestinal IgG4 research. None None None notebook None None None None 2024-05-01 18:48:54.079127+00:00 2024-05-01 18:48:54.079141+00:00 1
46 H7vtUqLO1u2Zwcbi Astrocytes research Descending colon IgG. None None None notebook None None None None 2024-05-01 18:48:54.079591+00:00 2024-05-01 18:48:54.079605+00:00 1
63 iYRCFKUdKFKZRDlu Ige IgE Veins efficiency gastric inhibitory pe... None None None notebook None None None None 2024-05-01 18:48:54.082246+00:00 2024-05-01 18:48:54.082261+00:00 1
79 mv6zIcHOlmPKVFPS Red Skeletal Muscle Cell Melanotropes classify... None None None notebook None None None None 2024-05-01 18:48:54.088260+00:00 2024-05-01 18:48:54.088274+00:00 1

And case-insensitive:

ln.Transform.filter(name__icontains="Search").df().head(10)
uid name key version description type latest_report_id source_code_id reference reference_type created_at updated_at created_by_id
id
11 NWERWfHYaDFWmLrv Iga Descending colon research IgG IgG3 Ganglia... None None None notebook None None None None 2024-05-01 18:48:54.074138+00:00 2024-05-01 18:48:54.074152+00:00 1
19 LHYHshfneMWTbJD1 Igg4 IgG3 IgG4 research. None None None notebook None None None None 2024-05-01 18:48:54.075376+00:00 2024-05-01 18:48:54.075390+00:00 1
36 hY8oNbFSouy6ONsN Igd visualize research IgG IgG4 IgD efficiency. None None None notebook None None None None 2024-05-01 18:48:54.078042+00:00 2024-05-01 18:48:54.078056+00:00 1
37 aR0JBcvRcPRP5jK0 Igg3 IgD Uterus Uterus study research. None None None notebook None None None None 2024-05-01 18:48:54.078197+00:00 2024-05-01 18:48:54.078211+00:00 1
38 4MNYX4vA0yIcic3z Basal Cells Of Olfactory Epithelium Basal cell... None None None notebook None None None None 2024-05-01 18:48:54.078352+00:00 2024-05-01 18:48:54.078366+00:00 1
41 AXdB1oyYNcJimf3L Igg3 classify investigate rank IgG research IgD. None None None notebook None None None None 2024-05-01 18:48:54.078817+00:00 2024-05-01 18:48:54.078831+00:00 1
43 FPNdj4tFs8CuE7PK Intestinal IgG4 research. None None None notebook None None None None 2024-05-01 18:48:54.079127+00:00 2024-05-01 18:48:54.079141+00:00 1
46 H7vtUqLO1u2Zwcbi Astrocytes research Descending colon IgG. None None None notebook None None None None 2024-05-01 18:48:54.079591+00:00 2024-05-01 18:48:54.079605+00:00 1
63 iYRCFKUdKFKZRDlu Ige IgE Veins efficiency gastric inhibitory pe... None None None notebook None None None None 2024-05-01 18:48:54.082246+00:00 2024-05-01 18:48:54.082261+00:00 1
79 mv6zIcHOlmPKVFPS Red Skeletal Muscle Cell Melanotropes classify... None None None notebook None None None None 2024-05-01 18:48:54.088260+00:00 2024-05-01 18:48:54.088274+00:00 1
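
The intended difference between the two comparators can be sketched in plain Python with a few titles in the style of the ones above. The `contains` and `icontains` helpers are hypothetical stand-ins for the SQL that Django generates.

```python
names = [
    "Igg4 IgG3 IgG4 research.",
    "Research IgG4 Melanotropes.",
    "Efficiency study intestine IgM.",
]


def contains(name, term):
    """Case-sensitive substring test."""
    return term in name


def icontains(name, term):
    """Case-insensitive substring test."""
    return term.lower() in name.lower()


sensitive = [name for name in names if contains(name, "research")]
insensitive = [name for name in names if icontains(name, "research")]
print(len(sensitive), len(insensitive))
```

One caveat worth checking against your backend: Django's documentation notes that SQLite does not support case-sensitive `LIKE` statements, so on SQLite `contains` behaves like `icontains`.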

startswith#

ln.Transform.filter(name__startswith="Research").df()
uid name key version description type latest_report_id source_code_id reference reference_type created_at updated_at created_by_id
id
143 arZ4h51LatzbjgrB Research Red skeletal muscle cell IgG4 Ganglia... None None None notebook None None None None 2024-05-01 18:48:54.097978+00:00 2024-05-01 18:48:54.097992+00:00 1
168 C4mabiClMXtcqzfL Research Descending colon IgG3 cluster visuali... None None None notebook None None None None 2024-05-01 18:48:54.104375+00:00 2024-05-01 18:48:54.104388+00:00 1
177 NauGq3zsqdzB7uuV Research IgG Tonsils IgG3 visualize investigat... None None None notebook None None None None 2024-05-01 18:48:54.105710+00:00 2024-05-01 18:48:54.105723+00:00 1
232 VEvCfwHsnHoEDfCJ Research IgG IgG3 Gland of Moll Veins gastric ... None None None notebook None None None None 2024-05-01 18:48:54.116502+00:00 2024-05-01 18:48:54.116516+00:00 1
293 dxcIJxL3T7FPD6Ku Research IgG4 Melanotropes. None None None notebook None None None None 2024-05-01 18:48:54.125651+00:00 2024-05-01 18:48:54.125665+00:00 1
347 591DQlD2iYxKiJIc Research IgG4 IgY study IgD IgG4 Red skeletal ... None None None notebook None None None None 2024-05-01 18:48:54.136270+00:00 2024-05-01 18:48:54.136283+00:00 1
358 feHkOuj45D0kiG0D Research Veins IgG1 Parietal epithelial cell D... None None None notebook None None None None 2024-05-01 18:48:54.137894+00:00 2024-05-01 18:48:54.137907+00:00 1
365 WJ1ppOjbbTTfgQF8 Research IgG Cold-sensitive sensory neurons IgG4. None None None notebook None None None None 2024-05-01 18:48:54.138920+00:00 2024-05-01 18:48:54.138933+00:00 1
439 N3a7PnyDPj3Bgcrz Research IgE IgG3 IgG3. None None None notebook None None None None 2024-05-01 18:48:54.152514+00:00 2024-05-01 18:48:54.152528+00:00 1
455 6eEMyuHXfz3U2GyU Research IgG research Descending colon IgG3. None None None notebook None None None None 2024-05-01 18:48:54.154881+00:00 2024-05-01 18:48:54.154894+00:00 1
464 NWAUwrVIjidAcBfJ Research investigate Parietal epithelial cell ... None None None notebook None None None None 2024-05-01 18:48:54.158867+00:00 2024-05-01 18:48:54.158880+00:00 1
# clean up test instance
!lamin delete --force mydata
!rm -r mydata