Tutorial: Files & datasets
Biology is measured in samples that generate batches of data, so you'll almost always start out with files.
LaminDB helps you transform files into more useful representations: validated, queryable datasets and analytical insights.
The best way to build a mental map of the API is to embed it into an iterative data warehousing or learning process.
The tutorial has two parts, each a Jupyter notebook:
Tutorial: Files & datasets - register & access
Tutorial: Features & labels - validate & annotate
Setup
Install the lamindb Python package:
pip install 'lamindb[aws,jupyter]'
Log in on the command line:
lamin login <email> --password <password>
You can now init a LaminDB instance with a directory ./lamin-tutorial for storing data:
!lamin init --storage ./lamin-tutorial # or "s3://my-bucket" or "gs://my-bucket"
✅ saved: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-26 15:21:30)
✅ saved: Storage(id='at1jQOFk', root='/home/runner/work/lamindb/lamindb/docs/lamin-tutorial', type='local', updated_at=2023-09-26 15:21:30, created_by_id='DzTjkKse')
💡 loaded instance: testuser1/lamin-tutorial
💡 did not register local instance on hub (if you want, call `lamin register`)
What else can I configure during setup?
Instead of the default SQLite database, use PostgreSQL:
--db postgresql://<user>:<pwd>@<hostname>:<port>/<dbname>
Instead of a default instance name derived from storage, provide a custom name:
--name myinstance
Beyond the core schema, use bionty and other schemas:
--schema bionty,custom1,template1
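Combining these flags, a full init command might look like this (a sketch; substitute your own bucket, instance name & schema modules):
lamin init --storage s3://my-bucket --name myinstance --schema bionty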
For more, see Install & setup LaminDB.
Track a data source
import lamindb as ln
💡 loaded instance: testuser1/lamin-tutorial (lamindb 0.54.2)
If new to LaminDB, set verbosity to hint level:
ln.settings.verbosity = "hint"
The code that generates a batch of data is a transform (Transform). It could be a pipeline, a notebook or an app upload.
Let’s track the notebook that’s being run:
ln.track()
💡 notebook imports: lamindb==0.54.2
✅ saved: Transform(id='NJvdsWWbJlZSz8', name='Tutorial: Files & datasets', short_name='tutorial', version='0', type=notebook, updated_at=2023-09-26 15:21:32, created_by_id='DzTjkKse')
✅ saved: Run(id='wgHeuqipA5Ujff8alH6Y', run_at=2023-09-26 15:21:32, transform_id='NJvdsWWbJlZSz8', created_by_id='DzTjkKse')
By calling track(), the notebook is automatically linked as the source of all data that's about to be saved!
What happened under the hood?
Imported package versions of the current notebook were detected
Notebook metadata was detected and stored in a Transform record
Run metadata was detected and stored in a Run record
The Transform class registers data transformations: a notebook, a pipeline or a UI operation.
The Run class registers executions of transforms. Several runs can be linked to the same transform if executed in different contexts (time, user, input data, etc.).
How do I track a pipeline instead of a notebook?
transform = ln.Transform(name="My pipeline", version="1.2.0")
ln.track(transform)
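Each call to track() with a transform records a new run, so repeated executions of the same pipeline remain distinguishable. A minimal sketch, assuming the pipeline is executed a second time:
ln.track(transform)  # expected to link a fresh Run record to the same Transform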
Why should I care about tracking notebooks?
If you can, avoid interactive notebooks: anything that can be a deterministic pipeline should be a pipeline.
But: much of the insight generated from biological data is driven by computational biologists interacting with it.
A notebook that's run a single time on specific data is not a pipeline: it's a (versioned) document that produced insight or some other form of data representation (with parallels to an ELN in the wet lab).
Because humans are in the loop, most mistakes happen when using notebooks: track() helps avoid some of them.
(An early blog post on this is here.)
Manage files
We’ll work with a toy dataset of image files and transform it into higher-level features for downstream analysis.
(For other data types: see Data types.)
Consider 3 directories storing images & metadata of Iris flowers, generated in 3 subsequent studies:
ln.File.view_tree("s3://lamindb-dev-datasets/iris_studies")
iris_studies (3 sub-directories & 151 files with suffixes '.jpg', '.csv'):
├── study0_raw_images
│ ├── iris-0337d20a3b7273aa0ddaa7d6afb57a37a759b060e4401871db3cefaa6adc068d.jpg
│ ├── iris-0797945218a97d6e5251b4758a2ba1b418cbd52ce4ef46a3239e4b939bd9807b.jpg
│ ├── iris-0f133861ea3fe1b68f9f1b59ebd9116ff963ee7104a0c4200218a33903f82444.jpg
│ ├── iris-0fec175448a23db03c1987527f7e9bb74c18cffa76ef003f962c62603b1cbb87.jpg
│ ├── iris-125b6645e086cd60131764a6bed12650e0f7f2091c8bbb72555c103196c01881.jpg
│ ├── iris-13dfaff08727abea3da8cfd8d097fe1404e76417fefe27ff71900a89954e145a.jpg
│ ...
│ └── meta.csv
├── study1_raw_images
│ ├── iris-0879d3f5b337fe512da1c7bf1d2bfd7616d744d3eef7fa532455a879d5cc4ba0.jpg
│ ├── iris-0b486eebacd93e114a6ec24264e035684cebe7d2074eb71eb1a71dd70bf61e8f.jpg
│ ├── iris-0ff5ba898a0ec179a25ca217af45374fdd06d606bb85fc29294291facad1776a.jpg
│ ├── iris-1175239c07a943d89a6335fb4b99a9fb5aabb2137c4d96102f10b25260ae523f.jpg
│ ├── iris-1289c57b571e8e98e4feb3e18a890130adc145b971b7e208a6ce5bad945b4a5a.jpg
│ ├── iris-12adb3a8516399e27ff1a9d20d28dca4674836ed00c7c0ae268afce2c30c4451.jpg
│ ...
│ └── meta.csv
└── study2_raw_images
├── iris-01cdd55ca6402713465841abddcce79a2e906e12edf95afb77c16bde4b4907dc.jpg
├── iris-02868b71ddd9b33ab795ac41609ea7b20a6e94f2543fad5d7fa11241d61feacf.jpg
├── iris-0415d2f3295db04bebc93249b685f7d7af7873faa911cd270ecd8363bd322ed5.jpg
├── iris-0c826b6f4648edf507e0cafdab53712bb6fd1f04dab453cee8db774a728dd640.jpg
├── iris-10fb9f154ead3c56ba0ab2c1ab609521c963f2326a648f82c9d7cabd178fc425.jpg
├── iris-14cbed88b0d2a929477bdf1299724f22d782e90f29ce55531f4a3d8608f7d926.jpg
...
└── meta.csv
Our goal is to turn these files into a validated & queryable dataset that can be used alongside many other datasets.
Register a file
LaminDB uses the File class to model files with their metadata and access. It's a registry that manages search, queries, validation & access of files through metadata.
Let's create a File record from one of the files:
file = ln.File("s3://lamindb-dev-datasets/iris_studies/study0_raw_images/meta.csv")
file
✅ saved: Storage(id='qBDFItXr', root='s3://lamindb-dev-datasets', type='s3', region='us-east-1', updated_at=2023-09-26 15:21:33, created_by_id='DzTjkKse')
💡 file in storage 's3://lamindb-dev-datasets' with key 'iris_studies/study0_raw_images/meta.csv'
File(id='e8L3Y6GTSMoZct1D1jsd', key='iris_studies/study0_raw_images/meta.csv', suffix='.csv', size=4355, hash='ZpAEpN0iFYH6vjZNigic7g', hash_type='md5', storage_id='qBDFItXr', transform_id='NJvdsWWbJlZSz8', run_id='wgHeuqipA5Ujff8alH6Y', created_by_id='DzTjkKse')
Which fields are populated when creating a File record?
Basic fields:
id: a universal ID (serves as a primary key in the underlying SQL table of the instance)
key: an optional storage key, i.e., the relative path of the file in storage
description: an optional string description
storage: the storage location (the root, say, an S3 bucket or network location)
suffix: the file suffix
size: the file size in bytes
hash: a hash useful to check for integrity and collisions (is this file already stored?)
hash_type: the type of the hash (usually an MD5 or SHA1 checksum)
created_at: time of creation
updated_at: time of last update
Provenance-related fields:
created_by: the User who created the file
transform: the Transform (pipeline, notebook, instrument, app) that was run
run: the Run of the transform that created the file
For a full reference, see File.
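All of these fields are plain attributes on the record. A quick sketch, reading a few of them off the file created above:
file.suffix  # ".csv"
file.size  # 4355 (bytes)
file.hash  # "ZpAEpN0iFYH6vjZNigic7g"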
Upon .save(), file metadata is written to the database:
file.save()
What happens during save?
In the database: a file record is inserted into the File registry. If the file record exists already, it's updated.
In storage:
If the default storage is in the cloud, .save() triggers an upload for a local file.
If the file is already in a registered storage location, only the metadata of the record is saved to the File registry.
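For example, a record created from a local path would be uploaded upon save if the default storage is in the cloud. A sketch, assuming a hypothetical local file ./images/flower.jpg:
local_file = ln.File("./images/flower.jpg", description="a local image")  # hypothetical path
local_file.save()  # uploads the file & inserts its metadata record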
The meta.csv file is now registered in the database:
ln.File.filter().df()
| id | storage_id | key | suffix | accessor | description | version | size | hash | hash_type | transform_id | run_id | initial_version_id | updated_at | created_by_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| e8L3Y6GTSMoZct1D1jsd | qBDFItXr | iris_studies/study0_raw_images/meta.csv | .csv | None | None | None | 4355 | ZpAEpN0iFYH6vjZNigic7g | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:33 | DzTjkKse |
View data flow
Because we called track(), we know that the file was saved in the current notebook (view_flow()):
file.view_flow()
We can also directly access its linked Transform & Run records:
file.transform
Transform(id='NJvdsWWbJlZSz8', name='Tutorial: Files & datasets', short_name='tutorial', version='0', type=notebook, updated_at=2023-09-26 15:21:33, created_by_id='DzTjkKse')
file.run
Run(id='wgHeuqipA5Ujff8alH6Y', run_at=2023-09-26 15:21:32, transform_id='NJvdsWWbJlZSz8', created_by_id='DzTjkKse')
(For a comprehensive example with data flow through app uploads, pipelines & notebooks of multiple data types, see Project flow.)
Access a file
path gives you the file path:
file.path
S3Path('s3://lamindb-dev-datasets/iris_studies/study0_raw_images/meta.csv')
To download the file to a local cache, call stage():
file.stage()
PosixUPath('/home/runner/.cache/lamindb/lamindb-dev-datasets/iris_studies/study0_raw_images/meta.csv')
To load a file into memory with a default loader, call load():
df = file.load(index_col=0) # calls `pd.read_csv` and passes `index_col=0` to it
df.head()
| | 0 | 1 |
|---|---|---|
| 0 | iris-0797945218a97d6e5251b4758a2ba1b418cbd52ce... | setosa |
| 1 | iris-0f133861ea3fe1b68f9f1b59ebd9116ff963ee710... | versicolor |
| 2 | iris-9ffe51c2abd973d25a299647fa9ccaf6aa9c8eecf... | versicolor |
| 3 | iris-83f433381b755101b9fc9fbc9743e35fbb8a1a109... | setosa |
| 4 | iris-bdae8314e4385d8e2322abd8e63a82758a9063c77... | virginica |
If the file is large, you'll likely want to query it via backed(). For more on this, see: Query files & datasets.
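For instance, for an array-like file such as an .h5ad, backed() returns an object you can slice without loading everything into memory. A sketch, assuming a hypothetical registered AnnData file:
adata = h5ad_file.backed()  # h5ad_file is hypothetical; the full file is not downloaded
adata.obs.head()  # read only the pieces you access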
How do I update a file?
If you'd like to replace the underlying stored object, use replace().
If you’d like to update metadata:
file.description = "My new description"
file.save() # save the change to the database
Register directories
With from_dir(), we now register the entire directory of the first study:
files = ln.File.from_dir("s3://lamindb-dev-datasets/iris_studies/study0_raw_images")
❗ returning existing file with same hash: File(id='e8L3Y6GTSMoZct1D1jsd', key='iris_studies/study0_raw_images/meta.csv', suffix='.csv', size=4355, hash='ZpAEpN0iFYH6vjZNigic7g', hash_type='md5', updated_at=2023-09-26 15:21:33, storage_id='qBDFItXr', transform_id='NJvdsWWbJlZSz8', run_id='wgHeuqipA5Ujff8alH6Y', created_by_id='DzTjkKse')
✅ created 51 files from directory using storage s3://lamindb-dev-datasets and key = iris_studies/study0_raw_images/
(We see that we already registered one of the files. Instead of creating a new file record, the existing one is returned: see idempotency.)
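This idempotency is easy to check: creating a File from the same content returns the already-registered record rather than a duplicate. A small sketch:
file2 = ln.File("s3://lamindb-dev-datasets/iris_studies/study0_raw_images/meta.csv")
assert file2.id == file.id  # same hash, same record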
Let’s only register the first 5 records to keep things simple:
files_subset = files[:5]
ln.save(files_subset)
Query & search files
You can search files directly on the File registry:
ln.File.search("meta").head()
| id | key | description | __ratio__ |
|---|---|---|---|
| e8L3Y6GTSMoZct1D1jsd | iris_studies/study0_raw_images/meta.csv | | 60.0 |
| svCQu04U5Z30DIxFdDk3 | iris_studies/study0_raw_images/iris-0337d20a3b... | | 30.0 |
| BRfjqMzHzYHUZsILu4ns | iris_studies/study0_raw_images/iris-0797945218... | | 30.0 |
| RLMC2dAZ9JnEqLOIMOGz | iris_studies/study0_raw_images/iris-0f133861ea... | | 30.0 |
| mTqifcqWCYKp4M0VLrm7 | iris_studies/study0_raw_images/iris-0fec175448... | | 30.0 |
You can also query & search for files by any combination of metadata.
For instance, look up a user with auto-complete from the User registry:
users = ln.User.lookup()
users.testuser1
User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-26 15:21:30)
Filter the Transform registry for a name:
transform = ln.Transform.filter(
name__contains="files & datasets"
).one() # get exactly one result
transform
Transform(id='NJvdsWWbJlZSz8', name='Tutorial: Files & datasets', short_name='tutorial', version='0', type='notebook', updated_at=2023-09-26 15:21:34, created_by_id='DzTjkKse')
What does a double underscore mean?
For any field, the double underscore defines a comparator, e.g.,
name__icontains="Martha": name contains "Martha", ignoring case
name__startswith="Martha": name starts with "Martha"
name__in=["Martha", "John"]: name is "John" or "Martha"
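Comparators combine freely with plain equality filters, as in this small sketch (assuming Django-style lookups such as size__gt):
ln.File.filter(suffix=".jpg", size__gt=10_000).df()  # .jpg files larger than 10 kB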
For more info, see: Query & search registries.
Use these results to filter the File registry:
ln.File.filter(
created_by=users.testuser1,
transform=transform,
suffix=".jpg",
).df().head()
| id | storage_id | key | suffix | accessor | description | version | size | hash | hash_type | transform_id | run_id | initial_version_id | updated_at | created_by_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RLMC2dAZ9JnEqLOIMOGz | qBDFItXr | iris_studies/study0_raw_images/iris-0f133861ea... | .jpg | None | None | None | 12201 | 1uP_ORc_dQpcuk3oKkIOLw | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:34 | DzTjkKse |
| svCQu04U5Z30DIxFdDk3 | qBDFItXr | iris_studies/study0_raw_images/iris-0337d20a3b... | .jpg | None | None | None | 14529 | e0Gct8LodEyQzNwy1glOPA | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:34 | DzTjkKse |
| mTqifcqWCYKp4M0VLrm7 | qBDFItXr | iris_studies/study0_raw_images/iris-0fec175448... | .jpg | None | None | None | 10773 | d3I43842Sd5PUMgFBrgjKA | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:34 | DzTjkKse |
| BRfjqMzHzYHUZsILu4ns | qBDFItXr | iris_studies/study0_raw_images/iris-0797945218... | .jpg | None | None | None | 19842 | v3G73F-8oISKexASY3RvUw | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:34 | DzTjkKse |
| ihJ13ctAlqiqZqQDzE8Q | qBDFItXr | iris_studies/study0_raw_images/iris-125b6645e0... | .jpg | None | None | None | 21418 | Bsko3tdvYxWq_JB5fdoIbw | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:34 | DzTjkKse |
You can also query for directories using key__startswith (like AWS S3, LaminDB treats directories as prefixes of the storage key):
ln.File.filter(key__startswith="iris_studies/study0_raw_images/").df().head()
| id | storage_id | key | suffix | accessor | description | version | size | hash | hash_type | transform_id | run_id | initial_version_id | updated_at | created_by_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| e8L3Y6GTSMoZct1D1jsd | qBDFItXr | iris_studies/study0_raw_images/meta.csv | .csv | None | None | None | 4355 | ZpAEpN0iFYH6vjZNigic7g | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:33 | DzTjkKse |
| RLMC2dAZ9JnEqLOIMOGz | qBDFItXr | iris_studies/study0_raw_images/iris-0f133861ea... | .jpg | None | None | None | 12201 | 1uP_ORc_dQpcuk3oKkIOLw | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:34 | DzTjkKse |
| svCQu04U5Z30DIxFdDk3 | qBDFItXr | iris_studies/study0_raw_images/iris-0337d20a3b... | .jpg | None | None | None | 14529 | e0Gct8LodEyQzNwy1glOPA | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:34 | DzTjkKse |
| mTqifcqWCYKp4M0VLrm7 | qBDFItXr | iris_studies/study0_raw_images/iris-0fec175448... | .jpg | None | None | None | 10773 | d3I43842Sd5PUMgFBrgjKA | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:34 | DzTjkKse |
| BRfjqMzHzYHUZsILu4ns | qBDFItXr | iris_studies/study0_raw_images/iris-0797945218... | .jpg | None | None | None | 19842 | v3G73F-8oISKexASY3RvUw | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:34 | DzTjkKse |
Note
You can look up, filter & search any registry (Registry).
You can chain filter() statements and search(): ln.File.filter(suffix=".jpg").search("my image")
An empty filter returns the entire registry: ln.File.filter()
For more info, see: Query & search registries.
Describe files
Get an overview of what happened:
file.describe()
File(id='e8L3Y6GTSMoZct1D1jsd', key='iris_studies/study0_raw_images/meta.csv', suffix='.csv', size=4355, hash='ZpAEpN0iFYH6vjZNigic7g', hash_type='md5', updated_at=2023-09-26 15:21:33)
Provenance:
🗃️ storage: Storage(id='qBDFItXr', root='s3://lamindb-dev-datasets', type='s3', region='us-east-1', updated_at=2023-09-26 15:21:33, created_by_id='DzTjkKse')
💫 transform: Transform(id='NJvdsWWbJlZSz8', name='Tutorial: Files & datasets', short_name='tutorial', version='0', type=notebook, updated_at=2023-09-26 15:21:34, created_by_id='DzTjkKse')
👣 run: Run(id='wgHeuqipA5Ujff8alH6Y', run_at=2023-09-26 15:21:32, transform_id='NJvdsWWbJlZSz8', created_by_id='DzTjkKse')
👤 created_by: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-26 15:21:30)
file.view_flow()
Version files
If you'd like to version a file or transform, either provide the version parameter when creating it or create new versions through is_new_version_of.
For instance:
new_file = ln.File(data, is_new_version_of=old_file)
Are there remaining questions about storing files? If so, see: Storage FAQ.
Create a dataset
The 50 image files together with their metadata annotations constitute a dataset. Let's track it as such:
dataset = ln.Dataset(
files_subset, name="Iris study 1", description="50 image files and metadata"
)
dataset.save()
Most functionality that you just learned about files (e.g., queries & provenance) also applies to Dataset.
The important difference is that a Dataset does not have a key field: it's an abstraction over data stored in one or several files or other storage backends.
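The linked files remain accessible through the record. A quick sketch, listing the files of the dataset created above:
dataset.files.df()  # the 5 registered files linked to this dataset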
We'll learn more about datasets in the next part of the tutorial.
View changes
With view(), you can see the latest changes to the database:
ln.view() # link tables in the database are not shown
Dataset
| id | name | description | version | hash | reference | reference_type | transform_id | run_id | file_id | initial_version_id | updated_at | created_by_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| muderX8fqRejxX0zyfwQ | Iris study 1 | 50 image files and metadata | None | qW6WbNWDV_xiHYqAhku7 | None | None | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | None | 2023-09-26 15:21:34 | DzTjkKse |
File
| id | storage_id | key | suffix | accessor | description | version | size | hash | hash_type | transform_id | run_id | initial_version_id | updated_at | created_by_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ihJ13ctAlqiqZqQDzE8Q | qBDFItXr | iris_studies/study0_raw_images/iris-125b6645e0... | .jpg | None | None | None | 21418 | Bsko3tdvYxWq_JB5fdoIbw | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:34 | DzTjkKse |
| BRfjqMzHzYHUZsILu4ns | qBDFItXr | iris_studies/study0_raw_images/iris-0797945218... | .jpg | None | None | None | 19842 | v3G73F-8oISKexASY3RvUw | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:34 | DzTjkKse |
| mTqifcqWCYKp4M0VLrm7 | qBDFItXr | iris_studies/study0_raw_images/iris-0fec175448... | .jpg | None | None | None | 10773 | d3I43842Sd5PUMgFBrgjKA | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:34 | DzTjkKse |
| svCQu04U5Z30DIxFdDk3 | qBDFItXr | iris_studies/study0_raw_images/iris-0337d20a3b... | .jpg | None | None | None | 14529 | e0Gct8LodEyQzNwy1glOPA | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:34 | DzTjkKse |
| RLMC2dAZ9JnEqLOIMOGz | qBDFItXr | iris_studies/study0_raw_images/iris-0f133861ea... | .jpg | None | None | None | 12201 | 1uP_ORc_dQpcuk3oKkIOLw | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:34 | DzTjkKse |
| e8L3Y6GTSMoZct1D1jsd | qBDFItXr | iris_studies/study0_raw_images/meta.csv | .csv | None | None | None | 4355 | ZpAEpN0iFYH6vjZNigic7g | md5 | NJvdsWWbJlZSz8 | wgHeuqipA5Ujff8alH6Y | None | 2023-09-26 15:21:33 | DzTjkKse |
Run
| id | transform_id | run_at | created_by_id | reference | reference_type |
|---|---|---|---|---|---|
| wgHeuqipA5Ujff8alH6Y | NJvdsWWbJlZSz8 | 2023-09-26 15:21:32 | DzTjkKse | None | None |
Storage
| id | root | type | region | updated_at | created_by_id |
|---|---|---|---|---|---|
| qBDFItXr | s3://lamindb-dev-datasets | s3 | us-east-1 | 2023-09-26 15:21:33 | DzTjkKse |
| at1jQOFk | /home/runner/work/lamindb/lamindb/docs/lamin-t... | local | None | 2023-09-26 15:21:30 | DzTjkKse |
Transform
| id | name | short_name | version | type | reference | reference_type | initial_version_id | updated_at | created_by_id |
|---|---|---|---|---|---|---|---|---|---|
| NJvdsWWbJlZSz8 | Tutorial: Files & datasets | tutorial | 0 | notebook | None | None | None | 2023-09-26 15:21:34 | DzTjkKse |
User
| id | handle | email | name | updated_at |
|---|---|---|---|---|
| DzTjkKse | testuser1 | testuser1@lamin.ai | Test User1 | 2023-09-26 15:21:30 |
Read on
Now, you already know about 6 out of 10 LaminDB core classes! The two most central are:
File: data batches
Dataset: collections of files
And the four registries related to provenance:
Transform: transforms of files & datasets
Run: runs of transforms
User: users
Storage: storage locations like S3/GCP buckets or local directories
If you want to validate data, label files & datasets and manage features, read on: Tutorial: Features & labels.