Storage FAQ
Often, one works with temporary files but ultimately wants to store them in persistent storage locations, typically in the cloud. Hence, LaminDB comes with
- a default storage location for persisting data
- a registry for managing storage locations: `Storage`
What is the default storage location?
It’s the directory or cloud bucket that you pass when initializing a LaminDB instance:
lamin init --storage ./default-storage # or s3://default-bucket or gs://default-bucket
It’s easiest to see and update default storage in the Python API (`storage`):
import lamindb as ln
ln.settings.storage # set via ln.settings.storage = "s3://other-bucket"
#> s3://default-bucket
You can also change it using the CLI via
lamin set --storage s3://other-bucket
Where is my SQLite file?
The SQLite file is in the default storage location of the instance and is called `f"{instance_name}.lndb"`.
You can also see it as part of the database connection string:
ln.setup.settings.instance.db
#> sqlite:///path-to-sqlite
If default storage is in the cloud, the SQLite file is cached in the local cache directory (`cache_dir`):
ln.setup.settings.storage.cache_dir
#> path-to-cache-dir
What happens if I move the `.lndb` file around?
The SQLite file has to remain in the default storage location of the instance.
You can, however, take the SQLite file and place it in a new location (`./mydir`, `s3://my-bucket`) and create a new LaminDB instance passing `--storage ./mydir` (or `--storage s3://my-bucket`). All your metadata is then present in the new instance.
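Concretely, the move itself is just an ordinary file copy. Here is a minimal sketch with hypothetical local paths and an instance name `myinstance` standing in for yours:

```python
import shutil
from pathlib import Path

# Hypothetical setup: simulate an existing instance "myinstance" with local storage.
old_storage = Path("./default-storage")
old_storage.mkdir(parents=True, exist_ok=True)
(old_storage / "myinstance.lndb").touch()  # stands in for the real SQLite file

# Moving the SQLite file is an ordinary file copy:
new_storage = Path("./mydir")
new_storage.mkdir(parents=True, exist_ok=True)
shutil.copy(old_storage / "myinstance.lndb", new_storage / "myinstance.lndb")

# Then initialize a new instance on the new location, e.g. via the CLI:
#   lamin init --storage ./mydir
```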
What is the `.lamindb/` directory?
It stores files that are merely referenced by metadata (the `key` field of `File` is `None`).
There is only a single `.lamindb/` directory per LaminDB instance.
What should I do if I want to bulk migrate files to another storage?
Currently, you can only achieve this manually:
1. Copy or move the files into the desired new storage location.
2. Adapt the corresponding record in the `Storage` registry by setting the `root` field to the new location.
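As a sketch of these two steps, assuming hypothetical local paths (the `Storage` registry update in the comments is an assumption — verify the exact query API against your LaminDB version):

```python
import shutil
from pathlib import Path

# Hypothetical old and new storage locations.
old_root = Path("./old-storage")
new_root = Path("./new-storage")

# Simulate an existing storage location containing a registered file.
(old_root / "data").mkdir(parents=True, exist_ok=True)
(old_root / "data" / "measurements.csv").write_text("a,b\n1,2\n")

# Step 1: copy (or move) the files into the new storage location.
shutil.copytree(old_root, new_root, dirs_exist_ok=True)

# Step 2: point the corresponding Storage record at the new root (sketch only):
#   import lamindb as ln
#   storage_record = ln.Storage.filter(root=str(old_root.resolve())).one()
#   storage_record.root = str(new_root.resolve())
#   storage_record.save()
```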
When should I pass `key` and when should I rely purely on metadata to register a file?
The recommended way of making files findable in LaminDB is to link them to labels and use the `description` field of `File`.
When you’re registering existing data, however, they’ll often come with a semantic `key` (the relative path within the storage location).
Will I never be able to find my file if I don’t give it a description?
You can’t create files that have none of `description`, `key`, and `run` set.
Hence, you will always be able to find your file through one of these fields or through additional metadata.
What should I do if I have a local file and want to upload it to S3?
You can either create a file object from the local file and auto-upload it to the cloud during `file.save()`:
file = ln.File(local_filepath)
file.save() # this will upload to the cloud
You can also create a file object from an existing cloud path:
file = ln.File("s3://my-bucket/my-file.csv")
file.save() # this will only save metadata as the file is already in registered storage
This enables you to use any tool to move data into the cloud.
How to replace a file in storage?
file.replace(new_data)
How to update metadata of a file?
You can edit a file’s metadata by querying the record and then resetting its attributes. For instance:
file.description = "My new description"
file.save() # save the change to the database
What should I do if I accidentally delete a file from storage?
The clean way to delete a file in LaminDB is via `ln.delete(file)`, which will:
- always delete the metadata record
- prompt to ask whether you want to delete the file from storage
If you delete a file from storage outside of LaminDB, you are left with a file record without valid storage. In this case, you can:
- use `ln.delete()` to delete the file record from the database
- alternatively, if you’d like to keep the record, link the storage back via `file.stage()`
How do I version a file?
You use the `is_new_version_of` parameter:
new_file = ln.File(df, is_new_version_of=old_file)
Then, `new_file` automatically has the `version` field set, incrementing the version number by one.
You can also pass a custom version:
new_file = ln.File(df, version="1.1", is_new_version_of=old_file)
It doesn’t matter which old version of the file you pass; any old version works!
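For intuition, the automatic increment behaves roughly like the following sketch. This is a simplification for illustration, not LaminDB’s actual implementation:

```python
def bump_version(version: str) -> str:
    """Increment an integer-like version string by one (illustrative sketch only)."""
    try:
        return str(int(version) + 1)
    except ValueError:
        raise ValueError(
            f"cannot auto-increment non-integer version {version!r}; "
            "pass an explicit version instead"
        )

print(bump_version("1"))  # prints 2
```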
How to set up a public read-only instance on an S3 bucket?
For a public read-only instance, the bucket needs certain policies configured: it should grant the `s3:GetObject` and `s3:ListBucket` permissions (see the AWS documentation on S3 bucket policies). An example policy is given below:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::your-bucket-name/*"
        },
        {
            "Sid": "AllowList",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::your-bucket-name"
        }
    ]
}
Change `your-bucket-name` above to the name of your S3 bucket.
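If you prefer to construct and apply the policy programmatically, here is a sketch using Python’s `json` module. The commented `put_bucket_policy` call assumes boto3 is configured with credentials that are allowed to manage the bucket:

```python
import json

bucket = "your-bucket-name"  # replace with your S3 bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        },
        {
            "Sid": "AllowList",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{bucket}",
        },
    ],
}

policy_json = json.dumps(policy, indent=4)

# To apply it with boto3 (assuming configured admin credentials):
#   import boto3
#   boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=policy_json)
```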