lamindb.Feature

class lamindb.Feature(name: str, type: str, unit: str | None, description: str | None, synonyms: str | None)

Bases: Registry, CanValidate

Numerical and categorical random variables.

A feature denotes a random variable, or, equivalently, a “measured dimension”.

A feature that you’d want to track with LaminDB is almost always a column storing observations of numbers or categories in a table or array.

The Feature registry manages the metadata of these columns, most importantly the column name and the data type. It helps validate column names and annotate datasets by whether they measured a feature.
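To make the idea of validating column names concrete, here is a minimal plain-Python sketch. It does not call LaminDB: `registered_names` and `validate_columns` are hypothetical stand-ins for the names stored in the Feature registry and for a validation step, not the library's implementation.

```python
# Hypothetical stand-in for feature names already stored in a registry.
registered_names = {"feat1", "feat2", "feat3"}


def validate_columns(columns, known):
    """Map each column name to True if it is registered, else False."""
    return {column: column in known for column in columns}


# "feat4" is not registered, so it is flagged as not validated.
result = validate_columns(["feat1", "feat2", "feat4"], registered_names)
print(result)
```

In this sketch, a dataset would only be annotated with the columns that map to `True`.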

See also

from_df()

Create feature records from DataFrame.

features

Feature manager of an artifact or collection.

ULabel

Universal labels.

FeatureSet

Feature sets.

Parameters:
  • name (str): Name of the feature, typically a column name.

  • type (str): Simple type ("number", "category", "datetime"), the equivalent of dtype in numpy.

  • unit (str | None, default None): Unit of measure, ideally SI ("m", "s", "kg", etc.) or "normalized" etc.

  • description (str | None, default None): A description.

  • synonyms (str | None, default None): Bar-separated synonyms.

  • registries (str | None, default None): Bar-separated registries that provide values for categories.
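The bar-separated format used for synonyms and registries can be illustrated with a short plain-Python sketch. The helpers `split_bar_separated` and `join_bar_separated` and the example synonyms are hypothetical; they only show the string convention and are not part of LaminDB.

```python
def split_bar_separated(value):
    """Split a bar-separated string into a list of names; None gives []."""
    if value is None:
        return []
    return [part.strip() for part in value.split("|") if part.strip()]


def join_bar_separated(names):
    """Join names into a single bar-separated string."""
    return "|".join(names)


# Hypothetical synonyms for a feature named "treatment".
synonyms = join_bar_separated(["perturbation", "condition"])
parsed = split_bar_separated(synonyms)
print(synonyms)
print(parsed)
```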

Note

Features and labels denote two ways of using entities to organize data:

  1. A feature qualifies what is measured (a numerical or categorical random variable)

  2. A label is a measured value (a category)

If reshaping data introduces ambiguity, ask yourself what the joint measurement was: a feature qualifies a variable in a joint measurement.
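The distinction can be sketched on a small table in plain Python. The table contents and variable names below are hypothetical and chosen only for illustration: column names play the role of features, and the distinct values of a categorical column play the role of labels.

```python
# Hypothetical table: each key is a column name, each list holds observations.
table = {
    "cell_type": ["T cell", "B cell", "T cell"],  # categorical column
    "weight": [1.2, 3.4, 2.2],  # numerical column
}

# Features qualify what is measured: here, the column names.
features = list(table)

# Labels are measured values of categorical features:
# here, the distinct strings observed in a column.
labels = {
    name: sorted(set(values))
    for name, values in table.items()
    if all(isinstance(value, str) for value in values)
}

print(features)
print(labels)
```

Note that the numerical column `weight` is a feature but contributes no labels, because its values are numbers rather than categories.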

Notes

For more control, you can use bionty registries to manage common basic biological entities like genes, proteins & cell markers involved in expression/count measurements.

Similarly, you can define custom registries to manage high-level derived features like gene sets, malignancy, etc.

Examples

>>> import lamindb as ln
>>> import pandas as pd
>>> df = pd.DataFrame({"feat1": [1, 2], "feat2": [3.1, 4.2], "feat3": ["cond1", "cond2"]})
>>> features = ln.Feature.from_df(df)
>>> features.save()
>>> # the information from the DataFrame is now available in the Feature table
>>> ln.Feature.filter().df()
id    name    type
 a   feat1     int
 b   feat2   float
 c   feat3     str

Fields

id AutoField

Internal id, valid only in one DB instance.

uid CharField

Universal id, valid across DB instances.

name CharField

Name of feature (required).

type CharField

Simple type.

If “category”, consider managing categories with ULabel or another Registry for managing labels.

unit CharField

Unit of measure, ideally SI (m, s, kg, etc.) or ‘normalized’ etc. (optional).

description TextField

A description.

registries CharField

Registries that provide values for labels, bar-separated (|) (optional).

synonyms TextField

Bar-separated (|) synonyms (optional).

created_at DateTimeField

Time of creation of record.

updated_at DateTimeField

Time of last update to record.

created_by ForeignKey

Creator of record, a User.

feature_sets ManyToManyField

Feature sets linked to this feature.

Methods

classmethod from_df(df, field=None)

Create Feature records for the columns of a DataFrame.

Return type:

RecordsList

save(*args, **kwargs)

Save.

Return type:

None