lamindb.Feature

class lamindb.Feature(name: str, type: str, unit: str | None = None, description: str | None = None, synonyms: str | None = None, registries: str | None = None)

Bases: Registry, CanValidate

Numerical and categorical random variables.

A feature denotes a random variable, or, equivalently, a “measured dimension”.

A feature that you’d want to track with LaminDB is almost always a column storing observations of numbers or categories in a table or array.

The Feature registry manages the metadata of these columns, most importantly the column name & the data type. It helps validate column names & annotate datasets with the features they measured.
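To make "validating column names" concrete, here is a minimal, stdlib-only sketch. It does not call LaminDB: `FeatureRecord` and `validate_columns` are hypothetical names that mimic a registry holding each feature's name and simple type, and check a table's column names against it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FeatureRecord:
    """Hypothetical stand-in for one registered feature (not lamindb's Feature class)."""

    name: str
    type: str
    unit: Optional[str] = None


def validate_columns(columns: List[str], registry: Dict[str, FeatureRecord]) -> Dict[str, bool]:
    """Map each column name to whether a feature with that name is registered."""
    return {column: column in registry for column in columns}


# A toy registry with two registered features, keyed by name.
registry = {
    record.name: record
    for record in [
        FeatureRecord(name="feat1", type="number"),
        FeatureRecord(name="feat2", type="number", unit="s"),
    ]
}

result = validate_columns(["feat1", "feat2", "feat3"], registry)
print(result)  # {'feat1': True, 'feat2': True, 'feat3': False}
```

Unregistered columns such as `feat3` show up as `False`, which is the signal to either register a new feature or fix the column name.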

See also

from_df(): Create feature records from a DataFrame.
features: Feature manager of an artifact or collection.
ULabel: Universal labels.
FeatureSet: Feature sets.

Parameters:

name – str Name of the feature, typically a column name.
type – str Simple type ("number", "category", "datetime"), analogous to a NumPy dtype.
unit – str | None = None Unit of measure, ideally SI ("m", "s", "kg", etc.) or "normalized" etc.
description – str | None = None A description.
synonyms – str | None = None Bar-separated (|) synonyms.
registries – str | None = None Bar-separated (|) registries that provide values for categories.
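`synonyms` and `registries` pack several values into a single string separated by `|`. The sketch below shows one way to encode and decode such strings in plain Python; `split_bar` and `join_bar` are hypothetical helpers, and the registry names in the example are illustrative strings, not necessarily the exact format LaminDB stores.

```python
from typing import List, Optional


def split_bar(value: Optional[str]) -> List[str]:
    """Split a bar-separated string into its parts; None or "" gives an empty list."""
    if not value:
        return []
    # Strip surrounding whitespace and drop empty segments, e.g. from "a||b".
    return [part.strip() for part in value.split("|") if part.strip()]


def join_bar(parts: List[str]) -> Optional[str]:
    """Join parts into a bar-separated string; an empty list gives None."""
    return "|".join(parts) if parts else None


synonyms = split_bar("cell type|celltype|cell_type")
print(synonyms)  # ['cell type', 'celltype', 'cell_type']

# Illustrative registry names only.
print(join_bar(["core.ULabel", "bionty.CellType"]))  # core.ULabel|bionty.CellType
```

Mapping an empty list to `None` mirrors the optional (`= None`) defaults of these parameters.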

Note

Features and labels denote two ways of using entities to organize data:

A feature qualifies what is measured (a numerical or categorical random variable)
A label is a measured value (a category)

If reshaping data has introduced ambiguity, ask yourself what the joint measurement was: a feature qualifies the variables in a joint measurement.
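As a plain-Python illustration of this distinction (no LaminDB calls; the data are made up): in a wide table, the column names are features and the values of a categorical column are labels. After melting the table into long format, the former column names appear as values, yet they still qualify what was jointly measured, so they remain features.

```python
from typing import Dict, List, Set

# Wide format: one row per sample; each column is a jointly measured variable.
wide: List[Dict[str, object]] = [
    {"sample": "s1", "cell_type": "T cell", "CD8": 5.2},
    {"sample": "s2", "cell_type": "B cell", "CD8": 0.1},
]

# Features qualify what is measured: the measured column names.
features = [column for column in wide[0] if column != "sample"]
# Labels are measured values of a categorical feature.
labels: Set[object] = {row["cell_type"] for row in wide}
print(features)        # ['cell_type', 'CD8']
print(sorted(labels))  # ['B cell', 'T cell']

# Long format: the former column names now sit in the "variable" column.
long = [
    {"sample": row["sample"], "variable": feature, "value": row[feature]}
    for row in wide
    for feature in features
]
# The entries of "variable" still name what was measured, so they are features, not labels.
print(sorted({row["variable"] for row in long}))  # ['CD8', 'cell_type']
```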

Notes

For more control, you can use bionty registries to manage common biological entities like genes, proteins & cell markers that are involved in expression or count measurements.

Similarly, you can define custom registries to manage high-level derived features like gene sets, malignancy, etc.

Examples

>>> import lamindb as ln
>>> import pandas as pd
>>> df = pd.DataFrame({"feat1": [1, 2], "feat2": [3.1, 4.2], "feat3": ["cond1", "cond2"]})
>>> features = ln.Feature.from_df(df)
>>> features.save()
>>> # the information from the DataFrame is now available in the Feature table
>>> ln.Feature.filter().df()
id    name    type
 a   feat1     int
 b   feat2   float
 c   feat3     str
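The `type` column in the example output above reflects the kind of values each column holds. The following stdlib-only sketch reproduces that mapping for a dict of columns; `infer_simple_types` is a hypothetical helper for illustration, not LaminDB's implementation of `Feature.from_df`, which operates on a pandas DataFrame.

```python
from typing import Dict, List


def infer_simple_types(columns: Dict[str, List[object]]) -> Dict[str, str]:
    """Hypothetical helper: name each column's type after its values' Python type.

    Raises ValueError if a column is empty or mixes value types.
    """
    types: Dict[str, str] = {}
    for name, values in columns.items():
        kinds = {type(value).__name__ for value in values}
        if len(kinds) != 1:
            raise ValueError(f"column {name!r} has no values or mixed types: {sorted(kinds)}")
        types[name] = kinds.pop()
    return types


# Same data as the DataFrame in the example.
data = {"feat1": [1, 2], "feat2": [3.1, 4.2], "feat3": ["cond1", "cond2"]}
print(infer_simple_types(data))  # {'feat1': 'int', 'feat2': 'float', 'feat3': 'str'}
```

Rejecting mixed-type columns keeps the sketch strict; a real DataFrame would instead report a single dtype (often `object`) for such a column.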

Fields

id AutoField: Internal id, valid only in one DB instance.

uid CharField: Universal id, valid across DB instances.

name CharField: Name of feature (required).

type CharField: Simple type. If “category”, consider managing categories with ULabel or another Registry for managing labels.

unit CharField: Unit of measure, ideally SI (m, s, kg, etc.) or ‘normalized’ etc. (optional).

description TextField: A description.

registries CharField: Registries that provide values for labels, bar-separated (|) (optional).

synonyms TextField: Bar-separated (|) synonyms (optional).

created_at DateTimeField: Time of creation of record.

updated_at DateTimeField: Time of last update to record.

created_by ForeignKey: Creator of record, a User.

feature_sets ManyToManyField: Feature sets linked to this feature.

Methods

classmethod from_df(df, field=None)

Create Feature records for the columns of a DataFrame.

Return type: RecordsList

save(*args, **kwargs)

Save the record.

Return type: None