lamindb.FeatureSet

class lamindb.FeatureSet(features: Iterable[Registry], type: str | None = None, name: str | None = None)

Bases: Registry

Feature sets.

Stores references to sets of Feature records and to records of other registries that may be used to identify features (e.g., bionty.Gene or bionty.Protein).

See also

from_values()

Create from values.

from_df()

Create from dataframe columns.

Note

Feature sets are useful because you likely have many datasets that measure the same features. In LaminDB, all of these datasets are linked to the exact same feature set. If you instead linked each dataset to individual features (say, genes), the link tables would explode in size.
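To make the link-table argument concrete, here is a back-of-the-envelope count. The dataset and feature numbers are made up, and the counting model (one row per dataset-feature pair, versus one membership row per feature plus one link row per dataset) is a simplifying assumption, not a description of LaminDB's actual schema.

```python
def rows_with_direct_links(n_datasets: int, n_features: int) -> int:
    """Link rows needed if every dataset links to every individual feature."""
    return n_datasets * n_features


def rows_with_shared_feature_set(n_datasets: int, n_features: int) -> int:
    """Link rows needed if all datasets link to one shared feature set.

    One membership row per feature in the set, plus one link row per dataset.
    """
    return n_features + n_datasets


# Hypothetical numbers: 1,000 datasets that all measure the same 20,000 genes.
n_datasets, n_features = 1_000, 20_000
direct = rows_with_direct_links(n_datasets, n_features)
shared = rows_with_shared_feature_set(n_datasets, n_features)
print(f"direct links: {direct:,} rows; shared feature set: {shared:,} rows")
```

Under this model the direct approach grows with the product of the two counts, while the shared feature set grows with their sum.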

A feature set is identified by the hash of the feature uids in the set.
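A minimal sketch of what "identified by the hash of the feature uids" can mean in practice. The hash function (SHA-256), the separator, and the uid strings are assumptions chosen for illustration; LaminDB's real hashing may differ. The point is only that a set-based hash ignores order and duplicates.

```python
import hashlib
from typing import Iterable


def feature_set_hash(uids: Iterable[str]) -> str:
    """Illustrative order- and duplicate-insensitive hash over feature uids."""
    unique_sorted = sorted(set(uids))   # set semantics: drop duplicates, fix order
    joined = "\n".join(unique_sorted)   # separator is an arbitrary choice
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical uids; the same members in a different order give the same identity.
hash_a = feature_set_hash(["uid-b", "uid-a", "uid-c"])
hash_b = feature_set_hash(["uid-c", "uid-a", "uid-b", "uid-a"])
hash_c = feature_set_hash(["uid-a", "uid-b"])

print(hash_a == hash_b)  # same members
print(hash_a == hash_c)  # different members
```

With an identity of this kind, two datasets measuring the same features resolve to one feature set record rather than two.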

Parameters:
  • features (Iterable[Registry]) – An iterable of Feature records to hash, e.g., [Feature(...), Feature(...)]. It is turned into a set upon instantiation. To pass values instead, use from_values() or from_df().

  • type (str | None, default: None) – The simple type. Defaults to None for sets of Feature records, and otherwise defaults to "number" (e.g., for sets of Gene).

  • name (str | None, default: None) – A name.

Examples

Create a feature set from a dataframe with typed columns:

>>> import pandas as pd
>>> import lamindb as ln
>>> df = pd.DataFrame({"feat1": [1, 2], "feat2": [3.1, 4.2], "feat3": ["cond1", "cond2"]})
>>> feature_set = ln.FeatureSet.from_df(df)

Create a feature set from features:

>>> features = ln.Feature.from_values(["feat1", "feat2"], type=float)
>>> feature_set = ln.FeatureSet(features)

Create a feature set from feature values:

>>> import bionty as bt
>>> feature_set = ln.FeatureSet.from_values(adata.var["ensemble_id"], bt.Gene.ensembl_gene_id, organism="mouse")
>>> feature_set.save()

Link a feature set to an artifact:

>>> artifact.features.add_feature_set(feature_set, slot="var")

Link features to an artifact (this creates a feature set under the hood):

>>> artifact.features.add(features)

Properties

members

A queryset for the individual records of the set.

Fields

id AutoField

Internal id, valid only in one DB instance.

uid CharField

A universal id (hash of the set of feature values).

name CharField

A name (optional).

n IntegerField

Number of features in the set.

type CharField

Simple type, e.g., “str” or “int” (optional). Is None for sets of Feature records.

For Feature, types are expected to be heterogeneous and are defined on a per-feature level.

registry CharField

The registry that stores and validates the feature identifiers, e.g., 'core.Feature' or 'bt.Gene'.

hash CharField

The hash of the set.

created_at DateTimeField

Time of creation of record.

updated_at DateTimeField

Time of last update to record.

created_by ForeignKey

Creator of record, a User.

Methods

classmethod from_df(df, field=FieldAttr(Feature.name), name=None, mute=False, organism=None, public_source=None)

Create a feature set for validated features.

Return type:

Optional[FeatureSet]

classmethod from_values(values, field=FieldAttr(Feature.name), type=None, name=None, mute=False, organism=None, public_source=None)

Create a feature set for validated features.

Parameters:
  • values (Union[List[str], Series, array]) – A list of values, like feature names or ids.

  • field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a reference Registry to map values.

  • type (Optional[str], default: None) – The simple type. Defaults to None if reference registry is Feature, defaults to “float” otherwise.

  • name (Optional[str], default: None) – A name.

  • **kwargs – Can contain organism or other context to interpret values.

Return type:

Optional[FeatureSet]

Examples

>>> import lamindb as ln
>>> import bionty as bt
>>> features = ["feat1", "feat2"]
>>> feature_set = ln.FeatureSet.from_values(features)
>>> genes = ["ENS980983409", "ENS980983410"]
>>> feature_set = ln.FeatureSet.from_values(genes, bt.Gene.ensembl_gene_id, float)
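Because the documented return type is Optional[FeatureSet], calling code may want to guard against None before saving. The sketch below shows one way to do that. The helper save_if_present and the stand-in class FakeFeatureSet are hypothetical names introduced here so the snippet runs without a LaminDB instance; the reference does not state under which conditions None is returned.

```python
from typing import Optional, Protocol


class Savable(Protocol):
    """Anything with a save() method, such as a feature set record."""

    def save(self) -> None: ...


def save_if_present(feature_set: Optional[Savable]) -> bool:
    """Save the record if one was created; report whether a save happened."""
    if feature_set is None:
        return False
    feature_set.save()
    return True


class FakeFeatureSet:
    """Hypothetical stand-in for a real feature set, used only for this demo."""

    def __init__(self) -> None:
        self.saved = False

    def save(self) -> None:
        self.saved = True


record = FakeFeatureSet()
print(save_if_present(record), record.saved)
print(save_if_present(None))
```

In real use, the argument would be the result of ln.FeatureSet.from_values(...) or ln.FeatureSet.from_df(...).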


save(*args, **kwargs)

Save.

Return type:

None