lamindb.FeatureSet

class lamindb.FeatureSet(features: Iterable[Registry], type: str | None = None, name: str | None = None)

Bases: Registry

Jointly measured sets of features.

See also

from_values()

Create from values.

from_df()

Create from dataframe columns.

Note

Feature sets are useful because you might have thousands of datasets that measure the same features: all of them link against the same feature set. If you instead linked each dataset against single features (say, genes), the link tables would grow with the product of the number of datasets and the number of features.
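
The scaling argument above can be made concrete with a small back-of-the-envelope sketch. The dataset and feature counts are hypothetical, and the two helper functions are illustrative names, not part of lamindb.

```python
# Illustrative arithmetic only; the counts below are hypothetical.
def link_rows_per_feature(n_datasets: int, n_features: int) -> int:
    """Rows needed if every dataset links to every feature it measures."""
    return n_datasets * n_features


def link_rows_per_feature_set(n_datasets: int, n_features: int) -> int:
    """Rows needed if all datasets link to one shared feature set.

    The set links to its features once; each dataset links to the set once.
    """
    return n_features + n_datasets


n_datasets, n_features = 5_000, 20_000
print(link_rows_per_feature(n_datasets, n_features))      # 100000000
print(link_rows_per_feature_set(n_datasets, n_features))  # 25000
```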

A feature_set is identified by the hash of feature values.
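
As a minimal sketch of what identifying a set by the hash of its values can mean, the following computes a digest that ignores order and duplicates. The helper `hash_feature_values` and the choice of SHA-256 are assumptions made for illustration; lamindb's actual hashing scheme may differ.

```python
import hashlib
from typing import Iterable


def hash_feature_values(values: Iterable[str]) -> str:
    """Return an order-independent digest for a collection of feature values.

    Hypothetical helper: lamindb's real hashing scheme may differ.
    """
    # Deduplicate and sort so the digest depends only on set membership.
    canonical = sorted(set(values))
    # Join with a separator that is assumed not to occur inside any value.
    payload = "\n".join(canonical).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


a = hash_feature_values(["feat1", "feat2", "feat3"])
b = hash_feature_values(["feat3", "feat1", "feat2", "feat1"])  # reordered, one duplicate
c = hash_feature_values(["feat1", "feat2"])

print(a == b)  # True: same set of values
print(a == c)  # False: different set of values
```

Under this scheme, two datasets measuring the same features resolve to the same identifier regardless of column order, which is what lets them share a single feature set record.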

Parameters:
  • features (Iterable[Registry]) – An iterable of Feature records to hash, e.g., [Feature(...), Feature(...)]. Is turned into a set upon instantiation. If you’d like to pass values, use from_values() or from_df().

  • type (str | None, default: None) – The simple type. Defaults to None if reference Registry is Feature, defaults to "number" otherwise.

  • name (str | None, default: None) – A name.

Examples

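>>> import lamindb as ln
>>> import pandas as pd
>>> import bionty as bt
>>> # Create a feature set from the columns of a dataframe
>>> df = pd.DataFrame({"feat1": [1, 2], "feat2": [3.1, 4.2], "feat3": ["cond1", "cond2"]})
>>> feature_set = ln.FeatureSet.from_df(df)
>>> # Create a feature set from Feature records
>>> features = ln.Feature.from_values(["feat1", "feat2"], type=float)
>>> ln.FeatureSet(features)
>>> # Create a feature set from gene identifiers (adata is an existing AnnData object)
>>> reference = bt.Gene(organism="mouse")
>>> feature_set = ln.FeatureSet.from_values(adata.var["ensemble_id"], bt.Gene.ensembl_gene_id)
>>> feature_set.save()
>>> # Link the feature set to an artifact
>>> artifact = ln.Artifact(adata, name="Mouse Lymph Node scRNA-seq")
>>> artifact.save()
>>> artifact.features.add_feature_set(feature_set, slot="var")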

Properties

members

A queryset for the individual records of the set.

Fields

id AutoField

Internal id, valid only in one DB instance.

uid CharField

A universal id (hash of the set of feature values).

name CharField

A name (optional).

n IntegerField

Number of features in the set.

type CharField

Simple type, e.g., “str”, “int”. Is None for Feature (optional).

For Feature, types are expected to be heterogeneous and defined on a per-feature level.

registry CharField

The registry that stores and validates the feature identifiers, e.g., 'core.Feature' or 'bt.Gene'.

hash CharField

The hash of the set.

created_at DateTimeField

Time of creation of record.

updated_at DateTimeField

Time of last update to record.

created_by ForeignKey

Creator of record, a User.

Methods

classmethod from_df(df, field=FieldAttr(Feature.name), name=None, mute=False, organism=None, public_source=None)

Create feature set for validated features.

Return type:

Optional[FeatureSet]

classmethod from_values(values, field=FieldAttr(Feature.name), type=None, name=None, mute=False, organism=None, public_source=None)

Create feature set for validated features.

Parameters:
  • values (Union[List[str], Series, array]) – A list of values, like feature names or ids.

  • field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a reference Registry to map values.

  • type (Optional[str], default: None) – The simple type. Defaults to None if reference registry is Feature, defaults to “float” otherwise.

  • name (Optional[str], default: None) – A name.

  • **kwargs – Can contain organism or other context to interpret values.

Return type:

Optional[FeatureSet]

Examples

>>> import lamindb as ln
>>> import bionty as bt
>>> features = ["feat1", "feat2"]
>>> feature_set = ln.FeatureSet.from_values(features)
>>> genes = ["ENS980983409", "ENS980983410"]
>>> feature_set = ln.FeatureSet.from_values(genes, bt.Gene.ensembl_gene_id, float)
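
Because from_values() creates a feature set "for validated features" and returns Optional[FeatureSet], callers may want to handle a None result. The sketch below shows one plausible reading of that contract in plain Python: values are filtered against a reference collection, and None is returned when nothing validates. The function `validated_subset` and the `known` set are hypothetical stand-ins; the behavior of the real registry lookup is an assumption and is not confirmed here.

```python
from typing import Iterable, Optional


def validated_subset(values: Iterable[str], known: set[str]) -> Optional[frozenset[str]]:
    """Keep only the values found in a reference collection.

    Hypothetical stand-in for registry validation: returns None when
    nothing validates, mirroring an Optional return type.
    """
    validated = frozenset(v for v in values if v in known)
    # An empty frozenset is falsy, so this yields None when nothing matched.
    return validated or None


known_features = {"feat1", "feat2"}  # stands in for a registry's validated names

result = validated_subset(["feat1", "feat2", "typo"], known_features)
if result is None:
    print("no validated features, nothing to create")
else:
    print(len(result))  # 2: "typo" was dropped

print(validated_subset(["unknown"], known_features))  # None
```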


save(*args, **kwargs)

Save.

Return type:

None