lamindb.FeatureSet
- class lamindb.FeatureSet(features: Iterable[Registry], type: Optional[Union[Type, str]] = None, name: Optional[str] = None)
Bases: Registry
Jointly measured sets of features.
See also
from_values()
Create from values.
from_df()
Create from dataframe columns.
Note
Feature sets are useful because millions of data batches may measure the same features, and all of them can then link against one feature set. If you instead linked each batch against single features (say, genes), the link tables would explode in size.
A feature_set is identified by the hash of feature values.
- Parameters:
features – Iterable[Registry]
An iterable of Feature records to hash, e.g., [Feature(...), Feature(...)]. Is turned into a set upon instantiation. If you’d like to pass values, use from_values() or from_df().
type – Optional[Union[Type, str]] = None
The simple type. Defaults to None if reference Registry is Feature, defaults to "float" otherwise.
name – Optional[str] = None
A name.
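The note above argues that linking batches to one shared feature set avoids exploding link tables. The following back-of-the-envelope sketch makes the difference concrete. The batch and feature counts are made-up numbers chosen only for illustration, and the row-count model (one row per link) is a simplifying assumption rather than a description of the actual database schema.

```python
# Hypothetical sizes, chosen only for illustration.
n_batches = 1_000_000   # data batches that all measure the same features
n_features = 20_000     # features (e.g., genes) measured in each batch

# Linking every batch to every single feature:
# one link row per (batch, feature) pair.
rows_single_feature_links = n_batches * n_features

# Linking every batch to one shared feature set:
# one link row per batch, plus one row per member of the set.
rows_feature_set_links = n_batches + n_features

print(rows_single_feature_links)  # 20000000000
print(rows_feature_set_links)     # 1020000
```

Under these assumed numbers, the shared feature set needs roughly 20,000 times fewer link rows.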
Examples
>>> import pandas as pd
>>> import lamindb as ln
>>> df = pd.DataFrame({"feat1": [1, 2], "feat2": [3.1, 4.2], "feat3": ["cond1", "cond2"]})
>>> feature_set = ln.FeatureSet.from_df(df)
>>> features = ln.Feature.from_values(["feat1", "feat2"], type=float)
>>> ln.FeatureSet(features)
>>> import lnschema_bionty as bt
>>> reference = bt.Gene(organism="mouse")
>>> # adata is assumed to be an AnnData object loaded beforehand
>>> feature_set = ln.FeatureSet.from_values(adata.var["ensemble_id"], bt.Gene.ensembl_gene_id)
>>> feature_set.save()
>>> file = ln.File(adata, name="Mouse Lymph Node scRNA-seq")
>>> file.save()
>>> file.features.add_feature_set(feature_set, slot="var")
Properties
- members
A queryset for the individual records of the set.
Fields
- id AutoField
Internal id, valid only in one DB instance.
- uid CharField
A universal id (hash of the set of feature values).
- name CharField
A name (optional).
- n IntegerField
Number of features in the set.
- type CharField
Simple type, e.g., “str”, “int” (optional). Is None for Feature. For Feature, types are expected to be heterogeneous and defined on a per-feature level.
- registry CharField
The registry that stores and validates the feature identifiers, e.g., 'core.Feature' or 'bionty.Gene'.
- hash CharField
The hash of the set.
- created_at DateTimeField
Time of creation of record.
- updated_at DateTimeField
Time of last update to record.
- created_by ForeignKey
Creator of record, a User.
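The uid and hash fields above are described as hashes of the set of feature values, and the constructor turns its input into a set. The sketch below illustrates why such an identifier does not depend on the order or repetition of the values. The helper name `hash_feature_values`, the choice of MD5, and the newline separator are all assumptions made for illustration; this is not lamindb's actual hashing code.

```python
import hashlib


def hash_feature_values(values):
    """Return an order-independent digest for a collection of feature values.

    Hypothetical helper for illustration only; not lamindb's implementation.
    """
    # Deduplicate and sort so that neither order nor repeats affect the result.
    canonical = sorted(set(values))
    joined = "\n".join(canonical).encode("utf-8")
    return hashlib.md5(joined).hexdigest()


a = hash_feature_values(["feat1", "feat2", "feat3"])
b = hash_feature_values(["feat3", "feat1", "feat2", "feat1"])  # reordered, one repeat
c = hash_feature_values(["feat1", "feat2"])                    # a different set

print(a == b)  # True: same set of values, same identifier
print(a == c)  # False: different set, different identifier
```

Because the identifier is derived from the content of the set, two batches measuring the same features can resolve to the same feature set record.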
Methods
- classmethod from_df(df, field=FieldAttr(Feature.name), name=None, **kwargs)
Create feature set for validated features.
- Return type:
Optional[FeatureSet]
- classmethod from_values(values, field=FieldAttr(Feature.name), type=None, name=None, **kwargs)
Create feature set for validated features.
- Parameters:
values (TypeVar(ListLike, list, pd.Series, np.array)) – A list of values, like feature names or ids.
field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a reference Registry to map values.
type (Union[Type, str, None], default: None) – The simple type. Defaults to None if reference registry is Feature, defaults to “float” otherwise.
name (Optional[str], default: None) – A name.
**kwargs – Can contain organism or other context to interpret values.
- Return type:
Optional[FeatureSet]
Examples
>>> features = ["feat1", "feat2"]
>>> feature_set = ln.FeatureSet.from_values(features)
>>> import lnschema_bionty as lb
>>> genes = ["ENS980983409", "ENS980983410"]
>>> feature_set = ln.FeatureSet.from_values(genes, lb.Gene.ensembl_gene_id, float)
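The type parameter is documented as defaulting to None when the reference registry is Feature and to "float" otherwise. A minimal sketch of that rule follows. The function `resolve_type` is a hypothetical stand-in, not part of lamindb. The registry strings are taken from the registry field description on this page, and converting a Python type to its name is an assumption about how a simple type might be stored.

```python
from typing import Optional, Union


def resolve_type(
    registry_name: str, type_: Optional[Union[type, str]] = None
) -> Optional[str]:
    """Illustrate the documented default for the simple type.

    Hypothetical helper; lamindb's internal logic may differ.
    """
    if type_ is not None:
        # Assumption: a Python type such as float is stored by its name.
        return type_ if isinstance(type_, str) else type_.__name__
    # Documented rule: None for Feature, "float" for any other registry.
    return None if registry_name == "core.Feature" else "float"


print(resolve_type("core.Feature"))      # None
print(resolve_type("bionty.Gene"))       # float
print(resolve_type("bionty.Gene", int))  # int
```

The None default for Feature matches the statement in the Fields section that types for Feature are defined on a per-feature level.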
- save(*args, **kwargs)
Save.
- Return type:
None