Convert a number of files to a single array store
In the previous notebooks, we’ve seen how to incrementally create a dataset and train models on it.
Once we have a dataset of validated files, we might want to combine them into one big array store.
This is what the CELLxGENE team did for the data in the CELLxGENE portal: a large number of h5ad files were concatenated into a single array store.
This requires duplicating the data that’s present in a collection of .h5ad files, but it has the advantage that one can now query slices by arbitrary metadata across the whole store, rather than only file by file.
See how this looks for CELLxGENE here: CELLxGENE: scRNA-seq.
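To make the trade-off concrete, here is a minimal, library-free sketch of the idea. It is not how CELLxGENE or any particular array store is implemented: the file names, metadata fields, and the `concatenate` and `query` helpers are all hypothetical, and plain Python lists stand in for the on-disk arrays. In practice, one would typically read the files with a library such as `anndata` and write to a real array store.

```python
# Toy stand-ins for validated .h5ad files: each holds per-cell metadata
# ("obs") and one row of expression counts per cell ("X").
# All names and values here are hypothetical.
files = {
    "batch1.h5ad": {
        "obs": [
            {"cell_type": "T cell", "tissue": "blood"},
            {"cell_type": "B cell", "tissue": "blood"},
        ],
        "X": [[1, 0, 3], [0, 2, 0]],
    },
    "batch2.h5ad": {
        "obs": [
            {"cell_type": "T cell", "tissue": "lung"},
            {"cell_type": "macrophage", "tissue": "lung"},
        ],
        "X": [[4, 1, 0], [0, 0, 5]],
    },
}


def concatenate(files):
    """Copy every file's rows into one store, recording the source file as metadata."""
    store = {"obs": [], "X": []}
    for name, content in files.items():
        for meta, row in zip(content["obs"], content["X"]):
            store["obs"].append({**meta, "source_file": name})
            store["X"].append(list(row))  # the data is duplicated, not referenced
    return store


def query(store, **conditions):
    """Return the rows of X whose metadata matches all given conditions."""
    return [
        row
        for meta, row in zip(store["obs"], store["X"])
        if all(meta.get(key) == value for key, value in conditions.items())
    ]


store = concatenate(files)

# One query now spans cells that originally lived in different files.
t_cells = query(store, cell_type="T cell")
```

The single store costs extra storage because every row is copied, but a slice such as `query(store, cell_type="T cell")` no longer depends on knowing which file each cell came from.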