CHANGELOG
As of November 2023, the CHANGELOG is a feature of our documentation we’ll use to report and summarize changes to downloads from the ScPCA Portal.
You can find more information about how and when your download was prepared in the following places:
The date your download was packaged (Generated on: {date}) is included at the top of the README in your download.
The version of the AlexsLemonade/scpca-nf pipeline used to process data in your download is included in the workflow_version column of the single_cell_metadata.tsv, bulk_metadata.tsv, or spatial_metadata.tsv file in your download. For more information about AlexsLemonade/scpca-nf versions, please see the releases page on GitHub.
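As a minimal sketch of checking which pipeline version(s) processed a download, the workflow_version column can be read from a metadata table with Python's standard csv module. The function name, the library column, and the example values below are made up for illustration; only the workflow_version column name comes from the text above.

```python
import csv
import io


def workflow_versions(tsv_text):
    """Return the sorted, distinct workflow_version values in a metadata table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Skip rows where the column is missing or empty
    return sorted({row["workflow_version"] for row in reader if row.get("workflow_version")})


# Hypothetical two-row table standing in for a real metadata file;
# the "library" column and both version strings are placeholders.
example = (
    "library\tworkflow_version\n"
    "library_a\tvX.Y.Z\n"
    "library_b\tvX.Y.Z\n"
)
print(workflow_versions(example))
```

For a real download, the file contents can be passed in with `Path(...).read_text()`; more than one value in the result would suggest the libraries were processed with different pipeline versions.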
2026.02.09
Consensus cell types have been updated to fix a minor bug in assigning labels. Previously, if two out of three methods (SingleR, CellAssign, and SCimilarity) agreed but the third method was unable to classify a cell, the consensus cell type was incorrectly classified as Unknown. The consensus cell types have been updated so that if two out of three methods agree, the consensus cell type is assigned the appropriate label using an ontology-based approach, regardless of whether the third method is able to classify the cell.
See the documentation on cell type annotation for more information on how consensus cell types are assigned.
See the single-cell gene expression file contents page for more information on where to find cell type annotations in the processed objects.
The infercnv_success status previously reported in the metadata of SingleCellExperiment objects and uns of AnnData objects has been renamed as infercnv_status and now contains a string instead of a boolean value. For more information on the contents of this value, see the single-cell gene expression file contents page.
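The corrected two-out-of-three rule for consensus cell types can be illustrated with a simplified sketch. This is not the pipeline's implementation: it treats agreement as an exact string match, whereas the actual approach is ontology-based, and it assumes that None marks a method that could not classify a cell and that Unknown is returned when fewer than two methods agree.

```python
from collections import Counter

# Assumption: None marks a method that was unable to classify the cell.
UNCLASSIFIED = None


def consensus_label(singler, cellassign, scimilarity):
    """Return the label shared by at least two methods, else "Unknown".

    Simplification: labels must match exactly. The real consensus uses an
    ontology-based comparison, which this sketch does not attempt.
    """
    labels = [
        label
        for label in (singler, cellassign, scimilarity)
        if label is not UNCLASSIFIED
    ]
    if not labels:
        return "Unknown"
    label, count = Counter(labels).most_common(1)[0]
    # An unclassified third method no longer blocks the consensus
    return label if count >= 2 else "Unknown"


# Two methods agree and the third could not classify the cell
print(consensus_label("T cell", "T cell", UNCLASSIFIED))
```

The case shown in the last line is the one affected by the fix: the two agreeing labels determine the result even though the third method returned nothing.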
2026.01.08
Data on the Portal can now be downloaded in three different ways:
By selecting a single project to download.
By creating a custom dataset with a selection of projects and/or samples. Custom datasets are referred to as My Dataset on the Portal.
By choosing one of the Portal-wide download options.
Although the content of the data included in each download has not changed, the download file structures have changed. See the Downloadable Files page for more information.
A new section of the documentation describing possible download options has been added.
The previously named single_cell_metadata.tsv files that are included with each download have been renamed to single-cell_metadata.tsv.
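Scripts that work with downloads made both before and after this rename may need to handle either file name. The sketch below is one hedged way to do so; the function name is hypothetical, and because the exact location of the file within a download is not assumed here, it searches the directory recursively and returns the first match.

```python
from pathlib import Path

# New name first, then the name used by older downloads
METADATA_NAMES = ("single-cell_metadata.tsv", "single_cell_metadata.tsv")


def find_single_cell_metadata(download_dir):
    """Return the path to the single-cell metadata file, or None if absent."""
    root = Path(download_dir)
    for name in METADATA_NAMES:
        # Search recursively; sorted() keeps the choice deterministic
        matches = sorted(root.rglob(name))
        if matches:
            return matches[0]
    return None


# Example usage with a hypothetical directory name:
# metadata_path = find_single_cell_metadata("my_scpca_download")
```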
2025.12.04
All data on the Portal has been updated to include a number of new features.
Doublet detection was run on all samples using scDblFinder. No doublets were filtered, but the results from scDblFinder are present in the filtered and processed objects.
Updated cell type annotations
All samples include cell type annotations obtained from SCimilarity, in addition to the existing cell type annotations from SingleR and CellAssign.
Consensus cell types have been updated to incorporate SCimilarity results. If two of the three automated methods agree using an ontology-based approach, a consensus cell type is assigned.
See our documentation on cell type annotation for more information on these updated cell types.
Cell types were annotated as part of the ongoing OpenScPCA project for SCPCP000004 (Neuroblastoma) and SCPCP000015 (Ewing sarcoma). These cell types are now included in all objects for those samples.
For more information, see our documentation on OpenScPCA cell types.
CNV inference was performed using InferCNV on all samples with at least 100 non-malignant reference cells, as identified by the consensus cell types.
See our documentation on CNV inference.
For more information on where to find the inferCNV results in the downloaded objects see the single-cell gene expression file contents page and the merged object file contents page.
In addition to these new features, data from the Portal can now be downloaded programmatically using the new ScPCAr package.
See an example in our documentation.
2025.07.25
Previously, the cell_id column in the cell metadata for merged objects was incorrectly formatted.
This has now been fixed so that all merged objects have the cell_id formatted as <library_id>-<barcode>.
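The <library_id>-<barcode> format can be built and split with a few lines of Python. This is a sketch under one stated assumption: library IDs contain no hyphen, so the first hyphen is the separator. Barcodes may contain hyphens of their own, which is why the split happens only once. The function names and example values are hypothetical.

```python
def make_cell_id(library_id, barcode):
    """Join a library ID and a barcode as <library_id>-<barcode>."""
    return f"{library_id}-{barcode}"


def split_cell_id(cell_id):
    """Split a cell_id into (library_id, barcode) at the first hyphen.

    Assumption: library IDs never contain a hyphen, so any later
    hyphens are kept as part of the barcode.
    """
    library_id, _, barcode = cell_id.partition("-")
    return library_id, barcode


# Placeholder library ID and barcode for illustration
cell_id = make_cell_id("SCPCL999990", "ACGTACGTACGTACGT")
print(cell_id)
print(split_cell_id(cell_id))
```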
2025.04.24
Consensus cell type annotations are now available in all processed SingleCellExperiment and AnnData objects and merged objects.
The labels obtained from SingleR and CellAssign are used to assign an ontology-aware consensus cell type label.
See our documentation on cell type annotation to learn more about how consensus cell types are assigned.
See the single-cell gene expression file contents page and the merged object file contents page for more information on obtaining these cell type annotations from the downloaded objects.
All assays within a merged object are now saved as a sparse matrix (CsparseMatrix), whereas previously these assays were saved as a DelayedArray.
2024.11.14
Recent versions of the raw count matrices in SingleCellExperiment and AnnData objects were not rounded. The raw counts are now rounded to integer values in agreement with the documentation.
Some project-specific metadata columns were renamed for uniformity:
Columns previously labeled as mycn_status, or similar with a gene name prefix, now have the gene name capitalized (e.g., MYCN_status).
Columns that were previously labeled with sentence case are now all lower case.
2024.10.14
Some cell count-related fields in downloadable metadata have changed.
sample_cell_count_estimate has been removed from downloads for non-multiplexed samples.
A new field, demux_cell_count_estimate, which indicates the estimated number of cells from a sample for a given library, has been added to downloadable metadata for multiplexed samples. This replaces the sample_cell_estimate field, which has been removed.
Fusion nomenclature has been standardized using the double colon recommendation from the HUGO Gene Nomenclature Committee (Bruford et al. 2021) in downloadable metadata.
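In the double colon notation, the two gene symbols of a fusion are joined with "::" (for example, EWSR1::FLI1). The helpers below are an illustrative sketch with hypothetical function names. They deliberately do not convert older hyphen- or slash-separated names, since gene symbols can themselves contain hyphens and such a conversion would be ambiguous.

```python
FUSION_SEPARATOR = "::"


def fusion_name(first_gene, second_gene):
    """Join two gene symbols using the double colon notation."""
    return f"{first_gene}{FUSION_SEPARATOR}{second_gene}"


def fusion_partners(name):
    """Split a double colon fusion name into its gene symbols."""
    return tuple(name.split(FUSION_SEPARATOR))


print(fusion_name("EWSR1", "FLI1"))
print(fusion_partners("EWSR1::FLI1"))
```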
2024.10.01
You can now use a Copy Download Link button to get download links for projects on the ScPCA Portal. Please see the FAQ for more information.
2024.09.24
Metadata for all samples from all projects on the Portal can now be downloaded in a single tab-separated values file.
For more information on what to expect in the metadata file, see the metadata section of the Downloadable files page.
2024.08.13
A new column, age_timing, is now present in the sample metadata tables included with each download.
This column indicates if the age specified in the age column is the age at diagnosis (diagnosis), age at collection (collection), or unknown.
This will also be present in the metadata of the SingleCellExperiment and AnnData objects.
AnnData objects have been updated to improve compatibility with Scanpy.
PCA and UMAP embeddings are now stored as X_pca and X_umap (previously X_PCA and X_UMAP).
A new column has been added to the .var slot, highly_variable, indicating if the given gene can be found in the list of highly variable genes.
Parameters and variance weights associated with the PCA results are now available in .uns["pca"].
See Components of an AnnData object for more information.
Downloads now follow a new naming convention: {identifier}_{modality}_{file format}_{date}.zip
For example, a sample (SCPCS999990) downloaded on 2024-08-13 in AnnData format will be named: SCPCS999990_SINGLE-CELL_ANN-DATA_2024-08-13.zip
See the Downloadable files page for more information.
Bulk RNA-seq files now follow a new naming convention: {project accession identifier}_bulk_quant.tsv and {project accession identifier}_bulk_metadata.tsv. See the Downloadable files page for more information.
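The download naming convention described above, {identifier}_{modality}_{file format}_{date}.zip, can be sketched as a pair of helpers that build and parse a name. The function names are hypothetical, and the parser assumes that none of the four fields contains an underscore, which holds for the example tokens used in this entry but is not guaranteed by the text.

```python
from datetime import date


def download_name(identifier, modality, file_format, packaged_on):
    """Build a name following {identifier}_{modality}_{file format}_{date}.zip."""
    return f"{identifier}_{modality}_{file_format}_{packaged_on.isoformat()}.zip"


def parse_download_name(name):
    """Split a download name into its four fields.

    Assumption: no field contains an underscore.
    """
    if not name.endswith(".zip"):
        raise ValueError(f"not a .zip download name: {name}")
    parts = name[: -len(".zip")].split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields: {name}")
    identifier, modality, file_format, day = parts
    return {
        "identifier": identifier,
        "modality": modality,
        "file_format": file_format,
        "date": date.fromisoformat(day),
    }


name = download_name("SCPCS999990", "SINGLE-CELL", "ANN-DATA", date(2024, 8, 13))
print(name)
print(parse_download_name(name))
```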
2024.08.01
A table containing sample metadata (e.g., age, sex, diagnosis) is now available in both the QC report (qc.html) and the supplemental cell type report (celltype-report.html) included in all downloads.
2024.06.20
Metadata for all samples in a specified project can now be downloaded as a tab-separated values file.
This allows users to download and view all sample, library, project, and processing-related metadata for all samples in a project without having to download all data in a project.
Metadata is also included with each data download.
For more information on what to expect in the metadata files, see the metadata section of the Downloadable files page.
2024.04.26
AnnData objects are stored in files that now have the extension .h5ad instead of .hdf5.
A preprint describing the ScPCA Portal and the pipeline used to process data is now available on bioRxiv (DOI: 10.1101/2024.04.19.590243). Please cite this preprint when citing the ScPCA Portal (see How to Cite for more information).
2024.04.18
When downloading data for an entire project, you have the option to download a single file with a single merged object containing all gene expression and metadata for all samples in that project.
These merged objects are available as either SingleCellExperiment (.rds files) or AnnData (.hdf5 files) objects.
This option is available for most projects. If the project you are interested in does not have this option, see our FAQ on which projects can be downloaded as merged objects.
Merged project downloads will contain a brief summary report about the merged object as well as the individual QC and cell type annotation reports for all libraries in the merged object.
See our documentation on how merged objects were created and our FAQ about merged objects for more information.
2024.04.11
Cell type annotations are now included in each download. Cells were annotated using both SingleR and CellAssign.
You can find more information about how cell types were annotated in the cell type annotation procedures section on the Processing Information page. For more information on locating cell type annotations and any associated processing information in the downloaded objects, see the Single-cell gene expression file contents page.
Downloads also contain a separate cell type report providing more information about cell type annotations, including comparisons between different cell type annotations and diagnostic assessments of cell type annotation reliability.
Sample metadata now includes two additional pieces of information which can be used to filter datasets: Whether the given sample is a patient-derived xenograft, and whether the sample is derived from a cell line.
Processed SingleCellExperiment objects no longer include the full miQC result object in their metadata, but the miQC object is still available in the filtered SingleCellExperiment objects.
Download files with SingleCellExperiment objects now use bz2 compression. This means the file sizes will be much smaller, but you may notice slower read times when loading them into R.
This release additionally includes community-contributed projects. Community-contributed projects are 10x Genomics single-cell or single-nuclei datasets that have been processed with the ScPCA pipeline. Please refer to the contributions page for more information about community contributions.
2024.03.08
Downloads for most projects are now available in AnnData format as HDF5 files. Multiplexed samples are not yet supported.
The sample metadata found in single_cell_metadata.tsv has been updated to include ontology term ids for age, sex, organism, ethnicity, diagnosis, and tissue location, when available. See the section describing Metadata on the Downloadable Files page.
All samples now have an assigned participant_id, which can be found in single_cell_metadata.tsv. Previously, a participant_id was only assigned when multiple samples mapped to the same participant for most projects.
All data files now include both the gene expression data and metadata for each sample (e.g., age, sex, organism, ethnicity, diagnosis, and tissue location). For more information on the contents of the data files, see the Single-cell gene expression file contents page.
Data files will include cell type annotations provided by submitters when applicable.
2023.11.10
The README included in your download now contains the following:
More information about how to cite the ScPCA Portal (see also: How to Cite).
The date your download was packaged (Generated on: {date}) at the top of the file.
The root directory of your download will contain the date you accessed and downloaded data from the ScPCA Portal when uncompressed.