TCGA data

Seven Bridges is committed to providing Platform users with the most up-to-date version of the TCGA legacy dataset available from the NCI Genomic Data Commons (GDC). In keeping with this commitment, the Platform transitioned from hosting the CGHub version of this dataset to the GDC Legacy Archive Data Release 11.0 version on July 10, 2018. As of that date, all files accessible via the Data Browser and the API correspond to Data Release 11.0. As of July 12, 2018, all files accessible via the Datasets API also correspond to Data Release 11.0.

Files that were added to individual projects before the transition and are no longer represented in the new dataset version are no longer accessible via those projects, but may be obtainable from the GDC archive by contacting the GDC Help Desk. Similarly, files that are no longer represented in Data Release 11.0 are no longer accessible through saved Data Browser queries, and affected queries return a result of '0'. In addition, because of a change in the way some files within this dataset are hosted, a small number of saved Data Browser queries for which the files are still available also return a '0' result. Such queries can be recreated using the Data Browser query-building canvas and will return the same results as before.

Please contact our Support Team at [email protected] if you have any questions. The Seven Bridges Team looks forward to continuing to collaborate with the GDC in the months ahead to ensure that new data releases for this dataset become available through the Platform in a timely manner.

The Cancer Genome Atlas (TCGA) is one of the richest and most complete genomics datasets and was compiled to understand the molecular basis of cancers. Data collection for TCGA began in 2006 as a joint effort by the National Cancer Institute (NCI), National Human Genome Research Institute (NHGRI), the National Institutes of Health (NIH), and the U.S. Department of Health and Human Services.

Over the past decade, TCGA has grown to contain data on 33 different tumor types and over 11,000 cases (patients). Between 50 and 1,500 cases have been sampled for each tumor type. For each case, multiple samples were analyzed, using microarray technology for genome characterization and next-generation sequencing technology for sequencing. TCGA data currently represents more than 2.5 petabytes of information and is expected to grow as new samples are processed.

For a full list of TCGA data available on the Platform, see the table below. The table lists data types and subtypes, the file format of each data subtype, and the access tier (open or controlled) of each data subtype.

Data type	Data subtype	Data format	Data Access Tier
Clinical	Clinical Data	XML	Open Data
Clinical	Biospecimen Data	XML	Open Data
Raw Sequencing Data	Aligned Reads	BAM	Controlled Data
Raw Sequencing Data	Unaligned Reads	TAR	Controlled Data
Raw Sequencing Data	Sequencing Tag	DGE-Tag	Open Data
Raw Sequencing Data	Sequencing Tag Counts	TXT	Open Data
Raw Microarray Data	Raw Intensities	Idat, CEL, TXT, TIF	Open and Controlled Data
Raw Microarray Data	Intensities Log2Ratio	TXT	Open Data
Raw Microarray Data	Intensities	TXT	Open Data
Raw Microarray Data	Normalized Intensities	TXT, Dat	Open and Controlled Data
Simple Nucleotide Variation	Genotypes	TXT, Dat	Controlled Data
Simple Nucleotide Variation	Simple Somatic Mutation	MAF	Open and Controlled Data
Simple Nucleotide Variation	Simple Nucleotide Variation	VCF	Controlled Data
Gene Expression	Gene Expression Quantification	TXT	Open Data
Gene Expression	miRNA Quantification	TXT	Open Data
Gene Expression	Isoform Expression Quantification	TXT	Open Data
Gene Expression	Exon Junction Quantification	TXT	Open Data
Gene Expression	Exon Quantification	TXT	Open Data
Structural Rearrangement	Structural Variation	VCF, FA	Controlled Data
DNA Methylation	Bisulfite Sequence Alignment	VCF	Controlled Data
DNA Methylation	Methylation Beta Value	TXT	Open Data
DNA Methylation	Methylation Percentage	BED	Open Data
Copy Number Variation	Copy Number Segmentation	TXT, Dat	Open Data
Copy Number Variation	Copy Number Estimate	TXT	Controlled Data
Copy Number Variation	LOH	TXT	Open Data
Copy Number Variation	Copy Number Variation	VCF	Controlled Data
Copy Number Variation	Normalized Copy Numbers	TXT	Controlled Data
Protein Expression	Protein Expression Quantification	TXT	Open Data
Other	Microsatellite Instability	FSA, TXT	Controlled Data
Raw Microarray Data	CGH Array QC	PNG	Open Data
Other	ABI Sequence Trace	TR	Controlled Data
Raw Microarray Data	CGH Array QC	JPG	Open Data
Raw Sequencing Data	Unaligned Reads	FASTQ	Controlled Data
Clinical	Clinical Data, Biospecimen Data	Biotab	Open Data
Raw Microarray Data	CGH Array QC	TSV	Open Data
Raw Sequencing Data	Coverage WIG	WIG	Open and Controlled Data
Clinical	Pathology Report	PDF	Open Data
Raw Microarray Data	CGH Array QC	PDF	Open Data
Clinical	Tissue Slide Image, Diagnostic Image	SVS	Open Data
Biospecimen	Biospecimen Supplement	BCR XML	Open Data
Clinical	Clinical Supplement	BCR XML	Open Data
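
If you want to work with this table programmatically, for example to see which data subtypes you can use without controlled-data access, the following is a minimal Python sketch. It assumes you have copied rows of the table into a tab-separated string; only a few rows are included here for brevity. The names `TABLE`, `load_rows`, and `subtypes_with_tier` are hypothetical helpers written for this illustration and are not part of the Platform or its API.

```python
import csv
import io

# A few rows copied from the table above, tab-separated as on this page.
# Illustrative subset only; extend it with the remaining rows as needed.
TABLE = (
    "Data type\tData subtype\tData format\tData Access Tier\n"
    "Clinical\tClinical Data\tXML\tOpen Data\n"
    "Raw Sequencing Data\tAligned Reads\tBAM\tControlled Data\n"
    "Simple Nucleotide Variation\tSimple Somatic Mutation\tMAF\tOpen and Controlled Data\n"
    "Gene Expression\tGene Expression Quantification\tTXT\tOpen Data\n"
    "Copy Number Variation\tCopy Number Estimate\tTXT\tControlled Data\n"
)


def load_rows(text):
    """Parse the tab-separated table into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def subtypes_with_tier(rows, tier):
    """Return the data subtypes whose access tier includes `tier`.

    `tier` is "Open" or "Controlled". A row marked
    "Open and Controlled Data" matches both.
    """
    wanted = tier.lower()
    matches = []
    for row in rows:
        # "Open and Controlled Data" -> ["open", "controlled"]
        tiers = row["Data Access Tier"].lower().replace(" data", "").split(" and ")
        if wanted in tiers:
            matches.append(row["Data subtype"])
    return matches


if __name__ == "__main__":
    rows = load_rows(TABLE)
    print("Open:", subtypes_with_tier(rows, "Open"))
    print("Controlled:", subtypes_with_tier(rows, "Controlled"))
```

Rows marked "Open and Controlled Data" appear in both lists, because some files of that subtype are open and others are controlled. The access tier of an individual file still has to be checked on the file itself.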
