The Seven Bridges Knowledge Center


TCGA data


Seven Bridges is committed to providing Platform users with the most up-to-date version of the TCGA legacy dataset available from the NCI Genomic Data Commons (GDC). In keeping with this commitment, on July 10, 2018, the Platform transitioned from hosting the CGHub version of this dataset to the GDC Legacy Archive Data Release 11.0 version. As of that date, all files accessible via the Data Browser and the API correspond to Data Release 11.0. As of July 12, 2018, all files accessible via the Datasets API also correspond to Data Release 11.0.

Files that were added to individual projects before the transition and are not represented in the new dataset version are no longer accessible via those projects. They may, however, be obtainable from the GDC archive by contacting the GDC Help Desk. Similarly, files that are not represented in Data Release 11.0 are no longer accessible through saved Data Browser queries, and affected queries will return a result of '0'. In addition, because of a change in how some files in this dataset are hosted, a small number of saved Data Browser queries whose files are still available will also return '0'. You can recreate such queries using the Data Browser query-building canvas, after which they will return the same results as before.

Please contact our Support Team at support@sbgenomics.com if you have any questions. The Seven Bridges Team looks forward to continuing to collaborate with the GDC in the months ahead so that new data releases for this dataset become available through the Platform in a timely manner.

The Cancer Genome Atlas (TCGA) is one of the richest and most complete genomics datasets. It was compiled to characterize the molecular basis of cancer. Data collection for TCGA began in 2006 as a joint effort by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). Both are part of the National Institutes of Health (NIH), an agency of the U.S. Department of Health and Human Services.

Over the past decade, TCGA has grown to contain data on 33 different tumor types and over 11,000 cases (patients). Between 50 and 1,500 cases were sampled for each tumor type. Multiple samples were analyzed for each case. Microarray platforms were used for genome characterization, and next-generation sequencing platforms were used for sequencing. TCGA data currently amounts to more than 2.5 petabytes and is expected to grow as new samples are processed.

For a full list of TCGA data available on the Platform, see the table below. For each data subtype, it gives the parent data type, the data format, and the data access tier.

Data type | Data subtype | Data format | Data access tier
--- | --- | --- | ---
Clinical | Clinical Data | XML | Open Data
Clinical | Biospecimen Data | XML | Open Data
Raw Sequencing Data | Aligned Reads | BAM | Controlled Data
Raw Sequencing Data | Unaligned Reads | TAR | Controlled Data
Raw Sequencing Data | Sequencing Tag | DGE-Tag | Open Data
Raw Sequencing Data | Sequencing Tag Counts | TXT | Open Data
Raw Microarray Data | Raw Intensities | Idat, CEL, TXT, TIF | Open and Controlled Data
Raw Microarray Data | Intensities Log2Ratio | TXT | Open Data
Raw Microarray Data | Intensities | TXT | Open Data
Raw Microarray Data | Normalized Intensities | TXT, Dat | Open and Controlled Data
Simple Nucleotide Variation | Genotypes | TXT, Dat | Controlled Data
Simple Nucleotide Variation | Simple Somatic Mutation | MAF | Open and Controlled Data
Simple Nucleotide Variation | Simple Nucleotide Variation | VCF | Controlled Data
Gene Expression | Gene Expression Quantification | TXT | Open Data
Gene Expression | miRNA Quantification | TXT | Open Data
Gene Expression | Isoform Expression Quantification | TXT | Open Data
Gene Expression | Exon Junction Quantification | TXT | Open Data
Gene Expression | Exon Quantification | TXT | Open Data
Structural Rearrangement | Structural Variation | VCF, FA | Controlled Data
DNA Methylation | Bisulfite Sequence Alignment | VCF | Controlled Data
DNA Methylation | Methylation Beta Value | TXT | Open Data
DNA Methylation | Methylation Percentage | BED | Open Data
Copy Number Variation | Copy Number Segmentation | TXT, Dat | Open Data
Copy Number Variation | Copy Number Estimate | TXT | Controlled Data
Copy Number Variation | LOH | TXT | Open Data
Copy Number Variation | Copy Number Variation | VCF | Controlled Data
Copy Number Variation | Normalized Copy Numbers | TXT | Controlled Data
Protein Expression | Protein Expression Quantification | TXT | Open Data
Other | Microsatellite Instability | FSA, TXT | Controlled Data
Raw Microarray Data | CGH array QC | PNG | Open Data
Other | ABI sequence trace | TR | Controlled Data
Raw Microarray Data | CGH array QC | JPG | Open Data
Raw Sequencing Data | Unaligned reads | FASTQ | Controlled Data
Clinical | Clinical Data, Biospecimen Data | Biotab | Open Data
Raw Microarray Data | CGH array QC | TSV | Open Data
Raw Sequencing Data | Coverage WIG | WIG | Open and Controlled Data
Clinical, Raw Microarray Data | Pathology report, CGH array QC | PDF | Open Data
Clinical | Tissue slide image, Diagnostic image | SVS | Open Data
Biospecimen, Clinical | Biospecimen Supplement, Clinical Supplement | BCR XML | Open Data
