The Seven Bridges Knowledge Center


TCGA GRCh38 data



Seven Bridges is committed to providing Platform users with up-to-date versions of the datasets that are available from the NCI Genomic Data Commons (GDC). The currently available version of this dataset corresponds to GDC Data Release 24. More information about the data in this release can be found in the GDC Data Release Notes.

Learn more about our policies regarding updates to the GDC datasets.

The Cancer Genome Atlas (TCGA) is one of the richest and most complete genomics datasets, compiled to understand the molecular basis of cancer. Data collection for TCGA began in 2006 as a joint effort by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), both part of the National Institutes of Health (NIH) within the U.S. Department of Health and Human Services.

Over the past decade, TCGA has grown to contain data on 33 different tumor types and over 11,000 cases (patients). Between 50 and 1,500 cases have been sampled for each tumor type. Multiple samples were analyzed for each case, using microarray technology for genome characterization and next-generation technology for sequencing. TCGA data currently represents more than 2.5 petabytes of information and is expected to grow as new samples are processed.

This page describes the data in TCGA GRCh38. Nomenclature for TCGA GRCh38 follows that of the GDC. For instance, the field called Data type in legacy TCGA data is named Data category in harmonized TCGA GRCh38 data. Similarly, Data subtype in legacy TCGA data corresponds to Data type in harmonized GRCh38 data. For a full list of TCGA GRCh38 data available on the Platform, see the table below, which lists each data category and data type along with the data format and access tier of each data type.

| Data category | Data type | Data format | Data access tier |
| --- | --- | --- | --- |
| Biospecimen | Biospecimen supplement | BCR XML | Open data |
| Clinical | Clinical supplement | BCR XML | Open data |
| Copy Number Variation | Copy Number Segment | TXT | Open data |
| Copy Number Variation | Masked Copy Number Segment | TXT | Open data |
| DNA Methylation | Methylation Beta Value | TXT | Open data |
| Raw Sequencing Data | Aligned reads | BAM | Controlled data |
| Simple Nucleotide Variation | Aggregated Somatic Mutation | MAF | Controlled data |
| Simple Nucleotide Variation | Annotated Somatic Mutation | VCF | Controlled data |
| Simple Nucleotide Variation | Masked Somatic Mutation | MAF | Controlled data |
| Simple Nucleotide Variation | Raw Simple Somatic Mutation | VCF | Controlled data |
| Transcriptome profiling | Gene Expression Quantification | TXT | Open data |
| Transcriptome profiling | Isoform Expression Quantification | TSV | Open data |
| Transcriptome profiling | miRNA Expression Quantification | TSV | Open data |
| Biospecimen | Slide Image | SVS | Open data |
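
To work with this listing programmatically, for example to see which data types can be used without controlled-data access, the table can be transcribed into a small Python structure. The following is a minimal sketch only: the names `DataTypeRow`, `TCGA_GRCH38_ROWS`, `LEGACY_TO_HARMONIZED`, and `data_types_by_tier` are hypothetical helpers introduced for illustration and are not part of the Seven Bridges API. The rows are copied by hand from the table above, so they reflect this page rather than a live query of the Platform.

```python
from collections import defaultdict
from typing import NamedTuple


class DataTypeRow(NamedTuple):
    """One row of the TCGA GRCh38 table (hypothetical helper type)."""
    category: str
    data_type: str
    data_format: str
    access_tier: str


# Rows transcribed by hand from the table on this page.
TCGA_GRCH38_ROWS = [
    DataTypeRow("Biospecimen", "Biospecimen supplement", "BCR XML", "Open data"),
    DataTypeRow("Clinical", "Clinical supplement", "BCR XML", "Open data"),
    DataTypeRow("Copy Number Variation", "Copy Number Segment", "TXT", "Open data"),
    DataTypeRow("Copy Number Variation", "Masked Copy Number Segment", "TXT", "Open data"),
    DataTypeRow("DNA Methylation", "Methylation Beta Value", "TXT", "Open data"),
    DataTypeRow("Raw Sequencing Data", "Aligned reads", "BAM", "Controlled data"),
    DataTypeRow("Simple Nucleotide Variation", "Aggregated Somatic Mutation", "MAF", "Controlled data"),
    DataTypeRow("Simple Nucleotide Variation", "Annotated Somatic Mutation", "VCF", "Controlled data"),
    DataTypeRow("Simple Nucleotide Variation", "Masked Somatic Mutation", "MAF", "Controlled data"),
    DataTypeRow("Simple Nucleotide Variation", "Raw Simple Somatic Mutation", "VCF", "Controlled data"),
    DataTypeRow("Transcriptome profiling", "Gene Expression Quantification", "TXT", "Open data"),
    DataTypeRow("Transcriptome profiling", "Isoform Expression Quantification", "TSV", "Open data"),
    DataTypeRow("Transcriptome profiling", "miRNA Expression Quantification", "TSV", "Open data"),
    DataTypeRow("Biospecimen", "Slide Image", "SVS", "Open data"),
]

# Legacy TCGA field name -> harmonized TCGA GRCh38 field name,
# as described in the nomenclature paragraph on this page.
LEGACY_TO_HARMONIZED = {
    "Data type": "Data category",
    "Data subtype": "Data type",
}


def data_types_by_tier(rows, access_tier):
    """Group the data types with the given access tier by data category."""
    grouped = defaultdict(list)
    for row in rows:
        if row.access_tier == access_tier:
            grouped[row.category].append(row.data_type)
    return dict(grouped)


if __name__ == "__main__":
    # List every open-access data type, grouped by category.
    for category, types in data_types_by_tier(TCGA_GRCH38_ROWS, "Open data").items():
        print(f"{category}: {', '.join(types)}")
```

Passing `"Controlled data"` instead of `"Open data"` returns the data types that require controlled-data access.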

Updated 3 months ago
