About datasets


As a researcher or clinician, you may want to access publicly available datasets to complement your research and compare findings. As an authorized user, you can access these data through the Platform.

Several of these cancer datasets are available on the Cancer Genomics Cloud (CGC), a sister platform powered by Seven Bridges:

  • The Cancer Genome Atlas, TCGA
  • Therapeutically Applicable Research To Generate Effective Treatments - TARGET
  • National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium - CPTAC (https://proteomics.cancer.gov/data-portal) and CPTAC-3*
  • The Cancer Imaging Archive - TCIA

Most of these datasets include both Open and Controlled Data, with the exception of TCIA (open data only). While all data in TCGA and the other datasets is stripped of direct identifiers, DNA information is inherently unique to an individual.

Two data access tiers have been put in place to balance making the data as widely available as possible with protecting the rights of study participants. The two tiers are described below:

Open Data includes information which is not unique to an individual. This includes information such as:

  • De-identified clinical and demographic data
  • Gene expression data
  • Copy number alterations in regions of the genome
  • Epigenetic data
  • Summaries of data across individuals

Controlled Data includes information which is unique to an individual. This includes most raw data files and some processed data such as:

  • Primary sequencing data (BAM and FASTQ files) from DNA, RNA, miRNA or bisulfite sequencing studies
  • Raw and processed SNP6 array data
  • Raw and processed Exon array data
  • Somatic and germ-line mutation calls for an individual (VCF and MAF files)
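
The two tiers above can be pictured as a simple lookup from data category to access tier. The following is only an illustrative sketch: the category labels (such as `gene_expression` or `primary_sequencing`) and the `access_tier` function are hypothetical names chosen for this example and are not part of the Platform or the CGC.

```python
# Data categories taken from the Open Data and Controlled Data lists above.
# The short keys are hypothetical labels invented for this sketch.
OPEN_DATA = {
    "clinical_deidentified",       # de-identified clinical and demographic data
    "gene_expression",
    "copy_number_alterations",
    "epigenetic",
    "cross_individual_summaries",  # summaries of data across individuals
}
CONTROLLED_DATA = {
    "primary_sequencing",  # BAM and FASTQ files
    "snp6_array",          # raw and processed SNP6 array data
    "exon_array",          # raw and processed Exon array data
    "mutation_calls",      # per-individual VCF and MAF files
}


def access_tier(category: str) -> str:
    """Return 'open' or 'controlled' for a known category label."""
    if category in OPEN_DATA:
        return "open"
    if category in CONTROLLED_DATA:
        return "controlled"
    raise KeyError(f"unknown data category: {category!r}")


print(access_tier("gene_expression"))  # open
print(access_tier("mutation_calls"))   # controlled
```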

Once your dbGaP credentials are approved, you will be able to use the data in an analysis, both on CGC and on the Platform. Please note that The Cancer Imaging Archive (TCIA) is public and does not require authentication.

*For CPTAC-3, only genomic data (WGS, WXS, and RNA-Seq) is available through the CGC. Proteomic data from CPTAC-3 can be found via the Proteomics Data Commons portal and uploaded to your account directly.

For more information on how to add these data to your project, please contact us at [email protected].

User Responsibilities

Seven Bridges is an NIH Trusted Partner, and we've made data security a priority. In addition, users are required to abide by their dbGaP data access requests and the NIH Genomic Data User Code of Conduct, the elements of which are reproduced below:

  • Investigator(s) will use requested datasets solely in connection with the research project described in the approved Data Access Request for each dataset;
  • Investigator(s) will make no attempt to identify or contact individual participants from whom these data were collected without appropriate approvals from the relevant IRBs;
  • Investigator(s) will not distribute these data to any entity or individual beyond those specified in the approved Data Access Request;
  • Investigator(s) will adhere to computer security practices that ensure that only authorized individuals can gain access to data files;
  • Investigator(s) will not submit for publication or any other form of public dissemination analyses or other reports on work using or referencing NIH datasets prior to the embargo release date listed for the dataset (or dataset version) on dbGaP;
  • Investigator(s) acknowledge the Intellectual Property Policies as specified in the Data Use Certification; and,
  • Investigator(s) will report any inadvertent data release in accordance with the terms in the Data Use Certification, breach of data security, or other data management incidents contrary to the terms of data access.
  • Learn more about updating your Data Access Request to list Seven Bridges as the Platform as a Service (PaaS) and include cloud use.

Authenticate and access data

The data above is available through an integration with the Cancer Genomics Cloud (CGC), powered by Seven Bridges.

CGC lets you authenticate with dbGaP and gain access to TCGA, CPTAC, and TARGET data.

For more information about the CGC, please visit www.cancergenomicscloud.org.

To be able to authenticate and access data, you will first need to create an account on CGC. After registering for a CGC account, you can connect your CGC account to your Seven Bridges Platform account and associate your CGC credentials.

Register for an account on CGC

You can sign up for CGC using your eRA Commons or NIH CIT credentials, or your email address.

To access Controlled Data on CGC, you need to register with eRA Commons credentials, which have the appropriate data access permissions through dbGaP.

If you use your email address to register, you will only be able to access Open Data.
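
The registration rules above can be summarized as a small decision function. This is a sketch only: `accessible_tiers` and the method labels `"email"` and `"era_commons"` are hypothetical names for illustration. NIH CIT credentials are left out because the text does not state which tiers they unlock.

```python
def accessible_tiers(registration: str, dbgap_approved: bool = False) -> set:
    """Return the data tiers reachable for a given registration method.

    registration   -- "email" or "era_commons" (labels invented for this sketch)
    dbgap_approved -- True if the eRA Commons account has the appropriate
                      data access permissions through dbGaP
    """
    if registration == "email":
        # Email registration only ever grants Open Data.
        return {"open"}
    if registration == "era_commons":
        # Controlled Data additionally requires dbGaP permissions.
        return {"open", "controlled"} if dbgap_approved else {"open"}
    raise ValueError(f"unsupported registration method: {registration!r}")


print(sorted(accessible_tiers("email")))                             # ['open']
print(sorted(accessible_tiers("era_commons", dbgap_approved=True)))  # ['controlled', 'open']
```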

We encourage you to read the CGC Terms of Use and TCGA Data Use policy to learn more about CGC.

Register using eRA Commons credentials

To register for an account using eRA Commons credentials, follow these steps:

  1. Access the CGC.
  2. Click Create an account.
  3. Click Continue with eRA Commons.
  4. You are redirected to the iTrust login page, where you should enter your eRA Commons or NIH CIT credentials and click Log in.
  5. Click Yes, I authorize. You are redirected back to the CGC registration form.
  6. Fill out the registration form.
  7. Click Proceed to the CGC.

Register using your email address

To register with your email address:

  1. Access the CGC.
  2. Click Create an account.
  3. Click Continue with email and password.
  4. Fill out the registration form.
  5. Click Register. CGC will send you an email containing a verification link.
  6. Open the email from CGC and click Confirm your email.

Connect your Seven Bridges account with your CGC account

Once you have created a CGC account, you can connect it to your Seven Bridges account. Your CGC credentials will be associated with your Seven Bridges account, and you will be able to access publicly available data right away.

Steps on CGC

The first step is obtaining your CGC authentication token:

  1. Choose Authentication token from the Developer menu.
  2. Click Generate Token. CGC will generate your authentication token.
  3. Copy the token and access the Seven Bridges Platform.
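
If you want to confirm that a copied token is well formed before pasting it into the Platform, one option is to prepare a request to the CGC API with it. The sketch below only builds the request and does not send it. The endpoint URL and the `X-SBG-Auth-Token` header name are assumptions based on the general conventions of the Seven Bridges API and should be checked against the CGC API documentation; `build_user_request` is a hypothetical helper name.

```python
import urllib.request

# Assumed values: verify both against the CGC API documentation before use.
CGC_API_USER_URL = "https://cgc-api.sbgenomics.com/v2/user"
TOKEN_HEADER = "X-SBG-Auth-Token"


def build_user_request(token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries the token."""
    cleaned = token.strip() if token else ""
    if not cleaned:
        raise ValueError("authentication token is empty")
    return urllib.request.Request(
        CGC_API_USER_URL,
        headers={TOKEN_HEADER: cleaned, "Accept": "application/json"},
    )


# Placeholder token; replace with the one generated on CGC.
request = build_user_request("PASTE-YOUR-TOKEN-HERE")
print(request.full_url)

# To actually send the request (requires network access and a valid token):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Stripping whitespace guards against stray spaces or newlines picked up when copying the token. Treat the token like a password and avoid storing it in scripts or shared notebooks.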

Steps on the Seven Bridges Platform

Once you have obtained the authentication token from CGC (see above), access the Seven Bridges Platform, and follow these steps:

  1. Click your username in the upper right corner and choose Account Settings.
  2. Click the Dataset Access tab.
  3. Enter the authentication token you copied from CGC.
  4. Click Connect account.

What type of data can you access?

Once you register for a CGC account, you will have access to various data based on your approved data access.

The following publicly available datasets are also available on the Platform:

  • The Cancer Genome Atlas, TCGA
  • Therapeutically Applicable Research To Generate Effective Treatments - TARGET
  • National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium - CPTAC and CPTAC-3*
  • The Cancer Imaging Archive - TCIA

Open Data access

All Platform users can access Open Data as soon as they create and connect their CGC account and agree to the use certifications as mentioned above.

Controlled Data access

Researchers requiring access to Controlled Data for their studies are required to obtain an approved Data Access Request through dbGaP and to agree to the respective data agreements and publication guidelines.

If you are either a PI or a downloader in an approved dbGaP application, be sure to list Seven Bridges as the Platform as a Service (PaaS) in your dbGaP application.