TCGA data access
Overview
The Cancer Genome Atlas (TCGA) is made available on the Seven Bridges Platform through an integration with the Seven Bridges Cancer Genomics Cloud (CGC). TCGA on the Platform includes both Open and Controlled Data. While all data in TCGA is stripped of direct identifiers, DNA information is inherently unique to an individual. Two data access tiers have therefore been put in place to make the data as widely available as possible while ensuring that the rights of study participants are well protected. These two access tiers are described below.
Open Data includes information which is not unique to an individual. This includes information such as:
- De-identified clinical and demographic data
- Gene expression data
- Copy number alterations in regions of the genome
- Epigenetic data
- Summaries of data across individuals
Controlled Data includes information which is unique to an individual. This includes most raw data files and some processed data such as:
- Primary sequencing data (BAM and FASTQ files) from DNA, RNA, miRNA or bisulfite sequencing studies
- Raw and processed SNP6 array data
- Raw and processed Exon array data
- Somatic and germ-line mutation calls for an individual (VCF and MAF files)
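As a rough illustration, the two tiers above can be written as a lookup table. This is a minimal sketch: the short keys (such as `gene_expression`) and the function name `tier_for` are hypothetical labels chosen for this example, not identifiers used by TCGA, the CGC, or the Platform.

```python
from typing import Optional

# Tier assignments transcribed from the two lists above. The keys are
# hypothetical shorthand for this sketch, not official TCGA data type names.
TIER_BY_DATA_TYPE = {
    "clinical_and_demographic": "open",
    "gene_expression": "open",
    "copy_number_alterations": "open",
    "epigenetic": "open",
    "cross_individual_summaries": "open",
    "primary_sequencing": "controlled",    # BAM and FASTQ files
    "snp6_array": "controlled",            # raw and processed
    "exon_array": "controlled",            # raw and processed
    "individual_mutation_calls": "controlled",  # VCF and MAF files
}


def tier_for(data_type: str) -> Optional[str]:
    """Return "open" or "controlled", or None if the type is not listed."""
    return TIER_BY_DATA_TYPE.get(data_type)
```

Returning `None` for unlisted types is a deliberate choice here: the lists above are examples rather than an exhaustive catalog, so the sketch does not guess a tier for anything it does not know.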
Learn about your user responsibilities and how to authenticate and access TCGA data on the Platform.
User responsibilities
Seven Bridges is an NIH Trusted Partner, and we've made data security a priority. In addition, users are required to abide by their dbGaP data access requests and the NIH Genomic Data User Code of Conduct, the elements of which are reproduced below:
- Investigator(s) will use requested datasets solely in connection with the research project described in the approved Data Access Request for each dataset;
- Investigator(s) will make no attempt to identify or contact individual participants from whom these data were collected without appropriate approvals from the relevant IRBs;
- Investigator(s) will not distribute these data to any entity or individual beyond those specified in the approved Data Access Request;
- Investigator(s) will adhere to computer security practices that ensure that only authorized individuals can gain access to data files;
- Investigator(s) will not submit for publication or any other form of public dissemination analyses or other reports on work using or referencing NIH datasets prior to the embargo release date listed for the dataset (or dataset version) on dbGaP;
- Investigator(s) acknowledge the Intellectual Property Policies as specified in the Data Use Certification; and,
- Investigator(s) will report any inadvertent data release in accordance with the terms in the Data Use Certification, breach of data security, or other data management incidents contrary to the terms of data access.
Learn more about updating your Data Access Request to list Seven Bridges as the Platform as a Service (PaaS) and include cloud use. For TCGA-specific documents, please refer to the TCGA publication guidelines for the fifth point above (publication embargo) and the TCGA Data Use Certifications for the sixth and seventh points above (intellectual property and incident reporting).
Authenticate and access TCGA
As TCGA on the Platform is available through an integration with the Seven Bridges Cancer Genomics Cloud (CGC), the CGC is the source for authenticating you with dbGaP and authorizing access to TCGA data. To access TCGA on the Platform, you will first be directed to create an account on the Seven Bridges CGC. After registering for a CGC account, you can connect your CGC account to your Seven Bridges Platform account to associate your CGC credentials.
Step 1: Register for a CGC account
You can sign up for the CGC using either (1) your eRA Commons or NIH cit credentials or (2) your email address.
Note that to access TCGA Controlled Data on the CGC, you need to register with eRA Commons or NIH cit credentials which have the appropriate data access permissions through dbGaP. If you don't log in with eRA Commons or NIH cit credentials, you will only be able to access TCGA Open Data.
Please read the following instructions carefully before registering for the CGC.
- Option 1: If you have an eRA Commons or NIH cit account, register using these credentials.
- Option 2: If you don't have an eRA Commons account, register for a CGC account with your email address.
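The effect of the registration choice on data access can be summarized in a few lines of code. This is only a sketch of the rule described above: the function name `accessible_tiers` and the `"nih"` / `"email"` labels are assumptions made for this example and do not correspond to any CGC or Platform API.

```python
def accessible_tiers(login_method: str, has_dbgap_approval: bool) -> set:
    """Return the TCGA tiers a CGC account can reach.

    login_method is a hypothetical label for this sketch:
      "nih"   - registered with eRA Commons or NIH cit credentials
      "email" - registered with an email address and password
    """
    if login_method not in ("nih", "email"):
        raise ValueError(f"unknown login method: {login_method!r}")

    # Every registered account can reach Open Data.
    tiers = {"open"}

    # Controlled Data needs NIH-issued credentials that also carry
    # the appropriate dbGaP data access permissions.
    if login_method == "nih" and has_dbgap_approval:
        tiers.add("controlled")
    return tiers
```

For example, `accessible_tiers("email", True)` still yields only Open Data, because dbGaP approval is checked against eRA Commons or NIH cit credentials rather than an email login.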
Option 1: register using eRA Commons or NIH cit credentials
To register for the CGC using your eRA Commons or NIH cit credentials:
- Access the CGC.
- On the left panel of the login page, click LOG IN to access the external NIH iTrust site for authentication.
- To complete authentication, enter your eRA Commons or NIH cit username and password.
- To complete your registration, enter the additional information required by the CGC and click PROCEED TO THE CGC PLATFORM.
We encourage you to read the CGC Terms of Use and TCGA Data Use policy carefully before using the CGC.
Option 2: register for the CGC if you do not have eRA Commons credentials
If you do not have eRA Commons credentials, create a CGC account using your email and a password of your choice. Note that accounts created with this method will not have access to TCGA Controlled Data. Register using your eRA Commons or NIH cit credentials if you have approval to use TCGA Controlled Data.
To register with your email:
- Access the CGC.
- Click Create an account below the LOG IN button on the right panel.
- Select Register with good old email/password combo and provide the information requested.
- Check your email to confirm your registration.
Step 2: Connect your CGC account with your Seven Bridges Platform account
Once you've created a CGC account, you can connect your CGC account to your Platform account. Your CGC credentials will be associated with your Platform account, and you will be able to access TCGA data right away.
To connect your CGC account, first you must obtain your CGC authentication token:
- On the CGC, click your username in the upper right corner and choose Developer from the menu.
The Developer Hub is displayed.
- Click the Auth token tab.
- Click Generate Token to create your authentication token.
- Copy your authentication token to the clipboard. We'll be using this in a later step.
Now that you have your CGC authentication token, you can connect your account as follows:
- On the Seven Bridges Platform, click your username in the upper right corner and choose Account Settings from the menu.
- Select the Dataset access tab from the menu on the left.
- Paste your CGC authentication token into the form and click Connect accounts.
Your CGC account, along with your TCGA data access credentials, is now linked to your Platform account. On this screen, you can also see the datasets available to you.
Note that your CGC authentication token will expire every few months. When it expires, generate a new token on the CGC and reconnect your CGC account to your Platform account by repeating the three steps above.
What type of TCGA data will I be able to access?
Once you register for a CGC account, you'll have access to TCGA data based on your data access approval. TCGA data on the Platform consists of Open Data and Controlled Data.
Open Data access
All Platform users can access Open Data as soon as they create and connect their CGC account and agree to the TCGA Data Use Certifications as well as the TCGA publication guidelines.
Controlled Data access
Researchers requiring access to Controlled Data for their studies are required to obtain an approved Data Access Request through dbGaP and to agree to the TCGA Data Use Certifications (http://cancergenome.nih.gov/pdfs/Data_Use_Certv082014) as well as the TCGA publication guidelines.
If you are either a PI or a downloader in an approved dbGaP application, be sure to list Seven Bridges as the Platform as a Service (PaaS) in your dbGaP application.
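The requirements for each tier can be read as a checklist. The sketch below is one interpretation of the text above, not an official validation routine: the `AccessStatus` fields and the function `missing_requirements` are hypothetical names, and treating a connected CGC account as a prerequisite for both tiers is an assumption drawn from the authentication steps earlier on this page.

```python
from dataclasses import dataclass


@dataclass
class AccessStatus:
    # Hypothetical flags describing what a user has completed so far.
    cgc_account_connected: bool = False
    agreed_data_use_certification: bool = False
    agreed_publication_guidelines: bool = False
    approved_dbgap_request: bool = False


def missing_requirements(tier: str, status: AccessStatus) -> list:
    """List the outstanding steps for the given tier ("open" or "controlled")."""
    if tier not in ("open", "controlled"):
        raise ValueError(f"unknown tier: {tier!r}")

    # Steps shared by both tiers.
    checks = [
        ("connect CGC account", status.cgc_account_connected),
        ("agree to TCGA Data Use Certifications", status.agreed_data_use_certification),
        ("agree to TCGA publication guidelines", status.agreed_publication_guidelines),
    ]
    # Controlled Data additionally needs an approved dbGaP Data Access Request.
    if tier == "controlled":
        checks.append(("obtain approved dbGaP Data Access Request", status.approved_dbgap_request))

    return [name for name, done in checks if not done]
```

An empty list means nothing is outstanding for that tier under these assumptions; a user who has completed only the Open Data steps would see the dbGaP request as the single remaining item for Controlled Data.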