TCGA GRCh38 metadata

Overview

Metadata is data that describes other data. This page details the TCGA metadata available for viewing and filtering TCGA data in the Data Browser and the Datasets API. TCGA metadata on the Platform consists of properties that describe the entities of the TCGA dataset.

Entities are particular resources with UUIDs, such as files, cases, samples, and cell lines.

Properties can either describe an entity or relate that entity to another entity. For instance, properties include an entity's vital status, gender, data format, or experimental strategy.
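
The relationship between entities and properties can be pictured with a small data structure. The sketch below is only an illustration: the `Entity` class, its field names, and the example values are hypothetical and do not reflect how the Platform or the Datasets API stores metadata.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Entity:
    """Hypothetical in-memory stand-in for a single TCGA entity."""
    entity_type: str                 # e.g. "case", "sample", "file"
    entity_id: uuid.UUID             # each entity is identified by a UUID
    # Properties that describe the entity itself.
    properties: Dict[str, Any] = field(default_factory=dict)
    # Properties that relate this entity to another entity (by UUID).
    relations: Dict[str, uuid.UUID] = field(default_factory=dict)


# Made-up example values, for illustration only.
case = Entity("case", uuid.uuid4(), properties={"Primary site": "Lung"})
demographic = Entity(
    "demographic",
    uuid.uuid4(),
    properties={"Gender": "female"},
    relations={"case": case.entity_id},
)
```

Here the demographic entity carries one descriptive property and one relation pointing back to its case.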

Entities for TCGA GRCh38

The following are entities for TCGA GRCh38. They represent clinical data, biospecimen data, and data about TCGA GRCh38 files. Note that these entities differ from the entities of legacy TCGA data. Learn more about TCGA GRCh38 data.

  • investigation
  • case
  • demographic
  • diagnosis
  • treatment
  • exposure
  • drug_therapy
  • radiation_therapy
  • follow_up
  • new_tumor_event
  • sample
  • portion
  • slide
  • analyte
  • aliquot
  • read_group
  • read_group_qc
  • file
  • analysis

Below, each of these entities is followed by a table of its related properties.

Investigation

The investigation entity represents the project or study that generated the data. Members of the investigation entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the investigation entity below.

Property | Description
dbGaP accession number | The dbGaP accession number provided for each study. See NCI Thesaurus Code: C25402.
Investigation name | The full name of the project or study that generated the data. See NCI Thesaurus Code: C41198.
Submitter ID | A human-readable identifier, such as a number or a string that may contain metadata information for investigations.

Case

The case entity represents TCGA cases. Members of the case entity are subjects who have taken part in an investigation or program and can be identified by a Universally Unique Identifier (UUID). See the table below for the clinical properties and descriptions of the case entity.

Property | Description
Batch number | A set of related analytes prepared for further analysis and numbered sequentially from the same disease. Once a Case has been assigned to a batch number, subsequent shipments from that case are assigned the same batch number as the original. Seven Bridges-only field.
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Disease type | The type of the disease or condition studied. See NCI Thesaurus Code: C2991.
Primary site | The anatomical site where the primary tumor is located in the organism. See NCI Thesaurus Code: C43761.
Tissue source site ID | A clinical site that collects and provides patient samples and clinical metadata for research use. It is identified by a UUID. See NCI Thesaurus Code: C103264.
Tissue source site name | The full name of a clinical site that collects and provides patient samples and clinical metadata for research use. See NCI Thesaurus Code: C103264.
Tissue source site code | The alphanumeric code for a clinical site that collects and provides patient samples and clinical metadata for research use. See NCI Thesaurus Code: C103264.
Tissue source site BCR ID | The ID provided by the BCR (Biospecimen Core Resource) for a tissue source site. See NCI Thesaurus Code: C103264.

Demographic

The demographic entity represents the statistical characterization of human populations or segments of human populations (e.g., characterization by age, sex, race, or income) and can be identified by a Universally Unique Identifier (UUID). Find the properties of the demographic entity below.

Property | Description
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Ethnicity | A socially-defined category of people based on common ancestral, cultural, biological, and social factors. See NCI Thesaurus Code: C29933.
Race | A classification of humans characterized by certain heritable traits, common history, nationality, or geographic distribution. See NCI Thesaurus Code: C17049.
Gender | The collection of behaviors and attitudes that distinguish people on the basis of the societal roles expected for the two sexes. See NCI Thesaurus Code: C17357.
Year of birth | A numeric value to represent the calendar year in which an individual was born. See CDE (Common Data Element) Public ID: 2896954.
Year of death | A numeric value to represent the year of the death of an individual. See CDE (Common Data Element) Public ID: 2897030.

Diagnosis

The diagnosis entity represents the investigation, analysis, or recognition of the presence and nature of a disease, condition, or injury from expressed signs and symptoms. A diagnosis can be identified by a Universally Unique Identifier (UUID). Find the properties of the diagnosis entity below.

Property | Description
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Age at diagnosis | The age in years of the Case at the initial pathological diagnosis of the disease or cancer. See NCI Thesaurus Code: C15220.
Classification of tumor | Text that describes the kind of disease present in the tumor specimen as related to a specific point in time. See CDE (Common Data Element) Public ID: 3288124.
Days to birth | The time interval from a person's date of birth to the date of initial pathologic diagnosis, represented as a calculated negative number of days. See CDE (Common Data Element) Public ID: 3008233.
Days to death | The time interval from a person's date of death to the date of initial pathologic diagnosis, represented as a calculated number of days. See CDE (Common Data Element) Public ID: 3165475.
Days to last follow up | The time interval from the date of the last follow up to the date of the initial pathologic diagnosis, represented as a calculated number of days. See CDE (Common Data Element) Public ID: 3008273.
Days to last known disease status | The time interval from the date of the last follow up to the date of the initial pathologic diagnosis, represented as a calculated number of days. See CDE (Common Data Element) Public ID: 3008273.
Days to recurrence | The time interval from the date of new tumor event, including progression, recurrence and new primary malignancies, to the date of the initial pathologic diagnosis, represented as a calculated number of days. See CDE (Common Data Element) Public ID: 3392464.
Last known disease status | The state or condition of an individual's neoplasm at a particular point in time. See CDE (Common Data Element) Public ID: 3392464.
Morphology | The morphology code which describes the characteristics of the tumor itself, including its cell type and biologic activity, according to the third edition of the International Classification of Diseases for Oncology (ICD-O). See CDE (Common Data Element) Public ID: 3226275.
Primary diagnosis | Text term for the structural pattern of cancer cells used to define a microscopic diagnosis. See CDE (Common Data Element) Public ID: 3081934.
Prior malignancy | Text term to describe the patient's history of prior cancer diagnosis and the spatial location of any previous cancer occurrence. See CDE (Common Data Element) Public ID: 3081934.
Progression or recurrence | Yes/No/Unknown indicator to identify whether a patient has had a new tumor event after initial treatment. See CDE (Common Data Element) Public ID: 3121376.
New tumor event after initial treatment | A Boolean value denoting whether a neoplasm developed after the initial treatment was finished.
Site of resection or biopsy | The topography code which describes the anatomical site of origin of the neoplasm according to the third edition of the International Classification of Diseases for Oncology (ICD-O). See NCI Thesaurus Code: C37978. See CDE (Common Data Element) Public ID: 3226281.
Tissue or organ of origin | The text term that describes the anatomic site of the tumor or disease. See CDE (Common Data Element) Public ID: 3226281.
Tumor grade | The numeric value to express the degree of abnormality of cancer cells, a measure of differentiation and aggressiveness. See CDE (Common Data Element) Public ID: 2785839.
Tumor stage | The extent of a cancer in the body. Staging is usually based on the size of the tumor, whether lymph nodes contain cancer, and whether the cancer has spread from the original site to other parts of the body. See NCI Thesaurus Code: C16899; also see NCI Thesaurus Code: C28257 for Pathological stage.
Vital status | The state of being living or deceased for Cases that are part of the investigation. See NCI Thesaurus Code: C25717.
Histological diagnosis | The diagnosis of a disease based on the type of tissue, as determined by microscopic examination of the tissue. See NCI Thesaurus Code: C61478.
Histological diagnosis other | Additional options for histological diagnosis (see Histological diagnosis) that are not among the pre-determined values listed for histological diagnosis.
Year of diagnosis | The numeric value to represent the year of an individual's initial pathologic diagnosis of cancer. See CDE (Common Data Element) Public ID: 2896960.
Clinical T (TNM) | The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The T category describes the original (primary) tumor. See NCI Thesaurus Code: C48881 and C253840.
Clinical M (TNM) | The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The M category tells whether there are distant metastases (spread of cancer to other parts of the body). See NCI Thesaurus Code: C48881 and C25385.
Clinical N (TNM) | The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The N category describes whether or not the cancer has reached nearby lymph nodes. See NCI Thesaurus Code: C48881 and C25384.
Clinical stage | The extent of a cancer in the body. Staging is usually based on the size of the tumor, whether lymph nodes contain cancer, and whether the cancer has spread from the original site to other parts of the body. See CDE (Common Data Element) Public ID: 5243162.
Pathologic T (TNM) | The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The T category describes the original (primary) tumor. See NCI Thesaurus Code: C48881 and C48739.
Pathologic N (TNM) | The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The N category describes whether or not the cancer has reached nearby lymph nodes. See NCI Thesaurus Code: C48881 and C48740.
Pathologic M (TNM) | The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The M category tells whether there are distant metastases (spread of cancer to other parts of the body). See NCI Thesaurus Code: C48881 and C48741.
Performance status scale: Timing | A time reference for the Karnofsky score and/or the ECOG score using the defined categories.
Performance status scale: Karnofsky score | An index designed for classifying patients 16 years of age or older by their functional impairment. A standard way of measuring the ability of cancer patients to perform ordinary tasks. See NCI Thesaurus Code: C28013.
Performance status scale: ECOG | A performance status scale designed to assess disease progression and its effect on the daily living abilities of the patient. See NCI Thesaurus Code: C105721.
Tumor status | The condition or state of the tumor at a particular time. See NCI Thesaurus Code: C96643.
Primary therapy outcome success | A value denoting the result of therapy for a given disease or condition in a patient or group of patients. See NCI Thesaurus Code: C18919.
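
Several of the diagnosis properties above are intervals in days measured relative to the date of initial pathologic diagnosis. As a rough illustration, the sketch below converts a "Days to birth" value, which is a negative number of days, into an approximate age in years. The function name and the 365.25-day average year are assumptions made for this example. They are not part of the TCGA metadata, and the result may differ slightly from the recorded "Age at diagnosis".

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length, leap years included


def approx_age_at_diagnosis(days_to_birth: int) -> float:
    """Convert a 'Days to birth' value into an approximate age in years.

    'Days to birth' is a negative number of days, so the sign is
    flipped before dividing by the assumed year length.
    """
    if days_to_birth > 0:
        raise ValueError("'Days to birth' is expected to be zero or negative")
    return -days_to_birth / DAYS_PER_YEAR


# Made-up value for illustration: 21,915 days before diagnosis.
age = approx_age_at_diagnosis(-21915)
print(age)  # 60.0
```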

Treatment

The treatment entity represents records of the administration of therapeutic agents to a patient to alter the course of a pathologic process. Members of the treatment entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the treatment entity below.

Property | Description
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Days to treatment | The number of days from the date of the initial pathologic diagnosis that treatment began.
Therapeutic agents | The text identification of the individual agent(s) used as part of a prior treatment regimen. See CDE (Common Data Element) Public ID: 2975232.
Treatment intent type | The text term to identify the reason for the administration of a treatment regimen. [Manually-curated]. See CDE (Common Data Element) Public ID: 2793511.
Treatment or therapy | A yes/no/unknown/not applicable indicator related to the administration of therapeutic agents received before the body specimen was collected. See CDE (Common Data Element) Public ID: 4231463.

Exposure

The exposure entity represents clinically relevant patient information which does not immediately result from genetic predispositions. Members of the exposure entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the exposure entity below.

Property | Description
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Alcohol history | A response to the question that asks whether the participant has consumed at least 12 drinks of any kind of alcoholic beverage in their lifetime. See CDE (Common Data Element) Public ID: 2201918. Also: A description of an individual's current and past experience with alcoholic beverage consumption. See NCI Thesaurus Code: C81229.
Alcohol intensity | A category to describe the patient's current level of alcohol use as self-reported by the patient. See CDE (Common Data Element) Public ID: 3457767.
BMI | The body mass divided by the square of the body height, expressed in units of kg/m². See CDE (Common Data Element) Public ID: 4973892.
Cigarettes per day | The average number of cigarettes smoked per day. See CDE (Common Data Element) Public ID: 2001716.
Height | The height of the patient in centimeters. See CDE (Common Data Element) Public ID: 649.
Weight | The weight of the patient measured in kilograms. See CDE (Common Data Element) Public ID: 651.
Years smoked | The numeric value (or unknown) to represent the number of years a person has been smoking. See CDE (Common Data Element) Public ID: 3137957.
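
The BMI definition above can be tied to the Height and Weight properties with a short calculation. This is a minimal sketch: the function name is hypothetical, and it only assumes the units stated in the table (height in centimeters, weight in kilograms). It is not how the BMI values in the dataset were necessarily produced.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass divided by the square of body height, in kg/m^2."""
    if height_cm <= 0:
        raise ValueError("height must be positive")
    height_m = height_cm / 100.0  # the Height property is in centimeters
    return weight_kg / height_m ** 2


# Made-up values for illustration: 70 kg and 175 cm.
print(round(bmi(70.0, 175.0), 1))  # 22.9
```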

Drug therapy

The drug therapy entity represents the use of pharmaceutical products that contain one or more active and/or inactive ingredients to treat, prevent or alleviate the symptoms of disease. A Case can have more than one drug treatment. Members of the drug therapy entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the drug therapy entity below.

Property | Description
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Drug name | The most recognizable term associated with a pharmaceutical product used to prevent, diagnose, treat or relieve symptoms of a disease or abnormal condition. See NCI Thesaurus Code: C97104.
Pharmaceutical therapy type | The type of treatment of the disease through the use of drugs. See NCI Thesaurus Code: C15986.

Radiation therapy

The radiation therapy entity represents the treatment of a disease with radiation therapy, in which the whole or a portion of the patient's body is exposed to radiation. A Case can have more than one radiation treatment. Members of the radiation therapy entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the radiation therapy entity below.

Property | Description
Submitter ID | This refers to the radiation therapy ID, a human-readable identifier, such as a number or a string that may contain metadata information, for the radiation treatment of a disease by means of exposure of the target or the whole body to radiation. See NCI Thesaurus Code: C15986.
Radiation type | The value denotes the type of high-energy radiation used to kill cancer cells and shrink tumors. See NCI Thesaurus Code: C15986.
Radiation therapy site | The location to which radiation therapy was administered.

Follow up

The follow up entity refers to follow ups which monitor a person's health over time after treatment. A Case can have multiple follow ups generated at different times. Members of the follow up entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the follow up entity below.

Property | Description
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Days to death | The time interval from a person's date of death to the date of initial pathologic diagnosis, represented as a calculated number of days. See CDE (Common Data Element) Public ID: 3165475.
Days to last follow up | The time interval from the date of the last follow up to the date of the initial pathologic diagnosis, represented as a calculated number of days. See CDE (Common Data Element) Public ID: 3008273.
New tumor event after initial treatment | A Boolean value which denotes whether a neoplasm developed after the initial treatment has finished.
Tumor status | The condition or state of the tumor at a particular time. See NCI Thesaurus Code: C96643.
Vital status | The state of being living or deceased for Cases that are part of the investigation. See NCI Thesaurus Code: C25717.
Performance status scale: ECOG | A performance status scale designed to assess disease progression and its effect on the daily living abilities of the patient. See NCI Thesaurus Code: C105721.
Performance status scale: Karnofsky score | An index designed for classifying patients 16 years of age or older by their functional impairment. A standard way of measuring the ability of cancer patients to perform ordinary tasks. See NCI Thesaurus Code: C28013.
Performance status scale: Timing | A time reference for the Karnofsky score and/or the ECOG score using the defined categories.
Other new tumor anatomic site | Alternative anatomic site of a newly developed neoplasm which has not been listed under 'New tumor anatomic site'.
Primary therapy outcome success | A value denoting the result of therapy for a given disease or condition in a patient or group of patients. See NCI Thesaurus Code: C18919.
New tumor anatomic site | Anatomic site of newly developed neoplasm.
New tumor event type | Type of newly developed neoplasm after initial treatment has finished.

New tumor event

The new tumor event entity represents a newly developed neoplasm after initial treatment has finished. Members of the new tumor event entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the new tumor event entity below.

Property | Description
New tumor anatomic site | Anatomic site of newly developed neoplasm.
Other new tumor anatomic site | Alternative anatomic site of a newly developed neoplasm which has not been listed under 'New tumor anatomic site'.
New tumor event type | Type of newly developed neoplasm after initial treatment has finished.

Sample

The sample entity represents samples or specimen material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. For instance, samples include tissues, body fluids, cells, organs, embryos, and body excretory products. Members of the sample entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the sample entity below.

Property | Description
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Sample type | The type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, body excretory products, etc. See NCI Thesaurus Code: C70713.
Sample type ID | A code that identifies the type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, body excretory products, etc. See NCI Thesaurus Code: C70713.
Composition | The cellular composition of the sample.
Current weight | Current sample/specimen weight (in grams).
Days to collection | The time interval from the date of biospecimen collection to the date of initial pathologic diagnosis, represented as a calculated number of days. Samples can be collected prospectively or retrospectively. This can be a negative value for samples taken retrospectively. See CDE (Common Data Element) Public ID: 3008340.
Country of sample procurement | Country where the specimen/sample has been procured.
Days to sample procurement | The time interval from the date of sample collection to the date of sample procurement, expressed in days.
Freezing method | Method used to freeze the sample/specimen.
Initial weight | Initial sample/specimen weight (in grams).
Intermediate dimension | The intermediate dimension of the sample/specimen, in millimeters.
Is FFPE | A Boolean value that denotes whether tissue samples used in the analysis were formalin-fixed paraffin-embedded (FFPE).
Longest dimension | The longest dimension of the sample/specimen, in millimeters.
OCT embedded | A Boolean value indicating whether the Optimal Cutting Temperature compound (OCT) is used to embed tissue samples prior to frozen sectioning on a microtome-cryostat.
Pathology report UUID | UUID of the related pathology report.
Preservation method | The primary preservation method used to store the sample.
Shortest dimension | The shortest dimension of the sample/specimen, in millimeters.
Time between clamping and freezing | The time elapsed (in minutes) between clamping (supplying vessel) and freezing a sample.
Time between excision and freezing | Warm ischemia time, elapsed between clamping and freezing a sample, as denoted in minutes.
Tissue type | A description of the tissue type with respect to its tumor/normal source.
Tumor code | The diagnostic tumor code of the tissue sample source.
Tumor code ID | A BCR-defined ID code for the tumor sample.
Tumor descriptor | A description of the tumor from which the sample was derived.

Portion

The portion entity represents the sequential 100-120 mg sections derived from samples. Members of the portion entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the portion entity below.

Property | Description
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Is FFPE | A Boolean value that denotes whether tissue samples used in the analysis were formalin-fixed paraffin-embedded (FFPE).
Portion weight | Weight of a portion prepared for the analysis (in mg).
Portion number | The numerical value that represents the order of a portion in the series.
Center ID | A professional organization or group which has or is able to submit data. It can be identified by a UUID.
Center type | The type classification of the center (e.g. CGCC).
Center code | The code that determines the center that has submitted data.
Center name | The name of the center (e.g. Broad Institute of MIT and Harvard).
Center namespace | The domain name of the center (e.g. broad.mit.edu).
Center short name | A shortened name of the center (e.g. BI).

Slide

The slide entity represents slides, thin slices of a snap-frozen OCT embedded block of tissue sent for imaging. This same tissue also provides DNA and RNA for further analyses after they are reviewed by histopathologists. Members of the slide entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the slide entity below.

Property | Description
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Section location | The section of a tissue that has been imaged. The value denotes top, middle, or bottom.
Number proliferating cells | The number of proliferating cells identified in the slide sample.
Percent eosinophil infiltration | The fraction of eosinophil cells to the gross granulocyte component of inflammatory cells seen on a slide.
Percent granulocyte infiltration | The fraction of the granulocyte component to the gross inflammatory cells seen on a slide.
Percent inflam infiltration | The ratio of inflammatory cells to the gross cell population seen on a slide.
Percent lymphocyte infiltration | The fraction of lymphocyte cells to the gross inflammatory cells seen on a slide.
Percent monocyte infiltration | The fraction of monocyte cells to the gross inflammatory cells seen on a slide.
Percent necrosis | The percent of identified tumor cell necrosis based on the tissue image.
Percent neutrophil infiltration | The fraction of neutrophil cells to the gross granulocyte component of inflammatory cells seen on a slide.
Percent normal cells | The percent of identified normal cells based on the tissue image.
Percent stromal cells | The ratio of identified stromal cells present on the tissue slide.
Percent tumor cells | The percent of identified tumor cells based on the tissue image.
Percent tumor nuclei | The percent of identified tumor nuclei based on the tissue image.

Analyte

The analyte entity represents the analytes or molecules, such as DNA or RNA, used for analyses. An analyte is a molecular specimen extracted for analysis from a portion using a specific extraction protocol. Members of the analyte entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the analyte entity below.

Property | Description
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Amount | The amount of a product (in g or volume in mL) prepared for an analysis.
A260_A280 ratio | A purity measurement that weighs the absorbance at 260 nm (DNA concentration) against the absorbance at 280 nm (protein concentration/contamination).
Analyte type | The type of an analyte on a molecular basis.
Analyte type ID | An ID that identifies the type of an analyte on a molecular basis. A single-letter BCR code for the analyte type.
Concentration | The concentration of a product (in molarity) prepared for an analysis.
Spectrophotometer method | A method of quantifying the content of nucleic acids in any sample, used to measure sample purity (e.g. UV spec.).
Well number | The number of the well on the plate in which an analyte has been stored for shipment and for the analysis.
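
The A260_A280 ratio described above is the absorbance at 260 nm divided by the absorbance at 280 nm. The sketch below shows that arithmetic only. The function name and the input readings are hypothetical and are not taken from TCGA data or from any spectrophotometer software.

```python
def a260_a280_ratio(a260: float, a280: float) -> float:
    """Ratio of absorbance at 260 nm to absorbance at 280 nm."""
    if a280 <= 0:
        raise ValueError("absorbance at 280 nm must be positive")
    return a260 / a280


# Made-up absorbance readings for illustration.
ratio = a260_a280_ratio(0.9, 0.5)
print(round(ratio, 2))  # 1.8
```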

Aliquot

The aliquot entity represents aliquots, which are products or units extracted from a portion of a sample or specimen and prepared for analysis. Members of the aliquot entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the aliquot entity below.

Property | Description
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Amount | The amount of a product (in g or volume in mL) prepared for an analysis.
Concentration | The concentration of a product (in molarity) prepared for an analysis.
Source center | The name of the center that provided the item.
Center ID | A professional organization or group which has or is able to submit data. It can be identified by a UUID.
Center type | The type classification of the center (e.g. CGCC).
Center code | The code that determines the center that has submitted data.
Center name | The name of the center (e.g. Broad Institute of MIT and Harvard).
Center namespace | The domain name of the center (e.g. broad.mit.edu).
Center short name | A shortened name of the center (e.g. BI).

Read group

The read group entity refers to the sequencing reads from one lane of an NGS experiment. Members of the read group entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the read group entity below.

Property | Description
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Read group name | The name of the read group.
Adapter name | The name of the sequencing adapter.
Adapter sequence | The base sequence of the sequencing adapter.
Base caller name | The name of the base caller.
Base caller version | The version of the base caller.
Experiment name | A submitter-defined name for the experiment.
Flow cell barcode | The barcode assigned to the flow cell.
Includes spike ins | A Boolean value which denotes whether a spike-in is included or not.
Instrument model | The specific model of the sequencing instrument used.
Is paired end | A Boolean value which denotes whether sequence reads are paired end or not.
Library name | The name of the sequencing library preparation.
Library preparation kit catalog number | The catalog number of the sequencing library preparation kit.
Library preparation kit name | The name of the sequencing library preparation kit.
Library preparation kit vendor | The vendor of the sequencing library preparation kit.
Library preparation kit version | The version of the sequencing library preparation kit.
Library selection | The method used to select and/or enrich the material being sequenced.
Library strand | This determines whether the 'first strand' or 'second strand' of cDNA was used to prepare the library.
Library strategy | The sequencing technique intended for the library.
Platform | The version (for instance, manufacturer or model) of the technology that was used for sequencing or assaying. See NCI Thesaurus Code: C45378.
Read length | The length of the reads.
RIN | The RNA integrity number.
Sequencing center | The name of the center that provided the sequence files.
Sequencing date | The date of sequencing.
Size selection range | The range of size selection.
Spike ins concentration | The concentration of a spike-in.
Spike ins fasta | The name of the FASTA file that contains the spike-in sequences.
Target capture kit catalog number | The catalog number of the target capture kit.
Target capture kit name | The name of the target capture kit.
Target capture kit target region | The target region for the target capture kit.
Target capture kit vendor | The vendor of the target capture kit.
Target capture kit version | The version of the target capture kit.
To trim adapter sequence | A Boolean value for adapter trimming.

Read group QC

The read group QC entity represents read group quality control. Members of the read group QC entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the read group QC entity below.

PropertyDescripton
Submitter IDUsually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Adapter contentAn analysis module for quality control checks. Refer to the documentation for FastQC, a quality control tool for high throughput sequence data, at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/.
Basic statisticsAn analysis module for quality control checks. Refer to the documentation for FastQC, a quality control tool for high throughput sequence data, at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/.
EncodingThe version of ASCII encoding of quality values found in the file.
FASTQ nameThe names of the FASTQ files.
Kmer contentThe number of times the k-mer occurs in the sequence. An analysis module for quality control checks. Refer to the documentation for FastQC, a quality control tool for high throughput sequence data, at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/.
Overrepresented sequencesAn analysis module for quality control checks. Refer to the documentation for FastQC, a quality control tool for high throughput sequence data, at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/.
Per base N contentAn analysis module for quality control checks. Refer to the documentation for FastQC, a quality control tool for high throughput sequence data, at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/.
Per base sequence contentAn analysis module for quality control checks. Refer to the documentation for FastQC, a quality control tool for high throughput sequence data, at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/.
Per base sequence qualityAn analysis module for quality control checks. Refer to the documentation for FastQC, a quality control tool for high throughput sequence data, at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/.
Per sequence GC contentAn analysis module for quality control checks. Refer to the documentation for FastQC, a quality control tool for high throughput sequence data, at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/.
Per sequence quality scoreAn analysis module for quality control checks. Refer to the documentation for FastQC, a quality control tool for high throughput sequence data, at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/.
Per tile sequence qualityAn analysis module for quality control checks. Refer to the documentation for FastQC, a quality control tool for high throughput sequence data, at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/.
Percent GC contentThe overall %GC of all bases in all sequences.
Sequence duplication levelsAn analysis module for quality control checks. Refer to the documentation for FastQC, a quality control tool for high throughput sequence data, at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/.
Sequence length distributionAn analysis module for quality control checks. Refer to the documentation for FastQC, a quality control tool for high throughput sequence data, at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/.
Total sequencesAn analysis module for quality control checks. Refer to the documentation for FastQC, a quality control tool for high throughput sequence data, at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/.
Workflow start datetimeThe start of the analysis workflow in datetime format.
Workflow end datetimeThe end of the analysis workflow in datetime format.
Workflow linkThe link to the GitHub hash for the CWL workflow used (GDC related).
Workflow typeA generic name for the workflow used to analyze data.
Workflow versionThe version of the workflow used to analyze data.
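
As an illustration of how the read group QC properties might be used once retrieved, the following is a minimal Python sketch that tallies module results for a single record. It is not the Platform's API: the record layout, the snake_case property keys, the example values, and the helper names `summarize_modules` and `flagged_modules` are all hypothetical, and the sketch assumes that each analysis module property carries a PASS, WARN, or FAIL status as FastQC reports them.

```python
from collections import Counter

# Hypothetical read group QC record. Keys are snake_case versions of the
# property names in the table above; all values are made up for illustration.
read_group_qc = {
    "adapter_content": "PASS",
    "basic_statistics": "PASS",
    "kmer_content": "WARN",
    "overrepresented_sequences": "FAIL",
    "per_base_sequence_quality": "PASS",
    "percent_gc_content": 48,
    "encoding": "Sanger / Illumina 1.9",
}

# Assumption: module properties hold one of these three status strings.
MODULE_STATUSES = {"PASS", "WARN", "FAIL"}


def summarize_modules(record):
    """Count how many module properties have each status.

    Properties whose value is not a status string (for example the
    percent GC content or the encoding) are ignored.
    """
    return Counter(
        value
        for value in record.values()
        if isinstance(value, str) and value in MODULE_STATUSES
    )


def flagged_modules(record):
    """Return the sorted names of modules that reported WARN or FAIL."""
    return sorted(
        name
        for name, value in record.items()
        if isinstance(value, str) and value in {"WARN", "FAIL"}
    )


summary = summarize_modules(read_group_qc)
flagged = flagged_modules(read_group_qc)
print(dict(summary))
print(flagged)
```

A summary like this can help decide which read groups deserve a closer look before downstream analysis, but the actual values stored for each property should be checked against the data returned by the Platform.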

File

The file entity refers to the files in TCGA produced by aliquot analyses. Members of the file entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the file entity below.

PropertyDescription
Submitter IDUsually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Data categoryThe classification of data used in (or produced by) the analysis, based on its form and content. See NCI Thesaurus Code: C42645.
Data typeA further, more specific classification of the data category, based on the information that it contains.
Data formatThe type of format that determines data content.
Experimental strategyThe method or protocol used to perform the laboratory analysis. See NCI Thesaurus Code: C43622.
File sizeThe size of a file measured in bytes (B), kilobytes (KB), megabytes (MB), gigabytes (GB), terabytes (TB), and larger values.
Access levelA Boolean value indicating whether the data is Controlled Data or Open Data. Controlled Data is data from public datasets that has limitations on its use and requires approval by dbGaP. Open Data is data from public datasets that doesn't have limitations on its use.
PlatformThe version (for instance, manufacturer or model) of the technology that was used for sequencing or assaying. See NCI Thesaurus Code: C45378.
Genome buildThe reference genome or assembly (such as HG19/GRCh37 or GRCh38) to which the nucleotide sequence of a case/subject/sample can be aligned.
Genome nameThe reference genome or assembly that also contains decoy viral sequence to which the nucleotide sequence of a case/subject/sample can be aligned.
GDC file UUIDThe unique identifier for a file, such as a UUID.
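
The file properties above are the kind of values one filters on when selecting data. The following Python sketch shows that idea on a small in-memory list of records. It makes no claim about how the Data Browser or the Datasets API represents files: the record layout, the snake_case keys, the example values, and the helper names `filter_files` and `human_readable_size` are hypothetical, and the size conversion assumes 1 KB = 1024 bytes.

```python
# Hypothetical file records. Keys are snake_case versions of the property
# names in the table above; all values are made up for illustration.
files = [
    {"submitter_id": "file-a", "data_format": "BAM",
     "experimental_strategy": "WXS", "file_size": 5 * 1024**3},
    {"submitter_id": "file-b", "data_format": "BAM",
     "experimental_strategy": "RNA-Seq", "file_size": 1536},
    {"submitter_id": "file-c", "data_format": "VCF",
     "experimental_strategy": "WXS", "file_size": 0},
]


def filter_files(records, **criteria):
    """Return the records whose properties equal every given criterion."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


def human_readable_size(num_bytes):
    """Format a byte count using B, KB, MB, GB, or TB (1 KB = 1024 B)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit available.
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


bam_files = filter_files(files, data_format="BAM")
wxs_bam_files = filter_files(files, data_format="BAM", experimental_strategy="WXS")

for record in bam_files:
    print(record["submitter_id"], human_readable_size(record["file_size"]))
```

Passing several keyword arguments to `filter_files` combines them with a logical AND, which mirrors narrowing a selection by more than one property at a time.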

Analysis

The analysis entity represents analysis workflows used for processing data. Members of the analysis entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the analysis entity below.

PropertyDescription
Submitter IDUsually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.
Workflow start datetimeThe start of the analysis workflow in date/time format.
Workflow end datetimeThe end of the analysis workflow in date/time format.
Workflow linkA link to the GitHub hash for the CWL workflow used (GDC related).
Workflow typeThe generic name for the workflow used to analyze data.
Workflow versionThe version of the workflow used to analyze data.
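
The workflow start and end datetimes can be combined to estimate how long an analysis ran. The sketch below is one way to do that in Python under a stated assumption: that both values are ISO 8601 strings without a timezone suffix (for example `2016-05-01T10:00:00`). The table above does not specify the exact date/time format, so the parsing step may need adjusting. The function name `workflow_duration_seconds` and the example values are hypothetical.

```python
from datetime import datetime


def workflow_duration_seconds(start, end):
    """Return the elapsed time between two datetime strings, in seconds.

    Assumes both arguments are ISO 8601 strings such as
    "2016-05-01T10:00:00". Raises ValueError if the end precedes the start
    or if either string cannot be parsed.
    """
    started = datetime.fromisoformat(start)
    finished = datetime.fromisoformat(end)
    if finished < started:
        raise ValueError("workflow end datetime precedes start datetime")
    return (finished - started).total_seconds()


# Made-up values for illustration only.
duration = workflow_duration_seconds("2016-05-01T10:00:00", "2016-05-01T12:30:00")
print(duration / 3600, "hours")
```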