TARGET GRCh38 metadata
Overview
Metadata is data that describes other data. On this page, we've detailed TARGET GRCh38 metadata that are available for viewing and filtering TARGET GRCh38 data in the Data Browser. TARGET GRCh38 metadata on the Platform consists of properties which describe the entities of the TARGET GRCh38 dataset.
Entities are particular resources with UUIDs, such as files, cases, samples, and cell lines.
Properties can either describe an entity or relate that entity to another entity. For instance, properties include an entity's vital status, gender, data format, or experimental strategy.
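As a rough illustration of how entities and properties fit together, the sketch below models an entity as a record with a UUID and a dictionary of properties, then filters by property value. The `Entity` class, the `filter_entities` helper, the snake_case property keys, and the sample values are all hypothetical and are not part of the Platform or its API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical in-memory stand-in for a metadata entity."""
    entity_type: str  # for example "case", "demographic", or "file"
    properties: dict = field(default_factory=dict)
    # Every entity is identified by a UUID; one is generated here for illustration.
    entity_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def filter_entities(entities, entity_type, **wanted):
    """Return entities of one type whose properties match every given value."""
    return [
        e for e in entities
        if e.entity_type == entity_type
        and all(e.properties.get(key) == value for key, value in wanted.items())
    ]


# Illustrative values only; real values come from the dataset.
entities = [
    Entity("demographic", {"gender": "female"}),
    Entity("demographic", {"gender": "male"}),
    Entity("file", {"data_format": "BAM", "experimental_strategy": "WXS"}),
]

matches = filter_entities(entities, "file", data_format="BAM")
print(len(matches))
```

Filtering in the Data Browser follows the same idea: pick an entity, then narrow the results by the values of its properties.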
Entities for TARGET GRCh38
The following are entities for TARGET GRCh38.
- investigation
- case
- demographic
- diagnosis
- sample
- aliquot
- read_group
- read_group_qc
- file
- analysis
Below, each of these entities is followed by a table of its related properties.
Investigation
The investigation entity represents the project or study that generated the data. Members of the investigation entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the investigation entity below.
Property | Description |
---|---|
dbGaP accession number | The dbGaP accession number provided for each study. See NCI Thesaurus Code: C25402. |
Investigation name | The full name of the project or study that generated the data. See NCI Thesaurus Code: C41198. |
Case
The case entity represents TARGET cases. Members of the case entity are subjects who have taken part in an investigation or program and can be identified by a Universally Unique Identifier (UUID). See the table below for the clinical properties and descriptions of the case entity.
Property | Description |
---|---|
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID. |
Disease type | The type of the disease or condition studied. See NCI Thesaurus Code: C2991. |
Primary site | The anatomical site where the primary tumor is located in the organism. See NCI Thesaurus Code: C43761. |
Demographic
The demographic entity represents the statistical characterization of human populations or segments of human populations (e.g., characterization by age, sex, race, or income) and can be identified by a Universally Unique Identifier (UUID). Find the properties of the demographic entity below.
Property | Description |
---|---|
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID. |
Ethnicity | A socially-defined category of people based on common ancestral, cultural, biological, and social factors. See NCI Thesaurus Code: C29933. |
Race | A classification of humans characterized by certain heritable traits, common history, nationality, or geographic distribution. See NCI Thesaurus Code: C17049. |
Gender | The collection of behaviors and attitudes that distinguish people on the basis of the societal roles expected for the two sexes. See NCI Thesaurus Code: C17357. |
Diagnosis
The diagnosis entity represents the investigation, analysis, or recognition of the presence and nature of a disease, condition, or injury from expressed signs and symptoms. A diagnosis can be identified by a Universally Unique Identifier (UUID). Find the properties of the diagnosis entity below.
Property | Description |
---|---|
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID. |
Age at diagnosis | The age in years of the Case at the initial pathological diagnosis of the disease or cancer. See NCI Thesaurus Code: C15220. |
Days to death | The time interval from the date of initial pathologic diagnosis to a person's date of death, represented as a calculated number of days. See CDE (Common Data Element) Public ID: 3165475. |
Days to last follow up | The time interval from the date of the last follow up to the current date, represented as a calculated number of days. See CDE (Common Data Element) Public ID: 3008273. |
Morphology | The morphology code which describes the characteristics of the tumor itself, including its cell type and biologic activity, according to the third edition of the International Classification of Diseases for Oncology (ICD-O). See CDE (Common Data Element) Public ID: 3226275. |
Primary diagnosis | Text term for the structural pattern of cancer cells used to define a microscopic diagnosis. See CDE (Common Data Element) Public ID: 3081934. |
Site of resection or biopsy | The topography code which describes the anatomical site of origin of the neoplasm according to the third edition of the International Classification of Diseases for Oncology (ICD-O). See NCI Thesaurus Code: C37978. See CDE (Common Data Element) Public ID: 3226281. |
Tumor stage | The extent of a cancer in the body. Staging is usually based on the size of the tumor, whether lymph nodes contain cancer, and whether the cancer has spread from the original site to other parts of the body. See NCI Thesaurus Code: C16899; also see NCI Thesaurus Code: C28257 for Pathological stage. |
Vital status | The state of being living or deceased for Cases that are part of the investigation. See NCI Thesaurus Code: C25717. |
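Interval properties such as Days to death are described as a calculated number of days between two dates. The values in the dataset are already calculated; the sketch below only illustrates the arithmetic behind that definition. The `days_between` helper and the dates are made up for the example.

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Number of whole days from start to end (negative if end is earlier)."""
    return (end - start).days


# Hypothetical dates, not taken from any real case.
diagnosis_date = date(2010, 1, 15)
death_date = date(2011, 1, 15)

days_to_death = days_between(diagnosis_date, death_date)
print(days_to_death)
```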
Sample
The sample entity represents samples or specimen material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. For instance, samples include tissues, body fluids, cells, organs, embryos, and body excretory products. Members of the sample entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the sample entity below.
Property | Description |
---|---|
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID. |
Sample type | The type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, body excretory products, etc. See NCI Thesaurus Code: C70713. |
Sample type ID | A code that determines type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, body excretory products, etc. See NCI Thesaurus Code: C70713. |
Tumor code | The diagnostic tumor code of the tissue sample source. |
Tumor code ID | A BCR-defined ID code for the tumor sample. |
Aliquot
The aliquot entity represents aliquots, which are products or units extracted from a portion of a sample or specimen and prepared for analysis. Members of the aliquot entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the aliquot entity below.
Property | Description |
---|---|
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID. |
Read group
The read group entity refers to the sequencing reads from one lane of an NGS experiment. Members of the read group entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the read group entity below.
Property | Description |
---|---|
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID. |
Read group name | The name of the read group. |
Experiment name | A submitter-defined name for the experiment. |
Instrument model | A specific model of sequencing instrument used. |
Is paired end | A Boolean value which denotes whether sequence reads are paired end or not. |
Library name | The name of the sequencing library preparation. |
Library strategy | The sequencing technique intended for the library. |
Platform | The version (for instance, manufacturer or model) of the technology that was used for sequencing or assaying. See NCI Thesaurus Code: C45378. |
Read length | The length of the reads. |
Sequencing center | The name of the center that provided the sequence files. |
Sequencing date | The date of sequencing. |
Target capture kit catalog number | The catalog number of the target capture kit. |
Target capture kit name | The name of the target capture kit. |
Target capture kit target region | The target region of the target capture kit. |
Target capture kit vendor | The vendor of the target capture kit. |
Target capture kit version | The version of the target capture kit. |
Read group QC
The read group QC entity represents read group quality control. Members of the read group QC entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the read group QC entity below.
Property | Description |
---|---|
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID. |
Adapter content | An analysis module for quality control checks. See the FastQC analysis modules documentation at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/. |
Basic statistics | An analysis module for quality control checks. See the FastQC analysis modules documentation at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/. |
Encoding | The version of ASCII encoding of quality values found in the file. |
FASTQ name | The names of FASTQs. |
Kmer content | The number of times the kmer occurs in the sequence. An analysis module for quality control checks. See the FastQC analysis modules documentation at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/. |
Overrepresented sequences | An analysis module for quality control checks. See the FastQC analysis modules documentation at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/. |
Per base N content | An analysis module for quality control checks. See the FastQC analysis modules documentation at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/. |
Per base sequence content | An analysis module for quality control checks. See the FastQC analysis modules documentation at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/. |
Per base sequence quality | An analysis module for quality control checks. See the FastQC analysis modules documentation at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/. |
Per sequence GC content | An analysis module for quality control checks. See the FastQC analysis modules documentation at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/. |
Per sequence quality score | An analysis module for quality control checks. See the FastQC analysis modules documentation at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/. |
Per tile sequence quality | An analysis module for quality control checks. See the FastQC analysis modules documentation at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/. |
Percent GC content | The overall %GC of all bases in all sequences. |
Sequence duplication levels | An analysis module for quality control checks. See the FastQC analysis modules documentation at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/. |
Sequence length distribution | An analysis module for quality control checks. See the FastQC analysis modules documentation at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/. |
Total sequences | An analysis module for quality control checks. See the FastQC analysis modules documentation at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/. |
Workflow start datetime | The start of the analysis workflow in datetime format. |
Workflow end datetime | The end of the analysis workflow in datetime format. |
Workflow link | The link to the GitHub hash for the CWL workflow used (GDC related). |
Workflow type | A generic name for the workflow used to analyze data. |
Workflow version | The version of the workflow used to analyze data. |
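To make the Percent GC content definition concrete, the following minimal sketch computes the overall %GC across all bases in a set of sequences. It is a simplified illustration: the `percent_gc` function and the example reads are hypothetical, and the quality control tool that produced the stored values may handle details such as ambiguous bases differently.

```python
def percent_gc(sequences):
    """Overall %GC across all bases in all sequences (0.0 for empty input)."""
    total_bases = sum(len(seq) for seq in sequences)
    if total_bases == 0:
        return 0.0
    gc_bases = sum(
        seq.upper().count("G") + seq.upper().count("C") for seq in sequences
    )
    return 100.0 * gc_bases / total_bases


# Made-up reads for illustration: 6 of the 15 bases are G or C.
reads = ["GATTACA", "GGCC", "ATAT"]
print(percent_gc(reads))
```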
File
The file entity refers to the files in TARGET GRCh38 produced by aliquot analyses. Members of the file entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the file entity below.
Property | Description |
---|---|
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID. |
Data category | The classification of data used in (or produced by) the analysis, based on its form and content. See NCI Thesaurus Code: C42645. |
Data type | A further, more specific classification of the data category, based on the information that it contains. |
Data format | The type of format that determines data content. |
Experimental strategy | The method or protocol used to perform the laboratory analysis. See NCI Thesaurus Code: C43622. |
File size | The size of a file measured in bytes (B), kilobytes (KB), megabytes (MB), gigabytes (GB), terabytes (TB), and larger values. |
Access level | A Boolean value indicating Controlled Data or Open Data. Controlled Data is data from public datasets that has limitations on use and requires approval by dbGaP. Open Data is data from public datasets that doesn't have limitations on its use. |
Platform | The version (for instance, manufacturer or model) of the technology that was used for sequencing or assaying. See NCI Thesaurus Code: C45378. |
Genome build | The reference genome or assembly (such as HG19/GRCh37 or GRCh38) to which the nucleotide sequence of a case/subject/sample can be aligned. |
Genome name | The reference genome or assembly that also contains decoy viral sequence to which the nucleotide sequence of a case/subject/sample can be aligned. |
GDC file UUID | The unique identifier for a file, such as a UUID. |
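Because File size can be expressed in bytes or in larger units, a small conversion helper can be handy when comparing values. The sketch below assumes 1024-byte steps between units, which the description above does not specify; the `human_readable_size` function is hypothetical and not part of the Platform.

```python
def human_readable_size(num_bytes: int) -> str:
    """Format a byte count as B, KB, MB, GB, or TB (assumes 1024-byte steps)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that fits, or at the largest unit available.
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


print(human_readable_size(1536))
print(human_readable_size(5 * 1024**3))
```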
Analysis
The analysis entity represents analysis workflows used for processing data. Members of the analysis entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the analysis entity below.
Property | Description |
---|---|
Submitter ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID. |
Workflow link | A link to the GitHub hash for the CWL workflow used (GDC related). |
Workflow type | The generic name for the workflow used to analyze data. |
Workflow version | The version of the workflow used to analyze data. |
Updated about 1 year ago