The Seven Bridges Knowledge Center


CPTAC-3 metadata

Overview

Metadata is data that describes other data. On this page, we've detailed CPTAC-3 metadata that is available for viewing and filtering CPTAC-3 data in the Data Browser. CPTAC-3 metadata on the Platform consists of properties which describe the entities of the CPTAC-3 dataset.

Entities are particular resources with UUIDs, such as files, cases, samples, and cell lines.

Properties can either describe an entity or relate that entity to another entity. For instance, properties include an entity's vital status, gender, data format, or experimental strategy.
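
As a rough illustration of this model, the sketch below represents an entity as a UUID plus two kinds of properties: values that describe the entity and links that relate it to another entity. The class name `Entity`, its field names, and the sample values are hypothetical; they are not part of the Platform or its API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for a CPTAC-3 entity (not a Platform type)."""
    entity_type: str  # e.g. "case", "sample", "file"
    entity_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    describes: dict = field(default_factory=dict)   # descriptive properties
    relates_to: dict = field(default_factory=dict)  # entity type -> UUID of a related entity


# Illustrative values only.
case = Entity("case", describes={"vital_status": "Alive"})
sample = Entity(
    "sample",
    describes={"sample_type": "Primary Tumor"},
    relates_to={"case": case.entity_id},
)

print(sample.relates_to["case"] == case.entity_id)
```

Here `describes` holds properties such as vital status, while `relates_to` holds the UUID of the entity that the sample is linked to.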

Entities for CPTAC-3

The following are entities for CPTAC-3:

  • investigation
  • case
  • demographic
  • diagnosis
  • sample
  • aliquot
  • read_group
  • file
  • analysis
  • analyte
  • exposure
  • portion

Below, each of these entities is followed by a table of its related properties.

Investigation

The investigation entity represents the project or study that generated the data. Members of the investigation entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the investigation entity below.

Property

Description

dbGaP accession number

The dbGaP accession number provided for each study. See NCI Thesaurus Code: C25402.

Investigation name

The full name of the project or study that generated the data. See NCI Thesaurus Code: C41198.

Submitter ID

A human-readable identifier, such as a number or a string that may contain metadata information for investigations.

Case

The case entity represents CPTAC-3 cases. Members of the case entity are subjects who have taken part in an investigation or program and can be identified by a Universally Unique Identifier (UUID). See the table below for the clinical properties and descriptions of the case entity.

Property

Description

Submitter ID

Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.

Disease type

The type of the disease or condition studied. See NCI Thesaurus Code: C2991.

Primary site

The anatomical site where the primary tumor is located in the organism. See NCI Thesaurus Code: C43761.

Tissue source site ID

The ID of a clinical site that collects and provides patient samples and clinical metadata for research use. The site is identified by a UUID. See NCI Thesaurus Code: C103264.

Tissue source site name

The full name of a clinical site that collects and provides patient samples and clinical metadata for research use. See NCI Thesaurus Code: C103264.

Tissue source site code

The alphanumeric code for a clinical site that collects and provides patient samples and clinical metadata for research use. See NCI Thesaurus Code: C103264.

Tissue source site BCR ID

The BCR (Biospecimen Core Resource) provided ID for a tissue source site. See NCI Thesaurus Code: C103264.

Demographic

The demographic entity represents the statistical characterization of human populations or segments of human populations (e.g., characterization by age, sex, race, or income) and can be identified by a Universally Unique Identifier (UUID). Find the properties of the demographic entity below.

Property

Description

Submitter ID

Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.

Ethnicity

A socially-defined category of people based on common ancestral, cultural, biological, and social factors. See NCI Thesaurus Code: C29933.

Race

A classification of humans characterized by certain heritable traits, common history, nationality, or geographic distribution. See NCI Thesaurus Code: C17049.

Gender

The collection of behaviors and attitudes that distinguish people on the basis of the societal roles expected for the two sexes. See NCI Thesaurus Code: C17357.

Year of birth

A numeric value to represent the calendar year in which an individual was born. See CDE (Common Data Element) Public ID: 2896954.

Year of death

A numeric value to represent the year of the death of an individual. See CDE (Common Data Element) Public ID: 2897030.

Diagnosis

The diagnosis entity represents the investigation, analysis, or recognition of the presence and nature of a disease, condition, or injury from expressed signs and symptoms. A diagnosis can be identified by a Universally Unique Identifier (UUID). Find the properties of the diagnosis entity below.

Property

Description

Submitter ID

Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.

Age at diagnosis

The age in years of the Case at the initial pathological diagnosis of the disease or cancer. See NCI Thesaurus Code: C15220.

Days to death

The time interval from the date of initial pathologic diagnosis to a person's date of death, represented as a calculated number of days. See CDE (Common Data Element) Public ID: 3165475.

Days to last follow up

The time interval from the date of the last follow up to the current date, represented as a calculated number of days. See CDE (Common Data Element) Public ID: 3008273.

Morphology

The morphology code which describes the characteristics of the tumor itself, including its cell type and biologic activity, according to the third edition of the International Classification of Diseases for Oncology (ICD-O). See CDE (Common Data Element) Public ID: 3226275.

Primary diagnosis

Text term for the structural pattern of cancer cells used to define a microscopic diagnosis. See CDE (Common Data Element) Public ID: 3081934.

Site of resection or biopsy

The topography code which describes the anatomical site of origin of the neoplasm according to the third edition of the International Classification of Diseases for Oncology (ICD-O). See NCI Thesaurus Code: C37978. See CDE (Common Data Element) Public ID: 3226281.

Tumor stage

The extent of a cancer in the body. Staging is usually based on the size of the tumor, whether lymph nodes contain cancer, and whether the cancer has spread from the original site to other parts of the body. NCI Thesaurus Code: C16899; also see NCI Thesaurus Code: C28257 for Pathological stage.

Vital status

The state of being living or deceased for Cases that are part of the investigation. See NCI Thesaurus Code: C25717.

Histological diagnosis

The diagnosis of a disease based on the type of tissue as determined based on the microscopic examination of the tissue. See NCI Thesaurus Code: C61478.

Histological diagnosis other

Additional options for histological diagnosis (see Histological diagnosis) that have not been pre-determined in the listed values for histological diagnosis.

Year of diagnosis

The numeric value to represent the year of an individual's initial pathologic diagnosis of cancer. See CDE (Common Data Element) Public ID: 2896960.

Clinical T (TNM)

The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The T category describes the original (primary) tumor. NCI Thesaurus Code: C48881 and C253840.

Clinical M (TNM)

The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The M category tells whether there are distant metastases (spread of cancer to other parts of the body). NCI Thesaurus Code: C48881 and C25385.

Clinical N (TNM)

The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The N category describes whether or not the cancer has reached nearby lymph nodes. NCI Thesaurus Code: C48881 and C25384.

Clinical stage

The extent of a cancer in the body. Staging is usually based on the size of the tumor, whether lymph nodes contain cancer, and whether the cancer has spread from the original site to other parts of the body. See CDE (Common Data Element) Public ID: 5243162.

Pathologic T (TNM)

The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The T category describes the original (primary) tumor. NCI Thesaurus Code: C48881 and C48739.

Pathologic N (TNM)

The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The N category describes whether or not the cancer has reached nearby lymph nodes. NCI Thesaurus Code: C48881 and C48740.

Pathologic M (TNM)

The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The M category tells whether there are distant metastases (spread of cancer to other parts of the body). NCI Thesaurus Code: C48881 and C48741.

Performance status scale: Timing

A time reference for the Karnofsky score and/or the ECOG score using the defined categories.

Performance status scale: Karnofsky score

An index designed for classifying patients 16 years of age or older by their functional impairment. A standard way of measuring the ability of cancer patients to perform ordinary tasks. NCI Thesaurus Code: C28013.

Performance status scale: ECOG

A performance status scale designed to assess disease progression and its effect on the daily living abilities of the patient. NCI Thesaurus Code: C105721.

Tumor status

The condition or state of the tumor at a particular time. See NCI Thesaurus Code: C96643.

Primary therapy outcome success

A value denoting the result of therapy for a given disease or condition in a patient or group of patients. See NCI Thesaurus Code: C18919.
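
Several diagnosis properties, such as Days to death, are described as intervals expressed as a calculated number of days. A minimal sketch of that arithmetic follows, assuming both dates are known as calendar dates. The function name `days_between` and the dates are invented for illustration and do not come from the dataset.

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Return the number of whole days from start to end (negative if end precedes start)."""
    return (end - start).days


# Made-up dates for illustration only.
initial_diagnosis = date(2015, 3, 1)
death = date(2016, 3, 1)

days_to_death = days_between(initial_diagnosis, death)
print(days_to_death)
```

The interval spans 29 February 2016, so the result is 366 rather than 365.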

Sample

The sample entity represents samples or specimen material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. For instance, samples include tissues, body fluids, cells, organs, embryos, and body excretory products. Members of the sample entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the sample entity below.

Property

Description

Submitter ID

Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.

Sample type

The type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, body excretory products, etc. See NCI Thesaurus Code: C70713.

Sample type ID

A code that determines the type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, body excretory products, etc. See NCI Thesaurus Code: C70713.

Tumor code

The diagnostic tumor code of the tissue sample source.

Tumor code ID

A BCR-defined ID code for the tumor sample.

Days to collection

The time interval from the date of biospecimen collection to the date of initial pathologic diagnosis, represented as a calculated number of days. Sample can be collected prospectively or retrospectively. This can be a negative value for samples taken retrospectively. See CDE (Common Data Element) Public ID: 3008340.

Country of sample procurement

Country where the specimen/sample has been procured.

Days to sample procurement

The time interval from the date of sample collection to the date of sample procurement, expressed in days.

Freezing method

Method used to freeze the sample/specimen.

Initial weight

Initial sample/specimen weight (in grams).

Intermediate dimension

The intermediate dimension of sample/specimen (in millimeters).

Is FFPE

A Boolean value that denotes whether tissue samples used in the analysis were formalin-fixed paraffin-embedded (FFPE).

Longest dimension

The longest dimension of the sample/specimen, in millimeters.

OCT embedded

A Boolean value indicating whether the Optimal Cutting Temperature compound (OCT) is used to embed tissue samples prior to frozen sectioning on a microtome-cryostat.

Pathology report UUID

UUID of the related pathology report.

Preservation method

The primary preservation method used to store the sample.

Shortest dimension

The shortest dimension of the sample/specimen, in millimeters.

Time between clamping and freezing

The time elapsed (in minutes) between clamping (supplying vessel) and freezing a sample.

Time between excision and freezing

Warm ischemia time: the time elapsed (in minutes) between excision and freezing of a sample.

Tissue type

A description of the tissue type with respect to its tumor/normal source.

Tumor descriptor

A description of the tumor from which the sample was derived.

Aliquot

The aliquot entity represents aliquots, products or units extracted from a sample or specimen's portion and prepared for analysis. Members of the aliquot entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the aliquot entity below.

Property

Description

Submitter ID

Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.

Amount

The amount of a product (mass in g or volume in mL) prepared for an analysis.

Concentration

The concentration of a product (in molarity) prepared for an analysis.

Source center

The name of the center that provided the item.

Center ID

A professional organization or group which has or is able to submit data. It can be identified by a UUID.

Center type

The type classification of the center (e.g. CGCC).

Center code

The code that determines the center that has submitted data.

Center name

The name of the center (e.g. Broad Institute of MIT and Harvard).

Center namespace

The domain name of the center (e.g. broad.mit.edu).

Center short name

The shortened name of the center (e.g. BI).

Read group

The read group entity refers to the sequencing reads from one lane of an NGS experiment. Members of the read group entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the read group entity below.

Property

Description

Submitter ID

Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.

Read group name

The name of the read group.

Experiment name

A submitter-defined name for the experiment.

Instrument model

A specific model of sequencing instrument used.

Is paired end

A Boolean value which denotes whether sequence reads are paired end or not.

Library name

The name of the sequencing library preparation.

Library strategy

The sequencing technique intended for the library.

Platform

The version (for instance, manufacturer or model) of the technology that was used for sequencing or assaying. See NCI Thesaurus Code: C45378.

Read length

The length of the reads.

Sequencing center

The name of the center that provided the sequence files.

Sequencing date

The date of sequencing.

Target capture kit catalog number

The catalog number of the target capture kit.

Target capture kit name

The name of the target capture kit.

Target capture kit target region

The target region for the target capture kit.

Target capture kit vendor

The vendor of the target capture kit.

Target capture kit version

The version of a target capture kit.

File

The file entity refers to the files in CPTAC-3 produced by aliquot analyses. Members of the file entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the file entity below.

Property

Description

Submitter ID

Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.

Data category

The classification of data used in (or produced by) the analysis, based on its form and content. See NCI Thesaurus Code: C42645.

Data type

A further, more specific classification of the data category, based on the information that it contains.

Data format

The type of format that determines data content.

Experimental strategy

The method or protocol used to perform the laboratory analysis. See NCI Thesaurus Code: C43622.

File size

The size of a file measured in bytes (B), kilobytes (KB), megabytes (MB), gigabytes (GB), terabytes (TB), and larger values.

Access level

A Boolean value indicating Controlled Data or Open Data. Controlled Data is data from public datasets that has limitations on use and requires approval by dbGaP. Open Data is data from public datasets that doesn't have limitations on its use.

Platform

The version (for instance, manufacturer or model) of the technology that was used for sequencing or assaying. See NCI Thesaurus Code: C45378.

Genome build

The reference genome or assembly (such as HG19/GRCh37 or GRCh38) to which the nucleotide sequence of a case/subject/sample can be aligned.

Genome name

The reference genome or assembly that also contains decoy viral sequence to which the nucleotide sequence of a case/subject/sample can be aligned.

GDC file UUID

The unique identifier for a file, such as a UUID.
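
File properties such as Data format and Experimental strategy are the kind of values used to filter CPTAC-3 files. The sketch below mimics that filtering over a small in-memory list. It is not the Data Browser or any Platform API: the helper names `filter_files` and `human_size`, the property keys, and the records are all hypothetical. The size helper assumes 1 KB = 1024 bytes, which this page does not specify.

```python
def filter_files(files, **criteria):
    """Return the records whose properties equal every given criterion."""
    return [f for f in files if all(f.get(k) == v for k, v in criteria.items())]


def human_size(num_bytes: int) -> str:
    """Format a byte count using B, KB, MB, GB, or TB (assumes 1 KB = 1024 B)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


# Hypothetical records; keys and values are illustrative only.
files = [
    {"submitter_id": "file-001", "data_format": "BAM",
     "experimental_strategy": "WXS", "file_size": 2_500_000_000},
    {"submitter_id": "file-002", "data_format": "VCF",
     "experimental_strategy": "WXS", "file_size": 1536},
    {"submitter_id": "file-003", "data_format": "BAM",
     "experimental_strategy": "RNA-Seq", "file_size": 900_000_000},
]

for f in filter_files(files, data_format="BAM", experimental_strategy="WXS"):
    print(f["submitter_id"], human_size(f["file_size"]))
```

Passing more keyword arguments narrows the result, since a record must match every criterion.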

Analysis

The analysis entity represents analysis workflows used for processing data. Members of the analysis entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the analysis entity below.

Property

Description

Submitter ID

Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.

Workflow link

A link to the GitHub hash of the CWL workflow used (GDC related).

Workflow type

The generic name for the workflow used to analyze data.

Workflow version

The version of the workflow used to analyze data.

Analyte

The analyte entity represents the analytes or molecules, such as DNA or RNA, used for analyses. An analyte is a molecular specimen extracted for analysis from a portion using a specific extraction protocol. Members of the analyte entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the analyte entity below.

Property

Description

Submitter ID

Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.

Amount

The amount of a product (mass in g or volume in mL) prepared for an analysis.

A260_A280 ratio

A purity measurement that weighs the absorbance at 260nm (DNA concentration) against the absorbance at 280nm (protein concentration/contamination).

Analyte type

The type of an analyte on a molecular basis.

Analyte type ID

An ID that determines the type of an analyte on a molecular basis. A single-letter BCR code for the analyte type.

Concentration

The concentration of a product (in molarity) prepared for an analysis.

Spectrophotometer method

A method of quantifying the content of nucleic acids in any sample, used to measure sample purity (e.g. UV spec.)

Well number

The number of the well on the plate in which an analyte has been stored for shipment and for the analysis.
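
The A260_A280 ratio described above is a simple quotient of two absorbance readings. A minimal sketch follows, assuming both readings are already available as plain numbers. The function name and the readings are hypothetical.

```python
def a260_a280_ratio(a260: float, a280: float) -> float:
    """Divide the absorbance at 260 nm by the absorbance at 280 nm."""
    if a280 <= 0:
        raise ValueError("A280 absorbance must be positive")
    return a260 / a280


# Made-up absorbance readings for illustration only.
ratio = a260_a280_ratio(0.9, 0.5)
print(round(ratio, 2))
```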

Exposure

The exposure entity represents clinically relevant patient information which does not immediately result from genetic predispositions. Members of the exposure entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the exposure entity below.

Property

Description

Submitter ID

Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.

Alcohol history

A response to the question that asks whether the participant has consumed at least 12 drinks of any kind of alcoholic beverage in their lifetime. See CDE (Common Data Element) Public ID: 2201918. Also: A description of an individual's current and past experience with alcoholic beverage consumption. See NCI Thesaurus Code: C81229.

Alcohol intensity

A category to describe the patient's current level of alcohol use as self-reported by the patient. See CDE (Common Data Element) Public ID: 3457767.

BMI

The body mass divided by the square of the body height expressed in units of kg/m². See CDE (Common Data Element) Public ID: 4973892.

Cigarettes per day

The average number of cigarettes smoked per day. See CDE (Common Data Element) Public ID: 2001716.

Height

The height of the patient in centimeters. See CDE (Common Data Element) Public ID: 649.

Weight

The weight of the patient measured in kilograms. See CDE (Common Data Element) Public ID: 651.

Years smoked

The numeric value (or unknown) to represent the number of years a person has been smoking. See CDE (Common Data Element) Public ID: 3137957.
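
The BMI, Height, and Weight properties above are related by a simple formula. Because Height is given in centimeters and Weight in kilograms, height has to be converted to meters first. The sketch below shows that conversion; the function name and the sample values are hypothetical.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Return body mass index in kg/m² from weight in kg and height in cm."""
    if height_cm <= 0:
        raise ValueError("height must be positive")
    height_m = height_cm / 100  # centimeters to meters
    return weight_kg / height_m ** 2


# Made-up values for illustration only.
print(round(bmi(70, 175), 1))
```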

Portion

The portion entity represents the sequential 100-120 mg sections derived from samples. Members of the portion entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the portion entity below.

Property

Description

Submitter ID

Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID.

Is FFPE

A Boolean value that denotes whether tissue samples used in the analysis were formalin-fixed paraffin-embedded (FFPE).

Portion weight

Weight of a portion prepared for the analysis (in mg).

Portion number

The numerical value that represents the order of a portion in the series.

Center ID

A professional organization or group which has or is able to submit data. It can be identified by a UUID.

Center type

The type classification of the center (e.g. CGCC).

Center code

The code that determines the center that has submitted data.

Center name

The name of the center (e.g. Broad Institute of MIT and Harvard).

Center namespace

The domain name of the center (e.g. broad.mit.edu).

Center short name

A shortened name of the center (e.g. BI).

Updated about a year ago
