Browse datasets via the Datasets API


Advance Access

This feature is in our advance access program. This means that, while it is fully operational, it is subject to change.


Seven Bridges is committed to providing Cavatica users with up-to-date versions of the datasets that are available from the NCI Genomic Data Commons (GDC). The currently available version of this dataset corresponds to GDC Data Release 31.

More information about the data in this release can be found in the GDC Data Release Notes.

Learn more about our policies regarding updates to the GDC datasets.


Overview

Browse datasets via the Datasets API by issuing successive GET requests. Use these browsing requests individually or in conjunction with querying. For instance, entities located through browsing are resources that can be the subject of a query.

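On this page, learn about requests to return:

  • accessible datasets
  • all entities within a dataset
  • all instances of an entity
  • an entity's metadata schema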

Return accessible datasets

Make the following GET request. Be sure to replace the authentication token with your own.

GET /datasets/ HTTP/1.1
Host: datasets-api.sbgenomics.com
X-SBG-Auth-Token: 7942f56901534434a054dafc3813bc96

This returns a list of accessible datasets, as shown below.

{
    "description": "Datasets API Advanced Access Program",
    "_links": {
        "self": {
            "href": "datasets-api.sbgenomics.com/datasets/"
        },
        "tcga_grch38": {
            "href": "datasets-api.sbgenomics.com/datasets/tcga_grch38/v0",
            "label": "TCGA GRCh38"
        },
        "ccle": {
            "href": "datasets-api.sbgenomics.com/datasets/ccle/v0",
            "label": "CCLE Legacy"
        },
        "tcga": {
            "href": "datasets-api.sbgenomics.com/datasets/tcga/v0",
            "label": "TCGA Legacy"
        },
        "target": {
            "href": "datasets-api.sbgenomics.com/datasets/target/v0",
            "label": "TARGET GRCh38"
        },
        "tcia": {
            "href": "datasets-api.sbgenomics.com/datasets/tcia/v0",
            "label": "TCIA"
        },
        "cptac": {
            "href": "datasets-api.sbgenomics.com/datasets/cptac/v0",
            "label": "CPTAC"
        }
    }
}

The href element lists the path for each dataset, such as https://datasets-api.sbgenomics.com/datasets/ccle/v0 for CCLE.
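The request above could be issued from Python roughly as follows. This is a minimal sketch using only the standard library; the helper names build_request, dataset_hrefs, and list_datasets are illustrative and not part of the Datasets API, and the response is assumed to have the shape of the example above.

```python
import json
import urllib.request

# Host taken from the example request above.
BASE_URL = "https://datasets-api.sbgenomics.com"


def build_request(path, token):
    """Build a GET request that carries the X-SBG-Auth-Token header."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"X-SBG-Auth-Token": token},
        method="GET",
    )


def dataset_hrefs(response_body):
    """Map each dataset key under _links to its href, skipping 'self'."""
    links = response_body.get("_links", {})
    return {name: link["href"] for name, link in links.items() if name != "self"}


def list_datasets(token):
    """Send the request; this needs network access and a valid token."""
    with urllib.request.urlopen(build_request("/datasets/", token)) as resp:
        return dataset_hrefs(json.load(resp))
```

Calling list_datasets with your own token would return a dictionary keyed by dataset name, such as "tcga" or "ccle", assuming the service responds as in the example.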

Return all entities within a dataset

Make a GET request to the href of a dataset to return all of its entities. Learn more about each dataset's entities from its metadata page.

GET /datasets/tcga/v0 HTTP/1.1
Host: datasets-api.sbgenomics.com
X-SBG-Auth-Token: 7942f56901534434a054dafc3813bc96

The response contains a list of entities for the dataset you specified.

{
  "_links": {
    "cases": {
      "schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/cases/schema",
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/cases"
    },
    "analytes": {
      "schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/analytes/schema",
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/analytes"
    },
    "radiation_therapies": {
      "schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/radiation_therapies/schema",
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/radiation_therapies"
    },
    "drug_therapies": {
      "schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/drug_therapies/schema",
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/drug_therapies"
    },
    "follow_ups": {
      "schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/follow_ups/schema",
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/follow_ups"
    },
    "portions": {
      "schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/portions/schema",
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/portions"
    },
    "aliquots": {
      "schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/aliquots/schema",
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/aliquots"
    },
    "samples": {
      "schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/samples/schema",
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/samples"
    },
    "slides": {
      "schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/slides/schema",
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/slides"
    },
    "query": {
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/query"
    },
    "self": {
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0"
    },
    "files": {
      "schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files/schema",
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files"
    },
    "new_tumor_events": {
      "schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/new_tumor_events/schema",
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/new_tumor_events"
    }
  }
}

Note that two items in the list returned, query and self, are not dataset entities:

  • self contains the property href, which is set to the same path used in the request.
  • query contains the property href, which is set to a path you can use to query the returned entities by making a POST request. Learn more about querying with the Datasets API.
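One way to separate entities from the self and query items in code is sketched below. It relies on an observation from the example response, namely that entities carry a schema property while self and query do not; this is an assumption about the response shape rather than a documented guarantee, and the function names are hypothetical.

```python
def entity_links(dataset_body):
    """Return {entity_name: href} for _links entries that have a schema.

    In the example response, 'self' and 'query' carry only an href,
    so the presence of a 'schema' key is used to identify entities.
    """
    links = dataset_body.get("_links", {})
    return {name: link["href"] for name, link in links.items() if "schema" in link}


def query_href(dataset_body):
    """Return the path that accepts POST queries, or None if it is absent."""
    return dataset_body.get("_links", {}).get("query", {}).get("href")
```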

Return all instances of an entity

Make a GET request to the href of an entity to return all instances of that entity within the specified dataset. For example, to see a list of all TCGA files, make the following request.

GET /datasets/tcga/v0/files HTTP/1.1
Host: datasets-api.sbgenomics.com
X-SBG-Auth-Token: 7942f56901534434a054dafc3813bc96

This returns the following response. Each page contains 100 results (count) out of the total number of instances (total). Page through the results using the paths under _links. The files themselves are listed under the _embedded section.

{
  "total": 573241,
  "count": 100,
  "_links": {
    "self": {
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files"
    },
    "next": {
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files?page=1"
    },
    "last": {
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files?page=5732"
    },
    "first": {
      "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files"
    }
  },
  "_embedded": {
    "files": [
      {
        "label": "LIGER_p_TCGA_166_180_SNP_N_GenomeWideSNP_6_G03_896632.ismpolish.data.txt",
        "id": "564a31a5e4b09c884b215ccb",
        "_links": {
          "self": {
            "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files/564a31a5e4b09c884b215ccb"
          }
        }
      },
      {
        "label": "6055424037_R05C01_Grn.idat",
        "id": "564a31a6e4b0298dd2c5492c",
        "_links": {
          "self": {
            "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files/564a31a6e4b0298dd2c5492c"
          }
        }
      },
      {
        "label": "mdanderson.org_SARC.MDA_RPPA_Core.SuperCurve.Level_2.B06C2A68-8B01-41AC-B3B2-20C7BBFA562B.txt",
        "id": "564a31a6e4b0298dd2c54934",
        "_links": {
          "self": {
            "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files/564a31a6e4b0298dd2c54934"
          }
        }
      },
      {
        "label": "PULED_p_TCGA_130_157_N_GenomeWideSNP_6_H10_831788.nocnv_hg19.seg.txt",
        "id": "564a31a6e4b0298dd2c54936",
        "_links": {
          "self": {
            "href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files/564a31a6e4b0298dd2c54936"
          }
        }
      },
     
    
        <snip>
   
        }
    ]
  }
}
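Paging through the results can be sketched as a generator that follows the next link on each page. The sketch assumes that the final page omits the next link, which the example above does not confirm; fetch is a hypothetical callable that takes a URL and returns the decoded JSON body, for instance a wrapper around an authenticated GET request.

```python
def iter_instances(first_url, entity, fetch):
    """Yield every embedded instance, following _links.next until it is absent.

    first_url: href of the entity, e.g. the 'files' href of a dataset.
    entity:    key under _embedded, e.g. "files".
    fetch:     callable taking a URL and returning the decoded JSON body.
    """
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("_embedded", {}).get(entity, [])
        # Assumption: the last page has no 'next' entry, which ends the loop.
        url = page.get("_links", {}).get("next", {}).get("href")
```

Because the function is a generator, pages are requested only as results are consumed, which matters for an entity with several thousand pages.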

Return an entity's metadata schema

Make a GET request to an entity's schema path to obtain its schema. Each entity has its own metadata schema: a list of the metadata fields used to describe the entity and the permissible datatypes (strings, integers, etc.) that each field takes.

Each metadata field, such as hasDataType in the response below, is followed by an object which indicates the type of value for that field, such as integer, string, or enum. If the type is given as enum, the object also contains a list of all possible values for the given metadata field.

Additionally, note that under _links, there is a list of connections to other entities in the dataset. In the example below, these connections include "hasAliquot", "hasCase", "hasSample", "hasPortion", and "hasAnalyte".

For example, to see the metadata schema for files, send the following request:

GET /datasets/tcga/v0/files/schema HTTP/1.1
Host: datasets-api.sbgenomics.com
X-SBG-Auth-Token: 7942f56901534434a054dafc3813bc96

This returns the following response, which lists all the properties of the file entity:

{
  "hasSize": {
    "type": "integer"
  },
  "hasDataType": {
    "values": [
      "Structural rearrangement",
      "Protein expression",
      "Gene expression",
      "Clinical",
      "Simple nucleotide variation",
      "DNA methylation",
      "Other",
      "Copy number variation",
      "Not available",
      "Raw sequencing data",
      "Raw microarray data"
    ],
    "type": "enum"
  },
  "id": {
    "type": "string"
  },
  "hasDataSubmittingCenter": {
    "values": [
      "MD Anderson - Institute for Applied Cancer Science",
      "Johns Hopkins / University of Southern California",
      "Baylor College of Medicine",
      "HudsonAlpha Institute for Biotechnology",
      "Harvard Medical School",
      "MD Anderson - RPPA Core Facility (Proteomics)",
      "Complete Genomics Inc.",
      "Washington University School of Medicine",
      "NCH BCR",
      "Broad Institute of MIT and Harvard",
      "Lawrence Berkeley National Laboratory",
      "Memorial Sloan-Kettering Cancer Center",
      "Wellcome Trust Sanger Institute",
      "University of North Carolina",
      "Canada's Michael Smith Genome Sciences Centre",
      "Not available",
      "Nationwide Children's Hospital BCR",
      "University of California, Santa Cruz"
    ],
    "type": "enum"
  },
  "hasDiseaseType": {
    "values": [
      "Adrenocortical Carcinoma",
      "Stomach Adenocarcinoma",
      "Colon Adenocarcinoma",
      "Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma",
      "Brain Lower Grade Glioma",
      "Bladder Urothelial Carcinoma",
      "Sarcoma",
      "Lung Adenocarcinoma",
      "Testicular Germ Cell Tumors",
      "Cholangiocarcinoma",
      "Lymphoid Neoplasm Diffuse Large B-cell Lymphoma",
      "Thymoma",
      "Liver Hepatocellular Carcinoma",
      "Thyroid Carcinoma",
      "Uveal Melanoma",
      "Mesothelioma",
      "Kidney Renal Papillary Cell Carcinoma",
      "Kidney Renal Clear Cell Carcinoma",
      "Kidney Chromophobe",
      "Pheochromocytoma and Paraganglioma",
      "Acute Myeloid Leukemia",
      "Prostate Adenocarcinoma",
      "Uterine Corpus Endometrial Carcinoma",
      "Esophageal Carcinoma",
      "Breast Invasive Carcinoma",
      "Skin Cutaneous Melanoma",
      "Uterine Carcinosarcoma",
      "Pancreatic Adenocarcinoma",
      "Glioblastoma Multiforme",
      "Ovarian Serous Cystadenocarcinoma",
      "Head and Neck Squamous Cell Carcinoma",
      "Rectum Adenocarcinoma",
      "Lung Squamous Cell Carcinoma"
    ],
    "type": "enum"
  },
  "hasPlatform": {
    "values": [
      "HG-CGH-244A",
      "Ion Torrent PGM",
      "BCR Record",
      "Not available",
      "HG-CGH-415K_G4124A",
      "Illumina DNA Methylation OMA003 CPI",
      "Illumina HiSeq",
      "AgilentG4502A_07_2",
      "Complete Genomics",
      "AgilentG4502A_07_1",
      "H-miRNA_8x15Kv2",
      "Hospital Record",
      "HiSeq X Ten",
      "Illumina DNA Methylation OMA002 CPI",
      "LS 454",
      "Affymetrix U133 Plus 2",
      "Illumina GA",
      "AgilentG4502A_07_3",
      "MDA_RPPA_Core",
      "Illumina HumanHap550",
      "ABI capillary sequencer",
      "CGH-1x1M_G4447A",
      "ABI SOLiD",
      "Affymetrix SNP Array 6.0",
      "Mixed platforms",
      "Illumina MiSeq",
      "HT_HG-U133A",
      "H-miRNA_8x15K",
      "HG-U133_Plus_2",
      "HuEx-1_0-st-v2",
      "Illumina Human Methylation 27",
      "Illumina Human 1M Duo",
      "Illumina Human Methylation 450"
    ],
    "type": "enum"
  },
  "hasReferenceGenome": {
    "values": [
      "GRCh37-lite_WUGSC_variant_2",
      "GRCh37",
      "HG19",
      "GRCh37-lite_WUGSC_variant_1",
      "HG18",
      "HS37D5",
      "GRCh37-lite-+-HPV_Redux-build",
      "HG18_Broad_variant",
      "NCBI36_BCCAGSC_variant",
      "NCBI-human-build36",
      "GRCh37_BI_Variant",
      "NCBI36_BCM_variant",
      "HG19_Broad_variant",
      "NCBI36_WUGSC_variant",
      "GRCh37-lite",
      "Not available"
    ],
    "type": "enum"
  },
  "publishedDate": {
    "type": "dateTime"
  },
  "hasDataSubtype": {
    "values": [
      "Bisulfite sequence alignment",
      "Sequencing tag counts",
      "Sequencing tag",
      "Isoform expression quantification",
      "Genotypes",
      "miRNA quantification",
      "Copy number variation",
      "Simple somatic mutation",
      "Raw intensities",
      "Protein expression quantification",
      "Gene expression summary",
      "Methylation percentage",
      "Gene expression quantification",
      "Structural variation",
      "Copy number segmentation",
      "Microsattelite instability",
      "Not available",
      "Intensities Log2Ratio",
      "Exon quantification",
      "Biospecimen data",
      "Probeset summary",
      "Simple nucleotide variation",
      "Methylation beta value",
      "Intensities",
      "Exon junction quantification",
      "Unaligned reads",
      "Copy number estimate",
      "LOH",
      "Aligned reads",
      "Normalized copy numbers",
      "Clinical data",
      "Normalized intensities"
    ],
    "type": "enum"
  },
  "lastModifiedDate": {
    "type": "dateTime"
  },
  "_links": {
    "hasAliquot": "anyURI",
    "self": {
      "href": "metadata-api-vayu.sbgenomics.com:9989/datasets/v0/tcga/files/schema"
    },
    "hasSample": "anyURI",
    "hasPortion": "anyURI",
    "hasAnalyte": "anyURI",
    "hasCase": "anyURI"
  },
  "hasAccessLevel": {
    "values": [
      "Controlled",
      "Open"
    ],
    "type": "enum"
  },
  "hasDataFormat": {
    "values": [
      "BED",
      "FA",
      "XLSX",
      "DGE-TAG",
      "IDAT",
      "XML",
      "TXT",
      "BAM",
      "BAI",
      "DAT",
      "CHP",
      "MAF",
      "CEL",
      "TIF",
      "TAR",
      "FSA",
      "Not available",
      "GCT",
      "VCF",
      "TARGZ"
    ],
    "type": "enum"
  },
  "hasInvestigation": {
    "values": [
      "TCGA-THYM",
      "TCGA-KIRC",
      "TCGA-SARC",
      "TCGA-ESCA",
      "TCGA-PCPG",
      "TCGA-PRAD",
      "TCGA-UCEC",
      "TCGA-ACC",
      "TCGA-READ",
      "TCGA-UVM",
      "TCGA-CESC",
      "TCGA-COAD",
      "Not available",
      "TCGA-TGCT",
      "TCGA-DLBC",
      "TCGA-KICH",
      "TCGA-THCA",
      "TCGA-HNSC",
      "TCGA-UCS",
      "TCGA-CHOL",
      "TCGA-BLCA",
      "TCGA-GBM",
      "TCGA-SKCM",
      "TCGA-LUSC",
      "TCGA-STAD",
      "TCGA-LUAD",
      "TCGA-LIHC",
      "TCGA-KIRP",
      "TCGA-BRCA",
      "TCGA-MESO",
      "TCGA-PAAD",
      "TCGA-LAML",
      "TCGA-OV",
      "TCGA-LGG"
    ],
    "type": "enum"
  },
  "hasStoragePath": {
    "type": "string"
  },
  "hasExperimentalStrategy": {
    "values": [
      "Exon array",
      "CGH array",
      "RNA-Seq",
      "miRNA-Seq",
      "AMPLICON",
      "Not available",
      "VALIDATION",
      "WXS",
      "DNA-Seq",
      "Total RNA-Seq",
      "Gene expression array",
      "Protein expression array",
      "miRNA expression array",
      "Genotyping array",
      "Bisulfite-Seq",
      "Methylation array",
      "WGS",
      "MSI-Mono-Dinucleotide Assay"
    ],
    "type": "enum"
  },
  "label": {
    "type": "string"
  },
  "hasSubmitterId": {
    "type": "string"
  },
  "uploadDate": {
    "type": "dateTime"
  },
  "hasGDCFileUUID": {
    "type": "string"
  }
}
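A schema response like the one above can be split into field types, enum values, and connections to other entities. The following is a minimal sketch that assumes every field object has a type property and that enum fields list their options under values, as in the example; summarize_schema is an illustrative name, not an API call.

```python
def summarize_schema(schema):
    """Split a schema body into field types, enum values, and connections.

    Returns a tuple (field_types, enum_values, connections):
      field_types: {field_name: type string such as "string" or "enum"}
      enum_values: {field_name: list of permitted values} for enum fields
      connections: names under _links other than 'self', e.g. "hasCase"
    """
    field_types = {}
    enum_values = {}
    for name, spec in schema.items():
        if name == "_links":
            continue  # connections are handled separately below
        field_types[name] = spec["type"]
        if spec["type"] == "enum":
            enum_values[name] = spec.get("values", [])
    connections = [name for name in schema.get("_links", {}) if name != "self"]
    return field_types, enum_values, connections
```

The enum_values mapping is useful for checking a value, such as "Open" for hasAccessLevel, before using it in a query.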

Next step

Browsing requests can be used in conjunction with querying. For instance, entities located through browsing are resources which can be the subject of a query. Learn more about querying via the Datasets API.
