Browse datasets via the Datasets API
Advance Access
This feature is in our advance access program. This means that, while it is fully operational, it is subject to change.
Seven Bridges is committed to providing Cavatica users with up-to-date versions of the datasets that are available from the NCI Genomic Data Commons (GDC). The currently available version of this dataset corresponds to GDC Data Release 31.
More information about the data in this release can be found in the GDC Data Release Notes.
Learn more about our policies regarding updates to the GDC datasets.
Overview
Browse datasets via the Datasets API by issuing successive GET requests. Use these browsing requests on their own or in conjunction with querying. For instance, entities located through browsing are resources that can be the subject of a query.
On this page, learn about requests to return:
- datasets you can access
- all the entities within a dataset
- all instances of a single entity
- a single entity's metadata schema
Return accessible datasets
Make the following GET request. Be sure to replace the authentication token with your own.
GET /datasets/ HTTP/1.1
Host: datasets-api.sbgenomics.com
X-SBG-Auth-Token: 7942f56901534434a054dafc3813bc96
This returns a list of accessible datasets, as shown below.
{
    "description": "Datasets API Advanced Access Program",
    "_links": {
        "self": {
            "href": "datasets-api.sbgenomics.com/datasets/"
        },
        "tcga_grch38": {
            "href": "datasets-api.sbgenomics.com/datasets/tcga_grch38/v0",
            "label": "TCGA GRCh38"
        },
        "ccle": {
            "href": "datasets-api.sbgenomics.com/datasets/ccle/v0",
            "label": "CCLE Legacy"
        },
        "tcga": {
            "href": "datasets-api.sbgenomics.com/datasets/tcga/v0",
            "label": "TCGA Legacy"
        },
        "target": {
            "href": "datasets-api.sbgenomics.com/datasets/target/v0",
            "label": "TARGET GRCh38"
        },
        "tcia": {
            "href": "datasets-api.sbgenomics.com/datasets/tcia/v0",
            "label": "TCIA"
        },
        "cptac": {
            "href": "datasets-api.sbgenomics.com/datasets/cptac/v0",
            "label": "CPTAC"
        }
    }
}
The href element lists the path for each dataset, such as https://datasets-api.sbgenomics.com/datasets/ccle/v0 for CCLE.
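The request and response above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names build_request, dataset_hrefs, and list_datasets are hypothetical, and the host and header name are taken from the example request above.

```python
import json
import urllib.request

# Host taken from the example request above.
BASE_URL = "https://datasets-api.sbgenomics.com"


def build_request(path, token):
    """Build a GET request for an API path with the auth header set."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"X-SBG-Auth-Token": token},
        method="GET",
    )


def dataset_hrefs(payload):
    """Map each dataset key to its href, skipping the 'self' link."""
    return {
        name: link["href"]
        for name, link in payload.get("_links", {}).items()
        if name != "self"
    }


def list_datasets(token):
    """Fetch /datasets/ and return {dataset_key: href}. Makes a network call."""
    with urllib.request.urlopen(build_request("/datasets/", token)) as response:
        return dataset_hrefs(json.load(response))
```

Calling list_datasets with your own authentication token should yield a mapping from keys such as ccle or tcga to the dataset paths shown in the response above.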
Return all entities within a dataset
Make a GET request to the href of a dataset to return all of its entities. Learn more about each dataset's entities from its metadata page.
GET /datasets/tcga/v0 HTTP/1.1
Host: datasets-api.sbgenomics.com
X-SBG-Auth-Token: 7942f56901534434a054dafc3813bc96
The response contains a list of entities for the dataset you specified.
{
"_links": {
"cases": {
"schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/cases/schema",
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/cases"
},
"analytes": {
"schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/analytes/schema",
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/analytes"
},
"radiation_therapies": {
"schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/radiation_therapies/schema",
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/radiation_therapies"
},
"drug_therapies": {
"schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/drug_therapies/schema",
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/drug_therapies"
},
"follow_ups": {
"schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/follow_ups/schema",
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/follow_ups"
},
"portions": {
"schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/portions/schema",
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/portions"
},
"aliquots": {
"schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/aliquots/schema",
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/aliquots"
},
"samples": {
"schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/samples/schema",
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/samples"
},
"slides": {
"schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/slides/schema",
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/slides"
},
"query": {
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/query"
},
"self": {
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0"
},
"files": {
"schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files/schema",
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files"
},
"new_tumor_events": {
"schema": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/new_tumor_events/schema",
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/new_tumor_events"
}
}
}
Note that two items in the returned list, query and self, are not dataset entities:
- self contains the property href, which is set to the same path used in making the request.
- query contains the property href, which is set to a path that can be used to query the returned entities by making a POST request. Learn more about querying with the Datasets API.
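When processing this response in code, it can help to separate the real entities from the self and query links. The following is a minimal sketch that operates on the parsed JSON response; the function name split_links and the constant NON_ENTITY_LINKS are hypothetical names chosen for illustration.

```python
# Link names that appear in the response but are not dataset entities.
NON_ENTITY_LINKS = {"self", "query"}


def split_links(payload):
    """Return (entities, query_href) from a parsed dataset response.

    entities maps each entity name to its link object (with 'href' and,
    where present, 'schema'); query_href is the path for POST queries,
    or None if the response has no 'query' link.
    """
    links = payload.get("_links", {})
    entities = {
        name: link
        for name, link in links.items()
        if name not in NON_ENTITY_LINKS
    }
    query_href = links.get("query", {}).get("href")
    return entities, query_href
```

For the TCGA response above, entities would contain keys such as cases, files, and samples, and query_href would be the path ending in /query.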
Return all instances of an entity
Make a GET request to the href of an entity to return all instances of that entity within the specified dataset. For example, to see a list of all TCGA files, make the following request.
GET /datasets/tcga/v0/files HTTP/1.1
Host: datasets-api.sbgenomics.com
X-SBG-Auth-Token: 7942f56901534434a054dafc3813bc96
This returns the following response. The response contains 100 results per page (count) out of 573,241 results overall (total). Page through the results using the paths under _links. The resulting files are listed under the _embedded section.
{
"total": 573241,
"count": 100,
"_links": {
"self": {
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files"
},
"next": {
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files?page=1"
},
"last": {
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files?page=5732"
},
"first": {
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files"
}
},
"_embedded": {
"files": [
{
"label": "LIGER_p_TCGA_166_180_SNP_N_GenomeWideSNP_6_G03_896632.ismpolish.data.txt",
"id": "564a31a5e4b09c884b215ccb",
"_links": {
"self": {
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files/564a31a5e4b09c884b215ccb"
}
}
},
{
"label": "6055424037_R05C01_Grn.idat",
"id": "564a31a6e4b0298dd2c5492c",
"_links": {
"self": {
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files/564a31a6e4b0298dd2c5492c"
}
}
},
{
"label": "mdanderson.org_SARC.MDA_RPPA_Core.SuperCurve.Level_2.B06C2A68-8B01-41AC-B3B2-20C7BBFA562B.txt",
"id": "564a31a6e4b0298dd2c54934",
"_links": {
"self": {
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files/564a31a6e4b0298dd2c54934"
}
}
},
{
"label": "PULED_p_TCGA_130_157_N_GenomeWideSNP_6_H10_831788.nocnv_hg19.seg.txt",
"id": "564a31a6e4b0298dd2c54936",
"_links": {
"self": {
"href": "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files/564a31a6e4b0298dd2c54936"
}
}
},
<snip>
}
]
}
}
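Paging through the results can be sketched as a loop that follows the next link under _links. This is an illustrative sketch, not an official client: iter_instances and make_fetch are hypothetical names, and the loop assumes that the last page omits the next link, which the response above does not confirm.

```python
import json
import urllib.request


def make_fetch(token):
    """Return a function that GETs a URL and parses the JSON body."""
    def fetch(url):
        request = urllib.request.Request(
            url, headers={"X-SBG-Auth-Token": token}
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch


def iter_instances(fetch, start_url, entity):
    """Yield every instance of an entity, one page at a time.

    Assumption: the final page has no 'next' link. The 'seen' set is a
    safeguard that stops the loop if a page ever links back to itself.
    """
    seen = set()
    url = start_url
    while url and url not in seen:
        seen.add(url)
        page = fetch(url)
        yield from page.get("_embedded", {}).get(entity, [])
        url = page.get("_links", {}).get("next", {}).get("href")
```

For example, iter_instances(make_fetch(token), "https://datasets-api.sbgenomics.com/datasets/tcga/v0/files", "files") would lazily walk the file list. With over 570,000 files at 100 per page this takes thousands of requests, so a query is usually a better way to narrow the results first.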
Return an entity's metadata schema
Make a GET request to the schema path of an entity to obtain its metadata schema. Each entity has its own metadata schema: a list of the metadata fields used to describe the entity and the permissible data types (strings, integers, etc.) that each field takes.
Each metadata field, such as hasDataType in the response below, is followed by an object that indicates the type of value for that field, such as integer, string, or enum. If the type is given as enum, the object also contains a list of all possible values for the given metadata field.
Additionally, note that under _links, there is a list of connections to other entities in the dataset. In the example below, these connections include "hasAliquot", "hasCase", "hasSample", "hasPortion", and "hasAnalyte".
For example, to see the metadata schema for files, send the request:
GET /datasets/tcga/v0/files/schema HTTP/1.1
Host: datasets-api.sbgenomics.com
X-SBG-Auth-Token: 7942f56901534434a054dafc3813bc96
This returns the following response, which lists all the properties of the file entity:
{
"hasSize": {
"type": "integer"
},
"hasDataType": {
"values": [
"Structural rearrangement",
"Protein expression",
"Gene expression",
"Clinical",
"Simple nucleotide variation",
"DNA methylation",
"Other",
"Copy number variation",
"Not available",
"Raw sequencing data",
"Raw microarray data"
],
"type": "enum"
},
"id": {
"type": "string"
},
"hasDataSubmittingCenter": {
"values": [
"MD Anderson - Institute for Applied Cancer Science",
"Johns Hopkins / University of Southern California",
"Baylor College of Medicine",
"HudsonAlpha Institute for Biotechnology",
"Harvard Medical School",
"MD Anderson - RPPA Core Facility (Proteomics)",
"Complete Genomics Inc.",
"Washington University School of Medicine",
"NCH BCR",
"Broad Institute of MIT and Harvard",
"Lawrence Berkeley National Laboratory",
"Memorial Sloan-Kettering Cancer Center",
"Wellcome Trust Sanger Institute",
"University of North Carolina",
"Canada's Michael Smith Genome Sciences Centre",
"Not available",
"Nationwide Children's Hospital BCR",
"University of California, Santa Cruz"
],
"type": "enum"
},
"hasDiseaseType": {
"values": [
"Adrenocortical Carcinoma",
"Stomach Adenocarcinoma",
"Colon Adenocarcinoma",
"Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma",
"Brain Lower Grade Glioma",
"Bladder Urothelial Carcinoma",
"Sarcoma",
"Lung Adenocarcinoma",
"Testicular Germ Cell Tumors",
"Cholangiocarcinoma",
"Lymphoid Neoplasm Diffuse Large B-cell Lymphoma",
"Thymoma",
"Liver Hepatocellular Carcinoma",
"Thyroid Carcinoma",
"Uveal Melanoma",
"Mesothelioma",
"Kidney Renal Papillary Cell Carcinoma",
"Kidney Renal Clear Cell Carcinoma",
"Kidney Chromophobe",
"Pheochromocytoma and Paraganglioma",
"Acute Myeloid Leukemia",
"Prostate Adenocarcinoma",
"Uterine Corpus Endometrial Carcinoma",
"Esophageal Carcinoma",
"Breast Invasive Carcinoma",
"Skin Cutaneous Melanoma",
"Uterine Carcinosarcoma",
"Pancreatic Adenocarcinoma",
"Glioblastoma Multiforme",
"Ovarian Serous Cystadenocarcinoma",
"Head and Neck Squamous Cell Carcinoma",
"Rectum Adenocarcinoma",
"Lung Squamous Cell Carcinoma"
],
"type": "enum"
},
"hasPlatform": {
"values": [
"HG-CGH-244A",
"Ion Torrent PGM",
"BCR Record",
"Not available",
"HG-CGH-415K_G4124A",
"Illumina DNA Methylation OMA003 CPI",
"Illumina HiSeq",
"AgilentG4502A_07_2",
"Complete Genomics",
"AgilentG4502A_07_1",
"H-miRNA_8x15Kv2",
"Hospital Record",
"HiSeq X Ten",
"Illumina DNA Methylation OMA002 CPI",
"LS 454",
"Affymetrix U133 Plus 2",
"Illumina GA",
"AgilentG4502A_07_3",
"MDA_RPPA_Core",
"Illumina HumanHap550",
"ABI capillary sequencer",
"CGH-1x1M_G4447A",
"ABI SOLiD",
"Affymetrix SNP Array 6.0",
"Mixed platforms",
"Illumina MiSeq",
"HT_HG-U133A",
"H-miRNA_8x15K",
"HG-U133_Plus_2",
"HuEx-1_0-st-v2",
"Illumina Human Methylation 27",
"Illumina Human 1M Duo",
"Illumina Human Methylation 450"
],
"type": "enum"
},
"hasReferenceGenome": {
"values": [
"GRCh37-lite_WUGSC_variant_2",
"GRCh37",
"HG19",
"GRCh37-lite_WUGSC_variant_1",
"HG18",
"HS37D5",
"GRCh37-lite-+-HPV_Redux-build",
"HG18_Broad_variant",
"NCBI36_BCCAGSC_variant",
"NCBI-human-build36",
"GRCh37_BI_Variant",
"NCBI36_BCM_variant",
"HG19_Broad_variant",
"NCBI36_WUGSC_variant",
"GRCh37-lite",
"Not available"
],
"type": "enum"
},
"publishedDate": {
"type": "dateTime"
},
"hasDataSubtype": {
"values": [
"Bisulfite sequence alignment",
"Sequencing tag counts",
"Sequencing tag",
"Isoform expression quantification",
"Genotypes",
"miRNA quantification",
"Copy number variation",
"Simple somatic mutation",
"Raw intensities",
"Protein expression quantification",
"Gene expression summary",
"Methylation percentage",
"Gene expression quantification",
"Structural variation",
"Copy number segmentation",
"Microsattelite instability",
"Not available",
"Intensities Log2Ratio",
"Exon quantification",
"Biospecimen data",
"Probeset summary",
"Simple nucleotide variation",
"Methylation beta value",
"Intensities",
"Exon junction quantification",
"Unaligned reads",
"Copy number estimate",
"LOH",
"Aligned reads",
"Normalized copy numbers",
"Clinical data",
"Normalized intensities"
],
"type": "enum"
},
"lastModifiedDate": {
"type": "dateTime"
},
"_links": {
"hasAliquot": "anyURI",
"self": {
"href": "metadata-api-vayu.sbgenomics.com:9989/datasets/v0/tcga/files/schema"
},
"hasSample": "anyURI",
"hasPortion": "anyURI",
"hasAnalyte": "anyURI",
"hasCase": "anyURI"
},
"hasAccessLevel": {
"values": [
"Controlled",
"Open"
],
"type": "enum"
},
"hasDataFormat": {
"values": [
"BED",
"FA",
"XLSX",
"DGE-TAG",
"IDAT",
"XML",
"TXT",
"BAM",
"BAI",
"DAT",
"CHP",
"MAF",
"CEL",
"TIF",
"TAR",
"FSA",
"Not available",
"GCT",
"VCF",
"TARGZ"
],
"type": "enum"
},
"hasInvestigation": {
"values": [
"TCGA-THYM",
"TCGA-KIRC",
"TCGA-SARC",
"TCGA-ESCA",
"TCGA-PCPG",
"TCGA-PRAD",
"TCGA-UCEC",
"TCGA-ACC",
"TCGA-READ",
"TCGA-UVM",
"TCGA-CESC",
"TCGA-COAD",
"Not available",
"TCGA-TGCT",
"TCGA-DLBC",
"TCGA-KICH",
"TCGA-THCA",
"TCGA-HNSC",
"TCGA-UCS",
"TCGA-CHOL",
"TCGA-BLCA",
"TCGA-GBM",
"TCGA-SKCM",
"TCGA-LUSC",
"TCGA-STAD",
"TCGA-LUAD",
"TCGA-LIHC",
"TCGA-KIRP",
"TCGA-BRCA",
"TCGA-MESO",
"TCGA-PAAD",
"TCGA-LAML",
"TCGA-OV",
"TCGA-LGG"
],
"type": "enum"
},
"hasStoragePath": {
"type": "string"
},
"hasExperimentalStrategy": {
"values": [
"Exon array",
"CGH array",
"RNA-Seq",
"miRNA-Seq",
"AMPLICON",
"Not available",
"VALIDATION",
"WXS",
"DNA-Seq",
"Total RNA-Seq",
"Gene expression array",
"Protein expression array",
"miRNA expression array",
"Genotyping array",
"Bisulfite-Seq",
"Methylation array",
"WGS",
"MSI-Mono-Dinucleotide Assay"
],
"type": "enum"
},
"label": {
"type": "string"
},
"hasSubmitterId": {
"type": "string"
},
"uploadDate": {
"type": "dateTime"
},
"hasGDCFileUUID": {
"type": "string"
}
}
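A schema response like the one above can be summarized programmatically, for instance to check a value against an enum before building a query. The sketch below works on the parsed JSON; summarize_schema and is_allowed are hypothetical helper names, and treating every non-enum type as unconstrained is an assumption made for illustration.

```python
def summarize_schema(schema):
    """Split a parsed schema into enum fields, other fields, and links.

    Returns (enums, scalars, linked):
      enums   maps field name -> list of permitted values
      scalars maps field name -> type name (e.g. 'string', 'integer')
      linked  lists the connections under '_links', excluding 'self'
    """
    enums, scalars = {}, {}
    for field, spec in schema.items():
        if field == "_links":
            continue
        if spec.get("type") == "enum":
            enums[field] = spec.get("values", [])
        else:
            scalars[field] = spec.get("type")
    linked = [name for name in schema.get("_links", {}) if name != "self"]
    return enums, scalars, linked


def is_allowed(schema, field, value):
    """Check a value against a field's enum list.

    Assumption: fields that are not enums accept any value here; this
    sketch does not validate integers, strings, or dateTime values.
    """
    spec = schema.get(field)
    if spec is None or field == "_links":
        return False
    if spec.get("type") == "enum":
        return value in spec.get("values", [])
    return True
```

With the files schema above, is_allowed(schema, "hasAccessLevel", "Open") would be true, while a value missing from the values list would be rejected.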
Next step
Browsing requests can be used in conjunction with querying. For instance, entities located through browsing are resources which can be the subject of a query. Learn more about querying via the Datasets API.