The Seven Bridges Knowledge Center


Simons Genome Diversity Project (SGDP) dataset

The SGDP public project is not available on AWS EU or GCP.

To use the SGDP Public project, you should know the cloud infrastructure provider on which you run the Seven Bridges Platform: Amazon Web Services in the US (AWS US East), Amazon Web Services in Frankfurt, Germany (AWS EU), or Google Cloud Platform. If you didn't choose a cloud provider when you signed up for the Platform, you are using AWS US East.

If you signed up in early 2016 or later, you had the option to choose between AWS US East and the Google Cloud Platform. If you signed up in early 2017 or later, you had the additional option to choose AWS EU as your cloud provider.

Overview

The Simons Genome Diversity Project (SGDP) public project contains large Open Access files from the SGDP dataset which you can use on the Seven Bridges Platform.

The SGDP dataset is made possible by the Simons Foundation. The dataset contains complete genome sequences from more than one hundred diverse human populations. It is the largest dataset of diverse, high-quality human genome sequences ever reported. To represent as much anthropological, linguistic, and cultural diversity as possible, the dataset includes many deeply divergent human populations that are not well represented in other datasets.

The SGDP public project contains Open Access whole genome sequencing data for 279 samples.

You don't need special access or authorization status to use the data in this project. In fact, any data you copy from this public project into your own projects will not count towards your storage.

The Simons Foundation asks that you please observe the Fort Lauderdale principles in your usage of SGDP data.

What's contained in the project?

The SGDP public project contains the following distribution of samples and files.

By geographical region, the SGDP dataset comprises 44 Africans, 22 Native Americans, 27 Central Asians or Siberians, 47 East Asians, 25 Oceanians, 39 South Asians, and 75 West Eurasians. Learn more about the metadata for the dataset.
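
As a quick sanity check, the regional counts above can be summed to confirm that they account for all 279 samples in the project. The following is a minimal Python sketch; the name `samples_by_region` is purely illustrative and is not part of the Platform.

```python
# Regional sample counts, copied from the breakdown above.
samples_by_region = {
    "Africans": 44,
    "Native Americans": 22,
    "Central Asians or Siberians": 27,
    "East Asians": 47,
    "Oceanians": 25,
    "South Asians": 39,
    "West Eurasians": 75,
}

# Sum the per-region counts to get the total number of samples.
total_samples = sum(samples_by_region.values())
print(total_samples)  # 279
```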

Access the SGDP public project

  1. Click on Public projects from the top navigation bar.
  2. Select Simons Genome Diversity Project (SGDP).

You'll be taken to the main dashboard of the SGDP public project.

Use the SGDP public project

All Seven Bridges Platform users automatically have copy permissions for this project. This means that while you cannot upload data or tools to the project, you can copy the available data to your own projects on the Platform to execute analyses.

You have two options: copy the entire project, or copy only a subset of the data.

Copy the entire project

  1. Access the SGDP public project by selecting Simons Genome Diversity Project from Public projects in the top navigation bar.
  2. Click Copy this project, next to the project's title.
  3. In the pop-up window, name your copy of the project and select a billing group.
  4. Once you've customized the details, click Copy to copy the entire project.

You'll be redirected to the dashboard of your cloned project when it is ready. Add apps to conduct analyses on the data in your project.

Use a subset of the data

Instead of cloning the entire project, you can select and copy only a subset of the data.

  1. Access the SGDP public project by selecting Simons Genome Diversity Project from Public projects in the top navigation bar. You'll be taken to the project dashboard of the SGDP public project.
  2. Click the Files tab in the upper right-hand corner. This will take you to the Files page for the SGDP project.
  3. Filter or search for the desired files. You can filter by:
    • Keywords - Use the search bar at the top of the page to find files by entering the file name or notes associated with a file.
    • Metadata fields - Next to the search bar, you will see drop-down menus for the metadata fields Investigation, File extension, and Sample ID. Selecting a particular metadata value from one of these menus displays only files that match the value. For example, filter by SGDP-Australian in the Investigation field to see only samples from the Australian population. You can add drop-down menus for other metadata fields by clicking the + icon.
  4. Choose specific files by selecting the checkbox in front of each file name.
  5. Select as many files as you want and click Copy to.
  6. Select your desired project from the drop-down menu.

Now, you can start using the SGDP files you've added to your personal project in your own analysis.
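
The metadata filtering described in the steps above can be pictured as keeping only the files whose metadata matches each selected value. The following is a minimal, self-contained Python sketch of that idea. It does not call the Platform. The file records, the metadata keys (`investigation`, `sample_id`), the sample IDs, and the helper `filter_by_metadata` are all hypothetical. The sketch also assumes that selecting values in several menus narrows the results to files matching all of them.

```python
def filter_by_metadata(files, **wanted):
    """Return the files whose metadata matches every requested field value."""
    return [
        f for f in files
        if all(f["metadata"].get(field) == value for field, value in wanted.items())
    ]

# Hypothetical file records; real file names, keys, and Sample IDs will differ.
files = [
    {"name": "sample_a.bam", "metadata": {"investigation": "SGDP-Australian", "sample_id": "A1"}},
    {"name": "sample_b.bam", "metadata": {"investigation": "SGDP-Australian", "sample_id": "A2"}},
    {"name": "sample_c.bam", "metadata": {"investigation": "SGDP-Other", "sample_id": "C1"}},
]

# Equivalent to choosing SGDP-Australian in the Investigation drop-down menu.
selected = filter_by_metadata(files, investigation="SGDP-Australian")
selected_names = [f["name"] for f in selected]
print(selected_names)
```

Passing more than one keyword argument, for example `investigation="SGDP-Australian", sample_id="A2"`, narrows the selection further under the same assumption.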