Access dbGaP Controlled Data on the Seven Bridges Platform
On March 23, 2015, Vivien Bonazzi, from the Office of the Associate Director for Data Science, called the release of the NIH position statement on the use of Database of Genotypes and Phenotypes (dbGaP) data in the cloud “one small step for the NIH, one giant leap forward for the community”. For the first time, this policy allows researchers to take advantage of dynamic compute environments to analyze genomic data in a scalable and cost-effective manner.
Investigators who wish to use cloud computing resources to store and analyze controlled data must indicate in their dbGaP Data Access Request that they are requesting to use cloud computing, and must identify the cloud service provider(s) they will employ. Researchers must also abide by the data security and compliance requirements that dbGaP sets out for the use of Controlled Data. While researchers retain responsibility for ensuring appropriate use of Controlled Data, using a platform like Seven Bridges significantly reduces the organizational burden of securing genomic data in the cloud. A review of how Seven Bridges supports secure and compliant use of dbGaP data is available in our compliance white paper.
A new dbGaP request is not needed as long as the original research use statement remains valid. Instead, researchers can simply update their existing request and provide a Cloud Use Statement along with the specific cloud services to be used. In our experience, updated requests are approved within a few business days.
We’ve provided an example Cloud Use Statement and Cloud Provider Information below. Tailor both to the specific research questions your application will address.
If you have any questions, please don’t hesitate to reach out to us at [email protected].
Cloud Use Statement
In this application we are requesting permission for project team members (who have all completed the required NIH security awareness training) to transfer, store, and access copies of XXX controlled access data in a cloud environment. Project staff will interact with cloud resources primarily through the Seven Bridges Platform-as-a-Service (PaaS), which uses infrastructure-as-a-service (IaaS) from Amazon Web Services (AWS) and/or Google Cloud. All data transfers from XXX to the team’s secure project space on the Seven Bridges Platform will be encrypted in flight using industry best practices, including robust security certificates and client-authenticated SSL connections. All data will be stored in the Seven Bridges project space, which provides an interaction layer over Amazon Simple Storage Service (S3) for file objects or, when Google Cloud is the IaaS provider, over Google Cloud Storage (GCS). Amazon Elastic Compute Cloud (EC2) and/or Google Compute Engine instances are launched on demand for data transfer and analysis. All data are encrypted at rest. Access to controlled data will be limited to members of the project team, and the data will be used only for the described research objectives. The Seven Bridges Platform provides additional security and compliance features beyond those provided by the underlying IaaS. Please refer to the compliance white paper available at https://www.sevenbridges.com/library/white-papers/compliance/ for a full description of the security and compliance features of the Seven Bridges Platform. All project personnel will be appropriately trained for access to protected data, with emphasis on the special controls needed for cloud computing architectures. The project PI and senior personnel will ensure that the policies and responsibilities described in the XXX Data Use Certification Agreement are adhered to.
Cloud Provider Information
Name of the company: Seven Bridges Genomics Inc.
Type: Commercial
Details: PaaS; primary point of interaction with all IaaS services. Provides additional access and security controls, workflow/computation management, and interactive analysis.
Name of the company: Amazon Web Services Inc.
Type: Commercial
Details: IaaS; primary underlying infrastructure for the Seven Bridges Platform. Resources used include data storage (S3) and on-demand computational instances (EC2).
Name of the company: Google Inc.
Type: Commercial
Details: IaaS; secondary underlying infrastructure for the Seven Bridges Platform. Resources used include encrypted data storage (GCS) and on-demand computational instances (Google Compute Engine).