On this page, you'll find the most common questions regarding the Seven Bridges graph-based whole genome sequencing analysis app (Graph WGS).
If your question is not in this FAQ section, feel free to contact us at [email protected]
Graph WGS is the Seven Bridges Graph-based Whole Genome Sequencing Analysis app.
In the context of genomics, graph technology refers to using a graph structure to store and analyse variations between genomes, in contrast to the traditional linear approach.
A linear reference genome is built from a small number of individuals and is therefore not an accurate description of genetic variants within a population. This leads to inaccurate read alignment against that reference, which in turn biases the discovery of new genetic variants. This reference bias can be addressed by using a graph structure to achieve a more accurate description of genetic variation within a population. Work with small genomic regions containing high sequence or structural diversity has shown that using a graph reference produces better read alignment and improved variant calling. Seven Bridges’s Graph WGS app makes graph technology applicable at the whole genome level, enabling powerful and highly accurate read alignment and variant calling.
The graph reference genome is built using genomes from a wide range of human populations, including data from the 1000 Genomes Project and the Simons Genome Diversity Project Datasets. The differences between genomes are stored for every position in the genome using a directed acyclic graph. Given that human genomes are 99.9% identical, the increase in storage requirements is minimal with the addition of a new genome to the population reference graph, allowing analysis to be carried out at a previously impossible scale.
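To make the storage argument concrete, here is a minimal sketch of a variation graph stored as a directed acyclic graph. It is an illustration only, not the data structure Graph WGS actually uses; the `VariationGraph` class, its methods, and the example sequences are all hypothetical.

```python
from collections import defaultdict


class VariationGraph:
    """Toy directed acyclic graph whose nodes carry sequence fragments.

    Hypothetical illustration; not the Graph WGS implementation.
    """

    def __init__(self):
        self.seq = {}                    # node id -> sequence fragment
        self.edges = defaultdict(list)   # node id -> successor node ids

    def add_node(self, node_id, sequence):
        self.seq[node_id] = sequence

    def add_edge(self, src, dst):
        self.edges[src].append(dst)

    def stored_bases(self):
        """Total number of bases kept in the graph."""
        return sum(len(s) for s in self.seq.values())

    def haplotypes(self, start, end):
        """Return every sequence spelled by a path from start to end."""
        if start == end:
            return [self.seq[start]]
        results = []
        for nxt in self.edges[start]:
            for tail in self.haplotypes(nxt, end):
                results.append(self.seq[start] + tail)
        return results


def build_example():
    """Two genomes that differ by a single SNP, stored as one graph."""
    g = VariationGraph()
    g.add_node(1, "ACGT")   # sequence shared by both genomes
    g.add_node(2, "A")      # reference allele
    g.add_node(3, "G")      # alternate allele (SNP)
    g.add_node(4, "TTCA")   # sequence shared by both genomes
    g.add_edge(1, 2)
    g.add_edge(1, 3)
    g.add_edge(2, 4)
    g.add_edge(3, 4)
    return g
```

In this toy example the graph stores 10 bases, whereas storing the two 9-base genomes separately would take 18. Shared sequence is kept once and only the differing position is duplicated.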
Sequencing reads are aligned to the reference graph in a two-step process. In the first step, regions of the graph to which a read is likely to map are identified using a fast global search algorithm. In the second step, the read is aligned against the graph reference genome using a precise local alignment algorithm. Following the alignment step, a Bayesian variant caller is used to detect variants from the aligned reads.
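The two-step idea (a fast search for candidate regions, then a precise local alignment) can be sketched as follows. This is a simplified illustration under stated assumptions: it works on a single linear sequence rather than a graph, uses exact k-mer lookup as a stand-in for the global search, and uses a Smith-Waterman score as a stand-in for the local alignment. The actual algorithms in Graph WGS are not described on this page, and all function names and parameters below are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_offsets(read, index, k):
    """Step 1: rank reference offsets by the number of exact k-mer hits."""
    votes = defaultdict(int)
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], []):
            votes[pos - j] += 1
    return sorted(votes, key=lambda off: (-votes[off], off))


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Step 2: best local alignment score between a and b (linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def align(read, reference, k=4, pad=2, max_candidates=3):
    """Return (window_start, score) for the best candidate window, or None."""
    index = build_kmer_index(reference, k)
    best = None
    for off in candidate_offsets(read, index, k)[:max_candidates]:
        start = max(0, off - pad)
        end = min(len(reference), off + len(read) + pad)
        score = smith_waterman_score(read, reference[start:end])
        if best is None or score > best[1]:
            best = (start, score)
    return best
```

The expensive local alignment runs only on the few windows that the cheap lookup proposes, which is the reason for splitting the work into two steps. A read that shares no k-mer with the reference yields no candidates, so `align` returns `None`.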
Unlike linear alignment methods, which treat all genetic variants as novel, the graph aligner incorporates known variants to produce more precise alignments, which in turn gives more accurate detection of both known and novel variants.
Graph technology enables you to detect not only complex variants, but also cases where a complex variant is phased with a SNP. This allows you to find multiple interconnected or correlated mutation events in the genome.
Unlike any of the current linear alignment methods, the graph aligner can also detect variants within variants, because it is implemented on top of variation graphs.
Yes. As long as your input contains the required read type, file format, and metadata, you can input similar types of data, such as whole human exome sequencing data. However, bear in mind that Graph WGS has been optimised for whole genome sequencing analysis, and its strength lies in its ability to identify traditionally elusive long structural variations and variations in highly repetitive genomic regions.
You can choose between the human genome reference builds v37 and v38 when defining the app settings for your Graph WGS draft task. Seven Bridges have built graph reference genomes corresponding to HGv37 and HGv38, using available variant information. Graph reference genome build v37 is recommended because it contains better variant characterisation: few public variant datasets have been built on top of v38.
Graph WGS accepts any of the following input formats: FQ, FASTQ, FQ.GZ, FASTQ.GZ. Read more about input file requirements.
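If you want to check file names before uploading, a small helper along these lines may be useful. This is a sketch, not part of the Seven Bridges Platform: `is_accepted_input` is a hypothetical name, and treating extensions as case-insensitive is an assumption. It checks only the file name and does not cover the other input file requirements, such as metadata.

```python
# Suffixes corresponding to the formats listed above (FQ, FASTQ, FQ.GZ, FASTQ.GZ).
ACCEPTED_SUFFIXES = (".fq", ".fastq", ".fq.gz", ".fastq.gz")


def is_accepted_input(filename):
    """Return True if the file name ends with one of the accepted suffixes.

    Assumption: extensions are compared case-insensitively.
    """
    return filename.lower().endswith(ACCEPTED_SUFFIXES)
```

For example, `is_accepted_input("sample_1.fastq.gz")` returns `True`, while `is_accepted_input("sample.bam")` returns `False`.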
When you run Graph WGS on the Seven Bridges Platform, you will be charged per execution. For example, performing a whole genome sequencing analysis using two high quality paired-end raw FASTQ files (~80 GB) – similar to the task described in the Graph WGS tutorial – will cost you about $14.10.
Running one whole genome sequencing analysis using high-quality raw FASTQ (~80 GB), like the one described in the Graph WGS tutorial, will take you about 8 hours. The exact time will depend on the compression of your FASTQ inputs.
Graph WGS is available by request to Seven Bridges Platform users running Amazon Web Services cloud infrastructure. Learn how to be one of the first people to use Graph WGS.
How can I run Graph WGS on the Seven Bridges Platform using the Google Cloud Platform infrastructure?
If your task running Graph WGS fails, the error is immediately analysed and you will shortly be contacted by one of the Seven Bridges bioinformaticians with advice on how to get the task up and running.
As a Graph WGS user, you do not have access to the Graph WGS task logs. However, if your task does fail, one of the Seven Bridges bioinformaticians will be in touch to help with troubleshooting.
Graph WGS is a standalone app and cannot be connected to other tools or workflows using the Workflow Editor.
As a Graph WGS user, you do not have access to the Common Workflow Language (CWL) JSON file for the app. If you request the CWL JSON for Graph WGS via the API, a limited number of fields will be returned.
Users don't have access to the Graph WGS Docker image.
Users cannot edit the Graph WGS app or see any of its components.
Once your Graph WGS task has finished running, you can view and download the output files.