Large datasets hosted on corporate or academic clusters and workstations can be uploaded to the Seven Bridges Platform using the Seven Bridges Command Line Uploader. This is a fast and secure upload client that has been optimized to efficiently upload files to the Seven Bridges Platform, taking advantage of parallelization where possible.
The Seven Bridges Platform command line uploader requires Java 1.8 or newer. To check which version is installed, issue the following command in a terminal window:
$ java -version
Then look for the version number in the first line of the output. It should look something like this:
java version "1.8.0_20"
Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)
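If you want to script this check, the following is a minimal sketch, assuming a bash shell. The function name java_version_ok is hypothetical and not part of the uploader. It handles both the legacy 1.x numbering shown above and the newer scheme in which the major version comes first (for example 11.0.2).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the uploader): succeeds if the given
# first line of `java -version` output reports Java 1.8 or newer.
java_version_ok() {
  local line="$1" version major rest minor
  # Extract the quoted version string, e.g. 1.8.0_20 or 11.0.2.
  version=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  [ -n "$version" ] || return 1
  # Keep only the leading digits of the first component.
  major=${version%%.*}
  major=${major%%[!0-9]*}
  [ -n "$major" ] || return 1
  if [ "$major" -gt 1 ]; then
    return 0   # Java 9 and later report the major version first
  fi
  # Legacy scheme: 1.<minor>.<patch>, e.g. 1.8.0_20
  rest=${version#*.}
  minor=${rest%%.*}
  minor=${minor%%[!0-9]*}
  [ -n "$minor" ] && [ "$minor" -ge 8 ]
}

# Usage (java writes its version to standard error, hence 2>&1):
#   java_version_ok "$(java -version 2>&1 | head -n 1)" && echo "Java is recent enough"
```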
This feature depends on your cloud infrastructure provider
To use this feature, you should know the cloud infrastructure provider on which you run the Seven Bridges Platform: Amazon Web Services in the US (AWS US East); Amazon Web Services in Frankfurt, Germany (AWS EU); or Google Cloud Platform.
If you didn't choose a cloud provider when you signed up for the Platform, you are using AWS US East. If you signed up in early 2016 or later, you had the option to select your cloud provider from AWS and Google Cloud Platform.
If you signed up in early 2017 or later, you had the additional option to select AWS EU as your cloud provider.
- Download the Command Line Uploader:
- Unpack the uploader to a directory of your choice. Your home directory is a good default location. To do this, enter:
$ tar zxvf ~/Downloads/sbg-uploader.tgz -C ~
- Run the uploader with the -h switch to list all the available command line options (listed below) and their usage:
$ ~/sbg-uploader/bin/sbg-uploader.sh -h
sbg-uploader.sh [-h] [-l] -p id [-t token] [-u username] [-x url] [-f project_folder] [-pf] file/folder [--tag "enter tag here"]
The -h option prints a short usage summary and causes the uploader to ignore any other options. The uploader then exits with a status of 0 (see Exit Statuses below).
The -l (--list-projects) option lists the projects available as upload targets, in the following two columns:
- The project identifier, which is used for specifying the target project (e.g. rfranklin/samtools, where "rfranklin" is the project owner and "samtools" is the name of the project).
- The project name (e.g. “my new project”).
The -p option selects the target project by its unique identifier. Use the --list-projects option to find the project identifier for your project.
To upload files to a project, you must be a member of that project and must have the write permission granted by the project administrator.
This option is mandatory.
The -t (--token) option specifies an authentication token. It overrides the credentials read from the configuration file (see below).
The -u option specifies a username for the Seven Bridges Platform. If it is omitted and the --token option is not used, you will be prompted for a username.
This option can be used only by users who haven't connected their Seven Bridges Platform account with an eRA account.
The -x option specifies a proxy server through which the uploader should connect. The proxy parameter should be of the form proto://[username:password@]host[:port]. proto can be 'http' or 'socks'. HTTP proxies must allow the CONNECT command to port 443. Both SOCKS4 and SOCKS5 proxies are supported.
- username and password are optional and will be used if the proxy requires authentication.
- host is required.
- port is optional. If omitted, the uploader will use 8080 for HTTP and 1080 for SOCKS proxies.
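To illustrate the proxy format, here is a minimal sketch that checks a proxy string and fills in the default port described above. The function name normalize_proxy is hypothetical; the uploader performs its own parsing, and this helper only mirrors the documented rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validates a proxy string of the form
#   proto://[username:password@]host[:port]
# and prints it with the default port added when none is given.
normalize_proxy() {
  local url="$1" proto default_port rest auth hostport host port
  case "$url" in
    http://*)  proto=http;  default_port=8080 ;;
    socks://*) proto=socks; default_port=1080 ;;
    *) return 1 ;;                      # only http and socks are accepted
  esac
  rest=${url#*://}
  case "$rest" in
    *@*) auth="${rest%@*}@"; hostport=${rest##*@} ;;   # optional credentials
    *)   auth="";            hostport=$rest ;;
  esac
  case "$hostport" in
    *:*) host=${hostport%%:*}; port=${hostport##*:} ;;
    *)   host=$hostport;       port=$default_port ;;
  esac
  [ -n "$host" ] || return 1            # host is required
  case "$port" in
    ''|*[!0-9]*) return 1 ;;            # port must be numeric
  esac
  printf '%s://%s%s:%s\n' "$proto" "$auth" "$host" "$port"
}
```

For example, normalize_proxy socks://alice:secret@10.0.0.5 prints socks://alice:secret@10.0.0.5:1080, where the host and credentials are placeholder values.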
Use the -f option to specify the destination folder in the target project on the Platform into which the items will be uploaded. If the specified folder does not exist in the destination project, it is created. If it does exist, items are uploaded into it.
Use the -pf option to upload one or more local folders to the Platform while replicating their local folder structure in the destination project. This option can be combined with -f to specify the destination folder into which the local folder structure is uploaded. If -pf is not used, the folder structure will be flattened when uploaded to the destination project on the Platform.
Use the --tag option to add tags, composed of strings, to your files. Repeat the option once per tag, as follows:
--tag "first tag here" --tag "second tag here" --tag "third tag here"
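When tags come from a list, the repeated --tag arguments can be built in a bash array so that tags containing spaces stay intact. This is a minimal sketch; build_tag_args and TAG_ARGS are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fills the TAG_ARGS array with one
# `--tag <value>` pair for every tag passed as an argument.
build_tag_args() {
  TAG_ARGS=()
  local tag
  for tag in "$@"; do
    TAG_ARGS+=(--tag "$tag")
  done
}

# Usage (token, project and file name are placeholders):
#   build_tag_args "first tag here" "second tag here"
#   ./sbg-uploader.sh -t "$AUTH_TOKEN" -p usr1/prj1 sample1.fastq "${TAG_ARGS[@]}"
```

Quoting "${TAG_ARGS[@]}" passes each tag to the uploader as a single argument, which is the same effect as quoting each tag by hand.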
Use this option to list all the tags in your destination project.
Use the --manifest-metadata option to upload multiple files and set their metadata using the manifest file. To apply only individual metadata fields from the manifest, list them after the --manifest-metadata option, e.g. --manifest-metadata sample paired_end.
This option should be used in combination with the
Use the --manifest-file option to specify the name of the manifest file. This option should be used in combination with --manifest-metadata, e.g. --manifest-metadata --manifest-file filename.csv.
To only upload files while omitting the metadata use
Use the --dry-run option to only output data in the terminal and check the settings without uploading anything. To output information about specific metadata fields, list them after the --dry-run option, e.g. --dry-run sample library.
Let's assume we want to upload three files into a folder named samples that does not currently exist within our project on the Platform:
./sbg-uploader.sh -t token1 -p usr1/prj1 -f samples sample1.fastq sample2.fastq sample3.fastq
The folder is automatically created and the three .fastq files are uploaded into the folder. The resulting folder structure is as follows:
Now we upload two additional files into the same samples folder:
./sbg-uploader.sh -t token1 -p usr1/prj1 -f samples sample1.fastq sample4.fastq
Because sample1.fastq has the same name and size as the file that is already present at the destination, it is skipped. The other file, sample4.fastq, is uploaded to the folder within the project.
If the size of sample1.fastq were different from the size of the file already present on the Platform, the file would be uploaded and automatically prefixed with a number and an underscore.
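The skip and rename rules described above can be summarized in a short sketch. This only illustrates the documented behavior and is not the uploader's own code; upload_decision and its return strings are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the rules described above.
#   $1: size of the local file in bytes
#   $2: size of the same-named file at the destination,
#       or an empty string if no such file exists there
upload_decision() {
  local local_size="$1" remote_size="$2"
  if [ -z "$remote_size" ]; then
    echo "upload"               # no file with this name at the destination
  elif [ "$local_size" -eq "$remote_size" ]; then
    echo "skip"                 # same name and same size
  else
    echo "upload-with-prefix"   # same name, different size
  fi
}
```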
Let's say we now want to add a file to a non-existing subfolder on the Platform:
./sbg-uploader.sh -t token1 -p usr1/prj1 -f samples/march/week4 sample5.fastq
This command creates the march/week4 subfolder structure within the target folder and uploads the specified file into the innermost subfolder.
Finally, let's assume that we want to upload another folder to the samples folder on the Platform and preserve its folder structure at the same time:
./sbg-uploader.sh -t token1 -p usr1/prj1 -f samples -pf april
Assuming that the structure of the april folder is:
The final folder structure on the Platform is:
To upload larger volumes of data, use the command line uploader with a manifest file.
sbg-uploader.sh -u username --manifest-file filename.csv
Getting the authentication token
You can obtain an authentication token for your Seven Bridges Platform account from the Developer Dashboard.
The Seven Bridges Platform Command Line Uploader looks for credentials in the following locations in order:
- If the -u username option is given, the uploader prompts for and reads the password from standard input.
- If the -t token option is given, the uploader uses your authentication token.
- The uploader looks for an authentication token in the configuration file (see below).
- The uploader looks for the username and password in the configuration file (see below).
- The uploader prompts for and reads the username and password from standard input.
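The lookup order above can be expressed as a simple chain of checks. The sketch below is a hypothetical illustration, not the uploader's implementation; credential_source and the labels it prints are names invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the credential lookup order listed above.
#   $1: value of -u on the command line (empty if not given)
#   $2: value of -t on the command line (empty if not given)
#   $3: authentication token from the configuration file (empty if absent)
#   $4: username from the configuration file (empty if absent)
#   $5: password from the configuration file (empty if absent)
credential_source() {
  local cli_user="$1" cli_token="$2" cfg_token="$3" cfg_user="$4" cfg_pass="$5"
  if [ -n "$cli_user" ]; then
    echo "prompt-for-password"
  elif [ -n "$cli_token" ]; then
    echo "command-line-token"
  elif [ -n "$cfg_token" ]; then
    echo "config-file-token"
  elif [ -n "$cfg_user" ] && [ -n "$cfg_pass" ]; then
    echo "config-file-username-password"
  else
    echo "prompt-for-username-and-password"
  fi
}
```

For instance, credential_source "" "" "sometoken" "johndoe" "supersecret123" prints config-file-token, because a token in the configuration file is checked before a stored username and password.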
To avoid providing your credentials each time you use the uploader, you can store them in a configuration file. This file is called .sbgrc and resides in your home directory. This location varies across operating systems, but would typically be:
/home/$USER/.sbgrc on UNIX;
/Users/$USER/.sbgrc on OS X;
C:\Documents and Settings\%USERNAME% on Windows XP, 2000 and 2003; and
C:\Users\%USERNAME% on Windows Vista, 7, 8 and 10.
The .sbgrc configuration file should contain key-value pairs of the following form:
username = johndoe
password = supersecret123
auth-token = ec43d6dce3c54193ac18e3855f734ccf
You can specify the username and password, or the authentication token, or both. If both are given, then the authentication token will take precedence. The uploader will use these values only if no other authentication options are provided on the command line.
Please keep in mind that the .sbgrc configuration file may contain only a single set of credentials. If multiple "username", "password" or "token" lines are encountered, the uploader will disregard all values but the last.
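To see which value would win when a key is repeated, you can read the file the same way. The following is a minimal sketch, assuming the "key = value" layout shown above; sbgrc_get is a hypothetical helper and not part of the uploader.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the value stored for a key in an
# .sbgrc-style file of "key = value" lines. If the key occurs more
# than once, the last occurrence wins. Fails if the key is absent.
#   $1: key to look up, e.g. username
#   $2: path to the file (defaults to ~/.sbgrc)
sbgrc_get() {
  local key="$1" file="${2:-$HOME/.sbgrc}"
  [ -r "$file" ] || return 1
  awk -v key="$key" '
    {
      pos = index($0, "=")
      if (pos == 0) next                 # skip lines without "="
      k = substr($0, 1, pos - 1)
      v = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)     # trim whitespace around the key
      gsub(/^[ \t]+|[ \t]+$/, "", v)     # trim whitespace around the value
      if (k == key) { value = v; found = 1 }
    }
    END { if (!found) exit 1; print value }
  ' "$file"
}

# Usage:
#   sbgrc_get username
#   sbgrc_get auth-token /path/to/.sbgrc
```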
Files on the Seven Bridges Platform are accompanied by metadata describing, among other things, their file type, origin, sample ID, and information about the sequencing technology used to create them. This metadata is often required by tools and workflows, and must be set before a file becomes fully usable. You can set metadata using the command line uploader, or set it manually later.
Normal termination. The upload has either finished successfully, or usage information was written to standard output.
The upload has failed in the pre-processing phase or the uploader was unable to initialize it properly.
Input arguments were not properly set.
Mandatory options were not set.
Authentication error; invalid user credentials were used.
Bad metadata file.
Abnormal termination; an unknown error caused the upload to fail.
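In scripts, the exit status can be checked right after the uploader returns. The sketch below only distinguishes status 0 (normal termination) from any non-zero status; run_and_report is a hypothetical wrapper, not part of the uploader.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: runs the given command, reports whether it
# terminated normally (status 0), and passes the status through.
run_and_report() {
  "$@"
  local status=$?
  if [ "$status" -eq 0 ]; then
    echo "normal termination"
  else
    echo "failed with exit status $status" >&2
  fi
  return "$status"
}

# Usage (token, project and file name are placeholders):
#   run_and_report ./sbg-uploader.sh -t "$AUTH_TOKEN" -p usr1/prj1 sample1.fastq
```

Because the wrapper returns the original status, it can still be used in conditions such as if run_and_report ...; then ...; fi.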
For example, suppose you want to use the Command Line Uploader to upload the FASTQ file sample1.fastq, with the associated metadata file, sample1.fastq.meta, to a project whose ID is 1234. In that case, you should enter:
sbg-uploader$ bin/sbg-uploader.sh -t $AUTH_TOKEN -p 1234 sample1.fastq --tag "fastq" --tag "sample 1"
Replace $AUTH_TOKEN with your own authentication token.
As shown in the example above, don't forget to change directory to the one containing the sbg-uploader, and to prefix the executable name with