Upload via the command line

Overview

The Seven Bridges CLI Uploader is the recommended tool for uploading data to the Seven Bridges Platform. It is integrated into the Seven Bridges CLI.

The Command Line Uploader is recommended for large-scale uploads. For smaller uploads, we recommend using the Web uploader.

The maximum number of files you can submit for upload is 250,000.

Command line options

Syntax:

sb [global-parameters] upload <upload-subcommand> [command-parameters]

The supported global parameters are:

Option | Short parameter | Description
--config <string> | | Configuration file to use instead of the default one.
--help | -h | Display help.
--profile <string> | | Configuration profile to use from the credentials file (default "default"). This parameter applies only to the "start" subcommand. The "resume" and "delete" subcommands use the same profile that was used to start the upload job, and the parameter does not apply to the "status" subcommand.
--debug | | Run the command with debug information in the output.

Subcommands

Subcommand | Description
start | Start the upload job.
status | Check the upload status.
resume | Resume the upload job.
delete | Delete an upload job. Please note that only jobs that are completed or paused can be deleted. To pause an upload job, use CTRL+C.

Read below for detailed information and instructions for all of the commands.

Start the upload

This command initializes a new upload job. Each job is identified by a unique name, which can be assigned manually (using the corresponding command parameter) or automatically. The job name can be used to track the upload status and to resume the job if it is paused.

The upload is performed in the foreground, in the currently active command shell session. The session needs to remain open while the upload command is running; otherwise the upload will be paused, after which it can be resumed. While the upload is running, the session does not accept any other input.

You can perform multiple uploads (i.e. start multiple upload jobs) on a single machine. The following are the statuses for an upload job:

  • Initializing - the list of items (files and folders) which will be uploaded is being prepared.
  • Running - the items are being uploaded.
  • Paused - the upload job has been interrupted, either by terminating the active command shell or by pressing CTRL+C.
  • Completed - all items have been processed and the upload job is done; this doesn't necessarily mean that all items have been uploaded successfully. The log file contains more detailed information about the upload job, including failed items. The status command provides an overview of the upload job, including the number of uploaded, failed, skipped, and remaining items.

sb [global-parameters] upload start [<input> | --manifest-file <manifest-file>] --destination <path> [command-parameters]

Required parameters

Name | Description
--destination <string> | Upload destination, which can be either the project root or a specific folder inside a Platform project. The destination can be specified either by project name or ID (e.g. "sbguser/sbgproject" or "sbguser/sbgproject/directory1" or "5dc01c9ae4b01e2d090a700a").

The project name should be specified by combining the information about the project owner and the name {project_owner}/{project}, where {project_owner} is the username of the user who created the project and {project} is the project slug (e.g. "rfranklin/my-project").
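
As an illustration of the destination formats described above, the following Python sketch splits a --destination value into its parts. The function and class names are hypothetical and are not part of the Seven Bridges CLI. The sketch assumes that a value without a slash is a project ID; the CLI itself may apply different or stricter rules.

```python
from typing import NamedTuple, Optional


class Destination(NamedTuple):
    project_id: Optional[str]  # set when the value is assumed to be a project ID
    owner: Optional[str]       # username of the project owner
    project: Optional[str]     # project slug
    folder: Optional[str]      # folder path inside the project, if any


def parse_destination(value: str) -> Destination:
    """Split a --destination value into its parts (illustrative only)."""
    parts = [p for p in value.strip().split("/") if p]
    if not parts:
        raise ValueError("destination must not be empty")
    if len(parts) == 1:
        # Assumption: a value without a slash is a project ID.
        return Destination(parts[0], None, None, None)
    owner, project, *rest = parts
    folder = "/".join(rest) if rest else None
    return Destination(None, owner, project, folder)
```

For example, parse_destination("sbguser/sbgproject/directory1") yields the owner "sbguser", the project "sbgproject" and the folder "directory1".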

Optional parameters

Name | Description
<input> | The item (file or folder) or list of items to be uploaded. By default, the source folder structure is preserved on the Platform. The <input> and --manifest-file parameters are mutually exclusive, and one of them must be included within the start command.
--manifest-file <manifest-file> | The manifest file, which contains the list of items that will be uploaded as well as the accompanying metadata (only applies to files). By default, the source folder structure is preserved on the Platform after the upload. The <input> and --manifest-file parameters are mutually exclusive, and one of them must be included within the start command. Learn about the manifest file format.
--name <string> | The upload job name, which must be unique. The allowed characters are A-Za-z0-9_-. The maximum number of characters is 20. If omitted, the name will be automatically assigned using the format "DAY_DDMMMYYYY_HHMMSS" (for example, Thu_27Feb2020_231455).
--autorename | If a file with the same name already exists at the upload destination, an underscore followed by a serial number will be added as a prefix. In case of a folder, the contents of both folders will be merged. In case a file and a folder bear the same name, the upload will be skipped. The --autorename and --overwrite parameters are mutually exclusive. If both are omitted, SKIP will be used as the default method.
--overwrite | If a file with the same name already exists at the upload destination, it will be overwritten by the file that is uploaded. In case of a folder, the contents of both folders will be merged. In case a file and a folder bear the same name, the upload will be skipped. The --autorename and --overwrite parameters are mutually exclusive. If both are omitted, SKIP will be used as the default method.
--tag <string> | Tag which will be set for each file once its upload is complete. Please note that tagging is not applicable to folders on the Seven Bridges Platform. A maximum of 32 tags can be set, and each tag can be at most 64 characters long.
--chunk-size <int> | Preferred size of the upload part in bytes. If omitted, the default value of 64 MiB will be used. The minimum value is 5243000. If you have an unstable network connection, we recommend using a smaller value, for example --chunk-size 8000000.
--parallel <int> | Maximum number of parallel file uploads. The allowed range for this value is 1 to 8. The default value is 8. If you have a low upload speed or a low (filesystem) read speed (e.g. magnetic disks), or if you want to limit the system resources used by the uploader, we recommend using --parallel 4 or --parallel 2.
--speed-limit <int> | Maximum allowed network bandwidth for the process that executes this upload job. Should be specified in kbps (kilobits per second). If omitted, the maximum possible bandwidth will be used.
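
The limits listed for the optional parameters can be checked before a job is started. The following Python sketch is one possible way to do that. The function name and messages are hypothetical, the numeric limits are copied from the descriptions above, and job name uniqueness is not checked because it is only known to the uploader itself.

```python
import re

NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,20}")  # allowed characters, max 20
MIN_CHUNK_SIZE = 5243000  # minimum --chunk-size in bytes, as stated above
MAX_TAGS = 32
MAX_TAG_LENGTH = 64


def check_start_options(name=None, tags=(), chunk_size=None, parallel=None,
                        autorename=False, overwrite=False):
    """Return a list of problems; an empty list means nothing was found."""
    problems = []
    if name is not None and not NAME_PATTERN.fullmatch(name):
        problems.append("--name: use 1 to 20 characters from A-Za-z0-9_-")
    if len(tags) > MAX_TAGS:
        problems.append("--tag: at most 32 tags are allowed")
    if any(len(tag) > MAX_TAG_LENGTH for tag in tags):
        problems.append("--tag: each tag can be at most 64 characters long")
    if chunk_size is not None and chunk_size < MIN_CHUNK_SIZE:
        problems.append("--chunk-size: minimum value is 5243000")
    if parallel is not None and not 1 <= parallel <= 8:
        problems.append("--parallel: allowed range is 1 to 8")
    if autorename and overwrite:
        problems.append("--autorename and --overwrite are mutually exclusive")
    return problems
```

Calling check_start_options(name="upload2", tags=["upload2"], parallel=2) returns an empty list, while a name containing a space or a --parallel value of 9 is reported as a problem.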

The following table illustrates the entire naming conflict resolution mechanism:
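
The rules stated in the --autorename and --overwrite descriptions can also be modeled in a few lines of Python. This is an illustrative sketch and not the uploader's actual implementation. The function names are hypothetical, the assumption that SKIP applies to folders as well as files in the default mode is not confirmed by the text, and the prefix format is taken from the _1_file1.bam result shown in Example 3.

```python
def resolve_conflict(local_kind, remote_kind, mode="skip"):
    """Decide what happens when an item with the same name already exists.

    local_kind and remote_kind are "file" or "folder";
    mode is "skip" (the default), "autorename" or "overwrite".
    """
    if mode not in ("skip", "autorename", "overwrite"):
        raise ValueError(f"unknown mode: {mode}")
    if local_kind != remote_kind:
        return "skip"  # a file and a folder bear the same name
    if mode == "skip":
        return "skip"  # assumption: the default applies to files and folders
    if local_kind == "folder":
        return "merge"  # folder contents are merged in both modes
    return "rename" if mode == "autorename" else "overwrite"


def autorenamed(filename, serial):
    """Build the renamed file name, following the pattern seen in Example 3."""
    return f"_{serial}_{filename}"
```

For instance, resolve_conflict("file", "file", "autorename") returns "rename", and autorenamed("file1.bam", 1) returns "_1_file1.bam".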

Check the upload status

Use the status command to check the upload status. You can check the status for all upload jobs by omitting the name parameter.

sb [global-parameters] upload status [<name>]

This will return the following information about each of the upload jobs:

  • Upload job name
  • Status
  • Processed (percent completed, bytes uploaded / total bytes)
  • Average upload speed (total bytes uploaded / total time spent in "Running" status)
  • ETA (Estimated Time of Arrival - expected remaining time for job completion)

To check a specific upload job, you should specify the name of the upload job. This will return more detailed information about the upload job:

  • Upload job name
  • Status
  • Log file path
  • Time submitted
  • Upload command (note: if a wildcard is used in the upload command, the expanded command will be displayed)
  • Total number of submitted files
  • Number of uploaded files (successfully uploaded to the Platform)
  • Number of skipped files
  • Number of failed files
  • Number of remaining files (files in "Queued" and "Uploading" state)
  • Processed (percent completed, bytes uploaded / total bytes)
  • Average upload speed (total bytes uploaded / total time spent in "Running" status)
  • ETA (Estimated Time of Arrival - expected remaining time for job completion)

Information about a completed or paused job is kept in the job history for at least 1 month after the job has been completed or paused, unless the job has been deleted (using the sb upload delete command, see below). After this time, completed and paused jobs are removed from the list of upload jobs.
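
The average upload speed defined above is a simple ratio, which the following Python sketch works through. The function names are hypothetical. The ETA formula (remaining bytes divided by the average speed) is an assumption about how such an estimate is typically derived and is not confirmed by this documentation.

```python
def average_speed(bytes_uploaded, seconds_running):
    """Total bytes uploaded / total time spent in "Running" status."""
    if seconds_running <= 0:
        return 0.0
    return bytes_uploaded / seconds_running


def estimate_eta_seconds(bytes_uploaded, total_bytes, seconds_running):
    """Assumed ETA: remaining bytes divided by the average speed so far."""
    remaining = total_bytes - bytes_uploaded
    if remaining <= 0:
        return 0.0  # nothing left to upload
    speed = average_speed(bytes_uploaded, seconds_running)
    if speed == 0:
        return None  # no estimate is possible yet
    return remaining / speed
```

With these definitions, a job that uploaded 114 bytes in 3 seconds of "Running" time has an average speed of 38 bytes per second.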

Optional parameters

Name | Description
<name> | The name of the upload job. Use this parameter to obtain more detailed information about a specified job.

Resume an upload job

Use this command to resume a previously paused upload job. The upload will resume its execution from where it had been paused.

sb [global-parameters] upload resume [<name>]

Optional parameters

Name | Description
<name> | Specify the name of the paused job you wish to resume. This parameter is required unless there's only one paused job.
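
The rule for when <name> may be omitted can be expressed as a small selection function. The Python sketch below is illustrative only: the function name is hypothetical, and the jobs argument is assumed to be a mapping from job name to a status string such as "PAUSED", which a script would have to collect itself (for example from the status command).

```python
def job_to_resume(jobs, name=None):
    """Pick the paused job to resume from a {job name: status} mapping."""
    paused = [job for job, status in jobs.items() if status == "PAUSED"]
    if name is not None:
        if name not in paused:
            raise ValueError(f"{name!r} is not a paused upload job")
        return name
    if len(paused) == 1:
        return paused[0]  # the name can be omitted in this case only
    raise ValueError("a job name is required unless exactly one job is paused")
```

Given {"upload2": "PAUSED", "upload3": "COMPLETED"}, the function returns "upload2" without a name being passed.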

Delete an upload job

Use this command to delete a specified upload job or all upload jobs. Only jobs that have been completed or paused can be deleted. To pause the job, use CTRL+C.

sb [global-parameters] upload delete [<name> | --all]

Please keep in mind that an upload job is kept in the job history for (at least) 1 month after it has been completed or paused and is automatically deleted from the list after that time.

Optional parameters

Name | Description
<name> | Upload job name, required if the --all parameter is not used. The <name> and --all parameters are mutually exclusive. One or the other is required.
--all | Deletes all upload jobs which are COMPLETED or PAUSED. The <name> and --all parameters are mutually exclusive. One or the other is required.

Examples

This section lists several examples of how to upload files. The destination project that is used in all of the examples is: rfranklin/my-project.

Please note that tags are not shown in the example results.

Initial state

The following snippet shows the example of a directory tree on a local computer (from the local path where sb commands are executed):

├── dir1
├── dir2
│   ├── dir2-1
│   │   └── file4.bam
│   └── dir2-2
├── file1.bam
├── file2.bam
└── file3.bam

Example 1

Upload a folder and 2 files into the project root and tag the uploaded files.

Command

sb upload start dir1/ file1.bam file2.bam --destination rfranklin/my-project --tag upload1

Terminal output

upload job name: Thu_08Apr2021_182030
COMPLETE
Successfully uploaded 2 of 2 files

Result on the Platform

├── dir1
├── file1.bam
└── file2.bam
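
When uploads are scripted, the start command can be assembled as an argument list instead of a hand-written string. The Python sketch below uses only the parameters documented above. The function name is hypothetical, and only a single tag is supported because this page does not show how several tags are passed.

```python
import shlex


def build_start_command(inputs, destination, name=None, tag=None,
                        parallel=None, chunk_size=None,
                        autorename=False, overwrite=False):
    """Assemble the argument list for "sb upload start" (illustrative)."""
    if not inputs:
        raise ValueError("at least one input item is required")
    if autorename and overwrite:
        raise ValueError("--autorename and --overwrite are mutually exclusive")
    argv = ["sb", "upload", "start", *inputs, "--destination", destination]
    if name is not None:
        argv += ["--name", name]
    if tag is not None:
        argv += ["--tag", tag]
    if parallel is not None:
        argv += ["--parallel", str(parallel)]
    if chunk_size is not None:
        argv += ["--chunk-size", str(chunk_size)]
    if autorename:
        argv.append("--autorename")
    if overwrite:
        argv.append("--overwrite")
    return argv


# Rebuild the command from Example 1 and print it as a shell command line.
example1 = build_start_command(["dir1/", "file1.bam", "file2.bam"],
                               "rfranklin/my-project", tag="upload1")
print(shlex.join(example1))
```

The resulting list can be passed to subprocess.run(example1, check=True) on a machine where the sb command is installed and configured.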

Example 2

Upload a folder (with underlying folder structure and one file included) and 3 files into an existing folder within the project, while:

  • assigning a name to the upload job (for easier status tracking)
  • tagging uploaded files
  • and specifying a custom setting for the number of parallel file uploads (covering the low upload speed case).

Command

sb upload start dir2 file1.bam file2.bam file3.bam --destination rfranklin/my-project/dir1 --name upload2 --tag upload2 --parallel 2

Terminal output

upload job name: upload2
COMPLETE
Successfully uploaded 4 of 4 files

Result on the Platform

The following will be the result on the Platform, assuming that Example 1 (see above) has already been executed and that its result remains intact:

├── dir1
│   ├── dir2
│   │   ├── dir2-1
│   │   │   └── file4.bam
│   │   └── dir2-2
│   ├── file1.bam
│   ├── file2.bam
│   └── file3.bam
├── file1.bam
└── file2.bam

Example 3

Upload 2 folders (with underlying folder structure and one file included) and 1 file into the project root, while:

  • choosing auto-rename as a method for resolving name conflict
  • tagging uploaded files
  • and specifying a custom setting for the chunk size (covering the unstable network connection case).

Command

sb upload start dir1/ dir2/ file1.bam --destination rfranklin/my-project --autorename --tag upload3 --chunk-size 8000000

Terminal output

upload job name: Thu_08Apr2021_182550
COMPLETE
Successfully uploaded 2 of 2 files

Result on the Platform

The following will be the result on the Platform, assuming that the two previous examples have already been executed:

├── _1_file1.bam
├── dir1
│   ├── dir2
│   │   ├── dir2-1
│   │   │   └── file4.bam
│   │   └── dir2-2
│   ├── file1.bam
│   ├── file2.bam
│   └── file3.bam
├── dir2
│   ├── dir2-1
│   │   └── file4.bam
│   └── dir2-2
├── file1.bam
└── file2.bam

Example 4

Check the status for all upload jobs.

Command

sb upload status

Terminal output

Upload job name       Status                 Processed  Average speed  Estimated time  
                                                                                       
Thu_08Apr2021_182030  COMPLETED    100% (85.00/85.00B)      85.00 Bps  N/A  
upload2               COMPLETED  100% (114.00/114.00B)      38.00 Bps  N/A  
Thu_08Apr2021_182550  COMPLETED    100% (68.00/68.00B)      68.00 Bps  N/A
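
If the status overview needs to be processed by a script, the columns can be split on runs of two or more spaces. The Python sketch below parses the sample output shown above. The function name and dictionary keys are hypothetical, and the sketch assumes that real output keeps the same five-column layout, which may differ between CLI versions.

```python
import re

SAMPLE = """\
Upload job name       Status                 Processed  Average speed  Estimated time

Thu_08Apr2021_182030  COMPLETED    100% (85.00/85.00B)      85.00 Bps  N/A
upload2               COMPLETED  100% (114.00/114.00B)      38.00 Bps  N/A
Thu_08Apr2021_182550  COMPLETED    100% (68.00/68.00B)      68.00 Bps  N/A
"""


def parse_status_table(text):
    """Turn the status overview into a list of dictionaries, one per job."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    rows = []
    for line in lines[1:]:  # the first non-empty line is the header
        fields = re.split(r"\s{2,}", line)
        if len(fields) != 5:
            continue  # skip anything that does not match the assumed layout
        name, status, processed, speed, eta = fields
        rows.append({"name": name, "status": status, "processed": processed,
                     "average_speed": speed, "eta": eta})
    return rows


jobs = parse_status_table(SAMPLE)
print([job["name"] for job in jobs])
```

Splitting on two or more spaces keeps values such as "100% (85.00/85.00B)" and "85.00 Bps" intact, since they contain only single spaces.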

Example 5

Get the detailed status for an upload job.

Command

sb upload status upload2

Terminal output

Upload job name:       upload2  
Status:                COMPLETED  
Log file path:         /home/nikola/.sevenbridges/sb/logs/sb.log  
Time submitted:        08.04.2021 18:23  
Command:               sb upload start dir2 file1.bam file2.bam file3.bam --destination rfranklin/my-project/dir1 --name upload2 --tag upload2 --parallel 2  
Total files:           4  
Total size:            114.00 B  
# uploaded:            4  
# skipped:             0  
# failed:              0  
# remaining:           0  
% processed:           100%  
Average upload speed:  38.00 Bps  
ETA                    N/A

Example 6

Delete a single upload job and check the status.

Commands

sb upload delete upload2
sb upload status

Terminal output

Upload job name       Status               Processed  Average speed  Estimated time  
                                                                                     
Thu_08Apr2021_182030  COMPLETED  100% (85.00/85.00B)      85.00 Bps  N/A  
Thu_08Apr2021_182550  COMPLETED  100% (68.00/68.00B)      68.00 Bps  N/A

Example 7

Delete all upload jobs and check the status.

Commands

sb upload delete --all
sb upload status

Terminal output

Upload job name       Status               Processed  Average speed  Estimated time