The Seven Bridges Knowledge Center

The Seven Bridges Platform is a simple solution for doing bioinformatics at industrial scale. But sometimes, everyone needs a little help.

Set metadata using the command line uploader

You can use the Command Line Uploader to set some or all of the metadata during upload. Or, you can manually set metadata later.

Set metadata for a single file

This feature depends on your cloud infrastructure provider

To use this feature, you should know which cloud infrastructure provider you run the Seven Bridges Platform on: Amazon Web Services in the US (AWS US East); Amazon Web Services in Frankfurt, Germany (AWS EU); or Google Cloud Platform.

If you didn't choose a cloud provider when you signed up for the Platform, you are using AWS US East.

If you signed up in early 2016 or later, you had the option to choose between AWS and Google Cloud Platform as your cloud provider. If you signed up in early 2017 or later, you also had the option to choose AWS EU.

For each file queued for upload, the Uploader looks for a supplementary file containing metadata to set for the file. This supplementary file must be in the same directory as the file being uploaded and have the same name as that file, with .meta appended.

For example, if you are uploading sample1.fastq, the supplementary file should be named sample1.fastq.meta.

The supplementary file should contain a valid JSON object, as shown in the example below. Key-value pairs from this JSON object will be set on the server as metadata describing the uploaded file. For a list of key-value pairs that should be used to set a file's metadata, see the section on the JSON metadata schema in our documentation on metadata.

If the supplementary .meta file contains invalid JSON or metadata values that fall outside of their acceptable range, a warning will be issued on the standard output, but the file upload will continue. Note that if you set invalid metadata values, the workflows you use with your files may not function correctly.
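Because an invalid .meta file only produces a warning, it can be useful to check the file before starting an upload. The following is a minimal sketch, not part of the Command Line Uploader: the function name check_meta_text is hypothetical, and the check covers JSON syntax only. It does not verify that values fall within the acceptable ranges of the metadata schema.

```python
import json


def check_meta_text(text):
    """Return (ok, message) for the contents of a .meta file.

    Hypothetical helper: it only checks that the text is a JSON object.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"
    # Metadata is given as key-value pairs, so the top-level value must be an object.
    if not isinstance(parsed, dict):
        return False, "top-level JSON value is not an object"
    return True, "ok"
```

To use it, read the supplementary file as UTF-8 text and pass its contents to check_meta_text before running the Uploader.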

Supplementary files do not need to be included for upload in order for their metadata to be applied to the files being uploaded. Parsing and assigning metadata from supplementary files happens automatically as long as they are properly matched to their principal files via the naming convention described above.

The following JSON object is an example of the metadata that could be contained in the metadata file sample1.fastq.meta:

{
  "sample_id": "sample1",
  "library_id": "library1",
  "paired_end": "1",
  "platform": "illumina HiSeq",
  "quality_scale": "illumina13"
}
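If you have many files, a supplementary file like the one above could be generated by a short script. The sketch below follows the naming convention described earlier (same directory, same name, .meta appended). The helper names meta_path_for and write_meta_sidecar are illustrative assumptions and are not provided by the Platform.

```python
import json
from pathlib import Path


def meta_path_for(data_file):
    """Return the supplementary file path: same directory, same name, '.meta' appended."""
    data_path = Path(data_file)
    return data_path.with_name(data_path.name + ".meta")


def write_meta_sidecar(data_file, metadata):
    """Write metadata as a JSON object next to data_file and return the new path."""
    meta_path = meta_path_for(data_file)
    meta_path.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return meta_path
```

For example, write_meta_sidecar("sample1.fastq", {"sample_id": "sample1", "paired_end": "1"}) would create sample1.fastq.meta in the current directory.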

For Seven Bridges users on AWS:

If you are using old-style projects and want to set metadata using the Command Line Uploader, use the following JSON object instead of the example above:

{
  "file_type": "fastq",
  "sample": "sample1",
  "library": "library1",
  "paired_end": "1",
  "qual_scale": "illumina13",
  "seq_tech": "illumina"
}

Apart from the standard set of metadata fields that can be seen through the visual interface, you can also add custom metadata for your files. Custom metadata fields are user-defined key-value pairs that allow you to provide additional metadata associated with files on the Platform. Custom metadata can be added via the Command Line Uploader or via the API, but not through the visual interface.

Custom metadata fields will not be visible on the visual interface, but their values can be retrieved by getting file details via the API.

When adding custom metadata fields, you need to pay attention to the following set of rules:

  • Keys and values are case sensitive unless explicitly treated differently by a tool or a part of the Platform.
  • Maximum number of key-value pairs per file is 1000, including null-value keys.
  • Keys and values are UTF-8 encoded strings.
  • Maximum length of a key is 100 bytes (UTF-8 encoding).
  • Maximum length of a value is 300 bytes (UTF-8 encoding).
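The size rules above can be checked locally before upload. The following sketch is an illustration only: the function name validate_custom_metadata is hypothetical, and it assumes that a null-value key is represented as None after the JSON is parsed. Lengths are measured in UTF-8 bytes, not characters, so non-ASCII text reaches the limits sooner.

```python
MAX_PAIRS = 1000        # key-value pairs per file, including null-value keys
MAX_KEY_BYTES = 100     # UTF-8 bytes
MAX_VALUE_BYTES = 300   # UTF-8 bytes


def validate_custom_metadata(metadata):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    # Null-value keys count toward the limit, so every key is counted.
    if len(metadata) > MAX_PAIRS:
        problems.append(f"{len(metadata)} key-value pairs exceed the limit of {MAX_PAIRS}")
    for key, value in metadata.items():
        if not isinstance(key, str):
            problems.append(f"key {key!r} is not a string")
            continue
        if len(key.encode("utf-8")) > MAX_KEY_BYTES:
            problems.append(f"key {key!r} is longer than {MAX_KEY_BYTES} bytes")
        if value is None:
            continue  # assumed representation of a null-value key
        if not isinstance(value, str):
            problems.append(f"value for key {key!r} is not a string")
        elif len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            problems.append(f"value for key {key!r} is longer than {MAX_VALUE_BYTES} bytes")
    return problems
```

Note that a key made of 51 "é" characters is 102 bytes in UTF-8 and is therefore reported, even though it has fewer than 100 characters.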

Set metadata for multiple files using a manifest file

Metadata can be set for multiple files during the upload by supplying a manifest file which contains the metadata for a group of accompanying files.

Learn more about the manifest file format.

Upload files and set metadata

To upload multiple files and set their metadata using the manifest, issue the following command:

sbg-uploader.sh --manifest-file filename.csv --manifest-metadata

This will upload all files which are specified in the manifest (e.g. filename.csv) and apply relevant metadata for each of the files.

The --manifest-file option is used for specifying the name (and path) of the manifest file, while the --manifest-metadata option instructs the Command Line Uploader to also parse metadata values from the manifest.

Upload files and set individual metadata fields

To upload multiple files and set individual metadata fields, issue the following command:

sbg-uploader.sh --manifest-file filename.csv --manifest-metadata sample paired_end

In the example above, sample and paired_end are the only two metadata fields that will be set for the uploaded files. The metadata fields are specified after the --manifest-metadata option.

You can specify any number of metadata fields by listing them after the --manifest-metadata option.

Upload files without setting metadata

The manifest file allows you to specify multiple files for the upload without setting any metadata. This is useful in case you are dealing with larger volumes of data, or if you want to automate the upload of a fixed list of files.

To upload files which are specified in the manifest while omitting the metadata, issue the following command:

sbg-uploader.sh --manifest-file filename.csv

Perform a dry run

Before performing an actual upload you can do a dry run. This will only output data in the terminal allowing you to check if all the settings are correct without uploading anything. To perform a dry run, issue the following command:

sbg-uploader.sh --manifest-file manifest.csv --manifest-metadata --dry-run

To only output information about specific metadata fields, issue the following command:

sbg-uploader.sh --manifest-file manifest.csv --manifest-metadata --dry-run sample library

Only the sample and library metadata fields will be printed in the terminal.

You can specify any number of individual metadata fields by listing them after the --dry-run option.
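If you call the Uploader from a script, the command variants shown in this section can be assembled as an argument list. This is a sketch under stated assumptions: build_uploader_args is a hypothetical helper, it uses only the options that appear above (--manifest-file, --manifest-metadata, --dry-run), and it only builds the list without running anything. A dry run without --manifest-metadata is not shown in this document, so treat that combination as untested.

```python
def build_uploader_args(manifest_file, metadata=False, fields=(), dry_run=False):
    """Build an argument list for sbg-uploader.sh from the options shown above."""
    if fields and not metadata:
        raise ValueError("metadata fields require metadata=True (--manifest-metadata)")
    args = ["sbg-uploader.sh", "--manifest-file", manifest_file]
    if metadata:
        args.append("--manifest-metadata")
    if dry_run:
        args.append("--dry-run")
    # Field names come last: after --manifest-metadata for an upload,
    # or after --dry-run for a dry run, matching the commands above.
    args.extend(fields)
    return args
```

The resulting list could then be passed to subprocess.run, assuming sbg-uploader.sh is on your PATH.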

General notes

The Command Line Uploader assumes that both the files which are being uploaded and the accompanying manifest file reside in the same directory. If that is not the case, you can specify the path:

  • within the manifest, by prepending the file path to the file name.
  • in the command line by specifying the full path to the manifest file.

If a file you have specified in the manifest also has an accompanying .meta file, the contents of that .meta file will be applied in addition to what is parsed from the manifest, expanding and/or overriding any key-value pairs.
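The precedence described above can be pictured as a dictionary merge in which the .meta values are applied last. This is only an illustration of the described behavior, not the Uploader's actual implementation, and the function name effective_metadata is hypothetical.

```python
def effective_metadata(manifest_fields, meta_file_fields):
    """Combine metadata so that .meta values add to or override manifest values."""
    merged = dict(manifest_fields)   # start from what was parsed from the manifest
    merged.update(meta_file_fields)  # .meta entries expand and/or override it
    return merged
```

For example, if the manifest sets library to "lib1" and the .meta file sets library to "lib2" and paired_end to "1", the file ends up with library "lib2" plus the paired_end value, while any other manifest fields are kept.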


