Folder Structure
Folders are suffixed with red or green to indicate the type of data that is stored there. Red is for potentially sensitive data that should not be shared outside the TRE. Green is for data that can be shared with the outside world. When you log into your sandbox, you will have a number of folders available to you. To get started, we will concentrate on the library-red, red, and home folders.
This reference page goes through the other folders and explains what they are for and how they should be used.
The following is a high-level overview of the directories in the TRE:
Useful Folders
- library-red
- red
- Home
Available at library-red in your sandbox, this is a read-only folder that is shared between all users. It stores the curated and raw data you need for your analyses, in several subfolders that are each designated for specific data types and purposes. If you find a folder without a readme file, please contact the Genes and Health team for more information on its intended use.
library-red is slower storage of large capacity (>8 PiB as of February). For large files, gcsfuse must first read and cache the entire file; direct seeking to a specific part of the file is not possible. For high-performance work or large files, it may be better to make a copy in red or /home/ivm. library-red corresponds to the Google Storage bucket gs://qmul-sandbox-production-library-red/ (read-write access only for admins).
red is used directly by the virtual machine and is specific to each sandbox. Data that will not be used during job execution should be placed in the red directory, while any data actively being worked on should go in the /home/ivm directory. Users in the same sandbox can view the contents of the red folder, and most organisations use it to store and run their analyses. It is advisable to create your own directory in the red folder to store your data. This lets you share your data with other users in the same sandbox while reducing the risk of accidental deletion by others.
In the Old TRE, you can do this directly in the File Manager or on the command line. In the New TRE, files are copied to the red bucket by right-clicking on the file and selecting "Upload to red bucket", or with gsutil from within the TRE. For example, run this command to copy files to the selected sandbox (-r copies directories recursively, and -n skips files that already exist at the destination):
gsutil cp -r -n my_file gs://qmul-production-sandbox-1-red/
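
The advice to keep your data in your own directory within red can be combined with the gsutil command. The following is a minimal sketch, not an official procedure: the sandbox number is taken from the example command, while the directory name my_username, the source name my_results, and the helper red_dest are hypothetical placeholders. It prints the command instead of running it, so you can check the destination first.

```shell
# Hypothetical values: replace with your own sandbox bucket and directory name.
SANDBOX_BUCKET="gs://qmul-production-sandbox-1-red"
MY_DIR="my_username"

# Build the bucket URI of a personal subdirectory in the red bucket.
red_dest() {
  printf '%s/%s/' "$SANDBOX_BUCKET" "$1"
}

# Dry run: print the command rather than executing it.
# Remove "echo" to run the copy from inside the TRE.
echo gsutil cp -r -n my_results "$(red_dest "$MY_DIR")"
```

Because of -n, re-running the copy leaves files that are already in the bucket untouched, which is usually what you want when several people share the same red bucket.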
To remove a file from /genesandhealth/red, right-click on it in the File Manager and select "Delete from Red Bucket".
Available at /home/ivm in your sandbox, this is your personal folder. You can use it to store any files you wish, but it is not backed up: if you delete a file from here, it is gone forever. /home/ivm is semi-fast (HDD) storage and as such is faster than other parts of the sandbox. It might be worth running some jobs here, especially if you are loading large amounts of data. However, because this folder is not backed up, anything you want to keep should be moved to the red folder.
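
One way to follow the advice about working on large files is to stage a copy from library-red into /home/ivm before running a job. The sketch below is illustrative only: stage_file is a hypothetical helper, and the paths in the usage comment are placeholders rather than real locations in the TRE.

```shell
# Copy a file into a working directory unless a copy is already there,
# then print the path of the local copy.
stage_file() {
  src="$1"
  workdir="$2"
  mkdir -p "$workdir" || return 1
  dest="$workdir/$(basename "$src")"
  if [ ! -e "$dest" ]; then
    cp "$src" "$dest" || return 1
  fi
  printf '%s\n' "$dest"
}

# Example with hypothetical paths:
#   local_copy="$(stage_file /path/to/library-red/some_dataset/big_file.tsv /home/ivm/work)"
#   ... run the analysis on "$local_copy" ...
```

Note that the helper skips the copy whenever a file of the same name already exists, so it will not pick up a newer version of the source; delete the local copy first if you need a refresh. Since /home/ivm is not backed up, copy any results you want to keep to the red folder once the job has finished.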