Transferring files
Computational workflows often require files on Roar, archive storage, OneDrive, or your laptop or desktop machine to be transferred -- copied from one place to another.
File transfer rates
Multiple tools exist to perform file transfers. No single tool is best for all cases; below, we recommend methods, and list approximate transfer rates for large files.
| Transfer | Method | Rate (MB/sec) |
|---|---|---|
| Roar ↔ Archive | Globus | 50 |
| Roar → OneDrive | Firefox or Globus | 50 |
| OneDrive → Roar | Firefox or Globus | 10 |
| Roar ↔ laptop | Portal Files Menu (< 1GB) | 25 |
| Roar ↔ laptop | Cyberduck or FileZilla | 15 |
| Roar ↔ laptop | Globus Personal Collection (> 1GB) | 50 |
| OneDrive ↔ laptop | Web Access | 20 |
(Transfer rates may be slower if limited by intervening network or storage speeds.)
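To get a feel for what the rates in the table mean in practice, the arithmetic can be sketched in a few lines of shell. This is only a rough estimate: it assumes 1 GB = 1000 MB and a steady rate, and the function name estimate_minutes is made up for this illustration.

```shell
# Rough transfer-time estimate from a file size and a rate from the table above.
# Assumes 1 GB = 1000 MB and a constant rate; real transfers may be slower.
estimate_minutes() {
    size_gb=$1      # file size in GB
    rate_mb_s=$2    # transfer rate in MB/sec
    seconds=$(( size_gb * 1000 / rate_mb_s ))
    echo $(( (seconds + 59) / 60 ))   # round up to whole minutes
}

echo "100 GB at 50 MB/sec: about $(estimate_minutes 100 50) minutes"
echo "100 GB at 15 MB/sec: about $(estimate_minutes 100 15) minutes"
```

For a 100 GB file, this gives about 34 minutes at 50 MB/sec and about 112 minutes at 15 MB/sec, which is why Globus is recommended for large transfers.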
Portal
The Portal top menu under Files/Home opens a window that enables file transfer between Roar and your laptop. With this utility, small files can be moved, edited, uploaded, and downloaded.
Use this method only for moving small (<1 GB) files; for larger files, use Globus.
Upload Button Issues
The "Upload" button on the Portal fails when transferring files larger than 1 GB. Instead, use the "Globus" button, which opens the Globus interface.
Globus
Globus is a web-based tool designed for transfers of large files. It can move files from Roar to filesystems outside Penn State, including our PSU OneDrive accounts. Globus is interactive, but time-consuming file transfers can be submitted as batch jobs.
Globus moves files between named "collections".
Many institutions have established collections;
ICDS has endpoints for Roar, Archive, and OneDrive:
| Filesystem | Endpoint |
|---|---|
| Roar | Penn State ICDS RC |
| Archive | Penn State ICDS Archive |
| PSU OneDrive | Penn State ICDS OneDrive |
To transfer files to or from a laptop, use the upload/download buttons on the Globus web interface.
Globus Connect Personal
Globus Connect Personal can make files from your personal computer (laptop or desktop) available as a Globus collection. Once set up, you can securely transfer files between your computer and other Globus collections.
To get started, download and install the Globus Connect Personal client, available for Linux, macOS, and Windows.
Globus Guest Collections
Globus can be used to share data with collaborators by creating Guest Collections. Collections can be set up with a variety of sharing options, from complete public access to permissions for individual users.
Guest Collections are subject to the following restrictions:
- User-level Collections can be created anywhere within the user’s work directory.
- Group-level Collections can only be created by the faculty owner, within the default directory of the group storage.
Globus Collections Authorized Directories
| Filesystem location | Authorized User |
|---|---|
| /storage/work/$USER | Directory Owner |
| /storage/group/$USER/default/ | PI / Faculty Owner |
(where $USER matches the Penn State ID of the directory owner)
Permission Errors
Globus lets users attempt to create Guest Collections anywhere in the Roar filesystem it can access, but collections created outside the authorized directories produce “Permission Denied” errors.
Custom Collection locations
Collections inside group storage with non-standard naming conventions (where $USER (above) is not a Penn State ID) will not work by default. Please contact us to set up an exception.
sftp
sftp (secure file transfer protocol) is a Unix tool
for file transfers. To launch sftp,
sftp <username>@<address>
<address> is the address of a "remote machine",
and <username> is your userid on that machine.
For sftp to Roar, the address is submit.hpc.psu.edu,
the same as for ssh logon.
Just as for ssh logon, you will be prompted for your password on the remote machine, and for multi-factor authentication (MFA) if the remote machine requires it.
sftp is an interactive program.
Once logged on, you can copy files
from the local machine to the remote with put <filename>,
and from the remote machine to the local with get <filename>.
When sftp launches, your location on the local machine
is the directory in which you launched sftp,
and your location on the remote machine is your home directory.
To control where on the remote machine files go to and come from,
within sftp you can navigate on the remote machine with cd
and list files with ls.
Likewise, within sftp you can navigate on the local machine
with "local" versions of these commands, lcd and lls.
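As a hedged sketch of a typical session, the commands described above can be written to a file and then fed to sftp on standard input. The userid abc123, the directories, and the file names sample.csv and summary.txt are placeholders chosen for illustration. The sftp line itself is left commented out because it needs a real account.

```shell
# Collect the sftp commands for one session in a file.
# All paths and file names below are placeholders.
cat > sftp_commands.txt <<'EOF'
lcd /work/newData
cd /storage/work/abc123/toAnalyze
put sample.csv
get summary.txt
bye
EOF

# Show the commands that would be run, in order.
cat sftp_commands.txt

# To run them, redirect the file into sftp (uncomment and use your own userid):
# sftp abc123@submit.hpc.psu.edu < sftp_commands.txt
```

The password and MFA prompts should still appear on your terminal. If they do not on your system, launch sftp as usual and type the same commands at the sftp> prompt.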
"Graphical" sftp clients for your laptop can be used for file transfer to Roar, as well as to OneDrive or other cloud storage providers. Two popular options for both OS X and Windows are Cyberduck and FileZilla.
rsync
Sometimes, you want to copy a directory of files
from one place to another,
and then later update the copy
with any changes made to the originals,
so that the copy reflects the current version.
If the directory contains many files but only a few are changed,
it would be nice to have a program that automatically updates
only the changed files. rsync does this:
rsync <options> <source-path> <destination-path>
This copies files from <source-path> to <destination-path> to make the destination match the source; with the --delete option, it also removes files from the destination that are no longer in the source.
Later, if you change files on <source-path>,
run rsync again to update files on <destination-path>.
rsync has several important options:
- -a: "archive" mode; traverses directories recursively
- -v: "verbose"; reports which files are copied
- -z: "zips" (compresses) the files on transfer
- -h: "human readable" reporting
The source and destination can be on the same filesystem, or they can be different machines entirely. From a Unix command line on your laptop (running Linux or OS X),
rsync -avzh /work/newData abc123@submit.hpc.psu.edu:/storage/work/abc123/toAnalyze/
Note that the destination pathname ends with a /;
this signifies that the copied files will go into the directory toAnalyze.
For more examples, visit Tecmint.
With rsync, the source is the original, and the destination is the copy. Don't reverse direction, or you will confuse rsync and yourself, and wind up clobbering or deleting files.