gcp

Efficient file transfers with gcp

gcp is a grid-enabled version of the scp copy command.

The gcp command emulates the scp command as closely as possible while using globus-url-copy for the actual file transfers, for performance reasons. gcp supports recursive transfers of directories as well as wildcards.

Because gcp is a grid tool built on globus-url-copy, it needs grid certificates for authentication. It first checks whether a valid grid proxy certificate exists. If not, gcp attempts to run grid-proxy-init. Failing that, gcp contacts the myproxy.westgrid.ca proxy server to obtain a proxy certificate. If that fails as well, gcp issues an error that explains how to upload certificates to the myproxy server. The whole authentication process is set up so that the user does not need any knowledge of grid authentication.
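The fallback order described above can be sketched as a short shell function. This illustrates the sequence only and is not gcp's actual implementation: the function names are hypothetical, and using grid-proxy-info -exists for the check and myproxy-logon -s for the MyProxy retrieval is an assumption (the text names only grid-proxy-init and the myproxy.westgrid.ca server).

```shell
#!/bin/sh
# Hypothetical wrappers, one per step of the fallback chain.
# Assumption: these are the commands used for each step.
have_proxy()  { grid-proxy-info -exists >/dev/null 2>&1; }
make_proxy()  { grid-proxy-init; }
fetch_proxy() { myproxy-logon -s myproxy.westgrid.ca; }

# Try each step in the order described; stop at the first that succeeds.
ensure_proxy() {
    if have_proxy; then
        echo "using existing proxy"
        return 0
    fi
    if make_proxy; then
        echo "proxy created with grid-proxy-init"
        return 0
    fi
    if fetch_proxy; then
        echo "proxy obtained from myproxy.westgrid.ca"
        return 0
    fi
    # All three steps failed: report the problem on stderr.
    echo "no proxy available; upload a certificate to the myproxy server" >&2
    return 1
}
```

Nothing runs until ensure_proxy is called, so the file can be sourced safely on a host where the grid tools are not installed.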

In most cases the syntax for gcp is the same as for scp.

Some useful options:

  • -r : recursive transfer (needed only if you want to descend into directories)

  • -v : verbose output; gcp gives detailed information on what it is doing.

  • -i : gcp prints a list of source files and destination files and (in the case of -r) the destination directories that it will create on the destination host. It then asks whether you want to proceed; if you answer 'n', no files are transferred.

Detailed information about gcp is available through its man page: type man gcp.

On machines with hierarchical storage management (HSM), gcp recalls files from tape in an efficient manner before attempting to transfer them. Within WestGrid this applies to the network storage facility (gridstore). To take advantage of the HSM recall mechanism, the sources must reside on the local host, i.e., the transfer must be initiated on the host with the HSM system. Therefore, if you want to transfer files that reside on the storage facility, ssh to gridstore first and then use gcp to transfer the files. This is particularly important for files and directories that reside in the data and vault directories. Transferring these files from blackhole, robson, tantalus, or any WestGrid host other than gridstore can result in long delays and very poor performance, particularly when more than one file is transferred.
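As a rough illustration of the rule above, the following sketch flags gcp source arguments that would pull data or vault files from gridstore while running on another host. The helper names are hypothetical and not part of gcp, and the patterns assume that paths are given relative to the home directory, as in gridstore:data or gridstore:vault/new.

```shell
#!/bin/sh
# Return 0 when an argument names data/ or vault/ on a remote gridstore.
is_remote_hsm_source() {
    case "$1" in
        gridstore:data|gridstore:data/*|gridstore:vault|gridstore:vault/*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Print a warning for each risky source argument.
# The last argument is the destination, so it is not checked.
warn_remote_hsm_sources() {
    while [ "$#" -gt 1 ]; do
        if is_remote_hsm_source "$1"; then
            echo "warning: $1 is on HSM storage; run gcp on gridstore instead"
        fi
        shift
    done
}
```

For example, warn_remote_hsm_sources gridstore:vault/run1 glacier:out prints a warning, whereas the same transfer started on gridstore itself (vault/run1 glacier:out) prints nothing. The path vault/run1 is made up for the illustration.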

Examples

  • gcp file1 glacier:document

    • transfers the file file1 in the current directory to glacier, under the name document in your glacier home directory.

  • gcp -r dir1 gridstore:data

    • transfers the directory dir1, with all its files and subdirectories, into the directory data/dir1 on gridstore.

  • gcp -r docs* programs/*.exe progdir gridstore:vault/new

    • recursively copies all files and directories that start with docs, all files that match programs/*.exe, and the directory progdir into the directory vault/new in your home directory on gridstore.


Updated 2009-01-30.