Large parallel filesystems found on HPC clusters, such as /home, /scratch and /project, have one weak spot: they were not designed for storing large numbers of small files.
Because of this limitation, we always advise our users to reduce the number of files stored in their directories, either by instrumenting their code to write fewer, larger files, or by using an archive tool such as the classic Unix utility `tar` to pack their files into archives.
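As a rough illustration of the `tar` approach, the sketch below packs a directory of many small files into a single compressed archive and checks the archive before removing the originals. The directory name `results`, the file names and the count of 1000 are made up for the example; adapt them to your own data.

```shell
# Create a hypothetical directory holding 1000 small files (illustrative data only)
workdir=$(mktemp -d)
mkdir "$workdir/results"
for i in $(seq 1 1000); do
    echo "sample $i" > "$workdir/results/out_$i.txt"
done

# Pack the whole directory into one gzip-compressed archive
tar -czf "$workdir/results.tar.gz" -C "$workdir" results

# List the archive and count the packed files before touching the originals
packed=$(tar -tzf "$workdir/results.tar.gz" | grep -c 'out_.*\.txt$')
echo "files packed: $packed"

# Remove the originals only if every file made it into the archive
if [ "$packed" -eq 1000 ]; then
    rm -r "$workdir/results"
fi
```

After this, the filesystem holds one file instead of a thousand. The trade-off is that reading a single file back out of a compressed tar archive means scanning the archive from the start.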
There is a little-known but incredibly useful open-source tool called `dar` that was developed as a faster, modern replacement for `tar`. DAR stands for Disk ARchive; it supports file indexing, differential and incremental backups, Linux file Access Control Lists (ACLs), compression, symmetric and public-key encryption, and remote archives, among many other features.
In this webinar we will go through several use cases for `dar`, both on Compute Canada clusters and on your own laptop with a bash shell. We will show you how to manage directories with many files, how to back up and restore your data, and other workflows.