Dataset rsync instructions
All of the following commands should work in bash on Linux and macOS, as well as in PowerShell on Windows. You need a working installation of rsync and, for certain commands, Python, Ruby, or Perl.
Replace $TREE with the name of the rsync tree you are connecting to (such as ht_text_pd) and $LOCAL_PATH with the path on your local filesystem where you want to write the dataset, for example /path/to/datasets.
rsync --copy-links --delete --ignore-errors --recursive --times --verbose datasets.hathitrust.org::$TREE $LOCAL_PATH
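id_list.txt must be a plain text file containing one HathiTrust volume ID per line, with Unix line endings and no additional encoding (URL escaping, quotes, etc.). The commands below convert it to path_list.txt, which the final rsync command reads through --files-from.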
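The requirements on id_list.txt can be checked before running any of the conversion commands. The following is a minimal sketch, not part of the official tooling: the function name check_id_list and the example IDs are made up for illustration, and the URL-escaping check is only a heuristic (it flags any percent sign followed by two hex digits).

```python
import re


def check_id_list(data):
    """Return a list of problems found in the raw bytes of id_list.txt.

    Hypothetical helper for illustration; an empty list means no problem was found.
    """
    problems = []
    if b"\r" in data:
        problems.append("file contains carriage returns; use Unix (LF) line endings")

    lines = data.decode("utf-8", errors="replace").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a single trailing newline is expected, not a blank line

    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r")
        if not line.strip():
            problems.append("line %d: blank line" % lineno)
            continue
        if line != line.strip():
            problems.append("line %d: leading or trailing whitespace" % lineno)
        if '"' in line or "'" in line:
            problems.append("line %d: contains quotes" % lineno)
        if re.search(r"%[0-9A-Fa-f]{2}", line):
            problems.append("line %d: looks URL-escaped" % lineno)
        if "." not in line:
            problems.append("line %d: no namespace prefix (expected namespace.id)" % lineno)
    return problems
```

To use it, read the file in binary mode so that line endings are not translated, for example check_id_list(open("id_list.txt", "rb").read()), and fix whatever it reports before converting the IDs to paths.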
First run pip install pairtree, then save this script as ids_to_ppath.py:
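import sys
import pairtree

for line in sys.stdin:
    # Split "namespace.id" at the first period only; the ID itself may contain periods.
    namespace, obj_id = line.strip().split('.', 1)
    print("/".join([namespace, 'pairtree_root', pairtree.id2path(obj_id), pairtree.id_encode(obj_id)]))

Then run: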
python ids_to_ppath.py < id_list.txt > path_list.txt
First run gem install pairtree to install the pairtree gem.
ruby -e 'require "pairtree";ARGF.each {|l|l.chomp!;n,i=l.split(/\./,2);puts "#{n}/pairtree_root/#{Pairtree::Path.id_to_path i}"}' id_list.txt > path_list.txt
First install the File::Pairtree CPAN module.
perl -MFile::Pairtree -ne 'chomp;($n,$i)=split /\./,$_,2;print "$n/".File::Pairtree::id2ppath($i).File::Pairtree::s2ppchars($i)."\n"' id_list.txt > path_list.txt
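All three variants above turn each volume ID into a path of the form namespace/pairtree_root/.../encoded_id. For readers who want to see what that mapping does, here is a dependency-free sketch. It assumes the character-encoding rules of the Pairtree specification (hex-encode unsafe characters as ^hh, then map / to =, : to +, and . to ,) and splits the result into two-character directory names. The function names and example IDs are hypothetical, and the pairtree libraries remain the authoritative implementations.

```python
# Characters the Pairtree specification hex-encodes, in addition to any byte
# outside the visible ASCII range 0x21-0x7e.
_HEX_ENCODE = set('"*+,<=>?\\^|')
# Single-character substitutions for characters that are awkward in file names.
_SUBSTITUTE = {"/": "=", ":": "+", ".": ","}


def clean_id(identifier):
    """Encode an identifier so that it is safe to use as a file name."""
    out = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7E or ch in _HEX_ENCODE:
            out.append("^%02x" % byte)  # e.g. a space becomes ^20
        else:
            out.append(_SUBSTITUTE.get(ch, ch))
    return "".join(out)


def volume_id_to_path(volume_id):
    """Map 'namespace.id' to 'namespace/pairtree_root/<pairs>/<encoded id>'."""
    # Split at the first period only; the ID part may itself contain periods.
    namespace, identifier = volume_id.strip().split(".", 1)
    cleaned = clean_id(identifier)
    # Two-character directory names; the last one may be a single character.
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join([namespace, "pairtree_root", *pairs, cleaned])


print(volume_id_to_path("uc2.ark:/13960/t4zg7m84v"))
# prints uc2/pairtree_root/ar/k+/=1/39/60/=t/4z/g7/m8/4v/ark+=13960=t4zg7m84v
```

Each line of path_list.txt should have this shape; if a line looks different, the most likely cause is an ID in id_list.txt that was quoted or URL-escaped.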
Replace $TREE with the name of the rsync tree you are connecting to (such as ht_text_pd) and $LOCAL_PATH with the path on your local filesystem where you want to write the dataset, for example /path/to/datasets.
rsync --copy-links --delete --ignore-errors --recursive --times --verbose --files-from=path_list.txt datasets.hathitrust.org::$TREE $LOCAL_PATH
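If you drive the sync from a script rather than a shell, the two rsync invocations in this document differ only in the --files-from option. The sketch below builds the argument list for either case. The helper names build_rsync_command and run_sync are hypothetical, and the flags are copied verbatim from the commands above, not chosen independently.

```python
import subprocess

RSYNC_HOST = "datasets.hathitrust.org"
# Flags taken unchanged from the rsync commands in this document.
BASE_FLAGS = [
    "--copy-links", "--delete", "--ignore-errors",
    "--recursive", "--times", "--verbose",
]


def build_rsync_command(tree, local_path, files_from=None):
    """Return the rsync argument list for a full tree or a subset.

    files_from is the path to path_list.txt; leave it as None to sync the whole tree.
    """
    cmd = ["rsync", *BASE_FLAGS]
    if files_from is not None:
        cmd.append("--files-from=" + files_from)
    cmd += [RSYNC_HOST + "::" + tree, local_path]
    return cmd


def run_sync(tree, local_path, files_from=None):
    """Run rsync and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_rsync_command(tree, local_path, files_from), check=True)
```

For example, run_sync("ht_text_pd", "/path/to/datasets", files_from="path_list.txt") corresponds to the subset command above. Passing the arguments as a list avoids shell quoting problems when the local path contains spaces.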