How do I access my old Swestore data via iRODS?
Using the archrest tool
module load irods
... to load the irods module
archrest.sh /swestore-legacy/proj/PROJECTID/ARCHIVE_ID FILE_TO_RESTORE
... where PROJECTID is the name of the project the file was archived under, ARCHIVE_ID is the archive collection name you received after the file was archived and FILE_TO_RESTORE is the name of the file you want to restore.
The command will run, probably for a very long time. When it is done, it will have created a temporary folder whose name starts with "slask.", and FILE_TO_RESTORE should be in the directory from which archrest.sh was called.
Listing and downloading files
1. Log in to milou.
2. Load the irods/swestore_legacy module:
module load irods
3. Change to your archive directory
... where PROJECT_ID is the project id under which the files have been archived.
4. List files in your (iRODS-wise) current working directory:
5. Change directory (irods-wise) into one of the folders listed:
6. List files again:
(Note that you will not see the files that you uploaded *directly*. Instead you will see the structure of tar chunks, sometimes split, that we use to make our current upload process robust. More about that further below.)
7. Download (to your current working directory on milou) a file:
(We recommend starting with one of the meta files, since otherwise you may start a download of up to 20 GB, which takes a while.)
8. Try to read the file:
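As a rough sketch, steps 3 to 8 can be carried out with the standard iRODS iCommands icd, ils and iget. The sketch below wraps them in a hypothetical helper function, browse_archive, which is not part of the original instructions. The /swestore-legacy/proj/ prefix is assumed from the archrest.sh example above, and the three arguments are placeholders that you need to replace with your own values. The same commands can equally well be typed one by one.

```shell
# Sketch of steps 3-8 using the iRODS iCommands icd, ils and iget.
# browse_archive is a hypothetical helper; all arguments are placeholders.
# The /swestore-legacy/proj/ prefix is an assumption taken from the
# archrest.sh example.
browse_archive() {
    project_id="$1"   # project id under which the files were archived
    archive_dir="$2"  # one of the folders listed by the first ils
    small_file="$3"   # a small file to fetch first, e.g. a meta file

    icd "/swestore-legacy/proj/${project_id}" || return 1  # step 3
    ils                                                    # step 4
    icd "${archive_dir}" || return 1                       # step 5
    ils                                                    # step 6
    iget "${small_file}" || return 1                       # step 7
    head "${small_file}"                                   # step 8
}
```

A call could look like "browse_archive PROJECT_ID ARCHIVE_FOLDER chunk001.meta", where the folder and file names are whatever the two ils listings show for your project.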
Some notes on the file structure
To make the handling of the thousands or millions of files uploaded to Swestore robust, we pack all files into tar files of at most 20 GB each, and if a single file does not fit within 20 GB, we split it into 20 GB pieces, according to Swestore recommendations.
Thus, you cannot see the files you uploaded *directly*. However, if you search through the chunkXXX.meta files in the /proj/a2010002/swestore/moved2swestore/arch_mssn-XXXXXXXX-XXXXXX/ folders and find the chunk number (and archiving mission id) where your file is located, you can download the data yourself.
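One possible way to do that search is with grep. The sketch below assumes that the meta files are plain text and mention the archived file names. That format is an assumption, since it is not described here. The function name find_chunk and its arguments are hypothetical.

```shell
# Sketch: print the meta files that mention a given file name.
# Assumes the meta files are plain text containing the archived paths
# (an assumption; adjust if the actual format differs).
find_chunk() {
    meta_root="$1"   # e.g. /proj/a2010002/swestore/moved2swestore
    file_name="$2"   # name, or part of the path, of the file you uploaded

    # -F treats the name as a fixed string, -l prints only matching files
    grep -F -l -- "${file_name}" "${meta_root}"/arch_mssn-*/chunk*.meta
}
```

Each path printed contains both the archiving mission id (the arch_mssn-... folder name) and the chunk number (the chunkXXX part of the file name).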
You will need to download the whole tar chunk and untar it in order to retrieve your file, and if the chunk is split, you need to download and concatenate (with the cat command) the split files before untarring. This can all be done in a single piped command, though, so that it is faster and does not need your intervention. See how to do that below.
How to concatenate and untar files
iget chunk001.tar.split00001
iget chunk001.tar.split00002
iget chunk001.tar.split00003
... etc ...
mkdir [some-folder-name]
cat chunk001.tar.split* | tar -xvf - -C [some-folder-name]/
The cat command writes the contents of all files matching the pattern "chunk001.tar.split*" to stdout, which is here piped to tar.
For tar, the "-" after the "f" flag tells tar to read from stdin, that is, from the cat command, and the -C flag names the folder into which the contents are unpacked.
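To try the cat and tar pipe without touching real data, the following self-contained sketch builds a small tar file in a scratch directory, splits it, and reassembles it with the same pipe. All file names are made up for the demo, the 4 KiB piece size only stands in for the real 20 GB, and the -d flag for numeric suffixes assumes GNU split.

```shell
# Demo in a scratch directory; nothing here touches your real data.
demo_dir="$(mktemp -d)"
cd "${demo_dir}" || exit 1

# Create a small payload and pack it into a tar file
mkdir payload
echo "hello from the archive" > payload/readme.txt
tar -cf chunk001.tar payload

# Split into 4 KiB pieces with 5-digit numeric suffixes (GNU split),
# then remove the original so that only the pieces remain
split -b 4096 -d -a 5 chunk001.tar chunk001.tar.split
rm chunk001.tar

# Reassemble and unpack in one pipe
mkdir restored
cat chunk001.tar.split* | tar -xf - -C restored/

cat restored/payload/readme.txt   # prints "hello from the archive"
```

The shell expands "chunk001.tar.split*" in sorted order, which is why the fixed-width numeric suffixes matter: the pieces reach tar in the order in which they were written.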
If you know that the tar file contains, for example, a bzip2-compressed tar file, you can unpack that *inner* tar file in the same go as well, by using the "-O" flag to tell tar to write to stdout and piping that into a second tar operation. This only works when the chunk holds nothing but that one inner file, because "-O" writes everything it extracts to the same stream. In the example, the "j" flag is added to the second tar in order to decompress bzip2:
cat chunk001.tar.split* | tar -xvf - -O | tar -jxvf - -C [some-folder-name]/