Other bioinformatics-oriented local data resources

GnomAD: Genome Aggregation Database

The Genome Aggregation Database (gnomAD) VCF database is downloaded and loacated in /sw/data/gnomad_data/vcf/{exomes, genomes}.

ExAC: Exome Aggregation Consortium

The ExAC Exome Aggregation Consortium database releases 0.1, 0.2, 0.3 and 0.3.1 are downloaded in their entirety and are available at /sw/data/ExAC/release{0.1,0.2,0.3,0.3.1}.

Pfam

The Pfam database versions 28.0 and 31.0 are downloaded in their entirety and available at /sw/data/Pfam/{28.0,31.0}. The family-specific trees are available in /sw/data/Pfam/{28.0,31.0}/trees. The given directory can be used for the -dir argument to the pfam_scan.pl script provided by the pfam_scan module.  Module version pfam_scan/1.5 is for Pfam 28.0, and module version pfam_scan/1.6 is for Pfam 31.0.

pfam_scan.pl -dir /sw/data/Pfam/31.0 ...

The pfam_scan.pl script is designed to work with the Pfam database.

An earlier version of the Pfam database, an unknown version marked '2011', is available at /sw/data/Pfam/2011. The module pfam_scan/1.3 should work with this version.

dbCAN

The dbCAN 4.0 database for automated carbohydrate-active enzyme annotation is now available in directory /sw/data/dbCAN/4.0 on Uppmax servers. The database is formatted for use with the hmmer/3.1b1-{gcc,intel} modules. For more information see /sw/data/dbCAN/4.0/readme.txt or the remote version.

The local path to the script for post-processing hmmscan --domtblout output is /sw/data/dbCAN/4.0/hmmscan-parser.sh. The CAZyDB trees have also been unpacked and are available in /sw/data/dbCAN/4.0/CAZyDB-phylogeny.

Variant Effect Predictor cache files

A local cache for all database files available for Ensembl's Variant Effect Predictor 87, 89 and 91 are available in directories /sw/data/vep/{87,89,91}. When module version vep/89 or vep/91 is loaded, the environment variable VEP_CACHE is set to the directory for the appropriate version.  Local caches for versions 82, 84 and 86 exist only for homo_sapiens.  To use the cached databases, run the script using the --cache option to indicate the use of a locally-cached database, and the --dir option to specify where this is:

vep --cache --dir $VEP_CACHE  ...

If you are using vep/89, use:

variant_effect_predictor.pl --cache --dir $VEP_CACHE  ...

All plugins are also available.  For more script options, see its online help page.

CDD - Position-Specific Scoring Matrices for CD-Search

The CDD database versions 3.14 and 3.16 are downloaded in their entirety and are available at /sw/data/cdd/{3.14,3.16}. These directories contains collections of position-specific scoring matrices (PSSMs) that have been created for the CD-Search service (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi).

The PSSMs are meant to be used for compiling RPS-BLAST search databases, which can be used with the standalone RPS-BLAST programs (rpsblast and rpsblastn). These programs, as well as the makeprofiledb application needed to convert files in this directory, are part of the BLAST+ executables (available on Uppmax as part of bioinfo-tools, e.g., module blast/2.2.31+). The makeprofiledb application is described at http://www.ncbi.nlm.nih.gov/books/NBK1763.

More information is available at ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/README or its local copy /sw/data/cdd/README.

iGenomes - Collection of reference sequences and annotation files

A local copy of illumina's iGenomes collection of commonly analyzed organisms is available at /sw/data/igenomes. In addition to the annotations provided by the collection, Bismark and STAR indexes have been added.

UK Biobank institutional data set (GENETICS)

The UKBB data set is available for eligible projects in the system for sensitive research SNIC-SENS Bianca. If you believe you are elgiible, contact Professor Tove Fall to gain access.