Managing module dependencies
On UPPMAX and other clusters, one can frequently encounter problems with software tools that are not caused by the tools themselves, but instead are caused by conflicting dependencies between multiple tools in concurrent use.
This is easiest to show with an example, where the cutadapt tool works, and then doesn't, after a conflicting module load.
milou-b: ~ $ module load bioinfo-tools cutadapt/1.8.0 milou-b: ~ $ cutadapt -h cutadapt version 1.8 Copyright (C) 2010-2015 Marcel Martin <firstname.lastname@example.org> cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq ... milou-b: ~ $ module load python/2.7 milou-b: ~ $ cutadapt -h /sw/comp/python/2.7.6_milou/bin/python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
The message results from a conflict in the version of python expected by
cutadapt in the
cutadapt/1.8.0 module -- a conflict in module dependencies. We back up and learn more about the problem by examining what is loaded when we load
milou-b: ~ $ module unload python cutadapt milou-b: ~ $ module list Currently Loaded Modules: 1) uppmax 2) bioinfo-tools milou-b: ~ $ module load cutadapt/1.8.0 milou-b: ~ $ module list Currently Loaded Modules: 1) uppmax 2) bioinfo-tools 3) cutadapt/1.8.0 4) python/2.7.6
We see that loading
cutadapt/1.8.0 also resulted in loading
python/2.7.6. This is a dependency, and if the python module is unloaded or a different version is loaded, as we did by loading
python/2.7 above, problems will start to appear.
milou-b: ~ $ module load cutadapt/1.8.0 python/2.7 milou-b: ~ $ module list Currently Loaded Modules: 1) uppmax 2) bioinfo-tools 3) cutadapt/1.8.0 4) python/2.7
This is particularly a problem for python- and Perl-based tools, for which different tools may depend upon different interpreter versions, or (more likely) the predominant version in use at installation was different, say python 2.7.6 vs 2.7.9.
This is also a problem when using say two modules that themselves depend on different versions of a dependency.
milou-b: ~ $ module list Currently Loaded Modules: 1) uppmax 2) bioinfo-tools milou-b: ~ $ module load cutadapt/1.8.0 milou-b: ~ $ module list Currently Loaded Modules: 1) uppmax 2) bioinfo-tools 3) cutadapt/1.8.0 4) python/2.7.6 milou-b: ~ $ module load pysam/0.8.3-py27 The following have been reloaded with a version change: 1) python/2.7.6 => python/2.7 milou-b: ~ $ module list Currently Loaded Modules: 1) uppmax 2) bioinfo-tools 3) cutadapt/1.8.0 4) pysam/0.8.3-py27 5) python/2.7
Note the message about changing the python version while loading
pysam/0.8.3-py27. This will (as we saw above) prevent us from using
pysam/0.8.3-py27 is loaded.
The application experts at UPPMAX try to reduce the likelihood of such issues by standardising on particular interpreter and compiler versions during installation. This is not always possible, as important bug fixes or performance enhancements might be available available in later versions of an interpreter, or a tool requires specific features introduced in a later version.
At times we do provide tool versions that depend upon different interpreter versions; in the above case,
pysam/0.8.3-py27 depends on
python/2.7, while the
pysam/0.8.3 module depends on
python/2.7.6, and would be a much better choice in this instance. But in general, there is also little time available to reinstall already-installed tools where the only change in installation is in the interpreter version used.
Tools themselves can do much more to manage their own dependencies independently. For example, python-based tools can set up their own virtual environment so that the python interpreter and libraries used by the tool are specific versions fixed during installation. Unfortunately, this is not common.
Ultimately, it is the user's responsibility to manage dependency conflicts in tools. Here are a few tips to help with this.
1. Check dependencies first
Check dependencies of modules by using
module list before and after a module is loaded. This will give you a clue to whether dependency problems are likely. When loading multiple modules, pay attention to messages like the one above:
The following have been reloaded with a version change: 1) python/2.7.6 => python/2.7
2. Use tool versions with no dependency conflicts
Put the knowledge gained by using
module list, together with our limited documentation on the Installed Software List, to figure out which versions of different modules have compatible dependencies. After you find
pysam on that page, you will see that
pysam/0.8.3 is the version that has
python/2.7.6 as its dependency, while
pysam/0.8.3-py27 that you were loading has
python/2.7 as its dependency, and that conflicts with what cutadapt needs. We should have loaded
We have a few such dependencies listed for modules on the Installed Software page, but this information is incomplete and likely always will be. By using
module list as noted above, you will have complete information.
3. Load and unload modules as you need them
Tools within different modules can only have problems arising from conflicting dependencies if the different modules are both loaded at the same time. It is better to load and unload modules as you need them. This can be particularly helpful with python- and Perl-based tools.
We often see (and I often write) SLURM scripts that contain a preamble that loads several modules at once, even when the modules are not required simultaneously. This practice is helpful for self-documentation, but is also a potential source of conflicting dependencies.
For example, a SLURM script for a variant-calling pipeline starting from raw sequence data does not need to load a read-mapper module while performing QC with tools loaded with the
FastQC modules; does not need to have any of these modules loaded while mapping reads and creating BAM files with tools from the
samtools modules; and does not need a read mapper module loaded when manipulating BAM files with
Picard or calling variants with
Some modules simply won't create conflicts, for example current versions of
GATK won't conflict with python-based tools, nor will it conflict with compiled tools such as