Lab Notebook - Beagle optimiser

Download from github.
Code page.

Update:

beagle_optimiser now works with MrBayes too.

Background

Phylogenetic inference can be computationally costly, perhaps not in money but in time. It can be frustrating to log in to your favorite cluster to run the analysis you have spent the last hours, days or weeks preparing, only to find that no CPUs are available. Perhaps more frustrating is to realize that your analysis will probably run longer than the "hard time limit" your account has on that cluster, and hence will be terminated prematurely. One solution to this problem is sometimes to use the Beagle library for the likelihood calculations. This library can potentially shorten the computation time significantly. The greatest merit of the Beagle library is that it can utilize the Graphics Processing Units (GPUs) of a computer, since GPUs are much faster than regular CPUs for some calculations. Many researchers don't yet have access to this type of hardware but can still benefit from using the Beagle library in combination with regular CPUs. A number of options for Beast + Beagle are available, but it is not straightforward to determine, a priori, which options are optimal for the combination of hardware, software and input file at hand.

Objectives and Progress

  • Create a script that finds the optimal (for speed) Beagle settings - DONE
  • Integrate the script with SGE on Albiorix

Method

To deal with this I have created a small Python script called Beagle_optimiser that makes short test runs with any given XML input file, in order to find the option(s) that will presumably result in the shortest execution time. The initial idea for this script emerged after reading about Andrew Rambaut's tests with different datasets and hardware.
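The general idea of timing short test runs can be sketched as below. This is a minimal illustration, not the actual Beagle_optimiser code: the function names `time_command` and `rank_options` are hypothetical, and the sketch simply measures wall-clock time for each candidate set of command-line options and sorts the successful runs, fastest first.

```python
import subprocess
import time


def time_command(cmd, timeout=None):
    """Run cmd (a list of strings) and return the wall-clock time in seconds.

    Returns None if the program cannot be started, exceeds the timeout,
    or exits with a non-zero status.
    """
    start = time.perf_counter()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    elapsed = time.perf_counter() - start
    return elapsed if result.returncode == 0 else None


def rank_options(base_cmd, option_sets, input_file, timeout=None):
    """Time base_cmd + options + [input_file] for every option set.

    Returns a list of (seconds, options) tuples sorted fastest first;
    option sets whose run failed are left out.
    """
    timings = []
    for options in option_sets:
        cmd = list(base_cmd) + list(options) + [input_file]
        elapsed = time_command(cmd, timeout=timeout)
        if elapsed is not None:
            timings.append((elapsed, tuple(options)))
    return sorted(timings)
```

A call might look like `rank_options(["beast", "-overwrite"], [["-beagle_CPU"], ["-beagle_SSE"], ["-beagle_GPU"]], "short_test.xml")`. The flag names and the file name here are assumptions for illustration; check `beast -help` for the options your Beast version supports, and use an XML file with a short chain length so that each test run finishes quickly.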

Results

Page five in this pdf shows some peculiar results.

TODO

  • Come up with a better name for the program
  • Design and write the R code that will be used to analyse the output from "BO"
  • Ask people to share their XML files so I can compile a set of XML files with different properties (number of taxa, number of unique site patterns, different models etc.) that can be used for the investigation. - DONE
  • Make the code work with MrBayes - DONE



In order to find out how different datasets, Beast options and hardware influence the execution time, I need to make test runs with many different types of Beast files. Therefore, I ask you to share your Beast XML files with me. I'm not interested in the results of the different phylogenetic analyses (hence there is no risk of being 'scooped' by sharing), only in the execution time of each file.