Brandon Invergo

My CODEML pipeline

For my research, I make heavy use of CODEML, a program that is part of the Phylogenetic Analysis by Maximum Likelihood (PAML) package. While the program is quite handy for molecular evolutionary analysis in general, it’s a chore to integrate into pipelines that analyze large numbers of genes. In the course of my research, I’ve built up a pipeline to simplify the process. A big part of it, a Python interface to PAML, has since been integrated into Biopython. The rest of the pipeline has slowly succumbed to bit rot: parts of it were poorly designed and had to be modified each time I ran it.
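To give a feel for why this is a chore: CODEML is driven by a plain-text control file of `name = value` lines, so analyzing many genes means producing one such file per gene. The following is only a minimal sketch of that step using the standard library; it is neither the pipeline's code nor Biopython's implementation. The function name `format_ctl`, the default option values, and the file names are all assumptions made up for illustration.

```python
# Hypothetical defaults for illustration only; a real analysis will need
# values chosen for the data and the hypotheses being tested.
DEFAULT_OPTIONS = {
    "noisy": 0,
    "verbose": 0,
    "runmode": 0,    # 0: use the user-supplied tree
    "seqtype": 1,    # 1: codon sequences
    "CodonFreq": 2,  # 2: F3X4 codon frequencies
    "model": 0,      # 0: one omega ratio for all branches
    "NSsites": [0],  # site models to fit
}


def format_ctl(seqfile, treefile, outfile, **overrides):
    """Return the text of a CODEML control file (hypothetical helper)."""
    options = {**DEFAULT_OPTIONS, **overrides}
    lines = [
        f"seqfile = {seqfile}",
        f"treefile = {treefile}",
        f"outfile = {outfile}",
    ]
    for name, value in options.items():
        # List-valued options such as NSsites are written space-separated.
        if isinstance(value, (list, tuple)):
            value = " ".join(str(v) for v in value)
        lines.append(f"{name} = {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # File names here are placeholders, not files from the pipeline.
    print(format_ctl("gene1.phy", "species.nwk", "gene1.out", NSsites=[0, 1, 2]))
```

Writing the returned text to a `codeml.ctl` file in each gene's directory, running the program there, and then parsing each output file is the kind of repetitive bookkeeping that a wrapper is meant to hide.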

I got tired of that, so I have spent the better part of the last week cleaning up some of the code and completely rewriting other parts to make it fit for use by other people. In particular, while the code that actually runs CODEML remains largely unchanged, most of the pipeline logic, which determines when to run each part of the analysis, has been replaced by a (hopefully) robust Makefile. Make already handles dependencies between steps, so it didn’t make sense for me to keep re-inventing the wheel in my Python code. The end result is much cleaner in my opinion, even if Make recipes are a bit harder to read than Python code.
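For readers who haven't used Make, the core rule it applies is simple: a target is rebuilt if it doesn't exist or if any of its prerequisites has a newer modification time. The sketch below restates that rule in Python purely as an illustration of the kind of logic a Makefile makes unnecessary to hand-write. It is not code from the pipeline, and the name `needs_rebuild` is an assumption invented here.

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if `target` is missing or older than any prerequisite.

    Simplified illustration of Make's timestamp rule. Unlike Make, this
    sketch does not try to build missing prerequisites; a missing
    prerequisite raises FileNotFoundError instead.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Any prerequisite modified after the target makes the target stale.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

In a pipeline, a check like `needs_rebuild("gene1.out", ["gene1.phy", "species.nwk"])` (placeholder file names) would decide whether CODEML has to be re-run for a gene. Make performs this check for every rule and also works out the order in which the steps must run.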

I have now made the pipeline available on GitHub. Enjoy!