A paper in PLoS One, Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes, reports a new approach to assess the performance of automated orthology detection. These authors also wrote the OrthoMCL (2006 DB paper, 2003 algorithm paper) which uses MCL to build orthologous gene families. The authors discuss the trade-offs between highly
sensitive specific tree-based methods and fast but less sensitive approaches of the Best-Reciprocal-Hits from BLAST or FASTA or some of the hybrid approaches. The authors employ Latent Class Analysis (LCA) to aid in “evaluation and optimization of a comprehensive set of orthology detection methods, providing a guide for selecting methods and appropriate parameters”. LCA is also the statistical basis for feature choice in combing gene predictions into a single set of gene calls in GLEAN written by many of the same authors including Aaron Mackey.
I’ve been reading a lot of orthology and gene tree-species tree reconcilation papers lately, some are listed in Ian Holmes’s group as well as listing some of the software on the BioPerl site. This also follows with on our Phyloinformatics hackathon work which we are trying to formalize in some more documentation for phyloinformatics pipelines to support some of the described use cases. I’m also applying some of this to a tutorial I’m teaching at ISMB2007 this summer.