Gene prediction without training?

A new paper in Genome Research from the Borodovsky lab at Georgia Tech describes GeneMark.hmm ES, an improved ab initio gene predictor that builds on their earlier program GeneMark. It doesn't require a training set when building models for gene prediction in fungal genomes, and it is reported to have sensitivity and specificity as good as or better than most of the commonly used ab initio programs. They are picking up on previously described insights about fungal gene structure and introns: the lack of a required branch site and the varying degrees of splice-site conservation in most intron-rich fungi (Schwartz et al. 2008), and the fact that intron sizes remain short across the fungi (Stajich et al. 2007).
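For readers outside gene finding: in these benchmarks, sensitivity is the fraction of annotated features that a program recovers, while specificity conventionally means the fraction of its predictions that are correct, i.e. TP/(TP+FP), rather than the textbook TN-based definition. Scores are usually reported at the nucleotide, exon, and gene level. Below is a minimal sketch of exon-level scoring, assuming exons are given as hypothetical (start, end, strand) tuples and counting only exact boundary matches; it is an illustration, not the evaluation code used in the paper.

```python
from typing import Iterable, Tuple

# Hypothetical exon representation for illustration: (start, end, strand).
Exon = Tuple[int, int, str]


def exon_sn_sp(predicted: Iterable[Exon], annotated: Iterable[Exon]) -> Tuple[float, float]:
    """Exon-level sensitivity and specificity, counting exact coordinate matches only."""
    pred = set(predicted)
    true = set(annotated)
    tp = len(pred & true)                   # exons predicted with exactly correct boundaries
    sn = tp / len(true) if true else 0.0    # TP / (TP + FN)
    sp = tp / len(pred) if pred else 0.0    # TP / (TP + FP), the gene-finding convention
    return sn, sp


annotated = [(100, 250, "+"), (400, 520, "+"), (700, 910, "+")]
predicted = [(100, 250, "+"), (400, 530, "+"), (700, 910, "+"), (1200, 1300, "-")]
sn, sp = exon_sn_sp(predicted, annotated)
print(f"Sn={sn:.2f} Sp={sp:.2f}")  # prints "Sn=0.67 Sp=0.50"
```

Here the exon with a shifted acceptor (400-530 instead of 400-520) counts as a miss, which is why exon-level scores are usually much harsher than nucleotide-level ones.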

In practice it should simplify the initial genome annotation protocols and could really streamline the procedure. It doesn't replace the need to gather EST sequence data, which can also be used to generate a training set in an automated fashion. EST and other transcriptional evidence is still very important for identifying UTRs and alternative splicing isoforms.

Hopefully these predictions will be integrated into the Cryptococcus and Coprinus genome annotations that are being updated at the Broad. We'll see how well this performs on a couple of the Chytrid genome sequences we are working on as well.