Category Archives: genome annotation

Updated Cryptococcus serotype A annotation

SEM of clamp cell, yeast cells and sexual spore chains. Courtesy R. Velagapudi & J. Heitman

A new and improved annotation of Cryptococcus neoformans var. grubii strain H99 (serotype A) has been made available in GenBank and on the Broad Institute website. This update is a collaboration between several groups providing data and analyses and the genome annotation team at the Broad Institute.

Some changes noted by the Broad Institute include:

“This release of gene predictions for the serotype A isolate Cryptococcus neoformans var. grubii H99 is based on a new genomic assembly provided by Dr. Fred Dietrich at the Duke Center for Genome Technology. The new assembly consists of 14 nuclear chromosomes and a single 21 KB mitochondrial chromosome, and has resulted in a reduction of the estimated genome size from 19.5 to 18.9 Mb. Improvements in the assembly and in our annotation process have resulted in a set of 6,967 predicted protein products, 335 fewer than the previous release.”

Lichen genome projects and the power shift prompted by next-gen sequencing

Genome Technology highlights the very cool thing about next-gen sequencing: it puts the power to explore genome sequence in the hands of researchers and doesn’t limit them to projects funded through sequencing centers. The piece highlights work at Duke to sequence the genome of Cladonia grayi, a lichenized fungus, with 454 technology at Duke’s Institute for Genome Sciences and Policy through their next-gen sequencing program. This is the way of the future: sequencing core facilities will generate the sequence, and researchers will only have to wait in the queue at their own university rather than in community sequencing project or sequencing center proposal queues.

This isn’t the only lichen being sequenced. Xanthoria parietina is also in the queue at JGI, but it has taken a while to get going because of some logistical problems getting the DNA (and any problems are amplified because lichens grow very slowly, so it takes a long time to get new material).

Putting quick exploratory whole-genome sequencing, and eventually high-quality genome sequences, in the hands of researchers through next-gen sequencing is predicted to transform how this kind of science gets done. It means we’ll probably just sequence a mutant strain instead of trying to map the mutation; anecdotally, this is already happening in worms and in our work in mushrooms. N.B. this is done after a mutagenized strain has been cleaned up a bit through some crosses to ensure we’re looking for one or only a few mutations, but that is part of standard genetic approaches anyway.

This fast, cheap whole-genome sequencing is also the stuff of personal genomics, but for basic research it will also mean that a first pass at exploring the gene repertoire of an organism will be a multi-week instead of a multi-year project. I just hope we’re training enough people with solid bioinformatics, computational, data-oriented programming, and statistical skills to efficiently extract the information from all this data and support all the labs that will want to take this approach. You’ll need a life-vest to swim in the big data pool for a while until more tools are developed that can be deployed by non-experts.

Gene prediction without training?

A new paper in Genome Research from the Borodovsky lab at Georgia Tech describes an improved ab initio gene predictor, GeneMark.hmm ES, which builds on their previous program GeneMark. This application doesn’t require a training set when building models for gene prediction in fungal genomes and is reported to have as good or better sensitivity and specificity than most of the commonly used ab initio programs. They are picking up on previously described insights about fungal gene structures and introns, namely the lack of a required branch site and the varying degrees of splice-site conservation in most intron-rich fungi (Schwartz et al, 2008), and the observation that intron sizes remain short across the fungi (Stajich et al. 2007).

In practice it should simplify the initial genome annotation protocols and could really streamline the procedures. It doesn’t replace the need to gather EST sequence data, which can also be used to generate a training set in an automated fashion. EST and transcriptional evidence is still very important for identifying UTRs and alternatively spliced isoforms.

Hopefully these data from the predictions will integrate into the Cryptococcus and Coprinus genome annotations that are undergoing an update at the Broad.  We’ll see how well this performs on a couple of the Chytrid genome sequences we are working on as well.

Chlamy genome investigations

This month’s Genetics has a series of articles exploring the genome (published last year & freely available at Science) of the green alga Chlamydomonas reinhardtii. These manuscripts are primarily genome analyses, making for a very bioinformatics-focused issue of Genetics. Some of the highlights include:

Trichoderma reesei genome paper published

The Trichoderma reesei genome paper was recently published in Nature Biotechnology by Diego Martinez at LANL with collaborators at JGI, LBNL, and elsewhere. This fungus was chosen for sequencing because it was found on canvas tents, eating the cotton material, which suggests it may be a good candidate for degrading cellulosic plant material as part of cellulosic ethanol or other biofuels production. The fungus also has starring roles in industrial processes like making stonewashed jeans due to its prodigious cellulase production.

The most surprising finding from the paper is that there are so few members of some of the enzyme families even though this fungus is able to generate enzymes with so much cellulase activity. The authors found that there is not a significantly larger number of glycoside hydrolases, a collection of carbohydrate-degrading enzymes that are great for making simple sugars out of complex ones. In fact, the plant pathogens compared (Fusarium graminearum and Magnaporthe grisea) and the sake-fermenting Aspergillus oryzae all have more members of this family than T. reesei does. T. reesei has almost the fewest (36) copies of a cellulose binding domain (CBM) of any of the filamentous ascomycete fungi. They used the CAZy (carbohydrate-active enzymes) database, which has done a fantastic job building up profiles of the different enzymes involved in carbohydrate degradation, binding, and modification.

Whether T. reesei is really the best cellulose-degrading fungus is definitely an open question. That it works well in the industrial culture in which it has been used is important, but there may be other species of fungi with improved cellulase activity that may in fact have many more copies of cellulases. So it will be good to add other fungi to the mix, with quantitative information about degradation, to try to glean which combinations of enzymes and activities are the most important.

One technical note. The comparison of copy number differences employed in the paper is a simple enough chi-squared test. Work that I’ve done with Matt Hahn and others includes a gene family size comparison approach that also takes into account phylogenetic distances and assumes a birth-death process of gene family size change. It would be great to analyze the copy number differences with this or with other approaches that just evaluate gene trees for these domains, to see where the differences are significant and whether they can be polarized to a particular branch of the tree.
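To make the chi-squared idea concrete, here is a minimal sketch of a 2x2 test comparing the number of genes in one family against all other genes in two genomes. This is not the paper's actual procedure, which isn't described here: the function name and the counts are hypothetical, made up purely for illustration, and the p-value uses the standard identity for one degree of freedom rather than any particular statistics package.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 d.f.) for the table [[a, b], [c, d]].

    a, b: genes in the family / all other genes, genome 1
    c, d: genes in the family / all other genes, genome 2
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    # Shortcut formula for a 2x2 table, no continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# HYPOTHETICAL counts, not taken from the paper.
family_a, other_a = 20, 980   # genome A: family members, remaining genes
family_b, other_b = 40, 960   # genome B: family members, remaining genes

chi2, p_value = chi_squared_2x2(family_a, other_a, family_b, other_b)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

A test like this treats each genome as an independent sample, which is exactly the limitation noted above: it ignores how closely related the species are, and that is what a birth-death model on a phylogeny tries to account for.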

So will this genome sequence lead to cheaper, better biofuel production? Certainly it provides an important toolkit to start testing individual cellulase enzymes. It’s hard to say how fast this will make an impact, but JBEI and a host of other research groups and biotech companies are going to be able to systematically test the utility of these individual enzymes.

There is also work by other groups on the evolution of these Hypocreales fungi, trying to better define when biotrophic and heterotrophic transitions occurred, in order to sample fungi with different lifestyles that might have cellulase enzymes that have not yet been observed. Defining the relationships of these fungi, and when and how many times lifestyle transitions occurred, so as to choose the most diverse fungi may be an important part of discovering novel enzymes.

Also see

Martinez, D., Berka, R.M., Henrissat, B., Saloheimo, M., Arvas, M., Baker, S.E., Chapman, J., Chertkov, O., Coutinho, P.M., Cullen, D., Danchin, E.G., Grigoriev, I.V., Harris, P., Jackson, M., Kubicek, C.P., Han, C.S., Ho, I., Larrondo, L.F., de Leon, A.L., Magnuson, J.K., Merino, S., Misra, M., Nelson, B., Putnam, N., Robbertse, B., Salamov, A.A., Schmoll, M., Terry, A., Thayer, N., Westerholm-Parvinen, A., Schoch, C.L., Yao, J., Barbote, R., Nelson, M.A., Detter, C., Bruce, D., Kuske, C.R., Xie, G., Richardson, P., Rokhsar, D.S., Lucas, S.M., Rubin, E.M., Dunn-Coleman, N., Ward, M., Brettin, T.S. (2008). Genome sequencing and analysis of the biomass-degrading fungus Trichoderma reesei (syn. Hypocrea jecorina). Nature Biotechnology DOI: 10.1038/nbt1403

Yes, Ecology can improve Genomics

Few organisms are as well understood at the genetic level as Saccharomyces cerevisiae. Given that there are more yeast geneticists than yeast genes and exemplary resources for the community (largely a result of their size), this comes as no surprise. What is curious is the large number of yeast genes that we’ve been unable to characterize. Of the ~6000 genes currently identified in the yeast genome, 1253 have no verified function (for the uninclined, this is roughly 21% of the yeast proteome). Egads! If we can’t figure this out in yeast, what hope do we have in non-model organisms? Lourdes Peña-Castillo and Timothy R. Hughes discuss this curious observation and its cause in their report in Genetics.

Fusarium graminearum genome published

The genome of the wheat and cereal pathogen Fusarium graminearum was published in Science this week in an article entitled “The Fusarium graminearum Genome Reveals a Link Between Localized Polymorphism and Pathogen Specialization”. The project was a collaboration of many different Fusarium research groups. The genome sequencing was spearheaded by the Broad Institute of MIT and Harvard and is part of a larger project to sequence several different species of Fusarium. The group sequenced a second strain in order to identify polymorphisms.

Some of the key findings

  • The presence of repeat-induced point mutation (RIP) has likely limited the amount of repetitive and duplicated sequences in the genome
  • Most of the genes unique to F. graminearum (and thus not present in 4 other Fusarium spp genomes) are found in the telomeres
  • Between the sequenced strains, SNP density ranged from 0 to 17.5 polymorphisms per kb
  • Some of the genes expressed uniquely during plant infection (408 total) include known virulence factors and many plant cell-wall degrading enzymes
  • The genes showing some of the highest SNP diversity tended to be unique to Fusarium and often unique to F. graminearum
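For readers wondering what a “polymorphisms per kb” figure means in practice, here is a minimal sketch of counting SNPs in fixed windows along a sequence. The window size, the function name, and the input positions are all assumptions for illustration; the paper’s own windowing scheme and data are not reproduced here.

```python
def snp_density_per_kb(snp_positions, sequence_length, window=1000):
    """SNPs per kb in consecutive, non-overlapping windows.

    snp_positions: 0-based coordinates of polymorphic sites
    sequence_length: length of the chromosome or contig in bp
    window: window size in bp (assumed; not taken from the paper)
    """
    if window <= 0 or sequence_length <= 0:
        raise ValueError("window and sequence_length must be positive")
    n_windows = (sequence_length + window - 1) // window
    counts = [0] * n_windows
    for pos in snp_positions:
        if not 0 <= pos < sequence_length:
            raise ValueError(f"position {pos} is outside the sequence")
        counts[pos // window] += 1
    densities = []
    for i, count in enumerate(counts):
        start = i * window
        end = min(start + window, sequence_length)
        # Normalise by the true window length so a short final window
        # is not under-reported.
        densities.append(count * 1000.0 / (end - start))
    return densities


# HYPOTHETICAL SNP positions on a 3 kb toy sequence.
toy_densities = snp_density_per_kb([10, 500, 1500], 3000)
print(toy_densities)
```

Plotting values like these along each chromosome is one simple way to see whether diversity is spread evenly or concentrated in particular regions.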