Tag Archives: species

A word about databases

Report concludes that a fungal genome database is of “the highest priority”.

This is the title as listed in PubMed for this article from Future Medicine about the AAM report on charting future needs and avenues of research on the fungal kingdom.

The report highlights as a major need a comprehensive database for information about fungi, starting at least with systematic collections of genomic and transcript data.  Really, any sort of new database effort should strive to be more comprehensive and include genetic and population data (alleles, strains) and information like protein-protein and protein-nucleic acid interactions (as Pedro mentioned). But on top of that, it needs to be comparative so that information from systems that serve as great models can be transferred to other fungal systems that are being studied for their role as pathogens or for their interactions in the environment.

Affordable next-gen sequencing will allow us to obtain genome and transcript sequence for basically all species or strains of interest.  Researchers with no bioinformatics support in their lab will likely be able to outsource this to a company or campus core facility.  But how can they easily map the collective information about genes, proteins, and pathways onto this new data?  And how can this be a dynamic system that updates as new information is published and curated in other systems?

I think this has to be the future beyond setting up a SGD, CGD, etc for every system.  The individual databases are useful for a large enough community where there are curators (and funding), but we will have to move to a more modular system in the future (aspects of which are in GMOD) that can have both an individual focus on a specific species/clade and a more comprehensive view that is comparable across the kingdom.  There are 100+ fungal genomes, but the community size for some of them is in the dozens of labs or fewer. How can they take advantage of the new resources without an existing infrastructure of curators?  Their systems serve an important need in a research aim, but how can discoveries there make their way back into the datastream of other systems?

I see several ways one would interact with a system that provided single-genome tools as well as a framework for comparative information.  At a gene level, one might be looking for all information about a specific gene, based on sequence similarity searches, or starting with a cloned gene in one species. Something akin to Phylofacts or precomputed Orthogroups for defining a gene, but with more information about function linked in from all sources.  So a comparative resource, but also one tapping into curated and literature-mined data.

At a genome level, one might want to do whole genome comparisons of gene content, either by evolutionarily defined gene families (gene family size change) or at a functional level.  To start out with, each gene/protein would need a systematic functional mapping.  This could be as simple as running InterProScan on every protein, then expanded to find Orthogroups (or OrthoMCL orthologs) and transfer function from model systems, and finally, in a more advanced form, refined with classification tools like SIFTER.
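To make the orthogroup idea a bit more concrete, here is a minimal sketch of the two simplest steps: counting gene family size per species and proposing functions for unannotated genes from curated members of the same orthogroup. The orthogroup table, species names, gene IDs, and function strings are all made up for illustration, and this is not how OrthoMCL, InterProScan, or SIFTER represent their output.

```python
from collections import Counter

# Hypothetical orthogroup table: orthogroup ID -> list of (species, gene ID).
# In a real system this would be parsed from an ortholog-clustering run.
ORTHOGROUPS = {
    "OG1": [("modelA", "A_g1"), ("speciesB", "B_g1"), ("speciesB", "B_g2")],
    "OG2": [("modelA", "A_g2"), ("speciesC", "C_g1")],
    "OG3": [("speciesB", "B_g3"), ("speciesC", "C_g2")],
}

# Hypothetical curated functions from a model system database.
CURATED = {"A_g1": "example hydrolase", "A_g2": "example kinase"}


def family_sizes(orthogroups):
    """Return {orthogroup ID: Counter mapping species -> copy number}."""
    return {og: Counter(species for species, _ in members)
            for og, members in orthogroups.items()}


def transfer_function(orthogroups, curated):
    """Propose functions for genes lacking curation, based on curated
    genes in the same orthogroup. Proposals keep their provenance so
    they can be reviewed rather than treated as fact."""
    proposals = {}
    for og, members in orthogroups.items():
        known = sorted({curated[gene] for _, gene in members if gene in curated})
        if not known:
            continue  # nothing to transfer within this orthogroup
        for _, gene in members:
            if gene not in curated:
                proposals[gene] = {"orthogroup": og, "inferred_from": known}
    return proposals


if __name__ == "__main__":
    for og, counts in family_sizes(ORTHOGROUPS).items():
        print(og, dict(counts))
    for gene, info in transfer_function(ORTHOGROUPS, CURATED).items():
        print(gene, info)
```

Keeping the orthogroup ID with each proposal is deliberate: a transferred function is only a hypothesis, and the link back to its source is what lets it be updated when the model system curation changes.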

Interlinked with these orthologous and paralogous gene sets would be anchors for analyses of chromosomal synteny and even comparative assembly, including tools like Mercator.  Certainly tools for all of this exist, but making them more pluggable for different sets of species would be an important additional component.

At a utility level, the gene annotation and functional mapping of all this information should be possible. I would imagine a researcher could upload the sequence assembly they received from the core facility and the system could generate multiple gene predictions, annotate the genes, and link these genes within the known orthogroups of the system (keeping these genes private if desired).  Presumably this sort of thing would be easier as a standalone in-house tool for the researcher, but web services could also be the place for this.

For fungal-sized genomes this amount of data is not too extreme.  Things like Genome Browser, BLAST, etc should all be available out of the box based on the basic builds.

On the DIY and community annotation front, there would also need to be a layer of community derived annotation that could be layered on all these systems.  I would imagine this to be both for gene structure annotation (genome annotation) and functional annotation (protein X does Y based on experiment Z, here is the journal reference).  I think aspects of this would be visible and auditable (tracked), but maybe not blessed as official until a curator could oversee these inputs. In my mind, whether this is in a Wiki per se or just a new system that allows community input is less important than having it be a) structured (not a bunch of free text) b) tracked and versionable c) easy for researchers to input so that the knowledge is captured, even if it has to be reorganized later on.
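As a rough illustration of what "structured, tracked, and versionable" could look like, here is a minimal sketch of an annotation record. Every class and field name is hypothetical; this is not a GMOD schema, just one way to capture "protein X does Y based on experiment Z" with a history and a pending/approved status.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class AnnotationVersion:
    """One immutable community submission (all field names are hypothetical)."""
    gene_id: str
    statement: str   # e.g. "protein X does Y"
    evidence: str    # e.g. "experiment Z"
    reference: str   # journal reference supplied by the submitter
    submitter: str


@dataclass
class CommunityAnnotation:
    """Tracked annotation: every submission is kept, and the record is
    visible but not official until a curator approves the latest version."""
    history: List[AnnotationVersion] = field(default_factory=list)
    status: str = "pending"
    approved_by: Optional[str] = None

    def submit(self, version: AnnotationVersion) -> None:
        # A new version never overwrites an old one; it resets the review state.
        self.history.append(version)
        self.status = "pending"
        self.approved_by = None

    def approve(self, curator: str) -> None:
        if not self.history:
            raise ValueError("nothing to approve")
        self.status = "approved"
        self.approved_by = curator

    @property
    def current(self) -> AnnotationVersion:
        return self.history[-1]


if __name__ == "__main__":
    record = CommunityAnnotation()
    record.submit(AnnotationVersion("gene_0001", "protein X does Y",
                                    "experiment Z", "placeholder reference",
                                    "researcher_1"))
    record.approve("curator_1")
    print(record.status, len(record.history), record.current.statement)
```

Because submissions are appended rather than edited in place, the knowledge is captured immediately and can still be reorganized or re-reviewed later.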

Seems like a lot of work to be done, but really many of these things already exist through what the GMOD project has built.  There are many loose ends and software that doesn’t fully meet these needs, but I think the important concept is that these are all general solutions that will be of benefit to most communities, not just the fungal ones.  One lingering question I always have when approaching genomic data that will be dynamic: what, if any, of this makes its way into GenBank?  How is this sort of thing banked so that it can be captured, and does the improved functional or gene structure annotation ever make its way into the repository databases to correct and improve what has already been submitted there?

Basidiomycete genomes galore


Just finished attending Genetics and Cell Biology of Basidiomycetes in Cape Girardeau, MO, which was an intimate gathering of basidiomycetaphiles.  I learned about systems that are used for studying fruiting body development, genetic mapping, pheromone and mating genes, kinesin dynamics, meiotic gene regulation, and a host of other topics.  I’m happy I got a chance to meet more folks in the community and learned about where informatics and computational approaches are really needed to push along some of the interpretation of the more than a dozen basidiomycete genomes.  In particular it sounds like the Pleurotus, Schizophyllum, Agaricus bisporus, and Serpula genomes are all marching along to completion, with some already in 4X assembly or further.

GCBBVI Group Picture

So we’ll soon have more samples of key model and some less-studied species to assist researchers working on many different mushroom-forming fungi that range from brown and white-rotting saprophytic fungi to mycorrhizal fungi that associate with plants.  I’m excited about the work to make transformation and knockouts more readily available in these systems too, to push the genetics and cellular biology of these systems even further.  The genome sequences will be another tool in these endeavors.

The last day ended with a discussion about genome annotation and future support for curating gene models.  Basically everyone is unhappy with computational predictions and wants to be able to go in and fix things. (I think people remember the ones that were gotten wrong more readily than the ones that were right, but computational prediction definitely performs poorly in some situations.)  In this Web 2.0-land we live in, this is still not something easily done with any of the freely available genome browsing tools. The JGI’s browser was lauded for its ability to handle these kinds of requests, but how do we proceed when genomes are not sequenced by that center or when (in the not too distant future) communities are able to sequence a genome themselves using 454/Illumina-Solexa/Helicos/Pacific Biosciences approaches in their own lab?  There is still a huge lag in what kinds of tools researchers can use to annotate genomes to fix gene models and add functions.  Hopefully projects like GMOD will continue to develop useful tools for solving these needs, but there is certainly a need for better support of distributed community annotation of genomes where there is little direct money for supporting curators from a single place.

Trichoderma reesei genome paper published

The Trichoderma reesei genome paper was recently published in Nature Biotechnology from Diego Martinez at LANL with collaborators at JGI, LBNL, and others. This fungus was chosen for sequencing because it was found on canvas tents eating the cotton material, suggesting it may be a good candidate for degrading cellulose plant material as part of cellulosic ethanol or other biofuels production.  The fungus also has starring roles in industrial processes like making stonewashed jeans due to its prodigious cellulase production.

The most surprising findings from the paper include the fact that there are so few members of some of the enzyme families even though this fungus is able to generate enzymes with so much cellulase activity. The authors found that there is not a significantly larger number of glycoside hydrolases, a collection of carbohydrate degrading enzymes great for making simple sugars out of complex ones. In fact, several plant pathogens compared (Fusarium graminearum and Magnaporthe grisea) and the sake fermenting Aspergillus oryzae all have more members of this family than T. reesei does.  T. reesei has almost the fewest (36) copies of a cellulose binding domain (CBM) of any of the filamentous ascomycete fungi.  They used the CAZy (carbohydrate-active enzymes) database, which has done a fantastic job building up profiles of different enzymes involved in carbohydrate degradation, binding, and modification.

Whether T. reesei is really the best cellulose degrading fungus is definitely an open question.  That it works well in the industrial culture in which it has been utilized is important, but there may be other species of fungi with improved cellulase activity that may in fact have many more copies of cellulases.  So it will be good to add other fungi to the mix, with quantitative information about degradation, to try to glean which combinations of enzymes and activities are most important.

One technical note.  The comparison of copy number differences employed in the paper is a simple enough Chi-Squared test. Work that I’ve done with Matt Hahn and others includes a gene family size comparison approach that also takes into account phylogenetic distances and assumes a birth-death process of gene family size change.  It would be great to evaluate the copy number differences through this or other approaches that just evaluate gene trees for these domains, to see where the differences are significant and if they can be polarized to a particular branch of the tree.
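For readers who want to see what the simple test amounts to, here is a minimal sketch of a chi-squared test on a 2x2 table of family copy number versus all other genes in two genomes. The counts in the example are made up and are not taken from the paper, and the exact table layout the authors used is an assumption on my part. With one degree of freedom the p-value can be computed from the standard library alone.

```python
import math


def chi2_2x2(fam_a, other_a, fam_b, other_b):
    """Chi-squared test (no continuity correction) on the 2x2 table

                     in family   not in family
        genome A      fam_a        other_a
        genome B      fam_b        other_b

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = fam_a + other_a + fam_b + other_b
    row_a, row_b = fam_a + other_a, fam_b + other_b
    col_fam, col_other = fam_a + fam_b, other_a + other_b
    denom = row_a * row_b * col_fam * col_other
    if denom == 0:
        raise ValueError("a row or column total is zero; test is undefined")
    stat = n * (fam_a * other_b - other_a * fam_b) ** 2 / denom
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: 20 family members of 100 genes vs 40 of 100.
    stat, p = chi2_2x2(20, 80, 40, 60)
    print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

Note that this treats each genome as an independent sample and ignores how closely related the species are, which is exactly the limitation the phylogenetically aware birth-death approach is meant to address.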

So will this genome sequence lead to cheaper, better biofuel production? Certainly it provides an important toolkit to start systematically testing individual cellulase enzymes. It’s hard to say how fast this will make an impact, but JBEI and a host of other research groups and biotech companies will be able to systematically test the utility of these individual enzymes.

There is also evolutionary work by other groups on these Hypocreales fungi, trying to better define when biotrophic and heterotrophic transitions occurred, in order to sample fungi with different lifestyles that might have cellulase enzymes that have not yet been observed. Defining the relationships of these fungi, and when and how many times lifestyle transitions occurred, in order to choose the most diverse fungi may be an important part of discovering novel enzymes.

Also see

Martinez, D., Berka, R.M., Henrissat, B., Saloheimo, M., Arvas, M., Baker, S.E., Chapman, J., Chertkov, O., Coutinho, P.M., Cullen, D., Danchin, E.G., Grigoriev, I.V., Harris, P., Jackson, M., Kubicek, C.P., Han, C.S., Ho, I., Larrondo, L.F., de Leon, A.L., Magnuson, J.K., Merino, S., Misra, M., Nelson, B., Putnam, N., Robbertse, B., Salamov, A.A., Schmoll, M., Terry, A., Thayer, N., Westerholm-Parvinen, A., Schoch, C.L., Yao, J., Barbote, R., Nelson, M.A., Detter, C., Bruce, D., Kuske, C.R., Xie, G., Richardson, P., Rokhsar, D.S., Lucas, S.M., Rubin, E.M., Dunn-Coleman, N., Ward, M., Brettin, T.S. (2008). Genome sequencing and analysis of the biomass-degrading fungus Trichoderma reesei (syn. Hypocrea jecorina). Nature Biotechnology DOI: 10.1038/nbt1403

Podospora genome published

The genome of Podospora anserina S mat+ strain was sequenced by Genoscope and CNRS and published recently in Genome Biology. The genome sequence data has been available for several years, but it is great to see a publication describing the findings.  The 10X genome assembly with ~10,000 genes provides an important dataset for comparisons among filamentous Sordariomycete fungi. The authors primarily focused on comparative genomics of Podospora to Neurospora crassa, the next closest model filamentous species.  Within the Sordariomycetes there is now a very interesting collection of closely related species which can be useful for applying synteny and phylogenomics approaches.

The analyses in the manuscript focused on the differences between Neurospora and Podospora, identifying some key differences in carbon utilization that contrast the coprophilic (Podospora) and plant saprophyte (Neurospora) lifestyles.  There are several observations of gene family expansions in the Podospora genome which could be interpreted as additional enzyme capacity to break down carbon sources that are present in dung.

The genome of Neurospora has been shaped by the action of genome defense mechanisms like RIP, which has been one interpretation of the reduced number of large gene families and paucity of transposons. The authors report a surprising finding: despite Podospora sharing orthologs of genes that are involved in several genome defense mechanisms, they in fact find fewer repetitive sequences in Podospora, while it still lacks good evidence of RIP.

Overall, these data suggest that P. anserina has experienced a fairly complex history of transposition and duplications, although it has not accumulated as many repeats as N. crassa. P. anserina possesses all the orthologues of N. crassa factors necessary for gene silencing, including RIP, meiotic MSUD and also vegetative quelling, a post transcriptional gene silencing mechanism akin to RNA interference.

I think these data and observations interleave nicely with the work our group is exploring on the genome evolution of several Neurospora species which have different mating systems. The fact that the gene components that play a role in MSUD and RIP are found in Podospora, and yet there is only a limited degree of RIP and no observed meiotic silencing, suggests some interesting occurrences on the Neurospora branch to be explored.  The potentially different degrees of RIP efficiency and types of mating systems (heterothallic and pseudohomothallic) among the Neurospora spp may also provide a link to understanding how RIP evolved and its role in N. crassa evolution.

Senescence in Podospora

Another aspect of Podospora biology that isn’t touched on is the use of the fungus as a model for senescence.  The fungus exhibits maternal senescence, which involves targeted changes in the mitochondria that lead to cell death.  The evolutionary and molecular basis for this process has been of interest to many research groups, and the genome sequence can provide an additional toolkit for identifying the factors involved in the apoptosis process in this filamentous fungus. Whether it will help find a real link for aging research in other eukaryotes remains to be seen, but it is a good model system for some aspects of how aging and damage to mtDNA are linked.

Espagne, E., Lespinet, O., Malagnac, F., Da Silva, C., Jaillon, O., Porcel, B.M., Couloux, A., Aury, J., et al (2008). The genome sequence of the model ascomycete fungus Podospora anserina. Genome Biology, 9(5), R77. DOI: 10.1186/gb-2008-9-5-r77

More RIP without sex?

In followup to the Aspergillus RIP paper discussion, Jo Anne posted in the comments that her paper published in FGB about RIP in another asexual species of fungi also found evidence for the meiosis-specific process of Repeat Induced Point-mutations (RIP).


Neurospora speciation through experimental evolution

Dettman, Anderson, and Kohn recently published a paper in BMC Evolutionary Biology on reproductive experimental evolution in two Neurospora crassa populations evolved under different selective conditions. This is a great study that complements work published last year in Nature on experimental evolution in Saccharomyces cerevisiae populations. Neurospora populations were evolved under high salt and low temperature and were started from either high diversity (interspecific crosses, N. crassa vs N. intermedia) or low diversity (intraspecific cross, two N. crassa isolates D143 (Louisiana, USA) and D69 (Ivory Coast)) as described in Figure 1. The experimentally evolved populations were then tested for asexual and sexual fitness (they were taken through a complete meiotic cycle throughout the experiment to ensure there was selection on the sexual reproduction pathway).


Fungal Genetics 2007 details

I’m including a recap of as many of the talks as I remember. There were 6 concurrent sessions each afternoon, so you have to miss a lot of talks. The conference was bursting at the seams as it was: at least 140 people had to be turned away beyond the 750 who attended.

If there was any theme in the conference it was “Hey we are all using these genome sequences we’ve been talking about getting”. I found the overview talks that solely describe the genome a little dry compared to those more focused on particular questions. I guess my genome palate is becoming refined.


Whole genome tiling arrays

A recent paper describes the discovery of 9 new introns in Saccharomyces cerevisiae by Ron Davis’s group at Stanford, using high density tiling arrays from Affymetrix. The arrays are designed for both strands, allowing the detection of transcripts transcribed from either strand. The arrays were also put to work by the Davis and Steinmetz labs to create a high density map of transcription in yeast, and by the Kruglyak lab for polymorphism mapping.

PNAS Yeast Transcriptional map

Whole genome tiling arrays have also been employed in other fungi. For example, Anita Sil’s group at UCSF constructed a random tiling array for Histoplasma capsulatum and used it to identify genes responding to reactive nitrogen species. A similar approach was used in Cryptococcus neoformans to investigate temperature regulated genes using random sequencing clones.

As the technology becomes cheaper, it may become sensible to use a tiling array rather than ESTs to detect transcripts when attempting to annotate a genome. In the Histoplasma work, transcriptional units could be identified from hybridization alone. Some of the algorithms will need some work to correctly incorporate this information, and the sensitivity and density of the array will influence this. These techniques can also be part of resequencing approaches or fast genotyping of progeny from QTL experiments when the sequence from both parents is known (or at least enough of the polymorphisms for the genetic map).
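To give a feel for how transcriptional units might be called from hybridization alone, here is a deliberately naive sketch: it reports runs of consecutive probes whose intensity is above a threshold. The function name, threshold, and minimum run length are my own assumptions for illustration, and this is not the method used in the Histoplasma or yeast papers, which would need to deal with probe noise, normalization, and strand.

```python
def call_transcribed_segments(probes, threshold, min_probes=3):
    """Return (start, end) probe positions for each run of at least
    `min_probes` consecutive probes with intensity >= threshold.

    probes: list of (position, intensity) tuples sorted by position.
    """
    segments = []
    run = []  # positions of the current above-threshold run
    for position, intensity in probes:
        if intensity >= threshold:
            run.append(position)
            continue
        # Run ended: keep it only if enough probes support it.
        if len(run) >= min_probes:
            segments.append((run[0], run[-1]))
        run = []
    if len(run) >= min_probes:  # handle a run that reaches the last probe
        segments.append((run[0], run[-1]))
    return segments


if __name__ == "__main__":
    # Hypothetical probe data: (genomic position, normalized intensity).
    probes = [(0, 1.0), (10, 5.0), (20, 6.0), (30, 7.0),
              (40, 1.0), (50, 8.0), (60, 9.0), (70, 1.0)]
    print(call_transcribed_segments(probes, threshold=4.0))
```

The `min_probes` parameter is where array density matters: on a sparse array a short transcript may only be covered by one or two probes and would be missed by a strict setting.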

What is superior about the current Affymetrix yeast tiling array is the inclusion of both strands. This allows detection of transcripts from both strands. Several anti-sense transcripts in yeast have been discovered recently including in the IME4 locus through more classical approaches, but perhaps many more await discovery with high resolution transcriptional data from whole genome tiling arrays.