New genomes from Microsporidia are on the way from the Broad Institute and other groups, and will be a boon to those working on these fascinating creatures. Microsporidia are obligate intracellular parasites of eukaryotic cells, and many can cause serious disease in humans. Some parasitize worms and insects too. The evolutionary placement of these species in the fungi is still debated, with recent evidence placing them as derived members of the Mucormycotina based on shared synteny (conserved gene order), in particular around the mating type locus. Their highly derived characteristics and long branches still make them hard to place in the Fungal kingdom. The synteny-based evidence was another way to find a phylogenetic placement for them, but it would be helpful to have further support in the form of additional shared derived characteristics that group Mucormycotina and Microsporidia. There is hope that an increased number of genome sequences and phylogenomic approaches can help resolve the placement and further our understanding of the evolution of the group.
For data analysis, a new genome database for comparing these genomes, called MicrosporidiaDB, is now online. This project has begun incorporating the available genomes and provides a data mining interface that extends from the EuPathDB project.
There are several databases that include orthology prediction for fungi. These all have pros and cons. Some are more comprehensive and have many more species. Some provide curated orthology and paralogy assignments, which should be pretty stable. Some are automated, so the groupings and ortholog group IDs change at each iteration.
- A phylogenetic approach from a Saccharomyces perspective is at PhylomeDB.
- Fungal Orthogroups is based on the Synergy algorithm from I. Wapinski, formerly of the Regev group at the Broad Institute.
- Yeast gene order browser (YGOB) for Saccharomyces spp and CGOB for Candida spp.
- OrthoMCL database is based on whole genomes; not a ton of fungi, but a useful starting set.
- Ensembl Genomes provides ortholog prediction as part of the Compara pipeline, though there is limited phylogenetic diversity in the current Ensembl Fungal genomes.
- TreeFam has Saccharomyces cerevisiae and Schizosaccharomyces pombe as the two fungi included in the curated ortholog assignments and phylogenies.
- SIMAP provides pre-computed similarities among all proteins in UniProt.
- InParanoid provides pretty comprehensive coverage of the 100 available whole genomes, including many fungal genomes which I tried to help select.
- JGI’s Mycocosm attempts to provide a fungal-focused paralog/gene family look at clusters of genes based on whole genomes.
- E-Fungi is also an attempt at automated clustering with some fancy webservices logic.
- Fungal Transcription Factor database focuses just on families of transcription factors.
Some of these tools are better than others at providing downloadable tables. Another problem is which identifiers are used. Many biologists use gene names or locus identifiers, not UniProt/GenPept IDs, to identify genes or proteins of interest. So tools that just cluster UniProt data aren’t as useful as those which refer to the gene or locus names. Providing a way to download all the data from a comparison is also important for further mining and grouping of the data or for cross-referencing local datasets. Plugging in gene IDs one by one does not respect the idea that your user wants to ask sophisticated queries.
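To make the identifier problem concrete, here is a minimal sketch of relabeling a downloaded cluster table with locus names. Everything in it is an assumption for illustration: the two-column layouts, the column names, and the `ACC…`/`LOCUS_…` identifiers are made up and do not correspond to any particular database's download format.

```python
import csv
import io

# Made-up placeholder tables; real ones would come from a database download.
ID_MAP_TSV = """accession\tlocus
ACC0001\tLOCUS_A1
ACC0002\tLOCUS_B7
ACC0003\tLOCUS_C3
"""
CLUSTERS_TSV = """cluster_id\taccession
OG1\tACC0001
OG1\tACC0002
OG2\tACC0003
OG2\tACC0004
"""


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def clusters_by_locus(cluster_rows, id_map_rows):
    """Group cluster members by cluster ID, relabeled with locus names."""
    acc_to_locus = {row["accession"]: row["locus"] for row in id_map_rows}
    clusters = {}
    for row in cluster_rows:
        # Fall back to the accession when no locus name is known.
        label = acc_to_locus.get(row["accession"], row["accession"])
        clusters.setdefault(row["cluster_id"], []).append(label)
    return clusters


clusters = clusters_by_locus(read_tsv(CLUSTERS_TSV), read_tsv(ID_MAP_TSV))
for cluster_id, members in sorted(clusters.items()):
    print(cluster_id, ",".join(members))
```

This only works when the resource offers a bulk download in the first place, which is the point of the complaint above.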
Also, beware that some approaches give lists of pairwise comparisons whereas others find orthologous groupings. So if you want to find the Rad59 ortholog from all fungi, it may be easier or harder depending on the source.
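The difference between pairwise lists and groupings can be sketched in a few lines. The example below merges pairwise ortholog calls into groups by simple single-linkage (union-find). This is an illustrative assumption, not how any of the databases above build their groups; OrthoMCL, for example, uses Markov clustering, and naive single-linkage can over-merge paralogs. The gene labels are hypothetical.

```python
def ortholog_groups(pairs):
    """Merge pairwise ortholog calls into groups by single linkage."""
    parent = {}

    def find(gene):
        # Register unseen genes as their own root, then walk up to the root.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for gene_a, gene_b in pairs:
        root_a, root_b = find(gene_a), find(gene_b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical pairwise calls, labeled species|gene.
pairs = [
    ("sp1|g1", "sp2|g4"),
    ("sp2|g4", "sp3|g9"),
    ("sp1|g2", "sp3|g5"),
]
for group in ortholog_groups(pairs):
    print(group)
```

With a pairwise-only source you have to do this kind of merging yourself for every species pair, which is why finding one gene's orthologs across all fungi is harder there.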
[I may make this a static page in the future to allow for more detailed updating since I know the available resources wax and wane]
Gene sequences evolve at different rates due to different constraints, such as chromosome position, functional constraint, and status as a single-copy or multi-copy gene. In a recent paper from Allen Rodrigo (the new NESCent director, by the way, congrats!), the authors hypothesize that correlation in the branch lengths of gene trees suggests the genes operate in the same pathway or have a similar function. To test this they took alignments of orthologous genes from 10 bacterial species, seeded with E. coli as the target species. The alignments were used to build trees with MrBayes, and only those which recovered the known species topology were retained. The ortholog groups were assigned GO terms by similarity.
They then looked at the branch lengths of the gene trees and found a correlation between GO categories and rates of gene evolution/shape of the tree. I’ll not go into more detail here, but I think this is an interesting finding that is probably not so surprising when you think about it. I’d be very curious to see whether this holds up in multi-domain proteins as well, and of course taking this approach for a drive in fungal orthologs would be an interesting project for someone to try.
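As a rough illustration of what "correlated branch lengths" means, the sketch below computes a plain Pearson correlation between the branch lengths of gene trees that share one topology. This is an assumption chosen for simplicity and not necessarily the statistic used in the paper; the branch labels and lengths are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def branch_correlation(tree_a, tree_b):
    """Correlate branch lengths of two gene trees with identical topology.

    Each tree is a dict mapping a branch label to its length.
    """
    if tree_a.keys() != tree_b.keys():
        raise ValueError("trees must share the same set of branches")
    branches = sorted(tree_a)
    return pearson([tree_a[b] for b in branches], [tree_b[b] for b in branches])


# Invented branch lengths; "AB" is the internal branch above taxa A and B.
gene_1 = {"A": 0.10, "B": 0.20, "C": 0.30, "AB": 0.05}
gene_2 = {"A": 0.20, "B": 0.40, "C": 0.60, "AB": 0.10}  # gene_1 scaled by 2
gene_3 = {"A": 0.30, "B": 0.20, "C": 0.10, "AB": 0.05}

r_12 = branch_correlation(gene_1, gene_2)
r_13 = branch_correlation(gene_1, gene_3)
print(round(r_12, 3), round(r_13, 3))
```

Because Pearson correlation ignores overall scale, a gene that simply evolves twice as fast everywhere (gene_2) still correlates perfectly with gene_1. Requiring a shared topology, as the authors did by keeping only trees that matched the species tree, is what makes the branches comparable one to one.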
Li WL, & Rodrigo AG (2009). Covariation of branch lengths in phylogenies of functionally related genes. PloS one, 4 (12) e8487. PMID: 20041191. doi:10.1371/journal.pone.0008487
A recent paper in MBE presents evidence that the Taphrinomycotina (containing S. pombe and Pneumocystis) are in fact a monophyletic group. This group is considered an early branch in the Ascomycota, alongside the Pezizomycotina (filamentous ascomycete fungi like Neurospora and Aspergillus) and Saccharomycotina (fungi mainly with yeast forms, including Candida and Saccharomyces). The monophyly of the Taphrinomycotina has been fairly well accepted, but a few publications report conflicting evidence in some sets of gene trees. This conflict is most likely due to Long Branch Attraction (LBA). The Philippe lab has long worked on the problem of LBA, developing tools like PhyloBayes that attempt to correct for it with a parameter-rich model and lots of data (like whole genomes). These authors employ phylogenomics in the sense that multiple genes are used to reconstruct the phylogeny. This use differs from the J. Eisen/Sjölander sense, which is to infer gene function from a phylogeny.
This paper presents evidence using proteins from 113 mitochondrial and nuclear genes and finds strong statistical support for this monophyly. The authors also note that it was necessary to remove fast-evolving sites from a dataset of only mitochondrial genes in order to overcome the LBA artifacts that led to a Saccharomyces and S. pombe sister relationship in previous analyses.
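The idea of removing fast-evolving sites can be sketched as a column filter on an alignment. The example below uses per-column Shannon entropy as a crude stand-in for site rate, which is an assumption for illustration only: it is not the procedure from the paper, and a real analysis would more likely estimate site rates under a model. The sequences and the threshold are made up.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps/unknowns."""
    residues = [c for c in column if c not in "-?X"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def drop_variable_sites(alignment, max_entropy):
    """Return the alignment with columns above max_entropy removed.

    alignment maps taxon name to an aligned sequence string.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    keep = [
        i for i, column in enumerate(zip(*seqs))
        if column_entropy(column) <= max_entropy
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Toy alignment: the last column differs in every taxon.
toy = {
    "taxon1": "MKLA",
    "taxon2": "MKIC",
    "taxon3": "MRLD",
    "taxon4": "MKLE",
}
filtered = drop_variable_sites(toy, max_entropy=1.0)
for name, seq in filtered.items():
    print(name, seq)
```

The filtered alignment would then go back into tree building. The trade-off is that the most variable sites also carry signal, so the threshold is a judgment call.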
This paper also uses the Pneumocystis genome sequence, which helps resolve its placement and should eventually help in understanding the evolution of this pathogen. In this tree the sister group to Pneumocystis is Schizosaccharomyces, but both lineages have very long branches. The Saitoella lineage is basal in this paper, which differs from what was found with a 4-gene (AFTOL) dataset (see Figure 2). Further work sampling more genes from these Taphrina lineages will likely help resolve the intra-clade relationships.
Y. Liu, J. W. Leigh, H. Brinkmann, M. T. Cushion, N. Rodriguez-Ezpeleta, H. Philippe, B. F. Lang (2008). Phylogenomic Analyses Support the Monophyly of Taphrinomycotina, including Schizosaccharomyces Fission Yeasts. Molecular Biology and Evolution, 26 (1), 27-34. doi:10.1093/molbev/msn221