Post-doctoral Position in Fungal Phylogenetics
A Post-Doctoral position in fungal phylogenetics is available in the Hibbett laboratory at Clark University (http://www.clarku.edu/faculty/dhibbett/). The Post-doc will participate in a large collaborative endeavor supported by the NSF AVATOL Program that is aimed at synthesizing a comprehensive tree of life from published analyses, and developing novel tools for community-driven annotation of the tree. Specific responsibilities will include (1) assembly and integration of phylogenetic datasets and trees representing all groups of Fungi; (2) coordination with a multi-laboratory team including software developers and systematists to develop and test new methods for tree integration and annotation; (3) outreach to the fungal systematics community; (4) contribution to a distributed web-based undergraduate course on assembling the tree of life, and co-instruction of a linked undergraduate course at Clark University.
The ideal candidate will be a productive researcher with interests in fungal systematics and the construction and interpretation of large-scale phylogenetic trees, will have excellent communication and interpersonal skills, and will seek a career involving both research and education. Candidates lacking background in fungal systematics, but with strong qualifications in phylogenetics, and excellent potential as educators, may be considered.
It is anticipated that the position will be available beginning May 1, 2012. Up to three years of support is possible, depending on progress. Funding is contingent on final NSF approval.
To apply, e-mail a curriculum vitae, statement of research interests and career goals, PDFs of major publications, and names and e-mail addresses for three references. Applications from women and members of underrepresented groups in science are encouraged.
Clark University is an EEO/AA Employer.
David S. Hibbett
Worcester, MA 01610
There are several databases that include orthology predictions for fungi. These all have pros and cons. Some are more comprehensive and cover many more species. Some offer curated orthology and paralogy assignments, which should be pretty stable. Others are automated, so groupings and ortholog group IDs change at each iteration.
- A phylogenetic approach from a Saccharomyces perspective is at PhylomeDB.
- Fungal Orthogroups is based on the Synergy algorithm from I. Wapinski, formerly of the Regev group at the Broad Institute.
- Yeast gene order browser (YGOB) for Saccharomyces spp and CGOB for Candida spp.
- OrthoMCL database based on whole genomes, not a ton of fungi but useful starting set.
- Ensembl Genomes provides ortholog prediction as part of the Compara pipeline though there is a limited phylogenetic diversity in the current Ensembl Fungal genomes.
- TreeFam has Saccharomyces cerevisiae and Schizosaccharomyces pombe as the two fungi included in the curated ortholog assignments and phylogenies.
- SIMAP provides pre-computed similarities among all proteins in UniProt.
- InParanoid provides pretty comprehensive coverage of the available 100 whole genomes, including many fungal genomes which I tried to help select.
- JGI’s Mycocosm attempts to provide a fungal-focused paralog/gene family look at clusters of genes based on whole genomes.
- E-Fungi is also an attempt at automated clustering with some fancy webservices logic.
- Fungal Transcription Factor database focused just on families of transcription factors.
Some of these tools are better than others in terms of providing downloadable tables. Another problem is which identifiers are used. Many biologists use gene names or locus identifiers, not UniProt/GenPept IDs, to identify genes or proteins of interest. So tools that just cluster UniProt data aren’t as useful as those which refer to the gene or locus names. Also, providing a way to download all the data from a comparison is important for further mining and grouping of the data or cross-referencing local datasets. Plugging in gene IDs one by one is not really a tool that respects the idea that your user wants to ask sophisticated queries.
Also, beware that some approaches produce strictly pairwise comparison lists whereas others find orthologous groupings. So if you want to find the Rad59 ortholog from all fungi it may be easier or harder depending on the source.
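To make the pairwise-versus-group distinction concrete, here is a minimal sketch of turning a downloaded table of pairwise ortholog calls into groups by merging any genes that are linked, directly or indirectly. The gene labels (`spA|g1` and so on) and the function name `group_orthologs` are made up for illustration and do not come from any of the databases above. Note that this simple single-linkage merging can lump paralogs together; tools such as OrthoMCL use a more careful clustering step.

```python
from collections import defaultdict


def group_orthologs(pairs):
    """Merge pairwise ortholog calls into groups (connected components)."""
    parent = {}

    def find(gene):
        # Register unseen genes as their own root, then walk up to the root.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two components

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    # Sort members and groups so the output is deterministic.
    return sorted((sorted(members) for members in groups.values()),
                  key=lambda members: members[0])


# Hypothetical pairwise calls between three species (labels are invented).
pairs = [
    ("spA|g1", "spB|g7"),
    ("spB|g7", "spC|g3"),
    ("spA|g2", "spC|g9"),
]
for group in group_orthologs(pairs):
    print("\t".join(group))
```

With a table like this in hand, finding one gene's orthologs across all species is a single lookup rather than a chain of pairwise queries.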
[I may make this a static page in the future to allow for more detailed updating since I know the available resources wax and wane]
Another result from the analysis of the recently published genome of the pea aphid, Acyrthosiphon pisum. Nancy Moran and Tyler Jarvik present a study of the origin of the carotenoid production gene in pea aphid. Animals typically cannot make carotenoids so they sought to discover how this is possible. They find that it is derived from a horizontal gene transfer event of a fungal gene into the aphid lineage. This gene is responsible for the red-green color polymorphism in the aphid. It appears the gene is derived from a ‘zygomycete’ or relative in the early branching lineage of the fungi. One gene, a carotenoid desaturase, is encoded in a 30kb genomic region that is missing in green aphids but present in the red morphs. The region is apparently maintained in the population by frequency dependent selection since each color has an advantage or disadvantage for evading detection by predators in different environments.
Reports of eukaryotic HGT events from fungi to animals are quite rare, so this finding is surprising in that sense, but the authors argue that the important ecological role of carotenoids suggests we might see even more examples if we look harder.
Moran, N., & Jarvik, T. (2010). Lateral Transfer of Genes from Fungi Underlies Carotenoid Production in Aphids Science, 328 (5978), 624-627 DOI: 10.1126/science.1187113
Gene sequences evolve at different rates due to different constraints, including chromosome position, functional constraint, and status as a single-copy or multi-copy gene. In a recent paper from Allen Rodrigo (the new NESCent director, by the way, congrats!), the authors hypothesize that correlation in the branch lengths of gene trees suggests the genes operate in the same pathway or have a similar function. To test this they took alignments of orthologous genes from 10 bacterial species, seeded with E. coli as the target species. The alignments were used to build trees with MrBayes, and only those that recovered the known species topology were retained. The ortholog groups were assigned GO terms via similarities.
They then looked at the branch lengths of gene trees and found a correlation between GO categories and rates of gene evolution/shape of the tree. I’ll not go into more details here, but I think this is an interesting finding that is probably not so surprising when you think about it. I’d be very curious to see whether this holds up in multi-domain proteins as well, and of course taking this approach for a drive in fungal orthologs would be an interesting project for someone to try.
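As a rough illustration of the idea (not the statistical method used in the paper), one can compare two gene trees that share a topology by correlating the lengths of their corresponding branches. The sketch below uses a plain Pearson correlation; the branch labels, the lengths, and the function names are all invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def branch_length_correlation(tree_a, tree_b):
    """Correlate lengths of branches shared by two same-topology gene trees.

    Each tree is a dict mapping a branch label to its length.
    """
    shared = sorted(set(tree_a) & set(tree_b))
    return pearson([tree_a[k] for k in shared], [tree_b[k] for k in shared])


# Invented branch lengths (substitutions per site) for three gene trees.
gene_a = {"b1": 0.10, "b2": 0.20, "b3": 0.05, "b4": 0.40}
gene_b = {"b1": 0.20, "b2": 0.40, "b3": 0.10, "b4": 0.80}  # twice as fast
gene_c = {"b1": 0.40, "b2": 0.05, "b3": 0.20, "b4": 0.10}  # different pattern

print(branch_length_correlation(gene_a, gene_b))
print(branch_length_correlation(gene_a, gene_c))
```

Because Pearson correlation ignores overall scale, a gene that evolves uniformly faster (like `gene_b` here) still correlates strongly with its slower partner; only a different pattern of fast and slow branches lowers the score.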
Li WL, & Rodrigo AG (2009). Covariation of branch lengths in phylogenies of functionally related genes. PloS one, 4 (12) e8487. PMID: 20041191. doi:10.1371/journal.pone.0008487
A recent paper in MBE presents evidence that the Taphrinomycotina (containing S. pombe and Pneumocystis) are in fact a monophyletic group. This group is considered an early branch in the Ascomycota, alongside the Pezizomycotina (filamentous ascomycete fungi like Neurospora and Aspergillus) and Saccharomycotina (fungi mainly with yeast forms including Candida and Saccharomyces). The monophyly of the Taphrinomycotina has been fairly well accepted, but a few publications report conflicting evidence in some sets of gene trees. This conflict is most likely due to Long Branch Attraction (LBA). The Philippe lab has long worked on the problem of LBA, developing tools like PhyloBayes that attempt to correct for it with a parameter-rich model and lots of data (like whole genomes). These authors are employing phylogenomics in the sense that multiple genes are used to reconstruct the phylogeny. This use is different from the J. Eisen/Sjölander sense, which is to infer gene function from a phylogeny.
This paper presents evidence using proteins of 113 mitochondrial and nuclear genes and finds strong statistical support for this monophyly. The authors also note that it was necessary to remove fast-evolving sites from a dataset of only mitochondrial genes in order to overcome the LBA artifacts that led to a Saccharomyces and S. pombe sister relationship in previous analyses.
This paper also presents work using the Pneumocystis genome sequence, which helps resolve its placement and should eventually aid in understanding the evolution of this pathogen. In this tree the sister group to Pneumocystis is Schizosaccharomyces, but both lineages have very long branches. The Saitoella lineage is basal in this paper, which is different from what was found with a 4-gene (AFTOL) dataset (see Figure 2). Further work sampling more genes from these Taphrina lineages will likely help resolve the intra-clade relationships.
Y. Liu, J. W. Leigh, H. Brinkmann, M. T. Cushion, N. Rodriguez-Ezpeleta, H. Philippe, B. F. Lang (2008). Phylogenomic Analyses Support the Monophyly of Taphrinomycotina, including Schizosaccharomyces Fission Yeasts Molecular Biology and Evolution, 26 (1), 27-34 DOI: 10.1093/molbev/msn221
Estimating divergence times is notoriously difficult, and the field can be downright rancorous, with some being accused of reading tea leaves and chicken entrails. It makes interesting reading for the personalities as much as for the different scientific approaches. There are several different approaches to estimating a divergence time among species, using calibration points usually anchored by fossil data. Molecular clock methods sometimes have the problem of producing extremely old dates that are quite hotly debated. In fungi we have very few fossils (and their placement on the phylogeny is debated).
There are quite a few available methods for reconstructing divergence times, including r8s and multidivtime, which start with various types of trees and use calibration time points that are typically informed by fossil dates. The simplest approaches assume a molecular clock (rates are the same across the tree), and then one only needs to calibrate the number of substitutions (or rate, really) to time to determine how phylogenetic tree branch lengths map to time. The BEAST package also does phylogenetic inference and divergence time estimation (and provided the necessary analysis for exoneration of the Tripoli Six) across a sample of trees. BEAST (and MrBayes) use MCMC to sample the space of parameters and tree space to identify phylogenies and evolutionary parameters that explain the data (an alignment of sequences).
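The strict-clock arithmetic is simple enough to sketch. Under a single rate, a node's distance to the tips (in substitutions per site) divided by its known age gives the rate, and any other node's age is its tip distance divided by that rate. The numbers and function names below are invented for illustration; real analyses with r8s, multidivtime, or BEAST relax the clock and model the uncertainty in the calibration.

```python
def clock_rate(node_to_tip_distance, calibration_age_myr):
    """Substitutions per site per million years under a strict clock."""
    if calibration_age_myr <= 0:
        raise ValueError("calibration age must be positive")
    return node_to_tip_distance / calibration_age_myr


def node_age(node_to_tip_distance, rate):
    """Age of a node (Myr) from its distance to the tips and the clock rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return node_to_tip_distance / rate


# Hypothetical calibration: a node 0.20 substitutions/site above the tips
# is dated to 400 Myr by a fossil.
rate = clock_rate(0.20, 400.0)

# An uncalibrated node sits 0.05 substitutions/site above the tips.
print(node_age(0.05, rate))  # roughly 100 Myr
```

The whole estimate hangs on the single calibration and on the assumption of one rate, which is why poorly placed fossils or rate variation among lineages can push dates far off.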
A paper from Akerborg and colleagues introduces a new approach that applies a few twists: it uses a birth-death model that doesn’t assume a molecular clock and employs a hill-climbing algorithm instead of MCMC to find parameter optima. They use a maximum a posteriori (MAP) framework, which is more computationally efficient than MCMC. They couple the MAP approach with a dynamic-programming approach that separates the estimation of rates (branch length) from the estimation of times (which often requires the assumption of a molecular clock). While I can’t speak with much authority on the MAP approach, or yet on how well it compares on different datasets, it suggests a different method to tackle these problems. The authors point out that one drawback of their approach is that it only allows for derivation of point estimates, so statistical confidences like bootstrap support are not easily calculated. Their software, called PRIME, is available here and I will be curious to see how it performs in other people’s hands.
Akerborg, O., Sennblad, B., Lagergren, J. (2008). Birth-death prior on phylogeny and speed dating. BMC Evolutionary Biology, 8(1), 77. DOI: 10.1186/1471-2148-8-77
What delineates species boundaries in fungi? Much work has been done on biological and phylogenetic species concepts in fungi. Some concepts are reviewed in Taylor et al 2006 and in Taylor et al 2000, and applications can be seen in several pathogens such as Paracoccidioides and Coccidioides, and in the model filamentous (non-pathogenic) fungus Neurospora.
A paper in Fungal Genetics and Biology on species definitions in Cryptococcus neoformans from multi-locus sequencing seeks to provide additional treatment of the observed diversity. In this large study, 117 Cryptococcus isolates were examined through multi-locus sequencing (6 loci). The authors identified two monophyletic lineages within C. neoformans that correspond to var. neoformans and var. grubii. Within the C. gattii samples, however, they identified four monophyletic groups, consistent with the deep divergences observed in whole genome trees for two strains of C. gattii and in MLST and AFLP studies. With species defined first, we can now test whether any of the species groups have different traits, including prevalence in clinical settings and in nature.
BOVERS, M., HAGEN, F., KURAMAE, E., BOEKHOUT, T. (2007). Six monophyletic lineages identified within Cryptococcus neoformans and Cryptococcus gattii by multi-locus sequence typing. Fungal Genetics and Biology DOI: 10.1016/j.fgb.2007.12.004
WrightFisher talks about a paper and the commentary in Science describing how alignment uncertainty should be taken into account when doing phylogenetic analyses on genomic datasets (some might call this phylogenomics, but Dr Eisen won’t). If the sequence alignment is treated as a random variable (and, in Bayesian approaches, given a prior based on results from an alignment program), then a more accurate reconstruction is possible. Robin points out several statistical alignment approaches that do just this, including TKF91 and recent work that unifies a probabilistic framework with transducers.
The Willi Hennig Society, homebase for all good cladists, has subsidized the license fee for TNT so that it is now a freely available program (although it is not open-source). TNT implements phylogenetic analysis under parsimony with a fast tree searching algorithm. I believe TNT was one of the software tools that CIPRES was targeting for optimization as well so this may reflect some of that work.