Univ of Oregon Faculty position in Genomics, Bioinformatics, Statistical Genetics

UNIVERSITY OF OREGON

Faculty Positions in Genomics, Bioinformatics, Statistical Genetics

The Departments of Biology (http://biology.uoregon.edu) and Mathematics (http://math.uoregon.edu) at the University of Oregon announce a cluster hire of up to three tenure-related faculty positions in Fall 2013. One of these positions may be at the level of Associate or Full Professor with indefinite tenure. These hires are part of an integrated effort to strengthen research and scholarship at the nexus of statistics/mathematics and biology at the University of Oregon, and will serve as a catalyst for future growth in this area. We are broadly interested in recruiting candidates working in areas developing statistical methodology related to the life sciences. Examples of these areas include, but are not limited to, statistical analysis of large data sets, algorithms for analyzing sequence data, and stochastic models for neuroscience, population genomics and molecular evolution. Successful candidates will bolster our emerging strengths in biomathematics, maintain an outstanding research program that focuses on solving core problems in this area, and have a commitment to excellence in teaching. Ph.D. required. Position responsibilities include undergraduate teaching.

Interested persons should apply online to the MATHBIO SEARCH, University of Oregon at https://www.mathjobs.org/jobs/jobs/4035. Applicants should submit a cover letter, a curriculum vitae including a publication list, a statement of research accomplishments and future research plans, a description of teaching experience and philosophy, and three letters of recommendation. Ideally the research description and at least one of the letters of recommendation would include descriptions of the statistical/mathematical tools or models used in the applicant’s research. To ensure consideration, application materials should be uploaded by November 15th, 2012, but the search will remain open until the positions are filled.

Women and minorities are encouraged to apply. The University of Oregon is an Equal Opportunity/Affirmative Action Institution committed to cultural
diversity and compliance with the Americans with Disabilities Act, and supportive of the needs of dual career couples. We invite applications
from qualified candidates who share our commitment to diversity.

Summer courses for informatics and genomics

Here are a few courses to consider for the summer that cover informatics, genomics, and metagenomics analysis, focusing on next-generation sequencing. The deadlines are fast approaching, so apply soon. (There are undoubtedly more, and I’m happy to post them here if you have suggestions.)

Lichen genome projects and the power shift prompted by next-gen sequencing

Genome Technology highlights the very cool thing about next-gen sequencing: it puts the power to explore genome sequence in the hands of the researchers and doesn’t limit them to projects funded only through sequencing centers. The Genome Technology piece highlights work at Duke to sequence the genome of Cladonia grayi, a lichenized fungus, with 454 technology at Duke’s Institute for Genome Sciences and Policy through their next-gen sequencing program. This is the way of the future, where sequencing core facilities will be able to generate sequence and researchers only have to wait in the queue at their own university rather than in community sequencing project or sequencing center proposal queues.

This isn’t the only lichen being sequenced. Xanthoria parietina is also in the queue at JGI, but it has taken a while to get going because of some logistical problems getting the DNA (and any problems are amplified because it takes a long time to get new material, since lichens grow very slowly).

Giving researchers the power to do quick exploratory whole-genome sequencing with next-gen technology, and eventually to get high-quality genome sequences from it, is predicted to transform how this kind of science gets done. It means we’ll probably just sequence a mutant strain instead of trying to map the mutation; this is already happening, anecdotally, in worms and in our work in mushrooms. N.B. this is done after a mutagenized strain has been cleaned up a bit through some crosses to ensure we’re looking for one or only a few mutations, but that is part of standard genetic approaches anyway.

This fast, cheap whole-genome sequencing is also the stuff of personal genomics, but for basic research it will also mean that a first pass at exploring the gene repertoire of an organism will be a multi-week instead of a multi-year project. I just hope we’re training enough people who can efficiently extract the information from all this data with solid bioinformatics, computational, data-oriented programming, and statistical skills to support all the labs that will want to take this approach. You’ll need a life-vest to swim in the big data pool for a while until more tools are developed that can be deployed by non-experts.

A word about databases

Report concludes that a fungal genome database is of “the highest priority”.

This is the title as listed in PubMed for this article from Future Medicine about the AAM report on charting future needs and avenues of research on the fungal kingdom.

A comprehensive database for information about fungi, starting at least with systematic collections of genomic and transcript data, is highlighted as a major need. Really, any sort of new database effort should strive to be more comprehensive and include genetic and population data (alleles, strains) and information like protein-protein and protein-nucleic acid interactions (as Pedro mentioned). On top of that, it needs to be comparative, so that information from systems that serve as great models can be transferred to other fungal systems that are being studied for their role as pathogens or for their interactions in the environment.

Affordable next-gen sequencing will allow us to obtain genome and transcript sequence for basically all species or strains of interest. Researchers with no bioinformatics support in their lab will likely be able to outsource this to a company or campus core facility. But how can they easily map the collective information about genes, proteins, and pathways onto this new data? And how can it be a dynamic system that updates as new information is published and curated in other systems?

I think this has to be the future beyond setting up an SGD, CGD, etc. for every system. The individual databases are useful for a large enough community where there are curators (and funding), but we will have to move to a more modular system in the future (aspects of which are in GMOD) that can have both an individual focus on a specific species/clade and a more comprehensive view that is comparable across the kingdom. There are 100+ fungal genomes, but the community size for some of them is in the dozens of labs or fewer. How can they take advantage of the new resources without an existing infrastructure of curators? Their systems serve an important need in a research aim, but how can discoveries there make their way back into the datastream of other systems?

I see several ways one would interact with a system that provided single-genome tools as well as a framework for comparative information. At a gene level, one might be looking for all information about a specific gene, based on sequence similarity searches or starting with a cloned gene in one species. This would be something akin to Phylofacts or precomputed Orthogroups for defining a gene, but with more information about function linked in from all sources. So a comparative resource, but one also tapping into curated and literature-mined data.

At a genome level, one might want to do whole-genome comparisons of gene content, either in terms of evolutionarily defined gene families (gene family size change) or at a functional level. To start out with, each gene/protein would need a systematic functional mapping. This could be as simple as running InterProScan on every protein, then expanded to find Orthogroups (or OrthoMCL orthologs) and transfer function from model systems, and finally, at the most advanced level, refined with tools like SIFTER.
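As a rough illustration of the gene family size comparison described above, here is a minimal sketch. It assumes each gene has already been assigned to a family by some upstream tool; the genome names, gene IDs, family IDs, and function names are all hypothetical and not tied to any particular database or to the output format of InterProScan or OrthoMCL.

```python
from collections import Counter, defaultdict


def family_sizes(assignments):
    """Count genes per family in each genome.

    assignments: iterable of (genome, gene_id, family_id) tuples,
    assumed to come from an upstream orthology/family assignment step.
    Returns {family_id: Counter({genome: number_of_genes})}.
    """
    sizes = defaultdict(Counter)
    for genome, _gene_id, family_id in assignments:
        sizes[family_id][genome] += 1
    return sizes


def size_changes(sizes, reference, other):
    """Report families whose copy number differs between two genomes.

    Returns {family_id: count_in_other - count_in_reference},
    leaving out families with the same size in both genomes.
    """
    changes = {}
    for family_id, counts in sizes.items():
        # Counter returns 0 for a genome with no member in this family.
        delta = counts[other] - counts[reference]
        if delta != 0:
            changes[family_id] = delta
    return changes


# Hypothetical toy assignments for two genomes.
assignments = [
    ("genomeA", "A_0001", "FAM1"),
    ("genomeA", "A_0002", "FAM1"),
    ("genomeA", "A_0003", "FAM2"),
    ("genomeB", "B_0001", "FAM1"),
    ("genomeB", "B_0002", "FAM2"),
    ("genomeB", "B_0003", "FAM3"),
]

sizes = family_sizes(assignments)
changes = size_changes(sizes, reference="genomeA", other="genomeB")
print(changes)
```

In this toy data FAM1 is smaller in genomeB, FAM3 is present only in genomeB, and FAM2 is unchanged, so it is left out of the report. A real comparison would also need a phylogeny to say whether a difference is a gain or a loss.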

Interlinked with these orthologous and paralogous gene sets would be anchors for analyses of chromosomal synteny and even comparative assembly, including tools like Mercator. Certainly tools for all of this exist, but making them more pluggable for different sets of species would be an important additional component.

At a utility level, the gene annotation and functional mapping of all this information should be possible. I would imagine a researcher could upload the sequence assembly they received from the core facility and the system could generate multiple gene predictions, annotate the genes, and link these genes within the known orthogroups of the system (keeping these genes private if desired). Presumably this sort of thing would be easier as a standalone in-house tool for the researcher, but web services could also be the place for this.

For fungal-sized genomes this amount of data is not too extreme. Things like Genome Browser, BLAST, etc. should all be available out of the box based on the basic builds.

On the DIY and community annotation front, there would also need to be a layer of community-derived annotation that could be layered on all these systems. I would imagine this to be both for gene structure annotation (genome annotation) and functional annotation (protein X does Y based on experiment Z, here is the journal reference). I think aspects of this would be visible and auditable (tracked), but maybe not blessed as official until a curator could oversee these inputs. In my mind, whether this is in a Wiki per se or just a new system that allows community input is less important than having it be a) structured (not a bunch of free text), b) tracked and versionable, and c) easy for researchers to input, so that the knowledge is captured even if it has to be reorganized later on.
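To make the "structured, tracked, versionable" idea a little more concrete, here is a minimal sketch of what such a record and its history might look like. This is not how GMOD or any existing wiki stores annotations; the class names, field names, and example values are all made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Annotation:
    """One structured community annotation (hypothetical schema)."""
    gene_id: str
    function: str      # "protein X does Y"
    evidence: str      # "based on experiment Z"
    reference: str     # journal reference
    submitter: str
    version: int = 1
    curated: bool = False  # not "blessed" until a curator approves it


class AnnotationLog:
    """Append-only history of annotations, keyed by gene."""

    def __init__(self):
        self._history = {}

    def submit(self, gene_id, function, evidence, reference, submitter):
        # Every submission becomes a new version; nothing is overwritten.
        versions = self._history.setdefault(gene_id, [])
        record = Annotation(gene_id, function, evidence, reference,
                            submitter, version=len(versions) + 1)
        versions.append(record)
        return record

    def approve(self, gene_id, version):
        # A curator marks one specific version as official.
        versions = self._history[gene_id]
        versions[version - 1] = replace(versions[version - 1], curated=True)

    def current(self, gene_id, curated_only=False):
        # Latest version, optionally restricted to curator-approved ones.
        versions = self._history.get(gene_id, [])
        if curated_only:
            versions = [v for v in versions if v.curated]
        return versions[-1] if versions else None


log = AnnotationLog()
log.submit("GENE_0001", "putative kinase", "sequence similarity",
           "hypothetical reference 1", "lab_a")
log.submit("GENE_0001", "protein kinase", "in vitro assay",
           "hypothetical reference 2", "lab_b")
log.approve("GENE_0001", version=1)

latest = log.current("GENE_0001")
official = log.current("GENE_0001", curated_only=True)
```

Here the community view (`latest`) shows the newest submission while the official view (`official`) still shows the last version a curator approved, which is one way to keep input easy without losing the audit trail.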

Seems like a lot of work to be done, but really many of these things already exist through what the GMOD project has built. There are many loose ends, and software that doesn’t fully meet these needs, but I think the important concept is that these are all general solutions that will be of benefit to most communities, not just the fungal ones. One lingering question I always have when approaching genomic data that will be dynamic: what, if any, of this makes its way into GenBank? How is this sort of thing banked so that it can be captured, and does the improved functional or gene structure annotation ever make its way into the repository databases to correct and improve what has already been submitted there?

Chlamy genome investigations

This month’s Genetics has a series of articles exploring the genome (published last year and freely available at Science) of the green alga Chlamydomonas reinhardtii. These manuscripts are primarily genome analyses, making for a very bioinformatics-focused issue of Genetics. Some of the highlights include:

S.pombe telomerase RNA identified

Webb, C.J., Zakian, V.A. (2008). Identification and characterization of the Schizosaccharomyces pombe TER1 telomerase RNA. Nature Structural & Molecular Biology, 15(1), 34-42. DOI: 10.1038/nsmb1354

Leonardi, J., Box, J.A., Bunch, J.T., Baumann, P. (2008). TER1, the RNA subunit of fission yeast telomerase. Nature Structural & Molecular Biology, 15(1), 26-33. DOI: 10.1038/nsmb1343

Two papers in Nature Structural & Molecular Biology identify the telomerase RNA in Schizosaccharomyces pombe. Telomerase is a multi-subunit enzyme that has both protein and RNA components. While the protein subunit is highly conserved and identifiable through sequence comparisons of eukaryotes, the RNA subunit has a variable size and sequence, making identification through comparative means more difficult. The S. pombe telomerase RNA subunit, or TER1, was discovered by two labs applying similar biochemical approaches to identify the locus.


Yes, Ecology can improve Genomics

Few organisms are as well understood at the genetic level as Saccharomyces cerevisiae. Given that there are more yeast geneticists than yeast genes and exemplary resources for the community (largely a result of their size), this comes as no surprise. What is curious is the large number of yeast genes that we’ve been unable to characterize. Of the ~6000 genes currently identified in the yeast genome, 1253 have no verified function (for the uninclined, this is roughly 21% of the yeast proteome). Egads! If we can’t figure this out in yeast, what hope do we have in non-model organisms? Lourdes Peña-Castillo and Timothy R. Hughes discuss this curious observation and its cause in their report in Genetics.


Evolution of aflatoxin gene cluster


Ignazio Carbone and colleagues published a recent analysis of the evolution of the aflatoxin gene cluster in five Aspergillus fungi entitled “Gene duplication, modularity and adaptation in the evolution of the aflatoxin gene cluster” in BMC Evolutionary Biology. The authors were able to identify seven modules of gene pairs whose histories of duplication were highly correlated. Several genomes of Aspergillus have been sequenced along with more Eurotiales fungi.

Orthology detection software

A paper in PLoS One, Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes, reports a new approach to assess the performance of automated orthology detection. These authors also wrote OrthoMCL (2006 DB paper, 2003 algorithm paper), which uses MCL to build orthologous gene families. The authors discuss the trade-offs between highly sensitive, specific tree-based methods and fast but less sensitive approaches such as best reciprocal hits from BLAST or FASTA, or some of the hybrid approaches. The authors employ Latent Class Analysis (LCA) to aid in “evaluation and optimization of a comprehensive set of orthology detection methods, providing a guide for selecting methods and appropriate parameters”. LCA is also the statistical basis for feature choice in combining gene predictions into a single set of gene calls in GLEAN, written by many of the same authors including Aaron Mackey.
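For readers who haven’t met the best reciprocal hits idea mentioned above, here is a minimal sketch of it. This is not OrthoMCL and not the method evaluated in the paper; it assumes the similarity search has already been run in both directions and reduced to (query, subject, score) tuples, and the gene names and scores are invented. Ties are resolved in favor of the first hit seen, which a real pipeline would want to handle more carefully.

```python
def best_hits(hits):
    """Return {query: best_subject}, choosing the highest-scoring hit.

    hits: iterable of (query, subject, score) tuples, e.g. parsed from
    a tabular similarity search report (parsing is not shown here).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _score) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items()
                  if best_ba.get(b) == a)


# Hypothetical search results between genome A and genome B.
a_vs_b = [
    ("A1", "B1", 200.0),
    ("A1", "B2", 150.0),
    ("A2", "B2", 90.0),
    ("A3", "B1", 120.0),
]
b_vs_a = [
    ("B1", "A1", 198.0),
    ("B2", "A1", 140.0),
    ("B2", "A2", 95.0),
]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Only A1 and B1 pick each other in this toy data. A2 and A3 are left unpaired even though they have hits, which is the kind of lost sensitivity (for example after a duplication) that the tree-based and hybrid methods try to recover.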

I’ve been reading a lot of orthology and gene tree-species tree reconciliation papers lately; some are listed by Ian Holmes’s group, and some of the software is listed on the BioPerl site. This also follows on from our Phyloinformatics hackathon work, which we are trying to formalize in some more documentation for phyloinformatics pipelines to support some of the described use cases. I’m also applying some of this to a tutorial I’m teaching at ISMB2007 this summer.

Wikis for genome (re)annotation

Steven Salzberg (who is nominated for the Franklin award at bioinformatics.org) has an opinion piece in Genome Biology proposing wiki technology to help solve the problem of genome annotations getting out of date.