
A cacophony of comparative genomics papers

A nice series of comparative genomics articles has been published in the last few weeks. The pace of genome sequencing has accelerated to the point that many sequencing projects now come from individual labs and small consortia, not only from genome centers. We are seeing a preview of what next-generation (2nd-generation) sequencing will enable, and can start to imagine what will happen when even cheaper 3rd-generation sequencing technologies are applied. I’m behind in reviewing these papers for you, dear reader, but I hope you’ll click through and take a look at some of them if you are interested in the topics.

The papers below include some nice examples of comparative genomics, both of closely related species and across a clade of species. They include our work on the human pathogens Coccidioides and Histoplasma (Sharpton et al), studied at several evolutionary distances; a study of the protoploid Saccharomycetaceae clade of yeast species (Souciet et al); and a comparison of two Candida species (Jackson et al): the commensal and opportunistic fungal pathogen Candida albicans and its very close relative Candida dubliniensis. There is also a nice comparison of Saccharomyces cerevisiae strains that looks at the effects of domestication and at examples of horizontal gene transfer (Novo et al).

There is also a report of de novo sequencing of a filamentous fungus that combines several approaches: traditional Sanger sequencing, 454, and Illumina/Solexa (DiGuistini et al).

Finally, a paper from a few months ago (Ma et al) gives a fantastic look at one of the early branches in the fungal tree, the Mucorales (formerly classified in the Zygomycota), via the genome of Rhizopus oryzae. This paper is an excellent example of what we can learn about a group of species by contrasting genomic features of the early branches in the tree with those of the better-studied Ascomycete and Basidiomycete fungi. More genome sequences will help us build on these findings and clarify whether some of the observations are unique to this lineage or are universal features of the earliest fungi.

I hope you enjoy!

Novo, M., Bigey, F., Beyne, E., Galeote, V., Gavory, F., Mallet, S., Cambon, B., Legras, J., Wincker, P., Casaregola, S., & Dequin, S. (2009). Eukaryote-to-eukaryote gene transfer events revealed by the genome sequence of the wine yeast Saccharomyces cerevisiae EC1118. Proceedings of the National Academy of Sciences DOI: 10.1073/pnas.0904673106 (via J Heitman)

Jackson, A., Gamble, J., Yeomans, T., Moran, G., Saunders, D., Harris, D., Aslett, M., Barrell, J., Butler, G., Citiulo, F., Coleman, D., de Groot, P., Goodwin, T., Quail, M., McQuillan, J., Munro, C., Pain, A., Poulter, R., Rajandream, M., Renauld, H., Spiering, M., Tivey, A., Gow, N., Barrell, B., Sullivan, D., & Berriman, M. (2009). Comparative genomics of the fungal pathogens Candida dubliniensis and C. albicans. Genome Research DOI: 10.1101/gr.097501.109

DiGuistini, S., Liao, N., Platt, D., Robertson, G., Seidel, M., Chan, S., Docking, T., Birol, I., Holt, R., Hirst, M., Mardis, E., Marra, M., Hamelin, R., Bohlmann, J., Breuil, C., & Jones, S. (2009). De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data. Genome Biology, 10 (9) DOI: 10.1186/gb-2009-10-9-r94 (open access)

Sharpton, T., Stajich, J., Rounsley, S., Gardner, M., Wortman, J., Jordar, V., Maiti, R., Kodira, C., Neafsey, D., Zeng, Q., Hung, C., McMahan, C., Muszewska, A., Grynberg, M., Mandel, M., Kellner, E., Barker, B., Galgiani, J., Orbach, M., Kirkland, T., Cole, G., Henn, M., Birren, B., & Taylor, J. (2009). Comparative genomic analyses of the human fungal pathogens Coccidioides and their relatives. Genome Research DOI: 10.1101/gr.087551.108 (open access)

Souciet, J., Dujon, B., Gaillardin, C., Johnston, M., Baret, P., Cliften, P., Sherman, D., Weissenbach, J., Westhof, E., Wincker, P., Jubin, C., Poulain, J., Barbe, V., Segurens, B., Artiguenave, F., Anthouard, V., Vacherie, B., Val, M., Fulton, R., Minx, P., Wilson, R., Durrens, P., Jean, G., Marck, C., Martin, T., Nikolski, M., Rolland, T., Seret, M., Casaregola, S., Despons, L., Fairhead, C., Fischer, G., Lafontaine, I., Leh, V., Lemaire, M., de Montigny, J., Neuveglise, C., Thierry, A., Blanc-Lenfle, I., Bleykasten, C., Diffels, J., Fritsch, E., Frangeul, L., Goeffon, A., Jauniaux, N., Kachouri-Lafond, R., Payen, C., Potier, S., Pribylova, L., Ozanne, C., Richard, G., Sacerdot, C., Straub, M., & Talla, E. (2009). Comparative genomics of protoploid Saccharomycetaceae. Genome Research DOI: 10.1101/gr.091546.109 (open access)

Ma, L., Ibrahim, A., Skory, C., Grabherr, M., Burger, G., Butler, M., Elias, M., Idnurm, A., Lang, B., Sone, T., Abe, A., Calvo, S., Corrochano, L., Engels, R., Fu, J., Hansberg, W., Kim, J., Kodira, C., Koehrsen, M., Liu, B., Miranda-Saavedra, D., O’Leary, S., Ortiz-Castellanos, L., Poulter, R., Rodriguez-Romero, J., Ruiz-Herrera, J., Shen, Y., Zeng, Q., Galagan, J., Birren, B., Cuomo, C., & Wickes, B. (2009). Genomic Analysis of the Basal Lineage Fungus Rhizopus oryzae Reveals a Whole-Genome Duplication. PLoS Genetics, 5 (7) DOI: 10.1371/journal.pgen.1000549 (open access)

Yeast population genomics
I have been cheering on the Sanger-Wellcome SGRP group’s work to generate genome sequences for multiple Saccharomyces cerevisiae and S. paradoxus strains. The group had previously posted a version of the manuscript to Nature Precedings, and it is now published as a Nature advance online publication (AOP), showing that submitting to a preprint server doesn’t necessarily hurt your chances of getting a manuscript published. The research groups explored the impact of domestication on the Saccharomyces genome (as was also recently done for Aspergillus oryzae, the workhorse fungus of sake and soy sauce production) by comparing S. cerevisiae strains, including domesticated ones, with wild strains of its close relative S. paradoxus.

The paper tackled several challenges, including methods for light (low-coverage) genome sequencing for population genomics. In a way, these data represent a pilot project for genome resequencing and for draft genome sequencing with next-generation sequencing tools. Of course, with the pace of sequencing technology development, it seems any project more than a couple of months old is already using outdated technology, but this work represents some important progress. Tools like MAQ were also developed and tuned as part of the project. Beyond the methods development, the study also provides a new look at the evolutionary dynamics of a well-studied fungus.

Genome assembly
The authors apply several different quality controls and use a new tool called PALAS (Parallel ALignment and ASsembly) to assemble all the strains at the same time with a graph-based approach that uses the reference genome sequence for each species. This differs from a full-blown whole-genome assembly (WGA) approach like PCAP, Phusion or Arachne because this was deliberately a low-coverage sequencing pass. The authors also try to impute missing sequence using Ancestral Recombination Graphs, as implemented in the Margarita system. They use MAQ to align reads from Illumina/Solexa sequencing to these PALAS assemblies.

Since this project covered two species of Saccharomyces, S. cerevisiae and S. paradoxus, the authors needed good reference assemblies for each. The previously available S. paradoxus assembly wasn’t complete enough for this study, so they generated an additional 4.3X coverage with Sanger/ABI sequencing and 80X coverage with Illumina.

Population genomics and domestication

The sequencing data also provided a framework for population genetic investigations. One simple finding was that, within each species, isolates from the same geographic region were more genetically similar to each other. The main geographic regions sampled for S. paradoxus were the UK, the Americas, and the Far East; some of these strains had been analyzed in a very nice earlier study of chromosome III. The S. cerevisiae samples included individuals from around Europe (at least 10 of them European wine strains), Malaysia, sake brewing, West Africa, and North America. These data revealed several strains with mosaic genomes: some segments of the genome match best with the sake fermentation strains and others with the wine/European strains.
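The idea of a mosaic genome can be illustrated with a simple windowed best-match classification. This is a hypothetical sketch, not the method used in the paper: it assumes the query strain and one representative sequence per population (the names "sake" and "wine" are purely illustrative) are already aligned to common coordinates, and it uses a plain mismatch count as the distance.

```python
from __future__ import annotations


def mismatches(a: str, b: str) -> int:
    """Count differing positions between two aligned strings, skipping 'N' calls."""
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def assign_windows(query: str, refs: dict[str, str], window: int) -> list[str]:
    """Label each non-overlapping window of `query` with its closest reference.

    Assumes every sequence is aligned to the same coordinates. Ties go to the
    reference that appears first in `refs`.
    """
    labels = []
    for start in range(0, len(query), window):
        q = query[start:start + window]
        best = min(refs, key=lambda name: mismatches(q, refs[name][start:start + window]))
        labels.append(best)
    return labels


def merge_segments(labels: list[str], window: int, seq_len: int) -> list[tuple[int, int, str]]:
    """Collapse runs of identical window labels into (start, end, label) segments."""
    segments: list[tuple[int, int, str]] = []
    for i, label in enumerate(labels):
        end = min((i + 1) * window, seq_len)
        if segments and segments[-1][2] == label:
            segments[-1] = (segments[-1][0], end, label)
        else:
            segments.append((i * window, end, label))
    return segments
```

For example, `assign_windows("ACGTGGGG", {"sake": "ACGTACGT", "wine": "TTTTGGGG"}, 4)` returns `["sake", "wine"]`: the first half of this toy query matches the "sake" sequence best and the second half the "wine" sequence. Real analyses would use many strains per population and a proper statistical model rather than a single nearest match.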

The authors took two different approaches to detecting effects of natural selection that may be linked to domestication of these strains. The McDonald-Kreitman test did not identify any loci under positive selection, while Tajima’s D was negative in the S. cerevisiae global and wine strain populations, indicating an excess of rare polymorphisms such as singletons, though the authors draw few conclusions from that. They also observed a sharper decay of linkage disequilibrium in S. cerevisiae (half maximum at 3kb) than in S. paradoxus (half maximum at 9kb), suggesting that S. cerevisiae recombines more, either because it has more opportunities to outcross or because recombination events are more frequent when it does.
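Tajima’s D compares two estimates of the population mutation rate: the mean number of pairwise differences (π) and Watterson’s estimator based on the number of segregating sites (S / a1). When rare variants such as singletons are in excess, S rises relative to π and D goes negative. The sketch below implements the standard formula from Tajima (1989) on toy data; it is an illustration, not the pipeline used in the paper, and it assumes gap-free, equal-length aligned sequences.

```python
from __future__ import annotations

import math
from itertools import combinations


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D for equal-length, gap-free aligned sequences (n >= 4)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites
    s = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of pairwise differences across all sequence pairs
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two estimators of theta, scaled by its std. dev.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

With five sequences, a site where only one strain carries the variant (a singleton, e.g. `["A", "A", "A", "A", "T"]`) gives a negative D, while a site at intermediate frequency (`["A", "A", "A", "T", "T"]`) gives a positive D.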

In the context of the paper title and the idea of exploring the effects of domestication on the genome, the authors observe that the standard paradigm, that ‘domesticated’ species have lower levels of diversity, simply does not hold in these samples. This isn’t to say there is no evidence of selection for fermentation traits in these strains, based on the stress-response conditions they were tested under, but rather that there is still ample evidence that diversity is maintained within the populations, presumably through varying amounts of outcrossing.

We are also interested in these results because we are asking similar questions about the population genomics of the human pathogenic fungus Coccidioides, for which 14 strains have been sequenced with Sanger sequencing technology. I hope some of these lessons will carry over to our analyses, and that this era of population genomics will see ever more extensive collections to address migration, phylogeography, and local adaptation within populations of fungi and other microbes.

Gianni Liti, David M. Carter, Alan M. Moses, Jonas Warringer, Leopold Parts, Stephen A. James, Robert P. Davey, Ian N. Roberts, Austin Burt, Vassiliki Koufopanou, Isheng J. Tsai, Casey M. Bergman, Douda Bensasson, Michael J. T. O’Kelly, Alexander van Oudenaarden, David B. H. Barton, Elizabeth Bailes, Alex N. Nguyen, Matthew Jones, Michael A. Quail, Ian Goodhead, Sarah Sims, Frances Smith, Anders Blomberg, Richard Durbin, Edward J. Louis (2009). Population genomics of domestic and wild yeasts. Nature DOI: 10.1038/nature07743

Lichen genome projects and the power shift prompted by next-gen sequencing

Genome Technology highlights the very cool thing about next-gen sequencing: it puts the power to explore genome sequence in the hands of researchers and doesn’t limit them to projects funded through sequencing centers. The piece describes work at Duke to sequence the genome of Cladonia grayi, a lichenized fungus, with 454 technology through the next-gen sequencing program at Duke’s Institute for Genome Sciences and Policy. This is the way of the future: sequencing core facilities will generate the sequence, and researchers will only have to wait in the queue at their own university rather than in community sequencing project or sequencing center proposal queues.

This isn’t the only lichen being sequenced. Xanthoria parietina is also in the queue at JGI, but it has taken a while to get going because of some logistical problems obtaining the DNA (and any problem is amplified because lichens grow very slowly, so it takes a long time to get new material).

Giving researchers the power to do quick exploratory whole-genome sequencing with next-gen technologies, and eventually to produce high-quality genome sequences with them, is predicted to transform how this kind of science gets done. It means we’ll probably just sequence a mutant strain instead of trying to map the mutation; anecdotally, this is already happening in worms and in our own work in mushrooms. N.B. this is done after a mutagenized strain has been cleaned up a bit through some crosses to ensure we’re looking for only one or a few mutations, but that is part of standard genetic approaches anyway.

This fast, cheap whole-genome sequencing is also the stuff of personal genomics, but for basic research it will also mean that a first pass at exploring the gene repertoire of an organism will be a multi-week rather than a multi-year project. I just hope we’re training enough people with solid bioinformatics, computational, data-oriented programming, and statistical skills to efficiently extract the information from all this data and support all the labs that will want to take this approach. You’ll need a life vest to swim in the big data pool for a while, until more tools are developed that non-experts can use.

Fungal genome assembly from short-read sequences

This is a research blog, so I thought I’d post some quick numbers we are seeing for de novo assembly of the Neurospora crassa genome using Velvet. The N. crassa genome is about 40Mb, and we sequenced several flow cells with Solexa/Illumina technology to see what kind of de novo reconstruction we’d get. I knew this was probably insufficient for a very good assembly given what has been reported in the literature, but sometimes it is helpful to try it on local data. From the outset, this has mostly been a SNP discovery project. I ran Velvet with a hash size of 21 on an early (2 flow cell, 2FC) and a later (4FC) dataset; I chose 21 based on some calculations and on runs with different hash sizes to see which gave the best N50. The summary contig size numbers come from this command, using cndtools from Colin Dewey:

  faLen < contigs.fa | stats

2 flow cells (~10M reads @ 36bp/read, or about 10X coverage of a 40Mb genome)

            N = 199562
          SUM = 25463251
          MIN = 49
       MEDIAN = 107.0
          MAX = 5371
         MEAN = 127.59568956
          N50 = 130

4 flow cells (~20M reads @ 36bp/read, or about 20X coverage of a 40Mb genome)

            N = 102437
          SUM = 38352075
          MIN = 41
 1ST-QUARTILE = 77.0
       MEDIAN = 153
          MAX = 7189
         MEAN = 374.396702363
          N50 = 837

So that’s an N50 of 837bp; for those used to seeing N50 on the order of 1.5Mb, this is not great. But it came from 4 flow cells’ worth of sequencing, which was pretty cheap. This genome has relatively few repeats, so we should get pretty good recovery if the sequencing coverage is high enough. Using Maq, we can both scaffold the reads and recover enough high-quality SNPs for the mapping part of the project.
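For readers without cndtools, a similar contig summary can be computed in a few lines. This is a sketch, not a reimplementation of cndtools: it assumes the usual N50 definition (the contig length at which contigs of that size or longer contain at least half of all assembled bases), and its median convention may not match the `stats` output exactly.

```python
from __future__ import annotations

import statistics
from typing import Iterable


def fasta_lengths(lines: Iterable[str]) -> list[int]:
    """Return the sequence length of each FASTA record, in file order."""
    lengths: list[int] = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable")  # the loop always returns for non-empty input


def summarize(lengths: list[int]) -> dict[str, float]:
    """Summary statistics similar in spirit to `faLen | stats`."""
    return {
        "N": len(lengths),
        "SUM": sum(lengths),
        "MIN": min(lengths),
        "MEDIAN": statistics.median(lengths),
        "MAX": max(lengths),
        "MEAN": statistics.mean(lengths),
        "N50": n50(lengths),
    }
```

Usage would be along the lines of `with open("contigs.fa") as fh: print(summarize(fasta_lengths(fh)))`. Sorting longest-first and stopping once half the bases are covered is why N50 can sit well above the median, as in the 4FC numbers.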

To get a better assembly, one would need much deeper coverage, as Daniel Zerbino and Ewan Birney explain in their Velvet paper and show in Figure 4 (sorry, not open access for 6 months). Full credit: these were unpaired Illumina/Solexa genomic sequencing reads generated at the UCB/QB3 facility from libraries prepared by Charles Hall in the Glass lab.
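The arithmetic behind needing deeper coverage is easy to check. Base coverage is C = reads × read length / genome size, and the Velvet manual relates it to the k-mer coverage that Velvet works with as Ck = C × (L − k + 1) / L. The sketch below applies this to the approximate read counts quoted for the two datasets; these are back-of-the-envelope estimates, not measured values.

```python
def base_coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Average per-base coverage C = reads * read length / genome size."""
    return n_reads * read_len / genome_size


def kmer_coverage(base_cov: float, read_len: int, k: int) -> float:
    """Velvet-style k-mer coverage Ck = C * (L - k + 1) / L."""
    if not 0 < k <= read_len:
        raise ValueError("k must be between 1 and the read length")
    return base_cov * (read_len - k + 1) / read_len


if __name__ == "__main__":
    genome = 40_000_000  # ~40Mb N. crassa genome
    for label, reads in (("2FC", 10_000_000), ("4FC", 20_000_000)):
        c = base_coverage(reads, 36, genome)
        print(f"{label}: C = {c:.1f}X, Ck(k=21) = {kmer_coverage(c, 36, 21):.1f}X")
```

This gives roughly 9X and 18X base coverage, but only about 4X and 8X k-mer coverage at a hash size of 21: with 36bp reads, each read contributes just 16 k-mers of length 21, so short reads lose more than half of their coverage to the k-mer step.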