New Saccharomyces resequencing assembly

David Carter at the Sanger Centre emailed to say that new assemblies from the Saccharomyces strain resequencing project (SGRP) have been posted, including a new three-way alignment of S. bayanus, S. paradoxus, and S. cerevisiae. This updates the Dec 2007 release.

“I have uploaded a new release of the SGRP data to our FTP server:

This release, which supersedes the one made on December 4th 2007,
fixes some bugs and adds several further types of data files. The
alignments and assemblies are different, but the reads are not. I
hope that this release will be the final one in the sense that the
data in it will not change, though other files may be added in
the future, in which case I will send out another message.

A user manual for the data is available at

  1. Quality scores are now combined in a better-motivated way for both
    ABI and Solexa data, and a bug has been fixed which caused quality
    scores for many reverse-strand alignments to be misaligned with their
  2. Some nucleotides now appear as “N” in the “imputed.gz” data
    files. These are for regions which seem to have diverged significantly
    from the reference so that no safe alignments or imputations are
    possible. About 5% of each strain sequence is affected. The “sequenced.gz”
    files are not affected by this change.
  3. There is a three-way alignment between S cerevisiae, S paradoxus
    and S bayanus.
  4. There is a genome.gff file for S paradoxus, lifted over from the S
    cerevisiae one using the inter-species alignments. Please treat this
    with extreme caution; it has not been checked at all, and in
    particular, the regions marked as coding sequences contain many frame
    shifts, non-final stop codons and other problems. Thus you should take
    the feature type “CDS” to mean “orthologous to a CDS in S cerevisiae”
    rather than necessarily “a CDS in S paradoxus”.
  5. Various bad alignments in the previous version have been removed.
  6. Files have been added containing contigs created by Casey Bergman
    with PCAP from the reads for each strain; listings of every SNP
    detected; translations of every coding sequence in the genome;
    details of how every read is aligned; which reads have higher than
    expected numbers of disagreements with co-aligned reads from the
    same strain, indicating possible mapping errors or copy number
    variation; and where the recombination points are estimated to be.

For full details, see the user manual.”
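
Points 2 and 4 of the message suggest two quick sanity checks before using the data: how much of a sequence is masked as "N", and whether a lifted-over "CDS" feature still looks like an intact coding sequence. Below is a minimal Python sketch of both. The function names are hypothetical, the standard nuclear stop codons are assumed, and the functions work on plain sequence strings because the file layouts are only described in the user manual, not here.

```python
# Stop codons of the standard nuclear genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def n_fraction(seq):
    """Return the fraction of positions in seq that are 'N' (case-insensitive)."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


def cds_problems(seq):
    """Return a list of basic problems suggesting seq is not an intact CDS."""
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        # A length that is not a multiple of 3 hints at a frame shift.
        problems.append("length not a multiple of 3")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # A stop codon anywhere before the last complete codon is a non-final stop.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("non-final stop codon")
    if codons and codons[-1] not in STOP_CODONS:
        problems.append("no terminal stop codon")
    return problems


if __name__ == "__main__":
    print(n_fraction("ACGTNNNN"))
    print(cds_problems("ATGTAAAAATAA"))
```

A feature for which `cds_problems` returns a non-empty list is better read as "orthologous to a CDS in S cerevisiae" than as a confirmed S paradoxus gene, which matches the caution in the message.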
