David Carter at the Sanger Centre has announced by email that new assemblies from the Saccharomyces Genome Resequencing Project (SGRP) have been posted. The release includes a new three-way alignment of S. bayanus, S. paradoxus and S. cerevisiae, and it updates the December 2007 release.
“I have uploaded a new release of the SGRP data to our FTP server:
This release, which supersedes the one made on December 4th 2007,
fixes some bugs and adds several further types of data files. The
alignments and assemblies are different, but the reads are not. I
hope that this release will be the final one in the sense that the
data in it will not change, though other files may be added in
the future, in which case I will send out another message.
A user manual for the data is available at http://www.sanger.ac.uk/Teams/Team71/durbin/sgrp/sgrp_manual.pdf
- Quality scores are now combined in a better-motivated way for both
ABI and Solexa data. A bug has also been fixed that caused quality
scores for many reverse-strand alignments to be misaligned with their bases.
- Some nucleotides now appear as “N” in the “imputed.gz” data
files. These are for regions which seem to have diverged significantly
from the reference so that no safe alignments or imputations are
possible. About 5% of each strain sequence is affected. The “sequenced.gz”
files are not affected by this change.
- There is a three-way alignment between S cerevisiae, S paradoxus
and S bayanus.
- There is a genome.gff file for S paradoxus, lifted over from the S
cerevisiae one using the inter-species alignments. Please treat this
with extreme caution; it has not been checked at all, and in
particular, the regions marked as coding sequences contain many frame
shifts, non-final stop codons and other problems. Thus you should take
the feature type “CDS” to mean “orthologous to a CDS in S cerevisiae”
rather than necessarily “a CDS in S paradoxus”.
- Various bad alignments in the previous version have been removed.
- Files have been added containing contigs created by Casey Bergman
with PCAP from the reads for each strain; listings of every SNP
detected; translations of every coding sequence in the genome;
details of how every read is aligned; which reads have higher than
expected numbers of disagreements with co-aligned reads from the
same strain, indicating possible mapping errors or copy number
variation; and where the recombination points are estimated to be.
For full details, see the user manual.”
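The announcement does not describe how the SGRP pipeline handled read orientation. As a general illustration of the kind of reverse-strand bug it mentions, here is a minimal Python sketch. When a read is flipped to the reverse strand, its bases must be complemented and reversed, but its per-base quality scores must only be reversed, so that each score stays attached to its base. The function name `reverse_complement_read` is hypothetical and is not part of any SGRP tool.

```python
# Complement table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_read(bases: str, quals: list[int]) -> tuple[str, list[int]]:
    """Return a read and its per-base qualities as they appear on the opposite strand.

    Bases are complemented and reversed; qualities are only reversed.
    Forgetting to reverse the qualities leaves each score on the wrong base.
    """
    if len(bases) != len(quals):
        raise ValueError("each base needs exactly one quality score")
    rc_bases = bases.translate(_COMPLEMENT)[::-1]
    rc_quals = quals[::-1]
    return rc_bases, rc_quals


if __name__ == "__main__":
    # The low-quality final T (score 10) becomes the leading A after flipping.
    print(reverse_complement_read("ACGTT", [30, 31, 32, 33, 10]))
```

Running this prints `('AACGT', [10, 33, 32, 31, 30])`. The score of 10 follows its base to the front of the read.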
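The caution about the lifted-over S. paradoxus genome.gff suggests a quick screen before treating any "CDS" feature as a real coding sequence. The minimal Python sketch below assumes you have already extracted each feature's nucleotide sequence 5' to 3' on its coding strand and joined any split CDS segments for the same gene. Those extraction steps are not shown. The sketch also assumes the standard nuclear stop codons (TAA, TAG, TGA). The function name `check_cds` is hypothetical. It flags lengths that are not a multiple of three (a hint of frame shifts), stop codons before the final codon, and a missing final stop.

```python
# Standard-code stop codons; assumes nuclear genes, not mitochondrial ones.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq: str) -> dict:
    """Flag common problems in a coding sequence given 5'->3' on its coding strand."""
    seq = seq.upper()
    # Split into complete codons only; a trailing partial codon is ignored here
    # but still shows up in the length check below.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Stop codons anywhere except the last complete codon ("non-final stops").
    internal_stops = [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]
    return {
        "length_multiple_of_3": len(seq) % 3 == 0,
        "internal_stop_codons": internal_stops,  # 0-based codon indices
        "ends_with_stop": bool(codons) and codons[-1] in STOP_CODONS,
    }


if __name__ == "__main__":
    print(check_cds("ATGTAAGGGTGA"))  # premature TAA at codon index 1
    print(check_cds("ATGAAAGTAG"))    # length 10: possible frame shift
```

A feature that passes these checks is not thereby confirmed as a gene. The checks only catch the most obvious lift-over artifacts described in the release notes.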