A Catalog of Molecular Diversity of Prunus Germplasm Gathered from Aligning NGS Reads to the Peach Reference Sequence: Bioinformatic Approaches and Challenges


Genome analysis based on next-generation sequencing (NGS) technologies provides a novel approach for surveying molecular diversity among individuals, which in turn can generate tools for linkage and association mapping, gene cloning, molecular breeding, population genetics, germplasm management, and crop systematics and evolution. De novo assembly of short reads is challenging in general, and more so as genome size and complexity increase. A high-quality, well-annotated reference genome sequence can help resolve most of the conflicts. Yet the identification of several classes of structural variants, such as the movement of transposable elements, large insertions/deletions, segmental duplications, inversions, and other genomic features, is still a challenge for algorithms and automatic procedures. We sequenced 14 Prunus accessions, comprising ten peach cultivars, two wild peach-related species, one almond accession, and one apricot accession, using the Illumina NGS platform. We produced single reads 64 to 109 bp long as well as paired-end reads from fragments approximately 300-500 bp long. Coverage varied from approximately 16 to 75 genome equivalents. Individual genomes were aligned to the reference sequence of the doubled haploid peach cultivar ‘Lovell’, recently released by the International Peach Genome Initiative (IPGI) (http://www.rosaceae.org/peach/genome). In this paper we present a repertoire of molecular variants that can be mined, namely SNPs (single nucleotide polymorphisms), DIPs (deletion/insertion polymorphisms), and larger structural variations, which include movement of transposable elements, so-called copy-number variations, segmental duplications, and others. Some of these variants, such as SNPs, are easily detectable, and many commercial and open-access software packages can perform the search. Other variants, such as the large structural variations, still require analytical approaches to be implemented or improved.
For several variants, theoretical and methodological approaches are presented and discussed and, when available, preliminary results are reported.
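
As a reading aid for the coverage figures quoted above, the sketch below shows one common way to express sequencing depth in genome equivalents: total sequenced bases divided by genome size. The function name, the read count, and the genome size used here are illustrative assumptions; they are not values reported in this paper.

```python
def genome_equivalents(read_count: int, mean_read_length: float, genome_size: int) -> float:
    """Return sequencing depth as total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return read_count * mean_read_length / genome_size


# Hypothetical inputs, chosen only to illustrate the arithmetic.
ASSUMED_GENOME_SIZE = 230_000_000  # assumption: an illustrative genome size in bp
ASSUMED_READ_COUNT = 50_000_000    # assumption: number of reads for one accession
ASSUMED_READ_LENGTH = 100          # assumption: mean read length in bp

depth = genome_equivalents(ASSUMED_READ_COUNT, ASSUMED_READ_LENGTH, ASSUMED_GENOME_SIZE)
print(f"Approximate coverage: {depth:.1f} genome equivalents")
```

This estimate counts all sequenced bases; the depth actually available for variant calling is usually lower, because some reads fail to align or are filtered out.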