
Detecting polymorphic regions in Arabidopsis thaliana with resequencing microarrays.

Georg Zeller, Richard M Clark, Korbinian Schneeberger, Anja Bohlen, Detlef Weigel, Gunnar Rätsch.

Abstract

Whole-genome oligonucleotide resequencing arrays have allowed the comprehensive discovery of single nucleotide polymorphisms (SNPs) in eukaryotic genomes of moderate to large size. With this technology, the detection rate for isolated SNPs is typically high. However, it is greatly reduced when other polymorphisms are located near a SNP, as multiple mismatches inhibit hybridization to arrayed oligonucleotides. Contiguous tracts of suppressed hybridization therefore typify polymorphic regions (PRs) such as clusters of SNPs or deletions. We developed a machine learning method, designated margin-based prediction of polymorphic regions (mPPR), to predict PRs from resequencing array data. Conceptually similar to hidden Markov models, the method is trained with discriminative learning techniques related to support vector machines, and accurately identifies even very short polymorphic tracts (<10 bp). We applied this method to resequencing array data previously generated for the euchromatic genomes of 20 strains (accessions) of the best-characterized plant, Arabidopsis thaliana. Nonredundantly, 27% of the genome was included within the boundaries of PRs predicted at high specificity (approximately 97%). The resulting data set provides a fine-scale view of polymorphic sequences in A. thaliana; patterns of polymorphism not apparent in SNP data were readily detected, especially for noncoding regions. Our predictions provide a valuable resource for evolutionary genetic and functional studies in A. thaliana, and our method is applicable to similar data sets in other species. More broadly, our computational approach can be applied to other segmentation tasks related to the analysis of genomic variation.
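To illustrate the general idea of segmenting a hybridization signal into conserved and polymorphic tracts, the following is a minimal sketch of a two-state Viterbi decoder. It is not the mPPR method: mPPR learns its scoring functions with margin-based discriminative training, whereas here the per-position scores, the `threshold`, and the `switch_penalty` are hand-picked assumptions, and the function name `segment_polymorphic_regions` and the normalized toy intensities are hypothetical.

```python
from typing import List, Tuple


def segment_polymorphic_regions(
    intensities: List[float],
    threshold: float = 0.5,       # assumed cutoff between "normal" and "suppressed" signal
    switch_penalty: float = 1.0,  # assumed cost of changing state between neighbors
) -> List[Tuple[int, int]]:
    """Label each position conserved (0) or polymorphic (1) with Viterbi
    decoding and return polymorphic runs as half-open (start, end) intervals."""
    n = len(intensities)
    if n == 0:
        return []

    def emit(x: float, state: int) -> float:
        # High signal supports the conserved state, low signal the polymorphic one.
        return (x - threshold) if state == 0 else (threshold - x)

    # score[s] = best total score of any labeling that ends in state s
    score = [emit(intensities[0], 0), emit(intensities[0], 1)]
    back: List[List[int]] = []  # back[i][s] = best predecessor of state s at position i + 1

    for x in intensities[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            stay = score[s]
            switch = score[1 - s] - switch_penalty
            if stay >= switch:
                best, prev = stay, s
            else:
                best, prev = switch, 1 - s
            new_score.append(best + emit(x, s))
            pointers.append(prev)
        score = new_score
        back.append(pointers)

    # Trace back the highest-scoring state path.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()

    # Collapse consecutive polymorphic labels into intervals.
    regions: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, n))
    return regions


# Toy signal: a tract of four suppressed positions flanked by normal signal.
print(segment_polymorphic_regions([1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]))
```

In this toy setting, the switch penalty controls how short a tract can be and still be called: with the default values, a single suppressed position is absorbed into the conserved state, while a run of four is reported. A learned model would replace these fixed scores with functions fitted to labeled training data.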

Year: 2008    PMID: 18323538    PMCID: PMC2413159    DOI: 10.1101/gr.070169.107

Source DB: PubMed    Journal: Genome Res    ISSN: 1088-9051    Impact factor: 9.043

