ProSeeK: a web server for MLPA probe design.

Lorena Pantano, Lluís Armengol, Sergi Villatoro, Xavier Estivill.

Abstract

BACKGROUND: The technological evolution of platforms for detecting genome-wide copy number imbalances has allowed the discovery of an unexpected amount of human sequence that is variable in copy number among individuals. This type of human variation can make an important contribution to human diversity and disease susceptibility. Multiplex Ligation-dependent Probe Amplification (MLPA) is a targeted method to assess copy number differences for up to 40 genomic loci in a single experiment. Although specific MLPA assays can be ordered from MRC-Holland (the company that owns the MLPA technology), custom designs are also developed in many laboratories worldwide. In our experience, an important drawback of custom MLPA assays is the time spent designing the specific oligonucleotides that are used as probes. Because of the large number of probes included in a single assay, a number of restrictions need to be met in order to maximize specificity and increase the likelihood of success.
RESULTS: We have developed a web tool for facilitating and optimising custom probe design for MLPA experiments. To identify the best probes inside a given region, the algorithm only requires the target sequence in FASTA format and a set of parameters that the user provides according to each specific MLPA assay.
CONCLUSION: To our knowledge, this is the first available tool for optimizing custom probe design of MLPA assays. The ease of use and speed of the algorithm dramatically reduce the turnaround time of probe design. ProSeeK will become a useful tool for all laboratories that are currently using MLPA in their research projects for CNV studies.

Substances: DNA Probes

Year: 2008 PMID: 19040730 PMCID: PMC2625369 DOI: 10.1186/1471-2164-9-573

Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969

Background

The technological evolution of platforms for assessing genome-wide copy number imbalances [1] has allowed the discovery of an unexpected amount of human sequence involved in duplications and deletions (termed copy number variants or CNVs). In terms of sequence coverage, this is the most important type of human variation identified so far and can make an important contribution to human diversity and disease susceptibility (see [2] for review). So far, based on the study of several hundred individual genomes, ~19% of the euchromatic portion of the human genome has been reported as variable (mainly in copy number) [3]. Several studies have shown the relationship between CNVs and disease phenotypes [4,5]. Multiplex Ligation-dependent Probe Amplification (MLPA) [6] is a targeted method to assess copy-number differences for up to 40 genomic regions in a single experiment. Each MLPA probe is composed of two oligonucleotides that are only ligated, and subsequently amplified, if they are specifically hybridized to the target locus. The left probe oligonucleotide (LPO) is made of a complementary sequence of a universal forward PCR primer at its 5' end, plus the specific hybridizing sequence (LHS) at its 3' end. The right probe oligonucleotide (RPO) has the specific hybridizing sequence (RHS) at its 5' end, followed by the complementary sequence to the reverse universal PCR primer at its 3' end. After ligation, all probes are amplified in a multiplex PCR reaction by means of a universal primer pair. Because a stuffer sequence is located between the hybridizing and the universal sequences, this PCR produces an amplicon of distinct length for each locus. The amplicons are then resolved by capillary electrophoresis, and the copy number of each region is measured as a function of the peak intensities of the MLPA amplification products (Figure 1).
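
The probe layout described above can be illustrated with a minimal Python sketch. It is not ProSeeK's code (the server is written in Perl): the function and class names are hypothetical, the universal tags are passed in as ready-made strings, and placing the stuffer in the RPO is an assumption made only for illustration. All sequences are read 5' to 3'.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MlpaProbe:
    """Hypothetical container for one probe; both oligos are written 5'->3'."""
    lpo: str  # left probe oligonucleotide
    rpo: str  # right probe oligonucleotide

    @property
    def amplicon_length(self) -> int:
        # Ligation joins the two oligos end to end, so the amplified product
        # is as long as both oligos together.
        return len(self.lpo) + len(self.rpo)


def assemble_probe(lhs: str, rhs: str, left_universal: str,
                   right_universal: str, stuffer: str = "") -> MlpaProbe:
    """Build the two oligos from their parts.

    LPO: universal tag at the 5' end, LHS at the 3' end.
    RPO: RHS at the 5' end, universal tag at the 3' end.
    Assumption: the stuffer sits in the RPO, between RHS and universal tag.
    """
    lpo = left_universal + lhs
    rpo = rhs + stuffer + right_universal
    return MlpaProbe(lpo=lpo, rpo=rpo)


if __name__ == "__main__":
    # Placeholder sequences, not real MLPA primers or targets.
    short = assemble_probe("GATC", "CATG", "AAAA", "TTTT")
    long_ = assemble_probe("GATC", "CATG", "AAAA", "TTTT", stuffer="N" * 6)
    # A longer stuffer gives a longer amplicon, which is what lets
    # capillary electrophoresis separate the probes of one assay.
    print(short.amplicon_length, long_.amplicon_length)
```

Varying only the stuffer length changes the amplicon length while leaving the hybridizing sequences untouched, which is the property the multiplex read-out relies on.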

Figure 1

MLPA assay. Typical steps in a MLPA assay and the final output where each peak represents a probe in the experiment.

Although specific MLPA assays can be ordered from MRC-Holland (the company that owns the MLPA technology), custom designs are also developed in many laboratories worldwide. In our experience, an important drawback of custom MLPA assays is the time spent designing the specific oligonucleotides. Because of the large number of probes included in a single assay, a number of restrictions need to be met in order to maximize specificity and increase the likelihood of success. Given the tedious stepwise procedure that is otherwise followed, the goal of ProSeeK is to automate the process of probe design and to obtain the best candidate probes for a given region.

Implementation

ProSeeK is presented as an easy-to-use, point-and-click web interface. It is implemented in CGI (Common Gateway Interface) Perl scripts and made accessible to the user using PHP on top of an Apache server with MySQL database support. It is accessible through the Internet (at ) with IE5.0 and Netscape 7 or higher, from any platform. Because it relies on universally available web GUIs, the system avoids software portability problems, and no client-side software installation is required. The algorithm for probe design consists of several modules (Figure 2) that are run iteratively:

(1) Sequence Checker ensures that the user has entered a sequence in a valid format.

(2) Hybridization Finder identifies a set of hybridizing sequences (HSs) of the correct size that start and end with either a C or a G (as advised by the MRC-Holland protocol). Candidate HSs are filtered on melting temperature and GC content, according to the thresholds provided by the user, and primers and stuffers are subsequently added as needed.

(3) Sequence Aligner performs a genome alignment using BLAT [7] to identify the optimal HS. Candidate HSs, whether they have single or multiple matches to the genome, are filtered by the e-value of the alignment before passing to the next module. If an HS maps onto a copy number variable (CNV) region or segmental duplication (SD) in the reference genome, it is only recognized as optimal if the multiple matches are perfect and the other possible matches are below the e-value threshold. Probes located in CNV or SD regions are flagged in the output as 'CNR' and 'SD', respectively.

(4) HS Trimmer splits the HSs to fulfill user-entered criteria in terms of length, melting temperature and global sequence composition.

(5) Results Generator presents results to the user. ProSeeK can be asked for two kinds of results: the "partial" project only produces the optimal design for the left and right hybridizing sequences, while the "complete" project produces the whole oligonucleotide sequences corresponding to the LPO and RPO (see above).

(6) Data Keeper stores the results in a personal space of the database for future retrieval of the designs.
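
The Hybridization Finder step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's Perl implementation: the function names are hypothetical, and the melting temperature uses a basic GC-based approximation (Tm = 64.9 + 41 × (G + C − 16.4) / N) because the text does not say which Tm model ProSeeK applies. The sketch slides a window of the requested length over the target, keeps windows that start and end with C or G, and drops those above the user's GC and Tm limits.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def melting_temperature(seq: str) -> float:
    """Rough GC-based Tm estimate in degrees Celsius.

    Assumption: a basic approximation intended for oligos longer than
    about 13 nt; the Tm model actually used by ProSeeK is not stated.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def find_hybridizing_sequences(target: str, hs_length: int,
                               max_gc: float, max_tm: float
                               ) -> List[Tuple[int, str]]:
    """Return (0-based start, sequence) for every acceptable window."""
    target = target.upper()
    candidates = []
    for start in range(len(target) - hs_length + 1):
        hs = target[start:start + hs_length]
        # Both ends of the hybridizing sequence must be C or G.
        if hs[0] not in "CG" or hs[-1] not in "CG":
            continue
        # Apply the user-supplied upper limits.
        if gc_content(hs) > max_gc or melting_temperature(hs) > max_tm:
            continue
        candidates.append((start, hs))
    return candidates
```

In a fuller pipeline, each surviving candidate would next be aligned to the genome and split into LHS and RHS, as modules (3) and (4) describe.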

Figure 2

Protocol. Flowchart describing the implementation of the ProSeeK algorithm. HS (Hybridizing sequence), PP (Partial project), CP (complete project), FR (Forward primers), RP (Reverse primers).

Input to server

ProSeeK requires the DNA sequence of the target region in which the MLPA probes will be designed. Several parameters can be used to restrict the probe design: (1) maximum GC content, (2) maximum melting temperature (Tm) of the hybridizing sequence, (3) BLAT e-value (the minimal length that BLAT will detect as a match), (4) hybridizing sequence (HS) length, (5) stuffer sequence, (6) sequence of the universal primers that flank the HS, and (7) desired probe length (see Additional file 1).
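
As an illustration of the input described above, the following Python sketch validates a single FASTA record and groups the seven design parameters into one structure. Everything here is an assumption made for the example: the field names, the decision to accept only A, C, G, T and N, and the restriction to one record per submission are not taken from ProSeeK itself, and no default parameter values are suggested.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DesignParameters:
    """Hypothetical container mirroring the seven parameters in the text."""
    max_gc_percent: float   # (1) maximum GC content
    max_tm_celsius: float   # (2) maximum Tm of the hybridizing sequence
    blat_e_value: float     # (3) BLAT threshold as described in the text
    hs_length: int          # (4) hybridizing sequence length
    stuffer: str            # (5) stuffer sequence
    forward_primer: str     # (6) universal primers flanking the HS
    reverse_primer: str
    probe_length: int       # (7) desired probe length


def parse_single_fasta(text: str) -> Tuple[str, str]:
    """Return (header, sequence) or raise ValueError on malformed input."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    header = lines[0][1:].strip()
    body = lines[1:]
    if any(ln.startswith(">") for ln in body):
        raise ValueError("expected a single FASTA record")
    sequence = "".join(body).upper()
    if not sequence:
        raise ValueError("FASTA record has no sequence")
    # Assumption: only unambiguous bases plus N are accepted.
    if re.fullmatch(r"[ACGTN]+", sequence) is None:
        raise ValueError("sequence contains characters other than A, C, G, T, N")
    return header, sequence
```

Rejecting malformed input before any probe search keeps later steps simple, since they can assume an upper-case sequence with a known alphabet.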

Output from server

After computing all available possibilities, ProSeeK produces an HTML table of the optimal probes together with their characteristics: position within the user-entered sequence, genome mapping, GC content, melting temperature, probe sequence, nucleotide length, self-folding capacity (i.e. DNA secondary structure prediction using the DINAMelt Server [8]), and links to the UCSC Genome Browser [9] and to the Database of Genomic Variants [10]. Projects are kept on the server for one month, so users can retrieve their results at any time by returning to the website and identifying themselves on the initial web page (see Additional file 1).

Conclusion

A number of high-throughput technologies have become available for the genome-wide detection of structural variation in humans. An important drawback of these new methods is that a large number of false positive results typically arise after analysis, so observations made with these technologies must be validated using alternative, more reliable approaches. Because of its simplicity, robustness and relatively low price, MLPA is often used as a targeted method to assess copy-number differences. One important inconvenience is the time required to design the probe mixes that target the desired regions, since many restrictions must be fulfilled to obtain a sensitive, specific and reproducible experiment. To overcome this, we developed ProSeeK, which produces the optimal probes for the regions of interest. ProSeeK is, to our knowledge, the first algorithm for the design of MLPA probes; it saves time and improves the accuracy of MLPA assays.

Availability and requirements

• Project name: ProSeeK
• Project home page:
• Programming language: Perl
• License: GNU General Public License

Authors' contributions

LP initiated this web server project, wrote the original source code, constructed the web interface, implemented it on the server and wrote the manuscript. LA conceived the server and participated in manuscript writing. XE revised and helped to write the manuscript. SV helped to design the web interface, particularly from the viewpoint of experimental research. All authors contributed to the final manuscript and approved the final version.

Additional file 1

Tutorial. A complete tutorial explaining step by step the ProSeeK procedure.

10 in total

1. BLAT--the BLAST-like alignment tool.

Authors: W James Kent
Journal: Genome Res Date: 2002-04 Impact factor: 9.043

2. The human genome browser at UCSC.

Authors: W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal: Genome Res Date: 2002-06 Impact factor: 9.043

3. Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification.

Authors: Jan P Schouten; Cathal J McElgunn; Raymond Waaijer; Danny Zwijnenburg; Filip Diepvens; Gerard Pals
Journal: Nucleic Acids Res Date: 2002-06-15 Impact factor: 16.971

4. Detection of large-scale variation in the human genome.

Authors: A John Iafrate; Lars Feuk; Miguel N Rivera; Marc L Listewnik; Patricia K Donahoe; Ying Qi; Stephen W Scherer; Charles Lee
Journal: Nat Genet Date: 2004-08-01 Impact factor: 38.330

Review 5. Structural variants: changing the landscape of chromosomes and design of disease studies.

Authors: Lars Feuk; Christian R Marshall; Richard F Wintle; Stephen W Scherer
Journal: Hum Mol Genet Date: 2006-04-15 Impact factor: 6.150

Review 6. Structural variation in the human genome.

Authors: Lars Feuk; Andrew R Carson; Stephen W Scherer
Journal: Nat Rev Genet Date: 2006-02 Impact factor: 53.242

Review 7. Methods and strategies for analyzing copy number variation using DNA microarrays.

Authors: Nigel P Carter
Journal: Nat Genet Date: 2007-07 Impact factor: 38.330

Review 8. Copy-number variation and association studies of human disease.

Authors: Steven A McCarroll; David M Altshuler
Journal: Nat Genet Date: 2007-07 Impact factor: 38.330

Review 9. Challenges and standards in integrating surveys of structural variation.

Authors: Stephen W Scherer; Charles Lee; Ewan Birney; David M Altshuler; Evan E Eichler; Nigel P Carter; Matthew E Hurles; Lars Feuk
Journal: Nat Genet Date: 2007-07 Impact factor: 38.330

10. DINAMelt web server for nucleic acid melting prediction.

Authors: Nicholas R Markham; Michael Zuker
Journal: Nucleic Acids Res Date: 2005-07-01 Impact factor: 16.971

