Filipe Alves, Filipa M S Martins, Miguel Areias, Antonio Muñoz-Mérida.
Abstract
Analysis of intra- and inter-population diversity has become important for defining the genetic status and distribution patterns of a species, and it is a powerful tool for conservation programs, because high levels of inbreeding can drive an entire population to extinction within a few generations. Microsatellites (simple sequence repeats, SSRs) are commonly used in population studies, but discovering highly variable regions across a species' genome requires demanding computation and laboratory optimization. In this work, we combine next-generation sequencing (NGS) with automated computing to develop a genome-oriented tool for characterizing SSRs at the population level. Herein, we describe a new Python pipeline, named Micro-Primers, designed to identify SSR loci in a multi-individual microsatellite library and to design PCR primers for their amplification. By combining commonly used programs for data cleaning and microsatellite mining, the pipeline takes a FASTQ file produced by high-throughput sequencing and generates standard information about the selected microsatellite loci, including the number of alleles in the population subset and the melting temperature and corresponding PCR product of each primer set. Additionally, potentially polymorphic loci can be identified from the allele ranges observed in the population, which helps guide the selection of optimal markers for the species. Experimental results show that Micro-Primers significantly reduces processing time compared with manual analysis while maintaining the same quality of results. The time taken by each step increases with the number of sequences to analyze, and, without automated assistance, selecting polymorphic loci from multiple individuals can become a major bottleneck in population studies.
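Micro-Primers delegates microsatellite mining to MISA (see the comparison table below). As a rough illustration of what that step looks for, the sketch below scans a read for perfect tandem repeats of 1-6 bp motifs. The minimum repeat counts in `MIN_REPEATS` are assumed values modelled on thresholds often used with MISA, and `find_perfect_ssrs` is a hypothetical helper, not code from Micro-Primers or MISA; compound and imperfect repeats are ignored.

```python
from typing import NamedTuple

# Assumed minimum number of repeats per motif length (bp -> repeats),
# modelled on thresholds often used with MISA; not Micro-Primers' settings.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


class SSRHit(NamedTuple):
    motif: str
    repeats: int
    start: int  # 0-based index of the first repeat base
    end: int    # 0-based, exclusive


def find_perfect_ssrs(seq: str) -> list[SSRHit]:
    """Hypothetical helper: report perfect tandem repeats of 1-6 bp motifs."""
    seq = seq.upper()
    hits: list[SSRHit] = []
    pos = 0
    while pos < len(seq):
        for k in sorted(MIN_REPEATS):  # try the shortest motifs first
            motif = seq[pos:pos + k]
            if len(motif) < k or "N" in motif:
                continue
            reps = 1
            # Extend while the next k bases repeat the motif exactly.
            while seq.startswith(motif, pos + reps * k):
                reps += 1
            if reps >= MIN_REPEATS[k]:
                end = pos + reps * k
                hits.append(SSRHit(motif, reps, pos, end))
                pos = end  # resume scanning after the repeat tract
                break
        else:
            pos += 1  # no qualifying repeat starts at this position
    return hits


read = "GGC" + "AT" * 8 + "CCGTC" + "A" * 12 + "GC"
for hit in find_perfect_ssrs(read):
    print(hit)
# SSRHit(motif='AT', repeats=8, start=3, end=19)
# SSRHit(motif='A', repeats=12, start=24, end=36)
```

Trying short motif lengths first means a run such as `ATATAT...` is reported as an `AT` repeat rather than as an `ATAT` repeat.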
Year: 2022; PMID: 34997147; PMCID: PMC8741888; DOI: 10.1038/s41598-021-04275-8
Source DB: PubMed; Journal: Sci Rep; ISSN: 2045-2322; Impact factor: 4.379
Comparison of Micro-Primers with existing integrative microsatellite development tools for SSR mining and primer design.
| Feature | Micro-Primers | MiMi# | QDD | SSREnricher# | IDSSR# | FullSSR# | Krait | SSR Pipeline | GMATA# | CandiSSR# |
|---|---|---|---|---|---|---|---|---|---|---|
| Reference | This study | Fox et al. | Meglécz et al. | Luo et al. | Guang et al. | Metz et al. | Du et al. | Miller et al. | Wang et al. | Xia et al. |
| Installation | Auto-installation of all required software (through a conda environment) | Software installation required (BioPython, MUSCLE, pandaseq) | Software installation required (RepeatMasker, NCBI nt database, BioPerl, BLAST+, ClustalW, Primer3) | Software installation required (Biopython, Perl, CD-HIT, MUSCLE) | Software installation required (Primer3) | Software installation required (Primer3) | Software installation required (PySide2, pyfastx, numpy, requests, jinja2, appdirs, primer3-py, Cython, pyinstaller) | Software installation required (glibc) | Software installation required (e-PCR.exe) | Software installation required (BLAST, BioPerl, MISA, ClustalW, Primer3) |
| DNA source | Multi-individual (uniplexed) | Multi-individual (multiplexed) | Single-, multi-individual | Single-, multi-individual | Single-, multi-individual | Single-individual | Single-individual | Single-individual | Single-, multi-individual | Multi-individual |
| Target fragments | Restriction fragments and shotgun | Shotgun fragments | Shotgun fragments | Assembled sequences | Shotgun fragments | Assembled sequences | Assembled sequences | Shotgun fragments | Assembled sequences | Shotgun fragments |
| Input files | Fastq (reads) | Fastq (reads) | Cleaned fastq/fasta (contigs or reads) | Fasta (contigs) | Fasta [reference genome] and cleaned fastq (reads) | Cleaned fasta (contigs) | Cleaned fasta (contigs) | Fastq and fasta | Cleaned fasta (contigs) | Cleaned fasta and reference |
| Adapter/quality filtering | Trimmomatic w/ Cutadapt | FastQC w/ Trimmomatic (Palfinder Galaxy Service) | ‒ | ‒ | ‒ | ‒ | ‒ | SSR Pipeline | ‒ | ‒ |
| Paired read merge | FLASH | Pandaseq | ‒ | ‒ | ‒ | ‒ | ‒ | FLASH | ‒ | ‒ |
| SSR search | MISA w/ CD-HIT | Pal_finder (Palfinder Galaxy Service) | QDD | MISA w/ CD-HIT | IDSSR | FullSSR | Krait | SSR Pipeline | GMATA | MISA |
| Primer design | Primer3 | Primer3 (Palfinder Galaxy Service) | Primer3 | ‒ | Primer3 [reference] | Primer3 | Primer3 | ‒ | GMATA | Primer3 |
| Loci filtering | Flanking length, repeat motif; observed and potential polymorphism, range of amplicon length | Repeat motif, observed polymorphism, SSR size-range | Minisatellite removal, flanking length, repeat motif, minimum coverage, locus specificity | Flanking length, repeat motif, observed polymorphism | Flanking length, minimum coverage, repeat motif, primer mismatch and specificity, observed polymorphism | Flanking length, repeat motif, imperfect repeat removal | Flanking length, compound, imperfect and perfect repeat motif, plots | Flanking length, repeat motif | Observed polymorphism, repeat motif, plots | Flanking length, loci coverage |
| Loci filtering (programs) | Micro-Primers | Pal_filter and PANDAseq* (Palfinder Galaxy Service) and MUSCLE (MiMi) | BLAST w/ RepeatMasker | SSREnricher | BLASTn; SOAP2 w/ SOAPindel | ‒ | ‒ | ‒ | ‒ | Blast, ClustalW |
| User interface | GUI | Galaxy (Palfinder only) | Galaxy | GUI | No | No | GUI | No | GUI | No |
Information taken from the description of each software.
*This step is optional.
Figure 1. Screenshot of a Micro-Primers output file. Columns show the sequence ID; the PCR amplicon length for the corresponding primer pair (size); the sequence and melting temperature of the left and right primers; the SSR pattern (Motif); the range of sizes for the SSR locus across all alleles (range); the number of alleles found for the SSR and the maximum number of alleles to expect from the difference between the longest and the shortest allele (Alleles); a flag marking the best primer pair for the SSR locus (Flag); and the sequence in which the microsatellite was found and from which the primers were designed (Sequence).
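The `Alleles` column pairs the observed allele count with an upper bound derived from the size range. A minimal sketch, assuming alleles differ by whole repeat units so that every motif-length step between the shortest and the longest allele is a possible allele; the exact formula Micro-Primers uses is an assumption here, and `allele_summary` is a hypothetical helper.

```python
def allele_summary(allele_lengths: list[int], motif_len: int) -> tuple[str, int, int]:
    """Hypothetical helper: (size range, observed alleles, maximum expected alleles)."""
    if not allele_lengths or motif_len < 1:
        raise ValueError("need at least one allele length and a positive motif length")
    shortest, longest = min(allele_lengths), max(allele_lengths)
    # Each distinct length seen across individuals counts as one observed allele.
    observed = len(set(allele_lengths))
    # Assumption: alleles differ by whole repeat units, so each motif-length
    # step between the extremes is a possible allele (partial steps are floored).
    max_expected = (longest - shortest) // motif_len + 1
    return f"{shortest}-{longest}", observed, max_expected


# Hypothetical amplicon lengths (bp) for a dinucleotide locus across reads.
print(allele_summary([120, 124, 124, 130, 136, 144], motif_len=2))
# ('120-144', 5, 13)
```

Under this assumption, a dinucleotide locus spanning 120-144 bp could hold up to 13 alleles; observing only 5 of them in the sampled individuals could flag the locus as potentially more polymorphic than the current data show.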
Number of sequences retained at each step of Micro-Primers under eight parameter settings (the first four rows give the parameter values of each run).
| Parameter / step | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 |
|---|---|---|---|---|---|---|---|---|
| MIN_ALLEL_CNT | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| SPECIAL DIF | 0 | 1(8) | 0 | 0 | 0 | 0 | 0 | 0 |
| MIN FLANK LEN | 50 | 50 | 25 | 75 | 100 | 50 | 50 | 50 |
| MAX DIFF TM | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.2 | 1 | 2 |
| Original FASTQ file | 259,506 | | | | | | | |
| Trimming | 188,843 | | | | | | | |
| Pair-end merge | 130,603 | | | | | | | |
| Filter 1 | 19,695 | | | | | | | |
| Filter 2 | 8083 | 8083 | 9568 | 6616 | 3752 | 8083 | 8083 | 8083 |
| Clusters | 4924 | 4924 | 6049 | 3801 | 2453 | 4924 | 4924 | 4924 |
| Filter 3 | 2092 | 2092 | 2110 | 2028 | 779 | 2092 | 2092 | 2092 |
| Unique loci | 26 | 104 | 28 | 23 | 20 | 26 | 26 | 26 |
| Primers selected | 23 | 83 | 23 | 20 | 15 | 25 | 21 | 21 |
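The `Clusters` row presumably counts groups of similar reads; according to the comparison table, the pipeline relies on CD-HIT for this, which clusters by sequence identity. As a much simpler stand-in (not CD-HIT's algorithm), the sketch below treats reads that share the same motif and identical flanking sequence on both sides of the repeat as one locus, and records each read's repeat length as its allele. `group_into_loci` and the default `flank_len` of 20 bp are hypothetical.

```python
from collections import defaultdict


def group_into_loci(hits, flank_len=20):
    """Hypothetical stand-in for clustering: group SSR reads by exact flanks.

    hits: iterable of (read_id, sequence, motif, start, end), where start/end
    delimit the repeat tract (0-based, end exclusive).
    Returns {(motif, left_flank, right_flank): {read_id: repeat_length}}.
    """
    loci = defaultdict(dict)
    for read_id, seq, motif, start, end in hits:
        left = seq[max(0, start - flank_len):start]
        right = seq[end:end + flank_len]
        if len(left) < flank_len or len(right) < flank_len:
            continue  # too little flanking sequence to anchor the locus
        loci[(motif, left, right)][read_id] = end - start
    return dict(loci)


LEFT, RIGHT = "ACGTTGCAAGCTTGCATGCA", "TTGACCGGTAAGCTAGCTAA"
hits = [
    ("r1", LEFT + "AT" * 8 + RIGHT, "AT", 20, 36),
    ("r2", LEFT + "AT" * 10 + RIGHT, "AT", 20, 40),
    ("r3", LEFT + "AT" * 8 + RIGHT, "AT", 20, 36),
    ("r4", "G" * 20 + "AT" * 8 + RIGHT, "AT", 20, 36),
]
for (motif, _, _), reads in group_into_loci(hits).items():
    print(motif, sorted(set(reads.values())))
# AT [16, 20]
# AT [16]
```

Exact flank matching is brittle: a single sequencing error in a flank splits a locus in two, which is one reason identity-based clustering is used in practice.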
Figure 2. Visualization of the repeat region for SSR loci Bats1 (A), Bats8 (B), Bats13 (C) and Bats20 (D), including the sequence ID of each allele. Micro-Primers identified locus Bats1 with 19 observed and 19 expected alleles, Bats8 with 10 observed and 11 expected, Bats13 with 7 observed and 13 expected, and Bats20 with 5 observed and 12 expected.
Figure 3. Flowchart of Micro-Primers. Green diamonds represent the sequence-filtering stages, each based on parameters set by the user.
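The user-driven filtering stages in the flowchart can be pictured as a chain of predicates, with the number of surviving candidates logged after every stage, much like the per-step counts in the table above. In this sketch the parameter names loosely follow that table (MIN FLANK LEN, MIN_ALLEL_CNT, MAX DIFF TM) and the defaults are the values used in most of its runs, but what each parameter means, the order of the stages, and the `Candidate`/`run_filters` names are assumptions, not Micro-Primers' code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    locus_id: str
    left_flank_len: int
    right_flank_len: int
    n_alleles: int
    tm_left: Optional[float] = None   # primer Tm in deg C; None if no pair designed
    tm_right: Optional[float] = None


def run_filters(cands, min_flank_len=50, min_allele_cnt=5, max_diff_tm=0.5):
    """Apply hypothetical filter stages in order; return survivors and counts."""
    # Stage semantics and order are assumptions made for illustration.
    stages = [
        ("flank length",
         lambda c: min(c.left_flank_len, c.right_flank_len) >= min_flank_len),
        ("allele count", lambda c: c.n_alleles >= min_allele_cnt),
        ("primer Tm difference",
         lambda c: c.tm_left is not None and c.tm_right is not None
         and abs(c.tm_left - c.tm_right) <= max_diff_tm),
    ]
    counts = [("input", len(cands))]
    for name, keep in stages:
        cands = [c for c in cands if keep(c)]
        counts.append((name, len(cands)))  # funnel size after this stage
    return cands, counts


cands = [
    Candidate("a", 60, 70, 6, 60.1, 60.4),
    Candidate("b", 40, 80, 8, 59.0, 59.2),
    Candidate("c", 55, 55, 3, 60.0, 60.0),
    Candidate("d", 90, 90, 5, 58.0, 60.0),
    Candidate("e", 90, 90, 7),
]
kept, counts = run_filters(cands)
print([c.locus_id for c in kept])  # ['a']
print(counts)
# [('input', 5), ('flank length', 4), ('allele count', 3), ('primer Tm difference', 1)]
```

Keeping each stage as a named predicate makes it easy to rerun the chain with different thresholds and compare the resulting counts, as in the parameter sweep reported in the table.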