
RAD-R scripts: R pipeline for RAD-seq from FASTQ files to linkage maps construction and run R/QTL, operating only at copying and pasting scripts into R console.

Kousuke Seki

Abstract

Coupled with the reduction in sequencing costs, the number of RAD-seq analyses has been surging, generating vast genetic knowledge about many crops. Despite the growing interest in using data obtained by high-throughput sequencing, specialized platforms can be intimidating to non-expert users and difficult to set up on individual computers. Therefore, RAD-R scripts were developed on Windows 10 for RAD-seq analysis, allowing users who are not familiar with bioinformatics to analyze big sequence data easily. The RAD-R scripts run the whole flow, from the raw sequence reads of an F2 population of a self-fertilizing plant to linkage map construction and QTL analysis. Because they are run simply by copying Excel cells into the R console, they can also be useful to many users with limited experience. In a comparison of linkage maps constructed with RAD-R scripts and with Stacks, RAD-R scripts constructed linkage maps with less missing genotype data and a shorter total genetic distance. QTL analysis results can be obtained easily by selecting, from the genotype data files created by RAD-R scripts, reliable genotype data whose error correction is visually judged to be appropriate.
Copyright © 2021 by JAPANESE SOCIETY OF BREEDING.


Keywords:  BWA; Excel software; R pipeline; R/QTL; RAD-seq; Windows10; copying and pasting

Year:  2021        PMID: 34912169      PMCID: PMC8661492          DOI: 10.1270/jsbbs.20159

Source DB:  PubMed          Journal:  Breed Sci        ISSN: 1344-7610            Impact factor:   2.086


Introduction

With the widespread use of high-throughput sequencing, the number of crops to which genomic breeding can be applied is increasing (Liu, Matsumura, Seki, Talukder, Wang). High-throughput sequencing has contributed to the genomics of not only model but also non-model plants by providing big sequence data, and whole-genome sequences of a wide variety of crops have become increasingly available (Seki, Wang, Yonemaru). Moreover, RAD-seq (restriction-site associated DNA sequencing), which targets only polymorphisms contained in short genomic sequences flanked by restriction enzyme sites and can obtain high-resolution population genomic data in a single run, is considered a suitable analytical method for rapidly understanding crop traits (Matsumura, Seki, Talukder, Wang). Stacks is widely used software for analyzing big sequence data obtained with RAD-seq (Catchen, Rochette), although its expertise-based nature can make it difficult to use for people with limited computer knowledge, which is the case for most plant breeders. Therefore, the aim was to develop an in silico analysis that allows users who are not familiar with bioinformatics to analyze RAD-seq big sequence data easily. To minimize the setup required of each user, R scripts generated with the Excel software were created that cover the flow from raw sequence reads to linkage map construction and QTL analysis with the R/QTL package (Broman). A validation study was conducted to evaluate the performance of RAD-R scripts, and the results of a comparison with Stacks are described.

Materials and Methods

The availability of scripts

RAD-R scripts are provided as R scripts generated with the Excel software. The Excel file named “RAD-R scripts.xlsx” for RAD-seq analysis is freely available at https://github.com/KousukeSEKI/RAD-seq_scripts. Green cells with red letters mark the parts that each user must fill in; once these green cells are filled in, the corresponding R scripts appear in the purple cells.

RAD-seq library and sequencing

The RAD-seq library protocol is an improved version of a previously reported method (Matsumura, Seki). Two restriction enzymes, PacI and NlaIII, were used in this study. To avoid sequencing errors caused by sequence imbalance, two library types, “Pac5Nla7” and “Pac7Nla5”, were each applied to half of the samples (Fig. 1). For this reason, the script trims four kinds of adapters. After adapters were ligated specifically to the 5ʹ- and 3ʹ-ends of the genomic fragments, the library was amplified by PCR with primers containing index sequences (Fig. 1). The adapter sequences to be entered into the green cells of the “1. Input and Output files list” tab of the Excel file are shown as Adapter 1 to 4 in the RAD-seq library structure (Fig. 1). The script can be applied to libraries prepared by either single or double digestion with restriction enzymes. For sequence reads without adapters, the script can still be run by filling the adapter fields in the green cells with dummy sequences of dozens of consecutive “A”, “T”, “C”, or “G” bases (Fig. 2).
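The adapter-trimming step can be pictured with a minimal Python sketch. It is not the code generated by the Excel sheet: the exact-match rule, the min_overlap cutoff for partial adapters at the 3ʹ-end, and the example sequences are all assumptions for illustration (the real Adapter 1 to 4 sequences are those shown in Fig. 1).

```python
from typing import Sequence


def trim_adapters(read: str, adapters: Sequence[str], min_overlap: int = 5) -> str:
    """Cut `read` at the earliest position where any adapter starts.

    A full adapter occurrence anywhere in the read, or a partial adapter of at
    least `min_overlap` bases running off the 3' end, triggers the cut.
    """
    cut = len(read)
    for adapter in adapters:
        pos = read.find(adapter)
        if pos != -1:  # full-length adapter found inside the read
            cut = min(cut, pos)
            continue
        # Longest read suffix equal to an adapter prefix (adapter truncated by the read end).
        longest = min(len(adapter), len(read)) - 1
        for k in range(longest, min_overlap - 1, -1):
            if read.endswith(adapter[:k]):
                cut = min(cut, len(read) - k)
                break
    return read[:cut]


# Hypothetical placeholder adapters; real sequences come from Fig. 1 / the Excel sheet.
ADAPTERS = ["AGATCGGAAG", "A" * 30]

print(trim_adapters("ACGTACGTAGATCGGAAGTT", ADAPTERS))  # ACGTACGT
print(trim_adapters("ACGTACGTAGATC", ADAPTERS))         # ACGTACGT (partial adapter at 3' end)
```

With a dummy poly-A "adapter", as suggested for adapter-free reads, this sketch leaves most reads untouched but still clips a read that happens to end in a run of at least min_overlap A bases.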
Fig. 1.

The overview of the RAD-seq library structure and the adapter sequence.

Fig. 2.

Screenshot of the RAD-R scripts implemented with the Excel software. To set up the script, simply fill in the information about the data files and the trimming values in the green cells with red characters in the “1. Input and Output files list” tab.

Information about input files

In the “1. Input and Output files list” tab of the Excel file (Fig. 2), enter the path of the folder containing the raw sequence reads and the path of the folder where the analysis data will be saved, following the entry example. For the raw sequence read files of the parents and the F2 population, enter full paths; the full_path script in the purple cells can be used to look up the full paths of the raw sequence reads. Next, enter names for the parents and the individuals of the F2 population. Because names consisting only of numbers produce errors, names should follow the examples. In addition, an underscore should be used instead of a hyphen. Some green cells can be filled in by choosing from a drop-down list. It is recommended to run the first analysis with the default parameter values.
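The naming rules above can be checked before the names are typed into the sheet. The following Python helper is a hypothetical pre-check, not part of RAD-R scripts; the duplicate-name test is an extra assumption added for this sketch.

```python
from typing import Iterable, List


def check_sample_names(names: Iterable[str]) -> List[str]:
    """Return human-readable problems with parent/F2 sample names.

    Checks the two stated rules (no all-digit names, underscores instead of
    hyphens) plus a duplicate check that is an assumption of this sketch.
    """
    problems = []
    seen = set()
    for name in names:
        if name.isdigit():
            problems.append(f"{name}: names made only of digits cause errors")
        if "-" in name:
            suggestion = name.replace("-", "_")
            problems.append(f"{name}: use an underscore instead of a hyphen ({suggestion})")
        if name in seen:
            problems.append(f"{name}: duplicate name")
        seen.add(name)
    return problems


for problem in check_sample_names(["P1", "P2", "F2_001", "002", "F2-003"]):
    print(problem)
```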

The input of the phenotype data

The parent names and the F2 individual names are linked between the “1. Input and Output files list” tab and the “2. Input Phenotype data” tab. Once phenotype data are entered into the “2. Input Phenotype data” tab, the phe (phenotype) file for R/QTL is created automatically when RAD-R scripts are run.

Preparation of reference genome sequence for alignment using BWA

Reference genome sequences should be saved in the save folder. This R script was designed for reference genome sequences assembled to the pseudomolecule level. It can also be used with scaffold-level reference genome sequences, but this is not recommended because of the high probability of errors in the ABHgenotypeR package (Furuta). If the reference genome sequence contains contigs or scaffolds, it is advisable to remove them from the reference FASTA file beforehand. This pipeline uses R and the Burrows-Wheeler Alignment tool (BWA) (Li and Durbin 2009). BWA runs on Unix-like systems such as Linux and macOS. On Windows 10, BWA can be run on the Windows Subsystem for Linux (WSL); Microsoft’s website (https://docs.microsoft.com/ja-jp/windows/wsl/install-win10) describes how to install WSL. In this script, Ubuntu was adopted. To install BWA, run the BWA_install script in the R console after setting up WSL. The BWA index files of the reference genome sequence should be created beforehand with the make_BWA_index script shown in the purple cells.
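Removing contigs and scaffolds from the reference FASTA, as advised above, can be done with a short filter. This Python sketch keeps only records whose ID matches a regular expression; the default pattern ^chr\d+$ is a hypothetical naming scheme and must be adapted to the actual sequence IDs of the reference genome in use.

```python
import re
from typing import Iterable, Iterator


def filter_fasta(lines: Iterable[str], pattern: str = r"^chr\d+$") -> Iterator[str]:
    """Yield only the FASTA records whose ID matches `pattern`.

    The ID is the first whitespace-delimited token after '>'. The default
    pattern is a hypothetical pseudomolecule naming scheme, not the real one.
    """
    regex = re.compile(pattern)
    keep = False
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            record_id = tokens[0] if tokens else ""
            keep = regex.match(record_id) is not None
        if keep:
            yield line  # header and sequence lines of a kept record


example = [">chr1 pseudomolecule\n", "ACGT\n", ">scaffold_12\n", "GGGG\n", ">chr2\n", "TTTT\n"]
print("".join(filter_fasta(example)), end="")
```

On a real file the same generator can stream records, for example `dst.writelines(filter_fasta(src))` with both files opened via `open()`; the file names are up to the user.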

The R packages required for the RAD-R scripts

The Biostrings, ShortRead, QuasR, data.table, ABHgenotypeR, and qtl packages are all required; they can be installed with the R_package install script shown in the purple cells (Gentleman, R Development Core Team 2017).

Run RAD-R scripts

RAD-R scripts consist of two main R scripts, “3. From FASTQ to linkage map” and “4. QTL”. The first is run simply by copying the 19674 lines of R script displayed in the purple cells of the “3. From FASTQ to linkage map” tab and pasting them into the R console. To perform the QTL analysis, the script in the “4. QTL” tab should then be run. A detailed manual and sample data are available at https://github.com/KousukeSEKI/RAD-seq_scripts.

Quality control

Quality control of the raw sequence reads is necessary because their quality affects the accuracy of the linkage map. Raw sequence reads from high-throughput sequencing contain a certain percentage of read errors. Therefore, the PHRED scores of the raw sequence reads can be checked with the qQCReport script provided in the purple cells (Gaidatzis), and, based on the report, low-quality bases at the 5ʹ- and 3ʹ-ends can be trimmed. By default, 6 bp are trimmed from the 5ʹ-end and 1 bp from the 3ʹ-end. RAD-R scripts also remove low-quality reads and adapter sequences from the raw sequence reads. By default, the minimum PHRED score is 35, and the acceptable proportion of bases with a low PHRED score is 0.1 (i.e., 10%). Details of the adapter sequences are shown in Fig. 1.
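The default read-level quality filter described above can be sketched in Python as follows. The numbers (6 bp and 1 bp trimmed, PHRED 35, ratio 0.1) are the stated defaults; the Phred+33 encoding and keeping a read whose low-quality fraction is exactly 0.1 are assumptions of this sketch, not confirmed behaviour of the R scripts.

```python
from typing import List, Optional, Tuple


def decode_phred33(qual: str) -> List[int]:
    """Convert a FASTQ quality string to integer scores (Phred+33 assumed)."""
    return [ord(c) - 33 for c in qual]


def trim_and_filter(seq: str, quals: List[int], trim5: int = 6, trim3: int = 1,
                    min_q: int = 35, max_low_frac: float = 0.1
                    ) -> Optional[Tuple[str, List[int]]]:
    """Trim fixed-length ends, then reject the read if too many bases are low quality.

    Returns the trimmed (sequence, qualities) pair, or None if the read is rejected.
    """
    end = len(seq) - trim3
    seq, quals = seq[trim5:end], quals[trim5:end]
    if not seq:
        return None
    low = sum(1 for q in quals if q < min_q)
    # Keep the read when the low-quality fraction does not exceed the limit.
    if low / len(quals) > max_low_frac:
        return None
    return seq, quals


read = "NNNNNN" + "ACGTACGTAC" + "N"
print(trim_and_filter(read, decode_phred33("I" * 17)))  # ('ACGTACGTAC', [40, 40, ...])
```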

Approaches to linkage map construction

By comparing the sequence reads in the FASTQ files of the two parents, the sequences unique to each parent and the sequences common to both were extracted, and three FASTA files were created. These FASTA files were aligned to the reference genome sequence with BWA, and the aligned sequences and positions obtained from the resulting SAM files were saved in the “mappinggenotypelist.csv” file. BWA provides the mem and aln algorithms. From the SAM files obtained with each algorithm, two patterns of genotype data were created: one using all mapped data and the other using only the data with the highest mapping quality (MAPQ 60 for mem and 37 for aln). Therefore, the “mappinggenotypelist.csv” file was created in four versions: “mem”, “mem_60”, “aln”, and “aln_37”. For each plant in the F2 population, the sequences and the read frequency per sequence were summarized in a “name.tagcount” file using the table function. The read frequencies of all individual plants for each aligned sequence were then summarized in the “genotypingSum_complete.csv” file by matching the sequences in the “name.tagcount” files against those in the “mappinggenotypelist.csv” file with the match function. The “genotypingSum_complete.csv” file was treated as the initial list of genotype data. This pipeline was designed specifically for F2 populations of self-fertilizing plants and is not suitable for heterozygous plants. For the dataset of each aligned sequence, the presence or absence of reads across all F2 plants was tested against the expected 3:1 ratio with a chi-square test, and datasets with outlying p values (0.001 or less; Table 1) were excluded from the “genotypingSum_complete.csv” file. Because genotype data produced by RAD-seq contain far more missing data than genotype data obtained with conventional genetic markers, the ABHgenotypeR package was adopted to correct errors in the genotype data (Furuta).
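The per-plant tag counting and the 3:1 chi-square filter can be illustrated with a minimal Python sketch. Here collections.Counter stands in for R's table function, and the 3:1 presence:absence expectation with the p > 0.001 retention threshold follows Table 1; the function names are hypothetical, and the R implementation may differ in details such as how zero counts are handled.

```python
import math
from collections import Counter
from typing import Iterable, List


def tag_counts(reads: Iterable[str]) -> Counter:
    """Read frequency per distinct sequence for one plant (cf. a 'name.tagcount' file)."""
    return Counter(reads)


def chisq_3to1_pvalue(n_present: int, n_absent: int) -> float:
    """Chi-square goodness-of-fit p value for presence:absence = 3:1 (1 degree of freedom)."""
    n = n_present + n_absent
    exp_present, exp_absent = 0.75 * n, 0.25 * n
    stat = ((n_present - exp_present) ** 2 / exp_present
            + (n_absent - exp_absent) ** 2 / exp_absent)
    # Survival function of the chi-square distribution with 1 df: erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


def keep_sequence(seq: str, plants: List[Counter], alpha: float = 0.001) -> bool:
    """Keep a parent-specific sequence whose presence/absence over F2 plants fits 3:1."""
    present = sum(1 for counts in plants if counts[seq] > 0)
    absent = len(plants) - present
    return chisq_3to1_pvalue(present, absent) > alpha


print(chisq_3to1_pvalue(72, 24))  # 1.0: a perfect 3:1 split among 96 plants
print(chisq_3to1_pvalue(48, 48))  # far below 0.001, so the sequence would be excluded
```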
To achieve better results, three construction methods were provided by this pipeline (Fig. 3).
Fig. 3.

Schematic view of three construction methods.

1. Datasets of sequences aligned to the same genomic position in both parents were extracted directly from the “genotypingSum_complete.csv” file as co-dominant markers, and these genotype data were written in “ABH” notation (A and B for the two parental homozygous genotypes, H for heterozygous). In this first approach, the ABHgenotypeR package was run to correct errors in these genotype data. The genotype files obtained with this method were named “Genotype_csvs”.
2. For the data of each parent, the read-frequency pattern at each genomic position in the “genotypingSum_complete.csv” file was compared with that of the neighboring genomic positions, and the position was excluded from the dataset if its pattern differed significantly. This process was repeated three times. Data aligned to the same genomic position in both parents were then extracted as co-dominant markers, and the selected genotype data were written in “ABH” notation. Each marker in the selected genotype data was again compared with its neighboring markers and excluded if its data pattern differed significantly; this process was also repeated three times. In this second approach, the ABHgenotypeR package was run to correct errors in these selected genotype data. The genotype files obtained with this method were named “Genotype_select_csvs”.
3. As in the second approach, the read-frequency pattern at each position in the “genotypingSum_complete.csv” file was compared with that of the neighboring positions, and the position was excluded from the dataset if its pattern differed significantly; this process was repeated three times. The ABHgenotypeR package was then run to correct errors in this selected dataset for each parent; the maxHapLength parameter for this step can be set to 3, 6, or 9. Next, the selected data aligned to the same genomic position in both parents were extracted as co-dominant markers, and the selected genotype data were written in “ABH” notation.
In addition, each marker in the selected genotype data was again compared with its neighboring markers and excluded if its data pattern differed significantly; this process was also repeated three times. In this third approach, the ABHgenotypeR package was run to correct errors in these selected genotype data. The genotype files obtained with this method were named “Genotype_select_ABHR (maxHapLength)_csvs”. Two error-correction approaches with ABHgenotypeR were set up, differing in the order in which the functions are applied (Fig. 4): “S” denotes the correctStretches function and “U” the correctUndercalledHets function. The maxHapLength parameter was set from 1 to 10, so that 10 error-corrected csvs files are created in the save folder for each combination. The three construction methods (Fig. 3) and the two correction approaches (Fig. 4) give six patterns of genotype data. Because four file versions based on the BWA mapping quality are available (“mem”, “mem_60”, “aln”, and “aln_37”), there are 24 patterns in total. For each pattern, 10 error-corrected csvs files (maxHapLength = 1 to 10) were created, giving a total of 240 error-corrected csvs files. The genotype images of these files were summarized in 12 PDFs saved in the “Genotype_images_PDF_files” folder so that the results of each correction approach can be checked visually. Duplicate markers with identical genotype data were then deleted from each csvs file using the findDupMarkers and drop.markers functions of the R/QTL package, and the numbers of markers before and after the deletion were summarized in the “Maps_list_(mem and aln).csv” files saved in the “Genotype_images_PDF_files” folder.
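The neighbor-comparison step used in the second and third construction methods is not specified in detail (what counts as a "significant" difference is not stated), so the following Python sketch is only one possible reading. It assumes genotype strings in ABH notation with "-" for missing calls, a mismatch-fraction measure, a hypothetical max_mismatch cutoff of 0.3, and a window of two markers on each side, and it drops a marker when more than half of its neighbors disagree; like the pipeline, it repeats the pass three times.

```python
from typing import List, Sequence


def mismatch_fraction(g1: str, g2: str) -> float:
    """Fraction of plants whose calls differ between two markers, ignoring missing ('-')."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a != "-" and b != "-"]
    if not pairs:
        return 1.0  # nothing comparable: treat as discordant (assumption)
    return sum(a != b for a, b in pairs) / len(pairs)


def drop_discordant(markers: Sequence[str], max_mismatch: float = 0.3,
                    window: int = 2, rounds: int = 3) -> List[str]:
    """markers: one ABH genotype string per marker, ordered by genomic position."""
    current = list(markers)
    for _ in range(rounds):
        kept = []
        for i, g in enumerate(current):
            lo, hi = max(0, i - window), min(len(current), i + window + 1)
            neighbours = [current[j] for j in range(lo, hi) if j != i]
            n_bad = sum(mismatch_fraction(g, n) > max_mismatch for n in neighbours)
            # Drop the marker when more than half of its neighbours disagree with it.
            if neighbours and n_bad * 2 > len(neighbours):
                continue
            kept.append(g)
        current = kept
    return current


print(drop_discordant(["AAHB", "AAHB", "BBAA", "AAHB"]))  # ['AAHB', 'AAHB', 'AAHB']
```

Using a majority rule over a small window, rather than comparing only the two immediate neighbors, keeps a good marker at the end of a chromosome from being discarded just because its single neighbor is the discordant one.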
Using these PDF and CSV files as a reference, genotype data more suitable for QTL analysis can be selected from the 240 error-corrected csvs files. The names of the genotype data files indicate the BWA mode (mem, mem_60, aln, or aln_37), the construction method (csvs, select, or ABH), and the correction approach (the maxHapLength value followed by SU or US).
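The count of 240 files (4 BWA versions × 3 construction methods × 2 correction approaches × 10 maxHapLength values) can be checked by enumerating the combinations. The labels below follow the “mem, select, 10US” style used in the text; they are not the exact file names written by the pipeline.

```python
from itertools import product
from typing import List

BWA_VERSIONS = ["mem", "mem_60", "aln", "aln_37"]
CONSTRUCTION_METHODS = ["csvs", "select", "ABH"]
CORRECTION_APPROACHES = ["SU", "US"]
MAX_HAP_LENGTHS = range(1, 11)  # maxHapLength = 1 to 10


def genotype_labels() -> List[str]:
    """One label per error-corrected genotype dataset, e.g. 'mem, select, 10US'."""
    return [f"{bwa}, {method}, {hap}{approach}"
            for bwa, method, approach, hap in product(
                BWA_VERSIONS, CONSTRUCTION_METHODS, CORRECTION_APPROACHES, MAX_HAP_LENGTHS)]


labels = genotype_labels()
print(len(labels))  # 240
```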
Fig. 4.

Schematic view of the two correction approaches, SU and US.

Composite interval mapping (CIM) using R/QTL

In the “4. QTL” tab of the Excel file, the genotype data file to be analyzed with the R/QTL package (Broman) is chosen by selecting the appropriate option, each forming part of the file name, from the four drop-down lists in the green cells. The selected genotype data file is then displayed in the yellow cell. The Genotype_Freq_and_Density script offers two visualization functions along the physical positions of the chromosomes: the plotMarkerDensity function plots the marker density, and the plotAlleleFreq function plots the parental allele frequencies. The CIM_plot script shows the CIM results for all phenotypes. For significant LOD peaks, more detailed results for each locus can be obtained with the Re-CIM_each_Trait script, and more detailed LOD peak plots with the Re-plot script.

Verification test

To demonstrate the features of the RAD-R scripts, publicly available sequence data from the study by Seki (accession number PRJNA523045, ddRAD-seq) were analyzed. That study identified LsTCP4 as a candidate causative gene for marginal leaf shape in lettuce (Seki), and its data were used for the verification test of the RAD-R scripts. Filtered sequence reads were mapped onto the L. sativa v8.0 genome (https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Lsativa_er) with the mem and aln modes of BWA. Using the alignment data from these modes, linkage maps constructed with RAD-R scripts and with Stacks were compared. The total distance (in centimorgans, cM) of each set of genotype data was calculated with the estimate.map option of the read.cross function in the R/QTL package.
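To see why erroneous markers inflate the total map length, the following Python sketch sums Haldane map distances, d = -50 ln(1 - 2r) cM, over adjacent-marker intervals. The Haldane map function is an assumption for illustration; the text only states that estimate.map in read.cross was used, and R/QTL estimates the recombination fractions from the genotype data rather than taking them as input as this sketch does.

```python
import math
from typing import Iterable


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def total_length_cm(rec_fractions: Iterable[float]) -> float:
    """Sum Haldane distances over the adjacent-marker intervals of one linkage group."""
    return sum(haldane_cm(r) for r in rec_fractions)


clean = [0.01, 0.0, 0.02]          # four tightly linked markers
with_error = [0.01, 0.2, 0.2, 0.02]  # a mistyped marker inserted between the middle pair
print(round(total_length_cm(clean), 2), round(total_length_cm(with_error), 2))
```

In this example a single mistyped marker raises the group length from about 3.1 cM to about 54.1 cM, and r values approaching 0.5 (effectively unlinked markers) make the distance grow without bound.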

Data availability

The reference sequence of Lactuca sativa L. used in this study is available from Phytozome (DOE-JGI, https://phytozome.jgi.doe.gov/pz/portal.html). The high-throughput sequencing data were deposited in the Sequence Read Archive under accession number PRJNA523045 (ddRAD-seq).

Results

From raw sequence reads to linkage map construction

Among the sequence reads aligned to the reference genome sequence with BWA, the proportions with the highest mapping quality (“mem_60” and “aln_37”) were 44.2% in mem mode and 61.4% in aln mode (Table 1). The trend was similar for both parents. After removing sequences that were not detected in the F2 plants, the datasets were reduced from 2078649, 917849, 1607719, and 986689 to 756098 (36.4%), 325665 (35.5%), 693132 (43.1%), and 405243 (41.1%) in “mem”, “mem_60”, “aln”, and “aln_37”, respectively (Table 1). After statistical outliers were excluded by the chi-square test, the datasets were further reduced to 60505 (2.9%), 30102 (3.3%), 64106 (4.0%), and 42937 (4.4%), respectively (Table 1). The constructed linkage maps are summarized in Table 2. With the RAD-R scripts, four BWA datasets (mem, mem_60, aln, and aln_37), three map construction methods (csvs, select, and ABH), and twenty error-correction approaches (maxHapLength from 1 to 10 for each of SU and US) were combined, so that 240 linkage maps could be created. The number of markers ranged from 1954 (mem_60, select) to 5214 (aln, csvs). Among the three construction methods, the number of markers decreased in the order csvs, ABH, select. The linkage maps created with Stacks had 3908 markers in mem and 3871 markers in aln. The genotypes of the linkage maps are summarized in Table 3. The minimum and maximum total genetic distances after correction were 1269 cM (mem, select, 10US) and 5305.9 cM (aln, csvs, 5US), respectively, whereas the Stacks maps measured 15443.7 cM in mem and 16706.6 cM in aln.
The linkage maps created with the RAD-R scripts had 0.02% to 2.2% missing values, whereas those created with Stacks had 10.44% in mem and 9.89% in aln. The proportions of “A”, “H”, and “B” in each genotype dataset were compared with the expected 1:2:1 ratio by chi-square goodness-of-fit tests. Because none of the null hypotheses was rejected, the correction approaches were considered appropriate.
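The text does not say exactly how the 1:2:1 test was computed. Recomputing from Table 3 suggests that the reported p values follow from a chi-square statistic on the A, H, and B percentages (with missing calls left in the denominator) against fixed expectations of 25, 50, and 25, with 2 degrees of freedom. The Python sketch below reproduces that inferred calculation; it is a reconstruction, not the author's code, and it operates on percentages rather than on genotype counts.

```python
import math


def chisq_121_pvalue(pct_a: float, pct_h: float, pct_b: float) -> float:
    """Chi-square goodness of fit of A:H:B percentages to 25:50:25 (2 degrees of freedom)."""
    observed = (pct_a, pct_h, pct_b)
    expected = (25.0, 50.0, 25.0)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the chi-square survival function is exp(-stat / 2).
    return math.exp(-stat / 2.0)


# Percentages from Table 3: RAD-R mem/csvs/6US, RAD-R mem/csvs uncorrected, Stacks mem.
for row in [(22.56, 55.23, 22.10), (26.87, 44.34, 26.60), (22.03, 45.86, 21.67)]:
    print(row, round(chisq_121_pvalue(*row), 3))  # 0.571, 0.643, 0.566 as in Table 3
```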
Table 1.

Summary of a flow from raw sequence reads to a list of genotype data

| Step | Parent | Linkage group | mem | mem_60 | aln | aln_37 |
| Number of positions aligned to the genome sequence by BWA | P1 | LG1-9 | 1018399 | 464798 | 788105 | 495881 |
|  | P2 | LG1-9 | 1060250 | 453051 | 819614 | 490808 |
| Number of datasets after removing sequences with no counts in the F2 plants | P1 | LG1-9 | 365898 | 162612 | 329219 | 197382 |
|  | P2 | LG1-9 | 390200 | 163053 | 363913 | 207861 |
| Number of datasets with p-value > 0.001 by chi-square test | P1 | LG1-9 | 30190 | 16743 | 33172 | 23903 |
|  | P2 | LG1-9 | 30315 | 13359 | 30934 | 19034 |
Table 2.

Summary of the linkage maps constructed by in silico analysis

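Number of co-dominant markers per linkage group (LG)
| In silico analysis | BWA | Construction method | LG1 | LG2 | LG3 | LG4 | LG5 | LG6 | LG7 | LG8 | LG9 | LG1-9 |
| RAD-R scripts | mem | csvs | 639 | 578 | 493 | 215 | 799 | 228 | 284 | 673 | 552 | 4461 |
|  |  | select | 425 | 390 | 265 | 58 | 454 | 105 | 134 | 370 | 291 | 2492 |
|  |  | ABH | 457 | 416 | 300 | 75 | 496 | 116 | 158 | 417 | 355 | 2790 |
|  | mem_60 | csvs | 422 | 391 | 287 | 85 | 530 | 152 | 175 | 404 | 346 | 2792 |
|  |  | select | 343 | 293 | 201 | 46 | 385 | 99 | 99 | 273 | 215 | 1954 |
|  |  | ABH | 365 | 308 | 228 | 52 | 418 | 106 | 118 | 318 | 251 | 2164 |
|  | aln | csvs | 751 | 632 | 611 | 231 | 912 | 271 | 373 | 753 | 680 | 5214 |
|  |  | select | 497 | 416 | 335 | 74 | 568 | 129 | 181 | 416 | 404 | 3020 |
|  |  | ABH | 528 | 444 | 377 | 94 | 614 | 143 | 209 | 476 | 462 | 3347 |
|  | aln_37 | csvs | 628 | 543 | 473 | 142 | 769 | 217 | 287 | 604 | 572 | 4235 |
|  |  | select | 516 | 416 | 327 | 80 | 561 | 144 | 177 | 397 | 378 | 2996 |
|  |  | ABH | 545 | 441 | 367 | 96 | 610 | 153 | 203 | 460 | 451 | 3326 |
| Stacks | mem |  | 634 | 534 | 410 | 127 | 735 | 207 | 242 | 551 | 468 | 3908 |
|  | aln |  | 630 | 511 | 430 | 116 | 704 | 218 | 261 | 544 | 457 | 3871 |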
Table 3.

Summary of the genotypes of the linkage maps constructed by in silico analysis

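| In silico analysis | BWA | Construction method | Correction approach | No. markers | Total distance (cM) | A (no. genotypes) | A (%) | H (no. genotypes) | H (%) | B (no. genotypes) | B (%) | Missing (no. genotypes) | Missing (%) | Chi-square test (1:2:1), p |
| RAD-R scripts | mem | csvs |  | 4461 | 298622.6 | 115070 | 26.87 | 189870 | 44.34 | 113910 | 26.60 | 9406 | 2.20 | 0.643 |
|  |  |  | 6US |  | 5059.3 | 96599 | 22.56 | 236505 | 55.23 | 94663 | 22.10 | 489 | 0.11 | 0.571 |
|  |  | select |  | 2492 | 12690.1 | 65139 | 27.23 | 106627 | 44.57 | 64904 | 27.13 | 2562 | 1.07 | 0.616 |
|  |  |  | 10US |  | 1269.0 | 59082 | 24.70 | 121379 | 50.74 | 58679 | 24.53 | 92 | 0.04 | 0.988 |
|  |  | ABH |  | 2790 | 10543 | 72619 | 27.11 | 122057 | 45.57 | 71564 | 26.72 | 1600 | 0.60 | 0.709 |
|  |  |  | 10US |  | 1362.8 | 66413 | 24.80 | 135688 | 50.66 | 65617 | 24.50 | 122 | 0.05 | 0.990 |
|  | mem_60 | csvs |  | 2792 | 25650.2 | 74246 | 27.70 | 115481 | 43.08 | 74051 | 27.63 | 4254 | 1.59 | 0.467 |
|  |  |  | 7US |  | 1410.7 | 66141 | 24.68 | 136027 | 50.75 | 65693 | 24.51 | 171 | 0.06 | 0.988 |
|  |  | select |  | 1954 | 10350 | 51348 | 27.37 | 82770 | 44.12 | 51331 | 27.36 | 2135 | 1.14 | 0.566 |
|  |  |  | 10US |  | 1301.2 | 46526 | 24.80 | 94764 | 50.52 | 46222 | 24.64 | 72 | 0.04 | 0.994 |
|  |  | ABH |  | 2164 | 6239.3 | 57041 | 27.46 | 93151 | 44.84 | 56261 | 27.08 | 1291 | 0.62 | 0.623 |
|  |  |  | 10US |  | 1332.4 | 51981 | 25.02 | 104340 | 50.23 | 51290 | 24.69 | 133 | 0.06 | 0.998 |
|  | aln | csvs |  | 5214 | 308919 | 134674 | 26.91 | 222555 | 44.46 | 132820 | 26.54 | 10495 | 2.10 | 0.653 |
|  |  |  | 5US |  | 5305.9 | 115628 | 23.10 | 271817 | 54.30 | 112550 | 22.49 | 549 | 0.11 | 0.681 |
|  |  | select |  | 3020 | 15127 | 78771 | 27.17 | 130051 | 44.86 | 78191 | 26.97 | 2907 | 1.00 | 0.646 |
|  |  |  | 6US |  | 1370.8 | 71563 | 24.68 | 147459 | 50.86 | 70830 | 24.43 | 68 | 0.02 | 0.984 |
|  |  | ABH |  | 3347 | 9848.8 | 87114 | 27.11 | 147264 | 45.83 | 85141 | 26.50 | 1793 | 0.56 | 0.735 |
|  |  |  | 10US |  | 1493.1 | 79798 | 24.84 | 163164 | 50.78 | 78182 | 24.33 | 168 | 0.05 | 0.985 |
|  | aln_37 | csvs |  | 4235 | 55102.5 | 112032 | 27.56 | 177246 | 43.60 | 111018 | 27.31 | 6264 | 1.54 | 0.524 |
|  |  |  | 5US |  | 2963.1 | 99899 | 24.57 | 207721 | 51.09 | 98701 | 24.28 | 239 | 0.06 | 0.974 |
|  |  | select |  | 2996 | 15275.8 | 78457 | 27.28 | 128288 | 44.60 | 77848 | 27.07 | 3023 | 1.05 | 0.619 |
|  |  |  | 9US |  | 1334.6 | 70995 | 24.68 | 146294 | 50.86 | 70261 | 24.43 | 66 | 0.02 | 0.984 |
|  |  | ABH |  | 3326 | 8485.1 | 87106 | 27.28 | 145175 | 45.47 | 85167 | 26.67 | 1848 | 0.58 | 0.694 |
|  |  |  | 10US |  | 1407.8 | 79488 | 24.89 | 161844 | 50.69 | 77813 | 24.37 | 151 | 0.05 | 0.987 |
| Stacks | mem |  |  | 3908 | 15443.7 | 82644 | 22.03 | 172049 | 45.86 | 81299 | 21.67 | 39176 | 10.44 | 0.566 |
|  | aln |  |  | 3871 | 16706.6 | 82429 | 22.18 | 171188 | 46.07 | 81260 | 21.87 | 36739 | 9.89 | 0.600 |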

QTL mapping of the locus for validation test

To validate the accuracy of the genotype data constructed by in silico analysis, QTL mapping with CIM was carried out for the marginal leaf shape locus located on LG5. For each of the mem and aln datasets, the results of QTL analysis using marker genotype data constructed with the RAD-R scripts and with Stacks were compared. Although all genotype data yielded a major LOD peak at approximately the expected position on LG5, the genotype data constructed with Stacks spanned over 1000 cM (mem) and over 4000 cM (aln) on LG5 alone, so these marker genotype data would not be appropriate for QTL analysis as they stand. Therefore, to find genotype data with more appropriate error correction among the 240 genotype datasets constructed with the RAD-R scripts, the contents of the “Maps_list_(mem and aln).csv” files and the 12 PDF files were checked. The 12 PDF files summarize the genotype images obtained with maxHapLength values from 1 to 10, which makes it easier to judge the appropriate parameter value visually. After the PDF files of the “mem” and “aln” data were reviewed, the corrections “mem, select, 10US” and “aln, select, 6US” were visually judged to be appropriate. The “aln” genotype data gave the most accurate fine-mapping of the marginal leaf shape locus: the locus of leaf marginal serration was located between 251.599 and 253.367 Mbp, an interval of 2.1 cM on LG5, and the genotype of the RAD marker LG5_v8_252.185_Mbp co-segregated completely with the leaf phenotype in the present F2 population. In contrast, with the “aln” data from Stacks, the locus was located between 251.386 and 253.367 Mbp, an interval of 2.6 cM on LG5.
The LG5_v8_252.185_Mbp marker was also present in the “aln” marker genotype data from Stacks, but it had so many missing values that its complete co-segregation with the phenotype data could not be confirmed. Thus, the validation test showed that the genotype data constructed with the RAD-R scripts allowed more accurate mapping than those from Stacks.

Discussion

R was chosen because it is free, highly reliable, and widely used (Broman, R Development Core Team 2017), and because it has a wide range of packages for handling not only statistical processing but also biological data, making it suitable for big nucleotide sequence data (Gentleman). Moreover, owing to its flexibility, R has become a versatile and comprehensive platform for developing analysis pipelines. The RAD-R scripts are a user-friendly R pipeline, implemented with the Excel software, that provides an automated flow: quality control of raw sequence reads, alignment to the reference genome sequence with BWA, error correction with the ABHgenotypeR package, creation of 240 error-corrected genotype datasets, and QTL analysis. With the RAD-R scripts, users without extensive bioinformatics knowledge can easily perform RAD-seq analysis on Windows 10. One of the most important issues in RAD-seq analysis is that data produced by high-throughput sequencing have a considerable error rate. This issue can be addressed by data correction and imputation, which are time-consuming and require specific bioinformatics expertise. Although the linkage maps constructed with Stacks showed high potential, their marker genotype data contained many missing values. Post-correction with specialized software such as TASSEL (Glaubitz) can provide quite accurate genotype data, whereas the RAD-R scripts include post-correction by default, which makes the analysis easier for users. Furthermore, the RAD-R scripts correct genotyping errors, such as unexpected markers in RAD-seq data, with the ABHgenotypeR package to obtain better results in QTL analysis.
The graphical genotype maps created by the RAD-R scripts revealed unexpected markers, such as biallelic markers whose genotype pattern differed from that of their neighboring markers, at multiple genomic positions (Fig. 5). Such unlinked markers made the total distance of the linkage map enormous. In the ABHgenotypeR package, the degree of error correction can be adjusted with the maxHapLength parameter. Although a larger value can reduce the total distance of the linkage map, it may cause excessive correction. Therefore, when selecting marker genotype data for QTL analysis, it is recommended to check the genotype images of the corrected data. By visually checking the 12 PDF files covering the 240 error-corrected genotype datasets, users can easily select appropriate genotype data for QTL analysis. The QTL analysis results for the selected genotype data can then be obtained quickly with the “4. QTL” script using the R/QTL package.
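How maxHapLength trades correction strength against over-correction can be illustrated with a deliberately simplified stand-in for stretch correction. This Python function is not ABHgenotypeR's correctStretches implementation; it only replaces runs of at most max_hap_length identical calls that are flanked on both sides by the same, different call, and it treats missing calls ("-") like any other call.

```python
from typing import List, Sequence


def correct_short_stretches(calls: Sequence[str], max_hap_length: int) -> List[str]:
    """Simplified illustration: overwrite short runs flanked by one identical call."""
    out = list(calls)
    n = len(out)
    i = 0
    while i < n:
        j = i
        while j < n and out[j] == out[i]:
            j += 1
        # out[i:j] is a maximal run of identical calls along the chromosome.
        flanked = i > 0 and j < n and out[i - 1] == out[j]
        if flanked and j - i <= max_hap_length:
            out[i:j] = [out[j]] * (j - i)
        i = j
    return out


print("".join(correct_short_stretches("AAABBAAA", 1)))  # AAABBAAA
print("".join(correct_short_stretches("AAABBAAA", 2)))  # AAAAAAAA
```

With max_hap_length=1 the two-marker B run survives; with 2 it is overwritten. That second outcome removes a genotyping error if the run is spurious, but it could also erase a genuine short recombinant segment, which is why the corrected genotype images are worth checking.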
Fig. 5.

Graphical representations of genotype data constructed by Stacks and RAD-R scripts, and the LOD plot of the leaf marginal serration locus on LG5 by CIM with R/QTL.

By incorporating the chi-square tests into the pipeline, reads with low BWA mapping quality could also be used as markers for linkage map construction. This is based on the view that, for QTL analysis, the segregation ratio in the F2 population is more important than the BWA mapping quality to the reference genome sequence. Thus, RAD-R scripts have great potential for RAD-seq analysis of F2 populations derived from parents whose reads map with low BWA quality, such as wild types. In conclusion, a new tool called RAD-R scripts was developed. This R pipeline can offer fairly reliable genotype data comparable with those from Stacks and can be used by a wide variety of users with limited experience, because it only requires copying the R scripts displayed in Excel cells and pasting them into the R console.

Author Contribution Statement

The RAD-R scripts were developed, the data analysis was performed, and the manuscript was written by KS.
References

1.  R/qtl: QTL mapping in experimental crosses.

Authors:  Karl W Broman; Hao Wu; Saunak Sen; Gary A Churchill
Journal:  Bioinformatics       Date:  2003-05-01       Impact factor: 6.937

2.  Bioconductor: open software development for computational biology and bioinformatics.

Authors:  Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal:  Genome Biol       Date:  2004-09-15       Impact factor: 13.583

3.  Stacks: building and genotyping Loci de novo from short-read sequences.

Authors:  Julian M Catchen; Angel Amores; Paul Hohenlohe; William Cresko; John H Postlethwait
Journal:  G3 (Bethesda)       Date:  2011-08-01       Impact factor: 3.154

4.  Adapting Genotyping-by-Sequencing for Rice F2 Populations.

Authors:  Tomoyuki Furuta; Motoyuki Ashikari; Kshirod K Jena; Kazuyuki Doi; Stefan Reuscher
Journal:  G3 (Bethesda)       Date:  2017-03-10       Impact factor: 3.154

5.  A CIN-like TCP transcription factor (LsTCP4) having retrotransposon insertion associates with a shift from Salinas type to Empire type in crisphead lettuce (Lactuca sativa L.).

Authors:  Kousuke Seki; Kenji Komatsu; Keisuke Tanaka; Masahiro Hiraga; Hiromi Kajiya-Kanegae; Hideo Matsumura; Yuichi Uno
Journal:  Hortic Res       Date:  2020-02-01       Impact factor: 6.793

6.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18       Impact factor: 6.937

7.  Development of genome-wide simple sequence repeat markers using whole-genome shotgun sequences of sorghum (Sorghum bicolor (L.) Moench).

Authors:  Jun-ichi Yonemaru; Tsuyu Ando; Tatsumi Mizubayashi; Shigemitsu Kasuga; Takashi Matsumoto; Masahiro Yano
Journal:  DNA Res       Date:  2009-04-10       Impact factor: 4.458

8.  An evaluation of genotyping by sequencing (GBS) to map the Breviaristatum-e (ari-e) locus in cultivated barley.

Authors:  Hui Liu; Micha Bayer; Arnis Druka; Joanne R Russell; Christine A Hackett; Jesse Poland; Luke Ramsay; Pete E Hedley; Robbie Waugh
Journal:  BMC Genomics       Date:  2014-02-06       Impact factor: 3.969

9.  TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline.

Authors:  Jeffrey C Glaubitz; Terry M Casstevens; Fei Lu; James Harriman; Robert J Elshire; Qi Sun; Edward S Buckler
Journal:  PLoS One       Date:  2014-02-28       Impact factor: 3.240

10.  A high-density SNP Map of sunflower derived from RAD-sequencing facilitating fine-mapping of the rust resistance gene R12.

Authors:  Zahirul I Talukder; Li Gong; Brent S Hulke; Venkatramana Pegadaraju; Qijian Song; Quentin Schultz; Lili Qi
Journal:  PLoS One       Date:  2014-07-11       Impact factor: 3.240
