Highly multiplexed AmpliSeq technology identifies novel variation of flowering time-related genes in soybean (Glycine max).

Eri Ogiso-Tanaka, Takehiko Shimizu, Makita Hajika, Akito Kaga, Masao Ishimoto.

Abstract

Whole-genome re-sequencing is a powerful approach for detecting gene variants, but it is expensive when only target genes need to be analysed. To circumvent this problem, we attempted to detect novel variants of flowering time-related genes and their homologues in a soybean mini-core collection by target re-sequencing using AmpliSeq technology. The average depth of 382 amplicons targeting 29 genes was 1,237, with 99.85% of the sequence data mapped to the reference genome. In total, 461 variants were detected, of which 150 sites were novel and not registered in dbSNP. Known and novel variants were detected in the classical maturity loci E1, E2, E3, and E4. Additionally, large indel alleles, E1-nl and E3-tr, were successfully identified. Novel loss-of-function and missense variants were found in FT2a, MADS-box, WDR61, phytochromes, and two-component response regulators. The multiple regression analysis showed that four genes (E2, E3, Dt1, and a two-component response regulator) can explain 51.1–52.3% of the variation in flowering time of the mini-core collection. Among them, the two-component response regulator with a premature stop codon is a novel gene that has not been reported as a soybean flowering time-related gene. These data suggest that the AmpliSeq technology is a powerful tool to identify novel alleles.
© The Author(s) 2019. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

Keywords:  AmpliSeq; flowering time-related gene; genotyping; soybean; target re-sequencing

Year:  2019        PMID: 31231761      PMCID: PMC6589554          DOI: 10.1093/dnares/dsz005

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


1. Introduction

Flowering time is critical for successful seed production by plants. Flowering time and maturity are the most important traits determining the adaptability of soybean [Glycine max (L.) Merr.] cultivation. These traits not only restrict the cultivation area but also greatly affect plant architecture and yield. Therefore, it is necessary to clarify the genetic factors affecting flowering time and maturity and to control them using a combination of alleles with different genetic effects on flowering time. To combine such alleles freely based on DNA marker-assisted selection, a catalogue of alleles for breeding materials will be necessary. Soybean is a typical short-day plant. Several functional nucleotide polymorphisms responsible for diversity in flowering time among cultivars are already known. Classical maturity loci designated as E loci have been characterized, including E1, E2, E3, E4, E7, E8, E9, and E10. Of these, E1, E2, E3, E4, and E9 have been isolated as flowering time-related genes. E1 encodes a putative transcription factor containing a plant-specific B3 domain. E2 encodes a homologue of GIGANTEA. E3 and E4 encode homologues of the photoreceptor phytochrome A (PHYA). E9 encodes the florigen protein FT2a. In the E1 gene, three alleles, namely e1-as (=e1 designated by Bernard), e1-fs, and e1-nl, have been reported to confer an early flowering phenotype under long-day conditions. The e1-as allele has a single missense mutation (Arg15Thr) in the coding region. The e1-as genotype advances flowering by ∼10 days compared with the E1 genotype under natural day-length conditions at Matsudo, Japan (35°78′N, 139°90′E). The e1-fs allele has a 1-bp deletion, resulting in a premature stop codon in the cultivar Sakamotowase. e1-nl is a null allele in which ∼142 kb, including the entire E1 gene, is deleted in some early flowering cultivars. In contrast, only one e2 allele has been reported in the E2 gene.
The e2 allele has a premature stop codon caused by a single nucleotide polymorphism (SNP) in the 10th exon. The e2 genotype advances flowering by ∼9 days under natural day-length conditions at Tsukuba, Japan (36°03′N, 140°04′E). In the E3 gene, e3-Mo, e3-fs, e3-tr (=e3 designated by Buzzell), and e3-ns have been reported as nonfunctional alleles. The e3-Mo allele has an SNP causing a non-synonymous amino acid substitution (G1050R) in the third exon. The e3-fs allele has a single-base insertion in the exon, resulting in a frameshift mutation. The e3-ns allele has a nonsense mutation in which a single nucleotide substitution in exon 3 creates a stop codon. The e3-tr allele lacks a 13.33-kb genomic region including a part of exons 3 and 4. These nonfunctional alleles promote flowering under long-day conditions. In addition, E3-Mi and E3-Ha have been reported as functional alleles. The E3-Mi allele has a 2.633-kb deletion in the third intron. As for the E4 gene, five nonfunctional alleles, viz., e4-SORE-1, e4-kam, e4-kes, e4-oto, and e4-tsu, have been reported. The e4-SORE-1 (=e4 designated by Buzzell and Voldeng) allele has a Ty1/copia-like retrotransposon (SORE-1) insertion in the first exon, resulting in a nonfunctional allele. These E1–E4 genes can cause variation in flowering time by controlling the expression of the FLOWERING LOCUS T (FT) genes FT2a and FT5a. The florigen protein FT2a is encoded by E9. The e9 allele has a SORE-1 insertion in the first intron. Although eight SNPs and six InDels in E9 have been reported, their influence on gene function is unknown. The FT5a gene was identified as qDTF-J, which advances flowering by ∼5 days under natural day-length conditions at Hokkaido, Japan (43°07′N, 141°35′E). In the FT5a gene, 13 SNPs and 3 InDels have been reported, only in the promoter and untranslated regions (UTRs), in 439 cultivated and wild soybean accessions.
The functional nucleotide polymorphisms of the four E genes (E1–E4) are useful for predicting flowering time and could explain ∼62–66% of the phenotypic variation in flowering time among 63 Japanese accessions under long-day conditions. However, prediction of flowering time will be difficult if the breeding materials carry unknown alleles affecting the flowering phenotype. Therefore, a sequencing system that can easily capture as many alleles as possible is required. Recently, the development of next-generation sequencing (NGS) technologies has made it possible to obtain whole-genome information easily. However, re-sequencing large genomes is still expensive. Moreover, analytical and storage environments are needed to deal with the enormous amounts of whole-genome sequence data from genetic resources. Target re-sequencing is an alternative method that obtains sequence data for a limited region, which minimizes the cost and time of data analysis and reduces data storage. The AmpliSeq technology (Thermo Fisher Scientific, Waltham, MA, USA) is one of the target re-sequencing technologies, a multiplex polymerase chain reaction (PCR)-based assay targeting regions of interest. The AmpliSeq Designer designs primer sets that amplify PCR products ranging from 75 to 375 bp in the target region, and the multiplex-PCR products are sequenced by NGS. The method enables amplification of ∼6,000 amplicons by ultra-high multiplex PCR and constructs a targeted sequencing library in 10 h. For routine genotyping in crop breeding, NGS-based techniques need to meet several criteria. The processing time between sample collection and interpretation of sequencing results should be short. Furthermore, it must be possible to construct libraries from a limited amount of input DNA, including partially degraded DNA samples, and the read depth must be deep enough to detect variants accurately.
The Ion Torrent platform, in combination with the AmpliSeq multiplex PCR, can use a DNA input as low as 10 ng, and the process from sample collection to sequence analysis can be finished within 5 days. The AmpliSeq technology is frequently used for studying human inherited cancer, but it can also be applied to plant and agronomic research. In this study, we applied AmpliSeq technology to clarify the alleles of flowering time-related genes and their homologues in diverse soybean germplasm and to identify novel and known variations associated with flowering time.

2. Materials and methods

2.1. Plant materials and DNA extraction

DNA was extracted from 192 accessions of a soybean mini-core collection, provided by Genebank, NARO (Supplementary Table S1). Of these, 122 accessions were sown and germinated in plastic pots on rock wool material ‘grodan’ (Nittobo, Tokyo, Japan) moistened with water. After 10 days under 12 h light/12 h dark conditions at 25°C, the first leaf was collected in a 2-ml tube. The leaf tissue of 38 samples was ground in liquid nitrogen and CTAB buffer, and then immediately used for DNA extraction manually. The remaining 84 samples were dried in a freeze dryer (FDU-2100, EYELA, Tokyo, Japan). These samples were lyophilized at −80°C for 12 h and stored at 4°C. The dried leaves were crushed using a ShakeMaster (Bio Medical Science Inc. Tokyo, Japan) and the leaf powder was used to extract DNA. DNA from 44 samples was extracted using the CTAB DNA extraction kit (NR-502, KURABO, Osaka, Japan) and DNA extraction robot PE-480 (GENE PREP STAR, KURABO). DNA from another 40 samples was extracted by the bead-based method of the BioSprint 96 DNA Plant Kit on robotic workstation (QIAGEN, Hilden, Germany). From the other 70 samples, DNA was extracted from the seed tissue using the BioSprint 96 DNA Plant Kit on robotic workstation, according to the manufacturer’s instruction (QIAGEN). The seed tissue samples were obtained by scraping dried seed and crushing using Zirconia beads and TissueLyser II (QIAGEN). The quality of extracted DNA was evaluated based on the DNA integrity number, which is an index showing the fragmentation degree of DNA using TapeStation (Agilent Technologies, Santa Clara, CA, USA). The DNA concentration was measured using the Qubit Fluorometer (Thermo Fisher Scientific) by exciting at 485 nm and measuring the fluorescence intensity at 520 nm. The instrument was calibrated with the Quant-iT dsDNA BR Assay kit (Thermo Fisher Scientific), according to the manufacturer’s instructions.

2.2. Ion AmpliSeq custom panel design

A custom panel targeting 29 genes was designed based on the soybean reference genome version 1.1 with the Ion AmpliSeq Designer tool version 1.2.9, using the standard DNA option (125–275 bp amplicon target sizes). Two primer pools were designed to amplify 382 amplicons, covering 29 target genes of total length 64.98 kb (Table 1 and Supplementary Tables S2 and S3). These included the coding regions of B3 domain-containing genes (E1 and homologues), Phytochrome A genes (E3 and E4), Phytochrome B genes, FT/TERMINAL FLOWER 1 (TFL1) family genes (including FT2a, FT5a, and Dt1), two-component response regulator-like genes, a MADS box gene, a WD repeat-containing gene (WDR61), an Achaete-scute transcription factor gene, and a part of the exon of GIGANTEA (E2).
Table 1

Genomic regions or SNP targeted by the AmpliSeq design

Gene ID, Gmax275 (ver. 2.0) | Chr. | Start | End | Gene ID, Gmax189 (ver. 1.1) | Chr. | Start | End | Description (a) | Gene name (b) | Target | Total designed target length of amplicons (c) | Target size (d) | Number of amplicons (e) | Coverage (f) (%) | Reference
Glyma.02G069500 | 2 | 6116379 | 6117379 | Glyma02g07650 | 2 | 6041770 | 6042409 | FLOWERING LOCUS T | GmFTL7 | Exon + UTR | 861 | 639 | 6 | 100.0 | 37, 38
Glyma.03G194700 | 3 | 40522597 | 40525110 | Glyma03g35250 | 3 | 42534078 | 42535865 | TERMINAL FLOWER 1 | GmTFL2 | Exon + UTR | 1,577 | 1,003 | 9 | 100.0 | 37, 38
Glyma.03G227300 | 3 | 42918771 | 42923401 | Glyma03g38620 | 3 | 44925730 | 44930360 | Phytochrome A | GmPHYA4 | Exon + UTR | 3,688 | 2,876 | 20 | 98.6 |
Glyma.04G143300 (g) | 4 | 26120011 | 26120532 | Glyma18g22670 | 18 | 25739929 | 25740831 | B3 domain-containing protein (h) | E1Lb | Exon + UTR | 1,037 | 902 | 5 | 100.0 | 11, 54
Glyma.04G156400 | 4 | 36758125 | 36758770 | Glyma04g24640 | 4 | 28293933 | 28294806 | B3 domain-containing protein (h) | E1La | Exon + UTR | 967 | 873 | 5 | 100.0 | 11, 54
Glyma.06G207800 | 6 | 20207077 | 20207940 | Glyma06g23026 | 6 | 20006928 | 20007814 | B3 domain-containing protein (h) | E1 | Exon + UTR | 1,059 | 886 | 6 | 100.0 | 3, 4, 11
Glyma.08G363100 | 8 | 47458142 | 47459829 | Glyma08g47810 | 8 | 46606934 | 46608654 | FLOWERING LOCUS T | GmFT4 | Exon + UTR | 1,300 | 746 | 8 | 100.0 | 20
Glyma.08G363200 | 8 | 47472881 | 47473362 | Glyma08g47823 | 8 | 46621704 | 46622185 | FLOWERING LOCUS T | GmFT6 | Exon + UTR | 612 | 265 | 3 | 100.0 | 37, 38
Glyma.09G035500 | 9 | 2960395 | 2967229 | Glyma09g03990 | 9 | 2919887 | 2926740 | Phytochrome B | GmPHYB1 | Exon + UTR | 5,150 | 4,336 | 27 | 99.7 | 32
Glyma.09G143500 | 9 | 35652219 | 35653967 | Glyma09g26550 | 9 | 33049107 | 33050904 | BROTHER OF FT AND TFL 1 | GmTFL4 | Gene | 1,813 | 1,122 | 11 | 98.7 | 37
Glyma.10G141400 | 10 | 37489560 | 37495624 | Glyma10g28170 | 10 | 36962521 | 36968813 | Phytochrome A | | Exon + UTR | 5,212 | 4,065 | 29 | 94.6 | 14
Glyma.10G221500 | 10 | 45294735 | 45316121 | Glyma10g36600 | 10 | 44732730 | | GIGANTEA | E2 | SNP | 222 | 1 | 1 | 100.0 | 3, 4, 12
Glyma.12G073900 | 12 | 5508365 | 5522772 | Glyma12g07861 | 12 | 5496565 | 5511828 | Two-component response regulator-like | | Exon + UTR | 4,116 | 2,871 | 24 | 100.0 |
Glyma.15G140000 | 15 | 11435551 | 11442683 | Glyma15g14980 | 15 | 11415495 | 11422656 | Phytochrome B | | Exon + UTR | 5,445 | 4,542 | 28 | 98.1 | 32
Glyma.16G044100 | 16 | 4135885 | 4137742 | Glyma16g04830 | 16 | 4115033 | 4116923 | FLOWERING LOCUS T | GmFT5a/GmFTL4 | Exon + UTR | 1,683 | 1,109 | 9 | 98.0 | 15, 19, 37
Glyma.16G044200 | 16 | 4162525 | 4164824 | Glyma16g04840 | 16 | 4141774 | 4144073 | FLOWERING LOCUS T | GmFT3a/GmFTL1 | Exon + UTR | 1,031 | 486 | 6 | 92.7 | 15, 37
Glyma.16G150700 | 16 | 31109999 | 31114963 | Glyma16g26660 | 16 | 30741660 | 30746677 | FLOWERING LOCUS T | GmFT2a/GmFTL3 | Exon + UTR | 1,515 | 935 | 9 | 99.4 | 9, 15, 18, 33, 37
Glyma.16G151000 | 16 | 31148829 | 31151842 | Glyma16g26690 | 16 | 30780496 | 30783509 | FLOWERING LOCUS T | GmFT2b/GmFTL5 | Exon + UTR | 860 | 464 | 5 | 88.0 | 15, 37
Glyma.16G196300 | 16 | 35777815 | 35779317 | Glyma16g32080 | 16 | 35274147 | 35275762 | BROTHER OF FT AND TFL 1 | GmTFL3 | Exon + UTR | 1,554 | 1,038 | 6 | 99.0 | 37
Glyma.16G200700 | 16 | 36179891 | 36187469 | Glyma16g32540 | 16 | 35676581 | 35684221 | MADS box protein | | Exon + UTR | 2,534 | 1,095 | 15 | 98.8 | 48, 49
Glyma.17G052100 | 17 | 3955518 | 3958432 | Glyma17g05990 | 17 | 4225839 | 4228888 | WD repeat-containing protein 61 | WDR61 | Exon + UTR | 1,743 | 1,318 | 9 | 100.0 | 50
Glyma.17G090500 | 17 | 7052506 | 7053858 | Glyma17g09810 | 17 | 7317271 | 7318756 | Achaete-scute transcription factor-related (i) | | Exon + UTR | 1,594 | 1,166 | 9 | 100.0 |
Glyma.19G108100 | 19 | 36030632 | 36032867 | Glyma19g28390 | 19 | 35849981 | 35852216 | FLOWERING LOCUS T | GmFT3b/GmFTL2 | Exon + UTR | 952 | 463 | 5 | 100.0 | 15, 37
Glyma.19G108200 | 19 | 36049111 | 36051851 | Glyma19g28400 | 19 | 35868460 | 35871203 | FLOWERING LOCUS T | GmFT5b/GmFTL6 | Exon + UTR | 1,113 | 560 | 6 | 90.0 | 15, 37
Glyma.19G194300 | 19 | 45183357 | 45185175 | Glyma19g37890 | 19 | 44979743 | 44981657 | TERMINAL FLOWER 1 | Dt1/GmTFL1 | Exon + UTR | 1,411 | 1,078 | 8 | 100.0 | 37, 53
Glyma.19G224200 | 19 | 47633059 | 47641958 | Glyma19g41210 | 19 | 47511095 | 47520052 | Phytochrome A | E3 | Exon + UTR | 5,235 | 4,400 | 27 | 98.6 | 3
Glyma.19G260400 | 19 | 50364718 | 50369677 | Glyma19g44970 | 19 | 50244046 | 50249070 | Pseudo-response regulator 5 | | Gene | 5,673 | 4,941 | 31 | 98.3 |
Glyma.20G090000 | 20 | 33236018 | 33241692 | Glyma20g22160 | 20 | 32087412 | 32093306 | Phytochrome A | E4 | Exon + UTR | 4,871 | 4,076 | 25 | 99.2 | 3, 14
Glyma.U034500 (g) | Scaffold_32 | 197150 | 220019 | Glyma11g15580 | 11 | 11232271 | 11255186 | Two-component response regulator-like | | Exon + UTR | 5,352 | 3,833 | 30 | 98.1 |

(a) Gene description from Phytozome 12.

(b) Gene name refers to Kong et al., Wu et al., Fan et al., and Cao et al.

(c) Total size of amplified region by designed amplicon primers.

(d) Target size (bp) of amplicon based on soybean genome version 1.1 (Gmax189).

(e) Total number of designed amplicons on target gene.

(f) Percentage of target region covered by amplicon.

(g) Glyma.04G143300 and Glyma.U034500 on Gmax275 genome version were different chromosome positions on Gmax189.

(h) The gene annotation manually curated.

(i) ‘Achaete-scute transcription factor related’ gene was included as control for variant detection.

2.3. Library preparation and sequencing

The Qubit dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific) was used to quantify DNA for NGS library construction. The NGS library was constructed using the Ion AmpliSeq Library Kit 2.0 (Thermo Fisher Scientific), according to the manufacturer’s protocol (Japanese version corresponding to Manual 2014.7 rev.B.0 version: https://assets.thermofisher.com/TFS-Assets/LSG/manuals/MAN0013432_Ion_AmpliSeq_Library_Prep_on_Ion_Chef_UG.pdf, 25 March 2019, date last accessed). For the multiplex-PCR amplification, 1–10 ng of each DNA was amplified using one primer pool (191 amplicon primer pairs) per reaction. Each reaction contained 4 µl of 5× Ion AmpliSeq HiFi Master Mix, 10 µl of 2× AmpliSeq Custom Primer Pool, and 1–10 ng of DNA, and the volume was made up to 20 µl with nuclease-free water. The reaction mix was heated for 2 min at 99°C for enzyme activation, followed by 18 two-step cycles at 99°C for 15 s and at 60°C for 4 min, and ending with a holding period at 10°C. Low-quality DNA samples were amplified for 21 cycles under similar conditions. The primers of the amplicons were digested and phosphorylated for adapter ligation using 2 µl of FuPa enzyme per sample at 55°C for 10 min, followed by enzyme inactivation at 60°C for 20 min. To enable multiple libraries to be loaded per chip, 2 µl of a unique diluted mix, including Ion Xpress Barcode and Ion P1 Adapters at standard volumes, was ligated to the ends of the digested amplicons using 2 µl of DNA ligase at 22°C for 30 min, followed by ligase inactivation for 10 min at 72°C. The resulting un-amplified adapter-ligated library was purified using 45 µl of Agencourt AMPure XP Reagent (Beckman Coulter, Brea, CA, USA), followed by washing with 150 µl of freshly prepared 70% ethanol.
After purification, 50 µl of Platinum PCR SuperMix High Fidelity and 2 µl of Library Amplification Primer Mix of the Ion AmpliSeq Library Kit 2.0 were added to the dried AMPure XP beads, and then the reaction plate was placed on a magnetic rack to separate the beads from the supernatant. The amplicon library in the supernatant was further amplified to enrich the material for accurate quantification at 98°C for 2 min, followed by five two-step cycles at 98°C for 15 s and at 60°C for 1 min. The amplified amplicon library was then purified using 25 µl of AMPure XP, followed by a second purification step with 60 µl of AMPure XP and 150 µl of freshly prepared 70% ethanol. The concentration and size distribution of amplicons in the library were then determined using an Agilent BioAnalyzer DNA High-Sensitivity chip or TapeStation 4200 D1000 chip (Agilent Technologies), according to the instruction of the manufacturer. After quantification, each library was diluted to a concentration of 100 pM prior to template preparation. Subsequently, the libraries were pooled in equimolar amounts prior to further processing. Emulsion PCR, emulsion breaking, and enrichment for template preparation of ion sphere particles were performed using the Ion 520 & 530 and 540 Kit-Chef (Thermo Fisher Scientific) according to the instruction of the manufacturer. After the preparation of ion sphere particles, sequencing was performed with an Ion Torrent Ion S5 or S5XL system using Ion 520 and 540 Chip (Thermo Fisher Scientific), according to the instruction of the manufacturer.
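Diluting each library to 100 pM requires converting the measured mass concentration into molarity. A minimal sketch of that arithmetic is given here, assuming an average mass of about 660 g/mol per base pair of double-stranded DNA. The function names and the example values (2.0 ng/µl, 300 bp) are hypothetical illustrations and are not taken from this study.

```python
def library_molarity_pm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration (ng/µl) to picomolar.

    Assumes ~660 g/mol per base pair (a common approximation).
    """
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    grams_per_mole = 660.0 * mean_fragment_bp
    grams_per_litre = conc_ng_per_ul * 1e-3  # 1 ng/µl equals 1e-3 g/l
    return grams_per_litre / grams_per_mole * 1e12  # mol/l -> pmol/l


def dilution_factor(stock_pm, target_pm=100.0):
    """Fold dilution needed to bring a stock library down to target_pm."""
    if stock_pm < target_pm:
        raise ValueError("stock is already below the target concentration")
    return stock_pm / target_pm


if __name__ == "__main__":
    # Hypothetical library: 2.0 ng/µl with a mean fragment length of 300 bp.
    stock = library_molarity_pm(2.0, 300.0)
    print(f"stock: {stock:.0f} pM, dilute {dilution_factor(stock):.1f}-fold")
```

The mean fragment length would come from the BioAnalyzer or TapeStation trace, since the adapter-ligated library is longer than the bare amplicons.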

2.4. Data analysis

The Ion S5/S5XL sequence data were mapped to the soybean genome reference version 2.0 (Gmax275: https://genome.jgi.doe.gov/portal/pages/dynamicOrganismDownload.jsf?%20organism=Phytozome, 25 March 2019, date last accessed) using Ion Torrent Suite version 5.2.1 software. In typical genome databases of soybean (Williams 82), such as Phytozome and SoyBase, Gmax275 is widely used instead of Gmax189. The assembly size and number of predicted protein-coding loci of Gmax275 are 978 Mb and 56,044, which are higher than the 969.6 Mb and 46,430 of Gmax189, respectively. In the Gmax275 assembly, several genes are located on chromosomes/scaffolds different from those of Gmax189. For example, 238 genes on the chromosomes of Gmax189 are located on the scaffolds of Gmax275, whereas 100 genes on the scaffolds of Gmax189 are located on the chromosomes of Gmax275 (http://www.soybase.org/correspondence/methods.txt, 25 March 2019, date last accessed). In this study, one of the target genes, Glyma18g22670 (B3 domain-containing protein), on chromosome 18 of Gmax189 was located on chromosome 4 (Glyma.04G143300) in Gmax275. However, the structure of all target genes used in this study was the same between Gmax189 and Gmax275. Ion Torrent Suite software is optimized for Ion Torrent raw data analysis: alignment used the Torrent Mapping Alignment Program (TMAP) version 5.2.25 and Coverage Analysis v.5.8.0.8, and variant calling used Torrent Variant Caller version 5.2.1.38 and plug-in version 5.2.25. To evaluate the PCR amplification efficiency of each amplicon, we calculated the amplicons per 100k reads mapped (APKM), that is, the read count of an amplicon scaled by the total number of amplicon reads sequenced, N, per 100k reads, as follows:

$$\mathrm{APKM}_i = \frac{X_i}{N} \times 10^{5}$$

where X_i represents the read coverage of target amplicon i. Variant calling was performed using the default (low stringency) and custom parameters (Supplementary Table S4).
All accessions used in this study were propagated by the single seed descent method; therefore, in theory, all variants should be detected as homozygous. The parameter of TMAP in Torrent Variant Caller was changed to loosen the criterion for calling a homozygous genotype by changing ‘snp_min_allele_freq’ from 0.15 (default) to 0.3. Under this condition, SNPs with an allele frequency of >70% were called homozygous. InDels were called homozygous when the allele frequency was >75% (default parameter). Sequence variants detected as heterozygous under these conditions were excluded. The vcf files obtained were annotated and filtered using snpEff version 4.0e.
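The two calculations described in this section can be sketched as follows. This is an illustration under stated assumptions, not the Torrent Suite implementation: APKM is taken as the read count of an amplicon divided by the total amplicon reads, times 100,000, and the homozygous cut-offs (>70% for SNPs, >75% for InDels) follow the text. The function names, the lower bound used for InDels (0.25), and the example numbers are hypothetical.

```python
SNP_HOM_CUTOFF = 0.70    # from the text: SNP allele frequency > 70%
INDEL_HOM_CUTOFF = 0.75  # from the text: InDel allele frequency > 75%
SNP_MIN_FREQ = 0.30      # 'snp_min_allele_freq' as set in this study
INDEL_MIN_FREQ = 0.25    # assumption for illustration only


def apkm(read_counts):
    """Amplicons per 100k reads mapped for each amplicon in a sample."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no mapped amplicon reads")
    return {amp: count / total * 100_000 for amp, count in read_counts.items()}


def classify_call(alt_allele_freq, is_indel=False):
    """Label a variant site by its alternative allele frequency."""
    hom_cutoff = INDEL_HOM_CUTOFF if is_indel else SNP_HOM_CUTOFF
    min_freq = INDEL_MIN_FREQ if is_indel else SNP_MIN_FREQ
    if alt_allele_freq > hom_cutoff:
        return "homozygous"
    if alt_allele_freq >= min_freq:
        return "heterozygous"  # excluded from downstream analysis
    return "reference"


if __name__ == "__main__":
    print(apkm({"amplicon_a": 300, "amplicon_b": 100}))
    print(classify_call(0.72), classify_call(0.72, is_indel=True))
```

Because the accessions are inbred lines, sites labelled heterozygous are treated as unreliable and dropped rather than reported.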

2.5. Detection of known and novel SNPs and InDels

The known alleles E1, E2, E3, and E4 were investigated as described above. The polymorphism information from the Single Nucleotide Polymorphism Database (dbSNP: https://www.ncbi.nlm.nih.gov/snp, 25 March 2019, date last accessed, data downloaded on 29 April 2016) was used to investigate whether the detected polymorphism has already been identified in the whole-genome sequence of soybean. As the position of soybean genome in the National Center for Biotechnology Information (NCBI) and Phytozome v12.1 are not consistent, we converted the position of SNP in the dbSNP from the NCBI to that of Gmax275 of Phytozome v12.1. The SNP ID number (rs; refSNP cluster) was used for the SNP name.

2.6. Validation of SNPs and InDels

The detected variants of the Phytochrome A (E3, E4) and FT genes (FT2a and FT5a) were further confirmed by Sanger sequencing. The exon containing the novel variants was amplified by PCR using the primers shown in Supplementary Table S5. The PCR product was purified using Affymetrix ExoSap-IT reagent (ExoSap-IT, USB Corporation, Staufen, Germany) and directly sequenced for both sense and antisense strands using Big Dye Terminator version 3.1 (Applied Biosystems, Foster City, CA, USA) in an ABI 3500 Genetic Analyzer (Applied Biosystems), according to the manufacturer’s protocol. The sequences were analysed using GENETYX software version 10.0.8 (GENETYX Corp., Japan).

2.7. Gene-based multiple regression association testing for flowering time

Flowering time was evaluated from 2011 to 2013 at the National Institute of Crop Science (36°02′N, 140°11′E), Tsukuba, Japan. Seeds were sown on 12 July 2011, and 10 July 2012 and 2013. A starter fertilizer containing 3, 10, and 10 g m−2 of N, P2O5, and K2O, respectively, was applied. Each accession was planted in a single-row plot. Rows were spaced 0.7 m apart, and each plot comprised 12 plants spaced 0.13 m apart. The average days to flowering in each plot were used for analysis. Association between days to flowering and each polymorphic SNP/InDel was assessed using linear regression, where the simulated trait values across the 190 individuals were regressed onto the numeric code of each SNP and InDel genotype; this tested the null hypothesis of no additive allelic effect on the trait. Regression analyses were performed using ‘lm()’ in R. First, simple linear regression analysis was performed to assess the influence of each detected variant on flowering time at the significance level of P < 0.05. Subsequently, multiple linear regression analysis was performed using the significant representative variants, after removing redundant variants, at the significance level of P < 0.05.
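The single-variant step of this analysis can be sketched as below. The study used ‘lm()’ in R; this Python version is only an illustration of ordinary least squares on a numeric genotype code. The 0/1 coding of the two homozygous genotypes and the six data points are assumptions made for the example, and the significance test (P < 0.05) that ‘lm()’ reports is not reproduced here.

```python
def simple_linear_regression(genotype_codes, days_to_flowering):
    """Ordinary least squares of trait on genotype; returns (slope, intercept, r2)."""
    n = len(genotype_codes)
    if n != len(days_to_flowering) or n < 3:
        raise ValueError("need two equal-length vectors with at least 3 values")
    mean_x = sum(genotype_codes) / n
    mean_y = sum(days_to_flowering) / n
    sxx = sum((x - mean_x) ** 2 for x in genotype_codes)
    syy = sum((y - mean_y) ** 2 for y in days_to_flowering)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(genotype_codes, days_to_flowering))
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    slope = sxy / sxx                      # additive effect per allele code unit
    intercept = mean_y - slope * mean_x    # mean trait of the genotype coded 0
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2


if __name__ == "__main__":
    # Hypothetical data: 0 = reference homozygote, 1 = alternative homozygote.
    codes = [0, 0, 0, 1, 1, 1]
    days = [50.0, 52.0, 54.0, 40.0, 42.0, 44.0]
    slope, intercept, r2 = simple_linear_regression(codes, days)
    print(f"effect = {slope:.1f} days, intercept = {intercept:.1f}, R2 = {r2:.3f}")
```

The multiple regression step would add one such genotype column per retained variant and fit them jointly, which is what the reported 51.1–52.3% of explained variation refers to.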

3. Results and discussion

3.1. Amplicon design and comparison of library quality using DNA samples derived from the leaf and seed

To evaluate the performance of AmpliSeq, we focused on the gene regions of 29 genes (Table 1) selected from known genes related to flowering time and their homologues in soybean. A total of 382 amplicon primer pairs in two primer pools (Supplementary Table S3) were designed for the 64.98-kb target region using the AmpliSeq Designer tool. These primer pairs covered 98.4% of the target region, ranging from 89.8% to 100% per gene, by overlapping PCR products of total length 70,180 bp (Supplementary Table S2). The average amplicon size including the primer region was 237.1 bp, ranging from 125 to 275 bp (target region was 65–232 bp). The target gene with the lowest coverage (89.8%) was Glyma.16G044200 (FT-like gene). We examined the DNA quality necessary for AmpliSeq library construction because genotyping for marker-assisted selection is commonly performed using low-quality DNA, especially DNA derived from soybean seed. Low-quality DNA derived from the seed was obtained at concentrations of 0.5–3 ng/µl, whereas high-quality DNA from the leaf was obtained at concentrations of 30–50 ng/µl (Fig. 1A). We used 1–10 ng of seed-derived DNA and 10 ng of leaf-derived DNA for preparing the AmpliSeq libraries. To confirm whether low-quality DNA can produce a library of sufficient quality, the size distribution of amplicons in the libraries prepared using low-quality DNA was compared with that of libraries prepared using high-quality DNA, which is recommended for sequencing, on the Agilent 2100 Bioanalyzer or TapeStation 4200 (Fig. 1A and B). No difference was observed between low-quality DNA from the seed and high-quality DNA from the leaf in the size range of amplicons (130–370 bp) or maximum peak amplitude (Fig. 1B and C). These results reveal that an AmpliSeq library of sufficient yield and quality can be prepared from low-quality DNA. We then prepared a sufficient amount of library using DNA derived from the leaf or seed of the soybean mini-core collection.
Figure 1

Evaluation of quality of DNA and AmpliSeq library prepared from the DNA using the Agilent 2200 TapeStation system. The AmpliSeq libraries were evaluated using the D1000 screen tape. (A) Quality of the DNA derived from the leaf and seeds. W and E indicate Williams 82 and Enrei, respectively. The numerical assessment of DNA quality ranged from 1 to 10 based on the DNA integrity number (DIN). A high DIN indicates highly intact DNA, whereas a low DIN indicates degraded DNA. (B) Distribution of amplicons in the AmpliSeq library shown as a gel image. (C) Electropherogram of the same AmpliSeq library as shown in (B). Lower (25 bp) and upper (1,500 bp) peak are the standard markers. The middle peak indicates the library.

3.2. Performance of NGS and uniformity of amplicon coverage

Among the 105,761,267 reads obtained, 105,603,249 (99.85%) were mapped to the Williams 82 reference genome Gmax275 using TMAP, and the average read depth across the target region was 1,237× (Supplementary Table S6). According to the on-target rate, 94.12% of the reads were mapped to the targeted regions. The average read length, average on-target rate, and uniformity (percent of reads >0.2× of mean coverage in the sample) of the leaf and seed samples were similar, but a few seed samples showed lower average read length and uniformity (Fig. 2A). The highly fragmented DNA samples showed low amplification of long amplicons (>200 bp) and low uniformity (Supplementary Fig. S1A and B). The low average amplicon length or low uniformity of these samples might be caused by DNA fragmentation or contamination of the DNA solution.
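The run metrics quoted in this paragraph are simple ratios, sketched below as an illustration. Uniformity is computed here following the Figure 2 definition (percent of positions covered by at least 0.2× the mean depth); the function names and the small depth list are hypothetical, while the two read totals are those reported in the text.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def uniformity(depths, fraction_of_mean=0.2):
    """Percent of positions with depth >= fraction_of_mean x mean depth."""
    if not depths:
        raise ValueError("no depth values supplied")
    mean_depth = sum(depths) / len(depths)
    threshold = fraction_of_mean * mean_depth
    covered = sum(1 for d in depths if d >= threshold)
    return percent(covered, len(depths))


if __name__ == "__main__":
    # Mapping rate from the totals reported in the text.
    print(f"mapped: {percent(105_603_249, 105_761_267):.2f}%")
    # Hypothetical per-base depths for one sample.
    print(f"uniformity: {uniformity([100, 100, 100, 10]):.1f}%")
```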
Figure 2

Comparison of sequence performance metrics classified by the plant materials used for DNA extraction. (A) The box plot of average read length, on-target rate, and uniformity. The on-target rate is on-target percent of the aligned reads. Uniformity is the percent of bases in all the amplicon-targeted regions covered by at least 0.2× the mean base read depth. (B) Average normalized reads (APKM) per sample across 382 amplicons generated from the 192 mini-core collection. The X and Y axes indicate 382 amplicons sorted in their read coverage and APKM shown as the mean on a log scale, respectively.

To compare the efficiency of PCR amplification of each amplicon, the APKM was calculated for each amplicon. The magnitude of APKM was similar irrespective of whether the DNA sample was derived from the leaf or seed (Supplementary Fig. S2A). In addition, there was no relationship between the APKM and read length of amplicons (Supplementary Fig. S2B). The APKM ranged from 0 to 3,333. Only one amplicon (AMPL1040290) had zero reads. As the primer pair of AMPL1040290 was designed for the region flanking a TA-repeat microsatellite, this amplicon is difficult to amplify by multiplex PCR. We could amplify amplicons of 400–450 bp using the primer pair of AMPL1040290 alone. We evaluated known alleles as an example to verify whether the reads were correctly mapped. Of the two alleles with a large deletion at the E3 gene on Chr19, e3-Mo and e3-tr, the read mapping status of the e3-tr allele, with a 15-kb deletion including the fourth exon, was examined using four designed amplicons, viz., AMPL1040313, AMPL1040314, AMPL1037722, and AMPL1036854 (Fig. 3). The mapped reads from Williams 82 covered the entire fourth exon by the four amplicons (Fig. 3A). In contrast, reads from PGC010 were mapped to a part of the fourth exon, although no reads should map there because the fourth exon is absent in PGC010.
When we examined the sequences of the mapped reads, they were found to contain several polymorphic sites originating from another region (Supplementary Fig. S3A). The primer pair of AMPL1040313 was found to have a sequence similar (2- and 1-bp mismatches in the forward and reverse primers, respectively) to that of the E3 homologous gene on Chr3 (Supplementary Fig. S3B). A comparison of the sequence of the original amplicon designed for the E3 gene on Chr19 with that designed for the phyA gene on Chr3 revealed that the amplicon of phyA could be mapped preferentially to E3 in the absence of sequence information (Supplementary Fig. S3C). To preferentially output alignments containing indels, we changed the scoring parameters by passing the option ‘-A 10 -M 60 -O 50 -E 1’ to TMAP. The default values of these TMAP options are as follows: ‘-A’, score for a match [default = 1]; ‘-M’, the mismatch penalty [3]; ‘-O’, the indel start penalty [5]; and ‘-E’, the indel extension penalty [2]. By increasing the match score, the mismatch penalty, and the indel start penalty, the mis-mapped reads on the fourth exon were reduced from 2,426 (default parameters) to 3 reads (Fig. 3A). With these optimized parameters, the mis-mapped reads on the second exon of E3 were also mapped to the correct position, phyA on Chr3 (second exon; Fig. 3A and Supplementary Fig. S4). We also investigated whether the 15-kb deletion can be detected using the read coverage. The APKM of the four amplicons located at the fourth exon of E3 was compared between the E3 and e3-tr alleles (Fig. 3B). The number of accessions classified as E3 and e3-tr by additional marker analysis was 152 and 36, respectively (Supplementary Table S7). The APKM of the four amplicons on the fourth exon of the e3-tr allele was almost zero, whereas the APKM of the E3 allele varied depending on the accession, and it was difficult to judge the presence or absence of the deletion from the APKM of each amplicon.
However, it was possible to classify the presence or absence of deletion clearly (Fig. 3B and Supplementary Table S7) when the average APKM of the four amplicons was used instead, because differences in amplification efficiency due to sequence variation at the priming site can be cancelled using the APKM of multiple amplicons.
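The averaging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the amplicon IDs come from the text, but the APKM values and the cutoff of 10 are hypothetical and are not values reported in the study.

```python
from statistics import mean

# Amplicon IDs are taken from the text; all APKM values and the cutoff are hypothetical.
EXON4_AMPLICONS = ("AMPL1040313", "AMPL1040314", "AMPL1037722", "AMPL1036854")


def call_e3_exon4(apkm_by_amplicon, cutoff=10.0):
    """Classify one accession from the mean APKM of the four exon-4 amplicons.

    `cutoff` is an assumed threshold, not a value reported in the study.
    """
    avg = mean(apkm_by_amplicon[a] for a in EXON4_AMPLICONS)
    return ("e3-tr" if avg < cutoff else "E3"), avg


# Intact exon 4, but one amplicon amplifies poorly (e.g. a variant at its priming site).
intact = {"AMPL1040313": 2.0, "AMPL1040314": 150.0,
          "AMPL1037722": 210.0, "AMPL1036854": 95.0}
# Deleted exon 4: every amplicon is close to zero.
deleted = {"AMPL1040313": 0.0, "AMPL1040314": 1.0,
           "AMPL1037722": 0.0, "AMPL1036854": 0.5}

print(call_e3_exon4(intact))   # mean APKM stays high despite the weak amplicon
print(call_e3_exon4(deleted))  # mean APKM is near zero
```

In this toy case a call based on AMPL1040313 alone would misclassify the intact accession, whereas the four-amplicon mean does not.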
Figure 3

Distribution of mapped sequence reads of two different alleles in the E3 gene region. (A) Top: Read coverage of Williams 82 with the default parameters of TMAP. Bottom: Read coverage of PGC010 with the default and optimized parameters of TMAP. The E3 gene of PGC010 has a large deletion in the fourth exon. (B) APKM of amplicons from variant (e3T) and wild type (WT) on the fourth exon of E3. Average APKM of four amplicons (left most) and APKM of each amplicon—(a) AMPL1040313, (b) AMPL1040314, (c) AMPL1037722, and (d) AMPL1036854.

As described above, appropriate parameters are required to map short amplicon reads to the correct genomic region. As the AmpliSeq technology has been mainly used in animals, in which palaeopolyploidy is rare, no such limitation has been reported. Soybean is an ancient tetraploid that underwent two whole-genome duplications (palaeopolyploidy), and most of its genes have paralogues in multiple copies. The information provided above should be useful when the AmpliSeq technology is applied to plant species that have experienced whole-genome duplication or triplication.
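To make the effect of the mapping parameters concrete, the following Python sketch scores two candidate placements of a read under the default and the modified values of ‘-A’, ‘-M’, ‘-O’, and ‘-E’ quoted earlier. It assumes a generic affine-gap scheme in which a gap of length L costs O + E × L; this is an illustrative model, not TMAP's actual implementation, and the 100-base read and its two placements are hypothetical.

```python
def alignment_score(matches, mismatches, gap_lengths, A, M, O, E):
    """Generic affine-gap score: +A per match, -M per mismatch, -(O + E*L) per gap.

    The gap cost formula is an assumption made for illustration only.
    """
    gap_cost = sum(O + E * length for length in gap_lengths)
    return A * matches - M * mismatches - gap_cost


DEFAULT = dict(A=1, M=3, O=5, E=2)        # defaults quoted in the text
OPTIMIZED = dict(A=10, M=60, O=50, E=1)   # values used in the study

# Hypothetical 100-base read with two candidate placements:
# (1) on a paralogue, with 10 mismatches and no gap;
# (2) on the correct locus, fully matching but spanning one 10-base gap.
mismatch_rich = dict(matches=90, mismatches=10, gap_lengths=[])
single_gap = dict(matches=100, mismatches=0, gap_lengths=[10])

for name, params in (("default", DEFAULT), ("optimized", OPTIMIZED)):
    perfect = 100 * params["A"]  # score of an ungapped, mismatch-free alignment
    s_mm = alignment_score(**mismatch_rich, **params)
    s_gap = alignment_score(**single_gap, **params)
    print(name, s_mm / perfect, s_gap / perfect)
```

Relative to the match score, the modified values double the cost of a mismatch (M/A goes from 3 to 6), leave the gap-open cost unchanged (O/A stays at 5), and make gap extension much cheaper (E/A goes from 2 to 0.1), so a placement that needs a long gap is favoured more strongly over a mismatch-rich placement on a paralogue.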

3.3. Detected variants of flowering time-related genes

A total of 192 accessions of the soybean mini-core collection were analysed by AmpliSeq to detect novel variants in flowering time-related genes. Among the 461 variants (SNPs or InDels) detected in the target regions, 311 sites (67.5%) have already been reported or registered in dbSNP, whereas 150 sites (32.5%) were novel (Table 2 and Supplementary Table S8). The detected variants were compared in depth with published information on the flowering time-related genes E1, E2, E3, E4, FT2a, FT5a, and their homologues. Further, we performed linear regression analysis to detect variants associated with flowering time under long-day field conditions. By simple linear regression analysis, 207, 206, and 219 of the 461 variants were found to be potentially associated with flowering time in 2011, 2012, and 2013, respectively (Table 3 and Supplementary Table S8). The variant most significantly associated with flowering time was the SNP rs124971350 at E2 (e2 allele) (P < 2.0e−16 in all 3 years) (Table 3 and Supplementary Table S8). The second most significant variant was a large deletion in E3 (e3-tr allele) (P = 1.9e−13, 1.6e−13, and 5.2e−14 for 2011, 2012, and 2013, respectively). Seven other genes, namely WD repeat-containing protein 61, Dt1, MADS-box protein, two two-component response regulator-like genes, FT2a, and PhyB, also showed highly significant association with flowering time (P < 0.0001) (Table 3 and Supplementary Table S8).
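As an illustration of the per-variant test, the sketch below fits a simple linear regression of days to flowering on a genotype coded 0 (reference allele) or 1 (alternative allele). The coding, the eight accessions, and their flowering dates are hypothetical; the study's own software and data are not reproduced here. The P-value would follow from the t statistic with n − 2 degrees of freedom, which is left out to keep the example dependency-free.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared, t_statistic_of_slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = syy - slope * sxy                      # residual sum of squares
    r_squared = 1.0 - rss / syy
    se_slope = math.sqrt(rss / (n - 2) / sxx)    # standard error of the slope
    return slope, intercept, r_squared, slope / se_slope


# Hypothetical data: genotype at one variant and days to flowering.
genotype = [0, 0, 0, 0, 1, 1, 1, 1]
days_to_flowering = [60, 62, 58, 60, 50, 52, 48, 50]

slope, intercept, r2, t_stat = simple_linear_regression(genotype, days_to_flowering)
print(slope, intercept, round(r2, 4), round(t_stat, 2))
```

Here the slope is the estimated difference in days to flowering between the two homozygous genotype classes.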
Table 2

Classification of detected variants of 28 flowering time-related genes by snpEff analysis

| Glyma ID | Description | Gene symbol | Number of variants in total | Number of novel variants | Number of known variants | Number of missense variants | Number of ‘high’ effect variants to gene function (a) |
| Glyma.02G069500 | FLOWERING LOCUS T | GmFTL7 | 10 | 2 | 8 | 0 | 0 |
| Glyma.03G194700 | TERMINAL FLOWER 1 | GmTFL2 | 2 | 0 | 2 | 0 | 0 |
| Glyma.03G227300 | Phytochrome A | GmPHYA4 | 70 | 5 | 65 | 35 | 5 |
| Glyma.04G143300 | B3 domain-containing protein | E1Lb | 3 | 1 | 2 | 0 | 0 |
| Glyma.04G156400 | B3 domain-containing protein | E1La | 4 | 0 | 4 | 0 | 0 |
| Glyma.06G207800 | B3 domain-containing protein | E1 | 5 | 2 | 3 | 3 | 2 |
| Glyma.08G363100 | FLOWERING LOCUS T | GmFT4 | 9 | 2 | 7 | 0 | 0 |
| Glyma.08G363200 | FLOWERING LOCUS T | GmFT6 | 6 | 1 | 5 | 0 | 0 |
| Glyma.09G035500 | Phytochrome B | GmPHYB1 | 10 | 6 | 4 | 2 | 0 |
| Glyma.09G143500 | BROTHER OF FT AND TFL 1 | GmTFL4 | 1 | 1 | 0 | 0 | 0 |
| Glyma.10G141400 | Phytochrome A | | 20 | 8 | 12 | 3 | 1 |
| Glyma.10G221500 | GIGANTEA | E2 | 2 | 1 | 1 | 1 | 1 |
| Glyma.12G073900 | Two-component response regulator-like | | 18 | 3 | 15 | 5 | 1 |
| Glyma.15G140000 | Phytochrome B | | 27 | 9 | 18 | 7 | 2 |
| Glyma.16G044100 | FLOWERING LOCUS T | GmFT5a/GmFTL4 | 14 | 5 | 9 | 0 | 0 |
| Glyma.16G044200 | FLOWERING LOCUS T | GmFT3a/GmFTL1 | 8 | 2 | 6 | 1 | 0 |
| Glyma.16G150700 | FLOWERING LOCUS T | GmFT2a/GmFTL3 | 22 | 13 | 9 | 2 | 1 |
| Glyma.16G151000 | FLOWERING LOCUS T | GmFT2b/GmFTL5 | 9 | 3 | 6 | 1 | 0 |
| Glyma.16G196300 | BROTHER OF FT AND TFL 1 | GmTFL3 | 20 | 5 | 15 | 1 | 0 |
| Glyma.16G200700 | MADS box protein | | 45 | 15 | 30 | 7 | 0 |
| Glyma.17G052100 | WD repeat-containing protein 61 | WDR61 | 15 | 1 | 14 | 0 | 1 |
| Glyma.17G090500 | Achaete-scute transcription factor-related | | 17 | 10 | 7 | 0 | 6 |
| Glyma.19G108100 | FLOWERING LOCUS T | GmFT3b/GmFTL2 | 0 | 0 | 0 | 0 | 0 |
| Glyma.19G108200 | FLOWERING LOCUS T | GmFT5b/GmFTL6 | 11 | 3 | 8 | 1 | 0 |
| Glyma.19G194300 | TERMINAL FLOWER 1 | Dt1/GmTFL1 | 14 | 5 | 9 | 5 | 0 |
| Glyma.19G224200 | Phytochrome A | E3 | 16 | 8 | 8 | 5 | 3 |
| Glyma.19G260400 | Pseudo-response regulator 5 | | 42 | 11 | 31 | 8 | 0 |
| Glyma.20G090000 | Phytochrome A | E4 | 16 | 2 | 14 | 1 | 0 |
| Glyma.U034500 | Two-component response regulator-like | | 26 | 26 | 0 | 5 | 7 |

(a) High-impact variant to gene function includes criteria of stop lost, stop gained, and frameshift.
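The ‘high’ effect criterion in the footnote can be expressed as a small lookup. The sketch below is an assumption-laden illustration: it follows only the three criteria named in the footnote (stop lost, stop gained, and frameshift), and it normalizes effect labels in the spellings used in this article's tables; it is not snpEff's own classification code.

```python
# Criteria named in the Table 2 footnote; label spellings are normalized below.
HIGH_IMPACT_EFFECTS = {"stop_lost", "stop_gained", "frameshift_variant"}


def normalize_effect(effect):
    """Lower-case a label and unify separators, e.g. 'Stop-gained' -> 'stop_gained'."""
    return effect.strip().lower().replace("-", "_").replace(" ", "")


def is_high_impact(effect):
    """True if the effect label matches one of the footnote's three criteria."""
    return normalize_effect(effect) in HIGH_IMPACT_EFFECTS


for label in ("Stop-gained", "Stop_lost", "Frameshift_ variant", "Missense_variant"):
    print(label, is_high_impact(label))
```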

Table 3

DNA polymorphisms in nine genes significantly associated with flowering time variation in the mini-core collection as revealed by the simple linear regression analysis

| Glyma ID | Gene description | Gene symbol (a) | Chromosome | Position (bp) | Reference sequence | Alternative sequence | SNP name | P value 2011 | P value 2012 | P value 2013 | Functional effect | Amino acid change | Functional class |
| Glyma.06G207800 | B3 domain-containing protein | E1 | Chr06 | 20207322 | C | G | rs123612969 (e1-as) | 0.0226 | 0.0153 | 0.0333 | Missense_variant | p.Thr75Arg/c.224C>G | MISSENSE |
| Glyma.10G221500 | GIGANTEA | E2 | Chr10 | 45310798 | A | T | rs124971350 (e2) | 2E-16 | 2E-16 | 2E-16 | Stop_gained | p.Lys527*/c.1582A>T | HIGH |
| Glyma.12G073900 | Two-component response regulator-like | - | Chr12 | 5508242 | T | G | rs125308101 | 0.00258 | 0.0082 | 0.0019 | Upstream_gene_variant | | |
| | | | Chr12 | 5508672 | CT | C | rs745192414 | 0.00155 | 0.00261 | 0.00503 | Intron_variant | c.-120+219delT | |
| | | | Chr12 | 5508702 | T | C | rs743387935 | 0.0211 | 0.0228 | 0.0109 | Intron_variant | c.-119-198T>C | |
| | | | Chr12 | 5509310 | G | A | rs388619068 | 0.0146 | 0.0119 | 0.00907 | Missense_variant | p.Asp98Asn/c.292G>A | MISSENSE |
| | | | Chr12 | 5509317 | C | T | rs125308103 | 0.00107 | 0.00148 | 0.00319 | Missense_variant | p.Ser100Leu/c.299C>T | MISSENSE |
| | | | Chr12 | 5519728 | A | C | rs125308115 | 0.00229 | 0.00749 | 0.00169 | Missense_variant | p.Lys378Gln/c.1132A>C | MISSENSE |
| | | | Chr12 | 5520578 | A | G | rs389122657 | 0.014 | 0.0112 | 0.00865 | Synonymous_variant | p.Ala504Ala/c.1512A>G | SILENT |
| | | | Chr12 | 5520945 | T | C | rs125308117 | 8.96E-10 | 1.76E-08 | 7.74E-11 | Stop_lost | p.Ter627Glnext*?/c.1879T>C | MISSENSE / HIGH |
| | | | Chr12 | 5521029 | G | T | rs743179814 | 0.0244 | 0.0203 | 0.0089 | 3_Prime_UTR_variant | c.*82G>T | |
| Glyma.15G140000 | Phytochrome B | - | Chr15 | 11436193 | G | A | rs388435741 | 5.17E-06 | 7.93E-06 | 8.47E-06 | 3_Prime_UTR_variant | c.*218C>T | |
| | | | Chr15 | 11441207 | C | T | rs126279495 | 5.17E-06 | 7.93E-06 | 8.47E-06 | Missense_variant | p.Val394Ile/c.1180G>A | MISSENSE |
| | | | Chr15 | 11442400 | A | C | rs126279502 | 0.0753 | 0.0769 | 0.0431 | 5_Prime_UTR_variant | c.-14T>G | |
| Glyma.16G150700 | FLOWERING LOCUS T | GmFT2a/GmFTL3 | Chr16 | 31110004 | TATAAGAAAGC | T | rs392064733_1 | 0.0397 | 0.117 | 0.0266 | 5_Prime_UTR_variant | c.-50_-59delATAAGAAAGC | |
| | | | Chr16 | 31110004 | TATAAGAAAGCA | TA | rs392064733_2 | 0.0397 | 0.117 | 0.0266 | 5_Prime_UTR_variant | c.-49_-58delTAAGAAAGCA | |
| | | | Chr16 | 31110991 | A | AATAT | rs864598505_1 | 6.24E-05 | 0.000274 | 0.000111 | Intron_variant | c.202-56_202-55insATAT | |
| | | | Chr16 | 31111033 | T | A | rs126829817 | 6.24E-05 | 0.000274 | 0.000111 | Intron_variant | c.202-15T>A | |
| | | | Chr16 | 31111042 | T | C | rs126829818 | 0.0697 | 0.21 | 0.0419 | Splice_region_variant | c.202T>C | |
| | | | Chr16 | 31111349 | A | G | rs126829819 | 6.24E-05 | 0.000274 | 0.000111 | Intron_variant | c.304+72A>G | |
| | | | Chr16 | 31114633 | G | A | Chr16_31114633 | 5.46E-06 | 4.14E-06 | 3.05E-07 | Missense_variant | p.Gly169Asp/c.506G>A | MISSENSE |
| | | | Chr16 | 31114658 | AGA | AA | Chr16_31114658_2 | 0.0765 | 0.0538 | 0.043 | 3_Prime_UTR_variant | c.*1delG | |
| | | | Chr16 | 31114930 | G | T | rs126829846 | 6.24E-05 | 0.000274 | 0.000111 | 3_Prime_UTR_variant | c.*272G>T | |
| Glyma.16G200700 | MADS box protein | - | Chr16 | 36179909 | T | G | rs126888526 | 1.13E-07 | 3.18E-06 | 3.35E-07 | 3_Prime_UTR_variant | c.*218A>C | |
| | | | Chr16 | 36180002 | G | A | rs126888527 | 1.09E-07 | 1.63E-06 | 1.98E-07 | 3_Prime_UTR_variant | c.*125C>T | |
| | | | Chr16 | 36180087 | T | TGA | Chr16_36180087 | 8.8E-08 | 2.16E-06 | 2.07E-07 | 3_Prime_UTR_variant | c.*39_*40insTC | |
| | | | Chr16 | 36180122 | A | C | rs126888528 | 4.19E-07 | 9.01E-06 | 1.03E-06 | 3_Prime_UTR_variant | c.*5T>G | |
| | | | Chr16 | 36180183 | T | A | rs389022394 | 4.96E-07 | 1.03E-05 | 1.32E-06 | Missense_variant | p.Thr219Ser/c.655A>T | MISSENSE |
| | | | Chr16 | 36182854 | T | TA | Chr16_36182854 | 6.15E-06 | 4.11E-05 | 3.79E-06 | Intron_variant | c.511-49_511-50insT | |
| | | | Chr16 | 36182857 | T | A | rs126888574 | 8.41E-07 | 7.3E-06 | 5.2E-07 | Intron_variant | c.511-52A>T | |
| | | | Chr16 | 36183423 | T | G | rs126888580_2 | 6.9E-06 | 5.27E-06 | 2.61E-06 | Intron_variant | c.427-19A>C | |
| | | | Chr16 | 36183426 | G | C | rs126888581_1 | 0.00286 | 0.00159 | 0.0018 | Intron_variant | c.427-22C>G | |
| | | | Chr16 | 36183435 | G | C | rs126888582 | 6.8E-06 | 5.07E-06 | 2.66E-06 | Intron_variant | c.427-31C>G | |
| | | | Chr16 | 36183450 | G | A | rs126888583 | 0.0104 | 0.00721 | 0.00804 | Intron_variant | c.427-46C>T | |
| | | | Chr16 | 36183510 | C | T | rs126888584 | 0.000219 | 0.00275 | 0.000461 | Intron_variant | c.427-106G>A | |
| | | | Chr16 | 36183518 | A | G | rs126888585 | 3.22E-10 | 9.24E-09 | 3.46E-10 | Intron_variant | c.427-114T>C | |
| | | | Chr16 | 36183541 | C | A | rs126888586 | 2.98E-10 | 8.83E-09 | 3.27E-10 | Intron_variant | c.427-137G>T | |
| | | | Chr16 | 36183556 | T | C | Chr16_36183556 | 0.0507 | 0.0363 | 0.048 | Intron_variant | c.427-152A>G | |
| | | | Chr16 | 36183568 | T | A | rs126888588 | 0.00193 | 0.0172 | 0.00328 | Intron_variant | c.427-164A>T | |
| | | | Chr16 | 36183573 | T | A | rs126888589 | 0.00193 | 0.0172 | 0.00328 | Intron_variant | c.427-169A>T | |
| | | | Chr16 | 36184165 | T | C | rs126888605 | 4.71E-05 | 3.54E-05 | 0.000019 | Intron_variant | c.327-41A>G | |
| | | | Chr16 | 36184729 | T | C | rs126888609 | 2.18E-09 | 2.34E-08 | 6.84E-10 | Missense_variant | p.Thr79Ala/c.235A>G | MISSENSE |
| | | | Chr16 | 36184733 | C | T | rs126888610 | 2.18E-09 | 2.34E-08 | 6.84E-10 | Synonymous_variant | p.Ser77Ser/c.231G>A | SILENT |
| | | | Chr16 | 36184819 | A | T | rs126888611 | 0.0341 | 0.0817 | 0.0244 | Intron_variant | c.183-38T>A | |
| | | | Chr16 | 36187211 | A | C | rs744233319 | 0.000287 | 0.000744 | 0.00118 | Synonymous_variant | p.Ser36Ser/c.108T>G | SILENT |
| | | | Chr16 | 36187552 | T | G | rs126888636 | 0.0383 | 0.0315 | 0.0302 | Upstream_gene_variant | | |
| | | | Chr16 | 36187600 | A | G | rs126888637 | 0.0383 | 0.0315 | 0.0302 | Upstream_gene_variant | | |
| Glyma.17G052100 | WD repeat-containing protein 61 | WDR61 | Chr17 | 3955280 | A | AT | rs126942305_1 | 0.00214 | 0.00191 | 0.00197 | Downstream_gene_variant | | |
| | | | Chr17 | 3955411 | A | G | rs126942313 | 2.56E-10 | 3.12E-10 | 1.07E-10 | Downstream_gene_variant | | |
| | | | Chr17 | 3955475 | TC | AA | rs126942314_1 | 0.00359 | 0.00645 | 0.00279 | Downstream_gene_variant | | |
| | | | Chr17 | 3955476 | C | A | rs126942315 | 0.00359 | 0.00645 | 0.00279 | Downstream_gene_variant | | |
| | | | Chr17 | 3955546 | T | A | rs126942316 | 0.00374 | 0.00157 | 0.00264 | 3_Prime_UTR_variant | c.*125A>T | |
| | | | Chr17 | 3955554 | T | C | rs126942317 | 0.031 | 0.0299 | 0.0358 | 3_Prime_UTR_variant | c.*117A>G | |
| | | | Chr17 | 3955716 | A | G | rs388258139 | 0.00418 | 0.00761 | 0.00338 | Synonymous_variant | p.Ala307Ala/c.921T>C | SILENT |
| | | | Chr17 | 3955763 | C | CG | Chr17_3955763 | 0.00444 | 0.00194 | 0.00318 | Frameshift_variant | p.Val291_Ala292fs/c.873_874insC | HIGH |
| | | | Chr17 | 3955764 | A | G | rs126942318 | 0.00374 | 0.00157 | 0.00264 | Synonymous_variant | p.Val291Val/c.873T>C | SILENT |
| | | | Chr17 | 3955884 | G | C | rs126942319 | 0.00374 | 0.00157 | 0.00264 | Synonymous_variant | p.Val251Val/c.753C>G | SILENT |
| | | | Chr17 | 3956142 | T | C | rs126942321 | 6.36E-08 | 3.72E-08 | 1.25E-08 | Synonymous_variant | p.Ala165Ala/c.495A>G | SILENT |
| | | | Chr17 | 3956163 | T | C | rs126942322 | 0.00644 | 0.00295 | 0.00466 | Synonymous_variant | p.Lys158Lys/c.474A>G | SILENT |
| | | | Chr17 | 3958319 | C | G | rs126942340 | 9.03E-10 | 4.86E-10 | 1.98E-10 | Synonymous_variant | p.Ser16Ser/c.48G>C | SILENT |
| | | | Chr17 | 3958374 | T | G | rs126942341 | 2.32E-10 | 9.96E-11 | 5.12E-11 | 5_Prime_UTR_variant | c.-8A>C | |
| Glyma.19G194300 | TERMINAL FLOWER 1 | Dt1/GmTFL1 | Chr19 | 45183701 | T | A | rs127928573 (dt1) | 0.00122 | 0.00587 | 0.000401 | Missense_variant | p.Arg166Trp/c.496A>T | MISSENSE |
| | | | Chr19 | 45183808 | C | T | rs745009806 | 0.0319 | 0.0184 | 0.0153 | Missense_variant | p.Arg130Lys/c.389G>A | MISSENSE |
| | | | Chr19 | 45183859 | G | A | rs127928574 | 4.29E-11 | 3.71E-11 | 6.02E-10 | Missense_variant | p.Pro113Leu/c.338C>T | MISSENSE |
| | | | Chr19 | 45184581 | G | T | Chr19_45184581 | 0.014 | 0.153 | 0.0428 | Splice_region_variant | c.202C>A | |
| | | | Chr19 | 45185131 | C | T | Chr19_45185131 | 0.00805 | 0.00351 | 0.00764 | 5_Prime_UTR_variant | c.-142G>A | |
| Glyma.19G224200 | Phytochrome A | E3 | Chr19 | 47633086 | T | TA | Chr19_47633086 | 0.000763 | 0.000353 | 0.00085 | 5_Prime_UTR_variant | c.-693_-694insA | |
| | | | Chr19 | 47634596 | A | G | rs127944661 | 0.0052 | 0.00108 | 0.00429 | Synonymous_variant | p.Ser43Ser/c.129A>G | SILENT |
| | | | Chr19 | 47635025 | C | T | rs388644281 | 3.18E-05 | 3.43E-05 | 3.69E-05 | Synonymous_variant | p.Ile186Ile/c.558C>T | SILENT |
| | | | Chr19 | 47635737 | C | A | rs389001110 | 0.0534 | 0.052 | 0.048 | Missense_variant | p.Leu424Ile/c.1270C>A | MISSENSE |
| | | | Chr19 | 47636564 | G | A | rs127944664 | 0.0098 | 0.00232 | 0.00888 | Intron_variant | c.2074+23G>A | |
| | | | Chr19 | 47636607 | G | T | rs127944665 | 0.0127 | 0.0031 | 0.0113 | Intron_variant | c.2074+66G>T | |
| | | | Chr19 | 47637258 | A | G | rs393405985_1 | 0.044 | 0.0447 | 0.0427 | Missense_variant | p.Thr832Ala/c.2494A>G | MISSENSE |
| | | | Chr19 | 47638302 | G | A | Chr19_47638302 (e3-Mo) | 0.0433 | 0.0306 | 0.0485 | Missense_variant | p.Gly1050Arg/c.3148G>A | MISSENSE |
| | | | Chr19 | 47638344 | A | T | rs389636522 | 0.00508 | 0.00113 | 0.00433 | Splice_region_variant | c.3183A>T | |
| | | | Chr19 | 47641562 | 15 kb deletion | | e3-tr | 1.9E-13 | 1.61E-13 | 5.15E-14 | Loss of exon 4 | | HIGH |
| Glyma.U034500 | Two-component response regulator-like | - | Scaffold_32 | 197169 | C | T | Scaffold_32_197169 | 0.000318 | 0.000771 | 0.000612 | 3_Prime_UTR_variant | c.*958G>A | |
| | | | Scaffold_32 | 197421 | T | C | Scaffold_32_197421 | 0.000034 | 0.00007 | 8.54E-05 | 3_Prime_UTR_variant | c.*706A>G | |
| | | | Scaffold_32 | 197459 | A | T | Scaffold_32_197459 | 1.23E-09 | 1.94E-08 | 2.71E-09 | 3_Prime_UTR_variant | c.*668T>A | |
| | | | Scaffold_32 | 198053 | T | A | Scaffold_32_198053 | 2.16E-05 | 4.99E-05 | 0.000048 | Intron_variant | c.*133+29A>T | |
| | | | Scaffold_32 | 198773 | A | AT | Scaffold_32_198773 | 1.85E-06 | 3.75E-06 | 2.85E-06 | Frameshift_variant | p.Met737_Ala738fs/c.2209_2210insA | HIGH |
| | | | Scaffold_32 | 199551 | A | C | Scaffold_32_199551 | 4.88E-09 | 6.54E-08 | 1.23E-08 | Synonymous_variant | p.Ala529Ala/c.1587T>G | SILENT |
| | | | Scaffold_32 | 199605 | T | G | Scaffold_32_199605 | 2.16E-05 | 4.99E-05 | 0.000048 | Intron_variant | c.1575-42A>C | |
| | | | Scaffold_32 | 199656 | TA | T | Scaffold_32_199656 | 0.0096 | 0.0104 | 0.011 | Intron_variant | c.1574+2delT | |
| | | | Scaffold_32 | 199722 | C | G | Scaffold_32_199722 | 0.000318 | 0.000771 | 0.000612 | Missense_variant | p.Gly504Ala/c.1511G>C | MISSENSE |
| | | | Scaffold_32 | 202754 | G | A | Scaffold_32_202754 | 0.0142 | 0.01 | 0.0105 | Stop_gained | p.Arg308*/c.922C>T | HIGH |
| | | | Scaffold_32 | 206717 | A | G | Scaffold_32_206717 | 1.23E-09 | 1.94E-08 | 2.71E-09 | Intron_variant | c.792+27T>C | |
| | | | Scaffold_32 | 216494 | C | T | Scaffold_32_216494 | 4.88E-09 | 6.54E-08 | 1.23E-08 | Synonymous_variant | p.Val168Val/c.504G>A | SILENT |
| | | | Scaffold_32 | 218380 | A | T | Scaffold_32_218380 | 3.29E-09 | 4.29E-08 | 8.4E-09 | Synonymous_variant | p.Pro73Pro/c.219T>A | SILENT |

(a) Gene names refer to those in Kong et al., Fan et al., and Cao et al.

E1 and E1-like genes

Five alleles, namely E1, e1-as, e1-nl, e1-fs, and one novel missense variant (Chr06_20207355), were identified at E1 (Fig. 4, Supplementary Fig. S5A, and Supplementary Tables S8 and S9). In the other two B3 domain-containing E1-like genes, only one synonymous variant (rs123097808) was detected in Glyma.04G156400/E1La, whereas no variant was detected in the coding region of Glyma.04G143300/E1Lb (Supplementary Fig. S5B and C). Among the five E1 alleles, the frequency of the e1-as allele (Chr06:20207322 C; Williams 82 type) was 0.09, and that of the C to G nucleotide change at Chr06:20207322 (rs123612969) was 0.91 in the soybean mini-core collection. In contrast, e1-nl, which lacks the entire E1 gene, was found only in the Swedish cultivar Fiskeby V (PGC001) (Supplementary Table S9). This allele was identified from the read coverage of the E1 genomic region. The average normalized read coverage of all six amplicons (AMPL1037682–AMPL1037687) was 3 in PGC001, whereas that of these amplicons in the other accessions was 181 (ranging from 36 to 430). Additional PCR amplification experiments confirmed that, among all accessions, only PGC001 lacks the E1 genomic region and possesses the e1-nl allele. Another allele, e1-fs, which has a 1-bp deletion (Chr06_20207323), was found in one accession, PGC002 (Fig. 4, Supplementary Fig. S5A, and Supplementary Table S9). The deletion (Chr06_20207323) causes a frameshift and introduces a premature stop codon at Lys76. These loss-of-function alleles were not included in the association analysis because of their very low allele frequency (only one accession each), but they might explain the very early flowering of PGC001 and PGC002 under long-day field conditions. PGC002 (Wase kuro daizu) originated from the southern part of Japan and is classified as a summer-type soybean, the early-maturity group in low-latitude regions of Japan. Summer-type soybean has low photoperiod sensitivity, and e1-fs can explain this characteristic. In contrast, the novel missense variant (Chr06_20207355) found in PGC139 and PGC147 (Supplementary Fig. S5A and Supplementary Table S9) did not show a large effect on flowering time and might not significantly affect E1 function.
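The frequencies quoted for rare alleles follow directly from accession counts. A minimal sketch, assuming each accession is a homozygous inbred line so that the fraction of accessions equals the allele frequency; the label "other" is a placeholder for all remaining E1 genotypes and is not a name used in the study.

```python
from collections import Counter


def allele_frequencies(calls):
    """Fraction of accessions carrying each allele (one homozygous call per accession)."""
    counts = Counter(calls)
    total = len(calls)
    return {allele: n / total for allele, n in counts.items()}


# 192 accessions: e1-nl and e1-fs were each found in a single accession.
calls = ["e1-nl"] + ["e1-fs"] + ["other"] * 190
freqs = allele_frequencies(calls)
print(round(100 * freqs["e1-nl"], 1))  # percentage of accessions carrying e1-nl
```

One carrier among 192 accessions corresponds to about 0.5%, which is too rare for a regression test to have meaningful power.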
Figure 4

Detected variants in E1, E3, E4, FT5a, and FT2a from the 192 accessions of the mini-core collection. The grey and white boxes indicate UTRs and exons. The solid lines indicate 5′-upstream and intron regions. The black and grey circles indicate loss-of-function (described as ‘HIGH’ impact on the gene function in Supplementary Table S8) and missense variants. Braces indicate known large InDels. These InDels can be detected by read depth.

E2

Two alleles, E2 and e2, and one novel SNP variant (Chr10_45310686) were detected at E2 (Fig. 4, Supplementary Fig. S5D, and Supplementary Table S8). The functionally defective e2 allele has an A to T nucleotide change (Table 3, K528*, rs124971350), and its allele frequency in the soybean mini-core collection was 0.42 (Supplementary Table S8). A novel SNP (Chr10_45310686), which causes the missense change Ile490Met, was detected only in PGC086, which carries the e2 allele (Supplementary Table S9 and Supplementary Fig. S5D).
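Why an A to T change at a lysine codon produces a stop can be checked with a short Python sketch. It assumes the usual convention that coding position c.1 is the first base of the start codon and uses the coding position c.1582 listed in Table 3; because the actual codon sequence in E2 is not given in the text, both lysine codons are tried.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
LYS_CODONS = ("AAA", "AAG")  # the two codons that encode lysine


def codon_of(c_pos):
    """Return (codon number, position within the codon), both 1-based."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitute(codon, pos_in_codon, alt):
    """Replace the base at a 1-based position within a codon."""
    return codon[: pos_in_codon - 1] + alt + codon[pos_in_codon:]


codon_number, pos = codon_of(1582)
mutated = [substitute(codon, pos, "T") for codon in LYS_CODONS]
print(codon_number, pos, mutated, all(m in STOP_CODONS for m in mutated))
```

Under this convention c.1582 is the first base of codon 528, and replacing the first A of either lysine codon with T gives TAA or TAG, both of which are stop codons.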

E3

Of the two alleles detected at E3, e3-tr and e3-Mo, the e3-tr allele, which has a large deletion in the fourth exon, had a frequency of 0.19 (Fig. 4, Supplementary Fig. S5E, and Supplementary Tables S7 and S8). The missense variant of e3-Mo (Chr19_47638302: G to A, Gly1050Arg) in the third exon of E3 was detected in PGC019 and PGC042 (Moshidou Gong 503), derived from the Korean Peninsula and China, respectively. The e3-Mo variant is not registered in dbSNP, but we found it in one Chinese landrace, Ni Ding Hua Mei Dou, among the re-sequencing data of 302 soybean accessions (SRR1533240 in NCBI SRA).

E4

The e4-SORE-1 allele has a 6.2-kb insertion in the first exon. It was difficult to detect this insertion from the read coverage of the amplicons in the region. However, the presence or absence of the insertion could be estimated from the read coverage of the amplicon (AMPL1037734) spanning the break point of the insertion (Supplementary Fig. S5F). The average APKM at the break point was 325 in accessions with the reference-type sequence, whereas it was zero in the insertion-type accessions PGC001 and PGC021, derived from Sweden and Japan, respectively. This insertion was also confirmed by PCR. Most SNPs (11 of 13 sites) in E4 were found in PGC123 and PGC134, derived from Nepal and China (Supplementary Table S9), but only one of them was a missense variant (rs390866037: Leu151Ser), which likely affects gene function. As these variants were detected as homozygous, they are considered to be real variants rather than artefacts of mis-mapped reads. A frameshift variant in the second exon was found only in PGC005 (Supplementary Table S9). This accession flowered earlier than Williams 82 under field conditions despite having the same allele combination at all other flowering-related genes (Supplementary Table S1).

Other Phytochrome A genes

Five high-impact variants, comprising three frameshift and two splice site variants, were identified in another PhyA gene, Glyma.03G227300/GmPHYA4 (Supplementary Fig. S5G and Supplementary Table S9). This PhyA gene consisted of two main haplotype groups, namely, the reference type (Hap1–Hap5) and the pseudogene type (Hap6–Hap11), which had various loss-of-function sites. The other PhyA gene, Glyma.10G141400, had only one novel frameshift variant (Chr10_37491867), found in two accessions, PGC045 and PGC189, derived from Korea and East Timor (Supplementary Tables S6 and S9 and Supplementary Fig. S5H).

Phytochrome B genes

There has been no report of natural variation in the PhyB genes affecting flowering, but overexpression of GmPHYB1 accelerates flowering under short-day conditions in Arabidopsis. Only one missense variant (rs124458274) was found in GmPHYB1 (Glyma.09G035500) (Supplementary Table S8 and Supplementary Fig. S5I). rs124458274 was a common variant in the mini-core collection (allele frequency = 0.69). Two novel frameshift variants and six missense variants (one of them novel) were found in the other PhyB gene, Glyma.15G140000 (Supplementary Table S8 and Supplementary Fig. S5J). Although the frameshift variant (Chr15_11442094) was identified in 19 accessions (Hap8, Supplementary Table S9), no association with flowering time was observed under the examined field conditions.

FLOWERING LOCUS T

Two florigen genes, FT5a and FT2a, play a major role in the induction of flowering in soybean. As no variant was detected in the exons of FT5a, FT5a appears to be highly conserved under evolutionary constraint (Fig. 4 and Supplementary Fig. S5K). Nine and five variants were detected in the introns and 3′-UTR, respectively. Of these, four variants were associated with flowering time in the simple linear regression analysis (Supplementary Tables S3 and S8). Two variants (rs126630615 in the 3′-UTR and rs126639618 in the third intron) were reported by Takeshima et al. This FT5a region has been reported as one of the flowering time quantitative trait loci (QTLs) in the chromosomal segment substitution lines (CSSLs) derived from a cross between Peking and Enrei. In this study, the nucleotide differences between Enrei and Peking were identified as rs126639616_1 (Fig. 4) in the intron and rs388994144_1 (Fig. 4) in the 3′-UTR (Supplementary Fig. S5K and Supplementary Table S8). As natural variants in the 5′- and 3′-UTRs of an FT-like gene affect gene expression and flowering time in rice, the variant rs388994144_1 in the 3′-UTR might be involved in the expression of FT5a and the regulation of flowering time in soybean.

FT2a is a paralogue of FT5a and has been named E9. The recessive e9 allele is a leaky allele caused by allele-specific transcriptional repression due to the insertion of SORE-1 into the first intron. The presence of SORE-1 (FT2a-TO allele) delays flowering by 10 days under natural day-length conditions at Harbin, China (45°43′N, 126°45′E). Zhao et al. also reported a difference of 10 days or more in flowering time between E9 and e9 in Sapporo, Japan (43°07′N, 141°35′E). As we did not design primers in the intron of FT2a, the presence or absence of SORE-1 is unknown. In this study, one frameshift and two missense variants were found in FT2a (Glyma.16G150700) (Fig. 4 and Supplementary Fig. S5L). In the first exon, the missense SNP rs388788554 (Glu23Asp) was detected in PGC066 (Hap13, Supplementary Table S9). In the fourth exon, a novel missense SNP (Chr16_31114633) was detected in seven accessions (Hap10, Supplementary Table S9). A novel frameshift variant (Chr16_31111088) in the fourth exon was detected in PGC166 (Hap14, Supplementary Table S9). Enrei had four known and two novel variants in the introns and 3′-UTR of FT2a, whereas Peking had no variant. As no QTL has been reported in the FT2a region in the CSSLs between Peking and Enrei, these variants might not be involved in the regulation of flowering time under the evaluation conditions of the CSSLs.

FT-like genes

Four missense variants were detected in the other three FT homologues, rs126830445 in Glyma.16G151000 (GmFT2b/GmFTL5, Supplementary Fig. S5M), rs127848197 in Glyma.19G108200 (GmFT5b/GmFTL6, Supplementary Fig. S5O), a novel SNP (Chr16_35778390) in Glyma.16G196300 (GmTFL3, Supplementary Fig. S5N), and Chr16_4162554 in Glyma.16G044200 (FT3a/GmFTL1, Supplementary Fig. S5P). The frequency of rs126830445 in GmFT2b/GmFTL5 was 0.45, whereas that of rs127848197 in GmFT5b/GmFTL6 was 0.94. The novel missense SNP (Val98Ile; Chr16_35778390) in GmTFL3 was found only in PGC037. This accession (YAKUMO MEAKA) is a landrace from Hokkaido, in the northern part of Japan. In contrast, the novel missense variant (Chr16_4162554) in FT3a/GmFTL1 was found only in PGC134 (Hap7, Supplementary Table S9), which is a medium-maturing accession. No functionally defective or missense variant was found in the other FT-like genes, Glyma.02G069500 (GmFTL7, Supplementary Fig. S5Q), Glyma.08G363100 (GmFT4, Supplementary Fig. S5R), and Glyma.08G363200 (GmFT6, Supplementary Fig. S5S). No variants were detected in Glyma.19G108100 (GmFT3b/GmFTL2). The alleles identified in these FT-like genes will be useful for clarifying the influence of these variants on flowering regulation.

TFL1-like genes

Two TFL1-like genes, GmTFL2 (Glyma.03G194700) and GmTFL1 (Glyma.19G194300), exist in the soybean genome. No loss-of-function or missense variant was found in GmTFL2 (Supplementary Fig. S5T), whereas five missense variants were found in GmTFL1, which determines the growth habit of soybean and is classically named the Dt1 locus (Supplementary Fig. S5U and Supplementary Table S8). These sites are located where amino acids are highly conserved across TFL1 orthologues: GmTFL2, GmTFL1/Dt1, Lotus japonicus CEN/TFL1, pea TFL1a, Arabidopsis TFL1, Arabidopsis ATC, and Antirrhinum majus CEN. The variant site of rs745009806 (Arg130Lys) was conserved in TFL, but not in ATC and CEN. The other four variant sites, rs127928577 (Arg62Ser), rs392653457 (Leu67Gln), rs127928574 (Pro113Leu), and rs127928573 (Arg166Trp), are at highly conserved amino acid sites. Of these, rs127928573 (Arg166Trp) is known as the dt1 allele in soybean. As the loss-of-function Sidt1 allele has been reported at S79N in Sesamum indicum L., the other three missense variants should be examined to verify whether they are new defective dt1 alleles.

Two-component response regulator-like genes

Among the three two-component response regulator-like genes screened, a stop-lost variant (rs125308117) and five missense variants were found in Glyma.12G073900 (Supplementary Table S8 and Supplementary Fig. S5V). The allele frequency of the stop-lost variant (rs125308117) was 0.31. In Glyma.19G260400, only seven missense variants were found (Supplementary Table S8 and Supplementary Fig. S5W). Among the seven variants with high impact on gene function in Glyma.U034500 on scaffold 32 (Supplementary Table S8 and Supplementary Fig. S5), four were frameshift variants due to InDels. One frameshift variant (A > AT, M737I, scaffold_32: 198773) was the major allele, carried by 82% of the mini-core collection. In contrast, the other frameshift variants, namely an insertion (C > CT, Q759M, scaffold_32: 198706) in PGC044, a deletion (TTGCC > -, G409D, scaffold_32: 200001) in 16 accessions, and a deletion (AC > A, V355L, scaffold_32: 200258) in three accessions, PGC005, PGC094, and PGC174, were rare alleles in the mini-core collection (Supplementary Table S9). The remaining three variants with high impact on gene function (scaffold_32_199043, scaffold_32_202754, and scaffold_32_218486) were stop-gained variants. These results indicate that Glyma.U034500 has lost its function in most soybean accessions, except for those carrying Hap1, Hap2, Hap3, or Hap4 (Supplementary Table S9). Among these three genes, Glyma.12G073900 and Glyma.U034500 showed high similarity (91%) at the amino acid sequence level. The fact that the amino acid sequence of Glyma.12G073900 in Williams 82 is shorter (92 aa) at the C terminus than that of Glyma.U034500 (765 aa) indicates that Glyma.12G073900 encodes a truncated protein. Although flowering control by two-component response regulator-like genes has been reported in various species, the roles of these genes and their variants in soybean flowering are unknown.
Among the two-component response regulator-like genes, only variants in Glyma.12G073900 and Glyma.U034500 were associated with flowering time in the simple linear regression analysis (Table 3 and Supplementary Table S8). Glyma.U034500 (Chr11 11.23–11.26 Mb on Gmax189) is located near a previously reported QTL, qFT-B1 (nearest marker: Satt519, 74.7 cM, Chr11 13.98 Mb on Gmax189), detected in 96 lines from the cross between Tokei 780 and the soja accession Hidaka 4. Although the effect of qFT-B1 was reported to be 3.4–10.8 days, the genotype at this frameshift site (scaffold_32: 198773) in the parents of the recombinant inbred lines (RILs) is unknown. Using the detected variants as DNA markers, it can be tested whether the stop-lost and stop-gained variants detected in the two-component response regulator-like genes are responsible for the variation in flowering time.

Other genes

Seven missense variants were identified in Glyma.16G200700, which encodes a MADS-box protein, whereas no functionally defective variant was found (Supplementary Table S8 and Supplementary Fig. S5Y). The MADS-domain transcription factor encoded by the AGL6 gene is known to regulate lateral organ development, flowering time, and the circadian clock in Arabidopsis. AGL6 regulates flowering through the FLC family genes and FT. Two missense variants in Glyma.16G200700, rs389022394 and rs126888609, were significantly associated with flowering time in the simple linear regression analysis (Table 3 and Supplementary Table S8). As there is no report on FLC-like genes in soybean, it will be important to examine whether these two variants in Glyma.16G200700 affect flowering time. One novel frameshift variant (Chr17_3955763) in WD repeat-containing protein 61 (Glyma.17G052100) was significantly associated with flowering time in the simple linear regression analysis (Table 3 and Supplementary Table S8). Although the allele frequency of the frameshift variant was 0.52 (Supplementary Table S8), no QTL has been reported in this region. In Arabidopsis, the WD repeat-containing protein VIP3 regulates flowering time via the vernalization pathway; however, a vernalization pathway is not known in soybean, and it is difficult to infer the role of this gene. As a large proportion of the mini-core collection carries the novel frameshift variant in Glyma.17G052100 and the missense variants in Glyma.16G200700, it is necessary to confirm genetically whether these variants really affect flowering time. The transcription factor gene Glyma.17G090500 was sequenced as a control for variant detection. All known variants were detected correctly (data not shown).

3.4. Gene-based association test for flowering time

To narrow down the variants responsible for variation in flowering time in the mini-core collection, we performed multiple linear regression analysis using the variants significantly associated with flowering time in the simple linear regression analysis. The e2 and e3-tr variants and the stop-lost variant (rs125308117) of the two-component response regulator-like gene on Chr12 were significant in all 3 years, and rs127928573 in Dt1 was significant only in 2013 (Table 4). These genes could explain 51.82%, 51.13%, and 52.83% of the phenotypic variation in flowering time among the mini-core collection in 2011, 2012, and 2013, respectively. In this study, the variants of E1 and E4 could not be incorporated into the association analysis because of the low frequency of the e1-nl (0.5%) and e4-SORE1 (1%) alleles in the mini-core collection. The extent of variation explained in this study was ∼10% lower than the 62–66% reported by Zhai et al. Even though the allele frequency of e1-as was relatively high (9%), e1-as was not significant in the multiple linear regression analysis. This is probably because the genetic effect of e1-as is smaller than that of E2, E3, and the two-component response regulator-like gene. In the simple linear regression analysis, the P-values of E2 (2.0e−16), E3 (5.2e−14–1.9e−13), and the two-component response regulator-like gene (7.7e−11–1.8e−8) were considerably lower than that of e1-as (0.015–0.033) (Table 3). Further experiments using a larger population are required to examine the remaining variation that could not be explained by the three genes together with e1-as.
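The multiple regression step can be illustrated with a dependency-free Python sketch that fits days to flowering on several 0/1-coded variants at once and reports the fraction of variance explained. All data are hypothetical (eight accessions covering every combination of three variants), the coefficients are raw rather than the standardized coefficients reported in Table 4, and the solver is a plain normal-equations approach chosen for brevity, not the software used in the study.

```python
def fit_ols(rows, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    rows: list of predictor lists (one per accession); y: list of responses.
    Returns (coefficients [intercept, b1, b2, ...], r_squared).
    Assumes the predictors are not collinear (no singular-matrix handling).
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)]
         + [sum(xi[j] * yi for xi, yi in zip(X, y))] for j in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    beta = [A[j][p] for j in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, xi)) for xi in X]
    mean_y = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - rss / tss


# Hypothetical genotypes at three variants (0 = reference, 1 = alternative allele).
genotypes = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
             [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
# Hypothetical days to flowering.
days = [61, 69, 49, 61, 54, 66, 46, 54]

beta, r2 = fit_ols(genotypes, days)
print([round(b, 6) for b in beta], round(r2, 4))
```

Each coefficient estimates the effect of one variant with the other variants held fixed, which is why a variant that is significant on its own can drop out once correlated variants are included in the model.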
Table 4

DNA polymorphisms responsible for flowering time variation in the mini-core collection as revealed by multiple regression analysis

| Data set | SNP No. | SNP name | Chromosome (a) | Position, bp (a) | Glyma ID | Description (b) | Alleles | Effect (c) | MAF | β (d) | s.e. | P-value (e) | Contribution rate (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011 | E2 | rs124971350 (e2) | Chr10 | 45310798 | Glyma.10G221500 | GIGANTEA (E2) | A/T | Stop_gained | 0.42 | −5.21 | 1.51 | 2.5E-03** | 21 |
| 2011 | SNP379 | e3tr (f) | Chr19 | 47641562 | Glyma.19G224200 | Phytochrome A (E3) | Large deletion | Loss of exon 4 | 0.17 | −9.88 | 3.05 | 3.9E-03** | 19 |
| 2011 | SNP156 | rs125308117 | Chr12 | 5520945 | Glyma.12G073900 | Two-component response regulator-like | T/C | Stop_lost | 0.31 | 11.24 | 3.53 | 4.5E-03** | 12 |
| 2012 | E2 | rs124971350 (e2) | Chr10 | 45310798 | Glyma.10G221500 | GIGANTEA (E2) | A/T | Stop_gained | 0.42 | −5.44 | 1.47 | 1.3E-03** | 22 |
| 2012 | SNP379 | e3tr (f) | Chr19 | 47641562 | Glyma.19G224200 | Phytochrome A (E3) | Large deletion | Loss of exon 4 | 0.17 | −8.50 | 2.95 | 9.0E-03** | 19 |
| 2012 | SNP156 | rs125308117 | Chr12 | 5520945 | Glyma.12G073900 | Two-component response regulator-like | T/C | Stop_lost | 0.31 | 9.65 | 3.42 | 1.0E-02* | 10 |
| 2013 | E2 | rs124971350 (e2) | Chr10 | 45310798 | Glyma.10G221500 | GIGANTEA (E2) | A/T | Stop_gained | 0.42 | −5.36 | 1.66 | 4.0E-03** | 20 |
| 2013 | SNP379 | e3tr (f) | Chr19 | 47641562 | Glyma.19G224200 | Phytochrome A (E3) | Large deletion | Loss of exon 4 | 0.17 | −12.18 | 3.34 | 1.5E-03** | 13 |
| 2013 | SNP156 | rs125308117 | Chr12 | 5520945 | Glyma.12G073900 | Two-component response regulator-like | T/C | Stop_lost | 0.31 | 12.69 | 3.86 | 3.5E-03** | 19 |
| 2013 | SNP349 | rs127928573 | Chr19 | 45183701 | Glyma.19G194300 | TERMINAL FLOWER 1 | T/A | Missense_variant | 0.09 | 11.23 | 5.11 | 3.9E-02* | 0.5 |

β, s.e., and P-value are the parameters estimated by linear regression. MAF: minor allele frequency; s.e.: standard error.

(a) Physical position on Gmax275.

(b) Gene description was obtained from Phytozome 12.

(c) Effect on gene function annotated by snpEff. Effect of AMPL1040314 was defined manually.

(d) Standardized regression coefficients.

(e) Adjusted P-value was obtained from multivariate models with days to flowering and genotype as covariates. Significance codes: ‘**’ 0.01 ‘*’ 0.05.

(f) Large deletion on E3 estimated by coverage of four amplicons on the 4th exon.
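As a quick arithmetic check on Table 4, the rounded per-variant contribution rates can be added up per year and compared with the totals quoted in the text (51.82%, 51.13%, and 52.83%). The sketch below assumes the quoted totals are simple sums of the per-variant contributions, which the source does not state explicitly; the variable and function names are illustrative only.

```python
# Contribution rates (%) copied from Table 4, keyed by data set (year).
contribution = {
    2011: {"E2": 21, "E3": 19, "Glyma.12G073900": 12},
    2012: {"E2": 22, "E3": 19, "Glyma.12G073900": 10},
    2013: {"E2": 20, "E3": 13, "Glyma.12G073900": 19, "Dt1": 0.5},
}

# Totals quoted in the text of section 3.4.
reported_total = {2011: 51.82, 2012: 51.13, 2013: 52.83}

def summed_contribution(year):
    """Sum of the rounded per-variant contribution rates for one data set."""
    return sum(contribution[year].values())

for year in sorted(contribution):
    total = summed_contribution(year)
    gap = abs(total - reported_total[year])
    print(f"{year}: table sum = {total}, reported = {reported_total[year]}, gap = {gap:.2f}")
```

Under that assumption the sums agree with the reported totals to within one percentage point, which is what rounding of the individual table entries would be expected to produce.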

The other five genes, namely, WD repeat-containing protein 61 (Glyma.17G052100), MADS-box protein (Glyma.16G200700), PhyB (Glyma.15G140000), two-component response regulator-like gene (Glyma.U034500), and FT2a/GmFTL3 (Glyma.16G150700), were significant in the simple linear regression analysis (P < 0.0001) but not in the multiple linear regression analysis (Table 3 and Supplementary Table S8). Variants that differ between Enrei and Peking can be used to confirm allele effects on flowering time using the phenotypic data of CSSLs. Peking had a novel frameshift variant (Chr17_3955763) in WD repeat-containing protein 61 (Glyma.17G052100), two missense variants (rs389022394 and rs126888609) in MADS-box protein (Glyma.16G200700), and one frameshift variant (Chr15_11442094) in PhyB (Glyma.15G140000), whereas Enrei had none of these variants (Supplementary Table S8). However, no flowering time QTL has been reported on Chr17, Chr16, or Chr15; these genes may not be involved in flowering time regulation under the evaluation conditions of the CSSLs. The stop-gain allele e2 showed the strongest association with flowering time. This variant promotes flowering by about 5 days (Table 4). Watanabe et al. reported that the difference in days to flowering between E2/E2 and e2/e2 was ∼9 days, which is consistent with the result of this study.
The next strongest association with flowering time was observed at E3. The e3-tr allele (Harosoy-e3) has been reported to promote flowering by ∼17 days, but its effect was estimated as 9–13 days in this study. The smaller estimate at E3 can be explained by the absence of the e3-Mo allele from the analysis. As only two accessions, PGC019 and PGC042, carry it (Supplementary Table S9), the e3-Mo allele could not be included in the association analysis. The effect of the missense variant (rs127928573, Arg166Trp) of Dt1 was detected only in the 2013 data set; it delayed flowering by ∼11 days compared with the Dt1 allele (Table 4). Dt1 is reported as a locus strongly associated with days to maturity and plant height. Zhang et al. identified the Dt1 gene 18.6 kb upstream of the peak SNP associated with days to maturity and plant height. Dt1 plays a primary role not only in stem termination but also in floral transition. As no visible influence on flowering time has been reported for VIGS-induced suppression of dt1, the SNP detected in Dt1 in this study suggests the presence of another gene related to flowering time in the surrounding region. The effect of the stop-lost variant (rs125308117) in the two-component response regulator-like gene (Glyma.12G073900) was significant (P = 4.5e−3 in 2011, P = 1.0e−2 in 2012, P = 3.5e−3 in 2013), and plants carrying it flower ∼10–13 days later. Involvement of two-component response regulator-like genes in flowering time has been reported in Arabidopsis and rice; the function may be conserved as a flowering time-related gene in soybean. Williams 82 (reference genome) has a C-terminally truncated protein as described above, whereas the rs125308117 variant has a longer amino acid sequence and an allele effect of delayed flowering for 4.7 days (Table 4).
Although Glyma.12G073900 is located near a previously reported QTL, qFT-H (nearest marker: Satt442 on Chr12: 6,390,806–6,391,062), detected in RILs with the E1 allele from the cross between Tokei 780 and the soja accession Hidaka 4, the allele type of Glyma.12G073900 in both accessions is unknown. The genomic region surrounding Glyma.12G073900 has been reported to include the flowering time QTL qDFF-Gm12 in CSSLs. Glyma.12G073900 of Peking (PGC084) is the stop-lost type (longer protein), whereas that of Enrei (PGC025) is the reference type (truncated protein). Similar to the present study, the Peking allele delayed flowering by ∼3.7 days (LOD score: 36.3; flanking markers: C12-BARC-015603-02006 and s024200450). These data suggest that Glyma.12G073900 is a candidate gene for qDFF-Gm12.

4. Conclusions

Flowering time and maturity are the most important factors affecting adaptability and yield. To increase the yield of soybean, it is necessary to adjust flowering time appropriately using a combination of flowering time-related genes or alleles. Preparing a catalogue of flowering time-related genes makes it possible to freely combine alleles with various effects using DNA markers. Our results indicate that novel alleles, and accessions carrying them, can be rapidly detected using the AmpliSeq technology. Although multiple defective alleles were identified, we could not include all of them in the association study of flowering time because of their low allele frequency. Nevertheless, the variants detected in this study could explain 51.1–52.3% of the flowering time variation in the soybean mini-core collection. These variants included a novel two-component response regulator gene in addition to known flowering time-related genes. Therefore, the AmpliSeq technology is useful for discovering novel variants in target genes.

Data availability

All sequences analysed in this study have been deposited in the DDBJ database under the BioProject Accession number: PRJDB7633.