Literature DB >> 31231761

Highly multiplexed AmpliSeq technology identifies novel variation of flowering time-related genes in soybean (Glycine max).

Eri Ogiso-Tanaka¹, Takehiko Shimizu¹, Makita Hajika¹, Akito Kaga¹, Masao Ishimoto¹.

Abstract

Whole-genome re-sequencing is a powerful approach to detect gene variants, but it is expensive to analyse only the target genes. To circumvent this problem, we attempted to detect novel variants of flowering time-related genes and their homologues in soybean mini-core collection by target re-sequencing using AmpliSeq technology. The average depth of 382 amplicons targeting 29 genes was 1,237 with 99.85% of the sequence data mapped to the reference genome. Totally, 461 variants were detected, of which 150 sites were novel and not registered in dbSNP. Known and novel variants were detected in the classical maturity loci-E1, E2, E3, and E4. Additionally, large indel alleles, E1-nl and E3-tr, were successfully identified. Novel loss-of-function and missense variants were found in FT2a, MADS-box, WDR61, phytochromes, and two-component response regulators. The multiple regression analysis showed that four genes-E2, E3, Dt1, and two-component response regulator-can explain 51.1-52.3% of the variation in flowering time of the mini-core collection. Among them, the two-component response regulator with a premature stop codon is a novel gene that has not been reported as a soybean flowering time-related gene. These data suggest that the AmpliSeq technology is a powerful tool to identify novel alleles.

Entities: CellLine Chemical Disease Gene Mutation Species

Keywords: AmpliSeq; flowering time-related gene; genotyping; soybean; target re-sequencing

Mesh：

Year: 2019 PMID： 31231761 PMCID： PMC6589554 DOI： 10.1093/dnares/dsz005

Source DB: PubMed Journal: DNA Res ISSN： 1340-2838 Impact factor: 4.458

1. Introduction

Flowering time is critical for successful seed production by plants. Flowering time and maturity are the most important traits to determine the adaptability of soybean [Glycine max (L.) Merr.] cultivation. These not only restrict the cultivation area but also greatly affect plant architecture and yield., Therefore, it is necessary to clarify the genetic factors affecting flowering time and maturity and control them using a combination of alleles with different genetic effects on flowering time. To combine such alleles freely based on DNA marker-assisted selection, a catalogue of alleles for breeding materials will be necessary. Soybean is a typical short-day plant. Several functional nucleotide polymorphisms responsible for diversity in flowering time among cultivars are already known. Classical maturity loci designated as E loci have been characterized, including E1 and E2,E3,E4,E7,E8,E9, and E10. Of these, E1,E2,E3,E4, and E9 have been isolated as flowering time-related genes. E1 encodes putative transcriptional factor containing plant-specific B3 domain.E2 encodes a homologue of GIGANTEA.E3 and E4 encode a homologue of the photoreceptor phytochrome A (PHYA).E9 encodes the florigen protein FT2a. In the E1 gene, three alleles, namely e1-as (=e1 designated by Bernard), e1-fs, and e1-nl, have been reported as early flowering phenotype under long-day conditions. The e1-as allele has a single missense mutation (Arg15Thr) in the coding region. The e1-as genotype promotes flowering for ∼10 days compared with that by the E1 genotype under natural day-length conditions at Matsudo, Japan (35°78′N, 139°90′E). The e1-fs allele has a 1-bp deletion, resulting in a premature stop codon in the cultivar Sakamotowase.e1-nl is a null allele in which ∼142 kb, including the entire E1 gene, is deleted in some early flowering cultivars. In contrast, only one e2 allele has been reported in the E2 gene. The e2 allele has one premature stop codon mutation due to single nucleotide polymorphism (SNP) in the 10th exon. The e2 genotype promotes flowering for ∼9 days under natural day-length conditions at Tsukuba, Japan (36°03′N, 140°04′E). In the E3 gene, e3-Mo, e3-fs, e3-tr (=e3 designated by Buzzell), and e3-ns have been reported as nonfunctional alleles., The e3-Mo alleles have SNP for a non-synonymous amino acid substitution (G1050R) in the third exon., The e3-fs allele has a single base insertion in the exon, resulting in frameshift mutation. The e3-ns allele has a nonsense mutation in which a single nucleotide substitution in exon 3 creates a stop codon. The e3-tr alleles lack a 13.33-kb genomic region including a part of exons 3 and 4., These nonfunctional alleles promote flowering under long-day conditions. In addition, E3-Mi and E3-Ha also have been reported as functional alleles., The E3-Mi alleles have a 2.633-kb deletion in the third intron. As for the E4 gene, five nonfunctional alleles, viz., e4-SORE-1, e4-kam, e4-kes, e4-oto, and e4-tsu, have been reported.,, The e4-SORE-1 (=e4 designated by Buzzell and Voldeng) alleles have a Ty1/copia-like retrotransposon (SORE-1) insertion in first exon, resulting in a nonfunctional allele. These E1–E4 genes can result in variation in flowering time by controlling the expression of FLOWERING LOCUS T (FT) genes, FT2a and FT5a.,,, The florigen protein FT2a is encoded by E9. The e9 allele has a SORE-1 insertion in the first intron. Although eight SNPs and six InDels in the E9 have been reported, the influence on gene function is unknown. The FT5a gene was identified as qDTF-J, which promotes the flowering time for ∼5 days under natural day-length conditions at Hokkaido, Japan (43°07′N, 141°35′E). In the FT5a gene, 13 SNPs and 3 InDels are reported only in the promoter and untranslated regions (UTRs) in 439 cultivated and wild soybean accessions. The functional nucleotide polymorphisms of the four E genes (E1–E4) are useful to predict the flowering time and could explain ∼62–66% of the phenotypic variation in flowering time among 63 Japanese accessions under long-day conditions. However, prediction of flowering time will be difficult if the breeding materials have unknown alleles affecting flowering phenotype. Therefore, development of a sequencing system that can easily capture as many alleles as possible is required. Recently, it became possible to obtain whole-genome information easily with the development of next-generation sequencing (NGS) technologies. However, it is still expensive for re-sequencing large genomes. Moreover, it is necessary to have analytical and storage environments to deal with enormous amounts of whole-genome sequence data of genetic resources. Target re-sequence is one of the alternative sequencing methods to obtain sequence data of a limited region, which can minimize cost and time for data analysis and decrease data storage. The AmpliSeq technology (Thermo Fisher Scientific, Waltham, MA, USA) is one of the target re-sequencing technologies, a multiplex polymerase chain reaction (PCR)-based assay targeting regions of interest. The AmpliSeq Designer designs primer set that amplifies PCR products ranging from 75 to 375 bp in the target region, and multiplex-PCR products are sequenced by NGS. The method enables amplification of ∼6,000 amplicons by ultra-high multiplex PCR and constructs a targeted sequencing library in 10 h. In routine genotyping of crop breeding, NGS-based techniques need to meet several criteria. The processing time between sample collection and interpretation of sequencing should be short. Furthermore, it is necessary to construct libraries using limited amount of input DNA including partially degraded DNA sample and the read depth must be deep enough to detect variant accurately. The Ion Torrent platform, in combination with the AmpliSeq multiplex PCR can use DNA input of as low as 10 ng, and the processing time between sample collection and sequence analysis can be finished within 5 days. The AmpliSeq technology is frequently used for studying human inherited cancer, but it can also be applied to plant and agronomic research. In this study, we applied AmpliSeq technology to clarify the alleles of flowering time-related genes and their homologues in diverse soybean germplasm to identify novel and known variations associated with flowering time.

2. Materials and methods

2.1. Plant materials and DNA extraction

DNA was extracted from 192 accessions of a soybean mini-core collection, provided by Genebank, NARO (Supplementary Table S1). Of these, 122 accessions were sown and germinated in plastic pots on rock wool material ‘grodan’ (Nittobo, Tokyo, Japan) moistened with water. After 10 days under 12 h light/12 h dark conditions at 25°C, the first leaf was collected in a 2-ml tube. The leaf tissue of 38 samples was ground in liquid nitrogen and CTAB buffer, and then immediately used for DNA extraction manually. The remaining 84 samples were dried in a freeze dryer (FDU-2100, EYELA, Tokyo, Japan). These samples were lyophilized at −80°C for 12 h and stored at 4°C. The dried leaves were crushed using a ShakeMaster (Bio Medical Science Inc. Tokyo, Japan) and the leaf powder was used to extract DNA. DNA from 44 samples was extracted using the CTAB DNA extraction kit (NR-502, KURABO, Osaka, Japan) and DNA extraction robot PE-480 (GENE PREP STAR, KURABO). DNA from another 40 samples was extracted by the bead-based method of the BioSprint 96 DNA Plant Kit on robotic workstation (QIAGEN, Hilden, Germany). From the other 70 samples, DNA was extracted from the seed tissue using the BioSprint 96 DNA Plant Kit on robotic workstation, according to the manufacturer’s instruction (QIAGEN). The seed tissue samples were obtained by scraping dried seed and crushing using Zirconia beads and TissueLyser II (QIAGEN). The quality of extracted DNA was evaluated based on the DNA integrity number, which is an index showing the fragmentation degree of DNA using TapeStation (Agilent Technologies, Santa Clara, CA, USA). The DNA concentration was measured using the Qubit Fluorometer (Thermo Fisher Scientific) by exciting at 485 nm and measuring the fluorescence intensity at 520 nm. The instrument was calibrated with the Quant-iT dsDNA BR Assay kit (Thermo Fisher Scientific), according to the manufacturer’s instructions.

2.2. Ion AmpliSeq custom panel design

A custom panel targeting 29 genes was designed based on Soybean reference genome version 1.1 using the Ion AmpliSeq Designer tool version 1.2.9 using the standard DNA (125–275 bp amplicon target sizes) option. Two primer pools were designed to amplify 382 amplicons, covering 29 target genes of total length 64.98 kb (Table 1 and Supplementary Tables S2 and S3). These included the coding regions of B3 domain containing genes (E1 and homologue), Phytochrome A genes (E3 and E4), Phytochrome B genes, FT/TERMINAL FLOWER 1 (TFL1) family genes (including FT2a, FT5a, and Dt1), two-component response regulator-like genes, MADS box gene, WD repeat-containing gene (WDR61), Achaete-scute transcription factor gene, and a part of the exon of GIGANTEA (E2).

Table 1

Genomic regions or SNP targeted by the AmpliSeq design

	Gmax275 (ver. 2.0)				Gmax189 (ver. 1.1)			Description^a	Gene name^b	Target	Total designed target length of amplicons^c	Target size^d	Number of amplicon^e	Coverage^f (%)	Reference
Gene ID	Chr.	Start	End	Gene ID	Chr.	Start	End	Description^a	Gene name^b	Target	Total designed target length of amplicons^c	Target size^d	Number of amplicon^e	Coverage^f (%)	Reference
Glyma.02G069500	2	6116379	6117379	Glyma02g07650	2	6041770	6042409	FLOWERING LOCUS T	GmFTL7	Exon + UTR	861	639	6	100.0	37, 38
Glyma.03G194700	3	40522597	40525110	Glyma03g35250	3	42534078	42535865	TERMINAL FLOWER 1	GmTFL2	Exon + UTR	1,577	1,003	9	100.0	37, 38
Glyma.03G227300	3	42918771	42923401	Glyma03g38620	3	44925730	44930360	Phytochrome A	GmPHYA4	Exon + UTR	3,688	2,876	20	98.6
Glyma.04G143300^g	4	26120011	26120532	Glyma18g22670	18	25739929	25740831	B3 domain-containing protein^h	E1Lb	Exon + UTR	1,037	902	5	100.0	11, 54
Glyma.04G156400	4	36758125	36758770	Glyma04g24640	4	28293933	28294806	B3 domain-containing protein^h	E1La	Exon + UTR	967	873	5	100.0	11, 54
Glyma.06G207800	6	20207077	20207940	Glyma06g23026	6	20006928	20007814	B3 domain-containing protein^h	E1	Exon + UTR	1,059	886	6	100.0	3, 4, 11
Glyma.08G363100	8	47458142	47459829	Glyma08g47810	8	46606934	46608654	FLOWERING LOCUS T	GmFT4	Exon + UTR	1,300	746	8	100.0	20
Glyma.08G363200	8	47472881	47473362	Glyma08g47823	8	46621704	46622185	FLOWERING LOCUS T	GmFT6	Exon + UTR	612	265	3	100.0	37, 38
Glyma.09G035500	9	2960395	2967229	Glyma09g03990	9	2919887	2926740	Phytochrome B	GmPHYB1	Exon + UTR	5,150	4,336	27	99.7	32
Glyma.09G143500	9	35652219	35653967	Glyma09g26550	9	33049107	33050904	BROTHER OF FT AND TFL 1	GmTFL4	Gene	1,813	1,122	11	98.7	37
Glyma.10G141400	10	37489560	37495624	Glyma10g28170	10	36962521	36968813	Phytochrome A		Exon + UTR	5,212	4,065	29	94.6	14
Glyma.10G221500	10	45294735	45316121	Glyma10g36600	10	44732730		GIGANTEA	E2	SNP	222	1	1	100.0	3, 4, 12
Glyma.12G073900	12	5508365	5522772	Glyma12g07861	12	5496565	5511828	Two-component response regulator-like		Exon + UTR	4,116	2,871	24	100.0
Glyma.15G140000	15	11435551	11442683	Glyma15g14980	15	11415495	11422656	Phytochrome B		Exon + UTR	5,445	4,542	28	98.1	32
Glyma.16G044100	16	4135885	4137742	Glyma16g04830	16	4115033	4116923	FLOWERING LOCUS T	GmFT5a/GmFTL4	Exon + UTR	1,683	1,109	9	98.0	15, 19, 37
Glyma.16G044200	16	4162525	4164824	Glyma16g04840	16	4141774	4144073	FLOWERING LOCUS T	GmFT3a/GmFTL1	Exon + UTR	1,031	486	6	92.7	15, 37
Glyma.16G150700	16	31109999	31114963	Glyma16g26660	16	30741660	30746677	FLOWERING LOCUS T	GmFT2a/GmFTL3	Exon + UTR	1,515	935	9	99.4	9, 15, 18, 33, 37
Glyma.16G151000	16	31148829	31151842	Glyma16g26690	16	30780496	30783509	FLOWERING LOCUS T	GmFT2b/GmFTL5	Exon + UTR	860	464	5	88.0	15, 37
Glyma.16G196300	16	35777815	35779317	Glyma16g32080	16	35274147	35275762	BROTHER OF FT AND TFL 1	GmTFL3	Exon + UTR	1,554	1,038	6	99.0	37
Glyma.16G200700	16	36179891	36187469	Glyma16g32540	16	35676581	35684221	MADS box protein		Exon + UTR	2,534	1,095	15	98.8	48, 49
Glyma.17G052100	17	3955518	3958432	Glyma17g05990	17	4225839	4228888	WD repeat-containing protein 61	WDR61	Exon + UTR	1,743	1,318	9	100.0	50
Glyma.17G090500	17	7052506	7053858	Glyma17g09810	17	7317271	7318756	Achaete-scute transcription factor-relatedⁱ		Exon + UTR	1,594	1,166	9	100.0
Glyma.19G108100	19	36030632	36032867	Glyma19g28390	19	35849981	35852216	FLOWERING LOCUS T	GmFT3b/GmFTL2	Exon + UTR	952	463	5	100.0	15, 37
Glyma.19G108200	19	36049111	36051851	Glyma19g28400	19	35868460	35871203	FLOWERING LOCUS T	GmFT5b/GmFTL6	Exon + UTR	1,113	560	6	90.0	15, 37
Glyma.19G194300	19	45183357	45185175	Glyma19g37890	19	44979743	44981657	TERMINAL FLOWER 1	Dt1/GmTFL1	Exon + UTR	1,411	1,078	8	100.0	37, 53
Glyma.19G224200	19	47633059	47641958	Glyma19g41210	19	47511095	47520052	Phytochrome A	E3	Exon + UTR	5,235	4,400	27	98.6	3
Glyma.19G260400	19	50364718	50369677	Glyma19g44970	19	50244046	50249070	Pseudo-response regulator 5		Gene	5,673	4,941	31	98.3
Glyma.20G090000	20	33236018	33241692	Glyma20g22160	20	32087412	32093306	Phytochrome A	E4	Exon + UTR	4,871	4,076	25	99.2	3, 14
Glyma.U034500^g	Scaffold_32	197150	220019	Glyma11g15580	11	11232271	11255186	Two-component response regulator-like		Exon + UTR	5,352	3,833	30	98.1

Gene description from Phytozome 12.

Gene name refers to Kong et al., Wu et al., Fan et al., and Cao et al.

Total size of amplified region by designed amplicon primers.

Target size (bp) of amplicon based on soybean genome version 1.1 (Gmax189).

Total number of designed amplicons on target gene.

Percentage of target region covered by amplicon.

Glyma.04G143300 and Glyma.U034500on Gmax275 genome version were different chromosome positions on Gmax189.

The gene annotation manually curated.

‘Achaete-scute transcription factor related’ gene was included as control for variant detection.

Genomic regions or SNP targeted by the AmpliSeq design Gene description from Phytozome 12. Gene name refers to Kong et al., Wu et al., Fan et al., and Cao et al. Total size of amplified region by designed amplicon primers. Target size (bp) of amplicon based on soybean genome version 1.1 (Gmax189). Total number of designed amplicons on target gene. Percentage of target region covered by amplicon. Glyma.04G143300 and Glyma.U034500on Gmax275 genome version were different chromosome positions on Gmax189. The gene annotation manually curated. ‘Achaete-scute transcription factor related’ gene was included as control for variant detection.

2.3. Library preparation and sequencing

The Qubit dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific) was used to quantify DNA for NGS library construction. The NGS library was constructed using the Ion AmpliSeq Library Kit 2.0 (Thermo Fisher Scientific), according to the manufacturer’s protocol (Japanese version corresponding to Manual 2014.7 rev.B.0 version: https://assets.thermofisher.com/TFS-Assets/LSG/manuals/MAN0013432_Ion_AmpliSeq_Library_Prep_on_Ion_Chef_UG.pdf, 25 March 2019, date last accessed). For the multiplex-PCR amplification, 1–10 ng of each DNA was amplified using one primer pool (191 amplicon primer pairs) per reaction. This was performed using 4 µl of 5× Ion AmpliSeq HiFi Master Mix, 10 µl of 2× AmpliSeq Custom Primer Pool, 1–10 ng of DNA, and the volume was made up to 20 µl with nuclease-free water. The reaction mix was heated for 2 min at 99°C for enzyme activation, followed by 18 two-step cycles at 99°C for 15 s and at 60°C for 4 min, and ending with a holding period at 10°C. As for low-quality DNA samples, 21 cycles were subjected under similar conditions. The primers of amplicons were digested and phosphorylated for adapter ligation using 2 µl of FuPa enzyme per sample at 55°C for 10 min, followed by enzyme inactivation at 60°C for 20 min. To enable multiple libraries to be loaded per chip, 2 µl of a unique diluted mix, including Ion Xpress Barcode and Ion P1 Adapters at standard volumes, was ligated to the end of the digested amplicons using 2 µl of DNA ligase at 22°C for 30 min, followed by ligase inactivation for 10 min at 72°C. The resulting un-amplified adapter-ligated library was purified using 45 µl of Agencourt AMPure XP Reagent (Beckman Coulter, Brea, CA, USA), followed by washing using 150 µl of freshly prepared 70% ethanol. After purification, 50 µl of Platinum PCR SuperMix High Fidelity and 2 µl of Library Amplification Primer Mix of the Ion AmpliSeq Library Kit 2.0 were added to the dried AMPure XP beads, and then the reaction plate was placed on a magnetic rack to separate the beads from the supernatant. The amplicon library in the supernatant was further amplified to enrich the material for accurate quantification at 98°C for 2 min, followed by five two-step cycles at 98°C for 15 s and at 60°C for 1 min. The amplified amplicon library was then purified using 25 µl of AMPure XP, followed by a second purification step with 60 µl of AMPure XP and 150 µl of freshly prepared 70% ethanol. The concentration and size distribution of amplicons in the library were then determined using an Agilent BioAnalyzer DNA High-Sensitivity chip or TapeStation 4200 D1000 chip (Agilent Technologies), according to the instruction of the manufacturer. After quantification, each library was diluted to a concentration of 100 pM prior to template preparation. Subsequently, the libraries were pooled in equimolar amounts prior to further processing. Emulsion PCR, emulsion breaking, and enrichment for template preparation of ion sphere particles were performed using the Ion 520 & 530 and 540 Kit-Chef (Thermo Fisher Scientific) according to the instruction of the manufacturer. After the preparation of ion sphere particles, sequencing was performed with an Ion Torrent Ion S5 or S5XL system using Ion 520 and 540 Chip (Thermo Fisher Scientific), according to the instruction of the manufacturer.

2.4. Data analysis

The Ion S5/S5XL sequence data were mapped to the soybean genome reference version 2.0 (Gmax275: https://genome.jgi.doe.gov/portal/pages/dynamicOrganismDownload.jsf?%20organism=Phytozome, 25 March 2019, date last accessed) using Ion Torrent Suite version 5.2.1 software. In typical genome databases of soybean (Williams 82), such as, Phytozome and Soybase, Gmax275 is widely used instead of Gmax189. The assembly size and number of predicted protein-coding loci of Gmax275 are 978 Mb and 56,044, which are higher than 969.6 Mb and 46,430 of Gmax189, respectively. In the Gmax275 assembly, several genes are located on chromosomes/scaffolds different from those of Gmax189. For example, 238 genes on the chromosomes of Gmax189 are located on the scaffolds of Gmax275, whereas 100 genes on the scaffolds of Gmax189 are located on the chromosomes of Gmax275 (http://www.soybase.org/correspondence/methods.txt, 25 March 2019, date last accessed). In this study, one of the target genes, Glyma18g22670 (B3 domain-containing protein), on chromosome 18 of Gmax189 was located on chromosome 4 (Glyma.04G143300) in Gmax275. However, the structure of all target genes used in this study was the same between Gmax189 and Gmax275. Ion Torrent Suite software was optimized for Ion Torrent raw data analysis; alignment using Torrent Mapping Alignment Program (TMAP) version 5.2.25 and Coverage Analysis v.5.8.0.8, and variant calling using Torrent Variant Caller version 5.2.1.38 and plug-in version 5.2.25. To evaluate PCR amplification efficiency of each amplicon, the amplicons per 100k reads mapped (APKM) as the count scaled by the total number of amplicons sequenced N times per 100k reads as follows: where X represents the read coverage X of target amplicon i. Variant calling was performed using the default (low stringency) and custom parameters (Supplementary Table S4). All accessions used in this study were propagated by the single seed descent method; therefore, all variants should be detected as homozygous theoretically. The parameter of TMAP in Torrent Variant Caller was changed to loosen the judgment condition of homozygous by setting ‘snp_min_allele_freq’ from 0.15 (default) to 0.3. In this condition, the SNP was detected with allele frequency of >70% as homozygous. The InDels were detected as homozygous when allele frequency was >75% (default parameter). Sequence variants detected as heterozygous under these conditions were excluded. The vcf files obtained were annotated and filtered using the snpEff version 4.0e.

2.5. Detection of known and novel SNPs and InDels

The known alleles E1, E2, E3, and E4 were investigated as described above. The polymorphism information from the Single Nucleotide Polymorphism Database (dbSNP: https://www.ncbi.nlm.nih.gov/snp, 25 March 2019, date last accessed, data downloaded on 29 April 2016) was used to investigate whether the detected polymorphism has already been identified in the whole-genome sequence of soybean. As the position of soybean genome in the National Center for Biotechnology Information (NCBI) and Phytozome v12.1 are not consistent, we converted the position of SNP in the dbSNP from the NCBI to that of Gmax275 of Phytozome v12.1. The SNP ID number (rs; refSNP cluster) was used for the SNP name.

2.6. Validation of SNPs and InDels

The detected variants of the Phytochrome A (E3, E4) and FT genes (FT2a and FT5a) were further confirmed by Sanger sequencing. The exon containing the novel variants was amplified by PCR using the primers shown in Supplementary Table S5. The PCR product was purified using Affymetrix ExoSap-IT regent (ExoSap-IT, USB Corporation, Staufen, Germany) and directly sequenced for both sense and antisense strands using Big Dye Terminator version 3.1 (Applied Biosystems, Foster City, CA, USA) in an ABI 3500 Genetic Analyzer (Applied Biosystems), according to the manufacturer’s protocol. The sequences were analysed using Genetics software version 10.0.8 (GENETYX Corp., Japan).

2.7. Gene-based multiple regression association testing for flowering time

Flowering time was evaluated from 2011 to 2013 at the National Institute of Crop Science (36°02′N, 140°11′E), Tsukuba, Japan. Seeds were sown on 12 July 2011, and 10 July 2012 and 2013. A starter fertilizer containing 3, 10, and 10 g m−2 of N, P2O5, and K2O, respectively, was applied. Each accession was planted in single-row plots. Each row was paved 0.7 m apart and each plot comprised 12 plants that were spaced 0.13 m apart. The average days to flowering in each plot were used for analysis. Association between days to flowering and each polymorphic SNPs/InDels was assessed using linear regression, where the simulated trait values across the 190 individuals were regressed onto the numeric code of each SNP and InDel genotype; this tested the null hypothesis of the additive allelic effect on the trait. Regression analyses were performed using ‘lm()’ in R. First, simple linear regression analysis was performed to assess the influence of the detected variant on flowering time at the significance level of P < 0.05. Subsequently, multiple linear regression analysis was performed using the significantly representative variants after removing redundant variants at the significance level of P < 0.05.

3. Results and discussion

3.1. Amplicon design and comparison of library quality using DNA samples derived from the leaf and seed

To evaluate the performance of AmpliSeq, we focused on gene region of 29 genes (Table 1) selected from known genes related to flowering time and their homologues in soybean. A total of 382 amplicon primer pairs consisting of two primer pools (Supplementary Table S3) were designed for the 64.98-kb target region using the AmpliSeq designer tool. These primer pairs covered 98.4% of the target region, ranging from 89.8% to 100%, by overlapping PCR products of total length 70,180 bp (Supplementary Table S2). The average amplicon size including primer region was 237.1 bp ranging from 125 to 275 bp (target region was 65–232 bp). The target gene with the lowest coverage (89.8%) was Glyma.16g044200 (FT-like gene). We examined the DNA quality necessary for AmpliSeq library construction because genotyping is commonly performed using low-quality DNA especially that derived from the seed of soybean for marker-assisted selection. Low-quality DNA derived from the seed was obtained at concentrations of 0.5–3 ng/µl, whereas high-quality DNA from the leaf was obtained at concentrations of 30–50 ng/µl (Fig. 1A). We used 1–10 ng of seed-derived DNA and 10 ng of leaf-derived DNA for preparing AmpliSeq library. To confirm whether low-quality DNA can produce a library of sufficient quality, distribution range of amplicons in the libraries prepared using low-quality DNA was compared with that of high-quality DNA, which is recommended for sequencing using the Agilent 2100 Bioanalyzer or TapeStation 4200 (Fig. 1A and B). No difference was observed between low-quality DNA from the seed and high-quality DNA from the leaf in the size range of amplicons (130–370 bp) or maximum peak amplitude (Fig. 1B and C). These results reveal that the AmpliSeq library of sufficient yield and quality can be prepared from low-quality DNA. We then prepared sufficient amount of library using DNA derived from the leaf or seed of the soybean mini-core collection.

Figure 1

Evaluation of quality of DNA and AmpliSeq library prepared from the DNA using the Agilent 2200 TapeStation system. The AmpliSeq libraries were evaluated using the D1000 screen tape. (A) Quality of the DNA derived from the leaf and seeds. W and E indicate Williams 82 and Enrei, respectively. The numerical assessment of DNA quality ranged from 1 to 10 based on the DNA integrity number (DIN). A high DIN indicates highly intact DNA, whereas a low DIN indicates degraded DNA. (B) Distribution of amplicons in the AmpliSeq library shown as a gel image. (C) Electropherogram of the same AmpliSeq library as shown in (B). Lower (25 bp) and upper (1,500 bp) peak are the standard markers. The middle peak indicates the library.

3.2. Performance of NGS and uniformity of amplicon coverage

Among 105,761,267 reads obtained, 105,603,249 (99.85%) reads were mapped to Williams 82 reference genome Gmax275 using TMAP and the average read depth across the target region was 1,237× (Supplementary Table S6). According to the on-target rate, 94.12% of the reads was mapped to the targeted regions. The average read length, average on-target rate, and uniformity (percent of reads >0.2× of mean coverage in the sample) of the leaf and seed samples were similar, but a few seed samples showed lower average read length and uniformity (Fig. 2A). The highly fragmented DNA sample showed low amplification of long amplicons (> 200 bp) and low uniformity (Supplementary Fig. S1A and B). Low average amplicon length or low uniformity of the samples might be caused by DNA fragmentation or contamination of the DNA solution.

Figure 2

Comparison of sequence performance metrics classified by the plant materials used for DNA extraction. (A) The box plot of average read length, on-target rate, and uniformity. The on-target rate is on-target percent of the aligned reads. Uniformity is the percent of bases in all the amplicon-targeted regions covered by at least 0.2× the mean base read depth. (B) Average normalized reads (APKM) per sample across 382 amplicons generated from the 192 mini-core collection. The X and Y axes indicate 382 amplicons sorted in their read coverage and APKM shown as the mean on a log scale, respectively. To compare the efficiency of PCR amplification of each amplicon, the APKM was calculated for each amplicon. The magnitude of APKM was similar irrespective of the type of DNA sample between the leaf and seed (Supplementary Fig. S2A). In contrast, there was no relationship between the APKM and read length of amplicons (Supplementary Fig. S2B). The APKM ranged from 0 to 3,333. Only one amplicon (AMPL1040290) had zero read. As the primer pair of AMPL1040290 was designed for the region flanking the TA-repeat microsatellite, it is difficult to amplify by multiplex PCR. We could amplify amplicons of 400–450 bp using the single primer pair of AMPL1040290. We evaluated known alleles as an example to verify whether the reads were correctly mapped. Among two alleles with a large deletion, e3-Mo and e3-tr, at the E3 gene on Chr19, the read mapping status of the e3tr allele, with a 15-kb deletion including the fourth exon, was examined by designing four amplicons, viz., AMPL1040313, AMPL1040314, AMPL1037722, and AMPL1036854 (Fig. 3). The mapped reads from Williams 82 covered the entire fourth exon by the four amplicons (Fig. 3A). In contrast, the reads from PGC010 were mapped to a part of the fourth exon, which could not be mapped because of absence of the fourth exon in PGC010. When we confirmed the sequence of the mapped reads, these are found to contain several polymorphic sites originated from another region (Supplementary Fig. S3A). The primer pair of AMPL1040313 was found to have a similar (2- and 1-bp mismatches in the forward and reverse primers) sequence to that of the E3 homologous gene on Chr3 (Supplementary Fig. S3B). A comparison of sequence of the original amplicon designed for the E3 gene on Chr19 with that designed for the phyA gene on Chr3 revealed that the amplicon of phyA could be mapped preferentially to E3 in the absence of sequence information (Supplementary Fig. S3C). To preferentially output the alignment containing indel, we changed the penalty parameter by using the option ‘-A 10 -M 60 -O 50 -E 1’ to TMAP. The default parameters of TMAP option are as follows: ‘-A’, score for a match [default = 1]; ‘-M’, the mismatch penalty [3]; ‘-O’, the indel start penalty [5]; and ‘-E’, the indel extension penalty [2]. By increasing the penalty values related to base match and InDels, the miss-mapped reads on the fourth exon can be reduced from 2,426 (default parameter) to 3 reads (Fig. 3A). By optimizing these parameters, the miss-mapped reads on the second exon of E3 were also mapped to the correct position, phyA on Chr3 (second exon; Fig. 3A and Supplementary Fig. S4). We also investigated whether the 15-kb deletion can be detected using the read coverage. The APKM of the four amplicons located at the fourth exon of E3 was compared between the E3 and e3tr alleles (Fig. 3B). The number of accessions classified as E3 and e3tr by additional marker analysis was 152 and 36, respectively (Supplementary Table S7). The APKM of four amplicons on the fourth exon of the e3tr allele was almost zero, whereas the APKM of the E3 allele varied depending on the accessions, and it was difficult to judge the presence or absence of deletion from the APKM of each amplicon. However, it was possible to classify the presence or absence of deletion clearly (Fig. 3B and Supplementary Table S7) when the average APKM of the four amplicons was used instead, because differences in amplification efficiency due to sequence variation at the priming site can be cancelled using the APKM of multiple amplicons.

Figure 3

Distribution of mapped sequence reads of two different alleles in the E3 gene region. (A) Top: Read coverage of Williams 82 with the default parameters of TMAP. Bottom: Read coverage of PGC010 with the default and optimized parameters of TMAP. The E3 gene of PGC010 has a large deletion in the fourth exon. (B) APKM of amplicons from variant (e3T) and wild type (WT) on the fourth exon of E3. Average APKM of four amplicons (left most) and APKM of each amplicon—(a) AMPL1040313, (b) AMPL1040314, (c) AMPL1037722, and (d) AMPL1036854. As described above, appropriate parameters are required to map short amplicon reads to the correct genomic region. As the AmpliSeq technology has been mainly used in animals in which palaeopolyploidy is considerably rare, no such limitation has been reported. Soybean is an ancient tetraploid, which underwent two whole-genome duplications (palaeopolyploidy); most of the genes have paralogous genes with multiple copies. The information provided above would be useful when the AmpliSeq technology is applied to plant species, which have experienced whole-genome duplication or triplications.

3.3. Detected variants of flowering time-related genes

A total of 192 soybean mini-core collection was analysed to detect novel variants in flowering time-related genes by AmpliSeq. Among the 461 variants (SNPs or InDels) detected in the target regions, 311 (67.5%) sites have already been reported or registered in dbSNP, whereas 150 sites (32.5%) were novel (Table 2 and Supplementary Table S8). The variants detected were compared in depth with information of flowering time-related genes, E1, E2, E3, E4, FT2a, FT5a, and their homologues.,, Further, we performed linear regression analysis to detect responsible variants associated with flowering time under long-day field conditions. Among the 461 variants, 207, 206, and 219 were found to be potentially associated with flowering time in 2011, 2012, and 2013 by the simple linear regression analysis, respectively (Table 3 and Supplementary Table S8). The most significantly associated variant with flowering time was SNP (rs124971350) at E2 (e2 allele) (P < 2.0e−16 for 3 yrs) (Table 3 and Supplementary Table S8). The second most significant variant was a large deletion in E3 (e3-tr allele) (P < 1.9e−13, 1.6e−13, and 5.2e−14 for 2011, 2012, and 2013, respectively). The other seven genes, WD repeat-containing protein 61, Dt1, MADS-box protein, two genes of two-component response regulator-like genes, FT2a, and PhyB, showed highly significant association with flowering time (P < 0.0001) (Table 3 and Supplementary Table S8).

Table 2

Classification of detected variants of 28 flowering time-related genes by snpEff analysis

Glyma ID	Description	Gene symbol	Number of variants in total	Number of novel variants	Number of known variants	Number of missense variants	Number of ‘high’ effect variants to gene function^a
Glyma.02G069500	FLOWERING LOCUS T	GmFTL7	10	2	8	0	0
Glyma.03G194700	TERMINAL FLOWER 1	GmTFL2	2	0	2	0	0
Glyma.03G227300	Phytochrome A	GmPHYA4	70	5	65	35	5
Glyma.04G143300	B3 domain-containing protein	E1Lb	3	1	2	0	0
Glyma.04G156400	B3 domain-containing protein	E1La	4	0	4	0	0
Glyma.06G207800	B3 domain-containing protein	E1	5	2	3	3	2
Glyma.08G363100	FLOWERING LOCUS T	GmFT4	9	2	7	0	0
Glyma.08G363200	FLOWERING LOCUS T	GmFT6	6	1	5	0	0
Glyma.09G035500	Phytochrome B	GmPHYB1	10	6	4	2	0
Glyma.09G143500	BROTHER OF FT AND TFL 1	GmTFL4	1	1	0	0	0
Glyma.10G141400	Phytochrome A		20	8	12	3	1
Glyma.10G221500	GIGANTEA	E2	2	1	1	1	1
Glyma.12G073900	Two-component response regulator-like		18	3	15	5	1
Glyma.15G140000	Phytochrome B		27	9	18	7	2
Glyma.16G044100	FLOWERING LOCUS T	GmFT5a/GmFTL4	14	5	9	0	0
Glyma.16G044200	FLOWERING LOCUS T	GmFT3a/GmFTL1	8	2	6	1	0
Glyma.16G150700	FLOWERING LOCUS T	GmFT2a/GmFTL3	22	13	9	2	1
Glyma.16G151000	FLOWERING LOCUS T	GmFT2b/GmFTL5	9	3	6	1	0
Glyma.16G196300	BROTHER OF FT AND TFL 1	GmTFL3	20	5	15	1	0
Glyma.16G200700	MADS box protein		45	15	30	7	0
Glyma.17G052100	WD repeat-containing protein 61	WDR61	15	1	14	0	1
Glyma.17G090500	Achaete-scute transcription factor-related		17	10	7	0	6
Glyma.19G108100	FLOWERING LOCUS T	GmFT3b/GmFTL2	0	0	0	0	0
Glyma.19G108200	FLOWERING LOCUS T	GmFT5b/GmFTL6	11	3	8	1	0
Glyma.19G194300	TERMINAL FLOWER 1	Dt1/GmTFL1	14	5	9	5	0
Glyma.19G224200	Phytochrome A	E3	16	8	8	5	3
Glyma.19G260400	Pseudo-response regulator 5		42	11	31	8	0
Glyma.20G090000	Phytochrome A	E4	16	2	14	1	0
Glyma.U034500	Two-component response regulator-like		26	26	0	5	7

High-impact variant to gene function includes criteria of stop lost, stop gained, and frameshift.

Table 3

DNA polymorphisms in nine genes significantly associated with flowering time variation in the mini-core collection as revealed by the single linear regression analysis

Glyma ID	Gene description	Gene symbol^a	Chromosome	Position (bp)	Reference sequence	Alternative sequence	SNP name	P value of association tests			Functional effect	Amino acid change	Functional class
Glyma ID	Gene description	Gene symbol^a	Chromosome	Position (bp)	Reference sequence	Alternative sequence	SNP name	2011	2012	2013	Functional effect	Amino acid change	Functional class
Glyma.06G207800	B3 domain- containing protein	E1	Chr06	20207322	C	G	rs123612969 (e1-as)	0.0226	0.0153	0.0333	Missense_variant	p.Thr75Arg/c.224C>G	MISSENSE
Glyma.10G221500	GIGANTEA	E2	Chr10	45310798	A	T	rs124971350 (e2)	2E-16	2E-16	2E-16	Stop-gained	p.Lys527*/c.1582A>T	HIGH

Glyma.12G073900	Two-component response regulator-like	-	Chr12	5508242	T	G	rs125308101	0.00258	0.0082	0.0019	Upstream_gene_ variant
			Chr12	5508672	CT	C	rs745192414	0.00155	0.00261	0.00503	Intron_variant	c.-120 + 219delT
			Chr12	5508702	T	C	rs743387935	0.0211	0.0228	0.0109	Intron_variant	c.-119-198T>C
			Chr12	5509310	G	A	rs388619068	0.0146	0.0119	0.00907	Missense_variant	p.Asp98Asn/c.292G>A	MISSENSE
			Chr12	5509317	C	T	rs125308103	0.00107	0.00148	0.00319	Missense_variant	p.Ser100Leu/c.299C>T	MISSENSE
			Chr12	5519728	A	C	rs125308115	0.00229	0.00749	0.00169	Missense_variant	p.Lys378Gln/c.1132A>C	MISSENSE
			Chr12	5520578	A	G	rs389122657	0.014	0.0112	0.00865	Synonymous_ variant	p.Ala504Ala/c.1512A>G	SILENT
			Chr12	5520945	T	C	rs125308117	8.96E-10	1.76E-08	7.74E-11	Stop_lost	p.Ter627Glnext*?/c.1879T>C	MISSENSE
HIGH			Chr12	5521029	G	T	rs743179814	0.0244	0.0203	0.0089	3_Prime_UTR_ variant	c.*82G>T
Glyma.15G140000	Phytochrome B	-	Chr15	11436193	G	A	rs388435741	5.17E-06	7.93E-06	8.47E-06	3_Prime_UTR_ variant	c.*218C>T
			Chr15	11441207	C	T	rs126279495	5.17E-06	7.93E-06	8.47E-06	Missense_variant	p.Val394Ile/c.1180G>A	MISSENSE
			Chr15	11442400	A	C	rs126279502	0.0753	0.0769	0.0431	5_Prime_UTR_ variant	c.-14T>G
Glyma.16G150700	FLOWERING LOCUS T	GmFT2a/GmFTL3	Chr16	31110004	TATAAGAAAGC	T	rs392064733_1	0.0397	0.117	0.0266	5_Prime_UTR_ variant	c.-50_-59delATAAGAAAGC
			Chr16	31110004	TATAAGAAAGCA	TA	rs392064733_2	0.0397	0.117	0.0266	5_Prime_UTR_ variant	c.-49_-58delTAAGAAAGCA
			Chr16	31110991	A	AATAT	rs864598505_1	6.24E-05	0.000274	0.000111	Intron_variant	c.202-56_202-55insATAT
			Chr16	31111033	T	A	rs126829817	6.24E-05	0.000274	0.000111	Intron_variant	c.202-15T>A
			Chr16	31111042	T	C	rs126829818	0.0697	0.21	0.0419	Splice_region_ variant	c.202T>C
			Chr16	31111349	A	G	rs126829819	6.24E-05	0.000274	0.000111	Intron_variant	c.304 + 72A>G
			Chr16	31114633	G	A	Chr16_31114633	5.46E-06	4.14E-06	3.05E-07	Missense_ variant	p.Gly169Asp/c.506G>A	MISSENSE
			Chr16	31114658	AGA	AA	Chr16_31114658_2	0.0765	0.0538	0.043	3_Prime_UTR_ variant	c.*1delG
			Chr16	31114930	G	T	rs126829846	6.24E-05	0.000274	0.000111	3_Prime_UTR_ variant	c.*272G>T
Glyma.16G200700	MADS box protein	-	Chr16	36179909	T	G	rs126888526	1.13E-07	3.18E-06	3.35E-07	3_Prime_UTR_ variant	c.*218A>C
			Chr16	36180002	G	A	rs126888527	1.09E-07	1.63E-06	1.98E-07	3_Prime_UTR_ variant	c.*125C>T
			Chr16	36180087	T	TGA	Chr16_36180087	8.8E-08	2.16E-06	2.07E-07	3_Prime_UTR_ variant	c.39_40insTC
			Chr16	36180122	A	C	rs126888528	4.19E-07	9.01E-06	1.03E-06	3_Prime_UTR_ variant	c.*5T>G
			Chr16	36180183	T	A	rs389022394	4.96E-07	1.03E-05	1.32E-06	Missense_variant	p.Thr219Ser/c.655A>T	MISSENSE
			Chr16	36182854	T	TA	Chr16_36182854	6.15E-06	4.11E-05	3.79E-06	Intron_variant	c.511-49_511-50insT
			Chr16	36182857	T	A	rs126888574	8.41E-07	7.3E-06	5.2E-07	Intron_variant	c.511-52A>T
			Chr16	36183423	T	G	rs126888580_2	6.9E-06	5.27E-06	2.61E-06	Intron_variant	c.427-19A>C
			Chr16	36183426	G	C	rs126888581_1	0.00286	0.00159	0.0018	Intron_variant	c.427-22C>G
			Chr16	36183435	G	C	rs126888582	6.8E-06	5.07E-06	2.66E-06	Intron_variant	c.427-31C>G
			Chr16	36183450	G	A	rs126888583	0.0104	0.00721	0.00804	Intron_variant	c.427-46C>T
			Chr16	36183510	C	T	rs126888584	0.000219	0.00275	0.000461	Intron_variant	c.427-106G>A
			Chr16	36183518	A	G	rs126888585	3.22E-10	9.24E-09	3.46E-10	Intron_variant	c.427-114T>C
			Chr16	36183541	C	A	rs126888586	2.98E-10	8.83E-09	3.27E-10	Intron_variant	c.427-137G>T
			Chr16	36183556	T	C	Chr16_36183556	0.0507	0.0363	0.048	Intron_variant	c.427-152A>G
			Chr16	36183568	T	A	rs126888588	0.00193	0.0172	0.00328	Intron_variant	c.427-164A>T
			Chr16	36183573	T	A	rs126888589	0.00193	0.0172	0.00328	Intron_variant	c.427-169A>T
			Chr16	36184165	T	C	rs126888605	4.71E-05	3.54E-05	0.000019	Intron_variant	c.327-41A>G
			Chr16	36184729	T	C	rs126888609	2.18E-09	2.34E-08	6.84E-10	Missense_variant	p.Thr79Ala/c.235A>G	MISSENSE
			Chr16	36184733	C	T	rs126888610	2.18E-09	2.34E-08	6.84E-10	Synonymous_ variant	p.Ser77Ser/c.231G>A	SILENT
			Chr16	36184819	A	T	rs126888611	0.0341	0.0817	0.0244	Intron_variant	c.183-38T>A
			Chr16	36187211	A	C	rs744233319	0.000287	0.000744	0.00118	Synonymous_ variant	p.Ser36Ser/c.108T>G	SILENT
			Chr16	36187552	T	G	rs126888636	0.0383	0.0315	0.0302	Upstream_gene_ variant
			Chr16	36187600	A	G	rs126888637	0.0383	0.0315	0.0302	Upstream_gene_ variant
Glyma.17G052100	WD repeat-containing protein 61	WDR61	Chr17	3955280	A	AT	rs126942305_1	0.00214	0.00191	0.00197	Downstream_gene_ variant
			Chr17	3955411	A	G	rs126942313	2.56E-10	3.12E-10	1.07E-10	Downstream_gene_ variant
			Chr17	3955475	TC	AA	rs126942314_1	0.00359	0.00645	0.00279	Downstream_gene_ variant
			Chr17	3955476	C	A	rs126942315	0.00359	0.00645	0.00279	Downstream_gene_ variant
			Chr17	3955546	T	A	rs126942316	0.00374	0.00157	0.00264	3_Prime_UTR_ variant	c.*125A>T
			Chr17	3955554	T	C	rs126942317	0.031	0.0299	0.0358	3_Prime_UTR_ variant	c.*117A>G
			Chr17	3955716	A	G	rs388258139	0.00418	0.00761	0.00338	Synonymous_ variant	p.Ala307Ala/c.921T>C	SILENT
			Chr17	3955763	C	CG	Chr17_3955763	0.00444	0.00194	0.00318	Frameshift_ variant	p.Val291_Ala292fs/c.873_874insC	HIGH
			Chr17	3955764	A	G	rs126942318	0.00374	0.00157	0.00264	Synonymous_ variant	p.Val291Val/c.873T>C	SILENT
			Chr17	3955884	G	C	rs126942319	0.00374	0.00157	0.00264	Synonymous_ variant	p.Val251Val/c.753C>G	SILENT
			Chr17	3956142	T	C	rs126942321	6.36E-08	3.72E-08	1.25E-08	Synonymous_ variant	p.Ala165Ala/c.495A>G	SILENT
			Chr17	3956163	T	C	rs126942322	0.00644	0.00295	0.00466	Synonymous_ variant	p.Lys158Lys/c.474A>G	SILENT
			Chr17	3958319	C	G	rs126942340	9.03E-10	4.86E-10	1.98E-10	Synonymous_ variant	p.Ser16Ser/c.48G>C	SILENT
			Chr17	3958374	T	G	rs126942341	2.32E-10	9.96E-11	5.12E-11	5_Prime_UTR_ variant	c.-8A>C
Glyma.19G194300	TERMINAL FLOWER 1	Dt1/GmTFL1	Chr19	45183701	T	A	rs127928573 (dt1)	0.00122	0.00587	0.000401	Missense_variant	p.Arg166Trp/c.496A>T	MISSENSE
			Chr19	45183808	C	T	rs745009806	0.0319	0.0184	0.0153	Missense_variant	p.Arg130Lys/c.389G>A	MISSENSE
			Chr19	45183859	G	A	rs127928574	4.29E-11	3.71E-11	6.02E-10	Missense_variant	p.Pro113Leu/c.338C>T	MISSENSE
			Chr19	45184581	G	T	Chr19_45184581	0.014	0.153	0.0428	Splice_region_ variant	c.202C>A
			Chr19	45185131	C	T	Chr19_45185131	0.00805	0.00351	0.00764	5_Prime_UTR_ variant	c.-142G>A
Glyma.19G224200	Phytochrome A	E3	Chr19	47633086	T	TA	Chr19_47633086	0.000763	0.000353	0.00085	5_Prime_UTR_ variant	c.-693_-694insA
			Chr19	47634596	A	G	rs127944661	0.0052	0.00108	0.00429	Synonymous_ variant	p.Ser43Ser/c.129A>G	SILENT
			Chr19	47635025	C	T	rs388644281	3.18E-05	3.43E-05	3.69E-05	Synonymous_ variant	p.Ile186Ile/c.558C>T	SILENT
			Chr19	47635737	C	A	rs389001110	0.0534	0.052	0.048	Missense_variant	p.Leu424Ile/c.1270C>A	MISSENSE
			Chr19	47636564	G	A	rs127944664	0.0098	0.00232	0.00888	Intron_variant	c.2074 + 23G>A
			Chr19	47636607	G	T	rs127944665	0.0127	0.0031	0.0113	Intron_variant	c.2074 + 66G>T
			Chr19	47637258	A	G	rs393405985_1	0.044	0.0447	0.0427	Missense_variant	p.Thr832Ala/c.2494A>G	MISSENSE
			Chr19	47638302	G	A	Chr19_47638302 (e3-Mo)	0.0433	0.0306	0.0485	Missense_variant	p.Gly1050Arg/c.3148G>A	MISSENSE
			Chr19	47638344	A	T	rs389636522	0.00508	0.00113	0.00433	Splice_region_ variant	c.3183A>T
			Chr19	47641562		15 kb deletion	e3-tr	1.9E-13	1.61E-13	5.15E-14	Loss of exon 4		HIGH
Glyma.U034500	Two-component response regulator-like	-	Scaffold_32	197169	C	T	Scaffold_32_197169	0.000318	0.000771	0.000612	3_Prime_UTR_ variant	c.*958G>A
			Scaffold_32	197421	T	C	Scaffold_32_197421	0.000034	0.00007	8.54E-05	3_Prime_UTR_ variant	c.*706A>G
			Scaffold_32	197459	A	T	Scaffold_32_197459	1.23E-09	1.94E-08	2.71E-09	3_Prime_UTR_ variant	c.*668T>A
			Scaffold_32	198053	T	A	Scaffold_32_198053	2.16E-05	4.99E-05	0.000048	Intron_variant	c.*133 + 29A>T
			Scaffold_32	198773	A	AT	Scaffold_32_198773	1.85E-06	3.75E-06	2.85E-06	Frameshift_variant	p.Met737_Ala738fs/c.2209_2210insA	HIGH
			Scaffold_32	199551	A	C	Scaffold_32_199551	4.88E-09	6.54E-08	1.23E-08	Synonymous_ variant	p.Ala529Ala/c.1587T>G	SILENT
			Scaffold_32	199605	T	G	Scaffold_32_199605	2.16E-05	4.99E-05	0.000048	Intron_variant	c.1575-42A>C
			Scaffold_32	199656	TA	T	Scaffold_32_199656	0.0096	0.0104	0.011	Intron_variant	c.1574 + 2delT
			Scaffold_32	199722	C	G	Scaffold_32_199722	0.000318	0.000771	0.000612	Missense_variant	p.Gly504Ala/c.1511G>C	MISSENSE
			Scaffold_32	202754	G	A	Scaffold_32_202754	0.0142	0.01	0.0105	Stop_gained	p.Arg308*/c.922C>T
HIGH			Scaffold_32	206717	A	G	Scaffold_32_206717	1.23E-09	1.94E-08	2.71E-09	Intron_variant	c.792 + 27T>C
			Scaffold_32	216494	C	T	Scaffold_32_216494	4.88E-09	6.54E-08	1.23E-08	Synonymous_ variant	p.Val168Val/c.504G>A	SILENT
			Scaffold_32	218380	A	T	Scaffold_32_218380	3.29E-09	4.29E-08	8.4E-09	Synonymous_ variant	p.Pro73Pro/c.219T>A	SILENT

Gene names refer to those in Kong et al., Fan et al., and Cao et al.

Classification of detected variants of 28 flowering time-related genes by snpEff analysis High-impact variant to gene function includes criteria of stop lost, stop gained, and frameshift. DNA polymorphisms in nine genes significantly associated with flowering time variation in the mini-core collection as revealed by the single linear regression analysis Gene names refer to those in Kong et al., Fan et al., and Cao et al.

E1 and E1-like genes

Five alleles, E1, e1-as, e1-nl, e1-fs, and one novel missense (Chr06_20207355) were identified at E1 (Fig. 4, Supplementary Fig. S5A, and Supplementary Tables S8 and S9). In the other two B3 domain containing E1-like genes, only one synonymous variant (rs123097808) was detected in Glyma.04G156400/E1La, whereas no variant was detected in the coding region of Glyma.04G143300/E1Lb (Supplementary Fig. S5B and C). Among the five E1 alleles, the frequency of e1-as allele (Chr06:20207322 C: Williams 82 type) was 0.09 and Chr06:20207322 C to G nucleotide change (rs123612969) was 0.91 among the soybean mini-core collection. In contrast, e1-nl, which lacks the entire E1 gene, was only found in Swedish cultivar FiskebyV (PGC001) (Supplementary Table S9). This allele was determined by the read coverage at the E1 genomic region. The average normalized read coverage of all six amplicons (AMPL1037682–AMPL1037687) was three in PGC001, whereas that of these amplicons in the other accessions was 181 (ranging from 36 to 430). Additional experiments to confirm the deletion in the E1 genomic region by PCR amplification revealed that only PGC001 lacks the E1 genomic region among all accessions and possesses the e1-nl allele. Another allele, e1-fs, which had 1-bp deletion variant (Chr06_20207323) was also found in one accession PGC002 (Fig. 4, Supplementary Fig. S5A, and Supplementary Table S9). The deletion (Chr06_20207323) causes a frameshift and introduces a premature stop codon at Lys76. These loss-of-function alleles were not included in the association analysis due to very low allele frequency (only one accession each), but might explain very early flowering of PGC001 and PGC002 under long-day-length field condition. PGC002 (Wase kuro daizu) is originated from the southern part of Japan and classified as the summer-type soybean, early-maturity group in low-latitude regions of Japan. The summer-type soybean has low photoperiod sensitivity, and e1-fs can explain this characteristic. In contrast, the novel missense (Chr06_20207355) from PGC139 and PGC147 (Supplementary Fig. S5A and Supplementary Table S9) did not show large effect on flowering time and might not significantly affect the E1 function.

Figure 4

Detected variants in E1, E3, E4, FT5a, and FT2a from 192 mini-core collection. The grey and white boxes indicate UTRs and exons. The solid lines indicate 5′-uptream and intron regions. The black and grey circles indicate loss-of-function (described as ‘HIGH’ impact on the gene function in Supplementary Table S8) and missense variants. Braces indicate known large InDels. These InDels can be detected by read depth.

E2

Two alleles, E2 and e2, and one novel SNP variant (Chr10_45310686) were detected at E2 (Fig. 4, Supplementary Fig. S5D, and Supplementary Table S8). Among them, functional defective e2 allele had A to T nucleotide change (Table 3, K528*, rs124971350) and the allele frequency among the soybean mini-core collection was 0.42 (Supplementary Table S8). A novel SNP (Chr10_45310686), which causes missense variant of Ile490Met, was detected only in PGC086 with the e2 allele (Supplementary Table S9 and Supplementary Fig. S5D).

E3

Among two alleles, e3-tr and e3-Mo, detected at E3, the frequency of e3-tr allele, which has a large deletion in the fourth exon, was 0.19 (Fig. 4, Supplementary Fig. S5E, and Supplementary Tables S7 and S8). The missense variant of e3-Mo (Chr19_47638302: G to A, Gly1050Arg) in the third exon of E3 was detected in PGC019 and PGC042 (Moshidou Gong 503) derived from Korean Peninsula and China, respectively. The e3-Mo variant is not registered in dbSNP, but we found in one Chinese landrace Ni Ding Hua Mei Dou from 302 soybean re-sequence data (SRR1533240 in NCBI SRA).

E4

The e4-SORE-1 allele has a 6.2-kb insertion in the first exon. It was difficult to estimate this insertion from the read coverage of amplicon in the region. However, the presence or absence of a large insertion could be estimated from the read coverage of amplicon (AMPL1037734) at the break point of a large insertion (Supplementary Fig. S5F). The average APKM of break point was 325 in the reference type sequence, whereas it was zero in the insertion type sequence of PGC001 and PGC021 derived from Sweden and Japan, respectively. This insertion was also confirmed by the PCR. Most SNPs (11 of 13 sites) in E4 were found from PGC123 and PGC134 derived from Nepal and China (Supplementary Table S9), but there was only one missense variant (rs390866037: Leu151Ser), which likely affects gene function. As these variants were detected as homozygous, they are considered to be real variants, not detected by the miss-mapped reads. A frameshift variant in the second exon was only found in PGC005 (Supplementary Table S9). This accession flowered earlier than Williams 82 under field conditions in spite of the same gene combination for all other flowering-related genes (Supplementary Table S1).

Other Phytochrome A genes

Five variants, three frameshifts and two splice site variants, were identified to be high-impact variant to another PhyA gene, Glyma.03G227300/GmPHYA4 (Supplementary Fig. S5G and Supplementary Table S9). This PhyA gene consisted of two main haplotypes, namely, reference type (Hap1–Hap5) and pseudogene type (Hap6–Hap11), which had various loss-of-function sites. The other PhyA gene, Glyma.10G141400, had only one novel frameshift variant (Chr10_37491867) from two accessions, PGC045 and PGC189 derived from Korea and East Timor (Supplementary Tables S6 and S9 and Supplementary Fig. S5H).

Phytochrome B genes

There has been no report of natural variation in the PhyB genes affecting flowering, but the overexpression of GmPHYB1 accelerates flowering under short-day conditions in Arabidopsis. Only one missense variant (rs124458274) was found from GmPHYB1 (Glyma.09G035500) (Supplementary Table S8 and Supplementary Fig. S5I). rs124458274 was a common variation in the mini-core collection (allele frequency = 0.69). Two novel frameshifts and six missense variants (one was novel) were found in the other PhyB gene Glyma.15G140000 (Supplementary Table S8 and Supplementary Fig. S5J). Although the frameshift variant (Chr15_11442094) was identified in the 19 accessions (Hap8, Supplementary Table S9), no association with flowering time under the examined field conditions was observed.

FLOWERING LOCUS T

Two florigen genes FT5a and FT2a in the soybean genome play a major role in the induction of flowering.,, As no variant was detected in the exon of FT5a, it appears that FT5a is highly conserved under the evolutionary constraint (Fig. 4 and Supplementary Fig. S5K). Nine and five variants were detected in the intron and 3′-UTR, respectively. Of these, four variants were associated with flowering time determined by simple linear regression analysis (Supplementary Tables S3 and S8). Two variants (rs126630615 in 3′-UTR and rs126639618 in the third intron) were reported by Takeshima et al. This FT5a region has been reported to be one of the flowering time quantitative trait loci (QTLs) in the chromosomal segment substitution lines (CSSLs) derived from a cross between Peking and Enrei. In this study, the nucleotide differences between Enrei and Peking were identified as rs126639616_1 (Fig. 4) in the intron and rs388994144_1 (Fig. 4) in the 3′-UTR region (Supplementary Fig. S5K and Supplementary Table S8). As natural variants in 5′- and 3′-UTR of the FT-like gene affect gene expression and flowering time in rice,, the variant rs388994144_1 in 3′-UTR region might be involved in gene expression of FT5a and regulation of flowering time in soybean. The FT2a is a paralogue of FT5a and has been named as E9.E9 is a leaky allele that is caused by allele-specific transcriptional repression due to the insertion of SORE-1 into the first intron. The presence of SORE-1 (FT2a-TO allele) delays flowering for 10 days under natural day-length conditions at Harbin, China (45°43′N, 126°45′E). Zhao et al. also reported a difference of 10 days or more in flowering time between E9 and e9 in Sapporo, Japan (43°07′N, 141°35′E). As we did not design the primers on the intron of FT2a, the presence or absence of SORE-1 is unknown. In this study, one frameshift and two missense variants were found in FT2a (Glyma.16G150700) (Fig. 4 and Supplementary Fig. S5L). In the first exon, missense SNP rs388788554 (Glu23Asp) was detected in PGC066 (Hap13, Supplementary Table S9). In the fourth exon, missense novel SNP (Chr16_31114633) was detected in seven accessions (Hap10, Supplementary Table S9). Another novel frameshift variant (Chr16_31111088) in the fourth exon was detected in PGC166 (Hap14, Supplementary Table S9). Enrei had four known and two novel variants in the intron and 3′-UTR of FT2a, whereas there was no variant in Peking. As QTL is not reported in the FT2a region of the CSSL between Peking and Enrei, these variants might be not involved in the regulation of flowering time under the evaluation conditions of CSSLs.

FT-like genes

Four missense variants were detected in the other three FT homologues, rs126830445 in Glyma.16G151000 (GmFT2b/GmFTL5, Supplementary Fig. S5M), rs127848197 in Glyma.19G108200 (GmFT5b/GmFTL6, Supplementary Fig. S5O), a novel SNP (Chr16_35778390) in Glyma.16G196300 (GmTFL3, Supplementary Fig. S5N), and Chr16_4162554 in Glyma.16G044200 (FT3a/GmFTL1, Supplementary Fig. S5P). The frequency of rs126830445 in GmFT2b/GmFTL5 was 0.45, whereas that of rs127848197 in GmFT5b/GmFTL6 was 0.94. A novel missense (Val98Ile) SNP (Chr16_35778390) in GmTFL3 was only found in PGC037. This accession (YAKUMO MEAKA) is a landrace from Hokkaido, northern part of Japan. In contrast, a novel missense variant (Chr16_4162554) in FT3a/GmFTL1 was only found in PGC134 (Hap7, Supplementary Table S9), which is a medium-maturing accession. No functional defect or missense variant in the other FT-like genes, Glyma.02G069500 (GmFTL7, Supplementary Fig. S5Q), Glyma.08G363100 (GmFT4, Supplementary Fig. S5R), and Glyma.08G363200 (GmFTL6, Supplementary Fig. S5S), was found. No variants were detected in Glyma.19G108100 (GmFT3b/GmFTL2). The information of alleles identified in these FT-like genes will be useful to clarify the influence of these variants on flowering regulation.

TFL1-like genes

Two TFL1-like genes, GmTFL2 (Glyma.03G194700) and GmTFL1 (Glyma.19G194300), exist in the soybean genome. No loss-of-function or missense variant was found in GmTFL2 (Supplementary Fig. S5T), whereas five missense variants were found in GmTFL1, which determine the growth habit of soybean, classically named as Dt1 locus (Supplementary Fig. S5U and Supplementary Table S8). These sites are located where amino acids are highly conserved across TFL1 orthologues: GmTFL2, GmTFL1/Dt1, Lotus japonicas CEN/TFL1, pea TFL1a, Arabidopsis TFL1, Arabidopsis ATC, and Antirrhinum majus CEN. The variant site of rs745009806 (Arg130Lys) was conserved in TFL, but not in ATC and CEN. Another four variant sites, rs127928577 (Arg62Ser), rs392653457 (Leu67Gln), rs127928574 (Pro113Leu), and rs127928573 (Arg166Trp), exist at a highly conserved amino acid site. Of these, rs127928573 (Arg166Trp) is known as dt1 allele in soybean. As the loss-of-function Sidt1 allele has been reported at S79N in Sesamum indium L., the other three missense variants should be examined to verify whether they are new defective dt1 alleles or not.

Two-component response regulator-like genes

Among three two-component response regulator-like genes screened, the stop-lost variant (rs125308117) and five missense variants were found in Glyma.12G073900 (Supplementary Table S8 and Supplementary Fig. S5V). The allele frequency of the stop-lost variant (rs125308117) was 0.31. In Glyma.19G260400, only seven missense variants were found (Supplementary Table S8 and Supplementary Fig. S5W). Among seven variants with high impact on gene function in Glyma.U034500 on scaffold 32 (Supplementary Table S8 and Supplementary Fig. S5), four were frameshift variants due to InDels. Frameshift variant (A > AT, M737I, Scaffold32: 198773) was the major allele, and 82% of the mini-core collection possesses this allele. In contrast, other frameshift variants of insertion (C > CT, Q759M, scaffold_32: 198706) from PGC044, deletion (TTGCC > -, G409D, scaffold_32: 200001) from 16 accessions, and deletion (AC > A, V355L, scaffold_32: 200258) from three accessions, PGC005, PGC094, and PGC174, were rare alleles in the mini-core collection (Supplementary Table S9). The remaining three variants (scaffold_32_199043, scaffold_32_202754, and scaffold_32_218486) with high impact on gene function were stop-gained variant. These results indicate that Glyma.U034500 of most soybean accessions, except for Hap1, Hap2, Hap3, and Hap4 (Supplementary Table S9), losses its function. Among these three genes, Glyma.12G073900 and Glyma.U034500 showed high similarity (91%) at the amino acid sequence level. The fact that length of the amino acid sequence of Glyma.12G073900 of Williams 82 is shorter (92 aa) than that of Glyma.U034500 (765 aa) at the C terminal indicates that Glyma.12G073900 encodes truncated protein. Although flowering control by two-component response regulator-like genes has been reported in various species, the role of this gene and its variant in soybean flowering are unknown. Among the two-component response regulator-like genes, only variants in Glyma.12G073900 and Glyma.U034500 were associated with flowering time, determined by simple linear regression analysis (Table 3 and Supplementary Table S8). Glyma.U034500 (Chr11 11.23-11.26Mb on Gmax189) is located near a previously reported QTL as qFT-B1 (nearest marker: Satt519 74.7cM, Chr11 13.98Mb on Gmax189) in the 96 from the cross between Tokei 780 and the soja accession Hidaka 4. Although they reported the effect of qFT-B1 is 3.4–10.8 days, the genotype of this frameshift site (Scaffold 32:198773) in the parent of recombinant inbred lines (RILs) is unknown. It can be confirmed using the detected variants as a DNA marker whether detected stop-lost and stop-gained variants in two-component response regulator-like genes are responsible genes for the flowering time.

Other genes

Seven missense variants were identified in Glyma.16G200700 encoding MADS box protein, whereas functional defect variant was not found (Supplementary Table S8 and Supplementary Fig. S5Y). MADS-domain transcription factor of the AGL6 gene is known to be a factor responsible for the regulation of lateral organ development, flowering time, and circadian clock in Arabidopsis.,AGL6 regulates flowering through the FLC family genes and FT. Two missense variants, rs389022394 and rs126888609, on Glyma.16G200700 are significantly associated with the flowering time, determined by the simple linear regression analysis (Table 3 and Supplementary Table S8). As there is no report for FLC-like genes in soybean, it will be important to examine whether the detected two variants from Glyma.16G200700 have an effect on the flowering time. One novel frameshift variant (Chr17_3955763) of WD repeat-containing protein 61 (Glyma.17g52100) was significantly associated with flowering time, determined by the simple linear regression analysis (Table 3 and Supplementary Table S8). Although the allele frequency of the frameshift variant was 0.52 (Supplementary Table S8), no QTL has been reported in this region. In Arabidopsis, WD repeat-containing protein VIP3 regulates flowering time via the vernalization pathway; however, the vernalization pathway is not known in soybean and it is difficult to infer the role. As a large proportion of the mini-core collection has the novel frameshift variant for Glyma.17G052100 and missense variant for Glyma.16G200700, it is necessary to confirm genetically whether these novel variants really affect the flowering time. The transcription factor gene, Glyma.17G090500, was sequenced as the control for variant detection. All known variants were detected correctly (data not shown).

3.4. Gene-based association test for flowering time

To refine responsible variants associated with variation in flowering time in the mini-core collection, we performed multiple linear regression analysis using variants significantly associated with flowering time in the simple linear regression analysis. The variants of e2, e3-tr and stop-lost variant (rs125308117) of two-component response regulator-like gene on Chr12 were significant in 3 yrs, and rs127928573 in Dt1 was significant only in 2013 (Table 4). These genes could explain 51.82%, 51.13%, and 52.83% of the phenotypic variation of flowering time among the mini-core collection in 3 yrs, respectively. In this study, the variants of E1 and E4 could not be incorporated into the association analysis due to the low frequency of e1-nl (0.5%) and e4-SORE1 (1%) alleles in the mini-core collection. The extent of variation explained in this study was ∼10% lower than 62–66% reported by Zhai et al. Even though the allele frequency of e1-as was relatively high (9%), e1-as was not significant in the multiple linear regression analysis. This is probably because the genetic effect of e1-as is smaller than that of E2, E3, and two-component response regulator-like gene. In the simple linear regression analysis, the P-value of E2 (2.0e−16), E3 (5.2e−14–1.9e−13), and two-component response regulator-like gene (7.7e−11–1.8e−8) was considerably lower than that of e1-as (0.015–0.033) (Table 3). Further experiment using a larger population size is required to examine the remaining variation that could not be explained by the three genes with e1-as.

Table 4

DNA polymorphisms responsible for flowering time variation in the mini-core collection as revealed by multiple regression analysis

Data set	SNP No.	SNP name	Physical position^a		Glyma ID	Description^b	Alleles	Effect^c	MAF	Parameters estimated by linear regression			Contribution rate (%)
Data set	SNP No.	SNP name	Chromosome	bp	Glyma ID	Description^b	Alleles	Effect^c	MAF	β^d	s.e.	P-value^e	Contribution rate (%)
2011
	E2	rs124971350 (e2)	Chr10	45310798	Glyma.10G221500	GIGANTEA (E2)	A/T	Stop_gained	0.42	−5.21	1.51	2.5E-03**	21
	SNP379	e3tr ^f	Chr19	47641562	Glyma.19G224200	Phytochrome A (E3)	Large deletion	Loss of exon 4	0.17	−9.88	3.05	3.9E-03**	19
	SNP156	rs125308117	Chr12	5520945	Glyma.12G073900	Two-component response regulator-like	T/C	Stop_lost	0.31	11.24	3.53	4.5E-03**	12
2012
	E2	rs124971350 (e2)	Chr10	45310798	Glyma.10G221500	GIGANTEA (E2)	A/T	Stop_gained	0.42	−5.44	1.47	1.3E-03**	22
	SNP379	e3tr ^f	Chr19	47641562	Glyma.19G224200	Phytochrome A (E3)	Large deletion	Loss of exon 4	0.17	−8.50	2.95	9.0E-03**	19
	SNP156	rs125308117	Chr12	5520945	Glyma.12G073900	Two-component response regulator-like	T/C	Stop_lost	0.31	9.65	3.42	1.0E-02*	10
2013
	E2	rs124971350 (e2)	Chr10	45310798	Glyma.10G221500	GIGANTEA (E2)	A/T	Stop_gained	0.42	−5.36	1.66	4.0E-03**	20
	SNP379	e3tr ^f	Chr19	47641562	Glyma.19G224200	Phytochrome A (E3)	Large deletion	Loss of exon 4	0.17	−12.18	3.34	1.5E-03**	13
	SNP156	rs125308117	Chr12	5520945	Glyma.12G073900	Two-component response regulator-like	T/C	Stop_lost	0.31	12.69	3.86	3.5E-03**	19
	SNP349	rs127928573	Chr19	45183701	Glyma.19G194300	TERMINAL FLOWER 1	T/A	Missense_variant	0.09	11.23	5.11	3.9E-02*	0.5

MAF: minor allele frequency; s.e.: standard error.

Physical position on Gmax275.

Gene description was obtained from Phytozome 12.

Effect to gene function annotated by snpEff. Effect of AMPL1040314 was defined by manually.

Standardized regression coefficients.

Adjusted P-value was obtained from multivariate models days to flowering and genotype as covariates. Signification codes: ‘**’ 0.01 ‘*’ 0.05.

Large deletion on E3 estimated by coverage of four amplicon on 4th exon.

DNA polymorphisms responsible for flowering time variation in the mini-core collection as revealed by multiple regression analysis MAF: minor allele frequency; s.e.: standard error. Physical position on Gmax275. Gene description was obtained from Phytozome 12. Effect to gene function annotated by snpEff. Effect of AMPL1040314 was defined by manually. Standardized regression coefficients. Adjusted P-value was obtained from multivariate models days to flowering and genotype as covariates. Signification codes: ‘**’ 0.01 ‘*’ 0.05. Large deletion on E3 estimated by coverage of four amplicon on 4th exon. The other five genes, namely, WD repeat-containing protein 61 (Glyma.17G052100), MADS-box protein (Glyma.16G200700), PhyB (Glyma.15G140000), two-component response regulator-like gene (Glyma.U034500), and FT2a/GmFTL3 (Glyma.16G150700), were significant in the simple linear regression analysis (P < 0.0001) but not significant in the multiple linear regression analysis (Table 3 and Supplementary Table S8). Variants that differ between Enrei and Peking can be used to confirm allele effect on flowering time using the phenotypic data of CSSLs. Peking had a novel frameshift variant (Chr17_3955763) in WD repeat-containing protein 61 (Glyma.17G052100), two missense variants (rs389022394 and rs126888609) in MADS-box protein (Glyma.16G200700), and one frameshift variant (Chr15_11442094) in PhyB (Glyma.15G140000), and no variant in Enrei (Supplementary Table S8). However, no flowering time QTL has been reported to Chr17, Chr16, and Chr15; these genes may not be involved in flowering time regulation under the evaluation conditions of CSSLs. It was the stop-gain allele E2 that showed the highest association with flowering time. The effect of this variant promotes flowering about 5 days (Table 4). Watanabe et al. reported that the difference in days to flowering between E2/E2 and e2/e2 was ∼9 days, which is consistent with the result of this study. The next strong association with flowering time was observed at E3. The e3-tr allele (Horosy-e3) has been reported to promote flowering for ∼17 days, but it was estimated as 9–13 days in this study. The smaller estimation at E3 can be explained by the absence of e3-Mo allele. As there are only two accessions, PGC019 and PGC042 (Supplementary Table S9), the e3-Mo allele could not be included in the association analysis. The effect of the missense variant (rs127928573, Arg166Trp) of Dt1 was detected only for 2013 data set; it delayed flowering by ∼11 days compared with that by the Dt1 allele (Table 4). Dt1 is reported as the locus strongly associated with days to maturity and plant height. Zhang et al. identified the Dt1 gene at 18.6-kb upstream of the peak SNP, which was associated with days to maturity and plant height. Dt1 plays a primary role in not only stem termination but also floral transition., As no visible influence on the flowering time has been reported with dt1 VIGS-induced suppression, the detected SNP on Dt1 in this study suggests the presence of other gene in the surrounding region related to the flowering time. The effect of stop-lost variant (rs125308117) in the two-component response regulator-like gene (Glyma.12G073900) was significant (P = 4.5 e−3 in 2011, P = 1.0 e−2 in 2012, P = 3.5 e−3 in 2013), and the plant flowers ∼10–13 days later. Involvement of the two-component response regulator-like gene in flowering time has been reported in Arabidopsis and rice; it may be functionally preserved as a flowering time-related gene in soybean. Williams 82 (reference genome) has C-terminal truncated protein as described above, whereas the rs125308117 variant has longer amino acid sequence and allele effect of delayed flowering for 4.7 days (Table 4). Although Glyma.12G073900 is located near a previously reported QTL as qFT-H (nearest marker: Satt442 on Chr12: 6,390,806–6,391,062) in RILs with the E1 allele from the cross between Tokei 780 and the soja accession Hidaka 4, the allele type of Glyma.12G073900 in both accessions is unknown. The genomic region surrounding Glyma.12G073900 has been reported to include flowering time QTL qDFF-Gm12 in CSSLs.Glyma.12G073900 of Peking (PGC084) is the stop-lost type (longer protein), whereas that of Enrei (PGC025) is reference type (truncated protein). Similar to the present study, Peking allele delayed flowering by ∼3.7 days (LOD score is 36.3, flanking markers: C12-BARC- 015603-02006 and s024200450). These data suggest that Glyma.12G073900 is one of the candidate gene for qDFF-Gm12.

4. Conclusions

Flowering time and maturity are the most important factors affecting adaptability and yield. To increase the yield of soybean, it is necessary to control flowering time at an appropriate time using a combination of flowering time-related genes or alleles. Preparing a catalogue of flowering time-related genes makes it possible to freely combine alleles with various effects using the DNA markers. Our results indicate that novel alleles and accessions with such novel alleles can be rapidly detected using the AmpliSeq technology. Although multiple defective alleles were identified, we could not include all of them in the association study of flowering time due to low allele frequency. Nevertheless, the variants detected in this study could explain 51.1–52.3% of the flowering time variation in the soybean mini-core collection. These variants consisted of a novel two-component response regulator gene besides known flowering time-related genes. Therefore, the AmpliSeq technology is useful for discovering novel variants in the target genes.

Data availability

All sequences analysed in this study have been deposited in the DDBJ database under the BioProject Accession number: PRJDB7633. Click here for additional data file.

10 in total

1. PhenGenVar: A User-Friendly Genetic Variant Detection and Visualization Tool for Precision Medicine.

Authors: JaeMoon Shin; Junbeom Jeon; Dawoon Jung; Kiyong Kim; Yun Joong Kim; Dong-Hoon Jeong; JeeHee Yoon
Journal: J Pers Med Date: 2022-06-12

2. Validation of AmpliSeq NGS Panel for BRCA1 and BRCA2 Variant Detection in Canine Formalin-Fixed Paraffin-Embedded Mammary Tumors.

Authors: Daniela Di Giacomo; Marco Di Domenico; Sabrina Vanessa Patrizia Defourny; Daniela Malatesta; Giovanni Di Teodoro; Michele Martino; Antonello Viola; Nicola D'Alterio; Cesare Cammà; Paola Modesto; Antonio Petrini
Journal: Life (Basel) Date: 2022-06-07

3. Development of co-dominant markers linked to a hemizygous region that is related to the self-compatibility locus (S) in buckwheat (Fagopyrum esculentum).

Authors: Katsuhiro Matsui; Nobuyuki Mizuno; Mariko Ueno; Ryoma Takeshima; Yasuo Yasui
Journal: Breed Sci Date: 2020-02-11 Impact factor: 2.086

4. A Soybean Deletion Mutant That Moderates the Repression of Flowering by Cool Temperatures.

Authors: Jingyu Zhang; Meilan Xu; Maria Stefanie Dwiyanti; Satoshi Watanabe; Tetsuya Yamada; Yoshihiro Hase; Akira Kanazawa; Takashi Sayama; Masao Ishimoto; Baohui Liu; Jun Abe
Journal: Front Plant Sci Date: 2020-04-15 Impact factor: 5.753

Review 5. Impacts of genomic research on soybean improvement in East Asia.

Authors: Man-Wah Li; Zhili Wang; Bingjun Jiang; Akito Kaga; Fuk-Ling Wong; Guohong Zhang; Tianfu Han; Gyuhwa Chung; Henry Nguyen; Hon-Ming Lam
Journal: Theor Appl Genet Date: 2019-10-23 Impact factor: 5.699

6. Characterization and quantitative trait locus mapping of late-flowering from a Thai soybean cultivar introduced into a photoperiod-insensitive genetic background.

Authors: Fei Sun; Meilan Xu; Cheolwoo Park; Maria Stefanie Dwiyanti; Atsushi J Nagano; Jianghui Zhu; Satoshi Watanabe; Fanjiang Kong; Baohui Liu; Tetsuya Yamada; Jun Abe
Journal: PLoS One Date: 2019-12-05 Impact factor: 3.240

7. Whole-genome sequence diversity and association analysis of 198 soybean accessions in mini-core collections.

Authors: Hiromi Kajiya-Kanegae; Hideki Nagasaki; Akito Kaga; Ko Hirano; Eri Ogiso-Tanaka; Makoto Matsuoka; Motoyuki Ishimori; Masao Ishimoto; Masatsugu Hashiguchi; Hidenori Tanaka; Ryo Akashi; Sachiko Isobe; Hiroyoshi Iwata
Journal: DNA Res Date: 2021-01-19 Impact factor: 4.458

8. Targeted amplicon sequencing + next-generation sequencing-based bulked segregant analysis identified genetic loci associated with preharvest sprouting tolerance in common buckwheat (Fagopyrum esculentum).

Authors: Ryoma Takeshima; Eri Ogiso-Tanaka; Yasuo Yasui; Katsuhiro Matsui
Journal: BMC Plant Biol Date: 2021-01-06 Impact factor: 4.215

9. Construction of prediction models for growth traits of soybean cultivars based on phenotyping in diverse genotype and environment combinations.

Authors: Andi Madihah Manggabarani; Takuyu Hashiguchi; Masatsugu Hashiguchi; Atsushi Hayashi; Masataka Kikuchi; Yusdar Mustamin; Masaru Bamba; Kunihiro Kodama; Takanari Tanabata; Sachiko Isobe; Hidenori Tanaka; Ryo Akashi; Akihiro Nakaya; Shusei Sato
Journal: DNA Res Date: 2022-06-25 Impact factor: 4.477

Review 10. The Modification of Circadian Clock Components in Soybean During Domestication and Improvement.

Authors: Man-Wah Li; Hon-Ming Lam
Journal: Front Genet Date: 2020-09-30 Impact factor: 4.599

10 in total