Establishing laboratory-specific reference intervals for TSH and fT4 by use of the indirect Hoffman method.

Sylwia Płaczkowska¹, Małgorzata Terpińska¹,², Agnieszka Piwowar³.

Abstract

BACKGROUND: The results of examinations of laboratory parameters are the basis of appropriate medical decisions. The availability of reliable and accurate reference intervals (RIs) for each laboratory parameter is an integral part of its appropriate interpretation. Each medical laboratory should confirm their RIs. Up-to-date reference intervals for thyroid function hormones are still a matter of ongoing controversy. The aim of the study was the application of the indirect Hoffman method to determine RIs for TSH and fT4 based on the large data pools stored in laboratory information systems and the comparison of these RIs to generally used RIs.
MATERIAL AND METHODS: The routine TSH and fT4 examination results of hospitalized and outpatient populations were collected over five years (2015-2019), and reference limits were established by the improved Hoffman method after the exclusion of outliers. The established RIs were verified by comparison with the RI values provided by test manufacturers and with literature data.
RESULTS: Various RIs were observed in different age groups in the examined populations. For TSH, RIs varied between different age groups, with a narrower range of RIs in the studied adult population and a shift of both reference boundaries toward higher values in comparison to manufacturers' data among children. RIs estimated for fT4 were very similar to the manufacturer and literature data.
CONCLUSION: Thyroid hormone levels change during a person's lifetime and vary between sexes, but this difference does not always influence the clinical interpretation of laboratory results in the context of RIs. The use of indirect methods is justified due to the ease and low cost of their application.

Substances：
Thyrotropin
Thyroxine

Year: 2022 PMID: 34995316 PMCID: PMC8741008 DOI: 10.1371/journal.pone.0261715

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Background

The results of examinations of laboratory parameters provide useful information for assessing the current health condition of patients. They are necessary for early detection and recognition of disturbances as well as for making appropriate medical decisions. For the interpretation of laboratory results, the concept of reference intervals (RIs) is currently generally accepted in laboratory medicine [1]. The Clinical and Laboratory Standards Institute (CLSI) has released a relevant guideline (C28-A3c) for the evaluation of RIs. According to the CLSI recommendations, an RI is defined as the interval within which 95% of the values of a reference population fall. It is bounded by two reference limits derived from the distribution of reference values, which could be associated with good health but also with other physiological or pathological conditions [2, 3]. It is recommended that medical laboratories determine their own local reference intervals to account for the variations in local populations and the methods and equipment used in particular laboratories. The confusion of RIs with clinical decision limits (CDLs) remains an issue, especially in paediatric and geriatric age groups, where it presents a significant diagnostic problem [4]. CLSI currently recommends a direct method based on the collection of a minimum of 120 samples from members of a specific preselected reference population, making measurements, and then determining the range which includes 95% of all measured values using a parametric (mean ±2SD) or non-parametric method (2.5th and 97.5th percentile) [2]. The requirement that each laboratory determine its own reference intervals is virtually impossible to meet in practice because of the tremendous amount of time and money required to carry out additional laboratory tests and gather the appropriate reference group.
Thus most laboratories adopt external sources for RIs, often without taking into account the problems of transferring values between different populations or laboratory methods [5]. Data provided by the test manufacturer is the most often used source of reference intervals, because such information is required from reagent suppliers by the ISO 15189:2013 standard [6]. At the same time, methods and processes for the determination of reference intervals using indirect methods have been in development for over 50 years, but they are not yet widely applied. This alternative approach is based on the statistical analysis of results generated as part of routine laboratory testing in hospital and outpatient clinics in order to determine reference intervals [3]. Indirect methods eliminate results that do not fit the assumed hypothetical model of the distribution of results (generally a normal distribution) and designate the RI as the central or marginal 95% of the selected results. The application of the indirect method has major potential advantages compared with direct methods. In particular, this process is faster and cheaper; it involves no inconvenience, discomfort, or any additional risk to patients; and laboratory staff need not examine any additional samples. Therefore, additional costs are avoided, which is important in the modern and effective management of the medical laboratory and the hospital [7]. Establishing RIs is particularly problematic for constituents with large biological variation and inter-population differences, as is observed for thyroid hormones, especially thyrotropin (TSH, thyroid-stimulating hormone) as well as free triiodothyronine (fT3) and free thyroxine (fT4). The prevalence of thyroid dysfunction in the general world population is estimated to be between 1 and 2% [8].
There are still discrepancies between the TSH, fT3 and fT4 reference values applied to the diagnosis of thyroid dysfunction, not only between laboratories but also in the scientific literature to date. It seems erroneous to apply the concept of universal limits to reference intervals for thyroid function hormones, especially for TSH [9]. Given this, it appears important and useful to establish reference intervals using a low-cost, optimized indirect statistical method. At present, it is assumed that the value of TSH in a healthy general population is approximately 0.4–4.0 mIU/L, a wide range that results from the fairly high inter-individual variability of this parameter. However, the variability of the value in an individual is much smaller [10, 11], and the value determined in a state of hormonal equilibrium can be regarded as an individual’s set-point [12, 13]. Since slight changes in the fT4 value correspond to a significant change in TSH, TSH is used in the screening of thyroid hormone disorders. Therefore, it is advisable to establish this individual set-point by measuring the level of TSH for each person in times of health. This allows for earlier detection of important clinical disturbances in thyroid condition, even without direct comparison to the reference interval [10] or taking into account the physiological changes in concentration related to age [14]. According to the current recommendations, in order to screen for primary thyroid dysfunction, the first TSH determinations should be performed repeatedly at 3–6-month intervals, followed by fT4 for differentiation of subclinical and ‘overt’ thyroid dysfunction. fT3 determinations should be performed only in specific cases [12, 15].

Material and methods

Objective

The aim of this study was to establish the reference intervals for TSH and fT4 from the large data pools of patient results stored in laboratory information systems (LIS) using the indirect Hoffman method and to conduct a comparison of RIs with generally used reference limits.

Laboratory methods

The third-generation TSH test (TSH-3 Ultra) and the fT4 test (Free Thyroxine) were performed using a chemiluminescence method on the Atellica IM analyser (Siemens Healthcare Diagnostics Inc., Erlangen, Germany). The linear range of the method was 0.008–150.000 mIU/L for TSH and 0.1–12.0 ng/dL for fT4. The laboratory intra-series analytical coefficient of variation (CVa) was 6.4% for TSH and 2.5% for fT4, and the inter-series CVa was 7.1% and 3.2%, respectively.

Data gathering

The study was performed in accordance with the Declaration of Helsinki and was approved by the Wroclaw Medical University Bioethical Commission (decision No. 537/2018). Based on the decision of the bioethics committee, patient informed consent was waived due to the retrospective nature of the study, which was conducted on deidentified aggregated numerical data. All the laboratory results of TSH and fT4 examinations, together with the patient’s age, sex and date of examination, archived in the Laboratory Information System of the Department of Laboratory Diagnostics and derived from patients hospitalized at the University Clinical Hospital in Wroclaw during the five-year observation period (1 January 2015 to 31 December 2019) were included in the study and used for statistical analysis without any primary selection. The study included 105 927 TSH results (65 163 from women and 40 764 from men) and 41 400 fT4 results (26 406 from women and 14 994 from men). The participants’ age range for the analysed hormones was from 0 to 109 years.

Statistical analysis

First, before any analysis was performed, all data were logarithmically transformed because of the strong right-skewness of the data distribution. Next, all data were divided into 8 age groups (<1 y.; ≥1 y. <6; ≥6 y. <12; ≥12 y. <18; ≥18 y. <40; ≥40 y. <65; ≥65 y. <90; ≥90 y.), which reflected the physiological changes associated with human ontogenetic development and the main age groups for which reference values were provided by the manufacturer. The two-sided Tukey test (1.5 IQR) was used to reject outliers separately for each age group. The number of excluded TSH and fT4 records in the studied age groups is reported in detail in Fig 1.

Fig 1

Number of outliers excluded from the entire database by the two-sided Tukey test (1.5 IQR) for each studied age group.
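The preprocessing described above (log transformation followed by a two-sided Tukey 1.5 IQR rejection within each age group) can be sketched as follows. This is a minimal illustration, not the authors' Statistica procedure: the function names, the `(group, value)` record layout, and the choice of the "inclusive" quartile definition are assumptions made for the example.

```python
import math
import statistics


def tukey_filter_log(values, k=1.5):
    """Log-transform positive results and keep those inside the Tukey fences.

    Returns the retained values on the natural-log scale.
    The quartile definition ("inclusive") is an assumption of this sketch.
    """
    logs = [math.log(v) for v in values if v > 0]
    q1, _, q3 = statistics.quantiles(logs, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in logs if lower <= x <= upper]


def filter_by_group(records, k=1.5):
    """Apply the Tukey filter separately to each age group.

    `records` is an iterable of (group_label, value) pairs (hypothetical layout).
    """
    groups = {}
    for group, value in records:
        groups.setdefault(group, []).append(value)
    return {g: tukey_filter_log(v, k) for g, v in groups.items()}
```

Filtering on the log scale matters here because the raw distributions are strongly right-skewed; fences computed on untransformed values would reject far more high results than low ones.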

Eventually 100 171 (38 802 men and 61 361 women) and 40 086 (14 508 men and 25 621 women) results for TSH and fT4, respectively, were included in further calculations. The Hoffman method, an indirect statistical method based on the graphic distribution of lnTSH and lnfT4 values, was applied in each age group for all participants and separately by sex. In accordance with this method, a Q-Q plot was created for each age group. The Hoffman method assumes a Gaussian distribution of physiological test results, and only this range of results is used to determine the reference intervals. On the Q-Q plot, empirical data with a Gaussian distribution form a straight line against the theoretical quantiles of the standard normal distribution. In the next step, outliers were eliminated visually on the basis of the Q-Q plot, and the remaining data were used to calculate the regression equation by the least squares method. The regression line covered the middle range of the data, was initially fitted by visual inspection, and was statistically confirmed by determining the linear correlation coefficient; only r > 0.99 was acceptable. The straight line over the linear part of the Q-Q plot was described by the equation $y = a \times x + b + e$, where y = the lnTSH or lnfT4 value, respectively, x = the theoretical quantile of the standard normal distribution (μ = 0, σ = 1), a = the slope of the regression line, b = the intercept, and e = the error. In the next step, the linear regression equation was extrapolated to the limits of the central 95% of the distribution as follows: lower reference limit (LRI) $= -1.96 \times a + b$ and upper reference limit (URI) $= 1.96 \times a + b$. All statistical procedures mentioned above were performed on logarithmically transformed data. The antilogarithm was applied in the last step to calculate the RI values on the basis of the linear regression equation.
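The Q-Q regression step can be illustrated with a short sketch. It assumes that the linear part of the Q-Q plot is approximated by a fixed central window of theoretical quantiles (the `central` parameter below), whereas in the study this range was chosen by visual inspection; the function name and plotting-position formula are likewise assumptions of the example.

```python
import math
from statistics import NormalDist, fmean


def hoffman_ri(values, central=(0.10, 0.90), z=1.96):
    """Estimate (LRI, URI, r) from a Q-Q plot of log-transformed results.

    `central` is a hypothetical stand-in for the visually selected linear
    part of the Q-Q plot: only points whose cumulative probability lies in
    this window enter the least-squares fit.
    """
    logs = sorted(math.log(v) for v in values)
    n = len(logs)
    nd = NormalDist()  # standard normal: mu = 0, sigma = 1
    # Theoretical quantile for each ordered observation.
    points = [(nd.inv_cdf((i + 0.5) / n), y) for i, y in enumerate(logs)]
    lo, hi = nd.inv_cdf(central[0]), nd.inv_cdf(central[1])
    xs, ys = zip(*[(x, y) for x, y in points if lo <= x <= hi])

    # Ordinary least squares: y = a * x + b
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    r = sxy / math.sqrt(sxx * syy)

    # Extrapolate to z = -1.96 and z = +1.96, then back-transform.
    return math.exp(-z * a + b), math.exp(z * a + b), r
```

A caller would typically reject a fit with `r <= 0.99` and narrow the window, mirroring the acceptance criterion stated in the text.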
Linear least squares regression was applied to the middle part of the data distribution. Then the Reference Change Value (RCV) was used to determine the clinical significance of the differences between the RIs in all selected age groups and the manufacturers' and published RIs. RCV was calculated according to the formula [16]: $RCV = \sqrt{2} \times Z \times \sqrt{CV_a^2 + CV_i^2}$, where Z is the probability selected for significance (the chosen Z value of 1.96 corresponds to a significance level of 0.05), CVa is the analytic variation (inter-assay variation estimated from our laboratory data) and CVi is the within-subject biologic variation (data from Ricos et al. [17]). In our study, the estimated RCV values for TSH and fT4 were 56.6% and 34.6%, respectively. Statistical analyses were performed using Statistica 13.1 PL.
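As a small worked illustration of the RCV calculation, the sketch below assumes the commonly used form RCV = √2 × Z × √(CVa² + CVi²), with both coefficients of variation given in percent. The function name is hypothetical, and the CVi passed in the usage line is a placeholder, not the value the authors took from Ricos et al.

```python
import math


def reference_change_value(cv_a, cv_i, z=1.96):
    """RCV in percent from analytical (cv_a) and within-subject (cv_i) CVs.

    Assumes RCV = sqrt(2) * z * sqrt(cv_a**2 + cv_i**2); z = 1.96 corresponds
    to a two-sided significance level of 0.05.
    """
    return math.sqrt(2) * z * math.sqrt(cv_a ** 2 + cv_i ** 2)


# Usage with the inter-series CVa reported for TSH (7.1%) and a
# placeholder CVi of 19.0% chosen only for illustration.
print(f"{reference_change_value(7.1, 19.0):.1f}%")
```

Because the two variance components add in quadrature, the larger of the two (usually the biological variation for TSH) dominates the result.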

Results

A representative Q-Q plot of the distribution of lnTSH for participants aged ≥18 and <40 is presented in Fig 2. It can be seen that the data included in the analysis for this age group forms a straight line in the middle of the graph and is bent at both ends. This linear range is the basis for the RI estimation for this age group according to the methodology described in the Statistical Analysis section.

Fig 2

A representative Q-Q plot of distribution of lnTSH for participants aged ≥18 and <40.

The RI values obtained for TSH and fT4 in each separate age group for all study participants and by gender are summarized in Fig 3.

Fig 3

RIs for TSH (panel A) and fT4 (panel B) in all participants included in the analysis groups and among males and females in each age group.

For TSH, the estimated RI limits decreased with the age of the patients, and the ranges simultaneously tended to narrow. An inverse relationship was observed for fT4 with regard to the RI range, which widened with age, while the RI limits did not show significant shifts in values. Detailed TSH numerical data related to Fig 3 for all results are provided together with a comparison of LRI and URI for women and men in Table 1.

Table 1

Estimation of reference intervals with the Q-Q plot method for all TSH results and percentage differences in TSH reference between women and men.

age group	N records included in the final analysis	LRI (all participants)	URI (all participants)	LRI women	LRI men	LRI % of difference	URI women	URI men	URI % of difference
< 1 y.	1254	0.93 *	7.76	1.23 *	1.08 *	11.7	7.51	8.81	-17.3
≥1 y. < 6	3268	1.03 †	6.48	1.13 †	1.04 †	7.3	5.77	6.18	-7.2
≥6 y. <12	5795	0.91 ‡	5.54	0.96	0.97 ‡	-0.5	5.42	5.28	2.6
≥12 y. <18	6575	0.79	4.68	0.77	0.76	2.1	4.44	5.11	-15.0
≥18 y. <40	18020	0.63	4.51	0.62	0.56	9.2	4.25	4.46	-4.8
≥40 y. <65	31001	0.45	4.23	0.55	0.49	10.4	4.48	4.01	10.6
≥65 y. <90	32924	0.44 b	4.89	0.48 *,†	0.49	-2.9	4.64	4.07	12.2
≥90 y.	1334	0.39 *,†,‡	5.20	0.43 *,†	0.39 *,†,‡	8.9	5.26	4.31	18.1

*, †, ‡ RCV between the marked age group (*: <1 year of age, †: ≥1 year of age <6, ‡: ≥6 years of age <12) and another age group marked with the same symbol within the analyzed group of participants (columns) exceeds 56.0%.

The applied Hoffman method shows various TSH RIs in different age groups. The LRIs gradually decreased in subsequent age groups, from children up to the age of ≥90; for URI, the same trend was observed up to the age of 65, after which the values increased. Differences greater than those determined by RCV for comparisons between age groups were revealed for LRI in all subjects as well as in the separate samples of women and men. In all cases, they concerned the results of the oldest patients, children and adolescents. The comparison of LRI and URI between women and men in particular age groups showed the greatest difference for the TSH LRI, 11.7%, in the age group of <1 year. The same analysis of the URI for TSH indicated the highest differences in the ≥90 years group, but they were also among the highest in infants. It is worth mentioning that the obtained percentage differences for LRI and URI are smaller than the inter-individual variability for TSH (24.9%) [16]. Therefore, the results obtained for all study participants in a given age group were considered as general RI values, and differences in sex were not taken into account. The LRIs for participants aged <40 established in this study were within the TSH RIs provided by the manufacturer. For the three oldest participant groups, the LRIs were lower than reported by manufacturers, while the URIs were higher in children and adolescents and in the groups ≥65 years. For adults, the URIs estimated by Q-Q plots were lower than those provided by the manufacturer. Generally, in comparison to the manufacturers’ data, the Hoffman method revealed a shift of both reference boundaries toward higher values in children and adolescents, narrower RIs for TSH in the studied adult population, and wider RI boundaries among seniors.
The difference between LRI and URI established by the Hoffman method with RIs reported by the manufacturer did not exceed the acceptable 56% RCV for TSH in any age group (Table 2).

Table 2

Comparison of calculated TSH reference intervals with RIs reported by manufacturer.

TSH [mIU/L] Hoffman method				Manufacturer RI 2.5–97.5 percentile				difference (%) between established and manufacturer RI*
Age group	N	LRI	URI	Age group	N	LRI	URI	LRI	URI
< 1 y.	1254	0.93	7.76	<2 y.	94	0.87	6.15	6.4	20.7
≥1 y. < 6	3268	1.03	6.48	2–12 y.	198	0.67	4.16	34.9	35.8
≥6 y. <12	5795	0.91	5.54	2–12 y.	198	0.67	4.16	26.4	24.9
≥12 y. <18	6575	0.79	4.68	13–20 y.	150	0.48	4.17	39.2	10.9
≥18 y. <40	18020	0.63	4.51	adults	229	0.55	4.78	12.7	-6.0
≥40 y. <65	31001	0.45	4.23					-22.2	-13.0
≥65 y. <90	32924	0.44	4.89					-25.0	2.2
≥90 y.	1334	0.39	5.20					-41.0	8.1

* accepted TSH RCV <56%.
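The percentage differences in Table 2 appear to be computed relative to the established limit, i.e. (established − manufacturer) / established × 100; this is inferred from the tabulated values rather than stated in the text, so the sketch below should be read under that assumption. The function names are hypothetical.

```python
def percent_difference(established, manufacturer):
    """Signed difference relative to the established limit, in percent.

    The choice of denominator is an inference from the values in Table 2.
    """
    return (established - manufacturer) / established * 100.0


def exceeds_rcv(established, manufacturer, rcv_percent):
    """True when the absolute difference reaches or exceeds the RCV threshold."""
    return abs(percent_difference(established, manufacturer)) >= rcv_percent


# Two rows copied from Table 2: (age group, LRI, URI, manufacturer LRI, manufacturer URI)
rows = [
    ("< 1 y.", 0.93, 7.76, 0.87, 6.15),
    ("≥1 y. < 6", 1.03, 6.48, 0.67, 4.16),
]
for group, lri, uri, m_lri, m_uri in rows:
    d_lri = percent_difference(lri, m_lri)
    d_uri = percent_difference(uri, m_uri)
    flag = exceeds_rcv(lri, m_lri, 56.0) or exceeds_rcv(uri, m_uri, 56.0)
    print(f"{group}: LRI {d_lri:.1f}%, URI {d_uri:.1f}%, exceeds RCV: {flag}")
```

Expressing the comparison as a single predicate makes it easy to rerun the check when either the manufacturer's limits or the laboratory's RCV estimate changes.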

RIs established for fT4 presented in Table 3 were characterized by low variability between different sex and age groups.

Table 3

Estimation of reference intervals with the Q-Q plot method for all fT4 results and percentage differences in fT4 reference limits between women and men.

age group	N records included in the final analysis	LRI (all participants)	URI (all participants)	LRI women	LRI men	LRI % of difference	URI women	URI men	URI % of difference
< 1 y.	901	0.92	1.61	0.87	0.94	-7.9	1.73	1.52	12.1
≥1 y. < 6	2390	0.87	1.50	0.94	0.90	3.5	1.54	1.53	1.0
≥6 y. <12	4647	0.85	1.51	0.84	0.89	-5.7	1.53	1.46	5.0
≥12 y. <18	5418	0.81	1.48	0.81	0.80	1.0	1.48	1.47	1.1
≥18 y. <40	6302	0.85	1.65	0.83	0.91	-9.8	1.62	1.67	-3.1
≥40 y. <65	10074	0.84	1.69	0.86	0.84	2.5	1.66	1.69	-1.7
≥65 y. <90	10051	0.87	1.77	0.87	0.87	0.2	1.77	1.71	3.5
≥90 y.	303	0.85	1.82	0.88	0.80	8.8	1.83	1.75	4.6

The sex differences for fT4 LRI were less than 10%, and the highest were observed in infants and age groups ≥40 y. <65 and ≥90 y. In the youngest group, the difference in URI between sexes was even lower than 5%, which is only a quarter of the RCV value established for fT4 in this study. Therefore, further analyses were carried out without regard to sex. The differences calculated for comparisons between subsequent age groups were below the RCV value of 21.6% in all cases. It is worth mentioning that the obtained percentage differences for LRI and URI are smaller than the inter-individual variability of fT4 (12.1%) [16]. A comparison of calculated fT4 reference intervals with RIs reported by the manufacturer is presented in Table 4.

Table 4

Comparison of calculated fT4 reference intervals with RIs reported by manufacturer.

fT4 [ng/dl] Hoffman method				Manufacturer RI 2.5–97.5 percentile				absolute difference (%) between established and manufacturer RI*
Age group	N	LRI	URI	Age group	N	LRI	URI	LRI	URI
<1 y.	901	0.92	1.61	<2 y.	72	0.94	1.44	2.2	10.5
≥1 y. < 6	2 390	0.87	1.50	2–12 y.	190	0.86	1.40	1.1	6.7
≥6 y. <12	4 647	0.85	1.51	2–12 y.	190	0.86	1.40	-1.1	7.3
≥12 y. <18	5 418	0.892	1.326	13–20 y.	129	0.83	1.43	6.9	7.8
≥18 y. <40	6 302	0.85	1.65	adults	388	0.89	1.76	-4.7	-6.7
≥40 y. <65	10 074	0.84	1.69					-5.9	-4.1
≥65 y. <90	10 051	0.87	1.77					-2.3	0.6
≥90 y.	303	0.85	1.82					-4.7	-3.2

* accepted fT4 RCV < 21.6%.

LRIs for fT4 were almost the same as those provided by the manufacturer, and larger differences were noticed for infants’ URIs. Generally, LRIs were identical for infants and children, while the URI was slightly higher than that provided by the manufacturer. Narrower RIs falling within the manufacturers’ data were observed for the adolescent group. A minimal shift toward lower RI values was revealed for adults aged ≥40 and <65 years. A similar trend was observed for seniors, but only for the LRI value.

Discussion

The analysis of TSH and fT4 RIs using the Hoffman method revealed that the differences between the sexes in LRI and URI values in individual age groups are smaller than the assumed RCV value. When comparing the age groups as a whole and taking sex into account, the RIs showed differences exceeding the accepted critical value only for TSH between the youngest and the oldest participants in the study. Currently, medical decisions are mainly based on results of laboratory diagnostic tests, which are used to confirm, exclude, classify or monitor disease in order to guide treatment. Therefore, establishing appropriate reference ranges is crucial for the correct interpretation of laboratory results. Unfortunately, our observations show that the main source of reference intervals in Poland is data from literature, often based on research carried out among the general population or in populations with a different genetic and/or cultural profile [18]. The second source of information about the expected values is the laboratory reagent manufacturer. The analysis of the RI values in these materials shows that although the manufacturers declare that their procedures comply with the CLSI recommendations, the reference groups are very often too small and very poorly characterized in many important aspects, such as race, age group, health status, or body weight [19–21]. Very rarely is a laboratory procedure used to verify the reference intervals provided by the manufacturer according to CLSI recommendations [2]. The RIs for TSH determined in our study by the Hoffman method show good agreement with the manufacturer's RI in the age group ≥18 y. <40 and with the manufacturer's URI in adolescents and adults. In other age groups, the RI differences, although they did not exceed the RCV, were higher than 20%.
However, with regard to fT4, the observed differences between the values determined by the Hoffman method and those provided by the manufacturer were minimal and did not relate to any specific age groups. This indicates the possibility of using the RIs provided by the manufacturer, but this would require initial confirmation. Confirming the manufacturer’s RIs in compliance with CLSI recommendations requires additional laboratory tests in a strictly defined reference group of at least 20 people for each age range and, if required, for each sex. A comparison of RIs established by an individual laboratory with the manufacturer’s values rules out differences in test results caused by a different measurement system, but differences resulting from pre-analytical conditions and population characteristics remain. The use of the Hoffman method for determining reference intervals does not require additional tests, and at the same time allows for the determination of any age intervals, not only those proposed by the manufacturer. TSH is currently the main parameter used in the screening of thyroid disorders and as a therapeutic target and prognostic marker. This approach requires the establishment of specific reference limits, not only for TSH but also for other thyroid hormones, which are in close physiological relationship [13]. Numerous studies have been conducted worldwide to set such limits for the general population, but there is still no consensus on this matter [22, 23]. This is due to numerous preanalytical and analytical factors of thyroid hormone determinations and population differences [5, 9]. There are a limited number of publications available using the indirect method (e.g. the Hoffman method) for RI estimation in the general population, and the results of most of them concerning TSH and fT4 are summarized in Table 5.

Table 5

Comparison of reference intervals for TSH and fT4 obtained by different indirect methods.

Author, year	method of RI reported by author	age group (n)	LRI—URI
TSH [units]			[mIU/L]
Mokhatar KM [24]; 2020	Bhattacharya method	>18 years (8838)	0.44–4.4
	Quantile regression with RCQ	18–29 (1618)	0.46–3.9
		30–39 (1981)	0.44–4.1
		40–49 (1769)	0.42–4.4
		50–59 (1670)	0.41–4.5
		60–69 (1320)	0.39–4.5
		>70 (480)	0.38–4.2
Lo Sasso et al. [5]; 2019	Reference Limit Estimator Software	15–105 years (22,602)	0.18–3.54
		women (12,099)	0.18–3.94
		men (7805)	0.19–3.23
Drees et al. [25]; 20	software EP Evaluator (Data Innovations)	0.5–2 (417)	0.60–5.28
		2–10 (3377)	0.72–4.92
		11–17 (8001)	0.55–4.42
		18–49 (7563)	0.50–4.00
		50–64 (6511)	0.48–4.37
		65–79 (5314)	0.54–4.84
		>80 (1784)	0.54–5.31
Stich et al. [26];2015	Hoffman followed by log transformation	11–14 (1377)	1.43–4.21
Stich et al. [26];2015	Hoffman followed by log transformation	19–30 (4416)	1.08–4.40
Larisch et al. [27];2015	improved Hoffmann and Katayev’s method	adult subjects (399)	0.57–3.32
Feng et al. [28]; 2014;	improved Hoffmann and Katayev’s method	25–85 years (10870)	0.233–4.979
Dorizzi et al. [29];2011	improved Hoffmann and Katayev’s method	>18 years (21,862)	0.16–3.28
Katayev et al. [30]; 2010	improved Hoffmann and Katayev’s method	>18 years (129,443)	0.45–3.05
fT4 [units]			[ng/dL]
Dittadi et al. [31]; 2021	Bhattacharya method	-	0.61–1.14 (7.93–14.69 pmol/L)
Kapelari et al. [32]; 2008	a posteriori direct method, 2.5th to 97.5th percentile	infants (45)	0.71–1.97 (9.17–25.28 pmol/L)
		6–10 years (327)	0.82–1.62 (10.60–20.90 pmol/L)
		15–18 years (233)	0.82–1.78 (10.57–22.62 pmol/L)

The values obtained by indirect methods presented in Table 5 show considerable variation in the LRI, the URI, and the width of the reference interval for TSH. The LRI obtained by the modified method of Hoffman and Katayev ranges from 0.16 to 1.08 mIU/L, and the URI values range from 3.05 to 4.98 mIU/L for adults. The results obtained in our study are closest to those obtained by Drees et al. [25] with the EP Evaluator software (Data Innovations) and by Mokhatar KM [24] with quantile regression with the RCQ method. Hoffman’s method was not used in any of the available studies to determine the reference intervals for fT4. However, the available results of two studies using other indirect methods [31, 32] show fairly good agreement with each other and with the results of our research. The determination of reference intervals for thyroid hormones is the subject of ongoing debate due to poor standardisation of immunochemical methods and the lack of unambiguous reference materials [33] as well as large inter-population differences, along with variability related to the age and sex of patients within the same population. The legitimacy of determining reference intervals for thyroid hormones with an indirect method based on the results of hospital tests is additionally supported by the fact that these results are very often acquired from screening tests, and therefore a large portion of them comes from people without thyroid disorders. The use of such an approach additionally eliminates the differences in values that may result from different methods of sampling in different hospital wards and collection points [25]. Encouraged by the suggestions in Jones’ publication [3] on disseminating the results of research on determining reference intervals, regardless of whether the approach is direct or indirect, we searched for the reference intervals characteristic of the population of our hospital and of the methods and apparatus used in our laboratory.
At the same time, we are aware that no intervals are perfect and final, and the results obtained by the indirect method, even if they are not absolutely accurate, are closer to the actual state of the population of a given region, because they take into account the analytical and biological variability of the analysed parameter.

Conclusions

TSH and fT4 levels change during a person’s lifetime and vary between sexes, but the difference does not always have an influence on the clinical interpretation of laboratory results in the context of reference intervals. The discussed differences between RIs estimated in this study and the literature data as well as manufacturers’ information reflect the actual distribution of results in the population of a given region and are consistent with the idea of screening the functioning of the thyroid gland firstly by TSH and fT4. Differences between our RIs in comparison to other direct and indirect studies were probably caused by analytical (different antibody characteristics used in reagents, lack of harmonization) and epidemiological factors (different populations, socio-economic status and undefined geographic covariates).

Strengths of the study

Considering common weaknesses or even shortcomings in determining reference intervals, such as the uncritical acceptance of the values supplied by the manufacturer, indirect methods with their implementation of computer applications [18] may well be the best alternative for regional hospitals and field laboratories serving the population of a given region. This is especially useful in terms of accessing hard-to-reach paediatric or geriatric populations. Comparing the RIs obtained by the indirect method can also be used to confirm compliance with the RI values provided by the manufacturer of the reagent kits. Considering the differentiation of reference values determined both by direct and indirect methods in different populations, the use of indirect methods is, in our opinion, justified due to their low cost, ease of application, and above all the possibility of imaging the distribution of results in the population of a given region rather than relying on data from other geographic and cultural areas.

Limitations of the study

The greatest limitation of this study was using unselected data from our LIS. We did not have the possibility to download the data with the ICD code and delete results repeatedly performed on the same patient. However, the applied Hoffman method is based on the use of values located in the middle of the distribution to estimate the reference intervals. This was preceded by the rejection of outliers, i.e. pathological values–the Hoffman method is relatively robust in their occurrence. Another important limitation was a lack of verification of the indirect RIs established using the direct method which would have enabled us to confirm that the exclusion of patients with thyroid disorders is not required to obtain proper reference intervals from hospital populations with many thousands of records. (XLSX) Click here for additional data file. 8 Oct 2021 PONE-D-21-26253Establishing laboratory-specific reference intervals for TSH and fT4 by use of the indirect Hoffman method PLOS ONE Dear Dr. Płaczkowska, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Two referees and I have reviewed your manuscript (MS) and both referees have requested changes and additions for you to address in a revised MS. I find all comments pertinent, from both referees, especially those by Ref. 1 about providing a more complete methods section and a more readable Discussion, and reorganizing some topics. Ref. 1 has assessed your statistical analysis and found it adequate, so the comment about that by Ref. 2 can be ignored. Your attention to major and minor comments should make your contribution more correct and impactful. Please submit your revised manuscript by November 10. 
If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Joseph DiStefano III, PhD Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. 
Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Please include your full ethics statement in the ‘Methods’ section of your manuscript file. In your statement, please include the full name of the IRB or ethics committee who approved or waived your study, as well as whether or not you obtained informed written or verbal consent. If consent was waived for your study, please include this information in your statement as well. Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: I Don't Know ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. 
For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: No Reviewer #2: No ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: General Comment The authors used an indirect method to establish reference intervals for TSH and FT4. This is an interesting approach, which merits greater attention in the literature. The method offers great potential to clinicians to efficiently validate reference intervals provided by manufacturers and laboratories using data readily available to them from a local population of interest. As for the strength of Methods, the description of methods is incomplete, lacking details which are necessary for the reader to follow the process from data collection to final results. Some of the critical omissions are discussed but belong to methods in the first place. The structure of the paper can be improved to better guide the reader through the whole article. Suggest to make some statements more precise, elaborate on methodological detail and summarise results in the text. The discussion could be more concise and better structured. 
Specific comments are provided below. Introduction Line 104 TSH This is not generally true, approximately 0.4 to 4 might be the best we can rightfully claim to be precise. Line 105 "variability of the value in an individual" Pls provide a reference for such statements, e.g. Andersen et al., also pls for "setpoint" in the next line. Line 110 "earlier detection of disturbances". This is certainly true, but when would these disturbances indicate true change in the underlying thyroid condition to the clinician rather than fluctuations. Line 112 "in order to confirm thyroid primary dysfunction" This is wrong, and in need of a reference. For confirmation, unlike screening, all three thyroid hormones may be required, because there is diseases such as T3 hyperthyroidism. Line 115 "it is not advisable to order isolated fT4 testing" This is a misrepresentation of the reference. Again, true for screening only. Otherwise, FT4 may be useful by itself in secondary hypothyroidism where TSH may fail to confirm the diagnosis. These statements do not provide compelling reasons, but it is ok for the authors to focus on the two main hormones if they wish to do so. Line 121 "to provide the possibility to utilise" This is difficult to understand. Pls simplify, perhaps "to evaluate" Why use the method of Hoffman? Why not prefer Katayev? Methods Line 128 Atellica IM The manufacturer of the assay should be named. What was the inter-series performance of the methods? What was the conventional reference range for TSH and FT4? Line 137 "completely anonymous" This is not exactly right terminology as per data protection standards, probably the authors meant "deidentified aggregated data". Line 143 "Excel spreadsheet" Excel is not regarded as a proper data base solution, and not considered safe (even banned) by some institutions and organisations. Line 149 Has the diagnostic code (ICD) been considered in selecting patients? 
What happened to severely ill patients or patients with interfering comorbidities and medications? What about the use of thyroid medication? What happened to repeat measurements from the same patient? These are critical issues that belong to methods. If the authors felt their method is more robust than other procedures, for instance requiring less accurate sampling of normal subjects, this would be most interesting, but should be included in the evaluation of the method. Statistical analysis It has been reported that log transformation failed to achieve both linearity of TSH and an acceptable normal distribution unless TPO-Ab positive subjects who add a right skew are removed prior to the analysis. Did the authors examine a possible influence by contamination of the euthyroid sample, for instance with auto-antibody positive subjects? Clinical categorisation is essential when establishing a reference range, which does not conceptionally extend to heterogeneous populations with various thyroid pathologies. The appropriateness of the age groups should be evaluated with age as a continuous outcome to further confirm their physiologically-based selection by cut-off analysis. Given the large sample size this should be feasible. Tuckey test? Pls specify at what levels patients were excluded. The Hoffman method needs to be explained more in detail and referenced. Why Hoffman? What is the difference to the method of Katayev el al.. The latter provides a detailed statistical procedure for dealing with the error term. FT4 does follow a normal distribution. Hence, a log transformation seems statistically unnecessary, and clinically unwanted, even potentially detrimental given the well known issues with interpretation of back transformed estimates in clinical medicine. Figure 1 should be moved to the Results section. The extrapolation to the boundaries should be precisely indicated in the Figure, as this is a critical part of the method. 
A statistical measure of how close the data are to the fitted line such as r squared should be reported. Results Line 192 A brief summary of main outcomes beyond merely referring to tables would be more informative to the reader, e.g. comment briefly on age dependency. Line 196 "decreased" with age" This could be more informative, for instance say something about the magnitude or relevance of effects. In a large sample there will always be some minor differences, which might not be all that relevant. Line 200 "significant" Does this mean statistically significant? Have differences been statistically assessed, for instance by interaction with age or age categories? Discussion The authors briefly discuss the method by Katayev et al.. This method has been successfully adopted by a few authors in the literature to verify TSH reference values, compare them with the limits provided by manufacturers and, importantly, facilitate clinical decision making in borderline conditions. The authors might want to consider discussing/ replicating some relevant findings from those publications, some of which they did not mention. The discussion is detailed, but difficult to follow. For instance in line 346 the authors mention Ris and then a comparison with LRI. A clearer structure and concise summary of similarities/ differences to other approaches would help. Line 403 "therefore a large portion of the result comes from people without thyroid disorders." This raises some questions to be both examined (see Methods) and discussed. What impact do thyroid disorders have on the validity of the method? At what proportion of pathologies might the method fail? What makes the method robust? Figures I can’t seem to find legends to Figures. Reviewer #2: The authors have defined reference intervals (RIs) for TSH and FT4 by applying the indirect Hoffman method based on a large data pools from patients and outpatients stored in laboratory information systems over a period of five years. 
There are not many data regarding the Hoffman method and thus, they can be compared only with manufacturers’ data. It is an analytical study. The methods should be reviewed by a statistician who has experience with Hoffman’s method. The study is interesting and methodologically novel. However, I have some minor comments: 1. Line 400: “The legitimacy of determining reference intervals for thyroid hormones with an indirect method…based on the results of hospital tests is additionally supported by the fact that these tests are screened, and therefore a large portion of the results comes from people without thyroid disorders”. However, this method does not eliminate differences in the levels of parameters (TSH & FT4) resulting from drugs and concomitant diseases which are common in hospital sampling and interfere with TSH and particularly with FT4 measurement? 2. The data should be validated by other studies and compared with direct methods. The URIs and LRIs estimated by the indirect method may be compared for concordance with the 2.5th and 97.5th percentile analysis? This evaluation can be performed easily and cheaply as validation of Hoffman’s method. 3. The indirect method however, may not distinguish well enough pathological from non-pathological levels and this may be an additional limitation. 4. This analysis clearly shows the importance of measuring FT4 together with TSH in establishing a diagnosis of thyroid disease. TSH is a reliable indicator of disease. 5. A minor linguistic revision is needed. ********** 6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. 
Reviewer #1: Yes: Rudolf Hoermann Reviewer #2: Yes: Leonidas Duntas [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

15 Nov 2021

Dear Reviewers, Thank you very much for all your constructive comments concerning our manuscript. All changes are highlighted in yellow in the revised version. A detailed description of changes in the manuscript appears below. Reviewer #1: General Comment We kindly thank you for giving us the chance to improve our manuscript. Thank you very much for all your constructive comments concerning our manuscript. A detailed description of changes in the manuscript appears below. The authors used an indirect method to establish reference intervals for TSH and FT4. This is an interesting approach, which merits greater attention in the literature. The method offers great potential to clinicians to efficiently validate reference intervals provided by manufacturers and laboratories using data readily available to them from a local population of interest. 
As for the strength of Methods, the description of methods is incomplete, lacking details which are necessary for the reader to follow the process from data collection to final results. Some of the critical omissions are discussed but belong to methods in the first place. The structure of the paper can be improved to better guide the reader through the whole article. Suggest to make some statements more precise, elaborate on methodological detail and summarise results in the text. The discussion could be more concise and better structured. Specific comments are provided below. Introduction Line 104 TSH This is not generally true, approximately 0.4 to 4 might be the best we can rightfully claim to be precise. Answer: Thank you for drawing our attention to this statement. We decreased the firmness of the quoted values in relation to the general population by adding the word “approximately”, bearing in mind the discrepancies in the literature which are generally summed up as the range 0.4-4.0 and now it sounds “is approximately 0.4-4.0 mIU/L” Line 105 "variability of the value in an individual" Pls provide a reference for such statements, e.g. Andersen et al., also pls for "setpoint" in the next line. Answer: As suggested by the reviewer, we introduced appropriate literature references in the places indicated by the reviewer: - Andersen S, Pedersen KM, Bruun NH, Laurberg P. Narrow individual variations in serum T(4) and T(3) in normal subjects: a clue to the understanding of subclinical thyroid disease. J Clin Endocrinol Metab. 2002 Mar;87(3):1068-72. doi: 10.1210/jcem.87.3.8165. PMID: 11889165 -Hoermann R, Midgley JEM, Larisch R, Dietrich JW. Recent advances in thyroid hormone regulation: toward a new paradigm for optimal diagnosis and treatment. Front Endocrinol (Lausanne). 2017;8:364. 
https://doi.org/10.3389/fendo.2017.00364 Now this passage is as follows: “However, the variability of the value in an individual is much smaller [10] , and the value determined in a state of hormonal equilibrium can be regarded as an individual's set-point [11].” Line 110 "earlier detection of disturbances". This is certainly true, but when would these disturbances indicate true change in the underlying thyroid condition to the clinician rather than fluctuations. Answer: We have edited the sentence indicated by the reviewer and we hope that it is currently correct. The sentence reads as follows: “This allows for earlier detection of clinical important disturbances in thyroid condition, even without direct comparison to the reference interval and taking into account the physiological changes in concentration related to age.” Line 112 "in order to confirm thyroid primary dysfunction" This is wrong, and in need of a reference. For confirmation, unlike screening, all three thyroid hormones may be required, because there is diseases such as T3 hyperthyroidism. Answer: We have edited this paragraph in line with the reviewer's instructions. Now this passage is as follows: “This allows for earlier detection of clinical important disturbances in thyroid condition, even without direct comparison to the reference interval [12] and taking into account the physiological changes in concentration related to age [13].” Line 115 "it is not advisable to order isolated fT4 testing" This is a misrepresentation of the reference. Again, true for screening only. Otherwise, FT4 may be useful by itself in secondary hypothyroidism where TSH may fail to confirm the diagnosis. These statements do not provide compelling reasons, but it is ok for the authors to focus on the two main hormones if they wish to do so. 
Answer: We have edited this paragraph in line with the reviewer's instructions and now it reads: “According to the current recommendations, in order to screen for thyroid primary dysfunction, first TSH determinations should be performed repetitively in 3-6 month intervals, followed by fT4 for differentiation of subclinical and “overt” thyroid dysfunction. fT3 determinations should be ordered only in specific cases [12-15].” Line 121 "to provide the possibility to utilise" This is difficult to understand. Pls simplify, perhaps "to evaluate" Answer: We have edited the Objective paragraph in line with the reviewer's instructions and now it reads: “The aim of this study was to establish the reference intervals for TSH and fT4 from the large data pools of patient results stored in laboratory information systems (LIS) using the indirect Hoffman method, and to compare these RIs with generally used reference limits.” Why use the method of Hoffman? Why not prefer Katayev? Answer: We decided to use the Hoffman method, with a minor modification consisting of logarithmic data transformation, because of its simplicity and the possibility of using simple spreadsheets, e.g. Excel (Georg Hoffmann et al., 2016, DOI 10.1515/labmed-2015-0104), or statistical packages such as Statistica in our case. Katayev's computerized method is methodologically more complicated and requires the use of an R package, which is free but considerably harder to use, especially for employees of routine medical laboratories. More detailed explanations of the choice of method are provided in the response to the Statistical Analysis comments. Methods Line 128 Atellica IM The manufacturer of the assay should be named. What was the inter-series performance of the methods? What was the conventional reference range for TSH and FT4? Answer: We have completed the missing information on the reagent supplier and the inter-series variability. 
The conventional reference ranges used in our laboratory were the same as those provided by the manufacturer and are shown in Table 2 (TSH) and Table 4 (fT4). Line 137 "completely anonymous" This is not exactly right terminology as per data protection standards, probably the authors meant "deidentified aggregated data". Answer: We entered the correct nomenclature and now the sentence is: “Based on the bioethics committee decision, patient informed consent was waived due to the retrospective nature of the study, conducted on deidentified aggregated numerical data.” Line 143 "Excel spreadsheet" Excel is not regarded as a proper data base solution, and not considered safe (even banned) by some institutions and organisations. Answer: In our institution, it is acceptable to use Excel as a database with safety standards appropriate for our conditions. However, due to the doubts that arose, we have removed this passage from the manuscript. Line 149 Has the diagnostic code (ICD) been considered in selecting patients? What happened to severely ill patients or patients with interfering comorbidities and medications? What about the use of thyroid medication? What happened to repeat measurements from the same patient? These are critical issues that belong to methods. If the authors felt their method is more robust than other procedures, for instance requiring less accurate sampling of normal subjects, this would be most interesting, but should be included in the evaluation of the method. Answer: We are aware that no method of determining RIs is perfect and that there is always a choice between a well-defined but small group in direct methods and a less well characterized but much larger group in indirect methods. In our opinion, any effort made to best match the reference interval to the patient population being cared for is better than adopting the RI proposed by the manufacturer. 
Our opinion is based on an analysis of the RI values provided by the manufacturer of the reagents used by us, which applies only to selected age groups, without information whether the division into sex was taken into account, and despite the declaration of the use of the CLSI procedure, the insufficient number of reference groups. Statistical analysis It has been reported that log transformation failed to achieve both linearity of TSH and an acceptable normal distribution unless TPO-Ab positive subjects who add a right skew are removed prior to the analysis. Did the authors examine a possible influence by contamination of the euthyroid sample, for instance with auto-antibody positive subjects? Clinical categorisation is essential when establishing a reference range, which does not conceptionally extend to heterogeneous populations with various thyroid pathologies. Answer: We would like to thank the reviewer for drawing attention to the important problem of selecting patients for the determination of reference intervals mentioned in two paragraphs above. These issues are important in the process of determining RI and critical for direct methods. We got acquainted in detail with the available publications which used different criteria for excluding patients on the basis of available information such as ICD, taking medications, TPO antibodies. However, when planning our research, we were aware of the limitations of LIS functionality in our laboratory. It was not possible to download from the database results for single patient, which would allow obtaining a thyroid hormone profile for individual patients. Moreover, we also did not have access to the ICD and the identification of patients who had the same test repeatedly performed. This resulted in a lack of patient selection prior to statistical analysis. Therefore, we cannot provide clinical exclusion criteria for this study. 
Our analysis was based on the removal of outliers and on the assumptions of the Hoffman method, according to which the linear part of the Q-Q plot corresponds to the range of empirical data whose distribution is consistent with the normal distribution. The Hoffman method assumes a Gaussian distribution of physiological test results, and only this range of results is used to determine the reference intervals. To address the doubts raised by the reviewers regarding the selection of the analyzed results, we added information about the lack of any selection of the analyzed results to the Data Gathering chapter. The reason for this situation is given in the limitations at the end of the discussion. The difficulties we encountered result from the need to import re-anonymized data. Therefore, attempts have already been made to adapt the Laboratory and Hospital Information Systems so that results assigned to an individual patient can be imported into the database, rather than only the results of a given parameter without any link to other determinations performed on the same blood sample. The appropriateness of the age groups should be evaluated with age as a continuous outcome to further confirm their physiologically-based selection by cut-off analysis. Given the large sample size this should be feasible. Answer: We decided not to analyse age as a continuous variable, e.g. at intervals of 1 year, because this resulted in small numbers within individual groups, especially in the youngest and the oldest age groups. That is why we decided to distinguish groups that correspond to the various stages of somatic and social development of a human being, i.e. infancy, preschool, school, adolescence and adult period, with a focus on the periods of young and old adulthood and senior age. The main age ranges were also distinguished in view of the purpose of the study, i.e. 
the comparison of the RI values with the data from the reagent manufacturer, which are included in Tables 1 and 2. We expanded the rationale for selecting these age groups by adding the passage: “and reflects the main age groups for which the manufacturer provides reference value” in the Statistical Analysis chapter. Tukey test? Pls specify at what levels patients were excluded. Answer: We used the standard cut-off of 1.5 IQR for the detection and exclusion of outliers. This information was added in the Statistical Analysis section. The Hoffman method needs to be explained more in detail and referenced. Answer: As suggested by the reviewer, we have made significant changes to the Statistical analysis section and we hope that this will result in better readability of this chapter. Why Hoffman? What is the difference to the method of Katayev et al. The latter provides a detailed statistical procedure for dealing with the error term. Answer: The original Hoffman method involved a scatter plot of experimental values vs cumulative frequency on a probability scale, not on a linear scale. As described by Dan Holmes et al. in 2018 (DOI: 10.1093/AJCP/AQY149), many authors presented a typical example of incorrect implementation of the Hoffmann method, using a linear cumulative frequency plot instead of a probability scale as in quantile-quantile plots. We chose the Hoffman method while being aware of the possibility of using the linear cumulative frequency plot proposed by Katayev et al. (Katayev A, Balciza C, Seccombe DW. Establishing reference intervals for clinical laboratory test results: is there a better way? Am J Clin Pathol. 2010;133:180-186). We also meticulously followed the discussion between Katayev and other authors in medical journals: 1. Katayev A, Fleming JK, Luo D, et al. Reference intervals data mining: no longer a probability paper method. Am J Clin Pathol. 2015;143:134-142 2. 
Graham Jones, MD, Gary Horowitz, MD, Reference Intervals Data Mining: Getting the Right Paper, American Journal of Clinical Pathology, Volume 144, Issue 3, 1 September 2015, Pages 526–527, https://doi.org/10.1309/AJCP26VYYHIIZLBK 3. Holmes DT, Buhr KA. Widespread incorrect implementation of the Hoffmann method, the correct approach, and modern alternatives. Am J Clin Pathol. 2019;151:328-336 4. Alexander Katayev, MD, James K Fleming, PhD, Daniel T Holmes, MD, Kevin A Buhr, PhD, Widespread Implementation of the Hoffmann Method: A Second Opinion, American Journal of Clinical Pathology, Volume 152, Issue 1, July 2019, Pages 116–117, https://doi.org/10.1093/ajcp/aqz015 Because one purpose of the work was to demonstrate that RIs can be determined by the Hoffman method with the simplest software (e.g. Excel, Statistica), we decided to use the original Hoffman method based on quantile-quantile plots, preceded by the removal of outliers and followed by visual and statistical fitting of a regression line relating the empirical data to a hypothetical distribution. (For an Excel implementation, see Georg Hoffmann et al.) In the analyses carried out using the Hoffman method, we also obtained the error term in the regression equation. In our study, the line was fitted on the basis of visual inspection and accepted only if the linear correlation coefficient was not less than 0.99. Due to the almost perfect fit of our lines (r > 0.99), the error component e of the regression equation was so small that we did not report it in the publication. (Katayev and Larisch also did not provide the value of the error.) FT4 does follow a normal distribution. Hence, a log transformation seems statistically unnecessary, and clinically unwanted, even potentially detrimental given the well known issues with interpretation of back transformed estimates in clinical medicine. 
Answer: As we mention above, we used log transformations in the study due to the log-normal distribution of fT4 values in all analyzed age groups. Perhaps this is due to the lack of strict exclusion criteria for people with thyroid disorders or taking medications that affect the regulation of thyroid secretion. We assume that these values were removed from the group of reference results in the course of further analysis. There are publications available confirming the reviewer's statement, but also those that indicate the log-normal distribution of fT4 results in both more and less numerous study groups. However, our results are consistent with reports by other authors, (see below) which indicates that both situations are possible depending on the studied population. - Wang Y, Zhang YX, Zhou YL, Xia J. Establishment of reference intervals for serum thyroid-stimulating hormone, free and total thyroxine, and free and total triiodothyronine for the Beckman Coulter DxI-800 analyzers by indirect method using data obtained from Chinese population in Zhejiang Province, China. J Clin Lab Anal. 2017 Jul;31(4):e22069. doi: 10.1002/jcla.22069. Epub 2016 Sep 26. PMID: 27716997; PMCID: PMC6817203. - Ganslmeier, Mira, Castrop, Claudia, Scheidhauer, Klemens, Rondak, Ina-Christine and Luppa, Peter B.. "Regional adjustment of thyroid hormone reference intervals" LaboratoriumsMedizin, vol. 38, no. 5, 2014, pp. 281-287. https://doi.org/10.1515/labmed-2014-0008 - Milinković N, Ignjatović S, Zarković M, Radosavljević B, Majkić-Singh N. Indirect estimation of reference intervals for thyroid parameters. Clin Lab. 2014;60(7):1083-9. doi: 10.7754/clin.lab.2013.130733. PMID: 25134375. Figure 1 should be moved to the Results section. The extrapolation to the boundaries should be precisely indicated in the Figure, as this is a critical part of the method. A statistical measure of how close the data are to the fitted line such as r squared should be reported. 
Answer: We moved the graph to the Results chapter and modified it as suggested by the reviewer. We have added the regression equation describing the linear part of the graph, together with the correlation coefficient. We also added information to the statistical analysis chapter on the selection criteria for the linear part of the chart: “The regression line included the middle range of data, which was initially fitted by visual inspection and was statistically confirmed by determining the linear correlation coefficient, only r> 0.99 was acceptable.”

Results

Line 192: A brief summary of main outcomes beyond merely referring to tables would be more informative to the reader, e.g. comment briefly on age dependency.

Answer: A preliminary discussion of the reference interval distribution for TSH and fT4 is given in Figure 1. Table 1 is a detailed numerical representation of the Figure 1 data for TSH, along with a comparison of the percentage differences for each group with respect to the estimated RCV. The RCV values for the analyzed parameters are provided in the footer of the table. We discuss the results of Table 1 in the text below with respect to differences by age and gender. To improve the clarity of the discussion of Table 1, we changed the order, starting with the age groups and then moving on to a comparison between sexes within the same age group.

Line 196 "decreased with age": This could be more informative, for instance say something about the magnitude or relevance of effects. In a large sample there will always be some minor differences, which might not be all that relevant.

Line 200 "significant": Does this mean statistically significant? Have differences been statistically assessed, for instance by interaction with age or age categories?

Answer: All comparisons in our work were carried out in relation to the determined RCV value. Therefore, "significant" denotes differences greater than the estimated RCV.
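The comparison against the RCV described in this answer can be illustrated with the commonly used formula RCV = √2 · Z · √(CV_A² + CV_I²), where CV_A is the analytical and CV_I the within-subject coefficient of variation. The sketch below rests on assumptions: the function names, the CV inputs, and the example limits are hypothetical and are not taken from the study, and Z = 1.96 is assumed for a two-sided 95% level.

```python
import math


def rcv_percent(cv_analytical, cv_within_subject, z=1.96):
    """Reference change value in percent (two-sided, z = 1.96 for 95%)."""
    return math.sqrt(2) * z * math.sqrt(cv_analytical ** 2 + cv_within_subject ** 2)


def percent_difference(value, reference):
    """Absolute difference of `value` from `reference`, in percent."""
    return abs(value - reference) / reference * 100


# Hypothetical CVs (percent) and hypothetical upper limits for two groups.
rcv = rcv_percent(cv_analytical=3.0, cv_within_subject=4.0)
diff = percent_difference(value=4.6, reference=4.0)
exceeds_rcv = diff > rcv
print(f"RCV = {rcv:.1f}%, difference = {diff:.1f}%, exceeds RCV: {exceeds_rcv}")
```

Under this reading, a difference between two groups' reference limits would be called significant only when `exceeds_rcv` is true.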
We have quoted the RCV values calculated for our study at the end of the statistical analysis chapter.

Discussion

The authors briefly discuss the method by Katayev et al. This method has been successfully adopted by a few authors in the literature to verify TSH reference values, compare them with the limits provided by manufacturers and, importantly, facilitate clinical decision making in borderline conditions. The authors might want to consider discussing/replicating some relevant findings from those publications, some of which they did not mention.

Answer: As discussed above, we decided to use Q-Q charts in our work because of their wide applicability. At the same time, the ongoing discussion on the correctness of the Katayev method would expose us to critical comments from potential reviewers. Accordingly, the selected publication by Katayev et al. is cited to the same extent as works by other authors.

The discussion is detailed, but difficult to follow. For instance in line 346 the authors mention RIs and then a comparison with LRI. A clearer structure and concise summary of similarities/differences to other approaches would help.

Answer: Thank you very much for your comments on the difficulties with the structure and clarity of the discussion.
Following the reviewer's guidelines, we have completely reorganized this chapter:

- we limited our discussion to the works of other authors based on indirect methods of RI determination
- the most important results from the available publications are summarized in Table 5
- we removed overly detailed quotations of other authors' results
- the discussion now consists of three parts: the most important results of our study, a comparison of the results determined by the Hoffman method with the values provided by the reagent manufacturer, and finally a comparison of our results with data from the international literature
- we have identified the strengths and weaknesses of the study in separate subsections

We hope that the restructured Discussion chapter will meet the expectations of the reviewer and potential readers. We have made every effort to present our findings in an unambiguous and readable manner.

Line 403 "therefore a large portion of the result comes from people without thyroid disorders.": This raises some questions to be both examined (see Methods) and discussed. What impact do thyroid disorders have on the validity of the method? At what proportion of pathologies might the method fail? What makes the method robust?

Answer: In response to the reviewer's pertinent comment, we cited Drees et al. more precisely, and the sentence now reads: “The legitimacy of determining reference intervals for thyroid hormones with an indirect method based on the results of hospital tests is additionally supported by the fact that their performance is very often ordered as a screening tests, and therefore a large portion of the results comes from people without thyroid disorders.”

Figures

I can’t seem to find legends to Figures.

Answer: Descriptions of the figures were placed in the text in accordance with the guidelines for authors, but they were barely visible because they were not separated from the text.
In the revised manuscript, we placed the descriptions of the figures as separate paragraphs so that they are clearly visible, and placed them once again at the end of the manuscript.

Reviewer #2:

We kindly thank you for giving us the chance to improve our manuscript. Thank you very much for all your constructive comments concerning our manuscript. A detailed description of changes in the manuscript appears below.

The authors have defined reference intervals (RIs) for TSH and FT4 by applying the indirect Hoffman method based on large data pools from patients and outpatients stored in laboratory information systems over a period of five years. There are not many data regarding the Hoffman method and thus, they can be compared only with manufacturers’ data. It is an analytical study. The methods should be reviewed by a statistician who has experience with Hoffman’s method. The study is interesting and methodologically novel. However, I have some minor comments:

1. Line 400: “The legitimacy of determining reference intervals for thyroid hormones with an indirect method…based on the results of hospital tests is additionally supported by the fact that these tests are screened, and therefore a large portion of the results comes from people without thyroid disorders”. However, this method does not eliminate differences in the levels of parameters (TSH & FT4) resulting from drugs and concomitant diseases which are common in hospital sampling and interfere with TSH and particularly with FT4 measurement?

Answer: We would like to thank the reviewer for drawing attention to the important problem of selecting patients for the determination of reference intervals. This is critically important for direct RI selection methods. We reviewed in detail the available publications that used different criteria for excluding patients on the basis of available information such as ICD codes, medication use, and TPO antibodies.
However, when planning our research, we were aware of the limitations of LIS functionality in our laboratory. It was not possible to download data from a single-patient database, which would have allowed us to obtain a thyroid hormone profile for individual patients. Moreover, we did not have access to ICD codes or to the identification of patients who had the same test performed repeatedly. This resulted in a lack of patient selection prior to statistical analysis. For this reason, we cannot provide clinical exclusion criteria for this study. Our analysis was based on the removal of outliers and on the assumptions of Hoffman's method, according to which the linear part of the Q-Q plot corresponds to the range of empirical data whose distribution is consistent with the normal distribution. The Hoffmann method assumes a Gaussian distribution of physiological test results, and only this range of results is used to determine the reference intervals. To address the reviewers' doubts regarding the selection of the analyzed results, we added information to the Data Gathering chapter about the lack of any selection of the analyzed results. Information on the reason for this is included in the limitations at the end of the discussion.

2. The data should be validated by other studies and compared with direct methods. The URIs and LRIs estimated by the indirect method may be compared for concordance with the 2.5th and 97.5th percentile analysis? This evaluation can be performed easily and cheaply as validation of Hoffman’s method.

Answer: Thank you for drawing attention to this aspect of the research and for the suggestions on validating the obtained results against other methods. We are aware that validating our results would increase their credibility and confirm the applicability of the established reference intervals.
However, due to the limitations in access to patient data presented in the previous answer, we could not create a group of patients homogeneous enough to apply direct a posteriori methodologies based on the determination of the 2.5th and 97.5th percentiles. The difficulties we have encountered result from the need to import re-anonymized data. Therefore, attempts have already been made to adapt the LIS and HIS so that results assigned to a specific patient can be imported into the database, rather than only the results of a given parameter unrelated to other determinations performed on the same blood sample.

3. The indirect method however, may not distinguish well enough pathological from non-pathological levels and this may be an additional limitation.

Answer: We are aware that no method of determining RIs is perfect and that it is always a choice between a well-defined but small group in direct methods and a less well characterized but much larger group in indirect methods. In our opinion, any effort made to best match the reference interval to the studied patient population is better than adopting the RI proposed by the manufacturer. Our opinion is based on an analysis of the RI values provided by the manufacturer of the reagents we used: these apply only to selected age groups, give no information on whether sex was taken into account, and, despite the declared use of the CLSI procedure, are based on an insufficient number of reference groups.

4. This analysis clearly shows the importance of measuring FT4 together with TSH in establishing a diagnosis of thyroid disease. TSH is a reliable indicator of disease.
Answer: We agree with the opinion of the reviewer, which is consistent with the current recommendations cited at the end of the Background section of our manuscript: “According to the current recommendations, in order to screening for thyroid primary dysfunction, first TSH determinations should be performed repetitively in 3-6 month intervals, followed by fT4 for differentiation of subclinical and “overt” thyroid dysfunction. fT3 determinations should be ordered only in specific cases.”

5. A minor linguistic revision is needed.

Answer: We have made the necessary changes to the text of the manuscript.

Submitted filename: Answers to Reviewer final.docx

9 Dec 2021

Establishing laboratory-specific reference intervals for TSH and fT4 by use of the indirect Hoffman method

PONE-D-21-26253R1

Dear Dr. Płaczkowska,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance.
Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Joseph DiStefano III, PhD
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

29 Dec 2021

PONE-D-21-26253R1

Establishing laboratory-specific reference intervals for TSH and fT4 by use of the indirect Hoffman method

Dear Dr. Płaczkowska:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff
on behalf of
Prof. Joseph DiStefano III
Academic Editor
PLOS ONE

26 in total

1. Establishing reference intervals for clinical laboratory test results: is there a better way?

Authors: Alex Katayev; Claudiu Balciza; David W Seccombe
Journal: Am J Clin Pathol Date: 2010-02 Impact factor: 2.493

2. Reference range for thyrotropin. Post hoc assessment.

Authors: Rolf Larisch; A Giacobino; W Eckl; H-G Wahl; J E M Midgley; R Hoermann
Journal: Nuklearmedizin Date: 2015-01-08 Impact factor: 1.379

3. TSH continuous reference intervals by indirect methods: A Comparisons to Partitioned Reference Intervals.

Authors: Khelil Mohamed Mokhtar
Journal: Clin Biochem Date: 2020-08-12 Impact factor: 3.281

4. Establish and verify TSH reference intervals using optimized statistical method by analyzing laboratory-stored data.

Authors: Y Feng; W Bian; C Mu; Y Xu; F Wang; W Qiao; Y Huang
Journal: J Endocrinol Invest Date: 2014-01-09 Impact factor: 4.256

Review 5. Childhood Thyroid Function Reference Ranges and Determinants: A Literature Overview and a Prospective Cohort Study.

Authors: Ibrahim Önsesveren; Mirjana Barjaktarovic; Layal Chaker; Yolanda B de Rijke; Vincent W V Jaddoe; Hanneke M van Santen; Theo J Visser; Robin P Peeters; Tim I M Korevaar
Journal: Thyroid Date: 2017-10-24 Impact factor: 6.568

10. Standardization of Free Thyroxine and Harmonization of Thyrotropin Measurements: A Request for Input from Endocrinologists and Other Physicians.

Authors: Linda M Thienpont; James D Faix; Graham Beastall
Journal: Thyroid Date: 2015-10-14 Impact factor: 6.568