
Using machine learning to understand age and gender classification based on infant temperament.

Maria A Gartstein¹, D Erich Seamon², Jennifer A Mattera¹, Michelle Bosquet Enlow³, Rosalind J Wright⁴,⁵, Koraly Perez-Edgar⁶, Kristin A Buss⁶, Vanessa LoBue⁷, Martha Ann Bell⁸, Sherryl H Goodman⁹, Susan Spieker¹⁰, David J Bridgett¹¹, Amy L Salisbury¹², Megan R Gunnar¹³, Shanna B Mliner¹³, Maria Muzik¹⁴, Cynthia A Stifter⁶, Elizabeth M Planalp¹⁵, Samuel A Mehr¹⁶, Elizabeth S Spelke¹⁶, Angela F Lukowski¹⁷, Ashley M Groh¹⁸, Diane M Lickenbrock¹⁹, Rebecca Santelli²⁰, Tina Du Rocher Schudlich²¹, Stephanie Anzman-Frasca²², Catherine Thrasher²³, Anjolii Diaz²⁴, Carolyn Dayton²⁵, Kameron J Moding²⁶, Evan M Jordan²⁷.

Abstract

Age and gender differences are prominent in the temperament literature, with the former particularly salient in infancy and the latter noted as early as the first year of life. This study represents a meta-analysis utilizing Infant Behavior Questionnaire-Revised (IBQ-R) data collected across multiple laboratories (N = 4,438) to overcome limitations of smaller samples in elucidating links among temperament, age, and gender in early childhood. Algorithmic modeling techniques were leveraged to discern the extent to which the 14 IBQ-R subscale scores accurately classified participating children as boys (n = 2,298) and girls (n = 2,093), and into three age groups: youngest (< 24 weeks; n = 1,102), mid-range (24 to 48 weeks; n = 2,557), and oldest (> 48 weeks; n = 779). Additionally, simultaneous classification into age and gender categories was performed, providing an opportunity to consider the extent to which gender differences in temperament are informed by infant age. Results indicated that overall age group classification was more accurate than child gender models, suggesting that age-related changes are more salient than gender differences in early childhood with respect to temperament attributes. However, gender-based classification was superior in the oldest age group, suggesting temperament differences between boys and girls are accentuated with development. Fear emerged as the subscale contributing most notably to accurate classifications overall. This study leads infancy research and meta-analytic investigations more broadly in a new direction as a methodological demonstration, and also provides optimal comparative data for the IBQ-R based on the largest and most representative dataset to date.

Year: 2022 PMID: 35417495 PMCID: PMC9007342 DOI: 10.1371/journal.pone.0266026

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Although a number of approaches have been developed for the purpose of measuring temperament in childhood, including a variety of observational procedures and physiological techniques, parent report continues to be the most widely used overall [1]. This is due to a number of factors, prominent among them ease of administration and scoring as well as accessibility. Parent report also provides descriptors of child temperament across time and situations, not just a “snapshot” of reactivity and/or regulation that can be gleaned from brief laboratory observations. Although multiple temperament theories or frameworks have been proposed, Rothbart’s psychobiological model is generally viewed as the most widely accepted at this time [2]. This approach casts temperament as constitutionally based individual differences in reactivity and self-regulation, with constitutional referring to the relatively enduring biological make-up of the individual, influenced by heredity, maturation, and experience. Reactivity refers to the arousability of emotional, motor, and attentional responses, assessed by threshold, latency, intensity, time to peak intensity, and recovery time of reactions. Self-regulation embodies processes that can serve to modulate reactivity, such as soothability and inhibitory control [3]. Although temperament has often been delineated into three overarching factors of Negative Emotionality, Positive Affectivity/Surgency, and Regulatory Capacity/Orienting, more recent studies emphasize the narrowly defined component scales. 
This shift toward a fine-grained approach is a function of research demonstrating that individual scales belonging to the same overarching factor differentially predict important outcomes (e.g., behavior problems), present with growth trajectories discrepant from the overarching factors, and contribute to temperament profiles in a manner inconsistent with the overarching factor content (i.e., scales that load onto different factors contribute to the same profile and, vice versa, components of the same factor define different profiles/classes) [4-7]. The Infant Behavior Questionnaire-Revised (IBQ-R), designed to provide indicators of infant temperament, comprises 14 fine-grained scales: Activity Level, Smiling/Laughter, Approach, High Intensity Pleasure, Perceptual Sensitivity, Vocal Reactivity, Fear, Distress to Limitations, Sadness, Falling Reactivity, Duration of Orienting, Soothability, Cuddliness/Affiliation, and Low Intensity Pleasure, and is the focus of this investigation.

Development of temperament and age differences

Manifestations of temperament transform over development, with rapid change during infancy [8]. Positive emotionality (e.g., smiling), rarely expressed during the newborn period, is observed more reliably between ages two and three months, and increases in expression throughout the first year of life [8,9]. Levels of activity, approach, distress to limitations, and fear increase throughout the first year of life as well [10-14]. Anger reactions across infancy appear to follow a U-shaped trajectory [12,15]. The decrease in anger responses occurring between 2 and 6 months of age has been linked to greater flexibility in attention shifting [16]. In the second half of the first year, infants are likely to respond with anger when unable to grasp an attractive stimulus that has been placed out of reach, or when a caregiver has removed a forbidden object. Fear generally increases throughout the second half of the first year of life [10,12-14], with inhibition of approach toward novel and/or intense stimuli “coming online” [14,17]. The developmental course of attentional orienting has been described as U-shaped in the first year of life [18]. Carranza and colleagues [12], for example, noted decreases in Duration of Orienting between 6 and 9 months, followed by an increase between 9 and 12 months. Toward the end of the first year, skills associated with the development of the executive attention system may contribute to the flexibility of orienting reactions [19-21]. Infants also gain communication skills rapidly during the first year of life [22,23], and thus exhibit greater vocal reactivity over time. With respect to age/developmental differences discerned via the IBQ-R, older infants obtain higher scores on Approach, Vocal Reactivity, High Intensity Pleasure, Activity Level, Perceptual Sensitivity, Distress to Limitations, and Fear, whereas younger infants’ scores are higher for Low Intensity Pleasure, Cuddliness/Affiliation, and Duration of Orienting [24,25]. 
More recent longitudinal investigations provided further evidence of increases in Fear across the first year of life [5,26], also noting increases in Distress to Limitations and Sadness, albeit not always linear in nature. Falling Reactivity was associated with a quadratic trajectory, with increases followed by declining values later in infancy. Increasing trajectories were noted for attributes associated with Positive Affectivity/Surgency, with trends toward greater Activity Level, Smiling and Laughter, High Intensity Pleasure, Approach, Perceptual Sensitivity, and Vocal Reactivity later in infancy. Growth modeling provided evidence of nonlinear changes in Duration of Orienting, Soothability, Cuddliness, and Low Intensity Pleasure, wherein initial growth in values was followed by decreases later in infancy [5]. These findings are largely consistent with prior research relying on different measurement approaches. Although the data examined in this study are cross-sectional in nature, earlier longitudinal evaluations are informative as their results speak to the importance of age in shaping temperament presentations and, vice versa, to temperament features as predictors of infant age. It should be noted that no study to date has explored the latter, that is, used temperament dimensions to classify infants with respect to their age, likely due to sample size limitations and only recently available methodological advances in empirically based classification techniques.

Gender differences in temperament

Although a number of gender differences in temperament have been reported for older children and adults, fewer exist for children younger than one year of age [8,25,27,28]. Differences in infancy have been limited to activity level and fear/behavioral inhibition. Higher activity level and approach are evident in boys [29,30], with girls exhibiting greater hesitation in approaching novel objects [14,31]. Campbell and Eaton [29] applied meta-analytic procedures to summarize 46 studies addressing activity level in infancy, estimating the size of the gender difference at 0.2 standard deviations based on objective measures (parent-report measures estimated the difference to be smaller). Gender differences in approach-withdrawal have been reported for samples from different countries [30,32-34], with parents rating boys higher in approach. Martin et al. [31] reported a large and significant gender difference for distress to novelty, with 6-month-old girls receiving higher scores. Gender differences also have been documented with the IBQ-R, as boys received higher scores on Activity and High Intensity Pleasure, and girls higher scores on Fear [24,25,35,36]. Infant gender also predicted intercept values of Fear trajectories, with girls demonstrating higher levels at 4 and 6 months [5,26]. Girls also started out at lower values (i.e., intercept estimates) for Activity Level, Approach, and High Intensity Pleasure. Similar to age/developmental differences research, gender-related temperament studies have only compared temperament for boys and girls, not considering gender classification based on temperament features. Importantly, age- and gender-based temperament distinctions have not been considered jointly to discern whether age-related changes inform gender differences.

Present study

In this study, we leveraged IBQ-R data collected across multiple laboratories (N = 4,438) to further investigate age and gender differences in infancy, addressing yet unanswered questions. Specifically, algorithmic modeling techniques were used to discern the extent to which the 14 IBQ-R subscale scores (referred to as features) accurately classified participating children as boys (n = 2,298) or girls (n = 2,093; 47 children were missing gender data) and into three age groups: youngest (< 24 weeks; n = 1,102), mid-range (24 to 48 weeks; n = 2,557), and oldest (> 48 weeks; n = 779), because of previously noted gender-based variability [14,29-34] and significant developmental differences among these age groups (e.g., with respect to brain growth and maturation; [37,38]). This study addresses an important gap in research, being the first to consider temperament attributes as determinants of age and gender groupings, quantifying the extent to which early reactivity and regulation provide the features necessary for accurate prediction. Importantly, this work also allows for simultaneous classification of age and gender categories, providing an opportunity to consider the extent to which gender differences are informed by infant age, and to our knowledge, this is the first study to do so. That is, despite prior demonstrations of reliable age and gender differences in temperament, the two classifications have not been considered jointly in a single investigation to examine whether gender differences are age dependent. Moreover, this effort provides a new direction for infancy and temperament research, serving as a methodological demonstration of machine learning applications, not yet utilized in these areas of scientific inquiry. 
This meta-analytic, data-driven effort is the first to rely on advanced machine learning techniques using temperament features to classify infants into age and gender groups, rather than compare temperament of children who vary in age and gender, and to consider these classifications simultaneously. This cross-laboratory effort also overcomes prior limitations associated with small, non-representative samples, which produced results circumscribed in terms of generalizability.

Materials and methods

Measures

The Infant Behavior Questionnaire-Revised (IBQ-R; [24]). This parent-report measure of temperament was developed for infants between 3 and 12 months of age. The IBQ-R contains 191 items, which yield 14 scales: Activity Level, Smiling/Laughter, Approach, High Intensity Pleasure, Perceptual Sensitivity, Vocal Reactivity (loading onto Positive Affectivity/Surgency); Fear, Distress to Limitations, Sadness, Falling Reactivity (Negative Emotionality); Duration of Orienting, Soothability, Cuddliness/Affiliation, Low Intensity Pleasure (Regulatory Capacity/Orienting). Individual items are rated on a 7-point scale reflecting the frequency of occurrence of the behavior in the past week (two weeks for less frequent events, such as encounters with unfamiliar settings/adults). Reliability of the IBQ-R has been supported for mothers and fathers, as well as samples from different cultures, with Cronbach’s α values ranging from .77 to .96 [39-41]. Evidence supports the predictive and construct validity of IBQ-R scores [42-44]. Cronbach’s α values for the 14 subscales included in the current analysis, derived from 29 datasets, ranged from .74 to .89 (mean α = .82). These temperament features were used to classify children into gender and age categories via machine learning algorithms.
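For readers less familiar with the reliability coefficient reported above, Cronbach’s α for a scale with k items is k/(k − 1) × (1 − sum of item variances / variance of the total score). The following is a minimal Python sketch of that standard formula, not the authors’ code; the function name and the toy ratings are hypothetical and are not IBQ-R data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 7-point ratings: 5 respondents x 3 items from one scale.
toy_ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [5, 6, 6],
    [3, 3, 4],
    [6, 7, 7],
]
alpha = cronbach_alpha(toy_ratings)
print(round(alpha, 3))
```

Values near 1 indicate that the items of a scale vary together across respondents; the sketch assumes complete data with no missing item responses.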

Procedure

Data sets (N = 29) were acquired by emailing researchers who requested the IBQ-R or published research using the instrument between 2006 and 2019. All of the researchers had received approval from their respective Human Research Protection Programs (HRPPs)/Institutional Review Boards (IRBs) prior to initiating data collection and obtained written informed consent. The approving bodies were: Human Studies Committee at the Brigham and Women’s Hospital in Boston, MA and the Icahn School of Medicine at Mount Sinai in New York; IRB at Boston Children’s Hospital; Pennsylvania State University IRB; Rutgers-Newark IRB; Virginia Tech IRB; University of North Carolina at Greensboro IRB; Emory IRB; University of Washington IRB Committee D; Northern Illinois University IRB #1; Brown University HRPP/IRB; IRB of the University of Minnesota’s Human Research Protection Program; University of Michigan Health Sciences and Behavioral Sciences IRB; Health Sciences IRB at University of Wisconsin; Harvard University Committee on the Use of Human Subjects in Research; University of California, Irvine HRPP/IRB; University of Missouri IRB; IRB of Western Kentucky University; University of North Carolina at Chapel Hill IRB; Western Washington University IRB; University of Virginia IRB for the Social and Behavioral Sciences; Wayne State University IRB; and Colorado Multiple IRB. Contributors were asked to provide item-level data from the IBQ-R as well as infant age, gender, and race. For all participants, the IBQ-R was completed by the infant’s mother. See Table 1 for a brief description of the samples.

Table 1

Sample descriptions.

Researcher(s)	Sample Size (N)	Infant Age (Weeks)	Gender (% Male)	Race (% Non-White)	Sample Description
Bosquet & Wright	668	20.23–63.25	53.3	71.1	Community sample of infants
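Gartstein (Study 1)	387	15.00–52.00	50.1	NA	Community sample of infants
Gartstein (Study 2)	143	11.00–51.00	49.7	11.2	Community sample of infants
Gartstein (Study 3)	84	11.00–54.00	94.0	13.1	Community sample of infants
Gartstein (Study 4)	67	24.00–48.00	44.8	10.4	Community sample of infants
Perez-Edgar, Buss, & LoBue (Study 1)	138	16.00–47.20	55.0	26.8	Community sample of infants
Perez-Edgar, Buss, & LoBue (Study 2)	267	12.00–68.18	46.8	41.6	Community sample of infants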
Bell & Calkins	353	20.57–57.00	49.3	23.8	Community sample of healthy infants
Goodman (Study 1)	82	12.00–52.00	62.2	43.9	Community sample of mothers with history of major depression
Goodman (Study 2)	252	12.00–52.00	44.8	27.0	Mothers received treatment for major depression during pregnancy
Spieker	221	22.00–40.00	54.8	81.4	Mothers received mental health treatment during pregnancy
Bridgett	178	16.00–48.00	47.2	29.2	Full term, healthy infants
Salisbury	172	23.00–32.00	51.7	47.7	Prenatal exposure to depression, antidepressants
Mliner & Gunnar	158	48.53–89.20	50.6	60.8	Full term, healthy infants
Muzik	157	23.27–44.40	52.2	43.3	Mothers oversampled for trauma
Stifter	149	24.57–57.29	53.0	8.1	Community sample of full-term infants
Planalp	148	23.00–87.00	48.0	24.3	Community sample of infants
Mehr & Spelke	123	11.71–88.43	59.3	32.5	Community sample of full-term infants
Lukowski	108	39.71–46.14	53.7	38.0	Full term, healthy infants
Groh	91	25.81–42.93	52.2	21.1	Full term, healthy infants
Lickenbrock	80	12.00–35.00	60.0	15.0	Low-risk community sample of infants
Santelli	73	47.57–70.14	47.9	32.9	Vaginally delivered infants exclusively breastfed until 1 month of age
Du Rocher Schudlich	73	24.80–58.80	52.1	16.4	Parents living together since birth of child
Anzman-Frasca	59	51.00–57.00	54.2	11.9	Full term, healthy infants (a portion of the entire sample was included in this study)
Thrasher (Study 1)	12	6.33–8.67	73.0	NA	Full term, healthy infants
Thrasher (Study 2)	28	6.33–9.10	38.0	NA	Full term, healthy infants
Thrasher (Study 3)	20	6.80–9.00	45.0	NA	Full term, healthy infants
Diaz	47	40.00	44.7	23.4	Full term, healthy infants
Dayton	47	16.00–31.00	42.9	35.7	High risk sample of families (e.g., poverty, violence exposure, psychopathology)
Moding	43	26.00–102.00	41.9	34.9	No food allergies, feeding difficulties
Jordan	42	20.00–45.00	31.0	19.0	Full term, healthy infants

Analytic strategy

Descriptive statistics across gender and age groups were computed first (Table 2). We then constructed a model framework allowing us to assess the utility of fine-grained temperament dimensions with respect to gender and age classifications. This framework resulted in a total of five (5) model types, which included: 1) gender: boys vs. girls; 2) age groups: youngest (< 24 weeks) vs. mid-range (24 to 48 weeks) vs. oldest (> 48 weeks) infants; and gender by age group analyses: 3) boys vs. girls in the youngest age group; 4) boys vs. girls in the mid-range age group; 5) boys vs. girls in the oldest age group. Classification of infant gender within age groups allows us to determine if predictive strength of gender-based classification is more accurate for younger vs. older infants.
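The five model types can be made concrete with a short sketch. The Python below is illustrative only and is not the authors’ pipeline: the record keys (age_weeks, gender, features) and function names are hypothetical, the age cut-offs follow the group definitions in the text (24 and 48 weeks both falling in the mid-range group), and keeping records with missing gender for the age model only is an assumption.

```python
def age_group(age_weeks):
    """Map age in weeks to the three groups defined in the text."""
    if age_weeks < 24:
        return "youngest"
    if age_weeks <= 48:
        return "mid-range"
    return "oldest"


def build_tasks(records):
    """Arrange records into the five classification tasks.

    Each record is a dict with hypothetical keys 'age_weeks', 'gender',
    and 'features' (the 14 subscale scores). Every task is a list of
    (features, label) pairs.
    """
    tasks = {
        "gender": [],            # model 1: boys vs. girls
        "age_group": [],         # model 2: three age groups
        "gender_youngest": [],   # model 3
        "gender_mid-range": [],  # model 4
        "gender_oldest": [],     # model 5
    }
    for rec in records:
        group = age_group(rec["age_weeks"])
        tasks["age_group"].append((rec["features"], group))
        if rec.get("gender") is None:
            continue  # assumption: no gender label, so age model only
        tasks["gender"].append((rec["features"], rec["gender"]))
        tasks["gender_" + group].append((rec["features"], rec["gender"]))
    return tasks


# Hypothetical records; feature values are placeholders, not study data.
records = [
    {"age_weeks": 12.0, "gender": "girl", "features": [4.1] * 14},
    {"age_weeks": 24.0, "gender": "boy", "features": [4.3] * 14},
    {"age_weeks": 48.0, "gender": None, "features": [4.0] * 14},
    {"age_weeks": 60.5, "gender": "boy", "features": [4.6] * 14},
]
tasks = build_tasks(records)
print({name: len(rows) for name, rows in tasks.items()})
```

Structuring the within-age gender models as subsets of the overall gender model makes clear that models 3 to 5 are trained on smaller samples than model 1, which is worth bearing in mind when comparing their performance.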

Table 2

Descriptive statistics for the temperament subscales by gender and age group.

Models	Gender						Age Group
	Girls			Boys			Youngest < 24 weeks			Mid-Range 24 to 48 weeks			Oldest > 48 weeks
	Mean	SD	Range	Mean	SD	Range	Mean	SD	Range	Mean	SD	Range	Mean	SD	Range
Activity	4.25	1.11	0.33–6.93	4.29	1.08	0.47–6.80	4.12	0.89	0.53–6.67	4.29	1.21	0.33–6.93	4.43	0.87	0.47–6.87
Approach	4.79	1.39	0.17–7.00	4.84	1.37	0.17–7.00	3.98	1.50	0.17–7.00	5.00	1.23	0.33–7.00	5.55	0.91	1.42–7.00
Smiling/ Laughter	4.61	1.36	0.10–7.00	4.63	1.34	0.10–7.00	4.37	1.15	0.20–7.00	4.63	1.49	0.10–7.00	5.01	0.91	0.70–7.00
High Intensity Pleasure	5.32	1.41	0.09–7.00	5.49	1.38	0.09–7.00	4.98	1.23	0.55–7.00	5.47	1.53	0.09–7.00	5.95	0.74	0.27–7.00
Perceptual Sensitivity	3.27	1.35	0.08–6.83	3.33	1.36	0.17–7.00	2.89	1.25	0.17–7.00	3.38	1.40	0.08–7.00	3.71	1.18	0.42–6.83
Vocal Reactivity	4.42	1.38	0.08–7.00	4.41	1.35	0.17–7.00	3.92	1.10	0.33–7.00	4.43	1.47	0.08–7.00	5.22	0.89	1.00–7.00
Distress to Limitations	3.46	0.90	0.69–6.31	3.56	0.92	0.19–6.38	3.27	0.83	0.19–6.25	3.55	0.91	0.56–6.31	3.71	0.95	0.25–6.38
Fear	2.51	1.07	0.19–6.44	2.28	0.95	0.06–6.69	2.05	0.90	0.31–6.25	2.43	1.02	0.19–6.44	2.74	1.02	0.06–6.69
Falling Reactivity	4.57	1.20	0.23–6.92	4.50	1.19	0.08–7.00	4.63	1.07	1.08–7.00	4.62	1.03	1.15–7.00	4.13	1.67	0.08–6.92
Sadness	2.97	0.98	0.14–6.29	3.03	0.96	0.14–5.79	2.91	0.99	0.36–6.29	3.01	0.98	0.14–6.21	3.10	0.89	0.14–5.79
Cuddliness	5.12	1.11	0.53–7.00	5.08	1.13	0.29–7.00	5.39	1.13	0.76–7.00	5.03	1.12	0.29–7.00	4.87	0.97	0.41–6.82
Duration of Orienting	3.69	1.16	0.17–7.00	3.69	1.13	0.25–7.00	3.62	1.19	0.08–7.00	3.73	1.16	0.17–7.00	3.63	1.01	0.92–6.83
Low Intensity Pleasure	4.79	1.07	0.69–7.00	4.72	1.06	1.23–7.00	4.74	1.12	0.69–7.00	4.82	1.05	1.23–7.00	4.52	0.98	1.77–7.00
Soothability	4.64	1.07	0.50–7.00	4.58	1.12	0.39–7.00	5.39	1.13	0.76–7.00	4.62	1.13	0.50–7.00	4.66	1.09	0.94–7.00

Established machine learning techniques, methodologically rigorous and shown to provide reliable/reproducible results, were used in this study (e.g., [45,46]). Specifically, for all models, we used repeated 10-fold cross-validation partitioning with random assignment: a training dataset including 70% of the sample, and 30% reserved as a hold-out dataset (testing) to evaluate the predictive utility of the trained models. A total of 11 different algorithms were considered for each model type, including: (1) linear discriminant analysis; (2) generalized linear modeling; (3) support vector machines; (4) K-nearest neighbor; (5) naïve Bayes; (6) classification and regression trees; (7) C5.0 classification; (8) bootstrapped aggregated trees; (9) ensembled decision trees (Random Forest; [47,48]); (10) gradient boosting; and (11) multi-class adaptive boosting (AdaBoost). These algorithms were chosen based on their applicability and widespread use in the classification modeling literature [45,46], and in order to achieve the most robust and replicable results discernible across multiple modeling techniques. The aforementioned models were then compared to discern the most effective classification of infant gender and age with temperament features based on misclassification rates, Cohen’s kappa coefficients, and sensitivity and specificity via the area under the curve (AUC) from receiver operating characteristic (ROC) curves, considered as indicators of predictive accuracy. Misclassification provides a simple posterior assessment of model classification based on contingency tables and is often used for initial classification and model accuracy evaluation. Accuracy indicators, reported herein, represent the inverse of misclassification rates (i.e., 1 minus the misclassification rate). Cohen’s kappa coefficient assesses reliability of categorization; it corrects for chance agreement, is normalized, and can range from -1 to 1. 
Kappa values will typically be lower than overall accuracy indicators, as kappa represents a more conservative estimate given its assessment of accuracy compared to random assignment. The area under an ROC curve (area under the curve, or AUC) is a third metric used to evaluate the accuracy of binary classifiers, which encapsulates both Type I and Type II errors [49]. However, ROC-AUC is limited insofar as it does not take predicted probability values and goodness of fit of evaluated models into account. While all three indicators provide unique assessments of classification accuracy, overall misclassification rate (or, inversely, accuracy) is the most broadly used metric for classification evaluation [50]. For all of the model classification indices, higher values (i.e., closer to 1) can be considered superior, indicative of more optimal performance.
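The three indicators described above can be illustrated on a small hypothetical example. The Python sketch below uses the standard textbook definitions (accuracy as the proportion of correct labels, Cohen’s kappa as chance-corrected agreement, and binary AUC as the proportion of positive-negative pairs ranked correctly); it is not the software used in the study, and the labels and scores are invented for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Proportion of correct labels (1 minus the misclassification rate)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Observed agreement corrected for agreement expected by chance.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


def binary_auc(y_true, scores, positive):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half; only the ordering of the scores matters.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical hold-out labels and predicted probabilities of "boy".
y_true = ["boy", "boy", "boy", "boy", "girl", "girl", "girl", "girl"]
p_boy = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.55]
y_pred = ["boy" if p >= 0.5 else "girl" for p in p_boy]

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
auc = binary_auc(y_true, p_boy, positive="boy")
print(acc, kappa, auc)
```

In this toy case accuracy is .625 while kappa is .25, showing why kappa is the more conservative figure when chance agreement is high. Because the AUC computation uses only the ordering of the scores, it is unchanged by any rescaling of the predicted probabilities that preserves their order.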

Results

Overall, classification accuracy was superior for age relative to gender categories, based on misclassification rates (i.e., accuracy indicators), Kappa, and area under the curve (AUC) indicators (Table 3A).

Table 3A

Classification effectiveness indicators across machine learning algorithms: Gender and age-based classification with temperament features.

	Gender Classification: boys vs. girls			Age Classification: youngest (age < 24 weeks) vs. mid-range (age 24 to 48 weeks) vs. oldest (age > 48 weeks)
Models	Accuracy	Kappa	AUC	Accuracy	Kappa	AUC*
Linear Discriminant Analysis	.558	.162	.422	.641	.284	.517
Generalized Linear Modeling	.569	.153	.485	.630	.295	.526
Support Vector Machines	.559	.169	.432	.637	.308	.517
K-Nearest Neighbor	.556	.084	.471	.650	.271	.529
Naïve Bayes	.577	.094	.451	.634	.272	.512
Classification and Regression Trees	.565	.099	.424	.645	.240	.514
C5.0 Classification	.575	.099	.422	.625	.272	.538
Bootstrapped Aggregated Trees	.580	.099	.422	.640	.274	.535
Ensembled Decision Trees (Random Forest)	.580	.133	.485	.641	.289	.535
Gradient Boosting	.556	.157	.432	.631	.306	.522
Multi-class Adaptive Boosting (AdaBoost)	.558	.141	.471	.641	.241	.517

*AUC for Age Classification analysis represents a multiclass ROC indicator, based on 3 groups.

Specifically, across all algorithmic models, age-based classification outperformed gender-based classification for all classification outcomes. Gender classification was performed within the three infant age groups next (Table 3B), with classification effectiveness for gender generally superior in the oldest age group (> 48 weeks). That is, oldest age group classification models consistently outperformed others based on the AUC, and this was the case for the majority of classification algorithms with respect to accuracy and Kappa indicators. Next, we focused on the AUC, especially informative in capturing differences for gender classification models across age groups because of its longstanding widespread use for comparative purposes in the machine learning classification literature [51] and visualization capabilities (Figs 1–3). AUC gender classification indicators were superior for the oldest age group, yielding higher values across different algorithmic models, illustrated in Fig 3.

Fig 1

Note: lda—Linear Discriminant Analysis; glm—Generalized Linear Modeling; svm—Support Vector Machines; knn—K-Nearest Neighbor; nb—Naïve Bayes; cart—Classification and Regression Trees; c50—C5.0 Classification; treebag—Bootstrapped Aggregated Trees; rf—Ensembled Decision Trees (Random Forest); gbm—Gradient Boosting Method; adabag—Multi-class Adaptive Boosting (AdaBoost).

Fig 3

Discussion

We set out to leverage existing IBQ-R datasets from multiple laboratories (N = 4,438) to address an important gap in research by investigating age and gender classifications in early childhood, and overcoming limitations of the published studies such as small sample sizes that cannot be considered representative or provide widely generalizable results. Relying on algorithmic modeling techniques, 14 IBQ-R subscale scores served as features used to classify participating children as boys (n = 2,298) and girls (n = 2,093), and into three age groups: youngest (< 24 weeks; n = 1,102), mid-range (24 to 48 weeks; n = 2,557), and oldest (> 48 weeks; n = 779). Importantly, this approach allowed us to simultaneously classify infants into age and gender categories, providing an opportunity for the first time to consider the extent to which gender differences are informed by infant age. This study also makes an important contribution to the literature as a novel methodological demonstration. That is, the present application of machine learning algorithms provides a new direction for infancy and temperament research, as well as meta-analytic investigations more broadly. Results based on accuracy indicators (the inverse of misclassification rates), Cohen’s kappa coefficients, and AUC (incorporating sensitivity and specificity parameters) demonstrated that temperament features provided superior classification of age groups relative to gender, which is consistent with the existing literature insofar as age effects have generally been more robust (e.g., not dependent on methodology; [5,26,52]). As noted, gender differences in infancy have been largely limited to activity level and fear/behavioral inhibition, with higher activity level and approach reported for boys [29,30] and greater fear/behavioral inhibition for girls [14,25,31,35,36]. 
These gender differences are somewhat controversial due to a lack of consensus regarding their origin (i.e., biologically based or largely a function of socialization; [53]) and questions regarding the role of parental expectations. That is, parents could rate boys and girls differently not due to actual variability in behavior but as a function of their own culturally influenced ideas about what is typical behavior in boys vs. girls. This explanation cannot be ruled out completely, although existing research suggests that gender differences are not entirely dependent on methodology (i.e., have been identified via behavioral observations along with parent report; [33,52]). Importantly, results for gender classification within age groups suggest that classification is most effective for the oldest age group, in line with the literature that indicates gender differences in temperament attributes become more pronounced with age [54]. Although a number of factors could be contributing to this pattern of results (accentuated gender differences in temperament with increasing age and, conversely, more accurate classification of gender with temperament features for the oldest participants), socialization is often described as critical among these. The primary mechanism invoked in such explanations involves the infants’ interactional history, and is consistent with literature that indicates mothers respond differently to their sons and daughters [55-59], presenting with different affordances as social interaction partners (e.g., [60]). Over time, such differences could result in divergent trajectories with respect to temperament due to differences in socialization goals/approaches for boys vs. girls. Specifically, parents may prioritize relationship orientation for daughters, but competence and autonomy for sons [61-63]. These and other socialization-related pathways may be responsible for the stronger temperament-based classification of boys and girls later in infancy observed herein. 
At the same time, gender is viewed as a marker for a host of sex-linked distinctions in physiological processes. For example, prenatal exposure to high levels of androgen is predictive of later behavior problems, primarily of the externalizing type (e.g., ADHD; [64]), and used to explain early vulnerability observed in boys with respect to this set of problems [65]. Postpartum biological effects are also possible, for example via testosterone increases for boys in infancy, referred to as “mini-puberty,” peaking by the second month and returning to baseline at about 6 months [66]. Sex-linked differentiation in brain structures and functions occurs with maturation, resulting in greater discrepancies with age. For example, Goldstein et al. [67] reported that the amygdala tends to be larger in males and the hippocampus larger in females (see Hines [68] for a related review). Follow-up analyses outlining feature importance for classification models were performed for the Ensembled Decision Trees (Random Forest) to aid interpretation of the observed results. Random Forest methods provide an effective mechanism for feature selection and importance using tree-based mechanisms to rank node classification via the mean decrease in Gini impurity, i.e., the probability that a random sample in a particular tree node would be mislabeled using the distribution of the node sample, averaged across all trees [69]. Figures provided in Supplemental Materials (S1–S3 Figs) demonstrate that while Fear was the most important feature in distinguishing boys and girls for the youngest and mid-range age groups, for the oldest infants, Low Intensity Pleasure was most influential. In fact, for youngest infants (S3 Fig), all three distress-related scales (Fear, Distress to Limitations, Sadness) were of primary importance in classifying infants accurately by gender via the Random Forest algorithm. 
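To make the Gini-based importance measure concrete, the Python sketch below computes the impurity of a node and the decrease in impurity produced by a single split, which is the quantity that tree ensembles average across splits and trees to rank features. It follows the standard definition and is not the authors’ implementation; the node contents are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Probability that a random sample from the node would be mislabeled
    if labeled at random according to the node's class distribution."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity
    of its two child nodes."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node with 4 boys and 4 girls, split on one feature threshold.
parent = ["boy"] * 4 + ["girl"] * 4
left = ["boy", "boy", "boy", "girl"]      # e.g., feature score below threshold
right = ["boy", "girl", "girl", "girl"]   # e.g., feature score at or above it

parent_gini = gini_impurity(parent)
decrease = gini_decrease(parent, left, right)
print(parent_gini, decrease)
```

A feature whose splits produce larger decreases, averaged over the forest, receives a higher importance ranking.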
Positive emotionality and regulatory dimensions of temperament (e.g., Falling Reactivity, Approach) begin to take on greater importance for mid-range and oldest infants. Notably, certain temperament features detracted from model accuracy in classifying infants by gender (i.e., were associated with the lowest negative importance values), particularly Cuddliness, Vocal Reactivity, and Smiling and Laughter in the youngest age group and Smiling and Laughter, Perceptual Sensitivity, and Activity in the oldest age group. These results identify the temperament attributes that did not differentiate boys and girls effectively, and it is of interest that the list of these poorly differentiating features varied by age. When the most important features were considered for the age classification and gender classification models alone, Fear again emerged as the critical dimension, which is in line with the extensive literature documenting the developmental progression as well as gender differences for this domain of temperament [2,13,14,26,54].

This work is not without limitations, chief among them our reliance on a single method (i.e., parent report) in the assessment of infant temperament. Future studies should aggregate datasets providing different sources of information, including behavioral observations and physiological measures, such as cortisol reactivity, heart rate variability/respiratory sinus arrhythmia, and/or frontal alpha asymmetry ascertained via electroencephalogram (EEG) recordings. In addition, the outcomes examined in this study were limited to child gender and age. Future studies with older children should conduct classification analyses with additional dependent variables, particularly symptom and disorder classifications (e.g., clinical/subclinical/asymptomatic ADHD).
It should be noted that we did not consider classification based on race/ethnicity because the literature suggesting that these differences can be discerned on the basis of temperament is far more limited; future research should examine related models as relevant studies accumulate. Finally, the present modeling approach could be extended and potentially improved by applying ensemble modeling approaches (i.e., using multiple algorithms simultaneously), as opposed to relying on singular modeling frameworks.

This study underscores the importance of meta-analytic investigations and cross-laboratory collaborations, providing answers to elusive questions, such as those related to intersections of gender and age in temperament development, that have not been previously addressed. Because of the large cross-laboratory sample included herein, this study provides optimal comparative data for the IBQ-R (Table 2), which has emerged as a widely used infant temperament assessment tool. Importantly, the present investigation serves as a methodological illustration for application of machine learning techniques in infancy and temperament research, as well as developmental science more broadly. Given the propensity for differing algorithmic methods to have strengths and weaknesses that may bias predictive outcomes and classification accuracy, we selected 11 established algorithmic modeling and classification techniques to quantify the most robust outcomes, simultaneously demonstrating the viability of machine learning approaches in this area of scientific inquiry. Results of this study make an important contribution to developmental temperament research, demonstrating effective age group classification on the basis of fine-grained temperament features, and indicating more effective gender classification for the oldest age group, with multiple implications for future mechanistic research examining potential socialization and biological contributors.

Note.

DL–distress to limitations; Sad–sadness; PS–perceptual sensitivity; App–approach; Fall–falling reactivity; DO–duration of orienting; HP–high intensity pleasure; LP–low intensity pleasure; Act–activity level; Sooth–soothability; SL–smiling and laughter; VR–vocal reactivity; Cud–cuddliness. (TIF)

Fall–falling reactivity; HP–high intensity pleasure; LP–low intensity pleasure; Sad–sadness; VR–vocal reactivity; App–approach; DL–distress to limitations; SL–smiling and laughter; PS–perceptual sensitivity; DO–duration of orienting; Sooth–soothability; Cud–cuddliness; Act–activity level. (TIF)

LP–low intensity pleasure; App–approach; VR–vocal reactivity; Fall–falling reactivity; Sad–sadness; DL–distress to limitations; Cud–cuddliness; DO–duration of orienting; Sooth–soothability; HP–high intensity pleasure; Act–activity level; PS–perceptual sensitivity; SL–smiling and laughter. (TIF)

4 Nov 2021

PONE-D-21-33028

Using Machine Learning to Understand Age and Gender Classification Based on Infant Temperament

PLOS ONE

Dear Dr. Gartstein,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The idea of the paper is good and has potential to create new knowledge in that research area.

Authors should provide a clearer explanation of the motivation of the study, the research problem, and the contribution of this research.

Two reviewers provided their comments to improve the quality of the paper. Please submit your revised manuscript by Dec 19 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Siuly Siuly, PhD Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. 
We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section 3. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. In your revised cover letter, please address the following prompts: a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent. b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. We will update your Data Availability statement on your behalf to reflect the information you provide. 4. Please include your full ethics statement in the ‘Methods’ section of your manuscript file. In your statement, please include the full name of the IRB or ethics committee who approved or waived your study, as well as whether or not you obtained informed written or verbal consent. 
If consent was waived for your study, please include this information in your statement as well. 5. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. 
participant privacy or use of data from a third party—those must be specified. Reviewer #1: No Reviewer #2: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: By applying machine learning techniques, this paper analyzed the temperament features of infants and classified these features into gender and age categories. It is a significant improvement of traditional analytical investigation and a very meaningful research area for young children under 12 months, yet a few comments. (1) In the section "Analytic Strategy" line 254, it mentioned the "gender by age group analyses". However, there is no Table or Figure in the paper to further elucidate these model types. Please try to provide supplemental information or move this group to the "Discussion" section as future work. (2) For Table 3a, "Gender and age-based classification with temperament features", three metrics, i.g. Accuracy, Kappa, AUC, are applied to evaluate each of 11 machine learning algorithms. I hesitate to ask, what features do you use to do Gender Classification and Age Classification? I thought the features were those 14 models listed in Table 2, line 294. If not, please further explain. If yes, how to integrate 14 features' results into one metric? 
Reviewer #2: This manuscript represents a meta-analysis utilizing IBQ-R data collected across multiple laboratories to overcome the limitations of smaller samples in elucidating links among temperament, age, and gender in early childhood. This paper is generally well written, logical and discusses a hot topic. The results also present a good effectiveness. I think this paper can be accepted. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

28 Dec 2021

November 14, 2021

Dr. Siuly, Academic Editor
PLOS ONE

Dear Dr. Siuly,

Thank you for the opportunity to revise and resubmit our manuscript, entitled: “Using Machine Learning to Understand Age and Gender Classification Based on Infant Temperament.” We appreciate your appraisal of the idea behind our paper as good, with potential to create new knowledge in the research area. In this revision, we provide a clearer explanation regarding the motivation behind the study, the research problem addressed, and the contribution of this research in the Introduction and Discussion sections (pgs. 10, 22, 26, 27).

With respect to data sharing, there are ethical/legal restrictions on sharing the de-identified data set used in this study imposed by contributors’ institutions (e.g., Institutional Review Board, Human Research Protection Program, Office of Research). In fact, the first author and her institution had to enter into Memorandum of Understanding agreements with a number of contributing sites in order to obtain the relevant data. These types of arrangements can be considered upon request to access data to the first and corresponding author: Maria A. Gartstein (gartstma@wsu.edu). I would also like to note that the ethical guidelines of the American Psychological Association were closely followed in conducting research presented herein. We have included the full names of the ethics committees that approved data collection across the 29 sites, indicating that all co-authors/investigators obtained written informed consent. Reviewers’ comments and recommendations are addressed in detail below.

Reviewer #1: By applying machine learning techniques, this paper analyzed the temperament features of infants and classified these features into gender and age categories. It is a significant improvement of traditional analytical investigation and a very meaningful research area for young children under 12 months, yet a few comments.

(1) In the section "Analytic Strategy" line 254, it mentioned the "gender by age group analyses". However, there is no Table or Figure in the paper to further elucidate these model types. Please try to provide supplemental information or move this group to the "Discussion" section as future work.

####################################################################
We apologize if this element of the results was confusing, as Table 3b in fact presents gender by age group findings, one component of which (AUC indicators across considered algorithmic models) is also illustrated in Figure 1a-1c. We have clarified the latter for the reader in the revision, as requested (pgs. 19-20).
####################################################################

(2) For Table 3a, "Gender and age-based classification with temperament features", three metrics, e.g. Accuracy, Kappa, AUC, are applied to evaluate each of 11 machine learning algorithms. I hesitate to ask, what features do you use to do Gender Classification and Age Classification? I thought the features were those 14 models listed in Table 2, line 294. If not, please further explain. If yes, how to integrate 14 features' results into one metric?

####################################################################
The reviewer is correct insofar as the features are the 14 temperament scales, namely: Activity Level, Smiling/Laughter, Approach, High Intensity Pleasure, Perceptual Sensitivity, Vocal Reactivity, Fear, Distress to Limitations, Sadness, Falling Reactivity, Duration of Orienting, Soothability, Cuddliness/Affiliation, and Low Intensity Pleasure. On the other hand, Accuracy, Kappa, AUC, are indicators used to evaluate the predictive accuracy of the 11 machine learning algorithms considered in this study that rely on the 14 temperament features for gender, age, and gender by age classifications. In the revision, we have clarified related language (pgs. 10, 16, 22), increasing clarity according to this recommendation.
####################################################################

Reviewer #2: This manuscript represents a meta-analysis utilizing IBQ-R data collected across multiple laboratories to overcome the limitations of smaller samples in elucidating links among temperament, age, and gender in early childhood. This paper is generally well written, logical and discusses a hot topic. The results also present a good effectiveness. I think this paper can be accepted.

####################################################################
We thank the reviewer for this positive view of our manuscript.
####################################################################

We hope that you and the reviewers find this manuscript worthy of publication in PLOS ONE.

Submitted filename: PONE-D-21-33028R1_RespLet_FIN.docx

14 Mar 2022

Using Machine Learning to Understand Age and Gender Classification Based on Infant Temperament

PONE-D-21-33028R1

Dear Dr. Gartstein,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Siuly Siuly, PhD Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed Reviewer #2: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). 
The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: In this revised version, major changes are made in Introduction, Materials and Methods, and Discussion. (a) In the Introduction session, the disadvantage of existing classification method is elaborated. (b) In the Materials and Methods session, detailed researcher list is given. (c) In the Discussion session, contributions of this paper is emphasized and the ensembling modeling approach is mentioned as the potential improvement in further work. Other typos are clear in this version. Good job. Reviewer #2: This manuscript represents a meta-analysis utilizing IBQ-R data collected across multiple laboratories to overcome the limitations of smaller samples in elucidating links among temperament, age, and gender in early childhood. This paper is generally well written, logical and discusses a hot topic. 
The results also present a good effectiveness. I think this paper can be accepted. ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No 21 Mar 2022 PONE-D-21-33028R1 Using Machine Learning to Understand Age and Gender Classification Based on Infant Temperament Dear Dr. Gartstein: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Siuly Siuly Academic Editor PLOS ONE

Table 3B

Classification effectiveness indicators across machine learning algorithms: Gender by age with temperament features.

	Age Group 1 (< 24 weeks; n = 1,102)			Age Group 2 (24 to 48 weeks; n = 2,557)			Age Group 3 (> 48 weeks; n = 779)
Models	Accuracy	Kappa	AUC	Accuracy	Kappa	AUC	Accuracy	Kappa	AUC
Linear Discriminant Analysis	.563	.164	.404	.557	.148	.429	.527	.152	.452
Generalized Linear Modeling	.549	.154	.407	.551	.147	.436	.574	.112	.501
Support Vector Machines	.530	.185	.439	.559	.130	.463	.608	.093	.525
K-Nearest Neighbor	.569	.066	.427	.558	.098	.450	.589	.138	.570
Naïve Bayes	.594	.117	.455	.556	.087	.436	.572	.194	.542
Classification and Regression Trees	.536	.075	.437	.548	.075	.471	.546	.133	.536
C5.0 Classification	.567	.087	.457	.573	.112	.436	.571	.159	.487
Bootstrapped Aggregated Trees	.572	.092	.410	.568	.060	.422	.618	.093	.565
Ensembled Decision Trees (Random Forest)	.577	.105	.386	.559	.109	.451	.584	.138	.552
Gradient Boosting Method	.540	.123	.395	.567	.155	.405	.540	.214	.576
Multi-class Adaptive Boosting (AdaBoost)	.563	.119	.404	.557	.131	.429	.527	.100	.452
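
The three indicators reported in Table 3B (Accuracy, Kappa, AUC) can be computed from a model's predictions roughly as sketched below. This is an illustration under stated assumptions rather than the authors' pipeline: the labels and scores are invented toy values, the function names are hypothetical, Kappa is taken to be Cohen's kappa, and AUC is computed with the rank-based (Mann-Whitney) formulation for a two-class problem.

```python
def accuracy(y_true, y_pred):
    """Share of cases whose predicted class matches the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement between predictions and observations, corrected for the
    agreement expected by chance from the marginal class frequencies."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_observed = accuracy(y_true, y_pred)
    p_chance = sum(
        (y_true.count(k) / n) * (y_pred.count(k) / n) for k in labels
    )
    if p_chance == 1.0:  # degenerate case: only one class present
        return 0.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def auc(y_true, scores, positive):
    """Probability that a randomly chosen positive case receives a higher
    score than a randomly chosen negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


# Invented toy data: observed gender and a model's score for "girl".
y_true = ["girl", "girl", "girl", "boy", "boy", "boy"]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
y_pred = ["girl" if s >= 0.5 else "boy" for s in scores]

print(round(accuracy(y_true, y_pred), 3))
print(round(cohen_kappa(y_true, y_pred), 3))
print(round(auc(y_true, scores, "girl"), 3))
```

Accuracy and Kappa depend on the hard class assignments, whereas AUC depends only on how the scores rank the cases, which is why the three indicators can disagree about which algorithm performs best.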

