
Could early tweet counts predict later citation counts? A gender study in Life Sciences and Biomedicine (2014-2016).

Abstract

This study investigated whether early tweet counts could differentially benefit female and male (first, last) authors in terms of the later citation counts received. The data comprised 47,961 articles in the research area of Life Sciences & Biomedicine from 2014-2016, retrieved from Web of Science's Medline. For each article, the number of received citations per year was downloaded from WOS, while the number of received tweets per year was obtained from PlumX. Using a hurdle regression model, I compared the number of citations received by female and male (first, last) authored papers and then investigated whether early tweet counts could predict the later citation counts received by female and male (first, last) authored papers. In the regression models, I controlled for several important factors that were investigated in previous research in relation to citation counts, gender or altmetrics. These included journal impact (SNIP), number of authors, open access, research funding, topic of an article, international collaboration, lay summary, F1000 score and mega journal. The findings showed that the percentage of papers with male authors in first or last authorship positions was higher than that for female authors. However, female first and last-authored papers had a small but significant citation advantage of 4.7% and 5.5%, respectively, compared to male-authored papers. The findings also showed that, irrespective of whether the factors were included in the regression models or not, early tweet counts had a weak positive and significant association with later citation counts (3.3%) and with the probability of a paper being cited (21.1%). Regarding gender, the findings showed that when all variables were controlled, female (first, last) authored papers had a small citation advantage of 3.7% and 4.2%, respectively, in comparison to male-authored papers for the same number of tweets.


Year: 2020 PMID: 33137147 PMCID: PMC7605688 DOI: 10.1371/journal.pone.0241723

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

According to statistics provided by the US National Science Foundation [1], women received over half of the bachelor’s, master’s and doctorate degrees awarded in biological sciences in 2016. Furthermore, according to the Gender in the Global Research Landscape Report [2], the proportion of women amongst researchers in health and life sciences between 2011–2015 was overall higher than that of men. Despite this parity in degree recipients and in the number of researchers, women remain underrepresented among tenure-track biomedical faculty at research institutions [3], as well as in the faculties of medicine and life sciences and in senior positions [4]. In 2016 in the EU, women represented 27% of grade A academic staff in health and medical sciences [5]. In the United States, in 2014, women composed 38% of the full-time academic medicine workforce, while men made up 62%. Additionally, only 21% of full professors and just 15% of department chairs were female, compared with 79% and 85% for men in the same positions [6]. Beyond the gender imbalance in senior positions and amongst tenure-track faculty members, some studies have also reported a citation advantage for male-last authored papers in biomedical and life sciences fields [7-9]. However, others did not find notable differences between male and female-last authored papers, or between female and male authors, in terms of the number of received citations [10, 11]. Typically, in biomedicine, the author listed first has less experience and does most of the research work, while the author listed last has more experience and provides a supervisory role [12, 13]. Given these gender differences in the number of women in senior positions and in the number of citations, some studies have also sought to examine whether the web might provide a democratizing space for female academics [14, 15].
Many social media sites have provided new opportunities for both female and male scholars to disseminate and promote their research results within and beyond the scientific community [15, 16]. Twitter is one such platform, and it is fundamentally reshaping the way biomedical scientists and academic physicians can discover, discuss and share research across disciplinary boundaries, as well as with the public. Twitter allows conversations about new papers to happen immediately and publicly [15, 17]. It also allows authors to push their research out via Twitter, rather than hoping that it is pulled in by readers. This gives scholars the potential to draw wide attention to their research [15]. Twitter might also reduce the influence of hierarchies based on seniority, because people who do not have tenure, have a limited number of publications or are early in their career can demonstrate their expertise there [17]. Shifting to the push method on social media might also potentially reduce gendered gatekeeping in the dissemination of research [15]. For example, some studies on social media and gender have found that females had a higher visibility in terms of Web citations [18], average Mendeley readers [19, 20], profile views on Academia.Edu in certain disciplines [21], or event counts from Twitter [14, 20], blogs, and news [14]. Others found similar visibility for both female and male scholars in blogs, news, Facebook, or LinkedIn [20]. Regarding the relation between tweet counts and the number of received citations, studies generally tend to suggest a weak positive correlation [15, 22]. Some studies also suggested that tweet counts could predict the later number of received citations [23] or correlate with later downloads and citations for arXiv preprints [24]. This study follows in the same vein.
However, it extends this line of research on the prediction of later citation counts by early tweets by comparing female and male (first, last) authored papers while controlling for several important factors that, according to previous research, have an association with citation counts. Most of these factors have also been examined in gender or altmetrics studies. In relation to citations and gender, these included factors such as journal impact [7, 25], citations and self-citations of first and last authors [25], topic of an article (measured as MeSH) [8, 26], number of MeSH topics [13], gender of first and last authors [9], and the total number of authors’ publications [13, 27]. In relation to citations, these included open access (OA) [28], mega journal [29, 30] and number of topics of an article [31], amongst others. Regarding altmetrics and citations, factors such as abstract readability [32], international collaboration (measured as number of countries) [32, 33], title length [34], OA [35, 36], mega journal, journal impact (measured as SNIP) and lay summary [36] have been studied. Additionally, by studying both first and last authorship positions, I was able to control for the effect of seniority in terms of citations and tweet counts. Thus, this study aims to investigate whether early tweet counts could differentially benefit female and male scholars in the field of Life Sciences and Biomedicine, in terms of the later citation counts received per paper. The study has the following two objectives: i) to compare the number of citations received by female and male last/first authored papers, when controlling for a number of important factors; and ii) to investigate whether (and to what extent) early tweet counts can predict the later number of citations received by female and male last/first authored papers, when controlling for several important factors.

Materials and methods

Data collection and processing

The data for this study comprised 47,961 articles in the research area of Life Sciences & Biomedicine from 2014–2016, retrieved from Web of Science’s Medline in June 2020. This time period was chosen to give the documents the required two-year period after the publication year to receive tweets, which could then be used to predict later citation counts. Furthermore, it ensured that the documents would have had enough time (a citation window of at least three years) to be cited following that two-year period. For each article, the number of received citations per year was downloaded from WOS, while the tweets were downloaded from PlumX, using a combination of DOI and PubMed ID. As the number of tweets per year is not currently available in PlumX, I obtained the date of the tweets for each article and then aggregated the number of citations and tweets per year using the methodology applied in Thelwall and Nevill’s study [37] (see Table 1).

Table 1

The aggregation method for early tweet counts and later citation counts for each year.

Year	Number of articles	Early number of tweets	Later citation counts
2014	16,630	Sum of tweet counts for 2014 and 2015	Sum of citation counts for 2016–2020
2015	16,404	Sum of tweet counts for 2015 and 2016	Sum of citation counts for 2017–2020
2016	14,927	Sum of tweet counts for 2016 and 2017	Sum of citation counts for 2018–2020
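
The windows in Table 1 can be illustrated with a minimal Python sketch. It assumes that each article comes with a list of tweet years and a year-to-citations mapping; the function names and input shapes are hypothetical and are not taken from the study's own processing scripts.

```python
from collections import Counter

# Citations were collected up to 2020 (see Table 1).
LAST_CITATION_YEAR = 2020


def early_tweet_count(pub_year, tweet_years):
    """Sum the tweets posted in the publication year and the following year."""
    per_year = Counter(tweet_years)
    return per_year[pub_year] + per_year[pub_year + 1]


def later_citation_count(pub_year, citations_per_year):
    """Sum the citations received from two years after publication up to 2020."""
    return sum(
        n
        for year, n in citations_per_year.items()
        if pub_year + 2 <= year <= LAST_CITATION_YEAR
    )


# Hypothetical article published in 2014.
tweets = [2014, 2014, 2015, 2017]
citations = {2014: 1, 2015: 2, 2016: 4, 2019: 3, 2020: 1}
print(early_tweet_count(2014, tweets))        # 3 (tweets from 2014 and 2015)
print(later_citation_count(2014, citations))  # 8 (citations from 2016-2020)
```

Keeping the two windows disjoint is what allows the early tweet counts to be treated as a predictor of citations that arrive later.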

After aggregation, of the 47,961 articles, 2,496 had zero citations and 24,190 had zero tweets. Fig 1 shows the distribution of early tweet counts versus later citation counts.

Fig 1

The scatter plot of early tweet and later citation counts.

OA status of the articles was obtained from Unpaywall.org in June 2020. To determine the gender of first and last authors, Gender API (https://gender-api.com/) was used. This service supports searches by first name, including names with two parts. The results provide the gender (male, female, or unknown), the number of names used to determine the gender, and the accuracy [38]. Where a name was gender-neutral, unknown or given only as initials, or where the accuracy was lower than 80%, the name was checked manually using internet searches. The gender of authors in 12 authorship positions remained unidentified. In the regression models (explained in the Data analysis and procedures section) they were regarded as missing values. These 12 authorships accounted for seven first and five last authorship positions. The reason for choosing these two authorship positions was that in the field of biomedicine, the last position in the author list is reserved for senior authors, whereas the first author position is for the person who fulfils the International Committee of Medical Journal Editors (ICMJE) authorship criteria to the highest level and performs the majority of the experimental and clinical work [12, 39]. The total numbers of publications, citations and self-citations for each author (first, last), used as indicators of the scientific (professional) age of an author [13, 25], were downloaded from the SciVal API using authors’ IDs in June 2020. To do this, the author IDs for first and last authors were first downloaded via the Scopus author API using a combination of the DOI and PubMed ID of each article. Then, the SciVal Author Lookup API (https://dev.elsevier.com/documentation/SciValAuthorAPI.wadl) was used to download the three aforementioned indicators. Regarding mega journals, I used the journal list provided by Spezi et al.’s study [40] to determine whether a journal was a mega journal or not.
A mega journal is a peer-reviewed academic open access journal that publishes manuscripts that present scientifically trustworthy empirical results without asking about the potential scientific contribution prior to publication. It covers a broad range of subject areas and uses article processing charges to cover the costs of publishing [29, 40]. For each article, the paper length was considered as the absolute number of pages of a publication, while the title length was calculated by counting the number of characters in the title of an article. As this study was conducted in the area of Life Sciences and Biomedicine, MeSH categories were an appropriate subject classification to consider. Medline assigns articles to 14 broad MeSH categories. In this article, only the seven most relevant medical topics were considered for evaluation. These topic categories were Anatomy, Organisms, Diseases, ‘Chemicals and Drugs’, ‘Analytical, Diagnostic and Therapeutic Techniques and Equipment’, ‘Psychiatry and Psychology’ and ‘Health Care’. For each of these seven MeSH categories, a dummy variable was created and entered as a covariate in the regression models. The articles that did not belong to any of these seven MeSH categories were tagged with a 0. Lay summaries can help journals in life sciences and biomedicine to reach out to patients and others who might benefit from the research [41]. Thus, they may assist the diffusion of research on social media platforms such as Twitter. Using a journal list provided by Shailes [41], the articles were divided into two groups, those with and those without a lay summary. For abstract readability, the Flesch Reading Ease Score was used, as it is the most commonly used measure of text readability and has been used in other bibliometric studies [30]. The R quanteda package was used to calculate this score for each abstract. The highest possible score is 121.22 and there is no lower limit. Very complicated sentences can have negative scores. The higher the score, the easier the text is to understand.
The F1000 score, an altmetric indicator, was included in the models as a control variable. The rationale for this is that the articles scored in F1000 are recommended as highly important works in the fields of life sciences, health and physical sciences [42]. Table 2 shows all the variables studied in this paper, categorized as dependent variables, independent variables and covariates. It also provides a short description of these variables and how they are measured.

Table 2

Dependent variables, independent variables and covariates for the hurdle models.

Variable type	Name	Measure
Dependent	Later citation counts¹	The number of received citations after the two first years of publication
Dependent	Total number of citations²	The total number of citations received by an article since its publications.
Independent and Covariate	Early tweet counts	The number of tweets in the first two years of publication
	Gender	Gender of first and last author on an article: Male (0); Female (1)
	Number of authors	Number of authors collaborating in an article
	Funding	Funded article (1); not-funded article (0)
	SNIP	Source (journal) Normalized Impact per Paper.
	International collaboration	Number of countries collaborating in an article.
	Number of MeSH topics	Number of MeSH headings assigned to an article
	MeSH category	Seven MeSH categories assigned to each article as listed below:MeSH1: Anatomy (1); Otherwise³ (0)MeSH2: Organisms (1); Otherwise (0)MeSH3: Diseases (1); Otherwise (0)MeSH4: Chemicals and Drugs (1); Otherwise (0)MeSH5: Analytical, Diagnostic and Therapeutic Techniques and Equipment (1); Otherwise (0)MeSH6: Psychiatry and Psychology (1); Otherwise (0)MeSH7: Health Care (1); Otherwise (0)
	Title length	Number of characters in the title of an article.
	Lay summary	Articles from journals including lay summaries listed in Shailes list⁴ (1); other journals (0)
	Abstract readability	Flesch readability score of the abstract.
	F1000 score	The score was obtained from Altmetrics.com public API.
	Mega journal	Mega Journal (1); non- Mega journal (0)
	OA	OA (1); non-OA articles (0)
	Paper length	The absolute number of pages of a publication derived from the beginning and end page of a document.
	Total number of publications, self-citations and citations for first and last authors	These values were downloaded from SciVal API for first and last authors using their authors’ IDs.

1. Dependent variable in the tweet-citation regression analysis (Models 2, 3).

2. Dependent variable in citation analysis (Model 1).

3. By otherwise I mean the other 13 MeSH categories.

4. https://elifesciences.org/articles/25411?utm_source=content_alert&utm_medium=email&utm_content=fulltext&utm_campaign=elife-alerts.
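
The abstract readability covariate in Table 2 is based on the Flesch Reading Ease Score. As a hedged illustration, the standard formula can be sketched in Python as below. The word, sentence and syllable counts are assumed to be supplied by the caller; this is not the quanteda implementation used in the study, and the function name is hypothetical.

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Upper bound: one-word sentences made of one-syllable words.
print(round(flesch_reading_ease(1, 1, 1), 2))  # 121.22

# A single long sentence of polysyllabic words gives a negative score.
print(flesch_reading_ease(60, 1, 150) < 0)     # True
```

The first call reproduces the maximum of 121.22 quoted in the text, and the second shows why very complicated sentences can score below zero.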


Data analysis and procedures

Excel and the R packages mctest, pscl and quanteda were used to process and analyse the data. Because the dependent variable of this study, the number of citations (see Table 2), was count data, count regression models were used. Furthermore, as this variable was over-dispersed and zero-inflated, a count model was required that could deal with these two issues. Therefore, a negative binomial-logit hurdle model was the best fit for the data. Hurdle models measure the likelihood of an observation being positive or zero, and then determine the parameters of the count distribution for positive observations. Thus, a hurdle model comprises two parts: the count model, which is either a negative binomial or Poisson model, and the logit model. The count model predicts the changes in the positive non-zero observations, whilst the logit part models the zero observations [43, 44]. In this paper, three hurdle regression analyses were performed, namely Model 1, Model 2 and Model 3. In Model 1, the total number of received citations was the dependent variable, whereas the genders of the first and last authors were the independent variables. The rest of the variables were considered as covariates, except time since publication, which was entered as an offset variable in the regression model. In Models 2 and 3, the later citation count was the dependent variable, whereas the early tweet count was the independent variable. In Model 3, the rest of the variables were considered as covariates. In Model 2, there were no covariates. In both models, the time after the first two years of publication was entered as an offset variable. Table 3 shows descriptive statistics at paper level for the covariates used in regression Models 1 and 3. As can be seen from this table, the covariates are divided into two groups of numerical and categorical variables.
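
To make the two-part structure concrete, the following Python sketch writes out the probability mass function of a logit-negative binomial hurdle model. It is an illustration under stated assumptions, not the pscl code used in the study: the negative binomial is parameterised by a mean `mu` and a dispersion `theta`, the logit part is taken to give the probability of a paper being cited at all, and all function and parameter names are hypothetical.

```python
import math


def nb_pmf(k, mu, theta):
    """Negative binomial pmf with mean mu and variance mu + mu**2 / theta."""
    log_p = (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def hurdle_nb_pmf(k, p_cited, mu, theta):
    """P(Y = k): zeros come from the hurdle, positives from a truncated count part."""
    if k == 0:
        return 1.0 - p_cited
    # Positive counts follow a zero-truncated negative binomial,
    # rescaled by the probability of crossing the hurdle.
    return p_cited * nb_pmf(k, mu, theta) / (1.0 - nb_pmf(0, mu, theta))


def count_mean(linear_predictor, exposure_years):
    """Mean of the untruncated count part when log(exposure) enters as an offset."""
    return exposure_years * math.exp(linear_predictor)


# Hypothetical parameter values, chosen only for illustration.
mu = count_mean(linear_predictor=0.5, exposure_years=3)
total = sum(hurdle_nb_pmf(k, p_cited=0.9, mu=mu, theta=1.5) for k in range(3000))
print(abs(total - 1.0) < 1e-6)  # the two parts together form a proper distribution
```

Because the zero and positive parts are estimated separately, a covariate can have different effects on whether a paper is cited at all and on how many citations a cited paper receives, which is why each results table reports a count model and a logit model.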

Table 3

Descriptive statistics at paper level for numerical and categorical covariates entered in the regression models 1 and 3.

Numerical Covariates	Mean (SD)	Median	Categorical Covariates	Category	Number (%)
Title length	12.87 (4.64)	12	Mega journal	Yes	4,597 (9.58)
Number of MeSH topics	5.91 (1.51)	6	Mega journal	No	43,364 (90.42)
SNIP	1.33 (0.97)	1.11	OA Status	OA	27,906 (58.18)
Number of authors	5.38 (6.82)	4	OA Status	non-OA	20,055 (41.82)
Number of countries	1.48 (0.99)	1	Lay summary	Yes	1,580 (3.29)
Abstract readability	14.04 (13.44)	14.47	Lay summary	No	46,381 (96.71)
F1000 score	0.04 (0.34)	0.001	Funding	Yes	10,289 (21.45)
Paper length	9.87 (26.19)	9	Funding	No	37,672 (78.55)
Total publications for last authorship position	70.39 (74.69)	49	MeSH category	Anatomy	3,942 (8.22)
Total number of citations for last authorship position	2456 (4825.87)	1093		Organisms	5,351 (11.16)
Total number of self-citations for last authorship position	183.50 (361.80)	86		Diseases	3,244 (6.76)
Total publications for first authorship position	22.82 (33.50)	13		Chemicals and Drugs	3,431 (7.15)
Total number of citations for first authorship position	682.24 (1922.98)	212		Analytical, Diagnostic and Therapeutic Techniques and Equipment	11,863 (24.73)
Total number of self-citations for first authorship position	48.87 (146.69)	12		Psychiatry and Psychology	814 (1.70)
				Health Care	1,651 (3.44)

Multicollinearity was tested using Variance Inflation Factor (VIF). The VIF estimates how much the variance of a regression coefficient is inflated due to multicollinearity in the model. As a rule of thumb, a VIF value that exceeds 5 or 10 indicates a problematic amount of collinearity [45]. All variables had VIF values less than 3; hence no collinearity is expected (See S1 and S2 Appendices).
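
As a rough illustration of what the VIF measures, the sketch below uses the definition VIF = 1 / (1 - R²), where R² comes from regressing one covariate on the others. For simplicity it assumes only two covariates, in which case R² is the squared Pearson correlation between them. The helper names and the toy data are hypothetical; the study itself used the mctest R package.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif(r_squared):
    """Variance inflation factor for a covariate with the given R-squared."""
    return 1.0 / (1.0 - r_squared)


# Toy covariates (not data from the study).
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(round(vif(pearson_r(x, y) ** 2), 2))  # 2.78

# The rule-of-thumb thresholds correspond to R-squared values of 0.8 and 0.9.
print(round(vif(0.8), 2), round(vif(0.9), 2))  # 5.0 10.0
```

A VIF below 3, as reported for all variables here, corresponds to an R² below about 0.67 for every covariate regressed on the rest.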

Results

Fig 2 shows the percentage of first and last authorship positions by gender. As can be seen from the figure, in both authorship positions, the percentage of male-authored papers is higher than that of female-authored papers. However, the percentage of female first-authored papers is slightly higher than the percentage of female last-authored papers.

Fig 2

The percentage of first and last authored papers by gender in Life Sciences and Biomedicine (2014–2016).

Comparison of the number of received citations between female and male last/first authored papers (Model 1)

As can be seen from the count model in Table 4, female first-authored papers and female last-authored papers have a small but significant citation advantage over male-authored papers in these two positions. That is, moving from male to female in either of these positions (a one-unit increase in the gender variable) is associated with an increase in the average number of received citations of 4.7% for female first-authored papers and 5.5% for female last-authored papers.

Table 4

Results of Hurdle regression for the comparison of female and male first/last authored articles regarding citation counts, having controlled for several factors.

Count model	Total number of citations
Count model	Coef.	Exp(Coef.)	Sig.
Last (Female)	0.054	1.055	< 0.001***
First (Female)	0.045	1.047	< 0.001***
Title length	-0.008	0.992	< 0.001***
Mega Journal	-0.038	0.963	0.545
Number of MeSH topics	-0.006	0.994	0.112
MeSH-Anatomy	0.035	1.035	0.078 .
MeSH-Organisms	0.006	1.007	0.704
MeSH-Diseases	0.036	1.036	0.093 .
MeSH-Chemicals and Drugs	0.046	1.048	0.021 *
MeSH-Analytical, Diagnostic and Therapeutic Techniques and Equipment	0.015	1.015	0.274
MeSH-Psychiatry and Psychology	0.035	1.036	0.394
MeSH-Health Care	-0.085	0.918	0.004 **
SNIP	0.309	1.363	< 0.001***
OA	0.216	1.242	< 0.001***
Number of authors	0.010	1.010	< 0.001 ***
International collaboration	0.023	1.024	< 0.001***
Lay Summary	0.040	1.041	0.582
F1000	0.108	1.115	< 0.001 ***
Funding	-0.024	0.976	0.081 .
Paper length	0.008	1.008	< 0.001 ***
Abstract readability	-0.001	0.999	0.007 **
Total publications of last author	-0.001	0.998	< 0.001 ***
Total number of citations for last author	0.162	1.177	< 0.001 ***
Total number of self-citations for last author	0.0001	1.000	< 0.001 ***
Total publications of first author	-0.006	0.993	< 0.001 ***
Total number of citations for first author	0.309	1.362	< 0.001 ***
Total number of self-citations for first author	0.0006	1.001	< 0.001 ***
Logit model	Total number of citations
Logit model	Coef.	Exp(Coef.)	Sig.
Last (Female)	0.144	1.155	0.051 .
First (Female)	0.060	1.062	0.347
Title length	0.020	1.020	0.002 **
Mega Journal	0.315	1.370	0.493
Number of MeSH topics	0.032	1.032	0.115
MeSH-Anatomy	0.004	1.004	0.974
MeSH-Organisms	0.080	1.084	0.412
MeSH-Diseases	0.150	1.162	0.227
MeSH-Chemicals and Drugs	0.142	1.153	0.248
MeSH-Analytical, Diagnostic and Therapeutic Techniques and Equipment	0.027	1.028	0.713
MeSH-Psychiatry and Psychology	-0.079	0.924	0.725
MeSH-Health Care	-0.161	0.851	0.281
SNIP	1.020	2.774	< 0.001 ***
OA	0.176	1.192	0.010 *
Number of authors	0.105	1.111	< 0.001 ***
International collaboration	0.060	1.062	0.261
Lay Summary	-0.312	0.732	0.597
F1000	0.281	1.325	0.371
Funding	0.240	1.271	0.021 *
Paper length	0.035	1.036	< 0.001***
Abstract readability	-0.003	0.997	0.171
Total publications of last author	-0.005	0.995	< 0.001 ***
Total number of citations for last author	0.144	1.155	< 0.001***
Total number of self-citations for last author	0.001	1.001	< 0.001 ***
Total publications of first author	-0.008	0.992	< 0.001***
Total number of citations for first author	0.299	1.348	< 0.001 ***
Total number of self-citations for first author	0.002	1.002	0.004 **

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1.

As for the controlled factors, in both the logit and count models and regardless of gender, OA, SNIP, number of authors, paper length and the total number of citations and self-citations of first and last authors had positive associations with the average number of received citations, as well as with a higher probability of a paper being cited. Amongst MeSH topics, there was a small positive association with the estimated number of received citations for articles categorized as ‘Chemicals and Drugs’.

Early tweet counts and later citation counts

Regression model with tweet and citation counts only (Model 2)

As can be seen from Table 5, the early number of tweets received by an article has a small positive association with the average number of later citations and with the probability of a paper being cited. In other words, for each one-unit increase in early tweet counts, the average number of later citations increases by approximately 1.7% and the probability of being cited is higher by 22.5%.

Table 5

Results of Hurdle regression for the association between early tweet counts and later citation counts, without any control variables.

Early tweet Counts	Later citation counts
	Count model			Logit model
	Coef.	Exp(Coef.)	Sig.	Coef.	Exp(Coef.)	Sig.
	0.017	1.017	< 0.001 ***	0.203	1.225	< 0.001 ***

Signif. codes: 0 '***'.
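
The percentages quoted for Table 5 follow from exponentiating the coefficients, which can be checked with a short Python sketch. Strictly speaking, the exponentiated logit coefficient is a multiplicative change in the odds of being cited, so the implied change in probability depends on the baseline. The baseline probability of 0.5 used below is purely hypothetical, as are the function names.

```python
import math


def percent_change(coef):
    """Percentage change implied by a coefficient on the log (or log-odds) scale."""
    return (math.exp(coef) - 1.0) * 100.0


def probability_after(baseline_prob, odds_ratio):
    """Probability obtained by multiplying the baseline odds by an odds ratio."""
    odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio
    return odds / (1.0 + odds)


# Coefficients from Table 5 (Model 2).
print(round(percent_change(0.017), 1))  # 1.7  (count model)
print(round(percent_change(0.203), 1))  # 22.5 (logit model, change in odds)

# With a hypothetical baseline probability of 0.5, an odds ratio of 1.225
# moves the probability of being cited to about 0.55.
print(round(probability_after(0.5, 1.225), 3))  # 0.551
```

The same calculation applies to the exponentiated coefficients reported in Tables 4 and 6.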

Regression model with tweet and citation counts controlling for all covariates (Model 3)

In the next step, I checked whether there would still be an association between the early number of tweets and the later number of received citations when controlling for several important factors that are associated with the number of received citations. Interestingly, comparing the two regression models (2 and 3) shows that, for a one-unit increase in early tweet counts, the estimated average citation count increases by 1.7% in Model 2 (see Table 5) and by 3.3% in the current model (see Table 6). As the coefficient in Model 3 is still significant and positive, this suggests that the early number of tweets could predict the later number of received citations, having controlled for specific factors.

Table 6

Results of Hurdle regression for the association between early tweet counts and later citation counts, having controlled for several factors.

Count model	Later citation counts
Count model	Coef.	Exp(Coef.)	Sig.
Early tweet counts	0.032	1.033	< 0.001 ***
Last (Female)	0.041	1.042	0.003 **
First (Female)	0.036	1.037	0.003 **
Title length	-0.012	0.987	< 0.001 ***
Mega Journal	-0.022	0.978	0.746
Number of MeSH topics	-0.016	0.984	< 0.001 ***
MeSH-Anatomy	0.051	1.053	0.018 *
MeSH-Organisms	0.002	1.002	0.906
MeSH-Diseases	0.016	1.016	0.498
MeSH-Chemicals and Drugs	0.075	1.078	<0.001 ***
MeSH-Analytical, Diagnostic and Therapeutic Techniques and Equipment	0.020	1.021	0.179
MeSH-Psychiatry and Psychology	0.047	1.049	0.307
MeSH-Health Care	-0.115	0.891	< 0.001 ***
SNIP	0.376	1.456	< 0.001 ***
OA	0.294	1.342	< 0.001 ***
Number of authors	0.011	1.011	< 0.001 ***
International collaboration	0.045	1.046	< 0.001 ***
Lay Summary	0.055	1.056	0.504
F1000	0.121	1.129	< 0.001 ***
Funding	0.064	1.066	< 0.001 ***
Paper length	0.012	1.012	< 0.001 ***
Abstract readability	-0.003	0.997	< 0.001 ***
Total publications of last author	-0.000	1.000	0.015 *
Total number of citations for last author	0.000	1.000	< 0.001 ***
Total number of self-citations for last author	0.000	1.000	< 0.001 ***
Total publications of first author	-0.003	0.997	< 0.001 ***
Total number of citations for first author	0.000	1.000	< 0.001 ***
Total number of self-citations for first author	0.000	1.000	<0.001 ***
Logit model	Later citation counts
Logit model	Coef.	Exp(Coef.)	Sig.
Early tweet counts	0.192	1.212	< 0.001 ***
Last (Female)	0.107	1.113	0.095 .
First (Female)	0.073	1.075	0.198
Title length	0.020	1.020	< 0.001 ***
Mega Journal	0.557	1.746	0.225
Number of MeSH topics	0.019	1.019	0.276
MeSH-Anatomy	0.092	1.096	0.361
MeSH-Organisms	0.127	1.136	0.142
MeSH-Diseases	0.178	1.194	0.102
MeSH-Chemicals and Drugs	0.218	1.243	0.041 *
MeSH-Analytical, Diagnostic and Therapeutic Techniques and Equipment	0.033	1.033	0.614
MeSH-Psychiatry and Psychology	-0.156	0.855	0.417
MeSH-Health Care	-0.131	0.877	0.319
SNIP	1.017	2.765	< 0.001 ***
OA	0.266	1.305	< 0.001 ***
Number of authors	0.110	1.116	< 0.001 ***
International collaboration	0.115	1.122	0.014 *
Lay Summary	0.161	1.174	0.785
F1000	0.269	1.308	0.319
Funding	0.422	1.524	< 0.001 ***
Paper length	0.041	1.042	< 0.001 ***
Abstract readability	-0.004	0.996	0.048 *
Total publications of last author	-0.002	0.998	0.013 *
Total number of citations for last author	0.000	1.000	0.711
Total number of self-citations for last author	0.002	1.002	0.007 **
Total publications of first author	-0.004	0.996	0.007 **
Total number of citations for first author	0.000	1.000	0.437
Total number of self-citations for first author	0.004	1.004	< 0.001 ***

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1.

As can be seen from the count and logit models in Table 6, for each one-unit increase in the number of early tweets, the estimated number of received citations increases by 3.3% and the probability of being cited is higher by 21.1%. Regarding gender, the count model shows that, keeping all other variables in the model constant and at the same number of tweets, the estimated number of citations for female first or last-authored papers is on average slightly higher than that for male-authored papers. In other words, switching from male to female in the first and last authorship positions (at the same number of tweets for both genders) is associated with an increase in the average number of citations of 3.7% and 4.2%, respectively. As for the rest of the covariates, both the logit and count models show that OA, SNIP, number of authors, international collaboration, funding, paper length and being categorized under the ‘Chemicals and Drugs’ MeSH topic were significantly associated with a higher number of received citations as well as a higher probability of being cited.

Conclusion and discussion

The goal of this paper was to examine whether, and to what extent, early tweet counts received by articles in the field of Life Sciences and Biomedicine (2014–2016) could differentially benefit female and male scholars in terms of the later citation counts received. To do this, the numbers of citations received by female and male last/first authored papers were compared, when controlling for several important factors (Model 1). Then, it was investigated whether, and to what extent, early tweet counts could predict later citation counts received by female and male last/first authored papers, when controlling for several important factors (Model 3). The findings in relation to these two objectives are briefly discussed below. Regarding the first objective, the findings showed that the percentage of papers with male authors in first or last authorship positions was higher than that for female authors. Furthermore, the percentage of female first-authored papers was slightly higher than the percentage of female last-authored ones. These findings might indicate male dominance in the field. The latter finding might especially reflect the lack of senior females in this field, as last authors tend to be senior. This conclusion is supported by the study of Plank-Bazinet et al. [4], which found a significant scarcity of women in academic biomedical leadership and senior positions. Having controlled for several factors, it was found that female first and last-authored papers had a small but significant citation advantage of 4.7% and 5.5%, respectively, compared to male-authored papers. This finding is interesting given the lower number of female authors in these two authorship positions. The finding regarding the female first author citation advantage is in line with Thelwall’s [9] study, which found a small citation advantage for female first-authored papers in parasitology.
As first authorship positions tend to be taken by younger researchers, it could be suggested that young female researchers are slightly outperforming young male researchers in terms of citation counts. The finding regarding the last authorship position contradicts that of Thelwall [9], who found a small female last-author disadvantage in immunology, parasitology and virology. It is, however, in line with a study by Sotudeh, Dehdarirad and Freer [20] in the field of neurosurgery, which found a higher average number of citations for female first and last authors. Regarding the second objective, the findings showed that, irrespective of whether the factors were included in the regression models or not, early tweet counts had a weak, positive and significant association with later citation counts (3.3%) and with the probability of a paper being cited (21.1%). This finding is in line with those of Eysenbach [23], Haustein et al. [22], Peoples et al. [46] and Klar et al. [15]. Regarding gender, the findings showed that, holding all other variables in the model constant and at the same number of tweets, the average citation count of female first- or last-authored papers was slightly higher than that of male-authored papers. Compared to male first- or last-authored papers, female-authored papers had a small citation advantage of 3.7% and 4.2%, respectively, when both genders received the same number of tweets per paper. This might suggest that in Life Sciences and Biomedicine, early tweet counts could slightly benefit female-authored papers in terms of the later citation counts received. This finding is to some extent in line with Klar et al. [15], who found a positive association between the percentage of women authors per paper and the number of citations received, after controlling for the number of tweets.
With regard to the other variables controlled for in model 3 (early tweet-later citation counts), two conclusions can be drawn when all other variables in the model are held constant and the number of tweets is the same. i) OA articles, and articles with international collaboration or research funding, had a higher average number of citations and a higher probability of being cited. ii) As the F1000 score and journal impact (SNIP) increased, the average number of citations increased. Amongst these covariates, journal impact and OA had the strongest association with the number of citations and the probability of being cited, respectively. The finding about journal impact is consistent with the study of Andersen et al. [25], which found that journal prestige was a covariate that accounted for most of the small average citation differences between genders. The finding about OA might show the importance of making an article open, as this makes it more visible and makes it easier for Twitter users to access the full text. This in turn might translate into more citations. With regard to MeSH topics, the results showed that, holding all other variables in the model constant and at the same number of tweets, articles with the ‘Chemicals and Drugs’ MeSH topic had a higher probability of being cited (24.3%) and a higher average number of citations (7.8%) than articles with the other 13 MeSH topics. According to a study by Bhattacharya, Srinivasan and Polgreen [47], tweeting about MeSH topics such as ‘Chemicals and Drugs’ leads to more engagement (in terms of number of retweets) on Twitter. More engagement on Twitter does not guarantee more citations. However, it might provide increased visibility for papers on this topic, which may also make them more visible to the scientific community.
The findings also showed that while some factors had a positive association with the average number of citations received (model 1, citation comparison), they had a very weak or almost no association with the later citation counts received (model 3, early tweet-later citation counts). As an example, the total number of citations of first and last authors had a positive association with the probability of a paper being cited (34.8%; 15.5%) and with the average number of citations received in model 1 (36.2%; 17.7%). However, in model 3 the association between the same variables and the average later citation counts was almost nil, with coefficients very close to zero. Collectively, this could suggest that, at the same number of tweets, the scientific impact of authors, measured as their total number of citations, has almost no association with the probability of a paper being cited or with the later average citation counts received. This study has some limitations. The extent to which early tweet counts are associated with later citation counts may vary as factors are added to or removed from the model. However, the current model attempted to control for several important factors. By doing so, I was able to increase the probability of obtaining a more precise and reliable estimate of the association between early tweet counts and the average number of citations received. It should also be considered that the analysis in this paper was limited to articles in the area of Life Sciences and Biomedicine published in 2014–2016. Thus, the results obtained in this article are not comprehensive, and caution is advised when generalizing them beyond the case studied.

Multicollinearity diagnostics results for Model 1.

(DOCX)

Multicollinearity diagnostics results for Model 3.

(DOCX)
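As a hedged illustration of the kind of diagnostic these two files report, the sketch below uses the standard definition of the variance inflation factor, VIF = 1 / (1 − R²), where R² comes from regressing one predictor on the others. The correlation value is hypothetical and is not taken from the study; the threshold of 5 is the one the author cites in the response to the editor. The function name is an assumption made for this example.

```python
def vif_from_r2(r_squared):
    """Variance inflation factor for a predictor whose regression on the
    other predictors has coefficient of determination r_squared."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)

# Hypothetical case with only two predictors, where R^2 equals the squared
# pairwise correlation r between them (r = 0.6 is an assumed value).
r = 0.6
print(round(vif_from_r2(r ** 2), 4))

# R^2 at which the VIF reaches the threshold of 5.
threshold = 5.0
r2_at_threshold = 1.0 - 1.0 / threshold
print(round(r2_at_threshold, 4), round(r2_at_threshold ** 0.5, 3))
```

Under these assumptions, two predictors would need a pairwise correlation above roughly 0.89 before the VIF exceeded 5, which is one way a moderate correlation between covariates can coexist with VIF values below that threshold.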

The percentage and number of articles by female and male authors in first and last authorship positions.

(XLSX)

24 Aug 2020

PONE-D-20-22052

Could early tweet counts predict later citation counts? A gender study in Life Sciences and Biomedicine (2014-2016)

PLOS ONE

Dear Dr. Dehdarirad,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 05 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future.
For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Alireza Abbasi

Academic Editor

PLOS ONE

Additional Editor Comments:

In addition to the reviewers' comments, the following issues need to be investigated as well:

Several usages of ‘authors (first, last)’ or ‘authors (last, first)’: to avoid any confusion, please use a complete and proper term, ‘first and last author’. That will help keep consistency as well.

Please create a separate sub-section for the control variables discussed on pages 6 and 7. A proper (but brief) definition of each variable is also needed. No definition is provided for some variables (e.g. ‘mega journals’, ‘F1000’).

Some of the variables are publication-age dependent (e.g. ‘citations count’) and some are not (e.g. ‘tweet counts’). In other words, the tweet counts are fixed to two years while ‘the number of citations’ is calculated regardless of the age of the publication. That can affect the regression (and correlation) results. This can be addressed by, for instance, using a fixed number of years for the citation count (a 3-year window, for instance) or using publication age as a factor. Please discuss.

Likewise, total number of publications, citations, … (last variable in Table 2) is age-dependent (i.e. older authors will have higher values) and can affect the statistical results. Please discuss.

The use of regression types should be justified.

Except for the number of articles per year, no statistics are provided for the data set. It will be helpful to provide some basic statistics about the dataset, for instance: the total number of unique authors (and by gender and position); the number of articles (and authors) with no citations/tweets; the distribution of citation counts and tweet counts.
A discussion on the effects of that range on the statistical analysis is also needed. Under discussion for Multicollinearity test, it is claimed that there is no significant collinearity while a high correlation is expected between independent/covariate variables such as ‘number of authors’ and ‘international collaboration’. Please discuss. Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1) Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2) We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. In your revised cover letter, please address the following prompts: a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent. b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. 
For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. We will update your Data Availability statement on your behalf to reflect the information you provide. [Note: HTML markup is below. Please do not edit.] ======================================================================== Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: I Don't Know ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: No Reviewer #2: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. 
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: This analysis adds to both the role of gender in citations and the relationship between altmetrics and citations. The methods are appropriate and careful. The discussion is also appropriate and careful. Although the regression *might* perhaps better have been done with ordinary least squares and log(1+citations) as the dependent variable, I think the approach used here with the hurdle aspect could be better so I do not recommend a change. In Table 3, some of the commas should be full stops. Line 304: "articles with International" should be "articles with international" * I had to answer No to the question, "Have the authors made all data underlying the findings in their manuscript fully available?" but the author is correct not to share the citation and altmetric data. Reviewer #2: The paper looks at early tweets (first two years) for papers in life sciences and citation counts and investigate whether there is an association between them and whether there are gender differences in this regard (citation advantage, and benefit from tweets). A relatively large number of papers have been studied which is good and regression analysis has been used for data analysis. The paper can benefit from some clarification in methods and presentation. 
This sentence in the introduction, “but only 21% were full professors and just 15% were department chairs [6].” I think this is natural as academic rank is like a pyramid and there are fewer professors than associate professors and fewer associate professors than assistant professor. But if 21% of full professors were female (and the remaining 79% were male, and 15% of department chairs were female (and the remaining 85% were male) then that should be a concern. Not sure if this is what the author (and that reference) has meant to say? Page 4 where it says “Most of these factors have also 91 been examined in relation to gender or altmetrics studies.”, and then lists several factors that have been studied, it should be made clear each of those factors was investigated in relation to what, gender or altmetrics. For instance, was the influence of abstract readability was studied in relation to altmetric or in relation to gender? This is important for understanding the contribution of the current paper. The method needs more details and clarification. For instance, it says tweets for a two year period were collected. For example for papers published in 2014, tweets in 2014 and 2015 were collected. Was the month of publication taken into account in this data collection? If not, a paper published in January 2014 would’ve had two years of tweets in the dataset, while a paper in Dec 2014, would have only 13 months worth of tweets. The same goes for citation data. How many authors (first and last) were there in the dataset and how the publications, citation, self-citation data was obtained? Did the author manually search each of those probably thousands of authors? Was there any problem with author disambiguation? Title length: were words like the, a, an, on, and …counted? Abstract readability, how was it calculated? Did software (text processing) do this or somebody had to read all of the abstracts and assign a score? How about the validity and reliability issues here? 
Figure 1 should have proper legends with values shown on the bars (e.g. percentage). The paper needs a table that presents some descriptive statistics about the variables included in the study. For instance, how many authors, how many papers from each subject category, what was the average and median title length, how many OA and non-OA, how many papers had funding and how many didn’t, average, mean of the number of authors etc. I believe the level of accuracy used in the paper for significance reporting (shown with long exponents, e.g. 2.45e-05) is unnecessary, up to 3 decimal points would suffice. Also I think the author needs to make the contribution clear in the paper given the focus is on association of tweets and citation (adding gender to the issue) and there has already been some good research on that. Language, proofreading will improve the paper. It seems the paper has one author, but throughout the paper, the author uses 'we' to present the study which might not be right. Typo: p. 209, line 206, as well as well as higher Typo, p. 16, line 299, cations ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Mike Thelwall Reviewer #2: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] 
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 5 Oct 2020 PONE-D-20-22052 Answer to editor’s comments #Several usages of ‘authors (first, last)’ OR ‘authors (last, first)’- To avoid any confusion, please use a complete and proper term ‘first and last author’. That will help keeping consistency as well. • Thanks for this comment. Following the editor’s comment, this has been corrected throughout the manuscript. # Please create a separate sub-section for the control variables discussed in pages 6 and 7. A proper (but brief) definition of each variable is also needed. no definition is provided for some variables (e.g. ‘mega journals’, ‘F1000’). Regarding this comment, I believe it would be confusing to add a sub-section called control variables before explaining about the regression models and introducing the dependant and independent variables. The aim of data collection section was just to explain how the variables were collected and processed. Then, in data analysis section, I categorized them in different groups (independent, dependant and covariate) in relation to regression models and briefly explained how they are measured (Table 2). Then in Table (3) I have provided statistics for these variables. Thus, instead I did as below: • Some of the variables were already explained in the manuscript such as F1000, abstract readability, paper length, abstract length and Mesh. 
However, now explanations regarding some variables such as Mega journals (page 7, line 161-165) and lay summary (page 8, lines 179-183) that were not explained in the manuscript have now been added. The following reference with regard to lay summary has been added to the reference section: Shailes S. Plain-language Summaries of Research: Something for everyone. eLife. 2017; 6:e25411. doi: 10.7554/eLife.25411. Furthermore, details regarding the data collection and the reason for inclusion of ‘the total numbers of publications, citations and self-citations’ has been also added to page 7, lines 153-159. • I also added the following paragraph at the end of the data collection and processing section (page 9, lines 192-194), to refer the readers to more details about these variables such as type of variables (independent, dependant and covariates) and how they are measured. ‘Table 2 shows all the variables studied in this paper categorized as dependent, independent variables and covariates. It also provides a short description for these variables and how they are measured.’ • Table 3 is now added to the manuscript which provides descriptive statistics and further details for these variables (page 11). If the editor still believes that the sub-section would be necessary, after these changes have been made, I will add it. #Some of the variables are publication-age dependent (e.g. ‘citations count’) and some not (e.g., ‘tweet counts’. In other words, the number of tweet counts are fixed to two years while ‘the number of citations’ are calculated regardless of the age of publication! That can affect the regression (and correlation) results. This can be addressed by for instance using a fixed number of years for citations count (a 3-year window, for instance) or using publication age as a factor. Please discuss. • Regarding the number of tweets, a two-year time period after publications was set to be sure that articles have enough time to receive the number of tweets. 
The reason for this is that Twitter is categorized as a fast altmetric data source. This means that for such sources, in this case Twitter, altmetric events for newly published research outputs accumulate very quickly. According to research by Fang and Costas (2020), half of the tweet mentions for newly published articles accrued in the first 2 weeks (14 days) after the research outputs were published, and over 85% of them had accrued within a year (365 days). Given that, by setting a two-year time period after publication, I made sure that the studied articles had enough time to receive tweets. I have already mentioned this on page 6, lines 121-122.

Fang, Z., Costas, R. Studying the accumulation velocity of altmetric data tracked by Altmetric.com. Scientometrics 123, 1077–1101 (2020). https://doi.org/10.1007/s11192-020-03405-9

• Regarding the number of citations, as already mentioned on page 6, lines 123-125, I set a citation window of at least 3 years so that the studied articles had enough time to receive citations. Furthermore, I controlled for the effect of time on the number of citations by entering the time after the first two years of publication as an offset variable in the regression models (1, 2, 3). This is already mentioned in the manuscript on page 9, lines 213-214, and page 10, lines 217-218. So, using both these methods, I was able to control for the effect of time.

#Likewise, total number of publications, citations, … (last variable in Table 2) is age-dependent (i.e. older authors will have higher values) and can affect the statistical results. Please discuss.

• The total number of publications, citations and self-citations of an author is defined as the professional or scientific age of an author, as per the studies by Mishra et al. (2018) and Andersen et al. (2019).
Following the methodology used in the above-mentioned studies, I entered the professional age in regression models (1,3) as control variables to be able to control for this age effect on the probable number of citations received or the probability of a paper being cited. I have now mentioned these studies in the manuscript at page 7 line 154. Article Source: Self-citation is the hallmark of productive authors, of any gender Mishra S, Fegley BD, Diesner J, Torvik VI (2018) Self-citation is the hallmark of productive authors, of any gender. PLOS ONE 13(9): e0195773. https://doi.org/10.1371/journal.pone.0195773 Andersen JP, Schneider JW, Jagsi R, Nielsen MW. Gender variations in citation distributions in medicine are very small and due to self-citation and journal prestige. eLife. 2019;8:e45374. doi: 10.7554/eLife.45374. #The use of regression types should be justified. • On pages 9, lines 201-209, it is already explained and justified why hurdle regression model has been used. #Except for the number of articles per year, no statistic if provided for the data set. It will be helpful to provide some basis statistics about the dataset. For instance, total number of unique authors (and by gender, and position); number of articles and authors) with no citations / tweets; distribution of citations count, tweet count. A discussion on the effects of that range on the statistical analysis is also needed. Regarding this comment: • It should be considered that the analysis in this paper is based on authorship positions and not by the number of publications for unique authors in each authorship position. In other words, the analysis in the paper is at paper level and not author level. Thus, providing statistics on author level would be confusing and is not in line with the aim of study. The data set of this study was comprised of 47,961 papers. So, there are 47,961 authorship positions for both last and first authorship position. 
The gender of the authors in 12 authorship positions was not detected, as mentioned on page 7, line 145. However, to be more precise, I have now added this line to page 7, lines 147-148: ‘These 12 authorships accounted for seven first and five last authorship positions’. Thus, in this paper the statistics on the number of authorship positions by gender are reported in Figure 2 at paper level. Additionally, Table 3, which provides statistics on the covariates, has now been added to the manuscript (page 11).

• Following the editor’s comment, the numbers of articles with zero citations and zero tweets are now provided at paper level on page 6, lines 134-135, as below. Fig 1, which shows the scatter plot of early tweet counts and later citation counts, has also been added to page 6. ‘After aggregation, of the 47,961 articles, 2,496 had zero citations and 24,190 had zero tweets. Fig 1. shows the distribution of early tweet counts versus later citation counts.’

• Regarding the comment on the effect of the distribution, the distribution of citations is considered in the methodology section, where the hurdle regression and the reasons for using it are explained (page 9, lines 201-209). As explained in the manuscript (lines 201-209), one of the main reasons for using the hurdle regression model in this paper is that the citation counts are over-dispersed and include an excessive number of zeros (zero-inflated). Furthermore, this model can handle zero values. Basically, the hurdle model has two parts. Part one deals with zero values and calculates the probability of receiving citations using a logit model. Part two deals with non-zero values and predicts the changes in the positive, non-zero observations, which in this paper is the probable number of received citations.
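The two-part structure described above can be sketched as follows. This is a minimal illustration, not the model fitted in the study: it assumes a zero-truncated Poisson distribution for the positive counts (the response does not state which count distribution was used), it ignores covariates, and the function names are hypothetical. The probability of being cited is set to the raw share of cited articles reported above (2,496 of 47,961 uncited) purely for illustration, and the rate of 4.0 is made up.

```python
import math

def hurdle_pmf(k, p_cited, lam):
    """P(Y = k) under a hurdle model.

    p_cited: probability of clearing the hurdle, i.e. receiving at least one
             citation (in a regression, the output of the logit part).
    lam:     rate of the Poisson distribution that is truncated at zero and
             used for the positive counts (an assumed choice).
    """
    if k < 0:
        return 0.0
    if k == 0:
        return 1.0 - p_cited                       # part one: zero vs. non-zero
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    poisson_positive = 1.0 - math.exp(-lam)        # P(Poisson count > 0)
    return p_cited * poisson_k / poisson_positive  # part two: positive counts

def hurdle_mean(p_cited, lam):
    """Expected count: P(cited) times the mean of the zero-truncated Poisson."""
    return p_cited * lam / (1.0 - math.exp(-lam))

# Illustrative inputs only (see the assumptions stated above).
p_cited = 1.0 - 2496 / 47961
lam = 4.0

total = sum(hurdle_pmf(k, p_cited, lam) for k in range(60))
print(round(total, 6))                     # the probabilities sum to 1
print(round(hurdle_mean(p_cited, lam), 3))
```

The point of the construction is that the zero outcome is governed only by `p_cited`, while the size of a non-zero count is governed only by the truncated count distribution, so each part can have its own coefficients, as in the separate logit and count results reported for each model.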
The obtained results from both parts, zero values (logit model) and non-zero values (count model) are explained in results section and are discussed in discussion and conclusion section for each model in relation to each gender and to covariates. #Under discussion for Multicollinearity test, it is claimed that there is no significant collinearity while a high correlation is expected between independent/covariate variables such as ‘number of authors’ and ‘international collaboration’. Please discuss. • Regarding multicollinearity test, I used mctest R package to test for multicollinearity amongst the variables. The results of this test for models 1 and 3 are shown in the Appendices (s1, s2). By looking at the VIF values in S1 and S2, it can be seen that the VIF values are all lower than 5. So, no collinearity was found. There is also explanation regarding this test and results on page 11 lines 226-230. • There are other studies in citation and altmetrics analysis like Zahedi and Haustein (2018), Didegah Bowman and Holmberg (2018), that have entered these covariates in the regression models as no-collinearity was found. 1. Zahedi Z, Haustein S. On the relationships between bibliographic characteristics of scientific documents and citation and Mendeley readership counts: A large-scale analysis of Web of Science publications. Journal of Informetrics. 2018;12(1):191-202. doi: https://doi.org/10.1016/j.joi.2017.12.005. 2. Didegah F, Bowman TD, Holmberg K. On the differences between citations and altmetrics: An investigation of factors driving altmetrics versus citations for finnish articles. Journal of the Association for Information Science and Technology. 2018;69(6):832-43. doi: 10.1002/asi.23934. Answer to Reviewers' comments Reviewer #1: # This analysis adds to both the role of gender in citations and the relationship between altmetrics and citations. The methods are appropriate and careful. The discussion is also appropriate and careful. 
Although the regression *might* perhaps better have been done with ordinary least squares and log(1+citations) as the dependent variable, I think the approach used here with the hurdle aspect could be better, so I do not recommend a change. • Thanks for this comment. # In Table 3, some of the commas should be full stops. • Thanks for this comment. It has now been corrected. Table 3 is now Table 4 in the manuscript. # Line 304: "articles with International" should be "articles with international" • It has now been corrected. Reviewer #2: # This sentence in the introduction, “but only 21% were full professors and just 15% were department chairs [6].” I think this is natural as academic rank is like a pyramid and there are fewer professors than associate professors and fewer associate professors than assistant professor. But if 21% of full professors were female (and the remaining 79% were male, and 15% of department chairs were female (and the remaining 85% were male) then that should be a concern. Not sure if this is what the author (and that reference) has meant to say? • Thanks for this comment. Yes, it is exactly what the report says. For details please refer to this pdf https://store.aamc.org/downloadable/download/sample/sample_id/228/, where it shows these percentages for men and women. I have now made this clearer in the manuscript (Page 3, lines 55-57). #Page 4 where it says “Most of these factors have also 91 been examined in relation to gender or altmetrics studies.”, and then lists several factors that have been studied, it should be made clear each of those factors was investigated in relation to what, gender or altmetrics. For instance, was the influence of abstract readability was studied in relation to altmetric or in relation to gender? This is important for understanding the contribution of the current paper. • Following the reviewer’s comment, the factors are related to each group of studies i.e. 
gender, altmetrics (pages 4-5, lines 93-101) have been added.

• Furthermore, the two following references have also been added to the manuscript:

35. Holmberg K, Hedman J, Bowman TD, Didegah F, Laakso M. Do articles in open access journals have more frequent altmetric activity than articles in subscription-based journals? An investigation of the research output of Finnish universities. Scientometrics. 2020;122(1):645-59. doi: 10.1007/s11192-019-03301-x.

36. Dehdarirad T, Didegah F. To What Extent Does the Open Access Status of Articles Predict Their Social Media Visibility? A Case Study of Life Sciences and Biomedicine. Journal of Altmetrics 2020;3(1). doi: http://doi.org/10.29024/joa.29.

#The method needs more details and clarification. For instance, it says tweets for a two year period were collected. For example for papers published in 2014, tweets in 2014 and 2015 were collected. Was the month of publication taken into account in this data collection? If not, a paper published in January 2014 would’ve had two years of tweets in the dataset, while a paper in Dec 2014, would have only 13 months’ worth of tweets. The same goes for citation data.

• Thanks for this comment. In this paper, the numbers of citations and tweets were aggregated by year and not by month of publication. In bibliometrics, it is common to set a citation window of 3 or 5 years based on the year of publication and not the month of publication. In fact, the citation window is defined as: the years after publication that is used to count the citations (Bibliometric handbook for Karolinska institutet, 2014) https://kib.ki.se/sites/default/files/bibliometric_handbook_2014.pdf. So, it refers to years after publication. However, following the reviewer’s comment, to make it clearer and to be precise, I have changed ‘after publication’ to ‘after publication year’ (page 6 lines 122-125).
I also replaced ‘at least three years’ with ‘a time citation of window of at least three years.’

#How many authors (first and last) were there in the dataset?

• Regarding this comment, it should be considered that the analysis in this paper is based on authorship positions and not on the number of publications for unique authors in each authorship position. In other words, the analysis in the paper is at paper level and not author level. The data set of this study comprised 47,961 papers. So, there are 47,961 authorship positions for both the first and last authorship positions. The gender of authors in 12 authorship positions was not detected, as mentioned on page 7, line 145. However, to be more precise, I have added this line to page 7, lines 147-148: ‘These 12 authorships accounted for seven first and five last authorship positions.’ Thus, in this paper, the statistics on the number of authorship positions by each gender are reported in Figure 2 at paper level.

# and how were the publications, citation, self-citation data obtained? Did the author manually search each of those probably thousands of authors? Was there any problem with author disambiguation?

• Thanks for bringing this to my attention. I used the Scival API (https://dev.elsevier.com/documentation/SciValAuthorAPI.wadl) to automatically download these three values for each author using their author IDs. I have now corrected this in the manuscript (pages 7, lines 153-159). After correspondence with Clarivate, I realized that currently it is not possible to download the total number of self-citations for each author automatically. Furthermore, there was not a straightforward way to download the total number of publications and citations for each author using their APIs. So, instead, I used the Scival API, which allowed me to download these three aforementioned indicators automatically. For details regarding the steps taken to do this, please see pages 7, lines 153-159 in the manuscript.
Finally, in order to address the issue of disambiguation, I used author IDs. In the Author history provided by the Author API, it was also possible to check and track the affiliation history of authors.

#Title length: were words like the, a, an, on, and …counted?

• Thanks for this comment. I calculated the title length based on the number of characters and not words in the title of an article. So, I have corrected it in the paper (page 8, line 167). This methodology was used in other bibliometrics studies such as:

Haustein S, Costas R, Larivière V (2015) Characterizing Social Media Metrics of Scholarly Papers: The Effect of Document Properties and Collaboration Patterns. PLOS ONE 10(3): e0120495. https://doi.org/10.1371/journal.pone.0120495

Zahedi Z, Haustein S. On the relationships between bibliographic characteristics of scientific documents and citation and Mendeley readership counts: A large-scale analysis of Web of Science publications. Journal of Informetrics. 2018;12(1):191-202. doi: https://doi.org/10.1016/j.joi.2017.12.005.

# Abstract readability, how was it calculated? Did software (text processing) do this or somebody had to read all of the abstracts and assign a score? How about the validity and reliability issues here?

• The quanteda R package was used to calculate the score for each abstract, so it was not done manually. I used the textstat_readability command, which returns a data.frame of documents and their readability scores. More information on this package can be found at this link: https://rdrr.io/cran/quanteda/man/textstat_readability.html.

• On page 8 line 186, it is now mentioned that this was done using the quanteda R package.

#Figure 1 should have proper legends with values shown on the bars (e.g. percentage).

• Following the reviewer’s comment, in Figure 1, the percentage % sign has been added to the values on the Y axis. The values are also shown in percentages on the bars. Figure 1 is now Figure 2 in the manuscript.
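As an illustration of the two text measures described in the responses above (title length counted in characters, and an automatically computed abstract readability score), a minimal Python sketch follows. It is not the author's code: the analysis used the quanteda R package, the response does not state which readability measure was chosen, and the Flesch Reading Ease formula, the naive syllable counter, the stop-word list and all function names below are assumptions made purely for illustration.

```python
import re

# Hypothetical stop-word list, only to show why a word-based title length
# would depend on choices that a character count avoids.
STOP_WORDS = {"the", "a", "an", "on", "and", "of", "in"}


def title_length_chars(title: str) -> int:
    """Title length as a character count (inner spaces and punctuation included)."""
    return len(title.strip())


def title_length_words(title: str, drop_stop_words: bool = False) -> int:
    """Title length as a word count, optionally ignoring stop words."""
    words = re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", title)
    if drop_stop_words:
        words = [w for w in words if w.lower() not in STOP_WORDS]
    return len(words)


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    title = "On the cat"
    print(title_length_chars(title))                         # character-based length
    print(title_length_words(title))                         # all words
    print(title_length_words(title, drop_stop_words=True))   # stop words ignored
    print(round(flesch_reading_ease("The cat sat."), 2))
```

A character count gives one value per title regardless of how words such as "the" or "on" are treated, whereas the word-based count changes with the stop-word choice. The syllable heuristic is crude, so scores from this sketch will not match those produced by a dedicated text-analysis package.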
#The paper needs a table that presents some descriptive statistics about the variables included in the study. For instance, how many authors, how many papers from each subject category, what was the average and median title length, how many OA and non-OA, how many papers had funding and how many didn’t, average, mean of the number of authors etc.

• Thanks for this comment. Following the reviewer’s comment, Table 3 has been added to page 11, which provides descriptive statistics at paper level for the covariates entered in regression models 1 and 3.

#I believe the level of accuracy used in the paper for significance reporting (shown with long exponents, e.g. 2.45e-05) is unnecessary, up to 3 decimal points would suffice.

• They have now been modified.

#Also I think the author needs to make the contribution clear in the paper given the focus is on association of tweets and citation (adding gender to the issue) and there has already been some good research on that.

• It has now been modified in the manuscript.

#Language, proofreading will improve the paper. It seems the paper has one author, but throughout the paper, the author uses 'we' to present the study which might not be right.

• Thanks for this comment. It has been corrected.

#Typo: p. 209, line 206, as well as well as higher

• It has been corrected.

#Typo, p. 16, line 299, cations

• It has been corrected.

Submitted filename: Answer to reviewers and editors comments.docx

20 Oct 2020

Could early tweet counts predict later citation counts? A gender study in Life Sciences and Biomedicine (2014-2016)

PONE-D-20-22052R1

Dear Dr. Dehdarirad,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments.
When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Alireza Abbasi

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Thank you for the revision. We noticed almost all the comments are addressed properly. However, we advise to highlight the contribution of the paper (as advised by a reviewer as well) perhaps in the Abstract or Introduction, and Conclusion.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The author has adequately addressed all of the comments.
The paper has been improved and the method has been explained more thoroughly.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: Hamid R. Jamali

22 Oct 2020

PONE-D-20-22052R1

Could early tweet counts predict later citation counts? A gender study in Life Sciences and Biomedicine (2014-2016)

Dear Dr. Dehdarirad:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Alireza Abbasi

Academic Editor

PLOS ONE

19 in total

1. Authorship and citation gender trends in immunology and microbiology.

Authors: Mike A Thelwall
Journal: FEMS Microbiol Lett Date: 2020-01-01 Impact factor: 2.742

2. How Twitter is changing medical research.

Authors: Nicole Wetsman
Journal: Nat Med Date: 2020-01 Impact factor: 53.440

3. Engagement with health agencies on twitter.

Authors: Sanmitra Bhattacharya; Padmini Srinivasan; Phil Polgreen
Journal: PLoS One Date: 2014-11-07 Impact factor: 3.240

4. Characterizing social media metrics of scholarly papers: the effect of document properties and collaboration patterns.

Authors: Stefanie Haustein; Rodrigo Costas; Vincent Larivière
Journal: PLoS One Date: 2015-03-17 Impact factor: 3.240

5. Twitter Predicts Citation Rates of Ecological Research.

Authors: Brandon K Peoples; Stephen R Midway; Dana Sackett; Abigail Lynch; Patrick B Cooney
Journal: PLoS One Date: 2016-11-11 Impact factor: 3.240

6. The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles.

Authors: Heather Piwowar; Jason Priem; Vincent Larivière; Juan Pablo Alperin; Lisa Matthias; Bree Norlander; Ashley Farley; Jevin West; Stefanie Haustein
Journal: PeerJ Date: 2018-02-13 Impact factor: 2.984

7. Something for everyone.

Authors: Sarah Shailes
Journal: Elife Date: 2017-03-15 Impact factor: 8.140

8. Gender variations in citation distributions in medicine are very small and due to self-citation and journal prestige.

Authors: Jens Peter Andersen; Jesper Wiborg Schneider; Reshma Jagsi; Mathias Wullum Nielsen
Journal: Elife Date: 2019-07-15 Impact factor: 8.140

9. Using social media to promote academic research: Identifying the benefits of twitter for sharing academic work.

Authors: Samara Klar; Yanna Krupnikov; John Barry Ryan; Kathleen Searles; Yotam Shmargad
Journal: PLoS One Date: 2020-04-06 Impact factor: 3.240

10. How the scientific community reacts to newly submitted preprints: article downloads, Twitter mentions, and citations.

Authors: Xin Shuai; Alberto Pepe; Johan Bollen
Journal: PLoS One Date: 2012-11-01 Impact factor: 3.240

1 in total

1. Social Media: Flattening Hierarchies for Women and Black, Indigenous, People Of Color (BIPOC) to Enter the Room Where It Happens.

Authors: Boghuma K Titanji; Jacinda C Abdul-Mutakabbir; Briana Christophers; Laura Flores; Jasmine R Marcelin; Talia H Swartz
Journal: Clin Infect Dis Date: 2022-05-15 Impact factor: 20.999

1 in total