Literature DB >> 25485095

FALCON: a software package for analysis of nestedness in bipartite networks.

Stephen J Beckett¹, Chris A Boulton¹, Hywel T P Williams¹.

Abstract

Nestedness is a statistical measure used to interpret bipartite interaction data in several ecological and evolutionary contexts, e.g. biogeography (species-site relationships) and species interactions (plant-pollinator and host-parasite networks). Multiple methods have been used to evaluate nestedness, which differ in how the metrics for nestedness are determined. Furthermore, several different null models have been used to calculate statistical significance of nestedness scores. The profusion of measures and null models, many of which give conflicting results, is problematic for comparison of nestedness across different studies. We developed the FALCON software package to allow easy and efficient comparison of nestedness scores and statistical significances for a given input network, using a selection of the more popular measures and null models from the current literature. FALCON currently includes six measures and five null models for nestedness in binary networks, and two measures and four null models for nestedness in weighted networks. The FALCON software is designed to be efficient and easy to use. FALCON code is offered in three languages (R, MATLAB, Octave) and is designed to be modular and extensible, enabling users to easily expand its functionality by adding further measures and null models. FALCON provides a robust methodology for comparing the strength and significance of nestedness in a given bipartite network using multiple measures and null models. It includes an "adaptive ensemble" method to reduce undersampling of the null distribution when calculating statistical significance. It can work with binary or weighted input networks. FALCON is a response to the proliferation of different nestedness measures and associated null models in the literature. It allows easy and efficient calculation of nestedness scores and statistical significances using different methods, enabling comparison of results from different studies and thereby supporting theoretical study of the causes and implications of nestedness in different biological contexts.

Entities: Chemical Disease Species

Year: 2014 PMID： 25485095 PMCID： PMC4244763 DOI： 10.12688/f1000research.4831.1

Source DB: PubMed Journal: F1000Res ISSN： 2046-1402

Introduction

Nestedness is a statistical property of systems where two kinds of entity interact, which can be represented as bipartite networks. Originally used as a metric for species-site distributions [1, 2], nestedness has recently gathered much attention as a metric for bipartite species interaction networks, e.g. plant-pollinator mutualisms [3, 4] and host-virus interactions [5– 7]. Various discussions have considered the sources of nestedness in such systems and its potential implications for ecological dynamics [4, 8– 13]. However, it is unclear how to systematically compare results for different ecological datasets. Furthermore, nestedness is not restricted to ecological datasets, but is a generic property of any bipartite network. Thus, there is a need for measures of nestedness that are context-independent and do not depend on any particular (ecological) interpretation. Multiple methods for measuring nestedness have been used in different studies, along with multiple approaches to calculating statistical significance of the measured values. This provides a large number of ways in which nestedness could be evaluated [14– 16]. Before theoretical investigations of the mechanisms of nestedness can be properly undertaken, robust measures and statistical tests for nestedness are required to allow comparison of results from different studies. Here we present FALCON – a free software package that allows the user to easily compute several measures of nestedness and associated statistical significances based on a selection of null models. FALCON stands for “Framework for Adaptive ensembLes for the Comparison Of Nestedness”. FALCON operates on any form of bipartite interaction data represented as a matrix of associations and is set up to be deliberately ‘blind’ to the source and interpretation of input data. FALCON is based on the assumption that nestedness is a general statistical property of matrices and therefore its measurement should be independent of context or interpretation. FALCON calculates nestedness as a statistical property of a matrix, by returning the nestedness score for the most-nested configuration of the input matrix. Since calculating statistical significance of nestedness scores can be computationally demanding, involving generation of a large ensemble of matrices from a null distribution, FALCON uses a novel “adaptive ensemble” method to improve efficiency by using the minimal ensemble size sufficient to give robust statistics. Several software packages for calculating nestedness already exist – including [1, 11, 17– 21], but these are subject to various factors which make the direct comparison of different nestedness measures and the statistical interpretation of returned values difficult to achieve. Several nestedness measures are handled by packages which deliver a single measure, making the comparison difficult. Some are specific to a particular operating system. Some do not make the source codes available for re-implementation, reducing confidence in their outputs and prevent future extensions. Two packages for the R statistical programming language, bipartite [19] and vegan [21], together contain functions for several nestedness measures and associated null models, as well as many other tools for analysis of bipartite ecological networks. However, these packages offer no obvious implementation of significance testing (the principal method for reporting results of nestedness analyses) and they also lack several nestedness measures which have been recently developed. FALCON is designed to address these deficiencies, enabling the calculation of nestedness and statistical significance by using a variety of measures and null models, with open source code provided for several platforms. The FALCON package is available for three commonly used numerical analysis platforms: MATLAB, Octave and R. MATLAB ( http://www.mathworks.co.uk/products/matlab/) is a commercial software platform, while Octave ( https://www.gnu.org/software/octave/) and R ( http://www.r-project.org/) are both freely available open source platforms. FALCON can be freely downloaded on Github ( http://github.com/sjbeckett/FALCON) or figshare [22] and all code is open and accessible. A guide to downloading, installing and running FALCON accompanies the code. This document describes the assumptions on which FALCON is based, how it calculates nestedness and statistical significance, gives details of the adaptive ensemble method used to improve computational efficiency and provides a case study to demonstrate its usage and outputs.

1 What is nestedness?

Nestedness is a statistical property of bipartite interaction data presented in matrix form. In a perfectly nested matrix, the entries in each successive row are a strict subset of those in the previous row, while the entries in each successive column are a strict subset of those in the previous column ( Figure 1). Interpretation of nestedness depends on context.

Figure 1.

Perfectly nested, weakly nested, and randomly connected matrices.

White squares indicate connections between two kinds of entity arranged in rows and columns.

Perfectly nested, weakly nested, and randomly connected matrices.

White squares indicate connections between two kinds of entity arranged in rows and columns. The concept of nestedness was first described in studies on how species distributions varied between sites [23– 25], and later defined quantitatively as measuring the ‘amount of order/disorder’ in matrices representing the presence/absence of species in island communities [1]. Used in this way, nestedness is calculated from a matrix of presence-absence data where rows are species and columns are sampling sites along some environmental or spatial gradient. A perfectly “nested” matrix (see Figure 1) would be achieved when the set of species present at each site along the gradient is a subset of the species present at the previous site. Since then, the concept of nestedness has been extended in various directions; see [26] for an historical overview of the nestedness concept. Nestedness has continued to be applied to spatial patterning (e.g. [27]) and has been linked with β-diversity [28], but has also been applied to study mutualistic or antagonistic species-species interactions [29, 30], species-time relationships for a single site [31], and several other types of bipartite networks [9, 10, 32– 35]. For pairwise interactions (e.g. plant-pollinator or host-parasite systems), nestedness has been interpreted as placing species along a gradient of generalism-specialism in the number of partners they interact with; in this context, perfect nestedness is achieved when species within each class are ordered such that the interaction set (set of partners) for each species is a strict subset of that of the next species, and the most generalised species of one class interact with the most specialised species of the other class. Nestedness is calculated from a biadjacency matrix representing pairwise interactions between two kinds of entity (one represented by rows, the other by columns). The order of rows and columns for a biadjacency matrix is arbitrary with respect to connectivity; rows and columns can be permuted without affecting the underlying topology of the interaction network. Any non-arbitrary ordering of rows and columns in the matrix representation necessitates the use of additional contextual information to specify which order rows and columns should take. While some datasets may suggest a “natural” ordering to rows and columns in the matrix representation of data (e.g. when one of the dimensions represents an environmental/spatial/temporal gradient), for many applications of nestedness there is no natural ordering (e.g. species interactions). As stated above, we consider that nestedness should be a context-free metric, so that it can be applied to data without requiring any supplemental information on row/column ordering. This assumption implies that the ordering of rows and columns should not affect the measurement of nestedness. While some nestedness measures are insensitive to row/column ordering, several of the most commonly used measures are highly sensitive to ordering, introducing indeterminacy to the quantification of nestedness when rows/columns are ordered arbitrarily. To avoid this indeterminacy and return a single robust nestedness score for a given input matrix, FALCON can sort the rows and columns such that nestedness (however calculated) is maximised. Since re-ordering rows/columns in a matrix representation does not alter the structural information (node adjacency) of the underlying data, this re-ordering is a reasonable approach and makes the measurement of nestedness more consistent.

2 Measures of nestedness in FALCON

Nestedness is most commonly calculated for binary data representing presence/absence of an interaction between two entities, but can also be calculated for weighted data that indicate the strength of the interaction. The methods used to calculate nestedness vary depending on whether binary or weighted interaction data are provided. The nestedness measures available in FALCON are shown and briefly described below and in Table 1; further details are given in Appendix A.

Table 1.

Nestedness measures available in FALCON.

Name	Shorthand	Binary/ Weighted	Brief description	Reference
Nestedness based on overlap and decreasing fill	NODF	Binary	Pairwise row and column comparisons	[ 36]
Manhattan distance	MD	Binary	Sum of row and column indexes of connections	[ 37]
Nestedness temperature calculator	NTC	Binary	Difference from an ‘isocline of perfect order’	[ 1] [ 21]
Johnson, Domínguez- García & Muñoz	JDM	Binary	Measure of dissassortivity using configuration model	[ 38]
Discrepancy	BR	Binary	Difference from a ‘maximally packed’ matrix	[ 2]
Weighted NODF	WNODF	Weighted	Weighted version of NODF	[ 20]
Spectral radius	SR	Both	Maximum real eigenvalue of adjacency matrix	[ 11]

The nestedness measures considered here are not trivial variations upon each other, but differ significantly in their derivations. However, some similarities can be drawn. Spectral radius (SR) [11] and the measure of Johnson, Domínguez-García, & Muñoz [38] (JDM) are invariant to the ordering of rows and columns in the network and are calculated using the adjacency matrix of the network. On the other hand, discrepancy (BR) [2], Manhattan distance (MD) [37] and nestedness based on overlap and decreasing fill (NODF) [36] are all sensitive to row/column ordering and are maximised when rows/columns are ranked by degree. The nestedness temperature calculator (NTC) [1, 21] involves sorting of rows and columns against the ‘isocline of perfect order’ (see Figure 5) such that it maximises connections above the isocline and minimises connections below the isocline. BR is similarly calculated relative to an idealised ‘maximally packed’ matrix. NODF is found through pairwise comparisons of overlap between subsequent rows and columns, whilst MD is found by assigning a weight to each connection as a sum of it’s row and column indexes. The measures also differ in how nestedness is scored; the degree of nestedness in a network increases with increasing measure score for JDM, NODF and SR, but with decreasing measure score for BR, MD and NTC.

Figure 5.

The NTC measure is based on an isocline of perfect order.

Here there are two ‘surprises’.

3 Comparison of nestedness scores

Nestedness is strongly sensitive to the size (number of rows and columns) and fill (number of non-zero entries) of the input matrix [17]. This is problematic in practical terms, since we often wish to compare nestedness of matrices that differ in these basic properties; in fact, cases where we compare empirically derived matrices with identical size and fill are an exception. Thus comparison of absolute values of nestedness metrics is not informative and may be misleading. To compare nestedness of matrices with differing size and fill, observed nestedness should always be interpreted in the context of a null distribution of matrices with similar properties. Measuring observed nestedness relative to expected nestedness derived from a null distribution of similar matrices allows determination of both effect size (e.g. as a z-score, which is commonly used to compare different nestedness schemes [26, 39]) and statistical significance (e.g. as a p-value giving the expected frequency of the observed score in the null distribution). This approach necessitates choice of a suitable null model and generation of a distribution of random matrices drawn from it. In the present context, a null model is a method for creating a distribution of matrices that conserves some properties of the input matrix while varying other properties at random [40]. We continue the “context-free” approach in our treatment of null models; to allow comparison of nestedness across different scenarios, a good null model should not make assumptions about the mechanisms by which data were generated, but treat the matrix as an independent data structure. However, to be comparable to the input matrix, null matrices must conserve some key matrix properties (such as size and fill) on which nestedness depends. The null models available in FALCON are given in Table 2; further detail is given in Appendix B. FALCON includes some of the more popular null models from the literature, alongside some additional null models that we feel can be useful. Null models vary in whether the original data is binary or quantitative, and in which properties of the original input matrix are preserved.

Table 2.

Null models in FALCON.

Name	Description	Binary/ weighted	Conserved features	Reference
SS	Shuffles positions randomly	Binary	Shape, Fill	[ 11]
FF	Permutations of structure with same node degrees	Binary	Shape, Fill, Degree	[ 41]
CC	Some structure is preserved, rest is shuffled	Binary	Shape, Fill
DD	Determined probabilistically by node degree	Binary	Probabilistic Degree	[ 3]
EE	Determined probabilistically by fill	Binary	Probabilistic Fill
Binary Shuffle	Order of weighted links is swapped	Weighted	Binary positions, weights	[ 11]
CRT	Random weights where row totals conserved	Weighted	Binary positions, row totals
CCT	Random weights where column totals conserved	Weighted	Binary positions, column totals
RCTA	Average of conserve row totals and column totals	Weighted	Binary positions

4 How FALCON works

4.1 Inputs and outputs

FALCON requires several inputs: an input network in the form of a bipartite matrix whether binary or quantitative nestedness should be investigated (quantitative matrices can be analysed using binary measures) whether to sort rows and columns to maximise nestedness score which nestedness measures should be used which null models nestedness should be tested under whether the ensemble of null models should be created with a fixed number or adaptively chosen whether or not to plot the distributions of nestedness scores Output is returned to the user in the form of: the most nested configuration of the input matrix the nestedness measure(s) of the input matrix the expected value of nestedness under the null model(s) (as the mean measure of matrices created in the ensemble) the number of ensemble members used to calculate significance in each null model the statistical significance of the nestedness of the input matrix against each null model as a p-value the standard deviation and sample z-scores of the measure in the ensemble as well as other properties.

4.2 What FALCON does

FALCON follows the process shown in Figure 2. First, it sorts the user input matrix into a maximally nested configuration and removes any empty rows/columns before finding the nestedness of this matrix using the users chosen measures. Then, FALCON goes through each of the user specified null models one by one, creating an ensemble of null matrices according to the rules of each null model. Each null matrix is then sorted and measured by each of the chosen nestedness measures. Thus, for each null model, nestedness measures are calculated for each of the null matrices in a single null ensemble, enabling direct comparison of results. The size of the null ensemble is determined by the input choice of using either the fixed or adaptive ensemble size (see Section 4.5). Statistics are computed from the measures found in the null ensemble (and the direction in which that nestedness measure is calculated), before the next null model ensemble is instantiated. Once all null models have been computed, the results are returned to the user.

Figure 2.

FALCON algorithmic procedure.

4.3 Direction of increasing nestedness

For different nestedness measures, increasing scores can represent either increasing or decreasing nestedness as discussed in Section 2. FALCON initially determines whether a higher measure score is related to greater nestedness (or vice versa) in the chosen measure by comparing the scores returned for a highly nested network (see Figure 3A) and a highly non-nested network (a weighted checkerboard configuration; Figure 3B), for which the fill (number of non-zero elements) and element sums are equal. The direction of increasing nestedness for a given measure is used during calculation of statistical significance. This method of determining direction each time the algorithm runs is included to allow easy extensibility; if a new measure is added, FALCON will automatically determine which direction indicates increasing nestedness.

Figure 3.

( A) Weighted nested matrix. ( B) Weighted non-nested matrix (checkerboard). Both ( A) and ( B) have 55 non-zero elements that sum to 220.

4.4 Initial sort

For efficiency, FALCON is set up to initially sort the input matrix by row and column degrees for calculation of BR, MD and NODF, retains this sorted configuration for calculation of JDM and SR, and subsequently re-sorts for NTC in order to find the maximal nestedness of a binary matrix. For quantitative data, FALCON uses the same methods as for binary interactions, but also utilises weight data to break symmetry when two rows (columns) have the same degree; in this case, the row (column) which has greater values for most overlapping elements is ranked highest. Where two or more rows (columns) share the same degree and most overlapping elements, the rows (columns) are ranked according to the total sum of row (column) elements. This sorting does not affect the underlying topology or the relationships in the data. FALCON also allows the user to decide if any sorting is performed, enabling the “context free” assumption to be relaxed (e.g. for investigation of gradient-based nestedness [39]).

4.5 Size of null ensemble

FALCON uses a bootstrap method to calculate the statistical significance of a given nestedness score, since the true null distributions of the test statistics are not known. The ensemble size used for this calculation can either be fixed or calculated adaptively by FALCON to improve computational efficiency and reduce undersampling effects. Note that the strongest significance that can be assigned is where N is the ensemble size. The number of null matrices used to make up the ensemble is fixed by the user. This method is effective providing that the ensemble is large enough to have statistical power; the larger the ensemble, the more power the test has and the closer the answer will be to the p-value for the (unknown) true null distribution. However, it is not obvious how large the ensemble needs to be; in the literature, amongst others [30], use 1,000 null models in their ensembles, whilst [12] use 10,000, and [6] use 100,000. A large number of different null matrix configurations are possible for a given input matrix and we may wish to avoid undersampling [42]; however, at the same time very large ensembles can make the calculation of significance computationally intractable. FALCON includes a mechanism for adaptive determination of ensemble size. This is intended to ensure robust statistics are achieved, avoiding concerns about undersampling or oversampling [42], while minimising computational load. The adaptive method works by creating two ensembles in parallel using the same null model. Starting with a minimum ensemble size of 500 in each group, the ensembles are expanded until they show similar statistical properties. This condition is met when the null hypothesis (both ensembles come from the same distribution) of a Mann-Whitney U-test cannot be rejected at 10% significance. When this occurs, it suggests each group represents a good sample of the underlying distribution, and the two groups are combined to form a single null ensemble used to calculate final statistics. The expansion of the size of the ensemble has an upper limit of 100,000 members in case the null hypothesis is always rejected. The adaptive ensemble methods balances statistical precision with computational efficiency; we conservatively use 1,000 as a minimum final ensemble size such that a p-value as low as 0.001 can be assigned.

4.6 Output statistics

The p-value is the probability that a matrix drawn from the null distribution will be more nested than the input matrix. Low values ( p → 0) indicate that the input matrix is highly nested relative to the null distribution; commonly a threshold of p ≤ 0.05 or p ≤ 0.01 is used to denote a statistically significant level of nestedness. Here p is calculated by counting the frequency of matrices in the null ensemble that are more nested than the input matrix; for cases where no member of the null ensemble is more nested than the input matrix we conservatively assign where N is the ensemble size. The normalised temperature is inspired by the τ-Temperature [37]. It describes the relationship between the nestedness measure found for the input matrix and the expected nestedness measure derived from the null model ensemble. It is described as: where < Measure > denotes the expected value. In simple terms, the normalised temperature indicates whether the input matrix is more or less nested than the expectation for a null distribution of similar matrices. Where the measure gives increasing scores with increasing nestedness, T > 1 indicates greater-than-expected nestedness. Where the measure gives decreasing scores with increasing nestedness, T < 1 indicates greater-than-expected nestedness. The mean average of the set of nestedness measures found for each of the ensemble members is returned. The standard deviation ( σ) of the set of nestedness measures found for each of the ensemble members is returned. The z-score, or standard score, is calculated as the difference between the nestedness measure and its expected value divided by the standard deviation of the sample: It is a measure of the number of standard deviations the nestedness measure of the input matrix is above the expected value. Hence, the way it should be interpreted, as with the normalised temperature, depends on whether nestedness increases with increasing measure score.

5 FALCON usage - case study

To demonstrate FALCON we analyse nestedness analysis in a bipartite network representing the hashtags used by a sample of Twitter users. Data were collected using the Twitter API ( https://dev.twitter.com/docs/api) by searching for tweets including the hashtag “#IPCC” in the time period 21st September 2013 – 5th October 2013. A list of all hashtags used by all users found in the search dataset was then used to create a binary bipartite adjacency matrix for users and hashtags. This was then sampled to create a smaller matrix used for this case study by including each row/column with probability of 1.1 and removing any empty rows/columns. The resulting matrix was stored in a comma-separated file called ’IPCC_HTuse_10_10_1_53x27.csv’. The box below shows the command sequence used to perform a binary nestedness analysis using FALCON in MATLAB. The first line reads in the “.csv” datafile, which includes row and column headers. The second line extracts the adjacency matrix from the imported data. The third line runs FALCON, using two binary nestedness measures (NODF and SR) and two null models (CC and FF, i.e. nulls 2 and 3) using the adaptive solver and displaying histogram plots. The fourth line plots the input matrix in its most nested configuration, as determined by FALCON. The nested configuration of the matrix and output histograms from significance testing are shown in Figure 4, whilst Table 3 shows an example output from the significance testing. Further examples for use of FALCON in R are given in supporting information accompanying the software.

Figure 4.

Example output from FALCON.

( A) the nested arrangement of the UserHashtag data. ( B) Distribution of NODF scores found for an ensemble of FF null models generated for the UserHashtag network ( C) distribution of spectral radius scores found from the same null matrix ensemble. The asterix marks the nestedness score of the input network.

Table 3.

Example output from FALCON showing sample statistics for the UserHashtag network using the operations in the box above.

Nestedness statistics were computed for the FF and CC null models using the NODF and spectral radius measures.

Null model	FF		CC
Nestedness measure	NODF	SR	NODF	SR
Measure	44.4339	7.5552	44.4339	7.5552
Ensemble Size	1000	1000	1000	1000
Mean	45.3207	7.5346	29.9048	6.4755
Standard Deviation	0.8357	0.0336	3.5600	0.2136
z-score	-1.0612	0.6123	4.0812	5.0550
p-value	0.8490	0.2480	<0.001	<0.001
Normalised Temperature	0.9804	1.0027	1.4858	1.1667

Example output from FALCON.

Example output from FALCON showing sample statistics for the UserHashtag network using the operations in the box above.

Nestedness statistics were computed for the FF and CC null models using the NODF and spectral radius measures.

6 Summary

In this paper we have presented FALCON, a software tool for reliable and efficient calculation of nestedness (and associated effect size and statistical significance) based on a selection of popular nestedness measures and null models used in the literature. FALCON treats nestedness purely as a statistical property of a bipartite matrix and removes any form of interpretation or contextual information from the analysis. This enables FALCON to be used to compare nestedness across a wide variety of application areas, noting that the concept of nestedness has already spread from its origin in island biogeography to include species-species interactions and other scenarios, and is likely to find further applications in other domains. The contribution of FALCON is to enable easy cross-comparison of observed nestedness using different nestedness measures and null models. We hope that this functionality will allow greater methodological uniformity and comparability of studies of nestedness. We are in the process of performing a large comparison study of nestedness metrics using FALCON (Beckett and Williams., in preparation). Uniformity of measurement and comparability of empirical results is an important preliminary step that must be achieved to enable understanding of the mechanistic basis and ecological (and otherwise) implications of nestedness. We hope that FALCON will be of use to other researchers and help illuminate this intriguing property of bipartite networks in many natural systems.

7 Software availability

Software access

The FALCON software package is available from Github: http://github.com/sjbeckett/FALCON

Source code as at the time of publication

https://github.com/F1000Research/FALCON

Archived source as at the time of publication

http://dx.doi.org/10.6084/m9.figshare.999117 [22] This is a great, easy-to-use, and comprehensive piece of software. I am eager to begin to use it myself and to send it to colleagues who are reluctant to run the other nestedness software. The paper is also very well-written making this field more accessible. As the authors state, many forms of biological data are organized in bipartite networks and show nested pattern. With such a software platform and clear manuscript, I hope more similarities across biological systems will be recognized and appreciated, perhaps even promoting collaborations and new theory. My one suggestion is for the authors to provide a more thorough description of what null models should be used with which data structures, sampling procedures, biological processes, ecs. This would aid the cross-system comparisons and users from finding spurious patterns. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard. The paper introduces a new freeware package designed to collate existing measures of nestedness in biological samples. This is a useful capability because several different methods exist for quantifying parameters and testing significance. The paper describes a novel and efficient method for sampling the null distributions that calibrate the significance of nestedness scores. The package should interest anyone working with biological networks of bipartite data, and is conveniently available for open-source platforms R and Octave as well as MATLAB. I have no major concerns about the conception, execution or description of the paper. It appears to me well motivated, technically sound and written with considerable clarity. The case study provides a useful illustration, though is under-developed as I will detail below and perhaps represents a missed opportunity. I found the program itself easy to use and really useful for exploring alternative measures of nestedness The manuscript title answers to the content of the paper. The Abstract provides a clear summary, though its impact would benefit from removal of repetition. An abstract of 150-words in a single paragraph could state concisely what the need is, and how the package addresses it. The Introduction could usefully explain the meaning of nestedness (currently in the following section) before briefly reviewing alternative methods of quantification. That review should attempt some categorisation of existing methods in one paragraph, and examples of their applications, before describing their various limitations in a following paragraph. Then say how FALCON addresses these issues. The paper provides a clear description of the concept of nestedness and its interpretation, which is not an easy task. It would help to define ‘bipartite’ in the context of the Fig.-1 matrices, as the two dimensions that make up the matrix columns and rows. Table 1 gives shorthand codes for the various nestedness measures, though only some of these correspond to the codes for calling the measures in the program (Table 2 of the instruction guide). It would help to have a closer correspondence of shorthand with code of measures, and with the subtitles in Appendix A. Likewise for null models and Appendix B. The Section-5 case study should take the opportunity to show readers how they can find interesting patterns in datasets by calculating nestedness, using sensible measures and calibrations. For this particular example, it would help to clarify the context by indicating that ‘IPCC’ stands for either ‘Intergovernmental Panel on Climate Change’ or ‘Independent Police Complaints Commission’. It’s not clear to me whether this hashtag was chosen purposefully to sample such different populations of Twitter users, and if so how one might interpret nestedness in the aggregate population. Any case study serves its purpose only if it is followed through all the way from motivation through to extraction of qualitative meaning from the quantitative analysis. Thus the text needs to interpret Figure 4 qualitatively, explaining the choice of measure and null model for these data, and resulting nestedness score such that the reader can see how nestedness might reveal interesting pattern in the data. It would be good to have the case-study dataset made available for users to try out for themselves. In the case study, the textual explanation of the command box says that the program uses three null models, but as far as I can see it calls only two: FF and CC. Then, confusingly the output graphs of null distributions show only the first one: CREATEBINNULL2. Actually, the second one looks more interesting according to Table 3. These steps from input to output need more explanation. Particularly for Fig. 4 and Table 3, interpretation is not helped by the lack of correspondence between names used for shorthand (e.g., ‘SR’, ‘FF’) and code (e.g., ‘SPECTRAL_RADIUS’, ‘CREATEBINNULL2’). If correspondence is not possible in the program itself, it would help at least to have clearer titles for graphs and tables. When using the package, I thought that the instruction guide would benefit from some tidying up. For example, it would help to tabulate all possible alternative values for each of the seven options. Null models should be presented after measures, since a null only has meaning as the baseline against which to calibrate a measure. For the R version of the program, an example script is provided to illustrate capability (examplescript.R). Its ‘TEST 2’ loads a .csv file containing a matrix suitable for nestedness analysis, but omits the specification ‘header = FALSE’. As a result, it reads the first line of the file as a header, which was surely not the intention. Specific points that merit attention: Last paragraph of Section 1, say: “FALCON can sort rows and columns so as to maximise nestedness regardless of the method used for its calculation.” When referring to specific published texts, remove the superscript, to say for example (Section 4.5): “in the literature, ref 30 amongst others use 1000 null models in their ensembles, whilst ref 12 use…” etc. Figure 4 component parts need labels (A), (B), (C). Label the axes of the matrix, at least in the legend if not in the figure itself: presumably columns for users and rows for hashtags. Table 3, avoid vertical lines in tables. The table would be easier to read if columns and rows were switched around. Left-justify text, and use a decimal tab to align numbers in a column. Avoid the phrase ‘As stated above’ – reorganise to state only once. Avoid ending a sentence in a verb. The verb imparts meaning to the sentence by describing the action, so giving it early facilitates the reader’s comprehension. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

20 in total

1. The nested assembly of plant-animal mutualistic networks.

Authors: Jordi Bascompte; Pedro Jordano; Carlos J Melián; Jens M Olesen
Journal: Proc Natl Acad Sci U S A Date: 2003-07-24 Impact factor: 11.205

2. Disentangling nestedness from models of ecological complexity.

Authors: Alex James; Jonathan W Pitchford; Michael J Plank
Journal: Nature Date: 2012-07-12 Impact factor: 49.962

3. Null model analysis of species nestedness patterns.

Authors: Werner Ulrich; Nicholas J Gotelli
Journal: Ecology Date: 2007-07 Impact factor: 5.499

4. Statistical structure of host-phage interactions.

Authors: Cesar O Flores; Justin R Meyer; Sergi Valverde; Lauren Farr; Joshua S Weitz
Journal: Proc Natl Acad Sci U S A Date: 2011-06-27 Impact factor: 11.205

5. Geothermal activity helps life survive glacial cycles.

Authors: Ceridwen I Fraser; Aleks Terauds; John Smellie; Peter Convey; Steven L Chown
Journal: Proc Natl Acad Sci U S A Date: 2014-03-10 Impact factor: 11.205

6. Coevolutionary diversification creates nested-modular structure in phage-bacteria interaction networks.

Authors: Stephen J Beckett; Hywel T P Williams
Journal: Interface Focus Date: 2013-12-06 Impact factor: 3.906

7. Strong contributors to network persistence are the most vulnerable to extinction.

Authors: Serguei Saavedra; Daniel B Stouffer; Brian Uzzi; Jordi Bascompte
Journal: Nature Date: 2011-09-14 Impact factor: 49.962

8. The dynamics of nestedness predicts the evolution of industrial ecosystems.

Authors: Sebastián Bustos; Charles Gomez; Ricardo Hausmann; César A Hidalgo
Journal: PLoS One Date: 2012-11-19 Impact factor: 3.240

9. Factors determining nestedness in complex networks.

Authors: Samuel Jonhson; Virginia Domínguez-García; Miguel A Muñoz
Journal: PLoS One Date: 2013-09-19 Impact factor: 3.240

10. Multi-scale structure and geographic drivers of cross-infection within marine bacteria and phages.

Authors: Cesar O Flores; Sergi Valverde; Joshua S Weitz
Journal: ISME J Date: 2012-11-22 Impact factor: 10.302

14 in total

1. The structure of temperate phage-bacteria infection networks changes with the phylogenetic distance of the host bacteria.

Authors: Carolin C Wendling; Henry Goehlich; Olivia Roth
Journal: Biol Lett Date: 2018-11-14 Impact factor: 3.703

2. Nestedness in assemblages of helminth parasites of bats: a function of geography, environment, or host nestedness?

Authors: Elizabeth M Warburton; Luther Van Der Mescht; Irina S Khokhlova; Boris R Krasnov; Maarten J Vonhof
Journal: Parasitol Res Date: 2018-03-28 Impact factor: 2.289

3. Lysogeny is prevalent and widely distributed in the murine gut microbiota.

Authors: Min-Soo Kim; Jin-Woo Bae
Journal: ISME J Date: 2018-02-07 Impact factor: 10.302

4. Analysis of whole genome sequencing for the Escherichia coli O157:H7 typing phages.

Authors: Lauren A Cowley; Stephen J Beckett; Margo Chase-Topping; Neil Perry; Tim J Dallman; David L Gally; Claire Jenkins
Journal: BMC Genomics Date: 2015-04-08 Impact factor: 3.969

5. Structure of the epiphyte community in a tropical montane forest in SW China.

Authors: Mingxu Zhao; Nalaka Geekiyanage; Jianchu Xu; Myo Myo Khin; Dian Ridwan Nurdiana; Ekananda Paudel; Rhett Daniel Harrison
Journal: PLoS One Date: 2015-04-09 Impact factor: 3.240

6. Detecting the small island effect and nestedness of herpetofauna of the West Indies.

Authors: De Gao; Gad Perry
Journal: Ecol Evol Date: 2016-07-05 Impact factor: 2.912

7. Tripartite species interaction: eukaryotic hosts suffer more from phage susceptible than from phage resistant bacteria.

Authors: Carolin C Wendling; Agnes Piecyk; Dominik Refardt; Cynthia Chibani; Robert Hertel; Heiko Liesegang; Boyke Bunk; Jörg Overmann; Olivia Roth
Journal: BMC Evol Biol Date: 2017-04-11 Impact factor: 3.260

8. FALCON: a software package for analysis of nestedness in bipartite networks.

Authors: Stephen J Beckett; Chris A Boulton; Hywel T P Williams
Journal: F1000Res Date: 2014-08-06

9. High richness of ectomycorrhizal fungi and low host specificity in a coastal sand dune ecosystem revealed by network analysis.

Authors: Alice Roy-Bolduc; Etienne Laliberté; Mohamed Hijri
Journal: Ecol Evol Date: 2015-12-29 Impact factor: 2.912

10. Zooming into plant-flower visitor networks: an individual trait-based approach.

Authors: Beatriz Rumeu; Danny J Sheath; Joseph E Hawes; Thomas C Ings
Journal: PeerJ Date: 2018-09-17 Impact factor: 2.984