Optimal prediction of the number of unseen species.

Alon Orlitsky, Ananda Theertha Suresh, Yihong Wu.

Abstract

Estimating the number of unseen species is an important problem in many scientific endeavors. Its most popular formulation, introduced by Fisher et al. [Fisher RA, Corbet AS, Williams CB (1943) J Animal Ecol 12(1):42-58], uses n samples to predict the number U of hitherto unseen species that would be observed if t·n new samples were collected. Of considerable interest is the largest ratio t between the number of new and existing samples for which U can be accurately predicted. In seminal works, Good and Toulmin [Good I, Toulmin G (1956) Biometrika 43(1-2):45-63] constructed an intriguing estimator that predicts U for all t ≤ 1. Subsequently, Efron and Thisted [Efron B, Thisted R (1976) Biometrika 63(3):435-447] proposed a modification that empirically predicts U even for some t > 1, but without provable guarantees. We derive a class of estimators that provably predict U all of the way up to t ∝ log n. We also show that this range is the best possible and that the estimator's mean-square error is near optimal for any t. Our approach yields a provable guarantee for the Efron-Thisted estimator and, in addition, a variant with stronger theoretical and experimental performance than existing methodologies on a variety of synthetic and real datasets. The estimators are simple, linear, computationally efficient, and scalable to massive datasets. Their performance guarantees hold uniformly for all distributions, and apply to all four standard sampling models commonly used across various scientific disciplines: multinomial, Poisson, hypergeometric, and Bernoulli product.
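
As a rough illustration of the kind of linear estimator the abstract refers to, the sketch below computes the classical Good-Toulmin estimate, U = -Σ_i (-t)^i Φ_i, where Φ_i is the number of species seen exactly i times in the n samples. It also shows one possible smoothed variant that damps the i-th term by a binomial tail probability. The smoothing rule, the parameter k, and all function names are assumptions made for this example; the abstract does not specify them, and this is not presented as the authors' exact construction.

```python
from collections import Counter
from math import comb


def prevalences(samples):
    """Return a mapping i -> Phi_i, the number of species seen exactly i times."""
    species_counts = Counter(samples)
    return Counter(species_counts.values())


def good_toulmin(phi, t):
    """Classical Good-Toulmin estimate of U: -sum_i (-t)^i * Phi_i."""
    return -sum(((-t) ** i) * count for i, count in phi.items())


def binomial_tail(k, q, i):
    """P(L >= i) for L ~ Binomial(k, q)."""
    return sum(comb(k, j) * q**j * (1 - q) ** (k - j) for j in range(i, k + 1))


def smoothed_good_toulmin(phi, t, k):
    """Hypothetical smoothed variant: weight term i by P(L >= i),
    with L ~ Binomial(k, 1/(1+t)). The choice of k is left to the caller
    (an assumption of this sketch, not taken from the abstract)."""
    q = 1.0 / (1.0 + t)
    return -sum(
        ((-t) ** i) * binomial_tail(k, q, i) * count for i, count in phi.items()
    )


# Toy data: species a..e observed 2, 3, 1, 4 and 1 times respectively.
phi = prevalences("aabbbcdddde")
estimate_plain = good_toulmin(phi, t=0.5)
estimate_smooth = smoothed_good_toulmin(phi, t=2.0, k=3)
```

For t ≤ 1 the alternating series is well behaved. For t > 1 the factor t^i grows geometrically, which is why some form of damping or truncation of the higher-order terms is needed; the binomial tail weights above are one way to do that.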

Keywords:  extrapolation model; nonparametric statistics; species estimation

Year:  2016        PMID: 27830649      PMCID: PMC5127330          DOI: 10.1073/pnas.1607774113

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


  8 in total

1.  Bacterial diversity within the human subgingival crevice.

Authors:  I Kroes; P W Lepp; D A Relman
Journal:  Proc Natl Acad Sci U S A       Date:  1999-12-07       Impact factor: 11.205

2.  Counting the uncountable: statistical approaches to estimating microbial diversity. (Review)

Authors:  J B Hughes; J J Hellmann; T H Ricketts; B J Bohannan
Journal:  Appl Environ Microbiol       Date:  2001-10       Impact factor: 4.792

3.  Molecular analysis of human forearm superficial skin bacterial biota.

Authors:  Zhan Gao; Chi-hong Tseng; Zhiheng Pei; Martin J Blaser
Journal:  Proc Natl Acad Sci U S A       Date:  2007-02-09       Impact factor: 11.205

4.  Estimating the number of unseen variants in the human genome.

Authors:  Iuliana Ionita-Laza; Christoph Lange; Nan M Laird
Journal:  Proc Natl Acad Sci U S A       Date:  2009-03-10       Impact factor: 11.205

5.  Shakespeare's New Poem: An Ode to Statistics: Two statisticians are using a powerful method to determine whether Shakespeare could have written the newly discovered poem that has been attributed to him.

Authors:  G Kolata
Journal:  Science       Date:  1986-01-24       Impact factor: 47.728

6.  Comprehensive assessment of T-cell receptor beta-chain diversity in alphabeta T cells.

Authors:  Harlan S Robins; Paulo V Campregher; Santosh K Srivastava; Abigail Wacher; Cameron J Turtle; Orsalem Kahsai; Stanley R Riddell; Edus H Warren; Christopher S Carlson
Journal:  Blood       Date:  2009-08-25       Impact factor: 22.113

7.  Bacterial diversity in human subgingival plaque.

Authors:  B J Paster; S K Boches; J L Galvin; R E Ericson; C N Lau; V A Levanos; A Sahasrabudhe; F E Dewhirst
Journal:  J Bacteriol       Date:  2001-06       Impact factor: 3.490

8.  Predicting the molecular complexity of sequencing libraries.

Authors:  Timothy Daley; Andrew D Smith
Journal:  Nat Methods       Date:  2013-02-24       Impact factor: 28.547

  8 in total

1.  Opportunities for improving cancer treatment using systems biology.

Authors:  Jason I Griffiths; Adam L Cohen; Veronica Jones; Ravi Salgia; Jeffrey T Chang; Andrea H Bild
Journal:  Curr Opin Syst Biol       Date:  2019-11-27

2.  An Engineered CRISPR-Cas9 Mouse Line for Simultaneous Readout of Lineage Histories and Gene Expression Profiles in Single Cells.

Authors:  Sarah Bowling; Duluxan Sritharan; Fernando G Osorio; Maximilian Nguyen; Priscilla Cheung; Alejo Rodriguez-Fraticelli; Sachin Patel; Wei-Chien Yuan; Yuko Fujiwara; Bin E Li; Stuart H Orkin; Sahand Hormoz; Fernando D Camargo
Journal:  Cell       Date:  2020-05-14       Impact factor: 41.582

3.  Diversity in biology: definitions, quantification and models.

Authors:  Song Xu; Lucas Böttcher; Tom Chou
Journal:  Phys Biol       Date:  2020-03-19       Impact factor: 2.583

4.  Genome-wide detection of DNA double-strand breaks by in-suspension BLISS.

Authors:  Britta A M Bouwman; Federico Agostini; Silvano Garnerone; Giuseppe Petrosino; Henrike J Gothe; Sergi Sayols; Andreas E Moor; Shalev Itzkovitz; Magda Bienko; Vassilis Roukos; Nicola Crosetto
Journal:  Nat Protoc       Date:  2020-11-02       Impact factor: 13.491

5.  On the Impossibility of Learning the Missing Mass.

Authors:  Elchanan Mossel; Mesrob I Ohannessian
Journal:  Entropy (Basel)       Date:  2019-01-02       Impact factor: 2.524

6.  Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets.

Authors:  Stefano Garlaschi; Anna Fochesato; Anna Tovo
Journal:  Entropy (Basel)       Date:  2020-09-26       Impact factor: 2.524

7.  BUTTERFLY: addressing the pooled amplification paradox with unique molecular identifiers in single-cell RNA-seq.

Authors:  Johan Gustafsson; Jonathan Robinson; Jens Nielsen; Lior Pachter
Journal:  Genome Biol       Date:  2021-06-08       Impact factor: 13.583

8.  Using somatic variant richness to mine signals from rare variants in the cancer genome.

Authors:  Saptarshi Chakraborty; Arshi Arora; Colin B Begg; Ronglai Shen
Journal:  Nat Commun       Date:  2019-12-03       Impact factor: 14.919

