Lorena Endara, Hong Cui, J Gordon Burleigh.
Abstract
PREMISE OF THE STUDY: Phenotypic data sets are necessary to elucidate the genealogy of life, but assembling phenotypic data for taxa across the tree of life can be technically challenging and prohibitively time consuming. We describe a semi-automated protocol to facilitate and expedite the assembly of phenotypic character matrices of plants from formal taxonomic descriptions. This pipeline uses new natural language processing (NLP) techniques and a glossary of over 9000 botanical terms.
Keywords: morphological matrices; natural language processing; phenotypic traits; taxonomic descriptions
Year: 2018 PMID: 29732265 PMCID: PMC5895189 DOI: 10.1002/aps3.1035
Source DB: PubMed Journal: Appl Plant Sci ISSN: 2168-0450 Impact factor: 1.936
Figure 1. Software and steps of the natural language processing pipeline used to extract phenotypic traits from taxonomic descriptions. (A) Explorer of Taxon Concepts, (B) MatrixConverter. * indicates steps where human input is required.
Figure 2. Explorer of Taxon Concepts tools and steps used to extract taxonomic information from taxonomic descriptions and generate a phenotypic matrix. (A) Text Capture tool, (B) Matrix Generation tool. * indicates steps where human input is required.
Taxa of the Araucariaceae included in the natural language processing analysis
| Genus | No. of characters in raw matrix | No. of characters in final matrix |
|---|---|---|
| | 61 | |
| | 27 | 26 |
| | 52 | 33 |
| | 62 | 27 |
| | 31 | 33 |
| | 58 | 36 |
| | 29 | 27 |
| | 30 | 30 |
| | 29 | 31 |
| | 31 | 31 |
| | 38 | 33 |
| | 29 | 34 |
| | 55 | 31 |
| | 51 | 30 |
| | 29 | 31 |
| | 37 | 33 |
| | 26 | 29 |
| | 39 | 34 |
| | 53 | 31 |
| | 53 | 41 |
| | 32 | |
| | 40 | 33 |
| | 43 | 34 |
| | 34 | 37 |
| | 56 | 27 |
| | 33 | 35 |
| | 31 | 35 |
| | 32 | 35 |
| | 27 | 35 |
| | 26 | 36 |
| | 24 | 32 |
| | 35 | 39 |
| | 28 | 37 |
| | 27 | 40 |
| | 31 | 38 |
| | 29 | 34 |
| | 38 | 39 |
| | 20 | 28 |
| | 36 | 36 |
| | 25 | 32 |
| | 27 | 32 |
a Superscript numbers indicate the source of the taxonomic description: 1 = Farjon, 2010; 2 = Earle, 2006; 3 = Jones et al., 1995.
b Prior to the inclusion of characters extracted from higher‐level descriptions.
| Structures or entities | Natural language processing pipeline | Phenotypic data set (Escapa and Catalano, |
|---|---|---|
| Whole organism | 1. Reproduction of organism | Habit 0: monoecious 1: dioecious |
| | 2. Presence of sap when punctured | |
| Bud | 3. Prominence of bud | |
| Bark | 4. Presence of cushion‐shaped scars after branches fall | |
| | 5. Presence of spongy nodules on bark | |
| | 6. Coating of bark (4) 0: resinous | |
| | 7. Pubescence_or_Relief of bark (7) 0: rough 1: smooth | |
| | 8. Architecture_or_Pubescence of bark (12) 0: scaly (coarsely scaly, finely scaly, thinly scaly) 1: flaky (coarsely flaky_slightly flaky) | |
| | 9. Condition of bark (25) 0: exfoliating | |
| | 10. Type of exfoliation of bark (21) 0: in large thick flakes 1: in plates 2: in patches 3: in scales (irregular scales, in fine scales) 4: in strips (in thin strips) 5: in circular bands | |
| | 11. Coloration of bark (36) 0: brown (dark‐brown, externally dark‐brown, gray‐brown, grey‐brown, light‐brown, orangebrown, purplish grey‐brown, purplish‐brown, redbrown) 1: grey (ash‐grey, blue‐grey, gray, light gray, red‐gray) 2: black (light‐brown, purplish‐black) 3: white (externally gray‐white, nearly white, white, whitish) 4: red 5: tan 6: green | |
| | 12. Coloration of inner bark (6) 0: red (internally reddish_reddish) 1: brown (internally reddish‐brown, redbrown) 2: tan 3: pink | |
| Branching | 13. Branching pattern (4) 0: u‐like 1: v‐like | |
| Resin | 14. Coloration of resin (4) 0: white 1: yellow (pale‐yellow_yellowish) | |
| Branch | 15. Orientation of branch (14) 0: horizontal (irregularly horizontal) 1: ascending 2: spreading 3: pendent | |
| Branchlet | 16. Diameter of branchlet (9) 0: 0.0–10.0 1: 10.0–20.0 2: 20.0–30.0 3: 30.0–55.0 | |
| Bud | 17. Shape of bud (6) 0: globular 1: round (rounded with scales) | |
| Leaf | 18. Reflectance of leaf (4) 0: glossy (below shiny, shiny) 1: dull | |
| | 19. Texture of leaf (6) 0: coriaceous | |
| | 20. Orientation of leaf (10) 0: spreading 1: incurved (inward) | |
| | 21. Patterns of abaxial side of leaf (12) 0: glaucous (below glaucous, slightly glaucous) 1: non‐glaucous (below non‐glaucous, underneath non‐glaucous) | |
| | 22. Coloration of leaf (16) 0: dark green 1: bright‐green (light green, light‐green, pale‐yellow‐green, yellowish‐green) | |
| | 23. Leaf arrangement | Phyllotaxis of mature leaves 0: Helical 1: Whorl 2: Opposite to subopposite |
| | 24. Arrangement of leaf 2 (14) 0: imbricate (closely imbricate) 1: loosely imbricate | |
| | 25. Shape of leaf 1 (34) 0: lanceolate (narrowly lanceolate, oblong‐lanceolate, oval‐lanceolate, ovate‐lanceolate) 1: elliptic (linear‐elliptic, long‐oval, oblong‐elliptic, oval, ovate‐elliptic) 2: lanceolate 3: circular (round, ovate‐round) 4: ovate (round broadly ovate, triangular‐ovate) 5: lenticular 6: obovate (elliptic‐obovate) 7: triangular | |
| | 26. Shape of leaf 2 (34) 0: laminar blade 1: needlelike 2: scale‐like | |
| | 27. Shape of leaf 3 (14) 0: keeled (dorsally keeled) 1: flattened (somewhat flattened) 2: non‐flattened 3: awl‐shaped | |
| | 28. Shape of leaf apex (21) 0: acute (bluntly acute, sharply acute) 1: obtuse 2: attenuate (acuminate‐attenuate) | Bract/scale fusion at ovuliferous cone 0: acute 1: obtuse |
| | 29. Length of mature leaf (cm) (8) 0: 0.0–2.5 1: 2.5–5.0 2: 5.0–10.0 3: 10.0–20.0 | Mature leaf length (continuous) |
| | 30. Length of juvenile leaf (cm) (8) 0: 0.0–2.5 1: 2.5–5.0 2: 5.0–10.0 3: 10.0–20.0 4: 20.0–25.0 | |
| | 31. Width of mature leaf (cm) (5) 0: 0.0–1.0 1: 1.0–5.0 2: 5.0–15.0 3: 15.0–20.0 | Mature leaf width (continuous) |
| | 32. Width of juvenile leaf (cm) (16) 0: 0.0–2.0 1: 2.0–4.0 2: 4.0–6.0 3: 6.0–15.0 | |
| Male cone | 33. Coloration of male cone (6) 0: brown (redbrown, reddish‐brown, ultimately becoming dark‐brown, yellowish‐brown) 1: bluish‐white 2: reddish | |
| | 34. Architecture_or_Arrangement_or_Growth_Form of male cone (6) 0: solitary 1: in groups | |
| | 35. Position of male cone (39) 0: axillary 1: terminal | Pollen cone disposition 0: axillary 1: terminal |
| | 36. Fragility_or_Size of peduncle male cone (6) 0: robust (stout) | |
| | 37. Architecture of peduncle of male cone (12) 0: sessile (almost sessile, short peduncle, shortly pedunculate) 1: peduncle (on peduncle) | |
| | 38. Length of male cone (cm) (38) 0: 0.0–5.0 1: 5.0–10.0 2: 10.0–15.0 3: 15.0–26.0 | Pollen cone length (continuous) |
| | 39. Shape of male cone (30) 0: cylindrical (broadly cylindrical, cylindric, oblong‐cylindric, ovoid‐cylindrical) 1: globose (globular) 2: pyriform 3: ovate | Pollen cone morphology 0: spherical/globose 1: ellipsoidal/subglobose 2: cylindrical 3: irregular |
| | 40. Width of male cone (cm) (35) 0: 0.0–1.0 1: 1.0–2.5 2: 2.5–5.0 3: 5.0–15.0 | Pollen cone width (continuous) |
| Microsporophyll | 41. Arrangement of microsporophyll (7) 0: imbricate (strongly imbricate) 1: spirally | Microsporophyll phyllotaxy 0: decussate 1: helical 2: whorled |
| | 42. Shape of microsporophyll (13) 0: triangular (broadly triangular) 1: rhombic 2: oval 3: semicircular | |
| | 43. Shape of microsporophyll apex (4) 0: umbonate 1: acute 2: obtuse | |
| Midrib | 44. Prominence of midrib (13) 0: prominent (visible) 1: faint (not conspicous) | Midrib 0: evident from external view 1: not evident from external view |
| Female cone | 45. Coloration of female cone (9) 0: green (glaucous‐green, greenish, olive‐green) 1: brown (purplish brown, chestnut‐brown, dark‐brown, when ripe brown) | |
| | 46. Length of female cone (cm) (29) 0: 0.0–10.0 1: 10.0–20.0 2: 20.0–35.0 | Ovuliferous cone length (continuous) |
| | 47. Width of female cone (cm) (29) 0: 0.0–10.0 1: 10.0–20.0 2: 20.0–25.0 | Ovuliferous cone width (continuous) |
| | 48. Shape of female cone (39) 0: globose (globular, subglobose) 1: elliptic (broadly ellipsoidal, globose‐ovoid, oval, ovoid) 2: obovate 3: lanceolate | Ovuliferous cone morphology 0: spherical/globose 1: ellipsoidal/subglobose 2: cylindrical 3: irregular |
| | 49. Fusion of bracts and scales of female cone (4) 0: yes | Bract/scale fusion at ovuliferous cone 0: absent 1: present |
| | 50. Length of female cone scale (cm) (4) 0: 0.0–3.0 1: 3.0–4.0 | |
| Ovate scales | 51. Shape of ovate scales (39) 0: flattened (somewhat flattened) | |
| | 52. Shape of ovate scales 2 (39) 0: broadly ovate 1: thin | |
| Scales | 53. Arrangement of scale (7) 0: imbricate 1: densely imbricate | |
| | 54. Shape of scale (18) 0: round (broadly rounded) 1: triangular (nearly triangular, roughly triangular) 2: angular 3: ovate (ovoid) 4: lanceolate 5: quadrangular | |
| | 55. Seed cone scales apical appendage (18) 0: yes | |
| | 56. Shape of apex of scale (5) 0: well rounded 1: obtuse 2: acuminate | |
| Cone bract | 57. Shape of cone bract (4) 0: oblong‐elliptic 1: oblong‐ovate 2: acuminate 3: triangular | |
| | 58. Length of cone bract (mm) (12) 0: 0.0–10.0 1: 10.0–20.0 | |
| Bract | 59. Orientation of bract (9) 0: recurved 1: erect 2: incurved 3: reflexed | |
| Nut | 60. Size_or_Width of nut (5) 0: broad 1: narrow (relatively narrow) | |
| | 61. Shape of nut (7) 0: oblong 1: ovate 2: triangular 3: somewhat rectangular | |
| Scale | 62. Width of scale (cm) (5) 0: 0.0–3.0 1: 3.0–6.0 | |
| Seeds | 63. Seeds becoming detached 0: yes 1: no scales | Seed abscission 0: absent 1: present |
| | 64. Shape of seed (12) 0: ovoid (ellipsoid_oval, oblong‐subovoid) 1: obovoid 2: cordate (narrowly cordate) 3: rounded 4: triangular | |
| | 65. Width of seed (mm) (12) 0: 0.0–20.0 1: 20.0–50.0 2: 50.0–90.0 | Seed width (continuous) |
| | 66. Length of seed (cm) (26) 0: 0.0–1.0 1: 1.0–4.0 2: 4.0–10.0 3: 10.0–20.0 | Seed length (continuous) |
| | 67. Wings on seeds (39) 0: protruding wing on one side and small protrusion on the other 1: wingless 2: two wings 3: circumferentially winged | Integumentary seed wings 0: absent 1: present; Integumentary seed wing symmetry 0: 1 1: 2; Integumentary seed wing symmetry 0: asymmetric 1: symmetric |
| | 68. Length of wing of seed (mm) (5) 0: 0.0–10.0 1: 10.0–20.0 2: 20.0–30.0 | |
| | 69. Shape of wing of seed (11) 0: truncated 1: obovoid 2: rounded (broadly rounded) 3: ovate 4: ovate (broadly ovate) 5: triangular 6: rectangular | |
| Cotyledons | 70. Quantity of cotyledon (15) 0: 2 1: 4 | Number of cotyledons 0: 2 1: 4 2: cotyledon tube |
| | 71. Germination of cotyledon (19) 0: epigeal (reportedly epigeal) 1: hypogeal | Germination 0: epigeal 1: cryptogeal |
*Characters extracted from generic descriptions that were manually added to the matrix.
Expressions included in parentheses were coded under the same character state because they were considered synonymous.
Terms or expressions are presented in the format in which the pipeline extracted them from the source literature. An underscore (e.g., “ellipsoid_oval”) signifies “to” or “or” (i.e., “ellipsoid to oval” or “ellipsoid or oval”).
Character name (no. of taxa with data); code: character state.
Equivalent character(s); code: character state.
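
The cell conventions described in these footnotes (character name, number of taxa with data in parentheses, numbered state codes, synonyms in parentheses, and underscores for "to" or "or") are regular enough to parse mechanically. The following Python snippet is a minimal sketch of such a parser, not part of the published pipeline. The names `State`, `Character`, `expand_alternatives`, `parse_state`, and `parse_character` are hypothetical and were introduced only for this illustration. It assumes each cell is a single line of text laid out as in the table above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class State:
    code: int
    label: str
    synonyms: List[str] = field(default_factory=list)


@dataclass
class Character:
    number: int
    name: str
    n_taxa: Optional[int]  # None when the cell gives no "(n)" count
    states: List[State]


# "12. Name (6) 0: ..." where the "(6)" count and the state list are optional.
_HEADER = re.compile(r"^\s*(\d+)\.\s+(.*?)(?:\s+\((\d+)\))?(?:\s+(0:.*))?$")
# A state code such as "0:" at the start of the list or after whitespace.
_CODE = re.compile(r"(?:^|\s)(\d+):\s*")
# "label (synonym, synonym)".
_SYNONYMS = re.compile(r"^(.*?)\s*\((.*)\)$")


def expand_alternatives(term: str) -> List[str]:
    """Split an underscore-joined term ("to"/"or") into its alternatives."""
    return [part.strip() for part in term.split("_") if part.strip()]


def parse_state(code: int, text: str) -> State:
    """Separate a state label from its parenthesised synonyms."""
    text = text.strip()
    match = _SYNONYMS.match(text)
    if match is None:
        return State(code, text)
    label, inner = match.groups()
    synonyms = [alt for term in inner.split(",") for alt in expand_alternatives(term)]
    return State(code, label, synonyms)


def parse_character(cell: str) -> Character:
    """Parse one character cell from the pipeline column of the table."""
    match = _HEADER.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognised character cell: {cell!r}")
    number, name, n_taxa, rest = match.groups()
    states: List[State] = []
    if rest:
        # re.split with a capturing group yields ['', code, text, code, text, ...]
        parts = _CODE.split(rest)
        for code, text in zip(parts[1::2], parts[2::2]):
            states.append(parse_state(int(code), text))
    return Character(int(number), name, int(n_taxa) if n_taxa else None, states)


# Character 12 as it appears in the table.
example = parse_character(
    "12. Coloration of inner bark (6) 0: red (internally reddish_reddish) "
    "1: brown (internally reddish‐brown, redbrown) 2: tan 3: pink"
)
for state in example.states:
    print(state.code, state.label, state.synonyms)
```

Because an underscore can mean either "to" or "or", the sketch keeps both alternatives as separate synonyms and does not try to decide which reading applies; that decision would still need human review.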