Jen-Mei Chang1, Hui Zeng2, Ruxu Han2, Ya-Mei Chang3, Ruchit Shah4, Carolyn M Salafia4,5,6, Craig Newschaffer7, Richard K Miller6,8, Philip Katzman6,8, Jack Moye9, Margaret Fallin10, Cheryl K Walker6,11, Lisa Croen6,12.
Abstract
BACKGROUND: Autism Spectrum Disorder (ASD) is one of the fastest-growing developmental disorders in the United States. It was hypothesized that variations in the placental chorionic surface vascular network (PCSVN) structure may reflect the overall effects of both genetically and environmentally regulated variations in branching morphogenesis within the conceptus and the fetus' vital organs. This paper provides sound evidence to support the study of ASD risk with PCSVN through a combination of feature-selection and classification algorithms.
Keywords: Arterial network; Autism spectrum disorder risk; Boruta algorithm; Linear discriminant analysis; Placenta; Placental chorionic surface vascular network (PCSVN); Principal component analysis; Random forest
Year: 2017 PMID: 29212472 PMCID: PMC5719902 DOI: 10.1186/s12911-017-0564-8
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 A flowchart of the research pipeline. The proposed work follows a three-stage process: preprocessing (left), feature selection (middle), and classification (right). The entire process is automated except for obtaining the color tracing
Fig. 2 The process of obtaining a feature vector for each placenta. a A digital photograph of the placental chorionic surface vascular network (PCSVN) from the NCS data set. b Traced PCSVN for the image in (a) following the tracing protocols in [14]. c The skeletonisation of the traced PCSVN image in (b), produced by a MATLAB program written in house by the research team. d Numerical values of PCSVN features computed by our MATLAB program for the image in (c). Each of the 290 placentas in our data set is associated with a list of values similar to those given in (d)
Fig. 3 Feature selection result. Importance scores (horizontal axis) for each of the arterial vascular features (vertical axis) returned by the Boruta algorithm. The feature with the largest importance score is considered the most relevant and is ranked highest
Fig. 4 Visual definitions for the five principal features. a Principal features 1 and 4: branch points and the distance from an end point to its nearest point on the boundary. b Principal feature 2: vessel thickness, which is the same as the diameter (d_i) of the i-th vessel tube. c Principal feature 3: tortuosity, which is the ratio of the arc length to the straight-line distance between the initial and terminal nodes of the vessel segment, i.e., tortuosity of the i-th vessel = c_i/d_i. d Principal feature 5: branch angle, which is given by the angle between the two line segments formed by connecting the initial node and the fourth pixel of each branch
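The tortuosity definition in Fig. 4c (arc length divided by the straight-line distance between a segment's end nodes) can be illustrated with a minimal Python sketch. This is not the authors' in-house MATLAB program; the function name `tortuosity` and the representation of a vessel segment as a list of (x, y) points are assumptions made for illustration only.

```python
import math

def tortuosity(points):
    """Arc length of a polyline divided by the straight-line (chord)
    distance between its first and last points.

    `points` is a hypothetical representation of one vessel segment:
    an ordered list of (x, y) coordinates along the skeleton.
    """
    if len(points) < 2:
        raise ValueError("a segment needs at least two points")
    # Arc length: sum of distances between consecutive points.
    arc = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    # Chord: distance between the initial and terminal nodes.
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        raise ValueError("initial and terminal nodes coincide")
    return arc / chord

# Toy segments (made-up coordinates, not data from the study).
straight = [(0, 0), (1, 0), (2, 0)]
bent = [(0, 0), (3, 4), (6, 0)]

print(tortuosity(straight))  # a straight segment has tortuosity 1.0
print(tortuosity(bent))      # arc length 10 over chord 6
```

Under this definition a perfectly straight segment scores 1, and the value grows as the vessel winds more between the same two end nodes.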
Fig. 5 Visible difference between high- and low-risk ASD groups in low dimensions. Box-and-whisker plots of the projection coefficients for the first five principal components of the EARLI (89 data points) and NCS (201 data points) cohorts. The differences between the two groups are apparent and consistent across all five PCs. For example, the mean of the first PC projection coefficients among the EARLI placentas is negative, while the mean of the first PC projection coefficients among the NCS placentas is positive
Fig. 6 Images with the lowest (left) and the highest (right) Principal Component (PC) projection coefficients for the first five PCs. a Principal feature 1: number of branch points. For example, the average number of branch points is 36.74 with a standard deviation of 15.66 in EARLI (left) and 48.48 with a standard deviation of 16.34 in NCS (right). b Principal feature 2: thickness. For example, the average mean thickness is 0.16 with a standard deviation of 0.03 in EARLI (left) and 0.13 with a standard deviation of 0.02 in NCS (right). c Principal feature 3: tortuosity. For example, the average standard deviation of tortuosity is 0.06 with a standard deviation of 0.03 in EARLI (left) and 0.08 with a standard deviation of 0.03 in NCS (right). d Principal feature 4: growth extension. For example, the average mean distance from end points to the nearest boundary point is 2.82 with a standard deviation of 0.48 in EARLI (left) and 2.96 with a standard deviation of 0.41 in NCS (right). e Principal feature 5: branching angle. For example, the average mean branching angle is 102.28 with a standard deviation of 2.87 in NCS (left) and 100.64 with a standard deviation of 3.51 in EARLI (right)
Fig. 7 A visualization of the Linear Discriminant Analysis (LDA) result. Each placenta in the data set is associated with a dot in the projected space. The vertical dashed line serves as a separation threshold. In the case of a perfect separation, all points on the top line would fall to the left of the threshold while all points on the bottom line would fall to the right of the threshold. The graph illustrates the cases that the classifier predicts more easily (True Negative and True Positive) and those it finds harder (False Positive and False Negative)
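The thresholding step described in Fig. 7 can be sketched in a few lines of Python: each one-dimensional projected score is compared against a threshold, and the outcome is counted as a true or false positive or negative. The sketch covers only this thresholding step, not the LDA projection itself. The scores, labels, threshold value, and the convention that label 1 lies to the right of the threshold are all hypothetical choices for illustration; the caption does not say which cohort sits on which side.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP/TN/FP/FN for 1-D projected scores.

    Assumed convention (hypothetical): a score strictly greater than
    `threshold` is predicted as class 1, otherwise as class 0.
    """
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for score, label in zip(scores, labels):
        predicted = 1 if score > threshold else 0
        if predicted == 1 and label == 1:
            counts["TP"] += 1
        elif predicted == 0 and label == 0:
            counts["TN"] += 1
        elif predicted == 1 and label == 0:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts

# Made-up projected scores and labels, not values from the study.
scores = [-2.0, -0.5, 0.3, 0.8, 1.5, -0.1]
labels = [0, 0, 0, 1, 1, 1]

result = confusion_counts(scores, labels, threshold=0.0)
print(result)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```

Points far from the threshold are the easy cases; the false positive (0.3) and false negative (-0.1) in the toy data both sit close to it.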
The first five principal components (PCs) of the data retain approximately 88% of the data variability
| Boruta ranking | Vascular features (variability captured) | PC1 (35.27%) | PC2 (22.57%) | PC3 (17.20%) | PC4 (7.79%) | PC5 (5.80%) |
|---|---|---|---|---|---|---|
| 1 | MeanThickness | −0.1582 | −0.4747 | 0.1035 | 0.0651 | −0.0089 |
| 2 | MeanTortuosity | 0.0002 | 0.0575 | 0.5347 | −0.0979 | 0.0013 |
| 3 | MurrayL1FitError | −0.256 | −0.3903 | 0.0438 | 0.0139 | 0.0397 |
| 4 | StdThickness | −0.1566 | −0.4762 | 0.0701 | −0.0046 | 0.0196 |
| 5 | StdDevTortuosity | 0.0029 | 0.0812 | 0.5912 | −0.0641 | 0.1449 |
| 6 | MaxTortuosity | 0.0948 | 0.0724 | 0.5459 | −0.0264 | 0.1709 |
| 7 | MeanAngle | −0.0611 | 0.0704 | 0.2028 | 0.2135 | −0.936 |
| 8 | NumEndPoints | 0.4251 | −0.0298 | −0.0132 | 0.0153 | −0.005 |
| 9 | ArcLength | 0.3773 | −0.1259 | −0.0035 | −0.0163 | 0.0116 |
| 10 | NumBranchPoints | 0.4254 | −0.0301 | −0.0125 | 0.0146 | −0.0038 |
| 11 | MurrayBranchesUsed | 0.4254 | −0.0301 | −0.0125 | 0.0146 | −0.0038 |
| 12 | Volume | 0.1444 | −0.4823 | 0.065 | 0.0502 | −0.0368 |
| 13 | NumGenerations | 0.3182 | −0.0237 | 0.014 | 0.2178 | −0.0619 |
| 14 | MeanDistEndPointToPerim | 0.0055 | −0.0323 | 0.0545 | 0.905 | 0.2124 |
| 15 | VesselToDiscPercent | 0.255 | −0.3502 | 0.0031 | −0.2561 | −0.1457 |
The absolute value of each feature's loading within a PC measures its contribution to that PC: the higher the value, the bigger the contribution. Specifically, NumEndPoints, NumBranchPoints, and MurrayBranchesUsed contributed the most to PC1; MeanThickness, StdThickness, and Volume contributed the most to PC2; MeanTortuosity, StdDevTortuosity, and MaxTortuosity contributed the most to PC3; MeanDistEndPointToPerim contributed the most to PC4; and MeanAngle contributed the most to PC5.
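The reading of the table described above (rank features by the absolute value of their loadings within each PC, and sum the per-PC variability) can be reproduced with a short Python sketch. The loadings and percentages are copied from the table; the names `LOADINGS`, `VARIABILITY`, and `top_contributors` are illustrative assumptions and are not part of the authors' pipeline.

```python
# Loadings copied from the table: feature -> (PC1, PC2, PC3, PC4, PC5).
LOADINGS = {
    "MeanThickness": (-0.1582, -0.4747, 0.1035, 0.0651, -0.0089),
    "MeanTortuosity": (0.0002, 0.0575, 0.5347, -0.0979, 0.0013),
    "MurrayL1FitError": (-0.256, -0.3903, 0.0438, 0.0139, 0.0397),
    "StdThickness": (-0.1566, -0.4762, 0.0701, -0.0046, 0.0196),
    "StdDevTortuosity": (0.0029, 0.0812, 0.5912, -0.0641, 0.1449),
    "MaxTortuosity": (0.0948, 0.0724, 0.5459, -0.0264, 0.1709),
    "MeanAngle": (-0.0611, 0.0704, 0.2028, 0.2135, -0.936),
    "NumEndPoints": (0.4251, -0.0298, -0.0132, 0.0153, -0.005),
    "ArcLength": (0.3773, -0.1259, -0.0035, -0.0163, 0.0116),
    "NumBranchPoints": (0.4254, -0.0301, -0.0125, 0.0146, -0.0038),
    "MurrayBranchesUsed": (0.4254, -0.0301, -0.0125, 0.0146, -0.0038),
    "Volume": (0.1444, -0.4823, 0.065, 0.0502, -0.0368),
    "NumGenerations": (0.3182, -0.0237, 0.014, 0.2178, -0.0619),
    "MeanDistEndPointToPerim": (0.0055, -0.0323, 0.0545, 0.905, 0.2124),
    "VesselToDiscPercent": (0.255, -0.3502, 0.0031, -0.2561, -0.1457),
}

# Percentage of variability captured by PC1..PC5 (table header).
VARIABILITY = (35.27, 22.57, 17.20, 7.79, 5.80)

def top_contributors(pc_index, k):
    """Return the k features with the largest absolute loading on one PC.

    `pc_index` is zero-based (0 is PC1).
    """
    ranked = sorted(
        LOADINGS,
        key=lambda name: abs(LOADINGS[name][pc_index]),
        reverse=True,
    )
    return ranked[:k]

# Total variability retained by the first five PCs.
total_variability = sum(VARIABILITY)
print(f"{total_variability:.2f}%")

for pc in range(5):
    print(f"PC{pc + 1}:", top_contributors(pc, 3))
```

The sum of the five percentages is 88.63, consistent with the "approximately 88%" stated in the table caption, and the top-ranked features per PC match the groupings listed above.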