Classification and Identification of Plant Fibrous Material with Different Species Using near Infrared Technique-A New Way to Approach Determining Biomass Properties Accurately within Different Species.

Wei Jiang, Chengfeng Zhou, Guangting Han, Brian Via, Tammy Swain, Zhaofei Fan, Shaoyang Liu.

Abstract

Plant fibrous material is a valuable resource for the textile and other industries. When several kinds of plant fibrous material are used in one process, they normally need to be identified and characterized in advance. Identification is easy when the materials are in raw condition. However, most of the materials are semi-finished products that have been ground, rotted or pre-hydrolyzed. Classifying such samples, which come from different species, with high accuracy is a big challenge. In this research, both qualitative and quantitative analysis methods were used to classify samples of six species covering softwood, hardwood, bast, and aquatic plants. Soft Independent Modeling of Class Analogy (SIMCA) and partial least squares (PLS) were used. The algorithm for classifying samples of different species with PLS was developed independently in this research. The results show that the six species can be successfully classified using the SIMCA and PLS methods, and that the two methods give similar results. The identification rates of kenaf, ramie and pine are 100%, and the identification rates of lotus, eucalyptus and tallow are higher than 94%. Spectra loadings also help to pick the best wavenumber ranges for constructing the NIR model, the inter material distance shows how close two species are, and the scores graph helps to choose the number of principal components during model construction.

Keywords:  accurate; classification; fibrous material; identification; near infrared; quantitative analysis

Year:  2017        PMID: 28105037      PMCID: PMC5215078          DOI: 10.3389/fpls.2016.02000

Source DB:  PubMed          Journal:  Front Plant Sci        ISSN: 1664-462X            Impact factor:   5.753


Introduction

Plant fibrous material is one of the most valuable materials because of its renewability, abundance and wide application (Cheng, 2009). It can be used in textile (Costa et al., 2013), paper (Hubbell and Ragauskas, 2010), food (Muangrat et al., 2010), medical (Pomin and Mourão, 2008), composite (Messing and Oppermann, 1979), biofuel (Guazzotti et al., 2003), and other areas. In each area the use of plant fibrous material is not limited to one species. Several species are normally used in one production process to ensure sufficient resources and product yield. However, different species of biomass have different properties. Therefore, identifying plant fibrous material and determining its properties prior to processing is of great significance for industrial utilization, because it helps ensure the quality of the final product. It is easy to identify different plant fibrous materials when they are in raw condition, because each has a characteristic color, shape and structure. However, most of the materials before processing are semi-finished products that have been ground, rotted or pre-hydrolyzed (Zheng et al., 2001; Cheng, 2009). Under these conditions, materials from different species can hardly be told apart. Traditionally, they are all treated as raw material, and wet chemistry methods are used to characterize their chemical composition as guidance for the subsequent procedure. However, wet chemistry is known to be time consuming, polluting and procedurally complex, and is not encouraged for the future (Jiang et al., 2010). Even though classification/identification methods for plant fibrous materials have not been studied widely, near infrared (NIR) spectroscopy has in recent years been found to be a rapid quantitative determination method for plant fibrous material (Kelley et al., 2004; Jiang et al., 2014; Zhou et al., 2015).
However, most NIR studies focus on one species or several similar species (Yeh et al., 2004; Cozzolino et al., 2006; Jin and Chen, 2007; Xu et al., 2015). The limited number of studies that built models across multiple species all had high prediction errors (Table 1) (Ono et al., 2003; Kelley et al., 2004; Yeh et al., 2004; Jin and Chen, 2007; Yao et al., 2010). This indicates that NIR is a good tool for rapidly evaluating biomass properties either over a broad range with high prediction error or over a narrow range with higher accuracy. An NIR modeling method that combines a broad range of species with high prediction accuracy still needs further study.
Table 1

A comparison of NIR model prediction of lignin between different species.

| Author and year | Sample | Range of lignin content (%) | R2 | RMSEP (%) | RPD |
|---|---|---|---|---|---|
| Jiang et al., 2014 | Pine | 5.45–28.59 | 0.99 | 0.61 | 4.34 |
| Yao et al., 2010 | Acacia spp. | 17.9–24.9 | 0.94 | 0.53 | 3.01 |
| Jin and Chen, 2007 | Rice straw | 7.2–12.8 | 0.86 | 2.1 | 0.76 |
| Kelley et al., 2004 | Agricultural fibers | 0.2–35.2 | 0.85 | 5.5 | 1.61 |
| Yeh et al., 2004 | Pinus taeda | 8–42 | 0.99 | 1.05 | N/A |
| Ono et al., 2003 | Forest floor | 5.6–48.1 | 0.91 | 5 | 2.1 |
Some researchers have found that NIR has the potential to classify/identify samples from different species, although this research has mostly focused on food science (Barbin et al., 2012; Chen et al., 2012; Zhang et al., 2014). It is believed that high classification accuracy is much easier to achieve than high accuracy in quantitative analysis. If the classification model reaches or approaches 100% accuracy, an unknown sample's properties can be analyzed with a two-step prediction method: first identify the species of the unknown sample, and then quantify the sample using the prediction model constructed for that species. Therefore, an NIR method for classifying/identifying plant fibrous materials is essential and worth studying. It serves not only to classify unknown samples for pretreatment, but also as a key prerequisite for highly precise quantitative analysis. This research aimed to construct an accurate NIR classification model for six different species that had been pre-ground. Soft Independent Modeling of Class Analogy (SIMCA) and partial least squares (PLS) were each used to build a model.
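The two-step prediction idea can be illustrated with a minimal Python sketch. It is not the authors' software: the function name `two_step_predict`, the toy classifier and the toy lignin model are hypothetical stand-ins, used only to show how a species-specific quantitative model is chosen after classification.

```python
from typing import Callable, Dict, Optional, Sequence

Spectrum = Sequence[float]


def two_step_predict(
    spectrum: Spectrum,
    classify: Callable[[Spectrum], Optional[str]],
    species_models: Dict[str, Callable[[Spectrum], float]],
) -> Optional[float]:
    """Step 1: identify the species. Step 2: apply that species' own model."""
    species = classify(spectrum)
    if species is None or species not in species_models:
        # Unidentified sample: no species-specific model can be applied.
        return None
    return species_models[species](spectrum)


# Hypothetical stand-ins; real models would be fitted on NIR spectra.
def toy_classifier(spectrum: Spectrum) -> Optional[str]:
    return "Pine" if spectrum[0] > 0.5 else None


toy_models = {"Pine": lambda s: 10.0 + 20.0 * s[0]}

print(two_step_predict([0.8, 0.2], toy_classifier, toy_models))
print(two_step_predict([0.1, 0.2], toy_classifier, toy_models))
```

Returning `None` for an unidentified sample is a design choice of this sketch: it keeps a sample that fails classification from being quantified with the wrong species model.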

Materials and methods

Sample preparation

Six species of biomass were used in this research. Southern pine (25 samples) and Tallow (24 samples) were harvested in Alabama, USA. Eucalyptus samples (50 samples) were shipped from South Africa. Kenaf (13 samples), Ramie (10 samples) and Lotus (17 samples) were collected from Xinjiang Province, Hunan Province and Shandong Province, respectively, in China. All the samples were ground to 40-mesh powder and then air dried under ambient conditions. In this research, 20 southern pine samples, 20 Tallow samples, 35 Eucalyptus samples, 10 Kenaf samples, 8 Ramie samples and 14 Lotus samples were used for constructing the model. The remaining samples were used to verify model accuracy. The six species belong to three groups: wood (Pine is a softwood; Eucalyptus and Tallow are hardwoods), bast (Ramie and Kenaf), and aquatic plants (Lotus). These three groups, with their six species, cover most of the bio-based material used in the world, so classifying them successfully is of practical importance.
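As a quick check of the sample split described above, the short Python sketch below subtracts the model-construction counts from the totals. The variable names are illustrative; the counts are those stated in the text.

```python
# Sample counts as stated in the text.
total = {"Pine": 25, "Tallow": 24, "Eucalyptus": 50,
         "Kenaf": 13, "Ramie": 10, "Lotus": 17}
model_set = {"Pine": 20, "Tallow": 20, "Eucalyptus": 35,
             "Kenaf": 10, "Ramie": 8, "Lotus": 14}

# The remaining samples of each species are held out for verification.
verification_set = {sp: total[sp] - model_set[sp] for sp in total}

for sp, n in verification_set.items():
    print(f"{sp}: {model_set[sp]} for the model, {n} for verification")
print("verification samples in total:", sum(verification_set.values()))
```

The held-out total is 32, which is consistent with the 32 rows of the identification result in Table 4.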

Near infrared spectra collection

The NIR spectra were collected using a PerkinElmer Spectrum 400 FT-IR/FT-NIR spectrometer. Biomass powders were analyzed and the reflectance spectra were collected. Each spectrum covers a range of 10,000–4000 cm−1 with a spectral resolution of 4 cm−1 and is an average of 32 scans.

Classification method

The classification models were built with two different methods: Soft Independent Modeling of Class Analogy (SIMCA) (Gemperline et al., 1989) and partial least squares (PLS). Prior to modeling, the spectra were pretreated using multiplicative scatter correction (MSC) coupled with first and second derivatives computed with the Savitzky-Golay approach. The pretreatment significantly reduces spectral noise arising from sample color, uneven particle size and the instrument. SIMCA is a statistical method for supervised classification of data, in which the samples of each species are modeled by principal component (PC) analysis. It was used to classify thermally modified wood in a previous study (Bachle et al., 2012). PLS is traditionally a quantitative analysis method. In this study, we set up rules that allow PLS to be applied to classification. As described in Table 2, the samples from different species were assigned different values (1, 2, 3…n), and a PLS model was then constructed on these values. If the predicted value of a sample fell within ±0.5 of one of the assigned values, the sample was identified as the corresponding species.
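The pretreatment described above can be sketched in plain Python. This is an illustration under stated assumptions, not the software used in the study: MSC is implemented in its common form (each spectrum is regressed on the mean spectrum), and the Savitzky-Golay first derivative uses a 5-point window with a quadratic polynomial, because the window size and polynomial order are not given in the text.

```python
from typing import List, Sequence


def msc(spectra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scatter correction: regress each spectrum on the mean spectrum,
    then remove the fitted offset and slope."""
    n = len(spectra[0])
    ref = [sum(s[i] for s in spectra) / len(spectra) for i in range(n)]
    ref_mean = sum(ref) / n
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n
        slope = sum((r - ref_mean) * (x - s_mean)
                    for r, x in zip(ref, s)) / ref_var
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def savgol_first_derivative(y: Sequence[float], spacing: float = 1.0) -> List[float]:
    """Savitzky-Golay first derivative, 5-point window, quadratic fit
    (assumed settings). Only interior points are returned."""
    coeffs = (-2.0, -1.0, 0.0, 1.0, 2.0)  # standard 5-point quadratic weights
    return [
        sum(c * y[i + k - 2] for k, c in enumerate(coeffs)) / (10.0 * spacing)
        for i in range(2, len(y) - 2)
    ]


# Toy spectra: the second is a scaled and shifted copy of the first,
# which is the kind of scatter effect MSC is meant to remove.
raw = [[1, 2, 3, 4, 5, 6], [3, 5, 7, 9, 11, 13]]
corrected = msc(raw)
print(corrected)
print(savgol_first_derivative(corrected[0]))
```

After correction both toy spectra collapse onto the mean spectrum, so the scale and offset differences between them no longer carry into the derivative.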
Table 2

Algorithm for classifying samples of different species using PLS.

| Sample | Sample 1 | Sample 2 | Sample 3 | Sample n |
|---|---|---|---|---|
| Sample size | N1 | N2 | N3 | Nn |
| Assigned value | 1 | 2 | 3 | n |
| Classification value | 0.5–1.5 | 1.51–2.5 | 2.51–3.5 | (n − 0.5)–(n + 0.5) |
| Prediction value | A1–AN1 | B1–BN2 | C1–CN3 | Z1–ZNN |

Recognition no.: Nrg = the number of samples of a species whose prediction value lies inside that species' classification value.
Recognition rate: Nrg/Nx × 100% (x = 1, 2, 3,…, n)
Rejection no.: Nrj = the number of samples of the other species whose prediction value lies outside that classification value.
Rejection rate: Nrj/(N1 + N2 + N3 + … + Nn − Nx) × 100%
In this research, the six species were assigned the following values: 1: Tallow, 2: Eucalyptus, 3: Pine, 4: Kenaf, 5: Ramie, 6: Lotus (roughly ordered by cellulose content, from low to high).
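The rules of Table 2 can be written as a short Python sketch. The function names and the toy predictions are illustrative and not from the study; the class windows follow Table 2 (0.5–1.5 for class 1, then a window of ±0.5 around each assigned value), and the species codes follow the assignment given above.

```python
from typing import List, Optional, Sequence, Tuple

SPECIES = {1: "Tallow", 2: "Eucalyptus", 3: "Pine",
           4: "Kenaf", 5: "Ramie", 6: "Lotus"}


def assign_class(predicted: float, n_classes: int = 6) -> Optional[int]:
    """Return the class whose window contains the PLS prediction, or None."""
    for k in range(1, n_classes + 1):
        # Class 1 includes its lower edge (0.5); later windows start just
        # above the previous upper edge, mirroring 1.51-2.5 in Table 2.
        lower_ok = predicted >= 0.5 if k == 1 else predicted > k - 0.5
        if lower_ok and predicted <= k + 0.5:
            return k
    return None


def recognition_and_rejection(
    true_labels: Sequence[int],
    predictions: Sequence[float],
    target: int,
    n_classes: int = 6,
) -> Tuple[float, float]:
    """Recognition rate (Nrg/Nx) and rejection rate for one target class."""
    assigned: List[Optional[int]] = [assign_class(p, n_classes) for p in predictions]
    own = [a for t, a in zip(true_labels, assigned) if t == target]
    others = [a for t, a in zip(true_labels, assigned) if t != target]
    n_rg = sum(a == target for a in own)      # own samples inside the window
    n_rj = sum(a != target for a in others)   # other samples outside the window
    return 100.0 * n_rg / len(own), 100.0 * n_rj / len(others)


# Toy predictions (not study data): one Tallow sample drifts into
# the Eucalyptus window.
labels = [1, 1, 2, 2, 3]
preds = [0.9, 1.52, 1.8, 2.2, 3.1]
print([SPECIES.get(assign_class(p)) for p in preds])
print(recognition_and_rejection(labels, preds, target=1))
print(recognition_and_rejection(labels, preds, target=2))
```

In the toy data, a Tallow sample predicted at 1.52 lowers both the recognition rate of Tallow and the rejection rate of Eucalyptus, because it is counted once as a missed Tallow sample and once as a foreign sample inside the Eucalyptus window.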

Results

NIR spectra of all samples

Reviewing the NIR spectra of the six species in Figure 1 shows that the species separate clearly into two groups. The wood samples (Eucalyptus, Tallow and Pine) have similar spectra, while Lotus, Kenaf, and Ramie share a similar pattern, especially in the wavenumber range of 7500–6000 cm−1. This indicates that the wood and non-wood samples can be easily separated.
Figure 1

Raw spectra (left) and First derivative spectra (right) of 6 species samples.


SIMCA classification

An optimized classification model was successfully constructed using the SIMCA method. The model classifies Kenaf, Lotus, Ramie, Pine, and Eucalyptus perfectly (Table 3): each shows 100% recognition and rejection rates. Tallow has a 100% recognition rate but a 94% rejection rate, which means the model may assign some samples of other species to Tallow. The identification results (Table 4) show that most of the verification samples were identified as the correct species, including all Tallow samples. Only one Lotus sample (Lotus 3) was not identified as its own species and was reported as "Other". As described in the previous section, Lotus is an aquatic plant, which differs from the wood and bast samples; moreover, the Lotus sample size is small. Only 14 Lotus samples were used for model construction and three for identification, which may explain why not all Lotus samples were identified. In future work, adding more samples for model construction could help improve the accuracy for Lotus.
Table 3

Classification performance report using SIMCA method.

| Material | Kenaf | Lotus | Ramie | Pine | Eucalyptus | Tallow |
|---|---|---|---|---|---|---|
| Recognition rate (%) | 100 (10/10) | 100 (13/13) | 100 (8/8) | 100 (20/20) | 100 (35/35) | 100 (20/20) |
| Rejection rate (%) | 100 (96/96) | 100 (93/93) | 100 (98/98) | 100 (86/86) | 100 (71/71) | 94 (81/86) |
Table 4

Identification result of SIMCA model.

| No. | Sample ID | Specified material | Identified material | Result | Specified material total distance ratio | Specified material distance ratio limit |
|---|---|---|---|---|---|---|
| 1 | Kenaf 1 | Kenaf | Kenaf | Passed | 0.5521 | 1.0000 |
| 2 | Kenaf 2 | Kenaf | Kenaf | Passed | 0.5163 | 1.0000 |
| 3 | Kenaf 3 | Kenaf | Kenaf | Passed | 0.9399 | 1.0000 |
| 4 | Lotus 1 | Lotus | Lotus | Passed | 0.6424 | 1.0000 |
| 5 | Lotus 2 | Lotus | Lotus | Passed | 0.8578 | 1.0000 |
| 6 | Lotus 3 | Lotus | Other | Failed | 2.1082 | 1.0000 |
| 7 | Ramie 1 | Ramie | Ramie | Passed | 0.6166 | 1.0000 |
| 8 | Ramie 2 | Ramie | Ramie | Passed | 0.7800 | 1.0000 |
| 9 | Pine 1 | Pine | Pine | Passed | 0.7980 | 1.0000 |
| 10 | Pine 2 | Pine | Pine | Passed | 0.8076 | 1.0000 |
| 11 | Pine 3 | Pine | Pine | Passed | 0.7657 | 1.0000 |
| 12 | Pine 4 | Pine | Pine | Passed | 0.8862 | 1.0000 |
| 13 | Pine 5 | Pine | Pine | Passed | 0.8500 | 1.0000 |
| 14 | Tallow 1 | Tallow | Tallow | Passed | 0.7747 | 1.0000 |
| 15 | Tallow 2 | Tallow | Tallow | Passed | 0.9458 | 1.0000 |
| 16 | Tallow 3 | Tallow | Tallow | Passed | 0.9630 | 1.0000 |
| 17 | Tallow 4 | Tallow | Tallow | Passed | 0.8836 | 1.0000 |
| 18 | Eucalyptus 1 | Eucalyptus | Eucalyptus | Passed | 0.6895 | 1.0000 |
| 19 | Eucalyptus 2 | Eucalyptus | Eucalyptus | Passed | 0.8127 | 1.0000 |
| 20 | Eucalyptus 3 | Eucalyptus | Eucalyptus | Passed | 0.8375 | 1.0000 |
| 21 | Eucalyptus 4 | Eucalyptus | Eucalyptus | Passed | 0.8184 | 1.0000 |
| 22 | Eucalyptus 5 | Eucalyptus | Eucalyptus | Passed | 0.7195 | 1.0000 |
| 23 | Eucalyptus 6 | Eucalyptus | Eucalyptus | Passed | 0.8553 | 1.0000 |
| 24 | Eucalyptus 7 | Eucalyptus | Eucalyptus | Passed | 0.8795 | 1.0000 |
| 25 | Eucalyptus 8 | Eucalyptus | Eucalyptus | Passed | 0.7072 | 1.0000 |
| 26 | Eucalyptus 9 | Eucalyptus | Eucalyptus | Passed | 0.7713 | 1.0000 |
| 27 | Eucalyptus 10 | Eucalyptus | Eucalyptus | Passed | 0.8578 | 1.0000 |
| 28 | Eucalyptus 11 | Eucalyptus | Eucalyptus | Passed | 0.9224 | 1.0000 |
| 29 | Eucalyptus 12 | Eucalyptus | Eucalyptus | Passed | 0.8840 | 1.0000 |
| 30 | Eucalyptus 13 | Eucalyptus | Eucalyptus | Passed | 0.7980 | 1.0000 |
| 31 | Eucalyptus 14 | Eucalyptus | Eucalyptus | Passed | 0.6793 | 1.0000 |
| 32 | Eucalyptus 15 | Eucalyptus | Eucalyptus | Passed | 0.9218 | 1.0000 |

PLS classification

Another classification model was successfully constructed using the PLS method with optimized parameters. The cross validation report (R2 = 98.49) shows that the species correlate strongly with the values assigned in the previous section. The classification results were calculated following the method in Table 2. The results (Table 5 and Figure 2) closely match those of the SIMCA model: Pine, Kenaf, Ramie and Lotus are classified perfectly, while the Tallow and Eucalyptus data overlap slightly.
Table 5

Classification results using PLS (cross validation).

| Sample | Tallow | Eucalyptus | Pine | Kenaf | Ramie | Lotus |
|---|---|---|---|---|---|---|
| Sample no. | 20 | 35 | 20 | 10 | 8 | 13 |
| Classification value | 0.5–1.5 | 1.51–2.5 | 2.51–3.5 | 3.51–4.5 | 4.51–5.5 | 5.51–6.5 |
| Prediction value | 0.70–1.52 | 1.62–2.23 | 2.81–3.18 | 3.99–4.26 | 4.60–5.25 | 5.64–6.24 |
| Recognition no. | 19 | 35 | 20 | 10 | 8 | 13 |
| Recognition rate | 95% | 100% | 100% | 100% | 100% | 100% |
| Rejection no. | 86/86 | 70/71 | 86/86 | 96/96 | 98/98 | 93/93 |
| Rejection rate | 100% | 98.6% | 100% | 100% | 100% | 100% |
Figure 2

Cross validation results using PLS.


Discussion

Wavenumber range selection for improving classification precision

This section explains how the optimized wavenumber ranges were chosen. Spectra loading plots are generated by the PLS method and show the most important information used in constructing the model. Figure 3 shows the spectra loading plots of PC1–4. Wavenumbers higher than 9000 cm−1 contain barely any useful information. The best wavenumber ranges were 7500–4000 cm−1 for PC1 and 7800–4000 cm−1 for PC2, PC3, and PC4. The loading plots of PC2 and PC3 also suggest that 9000–7800 cm−1 may contain helpful information. Based on these results, candidate ranges running down to 4000 cm−1 from an upper limit of either 7500 cm−1 or a value between 7800 and 9000 cm−1 were considered for constructing the model. The optimized wavenumber ranges were found to be 7500–4000 cm−1 for the SIMCA method and 8500–4000 cm−1 for the PLS method. Figures 4 and 5 confirm this optimization: all classification and identification performances improved significantly when the optimized wavenumber ranges were used.
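Restricting a spectrum to one of these ranges can be sketched as follows in Python. The sketch assumes one data point every 4 cm−1 from 10,000 down to 4000 cm−1; the text gives a spectral resolution of 4 cm−1 but not the exact point spacing, so the point counts are illustrative only. The function name `select_range` is hypothetical.

```python
from typing import List, Sequence, Tuple


def select_range(
    wavenumbers: Sequence[float],
    spectrum: Sequence[float],
    high: float,
    low: float,
) -> List[Tuple[float, float]]:
    """Keep the (wavenumber, value) pairs with low <= wavenumber <= high."""
    return [(w, a) for w, a in zip(wavenumbers, spectrum) if low <= w <= high]


# Assumed axis: 10,000 -> 4000 cm^-1 in steps of 4 cm^-1 (1501 points).
axis = list(range(10000, 3999, -4))
dummy_spectrum = [0.0] * len(axis)  # placeholder values, not measured data

simca_window = select_range(axis, dummy_spectrum, high=7500, low=4000)
pls_window = select_range(axis, dummy_spectrum, high=8500, low=4000)
print(len(axis), len(simca_window), len(pls_window))
```

Under the assumed spacing, the SIMCA window keeps a little under 60% of the points and the PLS window about 75%, so the models are fitted without the region above 9000 cm−1 that the loading plots show to be uninformative.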
Figure 3

Spectra loading plots of PC1–4 using PLS.

Figure 4

Classification results using different wavenumber ranges for SIMCA (left) and PLS (right) model.

Figure 5

Identification results using different wavenumber range for SIMCA model.


Relationship between species on classification

The Eucalyptus and Tallow samples were not perfectly classified in the results above. This section explains why and how to separate them better. Table 6 gives the inter material distance (IMD) between species from the SIMCA method. The IMD reflects the relationship between species: the more closely related two species are, the smaller the IMD, and the more they differ, the larger the IMD. The IMDs between the wood species (Eucalyptus, Tallow and Pine) and the bast species (Kenaf and Ramie) are nearly all higher than 10 (9.13–13.7 in Table 6), which means the wood and bast species can be separated effortlessly. The IMDs between Lotus and the bast species and between Lotus and the wood species are 8.37–12.3, implying that Lotus samples can also be easily separated from the other species. The IMD between the bast fibers (Kenaf and Ramie) is 4.69, which is lower than 6. The IMDs within the wood species are also all lower than 6: 5.29 between Eucalyptus and Pine, 3.8 between Tallow and Pine, and 2.61 between Tallow and Eucalyptus. The last value is the lowest of all, which explains why the Eucalyptus and Tallow samples overlap slightly during classification.
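Table 6

Inter material distance of SIMCA model.

| Material | Kenaf | Lotus | Ramie | Pine | Eucalyptus | Tallow |
|---|---|---|---|---|---|---|
| Kenaf | | 8.37 | 4.69 | 11.8 | 11.7 | 9.13 |
| Lotus | | | 9.27 | 12.3 | 11.1 | 8.74 |
| Ramie | | | | 12.5 | 13.7 | 10.9 |
| Pine | | | | | 5.29 | 3.8 |
| Eucalyptus | | | | | | 2.61 |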
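To make the comparison of distances easier to follow, the Python sketch below ranks the species pairs of Table 6 from closest to most distant. It only reorders the tabulated values and does not compute IMDs from spectra; the threshold of 6 is the one used in the discussion above.

```python
# Inter material distances copied from Table 6 (SIMCA model).
imd = {
    ("Kenaf", "Lotus"): 8.37, ("Kenaf", "Ramie"): 4.69,
    ("Kenaf", "Pine"): 11.8, ("Kenaf", "Eucalyptus"): 11.7,
    ("Kenaf", "Tallow"): 9.13, ("Lotus", "Ramie"): 9.27,
    ("Lotus", "Pine"): 12.3, ("Lotus", "Eucalyptus"): 11.1,
    ("Lotus", "Tallow"): 8.74, ("Ramie", "Pine"): 12.5,
    ("Ramie", "Eucalyptus"): 13.7, ("Ramie", "Tallow"): 10.9,
    ("Pine", "Eucalyptus"): 5.29, ("Pine", "Tallow"): 3.8,
    ("Eucalyptus", "Tallow"): 2.61,
}

# Sort pairs by distance: the smallest IMD marks the hardest pair to separate.
ranked = sorted(imd.items(), key=lambda item: item[1])
close_pairs = [pair for pair, distance in ranked if distance < 6]

for (a, b), distance in ranked:
    print(f"{a}-{b}: {distance}")
print("pairs with IMD below 6:", close_pairs)
```

The four pairs below 6 are the within-wood pairs and the Kenaf-Ramie pair, with Eucalyptus-Tallow the closest of all.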
Figure 6 gives the score values of all the samples for PC1–4 using the PLS method. The score values show clearly how close the species are and indicate which PCs to choose to classify the species better. PC1 separates only the wood samples (Eucalyptus, Tallow, and Pine) from the non-wood samples (Kenaf, Ramie and Lotus). With PC2, the Pine samples were separated from Eucalyptus and Tallow, and the Kenaf, Ramie and Lotus samples were also well separated. Eucalyptus and Tallow samples started to separate with PC3 and were well separated with PC4, although the other samples were mixed again. With PC5 (data not shown), all the samples were mixed. These data demonstrate that combining PC1–4 is best for classifying all the samples.
Figure 6

Scores values of PC1–4 using PLS.


Conclusions

The spectra of samples from six different species (Tallow, Eucalyptus, Pine, Ramie, Kenaf and Lotus) were collected and analyzed using NIR classification software (SIMCA). A new algorithm was also created to classify the six species using a quantitative analysis method (PLS). The results show that the six species can be successfully classified using the SIMCA and PLS methods, and the two methods give similar results. The identification and rejection rates for all the samples were above 94%. Spectra loadings, inter material distance and the scores graph were also found to be helpful for constructing the model. In future work, with more species added to the model, the NIR model could identify most of the plant fibrous species frequently used in industry. Combined with a quantitative analysis method for each species, this could provide a widely applicable, high-precision rapid prediction system.

Author contributions

GH and BV developed the research hypothesis and the experiment design. WJ, TS, and ZF performed sample preparation, spectra collection and SIMCA analysis. WJ and CZ performed the PLS analysis and drafted the manuscript. SL revised the English and the discussion. The final manuscript is the end product of joint writing efforts of all authors.

Funding

This work was supported by the Award Funds for Outstanding Middle-Aged and Young Scientists of the Shandong Province (BS2014CL044), Taishan Scholars Construction Engineering of Shandong Province, and the Program for Scientific Research Innovation Team in the Colleges and Universities of the Shandong Province.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1.  Degumming of ramie fibers by alkalophilic bacteria and their polysaccharide-degrading enzymes.

Authors:  L Zheng; Y Du; J Zhang
Journal:  Bioresour Technol       Date:  2001-05       Impact factor: 9.642

2.  Rapid prediction of solid wood lignin content using transmittance near-infrared spectroscopy.

Authors:  Ting-Feng Yeh; Hou-Min Chang; John F Kadla
Journal:  J Agric Food Chem       Date:  2004-03-24       Impact factor: 5.279

3.  Rapid identification of adulterated cow milk by non-linear pattern recognition methods based on near infrared spectroscopy.

Authors:  Li-Guo Zhang; Xin Zhang; Li-Jun Ni; Zhi-Bin Xue; Xin Gu; Shi-Xin Huang
Journal:  Food Chem       Date:  2013-08-27       Impact factor: 7.514

4.  Effect of acid-chlorite delignification on cellulose degree of polymerization.

Authors:  Christopher A Hubbell; Arthur J Ragauskas
Journal:  Bioresour Technol       Date:  2010-05-14       Impact factor: 9.642

5.  Near-infrared hyperspectral imaging for grading and classification of pork.

Authors:  Douglas Barbin; Gamal Elmasry; Da-Wen Sun; Paul Allen
Journal:  Meat Sci       Date:  2011-07-21       Impact factor: 5.209

6.  Prediction of mixed hardwood lignin and carbohydrate content using ATR-FTIR and FT-NIR.

Authors:  Chengfeng Zhou; Wei Jiang; Brian K Via; Oladiran Fasina; Guangting Han
Journal:  Carbohydr Polym       Date:  2015-01-02       Impact factor: 9.381

7.  Classification of Chinese honeys according to their floral origin by near infrared spectroscopy.

Authors:  Lanzhen Chen; Jiahua Wang; Zhihua Ye; Jing Zhao; Xiaofeng Xue; Yvan Vander Heyden; Qian Sun
Journal:  Food Chem       Date:  2012-03-03       Impact factor: 7.514

Review 8.  Structure, biology, evolution, and medical importance of sulfated fucans and galactans.

Authors:  Vitor H Pomin; Paulo A S Mourão
Journal:  Glycobiology       Date:  2008-09-16       Impact factor: 4.313
