Ivan José Reis Filho1,2, Ricardo Marcondes Marcacini2, Solange Oliveira Rezende2.
Abstract
Forecasting models in the financial market generally use quantitative time-series data. However, external factors such as weather events, economic crises, and the foreign exchange market can influence time-series data. This information is not explicit in the time-series, yet it can affect the predicted values. Textual data can be a source of knowledge about external factors and is potentially helpful for time-series forecasting models. Some studies have presented text mining techniques to combine textual and time-series data. However, the existing representations have limitations, such as the curse of dimensionality and sparse data. This work investigates the use of a finite set of domain-specific terms to address these problems by representing textual data in a low-dimensional space. We consider thirty-three keywords that are potentially important in the domain to enrich time-series using text mining techniques. Four regression models were applied to the proposed representation to predict the future daily price of corn and soybeans. The experimental setup considers a real market scenario, in which a daily sliding window strategy and step-forward forecasts were used. The proposed representation has better accuracy in some forecasting scenarios. The results indicate that text data are a promising alternative for enriching time-series representations and reducing uncertainty in forecasting models.
• We show an approach to enriching time-series using domain-specific terms;
• The proposed representation combines quantitative data with qualitative market factors;
• Regression models learn a forecasting function from the enriched time-series.
Keywords: Enriched series; Forecasting; Machine learning; Text mining
Year: 2022 PMID: 35782724 PMCID: PMC9240644 DOI: 10.1016/j.mex.2022.101758
Source DB: PubMed Journal: MethodsX ISSN: 2215-0161
Studies that combine technical information from time-series and textual features to improve the forecasting model.
| Ref | TS domain | Textual rep. | Training vs Test | Algorithm | SW (sliding window) |
|---|---|---|---|---|---|
| | AUD-USD daily prices | Bag-of-Words | Sep. 2009 - Sep. 2012 (60% train vs 40% test) | MLR, MLP | no |
| | gold prices monthly | Clever Craft software | Jan. 1999 - Dec. 2005 vs. Jan. 2009 - Dec. 2009 | ARIMA, ANN | no |
| | daily oil price | TF-IDF | Nov. 2009 - Apr. 2012 vs. May 2012 - Jul. 2014 | CNN, LDA | no |
| | hourly taxi demand | GloVe embeddings | Jan. 2013 - Sep. 2014 vs. Oct. 2014 - Jun. 2016 | DL-LSTM, DL-FC | no |
| | average monthly prices of corn and soybeans | TF-IDF | Jan. 2014 - Feb. 2020 | SVR | yes |
| | average monthly prices of corn and soybeans | BERT | Jan. 2014 - Feb. 2020 | SVR, LSTM | yes |
| | S&P 500 index (monthly and yearly) | BERT | Jan. 2000 - Dec. 2019 | ARIMA, LR, RF, FFNN, LSTM | yes |
| | HSI daily closing price | LDA | Sep. 2015 - Dec. 2020 | Rolling Regression Model | yes |
Fig. 1. Conceptual Model of the TSED method.
Fig. 2. Soybean price series - Chicago Board of Trade (CBOT).
Fig. 3. Cross-validation for time-series.
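The daily sliding window with step-forward forecasting described in the abstract, and depicted in Fig. 3, could be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function name `sliding_window_splits`, the window length, and the toy series size are all assumptions, since the source does not state them.

```python
from typing import Iterator, List, Tuple


def sliding_window_splits(
    n_samples: int, window: int, horizon: int = 1
) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_indices, test_index) pairs for step-forward forecasting.

    Each training window covers `window` consecutive days. The test point
    lies `horizon` days after the last training day. The window then slides
    forward by one day. (Hypothetical helper; sizes are illustrative.)
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    start = 0
    while start + window + horizon - 1 < n_samples:
        train = list(range(start, start + window))
        test = start + window + horizon - 1
        yield train, test
        start += 1


# Toy run: 6 daily observations, 3-day window, 1-day-ahead forecast.
splits = list(sliding_window_splits(n_samples=6, window=3, horizon=1))
for train_idx, test_idx in splits:
    print(train_idx, "->", test_idx)
```

Because the test point always lies after the training window, no future observation leaks into training, which is the property that distinguishes this scheme from ordinary shuffled cross-validation.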
Overview of time-series and textual data used in experiment evaluation.
| Commodity | Corn and Soybean |
|---|---|
| Period | 2014-01-02 to 2020-12-30 |
| Number of Days | 1769 |
| TS Attributes | Values (Open, Close, High, Low): CBOT |
| Number of News | 1398 |
| Domain-specific Keywords | crop, safrinha, losses, yield, estimate, disappoint, excellent, good, rains, planting, increase, decrease, price, reduction, sales, additional, complete, lower, low, more, progress, high, domestic, harvest, production, decline, cost, export, import, no news, record, large, growing |
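One way to picture the enriched representation is to count how often each of the thirty-three keywords occurs in a day's news and append those counts to the day's price attributes. The sketch below assumes case-insensitive whole-word matching and a simple concatenation. The helper names `keyword_counts` and `enrich`, the sample headline, and the price values are hypothetical, and the authors' actual text preprocessing may differ.

```python
import re
from typing import Dict, List

# The thirty-three domain-specific keywords listed in the table above.
KEYWORDS: List[str] = [
    "crop", "safrinha", "losses", "yield", "estimate", "disappoint",
    "excellent", "good", "rains", "planting", "increase", "decrease",
    "price", "reduction", "sales", "additional", "complete", "lower",
    "low", "more", "progress", "high", "domestic", "harvest",
    "production", "decline", "cost", "export", "import", "no news",
    "record", "large", "growing",
]


def keyword_counts(text: str) -> Dict[str, int]:
    """Count whole-word, case-insensitive occurrences of each keyword.

    Assumption: no stemming, so "exports" does not count toward "export".
    """
    lowered = text.lower()
    return {
        kw: len(re.findall(r"\b" + re.escape(kw) + r"\b", lowered))
        for kw in KEYWORDS
    }


def enrich(ohlc: List[float], text: str) -> List[float]:
    """Append the keyword counts to one day's (Open, Close, High, Low)."""
    counts = keyword_counts(text)
    return list(ohlc) + [float(counts[kw]) for kw in KEYWORDS]


# Made-up news snippet and prices, for illustration only.
news = ("Rains delay planting; safrinha corn yield estimate is lower. "
        "Export sales increase.")
row = enrich([380.0, 382.5, 384.0, 379.25], news)
print(len(row))  # 4 price attributes + 33 keyword counts
```

The resulting vector has a fixed length regardless of vocabulary size, which is how a finite keyword set avoids the sparsity and dimensionality problems of bag-of-words representations.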
Hyperparameters used in regression models.
| Model | Parameters |
|---|---|
| HGBR | default |
| SVR | Kernel RBF and gamma auto |
| RF | Depth = 4 and random state = 0 |
| BR | base estimator SVR, estimator number = 10, random state = 0 |
Corn and Soybeans Results with forecast horizon (h).
| Corn | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | h = 1 | | | h = 2 | | | h = 3 | | | h = 4 | | | h = 5 | | |
| | TS | TSED | DST | TS | TSED | DST | TS | TSED | DST | TS | TSED | DST | TS | TSED | DST |
| HGBR | 1,186 | 7,554 | 1,687 | 7,578 | 2,021 | 7,579 | 2,341 | 7522 | 2,607 | 7,48 | |||||
| SVR (RBF) | 1,240 | 1,632 | 1,953 | 2,220 | |||||||||||
| RF | 7,133 | 1,594 | 7,098 | 7,076 | 2,218 | 7,076 | 2,455 | 7,061 | |||||||
| BR | 1,263 | 6,788 | 1,64 | 6,789 | 1,954 | 6,763 | 2,222 | 6,73 | 2,455 | 6,692 | |||||
| Soybean | |||||||||||||||
| HGBR | 11,316 | 1,394 | 11,212 | 1,748 | 11,302 | 1,989 | 11,028 | 2,192 | 11,093 | ||||||
| SVR (RBF) | 1,022 | 1,382 | 1,696 | 1,947 | 2,147 | ||||||||||
| RF | 1,108 | 1,082 | 1,437 | 10,725 | 1,733 | 10,683 | 1,967 | 10,638 | 2,150 | 10,594 | |||||
| BR | 1,027 | 7,807 | 7,791 | 7,772 | 7,727 | 7,672 | |||||||||
Fig. 4. Predicted daily value for corn and soybeans with horizon (h = 1).
Comparison of the performance of representations in number of forecasts.
| Corn | |||||
|---|---|---|---|---|---|
| Representations | h = 1 | h = 2 | h = 3 | h = 4 | h = 5 |
| TS | 547 | 570 | 545 | 570 | 489 |
| TSED | 418 | 455 | 466 | 480 | 441 |
| TS = TSED | 272 | 210 | 222 | 181 | 299 |
| TS (MAPE 0%) | 69 | 48 | 42 | 38 | 33 |
| TSED (MAPE 0%) | 57 | 48 | 50 | 30 | 28 |
| Soybean | |||||
| TS | 586 | 584 | 586 | 582 | 587 |
| TSED | 526 | 507 | 536 | 554 | 578 |
| TS = TSED | 125 | 144 | 111 | 95 | 64 |
| TS (MAPE 0%) | 67 | 52 | 44 | 40 | 41 |
| TSED (MAPE 0%) | 60 | 48 | 43 | 43 | 46 |
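A per-forecast comparison of this kind could be tallied as sketched below: for every forecast, the absolute percentage error of each representation is computed, and the smaller one wins. The rounding to two decimals, the tie rule, and the treatment of the zero-error rows as independent counts are assumptions, since the source does not spell them out. The function names and the toy numbers are hypothetical.

```python
from typing import Dict, Sequence


def ape(actual: float, predicted: float) -> float:
    """Absolute percentage error of a single forecast, in percent."""
    return abs(actual - predicted) / abs(actual) * 100.0


def tally(
    actual: Sequence[float],
    pred_ts: Sequence[float],
    pred_tsed: Sequence[float],
    decimals: int = 2,
) -> Dict[str, int]:
    """Count, per forecast, which representation has the lower error."""
    counts = {
        "TS": 0,
        "TSED": 0,
        "TS = TSED": 0,
        "TS (MAPE 0%)": 0,
        "TSED (MAPE 0%)": 0,
    }
    for y, p_ts, p_tsed in zip(actual, pred_ts, pred_tsed):
        # Assumed rule: errors are compared after rounding.
        e_ts = round(ape(y, p_ts), decimals)
        e_tsed = round(ape(y, p_tsed), decimals)
        if e_ts < e_tsed:
            counts["TS"] += 1
        elif e_tsed < e_ts:
            counts["TSED"] += 1
        else:
            counts["TS = TSED"] += 1
        if e_ts == 0:
            counts["TS (MAPE 0%)"] += 1
        if e_tsed == 0:
            counts["TSED (MAPE 0%)"] += 1
    return counts


# Made-up prices and predictions, for illustration only.
actual = [100.0, 200.0, 400.0, 500.0]
pred_ts = [101.0, 200.0, 396.0, 490.0]
pred_tsed = [102.0, 200.0, 398.0, 490.0]
result = tally(actual, pred_ts, pred_tsed)
print(result)
```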
News published on the day before the price series showed abnormal fluctuations.
| Corn | ||||
|---|---|---|---|---|
| Date | Headline | Prediction | Intraday | Keyword occurrences (news) |
| 2020/01/30 | Brazil to be a Major Exporter of Food to India in the Coming Years. | 2020/01/31 | 1.05% | corn(1), export(3), increase(1), production(4) |
| 2018/07/19 | Brazilians may be missing Selling Opportunity due to Freight Dispute. | 2018/07/20 | -1.40% | additional(2), corn(1), cost(5), crop(6), estimate(2), harvest(1), high(4), import(1), increase(4), large(3), planting(1), production(1), rains(3), record(2) |
| 2018/05/23 | Initial Impact of Truck Strike on Brazilian Agriculture Sector. | 2018/05/24 | -1.47% | corn(2), cost(1), crop(1), domestic(1), export(10), good(1), harvest(1), high(2), increase(2), large(3), price(2), production(4), rains(7), record(2), safrinha(1) |
| Soybean | ||||
| 2020/11/09 | Brazil Importing U.S. Soybeans. | 2020/11/10 | 3.24% | additional(3), domestic(3), export(2), harvest(2), high(1), import(7), large(2), planting(1), price(1), rains(1), record(2), sales(2), soybean(18) |
| 2020/10/14 | Full-Season Corn in Southern Brazil 39% Planted, About Average. | 2020/10/15 | -1.22% | additional(1), crop(7), domestic(1), estimate(6), good(1), growing(3), harvest(2), high(3), increase(2), planting(13), price(4), production(3), rains(2), record(3), reduction(1), safrinha(11), soybean(3) |
| 2017/02/07 | Brazilian Government Announces Upgrade of Port of Santos. | 2017/02/08 | -0.84% | complete(1), cost(1), export(4), good(1), import(4), increase(1), large(1), low(1), production(1), record(1), soybean(1) |
| Subject Area: | |
|---|---|
| More specific subject area: | |
| Method name: | |
| Name and reference of original method: | N.A. |
| Resource availability: | |