Jussi Kalliola, Jurgita Kapočiūtė-Dzikienė, Robertas Damaševičius.
Abstract
Accurate price evaluation of real estate benefits many parties involved in the real estate business, such as real estate companies, property owners, investors, banks, and financial institutions. Artificial Neural Networks (ANNs) have shown promising results in real estate price evaluation. However, the performance of ANNs depends greatly on the settings of their hyperparameters. In this paper, we apply and optimize an ANN model for real estate price prediction in Helsinki, Finland. The model is optimized by fine-tuning the hyperparameters of the ANN architecture (such as activation functions, optimization algorithms, etc.) for higher accuracy using the Bayesian optimization algorithm. The results are evaluated using a variety of metrics (RMSE, MAE, R²) and are also illustrated graphically. The empirical analysis of the results shows that model optimization improved the performance on all metrics (reaching a relative mean error of 8.3%).
Keywords: Artificial neural network; Hyperparameter optimisation; Prediction model; Real estate prices
Year: 2021 PMID: 33977129 PMCID: PMC8064234 DOI: 10.7717/peerj-cs.444
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Summary of related research using ANNs for real estate price prediction.
The results are not directly comparable because different datasets were used.
| Authors | City/Country | Dataset size/features | Train:test ratio | ANN model | Accuracy |
|---|---|---|---|---|---|
| | Spain | 100/12 | 85:15 | 12-7-1 | n/a |
| | Athens (Greece) | 3,150/9 | 60:20:20 | 9-5-1 | 0.86 (R²) |
| | Hong Kong | 4,143/29 | 80:20 | 30 models | 0.78 (R²) |
| | Spain | 10,124/6 | 80:20 | 6-6-1 | 0.86 (R²) |
| | Bangladesh | 100/40 | 70:30 | 40-10-1 | 0.92 (R²) |
| | Italy | 90/7 | 80:20 | 8-13-1 | 0.99 (R²) |
| | Taranto (Italy) | 193/42 | 70:15:15 | 20-20-1 | 0.819 (R²) |
| | Philadelphia (USA) | Places365 database | 90:10 | VGG16 | 0.823 (R²) |
| | South Boston (USA) | n/a | n/a | 4 layers | 96.5% accuracy within 20% price range |
| | Hong Kong | n/a | 90:10 | 3 layers | 0.92 (R²) |
| | n/a | n/a | n/a | 3 layers | 3.552 (MAE) |
Description of the dataset attributes.
| Variable | Category | Description |
|---|---|---|
| Debt free price | Price | Debt-free price of an apartment |
| Living area | Size | Size (m²) of an apartment |
| Rooms | Size | Number of rooms in an apartment |
| Living floor | Building | Floor number where an apartment is located |
| Total floors | Building | Total number of floors in a building |
| Type | Building | Type of an apartment, numerical value |
| Year | Building | Building year |
| Energy class | Building | Energy class of a building, numerical value |
| Property ownership | Building | Property ownership, own or rental, numerical value |
| Population | Population structure | Population of the postal code area |
| Average age | Population structure | Average age of inhabitants |
| Aged 18 or over | Education level of residents | Total number of residents aged 18 or over |
| With education | Education level of residents | People with at least an upper secondary qualification |
| Lower level university degree | Education level of residents | University/tertiary-level degree, lower: lower-degree level tertiary education (level 6) |
| Higher level university degree | Education level of residents | University/tertiary-level degree, higher: higher-degree level tertiary education (level 7) and doctorate degrees or equivalent (level 8) |
| Median income of inhabitants | Resident disposable income | Median income of inhabitants (€) is obtained by listing inhabitants by the amount of disposable monetary income |
| The lowest income category | Resident disposable income | Inhabitants earning at most EUR 13 287 per year |
| The middle-income category | Resident disposable income | Inhabitants earning EUR 13 288 - 31 873 per year |
| The highest income category | Resident disposable income | Inhabitants earning more than EUR 31 874 per year |
| Purchasing power of inhabitants | Resident disposable income | Accumulated purchasing power of inhabitants (€) is the accumulated disposable monetary income |
| Households | Size and stage of life of households | Number of households in total |
| Occupancy rate | Size and stage of life of households | Occupancy rate (m²) is the average floor area per inhabitant, derived by dividing the total floor area by the number of inhabitants |
| Owner-occupied dwellings | Size and stage of life of households | Households living in owner-occupied dwellings are households whose tenure status is owner-occupied dwelling |
| Rented dwellings | Size and stage of life of households | Households with rented dwellings are households whose tenure status is rental, subsidized, interest subsidized rental and right of occupancy dwellings |
| Median income of households | Disposable monetary income of households | Median income of households (€) is obtained by listing households by the amount of disposable monetary income |
| The lowest income category | Disposable monetary income of households | Households earning at most EUR 16 979 per year |
| The middle-income category | Disposable monetary income of households | Households earning EUR 16 980 - 35 297 per year |
| The highest income category | Disposable monetary income of households | Households earning more than EUR 35 298 per year |
| Purchasing power of households | Disposable monetary income of households | Accumulated purchasing power of households (€) is the accumulated disposable monetary income |
| Buildings, total | Buildings and housing | The total number of buildings per area. Free-time residences are not included in this total |
| Residential buildings | Buildings and housing | Residential buildings is the number of buildings per area that are intended for residential use |
| Blocks of flats | Buildings and housing | Dwellings in blocks of flats are dwellings in residential blocks. They include buildings with at least three flats of which at least two are located on top of each other |
| Average floor area | Buildings and housing | Average floor area (m²) is the total floor area of all dwellings divided by their number |
| Workplaces | Jobs by industry | Number of workplaces is the number of people working (including part-time) in each area |
| Employed | Main activity of residents | Employed labor force is defined as people aged 18 to 74 who were employed during the last week of the year |
| Unemployed | Main activity of residents | Unemployed labor force comprises people aged 15 to 64 who were unemployed on the last working day of the year |
| Students | Main activity of residents | Students are defined as persons who study full-time and are not gainfully employed or unemployed |
| Pensioners | Main activity of residents | Pensioners are defined as persons who according to the Finnish Centre for Pensions receive a pension or have some other pension income |
| Distance to a bus stop | Local services | Average distance (m) to the nearest bus stop |
| Distance to a grocery store | Local services | Average distance (m) to the nearest grocery store |
| Distance to a doctor or hospital | Local services | Average distance (m) to the nearest doctor or hospital |
| Distance to a school | Local services | Average distance (m) to the nearest school |
| Distance to a sports center | Local services | Average distance (m) to the nearest sports center |
Descriptive statistics of the selected dataset attributes.
| Variable | Min | Max | Mean | Std. dev. |
|---|---|---|---|---|
| Debt free price, EUR | 100,000 | 995,000 | 308,750 | 152,130 |
| Living area, m² | 14 | 199 | 65.44 | 28.38 |
| Rooms | 1 | 6 | 2.56 | 1.06 |
| Year | 1925 | 2020 | 1979 | 28.61 |
Figure 1. Distribution of apartments according to their construction year.
Figure 2. Distribution of apartments according to their size.
Figure 3. Statistically significant (p < 0.001) correlations of features with the apartment price.
Figure 4. Top 10 property features with the biggest weight values calculated using neighborhood component analysis.
Figure 5. Top 10 most important features of the dataset in terms of predictor importance from regression tree.
Prediction performance of an initial (baseline) ANN model.
| No. | Training results | | | | | Testing results | | | |
|---|---|---|---|---|---|---|---|---|---|
| | MSE | MSE Train | MSE Difference | Val loss | Loss | R² | RMSE | RME | MAE |
| 1 | 0.0027 | 0.0024 | 0.0003 | 0.00895 | 0.0026 | 0.90 | 46,517.8 | 10.9% | 31,076.6 |
| 2 | 0.0026 | 0.0020 | 0.0006 | 0.00896 | 0.0012 | 0.90 | 46,054.24 | 9.89% | 29,642.0 |
| 3 | 0.0028 | 0.0025 | 0.0002 | 0.00774 | 0.0023 | 0.90 | 47,202.2 | 11.5% | 32,093.7 |
| 4 | 0.0029 | 0.0020 | 0.0009 | 0.01021 | 0.0012 | 0.90 | 48,106.7 | 10.3% | 30,562.5 |
| 5 | 0.0027 | 0.0025 | 0.0002 | 0.00909 | 0.0020 | 0.90 | 46,197.5 | 11.3% | 31,468.3 |
| AVG | 0.0027 | 0.0023 | 0.0004 | 0.00899 | 0.0019 | 0.90 | 46,815.7 | 10.8% | 30,968.6 |
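The testing metrics reported in this table can be illustrated with a minimal sketch. The function name `regression_metrics` and the sample prices are hypothetical, and RME is assumed here to mean the average of absolute errors divided by the actual price, since the record does not spell out its definition.

```python
import math

def regression_metrics(actual, predicted):
    """Return RMSE, MAE, RME and R^2 for two equally long lists of prices."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # Assumed definition: mean absolute error relative to the actual price.
        "rme": sum(abs(e) / a for e, a in zip(errors, actual)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }

# Made-up prices in EUR, for illustration only (not data from the paper).
actual_prices = [200_000, 300_000, 400_000]
predicted_prices = [210_000, 285_000, 420_000]
metrics = regression_metrics(actual_prices, predicted_prices)
print({name: round(value, 4) for name, value in metrics.items()})
```

RMSE and MAE are expressed in the price unit (EUR), while RME and R² are unitless, which is why the table mixes large absolute values with percentages.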
Hyperparameter optimization algorithm.
1. Build a probability model.
2. Find the hyperparameters that perform best on the probability model.
3. Apply these hyperparameters to the objective function and get the performance.
4. Update the probability model incorporating the new results.
5. Repeat steps 2–4 until max iterations or max computation time is reached.
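The loop described above can be sketched in plain Python for a single hyperparameter scaled to [0, 1]. This is an illustration under stated assumptions, not the authors' implementation: the probability model is assumed to be a small Gaussian process with an RBF kernel, the "best on the model" criterion is assumed to be a lower confidence bound, and `objective` is a hypothetical toy function standing in for training the ANN and returning its validation MSE.

```python
import math

def objective(x):
    # Hypothetical stand-in for "train the ANN with hyperparameter x and
    # return the validation MSE"; its minimum is at x = 0.3.
    return (x - 0.3) ** 2

def rbf(a, b, length=0.2):
    # Squared-exponential kernel with unit variance (assumed length scale).
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def invert(matrix):
    # Gauss-Jordan inversion with partial pivoting for a small square matrix.
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_surrogate(xs, ys, jitter=1e-4):
    # Standardize the observed scores and invert the kernel matrix.
    mean = sum(ys) / len(ys)
    spread = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys)) or 1.0
    z = [(y - mean) / spread for y in ys]
    k = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    return invert(k), z

def lower_confidence_bound(x, xs, k_inv, z, kappa=2.0):
    # Posterior mean minus kappa posterior standard deviations at x.
    k_star = [rbf(x, xi) for xi in xs]
    w = [sum(kij * ks for kij, ks in zip(row, k_star)) for row in k_inv]
    mu = sum(wi * zi for wi, zi in zip(w, z))
    var = max(1.0 - sum(wi * ks for wi, ks in zip(w, k_star)), 0.0)
    return mu - kappa * math.sqrt(var)

def bayesian_optimize(max_iterations=15):
    candidates = [i / 100 for i in range(101)]
    xs = [0.0, 0.5, 1.0]                       # initial design
    ys = [objective(x) for x in xs]
    for _ in range(max_iterations):            # step 5: repeat until the budget is spent
        k_inv, z = fit_surrogate(xs, ys)       # steps 1 and 4: (re)build the model
        untried = [c for c in candidates if c not in xs]
        x_next = min(untried,                  # step 2: best candidate on the model
                     key=lambda c: lower_confidence_bound(c, xs, k_inv, z))
        xs.append(x_next)
        ys.append(objective(x_next))           # step 3: evaluate the real objective
    best = min(range(len(xs)), key=ys.__getitem__)
    return xs[best], ys[best], xs

best_x, best_mse, tried = bayesian_optimize()
print(f"best x = {best_x:.2f}, objective = {best_mse:.5f}, evaluations = {len(tried)}")
```

The value of `kappa` sets the balance between exploring uncertain regions and exploiting regions the model already rates highly. In practice the expensive step is the objective evaluation (a full training run), so the cost of refitting the surrogate is negligible by comparison.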
Hyperparameter value ranges and categories in each optimization iteration.
| No. | Dropout | Batch size | Validation split | Learning rate | Number of layers | Optimizer | Activation function | Number of nodes | Search |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0–0.5 | 10–1,000 | 0.05–0.2 | 0.0007–0.0011 | 1–10 | Adam, SGD, RMSProp, NAdam | ReLU, ELU, SELU, sigmoid, tanh | 1–3: 64–1,024 | Random |
| 2 | 0–0.2 | 10–1,000 | 0.08–0.19 | 0.0008–0.0011 | 1–10 | Adam | ReLU | All: 50–1,000 | Bayes |
| 3 | 0–0.1 | 10–1,000 | 0.08–0.12 | 0.0009–0.0011 | 6–10 | Adam | ReLU | 1: 600–1,000 | Bayes |
| 4 | 0–0.05 | 300–700 | 0.08–0.1 | 0.001–0.002 | 6 | Adam | ReLU | 1: 700–900 | Bayes |
| 5 | 0.3 | 600 | 0.085 | 0.0014 | 6 | Adam | ReLU | 1: 500–750 | Bayes |
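As an illustration of the first (random search) iteration, the ranges in row 1 can be written as a search space and sampled. This is a sketch only: the dictionary layout, the function `sample_config`, and uniform sampling within each range are assumptions, and the number-of-nodes column is left out because the table does not make its per-layer meaning explicit.

```python
import random

# Ranges and categories copied from row 1 of the table; the encoding is hypothetical.
SEARCH_SPACE = {
    "dropout": ("float", 0.0, 0.5),
    "batch_size": ("int", 10, 1000),
    "validation_split": ("float", 0.05, 0.2),
    "learning_rate": ("float", 0.0007, 0.0011),
    "num_layers": ("int", 1, 10),
    "optimizer": ("choice", ["Adam", "SGD", "RMSProp", "NAdam"]),
    "activation": ("choice", ["relu", "elu", "selu", "sigmoid", "tanh"]),
}

def sample_config(space, rng):
    """Draw one configuration, sampling uniformly (an assumption) per hyperparameter."""
    config = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "float":
            config[name] = rng.uniform(spec[1], spec[2])
        elif kind == "int":
            config[name] = rng.randint(spec[1], spec[2])  # inclusive bounds
        else:
            config[name] = rng.choice(spec[1])
    return config

rng = random.Random(0)  # fixed seed so the draw is repeatable
trial = sample_config(SEARCH_SPACE, rng)
print(trial)
```

Later rows of the table narrow these ranges around the best-performing values, so the same structure could be reused with tighter bounds in each subsequent iteration.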
Figure 6. Overview of the hyperparameter optimization.
The X-axis represents the activation function, optimization function and number of layers, and Y-axis represents MSE values and the difference between MSE on the training and testing set. Plotted MSE values are minimum, average, and maximum from all training sessions. The upper section describes the performance of the model using the MSE metric. The lower section shows the difference between MSE values which describes the over- or under-fitting of the model.
Results for initial and optimized models.
The baseline model architecture was selected based on expert knowledge, considering the best practices in previous studies. The optimized model was developed using hyperparameter optimization.
| Model | Training results | | | | | Testing results | | | |
|---|---|---|---|---|---|---|---|---|---|
| | MSE | MSE Train | MSE Difference | Val loss | Loss | R² | RMSE | RME | MAE |
| Baseline | 0.0027 | 0.0023 | 0.0004 | 0.00899 | 0.0019 | 0.90 | 46,815.7 | 10.8% | 30,968.6 |
| Optimized | 0.0018 | 0.0015 | 0.0003 | 0.00669 | 0.0011 | 0.95 | 33,232.2 | 8.3% | 23,320.9 |
| Difference | −0.0009 | −0.0008 | −0.0001 | −0.0023 | −0.0008 | 0.05 | −13,583.5 | −2.5% | −7,647.7 |
| Improvement | 33.3% | 34.8% | 25.0% | 25.6% | 42.1% | 5.56% | 29.0% | 23.2% | 24.7% |
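The Improvement row appears to be the absolute difference divided by the baseline value. A short sketch of that arithmetic follows, using a few of the rounded table entries; the helper name `relative_change` is hypothetical, and because the inputs are already rounded, a recomputed figure can differ from the table in the last digit.

```python
# Values copied from the Baseline and Optimized rows of the table.
baseline = {"MSE": 0.0027, "R2": 0.90, "RMSE": 46815.7, "RME": 10.8, "MAE": 30968.6}
optimized = {"MSE": 0.0018, "R2": 0.95, "RMSE": 33232.2, "RME": 8.3, "MAE": 23320.9}

def relative_change(before, after):
    """Absolute change as a percentage of the baseline value."""
    return abs(after - before) / before * 100.0

improvement = {name: relative_change(baseline[name], optimized[name]) for name in baseline}
for name, percent in improvement.items():
    print(f"{name}: {percent:.2f}%")
```

For the error metrics a decrease is the improvement, while for R² it is an increase, which is why the absolute value of the difference is used.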
Figure 7. Real (solid line) and predicted (dots) prices.
Figure 8. Number of rooms and performance of the model: (A) studio, (B) two rooms, (C) three rooms, (D) four rooms, (E) five rooms, (F) six or more rooms.