Article

Support Vector Regression Integrated with Fruit Fly Optimization Algorithm for River Flow Forecasting in Lake Urmia Basin

1 Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Bahman Boulevard 29, Iran
2 Department of Hydrosciences, Technische Universität Dresden, 01069 Dresden, Germany
3 Institute of Visual Informatics, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
4 School of the Built Environment, Oxford Brookes University, Oxford OX3 0BP, UK
5 Institute of Automation, Kando Kalman Faculty of Electrical Engineering, Obuda University, 1034 Budapest, Hungary
6 Queensland University of Technology, Brisbane QLD 4059, Australia
7 Department for Management of Science and Technology Development, Ton Duc Thang University, Ho Chi Minh City, Vietnam
8 Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
9 School of Architecture, Design and the Built Environment, Nottingham Trent University, 50 Shakespeare St, Nottingham NG1 4FQ, UK
* Author to whom correspondence should be addressed.
Water 2019, 11(9), 1934; https://doi.org/10.3390/w11091934
Submission received: 20 June 2019 / Revised: 10 July 2019 / Accepted: 11 July 2019 / Published: 17 September 2019
(This article belongs to the Section Hydrology)

Abstract

Advancement in river flow prediction systems can greatly empower operational river management to make better decisions, practices, and policies. Machine learning methods have recently shown promising results in building accurate models for river flow prediction. This paper aims to identify models with higher accuracy, robustness, and generalization ability by inspecting the accuracy of a number of machine learning models. The proposed models for river flow include support vector regression (SVR), a hybrid of SVR with the fruit fly optimization algorithm (FOA) (so-called FOASVR), and an M5 model tree (M5). Additionally, the influence of periodicity (π) on the forecasting performance was examined. To assess the performance of the proposed models, different statistical measures were implemented, including root mean squared error (RMSE), mean absolute error (MAE), correlation coefficient (R), and Bayesian information criterion (BIC). Results showed that FOASVR, with RMSE (4.36 and 6.33 m3/s), MAE (2.40 and 3.71 m3/s), and R (0.82 and 0.81) values, had the best performance in forecasting river flows at the Babarud and Vaniar stations, respectively. Also, regarding the BIC values, Qt−1 and π were selected as parsimonious inputs for predicting river flow one month ahead. Overall, the findings indicated that, although both FOASVR and M5 predicted river flows in suitable accordance with the observed flows, the performance of FOASVR was moderately better than that of M5, and periodicity noticeably increased the performance of the models; consequently, FOASVR can be suggested as the most accurate method for forecasting river flows.

1. Introduction

Dependable approximation of discharge is imperative in water resources management [1]. River flow prediction has emerged from hydrological modeling and transformed into a dynamic and active research area [2,3]. Studying river flow and stream flow is fundamental to flood protection, sustainable irrigation, and urban development [4,5]. Due to the uncertainties in atmospheric behavior associated with climate change, dynamic and data-driven methods for hydrological modeling of catchments have become more popular than ever [6]. Data mining and machine learning methods have brought novel ways of producing insight from big data [7,8]. These tools forecast forthcoming trends using knowledge-driven decisions derived from enormous input-output data. The literature includes reviews of the latest machine learning models and comparative studies of the models in river and stream flow forecasting [8,9,10,11,12,13]. Among the methods used for river flow prediction, machine learning models presented higher performance with better accuracy and generalization ability for river flow as well as many other hydrological applications [14]. However, there are many machine learning methods that have never been applied in this realm, which presents a gap in the literature. This paper, consequently, aims at introducing new models and evaluating their performance.
The M5 model tree (M5), as a sub-technique of data mining, constructs tree-based linear models for continuous data. Lately, M5, as a decision tree-based regression method, has been applied in hydrological and water-related studies [15,16,17,18,19,20,21,22]. Londhe and Dixit [23] implemented M5 to estimate the river flow at two stations in India. The models predicting the next day's river flow were established using the preceding day's gauged river flow and rainfall. Sattari et al. [24] inspected the proficiency of the support vector machine (SVM) and the M5 model tree in forecasting the flows of the Sohu River. They revealed that M5 provided more precise predictions when compared with SVM.
SVM is a technique that exploits the strong points of traditional statistical methods, which are theory-oriented and analytically simple. The SVM approach has been frequently implemented in the areas of hydrology and time series forecasting. Liong and Sivapragasam [25] applied the method to forecast flood stages. Yu et al. [26] suggested a method for forecasting daily runoff by combining chaos theory and the SVM method. Recently, the support vector regression (SVR) method has been developed based on SVM and shows superiority in the prediction of hydrologic processes. Kalteh [27], by applying an artificial neural network (ANN) and SVR to monthly streamflow recorded at two different stations, revealed that both models coupled with wavelet transformation produced more accurate outcomes than the regular models. Also, the results specified that SVR models had enhanced performance in comparison to ANN models. Wu et al. [28] used a genetic algorithm to optimize the SVR model, and the results established that the suggested model could anticipate river flow more precisely in comparison with other models. Londhe and Gavraskar [29] utilized the SVR model to forecast river flow one day ahead at two studied locations. The model results were favorable according to the low values of the evaluation metrics.
On the other hand, Cao and Wu [30] coupled the fruit fly optimization algorithm (FOA) with SVR (the combination was named FOASVR) to optimize the parameters of SVR for seasonal electricity consumption forecasting. The results showed that applying FOA had a significant role in increasing the prediction accuracy. Lijuan and Guohua [31] used FOASVR to estimate monthly inbound tourist flow, and it was reported that the suggested FOASVR is a viable option for tourism applications. To the best of the authors' knowledge, FOA has not previously been integrated with SVR for river flow forecasting.
The primary objective of this research is to present models with higher accuracy, robustness, and generalization ability for river flow prediction. M5, support vector regression, and SVR optimized with FOA were used to forecast river flow at the Vaniar and Babarud stations on the Aji Chay and Barandouz rivers, respectively, located in the Lake Urmia Basin of Iran. Several evaluation parameters for error estimation are utilized to assess the performance of the considered models.

2. Study Area

The current study used monthly river flow data for the Vaniar station on the Aji Chay River and the Babarud station on the Barandouz River, both located in the Lake Urmia Basin of Iran (Figure 1). The observed data include 780 monthly river flow measurements (taken over 65 years from 1952 to 2017) at the Babarud station and 744 monthly records (over 62 years from 1952 to 2014) at the Vaniar station. There is no single established rule for separating training and testing data. For example, the study by Kurup and Dudani [32] used 63% of the total data for model development, whereas Pal [33] used 69%, Samadianfard et al. [20,21,34] used 67%, and Deo et al. [35] and Samadianfard et al. [36] used 70% of the total data to develop their models. Thus, to develop the studied models, the data were divided into training (70%) and testing (30%) subsets. Table 1 displays the statistics of the implemented data for both stations. The observed data show high positive skewness (Csx = 2.13 and 3.19). Furthermore, the low auto-correlations demonstrate low persistence at both stations. It should be noted that Lake Urmia is currently in a drought crisis, with precipitation and consequently river flow having decreased in recent years; this may cause some difficulties in forecasting river flows.
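The split used here is chronological rather than random, so the test period follows the training period. As a minimal sketch (assuming a plain NumPy array of monthly flows; the placeholder array below stands in for the actual records), the 70/30 division could be written as:

```python
import numpy as np

def chronological_split(flows, train_fraction=0.70):
    """Split a monthly flow series into training and testing sets
    without shuffling, so the test period follows the training period."""
    flows = np.asarray(flows, dtype=float)
    n_train = int(len(flows) * train_fraction)
    return flows[:n_train], flows[n_train:]

# Example: 780 monthly records (as at Babarud) -> 546 for training, 234 for testing.
train, test = chronological_split(np.arange(780, dtype=float))
print(len(train), len(test))
```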

3. Techniques Applied in Modeling

3.1. M5 Model Tree

Model trees extend regression trees, which hold a constant value at their leaves [38]. In this regard, M5, as one of the versions of model trees, has a high capability to forecast continuous numerical attributes [39]. Two different steps are necessary to develop tree models. First, a splitting criterion is used to create a decision tree. This criterion is based on the standard deviation (SD) of the class values that reach a node, taken as a measure of the error. The standard deviation reduction (SDR) is given by:
$$ SDR = SD(T) - \sum_{i} \frac{|T_i|}{|T|} \times SD(T_i) $$
where T is the set of data that reaches the node and Ti is the ith subset of that data. After this split, the data in the child nodes have a lower SD than the data in the parent node, so M5 selects the split that is expected to maximize the error reduction. The main drawback of this step is the production of a large tree, which may cause overfitting. Pruning techniques should be employed in order to fix this problem and avoid overfitting. Therefore, the second step for developing M5 involves these pruning techniques and the substitution of subtrees with linear functions. By applying these two steps, M5 develops a linear model for each subspace.
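To make the splitting criterion concrete, the following Python sketch computes the SDR for one candidate split; it is an illustrative fragment only, not the full M5 algorithm, and the toy target values are invented for demonstration:

```python
import numpy as np

def sdr(parent_targets, subsets):
    """Standard deviation reduction: SD of the parent node minus the
    size-weighted SD of the candidate child subsets."""
    parent = np.asarray(parent_targets, dtype=float)
    weighted_child_sd = sum(
        len(s) / len(parent) * np.std(np.asarray(s, dtype=float))
        for s in subsets
    )
    return np.std(parent) - weighted_child_sd

# Toy example: one candidate split of the targets into two subsets.
targets = np.array([2.0, 2.5, 3.0, 10.0, 11.0, 12.5])
left, right = targets[:3], targets[3:]
print(sdr(targets, [left, right]))   # a large reduction indicates a good split
```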

3.2. Support Vector Regression (SVR)

SVM is a recognized technique for classification and regression [40]. Generally, regression-based SVM is called SVR. To solve complex problems effectively, SVR is constructed based on minimizing the structural risk. The ε-insensitive loss function allows the model to tolerate errors of up to ε in the training data. Thus, ε-SVR seeks a linear function as follows:
$$ F(x) = w^{T} x + b $$
where w is the weight vector and b is the bias coefficient. The regression task can be formulated as the following optimization problem:
$$ \min \ \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right) \quad \text{subject to} \quad \begin{cases} F(x_i) - y_i \le \varepsilon + \xi_i^{*} \\ y_i - F(x_i) \le \varepsilon + \xi_i \\ \xi_i, \ \xi_i^{*} \ge 0, \quad i = 1, 2, \ldots, N \end{cases} $$
where C > 0 is a penalty parameter that has to be selected beforehand; the constant C weights the empirical error. Moreover, ξi and ξi*, known as slack variables, indicate the distance between the real values and the corresponding boundary values of the ε-tube. Hence, by solving the above optimization problem, the regression function is obtained as [41,42]:
$$ f(x) = \sum_{i=1}^{N} \left( \alpha_i^{*} - \alpha_i \right) K(x, x_i) + B $$
where K(x, xi) is the kernel function, αi, αi* ≥ 0 are the Lagrange multipliers, and B is a bias term. The kernel trick is the approach used by SVR to solve this problem [43]. In this study, the widely implemented radial basis function (RBF) kernel is utilized for building an optimum SVR model. Fast convergence, good behavior in high-dimensional spaces, and simplicity are some advantages of the selected kernel [44]. The kernel function is as follows:
$$ K(x, x_i) = \exp\left( -\gamma \, \|x - x_i\|^{2} \right) $$
where γ is the bandwidth of the kernel function; C, γ, and ε are three parameters that must be specified in advance. In this research, the values of 1, 0.01, and 0.001, which are the defaults used in the WEKA software, were selected for C, γ, and ε, respectively. Figure 2 indicates the schematic configuration of the SVR model.
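As an illustration only (the study itself used WEKA rather than the library shown here), an RBF-kernel SVR with those parameter values could be set up as follows; the synthetic flow series and the train/test sizes are placeholders, not the study data:

```python
import numpy as np
from sklearn.svm import SVR

# One-step-ahead setup: predict Q_t from Q_{t-1} (input combination 1).
rng = np.random.default_rng(0)
q = np.cumsum(rng.normal(size=200)) + 20.0      # synthetic monthly flow series
X, y = q[:-1].reshape(-1, 1), q[1:]

# RBF-kernel SVR with the default parameter values cited in the text
# (C = 1, gamma = 0.01, epsilon = 0.001); illustrative, not tuned.
model = SVR(kernel="rbf", C=1.0, gamma=0.01, epsilon=0.001)
model.fit(X[:140], y[:140])
predictions = model.predict(X[140:])
```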

3.3. Fruit fly Optimization Algorithm (FOA)

FOA, which was introduced by Pan [45], is a swarm intelligence optimization algorithm that imitates the foraging behavior of fruit flies to search for the global optimum. Fruit flies can identify a smell from a distance of as much as 40 km and fly toward it. Figure 3 displays the food searching process carried out iteratively by the fruit fly. According to Pan [45], the following equations are used to generate the initial location of the fruit fly swarm within the location range (LR):
$$ X_0 = \mathrm{rand}(LR), \qquad Y_0 = \mathrm{rand}(LR) $$
where LR is the location range of the randomly initialized fruit fly swarm. Subsequently, the random search direction and distance for the foraging of each fruit fly are given by:
$$ X_i = X_0 + \mathrm{rand}(FR), \qquad Y_i = Y_0 + \mathrm{rand}(FR) $$
where FR is the random flight range. The smell concentration judgment value (S) can then be computed by:
$$ S_i = \frac{1}{\sqrt{X_i^{2} + Y_i^{2}}} $$
To improve the performance of river flow forecasting, FOA was implemented to choose optimized values of the three SVR parameters C, ε, and γ, which correspond to (SCi, Sεi, Sγi) (i.e., C = SCi, γ = Sγi, and ε = Sεi). It should be noted that the values of LR and FR were selected based on a trial and error procedure to minimize the prediction errors. The flowchart of the resulting procedure (FOASVR) is displayed in Figure 4.
The differences between the predicted and the actual values were evaluated by mean squared error (MSE) as presented in the equation below:
$$ MSE = \frac{1}{n} \sum_{i=1}^{n} \left( p_i - o_i \right)^{2} $$
where pi and oi are the ith predicted and observed values and n is the total number of data points. The fruit fly retains the best smell concentration value and the corresponding coordinates among the swarm and then flies toward that location. The process stops when the new result is no longer superior to that of the previous iteration, the iteration number reaches its maximum, or the prediction error reaches the predefined value. The optimal parameter values are thereby acquired, and the model attains its best performance with these values.
In this research, the data were normalized to the range between 0 and 1 because this helps to increase the accuracy of the model and its predictive performance [46]. Additionally, LR and FR were set to [0, 10] and [−1, 1], respectively; also, the maximum iteration number (maxgen) was set to 100, and a population size (sizepop) of 20 was selected in order to achieve reasonable efficiency. The LIBSVM toolbox was used to run SVR in this study.
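A minimal sketch of the described loop is given below, purely to illustrate the idea: it substitutes scikit-learn's SVR for the LIBSVM toolbox used in the study, collapses the three parameter swarms into a single simplified update, and uses invented helper names, so it should be read as an assumption-laden outline rather than the authors' implementation:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

def foa_svr(X_train, y_train, X_val, y_val,
            sizepop=20, maxgen=100, lr=(0.0, 10.0), fr=(-1.0, 1.0), seed=0):
    """Simplified fruit-fly optimization of the SVR parameters (C, epsilon, gamma).
    Each parameter keeps its own (X, Y) coordinate pair; the smell concentration
    S = 1 / sqrt(X^2 + Y^2) is used directly as the candidate parameter value."""
    rng = np.random.default_rng(seed)
    # Initial swarm locations within the location range LR (rows: C, epsilon, gamma).
    best_axis = rng.uniform(lr[0], lr[1], size=(3, 2))
    best_mse, best_params = np.inf, None

    for _ in range(maxgen):
        for _ in range(sizepop):
            # Random flight around the current best location (flight range FR).
            cand = best_axis + rng.uniform(fr[0], fr[1], size=(3, 2))
            dist = np.sqrt(np.sum(cand ** 2, axis=1))
            s = 1.0 / np.maximum(dist, 1e-9)          # smell concentration values
            c, eps, gamma = s
            model = SVR(kernel="rbf", C=c, epsilon=eps, gamma=gamma)
            model.fit(X_train, y_train)
            mse = mean_squared_error(y_val, model.predict(X_val))
            if mse < best_mse:                        # keep the best smell found so far
                best_mse, best_params, best_axis = mse, (c, eps, gamma), cand
    return best_params, best_mse
```

With maxgen = 100 and sizepop = 20 (the values quoted above), this sketch fits the SVR 2000 times, so smaller values are advisable for a quick check.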

4. Evaluation Parameters

In this study, different evaluation parameters were considered for scrutinizing the precision of the examined models for river flow forecasting.
As one of the widely-used statistical parameters, the root mean squared error (RMSE) measures the average amount of error (the difference between predicted and observed flows) appropriately, and it can be determined as follows:
$$ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Q_{p(i)} - Q_{o(i)} \right)^{2}} $$
where Qp(i), Qo(i), and n represent the predicted river flow, the observed river flow, and the number of observations, respectively.
The mean absolute error (MAE) measures the average magnitude of the errors and thus the closeness of the predictions to the observed flows. Lower MAE values represent more precise predictions of river flow, either equal or close to the observed values. It is calculated as follows:
$$ MAE = \frac{1}{n} \sum_{i=1}^{n} \left| Q_{p(i)} - Q_{o(i)} \right| $$
The correlation coefficient (R), which describes the degree of linear association between the simulated and observed river flows, ranges from −1 to 1 and is defined as follows:
$$ R = \frac{\sum_{i=1}^{n} Q_{o(i)} Q_{p(i)} - \frac{1}{n} \sum_{i=1}^{n} Q_{o(i)} \sum_{i=1}^{n} Q_{p(i)}}{\sqrt{\left( \sum_{i=1}^{n} Q_{o(i)}^{2} - \frac{1}{n} \left( \sum_{i=1}^{n} Q_{o(i)} \right)^{2} \right) \left( \sum_{i=1}^{n} Q_{p(i)}^{2} - \frac{1}{n} \left( \sum_{i=1}^{n} Q_{p(i)} \right)^{2} \right)}} $$
Also, the Bayesian information criterion (BIC) was utilized in order to select the best model parsimoniously, meaning that, among models with comparable accuracy, the one with fewer input parameters is preferred. BIC measures models relative to each other; the model with the best performance has the smallest BIC value [47]. It is given as follows:
$$ BIC = n \times \ln\left( \frac{RSS}{n} \right) + K \times \ln(n) $$
where K indicates the number of input parameters and residual sum of squares (RSS) can be determined as follows:
$$ RSS = \sum_{i=1}^{n} \left( Q_{p(i)} - Q_{o(i)} \right)^{2} $$
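For reference, the four statistics above translate directly into a few lines of Python; this is a plain transcription of the equations in this section, not code from the original study:

```python
import numpy as np

def evaluation_metrics(q_obs, q_pred, n_inputs):
    """RMSE, MAE, correlation coefficient R, and BIC as defined above.
    n_inputs (K) is the number of input parameters of the model."""
    q_obs, q_pred = np.asarray(q_obs, float), np.asarray(q_pred, float)
    n = len(q_obs)
    rmse = np.sqrt(np.mean((q_pred - q_obs) ** 2))
    mae = np.mean(np.abs(q_pred - q_obs))
    r = np.corrcoef(q_obs, q_pred)[0, 1]
    rss = np.sum((q_pred - q_obs) ** 2)
    bic = n * np.log(rss / n) + n_inputs * np.log(n)
    return {"RMSE": rmse, "MAE": mae, "R": r, "BIC": bic}
```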
Furthermore, a Taylor diagram (TD), which is a graphical illustration of the observed and forecasted data, was applied to inspect the precision of models [48]. The TD is able to encapsulate some characteristics of the predicted and observed flows at the same time. For instance, this diagram can illustrate RMSE, R, and SD between the forecasted and actual data simultaneously. In the TD, the azimuth angle, the radial distance from the origin, and radial distance from the observed data point denote the R-value, the ratio of the normalized SD, and the RMSE value of the prediction, respectively.

5. Results and Discussion

For evaluating the effects of previous monthly flows, three input combinations were established. Moreover, the periodicity effect was inspected by appending a component π (1 to 12 for each month), yielding the six combinations listed in Table 2.
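As an illustration of how these input combinations are assembled from a monthly series, the following sketch builds the feature matrix for any of the six cases in Table 2; the flow values and month indices shown are placeholders, not the study data:

```python
import numpy as np

def build_inputs(q, months, n_lags, use_periodicity):
    """Assemble the lagged-flow input matrix for one of the six combinations
    in Table 2: Q_{t-1} ... Q_{t-n_lags}, optionally with the month index pi."""
    rows, targets = [], []
    for t in range(n_lags, len(q)):
        features = [q[t - lag] for lag in range(1, n_lags + 1)]
        if use_periodicity:
            features.append(months[t])      # pi = 1..12 for January..December
        rows.append(features)
        targets.append(q[t])
    return np.asarray(rows), np.asarray(targets)

# Example: input combination (2) -> features (Q_{t-1}, pi), target Q_t.
q = np.arange(24, dtype=float)              # placeholder monthly flows
months = np.tile(np.arange(1, 13), 2)
X, y = build_inputs(q, months, n_lags=1, use_periodicity=True)
```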
The results of the statistical parameters for the studied techniques in the test phase for the Babarud station are given in Table 3. From this table, it is clear that the periodicity considerably increased each model's accuracy. For the FOASVR model, R increased from 0.63 (for input combination (1)) to 0.82 (for input combination (2)) and, similarly, the RMSE and MAE indices decreased from 5.74 to 4.36 and from 3.29 to 2.40, respectively. In the case of two previous discharge inputs, adding the periodicity component increased R from 0.70 to 0.80 and decreased RMSE and MAE from 5.33 to 4.50 and from 2.90 to 2.67, respectively. Finally, in the case of three previous discharge inputs, R increased from 0.67 to 0.79 and RMSE and MAE decreased from 5.69 to 4.58 and from 3.20 to 2.67, respectively.

Comparison of the FOASVR, M5, and SVR models indicated that the FOASVR-2 model, whose inputs are Qt−1 and π, had better accuracy than the M5 and SVR models. M5 also performed better than the SVR model. Overall, FOASVR performed better than SVR and M5. Also, FOA increased the accuracy of SVR by approximately 27% for RMSE and 38% for MAE in the second scenario, which performed roughly better (4% in RMSE and 14% in MAE) than M5. Without periodicity, FOASVR-3 indicated a 6% better performance than M5-3, and both models performed better than the SVR-5 model. The relative RMSE and MAE differences between the optimal non-periodic FOASVR model and the optimal periodic FOASVR model were 18.2% and 17.2%, respectively. From the BIC point of view, FOASVR-2, M5-2, and SVR-4, with values of 581.85, 597.55, and 701.18, respectively, had better performance in comparison with the other models, which means that these scenarios had parsimonious inputs (accurate results with fewer input parameters). So, for this station, input combination (2) was a reasonable choice.

The time variation of the observed and predicted river flows by the optimal periodic and non-periodic FOASVR, M5, and SVR models is illustrated in Figure 5, Figure 6, Figure 7 and Figure 8. It can be seen from Figure 5 and Figure 6 that all three periodic and non-periodic models considerably underestimate some peak flows. It seems that the precision of these models decreases with increasing flow rate. However, the superior accuracy of FOASVR and M5 over the SVR model can also be observed in these figures. Comparison of Figure 5 and Figure 6 clearly indicates that the periodic models approximate the observed river flows better than the non-periodic models. Figure 9 displays the scatter diagrams of the observed and predicted monthly river flows for each method. It is noticeable from the graphs that the SVR model performs worse than the other two methods, especially in the prediction of peak river flows. Comparison of the two figures reveals that the periodic models produce more accurate estimates than the non-periodic models. Also, this figure indicates that all models (periodic and non-periodic) overestimate some low flows, and the periodic models perform worse than the non-periodic ones in estimating peak flows. This may be because the peak flows do not have any high correlation with the time of the year (i.e., the periodicity value).
The test statistics of the FOASVR, M5, and SVR models for the Vaniar station are provided in Table 4. The encouraging influence of the periodicity component on the models' precision is clearly seen for this station as well. For the FOASVR model, R increased from 0.57 (for input combination (1)) to 0.79 (for input combination (2)) and, similarly, the RMSE and MAE values decreased from 8.78 to 6.58 and from 4.77 to 3.86, respectively. In the case of two previous discharge inputs, when the periodicity component was added, R increased from 0.55 to 0.80 and RMSE and MAE decreased from 8.88 to 6.48 and from 4.97 to 3.75, respectively. Finally, in the case of three previous discharge inputs, R increased from 0.55 to 0.81 and the RMSE and MAE values decreased from 8.99 to 6.33 and from 5.53 to 3.71, respectively.

Comparison of the three models reveals that the optimal FOASVR-6 model, whose inputs are Qt−1, Qt−2, Qt−3, and π, performed better than the optimal M5-2 comprising Qt−1 and π inputs, and both performed better than the optimal SVR-6 model, whose inputs are the same as those of FOASVR-6. Generally, FOASVR performed better than the SVR and M5 models; moreover, applying FOA increased the accuracy of SVR by 29.7% and 30.4% in terms of RMSE and MAE, respectively, in the optimal scenario (FOASVR-6); also, FOASVR showed 16.8% and 19.7% better performance than M5 in terms of RMSE and MAE, respectively, for this scenario. Without the periodicity component, the optimal FOASVR-1 model performed better than the optimal M5-1 and SVR-3 models. The relative RMSE and MAE differences between the optimal non-periodic FOASVR-1 model and its periodic counterpart were 25.1% and 19.1%, respectively. The best BIC values at this station were obtained by FOASVR-6 with 703.64, M5-2 with 740.34, and SVR-2 with 825.05. Given that FOASVR-6 was closely followed by FOASVR-4 with a value of 707.09 and FOASVR-2 with a value of 707.53, it is preferable to choose a combination with fewer input parameters. Thus, the input parameters Qt−1 and π were selected as the parsimonious scenario for this station, as for the previous station.

Figure 7 and Figure 8 demonstrate the time variation of the observed and predicted river flows by the optimal periodic and non-periodic FOASVR, M5, and SVR models. As found for the Babarud station, here also the three periodic and non-periodic models underestimate some peak flows. Comparison of Figure 7 and Figure 8 confirms that appending the periodicity component as an input increases the estimation capacity of the models. The scatter plots of the observed and predicted monthly river flows by each method are shown in Figure 9. As with the previous station, FOASVR and M5 perform better than the SVR model, especially in the prediction of peak river flows. This figure also indicates that the estimates of the periodic models are more accurate. According to Figure 9, as at the Babarud station, the models overestimate the low flows at the Vaniar station, so the forecasts shift from overestimation to underestimation with increasing flow rate.
Furthermore, TDs were utilized to examine the SD and R values for the FOASVR, M5, and SVR models. Figure 10 exhibits the TDs for all models, where the distance from the reference (green) point represents the centered RMSE. It can be seen from Figure 10 that FOASVR (the yellow point) provided relatively precise predictions of river flow at both stations.

6. Conclusions

In the current study, three different data-driven techniques, FOASVR, M5, and SVR, were compared for forecasting river flow one month ahead at two stations located in the Lake Urmia Basin of Iran. Comparison of the three periodic models revealed that the periodic FOASVR model had better accuracy than the periodic M5 and SVR models. M5 was also found to achieve more suitable results than the SVR model. Similarly, the comparison of the non-periodic models showed that the optimal FOASVR also had better performance than the M5 and SVR models. It was shown that appending a periodicity component significantly increases the models' accuracy in forecasting monthly river flows at both stations. For the Babarud station, the relative RMSE and MAE differences between the optimal periodic and non-periodic FOASVR models were found to be 18.2% and 17.2%, respectively. For the Vaniar station, the periodicity component decreased the RMSE and MAE values of the optimal FOASVR models by 27.9% and 22.2%, respectively. According to BIC, the second input combination (Qt−1 and π) was selected as the parsimonious input set for FOASVR, with values of 581.85 and 707.53 for the Babarud and Vaniar stations, respectively. Generally, the FOASVR models performed better than the other two methods in forecasting monthly river flows. All methods had some difficulties in forecasting peak river flows, although the FOASVR models provided better forecasts in this respect. The presented advancement in river flow prediction can greatly empower operational river management to make better decisions and policies. The hybrid FOASVR model shows promising results in building accurate models for river flow prediction.

Author Contributions

S.S. (Saeed Samadianfard) and S.S. (Shahaboddin Shamshirband) designed the research; S.J. and E.S. collected hydrological data and interpreted the results, S.S. (Saeed Samadianfard), S.J. and A.M. wrote the initial draft; S.S. (Saeed Samadianfard), S.S. (Shahaboddin Shamshirband) and S.A. revised the manuscript; all authors contributed to the final manuscript.

Funding

This research received no external funding.

Acknowledgments

We acknowledge all three reviewers and the Editor-in-Chief for their critical comments, which have improved the clarity of our final paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Onyari, E.; Ilunga, F. Application of MLP neural network and M5P model tree in predicting streamflow: A case study of Luvuvhu catchment, South Africa. In Proceedings of the International Conference on Information and Multimedia Technology (ICMT), Hong Kong, China, 28–30 December 2010; pp. 156–160.
2. Harrigan, S.; Prudhomme, C.; Parry, S.; Smith, K.; Tanguy, M. Benchmarking ensemble streamflow prediction skill in the UK. Hydrol. Earth Syst. Sci. 2018, 22, 2023–2039.
3. Muhammad, A.; Stadnyk, T.; Unduche, F.; Coulibaly, P. Multi-model approaches for improving seasonal ensemble streamflow prediction scheme with various statistical post-processing techniques in the Canadian Prairie region. Water 2018, 10, 1604.
4. Bou-Fakhreddine, B.; Mougharbel, I.; Faye, A.; Chakra, S.A.; Pollet, Y. Daily river flow prediction based on Two-Phase Constructive Fuzzy Systems Modeling: A case of hydrological–meteorological measurements asymmetry. J. Hydrol. 2018, 558, 255–265.
5. Baydaroğlu, Ö.; Koçak, K.; Duran, K. River flow prediction using hybrid models of support vector regression with the wavelet transform, singular spectrum analysis and chaotic approach. Meteorol. Atmos. Phys. 2018, 130, 349–359.
6. Fernando, A.K.; Shamseldin, A.Y.; Abrahart, B.J. River Flow Forecasting Using Gene Expression Programming Models. In Proceedings of the 10th International Conference on Hydroinformatics HIC 2012, Hamburg, Germany, 14–18 July 2012.
7. Rehana, S. River Water Temperature Modelling Under Climate Change Using Support Vector Regression. In Hydrology in a Changing World; Springer: Cham, Switzerland, 2019; pp. 171–183.
8. Yaseen, Z.M.; Sulaiman, S.O.; Deo, R.C.; Chau, K.W. An enhanced extreme learning machine model for river flow forecasting: State-of-the-art, practical applications in water resource engineering area and future research direction. J. Hydrol. 2018, 569, 387–408.
9. Azad, A.; Farzin, S.; Kashi, H.; Sanikhani, H.; Karami, H.; Kisi, O. Prediction of river flow using hybrid neuro-fuzzy models. Arab. J. Geosci. 2018, 11, 718.
10. Mosavi, A.; Ozturk, P.; Chau, K.W. Flood prediction using machine learning models: Literature review. Water 2018, 10, 1536.
11. Choubin, B.; Moradi, E.; Golshan, M.; Adamowski, J.; Sajedi-Hosseini, F.; Mosavi, A. An Ensemble prediction of flood susceptibility using multivariate discriminant analysis, classification and regression trees, and support vector machines. Sci. Total Environ. 2019, 651, 2087–2096.
12. Qasem, S.N.; Samadianfard, S.; Nahand, H.S.; Mosavi, A.; Shamshirband, S.; Chau, K.W. Estimating Daily Dew Point Temperature Using Machine Learning Algorithms. Water 2019, 11, 582.
13. Tongal, H.; Booij, M.J. Simulation and forecasting of streamflows using machine learning models coupled with base flow separation. J. Hydrol. 2018, 564, 266–282.
14. Darwen, P.J. Bayesian model averaging for river flow prediction. Appl. Intell. 2019, 49, 103–111.
15. Bhattacharya, B.; Solomatine, D.P. Neural networks and M5 model trees in modeling water level–discharge relationship. Neurocomputing 2005, 63, 381–396.
16. Bhattacharya, B.; Solomatine, D.P. Machine learning in sedimentation modeling. Neural Netw. 2006, 19, 208–214.
17. Khan, A.S.; See, L. Rainfall-Runoff Modeling Using Data-Driven and Statistical Methods; Institute of Electrical and Electronics Engineers (IEEE): Piscataway, NJ, USA, 2006.
18. Siek, M.; Solomatine, D.P. Tree-like machine learning models in hydrologic forecasting: Optimality and expert knowledge. Geophys. Res. Abstr. 2007, 9, 2–5.
19. Stravs, L.; Brilly, M. Development of a low flow forecasting model using the M5 machine learning method. Hydrol. Sci. 2007, 52, 466–477.
20. Samadianfard, S.; Nazemi, A.H.; Sadraddini, A.A. M5 model tree and gene expression programming based modeling of sandy soil water movement under surface drip irrigation. Agric. Sci. Dev. 2014, 3, 178–190.
21. Samadianfard, S.; Sattari, M.T.; Kisi, O.; Kazemi, H. Determining flow friction factor in irrigation pipes using data mining and artificial intelligence approaches. Appl. Artif. Intell. 2014, 28, 793–813.
22. Esmaeilzadeh, B.; Sattari, M.T.; Samadianfard, S. Performance evaluation of ANNs and an M5 model tree in Sattarkhan Reservoir inflow prediction. ISH J. Hydraul. Eng. 2017, 23, 283–292.
23. Londhe, S.N.; Dixit, P.R. Forecasting Stream Flow Using Model Trees. Int. J. Earth Sci. Eng. 2011, 4, 282–285.
24. Sattari, M.T.; Pal, M.; Apaydin, H.; Ozturk, F. M5 Model Tree Application in Daily River Flow Forecasting in Sohu Stream, Turkey. Water Resour. 2013, 40, 233–242.
25. Liong, S.Y.; Sivapragasam, C. Flood stage forecasting with support vector machines. J. AWRA 2002, 38, 173–186.
26. Yu, X.Y.; Liong, S.Y.; Babovic, V. EC-SVM approach for realtime hydrologic forecasting. J. Hydroinf. 2004, 6, 209–233.
27. Kalteh, A.M. Monthly river flow forecasting using artificial neural network and support vector regression models coupled with wavelet transform. Comput. Geosci. 2013, 54, 1–8.
28. Wu, C.L.; Chau, K.W.; Li, Y.S. River stage prediction based on a distributed support vector regression. J. Hydrol. 2008, 358, 96–111.
29. Londhe, S.; Gavraskar, S.S. Forecasting One Day Ahead Stream Flow Using Support Vector Regression. Aquat. Procedia 2015, 4, 900–907.
30. Cao, G.; Wu, L. Support vector regression with fruit fly optimization algorithm for seasonal electricity consumption forecasting. Energy 2016, 115, 734–745.
31. Lijuan, W.; Guohua, C. Seasonal SVR with FOA algorithm for single-step and multi-step ahead forecasting in monthly inbound tourist flow. Knowl.-Based Syst. 2016, 110, 157–166.
32. Kurup, P.U.; Dudani, N.K. Neural networks for profiling stress history of clays from PCPT data. J. Geotech. Geoenviron. Eng. 2014, 128, 569–579.
33. Pal, M. M5 model tree for land cover classification. Int. J. Remote Sens. 2006, 27, 825–831.
34. Samadianfard, S.; Delirhasannia, R.; Kisi, O.; Agirre-Basurko, E. Comparative analysis of ozone level prediction models using gene expression programming and multiple linear regression. Geofizika 2013, 30, 43–74.
35. Deo, R.C.; Ghorbani, M.A.; Samadianfard, S.; Maraseni, T.; Bilgili, M.; Biazar, M. Multi-layer perceptron hybrid model integrated with the firefly optimizer algorithm for windspeed prediction of target site using a limited set of neighboring reference station data. Renew. Energy 2018, 116, 309–323.
36. Samadianfard, S.; Asadi, E.; Jarhan, S.; Kazemi, H.; Kheshtgar, S.; Kisi, O.; Sajjadi, S.; Abdul Manaf, A. Wavelet neural networks and gene expression programming models to predict short-term soil temperature at different depths. Soil Tillage Res. 2018, 175, 37–50.
37. Available online: https://earth.google.com/web/@32.205151,53.07029487,2852.42968574a,2667368.97567809d,35y,0.11753984h,16.72644158t,-0r (accessed on 21 February 2019).
38. Witten, I.H.; Frank, E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations; Morgan Kaufmann: San Francisco, CA, USA, 2005.
39. Quinlan, J.R. Learning with continuous classes. In Proceedings of the Fifth Australian Joint Conf. on Artificial Intelligence, Hobart, Tasmania, 16–18 November 1992; Adams, A., Sterling, L., Eds.; World Scientific: Singapore, 1992; pp. 343–348.
40. Vapnik, V. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 1995.
41. Gunn, S.R. Support Vector Machines for Classification and Regression, Technical Report; University of Southampton: Southampton, UK, 1998.
42. Cimen, M. Estimation of daily suspended sediments using support vector machines. Hydrol. Sci. J. 2008, 53, 656–666.
43. Smola, A.J.; Scholkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222.
44. Wu, C.-H.; Tzeng, G.-H.; Lin, R.-H. A Novel hybrid genetic algorithm for kernel function and parameter optimization in support vector regression. Expert Syst. Appl. 2009, 36, 4725–4735.
45. Pan, W.-T. A new Fruit Fly Optimization Algorithm: Taking the financial distress model as an example. Knowl.-Based Syst. 2012, 26, 69–74.
46. Chang, C.-C.; Lin, C.-J. Training v-support vector classifiers: Theory and algorithms. Neural Comput. 2001, 13, 2119–2147.
47. Burnham, K.P.; Anderson, D.R. Model Selection and Inference: A Practical Information-Theoretic Approach, 2nd ed.; Springer: New York, NY, USA, 2002.
48. Taylor, K.E. Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res. Atmos. 2001, 106, 7183–7192.
Figure 1. Babarud and Vaniar stations, located at Lake Urmia Basin [37].
Figure 2. Schematic configuration of the SVR model.
Figure 3. Food searching process utilized by the fruit fly iteratively.
Figure 4. FOASVR flowchart.
Figure 5. The observed and forecasted monthly river flows for Babarud station without periodicity.
Figure 6. The observed and forecasted monthly river flows for Babarud station with periodicity.
Figure 7. The observed and forecasted monthly river flows for Vaniar station without periodicity.
Figure 8. The observed and forecasted monthly river flows for Vaniar station with periodicity.
Figure 9. The scatter plots of the observed and forecasted monthly river flows. (a) Babarud station without periodicity, (b) Babarud station with periodicity, (c) Vaniar station without periodicity, (d) Vaniar station with periodicity.
Figure 10. Taylor diagrams (TDs) of the monthly predicted river flow. (a) Babarud station, (b) Vaniar station.
Table 1. Statistical parameters of the implemented data (Xmean, Xmax, Xmin, Sx, Csx, r1, r2, r3 denote the overall mean, maximum, minimum, standard deviation, skewness, and lag-1, lag-2, lag-3 auto-correlation coefficients, respectively).

Station | Data Set | Xmean (m3/s) | Xmax (m3/s) | Xmin (m3/s) | Sx (m3/s) | Csx | r1 | r2 | r3
Babarud | Training data | 8.75 | 66.50 | 0.00 | 9.63 | 2.05 | 0.70 | 0.25 | −0.07
Babarud | Testing data | 4.71 | 43.27 | 0.00 | 7.37 | 2.54 | 0.59 | 0.14 | −0.12
Babarud | Entire data | 7.74 | 66.50 | 0.00 | 9.28 | 2.13 | 0.69 | 0.25 | −0.05
Vaniar | Training data | 14.28 | 178.29 | 0.00 | 21.35 | 2.94 | 0.62 | 0.15 | −0.11
Vaniar | Testing data | 5.66 | 65.30 | 0.00 | 10.50 | 3.02 | 0.50 | 0.11 | −0.05
Vaniar | Entire data | 12.13 | 178.29 | 0.00 | 19.58 | 3.19 | 0.63 | 0.18 | −0.07
Table 2. Input parameters of the established models.

Model | Input Parameters | Output Parameter
1 | Qt−1 | Qt
2 | Qt−1, π | Qt
3 | Qt−1, Qt−2 | Qt
4 | Qt−1, Qt−2, π | Qt
5 | Qt−1, Qt−2, Qt−3 | Qt
6 | Qt−1, Qt−2, Qt−3, π | Qt
Table 3. The evaluation parameters of studied models in the test period for Babarud Station.

Model Input | Model | RMSE (m3/s) | MAE (m3/s) | R | BIC
Qt−1 | SVR-1 | 6.10 | 4.10 | 0.59 | 706.88
Qt−1 | M5-1 | 5.94 | 3.62 | 0.61 | 696.57
Qt−1 | FOASVR-1 | 5.74 | 3.29 | 0.63 | 683.28
Qt−1, π | SVR-2 | 5.97 | 3.88 | 0.61 | 703.79
Qt−1, π | M5-2 | 4.54 | 2.73 | 0.80 | 597.55
Qt−1, π | FOASVR-2 | 4.36 | 2.40 | 0.82 | 581.85
Qt−1, Qt−2 | SVR-3 | 5.98 | 4.04 | 0.62 | 704.44
Qt−1, Qt−2 | M5-3 | 5.79 | 3.49 | 0.68 | 691.92
Qt−1, Qt−2 | FOASVR-3 | 5.33 | 2.90 | 0.70 | 659.80
Qt−1, Qt−2, π | SVR-4 | 5.85 | 3.83 | 0.64 | 701.18
Qt−1, Qt−2, π | M5-4 | 4.55 | 2.83 | 0.80 | 603.67
Qt−1, Qt−2, π | FOASVR-4 | 4.50 | 2.67 | 0.80 | 599.39
Qt−1, Qt−2, Qt−3 | SVR-5 | 5.91 | 3.90 | 0.62 | 705.14
Qt−1, Qt−2, Qt−3 | M5-5 | 5.79 | 3.50 | 0.68 | 697.18
Qt−1, Qt−2, Qt−3 | FOASVR-5 | 5.69 | 3.20 | 0.67 | 690.42
Qt−1, Qt−2, Qt−3, π | SVR-6 | 5.82 | 3.77 | 0.64 | 704.46
Qt−1, Qt−2, Qt−3, π | M5-6 | 4.54 | 2.84 | 0.80 | 608.09
Qt−1, Qt−2, Qt−3, π | FOASVR-6 | 4.58 | 2.67 | 0.79 | 611.49
Table 4. The evaluation parameters of studied models in the test period for Vaniar Station.

Model Input | Model | RMSE (m3/s) | MAE (m3/s) | R | BIC
Qt−1 | SVR-1 | 9.33 | 5.60 | 0.50 | 831.52
Qt−1 | M5-1 | 9.57 | 5.44 | 0.54 | 840.91
Qt−1 | FOASVR-1 | 8.78 | 4.77 | 0.57 | 809.04
Qt−1, π | SVR-2 | 9.04 | 5.31 | 0.52 | 825.05
Qt−1, π | M5-2 | 7.19 | 4.46 | 0.77 | 740.34
Qt−1, π | FOASVR-2 | 6.58 | 3.86 | 0.79 | 707.53
Qt−1, Qt−2 | SVR-3 | 9.21 | 5.57 | 0.53 | 831.95
Qt−1, Qt−2 | M5-3 | 9.80 | 5.46 | 0.59 | 854.92
Qt−1, Qt−2 | FOASVR-3 | 8.88 | 4.97 | 0.55 | 818.45
Qt−1, Qt−2, π | SVR-4 | 8.96 | 5.33 | 0.54 | 826.99
Qt−1, Qt−2, π | M5-4 | 7.58 | 4.64 | 0.76 | 765.10
Qt−1, Qt−2, π | FOASVR-4 | 6.48 | 3.75 | 0.80 | 707.09
Qt−1, Qt−2, Qt−3 | SVR-5 | 9.22 | 5.73 | 0.52 | 837.57
Qt−1, Qt−2, Qt−3 | M5-5 | 9.79 | 5.55 | 0.60 | 859.76
Qt−1, Qt−2, Qt−3 | FOASVR-5 | 8.99 | 5.53 | 0.55 | 828.22
Qt−1, Qt−2, Qt−3, π | SVR-6 | 9.01 | 5.53 | 0.53 | 834.27
Qt−1, Qt−2, Qt−3, π | M5-6 | 7.61 | 4.62 | 0.75 | 771.78
Qt−1, Qt−2, Qt−3, π | FOASVR-6 | 6.33 | 3.71 | 0.81 | 703.64

Share and Cite

MDPI and ACS Style

Samadianfard, S.; Jarhan, S.; Salwana, E.; Mosavi, A.; Shamshirband, S.; Akib, S. Support Vector Regression Integrated with Fruit Fly Optimization Algorithm for River Flow Forecasting in Lake Urmia Basin. Water 2019, 11, 1934. https://doi.org/10.3390/w11091934