Article

Groundwater Recharge Prediction Using Linear Regression, Multi-Layer Perception Network, and Deep Learning

1
College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
2
CSIRO Land and Water, Glen Osmond, SA 5064, Australia
3
Faculty of Education, Tianjin Normal University, Tianjin 300387, China
4
CSIRO Land and Water, Private Bag 5, Wembley, WA 6913, Australia
*
Author to whom correspondence should be addressed.
Water 2019, 11(9), 1879; https://doi.org/10.3390/w11091879
Submission received: 8 July 2019 / Revised: 3 September 2019 / Accepted: 5 September 2019 / Published: 10 September 2019

Abstract

As the largest freshwater storage in the world, groundwater plays an important role in maintaining ecosystems and helping humans adapt to climate change. However, groundwater dynamics, such as groundwater recharge, cannot be measured directly and are influenced by spatially and temporally complex processes. Models are therefore required to capture the dynamics and provide scientific advice for decision-making. This paper developed, evaluated and compared linear regression, multi-layer perceptron (MLP) and Long Short-Term Memory (LSTM) models for predicting groundwater recharge. The experimental dataset consists of time series of annual recharge from 1970 to 2012, based on water table fluctuation estimates from 465 bores in the states of South Australia and Victoria, Australia. We identified the factors that influenced groundwater recharge and found that rainfall had the strongest correlation with groundwater recharge. The linear regression model had the poorest fitting performance, with a root mean squared error (RMSE) greater than 0.19 for every proportion of training data considered. The MLP model outperformed linear regression in prediction capability, achieving RMSE = 0.11 when 80% of the data was used for training. The LSTM model performed best, with RMSE below 0.12 for every proportion of training data. The relative importance of influential predictors was evaluated using the three models.

1. Introduction

Robust groundwater recharge estimates are a primary requirement for effective management of water resources and sustainable use of groundwater [1], which plays an important role in the sustainable development of regional societies and economies [2,3]. Groundwater recharge is one of the most difficult components of the water balance to estimate since it cannot be directly measured [4,5] and it is influenced by spatially and temporally complex processes. Models are usually required to help stakeholders understand groundwater recharge, identify the key processes influencing the rate of groundwater recharge, and to inform pathways for sustainable water resources management [6].
Previous studies on groundwater recharge have outlined numerous methods for estimating recharge, including chemical tracers [7,8,9,10,11], physical methods [12,13,14,15,16,17] and mathematical approaches [18,19]. The most widely used chemical tracer method is the chloride mass-balance method (CMB) [7,8,9,10], because it is conceptually simple and inexpensive to implement. However, the CMB method cannot estimate the negative component (groundwater evapotranspiration) of net recharge [1], and therefore cannot be applied to groundwater discharge regions. Physical methods include water balance estimation (WB) [15,16,17] and the water-table fluctuation method (WTF) [4,12,13,14]. The WB method provides an estimate of net recharge (a combination of recharge to and evapotranspiration from the groundwater), while the WTF method provides an estimate of gross recharge [20]. These methods provide different estimates of recharge, as they include or neglect factors such as runoff, evapotranspiration and changes in soil moisture, and represent either instantaneous estimates or historical averages. In mathematical simulation and statistical methods, linear regression is often adopted for estimating groundwater recharge [7,18,19]. Crosbie et al. [7] used regression kriging to predict regional groundwater recharge across the Australian continent. Global regression equations were applied to data-sparse areas while kriging of regression equation residuals was used for data-dense areas. Fu et al. [18] also used multiple linear regression models together with 71 climate variables and 17 non-climate variables to analyze the groundwater recharge in South Australia. Mathematical simulation models can provide new insights into the factors that affect groundwater recharge.
Multiple linear regression (MLR) is a linear approach for modeling the relationship between input parameters and resulting metrics. In recent studies, this method has been applied to model and analyze groundwater recharge and related variables. Mogaji et al. [21] estimated and predicted groundwater recharge rates based on the relationship between rainfall and geophysical parameters in the southern part of Perak, Malaysia. Figura et al. [22] predicted groundwater temperatures in several aquifers in Switzerland based on the relationship between observed groundwater and regional air temperature. Ebrahimi et al. [23] simulated groundwater level variations in the Qom plain, Iran, using linear regression, neural network and support vector machine models.
Artificial neural networks (ANNs) enhance the expressive ability of a system through a collection of connected nodes called artificial neurons. Shamshirband et al. [24] proposed a multi-wavelet ANN for forecasting chlorophyll a concentration. More recently, the ANN method has been used to solve groundwater-related problems [25,26,27]. Mohanty et al. [28] applied an ANN model to the weekly prediction of groundwater levels in various bores based on expert knowledge and statistical analysis. An ANN was also used to predict the groundwater level in a swamp forest in Singapore based on rainfall and surrounding reservoir levels [29]. Pasandi et al. [30] applied an ANN to estimate water-table depth in Shibkooh, Iran, using various ancillary data, such as aquifer bed elevation and aquifer thickness.
Recent studies have shown that deep learning has broad prospects in groundwater studies, and deep neural networks have been shown to be suitable for groundwater management. Kong-A-Siou et al. [31] proposed a recurrent multilayer perceptron for predicting the water table level using rainfall and pumping discharge data. Jiang et al. [32] applied a super-resolution convolutional neural network for classifying paleovalleys, which are significant in groundwater exploration as productive aquifers are often formed there. In recent years, computational advances in processing speed and data storage have made numerically intensive analyses possible at large scales and at a declining cost. Models based on machine learning [33] and deep learning [34] are widely used in many fields, such as forest cover projection [34], climate forecasting [35], and flood and typhoon forecasting [36]. The typical regression methods used in these studies, such as linear regression, neural networks, and deep learning, have potential for improving the performance in predicting groundwater recharge dynamics.
This paper developed, evaluated and compared linear regression, multi-layer perceptron (MLP) and Long Short-Term Memory (LSTM) models for predicting groundwater recharge, based on water table fluctuation estimates from 465 bores from 1970 to 2012. The main purpose of this study was to estimate groundwater recharge in an area straddling the South Australian and Victorian border using the three machine learning methods. Machine learning approaches, especially deep learning, have the potential to improve the non-linear expressive ability of a system, improving both the performance and the stability of the model. In Section 2, we describe the research domain and datasets used in this research. In Section 3, three time series models adopted for predicting regional groundwater recharge are presented. In Section 4, we first analyze the correlation coefficients of influential predictors for groundwater recharge estimation. Subsequently, we examine the efficiency of the three models on temporal prediction of groundwater recharge. Finally, we measure the relative importance of influential predictors using the three models. The findings are discussed and summarized in Section 5 and Section 6, respectively.

2. Datasets

2.1. Study Area

Our research area is located in the Otway and Murray Basins in south-eastern South Australia, referred to as the South East (Figure 1). The area is characterized by the Tertiary confined sand aquifer known as the Dilwyn Formation, covering an area of 29,000 km² [4]. It is overlain by the unconfined Gambier/Murray Group limestone aquifer. The area is relatively flat, with the highest altitude in the north-east part of the region and land generally sloping downward south and west towards the coast. The surface in places is undulating owing to the dune/flat systems from Pleistocene marine transgressions [4].
The region has a typical Mediterranean climate, with hot dry summers and cool wet winters [37]. The highest annual precipitation is in the southern part of the region, gradually decreasing in the inland area. Annual precipitation is less than annual potential evapotranspiration in almost all parts of the study area [38]. Leaney et al. [39] found that based on the karstic features of the landscape, any runoff could quickly infiltrate into the groundwater. Crosbie et al. [4] reported that most of the groundwater recharge in the region occurred during winter due to the lower potential evapotranspiration and higher rainfall at that time.

2.2. Groundwater Recharge and Potential Variable Datasets

In this study, time series data of annual groundwater recharge developed by Crosbie et al. [4] were used. The groundwater level data in the experiment were evaluated from monthly or semi-annual measurements and used to estimate groundwater recharge using the water-table fluctuation (WTF) method. This method provides an estimate of groundwater recharge through the analysis of water-level fluctuations in groundwater observation wells [40,41].
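As a rough illustration of the idea behind the WTF method, the sketch below uses its simplest textbook form, in which recharge over a period is the specific yield multiplied by the water-table rise. Summing all positive level changes is a simplifying assumption made here for illustration, and the specific yield value is hypothetical; this is not the procedure of Crosbie et al. [4].

```python
def wtf_recharge(levels_m, specific_yield):
    """Recharge (m) as Sy times the summed water-table rises in a record.

    levels_m: water-table elevations in metres, in time order.
    specific_yield: dimensionless Sy (hypothetical value supplied by caller).
    ASSUMPTION: every positive change between readings counts as a rise.
    """
    rises = [b - a for a, b in zip(levels_m, levels_m[1:]) if b > a]
    return specific_yield * sum(rises)


# Hypothetical bore record with rises of 0.5 m and 0.25 m, and Sy = 0.1
annual_recharge_m = wtf_recharge([10.0, 10.5, 10.25, 10.5], 0.1)
```

With monthly or semi-annual readings, as in the dataset described above, the function would be applied to the readings of each year at each bore.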
The development of the groundwater recharge data used in this paper was fully described by Crosbie et al. [42], and the data have already been applied in several studies, such as [4,20]. The dataset includes recharge data from 465 groundwater bores, mostly located in South Australia with the remaining few in Victoria. Figure 1 shows the location of the study area in Australia and the spatial distribution of the 465 bores. The length of the groundwater recharge time series varies from 3 to 41 years, depending on the length of the groundwater observation records. The dataset is suitable for regional long-term average studies of groundwater recharge, because more than 70% of bores have an observation record of more than 15 years, and these bores are distributed throughout the study area. The atmospheric demand variables and groundwater extraction dataset were also used for predicting groundwater recharge.
The SILO (Scientific Information for Land Owners) Drill data [43] used in this study consist of 0.05° gridded daily data of atmospheric variables across Australia, shown in Table 1. SILO datasets are constructed from observational records provided by the Bureau of Meteorology (BoM), which have been processed to infill missing data with interpolated values using smoothing-spline and kriging algorithms. Rainfall, maximum temperature and minimum temperature are directly measured variables, while actual evaporation and Morton actual evapotranspiration were derived from pan evaporation and other measured variables [43]. The data were all sampled daily from an interpolated dataset provided by SILO [43]. All the data can be accessed at https://legacy.longpaddock.qld.gov.au/silo/about.html.
Actual evaporation was derived from Class-A pan evaporation. A monthly (or seasonal) actual evaporation value was calculated by summing daily pan evaporation over the month (or season). Similarly, annual actual evaporation was calculated by summing monthly actual evaporation values over the corresponding year. The Morton actual evapotranspiration (MAET) was calculated from the complementary relationship between areal potential evapotranspiration (APET) and point potential evapotranspiration (PPET) in [44]. In Morton's model [45], APET was estimated using a modified Priestley-Taylor equation [44] based on the psychrometric constant, atmospheric pressure, the slope of the saturation vapour pressure curve and net radiation at equilibrium temperature. PPET was estimated by solving the energy and vapour transfer equations simultaneously [44], based on air temperature, equilibrium temperature, net radiation at air temperature, saturation vapour pressure and actual vapour pressure.
In this study, the data from 1970 to 2012 were considered. Maximum temperature and minimum temperature were aggregated to a monthly average, and rainfall and evaporation were aggregated to monthly totals. The monthly quantities were used to capture the seasonal features of various predictors and their impacts on groundwater recharge. The study area in the paper was the same as that in Fu et al. [18]. Fu et al. analyzed extreme rainfall variables containing 99th percentile of rainfall, 95th percentile of rainfall and maximum daily rainfall. They found that there was no close relationship between the above three finer scale predictors and groundwater recharge. Therefore, we did not select finer scale predictors, except RD (rainfall days greater than 1 mm) and RI (rainfall intensity) for groundwater recharge analysis. Compared with Fu et al. [18], the novel contribution of the paper was introducing the machine learning and deep learning methods for predicting groundwater recharge. In addition, 465 bores in the states of South Australia and Victoria were used in the paper instead of 426 bores only in the state of South Australia in Fu et al. [18].
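The aggregation described above (monthly totals for rainfall and evaporation, monthly means for temperature, and seasonal sums such as April to October) can be sketched as follows. This is a minimal illustration on synthetic daily values; the function names are hypothetical and the code is not taken from the study.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_aggregate(days, values, how):
    """Group daily values by (year, month); how is 'sum' or 'mean'."""
    groups = defaultdict(list)
    for d, v in zip(days, values):
        groups[(d.year, d.month)].append(v)
    if how == "sum":
        return {k: sum(v) for k, v in groups.items()}
    if how == "mean":
        return {k: sum(v) / len(v) for k, v in groups.items()}
    raise ValueError("how must be 'sum' or 'mean'")


def seasonal_total(monthly, year, first_month, last_month):
    """Sum monthly totals from first_month to last_month (inclusive)."""
    return sum(monthly.get((year, m), 0.0)
               for m in range(first_month, last_month + 1))


# Synthetic example: 1 mm of rain and a 20 degree maximum every day of 1970.
days = [date(1970, 1, 1) + timedelta(days=i) for i in range(365)]
rain_monthly = monthly_aggregate(days, [1.0] * 365, "sum")
tmax_monthly = monthly_aggregate(days, [20.0] * 365, "mean")
rainfall_4_10 = seasonal_total(rain_monthly, 1970, 4, 10)  # April to October
```

Totals are used for flux-like variables and means for temperatures, mirroring the distinction made in the text.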
The potential influential predictors are shown in Table 1. These are yearly time series from 1970 to 2012 averaged over the study area, calculated as follows. First, daily or monthly rainfall and evaporation values were summed over 12 months (such as Rainfall and ET) or over several months (such as Rainfall4-10 or ET5-9) for each year from 1970 to 2012 at each bore. Daily or monthly maximum and minimum temperature values were averaged across 12 months or several months for each year at each bore. The average and maximum wet-spell and dry-spell lengths in days were calculated from the corresponding daily values for each year at each bore. Subsequently, the regional value of each predictor was obtained by averaging across all bores available in the study area for each year (for early years, data for some bores are not available). The time series of influential predictors were obtained using the same method as that for groundwater recharge. That is, if only 50 bores have groundwater recharge data for a specific year, the regional influential predictors for that year were calculated only from the locations of those 50 bores. This provides spatial consistency between groundwater recharge values and the corresponding influential predictors.
According to [18], seasonal rainfall during May to September (the winter period) and during April to October is critical for groundwater recharge prediction in the South East. The seasonal rainfall and evaporation were therefore included as influential predictors in the analysis, as were estimates of annual Morton actual evapotranspiration, mean and maximum wet/dry spell lengths, rainfall days and rainfall intensity. Fu et al. [18] analysed the impacts of extreme rainfall variables (the 99th percentile of rainfall, the 95th percentile of rainfall and maximum daily rainfall) on recharge. Their results showed that these predictors were only weakly related to groundwater recharge. Therefore, we did not select extreme rainfall predictors other than RD and RI for groundwater recharge analysis.
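The spatially consistent averaging described above, where a predictor is averaged only over the bores that have a recharge estimate in a given year, might be sketched as below. The array layout (years by bores, with NaN marking missing recharge) and the function name are assumptions made for illustration.

```python
import numpy as np


def regional_series(recharge, predictor):
    """Average recharge and a predictor over the bores that have recharge data.

    recharge, predictor: arrays of shape (n_years, n_bores); missing recharge
    values are NaN. Returns two arrays of length n_years.
    """
    recharge = np.asarray(recharge, dtype=float)
    predictor = np.asarray(predictor, dtype=float)
    available = ~np.isnan(recharge)              # bores with recharge that year
    counts = available.sum(axis=1)
    if np.any(counts == 0):
        raise ValueError("every year needs at least one bore with recharge data")
    mean_recharge = np.where(available, recharge, 0.0).sum(axis=1) / counts
    mean_predictor = np.where(available, predictor, 0.0).sum(axis=1) / counts
    return mean_recharge, mean_predictor


# Toy data: 2 years x 3 bores; bore 3 has no recharge estimate in year 1.
recharge = [[10.0, 20.0, np.nan], [30.0, 40.0, 50.0]]
rainfall = [[500.0, 600.0, 900.0], [400.0, 500.0, 600.0]]
r_mean, p_mean = regional_series(recharge, rainfall)
```

In the toy data, the 900 mm rainfall at the third bore is ignored in the first year because that bore has no recharge estimate for that year.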
The groundwater extraction dataset was developed from measured groundwater extraction [46] for 2009 to 2013. Extraction values for the period 1970–2008 were estimated from the drill date of each bore, assuming that the average rate of groundwater extraction for each bore was constant.

3. Methods

3.1. Linear Regression

In statistics, linear regression is a regression analysis approach for modeling the relationship between various influential predictors and a resulting variable. That is, a linear model is established to fit a relationship between the components of the explanatory predictor datasets and resulting variables. The method of least squares is commonly used when linear regression is applied.
Given the dataset $\{y_i, x_{i1}, \ldots, x_{ip}\}_{i=1}^{n}$ (where $i$ is the time index, running from 1 to $n$) with $p$ explanatory predictors ($x_{i1}$ to $x_{ip}$) and $n$ groundwater recharge observations $y_i$ (here $n = 43$, as our time series data are available for the 43 years from 1970 to 2012), the model assumes that the relationship between the vector of explanatory predictors and the predicted groundwater recharge is linear. Thus, the model has the following form:
$$h_\theta(x_i) = \theta_0 + \theta_1 x_{i1} + \cdots + \theta_p x_{ip} = x_i \theta, \quad i = 1, \ldots, n \tag{1}$$
where $h_\theta(x_i)$ is the estimated resulting variable, $x_i$ ($x_{i1}$ to $x_{ip}$) is the vector of explanatory predictors (taken with a leading 1 so that the product $x_i \theta$ includes the intercept), $\theta_0$ is the intercept, and $\theta_1$ to $\theta_p$ are the slope coefficients of the explanatory predictors.
In this paper, the explanatory predictors for the linear model are the 20 potential variables listed in Table 1. The resulting variable $h_\theta(x_i)$ is the spatially averaged value of groundwater recharge. The whole dataset has 43 records, one for each year from 1970 to 2012, indexed by $i$. The linear relationship between explanatory predictors and groundwater recharge values is established when the parameter vector $\theta$ of the model is learned through a training process. The predicted values are calculated from the learned parameters. The fitting and error results are obtained from the difference between the groundwater recharge predicted by the linear regression model and the observed values of groundwater recharge.
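A least-squares fit of this linear model can be sketched with NumPy as follows. The study itself reports using StatsModels; NumPy is used here only to keep the illustration dependency-light, and the data are synthetic rather than the recharge dataset.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares estimate of theta = (theta_0, ..., theta_p)."""
    X1 = np.column_stack([np.ones(len(X)), X])    # leading 1 carries the intercept
    theta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return theta


def predict_linear(theta, X):
    """Evaluate h_theta(x_i) = theta_0 + sum_j theta_j * x_ij for each row."""
    return theta[0] + np.asarray(X) @ theta[1:]


# Synthetic check: y = 2 + 3*x1 - x2 exactly, so the fit should recover it.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(43, 2))                 # 43 "years", 2 predictors
y_demo = 2.0 + 3.0 * X_demo[:, 0] - X_demo[:, 1]
theta_demo = fit_linear(X_demo, y_demo)
```

With 20 predictors and only a few dozen annual records, the number of coefficients is large relative to the sample size, which is one reason a good training fit need not carry over to the prediction period.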

3.2. Multi-Layer Perceptron Network

The multi-layer perceptron (MLP) network is a typical feedforward artificial neural network. It consists of three parts: an input layer, a hidden layer and an output layer. All nodes in the hidden and output layers are neurons that use a non-linear activation function. Furthermore, the hidden part can be composed of multiple layers of neurons. The MLP network is trained with a supervised learning technique called backpropagation. Its multi-layered structure and non-linear activation functions distinguish it from linear regression and allow it to capture non-linear relationships in the data.
The MLP network is sometimes referred to as a traditional neural network, especially when it has only a single hidden layer. In an MLP network, all the neurons in the hidden and output layers use non-linear activation functions to simulate the action potential of biological neurons. In this paper, the Rectified Linear Unit (ReLU) function in Equation (2) is used as the activation function in all neurons. Compared with the logistic sigmoid and hyperbolic tangent functions, ReLU generally performs better: it alleviates the vanishing-gradient problem and helps keep the convergence rate stable.
$$\mathrm{ReLU}(x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases} \tag{2}$$
Since the network is fully connected, each node in one layer connects to every node in the next layer with a certain weight. Therefore, the output $c$ of each neuron is:
$$c = \varphi\left(\sum_i w_i a_i + b\right) \tag{3}$$
where $a_i$ and $w_i$ are the inputs and weights of the neuron respectively, $b$ is the bias of the current neuron and $\varphi$ is the ReLU activation function.
In this paper, the MLP network is composed of three layers (an input layer, one hidden layer and an output layer) of non-linearly activated nodes (Figure 2). The input layer has 20 nodes, corresponding to the 20 potential influential predictors listed in Table 1. There are 100 neurons in the hidden layer. The output layer has only one neuron, which represents the value of groundwater recharge. All of the weights and biases linking the potential variables to groundwater recharge values are updated as the network learns from the training data. Predicted values are then calculated from the learned weights and biases when input data are applied.
At the beginning of training, network weights and biases were assigned randomly. The solver for weight optimization was a stochastic gradient-based optimizer (Adam). The algorithm propagated the inputs forward from the input layer to the hidden layer based on Equation (3). The results from the hidden layer were propagated to the output layer, and the error between the value of the output layer and the observed groundwater recharge in the training data was obtained. By propagating the error iteratively back through the network, the connection weights and biases were automatically adjusted until the network error reached a pre-determined value. The predicted values could then be calculated from the input values of the 20 potential variables using the trained network parameters.
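The forward pass of such a 20-100-1 network can be sketched in NumPy as below. The study reports using scikit-learn for the MLP; this sketch only illustrates Equations (2) and (3), uses random untrained weights, and omits backpropagation and the Adam optimizer. Applying ReLU in the output neuron follows the description in the text and is an assumption about the implementation; libraries may use a different output activation.

```python
import numpy as np


def relu(x):
    """Equation (2): x where x > 0, otherwise 0."""
    return np.maximum(x, 0.0)


def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a 20-100-1 network following Equation (3)."""
    hidden = relu(x @ W1 + b1)          # hidden layer: weighted sum, then ReLU
    return relu(hidden @ W2 + b2)       # ASSUMPTION: ReLU output, as in the text


rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(20, 100))   # random initial weights
b1 = np.zeros(100)
W2 = rng.normal(scale=0.1, size=(100, 1))
b2 = np.zeros(1)

x_batch = rng.normal(size=(5, 20))           # 5 synthetic years, 20 predictors
y_hat = mlp_forward(x_batch, W1, b1, W2, b2) # shape (5, 1)
```

Training would repeat this forward pass, compare `y_hat` with the observed recharge, and adjust `W1`, `b1`, `W2` and `b2` from the back-propagated error.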

3.3. LSTM Model for Regression

To solve the problem of vanishing gradient in traditional recurrent neural networks, Hochreiter and Schmidhuber [47] proposed LSTM, which introduces a new structure called a memory cell to develop persistent long-term dependencies.
The structure of the whole algorithm is composed of one input layer, three LSTM layers and one output layer. The process of training and predicting depends on the temporal sequence features of the data being predicted. The system has $T$ input vectors, $x_T$ ($T = 1, \ldots, t_T$), which represent $t_T$-year groundwater values. Similarly, the system output also has $T$ prediction vectors, $x_T$ ($T = 1, \ldots, t_P$), indicating $t_P$-year prediction results. For this study, the temporal sequence features were developed from the regional average groundwater recharge based on the 465 bores located in the study area from 1970 to 2012.
To prevent the proposed model from overfitting, a method called dropout was used, with a dropout rate of 0.25 applied to the output of each LSTM layer. Dropout is a regularization technique that reduces overfitting by preventing complex co-adaptations on the training data, and thereby improves prediction accuracy. The mean squared error (Equation (4)) was selected as the loss function for the training process, where $\hat{y}_t$ and $y_t$ represent the predicted and observed values in year $t$.
$$loss = \frac{1}{T} \sum_{t=1}^{T} \left(\hat{y}_t - y_t\right)^2 \tag{4}$$
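Dropout and the mean squared error loss can be illustrated with the short NumPy sketch below. The study built its LSTM in TensorFlow; the "inverted" dropout variant shown here, which rescales the surviving units during training, is an assumption about the implementation, and the inputs are synthetic.

```python
import numpy as np


def dropout(h, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return h
    keep = rng.random(h.shape) >= rate          # True with probability 1 - rate
    return h * keep / (1.0 - rate)


def mse_loss(y_pred, y_obs):
    """Equation (4): mean of squared differences over the T years."""
    y_pred, y_obs = np.asarray(y_pred, float), np.asarray(y_obs, float)
    return float(np.mean((y_pred - y_obs) ** 2))


rng = np.random.default_rng(0)
h = np.ones((1000,))
h_train = dropout(h, 0.25, rng)                 # values are 0 or 1/0.75
h_eval = dropout(h, 0.25, rng, training=False)  # unchanged at prediction time
```

Dropout is active only during training; at prediction time the layer output is passed through unchanged.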
The internal structure of a memory cell in an LSTM layer is shown in Figure 3. The memory cell is composed of a forget gate $f_{l,t}$, an input gate $i_{l,t}$, a new memory unit $n_{l,t}$ and an output gate $o_{l,t}$, where $l$ and $t$ are the index of the LSTM layer and the current time step respectively. $W_{l,t}$ and $U_{l,t}$ are the weight matrices corresponding to the input $x_{l,t}^{k}$ and the previous hidden state $h_{l,t-1}^{k}$, and $b_{l,t}$ is the bias vector of the memory cell. The superscripts $f$, $i$, $n$ and $o$ represent the forget gate, input gate, new memory unit and output gate respectively.
The forget gate $f_{l,t}$ is calculated as follows:
$$f_{l,t} = \sigma\left(W_{l,t}^{f} x_{l,t}^{k} + U_{l,t}^{f} h_{l,t-1}^{k} + b_{l,t}^{f}\right) \tag{5}$$
The logistic sigmoid function $\frac{1}{1 + e^{-x}}$ is applied as the activation function, $\sigma$, on the gates.
Similarly, the input gate and output gate are defined as follows:
$$i_{l,t} = \sigma\left(W_{l,t}^{i} x_{l,t}^{k} + U_{l,t}^{i} h_{l,t-1}^{k} + b_{l,t}^{i}\right) \tag{6}$$
$$o_{l,t} = \sigma\left(W_{l,t}^{o} x_{l,t}^{k} + U_{l,t}^{o} h_{l,t-1}^{k} + b_{l,t}^{o}\right) \tag{7}$$
The hidden state $h_{l,t}^{k}$ and output cell state $c_{l,t}^{k}$ are denoted as follows:
$$h_{l,t}^{k} = \tanh\left(c_{l,t}^{k}\right) \circ o_{l,t} \tag{8}$$
$$c_{l,t}^{k} = i_{l,t} \circ n_{l,t} + f_{l,t} \circ c_{l,t-1}^{k} \tag{9}$$
where $\circ$ is the Hadamard product of two vectors. The new memory unit, $n_{l,t}$, can be calculated as follows:
$$n_{l,t} = \tanh\left(W_{l,t}^{n} x_{l,t}^{k} + U_{l,t}^{n} h_{l,t-1}^{k} + b_{l,t}^{n}\right) \tag{10}$$
The three LSTM layers are followed by a fully connected output layer. The prediction vector, $\hat{y}_t$, at time $t$ can be calculated using Equation (11).
$$\hat{y}_t = \sigma\left(W_{out}\, \hat{y}_{k,t} + b_{out}\right) \tag{11}$$
where $W_{out}$ and $b_{out}$ are the weight matrix and bias vector of the output layer respectively. A logistic sigmoid function is used as the activation function in the output layer for predicting groundwater recharge in year $t$.
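One time step of a single LSTM layer, following the gate, memory and state equations above, can be sketched in NumPy as below. The layer sizes and the `params` dictionary layout are hypothetical choices for illustration; the study's own model was built in TensorFlow with three stacked layers.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid used on the gates."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single LSTM layer.

    params maps each of 'f', 'i', 'o', 'n' to a tuple (W, U, b).
    """
    def gate(name, activation):
        W, U, b = params[name]
        return activation(W @ x + U @ h_prev + b)

    f = gate("f", sigmoid)          # forget gate
    i = gate("i", sigmoid)          # input gate
    o = gate("o", sigmoid)          # output gate
    n = gate("n", np.tanh)          # new memory unit
    c = i * n + f * c_prev          # cell state (element-wise products)
    h = np.tanh(c) * o              # hidden state
    return h, c


# With all weights and biases zero, every gate equals 0.5 and n equals 0,
# so the previous cell state is simply halved.
n_in, n_hid = 3, 4                  # hypothetical sizes
zero = (np.zeros((n_hid, n_in)), np.zeros((n_hid, n_hid)), np.zeros(n_hid))
params = {k: zero for k in "fion"}
h1, c1 = lstm_step(np.ones(n_in), np.zeros(n_hid), np.ones(n_hid), params)
```

The zero-weight case is a convenient sanity check: the forget gate of 0.5 retains half of the previous cell state, and the input path contributes nothing because the new memory unit is tanh(0) = 0.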

3.4. Model Testing and Comparison

The linear regression, MLP and LSTM models were run three times, trained with 70%, 80% and 90% of the observation data, then their predictions were compared with the remaining 30%, 20% and 10% of the observation data as a model validation. The statistical measures used for comparing the model performance were the root mean squared error (RMSE) and coefficient of determination ($R^2$). The RMSE and $R^2$ were calculated between the predicted groundwater recharge values and observed recharge values using Equations (12) and (13), to measure the precision and bias of groundwater recharge predictions, respectively.
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{n}} \tag{12}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \frac{1}{n}\sum_{i=1}^{n} y_i\right)^2} \tag{13}$$
where $y_i$ and $\hat{y}_i$ are the observed and estimated values respectively and $n$ is the number of the observed values in the testing data. The $R^2$ should be close to 1 to indicate strong model performance, and the RMSE should be as close to zero as possible. The two measures are applied for estimating the prediction of groundwater recharge by the three models. In the statistical analysis, R-squared is a relative measure of fit. Its advantage is that it can be used to measure the whole trend of fitted data. As the square root of a variance, RMSE is an absolute measure of fit. It can measure the error between the whole prediction data and observation data directly. Therefore, in order to ensure the reliability of the results, the relative indicator $R^2$ and absolute indicator RMSE are used to measure the fitted and prediction effects together.
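The two measures can be computed directly from their definitions, as in the sketch below. The numbers are synthetic and chosen so the results are easy to check by hand; they are not values from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination relative to the mean of the observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


obs = [1.0, 2.0, 3.0, 4.0]          # synthetic values, not data from the study
pred = [1.0, 2.0, 3.0, 6.0]
```

Here the single error of 2 gives an RMSE of 1.0 and an $R^2$ of 0.2. Because $R^2$ is measured against the mean of the observations being evaluated, it becomes negative whenever the predictions fit worse than that mean, which can happen on a short prediction period.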
The linear regression model and the MLP model were implemented using the Python packages StatsModels and scikit-learn, respectively. The LSTM model was built using TensorFlow, an open source, end-to-end machine learning platform based on data flow graphs. TensorFlow was adopted because it is a popular framework, is supported by Google and provides computational graph visualizations. All the modelling, data analysis and visualization were conducted in Python 3.5. Python was selected owing to its cross-platform processing capability, open source code, third-party modules and extensive support libraries.

4. Results

The annual time series of regional groundwater recharge was established by averaging the groundwater recharge estimates temporally for each year and spatially based on data from all available bores. The three models were then developed to predict groundwater recharge using the full set of influential predictors. The relative importance of each predictor in predicting groundwater recharge was measured under the three temporal models. All weights and biases were initialized using the Xavier initializer and then optimized with a stochastic gradient-based optimizer (Adam). This choice of initialization and optimization helps the parameters train stably and perform well in both training and prediction.
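For reference, Xavier (Glorot) initialization draws each weight from a range scaled by the number of inputs and outputs of the layer. The sketch below shows the uniform variant; whether the study used the uniform or the normal variant is not stated, so the choice here is an assumption.

```python
import math
import numpy as np


def xavier_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform initialization for a (fan_in, fan_out) weight matrix."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


rng = np.random.default_rng(0)
W_hidden = xavier_uniform(20, 100, rng)   # e.g. 20 predictors -> 100 hidden units
```

Scaling the range by the layer sizes keeps the variance of activations and gradients roughly comparable from layer to layer at the start of training.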

4.1. Correlation Coefficients between Potential Variables and Groundwater Recharge

The time series of annual regional groundwater recharge and potential predictors from 1970 to 2012 were applied to establish the temporal regression models. The Pearson correlation coefficients between regional groundwater recharge and each influential predictor are shown in Figure 4.
As expected, the correlation coefficient between rainfall and groundwater recharge was the highest. Annual rainfall had a correlation coefficient of 0.8, rainfall from May to September had a correlation coefficient of 0.88, and rainfall from April to October had a correlation coefficient of 0.9. Rainfall and minimum temperature were positively correlated with groundwater recharge, whereas evaporation and maximum temperature were negatively correlated with it. This could be because the latter predictors influence the negative component of groundwater recharge, namely evapotranspiration. Almost all the absolute values of the correlation coefficients for evaporation, maximum temperature and minimum temperature were between 0.3 and 0.4.
The other influential predictors MeWS, MxWS, RD and RI had positive correlation coefficients, as they are associated with rainfall and infiltration, while MeDS, MxDS and Extraction had negative correlation coefficients, as they are associated with evapotranspiration processes. The correlation of RD with groundwater recharge (r = 0.76) was stronger than that of RI (r = 0.45). This indicates that groundwater recharge is a process of gradual accumulation, because at an annual time scale the number of rainfall days was more important than rainfall intensity (Figure 4). As expected, the correlation magnitudes for mean and maximum wet-spell length (r = 0.66, r = 0.63) were higher than those for mean and maximum dry-spell length (r = −0.50, r = −0.15), as wet days are more likely to result in groundwater recharge than dry days. Groundwater extraction had a negative correlation coefficient (r = −0.43) with groundwater recharge. This can be attributed to the fact that groundwater extraction is simply a negative recharge. Furthermore, it has been shown that lower water tables can lead to lower recharge [42], which is consistent with the negative correlation between groundwater extraction and recharge.
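The correlation screening described in this section can be reproduced in outline as follows. The predictor names and series are synthetic placeholders rather than the study data, and ranking by the absolute value of r is a choice made here for illustration.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def rank_predictors(predictors, recharge):
    """Return (name, r) pairs sorted by |r|, strongest first."""
    scores = {name: pearson_r(vals, recharge) for name, vals in predictors.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Synthetic series, not the study data.
recharge_demo = [1.0, 2.0, 3.0, 4.0]
ranking = rank_predictors(
    {"up": [10.0, 20.0, 30.0, 40.0], "noisy_down": [4.0, 1.0, 3.0, 0.0]},
    recharge_demo,
)
```

Sorting by magnitude keeps strongly negative predictors, such as evaporation or extraction in the text, alongside strongly positive ones.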

4.2. Temporal Prediction of Groundwater Recharge

The time series of regional groundwater recharge based on the average values of groundwater recharge from 1970 to 2012 are shown in Figure 5a–c, marked as “Observation”. Figure 5a shows the results from the linear regression method, which was trained with the first 80% of the observed data and then used to predict groundwater recharge for the last 20% of the time series. Similarly, Figure 5b,c show the results from the MLP model and the LSTM model, respectively. Both models were trained using the first 80% of the observed data and predicted the remaining 20% of the time series.
As shown in Figure 5a, for the linear regression model the trained data fitted the observations relatively well, with only a few small mismatches. However, the fit between the predicted data and the observations was poor, and the mismatch was especially serious from 2010 to 2012. That is, the time series features obtained from the trained data could not be successfully applied to the predicted data when the linear regression model was used.
For the MLP model (Figure 5b), the fit between the trained data and the observations was better than that of the linear regression model, and their trends were consistent. However, the fit between the predicted data and the observations was only slightly better than that of the linear regression model. That is, the MLP method accurately learned the time series characteristics of the trained data, but it still fell short in prediction. This can be attributed to over-fitting of the MLP model to the training data.
For the LSTM model (Figure 5c), the fit between the trained data and the observations was worse than that of the MLP model and similar to that of the linear regression model. However, the fit between the predicted data and the observations was better than that of the linear regression and MLP models, especially from 2010 to 2012. The trends in the predicted data and the observations were consistent for the LSTM model (Figure 5c), but not for the linear regression and MLP models. In contrast to the insufficient learning of the linear regression model and the over-fitting of the MLP model, the time series features learned from the trained data by the LSTM model could be successfully applied to the prediction of groundwater recharge.
The linear regression, MLP and LSTM methods were then compared using different percentages of the observation data for training and testing the models (Table 2). To compare the three methods, an error indicator ($RMSE$) and a fitness indicator ($R^2$) were applied to the prediction results. $RMSE_{Tra}$ and $R^2_{Tra}$ are the error and fitness values for the training data, $RMSE_{Pre}$ and $R^2_{Pre}$ are the error and fitness values for the prediction data, and $RMSE_{Sum}$ and $R^2_{Sum}$ are the error and fitness values for the total data, which contain both the training data and the prediction data. All of the data for the influential predictors and groundwater recharge observations were divided into two groups: training data and prediction data. In the experiment, the ratio of training data was set to 70%, 80% and 90%, respectively.
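The three pairs of indicators can be computed with a few lines of code. The sketch below assumes the standard definitions of root mean squared error and of the coefficient of determination; the helper `score_split`, the keys "Tra", "Pre" and "Sum", and the numbers are illustrative assumptions, not the study's implementation or data.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean squared error between observed and simulated values."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def r2(obs, sim):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    ss_res = np.sum((obs - sim) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def score_split(obs, sim, n_train):
    """Return (RMSE, R2) for the training part, the prediction part and the
    whole series, keyed 'Tra', 'Pre' and 'Sum'."""
    parts = {
        "Tra": slice(0, n_train),
        "Pre": slice(n_train, None),
        "Sum": slice(None),
    }
    return {k: (rmse(obs[s], sim[s]), r2(obs[s], sim[s])) for k, s in parts.items()}

# Hypothetical observed and modelled series (illustrative values only).
obs = np.array([0.2, 0.4, 0.6, 0.8, 0.5, 0.3, 0.7, 0.9, 0.4, 0.6])
sim = np.array([0.25, 0.38, 0.57, 0.82, 0.49, 0.33, 0.69, 0.88, 0.55, 0.45])
scores = score_split(obs, sim, n_train=8)
for part, (e, f) in scores.items():
    print(f"{part}: RMSE = {e:.3f}, R2 = {f:.3f}")
```

Scoring the training and prediction parts separately is what exposes over-fitting: a model can have a small training error and still a large prediction error.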
For the three models, the $RMSE$ for the total dataset ranged from 0.02 to 0.20. For each model, the $RMSE$ for the total dataset decreased and the fitness indicator $R^2$ increased as the ratio of training data to prediction data increased. The results also show that the error for the training dataset was lower than that for the prediction dataset. The errors for the total dataset produced by the MLP and the LSTM models were similar, and the performance of the linear regression model was slightly inferior to that of the two more complex methods.
The three models all presented excellent fitting performance on the training dataset. All fitting errors (represented by $RMSE$) were lower than 0.07 and all fitness values (represented by $R^2$) were greater than 0.9, no matter which ratio of training data was chosen. For the three models, the fitting performance on the training dataset was clearly better than that on the prediction dataset. Furthermore, compared with the other two models, the LSTM had the smallest differences in both fitting errors and fitness values between the prediction dataset and the training dataset. This demonstrated that the LSTM model had the best generalization performance.
The fitting error on the predicted data was the most important factor for comparing the performance of the regression models. For the linear regression, $RMSE$ was greater than 0.19 for the 70%, 80% and 90% ratios of training data, and the fitness $R^2$ was negative when 90% of the data was used to train the model. A negative value means that the fit was worse than simply fitting a horizontal line, indicating that a linear relationship was not suitable for fitting the data.
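The meaning of a negative fitness value follows directly from the usual definition of the coefficient of determination, which is assumed here because the formula is not restated in this section. With $y_i$ the observed recharge, $\hat{y}_i$ the model output and $\bar{y}$ the mean of the observations over the evaluated period:

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
R^2 < 0 \iff \sum_i (y_i - \hat{y}_i)^2 > \sum_i (y_i - \bar{y})^2 .
$$

The denominator is the squared error of the constant prediction $\hat{y}_i = \bar{y}$, that is, a horizontal line at the observed mean. $R^2$ therefore drops below zero exactly when the model's squared error exceeds that of the horizontal line.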
For the MLP model, $RMSE$ and $R^2$ were better than those from the linear regression, with the fitting error lower than 0.20 for the 70% ratio of training data. The fitting performance was satisfactory, with $RMSE$ equalling 0.11 when the 80% ratio of training data was considered. For the LSTM model, the $RMSE$ values were lower than those of the other two methods for all three ratios of training data. Also, all of its fitness values were greater than 0.7, no matter which ratio of training data was chosen. That is, the stability of the LSTM model was the best among the three compared methods.
The LSTM model had the best prediction performance in terms of its $RMSE$ and $R^2$ values, regardless of the proportion of training data chosen. Although the performance of the MLP model was worse than that of the LSTM model, it still gave a better estimate of groundwater recharge than the linear regression method for all three ratios of training data. This demonstrates that, compared with the traditional regression method, deep learning and machine learning methods can greatly improve the overall performance of groundwater recharge prediction.

4.3. Relative Importance of Influential Predictors

The relative importance of potential influential predictors for predicting groundwater recharge was investigated for each of the linear regression, MLP and LSTM models. Figure 6 shows the RMSEs and coefficients of determination obtained when the influential predictors were excluded one by one from the linear regression, MLP and LSTM models. The variable importance ranking was found to be the same whether it was determined through RMSE (Figure 6a) or coefficients of determination (Figure 6b).
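The one-by-one exclusion procedure can be sketched as below. For brevity the sketch uses only an ordinary least squares model as a stand-in for the three models, and it runs on synthetic data; the predictor names are borrowed from the text for readability, while the coefficients, the noise level and the helper `leave_one_out_rmse` are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def rmse(obs, sim):
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def leave_one_out_rmse(X, y, names, n_train):
    """Prediction-period RMSE after excluding each predictor in turn.
    A larger RMSE means the excluded predictor mattered more."""
    out = {}
    for j, name in enumerate(names):
        Xj = np.delete(X, j, axis=1)  # drop column j
        coef = fit_linear(Xj[:n_train], y[:n_train])
        out[name] = rmse(y[n_train:], predict_linear(coef, Xj[n_train:]))
    return out

# Synthetic data (NOT the study data): 43 "years", 3 predictors, of which
# the third has no effect on the response by construction.
rng = np.random.default_rng(0)
names = ["Rainfall", "RD", "Mintem"]
X = rng.normal(size=(43, 3))
y = 1.0 * X[:, 0] + 0.4 * X[:, 1] + 0.0 * X[:, 2] + 0.02 * rng.normal(size=43)

scores = leave_one_out_rmse(X, y, names, n_train=34)
ranking = sorted(scores, key=scores.get, reverse=True)  # most important first
print(scores)
print("importance ranking:", ranking)
```

The same loop applies to any model with a fit and a predict step, so the linear model here could be swapped for an MLP or an LSTM without changing the ranking logic.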
For the linear regression model, the least important variables were found to be the minimum temperature (Mintem, Mintem4-10 and Mintem5-9) and average annual Morton actual evapotranspiration (AnnMact), and the most important predictors were mean dry spell and wet spell lengths (MeDS and MeWS) and annual number of rainfall days (RD), in that order. Rainfall data (Rainfall, Rainfall4-10 and Rainfall5-9) also had an obvious influence. This was unsurprising, because the correlations between groundwater recharge and both minimum temperature and annual Morton actual evapotranspiration were relatively low (see Figure 4), whereas the correlations with mean wet/dry spell lengths and rainfall days were high (see Figure 4). This is to be expected, because a linear regression model assumes a linear relationship between the predictors and the response, so its variable importance is consistent with the correlation coefficients between the corresponding predictors and groundwater recharge.
For the MLP model, rainfall, maximum wet spell length (MxWS), maximum temperatures (Maxtem, Maxtem5-9, Maxtem4-10), and evaporation (ET, ET4-10 and ET5-9) were relatively insignificant variables, while Mintem, Mintem4-10, Mintem5-9 and AnnMact were found to be the most important predictors.
For the LSTM model, the least important variables were Maxtem, Maxtem5-9, Maxtem4-10 and RD. For this model, the most important predictors were AnnMact, ET4-10, Mintem, Mintem5-9, MeDS, MeWS, Rainfall4-10 and RI. Overall, the RMSEs were lower for the LSTM model, indicating that it retained its good performance even when a variable had been removed from the inputs.
The relative importance of influential predictors differed between the MLP and LSTM models. This can be attributed to two causes. Firstly, over-fitting can be a problem, especially for the MLP model. This model relies entirely on the training dataset to fit its parameters, which results in poor performance on the predicted dataset. The generalization ability of the model was therefore reduced, and the accuracy of the predictor importance measurement was also affected. Secondly, because the optimization problem is non-convex, the model parameters may converge to a local optimum as they are learned and adjusted, which biases the estimated predictor importance. As a result, different local optima can lead to differences in the relative importance of the same influential predictor between the MLP and LSTM models.

5. Discussion

In this paper, linear regression, MLP and LSTM models have been used to predict a time series of annual average groundwater recharge from 1970 to 2012, based on 465 bores in the South East of South Australia and Victoria. The performance of the three models in predicting groundwater recharge was assessed using various ratios of training data. The study obtained the following findings.

5.1. Performance and Comparison of Models

Compared with the MLP and linear regression models, the LSTM model showed the best performance in predicting groundwater recharge in the case study area.
Previous research on using machine learning to understand groundwater processes includes the use of an MLP model to predict weekly groundwater levels in the Mahanadi Delta, India [28], and a linear regression model to predict groundwater recharge based on 71 climate variables and 17 non-climate variables, also in the South East of South Australia and Victoria [18]. The purpose of this paper is to identify a suitable model for groundwater recharge prediction by comparing the linear regression, MLP and LSTM models on the same data format. According to Fu et al. [18], working in the same study area, seasonal rainfall from May to September and from April to October are the most important influential predictors for groundwater recharge based on a multivariate linear regression model. Therefore, we calculated monthly average quantities from daily data for various influential predictors from 1970 to 2012, such as maximum temperature, minimum temperature, rainfall and evaporation. The LSTM model has been used to predict groundwater heads [48], but until this paper it had not been used to predict groundwater recharge time series. Results from this study (Section 4) show that the LSTM model consistently obtained the best performance for annual groundwater recharge prediction compared with the MLP and linear regression models.
Research in the field of thermodynamics, energy and fuels also shows that the LSTM model performs better than MLP network in forecasting aggregated power load and photovoltaic (PV) power output [49].
The LSTM model is a more complex type of machine learning model that includes three control units: input gates, output gates, and forget gates. As information enters the model, the control units in the LSTM determine which pieces of information will be retained and which will be forgotten. The LSTM model can therefore capture long-term dependencies while mitigating the vanishing gradient problem, which improves predictions of groundwater recharge.
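The gating described above can be sketched as a single LSTM time step. This is a minimal NumPy illustration of the commonly used cell with input, forget and output gates, not the network trained in this study; the layer sizes, the random weights, the gate ordering inside `W` and the names `lstm_step` and `sigmoid` are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    x: input vector (D,); h_prev, c_prev: previous hidden and cell state (H,).
    W: weights (4*H, H + D); b: biases (4*H,).
    Assumed row order in W and b: forget, input, output, candidate.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0:H])          # forget gate: how much old memory to keep
    i = sigmoid(z[H:2 * H])      # input gate: how much new information to store
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much memory to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate values for the cell state
    c = f * c_prev + i * g       # additive cell-state update
    h = o * np.tanh(c)           # new hidden state
    return h, c

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
D, H = 5, 4
W = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(10, D)):  # ten synthetic "years" of predictors
    h, c = lstm_step(x, h, c, W, b)
print("final hidden state:", h)
```

Because the cell state is updated additively (`f * c_prev + i * g`), a forget gate close to 1 and an input gate close to 0 carry information across many time steps almost unchanged, which is the mechanism usually credited with easing the vanishing gradient problem.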
The performance of the MLP model was also found to be better than that of the simpler linear regression approach. The MLP model is also known as a feedforward neural network, which has one or more hidden layers between the input layer and the output layer of the network. The trained parameters in these hidden layers improve the predictive ability of the model.
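The role of the hidden layers can be seen in a minimal forward pass. The sketch below assumes tanh hidden activations and a linear output unit, which this section does not specify; the layer sizes, the random weights and the name `mlp_forward` are illustrative only.

```python
import numpy as np

def mlp_forward(x, layers):
    """Forward pass of a feedforward network.

    layers: list of (W, b) pairs. All but the last pair are hidden layers
    with an assumed tanh activation; the last pair is a linear output layer.
    """
    a = np.asarray(x, dtype=float)
    for W, b in layers[:-1]:
        a = np.tanh(W @ a + b)  # non-linear hidden layer
    W_out, b_out = layers[-1]
    return W_out @ a + b_out    # linear output (predicted recharge)

# Hypothetical network: 5 predictors -> 6 hidden units -> 1 output.
rng = np.random.default_rng(0)
layers = [
    (rng.normal(scale=0.5, size=(6, 5)), np.zeros(6)),
    (rng.normal(scale=0.5, size=(1, 6)), np.zeros(1)),
]
x = rng.normal(size=5)
y_hat = mlp_forward(x, layers)
print("predicted value:", y_hat)
```

With no hidden layers, that is, `layers` holding only the output pair, the function reduces to `W @ x + b`, which is a linear regression; the hidden layers are what allow the network to represent non-linear relationships.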
An important finding was that the performance of the LSTM model appeared to be almost unaffected by the ratio of training data to prediction data. A benefit of this model is that it may reduce the length of time series data required for training, and that its results were relatively resilient to changes in parameter settings.
In addition, we found that the model fitness declined when predictors that have low correlations with groundwater recharge were added to the linear regression model. The fitness of the linear regression model depended on the correlation strength between the selected predictors and groundwater recharge. That is, the model had better fitness when predictors with higher correlation coefficients were added to the model. This reflects that the linear regression model was not able to deal with non-linear relationships among the predictors. In contrast, different combinations of influential predictors had little influence on the fitness of the MLP and the LSTM models. That is, compared with the linear regression model, the two models were more robust in predicting complex data relationships. Therefore, the hidden layer in the MLP model and the memory unit in the LSTM model captured the non-linear relationships among the influential predictors and improved the robustness of the models.
Traditional time series analysis methods, such as the autoregressive integrated moving average with explanatory variable (ARIMAX) model, have difficulty capturing non-linear relationships. The hidden layer in the MLP model and the memory unit in the LSTM model can help improve the non-linear learning ability of the models.

5.2. Influential Predictors Identification

The relative importance of all predictors to drive the temporal changes in groundwater recharge was measured using the linear regression, MLP and LSTM models. The results showed that the relative importance of influential predictors was different for each of the three models.
The most important predictors for the linear regression model were the mean dry and wet spell lengths, the number of rain days and three rainfall metrics (MeDS, MeWS, RD, Rainfall, Rainfall4-10 and Rainfall5-9). This is consistent with the standard hydrogeological understanding of groundwater recharge, which often estimates recharge as a proportion of rainfall or as the difference between rainfall and evapotranspiration [22].
For the MLP model, the most important predictors, minimum temperature and annual actual evapotranspiration (Mintem, Mintem4-10, Mintem5-9 and AnnMact), ran almost counter to the hydrogeological understanding of recharge, although evaporation is known to decrease net groundwater recharge [42]. Future research is advisable to determine whether these relationships were coincidental or are consistent across many different locations and climate types.
For the LSTM model, the most important predictors were found to be a combination of annual actual evapotranspiration, seasonal evaporation, minimum temperature, mean dry and wet spell lengths, seasonal rainfall and rainfall intensity (AnnMact, ET4-10, Mintem, Mintem5-9, MeDS, MeWS, Rainfall4-10 and RI). The LSTM model maintained a consistently high performance when any one of the influential predictors was removed from the analysis, so it was less sensitive to these changes in input. This model appeared to rely on more of the influential predictors to predict groundwater recharge and was therefore more robust when any one of them was removed. Again, future research is required to determine how consistent these important influential predictors are across multiple locations and climates.

5.3. Implications for Groundwater Management

Accurate evaluation of groundwater recharge is the foundation for equitable and reliable water resources planning. This research verified that machine learning and deep learning methods can be applied to the estimation of groundwater recharge. Of the three models, the LSTM model gave the most accurate groundwater recharge predictions. The results from this study provide groundwater researchers and managers with a valid reference for the selection of appropriate machine learning models in the future.
This research has explored the groundwater recharge estimation and important drivers for changes in temporal groundwater recharge in the South East of South Australia and Victoria. The performance of the models in this study may also inform the choice of data learning techniques for prediction of groundwater recharge in other places, while the obtained results will be instrumental for the development of future groundwater management strategies in the region.
Identification of important but unintuitive influential factors, such as Rainfall4-10, MeDS, MeWS, Mintem and Mintem5-9 can be used to inform monitoring requirements and improve the design of policy and management plans.

5.4. Advantages, Limitations and Further Research

These experiments for temporal prediction of groundwater recharge clearly demonstrate the effectiveness and robustness of the deep learning approach. The performance of the LSTM model appeared to be almost unaffected when various ratios of trained dataset were considered, or when influential predictors were removed.
Limitations of the study include the lack of spatial data inputs, short prediction time frames and the absence of uncertainty quantification. Spatial data inputs have not been used in this paper to predict groundwater recharge. In future research, spatial data such as soil type, vegetation type, and slope may be used to improve the model predictions of groundwater recharge. Finally, our research does not yet quantify the uncertainty in prediction results. Although uncertainty analysis methods have been widely applied in classical prediction models [50,51,52], they have not generally been used in machine learning methods, especially deep learning models. Future work will introduce uncertainty analysis methods into these models to quantify the uncertainty of model predictions.
In future studies, nature-inspired intelligent algorithms [53,54] and other neural networks [55] will also be tested in order to further improve the fitness of predicted data. The performance of the models may also be enhanced by incorporating other environmental data (e.g., soil data) or supplemented interpolation data [56]. Furthermore, iterative calculation [57] may also be used to better optimize the model parameters. For this study, the whole observed groundwater recharge dataset was divided into trained and predicted subsets based on various ratios. The trained dataset and predicted dataset were used for model training and validation, respectively. In future work, longer-term predictions of groundwater recharge may be modelled, which can then be applied to forecast recharge and manage regional groundwater more effectively.

6. Conclusions

The linear regression, multi-layer perception and long short-term memory models were used for the challenging problem of predicting long-term, time-continuous groundwater recharge. The LSTM model greatly outperformed the MLP and linear regression models when the ratio of the training dataset to the full dataset (composed of the training dataset and the prediction dataset) was set as 70%, 80% or 90%. In turn, the MLP model outperformed the linear regression model. The results clearly demonstrate the effectiveness of the three models for various ratios of trained data to predicted data. The relative importance of potential predictors associated with the observed variation in regional groundwater recharge was also assessed.

Author Contributions

Conceptualization & methodology, X.H. and L.G.; software, X.H.; validation, X.H. and N.Z.; formal analysis & investigation, X.H., L.G., and R.C.; resources & data curation, R.C. and G.F.; writing (original draft preparation), X.H.; writing (review and editing), L.G., N.Z. and R.D.; visualization, X.H. and N.Z.; funding acquisition, X.H. and L.G.

Funding

This research was funded by the National Science Foundation of China, No. 61703306; the Natural Science Foundation of Tianjin, No. 16JCQNJC00600; and the Doctoral Foundation of Tianjin Normal University, No. 52XB1302.

Acknowledgments

The work was supported by CSIRO Land and Water. The authors would like to thank the anonymous reviewers for critical reviews and constructive comments, which were very helpful for improving the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Crosbie, R.S.; Jolly, I.D.; Leaney, F.W.; Petheram, C. Can the dataset of field based recharge estimates in Australia be used to predict recharge in data-poor areas? Hydrol. Earth Syst. Sci. 2010, 14, 2023–2038.
  2. Gao, L.; Connor, J.; Doble, R.; Ali, R.; McFarlane, D. Opportunity for peri-urban Perth groundwater trade. J. Hydrol. 2013, 496, 89–99.
  3. Gleeson, T.; Wada, Y.; Bierkens, M.F.P.; van Beek, L.P.H. Water balance of global aquifers revealed by groundwater footprint. Nature 2012, 488, 197–200.
  4. Crosbie, R.S.; Davies, P.; Harrington, N.; Lamontagne, S. Ground truthing groundwater-recharge estimates derived from remotely sensed evapotranspiration: A case in South Australia. Hydrogeol. J. 2015, 23, 335–350.
  5. Gao, L.; Connor, J.D.; Dillon, P. The economics of groundwater replenishment for reliable urban water supply. Water 2014, 6, 1662–1670.
  6. Gao, L.; Bryan, B.A. Finding pathways to national-scale land-sector sustainability. Nature 2017, 544, 217–222.
  7. Crosbie, R.S.; Peeters, L.J.M.; Herron, N.; McVicar, T.R.; Herr, A. Estimating groundwater recharge and its associated uncertainty: Use of regression kriging and the chloride mass balance method. J. Hydrol. 2018, 561, 1063–1080.
  8. Gebru, T.A.; Tesfahunegn, G.B. Chloride mass balance for estimation of groundwater recharge in a semi-arid catchment of northern Ethiopia. Hydrogeol. J. 2019, 27, 363–378.
  9. Marei, A.; Khayat, S.; Weise, S.; Ghannam, S.; Sbaih, M.; Geyer, S. Estimating groundwater recharge using the chloride mass-balance method in the West Bank, Palestine. Hydrol. Sci. J. 2010, 55, 780–791.
  10. Subyani, A.M. Use of chloride-mass balance and environmental isotopes for evaluation of groundwater recharge in the alluvial aquifer, Wadi Tharad, western Saudi Arabia. Environ. Geol. 2004, 46, 741–749.
  11. Shende, S.; Chau, K.W. Forecasting safe distance of a pumping well for effective riverbank filtration. J. Hazard. Toxic Radioact. Waste 2019, 23, 04018040.
  12. Cuthbert, M.O.; Acworth, R.I.; Andersen, M.S.; Larsen, J.R.; McCallum, A.M.; Rau, G.C.; Tellam, J.H. Understanding and quantifying focused, indirect groundwater recharge from ephemeral streams using water table fluctuations. Water Resour. Res. 2016, 52, 827–840.
  13. Delottier, H.; Pryet, A.; Lemieux, J.M.; Dupuy, A. Estimating groundwater recharge uncertainty from joint application of an aquifer test and the water-table fluctuation method. Hydrogeol. J. 2018, 26, 2495–2505.
  14. Fan, J.L.; Oestergaard, K.T.; Guyot, A.; Lockington, D.A. Estimating groundwater recharge and evapotranspiration from water table fluctuations under three vegetation covers in a coastal sandy aquifer of subtropical Australia. J. Hydrol. 2014, 519, 1120–1129.
  15. Hou, L.Z.; Wang, X.S.; Hu, B.X.; Shang, J.; Wan, L. Experimental and numerical investigations of soil water balance at the hinterland of the Badain Jaran Desert for groundwater recharge estimation. J. Hydrol. 2016, 540, 386–396.
  16. Izady, A.; Abdalla, O.A.E.; Joodavi, A.; Karimi, A.; Chen, M.J.; Tompson, A. Groundwater recharge estimation in arid hardrock-alluvium aquifers using combined water-table fluctuation and groundwater balance approaches. Hydrol. Process. 2017, 31, 3437–3451.
  17. Park, C.; Seo, J.; Lee, J.; Ha, K.; Koo, M.H. A distributed water balance approach to groundwater recharge estimation for Jeju volcanic island, Korea. Geosci. J. 2014, 18, 193–207.
  18. Fu, G.B.; Crosbie, R.S.; Barron, O.; Charles, S.P.; Dawes, W.; Shi, X.G.; Niel, T.V.; Li, C. Attributing variations of temporal and spatial groundwater recharge: A statistical analysis of climatic and non-climatic factors. J. Hydrol. 2019, 568, 816–834.
  19. Messier, K.P.; Campbell, T.; Bradley, P.J.; Serret, M.L. Estimation of Groundwater Radon in North Carolina Using Land Use Regression and Bayesian Maximum Entropy. Environ. Sci. Technol. 2015, 49, 9817–9825.
  20. Doble, R.C.; Crosbie, R.S. Review: Current and emerging methods for catchment-scale modelling of recharge and evapotranspiration from shallow groundwater. Hydrogeol. J. 2017, 25, 3–23.
  21. Mogaji, K.A.; Lim, H.S.; Abdullah, K. Modeling of groundwater recharge using a multiple linear regression (MLR) recharge model developed from geophysical parameters: A case of groundwater resources management. Environ. Earth Sci. 2015, 73, 1217–1230.
  22. Figura, S.; Livingstone, D.M.; Kipfer, R. Forecasting groundwater temperature with linear regression models using historical data. Groundwater 2015, 53, 943–954.
  23. Ebrahimi, H.; Rajaee, T. Simulation of groundwater level variations using wavelet combined with neural network, linear regression and support vector machine. Glob. Planet Chang. 2017, 148, 181–191.
  24. Shamshirband, S.; Nodoushan, E.J.; Adolf, J.E.; Manaf, A.A.; Mosavi, A.; Chau, K.W. Ensemble models with uncertainty analysis for multi-day ahead forecasting of chlorophyll a concentration in coastal waters. Eng. Appl. Comp. Fluid 2019, 13, 91–101.
  25. Gholami, V.; Chau, K.W.; Fadaee, F.; Torkaman, J.; Ghaffari, A. Modeling of groundwater level fluctuations using dendrochronology in alluvial aquifers. J. Hydrol. 2015, 529, 1060–1069.
  26. Taormina, R.; Chau, K.W.; Sethi, R. Artificial neural network simulation of hourly groundwater levels in a coastal aquifer system of the Venice lagoon. Eng. Appl. Artif. Intell. 2012, 25, 1670–1676.
  27. Taormina, R.; Chau, K.W.; Sivakumar, B. Neural network river forecasting through baseflow separation and binary-coded swarm optimization. J. Hydrol. 2015, 529, 1788–1797.
  28. Mohanty, S.; Jha, M.K.; Raul, S.K.; Panda, R.K.; Sudheer, K.P. Using artificial neural network approach for simultaneous forecasting of weekly groundwater levels at multiple sites. Water Resour. Manag. 2015, 29, 5521–5532.
  29. Sun, Y.B.; Wendi, D.; Kim, D.E.; Liong, S.Y. Technical note: Application of artificial neural networks in groundwater table forecasting – A case study in a Singapore swamp forest. Hydrol. Earth Syst. Sci. 2016, 20, 1405–1412.
  30. Pasandi, M.; Salmani, N.; Samani, N. Spatial estimation of water-table depth by artificial neural networks in light of ancillary data. Hydrol. Sci. J. 2017, 62, 2012–2024.
  31. Kong-A-Siou, L.; Johannet, A.; Estupina, V.B.; Pistre, S. Neural networks for karst groundwater management: Case of the Lez spring (Southern France). Environ. Earth Sci. 2015, 74, 7617–7632.
  32. Jiang, Z.J.; Mallants, D.; Peeters, L.; Gao, L.; Soerensen, C.; Mariethoz, G. High-resolution paleovalley classification from airborne electromagnetic imaging and deep neural network training using digital elevation model data. Hydrol. Earth Syst. Sci. 2019, 23, 2561–2580.
  33. Alizadeh, M.J.; Kavianpour, M.R.; Danesh, M.; Adolf, J.; Shamshirband, S.; Chau, K.W. Effect of river flow on the quality of estuarine and coastal waters using machine learning models. Eng. Appl. Comp. Fluid 2018, 12, 810–823.
  34. Ye, L.; Gao, L.; Marcos-Martinez, R.; Mallants, D.; Bryan, B.A. Projecting Australia’s forest cover dynamics and exploring influential factors using deep learning. Environ. Modell. Softw. 2019, 119, 407–417.
  35. Scher, S. Toward data-driven weather and climate forecasting: Approximating a simple general circulation model with deep learning. Geophys. Res. Lett. 2018, 45, 12616–12622.
  36. Jiang, G.Q.; Xu, J.; Wei, J. A deep learning algorithm of neural network for the parameterization of typhoon-ocean feedback in typhoon forecast models. Geophys. Res. Lett. 2018, 45, 3706–3716.
  37. Jones, D.A.; Wang, W.; Fawcett, R. High-quality spatial climate data-sets for Australia. Aust. Meteorol. Ocean 2009, 58, 233–248.
  38. Donohue, R.J.; McVicar, T.R.; Roderick, M.L. Assessing the ability of potential evaporation formulations to capture the dynamics in evaporative demand within a changing climate. J. Hydrol. 2010, 386, 186–197.
  39. Leaney, F.W.; Herczeg, A.L. Regional Recharge to a Karst Aquifer Estimated from Chemical and Isotopic Composition of Diffuse and Localized Recharge, South Australia. J. Hydrol. 1995, 164, 363–387.
  40. Healy, R.W.; Cook, P.G. Using groundwater levels to estimate recharge. Hydrogeol. J. 2002, 10, 91–109.
  41. Meinzer, O.E.; Stearns, N.D. A study of ground water in the Pomperaug Basin, Connecticut with special reference to intake and discharge. Anat. Rec. 1929, 64, 327–341.
  42. Crosbie, R.S.; Davies, P. Recharge estimation. In Framework for a Regional Water Balance Model for the South Australian Limestone Coast Region; Harrington, N., Lamontagne, S., Eds.; Goyder Institute for Water Research: Adelaide, SA, Australia, 2013.
  43. Jeffrey, S.J.; Carter, J.O.; Moodie, K.B.; Beswick, A.R. Using spatial interpolation to construct a comprehensive archive of Australian climate data. Environ. Modell. Softw. 2001, 16, 309–330.
  44. Chiew, F.; Wang, Q.J.; Mcconachy, F.; James, R.; Wright, W.; De Hoedt, G. Evapotranspiration maps for Australia. In Proceedings of the Hydrology and Water Resources Symposium, Melbourne, Australia, 20–23 May 2002.
  45. Morton, F.I. Operational Estimates of Areal Evapo-Transpiration and Their Significance to the Science and Practice of Hydrology. J. Hydrol. 1983, 66, 1–76.
  46. Harrington, N.; Li, C. Development of a Groundwater Extraction Dataset for the South East of South Australia: 1970–2013; Goyder Institute for Water Research: Adelaide, SA, Australia, 2015.
  47. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural. Comput. 1997, 9, 1735–1780.
  48. Zhang, J.F.; Zhu, Y.; Zhang, X.P.; Ye, M.; Yang, J.Z. Developing a Long Short-Term Memory (LSTM) based model for predicting water table depth in agricultural areas. J. Hydrol. 2018, 561, 918–929.
  49. Wen, L.L.; Zhou, K.L.; Yang, S.L.; Lu, X.H. Optimal load dispatch of community microgrid with deep learning based solar power and load forecasting. Energy 2019, 171, 1053–1065.
  50. Gao, L.; Bryan, B.A. Incorporating deep uncertainty into the elementary effects method for robust global sensitivity analysis. Ecol. Modell. 2016, 321, 1–9.
  51. Gao, L.; Bryan, B.A.; Nolan, M.; Connor, J.D.; Song, X.D.; Zhao, G. Robust global sensitivity analysis under deep uncertainty via scenario analysis. Environ. Modell. Softw. 2016, 76, 154–166.
  52. Gao, L.; Bryan, B.A.; Liu, J.; Li, W.; Chen, Y.; Liu, R.; Barrett, D. Managing too little and too much water: Robust mine-water management strategies under variable climate and mine conditions. J. Clean. Prod. 2017, 162, 1009–1020.
  53. Gao, L.; Ding, Y.; Ying, H. An adaptive social network-inspired approach to resource discovery for the complex grid systems. Int. J. Gen. Syst. 2006, 35, 347–360.
  54. Gao, L.; Hailu, A. Comprehensive Learning Particle Swarm Optimizer for Constrained Mixed-Variable Optimization Problems. Int. J. Comput. Int. Syst. 2010, 3, 832–842.
  55. Huang, X.; Hao, K.R.; Ding, Y.S. Human fringe skeleton extraction by an improved Hopfield neural network with direction features. Neurocomputing 2012, 87, 99–110.
  56. Huang, X.; Zhu, Y.P. An entity based multi-direction cooperative deformation algorithm for generating personalized human shape. Multimed. Tools Appl. 2018, 77, 24865–24889.
  57. Huang, X.; Gao, L. Reconstructing three-dimensional human poses: A combined approach of iterative calculation on skeleton model and conformal geometric algebra. Symmetry Basel 2019, 11, 301.
Figure 1. The study area and 465 groundwater bores. In the left panel, the blue area inside the red rectangle marks the study area within Australia; in the right panel, the borders outline the study area and the blue points mark the locations of the 465 bores.
Figure 2. Structure of MLP network for predicting groundwater recharge.
Figure 3. The internal structure of an LSTM cell [47].
Figure 4. Correlation coefficients between regional groundwater recharge and influential predictors.
Figure 5. Trained and predicted trajectories of groundwater recharge based on (a) linear regression, (b) MLP, and (c) LSTM models.
Figure 6. RMSE and coefficient of determination (R²) of the predicted groundwater recharge for the three models after excluding each potential influential predictor in turn.
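The predictor-exclusion procedure summarized in Figure 6 can be illustrated with a minimal sketch. It assumes an ordinary least-squares model as a stand-in for the three models, and it uses synthetic data rather than the study's dataset; the function names, the three predictor labels chosen, and the coefficients are all hypothetical. A predictor whose removal raises the prediction RMSE the most would be read as the most influential.

```python
import numpy as np

def fit_predict_linear(X_train, y_train, X_test):
    """Fit least-squares regression with an intercept and predict on X_test."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return A_test @ coef

def rmse(obs, pred):
    """Root mean squared error between observations and predictions."""
    return float(np.sqrt(np.mean((np.asarray(obs) - np.asarray(pred)) ** 2)))

def exclusion_scores(X_train, y_train, X_test, y_test, names):
    """Prediction RMSE with all predictors, then with each one excluded in turn."""
    scores = {"(all predictors)": rmse(
        y_test, fit_predict_linear(X_train, y_train, X_test))}
    for j, name in enumerate(names):
        keep = [k for k in range(X_train.shape[1]) if k != j]
        pred = fit_predict_linear(X_train[:, keep], y_train, X_test[:, keep])
        scores[name] = rmse(y_test, pred)
    return scores

# Synthetic illustration only: 43 "years", three hypothetical predictors,
# with the response driven mostly by the first one.
rng = np.random.default_rng(0)
n = 43
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.1, size=n)

split = int(0.8 * n)  # 80% of the series for training, the rest for prediction
scores = exclusion_scores(X[:split], y[:split], X[split:], y[split:],
                          ["Rainfall", "ET", "Maxtem"])
for name, value in scores.items():
    print(f"{name}: RMSE = {value:.3f}")
```

The same loop could wrap any model with a fit-and-predict interface; only `fit_predict_linear` would need to be swapped out.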
Table 1. Potential variables used in this study.

| Data Type | Explanation | Variable | Unit |
|---|---|---|---|
| Spatial-temporal | Regional annual rainfall | Rainfall | mm |
| Spatial-temporal | Regional April-October rainfall | Rainfall4-10 | mm |
| Spatial-temporal | Regional May-September rainfall | Rainfall5-9 | mm |
| Spatial-temporal | Regional annual actual evaporation | ET | mm |
| Spatial-temporal | Regional April-October actual evaporation | ET4-10 | mm |
| Spatial-temporal | Regional May-September actual evaporation | ET5-9 | mm |
| Spatial-temporal | Regional annual maximum temperature | Maxtem | °C |
| Spatial-temporal | Regional April-October maximum temperature | Maxtem4-10 | °C |
| Spatial-temporal | Regional May-September maximum temperature | Maxtem5-9 | °C |
| Spatial-temporal | Regional annual minimum temperature | Mintem | °C |
| Spatial-temporal | Regional April-October minimum temperature | Mintem4-10 | °C |
| Spatial-temporal | Regional May-September minimum temperature | Mintem5-9 | °C |
| Spatial-temporal | Regional annual Morton actual evapotranspiration | AnnMact | mm |
| Spatial-temporal | Regional mean wet-spell length | MeWS | day |
| Spatial-temporal | Regional max wet-spell length | MxWS | day |
| Spatial-temporal | Regional mean dry-spell length | MeDS | day |
| Spatial-temporal | Regional max dry-spell length | MxDS | day |
| Spatial-temporal | Regional rainfall (≥1.0 mm) days annually | RD | day |
| Spatial-temporal | Regional rainfall intensity (Rainfall/RD) annually | RI | mm/day |
| Temporal | Annual regional groundwater extraction | Extraction | mm |
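As an illustration of how the rainfall-derived variables in Table 1 relate to one another, the following sketch computes RD, RI, and the wet- and dry-spell lengths from one year of daily rainfall. It assumes that a wet day is a day with at least 1.0 mm of rain (the threshold given for RD), that a spell is a run of consecutive wet or dry days, and that RI divides total rainfall by RD as written in the table. The function names and the sample series are hypothetical, and the study's own processing may differ.

```python
import numpy as np

def spell_lengths(flags):
    """Lengths of consecutive runs of True values in a boolean sequence."""
    lengths, run = [], 0
    for flag in flags:
        if flag:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:  # close a run that reaches the end of the series
        lengths.append(run)
    return lengths

def rainfall_indices(daily_mm, wet_threshold=1.0):
    """Annual rainfall indices from daily totals (mm), under the stated assumptions."""
    daily = np.asarray(daily_mm, dtype=float)
    wet = daily >= wet_threshold          # assumed wet-day definition
    wet_runs = spell_lengths(wet)
    dry_runs = spell_lengths(~wet)
    total = float(daily.sum())
    rd = int(wet.sum())
    return {
        "Rainfall": total,                                   # mm
        "RD": rd,                                            # days
        "RI": total / rd if rd else float("nan"),            # mm/day
        "MeWS": float(np.mean(wet_runs)) if wet_runs else 0.0,
        "MxWS": max(wet_runs, default=0),
        "MeDS": float(np.mean(dry_runs)) if dry_runs else 0.0,
        "MxDS": max(dry_runs, default=0),
    }

# Hypothetical ten-day series, used only to exercise the function.
example = rainfall_indices([0, 2, 3, 0, 0, 0, 5, 0.5, 1.0, 0])
print(example)
```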
Table 2. The trained and predicted groundwater recharge results based on the three models and the percentage of observation data used to train the models.

| Training Data (%) | Linear Regression: RMSE_Tra | Linear Regression: RMSE_Pre | Linear Regression: RMSE_Sum | MLP Model: RMSE_Tra | MLP Model: RMSE_Pre | MLP Model: RMSE_Sum | LSTM Model: RMSE_Tra | LSTM Model: RMSE_Pre | LSTM Model: RMSE_Sum |
|---|---|---|---|---|---|---|---|---|---|
| 70% | 0.06 | 0.20 | 0.13 | 0.02 | 0.19 | 0.12 | 0.06 | 0.12 | 0.09 |
| 80% | 0.06 | 0.19 | 0.11 | 0.02 | 0.11 | 0.06 | 0.07 | 0.10 | 0.08 |
| 90% | 0.06 | 0.20 | 0.09 | 0.02 | 0.13 | 0.05 | 0.04 | 0.11 | 0.06 |

| Training Data (%) | Linear Regression: R²_Tra | Linear Regression: R²_Pre | Linear Regression: R²_Sum | MLP Model: R²_Tra | MLP Model: R²_Pre | MLP Model: R²_Sum | LSTM Model: R²_Tra | LSTM Model: R²_Pre | LSTM Model: R²_Sum |
|---|---|---|---|---|---|---|---|---|---|
| 70% | 0.96 | 0.46 | 0.79 | 0.99 | 0.49 | 0.82 | 0.94 | 0.77 | 0.88 |
| 80% | 0.95 | 0.44 | 0.85 | 0.99 | 0.82 | 0.95 | 0.93 | 0.84 | 0.92 |
| 90% | 0.95 | −0.11 | 0.89 | 0.99 | 0.54 | 0.96 | 0.98 | 0.70 | 0.96 |
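The two metrics reported in Table 2 can be sketched as follows. This section does not spell out the exact formula used for R²; the form below, one minus the ratio of residual to total sum of squares, is an assumption, chosen because it can become negative when predictions fit worse than the observed mean, which would be consistent with the value of −0.11 in the table. The function names and the sample numbers are hypothetical.

```python
import numpy as np

def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def r_squared(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)        # residual sum of squares
    ss_tot = np.sum((obs - obs.mean()) ** 2)  # spread around the observed mean
    return float(1.0 - ss_res / ss_tot)

# Hypothetical values: one close prediction and one that reverses the trend.
observed = [0.2, 0.4, 0.6, 0.8]
close_fit = [0.25, 0.35, 0.65, 0.75]
poor_fit = [0.8, 0.6, 0.4, 0.2]

print(rmse(observed, close_fit), r_squared(observed, close_fit))
print(rmse(observed, poor_fit), r_squared(observed, poor_fit))
```

Under this definition, a negative R² on the prediction period indicates that the model predicts worse than a constant equal to the mean of the observations in that period.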

Share and Cite

MDPI and ACS Style

Huang, X.; Gao, L.; Crosbie, R.S.; Zhang, N.; Fu, G.; Doble, R. Groundwater Recharge Prediction Using Linear Regression, Multi-Layer Perception Network, and Deep Learning. Water 2019, 11, 1879. https://doi.org/10.3390/w11091879

