A Comparative Study of Regression Analysis and Artificial Neural Network Methods for Medium-Term Load Forecasting

Objectives: Load forecasting is the prediction of future load demand in electrical systems using historical data. This paper reports a medium-term load forecast carried out with a load demand data set obtained from the Covenant University campus in Nigeria, together with a comparative study of the two methods used. Methods/Statistical analysis: Regression analysis and Artificial Neural Network (ANN) models were used to show the feasibility of generating an accurate medium-term load forecast for the case study despite the peculiarity of its load data. The statistical evaluation measures used are the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE). Findings: The results of the comparative study show that the ANN model is the superior method for this load forecast because of its ability to handle the load data; its MAPE and RMSE of 0.0285 and 1.124, respectively, are far lower than those of the regression model. Application/Improvements: This result provides a benchmark for power system planning and future studies in this research domain.


Introduction
A forecast can be defined as the estimation of future events and conditions based on previous data. The act of making such an estimate is called forecasting [1]. Forecasting is crucial to decision making [2]. Forecasting is often confused with prediction, but to carry out a forecast, one relies mainly on past data records. A forecast can be said to be more specific and may cover a range of possible outcomes.
Load forecasting is now an important research area for the running and operation of electrical power systems and for energy management. It is the estimation of the electrical load that will be required at a given location using previous records of electrical load demand. Load forecasting is divided into three types according to the time period over which the forecasts are made: short-term, medium-term and long-term. Short-term forecasting covers from an hour up to a week ahead, medium-term forecasting from a week to a year, and long-term forecasting from a year upwards.
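The horizon boundaries described above can be written as a small classification rule. The sketch below is illustrative only: the function name `classify_horizon` and the choice to express the horizon in days are assumptions made for this example, not part of the study.

```python
def classify_horizon(days_ahead: float) -> str:
    """Label a forecast horizon using the boundaries given in the text.

    days_ahead is the forecast lead time in days (an assumed unit).
    """
    if days_ahead <= 0:
        raise ValueError("the forecast horizon must be positive")
    if days_ahead <= 7:      # from an hour up to a week ahead
        return "short-term"
    if days_ahead <= 365:    # from a week up to a year ahead
        return "medium-term"
    return "long-term"       # from a year upwards


print(classify_horizon(1 / 24))  # one hour ahead
print(classify_horizon(92))      # about three months ahead
print(classify_horizon(730))     # two years ahead
```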
Furthermore, the load forecasting types are defined based on their uses. Short-term forecasts are used to ensure the stability of the system and to minimize the cost of load dispatch and system running. Medium-term forecasts are used for effective operational planning and optimization of generation. Long-term forecasts are used for investment planning for the expansion of power system infrastructure [3]. Each load forecast is a unique activity; therefore, different factors are considered before each forecast is carried out. This is because an overestimation of load demand results in economic waste, since large investment is required to construct excess power infrastructure, while an underestimation results in underinvestment, which leads to customer discontentment [4].
Load forecasting can also be classified into qualitative and quantitative methods. Qualitative forecasting methods use the views of experts to estimate future load intuitively. They are used when past load data are not available. These methods include subjective curve fitting, the Delphi method and technological comparisons. Quantitative methods are based on established mathematical analysis such as regression analysis, decomposition methods, exponential smoothing and the Box-Jenkins methodology [5].
Medium-term forecasting is useful to the University community because it sets the stage for a power system maintenance plan. It also helps in generation capacity planning for future network expansions due to increasing load demands. A load forecast is therefore necessary to aid proper planning and load apportioning in any electrical network, especially for a fast-developing, research-based University environment. Proper knowledge of its load requirements can also aid the adjustment of power generating capacities to suit the intensity of future research, teaching and learning.
A number of studies on different types of load forecasting have been reported in the literature. To improve the modeling of seasonality, a non-linear regression model was developed for medium-term load forecasting in [3]. The study in [6] used three regression models to calculate the medium-term load forecast. In [7], the increasing usage of Artificial Intelligence techniques was applauded. Also, a lot of work has been done using ANN for long- and short-term forecasts [8][9][10][11][12][13][14], but research publications on medium-term load forecasting using ANN are few.
The primary goal of this paper is to report the results of a medium-term load forecast study carried out using the substation load data records of Covenant University in Ogun State, Nigeria. The accuracy of the methods used for the load forecast was tested and confirmed using the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE).

Data Collection
The load data collected from the substation at the study location showed constant variations in the load demand of the University community. This was due to power failures at different points in time from the public power distribution company. The data values for the six months (August 2012 to January 2013) were collated at the time of the study, as shown in Table 1. To prepare the forecast models, however, the data for August, September and October 2012 were used. The following observations were noted from the data collected for this study.
• During hot weather periods (dry season), load demand at the study location rose sharply. Further investigation revealed that many air conditioners and ceiling fans were turned ON to reduce the effect of the heat in residences, offices, guest houses, classrooms and shopping malls. The reverse was the case for colder weather periods (wet season), as no residence had inbuilt heaters. Many residents then switched OFF cooling systems for most of the day, thereby reducing load demand (i.e. reduced energy consumption).
• The total load demand was always higher on weekdays than at weekends.
• There was less activity (both human and commercial) on the campus, especially in December and the early part of January, due to semester breaks and festive holidays. This was evident in the load data values recorded during the period, as shown in Table 1.
• The load patterns led to a hypothetical division of the day into two periods: peak periods and non-peak periods. Peak periods are taken to be between 6pm and 9pm, when a lot of activity was recorded in the residential areas, e.g. staff quarters and student halls of residence. Non-peak periods are the times when people were at work, when most of the load demand was centered within the academic area of the campus, e.g. the offices, laboratories and lecture theatres.
• Considering the load demand with respect to the peak periods, non-peak periods and the different seasons, it was seen that in hot weather the peak periods extended from 9pm to around 12 noon because of the extended usage of ceiling fans and air conditioners in the residences.
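The weekday/weekend and peak/non-peak observations above can be turned into simple indicator flags for a timestamp. The following is a minimal sketch under stated assumptions: the function `describe_timestamp` and the constant names are hypothetical, and the peak window is taken as 6pm to 9pm as described in the observations.

```python
from datetime import datetime

PEAK_START_HOUR = 18  # 6pm, start of the peak window described above
PEAK_END_HOUR = 21    # 9pm, end of the peak window described above


def describe_timestamp(ts: datetime) -> dict:
    """Return peak-period and weekend flags for one timestamp."""
    return {
        "is_peak": PEAK_START_HOUR <= ts.hour < PEAK_END_HOUR,
        "is_weekend": ts.weekday() >= 5,  # Saturday is 5, Sunday is 6
    }


print(describe_timestamp(datetime(2012, 8, 6, 19)))  # a Monday evening
print(describe_timestamp(datetime(2012, 8, 4, 10)))  # a Saturday morning
```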

Forecast Models
The load forecast models examined in this study are the regression and ANN models. A brief overview of each model is given in this section.

Regression Model
Regression analysis is a mathematical tool for determining the statistical relationship between a change in one variable and the change in another, or between a variable whose value is known and another whose value is to be predicted. When a variable is predicted from two or more independent variables, the method is called multiple regression. With respect to the dependent and independent variables, regression analysis can show the proportion of the variance between the variables. It can also show how much variance is due to the dependent and independent variables respectively. The regression analysis method of load forecasting has the following advantages:
• It gives the strength and direction of a relationship.
• Unlike simple correlations, regression analysis allows for the use of more than one predictor. It allows an outcome to be predicted even when the multiple predictors are correlated with each other (in the case of multiple regression).
• It can be used to correct errors based on previous assumptions.
• Good results can be obtained with relatively small data sets.
The disadvantages of regression analysis are:
• It is very sensitive to outliers. Outliers are observation points that are distant from other observations. They might indicate experimental error or result from variability in measurement [14].
• Data collection is time-consuming and very costly.
• Outputs of regression can lie outside of the range (0, 1).
• Its extrapolation properties tend to be poor.
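To make the idea of a regression fit and its sensitivity to outliers concrete, the sketch below fits a straight line by ordinary least squares to a small made-up series and then refits after one value is replaced by an outlier. The data values and the helper name `fit_line` are illustrative assumptions; they are not taken from the study's data set.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


days = [1, 2, 3, 4, 5]
loads = [100, 102, 104, 106, 108]         # made-up, perfectly linear loads
loads_outlier = [100, 102, 104, 106, 158]  # same series with one outlier

clean_slope, clean_intercept = fit_line(days, loads)
outlier_slope, outlier_intercept = fit_line(days, loads_outlier)

print(clean_slope, clean_intercept)      # slope 2.0, intercept 98.0
print(outlier_slope, outlier_intercept)  # slope 12.0, intercept 78.0
```

A single distant observation changes the fitted slope from 2 to 12 in this toy case, which is the sensitivity referred to in the list of disadvantages.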

Artificial Neural Network (ANN) Model
An Artificial Neural Network (ANN) is an information processing model. It is a type of machine learning system whose method of processing is inspired by the biological nervous system. Put simply, an ANN is a set of connected input/output units in which every connection has a weight associated with it. It consists of a pool of simple processing elements which communicate by sending signals to each other over a large number of weighted connections [10]. The fundamental unit of the biological nervous system is the biological neuron, which is shown in Figure 1 [11]. The ANN was developed as a generalization of mathematical models from neural biology. Since an ANN is similar to a biological neural network, its fundamental building block is the mathematical model of a neuron, whose schematic is shown in Figure 2. The artificial neuron has three basic components:
• The synapses or connecting links that provide weights, $w_{kp}$, to the input values, $x_p$, for $p = 1, \ldots, n$.
• An adder component which sums the weighted input values and computes the input to the activation function.
• An activation function that maps the weighted sum of the inputs $x_p$ to $y_k$, the output value of the neuron. It is also called a squashing function.
ANN is useful in many industries, e.g. aerospace, defense, electronics, manufacturing, medical, robotics, speech and transportation, in areas such as classification, pattern recognition, data mining, control and optimization. The areas of application of ANN in power systems include load forecasting, fault diagnosis/fault location, economic dispatch, security assessment and transient stability. The advantages of ANN, as highlighted in [15], are:
• Its ability to support a non-linear mapping of input and output variables.
• Its robust performance in noisy environments and environments with incomplete data, which enables it to generalize.
• High parallel computation, which means that it can be implemented with very large scale integrated circuits such as ASIC, DSP, etc.
• Its ability to learn on its own (with or without a supervisor) using the data fed into the model, making it an adaptive method (which is usually achieved by changing the connection strengths).
• If a neuron in the model is damaged along the way, the whole network does not shut down; it can still function quite well.
• It can make decisions with a measure of confidence.
ANN as a tool also has its own disadvantages, which include [13]:
• The processing time of an ANN increases with its size.
• An ANN needs training before operation begins.
• The architecture of an ANN differs from that of microprocessors. Therefore, it needs to be emulated [16].
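The three components of the artificial neuron described above (weights, adder and activation function) can be sketched in a few lines. This is an illustrative example only: the function name `neuron_output`, the optional `bias` term (which the text does not mention and which defaults to zero here) and the choice of `tanh` as the squashing function are assumptions.

```python
import math


def neuron_output(inputs, weights, bias=0.0, activation=math.tanh):
    """Compute the output y_k of one artificial neuron.

    inputs  : the values x_p, p = 1..n
    weights : the synaptic weights w_kp, one per input
    bias    : an optional offset (an assumption, not part of the text)
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Adder: weighted sum of the inputs.
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation (squashing) function maps the sum to the output.
    return activation(v)


# Weighted sum is 0.5 * 1.0 + (-0.25) * 2.0 = 0.0, and tanh(0.0) = 0.0.
print(neuron_output([1.0, 2.0], [0.5, -0.25]))
```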

Performance Evaluation using MAPE and RMSE
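The two measures used to evaluate the forecasting performance in this study are the MAPE [18] and the RMSE [17].
The MAPE is defined in equation (1) as:

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|L_{a,i} - L_{f,i}\right|}{L_{a,i}} \times 100 \qquad (1)$$

where $L_f$ = forecast load, $L_a$ = actual load and $N$ = number of values. The RMSE is defined in equation (2) as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(L_{a,i} - L_{f,i}\right)^{2}} \qquad (2)$$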
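As a worked illustration of the two measures, the sketch below computes them for a pair of short made-up series. It assumes the standard textbook definitions, with the MAPE expressed as a percentage; the function names `mape` and `rmse` and the sample values are hypothetical and do not come from the study.

```python
import math


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (standard definition)."""
    n = len(actual)
    return 100.0 / n * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))


def rmse(actual, forecast):
    """Root Mean Square Error (standard definition)."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


actual_load = [100.0, 200.0]    # made-up actual loads L_a
forecast_load = [110.0, 190.0]  # made-up forecast loads L_f

# Relative errors are 0.10 and 0.05, so the MAPE is 7.5 percent.
print(mape(actual_load, forecast_load))
# Both absolute errors are 10, so the RMSE is 10.
print(rmse(actual_load, forecast_load))
```

Note that the MAPE is undefined when any actual load is zero, so periods of complete power outage would need separate handling.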

Experimentation Using the Linear Regression Model
Using the collected data shown in Table 1, the RMSE and MAPE values of three different regression models were computed, and the performance results are shown in Table 2. As shown in the table, linear regression gave the best result of the three models, with the lowest MAPE (0.6989) and RMSE (50.96). The linear regression model was therefore adopted for the load forecast in this first experiment. Table 3 shows the actual and forecast values obtained with the linear regression model, while Figure 3 compares them graphically.

Experimentation Using the ANN Model
The data set collected from the power base station was pre-processed and saved in an Excel spreadsheet. This data set spans a period of 184 days, which is approximately six months. The ANN load forecasting model was then designed and configured using the neural network toolbox in MATLAB. Carrying out a medium-term load forecast with an ANN model comes in two phases.
The first phase involves the training and validation of the ANN model. An ANN "thinks" like a human brain, so before it can successfully carry out load forecasting it must first be trained with the collected historical data. The inputs and the expected outputs for the data are fed into the network, and the network parameters are adjusted until the best network model (the model with the least error) is arrived at. Settings that can be modified when training an ANN to achieve optimal performance include the number of epochs, the number of hidden layers, the type of activation functions and the number of neurons in the hidden layer. To validate an ANN, the outputs generated by the model are compared with the target outputs using the partition of the data set reserved for validation, and necessary adjustments are made to the network parameters.
The data set was partitioned into 50% for training, 16% for validation and 34% set aside for the medium-term forecast. The input and output layers made use of the 'tansig' and 'purelin' functions respectively. The network was trained using the gradient descent algorithm, which is a variant of the back-propagation learning algorithm. Training was carried out and adjustments were made until the best network was obtained. The training output of the network is shown in Figure 4. From the training output, the MAPE was 0.0285 and the RMSE was 1.124.
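The partitioning and training procedure described above can be sketched in plain Python. This is a simplified illustration, not the study's MATLAB model: it assumes the 184 days are split in chronological order, uses a single hidden layer rather than the network actually configured, updates the weights after every sample, and trains on a made-up toy relationship. The names `partition` and `train_mlp`, the learning rate, the number of hidden neurons and the epoch count are all hypothetical. The hidden units use `tanh` (the function MATLAB calls 'tansig') and the output is linear (as with 'purelin').

```python
import math
import random


def partition(data, train_frac=0.50, val_frac=0.16):
    """Split a sequence, in order, into training, validation and forecast parts."""
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])


def train_mlp(inputs, targets, hidden=3, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer network (tanh hidden units, linear output)
    by per-sample gradient descent and return a prediction function."""
    rng = random.Random(seed)
    n_in = len(inputs[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        y = sum(w * hj for w, hj in zip(w2, h)) + b2
        return h, y

    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            h, y = forward(x)
            err = y - t  # derivative of 0.5 * (y - t)^2 with respect to y
            for j in range(hidden):
                # Back-propagate through tanh before w2[j] is updated.
                grad_hidden = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_hidden
                for i in range(n_in):
                    w1[j][i] -= lr * grad_hidden * x[i]
            b2 -= lr * err

    return lambda x: forward(x)[1]


# Partition sizes for a 184-day record.
train_days, val_days, forecast_days = partition(list(range(184)))
print(len(train_days), len(val_days), len(forecast_days))  # 92 29 63

# Toy training problem (made up): learn t = 0.5 * x on [-1, 1].
toy_inputs = [[-1.0 + 0.25 * k] for k in range(9)]
toy_targets = [0.5 * x[0] for x in toy_inputs]
model = train_mlp(toy_inputs, toy_targets)
toy_mse = sum((model(x) - t) ** 2
              for x, t in zip(toy_inputs, toy_targets)) / len(toy_inputs)
print(toy_mse)
```

In practice the inputs and targets would be scaled to a comparable range before training, since raw load values far from the range of `tanh` make gradient descent slow or unstable.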
Afterwards, the sample inputs needed for the forecast were fed into the trained network and the outputs were recorded. These outputs are the ANN forecast values of the load from November 2012 to January 2013. The forecast values were compared with the actual load values, and the plots for November 2012, December 2012 and January 2013 are shown in Figure 5, Figure 6 and Figure 7 respectively.

Conclusion
This research used the peculiarity of the Covenant University load data to find out which of the two load forecasting methods presented responds better to non-linear load data. From the comparison of the MAPE and RMSE values shown in Table 5, it has been deduced that the ANN is a better method for the load forecast than the regression analysis method. The ANN was able to forecast the data regardless of the drop in load demand in December, and its forecasts of the prospective load demand had a very small error when compared to the actual load demand. This shows the robustness of the ANN in learning and forecasting with great accuracy and precision. It will enable future research on a long-term load forecast of the case study location using ANN, given its level of accuracy and robustness in handling non-linear load trends. With respect to producing immediate results, the regression analysis method produces results faster because of its direct mathematical computations, while the ANN has to be trained before it can begin forecasting electrical load data. However, once the ANN has been trained, computation by the ANN forecast method is faster than that of the regression analysis method. In the future, we hope to extend this study using a long-term load data set from the study location and other machine learning platforms such as the Radial Basis Function (RBF) ANN and ANN ensembles.
The comparison in Figure 3 covers the actual and forecast values for August 2012 to January 2013. The study in [9] also gave a similar result using a linear regression model.

Vol 10 (10) | March 2017 | www.indjst.org
Isaac A. Samuel, Adetiba Emmanuel, Ishioma A. Odigwe and Firstlady C. Felly-Njoku

Figure 3. A Comparison of the Actual and Forecasted Average Load Demands of Covenant University from August 2012 to January 2013 using the Linear Regression Analysis Method.
The second phase is the forecasting with the trained ANN model. The trained network makes predictions or forecasts based on the relationship learnt during the training and validation phase. The ANN topology used for this research is a Multi-Layer Perceptron (MLP) network. The layers in the MLP are one input layer, two hidden layers and one output layer. The input layer of the network is made up of four neurons, and the input values are:
• Load of the previous day.
• Load of the previous week.
• Day of the week.
• 'Holiday' or 'No Holiday'.
The days of the week are assigned numerical values as shown in Table 4. 'Holiday' is assigned the numerical value '1' and 'No Holiday' is assigned the value '0'. The output layer of the network is made up of only one neuron.
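The four network inputs listed above can be assembled for a given day as in the following sketch. The helper `build_features`, the sample load values and the holiday set are hypothetical. The day-of-week coding used here (Monday = 1 through Sunday = 7) is also an assumption, since the actual mapping is the one given in Table 4.

```python
from datetime import date, timedelta


def build_features(loads, holidays, day):
    """Return [previous-day load, previous-week load, day code, holiday flag].

    loads    : dict mapping a date to its recorded load
    holidays : set of dates treated as holidays
    """
    previous_day_load = loads[day - timedelta(days=1)]
    previous_week_load = loads[day - timedelta(days=7)]
    day_code = day.isoweekday()  # assumed coding: Monday = 1 ... Sunday = 7
    holiday_flag = 1 if day in holidays else 0
    return [previous_day_load, previous_week_load, day_code, holiday_flag]


# Made-up loads for the eight days ending on 25 December 2012.
start = date(2012, 12, 18)
sample_loads = {start + timedelta(days=k): 900.0 - 20.0 * k for k in range(8)}
sample_holidays = {date(2012, 12, 25)}

features = build_features(sample_loads, sample_holidays, date(2012, 12, 25))
print(features)  # [780.0, 900.0, 2, 1]
```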

Table 4. Numerical values assigned to the respective days of the week used in the ANN training

Figure 4. The MLP ANN Model Schematic for the Medium-Term Load Forecast.

Figure 5. A Comparison of the Actual and Forecasted Load Demand for November 2012 using ANN.

Figure 6. A Comparison of the Actual and Forecasted Load Demand for December 2012 using ANN.

Figure 7. A Comparison of the Actual and Forecasted Load Demand for January 2013 using ANN.

Table 1. The actual load demand values

Table 2. The MAPE and RMSE values of the three regression models

Table 3. The actual and forecasted values using the linear regression analysis method