Forecasting of FOREX Price Trend Using Recurrent Neural Network - Long Short-term Memory

Neural network (NN) algorithms can search and represent both structured and unstructured data; we employ them on financial time series. This paper describes the use of long short-term memory (LSTM) for price prediction of the FOREX pair EUR/USD. The aim of the paper is to test and propose the best time block for prediction based on daily FOREX data. We employ the mean absolute error and the mean squared error to assess the prediction results in order to find this time block. We tested time blocks from ten to fifty-eight days and 100 or 300 epochs. The training dataset contained daily exchange rate data from 1.4.1971 until 9.5.2019. The best-performing network was trained with a 30-day time block for 100 epochs. This paper also describes the effect of training for a high number of epochs.


Introduction
Forecasting is one of the essential tasks that humans try to accomplish, which is why economic time series have long been a target of prediction. In the late 1990s, financial time series such as stock market and Foreign Exchange (FOREX) data were described as behaving nearly like a random-walk process, with statistical properties that differ at different points in time because the process is time-varying, which makes prediction almost impossible (Hellstrom and Holmstrom 1998). Twenty-one years later, our set of tools is wider and the available computational power is incomparable. With these tools, we let algorithms learn from repetitions and patterns in order to predict the next period. The methods are advancing very quickly, yet long-term and extremely short-term prediction remain a significant challenge, and intraday trading in particular needs to be explored (Pradeepkumar and Ravi 2018).
Neural network (NN) algorithms can search and represent both structured and unstructured data, for instance natural language, time series or image data (Abdel-Nasser and Mahmoud 2019; Pena-Barragan et al. 2011). In image data processing, examples include image restoration (Wolterink et al. 2017; Yang et al. 2018), compression (Sun et al. 2020) and super-resolution (Ledig et al. 2017). One of the main problems is the time block used for a prediction, and this is the focus of our research. We perform a mid-term FOREX analysis and test different time blocks in order to find the lowest mean absolute error (MAE) and the lowest mean squared error (MSE). The time block of a prediction is crucial to its accuracy. With an extended time block, we reach too far into the future and lose accuracy. With a short time block, we also get very inaccurate results, because information about the previous trend is missing.
The paper is divided into several parts. In the first chapter, we describe the connection between FOREX and neural networks. In the second chapter, we explain RNN and LSTM network architecture. Also, we describe the data we used and the methodology of processing. The third chapter describes the result of time series prediction we have achieved.

Topic Overview
A Web of Science query for "FOREX AND forecasting" shows an increasing trend in the topic of FOREX forecasting over the last decade. Since 2015, the number of published articles has almost tripled, see the chart below. About 30% of the articles are related to computer science. The keyword analysis in Figure 2 shows the connection between FOREX and keywords such as forecasting, machine learning, neural networks and prediction, among a total of 76 keywords. The minimum number of occurrences of a keyword was 5. In total, 1,687 keywords were used across 423 articles. This analysis suggested the suitability of recurrent neural networks (RNN) for FOREX value prediction. We also found that few research papers use long short-term memory (LSTM).

Recurrent Neural Networks
Humans do not start their thinking from scratch every second. As we do something, we understand it and build our experience on previous knowledge; we do not throw everything away and start again. When reading, we understand every word in the context of the other words.
Traditional neural networks do not have a memory. For instance, imagine a network trying to solve a classification task, specifically to classify every frame of a movie correctly. Every point of the timeline has to be described. It is impossible to understand a movie in context if the network has no information about previous events in the movie.
An RNN is a network whose architecture addresses this issue. It contains a loop that allows the network to persist information by passing it from one step to the next (Hochreiter and Schmidhuber 1997). This loop makes a recurrent neural network harder to understand. For a clearer picture, we can imagine a recurrent network as multiple copies of the same network, each one receiving information from the previous copy; together they form the loop.
This brings us to the general use of RNNs. They perform well on time series and have been applied with great success to a variety of problems: speech recognition, language modeling, translation, image captioning, traffic prediction and our primary target, price prediction (Abdel-Nasser and Mahmoud 2019; Carapuço et al. 2018; Sidehabi et al. 2016; Zhao et al. 2017). A special kind of RNN is the LSTM, which was introduced in 1997 and is capable of learning long-term dependencies. LSTM has been shown to perform well on various problems (Hochreiter and Schmidhuber 1997). It is designed to avoid the long-term dependency problem by keeping information for long periods. As described above, every RNN has some form of loop built from repeating modules, and LSTM is no exception. The architecture of the LSTM cell is illustrated in Figure 3. The key part of an LSTM cell is the C line, which represents a memory pipeline. LSTM can add information to or remove information from this memory, regulated by gates. The x and + gates are a way to let information through optionally. The σ layer outputs a real number between zero and one. This value represents the impact of the information: zero means no impact and one means the information is very important.
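The gate mechanism described above can be illustrated with a minimal sketch of a single LSTM time step. It uses the commonly taught formulation with forget, input and output gates, which may differ in detail from the cell drawn in Figure 3. The scalar cell (one input, one hidden unit), the function names and the weight values are assumptions made purely for illustration; they are not taken from the trained model.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell (one input, one hidden unit).

    `p` maps each name to a hypothetical (w_x, w_h, bias) triple.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old memory to keep
    i = sigmoid(pre("input"))        # how much new information to admit
    o = sigmoid(pre("output"))       # how much of the memory to expose
    g = math.tanh(pre("candidate"))  # candidate information to store

    c = f * c_prev + i * g           # multiply and add along the C line
    h = o * math.tanh(c)             # new hidden state passed to the next step
    return h, c


# Illustrative weights only; a trained network learns these values.
params = {
    "forget": (0.0, 0.0, 0.0),
    "input": (0.0, 0.0, 0.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (0.0, 0.0, 0.0),
}
# With all-zero weights every sigmoid gate outputs 0.5,
# so exactly half of the previous memory survives.
h, c = lstm_step(x=0.5, h_prev=0.0, c_prev=1.0, p=params)
print(h, c)
```

A strongly negative forget bias drives the forget gate towards zero and erases the memory, while a strongly positive one keeps it almost unchanged; this is the sense in which the σ output expresses the impact of information.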

Dataset, HW and Data Treatment
FOREX currency pairs are divided into three groups: major, minor and exotic (Broto 2013; Laherrere and Sornette 1998). Major pairs are the most traded, for instance EUR/USD (our choice) or GBP/USD. Minor currencies, for instance the Norwegian krone, are less traded and have lower liquidity. Different pairs exhibit different behavior. Hence, similarly to the problem of evaluating various industries, a different approach has to be taken to assess the development of each pair (Hedvicakova and Kral 2019); that is, one strategy will not fit all, and processing each pair will result in different NN settings. Regarding the settings, one advantage is that FOREX is one of the most readily available sources of data and provides enough training data. This is one of the reasons why we chose FOREX for our time series forecasting. Specifically, we used a public dataset from kaggle.com containing only the date and the closing value of the EUR/USD price (Mahesh 2019). For better results, we pre-process the data with MinMaxScaler from the Sklearn library. This feature scaler finds the minimum and maximum values in the data and then rescales all dataset values to fit the range given in the constructor.
Feature scaling is a technique for bringing the independent features into a fixed range of values. It is done to reduce the network size: with suitably distributed values, a small network can be used. We applied feature scaling to the price value, which means that our results are also scaled values. For post-processing, we applied the same scaler instance to the predicted values.
To reach our goal, we divided the input dataset into slices with lengths ranging from 10 to 60 days. The slice length defines the input shape for training and also for later predictions.
The trained model is the pure LSTM architecture from 1997 (Hochreiter and Schmidhuber 1997). We use the LSTM layer provided by the Keras framework. Our architecture uses three output units connected to a fully connected Dense layer. We used the mean squared error (MSE) loss function with the RMSprop optimizer.
The LSTM input has as many time steps as the time block used for a particular model. Generally, we use an input with shape (N, 1), where N is the length of the time block.
Our model, drawn by the Keras utility method model_to_dot, is shown in Figure 4. LSTM, RNN and NN in general are computationally demanding. Our computer has two dedicated graphics cards with a total of 7,934 CUDA cores. These cards are among the top-performing NVIDIA gaming cards currently available. Because of framework support, we decided to use NVIDIA cards only. One of our cards is a 1080 Ti with 11,176 MB of graphics memory and a 1,607 MHz max clock rate. The other is a 2080 Ti with 11,019 MB of graphics memory and a 1,545 MHz max clock rate. The processor is an i7-8700 with a 3.20 GHz clock.
We used Python 3.7.3 as the programming language. For programming in Python, we used Jupyter Notebook, a web-based interactive computational environment. Keras is the main framework we used. It is a high-level framework with a high degree of abstraction on top of TensorFlow. In newer versions of TensorFlow (2.x), Keras is integrated.
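The scaling and the inverse post-processing step can be sketched as follows. To keep the example dependency-free, it uses a small hand-written class that applies the same min-max formula as MinMaxScaler with a (0, 1) range; the class name, the method layout and the price values are hypothetical and serve only as an illustration, not as our actual pre-processing code.

```python
class MinMax:
    """Hand-written stand-in for min-max feature scaling to a target range."""

    def __init__(self, feature_range=(0.0, 1.0)):
        self.low, self.high = feature_range

    def fit(self, values):
        # Remember the extremes of the training data.
        self.data_min = min(values)
        self.data_max = max(values)
        return self

    def transform(self, values):
        # Assumes data_max > data_min; a constant series would divide by zero.
        span = self.data_max - self.data_min
        width = self.high - self.low
        return [(v - self.data_min) / span * width + self.low for v in values]

    def inverse_transform(self, scaled):
        # Map scaled values (for example predictions) back to price units.
        span = self.data_max - self.data_min
        width = self.high - self.low
        return [(s - self.low) / width * span + self.data_min for s in scaled]


closes = [1.10, 1.12, 1.08, 1.15]  # hypothetical EUR/USD closing prices
scaler = MinMax().fit(closes)
scaled = scaler.transform(closes)
restored = scaler.inverse_transform(scaled)
print(scaled)
print(restored)
```

Reusing the same fitted instance for the inverse step matters: the minimum and maximum learned from the training data are the only way to map scaled predictions back to real prices.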
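A minimal sketch of this slicing step is given below. It assumes that each slice of time_block consecutive values is paired with the value that immediately follows it as the prediction target; this one-step-ahead pairing, the function name make_windows and the sample values are assumptions for illustration.

```python
def make_windows(series, time_block):
    """Split a series into (input window, next value) training pairs."""
    inputs, targets = [], []
    for start in range(len(series) - time_block):
        inputs.append(series[start:start + time_block])  # time_block past values
        targets.append(series[start + time_block])       # the value that follows
    return inputs, targets


series = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # hypothetical scaled prices
X, y = make_windows(series, time_block=3)
for window, target in zip(X, y):
    print(window, "->", target)
```

A series of length L yields L minus time_block pairs, so longer time blocks leave slightly fewer training samples.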
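The architecture described above could be expressed in Keras roughly as follows. This is a sketch under stated assumptions rather than our exact code: "three output units" is read as an LSTM layer with three units, the Dense layer is assumed to produce a single predicted value, and the function names are hypothetical. The helper lstm_param_count only illustrates how small such a network is.

```python
def lstm_param_count(units, features=1):
    """Trainable weights of one LSTM layer: four weight sets (three gates
    plus the candidate), each with input weights, recurrent weights and a bias."""
    return 4 * (units * (features + units) + units)


def build_model(time_block, units=3):
    """Build a one-layer LSTM regressor for inputs of shape (time_block, 1)."""
    # Imported here so the helper above works without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(time_block, 1)),
        keras.layers.LSTM(units),
        keras.layers.Dense(1),  # assumption: one predicted (scaled) price
    ])
    # MSE loss with the RMSprop optimizer; MAE is tracked as an extra metric.
    model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])
    return model


print(lstm_param_count(units=3))
```

With TensorFlow installed, build_model(30) would return a model that can be trained by calling fit on inputs of shape (samples, 30, 1), for example for 100 epochs.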
Prediction validation was calculated using MSE and the mean absolute error (MAE). Both measures are commonly used for neural network loss calculation. In our case, the loss function used was MSE:
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$$
where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value. For validation of the results, we used MSE, where $y_i - \hat{y}_i$ is the difference between the actual and the predicted value and $N$ is the number of observations.
Mean absolute error is a measure of the difference between two continuous variables:
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$
where $N$ is the number of observations.
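Both error measures can be computed directly from their definitions, as in the following minimal sketch. The function names and the sample values are illustrative and are not taken from our experiments.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    n = len(actual)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    n = len(actual)
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / n


actual = [1.0, 2.0, 3.0]     # hypothetical observed values
predicted = [1.0, 2.0, 5.0]  # hypothetical model outputs
print(mean_squared_error(actual, predicted))
print(mean_absolute_error(actual, predicted))
```

Because MSE squares each difference, a single large miss (here 2.0 on the last observation) weighs more heavily in MSE than in MAE.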

Results
FOREX price forecasting is an exciting field of study. Our very simple model with only one LSTM layer performs very well on this dataset. Our results show a relationship between the input time block and prediction accuracy. The best-performing model has an MSE of 0.003052 and an MAE of 0.002390. The impact of the input time block on LSTM forecast accuracy can be seen in Table 1. Some time blocks come very close to the best-performing 30-day block. An interesting case is the 26-day block, which shows one of the worst accuracies. The worst block is 38 days with 100 epochs.
We also tested the impact of training length, i.e. the impact of the number of epochs on accuracy. With three times as many epochs, we got very different results. For instance, the best-performing 30-day block has 59% worse accuracy. Table 1 presents the calculated MSE and MAE errors. The best-performing combination is shown in bold and the second best in italics. The best-performing network is the combination of a 30-day time block and 100 epochs of training. It is essential to note that these results depend directly on our architecture.

Discussion and Conclusion
Although not many papers have described the use of RNN and LSTM for FOREX prediction, the direction of our research can be compared to the results of e.g. ) who chose the same EUR/USD pair and also daily data. A larger set of methods was employed there, and RNN was compared to, e.g., autocorrelation-based models, but with mixed results. When RNN was compared to a hybrid model of adaptive particle swarm optimization and radial basis function in another study (Sermpinis, Theofilatos et al. 2012), the hybrid model outperformed it by 9% in annualized return and needed only half the positions taken, although the trading pair was EUR/GBP. Daily time frame data were also employed by (Bagheri et al. 2014; Khashei et al. 2008), although, unlike us, they employed hybrid model-based algorithms. The study (Persio and Honchar 2016) took the same approach as we did when utilizing RNN and LSTM, and added a Multi-layer Perceptron and Convolutional Neural Networks (CNN) for the purpose of FOREX analysis (EUR/USD pair). They also applied it to the S&P500 index. Unlike our approach, their FOREX focus was on intraday trading with a minute-by-minute setting. The best accuracy after training the different architectures was achieved by their novel Wavelet + CNN algorithm, which outperformed the other NN approaches, including RNN, on both the stock and the FOREX market, with a very high accuracy of 83%. However, the variation of the results was not very high, with a maximum difference of 4% in prediction accuracy, which shows the suitability of RNN for the task. The study (Maknickienė and Maknickas 2012) based an EUR/USD trading strategy on LSTM, resulting in a 4% profit over the test period with three-day trading steps and an overall accuracy of 65% in terms of Pearson's correlation between predicted and historical values. RNN and CNN were employed and compared to the C-RNN algorithm by (Ni et al., 2019) for nine volatile currency pairs in a data set with a 10-year timespan.
A comparison of the mean squared errors of the values predicted by the different forecasting algorithms showed that C-RNN performed 10% better than CNN.
The RNN and LSTM methods employed in our study are suitable and have been successfully employed in other studies. Although specialization into hybrid models seems highly promising, (Pradeepkumar and Ravi 2018) note that more attention should be paid to image processing algorithms for the needs of FOREX prediction. This opinion is consistent with the rather problematic but still practiced trading approach of chartism, in which certain types of "image" formations are searched for in order to enter a trading position.
The best-performing network is the combination of a 30-day time block and 100 epochs of training. It is essential to note that these results depend directly on our architecture. We understand that any algorithm can be, at least temporarily, unsuccessful in an environment with unexpected fundamental changes, such as the economic distortion caused by the recent coronavirus. After reaching a certain point, the market became bearish in a three-day panic that affected both the stock market and FOREX. Although algorithms can adjust in the period after the initial panic, the moment of panic itself cannot be handled properly, since there are not enough adequate cases of such events. Hence, this is likely to remain one of the main challenges even after further research is conducted in this promising and organically fast-growing research area.