XenonStack

Saturday 28 April 2018

Time Series Analysis & Forecasting Using Machine Learning & Deep Learning


Time Series Analysis For Business Forecasting

Time never stops moving, and because we know that things change with time, the human mind is naturally curious about forecasting.
Hence we are interested in making predictions ahead of time. Wherever time is an influencing factor, there is potentially something valuable that can be predicted.
So here we are going to discuss time series forecasting and the different types of forecasts we can make a machine produce in real life. The name 'time series' itself suggests that the data varies with time.
The primary motive in time series problems is forecasting. Time series analysis for business forecasting helps predict the future values of a critical field with potential business value in the industry, such as the health condition of a person, the result of a sport, or the performance parameters of a player based on previous data.

Time Series Forecasting Methods

Univariate Time Series Forecasting

A univariate time series forecasting problem has only two variables: one is date-time, and the other is the field we are forecasting.
For example, if we want to predict a particular weather field like tomorrow's average temperature, we consider the temperatures of all the previous dates and use them in the model to predict tomorrow's value.
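As a toy illustration of the univariate setting, consider forecasting tomorrow's temperature from past daily temperatures alone. The temperatures below are made up, and the 7-day mean is only a naive baseline, not a real forecasting model:

```python
import numpy as np

# Hypothetical daily average temperatures for the past two weeks (deg C)
temps = np.array([21.0, 22.5, 21.8, 23.1, 22.0, 24.2, 23.5,
                  22.8, 23.9, 24.5, 23.2, 22.6, 24.0, 23.7])

# Naive univariate baseline: forecast tomorrow as the mean of the last 7 days.
# Only the series itself and its timeline are used -- no other variables.
forecast = temps[-7:].mean()
print(round(forecast, 2))  # 23.53
```

Any real model (ARIMA, LSTM) should at least beat such a naive baseline to justify its complexity.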

Multivariate Time Series Forecasting

In the multivariate case, the target is the same: if we take the example above, the goal is still to predict the average temperature for tomorrow. The difference is that we also feed the model all the other factors that affect temperature: is there a chance of rainfall, and if so, what will the duration of the rain be? What is the wind speed at various times? Humidity, atmospheric pressure, precipitation, solar radiation, and many more.
All these factors are intuitively relevant to temperature. The primary point of comparison between univariate and multivariate models is that multivariate models are better suited to practical scenarios.

Time Series Forecasting Models

ARIMA Model

ARIMA means Autoregressive Integrated Moving Average. The AR part of ARIMA indicates that the evolving variable of interest is regressed on its own lagged (i.e., prior) values.
The MA part suggests that the regression error is a linear combination of error terms whose values occurred contemporaneously and at various times in the past.
The I (for "integrated") indicates that the data values have been replaced with the difference between their values and the previous values (and this differencing process may have been performed more than once).
Each of these features helps the model fit the data; the advantage of ARIMA is its accuracy over a broad domain of time series, despite being more complicated than simpler models.

ARCH/GARCH Model

The most important volatility models for time series are Autoregressive Conditional Heteroscedasticity (ARCH) and its generalized extension, the GARCH model.
These models are very good at capturing the dynamics of volatility in a time series.
Univariate GARCH models have become popular for volatility modelling, but multivariate GARCH is still very challenging to implement.
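As a sketch of what ARCH captures, here is a simulated ARCH(1) process in plain numpy. The parameter values are illustrative only; in practice one would fit such a model with a dedicated package rather than code it by hand:

```python
import numpy as np

# Simulate an ARCH(1) process: today's variance depends on yesterday's
# squared shock, which produces the volatility clustering ARCH models capture.
rng = np.random.default_rng(0)
omega, alpha = 0.25, 0.5   # illustrative parameters
n = 10000
eps = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = omega / (1 - alpha)  # unconditional variance
for t in range(1, n):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2
    eps[t] = np.sqrt(sigma2[t]) * rng.normal()

# The sample variance approaches the unconditional variance omega/(1-alpha)
print(round(eps.var(), 1))  # roughly 0.5
```

The conditional variance `sigma2[t]` moves over time even though the unconditional variance is constant; that time-varying volatility is exactly the heteroscedasticity these models describe.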

Vector Autoregression (VAR) Model

VAR is an abbreviation for Vector Autoregression.
A VAR model captures the interdependencies among the various time series in the data. It is the generalization of the univariate autoregression model.

LSTM Model

LSTM stands for long short-term memory, and it is a Deep Learning Model.
LSTM is a type of Recurrent Neural Network(RNN), and RNNs are designed to capture the sequence dependence.
LSTM is capable of handling large architectures and long input sequences during training.

ELMAN and JORDAN Neural Network

The Elman and Jordan networks are two types of recurrent neural network (RNN) architectures. These networks combine the past values of a context unit with the current input to obtain the final output.
The Jordan context unit acts as a low-pass filter, creating an output that is a weighted average of its most recent history of outputs.
In these networks, the weighting over time is inflexible, since we can only control the time decay. Also, a small change in the decay is reflected as a significant change in the weighting.
An Elman network is a three-layer network with context units added. The hidden layer is connected to these context units with a fixed weight of one. At each step, when the input is fed forward, the learning rule is applied.
A Jordan network is the same as an Elman network, but the context units are fed from the output layer instead of the hidden layer.

Approaches To Time Series Analysis

Let us assume a dataset with a mixture of continuous and categorical columns, and suppose we have to forecast a continuous column named 'value'.
Let the dataset have 100 columns named 'col1', 'col2', 'col3', ... 'col100'. Along with this, let there be a categorical column 'cat' with ten different categories.
Let us assume that each unique item in the 'cat' column represents a unique sensor. So there are ten sensors, and we are getting data from all ten sensors in real time, producing data for the other 100 columns.
This is now a multivariate time series problem, and we have to forecast values for the 'value' column. The later sections describe various approaches to a dataset of this kind.
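A dataset of this shape can be mocked up as follows; all values and sensor names here are synthetic placeholders, just to fix the layout in mind:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 1000

# 100 continuous feature columns col1..col100, plus the continuous target 'value'
data = {f"col{i}": rng.normal(size=n_rows) for i in range(1, 101)}
data["value"] = rng.normal(size=n_rows)
# 'cat' identifies which of the ten sensors produced each row
data["cat"] = rng.choice([f"sensor{j}" for j in range(1, 11)], size=n_rows)

# Time-indexed frame: one reading per minute (an assumption for illustration)
df = pd.DataFrame(data, index=pd.date_range("2018-01-01", periods=n_rows, freq="min"))
print(df.shape, df["cat"].nunique())  # (1000, 102) 10
```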
Questions to be asked about Data
  • How fast are we getting the data? (once in a second, minute, hour).
  • Are all the sensors giving data at the same time or different sensors giving data at different times?
  • Are the values of the sensors related or independent?
  • To predict a field's future value, should we consider all the past data, or is the latest subset enough? If just a subset, how much data is enough for good future predictions?

Approach 1

Univariate Model

In this approach, we just consider the time and the field which we are forecasting.
Before modelling, we have to do some data cleaning and transformation. Data cleaning includes missing value imputation, outlier detection, and replacement.
Data transformation is required to convert non-stationary time series to stationary time series.
A stationary time series is one whose statistical measures, such as the mean, variance, and autocorrelation, remain constant over time.
The data shouldn't be heteroscedastic; that is, at different intervals of time, these statistical measures are expected to stay constant.
In general, the data won’t be this way, so data transformation techniques like differencing, log transformation, etc. can be used to get the time series stationary.
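For instance, a series with a multiplicative trend can be made approximately stationary by a log transformation followed by first-order differencing. The series below is synthetic:

```python
import numpy as np

# A hypothetical series with an exponential (multiplicative) trend
t = np.arange(100)
series = np.exp(0.05 * t) * (1 + 0.01 * np.sin(t))

# The log transform turns the multiplicative trend into a linear one,
# and differencing then removes that linear trend
log_series = np.log(series)
diffed = np.diff(log_series)  # first-order differencing

# The mean of the differenced logs recovers the growth rate (~0.05),
# and the differenced series no longer trends
print(round(diffed.mean(), 3))  # 0.05
```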

How to determine if Time Series is Stationary?

The two things that we can visualize are trend and seasonality. A mean that stays the same as time varies denotes the absence of trend, while repeating variations at specific time intervals denote seasonality.
To capture the trend, we can compute a rolling mean of the data and see how it varies with time. The rolling mean at a point is the average of the previous few values of the series.
For example, to get the rolling mean at time n with a window of 10, we average the time series values from n-10 to n-1. If the rolling mean stays constant, we can say that the trend is in consensus with stationarity.
To eliminate trend, we can use a log transformation. Along with the rolling mean, we can also track the rolling standard deviation. To test the stationarity of a time series formally, we can use the Dickey-Fuller test.
Another method that can be used to eliminate both trend and seasonality is Differencing.
Differencing is simply taking the difference between present value and the previous value. This is also called first-order differencing.
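A minimal sketch of these checks with pandas, on a synthetic trending series (if statsmodels is available, `statsmodels.tsa.stattools.adfuller` provides the Dickey-Fuller test itself):

```python
import numpy as np
import pandas as pd

# Hypothetical series: upward linear trend plus noise (non-stationary in the mean)
rng = np.random.default_rng(1)
s = pd.Series(0.5 * np.arange(200) + rng.normal(scale=2.0, size=200))

# Rolling mean and std over a window of 10 observations: a drifting rolling
# mean signals a trend, a roughly flat one is consistent with stationarity
roll_mean = s.rolling(window=10).mean()
roll_std = s.rolling(window=10).std()

# First-order differencing removes the linear trend; the mean of the
# differences is roughly the trend slope (0.5 here)
diffed = s.diff().dropna()
print(round(float(diffed.mean()), 1))
```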

Time Series Forecasting

We could use ARIMA (Autoregressive integrated moving average) for forecasting.
ARIMA has two parameters, 'p' and 'q': 'p' for the AR part and 'q' for the MA part. 'p' is the number of lags of the dependent variable that are considered. If 'p' is 10, then for predicting the value at time t we use the values at t-1 to t-10 as predictors.
'q' is similar to 'p'; the only difference is that instead of the past values, we take the past forecast errors as predictors. We can model the AR and MA parts separately and combine them for forecasting. After the modelling is finished, the values are back-transformed, or inversely transformed, to bring the predictions back to the original scale.
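To make the AR part concrete, here is a minimal sketch that fits an AR(p) model by ordinary least squares in plain numpy. This is only the AR component; a full ARIMA, including the MA terms and differencing, would normally come from a library such as statsmodels:

```python
import numpy as np

def fit_ar(series, p):
    """Fit an AR(p) model by ordinary least squares.

    Returns (intercept, coefs) so that
    y[t] ~ intercept + sum(coefs[i] * y[t-1-i] for i in range(p)).
    """
    y = series[p:]
    # Each row of X holds the p previous values for one target observation
    X = np.column_stack([series[p - 1 - i: len(series) - 1 - i] for i in range(p)])
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0], beta[1:]

def forecast_next(series, intercept, coefs):
    # One-step-ahead forecast from the most recent p values
    p = len(coefs)
    lags = series[-1: -p - 1: -1]  # most recent value first
    return intercept + coefs @ lags

# Simulate an AR(2) process with known coefficients and recover them
rng = np.random.default_rng(0)
n = 5000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

c, coefs = fit_ar(y, 2)
print(np.round(coefs, 1))  # roughly [0.6, -0.3]
```

Calling `forecast_next(y, c, coefs)` then gives the prediction for the next time step; looping, appending each forecast, repeats this for longer horizons.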
In our case, the data has 100 columns with the mixture of both continuous and categorical values and the data is collected from multiple sensors at different times.
So, using the Univariate analysis to forecast value column may not be sufficient for reasonable predictions.
However, if there are no good relations between dependent and all independent variables and each sensor data is independent of others, we could still consider using univariate analysis for each sensor separately.

Approach 2

Multivariate Model using VAR

In a Multivariate model, we consider multiple columns to have the interdependencies intact and predict the values of the required field.
VAR (Vector Autoregression) is a statistical technique, a stochastic process model, that considers all the interdependencies among the various time series fields in the data.
VAR is the generalization of the univariate AR model. Each variable in the model uses its own lagged values to explain the predictions, and along with this it also considers the lagged values of the other model variables.
VAR models are tuned with the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC); these criteria are used to select the lag order that best fits the data.
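A minimal sketch of the idea, fitting a VAR(1) by least squares on two interdependent synthetic series (the coefficient matrix is made up; in practice one would use a library VAR implementation with AIC/BIC-based lag selection rather than this hand-rolled version):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
A = np.array([[0.5, 0.2],   # each variable depends on the lagged values
              [0.1, 0.4]])  # of itself *and* of the other variable
Y = np.zeros((n, 2))
for t in range(1, n):
    Y[t] = A @ Y[t - 1] + rng.normal(size=2, scale=0.5)

# Least-squares estimate of the coefficient matrix: Y[t] ~ A_hat @ Y[t-1].
# Stacking rows, Y[1:] = Y[:-1] @ A.T, so lstsq recovers A.T.
X, Z = Y[:-1], Y[1:]
A_hat = np.linalg.lstsq(X, Z, rcond=None)[0].T
print(np.round(A_hat, 1))  # close to A

# One-step-ahead forecast for both series at once
next_step = A_hat @ Y[-1]
```

The off-diagonal entries of `A_hat` are exactly the interdependencies a univariate AR model would miss.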

Approach 3

Multivariate Time Series Forecasting Using Deep Learning Keras with Tensorflow Backend

We could use deep learning techniques for time series forecasting. In recent years, sequential models with LSTM (long short-term memory) units have been used for time series forecasting.
LSTMs are a type of recurrent neural network (RNN). Recurrent neural networks consider the dependency of the sequence in which the data is input.
Besides this, LSTM has the capability of handling huge amounts of data. Deep learning models also support learning in batches, and saving and reusing models with new data to continue the training process.
To implement an LSTM for multivariate data, we should convert the time series problem into a supervised learning problem.
In this approach, to predict the future value of a given field, we require the future values of the other variables too. So, in a way, we have to predict all the other values before predicting the required field.
We could employ many methods for this. One of them is, to predict the next value of an independent variable, to take the average of its previous few values.
To predict 10 future values, we could loop over this process ten times. Another technique is to fit a univariate time series model for each variable; once we have future data for all the independent variables, we can use the LSTM for training.
Besides this, we can use this approach to train a huge dataset in batches, and tune the number of hidden layers and other parameters during model fitting to get the best model.
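The windowing step described above, converting a multivariate series into supervised (X, y) pairs, can be sketched in plain numpy; the resulting 3-D array matches the (samples, timesteps, features) input shape a Keras LSTM expects. The data here is a hypothetical placeholder:

```python
import numpy as np

def series_to_supervised(data, n_lags):
    """Turn a (time, features) array into (X, y) for supervised learning.

    X[i] holds n_lags consecutive rows of all features; y[i] is the target
    (here: the first column) at the next time step. X has shape
    (samples, timesteps, features), as Keras LSTM layers expect.
    """
    X, y = [], []
    for t in range(n_lags, len(data)):
        X.append(data[t - n_lags:t])
        y.append(data[t, 0])
    return np.array(X), np.array(y)

# Hypothetical data: 100 time steps of 5 features, with 'value' in column 0
data = np.arange(500, dtype=float).reshape(100, 5)
X, y = series_to_supervised(data, n_lags=3)
print(X.shape, y.shape)  # (97, 3, 5) (97,)
```

A Keras model would then be trained with `model.fit(X, y, batch_size=..., epochs=...)`, which is where the batch learning mentioned above comes in.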


Monday 16 April 2018

Continuous Delivery Pipeline for Deploying PHP Laravel Application on Kubernetes


Overview

Running containers at any real-world scale requires a container orchestration and scheduling platform like Docker Swarm, Apache Mesos, or AWS ECS, but the most popular of these is Kubernetes. Kubernetes is an open source system for automating the deployment and management of containerized applications.
In this post, we'll share the process of developing and deploying a microservices-based PHP Laravel application on a container environment, Docker and Kubernetes, and adopting DevOps in existing PHP applications.

Prerequisites For Deploying Laravel Application on Kubernetes

To follow this guide you need -
  • Kubernetes

It is an open source platform that automates container operations, and Minikube is best for testing Kubernetes.
  • Kubectl

Kubectl is the command line interface to manage a Kubernetes cluster either remotely or locally. To configure kubectl on your machine, follow this link.
  • Shared Persistent Storage

Shared Persistent Storage is permanent storage that we can attach to a Kubernetes container so that we don't lose our data even when the container dies. We will be using GlusterFS as the persistent data store for Kubernetes container applications.
  • PHP Application Source Code

Application Source Code is source code that we want to run inside a Kubernetes container.
  • DockerFile

Dockerfile contains a bunch of commands to build PHP Laravel application.
  • Container Registry

The registry is an online store for container images.
A few of the most popular registries are Docker Hub, AWS ECR, Google Container Registry, and private Docker registries.

Writing a Dockerfile

The below-mentioned code is a sample Dockerfile for PHP Laravel applications. It uses Ubuntu 14.04 as the base image, installs Apache2 and PHP5, installs Composer, and creates a fresh Laravel project.
FROM ubuntu:14.04
MAINTAINER Xenonstack
# Installing PHP5 and Apache2
RUN apt-get update \
&& apt-get -y install apache2 php5 libapache2-mod-php5 php5-mcrypt php5-json curl git \
&& apt-get clean \
&& update-rc.d apache2 defaults \
&& php5enmod mcrypt \
&& rm -rf /var/www/html \
&& rm -r /var/lib/apt/lists/*
# Installing Composer
RUN curl -sS https://getcomposer.org/installer | php \
&& mv composer.phar /usr/local/bin/composer
# Adding Laravel configurations for apache2
COPY laravel.conf /etc/apache2/sites-available/000-default.conf
# Setting Working Directory
WORKDIR /var/www
# Creating PHP laravel project
RUN composer create-project --prefer-dist laravel/laravel laravel \
&& php laravel/artisan key:generate \
&& chown www-data:www-data -R laravel/storage
# Expose Apache2 Port
EXPOSE 80
# Persistent Data
VOLUME ["/var/www"]
# Starting Apache2 Web Server
CMD /usr/sbin/apache2ctl -D FOREGROUND
Below mentioned is the sample Apache2 config file for Laravel application.
Create a file name laravel.conf and add the below mentioned code to it.
<VirtualHost *:80>
ServerName localhost
DocumentRoot /var/www/laravel/public
<Directory /var/www/laravel>
AllowOverride All
</Directory>
</VirtualHost>

Building PHP Laravel Application Image

The below-mentioned command will build your application container image.
$ docker build -t <name of your Laravel application>:<version of application> .

Publishing PHP Laravel Application Container Image

Now we publish our PHP Laravel application container image to a container registry such as Docker Hub, AWS ECR, Google Container Registry, or a private Docker registry.
Today I will be using the Azure Container Registry for publishing container images.
You also need to sign up to the Azure Cloud Platform and then create a container registry by using the link.
Now use this link to pull from and push to the Azure Container Registry.
Similarly, we can push or pull container images to or from any of the other registries mentioned above: Docker Hub, AWS ECR, Google Container Registry, a private Docker registry, etc.

Creating Deployment Files for Kubernetes

We can deploy an application on Kubernetes with ease using deployment and service files, written in either JSON or YAML format.
  • Deployment File
Following content is for the “<name of application>.deployment.yml” file of the PHP Laravel container application.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: <name of application>
  namespace: <namespace of Kubernetes>
spec:
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: <name of application>
    spec:
      containers:
        - name: <name of application>
          image: <image name>:<version tag>
          imagePullPolicy: "IfNotPresent"
          ports:
            - containerPort: 80
          volumeMounts:
            - mountPath: /var/www
              name: <name of application>
      volumes:
        - name: <name of application>
          emptyDir: {}
  • Service File
Following content is for the “<name of application>.service.yml” file of the PHP Laravel container application.
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: <name of application>
  name: <name of application>
  namespace: <namespace of Kubernetes>
spec:
  type: NodePort
  ports:
    - port: 80
  selector:
    k8s-app: <name of application>
