It is not easy to distinguish which data are directly relevant for predicting the concentration; indeed, the number of features affecting the concentration is itself of interest.

In machine learning, the 'No Free Lunch' theorem states that no single algorithm works best for all problems, especially in supervised learning [5]. For example, one cannot say that, for a regression problem, decision trees are always better than neural networks, or vice versa. There are many factors to consider, such as the size of the data set. As a result, several different algorithms are applied to the given problem and the accuracy of each model is evaluated; this helps identify the model with the best accuracy.

In this case, the prediction of anti-freeze concentration is a regression task. Regression is the supervised learning task of predicting continuous numeric variables. Many algorithms can address a regression problem, e.g. linear regression, decision trees, nearest neighbours, and deep neural networks.
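The comparison described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn is available; the synthetic data set below is only a stand-in for the real anti-freeze measurements, and the feature count and noise level are invented for the example.

```python
# Hedged sketch: evaluate several regression algorithms on the same data,
# as the No-Free-Lunch argument suggests, and compare their accuracy.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 4))            # 4 hypothetical features
# Synthetic target with a non-linear dependence on two of the features
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=300)

models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "k-nearest neighbours": KNeighborsRegressor(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    # 5-fold cross-validated R^2: higher is better
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean R^2 = {scores[name]:.3f}")
```

On this particular synthetic target the non-linear models tend to score higher than the linear one, which is exactly the kind of outcome such a comparison is meant to reveal.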

In this thesis the problem is tackled with deep models. Neural networks were selected because they require less formal statistical training and are able to implicitly detect complex non-linear associations between independent and dependent variables. Moreover, deep models can detect all possible interactions between predictor variables and offer a variety of training algorithms.

ANNs can learn and model non-linear and complex relationships, which is very useful here because, in the anti-freeze concentration case, the links between input and output variables are both non-linear and complex. After learning from the initial inputs and their relationships, ANNs can infer unseen relationships from new data as well, which allows the model to generalize and predict on unobserved data. Compared with many other prediction techniques, an ANN imposes no restrictions on the input variables (for example, on how they should be distributed). Additionally, many studies have shown that ANNs can better model heteroscedastic data, i.e. data with high volatility and non-constant variance, owing to their ability to learn hidden relationships without imposing any fixed assumptions on the data set [6].
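The idea of a neural network learning a non-linear input-output mapping and then generalizing to unseen data can be sketched as follows. This is only an illustration, assuming scikit-learn's MLPRegressor as a small stand-in for the deep models used in the thesis; the data, network size, and solver are all invented for the example.

```python
# Hedged sketch: a small feed-forward neural network fitting a
# non-linear relationship, then scored on held-out (unobserved) data.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 2))
y = np.sin(3 * X[:, 0]) * X[:, 1]        # non-linear interaction of inputs

net = MLPRegressor(hidden_layer_sizes=(32, 32), activation="relu",
                   solver="lbfgs", max_iter=5000, random_state=1)
net.fit(X[:400], y[:400])                # train on the first 400 rows
r2 = net.score(X[400:], y[400:])         # R^2 on the held-out 100 rows
print(f"held-out R^2 = {r2:.3f}")
```

A purely linear model could not represent this interaction term at all, whereas the hidden layers let the network approximate it without the relationship being specified in advance, which mirrors the generalization argument above.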