It is not easy to distinguish which
data is directly relevant for the prediction of the concentration. Actually,
the number of features effecting the concentration is of interest.
In machine learning ‘No Free lunch’
theorem states that there is no such kind of generic algorithm that works best
for all cases specially when it is related to supervised learning5. For
example, one cannot say that in case of regression problem decision trees are
always better than the Neural Networks or vise-versa. There are many factors to
deal with, such as the size of the data set etc. As a result, different algorithms
are applied on the given problem and then evaluate the accuracy on each model.
This helps us finding the model with the better accuracy.
In this case, the prediction of
Anti-freeze concentration lies in the regression problem task. Regression is
the supervised learning task to predict the continuous numeric variables. There are many algorithms to deal with the
regression problem, i.e. Linear Regression, decision trees, nearest neighbors
and deep neural networks.
In this thesis the problem in
tackled with the deep models, the reason for selecting neural networks is that
it requires less formal statistical training. Moreover, its ability to discreetly
detect complex non-linear associations between independent and dependent
variables. Deep models have an ability
to detect all possible interactions between predictor variables and offers with
a variety of training algorithms.
ANNs can learn and model non-linear
and complex relationships, which is very handy because in anti-freeze
concentration case, links between inputs and outputs variables are non-linear
as well as complex. After learning from the initial inputs and their
relationships, ANNs can conclude unseen relationships on new data as well, thus
this makes the model to generalize and predict on unobserved data. As compared
to many other prediction techniques, ANN does not impose any restrictions on
the input variables (like how they should be distributing). Additionally, many
studies have shown that ANNs can better model heteroscedasticity i.e. data with
high volatility and non-constant variance, given its ability to learn hidden
relationships in the data without imposing any fixed connections in the data