Introduction

Nowadays breast cancer is common in women, and predicting breast cancer is as
important as treating it. Breast cancer is one of the leading causes of cancer
death among women. If breast cancer is predicted at an early stage, better
treatment can be provided, which improves the patient's chances of survival.
Diagnosis and treatment of breast cancer have therefore become urgent tasks.
Different data mining methods are used to retrieve valuable information from
large databases in order to support decisions that provide better health
services.

Breast cancer begins with the abnormal growth of some breast cells. These cells
divide more rapidly than healthy cells and continue to accumulate, forming a
lump or mass. The cells may spread through the breast to the lymph nodes or to
other parts of the body. Breast cancer also varies by age group: it is less
common at a young age (i.e., in women in their thirties), but younger women
tend to have more aggressive breast cancers than older women.

In this paper we compare different classification and clustering algorithms
for predicting breast cancer. A number of attributes are used in the
comparison; they include……………………………. These attributes are compared to
find the best classification algorithm.
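The evaluation attributes are left unspecified above. Purely as an illustration, the sketch below assumes a few attributes that are common when comparing medical classifiers (accuracy, sensitivity, specificity and precision), computed from binary predictions where 1 marks the positive class (e.g. malignant or recurrent). The function names are hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for 0/1 labels (1 = positive class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        else:  # t == 1 and p == 0
            counts["fn"] += 1
    return counts


def _safe_div(num: int, den: int) -> float:
    # Return 0.0 instead of raising when a denominator is empty (e.g. no positives)
    return num / den if den else 0.0


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute several comparison attributes from one set of predictions."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    return {
        "accuracy": _safe_div(c["tp"] + c["tn"], total),
        "sensitivity": _safe_div(c["tp"], c["tp"] + c["fn"]),  # recall on positives
        "specificity": _safe_div(c["tn"], c["tn"] + c["fp"]),  # recall on negatives
        "precision": _safe_div(c["tp"], c["tp"] + c["fp"]),
    }
```

For example, `evaluate([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])` gives an accuracy of 0.6 and a specificity of 0.5.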

Literature survey

In paper 1, three different data mining classification methods are used to
predict breast cancer, and they are compared on several parameters. For
superior prediction, the focus is on accuracy and lowest computing time. The
authors filtered all algorithms on these two criteria and concluded that Naïve
Bayes is superior to the other two algorithms because it takes the least time
(0.02 seconds) while also providing the highest accuracy. As future work, they
plan to compare these results with other supervised and unsupervised methods.
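Paper 1 ranks algorithms by accuracy and computing time. A minimal plain-Python sketch of that procedure, under stated assumptions: a from-scratch Gaussian Naive Bayes (features treated as independent and normally distributed within each class), a hypothetical `score_and_time` helper that measures fit-plus-predict wall-clock time, and a hypothetical `pick_best` rule that compares accuracy first and uses time only to break ties. Paper 1 does not specify how it combines the two criteria.

```python
import math
import time
from collections import defaultdict
from typing import List, Sequence, Tuple


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: features assumed independent and normal within each class."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "GaussianNaiveBayes":
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        n = len(y)
        self.params = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # a small epsilon keeps the variance positive for constant features
            variances = [
                sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                for c, m in zip(cols, means)
            ]
            self.params[label] = (math.log(len(rows) / n), means, variances)
        return self

    def _log_joint(self, row: Sequence[float], label: int) -> float:
        # unnormalised log posterior: log prior plus per-feature Gaussian log-likelihoods
        log_prior, means, variances = self.params[label]
        total = log_prior
        for x, m, v in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        # choose the class with the highest unnormalised log posterior for each row
        return [max(self.params, key=lambda lbl: self._log_joint(row, lbl)) for row in X]


def score_and_time(model, X_train, y_train, X_test, y_test) -> Tuple[float, float]:
    """Return (accuracy, seconds) for fitting on the training split and predicting the test split."""
    start = time.perf_counter()
    preds = model.fit(X_train, y_train).predict(X_test)
    elapsed = time.perf_counter() - start
    accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
    return accuracy, elapsed


def pick_best(results: List[Tuple[str, float, float]]) -> str:
    """results holds (name, accuracy, seconds); highest accuracy wins, lower time breaks ties."""
    return max(results, key=lambda r: (r[1], -r[2]))[0]
```

Naive Bayes needs only one pass over the training data to estimate a prior, a mean and a variance per class and feature, which is one plausible reason it trains quickly.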

In paper 2, the WPBC dataset is used to find an efficient predictor algorithm
for the recurrent or non-recurrent nature of the disease. This helps
oncologists distinguish a good prognosis (non-recurrent) from a bad one
(recurrent) and treat patients more effectively. Eight popular data mining
methods are used: four clustering algorithms (K-means, EM, PAM and fuzzy
c-means) and four classification algorithms (SVM, C5.0, KNN and Naive Bayes).
The results of each algorithm are reported in the paper. The classification
algorithms C5.0 and SVM achieved 81% accuracy in classifying the recurrence of
the disease, the best among all eight methods. On the other hand, EM was found
to be the most promising clustering algorithm, with an accuracy of 68%. The
research shows that the classification algorithms are better predictors than
the clustering algorithms. The impact of the various parameters responsible
for predicting the occurrence or non-occurrence of the disease can be verified
clinically. Further, the identified critical parameters should be verified on
a larger medical dataset to predict the recurrence of the disease in future.
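Clustering algorithms never see the class labels, so before an accuracy can be reported for them, each cluster has to be matched to a class. Paper 2 does not say how it does this; the sketch below assumes one common convention, mapping each cluster to its majority label, and pairs it with a plain Lloyd's k-means whose centroids start at the first k rows (an assumption chosen only for determinism). EM, PAM and fuzzy c-means are not shown, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def kmeans(X: Sequence[Sequence[float]], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means (iters >= 1); returns a cluster index per row. Centroids start at the first k rows."""
    centroids = [list(p) for p in X[:k]]
    assign = None
    for _ in range(iters):
        # assignment step: each row joins its nearest centroid (Euclidean distance)
        new_assign = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in X]
        if new_assign == assign:
            break  # converged: no row changed cluster
        assign = new_assign
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, a in zip(X, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def cluster_accuracy(assign: Sequence[int], y_true: Sequence[int]) -> float:
    """Map each cluster to its majority true label, then score it like a classifier."""
    majority = {}
    for c in set(assign):
        labels = [t for a, t in zip(assign, y_true) if a == c]
        majority[c] = Counter(labels).most_common(1)[0][0]
    correct = sum(majority[a] == t for a, t in zip(assign, y_true))
    return correct / len(y_true)
```

Because the labels are used only after clustering, nothing pushes the clusters to align with the recurrent/non-recurrent split, which is one plausible reason clustering accuracy can trail supervised classifiers.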

In paper 3, the authors build a diagnostic model for breast cancer that
searches for the relationship between breast cancer and its symptoms. A
feature selection method, INTERACT, is applied to select relevant and
important features in order to improve the accuracy of the diagnostic model,
and SVM is applied to build the classification model. Two diagnostic models
are built, with and without feature selection, to demonstrate the significance
of feature selection. In the experiments, the model with feature selection is
clearly more accurate than the model without it.

In addition, nine features are selected as the relevant factors for building
the diagnostic model. The information found in this study can serve as
supplementary information to help practitioners diagnose breast cancer more
accurately.
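INTERACT itself is a consistency-based selector and is not reproduced here. As a stand-in that shows only the general filter idea (score each feature against the label, keep the best ones, retrain on the reduced set), the sketch below swaps in a much simpler criterion: the absolute Pearson correlation of each feature with the 0/1 label. `select_top_k` and `project` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_top_k(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Return the (sorted) indices of the k features most correlated, in absolute value, with y."""
    cols = list(zip(*X))
    scores = [abs(pearson(col, y)) for col in cols]
    ranked = sorted(range(len(cols)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X: Sequence[Sequence[float]], idx: Sequence[int]) -> List[List[float]]:
    """Keep only the selected feature columns of each row."""
    return [[row[j] for j in idx] for row in X]
```

The with/without comparison in paper 3 then amounts to training the same classifier on `X` and on `project(X, select_top_k(X, y, k))` and comparing the two accuracies.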

In paper 4, the authors focus on the importance of feature selection in breast
cancer prognosis. With a proper attribute selection technique, any
classification algorithm can be improved significantly, because attributes
that contribute little to the dataset often mislead the classifier and result
in poor prediction. In their work, Support Vector Machine gave much better
output both before and after attribute selection. Area under the ROC curve
analysis supported these findings, and Naïve Bayes and Decision Tree showed
much larger improvements after feature selection. As future work, the authors
plan to evaluate newer algorithms with better feature selection techniques.
The paper only addresses whether breast cancer is recurrent or not; beyond
this, they plan to predict the time of recurrence for cancers classified as
recurrent.

Paper 5 presents a survey of classification techniques that can be used for
breast cancer detection with the WEKA tool. It discusses a variety of existing
classification techniques and lists their reported accuracies, which can be
used to decide which algorithm in WEKA is best for breast cancer detection.
Comparing the algorithms, it finds that SVM has the highest accuracy and
expectation maximization the lowest.

Paper 6 likewise presents a survey of classification techniques that can be
used for breast cancer detection with the WEKA tool, listing the performance
accuracy of existing techniques to help decide which algorithm is best for
breast cancer detection.
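Paper 4 uses area under the ROC curve (AUC) alongside accuracy. AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (the normalised Mann-Whitney U statistic), with ties counting one half. A minimal quadratic-time sketch, assuming 1 marks the recurrent class and `scores` are any real-valued classifier outputs such as predicted probabilities; `roc_auc` is a hypothetical name.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # compare every positive/negative pair (O(n_pos * n_neg), fine for small datasets)
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75. Computing it for the same classifier before and after feature selection gives the kind of comparison paper 4 describes.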