MINING HIGH UTILITY ITEMSETS USING CHUD AND DAHU

 

Abstract:

Mining high utility itemsets from transactional databases is an important data
mining task. It refers to the discovery of itemsets with high utility (e.g.,
high profit). Few works have addressed the fact that existing techniques may
present too many high utility itemsets to a user, which also degrades the
efficiency of the mining process. To achieve high efficiency for the mining
task and to provide a concise mining result to users, this paper proposes a
novel framework for mining closed+ high utility itemsets, which serve as a
compact and lossless representation of high utility itemsets. An efficient
algorithm called CHUD (Closed+ High Utility itemset Discovery) is proposed for
mining closed+ high utility itemsets. Further, a method called DAHU (Derive
All High Utility itemsets) is proposed to recover all high utility itemsets
from the set of closed+ high utility itemsets without accessing the original
database. Experimental results on real and synthetic datasets show that CHUD
and DAHU are very efficient and achieve a massive reduction (up to 800 times
in the experiments) in the number of high utility itemsets. In addition, when
all high utility itemsets are recovered by DAHU, the combination of CHUD and
DAHU also outperforms the state-of-the-art algorithms for mining high utility
itemsets.

 

 

Keywords: High Utility Mining Dataset, Data streams

 

1. Introduction

Mining association rules to discover relationships between items in large
databases is a well-studied technique in the data mining field, with typical
methods such as Apriori [1, 2]. The problem of mining association rules can be
broken down into two steps. The first step finds all frequent itemsets (also
called large itemsets) in the database. Once the frequent itemsets are found,
generating association rules is straightforward and can be done in linear time.
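
The two steps can be sketched in a few lines of Python. This is a minimal
illustration of a level-wise, Apriori-style search followed by rule generation,
not the exact procedure of [1, 2]; the function names `frequent_itemsets` and
`association_rules` and the use of absolute support counts are assumptions made
for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search; returns {frozenset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates and keep only
        # those whose (k-1)-subsets are all frequent (downward closure).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # One pass over the database to count each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Step 2: derive rules (antecedent, consequent, confidence)."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support count is already available.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```

Note that step 2 never touches the database again: it only reads the support
counts collected in step 1.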

An important research topic that grew out of association rule mining is the
discovery of temporal association patterns in data streams, motivated by its
wide range of applications in different domains. Temporal data mining can be
defined as the activity of looking for interesting correlations or patterns in
large sets of temporal data accumulated for other purposes [6]. For a database
with a specified transaction window size, an algorithm such as Apriori can be
used to obtain the frequent itemsets. For time-variant data streams, there is a
strong demand for efficient and effective methods to mine various temporal
patterns [11]. However, most methods designed for traditional databases cannot
be applied directly to mining temporal patterns in data streams because of
their high complexity. In many applications, we would like to mine temporal
association patterns in data streams over a certain amount of the most recent
data. That is, temporal data mining has to include new data (i.e., data in the
new hour) and also remove old data (i.e., data in the most obsolete hour) from
the mining process. Without loss of generality, consider a typical market.
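
The add-new, drop-oldest behaviour described above can be illustrated with a
small sliding window over hourly batches. This is a hedged sketch only: the
class name `SlidingWindowCounts`, the per-item counting, and the batch
granularity are assumptions for the example, not part of any surveyed
algorithm.

```python
from collections import Counter, deque


class SlidingWindowCounts:
    """Item counts over the most recent `window_size` batches (e.g. hours)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.batches = deque()   # one Counter per batch, oldest on the left
        self.counts = Counter()  # item counts over the whole window

    def add_batch(self, transactions):
        # Summarise the new batch: each item counts once per transaction.
        batch = Counter()
        for t in transactions:
            batch.update(set(t))
        self.batches.append(batch)
        self.counts.update(batch)
        # Once the window is full, remove the contribution of the oldest batch.
        if len(self.batches) > self.window_size:
            oldest = self.batches.popleft()
            for item, n in oldest.items():
                self.counts[item] -= n
                if self.counts[item] == 0:
                    del self.counts[item]
```

Keeping one summary per batch means the oldest hour can be discarded without
rescanning the transactions that remain in the window.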

 

 

2. Literature Survey

W. Wang et al., in “Efficient mining of weighted association rules (WAR)” [1],
proposed weighted association rules. In this approach, the frequent itemsets
are found first, and the weighted association rules are then generated for each
frequent itemset. Weighted association rule mining was the first to propose the
concepts of weighted items and weighted association rules. However, because
weighted association rules do not satisfy the downward closure property, mining
performance cannot be improved by pruning. By using transaction weights,
weighted support can reflect the importance of an itemset while also
maintaining the downward closure property during the mining process.

In “Fast algorithms for mining association rules”, R. Agrawal [2] proposed the
Apriori algorithm, which obtains frequent itemsets from a database. The
association rule mining problem is to generate all association rules whose
support and confidence are greater than the user-specified minimum support and
minimum confidence. Apriori is a classic algorithm for frequent itemset mining
and association rule learning over transactional databases. When the large
itemsets are identified, only itemsets whose support is greater than the
minimum support are retained. Apriori generates a large number of candidate
itemsets and scans the database on every pass. When a new transaction is added
to the database, the entire database must be rechecked. Candidate itemsets are
stored in a hash tree, whose nodes contain either a list of itemsets or a hash
table.

Utility mining finds all itemsets whose utility values exceed a user-specified
threshold.
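
As a concrete illustration, the sketch below computes itemset utilities by
brute force. It assumes the definition commonly used in the high utility
mining literature, which the text above does not spell out: the utility of an
itemset in a transaction is the sum of quantity times unit profit over its
items, and its overall utility is the sum over all transactions that contain
it. The function names, the toy data, and the use of `>=` for the threshold
test are assumptions for the example.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, profit):
    """Sum of quantity * unit profit over transactions containing the itemset."""
    total = 0
    for t in transactions:  # t maps item -> purchased quantity
        if all(i in t for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total


def high_utility_itemsets(transactions, profit, min_utility):
    """Enumerate every itemset and keep those reaching the threshold.

    Exponential in the number of items; for illustration only.
    """
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = itemset_utility(itemset, transactions, profit)
            if u >= min_utility:  # assumed convention: at or above threshold
                result[itemset] = u
    return result


# Hypothetical toy data: unit profits and three transactions with quantities.
profit = {"a": 5, "b": 1, "c": 2}
transactions = [
    {"a": 1, "b": 2},
    {"b": 4, "c": 3},
    {"a": 2, "b": 1, "c": 1},
]
print(high_utility_itemsets(transactions, profit, 14))
```

On this toy data the item b alone has utility 7 and is not high utility, while
the larger itemset {a, b} has utility 18 and is. Utility therefore does not
obey the downward closure property that Apriori relies on, which is why
support-based pruning cannot be reused directly.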
Liu et al., in “A fast high utility itemsets mining algorithm” [3], proposed
the Two-Phase algorithm for finding high utility itemsets. Two-Phase
effectively cuts down the number of candidates and obtains the complete set of
high utility itemsets. It performs very efficiently in terms of both memory
cost and speed on synthetic and real databases, even large ones. As its name
indicates, the method works in two phases. However, Two-Phase focuses on
traditional databases and is not suited to data streams: it does not find
temporal high utility itemsets, and it must rescan the entire database whenever
new transactions arrive from the stream.

J. Hu et al., in “High-utility pattern mining: A method for discovery of
high-utility item sets” [4], define an algorithm that uses the idea of frequent
itemset mining to identify combinations of high utility items. The goal,
however, is to find segments of data defined by combinations of a few items
(i.e., rules), which differs from traditional association rule and frequent
itemset mining techniques. The problem considered in high utility pattern
mining differs from earlier approaches in that it conducts rule discovery with
respect to the overall criterion for the mined set as well as with respect to
individual attributes.

S. Shankar, in “A fast algorithm for mining high utility itemsets” [5],
presents a novel algorithm for Fast Utility Mining. To generate itemsets, it
uses techniques such as Low Utility and High Frequency (LUHF), Low Utility and
Low Frequency (LULF), High Utility and High Frequency (HUHF), and High Utility
and Low Frequency (HULF).

Cheng-Wei Wu et al., in “UP Growth: An Efficient Algorithm for High Utility
Itemsets Mining” [6], proposed an algorithm for efficiently discovering high
utility itemsets from transactional databases.
UP Growth is a tree-based algorithm: it first builds a global UP tree and then
generates the high utility itemsets from it.

J. Han et al. [7], in “Mining frequent patterns without candidate generation”,
proposed the frequent pattern tree (FP-tree) structure for storing compressed,
crucial information about frequent patterns, and developed an efficient
FP-tree-based mining method. The method builds a highly compact FP-tree, which
is usually substantially smaller than the original database, so that costly
database scans are avoided in the subsequent mining process. It also applies a
pattern growth approach that avoids costly candidate generation. However,
FP-growth is not suited to finding high utility itemsets.

H. F. Li et al., in “Fast and Memory Efficient Mining of High Utility Itemsets
in Data Streams” [8], proposed two efficient one-pass algorithms, MHUI-BIT and
MHUI-TID, for mining high utility itemsets from data streams within a
transaction-sensitive sliding window. To improve the efficiency of high utility
itemset mining, two effective representations of itemset information and an
extended lexicographical tree-based summary data structure were developed.

V. S. Tseng et al., in “Efficient Mining of Temporal High Utility Itemsets from
Data streams” [9], proposed temporal high utility itemset mining. With
THUI-Mine, temporal high utility itemsets can be discovered with fewer
candidate itemsets and higher performance. To generate a continuous set of
itemsets, THUI-Mine retains a filtering threshold in every partition. The two
drawbacks of the THUI-Mine algorithm are its large memory requirement and the
large number of false candidate itemsets it generates.
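
The notion of a false candidate can be made concrete with the
overestimate-then-verify pattern that the Two-Phase algorithm is known for.
The sketch below uses transaction-weighted utilization (TWU), an upper bound
on utility that the survey above does not name; it is included here as an
assumption to illustrate the idea and is not THUI-Mine itself. All function
names and the toy data are hypothetical.

```python
from itertools import combinations


def transaction_utility(t, profit):
    """Total utility of one transaction (t maps item -> quantity)."""
    return sum(q * profit[i] for i, q in t.items())


def twu(itemset, transactions, profit):
    """Upper bound: total utility of every transaction containing the itemset."""
    return sum(transaction_utility(t, profit)
               for t in transactions if all(i in t for i in itemset))


def exact_utility(itemset, transactions, profit):
    """Utility contributed by the itemset's own items only."""
    return sum(sum(t[i] * profit[i] for i in itemset)
               for t in transactions if all(i in t for i in itemset))


def two_phase_sketch(transactions, profit, min_utility):
    items = sorted({i for t in transactions for i in t})
    # Phase 1: level-wise search on TWU, which is anti-monotone, so a
    # candidate is only extended if all of its subsets were kept.
    level = [(i,) for i in items if twu((i,), transactions, profit) >= min_utility]
    candidates = list(level)
    while level:
        kept = set(level)
        k = len(level[0]) + 1
        units = sorted({i for s in level for i in s})
        level = [
            cand for cand in combinations(units, k)
            if all(sub in kept for sub in combinations(cand, k - 1))
            and twu(cand, transactions, profit) >= min_utility
        ]
        candidates.extend(level)
    # Phase 2: one more pass to compute exact utilities and drop the
    # overestimated ("false") candidates.
    exact = {c: exact_utility(c, transactions, profit) for c in candidates}
    high = {c: u for c, u in exact.items() if u >= min_utility}
    false_candidates = [c for c in candidates if c not in high]
    return high, false_candidates


profit = {"a": 5, "b": 1, "c": 2}
transactions = [
    {"a": 1, "b": 2},
    {"b": 4, "c": 3},
    {"a": 2, "b": 1, "c": 1},
]
high, false_candidates = two_phase_sketch(transactions, profit, 14)
print(high, false_candidates)
```

On this toy data phase 1 keeps five candidates, but only two of them turn out
to be high utility in phase 2. The other three are false candidates that had
to be stored and verified, which is the kind of overhead the drawback above
refers to.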