2. without candidate generation have been proposed which

2. Problems and Related Work

Problems of FPM Algorithms

In a data set, the itemsets that satisfy the user-defined support threshold are frequent; all others are infrequent. Frequent pattern mining (FPM) algorithms face the following problems:
1. In many real applications it is infeasible to find all frequent patterns, especially when the data set is very large or dense.
2. Choosing the threshold is difficult: a low threshold may produce a very large number of patterns, degrading the accuracy of mining, while a high threshold produces very few patterns and may leave out itemsets the user would consider frequent.
3. Algorithms based on candidate generation may generate a large number of candidates to produce the frequent patterns. This requires more space and more database scans, which makes the whole process expensive.
4. The major problems with such algorithms are multiple database scans and a large search space [6].

Related Work

To overcome the problems of previously proposed algorithms, many extensions have been made to reduce the number of database scans and the number of candidates generated. Aclose [10], CHARM [8], Cobbler [11], Carpenter [11], AFOPT [12], etc., are extensions of Apriori, which is a method based on candidate generation. Methods without candidate generation have also been proposed. FP-growth [20] is one such method and requires two database scans: in the first scan, items are arranged in descending order of frequency, and in the second scan, it builds the FP-tree from which all frequent itemsets are mined. The FP-tree is an advancement over the prefix tree: it merges nodes that share the same item along a common prefix, which compacts the data and increases mining speed. However, it requires a large amount of memory for sparsely populated data sets, where few paths are shared. Another method, ECLAT [7], uses the vertical data format rather than the horizontal data format. It has proven much more efficient than Apriori because it is based on Boolean power-set lattice theory, which requires less space to store information about the transactions.
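The vertical data format used by ECLAT can be illustrated with a minimal sketch: each item is mapped to its tidset (the set of IDs of the transactions containing it), the support of an itemset is the size of the intersection of its items' tidsets, and the search proceeds depth-first. The function names and the four toy transactions are hypothetical, chosen only for illustration; this is a textbook-style Eclat outline, not the implementation from [7].

```python
def to_vertical(transactions):
    """Convert horizontal data {tid: items} into vertical data {item: tidset}."""
    tidsets = {}
    for tid, items in transactions.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def eclat(prefix, candidates, min_support, result):
    """Depth-first Eclat: extend `prefix` by each candidate (item, tidset) pair.

    The support of an itemset is the size of its tidset, and the tidset of an
    extended itemset is the intersection of its parts' tidsets.
    """
    for i, (item, tids) in enumerate(candidates):
        if len(tids) < min_support:
            continue
        itemset = prefix + (item,)
        result[itemset] = len(tids)
        # Conditional candidates: later items whose intersected tidset stays frequent.
        extensions = []
        for other, other_tids in candidates[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_support:
                extensions.append((other, common))
        eclat(itemset, extensions, min_support, result)
    return result


# Hypothetical toy data: transaction ID -> set of items.
transactions = {1: {"a", "b", "c"}, 2: {"a", "c"}, 3: {"a", "d"}, 4: {"b", "c"}}
frequent = eclat((), sorted(to_vertical(transactions).items()), 2, {})
```

With a minimum support of 2, the sketch reports {a, c} and {b, c} (support 2) along with the frequent single items; {a, b} is never extended because its tidset {1} is too small.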
The refined solution adopted in our method is to derive frequent patterns from maximal frequent patterns (MFP). Many algorithms have been devised that generate frequent patterns from MFP and prune infrequent itemsets. They use two pruning techniques:
1. Subset pruning: all subsets of a frequent pattern are pruned, because they cannot be maximal frequent patterns.
2. Superset infrequency pruning: all supersets of an infrequent pattern are pruned, because they cannot be frequent patterns.

However, these algorithms still require two database scans, like the Pincer-Search algorithm [13], [4], which uses both top-down and bottom-up traversal to mine MFP. DepthProject is another method for mining MFP; it uses depth-first traversal [15] with both pruning techniques and visits nodes in lexicographic order. It is an efficient method for mining frequent patterns, and MAFIA [5] is its extension. The above methods use Rymon’s set enumeration [18], which avoids counting the support of all frequent patterns. Flex [20] is a method based on the vertical data format; it traverses in lexicographic order using a "test first, then generate" strategy, which ensures that every generated node is frequent. It is one of the most efficient methods for mining MFP, but its major drawback is that it needs a huge amount of memory to store information about the itemsets.

3. Proposed Method

3.1 Data Transformation Technique

Data pre-processing is an essential step of data mining, as shown in the figure. It comprises data cleaning, data reduction and data transformation [22]. In our method, data transformation is used to reduce the size of the data set significantly: the web log data set is transformed with prime-based compaction, and each complete transaction is transformed into a positive integer called its PMV (Prime Multiplied Value). For Prime Graph construction, a transaction is given as T = (Pid, Z), where Pid is the ID of the transaction and Z = {a_n, …, a_m} is its itemset.
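The transformation of a transaction T = (Pid, Z) into its PMV can be sketched in a few lines of Python. The rule for allotting primes is not stated explicitly; this sketch assumes primes are assigned in order of first appearance of each page, which reproduces the mapping of the Table 3.1 example (8→2, 5→3, 11→5, 20→7, 6→11, 9→13). The helper names (`first_primes`, `assign_primes`, `pmv`) are hypothetical, and transaction 4 is taken as pages 8, 11, 20, 6, which is what its transformed primes 2, 5, 7, 11 and value 770 imply.

```python
from math import prod


def first_primes(count):
    """Return the first `count` primes by trial division (adequate for small item sets)."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # Every prime below `candidate` is already in `primes`, so this is a primality test.
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def assign_primes(transactions):
    """Map each distinct item to a distinct prime in order of first appearance (assumed rule)."""
    order = list(dict.fromkeys(item for _, items in transactions for item in items))
    return dict(zip(order, first_primes(len(order))))


def pmv(items, prime_of):
    """Prime Multiplied Value of a transaction: the product of its items' primes."""
    return prod(prime_of[item] for item in items)


# Transactions (Pid, pages visited) from Table 3.1; transaction 4 read as pages 8, 11, 20, 6.
db = [(1, [8, 5, 11, 20, 6]), (2, [8, 5, 11, 20, 9]), (3, [8, 5, 6]), (4, [8, 11, 20, 6]),
      (5, [11, 20, 9]), (6, [8, 11, 20, 9]), (7, [8, 11, 20]), (8, [11, 20, 9])]
prime_of = assign_primes(db)
values = {pid: pmv(items, prime_of) for pid, items in db}
```

The sketch assumes each page appears at most once per transaction, so every PMV is a product of distinct primes and, by unique factorization, distinct itemsets always receive distinct PMVs.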
The Prime Multiplied Value of transaction Pid is computed with Equation 1, where each item a_i of Z is allotted a distinct prime P_r(a_i):

$$\mathrm{PMV}_{Pid} = \prod_{a_i \in Z} P_r(a_i) \qquad (1)$$

Membership of an item is then tested with mod(PMV, P_r), where P_r is the prime allotted to that item of Z:

$$\operatorname{mod}(\mathrm{PMV}_{Pid}, P_r) = 0 \iff \text{the item allotted } P_r \text{ belongs to } Z$$

With the help of Equation 1, each transaction can be transformed into a contracted form; in effect, data transformation yields an abstracted form of the transactions. This is illustrated by the example in Table 1, which shows eight transactions of website logins and the page numbers visited. The page numbers are transformed into prime numbers, and then the prime multiplied value is calculated. When this transformation is applied to real web log data, the result is a drastic compaction: it reduces the size of the data set by more than half. The process is independent of the size and type of the data set, so any data set can be reduced. For example, P = (4, {8, 7, 12, 11}) and P0 = (4, {8884, 990, 7123, 1234}) are transformed to the same value, 770, because only the primes allotted to the items enter the PMV, not the raw item labels.

Table 3.1 The transaction database DB and its Transaction Values (TV)

TID   Page No.           Transformed (primes)   TV (PMV)
1     8, 5, 11, 20, 6    2, 3, 5, 7, 11         2310
2     8, 5, 11, 20, 9    2, 3, 5, 7, 13         2730
3     8, 5, 6            2, 3, 11               66
4     8, 11, 20, 6       2, 5, 7, 11            770
5     11, 20, 9          5, 7, 13               455
6     8, 11, 20, 9       2, 5, 7, 13            910
7     8, 11, 20          2, 5, 7                70
8     11, 20, 9          5, 7, 13               455

3.2 Prime Graph Construction

A graph structure enhances mining performance through data compression and by reducing the search space with pruning techniques; thus graph structures have been considered a good option in previous data mining research. This research introduces a simple graph structure called the Prime Graph (Prime-number Compressed Graph), which uses prime number theory for the transformation.
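The mod(PMV, P_r) test extends from single items to whole itemsets, which is what makes the prime-number transformation useful for a graph-based miner: because every PMV is a product of distinct primes, an itemset occurs in a transaction exactly when the transaction's PMV is divisible by the itemset's PMV. A minimal sketch of divisibility-based support counting over the Transaction Values of Table 3.1 follows; the function names are hypothetical, and this illustrates the principle rather than the paper's mining procedure.

```python
def contains(transaction_pmv, pattern_pmv):
    """True iff every item of the pattern occurs in the transaction.

    Holds because each PMV is a product of distinct primes, so by unique
    factorization divisibility is equivalent to itemset containment.
    """
    return transaction_pmv % pattern_pmv == 0


def support(pattern_pmv, transaction_pmvs):
    """Number of transactions whose PMV is divisible by the pattern's PMV."""
    return sum(1 for value in transaction_pmvs if contains(value, pattern_pmv))


TV = [2310, 2730, 66, 770, 455, 910, 70, 455]  # Transaction Values from Table 3.1

# mod(PMV, P_r) with a single prime tests one item, e.g. page 9 (prime 13):
page_9_support = support(13, TV)
# A product of primes tests an itemset, e.g. pages {11, 20} (primes 5 * 7 = 35):
pages_11_20_support = support(5 * 7, TV)
```

Here page 9 (prime 13) divides 4 of the 8 values and pages {11, 20} (35) divide 7 of them, so these are their supports.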
This method improves performance through data compression and pruning techniques. A Prime Graph contains two kinds of nodes: nodes holding the prime numbers allotted to the items of the transactions (P_1, …, P_n), and nodes holding the prime multiplied values (PMV_1, …, PMV_m). Each node consists mainly of five fields: value, local-count, global-count, status and link. The value field stores the PMV. During insertion of the current PMV_m, the local-count field is increased by 1 for each P_i with mod(PMV_m, P_i) = 0, i.e., when the division leaves no remainder. The global-count field records the support of the pattern P represented by its PMV; it stores support information for all frequent and non-frequent patterns, which can later be used for mining according to the user-defined threshold. The status field keeps track of traversal: when a node is visited, its value changes from 0 to 1. The link field forms the inward and outward edges to and from the node.

Figs. 1 and 2 show the construction of the Prime Graph for the transactions shown in Table 1. Construction mainly consists of inserting the PMV nodes and the P_1, …, P_n nodes into the Prime Graph according to the following definitions:

Definition 1: A link between a prime node P_i and a PMV node is formed depending on the result of mod(PMV, P_i). Every PMV is divided modulo each P_i; if the remainder is 0, i.e., the PMV is completely divisible by P_i, a link is formed directed from that P_i towards the PMV, and the local-count is increased by 1.

Definition 2: A link from one PMV to another PMV is formed when one PMV is completely divisible by the other, i.e., when the itemset of one transaction is a subset of the itemset of the other.

Definition 3: A self-loop on a PMV node is formed when the same PMV occurs more than once, i.e., the same itemset is repeated in more than one transaction of the whole set (e.g., 455 in transactions 5 and 8 of Table 1).
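The construction rules in Definitions 1 to 3 can be sketched as a small Python class. Several details are not fixed by the text and are assumptions of this sketch: which node's local-count is incremented (here, the prime node's), the direction of PMV-to-PMV links (here, from divisor to multiple, mirroring Definition 1), and how global-count is maintained (here, as the number of inserted PMVs divisible by the node's value, i.e., the pattern's support). The names `Node`, `PrimeGraph`, `insert`, `frequent` and `reachable_from_item` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    value: int             # a prime (item node) or a PMV (transaction node)
    local_count: int = 0   # assumed: on item nodes, how many inserted PMVs this prime divides
    global_count: int = 0  # assumed: on PMV nodes, support = inserted PMVs divisible by value
    status: int = 0        # 0 = not visited, 1 = visited during a traversal
    links: set = field(default_factory=set)  # values of the PMV nodes this node points to


class PrimeGraph:
    """Hypothetical sketch of a Prime Graph following Definitions 1 to 3."""

    def __init__(self, primes):
        self.items = {p: Node(p) for p in primes}  # item nodes P_1 .. P_n
        self.pmvs = {}                              # transaction nodes, one per distinct PMV
        self.occurrences = Counter()                # how often each PMV has been inserted

    def insert(self, pmv):
        # Definition 1: link P -> PMV whenever mod(PMV, P) == 0, and bump local-count.
        for item in self.items.values():
            if pmv % item.value == 0:
                item.links.add(pmv)
                item.local_count += 1
        if pmv in self.pmvs:
            # Definition 3: a repeated PMV gets a self-loop rather than a new node.
            self.pmvs[pmv].links.add(pmv)
        else:
            new = Node(pmv)
            # Earlier transactions containing this pattern already count toward its support.
            new.global_count = sum(
                n for value, n in self.occurrences.items() if value % pmv == 0
            )
            # Definition 2 (direction assumed): divisor PMV -> multiple PMV.
            for other in self.pmvs.values():
                if pmv % other.value == 0:
                    other.links.add(pmv)
                elif other.value % pmv == 0:
                    new.links.add(other.value)
            self.pmvs[pmv] = new
        self.occurrences[pmv] += 1
        # The current transaction supports every PMV node (itself included) that divides it.
        for node in self.pmvs.values():
            if pmv % node.value == 0:
                node.global_count += 1

    def frequent(self, min_support):
        """PMVs whose global-count meets the user-defined threshold."""
        return sorted(v for v, n in self.pmvs.items() if n.global_count >= min_support)

    def reachable_from_item(self, prime):
        """Depth-first traversal from an item node, using `status` as the visited flag."""
        for node in self.pmvs.values():
            node.status = 0
        stack, seen = list(self.items[prime].links), []
        while stack:
            node = self.pmvs[stack.pop()]
            if node.status == 0:
                node.status = 1
                seen.append(node.value)
                stack.extend(node.links)
        return sorted(seen)


graph = PrimeGraph([2, 3, 5, 7, 11, 13])
for tv in [2310, 2730, 66, 770, 455, 910, 70, 455]:  # Transaction Values, Table 3.1
    graph.insert(tv)
```

Inserting the eight Transaction Values of Table 3.1 gives 455 a self-loop (transactions 5 and 8) and a global-count of 4, and 70 a global-count of 5, so with a threshold of 3 only these two patterns are reported as frequent.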
