-
Thomas G. Dietterich.
The Divide-and-Conquer Manifesto.
In Proc. 11th Int. Conf. on Algorithmic Learning Theory,
pages 13--26,
2000.
[ URL
]
Keywords:
Sequential/Temporal Data.
| Abstract: |
Existing machine learning theory and algorithms have focused on learning an unknown function from training examples, where the unknown functions maps from a feature vector to one of a small number of classes. Emerging applications in science and industry require learning much more complex functions that map from complex input spaces (e.g., 2-dimensional maps, time series, and strings) to complex output spaces (e.g., other 2-dimensional maps, time series, and strings). Despite the lack of theory covering such cases, many practical systems have been built that work well in particular applications. These systems all employ some form of divide-and-conquer, where the inputs and outputs are divided into smaller pieces (e.g., windows), classified, and then the results are merged to produce an overall solution. This paper defines the problem of divide-and-conquer learning and identifies the key research quetions that need to be studied in order to develop practical, general-purpose learning algorithms for divide-and-conquer problems and an associated theory. |
-
Graziano Frosini,
Beatrice Lazzerini,
and Francesco Marcelloni.
A modified Fuzzy c-Means Algorithm for Feature Selection.
In Thomas Whalen, editor,
NAFIPS00,
Atlanta, Georgia, USA,
pages 148--152,
2000.
Keywords:
Classification,
Clustering,
Fuzzy c-Means,
Nearest Neighbour Methods.
| Abstract: |
In this paper we propose a novel method for feature selection based on a modified fuzzy c-means algorithm with supervision (MFCMS). MFCMS adopts an appropriately modifed version of the objective function used by the classic fuzzy c-means. We applied MFCMS to some real-world pattern classification benchmarks. To test the effectiveness of MFCMS as feature selector, we used the well-known k-nearest neighbour as learning algorithm. In our experiments we found that the classification performance using the set of features selected by MFCMS is better than that using all the original features. Furthermore, our approach proved to be less time-consuming than other selection methods. |
-
B. Goethals and J. Van den Bussche.
On Supporting interactive association rule mining.
In DAWAK00,
volume 1874 of LNCS,
pages 307--316,
2000.
Keywords:
Association Rules.
| Abstract: |
We investigate ways to support interactive mining sessions, in the setting of association rule mining. In such sessions, users specify conditions (queries) on the associations to be generated. Our approach is a combination of the integration of querying conditions inside the mining phase, and the incremental querying of already generated associations. We present several concrete algorithms and compare their performance. |
-
Jiawei Han,
Jian Pei,
and Yiwen Yin.
Mining frequent patterns without candidate generation.
In ICMD00,
pages 1--12,
2000.
[ URL
]
Keywords:
Sequential/Temporal Data.
| Abstract: |
Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns.\newline In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended pre x- tree structure for storing compressed, crucial information about frequent patterns, and develop an e cient FP-tree- based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. E ciency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining con ned patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is e cient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent pattern mining methods. |
-
Bjarne K. Hansen.
Analog forecasting of ceiling and visibility using fuzzy sets.
In Preprints of the 2nd Conference on Artificial Intelligence, American Meteorological Society,
pages 1--7,
2000.
[ URL
]
Keywords:
Similarity Measures,
Nearest Neighbour Methods.
| Abstract: |
A fuzzy logic based methodology for knowledge acquisition is used to build a retrieval-based analog forecasting system, a fuzzy k-nearest neighbour based prediction system. The methodology is used to acquire knowledge about what salient features of continuous-vector, unique temporal cases indicate significant similarity between cases. Such knowledge is encoded in a similarity-measuring function and thereby used to retrieve k nearest neighbours (k-nn) from a large database of airport weather observations. Predictions for the present weather case are made from a weighted median of the outcomes of analogous past cases, the k-nn, the analog ensemble. Past cases are weighted according to their degree of similarity to the present case. |
-
Frank Höppner.
Piecewise Linear Function Approximation by Alternating Optimization.
In Proc. of the 8th Int. Conf. on Information Processing and Management of Uncertainty in Knowledge Based Systems (IPMU),
Madrid, Spain,
pages 1751--1757,
2000.
[ PDF
] [ Postscript
]
Keywords:
Piecewise Linear Representations,
Clustering,
Fuzzy Clustering,
Fuzzy c-Means.
| Abstract: |
Fuzzy clustering algorithms like the Fuzzy c-Means algorithm perform cluster analysis by minimizing an objective function through {\sl alternating optimization} (AO). The problem of optimal piecewise linear function approximation can also be expressed by means of an objective function. In this paper, the AO technique is used to derive a new solution to this problem. The resulting unsupervised Hard c-Connected Lines (HcCL) algorithm updates alternatingly support points and output values and finds an optimum number of line segments and their location without any further input parameter. |
-
Frank Höppner and Frank Klawonn.
Fuzzy Clustering of Sampled Functions.
In NAFIPS00,
Atlanta, USA,
pages 251--255,
2000.
[ PDF
] [ Postscript
]
Keywords:
Clustering,
Fuzzy Clustering.
| Abstract: |
Fuzzy clustering algorithms perform cluster analysis on a data set that consists of feature attribute vectors. In the context of multiple sampled functions, a set of samples (sampled function) becomes a single datum. We show how the already known algorithms can be used to perform fuzzy cluster analysis on this kind of data sets by replacing the conventional prototypes with sets of prototypes. This approach allows reusing the known algorithms and works also with other data sets than sampled functions. Furtermore, to reduce the computational costs in case of single-input/single-output functions we present a new algorithm, which uses for the first time a more complex input data type (data points aggregated to data-lines) than the known approaches. The new alternating optimisation algorithm performs cluster analysis directly on this more compact representation of the sampled functions. |
-
Frank Höppner and Frank Klawonn.
Obtaining Interpretable Fuzzy Models from Fuzzy Clustering and Fuzzy Regression.
In KES00,
Brighton, UK,
pages 162--165,
2000.
[ PDF
] [ Postscript
]
Keywords:
Clustering,
Fuzzy Clustering,
Fuzzy Models,
Sequential/Temporal Data,
Regression.
| Abstract: |
In this paper we develop an objective function-based clustering algorithm to build fuzzy models of the Takagi-Sugeno (TS) type automatically from data. In contrast to most of the TS models that can be found in the literature, we decided to use very simple input-space partitions and a higher degree of consequence polynomials (quadratic). Only in this way transparency and interpretability can be guaranteed. We also show how to derive linguistic labels for the polynomials found by the algorithm. |
-
Po-Shan Kam and Ada Wai-Chee Fu.
Discovering Temporal Patterns for Interval-based Events.
In DAWAK00,
volume 1874 of LNCS,
pages 317--326,
2000.
Keywords:
Sequential/Temporal Patterns.
| Abstract: |
In this paper, we consider interval-based events where the duration of events is expressed in terms of endpoint values, and these are used to form temporal constraints in the discovery process. We introduce the notion of temporal representation which is capable of expressing the relationships between interval-based events. We develop new methods for finding such interesting patterns. |
-
Kamran Karimi and Howard J. Hamilton.
Finding Temporal Relations: Causal Bayesian Networks vs. C4.5.
In ISMIS00,
Charlotte, NC, USA,
pages 266-273,
2000.
Keywords:
Decision Trees,
Bayesian Networks.
| Abstract: |
Observing the world and finding trends and relations among variables of interest is an important and common learning activity. In this paper we apply TETRAD, a program that uses Bayesian networks to discover causal rules, and C4.5, which creates decision trees, to the problem of discovering relations among a set of variables in the controlled environment of an Artificial Life simulator. All data in this environment are generated by a single entity over time. The rules in the domain are known, so we are able to assess the effectivness of each method. The agent's sensings of its environment and its own actions are saved in data records over time. We first compare TETRAD and C4.5 in discovering the relations between variables in a single record. We next attempt to find temporal relations among the variables of consecutive records. Since both these programs disregard the passage of time among the records, we introduce the flattening operation as a way to span time and bring the variables of interest together in a new single record. We observe that flattening allows C4.5 to discover relations among variables over time, while it does not improve TETRAD's output. |
-
Annette Keller.
Fuzzy Clustering with Outliers.
In NAFIPS00,
2000.
Keywords:
Noise Handling,
Clustering,
Fuzzy Clustering.
| Abstract: |
In this paper we introduce a modified objective function for fuzzy clustering. We add an additional weighting factor for each datum and derive necessary conditions for the introduced parameter in order to optimise the objective function. These conditions are used in an alternating optimisation scheme to calculate a partition of sample data. The obtained weights determine a kind of representativeness of each datum for the data distribution. They can be used to identify outliers and enable the expert to locate critical areas that are often represented by only a few outliers. |
-
Edward D. Kim,
Joyce M. W. Lam,
and Jiawei Han.
AIM: Approximate Intelligent Matching for Time Series Data.
In Y. Kambayashi,
M. Mohania,
and A. M. Tjoa, editors,
DAWAK00,
volume 1874 of LNCS,
London, UK,
pages 347--357,
2000.
Keywords:
Similarity Measures,
Sequential/Temporal Data.
| Abstract: |
Time-series data mining presents many challenges due to the intrinsic large scale and high dimensionality of the data sets. Subsequence similarity matching has been an active research area driven by the need to analyze large data sets in the financial, biomedical and scientific databases. In this paper, we investigate an intelligent subsequence similarity matching of time series queries based on efficient graph traversal. We introduce a new problem, the approximate partial matching of a query sequence in a time series database. Our system can address such queries with high specifity and minimal time and space overhead. The performance bottleneck of the current methods were analyzed and we show our method can improve the performance of the time series queries significantly. It is general and flexible enough to find the best approximate match query without specifying a tolerance $\varepsilon$ parameter. |
-
Yingjiu Li,
X. Sean Wang,
and Sushil Jajodia.
Discovering Temporal Patterns in Multiple Granularities.
In J.F. Roddick and K. Hornsby, editors,
TSDM00,
number 2007 of LNAI,
Lyon, France,
pages 5--19,
2000.
Keywords:
Sequential/Temporal Patterns.
| Abstract: |
Many events repeat themselves as the time goes by. For example, an institute pays its employees on the first day of every month. However, events may not repeat with a constant span of time. In the payday example here, the span between each two consecutive paydays ranges between 28 and 31 days. As a result, regularity, or temporal pattern, has to be captured with a use of granularities (such as day, week, month, and year), oftentimes multiple granularities. This paper defines the above patterns, and proposes a number of pattern discovery algorithms. To focus on the basics, the paper assumes that a list of events with their timestamps is given, and the algorithms try to find patterns for the events. All of the algorithms repeat two possibly interleaving steps, with the first step generating possible (called candidate) patterns, and the second step verifying if candidate patterns satisfy some user-given requirements. The algorithms use pruning techniques to reduce the number of candidate patterns, and adopt a data structure to efficiently implement the second step. Experiments show that the pruning techniques and the data structure are quite effective. |
-
Weiqiang Lin,
Mehmet A. Orgun,
and Graham J. Williams.
Temporal Data Mining Using Multilevel-Local Polynomial Models.
In IDEAL00,
volume 1983 of LNCS,
pages 180--186,
2000.
Keywords:
Similarity Measures,
Sequential/Temporal Data,
Sequential/Temporal Patterns.
| Abstract: |
This study proposes a data mining framework to discover qualitative and quantitative patterns in discrete-valued time series (DTS). In our method, there are three levels for mining temporal patterns. At the first level, a structural method based on distance measures through polynomial modelling is employed to find pattern structures; the second level performs a value-based search using local polynomial analysis; and then the third level based on multilevel-local polynomial models find global patterns from a DTS set. We demonstrate our method on the analysis of ``Exchange Rates Patterns'' between the US dollar and Australian dollar. |
-
Bing Liu,
Yiyuan Xia,
and Philip S. Yu.
Clustering Through Decision Tree Construction.
In CIKM00,
pages 20--29,
2000.
Keywords:
Classification,
Decision Trees,
Clustering,
Similarity Measures.
| Abstract: |
Clustering aims to find the intrinsic nature of data by organizing data objects into similarity groups or clusters. It is often called unsupervised learning as no class labels denoting an a priori partition of the objects are given. This is in contrast with supervised learning (e.g., classification) for which the data objects are already labeled with known classes. Past research in clustering has produced many algorithms. However, these algorithms have some major shortcomings. In this paper, we propose a novel clustering technique, which is based on a supervised learning technique called decision tree construction. The new technique is able to overcome many of these shortcomings. The key idea is to use a decision tree to partition the data space into clusters and empty (sparse) regions at different levels of details. The technique is able to find "natural" clusters in large high dimensional spaces efficiently. It is suitable for clustering in the full dimensional space as well as in subspaces. It also provides comprehensible descriptions of clusters. Experiment results on both synthetic data and real-life data show that the technique is effective and also scales well for large high dimensional datasets. |
-
Anna Maria Massone,
Léonard Studer,
and Francesco Masulli.
Pattern Recognition in RICH Counters Using the Possibilistic C-Spherical Shell Algorithm.
In KES00,
Brighton, UK,
pages 792--795,
2000.
Keywords:
Noise Handling,
Image Data.
| Abstract: |
The pattern recognition problem in RICH counters concerns the identification of an unknown number of imperfect roughly-circular rings made of a low number of discrete points in presence of background. In this paper we present some preliminary results obtained using the Possibilistic c-Spherical Shells Algorithm. In particular, we show that the algorithm is very tolerant and robust to noise (outliers rate) level. Moreover, for complex images, full of rings, we introduce an iterative scheme that greatly improves performances. Besides that, the rings are not requested to be complete, only arcs are enough to recognize the underlying rings by the algorithm. |
-
Katharina Morik.
The Representation Race -- Preprocessing for Handling Time Phenomena.
In ECML00,
volume 1810 of LNAI,
Barcelona, Spain,
pages 4--19,
2000.
| Abstract: |
Designing the representation language for the input, $L_E$, and output, $L_H$, of a learning algorithm is the hardest task within machine learning applications. This paper emphasizes the importance of constructing an appropriate representation $L_E$ for knowledge discovery applications using the example of time related phenomena. Given the same raw data -- most frequently a database with time-stamped data -- rather different representations have to be produced for the learning methods that handle time. In this paper, a set of learning tasks dealing with time is given together with the input required by learning methods which solve the task. Transformations from raw data to the desired representation are illustrated by three case studies. |
-
Jian Pei,
Jiawei Han,
and Runying Mao.
CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets.
In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery,
pages 21--30,
2000.
Keywords:
Association Rules.
| Abstract: |
Association mining may often derive an undesirably large set of frequent itemsets and association rules. Recent studies have proposed an interesting alternative: mining frequent closed itemsets and their corresponding rules, which has the same power as association mining but substantially reduces the number of rules presented. In this paper, we propose an efficient algorithm, CLOSET, for mining closed itemsets, with the development of three techniques: (1) applying a compressed, frequent pattern tree FP-tree structure for mining closed itemsets without candidate generation, (2) developying a single prefix path comperession technqiue to identify frequent closed itemsets quickly, and (3) exploring a partition-based projection mechanism for scalable mining in large databases. Our performance study shows that CLOSET is efficient and scalable over large databases, and is faster than the previously proposed methods. |
-
Alexandrin Popescul,
Gary William Flake,
Steve Lawrence,
Lyle H. Ungar,
and C. Lee Giles.
Clustering and Identifying Temporal Trends in Document Databases.
In Proc. of the IEEE Int. Conf. on Advances in Digital Libraries (ADL),
Washington, DC,
pages 173--182,
2000.
Keywords:
Clustering.
| Abstract: |
We introduce a simple and efficient method for clustering and identifying temporal trends in hyper-linkes document databases. Our method can scale to large datasets because it exploits the underlying regularity often found in hyper-linked document databases. Because of this scalability, we can use our method to study the temporal trends of individual clusters in a statistically meaningful manner. As an example of our approach, we give a summary of the temporal trends found in a scientific literature database with thousands of documents (citeseer). |
-
Richard J. Povinelli.
Identifying Temporal Patterns for Characterization and Prediction of Financial Time Series Events.
In J.F. Roddick and K. Hornsby, editors,
TSDM00,
number 2007 of LNAI,
pages 46--61,
2000.
Keywords:
Sequential/Temporal Data,
Sequential/Temporal Patterns.
| Abstract: |
The novel time series data mining (TSDM) framework is applied to analyzing financial time series. The TSDM framework adapts and innovates data mining concepts to analyzing time series data. In particular, it creates a set of methods that reveal hidden temporal patterns that are characteristic and predictive of time series events. This contrasts with other time series analysis techniques, which typically characterize and predict all observations. The TSDM framework and concepts are reviewed, and the applicable TSDM method is discussed. Finally, the TSDM method is applied to time series generated by a basket of financial securities. The results show that statistically significant temporal patterns that are both characteristic and predictive of events in financial time series can be identified. |
-
Chris P. Rainsford and John F. Roddick.
Visualisation of Temporal Interval Association Rules.
In IDEAL00,
number 1983 of LNCS,
pages 91--96,
2000.
Keywords:
Association Rules.
| Abstract: |
Temporal intervals and the interaction of interval-based events are fundamental in many domains including medicine, commerce, computer security and various types of normalcy analysis. In order to learn from temporal interval data we have developed a temporal interval association rule algorithm. In this paper, we will provide a definition for temporal interval association rules and present our visualization techniques for viewing them. Visualization techniques are particularly important because the complexity and volume of knowledge that is discovered during data mining often makes it difficult to comprehend. We adopt a circular graph for visualizing a set of associations that allows underlying patterns in the associations to be identified. To visualize temporal relationships, a parallel coordinate graph for displaying the temporal relationships has been developed. |
-
Iztok Savnik,
Georg Lausen,
Hans-Peter Kahle,
Heinrich Spiecker,
and Sebastian Hein.
Algorithm for Matching Sets of Time Series.
In Int. Conf. on Principles of Data Mining and Knowledge Discovery,
pages 277-288,
2000.
Keywords:
Classification,
Sequential/Temporal Data.
| Abstract: |
Time series are time-stamped sequences of values which represent a parameter of the observed processes in subsequent time points. Given a set of time series describing a set of similar processes, the model of the behaviour of processes is constructed as a range of classification trees which describe the characteristics of each particular time point in series. An algorithm for matching a sequence of values with the model is used for searching common patterns in the sets of time series, and for predicting the starting time points of undated time series. The algorithm was developed and analyzed in the frame of the study of tree-ring time series. The implementation and the empirical analysis of the algorithm on the tree-ring time series are presented. |
|