-
Richard J. Bolton and David J. Hand.
Significance Tests for Patterns in Continuous Data.
In ICDM01,
2001.
| Abstract: |
In this paper we consider the question of uncertainty of detected patterns in data mining. In particular, we develop statistical tests for patterns found in continuous data, indicating the significance of these patterns in terms of the probability that they have occurred by chance. We examine the performance of these tests on patterns detected in several large data sets, including a data set describing the locations of earthquakes in California and another describing flow cytometry measurements on phytoplankton. |
-
Fu-lai Chung,
Tak-Chung Fu,
Robert W. P. Luk,
and Vincent Ng.
Flexible Time Series Pattern Matching Based on Perceptually Important Points.
In IJCAI-01 Workshop on Learning from Temporal and Spatial Data,
Seattle, USA,
pages 1--7,
2001.
Keywords:
Similarity Measures,
Time Warping,
Sequential/Temporal Data,
Sequential/Temporal Patterns.
| Abstract: |
Defining similarity between time series (or time series segments) is of fundamental importance, particularly in time series data mining and query. By identifying the perceptually important points directly from the time domain, time series segments and templates of different lengths can be compared and flexible pattern matching can be carried out in an effective and efficient manner. The proposed method is distinctive in its intuitiveness, making it particularly user friendly to ordinary data analysts like stock market investors. As demonstrated by two sets of experiments, based on syntactic and real stock time series respectively, the proposed method, with two schemes (PIP-ED and PIP-VD), has outperformed the traditional ones (e.g., dynamic time warping) under quite diverse experimental setups. |
-
Paul R. Cohen.
Fluent Learning: Elucidating the Structure of Episodes.
In IDA01,
number 2189 of LNAI,
pages 268--277,
2001.
Keywords:
Sequential/Temporal Data.
| Abstract: |
Fluents are logical descriptions of situations that persist, and composite fluents are statistically significant temporal relationships between fluents. This paper presents an algorithm for learning composite fluents incrementally from categorical time series data. The algorithm is test with a large dataset of mobile robot episodes. It is given no knowledge of the episode structure of the dataset (i.e., it learns without supervision) yet it discovers fluents that correspond well with episodes. |
-
Paul R. Cohen and Niall Adams.
An Algorithm for Segmenting Categorical Time Series into Meaningful Episodes.
In IDA01,
number 2189 of LNAI,
pages 197--205,
2001.
Keywords:
Sequential/Temporal Data.
| Abstract: |
This paper describes an unsupervised algorithm for segmenting categorical time series. The algorithm first collects statistics about the frequency and boundary entropy of ngrams, then passes a window over the series and has two ``expert methods'' decide where in the window boundaries should be drawn. The algorithm segments text into words successfully in three languages. We claim that the algorithm finds meaningful episodes in categorical time series, because it exploits two statistical characteristics of meaningful episodes. |
-
Marius Cãlin and Dan Gâlea.
A Fuzzy Relation for Comparing Intervals.
In B. Reusch, editor,
Proceedings of the 7th Fuzzy Days,
volume 2206 of LNCS,
Dortmund, Germany,
pages 904--916,
2001.
| Abstract: |
The paper proposes a method for expressing the matching degree of two arbitrary intervals through a fuzzy relation having the properties of reflexivity and symmetry. This method could be useful in situations when one has to compare two entities whose attributes are expressed through intervals and give a measure of their matching degree. Such a real world situation occurs in the mechanical engineering, namely in the steel selection process, when the designer has to determine the steels that are equivalent to a given one. This situation is described, as an application, in the final part of the paper. |
-
Bernard Hugueney and Bernadette Bouchon-Meunier.
Time-Series Segmentation and Symbolic Representation from Process-Monitoring to Data-Mining.
In CI01,
volume 2206 of LNCS,
pages 118-123,
2001.
Keywords:
Sequential/Temporal Data.
| Abstract: |
Data-analysis has undergone an important change from statistical descriptive analysis to data-mining. Information networks and huge data-storage equipments brought data-retrieval to new dimensions. Time-series are especially easy to accumulate as digital sensors can be used to fill databases without any intervention. This is both a boon and a problem as the very amount of data available prevents the user from being able to understand them. One has to build high-level representations of the time-series to be able to extract some information. Segmentation is often used in process-monitoring for similar reasons. \newpage In this paper, we describe step by step difficulties and solutions that we studied when adapting automated time-series segmentation to a real-world example of electric consumption analysis. The data that we want to analyze consists of yearly reports of electric power consumption analysis in 10 minute ticks. We study industrial consumers that have simple processes (ovens, motors) switched either on or off for the duration of the process. Hence we could use this prior knowledge to model the time series with piecewise constant changing mean models. We then extend the segmentation to a symbolic representation to enable interpretation to enable interpretation of the overwhelming number of generated segments. |
-
Frank Höppner and Frank Klawonn.
A New Approach to Fuzzy Partitioning.
In NAFIPS01,
Vancouver, Canada,
pages 1419--1424,
2001.
[ PDF
] [ Postscript
]
Keywords:
Clustering,
Fuzzy Clustering,
Fuzzy c-Means,
Sequential/Temporal Data.
| Abstract: |
Fuzzy clustering algorithms like the popular fuzzy c-means algorithm (FCM) are frequently used to automatically divide up the data space into fuzzy granules (fuzzy vector quantization). In the context of fuzzy systems, in order to be intuitive and meaningful to the user, the fuzzy membership functions of the used linguistic terms have to fulfill some requirements like boundedness of support or unimodality. By rewarding crisp membership degrees, we modify FCM and obtain different membership functions that better suit these purposes. We show that the modification can be interpreted as standard FCM using distances to the Voronoi cell of the cluster rather than using distances to the cluster prototypes. In consequence, the resulting partitions of the modified algorithm are much closer to those of the crisp original methods. The membership functions can be generalized to a fuzzified minimum function. We give some bounds on the approximation quality of this fuzzification. |
-
Frank Höppner and Frank Klawonn.
Finding Informative Rules in Interval Sequences.
In IDA01,
volume 2189 of LNCS,
Lissabon, Portugal,
pages 123--132,
2001.
[ PDF
] [ Postscript
]
Keywords:
Sequential/Temporal Data,
Sequential/Temporal Patterns,
Temporal Reasoning.
| Abstract: |
Observing a binary feature over a period of time yields a sequence of observation intervals. To ease the access to continuous features (like time series), they are often broken down into attributed intervals, such that the attribute describes the series' behaviour within the segment (e.g. increasing, high-value, highly convex, etc.). In both cases, we obtain a sequence of interval data, in which temporal patterns and rules can be identified. A temporal pattern is defined as a set of labeled intervals together with their interval relationships described in terms of Allen's interval logic. In this paper, we consider the evaluation of such rules in order to find the most informative rules. We discuss rule semantics and outline deficiencies of the previously used rule evaluation. We apply the J-measure to rules with a modified semantics in order to better cope with different lengths of the temporal patterns. We also consider the problem of specializing temporal rules by additional attributes of the state intervals. |
-
Frank Höppner.
Discovery of Temporal Patterns -- Learning Rules about the Qualitative Behaviour of Time Series.
In PKDD01,
number 2168 of LNAI,
Freiburg, Germany,
pages 192--203,
2001.
[ PDF
] [ Postscript
]
Keywords:
Association Rules,
Sequential/Temporal Data,
Sequential/Temporal Patterns,
Temporal Reasoning.
| Abstract: |
Recently, association rule mining has been generalized to the discovery of episodes in event sequences. In this paper, we additionally take durations into account and thus present a generalization to {\sl time intervals}. We discover frequent temporal patterns in a single series of such labeled intervals, which we call a state sequence. A temporal pattern is defined as a set of states together with their interval relationships described in terms of Allen's interval logic, for instance ``A before B, A overlaps C, C overlaps B'' or equivalently ``state A ends before state B starts, the gap is covered by state C''. As an example we consider the problem of deriving local weather forecasting rules that allow us to conclude from the qualitative behaviour of the air-pressure curve to the wind-strength. Here, the states have been extracted automatically from (multivariate) time series and characterize the trend of the time series locally within the assigned time interval. |
-
Frank Höppner.
Learning Temporal Rules from State Sequences.
In WLTSD,
Seattle, USA,
pages 25--31,
2001.
[ PDF
] [ Postscript
]
Keywords:
Association Rules,
Sequential/Temporal Data,
Sequential/Temporal Patterns,
Temporal Reasoning.
| Abstract: |
In this paper we consider the problem of learning rules about temporal relationships between labeled time intervals. We learn these rules from a single series of such labeled intervals, which might be obtained from (multivariate) time series by extracting various features of interest, for instance segments of increasing and decreasing local trends. We seek for the identification of frequent local patterns in this {\sl state series}. A temporal pattern is defined as a set of states together with their interval relationships described in terms of Allen's interval logic, for instance ``A before B, A overlaps C, C overlaps B'' or equivalently ``state A ends before state B starts, the gap is covered by state C''. In the spirit of association rule mining we propose an algorithm to discover frequent temporal patterns and to generate temporal rules. As an application we consider the problem of deriving local weather forecasting rules that allow us to conclude from the qualitative behaviour of the air-pressure curve to the wind-strength. |
-
Eamonn J. Keogh and Michael J. Pazzani.
Dynamic Time Warping with Higher Order Features.
In SDM01,
Chicago, USA,
2001.
[ URL
]
Keywords:
Classification,
Time Warping,
Sequential/Temporal Data.
| Abstract: |
Dynamic time warping (DTW) is a technique for aligning time series. This alognment is a necessary step in many domains before classification or averaging of time series can take place. Dynamic time warping has been successfully used in a wide variety of domains, including gesture recognition, robotics, data mining, speech processing, manufacturing and medicine. A problem with DTW is that the algorithm may try to explain variability in the Y-axis by shifting the X-axis. This can lead to unintuitive alignments between two sequences. A variety of ad-hoc solutions have been proposed to solve this problem. However these methods may result in the DTW failing to find the correct alignment. In this paper we present a modification of the algorithm which alleviates this problem by considering a higher level representation of the data. In particular our algorithm examines the first order derivative of the original data. We experimentally demonstrate the superiority of our approach on medical, space telemetry, financial and synthetic data. |
-
Hans-Peter Kriegel,
Marco Pötke,
and Thomas Seidl.
Interval Sequences: An Object-Relational Approach to Manage Spatial Data.
In SSTD,
volume 2121 of LNCS,
pages 481--501,
2001.
Keywords:
Sequential/Temporal Data.
| Abstract: |
The design of external index structures for one- and multidimensional extended objects is a long and well studied subject in basic database research. Today, more and more commercial applications rely on spatial datatypes and require a robust and seamless integration of appropriate access methods into reliable database servers. This paper proposes an efficient, dynamic and scalable approach to manage one-dimensional interval sequences within off-the-shelf object-relational database systemms. The presented technique perfectly fits to the concept of space-filling curves and, thus, generalizes to spatially extended objects in multidimensional data spaces. Based on the Relational Interval Tree, the method is easily embedded in modern extensible indexing frameworks and significantly outmatches Linear Quadtrees and Relational R-trees with respect to usability, concurrency, and performance. As demonstrated by our experimental evaluation on an Oracle server with real GIS and CAD data, the competing methods are outperformed by factors of up to 4.6 (Linear Quadtree) and 58.3 (Relational R-tree) for query response time. |
-
Chang-Hung Lee,
Cheng-Ru Lin,
and Ming-Syan Chen.
On Mining General Temporal Association Rules in a Publication Database.
In ICDM01,
pages 337-344,
2001.
Keywords:
Association Rules.
| Abstract: |
In this paper, we explore a new problem of mining general temporal association rules in publication databases. In essence, a publication database is a set of transactions where each transaction T is a set of items of which each item contains an individual exhibition period. The current model of association rule mining is not able to handle the publication database due to the following fundamental problems, i.e., (1) lack of consideration of the exhibition period of each individual item; (2) lack of an equitable support counting basis for each item. To remedy this, we propose an innovative algorithm Progressive-Partition-Miner (abbreviatedly as PPM) to discover general temporal association rules in a publication database. The basic idea of PPM is to first partition the publication database in light of exhibition periods of items and then progressively accumulate the occurrence count of each candidate 2-itemset based on the intrinsic partitioning characteristics. Algorithm PPM is also designed to employ a filtering threshold in each partition to early prune out those cumulatively infrequent 2-itemsets. Explicitly, the execution time of PPM is, in orders of magnitude, smaller than those required by the schemes which are directly extended from existing methods. |
-
Cen Li,
Gautam Biswas,
Mike Dale,
and Pat Dale.
Building Models of Ecological Dynamics using HMM based Temporal Data Clustering -- A Preliminary Study.
In IDA01,
number 2189 of LNAI,
pages 52--61,
2001.
Keywords:
Clustering,
Sequential/Temporal Data.
| Abstract: |
This paper discusses a temporal data clustering system that is based on the Hidden Markov Model (HMM) methodology. The proposed methodology improves upon existing HMM clustering methods in two ways. First, an explicit HMM model size selection procedure is incorporated into the clustering process, i.e., the sizes of the individual HMMs are dynamically determined for each cluster. This improves the interpretability of cluster models, and the quality of the final clustering partition results. Secondly, a partition selection method is developed to ensure an objective, data-driven selection of the number of clusters in the partition. The result is a heuristic sequential search control algorithm that is computationally feasible. Experiments with artificially generated data and real world ecology data show that: (i) the HMM model size selection algorithm is effective in re-discovering the structure of the generating HMMs, (ii) the HMM clustering with model size selection significantly outperforms HMM clustering using uniform HMM model sizes for re-discovering clustering pattern structures, (iii) it is able to produce interpretable and ``interesting'' models for real world data. |
-
Heikki Mannila and Marko Salmenkivi.
Finding Simple Intensity Descriptions from Event Sequence Data.
In KDD01,
San Francisco, USA,
pages 341--346,
2001.
Keywords:
Sequential/Temporal Data.
| Abstract: |
Sequences of events are an important type of data arising in various applications, including telecommunications, bio-statistics, web access analysis, etc. A basic approach to modeling such sequences is to find the underlying intensity functions describing the expected number of events per time unit. Typically, the intensity functions are assumed to be piecewise constant. We therefore consider different ways of fitting intensity models to event sequence data. We start by considering a Bayesian approach using Markov chain Monte Carlo (MCMC) methods with varying number of pieces. These methods can be used to produce posterior distributions on the intensity functions and they can also accomodate covariates. The drawback is that they are computationally intensive and thus are not very suitable for data mining applications in which large numbers of intensity functions have to be estimated. We consider dynamic programming approaches to finding the change points in the intensity functions. These methods can find the maximum likelihood intensity function in $O(n^2 k)$ time for a sequence of $n$ events and $k$ different pieces of intensity. We show that simple heuristics can be used to prune the number of potential change points, yielding speedups of several orders of magnitude. The results of the improved dynamic programming method correspond very closely with the posterior averages produced by the MCMC methods. |
-
Victor Medino-Chico,
Alberto Suárez,
and James F. Lutsko.
Backpropagation in Decision Trees for Regression.
In ECML01,
volume 2167 of LNAI,
Freiburg, Germany,
pages 348--359,
2001.
Keywords:
Decision Trees,
Neural Networks,
Regression.
| Abstract: |
A global optimization algorithm is designed to find the parameters of a CART regression tree extended with linear predictors at its leaves. In order to render the optimization mathematically feasible, the internal decisions of the CART tree are made continuous. This is accomplished by the replacement of the crisp decisions at the internal nodes of the tree with soft ones. The algorithm then adjusts the parameters of the tree in a manner similar to the backpropagation algorithm in multilayer perceptrons. With this procedure it is possible to generate regression trees optimized with a global cost function, which give a continuous representation of the unknown function, and whose architecture is automatically fixed by the data. The integration in one decision system of complementary features of symbolic and connectionist methods leads to improvements in prediction efficiency in both synthetic and real-world regression problems. |
-
Andrew Y. Ng,
Alice X. Zheng,
and Michael I. Jordan.
Link Analysis, Eigenvectors and Stability.
In IJCAI01,
pages 903--910,
2001.
| Abstract: |
The HITS and the PageRank algorithms are eigenvector methods for identifying "authoritative" or "influential" articles, given hyperlink or citation information. That such algorithms should give consistent answers is surely a desideratum, and in this paper, we address the question of when they can be expected to give stable rankings under small perturbations to the hyperlink patterns. Using tools from matrix perturbation theory and Markov chain theory, we provide conditions under which these methods are stable, and give specific examples of instability when these conditions are violated. We also briefly describe a modification to HITS that improves its stability. |
-
Siegfried Nijssen and Joost Kok.
Faster Association Rules for Multiple Relations.
In IJCAI01,
Seattle, Washington, USA,
pages 891--896,
2001.
Keywords:
Inductive Logic Programming,
Association Rules,
Speed-up Issues.
| Abstract: |
Several algorithms have already been implemented which combine association rules with first order logic formulas. Although this resulted in several usable algorithms, little attention was payed until recently to the efficiency of these algorithms. In this paper we present some new ideas to turn one important intermediate step in the process of discovering such rules, i.e. the discovery of frequent item sets, more efficient. Using an implementation that we coined FARMER, we show that indeed a speed-up is obtained and that, using these ideas, the performance is much more comparable to original association rule mining algorithms. |
-
Pavel Petrovic.
Solving LEGO brick layout problem using Evolutionary Algorithms.
In Proc. Norsk Informatikkonferanse (NIK) 2001,
pages 87--97,
2001.
| Abstract: |
LEGO presented the following problem at the SCAI´01 conference in February 2001: Given any 3D body, how can it be built from LEGO bricks? We apply methods of Evolutionary Algorithms (EA) to solve this optimization problem. For this purpose several specific operators are defined, applied, and their use is compared. In addition, mutation operators with dynamic rate of mutation updated based on thier contribution to progress of evolution are proposed. Different population organisation strategies are compared. Early results indicate that EA is suitable for solving this hard optimization problem. |
|