Home Kontakt Lehre Research Software


Publications of year 2002

Books and proceedings

Articles in journal or book's chapters

  • Annette Keller and Frank Klawonn. Adaption of Cluster Sizes in Objective-Function Based Fuzzy Clustering, volume IV, pages 181-199. 2002.
    Keywords: Clustering, Fuzzy Clustering.
    Abstract: This paper discusses new approaches in objective-function based fuzzy clustering. Some well-known approaches are extended by a supplementary component. The resulting new clustering techniques are able to adapt single clusters to the expansion of the corresponding group of data in an iterative optimization procedure. Another new approach based on volume centers as cluster representatives with varying radii for individual groups is also described. The corresponding objective functions are presented, and alternating optimization schemes are derived. Experimental results demonstrate the significance of the presented techniques.
  • Frank Höppner and Frank Klawonn. Learning Rules about the Development of Variables over Time. In Cornelius T. Leondes, editor,Intelligent Systems: Technology and Applications, volume IV, pages 201--228. 2002.
    Keywords: Similarity Measures, Association Rules, Sequential/Temporal Data, Sequential/Temporal Patterns, Temporal Reasoning.
    Abstract: The human's approach to learn from time series is guided by the impressive pattern recognition capabilities of the human brain. Rather than looking at the quantitative values a human automatically segments the time series appropriately and characterizes or classifies the segments. Similarity of time series is then decided on the basis of this abstracted representation. In this paper we propose an algorithm to support a human in this approach when dealing with large data sets. As an example, we consider the problem of deriving local weather forecasting rules that allow us to conclude from the qualitative behaviour of the air-pressure curve to the wind strength. Formally, we learn these rules from a single series of labeled intervals, which has been obtained from (multivariate) time series by extracting various features of interest, for instance segments of increasing and decreasing local trends. We seek for the identification of frequent local patterns in this {\sl state series}. A temporal pattern is defined as a set of states together with their interval relationships described in terms of Allen's interval logic. In the spirit of association rule mining we propose an algorithm to discover frequent temporal patterns and to generate temporal rules. We discuss an efficient algorithm for frequent pattern discovery in detail.
  • Se-Ho Choi and Peter Rockett. The training of Neural Classifiers With Condensed Datasets. SMCB, 32(2):202-206, 2002.
    Keywords: Classification, Nearest Neighbour Methods, Neural Networks, Speed-up Issues.
    Abstract: In this paper we apply k-nearest neighbour-based data condensing algorithm in the training set of multilayer perceptron neural networks. By removing the overlapping data and retaining only training exemplars adjacent to the decision boundary we are able to significantly speed the network training time while achieving an undegraded misclassification rate compared to the unedited training set. We report results on a range of synthetic and real datasets that indicate that a training speed-up of an order of magnitude is typical.
  • Erhan Gokcay and Jose C. Principe. Information Theoretic Clustering. TPAMI, 24(2):158--171, 2002.
    Keywords: Classification, Clustering.
    Abstract: Clustering is one of the most important topics in pattern recognition. Since only the structure of the data dictates the grouping (unsupervised learning), information theory is an obvious criteria to establish the clustering rule. This paper describes a novel valley seeking clustering algorithm using an information theoretic measure to estimate the cost of partitioning the data set. The information theoretic criteria developed here evolved from a Renyi's entropy estimator that was proposed recently and has been successfully applied to other machine learning applications. An improved version of the k-change algorithm is used in optimization because of the stepwise nature of the cost function and existence of local minima. Even when applied to nonlinearly separable data, the new algorithm performs well, and was able to find nonlinear boundaries between clusters. The algorithm is also applied to the segmentation of magnetic resonance imaging data (MRI) with very promising results.
  • Maria Halkidi, Yannis Batistakis, and Michalis Vazirgiannis. Cluster Validity Methodes: Part II. SIGMOD Record, 2002. [ URL ]
    Keywords: Clustering, Cluster Validity Measures.
    Abstract: Clustering results validation is an important topic in the context of pattern recognition. We review approaches and systems in this context. In the first part of this paper we presented clustering validity checking approaces based on internal and external criteria. In the second, current part, we present a review of clustering validity approaches based on relative criteria. Also we discuss the results of an experimental study based on widely known validity indices. Finally the paper illustrates the issues that are under-addressed by the recent approaches and proposes the research directions in the field.
  • Maria Halkidi, Yannis Batistakis, and Michalis Vazirgiannis. Cluster Validity Methodes: Part I. SIGMOD Record, 2002. [ URL ]
    Keywords: Clustering, Cluster Validity Measures.
    Abstract: Clustering is an unsupervised process since there are no predefined classes and no examples that would indicate grouping properties in the data set. The majority of the clustering algorithms behave differently depending on the features of the data set and the initial assumptions for defining groups. Therefore, in most applications the resulting clustering scheme requires some sort of evaluation as regards in validity. Evaluating and assessing the results of a clustering algorithm is the main subject of cluster validity.\newline In this paper we present a review of the clustering validity and methods. More specifically, Part I of the paper discusses the cluster validity approaches based on external and internal criteria.
  • Frank Höppner, Frank Klawonn, and Patrik Eklund. Learning Indistinguishability from Data. Soft Computing, 6(1):6--13, 2002. [ PDF ]
    Keywords: Clustering, Similarity Measures, Fuzzy Clustering, Fuzzy Models.
    Abstract: In this paper we revisit the idea of interpreting fuzzy sets as representations of vague values. In this context a fuzzy set is induced by a crisp value and the membership degree of an element is understood as the similarity degree between this element and the crisp value that determines the fuzzy set. Similarity is assumed to be a notion of distance. This means that fuzzy sets are induced by crisp values and an appropriate distance function. This distance function can be described in terms of scaling the ordinary distance between real numbers. With this interpretation in mind, the task of designing a fuzzy system corresponds to determining suitable crisp values and appropriate scaling functions for the distance. When we want to generate a fuzzy model from data, the parameters have to be fitted to the data. This leads to an optimisation problem that is very similar to the optimisation task to be solved in objective function based clustering. We borrow ideas from the alternating optimisation schemes applied in fuzzy clustering in order to develop a new technique to determine our set of parameters from data, supporting the interpretability of the fuzzy system.
  • Frank Höppner and Frank Klawonn. Finding Informative Rules in Interval Sequences. IDA, 6(3):237--256, 2002. [ Postscript ]
    Keywords: Sequential/Temporal Data, Sequential/Temporal Patterns, Temporal Reasoning.
    Abstract: Observing a discrete-valued variable over a period of time yields a sequence of observation intervals: whenever the variable changes its value, the observation of its current value ends and a new interval starts. Measurements of real-valued variables over time (time series) are often partitioned in homogenous subsegments, such that every segment can be described by simple attributes, e.g.\ convexly increasing, constant, high-value, etc. Thus for both discrete and continuous data we obtain a sequence of labeled intervals. In this paper, a temporal pattern is defined as a set of labeled intervals together with their interval relationships in terms of Allen's interval logic. We consider the discovery and evaluation of informative rules in such sequences. In particular we focus on the problem of specializing temporal rules by imposing quantitative constraints on numerical attributes of the labeled intervals.
  • Frank Höppner. Speeding up Fuzzy c-Means: Using a Hierarchical Data Organisation to Control the Precision of Membership Calculation. FSS, 128(3):365--378, 2002.
    Keywords: Clustering, Fuzzy c-Means, Speed-up Issues.
    Abstract: We examine the run-time behaviour of conventional fuzzy c-means implementations. Investigating into FCM termination conditions and membership update equations, we derive an approximative FCM that yields the same results as a conventional implementation within a given precision. We incorporate additional information about the data set by re-organizing the set as a tree. Our modification leads to an FCM algorithm with a significantly different run time behaviour; the gain of using the modified implementation increases with an increasing number of data objects and especially an increasing number of clusters, but is also sensitive to the chosen fuzzifier.
  • Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. An Efficient k-Means Clustering Algorithm: Analysis and Implementation. TPAMI, 24(7):881-892, 2002.
    Keywords: Clustering, Image Data, Speed-up Issues.
    Abstract: In k-means clustering, we are given a set of n data points in d-dimensional space $R^d$ and an integer k and the problem is to determine a set of k points in $R^d$, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper, we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithms runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation.
  • John F. Kolen and Tim Hutcheson. Reducing the Time Complexity of the Fuzzy c-Means Algorithm. TFS, 10(2):263--267, 2002.
    Keywords: Clustering, Fuzzy c-Means, Speed-up Issues.
    Abstract: In this paper, we present an efficient implementation of the fuzzy c-means clustering algorithm. The original algorithm alternates between estimating centers of the clusters and the fuzzy membership of the data points. The size of the membership matrix is on the order of the original data set, a prohibitive size if this technique is to be applied to very large data sets with many clusters. Our implementation eliminates the storage of this data structure by combining the two updates into a single update of the cluster centers. This change significantly affects [speed-up] the asymptotic runtime as the new algorithm is linear with respect to the number of clusters, while the original is quadratic. Elimination of the membership matrix also reduces the overhead associated with repeatedly accessing a large data structure. Empirical evidence is presented to quantify the savings achieved by this new method.
  • Xiaohui Liu, Gongxian Cheng, and John X. Wu. Analyzing Outliers Cautiously. TKDE, 14:432--437, 2002.
    Keywords: Noise Handling.
    Abstract: Outliers are difficult to handle because some of them can be measurement errors, while others may represent {\sl phenomena of interest}, something ``significant'' from the viewpoint of the application domain. Statistical analysis of outliers requires much relevant domain knowledge. In our previous work, we suggested a knowledge-based method for distinguishing between the measurement errors and phenomena of interest by modeling ``real measurements'' -- how measurements should be distributed in an application domain. In this paper, we make this distinction by modeling measurement errors instead. This is a cautious approach to outlier analysis, which has been successfully applied to a medical problem and may find interesting applications in other domains such as science, engineering, finance, and economics.
  • Pabitra Mitra, C.A. Murthy, and Sankar K. Pal. Unsupervised Feature Selection Using Feature Similarity. TPAMI, 24(3):301--312, 2002.
    Keywords: Similarity Measures, Multiscale Analysis.
    Abstract: In this article, we describe an unsupervised feature selection algorithm suitable for data sets, large in both dimension and size. The method is based on measuring similarity between features whereby redundancy therein is removed. This does not need any search and, therefore, is fast. A new feature similarity measure, called maximum information compression index, in introduced. The algorithm is generic in nature and has the capability of multiscale representation of data sets. The superiority of the algorithm, in terms of speed and performance, is established extensively over various real-life data sets of different sizes and dimensions. It is also demonstrated how redundancy and information loss in feature selection can be quantified with an entropy measure.
  • Pabitra Mitra, C. A. Murthy, and Sankar K. Pal. Density-Based Multiscale Data Condensation. TPAMI, 24(6):734--747, 2002.
    Keywords: Classification, Clustering, Rule Generation, Multiscale Analysis.
    Abstract: A problem gaining interest in pattern recognition applied to data mining is that of selecting a small representative subset from a very large data set. In this article, a nonparametric data reduction scheme is suggested. It attempts to represent the density underlying the data. The algorithm selects representative points in a multiscale fashion which is novel from existing density-based approaches. The accuracy of representation by the condensed set is measured in terms of the error in density estimates of the original and reduced sets. Experimental studies on several real life data sets show that the multiscale approach is superior to several related condensation methods both in terms of condensation ratio and estimation error. The condensed set obtained was also experimentally shown to be effective for some important data mining tasks like classification, clustering, and rule generation on large data sets. Moreover, it is empirically found that the algorithm is efficient in terms of sample complexity.
  • Bamshad Mobasher, Honghua Dai, Tao Luo, and Miki Nakagawa. Discovery and Evaluation of Aggregate Usage Profiles for Web Personalization. Data Mining and Knowledge Discovery, 6:61--82, 2002.
    Keywords: Clustering, Sequential/Temporal Data.
    Abstract: Web usage mining, possibly used in conjunction with standard approaches to personalization such as collaborative filtering, can help address some of the shortcomings of these techniques, including reliance on subjective user ratings, lack of scalability, and poor performance in the face of high-dimensional and sparse data. However, the discovery of patterns from usage data by itself is not sufficient for performing the personalization tasks. The critical step is the effective derivation of good quality and useful (i.e., actionable) "aggregate usage profiles" from these patterns. In this paper we present and experimentally evaluate two techniques, based on clustering of user transactions and clustering of pageviews, in order to discover overlapping aggregate profiles that can be effectively used by recommender systems for real-time Web personalization. We evaluate these techniques both in terms of the quality of the individual profiles generated, as well as in the context of providing recommendations as an integrated part of a personalization engine. In particular, our results indicate that using the generated aggregate profiles, we can achieve effective personalization at early stages of users' visits to a site, based only on anonymous clickstream data and without the benefit of explicit input by these users or deeper knowledge about them.
  • Michel Ménard. Fuzzy clustering and switching regression models using ambiguity and distance rejects. FSS, 128(3), 2002.
    Keywords: Clustering, Fuzzy Clustering, Regression.
  • Michel Ménard and Michel Eboueya. Extreme physical information and objective function in fuzzy clustering. FSS, 128:285--303, 2002.
    Keywords: Clustering, Fuzzy Clustering, Fuzzy c-Means.
    Abstract: Fuzzy clustering algorithms have been widely studied and applied in a variety of areas. They become the major techniques in cluster analysis. In this paper, we focus on objective function models whose aim is to assign the data to clusters so that a given objective function is optimized. We propose a new approach in fuzzy clustering and show how it can be used to obtain a systematic method deriving objective functions. This approach is based on a unifying principle of physics, that of extreme physical information (EPI) defined by Frieden (Physics from Fisher Information. A Unification, 1999). The information in question is the trace of the Fisher information matrix for the estimation procedure; this information is shown to be a physical measure of disorder. We use the EPI approach for finding the effective and minimal constraint terms in objective function. With the proposed approach we justify the constraint terms defined a priori in the Fuzzy c-Means (FCM) algorithm and possibilistic and maximum entropy inference approaches. Indeed, these algorithms, by contrast, offer no such systematic method of finding its constraints. Moreover, in this context, the EPI approach derives the "reason" for the extremization of objective functions. The resulting formulae have a clearer physical meaning than those obtained by means of classical algorithms. The updated equations of our algorithm are identical to those in the possibilistic, MEI and FCM with regularization approaches.
  • Witold Pedrycz and Andrzej Bargiela. Granular Clustering: A Granular Signature of Data. TSMCB, 32(2):212--224, 2002.
    Keywords: Clustering.
    Abstract: The study is devoted to a granular analysis of data. We develop a new clustering algorithm that organizes findings about data in the form of a collection of information granules -- hyperboxes. The clustering carried out here is an example of a granulation mechanism. We discuss a compatibility measure guiding a construction (growth) of the clusters and explain a rationale behind their development. The clustering promotes a data mining way of problem solving by emphasizing the transparency of the results (hyperboxes). We discuss a number of indexes describing hyperboxes and expressing relationships between such information granules. It is also shown how the resulting family of the information granules is a concise descriptor of the structure of the data -- a granular signature of the data. We examine the properties of features (variables) occurring of the problem as they manifest in the setting of the information granules. Numerical experiments are carried out based on two-dimensional synthetic data as well as multivariable Boston data available on the WWW.
  • Shounak Roychowdhury and Witold Pedrycz. Modeling temporal functions with granular regression and fuzzy rules. FSS, 126:377--387, 2002.
    Keywords: Sequential/Temporal Data, Regression.
    Abstract: Modeling of temporal functions has been studied extensively in last few decades. Temporal functions or time series are found practically in the gamut of applications, ranging from engineering analysis to financial and commercial transactions. \newline Numerous time series models have been proposed, and they can be found in the relevant time series literature. Deterministic and stochastic models have dominated much of the research in this area. Nonetheless, these models have their limitations, and particularily specific models solve specific problem domains. Therefore, there is no general mechanism that can address the issues of modeling of temporal functions. \newline Granular information is now being used to address a few of the problems in the theory of time series. This paper proposes two models that use the concepts of regression and interleaved information granules -- intervals and fuzzy sets to identify the structure existing in a given time series. In brevity, we are attempting to combine regression and fuzzy rules to model temporal systems.
  • Sani Susanto, Ignatius Suharto, and Paulus Sukapto. Using the fuzzy clustering algorithm for the allocation of students. World Transactions on Engineering and Technology Education, 1(2):245--248, 2002. [ PDF ]
    Keywords: Clustering, Fuzzy Clustering.

Conference's articles

  • Ziv Bar-Joseph, Georg Gerber, David K. Gifford, and Tommi S. Jaakkola. A New Approach to Analyzing Gene Expression Time Series Data. In 6th Annual Int. Conf. on Research in Comp. Molecular Biology, 2002.
    Keywords: Clustering, Time Warping, Sequential/Temporal Data, Splines.
    Abstract: We present algorithms for time-series gene expression analysis that permit the principles estimation of unobserved time-points, clustering, and data-alignment. Each expression profile is modeled as a cubic spline (piecewise polynomial) that is estimated from the observed data and every time point influences the overall smooth expression curve. We constrain the spline coefficients of genes in the same class to have similar expression patterns, while also allowing for gene specific parameters. We show that unobserved time-points can be reconstructed using our method with 10-15\ 4.902896e+00ss error when compared to previous best methods. Our clustering algorithm operates directly on the continuous representation of gene expression profiles, and we demonstrate that this is particularily effective when applied to non-uniformly sampled data. Our continuous alignment algorithm also avoids difficulties encountered by discrete approaches. In particular, our method allows for control of the number of degrees of freedom of the warp through the specification of parameterized functions, which helps to avoid overfitting. We demonstrate that our algorithm produces stable low-error alignments on real expression data and further show a specific application to yeast knockout data that produces biologically meaningful results.
  • Riccardo Bellazzi, Cristiana Larizza, Paolo Magni, and Roberto Bellazzi. Quality assessment of dialysis services through Intelligent Data Analysis and temporal data mining. In KDSTD02, Lyon, France, pages 3--9, 2002.
    Keywords: Association Rules, Sequential/Temporal Data.
    Abstract: This paper describes a research project that deals with the definition of methods and tools for the assessment of the clinical performance of a hemodialysis service. While simple statistical summaries are computed to asses basic outcomes, Intelligent Data Analysis techniques are applied to gain insight and to discover knowledge on the causes of negative clinical results. In particular, different techniques, comprising Temporal Abstractions, association rules and confirmation rules induction, are applied on the time series that are automatically collected during the monitoring of hemodialysis sessions. The paper describes the application domain, the basic goals of the project and the methodological approach applied for time series data analysis. Some clinically interesting results, obtained on data coming from 5 patients monitored for nine months, are also shown.
  • Michael R. Berthold, David E. Patterson, Marco Ortolani, Heiko Hofer, Frank Höppner, and Ondine Callan. Shape-Invariant Fuzzy Clustering of Proteomics Data. In NAFIPS02, 2002.
    Keywords: Clustering, Fuzzy Clustering.
  • Richard J. Bolton, David J. Hand, and Niall M. Adams. Determining Hit Rate in Pattern Search. In \citeHand:PDD:2002, pages 36--48, 2002.
  • Christian Borgelt and Michael R. Berthold. Mining Molecular Fragments: Finding Relevant Substructures of Molecules. In ICDM02, pages 51--58, 2002.
    Abstract: We present an algorithm to find fragments in a set of molecules that help to discriminate between different classes of, for instance, activity in a drug discovery context. Instead of carrying out a brute-force search, our method generates fragments by embedding them in all appropriate molecules in parallel and prunes the search tree based on a local order of the atoms and bonds, which results in substantially faster search by eliminating the need for frequent, computationally expensive reembeddings and by suppressing redundant search. We prove the usefulness of our algorithm by demonstrating the discovery of activity-related groups of chemical compounds in the well-known National Cancer Institute s HIV-screening dataset.
  • G. Carrault, M.-O. Cordier, Réne Quiniou, and F. Wang. Intelligent multichannel cardiac data analysis for diagnosis and monitoring. In KDSTD02, Lyon, France, pages 10--16, 2002.
    Keywords: Inductive Logic Programming, Sequential/Temporal Patterns.
    Abstract: We present a novel approach to cardiac data analysis for monitoring patients in intensive care units. Recorded data are transformed into symbolic event streams. Then a chronicle recognizer attempts to detect, on the fly, interesting temporal patterns called chronicles in this stream. Chronicles including temporal relationships are learned from example data by relational learning methods such as Inductive Logic Programming. The resulting chronicles are high level representations of temporal phenomena. On the one hand, they are expected to be more accurate and robust for diagnosis, even in the presence of a noisy input and, on the other hand, are easier to understand by the clinical staff. The approach is illustrated on learning and recognizing cardiac arrhythmias from electrocardiograms.
  • Alan Fern, Robert Givan, and Jeffrey Mark Siskin. Specific-to-General Learning for Temporal Events. In National Conference on Artificial Intelligence (AAAI-02), 2002.
    Keywords: Classification.
    Abstract: We study the problem of supervised learning of event classes in a simple temporal event-description language. We give lower and upper bounds and algorithms for the subsumption and generalization problems for two expressively powerful subsets of this logic, and present a positive-examples-only specific-to-general learning method based on the resulting algorithms. We also present a polynomial-time computable ``syntactic'' subsumption test that implies semantic subsumption without being equivalent to it. A generalization algorithm based on syntactic subsumption can be used in place of semantic generalization to improve the asymptotic complexity of the resulting learning algorithm. A companion paper shows that our methods can be applied to duplicate the performance of human-coded concepts in the substantial application domain of video event recognition.
  • Alan Fern, Jeffrey Mark Siskind, and Robert Givan. Learning Temporal, Relational, Force-Dynamic Event Definitions from Video. In National Conference on Artificial Intelligence (AAAI-02), 2002.
    Abstract: We present and evaluate a novel implemented approach for
    learning to recognize events in video. First, we introduce a sublanguage
    of event logic, called k-AMA, that is sufficiently expressive to
    represent visual events yet sufficiently restrictive to support
    learning. Second, we develop a specific-to-general learning algorithm
    for learning event definitions in k-AMA. Finally, we apply this
    algorithm to the task of learning event definitions from video and show
    that it yields definitions that are competitive with hand-coded ones.
  • David Hand. Pattern Detection and Discovery. In \citeHand:PDD:2002, pages 1--12, 2002.
  • Frank Höppner. Handling Feature Ambiguity in Knowledge Discovery from Time Series. In Proc. of 5th Int. Conf. on Discovery Science, number 2534 of LNCS, Lübeck, Germany, pages 398--405, 2002.
    Keywords: Piecewise Linear Representations, Multiscale Analysis, Sequential/Temporal Data.
    Abstract: In knowledge discovery from time series abstractions (like piecewise linear representations) are often preferred over raw data. In most cases it is implicitly assumed that there is a single valid abstraction and that the abstraction method, which is often heuristic in nature, finds this abstraction. We argue that this assumption does not hold in general and that there is need for knowledge discovery methods that pay attention to the ambiguity of features: In a different context, an increasing segment may be considered as (being part of) a decreasing segment. It is not a priori clear which view is correct or meaningful. We show that the relevance of ambiguous features depends on the relevance of the knowledge that can be discovered by using the features. We combine techniques from multiscale signal analysis and interval sequence mining to discover rules about dependencies in multivariate time series.
  • Frank Höppner. Learning Dependencies in Multivariate Time Series. In KDSTD02, Lyon, France, pages 25--31, 2002.
    Keywords: Sequential/Temporal Data, Sequential/Temporal Patterns, Temporal Reasoning.
    Abstract: This paper sketches an approach to learn interdependencies between multiple time series. At the beginning the time series are segmented and thereby transformed into sequences of labeled intervals. The labels denote qualitative aspects of the signal in the respective intervals. Then, from the sequence of labeled intervals, we discover rules where premise and conclusion consist of temporal patterns. The temporal patterns are sets of intervals where Allen's interval logic is used to capture their temporal relationships. Rules are specialized with respect to numerical attributes like the length of the intervals or the slope of the signal within the interval. Finally, we obtain rules like ``when signal A decreases while signal B increases with slope greather than 2 then signal C will decrease''. Since humans use a similar syntax when discussing such aspects, the proposed methodology may support a human in learning dependencies in multivariate time series.
  • Frank Höppner. Time Series Abstraction Methods -- A Survey. In Proceedings GI Jahrestagung Informatik, Workshop on Knowl. Discovery in Databases, LNI, Dortmund, Germany, pages 777-786, 2002.
    Keywords: Noise Handling, Multiscale Analysis, Sequential/Temporal Data, Surveys.
    Abstract: In this paper several time series abstraction methods are reviewed with respect to their applicability in the framework of knowledge discovery (KDD) and data mining (DM). We discuss the demands of KDD/DM (noise, robustness) and show that many widely used methods do not satisfy them, because they are single-scale methods. With multiscale approaches, however, the drawbacks of the single-scale methods can be addressed explicitly.
  • Frank Höppner. Lernen lokaler Zusammenhänge in multivariaten Zeitreihen. In J. Biethahn, J. Kuhl, and A. Lackner, editors, Tagungsband zum 5. Göttinger Symposium Soft Computing, Göttingen, pages 113--125, 2002. [ PDF ]
  • Frank Höppner. Discovery of Core Episodes from Sequences -- Using Generalization for Defragmentation of Rule Sets. In Pattern Detection and Discovery, number 2447 of LNAI, pages 199--213, 2002.
    Keywords: Association Rules, Sequential/Temporal Data, Temporal Reasoning.
    Abstract: We consider the problem of knowledge induction from sequential or temporal data. Patterns and rules in such data can be detected using methods adopted from association rule mining. The resulting set of rules is usually too large to be inspected manually. We show that (amongst other reasons) the inadequacy of the pattern space is often responsible for many of these patterns: If the true relationship in the data is fragmented by the pattern space, it cannot show up as a peak of high pattern density, but the data is divided among many different patterns, often difficult to distinguish from incidental patterns. To overcome this fragmentation, we identify core patterns that are shared among specialized patterns. The core patterns are then generalized by selecting a subset of specialized patterns and combining them disjunctively. The generalized patterns can be used to reduce the size of the set of patterns. We show some experiments for the case of labeled interval sequences, where patterns consist of a set of labeled intervals and their temporal relationships expressed via Allen's interval logic.
  • Eamonn J. Keogh, Stefano Lonardi, and Bill Chiu. Finding Surprising Patterns in a Time Series Database in Linear Time and Space. In KDD02, Edmonton, Canada, 2002.
    Keywords: Sequential/Temporal Data.
    Abstract: The problem of finding a specified pattern in a time series database (i.e. query by content) has received much attention and is now a relatively mature field. In contrast, the important problem of enumerating all surprising or interesting patterns has received far less attention. This problem requires a meaningful definition of "surprise", and an efficient search technique. All previous attempts at finding surprising patterns in time series use a very limited notion of surprise, and/or do not scale to massive datasets. To overcome these limitations we introduce a novel technique that defines a pattern surprising if the frequency of its occurrence differs substantially from that expected by chance, given some previously seen data. This notion has the advantage of not requiring an explicit definition of surprise, which may be impossile to elicit from a domain expert. Instead the user simply gives the algorithm a collection of previously observed normal data. Our algorithm uses a suffix tree to efficiently encode the frequency of all observed patterns and allows a Markov model to predict the expected frequency of previously unobserved patterns. Once the suffix tree has been constructed, a measure of surprise for all the patterns in a new database can be determined in time and space linear in the size of the database. We demonstrate the utility of our approach with an extensive experimental evaluation.
  • Terence Kwok, Kate Smith, Sebastian Lozano, and David Taniar. Parallel Fuzzy c-Means Clustering for Large Data Sets. In Burkhard Monien and Rainer Feldmann, editors, EUROPAR02, volume 2400 of LNCS, pages 365--374, 2002.
    Keywords: Clustering, Fuzzy c-Means.
    Abstract: The parallel fuzzy c-means (PFCM) algorithm for clustering large data sets is proposed in this paper. The proposed algorithm is designed to run on parallel computers of he Single Program Multiple Data (SPMD) model type with the Message Passing Interface (MPI). A comparison is made between PFCM and an existing parallel k-means (PKM) algorithm in terms of thei parallelisation capability and scalability. In an implementation of PFCM to cluster a large data set from an insurance company, the proposed algorithm is demonstrated to have almost ideal speedups as well as an excellent scaleup with respect to the size of the data set.
  • S. Miyamoto and D. Suizu. Fuzzy c-means clustering using transformations into high dimensional spaces. In Int. Conf. on Fuzzy Systems and Knowledge Discovery, volume 2, Singapore, pages 656--660, 2002.
    Keywords: Clustering, Fuzzy c-Means.
  • Marco Ortolani, Heiko Hofer, David Patterson, Frank Höppner, and Michael Berthold. Fuzzy Information Granules in Time Series Data. In FUZZIEEE02, Honolulu, Hawai, pages 695--699, 2002.
    Keywords: Clustering, Fuzzy Clustering, Fuzzy c-Means, Sequential/Temporal Data, Medical Applications.
    Abstract: It is often desirable to summarize a set of time series through typical shapes in order to analyze them. The algorithm presented here compares pieces of different time series in order to find similar shapes. The use of a fuzzy clustering technique based on fuzzy c-means allows us to consider such subsets as belonging to typical shapes with a degree of membership. Additionally, this matching is invariant with respect to a scaling of the time series. The algorithm is demonstrated on a widely known set of data taken from the ECG rhythm analysis experiments at MIT laboratories.
  • Linda Peelen, Niels Peek, and Ameen Abu-Hanna. Acquiring and using temporal knowledge in medicine: an application in chronic pulmonary disease. In KDSTD02, Lyon, France, pages 44-50, 2002.
    Keywords: Sequential/Temporal Data.
    Abstract: Routine recording of time-stamped patient data is becoming more and more common in clinical medicine. It is generally believed that this can improve the quality of medical practice by providing doctors with more knowledge of temporal relations in disease and treatment; most benefits are to be expected from better prognoses, and thereby better treatment choices. It is however far from obvious which methods should be applied to extract this knowledge from the data; probably a mixture of methods, tailored to the domain at hand, should be used. This paper discusses the issues that are involved when learning, and putting to use, knowledge from medical time series data. An example decision support application, for the prognosis and treatment of chronic pulmonary disease, is presented in detail.
  • Juan J. Rodriguez and Carlos J. Alonso. Boosting Interval-Based Literals: Variable Length and Early Classification. In KDSTD02, Lyon, France, pages 51--62, 2002.
    Keywords: Classification, Sequential/Temporal Data.
    Abstract: In previous works, a system for supervised time series classification has been presented. It is based on boosting very simple classifiers: only one literal. The used predicates are based on temporal intervals. There are two types of predicates: i) relative predicates, such as ``increases'' and ``stays'', and ii) region predicates, such as ``always'' and ``sometime'', which operate over regions in the domain of the variable. \newline This work presents two new features of the system. First, the system can now deal directly with variable length time series. Second, the obtained classifiers can now be used with partial time series. This ``early classification'' is essential in some supervision environments, where it is necessary to give an alarm signal as soon as possible. Experiments on different data sets show that the proposed method is highly competitive with previous approaches.
  • Mohammad Shahmohammadi and Hamid R. Nikoofar. Automatic Digital Modulation Recognition under Non-Coherent Condition. In Proc. of ISIT, Lausanne, Switzerland, pages 19-22, 2002.
    Keywords: Classification, Clustering.
    Abstract: In this paper, we propose a soft clustering algorithm for digital modulation recognition by exploiting the constellation shape as a key signature of the modulation type. Using the constellation shape, we change the procedure of automatic modulation recognition (AMR) to a general pattern classification problem. Moreover, the algorithm enables us to identify the modulation type under non-coherent condition. Our approach can be viewed as a method for blind estimation of reference phase with modulation identification. Experimental results are shown for several cases. The performance of the algorithm under non-coherent condition is relatively close to that of the ideal condition, which means that the algorithm achieves good estimation for reference phase.

Disclaimer

This list of publications is neither official nor complete, but a personal compilation.

Copyright and all rights therein are retained by authors or by other copyright holders. All person copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.

This document was translated from BibTEX by bibtex2html

Home © F. Höppner last update: Tue Dec 7 08:49:56 CET 2004