Publications of year 2000

Books and proceedings

  • Judea Pearl. Causality. 2000.
    Abstract: Chapter headings: Introduction to Probabilities, Graphs, and Causal Models; A Theory of Inferred Causation; Causal Diagrams and the Identification of Causal Effects; Actions, Plans, and Direct Effects; Causality and Structural Models in Social Science and Economics; Simpson's Paradox, Confounding, and Collapsibility; The Logic of Structure-Based Counterfactuals; Imperfect Experiments: Bounding Effects and Counterfactuals; Probability of Causation: Interpretation and Identification; The Actual Cause; Epilogue: The Art and Science of Cause and Effect

Articles in journals or book chapters

  • Paulo Félix, Senén Barro, Santiago Fraga, and Francisco Palacios. A fuzzy model for pattern recognition in the evolution of patients. In Fuzzy Logic in Medicine, Studies in Fuzziness and Soft Computing. 2000.
    Keywords: Fuzzy Models, Sequential/Temporal Data.
    Abstract: In this paper the Multivariable Fuzzy Temporal Profile (MFTP) model is presented for the representation of patterns in the evolution of a set of physical parameters, together with its application in the domain of medicine. On the basis of the linguistic acquisition of the information that defines an MFTP, and after a stage that analyzes the consistency of this information, the application of the MFTP model is proposed for the recognition of patterns in signals. The relevance of this recognition is emphasized in the clinical environment and, in particular, in the monitoring of patients in Intensive Care Units.
  • D. Calvelo, M.-C. Chambrin, D. Pomorski, and P. Ravaux. Towards symbolization using data-driven extraction of local trends for ICU monitoring. AIM, 19:203--223, 2000.
    Abstract: We propose a methodology for the extraction of local trends from a stream of data. It has been designed to suit the needs of interpretation-oriented visualization and symbolization from ICU monitoring data. After giving implementation details for efficient computation of local trends, we propose the use of a characteristic analysis span for each variable. This characteristic span is obtained from a set of criteria that we compare and evaluate with regard to the analysis of ICU monitoring data gathered within the Aiddaig project. The processing results in a rich visual representation and a framework for the local symbolization of the data stream based on its dynamics.
  • Gabriela Guimarães. Temporal Knowledge Discovery with Self-Organizing Neural Networks. IJCSS, 1(1):5--16, 2000.
    Keywords: Neural Networks, Sequential/Temporal Data, Sequential/Temporal Patterns.
    Abstract: This paper presents the use of special self-organizing Neural Networks, named Self-Organizing Maps (SOMs), in the context of Temporal Data Mining. SOMs are unsupervised Neural Networks suitable for exploratory tasks. However, in order to discover temporal patterns in multivariate time series, as proposed here, an extension of the original algorithm is required. Therefore, several approaches are combined for temporal processing. These are: the introduction of several hierarchical levels in order to handle the complexity given by the large number of signal channels; a visualization of the network structures for exploratory tasks; and trajectories for a visualization of the temporal course. This approach is part of the recently developed method for temporal knowledge conversion that introduces several abstraction levels and enables a transition from multivariate time series into a linguistic temporal knowledge representation. This method was successfully applied to a problem in medicine, sleep apnoea. All disorders were identified with the method, and additional, potentially 'new' information was provided about one temporal pattern, the one specifying a mixed obstructive apnoea.
  • Reginald E. Hammah and John H. Curran. Validity Measures for the Fuzzy Cluster Analysis of Orientations. TPAMI, 22(12):1467--1472, 2000.
    Keywords: Clustering, Cluster Validity Measures, Fuzzy Clustering, Fuzzy c-Means.
    Abstract: Fuzzy k-means clustering can be applied to the automatic identification of sets in discontinuity data after suitable adaptation of the algorithm. To establish the number of clusters in a data set, modified versions of the validity measures of Gath and Geva, Xie-Beni and Fukuyama-Sugeno are presented in this paper.
  • David J. Hand, K. Blunt, M. G. Kelly, and Niall Adams. Data mining for fun and profit. Statistical Science, 15(2):111--131, 2000.
    Keywords: Sequential/Temporal Data.
    Abstract: Data mining is defined as the process of seeking interesting or valuable information within large data sets. This presents novel challenges and problems, distinct from those typically arising in the allied areas of statistics, machine learning, pattern recognition or database science. A distinction is drawn between the two data mining activities of model building and pattern detection. Even though statisticians are familiar with the former, the large data sets involved in data mining mean that novel problems do arise. The second of the activities, pattern detection, presents entirely new classes of challenges, some arising, again, as a consequence of the large sizes of the data sets. Data quality is a particularly troublesome issue in data mining applications, and this is examined. The discussion is illustrated with a variety of real examples.
  • Jochen Hipp, Ulrich Güntzer, and Gholamreza Nakhaeizadeh. Algorithms for Association Rule Mining -- A General Survey and Comparison. SIGKDDEx, 2(1):58--64, 2000.
    Keywords: Association Rules, Surveys.
    Abstract: Today there are several efficient algorithms that cope with the popular and computationally expensive task of association rule mining. Actually, these algorithms are more or less described on their own. In this paper, we explain the fundamentals of association rule mining and moreover derive a general framework. Based on this we describe today's approaches in context by pointing out common aspects and differences. After that we thoroughly investigate their strengths and weaknesses and carry out several runtime experiments. It turns out that the runtime behaviour of the algorithms is much more similar than might be expected.
  • Annette Keller and Frank Klawonn. Fuzzy Clustering with Weighting of Data Variables. Int. Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 8(6):735--746, 2000.
    Keywords: Clustering, Fuzzy Clustering, Fuzzy c-Means.
    Abstract: We introduce an objective function based fuzzy clustering technique that assigns one influence parameter to each single data variable and each cluster. Our method is not only suited to detecting structures or groups in data that are unevenly distributed over the structure's single domains, but also gives information about the influence of special variables on the detected groups. In addition, our approach can be seen as a generalization of the well-known fuzzy c-means clustering algorithm.
  • Eamonn J. Keogh, Kaushik Chakrabarti, Michael J. Pazzani, and Sharad Mehrotra. Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases. KAIS, 3(3):263--286, 2000.
    Keywords: Similarity Measures, Wavelets, Sequential/Temporal Data.
    Abstract: The problem of similarity search in large time series databases has attracted much attention recently. It is a non-trivial problem because of the inherent high dimensionality of the data. The most promising solutions involve first performing dimensionality reduction on the data, and then indexing the reduced data with a spatial access method. Three major dimensionality reduction techniques have been proposed: Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), and more recently the Discrete Wavelet Transform (DWT). In this work, we introduce a new dimensionality reduction technique which we call Piecewise Aggregate Approximation (PAA). We theoretically and empirically compare it to the other techniques and demonstrate its superiority. In addition to being competitive with or faster than the other methods, our approach has numerous other advantages. It is simple to understand and to implement, it allows more flexible distance measures, including weighted Euclidean queries, and the index can be built in linear time.
  • Vladik Kreinovich and Yeung Yam. Why Clustering in Function Approximation? Theoretical Explanation. Int. Journal of Intelligent Systems, 15(10):959--966, 2000.
    Keywords: Clustering, Splines.
    Abstract: Function approximation is a very important practical problem: in many practical applications, we know the exact form of the functional dependence $y=f(x_1,...,x_n)$ between physical quantities, but this exact dependence is complicated, so we need a lot of computer space to store it, and a lot of time to process it, i.e., to predict $y$ from the given $x_i$. It is therefore necessary to find a simpler approximate expression $g(x_1,...,x_n) \approx f(x_1,...,x_n)$ for this same dependence. This problem has been analyzed in numerical mathematics for several centuries, and it is, therefore, one of the most thoroughly analyzed problems of applied mathematics. There are many results related to approximation by polynomials, splines of different type, etc. Since this problem has been analyzed for so long, no wonder that for many reasonable formulations of the optimality criteria, the corresponding problems of finding the optimal approximations have already been solved. Lately, however, new clustering-related techniques have been applied to solve this problem (by Yager, Filev, Chu, and others). At first glance, since optimal approximations are already known for most traditional optimality criteria, the clustering approach can only lead to non-optimal approximations, i.e., approximations of inferior quality. We show, however, that there exist new reasonable criteria with respect to which clustering-based function approximation is indeed the optimal method of function approximation.
  • Nikhil R. Pal and Debrup Chakraborty. Mountain and Subtractive Clustering Method: Improvements and Generalizations. IJIS, 15:329--341, 2000.
    Keywords: Clustering, Mountain Method.
    Abstract: The mountain method of clustering and its relative, the subtractive clustering method, are studied here. A scheme to improve the accuracy of the prototypes obtained by the mountain method is proposed. Finally, the mountain circular shell method to detect circular shells by using the mountain function is proposed. The proposed method is tested extensively on several synthetic data sets, and the results obtained are quite satisfactory.
  • Kate A. Smith and Jatinder N.D. Gupta. Neural networks in business: techniques and applications for the operations researcher. Computers & Operations Research, 27:1023--1044, 2000.
    Keywords: Neural Networks.
    Abstract: This paper presents an overview of the different types of neural network models which are applicable when solving business problems. The history of neural networks in business is outlined, leading to a discussion of the current applications in business including data mining, as well as the current research directions. The role of neural networks as a modern operations research tool is discussed.
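
The Piecewise Aggregate Approximation (PAA) named in the Keogh et al. entry above is commonly described as replacing a series by the means of equal-width segments. The following is a minimal sketch under that assumption, not the authors' implementation; the function name `paa`, its parameters, and the handling of lengths that do not divide evenly are illustrative choices.

```python
def paa(series, n_segments):
    """Reduce `series` to `n_segments` values by averaging consecutive segments.

    Assumption: segment boundaries are computed with integer arithmetic so that
    lengths not divisible by `n_segments` still yield non-empty segments.
    """
    n = len(series)
    if n_segments < 1 or n_segments > n:
        raise ValueError("n_segments must be between 1 and len(series)")
    reduced = []
    for i in range(n_segments):
        start = i * n // n_segments        # first index of segment i
        end = (i + 1) * n // n_segments    # one past the last index of segment i
        segment = series[start:end]
        reduced.append(sum(segment) / len(segment))
    return reduced


if __name__ == "__main__":
    print(paa([1, 2, 3, 4, 5, 6, 7, 8], 4))
```

Each value is read exactly once, which matches the abstract's remark that the representation is simple to implement and can be built in linear time.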

Conference articles

  • Thomas G. Dietterich. The Divide-and-Conquer Manifesto. In Proc. 11th Int. Conf. on Algorithmic Learning Theory, pages 13--26, 2000. [ URL ]
    Keywords: Sequential/Temporal Data.
    Abstract: Existing machine learning theory and algorithms have focused on learning an unknown function from training examples, where the unknown function maps from a feature vector to one of a small number of classes. Emerging applications in science and industry require learning much more complex functions that map from complex input spaces (e.g., 2-dimensional maps, time series, and strings) to complex output spaces (e.g., other 2-dimensional maps, time series, and strings). Despite the lack of theory covering such cases, many practical systems have been built that work well in particular applications. These systems all employ some form of divide-and-conquer, where the inputs and outputs are divided into smaller pieces (e.g., windows), classified, and then the results are merged to produce an overall solution. This paper defines the problem of divide-and-conquer learning and identifies the key research questions that need to be studied in order to develop practical, general-purpose learning algorithms for divide-and-conquer problems and an associated theory.
  • Graziano Frosini, Beatrice Lazzerini, and Francesco Marcelloni. A modified Fuzzy c-Means Algorithm for Feature Selection. In Thomas Whalen, editor, NAFIPS00, Atlanta, Georgia, USA, pages 148--152, 2000.
    Keywords: Classification, Clustering, Fuzzy c-Means, Nearest Neighbour Methods.
    Abstract: In this paper we propose a novel method for feature selection based on a modified fuzzy c-means algorithm with supervision (MFCMS). MFCMS adopts an appropriately modified version of the objective function used by the classic fuzzy c-means. We applied MFCMS to some real-world pattern classification benchmarks. To test the effectiveness of MFCMS as a feature selector, we used the well-known k-nearest neighbour as the learning algorithm. In our experiments we found that the classification performance using the set of features selected by MFCMS is better than that using all the original features. Furthermore, our approach proved to be less time-consuming than other selection methods.
  • B. Goethals and J. Van den Bussche. On Supporting interactive association rule mining. In DAWAK00, volume 1874 of LNCS, pages 307--316, 2000.
    Keywords: Association Rules.
    Abstract: We investigate ways to support interactive mining sessions, in the setting of association rule mining. In such sessions, users specify conditions (queries) on the associations to be generated. Our approach is a combination of the integration of querying conditions inside the mining phase, and the incremental querying of already generated associations. We present several concrete algorithms and compare their performance.
  • Jiawei Han, Jian Pei, and Yiwen Yin. Mining frequent patterns without candidate generation. In ICMD00, pages 1--12, 2000. [ URL ]
    Keywords: Sequential/Temporal Data.
    Abstract: Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent pattern mining methods.
  • Bjarne K. Hansen. Analog forecasting of ceiling and visibility using fuzzy sets. In Preprints of the 2nd Conference on Artificial Intelligence, American Meteorological Society, pages 1--7, 2000. [ URL ]
    Keywords: Similarity Measures, Nearest Neighbour Methods.
    Abstract: A fuzzy logic based methodology for knowledge acquisition is used to build a retrieval-based analog forecasting system, a fuzzy k-nearest neighbour based prediction system. The methodology is used to acquire knowledge about what salient features of continuous-vector, unique temporal cases indicate significant similarity between cases. Such knowledge is encoded in a similarity-measuring function and thereby used to retrieve k nearest neighbours (k-nn) from a large database of airport weather observations. Predictions for the present weather case are made from a weighted median of the outcomes of analogous past cases, the k-nn, the analog ensemble. Past cases are weighted according to their degree of similarity to the present case.
  • Frank Höppner. Piecewise Linear Function Approximation by Alternating Optimization. In Proc. of the 8th Int. Conf. on Information Processing and Management of Uncertainty in Knowledge Based Systems (IPMU), Madrid, Spain, pages 1751--1757, 2000. [ PDF ] [ Postscript ]
    Keywords: Piecewise Linear Representations, Clustering, Fuzzy Clustering, Fuzzy c-Means.
    Abstract: Fuzzy clustering algorithms like the Fuzzy c-Means algorithm perform cluster analysis by minimizing an objective function through alternating optimization (AO). The problem of optimal piecewise linear function approximation can also be expressed by means of an objective function. In this paper, the AO technique is used to derive a new solution to this problem. The resulting unsupervised Hard c-Connected Lines (HcCL) algorithm alternately updates support points and output values and finds an optimum number of line segments and their location without any further input parameter.
  • Frank Höppner and Frank Klawonn. Fuzzy Clustering of Sampled Functions. In NAFIPS00, Atlanta, USA, pages 251--255, 2000. [ PDF ] [ Postscript ]
    Keywords: Clustering, Fuzzy Clustering.
    Abstract: Fuzzy clustering algorithms perform cluster analysis on a data set that consists of feature attribute vectors. In the context of multiple sampled functions, a set of samples (sampled function) becomes a single datum. We show how the already known algorithms can be used to perform fuzzy cluster analysis on this kind of data set by replacing the conventional prototypes with sets of prototypes. This approach allows reusing the known algorithms and also works with data sets other than sampled functions. Furthermore, to reduce the computational costs in the case of single-input/single-output functions, we present a new algorithm, which uses for the first time a more complex input data type (data points aggregated to data-lines) than the known approaches. The new alternating optimisation algorithm performs cluster analysis directly on this more compact representation of the sampled functions.
  • Frank Höppner and Frank Klawonn. Obtaining Interpretable Fuzzy Models from Fuzzy Clustering and Fuzzy Regression. In KES00, Brighton, UK, pages 162--165, 2000. [ PDF ] [ Postscript ]
    Keywords: Clustering, Fuzzy Clustering, Fuzzy Models, Sequential/Temporal Data, Regression.
    Abstract: In this paper we develop an objective function-based clustering algorithm to build fuzzy models of the Takagi-Sugeno (TS) type automatically from data. In contrast to most of the TS models that can be found in the literature, we decided to use very simple input-space partitions and a higher degree of consequence polynomials (quadratic). Only in this way can transparency and interpretability be guaranteed. We also show how to derive linguistic labels for the polynomials found by the algorithm.
  • Po-Shan Kam and Ada Wai-Chee Fu. Discovering Temporal Patterns for Interval-based Events. In DAWAK00, volume 1874 of LNCS, pages 317--326, 2000.
    Keywords: Sequential/Temporal Patterns.
    Abstract: In this paper, we consider interval-based events where the duration of events is expressed in terms of endpoint values, and these are used to form temporal constraints in the discovery process. We introduce the notion of temporal representation which is capable of expressing the relationships between interval-based events. We develop new methods for finding such interesting patterns.
  • Kamran Karimi and Howard J. Hamilton. Finding Temporal Relations: Causal Bayesian Networks vs. C4.5. In ISMIS00, Charlotte, NC, USA, pages 266--273, 2000.
    Keywords: Decision Trees, Bayesian Networks.
    Abstract: Observing the world and finding trends and relations among variables of interest is an important and common learning activity. In this paper we apply TETRAD, a program that uses Bayesian networks to discover causal rules, and C4.5, which creates decision trees, to the problem of discovering relations among a set of variables in the controlled environment of an Artificial Life simulator. All data in this environment are generated by a single entity over time. The rules in the domain are known, so we are able to assess the effectiveness of each method. The agent's sensings of its environment and its own actions are saved in data records over time. We first compare TETRAD and C4.5 in discovering the relations between variables in a single record. We next attempt to find temporal relations among the variables of consecutive records. Since both these programs disregard the passage of time among the records, we introduce the flattening operation as a way to span time and bring the variables of interest together in a new single record. We observe that flattening allows C4.5 to discover relations among variables over time, while it does not improve TETRAD's output.
  • Annette Keller. Fuzzy Clustering with Outliers. In NAFIPS00, 2000.
    Keywords: Noise Handling, Clustering, Fuzzy Clustering.
    Abstract: In this paper we introduce a modified objective function for fuzzy clustering. We add an additional weighting factor for each datum and derive necessary conditions for the introduced parameter in order to optimise the objective function. These conditions are used in an alternating optimisation scheme to calculate a partition of sample data. The obtained weights determine a kind of representativeness of each datum for the data distribution. They can be used to identify outliers and enable the expert to locate critical areas that are often represented by only a few outliers.
  • Edward D. Kim, Joyce M. W. Lam, and Jiawei Han. AIM: Approximate Intelligent Matching for Time Series Data. In Y. Kambayashi, M. Mohania, and A. M. Tjoa, editors, DAWAK00, volume 1874 of LNCS, London, UK, pages 347--357, 2000.
    Keywords: Similarity Measures, Sequential/Temporal Data.
    Abstract: Time-series data mining presents many challenges due to the intrinsic large scale and high dimensionality of the data sets. Subsequence similarity matching has been an active research area driven by the need to analyze large data sets in financial, biomedical and scientific databases. In this paper, we investigate an intelligent subsequence similarity matching of time series queries based on efficient graph traversal. We introduce a new problem, the approximate partial matching of a query sequence in a time series database. Our system can address such queries with high specificity and minimal time and space overhead. The performance bottlenecks of the current methods were analyzed, and we show that our method can improve the performance of time series queries significantly. It is general and flexible enough to find the best approximate match query without specifying a tolerance $\varepsilon$ parameter.
  • Yingjiu Li, X. Sean Wang, and Sushil Jajodia. Discovering Temporal Patterns in Multiple Granularities. In J.F. Roddick and K. Hornsby, editors, TSDM00, number 2007 of LNAI, Lyon, France, pages 5--19, 2000.
    Keywords: Sequential/Temporal Patterns.
    Abstract: Many events repeat themselves as time goes by. For example, an institute pays its employees on the first day of every month. However, events may not repeat with a constant span of time. In the payday example here, the span between each two consecutive paydays ranges between 28 and 31 days. As a result, regularity, or temporal pattern, has to be captured with the use of granularities (such as day, week, month, and year), oftentimes multiple granularities. This paper defines the above patterns, and proposes a number of pattern discovery algorithms. To focus on the basics, the paper assumes that a list of events with their timestamps is given, and the algorithms try to find patterns for the events. All of the algorithms repeat two possibly interleaving steps, with the first step generating possible (called candidate) patterns, and the second step verifying if candidate patterns satisfy some user-given requirements. The algorithms use pruning techniques to reduce the number of candidate patterns, and adopt a data structure to efficiently implement the second step. Experiments show that the pruning techniques and the data structure are quite effective.
  • Weiqiang Lin, Mehmet A. Orgun, and Graham J. Williams. Temporal Data Mining Using Multilevel-Local Polynomial Models. In IDEAL00, volume 1983 of LNCS, pages 180--186, 2000.
    Keywords: Similarity Measures, Sequential/Temporal Data, Sequential/Temporal Patterns.
    Abstract: This study proposes a data mining framework to discover qualitative and quantitative patterns in discrete-valued time series (DTS). In our method, there are three levels for mining temporal patterns. At the first level, a structural method based on distance measures through polynomial modelling is employed to find pattern structures; the second level performs a value-based search using local polynomial analysis; and the third level, based on multilevel-local polynomial models, finds global patterns from a DTS set. We demonstrate our method on the analysis of "Exchange Rates Patterns" between the US dollar and Australian dollar.
  • Bing Liu, Yiyuan Xia, and Philip S. Yu. Clustering Through Decision Tree Construction. In CIKM00, pages 20--29, 2000.
    Keywords: Classification, Decision Trees, Clustering, Similarity Measures.
    Abstract: Clustering aims to find the intrinsic nature of data by organizing data objects into similarity groups or clusters. It is often called unsupervised learning as no class labels denoting an a priori partition of the objects are given. This is in contrast with supervised learning (e.g., classification) for which the data objects are already labeled with known classes. Past research in clustering has produced many algorithms. However, these algorithms have some major shortcomings. In this paper, we propose a novel clustering technique, which is based on a supervised learning technique called decision tree construction. The new technique is able to overcome many of these shortcomings. The key idea is to use a decision tree to partition the data space into clusters and empty (sparse) regions at different levels of detail. The technique is able to find "natural" clusters in large high dimensional spaces efficiently. It is suitable for clustering in the full dimensional space as well as in subspaces. It also provides comprehensible descriptions of clusters. Experimental results on both synthetic data and real-life data show that the technique is effective and also scales well for large high dimensional datasets.
  • Anna Maria Massone, Léonard Studer, and Francesco Masulli. Pattern Recognition in RICH Counters Using the Possibilistic C-Spherical Shell Algorithm. In KES00, Brighton, UK, pages 792--795, 2000.
    Keywords: Noise Handling, Image Data.
    Abstract: The pattern recognition problem in RICH counters concerns the identification of an unknown number of imperfect, roughly circular rings made of a low number of discrete points in the presence of background. In this paper we present some preliminary results obtained using the Possibilistic c-Spherical Shells Algorithm. In particular, we show that the algorithm is very tolerant of and robust to the noise level (outlier rate). Moreover, for complex images, full of rings, we introduce an iterative scheme that greatly improves performance. In addition, the rings are not required to be complete; arcs alone are enough for the algorithm to recognize the underlying rings.
  • Katharina Morik. The Representation Race -- Preprocessing for Handling Time Phenomena. In ECML00, volume 1810 of LNAI, Barcelona, Spain, pages 4--19, 2000.
    Abstract: Designing the representation language for the input, $L_E$, and output, $L_H$, of a learning algorithm is the hardest task within machine learning applications. This paper emphasizes the importance of constructing an appropriate representation $L_E$ for knowledge discovery applications using the example of time related phenomena. Given the same raw data -- most frequently a database with time-stamped data -- rather different representations have to be produced for the learning methods that handle time. In this paper, a set of learning tasks dealing with time is given together with the input required by learning methods which solve the task. Transformations from raw data to the desired representation are illustrated by three case studies.
  • Jian Pei, Jiawei Han, and Runying Mao. CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pages 21--30, 2000.
    Keywords: Association Rules.
    Abstract: Association mining may often derive an undesirably large set of frequent itemsets and association rules. Recent studies have proposed an interesting alternative: mining frequent closed itemsets and their corresponding rules, which has the same power as association mining but substantially reduces the number of rules presented. In this paper, we propose an efficient algorithm, CLOSET, for mining closed itemsets, with the development of three techniques: (1) applying a compressed frequent pattern tree (FP-tree) structure for mining closed itemsets without candidate generation, (2) developing a single prefix path compression technique to identify frequent closed itemsets quickly, and (3) exploring a partition-based projection mechanism for scalable mining in large databases. Our performance study shows that CLOSET is efficient and scalable over large databases, and is faster than the previously proposed methods.
  • Alexandrin Popescul, Gary William Flake, Steve Lawrence, Lyle H. Ungar, and C. Lee Giles. Clustering and Identifying Temporal Trends in Document Databases. In Proc. of the IEEE Int. Conf. on Advances in Digital Libraries (ADL), Washington, DC, pages 173--182, 2000.
    Keywords: Clustering.
    Abstract: We introduce a simple and efficient method for clustering and identifying temporal trends in hyper-linked document databases. Our method can scale to large datasets because it exploits the underlying regularity often found in hyper-linked document databases. Because of this scalability, we can use our method to study the temporal trends of individual clusters in a statistically meaningful manner. As an example of our approach, we give a summary of the temporal trends found in a scientific literature database with thousands of documents (citeseer).
  • Richard J. Povinelli. Identifying Temporal Patterns for Characterization and Prediction of Financial Time Series Events. In J.F. Roddick and K. Hornsby, editors, TSDM00, number 2007 of LNAI, pages 46--61, 2000.
    Keywords: Sequential/Temporal Data, Sequential/Temporal Patterns.
    Abstract: The novel time series data mining (TSDM) framework is applied to analyzing financial time series. The TSDM framework adapts and innovates data mining concepts to analyzing time series data. In particular, it creates a set of methods that reveal hidden temporal patterns that are characteristic and predictive of time series events. This contrasts with other time series analysis techniques, which typically characterize and predict all observations. The TSDM framework and concepts are reviewed, and the applicable TSDM method is discussed. Finally, the TSDM method is applied to time series generated by a basket of financial securities. The results show that statistically significant temporal patterns that are both characteristic and predictive of events in financial time series can be identified.
  • Chris P. Rainsford and John F. Roddick. Visualisation of Temporal Interval Association Rules. In IDEAL00, number 1983 of LNCS, pages 91--96, 2000.
    Keywords: Association Rules.
    Abstract: Temporal intervals and the interaction of interval-based events are fundamental in many domains including medicine, commerce, computer security and various types of normalcy analysis. In order to learn from temporal interval data we have developed a temporal interval association rule algorithm. In this paper, we will provide a definition for temporal interval association rules and present our visualization techniques for viewing them. Visualization techniques are particularly important because the complexity and volume of knowledge that is discovered during data mining often makes it difficult to comprehend. We adopt a circular graph for visualizing a set of associations that allows underlying patterns in the associations to be identified. To visualize temporal relationships, a parallel coordinate graph for displaying the temporal relationships has been developed.
  • Iztok Savnik, Georg Lausen, Hans-Peter Kahle, Heinrich Spiecker, and Sebastian Hein. Algorithm for Matching Sets of Time Series. In Int. Conf. on Principles of Data Mining and Knowledge Discovery, pages 277-288, 2000.
    Keywords: Classification, Sequential/Temporal Data.
    Abstract: Time series are time-stamped sequences of values which represent a parameter of the observed processes in subsequent time points. Given a set of time series describing a set of similar processes, the model of the behaviour of processes is constructed as a range of classification trees which describe the characteristics of each particular time point in series. An algorithm for matching a sequence of values with the model is used for searching common patterns in the sets of time series, and for predicting the starting time points of undated time series. The algorithm was developed and analyzed in the frame of the study of tree-ring time series. The implementation and the empirical analysis of the algorithm on the tree-ring time series are presented.

Internal reports

  • Pete Chapman, Julian Clinton, Randy Kerber, Thomas Khabaza, Thomas Reinartz, Colin Shearer, and Rüdiger Wirth. Cross Industry Standard Process for Data Mining (CRISP-DM) -- Step by Step Data Mining Guide. Technical report, 2000.
  • Todd A. Stephenson. An Introduction to Bayesian Network Theory and Usage. Technical report 3, 2000.
    Keywords: Bayesian Networks, Sequential/Temporal Data.
    Abstract: I present an introduction to some of the concepts within Bayesian networks to help a beginner become familiar with this field's theory. Bayesian networks are a combination of two different mathematical areas: graph theory and probability theory. So, I first give the basic definition of Bayesian networks. This is followed by an elaboration of the underlying graph theory that involves the arrangements of nodes and edges in a graph. Since Bayesian networks encode one's beliefs for a system of variables, I then proceed to discuss, in general, how to update these beliefs when one or more of the variables' values are no longer unknown (i.e., you have observed their values). Learning algorithms involve a combination of learning the probability distributions along with learning the network topology. I then conclude Part I by showing how Bayesian networks can be used in various domains, such as in the time-series problem of automatic speech recognition. In Part II I then give in more detail some of the algorithms needed for working with Bayesian networks.
  • Pang-Ning Tan and Vipin Kumar. Interestingness Measures for Association Patterns: A Perspective. Technical report TR00-036, 2000.

Disclaimer

This list of publications is neither official nor complete, but a personal compilation.

Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.

This document was translated from BibTeX by bibtex2html

© F. Höppner, last update: Tue Dec 7 08:49:56 CET 2004