-
Ziv Bar-Joseph,
Georg Gerber,
David K. Gifford,
and Tommi S. Jaakkola.
A New Approach to Analyzing Gene Expression Time Series Data.
In 6th Annual Int. Conf. on Research in Comp. Molecular Biology,
2002.
Keywords:
Clustering,
Time Warping,
Sequential/Temporal Data,
Splines.
| Abstract: |
We present algorithms for time-series gene expression analysis that permit the principles estimation of unobserved time-points, clustering, and data-alignment. Each expression profile is modeled as a cubic spline (piecewise polynomial) that is estimated from the observed data and every time point influences the overall smooth expression curve. We constrain the spline coefficients of genes in the same class to have similar expression patterns, while also allowing for gene specific parameters. We show that unobserved time-points can be reconstructed using our method with 10-15\ 4.902896e+00ss error when compared to previous best methods. Our clustering algorithm operates directly on the continuous representation of gene expression profiles, and we demonstrate that this is particularily effective when applied to non-uniformly sampled data. Our continuous alignment algorithm also avoids difficulties encountered by discrete approaches. In particular, our method allows for control of the number of degrees of freedom of the warp through the specification of parameterized functions, which helps to avoid overfitting. We demonstrate that our algorithm produces stable low-error alignments on real expression data and further show a specific application to yeast knockout data that produces biologically meaningful results. |
-
Riccardo Bellazzi,
Cristiana Larizza,
Paolo Magni,
and Roberto Bellazzi.
Quality assessment of dialysis services through Intelligent Data Analysis and temporal data mining.
In KDSTD02,
Lyon, France,
pages 3--9,
2002.
Keywords:
Association Rules,
Sequential/Temporal Data.
| Abstract: |
This paper describes a research project that deals with the definition of methods and tools for the assessment of the clinical performance of a hemodialysis service. While simple statistical summaries are computed to asses basic outcomes, Intelligent Data Analysis techniques are applied to gain insight and to discover knowledge on the causes of negative clinical results. In particular, different techniques, comprising Temporal Abstractions, association rules and confirmation rules induction, are applied on the time series that are automatically collected during the monitoring of hemodialysis sessions. The paper describes the application domain, the basic goals of the project and the methodological approach applied for time series data analysis. Some clinically interesting results, obtained on data coming from 5 patients monitored for nine months, are also shown. |
-
Michael R. Berthold,
David E. Patterson,
Marco Ortolani,
Heiko Hofer,
Frank Höppner,
and Ondine Callan.
Shape-Invariant Fuzzy Clustering of Proteomics Data.
In NAFIPS02,
2002.
Keywords:
Clustering,
Fuzzy Clustering.
-
Richard J. Bolton,
David J. Hand,
and Niall M. Adams.
Determining Hit Rate in Pattern Search.
In \citeHand:PDD:2002,
pages 36--48,
2002.
-
Christian Borgelt and Michael R. Berthold.
Mining Molecular Fragments: Finding Relevant Substructures of Molecules.
In ICDM02,
pages 51--58,
2002.
| Abstract: |
We present an algorithm to find fragments in a set of molecules that help to discriminate between different classes of, for instance, activity in a drug discovery context. Instead of carrying out a brute-force search, our method generates fragments by embedding them in all appropriate molecules in parallel and prunes the search tree based on a local order of the atoms and bonds, which results in substantially faster search by eliminating the need for frequent, computationally expensive reembeddings and by suppressing redundant search. We prove the usefulness of our algorithm by demonstrating the discovery of activity-related groups of chemical compounds in the well-known National Cancer Institute s HIV-screening dataset. |
-
G. Carrault,
M.-O. Cordier,
Réne Quiniou,
and F. Wang.
Intelligent multichannel cardiac data analysis for diagnosis and monitoring.
In KDSTD02,
Lyon, France,
pages 10--16,
2002.
Keywords:
Inductive Logic Programming,
Sequential/Temporal Patterns.
| Abstract: |
We present a novel approach to cardiac data analysis for monitoring patients in intensive care units. Recorded data are transformed into symbolic event streams. Then a chronicle recognizer attempts to detect, on the fly, interesting temporal patterns called chronicles in this stream. Chronicles including temporal relationships are learned from example data by relational learning methods such as Inductive Logic Programming. The resulting chronicles are high level representations of temporal phenomena. On the one hand, they are expected to be more accurate and robust for diagnosis, even in the presence of a noisy input and, on the other hand, are easier to understand by the clinical staff. The approach is illustrated on learning and recognizing cardiac arrhythmias from electrocardiograms. |
-
Alan Fern,
Robert Givan,
and Jeffrey Mark Siskin.
Specific-to-General Learning for Temporal Events.
In National Conference on Artificial Intelligence (AAAI-02),
2002.
Keywords:
Classification.
| Abstract: |
We study the problem of supervised learning of event classes in a simple temporal event-description language. We give lower and upper bounds and algorithms for the subsumption and generalization problems for two expressively powerful subsets of this logic, and present a positive-examples-only specific-to-general learning method based on the resulting algorithms. We also present a polynomial-time computable ``syntactic'' subsumption test that implies semantic subsumption without being equivalent to it. A generalization algorithm based on syntactic subsumption can be used in place of semantic generalization to improve the asymptotic complexity of the resulting learning algorithm. A companion paper shows that our methods can be applied to duplicate the performance of human-coded concepts in the substantial application domain of video event recognition. |
-
Alan Fern,
Jeffrey Mark Siskind,
and Robert Givan.
Learning Temporal, Relational, Force-Dynamic Event Definitions from Video.
In National Conference on Artificial Intelligence (AAAI-02),
2002.
| Abstract: |
We present and evaluate a novel implemented approach for learning to recognize events in video. First, we introduce a sublanguage of event logic, called k-AMA, that is sufficiently expressive to represent visual events yet sufficiently restrictive to support learning. Second, we develop a specific-to-general learning algorithm for learning event definitions in k-AMA. Finally, we apply this algorithm to the task of learning event definitions from video and show that it yields definitions that are competitive with hand-coded ones. |
-
David Hand.
Pattern Detection and Discovery.
In \citeHand:PDD:2002,
pages 1--12,
2002.
-
Frank Höppner.
Handling Feature Ambiguity in Knowledge Discovery from Time Series.
In Proc. of 5th Int. Conf. on Discovery Science,
number 2534 of LNCS,
Lübeck, Germany,
pages 398--405,
2002.
Keywords:
Piecewise Linear Representations,
Multiscale Analysis,
Sequential/Temporal Data.
| Abstract: |
In knowledge discovery from time series abstractions (like piecewise linear representations) are often preferred over raw data. In most cases it is implicitly assumed that there is a single valid abstraction and that the abstraction method, which is often heuristic in nature, finds this abstraction. We argue that this assumption does not hold in general and that there is need for knowledge discovery methods that pay attention to the ambiguity of features: In a different context, an increasing segment may be considered as (being part of) a decreasing segment. It is not a priori clear which view is correct or meaningful. We show that the relevance of ambiguous features depends on the relevance of the knowledge that can be discovered by using the features. We combine techniques from multiscale signal analysis and interval sequence mining to discover rules about dependencies in multivariate time series. |
-
Frank Höppner.
Learning Dependencies in Multivariate Time Series.
In KDSTD02,
Lyon, France,
pages 25--31,
2002.
Keywords:
Sequential/Temporal Data,
Sequential/Temporal Patterns,
Temporal Reasoning.
| Abstract: |
This paper sketches an approach to learn interdependencies between multiple time series. At the beginning the time series are segmented and thereby transformed into sequences of labeled intervals. The labels denote qualitative aspects of the signal in the respective intervals. Then, from the sequence of labeled intervals, we discover rules where premise and conclusion consist of temporal patterns. The temporal patterns are sets of intervals where Allen's interval logic is used to capture their temporal relationships. Rules are specialized with respect to numerical attributes like the length of the intervals or the slope of the signal within the interval. Finally, we obtain rules like ``when signal A decreases while signal B increases with slope greather than 2 then signal C will decrease''. Since humans use a similar syntax when discussing such aspects, the proposed methodology may support a human in learning dependencies in multivariate time series. |
-
Frank Höppner.
Time Series Abstraction Methods -- A Survey.
In Proceedings GI Jahrestagung Informatik, Workshop on Knowl. Discovery in Databases,
LNI,
Dortmund, Germany,
pages 777-786,
2002.
Keywords:
Noise Handling,
Multiscale Analysis,
Sequential/Temporal Data,
Surveys.
| Abstract: |
In this paper several time series abstraction methods are reviewed with respect to their applicability in the framework of knowledge discovery (KDD) and data mining (DM). We discuss the demands of KDD/DM (noise, robustness) and show that many widely used methods do not satisfy them, because they are single-scale methods. With multiscale approaches, however, the drawbacks of the single-scale methods can be addressed explicitly. |
-
Frank Höppner.
Lernen lokaler Zusammenhänge in multivariaten Zeitreihen.
In J. Biethahn,
J. Kuhl,
and A. Lackner, editors,
Tagungsband zum 5. Göttinger Symposium Soft Computing,
Göttingen,
pages 113--125,
2002.
[ PDF
]
-
Frank Höppner.
Discovery of Core Episodes from Sequences -- Using Generalization for Defragmentation of Rule Sets.
In Pattern Detection and Discovery,
number 2447 of LNAI,
pages 199--213,
2002.
Keywords:
Association Rules,
Sequential/Temporal Data,
Temporal Reasoning.
| Abstract: |
We consider the problem of knowledge induction from sequential or temporal data. Patterns and rules in such data can be detected using methods adopted from association rule mining. The resulting set of rules is usually too large to be inspected manually. We show that (amongst other reasons) the inadequacy of the pattern space is often responsible for many of these patterns: If the true relationship in the data is fragmented by the pattern space, it cannot show up as a peak of high pattern density, but the data is divided among many different patterns, often difficult to distinguish from incidental patterns. To overcome this fragmentation, we identify core patterns that are shared among specialized patterns. The core patterns are then generalized by selecting a subset of specialized patterns and combining them disjunctively. The generalized patterns can be used to reduce the size of the set of patterns. We show some experiments for the case of labeled interval sequences, where patterns consist of a set of labeled intervals and their temporal relationships expressed via Allen's interval logic. |
-
Eamonn J. Keogh,
Stefano Lonardi,
and Bill Chiu.
Finding Surprising Patterns in a Time Series Database in Linear Time and Space.
In KDD02,
Edmonton, Canada,
2002.
Keywords:
Sequential/Temporal Data.
| Abstract: |
The problem of finding a specified pattern in a time series database (i.e. query by content) has received much attention and is now a relatively mature field. In contrast, the important problem of enumerating all surprising or interesting patterns has received far less attention. This problem requires a meaningful definition of "surprise", and an efficient search technique. All previous attempts at finding surprising patterns in time series use a very limited notion of surprise, and/or do not scale to massive datasets. To overcome these limitations we introduce a novel technique that defines a pattern surprising if the frequency of its occurrence differs substantially from that expected by chance, given some previously seen data. This notion has the advantage of not requiring an explicit definition of surprise, which may be impossile to elicit from a domain expert. Instead the user simply gives the algorithm a collection of previously observed normal data. Our algorithm uses a suffix tree to efficiently encode the frequency of all observed patterns and allows a Markov model to predict the expected frequency of previously unobserved patterns. Once the suffix tree has been constructed, a measure of surprise for all the patterns in a new database can be determined in time and space linear in the size of the database. We demonstrate the utility of our approach with an extensive experimental evaluation. |
-
Terence Kwok,
Kate Smith,
Sebastian Lozano,
and David Taniar.
Parallel Fuzzy c-Means Clustering for Large Data Sets.
In Burkhard Monien and Rainer Feldmann, editors,
EUROPAR02,
volume 2400 of LNCS,
pages 365--374,
2002.
Keywords:
Clustering,
Fuzzy c-Means.
| Abstract: |
The parallel fuzzy c-means (PFCM) algorithm for clustering large data sets is proposed in this paper. The proposed algorithm is designed to run on parallel computers of he Single Program Multiple Data (SPMD) model type with the Message Passing Interface (MPI). A comparison is made between PFCM and an existing parallel k-means (PKM) algorithm in terms of thei parallelisation capability and scalability. In an implementation of PFCM to cluster a large data set from an insurance company, the proposed algorithm is demonstrated to have almost ideal speedups as well as an excellent scaleup with respect to the size of the data set. |
-
S. Miyamoto and D. Suizu.
Fuzzy c-means clustering using transformations into high dimensional spaces.
In Int. Conf. on Fuzzy Systems and Knowledge Discovery,
volume 2,
Singapore,
pages 656--660,
2002.
Keywords:
Clustering,
Fuzzy c-Means.
-
Marco Ortolani,
Heiko Hofer,
David Patterson,
Frank Höppner,
and Michael Berthold.
Fuzzy Information Granules in Time Series Data.
In FUZZIEEE02,
Honolulu, Hawai,
pages 695--699,
2002.
Keywords:
Clustering,
Fuzzy Clustering,
Fuzzy c-Means,
Sequential/Temporal Data,
Medical Applications.
| Abstract: |
It is often desirable to summarize a set of time series through typical shapes in order to analyze them. The algorithm presented here compares pieces of different time series in order to find similar shapes. The use of a fuzzy clustering technique based on fuzzy c-means allows us to consider such subsets as belonging to typical shapes with a degree of membership. Additionally, this matching is invariant with respect to a scaling of the time series. The algorithm is demonstrated on a widely known set of data taken from the ECG rhythm analysis experiments at MIT laboratories. |
-
Linda Peelen,
Niels Peek,
and Ameen Abu-Hanna.
Acquiring and using temporal knowledge in medicine: an application in chronic pulmonary disease.
In KDSTD02,
Lyon, France,
pages 44-50,
2002.
Keywords:
Sequential/Temporal Data.
| Abstract: |
Routine recording of time-stamped patient data is becoming more and more common in clinical medicine. It is generally believed that this can improve the quality of medical practice by providing doctors with more knowledge of temporal relations in disease and treatment; most benefits are to be expected from better prognoses, and thereby better treatment choices. It is however far from obvious which methods should be applied to extract this knowledge from the data; probably a mixture of methods, tailored to the domain at hand, should be used. This paper discusses the issues that are involved when learning, and putting to use, knowledge from medical time series data. An example decision support application, for the prognosis and treatment of chronic pulmonary disease, is presented in detail. |
-
Juan J. Rodriguez and Carlos J. Alonso.
Boosting Interval-Based Literals: Variable Length and Early Classification.
In KDSTD02,
Lyon, France,
pages 51--62,
2002.
Keywords:
Classification,
Sequential/Temporal Data.
| Abstract: |
In previous works, a system for supervised time series classification has been presented. It is based on boosting very simple classifiers: only one literal. The used predicates are based on temporal intervals. There are two types of predicates: i) relative predicates, such as ``increases'' and ``stays'', and ii) region predicates, such as ``always'' and ``sometime'', which operate over regions in the domain of the variable. \newline This work presents two new features of the system. First, the system can now deal directly with variable length time series. Second, the obtained classifiers can now be used with partial time series. This ``early classification'' is essential in some supervision environments, where it is necessary to give an alarm signal as soon as possible. Experiments on different data sets show that the proposed method is highly competitive with previous approaches. |
-
Mohammad Shahmohammadi and Hamid R. Nikoofar.
Automatic Digital Modulation Recognition under Non-Coherent Condition.
In Proc. of ISIT,
Lausanne, Switzerland,
pages 19-22,
2002.
Keywords:
Classification,
Clustering.
| Abstract: |
In this paper, we propose a soft clustering algorithm for digital modulation recognition by exploiting the constellation shape as a key signature of the modulation type. Using the constellation shape, we change the procedure of automatic modulation recognition (AMR) to a general pattern classification problem. Moreover, the algorithm enables us to identify the modulation type under non-coherent condition. Our approach can be viewed as a method for blind estimation of reference phase with modulation identification. Experimental results are shown for several cases. The performance of the algorithm under non-coherent condition is relatively close to that of the ideal condition, which means that the algorithm achieves good estimation for reference phase. |
|