

Publications of year 1998

Books and proceedings

  • Vladimir S. Cherkassky and Filip Mulier. Learning from Data. 1998.
    Keywords: Classification, Regression.
    Abstract: Chapter headings: Introduction, Problem Statement, Classical Approaches and Adaptive Learning, Regularization Framework, Statistical Learning Theory, Nonlinear Optimization Strategies, Methods for Data Reduction and Dimensionality Reduction, Methods for Regression, Classification, Support Vector Machines, Fuzzy Systems
  • Nils J. Nilsson. Artificial Intelligence: A New Synthesis. 1998.
    Keywords: Neural Networks, Artificial Intelligence.
    Abstract: Stimulus-Response Agents, Neural Networks, Machine Evolution, State Machines, Robot Vision, Agents That Plan, Uninformed Search, Heuristic Search, Planning-Acting-Learning, Alternative Search Formulations and Applications, Adversarial Search, The Propositional Calculus, Resolution in the Propositional Calculus, The Predicate Calculus, Resolution in the Predicate Calculus, Knowledge-Based Systems, Representing Commonsense Knowledge, Reasoning with Uncertain Information, Learning and Acting with Bayes Nets, The Situation Calculus, Planning, Multiple Agents, Communication among Agents, Agent Architectures

Articles in journals or book chapters

  • R. Bellazzi, L. Ironi, R. Guglielmann, and M. Stefanelli. Qualitative models and fuzzy systems: an integrated approach for learning from data. AIM, 14:5--28, 1998.
    Keywords: Fuzzy Models.
    Abstract: This paper presents a method for the identification of the dynamics of non-linear systems by learning from data. The key idea which underlies our approach consists of the integration of qualitative modelling techniques with fuzzy logic systems. The resulting hybrid method exploits the a priori structural knowledge on the system to initialize a fuzzy inference procedure which determines, from the available experimental data, a functional approximation of the system dynamics that can be used as a reasonable predictor of the patient's future state. The major advantage which results from such an integrated framework lies in a significant improvement of both efficiency and robustness of identification methods based on fuzzy models which learn an input-output relation from data. As a benchmark of our method, we have considered the problem of identifying the response to the insulin therapy from insulin-dependent diabetic patients: the results obtained are presented and discussed in the paper.
  • Claudio Bettini, X. Sean Wang, Sushil Jajodia, and Jia-Ling Lin. Discovering Frequent Event Patterns with Multiple Granularities in Time Sequences. TKDE, 10(2):222--237, 1998.
    Keywords: Sequential/Temporal Data, Sequential/Temporal Patterns.
    Abstract: An important usage of time sequences is to discover temporal patterns. The discovery process usually starts with a user-specified skeleton, called an event structure, which consists of a number of variables representing events and temporal constraints among these variables; the goal of the data mining is to find temporal patterns, i.e., instantiations of the variables in the structure, that appear frequently in the time sequence. This paper introduces event structures that have temporal constraints with multiple granularities, defines the pattern-discovery problem with these structures, and studies effective algorithms to solve it. The basic components of the algorithms include timed automata with granularities (TAGs) and a number of heuristics. The TAGs are for testing whether a specific temporal pattern, called a candidate complex event type, appears frequently in a time sequence. Since there are often a huge number of candidate event types for a usual event structure, heuristics are presented aiming at reducing the number of candidate event types and reducing the time spent by the TAGs testing whether a candidate type does appear frequently in the sequence. These heuristics exploit the information provided by explicit and implicit temporal constraints with granularity in the given event structure. The paper also gives the results of an experiment to show the effectiveness of the heuristics on a real data set.
  • James C. Bezdek and Nikhil R. Pal. Some New Indexes of Cluster Validity. SMCB, 28(3):301--315, 1998.
    Keywords: Noise Handling, Clustering, Cluster Validity Measures.
    Abstract: We review two clustering algorithms (hard c-means and single linkage) and three indexes of crisp cluster validity (Hubert's statistics, the Davies-Bouldin index, and Dunn's index). We illustrate two deficiencies of Dunn's index which make it overly sensitive to noisy clusters and propose several generalizations of it that are not as brittle to outliers in the clusters. Our numerical examples show that the standard measure of interset distance (the minimum distance between points in a set) is the worst (least reliable) measure upon which to base cluster validation indexes when the clusters are expected to form volumetric clouds. Experimental results also suggest that intercluster separation plays a more important role in cluster validation than cluster diameter. Our simulations show that while Dunn's original index has operational flaws, the concept it embodies provides a rich paradigm for validation of partitions that have cloud-like clusters. Five of our generalized Dunn's indexes provide the best validation results for the simulations presented.
  • Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30:107--117, 1998.
  • Antonio C. Capelo, Liliana Ironi, and Stefania Tentoni. Automated Mathematical Modelling from Experimental Data: An Application to Material Science. SMCC, 28(3):356--370, 1998.
    Abstract: Automated model formulation is a crucial issue toward the construction of computational environments that can reason about the behaviour of a physical system. The procedure of mathematically modelling a given physical system is quite complex and basically involves three fundamental entities: the experimental data, a set of candidate models, and rules for determining in such a set the "best" model that reproduces the measured data. The construction of the candidate model is domain dependent and based on specific knowledge and techniques of the application domain. The choice of the best model is guided by the data themselves; a first rough guess, which is suggested by the qualitative properties of the observed behaviour, is refined through system identification techniques so that the quantitative properties of the observed behaviour are assessed. Therefore, automating such a procedure requires handling and integrating different formalisms and methods, both qualitative and quantitative. This paper describes a comprehensive environment that aims at the automated formulation of an accurate quantitative model of the mechanical behaviour of an actual viscoelastic material in accordance with the observed response of the material to standard experiments. To this end, algorithms and methods for both the generation of an exhaustive library of models of ideal materials and the selection of the most "accurate" model of a real material have been designed and implemented. The model selection phase occurs in two main stages; at first, the subset of most plausible candidate models for the material is drawn out from the library in accordance with the qualitative properties of the material that are highlighted by the experimental data; then, the most accurate model of the material is identified within such a set by exploiting both statistical and numerical methods.
  • Wei-Ge Chen, Georgios B. Giannakis, and N. Nandhakumar. A Harmonic Retrieval Framework for Discontinuous Motion Estimation. TIP, 7(9):1242--1257, 1998.
    Keywords: Image Data.
    Abstract: Motion discontinuities arise when there are occlusions or multiple moving objects in the scene that is imaged. Conventional regularization techniques use smoothness constraints but are not applicable to motion discontinuities. In this paper, we show that discontinuous (or multiple) motion estimation can be viewed as a multicomponent harmonic retrieval problem. From this viewpoint, a number of established techniques for harmonic retrieval can be applied to solve the challenging problem of discontinuous (or multiple) motion. Compared with existing techniques, the resulting algorithm is not iterative, which not only implies computational efficiency but also obviates concerns regarding convergence or local minima. It also adds flexibility to spatio-temporal techniques which have suffered from lack of explicit modelling discontinuous motion. Experimental verification of our framework on both synthetic as well as real image data is provided.
  • Jian-Qin Chen, Yu-Geng Xi, and Zhong-Jun Zhang. A clustering algorithm for fuzzy model identification. FSS, 98:319--329, 1998.
    Keywords: Clustering, Fuzzy Models, Sequential/Temporal Data.
    Abstract: The fuzzy model proposed by Takagi and Sugeno can represent highly nonlinear systems and is widely used for the representation of fuzzy rules. In this paper, the model is firstly modified to make its identification easier. Based on the fuzzy c-partition space, four criteria are proposed for optimization of the model parameters. Following that, a clustering algorithm composed of fuzzy c-linear functions clustering and like fuzzy c-means clustering is developed for minimizing the four criteria. An identification scheme for rule's premise and consequence parameters is deduced from the clustering algorithm in succession. Finally, four examples are demonstrated to verify the effectiveness of the proposed algorithm.
  • Yasser El-Sonbaty and M. A. Ismail. Fuzzy Clustering for Symbolic Data. TFS, 6(2):195--204, 1998.
    Keywords: Clustering, Fuzzy Clustering.
    Abstract: Most of the techniques used in the literature in clustering symbolic data are based on the hierarchical methodology, which utilizes the concept of agglomerative or divisive methods as the core of the algorithm. The main contribution of this paper is to show how to apply the concept of fuzziness on a data set of symbolic objects and how to use this concept in formulating the clustering problem of symbolic objects as a partitioning problem. Finally, a fuzzy symbolic c-means algorithm is introduced as an application of applying and testing the proposed algorithm on real and synthetic data sets. The results of the application of the new algorithm show that the new technique is quite efficient and, in many respects, superior to traditional methods of hierarchical nature.
  • Amir B. Geva. Feature Extraction and State Identification in Biomedical Signals using Hierarchical Fuzzy Clustering. MBEC, 36:608--614, 1998.
    Keywords: Clustering, Cluster Validity Measures, Fuzzy Clustering, Sequential/Temporal Patterns, Medical Applications.
    Abstract: Many problems in the field of biomedical signal processing can be reduced to a task of state recognition and event prediction. Examples can be found in tachycardia detection from ECG signals, epileptic seizure or psychotic attack prediction from an EEG signal, and prediction of vehicle drivers falling asleep from both signals. The problem generally treats a set of ordered measurements and asks for the recognition of some patterns of observed elements that will forecast an event or a transition between two different states of the biological system. It is proposed to apply clustering methods to grouping discontinuous related temporal patterns of a continuously sampled measurement. The vague switches from one stationary state to another are naturally treated by means of fuzzy clustering. In such cases, an adaptive selection of the number of clusters (the number of underlying semi-stationary processes) can overcome the general non-stationary nature of biomedical signals and enable the formation of a warning cluster. The algorithm suggested for the clustering is a new recursive algorithm for hierarchical fuzzy partitioning. Each pattern can have a non-zero membership on more than one data subset in the hierarchy. A `natural' and feasible solution to the cluster validity problem is suggested by combining hierarchical and fuzzy concepts. The algorithm is shown to be effective for a variety of data sets with a wide dynamic range of both covariance matrices and number of members in each class. The new method is applied to state recognition during recovery from exercise using the heart rate signal and to the forecasting of generalised epileptic seizures from the EEG signal.
  • Amir B. Geva. ScaleNet -- Multiscale Neural-Network Architecture for Time Series Prediction. TNN, 9(5):1471--1482, 1998.
    Keywords: Clustering, Neural Networks, Wavelets, Multiscale Analysis, Sequential/Temporal Data, Sequential/Temporal Patterns.
    Abstract: The effectiveness of a multiscale neural-network (NN) architecture for the time series prediction of nonlinear dynamic systems has been investigated. The prediction task is simplified by decomposing different scales of past windows into different scales of wavelets (local frequencies), and predicting the coefficients of each scale of wavelets by means of a separate multilayer perceptron NN. The short-term history (short past windows) is decomposed into the lower scales of wavelet coefficients (higher frequencies) which are utilized for "detailed" analysis and prediction, while the long-term history (long past window) is decomposed into higher scales of wavelet coefficients (low frequencies) that are used for the analysis and prediction of slow trends in the time series. These coordinated scales of time and frequency provide an interpretation of the series structures, and more information about the history of the series, using fewer coefficients than other methods. The prediction's results concerning all the different scales of time and frequencies are combined by another "expert" perceptron NN which learns the weight of each scale in the goal-prediction of the original time series. Each network is trained by the backpropagation algorithm using the Levenberg-Marquardt method. The weights and biases are initialized by a new clustering algorithm of the temporal patterns of the time series, which improves the prediction results as compared to random initialization. Three main sets of data were analyzed: the sunspots' benchmark, fluctuations in a far-infrared laser and a nonlinear numerically generated series. Taking the ultimate goal to be the accuracy of the prediction, we found that the suggested multiscale architecture outperforms the corresponding single-scale architectures. The employment of improved learning methods for each of the ScaleNet networks can further improve the prediction results.
  • Pedro Julian, Mario Jordán, and Alfredo Desages. Canonical Piecewise-Linear Approximation of Smooth Functions. TCS1, 45(5):567--571, 1998.
    Keywords: Piecewise Linear Representations.
    Abstract: This paper deals with the approximation of smooth functions using canonical piecewise-linear functions. The developing of tools in the field of analysis and control of nonlinear systems based on this kind of functions, as well as its efficiency in the representation of electronic devices, motivated the development of useful methods to obtain accurate approximations. A recursive method is proposed to obtain simultaneously all the parameters required and its convergence is studied. In addition, an iterative method to introduce new partitions on the domain, when the error obtained is not satisfactory, is described. This method takes advantage of the partitions already found to reduce the total number of parameters that the algorithm has to handle.
  • M. Kundu, M. Nasipuri, and D. K. Basu. A Knowledge-Based Approach to ECG Interpretation Using Fuzzy Logic. SMCB, 28(2):237--243, 1998.
    Keywords: Classification, Medical Applications.
    Abstract: A rule-based expert system which uses generalized modus ponens (GMP) from fuzzy logic as a rule of inference is described here for classification of abnormalities related to rhythm disorder in the human heart, through interpretation of the patient's electrocardiographic (ECG) patterns. Application of GMP makes diagnosis of a wide range of variations in the input ECG patterns possible even if they differ from the patterns defined in the preconditions of the rules of the rulebase. The work shows how fuzzy logic with suitably drawn possibility distributions of variables of cardiological domain plays a significant role in making the expert system sensitive to finer variations of input ECG patterns, which are very common in bioelectric signals, without enhancing the size of the rulebase.
  • Sven Loncaric. A Survey of Shape Analysis Techniques. PR, 31(8):983--1001, 1998.
    Keywords: Classification, Surveys.
    Abstract: This paper provides a review of shape analysis methods. Shape analysis methods play an important role in systems for object recognition, matching, registration, and analysis. Research in shape analysis has been motivated, in part, by studies of human visual form perception systems. Several theories of visual form perception are briefly mentioned. Shape analysis methods are classified into several groups. Classification is determined according to the use of shape boundary or interior, and according to the type of the result. An overview of the most representative methods is presented.
  • M.A. Martinelli. Pattern Recognition in Time-Series. Technical Analysis of Stocks & Commodities, 1998.
    Keywords: Similarity Measures, Sequential/Temporal Data.
    Abstract: The correlation coefficient is a statistic that is used to measure "goodness-of-fit" in many curve-fitting procedures, such as least-squares. Here we use it as an indicator of fit, or similarity, between a user-selected chart-pattern, and all segments of another chart having the same length.
  • Juiyao Pan, Guilherme N. DeSouza, and Avinash C. Kak. FuzzyShell: A Large Scale Expert System Shell Using Fuzzy Logic for Uncertainty Reasoning. TFS, 6(4):563--581, 1998.
    Abstract: There exist in the literature today many contributions dealing with the incorporation of fuzzy logic in expert systems. However, unfortunately, much of what has been proposed can only be applied to small-scale expert systems; that is, when the number of rules is in the dozens as opposed to in the hundreds. The more traditional (nonfuzzy) expert systems are able to cope with large numbers of rules by using Rete networks for maintaining matches of all the rules and all the facts. (A Rete network obviates the need to match the rules with the facts on every cycle of the inference engine.) In this paper, we present a more general Rete network that is particularly suitable for reasoning with fuzzy logic. The generalized Rete network consists of a cascade of three networks: the pattern network, the join network, and the evidence network. The first two layers are modified versions of similar layers for the traditional Rete networks and the last, the aggregation layer, is a new concept that allows fuzzy evidence to be aggregated when fuzzy inferences are made about the same fuzzy variable by different rules.
  • M. Ramze Rezaee, B. P. F. Lelieveldt, and J. H. C. Reiber. A new cluster validity index for the fuzzy c-means. PRL, 19:237--246, 1998.
    Keywords: Clustering, Cluster Validity Measures, Fuzzy c-Means.
    Abstract: In this paper a new cluster validity index is introduced, which assesses the average compactness and separation of fuzzy partitions generated by the fuzzy c-means algorithm. To compare the performance of this new index with a number of known validation indices, the fuzzy partitioning of two data sets was carried out. Our validation performed favorably in all studies, even in those where other validity indices failed to indicate the true number of clusters within each data set.
  • Magne Setnes, Robert Babuska, Uzay Kaymak, and Hans R. van Nauta Lemke. Similarity Measures in Fuzzy Rule Base Simplification. SMCB, 28(3):376--386, 1998.
    Keywords: Similarity Measures, Fuzzy Models.
    Abstract: In fuzzy rule-based models acquired from numerical data, redundancy may be present in the form of similar fuzzy sets that represent compatible concepts. This results in an unnecessarily complex and less transparent linguistic description of the system. By using a measure of similarity, a rule base simplification method is proposed that reduces the number of fuzzy sets in the model. Similar fuzzy sets are merged to create a common fuzzy set to replace them in the rule base. If the redundancy in the model is high, merging similar fuzzy sets might result in equal rules that also can be merged, thereby reducing the number of rules as well. The simplified rule base is computationally more efficient and linguistically more tractable. The approach has been successfully applied to fuzzy models of real world systems.
  • J. Shao. Application of an artificial neural network to improve short-term road ice forecasts. ESWA, 14:471--482, 1998.
    Keywords: Neural Networks.
    Abstract: This paper describes how a three-layer artificial neural network (NN) can be used to improve the accuracy of short-term (3-12 hours) automatic numerical prediction of road surface temperature, in order to cut winter road maintenance costs, reduce environmental damage from oversalting and provide safer roads for road users. In this paper, the training of the network is based on historical and preliminary meteorological parameters measured at an automatic roadside weather station, and the target of the training is hourly error of original numerical forecasts. The generalization of the trained network is then used to adjust the original model forecast. The effectiveness of the network in improving the accuracy of numerical model forecasts was tested at 39 sites in eight countries. Results of the tests show that the NN technique is able to reduce absolute error and root-mean-square error of temperature forecasts by 9.9-29%, and increase the accuracy of frost/ice prediction.
  • Rosaria Silipo and Carlo Marchesi. Artificial Neural Networks for automatic ECG analysis. TSP, 46:1417--1425, 1998.
    Keywords: Classification, Neural Networks, Medical Applications.
    Abstract: The analysis of the ECG can benefit from the wide availability of computing technology as far as features and performances as well. This paper presents some results achieved by carrying out the classification tasks of a possible equipment integrating the most common features of the ECG analysis: arrhythmia, myocardial ischemia, chronic alterations. Several ANN architectures are implemented, tested, and compared with competing alternatives. Approach, structure, and learning algorithm of ANN were designed according to the features of each particular classification task. The trade-off between the time consuming training of ANN's and their performance is also explored. Data pre- and post-processing efforts on the system performance were critically tested. These efforts' crucial role on the production of the input space dimensions, on a more significant description of the input features, and on improving new or ambiguous event processing has been also documented. Finally, the algorithm assessment was done on data coming from all the currently available ECG databases.
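
Illustrative sketch for the Martinelli entry above: the abstract describes scoring a user-selected chart pattern against every equal-length segment of another chart with the correlation coefficient. The Python below is only a minimal reading of that idea, not the article's code. The function names (pearson, sliding_correlation, best_match) and the rule that a constant segment scores 0.0 are assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant segment has no defined correlation,
        # so it is scored as "no similarity".
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def sliding_correlation(pattern, series):
    """Correlate the pattern with every segment of the same length."""
    m = len(pattern)
    return [pearson(pattern, series[i:i + m])
            for i in range(len(series) - m + 1)]


def best_match(pattern, series):
    """Return (start index, score) of the most similar segment."""
    scores = sliding_correlation(pattern, series)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    pattern = [1, 2, 3, 2, 1]
    series = [5, 5, 5, 10, 20, 30, 20, 10, 5, 5]
    print(best_match(pattern, series))
```

Because the correlation coefficient ignores offset and scale, the peak in the example series is matched even though it is ten times taller than the pattern.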

Conference articles

  • Lee Breslau, Pei Cao, Li Fan, and Graham Phillips. On the Implications of Zipf's Law for Web Caching. In 3rd Int. WWW Caching Workshop, 1998.
    Abstract: Recently, a number of studies on characteristics of Web proxy traces have shown that the hit-ratios of the traces exhibit certain properties that are uniform across the different sets of the traces. An explanation for these phenomena has eluded researchers and it is not clear whether the properties are inherent to Web accesses or particular to the set of traces studied. In this paper, we show that if one assumes that the references in the Web access stream are independent and the reference probability of the documents follows Zipf's law, then the observed properties follow from Zipf's law. We revisit Web cache replacement algorithms and show that the algorithm that is suggested by Zipf's law performs best. Finally, we investigate the drift in the cache's hot set as a function of time.
  • Gautam Das, King-Ip Lin, Heikki Mannila, Gopal Renganathan, and Padhraic Smyth. Rule Discovery from Time Series. In KDD98, pages 16--22, 1998.
    Keywords: Similarity Measures, Sequential/Temporal Data.
    Abstract: We consider the problem of finding rules relating patterns in a time series to other patterns in that series or patterns in one series to patterns in another series. A simple example is a rule such as "a period of low telephone call activity is usually followed by a sharp rise in call volume". Examples of rules relating two or more time series are "if the Microsoft stock price goes up and Intel falls, then IBM goes up the next day" and "if Microsoft goes up strongly for one day, then declines strongly on the next day, and on the same days Intel stays about level, then IBM stays about level". Our emphasis is in the discovery of local patterns in multivariate time series, in contrast to traditional time series analysis which largely focuses on global models. Thus, we search for rules whose conditions refer to patterns in time series. However, we do not want to define beforehand which patterns are to be used; rather, we want the patterns to be formed from the data in the context of rule discovery. We describe adaptive methods for finding rules of the above type from time-series data. The methods are based on discretizing the sequence by methods resembling vector quantization. We first form subsequences by sliding a window through the time series, and then cluster these subsequences by using a suitable measure of time-series similarity. The discretized version of the time series is obtained by taking the cluster identifiers corresponding to the subsequence. Once the time-series is discretized, we use simple rule finding methods to obtain rules from the sequence. We present empirical results on the behavior of the method.
  • Christian S. Jensen and Curtis E. Dyreson. The Consensus Glossary of Temporal Database Concepts. In O. Etzion, S. Jajodia, and S. Sripada, editors, Temporal Databases -- Research and Practice, volume 1399 of LNCS, pages 357--405, 1998.
    Keywords: Sequential/Temporal Data.
    Abstract: This document contains definitions of a wide range of concepts specific to and widely used within temporal databases. In addition to providing definitions, the document also includes explanations of concepts as well as discussions of the adopted names. The consensus effort that led to this glossary was initiated in early 1992. Earlier versions appeared in SIGMOD Record in September 1992 and March 1994. The present glossary subsumes all the previous documents. The glossary meets the need for creating a higher degree of consensus on the definition and naming of temporal database concepts. Two sets of criteria are included. First, all included concepts were required to satisfy four relevance criteria, and, second, the naming of the concepts was resolved using a set of evaluation criteria. The concepts are grouped into three categories: concepts of general database interest, of temporal database interest, and of specialized interest.
  • Eamonn J. Keogh and Michael J. Pazzani. An Enhanced Representation of Time Series which allows Fast and Accurate Classification, Clustering and Relevance Feedback. In KDD98, pages 239--241, 1998.
    Keywords: Piecewise Linear Representations, Classification, Clustering, Similarity Measures, Sequential/Temporal Data.
    Abstract: We introduce an extended representation of time series that allows fast, accurate classification and clustering in addition to the ability to explore time series data in a relevance feedback framework. The representation consists of piecewise linear segments to represent shape and a weight vector that contains the relative importance of each individual linear segment. In the classification context, the weights are learned automatically as part of the training cycle. In the relevance feedback context, the weights are determined by an interactive and iterative process in which users rate various choices presented to them. Our representation allows a user to define a variety of similarity measures that can be tailored to specific domains. We demonstrate our approach on space telemetry, medical and synthetic data.
  • Willi Klösgen. Deviation and Association Patterns for Subgroup Mining in Temporal, Spatial, and Textual Data Bases. In RSCTC'98, volume 1424 of LNAI, pages 1--18, 1998.
    Abstract: Data mining is usually introduced as search for interesting patterns in data. It is often an explorative step iteratively performed within a process of knowledge discovery in data bases (KDD). A mining step typically relies on strategies for systematic search in large hypotheses spaces guided by the autonomous evaluation of statistical tests. We describe the subgroup mining approach that is based on deviation and association patterns. A typical database contains values of attributes for many objects (persons, transactions, documents). Interpretable subgroups of these objects are searched that deviate from a designated expected behavior. Many types of data analysis questions can be answered by subgroup mining with diverse specializations of general deviation and association patterns. Tests measure the statistical interestingness of subgroup deviations. After summarizing the approach by discussing the fundamental components of subgroup pattern classes concerning validation, search and interactive presentation of pattern instances, we explain how deviation patterns of subgroup mining are applied for temporal, spatial and textual databases.
  • A. König. A Survey of Methods for Multivariate Data Projection, Visualisation and Interactive Analysis. In Proc. of the 5th Int. Conf. on Soft Computing and Information/Intelligent Systems, Iizuka, Fukuoka, Japan, pages 55--59, 1998.
    Keywords: Surveys.
    Abstract: In this paper, algorithms for multivariate data projection, based on topology or distance preserving mappings, as well as tools and techniques for projection display and user interaction are briefly reviewed and compared in a unifying approach. Advanced mapping algorithms, that focus on improved data structure preservation, following laws of perception as given by Gestalt-theory, as well as advanced features of data visualisation and navigation are introduced. These methods help to exploit the remarkable human perceptive and associative capabilities in man/computer dialog, e.g. for visual exploratory data analysis.
  • Charles X. Ling and Chenghui Li. Data Mining for Direct Marketing: Problems and Solutions. In KDD98, 1998.
    Abstract: Direct marketing is a process of identifying likely buyers of certain products and promoting the products accordingly. It is increasingly used by banks, insurance companies, and the retail industry. Data mining can provide an effective tool for direct marketing. During data mining, several specific problems arise. For example, the class distribution is extremely imbalanced (the response rate is about 1%), the predictive accuracy is no longer suitable for evaluating learning methods, and the number of examples can be too large. In this paper, we discuss methods of coping with these problems based on our experience on direct-marketing projects using data mining.
  • Tim Oates, David Jensen, and Paul R. Cohen. Discovering Rules for Clustering and Predicting Asynchronous Events. In Andrea Danyluk, editor, Predicting the Future: AI Approaches to Time-Series Problems, pages 73--79, 1998.
    Keywords: Clustering.
    Abstract: A wide variety of complex systems generate asynchronous events, including nuclear power plants, computer networks, governments, relational database systems and operating systems. We present Multi-Event Dependency Detection (MEDD), a novel algorithm for acquiring event correlation rules from historical logs of asynchronous events. Given a new stream of events generated in real time, the rules enable two important activities: clustering sets of related events and predicting events that will occur in the future. The former activity supports data reduction so that human monitors can more easily understand the state of the system generating the events, and the latter activity facilitates prediction of future states of the system by reasoning about events that are likely to occur. MEDD's utility is evaluated in experiments with event logs generated by a simulated computer network and encodings of Reuters news stories describing events in the Persian Gulf during 1996 and 1997.
  • Balaji Padmanabhan and Alexander Tuzhilin. A Belief-Driven Method for Discovering Unexpected Patterns. In KDD98, pages 94--100, 1998.
    Abstract: Several pattern discovery methods proposed in the data mining literature have the drawbacks that they discover too many obvious or irrelevant patterns and that they do not leverage to a full extent valuable prior domain knowledge that decision makers have. In this paper we propose a new method of discovery that addresses these drawbacks. In particular we propose a new method of discovering unexpected patterns that takes into consideration prior background knowledge of decision makers. This prior knowledge constitutes a set of expectations or beliefs about the problem domain. Our proposed method of discovering unexpected patterns uses these beliefs to seed the search for patterns in data that contradict the beliefs. To evaluate the practicality of our approach, we applied our algorithm to consumer purchase data from a major market research company and to web logfile data tracked at an academic web site and present our findings in the paper.
  • Sani Susanto, R. D. Kennedy, and J. W. H. dan Price. SKP-2 Algorithm: on forming part and machine-clusters separately. In Proc. Pacific Conf. on Manufacturing, Brisbane, Queensland, Australia, pages 122--127, 1998. [ PDF ]
  • Banu Özden, Sridhar Ramaswamy, and Avi Silberschatz. Cyclic Association Rules. In ICDE98, pages 412--421, 1998.
    Keywords: Association Rules.
    Abstract: We study the problem of discovering association rules that display regular cyclic variation over time. For example, if we compute association rules over monthly sales data, we may observe seasonal variation where certain rules are true at approximately the same month each year. Similarly, association rules can also display regular hourly, daily, weekly, etc., variation that is cyclical in nature. We demonstrate that existing methods cannot be naively extended to solve this problem of cyclic association rules. We then present two new algorithms for discovering such rules. The first one, which we call the sequential algorithm, treats association rules and cycles more or less independently. By studying the interaction between association rules and time, we devise a new technique called cycle pruning, which reduces the amount of time needed to find cyclic association rules. The second algorithm, which we call the interleaved algorithm, uses cycle pruning and other optimization techniques for discovering cyclic association rules. We demonstrate the effectiveness of the interleaved algorithm through a series of experiments. These experiments show that the interleaved algorithm can yield significant performance benefits when compared to the sequential algorithm. Performance improvements range from 5% to several hundred percent.

Disclaimer

This list of publications is neither official nor complete, but a personal compilation.

Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.

This document was translated from BibTEX by bibtex2html

Home © F. Höppner last update: Tue Dec 7 08:49:56 CET 2004