Introduction to Clustering Large and High-Dimensional Data
Focuses on a few of the important clustering algorithms in the context of information retrieval.
Jacob Kogan (Author)
9780521617932, Cambridge University Press
Paperback, published 13 November 2006
222 pages
22.9 x 15.3 x 1.5 cm, 0.307 kg
"...this book may serve as a useful reference for scientists and engineers who need to understand the concepts of clustering in general and/or to focus on text mining applications. It is also appropriate for students who are attending a course in pattern recognition, data mining, or classification and are interested in learning more about issues related to the k-means scheme for an undergraduate or master's thesis project. Last, it supplies very interesting material for instructors."
Nicolas Loménie, IAPR Newsletter
There is a growing need for a more automated system of partitioning data sets into groups, or clusters. For example, as digital libraries and the World Wide Web continue to grow exponentially, the ability to find useful information increasingly depends on the indexing infrastructure or search engine. Clustering techniques can be used to discover natural groups in data sets and to identify abstract structures that might reside there, without having any background knowledge of the characteristics of the data. Clustering has been used in a variety of areas, including computer vision, VLSI design, data mining, bioinformatics (gene expression analysis), and information retrieval, to name just a few. This book focuses on a few of the most important clustering algorithms, providing a detailed account of these major models in an information retrieval context. The beginning chapters introduce the classic algorithms in detail, while the later chapters describe clustering through divergences and show recent research for more advanced audiences.
1. Introduction and motivation
2. Quadratic k-means algorithm
3. BIRCH
4. Spherical k-means algorithm
5. Linear algebra techniques
6. Information-theoretic clustering
7. Clustering with optimization techniques
8. k-means clustering with divergence
9. Assessment of clustering results
10. Appendix: optimization and linear algebra background
11. Solutions to selected problems
Subject Areas: Pattern recognition [UYQP], Data mining [UNF], Probability & statistics [PBT]