Model-Based Clustering and Classification for Data Science
With Applications in R
A colorful, example-rich introduction to the state of the art for students in data science, as well as researchers and practitioners.
Charles Bouveyron (Author), Gilles Celeux (Author), T. Brendan Murphy (Author), Adrian E. Raftery (Author)
9781108494205, Cambridge University Press
Hardback, published 25 July 2019
446 pages, 40 b/w illus., 171 colour illus., 48 tables
26 x 18.5 x 2.5 cm, 1.1 kg
'This book frames cluster analysis and classification in terms of statistical models, thus yielding principled estimation, testing and prediction methods, and sound answers to the central questions … Written for advanced undergraduates in data science, as well as researchers and practitioners, it assumes basic knowledge of multivariate calculus, linear algebra, probability and statistics.' Hans-Jürgen Schmidt, zbMATH
Cluster analysis finds groups in data automatically. Most methods have been heuristic and leave open such central questions as: how many clusters are there? Which method should I use? How should I handle outliers? Classification assigns new observations to groups given previously classified observations, and also has open questions about parameter tuning, robustness and uncertainty assessment. This book frames cluster analysis and classification in terms of statistical models, thus yielding principled estimation, testing and prediction methods, and sound answers to the central questions. It builds the basic ideas in an accessible but rigorous way, with extensive data examples and R code; describes modern approaches to high-dimensional data and networks; and explains such recent advances as Bayesian regularization, non-Gaussian model-based clustering, cluster merging, variable selection, semi-supervised and robust classification, clustering of functional data, text and images, and co-clustering. Written for advanced undergraduates in data science, as well as researchers and practitioners, it assumes basic knowledge of multivariate calculus, linear algebra, probability and statistics.
1. Introduction
2. Model-based clustering: basic ideas
3. Dealing with difficulties
4. Model-based classification
5. Semi-supervised clustering and classification
6. Discrete data clustering
7. Variable selection
8. High-dimensional data
9. Non-Gaussian model-based clustering
10. Network data
11. Model-based clustering with covariates
12. Other topics
List of R packages
Bibliography
Index.
Subject Areas: Machine learning [UYQM], Data mining [UNF], Data capture & analysis [UNC], Probability & statistics [PBT], Epidemiology & medical statistics [MBNS], Economic statistics [KCHS], Social research & statistics [JHBC]