000 11658nam a22002057a 4500
020 _a9789380931623
040 _cCUS
082 _a006.4
_bTHE/P
100 _aTheodoridis, Sergios
245 _aPattern recognition
_cSergios Theodoridis and Konstantinos Koutroumbas
250 _a4th ed.
260 _aAmsterdam:
_bElsevier,
_c2009.
300 _axvii, 961 p.
_bill.
505 _a2.5.1 Maximum Likelihood Parameter Estimation 34 2.5.2 Maximum a Posteriori Probability Estimation 38 2.5.3 Bayesian Inference 39 2.5.4 Maximum Entropy Estimation 43 2.5.5 Mixture Models 44 2.5.6 Nonparametric Estimation 49 2.5.7 The Naive-Bayes Classifier 59 2.6 The Nearest Neighbor Rule 61 2.7 Bayesian Networks 64 2.8 Problems 71 References 86 CHAPTER 3 Linear Classifiers 91 3.1 Introduction 91 3.2 Linear Discriminant Functions and Decision Hyperplanes 91 3.3 The Perceptron Algorithm 93 3.4 Least Squares Methods 103 3.4.1 Mean Square Error Estimation 103 3.4.2 Stochastic Approximation and the LMS Algorithm 105 3.4.3 Sum of Error Squares Estimation 108 3.5 Mean Square Estimation Revisited 110 3.5.1 Mean Square Error Regression 110 3.5.2 MSE Estimates Posterior Class Probabilities 112 3.5.3 The Bias-Variance Dilemma 114 3.6 Logistic Discrimination 117 3.7 Support Vector Machines 119 3.7.1 Separable Classes 119 3.7.2 Nonseparable Classes 124 3.7.3 The Multiclass Case 127 3.7.4 ν-SVM 133 3.7.5 Support Vector Machines: A Geometric Viewpoint 136 3.7.6 Reduced Convex Hulls 138 3.8 Problems 142 References 147 CHAPTER 4 Nonlinear Classifiers 151 4.1 Introduction 151 4.2 The XOR Problem 151 4.3 The Two-Layer Perceptron 153 4.3.1 Classification Capabilities of the Two-Layer Perceptron 156 4.4 Three-Layer Perceptrons 158 4.5 Algorithms Based on Exact Classification of the Training Set 160 4.6 The Backpropagation Algorithm 162 4.7 Variations on the Backpropagation Theme 169 4.8 The Cost Function Choice 172 4.9 Choice of the Network Size 176 4.10 A Simulation Example 181 4.11 Networks with Weight Sharing 183 4.12 Generalized Linear Classifiers 185 4.13 Capacity of the l-Dimensional Space in Linear Dichotomies 4.14 Polynomial Classifiers 189 4.15 Radial Basis Function Networks 190 4.16 Universal Approximators 194 4.17 Probabilistic Neural Networks 196 4.18 Support Vector Machines: The Nonlinear Case 198 4.19 Beyond the SVM Paradigm 203 4.19.1 Expansion in Kernel Functions and Model Sparsification 205 
4.19.2 Robust Statistics Regression 211 4.20 Decision Trees 215 4.20.1 Set of Questions 218 4.20.2 Splitting Criterion 218 4.20.3 Stop-Splitting Rule 219 4.20.4 Class Assignment Rule 219 4.21 Combining Classifiers 222 4.21.1 Geometric Average Rule 223 4.21.2 Arithmetic Average Rule 224 4.21.3 Majority Voting Rule 225 4.21.4 A Bayesian Viewpoint 227 4.22 The Boosting Approach to Combine Classifiers 230 4.23 The Class Imbalance Problem 237 4.24 Discussion 239 4.25 Problems 240 References 249 CHAPTER 5 Feature Selection 261 5.1 Introduction 261 5.2 Preprocessing 262 5.2.1 Outlier Removal 262 5.2.2 Data Normalization 263 5.2.3 Missing Data 263 5.3 The Peaking Phenomenon 265 5.4 Feature Selection Based on Statistical Hypothesis Testing 268 5.4.1 Hypothesis Testing Basics 268 5.4.2 Application of the t-Test in Feature Selection 273 5.5 The Receiver Operating Characteristics (ROC) Curve 275 5.6 Class Separability Measures 276 5.6.1 Divergence 276 5.6.2 Chernoff Bound and Bhattacharyya Distance 278 5.6.3 Scatter Matrices 280 5.7 Feature Subset Selection 283 5.7.1 Scalar Feature Selection 283 5.7.2 Feature Vector Selection 284 5.8 Optimal Feature Generation 288 5.9 Neural Networks and Feature Generation/Selection 298 5.10 A Hint on Generalization Theory 299 5.11 The Bayesian Information Criterion 309 5.12 Problems 311 References 318 CHAPTER 6 Feature Generation I: Data Transformation and Dimensionality Reduction 323 6.1 Introduction 323 6.2 Basis Vectors and Images 324 6.3 The Karhunen-Loeve Transform 326 6.4 The Singular Value Decomposition 335 6.5 Independent Component Analysis 342 6.5.1 ICA Based on Second- and Fourth-Order Cumulants 344 6.5.2 ICA Based on Mutual Information 345 6.5.3 An ICA Simulation Example 348 6.6 Nonnegative Matrix Factorization 349 6.7 Nonlinear Dimensionality Reduction 350 6.7.1 Kernel PCA 351 6.7.2 Graph-Based Methods 353 6.8 The Discrete Fourier Transform (DFT) 363 6.8.1 One-Dimensional DFT 364 6.8.2 Two-Dimensional DFT 366 6.9 The Discrete 
Cosine and Sine Transforms 366 6.10 The Hadamard Transform 368 6.11 The Haar Transform 369 6.12 The Haar Expansion Revisited 371 6.13 Discrete Time Wavelet Transform (DTWT) 375 6.14 The Multiresolution Interpretation 384 6.15 Wavelet Packets 387 6.16 A Look at Two-Dimensional Generalizations 388 6.17 Applications 6.18 Problems References CHAPTER 7 Feature Generation II 411 7.1 Introduction 7.2 Regional Features 7.2.1 Features for Texture Characterization 412 7.2.2 Local Linear Transforms for Texture Feature Extraction 7.2.3 Moments 7.2.4 Parametric Models 7.3 Features for Shape and Size Characterization 435 7.3.1 Fourier Features 436 7.3.2 Chain Codes 439 7.3.3 Moment-Based Features 441 7.3.4 Geometric Features 442 7.4 A Glimpse at Fractals 444 7.4.1 Self-Similarity and Fractal Dimension 444 7.4.2 Fractional Brownian Motion 446 7.5 Typical Features for Speech and Audio Classification 451 7.5.1 Short-Time Processing of Signals 452 7.5.2 Cepstrum 455 7.5.3 The Mel-Cepstrum 457 7.5.4 Spectral Features 460 7.5.5 Time Domain Features 462 7.5.6 An Example 463 7.6 Problems 466 References 473 CHAPTER 8 Template Matching 481 8.1 Introduction 481 8.2 Measures Based on Optimal Path Searching Techniques 482 8.2.1 Bellman's Optimality Principle and Dynamic Programming 484 8.2.2 The Edit Distance 487 8.2.3 Dynamic Time Warping in Speech Recognition 491 8.3 Measures Based on Correlations 498 8.4 Deformable Template Models 504 8.5 Content-Based Information Retrieval: Relevance Feedback 508 8.6 Problems 513 References CHAPTER 9 Context-Dependent Classification 521 9.1 Introduction 9.2 The Bayes Classifier 521 9.3 Markov Chain Models 522 9.4 The Viterbi Algorithm 523 9.5 Channel Equalization 527 9.6 Hidden Markov Models 532 9.7 HMM with State Duration Modeling 545 9.8 Training Markov Models via Neural Networks 552 9.9 A Discussion of Markov Random Fields 554 9.10 Problems 556 References 560 CHAPTER 10 Supervised Learning: The Epilogue 567 10.1 Introduction 567 10.2 Error-Counting Approach 568 10.3 
Exploiting the Finite Size of the Data Set 569 10.4 A Case Study from Medical Imaging 573 10.5 Semi-Supervised Learning 577 10.5.1 Generative Models 579 10.5.2 Graph-Based Methods 582 10.5.3 Transductive Support Vector Machines 586 10.6 Problems 590 References 591 CHAPTER 11 Clustering: Basic Concepts 595 11.1 Introduction 595 11.1.1 Applications of Cluster Analysis 598 11.1.2 Types of Features 599 11.1.3 Definitions of Clustering 600 11.2 Proximity Measures 602 11.2.1 Definitions 602 11.2.2 Proximity Measures between Two Points 604 11.2.3 Proximity Functions between a Point and a Set 616 11.2.4 Proximity Functions between Two Sets 620 11.3 Problems 622 References 624 CHAPTER 12 Clustering Algorithms I: Sequential Algorithms 627 12.1 Introduction 627 12.1.1 Number of Possible Clusterings 627 12.2 Categories of Clustering Algorithms 629 12.3 Sequential Clustering Algorithms 633 12.3.1 Estimation of the Number of Clusters 635 12.4 A Modification of BSAS 637 12.5 A Two-Threshold Sequential Scheme 638 12.6 Refinement Stages 641 12.7 Neural Network Implementation 643 12.7.1 Description of the Architecture 643 12.7.2 Implementation of the BSAS Algorithm 644 12.8 Problems 646 References 650 CHAPTER 13 Clustering Algorithms II: Hierarchical Algorithms 653 13.1 Introduction 653 13.2 Agglomerative Algorithms 654 13.2.1 Definition of Some Useful Quantities 655 13.2.2 Agglomerative Algorithms Based on Matrix Theory 658 13.2.3 Monotonicity and Crossover 664 13.2.4 Implementational Issues 667 13.2.5 Agglomerative Algorithms Based on Graph Theory 
667 13.2.6 Ties in the Proximity Matrix 676 13.3 The Cophenetic Matrix 679 13.4 Divisive Algorithms 680 13.5 Hierarchical Algorithms for Large Data Sets 682 13.6 Choice of the Best Number of Clusters 690 13.7 Problems 693 References 697 CHAPTER 14 Clustering Algorithms III: Schemes Based on Function Optimization 701 14.1 Introduction 701 14.2 Mixture Decomposition Schemes 703 14.2.1 Compact and Hyperellipsoidal Clusters 705 14.2.2 A Geometrical Interpretation 709 14.3 Fuzzy Clustering Algorithms 712 14.3.1 Point Representatives 716 14.3.2 Quadric Surfaces as Representatives 718 14.3.3 Hyperplane Representatives 728 14.3.4 Combining Quadric and Hyperplane Representatives 731 14.3.5 A Geometrical Interpretation 732 14.3.6 Convergence Aspects of the Fuzzy Clustering Algorithms 732 14.3.7 Alternating Cluster Estimation 733 14.4 Possibilistic Clustering 733 14.4.1 The Mode-Seeking Property 737 14.4.2 An Alternative Possibilistic Scheme 739 14.5 Hard Clustering Algorithms 739 14.5.1 The Isodata or k-Means or c-Means Algorithm 741 14.5.2 k-Medoids Algorithms 745 14.6 Vector Quantization 749 14.7 Problems 752 References 758 CHAPTER 15 Clustering Algorithms IV 765 15.1 Introduction 765 15.2 Clustering Algorithms Based on Graph Theory 765 15.2.1 Minimum Spanning Tree Algorithms 766 15.2.2 Algorithms Based on Regions of Influence 768 15.2.3 Algorithms Based on Directed Trees 770 15.2.4 Spectral Clustering 772 15.3 Competitive Learning Algorithms 780 15.3.1 Basic Competitive Learning Algorithm 782 15.3.2 Leaky Learning Algorithm 783 15.3.3 Conscientious Competitive Learning Algorithms 784 15.3.4 Competitive Learning-Like Algorithms Associated with Cost Functions 785 15.3.5 Self-Organizing Maps 786 15.3.6 Supervised Learning Vector Quantization 788 15.4 Binary Morphology Clustering Algorithms (BMCAs) 789 15.4.1 Discretization 790 15.4.2 Morphological Operations 791 15.4.3 Determination of the Clusters in a Discrete Binary Set 794 15.4.4 Assignment of Feature Vectors to 
Clusters 795 15.4.5 The Algorithmic Scheme 796 15.5 Boundary Detection Algorithms 798 15.6 Valley-Seeking Clustering Algorithms 801 15.7 Clustering via Cost Optimization (Revisited) 803 15.7.1 Branch and Bound Clustering Algorithms 803 15.7.2 Simulated Annealing 807 15.7.3 Deterministic Annealing 808 15.7.4 Clustering Using Genetic Algorithms 810 15.8 Kernel Clustering Methods 811 15.9 Density-Based Algorithms for Large Data Sets 815 15.9.1 The DBSCAN Algorithm 815 15.9.2 The DBCLASD Algorithm 818 15.9.3 The DENCLUE Algorithm 819 15.10 Clustering Algorithms for High-Dimensional Data Sets 821 15.10.1 Dimensionality Reduction Clustering Approach 822 15.10.2 Subspace Clustering Approach 824 15.11 Other Clustering Algorithms 837 15.12 Combination of Clusterings 839 15.13 Problems 846 References 852 CHAPTER 16 Cluster Validity 863 16.1 Introduction 863 16.2 Hypothesis Testing Revisited 864 16.3 Hypothesis Testing in Cluster Validity 866 16.3.1 External Criteria 868 16.3.2 Internal Criteria 873 16.4 Relative Criteria 877 16.4.1 Hard Clustering 880 16.4.2 Fuzzy Clustering 887 16.5 Validity of Individual Clusters 893 16.5.1 External Criteria 894 16.5.2 Internal Criteria 894 16.6 Clustering Tendency 896 16.6.1 Tests for Spatial Randomness 900 16.7 Problems 905 References 909
650 _aPattern recognition
650 _aBiometrics
650 _aBioinformatics
650 _aDynamic programming
942 _cL2C2
999 _c3612
_d3612