Theodoridis, Sergios

Pattern Recognition / Sergios Theodoridis and Konstantinos Koutroumbas. - 4th ed. - Amsterdam: Elsevier, 2009. - xvii, 961 p. : ill.

2.5.1 Maximum Likelihood Parameter Estimation 34
2.5.2 Maximum a Posteriori Probability Estimation 38
2.5.3 Bayesian Inference 39
2.5.4 Maximum Entropy Estimation 43
2.5.5 Mixture Models 44
2.5.6 Nonparametric Estimation 49
2.5.7 The Naive-Bayes Classifier 59
2.6 The Nearest Neighbor Rule 61
2.7 Bayesian Networks 64
2.8 Problems 71
References 86
CHAPTER 3 Linear Classifiers 91
3.1 Introduction 91
3.2 Linear Discriminant Functions and Decision
Hyperplanes 91
3.3 The Perceptron Algorithm 93
3.4 Least Squares Methods 103
3.4.1 Mean Square Error Estimation 103
3.4.2 Stochastic Approximation and the LMS Algorithm 105
3.4.3 Sum of Error Squares Estimation 108
3.5 Mean Square Estimation Revisited 110
3.5.1 Mean Square Error Regression 110
3.5.2 MSE Estimates Posterior Class Probabilities 112
3.5.3 The Bias-Variance Dilemma 114
3.6 Logistic Discrimination 117
3.7 Support Vector Machines 119
3.7.1 Separable Classes 119
3.7.2 Nonseparable Classes 124
3.7.3 The Multiclass Case 127
3.7.4 ν-SVM 133
3.7.5 Support Vector Machines: A Geometric
Viewpoint 136
3.7.6 Reduced Convex Hulls 138
3.8 Problems 142
References 147
CHAPTER 4 Nonlinear Classifiers 151
4.1 Introduction 151
4.2 The XOR Problem 151
4.3 The Two-Layer Perceptron 153
4.3.1 Classification Capabilities of the Two-Layer
Perceptron 156
4.4 Three-Layer Perceptrons 158
4.5 Algorithms Based on Exact Classification of the
Training Set 160
4.6 The Backpropagation Algorithm 162
4.7 Variations on the Backpropagation Theme 169
4.8 The Cost Function Choice 172
4.9 Choice of the Network Size 176
4.10 A Simulation Example 181
4.11 Networks with Weight Sharing 183
4.12 Generalized Linear Classifiers 185
4.13 Capacity of the l-Dimensional Space in
Linear Dichotomies
4.14 Polynomial Classifiers 189
4.15 Radial Basis Function Networks 190
4.16 Universal Approximators 191
4.17 Probabilistic Neural Networks 196
4.18 Support Vector Machines: The Nonlinear Case 198
4.19 Beyond the SVM Paradigm 203
4.19.1 Expansion in Kernel Functions and Model
Sparsification 205
4.19.2 Robust Statistics Regression 211
4.20 Decision Trees 215
4.20.1 Set of Questions 218
4.20.2 Splitting Criterion 218
4.20.3 Stop-Splitting Rule 219
4.20.4 Class Assignment Rule 219
4.21 Combining Classifiers 222
4.21.1 Geometric Average Rule 223
4.21.2 Arithmetic Average Rule 224
4.21.3 Majority Voting Rule 225
4.21.4 A Bayesian Viewpoint 227
4.22 The Boosting Approach to Combine Classifiers 230
4.23 The Class Imbalance Problem 237
4.24 Discussion 239
4.25 Problems 240
References 249
CHAPTER 5 Feature Selection 261
5.1 Introduction 261
5.2 Preprocessing 262
5.2.1 Outlier Removal 262
5.2.2 Data Normalization 263
5.2.3 Missing Data 263
5.3 The Peaking Phenomenon 265
5.4 Feature Selection Based on Statistical
Hypothesis Testing 268
5.4.1 Hypothesis Testing Basics 268
5.4.2 Application of the t-Test in Feature Selection 273
5.5 The Receiver Operating Characteristics (ROC) Curve 275
5.6 Class Separability Measures 276
5.6.1 Divergence 276
5.6.2 Chernoff Bound and Bhattacharyya Distance 278
5.6.3 Scatter Matrices 280
5.7 Feature Subset Selection 283
5.7.1 Scalar Feature Selection 283
5.7.2 Feature Vector Selection 284
5.8 Optimal Feature Generation 288
5.9 Neural Networks and Feature Generation/Selection 298
5.10 A Hint On Generalization Theory 299
5.11 The Bayesian Information Criterion 309
5.12 Problems 311
References 318
CHAPTER 6 Feature Generation I: Data Transformation and
Dimensionality Reduction 323
6.1 Introduction 323
6.2 Basis Vectors and Images 324
6.3 The Karhunen-Loeve Transform 326
6.4 The Singular Value Decomposition 335
6.5 Independent Component Analysis 342
6.5.1 ICA Based on Second- and Fourth-Order
Cumulants 344
6.5.2 ICA Based on Mutual Information 345
6.5.3 An ICA Simulation Example 348
6.6 Nonnegative Matrix Factorization 349
6.7 Nonlinear Dimensionality Reduction 350
6.7.1 Kernel PCA 351
6.7.2 Graph-Based Methods 353
6.8 The Discrete Fourier Transform (DFT) 363
6.8.1 One-Dimensional DFT 364
6.8.2 Two-Dimensional DFT 366
6.9 The Discrete Cosine and Sine Transforms 366
6.10 The Hadamard Transform 368
6.11 The Haar Transform 369
6.12 The Haar Expansion Revisited 371
6.13 Discrete Time Wavelet Transform (DTWT) 375
6.14 The Multiresolution Interpretation 384
6.15 Wavelet Packets 387
6.16 A Look at Two-Dimensional Generalizations 388
6.17 Applications
6.18 Problems
References
CHAPTER 7 Feature Generation II 411
7.1 Introduction 411
7.2 Regional Features 412
7.2.1 Features for Texture Characterization 412
7.2.2 Local Linear Transforms for Texture Feature
Extraction
7.2.3 Moments
7.2.4 Parametric Models
7.3 Features for Shape and Size Characterization 435
7.3.1 Fourier Features 436
7.3.2 Chain Codes 439
7.3.3 Moment-Based Features 441
7.3.4 Geometric Features 442
7.4 A Glimpse at Fractals 444
7.4.1 Self-Similarity and Fractal Dimension 444
7.4.2 Fractional Brownian Motion 446
7.5 Typical Features for Speech and Audio Classification 451
7.5.1 Short-Time Processing of Signals 452
7.5.2 Cepstrum 455
7.5.3 The Mel-Cepstrum 457
7.5.4 Spectral Features 460
7.5.5 Time Domain Features 462
7.5.6 An Example 463
7.6 Problems 466
References 473
CHAPTER 8 Template Matching 481
8.1 Introduction 481
8.2 Measures Based on Optimal Path Searching Techniques 482
8.2.1 Bellman's Optimality Principle and Dynamic
Programming 484
8.2.2 The Edit Distance 487
8.2.3 Dynamic Time Warping in Speech Recognition 491
8.3 Measures Based on Correlations 498
8.4 Deformable Template Models 504
8.5 Content-Based Information Retrieval:
Relevance Feedback 508
8.6 Problems 513
References
CHAPTER 9 Context-Dependent Classification 521
9.1 Introduction 521
9.2 The Bayes Classifier 521
9.3 Markov Chain Models 522
9.4 The Viterbi Algorithm 523
9.5 Channel Equalization 527
9.6 Hidden Markov Models 532
9.7 HMM with State Duration Modeling 545
9.8 Training Markov Models via Neural Networks 552
9.9 A Discussion of Markov Random Fields 554
9.10 Problems 556
References 560
CHAPTER 10 Supervised Learning: The Epilogue 567
10.1 Introduction 567
10.2 Error-Counting Approach 568
10.3 Exploiting the Finite Size of the Data Set 569
10.4 A Case Study from Medical Imaging 573
10.5 Semi-Supervised Learning 577
10.5.1 Generative Models 579
10.5.2 Graph-Based Methods 582
10.5.3 Transductive Support Vector Machines 586
10.6 Problems 590
References 591
CHAPTER 11 Clustering: Basic Concepts 595
11.1 Introduction 595
11.1.1 Applications of Cluster Analysis 598
11.1.2 Types of Features 599
11.1.3 Definitions of Clustering 600
11.2 Proximity Measures 602
11.2.1 Definitions 602
11.2.2 Proximity Measures between Two Points 604
11.2.3 Proximity Functions between a Point and a Set 616
11.2.4 Proximity Functions between Two Sets 620
11.3 Problems 622
References 624
CHAPTER 12 Clustering Algorithms I: Sequential Algorithms 627
12.1 Introduction 627
12.1.1 Number of Possible Clusterings 627
12.2 Categories of Clustering Algorithms 629
12.3 Sequential Clustering Algorithms 633
12.3.1 Estimation of the Number of Clusters 635
12.4 A Modification of BSAS 637
12.5 A Two-Threshold Sequential Scheme 638
12.6 Refinement Stages 641
12.7 Neural Network Implementation 643
12.7.1 Description of the Architecture 643
12.7.2 Implementation of the BSAS Algorithm 644
12.8 Problems 646
References 650
CHAPTER 13 Clustering Algorithms II: Hierarchical Algorithms 653
13.1 Introduction 653
13.2 Agglomerative Algorithms 654
13.2.1 Definition of Some Useful Quantities 655
13.2.2 Agglomerative Algorithms Based on Matrix Theory 658
13.2.3 Monotonicity and Crossover 664
13.2.4 Implementational Issues 667
13.2.5 Agglomerative Algorithms Based on Graph Theory 667
13.2.6 Ties in the Proximity Matrix 676
13.3 The Cophenetic Matrix 679
13.4 Divisive Algorithms 680
13.5 Hierarchical Algorithms for Large Data Sets 682
13.6 Choice of the Best Number of Clusters 690
13.7 Problems 693
References 697
CHAPTER 14 Clustering Algorithms III: Schemes Based on
Function Optimization 701
14.1 Introduction 701
14.2 Mixture Decomposition Schemes 703
14.2.1 Compact and Hyperellipsoidal Clusters 705
14.2.2 A Geometrical Interpretation 709
14.3 Fuzzy Clustering Algorithms 712
14.3.1 Point Representatives 716
14.3.2 Quadric Surfaces as Representatives 718
14.3.3 Hyperplane Representatives 728
14.3.4 Combining Quadric and Hyperplane
Representatives 731
14.3.5 A Geometrical Interpretation 732
14.3.6 Convergence Aspects of the Fuzzy Clustering
Algorithms 732
14.3.7 Alternating Cluster Estimation 733
14.4 Possibilistic Clustering 733
14.4.1 The Mode-Seeking Property 737
14.4.2 An Alternative Possibilistic Scheme 739
14.5 Hard Clustering Algorithms 739
14.5.1 The Isodata or k-Means or c-Means Algorithm 741
14.5.2 k-Medoids Algorithms 745
14.6 Vector Quantization 749
14.7 Problems 752
References 758
CHAPTER 15 Clustering Algorithms IV 765
15.1 Introduction 765
15.2 Clustering Algorithms Based on Graph Theory 765
15.2.1 Minimum Spanning Tree Algorithms 766
15.2.2 Algorithms Based on Regions of Influence 768
15.2.3 Algorithms Based on Directed Trees 770
15.2.4 Spectral Clustering 772
15.3 Competitive Learning Algorithms 780
15.3.1 Basic Competitive Learning Algorithm 782
15.3.2 Leaky Learning Algorithm 783
15.3.3 Conscientious Competitive Learning Algorithms 784
15.3.4 Competitive Learning-Like Algorithms Associated
with Cost Functions 785
15.3.5 Self-Organizing Maps 786
15.3.6 Supervised Learning Vector Quantization 788
15.4 Binary Morphology Clustering Algorithms (BMCAs) 789
15.4.1 Discretization 790
15.4.2 Morphological Operations 791
15.4.3 Determination of the Clusters in a Discrete Binary
Set 794
15.4.4 Assignment of Feature Vectors to Clusters 795
15.4.5 The Algorithmic Scheme 796
15.5 Boundary Detection Algorithms 798
15.6 Valley-Seeking Clustering Algorithms 801
15.7 Clustering via Cost Optimization (Revisited) 803
15.7.1 Branch and Bound Clustering Algorithms 803
15.7.2 Simulated Annealing 807
15.7.3 Deterministic Annealing 808
15.7.4 Clustering Using Genetic Algorithms 810
15.8 Kernel Clustering Methods 811
15.9 Density-Based Algorithms for Large Data Sets 815
15.9.1 The DBSCAN Algorithm 815
15.9.2 The DBCLASD Algorithm 818
15.9.3 The DENCLUE Algorithm 819
15.10 Clustering Algorithms for High-Dimensional Data Sets 821
15.10.1 Dimensionality Reduction Clustering Approach 822
15.10.2 Subspace Clustering Approach 824
15.11 Other Clustering Algorithms 837
15.12 Combination of Clusterings 839
15.13 Problems 846
References 852
CHAPTER 16 Cluster Validity 863
16.1 Introduction 863
16.2 Hypothesis Testing Revisited 864
16.3 Hypothesis Testing in Cluster Validity 866
16.3.1 External Criteria 868
16.3.2 Internal Criteria 873
16.4 Relative Criteria 877
16.4.1 Hard Clustering 880
16.4.2 Fuzzy Clustering 887
16.5 Validity of Individual Clusters 893
16.5.1 External Criteria 894
16.5.2 Internal Criteria 894
16.6 Clustering Tendency 896
16.6.1 Tests for Spatial Randomness 900
16.7 Problems 905
References 909

ISBN 9789380931623


Pattern recognition
Biometrics
Bioinformatics
Dynamic programming

006.4 / THE/P