Data mining : concepts and techniques / (Record no. 3560)

MARC details
000 -LEADER
fixed length control field 16153nam a2200169 4500
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 9789380931913 (pb)
040 ## - CATALOGING SOURCE
Transcribing agency HAN/D
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 005.741
Item number HAN/D
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name Han, Jiawei
245 ## - TITLE STATEMENT
Title Data mining : concepts and techniques /
Statement of responsibility, etc. Jiawei Han, Micheline Kamber, and Jian Pei
250 ## - EDITION STATEMENT
Edition statement 3rd ed.
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc. Amsterdam :
Name of publisher, distributor, etc. Elsevier,
Date of publication, distribution, etc. 2012.
300 ## - PHYSICAL DESCRIPTION
Extent xxxiii, 703 p.
Other physical details ill. ;
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note Chapter 1 Introduction
1.1 Why Data Mining?
1.1.1 Moving toward the Information Age
1.1.2 Data Mining as the Evolution of Information Technology
1.2 What Is Data Mining?
1.3 What Kinds of Data Can Be Mined?
1.3.1 Database Data
1.3.2 Data Warehouses
1.3.3 Transactional Data
1.3.4 Other Kinds of Data
1.4 What Kinds of Patterns Can Be Mined?
1.4.1 Class/Concept Description: Characterization and Discrimination
1.4.2 Mining Frequent Patterns, Associations, and Correlations
1.4.3 Classification and Regression for Predictive Analysis
1.4.4 Cluster Analysis
1.4.5 Outlier Analysis
1.4.6 Are All Patterns Interesting?
1.5 Which Technologies Are Used?
1.5.1 Statistics
1.5.2 Machine Learning
1.5.3 Database Systems and Data Warehouses
1.5.4 Information Retrieval
1.6 Which Kinds of Applications Are Targeted?
1.6.1 Business Intelligence
1.6.2 Web Search Engines
1.7 Major Issues in Data Mining
1.7.1 Mining Methodology
1.7.2 User Interaction
1.7.3 Efficiency and Scalability
1.7.4 Diversity of Database Types
1.7.5 Data Mining and Society
1.8 Summary
1.9 Exercises
1.10 Bibliographic Notes
Chapter 2 Getting to Know Your Data
2.1 Data Objects and Attribute Types
2.1.1 What Is an Attribute?
2.1.2 Nominal Attributes
2.1.3 Binary Attributes
2.1.4 Ordinal Attributes
2.1.5 Numeric Attributes
2.1.6 Discrete versus Continuous Attributes
2.2 Basic Statistical Descriptions of Data
2.2.1 Measuring the Central Tendency: Mean, Median, and Mode
2.2.2 Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range
2.2.3 Graphic Displays of Basic Statistical Descriptions of Data
2.3 Data Visualization
2.3.1 Pixel-Oriented Visualization Techniques
2.3.2 Geometric Projection Visualization Techniques
2.3.3 Icon-Based Visualization Techniques
2.3.4 Hierarchical Visualization Techniques
2.3.5 Visualizing Complex Data and Relations
2.4 Measuring Data Similarity and Dissimilarity
2.4.1 Data Matrix versus Dissimilarity Matrix
2.4.2 Proximity Measures for Nominal Attributes
2.4.3 Proximity Measures for Binary Attributes
2.4.4 Dissimilarity of Numeric Data: Minkowski Distance
2.4.5 Proximity Measures for Ordinal Attributes
2.4.6 Dissimilarity for Attributes of Mixed Types
2.4.7 Cosine Similarity
2.5 Summary
2.6 Exercises
2.7 Bibliographic Notes
Chapter 3 Data Preprocessing
3.1 Data Preprocessing: An Overview
3.1.1 Data Quality: Why Preprocess the Data?
3.1.2 Major Tasks in Data Preprocessing
3.2 Data Cleaning
3.2.1 Missing Values
3.2.2 Noisy Data
3.2.3 Data Cleaning as a Process
3.3 Data Integration
3.3.1 Entity Identification Problem
3.3.2 Redundancy and Correlation Analysis
3.3.3 Tuple Duplication
3.3.4 Data Value Conflict Detection and Resolution
3.4 Data Reduction
3.4.1 Overview of Data Reduction Strategies
3.4.2 Wavelet Transforms
3.4.3 Principal Components Analysis
3.4.4 Attribute Subset Selection
3.4.5 Regression and Log-Linear Models: Parametric Data Reduction
3.4.6 Histograms
3.4.7 Clustering
3.4.8 Sampling
3.4.9 Data Cube Aggregation
3.5 Data Transformation and Data Discretization
3.5.1 Data Transformation Strategies Overview
3.5.2 Data Transformation by Normalization
3.5.3 Discretization by Binning
3.5.4 Discretization by Histogram Analysis
3.5.5 Discretization by Cluster, Decision Tree, and Correlation Analyses
3.5.6 Concept Hierarchy Generation for Nominal Data
3.6 Summary
3.7 Exercises
3.8 Bibliographic Notes
Chapter 4 Data Warehousing and Online Analytical Processing
4.1 Data Warehouse: Basic Concepts
4.1.1 What Is a Data Warehouse?
4.1.2 Differences between Operational Database Systems and Data Warehouses
4.1.3 But Why Have a Separate Data Warehouse?
4.1.4 Data Warehousing: A Multitiered Architecture
4.1.5 Data Warehouse Models: Enterprise Warehouse, Data Mart, and Virtual Warehouse
4.1.6 Extraction, Transformation, and Loading
4.1.7 Metadata Repository
4.2 Data Warehouse Modeling: Data Cube and OLAP
4.2.1 Data Cube: A Multidimensional Data Model
4.2.2 Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Data Models
4.2.3 Dimensions: The Role of Concept Hierarchies
4.2.4 Measures: Their Categorization and Computation
4.2.5 Typical OLAP Operations
4.2.6 A Starnet Query Model for Querying Multidimensional Databases
4.3 Data Warehouse Design and Usage
4.3.1 A Business Analysis Framework for Data Warehouse Design
4.3.2 Data Warehouse Design Process
4.3.3 Data Warehouse Usage for Information Processing
4.3.4 From Online Analytical Processing to Multidimensional Data Mining
4.4 Data Warehouse Implementation
4.4.1 Efficient Data Cube Computation: An Overview
4.4.2 Indexing OLAP Data: Bitmap Index and Join Index
4.4.3 Efficient Processing of OLAP Queries
4.4.4 OLAP Server Architectures: ROLAP versus MOLAP versus HOLAP
4.5 Data Generalization by Attribute-Oriented Induction
4.5.1 Attribute-Oriented Induction for Data Characterization
4.5.2 Efficient Implementation of Attribute-Oriented Induction
4.5.3 Attribute-Oriented Induction for Class Comparisons
4.6 Summary
4.7 Exercises
4.8 Bibliographic Notes
Chapter 5 Data Cube Technology
5.1 Data Cube Computation: Preliminary Concepts
5.1.1 Cube Materialization: Full Cube, Iceberg Cube, Closed Cube, and Cube Shell
5.1.2 General Strategies for Data Cube Computation
5.2 Data Cube Computation Methods
5.2.1 Multiway Array Aggregation for Full Cube Computation
5.2.2 BUC: Computing Iceberg Cubes from the Apex Cuboid Downward
5.2.3 Star-Cubing: Computing Iceberg Cubes Using a Dynamic Star-Tree Structure
5.2.4 Precomputing Shell Fragments for Fast High-Dimensional OLAP
5.3 Processing Advanced Kinds of Queries by Exploring Cube Technology
5.3.1 Sampling Cubes: OLAP-Based Mining on Sampling Data
5.3.2 Ranking Cubes: Efficient Computation of Top-k Queries
5.4 Multidimensional Data Analysis in Cube Space
5.4.1 Prediction Cubes: Prediction Mining in Cube Space
5.4.2 Multifeature Cubes: Complex Aggregation at Multiple Granularities
5.4.3 Exception-Based, Discovery-Driven Cube Space Exploration
5.5 Summary
5.6 Exercises
5.7 Bibliographic Notes
Chapter 6 Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods
6.1 Basic Concepts
6.1.1 Market Basket Analysis: A Motivating Example
6.1.2 Frequent Itemsets, Closed Itemsets, and Association Rules
6.2 Frequent Itemset Mining Methods
6.2.1 Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation
6.2.2 Generating Association Rules from Frequent Itemsets
6.2.3 Improving the Efficiency of Apriori
6.2.4 A Pattern-Growth Approach for Mining Frequent Itemsets
6.2.5 Mining Frequent Itemsets Using Vertical Data Format
6.2.6 Mining Closed and Max Patterns
6.3 Which Patterns Are Interesting? Pattern Evaluation Methods
6.3.1 Strong Rules Are Not Necessarily Interesting
6.3.2 From Association Analysis to Correlation Analysis
6.3.3 A Comparison of Pattern Evaluation Measures
6.4 Summary
6.5 Exercises
6.6 Bibliographic Notes
Chapter 7 Advanced Pattern Mining
7.1 Pattern Mining: A Road Map
7.2 Pattern Mining in Multilevel, Multidimensional Space
7.2.1 Mining Multilevel Associations
7.2.2 Mining Multidimensional Associations
7.2.3 Mining Quantitative Association Rules
7.2.4 Mining Rare Patterns and Negative Patterns
7.3 Constraint-Based Frequent Pattern Mining
7.3.1 Metarule-Guided Mining of Association Rules
7.3.2 Constraint-Based Pattern Generation: Pruning Pattern Space and Pruning Data Space
7.4 Mining High-Dimensional Data and Colossal Patterns
7.4.1 Mining Colossal Patterns by Pattern-Fusion
7.5 Mining Compressed or Approximate Patterns
7.5.1 Mining Compressed Patterns by Pattern Clustering
7.5.2 Extracting Redundancy-Aware Top-k Patterns
7.6 Pattern Exploration and Application
7.6.1 Semantic Annotation of Frequent Patterns
7.6.2 Applications of Pattern Mining
7.7 Summary
7.8 Exercises
7.9 Bibliographic Notes
Chapter 8 Classification: Basic Concepts
8.1 Basic Concepts
8.1.1 What Is Classification?
8.1.2 General Approach to Classification
8.2 Decision Tree Induction
8.2.1 Decision Tree Induction
8.2.2 Attribute Selection Measures
8.2.3 Tree Pruning
8.2.4 Scalability and Decision Tree Induction
8.2.5 Visual Mining for Decision Tree Induction
8.3 Bayes Classification Methods
8.3.1 Bayes' Theorem
8.3.2 Naive Bayesian Classification
8.4 Rule-Based Classification
8.4.1 Using IF-THEN Rules for Classification
8.4.2 Rule Extraction from a Decision Tree
8.4.3 Rule Induction Using a Sequential Covering Algorithm
8.5 Model Evaluation and Selection
8.5.1 Metrics for Evaluating Classifier Performance
8.5.2 Holdout Method and Random Subsampling
8.5.3 Cross-Validation
8.5.4 Bootstrap
8.5.5 Model Selection Using Statistical Tests of Significance
8.5.6 Comparing Classifiers Based on Cost-Benefit and ROC Curves
8.6 Techniques to Improve Classification Accuracy
8.6.1 Introducing Ensemble Methods
8.6.2 Bagging
8.6.3 Boosting and AdaBoost
8.6.4 Random Forests
8.6.5 Improving Classification Accuracy of Class-Imbalanced Data
8.7 Summary
8.8 Exercises
8.9 Bibliographic Notes
Chapter 9 Classification: Advanced Methods
9.1 Bayesian Belief Networks
9.1.1 Concepts and Mechanisms
9.1.2 Training Bayesian Belief Networks
9.2 Classification by Backpropagation
9.2.1 A Multilayer Feed-Forward Neural Network
9.2.2 Defining a Network Topology
9.2.3 Backpropagation
9.2.4 Inside the Black Box: Backpropagation and Interpretability
9.3 Support Vector Machines
9.3.1 The Case When the Data Are Linearly Separable
9.3.2 The Case When the Data Are Linearly Inseparable
9.4 Classification Using Frequent Patterns
9.4.1 Associative Classification
9.4.2 Discriminative Frequent Pattern-Based Classification
9.5 Lazy Learners (or Learning from Your Neighbors)
9.5.1 k-Nearest-Neighbor Classifiers
9.5.2 Case-Based Reasoning
9.6 Other Classification Methods
9.6.1 Genetic Algorithms
9.6.2 Rough Set Approach
9.6.3 Fuzzy Set Approaches
9.7 Additional Topics Regarding Classification
9.7.1 Multiclass Classification
9.7.2 Semi-Supervised Classification
9.7.3 Active Learning
9.7.4 Transfer Learning
9.8 Summary
9.9 Exercises
9.10 Bibliographic Notes
Chapter 10 Cluster Analysis: Basic Concepts and Methods
10.1 Cluster Analysis
10.1.1 What Is Cluster Analysis?
10.1.2 Requirements for Cluster Analysis
10.1.3 Overview of Basic Clustering Methods
10.2 Partitioning Methods
10.2.1 k-Means: A Centroid-Based Technique
10.2.2 k-Medoids: A Representative Object-Based Technique
10.3 Hierarchical Methods
10.3.1 Agglomerative versus Divisive Hierarchical Clustering
10.3.2 Distance Measures in Algorithmic Methods
10.3.3 BIRCH: Multiphase Hierarchical Clustering Using Clustering Feature Trees
10.3.4 Chameleon: Multiphase Hierarchical Clustering Using Dynamic Modeling
10.3.5 Probabilistic Hierarchical Clustering
10.4 Density-Based Methods
10.4.1 DBSCAN: Density-Based Clustering Based on Connected Regions with High Density
10.4.2 OPTICS: Ordering Points to Identify the Clustering Structure
10.4.3 DENCLUE: Clustering Based on Density Distribution Functions
10.5 Grid-Based Methods
10.5.1 STING: STatistical INformation Grid
10.5.2 CLIQUE: An Apriori-like Subspace Clustering Method
10.6 Evaluation of Clustering
10.6.1 Assessing Clustering Tendency
10.6.2 Determining the Number of Clusters
10.6.3 Measuring Clustering Quality
10.7 Summary
10.8 Exercises
10.9 Bibliographic Notes
Chapter 11 Advanced Cluster Analysis
11.1 Probabilistic Model-Based Clustering
11.1.1 Fuzzy Clusters
11.1.2 Probabilistic Model-Based Clusters
11.1.3 Expectation-Maximization Algorithm
11.2 Clustering High-Dimensional Data
11.2.1 Clustering High-Dimensional Data: Problems, Challenges, and Major Methodologies
11.2.2 Subspace Clustering Methods
11.2.3 Biclustering
11.2.4 Dimensionality Reduction Methods and Spectral Clustering
11.3 Clustering Graph and Network Data
11.3.1 Applications and Challenges
11.3.2 Similarity Measures
11.3.3 Graph Clustering Methods
11.4 Clustering with Constraints
11.4.1 Categorization of Constraints
11.4.2 Methods for Clustering with Constraints
11.5 Summary
11.6 Exercises
11.7 Bibliographic Notes
Chapter 12 Outlier Detection
12.1 Outliers and Outlier Analysis
12.1.1 What Are Outliers?
12.1.2 Types of Outliers
12.1.3 Challenges of Outlier Detection
12.2 Outlier Detection Methods
12.2.1 Supervised, Semi-Supervised, and Unsupervised Methods
12.2.2 Statistical Methods, Proximity-Based Methods, and Clustering-Based Methods
12.3 Statistical Approaches
12.3.1 Parametric Methods
12.3.2 Nonparametric Methods
12.4 Proximity-Based Approaches
12.4.1 Distance-Based Outlier Detection and a Nested Loop Method
12.4.2 A Grid-Based Method
12.4.3 Density-Based Outlier Detection
12.5 Clustering-Based Approaches
12.6 Classification-Based Approaches
12.7 Mining Contextual and Collective Outliers
12.7.1 Transforming Contextual Outlier Detection to Conventional Outlier Detection
12.7.2 Modeling Normal Behavior with Respect to Contexts
12.7.3 Mining Collective Outliers
12.8 Outlier Detection in High-Dimensional Data
12.8.1 Extending Conventional Outlier Detection
12.8.2 Finding Outliers in Subspaces
12.8.3 Modeling High-Dimensional Outliers
12.9 Summary
12.10 Exercises
12.11 Bibliographic Notes
Chapter 13 Data Mining Trends and Research Frontiers
13.1 Mining Complex Data Types
13.1.1 Mining Sequence Data: Time-Series, Symbolic Sequences, and Biological Sequences
13.1.2 Mining Graphs and Networks
13.1.3 Mining Other Kinds of Data
13.2 Other Methodologies of Data Mining
13.2.1 Statistical Data Mining
13.2.2 Views on Data Mining Foundations
13.2.3 Visual and Audio Data Mining
13.3 Data Mining Applications
13.3.1 Data Mining for Financial Data Analysis
13.3.2 Data Mining for Retail and Telecommunication Industries
13.3.3 Data Mining in Science and Engineering
13.3.4 Data Mining for Intrusion Detection and Prevention
13.3.5 Data Mining and Recommender Systems
13.4 Data Mining and Society
13.4.1 Ubiquitous and Invisible Data Mining
13.4.2 Privacy, Security, and Social Impacts of Data Mining
13.5 Data Mining Trends
13.6 Summary
13.7 Exercises
13.8 Bibliographic Notes
Bibliography
Index
650 ## - SUBJECT
Keyword Computer Programming
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Koha item type GN Books
Holdings
Home library Central Library, Sikkim University
Current library Central Library, Sikkim University
Shelving location General Book Section
Date acquired 24/06/2016
Full call number 005.741 HAN/D
Accession number P35920
Date last seen 24/06/2016
Koha item type General Books