Theory, CSE, Semester V

CSPC-515 Data Warehouse and Data Mining

Teaching Scheme: L 3, T 1, P 0

Credit: 4

Marks Distribution

Internal Assessment: Maximum Marks 40, Minimum Marks 16

End Semester Examination: Maximum Marks 60, Minimum Marks 24

Total: Maximum Marks 100, Minimum Marks 40

Duration of End Semester Examination: 3 Hours

Unit-I

Data Warehouse: Introduction to Data Warehouse, Difference between operational database systems and data warehouses, Data Warehouse Characteristics, Data Warehouse Architecture and its Components, Extraction-Transformation-Loading, Logical (Multi-Dimensional) Data Modeling, Schema Design, Star and Snowflake Schema, Fact Constellation, Fact Table, Fully Additive, Semi-Additive and Non-Additive Measures; Factless Facts, Dimension Table Characteristics; OLAP Cube, OLAP Operations, OLAP Server Architecture: ROLAP, MOLAP and HOLAP.

Unit-II

Data Mining: Fundamentals of data mining, Data Mining Functionalities, KDD, Data Mining process, Integration of a Data Mining System with a Database or Data Warehouse System, Major issues in Data Mining.

Data Pre-processing: Need for Data Pre-processing, Steps in data pre-processing: Data Cleaning, Data Integration and Transformation, Data Reduction Techniques, Data Discretization and Concept Hierarchy Generation.

Unit-III

Association Rules: Problem Definition, Frequent Item Set Generation, Association Rule Generation, Apriori Algorithm, The Partition Algorithm.

Classification: Problem Definition, General Approaches to solving a classification problem, Evaluation of Classifiers, Classification techniques: Decision Trees, Naive Bayes Classifier, Bayesian Belief Networks, K-Nearest Neighbor Classification, Algorithm Evaluation Metrics.

Unit-IV

Clustering: Overview of Clustering, Categorization of Major Clustering Methods, Partitioning Methods, Hierarchical Methods.

Advanced Topics and Applications: Web Mining: Types of Web Mining, Web Mining Software. Text Mining: Definition and Importance, Applications (Search Engines, Sentiment Analysis, Spam Filtering).

Real-world Applications: Business Intelligence, Healthcare Analytics, Cybersecurity Threat Detection.
