Saint Mary’s University
Department of Mathematics and Computing Science

Betty on Data Mining
The above cartoon is copyright 2001 by NEA Inc.

CSC 677.1: Intelligent Data Mining
SUMMER 2002
Some of the material on this web site is taken from CS6604, taught by Dr. N. Ramakrishnan.
 

Instructor: Dr. Pawan Lingras

Class time: Tuesday and Thursday 5:30-8:15 p.m.

Brief Description

Data mining has emerged as one of the most exciting and dynamic fields in computing science. The driving force for data mining is the presence of petabyte-scale online archives that potentially contain valuable bits of information hidden in them. Commercial enterprises have been quick to recognize the value of this concept; consequently, within the span of a few years, the software market for data mining alone has grown to the point where it is expected to exceed $10 billion by the end of this year.

Data mining refers to a family of techniques used to detect interesting nuggets of relationships/knowledge in data. While the theoretical underpinnings of the field have been around for quite some time (in the form of pattern recognition, statistics, data analysis and machine learning), the practice and use of these techniques have been largely ad-hoc. With the availability of large databases to store, manage and assimilate data, the new thrust of data mining lies at the intersection of database systems, artificial intelligence and algorithms that efficiently analyze data. The distributed nature of several databases, their size and the high complexity of many techniques present interesting computational challenges.

MBA students are expected to have a statistical background at the level of MSC 207 and to be numerically inclined. Some programming knowledge may be helpful but is not essential. It is hoped that the MBA and CSC students will benefit from each other's special interests. The main platform will be IBM's DB2, Intelligent Miner, and Text Miner software. We will have access to some web logs, as well as data from industry. A company will share its data as well as some open problems that could be tackled during the course.

Textbook

Han, J. and Kamber, M. Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers.

Class Links:

Class Notes
Marks

Tentative list of topics.

Class | Topics | Chapter
1 | Overview, Databases | 1, 2
2 | Data Warehousing and On-Line Analytical Processing, time series analysis, neural networks | 2, 9, 10
3 | Data pre-processing, project team selection | 3
4 | Data mining task analysis, Visiting lecturer | 4-5
5 | Test 1, Clustering | 12
6 | Project first phase, Clustering | 13
7 | Associations and Rule Generation | 6
8 | Associations and Rule Generation | 6
9 | Classification and Prediction | 7
10 | Genetic algorithms, fuzzy sets, rough sets | 9, 10
11 | Test 2, Sequential analysis, Web mining | 9, 10
12 | Project presentations | NA

 

Evaluation scheme

Method of Evaluation | Marks
Class Assignments | 20
Project | 10 + 25 = 35
Two tests (no final exam) | 10 + 20 = 30
In-class learning activities (may include homework to be presented in class) | 15
Total | 100

(You must pass both tests in order to pass the course.)

Assignments
Assignment 1 (5 marks) (Due Date: July 9, 2002)

Assignment 2 (5 marks) (Due Date: July 16, 2002)

Assignment 3 (5 marks) (Due Date: July 30, 2002)

Assignment 4 (5 marks) (Due Date: August 6, 2002)
Project Stage I (10 marks) (Due Date: July 23, 2002)
Project Stage II (25 marks) (Due Date: August 15, 2002)