Saint Mary's University
Department of Mathematics and Computing Science
CSC 677.1: Intelligent Data Mining
SUMMER 2002
Some of the material on this web site is obtained from CS6604 taught by Dr. N. Ramakrishnan.
Instructor: Dr. Pawan Lingras
Class time: Tuesday and Thursday 5:30-8:15 p.m.
Brief Description
Data mining has emerged as one of the most exciting and dynamic fields in computing science. The driving force for data mining is the presence of petabyte-scale online archives that potentially contain valuable bits of information hidden in them. Commercial enterprises have been quick to recognize the value of this concept; consequently, in the span of just a few years, the software market for data mining alone has grown to an expected size of more than $10 billion by the end of this year.
Data mining refers to a family of techniques used to detect interesting nuggets of relationships and knowledge in data. While the theoretical underpinnings of the field have been around for quite some time (in the form of pattern recognition, statistics, data analysis and machine learning), the practice and use of these techniques have been largely ad hoc. With the availability of large databases to store, manage and assimilate data, the new thrust of data mining lies at the intersection of database systems, artificial intelligence and algorithms that efficiently analyze data. The distributed nature of many databases, their size, and the high computational complexity of many techniques present interesting computational challenges.
MBA students are required to have a statistical background at the level of MSC 207 and to be numerically inclined. Some programming knowledge may be helpful but is not essential. It is hoped that the MBA and CSC students will benefit from each other's special interests. The main platform will be IBM's DB2, Intelligent Miner, and Text Miner software. We will have access to some web logs, as well as data from industry. A company will share its data as well as some open problems that could be tackled during the course.
Textbook
Han, J. and Kamber, M. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers.
Class Links:
Tentative list of topics.
Class | Topics | Chapter
1 | Overview, Databases | 1, 2
2 | Data Warehousing and On-Line Analytical Processing, time series analysis, neural networks | 2, 9, 10
3 | Data pre-processing, project team selection | 3
4 | Data mining task analysis, Visiting lecturer | 4-5
5 | Test 1, Clustering | 12
6 | Project first phase, Clustering | 13
7 | Associations and Rule Generation | 6
8 | Associations and Rule Generation | 6
9 | Classification and Prediction | 7
10 | Genetic algorithms, fuzzy sets, rough sets | 9, 10
11 | Test 2, Sequential analysis, Web mining | 9, 10
12 | Project presentations | NA
Evaluation scheme
Method of Evaluation | Marks
Class Assignments | 20
Project | 10+25 = 35
Two tests (no final exam) | 10+20 = 30
In-class learning activities (may include homework to be presented in class) | 15
Total | 100
(You must pass both tests in order to pass the course.)
Assignments
Assignment 1 (5 marks) (Due Date: July 9, 2002)
Assignment 2 (5 marks) (Due Date: July 16, 2002)
Assignment 3 (5 marks) (Due Date: July 30, 2002)
Assignment 4 (5 marks) (Due Date: August 6, 2002)
Project Stage I (10 marks) (Due Date: July 23, 2002)
Project Stage II (25 marks) (Due Date: August 15, 2002)