Learning
- Basic concepts
- Learning a decision tree (Quinlan's method)
- Rough sets
- General concepts
- Potential applications
- Another look at neural networks based on these concepts
- Components of a learning agent (reading: chapter 18, up to page 531)
- supervised learning
- you (supervisor) know the correct answer
- the agent tries to adjust its prediction method based on the
feedback
- reinforcement learning
- evaluation of action
- there may not be a right answer, but there is a cost associated with
the chosen action
- unsupervised learning
- figure out patterns on your own and use the patterns in some
meaningful way
- Decision trees
- The decision tree given in Fig. 18.4 describes the decisions that
will be made by a person deciding whether to wait for a seat in a restaurant
- You can follow any path and construct a logical sentence that
will determine whether or not the person will wait.
- What type of logic is expressible by the decision tree?
- You cannot express relations between objects in a decision tree, so it is
essentially propositional logic.
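- For instance, one path through the book's tree yields the sentence:
Patrons = Full AND WaitEstimate = 0-10 => WillWait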
- Page 534 shows a table of examples created from the decision tree.
- Generally, we start with the examples and then build the decision
tree.
- Data mining is becoming a very popular area of application
for learning
- Most of the major stores, such as A&P, give you incentives to use
their card
- This data can be processed with today's faster computers
- There is no evidence that they are actually doing anything
significant with the data.
- Given a table, find a decision tree, preferably the simplest one
- Algorithm (page 535):
- choose an attribute that will provide the best classification
of the objects in the table
- The best classification will be the one that corresponds
to minimum entropy. That is, based on the value of the attribute we
can make a decision about the classification
- Generally perfect classification using a single attribute
is not possible
- Comparing Fig. 18.6 (a) versus (b), Patrons clearly seems like the better
of the two.
- So we start the decision tree with Patrons. If the value for
Patrons is Full, we cannot conclude anything.
- So we move on and select the next attribute (Hungry?)
- The process continues; the results are shown in Fig. 18.8 (a sketch of this
loop is given below)
- The tree that was learned was better than the original tree
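- A minimal Python sketch of this top-down loop, assuming each example is a
dict with a 'class' key; choose_attribute is left pluggable (an
information-gain version is checked further below), and all names are
illustrative rather than from the book:

```python
# Sketch of top-down decision-tree induction in the style described above.
# Assumes each example is a dict mapping attribute -> value plus a 'class' key.

def majority_class(examples):
    classes = [e['class'] for e in examples]
    return max(set(classes), key=classes.count)

def learn_tree(examples, attributes, choose_attribute, default=None):
    if not examples:
        return default                   # no examples: use the parent's majority
    if len({e['class'] for e in examples}) == 1:
        return examples[0]['class']      # all examples agree: leaf node
    if not attributes:
        return majority_class(examples)  # attributes exhausted: majority vote
    best = choose_attribute(attributes, examples)   # e.g., highest information gain
    tree = {best: {}}
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = learn_tree(subset, rest, choose_attribute,
                                       majority_class(examples))
    return tree
```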
- Generally, databases are too big for a human to identify the
patterns. Computers, naive as they are, can go through all the
patterns systematically.
- Examples of real-life applications are given on pages 539-540.
- How do we decide which one is the best attribute?
- Information theory:
- The measure used is the information content, also known as entropy:
I(p1, ..., pn) = -sum(pi * log2 pi). For p positive and n negative
examples this is I(p/(p+n), n/(p+n)).
- Let us calculate gain(Patrons) for the 12 examples (6 positive, 6 negative)
- P(none) = 2/12, I(none) = 0 (both examples are negative)
- P(some) = 4/12, I(some) = 0 (all four examples are positive)
- P(full) = 6/12
- we have 6 examples, 2 positive and 4 negative
- 2/6 = 1/3, 4/6 = 2/3
- I(full) = I(1/3, 2/3) = 0.918
- the expected information is called remainder(Patrons)
- P(none)*I(none) + P(some)*I(some) + P(full)*I(full)
- = 0 + 0 + 0.5*0.918
- = 0.459
- gain(Patrons) = original information - remainder(Patrons)
- = I(6/12, 6/12) - 0.459 = 1 - 0.459 = 0.541
- Similarly, gain(Type), calculated on page 541, is shown to be equal to 0
- Choose the attribute with the highest gain (a quick numeric check follows)
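- A small Python check of the numbers above; the counts per branch come from
the notes, the helper names are illustrative:

```python
import math

def entropy(probs):
    # I(p1, ..., pn) = -sum(pi * log2(pi)), skipping zero probabilities
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gain(branches, pos, neg):
    # branches: one (positive, negative) count per attribute value
    total = pos + neg
    before = entropy([pos / total, neg / total])
    remainder = sum((p + n) / total * entropy([p / (p + n), n / (p + n)])
                    for p, n in branches)
    return before - remainder

# Patrons splits the 12 examples into none (0+, 2-), some (4+, 0-), full (2+, 4-)
print(gain([(0, 2), (4, 0), (2, 4)], 6, 6))   # ~0.541
```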
- We can use other statistical measures to identify irrelevant
attributes. Read pages 542-543.
- Decision trees can be exponential in size, so they may not always be the
best knowledge representation
- Rough Sets, Pawlak (1984)
Id | Headache | Muscle-pain | Temp.    | Flu
p1 | no       | yes         | high     | yes
p2 | yes      | no          | high     | yes
p3 | yes      | yes         | veryhigh | yes
p4 | no       | yes         | normal   | no
p5 | yes      | no          | high     | no
p6 | no       | yes         | veryhigh | yes
- We want to determine rules that will tell us whether the patient
has the flu, depending on the attributes Headache, Muscle-pain, and Temp.
- You can construct an equivalence relation using the evidence
attributes. If two objects have exactly the same values for all
the attributes, then they are in the same equivalence class.
- set for Flu = {p1, p2, p3, p6}
- notFlu = {p4, p5}
- lower(Flu) = {p1, p3, p6}
- upper(Flu) = {p1, p2, p3, p5, p6}
- lower(notFlu) = {p4}
- upper(notFlu) = {p2, p4, p5}
- p2 and p5 have identical attribute values but different Flu outcomes,
so they appear only in the upper approximations
- These upper and lower bounds can be used to determine IF...THEN
type of rules
- We can also use different probability cut-offs to eliminate
noise in the data
- A reduct allows you to choose only the relevant attributes (checked in the
sketch below)
- verify that using only Headache and Temp. may be sufficient
to get the same results
- similarly, using only Muscle-pain and Temp. may be sufficient
to get the same results
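- A minimal rough-set sketch in Python for the table above; the reduct check
at the end confirms the two sufficient attribute pairs (function names are
illustrative):

```python
from itertools import combinations

# Rough-set sketch for the flu table above; attribute and patient names
# follow the notes, function names are illustrative.
TABLE = {
    'p1': {'headache': 'no',  'muscle_pain': 'yes', 'temp': 'high',     'flu': 'yes'},
    'p2': {'headache': 'yes', 'muscle_pain': 'no',  'temp': 'high',     'flu': 'yes'},
    'p3': {'headache': 'yes', 'muscle_pain': 'yes', 'temp': 'veryhigh', 'flu': 'yes'},
    'p4': {'headache': 'no',  'muscle_pain': 'yes', 'temp': 'normal',   'flu': 'no'},
    'p5': {'headache': 'yes', 'muscle_pain': 'no',  'temp': 'high',     'flu': 'no'},
    'p6': {'headache': 'no',  'muscle_pain': 'yes', 'temp': 'veryhigh', 'flu': 'yes'},
}

def classes(attrs):
    # Equivalence classes: objects indiscernible on the chosen attributes
    groups = {}
    for obj, row in TABLE.items():
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return list(groups.values())

def lower(target, attrs):
    # Union of classes contained entirely in the target set
    return {o for c in classes(attrs) if c <= target for o in c}

def upper(target, attrs):
    # Union of classes that overlap the target set
    return {o for c in classes(attrs) if c & target for o in c}

flu = {o for o, row in TABLE.items() if row['flu'] == 'yes'}
all_attrs = ['headache', 'muscle_pain', 'temp']
print(lower(flu, all_attrs))   # {'p1', 'p3', 'p6'}
print(upper(flu, all_attrs))   # {'p1', 'p2', 'p3', 'p5', 'p6'}

# Reduct check: a smaller attribute set is sufficient if it induces the
# same lower and upper approximations as the full set.
for subset in combinations(all_attrs, 2):
    if (lower(flu, subset) == lower(flu, all_attrs)
            and upper(flu, subset) == upper(flu, all_attrs)):
        print('sufficient:', subset)   # (headache, temp) and (muscle_pain, temp)
```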
- Assessing the performance of the learning algorithm
- Divide your set of examples into a training set and a test set.
- Develop the strategy using the training set and then apply it
to the test set
- Many times during debugging, the program is fixed based on
how it performed on the test set. The test set gets tainted because
it is then being used to develop the strategy.
- Figure 18.9 shows how the performance changes with the size
of the training set.
- Generally, we will go with the knee-of-the-curve approach
- In the figure, 20 may be a good size for the training set (a sketch of the
split-and-evaluate loop follows).
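- A hedged sketch of the split-and-evaluate procedure; learn and predict
stand in for any learner (e.g., the decision-tree sketch above), and all
names are illustrative:

```python
import random

def evaluate(examples, learn, predict, train_fraction=0.7, seed=0):
    # Shuffle, then split into training and test sets
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]
    # Develop the strategy on the training set only; tuning on the
    # test set would taint it, as noted above
    model = learn(train)
    correct = sum(predict(model, e) == e['class'] for e in test)
    return correct / len(test)   # fraction of the test set predicted correctly
```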
- Pages 546-550: Definition of
- current-best-hypothesis search
- Find a single hypothesis which explains all the positive examples and rules out the negative ones
- incremental learning with new examples
- false positive: the algorithm says positive but in reality it is negative
- specialize
- Figure 18.10 (d & e)
- false negative: the algorithm says negative but in reality it is positive
- generalize (a sketch of this step follows below)
- Figure 18.10 (b & c)
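- A tiny sketch of the generalize step for conjunctive hypotheses
(attribute -> required value; a missing attribute means "don't care"); this
shows only the false-negative half, since the full current-best-hypothesis
search also specializes and can backtrack, and all names are illustrative:

```python
def predict(h, example):
    # Hypothesis holds iff every required attribute value matches
    return all(example.get(a) == v for a, v in h.items())

def generalize(h, positive_example):
    # Drop every condition the (false-negative) positive example violates
    return {a: v for a, v in h.items() if positive_example.get(a) == v}

h = {'patrons': 'some', 'hungry': 'yes'}       # current best hypothesis
fn = {'patrons': 'full', 'hungry': 'yes'}      # positive example h rejects
if not predict(h, fn):                         # false negative detected
    h = generalize(h, fn)
print(h)   # {'hungry': 'yes'} -- the hypothesis became more general
```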
- Least commitment search
- Start with a huge hypothesis space and eliminate individual
hypotheses as we see false positive or negative examples
- The hypothesis space is exponential in general
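- A least-commitment sketch over a deliberately tiny hypothesis space
(conjunctions over two boolean attributes, with None meaning "don't care");
real hypothesis spaces are exponential, which is the drawback noted above,
and all names are illustrative:

```python
from itertools import product

def predict(h, example):
    return all(v is None or example[a] == v for a, v in h.items())

attrs = ['hungry', 'raining']
space = [dict(zip(attrs, vals))
         for vals in product([None, 'yes', 'no'], repeat=len(attrs))]  # 9 hypotheses

observations = [({'hungry': 'yes', 'raining': 'no'}, True),    # positive example
                ({'hungry': 'no',  'raining': 'no'}, False)]   # negative example
for example, label in observations:
    # Eliminate every hypothesis that disagrees with the example
    space = [h for h in space if predict(h, example) == label]
print(space)   # hypotheses still consistent with everything seen
```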
- Neural networks
- Supervised learning
- We know the answer. We try to find a function that will predict
the answer by adjusting the weights
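- A hedged sketch of the weight-adjustment idea using a perceptron-style
rule, one of the simplest supervised updates (all names and numbers are
illustrative):

```python
def train_step(weights, inputs, target, lr=0.1):
    # Predict, then nudge each weight in the direction that reduces the error
    output = 1 if sum(w * x for w, x in zip(weights, inputs)) > 0 else 0
    error = target - output
    return [w + lr * error * x for w, x in zip(weights, inputs)]

# Learn the OR function; the leading 1 in each input acts as a bias input
weights = [0.0, 0.0, 0.0]
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
for _ in range(10):
    for inputs, target in data:
        weights = train_step(weights, inputs, target)
```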
- Unsupervised learning
- Kohonen rule
- Given a set of patterns, separate them into different classes.
The class labels are not known in advance; the Kohonen network will
classify on its own
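- A minimal winner-take-all sketch of the Kohonen idea: the weight vector
closest to the pattern is pulled toward it, so each unit ends up claiming
one cluster (all names and numbers are illustrative):

```python
import math
import random

def kohonen_step(units, pattern, lr=0.2):
    # Find the winning unit (closest weight vector) and move it
    # a fraction lr of the way toward the input pattern
    i = min(range(len(units)), key=lambda j: math.dist(units[j], pattern))
    units[i] = [w + lr * (x - w) for w, x in zip(units[i], pattern)]

random.seed(0)
units = [[random.random(), random.random()] for _ in range(2)]  # two classes
patterns = [[0.1, 0.1], [0.12, 0.08], [0.9, 0.95], [0.88, 0.9]]
for _ in range(50):
    for p in patterns:
        kohonen_step(units, p)
print(units)   # one weight vector settles near each cluster
```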