Summary of Presentation by Prof. John Case on Computational Learning Theory

Summary Prepared by Tony Mendez (additions and comments by Frawley)

The theory of computational learning was addressed by Professor John Case. He described the theory as an account of "catching the drift" kind of learning and contrasted this with conscious learning.
NOTE: If we are algorithmic learning machines, then what are the overall limits and prospects as such machines (i.e., us). The study of these prospects and limits is computational learning theory. To do this work, we first need to distinguish two kinds of learning: intuitive learning (e.g., figuring out a pattern from exposure to instances of data) vs. deliberate learning (e.g., memorization). Case is concerned with the former.
Case began by briefly describing Turing's analysis. In this "clerical computation example," the clerk's brain is in a finite state, as well as his vision and the algorithms, or programs. (See slide 3 for summary of Turing's analysis.)
Note: assuming that algorithms and memory are finite, Turing gives us a definition of what is computable: i.e., what you can compute, how hard it is, and its limits.
The topic of discussion was to explain the method a machine learns a rule.
That is, learn not by memorizing, but by program (= rule). Instances of such learning would be learning a concept from positive and negative examples, learning a grammar from positive examples, or learning a rule for generating sequences of data, if shown sufficiently much.)
For example, consider how a machine would determine a rule for generating infinite sequences of odd numbers from a series of input. Once the series has been entered, the machine forms levels that consist of differences in an attempt to find a constant. For example, the machine would process the following sequence by its differences:

Sequence: 1 3 5 7 9

Differences: 2 2 2 2

Once the machine finds the constant, 2 in this case, it proceeds in continuing the sequence, following the rule it generated. Significantly, the machine does not know if the rule is correct. The machine, M, can be programmed to handle other cases, such as exponents.

Note: these ideas of "knowing" and correctness are important points. It is not clear that the machine knowns the rule in the first place, but it is also not clear that the machine knows it is "correct" in some universal sense. "Know as correct" means "stop testing further hypotheses" or "be in a state such that it does not revise its programs." This is why this state of "knowing" is called "success" rather than "correctness." Question: when humans develop face knowledge from instances, do they reach a point where they stop revising their programs for processing faces? This seems to be true for language.
What are the necessary features of such learning? According to Gold, no learning machine exists which can successfully learn every infinite, computable numerical sequence. No machine exists which can combine other machines. Thus, there is no ultimate general learning machine.
Note: is this vindication of modularity?
Case and Smith, however, did determine that if a machine is allowed to make mistakes or not have each rule consistent with the data (i.e., not be totally successful), it can become a "better learner" ("better learner" means `learning larger sequences'). The success rate is raised to the N+1 power, where N is the number of mistakes. The more mistakes a machine makes, the more number of sequences produced.

Case concluded that the mind can be compared to a machine that does not revise its programs once past a "critical period."

Note the commonalities with our findings about psychobiological development.