Saturday, September 28, 2013

Chomsky vs. Hinton

This is a continuation of my "free lunch" post.  I want to provide more context.  I also feel bad about phrasing the discussion in the context of a particular textbook.  The no free lunch theorem is widely quoted and seems to be a standard fixture of the machine learning community.

 I would like to rephrase the issue here as a debate between Chomsky and Hinton.  I will start with caricatures of their positions:

Chomsky:  Generalizing to unseen inputs --- for example judging grammatically on sentences never seen before --- is impossible without some a-priori knowledge of the set of allowed predictors (the set of possible human languages).  Hence we must be born with knowledge of the space of possible human languages --- some form of universal grammar must exist.

Hinton:  Neural networks are a Turing-complete model of computation.  Furthermore, there exists some (perhaps yet to be discovered) universal algorithm for learning networks which can account for leaning in arbitrary domains, including language.

This debate centers on an empirical question --- what algorithms or knowledge is provided by the human genome?

It is important here to distinguish information-theoretic issues from computational complexity issues.

Information-Theoretic Issues: Chomsky takes the position that on purely information theoretic grounds universal grammar must exist. Chomsky's argument is essentially an invocation of the no free lunch theorem.  But the free lunch theorem shows, I think, that there exist information-theoretically adequate universal priors on the set of all computable functions.  On information-theoretic grounds I think Hinton's position is much more defendable than Chomsky's.

Computational Issues: It is clearly silly to consider enumerating all C++ programs as part of a learning algorithm.  But training deep neural networks is not silly --- it has lead to a 30% reduction in word error rate in speech recognition.  Chris Manning is working on deep neural network approaches to learning grammar.

Computational issues do provide good motivations for certain learning algorithms such as the convex optimization used in training an SVM.  But a computational efficiency motivation for a restricted predictor class is different from claiming that for information-theoretic reasons we must be given some restriction to a proper subclass of the computable functions.

There are many Turing complete models of computation and the choice of model does seem to matter. We seem to gain efficiency in programming when we become familiar with the C++ standard library. Somehow the library provides useful general purpose constructs.

We seem to be capable of general purpose thought.  Thought seems related to language. Chomsky might be right that some form of linguistic/cognitive architecture is provided by the human genome. But the adaptive advantage provided by the details of an architecture for general-purpose thought seem likely to be related to computational issues rather than an information-theoretic requirement for a learning bias.

1 comment:

  1. "Neural networks are a Turing-complete model of computation. Furthermore, there exists some (perhaps yet to be discovered) universal algorithm for learning networks which can account for leaning in arbitrary domains, including language."
    One of these algorithms might have been discovered. So far from my testing and work, it seems to meet the criteria for creating language, visual processing, creativity, emotions ( with dopamine); and much more.
    Our cortex is composed of 100K's of cortical cell columns, & except for the motor strip, they all have the same micrograph appearance, 6 levels. The EEG is the same over these, where ever they are, alpha, beta, theta. Cyberneticists believe there is a single, active principle doing the same process in the cortex, according to a Bayesian analysis for predictive values.

    http://anilkseth.wordpress.com/2014/02/03/all-watched-over-by-search-engines-of-loving-grace/#comments
    This might well be the Comparison Process, which so far has met the criteria. It has given endless new insights into understanding the brain, language, and much else.
    You might find it interesting. Google Le chanson sans fin on wordpress.com. The systems works, 1 simple comparison process, and it explains motivations, emotions and much more, simply from a brief look at what makes humor work.
    Herb Wiggins, (ret.); Diplomat Am. Board of Psychiatry/Neuro

    ReplyDelete