First, I have edited my last post to replace the phrase "vector types" with "vector categories". I think the NLP community is more comfortable with the term "category", as in "syntactic category", so the phrase "vector categories" may have a better chance of going viral in that community.
This post is on Wittgenstein and Mentalese. For those not familiar with the term, "mentalese" refers to a hypothetical language of thought --- some form of symbolic representation of propositions and beliefs that is manipulated by the computational process of thought. Mentalese seems related to the concept of an interlingua in machine translation, or to logical forms in linguistics. One interpretation of "semantics" is the relationship between the surface form of language and internal mentalese expressions. Before going any further, here is the first sentence of the lead story at news.google.com as of this writing:
President Obama said Thursday his government "strongly condemns" violence in Egypt, and he is canceling U.S.-Egyptian military exercises that had been scheduled for next month.
What should we take the internal representation of this sentence to be? I have argued in previous posts that semantics should be viewed as a relationship between sentences and a database model of reality. The sentence above presents a new fact (I will assume it is true): that Obama said what he is claimed to have said. Reading this sentence might cause me to update my model of reality. I have also argued that entities can simply be symbolic tokens, such as entity-3687, which occur in relations such as
entity-3687 is an organization
entity-3687 is named "the Muslim Brotherhood".
We might generate a database representation of the above sentence as follows, where I write entity tokens as symbols beginning with *.
*E is an event
*E is the referent of " *Obama said *P "
*E happened on *D
*P states *G strongly condemns *V
*P states *Obama is canceling *Exc
The reader may already be familiar with the entities *Obama, the day Thursday *D, Obama's government *G, the current violence in Egypt *V, and the scheduled exercises *Exc. If these entities are already well established, then perhaps the relations above are all that need to be added to the database. Note that in this case reference resolution drops most of the verbatim wording of the sentence.
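To make this bookkeeping concrete, here is a minimal Python sketch, assuming a database that is nothing more than a set of relation strings: English fragments in which some phrases are replaced by entity tokens. The class RelationDB, the helper add_said_event, and the minted token names (*E1, *P2) are hypothetical and chosen only for illustration. The already-known entities *Obama, *D, *G, *V and *Exc are taken as resolved.

```python
import itertools


class RelationDB:
    """A toy database of relations: English fragments with entity tokens (*...)."""

    def __init__(self):
        self.relations = set()
        self._ids = itertools.count(1)

    def fresh(self, prefix):
        # Mint a new entity token, e.g. "*E1" or "*P2" (hypothetical naming scheme).
        return f"*{prefix}{next(self._ids)}"

    def add(self, relation):
        self.relations.add(relation)

    def about(self, token):
        # Whitespace tokenization keeps "*E1" from matching inside "*E12".
        return sorted(r for r in self.relations if token in r.split())


def add_said_event(db, speaker, day, statements):
    """Record "<speaker> said <P>" as an event whose content P states each item.

    Only the event and the proposition get new tokens; the speaker, the day and
    the entities inside `statements` are assumed to be already resolved.
    """
    event, prop = db.fresh("E"), db.fresh("P")
    db.add(f"{event} is an event")
    db.add(f'{event} is the referent of " {speaker} said {prop} "')
    db.add(f"{event} happened on {day}")
    for s in statements:
        db.add(f"{prop} states {s}")
    return event, prop


db = RelationDB()
event, prop = add_said_event(
    db,
    speaker="*Obama",
    day="*D",
    statements=["*G strongly condemns *V", "*Obama is canceling *Exc"],
)
for r in db.about(prop):
    print(r)
```

Only the event and its propositional content receive fresh tokens; every other entity is reused, so of the original sentence's wording little more than "said", "strongly condemns" and "is canceling" survives in the stored relations.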
I mention Wittgenstein here because of his emphasis on "public language" as opposed to "private language". I like the emphasis on public language because it seems plausible to me that the relations of the database model of reality are actually just fragments of public language (of, say, English) with certain phrases replaced by entity tokens. In this view mentalese is essentially just public language --- there is no mysterious mentalese. This seems consistent with "shallow" machine translation and the apparent lack of a need for an interlingua. Translation can be viewed as paraphrase --- a direct substitution of one surface form for another. Thought itself may take place in a similar manner --- as a direct manipulation of surface forms.
At a panel on language understanding, I heard Chris Manning take the position that dependency parses might be an adequate representation of semantics. I would add reference resolution --- a dependency parse with resolved references seems to go a long way toward meaning.
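A dependency parse with resolved references can be sketched in a few lines of Python. The parse below is hand-written for the clause "Obama is canceling the exercises", with labels in the style of Universal Dependencies, and the word-to-token mapping stands in for the output of a reference resolver. Both are illustrative assumptions, not the output of any real parser or resolver, and the function resolve is hypothetical.

```python
# Hand-written (head, label, dependent) arcs for "Obama is canceling the exercises".
# Words stand in for token positions, which is fine for this short clause.
parse = [
    ("canceling", "nsubj", "Obama"),
    ("canceling", "aux", "is"),
    ("canceling", "obj", "exercises"),
    ("exercises", "det", "the"),
]

# Hypothetical reference-resolution result: noun-phrase heads -> entity tokens.
resolved = {"Obama": "*Obama", "exercises": "*Exc"}


def resolve(parse, resolved):
    """Replace resolved words with entity tokens; drop arcs headed by resolved words.

    Simplification: once a phrase is mapped to an entity token, its internal
    modifiers (here just the determiner) are treated as having served only to
    pick out the referent, so their arcs are discarded.
    """
    out = []
    for head, label, dep in parse:
        if head in resolved:
            continue  # arc internal to a resolved noun phrase
        out.append((head, label, resolved.get(dep, dep)))
    return sorted(out)


print(resolve(parse, resolved))
```

The result keeps the predicate structure (subject, auxiliary, object) while the noun phrases collapse to entity tokens, much like the relations in the database sketch of the Obama sentence.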
Of course there is a large literature on tense, aspect, modality, counterfactuals, and quantification. Rigorous mathematical thought must involve some representation of mathematically rigorous statements. I don't know how to reconcile these things with the idea that mentalese might be just fragments of public language. However I find it plausible that some reconciliation exists --- for example, that precise mathematical thought can be carried out in fragments of English. I certainly take the public language hypothesis seriously.