Algorithm: Applying transformations

For an introduction to these concepts, see: An example of language parsing

The language parsing framework that has been laid out thus far uses a set of transformations. Say this set is of size N, where N is a large number. Perhaps N = 1,000,000.

If our reduction algorithm, that is, the algorithm that reduces a sentence to our internal representation, is O(N^2), then it's not feasible. We don't want to spend on the order of 1,000,000,000,000 (a trillion) operations just to parse a simple sentence.

Even if our algorithm is O(N), we're spending on the order of a million or perhaps a billion operations to parse a simple sentence. This is feasible, but not desirable.

My intuition is that the algorithm should be at most O(log(N)), but more realistically constant time on the size of the transformation set. What this hints at is the use of a hash or tree to index the transformation set.

For example, the input specification:

{person} is {#}

... could be indexed on the first token, followed by the second token, followed by the third token. Determining whether a transformation exists for this input specification, then, involves a lookup in an index.

The implication of this is that the first step in processing a sentence is to break each of its words down into all of the possible entities that they might represent, and then permuting those to create a long list of possible interpretations. This list could be extremely long, perhaps even O(W^5) or O(W^6), and hints at the need to process long sentences in chunks/fragments rather than as a whole. (15^6 = 11,390,625)

One curious fact about language is that people don't interpret statements as a whole, but rather from beginning to end, parsing as we go. When we listen to someone speak, we get served one word at a time and must assemble the meaning iteratively. Perhaps this is a hint to the parsing strategy that should be used. Could it be that human languages, at there very core, are designed to be parsed in an iterative manner from beginning to end?