How good is the parser recognize_bottomup.pl?

It's pretty easy to understand --- but performance-wise it's awful. Our English grammar is pretty basic --- but the query

?- recognize_bottomup([mia,knew,vincent,shot,marsellus]).

will make Sicstus Prolog hesitate, and

?- recognize_bottomup([jules,believed,the,robber,who,shot,marsellus,fell]).

is painfully slow.

In fact, though this parser is so bad, it's useful --- it acts as a grim warning of how badly things can be done. Moreover, it's useful to try and understand why it is so awful. For it is bad in two quite distinct ways: at the level of implementation and at the level of algorithm.

Implementation

We made heavy use of append/3 to subdivide lists into the required pattern. This is very inefficient. If you examine a trace, you will see the the program spends most of its time trying out different ways of putting lists together. The time it devotes to the key recognition steps is comparatively small.

It's worth emphasizing that this implementation inefficiency has nothing to do with the basic idea of bottom up recognition/parsing. For a start, there are many nice Prolog implementations of fairly simple bottom up parsers --- in particular, what are known as shift-reduce parsers --- that are much better than this. Moreover, we will see that if we implement naive top down parsing/recognition using append/3, the result would be just as inefficient as what we have seen now. On the other hand, if we use difference lists we'll see the result is a lot faster.

Algorithmic Problems

There is, however, a deeper problem. What we have discussed today is a naive bottom up recognizer --- but its naivete has nothing to do with its implementation.

In what deeper sense is our bottom up recognizer inefficient? It has an inefficiency you will find in many different kinds of parsers/recognizers (top down, left corner ...) namely, the algorithm needlessly repeats work.

Consider the sentence ``The robber knew Vincent shot Marsellus''. As we have already mentioned, this sentence is locally ambiguous. In particular, the first part of the string, ``The robber knew Vincent'' is a sentence. Now, our naive bottom up algorithm will find that this string is a sentence --- and then discover that there are more words in the input. Hence, in this case, s is not the right analysis. Thus our parser will backtrack and it will undo and forget all the useful work it has done. For example, while showing that ``The robber knew Vincent'' is a sentence, the parser has to show that ``The robber'' is an np. Great! This is dead right! And we need to know this to get the correct analysis! But then, when it backtracks, it undoes all this work. So when it finally hits on the right way of analyzing ``The robber knew Vincent shot Marsellus'', it will once again demonstrate that ``The robber'' is an np. If lots of backtracking is going on, it may end up showing this simple fact over and over and over again.

It is for this reason that I call the algorithm described above `naive'. The poor implementation is easily corrected but this deeper problem is harder to solve.

Back to lab 2.