Predictive Head-Corner Chart Parsing

Head-corner (HC) parsing came up in computational linguistics a few years ago, motivated by linguistic arguments. The idea is a heuristic rather than a fail-safe principle, so it is important to consider the worst-case behaviour of an HC parser. We define a novel predictive head-corner chart parser with cubic time complexity. We start with a left-corner (LC) chart parser, which is easier to understand, and subsequently generalize it to an HC chart parser. Finally, we briefly sketch how the parser can be enhanced with feature structures.


"Our Latin teachers were apparently right" , Mar tin Kay remarks in (Kay, 1989). "You should start [parsing] with the main verb. This will tell you what kinds of subjects and objects to look for and what cases they will be in. When you come to look for these, you should also start by trying to find the main word, because this will tell you most about what else to look for".
Head-driven or head-corner parsing has been addressed in several papers (Proudian and Pollard, 1985; Kay, 1989; Satta and Stock, 1989; van Noord, 1991; Bouma and van Noord, 1993). As the head-driven approach is a heuristic rather than a fail-safe principle, it is important to pay attention to its worst-case behaviour. This is best taken care of in a tabular approach, such as the bottom-up head-driven parser by Satta and Stock. We enhance the tabular head-driven parser with top-down prediction.
The algorithmic details of the head-corner parser are intricate. Therefore we make some effort to convey the intuition behind the parser.

To that end, we first define a left-corner chart parser in Section 3 and afterwards generalize it to a head-corner parser in Section 4. A complexity analysis is given in Section 5. We sketch an extension with feature structures in Section 6 and briefly discuss related approaches in Section 7.

… recognition of other items. Furthermore, there is an initial chart and an initial agenda.
At each step some current item is selected from the agenda, and moved to the chart. If the chart contains items that, in combination with the current item, allow recognition of other items not yet present on the chart or on the agenda, these are added to the agenda. This continues until the agenda is empty.
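The agenda-driven control loop described above can be sketched in a few lines. This is a minimal, generic sketch under our own assumptions: items are any hashable values, and each operator is a hypothetical function that receives the current item together with the chart and yields the items it licenses. The LC and HC operators of this paper are not implemented here; a toy reachability operator stands in for them.

```python
from typing import Callable, Hashable, Iterable, List, Set

Item = Hashable
# An operator receives the current item and the chart (which already contains
# the current item) and yields items whose recognition they license.
Operator = Callable[[Item, Set[Item]], Iterable[Item]]


def run_chart_parser(initial_chart: Iterable[Item],
                     initial_agenda: Iterable[Item],
                     operators: List[Operator]) -> Set[Item]:
    chart: Set[Item] = set(initial_chart)
    agenda: List[Item] = []
    on_agenda: Set[Item] = set()

    def push(item: Item) -> None:
        # Only items not yet present on the chart or the agenda are added.
        if item not in chart and item not in on_agenda:
            agenda.append(item)
            on_agenda.add(item)

    for item in initial_agenda:
        push(item)
    while agenda:                      # continue until the agenda is empty
        current = agenda.pop()         # select some current item ...
        on_agenda.discard(current)
        chart.add(current)             # ... and move it to the chart
        for op in operators:
            for new_item in list(op(current, chart)):
                push(new_item)
    return chart


def extend_path(current: Item, chart: Set[Item]) -> Iterable[Item]:
    # Hypothetical toy operator: ("reach", x, y) combined with ("edge", y, z)
    # licenses ("reach", x, z).
    kind, x, y = current
    if kind == "reach":
        for other in chart:
            if other[0] == "edge" and other[1] == y:
                yield ("reach", x, other[2])


edges = [("edge", 1, 2), ("edge", 2, 3), ("edge", 4, 5)]
final_chart = run_chart_parser(edges, [("reach", 1, 1)], [extend_path])
```

Using a LIFO agenda is an arbitrary choice here. The recognized set does not depend on the selection order, only the order in which items are found does.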
A context-free chart parser does not really construct parse trees. But … $[b, j-1, j]$ … The turnstile ($\vdash$) notation is a convenient shorthand, meaning that the left-hand side items license the recognition of the right-hand side item. As a running example we will use the sentence "the cat caught a mouse", represented by the lexical categories *det *n *v *det *n.

S → NP VP,  NP → *det *n,  VP → *v NP.
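To make the turnstile notation concrete, an Earley-style scan step can be written as a rule and then instantiated on the first word of the running example. The dotted-item format below is the standard Earley one and is assumed for illustration. It is not copied from the truncated rule above.

```latex
\begin{align*}
&[A \to \alpha \cdot b\beta,\ i,\ j-1],\ [b,\ j-1,\ j] \ \vdash\ [A \to \alpha b \cdot \beta,\ i,\ j] \\
&[\mathit{NP} \to \cdot\ {*}\mathit{det}\ {*}\mathit{n},\ 0,\ 0],\ [{*}\mathit{det},\ 0,\ 1] \ \vdash\ [\mathit{NP} \to {*}\mathit{det} \cdot {*}\mathit{n},\ 0,\ 1]
\end{align*}
```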
Due to the lack of ambiguity, the example nicely illustrates the differences between the chart parsers presented in this paper.

Figure 3 shows how the LC chart parser steps through a parse tree:
• Steps up correspond to a scan or complete, as in the Earley case.
• Steps down to the leftmost child are skipped, because these are implicitly encoded in the transitive left-corner relation that is incorporated in the parser.
• Steps down to non-leftmost nonterminal children correspond to setting a new subgoal.
… that the semantic ambiguity of the noun phrase "Generalized LC parsing" duly reflects the syntactic ambiguity: …

SIKKEL -OP DEN AKKER
Figure 4 shows the final chart of the LC chart parser, which will be formally defined next.
Each item on the final chart corresponds to an arrow in the tree walk.
The intuition should be clear now, and we present the formal definition rather tersely. The left corner of a production is the leftmost symbol in the right-hand side of that production (and $\varepsilon$ for an empty production). We write $A >_\ell U$ if $A$ has left corner $U \in V \cup \{\varepsilon\}$, and we write $>_\ell^*$ for the transitive closure of $>_\ell$. Hence, in the running example, we have $S >_\ell^* {*}\mathit{det}$. The LC chart parser uses the following kinds of items: … and $[a, j-1, j]$, terminal items as usual.

The initial chart contains the terminal items representing the sentence; the agenda is initialized to $\{[0, S]\}$. The operators of the LC chart parser are defined as follows. We distinguish separate left-corner (lc) operators for left corners $a$, $C$, and $\varepsilon$.

lc(a): for $A >_\ell^* B$, $B \to a\beta \in P$: …

predict: …

scan: …

Thus we have characterized the LC chart parser by defining the initial chart, the initial agenda, and the operators. The reader may verify that these operators produce the chart shown in Figure 4 for the running example.

We introduce the head-corner chart parser by analogy to the left-corner parser. While the LC parser makes a left-to-right walk through a parse tree, the HC parser makes a head-first walk through a parse tree, as shown in Figure 5.

Figure 5: The head-corner tree walk

A context-free head grammar is a 5-tuple $(N, \Sigma, P, S, r)$, with $r$ a function that assigns a natural number to each production in $P$. Let $|p|$ denote the length of the right-hand side of $p$. Then $r$ is constrained to $r(p) = 0$ for $|p| = 0$ and $1 \le r(p) \le |p|$ for $|p| > 0$. The head of a production $p$ is the $r(p)$-th symbol of the right-hand side; an $\varepsilon$-production has head $\varepsilon$.
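The left-corner relation and its transitive closure can be computed by a simple fixpoint iteration. Replacing "leftmost symbol" with "head symbol" gives the analogous head-corner relation. The sketch below does both for the running example. Two points are assumptions. The head positions are our guess, because the underlined heads of grammar H are not recoverable here. The names `corner_relation` and `transitive_closure` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

EPSILON = "ε"

# Running-example grammar (S -> NP VP, NP -> *det *n, VP -> *v NP).
# The third field is the head position r(p); these heads are an ASSUMPTION.
Production = Tuple[str, Tuple[str, ...], int]
GRAMMAR: List[Production] = [
    ("S", ("NP", "VP"), 2),
    ("NP", ("*det", "*n"), 2),
    ("VP", ("*v", "NP"), 1),
]


def corner_relation(grammar: List[Production], use_head: bool) -> Dict[str, Set[str]]:
    """A > U if U is the left corner (or, with use_head, the head) of a production for A."""
    rel: Dict[str, Set[str]] = {}
    for lhs, rhs, r in grammar:
        # Constraint on r: r(p) = 0 for |p| = 0 and 1 <= r(p) <= |p| otherwise.
        if (len(rhs) == 0 and r != 0) or (len(rhs) > 0 and not 1 <= r <= len(rhs)):
            raise ValueError(f"invalid head position {r} for {lhs} -> {rhs}")
        if not rhs:
            corner = EPSILON
        else:
            corner = rhs[r - 1] if use_head else rhs[0]
        rel.setdefault(lhs, set()).add(corner)
    return rel


def transitive_closure(rel: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Naive fixpoint iteration: add C to A's set whenever A > B and B > C."""
    closure = {a: set(bs) for a, bs in rel.items()}
    changed = True
    while changed:
        changed = False
        for a in closure:
            for b in list(closure[a]):
                missing = closure.get(b, set()) - closure[a]
                if missing:
                    closure[a] |= missing
                    changed = True
    return closure


LC_STAR = transitive_closure(corner_relation(GRAMMAR, use_head=False))
HC_STAR = transitive_closure(corner_relation(GRAMMAR, use_head=True))
```

Here `LC_STAR["S"]` contains `*det`, matching $S >_\ell^* {*}\mathit{det}$. With the assumed heads, `HC_STAR["S"]` contains `*v`. A head-corner parser would therefore begin a sentence by looking for the verb, in the spirit of Kay's remark quoted in the introduction.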
In a practical notation, we give a head grammar as a set of productions with the heads underlined. The head grammar $H$ for our running example is given by …

A predict item $[l, r, A]$ will be recognized if a constituent $A$ is being looked for that must be located somewhere between $l$ and $r$. Such a constituent should either stretch from $l$ to some $j$ (if we are working to the right from the head of some production) or from $r$ down to some $j$ (if we are working to the left from the head of some production), with $l \le j \le r$. A double-dotted item $[B \to \alpha \cdot \beta \cdot \gamma, i, j]$ …
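The two item kinds just introduced can be sketched as plain data types. The predict-item check follows the text directly. The reading of the double-dotted item is our assumption, because its definition is cut off here: we take it to mean that $\beta$, which contains the head, spans positions $i$ to $j$ and is extended outwards. All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PredictItem:
    """[l, r, A]: a constituent A is sought somewhere between positions l and r."""
    l: int
    r: int
    cat: str

    def admits(self, start: int, end: int, rightwards: bool) -> bool:
        # Working rightwards from a head: A stretches from l to some j, l <= j <= r.
        if rightwards:
            return start == self.l and self.l <= end <= self.r
        # Working leftwards from a head: A stretches from r down to some j, l <= j <= r.
        return end == self.r and self.l <= start <= self.r


@dataclass(frozen=True)
class DoubleDottedItem:
    """[B -> alpha . beta . gamma, i, j], ASSUMED to mean: beta (which contains
    the head) has been recognized over positions i..j."""
    lhs: str
    rhs: Tuple[str, ...]
    left: int   # index in rhs where beta starts (= length of alpha)
    right: int  # index in rhs just past beta
    i: int
    j: int

    def extend_left(self, k: int) -> "DoubleDottedItem":
        # Absorb the last symbol of alpha, recognized over positions k..i.
        if self.left == 0 or k > self.i:
            raise ValueError("cannot extend to the left")
        return DoubleDottedItem(self.lhs, self.rhs, self.left - 1, self.right, k, self.j)

    def extend_right(self, k: int) -> "DoubleDottedItem":
        # Absorb the first symbol of gamma, recognized over positions j..k.
        if self.right == len(self.rhs) or k < self.j:
            raise ValueError("cannot extend to the right")
        return DoubleDottedItem(self.lhs, self.rhs, self.left, self.right + 1, self.i, k)

    def is_complete(self) -> bool:
        # alpha and gamma are both empty: the whole right-hand side spans i..j.
        return self.left == 0 and self.right == len(self.rhs)


# Running example: head *v of VP -> *v NP found as "caught" (2..3),
# then NP "a mouse" (3..5) absorbed to its right.
vp_head = DoubleDottedItem("VP", ("*v", "NP"), 0, 1, 2, 3)
vp_full = vp_head.extend_right(5)
```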

Complexity analysis and further optimizations

The number of items that can be recognized is now $O(n^2)$, but the work involved for an arbitrary current item is more than linear. The most problematic operation is complete (with scan as a special sub-case), with 5 place markers involved.
Complete can be reduced to 3 place markers with some special extra bookkeeping. As a consequence, the number of place markers involved in scan and predict will drop from 4 to 3. We keep a goal … l, r). Furthermore, we write an A in every entry $(i, j)$ with $l \le i \le j \le r$ in which no A is present. A typical case is presented in Figure 7.
When a goal $[0, 5, A]$ is to be added, the entry $(0, 5)$ is marked * in Figure 7(a). One adds A symbols column by column, stopping in each column as soon as an A is found. In Figure 7(b), a * indicates the entries where an A is written, and + indicates the entries that were inspected but already contained an A.
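The column-by-column marking can be sketched for a single nonterminal A as follows. Two things are our assumptions: the orientation (a column is a fixed right end $j$, scanned upwards from $i = l$) and the class name `GoalTable`. Stopping at the first A in a column is safe because every earlier goal marked a full triangle $l' \le i \le j \le r'$. If $(i, j)$ is marked, then so is every $(i', j)$ with $i \le i' \le j$.

```python
class GoalTable:
    """Bookkeeping for goals of one nonterminal A over a sentence of length n.

    marked[i][j] is True iff some goal [l, r, A] with l <= i <= j <= r was added.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.marked = [[False] * (n + 1) for _ in range(n + 1)]
        self.written = 0             # A symbols written so far
        self.inspected_existing = 0  # entries inspected that already held an A

    def add_goal(self, l: int, r: int) -> None:
        for j in range(l, r + 1):        # one column per right end j
            for i in range(l, j + 1):    # walk i upwards from l
                if self.marked[i][j]:
                    # Rest of this column is already covered by an earlier goal.
                    self.inspected_existing += 1
                    break
                self.marked[i][j] = True
                self.written += 1

    def is_sought(self, i: int, j: int) -> bool:
        # O(1) test whether an A spanning (i, j) lies inside some goal.
        return self.marked[i][j]


table = GoalTable(5)
table.add_goal(2, 4)   # an earlier goal [2, 4, A]
table.add_goal(0, 5)   # the goal [0, 5, A] of Figure 7
```

In this run, the second goal writes 15 new entries and hits an existing A once in each of the columns $j = 2, 3, 4$. This illustrates the at-most-one-per-column re-inspection bound.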
During the course of the algorithm, only $O(n^2)$ A symbols are written, and for each goal that is added only $O(n)$ entries that already contained an A are inspected (at most one per column). The obtained worst-case complexity is optimal, in the sense that all complexity factors are properly accounted for (i.e., the factors r and |N|, beyond those of an optimal Earley parser, are evidently necessary). Yet, on a practical level, a large percentage of computing time can be saved by adding some more sophistication to the algorithm. We will not formally introduce an optimized algorithm, as the definitions grow rather complicated, but simply state some principles that can be implemented straightforwardly.
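The bookkeeping cost can be tallied as follows. This is our own count, not one taken from the original text. It assumes that each goal $[l, r, A]$ is added at most once, and it reads the bound above as at most $O(n)$ re-inspections per added goal. The total stays within the cubic time bound claimed for the parser.

```latex
\begin{align*}
\#\{\text{goals } [l, r, A] \text{ for fixed } A\} &= \#\{(l, r) : 0 \le l \le r \le n\} = \tfrac{(n+1)(n+2)}{2} = O(n^2),\\
\text{cost per } A &\le \underbrace{O(n^2)}_{\text{entries written}} + \underbrace{O(n^2)}_{\text{goals}} \cdot \underbrace{O(n)}_{\text{re-inspections per goal}} = O(n^3),\\
\text{total bookkeeping cost} &= O(|N|\, n^3).
\end{align*}
```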

We have given a formal treatment of a predictive head-corner parser. The item-based description of (predictive) chart parsers is a useful formalism for such a formal treatment. This is exemplified by the fact that we cover grammars with ε-productions with hardly any additional effort, while these are usually left out for the sake of simplicity. Enhancing a head-corner chart parser with prediction is new. It cannot be stated in general that the head-corner approach is more efficient than the (generalized) left-corner approach or other parsers. It is indeed a heuristic, which can be expected to be effective when most of the feature information of a constituent is located in the head. Because it is a method based on a heuristic rather than a fail-safe principle, it is important to consider what happens if the heuristic does not pay off. Therefore we have made some effort to ensure that the worst-case behaviour conforms to the usual complexity bounds for context-free parsing algorithms: $O(n^3)$ time and $O(n^2)$ space. We have indicated how the algorithm can be extended with feature information. An implementation of a head-corner chart parser for PATR-like unification grammars is (nearly) finished. It is currently being tested on a natural-language grammar developed for a knowledge representation research project at our institute.
We intend to make an extensive comparison of the efficiency of the head-corner and left-corner parsers.