User:Oshanis/SCFG Michael Collins

This article is an extension of the article by Carl deMarcken on SCFG.

Representational proposals adopted from deMarcken for SCFG:
Lexicalized PCFG is the same idea as the one introduced in deMarcken’s paper, which adds a head word to each non-terminal. This idea is described as Model 1 in Michael Collins’s paper “Head-Driven Statistical Models for Natural Language Parsing”.
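As a sketch of the underlying decomposition (following the notation of Collins’s paper), a lexicalized rule P(h) -> L_n(l_n) ... L_1(l_1) H(h) R_1(r_1) ... R_m(r_m) is generated in Model 1 by first choosing the head child and then generating the left and right modifiers independently of each other:

    P_h(H \mid P, h) \times \prod_{i=1}^{n+1} P_l(L_i(l_i) \mid P, H, h) \times \prod_{i=1}^{m+1} P_r(R_i(r_i) \mid P, H, h)

where L_{n+1}(l_{n+1}) = R_{m+1}(r_{m+1}) = STOP terminates each modifier sequence.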

Adding distance to the model:
Model 1 is based on a modifier-independence assumption: the left and right modifiers are generated by zeroth-order Markov processes. However, the probability of generating each modifier could in principle depend on any function of the previous modifiers, the head/parent category and the head word. Therefore, the method of using ‘histories’ as conditioning probabilistic context (as first used by Black et al. at IBM) is modified to take ‘distance’ into account. The distance is a function of the surface string below the previous modifiers. The distance between words standing in head-modifier relationships is important, particularly for capturing right-branching structures (when a string of non-zero length is encountered) and for allowing a preference for modification of the most recent verb (when the string contains a verb).
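A minimal sketch of how such a distance measure could be computed follows; the function names, the tagged-word input format and the Penn Treebank verb-tag test are assumptions made for illustration, not details of Collins’s implementation.

    def is_verb(tag):
        # Assumption: Penn Treebank verb tags all start with "VB".
        return tag.startswith("VB")

    def distance_features(intervening):
        """intervening: (word, POS-tag) pairs in the surface string below
        the previous modifiers, between the head and the new modifier."""
        is_adjacent = len(intervening) == 0  # zero-length string
        contains_verb = any(is_verb(tag) for _, tag in intervening)
        return (is_adjacent, contains_verb)

The two bits answer exactly the questions mentioned above: adjacency supports the preference for right-branching structures, and the verb bit allows a preference for modifying the most recent verb.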

The complement / adjunct distinction
Complements are marked by adding a “-C” suffix to every nonterminal that satisfies the criteria for being a complement, i.e. an essential argument of the phrase rather than an optional modifier. (The exact criteria are given in section 3.2.1 of Collins’s paper.) The reasons for incorporating the complement/adjunct distinction into the parsing model are twofold. First, identifying complements at a post-processing stage is complex, because it requires lexical information and knowledge about subcategorization preferences. Second, it helps parsing accuracy: the assumption that complements are generated independently of each other often leads to incorrect parses.
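A minimal sketch of the “-C” marking rule is given below; the parent/child table and the adjunct semantic tags are an approximation of the criteria in section 3.2.1 and should not be read as a faithful reimplementation.

    # Parent category -> child categories that can be complements.
    COMPLEMENT_TABLE = {
        "S":    {"NP", "S", "SBAR"},
        "VP":   {"NP", "S", "SBAR", "VP"},
        "SBAR": {"S"},
    }
    # Treebank semantic tags that signal an adjunct, blocking the "-C" mark.
    ADJUNCT_TAGS = {"ADV", "VOC", "BNF", "DIR", "EXT", "LOC", "MNR", "TMP", "CLR", "PRP"}

    def mark(label, semantic_tags, parent, is_head):
        """Append "-C" to a child nonterminal that counts as a complement."""
        if is_head or (semantic_tags & ADJUNCT_TAGS):
            return label
        if label in COMPLEMENT_TABLE.get(parent, set()):
            return label + "-C"
        return label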

Subcategorization frames
The generative process of Model 1 is extended, giving Model 2, to include a probabilistic choice of left and right subcategorization frames.
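A sketch of the extended decomposition: after the head child is chosen, left and right subcategorization frames LC and RC (multisets of required complements, e.g. {NP-C}) are chosen, and each modifier is conditioned on the complements still missing from its side’s frame (distance conditioning omitted for brevity):

    P_h(H \mid P, h) \times P_{lc}(LC \mid P, H, h) \times P_{rc}(RC \mid P, H, h)
        \times \prod_i P_l(L_i(l_i) \mid P, H, h, LC_i) \times \prod_i P_r(R_i(r_i) \mid P, H, h, RC_i)

Each complement that is generated is removed from the frame, and STOP can only be generated once the frame is empty, so the model is forced to produce exactly the required complements.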

Traces and Wh-Movement
A “TRACE” marks the position of an extracted complement in a parse tree. A trace is left behind when a constituent is moved out of a phrase, as happens in wh-movement with relative pronouns such as “that”. Model 3 gives a probabilistic treatment of wh-movement. It is derived from the analysis given in Generalized Phrase Structure Grammar: a “gap” feature is added to each nonterminal in the tree, and gaps are propagated through the tree until they are finally discharged as a “TRACE” complement.
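For instance, roughly following the derivation in the paper for “the store that IBM bought last week”, the gap enters at the SBAR, is passed down from parent to child, and is finally discharged as a TRACE in object position:

    NP(store)          -> NP(store) SBAR(that)(+gap)
    SBAR(that)(+gap)   -> WHNP(that) S-C(bought)(+gap)
    S-C(bought)(+gap)  -> NP-C(IBM) VP(bought)(+gap)
    VP(bought)(+gap)   -> VB(bought) TRACE NP(week)

At each node the gap has three possible outcomes: it is passed to the head child, passed to a non-head child, or discharged as a TRACE complement.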

Comparison with deMarcken’s proposals
Although there is no experimental evidence comparing the three parsing models with deMarcken’s proposals, Michael Collins’s parser appears to do very well at recovering the core structure of sentences: complements, sentential heads and base-NP relationships are all recovered with around 90% accuracy. The main sources of error are adjuncts and coordination, because these often involve a dependency between two content words, leading to very sparse statistics.

Although deMarcken’s model includes head information, it does not take distance measures into account. There appears to be a marked improvement in accuracy when distance measures and subcategorization are taken into account.

When parsing the example suggested earlier, i.e. “listening to music”, both the adjacency and verb components of the distance measure allow the model to learn a preference for right-branching structures. Thus the structures (A-D) will be preferred over the structures (E-H). When structures A and B are then compared, the statistics given in table 9 of Michael Collins’s paper show that the chance of seeing a PP modifier is 15.8%, while the chance of seeing an NP modifier is 2.76%; structure A therefore has the higher probability. From table 8, structure D can be ruled out because the chance of seeing an NP as a post-modifier to an NP is 3.43%, which is lower than the probability reported for structure A. No data are given for post-modifiers to PPs, so the probability associated with structure C cannot be determined. From structures A, B and D, however, it can be concluded that A has the highest probability and thus gives the correct parse, as a human would expect.
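As a toy check of this comparison (the probabilities are just the percentages quoted above, and their mapping to structures follows this paragraph; everything else is illustrative):

    # Post-modifier probabilities quoted from tables 8 and 9 of the paper.
    candidates = {
        "A": 0.158,   # PP modifier (table 9)
        "B": 0.0276,  # NP modifier (table 9)
        "D": 0.0343,  # NP as post-modifier to an NP (table 8)
        # "C" is omitted: no data for post-modifiers to PPs.
    }
    best = max(candidates, key=candidates.get)
    print(best)  # prints "A", the structure a human would choose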