This assignment is due Wednesday, Dec 4 at 11PM.

Goals

Through this assignment you will:

Background

Please review the class slides and textbook readings on pronominal anaphora resolution (e.g., Nov 20 slides 31-35), and especially the Hobbs algorithm (Nov 20 slides 40-41; J&M, 2nd ed., pp. 704-705).

Analyzing Coreference Resolution with the Hobbs Algorithm

The Hobbs algorithm takes as input a pronoun and a sequence of sentence parse trees from the context, and returns a proposed antecedent. The data file contains a list of sentence pairs separated by blank lines. In each pair, the second sentence contains one or more pronouns to be resolved. Parse the sentences, almost all of which are drawn from the first homework assignment, using the same techniques as in HW#1 (or HW#5 if you want to handle number agreement).
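As a rough illustration of reading the data file, the sketch below splits blank-line-separated text into sentence pairs. It assumes each sentence sits on its own line, which the description above does not state explicitly; the function name `read_sentence_pairs` and the sample sentences are hypothetical and are not taken from the actual data file.

```python
def read_sentence_pairs(text):
    """Split blank-line-separated text into (context, pronoun_sentence) pairs.

    Assumption: one sentence per line, two lines per block.
    """
    pairs = []
    block = []

    def flush():
        # Store the current block as a pair, checking that it has two lines.
        if len(block) != 2:
            raise ValueError(f"expected 2 sentences per block, got {len(block)}")
        pairs.append((block[0], block[1]))
        block.clear()

    for line in text.splitlines():
        line = line.strip()
        if line:
            block.append(line)
        elif block:
            flush()
    if block:
        flush()
    return pairs


# Made-up sample input, for illustration only.
sample = (
    "The dog saw the cat .\n"
    "It barked .\n"
    "\n"
    "Mary met the students .\n"
    "They greeted her .\n"
)
pairs = read_sentence_pairs(sample)
for context, target in pairs:
    print(context, "|", target)
```

In a real run the text would come from the test sentence file, for example via `open(path).read()`.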

For each pronoun in each sentence pair, trace the Hobbs algorithm to identify its antecedent.

Specifically, you should:

  1. Read in your grammar.
  2. Read in the file of sentence pairs with pronouns to resolve.
  3. For each (pronoun, sentence pair) set:
    a. Parse the sentences with your grammar.
    b. Print out the pronoun and the corresponding parses.
    c. Use the Hobbs algorithm to (attempt to) resolve the pronoun in the context:
      i. Identify each parse tree node corresponding to ‘X’ in the algorithm, writing out the corresponding NP or S (or SBAR) constituent.
      ii. Identify each node proposed as an antecedent.
      iii. Reject any proposed node ruled out by agreement.
      iv. Identify the accepted antecedent.
    d. Indicate whether the accepted antecedent is correct:
      i. If the accepted antecedent is correct, do nothing more.
      ii. If the accepted antecedent is NOT correct, explain why and identify which of the syntactic and semantic preferences listed in the text (Nov 20 slides 31-35) would be required to correct this error.
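The propose-and-reject part of the trace can be sketched for one piece of the algorithm: the search of the preceding sentence, which walks its parse tree breadth-first and left to right, proposing each NP it meets. This is only a partial sketch, not the full Hobbs algorithm; it omits the steps inside the pronoun's own sentence (the climb to each node X and the path restrictions). Trees are written as nested tuples so the example runs without NLTK; the agreement table, the plural word list, and all function names are hypothetical stand-ins for whatever your grammar and features provide.

```python
from collections import deque

# Hypothetical number features, for illustration only.
PRONOUN_NUMBER = {"it": "sg", "he": "sg", "she": "sg", "they": "pl"}
PLURAL_NOUNS = {"cats", "dogs", "students"}


def leaves(tree):
    """Return the words under a (label, child, ...) tuple tree."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def np_number(np):
    """Crude number guess: plural if the NP's last word is a listed plural."""
    return "pl" if leaves(np)[-1].lower() in PLURAL_NOUNS else "sg"


def propose_nps(tree):
    """Yield NP nodes in breadth-first, left-to-right order."""
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if isinstance(node, str):
            continue  # a word, not a constituent
        if node[0] == "NP":
            yield node
        queue.extend(node[1:])


def resolve_in_previous(pronoun, prev_tree):
    """Trace proposals from the previous sentence; return the first NP that agrees."""
    wanted = PRONOUN_NUMBER[pronoun.lower()]
    for np in propose_nps(prev_tree):
        text = " ".join(leaves(np))
        if np_number(np) == wanted:
            print(f"propose: {text} -> accept")
            return text
        print(f"propose: {text} -> reject (number agreement)")
    return None


# Made-up parse of "the dog chased the cats".
prev = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "chased"),
         ("NP", ("DT", "the"), ("NNS", "cats"))))

antecedent_they = resolve_in_previous("they", prev)
antecedent_it = resolve_in_previous("it", prev)
```

With an NLTK parse, the same traversal could be written over `nltk.Tree` objects using `label()` and iteration over children. The printed trace is the kind of output the manual correctness check would then be appended to.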

Implementation

You should implement steps 1-3(b) using NLTK and a suitable parser. You may do step 3(c) either:

Step 3(d) should be done manually.

Note: Manual processing should be done on a copy of the output of automatic processing.

Programming

Create a program hw9_coref.sh that implements the automatic processing components of the pronominal anaphora resolution process described above, invoked as:
hw9_coref.sh <input_grammar_filename> <test_sentence_filename> <output_filename>

where:

Files

The files for this assignment may be found on patas in /dropbox/19-20/571/hw9/.

Test, Example, and Resource Files

Submission Files