This assignment is due Wednesday, Dec 9 at 11PM.
Goals
Through this assignment you will:
- Explore issues in pronominal anaphora resolution.
- Gain familiarity with syntax-based resolution techniques.
- Analyze the effectiveness of the Hobbs algorithm by applying it to pairs of parsed sentences.
- Optionally: Implement the Hobbs algorithm for anaphora resolution on a set of sentences.
Background
Please review the class slides and the textbook readings on pronominal anaphora resolution (e.g. Nov 20 slides 31-35), especially the Hobbs algorithm (Nov 20 slides 40-41; J&M, 2nd ed., pp. 704-705).
Analyzing Coreference Resolution with the Hobbs Algorithm
The Hobbs algorithm takes as input a pronoun and the sequence of sentence parse trees in its context, and returns a proposed antecedent. The data file contains a list of sentence pairs separated by blank lines. In each pair, the second sentence has one or more pronouns to be resolved. Parse the sentences, almost all of which are drawn from the first homework assignment, using the same techniques as in HW#1 (or HW#5 if you want to handle number agreement).
For each pronoun in each sentence pair, trace the Hobbs algorithm to identify its antecedent.
Specifically, you should:
1. Read in your grammar.
2. Read in the file of sentence pairs with pronouns to resolve.
3. For each (pronoun, sentence pair) set:
  (a) Parse the sentences with your grammar.
  (b) Print out the pronoun and the corresponding parses.
  (c) Use the Hobbs algorithm to (attempt to) resolve the pronoun in the context.
    - Identify each parse tree node corresponding to ‘X’ in the algorithm, writing out the corresponding NP or S (or SBAR) constituent.
    - Identify each node proposed as an antecedent.
    - Reject any proposed node ruled out by agreement.
    - Identify the accepted antecedent.
  (d) Indicate whether the accepted antecedent is correct.
    - If the accepted antecedent is correct, do nothing more.
    - If the accepted antecedent is NOT correct, explain why and identify which of the syntactic and semantic preferences listed in the text (Nov 30 slides 32-35) would be required to correct this error.
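The file-reading and pronoun-spotting steps above might be sketched as follows. This is a minimal illustration, not a required design: the function names, the pronoun list, and the sample sentences are all made up for the example, and whitespace tokenization is an assumption (use whatever tokenization you used in HW#1).

```python
import io

# Hypothetical pronoun list for illustration; adjust it to your grammar's lexicon.
PRONOUNS = {
    "he", "she", "it", "they", "him", "her", "them",
    "his", "its", "their", "himself", "herself", "itself", "themselves",
}


def read_sentence_pairs(lines):
    """Group non-blank lines into tuples, one tuple per blank-line-separated block."""
    pairs, block = [], []
    for line in lines:
        line = line.strip()
        if line:
            block.append(line)
        elif block:
            pairs.append(tuple(block))
            block = []
    if block:  # final block when the file does not end with a blank line
        pairs.append(tuple(block))
    return pairs


def find_pronouns(sentence):
    """Return (token_index, token) for each pronoun, assuming whitespace tokenization."""
    tokens = sentence.split()
    return [
        (i, tok)
        for i, tok in enumerate(tokens)
        if tok.lower().strip(".,!?") in PRONOUNS
    ]


# Made-up sample input standing in for the sentence-pair file.
sample = io.StringIO(
    "John saw a book .\nHe read it .\n\nMary likes dogs .\nShe feeds them .\n"
)
pairs = read_sentence_pairs(sample)
for first, second in pairs:
    print(first, "|", second, "|", find_pronouns(second))
```

Each pronoun found in the second sentence would then be paired with the parses of both sentences before the resolution step.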
Implementation
You should implement steps 1-3(b) using NLTK and a suitable parser. You may do step 3(c) either:
- by manually stepping through the algorithm, or
- (for additional credit) by implementing this simplified portion of the algorithm. If you take this coding route, you may use a feature grammar or a simple look-up table to filter for agreement. You may use any supporting software, such as NLTK’s components for manipulating parse trees, that you wish, provided it does not implement the full Hobbs algorithm for you.
Step 3(d) should be done manually.
Note: Manual processing should be done on a copy of the output of automatic processing.
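For the coding route, one piece of the algorithm can be sketched as below: the step that searches a previous sentence's tree left to right, breadth first, proposing each NP, followed by a look-up-table agreement filter. This is only a sketch under stated assumptions. Trees are plain nested tuples to keep the example dependency-free (in your program NLTK's tree objects would be the natural choice), the agreement table and its entries are hypothetical, and the head word is guessed crudely as the last word of the NP found in the table. The walk upward from the pronoun within its own sentence, including the ‘X’ nodes, is not covered here.

```python
from collections import deque

# Hypothetical agreement table: word -> (number, gender); None means "unspecified".
AGREEMENT = {
    "he": ("sg", "masc"), "him": ("sg", "masc"),
    "she": ("sg", "fem"),
    "it": ("sg", "neut"),
    "they": ("pl", None), "them": ("pl", None),
    "john": ("sg", "masc"), "mary": ("sg", "fem"),
    "book": ("sg", "neut"), "books": ("pl", "neut"),
}


def leaves(tree):
    """Return the words under a node; a node is (label, child, ...) or a string leaf."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def propose_nps(tree):
    """Yield NP nodes in left-to-right, breadth-first order."""
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if isinstance(node, str):
            continue
        if node[0] == "NP":
            yield node
        queue.extend(node[1:])


def agrees(pronoun, np):
    """Check number and gender of the pronoun against a crude guess at the NP head."""
    p_feats = AGREEMENT.get(pronoun.lower())
    head = next((w for w in reversed(leaves(np)) if w.lower() in AGREEMENT), None)
    if p_feats is None or head is None:
        return True  # unknown features: do not reject
    h_feats = AGREEMENT[head.lower()]
    return all(p is None or h is None or p == h for p, h in zip(p_feats, h_feats))


# Made-up parse of a previous sentence, for illustration only.
previous = (
    "S",
    ("NP", ("NNP", "John")),
    ("VP", ("VBD", "bought"), ("NP", ("DT", "the"), ("NNS", "books"))),
)
nps = list(propose_nps(previous))
for np in nps:
    print(" ".join(leaves(np)), "Accept" if agrees("them", np) else "Reject")
```

A feature grammar could replace the look-up table; the table is used here only because it is the simpler of the two options the assignment allows.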
Programming
Create a program hw9_coref.sh that implements the automatic processing components of the pronominal anaphora resolution process described above, invoked as:
hw9_coref.sh <input_grammar_filename> <test_sentence_filename> <output_filename>
where:
- <input_grammar_filename>
- The name of the file that holds the grammar to be used to parse the sentences. This should be a legal NLTK CFG grammar (with or without features).
- <test_sentence_filename>
- The name of the file that holds the pairs of sentences that form contexts for pronoun resolution. Each sentence appears on a line by itself, with a blank line between pairs of sentences. The second sentence of each pair contains one or more pronouns to resolve.
- <output_filename>
- The name of the file to which the results of automatic processing for this assignment will be written, either:
- Parsing and pronoun identification only, or
- Parsing through candidate antecedent identification
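If hw9_coref.sh is a thin wrapper that forwards its three arguments to a Python script (an assumption; the script name hw9_coref.py below is hypothetical), the argument handling might look like this sketch. The argument names mirror the placeholders in the invocation above.

```python
import argparse


def build_arg_parser():
    """Build a parser for the three positional arguments of the invocation."""
    parser = argparse.ArgumentParser(
        prog="hw9_coref.py",  # hypothetical script name
        description="Parse sentence pairs and prepare pronoun resolution output.",
    )
    parser.add_argument("input_grammar_filename", help="NLTK CFG grammar file")
    parser.add_argument("test_sentence_filename", help="file of sentence pairs")
    parser.add_argument("output_filename", help="file for the automatic output")
    return parser


# Example call with an explicit argument list; a real script would call
# parse_args() with no arguments so that sys.argv is used instead.
args = build_arg_parser().parse_args(
    ["grammar.cfg", "coref_sentences.txt", "hw9_output.txt"]
)
print(args.input_grammar_filename, args.test_sentence_filename, args.output_filename)
```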
Files
The files for this assignment may be found on patas in /dropbox/20-21/571/hw9/.
Test, Example, and Resource Files
- coref_sentences.txt
- Contains the contexts to analyze. You should resolve the pronoun(s) in the second sentence in each pair based on the context provided by the pair of sentences.
- simple_example_sentences.txt
- Contains a set of example sentence pairs with pronouns to be resolved.
- simple_example_output.txt
- Contains the result of applying the process to a simplified parse of a textbook example. This is intended to provide an example of the process and output format.
- grammar.cfg
- Contains a simple grammar that covers the test sentences and is fairly compatible with the Hobbs algorithm in the text (minor changes may be made). You may also use the grammar from HW#1 (with adaptations to the algorithm as needed).
Submission Files
- hw9.tar.gz: Tarball containing:
- hw9_coref.sh
- Program which implements the automatic processing phase of your Hobbs algorithm-based pronoun resolution approach.
- hw9_output.txt
- Output of running your program with your grammar and the test sentences, through the automatic processing stages.
- hw9_output_final.txt
- This file should contain the augmented analysis based on the contents of hw9_output.txt.
- For the manual case, this is steps 3(c–d)
- For the coding case, this is step 3(d)
- In particular, for each (pronoun, sentence pair), your output file should have:
- First line: pronoun parse-sent-1 parse-sent-2
- One line for each node corresponding to 'X' in the algorithm: write the entire constituent in bracket notation
- For each node presented as possible antecedent:
- One line with bracketed node
- One line: "Accept" or "Reject" (based on agreement; if "Reject", what was the disagreement?)
- One line: "Correct" or "Incorrect": was the accepted antecedent intuitively correct or not. If not, explain on this line why not, as described above.
- readme.{txt|pdf}
- This file should describe and discuss your work on this assignment. Include problems you came across and how (or if) you were able to solve them, any insights, special features, and what you learned. Give examples if possible. If you were not able to complete parts of the project, discuss what you tried and/or what did not work. This will allow you to receive maximum credit for partial work. In particular, you should discuss the successes and failures of the algorithm in resolving these pronouns in context.
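The per-pronoun record described for hw9_output_final.txt could be assembled along the lines of the sketch below. This is one reading of the line-by-line description, with a hypothetical function name and made-up sample values; simple_example_output.txt remains the reference for the actual layout, including whether ‘X’ lines and candidate lines are interleaved.

```python
def format_record(pronoun, parse1, parse2, x_nodes, candidates, verdict):
    """Build one record as text.

    x_nodes: bracketed constituents for the 'X' nodes, in the order visited.
    candidates: (bracketed_node, accepted, reason) triples for proposed antecedents.
    verdict: the final "Correct" or "Incorrect: ..." line.
    """
    lines = [f"{pronoun} {parse1} {parse2}"]
    lines.extend(x_nodes)
    for node, accepted, reason in candidates:
        lines.append(node)
        lines.append("Accept" if accepted else f"Reject ({reason})")
    lines.append(verdict)
    return "\n".join(lines)


# Made-up values for illustration only.
record = format_record(
    pronoun="it",
    parse1="(S (NP (NNP John)) (VP (VBD bought) (NP (DT a) (NN book))))",
    parse2="(S (NP (PRP He)) (VP (VBD read) (NP (PRP it))))",
    x_nodes=["(S (NP (PRP He)) (VP (VBD read) (NP (PRP it))))"],
    candidates=[
        ("(NP (NNP John))", False, "gender"),
        ("(NP (DT a) (NN book))", True, ""),
    ],
    verdict="Correct",
)
print(record)
```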