This assignment is due Thursday, February 6 at 11PM.

Goals

The main goal of this course is to carry out a novel analysis research project in small groups. In this assignment, you will write a proposal for your group's project.

In particular, you will:

  1. State your main research question / hypothesis
  2. Explain the methodology that you will use to answer that question
  3. Discuss possible outcomes of your experiment(s)
  4. Make a rough plan for the division of labor and a timeline

Proposal Format

Your proposal should be 3-5 pages long and should contain the four sections described below. Be sure to cite relevant past work. This could include papers that motivate your question, papers that introduce methods you plan to use, and/or papers that provide datasets or other tools you plan to use.

1. Introduction

Begin with one or two paragraphs that motivate your primary research question or the hypothesis that you want to test. Your question should be motivated by the literature that we've seen in class. For reasons discussed in the methods class, it should be more specific than "Does model X understand phenomenon Y?".

Your final result should be novel, i.e. not a mere replication of an existing result. But it can also be fairly modest: this is a proto-paper, not a full conference paper, and you only have a few weeks to work on it. Typical projects apply known methods to a new linguistic phenomenon, apply one analysis method to several different models (or architectures) and compare the results, or use multiple methods to study a single linguistic phenomenon. Here are some hypothetical questions / hypotheses that could be in the right scope for a class project:

  • Representations from layer Y of model X encode linguistic property P, as measured by a diagnostic classifier trained on them. (For example: can features of semantic parses of sentences be predicted?)
  • Model X relies on heuristic H to solve task Y, and a novel challenge dataset can reveal this.
  • LSTM language models (e.g. from Gulordava et al. 2018) have single neurons in their memory cells that encode grammatical gender (in languages with grammatical gender).
  • Left-to-right Transformers (e.g. GPT) do/do not exhibit garden path effects the way that LSTMs do.
  • Pre-trained model X does not distinguish between implicatures and presuppositions.

2. Methods

Your methods section should contain a complete experimental design. This is likely the section that will most closely match its counterpart in your final paper. Be sure to include all of the following:

  • What models are you analyzing?
  • What method will you use to address your question / hypothesis? (E.g. analyzing individual neurons, diagnostic classifiers, adversarial data)
  • What data will you use? (Pre-existing dataset, generated on your own, etc.)
  • What will your evaluation be? (e.g. accuracy/F1 of a diagnostic classifier, visual inspection of individual neuron activations, performance of the model on a new dataset that it hasn't seen)

3. Possible Results

Include a short discussion of the possible outcomes of your experiment and what each would tell you. Negative results are just as valuable as positive ones, but it's important to know in advance how you would interpret each possible outcome.

For example (with more surrounding discussion): our diagnostic classifier reliably predicts property P, but only from the deeper layers of model X. This would suggest that the model learns to extract the feature from raw text, but only at later stages of processing rather than immediately. Alternatively: if the classifier does not reliably predict property P from any layer, then we will tentatively conclude that model X does not learn to extract that feature.

4. Division of Labor + Timeline

This section should include all the tasks that your team needs to carry out. Make a list (or a table) in which each item is a task, labeled with who will do it (this can be more than one person) and when you expect it to be done.

Bear in mind: your actual project will not go as linearly as your proposed timeline. That's OK. But it's important to think about all the necessary steps and components in advance, so that you have a view of the whole project while you're working on the individual pieces.

Reminder: every group will present their project on March 12, in a proto-conference. Final papers will be due on March 15. I encourage groups to write the paper in advance of the presentation.

Submission

Your group only needs to submit one copy of the proposal, in PDF format. Designate one person from the group to upload the proposal on Canvas.

If you are not the designated person, you must upload a file named readme.pdf to Canvas, containing:

  • your group number, as listed in the spreadsheet
  • the name of the student who uploaded your proposal