This assignment is due Thursday, February 6 at 11PM.
The main goal of this course is to carry out a novel analysis research project in small groups. In this assignment, you will write a proposal for your group's project.
In particular, you will:
Your proposal should be 3-5 pages long and should contain the four sections described below. Be sure to cite relevant past work. This could include papers that motivate your question, papers that introduce methods you plan to use, and/or papers that provide datasets or other tools you plan to use.
Begin with one or two paragraphs that motivate your primary research question, or the hypothesis that you want to test. Your question should be motivated by the literature that we've seen in class. For reasons discussed in the methods class, it should be more specific than "does model X understand phenomenon Y?"
Here are some hypothetical questions / hypotheses that could be in the right scope for a class project. Your final result should be novel, i.e., not merely a replication of an existing result. But it can also be fairly modest: this is a proto-paper, not a full conference paper, and you only have a few weeks to work on it. Typical projects apply known methods to a new linguistic phenomenon, apply one analysis method to several models (or architectures), or use multiple methods to study a single linguistic phenomenon.
Your methods section should contain a complete experimental design. This is likely the section that will most closely match the corresponding section of your final paper. Be sure to include all of the following:
Include a section that briefly discusses the possible outcomes of your experiment and what each would tell you. Negative results are just as good and important as positive results, but you should know in advance how you would interpret each possible outcome.
For example (with more surrounding discussion): if our diagnostic classifier reliably predicts property P, but only in deeper layers of model X, this would suggest that the model learns to extract the feature from raw text, but not immediately. Alternatively, if it does not reliably predict property P, we will tentatively conclude that the language model does not learn to extract that feature.
This section should include all the tasks that your team needs to carry out. Make a list (or a table) in which each item is a task, labeled with who will do it (this can be more than one person) and when you expect it to be done.
Bear in mind: your actual project will not go as linearly as your proposed timeline. That's OK. But it's important to think about all the necessary steps and components in advance, so that you have a view of the whole project while you're working on the individual pieces.
Reminder: every group will present their project on March 12 at a proto-conference. Final papers will be due on March 15. I encourage groups to write the paper before the presentation.
Your group only needs to submit one copy of the proposal. It must be uploaded in PDF format. Designate one person from the group to upload the proposal on Canvas.
If you are not the designated person, you must upload a file named readme.pdf to Canvas, containing: