The Winograd Schema Challenge at AAAI-18: Announcement

Nuance Communications, Inc. is sponsoring a competition to encourage efforts to develop programs that can solve the Winograd Schema Challenge, an alternative to the Turing Test developed by Hector Levesque, winner of the 2013 IJCAI Award for Research Excellence. The test will be organized, administered, and evaluated by, which is dedicated to furthering and promoting research in the field of automated commonsense reasoning.


The Turing Test is intended to serve as a test of whether a machine has achieved human-level intelligence. In one of its best-known versions, a person attempts to determine whether he or she is conversing (via text) with a human or a machine. However, it has been criticized as being inadequate. At its core, the Turing Test measures a human’s ability to judge deception: Can a machine fool a human into thinking that it too is human? Chatbots like Eugene Goostman can fool at least some judges into thinking they are human, but that likely reveals more about how easy it is to fool some humans, especially in the course of a short conversation, than about the bot’s intelligence. It also suggests that the Turing Test may not be an ideal way to judge a machine’s intelligence.

An alternative is the Winograd Schema Challenge.

Rather than base the test on the sort of short free-form conversation suggested by the Turing Test, the Winograd Schema Challenge (WSC) poses a set of multiple-choice questions that have a particular form. Two examples follow; the second, from which the WSC gets its name, is due to Terry Winograd.

I. The trophy would not fit in the brown suitcase because it was too big (small). What was too big (small)?
Answer 0: the trophy
Answer 1: the suitcase

II. The town councilors refused to give the demonstrators a permit because they feared (advocated) violence. Who feared (advocated) violence?
Answer 0: the town councilors
Answer 1: the demonstrators

The answers to the questions (in the above examples, 0 if the first word is used; 1 if the alternate word in parentheses is used) are expected to be obvious to a layperson. A human who answers these questions correctly would likely use knowledge about the typical size of objects and an ability to do spatial reasoning to solve the first example, and knowledge about how political demonstrations unfold and an ability to do interpersonal reasoning to solve the second.

Because of the wide variety of commonsense knowledge and commonsense reasoning that humans would presumably use to solve Winograd Schema problems, it was proposed during Commonsense-2013 that the Winograd Schema Challenge could be a promising method for tracking progress in automating commonsense reasoning. The Winograd Schema Challenge received further attention in 2014, after Eugene Goostman fooled 30% of judges into thinking it was human, sparking interest in developing alternatives to the Turing Test. It was also one of several Turing Test alternatives proposed in the Spring 2016 special issue of AI Magazine, “Beyond the Turing Test.”

Features of the Challenge

Winograd Schemas typically share the following features (details can be found in Levesque [2011] and Levesque et al. [2012]):

  1. Two entities or sets of entities, not necessarily people or sentient beings, are mentioned in the sentences by noun phrases.
  2. A pronoun or possessive adjective is used to reference one of the parties (of the right sort so it can refer to either party).
  3. The question involves determining the referent of the pronoun.
  4. There is a special word that is mentioned in the sentence and possibly the question. When it is replaced with an alternate word, the answer changes although the question still makes sense (e.g., in the above examples, “big” can be changed to “small”; “feared” can be changed to “advocated”).
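
The four features above suggest a simple way to represent a schema in code. The following Python sketch is illustrative only: the class name `WinogradSchema`, its field names, and the `{word}` placeholder convention are assumptions made for this example and are not part of the contest's input format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradSchema:
    """Hypothetical container for one schema (not the contest's data format)."""
    template: str        # sentence plus question, with {word} marking the special word
    candidates: tuple    # the two candidate referents (feature 1)
    special_word: str    # feature 4: the special word
    alternate_word: str  # feature 4: the word that flips the answer
    special_answer: int  # index into candidates when special_word is used

    def variants(self):
        """Return (text, correct referent) for each of the two word choices."""
        other = 1 - self.special_answer
        return [
            (self.template.format(word=self.special_word),
             self.candidates[self.special_answer]),
            (self.template.format(word=self.alternate_word),
             self.candidates[other]),
        ]


# Example I from the text above.
trophy = WinogradSchema(
    template=("The trophy would not fit in the brown suitcase because it was "
              "too {word}. What was too {word}?"),
    candidates=("the trophy", "the suitcase"),
    special_word="big",
    alternate_word="small",
    special_answer=0,
)

for text, referent in trophy.variants():
    print(text, "->", referent)
```

Storing both halves in one record keeps the defining property of a schema explicit: swapping the special word for its alternate switches the correct referent.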

Ernest Davis has created a collection of more than 140 sample Winograd Schemas, available at the WSC Collection, that participants can use to test their systems during development. Leora Morgenstern has collected more than 60 sample Pronoun Disambiguation Problems (PDPs), available at the PDP Collection; PDPs are a more general form of Winograd Schemas and are explained below and in Morgenstern, Davis, and Ortiz (2016). These collections will be augmented over time with examples from previous tests.

Further details are below.

Subject Tests

The PDPs and Winograd schemas to be used in the AAAI-18 competition will be validated using tests on human subjects. Details of the results will be published when available.

Earlier Competition

The first running of the Winograd Schema Challenge was at IJCAI-16. An account of the results will be published in a forthcoming issue of AI Magazine.


1. Registration

Contestants should email Charles Ortiz stating their intent to enter the contest no later than January 20, 2018. The contest itself will be held at AAAI 2018 in New Orleans, February 2–7, 2018.

2. Input Format

Contestant programs will receive their input in the form of an .xml file. An example file may be found at The structure of the .xml should be self-explanatory on inspection of this file.

3. Problems

All problems have the following form. There is a text of a single sentence or a few sentences that contains one or more pronouns with ambiguous referents. Each problem asks about one such pronoun. The pronoun that is the subject of the problem is demarcated in the XML input with the tag <pron> </pron>. (In viewing the XML file in a web browser, the pronoun appears in boldface.) After the text, there is a short excerpt from the text containing the pronoun that is the subject of the problem and a few words that occur on one side or the other; this is for the benefit of human viewers. Finally, a list of possible referents is given, labelled “A”, “B”, “C” … For example:

Babar wonders how he can get new clothing. Luckily, a very rich old man who has always been fond of little elephants understands right away that he is longing for a fine suit. As he likes to make people happy, he gives him his wallet.
he is longing for a fine suit

  A. Babar
  B. old man

There may be multiple problems that use the same text but ask about different pronouns, as with problems 2-5 in the example file.
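
As a rough illustration of reading such input, the Python sketch below parses a small inline sample with the standard library. Only the `<pron>` tag is named in this announcement; the element names `<collection>`, `<problem>`, `<text>`, `<quote>`, `<answers>`, and `<answer>` are hypothetical placeholders, so the official example file should be consulted for the real structure.

```python
import string
import xml.etree.ElementTree as ET

# ASSUMED layout: every element name except <pron> is hypothetical.
SAMPLE = """
<collection>
  <problem>
    <text>Babar wonders how he can get new clothing. Luckily, a very rich old man who has always been fond of little elephants understands right away that <pron>he</pron> is longing for a fine suit.</text>
    <quote>he is longing for a fine suit</quote>
    <answers>
      <answer>Babar</answer>
      <answer>old man</answer>
    </answers>
  </problem>
</collection>
"""


def load_problems(xml_source):
    """Parse problems into dicts, numbering from 1 and lettering answers A, B, C, ..."""
    root = ET.fromstring(xml_source)
    problems = []
    for number, node in enumerate(root.iter("problem"), start=1):
        text_node = node.find("text")
        answers = [a.text.strip() for a in node.iter("answer")]
        problems.append({
            "number": number,
            # itertext() flattens the <pron> markup back into plain text.
            "text": "".join(text_node.itertext()).strip(),
            "pronoun": text_node.find("pron").text,
            "excerpt": node.find("quote").text.strip(),
            "answers": dict(zip(string.ascii_uppercase, answers)),
        })
    return problems


problems = load_problems(SAMPLE)
print(problems[0]["pronoun"], problems[0]["answers"])
```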

4. Number of Rounds and Questions

There will be one or two rounds in the contest. Each round will consist of 60 questions. Contestants will have 3-1/2 hours (210 minutes) to complete each round.

The two rounds differ in the source of the texts. In the first round, the texts are “Pronoun Disambiguation Problems”; that is, they are drawn from actual texts, possibly with some editing. In the second round, each text is one half of a Winograd schema. A detailed discussion and justification is given in the AI Magazine article by Morgenstern, Davis, and Ortiz (2016).

Only contestants who achieve at least 90% in the first round will be allowed to compete in the second round. If no contestants qualify, then the second round will not be given.
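
With 60 questions per round, the 90% threshold works out to at least 54 correct answers. A minimal Python sketch of that check follows; the function names are illustrative, and the use of `None` to mark an omitted answer is an assumption of this example.

```python
QUESTIONS_PER_ROUND = 60


def count_correct(submitted, key):
    """Count matches against the answer key; an omitted answer (None) is wrong."""
    return sum(1 for given, right in zip(submitted, key) if given == right)


def qualifies_for_second_round(correct, total=QUESTIONS_PER_ROUND):
    """True if correct/total is at least 90%, using integer arithmetic only."""
    return correct * 100 >= 90 * total


key = ["A", "B"] * 30                    # made-up 60-question answer key
submitted = list(key)
submitted[:6] = [None] * 6               # six omitted answers
correct = count_correct(submitted, key)  # 54 correct
print(correct, qualifies_for_second_round(correct))
```

Comparing `correct * 100` with `90 * total` avoids floating-point rounding at the boundary.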

5. Competing

Each contestant should be represented by an individual who is present. The representative must bring a laptop on which the entry will run. Any commercially sold portable computer is acceptable.

If it is not possible for a contestant to come in person, then contact the contest organizer to arrange to submit an executable program, which we will run.

6. Internet Access

Limited access to the Internet will be permitted, under certain circumstances. Please contact the organizers for details.

7. Output Format

The output of your program should be a plain text file named TeamName-output.txt. The format to be followed is illustrated in the file

For each problem, there will be four lines in the output, separated by line breaks:
Line 1: Problem number, and echo of text of problem.
Line 2: Echo of the excerpt for the problem.
Line 3: “Answer” problemNumber.answerNumber answer
Line 4: Blank, as a separator between problems

For example, if the problem above is problem 2 in the input, then the corresponding four lines of the output file would be as follows:

Line 1: Babar wonders how he can get new clothing. Luckily, a very rich old man who has always been fond of little elephants understands right away that he is longing for a fine suit. As he likes to make people happy, he gives him his wallet.
Line 2: he is longing for a fine suit 
Line 3: Answer 2.A Babar 
Line 4: Blank

At the end of the file, there should be a comma-separated list of all the answers in order. For example:

A, A, B, B, A, A, B

The submission will be graded on the final list of answers. The remaining material is there for human inspection.

No problems should be omitted. Any problem that is omitted will be marked as wrong, so it always pays to guess.
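
The output rules above can be sketched in Python as follows. This is a hedged illustration, not the official formatter: the dictionary keys, the exact layout of the first line (problem number, a period, then the text), and the fallback guess of "A" for unanswered problems are all assumptions of this example, and the second sample problem is made up to show the final answer list.

```python
def format_output(problems, chosen):
    """Build the output text: three lines plus a blank per problem, then the answer list.

    `problems` is a list of dicts with keys 'number', 'text', 'excerpt', 'answers'
    (hypothetical names); `chosen` maps a problem number to an answer letter.
    """
    blocks = []
    letters = []
    for p in problems:
        # Guess "A" rather than omit: an omitted problem is marked wrong.
        letter = chosen.get(p["number"], "A")
        letters.append(letter)
        blocks.append(
            f'{p["number"]}. {p["text"]}\n'
            f'{p["excerpt"]}\n'
            f'Answer {p["number"]}.{letter} {p["answers"][letter]}\n'
        )
    # Joining with "\n" leaves one blank line after each problem block.
    return "\n".join(blocks) + "\n" + ", ".join(letters) + "\n"


text = ("Babar wonders how he can get new clothing. Luckily, a very rich old man "
        "who has always been fond of little elephants understands right away that "
        "he is longing for a fine suit. As he likes to make people happy, he gives "
        "him his wallet.")
sample = [
    {"number": 2, "text": text, "excerpt": "he is longing for a fine suit",
     "answers": {"A": "Babar", "B": "old man"}},
    {"number": 3, "text": text, "excerpt": "As he likes to make people happy",
     "answers": {"A": "Babar", "B": "old man"}},
]
output = format_output(sample, {2: "A", 3: "B"})
print(output)
```

Writing `output` to a file named TeamName-output.txt would complete the step. Because only the final comma-separated list is graded, keeping it in the same order as the input is the part that matters most.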

8. Publication

In the three weeks following the competition, researchers with winning or potentially winning entries will be expected to submit to WSC organizers a paper explaining the algorithms, knowledge sources, and knowledge structures used. These papers will be posted on the website. Publication on the website does not preclude any other publication. Entries not submitting such a paper will be disqualified.

If, in the judgement of the contest committee, the description of the program in this paper is entirely inadequate or implausible as an explanation of the success of the program, then the team involved will be asked to demonstrate in detail that the behavior of the program is in fact that described in the paper.

The aim of this contest is to advance science; all results obtained must be reproducible, and communicable to the public. As such, any winning entry is encouraged to furnish to the organizers of the Winograd Schema Challenge Competition its source code and executable code, and to use open source databases or knowledge bases or make its databases and knowledge structures available for independent verification of results.

9. Prizes

The grand prize of $25,000 will be awarded to the first team to achieve a score of 90% in both rounds of the contest. If more than one team accomplishes this, the prize will be awarded to the team with the higher score in the second round. If tied in the second round, the prize will go to the team with the higher score in the first round. If both rounds are ties, the prize will be split.

At AAAI-2018, three smaller prizes of $1,000, $750, and $500 will be awarded to the top three programs that score over 65% on the first round of the contest.
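
The grand-prize tie-breaking rules can be read as a two-level comparison: second-round score first, then first-round score. The Python sketch below encodes that reading for a single contest; the function name and the score representation are assumptions of this example, and the committee's own interpretation governs.

```python
def grand_prize_winners(scores, total=60):
    """Return the team or teams that would receive the grand prize.

    `scores` maps a team name to (round1_correct, round2_correct). A team is
    eligible only if it reaches 90% in both rounds. Ties are broken by the
    second-round score, then the first-round score; if both tie, the prize is
    split among the tied teams. An empty list means no grand prize.
    """
    eligible = {
        team: rounds for team, rounds in scores.items()
        if all(correct * 100 >= 90 * total for correct in rounds)
    }
    if not eligible:
        return []
    best = max((r2, r1) for r1, r2 in eligible.values())
    return sorted(team for team, (r1, r2) in eligible.items() if (r2, r1) == best)


# Made-up scores: team_c misses 90% in round two, team_b wins on round one.
print(grand_prize_winners({
    "team_a": (55, 56),
    "team_b": (57, 56),
    "team_c": (58, 50),
}))
```

For the smaller prizes, 65% of 60 questions is exactly 39, so scoring over 65% means at least 40 correct answers in the first round.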

10. Contest Committee

The contest has been designed and will be administered by a committee consisting of Leora Morgenstern, Ernest Davis, and Charles Ortiz. The committee’s decisions on all matters relating to the contest are final. Any questions should be addressed to them. We gratefully acknowledge the assistance of Hector Levesque and Gary Marcus.


Levesque, H. J. 2011. The Winograd Schema Challenge. In Logical Formalizations of Commonsense Reasoning: Papers from the 2011 AAAI Spring Symposium. Technical Report SS-11-06, 63–68. Menlo Park, CA: AAAI Press.

Levesque, H.; Davis, E.; and Morgenstern, L. 2012. The Winograd Schema Challenge. In Principles of Knowledge Representation and Reasoning: Proceedings of the Thirteenth International Conference (KR2012), 552–561. Palo Alto, CA: AAAI Press.

Morgenstern, L.; Davis, E.; and Ortiz, C. 2016. Planning, Executing, and Evaluating the Winograd Schema Challenge. AI Magazine 37(1): 50–54. doi: 10.1609/aimag.v37i1.2639
