BoolQ is a question answering dataset for yes/no questions containing 15942 examples. These questions are naturally occurring ---they are generated in unprompted and unconstrained settings.
Each example is a triplet of (question, passage, answer), with the title of the page as an optional additional context. The text-pair classification setup is similar to existing natural language inference tasks.
By sampling questions from the distribution of information-seeking queries (rather than prompting annotators for text pairs), we observe significantly more challenging examples compared to existing NLI datasets.