14 open-source projects similar to deepmind/rc-data, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best rc-data alternative.
This is a large-scale collection of curated Chinese text corpora designed for training natural language processing models. The project provides a variety of datasets, including a deduplicated archive of millions of news articles with titles and keywords, high-quality categorized question-and-answer pairs, and parallel translation corpora. The collection includes millions of aligned Chinese and English sentence pairs used for cross-lingual model training and machine translation development. It also contains filtered question-and-answer data organized by label for the construction of knowledge-
A topic-centric list of HQ open datasets.
This repository contains the NarrativeQA dataset. It includes the list of documents with Wikipedia summaries, links to full stories, and questions and answers.
Collection of tools, utilities, datasets and approaches towards realizing natural language interfaces for the Web of Data. Currently, we are focusing on Question Answering (QA) utilities.
This repo was permanently archived on 2019/10/01. But this is not the end! After almost 3 months of joint development, the project has been reborn. We converted our website from static to dynamic so that everyone can submit a challenge/competition matching their own interests. And…
This is a list of datasets/corpora for NLP tasks, in reverse chronological order. Suggestions and pull requests are welcome. The goal is to make this a collaborative effort to maintain an updated list of quality datasets.
Tools for using Maluuba's news question-and-answer data. The code in this repo is used to compile the dataset, since the dataset itself cannot be made directly available for legal reasons.
A dataset of millions of news articles scraped from a curated list of data sources.
The COmmonsense Dataset Adversarially-authored by Humans (CODAH) is an evaluation set for commonsense question-answering in the sentence completion style of SWAG. As opposed to other automatically generated NLI datasets, CODAH is adversarially constructed by humans who can view feedback from a…
GraphQuestions is a characteristic-rich dataset for factoid question answering described in the paper "On Generating Characteristic-rich Question Sets for QA Evaluation" - EMNLP'16.