# cmrc2018

**Repository Path**: iflytek/cmrc2018

## Basic Information

- **Project Name**: cmrc2018
- **Description**: A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)
- **Primary Language**: Python
- **License**: CC-BY-SA-4.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-06-02
- **Last Updated**: 2025-09-28

## Categories & Tags

**Categories**: ai

**Tags**: None

## README

[**中文说明**](./README_CN.md) | [**English**](./README.md)

<p align="center">
    <br>
    <img src="./banner.png" width="500"/>
    <br>
</p>
<p align="center">
    <a href="https://github.com/ymcui/cmrc2018/blob/master/LICENSE">
        <img alt="GitHub" src="https://img.shields.io/github/license/ymcui/cmrc2018.svg?color=blue&style=flat-square">
    </a>
</p>

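This repository contains the data used in [the Second "iFLYTEK Cup" Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018)](https://hfl-rc.github.io/cmrc2018/). The dataset paper was accepted by [EMNLP 2019](http://emnlp-ijcnlp2019.org), a top-tier international conference in computational linguistics.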

**Title: A Span-Extraction Dataset for Chinese Machine Reading Comprehension**    
Authors: Yiming Cui, Ting Liu, Wanxiang Che, Li Xiao, Zhipeng Chen, Wentao Ma, Shijin Wang, Guoping Hu   
Link: [https://www.aclweb.org/anthology/D19-1600/](https://www.aclweb.org/anthology/D19-1600/)  
Venue: EMNLP-IJCNLP 2019

### Open Challenge Leaderboard (new!)
Want to know which models perform best on the CMRC 2018 data? Check the leaderboard.
[https://ymcui.github.io/cmrc2018/](https://ymcui.github.io/cmrc2018/)

### CMRC 2018 Public Datasets
Please download the CMRC 2018 public datasets (training set and development set) from the CodaLab Worksheet.
[https://worksheets.codalab.org/worksheets/0x92a80d2fab4b4f79a2b4064f7ddca9ce](https://worksheets.codalab.org/worksheets/0x92a80d2fab4b4f79a2b4064f7ddca9ce)

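### Submission
If you want to **evaluate your model on the hidden test set and challenge set**, please submit it by following the steps described here.
[https://worksheets.codalab.org/worksheets/0x96f61ee5e9914aee8b54bd11e66ec647/](https://worksheets.codalab.org/worksheets/0x96f61ee5e9914aee8b54bd11e66ec647/)

**Note that the test set provided by [CLUE](https://github.com/CLUEbenchmark/CLUE) is only a subset of CMRC 2018. For an official evaluation, you still need to obtain results on the full test set and challenge set through the submission procedure above.**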


### Quick Load Through 🤗datasets
You can load the dataset quickly with the [HuggingFace `datasets` library](https://github.com/huggingface/datasets):

```python
# Install the package first (run in a shell, or prefix with "!" in a notebook):
#   pip install datasets
from datasets import load_dataset

dataset = load_dataset('cmrc2018')
```
For more options and usage details of the `datasets` library, see: https://github.com/huggingface/datasets
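
Since CMRC 2018 is a span-extraction dataset, each answer should be a substring of its passage. The sketch below shows one way to sanity-check that property. It assumes a SQuAD-style record layout (`context`, `question`, and an `answers` dict with parallel `text` and `answer_start` lists); these field names are an assumption, not something this README specifies, so check them against the data you actually load. The helper `check_answer_spans` and the toy record are hypothetical and written only for illustration.

```python
from typing import Dict, List


def check_answer_spans(example: Dict) -> List[bool]:
    """For each answer, report whether the passage contains the answer text
    exactly at the stated character offset.

    Assumes a SQuAD-style record (field names are an assumption).
    """
    context = example["context"]
    texts = example["answers"]["text"]
    starts = example["answers"]["answer_start"]
    return [context[s:s + len(t)] == t for t, s in zip(texts, starts)]


# Hand-written toy record (not taken from the dataset), used only to
# illustrate the assumed layout.
toy_example = {
    "id": "TOY_0",
    "context": "CMRC 2018是一个中文机器阅读理解数据集。",
    "question": "CMRC 2018是什么？",
    "answers": {"text": ["中文机器阅读理解数据集"], "answer_start": [12]},
}

print(check_answer_spans(toy_example))  # [True]
```

If the assumed layout holds for the loaded data, the same helper could be applied to a real record, for example `check_answer_spans(dataset["train"][0])`.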

### Citation
If you use our data in your work, please cite the following paper:

```
@inproceedings{cui-emnlp2019-cmrc2018,
    title = "A Span-Extraction Dataset for {C}hinese Machine Reading Comprehension",
    author = "Cui, Yiming  and
      Liu, Ting  and
      Che, Wanxiang  and
      Xiao, Li  and
      Chen, Zhipeng  and
      Ma, Wentao  and
      Wang, Shijin  and
      Hu, Guoping",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1600",
    doi = "10.18653/v1/D19-1600",
    pages = "5886--5891",
}
```
### International Standard Language Resource Number (ISLRN)
ISLRN: 013-662-947-043-2

http://www.islrn.org/resources/resources_info/7952/

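### HFL Official WeChat Account
Follow the official WeChat account of the Joint Laboratory of HIT and iFLYTEK Research (HFL) to keep up with the latest technical developments.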

![qrcode.png](./qrcode.jpg)

### Contact Us
Please submit an Issue.