# cmrc2018
**Repository Path**: iflytek/cmrc2018
## Basic Information
- **Project Name**: cmrc2018
- **Description**: A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)
- **Primary Language**: Python
- **License**: CC-BY-SA-4.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-06-02
- **Last Updated**: 2025-09-28
## Categories & Tags
**Categories**: ai
**Tags**: None
## README
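[**Chinese**](./README_CN.md) | [**English**](./README.md)
This directory contains the data used in the [Second "iFLYTEK Cup" Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018)](https://hfl-rc.github.io/cmrc2018/). The dataset paper was accepted at [EMNLP 2019](http://emnlp-ijcnlp2019.org), a top international conference in computational linguistics.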
**Title: A Span-Extraction Dataset for Chinese Machine Reading Comprehension**
Authors: Yiming Cui, Ting Liu, Wanxiang Che, Li Xiao, Zhipeng Chen, Wentao Ma, Shijin Wang, Guoping Hu
Link: [https://www.aclweb.org/anthology/D19-1600/](https://www.aclweb.org/anthology/D19-1600/)
Venue: EMNLP-IJCNLP 2019
### Open Challenge Leaderboard (new!)
Want to know which models perform best on the CMRC 2018 data? Check the leaderboard.
[https://ymcui.github.io/cmrc2018/](https://ymcui.github.io/cmrc2018/)
### CMRC 2018 Public Datasets
Download the CMRC 2018 public datasets (training set and development set) from the CodaLab Worksheet.
[https://worksheets.codalab.org/worksheets/0x92a80d2fab4b4f79a2b4064f7ddca9ce](https://worksheets.codalab.org/worksheets/0x92a80d2fab4b4f79a2b4064f7ddca9ce)
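### Submission Guidelines
If you want to **test your model on the hidden test set and challenge set**, submit it by following the steps at the link below.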
[https://worksheets.codalab.org/worksheets/0x96f61ee5e9914aee8b54bd11e66ec647/](https://worksheets.codalab.org/worksheets/0x96f61ee5e9914aee8b54bd11e66ec647/)
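**Note that the test set provided on [CLUE](https://github.com/CLUEbenchmark/CLUE) is only a subset of CMRC 2018. For an official evaluation, you still need to obtain results on the full test set and challenge set through the submission method above.**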
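### Quick Load Through 🤗datasets
You can load the dataset quickly with the [HuggingFace `datasets` library](https://github.com/huggingface/datasets):
```python
# Install the library first, e.g. `pip install datasets` in a shell
# (or `!pip install datasets` in a notebook cell).
from datasets import load_dataset

dataset = load_dataset('cmrc2018')
```
More options and usage details for the `datasets` library are available at: https://github.com/huggingface/datasets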
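As a minimal sketch of what "span extraction" means for a loaded record, the snippet below checks that each answer is a literal slice of its context at the stated character offset. It assumes SQuAD-style field names (`context`, `question`, and `answers` with parallel `text` and `answer_start` lists), which is how the `datasets` version of CMRC 2018 is believed to be laid out. The sample record and the helper `extract_spans` are invented for illustration and are not taken from the dataset.

```python
def extract_spans(example):
    """Return (expected_text, text_found_at_offset) pairs for one record."""
    context = example["context"]
    answers = example["answers"]
    pairs = []
    for text, start in zip(answers["text"], answers["answer_start"]):
        # In span extraction, the answer is a character slice of the context.
        pairs.append((text, context[start:start + len(text)]))
    return pairs


# Hypothetical record, NOT taken from CMRC 2018; the field names are an assumption.
sample = {
    "id": "DEMO_0",
    "context": "哈尔滨工业大学位于黑龙江省哈尔滨市。",
    "question": "哈尔滨工业大学位于哪里?",
    "answers": {"text": ["黑龙江省哈尔滨市"], "answer_start": [9]},
}

for expected, found in extract_spans(sample):
    # True when the offset really points at the answer text.
    print(expected == found)
```

The same helper could be applied to each element of a loaded split to spot offset mismatches before training, provided the real records follow the assumed layout.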
### Citation
If you use our data in your work, please cite the following paper:
```
@inproceedings{cui-emnlp2019-cmrc2018,
title = "A Span-Extraction Dataset for {C}hinese Machine Reading Comprehension",
author = "Cui, Yiming and
Liu, Ting and
Che, Wanxiang and
Xiao, Li and
Chen, Zhipeng and
Ma, Wentao and
Wang, Shijin and
Hu, Guoping",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
month = nov,
year = "2019",
address = "Hong Kong, China",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/D19-1600",
doi = "10.18653/v1/D19-1600",
pages = "5886--5891",
}
```
### International Standard Language Resource Number (ISLRN)
ISLRN: 013-662-947-043-2
http://www.islrn.org/resources/resources_info/7952/
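### Official WeChat Account of the Joint Laboratory of HIT and iFLYTEK Research (HFL)
Follow the official WeChat account of the Joint Laboratory of HIT and iFLYTEK Research (HFL) to keep up with the latest technical developments.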

### Contact Us
Please submit an issue.