# summary

**Repository Path**: lddsdu/summary

## Basic Information

- **Project Name**: summary
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2019-09-08
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# CNN News Story Dataset

##### 任务简介
Summary: 给定长文本(story), 生成摘要(highlight), 数据集采用CNN News Story Dataset．

**解压数据**

```sh
$ cd data
$ unzip cnn_stories_tokenized.zip
```

##### 数据加载
```python
from load_data import load_stories

directory = 'data/cnn_stories_tokenized/'
stories = load_stories(directory, 10000)
print('Loaded Stories %d' % len(stories))
```

**原始文本**

```text
# 原始文本
Atlanta -LRB- CNN -RRB- -- A young girl bravely stood to ask the Dalai Lama 's doctor a question , and he gave her an unusual answer .
Dr. Tsewang Tamdin , a world-renowned expert in Tibetan medicine , visited Emory University in Atlanta on Monday as part of his effort to reach more American medical practitioners . He wants to develop collaborative projects between the Tibetan medicine system , which is more than 2,500 years old , and Western medicine .
The little girl told Tamdin she suffered from asthma . She wanted to know if there was anything in Tibetan medicine that could help her get better .
Tamdin , who spoke through a translator for ...

# 参考摘要1
Tibetan medical experts want more collaborative projects with modern medicine
# 参考摘要2
Tibetan doctors sometimes prescribe kindness and compassion to cure illness
```


**相关summarize工具**

- summarize(实现了TextRank的工具包)
```python
from gensim.summarization.summarizer import summarize
```

**依赖**
- gensim
- sumeval


##### 任务
- **抽取式算法**  (要求手写这部分代码)

![TextRank](assets/text_rank.png)

- **生成式算法** (了解)

![Encoder-Decoder](assets/encoder-decoder.png)


##### Summary Eval

摘要的评测指标采用了Rouge和Bleu，使用python sumeval可实现评测，使用方法如下．

- Rouge Metric
```python
from sumeval.metrics.rouge import RougeCalculator


rouge = RougeCalculator(stopwords=True, lang="en")

rouge_1 = rouge.rouge_n(
            summary="I went to the Mars from my living town.",
            references="I went to Mars",
            n=1)

rouge_2 = rouge.rouge_n(
            summary="I went to the Mars from my living town.",
            references=["I went to Mars", "It's my living town"],
            n=2)

rouge_l = rouge.rouge_l(
            summary="I went to the Mars from my living town.",
            references=["I went to Mars", "It's my living town"])

# You need spaCy to calculate ROUGE-BE

rouge_be = rouge.rouge_be(
            summary="I went to the Mars from my living town.",
            references=["I went to Mars", "It's my living town"])

print("ROUGE-1: {}, ROUGE-2: {}, ROUGE-L: {}, ROUGE-BE: {}".format(
    rouge_1, rouge_2, rouge_l, rouge_be
).replace(", ", "\n"))
```
- Bleu Metric
```python
from sumeval.metrics.bleu import BLEUCalculator


bleu = BLEUCalculator()
score = bleu.bleu("I am waiting on the beach",
                  "He is walking on the beach")
```