# HMEAE

**Repository Path**: thunlp/HMEAE

## Basic Information

- **Project Name**: HMEAE
- **Description**: Source code for EMNLP-IJCNLP 2019 paper "HMEAE: Hierarchical Modular Event Argument Extraction".
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-05-29
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Hierarchical Modular Event Argument Extraction
The code is an implementation of Hierarchical Modular Event Argument Extraction (EMNLP19 paper).

# Requirments
tensorflow-gpu==1.10

stanfordcorenlp (see https://github.com/Lynten/stanford-corenlp for detail)

numpy

tqdm

# Usage
To run this code, you need to:
1. put ```English``` folder of ACE05 dataset into ```./```, or you can modify path in ```constant.py```. (You can get ACE2005 dataset here: https://catalog.ldc.upenn.edu/LDC2006T06)
2. put stanford language model into ```./```, or you can modify path in ```constant.py```. (You can download here: https://stanfordnlp.github.io/CoreNLP/history.html)
3. put GloVe embedding file into ```./glove``` folder, or you can modify path in ```constant.py```. (You can download GloVe embedding here: https://nlp.stanford.edu/projects/glove/)
4. Run ```python train.py --gpu 0 --mode HMEAE``` to run with HMEAE model.  Run ```python train.py --gpu 0 --mode DMCNN``` to run with DMCNN model.

All parameters are in ```constant.py```, you can modify them as you wish.

# Dataset
Due to license limitation, we can't distribute datasets directly, please download the dataset by yourself. The download link is given in **Usage** part.

The code will automatically extract information of ACE2005 dataset and dumps them into json format(```train.json``` ,```dev.json``` and ```test.json```) into path ```ACE_DUMP``` in ```constant.py```. This is implented in class ```Extractor``` in ```utils.py```.

Each file is composed of a list, which elements are instances with following format:
```
{
    "tokens": XX,           #tokens of a sentence, a list with string elements
    "start": XX,            #starting offsets of the sentence in original files, an integer
    "end": XX,              #ending offsets of the sentence in original files, an integer
    "offsets":XX,           #offsets of each tokens, a list with tuple elements
    "trigger_tokens":XX,    #tokens of trigger words, a list with string elements
    "trigger_start":XX,     #start index of trigger words of tokens, an integer
    "trigger_end":XX,       #end index of trigger words of tokens, an integer
    "trigger_offsets":XX,   #offsets of trigger words, a list with tuple elements
    "event_type":XX,        #event type of tokens with given triggers, a string
    "file":XX,              #file name without suffix
    "dir":XX,               #dir name
    "entities":XX           #entitie in this sentencem, a list with entity elements
}
```


Each entity is a dictionary with following format:
```
{
    "token":XX,             #tokens of the entity, a list with string elements
    "role":XX,              #role of the entity when trigger is given, a string
    "offsets":XX,           #offsets of entity, a list with tuple elements
    "start":XX,             #start offset of entity, an integer
    "end":XX,               #snd offset of entity, an integer
    "idx_start":XX,         #start index in tokens, an integer
    "idx_end":XX            #end index in tokens, an integer
}
```

# Cite
If the codes help you, please cite our paper:

**HMEAE: Hierarchical Modular Event Argument Extraction.** *XiaoZhi Wang, Ziqi Wang, Xu Han, Zhiyuan Liu, Juanzi Li, Peng Li, Maosong Sun, Jie Zhou, Xiang Ren.* EMNLP-IJCNLP 2019.