# MedicalReportGeneration

**Repository Path**: Dane_work/MedicalReportGeneration

## Basic Information

- **Project Name**: MedicalReportGeneration
- **Description**: A Base Tensorflow Project for Medical Report Generation
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-12-02
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

## Medical Report Generation

A base project for Medical Report Generation.

## Config

- python 2.7 / tensorflow 1.8.0-1.11.0
- extra packages: nltk, json, PIL, numpy

## DataDownload

- IU X-Ray Dataset
  * The raw data is from the [Open-i service of the National Library of Medicine](https://openi.nlm.nih.gov/), which hosts many public datasets.
  * The processed data is at [Medical-Report/NLMCXR_png_pairs.zip](https://pan.baidu.com/s/1CwChGVu6HWFDN2Xsy_htTA) (access code: vdb6); unzip it to the directory 'data/NLMCXR_png_pairs/'.
- Pretrained InceptionV3 model
  * The raw model is from the [TensorFlow-Slim Image Classification Model Library](https://github.com/tensorflow/models/tree/master/research/slim).
  * The processed model is at [Medical-Report/pretrain_model.zip](https://pan.baidu.com/s/1CwChGVu6HWFDN2Xsy_htTA) (access code: vdb6); unzip it to the directory 'data/pretrain_model/'.

## Train

#### First, get the post-processed data (already done)

- 'data/data_entry.json': the report sentences.
- 'data/train_split.json' and 'data/test_split.json': the ids for train/val/test.
- 'data/vocabulary.json': the vocabulary extracted from the reports.

#### Second, get the TFRecord files

- Generate 'data/train.tfrecord' and 'data/test.tfrecord':

```shell
$ python datasets.py
```

Note: if the TFRecord files already exist, comment out the call to the function 'get_train_tfrecord()'.

#### Third, train

- You can train directly:

```shell
$ python train.py
```

- You can watch the training process:

```shell
$ cd ./data
$ tensorboard --logdir='summary'
```

## Demo

- You can test with two chest X-ray images:

```shell
$ python demo.py --img_frontal_path='./data/experiments/CXR1900_IM-0584-1001.png' --img_lateral_path='./data/experiments/CXR1900_IM-0584-2001.png' --model_path='./data/model/my-test-2500'
```

- Example:

![example2](data/experiments/CXR1900_IM-0584-1001.png)

```shell
$ The generate report: no acute cardiopulmonary abnormality the lungs are clear there is no focal consolidation there is no focal consolidation there is no pneumothorax or pneumothorax
```

## Framework

#### Core Framework

![example](data/experiments/framework.png)

Note: the core framework follows Yuan Xue et al., **Multimodal Recurrent Model with Attention for Automated Radiology Report Generation**, MICCAI 2018 [6].

## Experiments

#### Metrics Results

| Model | BLEU_1 | BLEU_2 | BLEU_3 | BLEU_4 | METEOR | ROUGE | CIDEr |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| CNN-RNN[10] | 0.3087 | 0.2018 | 0.1400 | 0.0986 | 0.1528 | 0.3208 | 0.3068 |
| CNN-RNN-Att[11] | 0.3274 | 0.2155 | 0.11478 | 0.1036 | 0.1571 | 0.3184 | 0.3649 |
| Hier-RNN[9] | 0.3426 | 0.2318 | 0.1602 | 0.1121 | 0.1583 | 0.3343 | 0.2755 |
| MRNA[6] | 0.3721 | 0.2445 | 0.1729 | 0.1234 | 0.1647 | 0.3224 | 0.3054 |
| Ours | 0.4431 | 0.3116 | 0.2137 | 0.1473 | 0.2004 | 0.3611 | 0.4128 |

- CNN-RNN and CNN-RNN-Att are simple base models for image captioning.
- Hier-RNN is a base model for image paragraph generation; because we do not have bounding boxes, we use the visual features from the CNN directly to decode each sentence word by word.
- MRNA is the base model from the MICCAI 2018 paper [6]: the visual features from the CNN generate the first sentence; for each of the second to final sentences, we concatenate the visual features with semantic features (the previous sentence encoded by 1-d conv layers) and decode word by word (see the sketch after this list).
- Ours is based on MRNA, with our improvements.

Note: I have only released code for Hier-RNN and MRNA, because the others are easy to implement.
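To make the MRNA-style decoding concrete, here is a minimal TensorFlow 1.x sketch of the loop described above. It is an illustration under stated assumptions (greedy argmax decoding, a `<START>` id of 0, max-pooled 1-d conv sentence encoding, hypothetical names such as `decode_report` and `encode_sentence`), not the repository's actual implementation.

```python
# -*- coding: utf-8 -*-
# A minimal, hypothetical sketch of the MRNA-style decode loop (TF 1.x).
# Function names, the <START> id, the max-pooled 1-d conv sentence encoding
# and greedy argmax decoding are illustrative assumptions, not the repo code.
import tensorflow as tf

VOCAB_SIZE = 1000   # placeholder; the real size comes from data/vocabulary.json
EMBED_SIZE = 512    # word embedding size (see Details below)
RNN_UNITS = 512     # RNN units (see Details below)
MAX_SENT = 8        # max sentences per report
MAX_WORDS = 50      # max words per sentence


def encode_sentence(word_embs):
    """Encode the previous sentence with a 1-d conv layer into a semantic vector."""
    x = tf.layers.conv1d(word_embs, filters=512, kernel_size=3, padding='same',
                         activation=tf.nn.relu, name='sent_conv',
                         reuse=tf.AUTO_REUSE)
    return tf.reduce_max(x, axis=1)                      # [batch, 512]


def decode_report(visual_feats, embedding, batch_size):
    """Greedy sentence-by-sentence decoding: sentence 1 sees only visual
    features; later sentences also see the encoding of the previous one."""
    cell = tf.nn.rnn_cell.LSTMCell(RNN_UNITS)
    semantic = tf.zeros([batch_size, 512])               # no previous sentence yet
    sentences = []
    for _ in range(MAX_SENT):
        context = tf.concat([visual_feats, semantic], axis=1)
        state = cell.zero_state(batch_size, tf.float32)
        word = tf.zeros([batch_size], dtype=tf.int32)    # assumed <START> id 0
        words = []
        for _ in range(MAX_WORDS):
            inp = tf.concat([tf.nn.embedding_lookup(embedding, word), context], 1)
            out, state = cell(inp, state)
            logits = tf.layers.dense(out, VOCAB_SIZE, name='word_logits',
                                     reuse=tf.AUTO_REUSE)
            word = tf.argmax(logits, axis=1, output_type=tf.int32)
            words.append(word)
        sent_ids = tf.stack(words, axis=1)               # [batch, MAX_WORDS]
        sentences.append(sent_ids)
        semantic = encode_sentence(tf.nn.embedding_lookup(embedding, sent_ids))
    return tf.stack(sentences, axis=1)       # [batch, MAX_SENT, MAX_WORDS]
```

In the real model the visual features would come from the pretrained InceptionV3 encoder, and decoding would stop at an end-of-sentence token rather than always running for MAX_WORDS steps.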
#### Details

I split the dataset into 2811 training and 300 test examples, and train with Adam at an initial learning rate of 1e-4, decayed by 0.9 every 5 epochs. Generation produces at most 8 sentences, with at most 50 words per sentence. The word embedding size is 512 and the RNN has 512 units. More details are in config.py.

#### IU X-Ray Dataset

There are 7470 raw images, of which 3391 pairs (3391×2 images) have both a frontal and a lateral view. There are 3927 raw reports, of which 3631 have at least 4 sentences; since reports with 4 to 8 sentences account for over 90%, I set the maximum sentence number to 8. In total, 3111 cases have both an image pair and a report with at least 4 sentences.

#### Results on Normal vs. Abnormal Reports

When analysing the reports in the dataset, I found that **Normal Reports : Abnormal Reports = 2.5 : 1**, i.e. the data is unbalanced. My best results (code not released) are:

| Subset | BLEU_1 | BLEU_2 | BLEU_3 | BLEU_4 | METEOR | ROUGE | CIDEr |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Total Test Data | 0.4431 | 0.3116 | 0.2137 | 0.1473 | 0.2004 | 0.3611 | 0.4128 |
| Normal Test Data | 0.5130 | 0.3628 | 0.2615 | 0.1750 | 0.2313 | 0.3894 | 0.4478 |
| Abnormal Test Data | 0.2984 | 0.1903 | 0.1274 | 0.0934 | 0.1289 | 0.2397 | 0.2641 |

Note: Total means the whole test dataset, Normal means the normal reports (no disease) of the test dataset, and Abnormal means the abnormal reports (with a disease or abnormality). These are standard caption metrics; a quick sanity check with nltk is sketched below.
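The official numbers should come from the MS COCO caption evaluation toolkit [3]; as a hedged illustration only, BLEU_1 through BLEU_4 can also be approximated with nltk (one of the listed dependencies). The reference and hypothesis strings here are made up, and the smoothing choice will change the scores.

```python
# -*- coding: utf-8 -*-
# Hedged illustration only: official scores should come from the MS COCO
# caption toolkit [3]; the reference/hypothesis strings here are made up.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = 'no acute cardiopulmonary abnormality the lungs are clear'.split()
hypothesis = 'the lungs are clear there is no focal consolidation'.split()

smooth = SmoothingFunction().method1   # avoids zero scores on short sentences
for n in range(1, 5):
    weights = tuple([1.0 / n] * n)     # uniform n-gram weights => BLEU_n
    score = sentence_bleu([reference], hypothesis,
                          weights=weights, smoothing_function=smooth)
    print('BLEU_%d = %.4f' % (n, score))
```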
## Summary

#### Process

Here I summarize the course of my research on Medical Report Generation.

- First, it is natural to connect this task to the Image2Text task, so I applied image captioning methods, such as CNN+RNN, to it.
- Second, I found that image captioning methods can handle a single short sentence, but this task requires many sentences. So I turned to image paragraph generation methods, such as CNN + Hierarchical RNN.
- Next, I noticed that the reports contain both an Impression and a Findings description, so I applied a QA + Hierarchical RNN method.
- Finally, I found that language information matters more than image information because the dataset is small, which is interesting.

#### Problems

There are many challenges in this task; I refer to some points of **[1]**.

- **Very small medical datasets**: most medical datasets contain only images, almost never bounding boxes or reports, so models overfit severely.
- **Highly variable report descriptions**: different doctors write diagnosis reports in different styles.
- **More like dense captioning than story generation**: we should ground each description sentence in the relevant image region.
- **Unsuitable metrics**: BLEU (designed for machine translation), CIDEr (designed for captioning), and so on are not well suited to this task.
- **Impractical**: up to now there are only 4-5 published papers **[5][6][7][8]** on this task, and to be honest they exist only as papers; none of them releases code.

#### Little Advice

- If you want to research medical report generation, try to get more data, and focus on **semantic information** rather than **visual information** when data is small. In the VQA task, some researchers have found that language is more useful than the image.
- You could use a stronger language model (BERT, ELMo, or Transformer); it may help.

## References

- [1] [Survey of Medical Diagnosis Report Generation Papers (in Chinese)](https://blog.csdn.net/wl1710582732/article/details/85345285)
- [2] [im2txt in the TensorFlow Models repository](https://github.com/tensorflow/models/tree/master/research/im2txt)
- [3] [MS COCO Caption Evaluation Toolkit](https://github.com/tylin/coco-caption)
- [4] **TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays**, Xiaosong Wang et al., CVPR 2018, NIH
- [5] **On the Automatic Generation of Medical Imaging Reports**, Baoyu Jing et al., ACL 2018, CMU
- [6] **Multimodal Recurrent Model with Attention for Automated Radiology Report Generation**, Yuan Xue et al., MICCAI 2018, PSU
- [7] **Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation**, Christy Y. Li et al., NIPS 2018, CMU
- [8] **Knowledge-Driven Encode, Retrieve, Paraphrase for Medical Image Report Generation**, Christy Y. Li et al., AAAI 2019, DU
- [9] **A Hierarchical Approach for Generating Descriptive Image Paragraphs**, Jonathan Krause et al., CVPR 2017, Stanford
- [10] **Show and Tell: A Neural Image Caption Generator**, Oriol Vinyals et al., CVPR 2015, Google
- [11] **Show, Attend and Tell: Neural Image Caption Generation with Visual Attention**, Kelvin Xu et al., ICML 2015