# text2image

**Repository Path**: guoami/text2image

## Basic Information

- **Project Name**: text2image
- **Description**: Generating Images from Captions with Attention
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2021-07-16
- **Last Updated**: 2021-11-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## Generating Images from Captions with Attention

Code for paper [Generating Images from Captions with Attention](http://arxiv.org/abs/1511.02793) by Elman Mansimov, Emilio Parisotto, Jimmy Ba and Ruslan Salakhutdinov; ICLR 2016.

We introduce a model that generates image blobs from natural language descriptions. The proposed model iteratively draws patches on a canvas, while attending to the relevant words in the description.

![theimage](https://pbs.twimg.com/media/CTfsgHYXIAEXEOb.png)

### Getting Started

The code is written in python. To use it you will need:

* Python 2.7
* Theano 0.7 (mostly tested using commit from June/July 2015)
* numpy and scipy
* h5py (HDF5 (>= 1.8.11))
* [skip-thoughts](https://github.com/ryankiros/skip-thoughts)

Before running the code make sure that you set floatX to float32 in Theano settings.

Additionally, depending on the tasks you will probably need to download these files by running:

```
wget http://www.cs.toronto.edu/~emansim/datasets/mnist.h5
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/train-images-32x32.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/train-images-56x56.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/train-captions.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/train-captions-len.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/train-cap2im.pkl
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/dev-images-32x32.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/dev-images-56x56.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/dev-captions.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/dev-captions-len.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/dev-cap2im.pkl
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/test-images-32x32.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/test-captions.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/test-captions-len.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/test-cap2im.pkl
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/gan.hdf5
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/dictionary.pkl
```

### MNIST with Captions

To train the model simply go to mnist-captions folder and run

```
python alignDraw.py models/mnist-captions.json
```

To generate 60x60 MNIST images from captions as specified in appendix of the paper run

```
python sample-captions.py --model models/mnist-captions.json --weights /path/to/trained-weights
```

**Note**: I have also provided implementation of simple draw model in files draw.py and sample.py

### Microsoft COCO

To train the model simply go to coco folder and run

```
python alignDraw.py models/coco-captions-32x32.json
```

To generate images from captions after training run

```
python sample-captions.py --model models/coco-captions-32x32.json --weights /path/to/trained-weights --dictionary dictionary.pkl --gan_path gan.hdf5 --skipthought_path /path/to/skipthoughts-folder
```

**Note**: I have been caught up with other non-research stuff, so I will add baseline model files like noAlignDraw and others during the week of Feb 29 - Mar 6.

Feel free to email me if you have some questions or if you are uncertain about some parts of the code.

### Acknowledgments

I would like to acknowledge the help of [Tom White](https://github.com/dribnet) for some suggestion on cleaning and organizing the code.

### Reference

If you found this code or our paper useful, please consider citing the following paper:

```
@inproceedings{mansimov16_text2image,
  author    = {Elman Mansimov and Emilio Parisotto and Jimmy Ba and Ruslan Salakhutdinov},
  title     = {Generating Images from Captions with Attention},
  booktitle = {ICLR},
  year      = {2016}
}
```

You would probably also need to cite some of the papers that we have referred to ;)