# Video-Caffe: Caffe with C3D implementation and video reader

[![Build Status](https://travis-ci.org/chuckcho/video-caffe.svg?branch=master)](https://travis-ci.org/chuckcho/video-caffe)

This is a 3D convolution (C3D) and video reader implementation in the latest Caffe (Dec 2016). The original [Facebook C3D implementation](https://github.com/facebook/C3D/) branched off from Caffe on July 17, 2014 at git commit [b80fc86](https://github.com/BVLC/caffe/tree/b80fc862952ba4e068cf74acc0823785ce1cc0e9) and has not been rebased with the original Caffe since, so it misses quite a few features added to Caffe in the meantime. I therefore pulled the C3D concept and an accompanying video reader into the latest Caffe, and will try to rebase this repo with upstream whenever an important new feature lands. This repo was last rebased onto [99bd997](https://github.com/BVLC/caffe/commit/99bd99795dcdf0b1d3086a8d67ab1782a8a08383) on Aug 21, 2018.

Please reach out to [me](https://github.com/chuckcho) with any feedback or questions. Check out the [original Caffe readme](README-original.md) for Caffe-specific information.

## Branches

The [`refactor` branch](https://github.com/chuckcho/video-caffe/tree/refactor) is a recent re-work based on the [original Caffe](https://github.com/BVLC/caffe) and the [Nd convolution and pooling with cuDNN PR](https://github.com/BVLC/caffe/pull/3983). It is a cleaner, less hacky implementation of 3D convolution/pooling than the `master` branch, and is expected to be more stable, so feel free to try it. One feature the `refactor` branch is still missing is the Python wrapper.

## Requirements

In addition to the [prerequisites for Caffe](http://caffe.berkeleyvision.org/installation.html#prerequisites), video-caffe depends on cuDNN. It is known to work with cuDNN versions 4 and 5, but it may take some effort to build with v3.

* If you build with "make", make sure `Makefile.config` points to the right paths for CUDA and cuDNN.
* If you build with "cmake", double-check that `CUDNN_INCLUDE` and `CUDNN_LIBRARY` are correct. If not, you may want something like `cmake -DCUDNN_INCLUDE="/your/path/to/include" -DCUDNN_LIBRARY="/your/path/to/lib" ${video-caffe-root}`.

## Building video-caffe

Key steps to build video-caffe (consolidated into a copy-pasteable sketch after this list) are:

1. `git clone git@github.com:chuckcho/video-caffe.git`
2. `cd video-caffe`
3. `mkdir build && cd build`
4. `cmake ..`
5. Make sure CUDA and cuDNN are detected and their paths are correct.
6. `make all -j8`
7. `make install`
8. (optional) `make runtest`
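The same steps as a single shell session. This is a minimal sketch: the cuDNN paths below are placeholders and only needed if cmake fails to auto-detect cuDNN, and `-j8` is just an example job count; adjust both for your machine.

```bash
# Clone the repository and enter the source tree
git clone git@github.com:chuckcho/video-caffe.git
cd video-caffe

# Configure an out-of-source cmake build
mkdir build && cd build

# Placeholder cuDNN paths; pass these only if auto-detection fails
cmake -DCUDNN_INCLUDE="/your/path/to/include" \
      -DCUDNN_LIBRARY="/your/path/to/lib" \
      ..

# Compile and install (adjust -j to your core count)
make all -j8
make install

# Optionally run the unit tests
make runtest
```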
## Usage

Look at [`${video-caffe-root}/examples/c3d_ucf101/c3d_ucf101_train_test.prototxt`](examples/c3d_ucf101/c3d_ucf101_train_test.prototxt) to see how 3D convolution and pooling are used. In a nutshell, use an `NdConvolution` or `NdPooling` layer with a `{kernel,stride,pad}_shape` that specifies the 3D shape as (L x H x W), where `L` is the temporal length (usually 16).

```
...
# ----- video/label input -----
layer {
  name: "data"
  type: "VideoData"
  top: "data"
  top: "label"
  video_data_param {
    source: "examples/c3d_ucf101/c3d_ucf101_train_split1.txt"
    batch_size: 50
    new_height: 128
    new_width: 171
    new_length: 16
    shuffle: true
  }
  include {
    phase: TRAIN
  }
  transform_param {
    crop_size: 112
    mirror: true
    mean_value: 90
    mean_value: 98
    mean_value: 102
  }
}
...
# ----- 1st group -----
layer {
  name: "conv1a"
  type: "NdConvolution"
  bottom: "data"
  top: "conv1a"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    kernel_shape { dim: 3 dim: 3 dim: 3 }
    stride_shape { dim: 1 dim: 1 dim: 1 }
    pad_shape { dim: 1 dim: 1 dim: 1 }
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
...
layer {
  name: "pool1"
  type: "NdPooling"
  bottom: "conv1a"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_shape { dim: 1 dim: 2 dim: 2 }
    stride_shape { dim: 1 dim: 2 dim: 2 }
  }
}
...
```

## UCF-101 training demo

Scripts and training files for C3D training on UCF-101 are located in [examples/c3d_ucf101/](examples/c3d_ucf101/). Steps to train C3D on UCF-101:

1. Download the UCF-101 dataset from the [UCF-101 website](http://crcv.ucf.edu/data/UCF101.php).
2. Unzip the dataset, e.g. `unrar x UCF101.rar`.
3. (Optional) The video reader works more stably with extracted frames than directly with video files. Extract frames from the UCF-101 videos by revising and running a helper script, [`${video-caffe-root}/examples/c3d_ucf101/extract_UCF-101_frames.sh`](examples/c3d_ucf101/extract_UCF-101_frames.sh).
4. Change `${video-caffe-root}/examples/c3d_ucf101/c3d_ucf101_{train,test}_split1.txt` to correctly point to the UCF-101 videos or to the directories that contain the extracted frames.
5. Modify [`${video-caffe-root}/examples/c3d_ucf101/c3d_ucf101_train_test.prototxt`](examples/c3d_ucf101/c3d_ucf101_train_test.prototxt) to your taste or hardware specification. In particular, `batch_size` may need to be adjusted to fit GPU memory.
6. Run the training script, e.g. `cd ${video-caffe-root} && examples/c3d_ucf101/train_ucf101.sh` (optionally use `--gpu` to train on multiple GPUs).
7. (Optional) Occasionally run [`${video-caffe-root}/tools/extra/plot_training_loss.sh`](tools/extra/plot_training_loss.sh) to get a plot of training loss and validation accuracy (top-1/5). It's pretty hacky, so look at the file and adapt it to your needs.
8. At 7 epochs of training, clip accuracy should be around 45%.

A typical training run will yield the following loss and top-1 accuracy:

![iter-loss-accuracy plot](examples/c3d_ucf101/c3d_ucf101_train_loss_accuracy.png?raw=true "Iteration vs Training loss and top-1 accuracy")

## Pre-trained model

A pre-trained model ([downloadable link](https://www.dropbox.com/s/gglm2c67154nltr/c3d_ucf101_iter_20000.caffemodel?dl=0)) is available for UCF-101 (trained from scratch), achieving a top-1 accuracy of ~47%.

## To-do

1. Feature extractor script (in the meantime, see the stopgap sketch at the end of this README).
2. Python demo script that loads a video and classifies it.
3. Convert the Sports-1M pre-trained model and make it available.

## License and Citation

Caffe is released under the [BSD 2-Clause license](https://github.com/BVLC/caffe/blob/master/LICENSE).
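Until to-do item 1 lands, upstream Caffe's stock `extract_features` tool, which this repo inherits, may serve as a stopgap. Below is a minimal sketch, assuming the pre-trained UCF-101 model above; the deploy prototxt path, blob name (`fc6`), mini-batch count, and output path are hypothetical placeholders, not files shipped by this repo.

```bash
# ${video-caffe-root} is a placeholder for your checkout path, as elsewhere
# in this README.
cd ${video-caffe-root}

# Argument order for extract_features: weights, net prototxt, blob name(s),
# output dataset(s), number of mini-batches, db backend, and device.
./build/tools/extract_features \
    c3d_ucf101_iter_20000.caffemodel \
    examples/c3d_ucf101/c3d_ucf101_deploy.prototxt \
    fc6 \
    examples/c3d_ucf101/features_fc6_lmdb \
    100 \
    lmdb \
    GPU 0
```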