# Handwritten Digits Recognition-MNIST

**Repository Path**: wangwang-xyz/handwritten-digits-recognition-mnist

## Basic Information

- **Project Name**: Handwritten Digits Recognition-MNIST
- **Description**: A simple model about handwritten digits recognition, mainly for learning of pytorch.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 3
- **Forks**: 0
- **Created**: 2022-03-25
- **Last Updated**: 2024-04-17

## Categories & Tags

**Categories**: Uncategorized

**Tags**: PyTorch, mnist, CNN

## README

---

*A simple neural network for handwritten digit recognition, built with PyTorch, that reaches 98.6% accuracy.*

---


## Files description

* **data/:** MNIST dataset
* **libs/:** Third-party libraries
* **logs/:** Training log, open with TensorBoard
* **models/:** Model parameters for each stage of training
* **pics/:** Pictures used in this doc
* **raw_pic/:** Visualization of MNIST, open with TensorBoard
* **dataset_mnist.py:** Read and build local MNIST dataset (Not used)
* **network.py:** Build neural network model
* **train.py:** Train the model
* **eval.py:** Practical validation of the model using the ZED camera


## Training and Testing environment

* python 3.8.12
* cudatoolkit 11.3.1
* cudnn 8.2.1
* pytorch 1.10.0
* tensorboard 2.8.0
* opencv-python 4.5.4.60
* numpy 1.22.3
* pyzed 3.7


## Running

Run `train.py`, then `eval.py`. You can change the model in `network.py` or change parameters in `train.py`.
```shell
python train.py
python eval.py
```


## A Sample Tutorial

This section shows how to build and train a neural network model yourself with PyTorch. You need to implement the following key pieces.
* Dataset : describe your dataset
* Network : build your NN model
* DataLoader : load data from dataset to model
* Loss function : calculate loss
* Optimizer : choose an optimization strategy that updates the model parameters from the backpropagated gradients
* Train and Eval : code to train and test your model


### Describe your dataset

PyTorch provides some common datasets; see [Datasets](https://pytorch.org/vision/stable/datasets.html). Specific tutorials can be found on that website. If the dataset you need is not there, you have to download it yourself and then describe it in your code, as in `dataset_mnist.py`. The point of this part is to load the dataset and fetch samples from it, as follows.

```python
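import os
from torch.utils.data import Dataset

class DataMNIST(Dataset):

    def __init__(self, rootdir, dataname, labelname, transform=None):
        # Load images and labels from the given paths.
        # `load` is a placeholder for your own file-reading function.
        self.data = load(os.path.join(rootdir, dataname))
        self.labels = load(os.path.join(rootdir, labelname))
        self.transform = transform

    def __getitem__(self, idx):
        # Get one sample and its label
        img = self.data[idx]
        if self.transform is not None:
            img = self.transform(img)
        return img, self.labels[idx]

    def __len__(self):
        # Get the number of samples in the dataset
        return len(self.data)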
```
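
As one possible way to fill in the loading step above, the sketch below parses the IDX binary format used by the original MNIST files with only the standard library. It is not the code in `dataset_mnist.py`; the names `read_idx` and `IdxDataset` are hypothetical, and a real PyTorch version would subclass `torch.utils.data.Dataset` and convert each image to a tensor.

```python
import io
import struct


def read_idx(stream):
    """Parse an IDX file object into (dims, payload bytes)."""
    # Header: two zero bytes, a type code (0x08 = unsigned byte),
    # and the number of dimensions.
    zero, dtype, ndim = struct.unpack(">HBB", stream.read(4))
    if zero != 0 or dtype != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    # Each dimension is a big-endian 32-bit unsigned integer.
    dims = struct.unpack(">" + "I" * ndim, stream.read(4 * ndim))
    payload = stream.read()
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError("payload size does not match header")
    return dims, payload


class IdxDataset:
    """Map-style dataset with __getitem__ and __len__ (hypothetical helper)."""

    def __init__(self, image_stream, label_stream):
        (n, self.rows, self.cols), self.pixels = read_idx(image_stream)
        (m,), self.labels = read_idx(label_stream)
        if n != m:
            raise ValueError("image and label counts differ")
        self.n = n

    def __getitem__(self, idx):
        if not 0 <= idx < self.n:
            raise IndexError(idx)
        size = self.rows * self.cols
        flat = self.pixels[idx * size:(idx + 1) * size]
        # Reshape the flat bytes into rows of pixel values (0-255).
        image = [list(flat[r * self.cols:(r + 1) * self.cols])
                 for r in range(self.rows)]
        return image, self.labels[idx]

    def __len__(self):
        return self.n


# Tiny synthetic example: two 2x2 images with labels 7 and 3.
images = io.BytesIO(struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 2) + bytes(range(8)))
labels = io.BytesIO(struct.pack(">HBBI", 0, 0x08, 1, 2) + bytes([7, 3]))
dataset = IdxDataset(images, labels)
```

In a real run you would open the image and label files in binary mode, for example with `open(path, "rb")`, and pass the file objects in place of the `io.BytesIO` streams.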


### Build your model

PyTorch provides some classic models and their pre-trained weights, such as ResNet, [here](https://pytorch.org/vision/stable/models.html). In addition, `torch.nn` provides tools for building your own model quickly, as in `network.py`. All you have to do is define the layers you need and specify the computation in `forward`. Detailed descriptions of each layer can be found in the [official documentation](https://pytorch.org/docs/stable/nn.html).
```python
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.CnnModel = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(0.5),
            nn.Conv2d(32, 32, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(0.5),
            nn.Flatten(),
            nn.Linear(512, 128),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        x = self.CnnModel(x)

        return x
```
Here I build a simple CNN model to recognize handwritten digits. The key is to work through the computation layer by layer so that you can set each layer's parameters correctly. A table like the one below can help.

![Network Table](pics/network.png)
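
The same bookkeeping can be done in a few lines of code. The sketch below assumes a 28x28 single-channel MNIST input and no padding (the `nn.Conv2d` default), and applies the output-size formula `floor((size + 2*padding - kernel) / stride) + 1` to each layer of `MyModel`. The helper name `out_size` is made up for illustration.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer name, kernel size, stride, output channels or None to keep channels)
layers = [
    ("Conv2d(1, 16, 5)", 5, 1, 16),
    ("Conv2d(16, 32, 3)", 3, 1, 32),
    ("MaxPool2d(2)", 2, 2, None),
    ("Conv2d(32, 32, 3)", 3, 1, 32),
    ("MaxPool2d(2)", 2, 2, None),
]

size, channels = 28, 1  # assumed MNIST input: 1 x 28 x 28
shapes = []
for name, kernel, stride, out_channels in layers:
    size = out_size(size, kernel, stride)
    if out_channels is not None:
        channels = out_channels
    shapes.append((name, channels, size, size))
    print(f"{name:<20} -> {channels} x {size} x {size}")

# Number of features seen by the first fully connected layer after Flatten.
flat_features = channels * size * size
print("Flatten ->", flat_features)
```

The final value matches the 512 input features of the first `nn.Linear` layer in the model above.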

### Training your model
Before training, you have to decide which device to train your model on, CPU or GPU. You can set the training device with the following code.
```python
import torch

# Use the GPU when CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```
Then load the dataset you described earlier with `torch.utils.data.DataLoader`. At the same time, you can preprocess the data using [transforms](https://pytorch.org/vision/stable/transforms.html) from `torchvision.transforms`.
```python
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(0.1307, 0.3081)
])
# ...
# Load data
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=64)
```
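
To make the preprocessing concrete, the sketch below reproduces the arithmetic of the two transforms in plain Python for a single 8-bit pixel: `ToTensor` scales the value to the range [0, 1], and `Normalize` then subtracts the mean and divides by the standard deviation. The function name `preprocess_pixel` is hypothetical; the constants are the ones used above.

```python
MEAN, STD = 0.1307, 0.3081  # the values passed to transforms.Normalize above


def preprocess_pixel(value):
    """Map one 8-bit pixel (0-255) through the ToTensor and Normalize arithmetic."""
    scaled = value / 255.0          # ToTensor: uint8 -> float in [0, 1]
    return (scaled - MEAN) / STD    # Normalize: subtract mean, divide by std


black = preprocess_pixel(0)
white = preprocess_pixel(255)
print(f"black -> {black:.4f}, white -> {white:.4f}")
```

Background pixels end up slightly below zero and stroke pixels well above it, so the inputs are roughly centered before they reach the first convolution.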

After this, you can load your NN model. To train it, you need to compute the loss and update the weights using a [loss function](https://pytorch.org/docs/stable/nn.html) and an [optimizer](https://pytorch.org/docs/stable/optim.html), provided by `torch.nn` and `torch.optim` respectively. The right choice depends on the task. For the task here, the code below shows the setup.
```python
import torch
import torch.nn as nn
from network import MyModel

model = MyModel()
model = model.to(device)

# Define the loss function
loss_fn = nn.CrossEntropyLoss(reduction='mean')
loss_fn = loss_fn.to(device)

# Define the optimizer and the learning-rate scheduler
learning_rate = 2e-3
optimizer = torch.optim.Adagrad(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 80], gamma = 0.8)
```
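
For intuition about what `nn.CrossEntropyLoss(reduction='mean')` computes, here is a plain-Python sketch of the same formula: a log-softmax over the raw scores (logits) of each sample, the negative log-probability of the true class, and the mean over the batch. The function name and the example numbers are made up for illustration. PyTorch's implementation is vectorized, and this sketch ignores options such as class weights and label smoothing.

```python
import math


def cross_entropy_mean(logits, targets):
    """Mean cross-entropy over a batch of raw scores and integer class labels."""
    total = 0.0
    for scores, target in zip(logits, targets):
        # Subtract the max score before exponentiating for numerical stability.
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of the true class.
        total += log_sum_exp - scores[target]
    return total / len(targets)


# Two samples, three classes (hypothetical numbers).
logits = [[2.0, 0.5, 0.1], [0.0, 0.0, 0.0]]
targets = [0, 2]
loss = cross_entropy_mean(logits, targets)
print(f"loss = {loss:.4f}")
```

A useful sanity check: when all ten digit classes receive equal scores, the loss is `ln(10)`, which is roughly what an untrained model should report at the start of training.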

Here the `scheduler` automatically adjusts the learning rate as training progresses. With `milestones=[30, 80]`, training is split into three stages, and the learning rate is multiplied by `gamma` at each milestone. A `gamma` of about 0.1 to 0.8 is typical.
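
As a quick check of the schedule, the sketch below computes the learning rate that this configuration yields at a given epoch, using the closed form `lr = base_lr * gamma ** k`, where `k` is the number of milestones already reached. It does not call PyTorch; the helper name `multistep_lr` is hypothetical.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate in effect during `epoch` (0-based), decayed at each milestone."""
    reached = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** reached


for epoch in (0, 29, 30, 79, 80, 99):
    print(epoch, multistep_lr(2e-3, [30, 80], 0.8, epoch))
```

With the settings above this gives about 2e-3 for epochs 0 to 29, 1.6e-3 for epochs 30 to 79, and 1.28e-3 from epoch 80 on.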

After this, we can finally train our model. Here is a simple template.  
```python
epoch = 100
for i in range(epoch):
    print("----------Epoch({}/{})----------".format(i+1, epoch))

    model.train()
    for data in train_loader:
        imgs, targets = data
        imgs = imgs.to(device)
        targets = targets.to(device)

        outputs = model(imgs)
        loss = loss_fn(outputs, targets)

        # Update parameters
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Adjust the learning rate
    scheduler.step()

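    # Evaluate on the test set
    model.eval()
    total_test_loss = 0.0
    total_test_acc = 0
    with torch.no_grad():
        for data in test_loader:
            imgs, targets = data
            imgs = imgs.to(device)
            targets = targets.to(device)

            outputs = model(imgs)
            loss = loss_fn(outputs, targets)
            total_test_loss = total_test_loss + loss.item()

            # Count the correct predictions in this batch
            acc = (outputs.argmax(1) == targets).sum().item()
            total_test_acc = total_test_acc + acc

    # Accuracy in percent over the whole test set
    total_test_acc = 100 * total_test_acc / len(test_loader.dataset)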
    
    if (i+1) % 10 == 0:
        torch.save(model, "./model/model_{}.pth".format(i+1))
        print("Model Saved")
```

You can add more output and adjust the model-saving strategy to suit your situation.
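
One common variation, shown here as a hedged sketch rather than what `train.py` does, is to keep a checkpoint only when the test accuracy improves. The class name `BestCheckpoint` and the accuracy values are hypothetical; in the training loop you would call `torch.save` wherever `update` returns `True`.

```python
class BestCheckpoint:
    """Track the best test accuracy seen so far and say when to save."""

    def __init__(self):
        self.best_acc = float("-inf")
        self.best_epoch = None

    def update(self, epoch, acc):
        """Return True if `acc` beats the previous best (i.e. save now)."""
        if acc > self.best_acc:
            self.best_acc = acc
            self.best_epoch = epoch
            return True
        return False


tracker = BestCheckpoint()
history = [97.1, 98.0, 97.8, 98.6, 98.5]  # made-up per-epoch accuracies
saved_epochs = [e + 1 for e, acc in enumerate(history) if tracker.update(e + 1, acc)]
print("would save at epochs:", saved_epochs)
```

This keeps fewer files than saving every tenth epoch, at the cost of losing intermediate snapshots.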

### Show off your training process
TensorBoard is a powerful tool for visualizing training data. Some of the functions used in my code are shown below.
```python
from torch.utils.tensorboard import SummaryWriter

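writer = SummaryWriter("logs")            # write event files to ./logs
writer.add_scalar("test_loss", y, x)      # log scalar value y at step x
writer.add_images("test_img", imgs, idx)  # log a batch of images at step idx
writer.add_graph(model, test)             # log the model graph, traced with sample input `test`
writer.close()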
```