# huawei

**Repository Path**: forechoni/huawei

## Basic Information

- **Project Name**: huawei
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-08-17
- **Last Updated**: 2021-08-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## Training

Training requires two files: model.py and trainzwy_kera_dataset.py. If the Keras download is slow, download CIFAR-100 manually.

- trainzwy_kera_dataset.py — training script; no L2 regularization added
- trainzwy.py — training with L2 regularization; uses the local CIFAR-100 dataset
- model2.py — WideResNet
- trainzwy_cifar10.py — training and testing on CIFAR-10
- trainzwy_wider.py — training and testing for WideResNet
- trainzwy_simpleCNN.py — training and testing for a simple CNN
- evalzwy.py — testing for ViT only
- evalzwy_cifar10 .......

## Change Log

Increased the learning rate somewhat to see whether training converges faster.

- Default parameters: batch_size 320, lr 0.01 → loss 4.6025 (2021-08-17 18:47)
- lr 0.0001, batch_size 320 → loss around 0.08 (2021-08-17 21:04)
- lr 0.0001, batch_size 320, data augmentation as described below → loss 0.04, acc 0.98, test_acc 0.26 (2021-08-18 11:04)
- Switched the data source to tf.keras.datasets; test-set performance is still poor (*rules out a dataset-loading error*)

## More Notes

- The very first step is slower, about 55 s.
- With regularization, convergence is faster.
- The learning rate must be below 0.0001, otherwise training does not converge.
- The remaining problem is that test-set accuracy is too low [`dataset errors ruled out`, `code errors ruled out`].

## Good on the training set, poor on the test set (big problem)

### Guess: the model generalizes poorly — continue with data augmentation

Augmentation scheme:

1. Horizontal flip with 30% probability
2. Vertical flip with 30% probability
3. Contrast change with 30% probability

Results after augmentation: train_loss went from ~0.0x up to 0.5 and accuracy from 95% down to 88%, but both quickly resumed converging.

Conclusion: data augmentation did not help much; test accuracy is still between 0.26 and 0.27. The problem remains puzzling.

### Guess: the model is wrong

A basic CNN reaches 96% training accuracy (200 epochs) but only 34% on the test set. [**Breakthrough clue**]

### Dropout

Setting dropout to 0 had no effect.

### Restarting training with regularization added

No help.

Conclusion: the Vision Transformer's generalization is insufficient, and no new fix has been found.

```log
epoch:0 step: 0 , loss: 7.1803 , time: 69.0155 acc: 0.0156
epoch:0 step: 100 , loss: 5.9652 , time: 0.1732 acc: 0.0781
Train ACC: 0.0676
epoch:1 step: 0 , loss: 5.6489 , time: 0.1760 acc: 0.1094
epoch:1 step: 100 , loss: 5.3926 , time: 0.1804 acc: 0.1656
Train ACC: 0.1454
epoch:2 step: 0 , loss: 5.1994 , time: 0.1847 acc: 0.1750
epoch:2 step: 100 , loss: 4.9337 , time: 0.1818 acc: 0.2094
Train ACC: 0.1895
Test ACC: 0.20131209935897437
epoch:3 step: 0 , loss: 4.8028 , time: 0.1785 acc: 0.2406
epoch:3 step: 100 , loss: 4.8753 , time: 0.1676 acc: 0.2344
Train ACC: 0.2215
epoch:4 step: 0 , loss: 4.5415 , time: 0.1900 acc: 0.2781
epoch:4 step: 100 , loss: 4.4391 , time: 0.2087 acc: 0.2687
Train ACC: 0.2499
epoch:5 step: 0 , loss: 4.4934 , time: 0.1923 acc: 0.2687
epoch:5 step: 100 , loss: 4.3482 , time: 0.1933 acc: 0.2750
Train ACC: 0.2790
epoch:6 step: 0 , loss: 4.2272 , time: 0.1972 acc: 0.2594
epoch:6 step: 100 , loss: 4.2390 , time: 0.1921 acc: 0.2594
Train ACC: 0.3079
epoch:7 step: 0 , loss: 4.1190 , time: 0.1942 acc: 0.2687
epoch:7 step: 100 , loss: 3.8457 , time: 0.1855 acc: 0.3469
Train ACC: 0.3352
epoch:8 step: 0 , loss: 3.7956 , time: 0.1780 acc: 0.3875
epoch:8 step: 100 , loss: 3.4633 , time: 0.1775 acc: 0.3906
Train ACC: 0.3645
epoch:9 step: 0 , loss: 3.5332 , time: 0.1792 acc: 0.4000
epoch:9 step: 100 , loss: 3.7680 , time: 0.1933 acc: 0.3625
Train ACC: 0.3975
epoch:10 step: 0 , loss: 3.4966 , time: 0.1922 acc: 0.4469
epoch:10 step: 100 , loss: 3.1195 , time: 0.1881 acc: 0.4594
Train ACC: 0.4357
epoch:11 step: 0 , loss: 3.3591 , time: 0.1767 acc: 0.3969
epoch:11 step: 100 , loss: 2.9134 , time: 0.1983 acc: 0.4781
Train ACC: 0.4758
epoch:12 step: 0 , loss: 2.6549 , time: 0.1948 acc: 0.5031
epoch:12 step: 100 , loss: 2.7153 , time: 0.1883 acc: 0.4969
Train ACC: 0.5168
Test ACC: 0.2684194711538462
epoch:13 step: 0 , loss: 2.3264 , time: 0.1828 acc: 0.5813
epoch:13 step: 100 , loss: 2.3182 , time: 0.1712 acc: 0.5625
Train ACC: 0.5580
epoch:14 step: 0 , loss: 2.1117 , time: 0.1772 acc: 0.6094
epoch:14 step: 100 , loss: 2.2182 , time: 0.1925 acc: 0.5719
Train ACC: 0.6098
epoch:15 step: 0 , loss: 2.0910 , time: 0.1889 acc: 0.6094
epoch:15 step: 100 , loss: 1.8225 , time: 0.1955 acc: 0.6562
Train ACC: 0.6523
epoch:16 step: 0 , loss: 1.7047 , time: 0.1704 acc: 0.6937
epoch:16 step: 100 , loss: 1.5847 , time: 0.1867 acc: 0.6844
Train ACC: 0.6926
epoch:17 step: 0 , loss: 1.5286 , time: 0.1957 acc: 0.7000
epoch:17 step: 100 , loss: 1.1516 , time: 0.1886 acc: 0.7906
Train ACC: 0.7372
epoch:18 step: 0 , loss: 1.0883 , time: 0.1818 acc: 0.7469
epoch:18 step: 100 , loss: 1.0458 , time: 0.1821 acc: 0.8031
Train ACC: 0.7759
epoch:19 step: 0 , loss: 1.1419 , time: 0.1906 acc: 0.7875
epoch:19 step: 100 , loss: 0.9972 , time: 0.1958 acc: 0.7937
Train ACC: 0.8092
epoch:20 step: 0 , loss: 0.9605 , time: 0.1911 acc: 0.7844
epoch:20 step: 100 , loss: 0.9493 , time: 0.1880 acc: 0.7937
Train ACC: 0.8365
epoch:21 step: 0 , loss: 0.8337 , time: 0.1888 acc: 0.8250
epoch:21 step: 100 , loss: 0.6202 , time: 0.1895 acc: 0.8750
Train ACC: 0.8554
epoch:22 step: 0 , loss: 0.6716 , time: 0.1811 acc: 0.8781
epoch:22 step: 100 , loss: 0.5549 , time: 0.1836 acc: 0.8906
Train ACC: 0.8753
Test ACC: 0.2751201923076923
epoch:23 step: 0 , loss: 0.6230 , time: 0.1847 acc: 0.8688
epoch:23 step: 100 , loss: 0.5463 , time: 0.1993 acc: 0.8875
Train ACC: 0.8920
epoch:24 step: 0 , loss: 0.5380 , time: 0.1812 acc: 0.8906
epoch:24 step: 100 , loss: 0.4676 , time: 0.1854 acc: 0.8906
Train ACC: 0.9029
epoch:25 step: 0 , loss: 0.3989 , time: 0.1907 acc: 0.9250
epoch:25 step: 100 , loss: 0.3460 , time: 0.1851 acc: 0.9313
Train ACC: 0.9114
epoch:26 step: 0 , loss: 0.4207 , time: 0.1882 acc: 0.9219
epoch:26 step: 100 , loss: 0.4548 , time: 0.1863 acc: 0.9031
Train ACC: 0.9181
epoch:27 step: 0 , loss: 0.4362 , time: 0.1959 acc: 0.9187
epoch:27 step: 100 , loss: 0.3911 , time: 0.1936 acc: 0.9125
Train ACC: 0.9302
epoch:28 step: 0 , loss: 0.3015 , time: 0.1980 acc: 0.9375
epoch:28 step: 100 , loss: 0.3148 , time: 0.1817 acc: 0.9469
Train ACC: 0.9289
epoch:29 step: 0 , loss: 0.3750 , time: 0.1919 acc: 0.9313
epoch:29 step: 100 , loss: 0.3677 , time: 0.1892 acc: 0.9219
Train ACC: 0.9327
epoch:30 step: 0 , loss: 0.2702 , time: 0.1922 acc: 0.9469
epoch:30 step: 100 , loss: 0.3102 , time: 0.1949 acc: 0.9437
Train ACC: 0.9393
epoch:31 step: 0 , loss: 0.2324 , time: 0.1861 acc: 0.9594
epoch:31 step: 100 , loss: 0.3269 , time: 0.1810 acc: 0.9375
Train ACC: 0.9396
epoch:32 step: 0 , loss: 0.2422 , time: 0.1869 acc: 0.9656
epoch:32 step: 100 , loss: 0.2504 , time: 0.1980 acc: 0.9406
Train ACC: 0.9416
Test ACC: 0.27770432692307695
epoch:33 step: 0 , loss: 0.2885 , time: 0.1891 acc: 0.9437
```

(With data augmentation) log snippet:

```log
0.265625 (320, 100) 6.940832614898682
.....
Test ACC: 0.2682091346153846
epoch:136 step: 0 , loss: 0.0777 , time: 0.1805 acc: 0.9750
.....
Train ACC: 0.9861
```

## Log: first epochs with training and testing run together

```log
epoch:0 step: 0 , loss: 4.7695 , time: 64.5342 acc: 0.0219
epoch:0 step: 10 , loss: 4.6901 , time: 0.1708 acc: 0.0094
.....
Train ACC: 0.0710
Test ACC: 0.12918669871794872
epoch:1 step: 0 , loss: 3.6730 , time: 0.1736 acc: 0.1344
epoch:1 step: 10 , loss: 3.8148 , time: 0.1826 acc: 0.1062
......
Train ACC: 0.1438
Test ACC: 0.17610176282051282
epoch:2 step: 0 , loss: 3.4390 , time: 0.1759 acc: 0.1656
epoch:2 step: 10 , loss: 3.4548 , time: 0.1816 acc: 0.1594
.....
Train ACC: 0.1865
Test ACC: 0.1919871794871795
epoch:3 step: 0 , loss: 3.2281 , time: 0.1793 acc: 0.2500
epoch:3 step: 10 , loss: 3.2622 , time: 0.1821 acc: 0.2375
.........
Train ACC: 0.2168
Test ACC: 0.21852964743589742
epoch:4 step: 0 , loss: 2.9443 , time: 0.1875 acc: 0.3219
epoch:4 step: 10 , loss: 3.1436 , time: 0.1739 acc: 0.2313
.....
Train ACC: 0.2472
Test ACC: 0.2267528044871795
epoch:5 step: 0 , loss: 2.9846 , time: 0.1817 acc: 0.2719
epoch:5 step: 10 , loss: 3.0510 , time: 0.1842 acc: 0.2531
epoch:5 step: 20 , loss: 3.1031 , time: 0.1870 acc: 0.2531
...........
Train ACC: 0.2739
Test ACC: 0.24924879807692307
epoch:6 step: 0 , loss: 2.9099 , time: 0.1843 acc: 0.2469
epoch:6 step: 10 , loss: 2.8569 , time: 0.1822 acc: 0.3063
.....
Train ACC: 0.3004
Test ACC: 0.2511117788461538
epoch:7 step: 0 , loss: 2.6872 , time: 0.1824 acc: 0.3094
epoch:7 step: 10 , loss: 2.7026 , time: 0.1884 acc: 0.2938
epoch:7 step: 20 , loss: 2.7180 , time: 0.1862 acc: 0.3156
epoch:7 step: 30 , loss: 2.7588 , time: 0.1845 acc: 0.3000
epoch:7 step: 40 , loss: 2.6456 , time: 0.1828 acc: 0.3125
.....
Train ACC: 0.3268
Test ACC: 0.2510216346153846
epoch:8 step: 0 , loss: 2.7647 , time: 0.1874 acc: 0.3094
epoch:8 step: 10 , loss: 2.5486 , time: 0.1868 acc: 0.3406
epoch:8 step: 20 , loss: 2.3774 , time: 0.1831 acc: 0.3750
epoch:8 step: 30 , loss: 2.6036 , time: 0.1762 acc: 0.3375
epoch:8 step: 40 , loss: 2.4894 , time: 0.1858 acc: 0.3656
.....
Train ACC: 0.3592
Test ACC: 0.2631209935897436
epoch:9 step: 0 , loss: 2.3157 , time: 0.1783 acc: 0.3719
epoch:9 step: 10 , loss: 2.2943 , time: 0.1831 acc: 0.3812
epoch:9 step: 20 , loss: 2.3775 , time: 0.1860 acc: 0.3969
epoch:9 step: 30 , loss: 2.2903 , time: 0.1776 acc: 0.3969
epoch:9 step: 40 , loss: 2.3354 , time: 0.1874 acc: 0.3969
....
Train ACC: 0.3887
Test ACC: 0.2716145833333333
epoch:10 step: 0 , loss: 2.4460 , time: 0.1844 acc: 0.3844
epoch:10 step: 10 , loss: 2.1943 , time: 0.1867 acc: 0.4219
epoch:10 step: 20 , loss: 2.2457 , time: 0.1804 acc: 0.3875
epoch:10 step: 30 , loss: 2.3677 , time: 0.1811 acc: 0.3406
....
epoch:10 step: 150 , loss: 2.0803 , time: 0.1845 acc: 0.4594
Train ACC: 0.4252
Test ACC: 0.27377804487179486
epoch:11 step: 0 , loss: 2.0448 , time: 0.1811 acc: 0.4594
epoch:11 step: 10 , loss: 2.0000 , time: 0.1784 acc: 0.4969
....
Train ACC: 0.4627
Test ACC: 0.27366786858974357
epoch:12 step: 0 , loss: 1.8369 , time: 0.1872 acc: 0.5094
......
Train ACC: 0.5027
Test ACC: 0.27141426282051284
epoch:13 step: 0 , loss: 1.6680 , time: 0.1832 acc: 0.5563
....
Train ACC: 0.5467
Test ACC: 0.26661658653846154
epoch:14 step: 0 , loss: 1.5953 , time: 0.1809 acc: 0.5750
....
Train ACC: 0.5899
Test ACC: 0.27665264423076924
epoch:15 step: 0 , loss: 1.4783 , time: 0.1879 acc: 0.5750
....
Train ACC: 0.6404
Test ACC: 0.2655749198717949
epoch:16 step: 0 , loss: 1.2717 , time: 0.1873 acc: 0.6281
epoch:16 step: 10 , loss: 1.3078 , time: 0.1835 acc: 0.6375
....
Train ACC: 0.6813
Test ACC: 0.26379206730769234
epoch:17 step: 0 , loss: 1.0564 , time: 0.1838 acc: 0.6844
epoch:17 step: 10 , loss: 1.0675 , time: 0.1804 acc: 0.6813
.....
Train ACC: 0.7233
Test ACC: 0.2655148237179487
```

## TODO LIST

1. Add code (jj)
   - 1.0 Load pretrained weights [√]
   - 1.1 Accuracy and related metrics [√]
   - 1.2 Test metrics — write them in a new file and import from data.py (whatever is convenient) [√]
   - 1.3 Code style [√]
   - 1.4 Logging → to check convergence (other methods are fine too) [√]
   - 1.5 Tuning to reach the paper's numbers [×]
2. Set up GitHub, sync logs to the cloud, then train on the Huawei platform (zwy)
   - GitHub set up: done [√]
   - Log sync to the cloud: blocked [x]
   - Training on the Huawei platform: downloading PyCharm [x]

## Original paper settings

1. Adam with beta1 = 0.9, beta2 = 0.999
2. batch_size = 4096
3. Weight decay of 0.1
4. Linear learning-rate warmup and decay

CIFAR-100 WideResNet training log:

```log
epoch:0 step: 0 , loss: 7.0200 , time: 30.0285 acc: 0.0125
epoch:0 step: 100 , loss: 6.0532 , time: 0.7581 acc: 0.1219
Train ACC: 0.0916
Test ACC: 0.013810483870967743
epoch:1 step: 0 , loss: 5.6457 , time: 0.7560 acc: 0.1375
epoch:1 step: 100 , loss: 5.2050 , time: 0.7910 acc: 0.1781
Train ACC: 0.1711
Test ACC: 0.010483870967741936
epoch:2 step: 0 , loss: 5.0004 , time: 0.7791 acc: 0.1844
epoch:2 step: 100 , loss: 4.6760 , time: 0.7796 acc: 0.2406
Train ACC: 0.2336
Test ACC: 0.01028225806451613
epoch:3 step: 0 , loss: 4.5619 , time: 0.7713 acc: 0.2406
epoch:3 step: 100 , loss: 4.3672 , time: 0.7697 acc: 0.2781
Train ACC: 0.2877
Test ACC: 0.014516129032258065
epoch:4 step: 0 , loss: 4.0893 , time: 0.7729 acc: 0.2812
epoch:4 step: 100 , loss: 3.8483 , time: 0.7692 acc: 0.3438
Train ACC: 0.3317
Test ACC: 0.010786290322580645
epoch:5 step: 0 , loss: 3.8983 , time: 0.7645 acc: 0.3250
epoch:5 step: 100 , loss: 3.5991 , time: 0.7725 acc: 0.3844
Train ACC: 0.3715
Test ACC: 0.010383064516129033
epoch:6 step: 0 , loss: 3.4419 , time: 0.7642 acc: 0.4000
epoch:6 step: 100 , loss: 3.2601 , time: 0.7793 acc: 0.4469
Train ACC: 0.4031
Test ACC: 0.01028225806451613
epoch:7 step: 0 , loss: 3.3598 , time: 0.7692 acc: 0.4250
```
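The augmentation scheme described in the notes above (each transform applied independently with 30% probability) can be sketched in plain NumPy. The actual training scripts presumably use tf.image; the function name `augment` and the contrast range (0.5, 1.5) here are assumptions, not values taken from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, p=0.3):
    """Apply each augmentation independently with probability p.

    `image` is an H x W x C float array (e.g. a 32x32x3 CIFAR image).
    """
    if rng.random() < p:
        image = image[:, ::-1, :]   # horizontal flip
    if rng.random() < p:
        image = image[::-1, :, :]   # vertical flip
    if rng.random() < p:
        # contrast change: scale distance from the mean by a random factor
        mean = image.mean()
        image = mean + rng.uniform(0.5, 1.5) * (image - mean)
    return image
```

With tf.data, the equivalent tf.image version would be mapped over the dataset before batching.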
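The linear learning-rate warmup and decay listed in the original paper settings can be sketched as a pure-Python per-step schedule; `warmup_steps` and decaying all the way to zero are assumptions, not values from the paper.

```python
def lr_schedule(step, total_steps, base_lr=1e-4, warmup_steps=500):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * (1.0 - frac)
```

In the training loop this would be assigned to the optimizer's learning rate each step (e.g. `optimizer.learning_rate.assign(...)` in Keras).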
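The L2 regularization mentioned for trainzwy.py amounts to adding a weight penalty to the task loss; a minimal NumPy sketch, where the coefficient `lam` is a hypothetical value rather than the one used in the scripts:

```python
import numpy as np

def l2_penalty(weights, lam=1e-4):
    """Return lam * sum of squared entries over all weight arrays.

    Added to the cross-entropy loss, e.g.:
    total_loss = ce_loss + l2_penalty(model_weights)
    """
    return lam * sum(float(np.sum(np.square(w))) for w in weights)
```

In Keras the same effect comes from passing `kernel_regularizer=tf.keras.regularizers.l2(lam)` to each layer.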