# batch-demo **Repository Path**: binbingeng/batch-demo ## Basic Information - **Project Name**: batch-demo - **Description**: springboot测试不同方式批量新增数据库操作 - **Primary Language**: Java - **License**: MIT - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2024-09-27 - **Last Updated**: 2024-10-08 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # 测试springboot不同方案批量插入mysql数据库30W条数据效率 ## 1. 要求 1. 插入30W条数据 2. 按顺序插入,nid标识数据的顺序,要求数据顺序不乱 3. 速度要尽量快 ## 2. 测试环境 java:1.8 mysql: 5.7 电脑: 笔记本 cpu: i7-11代 2.8GHz 4核8线程 内存:16G 存储:机械硬盘1T 数据准备: 为了测试,准备了两个实体,student和teacher,两个表字段完全相同 实体student采用mybatis,teacher采用mybatis-plus,尽量保证测试环境一致 ```sql CREATE TABLE `student` ( `id` int(10) NOT NULL AUTO_INCREMENT COMMENT '主键', `nid` int(10) DEFAULT NULL COMMENT '序号', `name` varchar(64) DEFAULT NULL COMMENT '姓名', `address` varchar(255) DEFAULT NULL COMMENT '地址', `create_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '创建时间', `update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=27482 DEFAULT CHARSET=utf8mb4; ``` ```sql CREATE TABLE `teacher` ( `id` int(10) NOT NULL AUTO_INCREMENT COMMENT '主键', `nid` int(10) DEFAULT NULL COMMENT '序号', `name` varchar(64) DEFAULT NULL COMMENT '姓名', `address` varchar(255) DEFAULT NULL COMMENT '地址', `create_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '创建时间', `update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=300001 DEFAULT CHARSET=utf8mb4; ``` 数据库连接说明: JDBC连接语句如下: ```yml jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&useSSL=true&serverTimezone=GMT%2B8&allowMultiQueries=true&rewriteBatchedStatements=true ``` `&allowMultiQueries=true&rewriteBatchedStatements=true`说明 1. &allowMultiQueries=true 作用: 1. 可以在sql语句后携带分号,实现多语句执行。 2. 可以执行批处理,同时发出多个SQL语句 2. &rewriteBatchedStatements=true 作用: 1. MySQL的JDBC连接的url中要加rewriteBatchedStatements参数,并保证5.1.13以上版本的驱动,才能实现高性能的批量插入 2. MySQL JDBC驱动在默认情况下会无视executeBatch()语句,把我们期望批量执行的一组sql语句拆散,一条一条地发给MySQL数据库,批量插入实际上是单条插入,直接造成较低的性能 3. 只有把rewriteBatchedStatements参数置为true, 驱动才会帮你批量执行SQL 4. 这个选项对INSERT/UPDATE/DELETE都有效 ## 3. 测试 ### 3.1 mybatis插入 #### 3.1.1 普通for循环 代码如下: ```java for (Student student:studentList) { studentMapper.insertStudent(student); } ``` ```xml insert into student id, nid, name, address, create_time, update_time, #{id}, #{nid}, #{name}, #{address}, #{createTime}, #{updateTime}, ``` 耗时如下: 6min插入18958条 30w条数据预计:94min 耗费时间太久了,不等测试完全插入了 #### 3.1.2 mybatis的批量插入 代码如下: ```java private void insertByMybatisFor(List studentList) { int batchSize = 200; int length = studentList.size(); int num = (length + batchSize -1)/batchSize; for (int i = 0; i < num; i++) { int fromIndex = i * batchSize; int toIndex = (i + 1) * batchSize < length ? (i + 1) * batchSize : length; List partList = studentList.subList(fromIndex, toIndex); studentMapper.insertBatch(partList); } } ``` ```xml insert into student (id, nid, `name`, address, create_time, update_time) values (#{item.id},#{item.nid},#{item.name},#{item.address},#{item.createTime},#{item.updateTime}) ``` 耗时如下: 57022ms 55533ms 57112ms 平均57s #### 3.1.3 sqlsession单线程插入 代码如下: ```java private void insertByMybatisSqlSessionSingleThread(List studentList) { SqlSession sqlSession = null; try { // 打开批处理 sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH); StudentMapper mapper = sqlSession.getMapper(StudentMapper.class); int count = 0; for (int i = 0, length = studentList.size(); i < length; i++) { mapper.insertStudent(studentList.get(i)); //每200条提交一次防止内存溢出 if (i % 200 == 199) { sqlSession.commit(); sqlSession.clearCache(); logger.info("开始插入任务" + (++count)); } } sqlSession.commit(); sqlSession.clearCache(); logger.info("开始插入任务" + (++count)); }catch (Exception e){ if (sqlSession != null) { sqlSession.rollback(); } e.printStackTrace(); }finally { if (sqlSession != null) { sqlSession.close(); } } } ``` 耗时如下: 52910ms 53345ms 52688ms 平均53s #### 3.1.4 sqlsession多线程插入 代码如下: ```java private void insertByMybatisSqlSessionMultiThread(List studentList) { ExecutorService executorService = Executors.newFixedThreadPool(10); int batchSize = 200; int length = studentList.size(); int num = (length + batchSize -1)/batchSize; for (int i = 0; i < num; i++) { int fromIndex = i * batchSize; int toIndex = (i+1)*batchSize < length ? (i+1)*batchSize : length; List newList = studentList.subList(fromIndex,toIndex); int finalI = i; executorService.execute(()->{ SqlSession sqlSession = null; try { sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false); StudentMapper mapper = sqlSession.getMapper(StudentMapper.class); for (int j = 0; j < newList.size(); j++) { mapper.insertStudent(newList.get(j)); } //mapper.insertBatch(newList); sqlSession.commit(); sqlSession.clearCache(); logger.info("当前线程名称:"+Thread.currentThread().getName()+";当前执行第"+ finalI +"批任务"); }catch (Exception e){ if (sqlSession != null) { sqlSession.rollback(); } e.printStackTrace(); }finally { if (sqlSession != null) { sqlSession.close(); } } }); } // 关闭线程池,不再接受新任务 executorService.shutdown(); try { // 等待所有任务完成 // 无限期等待,直到所有任务完成 executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS); logger.info("所有任务已完成"); } catch (InterruptedException e) { // 处理中断异常 logger.error("等待过程中被中断"+e); Thread.currentThread().interrupt(); // 重新设置中断状态 } } ``` 耗时如下: 12276ms 12722ms 12611ms 平均12.6s #### 3.1.5 JDBC preparedstatement单线程插入 代码如下: ```java private void insertByJdbcPrepareStatementSingleThread(List studentList) throws Exception { Connection connection = null; PreparedStatement statement = null; try { connection = dataSource.getConnection(); connection.setAutoCommit(false); String sql = "insert into student(`nid`,`name`,`address`,`create_time` ) values(?,?,?,?) "; statement = connection.prepareStatement(sql); int count = 0; for (int i = 0; i < studentList.size(); i++) { Student student = studentList.get(i); statement.setLong(1,student.getNid()); statement.setString(2, student.getName()); statement.setString(3, student.getAddress()); statement.setDate(4, new java.sql.Date(new Date().getTime())); statement.addBatch(); //每200条提交一次防止内存溢出 if (i % 200 == 199) { statement.executeBatch(); connection.commit(); statement.clearBatch(); logger.info("已完成第 " + (++count) + " 批次的插入任务"); // 在提交后增加计数器 } } statement.executeBatch(); connection.commit(); statement.clearBatch(); logger.info("已完成第 " + (++count) + " 批次的插入任务"); // 在提交后增加计数器 }catch (Exception e){ e.printStackTrace(); }finally { if(connection!=null){ connection.close(); } if(statement!=null){ statement.close(); } } } ``` 耗时如下: 50234ms 51089ms 50134ms 平均50s #### 3.1.6 JDBC preparedstatement多线程插入 代码如下: ```java private void insertByJdbcPrepareStatementMultiThread(List studentList) throws Exception{ ExecutorService executorService = Executors.newFixedThreadPool(10); int batchSize = 200; int length = studentList.size(); int num = (length + batchSize -1)/batchSize; for (int i = 0; i < num; i++) { int fromIndex = i * batchSize; int toIndex = (i+1)*batchSize < length ? (i+1)*batchSize : length; List newList = studentList.subList(fromIndex,toIndex); int finalI = i; executorService.execute(()->{ Connection connection = null; PreparedStatement statement = null; try { connection = dataSource.getConnection(); connection.setAutoCommit(false); String sql = "insert into student(`nid`,`name`,`address`,`create_time` ) values(?,?,?,?) "; statement = connection.prepareStatement(sql); for (int j = 0; j < newList.size(); j++) { Student student = newList.get(j); statement.setLong(1,student.getNid()); statement.setString(2, student.getName()); statement.setString(3, student.getAddress()); statement.setDate(4, new java.sql.Date(new Date().getTime())); statement.addBatch(); } statement.executeBatch(); connection.commit(); statement.clearBatch(); logger.info("当前线程名称:"+Thread.currentThread().getName()+";当前执行第"+ finalI +"批任务"); }catch (Exception e){ e.printStackTrace(); }finally { if(connection!=null){ try { connection.close(); } catch (SQLException throwables) { throwables.printStackTrace(); } } if(statement!=null){ try { statement.close(); } catch (SQLException throwables) { throwables.printStackTrace(); } } } }); } // 关闭线程池,不再接受新任务 executorService.shutdown(); try { // 等待所有任务完成 // 无限期等待,直到所有任务完成 executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS); logger.info("所有任务已完成"); } catch (InterruptedException e) { // 处理中断异常 logger.error("等待过程中被中断"+e); Thread.currentThread().interrupt(); // 重新设置中断状态 } } ``` 耗时如下: 12321ms 12585ms 12333ms 平均 12.5s 注意:该方法插入后数据顺序是乱的,和预计的顺序不一致 ### 3.2 mybatis-plus插入 #### 3.2.1 普通批量插入 代码如下: ```java private void insertByMybatisPlusBatch(List teacherList) { this.saveBatch(teacherList); } ``` 耗时如下: 13946ms 14600ms 14132ms 平均 14s #### 3.2.2 单线程分批批量插入 代码如下: ```java private void insertByMybatisPlusBatchSingleThread(List teacherList) { int batchSize = 200; int length = teacherList.size(); int num = (length + batchSize -1)/batchSize; for (int i = 0; i < num; i++) { int fromIndex = i * batchSize; int toIndex = (i + 1) * batchSize < length ? (i + 1) * batchSize : length; List newList = teacherList.subList(fromIndex, toIndex); this.saveBatch(newList); logger.info("当前线程名称:"+Thread.currentThread().getName()+";当前执行第"+ i +"批任务"); } } ``` 耗时如下: 50234ms 51089ms 50134ms 平均50s #### 3.2.3 多线程分批批量插入 代码如下: ```java private void insertByMybatisPlusBatchMultiThread(List teacherList) { ExecutorService executorService = Executors.newFixedThreadPool(10); int batchSize = 200; int length = teacherList.size(); int num = (length + batchSize -1)/batchSize; for (int i = 0; i < num; i++) { int fromIndex = i * batchSize; int toIndex = (i + 1) * batchSize < length ? (i + 1) * batchSize : length; List newList = teacherList.subList(fromIndex, toIndex); int finalI = i; executorService.execute(()->{ this.saveBatch(newList); logger.info("当前线程名称:"+Thread.currentThread().getName()+";当前执行第"+ finalI +"批任务"); }); } // 关闭线程池,不再接受新任务 executorService.shutdown(); try { // 等待所有任务完成 // 无限期等待,直到所有任务完成 executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS); logger.info("所有任务已完成"); } catch (InterruptedException e) { // 处理中断异常 logger.error("等待过程中被中断"+e); Thread.currentThread().interrupt(); // 重新设置中断状态 } } ``` 耗时如下: 12244ms 12598ms 12200ms 平均 12.3s ## 4. 总结 序号|分类|测试名称|时间1|时间2|时间3|平均时间|是否乱序 -|-|-|-|-|-|-|- 1|mybatis|普通for循环|94min|...|...|94min|否 2|mybatis|mybatis的批量插入|57022ms|55533ms|57112ms|57s|否 3|mybatis|sqlsession单线程插入|52910ms|53345ms|52688ms|53s|否 4|mybatis|sqlsession多线程插入|12276ms|12722ms|12611ms|12.6s|否 5|mybatis|JDBC preparedstatement单线程插入|50234ms|51089ms|50134ms|50s|否 6|mybatis|JDBC preparedstatement多线程插入|12321ms|12585ms|12333ms|12.5s|是 7|mybatis-plus|普通批量插入|13946ms|14600ms|14132ms|14s|否 8|mybatis-plus|单线程分批批量插入|49031ms|47977ms|49833ms|50s|否 9|mybatis-plus|多线程分批批量插入|12244ms|12598ms|12200ms|12.3s|否 ## 5. 代码 代码已经上传