# batch-demo
**Repository Path**: binbingeng/batch-demo
## Basic Information
- **Project Name**: batch-demo
- **Description**: springboot测试不同方式批量新增数据库操作
- **Primary Language**: Java
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-09-27
- **Last Updated**: 2024-10-08
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# 测试springboot不同方案批量插入mysql数据库30W条数据效率
## 1. 要求
1. 插入30W条数据
2. 按顺序插入,nid标识数据的顺序,要求数据顺序不乱
3. 速度要尽量快
## 2. 测试环境
java:1.8
mysql: 5.7
电脑: 笔记本
cpu: i7-11代 2.8GHz 4核8线程
内存:16G
存储:机械硬盘1T
数据准备:
为了测试,准备了两个实体,student和teacher,两个表字段完全相同
实体student采用mybatis,teacher采用mybatis-plus,尽量保证测试环境一致
```sql
CREATE TABLE `student` (
`id` int(10) NOT NULL AUTO_INCREMENT COMMENT '主键',
`nid` int(10) DEFAULT NULL COMMENT '序号',
`name` varchar(64) DEFAULT NULL COMMENT '姓名',
`address` varchar(255) DEFAULT NULL COMMENT '地址',
`create_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '创建时间',
`update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=27482 DEFAULT CHARSET=utf8mb4;
```
```sql
CREATE TABLE `teacher` (
`id` int(10) NOT NULL AUTO_INCREMENT COMMENT '主键',
`nid` int(10) DEFAULT NULL COMMENT '序号',
`name` varchar(64) DEFAULT NULL COMMENT '姓名',
`address` varchar(255) DEFAULT NULL COMMENT '地址',
`create_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '创建时间',
`update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=300001 DEFAULT CHARSET=utf8mb4;
```
数据库连接说明:
JDBC连接语句如下:
```yml
jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&useSSL=true&serverTimezone=GMT%2B8&allowMultiQueries=true&rewriteBatchedStatements=true
```
`&allowMultiQueries=true&rewriteBatchedStatements=true`说明
1. &allowMultiQueries=true
作用:
1. 可以在sql语句后携带分号,实现多语句执行。
2. 可以执行批处理,同时发出多个SQL语句
2. &rewriteBatchedStatements=true
作用:
1. MySQL的JDBC连接的url中要加rewriteBatchedStatements参数,并保证5.1.13以上版本的驱动,才能实现高性能的批量插入
2. MySQL JDBC驱动在默认情况下会无视executeBatch()语句,把我们期望批量执行的一组sql语句拆散,一条一条地发给MySQL数据库,批量插入实际上是单条插入,直接造成较低的性能
3. 只有把rewriteBatchedStatements参数置为true, 驱动才会帮你批量执行SQL
4. 这个选项对INSERT/UPDATE/DELETE都有效
## 3. 测试
### 3.1 mybatis插入
#### 3.1.1 普通for循环
代码如下:
```java
for (Student student:studentList) {
studentMapper.insertStudent(student);
}
```
```xml
insert into student
id,
nid,
name,
address,
create_time,
update_time,
#{id},
#{nid},
#{name},
#{address},
#{createTime},
#{updateTime},
```
耗时如下:
6min插入18958条 30w条数据预计:94min
耗费时间太久了,不等测试完全插入了
#### 3.1.2 mybatis的批量插入
代码如下:
```java
private void insertByMybatisFor(List studentList) {
int batchSize = 200;
int length = studentList.size();
int num = (length + batchSize -1)/batchSize;
for (int i = 0; i < num; i++) {
int fromIndex = i * batchSize;
int toIndex = (i + 1) * batchSize < length ? (i + 1) * batchSize : length;
List partList = studentList.subList(fromIndex, toIndex);
studentMapper.insertBatch(partList);
}
}
```
```xml
insert into student (id, nid, `name`, address, create_time, update_time)
values
(#{item.id},#{item.nid},#{item.name},#{item.address},#{item.createTime},#{item.updateTime})
```
耗时如下:
57022ms 55533ms 57112ms 平均57s
#### 3.1.3 sqlsession单线程插入
代码如下:
```java
private void insertByMybatisSqlSessionSingleThread(List studentList) {
SqlSession sqlSession = null;
try {
// 打开批处理
sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH);
StudentMapper mapper = sqlSession.getMapper(StudentMapper.class);
int count = 0;
for (int i = 0, length = studentList.size(); i < length; i++) {
mapper.insertStudent(studentList.get(i));
//每200条提交一次防止内存溢出
if (i % 200 == 199) {
sqlSession.commit();
sqlSession.clearCache();
logger.info("开始插入任务" + (++count));
}
}
sqlSession.commit();
sqlSession.clearCache();
logger.info("开始插入任务" + (++count));
}catch (Exception e){
if (sqlSession != null) {
sqlSession.rollback();
}
e.printStackTrace();
}finally {
if (sqlSession != null) {
sqlSession.close();
}
}
}
```
耗时如下:
52910ms 53345ms 52688ms 平均53s
#### 3.1.4 sqlsession多线程插入
代码如下:
```java
private void insertByMybatisSqlSessionMultiThread(List studentList) {
ExecutorService executorService = Executors.newFixedThreadPool(10);
int batchSize = 200;
int length = studentList.size();
int num = (length + batchSize -1)/batchSize;
for (int i = 0; i < num; i++) {
int fromIndex = i * batchSize;
int toIndex = (i+1)*batchSize < length ? (i+1)*batchSize : length;
List newList = studentList.subList(fromIndex,toIndex);
int finalI = i;
executorService.execute(()->{
SqlSession sqlSession = null;
try {
sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
StudentMapper mapper = sqlSession.getMapper(StudentMapper.class);
for (int j = 0; j < newList.size(); j++) {
mapper.insertStudent(newList.get(j));
}
//mapper.insertBatch(newList);
sqlSession.commit();
sqlSession.clearCache();
logger.info("当前线程名称:"+Thread.currentThread().getName()+";当前执行第"+ finalI +"批任务");
}catch (Exception e){
if (sqlSession != null) {
sqlSession.rollback();
}
e.printStackTrace();
}finally {
if (sqlSession != null) {
sqlSession.close();
}
}
});
}
// 关闭线程池,不再接受新任务
executorService.shutdown();
try {
// 等待所有任务完成
// 无限期等待,直到所有任务完成
executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
logger.info("所有任务已完成");
} catch (InterruptedException e) {
// 处理中断异常
logger.error("等待过程中被中断"+e);
Thread.currentThread().interrupt(); // 重新设置中断状态
}
}
```
耗时如下:
12276ms 12722ms 12611ms 平均12.6s
#### 3.1.5 JDBC preparedstatement单线程插入
代码如下:
```java
private void insertByJdbcPrepareStatementSingleThread(List studentList) throws Exception {
Connection connection = null;
PreparedStatement statement = null;
try {
connection = dataSource.getConnection();
connection.setAutoCommit(false);
String sql = "insert into student(`nid`,`name`,`address`,`create_time` ) values(?,?,?,?) ";
statement = connection.prepareStatement(sql);
int count = 0;
for (int i = 0; i < studentList.size(); i++) {
Student student = studentList.get(i);
statement.setLong(1,student.getNid());
statement.setString(2, student.getName());
statement.setString(3, student.getAddress());
statement.setDate(4, new java.sql.Date(new Date().getTime()));
statement.addBatch();
//每200条提交一次防止内存溢出
if (i % 200 == 199) {
statement.executeBatch();
connection.commit();
statement.clearBatch();
logger.info("已完成第 " + (++count) + " 批次的插入任务"); // 在提交后增加计数器
}
}
statement.executeBatch();
connection.commit();
statement.clearBatch();
logger.info("已完成第 " + (++count) + " 批次的插入任务"); // 在提交后增加计数器
}catch (Exception e){
e.printStackTrace();
}finally {
if(connection!=null){
connection.close();
}
if(statement!=null){
statement.close();
}
}
}
```
耗时如下:
50234ms 51089ms 50134ms 平均50s
#### 3.1.6 JDBC preparedstatement多线程插入
代码如下:
```java
private void insertByJdbcPrepareStatementMultiThread(List studentList) throws Exception{
ExecutorService executorService = Executors.newFixedThreadPool(10);
int batchSize = 200;
int length = studentList.size();
int num = (length + batchSize -1)/batchSize;
for (int i = 0; i < num; i++) {
int fromIndex = i * batchSize;
int toIndex = (i+1)*batchSize < length ? (i+1)*batchSize : length;
List newList = studentList.subList(fromIndex,toIndex);
int finalI = i;
executorService.execute(()->{
Connection connection = null;
PreparedStatement statement = null;
try {
connection = dataSource.getConnection();
connection.setAutoCommit(false);
String sql = "insert into student(`nid`,`name`,`address`,`create_time` ) values(?,?,?,?) ";
statement = connection.prepareStatement(sql);
for (int j = 0; j < newList.size(); j++) {
Student student = newList.get(j);
statement.setLong(1,student.getNid());
statement.setString(2, student.getName());
statement.setString(3, student.getAddress());
statement.setDate(4, new java.sql.Date(new Date().getTime()));
statement.addBatch();
}
statement.executeBatch();
connection.commit();
statement.clearBatch();
logger.info("当前线程名称:"+Thread.currentThread().getName()+";当前执行第"+ finalI +"批任务");
}catch (Exception e){
e.printStackTrace();
}finally {
if(connection!=null){
try {
connection.close();
} catch (SQLException throwables) {
throwables.printStackTrace();
}
}
if(statement!=null){
try {
statement.close();
} catch (SQLException throwables) {
throwables.printStackTrace();
}
}
}
});
}
// 关闭线程池,不再接受新任务
executorService.shutdown();
try {
// 等待所有任务完成
// 无限期等待,直到所有任务完成
executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
logger.info("所有任务已完成");
} catch (InterruptedException e) {
// 处理中断异常
logger.error("等待过程中被中断"+e);
Thread.currentThread().interrupt(); // 重新设置中断状态
}
}
```
耗时如下:
12321ms 12585ms 12333ms 平均 12.5s
注意:该方法插入后数据顺序是乱的,和预计的顺序不一致
### 3.2 mybatis-plus插入
#### 3.2.1 普通批量插入
代码如下:
```java
private void insertByMybatisPlusBatch(List teacherList) {
this.saveBatch(teacherList);
}
```
耗时如下:
13946ms 14600ms 14132ms 平均 14s
#### 3.2.2 单线程分批批量插入
代码如下:
```java
private void insertByMybatisPlusBatchSingleThread(List teacherList) {
int batchSize = 200;
int length = teacherList.size();
int num = (length + batchSize -1)/batchSize;
for (int i = 0; i < num; i++) {
int fromIndex = i * batchSize;
int toIndex = (i + 1) * batchSize < length ? (i + 1) * batchSize : length;
List newList = teacherList.subList(fromIndex, toIndex);
this.saveBatch(newList);
logger.info("当前线程名称:"+Thread.currentThread().getName()+";当前执行第"+ i +"批任务");
}
}
```
耗时如下:
50234ms 51089ms 50134ms 平均50s
#### 3.2.3 多线程分批批量插入
代码如下:
```java
private void insertByMybatisPlusBatchMultiThread(List teacherList) {
ExecutorService executorService = Executors.newFixedThreadPool(10);
int batchSize = 200;
int length = teacherList.size();
int num = (length + batchSize -1)/batchSize;
for (int i = 0; i < num; i++) {
int fromIndex = i * batchSize;
int toIndex = (i + 1) * batchSize < length ? (i + 1) * batchSize : length;
List newList = teacherList.subList(fromIndex, toIndex);
int finalI = i;
executorService.execute(()->{
this.saveBatch(newList);
logger.info("当前线程名称:"+Thread.currentThread().getName()+";当前执行第"+ finalI +"批任务");
});
}
// 关闭线程池,不再接受新任务
executorService.shutdown();
try {
// 等待所有任务完成
// 无限期等待,直到所有任务完成
executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
logger.info("所有任务已完成");
} catch (InterruptedException e) {
// 处理中断异常
logger.error("等待过程中被中断"+e);
Thread.currentThread().interrupt(); // 重新设置中断状态
}
}
```
耗时如下:
12244ms 12598ms 12200ms 平均 12.3s
## 4. 总结
序号|分类|测试名称|时间1|时间2|时间3|平均时间|是否乱序
-|-|-|-|-|-|-|-
1|mybatis|普通for循环|94min|...|...|94min|否
2|mybatis|mybatis的批量插入|57022ms|55533ms|57112ms|57s|否
3|mybatis|sqlsession单线程插入|52910ms|53345ms|52688ms|53s|否
4|mybatis|sqlsession多线程插入|12276ms|12722ms|12611ms|12.6s|否
5|mybatis|JDBC preparedstatement单线程插入|50234ms|51089ms|50134ms|50s|否
6|mybatis|JDBC preparedstatement多线程插入|12321ms|12585ms|12333ms|12.5s|是
7|mybatis-plus|普通批量插入|13946ms|14600ms|14132ms|14s|否
8|mybatis-plus|单线程分批批量插入|49031ms|47977ms|49833ms|50s|否
9|mybatis-plus|多线程分批批量插入|12244ms|12598ms|12200ms|12.3s|否
## 5. 代码
代码已经上传