# csx-bsf-archive

**Repository Path**: robin-cloud/csx-bsf-archive

## Basic Information

- **Project Name**: csx-bsf-archive
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2021-02-10
- **Last Updated**: 2023-11-16

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Archiving Program (csx-bsf-archive)

## Background

An ETL tool developed to deal with oversized tables in business-system databases. It archives data using time-based table splitting and hot/cold table separation.

## Code Structure

```
/                          # Root directory
../csx-bsf-archive         # Archiving program
../csx-bsf-archive-demo    # Demo program
../pom.xml                 # Parent POM file
../readme.md
```

## Usage

1. In the scheduled-task project, add the dependency

```xml
<dependency>
    <artifactId>csx-bsf-archive</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <groupId>com.yh.csx</groupId>
</dependency>
```

2. Data source configuration in the configuration file application.yml

```properties
## Scheduler: either xxljob or spring can be used
## If spring is used, a cron expression must be configured for each table
etl.config.scheduler.name=xxljob

etl.config.datasource.basicDS.type = com.alibaba.druid.pool.DruidDataSource
etl.config.datasource.basicDS.driver-class-name = com.mysql.cj.jdbc.Driver
etl.config.datasource.basicDS.url = jdbc:mysql://10.252.193.28:3306/test?useUnicode=true&characterEncoding=UTF8&useSSL=false&allowMultiQueries=true&autoReconnect=true
etl.config.datasource.basicDS.username =
etl.config.datasource.basicDS.password =
etl.config.datasource.basicDS.filters = stat
etl.config.datasource.basicDS.max-active = 30
etl.config.datasource.basicDS.initial-size = 5
etl.config.datasource.basicDS.max-wait = 60000
etl.config.datasource.basicDS.time-between-eviction-runs-millis = 60000
etl.config.datasource.basicDS.min-evictable-idle-time-millis = 300000
etl.config.datasource.basicDS.validation-query = SELECT 'x'
etl.config.datasource.basicDS.test-while-idle = true
etl.config.datasource.basicDS.test-on-borrow = false
etl.config.datasource.basicDS.test-on-return = false
etl.config.datasource.basicDS.pool-prepared-statements = true
etl.config.datasource.basicDS.max-open-prepared-statements = 20

etl.config.datasource.main0.type = com.alibaba.druid.pool.DruidDataSource
etl.config.datasource.main0.driver-class-name = com.mysql.cj.jdbc.Driver
etl.config.datasource.main0.url = jdbc:mysql://10.252.193.28:3306/test?useUnicode=true&characterEncoding=UTF8&useSSL=false&allowMultiQueries=true&autoReconnect=true
etl.config.datasource.main0.username =
etl.config.datasource.main0.password =
etl.config.datasource.main0.filters = stat
etl.config.datasource.main0.max-active = 30
etl.config.datasource.main0.initial-size = 5
etl.config.datasource.main0.max-wait = 60000
etl.config.datasource.main0.time-between-eviction-runs-millis = 60000
etl.config.datasource.main0.min-evictable-idle-time-millis = 300000
etl.config.datasource.main0.validation-query = SELECT 'x'
etl.config.datasource.main0.test-while-idle = true
etl.config.datasource.main0.test-on-borrow = false
etl.config.datasource.main0.test-on-return = false
etl.config.datasource.main0.pool-prepared-statements = true
etl.config.datasource.main0.max-open-prepared-statements = 20
```

3. Table configuration

In the etl folder under the resource directory, add one table configuration file ({table name}.yml), for example md_product_region.yml. To synchronize several tables, create one configuration file per table.

```
dataSourceKey: basicDS                       # Source data source: the key of a data source configured in step 2
groupId: csx_basic_data_3-md_product_region  # Key
outerSourceKey: main0                        # Target data source: the key of a data source configured in step 2
concurrent: false                            # Whether concurrent processing is enabled
cron: "*/20 * * * * ?"
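# The cron entry above is required only when the scheduler is spring (which does not support distributed scheduling). The default is the distributed scheduler xxljob.
dbMapping:
  database: csx_basic_data_3     # Source database name
  table: md_product_region       # Source table name (if the source is already split into several tables with the same structure, list the physical table names separated by commas and leave targetTable empty)
  relationLayer: 0               # Relation level: the master table gets the smaller number, related child tables get larger numbers; child tables are archived before the master table
  targetTable: md_product_region # Target table name
  targetPk:
    id: id                       ## Primary key field
  mapAll: true                   ## Whether to map all source fields to the target fields
  keepPeriod: 30day              ## How long data is kept in the source table, e.g. 1day, 1year, 2month
  etlPeriod: 1day                ## Length of the time span processed per run, e.g. 1day, 2hour
  splitType: monthly             ## Table-splitting strategy: solid, monthly, yearly, quartly
  splitSuffixFormat: YYYY_MM     ## Suffix rule for split tables, e.g. produces the table md_product_region_2020_03
  targetColumns:
    id: id
    product_code: product_code
    region_code: region_code
    regionalized_trade_names: regionalized_trade_names
    delivery_type: delivery_type
    big_piece_number: big_piece_number
    small_piece_number: small_piece_number
    break_number: break_number
    must_sale_flag: must_sale_flag
    origin_descript: origin_descript
    product_attribute: product_attribute
    package_num: package_num
    create_time: create_time
    update_time: update_time
    create_by: create_by
    update_by: update_by
  etlCondition: " where update_time<={} and update_time>={}"  ## Processing condition
  commitBatch: 300               # Batch commit size
```

4. Configure XXLJOB

A. Create an executor (with the same name as the executor of the existing task).

B. Configure the scheduler bean: etlHandler

5. For BSF-related usage, see https://gitee.com/yhcsx/csx-bsf-all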
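
As a rough illustration of how the step 3 settings `keepPeriod`, `etlPeriod`, `etlCondition` and `splitSuffixFormat` might fit together, the Java sketch below parses a period string, derives one archive window, fills the two `{}` placeholders, and builds a monthly table name. The class `ArchiveSketch` and all of its methods are hypothetical and not part of csx-bsf-archive. The window semantics (the window ends `keepPeriod` before the current time and spans one `etlPeriod`) and the quoting of timestamps are assumptions, not confirmed behavior of the tool.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of csx-bsf-archive: illustrates the step 3 settings. */
class ArchiveSketch {

    // Accepts values such as "30day", "2hour", "2month", "1year".
    private static final Pattern PERIOD = Pattern.compile("(\\d+)(hour|day|month|year)");

    /** Subtracts a period string from a timestamp. */
    static LocalDateTime minusPeriod(LocalDateTime from, String period) {
        Matcher m = PERIOD.matcher(period.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported period: " + period);
        }
        long amount = Long.parseLong(m.group(1));
        switch (m.group(2)) {
            case "hour":  return from.minusHours(amount);
            case "day":   return from.minusDays(amount);
            case "month": return from.minusMonths(amount);
            default:      return from.minusYears(amount);
        }
    }

    /** Assumed window: ends keepPeriod before "now" and spans one etlPeriod. Returns {start, end}. */
    static LocalDateTime[] window(LocalDateTime now, String keepPeriod, String etlPeriod) {
        LocalDateTime end = minusPeriod(now, keepPeriod);
        LocalDateTime start = minusPeriod(end, etlPeriod);
        return new LocalDateTime[] {start, end};
    }

    /** Fills the two "{}" placeholders in order: upper bound first, then lower bound. */
    static String fillCondition(String etlCondition, LocalDateTime start, LocalDateTime end) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        return etlCondition
                .replaceFirst("\\{\\}", "'" + f.format(end) + "'")
                .replaceFirst("\\{\\}", "'" + f.format(start) + "'");
    }

    /** Target table name for splitType=monthly with splitSuffixFormat=YYYY_MM. */
    static String monthlyTable(String targetTable, LocalDateTime rowTime) {
        // java.time writes the calendar year as "yyyy"; "YYYY" would be the week-based year.
        return targetTable + "_" + DateTimeFormatter.ofPattern("yyyy_MM").format(rowTime);
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2020, 4, 15, 0, 0);
        LocalDateTime[] w = window(now, "30day", "1day");
        System.out.println(fillCondition(" where update_time<={} and update_time>={}", w[0], w[1]));
        System.out.println(monthlyTable("md_product_region", w[0]));
    }
}
```

With `now` fixed at 2020-04-15 00:00, the sketch selects rows updated between 2020-03-15 and 2020-03-16 and routes them to `md_product_region_2020_03`, which matches the table name given as an example for `splitSuffixFormat`.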