Syncing Data from Oracle to MySQL with DataX: Installation and Configuration
DataX
DataX is an offline data synchronization tool/platform widely used inside Alibaba Group. It implements efficient data synchronization between heterogeneous data sources, including MySQL, SQL Server, Oracle, PostgreSQL, HDFS, Hive, HBase, OTS, and ODPS.
Features
As a data synchronization framework, DataX abstracts synchronization between different data sources into Reader plugins that read data from a source and Writer plugins that write data to a target, so in principle the framework can support synchronization between arbitrary types of data sources. The plugin system also forms an ecosystem: every newly added data source can immediately exchange data with all existing ones.
System Requirements
Linux
JDK (1.8 or above; 1.8 recommended)
Python (Python 2.6.x recommended)
Apache Maven 3.x (needed to compile DataX)
Quick Start
Tool Deployment
Method 1: Download the pre-built DataX toolkit, extract it to a local directory, go into the bin directory, and run a sync job:
$ cd  {YOUR_DATAX_HOME}/bin
$ python datax.py {YOUR_JOB.json}
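If you only want a quick sanity check of this binary installation, the release package also ships with a small self-test stream job; assuming it sits at job/job.json inside the extracted directory, it can be run with:
$ python datax.py ../job/job.json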
Method 2: Download the DataX source code and compile it yourself:
① Install the JDK
tar xvf jdk-8u151-linux-x64.tar.gz        # archive name is an assumption; use the JDK 8u151 tarball you downloaded
mv jdk1.8.0_151 /usr/local/               # matches JAVA_HOME below
vim /etc/profile.d/jdk.sh
export JAVA_HOME=/usr/local/jdk1.8.0_151
export JAVA_BIN=$JAVA_HOME/bin
export PATH=$PATH:$JAVA_BIN
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
source /etc/profile.d/jdk.sh
Check that the installation succeeded:
[root@oracle ~]# java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
② Check the Python version; install Python yourself if it does not meet the requirement.
[root@oracle ~]# python -V
Python 2.6.6
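One caveat worth noting here: the stock datax.py script is written for Python 2 (hence the recommendation above), so if python on your machine resolves to Python 3 the job commands later in this article may fail. A simple workaround, assuming a python2 interpreter is installed, is to invoke it explicitly:
python2 datax.py {YOUR_JOB.json}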
③ Install Maven
Download URL:
Install and configure it:
tar xvf apache-maven-3.6.1-bin.tar.gz     # archive name assumed from the version shown below
mv apache-maven-3.6.1 ../maven            # i.e. /usr/local/maven, matching M2_HOME below
vim /etc/profile
export M2_HOME=/usr/local/maven
export PATH=${M2_HOME}/bin:/u01/mysql/bin:${PATH}
source /etc/profile
Verify that Maven is installed correctly:
[root@oracle src]# mvn -v
Apache Maven 3.6.1 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: /usr/local/maven
Java version: 1.8.0_151, vendor: Oracle Corporation
Java home: /usr/local/jdk1.8.0_151/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-862.el7.x86_64", arch: "amd64", family: "unix"
With the system requirements in place, the next step is to install DataX from source. Choose either of the two download options.
Download URL:
You can either download the zip archive of the source, or clone the repository with git clone git@github.com:alibaba/DataX.git. The steps below use the downloaded zip archive.
Build from source:
unzip DataX-master.zip
mv DataX-master ../
cd ../DataX-master
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
This build takes a very long time; be patient. When packaging succeeds, the output ends with:
[INFO] BUILD SUCCESS
[INFO]-----------------------------------------------------------------
[INFO] Total time: 08:12min
[INFO] Finished at: 2015-12-13T16:26:48+08:00
[INFO] Final Memory: 133M/960M
[INFO]-----------------------------------------------------------------
After a successful build, the packaged DataX lives in {DataX_source_code_home}/target/datax/datax/, with the following structure:
cd /usr/local/DataX-master
[root@oracle DataX-master]# ls -a ./target/datax/datax/
.  ..  bin  conf  job  lib  log  log_perf  plugin  script  tmp
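Before writing a job configuration, it can help to confirm that the plugins this article relies on were actually built. The build output keeps readers and writers in separate directories under plugin/; the names below are assumed to match the plugin names used in the job file later on:
ls ./target/datax/datax/plugin/reader | grep -i oracle    # expect oraclereader
ls ./target/datax/datax/plugin/writer | grep -i mysql     # expect mysqlwriter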
Configuration example: read data with streamreader and print it to the console.
Step 1: Create the job configuration file (JSON format). A configuration template can be viewed with: python datax.py -r {YOUR_READER} -w {YOUR_WRITER}
cd /usr/local/DataX-master/target/datax/datax/bin
./datax.py -r streamreader -w streamwriter
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
Please refer to the streamreader document:
https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md
Please refer to the streamwriter document:
https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md
Please save the following configuration as a json file and use
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [],
                        "sliceRecordCount": ""
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
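The same command can print a template for the plugins used in the rest of this article, which is a convenient starting point for the Oracle-to-MySQL job below:
./datax.py -r oraclereader -w mysqlwriter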
Based on the template, configure the JSON for this job as follows:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "column": ["zybh","xmbh","jsbh","xmmc","sfje","fylx","bz","sjc","zllx"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:oracle:thin:@192.168.11.91:1521:orcl"],
                                "table": ["JDZLJCXM"]
                            }
                        ],
                        "password": "jgzdwffz",
                        "username": "bjxxjgxt"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["zybh","xmbh","jsbh","xmmc","sfje","fylx","bz","sjc","zllx"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.11.75:3336/prison_practical_platform",
                                "table": ["JDZLJCXM"]
                            }
                        ],
                        "password": "root",
                        "username": "root"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "5"
            }
        }
    }
}
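One prerequisite before running the job: mysqlwriter inserts into an existing table and does not create it, so the target table must already exist in MySQL. A minimal sketch is shown below; the column types are placeholders and should be adjusted to match the Oracle source table:
mysql -h 192.168.11.75 -P 3336 -u root -p prison_practical_platform -e "
CREATE TABLE IF NOT EXISTS JDZLJCXM (
  zybh VARCHAR(64),
  xmbh VARCHAR(64),
  jsbh VARCHAR(64),
  xmmc VARCHAR(255),
  sfje DECIMAL(12,2),
  fylx VARCHAR(64),
  bz   VARCHAR(255),
  sjc  VARCHAR(64),
  zllx VARCHAR(64)
);"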
Step 2: Start DataX.
cd /usr/local/DataX-master/target/datax/datax/bin
./datax.py ../job/oracle11mysql8.json
When the synchronization finishes, the log looks like this:
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2019-05-24 11:59:06.065 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2019-05-24 11:59:06.075 [main] INFO  Engine - the machine info  =>
osInfo:    Oracle Corporation 1.8 25.161-b12
jvmInfo:    Linux amd64 3.10.0-862.el7.x86_64
cpu num:    1
totalPhysicalMemory:    -0.00G
freePhysicalMemory:    -0.00G
maxFileDescriptorCount:    -1
currentOpenFileDescriptorCount:    -1
GC Names    [Copy, MarkSweepCompact]
MEMORY_NAME                    | allocation_size                | init_size
Eden Space                     | 273.06MB                       | 273.06MB
Code Cache                     | 240.00MB                       | 2.44MB
Survivor Space                 | 34.13MB                        | 34.13MB
Compressed Class Space         | 1,024.00MB                     | 0.00MB
Metaspace                      | -0.00MB                        | 0.00MB
Tenured Gen                    | 682.69MB                       | 682.69MB
2019-05-24 11:59:06.093 [main] INFO  Engine -
{
    "content":[
        {
            "reader":{
                "name":"oraclereader",
                "parameter":{
                    "column":[
                        "zybh",
                        "xmbh",
                        "jsbh",
                        "xmmc",
                        "sfje",
                        "fylx",
                        "bz",
                        "sjc",
                        "zllx"
                    ],
                    "connection":[
                        {
                            "jdbcUrl":[
                                "jdbc:oracle:thin:@192.168.11.91:1521:orcl"
                            ],
                            "table":[
                                "JDZLJCXM"
                            ]
                        }
                    ],
                    "password":"********",
                    "username":"bjxxjgxt"
                }
            },
            "writer":{
                "name":"mysqlwriter",
                "parameter":{
                    "column":[
                        "zybh",
                        "xmbh",
                        "jsbh",
                        "xmmc",
                        "sfje",
                        "fylx",
                        "bz",
                        "sjc",
                        "zllx"
                    ],
                    "connection":[
                        {
                            "jdbcUrl":"jdbc:mysql://192.168.11.75:3336/prison_practical_platform",
                            "table":[
                                "JDZLJCXM"
                            ]
                        }
                    ],
                    "password":"****",
                    "username":"root"
                }
            }
        }
    ],
    "setting":{
        "speed":{
            "channel":"5"
        }
    }
}
2019-05-24 11:59:06.111 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2019-05-24 11:59:06.121 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2019-05-24 11:59:06.121 [main] INFO  JobContainer - DataX jobContainer starts job.
2019-05-24 11:59:06.122 [main] INFO  JobContainer - Set jobId = 0
2019-05-24 11:59:06.488 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:oracle:thin:@192.168.11.91:1521:orcl.
2019-05-24 11:59:06.605 [job-0] INFO  OriginalConfPretreatmentUtil - table:[JDZLJCXM] has columns:[ZYBH,XMBH,JSBH,XMMC,SFJE,FYLX,BZ,SJC,ZLLX].
2019-05-24 11:59:06.952 [job-0] INFO  OriginalConfPretreatmentUtil - table:[JDZLJCXM] all columns:[
ZYBH,XMBH,JSBH,XMMC,SFJE,FYLX,BZ,SJC,ZLLX
].
2019-05-24 09:48:44.768 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
INSERT INTO %s (zybh,xmbh,jsbh,xmmc,sfje,fylx,bz,sjc,zllx) VALUES(?,?,?,?,?,?,?,?,?)
], which jdbcUrl like:[jdbc:mysql://192.168.11.75:3336/prison_practical_platform?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2019-05-24 09:48:44.768 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] do prepare work .
2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - Job set Channel-Number to 5 channels.
2019-05-24 09:48:44.772 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] splits to [1] tasks.
2019-05-24 09:48:44.772 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2019-05-24 09:48:44.820 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2019-05-24 09:48:44.830 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2019-05-24 09:48:44.832 [job-0] INFO  JobContainer - Running by standalone Mode.
2019-05-24 09:48:44.866 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2019-05-24 09:48:44.876 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2019-05-24 09:48:44.877 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2019-05-24 09:48:44.898 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2019-05-24 09:48:44.901 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select zybh,xmbh,jsbh,xmmc,sfje,fylx,bz,sjc,zllx from JDZLJCXM
] jdbcUrl:[jdbc:oracle:thin:@192.168.11.91:1521:orcl].
2019-05-24 09:48:45.050 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select zybh,xmbh,jsbh,xmmc,sfje,fylx,bz,sjc,zllx from JDZLJCXM
] jdbcUrl:[jdbc:oracle:thin:@192.168.11.91:1521:orcl].
2019-05-24 09:48:46.419 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[1538]ms
2019-05-24 09:48:46.420 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2019-05-24 09:48:54.873 [job-0] INFO  StandAloneJobContainerCommunicator - Total 1554 records, 79079 bytes | Speed 7.72KB/s, 155 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.020s |  All Task WaitReaderTime 0.124s | P
2019-05-24 09:48:54.874 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2019-05-24 09:48:54.874 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2019-05-24 09:48:54.874 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] do post work.
2019-05-24 09:48:54.874 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2019-05-24 09:48:54.875 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /usr/local/DataX-master/target/datax/datax/hook
2019-05-24 09:48:54.875 [job-0] INFO  JobContainer -
[total cpu info] =>
averageCpu                    | maxDeltaCpu                    | minDeltaCpu
-1.00%                        | -1.00%                        | -1.00%
[total gc info] =>
NAME                | totalGCCount      | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        |
maxDeltaGCTime    | minDeltaGCTime
Copy                | 0                  | 0                  | 0                  | 0.000s            | 0.000s            | 0.000s
MarkSweepCompact    | 0                  | 0                  | 0                  | 0.000s            | 0.000s            | 0.000s
2019-05-24 09:48:54.875 [job-0] INFO  JobContainer - PerfTrace not enable!
2019-05-24 09:48:54.876 [job-0] INFO  StandAloneJobContainerCommunicator - Total 1554 records, 79079 bytes | Speed 7.72KB/s, 155 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.020s |  All Task WaitReaderTime 0.124s | P
2019-05-24 09:48:54.876 [job-0] INFO  JobContainer -
任务启动时刻                    : 2019-05-24 09:48:43
任务结束时刻                    : 2019-05-24 09:48:54
任务总计耗时                    :                11s
任务平均流量                    :            7.72KB/s
记录写入速度                    :            155rec/s
读出记录总数                    :                1554
读写失败总数                    :                  0
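The summary shows 1554 records read and written with zero failures. As an optional follow-up check, the row count on the MySQL side can be compared against the source (connection details reused from the writer configuration above):
mysql -h 192.168.11.75 -P 3336 -u root -p -e "SELECT COUNT(*) FROM prison_practical_platform.JDZLJCXM;"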
Problems encountered during the build and configuration; the errors were as follows:
[ERROR] Failed to execute goal on project otsstreamreader: Could not resolve dependencies for project com.alibaba.datax:otsstreamreader:jar:1.0.0-SNAPSHOT: Could not find artifact
com.aliyun.openservices:tablestore-streamclient:jar:1.0.0-SNAPSHOT ->[Help 1]
This error is caused by a snapshot version mismatch. Since the OTS plugins are rarely needed, simply comment out <module>otsreader</module> and <module>otsstreamreader</module> in pom.xml (wrap them in <!--  -->).
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-jar-plugin:2.4:jar (default-jar) on project ocswriter: Error assembling JAR: /Users/FengZhen/Desktop/Hadoop/DataX/源码/DataX/l isn't a file. -> [Help 1]
Comment out <module>ocswriter</module> in the same way, then re-run the build.
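If you prefer making these edits from the command line, something along the following lines has the same effect (GNU sed; this assumes each <module> entry sits on its own line in pom.xml). Re-run the mvn package command afterwards:
cd /usr/local/DataX-master
sed -i 's|<module>otsreader</module>|<!-- & -->|' pom.xml
sed -i 's|<module>otsstreamreader</module>|<!-- & -->|' pom.xml
sed -i 's|<module>ocswriter</module>|<!-- & -->|' pom.xml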