Getting Started with Apache Atlas
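I recently attended two open big-data technology events, and at both of them speakers independently recommended the Apache Atlas component, so I would like to introduce it here. What kind of tool is Apache Atlas? What are its features and uses?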
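As I described in an earlier article, it is a data governance and metadata framework for Hadoop. It is built on the Hadoop platform and integrates seamlessly with Hadoop components. It provides a web UI and a rich REST API; by default Solr 5 serves as the index backend and HBase as the storage backend. It can import metadata from different sources, including Hive and HBase (I am not yet sure about traditional relational databases).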
1. Installing Apache Atlas
The installation steps are documented on the official website.
For convenience, the steps are summarized below:
Environment:
JDK 8
Maven 3.x
Git
Python 2.7 or later
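Before starting the build it can help to confirm that the tools listed above are on the PATH. The following is a minimal sketch, not part of the official instructions: the function name `missing_tools` is made up for illustration, and the binary names (`java`, `mvn`, `git`, `python`) are assumptions that may differ on your system. It checks only that each tool is present, not its version.

```shell
# Print the name of every listed tool that is NOT found on PATH.
# The tool names passed in below are assumptions (java, mvn, git, python).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools java mvn git python)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are joined on one line.
    echo "Missing tools:" $missing
else
    echo "All build prerequisites found."
fi
```

Version checks (JDK 8, Maven 3.x, Python 2.7 or later) still have to be done by hand, for example with `java -version` and `mvn -version`.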
(1) Building Atlas
git clone /repos/asf/atlas.git atlas
cd atlas
export MAVEN_OPTS="-Xms2g -Xmx4g"
mvn clean -DskipTests install
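Note:
The server needs at least 4 GB of memory; I had to upgrade my machine's configuration several times.
There are many files to download, which takes roughly 1-2 hours, and the build may fail partway through.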
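Because the build may fail partway through, one option is to rerun it a limited number of times. The helper below is a sketch under that assumption; `retry` is a hypothetical function, not something shipped with Atlas or Maven. Maven keeps already downloaded artifacts in its local repository, so a rerun does not start the downloads from scratch.

```shell
# retry <attempts> <command...>: rerun the command until it succeeds
# or the attempt budget is used up. Returns 0 on success, 1 otherwise.
retry() {
    attempts=$1
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $n of $attempts failed" >&2
        n=$((n + 1))
    done
    return 1
}

# Usage (commented out so that sourcing this file does not start a build):
# export MAVEN_OPTS="-Xms2g -Xmx4g"
# retry 3 mvn clean -DskipTests install
```

A retry only helps with transient failures such as interrupted downloads; a failure caused by too little memory will repeat until the machine is resized.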
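(2) Packaging Atlas
(HBase and Solr are already installed on the machine)
mvn clean -DskipTests package -Pdist
(HBase and Solr are not installed; use the HBase and Solr bundled with Atlas)
mvn clean -DskipTests package -Pdist,embedded-hbase-solr
This article uses the second option.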
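The choice between the two package commands can be written down as a small helper. This is only an illustrative sketch: `atlas_profiles` is a made-up name, and the `yes`/`no` argument stands for your own answer to "are HBase and Solr already installed on this machine?". The two profile strings are the ones from the commands above.

```shell
# atlas_profiles <yes|no>: print the Maven profile list for the package step.
# "yes" means HBase and Solr are already installed on the machine.
atlas_profiles() {
    case "$1" in
        yes) echo "dist" ;;
        *)   echo "dist,embedded-hbase-solr" ;;
    esac
}

# Print the command instead of running it, so it can be reviewed first.
echo "mvn clean -DskipTests package -P$(atlas_profiles no)"
```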
(3) After packaging finishes, the following packages are generated under the root directory:
(4) Installing Atlas
tar -xzvf apache-atlas-${project.version}-
cd atlas-${project.version}
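The build currently unpacks the archive automatically, so this step can be skipped.
After the download completes, the directory structure is: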
Under atlas_home/distro/target,
apache-atlas-1.0.0-SNAPSHOT-bin is the unpacked directory:
Note: the configuration steps come next. Read the bold text first, then continue with the rest.
To start Atlas with only the default configuration, run bin/atlas_start.py from the unpacked directory.
Test:
Error:
Error 401 Full authentication is required to access this resource
HTTP ERROR 401
Problem accessing /api/atlas/admin/version. Reason:
Full authentication is required to access this resource
Cause:
The request was not authenticated. The correct command:
With that, the request succeeds.
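A sketch of an authenticated check against the endpoint named in the error message (`/api/atlas/admin/version`) could look like the following. The host and port come from the UI address used later in this article; the `admin`/`admin` fallback credentials and the helper names are assumptions for illustration, so substitute the account configured for your installation. The `-u user:password` option makes curl send HTTP basic authentication, which is what the 401 response is asking for.

```shell
# Assumptions: Atlas listens on localhost:21000 and ATLAS_USER / ATLAS_PASS
# hold valid credentials. The admin/admin fallback is a placeholder only.
ATLAS_URL="${ATLAS_URL:-http://localhost:21000}"
ATLAS_USER="${ATLAS_USER:-admin}"
ATLAS_PASS="${ATLAS_PASS:-admin}"

# Build the version endpoint URL from a base URL (a trailing slash is dropped).
atlas_version_url() {
    printf '%s/api/atlas/admin/version\n' "${1%/}"
}

# Query the endpoint with HTTP basic authentication (curl -u user:password).
atlas_version() {
    curl -s -u "$ATLAS_USER:$ATLAS_PASS" "$(atlas_version_url "$ATLAS_URL")"
}

echo "Would query: $(atlas_version_url "$ATLAS_URL")"
# atlas_version    # uncomment once Atlas is running
```

Calling the same URL without `-u` reproduces the 401 shown above.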
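In the startup above, Solr and HBase run in embedded mode; the embedded Solr listens on port 9838, unlike the default port 8983 of a standalone installation. If you need a custom configuration, in particular HBase as the storage backend for the graph repository and Solr as the indexing backend for the graph repository, read on.
(5) Configuration options: conf/atlas-env.sh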
cd /apache_atlas/atlas/distro/target/apache-atlas-1.0.0-SNAPSHOT-bin/apache-atlas-1.0.0-SNAPSHOT
bin/atlas_start.py
# The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path
#export JAVA_HOME=
# any additional java opts you want to set. This will apply to both client and server operations
#export ATLAS_OPTS=
# any additional java opts that you want to set for client only
#export ATLAS_CLIENT_OPTS=
# java heap size we want to set for the client. Default is 1024MB
#export ATLAS_CLIENT_HEAP=
# any additional opts you want to set for atlas service.
#export ATLAS_SERVER_OPTS=
# java heap size we want to set for the atlas server. Default is 1024MB
#export ATLAS_SERVER_HEAP=
# What is considered as atlas home dir. Default is the base location of the installed software
#export ATLAS_HOME_DIR=
# Where log files are stored. Default is logs directory under the base install location
#export ATLAS_LOG_DIR=
# Where pid files are stored. Default is logs directory under the base install location
#export ATLAS_PID_DIR=
# Where do you want to expand the war file. By Default it is in /server/webapp dir under the base install dir.
#export ATLAS_EXPANDED_WEBAPP_DIR=
If JAVA_HOME is not set in /etc/profile, set it in this file.
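If you are unsure what value to use for JAVA_HOME, it is usually the directory two levels above the `java` binary. The helper below is a sketch of that derivation; `java_home_from_bin` is a hypothetical name and `/usr/local/jdk1.8.0` is a made-up install path used only for illustration.

```shell
# java_home_from_bin <path/to/bin/java>: strip the trailing /bin/java
# to get the directory that JAVA_HOME should point at.
java_home_from_bin() {
    dirname "$(dirname "$1")"
}

# Hypothetical install path; prints the line to put into conf/atlas-env.sh.
echo "export JAVA_HOME=$(java_home_from_bin /usr/local/jdk1.8.0/bin/java)"
```

On Linux with GNU coreutils, the real binary path can typically be found with `readlink -f "$(command -v java)"` and passed to the function.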
Configure conf/atlas-application.properties:
# Use HBase tables
atlas.audit.hbase.tablename=apache_atlas_entity_audit
Set up the Solr collections:
cd solr/bin
./solr create -c vertex_index -d SOLR_CONF -shards #numShards -replicationFactor #replicationFactor
./solr create -c edge_index -d SOLR_CONF -shards #numShards -replicationFactor #replicationFactor
./solr create -c fulltext_index -d SOLR_CONF -shards #numShards -replicationFactor #replicationFactor
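SOLR_CONF: the directory that holds the Solr configuration files. I was unclear about this for a long time myself. In my setup it is /usr/local/solr-5.5.1
If you do not know how many shards (numShards) to create, omit the option; the default is 1. My configuration is as follows: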
cd /apache_atlas/atlas/distro/target/solr/bin
export SOLR_CONF=/usr/local/solr-5.5.1
./solr start -c -z localhost:2181 -p 8983
./solr create -c vertex_index -d $SOLR_CONF
./solr create -c edge_index -d $SOLR_CONF
./solr create -c fulltext_index -d $SOLR_CONF
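The three `solr create` calls differ only in the collection name, so they can be generated in a loop. The sketch below assumes it is run from the `solr/bin` directory and that `SOLR_CONF` points at your Solr configuration directory; the function name and the `DRY_RUN` switch are made up for illustration. With `DRY_RUN=1` (the default here) it only prints the commands so they can be reviewed first.

```shell
# Assumption: SOLR_CONF points at the Solr configuration directory and the
# script is run from solr/bin. DRY_RUN=1 prints commands without running them.
SOLR_CONF="${SOLR_CONF:-/usr/local/solr-5.5.1}"
DRY_RUN="${DRY_RUN:-1}"

# create_atlas_collections [numShards] [replicationFactor], both default to 1.
create_atlas_collections() {
    shards=${1:-1}
    replicas=${2:-1}
    for name in vertex_index edge_index fulltext_index; do
        if [ "$DRY_RUN" = "1" ]; then
            echo ./solr create -c "$name" -d "$SOLR_CONF" -shards "$shards" -replicationFactor "$replicas"
        else
            ./solr create -c "$name" -d "$SOLR_CONF" -shards "$shards" -replicationFactor "$replicas"
        fi
    done
}

create_atlas_collections 1 1
```

Set `DRY_RUN=0` to actually create the collections once the printed commands look right.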
After the Solr collections are set up, configure the following in atlas-application.properties:
#
t=localhost:2181
Start HBase:
cd hbase/bin
./start-hbase.sh
Start Atlas:
bin/atlas_start.py
Atlas UI:
localhost:21000/
Error 1:
java.io.FileNotFoundException: /apache_atlas/atlas/distro/target/server/webapp/atlas.war (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
ls.jar.Main.run(Main.java:307)
ls.jar.Main.main(Main.java:1288)
The Server is no longer running with pid 6353
configured for local hbase.
hbase started.
configured for local solr.
solr started.
setting up
starting atlas on host localhost
starting atlas on port 21000
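This error is caused by starting Atlas from the wrong directory. I could not find a fix for it online and only later noticed the path problem. In my case I had been starting Atlas from /apache_atlas/atlas/distro/target/
The correct directory to start from is: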
/apache_atlas/atlas/distro/target/apache-atlas-1.0.0-SNAPSHOT-bin/apache-atlas-1.0.0-SNAPSHOT/
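To avoid repeating this mistake, a small guard can confirm that a directory looks like the unpacked Atlas directory before anything is started. This is a sketch based only on the two paths visible in this article (`bin/atlas_start.py` and the `server/webapp/atlas.war` file from the stack trace); `is_atlas_home` and `ATLAS_DIR` are hypothetical names.

```shell
# is_atlas_home <dir>: succeed only if <dir> contains both the start script
# and the web application archive that the stack trace complains about.
is_atlas_home() {
    [ -f "$1/bin/atlas_start.py" ] && [ -f "$1/server/webapp/atlas.war" ]
}

# Defaults to the current directory; set ATLAS_DIR to check another one.
ATLAS_DIR="${ATLAS_DIR:-$PWD}"
if is_atlas_home "$ATLAS_DIR"; then
    echo "OK: start Atlas with: cd $ATLAS_DIR && bin/atlas_start.py"
else
    echo "Not an Atlas home: $ATLAS_DIR (bin/atlas_start.py or server/webapp/atlas.war missing)"
fi
```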
Error 2:
The error log under /apache_atlas/atlas/distro/target/logs contains:
ERROR:
Collection 'vertex_index' already exists!
Checked collection existence using Collections API command:
localhost:9838/solr/admin/collections?action=list
This is a conflict caused by a duplicate collection name. Run:
jps
Check whether more than one jar process is running; the jar process is the Solr process.
I hope others can avoid making the same mistake.
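The check described above can be automated by counting the `jar` entries in the `jps` output, where each line has the form `<pid> <name>`. The following is a minimal sketch; `count_jar_processes` is a made-up helper, and the sample input uses invented PIDs so that the block runs without a JDK installed.

```shell
# count_jar_processes: read jps output on stdin ("<pid> <name>" per line)
# and print how many entries are named "jar".
count_jar_processes() {
    awk '$2 == "jar" { n++ } END { print n + 0 }'
}

# Sample jps output with made-up PIDs; prints 2.
printf '101 jar\n102 jar\n103 HMaster\n104 Jps\n' | count_jar_processes

# On a real host:
# jps | count_jar_processes
```

A result greater than 1 suggests that a second Solr instance is still running and should be stopped before Atlas is started again.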
Error 3: