Natural Language Processing (NLP) Study Notes, Part 2: NLP in Practice with the Open-Source Tools TensorFlow and Jiagu
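Preface:
For NLP tooling, some people recommend spaCy and others recommend TensorFlow.
Jiagu's Chinese word segmentation is based on deep learning, so it is fairly up to date. Word segmentation methods generally fall into three kinds: dictionary-based, statistical, and deep-learning-based.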
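To make the first of those three kinds concrete, here is a minimal sketch of dictionary-based segmentation using forward maximum matching. It is not how Jiagu works internally (Jiagu uses a deep learning model); the function name and the toy vocabulary are made up for illustration.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy dictionary-based segmentation (forward maximum matching)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink until the dictionary matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop cannot stall.
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words


# Toy vocabulary, invented for this example only.
toy_vocab = {"自然", "语言", "自然语言", "处理", "学习", "笔记"}
print(forward_max_match("自然语言处理学习笔记", toy_vocab))
```

With this vocabulary the longest entry wins at each position, so "自然语言" is kept as one word rather than being split into "自然" and "语言". Statistical and deep learning segmenters exist largely to handle the ambiguous and out-of-vocabulary cases that this greedy rule gets wrong.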
Note: a little Python knowledge is needed; review it on your own.
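1. Environment setup
After a lot of trial and error, the requirements come down to this: TensorFlow needs Python 3.5 or later (Python 3.7.3 recommended), TensorFlow itself should be at least 1.6 (1.14 recommended), and the OS C runtime library, glibc, must be version 2.23 or later.
If your environment already meets these requirements, skip this step.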
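A quick way to see whether a machine meets the Python and glibc minimums above is a small script like the following sketch. The function names are hypothetical; it relies only on the standard library's `sys.version_info` and `platform.libc_ver()`, and it assumes a glibc-based Linux (on other systems the glibc check is simply skipped).

```python
import platform
import sys


def version_tuple(text):
    """Turn '2.23' into (2, 23); non-numeric parts are ignored."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def check_environment(min_python=(3, 5), min_glibc=(2, 23)):
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            "Python %d.%d is older than %d.%d" % (sys.version_info[:2] + min_python)
        )
    lib, version = platform.libc_ver()
    # On glibc-based Linux this is e.g. ('glibc', '2.17'); elsewhere it may be empty.
    if lib == "glibc" and version_tuple(version) < min_glibc:
        problems.append(
            "glibc %s is older than %s" % (version, ".".join(map(str, min_glibc)))
        )
    return problems


for line in check_environment() or ["Python and glibc meet the stated minimums"]:
    print(line)
```

The TensorFlow version is not checked here because TensorFlow may not be installed yet at this point.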
1) Windows:
Python IDE: PyCharm Community Edition
Online Python tool: Runoob online tools (菜鸟工具)
Anaconda: a management tool that bundles the Python toolchain
2) Linux:
2.1 Download
Python 3.7 source package:
2.2 Extract:
tar -xvJf Python-3.7.
2.3 Install the dependency packages:
yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gcc make
yum install libffi-devel -y (without it, you will later hit ModuleNotFoundError: No module named '_ctypes')
2.4 Compile and install:
./configure --prefix=/usr/local/python3
make && make install
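Because a missing -devel package only shows up later as a failed import, it can help to check the freshly built interpreter right away. The sketch below tries to import the standard-library modules that correspond to the packages from step 2.3; the list and the function name are assumptions made for illustration.

```python
import importlib

# Standard-library modules that depend on the -devel packages from step 2.3
# (zlib, bzip2, openssl, ncurses, sqlite, readline, tk, libffi).
OPTIONAL_MODULES = [
    "zlib", "bz2", "ssl", "curses", "sqlite3", "readline", "tkinter", "_ctypes",
]


def missing_modules(names=OPTIONAL_MODULES):
    """Return the names that fail to import in the running interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


print("missing:", missing_modules() or "none")
```

Run it with the new interpreter, for example /usr/local/python3/bin/python3. If a module is reported missing, install the matching -devel package and repeat step 2.4.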
2.5 Check the result and set up Python 2 and Python 3 side by side:
[]# python2 -V
Python 2.7.5
[]# python -V
Python 2.7.5
[]# ln -s /usr/local/python3/bin/python3 /usr/bin/python
ln: failed to create symbolic link '/usr/bin/python': File exists
[]# mv /usr/bin/python /usr/bin/python22
[]# ln -s /usr/local/python3/bin/python3 /usr/bin/python
[]# python -V
Python 3.7.3
[]# ll python*
lrwxrwxrwx. 1 root root 30 Jul 29 09:46 python -> /usr/local/python3/bin/python3
lrwxrwxrwx. 1 root root 9 Dec 13 2017 python2 -> python2.7
lrwxrwxrwx. 1 root root 7 Dec 13 2017 python22 -> python2
-rwxr-xr-x. 1 root root 7136 Aug 4 2017 python2.7
-rwxr-xr-x. 1 root root 1835 Aug 4 2017 python2.7-config
lrwxrwxrwx. 1 root root 16 Mar 8 2018 python2-config -> python2.7-config
lrwxrwxrwx. 1 root root 14 Mar 8 2018 python-config -> python2-config
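After relinking, it is easy to lose track of which interpreter each command name reaches. The following sketch (the helper name is hypothetical) resolves the names through PATH and follows symbolic links, much like reading the listing above. One caveat worth checking on CentOS-style systems: yum is written for the system Python 2, so its scripts may need their first line pointed at python2 once /usr/bin/python means Python 3.

```python
import os
import shutil
import sys


def resolve_command(name):
    """Return (path found on PATH, fully resolved path), or (None, None)."""
    found = shutil.which(name)
    if found is None:
        return None, None
    # realpath follows the whole chain of symbolic links to the real file.
    return found, os.path.realpath(found)


for command in ("python", "python2", "python3"):
    found, target = resolve_command(command)
    print(command, "->", target if found else "not on PATH")
print("running interpreter:", sys.executable)
```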
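2. Install TensorFlow and Jiagu
pip install tensorflow
Or, to install into a specific directory:
pip install --target=e:\tensorflow tensorflow
Or, with conda, into a new environment named tf:
conda create -n tf tensorflow
pip install jiagu
Notes:
1. On Linux, if the pip command is not found, create the link first: ln -s /usr/local/python3/bin/pip3 /usr/bin/pip
2. On Linux, installing Jiagu pulls in TensorFlow automatically as a dependency, so you can install Jiagu directly.
3. The default overseas package index downloads very slowly; a mirror inside China is much faster.
4. To uninstall TensorFlow: pip uninstall tensorflow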
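Note 3 can be applied with pip's `-i` (index URL) option. The sketch below only builds and prints the command. The mirror URL is an assumption (the Tsinghua TUNA PyPI mirror, one commonly used option, not one named in these notes), and the function name is made up for illustration.

```python
import sys

# Assumption: the Tsinghua TUNA mirror; substitute whichever mirror you prefer.
MIRROR_URL = "https://pypi.tuna.tsinghua.edu.cn/simple"


def pip_install_command(package, index_url=MIRROR_URL):
    """Build a pip command that uses the running interpreter and a custom index."""
    # "python -m pip" guarantees pip belongs to the interpreter that runs this script.
    return [sys.executable, "-m", "pip", "install", "-i", index_url, package]


print(" ".join(pip_install_command("jiagu")))
```

To actually run the installation, the returned list can be handed to `subprocess.run(..., check=True)`. A pinned requirement such as `tensorflow==1.14.0` can be passed as the package name in the same way.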
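3. Chinese word segmentation demo
What it does: the user enters a passage of text, and the script runs word segmentation, keyword extraction, text summarization, and more in one go.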
vi nlpdemo.py
# -*- coding: UTF-8 -*-
import jiagu
# Read the text from the user
text = input("Enter the text to segment: ")
words = jiagu.seg(text) # word segmentation; the model argument selects the mode: omit it for the default, or use mmseg for the mmseg algorithm
print("---------------------Segmentation----------------------")
print(words)
print("---------------------POS tagging----------------------")
pos = jiagu.pos(words) # part-of-speech tagging
print(pos)
print("----------------------Named entities----------------------")
ner = jiagu.ner(text) # named entity recognition
print(ner)
print("----------------------Keywords----------------------")
keywords = jiagu.keywords(text, 5) # top 5 keywords
print(keywords)
print("----------------------Knowledge extraction----------------------")
knowledge = jiagu.knowledge(text) # knowledge extraction
print(knowledge)
print("----------------------Summary----------------------")
summarize = jiagu.summarize(text, 1) # one-sentence summary
print(summarize)
print("----------------------Knowledge graph----------------------")
knowledge = jiagu.knowledge(text) # same call as the knowledge extraction step above
print(knowledge)
python nlpdemo.py
Screenshot of a sample run:
-----------------------------------------------------------------------------------
FAQ:
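1. Problem 1: on Linux, startup fails with `CXXABI_1.3.8' not found:
ImportError: /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /usr/local/python3/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so)
Failed to load the native TensorFlow runtime.
Solution:
First check: strings /usr/lib64/libstdc++.so.6 | grep CXXABI
The output should include CXXABI_1.3.8. If it does not, the installed libstdc++ is too old (the CXXABI versions are defined by libstdc++, which ships with GCC), and in practice a newer GLIBC has to be compiled and installed as well; see Problem 2.
Note: copying a libstdc++.so.6 from another machine may make this error go away, but the next error then appears: version `GLIBC_2.23' not found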
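The `strings ... | grep` check can also be done from Python, which is handy when comparing several machines. This is a rough equivalent, not a replacement for the command above: it scans the raw bytes of a library for version tags with a given prefix. The function names are hypothetical.

```python
import os
import re


def version_tags(path, prefix):
    """Collect strings such as CXXABI_1.3.8 embedded in a binary file."""
    pattern = re.compile(re.escape(prefix.encode()) + rb"_[0-9]+(?:\.[0-9]+)*")
    with open(path, "rb") as handle:
        data = handle.read()
    # A set removes duplicates; the result is sorted as plain text.
    return sorted({match.decode() for match in pattern.findall(data)})


def has_tag(path, tag):
    """True when the exact tag, e.g. 'CXXABI_1.3.8', occurs in the file."""
    prefix = tag.split("_", 1)[0]
    return tag in version_tags(path, prefix)


library = "/usr/lib64/libstdc++.so.6"
if os.path.exists(library):
    print("CXXABI_1.3.8 present:", has_tag(library, "CXXABI_1.3.8"))
else:
    print(library, "not found on this system")
```

The same helper works for the GLIBC check in Problem 2 by passing the prefix GLIBC and the path of the C library.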
2. Problem 2: ImportError: /lib64/libm.so.6: version `GLIBC_2.23' not found
(required by /usr/local/python3/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so)
Solution:
First check which versions the installed C library provides: strings /lib64/libc.so.6 | grep GLIBC_
Installation steps (run from the unpacked glibc source directory):
mkdir build
cd build
../configure --prefix=/usr --disable-profile --enable-add-ons --with-headers=/usr/include --with-binutils=/usr/bin
make && make install
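If the installation runs into assorted errors, such as the following:
Error: if the build reports /lib64/libc.so.6: version `GLIBC_2.14' not found, download GLIBC 2.17 or later and install that first.
If the build fails along the way and leaves the system libraries unusable (for example, you can no longer connect to the machine), roll back:
LD_PRELOAD=/lib64/libc-2.14.so rm /lib64/libc.so.6
LD_PRELOAD=/lib64/libc-2.14.so ln -s /lib64/libc-2.14.so /lib64/libc.so.6
If you then still get: ls: relocation error: /usr/lib64/libc.so.6: symbol _dl_starting_up, version GLIBC_PRIVATE not defined in file ld-linux-x86-64.so.2 with link time reference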
[root@ lib64]# ls
ls: relocation error: /usr/lib64/libc.so.6: symbol _dl_starting_up, version GLIBC_PRIVATE not defined in file ld-linux-x86-64.so.2 with link time reference
[root@ lib64]# sln /usr/lib64/ld-2.17.so /usr/lib64/ld-linux-x86-64.so.2
[root@ lib64]# ls
If you then get: /lib64/libm.so.6: invalid ELF header
[root@localhost lib64]# find . -name "libm-2.17.so"
find: error while loading shared libraries: /lib64/libm.so.6: invalid ELF header
then libm.so.6 has to be pointed at the newly compiled .so file:
lrwxrwxrwx. 1 root root 12 Jun 5 2017 /usr/lib64/libm.so.6 -> libm-2.17.so
[root@localhost lib64]# ll /lib64/libm.so.6
lrwxrwxrwx. 1 root root 12 Jun 5 2017 /lib64/libm.so.6 -> libm-2.17.so
[root@localhost lib64]# ll libm-2.*
-rwxr-xr-x. 1 root root 141 Jul 29 11:48 libm-2.17.so
-rwxr-xr-x. 1 root root 141 Jul 31 12:41 libm-2.17.so.bak
-rwxr-xr-x. 1 root root 3571192 Jul 31 12:26 libm-2.23.so
[root@localhost lib64]# rm libm.so.6
rm: remove symbolic link 'libm.so.6'? y
[root@localhost lib64]# sln /lib64/libm-2.23.so /lib64/libm.so.6
Then reinstall glibc 2.23 and the problem is resolved.
Result of checking the glibc version:
[root@ build]# ldd --version
ldd (GNU libc) 2.23
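3. Problem 3: running the .py script fails with: Illegal instruction (core dumped)
Solution:
The TensorFlow version is too new for the old server, whose CPU does not support the instructions it uses. The installed TensorFlow version was 1.14.
It needs to be downgraded to TensorFlow 1.5. However, the Jiagu maintainers confirmed that Jiagu needs at least TensorFlow 1.6.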
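A likely explanation for this crash, consistent with 1.5 working where 1.14 does not, is that the official prebuilt TensorFlow binaries use AVX instructions from release 1.6 onward, and older CPUs do not have them. The sketch below checks the CPU flags on Linux. It assumes an x86 machine whose /proc/cpuinfo has a "flags" line, and the function names are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Extract the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The first "flags" line is enough; all cores report the same set.
            return set(value.split())
    return set()


def supports_avx(cpuinfo_text):
    """True when the 'avx' flag is listed."""
    return "avx" in cpu_flags(cpuinfo_text)


try:
    with open("/proc/cpuinfo") as handle:
        print("AVX supported:", supports_avx(handle.read()))
except OSError:
    print("/proc/cpuinfo is not available on this system")
```

If AVX is missing, the options are an older TensorFlow release, as described above, or a TensorFlow build compiled from source without AVX.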
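4. Problem 4: running TensorFlow fails with SQLite 3.8.3 or later is required
If you see: ptions.ImproperlyConfigured: SQLite 3.8.3 or later is required (found 3.7.17), the default SQLite 3.7.17 on the system is too old and a newer version must be installed. This ImproperlyConfigured exception is raised by Django, which is used for the web project below.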
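What matters for this error is the SQLite library that the Python interpreter is linked against, which is not necessarily the same as the sqlite3 command-line tool on the system. A small check using the standard `sqlite3` module looks like this; the 3.8.3 threshold comes from the error message above, and the function name is hypothetical.

```python
import sqlite3

REQUIRED = (3, 8, 3)


def sqlite_ok(found=sqlite3.sqlite_version_info, required=REQUIRED):
    """True when the linked SQLite library is at least the required version."""
    return tuple(found) >= tuple(required)


print("SQLite library:", sqlite3.sqlite_version, "ok" if sqlite_ok() else "too old")
```

If the version reported is still the old one after installing a newer SQLite, Python is probably still loading the old library, so the library path may need adjusting or Python may need rebuilding against the new one.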
---------------------------------------------------------------------------
Steps for installing Python 3.6 on a Linux machine that already has Python 3.7:
1. Download the package and install it
2. ln -s /usr/local/python3.6/bin/python3.6 /usr/bin/python3.6
ln -s /usr/local/python3.6/bin/pip3 /usr/bin/pip3.6
3. Install TensorFlow 1.5
4. Packages are installed under: /usr/local/python3.6/lib/python3.6/site-packages
cd /usr/local/python3.6/lib/python3.6/site-packages/django/bin
5. Create and start the project
python3.6 django-admin.py startproject jiaguweb
python3.6 manage.py runserver 127.0.0.1:8000
python3.6 manage.py runserver 183.232.65.76:8000