Arbitrary-Region Text Recognition (OCR) in Python

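The OCR engine in this article is of course not built from scratch; it relies on the API provided by Baidu AI Cloud (which I consider one of Baidu's commendable contributions to artificial intelligence in China). The API is more than adequate for personal use, and it is comparatively simple to call and accurate.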

Installing the OCR Python SDK

Directory structure of the OCR Python SDK

├── README.md
├── aip                   // SDK directory
│   ├── __init__.py       // exported classes
│   ├── base.py           // aip base class
│   ├── http.py           // HTTP requests
│   └── ocr.py            // OCR
└── setup.py              // setuptools installation

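Supported Python versions: 2.7+ and 3.x

The Python SDK can be installed in either of the following ways:

If pip is installed, run pip install baidu-aip.

If setuptools is installed, download the SDK and run python setup.py install.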
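After installing, it can be handy to confirm that the SDK is importable before running the longer script. The snippet below is a minimal sketch: sdk_installed is a helper name I made up for illustration, and it only assumes that the package exposes the top-level module aip (the module imported later in this article).

```python
import importlib.util


def sdk_installed(module_name="aip"):
    """Return True if the named top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if sdk_installed():
        print("baidu-aip is installed")
    else:
        print("baidu-aip is missing; run: pip install baidu-aip")
```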

Implementation

Let's look at the code.

The main modules used are:

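import os                      # operating-system utilities
import sys                     # interpreter and system utilities
import time                    # timestamps and sleeping
import signal                  # operating-system signals
import winsound                # notification sound
from aip import AipOcr         # Baidu OCR API
from PIL import ImageGrab      # grab the image held in the clipboard
import win32clipboard as wc    # Windows clipboard operations
import win32con                # standard Windows clipboard data formats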

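Step one: write a basic OCR program that extracts text from an image. The APP_ID, API_KEY, and SECRET_KEY used here are the ones you apply for in the OCR section after logging in to Baidu AI Cloud.

""" Your APPID, AK, and SK """
APP_ID = 'xxx'
API_KEY = 'xxx'
SECRET_KEY = 'xxx'

client = AipOcr(APP_ID, API_KEY, SECRET_KEY)

""" Read an image file """
def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()

""" Extract the text from the dictionary returned by the API """
def getOcrText(txt_dict):
    txt = ""
    if isinstance(txt_dict, dict):
        # .get() avoids a KeyError when the response carries no 'words_result'
        for i in txt_dict.get('words_result', []):
            txt = txt + i["words"]
            # The length of a line decides whether a break follows it. Adjust the
            # threshold to taste to control how the text is laid out.
            if len(i["words"]) < 25:
                txt = txt + "\n\n"
    return txt

""" Call general or high-accuracy text recognition; the image argument is a local file """
def BaiduOcr(imageName, Accurate=True):
    image = get_file_content(imageName)
    if Accurate:
        return getOcrText(client.basicAccurate(image))
    else:
        return getOcrText(client.basicGeneral(image))

""" Call general text recognition; the image argument is the URL of a remote image """
def BaiduOcrUrl(url):
    return getOcrText(client.basicGeneralUrl(url))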
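To make the line-joining rule in getOcrText easier to see, here is a small self-contained sketch that runs without network access. The sample dictionary is hand-written in the shape that getOcrText reads ('words_result' holding items with a 'words' field); it is not real API output. The function name join_ocr_lines, the short_line parameter, and the error shape with 'error_code' and 'error_msg' are my own assumptions for illustration.

```python
def join_ocr_lines(response, short_line=25):
    """Join recognized lines; a line shorter than `short_line` ends a paragraph."""
    if not isinstance(response, dict):
        return ""
    if "error_code" in response:
        # Assumed error shape: {'error_code': ..., 'error_msg': ...}
        raise RuntimeError("OCR failed: %s" % response.get("error_msg", "unknown error"))
    parts = []
    for item in response.get("words_result", []):
        words = item["words"]
        parts.append(words)
        if len(words) < short_line:
            parts.append("\n\n")
    return "".join(parts)


# Hand-written sample in the shape that getOcrText reads; not real API output.
sample = {
    "words_result": [
        {"words": "A long first line that runs past the threshold,"},
        {"words": "then a short one."},
        {"words": "Next paragraph."},
    ]
}
text = join_ocr_lines(sample)
```

The long first line is joined directly to the line that follows it, while each short line is followed by a blank line. Raising on an error response, rather than returning an empty string, is a design choice of this sketch and not something the original code does.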

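Step two: trigger recognition with a keyboard shortcut, place the recognized text on the clipboard, play a notification sound, and exit the program with a keyboard shortcut.

""" Clipboard helper functions """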

def get_clipboard():
    wc.OpenClipboard()
    txt = wc.GetClipboardData(win32con.CF_UNICODETEXT)
    wc.CloseClipboard()
    return txt

def empty_clipboard():
    wc.OpenClipboard()
    wc.EmptyClipboard()
    wc.CloseClipboard()

def set_clipboard(txt):
    wc.OpenClipboard()
    wc.EmptyClipboard()
    wc.SetClipboardData(win32con.CF_UNICODETEXT, txt)
    wc.CloseClipboard()

""" After a screenshot is taken, call general or high-accuracy text recognition """
def BaiduOcrScreenshots(Accurate=True, path="./", ifauto=False):
    if not os.path.exists(path):
        os.makedirs(path)
    image = ImageGrab.grabclipboard()  # None when the clipboard holds no image
    if image is not None:
        print("\rThe image has been obtained. Please wait a moment!", end=" ")
        filename = str(time.time_ns())
        image.save(path + filename + ".png")
        if Accurate:
            txt = getOcrText(client.basicAccurate(get_file_content(path + filename + ".png")))
        else:
            txt = getOcrText(client.basicGeneral(get_file_content(path + filename + ".png")))
        # f = open(os.path.abspath(path)+"\\"+filename+".txt",'w')
        # f.write(txt)
        set_clipboard(txt)  # the text replaces the image on the clipboard
        winsound.PlaySound('SystemAsterisk', winsound.SND_ASYNC)
        # os.startfile(os.path.abspath(path)+"\\"+filename+".txt")
        # empty_clipboard()
        return txt
    else:
        if not ifauto:
            print("Please get the screenshots by Shift+Win+S! ", end="")
            return ""
        else:
            print("\rPlease get the screenshots by Shift+Win+S ! ", end="")
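The clipboard helpers above (get_clipboard, empty_clipboard, set_clipboard) leave the clipboard open if a call between OpenClipboard and CloseClipboard raises. One possible way to guard against that is a context manager, sketched below. The names open_clipboard, set_text, and FakeClipboard are my own for illustration; FakeClipboard only imitates the four win32clipboard calls used in this article so that the sketch runs on any system, and the constant 13 is the value of win32con.CF_UNICODETEXT.

```python
from contextlib import contextmanager

CF_UNICODETEXT = 13  # same value as win32con.CF_UNICODETEXT


@contextmanager
def open_clipboard(backend):
    """Open the clipboard and guarantee it is closed again, even on error."""
    backend.OpenClipboard()
    try:
        yield backend
    finally:
        backend.CloseClipboard()


def set_text(backend, text):
    """Replace the clipboard contents with `text`."""
    with open_clipboard(backend) as cb:
        cb.EmptyClipboard()
        cb.SetClipboardData(CF_UNICODETEXT, text)


class FakeClipboard:
    """Stand-in for the win32clipboard module so the sketch runs anywhere."""

    def __init__(self):
        self.is_open = False
        self.data = {}

    def OpenClipboard(self):
        self.is_open = True

    def CloseClipboard(self):
        self.is_open = False

    def EmptyClipboard(self):
        self.data.clear()

    def SetClipboardData(self, fmt, value):
        self.data[fmt] = value


fake = FakeClipboard()
set_text(fake, "recognized text")
```

On Windows, passing the imported module itself, as in set_text(wc, txt), should behave like set_clipboard(txt) while still closing the clipboard when something goes wrong.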

def sig_handler(signum, frame):
    # raises SystemExit, which the loop in AutoOcrScreenshots catches
    sys.exit(0)

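def removeTempFile(file=[".txt", ".png"], path="./"):
    if not os.path.exists(path):
        os.makedirs(path)
    pathDir = os.listdir(path)
    for i in pathDir:
        for j in file:
            if j in i:
                # delete the matching temporary file (presumed from the function name)
                os.remove(path + i)
                break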

def AutoOcrFile(path="./", filetype=[".png", ".jpg", ".bmp"]):
    if not os.path.exists(path):
        os.makedirs(path)
    pathDir = os.listdir(path)
    for i in pathDir:
        for j in filetype:
            if j in i:
                # the output file is named after the current time in nanoseconds
                with open(os.path.abspath(path) + "\\" + str(time.time_ns()) + ".txt", 'w', encoding="utf-8") as f:
                    f.write(BaiduOcr(path + i))
                break
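Both removeTempFile and AutoOcrFile pick files with a substring test (if j in i), which would also match a name such as notes.png.bak. A stricter alternative, sketched here with a helper name of my own (files_with_extensions), compares only the final extension and ignores case. It is a variation on the approach above, not the code the article uses.

```python
import os
import tempfile


def files_with_extensions(directory, extensions=(".png", ".jpg", ".bmp")):
    """Return sorted names in `directory` whose final extension is in `extensions`."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        name for name in os.listdir(directory)
        if os.path.splitext(name)[1].lower() in wanted
    )


# Try it on a throwaway directory holding a few empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("shot.png", "scan.JPG", "notes.png.bak", "readme.txt"):
        open(os.path.join(tmp, name), "w").close()
    matches = files_with_extensions(tmp)
```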

def AutoOcrScreenshots():
    signal.signal(signal.SIGINT, sig_handler)
    signal.signal(signal.SIGTERM, sig_handler)
    print("Waiting for Ctrl+C to exit after removing all picture files and txt files!")
    print("Please get the screenshots by Shift+Win+S !", end="")
    while True:
        try:
            BaiduOcrScreenshots(ifauto=True)
            time.sleep(0.1)
        except SystemExit:
            removeTempFile()
            break

Finally, calling the AutoOcrScreenshots function runs the whole program:

if __name__ == '__main__':
    AutoOcrScreenshots()
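The exit mechanism of AutoOcrScreenshots relies on one idea: the signal handler turns Ctrl+C into SystemExit, and the polling loop catches that exception to clean up before stopping. The sketch below isolates the pattern so it can be run without the OCR or clipboard parts. The names exit_on_signal, install_handlers, and poll_until_exit are hypothetical, and the demonstration calls the handler directly instead of sending a real signal.

```python
import signal
import sys
import time


def exit_on_signal(signum, frame):
    """Turn a termination signal into SystemExit so cleanup code can run."""
    sys.exit(0)


def install_handlers():
    """Register the handler for Ctrl+C (SIGINT) and SIGTERM; main thread only."""
    signal.signal(signal.SIGINT, exit_on_signal)
    signal.signal(signal.SIGTERM, exit_on_signal)


def poll_until_exit(step, cleanup, interval=0.1):
    """Call step() repeatedly; when SystemExit is raised, run cleanup() and stop."""
    while True:
        try:
            step()
            time.sleep(interval)
        except SystemExit:
            cleanup()
            break


# Demonstration without real signals: the third step exits, as a handler would.
calls = []


def demo_step():
    calls.append("step")
    if len(calls) == 3:
        exit_on_signal(signal.SIGINT, None)


poll_until_exit(demo_step, lambda: calls.append("cleanup"), interval=0)
```

In a real program, install_handlers() would be called once before poll_until_exit, with the screenshot check as step and the temporary-file removal as cleanup.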

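Usage

On Windows 10, put the code above in a single .py file and run it. Pressing Shift+Win+S then lets you capture any region of the screen. The captured image is held temporarily in the clipboard; the program retrieves it automatically through the Windows API, sends it to Baidu's OCR API to obtain the text, places the text on the clipboard, and then plays a notification sound.

After starting the program, take a screenshot with the shortcut, wait for the notification sound, and press Ctrl+V to paste the text wherever you need it.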
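In the flow described above, the screenshot is written to a .png file only so that its bytes can be read back and sent to the API. A possible variation is to encode the image in memory instead, which leaves no temporary picture files to clean up. The sketch below assumes only that the image object has a Pillow-style save(fp, format=...) method; image_to_png_bytes and FakeImage are names I introduced for illustration, and FakeImage is merely a stand-in so the snippet runs without Pillow or a clipboard.

```python
import io


def image_to_png_bytes(image):
    """Encode an image object to PNG bytes in memory, without a temporary file."""
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")  # Pillow's Image.save accepts a file object
    return buffer.getvalue()


class FakeImage:
    """Minimal stand-in with a Pillow-like save(); used only to run the sketch."""

    def save(self, fp, format=None):
        fp.write(b"\x89PNG fake payload for " + format.encode("ascii"))


png_bytes = image_to_png_bytes(FakeImage())
```

With a real clipboard image, the returned bytes could be passed to client.basicAccurate or client.basicGeneral in place of get_file_content(...).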

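Supplement: Chinese OCR in Python

I needed to recognize Chinese text in an image using Python. Ordinary developers like us are unlikely to build such sophisticated technology ourselves, so I looked on GitHub and found an open-source library that appears to meet the need, tesseract-ocr.

It supports Chinese OCR and provides a command-line tool. The corresponding Python package is pytesseract. With this tool we can recognize the text in an image.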

My development environment is as follows:

macOS

Python 3.6

brew

Install tesseract:

brew install tesseract

Install the corresponding Python package, pytesseract:

pip install pytesseract

How is it used?

The code is only a few lines:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import pytesseract
from PIL import Image

# open image
image = Image.open('test.png')
code = pytesseract.image_to_string(image, lang='chi_sim')
print(code)

OCR is fairly slow; you can try it out with an image that contains Chinese text.
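The lang='chi_sim' argument only works when the Simplified Chinese language data is installed for Tesseract (with Homebrew it is, as far as I know, packaged separately as tesseract-lang). A quick way to check is the command tesseract --list-langs. The sketch below wraps that command; parse_language_list and installed_languages are helper names I made up, and the sample output is hand-written to show the parsing rather than captured from a real run.

```python
import shutil
import subprocess


def parse_language_list(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # The first line is a header such as "List of available languages (3):".
    return lines[1:]


def installed_languages():
    """Return Tesseract's installed language codes, or [] if the CLI is not found."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some versions print the list to stderr
        universal_newlines=True,
        check=False,
    )
    return parse_language_list(result.stdout)


# Hand-written sample of the command's output, used to exercise the parser.
sample_output = "List of available languages (3):\nchi_sim\neng\nosd\n"
langs = parse_language_list(sample_output)
```

If 'chi_sim' is not in the list returned by installed_languages(), the language data needs to be installed before the code above can recognize Chinese.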

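The above is my personal experience. I hope it serves as a useful reference, and I hope you will continue to support us. If anything is wrong or not fully thought through, corrections are welcome.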
