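Delphi: Extracting Text from a PDF (Example)
There are plenty of components for generating PDFs, but not many for parsing them. PDF Toolkit can do it, but the first complex PDF I tested reported an error and the Chinese text came out garbled; I may have had the wrong version or used it incorrectly.
I remembered that the Apache pdfBox library, which I had previously called from Java, worked well, so I downloaded pdfBox and used Delphi to call it to extract the PDF text.
Requirements: a Java runtime environment
pdfBox application package: pdfbox-app-2.0.6.jar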
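Before going further, it can help to confirm that the jar is where the program expects it. The following is only a minimal sketch: the function name FindPDFBoxJar is made up for illustration, and the PDFBox subfolder next to the executable is taken from the command line built later in this article.

```delphi
program CheckPDFBox;
{$APPTYPE CONSOLE}

uses
  SysUtils; // System.SysUtils when unit scope names are not configured

const
  JarName = 'pdfbox-app-2.0.6.jar';

// Hypothetical helper: returns the full jar path or raises if it is missing.
function FindPDFBoxJar(const AppPath: string): string;
begin
  Result := IncludeTrailingPathDelimiter(AppPath) + 'PDFBox\' + JarName;
  if not FileExists(Result) then
    raise Exception.CreateFmt('PDFBox jar not found: %s', [Result]);
end;

begin
  try
    // Look for the jar in a PDFBox folder next to the executable
    Writeln('Using ', FindPDFBoxJar(ExtractFilePath(ParamStr(0))));
  except
    on E: Exception do
      Writeln(E.Message);
  end;
end.
```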
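Here the PDF is parsed by running pdfBox from a DOS command line, and the program then reads the result back in.
First, the code that runs a DOS command: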
// Raises an exception carrying the last Win32 error message when b is False
procedure CheckResult(b: Boolean);
begin
  if not b then
    raise Exception.Create(SysErrorMessage(GetLastError));
end;
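// Requires Windows, SysUtils and Classes in the uses clause
function RunDOS(const CommandLine: string): string;
var
  HRead, HWrite: THandle;
  StartInfo: TStartupInfo;
  ProceInfo: TProcessInformation;
  b: Boolean;
  sa: TSecurityAttributes;
  Cmd: string;
  Buf: array[0..4095] of Byte;
  BytesRead: DWORD;
  outS: TMemoryStream;
  sRet: TStrings;
begin
  Result := '';
  FillChar(sa, SizeOf(sa), 0);
  // Allow handle inheritance; otherwise the output cannot be captured on NT and 2000
  sa.nLength := SizeOf(sa);
  sa.bInheritHandle := True;
  sa.lpSecurityDescriptor := nil;
  b := CreatePipe(HRead, HWrite, @sa, 0);
  CheckResult(b);
  try
    try
      FillChar(StartInfo, SizeOf(StartInfo), 0);
      StartInfo.cb := SizeOf(StartInfo);
      StartInfo.wShowWindow := SW_HIDE;
      // Use the given handles as the child's standard handles, and the given show mode
      StartInfo.dwFlags := STARTF_USESTDHANDLES or STARTF_USESHOWWINDOW;
      StartInfo.hStdError := HWrite;
      StartInfo.hStdInput := GetStdHandle(STD_INPUT_HANDLE); //HRead;
      StartInfo.hStdOutput := HWrite;
      // The Unicode CreateProcess may modify the command-line buffer, so pass a writable copy
      Cmd := CommandLine;
      UniqueString(Cmd);
      b := CreateProcess(nil, // lpApplicationName: PChar
        PChar(Cmd),           // lpCommandLine: PChar
        nil,                  // lpProcessAttributes: PSecurityAttributes
        nil,                  // lpThreadAttributes: PSecurityAttributes
        True,                 // bInheritHandles: BOOL
        CREATE_NEW_CONSOLE,
        nil,
        nil,
        StartInfo,
        ProceInfo);
      CheckResult(b);
    finally
      // Close our copy of the write end so ReadFile stops once the child has exited
      CloseHandle(HWrite);
    end;
    outS := TMemoryStream.Create;
    try
      try
        // Drain the pipe while the child runs; waiting first can deadlock on large output
        while ReadFile(HRead, Buf, SizeOf(Buf), BytesRead, nil) and (BytesRead > 0) do
          outS.WriteBuffer(Buf, BytesRead);
        WaitForSingleObject(ProceInfo.hProcess, INFINITE);
      finally
        CloseHandle(ProceInfo.hThread);
        CloseHandle(ProceInfo.hProcess);
      end;
      if outS.Size > 0 then
      begin
        outS.Position := 0;
        sRet := TStringList.Create;
        try
          sRet.LoadFromStream(outS);
          Result := sRet.Text;
        finally
          sRet.Free;
        end;
      end;
    finally
      outS.Free;
    end;
  finally
    CloseHandle(HRead);
  end;
end;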
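As a quick way to try RunDOS and check the Java requirement at the same time, something like the sketch below could be used. The name JavaAvailable is made up for illustration, and the code assumes RunDOS and CheckResult from above are in scope with SysUtils in the uses clause.

```delphi
// Hypothetical helper: reports whether "java" can be started from the PATH.
// Assumes RunDOS (defined above) is visible in this unit.
function JavaAvailable(out VersionInfo: string): Boolean;
begin
  try
    // "java -version" writes to stderr, which RunDOS also redirects into the pipe
    VersionInfo := RunDOS('java -version');
    Result := True;
  except
    on E: Exception do
    begin
      // CreateProcess failed, for example because java is not installed
      VersionInfo := E.Message;
      Result := False;
    end;
  end;
end;
```

If the function returns False, VersionInfo holds the system error message raised by CheckResult instead of the version text.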
Then call it and display the result:
function TfrmPDFTool.GetPDFText(sFile: string): string;
var
  cmd: string;
  pdfFileName, txtFileName: string;
begin
  // Example: java -jar pdfbox-app-2.0.6.jar ExtractText -encoding utf-8 e:\\temp\\test.pdf e:\\temp\\
  pdfFileName := ExtractFileName(sFile);
  txtFileName := FAppPath + 'Temp\' + pdfFileName + '.txt';
  // Quote every path so that folders or file names containing spaces still work
  cmd := 'java -jar "' + FAppPath + 'PDFBox\pdfbox-app-2.0.6.jar" ExtractText'
    + ' -encoding utf-8 "' + sFile + '"'
    + ' "' + txtFileName + '"';
  AddLog(cmd);
  // Result holds the console output of pdfBox, not the extracted text
  Result := RunDOS(cmd);
  AddLog(Result);
  // TEncoding.UTF8 is a shared instance, so no new encoding object is created and leaked
  memTxtFile.Lines.LoadFromFile(txtFileName, TEncoding.UTF8);
  FPDFText := memTxtFile.Text;
  AddLog(FPDFText);
end;
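One possible way to wire GetPDFText to the form is a button handler like the sketch below. The component names btnExtract and dlgOpen (a TOpenDialog) are assumptions for illustration and do not appear in the original code.

```delphi
// Hypothetical event handler: btnExtract and dlgOpen are assumed form components.
procedure TfrmPDFTool.btnExtractClick(Sender: TObject);
begin
  dlgOpen.Filter := 'PDF files (*.pdf)|*.pdf';
  if dlgOpen.Execute then
    // GetPDFText fills memTxtFile and FPDFText with the extracted text
    GetPDFText(dlgOpen.FileName);
end;
```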
OK, all done!