Previewing DOC files in PHP: converting Word documents to HTML for online preview, using phpoffice, p... -- 688IT编程网

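A client recently asked for an online preview feature for Word and Excel files. What follows is the whole process of getting it to work.

Since our project is written in PHP, my first thought was to do the conversion with PHPWord from PHPOffice. The key code is below.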

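// Load the document; load() uses the Word2007 (.docx) reader unless another reader name is passed.
$phpWord = \PhpOffice\PhpWord\IOFactory::load('test.doc');
// Create an HTML writer for the loaded document and save the result.
$xmlWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, 'HTML');
$xmlWriter->save('test.html');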

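This approach does convert the file, but the resulting HTML is missing a lot of text compared with the original. A difference in styling would be tolerable; lost content is not. It also cannot handle the DOC format, so in the end I gave up on this approach.

Next I thought of solving the problem with Python. I found that Python has a library called pydocx for handling Word documents, so I installed it.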

pip install pydocx

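The library is simple to use. The main code is as follows:

from pydocx import PyDocX

# pydocx reads .docx only; a legacy .doc file will not convert.
html = PyDocX.to_html("test2.docx")

# Write the generated HTML out as UTF-8.
with open("test.html", "w", encoding="utf-8") as f:
    f.write(html)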

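The conversion quality is decent: apart from table styling that differs a little from the original, no content is lost. There is one problem, though. The library converts docx only and cannot convert doc, and our clients upload quite a few doc files, so I had to look for another way.

Looking around, I found that Java has the POI library, which can convert Word files. Apache POI is an open-source library from the Apache Software Foundation that gives Java programs an API for reading and writing Microsoft Office file formats. I wanted to give it a try, so after a good while of reading up I started writing. First, add the dependencies in Maven: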
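Since uploads can arrive in either format, it may help to look at the file's leading bytes instead of trusting the extension. The following is only a sketch: `detect_container` is a hypothetical helper, not part of pydocx. It relies on legacy .doc files being OLE2 compound files and .docx files being ZIP archives, so it identifies the container only (an .xls or .xlsx file would match the same signatures).

```python
# Leading-byte signatures of the two container formats.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # legacy .doc / .xls / .ppt
ZIP_MAGIC = b"PK\x03\x04"                        # .docx / .xlsx / .pptx


def detect_container(path):
    """Return 'ole2', 'zip', or 'unknown' from the file's first bytes.

    Hypothetical helper for illustration; it does not verify that the
    file is actually a Word document.
    """
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

A caller could then hand "zip" files to a docx converter and route "ole2" files elsewhere, rejecting anything reported as "unknown".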

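<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.xhtml</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>cn.hutool</groupId>
    <artifactId>hutool-all</artifactId>
    <version>5.4.3</version>
</dependency>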

The following is working code borrowed from someone else:

import cn.hutool.core.img.ImgUtil;
import fr.opensagres.poi.xwpf.converter.xhtml.Base64EmbedImgManager;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.openxml4j.util.ZipSecureFile;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.w3c.dom.Document;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.awt.image.BufferedImage;
import java.io.*;

/**
 * Office conversion utility (test)
 */
public class OfficeConvertUtil {

    /**
     * Convert a Word 2003 (.doc) file to an HTML file. 2017-2-27
     * @param wordPath directory containing the Word file
     * @param wordName Word file name without the extension
     * @param suffix Word file extension
     * @return absolute path of the generated HTML file
     * @throws IOException
     * @throws TransformerException
     * @throws ParserConfigurationException
     */
    public static String Word2003ToHtml(String wordPath, String wordName, String suffix)
            throws IOException, TransformerException, ParserConfigurationException {
        String htmlPath = wordPath + File.separator + "html" + File.separator;
        String htmlName = wordName + ".html";
        // The existence check is disabled, so the HTML file is regenerated on every call.
        File htmlFile = new File(htmlPath + htmlName);
        // if (htmlFile.exists()) {
        //     return htmlFile.getAbsolutePath();
        // }
        // Original Word document
        final String file = wordPath + File.separator + wordName + suffix;
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        // For documents with pictures, encode each picture as base64 so everything lives in one page.
        wordToHtmlConverter.setPicturesManager((content, pictureType, suggestedName, widthInches, heightInches) -> {
            BufferedImage bufferedImage = ImgUtil.toImage(content);
            String base64Img = ImgUtil.toBase64(bufferedImage, pictureType.getExtension());
            return "data:;base64," + base64Img;
        });
        // Parse the Word document
        try (InputStream input = new FileInputStream(file);
             HWPFDocument wordDocument = new HWPFDocument(input)) {
            wordToHtmlConverter.processDocument(wordDocument);
        }
        Document htmlDocument = wordToHtmlConverter.getDocument();
        // Create the parent folder of the HTML file
        File folder = new File(htmlPath);
        if (!folder.exists()) {
            folder.mkdirs();
        }
        // Serialize the DOM to the HTML file
        try (OutputStream outStream = new FileOutputStream(htmlFile)) {
            DOMSource domSource = new DOMSource(htmlDocument);
            StreamResult streamResult = new StreamResult(outStream);
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer serializer = factory.newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            serializer.transform(domSource, streamResult);
        }
        return htmlFile.getAbsolutePath();
    }

    /**
     * Convert a Word 2007 (.docx) file to HTML. 2017-2-27
     * @param wordPath directory containing the Word file
     * @param wordName Word file name without the extension
     * @param suffix Word file extension
     * @return absolute path of the generated HTML file
     * @throws IOException
     */
    public static String Word2007ToHtml(String wordPath, String wordName, String suffix) throws IOException {
        // Relax POI's minimum inflate ratio check for zipped OOXML files.
        ZipSecureFile.setMinInflateRatio(-1.0d);
        String htmlPath = wordPath + File.separator + "html" + File.separator;
        String htmlName = wordName + ".html";
        // The existence check is disabled, so the HTML file is regenerated on every call.
        File htmlFile = new File(htmlPath + htmlName);
        // if (htmlFile.exists()) {
        //     return htmlFile.getAbsolutePath();
        // }
        // Word file
        File wordFile = new File(wordPath + File.separator + wordName + suffix);
        // Create the parent folder of the HTML file
        File folder = new File(htmlPath);
        if (!folder.exists()) {
            folder.mkdirs();
        }
        // XHTML options: for documents with pictures, embed each picture as base64 in the page.
        XHTMLOptions options = XHTMLOptions.create().indent(4).setImageManager(new Base64EmbedImgManager());
        // 1) Load the Word file into an XWPFDocument, 2) apply the options, 3) convert to XHTML.
        try (InputStream in = new FileInputStream(wordFile);
             XWPFDocument document = new XWPFDocument(in);
             OutputStream out = new FileOutputStream(htmlFile)) {
            XHTMLConverter.getInstance().convert(document, out, options);
        }
        return htmlFile.getAbsolutePath();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Word2003ToHtml("D:\\tmp", "test", ".doc"));
        System.out.println(Word2007ToHtml("D:\\tmp", "test2", ".docx"));
    }
}
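Both Java methods keep the output in a single HTML page by turning each picture into a base64 data URI. As a rough illustration of what such a string looks like, here is a small Python sketch; `to_data_uri` is a hypothetical name, and the empty default MIME type simply mirrors the "data:;base64," prefix used in the Java code.

```python
import base64


def to_data_uri(image_bytes, mime_type=""):
    """Build a data URI that can be used directly as an <img> src value.

    Hypothetical helper for illustration only.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return "data:" + mime_type + ";base64," + encoded


print(to_data_uri(b"hi"))               # data:;base64,aGk=
print(to_data_uri(b"hi", "image/png"))  # data:image/png;base64,aGk=
```

Embedding pictures this way avoids managing a separate image folder, at the cost of a larger HTML file, because base64 output is roughly a third larger than the raw bytes.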

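Java converts the doc format quite well, but with docx the styling comes out completely scrambled. I spent a long time in the POI documentation and could not find anyone online who had solved this styling problem. I considered using Python for docx and Java for doc, but that felt like too much hassle.

After a lot more research, this is the solution I finally settled on.

I went back to handling it in PHP, though not with phpoffice. Instead I use unoconv for the conversion. First install LibreOffice: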

yum install libreoffice

Then install unoconv:

yum install unoconv

After that, the following command does the conversion:
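The command itself is not shown here, so what follows is only a sketch of a typical invocation, assuming unoconv's `-f` (output format) and `-o` (output path) options. The function names, paths, and timeout are hypothetical, and the example is written in Python for illustration; the same argument list could be passed from any language that can start a process.

```python
import shutil
import subprocess
from pathlib import Path


def build_unoconv_command(source, output_dir, fmt="html"):
    """Return the argument list for converting `source` into `output_dir`."""
    source = Path(source)
    target = Path(output_dir) / (source.stem + "." + fmt)
    return ["unoconv", "-f", fmt, "-o", str(target), str(source)]


def convert(source, output_dir, timeout=60):
    """Run unoconv and return the path of the generated file."""
    if shutil.which("unoconv") is None:
        raise RuntimeError("unoconv is not installed or not on PATH")
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    cmd = build_unoconv_command(source, output_dir)
    # check=True raises CalledProcessError if unoconv exits with an error.
    subprocess.run(cmd, check=True, timeout=timeout)
    return cmd[4]
```

Passing the arguments as a list rather than a single shell string keeps uploaded file names with spaces or special characters from being interpreted by a shell.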
