Previewing Word (doc/docx) Files in the Browser with Apache POI

I. Environment

1. JDK: 1.8

2. Maven: 3.6

3. Spring Boot: 2.2.2

II. Main Maven dependencies

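<!-- The <version> elements are missing from the source listing; keep the POI artifacts on one version. -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <!-- artifactId missing from the source listing (presumably the core poi artifact) -->
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>xdocreport</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
</dependency>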

III. Implementation

1. docToHtml

@RequestMapping("/wordToHtml")
public void wordToHtml(HttpServletResponse response) {
    final String path = "C:\\usr\\local\\";
    final String file = "5页.doc";
    // try-with-resources closes the file stream even if the conversion fails
    try (InputStream input = new FileInputStream(path + file)) {
        docToHtml(input, response);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

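// Uses org.apache.poi.hwpf.HWPFDocumentCore, org.apache.poi.hwpf.converter.WordToHtmlConverter
// and WordToHtmlUtils, javax.xml.parsers.DocumentBuilderFactory, javax.xml.transform.*, org.w3c.dom.Document
public void docToHtml(InputStream input, HttpServletResponse response) throws Exception {
    HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(input);
    // The converter writes its output into an empty DOM document
    WordToHtmlConverter wordToHtmlConverter = new ImageConverter(
            DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument()
    );
    wordToHtmlConverter.processDocument(wordDocument);
    Document htmlDocument = wordToHtmlConverter.getDocument();

    // Serialize the DOM to HTML bytes
    ByteArrayOutputStream outStream = new ByteArrayOutputStream();
    DOMSource domSource = new DOMSource(htmlDocument);
    StreamResult streamResult = new StreamResult(outStream);
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(domSource, streamResult);
    outStream.close();

    // Write the HTML to the response
    response.setContentType("text/html");
    response.setCharacterEncoding("UTF-8");
    OutputStream toClient = new BufferedOutputStream(response.getOutputStream());
    toClient.write(outStream.toByteArray());
    toClient.flush();
    toClient.close();
}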
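The DOM-to-HTML serialization step can be tried on its own, without POI or a servlet container. The following is a minimal sketch that uses only JDK classes; the class name DomToHtmlSketch and the hand-built sample document are assumptions made for illustration and are not part of the original code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: stands in for the DOM that WordToHtmlConverter would produce.
final class DomToHtmlSketch {

    /** Builds a tiny html/body/p document by hand. */
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            p.setTextContent("hello");
            body.appendChild(p);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes a DOM document with the "html" output method, encoded as UTF-8. */
    static String toHtml(Document doc) {
        try {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Without this call nothing is ever written to the output stream.
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(sampleDocument()));
    }
}
```

The point to take away is that the output properties only configure the Transformer; the bytes are produced by transform(...), so the byte array must be read after that call.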

// Image handling: embed each picture in the HTML as a base64 data URI
// (Picture is org.apache.poi.hwpf.usermodel.Picture, Element is org.w3c.dom.Element)
public class ImageConverter extends WordToHtmlConverter {

    public ImageConverter(Document document) {
        super(document);
    }

    @Override
    protected void processImageWithoutPicturesManager(Element currentBlock, boolean inlined, Picture picture) {
        Element imgNode = currentBlock.getOwnerDocument().createElement("img");
        StringBuffer sb = new StringBuffer();
        sb.append(Base64.getMimeEncoder().encodeToString(picture.getRawContent()));
        sb.insert(0, "data:" + picture.getMimeType() + ";base64,");
        imgNode.setAttribute("src", sb.toString());
        currentBlock.appendChild(imgNode);
    }
}
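The data URI that ImageConverter writes into the src attribute can be illustrated without POI. This is a minimal sketch under stated assumptions: the class name DataUriSketch and the fake image bytes are hypothetical, and it uses the basic java.util.Base64 encoder rather than the MIME encoder, since the basic encoder produces a single line while the MIME encoder inserts a CRLF every 76 characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper illustrating the "data:<mime>;base64,<payload>" format.
final class DataUriSketch {

    /** Builds a data URI from a MIME type and raw bytes. */
    static String toDataUri(String mimeType, byte[] content) {
        // Basic encoder: single-line output with no line separators.
        String base64 = Base64.getEncoder().encodeToString(content);
        return "data:" + mimeType + ";base64," + base64;
    }

    public static void main(String[] args) {
        // Stand-in bytes; in the converter these would come from picture.getRawContent().
        byte[] fakeImage = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toDataUri("image/png", fakeImage));
        // prints: data:image/png;base64,aGVsbG8=
    }
}
```

Embedding images this way keeps the preview self-contained in one HTML response, at the cost of a payload roughly one third larger than the raw image bytes.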

Preview result:

2. docxToHtml

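// Same mapping and method name as the .doc example; keep only one of the two handlers in a controller
@RequestMapping("/wordToHtml")
public void wordToHtml(HttpServletResponse response) {
    final String path = "C:\\usr\\local\\";
    final String file = "3.docx";
    // try-with-resources closes the file stream even if the conversion fails
    try (InputStream input = new FileInputStream(path + file)) {
        docxToHtml(input, response);
    } catch (Exception e) {
        e.printStackTrace();
    }
}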

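// XWPFDocument is org.apache.poi.xwpf.usermodel.XWPFDocument; XHTMLOptions, XHTMLConverter
// and Base64EmbedImgManager come from the xdocreport XHTML converter
public void docxToHtml(InputStream inputStream, HttpServletResponse response) throws IOException {
    XWPFDocument docxDocument = new XWPFDocument(inputStream);
    XHTMLOptions options = XHTMLOptions.create();
    // Embed images as base64
    options.setImageManager(new Base64EmbedImgManager());

    // Convert to HTML
    ByteArrayOutputStream htmlStream = new ByteArrayOutputStream();
    XHTMLConverter.getInstance().convert(docxDocument, htmlStream, options);

    // Write the HTML to the response
    response.setContentType("text/html");
    response.setCharacterEncoding("UTF-8");
    OutputStream toClient = new BufferedOutputStream(response.getOutputStream());
    toClient.write(htmlStream.toByteArray());
    toClient.flush();
    toClient.close();
}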

Preview result:

IV. Summary

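1. The main Maven dependencies must use matching versions.

2. The file must be a genuine Word document. For example, résumés downloaded from BOSS Zhipin cannot be previewed because their content is actually HTML, and conversion fails with the exception "Document is really HTML File". Save the file again in a standard Word format first.

3. Renaming the file extension is not enough. Office may still open the file, but it is not a standard Word file of that format. Save it in the format you actually want (doc or docx); otherwise you get java.lang.IllegalArgumentException: The document is really a OOXML file

4. I also tried spire.doc in its free edition, but it cannot preview documents longer than three pages, a limitation the official site explains. That is why I settled on the POI approach.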
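Points 2 and 3 both come down to the file content not matching its extension. One way to catch this before conversion is to check the leading bytes of the file. The sketch below is an illustration using only JDK classes, not the author's method; the class name WordFormatSniffer and the Kind enum are hypothetical. It relies on two well-known signatures: binary .doc files are OLE2 containers starting with D0 CF 11 E0 A1 B1 1A E1, and .docx files are ZIP archives starting with "PK" 03 04. A ZIP signature only shows the file is a ZIP container, not that it is a valid .docx. POI also ships its own detection in org.apache.poi.poifs.filesystem.FileMagic in newer versions, which may be preferable if your POI version includes it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: classifies a file by its first bytes instead of its extension.
final class WordFormatSniffer {

    enum Kind { DOC_OLE2, DOCX_ZIP, OTHER }

    // OLE2 compound document header, used by binary .doc files.
    private static final int[] OLE2 = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header, used by .docx (and every other ZIP-based format).
    private static final int[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies the leading bytes of a file; pass at least the first 8 bytes. */
    static Kind sniff(byte[] head) {
        if (startsWith(head, OLE2)) {
            return Kind.DOC_OLE2;
        }
        if (startsWith(head, ZIP)) {
            return Kind.DOCX_ZIP;
        }
        return Kind.OTHER;
    }

    private static boolean startsWith(byte[] data, int[] signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            // Mask to compare as unsigned byte values.
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // An HTML file renamed to .doc matches neither signature.
        byte[] htmlHead = "<html><body>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(htmlHead)); // prints: OTHER
    }
}
```

With a check like this, the controller could route DOC_OLE2 input to docToHtml, DOCX_ZIP input to docxToHtml, and reject everything else with a clear message instead of an exception from deep inside the converter.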
