Previewing Word Documents (doc/docx) in the Browser with Apache POI
I. Environment
1. JDK: 1.8
2. Maven: 3.6
3. Spring Boot: 2.2.2
II. Main Maven dependencies
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>4.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.0</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>xdocreport</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>4.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.4</version>
</dependency>
III. Implementation
1. docToHtml
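import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.servlet.http.HttpServletResponse;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocumentCore;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.converter.WordToHtmlUtils;
import org.springframework.web.bind.annotation.RequestMapping;
import org.w3c.dom.Document;

@RequestMapping("/wordToHtml")
public void wordToHtml(HttpServletResponse response) {
    final String path = "C:\\usr\\local\\";
    final String file = "5页.doc";
    // try-with-resources closes the input stream even if the conversion fails
    try (InputStream input = new FileInputStream(path + file)) {
        docToHtml(input, response);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public void docToHtml(InputStream input, HttpServletResponse response) throws Exception {
    HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(input);
    // ImageConverter (defined below) embeds pictures as base64 data URIs
    WordToHtmlConverter wordToHtmlConverter = new ImageConverter(
            DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
    wordToHtmlConverter.processDocument(wordDocument);
    Document htmlDocument = wordToHtmlConverter.getDocument();

    // Serialize the generated DOM tree to HTML bytes
    ByteArrayOutputStream outStream = new ByteArrayOutputStream();
    DOMSource domSource = new DOMSource(htmlDocument);
    StreamResult streamResult = new StreamResult(outStream);
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(domSource, streamResult);
    outStream.close();

    // Set the response headers, then write the HTML to the client
    response.setContentType("text/html");
    response.setCharacterEncoding("UTF-8");
    OutputStream toClient = new BufferedOutputStream(response.getOutputStream());
    toClient.write(outStream.toByteArray());
    toClient.flush();
    toClient.close();
}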
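The serialization step of docToHtml depends only on the JDK's javax.xml.transform API, so it can be tried on its own without POI. The following is a minimal sketch under that assumption: the class name DomToHtmlSketch and the helper sampleDocument are hypothetical, and the hand-built DOM tree merely stands in for the document that the converter would produce.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomToHtmlSketch {

    // Serializes a DOM tree to an HTML string with the same output properties as docToHtml.
    static String serialize(Document doc) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    // Hypothetical stand-in for the DOM tree that a Word-to-HTML converter would build.
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            p.setTextContent("Hello from a DOM tree");
            body.appendChild(p);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sampleDocument()));
    }
}
```

Buffering into a ByteArrayOutputStream first, as docToHtml does, means a conversion failure surfaces before anything has been written to the HTTP response.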
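import java.util.Base64;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.Picture;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Image handling: embed each picture in the HTML as a base64 data URI
public class ImageConverter extends WordToHtmlConverter {
    public ImageConverter(Document document) {
        super(document);
    }

    @Override
    protected void processImageWithoutPicturesManager(Element currentBlock, boolean inlined, Picture picture) {
        Element imgNode = currentBlock.getOwnerDocument().createElement("img");
        // The basic encoder is used because, unlike the MIME encoder, it inserts no line breaks
        String base64 = Base64.getEncoder().encodeToString(picture.getRawContent());
        imgNode.setAttribute("src", "data:" + picture.getMimeType() + ";base64," + base64);
        currentBlock.appendChild(imgNode);
    }
}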
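The image handling above comes down to building a data URI from a MIME type and the raw picture bytes. The following JDK-only sketch shows that one step in isolation; the class name DataUriSketch, the method toDataUri, and the sample bytes are hypothetical and not part of the original code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DataUriSketch {

    // Builds a data URI that a browser can render directly as an <img> source.
    static String toDataUri(String mimeType, byte[] content) {
        // Base64.getEncoder() emits a single line, which suits an HTML attribute value
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(content);
    }

    public static void main(String[] args) {
        // Placeholder bytes; a real caller would pass the picture's raw content
        byte[] fakeImage = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toDataUri("image/png", fakeImage));
        // prints: data:image/png;base64,aGVsbG8=
    }
}
```

Embedding images this way keeps the preview a single self-contained HTML response, at the cost of roughly a one-third size increase from the base64 encoding.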
Preview result: [screenshot]
2. docxToHtml
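import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.servlet.http.HttpServletResponse;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.web.bind.annotation.RequestMapping;
// Package names below are those of the xdocreport 2.x converter
import fr.opensagres.poi.xwpf.converter.xhtml.Base64EmbedImgManager;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;

@RequestMapping("/wordToHtml")
public void wordToHtml(HttpServletResponse response) {
    final String path = "C:\\usr\\local\\";
    final String file = "3.docx";
    // try-with-resources closes the input stream even if the conversion fails
    try (InputStream input = new FileInputStream(path + file)) {
        docxToHtml(input, response);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public void docxToHtml(InputStream inputStream, HttpServletResponse response) throws IOException {
    XWPFDocument docxDocument = new XWPFDocument(inputStream);
    XHTMLOptions options = XHTMLOptions.create();
    // Embed images as base64
    options.setImageManager(new Base64EmbedImgManager());
    // Convert to HTML
    ByteArrayOutputStream htmlStream = new ByteArrayOutputStream();
    XHTMLConverter.getInstance().convert(docxDocument, htmlStream, options);
    // Set the response headers, then write the HTML to the client
    response.setContentType("text/html");
    response.setCharacterEncoding("UTF-8");
    OutputStream toClient = new BufferedOutputStream(response.getOutputStream());
    toClient.write(htmlStream.toByteArray());
    toClient.flush();
    toClient.close();
}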
Preview result: [screenshot]
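IV. Summary
1. The main Maven dependencies must use matching versions.
2. The file must be a standard Word document. For example, a résumé downloaded from BOSS Zhipin (BOSS直聘) cannot be previewed, because its content is actually HTML and conversion fails with the exception: Docment is really HTML File. Save the file in a standard Word format first.
3. Renaming the file extension is not enough. Office can still open such a file, but it is not in the format the extension claims; it has to be saved as the format you want (doc or docx). Otherwise you get java.lang.IllegalArgumentException: The document is really a OOXML file
4. I also tried spire.doc (free edition), but documents longer than three pages cannot be previewed; the official site explains this limitation. In the end I settled on POI.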
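Points 2 and 3 both come from a mismatch between a file's extension and its real container format, which can be checked from the first few bytes before choosing a converter. The sketch below is one possible JDK-only check, not the method used in this article: the class WordFormatSniffer and its return labels are hypothetical, and it only distinguishes the OLE2 container (used by binary .doc) from the ZIP container (used by .docx), both of which are shared with other Office formats. POI also ships org.apache.poi.poifs.filesystem.FileMagic for this kind of detection, which may be the better choice in a real project.

```java
class WordFormatSniffer {

    // Signature of an OLE2 compound file, the container of binary .doc files
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Local file header signature of a ZIP archive, the container of .docx files
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};

    // Returns "OLE2", "ZIP", or "OTHER" based on the leading bytes of the file.
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2)) {
            return "OLE2";   // candidate for docToHtml
        }
        if (startsWith(head, ZIP)) {
            return "ZIP";    // candidate for docxToHtml
        }
        return "OTHER";      // e.g. HTML or RTF saved with a .doc extension
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] htmlDisguisedAsDoc = "<html><body>resume</body></html>"
                .getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.println(sniff(htmlDisguisedAsDoc)); // prints: OTHER
    }
}
```

A controller could read the first eight bytes of the upload, call sniff, and reject or re-route the file instead of letting the converter throw.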
