Parsing docx with Apache POI in Java to generate HTML


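Our business needs news articles (text plus images) that were edited in Word documents to be entered into the CMS admin backend and published as news items. Images cannot be copied and pasted directly into the UEditor editor, and uploading them one by one is too tedious. So I built a feature that accepts an uploaded docx file, parses it, and returns the HTML body straight to the front-end editor for further editing. Requirements:

1. Images must be saved to a specified location on the server, and the URL the front end uses to request each image must be put into the src attribute of its img tag.

2. Images and text must appear in their original order.

3. Hyperlinks, other shapes, and similar elements that ordinary news articles do not use are filtered out.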
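Requirement 1 can be sketched independently of POI: write the image bytes under a server directory with a fresh name, then build the img tag from a URL prefix plus that name. This is a minimal sketch, not the author's code; the class name `ImageStore`, the `/images/` prefix, and the UUID-based naming are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

class ImageStore {
    // Hypothetical URL prefix under which the front end requests stored images.
    static final String URL_PREFIX = "/images/";

    /** Writes the bytes under dir using a generated name and returns that stored file name. */
    static String save(Path dir, String originalName, byte[] data) {
        try {
            Files.createDirectories(dir);
            // Keep only the extension, so no other part of the document's file name reaches the path.
            int dot = originalName.lastIndexOf('.');
            String ext = dot >= 0
                    ? originalName.substring(dot).replaceAll("[^A-Za-z0-9.]", "")
                    : "";
            String storedName = UUID.randomUUID().toString().replace("-", "") + ext;
            Files.write(dir.resolve(storedName), data);
            return storedName;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds the img tag whose src is the assumed URL prefix plus the stored file name. */
    static String imgTag(String storedName) {
        return "<img src='" + URL_PREFIX + storedName + "'>";
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("docx-images");
        String name = save(dir, "image1.jpeg", new byte[] {1, 2, 3});
        System.out.println(imgTag(name));
    }
}
```

Generating the name instead of reusing the one stored in the docx avoids collisions between uploads that both contain, say, image1.jpeg.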

Implementation:

1. Minimal Maven dependencies. Version 3.17 supports JDK 1.6 and later; the 4.x releases require JDK 1.8 or later.

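<!-- The version elements were lost in the source; 3.17 is taken from the note above. -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <!-- The artifactId was lost in the source; the core "poi" module is assumed. -->
    <artifactId>poi</artifactId>
    <version>3.17</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.17</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>3.17</version>
</dependency>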

2. Code implementation

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.XmlObject;
import org.openxmlformats.schemas.drawingml.x2006.main.CTGraphicalObject;
import org.openxmlformats.schemas.drawingml.x2006.picture.CTPicture;
import org.openxmlformats.schemas.drawingml.x2006.wordprocessingDrawing.CTInline;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTDrawing;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTText;

public class AnalyzeDocx {

    public static void main(String[] args) throws Exception {
        String content = analyzeDocx("e://abc.docx");
        System.out.println(content);
    }

    /** Walks the document in order and returns its text and images as an HTML fragment. */
    public static String analyzeDocx(String path) throws Exception {
        StringBuilder sb = new StringBuilder();
        try (InputStream in = new FileInputStream(path); XWPFDocument xwpfDocument = new XWPFDocument(in)) {
            List<XWPFParagraph> paragraphs = xwpfDocument.getParagraphs();
            for (XWPFParagraph xwpfParagraph : paragraphs) {
                List<XWPFRun> runs = xwpfParagraph.getRuns();
                for (XWPFRun xwpfRun : runs) {
                    CTR ctr = xwpfRun.getCTR();
                    if (ctr.xmlText().contains("w:type=\"textWrapping\"")) {
                        sb.append("<br>"); // line break inside a paragraph
                        continue;
                    }
                    // Visit the run's child elements in document order.
                    XmlCursor newCursor = ctr.newCursor();
                    newCursor.selectPath("./*");
                    while (newCursor.toNextSelection()) {
                        XmlObject object = newCursor.getObject();
                        if (object instanceof CTText) { // text
                            CTText ctText = (CTText) object;
                            // Skips any text element that has xml:space set.
                            if (ctText.isSetSpace()) {
                                continue; // hyperlinks are not supported yet
                            }
                            String text = ctText.getStringValue();
                            if (text != null && text.length() > 0) {
                                sb.append(text);
                            }
                        } else if (object instanceof CTDrawing) { // image
                            CTDrawing drawing = (CTDrawing) object;
                            CTInline[] inlineArray = drawing.getInlineArray();
                            for (CTInline ctInline : inlineArray) {
                                CTGraphicalObject graphic = ctInline.getGraphic();
                                XmlCursor newCursor2 = graphic.getGraphicData().newCursor();
                                newCursor2.selectPath("./*");
                                while (newCursor2.toNextSelection()) {
                                    XmlObject object2 = newCursor2.getObject();
                                    if (object2 instanceof CTPicture) {
                                        CTPicture picture = (CTPicture) object2;
                                        // The blip's embed attribute is the relationship ID of the image part.
                                        sb.append("<br>")
                                                .append(imgHtml(xwpfDocument, picture.getBlipFill().getBlip().getEmbed()))
                                                .append("<br>");
                                    }
                                }
                            }
                        }
                    }
                }
                sb.append("<br>"); // paragraph break
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return sb.toString();
    }

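    /** Saves the embedded image identified by blipID to disk and returns an img tag for it. */
    private static String imgHtml(XWPFDocument xwpfDocument, String blipID) {
        XWPFPictureData pictureData = xwpfDocument.getPictureDataByID(blipID);
        String imageName = pictureData.getFileName();
        // Prefix the name with a timestamp to make collisions less likely.
        String newfilename = System.currentTimeMillis() + imageName;
        byte[] bytev = pictureData.getData();
        try (FileOutputStream fos = new FileOutputStream("E:/" + newfilename)) {
            // After saving the image here, expose it over HTTP and wrap that URL in an <img> tag.
            fos.write(bytev);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Fixed sample URL; a real deployment would return the URL of the file saved above.
        return "<img src='/rongmeitiapi/api/picture/find/image/20181107/d66ce5ffc18365a3dab1e46c484dfabb.jpeg'>";
    }

}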
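The code above appends run text to the HTML unchanged (`sb.append(text)`), so a document containing characters such as `<` or `&` would produce markup the editor may misread. One possible remedy is to escape the text first. The sketch below is an assumption rather than part of the original post, and the class name `HtmlText` is hypothetical.

```java
class HtmlText {
    /** Replaces the five characters that are significant in HTML text and attribute values. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: if a &lt; b &amp;&amp; b &gt; c
        System.out.println(escape("if a < b && b > c"));
    }
}
```

With such a helper, the text branch would call `sb.append(HtmlText.escape(text))`, while the `<br>` and `<img>` fragments the code generates itself stay unescaped.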
