Parsing .docx with Apache POI in Java to generate HTML
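Our business required taking news articles (text + images) already edited in Word documents, entering them into the CMS admin backend, and publishing each one as a news post. Images can't be copied and pasted straight into the UEditor editor, and uploading them one at a time is tedious, so we built an upload feature instead: the server parses the uploaded .docx file and returns the HTML body directly to the front-end editor for further editing. Requirements: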
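1. Images must be saved to a designated location on the server, and the front-end request URL for each image must be put into the src attribute of its img tag.
2. Images and text must stay in their original order.
3. Hyperlinks, other shapes, and similar elements that news articles generally don't use must be filtered out.
Implementation: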
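Requirement 1 boils down to two steps: write the image bytes somewhere the web server can serve them, then put the matching URL into the img tag's src. The class below is a minimal, stdlib-only sketch of that step, not the author's code. `ImageSaver`, `storageDir` and `urlPrefix` are hypothetical names, and it assumes the web server exposes `storageDir` under `urlPrefix`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class ImageSaver {
    private final Path storageDir;  // hypothetical: directory the web server exposes
    private final String urlPrefix; // hypothetical: public URL path mapped to storageDir

    public ImageSaver(Path storageDir, String urlPrefix) {
        this.storageDir = storageDir;
        this.urlPrefix = urlPrefix.endsWith("/") ? urlPrefix : urlPrefix + "/";
    }

    /** Writes the image bytes under a unique name and returns an <img> tag pointing at it. */
    public String save(byte[] data, String originalName) {
        // Keep only the extension of the name stored in the .docx (e.g. "image1.jpeg" -> ".jpeg")
        int dot = originalName.lastIndexOf('.');
        String ext = dot >= 0 ? originalName.substring(dot) : "";
        // A random UUID avoids clashes between concurrent uploads that contain the same image name
        String fileName = UUID.randomUUID().toString().replace("-", "") + ext;
        try {
            Files.createDirectories(storageDir);
            Files.write(storageDir.resolve(fileName), data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return "<img src='" + urlPrefix + fileName + "'>";
    }
}
```

In the POI code, `imgHtml` could delegate to it with `saver.save(pictureData.getData(), pictureData.getFileName())`. A UUID rather than `System.currentTimeMillis()` as the name prefix is a design choice: two uploads processed in the same millisecond that both contain `image1.jpeg` would otherwise overwrite each other.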
1. Minimal Maven dependencies. Version 3.17 supports JDK 1.6 and later; POI 4.x requires JDK 1.8 or later.
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>3.17</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.17</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml-schemas</artifactId>
<version>3.17</version>
</dependency>
2. Code
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.XmlObject;
import org.openxmlformats.schemas.drawingml.x2006.main.CTGraphicalObject;
import org.openxmlformats.schemas.drawingml.x2006.picture.CTPicture;
import org.openxmlformats.schemas.drawingml.x2006.wordprocessingDrawing.CTInline;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTDrawing;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTText;

public class AnalyzeDocx {

    public static void main(String[] args) throws Exception {
        String content = analyzeDocx("e://abc.docx");
        System.out.println(content);
    }

    /** Walks every run of every body paragraph in document order and builds the HTML body. */
    public static String analyzeDocx(String path) throws Exception {
        StringBuilder sb = new StringBuilder();
        try (InputStream in = new FileInputStream(path); XWPFDocument xwpfDocument = new XWPFDocument(in)) {
            List<XWPFParagraph> paragraphs = xwpfDocument.getParagraphs();
            for (XWPFParagraph xwpfParagraph : paragraphs) {
                List<XWPFRun> runs = xwpfParagraph.getRuns();
                for (XWPFRun xwpfRun : runs) {
                    CTR ctr = xwpfRun.getCTR();
                    if (ctr.xmlText().contains("w:type=\"textWrapping\"")) {
                        sb.append("<br>"); // line break inside a paragraph
                        continue;
                    }
                    // Visit the run's child elements (w:t, w:drawing, ...) in document order
                    XmlCursor newCursor = ctr.newCursor();
                    newCursor.selectPath("./*");
                    while (newCursor.toNextSelection()) {
                        XmlObject object = newCursor.getObject();
                        if (object instanceof CTText) { // text
                            // w:instrText is also a CTText and holds field codes such as
                            // HYPERLINK "url"; skip it, since hyperlinks are not supported yet
                            if ("instrText".equals(newCursor.getName().getLocalPart())) {
                                continue;
                            }
                            String text = ((CTText) object).getStringValue();
                            if (text != null && text.length() > 0) {
                                // Escape characters that would otherwise be parsed as HTML markup
                                sb.append(text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"));
                            }
                        } else if (object instanceof CTDrawing) { // inline picture
                            CTDrawing drawing = (CTDrawing) object;
                            CTInline[] inlineArray = drawing.getInlineArray();
                            for (CTInline ctInline : inlineArray) {
                                CTGraphicalObject graphic = ctInline.getGraphic();
                                XmlCursor newCursor2 = graphic.getGraphicData().newCursor();
                                newCursor2.selectPath("./*");
                                while (newCursor2.toNextSelection()) {
                                    XmlObject object2 = newCursor2.getObject();
                                    if (object2 instanceof CTPicture) {
                                        CTPicture picture = (CTPicture) object2;
                                        sb.append("<br>")
                                          .append(imgHtml(xwpfDocument, picture.getBlipFill().getBlip().getEmbed()))
                                          .append("<br>");
                                    }
                                }
                                newCursor2.dispose();
                            }
                        }
                    }
                    newCursor.dispose();
                }
                sb.append("<br>"); // end of paragraph
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return sb.toString();
    }

    /** Saves the picture referenced by blipID to disk and returns an <img> tag for it. */
    private static String imgHtml(XWPFDocument xwpfDocument, String blipID) {
        XWPFPictureData pictureData = xwpfDocument.getPictureDataByID(blipID);
        if (pictureData == null) {
            return ""; // no embedded picture for this ID (e.g. a linked image)
        }
        String imageName = pictureData.getFileName();
        String newfilename = System.currentTimeMillis() + imageName;
        byte[] bytev = pictureData.getData();
        try (FileOutputStream fos = new FileOutputStream("E:/" + newfilename)) {
            // Once saved, the file must be made reachable over HTTP so it can be wrapped in an <img> tag
            fos.write(bytev);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Sample address: build the real src from the URL under which newfilename is served
        return "<img src='/rongmeitiapi/api/picture/find/image/20181107/d66ce5ffc18365a3dab1e46c484dfabb.jpeg'>";
    }
}
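The POI loop keeps text and images in sequence (requirement 2) because it visits each run's child elements in document order. The same idea can be sketched without POI by streaming word/document.xml with StAX (`javax.xml.stream`). This is a swapped-in technique for illustration, not the author's method. Assumptions: the caller has already extracted word/document.xml from the .docx zip, and supplies `imgForRelId`, a hypothetical callback that turns a relationship ID (the `r:embed` value) into an img tag, for example by resolving it through word/_rels/document.xml.rels and saving the image.

```java
import java.io.InputStream;
import java.util.function.Function;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DocumentXmlWalker {
    private static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    private static final String A = "http://schemas.openxmlformats.org/drawingml/2006/main";
    private static final String R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    private static final String MC = "http://schemas.openxmlformats.org/markup-compatibility/2006";

    /** Converts word/document.xml to HTML, keeping text and pictures in document order. */
    public static String toHtml(InputStream documentXml, Function<String, String> imgForRelId) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(documentXml);
            try {
                return walk(r, imgForRelId);
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed document.xml", e);
        }
    }

    private static String walk(XMLStreamReader r, Function<String, String> imgForRelId)
            throws XMLStreamException {
        StringBuilder sb = new StringBuilder();
        boolean inText = false; // true while inside <w:t>
        int skipDepth = 0;      // > 0 while inside an ignored subtree
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (skipDepth > 0 || isIgnored(r.getNamespaceURI(), r.getLocalName())) {
                    skipDepth++;
                } else if (is(r, W, "t")) {
                    inText = true;
                } else if (is(r, W, "br")) {
                    sb.append("<br>"); // line break inside a paragraph
                } else if (is(r, A, "blip")) {
                    String relId = r.getAttributeValue(R, "embed");
                    if (relId != null) {
                        sb.append("<br>").append(imgForRelId.apply(relId)).append("<br>");
                    }
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (skipDepth > 0) {
                    skipDepth--;
                } else if (is(r, W, "t")) {
                    inText = false;
                } else if (is(r, W, "p")) {
                    sb.append("<br>"); // end of paragraph
                }
            } else if (event == XMLStreamConstants.CHARACTERS && inText) {
                sb.append(escape(r.getText()));
            }
        }
        return sb.toString();
    }

    // Field codes (w:instrText, e.g. HYPERLINK "..."), text-box contents and the
    // legacy fallback branch of mc:AlternateContent are dropped entirely.
    private static boolean isIgnored(String ns, String name) {
        return (W.equals(ns) && ("instrText".equals(name) || "txbxContent".equals(name)))
                || (MC.equals(ns) && "Fallback".equals(name));
    }

    private static boolean is(XMLStreamReader r, String ns, String name) {
        return ns.equals(r.getNamespaceURI()) && name.equals(r.getLocalName());
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

A caller could open the archive with `java.util.zip.ZipFile` and pass `zip.getInputStream(zip.getEntry("word/document.xml"))`. Unlike the POI loop, this sketch also emits text from table cells, treats floating (anchored) pictures like inline ones, and turns every `w:br` into `<br>` regardless of its type.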