xslt - Convert DOCX to a custom XML


I have been trying to convert DOCX files to an XML format that I have custom-made. Our users want their data converted to XML for easier content querying in the web app, and they want to supply their input as DOCX.

I have tried looking for a converter API in Java, but none seems to fit my requirement. I have looked at docx4j, but realized that it converts to HTML and PDF. I am wondering whether there exists a converter API that can take as input, say, an intermediate translator (XSLT) and output a custom XML with the complete data from the DOCX.

Is there an existing tool for this? If there is none, do you have any suggestions on the approach I would have to take in coding my own converter, e.g. work from the OpenXML, or convert to XSL-FO first before the custom XML?

I would love to hear from the community.

Thank you very much.

docx4j can be used to convert OpenXML to arbitrary XML via XSLT.

Assuming a javax.xml.transform.Templates object named xslt and a javax.xml.transform.stream.StreamResult named result, you'd do something like this:

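        // Classes used: org.docx4j.XmlUtils,
        // org.docx4j.openpackaging.packages.WordprocessingMLPackage,
        // org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));
        MainDocumentPart mdp = wordMLPackage.getMainDocumentPart();

        // DOM document to use as input to the transform
        org.w3c.dom.Document doc = XmlUtils.marshaltoW3CDomDocument(
                mdp.getJaxbElement() );

        // Apply the compiled stylesheet; null means no transform parameters
        XmlUtils.transform(doc, xslt, null, result);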
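For illustration, here is one way the xslt and result objects assumed above might be set up, using only the JDK. This is a minimal sketch, not part of docx4j: the class name, the stylesheet, and the target element names (article, para) are hypothetical, and it runs against an inline WordprocessingML fragment rather than a real DOCX so that it is self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CustomXmlStylesheetSketch {

    // Namespace of the main WordprocessingML vocabulary (w:document, w:p, w:r, w:t).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical stylesheet: every body paragraph (w:p) becomes a <para>
    // holding the concatenated text of its w:t descendants.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:w='" + W_NS + "'"
        + " exclude-result-prefixes='w'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<article><xsl:apply-templates select='//w:body/w:p'/></article>"
        + "</xsl:template>"
        + "<xsl:template match='w:p'>"
        + "<para><xsl:for-each select='.//w:t'><xsl:value-of select='.'/></xsl:for-each></para>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compiles the stylesheet once; a Templates object can be reused across documents.
    public static Templates compile() throws Exception {
        return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSLT)));
    }

    // Transforms a WordprocessingML string into the custom XML and returns it as text.
    public static String transform(String wordML) throws Exception {
        StringWriter out = new StringWriter();
        StreamResult result = new StreamResult(out);
        compile().newTransformer()
                 .transform(new StreamSource(new StringReader(wordML)), result);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Inline stand-in for the main document part of a DOCX.
        String sample = "<w:document xmlns:w='" + W_NS + "'><w:body>"
            + "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
            + "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
            + "</w:body></w:document>";
        System.out.println(transform(sample));
    }
}
```

With docx4j, the Templates returned by compile() would take the place of xslt, and a StreamResult wrapping a file or writer would take the place of result. A real stylesheet would need further templates for tables, headings, lists and so on, depending on what the custom XML has to capture.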

However, if all you want to do is transform the XML, docx4j (and Apache POI for that matter) is overkill. You could just use OpenXML4J directly.
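To show how little is needed when the goal is only to transform the XML, here is a rough sketch that skips any OpenXML library altogether. Note that it does not use OpenXML4J; it swaps in the JDK's java.util.zip, relying on the fact that a DOCX is a ZIP package. It assumes the main document part sits at the conventional path word/document.xml (strictly, the location is defined by the package relationships), and the class name, helper names and stylesheet are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DocxZipTransformSketch {

    // Assumption: the main part is stored at the conventional location.
    static final String MAIN_PART = "word/document.xml";

    // Scans the ZIP entries and returns the bytes of the main document part.
    public static byte[] readMainPart(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (MAIN_PART.equals(e.getName())) {
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    return buf.toByteArray();
                }
            }
        }
        throw new IllegalArgumentException("No " + MAIN_PART + " entry found");
    }

    // Applies the given XSLT source to the extracted part and returns the result as text.
    public static String transform(byte[] mainPartXml, String xslt) throws Exception {
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        templates.newTransformer().transform(
                new StreamSource(new ByteArrayInputStream(mainPartXml)),
                new StreamResult(out));
        return out.toString();
    }

    // Builds a tiny in-memory stand-in for a DOCX so the sketch runs without a real file.
    public static byte[] fakeDocx(String documentXml) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(MAIN_PART));
            zip.write(documentXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String wNs = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        String documentXml = "<w:document xmlns:w='" + wNs + "'><w:body>"
            + "<w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body></w:document>";
        // Hypothetical stylesheet: one <p> per paragraph, text content only.
        String xslt = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " xmlns:w='" + wNs + "' exclude-result-prefixes='w'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/'><doc><xsl:for-each select='//w:p'>"
            + "<p><xsl:value-of select='.'/></p></xsl:for-each></doc></xsl:template>"
            + "</xsl:stylesheet>";
        byte[] part = readMainPart(new ByteArrayInputStream(fakeDocx(documentXml)));
        System.out.println(transform(part, xslt));
    }
}
```

For a real file, readMainPart would be given a java.io.FileInputStream on the DOCX instead of the in-memory stand-in. A packaging library such as OpenXML4J becomes worthwhile once you need to follow relationships to other parts (headers, footnotes, images) rather than just the main document.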

Whether conversion via XSLT is the best approach, though, depends on whether your target XML is document-oriented or data-oriented.

If it is document-oriented, XSLT is a good approach.

If it is data-oriented, you might want to consider content control data-binding. (There is another approach, called customXml, but the i4i patent farce may make that approach inadvisable if you are relying on Word for editing.)

