I am writing an application that has to read and interpret data stored in PDF files. The reading part is done: I am able to get a dump of the words on a page, but not the formatting of those words. For example, if I have to extract a table, I get the numbers in the table but not the markup that defines the table.
Further, some formatting is used that displays a few of these numbers within parentheses (meaning the numbers are negative), but the parentheses are not part of the extracted text. Hence, I am not able to distinguish between the positive and negative numbers in the PDF table!
How do I get the PDF markup along with the text? Is PDF similar in structure to XML, with tags used to mark up tables etc.? If not, is there a resource that describes the salient features of the PDF DOM?
I am using VBA and the Acrobat library (AcroExch etc.).
There is no such thing as "PDF markup" in the sense of HTML etc. A table in a PDF cannot be distinguished from line art, other than by using OCR, which can be error-prone if the layout is complex. The table is drawn using geometrical shapes, as in a vector-based graphics program.
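Since the table structure is not stored anywhere, one common workaround is to rebuild rows from where the words sit on the page. The following is only a rough sketch in Python (not VBA), and it assumes you can obtain a position for each word, for instance from the word quads that Acrobat's JavaScript API exposes through `getPageNthWordQuads`. The `Word` class, the `group_into_rows` function, and the `y_tolerance` value are hypothetical names made up for this illustration, and the sample coordinates are invented.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical container for one extracted word and its page position."""
    text: str
    x: float  # left edge of the word, in PDF points
    y: float  # vertical position of the word, in PDF points


def group_into_rows(words, y_tolerance=2.0):
    """Group words into rows by vertical position, then order each row by x.

    Words whose y differs from the first word of the current row by at most
    y_tolerance are treated as being on the same line. The tolerance is an
    assumption and will need tuning for a real document.
    """
    rows = []
    # PDF user space has y growing upward, so sort descending to go top-down.
    for word in sorted(words, key=lambda w: -w.y):
        if rows and abs(rows[-1][0].y - word.y) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Within each row, read left to right.
    return [[w.text for w in sorted(row, key=lambda w: w.x)] for row in rows]


# Invented sample data standing in for words pulled from a two-row table.
sample_words = [
    Word("1200", 200.0, 700.5),
    Word("Loss", 50.0, 680.0),
    Word("Revenue", 50.0, 700.0),
    Word("350", 200.0, 679.8),
]
table = group_into_rows(sample_words)
```

This only recovers the row and column order of the text. It does not bring back anything that was never in the extracted text, such as parentheses that the word dump leaves out, and it will likely misgroup words in layouts with wrapped cells or uneven baselines.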