Migrating from HTML

Jira Code: ESMILES-34 (Related)

Migrating a document from HTML to XML may be easy or difficult, depending on how the HTML has been written. Following these four steps will account for 95% of the changes that are required.

1. The first, and probably most painful, step is to ensure that all tags are closed and all attributes are quoted, so that the document meets the XML specification. Elements that don’t officially require closure, like TD, LI and P, as well as tags with no content like BR and IMG, are likely to be the chief cause of problems. We find the insistence on quoting even unambiguous attributes annoying, and we’re pleased to see that at least one XML parser (the one supplied with Resin 2.0) has the option of being lax about this requirement.

2. Second, if any non-CSS legacy attributes are used (e.g. “bgcolor” to set the background color), these should be converted to their CSS equivalent (e.g. “background-color”) and placed in a stylesheet in the head of the document (either embedded or external). Versions of the Report Generator since 1.0.11 recognise the style="background-color:red" method of defining attributes, although we still recommend the XML equivalent of background-color="red".

3. Third, check the document for inline images, tables, lists or other blocks inside paragraphs. We’ve found this to be a common occurrence, since HTML does not require paragraphs to be closed.

4. Fourth and finally, change the tags that have a different syntax. These are:
   • TABLE: the HTML attributes “border” and “cellspacing” should be renamed “cellborder” and “cellmargin”.
   • The legacy FONT element should be replaced with an equivalent SPAN.
   • The various different styles of paragraph and span available in HTML (ADDRESS, CITE etc.) should be replaced with a P or SPAN, setting the “class” attribute to control the style.
   • Definition lists using the DL, DT and DD elements aren’t supported, and should be replaced with either a normal UL list with the “value” attribute set to the definition, or a TABLE.
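
The steps above lend themselves to a quick pre-flight check before a document is handed to the Report Generator. The following is a minimal sketch in Python, using only the standard library; it is not part of the Report Generator. The function and class names (check_well_formed, LegacyMarkupScan, find_legacy_markup) are hypothetical, and the lookup tables list only the attributes and elements named in the steps, so extend them for your own documents. Step 3 (blocks nested inside paragraphs) is not covered.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Illustrative tables: the names come from the steps above, the advice is paraphrased.
LEGACY_ATTRIBUTES = {"bgcolor": "use the CSS property background-color"}  # step 2
TABLE_RENAMES = {"border": "cellborder"}                                  # step 4
REPLACED_ELEMENTS = {                                                     # step 4
    "font": "replace with SPAN",
    "address": "replace with P or SPAN plus a class",
    "cite": "replace with P or SPAN plus a class",
    "dl": "replace with UL or TABLE",
    "dt": "replace with UL or TABLE",
    "dd": "replace with UL or TABLE",
}


def check_well_formed(markup):
    """Step 1: return None if markup parses as XML, else (line, column, message)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


class LegacyMarkupScan(HTMLParser):
    """Steps 2 and 4: record (line, message) pairs for markup that needs changing."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        line, _offset = self.getpos()
        if tag in REPLACED_ELEMENTS:
            self.findings.append((line, f"<{tag}>: {REPLACED_ELEMENTS[tag]}"))
        for name, _value in attrs:
            if name in LEGACY_ATTRIBUTES:
                self.findings.append(
                    (line, f"{name} on <{tag}>: {LEGACY_ATTRIBUTES[name]}"))
            if tag == "table" and name in TABLE_RENAMES:
                self.findings.append(
                    (line, f"{name} on <table>: rename to {TABLE_RENAMES[name]}"))


def find_legacy_markup(markup):
    """Run the scan over a whole document and return its findings."""
    scan = LegacyMarkupScan()
    scan.feed(markup)
    scan.close()
    return scan.findings
```

Once step 1 is complete, check_well_formed should return None for the document text; until then it reports where the first unclosed tag or unquoted attribute was found. find_legacy_markup works on the original HTML as well, since html.parser tolerates unclosed tags, so it can be run before any edits are made.
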
Provided that no JavaScript, forms or frames are used, these steps should result in a report that is legible and ready to be tailored for its eventual destination as a PDF document.
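
Whether a page meets that proviso can be checked mechanically. The sketch below, again standard-library Python with hypothetical names (UnsupportedFeatureScan, unsupported_features), reports which of the three features appear in a document. Treating IFRAME as a frame and any attribute beginning with “on” as an inline script handler are assumptions of this sketch, not rules taken from the Report Generator.

```python
from html.parser import HTMLParser

# Elements taken to signal JavaScript, forms or frames (an assumption of this sketch).
UNSUPPORTED_ELEMENTS = {
    "script": "JavaScript",
    "form": "forms",
    "frameset": "frames",
    "frame": "frames",
    "iframe": "frames",
}


class UnsupportedFeatureScan(HTMLParser):
    """Collect the names of features that the migration steps do not address."""

    def __init__(self):
        super().__init__()
        self.features = set()

    def handle_starttag(self, tag, attrs):
        if tag in UNSUPPORTED_ELEMENTS:
            self.features.add(UNSUPPORTED_ELEMENTS[tag])
        # Heuristic: inline handlers such as onclick="..." count as JavaScript.
        if any(name.startswith("on") for name, _value in attrs):
            self.features.add("JavaScript")


def unsupported_features(markup):
    """Return a sorted list of the features found in the markup."""
    scan = UnsupportedFeatureScan()
    scan.feed(markup)
    scan.close()
    return sorted(scan.features)
```

An empty list means none of the three features was detected; anything else names what needs attention before the four steps can be expected to produce a usable report.
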
