Escaping JavaScript in QueryPath

Jun 1 2009

Sometimes the HTML you parse with QueryPath will contain JavaScript or other embedded scripting languages. And sometimes such scripts will contain characters that the XML parser might misinterpret as XML or HTML structures.

There are two ways to escape such content. Both are standard practice, and both are often used whether or not you are working with QueryPath.

The first method, which is preferred when working with HTML, is to enclose any scripts inside of HTML comments:

<html>
<head>
< script>
<!--
// Script goes here
-->
< /script>
</head>
<body></body>
</html>

(Extra spacing has been added in the example above to keep the tags from being stripped by this blog's formatter. Those spaces should not be present in your code.)

The comment enclosure will prevent the HTML parser from parsing the contents of the script.
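To see the effect outside of QueryPath, here is a minimal sketch that uses Python's xml.dom.minidom as a stand-in for a strict XML parser. It is not QueryPath's API, and the script string is invented for illustration. One caveat under this assumption: XML comments may not contain a double hyphen, so a script that uses something like i-- would still fail inside a comment.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical script body containing characters an XML parser trips over.
SCRIPT = 'if (a < b && b > 0) { document.write("<b>hi</b>"); }'


def parses(markup):
    """Return True if the markup is accepted by the XML parser."""
    try:
        minidom.parseString(markup)
        return True
    except ExpatError:
        return False


# The same document, with and without the comment wrapper.
raw = "<html><head><script>" + SCRIPT + "</script></head><body></body></html>"
commented = (
    "<html><head><script><!--\n" + SCRIPT + "\n--></script></head>"
    "<body></body></html>"
)

raw_ok = parses(raw)              # the bare script breaks the parse
commented_ok = parses(commented)  # the commented script does not

# The script survives intact as the data of a comment node.
doc = minidom.parseString(commented)
comment = doc.getElementsByTagName("script")[0].firstChild
is_comment = comment.nodeType == comment.COMMENT_NODE
recovered = comment.data.strip()
```

The parser hands the script back as a single comment node, so the < and && characters never reach the markup tokenizer.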

In other cases, XML CDATA sections may be a better fit for your needs:

<html>
<head>
< script>
<![CDATA[
// Script goes here
]]>
< /script>
</head>
<body></body>
</html>

CDATA sections remain available in the parsed DOM, but the parser treats their contents as literal text rather than as markup. It is therefore safe to embed JavaScript, as well as XML- or HTML-like tags, inside one.
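As a rough illustration of that behavior, the sketch below again uses Python's xml.dom.minidom in place of QueryPath (an assumption made only to keep the example self-contained and runnable; the script string is made up). It checks that the CDATA section appears in the DOM as its own node and that the tag-like text inside it is not turned into elements.

```python
from xml.dom import minidom

# Hypothetical script body with markup-like content inside it.
SCRIPT = 'if (a < b && b > 0) { document.write("<b>hi</b>"); }'

markup = (
    "<html><head><script><![CDATA[\n" + SCRIPT + "\n]]></script></head>"
    "<body></body></html>"
)

doc = minidom.parseString(markup)
script = doc.getElementsByTagName("script")[0]

# The CDATA section is exposed as a node of its own in the DOM.
cdata = script.firstChild
is_cdata = cdata.nodeType == cdata.CDATA_SECTION_NODE

# Its contents come back as literal text, untouched by the parser.
script_text = cdata.data.strip()

# The "<b>" inside the script was never parsed into an element.
b_elements = doc.getElementsByTagName("b")
```

Because the contents are kept as plain character data, the script can be read back out of the DOM exactly as it was written.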

With these two strategies, you should have the tools necessary to prevent embedded scripts from causing QueryPath parse errors.