Generic XML Parser Plugin

Plugin source location: <serge_root>/lib/Serge/Engine/Plugin/parse_xml.pm

This plugin parses arbitrary XML data structures. Its configuration uses regular expressions to match translatable nodes in the XML DOM tree, and to identify nodes whose content is HTML and must additionally be parsed with the parse_php_xhtml parser.

If XML format validation fails, the plugin can send an error report to the specified recipients. If no email settings are provided, it simply reports the error in the console output.

Code Examples

products.xml
<products>
    <description>Product list</description>
    <items>
        <item sku="P001">
            <price>1.23</price>
            <title>First Product</title>
            <description><![CDATA[ <p>First Product Description</p> ]]></description>
        </item>
        <item sku="P002">
            <price>2.34</price>
            <title>Second Product</title>
            <description><![CDATA[ <p>Second Product Description</p> ]]></description>
        </item>
    </items>
</products>

See the example configuration file below to learn why only the title nodes and certain description nodes are extracted for translation.

Node Paths

When an XML document is parsed, each node is assigned a path, and the path_matches and path_doesnt_match regular expressions are matched against this path (see the example configuration file). Given the example XML file above, the paths are constructed as follows:

products.xml (internal 'path => value' representation)
products/description => Product list
products/items/item[0]/sku => P001
products/items/item[0]/price => 1.23
products/items/item[0]/title => First Product
products/items/item[0]/description => First Product Description
products/items/item[1]/sku => P002
products/items/item[1]/price => 2.34
products/items/item[1]/title => Second Product
products/items/item[1]/description => Second Product Description
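The path scheme shown above can be approximated with a short Python sketch. This is not the plugin's own code (the plugin is written in Perl). The function name node_paths, the abridged SAMPLE document, and the rule that an index such as [0] is appended only when several sibling elements share a tag name are all assumptions inferred from the example listing.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Abridged version of products.xml (item descriptions omitted for brevity).
SAMPLE = """
<products>
    <description>Product list</description>
    <items>
        <item sku="P001"><price>1.23</price><title>First Product</title></item>
        <item sku="P002"><price>2.34</price><title>Second Product</title></item>
    </items>
</products>
"""

def node_paths(xml_text):
    """Return a list of (path, value) pairs for attributes and leaf elements."""
    root = ET.fromstring(xml_text)
    result = []

    def walk(elem, path):
        # Attributes are listed first, as child segments of the element path.
        for name, value in elem.attrib.items():
            result.append((f"{path}/{name}", value))

        children = list(elem)
        if not children:
            # Leaf element: record its text content, if any.
            text = (elem.text or "").strip()
            if text:
                result.append((path, text))
            return

        # Assumption: an index is added only when siblings share a tag name.
        totals = Counter(child.tag for child in children)
        seen = Counter()
        for child in children:
            if totals[child.tag] > 1:
                segment = f"{child.tag}[{seen[child.tag]}]"
            else:
                segment = child.tag
            seen[child.tag] += 1
            walk(child, f"{path}/{segment}")

    walk(root, root.tag)
    return result

for path, value in node_paths(SAMPLE):
    print(f"{path} => {value}")
```

Walking the tree depth-first keeps the pairs in document order, which is the order used in the listing above.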

The constructed path to a node is also extracted as a hint along with the corresponding string.
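To illustrate how path_matches and path_doesnt_match might select nodes, here is a minimal Python sketch. The patterns below are hypothetical and are not taken from the example configuration file. The sketch also assumes that a node is translatable when its path matches at least one path_matches expression and none of the path_doesnt_match expressions. Python's re module stands in for the Perl regular expressions the plugin uses.

```python
import re

# Paths from the products.xml example.
PATHS = [
    "products/description",
    "products/items/item[0]/sku",
    "products/items/item[0]/price",
    "products/items/item[0]/title",
    "products/items/item[0]/description",
    "products/items/item[1]/sku",
    "products/items/item[1]/price",
    "products/items/item[1]/title",
    "products/items/item[1]/description",
]

# Hypothetical patterns, chosen only for illustration.
PATH_MATCHES = [r"/title$", r"/description$"]
PATH_DOESNT_MATCH = [r"^products/description$"]

def is_translatable(path, path_matches, path_doesnt_match=()):
    """Assumed rule: match any inclusion pattern and no exclusion pattern."""
    included = any(re.search(p, path) for p in path_matches)
    excluded = any(re.search(p, path) for p in path_doesnt_match)
    return included and not excluded

selected = [p for p in PATHS
            if is_translatable(p, PATH_MATCHES, PATH_DOESNT_MATCH)]

for path in selected:
    print(path)
```

With these patterns, the title and description of each item are selected, while the top-level products/description is excluded along with the sku and price values.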

Usage

example-project.serge