PHP/XHTML Static Content Parser Plugin

Plugin source location: <serge_root>/lib/Serge/Engine/Plugin/parse_php_xhtml.pm

This plugin is used to parse [X]HTML/XML documents, including documents with embedded PHP and JavaScript code.

The plugin performs full XML parsing and validation, so the document must be a well-formed XHTML/XML document with balanced tags. Embedded PHP blocks are replaced with special plain-text markers or HTML attributes so that the resulting HTML is valid. If validation fails, the plugin can send an error report to the specified recipients. If no email settings are provided, it simply reports the error in the console output.
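Because unbalanced tags cause validation to fail, it can help to check a template for well-formedness before handing it to the plugin. The following is a minimal sketch in Python, not the plugin's own code: the function name check_well_formed is hypothetical, and the sketch does not perform the PHP marker substitution described above, so it is only meaningful for markup without PHP inside tags or attributes.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup):
    """Return None if the markup parses as XML, otherwise the parser's error text."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None


# Every tag is closed, including the self-closing <br />.
balanced = "<div><p>Hello</p><br /></div>"
# <br> and <p> are never closed, so this is not well-formed XML.
unbalanced = "<div><p>Hello<br></div>"

print(check_well_formed(balanced))    # prints None
print(check_well_formed(unbalanced))  # prints a parser error message
```

The second call fails for the same general reason the plugin would reject the document: an HTML-style void tag such as <br> is not balanced from an XML parser's point of view.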

By default, the following contents are extracted:

  1. Inner HTML of these tags: <h1>...<h7>, <p>, <li>, <dt>, <dd>, <label>, and <option>;
  2. Values of the following attributes: alt, title;
  3. value attribute of the <input> tag, whose type attribute is one of the following: text, search, email, submit, reset, or button;
  4. placeholder attribute for <input> and <textarea> tags;
  5. strings inside _(''), __(''), and ___('') wrapper functions (which can be used in embedded PHP or JavaScript). Strings can be single- or double-quoted.
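To make rule 5 concrete, here is a rough Python approximation of matching wrapper-function calls. It is an illustration only: the regular expression and the name find_wrapped_strings are assumptions, the plugin's actual pattern may differ, and details such as escaped quotes are handled here in one plausible way.

```python
import re

# Assumed pattern: one to three underscores that are not part of a longer
# identifier, followed by a single- or double-quoted literal in parentheses.
WRAPPER_RE = re.compile(
    r"(?<![A-Za-z0-9_])_{1,3}\(\s*"  # _( , __( or ___(
    r"(?:'((?:[^'\\]|\\.)*)'"         # single-quoted literal
    r'|"((?:[^"\\]|\\.)*)")'          # or double-quoted literal
    r"\s*\)"
)


def find_wrapped_strings(source):
    """Return the literals found inside _(), __() and ___() calls, in order."""
    found = []
    for match in WRAPPER_RE.finditer(source):
        single, double = match.group(1), match.group(2)
        found.append(single if single is not None else double)
    return found


sample = """<?php echo "plain"; echo _("one"); echo __('two'); ?>
<script>alert(___('three'));</script>"""

print(find_wrapped_strings(sample))  # ['one', 'two', 'three']
```

Note that the bare echo "plain" is not picked up, which matches the rule: only strings passed through a wrapper function are treated as translatable.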

If the extracted candidate string for translation consists only of a PHP include, it is ignored.

In addition to the implicit extraction rules listed above, one can add localization hints, which are special tag attributes that adjust string extraction:

  • lang="en" attribute on a tag means that the entire inner HTML of the tag needs to be extracted for translation. It also prevents this tag from being extracted as part of a parent tag's implicit extraction rule.
  • lang="" attribute means that the tag's attributes and its entire subtree should be skipped. It also prevents this tag from being extracted as part of a parent tag's implicit extraction rule.

To control segmentation (for example, to split a large paragraph into multiple separately translated sentences), one can use <span lang="en">...</span> and <div lang="en">...</div> wrappers.

Also, if either the context or the data-l10n-context attribute is present on a tag, its value is used as the context for the translatable string. Similarly, the hint or data-l10n-hint attribute can be used to specify a hint for the translatable string.
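The attribute lookup described above can be pictured with a short Python sketch. This is not the plugin's implementation: the function name l10n_metadata is hypothetical, and the order of precedence when both spellings of an attribute are present is an assumption made only for this example.

```python
import xml.etree.ElementTree as ET


def l10n_metadata(element):
    """Return (context, hint) for an element, or None where an attribute is absent."""
    # Assumption: the short attribute name wins if both spellings are present.
    context = element.get("context") or element.get("data-l10n-context")
    hint = element.get("hint") or element.get("data-l10n-hint")
    return context, hint


p = ET.fromstring(
    '<p data-l10n-context="menu" data-l10n-hint="Shown in the top bar">File</p>'
)
print(l10n_metadata(p))  # ('menu', 'Shown in the top bar')
```

The data-l10n- prefixed names are the safer choice when the markup also has to pass HTML validation, since custom data attributes are permitted on any HTML element.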

Code Examples

example.php
<p data-l10n-context="context" data-l10n-hint="hint">string</p>

<h1>string</h1>
<p>string</p>
<p lang="">string</p>
<img alt="string" title="string" src="..." />

<p>
    outer string
    <span lang="">inner string</span>
</p>

<p>
    <span lang="en">outer string</span>
    <span lang="">inner string</span>
</p>

<p>
    Click here:
    <a href="http://sample.com">http://sample.com</a>
</p>

<p>
    <span lang="en">Click here:</span>
    <a href="http://sample.com">http://sample.com</a>
</p>

<input type="search" placeholder="string">
<input type="text" value="string">

<div>string</div>
<div lang="en">string</div>

<?php
    echo "string";
    echo _("string");
    echo __('string');
    echo ___('string');
?>

<script type="text/javascript">
    alert("string");
    alert(___('string'));
</script>

Usage

example-project.serge