Gnash 0.8.7: XML parsing

Gnash 0.8.7 comes with memory optimized XML parsing and much improved compatibility.

The XML and XMLNode classes in ActionScript 2 enable parsing and handling of XML trees. Mostly they are used for configuration data or bits of dynamically loaded content. But sometimes - one example is openstreetmap.org's "potlatch" editor - the XML can have thousands of nodes.

Even a simple XML tree with so many nodes has significant memory requirements. We have to store not only the node type, node value and, if appropriate, the node content and attributes, but also links to parent and child nodes.

ActionScript represents these as XMLNode objects. Translating the entire XML tree into XMLNode objects creates even greater memory requirements: ActionScript objects have their own properties and a heap of internal information necessary for execution. Objects are also handled by the garbage collector, so the thousands of extra XMLNodes puts a strain on the GC.

The trick with XML parsing, then, is not to convert nodes into genuine XMLNode objects until it's unavoidable. It turns out that this isn't too difficult.

An object only needs to exist if a reference to it is alive. When you create an XML object from an XML string, you hold a reference to the top XMLNode object of the tree. The rest of the nodes aren't referenced, so at that point there is no need for them to exist as objects.

You can then access further nodes through properties of any referenced node, starting from the top and working downwards or sideways. XMLNode has the following properties for this purpose:

  • firstChild
  • lastChild
  • nextSibling
  • previousSibling
  • childNodes

Because these are the only methods of accessing other nodes, we can hold off creating the actual object until one of those properties is invoked. We only create the XMLNode once a child or a sibling is requested.

The most expensive operation is the childNodes property: this creates an Array holding references to XMLNode objects for each child node. The result is the creation of one object for each node plus an Array object to hold them.

Note also that the childNodes array itself can be modified by adding or removing elements, but this does not change the actual XML tree!

Flashcoders should know that it's possible to keep memory requirements low even for very large XML trees, provided you don't need to access all the nodes. This does not only apply to Gnash, but also to the Adobe player: accessing all nodes of a large XML tree causes memory usage almost double what it is when the tree is only parsed or stringified.

If only one part of the tree is needed, use childNodes sparingly! The most efficient way to search through an XML tree is to start at firstChild() and iterate with nextSibling() until you find the node you want. This avoids creating objects for nodes you don't need.

Use hasChildNodes() to check whether you need to access childNodes: if a node has no children, the Array object never has to be created.

Note also that it is possible to stringify an entire tree without constructing any further XMLNode objects at all. The XML.toString() function works without creating an XMLNode.

Bear in mind that these improvements only apply for XML trees parsed from text, and are only effective for large XML trees.