How to validate XML files or check for well-formedness

Instructional HowTos, posted by users. No questions here please.

rsperberg
Posts: 35
Joined: Thu Jul 29, 2004 2:26 pm
Location: NJ

Post by rsperberg »

There are many XML parsers, but RXP is a small, fast one. You can obtain a Windows binary of it from ftp://ftp.cogsci.ed.ac.uk/pub/richard/rxp.exe. (A very brief descriptive page, with links to the source and a Unix man page, is at http://www.cogsci.ed.ac.uk/~richard/rxp.html.)

I find it easiest to put programs that I call from within TextPad in a folder directly off the root, with no spaces in the folder name. So, for instance, you might install RXP in:

C:\programs\rxp.exe

(or C:\programs\rxp\rxp.exe or whatever)

I set up two tools in TextPad, one to check well-formedness and one to validate the XML. Each tool calls RXP; only the parameters supplied are different.

In TextPad, you will need to create a new tool (I won't repeat the basics here). I call my first tool "Well-formed check".

The settings in Preferences should be:
Command: C:\programs\rxp.exe
Parameters: -avsN $Filename
Initial folder: $FileDir
(Obviously the Command setting should match the exact path where you placed the RXP executable. You can fiddle with the parameters too, of course.)

Check "Capture output" (and other options you may prefer, such as "Save all documents first")

The regular expression that works for me ("Use POSIX regular expression syntax" is checked in Preferences/Editor) is this:
Regular expression to match output
^.+line ([0-9]+) char ([0-9]+) of file:///([A-Z]:.+)$

Registers:
File: 3
Line: 1
Column: 2

With the XML file open in the active window, simply invoke your "Well-formed check" tool.

These settings cause RXP to check that the XML file is well-formed and report the result in the Command Results window.
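
For anyone who wants to see what the output-matching expression and its three registers pick out, here is a minimal Python sketch. It is only an illustration: Python's `re` module is not TextPad's regex engine (it just happens to read this unescaped-parenthesis pattern the same way), and the sample message, including the path `C:/docs/test.xml`, is made up to look like the "line N char N of file:///..." text the expression expects.

```python
import re

# The expression from the tool settings, unchanged.
OUTPUT_PATTERN = re.compile(
    r"^.+line ([0-9]+) char ([0-9]+) of file:///([A-Z]:.+)$"
)

# Hypothetical message line, shaped the way the expression expects.
SAMPLE = " in unnamed entity at line 3 char 7 of file:///C:/docs/test.xml"


def parse_location(message):
    """Return (file, line, column) from a message line, or None if no match."""
    match = OUTPUT_PATTERN.match(message)
    if match is None:
        return None
    # Same register assignment as in TextPad: File = 3, Line = 1, Column = 2.
    return match.group(3), int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(parse_location(SAMPLE))
```

Lines that carry no location (for example the first line of an error message) simply do not match, which is why only the line with the location on it can be double-clicked.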

The second tool, called "XML validation", is a near duplicate of "Well-formed check". The only difference is the Parameters line, which becomes "-avVNs $Filename". With these settings, RXP validates the XML file against the DTD identified in the document type declaration.
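
If you ever want to run the same two checks outside TextPad, the two tool definitions boil down to one command with two different flag strings. The following is a rough Python sketch under some assumptions: the install path is the example path from this post, the flag strings are copied verbatim from the two Parameters lines, and the helper names `rxp_command` and `run_rxp` are invented for the illustration.

```python
import subprocess

# Assumed install location (the example path used in this post).
RXP_EXE = r"C:\programs\rxp.exe"

# Flag strings copied from the two TextPad tools.
WELL_FORMED_FLAGS = "-avsN"
VALIDATION_FLAGS = "-avVNs"


def rxp_command(xml_path, validate=False):
    """Build the argument list for one of the two checks."""
    flags = VALIDATION_FLAGS if validate else WELL_FORMED_FLAGS
    return [RXP_EXE, flags, xml_path]


def run_rxp(xml_path, validate=False):
    """Run RXP and return the completed process with its captured output."""
    return subprocess.run(
        rxp_command(xml_path, validate),
        capture_output=True,
        text=True,
    )
```

Calling `run_rxp("mydoc.xml", validate=True)` would correspond to the "XML validation" tool; the captured `stdout` and `stderr` stand in for what TextPad shows in the Command Results window.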

If there is a problem in the file, you should be able to double-click on the line specifying the error to have TextPad jump automatically to the right spot in your file.

Roger Sperberg

Thanks to Wendell Piez of Mulberry Technology for first pointing me to RXP and how to use it with TextPad.
Last edited by rsperberg on Sat Dec 30, 2006 9:14 pm, edited 1 time in total.
Bob Hansen
Posts: 1517
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

Thanks for the HOW TO tip Roger..... the toolbox gets better every day!
Hope this was helpful.............good luck,
Bob
JohnRudman
Posts: 3
Joined: Tue Nov 08, 2011 5:15 pm

Post by JohnRudman »

This works well, but be careful when copying the regular expression to match output: the parentheses need backslashes:
^.+line \([0-9]+\) char \([0-9]+\) of file:///\([A-Z]:.+\)$
ben_josephs
Posts: 2456
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Not if you're using "Posix" regular expression syntax:
Configure | Preferences | Editor

[X] Use POSIX regular expression syntax
which is required to maintain your sanity.
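
A small Python sketch of why the two spellings are not interchangeable. This is an analogy only: Python's `re` is not TextPad's engine, but like the POSIX setting it treats a bare `(` as a group and `\(` as a literal parenthesis, so it shows what goes wrong when the backslashed form is pasted under the wrong setting. The sample line and its path are made up.

```python
import re

# Expression as posted originally (bare parentheses form groups).
UNESCAPED = r"^.+line ([0-9]+) char ([0-9]+) of file:///([A-Z]:.+)$"

# Backslashed variant; here the parentheses are read as literal characters.
BACKSLASHED = r"^.+line \([0-9]+\) char \([0-9]+\) of file:///\([A-Z]:.+\)$"

# Hypothetical message line for the comparison.
SAMPLE = " in unnamed entity at line 3 char 7 of file:///C:/docs/test.xml"

unescaped_match = re.match(UNESCAPED, SAMPLE)
backslashed_match = re.match(BACKSLASHED, SAMPLE)

if __name__ == "__main__":
    print("groups in unescaped form:  ", re.compile(UNESCAPED).groups)
    print("groups in backslashed form:", re.compile(BACKSLASHED).groups)
    print("unescaped form matches:    ", unescaped_match is not None)
    print("backslashed form matches:  ", backslashed_match is not None)
```

So whichever form you copy, it has to agree with the syntax setting, otherwise the registers have nothing to capture.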
JohnRudman
Posts: 3
Joined: Tue Nov 08, 2011 5:15 pm

Post by JohnRudman »

Too late for the sanity thing, but good point. Users will now have a choice!