| 1 | .\" This manpage has been automatically generated by docbook2man |
| 2 | .\" from a DocBook document. This tool can be found at: |
| 3 | .\" <http://shell.ipoline.com/~elmert/comp/docbook2X/> |
| 4 | .\" Please send any bug reports, improvements, comments, patches, |
| 5 | .\" etc. to Steve Cheng <steve@ggi-project.org>. |
| 6 | .TH "XMLWF" "1" "24 January 2003" "" "" |
| 7 | .SH NAME |
| 8 | xmlwf \- Determines if an XML document is well-formed |
| 9 | .SH SYNOPSIS |
| 10 | |
| 11 | \fBxmlwf\fR [ \fB-s\fR] [ \fB-n\fR] [ \fB-p\fR] [ \fB-x\fR] [ \fB-e \fIencoding\fB\fR] [ \fB-w\fR] [ \fB-d \fIoutput-dir\fB\fR] [ \fB-c\fR] [ \fB-m\fR] [ \fB-r\fR] [ \fB-t\fR] [ \fB-v\fR] [ \fBfile ...\fR] |
| 12 | |
| 13 | .SH "DESCRIPTION" |
| 14 | .PP |
| 15 | \fBxmlwf\fR uses the Expat library to |
| 16 | determine if an XML document is well-formed. It is |
| 17 | non-validating. |
| 18 | .PP |
| 19 | If you do not specify any files on the command-line, and you |
| 20 | have a recent version of \fBxmlwf\fR, the |
| 21 | input file will be read from standard input. |
| 22 | .SH "WELL-FORMED DOCUMENTS" |
| 23 | .PP |
| 24 | A well-formed document must adhere to the |
| 25 | following rules: |
| 26 | .TP 0.2i |
| 27 | \(bu |
| 28 | The file should begin with an XML declaration. For instance, |
| 29 | <?xml version="1.0" standalone="yes"?>. |
| 30 | \fBNOTE:\fR |
| 31 | XML 1.0 recommends but does not require the declaration, and |
| 32 | \fBxmlwf\fR does not currently check for one. |
| 33 | .TP 0.2i |
| 34 | \(bu |
| 35 | Every start tag is either empty (<tag/>) |
| 36 | or has a corresponding end tag. |
| 37 | .TP 0.2i |
| 38 | \(bu |
| 39 | There is exactly one root element. This element must contain |
| 40 | all other elements in the document. Only comments, white |
| 41 | space, and processing instructions may come after the close |
| 42 | of the root element. |
| 43 | .TP 0.2i |
| 44 | \(bu |
| 45 | All elements nest properly. |
| 46 | .TP 0.2i |
| 47 | \(bu |
| 48 | All attribute values are enclosed in quotes (either single |
| 49 | or double). |
| 50 | .PP |
| 51 | If the document has a DTD, and it strictly complies with that |
| 52 | DTD, then the document is also considered \fBvalid\fR. |
| 53 | \fBxmlwf\fR is a non-validating parser -- |
| 54 | it does not check the DTD. However, it does support |
| 55 | external entities (see the \fB-x\fR option). |
| 56 | .SH "OPTIONS" |
| 57 | .PP |
| 58 | When an option includes an argument, you may specify the argument either |
| 59 | separately ("\fB-d\fR output") or concatenated with the |
| 60 | option ("\fB-d\fRoutput"). \fBxmlwf\fR |
| 61 | supports both. |
| 62 | .TP |
| 63 | \fB-c\fR |
| 64 | If the input file is well-formed and \fBxmlwf\fR |
| 65 | doesn't encounter any errors, the input file is simply copied to |
| 66 | the output directory unchanged. |
| 67 | This implies no namespaces (turns off \fB-n\fR) and |
| 68 | requires \fB-d\fR to specify an output directory. |
| 69 | .TP |
| 70 | \fB-d output-dir\fR |
| 71 | Specifies a directory to contain transformed |
| 72 | representations of the input files. |
| 73 | By default, \fB-d\fR outputs a canonical representation |
| 74 | (described below). |
| 75 | You can select different output formats using \fB-c\fR |
| 76 | and \fB-m\fR. |
| 77 | |
| 78 | The output filenames will |
| 79 | be exactly the same as the input filenames, or "STDIN" if the input is |
| 80 | coming from standard input. Therefore, you must make sure that the |
| 81 | output directory is not the directory that contains the input |
| 82 | file. Otherwise, \fBxmlwf\fR will truncate the |
| 83 | input file before it generates the output file (just like running |
| 84 | cat < file > file in most shells). |
| 85 | |
| 86 | Two structurally equivalent XML documents have a byte-for-byte |
| 87 | identical canonical XML representation. |
| 88 | Note that ignorable white space is considered significant and |
| 89 | is treated equivalently to data. |
| 90 | More on canonical XML can be found at |
| 91 | http://www.jclark.com/xml/canonxml.html . |
| 92 | .TP |
| 93 | \fB-e encoding\fR |
| 94 | Specifies the character encoding for the document, overriding |
| 95 | any document encoding declaration. \fBxmlwf\fR |
| 96 | supports four built-in encodings: |
| 97 | US-ASCII, |
| 98 | UTF-8, |
| 99 | UTF-16, and |
| 100 | ISO-8859-1. |
| 101 | Also see the \fB-w\fR option. |
| 102 | .TP |
| 103 | \fB-m\fR |
| 104 | Outputs an XML document that completely |
| 105 | describes the input file, including character positions. |
| 106 | Requires \fB-d\fR to specify an output directory. |
| 107 | .TP |
| 108 | \fB-n\fR |
| 109 | Turns on namespace processing. |
| 110 | \fB-c\fR disables namespaces. |
| 111 | .TP |
| 112 | \fB-p\fR |
| 113 | Tells \fBxmlwf\fR to process external DTDs and parameter |
| 114 | entities. |
| 115 | |
| 116 | Normally \fBxmlwf\fR never parses parameter |
| 117 | entities. \fB-p\fR tells it to always parse them. |
| 118 | \fB-p\fR implies \fB-x\fR. |
| 119 | .TP |
| 120 | \fB-r\fR |
| 121 | Normally \fBxmlwf\fR memory-maps the XML file |
| 122 | before parsing; this can result in faster parsing on many |
| 123 | platforms. |
| 124 | \fB-r\fR turns off memory-mapping and uses normal file |
| 125 | IO calls instead. |
| 126 | Of course, memory-mapping is automatically turned off |
| 127 | when reading from standard input. |
| 128 | |
| 129 | Use of memory-mapping can cause some platforms to report |
| 130 | substantially higher memory usage for |
| 131 | \fBxmlwf\fR, but this appears to be a matter of |
| 132 | the operating system reporting memory in a strange way; there is |
| 133 | not a leak in \fBxmlwf\fR. |
| 134 | .TP |
| 135 | \fB-s\fR |
| 136 | Prints an error if the document is not standalone. |
| 137 | A document is standalone if it has no external subset and no |
| 138 | references to parameter entities. |
| 139 | .TP |
| 140 | \fB-t\fR |
| 141 | Turns on timings. This tells Expat to parse the entire file, |
| 142 | but not perform any processing. |
| 143 | This gives a fairly accurate idea of the raw speed of Expat itself |
| 144 | without client overhead. |
| 145 | \fB-t\fR turns off most of the output options |
| 146 | (\fB-d\fR, \fB-m\fR, \fB-c\fR, |
| 147 | \&...). |
| 148 | .TP |
| 149 | \fB-v\fR |
| 150 | Prints the version of the Expat library being used, including some |
| 151 | information on the compile-time configuration of the library, and |
| 152 | then exits. |
| 153 | .TP |
| 154 | \fB-w\fR |
| 155 | Enables support for Windows code pages. |
| 156 | Normally, \fBxmlwf\fR will throw an error if it |
| 157 | runs across an encoding that it is not equipped to handle itself. With |
| 158 | \fB-w\fR, \fBxmlwf\fR will try to use a Windows code |
| 159 | page. See also \fB-e\fR. |
| 160 | .TP |
| 161 | \fB-x\fR |
| 162 | Turns on parsing external entities. |
| 163 | |
| 164 | Non-validating parsers are not required to resolve external |
| 165 | entities, or even expand entities at all. |
| 166 | Expat always expands internal entities, |
| 167 | but external entity parsing must be enabled explicitly. |
| 168 | |
| 169 | External entities are simply entities that obtain their |
| 170 | data from outside the XML file currently being parsed. |
| 171 | |
| 172 | This is an example of an internal entity: |
| 173 | |
| 174 | .nf |
| 175 | <!ENTITY vers '1.0.2'> |
| 176 | .fi |
| 177 | |
| 178 | And here are some examples of external entities: |
| 179 | |
| 180 | .nf |
| 181 | <!ENTITY header SYSTEM "header-&vers;.xml"> (parsed) |
| 182 | <!ENTITY logo SYSTEM "logo.png" NDATA PNG> (unparsed) |
| 183 | .fi |
| 184 | .TP |
| 185 | \fB--\fR |
| 186 | (Two hyphens.) |
| 187 | Terminates the list of options. This is only needed if a filename |
| 188 | starts with a hyphen. For example: |
| 189 | |
| 190 | .nf |
| 191 | xmlwf -- -myfile.xml |
| 192 | .fi |
| 193 | |
| 194 | will run \fBxmlwf\fR on the file |
| 195 | \fI-myfile.xml\fR. |
| 196 | .PP |
| 197 | Older versions of \fBxmlwf\fR do not support |
| 198 | reading from standard input. |
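The precaution described under -d (never write into the directory that holds the input file) can be sketched as a small wrapper. This is only an illustration under stated assumptions: the function name canonicalize and its argument order are hypothetical, and it relies only on the -d and -- options documented above.

```shell
#!/bin/sh
# Hypothetical wrapper around "xmlwf -d": writes the canonical form of
# an input file into an output directory, but refuses when that directory
# is the one containing the input (xmlwf would overwrite its own input).
canonicalize() {
    input=$1
    outdir=$2
    # Resolve both directories to physical paths before comparing them.
    indir=$(cd -- "$(dirname -- "$input")" && pwd -P) || return 1
    mkdir -p -- "$outdir" || return 1
    outabs=$(cd -- "$outdir" && pwd -P) || return 1
    if [ "$indir" = "$outabs" ]; then
        echo "refusing: output directory contains the input file" >&2
        return 1
    fi
    # -d selects the output directory; -- ends the option list.
    xmlwf -d "$outabs" -- "$input"
}
```

For example, canonicalize doc.xml canon would be expected to leave its result in canon/doc.xml, while canonicalize doc.xml . is rejected before xmlwf is run.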
| 199 | .SH "OUTPUT" |
| 200 | .PP |
| 201 | If an input file is not well-formed, |
| 202 | \fBxmlwf\fR prints a single line describing |
| 203 | the problem to standard output. If a file is well-formed, |
| 204 | \fBxmlwf\fR outputs nothing. |
| 205 | Note that the result code is \fBnot\fR set. |
| 206 | .SH "BUGS" |
| 207 | .PP |
| 208 | The XML 1.0 recommendation says that a document should begin |
| 209 | with an XML declaration. \fBxmlwf\fR does not check |
| 210 | for one, so a file without a declaration is allowed to pass. |
| 211 | .PP |
| 212 | \fBxmlwf\fR returns an exit status of 0 (no error) |
| 213 | even if the file is not well-formed. There is no good way for |
| 214 | a program to use \fBxmlwf\fR to quickly |
| 215 | check a file -- it must parse \fBxmlwf\fR's |
| 216 | standard output. |
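Given that limitation, a script can still test a file by checking whether xmlwf printed anything at all. The following is a minimal sketch, assuming the behaviour documented in this page (no output for a well-formed file, a one-line message otherwise); the function name is_well_formed is hypothetical and not part of xmlwf.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (status 0) when xmlwf prints nothing for
# the file, which this page documents as meaning "well-formed".
is_well_formed() {
    # Capture standard output and standard error together; the exit
    # status of xmlwf itself is deliberately not relied upon.
    report=$(xmlwf -- "$1" 2>&1)
    [ -z "$report" ]
}

# Example use: report on each file named on the command line.
for f in "$@"; do
    if is_well_formed "$f"; then
        echo "$f: well-formed"
    else
        echo "$f: NOT well-formed"
    fi
done
```

Because the check depends only on whether any output was produced, it does not need to understand the wording of the error message.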
| 217 | .PP |
| 218 | The errors should go to standard error, not standard output. |
| 219 | .PP |
| 220 | There should be a way to get \fB-d\fR to send its |
| 221 | output to standard output rather than forcing the user to send |
| 222 | it to a file. |
| 223 | .PP |
| 224 | I have no idea why anyone would want to use the |
| 225 | \fB-d\fR, \fB-c\fR, and |
| 226 | \fB-m\fR options. If someone could explain it to |
| 227 | me, I'd like to add this information to this manpage. |
| 228 | .SH "ALTERNATIVES" |
| 229 | .PP |
| 230 | Here are some XML validators on the web: |
| 231 | |
| 232 | .nf |
| 233 | http://www.hcrc.ed.ac.uk/~richard/xml-check.html |
| 234 | http://www.stg.brown.edu/service/xmlvalid/ |
| 235 | http://www.scripting.com/frontier5/xml/code/xmlValidator.html |
| 236 | http://www.xml.com/pub/a/tools/ruwf/check.html |
| 237 | .fi |
| 238 | .SH "SEE ALSO" |
| 239 | .PP |
| 240 | |
| 241 | .nf |
| 242 | The Expat home page: http://www.libexpat.org/ |
| 243 | The W3 XML specification: http://www.w3.org/TR/REC-xml |
| 244 | .fi |
| 245 | .SH "AUTHOR" |
| 246 | .PP |
| 247 | This manual page was written by Scott Bronson <bronson@rinspin.com> for |
| 248 | the Debian GNU/Linux system (but may be used by others). Permission is |
| 249 | granted to copy, distribute and/or modify this document under |
| 250 | the terms of the GNU Free Documentation |
| 251 | License, Version 1.1. |