Illegal XML Characters

When you’re working with XML, certain characters are considered “illegal”. The following is a potentially incomplete list of them; I’m not an expert on text, Unicode, UTF-8, or the like. All I know is that I have to filter these characters out of any string that I send in a SOAP call in Java.

These are the Java character codes, provided for convenience as a Java array.

// Control characters below 0x20, except tab (0x09), line feed (0x0A),
// and carriage return (0x0D), which are left out on purpose.
private static final char[] XML_ILLEGALS = new char[] {
    0x00, // 0
    0x01, // 1
    0x02, // 2
    0x03, // 3
    0x04, // 4
    0x05, // 5
    0x06, // 6
    0x07, // 7
    0x08, // 8
    0x0B, // 11
    0x0C, // 12
    0x0E, // 14
    0x0F, // 15
    0x10, // 16
    0x11, // 17
    0x12, // 18
    0x13, // 19
    0x14, // 20
    0x15, // 21
    0x16, // 22
    0x17, // 23
    0x18, // 24
    0x19, // 25
    0x1A, // 26
    0x1B, // 27
    0x1C, // 28
    0x1D, // 29
    0x1E, // 30
    0x1F  // 31
};
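
To show how a list like this might be used, here is a minimal sketch of a filter that drops these characters from a string before it goes into a SOAP call. The class and method names (XmlSanitizer, isIllegal, stripIllegal) are made up for illustration, and the range check is an assumption that happens to cover the same code points as the XML_ILLEGALS array above: everything below 0x20 except tab, line feed, and carriage return. Like the array, it may be incomplete.

```java
// Hypothetical helper, not part of any library.
final class XmlSanitizer {

    // True for the same code points listed in XML_ILLEGALS:
    // anything below 0x20 other than tab, line feed, and carriage return.
    static boolean isIllegal(char c) {
        return c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D;
    }

    // Returns a copy of the input with the illegal characters removed.
    // A null input is passed through unchanged.
    static String stripIllegal(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!isIllegal(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The BEL character (0x07) is dropped; the tab is kept.
        System.out.println(stripIllegal("bell\u0007 and tab\tkept"));
    }
}
```

Dropping the characters silently is only one option; depending on the service, replacing them with a space or rejecting the string outright might be the better choice.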

Note that when doing RSS, there are other characters, like the ampersand, that are not stripped but have to be escaped through encoding, and not every encoding is supported everywhere.

For example, “Oslash”, referenced by the W3C, has an acceptable encoding. However, when it appears in an RSS XML feed parsed by Firefox, it’s “not ok”.
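
One workaround that tends to be safe is sketched below. It relies on two things I am confident of: XML itself predefines only five named entities (amp, lt, gt, apos, quot), and numeric character references such as &#216; for Ø (U+00D8) need no entity declaration. Whether this is why Firefox complains about the feed is an assumption on my part. The class and method names (XmlEscaper, escape) are hypothetical, and the code does not filter the illegal control characters discussed earlier.

```java
// Hypothetical helper, not part of any library.
final class XmlEscaper {

    // Escapes the five characters that have predefined XML entities and
    // writes every non-ASCII code point as a decimal numeric reference.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);   // handles surrogate pairs
            i += Character.charCount(cp);
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (cp >= 0x80) {
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.append((char) cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("\u00D8l & <b>"));
    }
}
```

With this approach the feed never mentions the name “Oslash” at all, so it does not matter which named entities the parser knows about.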

Resolution to follow.

Published by

matt

I'm a software engineer in New Orleans interested in making things, growing things, big fast computers, media convergence, and pugs.
