Illegal XML Characters

Posted in General at 11:21 am by matt

When you’re working with XML, there are certain characters that are considered “illegal.” The following is a potentially incomplete list of them; I’m not an expert on text, Unicode, UTF-8, or the like. All I know is that I have to filter these characters out of any string that I send in a SOAP call in Java.

Here are the character codes, provided for convenience as a Java array:

```java
// Control characters that are not allowed in an XML 1.0 document
private static final char[] XML_ILLEGALS = new char[] {
    0x00, // 0
    0x01, // 1
    0x02, // 2
    0x03, // 3
    0x04, // 4
    0x05, // 5
    0x06, // 6
    0x07, // 7
    0x08, // 8
    0x0B, // 11
    0x0C, // 12
    0x0E, // 14
    0x0F, // 15
    0x10, // 16
    0x11, // 17
    0x12, // 18
    0x13, // 19
    0x14, // 20
    0x15, // 21
    0x16, // 22
    0x17, // 23
    0x18, // 24
    0x19, // 25
    0x1A, // 26
    0x1B, // 27
    0x1C, // 28
    0x1D, // 29
    0x1E, // 30
    0x1F  // 31
};
```
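The post doesn’t show the filtering step itself, so here is a minimal sketch of one way to do it, assuming XML 1.0. Instead of checking each character against the blacklist above, it keeps only the code points allowed by the XML 1.0 `Char` production (tab, newline, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF). That whitelist also drops U+FFFE, U+FFFF and unpaired surrogates, which the list above doesn’t include. The names `XmlCharFilter` and `stripIllegalXmlChars` are hypothetical, not from any library.

```java
// Hypothetical helper class (not from the original post); assumes XML 1.0 rules.
final class XmlCharFilter {

    private XmlCharFilter() {}

    /**
     * True if the code point matches the XML 1.0 "Char" production:
     * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
     */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of s with every code point XML 1.0 disallows removed. */
    static String stripIllegalXmlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        // codePoints() joins valid surrogate pairs into one code point; an unpaired
        // surrogate comes through as a value in 0xD800-0xDFFF and is filtered out.
        s.codePoints()
         .filter(XmlCharFilter::isLegalXmlChar)
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String dirty = "ok\u0000\u0007\tline\n\u001Fend\uFFFE";
        // NUL, BEL, 0x1F and U+FFFE are dropped; tab and newline are kept.
        System.out.println(stripIllegalXmlChars(dirty).equals("ok\tline\nend")); // true
    }
}
```

A whitelist is used here because the legal ranges are short and fixed by the spec, so nothing outside them can slip through the way it might with a hand-maintained blacklist.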

Note that when producing RSS, there are other troublesome characters, like the ampersand, that are not illegal as such but must be escaped as entity references (e.g. &amp;), and not every entity is supported everywhere.

For example, the “Oslash” entity (Ø) referenced by the W3C has an acceptable encoding. However, when it appears in an RSS feed parsed by Firefox, it’s “not OK.”
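One likely cause, assuming the feed declares no DTD: XML itself predefines only five named entities (&amp;, &lt;, &gt;, &quot;, &apos;), so an HTML entity such as &Oslash; is undefined to a strict XML parser. A numeric character reference (&#216; for Ø) is valid in any XML document. This is a minimal sketch of that workaround, not the resolution the post promises: it escapes the five special characters and writes every non-ASCII character as a decimal reference. `XmlEscaper` and `escapeXml` are hypothetical names.

```java
// Hypothetical helper class (not from the original post).
final class XmlEscaper {

    private XmlEscaper() {}

    /**
     * Escapes text for XML element content or attribute values. Only the five
     * entities XML predefines are used by name; every non-ASCII code point becomes
     * a decimal numeric reference, e.g. U+00D8 (HTML's &Oslash;) becomes &#216;.
     * Assumes illegal XML characters have already been removed from the input.
     */
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        s.codePoints().forEach(cp -> {
            switch (cp) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (cp > 0x7F) {
                        // Numeric references need no DTD, so any XML parser accepts them.
                        sb.append("&#").append(cp).append(';');
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("Tom & Jerry <\u00D8resund>"));
        // prints: Tom &amp; Jerry &lt;&#216;resund&gt;
    }
}
```

Writing non-ASCII characters as numeric references keeps the output pure ASCII, which sidesteps both undefined-entity errors and mismatches between the declared and actual encoding of the feed.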

Resolution to follow.
