MovableType Garbage Characters Problem

I found a solution to the garbage characters showing up in my blog: download the MTStripControlChars plugin, which essentially translates the (would-be) Windows-1252 characters into the corresponding Unicode numeric entities. This fixed most of the weird garbage characters, but not all of them. Unfortunately, the plugin is not complete, so I had to customize the MTStripControlChars file to add other character mappings, such as the copyright symbol, the registered trademark symbol, the letter 'e' with an acute accent (é), and others. If you want, you can download my MTStripControlChars.pl, where I added some more character mappings.
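To illustrate the idea (this is not the plugin's Perl code), here is a minimal Python sketch. It assumes the post arrives as raw Windows-1252 bytes and turns every byte above 127 into a hexadecimal numeric entity, which covers both the plugin's original range and extra mappings like the ones described above. The function name `cp1252_bytes_to_entities` is hypothetical, and silently dropping the five byte values that Windows-1252 leaves unassigned is my own assumption.

```python
def cp1252_bytes_to_entities(raw: bytes) -> str:
    """Hypothetical helper: turn Windows-1252 bytes into ASCII text plus &#xNNNN; entities."""
    out = []
    for b in raw:
        if b < 0x80:
            # plain ASCII passes through untouched
            out.append(chr(b))
            continue
        try:
            ch = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in Windows-1252; drop them (assumption)
            continue
        # e.g. byte 0x93 (curly left quote) is U+201C, written as &#x201C;
        out.append(f"&#x{ord(ch):04X};")
    return "".join(out)


print(cp1252_bytes_to_entities(b"\x93Caf\xe9\x94 \xa9 2005"))
# prints &#x201C;Caf&#x00E9;&#x201D; &#x00A9; 2005
```

Because the output is pure ASCII, it displays the same way no matter which character set the page is served in.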

I had to break out the old ASCII chart of characters and convert the decimal codes to hexadecimal, then add those values to the MTStripControlChars.pl file. I haven't done decimal-to-hex conversions since my Assembly programming class in college. Ah, the memories. Anyway, you then simply put <$insert-MT-tag strip_controlchars="2"$> into various locations in the blog's template and, presto bango, it works!
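The decimal-to-hex step is easy to script. This small Python sketch (the function name is made up for illustration) formats a decimal code from the chart as a four-digit hex entity in the same style as the '&#x00A9;' mapping. The one wrinkle it handles is that codes 128 to 159 are Windows-1252 positions rather than Unicode code points.

```python
def chart_code_to_hex_entity(code: int) -> str:
    """Hypothetical helper: turn a decimal chart code (0-255) into a &#xNNNN; entity."""
    if 128 <= code <= 159:
        # In this range the Windows-1252 code is not the Unicode code point
        # (150, the en dash, is U+2013), so look it up via the cp1252 codec.
        # Unassigned codes such as 129 raise UnicodeDecodeError.
        code = ord(bytes([code]).decode("cp1252"))
    # Dividing by 16 by hand gives the same digits: 169 = 10*16 + 9 -> A9
    return f"&#x{code:04X};"


for c in (169, 233, 150):
    print(c, chart_code_to_hex_entity(c))
# prints:
# 169 &#x00A9;
# 233 &#x00E9;
# 150 &#x2013;
```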

For example, for the blog's content you would change <$MTEntryBody$> to <$MTEntryBody strip_controlchars="2"$>. Just repeat this for the Comments and TrackBack sections.
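If you have a lot of templates, you can script this edit. Here is a Python sketch that adds strip_controlchars="2" to matching tags and leaves tags that already have it alone. Only MTEntryBody comes from this post. MTCommentBody and MTPingExcerpt are my guesses at the comment and TrackBack tags, so check them against your own templates. The function name `add_strip_attr` is hypothetical.

```python
import re

# MTEntryBody is from the post; the other two tag names are assumptions
TAGS = ("MTEntryBody", "MTCommentBody", "MTPingExcerpt")
# <$Tag ...$> where Tag is one of TAGS (the \b keeps MTEntryBody from matching longer names)
TAG_RE = re.compile(r"<\$(" + "|".join(TAGS) + r")\b([^$]*)\$>")


def add_strip_attr(template: str) -> str:
    """Hypothetical helper: add strip_controlchars="2" to the listed MT tags."""
    def fix(m: re.Match) -> str:
        name, attrs = m.group(1), m.group(2)
        if "strip_controlchars" in attrs:
            return m.group(0)  # already converted, leave as-is
        return f'<${name}{attrs.rstrip()} strip_controlchars="2"$>'
    return TAG_RE.sub(fix, template)


print(add_strip_attr("<div><$MTEntryBody$></div>"))
# prints <div><$MTEntryBody strip_controlchars="2"$></div>
```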

Here is a useful chart that helped with the character mappings I did. I didn't map all of the characters, so some of them may look like gobbledygook unless I decide to go crazy and add them all to the MT plugin. Actually, the unmapped characters will display just fine, but you will see a stray character in front of each one; just ignore it. The ones I mapped don't show the stray character, or they are probably already supported by most browsers. To add new entries to the plugin, you basically use this chart to look up the appropriate code and then add the new entry below this line, which opens the hash of mappings:
my %windows_1252 = (

So, for example, I added this line of code for the copyright symbol (©):
'\xA9' => '&#x00A9;',
(Make sure the last line for your mappings doesn't have a comma)
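Looking up each code by hand gets tedious. Here is a hypothetical Python helper that prints hash entries in the same format as the copyright line above for whatever characters you want to map. It assumes the raw text is Windows-1252, so each character is keyed by its single Windows-1252 byte.

```python
def perl_mapping_lines(chars: str) -> str:
    """Hypothetical helper: build '\\xNN' => '&#xNNNN;' entries for %windows_1252."""
    lines = []
    for ch in chars:
        # the Windows-1252 byte for this character; raises UnicodeEncodeError
        # for characters (like ≠) that have no Windows-1252 byte at all
        byte = ch.encode("cp1252")[0]
        lines.append(f"'\\x{byte:02X}' => '&#x{ord(ch):04X};'")
    # a comma after every entry except the last, as the note above says
    return ",\n".join(lines)


print(perl_mapping_lines("©®é"))
# prints:
# '\xA9' => '&#x00A9;',
# '\xAE' => '&#x00AE;',
# '\xE9' => '&#x00E9;'
```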


Explanation of Symbol | Looks Like | Entity Encoding | ASCII (Decimal) Encoding | Unicode (Hex) Encoding | ALT+ASCII Key Combination
Standard Keyboard Characters
double quotes | " | &quot; | &#0034; | &#x0022; | ALT+0034
ampersand | & | &amp; | &#0038; | &#x0026; | n/a
less than sign | < | &lt; | &#0060; | &#x003C; | n/a
greater than sign | > | &gt; | &#0062; | &#x003E; | n/a
ASCII 128 - 159 Not Supported By Some Browsers
euro sign | € | &euro; | &#0128; | &#x20AC; | ALT+0128
single low-9 quotation mark | ‚ | &sbquo; | &#0130; | &#x201A; | ALT+0130
latin small f with hook - function | ƒ | &fnof; | &#0131; | &#x0192; | ALT+0131
double low-9 quotation mark | „ | &bdquo; | &#0132; | &#x201E; | ALT+0132
horizontal ellipsis | … | &hellip; | &#0133; | &#x2026; | ALT+0133
dagger | † | &dagger; | &#0134; | &#x2020; | ALT+0134
double dagger | ‡ | &Dagger; | &#0135; | &#x2021; | ALT+0135
circumflex accent | ˆ | &circ; | &#0136; | &#x02C6; | ALT+0136
per thousand sign | ‰ | &permil; | &#0137; | &#x2030; | ALT+0137
latin cap S with caron | Š | &Scaron; | &#0138; | &#x0160; | ALT+0138
left single angle quote | ‹ | &lsaquo; | &#0139; | &#x2039; | ALT+0139
latin cap ligature OE | Œ | &OElig; | &#0140; | &#x0152; | ALT+0140
latin cap Z with caron | Ž | n/a | &#0142; | &#x017D; | ALT+0142
left single quotation mark | ‘ | &lsquo; | &#0145; | &#x2018; | ALT+0145
right single quotation mark | ’ | &rsquo; | &#0146; | &#x2019; | ALT+0146
left double quotation mark | “ | &ldquo; | &#0147; | &#x201C; | ALT+0147
right double quotation mark | ” | &rdquo; | &#0148; | &#x201D; | ALT+0148
bullet | • | &bull; | &#0149; | &#x2022; | ALT+0149
en dash | – | &ndash; | &#0150; | &#x2013; | ALT+0150
em dash | — | &mdash; | &#0151; | &#x2014; | ALT+0151
small tilde | ˜ | &tilde; | &#0152; | &#x02DC; | ALT+0152
trade mark sign | ™ | &trade; | &#0153; | &#x2122; | ALT+0153
latin small letter s with caron | š | &scaron; | &#0154; | &#x0161; | ALT+0154
right single angle quote | › | &rsaquo; | &#0155; | &#x203A; | ALT+0155
latin small letter oe | œ | &oelig; | &#0156; | &#x0153; | ALT+0156
latin small z with caron | ž | n/a | &#0158; | &#x017E; | ALT+0158
latin capital letter Y with diaeresis | Ÿ | &Yuml; | &#0159; | &#x0178; | ALT+0159
ASCII 160 - 255 Supported By Most Browsers
non-breaking space | (blank) | &nbsp; | &#0160; | &#x00A0; | ALT+0160
inverted exclamation mark | ¡ | &iexcl; | &#0161; | &#x00A1; | ALT+0161
cent sign | ¢ | &cent; | &#0162; | &#x00A2; | ALT+0162
pound sign | £ | &pound; | &#0163; | &#x00A3; | ALT+0163
currency sign | ¤ | &curren; | &#0164; | &#x00A4; | ALT+0164
yen sign | ¥ | &yen; | &#0165; | &#x00A5; | ALT+0165
broken vertical bar | ¦ | &brvbar; | &#0166; | &#x00A6; | ALT+0166
section sign | § | &sect; | &#0167; | &#x00A7; | ALT+0167
spacing diaeresis - umlaut | ¨ | &uml; | &#0168; | &#x00A8; | ALT+0168
copyright sign | © | &copy; | &#0169; | &#x00A9; | ALT+0169
feminine ordinal indicator | ª | &ordf; | &#0170; | &#x00AA; | ALT+0170
left double angle quotes | « | &laquo; | &#0171; | &#x00AB; | ALT+0171
not sign | ¬ | &not; | &#0172; | &#x00AC; | ALT+0172
registered trade mark sign | ® | &reg; | &#0174; | &#x00AE; | ALT+0174
spacing macron - overline | ¯ | &macr; | &#0175; | &#x00AF; | ALT+0175
degree sign | ° | &deg; | &#0176; | &#x00B0; | ALT+0176
plus-or-minus sign | ± | &plusmn; | &#0177; | &#x00B1; | ALT+0177
superscript two - squared | ² | &sup2; | &#0178; | &#x00B2; | ALT+0178
superscript three - cubed | ³ | &sup3; | &#0179; | &#x00B3; | ALT+0179
acute accent - spacing acute | ´ | &acute; | &#0180; | &#x00B4; | ALT+0180
micro sign | µ | &micro; | &#0181; | &#x00B5; | ALT+0181
pilcrow sign - paragraph sign | ¶ | &para; | &#0182; | &#x00B6; | ALT+0182
middle dot - Georgian comma | · | &middot; | &#0183; | &#x00B7; | ALT+0183
spacing cedilla | ¸ | &cedil; | &#0184; | &#x00B8; | ALT+0184
superscript one | ¹ | &sup1; | &#0185; | &#x00B9; | ALT+0185
masculine ordinal indicator | º | &ordm; | &#0186; | &#x00BA; | ALT+0186
right double angle quotes | » | &raquo; | &#0187; | &#x00BB; | ALT+0187
fraction one quarter | ¼ | &frac14; | &#0188; | &#x00BC; | ALT+0188
fraction one half | ½ | &frac12; | &#0189; | &#x00BD; | ALT+0189
fraction three quarters | ¾ | &frac34; | &#0190; | &#x00BE; | ALT+0190
inverted question mark | ¿ | &iquest; | &#0191; | &#x00BF; | ALT+0191
latin capital letter A with grave | À | &Agrave; | &#0192; | &#x00C0; | ALT+0192
latin capital letter A with acute | Á | &Aacute; | &#0193; | &#x00C1; | ALT+0193
latin capital letter A with circumflex | Â | &Acirc; | &#0194; | &#x00C2; | ALT+0194
latin capital letter A with tilde | Ã | &Atilde; | &#0195; | &#x00C3; | ALT+0195
latin capital letter A with diaeresis | Ä | &Auml; | &#0196; | &#x00C4; | ALT+0196
latin capital letter A with ring above | Å | &Aring; | &#0197; | &#x00C5; | ALT+0197
latin capital letter AE | Æ | &AElig; | &#0198; | &#x00C6; | ALT+0198
latin capital letter C with cedilla | Ç | &Ccedil; | &#0199; | &#x00C7; | ALT+0199
latin capital letter E with grave | È | &Egrave; | &#0200; | &#x00C8; | ALT+0200
latin capital letter E with acute | É | &Eacute; | &#0201; | &#x00C9; | ALT+0201
latin capital letter E with circumflex | Ê | &Ecirc; | &#0202; | &#x00CA; | ALT+0202
latin capital letter E with diaeresis | Ë | &Euml; | &#0203; | &#x00CB; | ALT+0203
latin capital letter I with grave | Ì | &Igrave; | &#0204; | &#x00CC; | ALT+0204
latin capital letter I with acute | Í | &Iacute; | &#0205; | &#x00CD; | ALT+0205
latin capital letter I with circumflex | Î | &Icirc; | &#0206; | &#x00CE; | ALT+0206
latin capital letter I with diaeresis | Ï | &Iuml; | &#0207; | &#x00CF; | ALT+0207
latin capital letter ETH | Ð | &ETH; | &#0208; | &#x00D0; | ALT+0208
latin capital letter N with tilde | Ñ | &Ntilde; | &#0209; | &#x00D1; | ALT+0209
latin capital letter O with grave | Ò | &Ograve; | &#0210; | &#x00D2; | ALT+0210
latin capital letter O with acute | Ó | &Oacute; | &#0211; | &#x00D3; | ALT+0211
latin capital letter O with circumflex | Ô | &Ocirc; | &#0212; | &#x00D4; | ALT+0212
latin capital letter O with tilde | Õ | &Otilde; | &#0213; | &#x00D5; | ALT+0213
latin capital letter O with diaeresis | Ö | &Ouml; | &#0214; | &#x00D6; | ALT+0214
multiplication sign | × | &times; | &#0215; | &#x00D7; | ALT+0215
latin capital letter O with slash | Ø | &Oslash; | &#0216; | &#x00D8; | ALT+0216
latin capital letter U with grave | Ù | &Ugrave; | &#0217; | &#x00D9; | ALT+0217
latin capital letter U with acute | Ú | &Uacute; | &#0218; | &#x00DA; | ALT+0218
latin capital letter U with circumflex | Û | &Ucirc; | &#0219; | &#x00DB; | ALT+0219
latin capital letter U with diaeresis | Ü | &Uuml; | &#0220; | &#x00DC; | ALT+0220
latin capital letter Y with acute | Ý | &Yacute; | &#0221; | &#x00DD; | ALT+0221
latin capital letter THORN | Þ | &THORN; | &#0222; | &#x00DE; | ALT+0222
latin small letter sharp s - ess-zed | ß | &szlig; | &#0223; | &#x00DF; | ALT+0223
latin small letter a with grave | à | &agrave; | &#0224; | &#x00E0; | ALT+0224
latin small letter a with acute | á | &aacute; | &#0225; | &#x00E1; | ALT+0225
latin small letter a with circumflex | â | &acirc; | &#0226; | &#x00E2; | ALT+0226
latin small letter a with tilde | ã | &atilde; | &#0227; | &#x00E3; | ALT+0227
latin small letter a with diaeresis | ä | &auml; | &#0228; | &#x00E4; | ALT+0228
latin small letter a with ring above | å | &aring; | &#0229; | &#x00E5; | ALT+0229
latin small letter ae | æ | &aelig; | &#0230; | &#x00E6; | ALT+0230
latin small letter c with cedilla | ç | &ccedil; | &#0231; | &#x00E7; | ALT+0231
latin small letter e with grave | è | &egrave; | &#0232; | &#x00E8; | ALT+0232
latin small letter e with acute | é | &eacute; | &#0233; | &#x00E9; | ALT+0233
latin small letter e with circumflex | ê | &ecirc; | &#0234; | &#x00EA; | ALT+0234
latin small letter e with diaeresis | ë | &euml; | &#0235; | &#x00EB; | ALT+0235
latin small letter i with grave | ì | &igrave; | &#0236; | &#x00EC; | ALT+0236
latin small letter i with acute | í | &iacute; | &#0237; | &#x00ED; | ALT+0237
latin small letter i with circumflex | î | &icirc; | &#0238; | &#x00EE; | ALT+0238
latin small letter i with diaeresis | ï | &iuml; | &#0239; | &#x00EF; | ALT+0239
latin small letter eth | ð | &eth; | &#0240; | &#x00F0; | ALT+0240
latin small letter n with tilde | ñ | &ntilde; | &#0241; | &#x00F1; | ALT+0241
latin small letter o with grave | ò | &ograve; | &#0242; | &#x00F2; | ALT+0242
latin small letter o with acute | ó | &oacute; | &#0243; | &#x00F3; | ALT+0243
latin small letter o with circumflex | ô | &ocirc; | &#0244; | &#x00F4; | ALT+0244
latin small letter o with tilde | õ | &otilde; | &#0245; | &#x00F5; | ALT+0245
latin small letter o with diaeresis | ö | &ouml; | &#0246; | &#x00F6; | ALT+0246
division sign | ÷ | &divide; | &#0247; | &#x00F7; | ALT+0247
latin small letter o with slash | ø | &oslash; | &#0248; | &#x00F8; | ALT+0248
latin small letter u with grave | ù | &ugrave; | &#0249; | &#x00F9; | ALT+0249
latin small letter u with acute | ú | &uacute; | &#0250; | &#x00FA; | ALT+0250
latin small letter u with circumflex | û | &ucirc; | &#0251; | &#x00FB; | ALT+0251
latin small letter u with diaeresis | ü | &uuml; | &#0252; | &#x00FC; | ALT+0252
latin small letter y with acute | ý | &yacute; | &#0253; | &#x00FD; | ALT+0253
latin small letter thorn | þ | &thorn; | &#0254; | &#x00FE; | ALT+0254
latin small letter y with diaeresis | ÿ | &yuml; | &#0255; | &#x00FF; | ALT+0255
Unicode Characters Supported By Most Browsers
not equal to | ≠ | &ne; | n/a | &#x2260; | n/a
less-than or equal to | ≤ | &le; | n/a | &#x2264; | n/a
greater-than or equal to | ≥ | &ge; | n/a | &#x2265; | n/a
black spade suit | ♠ | &spades; | n/a | &#x2660; | n/a
black club suit, shamrock | ♣ | &clubs; | n/a | &#x2663; | n/a
black heart suit, valentine | ♥ | &hearts; | n/a | &#x2665; | n/a
black diamond suit | ♦ | &diams; | n/a | &#x2666; | n/a
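To double-check or extend the chart, Python's standard `html.entities` module already knows the HTML 4 entity names. This sketch (the `chart_row` name is mine) rebuilds the entity, decimal and hex columns for one character. Like the chart, it assumes the decimal column means the Windows-1252 code.

```python
from html.entities import codepoint2name


def chart_row(ch: str) -> tuple[str, str, str]:
    """Hypothetical helper: (entity, decimal code, hex code) for one character, chart-style."""
    cp = ord(ch)
    # codepoint2name holds the HTML 4 named entities (there is no &Zcaron;, for example)
    name = codepoint2name.get(cp)
    entity = f"&{name};" if name else "n/a"
    try:
        # the decimal column uses the Windows-1252 byte, e.g. 150 for the en dash
        decimal = f"&#{ch.encode('cp1252')[0]:04d};"
    except UnicodeEncodeError:
        decimal = "n/a"  # Unicode-only characters such as ≠
    return entity, decimal, f"&#x{cp:04X};"


for ch in "–©≠":
    print(*chart_row(ch))
# prints:
# &ndash; &#0150; &#x2013;
# &copy; &#0169; &#x00A9;
# &ne; n/a &#x2260;
```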


MovableType Garbage Characters Problem TrackBack URL : http://blog.tmcnet.com/mt/mt-tb.cgi/2324

3 Comments

Dear Tom,

Your MT plugin for stripping control characters is just what I'm looking for, but unfortunately I'm not able to download it; I get some kind of error message.

Is there any possibility that you could email me the .pl file?

It seems to be a great piece of work you've done there!

Thanks in advance!

Best regards,

* hilde *


MTStripControlChars.pl should be fixed now. I had to rename it from .pl to .txt.

Hope it works for you! It seems to have fixed the problem for me. I was afraid I'd have to convert to UTF-8, which is not a fun project, since there are lots of issues moving MT to UTF-8.

Hi Tom,

I downloaded the plug-in to eliminate the garbage characters that are displayed whenever we post entries with prime and double prime quotation marks. However, it doesn't work. Any ideas? You can check the entries at:
http://72.10.53.208

Thanks
Wendy
