Damn MTGoogleSearch!

Tom Keating, CTO | VoIP & Gadgets Blog



I’ve been using MTGoogleSearch for Related Entries on my MovableType blog, and unfortunately some of the related entries have UTF-8 characters in their URL titles, which changes my webpage’s default ISO-8859-1 encoding to UTF-8. If even one of the Related Entries URL titles has a UTF-8 character, funky characters display in the blog body: all of my em-dashes, quotes, apostrophes, etc. are messed up. Even though I explicitly specify the encoding in the template (using <$MTPublishCharset$>) and it's set to iso-8859-1 within mt.cfg, I guess MT actually encodes and saves the file in UTF-8 format due to the UTF-8 characters returned by MTGoogleSearch. Interestingly, when I View Source in Notepad and do a Save As, the Encoding box shows UTF-8 instead of the usual ANSI.
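A minimal Python sketch of the symptom, assuming the post bodies contain Windows-1252 bytes (a superset of ISO-8859-1 that stores curly quotes, apostrophes and em-dashes in the 0x80 to 0x9F range). Read with the declared charset, the text is fine; read as UTF-8, each of those bytes is invalid on its own and becomes a U+FFFD replacement character, which is the kind of garbage a browser shows once the page is treated as UTF-8. The sample string is made up for illustration.

```python
# Assumption: the body was saved as Windows-1252 bytes.
body = "It\u2019s a \u201ctest\u201d \u2014 done"
raw = body.encode("cp1252")  # apostrophe -> 0x92, quotes -> 0x93/0x94, em-dash -> 0x97

# Decoded with the charset the page is supposed to use: renders correctly.
as_declared = raw.decode("cp1252")

# Decoded as UTF-8 (what happens once the page flips to UTF-8): each lone
# 0x92/0x93/0x94/0x97 byte is invalid UTF-8 and is replaced with U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

print(as_declared)
print(as_utf8)
```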

Compare this page: http://blog.tmcnet.com/blog/testblog/main-test.asp (has MTGoogleSearch/Related Entries)

with

http://blog.tmcnet.com/blog/testblog/main-test2.asp (deleted MTGoogleSearch/Related Entries from template)

Same page – I just deleted MTGoogleSearch code from the template.

Notice the funky characters in the first one.

I could probably re-encode all my posts from ISO-8859-1 to UTF-8, but that’s a huge hassle. At least, I think it is. I tried changing MovableType’s default encoding to UTF-8 and rebuilt my site, but then my posts within the MT database had even more funky characters. I'd have to go into each blog post (there are hundreds) and fix all the funky characters and re-save. Uhhh, no thanks.
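For reference, the byte-level part of re-encoding is small; the real hassle is getting every post out of the MT database and back in, which this sketch does not cover. It assumes each post body is raw Windows-1252 bytes, and `reencode_post` is a hypothetical helper, not part of MovableType. It leaves input that already decodes as UTF-8 untouched, so running it twice does no harm.

```python
def reencode_post(raw):
    """Hypothetical helper: convert one Windows-1252 post body to UTF-8 bytes."""
    # Already valid UTF-8 (which includes pure ASCII): leave it alone so the
    # conversion is safe to repeat.
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        pass
    # Otherwise treat the bytes as Windows-1252. Strict decoding raises on the
    # few byte values cp1252 leaves unassigned, so bad data is not silently mangled.
    return raw.decode("cp1252").encode("utf-8")


old = "Caf\u00e9 \u201cmenu\u201d".encode("cp1252")
new = reencode_post(old)
print(new.decode("utf-8"))
```

Skipping input that is already valid UTF-8 is a heuristic: Windows-1252 text containing accented letters or smart quotes is rarely valid UTF-8 by accident, and pure-ASCII posts come out identical either way.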

There should be a way of forcing MTGoogleSearch to strip UTF-8 characters or just ignore them without changing the page’s encoding.
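One way to get that behavior, sketched in Python: filter each related-entry title before it is written into the page. This assumes you can hook the title text somewhere between MTGoogleSearch and the template, and `latin1_safe` is a hypothetical name, not an option of the plugin. By default it keeps characters that ISO-8859-1 cannot hold by spelling them as HTML numeric references, so the page can stay ISO-8859-1; with `drop=True` it simply discards them.

```python
def latin1_safe(title, drop=False):
    """Hypothetical filter for a Related Entries title bound for an ISO-8859-1 page."""
    if drop:
        # Discard every character ISO-8859-1 cannot represent.
        return title.encode("latin-1", errors="ignore").decode("latin-1")
    # Keep them as HTML decimal numeric references, e.g. U+2122 -> "&#8482;".
    return title.encode("latin-1", errors="xmlcharrefreplace").decode("latin-1")


title = "Gadget\u2122 review \u2014 caf\u00e9"
print(latin1_safe(title))
print(latin1_safe(title, drop=True))
```

The escaping route is usually the better choice: browsers render a reference like `&#8482;` correctly whatever charset the page declares, so nothing is lost, while Latin-1 characters such as é pass through unchanged.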

Grrr!!! For now I removed the MTGoogleSearch Related Entries feature from my home page, but I'll leave it on the individual blog posts. For some reason UTF-8 characters appear much more often on my main MT template than my individual archive template.

Update! 04/20/2005
I found an alternate solution to strange characters showing up in my blog: download the MTStripControlChars plugin. This fixed most of the weird characters, but not all of them. I customized the MTStripControlChars file to add other character mappings such as copyright symbols, registered trademark symbols, the letter 'e' with an accent (é), and others. I had to break out the old extended ASCII chart and perform some decimal-to-hexadecimal conversion, which I then added to the MTStripControlChars.pl file. Then you simply put <$MTEntryBody strip_controlchars="2"$> into various locations in the blog's template and presto bango, it works!

(Essentially it translates the (would-be) Windows-1252 characters into the corresponding Unicode numeric entities.)
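The same idea as a Python sketch (not the plugin's actual Perl code): decode each byte in the 0x80 to 0x9F range as Windows-1252 and emit a hexadecimal numeric reference, with `format` doing the decimal-to-hex step. For example, the curly apostrophe is byte 0x92, which is Unicode code point 8217 in decimal, 0x2019 in hex, hence `&#x2019;`. The `extra` bytes (©, ®, é) mirror the kind of additions described above and are only an assumption about which ones you'd want.

```python
def cp1252_entity_table(extra=(0xA9, 0xAE, 0xE9)):
    """Map Windows-1252 bytes to HTML hex numeric references, e.g. 0x92 -> '&#x2019;'."""
    table = {}
    for byte in list(range(0x80, 0xA0)) + list(extra):
        try:
            ch = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in Windows-1252.
            continue
        # Decimal code point -> uppercase hex, e.g. 8217 -> "2019".
        table[byte] = "&#x{:X};".format(ord(ch))
    return table


def entify(raw, extra=(0xA9, 0xAE, 0xE9)):
    """Replace mapped bytes in a Windows-1252 body; all other bytes pass through."""
    table = cp1252_entity_table(extra)
    out = bytearray()
    for b in raw:
        if b in table:
            out += table[b].encode("ascii")
        else:
            out.append(b)
    return bytes(out)


print(entify(b"It\x92s \x93ok\x94 \xa9 2005"))
```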


