Damn MTGoogleSearch!


I’ve been using MTGoogleSearch for Related Entries on my MovableType blog - and unfortunately some of the related entries have UTF-8 characters in the URL titles, which changes my webpage’s default iso-8859-1 encoding to UTF-8. If at least one of the Related Entries URL titles has a UTF-8 character, this causes funky characters to display in the blog body. That is, all of my em-dashes, quotes, apostrophes, etc. in the blog body are messed up. Even though I explicitly specify the encoding (using <$MTPublishCharset$>) in the template and it's set to be iso-8859-1 within mt.cfg, I guess MT actually encodes and saves the file in UTF-8 format due to the UTF-8 characters returned by MTGoogleSearch. Interestingly, when I View Source in Notepad and do a Save As, it displays UTF-8 in the filetype instead of the usual ANSI.
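To illustrate why the mismatch produces funky characters, here is a minimal Python sketch (not MovableType code, and the sample sentence is made up). It assumes the blog body holds Windows-1252 style "smart" punctuation stored as single bytes, which is then read as if it were UTF-8.

```python
# Hypothetical sample text: a curly apostrophe (U+2019) and an em dash (U+2014),
# the kind of "smart" punctuation that breaks on the page.
body = "It\u2019s here \u2014 finally"

# Bytes as a Windows-1252 / "ANSI" blog would store them: one byte per character.
stored = body.encode("cp1252")

# A reader that treats the page as UTF-8 cannot decode those lone bytes,
# so each one turns into the replacement character (U+FFFD).
shown = stored.decode("utf-8", errors="replace")

print(repr(stored))
print(repr(shown))
```

The plain ASCII letters survive because they are identical in both encodings; only the bytes above 0x7F get mangled.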

Compare this page: http://blog.tmcnet.com/blog/testblog/main-test.asp (has MTGoogleSearch/Related Entries)

with

http://blog.tmcnet.com/blog/testblog/main-test2.asp (deleted MTGoogleSearch/Related Entries from template)

Same page – I just deleted MTGoogleSearch code from the template.

Notice the funky characters in the first one.

I could probably re-encode all my posts (ISO-8859-1) to UTF-8, but that’s a huge hassle. At least, I think it is. I tried changing MovableType’s default encoding to UTF-8 and rebuilt my site, but then my posts within the MT database had even more funky characters. I'd have to go into each blog post (in the hundreds) and fix all the funky characters and re-save. Uhhh no thanks.

There should be a way of forcing MTGoogleSearch to strip UTF-8 characters or just ignore them without changing the page’s encoding.
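As a rough idea of what such a filter could look like, here is a short Python sketch. This is not an MTGoogleSearch option (I don't know of one); the function names are hypothetical, and it only shows the general technique of keeping text inside ISO-8859-1 by either escaping or dropping whatever does not fit.

```python
def to_latin1_safe(text: str) -> str:
    """Replace characters that ISO-8859-1 cannot represent with numeric entities."""
    # xmlcharrefreplace turns each unencodable character into &#NNNN;
    return text.encode("iso-8859-1", errors="xmlcharrefreplace").decode("iso-8859-1")


def strip_non_latin1(text: str) -> str:
    """Drop characters that ISO-8859-1 cannot represent."""
    return text.encode("iso-8859-1", errors="ignore").decode("iso-8859-1")


if __name__ == "__main__":
    title = "Caf\u00e9 \u2014 \u65e5"  # hypothetical related-entry title
    print(to_latin1_safe(title))
    print(strip_non_latin1(title))
```

Escaping is usually the gentler choice, since the title still displays correctly in a browser while the page itself stays pure iso-8859-1.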

Grrr!!! For now I removed the MTGoogleSearch Related Entries feature from my home page, but I'll leave it on the individual blog posts. For some reason UTF-8 characters appear much more often on my main MT template than my individual archive template.

Update! 04/20/2005
I found an alternate solution to strange characters showing up in my blog. The solution is to download the MTStripControlChars plugin. This fixed most of the weird characters, but not all of them. I customized the MTStripControlChars file to add other character mappings such as copyright symbols, registered trademarks, letter 'e' with an accent (é), and other mappings. I had to break out the old ASCII chart of characters and perform some decimal to hexadecimal conversion, which was then added to the MTStripControlChars.pl file. Then you simply put <$MTEntryBody strip_controlchars="2"$> into various locations in the blog's template and presto bango it works!

(Essentially it translates the (would-be) Windows-1252 characters into the corresponding Unicode numeric entities.)
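The plugin itself is Perl and its mapping table isn't reproduced here, so the following is only a Python sketch of the same idea. The function name is hypothetical, and it assumes the input bytes really are Windows-1252.

```python
def cp1252_to_entities(raw: bytes) -> str:
    """Turn Windows-1252 bytes into ASCII text with numeric character entities."""
    # Decode using the Windows-1252 table; the few byte values that table
    # leaves undefined become U+FFFD instead of raising an error.
    text = raw.decode("cp1252", errors="replace")
    # Anything outside ASCII is rewritten as &#NNNN; (decimal code point).
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    # Curly quotes, an em dash, a copyright sign and an accented e.
    print(cp1252_to_entities(b"\x93Hi\x94 \x97 \xa9 \xe9"))
```

Because the output is plain ASCII, it renders the same whether the page ends up labeled iso-8859-1 or UTF-8, which is why this approach sidesteps the encoding switch.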

5 Comments

Hello,

The easiest way I found to convert my weblog from iso-8859-1 to utf-8 is to use the import/export feature.

I export my weblog to a file, which I download to my computer. I open it in a good text editor (I used UltraEdit) that can convert from and to UTF-8. Once it is converted to UTF-8 and saved, I send it back to the server (binary mode!) and import the entries...

and voilà!
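For anyone without an editor that converts encodings, the conversion step in this comment could be sketched in Python roughly as follows. The function name and file names are hypothetical, and the default source encoding is an assumption: pages labeled iso-8859-1 that contain curly quotes and em-dashes are usually Windows-1252 in practice, so that is what the sketch decodes as.

```python
from pathlib import Path


def convert_export(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Re-encode an exported entries file to UTF-8."""
    # Read and write raw bytes so line endings are left exactly as exported.
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


# Example call (hypothetical file names):
# convert_export(Path("export.txt"), Path("export-utf8.txt"))
```

If the file contains a byte that is undefined in the source encoding, the decode step raises an error rather than guessing, which is a useful hint that the export is in some other encoding. Keep the original export as a backup before importing the converted copy.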
