Is MobileMe the ONE?

Desktop June 12th, 2008

The 3G iPhone and the accompanying MobileMe service introduced at WWDC 2008 stirred up buzz in the blogosphere, as always. MobileMe tries to solve the long-standing synchronization problem, for which Microsoft has not yet found a panacea for either the enterprise market or mass consumers. Apple's answer seems like an intuitive and elegant solution at first impression, and we may wonder: is MobileMe the ONE?

I really doubt it. This is based on common sense and reasonable guesses, since the service is not available yet, so please correct me in the comments if I am wrong.

Push model may not cross the enterprise boundary
Push mail has been very successful in the enterprise world, but why is there no such thing as push documents or push worksheets? Instantaneous synchronization is simply wasted in most cases. It is not that inviting even on a corporate network powered by Gigabit Ethernet, let alone for consumers on much slower broadband or 3G wireless connections.

There is no silver bullet to resolve conflicts
MobileMe supports a family pack, so one file may be edited on different computers. The conflict has to be resolved no matter how fast the synchronization is. We also need to track version numbers, because it is hard to tell which copy is the latest without network time synchronization, so ultimately MobileMe would become a version control system. But in my personal experience, there is no more intuitive way to resolve a conflict than a three-way diff, which is less user-friendly and not even an option if you are working with images or videos.
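One standard way to decide "which copy is the latest" without trusting wall clocks is a version vector: each device keeps a counter per device, and two copies conflict when each has seen an edit the other has not. This is a minimal sketch of that general technique, not what MobileMe actually does (its internals are not public); the device names are hypothetical.

```python
def bump(vector, device):
    """Return a copy of the version vector with this device's counter incremented."""
    updated = dict(vector)
    updated[device] = updated.get(device, 0) + 1
    return updated


def compare(a, b):
    """Classify two version vectors as 'equal', 'a_newer', 'b_newer' or 'conflict'."""
    devices = set(a) | set(b)
    a_ahead = any(a.get(d, 0) > b.get(d, 0) for d in devices)
    b_ahead = any(b.get(d, 0) > a.get(d, 0) for d in devices)
    if a_ahead and b_ahead:
        return "conflict"  # concurrent edits: someone has to merge
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"
```

For example, if a file last synced at `bump({}, "imac")` is then edited on both a hypothetical "macbook" and the "imac", `compare` reports `"conflict"`. Detecting the conflict is the easy part; merging it is still left to something like a three-way diff.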

Accessibility and Interoperability
These are not big issues for die-hard Apple fans. I believe Bonjour would automatically configure the firewall, and all made-by-Apple applications would talk to MobileMe without any problem. I have no idea whether Apple will release the protocol documentation, as Microsoft did recently, to encourage third-party interoperability.

I personally prefer SSH, rsync and SVN (maybe Mercurial later), periodically backing up to my friends in a different zip code for extra safety.

When RegEx meets WordML

Development May 23rd, 2008

One of my excuses for not updating this blog is some exhausting logistics work I have had to tackle in the last couple of weeks. Long story short, the requirement is to load an Excel file, filter it with a lookup table, then retrieve extra information from a line-based text file and render a docx file with some words highlighted. Let's decompose this problem into tasks one by one:

Retrieve extra information from a line-based text file
A typical regular expression match example.
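The real file format isn't given here, so this minimal sketch assumes a hypothetical "key, tab, extra information" layout; only the pattern would need to change for the actual file.

```python
import re

# Hypothetical line format: "<key><TAB><extra information>".
LINE_PATTERN = re.compile(r"^(?P<key>\S+)\t(?P<info>.*)$")


def load_extra_info(lines):
    """Map each key to its extra information, skipping lines that don't match."""
    info = {}
    for line in lines:
        match = LINE_PATTERN.match(line.rstrip("\n"))
        if match:
            info[match.group("key")] = match.group("info")
    return info
```

Since a file object iterates line by line, `load_extra_info(open(path, encoding="utf-8"))` works directly, where `path` is whatever the text file is called.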

Render the docx file with some words highlighted
This task seems easy: as you know, a docx file is ultimately zipped Office Open XML, i.e. text. We can even replace all the words in one shot, as this recipe suggests. Assume the example sentence is:
Kun loves programming and beer, would you buy me one beer?
The words to be highlighted are programming and beer.
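Because a docx is just a ZIP package, the body XML can be pulled out with the standard `zipfile` module and rewritten in a single regex pass. This is a minimal sketch in the spirit of the one-shot replacement recipe, not necessarily that recipe's exact code; it assumes the body lives at `word/document.xml`, which is where Word conventionally puts it.

```python
import re
import zipfile


def read_document_xml(docx_path):
    """Return the main body part of a .docx package as text."""
    # A .docx file is a ZIP package; by convention Word stores the body
    # in word/document.xml.
    with zipfile.ZipFile(docx_path) as package:
        return package.read("word/document.xml").decode("utf-8")


def replace_in_one_pass(text, mapping):
    """Replace every key of a non-empty mapping with its value in one regex pass."""
    # Longest keys first, so a short key never shadows a longer one.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))
    return pattern.sub(lambda m: mapping[m.group(0)], text)
```

This works only as long as each word sits inside a single text node, which is exactly what Word 2007 does not guarantee once styling is involved.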

Microsoft Office Word 2007 breaks the sentence into 7 pieces: "Kun loves_", "programming", "_and_", "beer", ", would you buy me one_", "beer" and "?", where _ stands for a leading or trailing space. Each piece is rendered with either the normal style or the highlighted style. That is quite messy.
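The same 7 pieces fall out of `re.split` when the pattern has one capturing group, because the matched words are kept in the result at odd indices. A minimal sketch; the helper name is hypothetical.

```python
import re


def split_runs(sentence, words):
    """Split a sentence into (text, is_highlighted) pieces, like Word's runs."""
    pattern = re.compile(r"\b(%s)\b" % "|".join(map(re.escape, words)))
    pieces = pattern.split(sentence)
    # With one capturing group, the matched words land at odd indices;
    # empty strings appear when a word opens or closes the sentence.
    return [(text, i % 2 == 1) for i, text in enumerate(pieces) if text]
```

For the example sentence, this yields exactly the seven pieces listed above, with "programming", "beer" and "beer" flagged as highlighted.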

WordML may support embedded styles somewhere in the bible, but I am going to live with this, since it is crunch time and we can cheat: have you noticed that our highlighted words are always followed by normal text? So we can put the whole sentence in a normal-style enclosure; whenever the RegEx hits a match, we break the enclosure, insert the highlighted word with the highlighted style, then start a new normal enclosure. Brilliant!

Hold on, the text is rendered in Word 2007 as:
Kun lovesprogrammingandbeer, would you buy me onebeer?
According to the WordML spec, or the scream of Jeni:

It is also notable that since leading and trailing whitespace is not normally significant in XML; some runs require a designating specifying that their whitespace is significant via the xml:space element.

So the formal solution to this quiz is to add the xml:space="preserve" attribute whenever the normal text has leading and/or trailing space(s). In our case, "Kun loves_", "_and_" and ", would you buy me one_" need that attribute, but "?" does not. The versatile re.sub also accepts a callback function instead of a replacement string, for more complicated substitutions like this one. Whenever a highlighted word is followed by a space, the normal text after it needs to preserve that space, so we can build the pattern like this:

pattern = re.compile(r"\b(?:%s)\b(\s)?" % "|".join(map(re.escape, the_list_of_to_be_highlighted_words)))

In the callback function, we set the attribute if group(1) is matched. Some corner cases need more post-processing: if the highlighted word is not at the head of the line, the normal text before it ends with a space and needs the attribute too; otherwise, we need to eliminate the unnecessary empty normal enclosure.
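Here is a minimal sketch of the callback plus the two post-processing steps, assuming the same simplified run markup as before (bare `<w:r><w:t>` for normal text, a yellow `w:highlight` run for highlighted words); the function and constant names are hypothetical. In the pattern, group(1) is the single whitespace character right after the word, if there is one.

```python
import re
from xml.sax.saxutils import escape

HIGHLIGHT_RUN = '<w:r><w:rPr><w:highlight w:val="yellow"/></w:rPr><w:t>{}</w:t></w:r>'
PRESERVE_OPEN = '<w:r><w:t xml:space="preserve">'


def highlight_runs(sentence, words):
    """Break a normal run around highlighted words, preserving boundary spaces."""
    text = escape(sentence)
    alternatives = "|".join(re.escape(escape(w)) for w in words)
    pattern = re.compile(r"\b(?:%s)\b(\s)?" % alternatives)

    def on_match(m):
        space = m.group(1)
        word = m.group(0)[:-1] if space else m.group(0)
        # The next normal run starts with that space, so it must be preserved.
        reopen = PRESERVE_OPEN + space if space else "<w:r><w:t>"
        return "</w:t></w:r>" + HIGHLIGHT_RUN.format(word) + reopen

    xml = "<w:r><w:t>" + pattern.sub(on_match, text) + "</w:t></w:r>"
    # Corner case 1: normal text ending in whitespace (it precedes a highlighted
    # word that is not at the head of the line) also needs xml:space.
    xml = re.sub(r"<w:t>([^<]*\s)</w:t>", r'<w:t xml:space="preserve">\1</w:t>', xml)
    # Corner case 2: drop the empty normal run left when a highlighted word
    # opens or closes the line.
    return xml.replace("<w:r><w:t></w:t></w:r>", "")
```

On the example sentence, this marks "Kun loves_", "_and_" and ", would you buy me one_" with `xml:space="preserve"` and leaves "?" alone.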

Or we can set xml:space="preserve" on all normal text, at the cost of a few extra bytes. It is not perfect, but good enough.
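The shortcut is a plain string replacement, assuming normal runs are written as a bare `<w:r><w:t>` with no run properties while highlighted runs carry `<w:rPr>` (both assumptions about the generated markup). The function name is hypothetical.

```python
def preserve_all_normal_runs(document_xml):
    """Mark every bare normal run as whitespace-significant."""
    # Highlighted runs start with <w:rPr>, so this pattern never touches them.
    return document_xml.replace("<w:r><w:t>", '<w:r><w:t xml:space="preserve">')
```

Even "?" gets the attribute, which is the extra-bytes overhead.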

I will talk about the CSV later.

In memory of the victims of China earthquake

Misc May 18th, 2008

28,881 people (confirmed as of May 16) were killed in the magnitude 7.9/8.2 earthquake that struck Sichuan province, China, on May 12. Another 150,000 people were injured and await medical treatment and water.

This blog has been painted black in vigil for the victims of the disaster.

PyAWS 0.3.0 released

Development, Web May 6th, 2008

After 6 months, PyAWS 0.3.0 has finally been released. You can check out the tarball here.

I almost abandoned this project, as I found the XSLT approach more appealing: ideal for AJAX applications and easy to integrate via simplejson on the server side. Furthermore, I joined Microsoft, moved to Canada, and had less spare time for a hobby project that interested me less. The last straw was the unexpected complexity of the BIG FAT refactoring.

Then, recently, I got an email from a PyAWS user reporting a bug: the ListLookup operation returned unexpected results. It is so good to hear that this library still benefits somebody in the world. So I picked it up, completed the refactoring and released it today. The library is still in active development: the code style stinks, the documentation sucks and, most of all, testing is lacking. Let me explain that last point a little.

I am a big fan of TDD personally, and we have respected testing troops helping to build our products at MSFT as well. However, the complexity of PyAWS is far beyond my capacity: there are tens of operations and dozens of response groups, and response groups may be combined, which makes it extremely difficult to cover all the paths. To make it worse, AWS is dynamic, so there is no guarantee that consecutive queries return the same result. I may consider automation to facilitate the unit tests. If you have better ideas, please leave a comment here.
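One possible shape for that automation, sketched under assumptions: enumerate (operation, response-group combination) cases mechanically, and put a record-then-replay cache in front of the live service so repeated runs see identical responses. `live_fetch` stands for whatever function actually calls AWS; it and both helper names are hypothetical, not part of PyAWS.

```python
import itertools


def response_group_matrix(operations, groups, max_combined=2):
    """Enumerate (operation, response-group combination) test cases."""
    for op in operations:
        for size in range(1, max_combined + 1):
            for combo in itertools.combinations(groups, size):
                yield op, combo


class ReplayCache:
    """Record a live response once, then replay it so later runs are deterministic."""

    def __init__(self, live_fetch):
        self.live_fetch = live_fetch
        self.recorded = {}

    def fetch(self, operation, groups):
        key = (operation, tuple(groups))
        if key not in self.recorded:
            self.recorded[key] = self.live_fetch(operation, groups)
        return self.recorded[key]
```

The matrix shows why full coverage hurts: with 20 response groups, single groups plus pairs already give 20 + 190 = 210 cases per operation. In practice the recorded responses would be saved to disk rather than kept in memory.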

AideRSS relieves the pain, fails to cure

Web April 29th, 2008

AideRSS aims to solve the so-called "information overload" problem. One of its typical symptoms in the Web 2.0 era is that we are overwhelmed by thousands of unread posts piling up in Google Reader, and it is so hard to keep pace with the rest of the world. AideRSS ranks posts into Good, Great and Best categories based on its PostRank. This approach is a step forward in relieving the pain, but it still does not solve the problem.

PostRank measures the attention, not the value
Though the PostRank algorithm, like the mysterious PageRank, is not disclosed, the promotional material and my personal observation suggest that PostRank is determined by comments, references and social bookmarking. In other words, a provocative flaming post may attract more attention and achieve a higher PostRank than a plain HOWTO, though the latter is more valuable imho.

PostRank reflects the group wisdom, not the personal choice
I once read digg's programming channel, then moved on to programming@reddit because the latter is more programmer-oriented and suits my taste. Once a community grows big, the voice of the majority dwarfs the "long tail" that the minority audience cares about most. The same dilemma also applies to PostRank.

PostRank is too humble
I have no evidence, so this is still a wild guess: PostRank is feed-based, not internet-wide like PageRank. The assumption is quite reasonable: a global PostRank would be too expensive for a startup company, and too provocative to bloggers (how come a post on Gizmodo ends up lower than the alternative on Engadget?). This humbleness makes sorting across feeds less useful.
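PostRank's real formula is not public, so here is a purely hypothetical feed-relative score: each post's attention (some count of comments, references and bookmarks) divided by that of the best post in the same feed. It is enough to show why such scores do not sort well across feeds.

```python
def feed_relative_rank(posts):
    """Score each post relative to the best post of its own feed.

    posts: iterable of (feed, post_id, attention) tuples.
    """
    posts = list(posts)
    best = {}
    for feed, _, attention in posts:
        best[feed] = max(best.get(feed, 0), attention)
    return {
        post_id: (attention / best[feed] if best[feed] else 0.0)
        for feed, post_id, attention in posts
    }
```

With made-up numbers, a feed_a post with 500 units of attention and a feed_b post with only 50 both score 1.0, so a merged list sorted by this score cannot tell them apart.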

The next step towards perfection
To address the above issues, PostRank needs to be personalized. Let users define what is Good, Great and Best based on their own historical behavior; check the influence of the public using the popularity contest score; and discover similar minorities for sharing and referring.

In short: a hybrid of a Bayesian classifier and a Web 2.0 community.
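The Bayesian half of that hybrid can be sketched as a textbook multinomial naive Bayes classifier with add-one (Laplace) smoothing. It is trained on a hypothetical user's own history of posts, each labeled Good, Great or Best. The word features and training data are placeholders, and the community half is left out.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.label_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()

    def train(self, words, label):
        self.label_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocabulary.update(words)

    def classify(self, words):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Work in log space to avoid underflow on long posts.
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on what a user actually starred or read through, a classifier like this would rank a plain HOWTO as Best for that user even if the crowd prefers the flame war.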