Friday, January 16, 2009

Google v. Powerset v. Live Search



Unfortunately, Microsoft's Live Search engine did not do much better than Powerset for the same search string: 













LiveSearch results

PS: Click on the link to enlarge the snapshot. 




Google v. Powerset search (part III)


Here's a comparison of the two search engines on the search string "flight 1549".
Notice that there is a Wikipedia article about this term. FYI, US Airways flight 1549 ditched in the Hudson River yesterday, luckily with no fatalities.
Since Powerset only works with Wikipedia articles, I thought the term was a good testbed.


The results are pretty disappointing for Powerset. Google yielded the relevant news articles about the flight at the top of its first page. Powerset didn't yield any relevant article on the first page (or as far down as the 5th page; I wouldn't look any deeper than that!):








Powerset results


Ironically, Google listed the "flight 1549" Wikipedia article first in its search results, effectively beating Powerset on its own turf:









Google results


Thursday, January 8, 2009

Powerset (part II)

So, Microsoft acquired Powerset.
Microsoft has always strived to catch up with and eventually challenge Google's search prowess. The acquisition of Powerset is part of this plan.
Powerset has been working out of its San Francisco headquarters on a different type of search engine, one that is promised to be "natural language driven".
Here's an example of what the Powerset search can do (according to Powerset):
Instead of searching for book children (à la Google), imagine being able to search for: book for children, book by children, and book about children.
According to Powerset, "there would not be any way for us to properly express the query 'books by children' without using the natural language". In other words, a natural language driven search would accommodate users' tendency to phrase their queries in natural language rather than as a string of keywords. So, at least from a usability point of view, Powerset search seems pretty well justified.
But does it work as promised?
I ran a few searches along the lines of books by children on the Powerset search engine and I got results which are at best as mixed as those from Google.
Watch this space for an analysis.

Saturday, January 3, 2009

Friday, January 2, 2009

Speaking of Machine Translation...

Have you noticed how no online MT tool (I tried online Systran, Google and Babelfish) is capable of translating transliterated Greek into English?
Given how popular transliteration is for languages with their own "exotic" alphabets, I believe this is an avenue worthy of further exploration.
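
To make the idea concrete, here is a minimal sketch of one possible preprocessing step: converting transliterated Greek back into Greek script before handing it to an MT tool. The mapping tables and the function name greeklish_to_greek are my own assumptions for illustration, not part of any of the tools mentioned above. Transliteration is ambiguous (for example, "i" can stand for several Greek vowels and "x" for either ξ or χ), so a naive table like this one will misspell many words.

```python
import re

# Hypothetical, deliberately simplistic mapping tables (an assumption,
# not a standard): two-letter combinations are tried before single letters.
DIGRAPHS = {"th": "θ", "ch": "χ", "ps": "ψ", "ks": "ξ"}
SINGLES = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "i": "ι", "k": "κ", "l": "λ", "m": "μ", "n": "ν",
    "x": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ", "t": "τ",
    "u": "υ", "y": "υ", "f": "φ", "w": "ω", "v": "β",
}


def greeklish_to_greek(text):
    """Greedily map Latin-script Greek to Greek script (lowercase only)."""
    lowered = text.lower()
    out = []
    i = 0
    while i < len(lowered):
        pair = lowered[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            ch = lowered[i]
            # Characters without a mapping (spaces, digits, punctuation)
            # are passed through unchanged.
            out.append(SINGLES.get(ch, ch))
            i += 1
    # A sigma at the end of a word takes its final form.
    return re.sub(r"σ\b", "ς", "".join(out))


print(greeklish_to_greek("thalassa"))
print(greeklish_to_greek("logos"))
```

The output has no accents and guesses at ambiguous vowels, so a real system would probably need a dictionary or a statistical model on top of a table like this.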

Machine Translation or CAT?

Asking whether machine translation (MT) or computer-aided translation (CAT) works better is pretty much a version of the chicken-or-egg question in translation circles.

In Machine Translation, (usually rule-based) software translates text from one language to another, and the human translator acts as an editor who corrects and/or customizes the process to meet specific project/data/customer requirements.

CAT tools work like a dictionary or taxonomy of sorts that saves human-generated translations and keeps them easily accessible, organized and consistent.
CAT-translated text segments are stored in special files called Translation Memories (TM), which are then used as a basis for new translations.
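
As a rough illustration of the Translation Memory idea, here is a minimal sketch in Python. The class name TranslationMemory, the 0.75 similarity threshold and the sample segments are all my own assumptions; real CAT tools use their own file formats and matching algorithms. The sketch stores human translations by source segment and, for a new segment, returns the closest stored one if it is similar enough (a "fuzzy match").

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory: source segment -> human translation."""

    def __init__(self):
        self._segments = {}

    def add(self, source, target):
        # Save a human-generated translation for later reuse.
        self._segments[source] = target

    def lookup(self, source, threshold=0.75):
        """Return (stored_source, translation, score) for the best match,
        or None if nothing reaches the (assumed) similarity threshold."""
        best_score, best_source = 0.0, None
        for stored in self._segments:
            score = SequenceMatcher(None, source.lower(), stored.lower()).ratio()
            if score > best_score:
                best_score, best_source = score, stored
        if best_source is None or best_score < threshold:
            return None
        return best_source, self._segments[best_source], best_score


tm = TranslationMemory()
tm.add("Press the red button.", "Appuyez sur le bouton rouge.")

# A near match: the translator gets the stored translation as a starting point.
print(tm.lookup("Press the green button."))
# No usable match: the segment has to be translated from scratch.
print(tm.lookup("Completely unrelated sentence"))
```

The fuzzy match is only a suggestion; the human translator still edits it, which is where the precision of the CAT approach comes from.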

The former relies on NLP and corpus linguistics algorithms and heuristics for the translation of text from one language to another. The latter relies on human translation and capitalizes on storage and editing post-processes.

Both methods have their advantages and disadvantages.
In fact, I believe that an "ideal" solution would involve a successful merger of both methodologies. MT methods seem to scale better, and CAT methods seem to fare better in terms of precision.

Wednesday, December 31, 2008

Happy New Year!

Slang is part of life

Down with linguistic purity!

If you surround yourself with people who overuse such language, kindly do yourself and all of us a favor and simply remove yourself from the particular linguistic environment. Why keep nagging? Language is inevitable and slang or various linguistic fads are part of life (and language). If it weren't for those, various mainstream NLP techniques would have a hard time programming in probabilities for single-word transitions (okay, that's a little NLP joke). Besides, most of the language listed in the link above is teenage-speak.

New Year's wish



[cartoon by Nick Galifianakis for the Washington Post]

Tuesday, December 30, 2008

Read someone the riot act

Here's the gist of the Riot Act [enforced by the British government in 1715]:

"Our sovereign Lord the King chargeth and commandeth all persons, being assembled, immediately to disperse themselves, and peaceably to depart to their habitations, or to their lawful business, upon the pains contained in the act made in the first year of King George, for preventing tumults and riotous assemblies. God save the King."

In other words (aka the British way): 'you noisy louts, don't you know there are people here trying to sleep?'
OR (the un-cut and slightly un-kind and definitely non-British version):
tell someone(s) to "Shut the F*&#$% Up!"...

[Source: The Phrase Finder]