
Everything that comes to mind about language, linguistic software, and life in between. Just an alternative linguist's blog, I guess.
Friday, January 16, 2009
Google v. Powerset v. Live Search

Google v. Powerset search (part III)
Here's a comparison of the two search engines on the search string "flight 1549".
Notice that there is a Wikipedia article about this term. FYI, US Airways flight 1549 crash-landed in the Hudson River yesterday, luckily with no fatalities.
Since Powerset only works with Wikipedia articles, I thought the term was a good testbed.
The results are pretty disappointing for Powerset. Google yielded the relevant news articles about the flight at the top of its first page. Powerset didn't yield any relevant article on the first page (or as far down as the 5th page; I wouldn't look anywhere deeper than that!):
Ironically, Google listed the "flight 1549" Wikipedia article first in its search results, effectively beating Powerset on its own turf:
Thursday, January 8, 2009
Powerset (part II)
Microsoft has always strived to catch up with and eventually challenge Google's search prowess. The acquisition of Powerset is part of this plan.
Powerset has been working out of its San Francisco headquarters on a different type of search engine, one that is promised to be "natural language driven".
Here's an example of what the Powerset search can do (according to Powerset):
Instead of searching for book children (à la Google), imagine being able to search for: book for children, book by children, and book about children.
According to Powerset, "there would not be any way for us to properly express the query 'books by children' without using the natural language". In other words, a natural language driven search would accommodate users' natural tendency to phrase their queries in natural language rather than as a string of words. So at least from a usability point of view, Powerset search seems pretty well-justified.
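To make the point concrete, here is a minimal sketch of why a purely keyword-based search cannot tell these three queries apart. The stopword list and the `keyword_query` helper are my own illustrative assumptions, not how Google or Powerset actually process queries.

```python
# Hypothetical stopword list (assumption); real engines use their own.
STOPWORDS = {"a", "an", "the", "for", "by", "about", "of"}


def keyword_query(query: str) -> frozenset:
    """Reduce a query to a bag of content words, as a naive keyword engine might."""
    return frozenset(w for w in query.lower().split() if w not in STOPWORDS)


if __name__ == "__main__":
    for q in ["book for children", "book by children", "book about children"]:
        # All three queries collapse to the same two keywords.
        print(q, "->", sorted(keyword_query(q)))
```

Once the prepositions are dropped, all three queries reduce to the same pair of keywords, so whatever distinguishes "for", "by" and "about" has to come from some kind of linguistic analysis of the query.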
But does it work as promised?
I ran a few searches along the lines of books by children on the Powerset search engine and I got results which are at best as mixed as those from Google.
Watch this space for an analysis.
Saturday, January 3, 2009
Friday, January 2, 2009
Speaking of Machine Translation...
Given how popular transliteration is for languages with their own "exotic" alphabets, I believe this is an avenue worthy of further exploration.
Machine Translation or CAT?
In Machine Translation, (usually rule-based) software translates text from one language to another, and the human translator acts as an editor who corrects and/or customizes the process to meet specific project/data/customer requirements.
CAT (computer-assisted translation) tools work like a dictionary or taxonomy of sorts that saves human-generated translations and keeps them easily accessible, organized and consistent.
CAT-translated text segments are stored in special files called Translation Memories (TM), which are then used as a basis for new translations.
The former relies on NLP and corpus linguistics algorithms and heuristics for the translation of text from one language to another. The latter relies on human translation and capitalizes on storage and editing post-processes.
Both methods have their advantages and disadvantages.
In fact, I believe that an "ideal" solution would involve a successful merger of both methodologies. MT methods seem to scale better and CAT methods seem to fare better in terms of precision.
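Here is a rough sketch of what such a merge could look like: consult the translation memory first, and fall back to machine translation only when no stored segment is close enough. Everything in it is an assumption for illustration, including the toy TM entries, the 0.85 similarity threshold and the `machine_translate` stub; it is not how any particular CAT or MT product works.

```python
from difflib import SequenceMatcher

# Toy translation memory: source segment -> human-approved translation.
# The entries and the default threshold below are illustrative assumptions.
TRANSLATION_MEMORY = {
    "Save the file.": "Enregistrez le fichier.",
    "Close the door.": "Fermez la porte.",
}


def machine_translate(segment):
    """Placeholder for an MT engine (hypothetical); returns a tagged draft."""
    return "[MT draft] " + segment


def translate(segment, threshold=0.85):
    """Return (translation, origin), preferring close TM matches over MT output."""
    best_source, best_score = None, 0.0
    for source in TRANSLATION_MEMORY:
        # Character-level similarity between the new segment and a stored one.
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        return TRANSLATION_MEMORY[best_source], "TM"
    return machine_translate(segment), "MT"


if __name__ == "__main__":
    for s in ["Save the file.", "Save the file", "Restart the server."]:
        print(s, "->", translate(s))
```

The TM branch gives the precision of human translations for segments that have been seen before (or nearly so), while the MT branch keeps the pipeline from stalling on new material, which a human editor would then review.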
Wednesday, December 31, 2008
Slang is part of life
If you surround yourself with people who overuse such language, kindly do yourself and all of us a favor and simply remove yourself from the particular linguistic environment. Why keep nagging? Language is inevitable and slang or various linguistic fads are part of life (and language). If it weren't for those, various mainstream NLP techniques would have a hard time programming in probabilities for single-word transitions (okay, that's a little NLP joke). Besides, most of the language listed in the link above is teenage-speak.
Tuesday, December 30, 2008
Read someone the riot act
"Our sovereign Lord the King chargeth and commandeth all persons, being assembled, immediately to disperse themselves, and peaceably to depart to their habitations, or to their lawful business, upon the pains contained in the act made in the first year of King George, for preventing tumults and riotous assemblies. God save the King."
In other words (aka the British way): 'you noisy louts, don't you know there are people here trying to sleep?'
OR (the un-cut and slightly un-kind and definitely non-British version):
tell someone(s) to "Shut the F*$% Up!"...
[Source: The Phrase Finder]