Ramifications of a linguist's life: 01/01/2009

Saturday, January 31, 2009

Make your browser your handy research and note-taking tool

I love Zotero. It's a free online note-taking tool that currently works with the Mozilla Firefox browser. If you are a student or a researcher and do a lot of online work (who doesn't these days?), give it a try. It keeps your online findings organized and in one place, and it makes your searches productive and ready to use. Here's a good summary of Zotero's functionality from the Zotero site:

Zotero is an easy-to-use yet powerful research tool that helps you gather, organize, and analyze sources (citations, full texts, web pages, images, and other objects), and lets you share the results of your research in a variety of ways. An extension to the popular open-source web browser Firefox, Zotero includes the best parts of older reference manager software (like EndNote)—the ability to store author, title, and publication fields and to export that information as formatted references—and the best parts of modern software and web applications (like iTunes and del.icio.us), such as the ability to interact, tag, and search in advanced ways. Zotero integrates tightly with online resources; it can sense when users are viewing a book, article, or other object on the web, and—on many major research and library sites—find and automatically save the full reference information for the item in the correct fields. Since it lives in the web browser, it can effortlessly transmit information to, and receive information from, other web services and applications; since it runs on one’s personal computer, it can also communicate with software running there (such as Microsoft Word). And it can be used offline as well (e.g., on a plane, in an archive without WiFi).

I would really like to see a Chrome extension of Zotero! Here's hoping!

Chrome

Since Google announced its brand new browser last year, I have stopped using Mozilla and turned exclusively to Chrome. I did have to go back to Mozilla to load some pages that rely on add-ons Chrome does not yet support. And I even noticed a slight deterioration in how Chrome handles a few pages that I use daily.
Did Google abandon Chrome? How hard is it to bring it up to date with add-ons?
It's OK (and understandable) to want to market new products fast, but let's not forget quality! Someone needs to finish the work they started when they launched Chrome.

Global Google search glitch today

What happened, you guys?
I quickly fixed the problem when I saw the screwed-up http address field but, boy, was it scary to know that Google could ever get screwed up, even for a few minutes!
I would never have expected Google to let any "human error", as they call it, show.
Let's hope this will never happen again.
On a few of the boards I frequent, people who are clueless about basic web editing, linking processes and the potential for human error at every step along the way started talking about using anti-virus programs, different browsers or even different search engines (!).
I believe that glitch could have hurt Google simply because most people are not tech-savvy enough to realize how "superficial" the error was, and that it truly was just an "accident". For most users, Google is (or has been until today) beyond human error.

Sunday, January 25, 2009

Google and NLU

Page has said the following:

The ultimate search engine would understand exactly what you mean and give back exactly what you want.

Thank God he also admits that we're not there yet (although Google no doubt works hard toward this goal).
Natural language understanding (NLU) is so much more than a word-for-word "decoding" of linguistic meaning. Understanding "exactly what [one] mean[s]" requires full-blown NLU (rather than simply NLP) techniques and approaches. Linguistic and pragmatic context, for instance, figure big in NLU. So do some "usability" aspects of the query, for instance the intentions of the querent, assumptions and underlying inferences.
The search engines of the future will allow a query to actually organize the matching knowledge they mine from the internet instead of simply matching against some web text. So when you plug in a query like "what is the cost of buying a house in Costa Rica in 2009?", you will expect something more specific and on-point than a list of "relevant" documents.

Friday, January 16, 2009

Google v. Powerset v. Live Search

Unfortunately, Microsoft's Live Search engine did not do much better than Powerset for the same search string:

LiveSearch results

PS: Click on the link to enlarge the snapshot.

Google v. Powerset search (part III)

Here's a comparison of the two search engines on the search string "flight 1549".
Notice that there is a Wikipedia article about this term. FYI, US Airways flight 1549 crash-landed in the Hudson River yesterday, luckily with no fatalities.
Since Powerset only works with Wikipedia articles, I thought the term was a good testbed.

The results are pretty disappointing for Powerset. Google yielded the relevant news articles about the flight at the top of its first page. Powerset didn't yield any relevant article on the first page (or as far down as the 5th page; I wouldn't look any deeper than that!):

Powerset results

Ironically, Google listed the "flight 1549" Wikipedia article first in its search results, effectively beating Powerset on its own turf:

Google results

Thursday, January 8, 2009

Powerset (part II)

So, Microsoft acquired Powerset.
Microsoft has always strived to catch up with and eventually challenge Google's search prowess. The acquisition of Powerset is part of this plan.
Powerset has been working out of its San Francisco headquarters on a different type of search engine, one that is promised to be "natural language driven".
Here's an example of what the Powerset search can do (according to Powerset):
Instead of searching for book children (à la Google), imagine being able to search for: book for children, book by children, and book about children.
According to Powerset, "there would not be any way for us to properly express the query "books by children" without using the natural language". In other words, a natural language driven search would facilitate the natural tendency of users to phrase their queries using natural language rather than a string of words. So at least from a usability point of view, Powerset search seems pretty well-justified.
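To make the point concrete, here is a toy sketch of what happens when a purely keyword-based engine drops function words from a query. The stopword list and the `keyword_query` helper are my own illustrative assumptions, not a description of how Google or Powerset actually process queries.

```python
# Toy illustration only: a keyword search that throws away function words.
# STOPWORDS and keyword_query are hypothetical names made up for this sketch.
STOPWORDS = {"for", "by", "about", "the", "a", "of"}


def keyword_query(query):
    """Reduce a query to its set of content words, ignoring order."""
    return frozenset(
        word for word in query.lower().split() if word not in STOPWORDS
    )


queries = ["book for children", "book by children", "book about children"]
reduced = {q: keyword_query(q) for q in queries}

for original, keywords in reduced.items():
    print(original, "->", sorted(keywords))
```

All three queries collapse into the same two keywords, so the difference between "for", "by" and "about" is lost before the search even starts. That difference is exactly what a natural language driven engine would need to keep.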
But does it work as promised?
I ran a few searches along the lines of books by children on the Powerset search engine and I got results which are at best as mixed as those from Google.
Watch this space for an analysis.

Saturday, January 3, 2009

Powerset now part of Microsoft

I think that's big news on many levels. More on this later...

Friday, January 2, 2009

Speaking of Machine Translation...

Have you noticed how no online MT tool (I tried online Systran, Google and Babelfish) is capable of translating transliterated Greek into English?
Given how popular transliteration is for languages with their own "exotic" alphabets, I believe this is an avenue worthy of further exploration.
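One way to explore this would be a pre-processing step that turns the transliterated text back into Greek script before handing it to an MT tool. Below is a minimal sketch of that idea. The letter mapping is a tiny, hypothetical one that I made up for illustration; real transliterated Greek is far more ambiguous (one Latin letter can stand for several Greek ones), so a serious solution would need a dictionary or a statistical model on top of this.

```python
import re

# Hypothetical, deliberately small mapping; it ignores accents and the many
# competing spelling habits people have when writing Greek in Latin letters.
DIGRAPHS = {"th": "θ", "ch": "χ", "ps": "ψ", "ks": "ξ"}
SINGLES = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "i": "ι", "k": "κ", "l": "λ", "m": "μ", "n": "ν",
    "x": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ", "t": "τ",
    "y": "υ", "u": "υ", "f": "φ", "w": "ω", "v": "β",
}


def to_greek_script(text):
    """Map Latin-script Greek to Greek letters, trying digraphs first."""
    lowered = text.lower()
    out = []
    i = 0
    while i < len(lowered):
        pair = lowered[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            # Characters without a mapping (spaces, punctuation) pass through.
            out.append(SINGLES.get(lowered[i], lowered[i]))
            i += 1
    # A sigma at the end of a word takes its final form.
    return re.sub(r"σ\b", "ς", "".join(out))


print(to_greek_script("logos"))
print(to_greek_script("thalassa"))
```

The output of `to_greek_script` could then be fed to any of the MT tools above, although the missing accents and the guessed vowels would still trip up a translator that expects correctly spelled Greek.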

Machine Translation or CAT?

Asking whether machine translation (MT) or computer-aided translation (CAT) works better is pretty much a version of the chicken-or-the-egg question in translation circles.

In Machine Translation, (usually rule-based) software translates text from one language to another, and the human translator acts as an editor who corrects and/or customizes the process to meet specific project/data/customer requirements.

CAT tools work like a dictionary or taxonomy of sorts that saves human-generated translations and keeps them easily accessible, organized and consistent.
CAT-translated text segments are stored in special files called Translation Memories (TM), which are then used as a basis for new translations.
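Here is a rough sketch of how a translation memory lookup might work. The `TranslationMemory` class, its method names and the 0.75 similarity threshold are all assumptions of mine for illustration; commercial CAT tools use their own file formats and matching algorithms. The fuzzy matching relies on `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Hypothetical in-memory store of human-translated segments."""

    def __init__(self):
        self.segments = {}

    def add(self, source, target):
        """Save a human translation for a source segment."""
        self.segments[source] = target

    def lookup(self, source, threshold=0.75):
        """Return (stored_source, stored_target, score) for the closest
        stored segment, or None if nothing is similar enough."""
        best_score = 0.0
        best_pair = None
        for stored_source, stored_target in self.segments.items():
            score = SequenceMatcher(
                None, source.lower(), stored_source.lower()
            ).ratio()
            if score > best_score:
                best_score = score
                best_pair = (stored_source, stored_target)
        if best_pair is None or best_score < threshold:
            return None
        return best_pair[0], best_pair[1], best_score


tm = TranslationMemory()
tm.add("Press the red button.", "Appuyez sur le bouton rouge.")

exact = tm.lookup("Press the red button.")
fuzzy = tm.lookup("Press the green button.")
miss = tm.lookup("Close the window")
```

An exact match can be reused as is, a fuzzy match gives the translator a starting point to edit, and a miss means the segment has to be translated from scratch and then added to the memory.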

The former relies on NLP and corpus linguistics algorithms and heuristics for the translation of text from one language to another. The latter relies on human translation and capitalizes on storage and editing post-processes.

Both methods have their advantages and disadvantages.
In fact, I believe that an "ideal" solution would involve a successful merger of both methodologies. MT methods seem to scale better and CAT methods seem to fare better in terms of precision.
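One simple way to picture such a combination: reuse the human translation whenever the memory has one, and fall back to MT for everything else, flagging the machine output so that a human editor knows where to look. The sketch below assumes a plain dictionary as the memory and a placeholder function standing in for a real MT system; both are hypothetical.

```python
def translate_segment(segment, memory, machine_translate):
    """Return (translation, origin), where origin is 'memory' or 'machine'."""
    if segment in memory:
        # Human-approved translation: precise, but only covers known text.
        return memory[segment], "memory"
    # No stored translation: MT scales to new text but needs post-editing.
    return machine_translate(segment), "machine"


def toy_machine_translate(segment):
    """Placeholder for a real MT system (hypothetical)."""
    return "[MT draft] " + segment


memory = {"Good morning.": "Bonjour."}
results = [
    translate_segment(s, memory, toy_machine_translate)
    for s in ["Good morning.", "See you tomorrow."]
]

for translation, origin in results:
    print(origin, "->", translation)
```

Once the editor has corrected a segment tagged "machine", it could be written back into the memory, so the precise side of the system grows as the scalable side does the first pass.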
