Everything that comes to mind about language, linguistic software, and life in between. Just an alternative linguist's blog, I guess.
Thursday, July 9, 2009
iPhone for hearing loss?
Monday, June 1, 2009
IJCNN2009: June 2009 Atlanta, GA
Sunday, May 24, 2009
Porting old C++ code into Eclipse
Monday, April 13, 2009
Bloggy has a new Wordpress blog
Monday, February 2, 2009
Experienceon.com
Saturday, January 31, 2009
Make your browser your handy research and note-taking tool
Zotero is an easy-to-use yet powerful research tool that helps you gather, organize, and analyze sources (citations, full texts, web pages, images, and other objects), and lets you share the results of your research in a variety of ways. An extension to the popular open-source web browser Firefox, Zotero includes the best parts of older reference manager software (like EndNote)—the ability to store author, title, and publication fields and to export that information as formatted references—and the best parts of modern software and web applications (like iTunes and del.icio.us), such as the ability to interact, tag, and search in advanced ways. Zotero integrates tightly with online resources; it can sense when users are viewing a book, article, or other object on the web, and—on many major research and library sites—find and automatically save the full reference information for the item in the correct fields. Since it lives in the web browser, it can effortlessly transmit information to, and receive information from, other web services and applications; since it runs on one’s personal computer, it can also communicate with software running there (such as Microsoft Word). And it can be used offline as well (e.g., on a plane, in an archive without WiFi).
I would really like to see a Chrome extension of Zotero! Here's hoping!
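For a rough sense of how that automatic "sensing" can work: many library and publisher pages embed citation metadata directly in the HTML (COinS spans are one common convention that reference tools read), so a browser tool only needs to find and decode it. Here is a minimal Python sketch along those lines; it is my own illustration, not Zotero's actual code, and the sample HTML snippet is made up.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs

# COinS embeds an OpenURL context object in the title attribute
# of a <span class="Z3988"> element.
class CoinsParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "Z3988":
            fields = parse_qs(attrs.get("title", ""))
            self.records.append({
                "title": fields.get("rft.btitle", fields.get("rft.atitle", [""]))[0],
                "author": fields.get("rft.au", [""])[0],
                "date": fields.get("rft.date", [""])[0],
            })

# Hypothetical page snippet with an embedded COinS citation.
sample_html = (
    '<span class="Z3988" title="ctx_ver=Z39.88-2004'
    '&rft.btitle=Syntactic+Structures&rft.au=Chomsky%2C+Noam&rft.date=1957">'
    '</span>'
)

parser = CoinsParser()
parser.feed(sample_html)
print(parser.records)
# [{'title': 'Syntactic Structures', 'author': 'Chomsky, Noam', 'date': '1957'}]
```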
Chrome
Did Google abandon Chrome? How hard is it to keep it up to date with add-ons?
It's OK (and understandable) to want to market new products fast, but let's not forget quality! Someone needs to finish the work they started by launching Chrome.
Global Google search glitch today
I quickly fixed the problem when I saw the screwed-up HTTP address field, but, boy, was it scary to know that Google could ever get screwed up, even for a few minutes!
I would never really expect Google to let any "human error," as they call it, show.
Let's hope this will never happen again.
On a few of the boards I frequent, people who are clueless about basic web editing, linking processes, and the potential for human error at every step along the road started talking about using anti-virus programs, different browsers, or even different search engines (!).
I believe that glitch could have hurt Google simply because most people are not tech-savvy enough to realize how "superficial" an error it was and that it was truly just an "accident." For most users, Google is (or had been, until today) beyond human error.
Sunday, January 25, 2009
Google and NLU
The ultimate search engine would understand exactly what you mean and give back exactly what you want.
Thank God he also admits that we're not there yet (although Google no doubt works hard toward this goal).
Natural language understanding (NLU) is so much more than a word-for-word "decoding" of linguistic meaning. Understanding "exactly what [one] mean[s]" requires full-blown NLU (rather than simply NLP) techniques and approaches. Linguistic and pragmatic context, for instance, figure big in NLU. And so do certain "usability" aspects of the query, for instance the intentions of the querent, their assumptions, and the underlying inferences.
The search engines of the future will let a query actually organize matching knowledge mined from the internet instead of simply matching it against some web text. So when you plug in a query like "what is the cost of buying a house in Costa Rica in 2009?", you will expect something more specific and on-point than a list of "relevant" documents.
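To make that concrete, here is a toy Python sketch of the difference, purely my own illustration and not any search engine's actual pipeline: the query is first mapped to a structured intent with slots that a knowledge source could answer directly, rather than being treated as a bag of keywords. The pattern and slot names are invented for the example.

```python
import re

# Toy "query understanding" step: turn a natural-language question
# into a structured request instead of a bag of keywords.
def parse_query(query: str) -> dict:
    pattern = re.compile(
        r"what is the (?P<attribute>[\w\s]+?) of (?P<action>[\w\s]+?) "
        r"in (?P<location>[\w\s]+?) in (?P<year>\d{4})",
        re.IGNORECASE,
    )
    match = pattern.search(query)
    if not match:
        # Fall back to plain keyword search when no pattern applies.
        return {"intent": "keyword_search", "terms": query.split()}
    return {
        "intent": "attribute_lookup",
        "attribute": match.group("attribute").strip(),
        "action": match.group("action").strip(),
        "location": match.group("location").strip(),
        "year": int(match.group("year")),
    }

print(parse_query("what is the cost of buying a house in Costa Rica in 2009?"))
# {'intent': 'attribute_lookup', 'attribute': 'cost',
#  'action': 'buying a house', 'location': 'Costa Rica', 'year': 2009}
```

A real system would of course need far more than one regular expression, but the point is the shape of the output: a structured request that a knowledge base can answer, not a string to match.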
Friday, January 16, 2009
Google v. Powerset v. Live Search
Google v. Powerset search (part III)
Here's a comparison of the two search engines on the search string "flight 1549".
Notice that there is a Wikipedia article about this term. FYI, USAir's flight 1549 crash-landed in the Hudson River yesterday, luckily with no fatalities.
Since Powerset only works with Wikipedia articles, I thought the term was a good testbed.
The results are pretty disappointing for Powerset. Google yielded the relevant news articles about the flight at the top of its first page. Powerset didn't yield any relevant article on the first page (or as far down as the fifth page; I wouldn't look any deeper than that!).
Ironically, Google listed the "flight 1549" Wikipedia article first in its search results, effectively beating Powerset on its own turf.
Thursday, January 8, 2009
Powerset (part II)
Microsoft has always strived to catch up with, and eventually challenge, Google's search prowess. The acquisition of Powerset is part of that plan.
Powerset has been working out of its San Francisco HQ on a different type of search engine, one that is promised to be "natural language driven".
Here's an example of what the Powerset search can do (according to Powerset):
Instead of searching for book children (a la Google), imagine being able to search for: book for children, book by children, and book about children.
According to Powerset, "there would not be any way for us to properly express the query 'books by children' without using the natural language". In other words, a natural-language-driven search would accommodate the natural tendency of users to phrase their queries in natural language rather than as a string of keywords. So, at least from a usability point of view, Powerset search seems pretty well justified.
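A rough way to see why the preposition matters: a plain keyword index collapses "book for children", "book by children", and "book about children" into the same pair of content words, whereas a natural-language front end can map each preposition to a distinct semantic relation. The Python sketch below is my own toy illustration, not Powerset's technology, and the relation names are invented.

```python
# Map the preposition in "book(s) <prep> children" to a semantic relation.
# A bag-of-words engine would collapse all three queries to {book, children}.
PREP_RELATIONS = {
    "for": "intended_audience",   # books for children
    "by": "author",               # books by children
    "about": "topic",             # books about children
}

def interpret(query: str) -> dict:
    tokens = query.lower().split()
    relation = next((PREP_RELATIONS[t] for t in tokens if t in PREP_RELATIONS), None)
    content = [t for t in tokens if t not in PREP_RELATIONS]
    if relation is None:
        return {"type": "keyword", "terms": content}
    return {"type": "relational", "head": content[0], "relation": relation, "argument": content[-1]}

for q in ("books for children", "books by children", "books about children"):
    print(q, "->", interpret(q))
# books for children -> {'type': 'relational', 'head': 'books', 'relation': 'intended_audience', 'argument': 'children'}
# books by children -> {'type': 'relational', 'head': 'books', 'relation': 'author', 'argument': 'children'}
# books about children -> {'type': 'relational', 'head': 'books', 'relation': 'topic', 'argument': 'children'}
```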
But does it work as promised?
I ran a few searches along the lines of "books by children" on the Powerset search engine, and I got results that are, at best, as mixed as those from Google.
Watch this space for an analysis.
Saturday, January 3, 2009
Friday, January 2, 2009
Speaking of Machine Translation...
Given how popular transliteration is for languages with their own "exotic" alphabets, I believe this is an avenue worthy of further exploration.
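As a simplified illustration of what a transliteration step actually involves, here is a small Python sketch mapping Greek characters to Latin ones. The character table is abridged and my own (it ignores accents and common digraphs), not taken from any production system.

```python
# Abridged Greek-to-Latin transliteration table (illustrative only).
GREEK_TO_LATIN = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}

def transliterate(text: str) -> str:
    # Characters not in the table (Latin letters, digits, punctuation,
    # accented forms) pass through unchanged.
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in text.lower())

print(transliterate("γλωσσα"))  # glossa
```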
Machine Translation or CAT?
In Machine Translation, software (usually rule-based) translates text from one language to another, and the human translator acts as an editor who corrects the output and/or customizes the process to meet specific project, data, or customer requirements.
CAT tools work like a dictionary or taxonomy of sorts: they save human-generated translations and keep them easily accessible, organized, and consistent.
CAT-translated text segments are stored in special files called Translation Memories (TMs), which are then used as the basis for new translations.
The former relies on NLP and corpus-linguistics algorithms and heuristics to translate text from one language to another. The latter relies on human translation and capitalizes on storage and post-editing processes.
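To make the CAT side concrete, here is a minimal Python sketch of a translation-memory lookup: previously translated segments are stored as (source, target) pairs, and each new source segment is matched against them with a fuzzy similarity score. The segment pairs and the 0.75 threshold are invented for illustration.

```python
from difflib import SequenceMatcher

# A tiny translation memory: previously translated (source, target) segments.
TM = [
    ("The contract expires on 31 December.", "Le contrat expire le 31 décembre."),
    ("Payment is due within 30 days.", "Le paiement est dû sous 30 jours."),
]

def tm_lookup(segment: str, threshold: float = 0.75):
    """Return (score, (source, target)) for the best fuzzy match, or None."""
    best_score, best_pair = 0.0, None
    for source, target in TM:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_pair = score, (source, target)
    return (best_score, best_pair) if best_score >= threshold else None

hit = tm_lookup("The contract expires on 15 December.")
if hit:
    score, (source, target) = hit
    print(f"{score:.0%} match: reuse '{target}' (translated from '{source}')")
else:
    print("No match above threshold; hand the segment to the translator (or MT).")
```

The translator then edits the retrieved segment instead of translating from scratch, which is where the consistency gains come from.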
Both methods have their advantages and disadvantages.
In fact, I believe that an "ideal" solution would involve a successful merger of both methodologies: MT methods seem to scale better, while CAT methods seem to fare better in terms of precision.