Managing people is hard enough. Managing smart people is definitely harder.
Managing smart people who constantly blabber about new technology must scare the hell out of most managers.
What takes the cake is:
Managing people and technology and making decisions without listening to your experts.
Managing people and technology and being too scared to make any decisions.
Everything that comes to mind about language, linguistic software, and life in between. Just an alternative linguist's blog, I guess.
Friday, February 29, 2008
Thursday, February 28, 2008
words....
Interesting neologisms of the day (only read with a sense of humor):
celebritology, noun:
1. the study of the lives of celebrities
2. the endless gossip about Britney's life
3. the main subject of attention of People magazine
chatological, adj. (as in "chatological humor", reminiscent of "eschatological"):
1. the system or theory concerning online chats, online chat rooms, and any other online life species
2. the branch of logic dealing with the same...
Interesting syntactic phenomenon of the day:
Clapton Invited to Play North Korea*
Lucky North Korea will be played by Clapton...
* to confirm the intended reading, see the article
Monday, February 25, 2008
If it's text processing, it is also batch processing...
Here's a consequence of isolating web services from real text processing:
What about batch processing?
Tired of superficial text processing
I've been looking at CALAIS, the open web service for performing information extraction on Reuters data. The information extraction technology under the hood is based on ClearForest's proprietary rule-based language, DIAL. Like similar tools of the trade from SRA and Inxight (now part of Business Objects and SAP), this type of language is tailor-made for general-purpose information extraction. The biggest asset of working with such tools is that they allow the developer to go as deep as she can and extend the original NLP basis to meet specific customer data and requirements. However, tools like CALAIS shift the focus away from the underlying NLP/IE technology and toward the web-services and I/O front and its related bells and whistles. They even offer "bounties" for innovative web-service applications built on CALAIS. All this while the single most important and most attractive element of the tool, its NLP extensibility, remains concealed and under wraps, with a roadmap promise of a release by the end of the year. Until then the tool runs with its out-of-the-box IE capabilities, which are (arguably) pretty limited and impressive only to those with little prior NLP/IE experience. Does someone have their priorities screwed up?
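To give a feel for what "rule-based information extraction" means, here is a toy sketch in Python. The rule format, category labels, and sample sentence are all invented for illustration; real DIAL rules (and real CALAIS output) are far richer than a couple of regular expressions, but the underlying idea of declarative, extensible pattern rules is the same.

```python
import re

# Toy rule-based extractor in the spirit of pattern languages like DIAL.
# Each rule pairs a category label with a pattern; the rules below are
# deliberately simplistic and purely illustrative.
RULES = [
    ("Company", re.compile(r"\b[A-Z][a-zA-Z]+ (?:Inc|Corp|Ltd)\.?")),
    ("Money",   re.compile(r"\$\d+(?:\.\d+)?(?: ?(?:million|billion))?")),
]

def extract(text):
    """Return (category, matched string) pairs found by each rule."""
    hits = []
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            hits.append((label, m.group(0)))
    return hits

sample = "Acme Corp. paid $12 million to Globex Inc for the license."
print(extract(sample))
```

The point of the sketch is the extensibility argument: adding a new entity type is just adding a rule, which is exactly the kind of deep customization that stays locked away when a tool exposes only a fixed web-service API.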
Labels:
application development,
CALAIS,
ClearForest,
Information Extraction,
Inxight,
NLP,
Reuters,
SRA,
web services
Sunday, February 24, 2008
Semantic Web, the Wikipedia, TextLinguistics and Information Extraction
To paraphrase an article published on AI3 on Feb 18, 2008: there is finally a recognized collective wave of information-extraction tools and resources that mine Wikipedia in order to help enrich the Semantic Web.
Here are a few instances that show-case the successful marriage of text-linguistics and information extraction:
-First paragraphs: Wikipedia articles are mined for term definitions. It is standard text structure (in Wikipedia articles and elsewhere) that the first paragraph outlines the main terms discussed in the article. For this reason, initial paragraphs are good places to look for the terms a document defines.
-Redirects: Mining for synonymous terms, spelling variations, abbreviations and other "equivalents" of a term is just par for the course.
-Document Title: This is where we locate named entities and domain-specific terms or their semantic variants.
-Subject Line and Section Headings: For category identification (topic classification)
-Full text: Whereas in the first paragraph new terms are being defined, it is in the rest of the document that one will find a full description of the definition/meaning, along with related terms, translations and other collocations (linguistic context).
-Embedded article links: Links to and by external pages provide more related terms, potential synonyms, clues for disambiguation and categorization.
-Embedded Lists (and other Hierarchies): Look here for hyponyms, meronyms and other semantic relationships among related terms.
Notice that all of the above are overt structural elements in Wikipedia articles. This type of structure is not unique to Wikipedia, although Wikipedia standards impose a conscious effort toward homogeneity. However, detecting such structural clues in text is nothing new in the field of text-linguistics (for seminal work in the field check here).
What's new here is the application of text-linguistics analysis techniques to the Web (and Wikipedia in particular) for purposes of Web mining, Information Extraction and the Semantic Web initiative.
The output of such analyses and metrics helps populate ontologies and taxonomies, as well as link records. Areas of focus for these types of application are:
-subcategorization
-WSD, NER and NED
-semantic similarity and relatedness analysis and metrics
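The structural elements listed above can be mined with fairly simple means. Below is a minimal Python sketch that pulls section headings, internal links, redirect targets, and the first paragraph out of raw wikitext. The sample article is invented, and the markup handling is deliberately naive; a real pipeline would use a proper wikitext parser rather than regular expressions.

```python
import re

# Invented sample article in simplified wikitext, for illustration only.
article = """'''Information extraction''' (IE) is the task of pulling
structured facts out of [[unstructured data|free text]].

== Methods ==
Rule-based and statistical [[machine learning]] approaches are common.

== See also ==
* [[Named-entity recognition]]
"""

def section_headings(wikitext):
    """Section headings -> candidate category/topic labels."""
    return re.findall(r"^==+\s*(.*?)\s*==+\s*$", wikitext, re.MULTILINE)

def internal_links(wikitext):
    """[[target|anchor]] links -> related terms and synonym candidates."""
    return [m.group(1) for m in re.finditer(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", wikitext)]

def redirect_target(wikitext):
    """Redirect pages -> synonym/variant mappings for a term."""
    m = re.match(r"#REDIRECT\s*\[\[([^\]|]+)\]\]", wikitext, re.IGNORECASE)
    return m.group(1) if m else None

def first_paragraph(wikitext):
    """First non-heading paragraph -> likely home of the term definition."""
    for block in wikitext.split("\n\n"):
        lines = [l for l in block.splitlines() if l and not l.startswith(("==", "#"))]
        if lines:
            return " ".join(lines)
    return ""

print(section_headings(article))
print(internal_links(article))
print(redirect_target("#REDIRECT [[Information extraction]]"))
```

Each function maps onto one of the bullets above: headings feed topic classification, links feed relatedness and disambiguation, redirects feed synonym sets, and the first paragraph feeds definition mining.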
Tools of the trade
Try NoteTab Light, freeware with some commercial features available for a 31-day trial. It allows embedded scripting (HTML, Perl, Gawk). It looks pretty loaded compared with Notepad.


Saturday, February 23, 2008
What is it with students?
OK -- you go to the trouble of contacting me with a question about my doctoral dissertation, which you have sitting in front of you. How flattering! Someone was actually interested enough to buy (and hopefully read?) my brainchild.
Now don't spoil the good news with 1) incredibly bad manners and 2) incredibly thick questions.
So to attend to 1, do follow proper email etiquette. By this I mean use a basic "Hi (FirstName)" salutation when you address someone you don't know over email. And while we are at it, please resist the temptation of (inadvertently) offending me by ascribing my brainchild to someone else... Thank you.
To attend to 2, just don't expect people to give you ready answers to questions they belabored for a while! It is called a doctoral dissertation for a reason! I didn't spend one day, one month, or even one year on it, dude. Since you have bought it, do me a favor and actually read it, or at least browse through its pages. What else can I say? Then, once you are in a place to form intelligent and respectful questions, come back to me. It is called "research" for a reason!

Labels:
brainchild,
doctoral dissertation,
etiquette,
manners,
research
watching the news...
The next big revolution will be in the direction of exercising judgment when it comes to information. Do I really need a daily depression dose from CNN and the newspapers in order to "get it" that the economy is bad, that people are losing their jobs, and (most crucially) that the government isn't doing much about it? I say stop watching all this negativity and start doing something about it in your daily life: vote, question political practices, live "in the present" and tune in to what's happening around you (CNN won't tell you how many of your neighbors lost their jobs in the last X months). Open your eyes, be present, and use judgment and common sense when it comes to "mass media". Above all, THINK. Yes, use the substance in your skull that promotes intelligent life. Mass media perpetuate negativity. This is how it is. It's not to blame. It's up to you to "buy it" or not. I say use your own mind and separate yourself from the "mass". Then you have better chances of staying positive and doing something in your life that promotes a change for the better.
Labels:
brain,
change,
mass media,
negativity,
news,
think,
voting