Wednesday, December 17, 2008

Automated Information Extraction in Media Production

2nd International Workshop on Automated Information Extraction in Media Production (AIEMPro09)

Special Session at WIAMIS 2009

London, 6-8 May 2009

After its successful debut at DEXA 2008, the second edition of AIEMPro will take the form of a special session at WIAMIS 2009 (the International Workshop on Image Analysis for Multimedia Interactive Services).

Tentative deadlines:

Paper submission: 11 January 2009
Notification of reviews: 1 February 2009
Final camera-ready (this is a STRICT DEADLINE): 13 February 2009

Areas of Interest (not limited to):

· Efficient and real-time audiovisual indexing in acquisition
· Automated repurposing of archived material on new media channels
· Automated news production
· Efficient indexing and retrieval of multimedia streams
· Automatic speech recognition and personality identification
· Collaborative systems for media production
· Information retrieval systems for multimedia archives
· Automated material copyright infraction detection and material fingerprinting
· Content summarisation (e.g., sports highlights)
· Audiovisual genre and editorial format detection and characterisation
· Cross-media indexing and integration
· Content segmentation tools (e.g., shot and scene segmentation)
· Evaluation methods for multimedia analysis tools

Prospective authors must submit their work following the WIAMIS formatting instructions (http://wiamis2009.qmul.net/submissions.php) and send the paper in PDF format DIRECTLY to the organisers by e-mail.

Organisers:
Alberto Messina (RAI CRIT) a.messina@rai.it
Jean-Pierre Evain (European Broadcasting Union) evain@ebu.ch
Robbie De Sutter (VRT medialab) robbie.desutter@vrt.be

SpinVox

I signed up to try SpinVox's voicemail-to-text service on my mobile phone. They set it up quickly, and I was impressed that it's a complimentary demo. Performance was weakest on proper-name recognition, whereas it coped surprisingly well with exotic accents. I had a Danish friend leave me a voicemail with the details of the next day's meeting. SpinVox caught everything she said except her (Danish) name, my (Greek) name, and the name of the meeting place (unfortunately for SpinVox, we were meeting at a local Starbucks, so no excuses for missing that one! /wahaha.......... /hmm).
To SpinVox's credit, the call was placed in the middle of the street, with a lot of noise in the background, and the caller had an accent.
However, in realistic conditions (if I had actually depended on that transcribed voicemail for my meeting), the performance was poor and the text I received was useless, because the proper names SpinVox missed were precisely the critical information. For instance, I wouldn't have known who was calling, since the service garbled her name, and I wouldn't have known where she wanted to meet, since it didn't catch the place name. So in that respect, although an admirable effort, it leaves a lot to be desired.
Maybe users could build their own local dictionaries of names from, say, their address books. They could upload to SpinVox's servers a dictionary of names, pronounced in the particular user's accent, to augment the central server's dictionary of names and accents. Still, much real-time speech carries a high unpredictability factor, since many different callers may phone a given user. Only an adaptive speech recognition system could actually learn from such ad hoc input and improve itself.

Imagine if, every time I had a new caller, SpinVox could memorise and subsequently recognise their accent and language model: if your boss has an American accent and usually talks about project XYZ and meeting you in Room 234B of Building ABC, SpinVox could learn to expect that kind of "talk" (and accent) the next time he calls. That would of course improve speech recognition accuracy, and it would require the successful marriage of a memory (lexicon/vocabulary plus accents) with an adaptive learning algorithm.
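The first part of this idea, a per-caller dictionary of proper names, can be sketched without any speech machinery at all, as a post-processing pass over the recogniser's text output. The sketch below is purely illustrative and assumes nothing about SpinVox's actual system: the caller IDs, names, and similarity threshold are all invented, and it simply fuzzy-matches each transcribed token against the names known for that caller.

```python
import difflib

# Hypothetical per-caller lexicons, e.g. built from the user's
# address book and past meeting places (all names invented here).
CALLER_LEXICONS = {
    "+4512345678": ["Mette", "Dimitrios", "Starbucks", "Copenhagen"],
}

def correct_names(caller_id, hypothesis, cutoff=0.7):
    """Replace tokens in a recogniser's text hypothesis with the
    closest name from the caller's personal lexicon, when the
    match is strong enough (cutoff is an arbitrary threshold)."""
    lexicon = CALLER_LEXICONS.get(caller_id, [])
    corrected = []
    for token in hypothesis.split():
        # Fuzzy-match the transcribed token against known names.
        match = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return " ".join(corrected)

# Mis-recognised proper names get snapped back to known ones.
print(correct_names("+4512345678", "meet Metta at Starbacks tomorrow"))
# → meet Mette at Starbucks tomorrow
```

A real system would do this biasing inside the recogniser (weighting the language model towards the caller's vocabulary) rather than patching the text afterwards, but even a crude post-hoc pass like this would have rescued the two names that made my voicemail useless.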