How to use
PodCastle is a spoken document retrieval service that uses automatic speech recognition (ASR) technologies to provide full-text searching of speech data in podcasts, individual audio or movie files on the web, and video clips on video sharing services (YouTube, Nico Nico Douga, and Ustream.tv).
PodCastle enables users to find English and Japanese speech data that contain a search term, read the full text of the recognition results, and easily correct recognition errors by simply selecting from a list of candidate alternatives displayed on an error correction interface. The resulting corrections are used to improve the speech retrieval and recognition performance, and users can actively browse speech data by jumping to any word in the recognition results during playback.
PodCastle supports three functions: retrieving, browsing, and annotating speech data. The retrieval and browsing functions let users understand the speech recognition performance better, and the annotation (error correction) function allows them to contribute to improved performance. This improved performance can then lead to a better user experience of retrieving and browsing speech data.
Retrieval Function
This function allows a full-text search of speech recognition results. When the user types in a search term, a list of speech data containing this term is displayed together with text excerpts of speech recognition results around the highlighted search term. These excerpts can be played back individually. The user can access the full text of one of the search results by selecting that result and then switching over to the browsing function.
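The behavior described above can be illustrated with a minimal sketch. It is not PodCastle's implementation: the `Hit` class, the `search` function, and the `[[ ]]` marker standing in for highlighting are all hypothetical, and the sketch assumes single-word search terms and whitespace-separated (English) transcripts. Japanese text would need a proper tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str   # which recording the excerpt came from
    excerpt: str  # words around the match, with the match marked as [[term]]


def search(transcripts: dict[str, str], term: str, context: int = 3) -> list[Hit]:
    """Return one excerpt per occurrence of a single-word term (case-insensitive)."""
    needle = term.lower()
    hits = []
    for doc_id, text in transcripts.items():
        words = text.split()
        for i, word in enumerate(words):
            # Ignore surrounding punctuation when comparing.
            if word.strip(".,!?").lower() != needle:
                continue
            start = max(0, i - context)
            window = words[start:i + context + 1]
            window[i - start] = f"[[{word}]]"  # stand-in for highlighting
            hits.append(Hit(doc_id, " ".join(window)))
    return hits


# Hypothetical recognition results keyed by an assumed recording ID.
transcripts = {
    "ep1": "Today we talk about speech recognition and search engines",
    "ep2": "Nothing relevant here",
}
for hit in search(transcripts, "speech", context=2):
    print(hit.doc_id, "->", hit.excerpt)
```

A real service would use an inverted index rather than scanning every transcript, and each excerpt would also carry the time offsets needed to play it back individually.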
Browsing (Reading) Function
With this function the user can view the transcribed text of the speech data. To make errors easy to discover, each word is colored according to the degree of reliability estimated during speech recognition. Furthermore, a cursor moves across the text in synchronization with the audio playback. The full-text recognition result is also made available to external full-text search engines, so it can be found through them.
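As a rough illustration of the two display behaviors described above (reliability-based coloring and a cursor that follows playback), consider the following sketch. The `Word` class, the color names, and the thresholds 0.8 and 0.5 are assumptions made for the example; the actual color scheme and reliability scale used by PodCastle are not specified here.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float       # start time in seconds
    confidence: float  # assumed to lie between 0.0 and 1.0


def color_for(confidence: float) -> str:
    """Map a reliability score to a display color (thresholds are illustrative)."""
    if confidence >= 0.8:
        return "black"   # likely correct
    if confidence >= 0.5:
        return "orange"  # uncertain
    return "red"         # likely a recognition error


def word_index_at(words: list[Word], t: float) -> int:
    """Index of the word being spoken at playback time t (cursor position)."""
    starts = [w.start for w in words]
    # The last word whose start time is <= t; clamp to the first word.
    return max(0, bisect_right(starts, t) - 1)


words = [
    Word("hello", 0.0, 0.9),
    Word("world", 0.6, 0.4),
    Word("again", 1.2, 0.6),
]
cursor = word_index_at(words, 0.7)
print(words[cursor].text, color_for(words[cursor].confidence))
```

The binary search keeps the cursor update cheap even for long transcripts. The same lookup, run in reverse (word index to start time), is what would let a user jump playback to any word.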
Annotation (Error Correction) Function
This function lets users add annotations to correct any recognition errors. Here, annotation means transcribing the content of speech data, either by selecting the correct alternative from a list of competing candidates or by typing in the correct text. On our error correction interface, a recognition result excerpt is shown around the cursor and scrolled in synchronization with the audio playback. Each word in the excerpt is accompanied by other candidate words generated beforehand by using a confusion network that can condense a huge internal word graph of a large vocabulary continuous speech recognition (LVCSR) system. Users do not have to worry about temporal errors in word boundaries when typing in the correct text because the temporal position of each word boundary is automatically adjusted when the speech recognizer is trained. Note that users are not expected to correct every error, only those errors that interest them.
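The correction model described above can be pictured as a sequence of slots, one per word position, each holding its competing candidates. The sketch below is a simplified illustration, not the data structure PodCastle uses: the `Slot` class and the `select`, `type_in`, and `transcript` helpers are hypothetical, and an empty string is assumed to represent a blank candidate (a word that should be removed).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    candidates: list[str]        # competing candidates, best-scoring first
    chosen: int = 0              # index of the currently selected candidate
    typed: Optional[str] = None  # free text typed by the user, if any

    def text(self) -> str:
        # Typed text takes precedence over any listed candidate.
        return self.typed if self.typed is not None else self.candidates[self.chosen]


def select(slot: Slot, index: int) -> None:
    """Correct an error by picking one of the listed candidates."""
    if not 0 <= index < len(slot.candidates):
        raise IndexError("no such candidate")
    slot.chosen = index
    slot.typed = None


def type_in(slot: Slot, text: str) -> None:
    """Correct an error by typing text that is not among the candidates."""
    slot.typed = text


def transcript(slots: list[Slot]) -> str:
    """Join the current choice of every slot, skipping blank candidates."""
    return " ".join(s.text() for s in slots if s.text())


slots = [
    Slot(["wreck", "recognize"]),
    Slot(["uh", ""]),            # "" is the assumed blank candidate
    Slot(["beach", "speech"]),
]
select(slots[0], 1)   # choose "recognize"
select(slots[1], 1)   # choose the blank candidate
select(slots[2], 1)   # choose "speech"
print(transcript(slots))
```

Because each slot is corrected independently, a user can fix only the words they care about and leave the rest untouched, which matches the note that not every error needs to be corrected.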
Shortcut keys
- [Space]
- Audio play/stop (toggle)
- [Cursor_key]
- Move the cursor or select a candidate
- [Enter] or [Alt]+[]
- Show the candidate list or select a candidate
- [Insert] or [F2]
- Type in a text
- [Esc]
- Cancel typing
- [,] or [.]
- Enter punctuation marks
- [m]
- Enter "(MUSIC)"
- [n]
- Enter "(NOISE)"
- [c]
- Enter "(AD)"
- [o]
- Confirm the first candidate
- [Shift]+[o]
- Undo the confirmation of the first candidate
- [b]
- Enter a line break (full-text mode only)
- [Shift]+[b]
- Delete a line break (full-text mode only)
- [s]
- Enter the name of a speaker (full-text mode only)
- [Shift]+[s]
- Delete the name of a speaker (full-text mode only)
- [Delete]
- Enter a blank candidate
- [Shift]+[Delete]
- Restore the original transcript (macOS: [Shift]+[Fn]+[Delete])
- [Ctrl]+[z]
- Undo
- [Ctrl]+[j][k]
- Switch to the full-text/correction mode
- [Ctrl]+[][][_]
- Change the playback speed (only for podcasts, not for video clips)
- [Ctrl]+[s]
- Save correction results
