
Hyperaudio Pad and Language Course Creator

Submitted on January 17, 2013

What problem are you intending to solve?

Learning languages is difficult. We aim to make it easier, more engaging and more collaborative by developing an application based on word-accurately-timed transcripts - an approach we call Hyperaudio.

What is the technological approach, or development roadmap?

The technical approach is to create a system composed of loosely coupled modules, which means we can work on each module in parallel.

The core module, the Hyperaudio Pad, is a tool for manipulating content; most other modules are concerned with creating that content, which can currently be achieved by other means.

Our focus, then, will be on the Hyperaudio Pad, but we will also work on the various other modules needed to provide content.

In true Minimum Viable Product tradition we will only create what we need to make the system usable.

What we intend to make

When we create transcripts for audio- and video-based media, we make that media more accessible, searchable, navigable and indexable - in so doing we make audio and video first-class citizens of the web.

The Hyperaudio ecosystem helps us create word-accurate timed transcripts and allows us to remix the associated audio or video into new forms.

Currently there is no convenient way to create accurate interactive audio/video transcripts and no tools or services that facilitate the remixing of the resulting transcripts.

The specific concept we wish to prove is that we can use Hyperaudio to learn new languages effectively.

Our aim is to create an open-source Hyperaudio ecosystem. We will roll out loosely coupled tools and services to aid the workflow of creating interactive transcripts from audio and video files, right through to the editing and remixing of those ‘hypertranscripts’. We will start with minimum viable components and iterate on them in response to community input and user feedback.

There are several components that will initially make up the system. All five elements of the system will be interoperable but loosely coupled, which means that components can provide input usable by other components while each can also be used independently.

The Hyperaudio Ecosystem

1. Transcript Maker

This service will take audio/video and an optional untimed transcript as inputs and return a timed hypertranscript marked up as HTML (or JSON). If a transcript is included, accuracy will be improved. We will use the open-source CMU Sphinx, or similar, to run the automatic transcription. We have run initial tests using CMU Sphinx and the first results are encouraging.
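For concreteness, here is a minimal sketch of what the JSON form of a hypertranscript might look like; the field names are illustrative assumptions rather than a fixed spec, and the HTML form would carry the same timings as attributes on per-word elements.

```typescript
// Illustrative shape of a hypertranscript returned as JSON.
// Field names are assumptions for this sketch, not a fixed spec.
interface HyperWord {
  text: string;     // the word as spoken
  start: number;    // start time, in milliseconds
  duration: number; // how long the word lasts, in milliseconds
}

interface Hypertranscript {
  media: string;      // URL of the source audio or video
  words: HyperWord[]; // every word, in order, with its timing
}

const example: Hypertranscript = {
  media: 'https://example.com/interview.webm',
  words: [
    { text: 'Learning', start: 880, duration: 420 },
    { text: 'languages', start: 1300, duration: 510 },
  ],
};
```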

2. Transcript Cleaner

Results from the Transcript Maker can be cleaned using the Transcript Cleaner, which will provide a convenient user interface for correcting any word-recognition errors or anomalies that may have occurred.
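As a rough sketch of how such an interface could work - assuming each word is rendered as a span that carries its timing data, an assumption of ours rather than a settled design - corrections might be made in place without disturbing the timings:

```typescript
// Minimal sketch: let each word be edited in place while its timing
// data stays attached to the element. The 'word' class and 'data-m'
// attribute are illustrative assumptions, not a settled design.
const transcript = document.querySelector('.hypertranscript');

transcript?.querySelectorAll<HTMLSpanElement>('span.word').forEach((word) => {
  word.contentEditable = 'true';
  word.addEventListener('blur', () => {
    // Only the text changes; the timing attributes are untouched.
    console.log(`corrected: "${word.textContent}" at ${word.dataset.m} ms`);
  });
});
```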

3. Transcript Converter

We will provide a service that takes loosely timed formats, such as subtitles and closed captions, and converts them into a word-accurate hypertranscript.
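One simple, admittedly naive strategy is to interpolate word timings across each subtitle cue; the sketch below does just that, with the caveat that a real converter would refine the estimates, for example by forced alignment against the audio:

```typescript
// Sketch: estimate per-word timings from a subtitle cue by spreading
// the cue's duration evenly across its words. An approximation only;
// forced alignment would be needed for genuine word accuracy.
interface Cue { text: string; start: number; end: number }          // times in ms
interface TimedWord { text: string; start: number; duration: number }

function cueToWords(cue: Cue): TimedWord[] {
  const words = cue.text.split(/\s+/).filter(Boolean);
  if (words.length === 0) return [];
  const slice = (cue.end - cue.start) / words.length;
  return words.map((text, i) => ({
    text,
    start: Math.round(cue.start + i * slice),
    duration: Math.round(slice),
  }));
}

// e.g. cueToWords({ text: 'nothing is cut', start: 1000, end: 2200 })
// spreads three words across 1200 ms, 400 ms each.
```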

4. The Hyperaudio Pad

The heart and soul of the system - we will use the Hyperaudio Pad to manipulate and assemble hypertranscripts. We hope to build up a body of work that people can draw from to create their own original remixes. Ease of use on mobile devices will be a key requirement, and assembling audio and video from its underlying transcript helps simplify this. We will borrow heavily from the 'word processor' paradigm, allowing parts of transcripts to be copied and pasted or dragged and dropped, with the associated media moving intact. We will also provide a small range of commands that can be typed into the pad and that will double as descriptions. Once assembled, users can publish the resulting audio or video programmes and/or embed the work on their own sites.

Importantly, media is not cut, just referenced - start and stop points are simply pointed to within larger pieces, making it easy to incorporate parts of the media before and after the referenced sections, and making true additive remixing a possibility. In short: nothing is left on the cutting-room floor.
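As a sketch of the data model this implies - our actual model may well differ - a remix can be nothing more than an ordered list of references into source media:

```typescript
// Sketch: a remix is an ordered list of pointers into source media.
// Nothing is cut; widening a clip is just a change of in/out point.
interface Clip {
  src: string;      // URL of the source media
  inPoint: number;  // start of the referenced region, in seconds
  outPoint: number; // end of the referenced region, in seconds
}

type Remix = Clip[];

const remix: Remix = [
  { src: 'https://example.com/interview.webm', inPoint: 12.4, outPoint: 31.0 },
  { src: 'https://example.com/lecture.webm', inPoint: 95.2, outPoint: 110.7 },
];
```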

Napkin Sketch

5. Hyperaudio API

We will create an API that developers can hook into in order to build their own applications.
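By way of illustration only - no endpoints exist yet, so the URL and response shape below are entirely hypothetical - a developer might fetch a hypertranscript like this:

```typescript
// Hypothetical usage sketch: the endpoint and response shape are
// illustrative assumptions, not a published API.
async function getTranscript(mediaId: string) {
  const res = await fetch(`https://api.example.org/hyperaudio/transcripts/${mediaId}`);
  if (!res.ok) throw new Error(`request failed: ${res.status}`);
  // e.g. { media: string, words: [{ text, start, duration }] }
  return res.json();
}
```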

6. Language Course Creator

We will prove Hyperaudio and the Hyperaudio ecosystem by building an application on top of it. This application will take advantage of the Hyperaudio API and the Hyperaudio Pad in order to demonstrate its usefulness as both a tool and a platform. We also believe that the Hyperaudio Language Course Creator will become a useful and intuitive tool, allowing language course designers to create new, effective and engaging learning methods and content.

How will end users interact with it, and how will they benefit?

Users will interact with the Hyperaudio Pad by uploading or selecting audio or video, and their associated accurately timed transcripts, in the Hyperaudio Library. They can then easily edit the media by copying, pasting, dragging and dropping sections of various transcripts and their associated media to create new forms.

The Hyperaudio ecosystem will benefit its users in many different ways; one of the key applications is education, where it would facilitate the creation of language-learning applications. Course designers could upload, share and discover transcribed content, which they could then make available to their students. Students would benefit greatly from seeing the text of content exactly as it is being spoken. They can intuitively and easily repeat content by clicking on the appropriate word (sketched in code below), and can easily pinpoint, share and save specific words and phrases that they find interesting or challenging.

Additionally, by encouraging students to use the Hyperaudio Pad to create their own stories in other languages, we create an enjoyable and deep method of learning a language. Sharing and studying each other's productions will both motivate and reinforce learning - and remember, everything a student creates with the Hyperaudio Pad comes with an interactive transcript built in.

To sum up: learning languages will be revolutionised by concepts and tools like Hyperaudio and the Hyperaudio Pad, which hinge on representing audio as text and indicating the exact time each word is spoken. By making this mechanism interactive and creative, we motivate learners and allow them to learn at their own pace.
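That core interaction is simple enough to sketch here; assuming each word is rendered as a span carrying its start time in a data attribute (our assumption for this sketch), clicking a word seeks the media to that moment:

```typescript
// Sketch: click a word to hear it. Assumes word spans carry a 'data-m'
// attribute holding the word's start time in milliseconds.
const media = document.querySelector<HTMLMediaElement>('#player')!;

document.querySelectorAll<HTMLSpanElement>('span.word').forEach((word) => {
  word.addEventListener('click', () => {
    media.currentTime = Number(word.dataset.m) / 1000; // ms -> s
    media.play();
  });
});
```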

How will your app leverage the 1Gbps, sliceable and deeply programmable network?

In general, being able to jump to any point in a media file and play instantly from that point would greatly improve the usability of the whole service. When stitching together sections of multiple pieces of media, that media has to be buffered, or at least made 'seekable', to ensure smooth playback and transitions. With high-speed networks we should be able to achieve seamless playback of disparate media. Users would then not need to compile media into one file, which would encourage full remixing, since sources and context remain intact. The Hyperaudio Library would offer media and media remixes with a full view-source capability, as a way of promoting an accessible and lossless remix culture. And when a user submits media for transcription, upload speeds - and so turnaround times - could be greatly improved.

Also, as speech-to-text transcription is highly processor-intensive, we would like to spread the load among clients via custom P2P networks; in theory this would provide a robust and auto-scalable system. We want to push things further and create a properly distributed system, in which browsers also act as servers, so that we can build something robust, efficient and scalable. We aim to do this using the latest browser technologies, such as WebRTC, combined with a programmable network and a protocol similar to, say, BitTorrent, where large files (in our case, media files) are distributed and downloaded from many sources. There will of course be some form of central control, so that we can 'take down' media if we have to, but in essence a distributed file system should help with scalability and general efficiency by eliminating bottlenecks.
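As a rough sketch of the stitching problem - not our final approach, which will depend on what the network allows - two media elements can be alternated so that the next clip buffers while the current one plays:

```typescript
// Sketch: near-seamless playback of referenced clips by preloading the
// next clip in a standby element and swapping at each boundary. A real
// implementation would need finer-grained control (e.g. MediaSource);
// 'timeupdate' only fires a few times per second.
interface Clip { src: string; inPoint: number; outPoint: number } // seconds

function load(el: HTMLVideoElement, clip: Clip): Promise<void> {
  return new Promise((resolve) => {
    el.src = clip.src;
    el.addEventListener(
      'loadedmetadata',
      () => { el.currentTime = clip.inPoint; resolve(); },
      { once: true },
    );
  });
}

async function playRemix(clips: Clip[], a: HTMLVideoElement, b: HTMLVideoElement) {
  let current = a;
  let standby = b;
  await load(current, clips[0]);
  for (let i = 0; i < clips.length; i++) {
    // Start buffering the next clip while this one plays.
    const next = i + 1 < clips.length ? load(standby, clips[i + 1]) : null;
    await current.play();
    // Wait until the current clip reaches its out point.
    await new Promise<void>((resolve) => {
      const onTime = () => {
        if (current.currentTime >= clips[i].outPoint) {
          current.pause();
          current.removeEventListener('timeupdate', onTime);
          resolve();
        }
      };
      current.addEventListener('timeupdate', onTime);
    });
    if (next) await next;
    [current, standby] = [standby, current]; // swap roles for the next clip
  }
}
```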

Further application information

Additional supporting information, materials and resources

Read about project updates - project blog

Take a look at the existing code - project repository

Will your work be beta-ready by the end of the Development Challenge?

The Hyperaudio Language Course Creator will be beta-ready, as will all parts of the Hyperaudio ecosystem that the application runs on.

How much effort do you expect this work to take?

Six weeks to two months. My colleague Mark Panaghiston and I will both work on it full-time, while contracting others to help with the various parts of the system that can be worked on in parallel.

Do you need help?

We may need help with VC etc. if we decide to pursue that route. We will also need advice - and maybe some coaching - on how to properly exploit the gigabit programmable network.


Mark Boas

Web App Developer specializing in Front-end, JS, Real-time Web and HTML5 esp. audio/video. jPlayer dev. Hyperaudio. Open Web & Knight-Mozilla Open News Fellow.

and team members

Mark Boas - Web Developer / Project Manager
Experienced in creating media-based demos, prototypes and products for organisations such as WNYC, the BBC and Al Jazeera. Will head up the team and establish minimum viable routes. Mark has experience of building a community and reactively rolling out updates as jPlayer project co-ordinator, and often doubles as project manager of the teams he works with.

Mark Panaghiston - Web Developer
Skills include JavaScript and web-based media, built up as core developer of the popular jPlayer audio/video library. Mark will be responsible for ensuring media playback is efficient and as smooth as possible across all devices. He also brings general engineering skills that will allow him to contribute to various parts of the system.

Laurian Gridinoc - Creative Technologist
Laurian works with cutting-edge technology and makes it work; he also brings a creative edge and a sense of the aesthetic to the team. He is currently actively researching speech-to-text technologies and has made contact with major players in this area. He also has significant server experience.

Matteo Spinelli - Mobile Web Developer
Matteo is one of the world's leading mobile web developers, and his various libraries are world-renowned. He also brings a creative eye to the process, as well as knowing his way around a command line.

Dan Schultz - Web Developer
Dan graduated from MIT earlier this year and focuses on disruptive technology. He will be involved in all aspects of the process but will focus on the API. Dan recently rolled out the OpenedCaptions API.
