10 Papers I’m more excited about than the Microsoft “human parity” announcement

There’s a lot of hype around Microsoft’s new announcement claiming human parity on the conversational speech recognition task. I don’t doubt for one second that the folks at Microsoft Research are brilliant and working on new and effective techniques.

That said, I’ve compiled a list of papers I’ve been reading that start to touch on the many facets of large vocabulary continuous speech recognition (LVCSR). Many of these aren’t hot off the presses, but each lays a foundation for thinking about the problem in different ways and along different axes of success. Hopefully, this can serve as a reality check and highlight something the Microsoft paper doesn’t mention: the contributions of very smart and innovative organizations that speak specifically to the question, “could we run an industry-leading model in prod?”

A big data approach to acoustic model training corpus selection
http://193.6.4.39/~czap/letoltes/IS14/IS2014/PDF/AUTHOR/IS140948.PDF

The Deep Speech Trilogy

Deep Speech
https://arxiv.org/abs/1412.5567
Deep Speech 2
https://arxiv.org/abs/1512.02595
Reducing Bias in Production Speech Models
https://arxiv.org/abs/1705.04400

1-Bit Stochastic Gradient Descent and Application to Data-Parallel Distributed Training of Speech DNNs
https://www.microsoft.com/en-us/research/publication/1-bit-stochastic-gradient-descent-and-application-to-data-parallel-distributed-training-of-speech-dnns/

Scalable Modified Kneser-Ney Language Model Estimation
http://www.aclweb.org/anthology/P13-2#page=738

Building an Efficient Neural Language Model Over a Billion Words
https://research.fb.com/building-an-efficient-neural-language-model-over-a-billion-words/

Personalized Speech Recognition On Mobile Devices
https://research.google.com/pubs/pub44631.html

Exploring Sparsity in Recurrent Neural Networks
https://arxiv.org/abs/1704.05119

Purely sequence-trained neural networks for ASR based on lattice-free MMI
https://pdfs.semanticscholar.org/6ce6/a9a30cd69bd2842a4b581cf48c6815bdfdd8.pdf
