10 Papers I’m more excited about than the Microsoft “human parity” announcement

There’s a lot of hype around the new Microsoft announcement where they claim human parity on the conversational speech recognition task. I don’t doubt for one second that the folks at Microsoft Research are brilliant and working on new and effective techniques.  

That said, I’ve compiled a list of papers I’ve been reading that start to touch on the many facets of large vocabulary continuous speech recognition (LVCSR). Many of these aren’t hot off the presses, but each lays a foundation for thinking about the problem in different ways and along different axes of success. Hopefully, this list can serve as a reality check and highlight something the Microsoft paper doesn’t mention: the contributions of very smart and innovative organizations that speak directly to the question, “Could we run an industry-leading model in production?”

A big data approach to acoustic model training corpus selection

The Deep Speech Trilogy

Deep Speech
Deep Speech 2
Reducing Bias in Production Speech Models

1-Bit Stochastic Gradient Descent and Application to Data-Parallel Distributed Training of Speech DNNs

Scalable Modified Kneser-Ney Language Model Estimation

Building an Efficient Neural Language Model Over a Billion Words

Personalized Speech Recognition On Mobile Devices

Exploring Sparsity in Recurrent Neural Networks

Purely sequence-trained neural networks for ASR based on lattice-free MMI
