ideological speed bumps

Eric S. Johansson esj at harvee.org
Sun May 16 03:06:13 UTC 2010


On 5/15/2010 8:59 PM, Tim Cross wrote:

> Hi Eric,
>
> the points you raise and your observations are all true, but I don't think
> there is a good answer. What it really boils down to is that OSS is largely
> about solutions that have been developed by users scratching their own itch.
> Unfortunately, voice recognition is an extremely complex and difficult to
> scratch itch and the number of developers with the necessary skills that want
> to scratch it is very small.

Thanks for a great series of responses to a complex question. As for scratching 
your own itch, there's one big difference: I can't scratch my own itch, because my 
hands don't work right.  It's roughly the same problem as telling a blind person 
that they can write their own code in an IDE that has lots of wonderful 
graphical images that tell you what you need to do... whoops

> It has been a number of years since I've looked at the status of voice
> recognition in the OSS world. Working on these projects would seem to be a
> good proactive approach. In addition to this, two other approaches that might
> be worth pursuing, especially by anyone who is interested in this area and
> doesn't feel they have the technical skill to actually work in the development
> area, would be to lobby commercial vendors to either make some of their code
> open source or to provide low cost licenses and to lobby for project funding to
> support OSS projects that are working on VR. A significant amount of the work
> done to improve TTS interfaces has been made possible because of effective
> lobbying and gaining of support from commercial and government bodies.

The vast majority of speech recognition efforts today are for IVR, 
interactive voice response systems such as those you would ask "weather in 
Boston" and get a text-to-speech response like "the weather in Boston is hostile 
to out-of-towners and not very kind to locals either."

The difference between speech recognition and text-to-speech today is that 
usable text-to-speech is easy to create with a team of grad students.  Speech 
recognition takes generations of grad students. Witness how little progress has 
been made on the Sphinx toolkit since its creation. We have three different 
engines, all with different characteristics, but all aimed at the same problem 
space. We don't have proper acoustic modeling. We don't have proper language 
modeling, etc. etc. I know I sound like a broken record, but these are huge 
obstacles to general-purpose use.
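To make "language modeling" concrete: a recogniser ranks competing word 
hypotheses by how probable the word sequence is. Here's a toy bigram model in 
Python, just a sketch of the idea and nothing like what a production engine 
(Sphinx or NaturallySpeaking) actually ships:

```python
from collections import defaultdict

def train_bigram_lm(sentences):
    """Estimate P(next word | current word) from example sentences."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        words = ["<s>"] + sent.split() + ["</s>"]   # sentence boundary markers
        for a, b in zip(words, words[1:]):
            counts[a][b] += 1
    # Normalize raw counts into conditional probabilities.
    model = {}
    for a, nexts in counts.items():
        total = sum(nexts.values())
        model[a] = {b: c / total for b, c in nexts.items()}
    return model

model = train_bigram_lm(["the weather in Boston", "the weather in Paris"])
# model["in"] now splits its probability between "Boston" and "Paris",
# which is how a recogniser would break a tie between two acoustically
# similar hypotheses.
```

A real engine works with millions of sentences, trigrams or better, and 
smoothing for unseen word pairs; the lack of good, open acoustic and language 
models is exactly the gap being complained about above.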

I would love to see Nuance license the NaturallySpeaking toolkit to us for 
little or no money for the purpose of developing accessibility interfaces. I 
can't even get them to return a phone call when I'm calling about a commercial 
application. If it's for accessibility, they don't even pick up the phone. This 
tells me it may be time for some guerrilla action. If someone has a spare $2000, 
I have a scanner and I'm sure we can find some good friends in Europe and Japan. 
Not that I'm saying or even suggesting we should violate Nuance's copyright, of 
course, because that would be as wrong as denying disabled people the information 
they need to make themselves more independent and increase their prospects for 
working.


> I'm possibly a little more optimistic regarding the future of OSS VR. Voice
> recognition is rapidly moving from living in a very specialised domain to
> being much more general purpose. This is largely due to the growth in small form
> factor devices, such as mobile phones. I've been told that the Google Nexus 1
> phone has quite good VR support. This is an indication that decent VR
> applications that run in an OSS environment are becoming more prevalent.

Here's a dirty little secret: they didn't do the speech recognition in the 
phone. There's not enough horsepower or memory space for the vocabularies. They 
ship the audio to a server, which does the recognition (not in real time) and 
shoves the text back to the cell phone.
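The client side of that round trip is trivially simple, which is exactly why a 
phone can afford it. A minimal sketch in Python, with a made-up server URL (the 
real Google service and its protocol are not public, so everything here is a 
placeholder):

```python
import urllib.request

def build_recognition_request(audio_bytes,
                              url="https://speech.example.com/recognize"):
    # The phone does no recognition at all: it just POSTs the raw audio
    # to a server and later reads the recognised text out of the HTTP
    # response. All the horsepower lives on the other end.
    return urllib.request.Request(
        url,
        data=audio_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )

# Sending is one call: urllib.request.urlopen(req).read() would return
# whatever text the server recognised.
```

Which is also why losing the server means losing recognition entirely, the 
point of the next paragraph.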

This may unfortunately be our future for disability use. We'll no longer have 
control over speech recognition engines but will instead rent recognition time 
from the cloud.

I really hate the cloud. I understand why pilots hate clouds as well: if 
you fly into the big fluffy thing, the fluffy soft thing can turn really, really 
hard when you run into a mountain hidden within it.  Boink!

I'm waiting for the equivalent to happen in the software cloud world.

> It is
> also likely as demand increases for VR solutions that more University research
> will occur as it will be seen as something with good commercial potential i.e.
> good funding opportunities.

Speech recognition research is aimed at IVR. Funding has plateaued or even 
dropped because recognition accuracy is not improving. The techniques have run 
out of steam. It will take a radically new approach to put any fire under speech 
recognition again. Sometimes I think the only way Nuance is improving 
NaturallySpeaking is by fixing bugs. I doubt there's much new technology going on 
inside.

> Unfortunately, it is also true that the accessibility benefits of technology
> such as VR will all too often be a secondary issue to commercial interests.
> There will be a lag time between this technology existing and it being
> accessible to those who would really benefit from it. This is probably the
> downside of the free market economy where developments are driven by profits.
> However, it is also the perceived profits that ensure commercial resources are
> invested into understanding the problem and developing workable solutions.
> We are still a long way from the sort of society that would put
> the accessibility needs before individual and corporate greed. In fact, we are
> still a long way from getting mainstream recognition of accessibility issues
> to the level they should be, which is why I think lobbying and raising issues
> outside the accessibility community is so important.

Yes. It would be interesting to do the calculation, but I think there's a good 
chance that sticking disabled people on disability and in low-income housing may 
be cheaper for society than all the effort put into making software and living 
spaces accessible to people with disabilities. This is also why I advocate 
putting disability hooks in every machine (i.e., low-cost, with little or no 
administration) and having every disabled person carry their own machine with a 
disability user interface (i.e., text-to-speech or speech recognition), so that 
the cost of enabling a machine for accessibility is lower than it is now.

It's all economics. I think if we can come up with a way that tweaks or 
leverages economics in our favor, we can make a big difference. If it's strictly 
"do it because this is right", it will fail. Another example of this in a 
different field is light pollution. Controlling light pollution is a good idea 
because it reduces energy use, makes nighttime driving safer, and makes it 
possible for the elderly to drive, but there is insufficient economic incentive 
to fix streetlights and high-glare security lighting, so little progress gets 
made. Any changes based on moral arguments are hard-fought, hard-won battles, 
usually overturned once the people driving the argument vanish from the 
political scene, because economic/business interests push back to the status quo 
(i.e., short-term, goal-driven thinking).

We will suffer the same fate with our arguments if we can't provide a good 
economic argument in addition to our technical and moral/ethical ones.





More information about the Ubuntu-accessibility mailing list