[OFF-TOPIC] Re: ideological speed bumps

Sun May 16 07:27:43 UTC 2010

Hi Eric,

I've added comments in-line below. However, this is probably beginning to get
a little off topic for the list. Maybe we should take further discussion off
list if you want to respond. Alternatively, maybe you have some suggestions or
Ubuntu-specific points that could be brought in to get things more on topic?
I'm not up to date enough with current VR issues to be able to provide any
really constructive advice. However, I also understand how important it can be
to have a general discussion and possibly find the ideas or energy to carry
things forward. I'm happy to discuss this further off list if you wish.

regards,

Tim

Eric S. Johansson writes:
 > On 5/15/2010 8:59 PM, Tim Cross wrote:
 > 
 > > Hi Eric,
 > >
 > > the points you raise and your observations are all true, but I don't think
 > > there is a good answer. What it really boils down to is that OSS is largely
 > > about solutions that have been developed by users scratching their own itch.
 > > Unfortunately, voice recognition is an extremely complex and difficult-to-scratch
 > > itch and the number of developers with the necessary skills who want
 > > to scratch it is very small.
 > 
 > Thanks for a great series of responses to a complex question. As for scratching
 > your own itch, there's one big difference. I can't scratch my own itch because my
 > hands don't work right.  It's roughly the same problem as telling a blind person 
 > that they can write their own code in an IDE that has lots of wonderful 
 > graphical images that tell you what you need to do... whoops
 >

Yes, I understand the difficulty and frustration. I wasn't meaning to imply
that you or any other individual should fix the problem directly, though I
suspect there are some who would benefit from VR who are in a position to
assist in code writing for OSS projects. The other point I wanted to make is
that coding is not the only way to help. To a large extent, the lobbying
aspect is also important. A lot of the battle is getting recognition of
the importance of OSS and low-cost solutions in the adaptive technology space.
This is an area most people can assist with, as, in fact, you have demonstrated
by starting this thread. What is needed is to move this sort of discussion more
into the mainstream development area and to work towards adaptive tech being
treated as a first-class consideration and not as an afterthought, as is
too often the situation.

 > > It has been a number of years since I've looked at the status of voice
 > > recognition in the OSS world. Working on these projects would seem to be a
 > > good proactive approach. In addition to this, two other approaches that might
 > > be worth pursuing, especially by anyone who is interested in this area and
 > > doesn't feel they have the technical skill to actually work in the development
 > > area, would be to lobby commercial vendors to either make some of their code
 > > open source or to provide low cost licenses and to lobby for project funding to
 > > support OSS projects that are working on VR. A significant amount of the work
 > > done to improve TTS interfaces has been made possible because of effective
 > > lobbying and gaining of support from commercial and government bodies.
 > 
 > The vast majority of the speech recognition efforts today are for IVR, 
 > interactive voice response systems, such as those where you would ask "Weather in
 > Boston" and get a text-to-speech response like "the weather in Boston is hostile 
 > to out-of-towners and not very kind to locals either"
 > 
 > The difference between speech recognition and text-to-speech today is that 
 > usable text-to-speech is easy to create with a team of grad students.  Speech 
 > recognition takes generations of grad students. Witness how little progress has 
 > been made on the Sphinx toolkits since their creation. We have three different
 > engines, all with different characteristics but all working in the same problem
 > space. We don't have proper acoustic modeling. We don't have proper language
 > modeling, etc., etc. I know I sound like a broken record, but these are huge obstacles to
 > general-purpose use.
 >

I only partially agree on both your points. Yes, much of the VR work to date
has been for IVR systems, but as the technology improves, I believe this is
changing. For example, the VR support I mentioned on the Nexus phone is for
dictation of SMS text messages. The new phone system we recently installed at
work has VR capabilities that translate voice messages to text and send them
via SMS and email. I think this type of application of VR will see rapid
development over the next few years and represents the next stage and a
higher level of sophistication past the IVR model with its limited recognition
abilities. 

Yes, this is a difficult problem. However, it is interesting to note that your
arguments are very similar to the ones that were common in the mid-90s. At
that time, software TTS was thought to be too computationally intensive to be
practical for real-time use. Creating voices was considered to be an art that
only a very few people could do and many argued it would be many years before
we had a decent OSS TTS engine available. 

I don't think we will see anything in the OSS world that is of production
quality and able to meet the needs of adaptive tech users next month or even
next year. It is a hard problem and will take considerable resources to
address. However, it may not take as long or as many resources as you fear. It
is very difficult to predict the rate of development in these areas. For all
we know, there may be groundbreaking hardware or algorithms just around the
corner that will completely change the landscape. 

I feel quite positive about developments in this area because I can see
generalised VR becoming more common. The growth in demand/popularity for smart
phones and other small form factor devices is being hampered because
keyboards, both software and hardware, are still the main interface. However,
hardware keyboards are difficult to fit in small devices and software ones are
slow and somewhat inconvenient. Generalised VR will be the commercial
solution in this area. Initially, much of it will be limited IVR type
solutions, but as shown with the Nexus, more general support to dictate
messages, etc., will also increase in popularity. If this technology becomes part
of things like the Android OS, then it will slowly find its way
into the OSS world.

 > I would love to see us license the Nuance NaturallySpeaking toolkit for little
 > or no money for the purpose of developing accessibility interfaces. I
 > can't even get them to return a phone call when I'm calling about a commercial
 > application. If it's for accessibility, they don't even pick up the phone. This
 > tells me it may be time for some guerrilla action. If someone has a spare $2000,
 > I have a scanner and I'm sure we can find some good friends in Europe and Japan.
 > Not that I'm saying or even suggesting we should violate Nuance's copyright, of
 > course, because that would be as wrong as denying disabled people information
 > they need to make themselves more independent and increase their prospects for
 > working.
 > 

There were a number of attempts to get IBM to make their ViaVoice Outloud TTS
engine available as open source and to make the runtime free or at a low
license cost before someone was actually successful in finding a model that was
acceptable to the vendor and provided a reasonable outcome for users. I
suspect it depends on individual tenacity, personality and possibly some
degree of luck. I think having a good understanding of business and the things
that are likely to motivate a business to accept or support a
proposal is also essential. Most businesses are not strongly motivated by
altruistic concerns. Many of them still don't understand OSS - some have even
believed the FUD put out by companies like Microsoft. Some even fear losing
lucrative contracts with anti-OSS vendors if they are seen to support such
initiatives. Trying to convince a large vendor to provide their product at a
lower price for people with a disability is unlikely to gain much traction
unless it boils down to good business sense. The difficult part is
identifying a strong, convincing business case that the vendor will see as a
positive and which has benefits for those with a disability who need such
solutions.

 > > I'm possibly a little more optimistic regarding the future of OSS VR. Voice
 > > recognition is rapidly moving from living in a very specialised domain to
 > > being much more general purpose. This is largely due to the growth in small form
 > > factor devices, such as mobile phones. I've been told that the Google Nexus 1
 > > phone has quite good VR support. This is an indication that decent VR
 > > applications that run in an OSS environment are becoming more prevalent.
 > 
 > Here's a dirty little secret. They don't do the speech recognition in the
 > phone. There isn't enough horsepower or memory space for vocabularies. They ship
 > the audio to a server, which then does the speech recognition (not in real time)
 > and shoves the text back to the cell phone.

I wasn't aware of that. So, if I've got this right, you speak the message you
want to send, this is recorded and sent to a remote server and then a text
version is returned that is sent as the SMS message? It must be fairly close
to real-time as the person I was talking to said that as they speak the
message it is rendered as text on the screen, which enables them to correct
any errors before sending. 

 > 
 > This may unfortunately be our future for disability use. We'll no longer have
 > control over speech recognition engines but instead rent recognition time off 
 > the cloud.

I suspect this could well be the model we are moving to generally and not just
with respect to adaptive technology. From this perspective, provided the costs
are reasonable, we will not be any worse off than other users, who will be
just as dependent on the cloud for all their services.

Of course, this does not address the issue of anyone being or becoming
dependent on technology that we don't have control over or access to. This is
largely the underlying concern that RMS had when forming the FSF. You
could argue that those with a disability are possibly at a greater
disadvantage because the technology is perceived as being more important or
critical to them. However, I think we need to be careful of such arguments.
Yes, technology enables me as someone with a disability to do things, many of
them independently, that were not possible before we had this technology.
However, to argue that my needs are greater or that my pain would be greater
if I lost access to this technology than it would be for someone without a
disability who has lost control or access to some technology they rely on is
dangerous. It runs the risk of creating an 'us and them' paradigm and is based
on subjective value statements that are impossible to quantify. It distracts
from the real issue - ensuring all have access and the ability to control or
own the technology that becomes critical in how we live our lives. 

 > 
 > I really hate the cloud. I understand why pilots hate them as well, because if
 > you fly into the big fluffy thing, the fluffy soft thing can turn really really
 > hard as you run into a mountain hidden within the cloud.  boink!
 > 
 > I'm waiting for the equivalent to happen in the software cloud world.
 >

The 'cloud' is just marketing hype. It's like Web 2.0 - it means nothing and
everything all at the same time. Technically, there is nothing new here. It is
just a swing back to the old 'thin client' and centrally provided service
model that I've seen come and go already during my short career. Yes, it's more
sophisticated in some ways and has some improved architecture - thank god we
have learnt something in the last 40 years! Some of the cloud services being
provided are good, some are bad and some are dangerous. There have already
been major stuff-ups - ask a Sidekick user what they think! However, I don't
feel anyone should feel any more threatened by the cloud than they do regarding
the many proprietary systems they have been putting data into for the last 20
years. As a friend of mine says - "It's all just hemlines; they will go up and
they will go down".

 > > It is
 > > also likely as demand increases for VR solutions that more University research
 > > will occur as it will be seen as something with good commercial potential i.e.
 > > good funding opportunities.
 > 
 > Speech recognition research is aimed at IVR. Funding has plateaued or even 
 > dropped because recognition accuracy is not improving. The techniques have run 
 > out of steam. It will take a radically new approach to put any fire under speech 
 > recognition again. Sometimes I think the only way Nuance is improving
 > NaturallySpeaking is by fixing bugs. I doubt there's any new technology going on
 > inside.
 >

Possibly. I am not up on current research in this area and can only speculate.
I once had similar concerns regarding TTS. Nearly all the research was towards
the development of more natural-sounding voices, usually using the
concatenative approach. While this style of TTS does appear to generate more
human-sounding voices, it also suffers from the limitation that pronunciation
quality falls dramatically as speech rates increase. It sounds wonderful at a
normal speaking rate, but you cannot understand it once you increase
the rate. As a blind user, I'm used to listening at high speech rates. If I had
to listen at a normal speaking rate to all the data I need to process each day,
I would never get things done. As a result, the newer TTS engines are less useful
to me than older systems that use mathematically derived approximations of
speech, which sound less natural but at least can be understood at high
speech rates.

 > > Unfortunately, it is also true that the accessibility benefits of technology
 > > such as VR will all too often be a secondary issue to commercial interests.
 > > There will be a lag time between this technology existing and it being
 > > accessible to those who would really benefit from it. This is probably the
 > > downside of the free market economy where developments are driven by profits.
 > > However, it is also the perceived profits that ensure commercial resources are
 > > invested into understanding the problem and developing workable solutions.
 > > We are still a long way from the sort of society that would put
 > > the accessibility needs before individual and corporate greed. In fact, we are
 > > still a long way from getting mainstream recognition of accessibility issues
 > > to the level they should be, which is why I think lobbying and raising issues
 > > outside the accessibility community is so important.
 > 
 > Yes. It would be interesting to do the calculation, but I think there's a good
 > chance that sticking disabled people on disability and in low-income housing may be
 > cheaper to society than all the efforts put into making software and living
 > spaces accessible. This is also why I advocate for putting disability
 > hooks in every machine (i.e. low cost, with little or no administration) and having
 > every disabled person carry their own machine with a disability user interface (i.e.
 > text-to-speech or speech recognition) so that the cost of enabling a machine for
 > accessibility is lower than it is now.
 > 
 > It's all economics. I think if we can come up with a way that tweaks or
 > leverages economics in our favor, we can make a big difference. If it's strictly
 > "do it because this is right", it will fail. Another example of this in a
 > different field is light pollution. Light pollution is a good idea to control
 > because doing so reduces energy use, makes nighttime driving safer and makes it
 > possible for the elderly to drive, but there is insufficient economic incentive to fix
 > streetlights and high-glare security lighting to make any progress. Therefore
 > any changes based on moral arguments are hard-fought, hard-won battles and are
 > usually overturned when the people driving the argument vanish from the
 > political scene, because economic/business people push back to the status quo
 > (i.e. short-term goal driven).
 >
 > We will suffer the same fate with our arguments if we can't provide a good
 > economic argument in addition to our technical and moral/ethical arguments.

Yep, the moral argument tends to fail because corporate capitalism is largely
amoral. We need to demonstrate strong business cases to justify the outcomes
we want. If a decision is perceived as a good business choice, it is far more
likely to be adopted. However, sometimes we really need to be quite
creative and use a lot of imagination to formulate such a business case.

-- 
Tim Cross
tcross at rapttech.com.au

There are two types of people in IT - those who do not manage what they 
understand and those who do not understand what they manage.