Clinical Applications of Speech Technology

  • Clinical Applications of Speech Technology
    Phil Green
    Speech and Hearing Research Group
    Dept of Computer Science
    University of Sheffield
    [email protected]

    The University of Sheffield / Department of Marketing and Communications

  • Talk Overview
    SPandH - Speech and Hearing @ Sheffield
    The CAST group
    Building Automatic Speech Recognisers: conventional methodology
    ASR for clients with speech disorders
    Kinematic Maps
    Voice-driven Environmental Control
    VIVOCA
    Customising Voices
    Future Directions

  • SPandH
    Phonetics & Linguistics
    Hearing & Acoustics
    Electrical Engineering & Signal Processing
    Speech & Language Therapy

  • Prof Mark Hawley, School of Health and Related Research: Assistive Technology
    Prof Pam Enderby, Institute of General Practice and Primary Care, University of Sheffield: Speech Therapy
    Prof Phil Green and Prof Roger K Moore, Speech and Hearing Research Group, Department of Computer Science, University of Sheffield: Speech Technology
    Dr Stuart Cunningham, Department of Human Communication Sciences, University of Sheffield: Speech Perception, Speech Technology
    Contact: [email protected]

  • Conventional Automatic Speech Recogniser Construction
    Standard technique uses generative statistical models:
    Each state is characterised by a Gaussian mixture distribution over the components of the acoustic vector x.
    Parameters of the distributions are estimated in training (EM, Baum-Welch).
    All this is the acoustic model. There will also be a language model.
    Decoding finds the model and state sequence most likely to generate X.
    Training is based on a large pre-recorded speaker-independent speech corpus.
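
The decoding step on this slide can be illustrated with a minimal sketch, assuming hidden Markov word models whose states emit one-dimensional Gaussian mixtures. The model parameters, the `LAMP` and `RADIO` names, and the scalar "acoustic vectors" are all hypothetical, chosen only to keep the example small; real systems use multi-dimensional features and trained parameters.

```python
import math

NEG_INF = float("-inf")  # log-probability of a disallowed transition


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_gmm(x, mixture):
    """Log density of a mixture given as a list of (weight, mean, var)."""
    logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in mixture]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))


def viterbi(obs, log_init, log_trans, emissions):
    """Return the most likely state sequence and its log score."""
    n = len(emissions)
    score = [log_init[s] + log_gmm(obs[0], emissions[s]) for s in range(n)]
    back = []
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            # Best predecessor of state s at this frame.
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + log_gmm(x, emissions[s]))
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, max(score)


def recognise(obs, models):
    """Pick the word model whose best state sequence scores highest."""
    return max(models, key=lambda w: viterbi(obs, models[w]["init"],
                                             models[w]["trans"], models[w]["emit"])[1])


def left_to_right(means):
    """Hypothetical two-state left-to-right model with two-component mixtures."""
    half = math.log(0.5)
    return {
        "init": [0.0, NEG_INF],
        "trans": [[half, half], [NEG_INF, 0.0]],
        "emit": [[(0.5, m, 1.0), (0.5, m + 1.0, 1.0)] for m in means],
    }


LAMP = left_to_right([0.0, 5.0])     # illustrative parameters only
RADIO = left_to_right([10.0, 15.0])  # illustrative parameters only
```

Calling `recognise(observations, {"lamp": LAMP, "radio": RADIO})` scores each word model with `viterbi` and returns the best-scoring word, which is the isolated-word case of "finding the model and state sequence most likely to generate X".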

  • Dysarthria
    Loss of control of speech articulators
    Stroke victims, cerebral palsy, MS...
    Affects 170 per 100,000 population
    Severe cases unintelligible to strangers (example words: channel, lamp, radio)
    Often accompanied by physical disability

  • STARDUST: ASR for Dysarthric Speakers
    NHS NEAT Funding
    Environmental control
    Small vocabulary, isolated words
    Speaker-dependent
    Sparse training data
    Variable training data

  • STARDUST Methodology
    Initial recordings

  • STARDUST training results
    ECS trial: halved the average time to execute a command

  • STARDUST Consistency Training

  • STARDUST Clinical Trial

  • OPTACIA: Kinematic Maps
    Pronunciation Training Aid
    EC Funding
    Speech acoustics mapped to x,y position in map window in real time
    Mapping by trained Neural Net
    Customise for exercises and clients
    [Diagram: speech, signal processing, ANN mapping]
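
The idea of mapping speech acoustics to an x,y position with a trained neural net can be sketched as below. This is not the OPTACIA implementation: the network size, the use of two formant frequencies (in kHz) as input features, the approximate formant values, and the target map positions are all assumptions made for illustration.

```python
import math
import random


def init_net(n_in, n_hidden, seed=0):
    """Small one-hidden-layer network with two outputs (x and y)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(2)],
        "b2": [0.0, 0.0],
    }


def forward(net, feats):
    """Return hidden activations and the (x, y) position inside the unit square."""
    hidden = [math.tanh(sum(w * f for w, f in zip(row, feats)) + b)
              for row, b in zip(net["w1"], net["b1"])]
    xy = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
          for row, b in zip(net["w2"], net["b2"])]
    return hidden, xy


def loss(net, data):
    """Mean squared distance between predicted and target map positions."""
    total = 0.0
    for feats, target in data:
        _, xy = forward(net, feats)
        total += sum((o - t) ** 2 for o, t in zip(xy, target))
    return total / len(data)


def train(net, data, epochs=2000, lr=0.5):
    """Plain per-sample gradient descent with backpropagation."""
    for _ in range(epochs):
        for feats, target in data:
            hidden, xy = forward(net, feats)
            # Error signals at the sigmoid outputs and the tanh hidden units.
            d_out = [(o - t) * o * (1.0 - o) for o, t in zip(xy, target)]
            d_hid = [(1.0 - h * h) * sum(d_out[k] * net["w2"][k][j] for k in range(2))
                     for j, h in enumerate(hidden)]
            for k in range(2):
                for j, h in enumerate(hidden):
                    net["w2"][k][j] -= lr * d_out[k] * h
                net["b2"][k] -= lr * d_out[k]
            for j in range(len(hidden)):
                for i, f in enumerate(feats):
                    net["w1"][j][i] -= lr * d_hid[j] * f
                net["b1"][j] -= lr * d_hid[j]
    return net


# Hypothetical training set: (F1, F2) in kHz mapped to a chosen map position.
VOWELS = [
    ((0.27, 2.29), (0.2, 0.8)),  # roughly /i/
    ((0.73, 1.09), (0.5, 0.2)),  # roughly /a/
    ((0.30, 0.87), (0.8, 0.8)),  # roughly /u/
]

net = init_net(n_in=2, n_hidden=4)
loss_before = loss(net, VOWELS)
train(net, VOWELS)
loss_after = loss(net, VOWELS)
```

In use, each incoming frame of features would be passed through `forward` and the resulting position drawn in the map window; changing the target positions is one way the map could be customised for a given exercise or client.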

  • Example: Vowel Map

  • SPECS: Speech-Driven Environmental Control Systems
    NHS HTD Funding
    Industrial exploitation
    STARDUST on balloon board

  • VIVOCA - Voice Input Voice Output Communication Aid
    NHS NEAT funding
    Assists communication with strangers
    Client: buy tea [unintelligible]
    VIVOCA: A cup of tea with milk and no sugar please [intelligible synthesised speech]
    Runs on a PDA
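
The exchange on this slide suggests a three-stage flow: recognise a short keyword sequence, expand it into a full phrase, and speak the phrase with a synthesiser. The sketch below shows one way that flow could be wired together. The lookup table and the function names are hypothetical, and the recogniser and synthesiser are passed in as plain callables rather than real components.

```python
# Hypothetical phrase table: recognised keyword sequence -> full message.
MESSAGES = {
    ("buy", "tea"): "A cup of tea with milk and no sugar please",
}


def build_message(keywords, messages=MESSAGES):
    """Return the stored phrase for a keyword sequence, or None if unknown."""
    return messages.get(tuple(keywords))


def vivoca_turn(audio, recognise, synthesise, messages=MESSAGES):
    """Run one turn: speech in, keywords, full phrase, synthesised speech out.

    `recognise` maps audio to a list of keywords and `synthesise` maps text
    to output speech; both are stand-ins supplied by the caller.
    """
    keywords = recognise(audio)
    text = build_message(keywords, messages)
    if text is None:
        return None  # nothing to say for an unrecognised sequence
    return synthesise(text)
```

Keeping the phrase table separate from the recogniser means the small, speaker-dependent vocabulary stays fixed while the spoken output can be as long and as polite as the client wishes.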

  • Voices for VIVOCA
    It is possible to build voices from training data
    A local voice is preferable
    Yorkshire voices: Ian McMillan, Christa Ackroyd

  • Concatenative synthesis
    [Diagram: input data (speech recordings), unit segmentation, unit database; text input, unit selection, concatenation + smoothing, synthesised speech]
    Festvox: http://festvox.org/
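
The unit selection stage in the diagram is commonly framed as a search for the sequence of database units that minimises a target cost plus a join cost. The sketch below illustrates that search with dynamic programming. It is a simplified illustration and not Festvox code: each unit carries only a phone label and a single pitch value, and both cost functions are assumptions.

```python
def select_units(targets, database, join_weight=1.0):
    """Choose one database unit per target so that total cost is minimal.

    targets:  list of (phone, desired_pitch)
    database: list of dicts with keys "id", "phone", "pitch"
    Target cost is the pitch mismatch with the target; join cost is the
    pitch jump between neighbouring units (both are illustrative choices).
    """
    candidates = [[u for u in database if u["phone"] == phone] for phone, _ in targets]
    if any(not c for c in candidates):
        raise ValueError("no unit in the database for some target phone")

    costs = [abs(u["pitch"] - targets[0][1]) for u in candidates[0]]
    back = []
    for i in range(1, len(targets)):
        new_costs, ptrs = [], []
        for unit in candidates[i]:
            target_cost = abs(unit["pitch"] - targets[i][1])
            # Cost of arriving at this unit from each possible predecessor.
            arrive = [costs[k] + join_weight * abs(unit["pitch"] - prev["pitch"])
                      for k, prev in enumerate(candidates[i - 1])]
            k_best = min(range(len(arrive)), key=arrive.__getitem__)
            new_costs.append(arrive[k_best] + target_cost)
            ptrs.append(k_best)
        costs = new_costs
        back.append(ptrs)

    # Trace back from the cheapest final unit.
    j = min(range(len(costs)), key=costs.__getitem__)
    total = costs[j]
    chosen = [j]
    for ptrs in reversed(back):
        j = ptrs[j]
        chosen.append(j)
    chosen.reverse()
    return [candidates[i][j]["id"] for i, j in enumerate(chosen)], total


# Hypothetical unit database for the word "tea" (/t iy/).
UNITS = [
    {"id": "t1", "phone": "t", "pitch": 100},
    {"id": "t2", "phone": "t", "pitch": 140},
    {"id": "iy1", "phone": "iy", "pitch": 105},
    {"id": "iy2", "phone": "iy", "pitch": 150},
]
```

With targets `[("t", 120), ("iy", 110)]`, both /t/ units are equally far from the desired pitch, so the join cost decides: `t1` followed by `iy1` gives the smoothest pitch transition. A larger recorded database gives the search more candidates per phone, which is one reason this approach needs a lot of data.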

  • Concatenative synthesis
    High quality
    Natural sounding
    Sounds like original speaker
    Need a lot of data (~600 sentences)
    Can be inconsistent
    Difficult to manipulate prosody

  • HMM synthesis

  • HMM synthesis: adaptation
    [Diagram: training (speech recordings, average speaker model); adaptation (input data: speech recordings, adapted speaker model); synthesis (text input, synthesised speech)]
    HTS: http://hts.sp.nitech.ac.jp/

  • HMM synthesis
    Consistent
    Intelligible
    Easier to manipulate prosody
    Needs relatively little adaptation data (>5 sentences)
    Less natural than concatenative
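
One way to see why adaptation can work with very little data is maximum a posteriori (MAP) re-estimation of a Gaussian mean: the average speaker model acts as a prior, and the new speaker's frames pull the mean towards their own average in proportion to how many there are. This is offered only as a simplified stand-in; the adaptation procedures actually used in HTS are not reproduced here, and the function name and the prior weight `tau` are assumptions.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP update of one Gaussian mean.

    prior_mean: mean vector from the average speaker model
    frames:     feature vectors from the target speaker assigned to this Gaussian
    tau:        prior weight (hypothetical value); larger values trust the
                average speaker model more
    Returns (tau * prior_mean + sum(frames)) / (tau + number of frames).
    """
    n = len(frames)
    dims = len(prior_mean)
    sums = [sum(frame[d] for frame in frames) for d in range(dims)]
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dims)]
```

With no adaptation frames the function returns the average speaker mean unchanged, and with many frames it approaches the target speaker's own mean, so the adapted model degrades gracefully when only a handful of sentences are available.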

  • Personalisation for individuals with progressive speech disorders
    Voice banking: before deterioration
    Capturing the essence of a voice: during deterioration

  • HMM synthesis: adaptation for dysarthric speech
    [Diagram: training (speech recordings, average speaker model); adaptation (input data: speech recordings, adapted speaker model); synthesis (text input, synthesised speech)]
    Duration, phonation and energy information
    HTS: http://hts.sp.nitech.ac.jp/

  • Future directions
    Personal Adaptive Listeners (PALS)
    Home Service
    Companions

  • The PALS Concept
    A PAL is a portable (PDA, wearable...) device which you own
    Your PAL is like your valet
    It knows a lot about you:
      The way you speak, the words you like to use
      Your interests, contacts, networks
    You talk with it:
      The knowledge makes conversational dialogues viable
    It does things for you:
      Bookings, appointments, reminders
      Communication
      Access to services...
    It learns to do a better job:
      By explicit training (this is how I refer to things, these are the names I use...): USER-AS-TEACHER
      By automatic adaptation: acoustic models, language models, dialogue models
