UK Speech Conference 2022
5-6 September 2022 — Edinburgh, United Kingdom

Keynote Talks

Abstracts and Bios

Prof Naomi Harte

Trinity College Dublin

Multimodal Speech – embracing the iceberg!

Abstract: This talk will consider the multimodal nature of speech and speech technology. Human speech communication is extremely rich. We use many elements to communicate, from words to gestures and eye gaze, and seamlessly interpret these many cues in our conversations. How can we exploit this in technology? In my talk, I'll look at how visual and linguistic information can be integrated into deep learning frameworks for audio-visual speech recognition and turn-taking prediction. I'll also explore how conversational interaction online can be challenging due to disruptions to the cues we usually rely on, and consider whether multimodal approaches can help.

Bio: Naomi is Professor in Speech Technology in the School of Engineering at Trinity College Dublin. She is Co-PI and a founding member of the ADAPT SFI Centre in Ireland. In ADAPT, she has led a major research theme centred on multimodal interaction, involving researchers from universities across Ireland, and was instrumental in developing the future vision for the Centre for 2021-2026. She is also a lead academic in the Sigmedia Research Group in the School of Engineering. Prior to starting her lectureship at TCD in 2008, Naomi worked in high-tech start-ups in the field of DSP systems development, including her own company. She also previously worked at McMaster University in Canada. She was a Visiting Professor at ICSI in 2015, and became a Fellow of TCD in 2017. She earned a Google Faculty Award in 2018 and was shortlisted for the AI Ireland Awards in 2019. She currently serves on the Editorial Board of Computer Speech and Language, and will chair Interspeech 2023 in Dublin.

Dr Jennifer Williams

University of Southampton

Speech Privacy: Where Are We Going and How to Get There?

Abstract: Audio recording devices and speech technologies are becoming increasingly commonplace in everyday life. At the same time, commercialised speech and audio technologies do not provide consumers with a range of privacy choices. Even where privacy is regulated or protected by law, technical solutions to privacy assurance and enforcement fall short. Within the speech research community there are no standard technical definitions of privacy and security. However, privacy is usually taken to refer to "controlling access to information", whereas security is often taken to refer to "how information can be used (or misused)". This talk highlights several critical challenges to developing trustworthy speech and audio technologies in the context of privacy and security. In particular, a new type of speech privacy will be introduced as an emerging research area: content-based privacy. True progress toward trustworthy speech and audio technology will require an interdisciplinary approach that combines perspectives from scientific, legal, artistic, and social domains. Interdisciplinary approaches are especially important because issues of privacy and security are sometimes well known among the technical researchers who create the technologies, yet only become known to other scholars once the technology has been commercialised. In fact, with speech technology moving "at the speed of light" lately, timing is critical. This gap must be closed for progress on speech technology that moves us toward a freer and more just world.

Bio: Dr Jennifer Williams is a postdoctoral Research Fellow at the University of Southampton. She currently works in two main areas: citizen-centric AI systems and trustworthy autonomous systems. One aspect of her research explores the creation of trustworthy, private, and secure speech/audio solutions for smart buildings that can contribute to accessibility as well as resource management and "low-carbon comfort". Dr Williams is also the PI of a small interdisciplinary project between the University of Southampton and the University of Nottingham, which explores regulation and policy development in the context of speech applications for the creative industries. She completed her PhD at the University of Edinburgh (2022) in the area of representation learning and speech signal disentanglement. She applied that work to a variety of speech technology applications (voice conversion, speech synthesis, anti-spoofing, naturalness assessment, and privacy). She also holds a position in industry as a Senior Speech Scientist at MyVoice AI, where she develops ultra-low-power speech technology to run on edge devices. Dr Williams was previously a staff member at MIT Lincoln Laboratory for five years, developing prototype speech and text technology for the US Government. She is a member of IEEE and ISCA, serves as a committee member of the ISCA-PECRAC group, and co-organises ISCA SPSC-SIG events. She is a reviewer for multiple conferences involving AI, text, speech, and multimedia. She holds an MScR in Data Science from the University of Edinburgh (2018), an MS in Computational Linguistics from Georgetown University (2012), and a BA in Applied Linguistics, magna cum laude, from Portland State University (2009).

Dr Joanne Cleland

University of Strathclyde

Using Ultrasound to Image the Articulators in the Speech Therapy Clinic

Abstract: Speech sound disorders are common in childhood and can affect the education and wellbeing of children. This talk will first provide an overview of using medical ultrasound to image the articulators for assessment and treatment of speech sound disorders in children. Using this technique, we are able to see the tongue moving in real-time and use this for both the assessment of speech sound disorders and as a biofeedback tool for intervention. In this talk I will make a case for how imaging the articulators leads to more precise diagnosis of speech sound disorders and provides insight into the underlying nature of such disorders. I will also explore how in the future we might automate the classification of speech sound disorders using dynamic ultrasound, leading not only to quicker but also more precise diagnosis.

Bio: Joanne Cleland is a Reader in Speech and Language Therapy at the University of Strathclyde in Glasgow. Her research focuses on using instrumental techniques, particularly ultrasound tongue imaging, for the assessment of speech sound disorders in children. You can find out more about her work by following her on Twitter: @DrJoanneCleland