UK Speech 2022

Technical Programme

Monday 5th September & Tuesday 6th September, 2022

UK Speech 2022 will be held at the University of Edinburgh Central Campus. Registration, posters, coffee and lunches will be on the ground floor of the Informatics Forum, while keynote talks and oral sessions will be in Room G.152, Teviot Lecture Theatre (Doorway 5, Old Medical School).

You can find abstracts for the presentations in the UK Speech 2022 abstract book.

Monday, 5th September 2022

Mon 12:00-13:00 - Lunch and Registration

Atrium/G.07, Informatics Forum

Mon 13:00-13:15 - Welcome

Room G.152, Teviot Lecture Theatre (Doorway 5, Old Medical School)

Welcome and announcements
UK Speech organisers

Mon 13:15-14:15 - Keynote A

Room G.152, Teviot Lecture Theatre (Doorway 5, Old Medical School)
Chair: Dr Korin Richmond

Using Ultrasound to Image the Articulators in the Speech Therapy Clinic
Dr Joanne Cleland, University of Strathclyde

Mon 14:30-15:30 - Poster Session A

Atrium/G.07, Informatics Forum

1. Text-free non-parallel many-to-many voice conversion using normalising flows
Thomas Merritt, Abdelhamid Ezzerg, Piotr Biliński, Magdalena Proszewska, Kamil Pokora, Roberto Barra-Chicote and Daniel Korzekwa

2. Leveraging Explicit Acoustic Features for Controllable TTS
Tian Huey Teh, Devang S Ram Mohan, Vivian Hu, Alexandra Torresquintero, Zack Hodari, Tomás Gómez Ibarrondo, Christopher G. R. Wallis and Simon King

3. Treating the noisy phase issue in speech enhancement using complex ratio masks
Georgiana-Elena Sfeclis

4. Comparing human emotion perception and automatic emotion recognition of user turns in human-machine dialogues
Norbert Braunschweiler, Rama Doddipatla, Simon Keizer and Svetlana Stoyanchev

5. Language Modelling with Recurrent Neural Networks for Code-Switching
Olga Iakovenko and Thomas Hain

6. Speaker Diarization: Importance of the Modulation Spectrum and Incorporating Uncertainty Modelling
Simon McKnight

7. Modelling trajectories of human speech articulators using general Tau theory
Benjamin Elie, David Lee and Alice Turk

8. Multi-sentence TTS with Expressive and Coherent Prosody
Marcel Granero-Moya, Amith Nagaraj, Peter Makarov, Ammar Abbas, Mateusz Lajszczak, Arnaud Joly, Sri Karlapati, Alexis Moinet, Thomas Drugman and Penny Karanasou

9. Investigating perception of spoken dialogue acceptability through surprisal
Sarenne Wallbridge, Peter Bell and Catherine Lai

10. Peter 2.0: Building a Cyborg
Matthew Aylett, Ari Shapiro, Sai Prasad, Lama Nachman, Stacy Marsella and Peter Scott-Morgan

11. Monitoring sleep disordered breathing of long-Covid patients at home using acoustic AI technology
Gerardo Roa Dabike, Ning Ma and Guy Brown

12. Incremental Disfluency Detection for Spoken Learner English
Lucy Skidmore and Roger K. Moore

13. Audio-Based Computational Analysis of Podcast Expressivity
Shahar Elisha, Emmanouil Benetos, Jussi Karlgren and Mariano Beguerisse-Diaz

14. Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription
Xianrui Zheng, Chao Zhang and Phil Woodland

15. Comparing Human and Machine Perceptions of Voice Anonymisation
Farida Yusuf, Dan Kumpik, Matt Clifford, Jonathan Erskine and Jennifer Williams

16. ABAIR-ÉIST: recent progress in Irish language low-resource ASR development
Liam Lonergan, Christian Saam, Mengjie Qian, Neasa Ní Chiaráin, Christer Gobl and Ailbhe Ní Chasaide

17. A summary of the GENEA Challenge 2022 on co-speech gesture generation
Youngwoo Yoon, Pieter Wolfert, Taras Kucherenko, Carla Viegas, Teodor Nikolov, Mihail Tsakov and Gustav Eje Henter

18. Neural formant synthesis – a proving ground for speech-synthesis control
Gustavo Teodoro Döhler Beck, Ulme Wennberg, Zofia Malisz and Gustav Eje Henter

19. Empowering neural TTS with HMMs to get the best of both worlds
Shivam Mehta, Harm Lameris, Éva Székely, Jonas Beskow and Gustav Eje Henter

20. Unsupervised data selection for Speech Recognition with contrastive loss ratios
Chanho Park, Rehan Ahmad and Thomas Hain

21. Domain-Informed Probing of wav2vec 2.0 Embeddings for Phonetic Features
Patrick Cormac English, Julie Carson-Berndsen and John Kelleher

22. Self-supervised Graphs for Audio Representation Learning with Limited Labeled Data
Amir Shirian, Krishna Somandepalli and Tanaya Guha

Mon 15:30-16:00 - Coffee

Atrium/G.07, Informatics Forum

Mon 16:00-17:00 - Keynote B

Room G.152, Teviot Lecture Theatre (Doorway 5, Old Medical School)
Chair: Prof Simon King

Speech Privacy: Where Are We Going and How to Get There?
Dr Jennifer Williams, University of Southampton

Mon 18:30-23:00 - Workshop Dinner

The Scottish National Gallery

18:30, Drinks Reception at the Scottish National Gallery

20:00, Dinner at the Scottish Cafe & Restaurant at the Scottish National Gallery

Plus an after-dinner ceilidh at the Scottish Cafe & Restaurant - bring your dancing shoes!

Tuesday, 6th September 2022

Tue 09:30-10:30 - Oral Session A

Room G.152, Teviot Lecture Theatre (Doorway 5, Old Medical School)
Chair: Prof Jon Barker

1. Evaluating watchability for video localisation
Zack Hodari, Tian Huey Teh, Vivian Hu, Tomás Gómez Ibarrondo, Devang S Ram Mohan, Alexandra Torresquintero, Chris Wallis, James Leoni and Simon King

2. Transforming adult to child speech for dubbing
Protima Nomo Sudro, Anton Ragni and Thomas Hain

3. Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR
Ondrej Klejch, Electra Wallington and Peter Bell

Tue 10:30-11:30 - Keynote C

Room G.152, Teviot Lecture Theatre (Doorway 5, Old Medical School)
Chair: Dr Kate Knill

Multimodal Speech – Embracing the Iceberg!
Prof Naomi Harte, Trinity College Dublin

Tue 11:30-12:00 - Coffee

Atrium/G.07, Informatics Forum

Tue 12:00-13:00 - Poster Session B

Atrium/G.07, Informatics Forum

1. Speaker identification in courtroom contexts: performance of human listeners compared to a state-of-the-art forensic voice comparison system
Philip Weber, Nabanita Basu, Agnes S. Bali, Claudia Rosas-Aguilar, Gary Edmond, Kristy A. Martire and Geoffrey Stewart Morrison

2. Automatic generation of accented speech using phonetic features
Margot Masson, Anthony Ventresque and Julie Carson-Berndsen

3. Exploring hidden speech representations of self-supervised automatic speech recognition models
Tamara Soloveva, Ramon Sanabria and Peter Bell

4. AVSE Challenge: Audio-visual Speech Enhancement Challenge
Lorena Aldana, Cassia Valentini-Botinhao, Ondrej Klejch, Mandar Gogate, Kia Dashtipour, Amir Hussain and Peter Bell

5. Leveraging linguistic knowledge for accent robustness of end-to-end models
Andrea Carmantini and Peter Bell

6. A Biological Understanding of Dramatic Speech through Synthesis
Emily Lau, Brechtje Post and Kate Knill

7. Modelling Pronunciation Variation in Different Spoken Englishes
Emma O'Neill and Julie Berndsen

8. Using Utterance-Specific Dirichlet Priors to Model Uncertainty in Emotion Class Labels
Wen Wu, Chao Zhang, Xixin Wu and Philip C. Woodland

9. PSE-Net: Real-time Personalized Sound Enhancement
Abhinav Mehrotra, Alberto Gil C. P. Ramos, Nic Lane and Sourav Bhattacharya

10. Conversational Speech vs. Sustained Phonation for Diagnosis of Parkinson’s Disease
Steve Beet, Phill Restall and Ladan Baghai-Ravary

11. Tree-Constrained Pointer Generator for End-to-end Contextual ASR
Guangzhi Sun, Chao Zhang and Phil Woodland

12. Canonical-Correlated Graph Neural Network for Multimodal Energy-Efficient Speech Enhancement
Leandro Aparecido Passos Junior, Ahmed Khubaib, Mohsin Raza, Amir Hussain and Ahsan Adeel

13. CognoSpeak: a Cognitive Health Assessment Tool (CcHAT)
Nathan Pevy, Heidi Christensen and Daniel Blackburn

14. Attention Forcing for Speech Synthesis
Qingyun Dou and Mark Gales

15. Multimodal Emotion Recognition in Conversations
Jiachen Luo, Joshua Reiss and Huy Phan

16. Addressing user concerns about multi-modal hearing technology
Dorothy Hardy, Michael Akeroyd, Adeel Hussain, Peter Bell and Amir Hussain

17. Model for Assessor Bias in Automatic Pronunciation Assessment
Jose Antonio Lopez Saenz and Thomas Hain

18. A siamese RNN architecture to detect deliberate imitation and phonetic convergence in L2-speech
Byron Z. Yuan, Aldo Pastore, Dorina De Jong, Hao Xu, Luciano Fadiga and Alessandro D'Ausilio

19. Using conversational data to improve prosody in Text-to-Speech synthesis
Johannah O'Mahony, Catherine Lai and Simon King

20. Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion
Muhammad Umar Farooq, Darshan Adiga Haniya Narayana and Thomas Hain

21. RoomReader: A Multimodal Corpus of Online Multiparty Conversational Interactions
Justine Reverdy, Sam O'Connor Russell, Louise Duquenne, Diego Garaialde, Benjamin Cowan and Naomi Harte

22. Person-specific automatic speaker recognition: understanding the behaviour of individuals for applications of ASR
Vincent Hughes, Paul Foulkes, Philip Harrison, Jessica Wormald, Chenzi Xu, David van der Vloed and Finnian Kelly

23. Alternative Evaluation Methods of Latent Representations of Speech Audio
Eimear Stanley, Yumnah Mohamied and Peter Bell

Tue 13:00-14:00 - Lunch

Atrium/G.07, Informatics Forum

Tue 14:00-15:00 - Poster Session C

Atrium/G.07, Informatics Forum

1. Autovocoder: Vocoding Without Spectrograms
Jacob Webber and Simon King

2. Exploring Prosody Transfer in Speech Synthesis
Atli Sigurgeirsson and Simon King

3. Code-switched Text Generation on Parallel Data
Jie Chi and Brian Lu

4. Voice Puppetry for the People: Harnessing Dramatic Performance for Speech Synthesis
Matthew Aylett, Skaiste Butkute and Christopher Pidcock

5. Improving diagnostic procedures for epilepsy through automated recording and analysis of patients’ history
Nathan Pevy, Heidi Christensen, Traci Walker and Markus Reuber

6. Deliberation Based Multi-Pass Speech Synthesis
Qingyun Dou and Mark Gales

7. Exploring Novel Methods for Automatic Speech Recogniser Based Intelligibility Prediction
Zehai Tu, Ning Ma and Jon Barker

8. View-Specific Assessment of L2 Spoken English
Stefano Banno, Bhanu Balasu, Mark Gales, Kate Knill and Konstantinos Kyriakopolous

9. Why is My Social Robot so Slow? How a Conversational Listener can Revolutionize Turn-Taking
Matthew Aylett, Andrea Carmantini and David Braude

10. Creating New Voices using Normalizing Flows
Piotr Biliński, Thomas Merritt, Abdelhamid Ezzerg, Kamil Pokora, Sebastian Cygert, Kayoko Yanagisawa, Roberto Barra-Chicote and Daniel Korzekwa

11. Phonetic Analysis of Self-supervised Representations of English Speech
Dan Wells, Hao Tang and Korin Richmond

12. Comparison of Audio-Visual Speech Enhancement Models with Hearing Aid Key Performance Indicators
Jasper Kirton-Wingate, Mandar Gogate, Amir Hussain and Tassadaq Hussain

14. Simulation of Teacher-Learner Interaction in English Language Pronunciation Learning
Elaf Islam and Thomas Hain

15. A New Benchmark Multi-modal Speech Corpus With Two Target Speakers
Jasper Kirton-Wingate, Adeel Hussain, Amir Hussain, Kia Dashtipour, Mandar Gogate and Peter Derleth

16. Gender Bias and Universal Substitution Adversarial Attacks on Grammatical Error Correction Systems for Automated Assessment
Vyas Raina and Mark Gales

17. Is there an auditory uncanny valley for synthesised speech?
Alice Ross, Catherine Lai and Martin Corley

18. Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora
Yuanchao Li, Yumnah Mohamied, Peter Bell and Catherine Lai

19. Cross lingual wav2vec finetuning in mutually intelligible language pairs
Jeffrey Josanne Michael, Toby Godwin and Oscar Saz

20. Phonetically Guided Transfer Learning for Low-Resource Accented English
Edward Storey and Naomi Harte

21. Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs
Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker and Zoran Cvetkovic

22. Joint Modelling of Automatic Speaker Verification and Spoofing Countermeasure Systems
Poppy Welch and Jennifer Williams

Tue 15:00-16:00 - Oral Session B

Room G.152, Teviot Lecture Theatre (Doorway 5, Old Medical School)
Chair: Prof Julie Berndsen

1. The 2nd Clarity Enhancement Challenge: A machine learning challenge for hearing aid speech intelligibility enhancement
Will Bailey, Michael Akeroyd, Jon Barker, Trevor Cox, John Culling, Simone Graetzer, Graham Naylor, Zuzanna Podwińska and Zehai Tu

2. Back to the Future: Extending the Blizzard Challenge 2013
Sébastien Le Maguer, Simon King and Naomi Harte

3. Fine Grained Spoken Document Summarization Through Text Segmentation
Samantha Kotey, Rozenn Dahyot and Naomi Harte

Tue 16:00-16:15 - Workshop Close

Room G.152, Teviot Lecture Theatre (Doorway 5, Old Medical School)

Future plans and farewell!
UK Speech organisers
