
Fonetik 2022

Welcome to the 33rd Swedish phonetics meeting, Fonetik 2022. The meeting takes place from lunch to lunch, 13-15 June 2022, at KTH Speech, Music & Hearing, KTH Royal Institute of Technology, Stockholm (see "Venue" for more info).
On this website you will find all the information about the meeting. The content is updated continuously in the run-up to the meeting and for some time afterwards; after that we freeze the content but keep the website as an archive.
As usual, the meeting invites contributions from all areas related to phonetics and speech, for example phonology, linguistics, computational linguistics, speech therapy, voice research, music and song, discourse analysis and conversation analysis, psychology, cognitive science, speech and language technology, signal processing, and machine learning. In line with the meeting's tradition, we have especially encouraged contributions from students and researchers with a connection to phonetics who are not yet familiar with the Swedish phonetics meeting and its associated network of researchers.
The contributions at the phonetics meeting cover Swedish, other Nordic languages, and more distant languages, and the meeting is regularly visited by researchers from neighbouring countries and not infrequently by participants from further afield. The language for presentations and plenary discussions has been English since the mid-2010s, which provides better opportunities for both visitors and doctoral students from other language environments to participate.

Venue

The meeting is hosted by Speech, Music & Hearing (TMH) at Lindstedtsvägen 24, 5th floor, on KTH's campus on Valhallavägen in Stockholm. Most sessions take place in room Fantum.

Schedule

11:50 - 13:00 Lunch
Lunch is drop-in, for those who wish to join, at the restaurant Östra Station close to the KTH subway station. We have not booked tables, but there should be room. The cost of lunch is not included in Fonetik 2022.



13:00 - 13:20

Emergent speech behaviours/speech articulation

13:20 - 14:20
David House

Birdsong as model for infants’ emergent speech – a brief introduction

Axel G. Ekström

Rapid movements at segment boundaries – preliminary reports on manner

Malin Svensson Lundmark

The time course of onset CV coarticulation

Tugba Lulaci, Mechtild Tronnier, Pelle Söderström, Mikael Roll


14:20 - 15:00


The relation of phonetics to other fields and applications

15:00 - 16:00
Jens Edlund
Sofia Strömbergsson
Speech and Language Pathology, KI
Christine Ericsdotter Nordgren
Språkstudion, SU
Zofia Malisz
Speech, Music & Hearing, KTH
Morgan Fredriksson
Nagoon/Liquid Media



16:00 - 16:20

History of phonetics and speech science

16:20 - 18:00
Joakim Gustafson

Gunnar Fant

Johan Malmstedt

Hypotheses should better be well-founded and not just testable

Hartmut Traunmüller
17:00 - 18:00

Another half a century in speech research

Björn Granström, Rolf Carlson
This is an extended presentation and includes elements other than pure presentation.



18:30 - 21:00



Forgotten skills: spectrogram reading

9:00 - 9:40
David House
This includes the reveal of the winner of the great spectrogram reading contest!


09:40 - 10:10

Voices of humans and machines

10:10 - 11:30
Mikael Roll

The prosody of surprise questions and exclamations as compared to information-seeking questions in Estonian

Eva-Liina Asu

Creaky voice in South Swedish accent 1

Anna Hjortdal

Deep learning for phonetically meaningful speech manipulation

Gustavo Teodoro Döhler Beck, Ulme Wennberg, Zofia Malisz, Gustav Eje Henter

The voice-mapping system FonaDyn – overview and demo

Sten Ternström
This is an extended presentation and includes elements other than pure presentation.


11:50 - 13:00 Lunch


Speech production by humans and machines

13:00 - 14:20
Petra Bodén

Spell new sounds with new letters. A study of how Swedish L2 learners’ spelling is affected by their L1

Cajsa Fransson, Malin Svensson Lundmark

Sardin: speech-oriented text processing

Christina Tånnander, Jens Edlund

Learning fast with fewer data samples using Neural HMMs

Shivam Mehta, Harm Lameris, Éva Székely, Jonas Beskow, Gustav Eje Henter

Spontaneous neural HMM TTS with prosodic feature modification

Harm Lameris, Shivam Mehta, Gustav Eje Henter, Ambika Kirkland, Birger Moëll, Jim O’Regan, Joakim Gustafson, Éva Székely


14:20 - 15:00


Perception of human and machine speech

15:00 - 16:00
Mechtild Tronnier

Phonetic and phonological variation in vowel discrimination performance: effect of Swedish vowel categories and dialects

Renata Kochančikaitė, Mikael Roll

Mapping specific characteristics of spoken text to listener ratings

Christina Tånnander, Jens Edlund

Formants in text-to-speech systems - comparing TTS voices of Blizzard Challenge 2013

Ayushi Pandey, Sébastien Le Maguer, Julie Carson-Berndsen, Naomi Harte



16:00 - 16:20

Corpora, models and tools 1

16:20 - 17:40
Jonas Beskow

Feature selection for labelling of whispered speech in ASMR recordings using Edyson

Pablo Pérez Zarazaga, Zofia Malisz

The Visible Speech platform - a research infrastructure for secure analysis of speech recordings

Fredrik Karlsson
NB! New slot to make room for another change. Thanks, Fredrik!

Speech data augmentation for improving phoneme transcriptions of aphasic speech for the PSST challenge

Birger Moëll, Jim O’Regan, Shivam Mehta, Ambika Kirkland, Harm Laméris, Joakim Gustafson, Jonas Beskow

Hearing voices at the National Library - a speech corpus and acoustic model for the Swedish language

Martin Malmsten, Chris Haffenden, Love Börjeson
NB! New slot due to a scheduling conflict on the authors' side.



18:30 - 23:00



The commercial voice - a dying breed?

9:00 - 9:40
Christina Tånnander
Martin Forsström
This is a keynote by an invited speaker, and it spans the entire session.


09:40 - 10:10

Corpora, models and tools 2

10:10 - 10:50
Zofia Malisz

Vocal activity detection and speaker diarization in speech databases: a feasibility study

Fredrik Karlsson

Continued finetuning as single speaker adaptation

Jim O'Regan


10:50 - 11:30
Björn Granström

Perception of F0 movements towards potential turn boundaries in German and Swedish conversation: background and methods for an eye-tracking study

Martina Rossi, Kathrin Feindt, Margaret Zellers

The influence of prosody on turn-taking models at syntactically ambiguous places

Erik Ekstedt, Gabriel Skantze


11:30 - 11:50


11:50 - 13:00 Lunch
The cost of lunches is not included in Fonetik 2022. Also, we have not booked tables, but there should be enough room. The Monday lunch is at the restaurant Östra Station close to the KTH subway station. The Tuesday and Wednesday lunches are at Harpaviljongen in Lill-Jansskogen.


Social programme

Detailed info on the social programme will come over the next few days. On the agenda you'll find common lunches (entirely optional, and not paid for by Fonetik), a reception and a dinner, a competition, a tutorial, and naturally "fika" (coffee breaks).

Proceedings

These are preprints. The official proceedings will be printed in TMH-QPSR 3/2022 after the conference.
  • Birdsong as model for infants’ emergent speech – a brief introduction

    Author Axel G Ekström
    Abstract Songbirds have long and widely been considered a model species for the development of human speech capacities. Modelling efforts are dependent on parallels and similarities between emergent song and speech behavior. The present text describes eight such parallels, including, among others, neural lateralization, critical periods of development, and a dependency on auditory and perceptual feedback for normal development. The text takes as its unit of comparison patterns of speech observed in developing infants and patterns of song observed in juvenile songbirds, and serves at once as general summary of classic and contemporary research on the two phenomena, as well as a brief introduction to the topic.
    Date 13-15 June 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 4
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • Continued finetuning as single speaker adaptation

    Author Jim O’Regan
    Abstract The adaptation of unsupervised learning techniques to speech recognition has enabled the training of accurate models with less labelled training data, by finetuning a supervised classifier on top of a network pretrained using self-supervised methods. In this paper, we investigate if continuing the fine-tuning of such a model is suitable as a method of speaker adaptation for a single speaker, considering two kinds of user: the casual user, with data measurable in minutes, and the professional user, with data measurable in hours. We conduct experiments across a range of dataset sizes, in an attempt to provide a basis for estimates on how much data would be needed.
    Date 13-15 June 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 4
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • Creaky voice in South Swedish accent 1

    Author Anna Hjortdal
    Abstract Pitch and voice quality are increasingly understood as closely intertwined. While Swedish and Norwegian word accents have traditionally been understood in terms of pitch, Danish stød, which is systematically related when it comes to distribution and function, has been described as a type of creaky voice. According to the Laryngeal Articulator Model (LAM), both pitch lowering and creaky or harsh voice can be the acoustic outcomes of tightening the laryngeal constrictor mechanism. Laryngeal constriction has been proposed as the articulatory gesture behind word accents and stød. The present study investigated creaky voice in South Swedish word accents. Harmonics-to-noise ratio was significantly lower and jitter significantly higher in accent 1 compared to accent 2 stressed vowels. Further, jitter and shimmer were higher and spectral tilt was lower in sonorant consonants following stressed vowels. The results suggest that prototypical creaky voice is another cue to accent 1 in South Swedish and are in line with proposals that pitch falls in word accents correspond to laryngeal constriction.
    Date 13-15 June 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 6
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • Deep learning for phonetically meaningful speech manipulation

    Author Gustavo Teodoro Döhler Beck
    Author Ulme Wennberg
    Author Zofia Malisz
    Author Gustav Eje Henter
    Abstract The quality of synthetic speech has advanced rapidly in the last decade. Unfortunately, the new technologies have rarely proven to be useful for the speech sciences community. The modern methods lack direct and accurate control over important speech properties such as formants - necessary for stimulus creation in the speech sciences. Consequently, stimulus creation currently still relies on legacy methods that are typically based on task-specific signal processing. Consequently, using manipulated stimuli with audible signal processing artefacts may result in research findings that will not generalise to human perception of natural, artefact-free speech.
    Date 13-15 June 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 4
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • Feature selection for labelling of whispered speech in ASMR recordings using Edyson

    Author Pablo Pérez Zarazaga
    Author Zofia Malisz
    Abstract Whispered speech is a challenging area for traditional speech processing algorithms, as its properties differ from phonated speech and whispered data is not as easily available. A great amount of whispered speech recordings, however, can be found in the increasingly popular genre of ASMR on streaming platforms like YouTube or Twitch. Whispered speech is used in this genre as a trigger to cause a relaxing sensation in the listener. Accurately separating whispered speech segments from other auditory triggers would provide a wide variety of whispered data that could prove useful in improving the performance of data-driven speech processing methods. We use Edyson as a labelling tool, with which a user can rapidly assign labels to long segments of audio using an interactive graphical interface. In this paper, we propose features that can improve the performance of Edyson with whispered speech and we analyse parameter configurations for different types of sounds. We find Edyson a useful tool for initial labelling of audio data extracted from ASMR recordings that can then be used in more complex models. Our proposed modifications provide a better sensitivity for whispered speech, thus improving the performance of Edyson in the labelling of whispered segments.
    Date 13-15 June 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 2
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • Formants in text-to-speech systems - comparing TTS voices of Blizzard Challenge 2013

    Author Ayushi Pandey
    Author Sébastien Le Maguer
    Author Julie Carson-Berndsen
    Author Naomi Harte
    Author Sigmedia Lab
    Abstract Modern trends in synthesis evaluation attempt to capture finer aspects of the human experience of synthetic speech. However, a feature-based exploration of the synthetic speech signal, especially in comparison with human speech signal is still missing from the discussion.
    Date 13-15 June 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 6
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • Hearing voices at the National Library - a speech corpus and acoustic model for the Swedish language

    Author Martin Malmsten
    Author Chris Haffenden
    Author Love Börjeson
    Abstract This paper details our work in developing new acoustic models for automated speech recognition (ASR) at KBLab, the infrastructure for data-driven research at the National Library of Sweden (KB). We evaluate different approaches for a viable speech-to-text pipeline for audio-visual resources in Swedish, using the wav2vec 2.0 architecture in combination with speech corpuses created from KB’s collections. These approaches include pre-training an acoustic model for Swedish from the ground up, and fine-tuning existing monolingual and multilingual models. The collections-based corpuses we use have been sampled from millions of hours of speech, with a conscious attempt to balance regional dialects to produce a more representative, and thus more democratic, model. The acoustic model this enabled, “VoxRex”, outperforms existing models for Swedish ASR. We also evaluate combining this model with various pre-trained language models, which further enhanced performance. We conclude by highlighting the potential of such technology for cultural heritage institutions with vast collections of previously unlabelled audio-visual data.
    Date 13-15 June 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 5
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • Hypotheses should better be well-founded and not just testable

    Author Hartmut Traunmüller
    Abstract This is a contribution about the scientific method – about falsificationism and its non-applicability in phonetics and the life sciences in general – about the advantage of distinguishing between well-founded, provisional and fictitious hypotheses and the a priori confidence we can have in these types, marginally also about the principle of parsimony – and about path-dependence and the lock-in effect of “normal science”.
    Date 13-15 June 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 4
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • Learning fast with fewer data samples using neural HMMs

    Author Shivam Mehta
    Author Harm Lameris
    Author Éva Székely
    Author Jonas Beskow
    Author Gustav Eje Henter
    Abstract The neural TTS paradigm synthesises significantly better-quality speech than the previous paradigm of HMM-based statistical parametric speech synthesis (SPSS). However, it requires a large amount of time and a larger corpus to learn the alignments between text and speech because of the underlying non-monotonic attention mechanism. This paper presents the benefits of merging a neural TTS system with a Hidden Markov Model (HMM), thus mixing these two paradigms and getting the best of both worlds. We replace the underlying attention mechanism in a neural TTS with an autoregressive left-to-right no-skip HMM defined by a neural network. This results in a system which learns to speak 10 times faster, requires fewer training samples, does not break down into gibberish, is smaller in size, is fully probabilistic, and allows easy control over the speaking rate without compromising the naturalness of the audio.
    Date 13-15 June 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 3
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • Perception of F0 movements towards potential turn boundaries in German and Swedish conversation: background and methods for an eye-tracking study

    Author Martina Rossi
    Author Kathrin Feindt
    Author Margaret Zellers
    Abstract Understanding the turn-taking system in conversation entails not only knowledge about the linguistic-structural and phonetic before Potential Turn Boundaries (PTBs), but crucially, the precise location of the transition space as well. To investigate the time domain and the phonological domain, we compare production and perception of turn ends in two related languages: German and Swedish. For the first part, we extracted pitch values at seven time points before PTBs from spontaneous speech produced in two-party conversations. The aim was to investigate the possible presence of specific patterns of variations that lead to either speaker change, floor keeping or backchannels. As no such patterns have emerged, for the second part, eye-tracking will be used to investigate the exact time point at which the ending of a turn can be projected by a listener and which acoustic signals are important for this prediction.
    Date 13-15 June 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 4
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • Phonetic and Phonological Variation in Vowel Discrimination Performance: Effect of Swedish Vowel Categories and Dialects

    Author Renata Kochančikaitė
    Author Mikael Roll
    Abstract Acoustic discrimination of speech sounds is affected by various factors, ranging from more universal acoustic properties of categories to the phoneme systems of the native language and dialect, and even influences from languages learned later in life. A discrimination experiment containing East Central Swedish vowels was carried out with 30 native Swedish listeners in order to explore the variation in vowel discrimination performance. Both phonetic and phonological variables have been found to have an effect on discrimination performance. Peripheral location of vowels in the F1/F2 vowel space was found to increase the discrimination performance. South Swedish dialectal area was associated with a decreased discrimination performance. Continuous exposure to foreign languages other than English was not a significant factor.
    Date 13-15 June 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 5
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • Rapid movements at segment boundaries – preliminary reports on manner

    Author Malin Svensson Lundmark
    Abstract This paper reports on a one-to-one relation between articulation and acoustics. It explains how segment boundaries are a result of rapid movements of the articulators. In the acceleration profile, this is identified as peak acceleration, which can be measured. A previous study found that rapid movements of an active articulator – peak acceleration – correlate with the acoustic segment boundary in bilabial and alveolar nasals ([m] and [n]). The purpose of the present paper is to extend this line of research and report on some preliminary data on other manner as well ([p], [b], [l]). The results of both studies show that the one-to-one relationship between acoustics and articulation exists both in different places of articulation and in different manner of articulation.
    Date 13-15 June 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 6
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • Sardin: speech-oriented text processing

    Author Christina Tånnander
    Author Jens Edlund
    Abstract We present Sardin, a text processing system for Swedish TTS production that has recently undergone significant refactoring in preparation for public release and will soon be released as free and open software. Sardin is a text processing system with the goal to prepare text for speech-centric science, such as preparing text for speech synthesis training, or for use in speech applications, for example as input of different levels and detail to different TTS systems. The current version of Sardin handles several input and output formats (EPUB, Daisy XML, generic XML, text, IPA, SAMPA), and contains modules for chunking, tokenisation, part-of-speech tagging, text normalisation, and pronunciation and prosodic information.
    Date June 13-15 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 5
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • Speech data augmentation for improving phoneme transcriptions of aphasic speech for the PSST challenge

    Author Birger Moëll
    Author Jim O'Regan
    Author Shivam Mehta
    Author Ambika Kirkland
    Author Harm Laméris
    Author Joakim Gustafson
    Author Jonas Beskow
    Abstract As part of the PSST challenge, we explore how data augmentations, data sources, and model size affect phoneme transcription accuracy on speech produced by individuals with aphasia. We evaluate model performance in terms of feature error rate (FER) and phoneme error rate (PER). We find that data augmentation techniques, such as pitch shift, improve model performance. Additionally, increasing the size of the model decreases FER and PER. Our experiments also show that adding manually transcribed speech from non-aphasic speakers (TIMIT) improves performance when Room Impulse Response is used to augment the data. The best performing model combines aphasic and non-aphasic data and has a 21.0% PER and a 9.2% FER, a relative improvement of 9.8% compared to the baseline model on the primary outcome measurement. We show that data augmentation, larger model size, and additional non-aphasic data sources can be helpful in improving automatic phoneme recognition models for people with aphasia.
    Date 13-15 June 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • Spell new sounds with new letters. A study of how Swedish L2 learners’ spelling is affected by their L1

    Author Cajsa Fransson
    Author Malin Svensson Lundmark
    Abstract The current study examines whether there is a connection between Swedish L2 students' L1 and their spelling in Swedish. The data were collected through a spelling test conducted at SFI course levels C and D. The experiment was conducted in an urban area in Småland in three groups consisting of 37 course participants with 12 different L1s. People with Arabic as their L1 are the focus of the study due to the selection. The results show that the spelling mistakes could to some extent be explained by the Arabic phoneme set. For example, consonant pairs were confused to a greater extent when one of the consonants was not in Arabic (i.e. p/b, k/g, v/f) than when both consonants in the pair are in Arabic (i.e. t/d, r/l).
    Date June 13-15 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 6
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • Spontaneous neural HMM TTS with prosodic feature modification

    Author Harm Lameris
    Author Shivam Mehta
    Author Gustav Eje Henter
    Author Ambika Kirkland
    Author Birger Moëll
    Author Jim O’Regan
    Author Joakim Gustafson
    Author Éva Székely
    Abstract Spontaneous speech synthesis is a complex enterprise, as the data has large variation, as well as speech disfluencies normally omitted from read speech. These disfluencies perturb the attention mechanism present in most Text to Speech (TTS) systems. Explicit modelling of prosodic features has enabled intuitive prosody modification of synthesized speech. Most prosody-controlled TTS, however, has been trained on read-speech data that is not representative of spontaneous conversational prosody. The diversity in prosody in spontaneous speech data allows for more wide-ranging data-driven modelling of prosodic features. Additionally, prosody-controlled TTS requires extensive training data and GPU time which limits accessibility. We use neural HMM TTS as it reduces the parameter size and can achieve fast convergence with stable alignments for spontaneous speech data. We modify neural HMM TTS to enable prosodic control of the speech rate and fundamental frequency. We perform subjective evaluation of the generated speech of English and Swedish TTS models and objective evaluation for English TTS. Subjective evaluation showed a significant improvement in naturalness for Swedish for the mean prosody compared to a baseline with no prosody modification, and the objective evaluation showed greater variety in the mean of the per-utterance prosodic features.
    Date June 13-15 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 4
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • The influence of prosody on turn-taking models at syntactically ambiguous places

    Author Erik Ekstedt
    Author Gabriel Skantze
    Abstract Turn-taking is a fundamental aspect of human communication and is the ability to organize turns, between the interlocutors, at appropriate locations throughout a conversation. In this work we investigate the influence of prosody on turn-taking using the recently proposed Voice Activity Projection model, which incrementally models the upcoming speech activity of the interlocutors in a self-supervised manner, without relying on explicit modelling of prosodic features, or specific annotations of turn-taking events. Inspired by psycholinguistic experiments we focus our analysis on single utterances containing syntactically ambiguous places, specifically designed to depend on prosody. We further investigate the implicit influence of prosody on the turn-taking model through prosodic manipulation of the speech signal.
    Date June 13-15 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 7
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    PreprintPDF (DOI pending)
  • The time course of onset CV coarticulation

    Author Tugba Lulaci
    Author Mechtild Tronnier
    Author Pelle Söderström
    Author Mikael Roll
    Abstract The study investigates the center of gravity in onset fricatives as a main acoustic feature to assess the relation between vowel pronunciation and coarticulatory spectral characteristics of the onset consonant. /s/- and /f/-initial CV sequences were analyzed with backness, roundedness and height of the vowel as predictors of fricative center of gravity. Results showed that the first 15 ms of an onset fricative could carry predictive cues to the upcoming vowel.
    Date June 13-15 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 4
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    Preprint PDF (DOI pending)
  • The voice-mapping system FonaDyn – overview and demo

    Author Sten Ternström
    Abstract The voice is notoriously variable, and conventional measurement paradigms are weak in terms of providing evidence for effects of treatment and/or training of voices. New methods are needed that can take into account the variability of scalar metrics across the voice range. The voice map, a generalization of the phonetogram, offers a frame of reference that can be used in many ways, for research and in the clinic. FonaDyn is a proof-of-concept workbench that we are developing in order to explore and validate the mapping measurement paradigm. In this demo, you can try FonaDyn, to visualize and measure your own phonation faster and in greater detail than ever before.
    Date June 13-15 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 2
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    Preprint PDF (DOI pending)
  • Vocal activity detection and speaker diarization in speech databases: a feasibility study

    Author Fredrik Karlsson
    Abstract The task of creating speech corpora for phonetic research is time-consuming and could be alleviated by automatic algorithms to provide draft indexing of speech acts. The present investigation assessed the feasibility of applying speech segmentation and speaker diarization models across a collection of recordings to produce a draft indexing that could be utilised by speech management systems to help the researcher to navigate a corpus. The results show that a readily available model for speech segmentation is very likely to contribute to the effectiveness of speech annotation workflows in phonetic research. Speaker diarization models may require specific training to manage consistent speaker separation across a speech corpus, and the evaluated model currently offers no clear advantage to the effectiveness of a speech corpus creation process.
    Date June 13-15 2022
    Language en
    Place Stockholm
    Publisher KTH Royal Institute of Technology
    Pages 4
    Proceedings Title Proc. of Fonetik 2022
    Conference Name Fonetik 2022 - the XXXIIIrd Swedish Phonetics Conference
    Preprint PDF (DOI pending)

Registration

Registration is now closed, but if you have questions about registration, you can contact us at Fonetik 2022's email.

Contact & organisation

Registration is now closed, but if you have questions about registration, you can contact us at Fonetik 2022's email:


The organisation committee consists of Jens Edlund, Christina Tånnander, Zofia Malisz and David House. We have had help from many people both at TMH and elsewhere, but special mention goes to Lia Malm, Jim O'Regan, Ghazaleh Esfandiari, Axel Exström and, above all, Rolf Carlson and Björn Granström, who contributed in every conceivable manner.


Special thanks to Fonetikstiftelsen, which has once again this year helped to keep costs down for the participants.