Embodied Voices: From Human Phonation to AI Speech

vol.126

Professor Shu-Kai Hsieh, Graduate Institute of Linguistics

Voice marks the moment when the human body meets the world—a vibration of air shaped by muscle, culture, and emotion. At the Graduate Institute of Linguistics at NTU, our research investigates this embodied phenomenon across multiple dimensions by bridging physical mechanisms and cultural interpretation.

Our research begins with the microdynamics of speech production. Using real-time ultrasound imaging, we demonstrate how articulatory movements form the foundation of language (Fig. 15). Beyond physiology, our sociophonetic research further examines variation and emotional expression, revealing how voice serves as a crucial medium for social identity and human connection (Figs. 16–17).

We also investigate the linguistic ancestry of Taiwan’s Formosan languages. Leveraging the latest speech vector technologies, we model acoustic features to reconstruct the complex genealogies of Austronesian groups (Fig. 18). These analyses not only trace history but also make audible ancient sound relationships, thereby connecting contemporary embodied voices to the island’s cultural heritage.

As artificial intelligence produces increasingly expressive synthetic voices and conversational agents acquire greater warmth, rhythm, and apparent empathy, our research addresses questions of presence and personhood. When algorithmic voices evoke emotional responses, where does embodiment end and imagination begin? Through the study of voice—human and artificial—we give the humanities a resonance that can be heard and felt. We reconsider the role of language in an era of growing human–technology integration.

Fig. 15. Experimental setup for ultrasound imaging of articulation. (Image Source: Chenhao Chiu)

Fig. 16. Spectral analysis of the presence or absence of affricates in Taiwan Mandarin. (Image Source: Janice Fon)

Fig. 17. Comparison of second- and third-tone differences between Chinese Mandarin and English. (Image Source: Janice Fon)

Fig. 18. Linguistic clustering of Taiwan's Austronesian population groups using speech vector technology. (Image Source: Shu-Kai Hsieh)

NTU HIGHLIGHTS

共編單位次文 2

共編單位次文 2