Professor Shu-Kai Hsieh, Graduate Institute of Linguistics
Voice marks the moment when the human body meets the world—a vibration of air shaped by muscle, culture, and emotion. At the Graduate Institute of Linguistics at NTU, our research investigates this embodied phenomenon across multiple dimensions by bridging physical mechanisms and cultural interpretation.
Our research begins with the microdynamics of speech production. Using real-time ultrasound imaging, we demonstrate how articulatory movements form the foundation of language (Fig. 15). Beyond physiology, our sociophonetic research further examines variation and emotional expression, revealing how voice serves as a crucial medium for social identity and human connection (Figs. 16–17).
We also investigate the linguistic ancestry of Taiwan’s Formosan languages. Leveraging the latest speech vector technologies, we model acoustic features to reconstruct the complex genealogies of Austronesian groups (Fig. 18). These analyses not only trace history but also make ancient sound relationships audible, thereby connecting contemporary embodied voices to the island’s cultural heritage.
As artificial intelligence produces increasingly expressive synthetic voices and conversational agents acquire greater warmth, rhythm, and apparent empathy, our research addresses questions of presence and personhood. When algorithmic voices evoke emotional responses, where does embodiment end and imagination begin? Through the study of voice—human and artificial—we give the humanities a resonance that can be heard and felt. We reconsider the role of language in an era of growing human–technology integration.
Fig. 15. Experimental setup for ultrasound imaging of articulation. (Image Source: Chenhao Chiu)
Fig. 16. Spectral analysis of the presence or absence of affricates in Taiwan Mandarin. (Image Source: Janice Fon)

Fig. 17. Comparison of second- and third-tone differences between Chinese Mandarin and English. (Image Source: Janice Fon)

Fig. 18. Linguistic clustering of Taiwan's Austronesian population groups using speech vector technology. (Image Source: Shu-Kai Hsieh)