Corpus of Spoken Yiddish in Europe

A digital language archive sourced from Holocaust survivor testimonies

Discover the sounds of Yiddish

The Corpus of Spoken Yiddish in Europe (CSYE) is an Open Access digital language archive based on hundreds of video-recorded interviews with survivors of the Holocaust. The materials contained in the corpus are a testament to the social and linguistic diversity of Yiddish-speaking Jewish society and an invaluable resource for linguistic research, Yiddish language instruction, and Holocaust education and commemoration.

The CSYE consists of testimony interviews from the USC Shoah Foundation, paired with time-aligned transcripts in both the Yiddish alphabet and transliteration. These materials are available free of charge to researchers, students, teachers, and the broader public, subject to our Terms of Use. Visitors to the corpus website can discover survivor interviews by browsing the Testimonies Index or by exploring survivors' birthplaces on our Map. Each testimony page includes metadata about the survivor, an embedded video player with subtitles (in both orthographies), and links to download the audio files and transcripts in various formats. The transcripts are also displayed on each testimony page in an interactive, searchable table. For more details, see our User Guide.
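
For readers who plan to work with downloaded transcripts programmatically, the following is a minimal sketch of how a time-aligned transcript in two orthographies might be represented and searched. It does not reflect the actual CSYE file formats, which are not described here: the `Segment` fields, the `search` and `timestamp` helpers, and the two sample phrases are all hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One time-aligned transcript segment (hypothetical layout)."""
    start: float    # seconds from the start of the recording
    end: float      # seconds from the start of the recording
    yiddish: str    # text in the Yiddish alphabet
    translit: str   # the same text in transliteration


def search(segments, query):
    """Return segments whose text contains the query in either orthography."""
    q = query.casefold()
    return [
        s for s in segments
        if q in s.yiddish.casefold() or q in s.translit.casefold()
    ]


def timestamp(seconds):
    """Format a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Made-up sample data, not taken from the corpus.
SEGMENTS = [
    Segment(0.0, 2.5, "שלום עליכם", "sholem aleykhem"),
    Segment(2.5, 5.0, "אַ גוטן טאָג", "a gutn tog"),
]

if __name__ == "__main__":
    for seg in search(SEGMENTS, "gutn"):
        print(timestamp(seg.start), seg.yiddish, "/", seg.translit)
```

Keeping both orthographies on the same segment lets a single query match either script and jump to the corresponding point in the recording.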

Other digital artifacts that will be made available in the CSYE include transcripts with word- and phoneme-level alignments, as well as a pronunciation dictionary and an acoustic model for use with forced alignment software.

The CSYE is a multi-year project, developed with the support of grants and research fellowships. Upon its completion, the corpus will be the most extensive source of conversational Yiddish ever compiled.