Discourse Annotation Tool for Turkish

I hold a PhD in linguistics and carry out interdisciplinary research by compiling (electronic) language resources, particularly by recording linguistic data in corpora (this is called linguistic annotation). My research specialties are discourse and pragmatics and their role in understanding human cognition. Recently, I have concentrated on discourse mechanisms, trying to understand the role of discourse relations in human languages. I analyze (written) Turkish texts, investigate explicit and implicit ways of signalling discourse relations, and try to specify the features that could be important for discourse relations.

I have been the principal developer of the Turkish Discourse Bank, or TDB (an electronic resource of Turkish annotated for discourse relations in the Penn Discourse TreeBank style), created with the generous support of a TUBITAK (Scientific and Technological Research Council of Turkey) project (No. 107E156, 2007-2011). My corpus development (and linguistic annotation) efforts go hand in hand with my research on discourse mechanisms. Linguistically annotated corpora are ultimately inputs to language technologies, and TDB is the platform where our inquiries on Turkish discourse are recorded, with the ultimate aim of serving theoretical investigations and future language technology applications.



Kutlu, F., Zeyrek, D., Kurfalı, M. (2023). Toward a Shallow Discourse Parser of Turkish. Natural Language Engineering, First View, pp. 1-26.

Ersöyleyen, E., Zeyrek, D. & Öter, F. (2023). Annotating and Disambiguating the Discourse Usage of the Enclitic dA in Turkish. In Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII), Association for Computational Linguistics, 2023, pp. 46-54.

Başıbüyük, K. & Zeyrek, D. (2023). Usage Disambiguation of Turkish Discourse Connectives. Language Resources and Evaluation, 57(1), 223-256.

Mendes, A., Zeyrek, D., Oleškevičienė, G. V. (2023). Explicitness and Implicitness of Discourse Relations in a Multilingual Discourse Bank. Functions of Language, 30(1), 67-91.


Zeyrek, D., Mendes, A., Oleškevičienė, G. V., & Özer, S. (2022). An Exploratory Analysis of TED Talks in English and Lithuanian, Portuguese and Turkish Translations: Results from the Analysis of an Annotated Multilingual Corpus. Contrastive Pragmatics, 3(3), 452-479.

Özer, S., Kurfalı, M., Zeyrek, D., Mendes, A., Valūnaitė Oleškevičienė, G. (2022). Linking Discourse-level Information and the Induction of Bilingual Discourse Connective Lexicons. Semantic Web Journal, 1 Jan. 2022: 1081-1102.


Mendes, A. and Zeyrek, D. (2021). The discourse markers well and so and their equivalents in the Portuguese and Turkish subparts of the TED-MDB corpus. In J. Lavid-López, C. Maíz-Arévalo and J. R. Zamorano-Mansilla (Eds.) Corpora in Translation and Contrastive Research in the Digital Age: Recent Advances and Explorations (pp. 337-356). John Benjamins.


Zeyrek, D. and Özge, U. (Editors) (2019). Discourse Meaning: The View from Turkish. May, 2020, Trends in Linguistics, De Gruyter Mouton.

Kurfalı, M., Özer, S., Zeyrek, D. and Mendes, A. (2020). TED-MDB Lexicons: TrEnConnLex, PtEnConnLex. In Proceedings of the First Workshop on Computational Approaches to Discourse, 2020, pp. 148-153.

Canpolat, S. F., Ormanoğlu, Z. & Zeyrek, D. (2020). Turkish Emotion-Voice Database (TurEV-DB). Proceedings of the First Joint SLTU and CCURL Workshop (Spoken Language Technologies for Under-resourced Languages and Collaboration and Computing for Under-Resourced Languages), pages 368-375. 12th edition of the Language Resources and Evaluation Conference, 11-16 May 2020, Marseille, France (LREC 2020).


Zeyrek, D., and Başıbüyük, K. (2019). TCL - A Lexicon of Turkish Discourse Connectives. Proceedings of the First International Workshop on Designing Meaning Representations, pages 73-81. August 1st, 2019 Florence, Italy, Association for Computational Linguistics.

Özer, S. and Zeyrek, D. (2019). An automatic discourse relation alignment experiment on TED-MDB. Proceedings of the 2019 Workshop on Widening NLP. Florence, Italy, 2019 Association for Computational Linguistics.

Friedrich, A., Zeyrek, D., Hoek, J. (Editors) (2019). Proceedings of the 13th Linguistic Annotation Workshop. August 1st, 2019, Florence, Italy, Association for Computational Linguistics.

Zeyrek, D., Mendes, A., Grishina, Y., Kurfalı, M., Gibbon, S., & Ogrodniczuk, M. (2019). TED Multilingual Discourse Bank (TED-MDB): a parallel corpus annotated in the PDTB style. Language Resources and Evaluation, 1-27.

Zeyrek, D. (2019). Discourse Structure: The View from Shared Arguments in Turkish Discourse Bank. In S. Özsoy (Ed.) Word Order in Turkish, pp. 287-306. Springer, Cham, 2019.


Oleškevičienė, G. V., Zeyrek, D., Mažeikienė, V., & Kurfalı, M. (2018). Observations on the annotation of discourse relational devices in TED talk transcripts in Lithuanian. In Proceedings of the workshop on annotation in digital humanities co-located with ESSLLI (Vol. 2155, pp. 53-58).

Zeyrek-Bozşahin, D., Soycan, N. (2018). Türkçe Söylem Bankasında söylem bağıntılarının metin türlerine göre değerlendirilmesi [Evaluation of discourse relations in Turkish Discourse Bank by text type]. In Y. Aksan, M. Aksan (Eds.). Türkçede Yapı ve İşlev: Prof. Dr. Şükriye Ruhi Armağanı (pp. 131-144). Bilgesu, Ankara.

Zeyrek, D., Demirşahin, I., & Bozşahin, C. (2018). Turkish Discourse Bank: Connectives and Their Configurations. In K. Oflazer and M. Saraçlar (Eds.) Turkish Natural Language Processing (pp. 337-356). Springer, Cham.

Zeyrek, D., Mendes, A., Kurfalı, M. (2018). Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank. LREC 2018, 11th edition of the Language Resources and Evaluation Conference, 7-12 May 2018, Miyazaki (Japan).

Zeyrek, D., Kurfalı, M. (2018). An Assessment of Explicit Inter- and Intra-sentential Discourse Connectives in Turkish Discourse Bank. LREC 2018, 11th edition of the Language Resources and Evaluation Conference, 7-12 May 2018, Miyazaki (Japan).


Zeyrek, D. (2017). TED Multilingual Discourse Bank (TED-MDB): A parallel corpus annotated in the PDTB style. 11th Linguistic Annotation Workshop (LAW), European Chapter of the Association for Computational Linguistics. April 3rd, 2017, Valencia.

Zeyrek, D. & Kurfalı, M. (2017). TDB 1.1: Extensions on Turkish Discourse Bank. Proceedings of the 11th Linguistic Annotation Workshop (LAW), European Chapter of the Association for Computational Linguistics. 3-4 April 2017, Valencia, Spain.

Demirşahin, I. & Zeyrek, D. (2017). Pair Annotation as a Novel Annotation Procedure: The Case of Turkish Discourse Bank. In Nancy Ide & James Pustejovsky (Eds.) Handbook of Linguistic Annotation. Springer.


Kaygusuz, Y. & Zeyrek, D. (2016). Turkish children's early vocabulary: A study on the lexical diversity of two sisters. In B. Haznedar & F. N. Ketrez (Eds.) The Acquisition of Turkish in Childhood. Trends in Language Acquisition Research, 20. John Benjamins (pp. 57-78).

Tolgay, A. E., Zeyrek, D., Kurfalı, M., Bozşahin, C. (2016). A Turkish database for psycholinguistic studies based on frequency, age of acquisition, and imageability. LREC 2016, 10th edition of the Language Resources and Evaluation Conference, 23-28 May 2016, Portorož (Slovenia).

Kurfalı, M., Zeyrek, D., Gonçalves, T. (2016). Automatic prediction of implicit discourse relations in Turkish. Conference Handbook: Structuring Discourse in Multilingual Europe Second Action Conference. Károli Gáspár University of the Reformed Church in Hungary, Budapest, 11-14 April, 2016 (pp. 65-70).


Zeyrek, D., Sağın-Şimşek, Ç., Ataş, U., Rehbein, J. (Eds) (2015). Ankara Papers in Turkish and Turkic Linguistics. Wiesbaden: Harrassowitz Verlag.

Zeyrek, D., Demirşahin, I., Sevdik-Çallı, A. B., Kurfalı, M. (2015). Annotating implicit discourse relations in Turkish: The challenge of corrective discourse relations. Paper presented at the workshop Discourse connectives across languages and modes: Challenges for discourse annotation, organized by Sandrine Zufferey, Liesbeth Degand & Daniel Hardt. 14th International Pragmatics Association (IPRA) Conference. Antwerp, Belgium 26-31 July, 2015. Abstracts (pp. 455-456).


Zeyrek, D. (2014). On the distribution of the contrastive-concessive discourse connectives ama 'but/yet' and fakat 'but' in written Turkish. In Suihkonen, P., & Whaley, L. J. (Eds.). On Diversity and Complexity of Languages Spoken in Europe and North and Central Asia (Vol. 164). John Benjamins Publishing Company (pp. 251-275).

Zeyrek, D. & Acartürk, C. (2014). The distinction between unaccusative and unergative verbs in Turkish: an offline and an eyetracking study of split intransitivity. CogSci 2014 Proceedings (pp. 1832-1837).

Demirşahin, I. & Zeyrek, D. (2014). Annotating discourse connectives in spoken Turkish. LAW VIII - The 8th Linguistic Annotation Workshop, Dublin, Ireland, August 23-24, 2014 (pp. 105-109).

Erten, B., Bozşahin, C., Zeyrek, D. (2014). Turkish resources for visual word recognition. LREC 2014, The 9th edition of the Language Resources and Evaluation Conference, 26-31 May, Reykjavik, Iceland (pp. 2106-2110).


Zeyrek, D., Demirşahin, I., Sevdik-Çallı, A. B., Çakıcı, R. (2013). Turkish Discourse Bank: Porting a discourse annotation style to a morphologically rich language. Dialogue & Discourse. Vol. 4, No. 2: 174-184.

Demirşahin, I., Öztürel, A., Bozşahin, C., Zeyrek, D. (2013). Applicative structures and immediate discourse in the Turkish Discourse Bank. Proc. of the 7th Linguistic Annotation Workshop & Interoperability with Discourse (pages 122-130). Aug. 8-9 2013, Sofia, Bulgaria.

Zeyrek, D. (2013). The discourse connective yerine 'instead' in Turkish (13th International Pragmatics Conference, 8-13 September, 2013, New Delhi, India).


Zeyrek, D., Turan Ü. D., Demirşahin I., & Çakıcı R. (2012). Differential properties of three discourse connectives in Turkish: A corpus-based analysis of Fakat, Yoksa, Ayrıca. In A. Benz, M. Stede, & P. Kühnlein (eds.). Constraints in Discourse 3. Representing and inferring discourse structure. (pp. 183-206) John Benjamins Publishing Company.

Demirşahin, I., Yalçınkaya İ., & Zeyrek D. (2012). Pair annotation: Adaption of Pair Programming to Corpus Annotation. Proceedings of the Sixth Linguistic Annotation Workshop (pp. 31-39). Association for Computational Linguistics.

Göy, E., Zeyrek D., & Otcu B. (2012). Developmental Patterns in Internal Modification Use in Requests: A Quantitative Study on Turkish Learners of English. In Helen Woodfield & Maria Kogetsidis (Eds.). Interlanguage Request Modification (pp. 51-87). John Benjamins Publishing Company.

Şirin, U., Çakıcı R., & Zeyrek D. (2012). METU Turkish Discourse Bank Browser. Proceedings, The 8th edition of the Language Resources and Evaluation Conference (LREC), 21-27 May, 2012, Istanbul, Turkey.

Demirşahin, Işın, Çallı A. B. S., Balaban H. Ö., Çakıcı R., & Zeyrek D. (2012). Turkish Discourse Bank: Ongoing Developments. Proceedings, Workshop on Turkic Languages. The 8th edition of the Language Resources and Evaluation Conference (LREC), 21-27 May, 2012, Istanbul, Turkey.

Zeyrek, D. (2012). Thanking in Turkish: a Corpus-based Analysis. In Leyre Ruiz de Zarobe and Yolanda Ruiz de Zarobe (Eds.). Speech Acts and Politeness across Languages and Cultures (pp. 27-52). Peter Lang.


Turan, Ü. D., & Zeyrek D. (2011). Context, contrast, and the structure of discourse in Turkish. In Anita Fetzer & Etsuko Oishi (Eds.). Context and contexts: Parts meet whole? (pp. 147-170). John Benjamins.


Zeyrek, D., Demirşahin Işın, Çallı A. B. S., Balaban H. Ö., Yalçınkaya İ., & Turan Ü. D. (2010). The annotation scheme of the Turkish Discourse Bank and an evaluation of inconsistent annotations. Proceedings of the Fourth Linguistic Annotation Workshop (pp. 282–289). 15–16 July 2010, Uppsala, Sweden.

Aktaş, B., Bozşahin C., & Zeyrek D. (2010). Discourse Relation Configurations in Turkish and an Annotation Environment. ACL 2010. LAW IV. Fourth Linguistic Annotation Workshop. Proceedings of the Workshop (pp. 202-206). 15-16 July 2010, Uppsala, Sweden.

Akgün, M., Çağıltay K., & Zeyrek D. (2010). The effect of apologetic error messages and mood states on computer users' self-appraisal of performance. Journal of Pragmatics. 42(9), 2349-2448.

Acartürk, C., & Zeyrek D. (2010). Unaccusative/Unergative Distinction in Turkish: A Connectionist Approach. The 23rd International Conference on Computational Linguistics. Proceedings of the 8th Workshop on Asian Language Resources.


I lead a research group interested in developing discourse-annotated corpora as inputs to NLP systems and as the basis of linguistic investigations. We have mainly been working on the Turkish Discourse Bank to enrich it with more annotations. TDB itself was built with a grant from TÜBİTAK, the Scientific and Technological Research Council of Turkey. You can also access my research group's Turkish Discourse Bank page from here.

I was part of Textlink: Structuring Discourse in Multilingual Europe (COST Action IS1312).

Recently, I have been involved in developing a multilingual corpus representing several languages in the Textlink Action. This corpus is called the TED Multilingual Discourse Bank, or TED-MDB, and currently involves English, Turkish, Portuguese, German, Russian, Polish and Lithuanian. This effort is in progress.


I usually teach Cogs 541 Language Acquisition and Cogs 528 Discourse Mechanisms.

Scientific committees in conferences

I have served on the program or scientific committees of various interdisciplinary conferences and workshops, for example, Textlink - Structuring Discourse in Multilingual Europe; the Language Resources and Evaluation Conference (LREC); the Association for Computational Linguistics Conference (ACL); EMNLP (Empirical Methods in Natural Language Processing); and NAACL (Annual Conference of the North American Chapter of the Association for Computational Linguistics). I also serve on the program committees of the International Conference of Turkish Linguistics (ICTL) and the National Turkish Linguistics Conferences (Ulusal Türk Dilbilim Kurultayı).

I am the Editor of Dilbilim Araştırmaları Dergisi/Journal of Linguistic Research, one of the oldest peer-reviewed journals of linguistics in Turkey publishing research articles (primarily) on Turkish.


