CATiB Guidelines

The Columbia Arabic Treebank (CATiB) formalism is a syntactic dependency representation used for the Arabic language.
CATiB differs from previous approaches to Arabic treebanking in its emphasis on annotation speed, which it achieves by accepting some constraints on linguistic richness. Two basic ideas inspire the CATiB approach: minimizing the annotation of redundant information, and using representations and terminology inspired by traditional Arabic syntax.

Guidelines (v 9.0, 2009)


Relevant Publications


  1. Nizar Habash and Ryan Roth. CATiB: the Columbia Arabic Treebank. In Proceedings of the ACL-IJCNLP 2009 conference short papers, 221–224. 2009. URL: https://www.aclweb.org/anthology/P09-2056.pdf

  2. Nizar Habash, Reem Faraj, and Ryan Roth. Syntactic annotation in the Columbia Arabic Treebank. In Proceedings of MEDAR International Conference on Arabic Language Resources and Tools, Cairo, Egypt. 2009. URL: http://www.elda.org/medar-conference/pdf/25.pdf

  3. Dima Taji, Jamila El Gizuli, and Nizar Habash. An Arabic dependency treebank in the travel domain. arXiv preprint arXiv:1901.10188, 2019. URL: http://lrec-conf.org/workshops/lrec2018/W30/pdf/14_W30.pdf

  4. Anas Shahrour, Salam Khalifa, Dima Taji, and Nizar Habash. CamelParser: a system for Arabic syntactic analysis and morphological disambiguation. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, 228–232. 2016. URL: https://www.aclweb.org/anthology/C16-2048.pdf

  5. Anas Shahrour, Salam Khalifa, and Nizar Habash. Improving Arabic diacritization through syntactic analysis. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1309–1315. 2015. URL: https://www.aclweb.org/anthology/D15-1152.pdf

  6. Yuval Marton, Nizar Habash, and Owen Rambow. Dependency parsing of Modern Standard Arabic with lexical and inflectional features. Computational Linguistics, 39(1):161–194, 2013. URL: https://www.mitpressjournals.org/doi/full/10.1162/COLI_a_00138