Publications

Here is a complete list of my publications. You can also visit my profiles at Google Scholar, Semantic Scholar, DBLP, and ACL Anthology.

2021

Farhad Akhbardeh, Cecilia O. Alm, Marcos Zampieri, Travis Desell (2021) Handling Extreme Class Imbalance in Technical Logbook Datasets. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL). pdf

Ana-Maria Bucur, Marcos Zampieri, Liviu P Dinu (2021) An Exploratory Analysis of the Relation Between Offensive Language and Mental Health. Findings of the Association for Computational Linguistics. pdf url

Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Marcos Zampieri, Preslav Nakov (2021) A Large-Scale Semi-Supervised Dataset for Offensive Language Identification. Findings of the Association for Computational Linguistics. pdf url

Tharindu Ranasinghe, Marcos Zampieri (2021) MUDES: Multilingual Detection of Offensive Spans. Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). pp. 144–152. Online. pdf url

Tharindu Ranasinghe, Marcos Zampieri (2021) Multilingual Offensive Language Identification for Low-resource Languages. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP). pdf

Marcos Zampieri, Preslav Nakov (Editors) (2021) Similar Languages, Varieties, and Dialects: A Computational Perspective. Studies in Natural Language Processing. Cambridge University Press. url

Matthew Shardlow, Richard Evans, Gustavo Henrique Paetzold, Marcos Zampieri (2021) SemEval-2021 Task 1: Lexical Complexity Prediction. Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval). Online. pdf

Abhinandan Desai, Kai North, Marcos Zampieri, Christopher M Homan (2021) LCP-RIT at SemEval-2021 Task 1: Exploring Linguistic Features for Lexical Complexity Prediction. Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval). Online. pdf

Tharindu Ranasinghe, Diptanu Sarkar, Marcos Zampieri, Alex Ororbia (2021) WLV-RIT at SemEval-2021 Task 5: A Neural Transformer Framework for Detecting Toxic Spans. Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval). Online. pdf

Bharathi Raja Chakravarthi, Mihaela Gaman, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Ruba Priyadharshini, Christoph Purschke, Eswari Rajagopal, Yves Scherrer, Marcos Zampieri (2021) Findings of the VarDial Evaluation Campaign 2021. Proceedings of the 8th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 120-127. Online. pdf url

Tommi Jauhiainen, Tharindu Ranasinghe, Marcos Zampieri (2021) Comparing Approaches to Dravidian Language Identification. Proceedings of the 8th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 120-127. Online. pdf url

Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer, Tommi Jauhiainen (Editors) (2021) Proceedings of the 8th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). Association for Computational Linguistics. Online. pdf url

2020

Tharindu Ranasinghe, Marcos Zampieri (2020) Multilingual Offensive Language Identification with Cross-lingual Embeddings. Proceedings of Empirical Methods in Natural Language Processing (EMNLP). pp. 5838–5844. Online. pdf url

Farhad Akhbardeh, Travis Desell, Marcos Zampieri (2020) MaintNet: A Collaborative Open-Source Library for Predictive Maintenance Language Resources. Proceedings of the 28th International Conference on Computational Linguistics (COLING). pp. 7-11. Barcelona, Spain (Online). pdf url

Farhad Akhbardeh, Travis Desell, Marcos Zampieri (2020) NLP Tools for Predictive Maintenance Records in MaintNet. Proceedings of the First Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing (AACL-IJCNLP). pp. 26-32. Online. pdf url

Marcos Zampieri, Preslav Nakov, Yves Scherrer (2020) Natural Language Processing for Similar Languages, Varieties, and Dialects: A Survey. Natural Language Engineering. Volume 26, Issue 6, pp. 595-612. Cambridge University Press. url

Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, Çağrı Çöltekin (2020) SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020). Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval). pp. 1425–1447. Barcelona, Spain (Online). pdf url

Mihaela Gaman, Dirk Hovy, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Christoph Purschke, Yves Scherrer, Marcos Zampieri (2020) A Report on the VarDial Evaluation Campaign 2020. Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 1-14. Barcelona, Spain (Online). pdf url

Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer (Editors) (2020) Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). Association for Computational Linguistics. Online. pdf url

Marcos Zampieri, Preslav Nakov (Editors) (2020) Special Issue on NLP for Similar Languages, Varieties, and Dialects - Part 2. Natural Language Engineering. Volume 26, Issue 6. Cambridge University Press. url

Sarah Luger, Martina Anto-Ocrah, Tapo Allahsera, Christopher M. Homan, Marcos Zampieri, Michael Leventhal (2020) Health Care Misinformation: An Artificial Intelligence Challenge for Low-resource Languages. Proceedings of the AAAI Fall Symposium. Online. pdf

Allahsera Auguste Tapo, Bakary Coulibaly, Sébastien Diarra, Christopher Homan, Julia Kreutzer, Sarah Luger, Arthur Nagashima, Marcos Zampieri, Michael Leventhal (2020) Neural Machine Translation for Extremely Low-Resource African Languages: A Case Study on Bambara. Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages. pp. 23–32. Online. pdf url

Sarah Luger, Tapo Allahsera, Michael Leventhal, Christopher Homan, Marcos Zampieri (2020) Towards a Crowdsourcing Platform for Low Resource Languages–A Semi-Supervised Approach. Proceedings of the 8th AAAI Conference on Human Computation and Crowdsourcing (HCOMP). pdf

Santanu Pal, Marcos Zampieri (2020) Neural Machine Translation for Similar Languages: The Case of Indo-Aryan Languages. Proceedings of the Fifth Conference on Machine Translation (WMT). pp. 424–429. Online. pdf url

Loïc Barrault, Magdalena Biesialska, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Matthias Huck, Eric Joanis, Tom Kocmi, Philipp Koehn, Chi-kiu Lo, Nikola Ljubešić, Christof Monz, Makoto Morishita, Masaaki Nagata, Toshiaki Nakazawa, Santanu Pal, Matt Post, Marcos Zampieri (2020) Findings of the 2020 Conference on Machine Translation (WMT20). Proceedings of the Fifth Conference on Machine Translation (WMT). pp. 1-55. Online. pdf url

Ritesh Kumar, Atul K. Ojha, Shervin Malmasi, Marcos Zampieri (2020) Evaluating Aggression Identification in Social Media. Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (TRAC-2). pp. 1-5. Marseille, France. pdf url

Ritesh Kumar, Atul Kr. Ojha, Bornini Lahiri, Marcos Zampieri, Shervin Malmasi, Vanessa Murdock, Daniel Kadar (Editors) (2020) Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying. Association for Computational Linguistics. Marseille, France. pdf url

Tharindu Ranasinghe, Sarthak Gupte, Marcos Zampieri, Ifeoma Nwogu (2020) WLV-RIT at HASOC-Dravidian-CodeMix-FIRE2020: Offensive Language Identification in Code-switched YouTube Comments. Proceedings of the Forum for Information Retrieval Evaluation (FIRE). Hyderabad, India. pdf

Zeses Pitenis, Tharindu Ranasinghe, Marcos Zampieri (2020) Offensive Language Identification in Greek. Proceedings of Language Resources and Evaluation (LREC). pp. 5113-5119. Marseille, France. pdf url

Matthew Shardlow, Michael Cooper, Marcos Zampieri (2020) CompLex - A New Corpus for Lexical Complexity Prediction from Likert Scale Data. Proceedings of the Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI). pp. 57-62. Marseille, France. pdf url

2019

Tommi Jauhiainen, Marco Lui, Marcos Zampieri, Timothy Baldwin, Krister Lindén (2019) Automatic Language Identification in Texts: A Survey. Journal of Artificial Intelligence Research (JAIR). Volume 65. pp. 675-782. pdf url

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar (2019) Predicting the Type and Target of Offensive Posts in Social Media. Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). pp. 1415-1420. Minneapolis, United States. pdf url

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar (2019) SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval). Proceedings of the International Workshop on Semantic Evaluation (SemEval). pp. 75-86. Minneapolis, United States. pdf url

Gustavo Henrique Paetzold, Shervin Malmasi, Marcos Zampieri (2019) UTFPR at SemEval-2019 Task 5: Hate Speech Identification with Recurrent Neural Networks. Proceedings of the International Workshop on Semantic Evaluation (SemEval). pp. 519-523. Minneapolis, United States. pdf url

Mihaela Vela, Santanu Pal, Marcos Zampieri, Sudip Kumar Naskar, Josef van Genabith (2019) Improving CAT Tools in the Translation Workflow: New Approaches and Evaluation. Proceedings of the 17th Machine Translation Summit (MT Summit). pp. 8-15. Dublin, Ireland. pdf url

Tharindu Ranasinghe, Marcos Zampieri, Hansi Hettiarachchi (2019) BRUMS at HASOC 2019: Deep Learning Models for Multilingual Hate Speech and Offensive Language Identification. Proceedings of the 11th Annual Meeting of the Forum for Information Retrieval Evaluation (FIRE). Kolkata, India. pdf url

Loïc Barrault, Ondřej Bojar, Marta R Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, Marcos Zampieri (2019) Findings of the 2019 Conference on Machine Translation (WMT19). Proceedings of the Fourth Conference on Machine Translation (WMT) (Volume 2: Shared Task Papers, Day 1). pp. 1–61. Florence, Italy. pdf url

Santanu Pal, Marcos Zampieri, Josef van Genabith (2019) UDS–DFKI Submission to the WMT2019 Similar Language Translation Shared Task. Proceedings of the Fourth Conference on Machine Translation (WMT) (Volume 3: Shared Task Papers, Day 2). pp. 219–223. Florence, Italy. pdf url

Alistair Plum, Marcos Zampieri, Constantin Orasan, Eveline Wandl-Vogt, Ruslan Mitkov (2019) Large-scale Data Harvesting for Biographical Data. Proceedings of Biographical Data in a Digital World (BD 2019). Varna, Bulgaria. pdf url

Marcos Zampieri, Preslav Nakov (Editors) (2019) Special Issue on NLP for Similar Languages, Varieties, and Dialects - Part 1. Natural Language Engineering. Volume 25, Issue 5. Cambridge University Press. url

Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali (Editors) (2019) Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). Minneapolis, United States. Association for Computational Linguistics. pdf url

Marcos Zampieri, Shervin Malmasi, Yves Scherrer, Tanja Samardžić, Francis Tyers, Miikka Silfverberg, Natalia Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, Tommi Jauhiainen (2019) A Report on the Third VarDial Evaluation Campaign. Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 1-16. Minneapolis, United States. pdf url

Gustavo Henrique Paetzold, Marcos Zampieri (2019) Experiments in Cuneiform Language Identification. Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 209-213. Minneapolis, United States. pdf url

2018

Dominik Schneider, Marcos Zampieri, Josef van Genabith (2018) Translation Memories and the Translator: A Report on a User Survey. Babel - International Journal of Translation. Volume 64, Issue 5/6, pp. 734-762. John Benjamins. pdf url

Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali (Editors) (2018) Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). Santa Fe, United States. Association for Computational Linguistics. pdf url

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Antal van den Bosch, Ritesh Kumar, Bornini Lahiri, Mayank Jain (2018) Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign. Proceedings of the 5th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 1-17. Santa Fe, United States. pdf url

Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Santanu Pal, Liviu P. Dinu (2018) Discriminating between Indo-Aryan Languages Using SVM Ensembles. Proceedings of the 5th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 178-184. Santa Fe, United States. pdf url

Marta R. Costa-jussà, Marcos Zampieri, Santanu Pal (2018) A Neural Approach to Language Variety Translation. Proceedings of the 5th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 275-282. Santa Fe, United States. pdf url

Fernando Benites, Shervin Malmasi, Marcos Zampieri (2018) Classifying Patent Applications with Ensemble Methods. Proceedings of the 16th Annual Workshop of The Australasian Language Technology Association (ALTA). Dunedin, New Zealand. Association for Computational Linguistics. pdf url

Ritesh Kumar, Atul K. Ojha, Marcos Zampieri, Shervin Malmasi (Editors) (2018) Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC). Santa Fe, United States. Association for Computational Linguistics. pdf url

Ritesh Kumar, Atul K. Ojha, Shervin Malmasi, Marcos Zampieri (2018) Benchmarking Aggression Identification in Social Media. Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC). pp. 1-11. Santa Fe, United States. pdf url

Shervin Malmasi, Iria del Río, Marcos Zampieri (2018) Portuguese Native Language Identification. Proceedings of International Conference on the Computational Processing of Portuguese (PROPOR), Lecture Notes in Computer Science - LNCS. Springer. pdf url

Iria del Río, Marcos Zampieri, Shervin Malmasi (2018) A Portuguese Native Language Identification Dataset. Proceedings of the 13th Workshop on Innovative Use of NLP for Building Educational Applications (BEA). pp. 291-296. New Orleans, USA. pdf url

Seid Muhie Yimam, Chris Biemann, Shervin Malmasi, Gustavo H. Paetzold, Lucia Specia, Sanja Štajner, Anaïs Tack, Marcos Zampieri (2018) A Report on the Complex Word Identification Shared Task 2018. Proceedings of the 13th Workshop on Innovative Use of NLP for Building Educational Applications (BEA). pp. 66-78. New Orleans, USA. pdf url

Shervin Malmasi, Marcos Zampieri (2018) Challenges in Discriminating Profanity from Hate Speech. Journal of Experimental & Theoretical Artificial Intelligence. Volume 30, Issue 2, pp. 187-202. Taylor & Francis. pdf url

Diego Moussallem, Mohamed Ahmed Sherif, Diego Esteves, Marcos Zampieri, Axel-Cyrille Ngonga Ngomo (2018) LIDIOMS: A Multilingual Linked Idioms Data Set. Proceedings of Language Resources and Evaluation (LREC). pp. 2468-2474. Miyazaki, Japan. pdf url

Diego Moussallem, Thiago Castro Ferreira, Marcos Zampieri, Maria Claudia Cavalcanti, Geraldo Xexéo, Mariana Neves, Axel-Cyrille Ngonga Ngomo (2018) RDF2PT: Generating Brazilian Portuguese Texts from RDF Data. Proceedings of Language Resources and Evaluation (LREC). pp. 3043-4050. Miyazaki, Japan. pdf url

Ekaterina Lapshinova-Koltunski, Marcos Zampieri (2018) Linguistic Features of Genre and Method Variation in Translation: A Computational Perspective. The Grammar of Genres and Styles - From Discrete to Non-Discrete Units. pp. 92-117. Mouton de Gruyter. pdf url

Ahmed Ibrahim Omer, Marcos Zampieri, Michael Oakes (2018) Phonetic Differences for Dialect Clustering. Proceedings of the 9th International Conference on Information and Communication Systems (ICICS). IEEE. pp. 145-150. Irbid, Jordan. url

2017

Marcos Zampieri, Shervin Malmasi, Gustavo Paetzold, Lucia Specia (2017) Complex Word Identification: Challenges in Data Annotation and System Performance. Proceedings of the 4th Workshop on NLP Techniques for Educational Applications (NLPTEA). pp. 59-63. Taipei, Taiwan. pdf url

Marcos Zampieri, Alina Maria Ciobanu, Liviu P. Dinu (2017) Native Language Identification on Text and Speech. Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications (BEA). pp. 398-404. Copenhagen, Denmark. pdf url

Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Liviu P. Dinu (2017) Including Dialects and Language Varieties in Author Profiling. Working Notes of CLEF - Conference and Labs of the Evaluation Forum. Dublin, Ireland. pdf url

Octavia-Maria Sulea, Marcos Zampieri, Shervin Malmasi, Mihaela Vela, Liviu P. Dinu, Josef van Genabith (2017) Exploring the Use of Text Classification in the Legal Domain. Proceedings of 2nd Workshop on Automated Semantic Analysis of Information in Legal Texts (ASAIL). London, United Kingdom. pdf url

Shervin Malmasi, Marcos Zampieri (2017) Detecting Hate Speech in Social Media. Proceedings of Recent Advances in Natural Language Processing (RANLP). pp. 467-472. Varna, Bulgaria. pdf url

Octavia-Maria Sulea, Marcos Zampieri, Mihaela Vela, Josef van Genabith (2017) Predicting the Law Area and Decisions of French Supreme Court Cases. Proceedings of Recent Advances in Natural Language Processing (RANLP). pp. 716-722. Varna, Bulgaria. pdf url

Preslav Nakov, Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali (Editors) (2017) Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). Valencia, Spain. Association for Computational Linguistics. pdf url

Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, Noëmi Aepli (2017) Findings of the VarDial Evaluation Campaign 2017. Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 1-15. Valencia, Spain. pdf url

Shervin Malmasi, Marcos Zampieri (2017) German Dialect Identification in Interview Transcriptions. Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 164-169. Valencia, Spain. pdf url

Shervin Malmasi, Marcos Zampieri (2017) Arabic Dialect Identification Using iVectors and ASR Transcripts. Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 178-183. Valencia, Spain. pdf url

2016

Rohit Gupta, Constantin Orasan, Marcos Zampieri, Mihaela Vela, Josef van Genabith, Ruslan Mitkov (2016) Improving Translation Memory Matching and Retrieval Using Paraphrases. Machine Translation. Volume 30, Issue 1–2, pp. 19–40. Springer. pdf url

Santanu Pal, Sudip Kumar Naskar, Marcos Zampieri, Tapas Nayak, Josef van Genabith (2016) CATaLog Online: A Web-based CAT Tool for Distributed Translation with Data Capture for APE and Translation Process Research. Proceedings of the 26th International Conference on Computational Linguistics (COLING). pp. 98-102. Osaka, Japan. pdf url

Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi (Editors) (2016) Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). Osaka, Japan. Association for Computational Linguistics. pdf url

Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann (2016) Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task. Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 1-14. Osaka, Japan. pdf url

Shervin Malmasi, Marcos Zampieri (2016) Arabic Dialect Identification in Speech Transcripts. Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 106-103. Osaka, Japan. pdf url

Marcos Zampieri, Shervin Malmasi, Octavia-Maria Sulea, Liviu P. Dinu (2016) A Computational Approach to the Study of Portuguese Newspapers Published in Macau. Proceedings of the Workshop on Natural Language Processing Meets Journalism (NLPMJ). pp. 47-51. New York, United States. pdf url

Eckhard Bick, Marcos Zampieri (2016) Grammatical Annotation of Historical Portuguese: Generating a Corpus-based Diachronic Dictionary. Proceedings of the 19th International Conference on Text, Speech and Dialogue (TSD), Lecture Notes in Artificial Intelligence - LNAI 9924. Springer. pp. 3-11. pdf url

Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, Marcos Zampieri (2016) Findings of the 2016 Conference on Machine Translation (WMT16). Proceedings of the First Conference on Machine Translation (WMT). pp. 131-198. Berlin, Germany. pdf url

Santanu Pal, Marcos Zampieri, Josef van Genabith (2016) USAAR: An Operation Sequential Model for Automatic Statistical Post-Editing. Proceedings of the First Conference on Machine Translation (WMT). pp. 759-763. Berlin, Germany. pdf url

Marcos Zampieri, Shervin Malmasi, Mark Dras (2016) Modeling Language Change in Historical Corpora: The Case of Portuguese. Proceedings of Language Resources and Evaluation (LREC). pp. 4098-4104. Portoroz, Slovenia. pdf url

Cyril Goutte, Serge Léger, Shervin Malmasi, Marcos Zampieri (2016) Discriminating Similar Languages: Evaluations and Explorations. Proceedings of Language Resources and Evaluation (LREC). pp. 1800-1807. Portoroz, Slovenia. pdf url

Santanu Pal, Marcos Zampieri, Mihaela Vela, Tapas Nayak, Sudip Kumar Naskar, Josef van Genabith (2016) CATaLog Online: Porting a Post-editing Tool to the Web. Proceedings of Language Resources and Evaluation (LREC). pp. 599-604. Portoroz, Slovenia. pdf url

Shervin Malmasi, Marcos Zampieri, Mark Dras (2016) Predicting Post Severity in Mental Health Forums. Proceedings of the Workshop on Computational Linguistics and Clinical Psychology (CLPsych). pp. 133-137. San Diego, United States. pdf url

Marcos Zampieri, Liling Tan, Josef van Genabith (2016) MacSaar at SemEval-2016 Task 11: Zipfian and Character Features for Complex Word Identification. Proceedings of the 10th Workshop on Semantic Evaluation (SemEval). pp. 1001-1005. San Diego, United States. pdf url

Shervin Malmasi, Mark Dras, Marcos Zampieri (2016) LTG at SemEval-2016 Task 11: Complex Word Identification with Classifier Ensembles. Proceedings of the 10th Workshop on Semantic Evaluation (SemEval). pp. 996-1000. San Diego, United States. pdf url

Shervin Malmasi, Marcos Zampieri (2016) MAZA at SemEval-2016 Task 11: Detecting Lexical Complexity Using a Decision Stump Meta-Classifier. Proceedings of the 10th Workshop on Semantic Evaluation (SemEval). pp. 991-995. San Diego, United States. pdf url

Marcos Zampieri (2016) Automatic Language Identification. Working with Text: Tools, Techniques and Approaches for Text Mining. pp. 189-205. Chandos Publishing, Elsevier. url

2015

Preslav Nakov, Marcos Zampieri, Petya Osenova, Liling Tan, Cristina Vertan, Nikola Ljubešić, Jörg Tiedemann (Editors) (2015) Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (LT4VarDial). Hissar, Bulgaria. Association for Computational Linguistics. pdf url

Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Preslav Nakov (2015) Overview of the DSL Shared Task 2015. Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (LT4VarDial). pp. 1-9. Hissar, Bulgaria. pdf url

Marcos Zampieri, Binyam Gebrekidan Gebre, Hernani Costa, Josef van Genabith (2015) Comparing Approaches to the Identification of Similar Languages. Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (LT4VarDial). pp. 66-72. Hissar, Bulgaria. pdf url

Tapas Nayek, Sudip Kumar Naskar, Santanu Pal, Marcos Zampieri, Mihaela Vela, Josef van Genabith (2015) CATaLog: New Approaches to TM and Post Editing Interfaces. Proceedings of the Workshop on Natural Language Processing for Translation Memories (NLP4TM). pp. 36-32. Hissar, Bulgaria. pdf url

Marcos Zampieri, Ekaterina Lapshinova-Koltunski (2015) Investigating Genre and Method Variation in Translation Using Text Classification. Proceedings of the 18th International Conference on Text, Speech and Dialogue (TSD), Lecture Notes in Computer Science - LNCS 9302. Springer. pp. 41-50. pdf url

Carolina Scarton, Marcos Zampieri, Mihaela Vela, Josef van Genabith, Lucia Specia (2015) Searching for Context: a Study on Document-Level Labels for Translation Quality Estimation. Proceedings of the 18th Annual Conference of the European Association for Machine Translation (EAMT). pp. 121-128. Antalya, Turkey. pdf url

Rohit Gupta, Constantin Orasan, Marcos Zampieri, Mihaela Vela, Josef van Genabith (2015) Can Translation Memories Afford Not to Use Paraphrasing? Proceedings of the 18th Annual Conference of the European Association for Machine Translation (EAMT). pp. 35-42. Antalya, Turkey. pdf url

Marcos Zampieri, Alina Maria Ciobanu, Vlad Niculae, Liviu P. Dinu (2015) AMBRA: A Ranking Approach to Temporal Text Classification. Proceedings of the 9th Workshop on Semantic Evaluation (SemEval). pp. 851-822. Denver, United States. pdf url

2014

Marcos Zampieri, Liling Tan (2014) Grammatical Error Detection with Limited Training Data: The Case of Chinese. Proceedings of the 22nd International Conference on Computers in Education (ICCE). pp. 69-74. Nara, Japan. pdf url

Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann (Editors) (2014) Proceedings of the 1st Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects (VarDial). Dublin, Ireland. Association for Computational Linguistics. pdf url

Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann (2014) A Report on the DSL Shared Task 2014. Proceedings of the 1st Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects (VarDial). pp. 58-67. Dublin, Ireland. pdf url

Marcos Zampieri, Renato Cordeiro de Amorim (2014) Between Sound and Spelling: Combining Phonetics and Clustering Algorithms to Improve Target Word Recovery. Proceedings of the 9th International Conference on Natural Language Processing (PolTAL). Lecture Notes in Computer Science - LNCS 8686. Springer. pp. 438-449. pdf url

Marcos Zampieri, Binyam Gebrekidan Gebre (2014) VarClass: An Open Source Language Identification Tool for Language Varieties. Proceedings of Language Resources and Evaluation (LREC). pp. 3305-3308. Reykjavik, Iceland. pdf url

Liling Tan, Marcos Zampieri, Nikola Ljubešić, Jörg Tiedemann (2014) Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection. Proceedings of the 7th Workshop on Building and Using Comparable Corpora (BUCC). pp. 6-10. Reykjavik, Iceland. pdf url

Vlad Niculae, Marcos Zampieri, Liviu Dinu, Alina Maria Ciobanu (2014) Temporal Text Ranking and Automatic Dating of Texts. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL). pp. 17-21. Gothenburg, Sweden. pdf url

Marcos Zampieri, Mihaela Vela (2014) Quantifying the Influence of MT Output in the Translators' Performance: A Case Study in Technical Translation. Proceedings of the EACL Workshop on Humans and Computer-assisted Translation (HaCat). pp. 93-98. Gothenburg, Sweden. pdf url

2013

Marcos Zampieri (2013) Using Bag-of-words to Distinguish Similar Languages: How Efficient are They? Proceedings of the 14th IEEE International Symposium on Computational Intelligence and Informatics (CINTI). pp. 37-41. Budapest, Hungary. pdf url

Renato Cordeiro de Amorim, Marcos Zampieri (2013) Effective Spell Checking Methods Using Clustering Algorithms. Proceedings of Recent Advances in Natural Language Processing (RANLP). pp. 172-178. Hissar, Bulgaria. pdf url

Sanja Štajner, Marcos Zampieri (2013) Stylistic Changes for Temporal Text Classification. Proceedings of the 16th International Conference on Text, Speech and Dialogue (TSD), Lecture Notes in Artificial Intelligence - LNAI 8082, Springer. pp. 519-526. pdf url

Marcos Zampieri, Sascha Diwersy (Editors) (2013) Non-Standard Data Sources in Corpus-based Research. ZSM-Studien Series - Vol. 5. Shaker. url

Marcos Zampieri, Jürgen Hermes, Stephan Schwiebert (2013) Identification of Patterns and Document Ranking of Internet Texts: A Frequency-based Approach. Non-Standard Data Sources in Corpus-based Research. ZSM-Studien Series - Vol. 5. Shaker. pp. 25-39. pdf url

Marcos Zampieri, Martin Becker (2013) Colonia: Corpus of Historical Portuguese. Non-Standard Data Sources in Corpus-based Research. ZSM-Studien Series - Vol. 5. Shaker. pp. 77-84. pdf url

Marcos Zampieri, Binyam Gebrekidan Gebre, Sascha Diwersy (2013) N-Gram Language Models and POS Distribution for the Identification of Spanish Varieties. Proceedings of TALN. pp. 580-587. Sables d'Olonne, France. pdf url

Binyam Gebrekidan Gebre, Marcos Zampieri, Peter Wittenburg, Tom Heskes (2013) Improving Native Language Identification with TF-IDF Weighting. Proceedings of the 8th NAACL Workshop on Innovative Use of NLP for Building Educational Applications (BEA). pp. 216-223. Atlanta, United States. pdf url

2012

Marcos Zampieri (2012) Evaluating Knowledge-poor and Knowledge-rich Features in Automatic Classification: A Case Study in WSD. Proceedings of the 13th IEEE International Symposium on Computational Intelligence and Informatics (CINTI). pp. 359-363. Budapest, Hungary. pdf url

Marcos Zampieri, Binyam Gebrekidan Gebre, Sascha Diwersy (2012) Classifying Pluricentric Languages: Extending the Monolingual Model. Proceedings of the Fourth Swedish Language Technology Conference (SLTC). pp. 79-80. Lund, Sweden. pdf url

Marcos Zampieri, Binyam Gebrekidan Gebre (2012) Automatic Identification of Language Varieties: The Case of Portuguese. Proceedings of KONVENS. pp. 233-237. Vienna, Austria. pdf url

2010

Marcos Zampieri (2010) A Supervised Machine Learning Method for Word Sense Disambiguation of Portuguese Nouns. Bulletin de Linguistique Appliquée et Générale - BULAG 34. pp. 187-203. pdf url

Jorge Baptista, Neusa Costa, Joaquim Guerra, Marcos Zampieri, Maria Cabral, Nuno Mamede (2010) P-AWL: Academic Word List for Portuguese. Proceedings of the International Conference on the Computational Processing of Portuguese (PROPOR), Lecture Notes in Artificial Intelligence - LNAI 6001, Springer. pp. 120-123. pdf url

Pre-print Manuscripts and Tech Reports

Allahsera Tapo, Michael Leventhal, Sarah Luger, Christopher M. Homan, Marcos Zampieri (2021) Domain-specific MT for Low-resource Languages: The case of Bambara - French. arXiv preprint arXiv:2104.00041. pdf url

Michael Leventhal, Allahsera Tapo, Marcos Zampieri, Christopher M. Homan, Sarah Luger (2020) Assessing Human Translations from French to Bambara for Machine Learning: A Pilot Study. arXiv preprint arXiv:2004.00068. pdf url

Liviu P. Dinu, Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi (2018) Classifier Ensembles for Dialect and Language Variety Identification. arXiv preprint arXiv:1808.04800. pdf url

Marcos Zampieri (2017) Compiling and Processing Historical and Contemporary Portuguese Corpora. arXiv preprint arXiv:1710.00803. pdf url


Last Updated: January 2023