Publication:
Traditional Machine Learning and Deep Learning-based Text Classification for Turkish Law Documents using Transformers and Domain Adaptation

dc.contributor.author: GANİZ, MURAT CAN
dc.contributor.authors: Akca O., Bayrak G., Issifu A. M., GANİZ M. C.
dc.date.accessioned: 2022-12-26T14:54:57Z
dc.date.accessioned: 2026-01-10T20:26:20Z
dc.date.available: 2022-12-26T14:54:57Z
dc.date.issued: 2022-01-01
dc.description.abstract: © 2022 IEEE. Natural Language Processing (NLP) is an interdisciplinary field between linguistics and computer science. Its main aim is to process natural (human) language using computer programs. Text classification is one of the main tasks of this field and is widely used in applications such as spam filtering, sentiment analysis, and document categorization. Nonetheless, there is very little text classification work in the law domain, and even less for the Turkish language. This may be attributed to the complexity of the domain: the length and complexity of the documents and the extensive use of technical jargon distinguish it from other domains. As in the medical domain, understanding these documents requires extensive specialization. Another reason may be the scarcity of publicly available datasets. In this study, we compile sizeable unsupervised and supervised datasets from publicly available sources and experiment with several classification algorithms, ranging from traditional classifiers to much more complicated deep learning and transformer-based models, along with different text representations. We focus on classifying Court of Cassation decisions by their crime labels. Interestingly, the majority of the models we experiment with obtain good results. This suggests that although understanding documents in the legal domain is complicated and requires human expertise, it may be relatively easy for machine learning models despite the extensive presence of technical terms. This seems to be especially the case for transformer-based pre-trained neural language models, which can be adapted to the law domain and show high potential for future real-world applications.
dc.identifier.citation: Akca O., Bayrak G., Issifu A. M., GANİZ M. C., "Traditional Machine Learning and Deep Learning-based Text Classification for Turkish Law Documents using Transformers and Domain Adaptation", 16th International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2022, Biarritz, France, 8-12 August 2022
dc.identifier.doi: 10.1109/inista55318.2022.9894051
dc.identifier.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85139594778&origin=inward
dc.identifier.uri: https://hdl.handle.net/11424/284090
dc.language.iso: eng
dc.relation.ispartof: 16th International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2022
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Bilgisayar Bilimleri
dc.subject: Algoritmalar
dc.subject: Bilgi Güvenliği ve Güvenilirliği
dc.subject: Mühendislik ve Teknoloji
dc.subject: Computer Sciences
dc.subject: algorithms
dc.subject: Information Security and Reliability
dc.subject: Engineering and Technology
dc.subject: Mühendislik, Bilişim ve Teknoloji (ENG)
dc.subject: Bilgisayar Bilimi
dc.subject: BİLGİSAYAR BİLİMİ, YAPAY ZEKA
dc.subject: BİLGİSAYAR BİLİMİ, BİLGİ SİSTEMLERİ
dc.subject: Engineering, Computing & Technology (ENG)
dc.subject: COMPUTER SCIENCE
dc.subject: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
dc.subject: COMPUTER SCIENCE, INFORMATION SYSTEMS
dc.subject: Bilgi sistemi
dc.subject: Fizik Bilimleri
dc.subject: Yapay Zeka
dc.subject: Bilgisayar Bilimi Uygulamaları
dc.subject: Information Systems
dc.subject: Physical Sciences
dc.subject: Artificial Intelligence
dc.subject: Computer Science Applications
dc.subject: Domain-specific language models
dc.subject: Legal document classification
dc.subject: Natural Language Processing
dc.title: Traditional Machine Learning and Deep Learning-based Text Classification for Turkish Law Documents using Transformers and Domain Adaptation
dc.type: conferenceObject
dspace.entity.type: Publication

Files

Original bundle

Name: file.pdf
Size: 1.39 MB
Format: Adobe Portable Document Format