Publication:
Document Embedding based Supervised Methods for Turkish Text Classification

dc.contributor.authors: Celenli, Halil I.; Ozturk, S. Talha; Sahin, Gurkan; Gerek, Aydin; Ganiz, Murat C.
dc.date.accessioned: 2022-03-12T16:23:57Z
dc.date.accessioned: 2026-01-11T13:33:01Z
dc.date.available: 2022-03-12T16:23:57Z
dc.date.issued: 2018
dc.description.abstract: Following the recent increase in the amount of available data, Deep Learning has become the most popular branch of Machine Learning. This trend can also be seen in Natural Language Processing (NLP), especially since textual data can now be scraped from the World Wide Web in vast quantities and used in an unsupervised or semi-supervised manner. For this reason, Deep Learning methods are being used more frequently. In this work we devise several classification methods based on the Paragraph Vector model (a.k.a. Doc2Vec), which represents documents as vectors. These include a k-Nearest Neighbor classifier (k-NN), Support Vector Machines (SVM), a Centroid Classifier (CC) that works on the paragraph vectors of documents, and a custom-made method that uses pairwise cosine similarities between documents and class centroids in Doc2Vec space as features. Our experiments use a number of representations and classifiers combined in various ways. On the representation side, the Paragraph Vector model is compared with Term Frequency (tf) and Term Frequency-Inverse Document Frequency (tf-idf), using SVM, k-NN, CC and Centroid Features Support Vector Machine (CFSVM) as classifiers.
dc.identifier.doi: doiWOS:000459847400092
dc.identifier.isbn: 978-1-5386-7893-0
dc.identifier.uri: https://hdl.handle.net/11424/226147
dc.identifier.wos: WOS:000459847400092
dc.language.iso: eng
dc.publisher: IEEE
dc.relation.ispartof: 2018 3RD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK)
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: Text Classification
dc.subject: Doc2Vec
dc.subject: Distributed Vector Representations
dc.subject: Embedding models
dc.subject: Paragraph Vectors
dc.title: Document Embedding based Supervised Methods for Turkish Text Classification
dc.type: conferenceObject
dspace.entity.type: Publication
oaire.citation.endPage: 482
oaire.citation.startPage: 477
oaire.citation.title: 2018 3RD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK)
