Publication:
Instance labeling in semi-supervised learning with meaning values of words

dc.contributor.author: ALTINEL GİRGİN, AYŞE BERNA
dc.contributor.author: GANİZ, MURAT CAN
dc.contributor.authors: Altinel, Berna; Ganiz, Murat Can; Diri, Banu
dc.date.accessioned: 2022-03-12T20:32:47Z
dc.date.accessioned: 2026-01-11T13:14:29Z
dc.date.available: 2022-03-12T20:32:47Z
dc.date.issued: 2017
dc.description.abstract: In supervised learning systems, only labeled samples are used to build a classifier, which is then used to predict the class labels of unlabeled samples. However, obtaining labeled data is expensive, time-consuming, and difficult in real-life practical situations, as labeling a data set requires the effort of a human expert. On the other hand, unlabeled data are often plentiful, which makes them relatively inexpensive and easy to obtain. Semi-supervised learning (SSL) methods strive to utilize this plentiful source of unlabeled examples to increase the learning capacity of the classifier, particularly when the number of labeled examples is restricted. Since SSL techniques usually reach higher accuracy and require less human effort, they attract a substantial amount of attention in both practical applications and theoretical research. A novel semi-supervised methodology, referred to as ILBOM, is offered in this study. This algorithm utilizes a new method to predict the class labels of unlabeled examples in a corpus and incorporates them into the training set to build a better classifier. The approach presented here depends on a meaning calculation, which computes the meaning scores of words in the scope of classes. Meaning computation is based on the Helmholtz principle and has been applied to various tasks in the field of text mining, such as feature extraction, information retrieval, and document summarization. Nevertheless, according to the literature, ILBOM is the first work that uses meaning calculation in a semi-supervised way to construct a semantic smoothing kernel for Support Vector Machines (SVM). The proposed methodology is evaluated through various experiments on standard textual datasets. ILBOM's experimental results are compared with three baseline algorithms, including SVM with a linear kernel, which is one of the most frequently used algorithms in the text classification field. Experimental results show that labeling unlabeled instances based on meaning scores of words to augment the training set is valuable and significantly increases the classification accuracy on previously unseen test instances.
dc.identifier.doi: 10.1016/j.engappai.2017.04.003
dc.identifier.eissn: 1873-6769
dc.identifier.issn: 0952-1976
dc.identifier.uri: https://hdl.handle.net/11424/234432
dc.identifier.wos: WOS:000403122500012
dc.language.iso: eng
dc.publisher: PERGAMON-ELSEVIER SCIENCE LTD
dc.relation.ispartof: ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: Text classification
dc.subject: Semantic kernel
dc.subject: Semi-supervised learning
dc.subject: Instance labeling
dc.subject: Helmholtz principle
dc.subject: TEXT CLASSIFICATION
dc.subject: SEMANTIC KERNEL
dc.title: Instance labeling in semi-supervised learning with meaning values of words
dc.type: article
dspace.entity.type: Publication
oaire.citation.endPage: 163
oaire.citation.startPage: 152
oaire.citation.title: ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
oaire.citation.volume: 62
