Publication:
Effects of Positivization on the Paragraph Vector Model

dc.contributor.authors: Gerek A., Yuney M.C., Erkaya E., Ganiz M.C.
dc.date.accessioned: 2022-03-15T02:14:19Z
dc.date.accessioned: 2026-01-11T13:44:51Z
dc.date.available: 2022-03-15T02:14:19Z
dc.date.issued: 2019
dc.description.abstract: Natural language processing (NLP) is an important field of artificial intelligence. One of the fundamental problems in NLP is to create vector (distributed) representations of words so that the vectors of words with similar meanings lie closer together in the vector space. Among the most popular algorithms for creating these representations are word embedding models such as word2vec and fastText. Similarly, the paragraph vector model (doc2vec) is used to create distributed representations of documents while simultaneously creating distributed representations for the words in these documents. These models create dense, low-dimensional (usually in the low hundreds of dimensions) vector representations, which may include negative values. In this study we focus on these negative values and introduce a family of regularization methods in which document, word and/or context vectors of the paragraph vector model are forced to have only positive components. We measure their effects on several tasks: text classification, semantic similarity, and analogy. Although positivization greatly increases the sparsity of the word embeddings, and should be expected to result in a loss of information, our results show that there is almost no reduction in the performance of the regularized embeddings in these tasks. We also observe an increase in the classification accuracy in one case. We foresee that these approaches can be beneficial in machine learning systems which require non-negative vectors. © 2019 IEEE.
dc.identifier.doi: 10.1109/INISTA.2019.8778304
dc.identifier.isbn: 9781728118628
dc.identifier.uri: https://hdl.handle.net/11424/248026
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartof: IEEE International Symposium on INnovations in Intelligent SysTems and Applications, INISTA 2019 - Proceedings
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: analogy
dc.subject: artificial intelligence
dc.subject: natural language processing
dc.subject: regularization
dc.subject: semantic similarity
dc.subject: text classification
dc.subject: word embeddings
dc.title: Effects of Positivization on the Paragraph Vector Model
dc.type: conferenceObject
dspace.entity.type: Publication
oaire.citation.title: IEEE International Symposium on INnovations in Intelligent SysTems and Applications, INISTA 2019 - Proceedings
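
Illustrative note: the abstract does not spell out the regularization methods themselves, so the following is only a minimal sketch of one way a "positive components only" constraint could be imposed, assuming a simple projection that clips negative components to zero after each gradient update. The function names (`positivize`, `sparsity`, `sgd_step_with_positivization`) and the learning rate are hypothetical and are not taken from the paper. The sketch also shows why such a constraint would tend to increase sparsity, as the abstract reports.

```python
import random


def positivize(vector):
    """Clip negative components to zero (assumed projection, not the paper's method)."""
    return [max(0.0, x) for x in vector]


def sparsity(vector):
    """Return the fraction of components that are exactly zero."""
    return sum(1 for x in vector if x == 0.0) / len(vector)


def sgd_step_with_positivization(vector, gradient, learning_rate=0.025):
    """Take one plain gradient step, then apply the assumed projection."""
    updated = [x - learning_rate * g for x, g in zip(vector, gradient)]
    return positivize(updated)


# Hypothetical 100-dimensional embedding with components in [-1, 1].
rng = random.Random(0)
embedding = [rng.uniform(-1.0, 1.0) for _ in range(100)]
projected = positivize(embedding)

print("sparsity before:", sparsity(embedding))
print("sparsity after: ", sparsity(projected))
```

Under this assumption, every component that was negative becomes an exact zero, so a roughly symmetric dense vector ends up with about half of its entries zeroed.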
