Publication: Analyzing the Performance of Different Large Language Models of ChatGPT on Turkish Homonyms
Abstract
ChatGPT, a popular topic in recent times, and its achievements show how far artificial intelligence has advanced and what it promises for the coming years. This study examines the differences between the Large Language Models currently used by ChatGPT, comparing the performance of ChatGPT-3.5 and ChatGPT-4 on Turkish homonyms. One of the greatest challenges for the Natural Language Processing systems used to build Large Language Models is their ability to resolve word-sense ambiguity. To probe this ability, the 200 most commonly used homonyms in Turkish were selected as the sample. For each homonym, a sentence was constructed in which the word appears twice with two different meanings, and first ChatGPT-3.5 and then ChatGPT-4 were asked to identify the two meanings. The models produced outputs in which they failed to identify one of the two meanings, and sometimes both. In line with the study's aim, the outputs of ChatGPT-3.5 and ChatGPT-4 were compared. As expected, ChatGPT-4, which has more parameters and a larger dataset than ChatGPT-3.5, performed much better. The other analyses carried out are a success-rate distribution analysis, the variation in performance across homonyms, the relationship between a homonym's character count and the success rate, and statistical tests.
Keywords: ChatGPT-3.5, ChatGPT-4, Large Language Model, Homonym, Linguistic Ambiguity.
Citation
Aytekin Ç., Karabina T. B., "Analyzing the Performance of Different Large Language Models of ChatGPT on Turkish Homonyms", İstanbul Aydın Üniversitesi Sosyal Bilimler Dergisi, vol. 16, no. 3, pp. 365-390, 2024
