Publication: A search engine for Turkish with stemming
Abstract
Intranets are the cornerstone of many Engineering Management systems that improve efficiency and productivity in businesses of all sizes, whether small, medium-sized, or large. An intranet owes its existence and usefulness to a powerful technology: the search engine. An intranet whose working language is Turkish therefore requires a powerful Turkish search engine. Many search engines, commercial and academic, are currently available for many languages, including Turkish. However, existing search engines for Turkish rely on pattern matching, which retrieves only documents that contain a word exactly as it was entered in the query. Turkish grammar, in contrast, allows a word to take many different suffixes without losing its meaning, which makes it challenging to build a search engine that goes beyond pattern matching. The aim of this thesis is to design and implement a search engine for Turkish that takes this grammatical structure into account and incorporates stemming, so that the root of the search word can be determined and searched for. Such an engine can find all inflected forms of a word, increasing the probability of finding the desired documents or information.
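The difference between exact pattern matching and stem-based search can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the suffix list, the minimum stem length, and the names `toy_stem`, `build_index`, and `search` are all assumptions made for illustration, and a realistic Turkish stemmer would need far more morphological rules (vowel harmony, consonant mutation, many more suffixes).

```python
import re
from collections import defaultdict

# Hypothetical, hand-picked suffixes (plural, locative, ablative).
# Real Turkish morphology needs far more rules than this short list.
SUFFIXES = sorted(
    ["lar", "ler", "dan", "den", "tan", "ten", "da", "de", "ta", "te"],
    key=len,
    reverse=True,
)
MIN_STEM_LENGTH = 3  # assumed guard so short words such as "oda" are left alone


def toy_stem(word):
    """Strip known suffixes repeatedly, longest first, keeping a minimum stem."""
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
                word = word[: -len(suffix)]
                stripped = True
                break
    return word


def tokenize(text):
    # str.lower() does not apply Turkish casing rules for I/İ;
    # the sample texts below avoid those letters.
    return re.findall(r"\w+", text.lower())


def build_index(documents, normalize):
    """Map each normalized token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[normalize(token)].add(doc_id)
    return index


def search(index, query, normalize):
    """Return the sorted ids of documents matching the normalized query word."""
    return sorted(index.get(normalize(query.lower()), set()))


DOCUMENTS = {
    1: "okul bugün kapalı",
    2: "okullarda yeni dönem",
    3: "çocuklar okuldan geldi",
}

exact_index = build_index(DOCUMENTS, lambda w: w)  # exact pattern matching
stem_index = build_index(DOCUMENTS, toy_stem)      # stem-based matching

print(search(exact_index, "okul", lambda w: w))  # [1]
print(search(stem_index, "okullar", toy_stem))   # [1, 2, 3]
```

With exact matching, a query for "okul" (school) finds only the document that contains that exact form. Once both the indexed words and the query are reduced to a common stem, the inflected forms "okullarda" and "okuldan" are found as well.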
