Information extraction and manipulation system for the Web Sources

Tatar, Serhan

dc.contributor.advisor	EYLER, M Akif
dc.contributor.author	Tatar, Serhan
dc.contributor.department	Marmara Üniversitesi
dc.contributor.department	Fen Bilimleri Enstitüsü
dc.contributor.department	Bilgisayar Mühendisliği Bilim Dalı
dc.date.accessioned	2026-01-13T09:03:11Z
dc.date.issued	2002
dc.description.abstract	Following links and searching by keyword are the main navigation techniques on the Internet. However, given the enormous size of the Internet, these techniques are not sufficient. Another important problem is presenting information that is stored in heterogeneous structures in a specified data model and format. In this thesis, a system named WebXtractor is described and developed. The system extracts information from Web sources and refines the extracted information; it can be used particularly effectively when migrating data from the Web. The main features of WebXtractor are: automatic retrieval and parsing of Web sources; automatic extraction of user-specified information from the sources; source integration; data model design; and easy, rapid application development with the help of visual tools. Three tools were developed within WebXtractor so that the user can configure the system easily. The thesis explains in detail how these tools are used and how applications are developed with WebXtractor. Sample applications that demonstrate the use of the system were also implemented and presented. In the first, data obtained from a multiple-instance Web source was integrated and presented to the user in the user-specified data model and format. In the second, data obtained from a single-instance Web source (a single document) was presented to the user with only a change of format.
dc.format.extent	XI, 62 leaves
dc.identifier.uri	https://katalog.marmara.edu.tr/veriler/yordambt/cokluortam/1F/T0048396.pdf
dc.identifier.uri	https://hdl.handle.net/11424/209392
dc.language.iso	eng
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Interface hardware and communication
dc.subject	Information retrieval, Internet
dc.subject	Computer science
dc.subject	General topics
dc.subject	Data processing
dc.title	Information extraction and manipulation system for the Web Sources
dc.type	masterThesis
dspace.entity.type	Publication

Collections

Tezler
