Publication: du-CBA: a data-unaware and incremental architecture for classification-based association rule mining
Abstract
The use of systems in which clients and servers must work together, such as the Internet of Things and mobile applications, has grown. Machine learning models have become a necessity in these areas. However, collecting data from clients, transferring it to a server, training a machine learning model, and then integrating that model into the devices running on the clients raises many problems. Transferring data from clients to the server generates network traffic and consumes considerable energy. The data also contains personal information, so transferring it can compromise data privacy. Withholding data to preserve privacy, on the other hand, reduces the variety and volume of data available, which causes the model developed on the server to give inaccurate results, lowers its efficiency, and restricts machine learning research in this area. This thesis uses a federated learning architecture to address these problems. In this architecture, each client trains a local machine learning model on its own data. The trained models are sent to the server, where they are merged into a new model. The resulting final model is then distributed back to the clients. This reduces network traffic and energy demand. Data privacy is preserved because, instead of the full data, only information that is not meaningful on its own is sent. Federated learning is a current field that is open to further development, and new algorithms are needed. In this study, an algorithm called Data Unaware Classification Based on Association (du-CBA) was developed. It is an associative classification algorithm designed for the federated learning architecture.
To compare the federated and classical learning architectures and measure their performance, a simulation environment was built. Models were trained in the simulation with the du-CBA and CBA algorithms, and the results were compared. Five data sets from the University of California Irvine (UCI) repository were used for training. The experimental results showed that, for each data set, the model trained with federated learning achieved almost the same accuracy as the model trained with classical learning. Training with the federated learning method reduced training time by approximately 70%. From this it is concluded that the energy requirement of devices using the federated learning architecture decreases. In addition, because the trained model rather than the data is sent to the center, data privacy is preserved and network traffic is significantly reduced. The experimental results indicate that the developed algorithm produces successful results.
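The client/server flow described in the abstract (local training, merging on the server, redistribution) can be illustrated with a small sketch. This is not the du-CBA algorithm itself, whose details the abstract does not give. It assumes, purely for illustration, that each client's "model" is a table of co-occurrence counts for single-attribute rules and that the server merges models by summing those counts. The function names `mine_local_rules`, `merge_models`, and `predict`, and the `min_confidence` parameter, are hypothetical.

```python
from collections import Counter


def mine_local_rules(rows, labels):
    """Client side: count how often each (attribute index, value) item
    co-occurs with each class label. Only these counts leave the client,
    never the raw rows. (Assumed model format, not taken from the thesis.)"""
    rule_counts = Counter()   # ((attr_index, value), label) -> count
    item_counts = Counter()   # (attr_index, value) -> count
    for row, label in zip(rows, labels):
        for item in enumerate(row):
            item_counts[item] += 1
            rule_counts[(item, label)] += 1
    return {"rules": rule_counts, "items": item_counts,
            "classes": Counter(labels)}


def merge_models(local_models, min_confidence=0.6):
    """Server side: sum the counts from all clients (assumed merge rule),
    keep rules whose confidence reaches the threshold, and rank them by
    confidence, then support, as CBA-style classifiers commonly do."""
    rules, items, classes = Counter(), Counter(), Counter()
    for model in local_models:
        rules.update(model["rules"])
        items.update(model["items"])
        classes.update(model["classes"])
    ranked = []
    for (item, label), count in rules.items():
        confidence = count / items[item]
        if confidence >= min_confidence:
            ranked.append((confidence, count, item, label))
    ranked.sort(key=lambda r: (-r[0], -r[1]))
    default_label = classes.most_common(1)[0][0]
    return {"rules": [(item, label) for _, _, item, label in ranked],
            "default": default_label}


def predict(global_model, row):
    """Any client: apply the first matching rule, else the default class."""
    row_items = set(enumerate(row))
    for item, label in global_model["rules"]:
        if item in row_items:
            return label
    return global_model["default"]


# Two simulated clients, each holding its own private rows.
client_a = mine_local_rules([("sunny", "hot"), ("rainy", "mild")], ["no", "yes"])
client_b = mine_local_rules([("sunny", "mild"), ("rainy", "hot")], ["no", "yes"])
global_model = merge_models([client_a, client_b])
print(predict(global_model, ("sunny", "hot")))
print(predict(global_model, ("rainy", "hot")))
```

In this toy setup, only the "outlook" attribute yields rules above the confidence threshold once the two clients' counts are combined. Neither client could establish that alone from its two rows with the same support. A real associative classifier would also mine multi-attribute rules and prune them, which this sketch omits.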
