Publication: Machine learning-based decision tree model for differential diagnosis of congenital adrenal hyperplasia subtypes using steroid hormone profiles
Abstract
Congenital adrenal hyperplasia (CAH) is a group of genetic disorders characterized by enzyme deficiencies in the steroidogenesis pathway that impair cortisol synthesis and cause excessive steroid hormone production. Distinguishing between CAH subtypes is crucial for appropriate clinical management. This study presents a novel approach: a machine learning-based decision tree model for the differential diagnosis of CAH subtypes based on the profiles of 18 major steroid hormones. The study cohort comprised steroid hormone measurements from 702 healthy individuals and 328 patients diagnosed by genetic testing with one of 9 CAH subtypes. The LightGBM algorithm was used to identify the most discriminatory hormone markers, and a decision tree model was then constructed on these selected features to classify CAH subtypes. Model performance was evaluated with cross-validation. In discriminating CAH subtypes, the model achieved accuracy of 98.16% to 100%, sensitivity of 65.4% to 100%, and specificity of 89% to 100%. In distinguishing healthy individuals from CAH cases, the model achieved an overall accuracy of 97%, a specificity of 93.7%, and a sensitivity of 99.6%. The approach also identified steroid hormone profiles that contribute substantially to the classification of CAH subtypes and provide insights into the pathophysiology of these diseases. In summary, this machine learning-based decision tree model is a promising tool for the differential diagnosis of CAH subtypes and can help clinicians treat patients in a timely and accurate manner. These results highlight the potential of combining steroid hormone profiles with advanced computational methods to improve diagnostic accuracy in rare endocrine diseases such as CAH.
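The abstract reports per-subtype accuracy of 98.16% to 100% alongside sensitivity as low as 65.4%. Assuming the per-subtype figures were computed one-vs-rest (each subtype treated as the positive class against all other samples; the abstract does not state this), the minimal Python sketch below shows how both can hold at once: when negatives greatly outnumber a small subtype, misclassifying part of that subtype lowers its sensitivity while its accuracy stays high. The function name `one_vs_rest_metrics` and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
from typing import Dict, Hashable, Sequence, Tuple


def one_vs_rest_metrics(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Dict[Hashable, Tuple[float, float, float]]:
    """Return {class: (accuracy, sensitivity, specificity)}.

    Assumption (not stated in the study): each class is scored one-vs-rest,
    i.e. it is the positive class and every other class is negative.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    metrics = {}
    for cls in sorted(set(y_true) | set(y_pred), key=str):
        # Count the four confusion-matrix cells for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        tn = n - tp - fn - fp
        accuracy = (tp + tn) / n
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        metrics[cls] = (accuracy, sensitivity, specificity)
    return metrics


if __name__ == "__main__":
    # Hypothetical toy labels: 90 healthy samples and 10 of one subtype,
    # 4 of which are misclassified as healthy.
    y_true = ["healthy"] * 90 + ["subtype_A"] * 10
    y_pred = ["healthy"] * 90 + ["subtype_A"] * 6 + ["healthy"] * 4
    for cls, (acc, sens, spec) in one_vs_rest_metrics(y_true, y_pred).items():
        print(f"{cls}: accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

On this toy input, `subtype_A` scores an accuracy of 0.96 with a sensitivity of only 0.60, because the 90 correctly rejected negatives dominate the accuracy denominator. This is why sensitivity and specificity are more informative than accuracy for rare subtypes.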
