Addressing Class Imbalance in Healthcare Data: Machine Learning Solutions for Age-Related Macular Degeneration and Preeclampsia
Keywords:
Healthcare Domain, Class Imbalance, Ensemble Classifiers, Diagnostic Decision-Making, Personalized Medicine, Machine Learning Techniques
Abstract
Machine learning has transformed how diseases are diagnosed and how treatments are optimized in healthcare. However, medical databases are often imbalanced: privacy regulations complicate data collection, and certain health conditions are underrepresented, which hampers machine learning performance. To address this problem, a hybrid approach has been proposed that combines the Synthetic Minority Oversampling Technique (SMOTE) with undersampling and uses two techniques tailored for imbalanced datasets. Popular machine learning methods were compared using several thresholds for reducing one class, with Balanced Accuracy as the evaluation metric to mitigate bias toward the majority class. Balanced Bagging and Balanced Random Forest consistently outperformed the other methods, achieving the best average rankings of 1.42 and 3.58 out of 32 configurations in the two datasets, respectively. Tree-based approaches such as Random Forest and Gradient Boosting were similarly effective, underscoring the value of aggregating predictions from multiple trees to reduce bias. Notably, undersampling and SMOTE also benefited non-tree-based models such as KNN, SVM, and Logistic Regression, showing their usefulness across different algorithms. This study provides a robust approach to handling imbalanced datasets in healthcare, which could help optimize healthcare interventions and improve patient outcomes and care.
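
The two ideas the abstract relies on, hybrid resampling and Balanced Accuracy, can be illustrated with a minimal sketch in plain Python. This is not the study's implementation. The function names, the `target` parameter (the class size both classes are brought to), the neighbour count `k`, and the toy data are all assumptions made for illustration. The sketch handles the binary case only.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class weighs the same."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def smote_like(minority, n_new, k=3, rng=None):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def hybrid_resample(X, y, minority_label, target, k=3, seed=0):
    """Undersample the majority class down to `target` samples and
    oversample the minority class up to `target` samples (binary case)."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    majority = [x for x, lab in zip(X, y) if lab != minority_label]
    majority_label = next(lab for lab in y if lab != minority_label)
    kept = rng.sample(majority, min(target, len(majority)))  # random undersampling
    n_new = max(0, target - len(minority))
    grown = minority + smote_like(minority, n_new, k, rng)
    X_res = kept + grown
    y_res = [majority_label] * len(kept) + [minority_label] * len(grown)
    return X_res, y_res


if __name__ == "__main__":
    # A classifier that always predicts the majority class scores 90% plain
    # accuracy on this toy label set but only 0.5 balanced accuracy.
    print(balanced_accuracy([0] * 9 + [1], [0] * 10))  # 0.5
```

In practice, resampling is applied to the training split only, so that synthetic samples never reach the data used for evaluation.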