Addressing Class Imbalance in Healthcare Data: Machine Learning Solutions for Age-Related Macular Degeneration and Preeclampsia

Authors

Antonieta Martínez-Velasco, Lourdes Martínez-Villaseñor, Luis Miralles-Pechuán

Keywords:

Healthcare Domain, Class Imbalance, Ensemble Classifiers, Diagnostic Decision-Making, Personalized Medicine, Machine Learning Techniques

Abstract

Machine learning has transformed the way diseases are diagnosed and treatments are optimized in healthcare. However, medical databases are often imbalanced because privacy regulations make data collection difficult. Certain health conditions are therefore underrepresented, which hampers machine learning performance. To address this problem, a hybrid approach is proposed that combines the Synthetic Minority Oversampling Technique (SMOTE) with undersampling and uses two specific techniques tailored for imbalanced datasets. The approach was compared with popular machine learning methods, using various thresholds to reduce one class and Balanced Accuracy as the metric to mitigate bias toward the majority class. The results showed that Balanced Bagging and Balanced Random Forest consistently outperformed the other methods, performing best with average rankings of 1.42 and 3.58 out of 32 configurations in the two datasets, respectively. Tree-based approaches such as Random Forest and Gradient Boosting demonstrated similar effectiveness, emphasizing the power of aggregating predictions from multiple trees to reduce bias. Notably, undersampling and SMOTE proved advantageous for non-tree-based models such as KNN, SVM, and Logistic Regression, showing their usefulness across different algorithms. This study provides a robust solution for handling imbalanced datasets in healthcare, which could help optimize healthcare interventions and improve patient outcomes and care.
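As a rough illustration of the ideas named in the abstract, the sketch below shows why Balanced Accuracy resists majority-class bias and how SMOTE-style interpolation can be paired with random undersampling. It is not the authors' implementation: the function names, the `under_ratio` threshold parameter, and the toy data are all assumptions made for this example, and the interpolation step is a simplified stand-in for a full SMOTE implementation.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall, so every class weighs the same."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def smote_like(minority: List[Point], n_new: int, k: int,
               rng: random.Random) -> List[Point]:
    """Simplified SMOTE idea: interpolate between a minority point and
    one of its k nearest minority neighbours (needs >= 2 points)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def hybrid_resample(majority: List[Point], minority: List[Point],
                    under_ratio: float, k: int = 3,
                    seed: int = 0) -> Tuple[List[Point], List[Point]]:
    """Hypothetical hybrid: keep a fraction `under_ratio` of the majority
    class, then oversample the minority class up to the same size."""
    rng = random.Random(seed)
    n_keep = min(len(majority),
                 max(len(minority), int(len(majority) * under_ratio)))
    kept = rng.sample(majority, n_keep)
    n_new = n_keep - len(minority)
    return kept, minority + smote_like(minority, n_new, k, rng)


# A classifier that always predicts the majority class looks good on
# plain accuracy but not on balanced accuracy.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain, balanced_accuracy(y_true, y_pred))  # 0.95 0.5
```

Varying `under_ratio` here plays the role of the different undersampling thresholds mentioned in the abstract; in practice, resampling should be applied to the training split only so that evaluation data keeps its original class distribution.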

Author Biographies

Antonieta Martínez-Velasco, Universidad Panamericana

Antonieta Martínez-Velasco is a professor and researcher at the Engineering School at the Universidad Panamericana. She received her Ph.D. in engineering from Universidad Panamericana. Her main research areas are data analytics, artificial intelligence, and machine learning techniques applied to social and health sciences. She is a member of the Mexican National Researchers System.

Lourdes Martínez-Villaseñor, Universidad Panamericana

Lourdes Martínez-Villaseñor is a full-time Professor in the School of Engineering at the Universidad Panamericana, Mexico, and head of the postgraduate academic area. She is a Computer Systems Engineer and a Doctor in Computational Sciences from Tecnológico de Monterrey, Mexico. She holds the Level 1 distinction of the National System of Researchers of CONACYT. Her main research interests are artificial intelligence applied to healthcare systems and ethics for artificial intelligence.

Luis Miralles-Pechuán, University College Dublin

Luis Miralles-Pechuán is a Lecturer at Technological University Dublin. He obtained his PhD and Bachelor's degree in Computer Science at the University of Murcia (Spain). He worked as a full-time researcher and lecturer at the Universidad Panamericana in Mexico for three years. He started his PhD in 2012, creating new approaches within the online advertising world, and during that time became familiar with machine learning and the literature on applying it to online advertising. After finishing his PhD, he held postdoctoral positions at levels I and II in CeADAR, University College Dublin, where he won the prize for supervising the best student paper at the Digital Forensic conference. His research topic is applying Reinforcement Learning to fight the COVID-19 pandemic and plan containment levels, considering public health and the economy. He also has expertise in human activity recognition, generalized zero-shot learning (GZSL), and applying machine learning to improve the accessibility of websites.

Published

2024-09-29

How to Cite

Martinez-Velasco, A., Martínez-Villaseñor, L., & Miralles-Pechuán, L. (2024). Addressing Class Imbalance in Healthcare Data: Machine Learning Solutions for Age-Related Macular Degeneration and Preeclampsia. IEEE Latin America Transactions, 22(10), 806–820. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/8952