Automatic Cyberbullying Detection: a Mexican case in High School and Higher Education students

Authors

Keywords:

Bullying, Cyberbullying, Machine learning, Social networks, Deep learning

Abstract

Social interaction among young students has shifted, partially or entirely, to mobile-based communication, particularly through social networks. This environment enables more immediate, diverse, and large-scale interaction, making academic and recreational activities faster and more effective. However, it has also fostered the form of social harassment known as bullying, sharply widening its reach and diversifying the types and forms of aggression. Machine learning and natural language processing techniques have been used to build models that detect bullying among students, mostly using data corpora drawn from public social networks. These data sources, however, are generally not representative of the social networks students actually use, so the resulting classification models do not capture the vocabulary of this social group. This article describes the methodology used to build a representative corpus of interactions among Mexican high school and university students, together with a comparative analysis of the characteristics that influence the content quality of a corpus in this domain. It also reports the performance of several machine learning models for identifying bullying. The Naive Bayes classifier achieved the best result (F1-score of 0.862), outperforming deep learning models based on Recurrent (F1-score of 0.845) and Convolutional (F1-score of 0.807) Neural Networks.
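The abstract does not describe the feature representation or preprocessing behind the Naive Bayes result. The following is only a minimal sketch, assuming a bag-of-words multinomial Naive Bayes with add-one (Laplace) smoothing, a binary label (1 = bullying, 0 = not bullying), and a naive whitespace tokenizer. It also shows how the F1-score used to compare the models is computed, F1 = 2PR / (P + R), from precision P and recall R on the positive class. The class name, the `tokenize` helper, and the toy messages are hypothetical and are not the authors' corpus or implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase + whitespace split. A real pipeline for
    # student messages in Mexican Spanish would need slang/accent normalization.
    return text.lower().split()


class MultinomialNB:
    """Bag-of-words multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        docs_per_class = Counter(labels)
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        n = len(labels)
        # log P(c): fraction of training messages in each class
        self.log_prior = {c: math.log(docs_per_class[c] / n) for c in self.classes}
        # total number of tokens seen in each class
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # log P(token | c) with add-alpha smoothing over the vocabulary
        count = self.word_counts[c][token]
        denom = self.totals[c] + self.alpha * len(self.vocab)
        return math.log((count + self.alpha) / denom)

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


def f1_score(y_true, y_pred, positive=1):
    # Harmonic mean of precision and recall for the positive (bullying) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy messages, not taken from the article's corpus.
    docs = ["eres un idiota", "nadie te quiere idiota",
            "nos vemos en clase", "gracias por la tarea"]
    labels = [1, 1, 0, 0]
    model = MultinomialNB().fit(docs, labels)
    print(model.predict("que idiota eres"))        # 1
    print(model.predict("nos vemos en la clase"))  # 0
    print(f1_score([1, 1, 0, 0], [1, 0, 0, 1]))    # 0.5
```

Working in log space avoids numerical underflow when many word probabilities are multiplied; smoothing keeps a single unseen word-class pair from driving a class probability to zero.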

Author Biographies

Karla Ivette Arce-Ruelas, Facultad de Ingeniería, Arquitectura y Diseño, Universidad Autónoma de Baja California (UABC)

She is a Ph.D. student in the Science and Engineering program at Universidad Autónoma de Baja California, where she also obtained her Bachelor's and Master's degrees in Computer Science. Her research interests focus on using technology to support early childhood education, knowledge models, Natural Language Processing, and Machine Learning.

Omar Alvarez-Xochihua, Universidad Autónoma de Baja California

He received his Ph.D. degree in Computer Science from Texas A&M University, USA. He is currently a Professor of Computer Science at Universidad Autónoma de Baja California, México, where he conducts research in educational technology, knowledge representation, and natural language processing.

Luis Pellegrin, Facultad de Ciencias, Universidad Autónoma de Baja California

He is a full-time professor and researcher of Computer Science in the Faculty of Sciences at the Universidad Autónoma de Baja California (UABC). He earned an M.Sc. degree in Artificial Intelligence from the Universidad Veracruzana in 2008 and a Ph.D. in Computer Science from the Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE) in 2017. His research interests lie in the vision and language area, including image annotation, automatic sentence generation, neural networks, and machine learning focused on the representation, indexing, and automatic analysis of multimodal data.

Liliana Cardoza-Avendaño, Facultad de Ingeniería, Arquitectura y Diseño, Universidad Autónoma de Baja California (UABC)

Received a degree in Electrical Engineering from Universidad Autónoma de Baja California, México, in 2005 and a PhD in Science from the same university in 2012. Member of the National System of Researchers (Level I) since 2014, with publications in 11 JCR-indexed journals and participation in 8 national and international congresses.

José Ángel González-Fraga, Facultad de Ciencias, Universidad Autónoma de Baja California

He received his BSc degree in electrical engineering from Universidad Autónoma de San Luis Potosí (UASLP), México, in 2002, and his MSc and PhD degrees in computer science from Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), México, in 2004 and 2007, respectively. He is currently a full-time professor at Universidad Autónoma de Baja California. His research interests include pattern recognition, adaptive image processing, and robot vision.

Published

2022-01-06

How to Cite

Arce-Ruelas, K. I., Alvarez-Xochihua, O., Pellegrin, L., Cardoza-Avendaño, L., & González-Fraga, J. Ángel. (2022). Automatic Cyberbullying Detection: a Mexican case in High School and Higher Education students. IEEE Latin America Transactions, 20(5), 770–779. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/5934