Detection of violent speech against women in Mexican tweets using an active learning approach

Grisel Miranda-Piña; Roberto Alejo; Eréndira Rendón-Lara; Vicente García

Authors

Grisel Miranda-Piña Instituto Tecnológico de Toluca https://orcid.org/0000-0001-7122-0658
Roberto Alejo Tecnológico Nacional de México https://orcid.org/0000-0002-7580-3305
Eréndira Rendón-Lara Instituto Tecnológico de Toluca https://orcid.org/0000-0003-4581-6022
Vicente García Universidad Autónoma de Ciudad Juárez https://orcid.org/0000-0003-2820-2918

Keywords:

Violence against women, MLP, Active learning, Twitter, Mexican Spanish Language, Speech violence detection

Abstract

In Latin American and Caribbean states, verbal violence against women on social networks such as Twitter is a serious threat that has been addressed through social norms, public policies, and social movements. Nevertheless, detecting violent tweets effectively, automatically, and in real time remains a challenge. Traditional machine learning algorithms have been proposed to tackle such social issues, but they are trained in a static manner. Because Twitter is a dynamic environment in which a vast number of tweets are generated each second, it calls for machine learning algorithms that can exploit this pool of unlabeled data and incorporate it into the model through continuous updates. This paper explores an active learning method based on uncertainty sampling, which identifies the most confusing tweets so that an expert can label them in real time. This focused selection prioritizes the data used to train a multilayer perceptron, which can then achieve better performance with fewer training samples. Experimental results show that including the newly labeled samples is promising, increasing the AUC from 0.8712 to 0.8833.
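The selection step of uncertainty sampling can be illustrated with a minimal sketch. The abstract does not state which uncertainty measure the authors use, so the least-confidence score below is an assumption, and the function names, the probability values, and the labeling budget are hypothetical. The sketch only shows how the tweets whose predicted probability lies closest to 0.5 would be chosen for expert labeling; it does not reproduce the authors' multilayer perceptron or their data.

```python
from typing import List, Sequence


def uncertainty(p_violent: float) -> float:
    """Least-confidence score for a binary classifier (assumed measure).

    Returns 0.0 when the model is certain (p = 0 or p = 1) and 0.5 when it
    is maximally unsure (p = 0.5).
    """
    return 1.0 - max(p_violent, 1.0 - p_violent)


def select_for_labeling(probabilities: Sequence[float], budget: int) -> List[int]:
    """Return the indices of the `budget` most uncertain unlabeled tweets."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: uncertainty(probabilities[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:budget]


# Hypothetical predicted probabilities of "violent" for five unlabeled tweets.
pool_probs = [0.97, 0.52, 0.10, 0.45, 0.80]

# Indices of the two tweets an expert would be asked to label next.
queried = select_for_labeling(pool_probs, budget=2)
print(queried)
```

In a full active learning loop, the expert's labels for the queried tweets would be added to the training set and the classifier updated before the next round of selection.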

Author Biographies

Grisel Miranda-Piña, Instituto Tecnológico de Toluca

Grisel Miranda-Piña is a computer systems engineer who graduated from the Tecnológico de Estudios Superiores de Jocotitlán in 2021. She is currently pursuing a Master's degree in Engineering Sciences at the Tecnológico Nacional de México, Toluca Campus. Her research interests include applications of artificial intelligence and artificial neural networks to real problems in a big data context.

Roberto Alejo, Tecnológico Nacional de México

Roberto Alejo holds a doctorate in Advanced Computer Systems from the Universitat Jaume I, Spain. He is currently assigned to the Division of Graduate Studies and Research of the Tecnológico Nacional de México, Toluca Campus. He specializes in artificial neural networks, machine learning, and data mining, with a deep scientific interest in applying artificial intelligence to real problems.

Eréndira Rendón-Lara, Instituto Tecnológico de Toluca

Eréndira Rendón-Lara holds a doctorate in Computer Science from the Toluca Technological Institute. She works as a professor-researcher in the Division of Graduate Studies and Research at the Tecnológico Nacional de México, Toluca Campus. Her main academic interests are data mining and, more recently, materials informatics.

Vicente García, Universidad Autónoma de Ciudad Juárez

Vicente García received his doctorate in Advanced Computer Systems from the Universitat Jaume I, Castellón de la Plana, Spain, in 2010. He is currently a full-time professor in the Department of Electrical and Computer Engineering at the Autonomous University of Ciudad Juárez. His research interests include data preprocessing methods, data complexity, non-parametric classification, performance evaluation, and big data.

