Social interaction among young students has shifted, partially or totally, to mobile-based communication, particularly through social networks. This new communication environment allows more immediate, diverse, and large-scale interaction, making academic and recreational activities faster and more effective to carry out. However, it has also fostered the form of social harassment known as bullying, sharply increasing its scope and diversifying the types and forms of aggression. Machine learning and natural language processing techniques have been used to build models that detect bullying situations among students, using data corpora drawn mainly from public social networks. In general, however, these data sources are not representative of the social networks that students commonly use, so the resulting classification models do not account for the vocabulary of this social group. This article describes the methodology used to create a representative data corpus of interactions between Mexican high school and university students, together with a comparative analysis of the characteristics that influence the quality of a corpus's content in this domain. In addition, it presents the performance achieved by various machine learning models in identifying bullying situations. The best result is obtained with the Naive Bayes classifier (F1-score of 0.862), which outperforms deep learning models such as Recurrent (F1-score of 0.845) and Convolutional (F1-score of 0.807) Neural Networks.


Is a Ph.D. student in the Science and Engineering program at Universidad Autónoma de Baja California, with Bachelor's and Master's degrees in Computer Science from the same institution. Research interests focus on using technology to support early childhood education, knowledge modeling, natural language processing, and machine learning.

Received his Ph.D. degree in Computer Science from Texas A&M University, USA. He is currently a Professor of Computer Science at Universidad Autónoma de Baja California, México. He conducts research in the areas of educational technology, knowledge representation, and natural language processing.

Is a full-time professor and researcher of Computer Science in the Faculty of Sciences at the Universidad Autónoma de Baja California (UABC). In 2008, he earned an M.Sc. degree in Artificial Intelligence from the Universidad Veracruzana, and in 2017 he received a Ph.D. in Computer Sciences from the Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE). His research interests lie in the vision and language area, including image annotation, automatic generation of sentences, neural networks, and machine learning focused on the representation, indexing, and automatic analysis of multimodal data.

Received an Electrical Engineering degree from Universidad Autónoma de Baja California, México, in 2005 and a PhD in science from the same university in 2012. Has been a Level I member of the National System of Researchers since 2014, with publications in 11 JCR-indexed journals and participation in 8 national and international congresses.

Received his BSc degree in electrical engineering from Universidad Autónoma de San Luis Potosí (UASLP), México, in 2002 and his MSc and PhD degrees in computer science from Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), México, in 2004 and 2007, respectively. He is currently a full-time professor at Universidad Autónoma de Baja California. His research interests include pattern recognition, adaptive image processing, and robot vision.


