A Study of Algorithm-Based Detection of Fake News in Brazilian Election: Is BERT the Best?

Lara Souto Moreira; Gabriel Machado Lunardi; Matheus de Oliveira Ribeiro; Williamson Silva; Fabio Paulo Basso

Authors

Lara Souto Moreira Universidade Federal do Pampa (UNIPAMPA) https://orcid.org/0000-0002-2876-4585
Gabriel Machado Lunardi Universidade Federal de Santa Maria (UFSM) https://orcid.org/0000-0001-6655-184X
Matheus de Oliveira Ribeiro Universidade Federal do Pampa (UNIPAMPA) https://orcid.org/0000-0001-6073-2674
Williamson Silva Universidade Federal do Pampa (UNIPAMPA) https://orcid.org/0000-0003-1849-2675
Fabio Paulo Basso Universidade Federal do Pampa (UNIPAMPA) https://orcid.org/0000-0003-4275-0638

Keywords:

Fake News, BERT, Brazil, Natural Language Processing, Machine Learning

Abstract

The recent Brazilian election was plagued by the proliferation of false news on the internet. Many people turned to social media to fact-check information and verify its authenticity. In today's digital and data-driven world, fake news can spread rapidly and cause detrimental effects, such as potentially influencing the outcome of an election. In light of this, verifying information has become increasingly reliant on software. While intelligent software can be used to detect and mitigate the spread of fake news, there is a lack of research on the use of such technology for the Portuguese language, particularly regarding newer strategies such as Bidirectional Encoder Representations from Transformers (BERT). Our study evaluated BERT's ability to detect fake news compared to traditional machine learning algorithms, treating the identification of false news as a text classification task. The results demonstrate BERT's superiority over the other algorithms, with a statistically significant difference in all cases. BERT can therefore be considered a viable option for detecting fake news.
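To make the text classification setup concrete, the following is a minimal sketch of one kind of traditional baseline, a multinomial Naive Bayes classifier over bag-of-words counts. It is an illustration only: the abstract does not name the traditional algorithms that were compared, and the toy headlines, labels, and function names (`tokenize`, `train_naive_bayes`, `predict`) are assumptions made here, not the authors' data or implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data, invented for illustration (not the study's corpus).
TRAIN = [
    ("urgent shocking secret cure revealed share now", "fake"),
    ("shocking secret fraud in voting machines share now", "fake"),
    ("electoral court publishes official results of the vote", "true"),
    ("ministry publishes official report on vaccination", "true"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior under Laplace smoothing."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior of the class.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Add-one smoothed log likelihood of the token given the class.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict(model, "shocking secret revealed share"))
print(predict(model, "court publishes official results"))
```

A fine-tuned BERT model replaces the hand-built word counts with contextual representations learned from large text collections, which is the kind of difference the comparison in the study is meant to measure.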

Author Biographies

Lara Souto Moreira, Universidade Federal do Pampa (UNIPAMPA)

Lara Souto Moreira is currently a Software Engineering student at the Federal University of Pampa (UNIPAMPA), Brazil. Her research interests include Data Analysis, Machine Learning, Databases, and Information Systems.

Gabriel Machado Lunardi, Universidade Federal de Santa Maria (UFSM)

Gabriel Machado Lunardi has a Ph.D. in Computer Science from the Federal University of Rio Grande do Sul (UFRGS), Brazil, and is currently an Adjunct Computer Science Professor at the Federal University of Santa Maria (UFSM), Santa Maria, Brazil. Gabriel has experience in Artificial Intelligence, Natural Language Processing (NLP), Recommender Systems, Machine Learning, and Knowledge Discovery in Databases.

Matheus de Oliveira Ribeiro, Universidade Federal do Pampa (UNIPAMPA)

Matheus de Oliveira Ribeiro has a bachelor's degree in Computer Science from the State University of Paraná. He is currently working towards a Master's degree in Software Engineering at the Federal University of Pampa (UNIPAMPA). His research is focused on User Experience and Machine Learning.

Williamson Silva, Universidade Federal do Pampa (UNIPAMPA)

Williamson Silva received a Ph.D. in Informatics from the Institute of Computing of the Federal University of Amazonas (UFAM). He is currently an Adjunct Professor of Software Engineering at the Federal University of Pampa (UNIPAMPA). His research interests include Software Engineering, Empirical Software Engineering, Software Quality, Computing Education Research, Usability, User Experience, Machine Learning, and Human-Centered Machine Learning.

Fabio Paulo Basso, Universidade Federal do Pampa (UNIPAMPA)

Fábio Paulo Basso is currently an Adjunct Professor at the Federal University of Pampa (UNIPAMPA), Brazil. His research interests include software reuse, software architecture, coopetition business models performed through services, technology transfer of computer science inventions, precision agriculture, and precision animal husbandry.
