A Study of Algorithm-Based Detection of Fake News in Brazilian Election: Is BERT the Best?
Keywords:
Fake News, BERT, Brazil, Natural Language Processing, Machine Learning
Abstract
The recent Brazilian election was plagued by the proliferation of false news on the internet. Many people turned to social media to fact-check information and verify its authenticity. In today's digital, data-driven world, fake news can spread rapidly and cause real harm, including potentially influencing the outcome of an election. As a result, verifying information has become increasingly reliant on software. While intelligent software can be used to detect and mitigate the spread of fake news, there is little research on such technology for the Portuguese language, particularly on newer strategies such as Bidirectional Encoder Representations from Transformers (BERT). Our study evaluated BERT's ability to detect fake news, framed as a text classification task, against traditional machine learning algorithms. The results demonstrate BERT's superiority over the other algorithms, with a statistically significant difference in all cases. BERT can therefore be considered a viable option for detecting fake news.
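The abstract reports a statistically significant difference between BERT and each traditional algorithm but does not name the test used. As a minimal sketch of how such a pairwise comparison could be checked, the Python below runs an exact paired sign-flip permutation test on per-fold accuracies. The test is a stand-in chosen for illustration, not necessarily the study's method. The function name `paired_permutation_test` and the accuracy values are hypothetical placeholders, not results from the study.

```python
from itertools import product
from statistics import mean


def paired_permutation_test(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired scores.

    Under the null hypothesis that both classifiers perform equally,
    the sign of each per-fold difference is arbitrary, so every
    combination of sign flips is equally likely.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be paired (same length)")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))

    extreme = 0
    total = 0
    # Enumerate all 2^n sign assignments (feasible for a small number of folds).
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding.
        if flipped >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical per-fold accuracies, invented for illustration only.
bert_acc = [0.96, 0.95, 0.97, 0.96, 0.95, 0.97, 0.96, 0.96, 0.95, 0.97]
baseline_acc = [0.90, 0.91, 0.89, 0.92, 0.90, 0.91, 0.90, 0.89, 0.91, 0.90]

p_value = paired_permutation_test(bert_acc, baseline_acc)
print(f"two-sided p-value: {p_value:.6f}")
```

With n folds, the smallest p-value this exact test can produce is 2 / 2^n, so a comparison on only a few folds cannot reach conventional significance thresholds regardless of how large the gap is.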