Development of a Smartphone Application and Chrome Extension to Detect Fake News in English and European Portuguese
Keywords:
Machine Learning, Deep Learning, Web Scraping, Natural Language Processing, Extreme Gradient Boosting
Abstract
In a digital society, truthful information is crucial to education, security, and progress. Fake news is a serious threat in that regard. Although the fight against fake news is ongoing, it remains a multifaceted challenge that keeps changing as the threat renews itself. In our approach, we therefore developed several machine learning and deep learning models to detect fake content that appears online. The classification models run on a cloud server and are exposed through web services, so they can be reached from users' devices through browser extensions and smartphone applications. The models detect fake news in English and European Portuguese, with a stronger focus on the latter, given how few projects address this field in that language. We also built, through web scraping, the first public dataset for fake news detection in European Portuguese, and the resulting models outperform previous work while being trained on substantially more data from a wider variety of sources.
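The abstract describes an architecture in which trained classifiers live on a server and client apps send text to them through web services, with separate handling for English and European Portuguese. The following is a minimal sketch of that contract, under explicit assumptions. It swaps in a multinomial Naive Bayes classifier, which is not the XGBoost or deep learning models the paper names, so that the example stays dependency-free. It keeps one model per language and uses a hypothetical JSON payload with `text` and `language` fields. The names `tokenize`, `NaiveBayesTextClassifier`, and `handle_request`, along with the toy training sentences, are illustrative only.

```python
# Illustrative sketch: Naive Bayes stands in for the paper's actual models,
# and the JSON request/response format is hypothetical.
import json
import math
import re
from collections import Counter

# \w is Unicode-aware for str patterns, so accented Portuguese letters (ç, ã, é) stay inside tokens.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over word counts with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the smoothed log likelihood of every known token.
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def handle_request(body, models):
    """Hypothetical service handler: expects JSON like {"text": "...", "language": "pt"}."""
    payload = json.loads(body)
    model = models.get(payload.get("language", "en"))
    if model is None:
        return json.dumps({"error": "unsupported language"})
    return json.dumps({"label": model.predict(payload["text"])})


# Toy training sentences for illustration only; a real system would train on a scraped corpus.
models = {
    "en": NaiveBayesTextClassifier().fit(
        ["shocking miracle cure doctors hate", "parliament approves annual budget"],
        ["fake", "real"],
    ),
    "pt": NaiveBayesTextClassifier().fit(
        ["cura milagrosa chocante que os médicos escondem", "parlamento aprova orçamento anual"],
        ["fake", "real"],
    ),
}

print(handle_request('{"text": "miracle cure", "language": "en"}', models))  # {"label": "fake"}
print(handle_request('{"text": "orçamento do parlamento", "language": "pt"}', models))  # {"label": "real"}
```

In a deployment like the one described, a handler of this kind would presumably sit behind an HTTP endpoint on the cloud server. The browser extension or smartphone app would then send it the text of the article being read. Keeping a separate model per language mirrors the abstract's split between English and European Portuguese, because vocabulary learned from one language carries little signal for the other.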