fake-news-pt-eu

This project was developed to contribute to the fight against fake news, with a stronger focus on the European Portuguese language. If you find this repository useful, please cite it in your work, alongside our paper.

Based on the search we conducted, this is the first publicly available dataset of fake and real news in European Portuguese.

The diagram below depicts the project's pipeline, covering both the English and the European Portuguese approaches:

European Portuguese Dataset

The dataset has over 60 000 rows of news articles and statements collected through web scraping.

It comprises four columns: Text (news title and body merged), Label (0 for fake, 1 for real), Source and URL.

The Source column was added because many fake news websites promote articles from other fake news websites, so not every article found on a given website originates from it.

Each fact-check also records the source of the statement being checked, which ranges from individuals such as politicians or celebrities to social media as a whole.
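The four columns described above can be read with Python's standard `csv` module. The snippet below is a minimal sketch, not code from this repository: the exact header spelling in the CSV file is assumed to match the column names given here, and the two sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Column names as described in this README; the exact header spelling
# inside the CSV file is an assumption.
COLUMNS = ["Text", "Label", "Source", "URL"]
LABEL_NAMES = {0: "fake", 1: "real"}


def load_rows(fileobj):
    """Read rows from a CSV file object and convert Label to an int."""
    rows = []
    for row in csv.DictReader(fileobj):
        row["Label"] = int(row["Label"])
        rows.append(row)
    return rows


def label_counts(rows):
    """Count how many rows are labelled fake and how many real."""
    return Counter(LABEL_NAMES[row["Label"]] for row in rows)


# Two made-up rows standing in for the real file.
sample = io.StringIO(
    "Text,Label,Source,URL\n"
    '"Example headline. Example body.",0,example-site,https://example.com/a\n'
    '"Another headline. Another body.",1,example-paper,https://example.com/b\n'
)
rows = load_rows(sample)
counts = label_counts(rows)
print(counts["fake"], counts["real"])
```

To run this against the real data, pass an open file instead of the in-memory sample, for example `open("Final_dataset_portuguese.csv", newline="", encoding="utf-8")`. The UTF-8 encoding is an assumption.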

The Web Scrapers used to gather the data are also available, alongside many Python notebooks with different classification models and techniques.

Best Machine Learning and Deep Learning models

The best model for the English approach was BERT (0.96 F1-score), trained on tokenized text. The best model for the European Portuguese approach was XGBoost (0.957 F1-score), trained on pre-processed text (lemmatization and stopword removal) combined with Sentiment Analysis, POS tagging and TF-IDF features.
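To illustrate what stopword removal followed by TF-IDF produces, here is a small pure-Python sketch. It is not the pipeline used in the notebooks: the stopword list is a tiny hypothetical sample, lemmatization is omitted, and the weighting (term count divided by document length, times ln(N / document frequency)) is only one of several common TF-IDF variants.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (hypothetical); a real pipeline would use
# a full Portuguese list and a lemmatizer.
STOPWORDS = {"a", "o", "de", "que", "e", "é"}


def preprocess(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok and tok not in STOPWORDS]


def tfidf(documents):
    """Return one {term: weight} dict per document.

    tf = count / document length, idf = ln(N / document frequency).
    """
    tokenized = [preprocess(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "O governo anunciou a medida.",
    "O governo desmentiu a notícia falsa.",
]
vectors = tfidf(docs)
for vector in vectors:
    print(vector)
```

A term that appears in every document, such as "governo" here, gets a weight of zero, while terms unique to one document get the highest weights. Vectors like these can then be fed to a classifier alongside other features.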

The distilled version of the English BERT model is available here.

The European Portuguese XGBoost model is available here. However, it could not be deployed, because the Free Tier AWS EC2 instance used in the project has only 1 GB of RAM. To work around this, another distilBERT model was trained, this time on the European Portuguese data, reaching an F1-score of 0.92. That model is available here.
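For readers unfamiliar with the metric, the F1-score quoted for each model is the harmonic mean of precision and recall. The sketch below computes it for a single positive class on made-up labels. Which class the project treated as positive, and whether its scores were averaged across classes, is not stated here, so the `positive=1` default is an assumption.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute the F1-score for one class from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels using this README's encoding: 0 = fake, 1 = real.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)
print(round(score, 3))
```

In this example there are 3 true positives, 1 false positive and 1 false negative, so precision and recall are both 0.75 and the F1-score is 0.75 as well.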

Applications Development and Deployment

To put the ML and DL models into action, the following system was developed:

A Chrome extension and an Android application communicate with a Flask app running in a Docker container on an AWS EC2 instance. Through POST and GET requests, users can check whether a given text is real or fake.
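As a rough idea of what a client call to the Flask app could look like, the sketch below builds a JSON POST request with Python's standard library. The address, the `/predict` route and the `text` and `language` fields are all hypothetical placeholders. The real route names and payload format are defined in the Flask app itself and should be checked there.

```python
import json
import urllib.request

# Hypothetical values: replace with the real instance address and route.
BASE_URL = "http://203.0.113.10:5000"
PREDICT_ROUTE = "/predict"


def build_check_request(text, language="pt"):
    """Build (but do not send) a JSON POST request asking for a prediction."""
    payload = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + PREDICT_ROUTE,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_check_request("Texto de exemplo para verificar.")
print(request.get_method(), request.full_url)

# Sending the request would look like this (requires a running server):
# with urllib.request.urlopen(request, timeout=10) as response:
#     result = json.load(response)
```

Building the request separately from sending it makes the payload easy to inspect before pointing the client at a live instance.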

Users can also report fake or real news articles, which are then processed by a script running on a local computer with a dedicated Graphics Processing Unit (GPU).

The models are fine-tuned with the feedback data and then sent to the cloud instance through Secure Shell (SSH) and Secure File Transfer Protocol (SFTP) commands. A POST request allows the Flask app to replace the old models with the improved ones.
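One way a server could swap in a newly uploaded model file without ever exposing a half-written file is to write to a temporary file and rename it over the old one. This is a sketch of that general technique under assumed names (`install_model`, `model.bin`). It is not taken from the repository's Flask app, which may handle the replacement differently.

```python
import os
import tempfile


def install_model(new_model_bytes, model_path):
    """Replace the file at model_path with new_model_bytes in one step."""
    directory = os.path.dirname(os.path.abspath(model_path))
    # Write to a temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(new_model_bytes)
        # os.replace overwrites the destination if it already exists.
        os.replace(tmp_path, model_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Demonstration with placeholder bytes instead of real model weights.
with tempfile.TemporaryDirectory() as workdir:
    model_path = os.path.join(workdir, "model.bin")
    with open(model_path, "wb") as f:
        f.write(b"old weights")
    install_model(b"new weights", model_path)
    with open(model_path, "rb") as f:
        installed = f.read()
    leftovers = [name for name in os.listdir(workdir) if name.endswith(".tmp")]

print(installed)
```

After the file is replaced, the app would still need to reload the model into memory, which depends on the framework used to load it.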

The Chrome extension is available here.

Get the project running

To deploy the Flask app on the cloud and try it out, I recommend following this tutorial. A few changes will be required, naturally.

Once the cloud instance is working as intended, the system is ready to use after following these steps:

Move the English and European Portuguese datasets to the "Flask Cloud and Local RESTful Script" folder.
Download the distilBERT models from here and here and move them to the "Flask Cloud and Local RESTful Script" folder.
Update the cloud instance IP address in the "data_fetch_websocket.py" script, the Chrome extension and the Android app.
