Towards the Categorization of Brazilian Financial Market Headlines



Machine Learning, Text Categorization, Financial Market


Financial market news portals are valuable sources of information because they strongly influence investors' decisions. Given the vast amount of text these portals produce, several studies have sought to understand how such texts vary and to automate the categorization of short texts. However, extracting the information that influences investors' decisions is not trivial: each portal uses its own heterogeneous, domain-specific language, which makes it hard to define a standard document format. This work proposes GOOSE, a solution for the cateGOrizatiOn of Short texts derived from multiple sources of information, to portray the current situation of the financial market. GOOSE combines a Bidirectional Long Short-Term Memory (Bi-LSTM) network with GloVe embeddings to increase the reliability of short-text classification. It collects data from news portals, encodes the texts with the word embedding mechanism, and feeds the result to the Bi-LSTM, which classifies the financial market news texts. The results show that GOOSE categorizes texts with an accuracy of 84% and demonstrate the feasibility of using it to extract information from financial market news portals.
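The pipeline described in the abstract (headline, word embeddings, Bi-LSTM, category) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy vocabulary, the random vectors standing in for pretrained GloVe embeddings, the dimensions, the placeholder class names, and every function name below are assumptions made for illustration, and the weights are untrained, so the printed probabilities carry no meaning.

```python
import math
import random

EMBED_DIM = 4   # assumed toy size; pretrained GloVe vectors are much larger
HIDDEN = 3      # assumed hidden size per direction
CATEGORIES = ["class_0", "class_1", "class_2"]  # placeholder labels, not from the paper


def make_embeddings(vocab, dim, rng):
    # Random vectors standing in for pretrained GloVe embeddings (assumption).
    return {word: [rng.uniform(-1, 1) for _ in range(dim)] for word in vocab}


def embed(headline, table, dim):
    # Whitespace tokenization; unknown words map to a zero vector.
    return [table.get(tok, [0.0] * dim) for tok in headline.lower().split()]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_dim, hidden, rng):
    # One (weights, bias) pair per gate: input, forget, output, candidate.
    return {
        gate: ([[rng.uniform(-0.5, 0.5) for _ in range(input_dim + hidden)]
                for _ in range(hidden)], [0.0] * hidden)
        for gate in "ifog"
    }


def _gate(params, name, z, act):
    weights, bias = params[name]
    return [act(sum(w * v for w, v in zip(row, z)) + b)
            for row, b in zip(weights, bias)]


def lstm_run(seq, params, hidden):
    # Standard LSTM recurrence; returns the final hidden state.
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        z = x + h  # concatenate the input vector and the previous hidden state
        i = _gate(params, "i", z, sigmoid)
        f = _gate(params, "f", z, sigmoid)
        o = _gate(params, "o", z, sigmoid)
        g = _gate(params, "g", z, math.tanh)
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


def bilstm_encode(seq, fwd, bwd, hidden):
    # Bidirectional: one pass left-to-right, one right-to-left, concatenated.
    return lstm_run(seq, fwd, hidden) + lstm_run(list(reversed(seq)), bwd, hidden)


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(headline, model):
    seq = embed(headline, model["emb"], EMBED_DIM)
    feats = bilstm_encode(seq, model["fwd"], model["bwd"], HIDDEN)
    scores = [sum(w * v for w, v in zip(row, feats)) for row in model["out"]]
    return dict(zip(CATEGORIES, softmax(scores)))


rng = random.Random(0)
vocab = ["ibovespa", "fecha", "em", "alta", "queda", "dolar"]  # hypothetical vocabulary
model = {
    "emb": make_embeddings(vocab, EMBED_DIM, rng),
    "fwd": init_lstm(EMBED_DIM, HIDDEN, rng),
    "bwd": init_lstm(EMBED_DIM, HIDDEN, rng),
    "out": [[rng.uniform(-0.5, 0.5) for _ in range(2 * HIDDEN)] for _ in CATEGORIES],
}
probs = classify("Ibovespa fecha em alta", model)
print(probs)
```

In practice the embedding table would be loaded from pretrained GloVe vectors and the recurrent and output weights would be learned from labeled headlines, typically with a deep learning framework rather than hand-written loops.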



Author Biographies

Matheus Schmitz, Universidade de Brasília (UnB)

He holds a degree in Computer Engineering (2019) from the Universidade de Brasília (UnB), Brazil. He is currently a master's student in the Programa de Pós-Graduação em Informática (PPGI), which is linked to the Departamento de Ciência da Computação (CiC) at UnB. His research interests are Machine Learning and Natural Language Processing.

Roger Immich, Universidade Federal do Rio Grande do Norte (UFRN)

He is a Professor at the Instituto Metrópole Digital (IMD) of the Universidade Federal do Rio Grande do Norte (UFRN). He received his Ph.D. in Informatics Engineering from the University of Coimbra, Portugal (2017). He was a visiting researcher at the University of California, Los Angeles (UCLA), United States, in 2017, and completed a postdoctoral fellowship at the Institute of Computing of the University of Campinas (UNICAMP) in 2019. His research interests include Smart Cities, IoT, Quality of Experience, and Cloud and Fog computing.

Gustavo Pessin, Instituto Tecnológico Vale, Mineração.

He is a Researcher at the Instituto Tecnológico Vale, Mineração. Pessin received his Ph.D. in Computer Science from the Universidade de São Paulo, as a member of the Mobile Robotics Lab. During his doctorate, he carried out research at the Robotics Lab at Heriot-Watt University, Edinburgh, United Kingdom, and in the Communication and Distributed Systems Group at Universität Bern, Switzerland. In 2015, Pessin was a Visiting Scholar at the Media Lab of the Massachusetts Institute of Technology. His research relates to autonomous mobile robotics, machine learning applications, data analytics, and industrial IoT.

Geraldo Pereira Rocha Filho, Universidade de Brasília (UnB)

He is an Adjunct Professor in the Departamento de Ciência da Computação at the Universidade de Brasília. He was a postdoctoral researcher at the Institute of Computing of UNICAMP in 2018. He received his Ph.D. and M.Sc. in Computer Science and Computational Mathematics from ICMC-USP in 2018 and 2014, respectively. His research interests are wireless sensor networks, vehicular networks, intelligent networks, smart cities, and machine learning.





How to Cite

Schmitz, M., Immich, R., Pessin, G., & Pereira Rocha Filho, G. (2021). Towards the Categorization of Brazilian Financial Market Headlines. IEEE Latin America Transactions, 20(2), 344–351. Retrieved from