Investigating the influence of groups of variables on the task of predicting the age of an author in blog posts

Authors

  • Rosalvo Oliveira Neto
  • Rodrigo Ribeiro Oliveira (Univasf)
  • Ana Emília de Melo Queiroz (Univasf)

Keywords:

Author Profiling, Age Identification, Important Features, Data Mining, Text Mining

Abstract

Identifying the profile of users from the texts they publish on the Internet is a relevant task in today’s society. This activity is known in the literature as Author Profiling. Age is among the essential characteristics to be inferred in this task. It is paramount, for example, for identifying potential sexual predators in environments aimed at children. However, one of the difficulties in solving this problem is deciding which variables should be taken into account. Thus, this article aims to identify which variables are relevant when building a data mining solution to infer a user’s age from a text on the Internet. To validate this work, an experimental study was carried out on a database from a prestigious international competition, considered a benchmark in the area. The results showed that the groups of variables that can be constructed to solve this problem differ from one another, and they justify the importance of each variable group for this purpose. The main contribution of this study was to find different degrees of relevance among groups of variables previously mentioned in the literature.

Published

2020-04-24

How to Cite

Oliveira Neto, R., Oliveira, R. R., & Queiroz, A. E. de M. (2020). Investigating the influence of groups of variables on the task of predicting the age of an author in blog posts. IEEE Latin America Transactions, 18(5), 838–844. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/1693