Illustrating Classic Brazilian Books using a Text-To-Image Diffusion Model

Authors

Felipe Rodrigues Perche Mahlow, André Felipe Zanella, William Alberto Cruz Castañeda, Regilene Aparecida Sarzi-Ribeiro

Keywords:

Image Generation, Diffusion Models, Text-to-Image, Illustration

Abstract

In recent years, Generative Artificial Intelligence (GenAI) has made profound advances on intricate tasks across modalities such as text, audio, and image generation. Within this spectrum, text-to-image (TTI) models have emerged as a powerful approach to generating varied and aesthetically appealing compositions, with applications ranging from artistic creation to realistic facial synthesis, and have driven significant advances in computer vision, image processing, and multimodal tasks. The advent of Latent Diffusion Models (LDMs) marks a shift in these capabilities. This article examines the feasibility of employing the Stable Diffusion LDM to illustrate literary works, taking seven classic Brazilian books as case studies. The objective is to assess the practicality of this endeavor and to evaluate the potential of Stable Diffusion to produce illustrations that augment and enrich the reader's experience. We outline the benefits, such as the capacity to generate distinctive and contextually pertinent images, as well as the drawbacks, including shortcomings in faithfully capturing the essence of intricate literary descriptions. Through this study, we aim to provide a comprehensive assessment of the viability and efficacy of AI-generated illustrations in literary contexts, elucidating both the prospects and challenges encountered in this pioneering application of the technology.
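
As a rough illustration of the kind of pipeline described above, the sketch below generates a single illustration from a textual scene description with Stable Diffusion through the Hugging Face diffusers library; the checkpoint identifier, prompt, and sampling parameters are illustrative assumptions rather than the exact configuration used in the study.

    import torch
    from diffusers import StableDiffusionPipeline

    # Assumed publicly available Stable Diffusion checkpoint; the study's
    # exact model and settings may differ.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Hypothetical prompt built from a scene description in the book being illustrated.
    prompt = (
        "Oil-painting illustration of a 19th-century Rio de Janeiro street, "
        "a melancholic narrator watching passers-by, warm late-afternoon light"
    )

    # Run the diffusion sampler and save the first generated image.
    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
    image.save("illustration.png")

In practice, the prompt would be varied per scene or chapter, and parameters such as the guidance scale adjusted to balance fidelity to the text against visual quality.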

Author Biographies

Felipe Rodrigues Perche Mahlow, São Paulo State University, Bauru, São Paulo, Brazil

Felipe Rodrigues Perche Mahlow received a teaching degree in Physics and a bachelor's degree in Materials Physics from São Paulo State University (Unesp), in 2020 and 2023, respectively. He is currently pursuing a Ph.D. in Computer Science at Unesp, with a focus on classical and quantum Machine Learning and their applications to Quantum Information Science. His research encompasses Artificial Intelligence as well as Quantum Computing.

André Felipe Zanella, State University of Maringá, Maringá, Paraná, Brazil

André Felipe Zanella holds a Bachelor's degree in Mathematics (2022) and a Master's degree in Computer Science (2022), both from the State University of Maringá. He is currently pursuing a Ph.D. in Computer Science at the same institution (2024). His expertise lies in Computer Science, with a focus on Computer Systems and Machine Learning, and his research interests include optimization, Machine Learning, and TTI Diffusion Models.

William Alberto Cruz Castañeda, Federal Technological University of Paraná, Guarapuava, Paraná, Brazil

William Alberto Cruz Castañeda holds a Bachelor’s degree in Computer Science from the Benemérita Universidad Autónoma de Puebla and a Bachelor’s degree in Computer Engineering from the Federal University of Rio Grande do Sul. He holds Master’s and Ph.D. degrees in Electrical Engineering, with a focus on Biomedical Engineering, from the Federal University of Santa Catarina, and completed postdoctoral research in AI and Biomedical Engineering at the State University of Santa Catarina. He is a professor at the Federal Technological University of Paraná. His research interests include Ubiquitous Computing, Machine Learning, and GenAI.

Regilene Aparecida Sarzi-Ribeiro, São Paulo State University, Bauru, São Paulo, Brazil

Regilene Aparecida Sarzi-Ribeiro holds Postdoctoral degrees in Poetics and Cultures in Digital Humanities from UFG (2022) and in Arts from UNESP/SP (2013). She is part of the Chair of Design, Art, and Science at Media Lab BR and coordinates the Media Lab UNESP. She is a research member of the Red de Investigación de la Imagen at the University of Málaga, Spain. She holds a Ph.D. in Communication and Semiotics from PUC/SP (2012) and a Master's in Arts from UNESP/SP (2007), and graduated in Art Education with a specialization in Visual Arts from FAAC - UNESP, Bauru/SP (1994). She is currently a faculty member at UNESP/Bauru, where she coordinates the Graduate Program in Media and Technology and leads the labIMAGEM Research Group.

Published

2024-11-14

How to Cite

Rodrigues Perche Mahlow, F., Zanella, A. F., Cruz Castañeda, W. A., & Aparecida Sarzi-Ribeiro, R. (2024). Illustrating Classic Brazilian Books using a Text-To-Image Diffusion Model. IEEE Latin America Transactions, 22(12), 1000–1008. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/9172