Researchers Keerthana Murugaraj, Salima Lamsiyah, Marten During, and Martin Theobald, affiliated with the University of Duisburg-Essen in Germany, have developed an approach for extracting meaningful topics from large-scale historical newspaper archives. Their work, published in the journal “Digital Scholarship in the Humanities,” aims to show how public discourse changes over time, particularly discourse on nuclear power and safety.
Extracting coherent themes from vast, unstructured historical newspaper archives is a challenging task. Traditional methods like Latent Dirichlet Allocation (LDA) often struggle with the dynamic nature of language and the noise inherent in digitized texts. The researchers employed BERTopic, a neural topic-modeling approach that uses transformer-based embeddings to better capture the complexity and evolution of topics in historical texts.
The study focused on newspaper articles published between 1955 and 2018, specifically examining discourse on nuclear power and nuclear safety. By analyzing topic distributions and their temporal evolution, the researchers identified long-term trends and shifts in public discourse. This approach allowed them to explore patterns such as the co-occurrence of themes related to nuclear power and nuclear weapons and their changing importance over time.
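The kind of analysis described above can be illustrated with a minimal sketch. It assumes each article has already been assigned one or more topic labels (for example by a topic model such as BERTopic) and then computes per-decade topic shares and topic co-occurrence counts. The sample articles, the topic labels, and the function names `topic_share_by_decade` and `topic_cooccurrence` are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical input: (publication year, set of topic labels assigned to the article).
# These records are made up for illustration; they are not data from the study.
articles = [
    (1957, {"nuclear power"}),
    (1962, {"nuclear weapons", "nuclear power"}),
    (1979, {"nuclear safety", "nuclear power"}),
    (1986, {"nuclear safety"}),
    (1986, {"nuclear safety", "nuclear weapons"}),
    (2011, {"nuclear safety", "nuclear power"}),
]


def topic_share_by_decade(articles):
    """Return {decade: {topic: fraction of that decade's articles carrying the topic}}."""
    counts = defaultdict(Counter)
    totals = Counter()
    for year, topics in articles:
        decade = year // 10 * 10
        totals[decade] += 1
        counts[decade].update(topics)
    return {
        decade: {topic: n / totals[decade] for topic, n in topic_counts.items()}
        for decade, topic_counts in sorted(counts.items())
    }


def topic_cooccurrence(articles):
    """Count how many articles carry each unordered pair of topics."""
    pairs = Counter()
    for _, topics in articles:
        # Sort so that each pair is always stored in the same order.
        pairs.update(combinations(sorted(topics), 2))
    return pairs


if __name__ == "__main__":
    for decade, shares in topic_share_by_decade(articles).items():
        print(decade, shares)
    for pair, n in topic_cooccurrence(articles).most_common():
        print(pair, n)
```

Plotting the per-decade shares as a time series would show how the prominence of a theme rises or falls, while the pair counts indicate which themes tend to appear in the same articles.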
BERTopic proved scalable and sensitive to context, yielding richer insights than traditional methods. The study contributes to historical, nuclear, and social-science research and also points to applications of topic modeling in the energy sector. For instance, energy companies and policymakers could use similar techniques to analyze public sentiment and discourse on energy-related topics, aiding strategic planning and communication.
The researchers also acknowledged the current limitations of their approach and proposed directions for future work. Their findings underscore the value of advanced topic modeling for uncovering historical insights that can inform current and future energy policies and public engagement strategies.
Source: Murugaraj, K., Lamsiyah, S., During, M., & Theobald, M. (2023). Automating Historical Insight Extraction from Large-Scale Newspaper Archives via Neural Topic Modeling. Digital Scholarship in the Humanities.
This article is based on research available at arXiv.

