From Dickens to data science

Research analysing characters and their networks in literary texts is enabling interaction with the dynamic character networks of all 15 of Dickens' novels.

What if we could browse through all the literary works of an author and quickly get ideas for similarities or differences in the underlying narrative structures? Researchers Dr Markus Luczak-Roesch and Dr Adam Grener are approaching this problem space by applying novel data analytics and network science.

Data-driven analysis has emerged as a growing methodology, if not a sub-discipline, within literary studies. This approach, broadly described as "distant reading", has harnessed available technology to open new avenues for how we understand literary texts, both individually and in the aggregate.

Whereas traditional literary scholarship is generally grounded in the interpretation of the specific language of a text or body of texts, macroanalytic approaches have offered new ways of seeing texts.

The interdisciplinary research project attempts to theorise the relationship between macroanalytic and microanalytic (distant and close) readings of individual works, applying the Transcendental Information Cascades (TIC) approach to understand how emergent structures of information are generated during the unfolding of a text.

The approach treats the text as a diachronically evolving information system and uses TIC to isolate the structural properties of that system. The resulting network thus provides a visualisation of the occurrence of characters and models the information structures they generate.
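As a rough illustration of the idea of a text unfolding into a network of character occurrences, the Python sketch below links each segment of a text to the most recent earlier segment that mentions the same character. It is a simplified reading of the cascade idea and not the project's implementation: the function name `build_cascade`, the naive substring matching and the sample sentences are all assumptions made for this example.

```python
def build_cascade(segments, characters):
    """Link each segment to the most recent earlier segment
    that mentions the same character.

    Returns a list of (earlier_index, later_index, character) edges.
    Matching is a naive case-insensitive substring test (an assumption
    for illustration; real texts would need proper name resolution).
    """
    last_seen = {}  # character -> index of the last segment mentioning it
    edges = []
    for idx, text in enumerate(segments):
        lowered = text.lower()
        for name in characters:
            if name.lower() in lowered:
                if name in last_seen:
                    edges.append((last_seen[name], idx, name))
                last_seen[name] = idx
    return edges


# Hypothetical sample segments, invented for this sketch.
segments = [
    "Pip meets Joe.",
    "Joe works at the forge.",
    "Pip visits Estella.",
    "Estella and Joe never meet.",
]
edges = build_cascade(segments, ["Pip", "Joe", "Estella"])
for earlier, later, name in edges:
    print(f"segment {earlier} -> segment {later} via {name}")
```

Each edge records that a character carries over from one point in the text to a later one, so the edge list grows as the text unfolds, segment by segment.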

The novels of Charles Dickens (1812-1870) are a particularly interesting object of investigation within this field of research. Not only was Dickens a central figure in the development of the nineteenth-century novel, the literary form that has been a primary object of computational analyses, but his novels also construct vast and elaborate character networks as they represent the rapidly changing Victorian world.

Dickens' character networks are important because of their density and the complex social world they represent; the way in which those networks were generated also warrants attention. Dickens was a pioneer of the serial novel form, writing monthly (or weekly) installments of his novels over the course of up to eighteen months.

Thus, his novels not only create character networks in the process of their unfolding, but also dramatise the creation and management of those networks in the very act of composition. They offer the opportunity to analyse both how a novel, taken as a completed aesthetic object, maps character connections and also how those networks are imagined and managed in their production.

The approach has been tested on nineteen novels to this point: all fifteen novels by Charles Dickens, and four by other Victorian novelists for comparative purposes. An initial user study involving humanities scholars and university students of English literature was carried out to evaluate the tool.