Authenticating Google Books with Juxta Commons

What do you get when you collate as many free Google versions of the same text as you can find? Those familiar with Google Books may suggest that you’ll quickly discover rampant OCR errors, or perhaps some truly astounding misinformation in the metadata fields. In my experiment using Juxta Commons to explore the versions of Alfred, Lord Tennyson’s long poem, The Princess, available online, I encountered my fill of both of these issues. But I also discovered a number of interesting textual variations – ones that led me to a deeper study of the poem’s publication history.

In the process of testing the efficacy of the software, I believe I stumbled upon a useful experiment that may prove helpful in the classroom: a new way to introduce students to textual scholarship, to the value of metadata, and to the modes of inquiry made possible by the digital humanities.

Many of the editions of Tennyson’s works offered in Google Books are modern, or modern reprints, and are thus available only in snippet view. Paging through the results, I chose six versions of The Princess that were available in e-book form, and I copied and pasted the text of each into the text editor in Juxta Commons*. Because the poem is relatively long, I chose to focus solely on its Prologue – not only to expedite the process of collation, but to see if one excerpt could give a more global view of changes to the poem across editions. Another important step was to click on the orange “i” button at the upper left of the screen to save the original URL and basic metadata about each object for future reference.

[Screenshot: source info]

This step turned out to be invaluable, once I realized that the publication information offered on the title pages of the scanned documents didn’t always agree with the metadata offered by Google (see this example).

Once the set was complete and collated, I noticed right away that significant passages were missing from the 1863 and 1900 editions of the poem.


Stepping chronologically through the set using the witness visibility feature (the eye icons on the left) showed no apparent timeline for this change (why would it be missing in 1863, present in 1866, 1872, 1875, and excised again in 1900?). The answer could only be found in a robust explanation of the revision and publication history of Tennyson’s work.
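For readers who want to double-check a gap like this outside the Juxta Commons interface, here is a minimal Python sketch using the standard library's difflib. It is not how Juxta itself collates, and the function names (normalize, missing_passages) and the min_lines threshold are my own assumptions for illustration. The idea is simply to compare a base witness with another witness line by line and report runs of lines that the second witness lacks entirely.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase each non-blank line and collapse its whitespace.

    This is a crude, assumed normalization meant to keep trivial
    spacing and capitalization differences from registering as variants.
    """
    return [" ".join(line.lower().split())
            for line in text.splitlines() if line.strip()]


def missing_passages(base_lines, witness_lines, min_lines=2):
    """Return runs of consecutive base lines with no counterpart in the witness.

    min_lines is a hypothetical threshold: shorter gaps are ignored so that
    only passage-sized omissions are reported.
    """
    matcher = SequenceMatcher(a=base_lines, b=witness_lines, autojunk=False)
    gaps = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "delete" opcode marks base lines i1..i2 that the witness omits.
        if tag == "delete" and i2 - i1 >= min_lines:
            gaps.append(base_lines[i1:i2])
    return gaps


if __name__ == "__main__":
    # Placeholder text, not Tennyson's: paste each witness's Prologue here.
    base = normalize("first line\nsecond line\nthird line\nfourth line")
    witness = normalize("first line\nfourth line")
    for gap in missing_passages(base, witness):
        print("Missing from witness:")
        for line in gap:
            print("   ", line)
```

Running this for each witness against the fullest text in the set would give a rough list of which editions lack which passages. Lines that were altered rather than dropped show up under a different opcode ("replace") and are deliberately left out here, so OCR noise inside a passage can split one long gap into several shorter ones.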

Without going too deeply into the reasons behind this set of differences (I’ll refer you to Christopher Ricks’ selected critical edition of Tennyson, if you’re interested), The Princess happens to be one of the most revised long poems of Tennyson’s career. The Prologue was expanded in the 5th edition (published in 1853), and it is that version that is generally considered the standard reading text today. However, as we have seen from the Google Books on offer, editions based on earlier versions of the poem were still being published as late as 1900. Could the fact that both versions missing the stanzas are American editions be important?

I invite Tennyson scholars to help me continue to piece together this puzzle. However, I believe that in this one example we have seen just how powerful Juxta Commons can be for delving into seemingly innocuous editions of one of Tennyson’s poems and exposing a myriad of possible topics of study. Next time you’re wondering just *which* version of a text you’re looking at on Google Books, I hope you’ll consider Juxta Commons a good place to start.

* Please note that Juxta Commons can accept some e-book formats, but those offered by Google Books have image information only, and the text cannot be extracted.
