About & FAQ

About this project

This site is a contribution from the University of Virginia Press's Rotunda Digital Imprint to the 250th Anniversary of the Declaration of Independence. It is intended as a companion to the other forthcoming sites being created by the University of Virginia's associated documentary editing projects funded by the NEH. This site will be updated and corrected over time, with features added. Because it will keep evolving, it launches in beta on July 4, 2026.

Where does the data come from and how complete is it?

The metadata behind the visualizations on this site is pulled from 12 published editions of documentary history, which are collections of letters selected, transcribed, and annotated by historians. These editions are digitized on the University of Virginia Press's Rotunda platform, and some were subsequently added to Founders Online. While many of these published editions are authoritative, aiming to collect every letter from their represented figure, others are more curated or were published before new letters were discovered. There are also about five thousand documents here in "Early Access" form, which have not yet been edited and published by the active documentary projects, so they lack annotations or normalized names.

Below is a list of the collections represented, ordered by their share of the dataset, with content dated from Jan 1, 1771 through Dec 31, 1783. We chose 1771 as the starting year to provide a quiet baseline that contrasts with the increasing activity as the war progressed. Note that this site skews overwhelmingly towards the Papers of George Washington, which has the most records from this period in our collection.

The Papers of George Washington - (University of Virginia Press) 20,992 documents
The Papers of Benjamin Franklin - (Yale University Press) 9,518 documents
The Adams Family Papers - (Massachusetts Historical Society) 4,744 documents
The Papers of Thomas Jefferson - (Princeton University Press and Monticello) 2,542 documents
The Papers of James Madison - (University of Virginia Press) 831 documents
The Selected Papers of John Jay - (The Trustees of Columbia University in the City of New York) 765 documents
The Papers of Alexander Hamilton - (Columbia University Press) 742 documents
The Papers of Eliza Lucas Pinckney and Harriott Pinckney Horry: Digital Edition - (University of Virginia Press, Rotunda Digital Imprint) 219 documents
The Papers of the Revolutionary Era Pinckney Statesmen Digital Edition - (University of Virginia Press, Rotunda Digital Imprint) 174 documents
The Letters of Benjamin Rush - (American Philosophical Society) 139 documents
The Documentary History of the Ratification of the Constitution - (The State Historical Society of Wisconsin) 13 documents
John Marshall - (The University of North Carolina Press) 10 documents

How was the data prepared?

As part of the digitization process, Rotunda's editors extract or assign metadata from the documents to facilitate search and discovery on our websites. These fields are typically documentID, author(s), recipient(s), and date. We pulled this existing metadata for documents from the period represented on this site into a CSV file, and then investigated the feasibility of assigning normalized location and person metadata where applicable. All of the content on Rotunda and Founders Online is digitized into TEI-compliant XML files, which allow for the semantic markup of content in a letter. In other words, they allow us to identify and extract individual sections, such as the salutation, dateline, body, and closing.
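
As a rough illustration of what extracting sections from TEI can look like, here is a minimal Python sketch. It is not the script used for this site. The sample letter, the element names looked up (dateline, salute, closer), and the function names are assumptions chosen for the example; real files may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT a real document from the collections.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><div type="letter">
<opener><dateline>Philada 21st March, 1782</dateline><salute>Dear Sir</salute></opener>
<p>Body text.</p>
<closer>Your most obedient servant</closer>
</div></body></text></TEI>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_sections(tei_xml, wanted=("dateline", "salute", "closer")):
    """Return the text of the first occurrence of each wanted element."""
    root = ET.fromstring(tei_xml)
    sections = {}
    for element in root.iter():
        name = local_name(element.tag)
        if name in wanted and name not in sections:
            # Join nested text nodes and collapse runs of whitespace.
            sections[name] = " ".join("".join(element.itertext()).split())
    return sections


if __name__ == "__main__":
    print(extract_sections(SAMPLE))
```

Matching on the local tag name keeps the sketch working whether or not a file declares the TEI namespace.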

Location Data

To begin this process, we pulled the dateline elements from all documents and added them as a column in our starting dataset. The datelines in letters generally include the date and, ideally, where a letter was written. Unhelpfully, authors often abbreviated when writing these, and spelling was not yet standardized and was often phonetic, so extracting the locations required a significant amount of processing. Examples include: "Philada 21st March, 1782", "Philadelpa 6th Feb. 1782" or even "Philadelphie, Le Juno cet 4 1782". Additionally problematic, this dataset has many documents that are simply labeled "headquarters" (or "head quarters", "hqs", "Hd Qtrs", and so on). To tackle this, we first used regular expressions to create a new column with all calendar words in English, French, and Spanish removed, leaving the most likely hints at locations. Then began the months-long process of going through each of the document rows and seeing how many locations could be identified or inferred with reasonable accuracy. Thankfully, sorting by author and date often made clustering easier, and for many ambiguous documents the editorial notes were very helpful. To address the multitude of "camp" or "headquarters" datelines, we ran a script against published camp locations with dates, from sources such as the Washington Papers and Wikipedia, to extract possible locations for documents written by George Washington in these date windows. Nevertheless, we were not able to assign locations for every document, and we left the field blank for any that were missing or unclear.
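
The two automated steps described above (removing calendar words, and looking up camps by date window) can be sketched in Python roughly as follows. This is a simplified illustration, not the scripts we ran. The word list is deliberately short, the function names are invented for the example, and the camp table holds a single illustrative entry rather than the published lists.

```python
import re
from datetime import date

# Month names and abbreviations in English, French, and Spanish.
# Illustrative only; period spellings would need a much longer list.
CALENDAR_WORDS = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
    "jan", "feb", "mar", "apr", "jun", "jul", "aug", "sept", "sep",
    "oct", "nov", "dec",
    "janvier", "février", "mars", "avril", "mai", "juin", "juillet",
    "août", "septembre", "octobre", "novembre", "décembre",
    "enero", "febrero", "marzo", "abril", "mayo", "junio", "julio",
    "agosto", "septiembre", "octubre", "noviembre", "diciembre",
]
CALENDAR_RE = re.compile(r"\b(?:" + "|".join(CALENDAR_WORDS) + r")\b", re.IGNORECASE)
# Day numbers and years, with optional ordinal suffixes ("21st", "22d", "1er").
NUMBER_RE = re.compile(r"\b\d{1,4}(?:st|nd|rd|th|d|er|e)?\b", re.IGNORECASE)
# A few spellings of "headquarters" seen in datelines.
HQ_RE = re.compile(r"\b(?:head\s?-?\s?quarters?|hd\.?\s?qt?rs?|hqs?)\b", re.IGNORECASE)

# (start, end, place) windows. One illustrative entry only.
CAMP_WINDOWS = [
    (date(1777, 12, 19), date(1778, 6, 19), "Valley Forge, Pennsylvania"),
]


def strip_calendar_words(dateline):
    """Remove dates from a dateline, leaving the likely location hint."""
    text = CALENDAR_RE.sub(" ", dateline)
    text = NUMBER_RE.sub(" ", text)
    # Collapse leftover punctuation and whitespace into single spaces.
    return re.sub(r"[\s,.;:]+", " ", text).strip()


def is_headquarters(dateline):
    """True when the dateline only says some form of 'headquarters'."""
    return bool(HQ_RE.search(dateline))


def candidate_camps(written_on):
    """Return camp locations whose date window contains the given date."""
    return [place for start, end, place in CAMP_WINDOWS if start <= written_on <= end]


if __name__ == "__main__":
    for raw in ["Philada 21st March, 1782", "Philadelpa 6th Feb. 1782"]:
        print(repr(raw), "->", repr(strip_calendar_words(raw)))
    print(candidate_camps(date(1778, 2, 1)))
```

Note that the output of the first step is still a raw hint such as "Philada". Turning that into a normalized place name was the part that needed human review.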

In assigning locations, the ultimate goal was to map them to an authority like GeoNames, so we could pull GPS coordinates and hierarchical data. Thus our inferred location field included the city or town, county, and state. In each case we tried to use the modern equivalent, which is more likely to match GeoNames' database, as there were many cases where town names changed or their borders shifted across counties and states. A new script was then run against this database, using its API, and returned a geonameid, latitude, longitude, country code, and two levels of administrative hierarchy (often county and state), each of which was ultimately used to power the maps on our site.
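
For readers curious what such a lookup might look like, here is a minimal Python sketch. It assumes the GeoNames searchJSON web service and its usual response fields (geonameId, lat, lng, countryCode, adminName1, adminName2); the text above does not say which endpoint or parameters our script used, so treat those as assumptions. The function names and the sample payload are invented for the example, and the payload values are illustrative rather than retrieved from the service.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the GeoNames full-text search service.
SEARCH_URL = "http://api.geonames.org/searchJSON"


def build_query(place, username):
    """Build a search URL asking for the single best match."""
    params = {"q": place, "maxRows": 1, "style": "FULL", "username": username}
    return SEARCH_URL + "?" + urlencode(params)


def parse_first_hit(payload):
    """Pick out the fields used for mapping from a search response."""
    hits = payload.get("geonames", [])
    if not hits:
        return None  # leave the location blank rather than guess
    hit = hits[0]
    return {
        "geonameid": hit.get("geonameId"),
        "latitude": float(hit["lat"]),
        "longitude": float(hit["lng"]),
        "country_code": hit.get("countryCode"),
        "admin1": hit.get("adminName1"),  # often the state
        "admin2": hit.get("adminName2"),  # often the county
    }


def geocode(place, username):
    """Query the service (needs network access and a GeoNames account name)."""
    with urlopen(build_query(place, username), timeout=30) as response:
        return parse_first_hit(json.load(response))


# Hand-written payload in the assumed response shape, for offline testing.
SAMPLE_PAYLOAD = {
    "geonames": [{
        "geonameId": 4560349, "lat": "39.95", "lng": "-75.16",
        "countryCode": "US", "adminName1": "Pennsylvania",
        "adminName2": "Philadelphia County",
    }]
}

if __name__ == "__main__":
    print(build_query("Philadelphia, Pennsylvania", "your_username"))
    print(parse_first_hit(SAMPLE_PAYLOAD))
```

Keeping the parsing separate from the network call makes it possible to check the field handling without contacting the service.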

People

One of the most important tasks needed to make this site reliably illustrate connections between people was to attach each person to their own id, so the graph could match on an id rather than the text of a name, and would thus be able to distinguish between different people with the same name (very common in this period). Our options were to create our own ids or use an established authority, such as VIAF or Wikidata. Despite a few attempts, mapping people's names and birth/death dates to these large authorities was not reliable and produced too many false positives. Luckily, Rotunda has a publication focused on this time period that attempted to tackle this very problem: People of the Founding Era (PFE). This project (2008-2016) used a team of editors and students to identify and disambiguate individuals writing and mentioned in documentary collections during this time period, so we could far more accurately connect our dataset's people to their ids and compiled information.

That said, the first pass of auto-reconciliation was not wholly successful, and still mapped some individuals to the wrong person. For example, it matched one Thomas Jefferson (255793) rather than his more famous (and relevant) namesake, Thomas Jefferson (53967). So, as with the locations, this process required months of extensive human review of the documents and database. AI (Claude Opus 4.5-4.8) was used at this stage, quite effectively, to generate a succession of spreadsheet reports, providing subsets of problematic patterns for humans to check, such as PFE ids pointing to multiple people with different name values. These reports highlighted rows with possible issues and included links to documents and decision columns, and the reviewed decisions were then merged back into the dataset. When a person could not be reliably connected with a person in PFE, we created a unique Rotunda ID instead, such as ROT-PER-002, so that the person could still be used and matched in the database powering this site.
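
One of the report patterns mentioned above, ids that point to more than one name value, can be sketched in a few lines of Python. This is an illustration under assumptions, not the code behind our reports: the column names (person_id, name), the function names, and the sample rows are all hypothetical, and the only detail taken from our data is the ROT-PER-002 id format.

```python
from collections import defaultdict

# Hypothetical rows; ids and names are made up for the example.
SAMPLE_ROWS = [
    {"person_id": "9001", "name": "John Smith"},
    {"person_id": "9001", "name": "John Smith Jr."},
    {"person_id": "9002", "name": "Mary Brown"},
    {"person_id": "9002", "name": "Mary Brown"},
    {"person_id": "", "name": "Unknown Correspondent"},
]


def ids_with_conflicting_names(rows):
    """Map each id that carries more than one distinct name to those names."""
    names_by_id = defaultdict(set)
    for row in rows:
        if row["person_id"]:  # skip rows that have no id yet
            names_by_id[row["person_id"]].add(row["name"].strip())
    return {
        person_id: sorted(names)
        for person_id, names in names_by_id.items()
        if len(names) > 1
    }


def next_local_id(existing_ids, prefix="ROT-PER-"):
    """Mint the next zero-padded local id for a person with no authority match."""
    numbers = [int(i[len(prefix):]) for i in existing_ids if i.startswith(prefix)]
    next_number = max(numbers, default=0) + 1
    return f"{prefix}{next_number:03d}"


if __name__ == "__main__":
    print(ids_with_conflicting_names(SAMPLE_ROWS))
    print(next_local_id(["ROT-PER-001", "ROT-PER-002"]))
```

A flagged id is only a prompt for review. Two name values can belong to one person (a spelling variant) or to two people wrongly merged, and deciding which required a human looking at the documents.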

How were the visualizations built?

After preparing the data as much as possible, we imported the CSV into a Neo4j Aura graph database, in order to explore new ways to expose connections between the people referenced in the documents. Whereas a regular (or relational) database stores letters as a long list of rows, a graph database organizes people as "nodes" and the letters between them as "edges". This enables us to follow a chain from one person to the people they corresponded with, and to filter on different types of relationships or properties (such as gender). This is best exemplified on the site in our Correspondence Network. Questions that would fuel visualizations were converted into queries to the database in Cypher syntax, and the results were exported as JSON. These files were added to our website repository as the data source for the visualizations, which use d3.js or TopoJSON (maps) to bring them to life. Complex visualizations were prototyped or created with the help of Claude Opus 4.5-4.8. The website is written in JavaScript, using the Astro static site framework.
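
To make the nodes-and-edges idea concrete, here is a small Python sketch that turns letter rows into a graph and writes it as JSON. It stands in for the database step and is not our actual pipeline. The Cypher query is included only as an example of the kind of question described above, and its Person label and WROTE_TO relationship are hypothetical names, as are the column names, the function names, and the sample letters.

```python
import json
from collections import Counter

# Example of a query in Cypher syntax; the schema names are hypothetical.
EXAMPLE_CYPHER = """
MATCH (a:Person)-[r:WROTE_TO]->(b:Person)
RETURN a.id AS source, b.id AS target, count(r) AS letters
"""

# Made-up letters between placeholder people.
SAMPLE_LETTERS = [
    {"author_id": "A", "author": "Person A", "recipient_id": "B", "recipient": "Person B"},
    {"author_id": "A", "author": "Person A", "recipient_id": "B", "recipient": "Person B"},
    {"author_id": "B", "author": "Person B", "recipient_id": "C", "recipient": "Person C"},
]


def build_graph(letters):
    """Collapse letter rows into people (nodes) and weighted links (edges)."""
    people = {}
    weights = Counter()
    for letter in letters:
        people[letter["author_id"]] = letter["author"]
        people[letter["recipient_id"]] = letter["recipient"]
        weights[(letter["author_id"], letter["recipient_id"])] += 1
    nodes = [{"id": pid, "name": name} for pid, name in sorted(people.items())]
    links = [
        {"source": source, "target": target, "letters": count}
        for (source, target), count in sorted(weights.items())
    ]
    return {"nodes": nodes, "links": links}


def correspondents(graph, person_id):
    """Follow edges in either direction to find who a person exchanged letters with."""
    found = set()
    for link in graph["links"]:
        if link["source"] == person_id:
            found.add(link["target"])
        elif link["target"] == person_id:
            found.add(link["source"])
    return sorted(found)


if __name__ == "__main__":
    graph = build_graph(SAMPLE_LETTERS)
    print(json.dumps(graph, indent=2))
    print(correspondents(graph, "B"))
```

The nodes-and-links shape is one commonly used with d3 force layouts. The JSON files on this site may be organized differently for each visualization.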

Found an error or have a correction?

We tried to make the data as accurate as possible. If you suspect an error, please let us know at rotunda-bugs-request@virginia.edu.

Can I reuse or cite the data?

The metadata we created is published on GitHub under a Creative Commons Attribution 4.0 (CC BY 4.0) license. You're free to use it, as long as you credit us. It contains no transcriptions; the documents themselves remain the property of their respective publishers, so to quote or cite a letter, please refer to it from Founders Online or Rotunda's collections. If the data supports your work, we'd love to hear about it so we can link to or feature it.

What is the connection between this site and Founders Online?

The team at Rotunda, the digital imprint of the University of Virginia Press, created Founders Online in 2013, and has since maintained and hosted the site, via a yearly partnership with the National Archives and Records Administration (NARA).

Who made this site?

Credits coming