Advancing Election Data Standards: View From the Trenches

February 4, 2015 E. John Sebes

Today was the first day of another important Election Data Standards Meeting, this time in Los Angeles. That reminds me to share with our readers where election data standards fit into our work, and how important they are to helping ensure a consistent, reliable, verifiable election experience. Let's summarize these efforts by reviewing the four benefits that were under discussion today.

Interoperability

Interoperability is a bit of a geeky term, but an important one. Simply put, it refers to the ability of different devices, machinery, or processes to cooperate with one another. A good example is a standard you may rely on, without realizing it, every time you turn on your home entertainment system to watch a movie on your DVD player: HDMI. That standard is why the same kind of cable can connect your TV monitor to your DVD player and your cable box.

So, one type of standards-enabled interoperability is data exchange. One system needs data to do its job, and the source data is produced by another system; but the two systems don't speak the same language to express the data. In election technology, a common example is election results. Commercial Election Management System (EMS) products produce election definitions and election results data in their own format, because until recently there wasn't a standardized way of doing so. Election reporting systems need to consume that data, but it's hard to do because different counties (and other electoral jurisdictions) use different formats. For example, in California, a complete collection of results from all counties would involve 5 different proprietary or legacy formats, perhaps more in cases where two counties use the same EMS product but very different versions.

Large news organizations, as well as academics and other research organizations including the OSET Foundation and its TrustTheVote Project, can put a lot of effort into "data-wrangling" and come up with something that's nearly uniform. That work is time-consuming and error-prone, and it needs to be repeated several times as election results get updated from election night to final results. But more to the point, election officials don't have a ready, re-usable technical capability to "just get the data out."

Well, now we have a "standard" for U.S. election definitions and election results (we'll say more on that in reporting from the annual conference this week). So, what does that mean?

In the medium to long term, the providers of all the EMS products could support the new Standard, and consumers of the data (elections organizations themselves, election reporting products, in-house tools and apps of large news organizations, and of course, open source platforms like our VoteStream) can be re-tooled to use Standards-compliant data. But in the shorter term, elections organizations and their existing technology base need the ability to translate from existing formats to the Standard format. (A big part of our just-restarted work on VoteStream is to create a translator/aggregator tool set for election officials, but more on that as VoteStream reporting proceeds.)
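As a rough illustration of what such a translation step involves, here is a minimal Python sketch that converts two made-up county export formats into one common record shape. Everything in it is an assumption for the sake of the example: the two input layouts, the field names, and the `CommonResult` type are hypothetical, and they are not the actual Standard or any vendor's real format.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonResult:
    """Hypothetical common record; NOT the real Standard's schema."""
    county: str
    contest: str
    candidate: str
    votes: int


def from_format_a(county, text):
    """Made-up format A: CSV with a header row 'contest,candidate,votes'."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        CommonResult(county, r["contest"], r["candidate"], int(r["votes"]))
        for r in rows
    ]


def from_format_b(county, text):
    """Made-up format B: pipe-delimited lines 'candidate|votes|contest'."""
    results = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        candidate, votes, contest = (part.strip() for part in line.split("|"))
        results.append(CommonResult(county, contest, candidate, int(votes)))
    return results


# One translator per source format; adding a format means adding one entry.
TRANSLATORS = {"A": from_format_a, "B": from_format_b}


def translate(county, fmt, text):
    """Convert one county's export into a list of CommonResult records."""
    return TRANSLATORS[fmt](county, text)


# Invented sample data for two counties that export in different formats.
SAMPLE_A = "contest,candidate,votes\nMayor,Alice Smith,120\nMayor,Bob Jones,95\n"
SAMPLE_B = "Alice Smith | 80 | Mayor\nBob Jones | 101 | Mayor\n"

combined = (
    translate("County One", "A", SAMPLE_A)
    + translate("County Two", "B", SAMPLE_B)
)
```

Once every source has a translator, anything downstream only has to understand the one common shape, which is the point of the Standard.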

Componentization

Interoperability by itself is great in some cases, if the issue is mainly getting two systems to talk to one another. For example, at the level of an individual county, election reporting is mostly a matter of data transfer from the EMS that the county uses, to an election result publishing system. Some counties have created a basic web publishing system that consumes results from their EMS. However, it's not so easy for any county to re-use such a solution unless they use an EMS that speaks exactly the same lingo.

For another example at the local level, a Standards-compliant election definition data set can be a bridge between an EMS that defines the information on each ballot, and a separate system that consumes that election definition and offers election officials the ability to design the layout of paper ballots. (In the TrustTheVote Project, we call that our Ballot Design Studio.) The point here is that data standards can enable innovations in election technology, because various different jobs can be delegated to systems that specialize in that job, and these specialized systems can inter-operate with one another. Thus, we can now break large monolithic systems into smaller, lighter weight, simpler modules or components that together comprise a larger system. And that is the "componentization" benefit.

Aggregation

Component interoperability by itself is not so great if you're trying to aggregate multiple data sets of the same kind, but from different sources. Taking election result reporting as the example again, here is a problem faced by consumers of election results. Part of one county votes in one Federal congressional district, and part of another county votes in the same district. Each county's EMS assigns some internal identifier to each district, but it's derived from whatever the county folks use; this is true even if an election result is represented in the new data Standard. In one county, the district (and by extension the contest for the representative for the district) might be called the "4th Congressional District," while in the other it could be called "CD-4." If you're trying to get results for that one contest, you need to be able to identify that those are the same district, and the results for the contest need to include numbers from both counties.
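To make the problem concrete, here is a small Python sketch of aggregating one contest across two counties that name the same district differently. The crosswalk table, the shared identifier `us-house-district-4`, the county names, and the vote counts are all invented for illustration; a standard naming scheme would replace the hand-built table.

```python
from collections import defaultdict

# Hypothetical crosswalk: (county, local district name) -> shared identifier.
# Today each data consumer builds something like this by hand.
DISTRICT_CROSSWALK = {
    ("County One", "4th Congressional District"): "us-house-district-4",
    ("County Two", "CD-4"): "us-house-district-4",
}


def aggregate(results):
    """Sum votes per (shared district id, candidate) across counties.

    `results` is an iterable of (county, local_district_name, candidate, votes).
    """
    totals = defaultdict(int)
    for county, local_name, candidate, votes in results:
        key = (county, local_name)
        if key not in DISTRICT_CROSSWALK:
            # Fail loudly rather than silently dropping a county's numbers.
            raise KeyError(f"no shared identifier for {key}")
        totals[(DISTRICT_CROSSWALK[key], candidate)] += votes
    return dict(totals)


# Invented per-county results for the same contest.
county_results = [
    ("County One", "4th Congressional District", "Alice Smith", 1200),
    ("County One", "4th Congressional District", "Bob Jones", 950),
    ("County Two", "CD-4", "Alice Smith", 400),
    ("County Two", "CD-4", "Bob Jones", 610),
]

totals = aggregate(county_results)
```

The summing itself is trivial; the hard part is the crosswalk, which is exactly the piece that cannot be automated without agreed-upon identifiers.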

Currently, consumers of this data have processes for overcoming these challenges, but that ability is limited to each consumer organization, in some cases private to that organization. But what election officials need from Standards is the ability to *automatically* aggregate disparate data sets. Hmm, more Standards!

This exact issue is one of the things we're discussing at the Standards meeting this week in Los Angeles, CA: a need for a standard way to name election items that span jurisdictions or even elections in a single jurisdiction.

Combination

Combination is closely related to aggregation, except that aggregation is combining data sets of the same kind, while combination occurs when we have multiple data sets, each containing different but complementary information about some of the same things. That was one of the challenges we had in VoteStream Alpha: election results referred to precincts (vote counts per precinct), as did GIS data (the geo-codes representing a precinct) and voter-registration statistics (number of registered voters per precinct, plus several related stats). But many precincts had a different name in each data source! That made it challenging, for example, to report election results in the context of registration and turnout numbers, and to use mapping to visualize variations in registration levels and turnout numbers.
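Here is a minimal Python sketch of that kind of combination: three data sets keyed by precinct, each using its own naming style, joined on a normalized key. The precinct labels, the geo-codes, the numbers, and above all the normalization rule in `precinct_key` are assumptions made up for this example, not how VoteStream actually does it; real precinct names are often messier than one rule can handle.

```python
import re


def precinct_key(name):
    """Reduce a precinct label to a comparable key (assumed naming rule).

    Lowercases, drops a 'precinct'/'pct'/'prec' word, removes punctuation
    and spaces, and strips leading zeros.
    """
    text = name.lower()
    text = re.sub(r"\b(precinct|pct|prec)\b\.?", "", text)
    text = re.sub(r"[^a-z0-9]+", "", text)
    return text.lstrip("0") or "0"


def combine(results, geo, registration):
    """Join three name->value mappings on the normalized precinct key."""
    combined = {}
    sources = (("votes", results), ("geo_code", geo), ("registered", registration))
    for field, source in sources:
        for name, value in source.items():
            combined.setdefault(precinct_key(name), {})[field] = value
    return combined


# Invented data: the same two precincts, named three different ways.
results = {"Precinct 012": 340, "Precinct 007A": 125}
geo = {"PCT-12": "G-0012", "PCT-7A": "G-007A"}
registration = {"12": 800, "Pct. 7A": 250}

merged = combine(results, geo, registration)

# With the sets combined, turnout per precinct is a simple ratio.
turnout = {key: row["votes"] / row["registered"] for key, row in merged.items()}
```

With agreed identifiers in each data source, the guesswork in `precinct_key` would disappear and the join would be a plain lookup.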

We'll be showing how to automate the response to such challenges, as part of VoteStream Beta, using the data Standards, identifiers, and enumerations under discussion right now.

More to Come

That's the report from Day 1. More later …