Elections and the Fractal Fracas of Misguided Innovation
We’ve recently seen a bit of an online fracas about the word “fractal” in relation to elections and election fraud. There are several different, mostly unrelated ideas in the mix; however, it turns out that it’s easy to untangle most of it by focusing on just two questions:
What is “fractal” and what does it have to do with elections?
Does “fractal” help with voter fraud?
To discuss those questions and answers, co-founder & Chief Technology Officer John Sebes and I co-wrote this piece. It is more-or-less a follow-on to John’s recent article on this topic, and sets the table for more conversation about work underway here to produce a far higher caliber, easier to use data hygiene engine for voter registration data management systems. We start with the first question.
What is Fractal?
“Fractal” is one of a handful of mathematical technology tools and concepts, none of which we believe have any useful application to election administration, registering voters, or casting and counting ballots. To give you a sense of why we believe that, let’s start with the very simplest explanation of the concept. A fractal is a never-ending pattern. Fractals are infinitely complex patterns that are self-similar across different scales. They are created by repeating a simple pattern in an on-going feedback loop. Driven by the important mathematical concept of “recursion,” fractals are images of dynamic systems (sometimes literally visual images). Fractal patterns are familiar; nature is full of fractals such as the shapes of leaves or formations of tree branches. And there are abstract fractals for creating amazing visuals – such as the Mandelbrot Set, which is generated by a computer repeatedly calculating an equation.
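For readers who want to see what "a computer repeatedly calculating an equation" looks like, here is a minimal sketch of the standard escape-time test for the Mandelbrot Set. The function names, the iteration budget of 50, and the text-mode rendering are our own choices for illustration only.

```python
def mandelbrot_escape_time(c: complex, max_iter: int = 50) -> int:
    """Count iterations of z -> z*z + c before |z| exceeds 2.

    Returns max_iter if the point never escapes within the budget,
    which we treat as "probably inside the set".
    """
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c  # the same simple equation, fed back into itself
    return max_iter


def render_ascii(width: int = 60, height: int = 20, max_iter: int = 50) -> str:
    """Draw a coarse text picture of the set over the region [-2, 1] x [-1, 1]."""
    rows = []
    for row in range(height):
        imag = 1.0 - 2.0 * row / (height - 1)
        line = ""
        for col in range(width):
            real = -2.0 + 3.0 * col / (width - 1)
            inside = mandelbrot_escape_time(complex(real, imag), max_iter) == max_iter
            line += "#" if inside else "."
        rows.append(line)
    return "\n".join(rows)


if __name__ == "__main__":
    print(render_ascii())
```

Running the script prints a rough silhouette of the set. Notice that nothing in it involves records, names, or addresses: it is a feedback loop on a single number.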
So, if we consider this as basically as possible, it should be clear that…
There is no complexity in the administration of election data that would necessarily benefit from the application of fractal concepts.
Nevertheless, let’s consider a quick breakdown of some aspects of fractals for better understanding of our assertion.
We explained above that “fractal” is a math concept, usually referring to a fractal set of numbers, or a fractal shape that can be constructed based on a “fractal set.” There is no doubt, fractals are dope-fire for math geeks. If you’re inclined, Wikipedia has tons about fractal dimensions and more.
The most common usage of the term is “fractal software,” which is software that developers can use to manipulate fractals to produce interesting visuals, and again Wikipedia covers this nicely.
There is another application of the term, “fractal programming” which is a set of concepts (“paradigm” or “methodology”) about how to perform software development of many kinds. Note that fractal programming is not related to fractal software mentioned above. Though still a bit esoteric, the concepts have been around for years, and there are solid academic articles and textbooks about it, for example, this (rather spendy) book.
One of the more recent branches of these ideas is “fractal algorithms” (an algorithm is a formula or procedure for computing something) used for optimizing the provisioning of compute and other resources in cloud computing. This is something of real interest to Google, Amazon, Microsoft and all producers of cloud computing (a highly distributed internetwork of computer servers to host online computing).
Finally, there is another esoteric corner of computing called “fractal data analytics.” Fractal analysis is assessing fractal characteristics of data. It consists of several methods to assign a fractal dimension and other fractal characteristics to a dataset which may be a theoretical dataset, or a pattern or signal extracted from phenomena including topography, natural geometric objects, ecology and aquatic sciences, sound, market fluctuations, heart rates, digital images, molecular motion, and data science. An important limitation of fractal analysis is that arriving at an empirically determined fractal dimension does not necessarily prove that a pattern is fractal; rather, other essential characteristics have to be considered. Fractal analysis is valuable in expanding our knowledge of the structure and function of various systems, and as a potential tool to mathematically assess novel areas of study. And it even has some limited applications in advanced multivariate fraud detection for financial services. There even is a successful company by the name.
But honestly, after 17 years in election technology and 40 years in technology overall, we strongly believe fractal is not helpful for election data administration.
To bring this point home to the topic at hand, one more thought about fractal. At the end of the day, “fractal” is also simply a word. As with all words, people can use the term however they wish. In technology, fancy or esoteric words are often used to describe something thought of as innovative. For example, anyone could start a business with a name like “Fractal Systems & Software Inc.” producing a software product that does something-or-other, and could call the software itself “fractal software,” just as easily as they could call it “prismatic software” or “idiomatic software” or anything else they wish to position and promote (and attempt to distinguish or claim a unique value proposition for what they offer).
Here is a legitimate example of a company integrating “fractal” into its name and offering specialized computer hardware. However, here is another example of a technology organization called Fractal.app (for no evident reason other than the name perhaps sounds dope). This organization claims to “Empower a human-centered web3 space and help projects scale with decentralized identity solutions.” The software that they are developing could be useful for identity management in blockchain transactions, but it has nothing relevant to offer elections administration.
Fractals & Fraud
Out of intellectual honesty, and to give a fair hearing to those who believe they have some new science-sauce for elections data administration, let’s consider how fractal concepts might arguably be applied. So, another example of the buzz-wordy use of “fractal” is software that performs data analysis to detect fraud. To be honest, it’s not clear whether there is anything in the applicable data analytic techniques that is actually related to the math fractals or the software fractals we’ve already described. However, we do know several organizations use the term to refer to software applied in financial fraud analysis, including the venerable ATM (automatic teller machine) company NCR.
NCR employs a service called “Fractals” but it appears from our analysis that this is (again) merely the use of a label for a product name. Why? Because NCR isn’t using fractal math either!
The NCR Fractals system actually employs Bayesian statistical analysis with proprietary inference techniques, to make an adaptive classification engine (ACE). The ACE incorporates data from a wide range of sources. In addition to typical transactional information, ACE can include data from other internal systems and fraud-scoring models. And the service leverages information from specialist third parties — other related types of data clearinghouses — in order to validate the device, IP address and geo-location of the payment source; all things any digital commerce platform would want to do. And in fact, American Express employs similar tools.
In our humble, but informed opinion, this is far-and-away overkill and largely inapplicable to examining the correctness of voter registration data. However, if you remain overly fascinated and unconvinced, then google “fractal fraud”, pop some corn, open your favorite beverage, and settle in to your favorite chair for a trip down a rabbit hole.
Here’s the bottom line: there isn’t really anything called “fractal technology” that would apply to election administration in the sense that semiconductor technology is used for computer hardware, or virtualization technology is used for cloud computing. Computer chips and software are fundamental and important technologies. Of course, anybody can use the term to encapsulate a value proposition; for example, “moisturization technology” for skin care. “Fractal technology” could just be some set of software (not a technology per se) that somebody wants to call “fractal” just to have a dope name.
However, the cross-over issue remains fraud management, which is relevant to elections, because election officials and law enforcement have a responsibility to find and prosecute election fraud. As rare as it occurs in practice, the responsibility remains an important one. So for now, let’s set aside buzzwords, esoteric technology, and dope names, including fractal, and consider how fraud analysis connects to elections, and specifically the management of voter lists.
Voter Lists Are Not Dirty
To begin, let’s be clear about the claims of “dirty voter lists”. While there are on-going hygiene issues with voter rolls, suggesting lists are “dirty” is to imply that the data is populated with intentionally fraudulent data. This kind of sensationalism and provocative language must stop so we can get at the root of the issues and repair them.
Election officials carefully follow the law to manage voter lists, and they don’t insert fraudulent voter records into any voter list, voter database, voter registration system, or voter records management system. We can cite no verified and adjudicated reports of that ever happening. We invite readers to provide us such citations (spoiler alert: our Legal department tells us they don’t exist). The claims are driven by conspiracy-theory and specious at best. Here is what we know: There are specific rules about what election officials must put into voter records, and what they may not. These intake rules are not perfect (see below) but they are designed for effectiveness to prevent junk registrations.
Before moving on to consider those clunky voter lists, let’s be very clear: of course, anyone is free to disagree with these statements about what state election officials do and don’t do. However, for those who disagree, they are in essence accusing election officials of intentional malfeasance, filling voter rolls with bogus voter records, and conspiring with one another to both maintain this malfeasance and conceal it. That’s for a conspiracy minded person to say, but for everyone else, please remember this: voter lists are routinely obtained from states by political parties and campaigns, whose staff scour them for information useful in election campaigns. So, conspiracists need to include all those people in their theory.
Returning to the real world where election officials don’t maintain intentionally dirty voter lists, those election officials also struggle with voter list data that is clunky. By “clunky” we mean that the methods and means of ensuring the data is verifiable, accurate, secure, and transparent (where disclosure is allowed) are faulty and unreliable, and often break. Those breakdowns are not the result of a malfeasant act. When a voter record is entered into a voter list, that’s because it matches the legal requirements for a voter registration (more on that shortly); it’s legitimate and current. Yet, voter records can become out of date for a variety of legitimate reasons. For example, voters move to a new address in town, move to another state, die, or are convicted of a felony. These and several other specific cases are all clearly defined in election law as reasons why a voter record becomes out of date.
Wait, don’t election officials suspend or remove out-of-date voter records? Of course they do, but within limits set by Federal election law, which sets national standards for how it’s done. There are basically two processes.
The first process is called “list matching” where an elections office obtains some external data about people, and uses it to find matching entries in the state’s voter list. The external data can be: state department of corrections data on felons; state DMV records on address changes; state department of vital statistics data on dead persons; Social Security Administration records on dead persons; voter records from other states that list new voters in another state who used to be registered in this state.
When this type of list matching activity finds a match that fits with a factor of election law (e.g., currently jailed felons may not vote), there are procedures to remove or suspend the matched voter records. Matching isn’t always exact, and it’s not uncommon that a real live person is removed from a voter list because they had a partial match with a real dead person. So there are appeal processes, and occasional litigation. It’s a normal part of US elections processes, sometimes ugly, sometimes confusing and disturbing, but all part of the Great Rube Goldberg Machine of American election administration including the Federal government, 50 states, 6 territories, and thousands of local election offices. Because list matching isn’t done often, and because it isn’t super-accurate, it takes time to remove old entries. So, voter lists are not dirty (in the sense of containing fraudulent data), but they are aged and tarnished, in the sense that they are often encumbered with old data that’s in the process of being removed.
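To make the exact-versus-partial distinction concrete, here is a minimal sketch of list matching against an external death list. Everything in it is an assumption for illustration: the field names, the made-up people, and the rule that name plus date of birth plus the last four ID digits counts as "exact" while name plus date of birth alone counts as "partial." Real matching rules vary by state and by data source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    first: str
    last: str
    dob: str        # ISO date string, e.g. "1950-03-14"
    id_last4: str   # last four digits of an ID number; "" if unknown


def normalize(name: str) -> str:
    """Lower-case a name and keep only its letters, for comparison."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def match_strength(voter: Person, external: Person) -> str:
    """Classify a voter/external pair as 'exact', 'partial', or 'none'."""
    same_name = (normalize(voter.first) == normalize(external.first)
                 and normalize(voter.last) == normalize(external.last))
    same_dob = voter.dob == external.dob
    same_id = bool(voter.id_last4) and voter.id_last4 == external.id_last4
    if same_name and same_dob and same_id:
        return "exact"
    if same_name and same_dob:
        return "partial"  # could be two different people; needs human review
    return "none"


def match_lists(voters, external_records):
    """Return (voter, external, strength) for every pair that matches at all."""
    results = []
    for v in voters:
        for e in external_records:
            strength = match_strength(v, e)
            if strength != "none":
                results.append((v, e, strength))
    return results


# Made-up sample data (hypothetical people, not real records).
VOTERS = [
    Person("Maria", "Lopez", "1950-03-14", "1234"),
    Person("John", "Smith", "1948-07-02", "9876"),
]
DEATH_RECORDS = [
    Person("MARIA", "LOPEZ", "1950-03-14", "1234"),
    Person("John", "Smith", "1948-07-02", "5555"),
]

if __name__ == "__main__":
    for voter, record, strength in match_lists(VOTERS, DEATH_RECORDS):
        print(voter.first, voter.last, "->", strength)
```

In this sample the second voter shares a name and birth date with a deceased person but has a different ID, so the sketch reports only a partial match. Acting on that as if it were exact is how a real live person ends up removed.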
More cruft (as computer scientists often call this condition) comes from the second method of voter list management.
The second process is called “post-card based list management” (yes, this sounds antique) where election officials make a list of all the voters who haven’t voted in some time period (different states have different definitions for that period of time) and send each voter on that list a notice by U.S. mail (often a 3”x5” postcard), asking the voter to confirm that they still live at the address of record. If there is no response to the post cards for four (4) years, then the voter record can be suspended or removed.
You might say, “Four years?!” but that’s specified in Federal election law, passed to ensure that perfectly eligible voters, who registered but haven’t voted in a couple of election cycles, are not removed because of USPS mishaps or postcards filed as junk mail by someone at the address. A lot of people don’t like it, but it is the law, and election officials follow it.
As a result, we get more “cruft” on a voter list, with some voter records marked as not having been active for a certain number of years, and tagged for removal by a certain date (in accordance with the federal 4-year rule) unless the voter suddenly votes.
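The postcard process can be sketched as a small state check on each record. This is a simplified illustration, not a statement of the law's exact mechanics: the field names, the `inactivity_years` parameter (which the text notes differs by state), and the three status labels are our own assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

WAIT_YEARS = 4  # the four-year no-response period described above


@dataclass
class VoterRecord:
    voter_id: str
    last_activity: date                 # most recent vote or registration update
    notice_sent: Optional[date] = None  # when the postcard went out, if ever
    notice_answered: bool = False


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def needs_notice(rec: VoterRecord, today: date, inactivity_years: int) -> bool:
    """True if the voter has been inactive long enough to be sent a postcard."""
    if rec.notice_sent is not None:
        return False
    return add_years(rec.last_activity, inactivity_years) <= today


def status(rec: VoterRecord, today: date) -> str:
    """Return 'active', 'pending', or 'eligible_for_removal'."""
    if rec.notice_sent is None:
        return "active"
    if rec.notice_answered or rec.last_activity > rec.notice_sent:
        return "active"  # the voter responded or voted, so the clock stops
    if today >= add_years(rec.notice_sent, WAIT_YEARS):
        return "eligible_for_removal"
    return "pending"     # the "cruft": still on the list, tagged for later
```

A record in the `pending` state is exactly the aged entry described above: it is legitimately on the list, and it stays there until the waiting period runs out or the voter shows up.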
Fraudulent Voter Registration?
So, if voter lists must, by law, become aged (crufty) even when the individual voter records being added are known good and qualified, what about those voter registrations that are not in good-standing: might they be fraudulent? There’s a simple story about why that’s rare, and why claims of widespread dirty voter lists are at best misunderstandings.
Here’s how the vast majority of voter registrations work.
Someone submits a voter registration application (most often on paper, online, or via a transaction at a DMV) with a name, residential address, date of birth, and driver’s license number (or state ID number), together with some contact information that isn’t used to evaluate the application.
These registration applications are matched against DMV data.
If there is an exact match, a new voter record is created, or a matching existing record is updated for change of address, name, etc.
If there isn’t a match, the application is rejected.
Election officials are obligated by law to accept registration requests that meet the criteria. They cannot reject a request because the name is odd, the address unfamiliar, the signature strange looking (the signature can come from DMV records or be part of a paper application).
And they may not reject such a request merely because some “fractal” software (or any software, or 3rd party opinion) suggests that for some reason the voter data is suspicious.
Nor can they remove an existing voter record because of fractal-derived fraud suspicion — they can only remove voter records using either of the processes of list matching, or postcard-based list management.
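The intake steps above can be sketched as a short decision function. The data shapes are hypothetical, as is our choice of which fields make up an "exact match" (name, date of birth, and ID number), since the specifics differ by state. The `suspicion_score` parameter exists only to make one point: whatever an analytics tool says, it plays no part in the decision.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Application:
    name: str
    address: str
    dob: str        # ISO date string
    id_number: str  # driver's license or state ID number


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def decide(app: Application,
           dmv_records: Dict[str, Tuple[str, str]],  # id_number -> (name, dob)
           registered_ids: Set[str],
           suspicion_score: Optional[float] = None) -> str:
    """Return 'create', 'update', or 'reject' for one application.

    suspicion_score is accepted but deliberately never consulted.
    """
    dmv = dmv_records.get(app.id_number)
    if dmv is None:
        return "reject"  # no DMV record to match against
    dmv_name, dmv_dob = dmv
    if normalize(dmv_name) != normalize(app.name) or dmv_dob != app.dob:
        return "reject"  # not an exact match
    return "update" if app.id_number in registered_ids else "create"
```

The sketch covers only the common case of an applicant with a driver's license or state ID.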
Hardly Perfect
Is this incoming-registration process perfect and wonderful? Of course not. A small fraction of requests are not vetted with DMV data, for requests of people who don’t have a driver’s license or state ID card. DMV data isn’t perfect either; it could allow a residence address that isn’t real, and election officials, despite having the data to catch it afterwards, might not do so. So there are edge cases for bad actors to slip junk data in the front door of voter registration.
It is also possible that a criminal could use someone’s driver’s license information to impersonate them and register them to vote even though the individual does not want to vote. The possibility of voter registration impersonation has been raised for years, and while possible, there are only a handful of cases of suspicion, and no prosecutions of record for such an act that we can find. Again, readers are encouraged to send us citations. But even if there were a voter record of you, which didn’t result from your request, the voter record itself is a legitimate verifiable entry in the voter list. The impersonator has committed a crime in submitting that registration, but the voter record itself is not fraudulent. In other words, the process of registration in the case of an impersonator is a fraud, but the record itself is legitimately verified data.
There are other “edge cases.” Consider people who intentionally lie about citizenship or age eligibility. Again, according to records, this is rare, and it is even more rarely followed by fraudulent voting. Appearing in person to fraudulently cast a ballot is a felony, and personally risky, far more so than other options. Submitting an absentee ballot following a fraudulent registration is also not effective, because the signature on the submission envelope must match the registration (often DMV) signature on file, during the legally-required signature matching process of absentee ballot intake.
Nevertheless, even though the scope for ballot fraud is very small, the voter registration process is still not perfect. However, that does not mean that election officials should start using extra-legal methods that let “fractal” (or prismatic, or idiomatic) software determine which voter registration requests to reject. Yes, we need to do better with voter list hygiene, so what can be effectively done to improve voter records management and increase public confidence in voter records?
The Real Problem with Voter Record Trust
Let’s address the confidence problem first.
The root cause of distrust of voter lists is the fact that they are (mostly) not public.
Not being able to see voter data, and not being able to see the routine processes of updating it and correcting it, means that people can infer some wild stories, ranging from “hidden purges of undesirable voters”, to “millions of fake voter records maintained on purpose”.
“Sunshine is the best disinfectant” as the wise saying goes, and the first step that election officials can take is to cast sunshine on both the data itself, and the routine work performed to manage and maintain that data. Then any of the concerned folks can decide for themselves if there are nefarious problems; and if they see signs of such, they’d have evidence to take to election officials and/or the public.
It wouldn’t be hard, because in fact it already happens, just not in a very confidence-inspiring manner. First of all, every state already makes its voter data available to eligible organizations with limits on use, and sometimes for a significant fee. There are plenty of organizations and people who can see if there are what they view as problems, who can (and do) litigate or seek other corrective action. There are real issues, but in a small number of cases. But that’s not exactly sunlight; rather more like candle-light.
Better is what (for instance) North Carolina does: publishes on the World-Wide Web the (at that moment) current voter list every Saturday, adding to an archive of such snapshots going back years. That’s not exactly full sunlight either, because an archive of (.CSV) files is not super useful for most people (unless, like us, you love hacking Excel spreadsheets). Yet, the data is there, and it is available to compare current with past, to find cases of new, changed, or deleted voter records. We know: We wrote software to do that. Ditto for comparing voter lists for two states, to see if there might be records in two states that could be the same person (we’ve done that too); it’s basically what the old (now defunct) Crosscheck system used to do, and what the Electronic Registration Information Center (ERIC) system also (mostly) provides.
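As a rough idea of what comparing current with past involves, here is a minimal sketch that diffs two CSV snapshots. It is not the software we wrote, and the column names (`voter_id`, `name`, `address`) and the sample rows are invented for illustration; a real state file has its own layout.

```python
import csv
import io


def load_snapshot(csv_text: str, key: str = "voter_id") -> dict:
    """Parse CSV text into a dict of rows keyed by the voter identifier."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def diff_snapshots(old: dict, new: dict) -> dict:
    """List the voter IDs that were added, removed, or changed between snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


# Two tiny made-up snapshots standing in for consecutive weekly files.
LAST_WEEK = """\
voter_id,name,address
1001,Alex Doe,12 Oak St
1002,Blair Roe,9 Elm Ave
1003,Casey Poe,77 Pine Rd
"""

THIS_WEEK = """\
voter_id,name,address
1001,Alex Doe,12 Oak St
1002,Blair Roe,40 Birch Ln
1004,Drew Moe,5 Maple Ct
"""

if __name__ == "__main__":
    print(diff_snapshots(load_snapshot(LAST_WEEK), load_snapshot(THIS_WEEK)))
```

For real files you would read from disk with `open(path, newline="")` rather than from strings, but the comparison itself is this simple once each record has a stable identifier.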
Other states also make the data available frequently without constraint, just less conveniently: you have to get the data from them on physical media mailed to you, and there is a small fee.
All these states have worked out important details, like how to redact personal identifying information, how to remove from the public data more information of vulnerable people who would be harmed by public disclosure of their address.
In fact, with the subset of states that make available free or cheap voter list data, you could compile (and refresh every week) a decent subset of a “national voter file.” With just CA, NY, and NC to name a few, there are three of the top 10 biggest states and about a quarter of the population. So anybody that wants to perform large scale data analytics can do so now.
But clearly, it can be improved, with every state publishing its data, and making it interactive so that people don’t have to do software programming to massage the data into information. And — well, we can imagine (that’s what we do here at the Institute) — publishing that data in the established NIST national standard common data format, so that it is super-easy to rapidly compare and aggregate.
Why Not Make it Better?
So, why isn’t this happening?
It would be the tip of an iceberg, with 3rd parties doing amazing things with the open public data, normalizing addresses (this is actually a big deal, by the way) shining a light on real issues, and debunking fake issues. Why not? 🤔
Well, for one thing, there’s a different story in every state, which after all, is the way of U.S. elections. In some states, new legislation would be required to enable voter list publication. In other states, there may be perceived political issues, where some residents of the state would be taken aback that their name, address, year of birth, and possibly other non personally-identifiable data is actually already a matter of public record, once you register to vote. And in every state, some new legislation or rule-making would be required to extend the data for greater use, such as including the last 4-digits of Driver’s License Number (DLN) or noting the lack of DLN or SSN.
And for sure, in every state, some new funding would be required to pay for doing what North Carolina already does, much less improvements like making the data interactive.
We also know there is some trepidation about the “clunkiness” of existing voter data. For instance,
There will be more voter records in some counties than there are adult residents of those counties, and that is because of the 4-year rule of Federal law. So, if we’re intellectually honest, we must say the file is always pending additional clean-up; if we’re intellectually dishonest, we’ll cry that the voter file is full of fraudulent voters.
There will be records of recently deceased people, because the list matching to remove them hasn’t happened yet. In fact, it would probably require funding to do list management more frequently (if the data was going to be publicly available), and changes to the data format, such as to identify a record as very-likely about a person who moved out of state, and is pending removal in some number of months, in order to make it more comprehensible.
And yet, we are unaware of people taking North Carolina election officials to task for publishing data that is, actually, pretty clunky. Nor do we see claims of fraud from people who examine that data. So, to be candid, we expect the trepidation of other states’ election officials might be overstated. Stretching a little further into far-less comfortable areas above our pay-grade, we submit the cynic could fret that few on any side of the issue are genuinely interested in making things better, because that would eliminate their cause and purpose and cease the value or utility of their fight. We hope that is not the case. 🙄
Could Fraud Analytics Make it Better?
Notwithstanding our contention earlier that systems like NCR’s, or technology used at American Express or tools embedded in massive digital commerce platforms like eBay are simply beyond overkill and in many ways just inapplicable and unworkable, it is possible that fraud analytics could be usefully applied to a national file snapshot created by acquiring data from every state (at some significant cost), or to a single state’s voter file and some external data.
But there are still real world limits.
With over a half-dozen of some of the most veteran election administrators now on the OSET Institute team, we have even more insight to the practicalities. Consider:
First, this is not something that state election officials have the capacity, funding, or legal enablement to perform.
As we described above, there are only a few specific reasons why an election official can reject a voter registration application, let alone suspend an existing one. And the results of a fraud analysis are not among them.
With new funding, a state election office could obtain and use fraud analysis software, and use the results at a minimum to start the post-card 4-year process. But election offices nationwide are under-funded and many lack the capacity to perform a new Request for Information (RFI) and Request for Proposal (RFP) process for a new analytics product or service, let alone to hire and retain data analytics professionals.
That’s not to say procuring some external (and potentially expensive) outside source to do this is a bad idea. However, we note that there are timing, funding, and feasibility constraints that would not apply to independent watchdog organizations using open public data and sharing results with election officials.
Another key point is that analytics are only as good as the data.
For example, someone might see your residence address in a voter list, and check your home’s property record and see that you’re not listed there, nor is the property on a local list of rental properties. They might ask, “Do you really live there any longer?” Yet, the property records could be incomplete due to human data entry error or software glitches (or, wait for it, data update delays). Or they could find the names of quite a few people who have at some point in recent years received mail at that address, and whose voter registration may not have expired (does that mean they are fake voter registrations?) In fact, the data is simply out of date.
In other words, fraud analytics could be an additional tool in election officials’ hands for voter list management, but it cannot deliver smoking-gun results that enable immediate take down of allegedly fake voter records because existing laws and regulatory practices still apply.
The Way Forward: Transparency, Technology, and a Thousand Flowers
We’re troubled by the extent to which many Americans distrust elections (even to the point they may flat-out no longer believe in them). And that includes suspicions that voter lists are cooked, or dirty, or phantom-ridden, or any number of other suspicions. It doesn’t have to be this way.
The first step is open public data, followed by improvements in the data’s usefulness. We know from experience that such data can be used to assist voters, and the more data is publicly available, the more technology can be built by public benefit organizations to deliver more capability to more people, whether voters, organizers, or watchdogs.
It hasn’t been shown nearly as much as it should be, but it is also possible for open public data to enable 3rd parties to use the data to benefit the government organizations that published it. We believe third party voter data analytics could turn out that way too.
But perhaps most impactful is the show-me factor. If there are those who believe they see thousands of suspicious voter records; that they have real records that could be real evidence; and if concerned parties could look for themselves and see addresses with hundreds of people registered — then that could be fraud indeed… or it could be something that election officials need to explain better and to improve their data.
Regardless, the first step is to cast light on the troubling suspicions of far too many Americans. Then, let’s talk about some public technology that could do a faster, easier, higher-quality job at a very favorable price-point — which is the point of public (open-source) technology. 🤓 That’s a subject for another time, called PairWise™ (a brief mention appears at the bottom of John’s previous posting). Stay tuned.