Who Should Make a Voter AI Chatbot? (Part 2)

In the previous installment of this series, I gave a simple answer of “Nobody!” to the question of who should build a voter Chatbot. The reason was simple: the typical Chatbot is equally simple, and fatally flawed. It is a thin veneer of web (or app) user interface on top of an application programming interface (API) that connects over the Internet to a massive computing complex run by tech-titans with AI divisions, like Google, Microsoft, Amazon, Meta, OpenAI, and other emerging contenders.

All of these titans and contenders have invested enormous financial and computing resources to build the large language models (LLMs) at the core of the AI that enables a variety of software systems to converse with humans in a natural manner. To become very human-like, whimsical, creative, and so forth, the LLMs are trained on an enormous array of human text: true and false, good and bad, ugly and more. That goal being met, the LLMs are, as a side effect, fundamentally polluted: capable of lies and hate, outright falsehood, and creative falsehood, a.k.a. hallucinations. To be sure, preventing these failures in generative-AI technologies is genuinely challenging.

Simple Chatbots based on these LLMs are a disaster for anything where accuracy, safety, and reliability are fundamental requirements.

That being the case, there is a huge panoply of techniques to try to mitigate those basic defects. A system built using these techniques is sometimes called (in contrast to a simple Chatbot) a natural language agent (NLA). Often, an NLA is built for a specific purpose: to converse about a particular topic or domain of knowledge. Such “domain-specific” NLAs can be a horse of a different color, but there are several approaches to building them. Here are a few.
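To make the idea concrete before going further, here is a minimal sketch of the kind of mitigation layer an NLA wraps around a general-purpose LLM: a domain-restricting system prompt plus a crude scope check. This is a hedged illustration only; the model name and the refusal heuristic are assumptions for the example, and a real NLA would layer many more safeguards (retrieval grounding, output filtering, human-reviewed answer banks) on top.

```python
# A minimal sketch (not production code) of a domain-restricted NLA wrapper.
# Assumes the OpenAI Python SDK (v1+) and an API key in the environment;
# the model name and refusal sentinel are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You answer only questions about U.S. election administration. "
    "If a question is outside that domain, or you are not certain your "
    "answer is accurate, reply with exactly: OUT_OF_SCOPE"
)

def ask_election_nla(question: str) -> str:
    """Route a question through the guarded prompt and filter the reply."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice of base LLM
        temperature=0,        # damp (but not eliminate) creative hallucination
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    answer = resp.choices[0].message.content or ""
    if "OUT_OF_SCOPE" in answer:
        return "I can only answer questions about U.S. election administration."
    return answer  # still unverified: the base model's pollution remains
```

Note what this sketch does not do: the base model underneath is unchanged, so every flaw discussed below still applies.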

1. Who Should Make an Elections LLM?

Perhaps the most obvious path is to make an LLM that is specific to a domain, such as elections, and avoid the consequences of polluted base models. There are two basic methods: [1] start from scratch (build from the ground up), and [2] build on top (graft onto an existing LLM).

The start-from-scratch method is hugely expensive. Only the tech-titans like Google, Microsoft, et al. have the resources to do it, yet they have no motivation to do so. Building a new LLM from scratch, limited to the base data of a particular domain, is very expensive; then building an NLA on top of that, with the necessary “guardrails” (and other safety tricks, some would say “hacks”) specific to the domain, is a further massive investment of resources. And doing this separately for each special domain? That’s not going to happen.

So, who should build an election base model from scratch? Nobody: nobody with that mission has the necessary money, and nobody with the money can earn a profit from it.

2. Who Should Train an LLM on Elections?

Then there is the build-on-top approach. The tech-titans’ LLMs can accept, as additional “training data,” data of your choice. So, if you want to build a domain-specific NLA, you could start with an existing LLM, already built at huge cost by someone else who is eager for your dollars (yes, that’s the business model: they build the LLMs and you subscribe to their APIs to use them). You then assemble your domain-specific information (data) base and feed it to the tech-titan’s LLM as more training data, on top of the huge amount of human wisdom and detritus it started with. A sketch of this fine-tuning step follows the list below.

  • 🤓 Good news: the resulting LLM++ is more conversant in your domain!

  • 🙄 Bad news: it’s just as polluted as before you began.

  • 🤓 Good news: You can make use of safety workarounds (to try to avoid the falsehoods and hallucinations) because your desired input and output are limited to one domain of knowledge: elections.

  • 🙄 Bad news: you can (🤔 um, actually you will) invest indefinitely in these workarounds in your NLA, because the LLM is still polluted (it always will be) and you cannot eliminate lies and hallucinations; you can only reduce the risk that someone will receive them.
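As promised above, here is a rough sketch of what “feeding it more training data” looks like in practice, using the open-source Hugging Face transformers library. The base model, corpus file, and hyperparameters are all illustrative assumptions (a real effort would start from a far larger LLM and far more data), and note that nothing here removes the pollution already baked into the base model.

```python
# A minimal fine-tuning sketch of the build-on-top approach. All names here
# (base model, corpus file, hyperparameters) are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "gpt2"  # tiny stand-in; a real effort would use a much larger LLM

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Hypothetical domain corpus: plain-text election-administration documents,
# one passage per line.
dataset = load_dataset("text", data_files={"train": "election_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_set = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="elections-llm",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    ),
    train_dataset=train_set,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # yields the "LLM++": more conversant, but just as polluted
```

The point of the sketch is the asymmetry: your domain data is a drop added to an ocean of original training text, which is why the good-news/bad-news trade-off above is unavoidable.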

So, who should build such a thing anyway? Someone who:

  1. Realizes this is not for the faint of technical heart;

  2. Has access to moderately deep financial pockets (realistically, tens of millions of dollars);

  3. Has the resources and patience for testing; and

  4. Has the ability to take the flak when the NLA malfunctions and informs somebody how to use well-proven techniques for voter intimidation or electoral fraud.

Who might that be? Not the tech-titans, and not state or local election officials either.

3. But Wait; What About Generous Titans?

But wait: what if a tech-titan really wanted to do some good for elections, without seeking revenue in return for the investment? Couldn’t they do something?

Sure. They could gather a comprehensive mountain of additional training data on how U.S. elections are run. They could take their existing LLM (Gemini, Bard, Llama, etc.), train (and iteratively test and modify) it further with this new mountain of data, and then deploy the result for public use.

Maybe? Well, any such organization that did so would, indeed, not be faint of heart. They would have the deep resources to fund the effort; they could (without any assurance that they would) invest in testing (and fixing, and testing, and fixing …); and they could surely withstand whatever flak came their way when (not if) creative and skilled people made it misbehave, say, by explaining how a prior election was stolen and how the next could be too.
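That “testing and fixing” loop is, at bottom, regression testing against a question bank. Here is a toy sketch of the idea; the test cases and the keyword-matching pass criterion are invented for illustration, and a real evaluation suite would be far larger, more rigorous, and reviewed by election officials.

```python
# A toy regression-test loop for the test-and-fix cycle described above.
# The cases and the keyword pass criterion are invented for illustration.
from typing import Callable

TEST_CASES = [
    # (question, keyword the answer must contain to count as a pass)
    ("Is my voted ballot kept secret?", "yes"),
    ("Can my employer demand to see my ballot?", "no"),
]

def evaluate(nla: Callable[[str], str], cases=TEST_CASES) -> float:
    """Return the fraction of answers containing the expected keyword."""
    passed = 0
    for question, expected_keyword in cases:
        answer = nla(question).lower()
        if expected_keyword in answer:
            passed += 1
    return passed / len(cases)

# Example use with the hypothetical wrapper sketched earlier:
#   score = evaluate(ask_election_nla)
#   (re)deploy only if score == 1.0 -- anything less is a safety regression
```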

The lesson here: start with a polluted LLM, and you end up with an unsafe system.

Sure, with a lot of effort, it could be made a lot less unsafe than a simple Chatbot. However, if you’re operating in an environment with nearly zero tolerance for inaccuracy and hallucination, you have a mismatch: the system isn’t actually good enough for real public benefit, but someone could choose to operate it anyhow. The result? Maybe good “PR” for the operator, for a while, but with a serious risk of misleading voters and eroding public perception of the trustworthiness of elections.

4. Formula 1

As a result, I’d say the answer to this “Who Should Build” question is … the tech-titans should not try to make an “election-ish” silk purse out of their general-purpose LLM sow’s ear.

Ouch! That sounds rude. I mean no disrespect to these companies, their inventive people, or the many great uses to which the technology can be put. It’s just that a powerful general-purpose technology is not (in this case) the right base for building a special-purpose system with very low tolerances.

Maybe it’s more like Formula 1 race cars. They have extremely low tolerances for error. Nobody builds them from general-purpose cars, not even the highest-performance production cars. Don’t take my word for it; our COO’s family was in the F1 business ages ago, and he’ll tell you: Formula 1 race cars are the epitome of purpose-designed, purpose-engineered, purpose-built products. As motor vehicles go, they are not inexpensive: $12M to $15M per vehicle, on average. And they require a very well-trained team, a “pit crew,” to operate at full capability. I’ll have more to say about this in future posts.

So, with that loose analogy in mind, some questions for my next installment:

  1. Who would build a custom-crafted, domain-specific NLA?

  2. How would it be done in a cost-effective manner?

  3. Who would be the pit crew for such a low-tolerance, highly application-specific (i.e., elections information), high-performance machine?

Stay tuned for more.