Who Should Make an Elections AI Service Agent? (Part 5)
In my previous (4th) installment in this series, we pivoted from the Who question to the question of How to build a safe, low-tolerance, domain-specific natural language agent (NLA, or “DS-NLA”). That meant we finally stopped referring to such a service as a “Chatbot” because, as these next two installments clarify, it is far more than that (hence the change in this post's title). All of Part 4 was spent addressing the elephant-in-the-room question: how can such a service be built, if the current base models are fundamentally polluted and unsafe? This time, we assume for the moment that the base model challenge is tractable and explore the next question: “What else is required, in addition to a safe base model?”
Requirements for Domain Specificity
Let’s assume that a safe-ish base model can be the foundation on which to build a DS-NLA for a single domain. What else does such a DS-NLA need to do, not only to be useful, but also to be trustworthy? Even partial answers suggest that there is real work to be done, beyond the elephant we discussed last time. Let’s start with three key requirements for any DS-NLA, and how each would work and look in practice.
The NLA’s builders need to first compile a corpus of vetted authoritative information that is comprehensive across the domain (e.g., everything about how elections work in a specific state).
For any prompt (i.e., question) that’s relevant to the domain, the response must be derived from that corpus only, excluding irrelevant information that may be within the base model’s training data.
Input filtering is required to handle (at least to some extent) prompts that are not relevant to the domain; such filtering would be specific to the domain and tied to the corpus of authoritative information.
There is more to say about these requirements, but the common theme is a need for NLA-building techniques that are independent of any one domain, yet can be applied repeatably to build an NLA specific to one domain. To the best of my knowledge, NLAs (or DS-NLAs) with intentionally limited functionality have not yet proven to be highly profitable (i.e., a compelling ROI has not been identified or proven). So, there hasn’t been much effort to produce them commercially. And yet, for some domains (such as election administration), they’re needed regardless of profitability. So, it’s time to get started.
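To make the three requirements concrete, here is a minimal Python sketch, assuming simple keyword overlap stands in for real retrieval (a production system would more likely use embedding search plus the base model for phrasing). The corpus entries, the identifiers such as `STATE-EC-001`, and the function names are hypothetical placeholders, not real election rules. The sketch shows the shape of the idea: the domain filter is derived from the vetted corpus itself, and an in-domain answer is built only from corpus passages, each carrying its citation ID.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str  # hypothetical citation identifier for a vetted corpus item
    text: str


# Placeholder corpus: illustrative entries only, NOT real election rules.
CORPUS = [
    Passage("STATE-EC-001", "Voters may register online, by mail, or in person at the county election office."),
    Passage("STATE-EC-002", "A mail ballot must be returned by the deadline published by the county election office."),
    Passage("STATE-EC-003", "Polling place locations are published by each county election office before election day."),
]

STOPWORDS = {"a", "an", "and", "are", "at", "be", "by", "can", "do", "each", "for", "how",
             "i", "in", "is", "may", "must", "my", "of", "on", "or", "the", "to", "what",
             "when", "where", "you", "your"}


def terms(text: str) -> set[str]:
    """Lowercased word tokens with common stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


# The domain filter's vocabulary comes from the corpus, so filtering stays tied to it.
DOMAIN_TERMS = set().union(*(terms(p.text) for p in CORPUS))


def retrieve(question: str, k: int = 2) -> list[Passage]:
    """Return up to k corpus passages ranked by term overlap with the question."""
    q = terms(question)
    scored = [(len(q & terms(p.text)), p) for p in CORPUS]
    scored = [sp for sp in scored if sp[0] > 0]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:k]]


def answer(question: str) -> dict:
    """Filter out-of-domain prompts; otherwise answer only from corpus passages."""
    if not terms(question) & DOMAIN_TERMS:
        return {"in_domain": False,
                "text": "That question is outside this assistant's scope (state election administration).",
                "citations": []}
    hits = retrieve(question)
    # A real DS-NLA would hand these passages to the base model with instructions
    # to answer only from them; this sketch simply quotes them.
    return {"in_domain": True,
            "text": " ".join(p.text for p in hits),
            "citations": [p.doc_id for p in hits]}


print(answer("When is my mail ballot due?")["citations"])
print(answer("What is the best pizza topping?")["in_domain"])
```

Because any in-domain prompt overlaps at least one corpus passage, every in-domain answer here carries at least one citation; nothing is drawn from outside the corpus.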
Requirements for Trustworthy Domain Specific NLAs
Aside from being useful, an NLA should also operate in a manner that engenders trust, without compromising usability. Perhaps the most important requirement for trust is citation of authoritative sources.
A DS-NLA’s responses need to include citations to specific items in the corpus: items that users can inspect to independently verify that the response is credible, based on the authority of the source data used to support it.
That’s so important it is worth repeating: an NLA has to empower users to take responsibility for checking its responses by independently examining citations that provide evidence for the NLA’s assertions. An NLA that is a “just trust me” black box cannot deliver on a basic requirement:
Help a user navigate a complex set of facts, to find authoritative answers to the user’s questions — answers that are factual and evidence-based.
That's why we believe that providing citations is a capability fundamental to an NLA being even potentially trustworthy. However, citations by themselves are not enough. An engaging, simple user experience design is also required (and arguably a condition precedent) to support trustworthiness. There are three points about this:
For input filtering, highly user-friendly responses are required. A DS-NLA needs to filter out prompts that are not part of the domain, but it also needs to be more informative and usable than a classic Star Trek computer response of “I am not programmed to respond in that area.”
As a result, user interface and experience design is both essential and domain specific. At least for a domain of any real complexity — and elections are plenty complex — a single “ask me anything” search box is not going to cut it as an acceptable user experience.
Therefore, a credible experience needs to include information about the domain, and guide a user to construct prompts (questions) that will generate useful responses (answers).
That’s hardly a complete set of observations, but it’s enough to suggest that, for an effort of this scope, we currently have no repeatable worked examples to leverage. 🙄
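As an illustration of the citation and input-filtering points, here is a minimal Python sketch of a presentation layer, assuming the answer text and its supporting corpus items have already been selected upstream. The `Citation` type, the `TOPIC_GUIDE` entries, and the function names are hypothetical. Two design choices are shown: an answer with no supporting citations is refused rather than displayed, and an out-of-domain prompt gets a guide to supported topics with example questions instead of a bare refusal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    doc_id: str   # hypothetical identifier of a vetted corpus item
    title: str    # human-readable source title shown to the user
    excerpt: str  # the passage that supports the answer


# Hypothetical topic guide, shown when a prompt falls outside the domain.
TOPIC_GUIDE = {
    "Voter registration": "How do I register to vote?",
    "Mail ballots": "When is my mail ballot due?",
    "Polling places": "Where do I find my polling place?",
}


def render_answer(summary: str, citations: list[Citation]) -> str:
    """Show an answer together with numbered sources the user can check."""
    if not citations:
        # No "just trust me" answers: refuse to display uncited output.
        raise ValueError("refusing to show an answer with no supporting citations")
    lines = [summary, "", "Sources:"]
    for i, c in enumerate(citations, start=1):
        lines.append(f'[{i}] {c.title} ({c.doc_id}): "{c.excerpt}"')
    return "\n".join(lines)


def render_out_of_domain() -> str:
    """Explain the agent's scope and suggest questions it can answer."""
    lines = ["I can only answer questions about how elections work in this state.",
             "Here are topics I cover, with example questions:"]
    for topic, example in TOPIC_GUIDE.items():
        lines.append(f'- {topic}: e.g., "{example}"')
    return "\n".join(lines)


print(render_answer("You can register online, by mail, or in person.",
                    [Citation("STATE-EC-001", "Registration guide",
                              "Voters may register online, by mail, or in person.")]))
print(render_out_of_domain())
```

The topic guide doubles as the prompt-construction aid: it tells users what the agent covers and models the kind of question that produces a useful, cited answer.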
Iteration
One more critical point: I expect the effort will be an iterative process, where usage of a DS-NLA will generate information for the system’s builders to improve the system. And this will happen in two ways:
To iterate to the point where empirical results suggest that the DS-NLA might be safe enough for general use; and
To factor in field-use experience to improve the DS-NLA by reducing unsafe or irrelevant responses, and by reducing scenarios where safe responses are not helpful.
In other words, it’s not a “one-and-done” process to build a DS-NLA; builders will need to become operators who refine and extend the agent over time. That factor is also a driver of requirements, which I’ll cover next time.
To be continued in one more installment …