Thoughts on building user interfaces for AI-powered features
My first experience building AI-powered UIs was about 3 years ago. It was a
business intelligence application, similar to Excel but in the browser.
Visually, it looked like almost any other spreadsheet app, except for a
sidebar with a text box that let you query information about your
data. This was right after ChatGPT launched, so it was an idea worth
exploring: How can we leverage AI to analyze spreadsheets?
The rules weren’t clear yet, so we could build almost anything we wanted, but
there weren’t many existing UI patterns that we could draw inspiration from.
We were building something that felt new.
For the past 10 years I’ve been building UIs, mostly in finance, where the
products and services already exist, sometimes for decades. There’s still
innovation going on, but there are always invoices, data tables, date pickers,
and line charts.
Most of the UIs are improvements on other UIs that already exist. We’re
constantly iterating, following trends, trying new technologies. We already
know which inputs and outputs we’ll get.
Building with text is new.
We know that we’ll need to send instructions to the LLM provider and we also
know we’ll get text back.
How can we build the bridge between browser -> data -> LLM -> data -> browser
in a way that makes sense for the users?
If I want to get the list of all rows that have an amount greater than 0 I can
write a formula, but when the formula is written by the AI, how can we tell
the AI which column has the amount we’re looking for?
The user can spell it out in the prompt, or we can attach extra context through the UI.
In spreadsheets you can select multiple cells at once, let’s send that as
context:
{
  prompt: "Calculate the average of the amount column X, and return the result in a new column called 'Average Amount'",
  selected_data: JSON.stringify({...}),
  spreadsheet: JSON.stringify({...})
}
By the way, I never got around to making code syntax work on my blog, so...
it's ugly, yes.
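To make that payload a bit more concrete, here is a minimal sketch of how a selection could be turned into context. The sheet shape (a `cells` map keyed by A1-style addresses) and the `pickSelection` and `buildRequest` names are assumptions made for illustration, not what the original app did.

```javascript
// Hypothetical sheet shape: { cells: { A1: { value: 10 }, ... } }.
// A selection is a list of A1-style addresses, e.g. ["A2", "A3"].

// Copy only the selected cells out of the sheet.
function pickSelection(sheet, selection) {
  const picked = {};
  for (const address of selection) {
    const cell = sheet.cells[address];
    if (cell !== undefined) {
      picked[address] = cell;
    }
  }
  return picked;
}

// Build the request body, mirroring the payload shown above.
function buildRequest(prompt, sheet, selection = []) {
  return {
    prompt,
    // With no selection we send null, so the backend can tell
    // "nothing selected" apart from "selected cells that are empty".
    selected_data:
      selection.length > 0
        ? JSON.stringify(pickSelection(sheet, selection))
        : null,
    spreadsheet: JSON.stringify(sheet),
  };
}

const sheet = {
  cells: {
    A1: { value: "Amount" },
    A2: { value: 120 },
    A3: { value: 80 },
  },
};

const request = buildRequest(
  "Calculate the average of the amount column",
  sheet,
  ["A2", "A3"]
);
console.log(request.selected_data);
```

The selection travels as its own field so the prompt can treat it as higher-priority context, while the full sheet is still there as a fallback.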
So this is where the technical decisions have to be made:
Do we need to send the entire page?
Yes, because even if an advanced user knows the AI will get a special tier
of context for the selected data:
What happens when there’s nothing selected?
What if the real answer to the query is outside of the selection?
What if the selection contains a formula? Then the actual value depends on
some other fields, and they may cascade references.
What if the value in a cell depends on a webhook or external API? This means
we would have to push updates to the spreadsheet while the user may not be
looking… Is that appropriate? Is it safe?
Let’s send the entire page to the LLM, that’s what makes sense.
Except for the very small, insignificant fact that processing more data is
more expensive. Running queries across multi-megabyte spreadsheets can easily
exhaust the available context (and our instructions count against it too).
And lastly, what if the AI still does not give us what we’re asking for?
That’s a few cents down the drain. Now multiply that by the number of users
that could be using the application every single day, running multiple queries.
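A back-of-the-envelope version of that reasoning, as a sketch. Every number here is a placeholder: the 4-characters-per-token ratio is only a rough rule of thumb, and the context limit, price, and usage figures are made up for illustration, so plug in your provider's real values.

```javascript
// Rough heuristic (assumption): about 4 characters per token.
// Real tokenizers differ, especially on JSON-heavy payloads.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Does the serialized payload plus our instructions fit in the context window?
function fitsInContext(payload, instructions, contextLimitTokens) {
  const total = estimateTokens(payload) + estimateTokens(instructions);
  return total <= contextLimitTokens;
}

// Daily cost in dollars: total tokens across all queries, priced per 1k tokens.
function estimateDailyCost({ tokensPerQuery, pricePer1kTokens, queriesPerUser, users }) {
  const totalTokens = tokensPerQuery * queriesPerUser * users;
  return (totalTokens / 1000) * pricePer1kTokens;
}

// Hypothetical numbers, for illustration only.
const payload = "x".repeat(2000000); // stand-in for a ~2 MB serialized spreadsheet
console.log(estimateTokens(payload)); // 500000
console.log(fitsInContext(payload, "You are a spreadsheet assistant.", 8000)); // false

// 4 cents per query, 10 queries per user, 1000 users: about 400 dollars a day.
console.log(
  estimateDailyCost({
    tokensPerQuery: 4000,
    pricePer1kTokens: 0.01,
    queriesPerUser: 10,
    users: 1000,
  })
);
```

Even with generous rounding, a check like this run before sending anything makes the "send the entire page" option look a lot less obvious.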
So it gets very complicated, very quickly.
Now let's take a step back. What if we keep it small, limit the amount of
context we can send?
Dealing with dates is still complicated. This was 2022, so even if things
have evolved since, it was not clear at the time how to deal with these situations.
A text-based approach to spreadsheets and business intelligence can work for
things such as column-name matching and data import/export with somewhat
automated pipelines.
Importing into and exporting from the context starts to have issues, because
spreadsheets also rely on special syntax and formulas, and getting that in
and out of the GPT APIs proved to be incredibly hard.
Sometimes it is more valuable to give the AI the formula and not the value
itself, but again, that is time-consuming, and the referenced fields could
themselves reference other cells, increasing complexity.
Because some cells hold formulas rather than literal values, writing data back
to the spreadsheet becomes a real engineering problem. You don’t want
to completely remove the real references, right?
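Two of the points above, cascading references and not clobbering formulas on write-back, can be sketched in a few lines. This assumes a hypothetical cell shape where a formula cell carries a `formula` string alongside its computed `value`, and it uses a deliberately naive regular expression to find A1-style references (no ranges, no cross-sheet references, and function names that end in digits would be misread). A real spreadsheet engine would rely on its own parser and dependency graph instead.

```javascript
// Hypothetical cell shape: { value: 3 } or { formula: "=A1+B1", value: 3 }.

// Naive reference extraction: single-cell A1-style references only.
function referencesOf(cell) {
  if (!cell || typeof cell.formula !== "string") return [];
  return cell.formula.toUpperCase().match(/\b[A-Z]{1,3}[0-9]+\b/g) || [];
}

// Follow formula references transitively so the context includes every cell
// the selection depends on. The visited set also guards against cycles.
function expandSelection(cells, selection) {
  const visited = new Set();
  const queue = [...selection];
  while (queue.length > 0) {
    const address = queue.shift();
    if (visited.has(address) || cells[address] === undefined) continue;
    visited.add(address);
    queue.push(...referencesOf(cells[address]));
  }
  return [...visited];
}

// Apply AI-suggested values without ever overwriting a formula cell.
// Skipped addresses are returned so the UI can ask the user what to do.
// Recalculating dependent formulas afterwards is left to the engine.
function writeBack(cells, updates) {
  const next = { ...cells };
  const skipped = [];
  for (const [address, value] of Object.entries(updates)) {
    const existing = cells[address];
    if (existing && typeof existing.formula === "string") {
      skipped.push(address);
    } else {
      next[address] = { value };
    }
  }
  return { cells: next, skipped };
}

const cells = {
  A1: { value: 100 },
  A2: { value: 50 },
  B1: { formula: "=A1+A2", value: 150 },
  C1: { formula: "=B1*2", value: 300 },
};

console.log(expandSelection(cells, ["C1"])); // C1, B1, A1, A2
console.log(writeBack(cells, { A1: 120, B1: 999 }).skipped); // B1
```

Selecting only C1 ends up pulling in four cells, which is exactly the kind of hidden growth in context that made this hard.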
It’s been a few years but I still wonder how we’ll solve some of these issues.
Since then I’ve worked on other types of projects, but I’m still building
AI-powered features: UIs that enable an end-to-end process for gathering
financial information from private documents using AI and then making the
insights available to investors @ Tap, and AI for marketing and customer
interactions @ Treble.
And we’re still asking lots of similar questions.
It’s clear that users have high expectations of AI-powered features, but the
solutions aren’t crystal clear. There are a lot of small details to uncover in
each step and interaction so that the user experience is great, polished, and,
most important of all, solves the user’s problem.