Interfaces exist to enable interaction between humans and our world. [...] The act of designing interfaces is not Art. Interfaces are not monuments unto themselves. Interfaces do a job and their effectiveness can be measured.
One underappreciated aspect of LLMs is how they'll (likely) start a new era of interfaces — similar to what GUIs and the mouse did back in the '80s — breaking one of the last barriers in human-to-computer interaction: The continuous-discrete mismatch.
The continuous-discrete mismatch
Humans naturally communicate in a fluid, continuous language — our everyday spoken words. This is problematic because computers are discrete machines that communicate only in discrete languages.
This fundamental difference creates a mismatch that forces us to adapt whenever we want to interact with a computer, often creating friction that makes these systems slower and harder to use. We can become accustomed to communicating in discrete languages and, over time, reduce the friction. However, there will always be a learning curve.
This becomes clear whenever people are learning how to use a computer and say things like "Why can't it do X? It's so simple." In reality, it's not that the computer can't do X; rather, the person can't properly communicate with the system to get it to do X.
There is a communication problem in human-to-computer interactions. There's no denying it.
For years, the solution to this was "get used to it". Our interfaces could strive to create as little friction as possible, but there was simply no better solution. Now, thankfully, there is: LLMs.
LLMs as translators
Since LLMs can understand the continuous nature of human language, I suggest we use them as translators:
Human language → LLM understands the user's intent → LLM chooses a discrete action in the system based on the user's intent.
Where the discrete action is the computer doing something, such as:
- Buying a plane ticket;
- Displaying the best possible interface for the given task, e.g. the user wants to check a stock price, so the system shows a stock ticker.
I call this: dynamic UI.
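The translation step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Action`, `stub_llm` and `translate` are hypothetical names, and `stub_llm` fakes the model with a keyword match so the example runs offline. A real system would call an actual LLM and prompt it with the list of allowed action labels.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    """Closed set of discrete actions the system knows how to perform."""
    BUY_PLANE_TICKET = "buy_plane_ticket"
    SHOW_STOCK_TICKER = "show_stock_ticker"
    UNKNOWN = "unknown"


def stub_llm(utterance: str) -> str:
    """Stand-in for a real LLM call (hypothetical): returns an action label as text."""
    text = utterance.lower()
    if "flight" in text or "plane" in text:
        return "buy_plane_ticket"
    if "stock" in text or "share price" in text:
        return "show_stock_ticker"
    return "something the system does not support"


def translate(utterance: str, llm: Callable[[str], str] = stub_llm) -> Action:
    """Map free-form (continuous) text to exactly one discrete action."""
    label = llm(utterance).strip()
    try:
        return Action(label)
    except ValueError:
        # The model answered outside the allowed set, so fall back explicitly.
        return Action.UNKNOWN


print(translate("I need a flight to Lisbon next week"))
print(translate("How is Apple stock doing?"))
```

The point of the enum is that whatever the model says, the system only ever receives one of a fixed, known set of actions. Anything else collapses to `Action.UNKNOWN` instead of reaching the rest of the program.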
Dynamic UIs
A true dynamic UI is one where the system understands the user's intention and creates the best UI for the given context on-the-fly.
To illustrate the concept, check this demo I've made of an app where you can upload a GitHub repo and ask questions about it. The LLM then dynamically adjusts the interface, shifting from a general file explorer to a code-specific view, depending on the user's intent.
Let's break down what happened in the UI:
- First, the LLM opened a pane of all the files it was currently looking at;
- Then, it decided to change the 'file explorer' pane to be a 'code explorer' pane. Initially, it just opened a single code file;
- Later, it opened a new tab in the 'code explorer' to display the RegisterServiceImpl.java.
The important detail here: this behaviour was not pre-programmed. Instead, I created a set of mini React components, and the LLM selected which to use, and when, based on what the user asked.
I predict there will be many more UIs like this in the short to medium term, where the developer creates a set of interface components and the LLM picks from them at runtime.
To be fair, this prediction isn't really hard to make, as we're starting to see some UIs move in this direction.
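The pattern of a developer-built set of components that the LLM picks from at runtime might look roughly like the sketch below. It is not the code behind the demo. Plain Python functions stand in for React components, and the component names, the JSON reply shape (`component` plus `props`) and `render_choice` are all assumptions made up for illustration.

```python
import json
from typing import Callable, Dict, List


# Developer-built components. Functions returning strings stand in for
# React components; the names are hypothetical.
def file_explorer(files: List[str]) -> str:
    return "FileExplorer[" + ", ".join(files) + "]"


def code_explorer(path: str) -> str:
    return f"CodeExplorer[{path}]"


# The registry is the only thing the model is allowed to choose from.
COMPONENTS: Dict[str, Callable[..., str]] = {
    "file_explorer": file_explorer,
    "code_explorer": code_explorer,
}


def render_choice(llm_output: str) -> str:
    """Render the component named in the model's JSON reply, if it is registered."""
    try:
        choice = json.loads(llm_output)
        name = choice["component"]
        props = choice.get("props", {})
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return "Fallback[could not parse model output]"
    if not isinstance(name, str) or name not in COMPONENTS:
        return f"Fallback[unknown component: {name}]"
    try:
        return COMPONENTS[name](**props)
    except TypeError:
        # Props missing, unexpected, or not a JSON object.
        return f"Fallback[bad props for {name}]"


# Assumed shape of a model reply asking for the code view of one file.
reply = '{"component": "code_explorer", "props": {"path": "RegisterServiceImpl.java"}}'
print(render_choice(reply))
```

The model never writes UI code here. It only names a component and supplies its props, and anything outside the registry falls back to a safe default.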
Truly dynamic UIs
One may argue that this isn't really dynamic, since you have to create the set of React components the LLM can pick from yourself. This is a valid point and, in the short to medium term, I don't see LLMs getting smart enough, or fast enough, to create good on-the-fly interfaces from the ground up.
However, LLMs will eventually overcome this barrier as their capabilities grow and inference technology improves.
Nonetheless, there is one big problem: most people type much more slowly than they click, making chat-like interfaces slow for tasks that don't inherently involve natural language.
How to solve this?
UIs from the future
First, let's define the problem:
For LLMs to know what the user wants to accomplish, the user first needs to type their intention, and typing tends to be slow.
The crux of the problem is getting the user's intention to the LLM faster. To this end, I predict that, in the same way we needed a new device (the mouse) to get to GUIs, there'll be a new device (or software) that solves this by essentially knowing the user's intention before they actually type it out.
In terms of speed: typing < clicking < thinking.
Maybe something like Neuralink.
[Update on 4 Oct 2023]: Or maybe something like Tab.
Imagine a future where the computer builds a custom interface just for your particular use case on-the-fly. I think this is where we're headed.