Build the App, Which Happens to Have AI

ChatGPT was an innovation, particularly for building services around natural language. To be sure, natural language processing was not born in 2022. However, many were too busy to learn the difference between stemming and lemmatization, not to mention the difficulty of building a custom BERT model that is versatile and easy to deploy. With ChatGPT, many tasks, from mundane regular expressions to high-level natural language user interfaces, became a matter of writing instructions and providing a few examples. It was clear that many systems would be born out of the GPT API.

And surely they were. First came ChatGPT, then services with machine vision capabilities, then many providers offering similar stacks, and we now have an abundance of “AI” apps. It has become commonplace to see a website with a chat feature. Tools like Microsoft Office, which had already been rolling out intelligent features such as automatic slide design, introduced text-based interfaces to their lesser-known features. Many startups claim to set a new way of interfacing with AI, while plenty of AI apps on Apple’s App Store are just mobile ChatGPT clones with the GPT API behind the scenes.

Now in 2024, one thing has become very clear: not all AI apps are useful. The apps and programs we use every day don’t necessarily “speak.” Our email clients may have AI assist capabilities, but most of our work hours are still spent writing replies or fixing the AI assist’s draft. Office programs may have some automatic features, but we still spend hours making our slides pixel-perfect. This goes beyond our reluctance to embrace the new king of AI. For tasks like answering emails, making slideshows, writing code, drawing blueprints, and many other activities we’ve been doing for decades, we need tools with reliable performance and ease of use, not something that nudges us with fancy new features that disrupt our work hours.

Many, or perhaps most, applications don’t need “AI” features like natural language interfaces. Often, such interfaces are added for the sake of joining the AI adoption race. Natural language sounds great, allowing new users to, well, naturally access the deepest features. However, skilled users of a tool would rather press the trusty shortcut Ctrl+Z than shout ‘undo’ every time they make a mistake.

Perhaps a good case of a natural language interface for its own sake is the smart speaker, such as Google’s Assistant hardware, Amazon’s Echo, and Apple’s HomePod. Smart speakers promised to bring a new paradigm of voice computing to our touch-centric daily lives. They were speakers that could answer trivia, guide us through cooking with recipes and timers, make calls and calendar adjustments, control the household, and play some music. But in reality, we didn’t need something to speak to for all those tasks. We could always reach for our smartphones and do all of those things quickly by typing instead of struggling with speakers that misheard our pronunciation. It was at least a speaker, so we listened to music with it and then returned to our phones, because shouting to turn down the volume felt unnecessary. In the end, smart speakers could only survive by being something else: a very good speaker with a voice control option, or a data hub with an always-on screen.

But there certainly are uses for interfaces based on natural language, particularly when the program has an open feature set with numerous use cases and room for personalization. It is up to the architects to present a valid reason to talk with their applications. Users often find it useless to speak with programs. Only when the features are well thought out, beyond being just another fancy new toy, will users at least consider playing with them. What people want from programs is to get things done, whether that means fulfilling their contract or finding some fun in their busy life. People may play with a toy just because it is fancy while the novelty is fresh enough, but we are at a point where “AI” by itself is no longer fancy or fresh.

In brief, chatbots become useful when they are counselors rather than chattering bots. From ELIZA to ChatGPT, the common use case for chatbots (or chatterbots, in older terms) has been to counsel the user. ELIZA was a good, understanding listener; GPT counsels as a confident nerd of everything on the web. These counseling agents, if they must be chattering endpoints, should at least excel at chatting, for example by maintaining an excellent persona for a virtual character chat. However, this is where modern AI, that is, Transformer-based causal language models trained on large web corpora, fails the most: providing reliable information to users.

No matter how large language models become, it is unlikely that current deep learning architectures will produce a truth-speaking machine, to say nothing of the challenge of defining the truth. Offering a personalized experience is yet another challenge that scaling the models alone would not solve. For an application incorporating language models to be useful, it needs to overcome this challenge by providing enough information to the generative agents for them to offer personalized counsel. In other words, a natural language interface is useful when it offers a new, natural way to interact with the data stored in the program.

To build a natural interface to the stored data, it is crucial for users to store their data in the application in the first place. People store their data because it is beneficial to do so – because the application provides enough benefit to justify the effort of storing data in the program. Whether it is typed-out notes, handwritten memos, photos, music creations, geographical records, or any other structured and unstructured data sources, the app should excel at handling them so well that users will use it regardless of the fancy AI features. If users find the AI features useful, that’s an added benefit that will attract them to use it more.

Good examples of this are Goodnotes and Notion. Goodnotes is a mobile application famous among iPad users as a go-to place for handwritten notes, offering a traditional notebook-like experience with digital conveniences such as inserting photos and searching notes. Notion is a popular web service for building personal knowledge bases and dashboards. Both recently added chat functionality. What distinguishes their chats is that they are not generic chatbots. Instead, they are augmented with user content. To be sure, retrieval-augmented chatbots are not a novel concept. What sets these two apart is their strength as applications even without AI functionality.

This is particularly true for Goodnotes. Goodnotes, as a mobile note-taker, has invested heavily in handwriting recognition, offering features like searching handwritten notes since the 2010s. Its recent feature for chatting with user notes is an extension of this: it retrieves from handwritten notes and cites the user’s own handwritten notes as sources. Notion also benefits from this strategy because it offers features rich enough to serve as a personal knowledge base. A new application can try to mimic such approaches by building a talk-with-a-doc program. However, if it is not on par with Goodnotes or Notion for saving documents, users will find no reason to use it instead.
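To make the idea of a chat “augmented with user content” concrete, here is a minimal sketch of the retrieval step in Python. It is not how Goodnotes or Notion implement their features; the `Note` type, the word-overlap scoring, and the prompt wording are all assumptions made for illustration, and a real system would likely use a proper search index or embeddings and then pass the prompt to a language model.

```python
import re
from dataclasses import dataclass


@dataclass
class Note:
    note_id: str  # hypothetical identifier, used to cite the source
    text: str     # note content (e.g. recognized handwriting)


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, notes: list[Note], top_k: int = 2) -> list[Note]:
    """Rank notes by the number of distinct words shared with the question.

    Deliberately naive scoring, standing in for a real search index.
    Notes with no shared words are dropped.
    """
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(n.text)), n) for n in notes]
    scored = [(score, n) for score, n in scored if score > 0]
    # Stable sort: notes with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [n for _, n in scored[:top_k]]


def build_prompt(question: str, notes: list[Note], top_k: int = 2) -> str:
    """Assemble a prompt that grounds the answer in the user's own notes."""
    hits = retrieve(question, notes, top_k)
    if hits:
        context = "\n".join(f"[{n.note_id}] {n.text}" for n in hits)
    else:
        context = "(no matching notes)"
    return (
        "Answer using only the notes below and cite note ids in brackets.\n"
        f"Notes:\n{context}\n"
        f"Question: {question}"
    )
```

The point of the sketch is that the chat layer is thin: everything it can say depends on what the user has already stored, which is why the quality of the underlying note-taking app matters more than the chat itself.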

Build great apps. Let the great app happen to speak. Adding chat functionality is easy; making compelling features is still a challenge. People may try an AI application for one or two subscription cycles out of curiosity. What draws people back over and over are great apps that are still great without AI. It isn’t about making an app that speaks; it’s about making an app worth listening to.

Gyu-min Lee
