This post began as an experiment where I used a Large Language Model (LLM) for content creation. It helped with some ideas, but largely missed what I was going for and wrote like it was trained on Wikipedia. So I rewrote nearly all of it.

image credit: https://thecodinglove.com/wysiwyg

I’ve been reading a lot of hype about Large Language Models (LLMs) like ChatGPT lately. I keep reading that they are going to 50x people’s workloads and make developers, machine learning engineers, and nearly every other knowledge worker obsolete. I hear things like “What if we just put the whole thing in the model’s context and tell ChatGPT-X to figure out the details?” Wouldn’t that just be awesome?!

GPT stands for God Provided Technology, like sweet sweet manna from… Oh. Shit. That’s not right. It stands for “Generative Pre-trained Transformer”. That’s kinda pedestrian.

LLMs are transformative and represent a huge advancement in the field. But what does it really mean to use them?

Well, I’m going to try to offer some thoughts on these questions, drawing from my own experience using traditional ML tools, frameworks, and models, as well as my experience with LLMs. To do that, I’m going to use no-code dev tools (some of which are called WYSIWYG editors, short for “what you see is what you get”) as what I think is a useful analogy for what LLMs will eventually become.

What are LLMs?

Large language models are a type of neural network with a very large number of parameters (often in the billions) that have been trained on massive datasets of text and code. They can be used for a variety of Natural Language Processing (NLP) and Natural Language Generation (NLG) tasks, including generating text, translating languages, writing different kinds of creative content, and answering your questions in an informative way. The latest versions have been fine-tuned using Reinforcement Learning from Human Feedback (RLHF), creating even more powerful models that perform even better in long-term contexts.

LLMs are fantastic zero/one/few-shot and end-to-end model engines.

A one-shot model can take one example of a behavior and generalize that behavior to new inputs. One-shot techniques are usually not the most powerful, but when you’re just getting started, these models can help you prove a concept or even help you collect new data to train a more powerful one.
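As a rough sketch, here is what a one-shot prompt might look like in code, assuming the 2023-era openai Python package’s ChatCompletion interface and an illustrative sentiment-labeling task (the model name and example text are mine, not a recommendation):

```python
# A minimal one-shot prompt sketch. Assumes the 2023-era `openai` package,
# which reads the API key from the OPENAI_API_KEY environment variable.
import openai

# One worked example of the behavior, then the new input we want handled.
prompt = (
    "Classify the sentiment of each support ticket as positive or negative.\n\n"
    "Ticket: 'The new dashboard is great, thanks!'\n"
    "Sentiment: positive\n\n"
    "Ticket: 'I have been locked out of my account for two days.'\n"
    "Sentiment:"
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",                # any capable chat model works here
    messages=[{"role": "user", "content": prompt}],
    temperature=0,                        # keep the answer as deterministic as possible
)
print(response["choices"][0]["message"]["content"])
```

That single worked example in the prompt is the entire “training set,” which is exactly why this is so useful for proving a concept quickly.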

An end-to-end model can take minimally processed input and convert it into a “final” output, without intermediate processing steps. This approach, when it’s appropriate, can speed up development and let you tackle problems that would be expensive or time-consuming to solve with traditional ML pipelines. End-to-end models do this by learning to do the intermediate processing on their own.
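Here is a hedged sketch of that idea, using the Hugging Face transformers pipeline helper as a stand-in: raw text goes in and a finished summary comes out, with no hand-built steps in between (the task choice, default model, and example text are illustrative assumptions):

```python
# End-to-end in miniature: raw text in, "final" answer out, with no
# hand-built intermediate stages (no custom tokenization, no separate
# entity extractor, no rule engine).
from transformers import pipeline

summarizer = pipeline("summarization")   # one model stands in for the whole pipeline

raw_email = (
    "Hi team, the quarterly numbers are in. Revenue is up 12% but churn "
    "rose in the EU segment, mostly among customers on the legacy plan. "
    "We should discuss migration incentives at Thursday's meeting."
)

# Minimally processed input -> final output; the intermediate steps are learned.
print(summarizer(raw_email, max_length=40, min_length=10)[0]["summary_text"])
```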

How are people using LLMs?

One of the main challenges of using LLMs is that they can be difficult to control. LLMs are trained on such a large amount of data that they can sometimes generate outputs that are not accurate or relevant to the task at hand.

Two ways to improve the controllability of LLMs are to use prompt engineering and fine-tuning. Prompt engineering is the process of creating a prompt that will guide the LLM to generate the desired output. I often think of this as a no-code development tool for NLP.

Prompt engineering can be used to steer LLM output without massive datasets, and can sometimes be done with limited expertise or evaluation. For example, if you want the LLM to generate a specific type of text, such as a poem or a news article, you can craft a prompt that guides it toward that output.
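As a rough illustration, and again assuming the 2023-era openai chat API, the whole “development loop” here is editing the prompt text, not touching any weights (the persona and constraints below are entirely made up for the example):

```python
# A hedged prompt-engineering sketch: the prompt is the program.
# Assumes the 2023-era `openai` package with OPENAI_API_KEY set.
import openai

system = "You are a poet who writes in plain, concrete language."
user = (
    "Write a four-line poem about a failing build pipeline.\n"
    "Constraints:\n"
    "- no rhyming\n"
    "- mention at least one log message\n"
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ],
    temperature=0.7,   # a little randomness is fine for creative text
)
print(response["choices"][0]["message"]["content"])
```

Iterating on that constraint list is the entire development loop, which is why it feels so much like a no-code tool.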

Fine-tuning, on the other hand, is more like traditional machine learning: you provide inputs and desired outputs and update the model to specialize it for one task. In the literature, this is consistently the more powerful approach. However, it requires a much better understanding of the model development process, including data collection, evaluation, and parameter tuning.
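A minimal sketch of what that looks like in practice, assuming the older OpenAI-style JSONL prompt/completion format (other providers expect different schemas, so treat the field names as an assumption): you assemble explicit input/output pairs, exactly like a supervised dataset.

```python
# Fine-tuning data prep in miniature: explicit input -> desired output pairs,
# written as JSONL. The prompt/completion field names follow the older
# OpenAI fine-tuning format; adjust for your provider.
import json

training_pairs = [
    {"prompt": "Ticket: 'The dashboard is great, thanks!'\nSentiment:",
     "completion": " positive"},
    {"prompt": "Ticket: 'Locked out of my account for two days.'\nSentiment:",
     "completion": " negative"},
    # ...hundreds more labeled examples, which is the real cost here
]

with open("sentiment_finetune.jsonl", "w") as f:
    for pair in training_pairs:
        f.write(json.dumps(pair) + "\n")

# The actual training run (uploading the file, picking a base model, choosing
# hyperparameters) is provider-specific, and it is where the classic ML skills
# of data collection, evaluation, and parameter tuning come in.
```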

Prompt engineering is kinda like no-code development

The expertise level needed for prompt engineering and fine-tuning varies depending on the complexity of the task. For simple tasks, such as generating a poem, you may not need any expertise in machine learning or natural language processing. However, for more complex tasks, like translating languages, you will need to have some expertise in these fields.

No-code development tools are software that lets users create applications without writing any code. These tools are becoming increasingly popular, since they make it possible for anyone to create applications, regardless of their level of technical expertise. However, no-code development tools have been around since before the 70s and are still not how the majority of development is done. Developers choose the flexibility and power of more traditional development practices in most use cases. Some no-code platforms do super-charge small facets of engineering, though.

LLMs can be seen as an analog of no-code development tools. Just as no-code tools make it possible to create applications without writing any code, LLMs make it possible to use machine learning and natural language processing without having any expertise in these fields.

In the most hyped-up examples of how these models are going to change everything, they are often discussed from the perspective of prompt engineering, where a user with little understanding of machine learning or the mechanics of the model will be able to quickly iterate and develop against real-world problems. This is a powerful force for democratization.

It doesn’t, however, exempt you from the challenges of classical ML: objective measurement, issues with your problem formulation, and ethical concerns surrounding your collection and use of data. Really nailing these things requires some of the hardest skills that ML engineers must master.
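For example, objective measurement still looks like classic evaluation: score the model against examples you labeled yourself. Here is a tiny sketch, where ask_llm is a hypothetical stand-in for whatever prompt and model call you ended up with:

```python
# A small sketch of the "objective measurement" that doesn't go away:
# check the model's outputs against a held-out set you labeled yourself.
def ask_llm(ticket_text: str) -> str:
    # Hypothetical stand-in: replace with your actual prompt + model call.
    return "positive"

labeled_tickets = [
    ("The new dashboard is great, thanks!", "positive"),
    ("I have been locked out of my account for two days.", "negative"),
    # ...examples you did NOT look at while iterating on the prompt
]

correct = 0
for text, gold in labeled_tickets:
    prediction = ask_llm(text).strip().lower()
    correct += prediction == gold

print(f"accuracy: {correct}/{len(labeled_tickets)}")
```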

In the end, LLMs are likely to be just like no-code dev tools. Some people will use them to explore without technical skills. Some will use them in a more traditional way to supercharge their tools and products. BUT things won’t go so well if there isn’t someone in the room who can provide a clear-eyed evaluation of the end product.

So the free lunch really is in fast iteration via one/few-shot learning and powerful end-to-end capacity. Embracing that, and doubling down on its implications, will lead to the best products and advancements. The products that take that approach will have staying power beyond a time unit defined by the rise and fall of Dogecoin.

References:

CEO is so worried about remote workers using A.I. and doing multiple jobs he threatens to increase quotas by ‘30 to 50 times our normal production’
https://fortune.com/2023/04/21/remote-work-artificial-intelligence-multiple-jobs-ceo-threatens-productivity/

OpenAI Prompt Design Documentation
https://platform.openai.com/docs/guides/completion/prompt-design

Language Models are Few-Shot Learners
https://arxiv.org/pdf/2005.14165.pdf

One-shot learning (computer vision)
https://en.wikipedia.org/wiki/One-shot_learning_(computer_vision)

What is End-to-end Deep Learning?
https://www.coursera.org/lecture/machine-learning-projects/what-is-end-to-end-deep-learning-k0Klk

WYSIWYG
https://en.wikipedia.org/wiki/WYSIWYG