Beyond the Hype: Understanding the Science Behind AI and LLMs
From Word2Vec to GPTs: A No-Nonsense Guide to How AI Works
Remember when I stated there was no magic in AI? I now have proof!
I found an engineering-focused post by a Google deep learning engineer that explains how LLMs work, and why it’s just a lot of code and statistics. I am going to borrow some of his information and take a stab at explaining Large Language Models (LLMs) for people outside the field.
To understand how ChatGPT and similar technologies work, we first have to start with some history, and some definitions.
Nomenclature
I’ve never been a big fan of learning words before knowing how to use them, so we are only going to learn 3 new words today, in the context of machine learning.
A vector is an object containing information about a token.
A token is a unit of information for which a vector is created.
An embedding is how we save those vectors for later.
A machine learning application takes a token, creates a vector for it, and stores it as an embedding. That is it. You now know half of everything there is to know about machine learning and Large Language Models (LLMs).
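To make that pipeline concrete, here is a toy sketch of it in Python. Everything here is made up for illustration: the tokenizer just splits on spaces, and the "vectors" are derived from character codes, whereas real systems use subword tokenizers and learn vectors of hundreds of dimensions from data.

```python
def tokenize(text):
    # Split text into tokens (real tokenizers use subword units).
    return text.lower().split()

def vectorize(token):
    # Invent a tiny "vector" from letter positions; a real model
    # learns these numbers from training data.
    return [ord(c) - ord("a") for c in token[:3]]

# The stored embeddings: token -> vector.
embeddings = {}
for token in tokenize("The cat sat"):
    embeddings[token] = vectorize(token)

print(embeddings["cat"])  # -> [2, 0, 19]
```

The point is only the shape of the process: text in, tokens out, one vector per token, vectors saved for later lookup.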
The early days: Word2Vec
Word2Vec was a natural language processing technique published by Google in 2013. Using a large dataset of text, their algorithm learned to associate words. For example, it knew that a “man” and a “woman” were more similar than a man and a cat.
Word2Vec was able to position a bunch of similar words in 3D space (using vectors), and then tell you the distance between them. The image below is for the word “man”, highlighted in pink, along with the next 10 closest words to it: woman, human, mankind, humans, many, etc. You can see that those words are located close together in a space of words in a database, so to speak.
TensorFlow Projector is where you can experiment with this concept on your own.
It is important to understand that Word2Vec did not know the meaning of words. It could not tell you what a human was, or how to mind your manners. All it knew was that those words were related, and how closely they were related within its database.
Word2Vec also had a pretty cool ability to manipulate vectors. For example, when asked for a “female king”, it knew to return a “queen,” because a queen was “king - man + woman”.
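You can see why the arithmetic works with a toy version of that trick. The four 3-dimensional vectors below are invented by hand (roughly: royalty, maleness, femaleness); real Word2Vec vectors have hundreds of learned dimensions, but the mechanism is the same: do the arithmetic, then find the stored word closest to the result.

```python
import math

# Hand-made toy vectors, for illustration only.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def cosine(a, b):
    # Cosine similarity: 1.0 means "pointing the same way".
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# king - man + woman, dimension by dimension.
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# Which stored word is closest to the result?
closest = max(vectors, key=lambda word: cosine(vectors[word], target))
print(closest)  # -> queen
```

Subtracting “man” removes the maleness dimension, adding “woman” supplies femaleness, and what remains is a royal, female vector: “queen”.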
All of this was pretty innovative at the time, and it laid the groundwork for more complex models to come. It also demonstrated that vectors could be used to efficiently store information for neural networks.
However, Word2Vec also had a few big limitations: (1) it could not process more than two words together, and (2) it was not aware of context, so it confused words with multiple meanings and matched them incorrectly.
Meet GPTs: Generative Pretrained Transformers.
A Transformer is a deep learning architecture developed, once again, by Google, in 2017, which made the above process many times more useful. Instead of simply comparing words one-by-one, an algorithm could now compare multiple words in a sentence, and store their relationships based on context.
So if you said “Today is ____”, the algorithm now knew how likely you were to say “Monday” vs. “a good day” vs. “time to file taxes” vs. something else, all depending on the context of the words in the sentence, and the words around that sentence.
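At its core, that is next-word prediction: given a context, estimate how likely each possible continuation is. Here is a deliberately tiny sketch that counts continuations of “today is” in a three-sentence pretend corpus; a transformer learns far richer versions of these probabilities from billions of sentences, but the question it answers is the same.

```python
from collections import Counter

# A made-up miniature "corpus" for illustration.
corpus = [
    "today is monday",
    "today is monday",
    "today is a good day",
]

# Count what follows the two-word context "today is".
followers = Counter()
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 2):
        if words[i] == "today" and words[i + 1] == "is":
            followers[words[i + 2]] += 1

# Turn counts into probabilities.
total = sum(followers.values())
probs = {word: count / total for word, count in followers.items()}
print(probs)  # -> {'monday': 0.666..., 'a': 0.333...}
```

In this tiny world, “Monday” is twice as likely as “a good day” to follow “Today is”, purely because of how often each appeared in the training text.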
Furthermore, this new way of building models was also significantly faster and allowed for pre-training of the models. Researchers could now throw a ton of data at the model and basically let it figure out relationships in that data any way the model saw fit. Then, researchers would fine-tune (a.k.a. correct) the model using what they knew were definitely correct relationships.
It’s kind of like they let a child eat anything the child wanted, and when she was full, they explained that strawberries were sweet, honey was sticky, mushrooms had to be cooked, and so on.
With this power at hand, GPTs were born, and not only could these programs now store data, they could also generate data using a prompt, thus the “generative” part of their name.
Conceptually, nothing has changed since Word2Vec except the model’s ability to consume, process, store, and retrieve vast amounts of data quickly.
Doesn’t ChatGPT talk to me like a human?
No. ChatGPT is really no different from a basic word-matching algorithm. Chat, and whichever GPT-4 model powers it, is incredibly “smart” in the sense that it can store and use billions of data points. It appears smart because our brains cannot comprehend the processing that happens inside a vast number of hot GPUs somewhere at a Microsoft server farm. It’s beyond something we can easily visualize, but that doesn’t make it any more magical than a magic trick.
Chat appears to respond like a human because it has access to an incredible amount of information stored in a way that can be retrieved in near-real time. It’s like a very good student with an incredible memory: when you ask it for something, it knows exactly how to respond, every time. But it does not respond because it is aware of your question; it simply responds with the information which, according to its calculations, is located closest to your request vectors.
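“Closest to your request vectors” can be sketched, too. In the toy example below, each stored fact has a made-up vector, the question gets a made-up vector, and the response is simply the stored fact whose vector points most nearly the same way; real systems embed both sides with a learned model, but the “nearest vector wins” idea is the same.

```python
import math

# Made-up vectors over pretend dimensions (geography, food, cooking).
stored_facts = {
    "Paris is the capital of France.": [0.9, 0.1, 0.0],
    "Honey is sticky.":                [0.0, 0.9, 0.1],
    "Mushrooms must be cooked.":       [0.1, 0.0, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend embedding of the question "What is the capital of France?"
question_vector = [0.8, 0.2, 0.1]

# Respond with the fact closest to the request vector.
answer = max(stored_facts, key=lambda fact: cosine(stored_facts[fact], question_vector))
print(answer)  # -> Paris is the capital of France.
```

No awareness required: the question lands near the geography fact in vector space, so that fact comes back.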
Come to think of it, Chat is not smart at all. It’s just really good at magic tricks.
p.s. How did I generate that image of a brain at the start of this post?
First, I asked ChatGPT: “My article title is ‘Beyond the Hype: Understanding the Science Behind AI and LLMs.’ Can you create an image description that would fit well into this article?”
Chat gave me the following response:
A visually engaging and thought-provoking digital artwork showcasing a human brain and a circuit board interconnected, symbolizing the fusion of human intelligence and artificial intelligence. The human brain is illustrated in rich, natural tones, vividly detailed to show its complexity and intricacy. In contrast, the circuit board is depicted with metallic shades, intricate circuits, and glowing nodes, representing the sophisticated technology behind AI and LLMs. The two elements are seamlessly integrated, with circuits and neural pathways merging, illustrating the synergy between human cognition and AI's computational power. The background is a soft gradient from a deep blue to a light, tech-inspired hue, subtly hinting at a digital realm. The overall composition conveys a sense of harmony and coexistence between the organic and the technological, capturing the essence of AI's advancements and its deep-rooted connections to human-like processing and learning.
Second, I fed that response into Midjourney, using their /imagine prompt, and got back the image above.