Generative AI basics
Generative AI has revolutionized the way we interact with technology, especially with the advent of Large Language Models (LLMs) like GPT-3, Llama, Gemini, Claude, Mixtral, and many others.
In this article, I will briefly cover some of the key concepts that power these technologies.
Transformers: The Backbone of LLMs
Transformers, an architecture introduced by Google in their 2017 paper "Attention Is All You Need", are at the heart of most modern LLMs.
The core concepts of transformers are parallelization and attention.
Parallelization
Unlike RNNs (Recurrent Neural Networks) and LSTMs (Long Short-Term Memory) that process input sequentially, transformers handle input in parallel.
This approach not only speeds up training but also enhances the efficiency of the model.
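To make the difference concrete, here is a toy sketch in plain Python (the numbers are made up) contrasting the sequential dependency of an RNN-style loop with the position-independent computation a transformer can parallelize:

```python
import math

# Toy sequence of token "embeddings" (one number per token for simplicity).
tokens = [0.1, 0.4, 0.2, 0.9]

# RNN-style: each step depends on the previous hidden state, so the
# loop cannot be parallelized across positions.
hidden = 0.0
for x in tokens:
    hidden = math.tanh(0.5 * hidden + x)  # sequential dependency on `hidden`

# Transformer-style: every position is transformed independently
# (attention then mixes them), so this map could run in parallel.
projected = [math.tanh(x) for x in tokens]  # no cross-step dependency
```

The RNN loop cannot start a step before the previous one finishes, while the final list comprehension has no such dependency, which is what lets GPUs process all positions of a sequence at once.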
Attention
The attention mechanism enables models to focus on the most relevant parts of the input for generating output.
By weighing different parts of the input, the model dynamically adjusts its focus, which results in a more context-aware process.
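As an illustration, here is a minimal scaled dot-product attention (the standard formulation from the transformer paper) in plain Python; the toy queries, keys, and values are made up for the example:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of small vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)  # attention weights sum to 1
        # Output is the weighted sum of the value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
out = attention(q, k, v)  # leans towards the first value: q matches the first key
```

The query here is most similar to the first key, so the output is dominated by the first value vector: the model "attends" to the most relevant part of the input.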
Encoder and Decoder: The Two Pillars of Transformers
Transformers have two main components: the encoder and the decoder.
Encoder
The encoder's role is to process and transform the input into a vector representation, known as the context vector. This conversion of words into vectors is called embedding.
This vectorization is essential because it lets the model work with the input and its context mathematically.
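A minimal sketch of that embedding step, with a hypothetical three-word vocabulary and made-up two-dimensional vectors (real models learn these vectors during training and use hundreds or thousands of dimensions):

```python
# Toy vocabulary: each word maps to a row in the embedding table.
vocab = {"the": 0, "cat": 1, "sat": 2}
embedding_table = [
    [0.1, 0.3],   # vector for "the"
    [0.7, 0.2],   # vector for "cat"
    [0.4, 0.9],   # vector for "sat"
]

def embed(sentence):
    """Turn a whitespace-separated sentence into a list of vectors."""
    return [embedding_table[vocab[word]] for word in sentence.split()]

vectors = embed("the cat sat")  # three vectors, one per word
```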
Decoder
The decoder takes the context vector and translates it into an output (text, image, audio...). This conversion from numerical representations to a human-friendly format is what makes transformers powerful.
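For text output, the last step of decoding typically turns a vector of scores over the vocabulary back into a word. A minimal greedy-decoding sketch (the four-word vocabulary and the scores are made up):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical scores (logits) the decoder produced for the next token.
vocab = ["the", "cat", "sat", "mat"]
logits = [0.2, 2.5, 0.1, 1.0]

probs = softmax(logits)                      # one probability per word
next_token = vocab[probs.index(max(probs))]  # greedy pick: most likely word
```

Real models repeat this step token by token, feeding each chosen token back in, and often sample from the distribution rather than always taking the maximum.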
Training and Fine-Tuning LLMs
Training an LLM means fitting an enormous number of parameters: modern models have hundreds of billions, and GPT-4 is rumoured to have over a trillion.
The quality and diversity of the training data directly impact the performance and reliability of these models.
It's also a very resource-intensive process that requires a lot of computational power, which is why most LLMs are trained on GPUs and why it's so expensive to build or train one.
For those reasons, it's currently not financially feasible for most companies or individuals to train their own LLMs. Instead, they rely on pre-trained LLMs that can be further fine-tuned for specific tasks.
Fine-Tuning
Fine-tuning is the process of adapting a pre-trained model to a specific task. It is much faster and cheaper than training a model from scratch, but it still requires substantial computational power and data to be effective. As always, the more (good) data you have, the better the results.
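One common way fine-tuning stays cheap is to freeze most of the pre-trained weights and only update a small part of the model. A hypothetical sketch of that idea (the layer names, weights, and gradient value are made up):

```python
# A pre-trained "model" as a list of layers, each with a weight and a
# trainable flag. Fine-tuning often freezes the early layers and only
# updates the last few, which is far cheaper than training everything.
model = [
    {"name": "embedding", "weight": 0.5, "trainable": False},  # frozen
    {"name": "encoder",   "weight": 0.8, "trainable": False},  # frozen
    {"name": "head",      "weight": 0.1, "trainable": True},   # fine-tuned
]

def training_step(model, gradient, lr=0.01):
    """Apply a (made-up) gradient update only to trainable layers."""
    for layer in model:
        if layer["trainable"]:
            layer["weight"] -= lr * gradient

training_step(model, gradient=2.0)  # only the "head" layer changes
```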
Some of the most common fine-tuning techniques are:
- Reinforcement Learning from Human Feedback (RLHF)
- Mixture of Experts (MoE)
Reinforcement Learning from Human Feedback (RLHF)
RLHF uses human feedback to fine-tune models. This feedback, collected through interactions such as rating responses (ChatGPT does this, for example), is crucial for aligning the model's outputs with human preferences.
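A toy sketch of the kind of preference data RLHF starts from (the prompts and response labels here are hypothetical); in the real pipeline, a reward model trained on such comparisons is what guides the fine-tuning:

```python
# Hypothetical feedback log: for each prompt, a human picked the better
# of two candidate responses.
feedback = [
    {"prompt": "p1", "chosen": "A", "rejected": "B"},
    {"prompt": "p2", "chosen": "A", "rejected": "C"},
    {"prompt": "p3", "chosen": "B", "rejected": "A"},
]

def win_rate(response, logs):
    """Fraction of comparisons a response won (a crude preference signal)."""
    relevant = [f for f in logs if response in (f["chosen"], f["rejected"])]
    wins = sum(1 for f in relevant if f["chosen"] == response)
    return wins / len(relevant)
```

A higher win rate signals a response style humans prefer; the actual RLHF objective optimizes the model towards such preferences rather than computing this ratio directly.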
Mixture of Experts (MoE)
MoE splits a model into several specialized subnetworks, called experts, with a gating mechanism that routes each input to the most relevant ones. This increases the model's capacity and versatility without a proportional increase in computation.
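A toy sketch of the routing idea (the experts and gate weights are made up; real MoE layers use learned neural experts and a softmax gate inside the transformer, often combining several experts per token):

```python
# Hypothetical experts: each is a small function standing in for a
# specialized subnetwork.
experts = [
    lambda x: x * 2.0,    # expert 0
    lambda x: x + 10.0,   # expert 1
]

def moe_forward(x, gate_weights):
    """Score each expert for this input and run only the top-scoring one.

    This sketch routes to a single expert for clarity; real gates are
    learned and may blend the outputs of the top-k experts.
    """
    scores = [g * x for g in gate_weights]
    best = scores.index(max(scores))  # pick the highest-scoring expert
    return experts[best](x)
```

Because only the selected expert runs, the model's total parameter count can grow without every input paying the full computational cost.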
Learn more
I'm by no means an expert on the subject, but I hope this article has helped you understand the very basics of generative AI.
If you want to learn more, here are the resources I am currently using myself:
- "Attention Is All You Need" - The foundational paper on transformers.
- LlamaIndex - An open-source data framework for RAG architectures.
- LangChain - A framework for developing applications around LLMs.
- Intro to Large Language Models - A video by Andrej Karpathy, formerly of OpenAI and Tesla.