
What are LLMs?

Photo by Bernd 📷 Dittrich on Unsplash

What is a Large Language Model (LLM)?

A Large Language Model (LLM) is a type of artificial intelligence (AI) program that can recognize and generate text, among other tasks. LLMs are trained on massive datasets, hence the name “large.” LLMs are built on machine learning; specifically, they use a type of neural network called the transformer model.

In simpler terms, an LLM is a computer program that has been fed enough examples to recognize and interpret human language or other types of complex data. Many LLMs are trained on data collected from the Internet: thousands or millions of gigabytes of text. However, the quality of those samples affects how well an LLM learns natural language, so programmers may opt for a more curated dataset.

LLMs use a type of machine learning called deep learning to understand how characters, words, and phrases work together. Deep learning involves the probabilistic analysis of unstructured data, which eventually allows the deep learning model to recognize distinctions between pieces of content without human intervention.

LLMs are then further trained via tuning: they are fine-tuned or prompt-tuned to the particular task the programmer wants them to perform, such as interpreting questions and generating responses, or translating text from one language to another.
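
To make this concrete, here is a minimal sketch of steering a pretrained model toward one such task (translation). The choice of the Hugging Face transformers library and the t5-small model are my assumptions for illustration; the article does not prescribe any particular toolkit:

```python
# A minimal sketch, assuming the Hugging Face "transformers" library and the
# publicly available t5-small model (neither is prescribed by the article).
from transformers import pipeline

# Load a pretrained model already tuned for English-to-French translation.
translator = pipeline("translation_en_to_fr", model="t5-small")

result = translator("The weather is nice today.")
print(result[0]["translation_text"])  # e.g. "Le temps est agréable aujourd'hui."
```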

What are LLMs used for?

LLMs can be trained to perform a variety of tasks. One of their most well-known uses is as generative AI: when given a prompt or asked a question, they can produce text in response. The publicly available LLM ChatGPT, for example, can generate articles, poems, and other textual forms in response to user inputs.

Any large and complex dataset can be used to train LLMs. Some LLMs can help programmers write code: they can write functions when prompted or, given some starting code, finish writing a program. LLMs can also be used in sentiment analysis, DNA research, customer service, chatbots, and online search.

Examples of real-world LLMs include ChatGPT (OpenAI), Gemini (Google), Llama (Meta), and Bing Chat (Microsoft). GitHub Copilot is another example, though it focuses on code rather than natural human language.

What are some advantages and limitations of LLMs?

One feature of LLMs is their ability to respond to unpredictable requests. A traditional computer program receives commands in an accepted syntax or from a fixed set of user inputs: a video game has a finite set of buttons, an application has a finite set of things a user can click, and a programming language is composed of precise if/then statements.

In contrast, an LLM can respond to natural human language and use data analysis to respond to an unstructured question or request in a way that makes sense. Where a typical computer program would not recognize a prompt like “What are the four best funk bands in history?”, an LLM can respond with a list of bands and a convincing, reasonable defense of why they are the best.

In terms of the information they provide, however, LLMs are only as reliable as the data they ingest: if fed false information, they will give false information in response to user requests. LLMs also sometimes “hallucinate,” creating false information when they are unable to produce an accurate answer. For example, when the news outlet Fast Company asked ChatGPT about Tesla’s previous financial quarter, ChatGPT produced a coherent news article in response, but much of the information in it was invented.

In terms of security, user-facing applications based on LLMs are as prone to bugs as any other application. LLMs can also be manipulated via malicious inputs to provide certain kinds of responses over others, including responses that are dangerous or unethical. Finally, one security concern is that users may upload sensitive, confidential data into LLMs to boost their own productivity. But LLMs use the inputs they receive to further train their models, and they are not designed to be secure vaults: they may expose confidential data in response to queries from other users.

How do LLMs work?

Machine learning and deep learning

At a basic level, LLMs are built on machine learning. Machine learning is a subset of AI and refers to the practice of feeding a program large amounts of data in order to train it to identify features of that data without human intervention.

LLMs use a type of machine learning called deep learning. Deep learning models can essentially train themselves to recognize distinctions without human intervention, although some human fine-tuning is typically necessary.

Deep learning uses probability to “learn.” For example, in the well-known pangram “The quick brown fox jumped over the lazy dog,” the letters “e” and “o” are the most common, appearing four times each. From this, a deep learning model could conclude (correctly) that these are among the most common characters in English-language text.
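
Here is a tiny sketch of that counting idea in Python, using only the standard library; it is a toy at a vastly smaller scale than real deep learning, but the principle is the same:

```python
# A toy illustration of the frequency counting described above; real models
# estimate probabilities over far more data, but it starts with counting.
from collections import Counter

phrase = "The quick brown fox jumped over the lazy dog"
letter_counts = Counter(c for c in phrase.lower() if c.isalpha())
print(letter_counts.most_common(3))  # [('e', 4), ('o', 4), ('t', 2)]
```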

Realistically, a deep learning model cannot conclude anything from a single phrase. But after analyzing trillions of sentences, it learns enough to predict how to correctly complete an incomplete sentence, or even to generate sentences of its own.
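
A toy next-word predictor shows the same principle at the word level. This bigram counter is a deliberately tiny stand-in for what an LLM does with trillions of sentences, and its three-sentence “corpus” is made up for illustration:

```python
# A toy bigram model: predict the next word from simple co-occurrence counts.
from collections import Counter, defaultdict

corpus = "the dog barked . the dog slept . the cat ran ."
words = corpus.split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    bigrams[prev][nxt] += 1

# The most likely word to follow "the" is the one seen most often after it.
print(bigrams["the"].most_common(1))  # [('dog', 2)]
```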

Neural networks

To enable this type of deep learning, LLMs are built on neural networks. Just as the human brain is made of neurons that connect and send signals to each other, an artificial neural network (usually shortened to “neural network”) is made of interconnected nodes. These networks consist of several “layers”: an input layer, an output layer, and one or more layers in between. Each layer passes information on to the next only if its own output crosses a certain threshold.
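
Here is a minimal sketch of that layered, threshold-gated structure in Python with NumPy. The layer sizes, random weights, and hard threshold are my illustrative choices; real networks use smooth activation functions and weights learned from data:

```python
# A minimal sketch of a feedforward network: input layer -> hidden layer ->
# output layer. A node passes its signal on only if it crosses a threshold.
import numpy as np

def layer(inputs, weights, threshold=0.0):
    pre_activation = inputs @ weights
    # Zero out signals below the threshold; only strong signals pass onward.
    return np.where(pre_activation > threshold, pre_activation, 0.0)

rng = np.random.default_rng(0)                   # random weights, for illustration
x = rng.normal(size=3)                           # input layer: 3 values
hidden = layer(x, rng.normal(size=(3, 4)))       # hidden layer: 4 nodes
output = layer(hidden, rng.normal(size=(4, 1)))  # output layer: 1 node
print(output)
```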

Transformer models

The specific type of neural network used by LLMs is the transformer model. Transformer models are capable of learning context, which is especially important for human language, since language is highly context-dependent. Transformer models use a mathematical technique called self-attention (Sebastian Raschka has written an article with further explanations) to detect subtle ways in which elements of a sequence relate to each other. This makes them better at understanding context than other types of machine learning. It allows them to understand, for example, how the end of a sentence connects to its beginning, and how the sentences in a paragraph relate to each other.
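
For readers who want to see the mechanics, here is a minimal sketch of scaled dot-product self-attention in NumPy. The dimensions and random weights are illustrative only; real transformers add multiple attention heads, trained weights, and many other components:

```python
# A minimal sketch of self-attention: every token builds its new representation
# as a weighted mix of all tokens in the sequence, so context flows everywhere.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # how strongly tokens relate
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: attention weights
    return weights @ V                              # context-aware token vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                         # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)          # (5, 8)
```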

This allows LLMs to interpret human language even when that language is vague or poorly defined, arranged in combinations they have not encountered before, or contextualized in new ways. At some level they “understand” semantics, in the sense that they can associate words and concepts by their meaning, having seen them grouped that way millions or billions of times.

This article was created by translating the linked Cloudflare article to Portuguese, which later led to this English version. I hope you enjoyed the reading and that you learned a bit more about LLMs. See you!