Introduction to General LLM Use
Session Overview
- Large Language Model (LLM) Landscape
- Accessing and using LLMs
- Learning with and from LLMs using Prompt Engineering
What are LLMs?
- Architecture: Most LLMs, including GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers), are based on the transformer architecture. Its self-attention mechanism processes each word in relation to all other words in a sentence, rather than one at a time, giving the model a rich representation of context and nuance in language (see the sketch below).
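
A minimal sketch of the scaled dot-product self-attention step at the heart of the transformer, for illustration only. Real LLMs add learned query/key/value projections, multiple attention heads, positional encodings, and many stacked layers; the array sizes here are arbitrary toy values.

```python
import numpy as np

def self_attention(x):
    """x: (seq_len, d_model) matrix of token embeddings."""
    d = x.shape[-1]
    # Here each token embedding acts as its own query, key, and value;
    # real models use separate learned projection matrices W_q, W_k, W_v.
    scores = x @ x.T / np.sqrt(d)                      # similarity of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over each row
    return weights @ x                                 # each output mixes information from all tokens

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))        # 5 tokens, 8-dimensional embeddings (toy example)
print(self_attention(tokens).shape)     # (5, 8): every position attends to every position
```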
What are LLMs?
- Tokenization Process: LLMs convert text into tokens, words or subword pieces mapped to integer IDs, so the model can process and generate text efficiently. Tokenization is crucial for handling diverse vocabularies, including scientific terminology, because complex or rare words are broken down into smaller, known pieces (see the example below).
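
A quick look at tokenization, assuming the open-source tiktoken library is installed (pip install tiktoken). The exact splits and IDs vary by model and encoding; the point is that a scientific term is usually broken into several subword pieces rather than kept whole.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")    # encoding used by several OpenAI models
text = "Photosynthesis in Arabidopsis thaliana"
ids = enc.encode(text)                        # text -> list of integer token IDs
pieces = [enc.decode([i]) for i in ids]       # each ID decoded back to its text piece

print(ids)      # a list of integers, one per token
print(pieces)   # scientific terms appear split into smaller subword pieces
```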
What are LLMs?
- Training and Data: LLMs are trained on vast datasets drawn from a wide range of text sources, from books and articles to websites and more. This extensive training enables them to learn language patterns, grammar, and knowledge across many domains, including science. The quality and diversity of the training data strongly influence a model's performance and its biases. OpenAI has not disclosed GPT-4's training details; unofficial estimates put its training data at roughly 1 PB and its size at around 1.7 trillion parameters.
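
A back-of-the-envelope calculation to convey that scale, using the unofficial ~1.7 trillion parameter estimate quoted above (not confirmed by OpenAI):

```python
# Memory needed just to store the weights, assuming 16-bit floats.
params = 1.7e12                 # rumored parameter count (unconfirmed)
bytes_per_param_fp16 = 2        # 2 bytes per parameter in fp16
memory_tb = params * bytes_per_param_fp16 / 1e12
print(f"~{memory_tb:.1f} TB of weights in fp16")   # ~3.4 TB, far beyond a single GPU
```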
What are LLMs?
- Ethical Considerations and Limitations: While LLMs are powerful tools, it is important to acknowledge their limitations and ethical concerns, including biases inherited from training data, the potential to generate plausible but misleading information, and the need for human oversight when interpreting and validating their outputs in scientific contexts. Whether LLMs genuinely "understand" language is debated, but they do exhibit emergent capabilities as their network size scales.
Prompt Engineering Resources