
Glossary

Plain-language definitions of the terms you'll meet across Bee. Skim it once, return when a word trips you up. Terms link to the Bee pages where they're explained in depth.

A–C

Agent
An LLM-driven system that decides and acts: choosing tools, taking steps, and reacting to results in a loop, rather than answering once. See Agent Fundamentals.
Attention
The mechanism that lets a transformer weigh how much each token should "pay attention" to every other token. The core idea behind modern LLMs. See The Transformer.
Chunking
Splitting documents into smaller pieces before embedding them for retrieval. The single biggest lever on RAG quality. See Chunking.
Context window
The maximum number of tokens a model can consider at once (prompt + response). Exceed it and the request is rejected or earlier content is dropped, depending on the API or application.
Cosine similarity
A measure of how similar two embedding vectors are, based on the angle between them. Ranges from -1 (opposite) to 1 (identical direction).
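
To make the cosine similarity entry concrete, here is a minimal sketch in plain Python. The function name `cosine_similarity` and the toy two-dimensional vectors are illustrative assumptions; real embeddings have hundreds or thousands of dimensions and are usually compared with a numerical library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical, not real embeddings).
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # close to 0
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # close to -1
```

Note that only direction matters: `[1.0, 2.0]` and `[2.0, 4.0]` differ in length but score as identical.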

D–H

Embedding
A list of numbers (a vector) that represents the meaning of text, so similar meanings sit close together in vector space. See Embeddings.
Fine-tuning
Continuing to train a pretrained model on your own data to specialize its behavior. An alternative (or complement) to prompting and RAG.
Guardrails
Checks that constrain model inputs and outputs, such as filtering unsafe content, validating format, or blocking prompt injection. See Security.
Hallucination
When a model produces confident but false or unsupported information. Mitigated (not eliminated) by RAG, grounding, and evaluation.
Hybrid search
Combining keyword search (e.g. BM25) with vector search to get the best of exact matches and semantic matches. See Hybrid Search & Reranking.
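
One common way to combine the two result lists in hybrid search is reciprocal rank fusion (RRF). The sketch below is an illustration under assumptions, not necessarily the method the Bee pages use: the function name, the document IDs, and the constant `k=60` (a conventional default) are all chosen for the example. It expects each search to return document IDs already ordered best first.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by both searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results from a keyword (e.g. BM25) search and a vector search.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

RRF uses only rank positions, which sidesteps the problem that keyword scores and vector similarities are on different scales.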

I–P

Inference
Running a trained model to get outputs (as opposed to training it). Your API calls are inference.
LLM (Large Language Model)
A neural network trained to predict the next token over huge amounts of text, which turns out to be a remarkably general capability. See How LLMs Work.
MCP (Model Context Protocol)
An open standard for connecting LLM applications to tools and data sources through a common interface. See MCP.
Prompt engineering
The craft of writing inputs that reliably get good outputs from a model. See Prompt Engineering.

Q–Z

Quantization
Reducing the numeric precision of a model's weights (e.g. 16-bit → 4-bit) to shrink memory and speed up inference, usually with a small quality cost.
RAG (Retrieval-Augmented Generation)
Fetching relevant documents and putting them in the prompt so the model answers from your data instead of only its training. See RAG.
Reranking
A second, more accurate pass that reorders retrieved candidates by relevance before they reach the model. See Hybrid Search & Reranking.
System prompt
Instructions that set a model's persona, rules, and output format for the whole conversation. See System Prompts.
Temperature
A knob (0–1+) controlling output randomness. Low = consistent; high = creative.
Token
The unit a model actually processes, roughly a word-piece. Billing and context limits are counted in tokens. See Tokenization.
Tool / function calling
Letting a model request that your code run a function (search, calculate, call an API) and use the result. See Function & Tool Calling.
Vector database
A database optimized for storing embeddings and finding the nearest ones to a query vector. See Vector Databases.
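
As a rough illustration of what a vector database does, the sketch below stores vectors in memory and finds the nearest ones by brute force. The class `TinyVectorStore`, its methods, and the three-dimensional vectors are hypothetical. A real vector database typically adds approximate-nearest-neighbor indexes, persistence, and metadata filtering, so treat this only as a picture of the core query.

```python
import heapq
import math

def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

class TinyVectorStore:
    """In-memory store that compares the query against every stored vector."""

    def __init__(self):
        self._items = []  # list of (item_id, vector) pairs

    def add(self, item_id, vector):
        self._items.append((item_id, list(vector)))

    def query(self, vector, top_k=3):
        """Return up to top_k (item_id, score) pairs, most similar first."""
        scored = ((_cosine(vector, v), item_id) for item_id, v in self._items)
        best = heapq.nlargest(top_k, scored)
        return [(item_id, score) for score, item_id in best]

# Toy data (hypothetical vectors standing in for real embeddings).
store = TinyVectorStore()
store.add("cats", [0.9, 0.1, 0.0])
store.add("dogs", [0.8, 0.2, 0.1])
store.add("taxes", [0.0, 0.1, 0.9])

results = store.query([1.0, 0.0, 0.0], top_k=2)
print([item_id for item_id, _ in results])
```

Brute force scans every vector on each query, which is fine for a few thousand items. Avoiding that scan at larger scale is the main reason dedicated vector databases exist.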

Missing a term?

Suggest an addition or add it yourself; the glossary is a great first contribution.