Knowledge & RAG Pipeline

Orbit Classroom uses Retrieval-Augmented Generation (RAG) to ground AI chat responses in actual class materials. Instead of relying solely on the language model's general knowledge, the system retrieves relevant content from uploaded documents and provides it as context for every answer.

How RAG Works

The pipeline has two phases: ingestion (when materials are uploaded) and retrieval (when a student asks a question).

Ingestion Phase

Material Upload → Text Extraction → Chunking → Embedding → Storage (pgvector)
  1. Text extraction -- content is pulled from PDFs, Word documents, presentations, and other supported formats.
  2. Chunking -- extracted text is split into manageable segments with configurable size and overlap.
  3. Embedding -- each chunk is converted into a vector representation using a configured embedding model.
  4. Storage -- vectors are stored in pgvector (PostgreSQL extension) for fast similarity search.
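The ingestion steps above can be sketched end to end. This is a minimal illustration with stubbed stages: the function names (`extract_text`, `embed`), the `materials_chunks` table, and the word-based chunker are assumptions for the example, not Orbit Classroom's actual identifiers (production chunkers count tokens, and `embed` would call the configured embedding model).

```python
# Sketch of the ingestion phase with stubbed stages.
# All names here are illustrative, not Orbit's real API.

def extract_text(path: str) -> str:
    # Stub: real extraction dispatches on file type (PDF, DOCX, PPTX, ...).
    with open(path, encoding="utf-8") as f:
        return f.read()

def chunk(text: str, size: int = 100, overlap: int = 20) -> list[str]:
    # Word-based split with overlap; production systems count tokens instead.
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def embed(chunk_text: str) -> list[float]:
    # Stub: real code calls the configured embedding model.
    return [float(len(chunk_text))]  # placeholder one-dimensional vector

# Hypothetical pgvector insert; the embedding column would be vector(N).
INSERT_SQL = """
INSERT INTO materials_chunks (material_id, chunk_index, content, embedding)
VALUES (%s, %s, %s, %s)
"""

def ingest(material_id: int, text: str) -> list[tuple]:
    # Produces the rows that would be written to pgvector.
    return [(material_id, i, c, embed(c)) for i, c in enumerate(chunk(text))]
```

With the defaults above, a 250-word document yields chunks starting every 80 words (size minus overlap), so consecutive chunks share 20 words of context.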

Retrieval Phase

Student Question → Query Processing → Hybrid Search → Reranking → LLM Context Injection → Response with Citations
  1. The student's question is processed (optionally augmented for better retrieval).
  2. Hybrid search combines vector similarity with BM25 keyword search to find relevant chunks.
  3. Retrieved chunks are reranked for relevance.
  4. Top chunks are injected into the LLM prompt as context.
  5. The AI generates a response with inline citations pointing back to source materials.
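One common way to fuse the two result lists from hybrid search is Reciprocal Rank Fusion (RRF); the document does not specify which fusion method Orbit uses, so treat this as an illustrative sketch rather than the actual implementation:

```python
# Reciprocal Rank Fusion: combine rankings from vector search and BM25.
# Each document scores 1/(k + rank) per list it appears in; k dampens the
# influence of any single ranking (60 is the value from the original RRF paper).

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["c3", "c1", "c7"]   # chunks ranked by vector similarity
bm25_hits = ["c3", "c9", "c1"]     # chunks ranked by keyword match
fused = rrf([vector_hits, bm25_hits])
```

A chunk that ranks highly in both lists ("c3" here) outscores chunks that appear in only one, which is exactly the behavior hybrid search is after.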

Admin Configuration

Admins can tune every stage of the RAG pipeline through the admin settings panel.

Chunking Settings

| Setting | Description |
| --- | --- |
| Chunk size | Maximum number of tokens per chunk |
| Chunk overlap | Number of overlapping tokens between consecutive chunks |
| Markdown-aware chunking | Respects heading boundaries to keep sections intact |
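The markdown-aware setting can be pictured as splitting at heading boundaries before applying size limits. This is a simplified sketch of the idea, not Orbit's chunker:

```python
# Illustrative markdown-aware split: break the document at heading
# boundaries so each section stays intact.

import re

def split_by_headings(md: str) -> list[str]:
    # Zero-width split immediately before any line starting with 1-6 '#'s.
    sections = re.split(r"(?m)^(?=#{1,6} )", md)
    return [s.strip() for s in sections if s.strip()]

doc = "# Intro\nWelcome.\n## Setup\nInstall it.\n## Usage\nRun it."
```

Each resulting section would then be chunked normally, so a chunk never straddles two headings unless a single section exceeds the chunk size.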

Retrieval Settings

| Setting | Description |
| --- | --- |
| Top-k results | Number of chunks to retrieve per query |
| Similarity threshold | Minimum similarity score for a chunk to be considered relevant |
| Hybrid search | Combines vector similarity search with BM25 keyword matching |
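Top-k and the similarity threshold interact as two filters: the threshold discards weak matches first, then top-k caps how many survivors reach the prompt. A minimal sketch (scores and chunk IDs are made up):

```python
# How top-k and similarity-threshold settings combine: keep at most
# top_k chunks whose similarity score clears the threshold.

def select_chunks(scored: list[tuple[str, float]],
                  top_k: int = 4,
                  threshold: float = 0.75) -> list[tuple[str, float]]:
    kept = [(cid, s) for cid, s in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]

scored = [("c1", 0.91), ("c2", 0.62), ("c3", 0.80),
          ("c4", 0.78), ("c5", 0.95), ("c6", 0.77)]
```

Here "c2" falls below the threshold and "c6" clears it but loses the top-k cut, illustrating why the two settings should be tuned together.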

Reranking

After initial retrieval, chunks can be reranked using a secondary model to improve relevance ordering. This is especially useful when the initial retrieval returns a large number of candidates.
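In many RAG stacks the secondary model is a cross-encoder that scores each query-chunk pair; the document does not name the model Orbit uses, so the sketch below stands in a toy word-overlap scorer for the learned one:

```python
# Reranking sketch: re-order retrieved candidates by a secondary
# relevance score, then keep the best few for the prompt.

def rerank(query: str, chunks: list[str], score_fn, keep: int = 3) -> list[str]:
    return sorted(chunks, key=lambda c: score_fn(query, c), reverse=True)[:keep]

def overlap_score(query: str, chunk: str) -> int:
    # Toy stand-in for a learned relevance model (e.g. a cross-encoder):
    # count words shared with the query.
    return len(set(query.lower().split()) & set(chunk.lower().split()))

query = "how does photosynthesis use light reactions"
chunks = [
    "photosynthesis converts light",
    "mitosis is cell division",
    "light reactions in photosynthesis",
]
```

The point of the stage is the re-ordering itself: the initial retriever optimizes recall over many candidates, while the reranker spends more compute per pair to get the final ordering right.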

Query Augmentation

User queries can be expanded or reformulated before retrieval to improve recall. This helps when student questions are short or ambiguous.
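Production systems typically reformulate the query with an LLM; as a deliberately tiny illustration of the same idea, the sketch below expands common student shorthand with a lookup table (the table itself is an invented example):

```python
# Toy query augmentation: expand shorthand before retrieval so the
# expanded terms match the vocabulary used in course materials.

SYNONYMS = {"hw": "homework", "ch": "chapter"}  # illustrative table

def augment(query: str) -> str:
    return " ".join(SYNONYMS.get(w.lower(), w) for w in query.split())
```

A short question like "ch 3 hw due?" becomes "chapter 3 homework due?", which is far more likely to match chunk text in a syllabus.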

tip

Start with the default settings and adjust incrementally. Increasing top-k retrieves more context but may dilute relevance. Lowering the similarity threshold broadens recall but risks including less relevant chunks.

Citations

When the AI uses retrieved chunks in its response, it includes inline citations that reference the source material and chunk. Students can trace any claim back to the original document.
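One common mechanism (the bracket format and field names here are assumptions, not Orbit's actual scheme) is to number the chunks in the prompt context and then map bracketed references in the response back to their sources:

```python
# Sketch of inline citations: number each chunk in the prompt context,
# then resolve [n] references in the response back to source documents.

import re

def build_context(chunks: list[dict]) -> str:
    return "\n".join(f"[{i}] ({c['source']}) {c['text']}"
                     for i, c in enumerate(chunks, start=1))

def cited_sources(response: str, chunks: list[dict]) -> list[str]:
    nums = {int(n) for n in re.findall(r"\[(\d+)\]", response)}
    return [chunks[n - 1]["source"]
            for n in sorted(nums) if 1 <= n <= len(chunks)]

chunks = [
    {"source": "syllabus.pdf", "text": "Late work loses 10% per day."},
    {"source": "week2-notes.pdf", "text": "Office hours are on Tuesdays."},
]
```

Because every bracketed number resolves to a specific chunk, a student can trace any claim in the answer back to the exact passage it came from.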

info

If the AI cannot find relevant material in the RAG pipeline, it falls back to general model knowledge and displays a "Not grounded" warning to the student.

Embedding Models

The embedding model used for vectorization is configurable via admin settings. Changing the embedding model requires reprocessing existing materials to regenerate vectors.

For Admins

After changing the embedding model, trigger a reprocessing of all materials to ensure consistency. Mixing embeddings from different models will degrade retrieval quality.
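The reprocessing step amounts to re-embedding every stored chunk with the new model so that all vectors in the index are comparable again. A minimal sketch (function names are illustrative):

```python
# Sketch of reprocessing after an embedding-model change: every chunk is
# re-embedded with the new model, since vectors from different models
# live in different spaces and cannot be compared.

def reprocess(chunks: dict[str, str], new_embed) -> dict[str, list[float]]:
    """Return fresh vectors for every stored chunk, keyed by chunk ID."""
    return {chunk_id: new_embed(text) for chunk_id, text in chunks.items()}

# Illustrative stand-in for the newly configured model.
new_embed = lambda text: [float(len(text))]
stored = {"c1": "alpha", "c2": "beta"}
```

Similarity scores between a query vector from the new model and chunk vectors from the old one are meaningless, which is why partial reprocessing degrades retrieval quality.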