Knowledge & RAG Pipeline
Orbit Classroom uses Retrieval-Augmented Generation (RAG) to ground AI chat responses in actual class materials. Instead of relying solely on the language model's general knowledge, the system retrieves relevant content from uploaded documents and provides it as context for every answer.
How RAG Works
The pipeline has two phases: ingestion (when materials are uploaded) and retrieval (when a student asks a question).
Ingestion Phase
Material Upload → Text Extraction → Chunking → Embedding → Storage (pgvector)
- Text extraction -- content is pulled from PDFs, Word documents, presentations, and other supported formats.
- Chunking -- extracted text is split into manageable segments with configurable size and overlap.
- Embedding -- each chunk is converted into a vector representation using a configured embedding model.
- Storage -- vectors are stored in pgvector (PostgreSQL extension) for fast similarity search.
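The ingestion steps above can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the embedding function is a toy hash-based stand-in for a real embedding model, and a plain Python list stands in for the pgvector table. The `cosine_distance` helper mirrors what pgvector's `<=>` operator computes.

```python
import hashlib
import math

def embed(text: str, dim: int = 8) -> list[float]:
    """Toy deterministic embedding (stand-in for a real embedding model)."""
    h = hashlib.sha256(text.encode()).digest()
    vec = [b / 255.0 for b in h[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Mirrors pgvector's `<=>` operator: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

# In-memory "vector store" standing in for the pgvector table.
store: list[dict] = []

def ingest(material_id: str, chunks: list[str]) -> None:
    """Embed each chunk and store it with its source and position."""
    for i, chunk in enumerate(chunks):
        store.append({"material": material_id, "chunk": i,
                      "text": chunk, "vector": embed(chunk)})

ingest("syllabus.pdf", ["Week 1: intro", "Week 2: vectors"])
```

In production the rows would be inserted into a PostgreSQL table with a `vector` column, and similarity search would run in SQL rather than in application code.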
Retrieval Phase
Student Question → Query Processing → Hybrid Search → Reranking → LLM Context Injection → Response with Citations
- The student's question is processed (optionally augmented for better retrieval).
- Hybrid search combines vector similarity with BM25 keyword search to find relevant chunks.
- Retrieved chunks are reranked for relevance.
- Top chunks are injected into the LLM prompt as context.
- The AI generates a response with inline citations pointing back to source materials.
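The context-injection step can be sketched as a prompt builder that numbers each retrieved chunk so the model can emit `[n]`-style citations. The field names and prompt wording below are illustrative assumptions, not the system's actual prompt template.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Inject retrieved chunks as numbered context so the model can cite [n]."""
    context = "\n".join(
        f"[{i + 1}] ({c['source']}) {c['text']}" for i, c in enumerate(chunks)
    )
    return (
        "Answer using only the context below. Cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

chunks = [
    {"source": "lecture2.pdf", "text": "Cosine similarity compares vector directions."},
    {"source": "notes.md", "text": "BM25 ranks documents by term frequency."},
]
prompt = build_prompt("How does hybrid search score results?", chunks)
```

Because each chunk keeps its source label, the inline citations in the model's answer can be mapped back to the original document.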
Admin Configuration
Admins can tune every stage of the RAG pipeline through the admin settings panel.
Chunking Settings
| Setting | Description |
|---|---|
| Chunk size | Maximum number of tokens per chunk |
| Chunk overlap | Number of overlapping tokens between consecutive chunks |
| Markdown-aware chunking | Respects heading boundaries to keep sections intact |
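A sliding-window sketch shows how chunk size and overlap interact (it ignores the markdown-aware boundary handling, and operates on a pre-tokenized list for simplicity):

```python
def chunk_tokens(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split a token list into windows of `size` tokens, each window
    sharing `overlap` tokens with the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = [f"t{i}" for i in range(10)]
windows = chunk_tokens(tokens, size=4, overlap=1)
# Each window starts on the last token of the previous one.
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some tokens twice.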
Retrieval Settings
| Setting | Description |
|---|---|
| Top-k results | Number of chunks to retrieve per query |
| Similarity threshold | Minimum similarity score for a chunk to be considered relevant |
| Hybrid search | Combines vector similarity search with BM25 keyword matching |
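One common way to merge the two result lists in hybrid search is Reciprocal Rank Fusion, which combines rankings without comparing the incompatible raw scores of vector distance and BM25. The sketch below assumes RRF is the fusion method; the document does not specify which one Orbit Classroom uses.

```python
def rrf_merge(vector_ranked: list[str], keyword_ranked: list[str],
              k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank + 1)
    per document; documents found by both searches rise to the top."""
    scores: dict[str, float] = {}
    for ranking in (vector_ranked, keyword_ranked):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge(["c1", "c2", "c3"], ["c2", "c4", "c1"])
```

Here `c2` wins because it appears near the top of both rankings, even though it is first in neither.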
Reranking
After initial retrieval, chunks can be reranked using a secondary model to improve relevance ordering. This is especially useful when the initial retrieval returns a large number of candidates.
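Reranking boils down to rescoring each candidate against the query with a stronger (but slower) model and reordering. In this sketch a toy word-overlap score stands in for the secondary model, such as a cross-encoder:

```python
def rerank(query: str, candidates: list[str], top_n: int = 3) -> list[str]:
    """Reorder retrieved chunks by a secondary relevance score and keep
    the top_n. The scorer here is a toy word-overlap measure standing in
    for a real reranking model."""
    q_words = set(query.lower().split())

    def score(chunk: str) -> float:
        c_words = set(chunk.lower().split())
        return len(q_words & c_words) / (len(c_words) or 1)

    return sorted(candidates, key=score, reverse=True)[:top_n]

ranked = rerank("what is cosine similarity",
                ["BM25 uses term frequency",
                 "cosine similarity measures angle between vectors",
                 "cosine similarity is a metric"])
```

Because the reranker only scores the small candidate set from initial retrieval, it can afford a much more expensive relevance model than the first-pass search.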
Query Augmentation
User queries can be expanded or reformulated before retrieval to improve recall. This helps when student questions are short or ambiguous.
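A minimal expansion sketch, assuming a static synonym table; in practice an LLM or thesaurus would generate the reformulations:

```python
# Hypothetical synonym table -- illustrative only.
SYNONYMS = {"hw": ["homework", "assignment"], "prof": ["professor", "instructor"]}

def augment_query(query: str) -> str:
    """Expand short or abbreviated queries with synonyms to improve recall."""
    terms = []
    for word in query.lower().split():
        terms.append(word)
        terms.extend(SYNONYMS.get(word, []))
    return " ".join(terms)

augmented = augment_query("hw 3 due date")
```

The expanded query now matches chunks that say "homework" or "assignment" even though the student typed only "hw".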
Start with the default settings and adjust incrementally. Increasing top-k retrieves more context but may dilute relevance. Lowering the similarity threshold broadens recall but risks including less relevant chunks.
Citations
When the AI uses retrieved chunks in its response, it includes inline citations that reference the source material and chunk. Students can trace any claim back to the original document.
If retrieval returns no sufficiently relevant material, the AI falls back to general model knowledge and displays a "Not grounded" warning to the student.
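The grounded-vs-fallback decision can be sketched as a threshold check over the retrieved chunks. The threshold value and field names here are illustrative assumptions:

```python
def answer_mode(retrieved: list[dict], threshold: float = 0.75) -> str:
    """Return "grounded" if any chunk clears the similarity threshold,
    otherwise "not_grounded" (triggering the fallback warning).
    The 0.75 default is illustrative, not the product's actual value."""
    relevant = [c for c in retrieved if c["similarity"] >= threshold]
    return "grounded" if relevant else "not_grounded"

mode = answer_mode([{"similarity": 0.62}, {"similarity": 0.58}])
```

Surfacing the "Not grounded" state explicitly lets students distinguish answers backed by class materials from answers drawn only from the model's general training.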
Embedding Models
The embedding model used for vectorization is configurable via admin settings. Changing the embedding model requires reprocessing existing materials to regenerate vectors.
After changing the embedding model, trigger a reprocessing of all materials to ensure consistency. Mixing embeddings from different models will degrade retrieval quality.
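Conceptually, reprocessing just re-runs the embedding step over every stored chunk with the new model. A sketch, with one-element lists standing in for real embedding vectors of different dimensions:

```python
def reprocess(store: list[dict], new_embed) -> None:
    """Regenerate every stored vector with the new embedding model so the
    index never mixes vectors produced by two different models."""
    for row in store:
        row["vector"] = new_embed(row["text"])

# Toy embedding functions standing in for the old and new models.
old_embed = lambda t: [float(len(t))]        # "old model": 1-dim vectors
new_embed = lambda t: [float(len(t)), 0.0]   # "new model": 2-dim vectors

store = [{"text": "Week 1 notes", "vector": old_embed("Week 1 notes")}]
reprocess(store, new_embed)
```

Mixed embeddings fail in two ways: different models may not even agree on vector dimension, and even when they do, distances between vectors from different models are meaningless, so a full reprocess is the only safe option.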