Large Data Notes

RAG evaluation checklist

A quick checklist for testing a retrieval-augmented generation (RAG) system, meaning one that retrieves source text and then answers from it. The aim is to reduce wrong answers by checking sources, retrieval, grounding, and failure handling.

1) Data sources

  • Every source is listed and has a named owner (no scrapes of unknown origin)
  • Each source has a freshness rule (how often it updates)
  • Access rules are clear (private vs public)
  • Text is cleaned (headers, footers, repeated boilerplate removed)
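
One way to remove repeated boilerplate is to drop any line that appears on most pages of a source. The sketch below assumes the pages are already plain text; the function name and the 0.6 threshold are illustrative choices, not a standard.

```python
from collections import Counter


def strip_repeated_lines(pages: list[str], min_fraction: float = 0.6) -> list[str]:
    """Drop lines that appear on at least min_fraction of the pages.

    Assumption: headers and footers repeat verbatim across pages,
    while real content does not.
    """
    if len(pages) < 2:
        # With a single page every line would look "repeated"; leave it alone.
        return list(pages)

    # Count each distinct non-empty line once per page.
    counts: Counter[str] = Counter()
    for page in pages:
        counts.update({line.strip() for line in page.splitlines() if line.strip()})

    threshold = min_fraction * len(pages)
    boilerplate = {line for line, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        kept = [line for line in page.splitlines() if line.strip() not in boilerplate]
        cleaned.append("\n".join(kept))
    return cleaned


pages = [
    "ACME Docs\nRefunds take 5 days.\nPage footer",
    "ACME Docs\nShipping is free.\nPage footer",
    "ACME Docs\nReturns need a receipt.\nPage footer",
]
cleaned_pages = strip_repeated_lines(pages)
```

A line that legitimately repeats across pages (for example a recurring warning) would also be dropped, so it is worth reviewing the removed set before indexing.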

2) Index and chunks

  • Chunk size is chosen on purpose (not default)
  • Chunks keep meaning (no mid-table or mid-sentence splits)
  • Each chunk stores: source, section, date, and a stable id
  • Duplicates are reduced (near-duplicate text does not flood retrieval)
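
The metadata and duplicate items above can be sketched as follows. This is a minimal illustration, assuming a content hash is an acceptable stable id and that word-shingle Jaccard similarity is good enough for near-duplicate detection; the `Chunk` class, the helper names, and the 0.8 threshold are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str
    section: str
    date: str  # ISO date of the source version
    text: str

    @property
    def chunk_id(self) -> str:
        # Hash of source, section, and whitespace-normalized text, so the id
        # stays the same across re-indexing as long as the content is unchanged.
        normalized = " ".join(self.text.split())
        key = f"{self.source}|{self.section}|{normalized}"
        return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping n-word windows, lowercased."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def drop_near_duplicates(chunks: list[Chunk], threshold: float = 0.8) -> list[Chunk]:
    """Keep the first chunk of each near-duplicate group (O(n^2) comparison)."""
    kept: list[Chunk] = []
    kept_shingles: list[set] = []
    for chunk in chunks:
        current = shingles(chunk.text)
        if all(jaccard(current, other) < threshold for other in kept_shingles):
            kept.append(chunk)
            kept_shingles.append(current)
    return kept
```

The pairwise comparison is fine for a few thousand chunks; a larger index would likely need an approximate method such as MinHash instead.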

3) Retrieval checks

  • Top-k results include the right source for known queries
  • Empty retrieval is handled (the system declines to answer or asks for more detail)
  • Recall risk is tracked (how often the right text is missing)
  • Query rewrite is measured (does it help or harm retrieval)
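
The top-k, miss-rate, and query-rewrite checks can share one metric: the fraction of known queries whose expected source shows up in the top k results. The sketch below assumes retrieval results are available as a mapping from query to ranked source names; the function name and the sample data are made up for illustration.

```python
def hit_rate_at_k(
    retrieved: dict[str, list[str]],
    expected: dict[str, str],
    k: int = 5,
) -> float:
    """Fraction of queries whose expected source appears in the top-k results."""
    if not expected:
        raise ValueError("expected must contain at least one query")
    hits = sum(
        1 for query, source in expected.items()
        if source in retrieved.get(query, [])[:k]
    )
    return hits / len(expected)


# Known queries and the source each one should retrieve (illustrative data).
expected = {"q1": "a.md", "q2": "b.md", "q3": "c.md", "q4": "d.md"}

# Ranked results without and with query rewriting (illustrative data).
raw = {"q1": ["a.md", "x.md"], "q2": ["x.md", "y.md"], "q3": ["c.md"], "q4": []}
rewritten = {"q1": ["x.md", "a.md"], "q2": ["b.md"], "q3": ["c.md"], "q4": ["y.md"]}

raw_rate = hit_rate_at_k(raw, expected, k=2)
rewritten_rate = hit_rate_at_k(rewritten, expected, k=2)
miss_rate = 1 - rewritten_rate  # how often the right text is missing
rewrite_delta = rewritten_rate - raw_rate  # positive means the rewrite helped
```

Running the same expected set against both variants keeps the comparison fair, since any difference then comes from the rewrite alone.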

4) Grounding checks

  • Answers cite the retrieved text (or quote small spans)
  • When evidence is weak, the system says it is unsure
  • Confident claims with no supporting evidence are blocked
  • Output format rules are enforced (tables, bullet lists, JSON, etc.)
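
A simple grounding check is to confirm that every span the answer quotes actually occurs in the retrieved text. The sketch below assumes the answer step returns its quoted spans as a list; the three verdict labels and the rule mapping them to the bullets above are one possible policy, not a fixed one.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not matter."""
    return " ".join(text.lower().split())


def grounding_verdict(quotes: list[str], chunks: list[str]) -> str:
    """Return "pass", "unsure", or "block" for an answer's quoted spans.

    Assumed policy:
      - no quotes, or none found in the chunks -> "block"
      - some quotes found, some not            -> "unsure"
      - every quote found                      -> "pass"
    """
    if not quotes:
        return "block"
    haystack = [normalize(chunk) for chunk in chunks]
    supported = []
    for quote in quotes:
        needle = normalize(quote)
        # An empty quote is never treated as evidence.
        supported.append(bool(needle) and any(needle in chunk for chunk in haystack))
    if all(supported):
        return "pass"
    if any(supported):
        return "unsure"
    return "block"
```

Exact substring matching is strict by design: a paraphrased quote counts as unsupported, which errs on the side of saying "unsure".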

5) Safety and refusal rules

  • Restricted topics trigger refusal or safe redirect
  • Personal data is not exposed in responses
  • Prompt injection is handled (unsafe instructions in retrieved text are ignored)
  • Tool calls have allow-lists and hard limits
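
The allow-list and hard-limit item can be enforced with a small gate that every tool call passes through. This is a minimal sketch; the `ToolGate` class and the tool names are hypothetical, and a real system would likely add per-tool limits and logging.

```python
class ToolGate:
    """Authorize tool calls against an allow-list and a per-request call cap."""

    def __init__(self, allowed: set[str], max_calls: int) -> None:
        self.allowed = set(allowed)
        self.max_calls = max_calls
        self.calls_made = 0

    def authorize(self, tool_name: str) -> bool:
        # Reject anything not explicitly allowed, regardless of what the
        # model or the retrieved text asks for.
        if tool_name not in self.allowed:
            return False
        # Hard limit: stop once the budget for this request is spent.
        if self.calls_made >= self.max_calls:
            return False
        self.calls_made += 1
        return True


gate = ToolGate(allowed={"search_docs", "get_order_status"}, max_calls=2)
decisions = [
    gate.authorize("search_docs"),       # allowed, 1 of 2 calls used
    gate.authorize("delete_account"),    # not on the allow-list
    gate.authorize("get_order_status"),  # allowed, 2 of 2 calls used
    gate.authorize("search_docs"),       # allowed tool, but the cap is reached
]
```

Creating one gate per request keeps the cap local, so a runaway loop in one conversation cannot exhaust a shared budget.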

6) Cost, latency, and fallbacks

  • Latency budget exists and is measured (p50/p95)
  • Token use is capped (max input and output)
  • Caching is used where safe
  • Fallback exists (smaller model, keyword search, human hand-off)
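
Two of the items above are easy to sketch: measuring p50/p95 from recorded latencies, and trying fallbacks in order. The code below uses a nearest-rank percentile and assumes each handler is a plain function that raises on failure; the handler names and the "human_handoff" label are illustrative.

```python
from typing import Callable, Optional


def percentile(samples_ms: list[float], p: int) -> float:
    """Nearest-rank percentile for an integer p between 1 and 100."""
    if not samples_ms:
        raise ValueError("no latency samples recorded")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples_ms)
    # Integer form of ceil(p * n / 100), avoiding float rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def answer_with_fallback(
    question: str,
    handlers: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, Optional[str]]:
    """Try each handler in order; hand off to a human if all of them fail."""
    for name, handler in handlers:
        try:
            return name, handler(question)
        except Exception:
            continue  # a real system would log the failure here
    return "human_handoff", None


def primary_model(question: str) -> str:
    raise TimeoutError("primary model exceeded its latency budget")


def keyword_search(question: str) -> str:
    return f"Top keyword match for: {question}"


latencies_ms = list(range(1, 101))  # 1 ms .. 100 ms, illustrative samples
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
used, answer = answer_with_fallback(
    "refund policy",
    [("primary_model", primary_model), ("keyword_search", keyword_search)],
)
```

Returning the name of the handler that answered makes it possible to track how often each fallback is actually used.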

7) Evaluation set

  • At least 20 real questions exist for testing, ideally 50 or more
  • Questions include hard cases (ambiguous, rare, long context)
  • Each question has an expected source or a valid “cannot answer”
  • Results are tracked over time (before/after changes)
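
An evaluation set along these lines can be a list of cases, each with an expected source or a marker for "cannot answer", scored the same way before and after every change. The sketch below is one possible shape; `EvalCase`, `score_run`, and the sample questions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvalCase:
    question: str
    # None means the correct outcome is "cannot answer".
    expected_source: Optional[str]


def score_run(cases: list[EvalCase], outputs: dict[str, Optional[str]]) -> float:
    """Fraction of cases where the cited source (or a refusal) matches.

    outputs maps each question to the source the system cited, or None if it
    declined to answer. A missing question raises KeyError on purpose, so a
    skipped case cannot be mistaken for a correct refusal.
    """
    if not cases:
        raise ValueError("the evaluation set is empty")
    correct = sum(1 for case in cases if outputs[case.question] == case.expected_source)
    return correct / len(cases)


cases = [
    EvalCase("How long do refunds take?", "refunds.md"),
    EvalCase("Is shipping free?", "shipping.md"),
    EvalCase("What is the CEO's home address?", None),  # valid "cannot answer"
    EvalCase("Can I return opened items?", "returns.md"),
]

before = {
    "How long do refunds take?": "refunds.md",
    "Is shipping free?": "faq.md",
    "What is the CEO's home address?": "about.md",
    "Can I return opened items?": "returns.md",
}
after = {
    "How long do refunds take?": "refunds.md",
    "Is shipping free?": "shipping.md",
    "What is the CEO's home address?": None,
    "Can I return opened items?": "returns.md",
}

history = [("before", score_run(cases, before)), ("after", score_run(cases, after))]
```

Appending each run to a history like this gives the before/after record the last bullet asks for.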

Next steps