UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection
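UQLM's own API is not reproduced here, but the core idea behind black-box, UQ-based detection is easy to sketch: sample the model several times at nonzero temperature and treat low agreement between samples as a hallucination signal. In the sketch below, `ask_llm` is a hypothetical stand-in for any prompt-to-string LLM client, and the threshold is illustrative.

```python
from difflib import SequenceMatcher

def consistency_score(prompt, ask_llm, n_samples=5):
    """Black-box UQ sketch: sample the LLM several times and measure
    pairwise agreement between answers; unstable answers are a common
    hallucination signal. `ask_llm` is a hypothetical callable that
    maps a prompt string to an answer string."""
    answers = [ask_llm(prompt) for _ in range(n_samples)]
    total, pairs = 0.0, 0
    for i in range(len(answers)):
        for j in range(i + 1, len(answers)):
            total += SequenceMatcher(None, answers[i], answers[j]).ratio()
            pairs += 1
    return total / pairs  # near 1.0 = stable answer, near 0.0 = unstable

# Illustrative usage: flag low-agreement answers for review.
# if consistency_score(question, ask_llm) < 0.7:
#     mark_for_review(question)
```

Real packages typically replace the string-similarity step with stronger semantic measures (entailment, embedding similarity), but the sample-and-compare loop is the same.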
Lightweight hallucination detection framework for RAG applications
Up-to-date curated list of state-of-the-art research, papers, and resources on hallucinations in large vision-language models
[ACL 2024] User-friendly evaluation framework: an eval suite plus benchmarks such as UHGEval, HaluEval, and HalluQA
HaluMem is the first operation-level hallucination evaluation benchmark tailored to agent memory systems.
🔎Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".
[ACL 2026] Paper list on Video LLM hallucination. Stars and contributions welcome!
Unofficial implementation of Microsoft's Claimify paper: extracts specific, verifiable, decontextualized claims from LLM Q&A for use in hallucination, groundedness, relevance, and truthfulness detection
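Claimify's actual pipeline is more involved, with multiple stages, but the claim-decomposition step the description refers to can be sketched as a single prompted call. `CLAIM_PROMPT` and `ask_llm` below are hypothetical names for illustration, not the repo's API.

```python
import json

# Hypothetical prompt in the spirit of claim decomposition: split an
# answer into atomic, self-contained claims that can be verified
# independently for groundedness or truthfulness.
CLAIM_PROMPT = """Decompose the answer below into a JSON list of short,
specific, self-contained factual claims. Resolve pronouns using the question.

Question: {question}
Answer: {answer}

JSON list of claims:"""

def extract_claims(question, answer, ask_llm):
    """`ask_llm` is a hypothetical callable: prompt -> completion string."""
    raw = ask_llm(CLAIM_PROMPT.format(question=question, answer=answer))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to one claim per non-empty line if the model
        # does not return valid JSON.
        return [line.strip("- ") for line in raw.splitlines() if line.strip()]
```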
A benchmark for evaluating hallucinations in large vision-language models
Code release for THRONE, a CVPR 2024 paper on measuring object hallucinations in LVLM-generated text.
Repository for the paper "A Unified Definition of Hallucination: It’s The World Model, Stupid!" https://arxiv.org/abs/2512.21577
When AI makes $10M decisions, hallucinations aren't bugs; they're business risks. We built verification infrastructure that makes AI agents accountable without slowing them down.
TrustScoreEval: trust scores for AI/LLM responses. Detects hallucinations, flags misinformation, and validates outputs. Build trustworthy AI.
A blind benchmark for legal citation verification: 4-label classification over IL + federal primary law
MCP server for Eruka: anti-hallucination context memory for AI agents
HALLUCINATED BY CURSOR WITH CODEX PLUGIN :: BEWARE :: BaseX Coding Language - Revolutionary Base 5.10 Quantum Teleportation & Infinite Storage System by Joshua Hendricks Cole
A comprehensive study on reducing hallucinations in Large Language Models through strategic prompt engineering techniques (Chain-of-Verification, Chain-of-Thought, and a hybrid of the two)
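As a rough illustration of the Chain-of-Verification loop such prompt-engineering studies evaluate, here is a minimal sketch, assuming `ask_llm` maps a prompt to a completion string; it is not the repository's code.

```python
def chain_of_verification(question, ask_llm):
    """Minimal Chain-of-Verification sketch: draft an answer, generate
    verification questions, answer them independently, then revise."""
    draft = ask_llm(f"Answer concisely:\n{question}")
    plan = ask_llm(
        "List 3 short fact-checking questions, one per line, that would "
        f"verify this answer:\nQ: {question}\nA: {draft}"
    )
    checks = []
    for vq in [line for line in plan.splitlines() if line.strip()][:3]:
        # Each verification question is answered without showing the
        # draft, so the model cannot simply restate its own mistake.
        checks.append((vq, ask_llm(vq)))
    evidence = "\n".join(f"{q} -> {a}" for q, a in checks)
    return ask_llm(
        f"Question: {question}\nDraft answer: {draft}\n"
        f"Verification Q&A:\n{evidence}\n"
        "Rewrite the draft, correcting anything the verification contradicts."
    )
```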
Moving away from binary "hallucination" evals toward a more nuanced, context-dependent evaluation technique.
An interactive Python chatbot demonstrating real-time contextual hallucination detection in Large Language Models using the "Lookback Lens" method. The project implements the attention-based lookback-ratio feature extraction and a trained classifier to identify when an LLM deviates from the provided context during generation.
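The lookback ratio the description mentions is simple to compute once attention weights are collected. A minimal sketch, assuming the weights for one decoding step have been gathered into a `[n_layers, n_heads, seq_len]` array (a hypothetical layout; real model outputs need reshaping into this form):

```python
import numpy as np

def lookback_ratios(attn, n_ctx):
    """Per-head lookback ratio at one decoding step: attention mass on
    the provided context divided by attention mass on the context plus
    the newly generated tokens. Spans of generated text whose ratios
    drop are candidates for contextual hallucination.

    attn:  array of shape [n_layers, n_heads, seq_len] with the current
           step's attention weights (hypothetical layout for this sketch).
    n_ctx: number of context/prompt tokens at the start of the sequence.
    """
    ctx_mass = attn[:, :, :n_ctx].sum(axis=-1)   # attention to context
    new_mass = attn[:, :, n_ctx:].sum(axis=-1)   # attention to generated text
    return ctx_mass / (ctx_mass + new_mass + 1e-9)  # [n_layers, n_heads]
```

Averaged over a span of generated tokens, these per-head ratios form the feature vector that the trained classifier consumes.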
Dataset generation and pre-processing scripts for the research "Leveraging the Domain Adaptation of Retrieval Augmented Generation (RAG) Models in Conversational AI for Enhanced Customer Service"