Best Eval AI Skills & MCP Servers
59 curated Eval skills and MCP servers — install any of them into Claude, Cursor, ChatGPT, n8n, or any AI stack with one command.
Superlocalmemory
Information-geometric agent memory with mathematical guarantees. 4-channel retrieval, Fisher-Rao similarity, zero-LLM mode, EU AI Act compliant. Works with Claude, Cursor, Windsurf, and 17+ AI tools.
Mnemo
Structured fact memory MCP server — SQLite + FTS5, trust scoring, entity graph, bilingual retrieval for Claude Code & Codex
Judges
45 specialized judges that evaluate AI-generated code for security, cost, and quality.
Clawmem
On-device memory layer for AI agents (Claude Code, OpenClaw, and Hermes). Hooks + MCP server + hybrid RAG search.
Prism
Prism Coder — Cognitive memory + tool-calling intelligence for AI agents. Mind Palace persistent memory (BFCL Gold Certified, 100% Tool-Call Accuracy, 54 Agent Skills, Zero-Search HDC/HRR retrieval, HIPAA-hardened local-first storage, SLERP-optimized GRPO
Tuningengines Cli
Tuning Engines CLI, MCP server, and Python agent runtime adapters for governed model, agent, skill, and MCP workflows. Fine-tune open-source LLMs, run inference, manage datasets/evaluations, and connect LangGraph or Temporal while Tuning Engines handles p
Server
The agent eval standard for MCP. Score every agent output for quality, safety, and cost.
Cogmemai
CogmemAi: Autonomous Cognitive Memory for Any AI System. 95.10% on LongMemEval (top published score on the field's hardest long-term memory benchmark) and 91% on LoCoMo (above human performance). Autonomous memory capture: your AI's work is saved even wh
Md Feedback
MCP server for markdown plan review — companion to the MD Feedback VS Code extension. AI agents read annotations, mark tasks done, evaluate quality gates, and generate session handoffs. 27 tools for Claude Code, Cursor, and other MCP-compatible clients.
Skar
Skar turns a captured AI agent trace into a committed pytest regression test. MCP server + CLI. Use when a tool-using agent run fails and you want to lock the failure as an executable test.
Calculator
Evaluate, simplify, and differentiate mathematical expressions via MCP. STDIO or Streamable HTTP.
Formulon
MCP server for Formulon Excel-compatible formula and workbook evaluation
Ori Memory
Cognitive architecture for persistent AI agent memory. Knowledge graph with learning retrieval, ACT-R decay, and spreading activation. Markdown-native, local-first, zero cloud. MCP server + CLI.
Paper Search Agent
MCP server for paper-search-agent: academic paper discovery, access planning, and full-text retrieval via a campus network
Memory Lancedb
MCP server for LanceDB-backed long-term memory with hybrid retrieval (Vector + BM25), cross-encoder rerank, multi-scope isolation, and memory lifecycle management
Mcplab
MCP server that exposes MCPLab evaluation tools — query runs, results, and traces via the Model Context Protocol
Pdf Reader
MCP server for efficient PDF text extraction, search, and metadata retrieval for Claude Code
Mcp
Model Context Protocol server for digitalcalculator.info financial calculators. v0.3.0 ships 9 calculator tools (mortgage monthlyPayment, compound-interest futureValue, retirement401k projection, Social Security estimatedBenefit, paycheck netPay, IRA cont
Adaptive Recall
Adaptive memory system for AI applications. Multi-strategy retrieval, cognitive scoring, knowledge graph, and self-improving ML. Connects via MCP or REST API.
Node Webrtc
MCP server for @agentdance/node-webrtc — lets AI agents discover, evaluate, and get started with the pure-TypeScript WebRTC stack
Merch Connector
MCP server that gives merchandising agents eyes on any storefront — scrape, audit, compare, roundtable analysis, and eval tracking via 11 tools.
Enquire
MCP server giving AI agents (Claude Code, Claude Desktop, Cursor, ChatGPT, Codex, OpenClaw) persistent long-term memory backed by your local Obsidian markdown vault. Hybrid retrieval (BM25 + ML embeddings + BGE reranker, RRF-fused), HNSW + int8 quantizati
Lightrag
Model Context Protocol (MCP) server for LightRAG - 30 fully working tools with complete RAG and Knowledge Graph integration
Agentdb
Self-learning vector memory for AI agents — single-file .rvf cognitive container with HNSW search, episodic Reflexion memory, causal graph + Cypher, 9 RL algorithms, Thompson Sampling bandit, 41 MCP tools, hybrid (BM25 + dense) retrieval, GNN attention. 1
About Eval skills on iClaude
iClaude is the universal install layer for AI skills. Every Eval skill on this page can be installed into Claude Code, Claude Desktop, Cursor, ChatGPT, n8n, Codex, and more — using a single copy-paste command. No config drift, no per-stack adapters, no manual MCP wiring.