Best Evaluation AI Skills & MCP Servers
13 curated Evaluation skills and MCP servers — install any of them into Claude, Cursor, ChatGPT, n8n, or any AI stack with one command.
Tuningengines Cli
Tuning Engines CLI, MCP server, and Python agent runtime adapters for governed model, agent, skill, and MCP workflows. Fine-tune open-source LLMs, run inference, manage datasets/evaluations, and connect LangGraph or Temporal while Tuning Engines handles p
Server
The agent eval standard for MCP. Score every agent output for quality, safety, and cost.
Formulon
MCP server for Formulon Excel-compatible formula and workbook evaluation
Mcplab
MCP server that exposes MCPLab evaluation tools — query runs, results, and traces via the Model Context Protocol
Server Tester
Playwright-based testing and evaluation framework for MCP servers
Ai Agent Guidelines
MCP server exposing public instruction workflows as tools, backed by hidden AI agent skills for requirements, orchestration, quality, research, evaluation, governance, resilience, and physics-inspired analysis
Nia Web Eval Agent
NIA AI Web Evaluation Agent MCP server: autonomous browser testing and debugging
Mcplab Core
Core evaluation engine for MCPLab — agent adapters, MCP client, scenario runner, and result types
Mcplab
MCPLab: test and evaluate MCP servers with LLMs. Run evals, compare agents, and launch the MCPLab web app
Mcp
Official MCP server for Autousers — UX evaluation, calibrated AI personas, side-by-side design review.
Mcplab Reporting
HTML report generation for MCPLab evaluation runs
Evals
GitHub Action for evaluating MCP server tool calls using LLM-based scoring
Xcomet
MCP Server for xCOMET translation quality evaluation
About Evaluation skills on iClaude
iClaude is the universal install layer for AI skills. Every Evaluation skill on this page can be installed into Claude Code, Claude Desktop, Cursor, ChatGPT, n8n, Codex, and more — using a single copy-paste command. No config drift, no per-stack adapters, no manual MCP wiring.