MCP Verdict
The neutral registry for MCP servers

Which MCP servers actually work?

A dozen servers promise the same capability. Most have never been run by anyone neutral. Some are broken. Some hand an agent your entire filesystem by default. MCP Verdict installs them, tests what they do, and publishes a verdict, so you can stop guessing and pick one.

No vendor pays for placement. The method is public, one click from every score.

The gap

Every other score reads the label. Almost none of them run the tool.

Several products now put a number on an MCP server. They score the packaging: GitHub stars, manifest quality, declared permissions, provenance, maintenance recency. A score assembled from a project's own metadata is a popularity-and-age score wearing a quality costume, and the author controls most of the inputs. One popular MCP score already has a published guide titled "how to hit 100." A number you can grind toward measures completeness, not quality.

None of that answers the questions a builder actually asks. If I point an agent at this server, does it do what it claims? Does it stay up across ten calls? How fast is it? What can it reach on my machine? Those answers only come from running the thing.

The wedge

We run the server and take a side.

Every entry has been installed and exercised against a structured evaluation. We check whether it functions as advertised, how reliably it responds across repeated calls, how fast, and what it can touch on the host. The score is behavioral, built from running the tool, not from reading its npm page. Then a person writes a verdict that says use it or do not.

Here is what that catches. In our first category, two servers pass every functional test and still fail the grade: one ships with unrestricted filesystem access by design and reads /etc/passwd without complaint, and another lets a shell command walk straight out of its sandbox. A metadata score would wave both through. We ran them, so we caught them.

Function

Does it do what it says? We run every advertised tool. Pass, partial, or fail, with notes.

Reliability

How consistently does it respond correctly across repeated calls in one run?
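A minimal sketch of how a repeated-call reliability number could be computed, assuming the score is simply the share of calls that return a correct response. The function name, the default of ten attempts, and the rule that an exception counts as a failure are illustrative assumptions, not the published method.

```python
from typing import Callable


def reliability_score(call: Callable[[], bool], attempts: int = 10) -> float:
    """Percentage of attempts in which `call` reported a correct response.

    `call` is a hypothetical probe: it invokes one server tool and returns
    True when the response matched what was expected.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    successes = 0
    for _ in range(attempts):
        try:
            if call():
                successes += 1
        except Exception:
            # Assumption: a crash or timeout counts as an incorrect response.
            pass
    return 100.0 * successes / attempts
```

Under these assumptions, a server that answers 19 of 20 calls correctly would land at 95.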

Latency

How fast is it? Scored on a curve: under 250 ms earns full marks, and past three seconds the score falls hard.
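One way to picture a curve like this is a piecewise-linear function. Only the 250 ms and three-second marks come from the text above; the score at the three-second knee and the point where the score reaches zero are placeholder assumptions, and the real curve may have a different shape.

```python
FULL_MARKS_MS = 250.0  # from the text: at or under this, full marks
KNEE_MS = 3000.0       # from the text: past this, the score falls hard
KNEE_SCORE = 50.0      # assumption: score exactly at the knee
ZERO_MS = 4000.0       # assumption: latency at which the score hits zero


def latency_score(latency_ms: float) -> float:
    """Map a latency in milliseconds to a 0-100 score (illustrative only)."""
    if latency_ms <= FULL_MARKS_MS:
        return 100.0
    if latency_ms <= KNEE_MS:
        # Gentle decline between full marks and the knee.
        frac = (latency_ms - FULL_MARKS_MS) / (KNEE_MS - FULL_MARKS_MS)
        return 100.0 - frac * (100.0 - KNEE_SCORE)
    if latency_ms < ZERO_MS:
        # Steeper decline past the knee.
        frac = (latency_ms - KNEE_MS) / (ZERO_MS - KNEE_MS)
        return KNEE_SCORE * (1.0 - frac)
    return 0.0
```

With these placeholder constants the score drops about 1.8 points per 100 ms before the knee and 5 points per 100 ms after it.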

Security

What can it reach? We test path scoping, allowlists, and escape routes. A single high-severity flag caps the score.
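A rough sketch of the two ideas in this check, assuming a path-scoping probe reduces to "does the resolved path stay inside an allowed directory" and that a high flag clamps the security sub-score to a fixed ceiling. The function names, the severity labels, and the cap value of 40 are hypothetical, chosen only for illustration.

```python
import os
from typing import Iterable, Sequence

HIGH_FLAG_CAP = 40.0  # assumption: the real cap value is not stated here


def escapes_allowlist(requested: str, allowed_dirs: Sequence[str]) -> bool:
    """True if `requested` resolves to a location outside every allowed dir."""
    # realpath collapses ".." segments and follows symlinks.
    real = os.path.realpath(requested)
    for directory in allowed_dirs:
        root = os.path.realpath(directory)
        if os.path.commonpath([real, root]) == root:
            return False
    return True


def capped_security_score(raw_score: float, flag_severities: Iterable[str]) -> float:
    """Clamp the raw score when any flag is rated 'high'."""
    if any(severity == "high" for severity in flag_severities):
        return min(raw_score, HIGH_FLAG_CAP)
    return raw_score
```

In this sketch, a request for `/srv/data/../../etc/passwd` against an allowlist of `/srv/data` counts as an escape, and a server that allows it would be held at the cap however well it scored otherwise.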

The neutrality thesis

We have no servers to sell and no platform to protect.

Every directory and marketplace that ranks these tools has a stake in how the ranking comes out. A vendor's store favors the tools that make its platform look good. A directory with a connector gateway favors the tools that route through it. The scorers built on metadata favor whoever fills in the most fields. MCP Verdict runs none of that. We do not host a directory, we do not sell placement, and the scoring weights and the method are published and versioned.

Everyone else scores the packaging. We test the product. We will print a failing grade on a well-funded server and a passing grade on a one-person project when that is what running it shows.

Read the neutrality thesis

What a verdict looks like

A real score, explained.

This is a live entry from the registry. The grade, the composite, the four sub-scores, the confidence, the methodology version, and the written verdict are all public. Every entry looks like this.

Grade A, composite 98

@modelcontextprotocol/server-filesystem

MCP server

The official Anthropic-maintained filesystem MCP server. Exposes 14 tools for read, write, edit, search, and directory operations. Enforces a configurable allowlist of directories via command-line arguments or the MCP Roots protocol.

Method: rung1.v1
Computed: Jun 9, 2026
How we score

Functional: 100/100
Reliability: 95/100
Latency: 100/100
Security: 98/100

The verdict

The reference implementation. All 14 advertised tools pass functional testing. The directory allowlist is enforced at startup and at every operation: both reads and writes outside the allowed directories are blocked with a clear error. Setup is a single npx command. The only design note worth flagging: the server grants full read-write access to every allowed directory, with no per-directory read-only mode at the server level. Use Docker volume mounts with the ro flag if you need a read-only allowed directory. For any production agent deployment, this is the server to start with.

See the full entry and test history

Start here.

Find a server.

The registry is open. Filter by type and sort by score or confidence. Every entry shows its grade, its sub-scores, and the written verdict, including the unflattering ones.

Browse the registry

List a server.

Built one, or depend on one worth testing? Submit it. A human reviews every submission and we run it before it is listed. We do not auto-publish.

Submit a server

Today, MCP servers. Agents and the rest of the tool layer are where we go next.