threat-intelligence · mcp · python · soc · tooling

Building Heimdall - A Threat Intelligence MCP Server From Scratch

2025.03.15

Why I Built This

I got tired of tabs.

That’s really what started it. You’re in a SOC, an alert fires, you get an IP address. Now you need context. So you open VirusTotal in one tab. AbuseIPDB in another. Shodan in a third. Maybe Hybrid Analysis if there’s a hash involved. And somewhere in the back of your mind you’re trying to map this to MITRE ATT&CK because your report template demands it.

By the time you’ve assembled the picture, ten minutes have passed. For one IOC. Multiply that by the twenty alerts sitting in your queue and the math stops working pretty quickly.

I’ve used commercial platforms that promise to solve this. They do, kind of - but they come with six-figure price tags, three-month onboarding cycles, and a sales rep who calls you every quarter to discuss your “threat intelligence maturity journey.” I didn’t need a maturity journey. I needed one place to throw an IP and get back everything I know about it.

Then MCP happened.

What MCP Changes

The Model Context Protocol turned out to be the missing piece I didn’t know I was looking for. I’d been using Claude Desktop for a while, mostly for writing and analysis. But when MCP dropped and I saw that you could give Claude access to custom tools via a local server, the idea was obvious: what if I could just ask Claude “what do we know about 185.220.101.1?” and it would go query every threat intel source I have access to, correlate the results, and give me a coherent answer?

Not a dashboard. Not a SOAR playbook. Just a conversation where the AI has access to the same tools I’d use manually - but faster.

So I built Heimdall.

The name fits. In Norse mythology, Heimdall is the guardian of Bifrost who watches over the realm. In my homelab, where every piece of infrastructure follows Norse naming conventions anyway, the name was practically taken already.

The Architecture

Heimdall is a FastMCP server written in Python. It exposes seven tools that Claude (or any MCP-compatible client) can call:

vt_lookup takes an IOC - IP, domain, URL, or file hash - and queries the VirusTotal API. Returns detection ratios, last analysis results, community votes. This was the first tool I built and still the most reliable one.

abuseipdb_check queries AbuseIPDB for an IP address. Returns the abuse confidence score, ISP, usage type, country, and the number of reports. Straightforward, works well.

shodan_host was supposed to query Shodan for open ports, services, vulnerabilities. I’ll get to why “supposed to” is the operative word in a moment.

map_infrastructure does DNS resolution, MX record lookup, NS records, and basic infrastructure mapping for a domain. Doesn’t depend on any paid API, just does DNS queries. Probably the most underrated tool in the set.

enrich_iocs is the bulk function - you throw a comma-separated list of mixed IOC types at it and it runs them all through every available provider, then cross-correlates. This is the workhorse for incident response where you have a list of twenty indicators from an alert and need context on all of them.

parse_iocs extracts IOCs from raw text. Paste in an email body, a log excerpt, a threat report, and it pulls out IPs, domains, URLs, and hashes. Handles defanging too - it knows that hxxps://evil[.]com is a URL.
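The defanging handling is less clever than it sounds - mostly a few substitutions before the extraction regexes run. A minimal sketch of the idea (Heimdall's real patterns cover more IOC types and more defanging variants; these names and patterns are illustrative):

```python
import re

def refang(text: str) -> str:
    """Undo common defanging so the standard extraction regexes can match."""
    return (text.replace("hxxp", "http")
                .replace("[.]", ".")
                .replace("[:]", ":"))

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")

def parse_iocs(text: str) -> dict:
    """Extract IOCs from raw text, refanging first."""
    clean = refang(text)
    return {
        "ips": IP_RE.findall(clean),
        "hashes": SHA256_RE.findall(clean),
    }
```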

sandbox_report pulls detonation results from Hybrid Analysis for a given SHA256 hash.

The whole thing runs as a Python FastMCP application with async handlers for each intelligence source. It supports two transports: stdio for Claude Desktop (local) and streamable-http on port 8000 for claude.ai. That second one means I can use Heimdall from my browser without having Claude Desktop running.

The Shodan Disaster

Let me tell you about the most frustrating debugging session of this entire project.

Shodan was supposed to be simple. Query the API, get back open ports and services, done. Instead, I got this:

LocalProtocolError: Illegal header name b''

Every. Single. Time.

I spent an embarrassing amount of time digging through my HTTP client code looking for a malformed header. Checked the API key was being passed correctly. Verified the endpoint URL. Tested with curl directly - worked fine. Tested through my code - same error.

The actual problem? My Shodan API key was a free-tier key with zero query credits. The account was valid, the key was valid, but the plan didn’t include programmatic host lookups. Shodan returned a 403, and my HTTP client didn’t handle that gracefully. Instead of a clean error message, the response parsing blew up trying to read an empty header from the error response.

The fix was two things: first, upgrading to a Shodan membership ($69/year - not exactly breaking the bank for a security tool). Second, and more importantly, adding proper error handling that actually tells you what’s wrong instead of vomiting a stack trace:

async def shodan_host(self, ip_address: str) -> dict:
    """Look up an IP in Shodan; returns data or a dict with an 'error' key."""
    if not self.shodan_key:
        return {"error": "Shodan API key not configured"}

    try:
        response = await self.client.get(
            f"https://api.shodan.io/shodan/host/{ip_address}",
            params={"key": self.shodan_key}
        )
        # Map the two auth-related failures to readable messages before
        # raise_for_status() can turn them into a generic exception.
        if response.status_code == 401:
            return {"error": "Shodan: Invalid API key"}
        if response.status_code == 403:
            return {"error": "Shodan: Insufficient plan"}
        response.raise_for_status()
        return response.json()
    except Exception as e:
        return {"error": f"Shodan lookup failed: {str(e)}"}

Lesson learned: when you’re integrating five different APIs with five different authentication schemes and five different error response formats, the boring error handling code is more important than the actual feature code. I should have known that. I preach it to my team every day. And I still shipped a tool without it.

The enrich_iocs Timeout

The bulk enrichment function seemed straightforward - loop over the IOCs, query each provider, aggregate results. It worked beautifully with a single IOC. Two IOCs? Timeout.

The issue was subtle. Even after I removed the broken Shodan API key, the code was still trying to make the Shodan call. Without a key configured, you’d expect it to skip Shodan entirely. But the conditional check happened after the HTTP client was already initialized with the empty key, and the connection attempt would hang for 30 seconds waiting for a response that would never come.

The fix was checking for the key existence before even attempting the request - fail fast, not fail eventually. For two IOCs hitting three providers each, you’re making six API calls. If one of them hangs for 30 seconds, you blow through any reasonable timeout.

This is the kind of bug that doesn’t show up in unit tests because your test environment has all the API keys configured. It only shows up when someone deploys with a partial configuration - which is exactly the common case.
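For what it's worth, the fail-fast shape I ended up with looks roughly like this - the names are illustrative, not Heimdall's actual internals. The key check happens before any connection is opened, and every provider call gets its own hard timeout:

```python
import asyncio

async def enrich_one(ioc: str, providers: dict, timeout: float = 10.0) -> dict:
    """Query every *configured* provider concurrently, skipping missing keys.

    `providers` maps provider name -> (api_key, async lookup function).
    """
    async def guarded(name, key, fn):
        if not key:  # fail fast: no key, no request, no hang
            return name, {"skipped": "no API key configured"}
        try:
            return name, await asyncio.wait_for(fn(ioc), timeout)
        except asyncio.TimeoutError:
            return name, {"error": f"{name} timed out after {timeout}s"}
        except Exception as e:
            return name, {"error": str(e)}

    results = await asyncio.gather(
        *(guarded(n, k, f) for n, (k, f) in providers.items())
    )
    return dict(results)
```

With this shape, an unconfigured provider costs nothing and a hung one costs at most `timeout` seconds, instead of stalling the whole batch.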

The sandbox_report 301 Problem

Hybrid Analysis has an API. The documentation says to POST to /api/v2/search/hash. What the documentation doesn’t mention is that if you hit the endpoint without a trailing slash, you get a 301 redirect. And if your HTTP client follows that redirect, it switches from POST to GET (which the HTTP spec permits for a 301, and which most clients do in practice). And the GET request hits a completely different endpoint that returns HTML instead of JSON.

The error message? Just “HTTP 301.” Not exactly illuminating at 2 AM.

The fix was adding the trailing slash. One character. After two hours of debugging.
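If you want this failure to be loud instead of mysterious, it's worth checking for redirects on API POSTs explicitly rather than letting the client silently follow them. A sketch of the shape I settled on - the api-key header matches Hybrid Analysis's v2 docs, but treat the details as illustrative:

```python
# Hybrid Analysis v2 hash search - note the trailing slash. Without it the
# server 301-redirects, redirect-following clients downgrade POST to GET,
# and you get HTML back instead of JSON.
HA_SEARCH_URL = "https://www.hybrid-analysis.com/api/v2/search/hash/"

async def sandbox_report(client, sha256: str, api_key: str) -> dict:
    """`client` is any async HTTP client (httpx.AsyncClient in Heimdall)."""
    response = await client.post(
        HA_SEARCH_URL,
        headers={"api-key": api_key},
        data={"hash": sha256},
    )
    if 300 <= response.status_code < 400:
        # A redirect on an API POST almost always means a URL mistake.
        return {"error": f"Unexpected redirect ({response.status_code}) - check the endpoint URL"}
    response.raise_for_status()
    return response.json()
```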

What I Actually Use It For

The theoretical use case - “ask Claude about an IOC” - turned out to be just the beginning. In practice, I use Heimdall in three ways:

Incident triage. An alert fires. I paste the raw alert text into Claude with Heimdall connected. Claude parses out the IOCs automatically (using parse_iocs), enriches all of them (using enrich_iocs), and gives me a prioritized summary. What used to take 15 minutes of tab-switching now takes about 30 seconds.

Threat report enrichment. Someone shares a threat advisory with a list of IOCs. I paste it in. Heimdall checks every single one against all sources and tells me which ones are relevant to our environment. The map_infrastructure tool is particularly useful here - it gives me DNS context that helps determine if a domain is actually a threat or just parked.

Investigation support. During a deeper investigation, I have a running conversation with Claude where I can ask follow-up questions. “Is this IP related to any known campaigns?” “What else is hosted on this server?” “Show me the ATT&CK techniques associated with this hash.” Claude reasons across the tool outputs and connects dots I might miss when manually jumping between dashboards.

The thing that surprised me most is how much value comes from the conversational aspect. With a traditional dashboard, you see data. With Heimdall through Claude, you have a conversation about the data. You can ask “what’s weird about this?” and get an analytical response that considers all the context from all the tools at once.

What’s Still Broken

I’ll be honest about the current state. Heimdall works well enough for my daily use, but it’s not production-ready in any enterprise sense:

The error handling is better than it was, but still inconsistent across providers. VirusTotal returns clean JSON errors. AbuseIPDB sometimes returns HTML error pages. Shodan has the licensing confusion. Each provider needs its own error handling strategy, and I haven’t unified them yet.

Rate limiting is rudimentary. I’m relying on per-provider rate limits rather than implementing a global throttle. For my single-user use case this is fine. For a team deployment it would fall over quickly.
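If I do build the global throttle, it doesn't need to be fancy. Something like this minimal spacing-based throttle (a sketch of the idea, not code that exists in Heimdall today) would already keep a burst of enrichments from hammering every provider at once:

```python
import asyncio
import time

class Throttle:
    """Global throttle: at most `rate` calls per second across all providers."""
    def __init__(self, rate: float):
        self.interval = 1.0 / rate
        self.lock = asyncio.Lock()
        self.last = 0.0

    async def wait(self):
        """Block until enough time has passed since the previous call."""
        async with self.lock:
            now = time.monotonic()
            delay = self.last + self.interval - now
            if delay > 0:
                await asyncio.sleep(delay)
            self.last = time.monotonic()
```

Every provider call would `await throttle.wait()` before firing; the lock serializes the bookkeeping so concurrent tasks can't all slip through at once.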

There’s no caching. Every query hits the live APIs. For frequently-checked IOCs, this wastes API credits and adds latency. A simple TTL cache on the responses would help a lot.
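The cache is equally simple to sketch - a dict keyed by provider-plus-IOC with a timestamp check on read (again, a sketch of the idea, not code that's in Heimdall yet):

```python
import time

class TTLCache:
    """Tiny TTL cache for API responses (no eviction; fine for one user)."""
    def __init__(self, ttl_seconds: float = 900.0):
        self.ttl = ttl_seconds
        self.store: dict[str, tuple[float, dict]] = {}

    def get(self, key: str):
        """Return the cached value, or None if missing or expired."""
        hit = self.store.get(key)
        if hit and time.monotonic() - hit[0] < self.ttl:
            return hit[1]
        return None

    def put(self, key: str, value: dict):
        self.store[key] = (time.monotonic(), value)
```

A key like `"vt:185.220.101.1"` with a 15-minute TTL would absorb most of the repeat lookups during a single triage session.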

The MITRE ATT&CK mapping is the weakest part. Right now it’s keyword-based matching from the VirusTotal detection labels to ATT&CK technique IDs. It’s better than nothing, but it produces both false positives and false negatives that need manual verification.
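Concretely, the current approach is about this naive - an illustrative table (the real one is bigger, but the mechanism is the same substring match):

```python
# Illustrative keyword -> ATT&CK technique table; entries here are examples.
KEYWORD_TO_TECHNIQUE = {
    "ransom": "T1486",      # Data Encrypted for Impact
    "keylog": "T1056.001",  # Input Capture: Keylogging
    "phish": "T1566",       # Phishing
    "miner": "T1496",       # Resource Hijacking
}

def map_labels_to_attack(labels: list[str]) -> list[str]:
    """Naive substring match from detection labels to ATT&CK technique IDs."""
    hits = set()
    for label in labels:
        low = label.lower()
        for keyword, technique in KEYWORD_TO_TECHNIQUE.items():
            if keyword in low:
                hits.add(technique)
    return sorted(hits)
```

You can see why it misfires: a label like "Trojan.Ransom.Generic" maps cleanly, but vendor label conventions vary wildly, and plenty of real techniques never show up in a detection string at all.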

What I’d Do Differently

If I started over, I’d build the error handling and API abstraction layer first, before writing a single line of feature code. Every provider should implement the same interface, return the same response format, and handle failures the same way. I built feature-first and am paying the refactoring tax now.

I’d also add integration tests from day one that specifically test with missing API keys, expired keys, and rate-limited responses. The “happy path” works. It’s the twelve different failure modes that bite you.

And I’d think harder about the MCP transport layer. The stdio transport for Claude Desktop just works - it’s local, it’s fast, there’s no authentication to worry about. The streamable-http transport for remote access is more complex: you need to think about authentication, TLS, CORS, and whether you really want your threat intelligence server accessible over the network. I ended up putting it behind my Netbird mesh VPN, which solves the network security problem but adds deployment complexity.

Was It Worth It?

Absolutely. Even with all the rough edges, Heimdall has meaningfully changed how I do threat intelligence work. The time savings are real - I’m faster at triage, more thorough in investigations, and I catch correlations I would have missed doing things manually.

But more than that, building Heimdall taught me something about how LLM tooling actually works in practice versus how it’s marketed. The MCP integration isn’t magic. It’s plumbing. The value comes from the plumbing being reliable enough that you stop thinking about it and start thinking about the actual security problem.

That’s also the bar for “good enough.” Not perfect. Not feature-complete. Just reliable enough that it disappears into the workflow.

Heimdall isn’t there yet. But it’s close enough that I use it every day, and that’s a better validation than any benchmark could provide.