Technical articles on AI agents, Azure, .NET, architecture, and EV charging systems from Sydney.


MCP Transport Types Explained: stdio vs Streamable HTTP vs SSE in C#

Nobody tells you this when you start building MCP servers, but picking the wrong MCP transport is the kind of mistake that doesn’t hurt immediately. It hurts three months later when you’re trying to deploy to Azure and realise your architecture assumes a single process on a single machine. I’ve been there. This post is what I wish I’d read before that happened.

The MCP C# SDK gives you four transport options: stdio, Streamable HTTP, SSE, and in-memory. They all carry the same JSON-RPC 2.0 messages between your AI client and your server — the difference is in where your server lives, how connections are managed, and what happens under production load. I’ll walk through each one honestly, with working C# code and the stuff the official docs gloss over.

If you haven’t read my post on what MCP servers actually are, start there first — this one assumes you already know the basics.

What Is an MCP Transport, Exactly?

The simplest way I can put it: the MCP transport is the pipe that carries messages between the AI model and your server. The protocol itself — tool calls, capability manifests, responses — is identical regardless of which transport you use. What changes is the physical channel those messages travel through.

Think of it like this. The conversation between you and a colleague is the same whether you’re in the same room, on a video call, or texting. The transport just determines who can initiate, how fast messages arrive, and what happens when the connection drops.
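To make that concrete, here is a minimal sketch of a single tool call as it would look on the wire. It uses only System.Text.Json rather than the MCP SDK, and the tool name echo and its message argument are made up for illustration. The envelope fields (jsonrpc, id, method, params) come from JSON-RPC 2.0, and tools/call is the MCP method for invoking a tool.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

// Build a JSON-RPC 2.0 "tools/call" request.
// The tool name and arguments passed in below are hypothetical.
static string BuildToolCall(int id, string toolName, JsonObject arguments)
{
    var request = new JsonObject
    {
        ["jsonrpc"] = "2.0",
        ["id"] = id,
        ["method"] = "tools/call",
        ["params"] = new JsonObject
        {
            ["name"] = toolName,
            ["arguments"] = arguments
        }
    };
    return request.ToJsonString();
}

string payload = BuildToolCall(1, "echo", new JsonObject { ["message"] = "hi" });

// The transport only decides where this string is written, not what it contains.
Console.WriteLine(payload);
```

On stdio that string goes to the child process’s stdin as one newline-terminated line; on Streamable HTTP it becomes the POST body. The tool code on the other end sees the same request either way.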

The Four Transports at a Glance

Before I go into detail on each one, here’s the map. I’ll refer back to this table throughout.

| Transport | Direction | Sessions | Scales horizontally? | Best for |
|---|---|---|---|---|
| stdio | Bidirectional | Implicit (one per process) | N/A | Local tools, Claude Desktop, IDE integrations |
| Streamable HTTP (stateless) | Request-response | None | ✅ No constraints | Production APIs, Azure Container Apps |
| Streamable HTTP (stateful) | Bidirectional | Mcp-Session-Id header | ⚠️ Needs sticky sessions | Long-running agents, server push notifications |
| SSE (legacy) | Server→client stream + POST | Query string session ID | ⚠️ Needs sticky sessions | Legacy client compatibility only |
| In-memory | Bidirectional | Implicit (one per pipe) | N/A | Unit tests, same-process embedding |

1. stdio — Start Here, Always

stdio is the one that surprised me the most when I first encountered MCP. The client literally launches your server as a child process and the two talk through stdin and stdout. No HTTP, no ports, no certificates. Your server is just a program reading from one pipe and writing to another.

This is how Claude Desktop connects to local MCP servers. You add your server to claude_desktop_config.json, Claude spawns the process, and from that point on the two are talking JSON-RPC over standard I/O. It’s almost offensively simple for something that enables quite sophisticated agent behaviour.

What I like about it

  • It just works, everywhere. No networking, no firewall rules, no TLS ceremony. If you can run the binary, you have a working MCP server. I’ve got stdio servers running in environments where I’m not allowed to open any network ports.
  • Bidirectional by default. The server can push notifications back to the client at any time — stdin/stdout flow control gives you natural backpressure with zero configuration.
  • Clean isolation. Each client connection gets its own process. If the server crashes, the client knows immediately and only that session is affected. Nothing shared, nothing leaked.
  • Easiest to debug. You can literally run the server in a terminal and type JSON at it. I’ve done this more times than I’d like to admit.

What to watch out for

  • Local only. The server has to live on the same machine as the client. This rules it out for any cloud or API-style deployment.
  • One client per process. Not a problem for dev tools, but if you’re imagining dozens of agents sharing one server instance — that’s not stdio’s job.
  • Secret leakage is a real risk. This one caught me off guard. By default, your child process inherits every environment variable from the parent — which means GITHUB_TOKEN, OPENAI_API_KEY, AWS_SECRET_ACCESS_KEY, all of it flows straight to your server (or a third-party server you’re connecting to). The fix is one option, but you have to know to set it.
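You can see the underlying behaviour with plain System.Diagnostics, no MCP SDK involved. This sketch assumes a hypothetical variable DEMO_SECRET_TOKEN and a hypothetical executable name my-mcp-server. Nothing is actually launched; it only inspects the environment a child process would receive.

```csharp
using System;
using System.Diagnostics;

// Simulate a secret living in the parent (client) process environment.
Environment.SetEnvironmentVariable("DEMO_SECRET_TOKEN", "s3cr3t");

// ProcessStartInfo.Environment starts out as a copy of the parent's variables.
var inherited = new ProcessStartInfo("my-mcp-server") { UseShellExecute = false };
bool leaks = inherited.Environment.ContainsKey("DEMO_SECRET_TOKEN");

// Clearing the dictionary and re-adding a small allowlist stops the leak.
var scrubbed = new ProcessStartInfo("my-mcp-server") { UseShellExecute = false };
scrubbed.Environment.Clear();
foreach (var name in new[] { "PATH", "HOME" })
{
    if (Environment.GetEnvironmentVariable(name) is { } value)
        scrubbed.Environment[name] = value;
}
bool stillLeaks = scrubbed.Environment.ContainsKey("DEMO_SECRET_TOKEN");

Console.WriteLine($"inherited: {leaks}, scrubbed: {stillLeaks}");
```

The point of the sketch is the default: unless something deliberately clears it, the child’s environment is the parent’s environment.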

Server: the transport setup is three lines

var builder = Host.CreateApplicationBuilder(args);

builder.Services.AddMcpServer()
    .WithStdioServerTransport()
    .WithTools<MyTools>();

// stdout belongs to the protocol — logs must go to stderr
builder.Logging.AddConsole(options =>
{
    options.LogToStandardErrorThreshold = LogLevel.Trace;
});

await builder.Build().RunAsync();

Client

var transport = new StdioClientTransport(new StdioClientTransportOptions
{
    Command = "dotnet",
    Arguments = ["run", "--project", "MyMcpServer"],
    ShutdownTimeout = TimeSpan.FromSeconds(10)
});

await using var client = await McpClient.CreateAsync(transport);

The environment variable thing — actually fix this

Don’t skip this, especially if you’re connecting to any third-party MCP server. The safe pattern is to start from the SDK’s curated allowlist and add only what your specific server needs:

// GetDefaultEnvironmentVariables() gives you PATH, HOME, and standard
// system dirs — nothing sensitive
var env = StdioClientTransportOptions.GetDefaultEnvironmentVariables();
env["MY_SERVER_API_KEY"] = apiKey; // add only what's needed

var transport = new StdioClientTransport(new StdioClientTransportOptions
{
    Command = "my-mcp-server",
    InheritEnvironmentVariables = false, // <-- this is the important bit
    EnvironmentVariables = env,
});

If you’re writing the server yourself and you know exactly what it needs, this feels like overkill. If you’re connecting to someone else’s server — treat it like any other third-party code and don’t hand it your credentials by default.

2. Streamable HTTP — What You Want in Production

Once your MCP server needs to live somewhere other than the developer’s laptop, Streamable HTTP is the answer. It’s the transport I reach for on every Azure deployment, and the one the SDK team clearly considers the future of the protocol.

Here’s how it works: the client sends an HTTP POST. The server holds that POST response body open as an SSE stream and writes the JSON-RPC response through it — along with any intermediate messages like progress updates. So you get the reliability and auth ecosystem of HTTP, with real-time streaming baked in. Clever.
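If you have never looked at an SSE body, this is roughly what a client has to do with that open response. The sketch below is a hand-rolled parser over a hard-coded string, not the SDK’s implementation, and the two JSON-RPC messages in the sample body are invented for illustration: a progress notification followed by the final result for request 1.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Parse a text/event-stream body into the "data" payload of each event.
// An event ends at a blank line; multiple data lines are joined with '\n'.
static List<string> ReadSseData(TextReader reader)
{
    var events = new List<string>();
    var data = new StringBuilder();
    bool hasData = false;
    string? line;
    while ((line = reader.ReadLine()) is not null)
    {
        if (line.Length == 0)
        {
            // Blank line: dispatch the accumulated event, if any.
            if (hasData) events.Add(data.ToString());
            data.Clear();
            hasData = false;
        }
        else if (line.StartsWith("data:", StringComparison.Ordinal))
        {
            if (hasData) data.Append('\n');
            string value = line.Substring(5);
            // One leading space after the colon is not part of the value.
            data.Append(value.StartsWith(' ') ? value.Substring(1) : value);
            hasData = true;
        }
        // Other fields (event:, id:, retry:) and comments are ignored here.
    }
    return events;
}

// Hypothetical response body for a single POST.
string body =
    "event: message\n" +
    "data: {\"jsonrpc\":\"2.0\",\"method\":\"notifications/progress\",\"params\":{\"progressToken\":\"t1\",\"progress\":50}}\n" +
    "\n" +
    "event: message\n" +
    "data: {\"jsonrpc\":\"2.0\",\"id\":1,\"result\":{\"content\":[]}}\n" +
    "\n";

List<string> messages = ReadSseData(new StringReader(body));
foreach (string message in messages) Console.WriteLine(message);
```

The intermediate notification and the final result arrive on the same response, in order, which is what lets a single POST carry progress updates.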

It comes in two flavours and the choice matters a lot for your deployment architecture.

Stateless mode — what I use by default

Every request is independent. No session tracking. No in-memory state between calls. This is gloriously simple to operate — deploy three instances behind Azure Front Door and every request can land on any instance. No sticky session configuration, no session replication, no “which node has this user’s state” debugging at 2am.

The trade-off: the server can’t send unsolicited messages to the client. If your tools are pure request-response — client asks, server answers — stateless is perfect and I’d argue it’s the right default for most tool-use scenarios.

Stateful mode — when you need server push

If your server needs to push notifications mid-conversation — progress updates on a long-running job, real-time alerts, streaming intermediate results — you need stateful mode. The server issues an Mcp-Session-Id header after the first request and tracks per-session state in memory. Clients can also open long-lived GET requests to receive unsolicited notifications.

The cost is operational: you need session affinity at your load balancer, so that every request carrying a given Mcp-Session-Id reaches the replica that holds that session’s state. On Azure Container Apps this means enabling sticky sessions on the ingress. Not complicated, but it’s a constraint you need to plan for.
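Here is a toy simulation of why affinity matters, with no real networking or SDK code. Three dictionaries stand in for three replicas’ in-memory session tables, the session ID is a made-up value, and Route is a hypothetical stand-in for whatever your load balancer does to pin a session to a replica.

```csharp
using System;
using System.Collections.Generic;

// Map a session ID to a replica index with a simple deterministic hash.
static int Route(string id, int replicaCount)
{
    int hash = 0;
    foreach (char c in id) hash = unchecked(hash * 31 + c);
    return (hash & int.MaxValue) % replicaCount;
}

// Three replicas, each with its own private in-memory session table.
var replicas = new[]
{
    new Dictionary<string, string>(),
    new Dictionary<string, string>(),
    new Dictionary<string, string>()
};

string sessionId = "session-abc123"; // hypothetical Mcp-Session-Id value
replicas[Route(sessionId, replicas.Length)][sessionId] = "per-session state";

// Round-robin: follow-up requests rotate across all replicas.
int roundRobinHits = 0;
for (int request = 0; request < 6; request++)
    if (replicas[request % replicas.Length].ContainsKey(sessionId)) roundRobinHits++;

// Affinity: every follow-up request goes to the replica that owns the session.
int stickyHits = 0;
for (int request = 0; request < 6; request++)
    if (replicas[Route(sessionId, replicas.Length)].ContainsKey(sessionId)) stickyHits++;

Console.WriteLine($"round-robin: {roundRobinHits}/6 requests found the session");
Console.WriteLine($"sticky:      {stickyHits}/6 requests found the session");
```

With round-robin only a third of the follow-up requests land on the replica that knows the session. The rest would fail as unknown sessions.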

Server code

dotnet add package ModelContextProtocol.AspNetCore

Stateless (the default I recommend):

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddMcpServer()
    .WithHttpTransport(options =>
    {
        options.Stateless = true;
    })
    .WithTools<MyTools>();

var app = builder.Build();
app.MapMcp("/mcp"); // serves Streamable HTTP at /mcp (with no argument, MapMcp maps the app root)
app.Run();

Stateful (when you need server push):

builder.Services.AddMcpServer()
    .WithHttpTransport(options =>
    {
        options.Stateless = false;
    })
    .WithTools<MyTools>();

Client

// AutoDetect tries Streamable HTTP first, falls back to SSE if needed
var transport = new HttpClientTransport(new HttpClientTransportOptions
{
    Endpoint = new Uri("https://my-mcp-server.example.com/mcp")
    // TransportMode defaults to AutoDetect
});

await using var client = await McpClient.CreateAsync(transport);

Session resumption — more useful than it sounds

In stateful mode, if the connection drops mid-conversation, you don’t have to start from scratch. Store the session ID and server info after connecting, then resume:

// Save these after first connection
string savedSessionId = client.SessionId;
McpServerCapabilities savedCaps = client.ServerCapabilities;
McpImplementation savedServerInfo = client.ServerInfo;

// On reconnect
var transport = new HttpClientTransport(new HttpClientTransportOptions
{
    Endpoint = new Uri("https://my-mcp-server.example.com/mcp"),
    KnownSessionId = savedSessionId
});

await using var client = await McpClient.ResumeSessionAsync(transport, new ResumeClientSessionOptions
{
    ServerCapabilities = savedCaps,
    ServerInfo = savedServerInfo
});

For long-running agent workflows — the kind where the model is orchestrating a multi-step task over several minutes — this is the difference between a graceful reconnect and a broken session that kills the job.

If you have browser clients (CORS)

Most MCP clients aren’t browsers, but if yours is, you’ll need CORS. One thing worth emphasising: CORS is not a substitute for host name validation — you need both. The CORS config tells browsers which origins are allowed; host name validation protects against DNS rebinding attacks from non-browser clients.

var allowedOrigins = builder.Configuration
    .GetSection("Mcp:AllowedOrigins")
    .Get<string[]>() ?? ["http://localhost:5173"];

builder.Services.AddCors(options =>
{
    options.AddPolicy("McpBrowserClient", policy =>
    {
        policy.WithOrigins(allowedOrigins)
            .WithMethods("POST", "GET", "DELETE")
            .WithHeaders("Content-Type", "Authorization",
                         "MCP-Protocol-Version", "Mcp-Session-Id")
            .WithExposedHeaders("Mcp-Session-Id");
    });
});

app.UseCors();
app.MapMcp("/mcp").RequireCors("McpBrowserClient");
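The host name validation half can be sketched as a plain allowlist check on the Host header. ASP.NET Core has built-in host filtering (the AllowedHosts setting), so treat this as an illustration of the idea rather than something to ship. The host names in the allowlist are assumptions for the example.

```csharp
using System;
using System.Collections.Generic;

// Remove an optional ":port" suffix, leaving bracketed IPv6 literals intact.
static string StripPort(string host)
{
    if (host.StartsWith('['))
    {
        int close = host.IndexOf(']');
        return close > 0 ? host.Substring(0, close + 1) : host;
    }
    int colon = host.LastIndexOf(':');
    return colon >= 0 ? host.Substring(0, colon) : host;
}

// True only when the Host header names a host this server expects to serve.
static bool IsAllowedHost(string? hostHeader, HashSet<string> allowed) =>
    !string.IsNullOrWhiteSpace(hostHeader) &&
    allowed.Contains(StripPort(hostHeader.Trim()));

// Hypothetical allowlist; in a real app this would come from configuration.
var allowedHosts = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "localhost", "127.0.0.1", "[::1]", "my-mcp-server.example.com"
};

Console.WriteLine(IsAllowedHost("localhost:5000", allowedHosts));
Console.WriteLine(IsAllowedHost("attacker.example.net", allowedHosts));
```

A rebinding attack relies on the request arriving with the attacker’s host name even though it resolved to your machine, so rejecting unknown Host values closes that door regardless of what the CORS policy says.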

3. SSE — Use It Only If You Have To

I’ll be straight with you: I wouldn’t choose SSE for anything new. The SDK marks EnableLegacySse as obsolete with diagnostic MCP9004, and that label exists for a reason.

SSE splits communication across two endpoints: your server streams to the client over a /sse connection, and the client sends messages back via HTTP POST to a /message endpoint. It was the original remote transport before Streamable HTTP arrived, and it has a structural flaw that’s hard to work around: the POST endpoint returns HTTP 202 before your handler even runs, which means there’s no backpressure. Under load, requests pile up and you can’t signal to the client that the server is overwhelmed.
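As an analogy for the backpressure problem (this is not how the SDK is implemented), compare an unbounded queue with a bounded one using System.Threading.Channels. The unbounded channel behaves like an endpoint that always answers 202: every write is accepted no matter how far behind the consumer is. The capacity of 10 and the 1000 writes are arbitrary numbers for the demo.

```csharp
using System;
using System.Threading.Channels;

// Unbounded queue: every write is accepted, like a POST that always returns 202.
var unbounded = Channel.CreateUnbounded<int>();
int acceptedUnbounded = 0;
for (int i = 0; i < 1000; i++)
    if (unbounded.Writer.TryWrite(i)) acceptedUnbounded++;

// Bounded queue: once 10 items are waiting, further writes are refused,
// which is the signal a caller can use to slow down.
var bounded = Channel.CreateBounded<int>(10);
int acceptedBounded = 0;
for (int i = 0; i < 1000; i++)
    if (bounded.Writer.TryWrite(i)) acceptedBounded++;

Console.WriteLine($"unbounded accepted {acceptedUnbounded}, bounded accepted {acceptedBounded}");
```

Nothing is reading from either channel here, which is the overloaded-server case. Only the bounded one gives the producer any hint that it should back off.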

The only reason to use SSE today is compatibility. Some older MCP clients — early Claude Desktop builds, some third-party tools — only speak SSE. If you need to support them alongside newer clients, run SSE alongside Streamable HTTP and let clients self-select.

Client

var transport = new HttpClientTransport(new HttpClientTransportOptions
{
    Endpoint = new Uri("https://my-mcp-server.example.com/sse"),
    TransportMode = HttpTransportMode.Sse,
    MaxReconnectionAttempts = 5,
    DefaultReconnectionInterval = TimeSpan.FromSeconds(1)
});

Server — note the pragma

builder.Services.AddMcpServer()
    .WithHttpTransport(options =>
    {
        options.Stateless = false; // SSE requires stateful mode

#pragma warning disable MCP9004
        options.EnableLegacySse = true;
#pragma warning restore MCP9004
    })
    .WithTools<MyTools>();

The fact that you need a pragma suppression to enable it should tell you everything about the SDK team’s intentions here.

4. In-Memory — Your Testing Best Friend

This one doesn’t get talked about enough. The in-memory transport connects a client and server using System.IO.Pipelines inside the same process: no network, no ports, no infrastructure. Messages are still serialised as JSON-RPC, so the full protocol path runs. For unit and integration tests it’s genuinely great.

What I like about it: your tests exercise the real MCP protocol, not mocks. Tool registration, schema generation, capability negotiation — all of it runs for real, just without a network between the two sides. I caught a schema mismatch bug in a tool’s [Description] attribute using an in-memory test that I’d completely missed in manual testing.

using System.IO.Pipelines;
using ModelContextProtocol.Client;
using ModelContextProtocol.Protocol;
using ModelContextProtocol.Server;

Pipe clientToServer = new(), serverToClient = new();

await using var server = McpServer.Create(
    new StreamServerTransport(
        clientToServer.Reader.AsStream(),
        serverToClient.Writer.AsStream()),
    new McpServerOptions
    {
        ToolCollection =
        [
            McpServerTool.Create(
                (string message) => $"Echo: {message}",
                new() { Name = "echo" })
        ]
    });

_ = server.RunAsync();

await using var client = await McpClient.CreateAsync(
    new StreamClientTransport(
        clientToServer.Writer.AsStream(),
        serverToClient.Reader.AsStream()));

var tools = await client.ListToolsAsync();
var echo = tools.First(t => t.Name == "echo");
Console.WriteLine(await echo.InvokeAsync(new() { ["message"] = "Hello, MCP!" })); // key must match the lambda's parameter name
// Output: Echo: Hello, MCP!

One limitation worth knowing: in-memory tests won’t catch network-specific failure modes — timeouts, dropped connections, serialisation edge cases from real socket buffers. Use in-memory for protocol correctness tests, and add a handful of real HTTP integration tests for your production transport paths.

Which MCP Transport Should You Use?

Here’s the decision in plain terms. I’ve seen people overthink this — pick the one that fits your deployment and move on.

| Your situation | Use this |
|---|---|
| Connecting to Claude Desktop, VS Code, or any local AI client | stdio |
| Remote server on Azure / AWS, tools are pure request-response | Streamable HTTP, stateless |
| Remote server, needs server push or long-running job tracking | Streamable HTTP, stateful |
| Supporting a legacy client that only speaks SSE | SSE (but plan to migrate) |
| Writing unit or integration tests | In-memory |
| Not sure, and want the client to handle it automatically | HttpTransportMode.AutoDetect on the client |

One More Thing: Enterprise SSO with the ID-JAG Flow

If you’re deploying an MCP server inside a corporate environment with Okta or Entra ID, you’ll likely hit the question of how agents authenticate without requiring users to log in every time. The SDK has a built-in solution for this: the Identity Assertion Grant flow.

In short: it exchanges an OIDC ID token from your identity provider for an MCP access token via a two-step RFC-standard token exchange. Your agent stays authenticated, the access token is cached until expiry, and you call InvalidateCache() if you get a 401 and need to refresh.

using ModelContextProtocol.Authentication;

var provider = new IdentityAssertionGrantProvider(
    new IdentityAssertionGrantProviderOptions
    {
        ClientId = "mcp-client-id",
        IdpTokenEndpoint = "https://company.okta.com/oauth2/token",
        IdpClientId = "idp-client-id",
        // This callback retrieves the current user's ID token from your SSO client
        IdTokenCallback = (context, cancellationToken) =>
            mySsoClient.GetIdTokenAsync(cancellationToken)
    },
    new HttpClient());

var tokens = await provider.GetAccessTokenAsync(
    resourceUrl: new Uri("https://mcp-server.example.com"),
    authorizationServerUrl: new Uri("https://auth.mcp-server.example.com"),
    cancellationToken: ct);

// On 401: provider.InvalidateCache() forces a fresh exchange next call

I’ll cover enterprise auth in more depth in a dedicated post — there’s enough there to warrant its own treatment, especially for Azure-hosted scenarios with Entra ID.

Frequently Asked Questions

What is the default MCP transport in the C# SDK?

There isn’t one — you choose explicitly when configuring the server. The most common starting points are WithStdioServerTransport() for local tools and WithHttpTransport() from ModelContextProtocol.AspNetCore for anything deployed remotely.

Can I switch transports without rewriting my tools?

Yes, completely. The transport is configured in Program.cs; your [McpServerTool] methods don’t know or care which transport is active. I’ve migrated projects from stdio to Streamable HTTP in under 10 minutes — it’s genuinely just a config change.

What is the difference between stateless and stateful Streamable HTTP?

Stateless means each HTTP POST is a standalone request: no session and no shared state, so any instance can serve any request and you can scale out without affinity rules. Stateful means the server issues an Mcp-Session-Id and tracks per-session state in memory, which enables server-to-client push notifications and session resumption but requires sticky sessions at your load balancer.

Is SSE still supported in the MCP C# SDK?

Yes, but it’s marked obsolete (diagnostic MCP9004). The SDK team recommends Streamable HTTP for all new work. SSE is there for backwards compatibility with older clients and will likely be removed in a future major version.

What transport does Claude Desktop use?

stdio. Claude Desktop launches your server as a child process using the command defined in claude_desktop_config.json and communicates over stdin/stdout. This is why local MCP servers for Claude are so simple to set up — no networking involved at all.

Can I run stdio and Streamable HTTP on the same server?

Not in a single AddMcpServer() setup, but the practical pattern is to put your tool logic in a shared class library and have two thin host projects — one with stdio for local dev tooling, one with Streamable HTTP for production. The tool code is identical; only the entry point differs.


Transport decisions look simple on paper and get complicated fast when you’re deploying to Azure with sticky session requirements and corporate SSO in the way. If you’re navigating that and want a second opinion, find me on LinkedIn — I’m in Sydney and happy to talk it through.

This is part of my MCP server series: What Are MCP Servers? · Real-World MCP Case Study · Production MCP Deployment on Azure.

What Are MCP Servers? A Complete Guide for AI Engineers

If you’re building AI agents that need to interact with real systems — databases, APIs, internal tools — you’ve probably run into the same wall I did. I’m talking about the mess of hardcoded JSON schemas in system prompts, brittle API wrappers, and production incidents that happen when the LLM invents a parameter name that doesn’t exist. An MCP server (Model Context Protocol server) is the structured solution to that problem, and as an AI engineer working in Sydney, it’s now a core part of how I build agent systems.

Let me walk you through what MCP servers are, why they matter, and how to build one in C# using the official SDK — with real, working code.

The Problem an MCP Server Solves

When you build an AI agent that needs to do things — query a database, call an API, read a file — you have to teach the model what tools exist, what they accept, and what they return. The old approach was: write it into the system prompt and pray.

The problem is that system prompts don’t have a schema. If your tool expects a date in ISO 8601 format and the model sends "tomorrow", your app crashes. If you rename a parameter, you need to update prompts scattered across multiple deployments. There’s no contract between the AI and your code.
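The date case is easy to reproduce. This is a minimal sketch, assuming the tool wants a calendar date in yyyy-MM-dd form; IsIsoDate is a hypothetical helper, not part of any SDK.

```csharp
using System;
using System.Globalization;

// Accept only a strict ISO 8601 calendar date such as 2025-03-14.
static bool IsIsoDate(string input) =>
    DateOnly.TryParseExact(input, "yyyy-MM-dd", CultureInfo.InvariantCulture,
                           DateTimeStyles.None, out _);

Console.WriteLine(IsIsoDate("2025-03-14"));
Console.WriteLine(IsIsoDate("tomorrow"));
```

Without a declared schema, a check like this is the only thing standing between a creative model and your parsing code, and it has to be repeated for every parameter of every tool.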

MCP fixes this by defining a standard protocol — a contract — between AI models (clients) and the external capabilities they need (servers). An MCP server advertises what tools it offers, what inputs each tool expects, and what it returns. The model calls tools through MCP the same way a browser calls APIs through HTTP: with a defined interface that both sides agree on in advance.

This isn’t just theoretical tidiness. It’s what makes AI agents maintainable in production.

How an MCP Server Works

Think of MCP like a USB-C standard for AI tools. Before USB-C, every device had its own charger. After, one cable works everywhere. MCP does the same thing for AI capabilities.

Here’s the flow when an AI agent uses an MCP tool:

  1. The agent (MCP client) connects to your MCP server at startup.
  2. After the initial handshake, the client asks for the tool list and the server returns a capabilities manifest: the tools it provides, with their names, descriptions, and input schemas.
  3. The agent’s host (Claude, ChatGPT, your own orchestrator) reads this manifest and makes the tools available to the model.
  4. When the model decides to call a tool, it sends a structured request through the client.
  5. Your server executes the tool and returns a structured response.
  6. The model reads the result and continues reasoning.

The beauty of step 2 is that the model now has a machine-readable description of your tools — not a paragraph of English in a system prompt, but a typed schema. Fewer hallucinations. Cleaner architecture. Tools you can version and test independently.

The Three Things an MCP Server Can Expose

MCP servers can expose three types of capabilities:

Tools — Functions the model can call. Think of these as your API endpoints. A tool takes typed inputs and returns a result. This is what most people think of when they hear “MCP server.”

Resources — Read-only data the model can access. Like files, database records, or live feeds. The model can browse available resources and read their contents.

Prompts — Reusable prompt templates the model can invoke. Useful for standardising how certain tasks are framed across your application.

For most enterprise use cases, you’ll mostly be building Tools. Resources matter when you want the model to explore data rather than query it directly. Prompts are useful in multi-agent workflows.

Architecture Overview

Before we touch code, here’s how the pieces connect. The diagram shows the three zones: the AI client (Claude Desktop or your app), the MCP server (our C# application), and the backend systems it talks to.

Figure: MCP server architecture (AI client → JSON-RPC protocol → C# server → backend systems).

Building a Real MCP Server in C#

Enough theory. Let me show you how this looks in code using the official MCP C# SDK (ModelContextProtocol), which Microsoft co-maintains with Anthropic. This is production-ready code — not a toy example.

Step 1: Install the packages

dotnet add package ModelContextProtocol
dotnet add package Microsoft.Extensions.Hosting

This gives you the full MCP server runtime with the .NET generic host, dependency injection, and logging baked in.

Step 2: Project structure

MyMcpServer/
├── Program.cs
├── Tools/
│   └── ProductTools.cs
└── MyMcpServer.csproj

Step 3: Wire up the host in Program.cs

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using MyMcpServer.Tools;

var builder = Host.CreateApplicationBuilder(args);

builder.Services.AddMcpServer()
    .WithStdioServerTransport()   // Claude Desktop and most MCP clients use stdio
    .WithTools<ProductTools>();   // Register your tool class

// Log to stderr — stdout must stay clean for the MCP protocol
builder.Logging.AddConsole(options =>
{
    options.LogToStandardErrorThreshold = LogLevel.Trace;
});

// Your tools get full DI — inject anything here
builder.Services.AddSingleton<IProductRepository, SqlProductRepository>();

await builder.Build().RunAsync();

Three things worth noting. First, AddMcpServer() wires up the entire MCP runtime — you write zero protocol handling. Second, WithStdioServerTransport() is for local Claude Desktop use; for Azure deployments you swap it for the ModelContextProtocol.AspNetCore HTTP transport. Third, your tool classes get full dependency injection — inject your database, HTTP clients, loggers, anything.

Step 4: Define your tools with attributes

using ModelContextProtocol;
using ModelContextProtocol.Server;
using System.ComponentModel;

namespace MyMcpServer.Tools;

[McpServerToolType]
public sealed class ProductTools
{
    private readonly IProductRepository _repo;

    public ProductTools(IProductRepository repo)
    {
        _repo = repo;
    }

    [McpServerTool, Description("Search for products by name or SKU. Returns matching products with current stock levels.")]
    public async Task<string> SearchProducts(
        [Description("The product name or SKU to search for.")] string query,
        [Description("Maximum number of results to return. Defaults to 10.")] int limit = 10)
    {
        var results = await _repo.SearchAsync(query, limit);

        if (!results.Any())
            return $"No products found matching '{query}'.";

        return string.Join("\n---\n", results.Select(p =>
            $"SKU: {p.Sku}\nName: {p.Name}\nStock: {p.StockLevel}\nPrice: {p.Price:C}"));
    }

    [McpServerTool, Description("Get detailed information about a specific product by its SKU.")]
    public async Task<string> GetProductDetails(
        [Description("The product SKU (e.g. PROD-12345).")] string sku)
    {
        var product = await _repo.GetBySkuAsync(sku);

        if (product is null)
            return $"No product found with SKU '{sku}'.";

        return $"""
            SKU: {product.Sku}
            Name: {product.Name}
            Description: {product.Description}
            Stock Level: {product.StockLevel} units
            Price: {product.Price:C}
            Category: {product.Category}
            Last Updated: {product.UpdatedAt:yyyy-MM-dd HH:mm} UTC
            """;
    }
}

The [McpServerTool] attribute registers the method as an MCP tool. The [Description] attributes on both the method and its parameters are emitted into the tool’s schema alongside the parameter types, and that schema is what the AI model reads in place of fragile English paragraphs in your system prompt. If the description is precise, the model is far more likely to use the tool correctly. If the parameter description specifies the expected format, the model has that format in front of it on every call.

What the AI Model Actually Sees

When Claude or any MCP client connects to this server, it automatically receives a capabilities manifest generated from your C# code:

{
  "tools": [
    {
      "name": "SearchProducts",
      "description": "Search for products by name or SKU. Returns matching products with current stock levels.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "query": {
            "type": "string",
            "description": "The product name or SKU to search for."
          },
          "limit": {
            "type": "integer",
            "description": "Maximum number of results to return. Defaults to 10.",
            "default": 10
          }
        },
        "required": ["query"]
      }
    }
  ]
}

The SDK generates this JSON schema from your C# method signatures automatically. No manual schema writing. No JSON maintenance. Rename a parameter in C#, the schema updates. Add a new tool method, it appears in the manifest. This is the developer experience that was missing before MCP.
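Because the manifest is ordinary JSON Schema, a client or a test can check a call against it before anything reaches your tool. The sketch below uses only System.Text.Json and a trimmed copy of the SearchProducts input schema; MissingRequired is a hypothetical helper, and it checks only the required list, not types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Return the names of required arguments that a tool call is missing.
static List<string> MissingRequired(JsonElement inputSchema,
                                    IReadOnlyDictionary<string, object> arguments)
{
    if (!inputSchema.TryGetProperty("required", out JsonElement required))
        return new List<string>();
    return required.EnumerateArray()
        .Select(item => item.GetString()!)
        .Where(name => !arguments.ContainsKey(name))
        .ToList();
}

// Trimmed copy of the SearchProducts input schema.
string schemaJson = """
{
  "type": "object",
  "properties": {
    "query": { "type": "string" },
    "limit": { "type": "integer", "default": 10 }
  },
  "required": ["query"]
}
""";
using JsonDocument schema = JsonDocument.Parse(schemaJson);

var good = new Dictionary<string, object> { ["query"] = "PROD-1" };
var bad = new Dictionary<string, object> { ["limit"] = 5 };

Console.WriteLine(MissingRequired(schema.RootElement, good).Count);
Console.WriteLine(string.Join(",", MissingRequired(schema.RootElement, bad)));
```

The second call is rejected because query is missing, which is exactly the kind of mistake that used to surface as a runtime crash.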

Running It Locally with Claude Desktop

To test your MCP server with Claude Desktop, add it to your claude_desktop_config.json:

{
  "mcpServers": {
    "product-server": {
      "command": "dotnet",
      "args": ["run", "--project", "/path/to/MyMcpServer"]
    }
  }
}

Restart Claude Desktop, open a conversation, and ask: “What products do we have in the PROD-1 range?” Claude will automatically call SearchProducts, read the result, and answer from real data — not a hallucination.

That moment when you see the AI reach into your actual database and return a real answer is quietly satisfying in a way that’s hard to describe until you’ve experienced it.

When Should You Build an MCP Server?

Build one when:

  • You want the same capabilities available to multiple AI clients (Claude, Copilot, your internal tools) without duplicating integration code.
  • You’re building an AI agent that needs to interact with internal systems — databases, ERPs, CRMs — that aren’t publicly accessible.
  • You want to give your tools a proper versioned schema rather than relying on natural language in prompts.
  • You’re building a multi-agent system and need a reliable way for agents to share capabilities.

Don’t bother when you’re prototyping with a single model and a single capability. Use function calling directly. MCP pays off when you’re thinking about the long run — multiple clients, multiple tools, real maintainability.

The Bigger Picture

MCP is becoming the USB-C of AI tooling. Claude supports it natively. GitHub Copilot is adding support. OpenAI’s agent framework is converging on it. If you’re building AI systems that need to interact with the real world, this is the protocol worth learning now.

More importantly for us as engineers: it gives us a way to reason about AI tool integrations the same way we reason about APIs. Contract-first, typed, testable, versioned. That’s the kind of discipline that makes enterprise AI systems actually work in production.

I’ve been working with MCP servers on Azure-hosted agent systems, and the difference in maintainability compared to prompt-embedded tool descriptions is night and day. If you’re building anything beyond a demo, start here.


Working on an AI integration project in Sydney and want to talk through the architecture? Connect with me on LinkedIn — I’m always up for a conversation about what’s actually working in production.

Next up: How I Built an MCP Server for Enterprise Data — a Real-World Case Study, where I go deeper on authentication, error handling, and deploying to Azure Container Apps.

How to Configure Claude Code with Kimi K2, DeepSeek, and GLM: Complete WSL Setup Guide

Claude Code is a CLI tool that can be pointed at AI providers other than Anthropic’s default endpoints. This guide shows how to configure Claude Code for three of them (Kimi K2, DeepSeek, and GLM) under Windows Subsystem for Linux (WSL).

Prerequisites

  • Windows 10/11 with WSL installed
  • Claude Code CLI installed
  • API tokens for Kimi K2, DeepSeek, and/or GLM
  • Basic familiarity with bash commands

Why Use Multiple AI Providers with Claude Code?

Different AI providers offer unique advantages:

  • Kimi K2: Excellent for Chinese language processing and local deployment options
  • DeepSeek: Strong performance in coding tasks and mathematical reasoning
  • GLM: Optimized for conversational AI and general-purpose tasks

Step 1: Create Environment Files for Each AI Provider

First, we’ll create separate environment files for each AI provider to store their API configurations securely.

Creating the Kimi K2 Environment File

Create the ~/.claude-kimi-env file:

export ANTHROPIC_BASE_URL="https://api.moonshot.cn/anthropic"
export ANTHROPIC_AUTH_TOKEN="your_kimi_token_here"

Creating the DeepSeek Environment File

Create the ~/.claude-deepseek-env file:

export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
export ANTHROPIC_AUTH_TOKEN="your_deepseek_token_here"

Creating the GLM Environment File

Create the ~/.claude-glm-env file:

export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_AUTH_TOKEN="your_glm_token_here"

Step 2: Set Up Convenient Aliases

To make switching between AI providers seamless, we’ll create bash aliases that automatically load the correct environment and launch Claude Code.

Create or edit the ~/.bash_aliases file and add the following aliases:

# Custom aliases

# Claude GLM alias
alias claude-glm='source ~/.claude-glm-env && claude --dangerously-skip-permissions'

# Claude Kimi alias
alias claude-kimi='source ~/.claude-kimi-env && claude --dangerously-skip-permissions'

# Claude DeepSeek alias
alias claude-deepseek='source ~/.claude-deepseek-env && claude --dangerously-skip-permissions'

# Add more aliases below this line

Step 3: Ensure Aliases Load Automatically in WSL

For the aliases to work every time you start WSL, verify that your ~/.bashrc file includes the following lines (they should be there by default):

if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi

Step 4: Apply the Configuration

To use the new aliases immediately without restarting WSL, run:

source ~/.bash_aliases

How to Use Your New Claude Code Setup

Now you can easily switch between AI providers using simple commands:

  • claude-kimi – Launch Claude Code with Kimi K2
  • claude-deepseek – Launch Claude Code with DeepSeek
  • claude-glm – Launch Claude Code with GLM

Security Best Practices

Important Security Tips:

  • Never commit environment files to version control
  • Use strong, unique API tokens for each provider
  • Regularly rotate your API keys
  • Set appropriate file permissions: chmod 600 ~/.claude-*-env
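
To make the last tip easy to verify, here is a minimal Python sketch that reports whether an environment file is readable by anyone other than its owner. The helper name is_private is my own, and the check assumes the files live on the Linux filesystem inside WSL, where permission bits are enforced.

```python
import glob
import os
import stat


def is_private(path: str) -> bool:
    """Return True when group and others have no permissions on the file (e.g. mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any group/other bit means another account could read or change the token file.
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


if __name__ == "__main__":
    # Check every provider file that follows the ~/.claude-<name>-env naming used above.
    for env_file in glob.glob(os.path.expanduser("~/.claude-*-env")):
        status = "ok" if is_private(env_file) else "too open, run chmod 600"
        print(f"{env_file}: {status}")
```

Running it after chmod 600 should report every file as ok; anything else is worth fixing before you paste a real token in.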

Troubleshooting Common Issues

Aliases Not Working

If your aliases aren’t working after starting WSL:

  1. Check if ~/.bash_aliases exists
  2. Verify ~/.bashrc sources the aliases file
  3. Run source ~/.bashrc to reload the configuration

API Connection Issues

If you encounter API connection problems:

  • Verify your API tokens are correct
  • Check if the API endpoints are accessible from your network
  • Ensure the base URLs are properly formatted
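
For the "properly formatted" check, a small Python sketch like the one below can catch the most common slip-ups before you blame the provider. The function name check_base_url and the specific rules (https scheme, a host name, no whitespace) are my own assumptions about what a sane base URL looks like, not requirements published by any of the providers.

```python
from urllib.parse import urlparse


def check_base_url(url: str) -> list[str]:
    """Return a list of formatting problems found in a base URL (empty list means it looks fine)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("scheme should be https")
    if not parsed.netloc:
        problems.append("missing host name")
    # Stray spaces usually come from copy-pasting the URL out of a web page.
    if any(ch.isspace() for ch in url):
        problems.append("contains whitespace")
    return problems


if __name__ == "__main__":
    for candidate in ["https://api.z.ai/api/anthropic", "api.z.ai/api/anthropic"]:
        print(candidate, "->", check_base_url(candidate) or "looks fine")
```

This only validates the shape of the URL. Whether the endpoint is reachable from your network still needs a real request.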

Advanced Configuration Tips

Adding Model-Specific Parameters

You can extend your environment files to include model-specific parameters:

export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
export ANTHROPIC_AUTH_TOKEN="your_token"
export ANTHROPIC_MODEL="deepseek-chat"

Creating Project-Specific Configurations

For different projects, you might want different AI providers. Consider creating project-specific environment files and aliases.
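
One way to sketch the project-specific idea without more aliases is a tiny Python launcher that reads an environment file and starts Claude Code with those values. Everything here is an assumption for illustration: the function names, the idea of keeping a .claude-env file in the project folder, and the parser, which only understands simple export NAME="value" lines like the ones used in this guide.

```python
import os
import shlex
import subprocess


def load_env_file(path: str) -> dict[str, str]:
    """Parse simple `export NAME="value"` lines into a dict, skipping blanks and comments."""
    values: dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # shlex removes the quotes and drops anything after a '#'.
            tokens = shlex.split(line, comments=True)
            if tokens and tokens[0] == "export":
                tokens = tokens[1:]
            for token in tokens:
                name, sep, value = token.partition("=")
                if sep:
                    values[name] = value
    return values


def build_environment(env_file: str) -> dict[str, str]:
    """Copy the current environment and overlay the values from the env file."""
    merged = dict(os.environ)
    merged.update(load_env_file(env_file))
    return merged


def launch(env_file: str) -> int:
    """Start the claude CLI with the provider settings from env_file (hypothetical helper)."""
    return subprocess.run(["claude"], env=build_environment(env_file)).returncode
```

Calling launch(".claude-env") from a project folder would then pick up whichever provider that project's file points at, leaving your shell environment untouched.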

Conclusion

You’ve successfully configured Claude Code to work with multiple AI providers on Windows WSL. This setup gives you the flexibility to choose the best AI provider for each task while maintaining a consistent development workflow.

The combination of environment files and bash aliases provides a clean, secure, and efficient way to manage multiple AI provider configurations. Whether you’re working with Kimi K2’s Chinese language capabilities, DeepSeek’s coding expertise, or GLM’s conversational strengths, you can now switch between them effortlessly.

How I Set Up My AI Development Environment in WSL on Windows with an RTX 5070 Ti

Introduction

Over the past few weeks, I’ve been diving into AI model training and inference using open-source GPT-style models. I wanted a setup that could take advantage of my NVIDIA RTX 5070 Ti for faster experimentation, but still run inside WSL (Windows Subsystem for Linux) for maximum compatibility with Linux-based tools.

After a bit of trial and error — and a couple of GPU compatibility hurdles — I now have a fully working environment that runs Hugging Face models directly on my GPU. Here’s exactly how I did it.

1. Installing WSL and Preparing Ubuntu

I started by making sure WSL2 was installed and running an Ubuntu distribution:

wsl --install -d Ubuntu
wsl --set-default-version 2

Then I launched Ubuntu from the Start Menu, created my user, and updated everything:

sudo apt update && sudo apt -y upgrade

2. Enabling GPU Support in WSL

Since I wanted GPU acceleration, I installed the latest NVIDIA Game Ready/Studio Driver for Windows. This is important because WSL uses the Windows driver to expose the GPU inside Linux.

Inside WSL, I checked GPU visibility:

nvidia-smi

If you see your GPU listed, you’re good to go.

3. Installing Micromamba for Environment Management

I like to keep my AI experiments isolated in separate environments, so I use micromamba (a lightweight conda alternative).

First, I installed bzip2 (needed for extracting micromamba):

sudo apt install -y bzip2

Then downloaded and initialized micromamba:

cd ~
curl -L https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj
./bin/micromamba shell init -s bash -r ~/micromamba
exec $SHELL

4. Creating a Python Environment

I created an environment named llm with Python 3.11:

micromamba create -y -n llm python=3.11
micromamba activate llm

5. Installing PyTorch with RTX 5070 Ti Support

Here’s where I hit my first big roadblock. The PyTorch stable builds didn’t yet support my compute capability 12.0 (Blackwell architecture). The fix was to install the nightly cu128 build of PyTorch, which does include sm_120 support:

pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128

6. Installing AI Libraries

With PyTorch sorted, I installed the Hugging Face ecosystem and related tools:

pip install transformers datasets accelerate peft bitsandbytes trl sentencepiece evaluate

Here’s what each of those packages does:

Transformers – Hugging Face’s main library for working with pre-trained models (like GPT, BERT, etc.), including easy APIs for loading, running, and fine-tuning them.

Datasets – A fast, memory-efficient library for loading, processing, and sharing large datasets used in machine learning.

Accelerate – A tool from Hugging Face that makes it simple to run training across CPUs, GPUs, or multiple devices with minimal code changes.

PEFT (Parameter-Efficient Fine-Tuning) – A library for applying lightweight fine-tuning methods like LoRA so you can adapt large models without retraining all parameters.

Bitsandbytes – A library for quantizing models (e.g., 8-bit, 4-bit) to save memory and speed up inference/training, especially on GPUs.

TRL (Transformer Reinforcement Learning) – Hugging Face’s library for training transformer models with reinforcement learning techniques like RLHF (Reinforcement Learning from Human Feedback).

SentencePiece – A tokenizer library that helps split text into subword units, especially useful for multilingual and large-vocabulary models.

Evaluate – A library to easily compute machine learning metrics (like accuracy, BLEU, ROUGE, etc.) in a standardized way.

7. Testing GPU Inference

To confirm everything worked, I ran a small model on GPU:

from transformers import AutoTokenizer, pipeline

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tok = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model_id, tokenizer=tok, device_map="auto")

print(pipe("Explain LoRA in one sentence.", max_new_tokens=50)[0]["generated_text"])

The output came back quickly — and my GPU usage spiked in nvidia-smi — a great sign that everything was working.
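
If the model loads but generation fails with a CUDA error, it can help to check whether the installed PyTorch build was compiled for your GPU at all. The sketch below compares the device's compute capability with the architectures the build reports. The helper name build_supports_gpu is my own, and the exact-match rule is a simplification: it ignores cases where a build can still run on a newer GPU through compatible binaries or PTX.

```python
def build_supports_gpu(capability: tuple[int, int], arch_list: list[str]) -> bool:
    """Return True when the build lists the sm_XY architecture matching the device capability."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        if torch.cuda.is_available():
            capability = torch.cuda.get_device_capability(0)  # (12, 0) on an RTX 5070 Ti
            archs = torch.cuda.get_arch_list()  # architectures compiled into this build
            print(torch.__version__, capability, archs)
            print("build supports this GPU:", build_supports_gpu(capability, archs))
        else:
            print("CUDA is not visible to PyTorch.")
```

On my card the thing to look for is sm_120 in the architecture list; if it's missing, the build predates Blackwell support.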

8. Conclusion

With this setup, I can run and fine-tune open-source GPT models entirely on my RTX 5070 Ti inside WSL. It’s a clean, isolated environment that avoids Windows-specific headaches and keeps me close to the Linux ecosystem most AI tooling is built for.

If you’re working with a newer NVIDIA GPU, don’t be surprised if you need to grab nightly builds until stable releases catch up. Once you do, you’ll be able to enjoy the full speed of your hardware without leaving the comfort of Windows.

Fixing “spawn npx ENOENT” in Windows 11 When Adding MCP Server with Node/NPX

If you’re running into the error:

spawn npx ENOENT

while configuring an MCP (Model Context Protocol) server on Windows 11, you’re not alone. This error commonly appears when integrating tools like @upstash/context7-mcp using Node.js environments that rely on NPX, especially in cross-platform development.

This post explains:

  • What causes the “spawn npx ENOENT” error on Windows
  • The difference between two MCP server configuration methods
  • A working fix using cmd /c
  • Why this issue is specific to Windows

The Problem: “spawn npx ENOENT”

Using this configuration in your .mcprc.json or a similar setup:

{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp@latest"]
    }
  }
}

will cause the following error on Windows:

spawn npx ENOENT

ENOENT means Node.js could not find an executable named exactly npx to spawn. That can happen even when Node.js is installed and its folder is on the system PATH.

Root Cause: Windows vs Unix Shell Behavior

On Unix-like systems (macOS/Linux), npx is an executable script, so spawn can run it directly. Windows behaves differently:

  • On Windows, npm installs npx as a batch script (npx.cmd), not as a native .exe binary.
  • Batch scripts need a command interpreter (cmd.exe) to run them.
  • Node’s child_process.spawn does not invoke a shell unless the shell option is set, so a bare npx is never resolved to npx.cmd.

In the failing example, the system tries to invoke npx directly as if it were a standalone executable, which doesn’t work on Windows.

The Fix: Wrapping with cmd /c

This configuration solves the issue:

{
  "mcpServers": {
    "context7": {
      "command": "cmd",
      "args": [
        "/c",
        "npx",
        "-y",
        "@upstash/context7-mcp@latest"
      ]
    }
  }
}

Explanation

  • "cmd" invokes the Windows Command Prompt.
  • "/c" tells the shell to execute the command that follows.
  • The rest of the line (npx -y @upstash/context7-mcp@latest) is interpreted and executed properly by the shell.

This ensures that npx is resolved correctly and executed within a compatible environment.

Technical Comparison

Configuration Style | Works on Windows? | Shell Used? | Reason
"command": "npx" | No | No | Tries to execute npx directly without shell
"command": "cmd", "args": ["/c", "npx", ...] | Yes | Yes | Executes the command within the Windows shell, allowing proper resolution

Best Practices

When using Node.js-based CLI tools across platforms:

  • Wrap shell commands using cmd /c (Windows) or sh -c (Unix)
  • Avoid assuming that commands like npx are executable as binaries
  • Test your scripts in both Windows and Unix environments when possible
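
If you generate MCP configs for a team on mixed operating systems, the first tip can be automated. This is a minimal Python sketch, with function names of my own choosing, that emits the cmd /c wrapper on Windows and the plain npx entry everywhere else. It assumes the same mcpServers layout as the examples in this post.

```python
import json
import sys


def mcp_server_entry(package: str, platform: str = sys.platform) -> dict:
    """Build the command/args pair for an npx-based MCP server on the given platform."""
    npx_args = ["-y", package]
    if platform.startswith("win"):
        # Windows needs cmd.exe to interpret the npx.cmd batch script.
        return {"command": "cmd", "args": ["/c", "npx", *npx_args]}
    return {"command": "npx", "args": npx_args}


def mcp_config(name: str, package: str, platform: str = sys.platform) -> str:
    """Return a JSON document with a single server under the mcpServers key."""
    return json.dumps({"mcpServers": {name: mcp_server_entry(package, platform)}}, indent=2)


if __name__ == "__main__":
    print(mcp_config("context7", "@upstash/context7-mcp@latest"))
```

Passing platform explicitly (for example "win32" or "linux") lets you generate both variants from one machine.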

Conclusion

If you’re encountering the spawn npx ENOENT error when configuring MCP servers on Windows 11, the fix is straightforward: use cmd /c to ensure shell interpretation. This small change ensures compatibility and prevents runtime errors across different operating systems.

Scraping JSON-LD from a Next.js Site with Crawl4AI: My Debugging Journey

Scraping data from modern websites can feel like a puzzle, especially when they’re built with Next.js and all that fancy JavaScript magic. Recently, I needed to pull some product info—like names, prices, and a few extra details—from an e-commerce page that was giving me a headache. The site (let’s just call it https://shop.example.com/products/[hidden-stuff]) used JSON-LD tucked inside a <script> tag, but my first attempts with Crawl4AI came up empty. Here’s how I cracked it, step by step, and got the data I wanted.

The Headache: Empty Results from a Next.js Page

I was trying to grab details from a product page—think stuff like the item name, description, member vs. non-member prices, and some category info. The JSON-LD looked something like this (I’ve swapped out the real details for a fake example):

{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Beginner’s Guide to Coffee Roasting",
  "description": "Learn the basics of roasting your own coffee beans at home. Recorded live last summer.",
  "provider": {
    "@type": "Organization",
    "name": "Bean Enthusiast Co."
  },
  "offers": [
    {"@type": "Offer", "price": 49.99, "priceCurrency": "USD"},
    {"@type": "Offer", "price": 59.99, "priceCurrency": "USD"}
  ],
  "skillLevel": "Beginner",
  "hasWorkshop": [
    {
      "@type": "WorkshopInstance",
      "deliveryMethod": "Online",
      "workshopSchedule": {"startDate": "2024-08-15"}
    }
  ]
}

My goal was to extract this, label the cheaper price as “member” and the higher one as “non-member,” and snag extras like skillLevel and deliveryMethod. Simple, right? Nope. My first stab at it with Crawl4AI gave me nothing—just an empty [].

What Went Wrong: Next.js Threw Me a Curveball

Next.js loves doing things dynamically, which means the JSON-LD I saw in my browser’s dev tools wasn’t always in the raw HTML Crawl4AI fetched. I started with this basic setup:

import json
from crawl4ai import AsyncWebCrawler
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

schema = {
    "name": "Product Schema",
    "baseSelector": "script[type='application/ld+json']",
    "fields": [{"name": "json_ld_content", "selector": "script[type='application/ld+json']", "type": "text"}]
}

async def extract_data(url):
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url, extraction_strategy=JsonCssExtractionStrategy(schema))
        extracted_data = json.loads(result.extracted_content)
        print(extracted_data)

# Output: []

Empty. Zilch. I dug into the debug output and saw the JSON-LD was in result.html, but result.extracted_content was blank. Turns out, Next.js was injecting that <script> tag after the page loaded, and Crawl4AI wasn’t catching it without some extra nudging.

How I Fixed It: A Workaround That Worked

After banging my head against the wall, I figured out I needed to make Crawl4AI wait for the JavaScript to do its thing and then grab the JSON-LD myself from the HTML. Here’s the code that finally worked:

import json
import asyncio
from crawl4ai import AsyncWebCrawler

async def extract_product_schema(url):
    async with AsyncWebCrawler(verbose=True, user_agent="Mozilla/5.0") as crawler:
        print(f"Checking out: {url}")
        result = await crawler.arun(
            url=url,
            js_code=[
                "window.scrollTo(0, document.body.scrollHeight);",  # Wake up the page
                "await new Promise(resolve => setTimeout(resolve, 5000));"  # Give it 5 seconds
            ],
            bypass_cache=True,
            timeout=30
        )

        if not result.success:
            print(f"Oops, something broke: {result.error_message}")
            return None

        # Digging into the HTML myself
        html = result.html
        start_marker = '<script type="application/ld+json">'
        end_marker = '</script>'
        marker_idx = html.find(start_marker)
        if marker_idx == -1:
            print("Couldn’t find the JSON-LD.")
            return None

        # Only add the marker length once we know the tag exists
        start_idx = marker_idx + len(start_marker)
        end_idx = html.find(end_marker, start_idx)
        if end_idx == -1:
            print("Couldn’t find the JSON-LD.")
            return None

        json_ld_raw = html[start_idx:end_idx].strip()
        json_ld = json.loads(json_ld_raw)

        # Sorting out the product details
        if json_ld.get("@type") == "Product":
            offers = sorted(
                [{"price": o.get("price"), "priceCurrency": o.get("priceCurrency")} for o in json_ld.get("offers", [])],
                key=lambda x: x["price"]
            )
            workshop_instances = json_ld.get("hasWorkshop", [])
            schedule = workshop_instances[0].get("workshopSchedule", {}) if workshop_instances else {}
            
            product_info = {
                "name": json_ld.get("name"),
                "description": json_ld.get("description"),
                "providerName": json_ld.get("provider", {}).get("name"),
                "memberPrice": offers[0] if offers else None,
                "nonMemberPrice": offers[-1] if offers else None,
                "skillLevel": json_ld.get("skillLevel"),
                "deliveryMethod": workshop_instances[0].get("deliveryMethod") if workshop_instances else None,
                "startDate": schedule.get("startDate")
            }
            return product_info
        print("No product data here.")
        return None

async def main():
    url = "https://shop.example.com/products/[hidden-stuff]"
    product_data = await extract_product_schema(url)
    if product_data:
        print("Here’s what I got:")
        print(json.dumps(product_data, indent=2))

if __name__ == "__main__":
    asyncio.run(main())

What I Got Out of It

{
  "name": "Beginner’s Guide to Coffee Roasting",
  "description": "Learn the basics of roasting your own coffee beans at home. Recorded live last summer.",
  "providerName": "Bean Enthusiast Co.",
  "memberPrice": {
    "price": 49.99,
    "priceCurrency": "USD"
  },
  "nonMemberPrice": {
    "price": 59.99,
    "priceCurrency": "USD"
  },
  "skillLevel": "Beginner",
  "deliveryMethod": "Online",
  "startDate": "2024-08-15"
}

How I Made It Work

  • Waiting for JavaScript: I told Crawl4AI to scroll and hang out for 5 seconds with js_code. That gave Next.js time to load everything up.
  • DIY Parsing: The built-in extractor wasn’t cutting it, so I searched the HTML for the <script> tag and pulled the JSON-LD out myself.
  • Price Tags: Sorted the prices and called the lowest “member” and the highest “non-member”, which seemed like a safe bet for this site.

What I Learned Along the Way

  • Next.js is Tricky: It’s not just about the HTML you get—it’s about what shows up after the JavaScript runs. Timing is everything.
  • Sometimes You Gotta Get Hands-On: When the fancy tools didn’t work, digging into the raw HTML saved me.
  • Debugging Pays Off: Printing out the HTML and extractor output showed me exactly where things were going wrong.
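
The string search worked for this page, but it only finds the first tag and breaks if the script tag carries extra attributes. As a sketch of a sturdier alternative, the standard library's HTMLParser can collect every JSON-LD block from the HTML you already have. The class and function names here are my own, and I haven't run this against the original site.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks instead of failing the whole page


def extract_json_ld(html: str) -> list:
    """Return all JSON-LD documents found in the HTML, in page order."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks
```

With that in place, something like products = [b for b in extract_json_ld(result.html) if isinstance(b, dict) and b.get("@type") == "Product"] would pick out the product block regardless of where it sits on the page.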
