Secure the AI Stack Through Platform Engineering

As data and AI-driven organizations push the boundaries of innovation, platform engineering has emerged as a key enabler of speed, scale, and reliability. Whether you’re deploying microservices, data products, or advanced AI agents, the promise of self-service developer platforms is to make innovation repeatable and secure (see the last post for reference).

But speed without control is risky. The recent Docker security bulletin exposed a significant threat: thousands of unprotected MCP (Model Context Protocol) servers running in production across the internet. These insecure endpoints give attackers direct access to AI model internals, posing risks from model theft to poisoning attacks. But it’s not only the MCP servers that pose a threat. When OpenAI announced its ChatGPT Agents, Sam Altman said:

“We have built a lot of safeguards and warnings into it, and broader mitigations than we’ve ever developed before, from robust training to system safeguards to user controls, but we can’t anticipate everything. In the spirit of iterative deployment, we are going to warn users heavily and give users freedom to take actions carefully if they want to.”

Well, to me that doesn’t sound like security by design was a primary principle when building this. My friend Steve Jones wrote about it, digging in particular into the risks around subagents.

The problem with MCP: AI teams must treat security as a core principle

MCP (Model Context Protocol) servers sit at the heart of many AI and agentic systems. They manage model context switching, agent routing, and fine-tuning deployments, and they serve real-time inference APIs.
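
To make that concrete, here is a minimal MCP server sketch built on the official TypeScript SDK (@modelcontextprotocol/sdk); the tool name and payload are invented for illustration:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// A toy MCP server exposing a single tool to an agent.
const server = new McpServer({ name: "purchasing-tools", version: "0.1.0" });

// Hypothetical tool: look up a contract by ID from an internal system.
server.tool(
  "lookup-contract",
  { contractId: z.string() },
  async ({ contractId }) => ({
    content: [{ type: "text", text: `Contract ${contractId}: status=active` }],
  })
);

// stdio keeps the server local to one host process.
await server.connect(new StdioServerTransport());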

The problem? Many of these servers were:

  • Publicly accessible without authentication.
  • Lacking TLS or identity-aware access controls.
  • Hosting AI model metadata, prompts, logs, and routes - all unprotected.

In other words: many AI applications were potentially wide open to exploitation, or could themselves be used to launch attacks.

So how can we keep leveraging MCP functionality while ensuring security and compliance? To mitigate these risks, MCP servers must be properly secured and integrated into a broader platform engineering strategy.
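
What that can look like in practice: the sketch below puts an identity-aware guard in front of an MCP HTTP endpoint using Express and the jose library. The issuer URL, audience, and route are placeholders, and in production the listener would of course sit behind TLS:

```typescript
import express from "express";
import { createRemoteJWKSet, jwtVerify } from "jose";

const app = express();

// Signing keys of your OIDC provider (placeholder URL).
const jwks = createRemoteJWKSet(
  new URL("https://idp.example.com/.well-known/jwks.json")
);

// Reject any request to the MCP endpoint without a valid bearer token.
app.use("/mcp", async (req, res, next) => {
  const token = req.headers.authorization?.replace(/^Bearer /, "");
  if (!token) {
    res.status(401).send("missing bearer token");
    return;
  }
  try {
    await jwtVerify(token, jwks, {
      issuer: "https://idp.example.com",
      audience: "mcp-server",
    });
    next();
  } catch {
    res.status(403).send("invalid token");
  }
});

// ...mount the actual MCP request handler on /mcp behind the guard...

app.listen(8443);
```

Hand-writing this guard for every service does not scale, though. That is exactly where scaffolding comes in.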

Backstage Templates: Scaffolding Secure AI Development

One of the most powerful tools in a platform engineer’s toolkit is Backstage, an open-source framework for building an internal developer portal (IDP) within an organization. One of Backstage’s core features is software templates: customizable scaffolding mechanisms that let teams spin up new services, agents, pipelines, or data products with a single click - and, crucially, with security and compliance built in. You define a template with easy-to-follow input flows and wire in built-in and custom actions, such as the GitHub integration.
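
Templates themselves are YAML, but custom actions are plain TypeScript registered with the scaffolder backend. Here is a sketch of what a hardening action could look like, assuming @backstage/plugin-scaffolder-node and a made-up action ID and config format:

```typescript
import { createTemplateAction } from '@backstage/plugin-scaffolder-node';
import fs from 'fs-extra';
import path from 'path';

// Hypothetical custom action: writes a hardened MCP config into the
// scaffolded workspace so every new agent starts locked down.
export const secureMcpConfigAction = createTemplateAction<{ agentName: string }>({
  id: 'acme:mcp:secure-config',
  schema: {
    input: {
      type: 'object',
      required: ['agentName'],
      properties: {
        agentName: { type: 'string', title: 'Agent name' },
      },
    },
  },
  async handler(ctx) {
    const config = {
      name: ctx.input.agentName,
      auth: { mode: 'oidc' },     // no anonymous access
      tls: { enabled: true },     // no plaintext endpoints
      egress: { allowList: [] },  // deny-all until explicitly reviewed
    };
    await fs.writeJson(
      path.join(ctx.workspacePath, 'mcp.config.json'),
      config,
      { spaces: 2 },
    );
    ctx.logger.info(`Wrote hardened MCP config for ${ctx.input.agentName}`);
  },
});
```

A template’s steps then reference acme:mcp:secure-config exactly like any built-in action.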

Example: Scaffolding a Secure AI Agent with Backstage

Use case: A team wants to publish an AI agent to perform purchasing and contracting tasks using internal data and a fine-tuned LLM.

Using a Backstage software template, the team clicks “Create New Agent” in the internal developer portal. The template scaffolds:

  1. Code and Infrastructure Setup

    • Git repository with a standardized project structure
    • Predefined MCP configuration with access policy placeholders
    • Secure Dockerfile with non-root user and distroless base image
  2. Security by Default

    • Authentication via OIDC to control agent access
    • Automatic provisioning of secrets via Vault or Azure Key Vault
    • Network policy templates to restrict outgoing traffic
  3. DevSecOps Pipeline Integration

    • SBOM generation with tools like OWASP CycloneDX Generator (cdxgen)
    • Static security analysis and CVE scans in CI/CD
    • Auto-registration in a service catalog with ownership and compliance metadata
  4. Observability Hooks

    • Pre-integrated with observability platforms like Prometheus/Grafana or Datadog
    • Standard logs and traces for all inference traffic and config changes
    • Alerting on failed authentication or unexpected model behavior
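
As an illustration of that last point, the scaffolding can pre-wire a failed-authentication counter for platform alert rules to key on. A sketch using prom-client and Express; metric names and port are illustrative:

```typescript
import express from "express";
import { Counter, register } from "prom-client";

const app = express();

// Standard Prometheus scrape endpoint, registered before the auth guard
// so the scraper itself is not locked out.
app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", register.contentType);
  res.send(await register.metrics());
});

// Counts rejected requests so alert rules can fire on spikes.
const failedAuth = new Counter({
  name: "agent_auth_failures_total",
  help: "Requests rejected by the agent's auth layer",
  labelNames: ["route"],
});

// Simplified guard: the real template would verify an OIDC token here.
app.use((req, res, next) => {
  if (!req.headers.authorization) {
    failedAuth.inc({ route: req.path });
    res.status(401).send("unauthorized");
    return;
  }
  next();
});

app.listen(9464);
```

A single alert on rate(agent_auth_failures_total[5m]) then covers every agent created from the template.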

Why does this help both developers and IT security & SRE experts? By embedding security and operations into the scaffolding, developers don’t have to worry about best practices - they’re simply there.

Platform Engineering Principles in Action

This approach reflects three core principles of platform engineering:

  1. Golden Paths: Developers follow a guided, standardized path that eliminates cognitive overhead and reduces room for misconfiguration.
  2. Security as a Service: Identity, secrets, policy, and monitoring are exposed as reusable platform primitives.
  3. Self-Service, Not Self-Made: Developers can focus on innovation, while the platform team ensures that every new AI agent or data product is born secure.
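
The second principle, Security as a Service, is worth making concrete. Instead of every team hand-rolling secret handling, the platform can ship a tiny helper that all scaffolded services import. A sketch using node-vault against a Vault KV v2 mount; the mount point and secret path are hypothetical:

```typescript
import nodeVault from "node-vault";

// Client configuration is injected by the platform, never hard-coded.
const vault = nodeVault({
  endpoint: process.env.VAULT_ADDR, // e.g. https://vault.internal:8200
  token: process.env.VAULT_TOKEN,   // short-lived token from the workload identity
});

// Platform-provided primitive: read a secret from a KV v2 mount.
// KV v2 prefixes read paths with "data/" and nests the payload one level deep.
export async function getSecret(secretPath: string): Promise<Record<string, string>> {
  const result = await vault.read(`secret/data/${secretPath}`);
  return result.data.data;
}

// Hypothetical usage inside an AI agent:
// const { apiKey } = await getSecret("ai-agents/purchasing/llm");
```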

Key takeaways

  1. Templatize Trust: Don’t rely on developers to remember security; generate it by default. Use Backstage templates to enforce secure defaults for MCP, model APIs, and AI agents.
  2. Standardize Deployment Pipelines: Platform teams should ship pre-approved CI/CD pipelines with SBOM scanning, dependency checks, and audit-trail generation to ensure DevSecOps at scale.
  3. Secure the Entire AI Lifecycle: Treat MCP endpoints, prompt stores, vector databases, and LLM routing as production-critical infrastructure. Secure and monitor them as you would any core service.