Now onboarding design partners

Secure AI agents at runtime
not through a proxy.

InferenceFort runs inside the agent runtime to discover AI activity, enforce policy on model and tool calls, redact sensitive data, and produce audit evidence — across cloud and local models, with no proxy to route through.

Get early access · See how it works

Discover

Surface the agents, models, and tool calls running across your stack — cloud or local.

Secure

Inspect context and evaluate policy on model and tool calls — allow, redact, or block before they run.

Govern

Set identity-aware policy and budgets, and capture audit evidence for the calls you govern.

Orchestrates your security stack, in-process — Lakera Guard

One platform to govern all your AI models & frameworks

⛓ LangChain ◈ LangGraph ⊞ CrewAI ◐ Anthropic ◯ OpenAI ⬢ Ollama ▣ vLLM ◆ Bedrock ✦ Vertex ✧ Groq ⌗ Splunk ⌑ Sentinel

The problem

AI agents outran your security stack.

01 · Discovery

You can't see what's running

Shadow AI ships to production with no registry, no review.
Local models (Ollama, vLLM) never touch the network — firewalls see nothing.
No inventory. Which agents? Which models? Whose? Nobody knows.

"We found out about the agent when the invoice arrived."

02 · Security

You can't stop an unsafe call

PII leaks. An SSN goes into GPT-4o with nothing checking it first.
Prompt injection turns an agent's own tools into an exfiltration channel.
DLP is blind. It sees encrypted HTTPS — it can't redact or block.

"The SSN reached the model before anyone could object."

03 · Governance

You can't set rules or prove them

No spend caps. A 3 AM retry loop burned $40,000 unnoticed.
No approvals. Finance okayed GPT-3.5; engineering shipped GPT-4o.
No audit trail. Who sent what to which model? No record exists.

"The security review asked for the audit trail. There wasn't one."

The solution

One panel that discovers, secures,
and governs the whole fleet.

InferenceFort runs inside the agent runtime. The moment a governed agent runs — registered or not, cloud or on-prem — it appears here: named, owned, checked against policy, and capped. This is what your security team sees.

app.inferencefort.com/agents Live

AI agent inventory All teams · 6 Cloud + on-prem Needs attention · 2

Spend today: $258 / $430 cap

Calls governed today: 1,204

Blocked / capped

Active agents

Agent | Owner | Model | Daily spend vs cap | Status
support-bot (Support) | Priya Shah, Customer Success | claude-sonnet-4 | $32 / $50 | Active
sales-copilot (Sales) | Sam Rivera, Revenue | gpt-4o | $47 / $50 | Near cap
billing-agent (Finance) | Alex Morgan, Finance Ops | gpt-4o | $50 / $50 | Capped · blocked
claims-bot (Claims) | Nina Kapoor, Insurance | claude-sonnet-4 | $18 / $40 | PHI redacted
research-bot (R&D) | Mei Lin, Research | claude-opus-4 | $61 / $120 | Active
etl-runner (Platform) | Data Team, Platform Eng | llama-3-70b · on-prem | $0.00 / unmetered | Local · governed
intern-agent (Engineering) | Jordan Park, Eng (intern) | gpt-4o | not approved | Blocked

7 agents across 6 teams · cloud + on-prem · each governed call attributed to a real human and checked against policy before it runs

Discover: Every agent surfaced, including shadow AI no one registered.

Attribute: Each one bound to a real owner, team, and model.

Enforce: Set spend caps and policy for the whole fleet from here.

Audit: Governed calls streamed to Splunk, Sentinel, or Elastic.

Everything in one package

Six capabilities. One install.

No "observe layer" sold separately, no "enforce tier" upgrade. Every capability runs in-process from the first governed call.

In-process enforcement

Policy decided inside the agent process, before the call runs. Explicit fail-open / fail-closed posture, zero hot-path network.

blocked → never left the process
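As a rough illustration of what an explicit fail-open / fail-closed posture means, here is a minimal sketch. The names `Posture`, `decide`, and the two example policies are hypothetical and are not InferenceFort's API; the point is only that the outcome of a policy-engine failure is a configured choice, not an accident.

```python
from enum import Enum
from typing import Callable


class Posture(Enum):
    FAIL_OPEN = "fail_open"      # allow the call if policy evaluation itself errors
    FAIL_CLOSED = "fail_closed"  # block the call if policy evaluation itself errors


def decide(evaluate: Callable[[str], bool], prompt: str, posture: Posture) -> bool:
    """Return True if the call may proceed (hypothetical helper)."""
    try:
        return evaluate(prompt)
    except Exception:
        # The policy engine failed; the configured posture decides the outcome.
        return posture is Posture.FAIL_OPEN


def no_ssn_policy(prompt: str) -> bool:
    """Toy local rule: refuse any prompt that mentions an SSN."""
    return "SSN" not in prompt


def broken_policy(prompt: str) -> bool:
    """Simulates an engine failure, e.g. an unreadable policy bundle."""
    raise RuntimeError("policy bundle unreadable")
```

A regulated workload would likely choose `FAIL_CLOSED`, while a latency-sensitive internal tool might accept `FAIL_OPEN`.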

Tool-call & MCP governance

The action layer: MCP server allow-lists, blocked and approval-gated tools, and the lethal-trifecta guard that stops injection-to-exfiltration on the third leg.

private + untrusted + exfil → block
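One way to picture the "third leg" rule is a per-session tracker that blocks the call that would complete the combination. This is a sketch under stated assumptions: the capability labels and the `Session` class are invented for illustration, and a real guard would need to derive such labels from tool metadata.

```python
from dataclasses import dataclass, field

# Hypothetical capability labels a tool registry might attach to each tool.
PRIVATE = "private_data"
UNTRUSTED = "untrusted_content"
EXFIL = "external_send"
TRIFECTA = {PRIVATE, UNTRUSTED, EXFIL}


@dataclass
class Session:
    # Trifecta legs this agent session has already touched.
    seen: set = field(default_factory=set)

    def check(self, tool_labels: set) -> str:
        """Return "block" if this call would complete the trifecta, else "allow"."""
        combined = self.seen | (tool_labels & TRIFECTA)
        if TRIFECTA <= combined:
            # Third leg: the call never runs, so it is not recorded as seen.
            return "block"
        self.seen = combined
        return "allow"
```

Any two legs are allowed in any order; only the call that adds the missing third is refused.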

User attribution on every event

Every LLM and tool call carries user × agent × end-customer × decision × cost. The audit answer to "who did that?" — no anonymous calls.

user × agent × customer × cost

Egress & data residency

Pin each provider to approved endpoints and regions. A prompt bound for a non-approved destination is stopped before connect — the data-transfer control, enforced.

openai → *.openai.azure.com only
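The endpoint pin shown above can be sketched as a hostname check that runs before any connection is opened. The `APPROVED_HOSTS` table and `egress_allowed` function are hypothetical names used for illustration; only the `*.openai.azure.com` pattern comes from the example above.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit

# Hypothetical allow-list: provider name -> permitted host patterns.
APPROVED_HOSTS = {"openai": ["*.openai.azure.com"]}


def egress_allowed(provider: str, url: str) -> bool:
    """Return True only if the URL's host matches a pattern approved for the provider."""
    host = (urlsplit(url).hostname or "").lower()
    patterns = APPROVED_HOSTS.get(provider, [])
    return any(fnmatch(host, pattern) for pattern in patterns)
```

Matching on the parsed hostname, not the raw URL string, avoids being fooled by an approved domain appearing in the path or query.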

PHI redaction & content rules

Safe Harbor identifiers redacted before the model ever sees the text. Substring and regex rules plus Lakera and Presidio detectors, verdicts fused.

SSN 123-45-6789 → [REDACTED]
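The regex-rule half of this can be sketched in a few lines. This covers only a single US SSN pattern and is an illustration, not the product's detector set; the Lakera and Presidio detectors mentioned above are not shown.

```python
import re

# Matches the common dashed SSN layout, e.g. 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Replace every dashed SSN in the text before it is sent to a model."""
    return SSN_PATTERN.sub("[REDACTED]", text)
```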

Multi-agent tracing & cost

One causal trace across agents, processes, and languages — supervisor→researcher fan-out as a DAG — with per-user, per-agent, per-customer spend and budget caps.

trace → span → agent graph
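To make "trace → span → agent graph" concrete, here is a small sketch of spans linked by parent id, rolled up into a graph and per-user spend. The `Span` fields and the sample `TRACE` are assumptions made for illustration, not the product's audit schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root of the trace
    agent: str
    user: str
    cost_usd: float


# Illustrative trace: one supervisor fanning out to two researcher calls.
TRACE = [
    Span("s1", None, "supervisor", "priya", 0.5),
    Span("s2", "s1", "researcher", "priya", 0.25),
    Span("s3", "s1", "researcher", "priya", 0.25),
]


def agent_graph(spans):
    """Map each parent span id to the ids of the spans it spawned."""
    graph = defaultdict(list)
    for span in spans:
        if span.parent_id is not None:
            graph[span.parent_id].append(span.span_id)
    return dict(graph)


def spend_by_user(spans):
    """Sum span costs per user."""
    totals = defaultdict(float)
    for span in spans:
        totals[span.user] += span.cost_usd
    return dict(totals)


def over_budget(spans, caps_usd):
    """Return users whose total spend exceeds their cap (no cap means unlimited)."""
    totals = spend_by_user(spans)
    return sorted(u for u, t in totals.items() if t > caps_usd.get(u, float("inf")))
```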

Why we stand out

Not a proxy. It runs inside your process.

A network gateway only sees encrypted traffic and never sees local models at all. InferenceFort evaluates policy on model and tool calls from inside the runtime — before they run — without rerouting traffic or sending prompts through anyone else's server.

Agent action A model or tool call fires inside your process.

Inspect context Identity, prompt, and tool + arguments.

Evaluate policy Local rules — no network round-trip.

Decision

Allow Redact Block

Log evidence Structured audit record for your SIEM.
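The five steps above can be sketched as one function: inspect the call, evaluate local rules, pick allow, redact, or block, and emit a metadata-only audit record. Everything here (`Call`, `govern`, the `BLOCKED_TOOLS` rule, the record fields) is a hypothetical stand-in, not the product's interface.

```python
import json
import re
from dataclasses import dataclass

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
BLOCKED_TOOLS = {"shell.exec"}  # hypothetical local rule


@dataclass
class Call:
    user: str
    agent: str
    tool: str
    prompt: str


def govern(call: Call):
    """Return (decision, outgoing_prompt, audit_json) for one call."""
    if call.tool in BLOCKED_TOOLS:
        decision, outgoing = "block", None
    elif SSN.search(call.prompt):
        decision, outgoing = "redact", SSN.sub("[REDACTED]", call.prompt)
    else:
        decision, outgoing = "allow", call.prompt
    # The audit record carries metadata only, never the prompt text.
    audit = json.dumps(
        {"user": call.user, "agent": call.agent, "tool": call.tool, "decision": decision},
        sort_keys=True,
    )
    return decision, outgoing, audit
```

Because the rules are evaluated locally, the decision is available before the model or tool call is made, and only the JSON record needs to leave the process.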

Network gateway / proxy

✕ Blind to local models (Ollama, vLLM)

✕ Blind to in-process LLM calls

✕ Round-trip to the cloud on every call

✕ Requires DNS / proxy re-routing

✕ Sees tokens, not users — no identity

✕ Your prompts pass through a third party

InferenceFort

✓ Every model — cloud, Ollama, on-prem, Bedrock

✓ One patch covers every provider & framework

✓ Policy runs in-process from a local cache

✓ pip install — zero infrastructure changes

✓ User × team × agent × cost on every record

✓ Traffic goes direct; audit is metadata only

Sovereign & on-prem AI

Sovereign AI, audit-ready:
EU AI Act · NIST 800-171 · HIPAA.

Ollama, vLLM, a fine-tuned Llama on your own GPUs — local models make no external call, so network gateways, CASBs, and DLP see nothing. InferenceFort runs inside the process: private models governed exactly like cloud ones.

Invisible to everyone else

A local model call never touches the wire — perimeter tools can't inspect what never leaves. We sit at the call, not the network.

on-prem = governed

Air-gap ready. Assessor ready.

Policy evaluates in-process from a cached bundle — enforcement works fully disconnected, in classified networks. Audit logs trace every action to a user, the control NIST SP 800-171 and CMMC assessments ask for first.

NIST 800-171 · CMMC · zero external dependency

One policy, cloud to on-prem

Same rules, same audit schema for GPT-5 and Llama on your GPUs. Lifetime event logging per EU AI Act Art. 12; endpoint pinning keeps data in-region for GDPR residency.

EU AI Act Art. 12 · GDPR Ch. V

Seamless Integration

Deploys as an SDK.
No per-call code.

Add one package and set one environment variable. Calls made through your LangChain, LangGraph, CrewAI, Bedrock, and Ollama stack are governed at runtime — developers don't touch each call site.

No per-call code — one package, one environment variable
Covers cloud models and on-prem / Ollama
Python SDK — LangChain, LangGraph, CrewAI, Bedrock, Vertex
Multi-tenant isolation, built for enterprise deployments

Get early access

terminal

# 1. add the package
$ pip install inferencefort

# 2. point it at your control plane
$ export INFERENCEFORT_KEY=your-team-key

# 3. run your app — calls are now governed at runtime
$ python app.py

✓ governing: ALLOW priya@ · BLOCK ssn-rule · CAP $50/day
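For readers wondering how governance can apply without per-call code, one common technique is to wrap a client method once at startup so every existing call site passes through a check. The sketch below shows that general pattern only; `FakeClient` and `install_governance` are invented for illustration and say nothing about how the inferencefort package is actually implemented.

```python
import functools


class FakeClient:
    """Stand-in for a provider SDK client; not a real library class."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


def install_governance(cls, method_name, check):
    """Wrap cls.method_name so `check(prompt)` runs before every call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def governed(self, prompt, *args, **kwargs):
        if not check(prompt):
            # The original method is never invoked for a blocked prompt.
            raise PermissionError("blocked by policy")
        return original(self, prompt, *args, **kwargs)

    setattr(cls, method_name, governed)
```

After a single `install_governance(FakeClient, "complete", check)` call, application code that already calls `client.complete(...)` is covered without modification.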

No per-call code changes for developers

Local policy decisions, no network round-trip

SIEM-ready audit evidence

SOC 2 on the roadmap

Your team already signs in through Okta: connect once, and governed calls map to a real person.

Still have questions?

We expected that.

The ones we get most often — answered as honestly as we can.

Do our prompts or customer data ever reach InferenceFort?

No. Your AI calls go directly from your application to the model — InferenceFort is not in the request path. It only receives the audit record you choose to send, which is metadata: who called, which model, what it cost, and the verdict. Sensitive data like PII can be redacted before the model itself sees it.

How do you find agents nobody told us about?

Once the SDK is installed, governance activates automatically for calls made through supported frameworks — so the moment an agent, registered or not, makes a model call, it appears in your inventory: named, attributed to an owner, with its model and spend. Shadow AI stops being invisible.

Can each team have its own models, caps, and policies?

Yes. Policies are identity-aware, so you can say "Sales can use these models up to $50/day, Finance these others" and have it enforced per user, team, and agent. Set it once in the panel; every agent picks it up automatically — no per-team rollout.
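A per-team rule like the one quoted above could be modeled as a lookup from team to allowed models and a daily cap. This is a minimal sketch: the `TeamPolicy` shape, the model assignments, and the `check` function are assumptions for illustration, and only the $50/day figure comes from the answer above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamPolicy:
    allowed_models: frozenset
    daily_cap_usd: float


# Hypothetical policy table; model assignments are made up for the example.
POLICIES = {
    "Sales": TeamPolicy(frozenset({"gpt-4o"}), 50.0),
    "Finance": TeamPolicy(frozenset({"claude-sonnet-4"}), 50.0),
}


def check(team: str, model: str, spent_today_usd: float, call_cost_usd: float) -> str:
    """Evaluate one call against the caller's team policy."""
    policy = POLICIES.get(team)
    if policy is None or model not in policy.allowed_models:
        return "block: model not approved"
    if spent_today_usd + call_cost_usd > policy.daily_cap_usd:
        return "block: daily cap reached"
    return "allow"
```

Unknown teams fall through to a block, which mirrors the "not approved" status a default-deny policy would produce.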

Does this cover on-prem and local models too?

On-prem is our strongest case. Local models (Ollama, vLLM, a Llama on your own GPUs) make no external call, so network gateways and DLP are blind to them — but InferenceFort runs in-process, so they're governed identically to cloud. Policy evaluates from a local cache with no round-trip, so enforcement works even fully air-gapped.

What does our team have to change to roll this out?

Almost nothing. There's no rebuild, no rerouting of traffic, and no per-team rollout — governance switches on across your existing AI agents, and your engineers keep working exactly as they do today.

You've seen the problem.
You know the fix.

We're working with a small group of design partners right now. If you're shipping AI agents and want governance developers actually deploy — let's talk.
