Now onboarding design partners

Every AI agent.
Governed. Attributed. Audited.

The control plane for your AI fleet — identity-aware policy, cost attribution and audit for every LLM call across LangChain, LangGraph, CrewAI and your local models.

inferencefort · live governance pipeline — type a prompt, watch the verdict
Incoming LLM call
User context
user: priya@acme.com
team: cx-support
model: claude-sonnet-4
Pipeline
1. Identity resolved
   priya · cx-support · active
2. Policy: model + budget
   claude-sonnet ok · $3.20 / $50 today
3. Content rules
   substring · regex · pii
4. External detector
   lakera · presidio · content safety
5. Attribution log
   user × team × model × cost → SIEM
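
The pipeline stages can be read as a chain of checks that stops at the first denial. The following is a minimal sketch under that assumption, not InferenceFort's implementation: `Call`, `run_pipeline`, and both stage functions are hypothetical names, and the identity table and blocked term are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Call:
    user: str
    team: str
    model: str
    prompt: str
    trail: list = field(default_factory=list)  # names of the stages that ran


# A stage returns None to let the call through, or a reason string to block it.
Stage = Callable[[Call], Optional[str]]


def resolve_identity(call: Call) -> Optional[str]:
    # Hypothetical directory; a real deployment would ask the identity provider.
    known = {"priya@acme.com": "cx-support"}
    return None if known.get(call.user) == call.team else "unknown identity"


def check_content(call: Call) -> Optional[str]:
    # Stand-in for substring / regex / PII rules.
    return "blocked term" if "password" in call.prompt.lower() else None


def run_pipeline(call: Call, stages: list[Stage]) -> str:
    # Run stages in order and stop at the first one that returns a reason.
    for stage in stages:
        call.trail.append(stage.__name__)
        reason = stage(call)
        if reason is not None:
            return f"deny: {reason}"
    return "allow"
```

The `trail` list records which stages actually ran, so a denied call shows where it stopped.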
LangChain · LangGraph · CrewAI · Anthropic SDK · OpenAI Agents · Ollama · vLLM · Okta · Microsoft Entra · Splunk · Sentinel
Live attribution

Watch your AI fleet in real time.

Every call lands in the attribution log — who, which team, what model, what cost, what verdict. This is a simulation of the live stream.

Attribution stream
The attribution graph

One graph. Every question answered.

Users, teams, agents and models — connected in one queryable model. Click any node to trace its lineage and see what the graph knows about it.

What you get

Six layers. One coherent product.

Every layer reinforces the others. Remove one and the rest lose half their value.

Identity-aware governance

Every call bound to a real human from Okta or Entra. Policies in your language: sales-org → claude-sonnet, $50/day.
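
A policy such as "sales-org → claude-sonnet, $50/day" reduces to a model allow-list plus a daily budget per team. The sketch below assumes exactly that shape; `Policy`, `POLICIES`, and `decide` are illustrative names, not the product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    allowed_models: frozenset
    daily_budget_usd: float


# Hypothetical policy table keyed by team.
POLICIES = {"sales-org": Policy(frozenset({"claude-sonnet"}), 50.0)}


def decide(team: str, model: str, spent_today_usd: float,
           est_cost_usd: float) -> tuple[bool, str]:
    """Return (allowed, reason) for one call."""
    policy = POLICIES.get(team)
    if policy is None:
        return False, "no policy for team"
    if model not in policy.allowed_models:
        return False, "model not allowed"
    if spent_today_usd + est_cost_usd > policy.daily_budget_usd:
        return False, "daily budget exceeded"
    return True, "ok"
```

Teams without a policy are denied by default here; whether to default to deny or allow is a design choice the source does not specify.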

Attribution graph

One queryable model: user × team × agent × model × cost × findings. The schema nobody else has.
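
To make "one queryable model" concrete, here is a small sketch that treats each call as a flat event and groups cost by any dimension. The `Event` fields mirror the dimensions listed above; the class and the `cost_by` helper are assumptions for illustration, not the actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user: str
    team: str
    agent: str
    model: str
    cost_usd: float
    findings: int


def cost_by(events, dimension: str) -> dict:
    """Sum cost over events, grouped by one Event field name."""
    totals = defaultdict(float)
    for event in events:
        totals[getattr(event, dimension)] += event.cost_usd
    return dict(totals)
```

Calling `cost_by(events, "team")` or `cost_by(events, "model")` answers "who spent what" without changing the stored records.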

Credential gating

Provider keys live in your vault. Scoped, expiring credentials per user. Shadow AI surfaces as findings — not silent spend.
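
A scoped, expiring credential can be pictured as a random token bound to a user, a set of models, and an expiry time. This sketch assumes that structure; `ScopedCredential`, `issue`, and `is_valid` are hypothetical, and a real system would keep the provider key in the vault and exchange the token server-side.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedCredential:
    token: str
    user: str
    models: frozenset
    expires_at: float  # Unix timestamp


def issue(user, models, ttl_seconds, now=None):
    # `now` is injectable so expiry can be tested without sleeping.
    now = time.time() if now is None else now
    return ScopedCredential(secrets.token_urlsafe(32), user,
                            frozenset(models), now + ttl_seconds)


def is_valid(cred, model, now=None):
    # Valid only before expiry and only for models inside the scope.
    now = time.time() if now is None else now
    return now < cred.expires_at and model in cred.models
```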

Purpose-bound access

Coding seats used for coding. Marketing seats used for marketing. Soft-mode reporting first; hard-mode enforcement when you're ready.
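
The difference between soft and hard mode is whether a mismatch blocks the call or is only reported. A minimal sketch, assuming each seat and each call carries a purpose label (how a call's purpose is inferred is outside this example); `Mode` and `enforce` are illustrative names.

```python
from enum import Enum


class Mode(Enum):
    SOFT = "soft"  # report mismatches, let the call through
    HARD = "hard"  # report mismatches and block the call


def enforce(seat_purpose: str, call_purpose: str, mode: Mode, findings: list) -> bool:
    """Return True if the call may proceed; record a finding on mismatch."""
    if seat_purpose == call_purpose:
        return True
    findings.append(f"purpose mismatch: seat={seat_purpose} call={call_purpose}")
    return mode is Mode.SOFT
```

Both modes produce the same finding, so switching from soft to hard changes enforcement without changing what gets reported.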

Detector orchestration

Lakera, Presidio, Azure Content Safety, your own rules. Run them in parallel. Fuse verdicts. Same code, all backends.
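
Running detectors in parallel and fusing their verdicts might look like the sketch below. The two detectors are local stand-ins, not the Lakera, Presidio, or Azure clients, and the fusion rule (any flag blocks) is an assumption; a real deployment could weight or rank detectors differently.

```python
import asyncio


async def keyword_detector(text: str) -> dict:
    await asyncio.sleep(0)  # stands in for a network round trip
    flagged = "ignore previous instructions" in text.lower()
    return {"detector": "keyword", "flagged": flagged}


async def length_detector(text: str) -> dict:
    await asyncio.sleep(0)
    return {"detector": "length", "flagged": len(text) > 10_000}


async def run_detectors(text: str, detectors) -> dict:
    # gather() runs all detectors concurrently and preserves their order.
    results = await asyncio.gather(*(d(text) for d in detectors))
    flagged_by = [r["detector"] for r in results if r["flagged"]]
    # Assumed fusion rule: a single flag is enough to block.
    return {"verdict": "block" if flagged_by else "allow", "flagged_by": flagged_by}
```

Because every detector exposes the same async signature, adding a backend means appending one function to the list.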

SIEM-shaped audit

Every input, output, verdict, decision — one structured record. Flows into Splunk, Sentinel, Elastic. Your SOC sees AI in dashboards they already have.
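
As an illustration of "one structured record", the sketch below serializes a call to a single JSON line that a SIEM forwarder could ingest. The field names are assumptions, not InferenceFort's actual schema.

```python
import json
from datetime import datetime, timezone


def audit_record(user, team, model, prompt, output, verdict, cost_usd, ts=None):
    """Build one JSON line describing a governed call."""
    ts = ts or datetime.now(timezone.utc)
    record = {
        "ts": ts.isoformat(),
        "user": user,
        "team": team,
        "model": model,
        "input": prompt,
        "output": output,
        "verdict": verdict,
        "cost_usd": cost_usd,
    }
    # sort_keys keeps field order stable across records.
    return json.dumps(record, sort_keys=True)
```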

Install

Drop it in. Done.

Your platform team adds InferenceFort to the internal artifact repo. Developers never change a line of code. Every framework call routes through your governance pipeline — automatically.

  • Zero developer-side configuration
  • Works for cloud models and on-prem / Ollama
  • Python, TypeScript, Go SDKs
  • Multi-tenant, enterprise-grade isolation from day one
# requirements.txt — pinned by your platform team
inferencefort==1.0.0

# your code, unchanged
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate

llm = ChatAnthropic(model="claude-sonnet-4")
prompt = ChatPromptTemplate.from_template("{q}")
chain = prompt | llm

# governed automatically — policy, rules, audit
result = chain.invoke({"q": "summarize ticket #4521"})
<5ms
policy decision overhead
100%
framework calls covered
0
dev config changes
SOC 2
on the roadmap
Common questions

The honest answers

How is this different from Lakera or Microsoft AGT?
Lakera classifies content. Microsoft AGT enforces policy. Neither binds findings to identity, team, agent, or cost. InferenceFort orchestrates the detectors you already trust, enriches their calls with full agent context, and produces ranked findings with owners attached. We don't compete with them — we make them work.
Do you detect prompt injection yourselves?
No, and that's deliberate. Detection is a research problem with a moving target. You plug in Lakera, Presidio, Azure Content Safety, or your own rules. We orchestrate them, fuse their verdicts with your policy and identity, and route every decision into your SIEM.
Does this work for local models?
Yes. Because we sit at the framework layer — not the network — every ChatOllama, vLLM, or on-prem Llama call is governed just like a cloud call. Network gateways are blind to local traffic; we aren't.
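Framework-layer governance can be illustrated with a plain wrapper around the function that invokes the model. This is a simplified sketch, not how InferenceFort instruments LangChain; `governed`, `AUDIT_LOG`, and `local_model` are hypothetical.

```python
import functools

AUDIT_LOG = []


def governed(invoke):
    """Wrap a model-invoking function so every call is recorded first."""
    @functools.wraps(invoke)
    def wrapper(prompt, *, user, model):
        AUDIT_LOG.append({"user": user, "model": model, "prompt_chars": len(prompt)})
        return invoke(prompt, user=user, model=model)
    return wrapper


@governed
def local_model(prompt, *, user, model):
    # Stand-in for an on-prem model call; no network traffic is involved.
    return prompt.upper()
```

The wrapper sees the call whether or not it ever leaves the machine, which is the point of sitting in-process rather than on the network path.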
What does the developer have to change?
Nothing. Your platform team adds InferenceFort to your internal artifact repository once. From then on every pip install langchain in your org silently gets the instrumented build. Developers write the same code they would have written.
How do you handle multi-tenancy?
Strict tenant isolation enforced at the database layer, not just in application code. Shared cloud, dedicated cloud, or BYOC — same product, different deployment shape.
What about latency?
In-process policy decisions resolve in single-digit milliseconds. External detector calls run in parallel with their own timeout and fail-open/closed knob. Streaming, async, and batch invocations are all instrumented.
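The timeout and fail-open/closed knob can be sketched with `asyncio.wait_for`. The detectors here are stand-ins, and `call_with_timeout` with its `fail_open` flag is a hypothetical name for the setting described above.

```python
import asyncio


async def slow_detector(text: str) -> str:
    await asyncio.sleep(0.2)  # simulates a detector that responds too late
    return "allow"


async def fast_detector(text: str) -> str:
    return "block"


async def call_with_timeout(detector, text: str, timeout_s: float, fail_open: bool) -> str:
    try:
        return await asyncio.wait_for(detector(text), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The detector did not answer in time: fall back to the configured default.
        return "allow" if fail_open else "block"
```

A detector that answers in time always wins; the knob only decides what happens when it does not.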

Govern every prompt.
Sleep at night.

We're working with a small group of design partners through 2026. If you're running AI agents in production and want a real control plane, say hi.