Working draft — Sancto CS is finalizing the long version with redacted example policies from three of our clients.

Why AI startups specifically struggle

Standard SOC 2 templates assume your data goes to known sub-processors (AWS, Stripe, Datadog). AI products send customer data to OpenAI, Anthropic, Google, fine-tuning pipelines, vector DBs, and sometimes open-source models on Modal/Replicate. Every one of those is a separate sub-processor your auditor will ask about — and most teams haven't documented the data flow.

The other surprise: training data. If you've ever fine-tuned a model, the auditor wants to see how customer data was scrubbed, who approved the run, and whether any of that data leaked into a model that serves other customers.

Days 1–30: foundation

  • Pick a compliance automation platform (we recommend Vanta or Drata; Secureframe is fine too)
  • Document every sub-processor, including all LLM APIs, vector DBs, fine-tuning services
  • Draft data flow diagrams for each AI feature: input → preprocessing → model call → output → storage
  • Set up SSO, MFA everywhere, password manager rollout
  • Inventory all devices and enable automated patching
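
The sub-processor and data flow bullets above can live as structured data rather than a slide, so gaps show up mechanically. Below is a minimal sketch, not a prescribed format: the field names, the `dpa_on_file` flag, the feature name, and the vendors in the sample data are all illustrative assumptions. The five stage names come from the data flow bullet.

```python
from dataclasses import dataclass, field

# Stage names taken from the data flow bullet above.
STAGES = ("input", "preprocessing", "model call", "output", "storage")


@dataclass(frozen=True)
class SubProcessor:
    name: str               # vendor name
    purpose: str            # why customer data goes there
    data_categories: tuple  # hypothetical labels, e.g. ("prompts",)
    dpa_on_file: bool       # assumption: you track signed data processing agreements


@dataclass
class DataFlow:
    feature: str
    # Maps each stage to the names of the sub-processors that touch data there.
    stages: dict = field(default_factory=dict)


def find_gaps(flows, register):
    """Return readable gaps: undocumented stages, unregistered vendors, missing DPAs."""
    known = {sp.name: sp for sp in register}
    gaps = []
    for flow in flows:
        for stage in STAGES:
            if stage not in flow.stages:
                gaps.append(f"{flow.feature}: stage '{stage}' is undocumented")
        for stage, vendors in flow.stages.items():
            for vendor in vendors:
                if vendor not in known:
                    gaps.append(f"{flow.feature}: '{vendor}' ({stage}) is not in the register")
                elif not known[vendor].dpa_on_file:
                    gaps.append(f"{flow.feature}: no DPA on file for '{vendor}'")
    return gaps


# Illustrative data only; replace with your own register and features.
register = [
    SubProcessor("AWS", "hosting", ("all",), True),
    SubProcessor("OpenAI", "LLM inference", ("prompts",), False),
]
flows = [
    DataFlow("support-summarizer", {
        "input": ["AWS"],
        "preprocessing": ["AWS"],
        "model call": ["OpenAI"],
        "output": ["AWS"],
        "storage": ["AWS", "Pinecone"],
    }),
]
gaps = find_gaps(flows, register)
for gap in gaps:
    print(gap)
```

Running a check like this in CI is one way to keep the register from drifting as features add new vendors.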

Days 31–60: controls

  • Access reviews (quarterly) — automate via Vanta/Drata
  • Model governance policy: which models are approved, who approves fine-tuning, how PII is handled
  • Customer data isolation: each tenant's vectors in a separate namespace, never cross-contaminated
  • Logging and monitoring: who accessed what, when, and what the model returned
  • Incident response runbook — including "model returned wrong/harmful output"
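
For the tenant isolation bullet above, the property an auditor can verify is that no query path exists without a tenant ID. The sketch below uses an in-memory stand-in for a vector DB; `TenantScopedStore` and its methods are hypothetical names, not any vendor's API. Most hosted vector DBs offer a namespace or filter mechanism that could play the same role.

```python
import math
from collections import defaultdict


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TenantScopedStore:
    """In-memory stand-in for a vector DB with one namespace per tenant."""

    def __init__(self):
        # tenant_id -> {doc_id: vector}
        self._namespaces = defaultdict(dict)

    def upsert(self, tenant_id, doc_id, vector):
        if not tenant_id:
            raise ValueError("tenant_id is required for every write")
        self._namespaces[tenant_id][doc_id] = list(vector)

    def query(self, tenant_id, vector, top_k=3):
        """Search only the caller's namespace; there is no cross-tenant code path."""
        if not tenant_id:
            raise ValueError("tenant_id is required for every query")
        candidates = self._namespaces.get(tenant_id, {})
        scored = [(_cosine(vector, vec), doc_id) for doc_id, vec in candidates.items()]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:top_k]]


# Two tenants store identical vectors; each query sees only its own namespace.
store = TenantScopedStore()
store.upsert("acme", "acme-contract", [1.0, 0.0])
store.upsert("globex", "globex-payroll", [1.0, 0.0])
results = store.query("acme", [1.0, 0.0])
print(results)
```

The design choice worth copying is that `tenant_id` is a required argument taken from the authenticated session, never from the prompt or from model output.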

Days 61–90: audit prep

  • Pick an auditor — boutiques are cheaper but slower; Type II observation window is 3–12 months
  • Run an internal readiness check (Vanta does this automatically)
  • Pre-write your security page (saves dozens of customer questionnaires later)
  • Schedule the audit. Type I first if you need a fast checkmark; Type II if you can wait 3+ months

What auditors flag specifically about AI

  • "Do you log every prompt sent to the LLM?" → They want yes. Even if you don't store prompts today, you should be able to turn logging on.
  • "How do you prevent prompt injection from exposing other tenants' data?" → Per-tenant retrieval scoping enforced in code, with system prompts as a second layer
  • "Is customer data used to train your models?" → Have a written policy. Default to no.
  • "What happens to OpenAI/Anthropic's logs?" → You need their zero-retention agreement on file
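
For the prompt-logging question above, one way to answer "yes" without retaining customer text by default is to log a hash and metadata for every call, and make full-text storage a switch. This is a sketch under assumptions: `audit_record`, `logged_call`, the field names, and the `store_text` flag are hypothetical, and `call_model` stands in for whatever client function you already use.

```python
import hashlib
import json
import time


def audit_record(tenant_id, user_id, model, prompt, response, store_text=False):
    """Build one JSON-serializable audit entry for a single model call."""
    record = {
        "ts": time.time(),
        "tenant_id": tenant_id,
        "user_id": user_id,
        "model": model,
        # A hash lets you later confirm that a given prompt was sent
        # without retaining the text itself.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
        "prompt_chars": len(prompt),
    }
    if store_text:
        record["prompt"] = prompt
        record["response"] = response
    return record


def logged_call(call_model, sink, *, tenant_id, user_id, model, prompt, store_text=False):
    """Call the model, then write exactly one audit line to the sink."""
    response = call_model(model, prompt)
    sink(json.dumps(audit_record(tenant_id, user_id, model, prompt, response, store_text)))
    return response


# Stand-ins for illustration: a fake model and a list acting as the log sink.
def fake_model(model, prompt):
    return "stub answer"


lines = []
answer = logged_call(
    fake_model, lines.append,
    tenant_id="acme", user_id="u-17", model="example-model", prompt="Summarize ticket 42",
)
entry = json.loads(lines[0])
print(sorted(entry))
```

In production the sink would be your log pipeline rather than a list, and `store_text` would follow whatever your written data policy says.
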

The fastest path to Type II for an AI startup is Type I now, observation window starting tomorrow, Type II in 6 months. Don't wait for "perfect."