# Insights - AI Security and Data Usage

### Summary <a href="#summary" id="summary"></a>

* Insights is designed for regulated environments. Excel files are scanned and processed locally in the browser via a WASM-based engine; the data level the customer selects governs what — if anything — leaves for the cloud.
* The platform supports four data levels (L0–L3). The default (Level 1) captures only basic metadata: file names, sheet names, authors, and structural information. No cell data, formulas, or file content.
* For Insights Projects, the cloud stores project configuration, structured assessment results, attached supporting documents, and generated reports — but not Excel file content.
* AI analysis is performed by cloud-hosted LLMs accessed through Coherent's centralized proxy (Coherent does not run a local AI model in the default configuration). The proxy authenticates and meters traffic but does not log or retain prompt or response content. OpenAI does not use API data to train its models.
* Agent guardrails include scoped tool access, sandboxed execution, tenant isolation, governed system prompts, and structured output validation.
* Customers can bring their own AI API keys, route through their own gateway, or integrate with an on-premise LLM — with no application code changes.

<details>

<summary>How Insights handles your data</summary>

Coherent Insights separates what stays local from what reaches the cloud, and gives customers explicit control over where that boundary sits.

**The platform: what goes to the cloud by default**

When you scan files with Insights, scanning and extraction run locally in the browser, and the platform captures only **Level 1 metadata by default** — file names, sheet names, creator/modifier info, formula counts, and non-sensitive structural information. No cell data, no formulas, no file content.

Insights supports four data levels, and the customer controls which level is active:

<table><thead><tr><th width="173.11328125">LEVEL</th><th>WHAT'S CAPTURED</th><th>TYPICAL USE CASE</th></tr></thead><tbody><tr><td><strong>Level 0</strong></td><td>File fingerprints (SHA-256 hashes) and similarity vectors only</td><td>Deduplication, grouping, compliance tracking — zero content exposure</td></tr><tr><td><strong>Level 1</strong> (default)</td><td>Basic metadata: file names, sheet names, authors, table headers</td><td>Cataloging and estate visibility without revealing content</td></tr><tr><td><strong>Level 2</strong></td><td>Summarized formulas, sample content, logic representations</td><td>Deeper reporting and analysis while limiting full content exposure</td></tr><tr><td><strong>Level 3</strong></td><td>Full workbook content, formulas, and optionally the file itself</td><td>Advanced analysis — only enabled on explicit customer request</td></tr></tbody></table>

Moving beyond Level 1 requires deliberate customer action. By default, only metadata reaches the cloud.
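
The level gating in the table can be pictured as a filter applied in the browser before anything is synchronized. The following is a minimal sketch under stated assumptions, not Coherent's implementation: the `ScanRecord` shape, its field names, and the `projectForLevel` function are hypothetical, chosen only to mirror the table rows.

```typescript
// Hypothetical shapes for illustration; not part of any Coherent API.
type DataLevel = 0 | 1 | 2 | 3;

interface ScanRecord {
  sha256: string;             // Level 0: file fingerprint
  similarityVector: number[]; // Level 0: similarity vector
  fileName: string;           // Level 1: basic metadata
  sheetNames: string[];       // Level 1
  authors: string[];          // Level 1
  formulaSummaries: string[]; // Level 2: summarized formulas
  workbookContent: string;    // Level 3: full workbook content
}

// Return only the fields permitted at the active level (Level 1 by default).
function projectForLevel(record: ScanRecord, level: DataLevel = 1): Partial<ScanRecord> {
  // Level 0 fields are always eligible.
  const out: Partial<ScanRecord> = {
    sha256: record.sha256,
    similarityVector: record.similarityVector,
  };
  if (level >= 1) {
    out.fileName = record.fileName;
    out.sheetNames = record.sheetNames;
    out.authors = record.authors;
  }
  if (level >= 2) {
    out.formulaSummaries = record.formulaSummaries;
  }
  if (level >= 3) {
    out.workbookContent = record.workbookContent;
  }
  return out;
}

// Usage: with no level argument, only Level 0 and Level 1 fields are returned.
const sample: ScanRecord = {
  sha256: "ab12",
  similarityVector: [0.1, 0.9],
  fileName: "budget.xlsx",
  sheetNames: ["Summary"],
  authors: ["A. Analyst"],
  formulaSummaries: ["SUM over column B"],
  workbookContent: "raw cell data",
};
const synced = projectForLevel(sample);
```

Writing each level out as its own branch keeps the mapping easy to audit against the table: raising the level only ever adds fields.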

**Insights Projects: browser-local processing**

When running an Insights Project (a bounded analysis over a defined set of files), the architecture shifts further toward local processing:

* **Excel files stay in the browser.** The scanning and extraction engine runs as WebAssembly (WASM) inside the browser. Full file content is processed locally and is not uploaded to Coherent's servers.
* **File fingerprints and metadata** are synchronized to the cloud so the platform can coordinate the project, track progress, and enable collaboration.
* **AI interactions** go from the browser through Coherent's AI proxy to a cloud-hosted LLM. The proxy authenticates and routes the request — it does not log, store, or retain prompt content or responses. Coherent does not run a local AI model in the default configuration, and what the agent can see is bounded by the data level the customer has enabled.

**What IS stored in the cloud for a Project:**

* **Project configuration**: the project plan, analysis templates, field definitions, and settings
* **Assessment results**: the structured output of the AI's analysis (field values, confidence indicators, evidence citations) — governance artifacts, not source file content
* **Supporting documents**: context documents (policies, checklists, audit requirements) that users attach to a project
* **Reports**: generated reports

**What is NOT stored in the cloud:**

* **Excel file content**: cell data, formulas, and workbook internals remain in the browser unless the customer explicitly enables Level 2 or Level 3 storage
* **Raw file uploads**: the platform does not require or perform bulk file uploads for Projects

**Practical note on supporting documents:**

If you attach context documents to a project that happen to contain sensitive data, those documents will be stored in the cloud as part of the project. The Excel files being analyzed stay local, but supplementary materials explicitly added to the project do not. Be mindful of what you attach.

</details>

<details>

<summary>AI and LLM security</summary>

**Data handling and training**

Coherent uses OpenAI as its default AI provider, accessed through a centralized AI proxy. Under the API agreement with OpenAI:

* **API data is not used to train or improve OpenAI's models.** This is explicit in OpenAI's API data usage policy.
* **API data is retained by OpenAI for up to 30 days** for abuse monitoring, then deleted.
* **Coherent's proxy does not log or retain prompt or response content.** It tracks only usage metadata (token counts, model used, timestamp) for billing and metering.
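
Metering without content retention can be pictured as a function that derives a usage record from a completed exchange and discards everything else. This is a sketch under assumptions: `Exchange`, `UsageRecord`, and `toUsageRecord` are hypothetical names, and the only detail taken from this page is that token counts, the model used, and a timestamp are tracked.

```typescript
// Hypothetical types for illustration only.
interface Exchange {
  model: string;
  prompt: string;   // content: never persisted
  response: string; // content: never persisted
  promptTokens: number;
  completionTokens: number;
}

interface UsageRecord {
  model: string;
  promptTokens: number;
  completionTokens: number;
  timestamp: string; // ISO 8601
}

// Build the only record that is kept: counts, model, and time.
// The prompt and response strings are never copied into it.
function toUsageRecord(exchange: Exchange, now: Date = new Date()): UsageRecord {
  return {
    model: exchange.model,
    promptTokens: exchange.promptTokens,
    completionTokens: exchange.completionTokens,
    timestamp: now.toISOString(),
  };
}
```

Because `UsageRecord` has no field that can hold prompt or response text, the type itself documents what the metering path is allowed to keep.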

**What gets sent to the LLM**

During an Insights Project, the AI agent sends analysis context to the LLM — structured descriptions of workbook content, instructions, and schema definitions. The agent is designed to work with structural and summary information rather than raw data, so **PII transmission to the LLM is not required by the workflow**. However, there is no absolute technical guarantee that no sensitive data element will ever appear in an AI prompt, particularly when analyzing files that contain sensitive content at the cell level.

**Agent guardrails**

Coherent's AI agents operate under strict controls:

* **Scoped tool access**: agents are granted only the tools required for their specific task — no broad or ambient tool access
* **Sandboxed execution**: agent loops run in isolated environments, preventing unintended side effects
* **Context scoping**: each interaction receives only the context it needs; no bleed-through between tenants or tasks
* **Task boundaries**: agents are explicitly scoped to defined task types and cannot self-direct outside their assigned workflow
* **System prompt governance**: all agent behavior is governed by controlled system prompts, reviewed as part of platform governance
* **Structured output validation**: agent outputs are validated against expected schemas before being consumed downstream
* **Tenant isolation**: permissions are enforced at the platform level; agents and users can only act within their authorized boundaries
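
To illustrate the structured output validation guardrail, the sketch below checks a model response against an expected shape before anything downstream uses it. The `AssessmentResult` shape is an assumption loosely based on the assessment results described on this page (field values, confidence indicators, evidence citations), and the function names are hypothetical.

```typescript
// Hypothetical result shape; not Coherent's actual schema.
interface AssessmentResult {
  field: string;
  value: string;
  confidence: "low" | "medium" | "high";
  evidence: string[];
}

const CONFIDENCE_LEVELS: string[] = ["low", "medium", "high"];

// Type guard: accept the parsed output only if every property has the expected
// type. Anything else is rejected before it reaches downstream code.
function isAssessmentResult(raw: unknown): raw is AssessmentResult {
  if (typeof raw !== "object" || raw === null) return false;
  const r = raw as Record<string, unknown>;
  const confidence = r["confidence"];
  const evidence = r["evidence"];
  return (
    typeof r["field"] === "string" &&
    typeof r["value"] === "string" &&
    typeof confidence === "string" &&
    CONFIDENCE_LEVELS.indexOf(confidence) !== -1 &&
    Array.isArray(evidence) &&
    evidence.every((e: unknown) => typeof e === "string")
  );
}

// Parse model output; return null instead of throwing on malformed input.
function parseAgentOutput(text: string): AssessmentResult | null {
  try {
    const parsed: unknown = JSON.parse(text);
    return isAssessmentResult(parsed) ? parsed : null;
  } catch {
    return null;
  }
}
```

Returning `null` for both invalid JSON and schema mismatches gives callers a single failure path, so an unexpected model response is dropped rather than partially consumed.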

**Customer options for AI provider control**

For organizations that require additional control over AI traffic, Coherent supports several configurations:

<table><thead><tr><th width="246.46875">OPTIONS</th><th>HOW IT WORKS</th></tr></thead><tbody><tr><td><strong>Standard</strong> (default)</td><td>Coherent manages the AI proxy and API keys. All AI traffic is metered and governed centrally.</td></tr><tr><td><strong>Customer API keys</strong></td><td>The customer provides their own OpenAI (or other provider) API keys. Coherent's proxy is reconfigured to use the customer's keys, putting the AI data relationship directly under the customer's enterprise agreement.</td></tr><tr><td><strong>Customer AI gateway</strong></td><td>The customer routes AI traffic through their own API gateway or proxy infrastructure. Coherent applications are pointed at the customer's endpoint.</td></tr><tr><td><strong>Customer-hosted AI</strong></td><td>For customers with on-premise or private-cloud LLM deployments, Coherent can integrate with a local AI provider, keeping all AI traffic within the customer's network. This option involves additional cost and alignment on the customer's infrastructure and policies; it is best explored after initial engagement rather than as a starting configuration.</td></tr></tbody></table>

These options require no application code changes — only proxy configuration.
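
Since the options differ only in where requests are sent and whose credentials are used, they can be modeled as alternative proxy configurations. The sketch below is illustrative only: the `mode` names, field names, and placeholder URL are hypothetical and do not describe Coherent's actual configuration schema.

```typescript
// Hypothetical configuration model; names and URLs are placeholders.
type AiRouting =
  | { mode: "standard" }                           // Coherent-managed proxy and keys
  | { mode: "customer-keys"; apiKeyRef: string }   // customer's provider keys
  | { mode: "customer-gateway"; endpoint: string } // customer's gateway or proxy
  | { mode: "customer-hosted"; endpoint: string }; // on-premise or private-cloud LLM

interface ResolvedTarget {
  endpoint: string;
  credentialOwner: "coherent" | "customer";
}

const DEFAULT_PROXY = "https://ai-proxy.example.invalid"; // placeholder URL

// Map a routing choice to the endpoint and credential owner it implies.
function resolveTarget(routing: AiRouting): ResolvedTarget {
  switch (routing.mode) {
    case "standard":
      return { endpoint: DEFAULT_PROXY, credentialOwner: "coherent" };
    case "customer-keys":
      // Same proxy, but requests are made with the customer's keys.
      return { endpoint: DEFAULT_PROXY, credentialOwner: "customer" };
    case "customer-gateway":
    case "customer-hosted":
      // Traffic goes to an endpoint the customer operates.
      return { endpoint: routing.endpoint, credentialOwner: "customer" };
  }
}
```

Modeling the options as a discriminated union means a new routing mode cannot be added without the compiler flagging every place that needs to handle it.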

</details>

<details>

<summary>Deployments</summary>

<table><thead><tr><th width="246.41015625">MODEL</th><th>DESCRIPTION</th></tr></thead><tbody><tr><td><strong>Standard Cloud</strong></td><td>Coherent-hosted on SOC 2 Type II certified AWS infrastructure. Files processed locally in the browser; metadata and project artifacts in the cloud.</td></tr><tr><td><strong>Hybrid</strong></td><td>Coherent manages the platform and indexing in the cloud (fingerprints and metadata only); sensitive content, the project database, and detailed assessment archives reside in customer-managed infrastructure that the platform connects to for analysis.</td></tr><tr><td><strong>On-Premise / Private Cloud</strong></td><td>The entire platform — including local OmniStore storage and browser-side scanning, compilation, and AI execution — deployed within the customer's infrastructure using secure Docker containers. Well suited to pilots that must stay fully in-network. This option carries additional cost and scoping; most customers begin with Standard Cloud or Hybrid and evaluate on-premise once requirements are well understood.</td></tr></tbody></table>

</details>

<details>

<summary>Compliance</summary>

**Certifications and standards**

<table><thead><tr><th width="245.85546875">AREA</th><th>STATUS</th></tr></thead><tbody><tr><td>SOC 2 Type II</td><td>Certified (Security trust services category), annually audited</td></tr><tr><td>Encryption in transit</td><td>TLS 1.3 preferred; TLS 1.2 supported as fallback</td></tr><tr><td>Encryption at rest</td><td>AES-256</td></tr><tr><td>Authentication</td><td>JWT-based; supports Microsoft Entra ID (Azure AD) SSO</td></tr><tr><td>PII detection</td><td>Built-in scanning during local file processing; configurable filtering</td></tr><tr><td>Data retention</td><td>Configurable. On-demand report scans are transient (de-identified metadata deleted after the report is generated); Insights Projects persist configuration, results, and reports by design, with configurable retention.</td></tr></tbody></table>

**Scope note:** The current SOC 2 Type II audit covers the Security trust services category. The AI/LLM architecture (proxy, provider integration, agent guardrails) is outside the audited system boundary.

</details>

<details>

<summary>Quick reference</summary>

| CONCERN                                  | HOW INSIGHTS ADDRESSES IT                                                                                                                    |
| ---------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
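| Do my Excel files leave my machine?      | Not by default. Files are scanned and processed in the browser; only the metadata your chosen data level permits (Level 1 by default) reaches the cloud. Level 3, enabled only on explicit customer request, can optionally include the file itself. |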
| Is my data used to train AI models?      | No. OpenAI's API policy prohibits training on API data, and Coherent's proxy does not log or retain prompt or response content.              |
| What about PII in AI prompts?            | The agent workflow is designed to avoid transmitting PII. For additional assurance, customers can use their own AI keys or a local LLM.      |
| Can I control the AI provider?           | Yes. Bring your own API keys, route through your own gateway, or integrate with a local AI deployment.                                       |
| What's stored in the cloud for Projects? | Project configuration, structured assessment results, attached supporting documents, and generated reports. Not Excel file content.          |
| What about tenant isolation?             | Strict tenant boundaries, RBAC, scoped agent permissions, no cross-tenant data access.                                                       |

</details>
