# Secure XConnector access with OAuth2 tokens

When implementing `XConnector`, we recommend securing both the [Proxy service](/xconnector/introduction-to-xconnector.md#proxy-service) and the [Remote service](/xconnector/introduction-to-xconnector.md#remote-service) with [OAuth2](https://oauth.net/2/) [JWT](https://jwt.io/) (JSON Web Token) access tokens issued by [Keycloak](https://www.keycloak.org/) for your Spark tenant. While your organization may use various identity providers that support OIDC or SAML for user authentication, the access token from Keycloak plays a central role in securing XConnector.

After reading this section, you should understand:

* How the authentication flow works via the [Proxy service](/xconnector/introduction-to-xconnector.md#proxy-service) or through a direct call to the [Remote service](/xconnector/introduction-to-xconnector.md#remote-service).
* The essential components in the [Keycloak](https://www.keycloak.org/) [JWT](https://jwt.io/) needed to validate authentication.

## Options for securing the `XConnector` implementation

In the `XConnector` [information flow and components](/xconnector/introduction-to-xconnector.md#information-flow-and-components), the [Proxy service](/xconnector/introduction-to-xconnector.md#proxy-service) is an optional component that routes `XConnector` calls to the [Remote services](/xconnector/introduction-to-xconnector.md#remote-service).

* If the [Remote service](/xconnector/introduction-to-xconnector.md#remote-service) is intended to be publicly available without authentication, the steps described in this article are not required.
* If you have implemented a [Proxy service](/xconnector/introduction-to-xconnector.md#proxy-service) for your `XConnector` implementation, implement security at the [Proxy service](/xconnector/introduction-to-xconnector.md#proxy-service) and review [Option 1: Secure the Proxy service](#option-1-secure-the-proxy-service "mention").
* If the `XConnector` has been implemented as a direct call to the [Remote service](/xconnector/introduction-to-xconnector.md#remote-service), implement security at the [Remote service](/xconnector/introduction-to-xconnector.md#remote-service) and review [Option 2: Secure the Remote service](#option-2-secure-the-remote-service "mention").

## Option 1: Secure the Proxy service

It is crucial to safeguard your [Proxy service](/xconnector/introduction-to-xconnector.md#proxy-service) endpoints unless the [Remote services](/xconnector/introduction-to-xconnector.md#remote-service) are intended for public access.

Spark uses credential-less API integration objects to authenticate to the [Proxy service](/xconnector/introduction-to-xconnector.md#proxy-service) endpoint. These integrations separate responsibilities between administrators and users: administrators establish a trust policy between Spark and the cloud provider through the provider's native authentication and authorization mechanism. When Spark connects to the cloud provider, authentication and authorization are handled through this trust policy.

Additionally, administrators can specify an allowed list of endpoints accessible by the API integration object, restricting Spark's access to specific proxy services and resources. This capability empowers administrators to enforce organizational policies governing data egress and ingress.

### Proxy service authentication flow

<figure><img src="/files/Fvb1DbdHeCg4KAS5FLX1" alt=""><figcaption></figcaption></figure>

1. [Spark XConnector](/xconnector/introduction-to-xconnector.md#spark-xconnector) sends a request to the [Proxy service](/xconnector/introduction-to-xconnector.md#proxy-service). The `Authorization` header should include the bearer token. The request body should conform to [Remote service input and output data formats](/xconnector/remote-service-input-and-output-data-formats.md).
2. The [Proxy service](/xconnector/introduction-to-xconnector.md#proxy-service) validates the access token's signature against the customer's Keycloak tenant OpenID Connect Metadata document, using predefined authentication in the validation policy.
3. After successful validation, the [Proxy service](/xconnector/introduction-to-xconnector.md#proxy-service) forwards the request to the [Remote service](/xconnector/introduction-to-xconnector.md#remote-service) using the [Remote service](/xconnector/introduction-to-xconnector.md#remote-service)'s API Key or OAuth2 Client Credentials access token.
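The shape of the request in step 1 can be reproduced by hand, for example when testing a Proxy service before connecting Spark to it. The following is a minimal sketch using only the Python standard library. The endpoint URL, the token placeholder, and the empty JSON body are assumptions for illustration; the real body must follow the input format referenced above.

```python
import urllib.request


def build_xconnector_request(url: str, access_token: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries the bearer token."""
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # The access token travels in the Authorization header.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical endpoint and placeholder values, for illustration only.
request = build_xconnector_request(
    "https://proxy.example.com/xconnector",
    "<access-token>",
    b"{}",
)
```

Passing `request` to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without network access.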

### Example 1: Secure end-to-end data consumption from Azure Data Lake through Azure API Management

This example shows how [Spark XConnector](/xconnector/introduction-to-xconnector.md#spark-xconnector) can be used to securely provide access to data from Azure Data Lake.

<figure><img src="/files/C73dik49e03L71NuiOkk" alt=""><figcaption></figcaption></figure>

1. A client application initiates a request to the [Spark XConnector](/xconnector/introduction-to-xconnector.md#spark-xconnector). The `Authorization` header should include the bearer token. The request body should conform to [Remote service input and output data formats](/xconnector/remote-service-input-and-output-data-formats.md).
2. Spark verifies the access token's authenticity by referencing the Keycloak tenant OpenID Connect Metadata document.
3. Upon successful validation, Spark forwards the request to Azure API Management, which functions as the [Proxy service](/xconnector/introduction-to-xconnector.md#proxy-service).
4. Azure API Management first validates the request's IP address against the predefined range in the `ip-filter` section of the validation policy. It then validates the access token's signature using the Keycloak OpenID Connect Metadata document, which is specified in the same policy.
5. After validation, Azure API Management retrieves the API key for the Azure Function App (serving as the [Remote service](/xconnector/introduction-to-xconnector.md#remote-service)) from Azure Key Vault, using a passwordless system-assigned managed identity.
6. The API key obtained in step 5 is set as the value of the `x-functions-key` header in the request. Azure API Management then forwards the request to the Azure Function App's HTTP trigger.
7. The Function App's network security group confirms that the request's IP address matches Azure API Management (per the network security group rule), and the `x-functions-key` request header is checked for the API key. When all criteria are met, the Function App uses the Azure Storage Files Data Lake client library for .NET to access data from Data Lake Storage, using a passwordless system-assigned managed identity.
8. The Function App transfers data from Data Lake Storage Gen2 to Spark.
9. Spark formats the retrieved data and delivers it back to the client application.
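The IP check in step 4 and the header injection in step 6 can be pictured with a short sketch. In Azure API Management these checks are expressed as policies rather than application code, so the Python below only illustrates the logic. The function names, the IP ranges, and the key value are hypothetical.

```python
import ipaddress
from typing import Dict, Iterable


def is_ip_allowed(client_ip: str, allowed_ranges: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


def add_function_key(headers: Dict[str, str], function_key: str) -> Dict[str, str]:
    """Return a copy of the headers with the Function App key attached."""
    forwarded = dict(headers)
    forwarded["x-functions-key"] = function_key
    return forwarded


# Hypothetical values, for illustration only.
allowed = ["203.0.113.0/24"]
if is_ip_allowed("203.0.113.10", allowed):
    outgoing_headers = add_function_key(
        {"Content-Type": "application/json"},
        "<function-key-from-key-vault>",
    )
```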

## Option 2: Secure the Remote service

Use of the [Proxy service](/xconnector/introduction-to-xconnector.md#proxy-service) is optional. The [Spark XConnector](/xconnector/introduction-to-xconnector.md#spark-xconnector) can also be set up to call the [Remote service](/xconnector/introduction-to-xconnector.md#remote-service) directly.

### Remote service authentication flow

<figure><img src="/files/ni0CsXqm7Qpi4kbUuEWV" alt=""><figcaption></figcaption></figure>

1. [Spark XConnector](/xconnector/introduction-to-xconnector.md#spark-xconnector) sends a request to the [Remote service](/xconnector/introduction-to-xconnector.md#remote-service). The `Authorization` header should include the bearer token. The request body should conform to [Remote service input and output data formats](/xconnector/remote-service-input-and-output-data-formats.md).
2. The [Remote service](/xconnector/introduction-to-xconnector.md#remote-service) validates the access token's signature against the customer's Keycloak tenant OpenID Connect Metadata document, using custom validation functionality. Upon successful validation, the request is authenticated.
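Before the token can be validated, custom validation code has to pull it out of the `Authorization` header. A minimal sketch of that first step follows; the function name is hypothetical, and a real service would read the header from its web framework's request object.

```python
from typing import Optional


def extract_bearer_token(authorization_header: Optional[str]) -> Optional[str]:
    """Return the token from a 'Bearer <token>' header, or None if it is absent or malformed."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.strip().partition(" ")
    token = token.strip()
    # The scheme name is compared case-insensitively.
    if scheme.lower() != "bearer" or not token:
        return None
    return token
```

A request for which this returns `None` would normally be rejected with an HTTP 401 response before any further processing.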

### Example 2: Secure end-to-end data consumption from Azure Data Lake through direct call to the Remote service

This example shows how [Spark XConnector](/xconnector/introduction-to-xconnector.md#spark-xconnector) can be used to securely provide access to data from Azure Data Lake without the use of the [Proxy service](/xconnector/introduction-to-xconnector.md#proxy-service).

<figure><img src="/files/2UpGYYy0ygF8yaN0q5Ur" alt=""><figcaption></figcaption></figure>

1. A client application initiates a request to the [Spark XConnector](/xconnector/introduction-to-xconnector.md#spark-xconnector). The `Authorization` header should include the bearer token. The request body should conform to [Remote service input and output data formats](/xconnector/remote-service-input-and-output-data-formats.md).
2. Spark verifies the access token's authenticity by referencing the Keycloak tenant OpenID Connect Metadata document.
3. Upon successful validation, Spark forwards the request to the Azure Function App, which functions as the [Remote service](/xconnector/introduction-to-xconnector.md#remote-service).
4. The Function App's network security group validates that the request's IP address corresponds to the Spark tenant. The Azure Function App uses custom validation functionality to validate the access token's signature using the Keycloak OpenID Connect Metadata document.
5. When all criteria are met, the Function App uses the Azure Storage Files Data Lake client library for .NET to access data from Data Lake Storage, using a passwordless system-assigned managed identity.
6. The Function App transfers data from Data Lake Storage Gen2 to Spark.
7. Spark formats the retrieved data and delivers it back to the client application.

## Keycloak access token

Here is an abbreviated example of a decoded [Keycloak](https://www.keycloak.org/) [JWT](https://jwt.io/) access token that is used for authentication across `XConnector`.

```json
{
  "alg": "RS256",
  "typ": "JWT",
  "kid": "NbOq0T3N1h5jbvjIKhu3CjCalTGLe1dYiQ6H19ihVEM"
}.{
  "exp": 1707365863,
  "iat": 1707358663,
  "auth_time": 1707358662,
  "jti": "03e6b9c5-7fa-41db-be98-064751a60edc",
  "iss": ,
  "aud": "product-factory",
  "sub": "80b43a33-24b8-3f07-aac8-f34a625a20aa",
  "typ": "Bearer",
  "azp": "product-factory",
  "nonce": "dc328d4d-5e66-4161-97dc-ab1c6de5ac52",
  "scope": "open id offline_access spark profile"
}.[Signature]
```
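A decoded view like the one above can be produced from a compact token, which consists of three base64url-encoded segments separated by dots. The sketch below decodes the header and payload for inspection only. It does not verify the signature, so its output must not be trusted for authentication. The function names are hypothetical.

```python
import base64
import json
from typing import Tuple


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs omit."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def decode_unverified(token: str) -> Tuple[dict, dict]:
    """Return (header, payload) of a compact JWT WITHOUT checking the signature."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload
```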

From the Keycloak access token, your [Proxy service](/xconnector/introduction-to-xconnector.md#proxy-service) or the custom token validation in your [Remote service](/xconnector/introduction-to-xconnector.md#remote-service) must validate the following:

### Payload

The important components of the access token payload are:

| Claim name | Description                                                                                                                                | Required                                                          |
| ---------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------- |
| `iss`      | Identifies the token issuer, allowing the recipient to verify the token's authenticity and origin by matching it with the expected issuer. | Yes                                                               |
| `aud`      | Identifies the intended audience for the token, specifying who the token is intended for. From Spark the value is `product-factory`.       | No, but recommended                                               |
| `scope`    | Identifies the permissions or access rights granted to the bearer of the token.                                                            | No, but recommended to use `open id offline_access spark profile` |
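The claim checks in the table can be sketched as a single function that returns a list of problems, where an empty list means the payload passed. This is a minimal illustration covering only the three claims listed above, and it assumes the signature has already been verified. The function name and the issuer URL are hypothetical; use the issuer of your own Keycloak tenant.

```python
from typing import Iterable, List, Optional


def validate_claims(
    payload: dict,
    expected_issuer: str,
    expected_audience: Optional[str] = None,
    required_scopes: Iterable[str] = (),
) -> List[str]:
    """Check iss (required), aud and scope (optional); return a list of problems."""
    problems = []
    if payload.get("iss") != expected_issuer:
        problems.append("iss does not match the expected issuer")
    if expected_audience is not None:
        aud = payload.get("aud")
        # aud may be a single string or a list of strings.
        audiences = aud if isinstance(aud, list) else [aud]
        if expected_audience not in audiences:
            problems.append("aud does not contain the expected audience")
    # scope is a space-separated string of granted scopes.
    granted = set(str(payload.get("scope", "")).split())
    missing = sorted(set(required_scopes) - granted)
    if missing:
        problems.append("missing scopes: " + ", ".join(missing))
    return problems


# Hypothetical issuer, for illustration only.
issuer = "https://keycloak.example.com/realms/my-tenant"
sample_payload = {
    "iss": issuer,
    "aud": "product-factory",
    "scope": "open id offline_access spark profile",
}
problems = validate_claims(
    sample_payload,
    expected_issuer=issuer,
    expected_audience="product-factory",
    required_scopes=["spark"],
)
```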

### Signature

In a JWT issued by Keycloak, the signature refers to the cryptographic signature added to the token to ensure its integrity and authenticity. It is validated using the public key found in the OpenID Connect Metadata document published by Keycloak. This validation process helps confirm that the JWT has not been tampered with and can be trusted.
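To make the signature check concrete, the sketch below verifies an `RS256` signature using only the Python standard library. It assumes the key set (JWKS) has already been downloaded from the location given by the `jwks_uri` field of the OpenID Connect Metadata document, and it selects the key whose `kid` matches the token header. The function names are hypothetical. This is for illustration of the mechanism; in production, a maintained JWT library is the safer choice.

```python
import base64
import hashlib
import hmac
import json

# DER prefix of the SHA-256 DigestInfo structure (RFC 8017, section 9.2).
SHA256_DIGEST_INFO = bytes.fromhex("3031300d060960864801650304020105000420")


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def find_jwk(jwks: dict, kid: str) -> dict:
    """Pick the RSA key whose kid matches the token header."""
    for key in jwks.get("keys", []):
        if key.get("kid") == kid and key.get("kty") == "RSA":
            return key
    raise KeyError(f"no RSA key with kid {kid!r}")


def verify_rs256(token: str, jwk: dict) -> bool:
    """Return True if the token's RS256 signature is valid for the given JWK."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "RS256":
        return False
    signature = b64url_decode(signature_b64)
    n = int.from_bytes(b64url_decode(jwk["n"]), "big")
    e = int.from_bytes(b64url_decode(jwk["e"]), "big")
    key_len = (n.bit_length() + 7) // 8
    s = int.from_bytes(signature, "big")
    if len(signature) != key_len or s >= n:
        return False
    # RSA public-key operation recovers the encoded message from the signature.
    recovered = pow(s, e, n).to_bytes(key_len, "big")
    # Rebuild the expected PKCS#1 v1.5 encoding of the signed header and payload.
    digest = hashlib.sha256(f"{header_b64}.{payload_b64}".encode("ascii")).digest()
    padding_len = key_len - len(SHA256_DIGEST_INFO) - len(digest) - 3
    if padding_len < 8:
        return False
    expected = b"\x00\x01" + b"\xff" * padding_len + b"\x00" + SHA256_DIGEST_INFO + digest
    return hmac.compare_digest(recovered, expected)
```

A typical call would be `verify_rs256(token, find_jwk(jwks, header["kid"]))`, where `header` is the decoded token header. The claim checks should only run after this returns `True`.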


