Secure XConnector access with OAuth2 tokens

For your XConnector implementation, we recommend securing both the Proxy service and the Remote service with OAuth2 JWT (JSON Web Token) access tokens issued by your Spark tenant from Keycloak. Although your organization may use other modern identity providers that support OIDC or SAML for user authentication, the Keycloak access token plays the central role in XConnector security.

After reading this section, you should be able to understand:

Options for securing the XConnector implementation

In XConnector, the Proxy service is an optional component that routes XConnector calls to the Remote services.

Option 1: Secure the Proxy service

It is crucial to secure your Proxy service endpoints unless the Remote services are intended for public access.

Spark uses credential-less API integration objects to authenticate to the Proxy service endpoint. These integrations separate responsibilities between administrators and users: administrators establish a trust policy between Spark and the cloud provider using the provider's native authentication and authorization mechanism, and when Spark connects to the cloud provider, it is authenticated and authorized through that trust policy.

Additionally, administrators can specify an allowed list of endpoints accessible by the API integration object, restricting Spark's access to specific proxy services and resources. This capability empowers administrators to enforce organizational policies governing data egress and ingress.
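To make the allow-list idea concrete, here is a minimal Python sketch of how an endpoint allow-list check can work. It is an illustration under assumptions, not Spark's implementation: the function name `is_endpoint_allowed`, the example host, and the matching rule (same scheme and host, and either the allowed path itself or a sub-path of it) are all hypothetical.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_endpoint_allowed(url: str, allowed_prefixes: Iterable[str]) -> bool:
    """Return True if url falls under one of the allowed endpoint prefixes.

    Hypothetical helper; assumes URLs are already normalised (no "..").
    """
    target = urlsplit(url)
    for prefix in allowed_prefixes:
        allowed = urlsplit(prefix)
        # Scheme and host (including any port) must match; host names are case-insensitive.
        if target.scheme != allowed.scheme or target.netloc.lower() != allowed.netloc.lower():
            continue
        base = allowed.path.rstrip("/")
        # Accept the prefix itself or a sub-path, but not a sibling such as "/xconnector-evil".
        if target.path == base or target.path.startswith(base + "/"):
            return True
    return False


allowed = ["https://apim.example.net/xconnector"]
print(is_endpoint_allowed("https://apim.example.net/xconnector/orders", allowed))  # True
print(is_endpoint_allowed("https://apim.example.net/xconnector-evil", allowed))    # False
```

Matching on path segments rather than raw string prefixes avoids accidentally allowing sibling endpoints that merely share a name prefix.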

Proxy service authentication flow

  1. XConnector sends a request to the Proxy service. The Authorization header should include the bearer token, and the request body should conform to the Remote service input and output data formats.

  2. The Proxy service validates the access token's signature against the customer's Keycloak tenant OpenID Connect Metadata document, as configured in its validation policy.

  3. After successful validation, the Proxy service forwards the request to the Remote service, authenticating with the Remote service's API key or an OAuth2 Client Credentials access token.
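The header handling in this flow can be sketched in Python as follows. This is a minimal illustration, not a specific gateway's API: `extract_bearer_token` and `build_forward_headers` are hypothetical helpers, and `validate_token` stands in for whatever Keycloak token validation your Proxy service performs (it is assumed to raise on an invalid token).

```python
from typing import Callable, Dict, Mapping


def extract_bearer_token(headers: Mapping[str, str]) -> str:
    """Return the token from an 'Authorization: Bearer <token>' header."""
    # HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise PermissionError("missing or malformed bearer token")
    return token.strip()


def build_forward_headers(
    headers: Mapping[str, str],
    validate_token: Callable[[str], None],
    remote_credentials: Mapping[str, str],
) -> Dict[str, str]:
    """Validate the caller's token, then swap in the Remote service credential."""
    validate_token(extract_bearer_token(headers))  # expected to raise on an invalid token
    replaced = {name.lower() for name in remote_credentials} | {"authorization"}
    # Do not pass the caller's Keycloak token (or clashing headers) to the Remote service.
    forwarded = {n: v for n, v in headers.items() if n.lower() not in replaced}
    forwarded.update(remote_credentials)
    return forwarded
```

For an API-key-protected Remote service, `remote_credentials` could be `{"x-functions-key": api_key}`; for OAuth2 Client Credentials, `{"Authorization": "Bearer " + client_credentials_token}`. Dropping the incoming Authorization header keeps the caller's Keycloak token from leaking past the Proxy service.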

Example 1: Secure end-to-end data consumption from Azure Data Lake through Azure API Management

This example shows how XConnector can be used to securely provide access to data in Azure Data Lake.

  1. A client application sends a request to XConnector. The Authorization header should include the bearer token, and the request body should conform to the Remote service input and output data formats.

  2. Spark verifies the access token's authenticity by referencing the Keycloak tenant OpenID Connect Metadata document.

  3. Upon successful validation, Spark transfers the request to Azure API Management, functioning as the Proxy service.

  4. Within Azure API Management, the request's IP address is first validated against the ranges defined in the ip-filter section of the validation policy. The access token's signature is then validated against the Keycloak OpenID Connect Metadata document, which is also specified in that policy.

  5. After validation, Azure API Management retrieves the API key for the Azure Function App (serving as the Remote service) from Azure Key Vault, using a passwordless system-assigned managed identity.

  6. The API key obtained in step 5 is set as the value of the x-functions-key request header, and Azure API Management then forwards the request to the Function App's HTTP trigger.

  7. The Function App's network security group confirms that the request's IP address matches that of API Management (per the network security group rule), and the Function App validates that the x-functions-key request header contains the API key. Once all checks pass, the Function App uses the Azure Storage Files Data Lake client library for .NET to read data from Data Lake Storage, using a passwordless system-assigned managed identity.

  8. The Function App returns the data from Data Lake Storage Gen2 to Spark.

  9. Spark formats the retrieved data and delivers it back to the client application.
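Steps 4 and 7 both depend on IP allow-listing. In Azure this is enforced by the API Management ip-filter policy and the network security group rule rather than by application code, so the following Python sketch only illustrates the equivalent check. The helper name and the ranges are hypothetical; the ranges are documentation address blocks (RFC 5737 and RFC 3849), not real Spark or API Management addresses.

```python
import ipaddress
from typing import Iterable


def is_ip_allowed(client_ip: str, allowed_ranges: Iterable[str]) -> bool:
    """Return True if client_ip lies in one of the allowed CIDR ranges."""
    address = ipaddress.ip_address(client_ip)  # raises ValueError for malformed input
    for cidr in allowed_ranges:
        network = ipaddress.ip_network(cidr, strict=False)
        # Membership is only meaningful within the same IP version.
        if address.version == network.version and address in network:
            return True
    return False


# Hypothetical allow-list, e.g. the egress ranges of the calling service.
ALLOWED_RANGES = ["203.0.113.0/24", "2001:db8::/32"]
print(is_ip_allowed("203.0.113.25", ALLOWED_RANGES))  # True
print(is_ip_allowed("198.51.100.7", ALLOWED_RANGES))  # False
```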

Option 2: Secure the Remote service

Using the Proxy service is optional. XConnector can also be set up to call the Remote service directly.

Remote service authentication flow

  1. XConnector sends a request to the Remote service. The Authorization header should include the bearer token, and the request body should conform to the Remote service input and output data formats.

  2. The Remote service validates the access token's signature against the customer's Keycloak tenant OpenID Connect Metadata document using custom validation logic. Upon successful validation, the request is authenticated.
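Custom validation in the Remote service typically starts by locating Keycloak's signing keys. The Python sketch below shows that lookup under stated assumptions: the realm discovery path follows the `/realms/{realm}` layout of current Keycloak releases (older WildFly-based deployments prefix it with `/auth`), and the helper names are hypothetical. Network fetching is left out so the functions stay pure.

```python
import json
from urllib.parse import quote


def keycloak_metadata_url(server_url: str, realm: str) -> str:
    """Build the OpenID Connect discovery URL for a Keycloak realm (assumed layout)."""
    return f"{server_url.rstrip('/')}/realms/{quote(realm, safe='')}/.well-known/openid-configuration"


def jwks_uri_from_metadata(metadata_json: str, expected_issuer: str) -> str:
    """Check the metadata's issuer and return the URL of its JSON Web Key Set."""
    metadata = json.loads(metadata_json)
    if metadata.get("issuer") != expected_issuer:
        raise ValueError("metadata issuer does not match the expected Keycloak realm")
    jwks_uri = metadata.get("jwks_uri")
    if not jwks_uri:
        raise ValueError("metadata document has no jwks_uri")
    return jwks_uri


def select_signing_key(jwks_json: str, kid: str) -> dict:
    """Pick the key whose 'kid' matches the token header, ignoring encryption keys."""
    for key in json.loads(jwks_json).get("keys", []):
        if key.get("kid") == kid and key.get("use", "sig") == "sig":
            return key
    raise LookupError(f"no signing key with kid {kid!r}; refresh the cached JWKS and retry")
```

In practice the Remote service would fetch both documents over HTTPS (for example with `urllib.request.urlopen`), cache them, and re-fetch the key set when a token arrives with an unknown `kid`, since Keycloak can rotate its keys.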

Example 2: Secure end-to-end data consumption from Azure Data Lake through direct call to the Remote service

This example shows how XConnector can be used to securely provide access to data in Azure Data Lake without the Proxy service.

  1. A client application sends a request to XConnector. The Authorization header should include the bearer token, and the request body should conform to the Remote service input and output data formats.

  2. Spark verifies the access token's authenticity by referencing the Keycloak tenant OpenID Connect Metadata document.

  3. Upon successful validation, Spark transfers the request to the Azure Function App, functioning as the Remote service.

  4. The Function App's network security group verifies that the request's IP address belongs to the Spark tenant. The Function App then uses custom validation logic to authenticate the access token's signature against the Keycloak OpenID Connect Metadata document.

  5. Once all checks pass, the Function App uses the Azure Storage Files Data Lake client library for .NET to read data from Data Lake Storage, using a passwordless system-assigned managed identity.

  6. The Function App returns the data from Data Lake Storage Gen2 to Spark.

  7. Spark formats the retrieved data and delivers it back to the client application.

Keycloak access token

Here is an abbreviated example of a decoded Keycloak JWT access token that is used for authentication across XConnector.

From the Keycloak access token, your Proxy service, or your Remote service when it performs custom token validation, must validate the following:

Payload

The important components of the access token payload are:

| Claim name | Description | Required |
|---|---|---|
| iss | Identifies the token issuer, allowing the recipient to verify the token's authenticity and origin by matching it with the expected issuer. | Yes |
| aud | Identifies the intended audience of the token. For tokens issued from Spark, the value is product-factory. | No, but recommended |
| scope | Identifies the permissions or access rights granted to the bearer of the token. | No, but recommended: openid offline_access spark profile |
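The claim checks in the table can be sketched in Python as follows. This is a minimal, standard-library-only illustration with hypothetical names. The default audience `product-factory` comes from the table. Two parts are assumptions: the set of required scopes, and the `exp` (expiry) check, which covers a standard JWT claim not listed in the table. The function only inspects the payload, so the token's signature must be verified before these claims are trusted.

```python
import base64
import json
import time
from typing import Iterable, Optional


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url JWT segment, restoring the stripped '=' padding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def validate_claims(
    token: str,
    expected_issuer: str,
    expected_audience: Optional[str] = "product-factory",
    required_scopes: Iterable[str] = (),
    now: Optional[float] = None,
    leeway_seconds: int = 30,
) -> dict:
    """Check iss, aud, scope and exp in the payload. Does NOT verify the signature."""
    _header, payload_b64, _signature = token.split(".")
    payload = json.loads(b64url_decode(payload_b64))

    if payload.get("iss") != expected_issuer:
        raise ValueError("unexpected issuer")

    # 'aud' may be a single string or a list of strings (RFC 7519).
    aud = payload.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_audience is not None and expected_audience not in audiences:
        raise ValueError("token not intended for this audience")

    # 'scope' is a space-separated string of granted scopes.
    missing = set(required_scopes) - set(payload.get("scope", "").split())
    if missing:
        raise ValueError(f"missing scopes: {sorted(missing)}")

    current = time.time() if now is None else now
    if "exp" not in payload or current > payload["exp"] + leeway_seconds:
        raise ValueError("token expired")
    return payload
```

The expected issuer is the realm URL that Keycloak writes into `iss`, which is also the `issuer` value in the realm's OpenID Connect Metadata document.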

Signature

In a JWT issued by Keycloak, the signature is the cryptographic signature added to the token to ensure its integrity and authenticity. It is validated using the public key published in the JSON Web Key Set (JWKS) that Keycloak's OpenID Connect Metadata document references through its jwks_uri field. This validation confirms that the JWT has not been tampered with and can be trusted.
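To show what signature validation actually checks, here is a standard-library Python sketch of RS256 verification (RSASSA-PKCS1-v1_5 with SHA-256, per RFC 8017) against an RSA public key in JWK form (`n`, `e`). It assumes the Keycloak realm keeps its default RS256 signing algorithm, and `verify_rs256` is a hypothetical helper. For production use, a maintained JWT library such as PyJWT is the safer choice; this sketch only makes the mechanics visible.

```python
import base64
import hashlib
import hmac
import json

# DER-encoded DigestInfo prefix for SHA-256 (RFC 8017, section 9.2).
SHA256_DIGEST_INFO = bytes.fromhex("3031300d060960864801650304020105000420")


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url JWT segment, restoring the stripped '=' padding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_rs256(token: str, jwk: dict) -> bool:
    """Verify an RS256 JWT signature against an RSA JWK with 'n' and 'e'."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    # Accept only the expected algorithm; never honour "none" or HMAC variants here.
    if json.loads(b64url_decode(header_b64)).get("alg") != "RS256":
        return False
    n = int.from_bytes(b64url_decode(jwk["n"]), "big")
    e = int.from_bytes(b64url_decode(jwk["e"]), "big")
    k = (n.bit_length() + 7) // 8  # modulus length in bytes
    signature = b64url_decode(signature_b64)
    s = int.from_bytes(signature, "big")
    if len(signature) != k or s >= n:
        return False
    # The RSA public operation recovers the encoded message EM = s^e mod n.
    em = pow(s, e, n).to_bytes(k, "big")
    # Rebuild the expected EMSA-PKCS1-v1_5 encoding of SHA-256(header.payload).
    t = SHA256_DIGEST_INFO + hashlib.sha256(f"{header_b64}.{payload_b64}".encode("ascii")).digest()
    if k < len(t) + 11:
        return False
    expected = b"\x00\x01" + b"\xff" * (k - len(t) - 3) + b"\x00" + t
    return hmac.compare_digest(em, expected)
```

The signing input is the exact base64url text of the header and payload joined by a dot, so any change to either part, however small, makes the recomputed digest differ from the one recovered from the signature.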
