The recommended approach to connecting to storage is to use service principals. For Azure this means creating a Microsoft Entra ID service principal and granting it access to the storage account by assigning it the Storage Blob Data Contributor role on that account.
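The Azure side can be scripted with the Azure CLI; the following is a sketch in which the service principal name, subscription, resource group and storage account are placeholder values to substitute:
# Create the Microsoft Entra ID service principal; the returned "password" is the client secret stored in the secret scope below
az ad sp create-for-rbac --name "databricks-storage-sp"
# Assign the Storage Blob Data Contributor role on the storage account to the service principal
az role assignment create --assignee "<application-id>" --role "Storage Blob Data Contributor" --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"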
Secret scopes in Databricks should be used to store the credentials; users, service principals, and groups can then be granted read access to the secret scope (see the ACL example below, after the secret is added).
Scopes can be created using the Databricks CLI or through the REST API. For example, create the scope using the CLI:
databricks secrets create-scope <secret-scope>
or use the REST API:
> curl --request POST https://<databricks-instance>/api/2.0/secrets/scopes/create --data @create-scope.json --header "Authorization: Bearer ${DATABRICKS_TOKEN}"
where create-scope.json contains:
{
  "scope": "standard-scope",
  "initial_manage_principal": "users"
}
Then add the secret:
> curl --request POST https://<databricks-instance>/api/2.0/secrets/put --data @add-storage-secret.json --header "Authorization: Bearer ${DATABRICKS_TOKEN}"
with add-storage-secret.json containing the client secret for the service principal created above:
{
  "scope": "standard-scope",
  "key": "my-secret-key",
  "string_value": "<secret>"
}
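Read access to the scope can then be granted to a user, group or service principal through the secret ACL API. For example, to give a hypothetical data-engineers group read access:
> curl --request POST https://<databricks-instance>/api/2.0/secrets/acls/put --data @put-acl.json --header "Authorization: Bearer ${DATABRICKS_TOKEN}"
where put-acl.json contains:
{
  "scope": "standard-scope",
  "principal": "data-engineers",
  "permission": "READ"
}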
Clusters and notebooks can then use the secret scope to configure access to storage. For cluster-wide access, the following Spark properties are set in the cluster's Spark configuration; the {{secrets/<secret-scope>/<service-credential-key>}} syntax references the secret in the scope rather than embedding the plain-text value (see the Clusters API sketch after the property list):
spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net {{secrets/<secret-scope>/<service-credential-key>}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<directory-id>/oauth2/token
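The same properties can also be supplied when creating the cluster through the Clusters REST API; a sketch, with the cluster name, runtime version, node type and worker count as placeholder values:
> curl --request POST https://<databricks-instance>/api/2.0/clusters/create --data @create-cluster.json --header "Authorization: Bearer ${DATABRICKS_TOKEN}"
where create-cluster.json carries the properties above as spark_conf entries, for example:
{
  "cluster_name": "storage-access-cluster",
  "spark_version": "<runtime-version>",
  "node_type_id": "<node-type>",
  "num_workers": 2,
  "spark_conf": {
    "spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net": "OAuth",
    "spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net": "<application-id>",
    "spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net": "{{secrets/<secret-scope>/<service-credential-key>}}",
    "spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net": "https://login.microsoftonline.com/<directory-id>/oauth2/token"
  }
}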
At the notebook level, the configuration is set with:
service_credential = dbutils.secrets.get(scope="<secret-scope>", key="<service-credential-key>")
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")
The <application-id> is the Application (client) ID for the Microsoft Entra ID application, and the <directory-id> is the Directory (tenant) ID for the Microsoft Entra ID application - see Azure Application Objects And Service Principals.
<service-credential-key> is the key in the secret scope that contains the client secret.
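To confirm the key exists in the scope (secret values themselves are never returned), the secrets list endpoint can be queried, for example:
> curl --request GET "https://<databricks-instance>/api/2.0/secrets/list?scope=<secret-scope>" --header "Authorization: Bearer ${DATABRICKS_TOKEN}"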
The abfss protocol is then used to access the data:
spark.read.load("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-data>")
or, using SQL:
CREATE TABLE <database-name>.<table-name>;
COPY INTO <database-name>.<table-name>
FROM 'abfss://container@storageAccount.dfs.core.windows.net/path/to/folder'
FILEFORMAT = CSV
COPY_OPTIONS ('mergeSchema' = 'true');