The recommended approach to connecting to storage is to use service principals. For Azure this means creating a Microsoft Entra ID service principal and granting it access to the storage account by assigning it the Storage Blob Data Contributor role on that account.
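The Azure side can be scripted with the Azure CLI; the following is a sketch in which the service principal name, subscription, resource group and storage account are placeholder values to substitute:
# Create the Microsoft Entra ID service principal; the returned "password" is the client secret stored in the secret scope below
az ad sp create-for-rbac --name "databricks-storage-sp"
# Assign the Storage Blob Data Contributor role on the storage account to the service principal
az role assignment create --assignee "<application-id>" --role "Storage Blob Data Contributor" --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"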
Secret scopes in Databricks should be used to store the credentials; users, service principals, and groups can then be granted read access to the secret scope (see the ACL example below, after the secret is added).
Scopes can be created using the Databricks CLI or through the REST API. For example, create the scope using the CLI:
databricks secrets create-scope <secret-scope>
or use the REST API:
> curl --request POST https://<databricks-instance>/api/2.0/secrets/scopes/create --data @create-scope.json --header "Authorization: Bearer ${DATABRICKS_TOKEN}"
where create-scope.json contains:
{
  "scope": "standard-scope",
  "initial_manage_principal": "users"
}
Then add the secret:
> curl --request POST https://<databricks-instance>/api/2.0/secrets/put --data @add-storage-secret.json --header "Authorization: Bearer ${DATABRICKS_TOKEN}"
with add-storage-secret.json containing the client secret for the service principal created above:
{
  "scope": "standard-scope",
  "key": "my-secret-key",
  "string_value": "<secret>"
}
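Read access to the scope can then be granted to a user, group or service principal through the secret ACL API. For example, to give a hypothetical data-engineers group read access:
> curl --request POST https://<databricks-instance>/api/2.0/secrets/acls/put --data @put-acl.json --header "Authorization: Bearer ${DATABRICKS_TOKEN}"
where put-acl.json contains:
{
  "scope": "standard-scope",
  "principal": "data-engineers",
  "permission": "READ"
}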
Clusters and notebooks can then use the secret scope to configure access to storage. For cluster-wide access, the following Spark properties are set in the cluster's Spark configuration; the {{secrets/<secret-scope>/<service-credential-key>}} syntax references the secret in the scope rather than embedding the plain-text value (see the Clusters API sketch after the property list):
spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net {{secrets/<secret-scope>/<service-credential-key>}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<directory-id>/oauth2/token
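The same properties can also be supplied when creating the cluster through the Clusters REST API; a sketch, with the cluster name, runtime version, node type and worker count as placeholder values:
> curl --request POST https://<databricks-instance>/api/2.0/clusters/create --data @create-cluster.json --header "Authorization: Bearer ${DATABRICKS_TOKEN}"
where create-cluster.json carries the properties above as spark_conf entries, for example:
{
  "cluster_name": "storage-access-cluster",
  "spark_version": "<runtime-version>",
  "node_type_id": "<node-type>",
  "num_workers": 2,
  "spark_conf": {
    "spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net": "OAuth",
    "spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net": "<application-id>",
    "spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net": "{{secrets/<secret-scope>/<service-credential-key>}}",
    "spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net": "https://login.microsoftonline.com/<directory-id>/oauth2/token"
  }
}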
At the notebook level, the configuration is set with:
service_credential = dbutils.secrets.get(scope="<secret-scope>", key="<service-credential-key>")
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")
The <application-id> is the Application (client) ID for the Microsoft Entra ID application, and the <directory-id> is the Directory (tenant) ID for the Microsoft Entra ID application - see Azure Application Objects And Service Principals.
<service-credential-key> is the key in the secret scope that contains the client secret.
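To confirm the key exists in the scope (secret values themselves are never returned), the secrets list endpoint can be queried, for example:
> curl --request GET "https://<databricks-instance>/api/2.0/secrets/list?scope=<secret-scope>" --header "Authorization: Bearer ${DATABRICKS_TOKEN}"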
The abfss protocol is then used to access the data:
spark.read.load("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-data>")
or, using SQL:
CREATE TABLE <database-name>.<table-name>;
COPY INTO <database-name>.<table-name>
FROM 'abfss://container@storageAccount.dfs.core.windows.net/path/to/folder'
FILEFORMAT = CSV
COPY_OPTIONS ('mergeSchema' = 'true');