Unity Catalog Governance Pack: Encryption Guide for Unity Catalog

Overview

Encryption provides data protection at two levels:

  • At rest: Data stored on disk is encrypted
  • In transit: Data moving between services is encrypted

1. Encryption at Rest

Default Encryption (Platform-Managed Keys)

Databricks encrypts all data at rest by default using platform-managed keys:

  • Delta tables on ADLS Gen2: AES-256 encryption
  • Databricks DBFS: AES-256 encryption
  • Notebook content: Encrypted in control plane
  • Cluster EBS volumes: Encrypted

Customer-Managed Keys (CMK)

For regulatory requirements, you can bring your own encryption keys:

Azure Key Vault Setup

# Create Key Vault (soft delete is enabled by default on new vaults)
az keyvault create \
  --name your-governance-kv \
  --resource-group your-rg \
  --location eastus \
  --sku premium \
  --enable-purge-protection true

# Create encryption key (the key-type flag is --kty)
az keyvault key create \
  --vault-name your-governance-kv \
  --name databricks-cmk \
  --kty RSA \
  --size 2048

# Grant Databricks access to the key
az keyvault set-policy \
  --name your-governance-kv \
  --object-id <databricks-enterprise-app-object-id> \
  --key-permissions get wrapKey unwrapKey

Terraform Configuration

resource "azurerm_key_vault_key" "databricks_cmk" {
  name         = "databricks-cmk"
  key_vault_id = azurerm_key_vault.governance.id
  key_type     = "RSA"
  key_size     = 2048

  key_opts = ["wrapKey", "unwrapKey"]
}

resource "azurerm_databricks_workspace" "this" {
  name                = "your-workspace"
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location
  sku                 = "premium"

  customer_managed_key_enabled = true
}

# The azurerm provider attaches the CMK to the workspace root storage
# via a separate resource rather than a custom_parameters block.
resource "azurerm_databricks_workspace_customer_managed_key" "this" {
  workspace_id     = azurerm_databricks_workspace.this.id
  key_vault_key_id = azurerm_key_vault_key.databricks_cmk.id
}

ADLS Gen2 Encryption

# Configure a customer-managed key as the storage encryption key source.
# (Infrastructure/double encryption must instead be enabled at account
# creation time with --require-infrastructure-encryption.)
az storage account update \
  --name yourstorageaccount \
  --resource-group your-rg \
  --encryption-key-source Microsoft.Keyvault \
  --encryption-key-vault https://your-governance-kv.vault.azure.net \
  --encryption-key-name storage-cmk

2. Encryption in Transit

TLS Configuration

All Databricks communication uses TLS 1.2+ by default:

  • Workspace UI to control plane: TLS 1.2
  • Cluster to metastore: TLS 1.2
  • Cluster to ADLS: TLS 1.2
  • Inter-cluster communication: TLS 1.2
  • JDBC/ODBC connections: TLS 1.2
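If you build your own client tooling against these endpoints, you can enforce the same floor on your side with Python's standard ssl module; a minimal sketch:

```python
import ssl

# Build a client-side TLS context that refuses anything below TLS 1.2,
# mirroring the platform's minimum for JDBC/ODBC and storage traffic.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.minimum_version.name)
```

Any socket wrapped with this context will fail the handshake against a server that only offers TLS 1.1 or lower.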

Enforce Minimum TLS Version

# ADLS Gen2: Enforce TLS 1.2
az storage account update \
  --name yourstorageaccount \
  --resource-group your-rg \
  --min-tls-version TLS1_2

Cluster Encryption

Enable encryption for cluster inter-node communication:

{
  "spark_conf": {
    "spark.databricks.cluster.encryption.enabled": "true"
  }
}
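When cluster specs are generated programmatically (for example before a Clusters API call), the same setting can be merged into the payload; a sketch, where the helper name is hypothetical and only the spark_conf key comes from the snippet above:

```python
import json

def with_internode_encryption(cluster_spec: dict) -> dict:
    """Return a copy of a cluster spec with inter-node traffic
    encryption enabled via spark_conf (hypothetical helper)."""
    spec = dict(cluster_spec)
    conf = dict(spec.get("spark_conf", {}))
    conf["spark.databricks.cluster.encryption.enabled"] = "true"
    spec["spark_conf"] = conf
    return spec

spec = with_internode_encryption({"cluster_name": "governed", "num_workers": 2})
print(json.dumps(spec, indent=2))
```

Merging rather than overwriting spark_conf keeps any other Spark settings the spec already carries.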

3. Delta Lake Encryption Considerations

Column-Level Encryption

For columns requiring additional protection beyond storage-level encryption (note that Spark's AES functions expect a 16-, 24-, or 32-byte key):

-- Encrypt sensitive columns before writing
CREATE OR REPLACE FUNCTION your_catalog.governance.encrypt_value(
  value STRING,
  key STRING
)
RETURNS STRING
COMMENT 'AES encryption for column-level protection'
RETURN BASE64(AES_ENCRYPT(value, key));

-- Decrypt when reading (requires key access)
CREATE OR REPLACE FUNCTION your_catalog.governance.decrypt_value(
  encrypted_value STRING,
  key STRING
)
RETURNS STRING
COMMENT 'AES decryption for column-level protection'
RETURN CAST(AES_DECRYPT(UNBASE64(encrypted_value), key) AS STRING);

Key Rotation

Implement regular key rotation:

  1. Generate a new key version in Key Vault
  2. Re-encrypt data with the new key
  3. Update the Databricks workspace configuration
  4. Verify data accessibility
  5. Mark the old key version for deletion (after the retention period)
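The rotation cadence can be tracked with a small helper; a sketch, where the 90-day period is an assumption borrowed from the secret-rotation guidance in this guide:

```python
from datetime import date, timedelta

ROTATION_PERIOD_DAYS = 90  # assumed policy; adjust to your compliance requirement

def next_rotation_due(last_rotated: date,
                      period_days: int = ROTATION_PERIOD_DAYS) -> date:
    """Date by which the key should be rotated again."""
    return last_rotated + timedelta(days=period_days)

def rotation_overdue(last_rotated: date, today: date,
                     period_days: int = ROTATION_PERIOD_DAYS) -> bool:
    """True once the rotation deadline has passed."""
    return today >= next_rotation_due(last_rotated, period_days)

print(next_rotation_due(date(2024, 1, 1)))  # 2024-03-31
```

A scheduled job can call `rotation_overdue` against key metadata pulled from Key Vault and alert when a rotation is due.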

4. Secrets Management

Azure Key Vault Integration

# Create a secret scope backed by Key Vault (via Databricks CLI)
databricks secrets create-scope \
  --scope governance-secrets \
  --scope-backend-type AZURE_KEYVAULT \
  --resource-id /subscriptions/.../resourceGroups/.../providers/Microsoft.KeyVault/vaults/your-kv \
  --dns-name https://your-governance-kv.vault.azure.net/

# Access secrets in notebooks (Python)
# encryption_key = dbutils.secrets.get(scope="governance-secrets", key="encryption-key")
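For code that must also run outside a Databricks notebook (unit tests, local development), a thin wrapper that falls back to an environment variable keeps secret access in one place; a sketch, with the wrapper name and the env-var naming scheme as assumptions:

```python
import os

def get_secret(scope: str, key: str, dbutils=None) -> str:
    """Fetch a secret from a Databricks secret scope; fall back to an
    environment variable named SCOPE__KEY (upper-cased, dashes to
    underscores) when no dbutils handle is available."""
    if dbutils is not None:
        return dbutils.secrets.get(scope=scope, key=key)
    env_name = f"{scope}__{key}".upper().replace("-", "_")
    value = os.environ.get(env_name)
    if value is None:
        raise KeyError(f"secret {scope}/{key} not found")
    return value

# Local development: supply the value via the environment, never in code.
os.environ["GOVERNANCE_SECRETS__ENCRYPTION_KEY"] = "dev-only-key"
print(get_secret("governance-secrets", "encryption-key"))
```

In a notebook you would pass the real `dbutils` object, so production secrets always come from the Key Vault-backed scope.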

Best Practices

  • Never hardcode secrets in notebooks or scripts
  • Use managed identity when possible
  • Rotate secrets on a regular schedule (90 days recommended)
  • Audit secret access via Key Vault diagnostic logs
  • Use separate Key Vaults for different environments

5. Encryption Checklist

  • [ ] Platform encryption at rest is enabled (default)
  • [ ] Customer-managed keys configured (if required by regulation)
  • [ ] ADLS Gen2 encryption enabled
  • [ ] Minimum TLS 1.2 enforced on all storage accounts
  • [ ] Cluster inter-node encryption enabled
  • [ ] Key rotation schedule established
  • [ ] Secrets stored in Key Vault (not in code)
  • [ ] Secret scopes configured in Databricks
  • [ ] Key Vault diagnostic logging enabled
  • [ ] Encryption key access audited
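The checklist above can be tracked mechanically; a minimal sketch in which the item names and example statuses are illustrative:

```python
# Checklist state: item -> completed? (example values, not a real audit)
CHECKLIST = {
    "platform_encryption_at_rest": True,
    "customer_managed_keys": False,   # only required by some regulations
    "adls_gen2_encryption": True,
    "min_tls_1_2_enforced": True,
    "cluster_internode_encryption": False,
    "key_rotation_schedule": False,
    "secrets_in_key_vault": True,
    "secret_scopes_configured": True,
    "key_vault_diagnostic_logging": False,
    "key_access_audited": False,
}

def unmet(checklist: dict) -> list:
    """Alphabetical list of checklist items still open."""
    return sorted(item for item, done in checklist.items() if not done)

print(unmet(CHECKLIST))
```

Feeding real workspace and storage-account settings into the dict turns this into a simple compliance report.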

This is 1 of 6 resources in the DataStack Pro toolkit. Get the complete [Unity Catalog Governance Pack] with all files, templates, and documentation for $39.

Get the Full Kit →

Or grab the entire DataStack Pro bundle (6 products) for $164 — save 30%.

Get the Complete Bundle →
