Update adb-exfiltration-protection to use azurerm v4 #150

Merged 2 commits on Nov 17, 2024
25 changes: 12 additions & 13 deletions examples/adb-exfiltration-protection/README.md
@@ -22,13 +22,15 @@ Resources to be created:

## How to use

1. Update `terraform.tfvars` file and provide values to each defined variable
1. Update `terraform.tfvars` file and provide values to each defined variable.
2. (Optional) Configure your [remote backend](https://developer.hashicorp.com/terraform/language/settings/backends/azurerm)
3. Run `terraform init` to initialize Terraform and download the required providers.
4. Run `terraform apply` to create the resources.
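
The optional remote backend in step 2 can be sketched as follows. This is a minimal `azurerm` backend block under stated assumptions: the file name, resource group, storage account, container, and key are placeholders that do not come from this repository, and the storage account and container must already exist before `terraform init` is run.

```hcl
# backend.tf (hypothetical file name)
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"                  # placeholder
    storage_account_name = "tfstateexample"                      # placeholder
    container_name       = "tfstate"                             # placeholder
    key                  = "adb-exfiltration-protection.tfstate" # placeholder state file name
  }
}
```

After adding or changing a backend block, `terraform init` has to be run again so that Terraform can configure the new state location.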

## How to fill in variable values

Some variables have no default value and must be set explicitly, e.g. `subscription_id`.
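
As a minimal sketch, the subscription ID can be supplied in `terraform.tfvars`; the GUID below is a placeholder, not a real subscription:

```hcl
# terraform.tfvars
# Placeholder value: replace with the ID of the subscription to deploy into.
subscription_id = "00000000-0000-0000-0000-000000000000"
```

Terraform also reads input variables from environment variables named `TF_VAR_<name>`, so setting `TF_VAR_subscription_id` in the shell before running `terraform apply` is an alternative that keeps the value out of version control.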

Most of the values can be found at: https://docs.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr

In `terraform.tfvars`, set these variables:
@@ -47,16 +49,17 @@ firewallfqdn = ["dbartifactsprodseap.blob.core.windows.net","dbartifactsprodeap.

| Name | Version |
| ---------------------------------------------------------------------------- | ------- |
| <a name="requirement_azurerm"></a> [azurerm](#requirement\_azurerm) | =2.83.0 |
| <a name="requirement_databricks"></a> [databricks](#requirement\_databricks) | 0.3.10 |
| <a name="requirement_azurerm"></a> [azurerm](#requirement\_azurerm) | >=4.0.0 |
| <a name="requirement_databricks"></a> [databricks](#requirement\_databricks) | >=1.52.0 |

## Providers

| Name | Version |
| ---------------------------------------------------------------- | ------- |
| <a name="provider_azurerm"></a> [azurerm](#provider\_azurerm) | 2.83.0 |
| <a name="provider_external"></a> [external](#provider\_external) | 2.2.0 |
| <a name="provider_random"></a> [random](#provider\_random) | 3.1.0 |
| <a name="provider_azurerm"></a> [azurerm](#provider\_azurerm) | 4.9.0 |
| <a name="provider_external"></a> [external](#provider\_external) | 1.58.0 |
| <a name="provider_random"></a> [random](#provider\_random) | 3.6.3 |
| <a name="provider_dns"></a> [dns](#provider\_dns) | 3.4.2 |

## Modules

@@ -95,11 +98,11 @@ No modules.

| Name | Description | Type | Default | Required |
| -------------------------------------------------------------------------------------------------------------- | ----------- | ----------- | ----------------- | :------: |
| <a name="input_subscription_id"></a> [subscription\_id](#input\_subscription\_id) | n/a | `string` | n/a | yes |
| <a name="input_dbfs_prefix"></a> [dbfs\_prefix](#input\_dbfs\_prefix) | n/a | `string` | `"dbfs"` | no |
| <a name="input_firewallfqdn"></a> [firewallfqdn](#input\_firewallfqdn) | n/a | `list(any)` | n/a | yes |
| <a name="input_hubcidr"></a> [hubcidr](#input\_hubcidr) | n/a | `string` | `"10.178.0.0/20"` | no |
| <a name="input_metastoreip"></a> [metastoreip](#input\_metastoreip) | n/a | `string` | n/a | yes |
| <a name="input_no_public_ip"></a> [no\_public\_ip](#input\_no\_public\_ip) | n/a | `bool` | `true` | no |
| <a name="input_private_subnet_endpoints"></a> [private\_subnet\_endpoints](#input\_private\_subnet\_endpoints) | n/a | `list` | `[]` | no |
| <a name="input_rglocation"></a> [rglocation](#input\_rglocation) | n/a | `string` | `"southeastasia"` | no |
| <a name="input_sccip"></a> [sccip](#input\_sccip) | n/a | `string` | n/a | yes |
@@ -111,11 +114,7 @@ No modules.

| Name | Description |
| -------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- |
| <a name="output_arm_client_id"></a> [arm\_client\_id](#output\_arm\_client\_id) | n/a |
| <a name="output_arm_subscription_id"></a> [arm\_subscription\_id](#output\_arm\_subscription\_id) | n/a |
| <a name="output_arm_tenant_id"></a> [arm\_tenant\_id](#output\_arm\_tenant\_id) | n/a |
| <a name="output_azure_region"></a> [azure\_region](#output\_azure\_region) | n/a |
| <a name="output_databricks_azure_workspace_resource_id"></a> [databricks\_azure\_workspace\_resource\_id](#output\_databricks\_azure\_workspace\_resource\_id) | n/a |
| <a name="output_resource_group"></a> [resource\_group](#output\_resource\_group) | n/a |
| <a name="output_azure_resource_group_id"></a> [azure\_resource\_group\_id](#output\_azure\_resource\_group\_id) | n/a |
| <a name="output_workspace_id"></a> [workspace\_id](#output\_workspace\_id) | n/a |
| <a name="output_workspace_url"></a> [workspace\_url](#output\_workspace\_url) | n/a |
<!-- END_TF_DOCS -->
13 changes: 0 additions & 13 deletions examples/adb-exfiltration-protection/main.tf
@@ -1,20 +1,7 @@
/**
* Azure Databricks workspace in custom VNet with traffic routed via firewall in the Hub VNet
*
* Module creates:
* * Resource group with random prefix
* * Tags, including `Owner`, which is taken from `az account show --query user`
* * VNet with public and private subnet for Databricks
* * VNet with subnet for deployment of Azure Firewall
* * Azure Firewall with access enabled to Databricks-related resources
* * Databricks workspace
*/

module "adb-exfiltration-protection" {
source = "../../modules/adb-exfiltration-protection"
hubcidr = var.hubcidr
spokecidr = var.spokecidr
no_public_ip = var.no_public_ip
rglocation = var.rglocation
metastore = var.metastore
scc_relay = var.scc_relay
14 changes: 14 additions & 0 deletions examples/adb-exfiltration-protection/outputs.tf
@@ -0,0 +1,14 @@
output "azure_resource_group_id" {
description = "ID of the created Azure resource group"
value = module.adb-exfiltration-protection.azure_resource_group_id
}

output "workspace_id" {
description = "The Databricks workspace ID"
value = module.adb-exfiltration-protection.workspace_id
}

output "workspace_url" {
description = "The Databricks workspace URL"
value = module.adb-exfiltration-protection.workspace_url
}
@@ -1,13 +1,12 @@
# versions.tf
terraform {
required_providers {
databricks = {
source = "databricks/databricks"
version = ">=1.20.0"
}
azurerm = {
source = "hashicorp/azurerm"
version = ">=2.83.0"
version = ">=4.0.0"
}
databricks = {
source = "databricks/databricks"
version = ">=1.52.0"
}
random = {
source = "hashicorp/random"
@@ -17,3 +16,8 @@ terraform {
}
}
}

provider "azurerm" {
subscription_id = var.subscription_id
features {}
}
24 changes: 13 additions & 11 deletions examples/adb-exfiltration-protection/terraform.tfvars
@@ -1,11 +1,14 @@
hubcidr = "10.178.0.0/20"
spokecidr = "10.179.0.0/20"
no_public_ip = true
rglocation = "westeurope"
subscription_id = "<your Azure Subscription ID here>"
dbfs_prefix = "dbfs"
workspace_prefix = "adb"
hubcidr = "10.178.0.0/20"
spokecidr = "10.179.0.0/20"
rglocation = "westeurope"

# This information can be pulled automatically, e.g. from
# https://github.com/microsoft/AzureTRE/blob/main/templates/workspace_services/databricks/terraform/databricks-udr.json
# which is maintained by a Microsoft team (although it may not be updated immediately).
metastore = [
metastore = [
"consolidated-westeurope-prod-metastore.mysql.database.azure.com",
"consolidated-westeurope-prod-metastore-addl-1.mysql.database.azure.com",
"consolidated-westeurope-prod-metastore-addl-2.mysql.database.azure.com",
@@ -15,24 +18,23 @@ metastore = [
"consolidated-westeuropec2-prod-metastore-2.mysql.database.azure.com",
"consolidated-westeuropec2-prod-metastore-3.mysql.database.azure.com",
]

// get from https://learn.microsoft.com/en-us/azure/databricks/resources/supported-regions#--metastore-artifact-blob-storage-system-tables-blob-storage-log-blob-storage-and-event-hub-endpoint-ip-addresses
scc_relay = [
scc_relay = [
"tunnel.westeurope.azuredatabricks.net",
"tunnel.westeuropec2.azuredatabricks.net"
]
webapp_ips = [
webapp_ips = [
"52.232.19.246/32",
"40.74.30.80/32",
"20.103.219.240/28",
"4.150.168.160/28",
]
eventhubs = [
eventhubs = [
"prod-westeurope-observabilityeventhubs.servicebus.windows.net",
"prod-westeuc2-observabilityeventhubs.servicebus.windows.net",
]
dbfs_prefix = "dbfs"
workspace_prefix = "adb"
firewallfqdn = [ // dbfs rule will be added - depends on dbfs storage name
firewallfqdn = [ // dbfs rule will be added - depends on dbfs storage name
"dbartifactsprodwesteu.blob.core.windows.net", //databricks artifacts
"arprodwesteua1.blob.core.windows.net",
"arprodwesteua2.blob.core.windows.net",
11 changes: 5 additions & 6 deletions examples/adb-exfiltration-protection/variables.tf
@@ -1,3 +1,8 @@
variable "subscription_id" {
type = string
description = "Azure Subscription ID to deploy the workspace into"
}

variable "hubcidr" {
description = "IP range for creation of the Spoke VNet"
type = string
@@ -10,12 +15,6 @@ variable "spokecidr" {
default = "10.179.0.0/20"
}

variable "no_public_ip" {
description = "If workspace should be created with No-Public-IP"
type = bool
default = true
}

variable "rglocation" {
description = "Location of resource group"
type = string
40 changes: 11 additions & 29 deletions modules/adb-exfiltration-protection/README.md
@@ -1,17 +1,13 @@
# Provisioning Azure Databricks workspace with a Hub & Spoke firewall for data exfiltration protection

This template provides an example deployment of: Hub-Spoke networking with egress firewall to control all outbound traffic from Databricks subnets. Details are described in: https://databricks.com/blog/2020/03/27/data-exfiltration-protection-with-azure-databricks.html
This module creates an Azure Databricks workspace with a Hub & Spoke firewall for data exfiltration protection.

With this setup, you can define firewall rules to block or allow egress traffic from your Databricks clusters. You can also use the firewall to block all access to storage accounts and use a private endpoint connection to bypass the firewall, so that access is allowed only to specific storage accounts.
## Module content


To find IP and FQDN for your deployment, go to: https://docs.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr

## Overall Architecture
This module can be used to deploy the following:

![alt text](https://raw.githubusercontent.com/databricks/terraform-databricks-examples/main/modules/adb-exfiltration-protection/images/adb-exfiltration-classic.png?raw=true)

Resources to be created:
* Resource group with random prefix
* Tags, including `Owner`, which is taken from `az account show --query user`
* Hub-Spoke topology, with hub firewall in hub vnet's subnet.
@@ -32,22 +28,6 @@ Resources to be created:
6. Run `terraform init` to initialize Terraform and download the required providers.
7. Run `terraform apply` to create the resources.
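
Because this change removes the `provider "azurerm"` block from the module, the calling configuration has to configure the provider itself. The following is a minimal sketch of such a caller, modeled on the example in `examples/adb-exfiltration-protection`. It assumes the relative `source` path used there, and it assumes that `webapp_ips`, `eventhubs`, `dbfs_prefix`, `workspace_prefix`, and `firewallfqdn` are module inputs because they appear in the example's `terraform.tfvars`. The endpoint lists are shortened for illustration and are not complete for any region.

```hcl
variable "subscription_id" {
  type        = string
  description = "Azure Subscription ID to deploy the workspace into"
}

# azurerm v4 expects the subscription ID in the provider configuration
# (or in the ARM_SUBSCRIPTION_ID environment variable).
provider "azurerm" {
  subscription_id = var.subscription_id
  features {}
}

module "adb-exfiltration-protection" {
  # Relative path as used by the example in this repository; adjust to your layout.
  source = "../../modules/adb-exfiltration-protection"

  hubcidr    = "10.178.0.0/20"
  spokecidr  = "10.179.0.0/20"
  rglocation = "westeurope"

  # Shortened lists: larger regions have several instances of each service.
  metastore = ["consolidated-westeurope-prod-metastore.mysql.database.azure.com"]
  scc_relay = ["tunnel.westeurope.azuredatabricks.net"]

  # Assumed to be module inputs, based on the example terraform.tfvars.
  webapp_ips       = ["52.232.19.246/32"]
  eventhubs        = ["prod-westeurope-observabilityeventhubs.servicebus.windows.net"]
  dbfs_prefix      = "dbfs"
  workspace_prefix = "adb"
  firewallfqdn     = ["dbartifactsprodwesteu.blob.core.windows.net"]
}
```

Keeping the provider block in the root configuration rather than in the module follows Terraform's general guidance for reusable modules, and it lets the caller decide how the subscription ID is supplied.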


## How to fill in variable values

Most of the values are to be found at: https://learn.microsoft.com/en-us/azure/databricks/resources/supported-regions and https://docs.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr

In `variables.tfvars`, set these variables (bigger regions have multiple instances of each service):

```hcl
metastore = ["consolidated-westeurope-prod-metastore.mysql.database.azure.com"]
scc_relay = ["tunnel.westeurope.azuredatabricks.net"]
webapp_ips = ["52.230.27.216/32"] # given at UDR page
eventhubs = ["prod-westeurope-observabilityeventhubs.servicebus.windows.net"]
# find these for your region, follow Databricks blog tutorial.
firewallfqdn = ["dbartifactsprodseap.blob.core.windows.net","dbartifactsprodeap.blob.core.windows.net","dblogprodseasia.blob.core.windows.net","cdnjs.com"]
```

<!-- BEGIN_TF_DOCS -->
## Requirements

@@ -121,11 +101,13 @@ No modules.

| Name | Description |
| -------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- |
| <a name="output_arm_client_id"></a> [arm\_client\_id](#output\_arm\_client\_id) | n/a |
| <a name="output_arm_subscription_id"></a> [arm\_subscription\_id](#output\_arm\_subscription\_id) | n/a |
| <a name="output_arm_tenant_id"></a> [arm\_tenant\_id](#output\_arm\_tenant\_id) | n/a |
| <a name="output_azure_region"></a> [azure\_region](#output\_azure\_region) | n/a |
| <a name="output_databricks_azure_workspace_resource_id"></a> [databricks\_azure\_workspace\_resource\_id](#output\_databricks\_azure\_workspace\_resource\_id) | n/a |
| <a name="output_resource_group"></a> [resource\_group](#output\_resource\_group) | n/a |
| <a name="output_arm_client_id"></a> [arm\_client\_id](#output\_arm\_client\_id) | Deprecated |
| <a name="output_arm_subscription_id"></a> [arm\_subscription\_id](#output\_arm\_subscription\_id) | Deprecated |
| <a name="output_arm_tenant_id"></a> [arm\_tenant\_id](#output\_arm\_tenant\_id) | Deprecated |
| <a name="output_azure_region"></a> [azure\_region](#output\_azure\_region) | Deprecated |
| <a name="output_databricks_azure_workspace_resource_id"></a> [databricks\_azure\_workspace\_resource\_id](#output\_databricks\_azure\_workspace\_resource\_id) | Deprecated |
| <a name="output_resource_group"></a> [resource\_group](#output\_resource\_group) | Deprecated |
| <a name="output_workspace_url"></a> [workspace\_url](#output\_workspace\_url) | n/a |
| <a name="output_azure_resource_group_id"></a> [azure\_resource\_group\_id](#output\_azure\_resource\_group\_id) | n/a |
| <a name="output_workspace_id"></a> [workspace\_id](#output\_workspace\_id) | n/a |
<!-- END_TF_DOCS -->
33 changes: 0 additions & 33 deletions modules/adb-exfiltration-protection/main.tf
@@ -1,16 +1,3 @@
/**
* Azure Databricks workspace in custom VNet
*
* Module creates:
* * Resource group with random prefix
* * Tags, including `Owner`, which is taken from `az account show --query user`
* * VNet with public and private subnet
* * Databricks workspace
*/
provider "azurerm" {
features {}
}

resource "random_string" "naming" {
special = false
upper = false
@@ -44,23 +31,3 @@ resource "azurerm_resource_group" "this" {
location = local.location
tags = local.tags
}

output "arm_client_id" {
value = data.azurerm_client_config.current.client_id
}

output "arm_subscription_id" {
value = data.azurerm_client_config.current.subscription_id
}

output "arm_tenant_id" {
value = data.azurerm_client_config.current.tenant_id
}

output "azure_region" {
value = local.location
}

output "resource_group" {
value = azurerm_resource_group.this.name
}
44 changes: 44 additions & 0 deletions modules/adb-exfiltration-protection/outputs.tf
@@ -0,0 +1,44 @@
output "databricks_azure_workspace_resource_id" {
description = "**Deprecated** The ID of the Databricks Workspace in the Azure management plane"
value = azurerm_databricks_workspace.this.id
}

output "arm_client_id" {
description = "**Deprecated**"
value = data.azurerm_client_config.current.client_id
}

output "arm_subscription_id" {
description = "**Deprecated**"
value = data.azurerm_client_config.current.subscription_id
}

output "arm_tenant_id" {
description = "**Deprecated**"
value = data.azurerm_client_config.current.tenant_id
}

output "azure_region" {
description = "**Deprecated**"
value = local.location
}

output "resource_group" {
description = "**Deprecated**"
value = azurerm_resource_group.this.name
}

output "workspace_url" {
description = "The Databricks workspace URL"
value = "https://${azurerm_databricks_workspace.this.workspace_url}/"
}

output "azure_resource_group_id" {
description = "ID of the created Azure resource group"
value = azurerm_resource_group.this.id
}

output "workspace_id" {
description = "The Databricks workspace ID"
value = azurerm_databricks_workspace.this.workspace_id
}
@@ -1,13 +1,12 @@
# versions.tf
terraform {
required_providers {
databricks = {
source = "databricks/databricks"
version = ">=1.20.0"
version = ">=1.52.0"
}
azurerm = {
source = "hashicorp/azurerm"
version = ">=2.83.0"
version = ">=4.0.0"
}
random = {
source = "hashicorp/random"
6 changes: 0 additions & 6 deletions modules/adb-exfiltration-protection/variables.tf
@@ -10,12 +10,6 @@ variable "spokecidr" {
default = "10.179.0.0/20"
}

variable "no_public_ip" {
description = "If workspace should be created with No-Public-IP"
type = bool
default = true
}

variable "rglocation" {
description = "Location of resource group"
type = string