Terraform: Custom Resources (without Go)
Usually, when we want to manage some infrastructure with Terraform, we try to find an existing provider with the resource we need. For the most common cases (major clouds, most-used resources) we do find the necessary implementations, but sometimes we need either a not-so-popular resource, or something highly specialized that no one has bothered to implement before us.
The normal way then would be to implement a custom Terraform provider. But what if we don't want to introduce Go into our codebase (as that is the only language providers can be written in)? What if we don't want to set up the full-blown toolchain for development, including CI/CD? What if we also don't want to deal with publishing the provider to some registry (the public one, or a self-hosted one)?
As long as there are SDKs or tools that can create the needed resources for us, we can stay with the experiences and technologies we already know, while using Terraform only to glue everything together, so that we can manage the lifecycle of all resources (including their interdependencies) in one place.
Important: the to-be-implemented custom resource should follow the declarative approach, and it should have the characteristics of a resource. That is, it should have clear lifecycle states: it should be possible to provision the resource, to destroy it, and to read its current state (for drift detection). The resource may be virtual, or it may represent only a part of another resource. But as long as it complies with the declarative approach (where we describe some desired state of resources, as opposed to the imperative approach, where we write down the steps needed to achieve the desired state), it can be a good candidate for a custom resource.
So, to create a custom resource, we need to cover with custom code the usual lifecycle states of a Terraform resource: provisioning (create), deprovisioning (destroy), and drift mitigation (refresh).
The approach described below targets Azure and utilizes PowerShell 7+, but it only serves as an example; the approach can be used in other cases and with other clouds as well.
When using Terraform with Azure, we follow the recommended approach:
- First, we try to find a definition for the needed resource in the AzureRM provider (implemented and maintained by HashiCorp).
- Then, if the needed resource is not available, we try to find it in AzAPI (a thin wrapper over Azure APIs, maintained by Microsoft).
- And we fall back to the solution of custom resources only when neither AzureRM nor AzAPI supports what we need.
Let's take as an example the task of managing the AppSettings of an Azure App Service separately from the App Service itself. The need for this may arise, for example, in the following cases:
- When the App Service is created in one Terraform project, but its configuration is extended from another Terraform project (and there are some reasons to keep this separation).
- When there is a need to break a cyclic dependency: two App Services depend on each other, because they require some info from one another to complete their configuration.
Example: App Service 1 wants to call App Service 2, and we want to implement authentication using their SystemAssigned managed identities (which are created automatically during provisioning of the App Services). App Service 1 needs to know the `principal_id` of App Service 2, so it can request a token for it. App Service 2, in its turn, needs to know the `principal_id`s of all callers (in this example, App Service 1), so it can validate them.
The needed resource is not supported by AzureRM (there is only a closed feature request). In AzAPI there is a page for the AppSettings resource, but it seems to be auto-generated, and it lacks usage examples. Anyway, we still need an example to illustrate the approach, so we will proceed with creating a custom resource for it 😉
We will use this configuration of App Services (a link to the repo with the complete code is at the end of the article).
```hcl
# apps.tf
# Declarations of Resource Group and App Service Plan are skipped.

resource "azurerm_linux_web_app" "appService1" {
  name                = "app-service1"
  location            = azurerm_resource_group.resourceGroup.location
  resource_group_name = azurerm_resource_group.resourceGroup.name
  service_plan_id     = azurerm_service_plan.appServicePlan.id

  site_config {}

  identity {
    type = "SystemAssigned"
  }

  app_settings = {
    EXAMPLE_SETTING_1 = 42
  }
}

resource "azurerm_linux_web_app" "appService2" {
  name                = "app-service2"
  location            = azurerm_resource_group.resourceGroup.location
  resource_group_name = azurerm_resource_group.resourceGroup.name
  service_plan_id     = azurerm_service_plan.appServicePlan.id

  site_config {}

  identity {
    type = "SystemAssigned"
  }

  app_settings = {
    EXAMPLE_SETTING_2 = 43
  }
}
```
Create
For reusability of the created custom resource, we will introduce a module. First, we create a script that performs the creation of the necessary App Settings. We will use PowerShell, which invokes the Az CLI, but as said before, any tool or programming language can be used. The script will be invoked by Terraform the same way it can be invoked from a terminal, so we only need to make sure the necessary tooling is available on the host machine.
```powershell
# additional-app-settings/assets/create.ps1
[CmdletBinding()]
param (
  [Parameter(Mandatory)] [string] ${subscription-id},
  [Parameter(Mandatory)] [string] ${resource-group-name},
  [Parameter(Mandatory)] [string] ${app-service-name},
  [Parameter(Mandatory)] [hashtable] ${app-settings}
)

$settings = (${app-settings}.Keys | ForEach-Object { "$($_)=$(${app-settings}[$_])" }) -join " "

az webapp config appsettings set `
  --subscription ${subscription-id} `
  --resource-group ${resource-group-name} `
  --name ${app-service-name} `
  --settings $settings
```
To include the script in the Terraform lifecycle, we will use the fake built-in resource `terraform_data` with a custom provisioner.
```hcl
# additional-app-settings/main.tf
# See link to the repo below for configuration of the module (providers and inputs).

locals {
  # The property id is marked as 'known after apply' during initial creation.
  # This avoids deadlocking the implemented custom refresh mechanism.
  # We parse the id to retrieve name and resource group name.
  appService        = provider::azurerm::parse_resource_id(var.appService.id)
  resourceGroupName = local.appService.resource_group_name
  appServiceName    = local.appService.resource_name
}

resource "terraform_data" "appSettings" {
  triggers_replace = {
    subscriptionId    = var.subscriptionId
    resourceGroupName = local.resourceGroupName
    appServiceName    = local.appServiceName
    appSettings       = var.appSettings
  }

  input = {
    subscriptionId    = var.subscriptionId
    resourceGroupName = local.resourceGroupName
    appServiceName    = local.appServiceName
    appSettings       = jsonencode(var.appSettings)
  }

  provisioner "local-exec" {
    when        = create
    interpreter = ["pwsh", "-Command"]
    command     = <<-EOT
      ${path.module}/assets/create.ps1 `
        -subscription-id $env:subscriptionId `
        -resource-group-name $env:resourceGroupName `
        -app-service-name $env:appServiceName `
        -app-settings ($env:appSettings | ConvertFrom-Json -AsHashtable)
    EOT
    environment = self.input
    quiet       = true # Silences printing of the invoked command. All other output is not silenced.
  }
}
```
- All inputs of the module are added into `triggers_replace`, which makes sure that any change of the parameters is noticed and reconciled (although this is achieved via recreation).
- The properties `interpreter` and `command` together fulfil the task of invoking the custom code. If your script is written in Bash, you may use `["/bin/bash", "-c"]`.
- Parameters are passed to the invoked script via the properties `input` and `environment`. This will become relevant for the destroy-time provisioner (explained below).
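For illustration, here is a minimal sketch of what a Bash-based create provisioner could look like. The script name `create.sh` is a hypothetical stand-in (not part of the repo), assumed to read the same environment variables as the PowerShell version:

```hcl
# Sketch only: assumes a hypothetical assets/create.sh that reads
# $subscriptionId, $resourceGroupName, $appServiceName and $appSettings
# from the environment, just like the PowerShell script does.
provisioner "local-exec" {
  when        = create
  interpreter = ["/bin/bash", "-c"]
  command     = "${path.module}/assets/create.sh"
  environment = self.input
}
```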
Let's instantiate the created module and check that it works.
```hcl
# app-settings.tf

module "appService1AppSettings" {
  source = "./additional-app-settings"

  subscriptionId = data.azurerm_client_config.current.subscription_id
  appService     = azurerm_linux_web_app.appService1
  appSettings = {
    CALLEE = azurerm_linux_web_app.appService2.identity[0].principal_id
  }
}

module "appService2AppSettings" {
  source = "./additional-app-settings"

  subscriptionId = data.azurerm_client_config.current.subscription_id
  appService     = azurerm_linux_web_app.appService2
  appSettings = {
    CALLER = azurerm_linux_web_app.appService1.identity[0].principal_id
  }
}
```
When we invoke `terraform apply` at this point, we will see that it works: the necessary app settings are created successfully. But if we run it once again, we will see that Terraform detects them as drift and wants to remove them:
```
# azurerm_linux_web_app.appService1 will be updated in-place
~ resource "azurerm_linux_web_app" "appService1" {
    ~ app_settings = {
        - "CALLEE" = "808d076e-0d68-45e6-80aa-d7e194ddaed6" -> null
          # (1 unchanged element hidden)
      }
      # (28 unchanged attributes hidden)
      # (2 unchanged blocks hidden)
  }

# azurerm_linux_web_app.appService2 will be updated in-place
~ resource "azurerm_linux_web_app" "appService2" {
    ~ app_settings = {
        - "CALLER" = "384fe864-f61e-4335-bb1b-65198b89e872" -> null
          # (1 unchanged element hidden)
      }
      # (28 unchanged attributes hidden)
      # (2 unchanged blocks hidden)
  }
```
To mitigate that, we need to add the following section to the declarations of the App Services:
```hcl
# apps.tf

resource "azurerm_linux_web_app" "appService1" {
  ...
  lifecycle {
    ignore_changes = [app_settings["CALLEE"]]
  }
}

resource "azurerm_linux_web_app" "appService2" {
  ...
  lifecycle {
    ignore_changes = [app_settings["CALLER"]]
  }
}
```
That is the most unfortunate disadvantage of this solution: in the case when we need to configure some AppSettings in the resource itself, but others with a separate module, we have to know the names of all additional AppSettings in advance and ignore them in the App Services. Otherwise, Terraform will try to delete them every time.
It is also possible to shift management of AppSettings completely out of the App Service resource and then ignore the whole `app_settings` property. Terraform will then not know anything about the AppSettings, which also means there will be no drift detection for them at all.
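A minimal sketch of that trade-off, handing the whole map over to the module:

```hcl
# apps.tf — variant where AppSettings are managed entirely outside the resource.
resource "azurerm_linux_web_app" "appService1" {
  # ... other properties stay as-is
  lifecycle {
    # Ignore the entire map: no need to know setting names in advance,
    # but also no drift detection for AppSettings on this resource.
    ignore_changes = [app_settings]
  }
}
```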
Destroy
The deprovisioning phase is covered by another script, which is invoked by a destroy-time provisioner.
```powershell
# additional-app-settings/assets/destroy.ps1
[CmdletBinding()]
param (
  [Parameter(Mandatory)] [string] ${subscription-id},
  [Parameter(Mandatory)] [string] ${resource-group-name},
  [Parameter(Mandatory)] [string] ${app-service-name},
  [Parameter(Mandatory)] [hashtable] ${app-settings}
)

$settings = ${app-settings}.Keys -join " "

az webapp config appsettings delete `
  --subscription ${subscription-id} `
  --resource-group ${resource-group-name} `
  --name ${app-service-name} `
  --setting-names $settings
```
```hcl
# additional-app-settings/main.tf

resource "terraform_data" "appSettings" {
  ...
  input = {
    subscriptionId    = var.subscriptionId
    resourceGroupName = local.resourceGroupName
    appServiceName    = local.appServiceName
    appSettings       = jsonencode(var.appSettings)
  }
  ...
  provisioner "local-exec" {
    when        = destroy
    interpreter = ["pwsh", "-Command"]
    command     = <<-EOT
      ${path.module}/assets/destroy.ps1 `
        -subscription-id $env:subscriptionId `
        -resource-group-name $env:resourceGroupName `
        -app-service-name $env:appServiceName `
        -app-settings ($env:appSettings | ConvertFrom-Json -AsHashtable)
    EOT
    environment = self.input
    quiet       = true
    on_failure  = continue
  }
}
```
The destroy-time provisioner differs from the create-time provisioner in several ways:
- The destroy-time provisioner cannot reference any local variables, input parameters, or other resources. Instead, it always uses the captured state of the existing (just-to-be-destroyed) resource. Thus, we capture all the necessary values in the available property `input`, and then access them in the provisioner block via the special `self` object.
- We use `environment` to inject the values into the script. With this, we follow the recommendation against code injection attacks.
- The property `environment` expects the type `map(string)`. When we need to pass a complex object (in our case, a map of key-value pairs of AppSettings), we need to serialize it before passing and deserialize it in the script (thus the invocations of `jsonencode()` and `ConvertFrom-Json`).
- We also don't want to be too strict about possible failures during the destruction of the resource. Supporting every case that could go wrong is tricky (maybe the App Service itself has already been deleted, and we don't want to cause a complete deadlock of Terraform), so we relax the requirement of success with `on_failure = continue`.
Important: if we ever decide to decommission the resource, we need to be extremely careful. Destroy-time provisioners run only when they are present in the code at the time of destruction. A multistep approach should therefore be used: first `count = 0`, and only then deletion of the resource.
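The multistep decommissioning can be sketched as follows (assuming the module accepts `count`):

```hcl
# Step 1: keep the code (and thus the destroy-time provisioner) in place,
# but set count = 0 and run terraform apply. Terraform destroys the instance,
# and the destroy-time provisioner still fires because its code is present.
module "appService1AppSettings" {
  source = "./additional-app-settings"
  count  = 0
  # ... original inputs stay as-is
}

# Step 2: only after that apply succeeds, delete the whole module block.
```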
We can now test if the destroy-time provisioner works:
```shell
terraform apply -replace module.appService1AppSettings.terraform_data.appSettings
```
Drift Mitigation
One of the most powerful Terraform features is drift mitigation. Every resource of every regular provider implements a special Read method, which is invoked during the refresh phase.
Unfortunately, provisioners can only be of one of two types: `create` or `destroy`. There is no special provisioner type to hook into the refresh phase of `terraform apply`. For drift mitigation, we will have to employ something else.
The implemented resource already reacts to changes of the input parameters (via `triggers_replace`). We need to add another 'synchronization pulse' to mark the resource for recreation based on external changes. Luckily, there is a provider, `pseudo-dynamic/value`, with a resource that implements exactly the capability we need.
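Like any third-party provider, it has to be declared in `required_providers`. The source address below is an assumption derived from the provider name and should be verified against the Terraform Registry:

```hcl
terraform {
  required_providers {
    value = {
      # Assumed registry address; verify before use.
      source = "pseudo-dynamic/value"
    }
  }
}
```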
As the first step, we need to read the current AppSettings of the App Service. We can achieve this with a data source.
```hcl
# additional-app-settings/refresh.tf

data "azurerm_linux_web_app" "appService" {
  resource_group_name = local.resourceGroupName
  name                = local.appServiceName
}
```
Then we need to find out whether the current AppSettings are in the desired state (whether all the necessary AppSettings are present and their values are what we expect them to be).
```hcl
# additional-app-settings/refresh.tf

locals {
  currentAppSettings = data.azurerm_linux_web_app.appService.app_settings

  areAppSettingsInDesiredState = alltrue([
    for desiredKey, desiredValue in var.appSettings :
    contains(keys(local.currentAppSettings), desiredKey)
      ? local.currentAppSettings[desiredKey] == desiredValue
      : false # We can't use the logical operator '&&' here due to a bug in short-circuiting.
  ])
}
```
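As an illustration of how the check evaluates, here is a `terraform console` session with made-up inline values standing in for the current and desired settings:

```
# terraform console (illustration; the maps are made-up sample values)
> alltrue([
    for k, v in { CALLEE = "bbb" } :
    contains(keys({ CALLEE = "aaa" }), k) ? { CALLEE = "aaa" }[k] == v : false
  ])
false
```

The key `CALLEE` is present, but its current value differs from the desired one, so the comprehension yields `[false]` and `alltrue` reports drift. A missing key falls into the ternary's `false` branch without ever indexing the map with a non-existent key.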
Afterwards, we configure the reaction for when the AppSettings no longer appear to be in the desired state. We use the resource `value_replaced_when` for that.
```hcl
# additional-app-settings/refresh.tf

resource "value_replaced_when" "driftDetected" {
  condition = !local.areAppSettingsInDesiredState
}
```
This resource is special: it expects only one boolean value as `condition`, and it produces a new random value every time the `condition` is `false` during an invocation of `terraform apply`. Otherwise, it locks the previously produced value and does not change it. This fancy behavior plays well with the property `triggers_replace`, as the latter causes recreation of the resource every time anything inside it changes.
```hcl
# additional-app-settings/main.tf

resource "terraform_data" "appSettings" {
  triggers_replace = {
    ...
    driftDetectionTrigger = value_replaced_when.driftDetected.value
  }
  ...
}
```
With this, we made our custom resource detect and react to any drift: be it someone accidentally changing an AppSetting, or even maliciously removing it.
There is a minor inconvenience: although one `terraform apply` correctly detects and mitigates the drift, it needs to be executed one more time, just so that the resource `value_replaced_when.driftDetected` can settle its `condition`.
Now, there is just one feature missing. When Terraform detects drift during the refresh phase, it reports it in the log, so we can verify and explicitly approve it. We can achieve this with another fancy resource that prints custom warnings to the console logs.
```hcl
# additional-app-settings/main.tf

data "validation_warnings" "appSettingsAreNotInDesiredState" {
  dynamic "warning" {
    for_each = var.appSettings
    iterator = each
    content {
      condition = !contains(keys(local.currentAppSettings), each.key)
      summary   = "AppSetting ${each.key} is not present, so it will be added"
    }
  }

  dynamic "warning" {
    for_each = var.appSettings
    iterator = each
    content {
      condition = (
        contains(keys(local.currentAppSettings), each.key)
          ? local.currentAppSettings[each.key] != each.value
          : false
      )
      summary = "AppSetting ${each.key} does not have desired value, so it will be updated"
    }
  }
}
```
Although this approach emulates the refresh phase, it does not entirely follow the regular phases of Terraform. The implementation relies on an additional data source, meaning its read happens every time. The flag `-refresh=false` has no effect: the drift will be detected and mitigated regardless.
The approach described above can be used as a pattern. It bridges the gap between Terraform and other tooling that is not available via some custom provider. It can be used instead of implementing a custom provider, which allows the team to stay within its already adopted technology stack.
The approach works nicely in cases where the concepts of provisioning, deprovisioning, and drift mitigation apply. There are some caveats to know when using it, but in general a new case needs to be implemented with the pattern only once, and as long as there is no need to change it drastically, it will continue to work (it is even resilient to external impact, which is covered by the drift mitigation).
The complete executable code can be found in this repo: https://github.com/egorshulga/terraform-custom-resource.