Azure Policy Remediation Solution

I was recently presented with an issue where specific Azure Policies were failing and there was a desire to have their remediation automated. The root cause of the failures was a race condition: the policy initiative was executing before all of the resources it targets had been successfully created. So, how can we auto-remediate failed DeployIfNotExists (DINE) policies?

My solution involves the use of two Azure services: Logic Apps and Automation Accounts.

The Logic App is pretty simple…

  1. Run on a timer. Run every hour, or every day, or every 10 minutes, depending on how long it is acceptable to leave failed policy executions out there.
  2. Run a query against Azure Resource Graph to find the failed executions. I’ll get into this in more detail later.
  3. Send that list of failed executions to an Automation Account and wait for it to reply back saying it got the message.

The Automation Account Runbook is a little more complex, but not exactly rocket science…

  1. Parse out the list of failed policy executions and make sure you have a unique list by assignment ID. This got a bit sloppy as the JSON didn’t want to behave as I expected it to. In any case, it works.
  2. Log in to Azure.
  3. Run the appropriate PUT API call depending on whether the assignment is at the Management Group, Subscription, or Resource Group level.
  4. Send that response back to the Logic App letting it know it was able to do what it needed to do.
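
To make steps 1 and 3 a bit more concrete, here is a minimal sketch of the de-duplication and scope logic. It is written in Python for illustration and is not the code from runbook.ps1. The function names, the remediation name prefix, and the `2021-10-01` API version are my assumptions; the URL shape follows the Microsoft.PolicyInsights remediations REST API. Also note that an assignment of an initiative (policy set) would need a `policyDefinitionReferenceId` in the body as well, which this sketch leaves out.

```python
import re
import uuid

ARM = "https://management.azure.com"
API_VERSION = "2021-10-01"  # assumption: pick the remediations API version you have validated
MARKER = "/providers/microsoft.authorization/policyassignments/"


def unique_assignment_ids(rows):
    """Return assignment IDs from ARG rows, de-duplicated case-insensitively."""
    seen, unique = set(), []
    for row in rows:
        assignment_id = str(row.get("policyAssignmentId", "")).strip()
        key = assignment_id.lower()
        if assignment_id and key not in seen:
            seen.add(key)
            unique.append(assignment_id)
    return unique


def assignment_scope(assignment_id):
    """Split an assignment ID into (level, scope) based on where it was assigned."""
    idx = assignment_id.lower().find(MARKER)
    if idx <= 0:
        raise ValueError(f"Not a policy assignment ID: {assignment_id}")
    scope = assignment_id[:idx]
    lowered = scope.lower()
    if lowered.startswith("/providers/microsoft.management/managementgroups/"):
        return "managementGroup", scope
    if re.fullmatch(r"/subscriptions/[^/]+/resourcegroups/[^/]+", lowered):
        return "resourceGroup", scope
    if re.fullmatch(r"/subscriptions/[^/]+", lowered):
        return "subscription", scope
    raise ValueError(f"Unrecognized scope: {scope}")


def remediation_request(assignment_id, name=None):
    """Build the PUT URL and JSON body for a remediation at the assignment's scope."""
    _, scope = assignment_scope(assignment_id)
    name = name or f"auto-remediate-{uuid.uuid4().hex[:12]}"  # hypothetical naming scheme
    url = (f"{ARM}{scope}/providers/Microsoft.PolicyInsights/"
           f"remediations/{name}?api-version={API_VERSION}")
    body = {"properties": {"policyAssignmentId": assignment_id}}
    return url, body
```

Because the scope is taken straight from the assignment ID, one URL template covers all three levels. The level label is only there for logging or branching if you need it.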

So now that we have the general flow of this thing, let’s walk through creating this in your own environment. The only prerequisite is access to an Azure subscription where you can create resources and assign roles to System Assigned Managed Identities.

  1. Create a Resource Group to store this workload.
  2. Create a Logic App in your RG.
    • Choose your hosting plan option (I went with Consumption).

3. Press the “Create” button

4. Open the Logic App and go to Settings=>Identity. Turn System Assigned Managed Identity on and save.

5. Now grant this identity the Reader role at the top-level Management Group where it may need to look for failed policy assignments (it must have Reader in order to run the ARG query). You will also need to grant it Contributor on the Automation Account we have yet to create, or you can grant it Contributor at a scope higher than the Automation Account, such as the current Resource Group.

6. Now go to Development Tools=>Logic App Designer

7. Add a trigger. I went ahead and just made the trigger time-based. As stated earlier, this can be daily, hourly, every 10 minutes…whatever you like. The frequency of the execution may have some bearing on the Azure Resource Graph (ARG) query you need to write, depending on the needs of your specific implementation. I’m just going to set this to run every 10 minutes.

8. After clicking “Add a trigger” in the editor, search for “Recurrence” and you should see this trigger listed under Schedule. Click on it and edit the interval and frequency.

9. Click the “+” under the recurrence step you just added, press “add an action”, and search for “http”. Select the HTTP action and edit it using the following values.

  • URI – https://management.azure.com/providers/Microsoft.ResourceGraph/resources?api-version=2021-03-01
  • Method – POST
  • Headers – Content-Type = application/json
  • Body – Use the query below. (Note: you will need to edit this query to suit your needs. Maybe you want it to look back only at failures that happened in the last 10 or 15 minutes, or over the last 24 hours. Maybe you want it to look for only specific policy definition IDs. Edit the query to get just the results that fit your use case. To test and edit ARG queries, search for “Resource Graph Explorer” in the search bar at the top of the Azure portal, then paste the query text into the query editor and play around with it.)

{
"query": "policyresources | where type == 'microsoft.policyinsights/policystates' and properties.complianceState == 'NonCompliant' and properties.policyDefinitionAction == 'deployifnotexists' | project policyAssignmentId = properties.policyAssignmentId, policyDefinitionId = properties.policyDefinitionId, complianceState = properties.complianceState"
}

  • Choose “Authentication” in Advanced parameters
    • Authentication Type = Managed Identity
    • Managed Identity = System-assigned managed identity
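
If you want to experiment with narrower versions of the query before pasting one into the HTTP action, a small helper can build the request body for you. This is only a sketch: the function name `build_arg_body` is hypothetical, and the time filter assumes the policy state records expose `properties.timestamp`, which you should confirm in Resource Graph Explorer for your tenant.

```python
import json
import re

BASE_QUERY = (
    "policyresources"
    " | where type == 'microsoft.policyinsights/policystates'"
    " and properties.complianceState == 'NonCompliant'"
    " and properties.policyDefinitionAction == 'deployifnotexists'"
)
PROJECTION = (
    " | project policyAssignmentId = properties.policyAssignmentId,"
    " policyDefinitionId = properties.policyDefinitionId,"
    " complianceState = properties.complianceState"
)


def build_arg_body(lookback=None, policy_definition_ids=()):
    """Return the JSON body for the Resource Graph POST, with optional filters.

    lookback: a KQL timespan literal such as "15m", "24h" or "1d".
    policy_definition_ids: restrict results to these policy definition IDs.
    """
    query = BASE_QUERY
    if lookback is not None:
        if not re.fullmatch(r"\d+[dhms]", lookback):
            raise ValueError(f"Unsupported lookback: {lookback}")
        # Assumption: properties.timestamp holds the evaluation time of the state record.
        query += f" | where todatetime(properties.timestamp) > ago({lookback})"
    if policy_definition_ids:
        for policy_id in policy_definition_ids:
            if "'" in policy_id:
                raise ValueError("Policy definition IDs must not contain quotes")
        quoted = ", ".join(f"'{policy_id}'" for policy_id in policy_definition_ids)
        # in~ is the case-insensitive form of the KQL "in" operator.
        query += f" | where tostring(properties.policyDefinitionId) in~ ({quoted})"
    return json.dumps({"query": query + PROJECTION})


if __name__ == "__main__":
    print(build_arg_body(lookback="15m"))
```

Building the body with `json.dumps` also guarantees plain straight quotes, which matters because curly quotes picked up from a web page will make the request fail.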

10. Click out of the configuration for the HTTP action and click the “+”, add an action. Search for “http” again but this time choose the “HTTP Webhook”.

11. Before starting to configure this step, open a new browser tab and create an Automation Account in the same resource group. On the second tab of the creation process, ensure “System Assigned” is checked.

12. Once the Automation Account is created, go to the account and create a new runbook.

13. After pressing the create button it should bring you to the editor page. Enter “#tempText” and save and publish. This must be done in order to perform the next step.

14. After creating the runbook, open it in the portal and in the left navigation pane go to Resources=>Webhooks. Click “+ Add Webhook”.

15. Click Create new webhook. Give it a name, set it to enabled, and set expiration date to some point in the future. Before pressing OK, copy the webhook URL and paste it in the Logic App tab, HTTP Webhook action’s Subscribe URI field.

16. Back on the webhook creation tab, press OK.

17. Click on “Parameters and run settings”. Press OK.

18. Press Create.

19. Now back on the HTTP Webhook action in the Logic App, set…

  • Subscribe Method = POST
  • Click in the Subscribe Body
    • You will see two icons pop up to the left. Click the top one that looks like lightning. In the next pop-up, select the HTTP body.
    • Now click the function (fx) icon. In the search that pops up, search for “callback” and click on listCallbackUrl().

20. Click out of the configuration and save the logic app.

21. Back on the runbook tab, click on Overview in the left navigation pane and choose “Edit in portal”.

22. Get the code for the runbook from runbook.ps1 in GitHub and paste it into the runbook.

23. Save and Publish.

24. Back at the Automation Account level (not the runbook), find Account Settings=>Identity in the left navigation pane. Assign the Automation Account’s system-assigned managed identity the ability to manage policy at the same level where you assigned the Reader role to the other SAMI (usually a higher Management Group). I just assigned it Contributor, but you could also assign just Resource Policy Contributor.

That’s it. You’re ready to go. It’s working….

I’m sure you want me to prove it though. I get it.

First things first. Let’s make sure you have some failed DeployIfNotExists (DINE) policies out there. Do you remember that ARG query we used earlier, and how I explained how to test it in Resource Graph Explorer? Go run the query and make sure you’re getting something. If you’re not, this whole workload has nothing to do. If the query returns nothing and you think it should, keep tweaking the query. If you’re not sure, go break something.

Loose instructions to get a broken DINE policy

  • Copy an existing DINE policy that’s easy to use, meaning it “controls” a resource that is easy to provision and doesn’t require too much setup.
  • Edit the copied (custom) policy so that the ARM template portion attempts an impossible update. For example, if somewhere in the ARM template it is trying to set a field to true or false, or on or off, update it to set it to banana. The policy will allow you to save it, but when it tries to run it will fail.
  • After you have made the required change to your custom policy, assign it and run “az policy state trigger-scan”. This forces an evaluation, which will take a while. Once it’s done, rerun the ARG query and you should have a failure.

Once you have something showing up in your ARG query, make sure your HTTP action step uses the query that is giving you results. Now run the Logic App. The end result should be a remediation task that is running. To debug, you can view the output of the runbook job.

That’s all I’ve got for you. Hope this helps.

Kippis,

Greg
