Azure Policy Service

APS occupies a bit of an odd space in the Azure noosphere.  Conceptually, it’s part access control, part desired state configuration, and part up-front configuration management.  I’m not going to get too deep into the nuts and bolts here – it’s possible to cause a lot of hard-to-fix trouble with any policy provider – but it’s worth a look at what this does and how it does it.  Used as intended and with some forethought, it can save a lot of time and trouble getting an environment clean and running as expected.

Like all policy engines, APS starts with a policy definition.  It’s essentially just a list of resources we want to watch, and a set of conditions we’d like to ensure those resources meet.  Microsoft has eight of these pre-defined, for common situations, and they make for good examples. There’s a pre-defined policy that applies to all resources and only allows creation in a given location; anyone who tries to create a VM in Western Europe, for example, will fail the policy check and get an ugly 403 error instead of the VM they wanted.  If you’ve ever worked on large, cross-region platforms, you’ve undoubtedly had someone spin up resources in the wrong location inadvertently, and if you’re lucky you caught it before it got too far down the road. APS policies are evaluated during resource creation or updates: when you make the request, it’s checked against existing policy and adjusted or denied entirely as specified in policy.
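As a rough illustration of what such a definition looks like, here is a sketch of a location-restricting rule, built as a Python dict and printed as JSON.  The if/then layout, the "field" and "in" keys, and the "deny" effect follow the documented policy rule format; the parameter name allowedLocations is an assumption made for the example, not a copy of the built-in definition.

```python
import json

# Hypothetical parameter name; the built-in policy may name it differently.
PARAM = "allowedLocations"


def location_policy_rule(param_name: str = PARAM) -> dict:
    """Build a policy rule that denies resources outside the allowed locations."""
    return {
        "if": {
            "not": {
                "field": "location",
                # Template expression resolved by Azure when the policy is assigned.
                "in": f"[parameters('{param_name}')]",
            }
        },
        "then": {"effect": "deny"},
    }


rule = location_policy_rule()
print(json.dumps(rule, indent=2))
```

The printed JSON is only the rule portion of a definition; a full definition would also carry a display name, a description and a parameters block.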

Of course, in order to really help in any given environment, your policy engine has to support custom policies.  Maybe it’s not a big worldwide issue that all servers everywhere have your exact configuration, but it’s an issue in your life and that’s what matters to you.  APS takes policy definitions through all the usual channels (portal, PowerShell, Azure CLI), but in the end they’re all JSON objects, with an unusually well-documented format.  As with all really powerful engines, the policy definitions aren’t the easiest things to work with. There are a lot of options, and a lot of precision is required to define exactly what you want.  Microsoft’s documentation advises “audit before deny”, so you can field test a policy without risking massive destruction.

Custom policy is tricky in execution but pretty simple in concept.  Define the resources to which the policy applies, the condition for which you want to check, and the action to take if that condition isn’t met.  APS provides six actions – ‘effects’ in APS parlance – that cover everything you’d expect from a DSC engine. They’re detailed here with accompanying example policy definitions.  It’s easy to conceptualize what you want and how it fits into APS, and the documentation is good.

What APS isn’t

It’s not a policy engine in the AD GPO sense.  It doesn’t get into granular user or resource rights assignment, and it wants to deal with resources as resources, rather than as objects that user accounts interact with.  You can, with some juggling, shoehorn it into handling permissions, but it’s a clunky, inefficient way of handling those cases. Those are better handled by RBAC (role-based access control) than by wrestling scripts into APS.

It’s not exactly DSC, either.  APS enforces on resource creation or update, but doesn’t go back and change resources that already exist.  It’ll stop someone from creating a SQL server with an old SQL version, but it won’t correct an old SQL server that was there before the policy existed.  If it’s in place before any resources are created, it can sort of simulate configuration management, but it’s the wrong tool for that job and relying on it can lead to headaches down the road.  If you’re looking for configuration management, consider an Azure Automation Account instead. That’s purpose-built for exactly that task and much better suited to it.

Tips for starting out

Make liberal use of “AuditIfNotExists”.  It doesn’t do any lasting damage, just logs discrepancies.  That makes it long-term useful for spotting issues that aren’t necessarily problems, but it also makes it ideal for dry-running your custom policies.  If your new policy runs and the audit shows that it’s hitting resources you didn’t expect, no harm done – just review the policy for what resources it applies to and correct whatever needs correcting.

Start small.  Maybe you want to make sure every resource gets tagged as it’s created, but you don’t want resource creation to fail outright if it’s not tagged.  Narrow it down to one resource type and audit it to make sure it’s behaving as you expect. Once you’re satisfied that it is, you can change the Audit to an Append and make sure it appends what you wanted.  It’s easier to start small and build up than to slather it all over everything and deal with confusion later, when people find they can’t create resources or that the resources they create have confusing or misleading tags they can’t explain.
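The audit-then-append progression might look something like the sketch below.  The resource type, the tag name costCenter and the default value are all hypothetical, picked only for illustration; the allOf/field/exists conditions and the append effect with a details list follow the documented rule format, but treat this as a starting point rather than a tested definition.

```python
import copy
import json

# Hypothetical resource type and tag name, chosen only for illustration.
RESOURCE_TYPE = "Microsoft.Compute/virtualMachines"
TAG_FIELD = "tags.costCenter"

# Step 1: audit only. Untagged VMs are logged, nothing is blocked or changed.
audit_rule = {
    "if": {
        "allOf": [
            {"field": "type", "equals": RESOURCE_TYPE},
            {"field": TAG_FIELD, "exists": "false"},
        ]
    },
    "then": {"effect": "audit"},
}


def to_append(rule: dict, default_value: str) -> dict:
    """Return a copy of the rule with its effect switched from audit to append."""
    new_rule = copy.deepcopy(rule)
    new_rule["then"] = {
        "effect": "append",
        "details": [{"field": TAG_FIELD, "value": default_value}],
    }
    return new_rule


# Step 2: same condition, but now the missing tag is added to the request.
append_rule = to_append(audit_rule, "unassigned")
print(json.dumps(append_rule, indent=2))
```

Keeping the condition identical between the two steps is the point: whatever the audit flagged is exactly what the append will touch.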

Be aware that APS takes the most restrictive of applicable policies and enforces it, without regard to policy order.  It evaluates all applicable policies on resource creation or update, and if it can find any excuse to deny the request it’ll take it.
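To make the "any deny wins" behaviour concrete, here is a toy model in Python.  It is not how Azure implements evaluation, and the Policy class, its fields and the policy names are all invented for the example; it only shows that the outcome doesn’t depend on the order in which policies are listed.

```python
from dataclasses import dataclass


# Toy model only: real evaluation happens inside Azure, not in client code.
@dataclass(frozen=True)
class Policy:
    name: str      # hypothetical label
    effect: str    # "audit", "append", or "deny"
    matches: bool  # whether the policy's condition matched the request


def evaluate(policies: list[Policy]) -> str:
    """Return 'denied' if any matching policy denies, otherwise 'allowed'."""
    matched = [p for p in policies if p.matches]
    if any(p.effect == "deny" for p in matched):
        return "denied"
    return "allowed"


policies = [
    Policy("audit-tags", "audit", True),
    Policy("allowed-locations", "deny", True),
    Policy("append-default-tag", "append", True),
]

print(evaluate(policies))                  # denied
print(evaluate(list(reversed(policies))))  # denied, order does not matter
```

A deny policy whose condition doesn’t match the request has no say in the result, so in this model a request only goes through when no matching policy denies it.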

Further reading:
Microsoft’s official APS documentation – especially the concepts and samples pages.

Azure Policy Repository – lots of policy definitions for all kinds of resources.