Adam Kinniburgh
VP Innovation, SquaredUp
If you’re reading this article, chances are you’re about to start moving into Azure and want to get your cost control right from the start, or you’re already on your journey and are running into problems getting a clear view of where you’re spending your money. Moving into Azure, or any other cloud, brings with it a new set of challenges that you just don’t face on-prem, and the big one is that pretty much everything incurs its own running cost. With every new deployment, you’re having to think about your budget and suddenly there’s a whole new audience to cater for when it comes to reporting.
I’ve been administering Azure estates of various sizes since it launched in 2010 and while the Azure of today is an entirely different platform to a decade ago, there are still some things that are hard to get right. In this article I’m going to share what I’ve learned and hopefully it’ll help you on your journey.
One of the best things about Azure is that it’s extremely approachable. You don’t need to be a seasoned Systems Engineer to make sense of it, nor do you need to learn a new language to figure out what you need. In a lot of cases, someone has already done what you’re trying to do and has written a helpful article to guide you through it.
The ease of spinning up new resources all over the place is often the underlying cause of tricky cost control, so using all the tools at your disposal to sensibly partition those resources is key.
As a very high-level comment, most organisations break down their budget based on department / function. These are typically the “big buckets” that costs are allocated to over the course of a year, and Azure has an equivalent “big bucket” concept too… a Subscription.
Subscriptions are the top-level container in Azure and serve primarily as a billing object. If you start by giving each department its own subscription, you’re already making a good move towards your Azure invoices being easy to attribute to a specific budget area.
Subscriptions not only serve as the top-level billing object, they’re also at the top in terms of permissions in Azure. You’ll see the Identity & Access Management (IAM) blade against everything, but a great starting point is to apply some Subscription-level permissions to those who need oversight of the cost. The “Billing Reader” role is ideal for those in your finance team, and the “Reader” role gives resource-level visibility without the ability to make changes. The latter is ideal for technical staff who don’t work in Azure day to day but still need to keep an eye on what’s there (think project managers and IT managers). Applying these rights at Subscription level makes it a one-time operation, because they automatically cascade down your hierarchy. Better still, apply those rights to dedicated groups in Azure Active Directory rather than to individuals, and manage your permissions through those groups from then on.
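To make that cascading concrete, here is a minimal Python sketch of how an assignment made once at Subscription scope reaches everything beneath it. It assumes scopes can be compared as Azure resource ID paths (`/subscriptions/<id>/resourceGroups/<name>/...`), compared case-insensitively; the subscription ID and group names are hypothetical, and real Azure RBAC evaluation does far more than this prefix check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleAssignment:
    principal: str  # hypothetical Azure AD group name
    role: str       # built-in role name, e.g. "Reader" or "Billing Reader"
    scope: str      # resource ID the role was assigned at


def _segments(resource_id: str) -> list[str]:
    # Split a resource ID into path segments, compared case-insensitively
    return [part.lower() for part in resource_id.split("/") if part]


def inherits(scope: str, resource_id: str) -> bool:
    """True if an assignment made at `scope` also applies to `resource_id`."""
    scope_parts = _segments(scope)
    return _segments(resource_id)[: len(scope_parts)] == scope_parts


def effective_roles(principal: str, resource_id: str,
                    assignments: list[RoleAssignment]) -> set[str]:
    """Roles a principal holds on a resource via any assignment at or above it."""
    return {
        a.role
        for a in assignments
        if a.principal == principal and inherits(a.scope, resource_id)
    }


# Hypothetical subscription: two assignments, each made once at the top
SUB = "/subscriptions/00000000-0000-0000-0000-000000000001"
assignments = [
    RoleAssignment("finance-team", "Billing Reader", SUB),
    RoleAssignment("it-managers", "Reader", SUB),
]
vm = SUB + "/resourceGroups/online-store/providers/Microsoft.Compute/virtualMachines/web01"

print(effective_roles("it-managers", vm, assignments))  # {'Reader'}
```

Because both assignments target groups at Subscription scope, every new Resource Group and resource is covered without any further permission changes.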
As I said before, breaking down a budget by department is a very high-level statement and, in most cases, it gets way more granular than that. Resource Groups are the next level down in the hierarchy and just like Subscriptions, can be used to report against for costs or have permissions applied. At this level, you’re probably starting to think about the teams or services within a department, so as an example, a Subscription for your sales department might contain Resource Groups that split out various services they run like your online store, order fulfilment systems, your CRM platform etc. By using Resource Groups to split out individual applications, you’re adding a level of granularity to your reporting as well as creating a really clean permissions structure. If the people responsible for creating new resources in Azure can only do so in the right locations, you can avoid sprawl and prevent things from getting “lost”.
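Because the Subscription and Resource Group are both encoded in every resource ID, rolling costs up to either level is straightforward. The sketch below assumes a simplified, hypothetical list of `(resource_id, cost)` line items; the subscription name, resource names and figures are invented for illustration.

```python
from collections import defaultdict


def parse_scope(resource_id: str) -> tuple[str, str]:
    """Return (subscription_id, resource_group) from an Azure resource ID."""
    parts = [p for p in resource_id.split("/") if p]
    lowered = [p.lower() for p in parts]
    subscription = parts[lowered.index("subscriptions") + 1]
    # Resource group names are case-insensitive, so normalise them for grouping
    resource_group = parts[lowered.index("resourcegroups") + 1].lower()
    return subscription, resource_group


def cost_by_resource_group(line_items):
    """Sum costs per (subscription, resource group)."""
    totals = defaultdict(float)
    for resource_id, cost in line_items:
        totals[parse_scope(resource_id)] += cost
    return dict(totals)


# Invented line items for a hypothetical sales subscription
SALES = "/subscriptions/sales-sub"
line_items = [
    (SALES + "/resourceGroups/online-store/providers/Microsoft.Web/sites/shop", 120.0),
    (SALES + "/resourceGroups/Online-Store/providers/Microsoft.Sql/servers/shopdb", 30.5),
    (SALES + "/resourceGroups/crm/providers/Microsoft.Compute/virtualMachines/crm01", 80.25),
]

for (subscription, group), total in sorted(cost_by_resource_group(line_items).items()):
    print(subscription, group, total)
```

Grouping on the tuple rather than just the Resource Group name keeps identically named groups in different Subscriptions apart.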
Despite your best efforts to create a really neat logical structure in Azure that perfectly aligns to your company’s org chart and budget, someone is always going to come along and ask a question like “how much do our customer-facing systems cost?”. In cases like these, where you’re probably having to look across multiple Resource Groups or more likely, multiple Subscriptions, you need a good mechanism to relate these otherwise siloed resources without having to break out the calculator. Introducing… Tags.
Tags in Azure are an extremely powerful feature that lets you define a bespoke taxonomy that’s entirely separate from the structure of your Subscriptions and Resource Groups. Tags are simple key:value pairs that you can use for all sorts of neat things. In this case, applying a Tag along the lines of category:customerfacing to all of the appropriate resources will let you report on the cost of every resource with that Tag, regardless of where it’s deployed or which Subscription it sits in. You could also tag resources with their owner, the person who deployed them, and so on. Tags give you the ability to break out of your hierarchy when you need to.
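Here is a minimal sketch of answering “how much do our customer-facing systems cost?” from Tags rather than from the hierarchy. It assumes a hypothetical list of resources, each with a cost and a tag dictionary, spread across two invented Subscriptions. Tag names are matched case-insensitively (Azure treats tag names that way, while tag values are case-sensitive), and anything missing the Tag lands in an `(untagged)` bucket.

```python
from collections import defaultdict

UNTAGGED = "(untagged)"


def tag_value(tags: dict, name: str):
    """Look up a tag by name, ignoring case in the name (not the value)."""
    for key, value in tags.items():
        if key.lower() == name.lower():
            return value
    return None


def cost_by_tag(resources, name: str) -> dict:
    """Total cost per value of tag `name`, regardless of subscription or resource group."""
    totals = defaultdict(float)
    for resource in resources:
        value = tag_value(resource.get("tags", {}), name)
        totals[value if value is not None else UNTAGGED] += resource["cost"]
    return dict(totals)


# Invented resources in two different subscriptions
resources = [
    {"id": "/subscriptions/sales-sub/resourceGroups/online-store/providers/Microsoft.Web/sites/shop",
     "cost": 120.0, "tags": {"category": "customerfacing"}},
    {"id": "/subscriptions/support-sub/resourceGroups/helpdesk/providers/Microsoft.Web/sites/portal",
     "cost": 45.5, "tags": {"Category": "customerfacing", "owner": "support"}},
    {"id": "/subscriptions/sales-sub/resourceGroups/crm/providers/Microsoft.Compute/virtualMachines/crm01",
     "cost": 80.25, "tags": {"category": "internal"}},
    {"id": "/subscriptions/support-sub/resourceGroups/scratch/providers/Microsoft.Storage/storageAccounts/tmp01",
     "cost": 9.75},
]

print(cost_by_tag(resources, "category"))
# {'customerfacing': 165.5, 'internal': 80.25, '(untagged)': 9.75}
```

The untagged bucket is often the most revealing number in a report like this, since it shows how much spend nobody can yet attribute.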
Following the tips above should lead you down a path towards a very straightforward and easy to manage Azure estate, but you must enforce those rules. There are a few pretty simple ways to do this… IAM, Policy, and automation.
A well-designed IAM structure means you can give visibility to those who need it, but also keep people within their assigned areas when they do need higher privileges. You can avoid new resources appearing in the wrong subscription and as a result, stop those unexpected costs on your next invoice.
Azure Policy is also a great feature to employ right from the start, and as a first step, use it to enforce the use of Tags. If everything needs to have a Tag or a set of Tags to facilitate better reporting, make them mandatory. It’s really no burden on those administering Azure to add these Tags when they’re creating new resources, and it’ll save a few headaches in the future.
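As a rough sketch of what “make them mandatory” looks like, the Python below builds a policy rule in the same shape as Azure’s built-in “Require a tag on resources” definition (an `if` condition testing that a `tags[...]` field does not exist, with a `deny` effect), but with the tag name inlined rather than passed as a parameter. The `missing_tags` helper is a hypothetical local pre-check, for example in a deployment pipeline, and is not part of Azure Policy itself; the required tag list is illustrative.

```python
import json

# Illustrative taxonomy: every resource must carry these tags
REQUIRED_TAGS = ["category", "owner"]


def require_tag_rule(tag_name: str) -> dict:
    """Policy rule that denies creating or updating a resource missing `tag_name`."""
    return {
        "if": {"field": f"tags['{tag_name}']", "exists": "false"},
        "then": {"effect": "deny"},
    }


def missing_tags(resource_tags: dict, required=REQUIRED_TAGS) -> list:
    """Hypothetical pre-deployment check mirroring the rule above (names case-insensitive)."""
    present = {name.lower() for name in resource_tags}
    return [name for name in required if name.lower() not in present]


print(json.dumps(require_tag_rule("category"), indent=2))
print(missing_tags({"Category": "customerfacing"}))  # ['owner']
```

Assigning one such rule per required Tag (or grouping them in an initiative) keeps each check simple and makes it obvious which Tag caused a deployment to be denied.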
And finally, onto automation. If you do think that enforcing tags on everything will be too much to ask, at the very least I’d recommend tagging your resources and resource groups with the name of the person who created them to ensure there’s some future accountability. I won’t go into detail in this article but there are some great write-ups covering the use of Azure Logic Apps, Automation Accounts, and Runbooks to achieve this quickly. Also keep an eye out for an upcoming post from SquaredUp about how we did this internally.
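The core decision in that kind of automation is small: spot resources being created and stamp them with whoever created them. The sketch below covers only that logic. It assumes a simplified, hypothetical event shape with `resource_id`, `operation` and `caller` fields (loosely modelled on Activity Log entries), events in chronological order, and a hypothetical `CreatedBy` tag name; actually reading events and writing Tags would be the job of the Logic App or Runbook.

```python
CREATED_BY = "CreatedBy"  # hypothetical tag name


def tags_to_apply(events, current_tags):
    """Map resource ID -> tags to add, never overwriting an existing CreatedBy tag.

    events: chronological iterable of dicts with 'resource_id', 'operation', 'caller' (assumed shape)
    current_tags: dict of resource ID -> existing tag dict
    """
    updates = {}
    for event in events:
        resource_id = event["resource_id"]
        # Only write operations create or modify resources; skip deletes and the rest
        if not event["operation"].lower().endswith("/write"):
            continue
        existing = {name.lower() for name in current_tags.get(resource_id, {})}
        if CREATED_BY.lower() in existing or resource_id in updates:
            continue  # keep the first recorded creator
        updates[resource_id] = {CREATED_BY: event["caller"]}
    return updates


# Invented events for a hypothetical sandbox resource group
DEV = "/subscriptions/dev-sub/resourceGroups/sandbox/providers"
VM = DEV + "/Microsoft.Compute/virtualMachines/test01"
STORAGE = DEV + "/Microsoft.Storage/storageAccounts/logs01"
OLD = DEV + "/Microsoft.Compute/virtualMachines/old01"
events = [
    {"resource_id": VM, "operation": "Microsoft.Compute/virtualMachines/write", "caller": "alice@example.com"},
    {"resource_id": VM, "operation": "Microsoft.Compute/virtualMachines/write", "caller": "bob@example.com"},
    {"resource_id": STORAGE, "operation": "Microsoft.Storage/storageAccounts/write", "caller": "bob@example.com"},
    {"resource_id": OLD, "operation": "Microsoft.Compute/virtualMachines/delete", "caller": "carol@example.com"},
]
current_tags = {STORAGE: {"createdby": "dave@example.com"}}

print(tags_to_apply(events, current_tags))
```

Never overwriting an existing Tag means a later update by someone else doesn’t erase the original accountability trail.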
Along with it being very easy to create resources in Azure, it’s also incredibly easy to create resources that are way over-specified, are running when you just don’t need them, or aren’t actually doing anything at all. It definitely makes life easy when setting up a new VM to give it 128GB RAM and 48 CPU cores and leave it running 24/7, but resources are billed by the hour, and the more you use, the more it costs. It can be challenging to overcome the mindset you develop through years of managing relatively fixed-cost on-prem infrastructure, but a little forward planning is really all it takes to master your cloud costs.
Something that’s often overlooked when moving resources to Azure is where on Earth to put them, literally. Azure is split up into a few dozen geographical regions and people often default to the approach of “well I’m in the UK so I’ll build everything in the UK”. In some cases, this is perfectly valid, for example when all your service users are also in the UK and you want super-low latency. But in a lot of cases, you’re better off shopping around to take advantage of the larger regions, where costs can be lower by as much as 25-30%. Microsoft’s own Pricing Calculator is good, but I also like AzurePrice.net, which shows region-by-region comparisons.
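Once you have prices, the comparison itself is simple arithmetic. In this sketch the hourly prices are invented placeholders, not real Azure rates (look those up in the Pricing Calculator or on AzurePrice.net), and `acceptable` stands for whichever regions meet your latency needs. The 730 hours per month figure is the approximation commonly used for monthly cloud pricing.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly cloud pricing

# Invented hourly prices for the same VM size; replace with real quotes
hourly_price = {
    "uksouth": 0.200,
    "northeurope": 0.160,
    "westeurope": 0.170,
    "eastus": 0.150,
}


def cheapest_region(prices: dict, acceptable: set) -> str:
    """Cheapest region among those that meet your latency (or other) constraints."""
    return min((region for region in prices if region in acceptable), key=prices.get)


def saving(prices: dict, default: str, candidate: str) -> float:
    """Fractional saving of `candidate` versus `default`."""
    return 1 - prices[candidate] / prices[default]


acceptable = {"uksouth", "northeurope", "westeurope"}  # e.g. users mostly in Europe
best = cheapest_region(hourly_price, acceptable)
monthly_delta = (hourly_price["uksouth"] - hourly_price[best]) * HOURS_PER_MONTH
print(best, f"{saving(hourly_price, 'uksouth', best):.0%}", f"{monthly_delta:.2f} per month")
```

Filtering by `acceptable` first keeps the cheapest-overall region (here the invented `eastus` price) out of the running when latency rules it out.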
Understanding what you’re going to deploy in the future also helps as you can quickly get tied into a region when you’re deploying a lot of Infrastructure-as-a-Service (IaaS), but for most Platform-as-a-Service (PaaS) resources you can be really flexible. This leads me nicely into Tip 7.
There are two main approaches to Azure migrations… lift-and-shift or modernize. Lift-and-shift is really appealing as you’re typically using a tool like Azure Migrate or Azure Site Recovery to simply lift an on-prem VM and shift it into Azure, spin it up, then pat yourself on the back. The downside to this is that running VMs in Azure is akin to setting fire to stacks of cash. Modernizing your services is really the best approach, but it does require a bit more effort up front.
Taking the modernization approach, you lift the services you run on your on-prem VMs but leave the VMs behind. PaaS resources are drastically cheaper because you primarily pay for the top-level service while all the underlying server infrastructure is shared; the upfront effort comes in the form of redeveloping your apps to use PaaS. Another bonus is that you also remove the ongoing cost of maintaining all those VMs (e.g. patching operating systems and troubleshooting crashes), so it’s often worth that initial pain to overcome those hidden costs.
Pretty much everything in Azure is billed as an hourly cost multiplied by the number of hours of runtime. If you’ve picked the right region and the right type of resource, the next consideration is the size, which ultimately determines the hourly cost. As a simple example, an A-series VM with 0.75GB RAM is, understandably, considerably cheaper than an M-series VM with 3,800GB RAM. The best approach is to make good use of your existing monitoring data to identify what your services actually need and provision only what’s necessary. Azure makes it very easy to pick exactly what you need with stunning granularity: whether you’re choosing between memory-optimised and compute-optimised VMs or between shared SQL instances and dedicated servers, you can always find a size that provides just enough to do the job. Start there and size up if needed.
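One way to “start there” is to size from observed usage rather than guesswork. This sketch takes CPU and memory samples from existing monitoring, applies a simple nearest-rank 95th percentile plus a headroom margin, and picks the cheapest size that covers both. The size names, specs and prices are invented, and the 95th percentile and 20% headroom are assumptions to tune, not Azure recommendations.

```python
import math

# Invented catalogue: (name, vCPUs, memory GB, hourly price)
SIZES = [
    ("size-2c-8g", 2, 8, 0.10),
    ("size-4c-16g", 4, 16, 0.19),
    ("size-4c-32g", 4, 32, 0.25),
    ("size-8c-64g", 8, 64, 0.50),
]


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def right_size(cpu_pct_samples, mem_gb_samples, current_vcpus, headroom=0.2, pct=95):
    """Cheapest size covering the pct-th percentile of usage plus headroom."""
    needed_vcpus = percentile(cpu_pct_samples, pct) / 100 * current_vcpus * (1 + headroom)
    needed_mem = percentile(mem_gb_samples, pct) * (1 + headroom)
    fits = [s for s in SIZES if s[1] >= needed_vcpus and s[2] >= needed_mem]
    if not fits:
        raise ValueError("no size in the catalogue is large enough")
    return min(fits, key=lambda s: s[3])[0]


# Hypothetical samples from a VM currently running as size-8c-64g
cpu = [10, 12, 15, 20, 18, 22, 25, 14, 16, 19]   # CPU %
mem = [6, 7, 8, 9, 10, 8, 7, 6, 9, 11]           # memory used, GB
print(right_size(cpu, mem, current_vcpus=8))  # size-4c-16g
```

Using a high percentile rather than the average avoids sizing for a typical hour and then falling over at peak; the headroom is what lets you “size up if needed” less often.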
The second half of that cost calculation is the number of hours of runtime. Power scheduling is very useful if you are running VMs, as turning them off when they aren’t needed will save some money. Azure can do this for you if a fixed schedule is suitable, for example VMs that only need to run during office hours, or you can go for any number of really clever automated solutions using tools like PowerApps and Flow in conjunction with Tags to define custom schedules. At SquaredUp we took this a step further and integrated our Flows with Slack so that our developers can set their own schedules and even use on-demand controls like /startmyvm without needing access to the Azure Portal. There are 168 hours in a week but only 40 of them are office hours, so turning a VM off during evenings and weekends cuts its compute cost by roughly three quarters (128 of 168 hours, about 76%). Make sure the VM is deallocated rather than just shut down from inside the operating system, though, and bear in mind that its disks are still billed while it’s off.
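The office-hours arithmetic is easy to check. This sketch assumes a hypothetical Tag value of the form `"09-17"` holding a weekday start and stop hour (the kind of Tag-driven schedule described above), decides whether a VM should be running at a given time, and counts the running hours in one week. The resulting saving applies to compute charges only.

```python
from datetime import datetime, timedelta


def parse_schedule(tag_value: str) -> tuple[int, int]:
    """Parse a hypothetical schedule tag such as "09-17" into (start_hour, stop_hour)."""
    start, stop = (int(part) for part in tag_value.split("-"))
    return start, stop


def should_run(when: datetime, start_hour: int, stop_hour: int) -> bool:
    """Run Monday to Friday between start_hour (inclusive) and stop_hour (exclusive)."""
    return when.weekday() < 5 and start_hour <= when.hour < stop_hour


def running_hours_per_week(start_hour: int, stop_hour: int) -> int:
    monday = datetime(2024, 1, 1)  # 1 January 2024 was a Monday
    return sum(
        should_run(monday + timedelta(hours=h), start_hour, stop_hour)
        for h in range(7 * 24)
    )


start, stop = parse_schedule("09-17")
hours = running_hours_per_week(start, stop)
print(hours, f"{1 - hours / 168:.1%} saving")  # 40 76.2% saving
```

The same `should_run` check is what a scheduled Runbook or Flow would evaluate each hour before deciding whether to start or deallocate the VM.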
Once you’ve got a really clear understanding of what you’re moving to Azure, I’d highly recommend buying Reservations. Reservations are effectively a long-term commitment to use a certain amount of resource over a period of time. You can still pay monthly, and in return for that commitment Microsoft are very generous: a typical three-year VM reservation can save you as much as 58% over pay-as-you-go.
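A reservation is charged whether or not the resource actually runs, so it only pays off above a certain utilisation. The sketch below works that out using the 58% figure above as the discount and an invented pay-as-you-go rate of 1.00 per hour. It suggests why Reservations suit steady, always-on workloads: a VM that only runs during office hours (about 24% of the week) comes out cheaper on pay-as-you-go than on a 58%-discounted reservation.

```python
HOURS_PER_YEAR = 365 * 24


def term_costs(payg_hourly: float, discount: float, utilisation: float, years: int = 3):
    """Return (pay_as_you_go, reserved) cost over the term.

    utilisation: fraction of hours the resource actually runs (0..1).
    The reservation is billed for every hour of the term regardless of use.
    """
    hours = years * HOURS_PER_YEAR
    pay_as_you_go = payg_hourly * hours * utilisation
    reserved = payg_hourly * (1 - discount) * hours
    return pay_as_you_go, reserved


def break_even_utilisation(discount: float) -> float:
    """Utilisation above which the reservation becomes the cheaper option."""
    return 1 - discount


for label, utilisation in [("always on", 1.0), ("office hours", 40 / 168)]:
    payg, reserved = term_costs(1.00, 0.58, utilisation)
    print(f"{label}: pay-as-you-go {payg:,.0f} vs reserved {reserved:,.0f}")

print(f"break-even utilisation: {break_even_utilisation(0.58):.0%}")  # 42%
```

In other words, decide on power scheduling and Reservations together: scheduling shrinks the hours, while Reservations shrink the rate, and combining the two only makes sense for resources that run most of the time.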
I’ll also throw Enterprise Agreements (EA) into the mix here, although they don’t represent the value they used to. An EA is a contract made through a Microsoft Partner that requires a level of monetary commitment up front. In the past, pre-paying meant you got serious discounts over pay-as-you-go, along with some nice perks too. Nowadays, an EA is really only appealing to your finance team, as it replaces the per-subscription invoices with just one. It also has its own billing portal and a Power BI pack. If you’re getting to 10+ subscriptions, consider an EA and your finance team will be happy.
Bring-your-own-license (BYOL) is also worth mentioning here. By default, the monthly cost of a Windows Server VM includes the software license cost. If you’re a Visual Studio subscriber, or have a Volume License Agreement that covers Windows Server, you can cut out that built-in cost by bringing your own.
There are two broad themes covered by this article…
One of the biggest challenges in “getting Azure right” is that you’re moving from a fairly controlled world of upfront spending where your IT teams own everything, to a pay-as-you-go world where a much broader audience can play. Not everyone in this new world naturally thinks about the cost of running their resources and Microsoft actually make it hard to see them in some cases too.
Engaging with everyone who has any stake in Azure, no matter how small, is the real big (and final) tip from me. If you let people see that their work costs money, and maybe it costs a bit more than what another team is doing, you might just inspire them to take ownership and keep things on track. Managing the cost of IT has always been a team game, it’s just a much bigger team when you start using Azure and not everyone knows the rules yet.