About 18 months ago, I took over a product that ran entirely on Azure. Its costs had ballooned from $2000/month at inception to almost $5000/month after about 2.5 years on Azure.
In this article, I’ll walk through how I dug into each component of the solution to find where the costs were being incurred and how we could optimise it for better performance and lower costs.
As this work was spread over the past 12-18 months, I don’t have many charts showing the actual costs because I didn’t take screenshots at the time. Now that it’s finished, though, I thought it would be worth documenting the process in case it helps anyone else.
The product used Azure App Service to host an ASP.NET MVC web app and an older ASP.NET WCF Web API, along with Azure SQL, Blob storage, CosmosDb, Key Vault, and various other Azure services.
Cost management and Billing
Azure provides a LOT of billing and cost management information. If you take the time to look at the billing charts and drill down into each service, you can learn a great deal about where your costs are coming from. Sometimes a very small amount of code can make a significant difference to your costs.
One example we’ll look at below involves a developer spending less than an hour writing a small in-memory cache that cut the cost of one of our storage accounts from $17/day to under 1c/day.
My first focus was performance! Traffic to the website had increased over the past few years, and performance was not ideal. Our Premium App Service Plan ran a default of 2 instances with CPU-based autoscaling configured, and it scaled out a lot: most days we would see it running 3 or 4 instances.
So the very first task was to look through Application Insights to find the slowest and most resource-intensive API calls. From there, we substantially rewrote approximately 30-40 of those API calls to make them as performant as possible.
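Inside Application Insights you would normally do this ranking with a Kusto (KQL) query over the `requests` table. As a language-neutral sketch, the snippet below assumes you have exported request telemetry as hypothetical `(operation name, duration in ms)` pairs and ranks operations by total time consumed (call count × mean duration), which is one reasonable reading of "most resource-intensive". The endpoint names are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class EndpointStats:
    name: str
    count: int
    total_ms: float

    @property
    def mean_ms(self) -> float:
        return self.total_ms / self.count


def rank_endpoints(requests: Iterable[Tuple[str, float]], top: int = 40) -> List[EndpointStats]:
    """Rank operations by total time spent (call count x mean duration).

    `requests` is a hypothetical export of (operation name, duration in ms)
    pairs. Ranking by total time means a moderately slow call that runs
    very often can outrank a slow call that runs rarely.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for name, duration_ms in requests:
        counts[name] += 1
        totals[name] += duration_ms
    stats = [EndpointStats(name, counts[name], totals[name]) for name in counts]
    stats.sort(key=lambda s: s.total_ms, reverse=True)
    return stats[:top]


if __name__ == "__main__":
    sample = [
        ("GET /api/products", 250.0),
        ("GET /api/products", 260.0),
        ("GET /api/products", 240.0),
        ("GET /api/products", 250.0),
        ("POST /api/checkout", 900.0),
        ("GET /api/categories", 40.0),
    ]
    for s in rank_endpoints(sample, top=3):
        print(f"{s.name}: {s.count} calls, mean {s.mean_ms:.0f} ms, total {s.total_ms:.0f} ms")
```

In this made-up sample, the frequent products call (4 × ~250 ms) ranks above the single 900 ms checkout call.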
There were several areas where we could improve the logic of these calls, but one of the main improvements came from our database access code. In an attempt to “re-use code”, many calls invoked existing sub-functions that each went off to look up data in the database.
These sub-functions would often return a lot more data than was needed for the specific API call we were looking at.
I prefer optimal speed and fewer database calls over code reuse, so we rewrote a lot of this code to load the required data (especially lookup data) up front. We also focused on returning only the minimal amount of data required for the task at hand.
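To make the "load lookup data up front" idea concrete, here is a minimal sketch contrasting a per-item lookup helper (the N+1 pattern) with a single batched lookup. `FakeDb` and its methods are hypothetical stand-ins that only count round trips; the SQL in the comments is illustrative.

```python
from typing import Dict, Iterable, List, Set, Tuple


class FakeDb:
    """Hypothetical stand-in for a database that counts round trips."""

    def __init__(self, categories: Dict[int, str]) -> None:
        self._categories = categories
        self.queries = 0

    def get_category(self, category_id: int) -> str:
        # One round trip per call, e.g. SELECT Name FROM Category WHERE Id = @id
        self.queries += 1
        return self._categories[category_id]

    def get_categories(self, category_ids: Set[int]) -> Dict[int, str]:
        # One round trip for the whole set, e.g. SELECT Id, Name FROM Category WHERE Id IN (...)
        self.queries += 1
        return {i: self._categories[i] for i in category_ids}


def label_products_n_plus_one(db: FakeDb, products: Iterable[dict]) -> List[Tuple[str, str]]:
    # Re-uses a per-item helper: one extra query for every product.
    return [(p["name"], db.get_category(p["category_id"])) for p in products]


def label_products_upfront(db: FakeDb, products: List[dict]) -> List[Tuple[str, str]]:
    # Loads all the lookup data this call needs in one query, then joins in memory.
    lookup = db.get_categories({p["category_id"] for p in products})
    return [(p["name"], lookup[p["category_id"]]) for p in products]


if __name__ == "__main__":
    products = [
        {"name": "Mug", "category_id": 1},
        {"name": "Teapot", "category_id": 1},
        {"name": "Poster", "category_id": 2},
    ]
    db = FakeDb({1: "Kitchen", 2: "Art"})
    label_products_n_plus_one(db, products)
    print("N+1 queries:", db.queries)
    db.queries = 0
    label_products_upfront(db, products)
    print("Up-front queries:", db.queries)
```

Both functions return the same result; the batched one issues a single query no matter how many products are on the page.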
Once we had updated these 30-40 slowest API calls, a couple of things happened. First, the system was, of course, faster, but we also saw a SIGNIFICANT decrease in how much the App Service scaled out each day. The site’s base load was lower, giving us more headroom, so it took a bigger spike in traffic before we needed to scale.
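The headroom effect can be illustrated with a toy model. This is not Azure’s autoscale engine: it only mimics the shape of a CPU rule (a threshold, an averaging window and a cool-down, all of which Azure autoscale rules have), with hypothetical values. It counts how often the rule would fire for the same traffic before and after an assumed ~30% drop in CPU per request.

```python
from typing import List


def count_scale_outs(cpu: List[float], threshold: float = 70.0, window: int = 2, cooldown: int = 3) -> int:
    """Count scale-out events under a simplified CPU rule.

    The rule fires when the mean of the last `window` samples exceeds
    `threshold` percent, then ignores the next `cooldown` samples.
    """
    fires = 0
    skip_until = -1
    for i in range(window - 1, len(cpu)):
        if i <= skip_until:
            continue  # still cooling down after the previous scale-out
        recent = cpu[i - window + 1 : i + 1]
        if sum(recent) / window > threshold:
            fires += 1
            skip_until = i + cooldown
    return fires


if __name__ == "__main__":
    # Hypothetical CPU % samples (e.g. one per 5 minutes) for a busy period.
    before = [55, 60, 75, 80, 62, 58, 78, 82, 60, 57]
    # Same traffic, assuming each request now costs ~30% less CPU.
    after = [c * 0.7 for c in before]
    print("scale-outs before:", count_scale_outs(before))
    print("scale-outs after:", count_scale_outs(after))
```

With the lower base load, the same spikes no longer push the averaged CPU over the threshold, so a much larger spike would be needed to trigger a scale-out.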
So less scaling means less cost. We saw an approximately 60-80% decrease in scaling, which saved us about $200/month.
Once we’d completed the performance optimisations above and the base load on our App Service Plan had come down, we also checked whether the App Service Plan we were on was still the most suitable. It turned out that a new Premium V3 range of plans had recently been introduced, and it suited us better than the Premium V2 plan we were on.
Moving to the new Premium V3 App Service Plan saved us another $150/month.
Choose the right data store
Choosing the correct data storage system is paramount, but you should also regularly revisit those choices to make sure they still fit the requirements of the system.
When this project was first moved to Azure, CosmosDb was the flavour of the month (or year). Microsoft was pushing it HARD as the perfect solution for a wide range of data storage needs.
We set up databases and collections to run most of our email management processes. The system sends anywhere from 500,000 to 2,500,000 emails a month (all legitimate, to subscribed mailing lists).
So we stored everything in Cosmos, from each customer’s anti-spam key/value configuration settings to the SendGrid webhook callback messages we use to monitor bounced emails, opens, and link clicks. We had CosmosDb set to the lowest 400 RU/s setting, so it should only have cost us about $45/month.
READ THE SMALL PRINT!
After several years, when I looked at the bill, the costs had blown out to over $500/month!!
After some digging, I found that the system had automatically scaled itself from 400 RU/s to 22,000 RU/s. That’s when I found the small print in the documentation saying that CosmosDb requires a minimum of 100 RU/s per GB of storage used, and we were storing over 200GB of what were essentially log files from the SendGrid callbacks…
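The arithmetic behind that blow-out is simple: the throughput floor is the larger of the 400 RU/s minimum and storage × the per-GB ratio. The sketch below uses the 100 RU/s-per-GB figure quoted above as a parameter; the function name is hypothetical, and the ratio may differ in Microsoft’s current documentation, so check it before relying on the numbers.

```python
def minimum_throughput_rus(storage_gb: float, rus_per_gb: float = 100.0, floor_rus: float = 400.0) -> float:
    """Minimum RU/s a container can be provisioned at under a per-GB storage rule.

    rus_per_gb defaults to the 100 RU/s-per-GB figure quoted in this article;
    check Microsoft's current documentation, as the ratio may differ today.
    """
    return max(floor_rus, storage_gb * rus_per_gb)


if __name__ == "__main__":
    for gb in (2, 50, 220):
        print(f"{gb} GB -> at least {minimum_throughput_rus(gb):,.0f} RU/s")
```

At a couple of GB the 400 RU/s floor applies; at around 220GB of logs the storage rule alone forces about 22,000 RU/s.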
That left us with the question of what to do with these logs: we needed regular, fast access to them but wanted to reduce the cost.
Azure Table Storage to the rescue!
Because the records had a flat structure, we could happily move them over to Table storage. Table storage lets you store a huge number of records, and if you choose the right partitioning scheme, it’s super fast to access and very cheap.
So I wrote a little console app that loaded all the logs one day at a time, reformatted them for Table storage, and wrote them into two different tables (with different partition keys, as we needed to query them in two different ways).
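The article doesn’t say which two query patterns were needed, so the sketch below assumes two plausible ones: "all events on a given day" and "all events for a given email address". Each event becomes two entities with different `PartitionKey`/`RowKey` choices. The event field names (`email`, `event`, `timestamp`, `sg_event_id`) are assumed from SendGrid’s event webhook format, and the `load_day`/`write_*` callables are hypothetical hooks you would back with your source store and the Azure Tables SDK.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, Tuple

# PartitionKey/RowKey values may not contain these characters or control characters.
_DISALLOWED_KEY_CHARS = set("/\\#?")


def safe_key(value: str) -> str:
    """Replace characters Table storage rejects in key properties with '_'."""
    return "".join(
        "_" if ch in _DISALLOWED_KEY_CHARS or ord(ch) < 0x20 or 0x7F <= ord(ch) <= 0x9F else ch
        for ch in value
    )


def to_table_entities(event: Dict) -> Tuple[Dict, Dict]:
    """Map one webhook event to two entities, one per (assumed) query pattern."""
    when = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
    email_key = safe_key(event["email"].lower())
    event_key = safe_key(event["sg_event_id"])
    payload = {"Email": event["email"], "EventType": event["event"], "EventTime": when.isoformat()}
    # Table 1: one partition per UTC day, for "what happened on date X?" queries.
    by_date = {"PartitionKey": when.strftime("%Y%m%d"), "RowKey": event_key, **payload}
    # Table 2: one partition per address; the fixed-width time prefix keeps rows in time order.
    by_email = {"PartitionKey": email_key, "RowKey": f"{when:%Y%m%d%H%M%S}_{event_key}", **payload}
    return by_date, by_email


def migrate(days: Iterable[str], load_day: Callable[[str], Iterable[Dict]],
            write_by_date: Callable[[Dict], None], write_by_email: Callable[[Dict], None]) -> int:
    """Copy events one day at a time into both tables; returns the number of events copied."""
    copied = 0
    for day in days:
        for event in load_day(day):
            by_date, by_email = to_table_entities(event)
            write_by_date(by_date)
            write_by_email(by_email)
            copied += 1
    return copied
```

Because Table storage returns entities in `PartitionKey`/`RowKey` order, a query against the second table for one address comes back in chronological order without extra sorting.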
It took a month to copy over the 180,000,000 records we had, but in the end, querying the data was actually faster than it had been in CosmosDb, and it cost only a few cents a day.
So that saved us approximately $400/month.
Flow-on effects
I mentioned above that we had spent a lot of time focused on fixing performance issues in the code itself.
As we’d optimised a large number of the heavier API calls, including optimising SQL queries and removing N+1 issues in code, the overall load on our SQL Elastic Pool dropped by about 40%.
This allowed us to reduce the allocated DTUs on the Elastic Pool from 300 to 200, saving us about $300/month.
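One way to reason about a resize like this is to take the observed peak usage, add a safety margin, and pick the smallest pool size that covers it. The sketch below is a hypothetical sizing rule, not the author’s stated method: the 20% headroom, the 240 eDTU peak in the example and the list of pool sizes are all assumptions (check which sizes your tier and region actually offer).

```python
from typing import Sequence

# Assumed eDTU sizes for a Standard elastic pool; verify against current Azure SQL docs.
STANDARD_POOL_EDTUS = (50, 100, 200, 300, 400, 800, 1200, 1600, 2000, 2500, 3000)


def smallest_pool_size(peak_edtu: float, headroom: float = 0.2,
                       sizes: Sequence[int] = STANDARD_POOL_EDTUS) -> int:
    """Smallest pool size that covers the observed peak plus a safety margin."""
    needed = peak_edtu * (1 + headroom)
    for size in sorted(sizes):
        if size >= needed:
            return size
    raise ValueError(f"No pool size covers {needed:.0f} eDTUs")


if __name__ == "__main__":
    peak_before = 240.0                    # hypothetical peak usage on the 300 eDTU pool
    peak_after = peak_before * (1 - 0.4)   # the ~40% drop described above
    print(smallest_pool_size(peak_before), "->", smallest_pool_size(peak_after))
```

Under these assumptions, the peak drops from about 288 eDTUs (with headroom) to about 173, which moves the smallest safe pool from 300 to 200.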
Moving on to our Blob storage costs, I could see that we were paying about $17/day for Blob storage, which seemed exceptionally high given we only store several thousand images for our eCommerce storefront.
Drilling down into the storage account usage and billing reports, I could see that we were calling the List Blobs API tens, or sometimes hundreds, of thousands of times a day. As each of these calls scans the container to return all of its blobs, it’s an expensive query to run that often.
It turned out we were calling this once for every product shown on every page of the site.*
* There’s quite a complicated reason for this, related to our cascading product type, subtype, and category image hierarchy.
So I asked the developer who looked after that part of the site to create a simple in-memory cache of the container’s image list and use that instead of calling the API each time. These files seldom change, and even caching the list for a few minutes made a HUGE difference.
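The production fix was in the ASP.NET site, so the Python below is only a sketch of the same idea: keep the result of the expensive listing call in memory and refresh it after a time-to-live. `TtlListCache` is a hypothetical class; `fetch` stands in for whatever lists the blob names (for example a wrapper around `ContainerClient.list_blobs()` in the `azure-storage-blob` package), and the 5-minute TTL is an assumption.

```python
import threading
import time
from typing import Callable, Optional, Sequence, Tuple


class TtlListCache:
    """Keeps the result of an expensive listing call in memory for ttl_seconds."""

    def __init__(self, fetch: Callable[[], Sequence[str]], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # e.g. a wrapper that lists blob names in the container
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the expiry logic can be tested
        self._lock = threading.Lock()
        self._names: Optional[Tuple[str, ...]] = None
        self._expires_at = 0.0

    def get(self) -> Tuple[str, ...]:
        # Holding the lock while refreshing means concurrent requests wait for a
        # single fetch instead of all hitting the listing API at once.
        with self._lock:
            now = self._clock()
            if self._names is None or now >= self._expires_at:
                self._names = tuple(self._fetch())
                self._expires_at = now + self._ttl
            return self._names


if __name__ == "__main__":
    calls = 0

    def list_image_names():
        global calls
        calls += 1
        return ["mugs/blue.jpg", "mugs/red.jpg"]

    cache = TtlListCache(list_image_names, ttl_seconds=300)
    for _ in range(10_000):  # e.g. one lookup per product rendered
        "mugs/blue.jpg" in cache.get()
    print("listing calls:", calls)
```

Every product on every page now reads from memory, and the storage account sees at most one listing call per TTL window per server instance.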
The daily cost of that storage account dropped immediately from $17 to less than 1 cent!
This change, which took a developer 2 hours to implement, saved us close to $500/month.
The final cost savings we found came from Azure Reserved Instances.
Reserved Instances are a way of committing to a certain number or level of resources for a period of time. Most Reserved Instances in Azure are for 1 or 3 years. You can usually choose to pay monthly along with all your other costs or pay for the entire reservation up front; for Azure reservations, the total cost is the same either way.
We knew we would be running the App Service Plan at the same minimum level for the foreseeable future, so we purchased a 1-year Reserved Instance covering 2 instances of the plan we were on. This saved us about 30%, or around $300/month, on the base App Service cost.
The App Service still scales when it needs to, and we pay the pay-as-you-go (retail) price for those additional instances while they’re running, but that scaling is now minimised.
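A small cost model shows why reserving only the baseline makes sense: the always-on instances get the reserved discount, while scale-out hours are still billed at the pay-as-you-go rate. The $500/month per-instance price and the 146 extra instance-hours are hypothetical, chosen so that 30% of two instances comes to roughly the $300/month mentioned above; 730 is the hours-per-month figure Azure’s pricing calculator uses.

```python
HOURS_PER_MONTH = 730  # monthly hour count used by Azure's pricing calculator


def monthly_cost(base_instances: int, extra_instance_hours: float,
                 payg_per_instance_month: float, reserved_discount: float = 0.0) -> float:
    """Baseline instances at the (optionally reserved) rate plus scale-out hours at pay-as-you-go."""
    base = base_instances * payg_per_instance_month * (1 - reserved_discount)
    burst = extra_instance_hours * payg_per_instance_month / HOURS_PER_MONTH
    return base + burst


if __name__ == "__main__":
    price = 500.0        # hypothetical pay-as-you-go price per instance per month
    extra_hours = 146.0  # hypothetical scale-out instance-hours in a month
    without_ri = monthly_cost(2, extra_hours, price)
    with_ri = monthly_cost(2, extra_hours, price, reserved_discount=0.30)
    print(f"without reservation: ${without_ri:,.0f}, with: ${with_ri:,.0f}")
```

In this example, the reservation saves about $300 on the baseline, while the burst portion costs the same either way, which is another reason the earlier reduction in scaling mattered.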
If you’ve hung in this long, then I really appreciate it!
With a bit of time and effort, you can really optimise your cloud costs so you’re not paying for things you don’t need.
Make sure you know your baseline costs, and check them every month to see whether any particular cost is increasing. I group the billing reports by resource type, and I have a good sense of what each resource should cost each month. If one seems to be going up, drill down and find out why.
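If you export monthly totals by resource type (for example from Cost Management), a few lines of code can do the first pass of that monthly check. This is a minimal sketch assuming the totals are already loaded into dictionaries; the 15% and $10 thresholds and the example figures are hypothetical.

```python
from typing import Dict, List, Tuple


def flag_increases(previous: Dict[str, float], current: Dict[str, float],
                   threshold: float = 0.15, min_delta: float = 10.0) -> List[Tuple[str, float, float]]:
    """Return (resource type, previous, current) for costs that rose by more than
    `threshold` (as a fraction) and by at least `min_delta` dollars, biggest rise first."""
    flagged = []
    for resource_type, cost in current.items():
        before = previous.get(resource_type, 0.0)
        delta = cost - before
        if delta >= min_delta and (before == 0 or delta / before > threshold):
            flagged.append((resource_type, before, cost))
    return sorted(flagged, key=lambda row: row[2] - row[1], reverse=True)


if __name__ == "__main__":
    last_month = {"App Service": 1000.0, "Storage": 15.0, "Azure Cosmos DB": 60.0}
    this_month = {"App Service": 1020.0, "Storage": 510.0, "Azure Cosmos DB": 65.0, "Key Vault": 2.0}
    for resource_type, before, after in flag_increases(last_month, this_month):
        print(f"{resource_type}: ${before:,.2f} -> ${after:,.2f}")
```

Requiring both a percentage and a dollar increase keeps small, noisy line items (like a new $2 Key Vault charge) from drowning out a jump like the storage one.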