Cloud repatriation: The conditional promises of cloud computing
A decade ago, major organizations across the globe were rushing to embrace cloud computing. Today, many are transitioning back to on-prem. So what’s behind the cloud repatriation movement? What are the benefits and limitations of cloud and on-prem? And why is shifting back to on-prem a good decision we won’t be making?
Drawn in by the promise of limitless scalability, cost savings, and faster time to market, companies and organizations across industries have rushed to embrace cloud computing over the past two decades.
But there was a fundamental misinterpretation of what cloud can and cannot do. Running everything on the cloud isn’t the best fit for every organization—and we’re seeing some organizations realize that:
- 42% of U.S. organizations are considering or have already moved at least half of their cloud-based workloads to on-premises infrastructure
- 94% of IT leaders have been involved in a cloud repatriation project in the last three years
But what’s driving companies back to on-prem? Should you be doing the same? And what other ways can you bring down or control cloud costs?
Queue-it works with some of the world’s biggest online businesses on their busiest days. This gives us a front-row seat to the benefits and challenges of cloud-hosted infrastructure. And while we’re unlikely to ever leave the cloud—because it’s a perfect fit for our needs—I think the cloud repatriation movement is an overdue acknowledgement that cloud computing has key limitations that, for some companies, outweigh its potential benefits.
The bold promises that drew companies to cloud computing depend on important conditions. But a lot of companies ignored the conditions and just saw the benefits. So let’s look at the three biggest conditional promises of cloud computing—and where so many companies went wrong.
The main reason for the cloud repatriation movement, like the main reason for most business decisions, is to save money. 53% of enterprises say they’re yet to see “substantial value” from their investment in cloud.
When cloud computing emerged, many organizations rushed to embrace it for its cost-saving benefits. But the cost savings of cloud are conditional.
Cloud computing can save you money if:
- You optimize your infrastructure specifically for the cloud
- You regularly audit cloud computing costs
- Your traffic is relatively predictable and you can stay within savings plans
Cloud computing is suited to some use cases, but not all. And even if it is right for your business, you need to optimize your use of it and carefully monitor your expenditure.
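To see why predictable traffic matters for savings plans, here is a minimal sketch of the arithmetic. The prices, the commitment size, and the two demand profiles are all made up for illustration, and the function name `total_cost` is hypothetical. Real cloud pricing has many more dimensions.

```python
def total_cost(hourly_demand, committed, on_demand_rate, committed_rate):
    """Return the cost of serving a demand profile.

    hourly_demand: number of instances needed in each hour
    committed: instances paid for every hour at committed_rate, used or not
    Anything above the commitment is billed at on_demand_rate.
    """
    cost = 0.0
    for needed in hourly_demand:
        cost += committed * committed_rate                   # paid even when idle
        cost += max(0, needed - committed) * on_demand_rate  # overflow
    return cost


# Two hypothetical days, each adding up to 240 instance-hours.
steady = [10] * 24
spiky = [2] * 22 + [98, 98]

ON_DEMAND = 1.0  # made-up price per instance-hour
COMMITTED = 0.5  # made-up discounted price per instance-hour

for name, profile in [("steady", steady), ("spiky", spiky)]:
    with_commit = total_cost(profile, 10, ON_DEMAND, COMMITTED)
    no_commit = total_cost(profile, 0, ON_DEMAND, COMMITTED)
    print(f"{name}: with commitment={with_commit:.0f}, on-demand only={no_commit:.0f}")
```

Under these assumptions, the commitment halves the bill for the steady profile but makes the spiky one more expensive than paying on-demand for everything, because most of the committed capacity sits idle while the peak is still billed at the full rate.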
Cloud computing bills are infamously difficult to understand. It’s often tough to tell exactly where your money is going, and even small changes in usage can drive huge swings in cost.
That’s why 42% of CIOs and CTOs say they struggle to manage resource usage, and about 32% of cloud budgets are wasted.
But the big misconception in the transition to cloud is that you can simply “lift and shift” on-prem services to the cloud and expect huge savings. You need real experts, a lot of developer time, and significant system optimization to benefit from the cost-saving potential of the cloud.
The second misunderstood promise of cloud is that you could simply click an “add more servers” button. Or better yet, that your servers could automatically scale up and down to perfectly meet demand.
While cloud lets you scale much more and much faster than on-prem, it also tends to create more complex systems. And the more complex your systems are, the more complex scaling them becomes—especially during sudden traffic surges.
That’s why it's more accurate to say: Cloud can improve scalability if you develop infrastructure that’s purpose-built for scale and successfully address tough-to-scale bottlenecks.
We work with hundreds of cloud-based businesses and organizations that struggle to handle sudden traffic peaks without traffic management tools like a virtual waiting room. Our customers tell us that autoscaling doesn’t react fast enough, they’re limited by third-party bottlenecks, or that scaling for sudden peaks is simply too expensive.
While cloud makes scaling significantly easier, true scalability doesn’t come from moving to cloud alone. It comes from testing, optimization, and changing or upgrading third-party service providers. It requires developer resources, time, and money.
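The "autoscaling doesn’t react fast enough" problem can be illustrated with a toy simulation. This is a simplified sketch, not a model of any real autoscaler: the function name, the traffic numbers, and the idea that new capacity arrives after a fixed lag are all assumptions, and scale-down is ignored.

```python
def unserved_requests(demand, start_capacity, lag_minutes):
    """Count requests that exceed capacity under a purely reactive scaler.

    demand: requests arriving in each minute
    start_capacity: requests per minute the system can serve at minute 0
    lag_minutes: minutes between ordering capacity and it coming online (>= 1)
    """
    capacity = start_capacity
    arrivals = {}   # minute -> capacity that comes online in that minute
    in_flight = 0   # capacity ordered but not yet online
    unserved = 0
    for minute, load in enumerate(demand):
        arrived = arrivals.pop(minute, 0)
        capacity += arrived
        in_flight -= arrived
        unserved += max(0, load - capacity)
        # React only after seeing the load: order whatever is still missing.
        shortfall = load - (capacity + in_flight)
        if shortfall > 0:
            due = minute + lag_minutes
            arrivals[due] = arrivals.get(due, 0) + shortfall
            in_flight += shortfall
    return unserved


# Hypothetical spike: traffic jumps tenfold from one minute to the next.
spike = [100, 100, 1000, 1000, 1000, 1000]

for lag in (1, 3):
    print(f"lag={lag} min: {unserved_requests(spike, 100, lag)} requests unserved")
```

Even in this idealized version, where the scaler orders exactly the right amount of capacity immediately, every extra minute of lag during the spike means another 900 requests with nowhere to go.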
Alongside cost savings and scalability, security and data privacy are often viewed as major benefits of cloud computing.
By switching to cloud, businesses assumed they’d get access to the security measures of major companies like Amazon and Microsoft, and could let those providers deal with data privacy compliance.
This is true, to some extent. But the reality is that security and privacy aren’t things you can just “offload” to a third party. Many companies have found that ensuring security and compliance in the cloud is just as complicated as it is on-prem, if not more so.
At Queue-it, for example, we’ve had to set up dedicated AWS data centers in multiple regions to ensure compliance with local regulations.
The complexity and geographical distribution of cloud services can make it challenging to have full visibility and control over system access, data processing, and data storage. And because cloud is so accessible, you can host a service with relatively little knowledge of or experience with things like security and privacy, which can backfire as you grow.
If you have an on-prem data center, by contrast, you know exactly where your data is being processed and stored. You know all the ways of accessing your data and can secure and protect it as you see fit. You can adjust your security and privacy measures to align with your local regulations, risk tolerance, and organizational needs. That’s why many public sector and financial institutions choose (or are required) to host certain sensitive data exclusively on-prem.
Again, the security and privacy benefits of cloud are not black and white. “Unexpected security issues” was the #1 reason companies decided to move cloud-based workloads on-prem. But at the same time, 93% of IT leaders agree that cloud technologies can help prevent cybersecurity incidents. Effective security and fully compliant data processing on the cloud require (you guessed it) time, resources, expertise, and money.
The choice of cloud, on-prem or hybrid is one that involves many factors—from scalability, to costs, to expertise and IT resources, to security and compliance.
As I’ve just laid out, cloud isn’t a cure-all. The benefits don’t come by default. They don’t come without work. And they don’t always apply to every company or workload.
When companies like Dropbox, Basecamp, and Ahrefs reconsidered how they hosted their services, they opted for either hybrid or completely on-prem.
But that’s not going to be the right choice for everyone. Cloud computing was and continues to be a huge enabler for many businesses. It gives them the opportunity to cut costs, boost scalability, reduce risk, and improve compliance.
Just like the move to cloud, the move away from cloud requires careful consideration to determine if it’s right for you.
For us at Queue-it, cloud was and continues to be the best way to host our SaaS solution.
Our virtual waiting room is built around and for extreme variation and peak traffic moments, which is only possible through cloud computing. We’ve operated fully on the cloud since day one. And that’s not going to change anytime soon.
Take one look at the daily traffic through our platform and you can clearly see why we’re a perfect match for cloud-based infrastructure.
The number of visitors passing through our waiting rooms can go from 106 million one day to 84 million the next, or from 51 million in April to 116 million in August.
We don’t have a “normal” day. Our job is to handle traffic spikes from hundreds of companies across the globe. And we need cloud to achieve that.
Queue-it’s platform is built to handle traffic spikes of over 50,000 visitors per minute with no human involvement. Our load tests simulate a million virtual users hitting our service per minute, without any errors or downtime.
But this level of scalability and reliability isn’t easy, feasible, or worthwhile for most companies. We’ve spent and continue to spend a lot of money and time to make it possible. Our AWS bills are among our biggest expenses. And our developers spend about 20% of their time working specifically on the scalability, reactivity, resilience, and reliability of our platform.
We’ve made a variety of unique and expensive choices that make our platform more reliable and scalable, but which wouldn’t make sense for a “normal” business. These include:
- Multiple availability zone architecture: In each region where Queue-it has data centers, it is deployed across at least three availability zones. This allows Queue-it to remain available even in the unlikely event that two availability zones fail at the same time.
- Cell-based architecture: Each Queue-it customer has at least one dedicated subdomain and load balancer entry point which handles traffic and distributes load. This ensures potential attacks on one customer can't impact other customers, and enables us to incrementally deploy new releases.
- Aggressive autoscaling: Our autoscaling is configured to be highly aggressive, which can increase costs, but allows us to scale extremely fast when traffic starts to rise.
- Regular load testing: We load test regularly with huge volumes of traffic to ensure tests are always up to date and meet the expected requirements.
- Redundancies: In every spot where there’s potential for failure or insufficient resources, we have redundancies in place.
- Decoupled architecture: Our admin platform is completely detached from our end-user-facing queue service. This is more expensive than running them together, but it limits the blast radius of any potential failures.
- Simplified architecture: We prioritize simplicity in our architectural approach, because the simpler our architecture is, the simpler it is to scale. All systems are built to minimize dependencies and single points of failure, which may make us slower moving than some companies, but lets us maintain extremely high availability.
- Minimal third-party usage: Aside from AWS, our service relies on very few downstream services to operate—to minimize the risk of third-party outages.
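The reasoning behind the multiple availability zone choice can be sketched with some simple arithmetic. This is an illustration only: the 1% per-zone failure probability is a made-up number, the function name is hypothetical, and the calculation assumes zones fail independently and that a single surviving zone can carry the full load. Real failures can be correlated.

```python
def regional_availability(zone_failure_prob, zones):
    """Probability that at least one zone is up, assuming independent failures."""
    return 1 - zone_failure_prob ** zones


P_FAIL = 0.01  # hypothetical chance that a given zone is down at any moment

for zones in (1, 2, 3):
    availability = regional_availability(P_FAIL, zones)
    print(f"{zones} zone(s): available {availability:.6f} of the time")
```

Under these assumptions, each added zone cuts the chance of a full regional outage by a factor of 100, at the cost of running and paying for the extra capacity.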
The cloud repatriation movement, to me, is an important balancing of the scales. As with any other big trend, companies and organizations across industries rushed to embrace cloud computing without pausing to ask if it was the right fit for them.
Now, almost 20 years after the commercialization of cloud computing, people are finally asking the right questions.
Cloud is like any other major tech development—AI, blockchain, the internet of things. It’s a huge enabler for many businesses. But for others, it’s just another fancy trend that creates more work and larger costs without any real value-add.
That’s why major companies like Dropbox, Basecamp, and Ahrefs have pivoted to a hybrid or completely on-prem approach—and have reportedly saved millions by doing so.
But we know that Queue-it couldn’t exist without the cloud. Without it, we couldn’t handle the wildly unpredictable traffic spikes and huge traffic fluctuations we see on a weekly basis.
We know cloud is right for us. We’re conscious of the costs and do our best to keep them down. But we have extreme standards for reliability and scalability, and cloud enables us to live up to those standards.