The Cost of Scaling for Peak Demand with Head of Engineering Martin Jensen

In this episode, Martin Jensen, Head of Engineering, breaks down the true cost of scaling for peak demand. He explains the limits of autoscaling, when pre-scaling makes sense, and how tools like virtual waiting rooms are used to handle sudden spikes in traffic. Martin also shares insights on common system bottlenecks, performance trade-offs, and practical strategies for staying in control during high-demand moments like ticket sales, product drops, and popular registrations.

Martin Nørskov Jensen is an experienced engineering leader and Head of Engineering at Queue-it. With 15+ years in software development and 5+ years in leadership, he builds agile, high-performing teams focused on collaboration, trust, and engineering excellence.

Episode transcript:

Jose

Hello and welcome to the Smooth Scaling Podcast, where we speak with industry experts to uncover how to design, build and run scalable and resilient systems to ultimately provide a great customer experience. 

I'm your host, Jose Quaresma, and today we have Martin Jensen with us. He's the Head of Engineering here at Queue-it, and we'll be talking about the costs of scaling for peak demand. We talk not only about scaling, autoscaling, and pre-scaling, but also about other tools and mechanisms that can be leveraged to complement them and work together to address peak demand. Of course, we are a little bit biased, so we did get a bit into virtual waiting rooms, how they can help in peak demand scenarios, and what else they bring to a system.

Jose

Hi, Martin. Welcome. It's great to have you here.

We're going to talk about costs of scaling and a few things around that area. But I would like to start by having a little introduction from you, a little bit of your background. How did you get to be a head of engineering at Queue-it?

 

Martin

Yeah, so I have about 20 years of experience in the IT industry, first as a software developer and later as a manager, engineering manager. I spent a good deal of time in the financial sector where I worked with a lot of different things, among them online trading, which also speaks into large-scale systems and high-demand systems. Later on, I worked for another Danish scale-up company, also with some very heavy load on the back end. And that sort of got me here to Queue-it, where I'm now head of engineering and enjoying the work a lot. 

 

Jose

We're happy to have you. 

So, starting to go a little bit into our topic today, we're covering the cost of scaling for peak demand. So, we could start with: how do you think about peak demand?

 

Martin

Yeah, so most websites will have a fairly even demand throughout the day in whatever region that site might have most of its customers. But sometimes some sites will have these peak demand events where suddenly you see a lot of traffic on the site, much more than you would normally see on the site. So that's what we define as peak demand.

 

Jose

So not so much the absolute value of the traffic, but more like what's the difference from your baseline traffic on the website to, oh, now we all of a sudden got way up here.

 

Martin

Yeah, so if you look at the graph, you can imagine it's like a reverse icicle, you know, sticking out of the graph. That's what we define as peak demand.

 

Jose

So, if it's something growing more slowly, almost organically, then we wouldn't count it as peak demand, right?

 

Martin

No, that would be normal growth.

 

Jose

Can you think of or share some examples of this peak demand?

 

Martin

Yeah, so I think one of the very classic examples would be, for example, ticketing sales for concerts. Usually these are announced to go live at, say, 10 in the morning in a region, and then people are going to jump into their browsers and start hitting the site maybe just before 10.

And that means you can actually see traffic increase from the base level to 10 or even 100 times that in a very short amount of time. And that's within a minute or, in some cases, within seconds. So that's definitely going to be felt as a very high peak demand.

In other cases, it might be commercial sites that have drops of specific products. You could have social media announcements that lead people to a specific commercial site. You can also have government or public institutions with signups. So, there's a lot of different scenarios where we see these things happening.

 

Jose

And would you say that they are all always predictable, or would you have some peaks in some cases that are also unpredictable?

 

Martin

So, in some cases, I would say some of them are predictable. They would usually not be predictable based on data, so for example, traffic data, because there's not necessarily a pattern there, but they would be predictable in that a company knows when it has sales starting, for example, for a ticket sale. So, in those cases, it can be predicted. 

In other cases, less so. So, for example, I think the example of a social media post suddenly, you know, attracting people to a specific website or web shop, that's something that can be difficult to predict, right?

 

Jose

Yeah, that's a good point. And so, if one is then trying to address peak demand, handle peak demand, what are for you some of the strategies to handle such events?

 

Martin

Yeah, so the most important recommendation of this is scaling. So, we recommend that whenever you have a website with a substantial amount of traffic, that you design it in a way so it can scale with the traffic. 

That can be done in multiple ways. You can do it vertically, which means buying bigger hardware, or you can do it horizontally, which means you will scale out. We recommend, and this is also the most modern approach, to do it horizontally. 

But that requires that you have the architecture for that. So, it's a little bit more of a complicated architecture, but it scales better. That would be the most important advice. It doesn't necessarily handle all situations, but it will definitely handle the slow growth, the steady growth on a site. So that would be our main recommendation.
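[Editor's note: the proportional scale-out logic Martin describes can be sketched as a small control-loop calculation. The target utilization, bounds, and function shape below are illustrative assumptions for the sake of the example, not Queue-it's implementation; real autoscalers such as the Kubernetes HPA add cooldowns and smoothing on top of this idea.]

```python
import math

# Minimal sketch of a horizontal autoscaling decision: scale the instance
# count proportionally to observed load versus a target utilization,
# clamped to configured minimum and maximum fleet sizes.
def desired_instances(current: int, cpu_pct: float, target_pct: float = 60.0,
                      min_n: int = 2, max_n: int = 50) -> int:
    """Return the instance count that brings utilization back toward target."""
    raw = math.ceil(current * cpu_pct / target_pct)
    return max(min_n, min(max_n, raw))

# Four instances at 90% CPU against a 60% target -> scale out to six.
print(desired_instances(4, 90.0))
```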

 

Jose

And you just said it doesn't necessarily handle all situations. So, I guess what I'm hearing is it will not solve everything, right? So, what are some of those situations where the autoscaling approach, whether it's vertical or horizontal, is not enough?

 

Martin

So horizontal scaling is based on cloud architectures. And in cloud architectures, you can scale out quickly. Unfortunately, not quickly enough to handle all situations.

So, for example, you can imagine that these ticket sales, when they go live, as we talked about before, you will see an increase in traffic within seconds. An extreme increase in traffic within seconds. And adding servers to a web farm, for example, is not done in seconds. That's done in minutes, right? So, scaling up takes time. There are also other pieces of the architecture, caches, for example, which need to be scaled up and prepared. And it's not something you can do instantaneously, which is what these kinds of events require.

 

Jose

And I guess there's also the other side because we talked about, are these peaks predictable or unpredictable? And I guess on the predictable side, there would also be an approach more towards pre-scaling, right? Is that something that you see working well as well? Are there any downsides?

 

Martin

So pre-scaling is something we can use. So, if you know that you have an event coming, for example, then of course you can prepare for it by pre-scaling. 

The problem there is that, first of all, it can be expensive because you are scaling up and you're not going to have a constant load. So that means you will be optimizing for your max load, which might be only for a short amount of time. But if you don't have that capacity at that time, then you might crash your site. So, you still need to do it in a way.

So that means it can be costly, basically. The other thing is that it's usually easy to scale up, for example, your web servers, which are sort of the front line of your whole architecture. But let's say you're in a ticket sale scenario. Then you will have many components within your architecture that need to take part in that. You might need people to log in. You will probably need people to pay for the tickets. So, there are going to be a lot of components, and you need to be able to scale all of those, and that's not always possible. They don't always have the same scalability features. Some of them might be third-party components; some of them might be based on older architectures.

So, there are many companies that don't really have architectures where all components scale equally.

 

Jose

What can you do in that case?

 

Martin

So, what we recommend is to, in that case, control the traffic by putting in a virtual waiting room. Basically, you need to understand what the capacity of your architecture is, not just on the front end, on the web service, but in the full flow of, for example, the ticketing situation. And then use the waiting room to control the flow of traffic onto your web servers and then aim for something that is maybe not at but just below your ideal capacity.

And that way, make sure that there's a constant flow and also that the users, while they're waiting to get in, are informed of where they are in the queue so that they don't panic, and they have a good buying experience also while waiting.
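[Editor's note: admitting users at a fixed rate just below capacity, as Martin describes, is commonly implemented as a token bucket. The sketch below is an illustrative assumption about one way to do it; the admission rate would come from your own load testing, and a production waiting room handles far more, such as fairness, queue position, and token signing.]

```python
import time

class AdmissionThrottle:
    """Token-bucket sketch: admit users at a fixed rate below capacity.

    `rate_per_sec` is an assumed figure derived from load testing.
    An injectable clock makes the behavior testable.
    """
    def __init__(self, rate_per_sec: float, burst: float = 1.0,
                 clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = burst          # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_admit(self) -> bool:
        # Refill tokens for the time elapsed, capped at the burst size.
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0       # spend one token to admit one user
            return True
        return False                 # bucket empty: user keeps waiting
```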

 

Jose

That's a good point. But we're talking about cost, right? There are definitely cases where you could say that the infrastructure behind the full user journey is fully scalable. And I guess you could say limitless because you're in the cloud. Well, there's a theoretical limit to the data center size, but they could just confidently scale all that they need.

How would you suggest people think about the balance between autoscaling and other mechanisms such as a virtual waiting room? Is that a cost analysis, saying, well, even if I can scale all that I need, fast enough, and pre-scale, should I do that, given it might get impossibly expensive? Is that how you see it?

 

Martin

Yeah, I mean, I think those two technologies complement each other. You need both of them if you have peak situations. So, you want to, of course, make sure that you can scale because scaling is also a way of keeping resources down because you're not always going to have the same load on your web service or on your system. 

So, you want to make sure that when you do have lower traffic, then you also scale down. Then when people get online and start hitting your web servers, then your web servers scale up. So that's something you would want in any case.

And then you would probably always want to optimize for this ideal load, which you know you can measure in your system. But then you would also want the waiting room, to make sure that when you do have peaks, the waiting room kicks in and protects your system from crashes. And that can be both expected events and events that you didn't see coming.

 

Jose

Yeah, and in that case, you could have the flow, the traffic going into the website, but if you all of a sudden see a peak, then the waiting room can kick in and protect your infrastructure. That balance does make sense. What are the most common misconceptions or challenges when using autoscaling?

 

Martin

So, there are a few. You know, we've been talking about the cloud for a long time, and there are definitely some promises that came with the cloud that have to some degree been fulfilled, but it's not magical. It can sometimes seem like it, but it's not.

At the end of the day, somewhere in the cloud centers, there are also physical computers. So, there are restrictions, there are limitations that you need to keep in mind.

For example, when scaling up, it's not a given that you can scale up limitlessly. There are going to be limits. You might need specific types of server instances that are in high demand, maybe.

There are also many components in an architecture, and they might have different kinds of limitations. So, if you have some serverless processing going on, then maybe there are limitations there that mean it's not going to scale easily, or as much as you need it to.

And there are still networks. So, there's still networking within the infrastructure that you need to consider. Networks have limitations as well, bandwidth limitations among them. So, there are many things that can still go wrong and still limit your architecture.

In any case, you know, whenever you have an application like a web shop, for example, or any kind of software application, there's always going to be a bottleneck somewhere. It's just a matter of where's the bottleneck. And that also goes for systems in the cloud.

 

Jose

Yeah. And from a bottleneck perspective, I think you referred to it before, the external dependencies, right? I think you talked about payment systems, for example, among others. If you look more internally, at what's within one's control in one's infrastructure landscape, do you have a feeling or experience of where the bottlenecks most often are?

 

Martin

Oh, I mean, I've seen a lot of bottlenecks in my time. They can be in many different places, right? So, let's take architectures, for example, that have some older components. It might be systems that were built over many years, decades even. There might be an old database somewhere. That can be a bottleneck in many ways. It might be other components that are critical to the business and encapsulate some kind of business logic that has been the core of the business for many years. It can be many things, and it's always going to be somewhere in your chain of processing: something that will be the slowest point.

 

Jose

Often it feels like, if you do want to increase the performance of your system for peak traffic, you look at it, you find a bottleneck, and then you try to improve the performance of that bottleneck. But then there will be another bottleneck, right? So, there's, I guess, a never-ending exercise there, going all the way back to the theory of constraints.

 

Martin

And sometimes these bottlenecks will show up in different situations, right? So, you might suddenly see the behavior of a database start changing when it reaches a certain load. Or it might be that suddenly we're hitting something in the database that's not optimized, and that's going to cause a bottleneck to appear. And that can then have a downstream effect, or an upstream effect, on other systems if it happens.

 

Jose

So, what I'm hearing you say, and correct me here if I'm wrong, is that autoscaling is an important part of your strategy for handling traffic into one system. It's necessary but not sufficient. So, it's to be taken as part of a whole, as a strategy on how to address this. 

And trying to also zoom out a little bit that we talked about autoscaling and virtual waiting rooms as being a good complement to each other for handling peak demand. Are there other tools, mechanisms that you could see in one's infrastructure toolkit that should be used?

 

Martin

Yeah, so there's definitely some of the classic components that you will often see in a well-built website. So, you can have CDNs, of course, for your static resources that will move those resources closer to the users.

 

Jose

Can you explain to someone listening or watching that doesn't know what a CDN is?

 

Martin

So, CDN is short for content delivery network, just to get that out in the world. Basically, in cloud environments, these will be located as nodes close to geographic locations, so that no matter where you are in the world, you're probably going to be close to one of these nodes.

And then what you can do is put your static resources, so that would be images, for example, or files like static text files, on those. So basically, all the content that doesn't need processing on the web servers. It can then be cached there and downloaded closer to the user. So that speeds up the whole web viewing process.

Then we have caches. You usually have caches in front of the critical components in your architecture. So, for example, let's say you're a web shop and you have some item that's super popular, that people search for all the time. Then you can cache the search so that your web servers don't need to run that query every time. It's just to say, okay, these sneakers are super popular right now, we can see customers searching for them, so we're just going to return this pre-prepared response. So those are definitely things that we also recommend you use heavily in your architecture.
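[Editor's note: the "pre-prepared response" idea can be sketched as a small time-to-live cache wrapped around a search function. The 30-second TTL and the function names here are illustrative assumptions; production systems typically use a shared cache such as Redis or a CDN-level cache rather than in-process memory.]

```python
import time

def make_cached_search(search_fn, ttl_seconds: float = 30.0,
                       clock=time.monotonic):
    """Wrap a search function with a TTL cache so hot queries
    (those super-popular sneakers) skip the backend entirely."""
    cache = {}  # query -> (timestamp, result)

    def cached(query: str):
        now = clock()
        hit = cache.get(query)
        if hit is not None and now - hit[0] < ttl_seconds:
            return hit[1]             # fresh: return pre-prepared response
        result = search_fn(query)     # miss or stale: run the real query once
        cache[query] = (now, result)
        return result

    return cached
```

A suitable TTL depends on how stale a result the shop can tolerate: too short and the backend is hit anyway, too long and shoppers see sold-out items as in stock.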

 

Jose

And it sounds like some good tools to bring in. Maybe not simple tools, right? In my experience, caching is something where there are some quite important details in how the caching works. And if you're not careful with that, you think you're caching content, but in the end you're not, and your server is actually getting hit, right?

 

Martin

So, caching can be a little bit complicated, for sure. I think CDN is pretty simple. That's just storing things closer to people. On the edge, yeah. Caches definitely come with a little bit of complexity. But I think these days, the technology is definitely there. So, it's a question of using it in a good way.

 

Jose

You referred to virtual waiting rooms and how they can complement scaling in addressing peak demand. Can you tell us a little bit about how they can be integrated into someone's infrastructure? And also, whether you think there are other advantages beyond helping control the flow and the cost.

 

Martin

So, let's say you have a website, like a commercial website, and you have your ideal load defined. Then what you do is create an integration layer in your own infrastructure so that, once traffic starts going above your ideal load, you start sending new traffic to this waiting room component. It's all handled at the HTTP level. So, there's no magic in it. You basically forward HTTP requests to the waiting room component, and that then handles the whole queuing functionality.
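[Editor's note: a minimal sketch of that HTTP-level decision, under stated assumptions. The URL, thresholds, and response shapes are hypothetical; a real integration also signs and verifies queue tokens so visitors can't skip the line, and typically runs at the edge rather than on the origin server.]

```python
def route_request(active_users: int, ideal_load: int,
                  waiting_room_url: str = "https://example.invalid/waiting-room"):
    """Decide, per request, whether to serve it or redirect to the waiting room.

    Once concurrent traffic reaches the ideal load, new requests get an
    HTTP 302 redirect to the waiting room instead of hitting the origin.
    """
    if active_users >= ideal_load:
        return {"status": 302, "location": waiting_room_url}
    return {"status": 200, "location": None}
```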

 

Jose

And that we've seen working best at the edge, right? So, we have different ways of setting it up, server-side, client-side, but I think our experience at the edge is where it works best.

 

Martin

Exactly, because that's where you have your traffic. It's also within your architecture, your system architecture. So that means bad actors can't really cheat. So that is the best location to handle it.

Then what you do is basically because you understand your ideal load, you can also define the rate at which you want traffic to enter your site. Let's say you're in a ticketing situation. You have a rough idea of how fast people go in, find their tickets, and then pay for them and move on. So that means you can then set up your waiting room to just forward the waiting customers into your site. So, it's actually very simple, and it's all based on HTTP. And it helps you throttle the rate of customers entering your site.

 

Jose

Yeah. And one thing just to add from my side that I've learned, that was something that I hadn't thought so much about before I faced it with some of our customers, that flow and that optimal rate going through isn't always the maximum that the website can handle. And I think that was quite interesting to think about.

And when you think about it, it’s obvious. But if you have very limited supply of an item, if you're selling something in your web shop that you only have 10 of, and your website handles 1,000 people coming in—then it's actually not a great user experience if a thousand people are trying to reserve and buy those 10 items. 

So, the rate at which you want people to get into the website is actually not limited by the performance of your website, but more by the items that you have, and you're trying to avoid as many clashes as possible when reserving items. So that was a little click in my mind. Like, okay, yeah, that makes a lot of sense. But I was always thinking more about performance and how much is the max. So that was quite interesting.

 

Martin

Yeah, definitely. I mean, sometimes the bottleneck is not even the technology, right? It's the product.

 

Jose

So here we're talking about peak demand and how this complements CDNs and the scaling part. But what would you say are the other main upsides of having a virtual waiting room?

 

Martin

So, it means that you know how much traffic you're going to get on your site. So, there should be no surprises. That's the deal. And I think that's really what the main upside is here because as long as you're walking into unknown territory, a lot of stuff can happen that you didn't expect. These web setups are complicated. There can be a lot of different components. And testing for everything can be very difficult and costly. So, in that sense, you know what you're getting.

 

Jose

I really like this idea of being in control of the situation. Then I also think another important point with virtual waiting rooms is thinking about human psychology and queuing psychology. So, how people feel about queuing, this idea that you can be very transparent with people on how many people are ahead of them and what the expected wait time is.

That's something that psychology studies show over and over again: it's very important for people to know how long they have to wait. I think that's quite an added benefit of having a virtual waiting room and being able to control and predict that.

 

Martin

Oh, yeah, definitely. And you can also change the rate of how fast customers move on to the website, right? So, if something unexpected does happen, you can also use the waiting room to slow down traffic but still have customers waiting in this safe way.

 

Jose

And thinking about wrapping up a little bit, I think we covered quite nicely how to handle peak demand, focusing on scaling, both autoscaling and pre-scaling, considering cost. Are there any practical tips you would give to people considering how to control their scaling approach, anything you would like to share?

 

Martin

So, make sure you have metrics in place; understand your architecture and your system, right? You can't go blindly into these things. So, you have to understand what your max load is, what your capacity is, and what your desired load is, which is probably going to be some way from your max capacity because you don't want to go there. I think that's the most important thing: understanding, and then having a high level of transparency. So, having good dashboards, good metrics where you can see, for all your critical components, how they're doing, and react if something is out of the ordinary in intense situations, right?
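[Editor's note: the "desired load some way below max capacity" advice can be sketched as a simple per-component headroom check over your metrics. The component names and the 75% desired fraction are illustrative assumptions; in practice this logic lives in your monitoring and alerting stack rather than in application code.]

```python
def check_headroom(metrics: dict, desired_fraction: float = 0.75) -> list:
    """Given {component: (current_load, max_capacity)}, return the
    components running above the desired fraction of their capacity,
    i.e. the ones eating into the safety headroom."""
    alerts = []
    for name, (current, capacity) in metrics.items():
        if current > desired_fraction * capacity:
            alerts.append(name)
    return alerts

# The database is at 90% of capacity, well past the 75% desired load.
print(check_headroom({"web": (600, 1000), "db": (900, 1000),
                      "cache": (100, 1000)}))
```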

 

Jose

Yes. Very good. Thank you, Martin, for coming by. It was great. I've learned quite a bit about scaling and how we focus on handling peak demand.

 

Martin

Thank you.

 

Jose

Thank you. And that's it for this episode of the Smooth Scaling Podcast. Thank you so much for listening. If you enjoyed it, consider subscribing and share it with a friend or colleague. If you want to share any thoughts or comments with us, send them to smoothscaling@queue-it.com. This podcast is researched by Joseph Thwaites, produced by Perseu Mandillo, and brought to you by Queue-it, your virtual waiting room partner. I'm your host, Jose Quaresma. Until next time, keep it smooth, keep it scalable.

 

[This transcript was generated using AI and may contain errors.]