In this episode, Mojtaba Sarooghi, Distinguished Product Architect at Queue-it, breaks down the design principles and distributed systems behind Queue-it’s virtual waiting room. He explains how the team handles massive traffic spikes, upholds strict first-in, first-out fairness for every request, and maintains reliability at a scale that would overwhelm most platforms. Moji also covers the shift from server-side integrations to edge compute, how Safety Net protects against unexpected peaks, and why simplicity and failure-oriented design drive every architectural choice. A clear, technical exploration of scaling responsibly when millions depend on your system.
Mojtaba Sarooghi is a Distinguished Product Architect at Queue-it. Moji was one of the company’s first employees, starting his journey as a software developer over 10 years ago. He is highly experienced with AWS services, product and architectural design, managing developer teams, and defining and executing on product vision.
Episode transcript:
Jose
Hello and welcome to the Smooth Scaling Podcast, where we talk with industry experts to uncover how to design, build, and run scalable and resilient systems. I'm your host, José Quaresma, and today we had a really insightful conversation with Moji Sarooghi, a very dear colleague and a distinguished product architect here at Queue-it.
He is actually our first returning guest, and today we discussed in detail our virtual waiting room—how it looks from the visitor’s perspective, what the journey looks like, and also what’s happening behind the scenes. We covered the services and technologies powering the waiting room and our system, and what gets Moji excited about the future—what he and the team are looking forward to improving and setting up in the future.
Enjoy.
Welcome, Moji. Actually, welcome back to the podcast—you’re our first repeating guest.
Moji
Okay, nice. Nice to be here. And it was good to be here before as well.
Jose
So hopefully it’ll be good today too.
Moji
Yes, thank you.
Jose
So today we're hoping to get a little more into the details of how a virtual waiting room—our virtual waiting room—works. And I’d like to start by asking some of the basics.
When a visitor is trying to access a protected website and goes through the waiting room, what’s the flow?
Moji
The whole flow from the visitor perspective is that the visitor tries to buy something from a customer website that’s protected by what we call a connector.
And what this actually is, is a piece of code that runs—depending on the scenario—either on the edge or on the customer’s website. It really depends on the setup.
This piece of code will intercept the request, and based on the information and context of that request, it will make a decision to redirect the user to the waiting room or not.
That’s the technical side. But from the visitor's perspective, they go to the website and then see that there is a waiting room. They get information about their place in line—what to expect in the next few minutes.
We’re proud to say we offer a good user experience: it’s fair and informative. The user can clearly see what’s going to happen.
From the technical side, we're also proud to say we have a lot of connectors that can be integrated by the customer very quickly. We call it instant connector integration, and I think that’s one of the really interesting things we offer at Queue-it.
Jose
Awesome. And on that connector side—you mentioned the edge and the customer website. Can you tell us a little bit about the main differences?
Moji
Sure. Yeah, I remember when we started moving more toward the Edge. Initially, we had the integration on the server side. I think one of the first ones was C#. So we had a piece of code the customer would put on their server to integrate with us.
Then, six or seven years ago, new technologies came along—CloudFront, Edge Workers, Lambda@Edge—and all the other CDN providers started offering compute on the edge. We thought, “Oh, pretty cool, we can start leveraging this to make it easier for our customers to integrate with us.”
It’s good for them, and it’s good for us. Why? Because maintainability is easier. Different customers can all use the same technology to integrate.
Also—and this is pretty important—the whole integration happens at the edge. That means if there’s a spike in traffic, it doesn’t hit the customer’s server. It gets intercepted closer to the visitor, improving performance.
Another part I really like is how easy it makes integration. Imagine you have a really old web server with really old code. Integrating on the server side would mean involving a lot of development teams. Questions come up—"Is it a Java server?" "Do I need to use our C# server?" "Do I need to use actions, filters, middleware?"
But with the edge, you don’t touch anything on your server. You just put the Queue-it code on the edge, and everything works.
When we say “on the edge”—technically speaking—it means CDNs now have the ability to run compute. A few years ago, they didn’t. But now, there’s a proxy layer in the CDN where you can execute code, and that’s exactly where we run our lightweight Queue-it connector.
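To make that concrete, here is a minimal sketch, in TypeScript, of the shape such an edge connector takes: intercept the request, check whether the visitor has already passed the waiting room, and otherwise redirect them with a 302. The request shape, cookie name, and URLs are invented for illustration, and a real connector does much more validation than this.

```typescript
// Minimal sketch of an edge connector (hypothetical names; simplified Lambda@Edge-style shape).
// It intercepts each request and decides: pass it to the origin, or redirect to the waiting room.

interface EdgeRequest {
  uri: string;
  querystring: string;
  headers: Record<string, { value: string }[]>;
}

const WAITING_ROOM_URL = "https://queue.example.com/?c=shop&e=sale"; // hypothetical waiting room URL

function hasValidQueueToken(request: EdgeRequest): boolean {
  const cookie = request.headers["cookie"]?.[0]?.value ?? "";
  // A real connector verifies a hashed token here (see the token discussion later in
  // the episode); this sketch only checks that the cookie is present at all.
  return cookie.includes("QueuePassed=");
}

export function handleRequest(request: EdgeRequest) {
  if (hasValidQueueToken(request)) {
    return request; // already through the queue: forward to the origin unchanged
  }
  // Not verified: send the visitor to the waiting room, remembering where they wanted to go.
  const target = encodeURIComponent(
    request.uri + (request.querystring ? "?" + request.querystring : "")
  );
  return {
    status: "302",
    headers: { location: [{ key: "Location", value: `${WAITING_ROOM_URL}&t=${target}` }] },
  };
}
```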
Jose
Nice. I mean, it makes total sense with the edge side—from a performance perspective and also just simplicity, right?
You also have less variation since you're not dependent on the existing architecture on the server side. So you keep it much more clean and separate.
Moji
100%, yeah. There’s all that—security, performance, being closer to the end user—on one side. But also maintainability.
I remember some customers had, I don’t know, thousands of developers who had worked on their websites—not exaggerating. There were a lot of teams involved in producing their product, their shop, different parts of the webpages. So getting hold of all those people to create a server-side solution is pretty hard. And if something breaks and you want to help them, it’s hard.
But when we say “on the edge,” it means we’re totally decoupled from their technology and development style.
Jose
And I’d say an important part of this journey is also having confidence that users cannot bypass the queue. So how do we enforce that? How do we make sure the queue can’t be bypassed?
Moji
Yeah, I don’t want to go into too much detail on that.
Jose
That’s fair.
Moji
But at Queue-it, when we started developing the connector, we began with client-side integration. What we were doing back then was using JavaScript on the client side to understand if the user had passed the queue or not.
That was easy to bypass. If you search online for how to bypass a queue, you’ll find articles and think, “Oh, we can bypass it.” And we say, no—that’s a different integration we’re talking about.
The one we’re promoting now is the server-side integration. And how we do that is with various hashing and verification techniques on the request itself. We check whether a request has gone through the waiting room before. If it hasn’t, we determine whether it should go through it, and based on that condition, either let it go to the origin or redirect it to the waiting room.
We know about bots—there are many types—and you guys have done quite a few podcasts about bot protection here at Queue-it, and how we help with that. We also know that people in the market try pretty aggressively to bypass us.
But we have several layers of protection, and we can enable them based on the scenario. The first and most basic one is server-side protection, where the code can't be manipulated and the request context is intercepted server-side instead of client-side.
Jose
And I think when you’re now saying server-side, you’re including both what we talked about as the server—kind of the server-side connector—but also the edge. So the edge connectors, you’re also putting them in that...
Moji
100%, yeah. When—in the broader perspective—when we say connector, we mean the connector that does the task of verifying the user on the client side, and the connector on the server side.
And then when we move to the server side, it’s about where you place that code. Do you put it on the edge, on the proxy level, or do you put it closer to the origin, which is the server?
Jose
Got it. And I think maybe just to add there, something we also see sometimes with solution architects and the team, is that sometimes the bypassing isn’t necessarily a vulnerability in the system.
It’s just that when you design the user journey and say, “OK, there are these two entry points to the part I want to protect,” and you protect those entry points—but then you actually forgot a third entry point.
So in that sense, the ones that were set to be protected are being protected, but then the system could still be bypassed if users go through a third entry point that’s not protected by the queue.
So that’s kind of a bypass of the design as a whole, but not a bypassing of the system itself.
Moji
From a vulnerability perspective, right. What you’re referring to—and you know this better than me—is how we do the setup.
That’s part of the story as well. You might have a technical limitation, maybe a vulnerability, I don’t know—but also something related to the setup. That’s something we see quite commonly, and that’s also part of our job: to guide our customers in the right direction.
And again, there are all these differences. Sometimes you see something on Twitter or Reddit, and then when you investigate, it turns out, “Oh, there was a misconfiguration here,” or, “This kind of protection needs another level for this specific use case.”
So the base is there. The base is: it needs to happen not on the client, but on the server side.
The second part is: we need to do a good setup—kind of plan this properly.
And the third one is: do we need customization? And we’re good at that. I think we are pretty good. We have a strong team on the setup, and also customization based on the specific needs of the customer.
Jose
We’ve been talking about the waiting room, but we also have the peak protection mode, right?
So users could be accessing a website, but there’s no active waiting room—and then it’s up to the system to decide and evaluate: should I be forwarding users to a waiting room, or should I just let them through to the website?
Can you tell us a little bit about how that part works—how is that decision made?
Moji
We call this feature Safety Net. It means that it works as a safety net.
So, again, there are times when you don’t expect a traffic peak to happen. You don’t have a drop; you say, “OK, it’s my normal business day, and everything is working.”
And on a normal business day, I can have thousands of visitors per minute visiting my website. You can do a calculation—what does that mean in terms of requests? What does it mean in terms of sessions?—and you end up with that number.
You do a setup on the Queue-it GO Platform, and you say, “OK, this is what I can handle.”
And then this will work in the background. What that means is, the connector will do its logic—or its magic—to understand how many requests, how many active visitors you have.
And based on the setup, if there is a spike in traffic—why could that happen? I don’t know. Suddenly there’s a post on social media because marketing wanted to promote a specific product, and there’s a peak of visitors coming to the website—then the queue will be activated automatically.
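As a rough sketch of that idea, you can think of the check as a sliding one-minute window of observed traffic compared against the ceiling the customer configured. The class and numbers below are invented for illustration and are not Queue-it’s actual implementation.

```typescript
// Illustrative sketch of the Safety Net idea: track request timestamps in a sliding
// one-minute window and activate the waiting room when the rate exceeds the ceiling.

class SafetyNet {
  private timestamps: number[] = [];

  constructor(private readonly maxVisitorsPerMinute: number) {}

  /** Record an incoming request and report whether the queue should activate. */
  shouldQueue(now: number = Date.now()): boolean {
    const windowStart = now - 60_000;
    this.timestamps.push(now);
    // Drop anything older than one minute.
    this.timestamps = this.timestamps.filter((t) => t >= windowStart);
    return this.timestamps.length > this.maxVisitorsPerMinute;
  }
}

// Example: a site that comfortably handles 3,000 visitors per minute on a normal day.
const net = new SafetyNet(3_000);
if (net.shouldQueue()) {
  // Redirect this visitor to the waiting room instead of the origin.
}
```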
So in this scenario, for our customers, it’s like insurance. We like to use the word insurance for that one.
You are in control of your traffic if something unexpected happens. And we have quite a lot of our customers, specifically in the retail market, that use this feature.
Jose
Financial services as well.
Moji
Exactly. And also, I see more interest in the airlines and those kinds of markets where there’s stuff happening without people being fully notified.
Jose
And you as a customer can also have a setup that has both, right? So on your website, if you're doing a drop for a specific product, you could have a scheduled waiting room—that’s the other one we were talking about in the beginning, right? A waiting room that will always be visible, and you're expecting that product to drop and the sale to start at 10 a.m. But then you can still have the 24/7 peak protection—also known as Safety Net—on the whole site as well. It's very configurable. You can fine-tune how you want to queue people, where, and when.
Moji
Yeah, that's really cool. Queue-it’s solution is pretty flexible from that perspective. And we've been in this business for the last 15 years, I think. We've seen quite a lot of different cases. I remember one specific case where the setup you’re mentioning was discussed pretty heavily.
The scenario was that the customer had a specific drop on a specific path of the website. Imagine you're selling a specific product and it’s at /product/XYZ. But they were worried that when this goes live, people won’t just go to that specific page—they’ll flood the rest of the website too. So what they did was set a relatively low outflow limit for that specific path. Then they added a scheduled waiting room and created a Safety Net, or 24/7 peak protection, for the whole website.
That way, they could manage the peak traffic, even if people didn’t know exactly where to go and just hit the homepage instead. They could protect against crashes or disruptions.
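A hypothetical configuration for that kind of combined setup could look something like the sketch below; all field names and numbers are invented for illustration and do not reflect Queue-it’s actual configuration format.

```typescript
// Hypothetical setup: a scheduled waiting room on the product path with a low outflow,
// plus a 24/7 safety net on the whole site that only kicks in above a traffic ceiling.

const waitingRooms = [
  {
    name: "product-drop",
    trigger: { urlContains: "/product/XYZ" },  // the hyped product page
    mode: "scheduled",
    startsAt: "2025-06-01T10:00:00Z",          // sale goes live at 10 a.m.
    outflowPerMinute: 200,                     // deliberately low for the drop itself
  },
  {
    name: "site-wide-safety-net",
    trigger: { urlContains: "/" },             // everything else, including the homepage
    mode: "safety-net",                        // activates only above the ceiling below
    maxVisitorsPerMinute: 3_000,
    outflowPerMinute: 3_000,
  },
];
```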
Jose
Thanks. And I think when I think about what we do, there are a couple of things that at first look pretty simple, but then, when you have to make them work at scale, things get complicated. I think it's quite impressive that we manage to do that at scale.
So I wanted to just touch on two of those things. One of them is the first-in, first-out principle. That’s one that—if you're writing an algorithm at home, not sure how many people do that—but if you do, for a small-scale case, it's simple. This person comes in, I put them at the end of the queue; the next one comes in, same thing. And then you process them in order. That’s pretty straightforward.
But at scale—in the thousands or millions—it gets super complex. Can you tell us a little about how we make that work?
Moji
Yeah. You know, the whole art here is that we're talking about traffic in the range of a couple of million per minute. Actually, more than a couple—tens of millions. And we’re saying that what counts as a DDoS attack for some businesses is just daily business for us.
So the system we design isn’t just about first-in, first-out. It’s a distributed system. It can scale really fast. It’s fault-tolerant. And at the same time, it maintains that first-in, first-out logic.
When you put all of that together—plus aiming for a good user experience and being informative—it just becomes a little more complex overall. But I think I talked a lot about this in the previous episode—the simplicity. We try to achieve all this by creating simple services that build up to this whole complex system.
Jose
And you mentioned the information. I think another part that I was… So I mentioned two things: the first-in, first-out part, but then also the expected waiting time. That’s also one that, if you think about it at a very small scale, is pretty easy. If you have 10 people in the queue and you have one per minute coming out, you kind of know how long it is for the 10th to wait. But again, back to a distributed system, making that work at that scale is quite impressive.
Moji
You have a point on that one. The main thing here is that it’s a distributed system that can scale, where one node can go down or a new node can come up. It comes down to how you do a database lookup, how many times per second you do it, how your caching works, whether you still do the lookup when the information is in the cache, how long the cache TTL is, and what kind of database you use.
I think one part of our system—if I want to be really technically in-depth—is that we use DynamoDB in AWS. And it is pretty neat technology. It can handle a couple hundred thousand TPS—transactions per second—of distributed requests, and it can scale pretty well. That is still one of the main technologies that helped us achieve this, besides all the autoscaling and design, but having this kind of technology is pretty helpful in that picture.
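One common pattern for preserving first-in, first-out ordering across many nodes is to hand out strictly increasing queue numbers from an atomic counter, which DynamoDB supports natively. The sketch below shows that general pattern with the AWS SDK for JavaScript; the table and attribute names are invented, and it is not necessarily how Queue-it implements it.

```typescript
// Sketch: issuing strictly increasing queue numbers with a DynamoDB atomic counter.

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, UpdateCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({ region: "eu-west-1" }));

export async function nextQueueNumber(waitingRoomId: string): Promise<number> {
  // ADD is atomic in DynamoDB, so concurrent nodes never hand out the same number.
  const result = await client.send(
    new UpdateCommand({
      TableName: "WaitingRoomCounters",            // hypothetical table name
      Key: { WaitingRoomId: waitingRoomId },
      UpdateExpression: "ADD LastIssued :one",
      ExpressionAttributeValues: { ":one": 1 },
      ReturnValues: "UPDATED_NEW",
    })
  );
  return Number(result.Attributes?.LastIssued);
}

// Every arriving visitor gets the next number; visitors are later admitted in
// ascending order, which preserves first-in, first-out across all nodes.
```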
Jose
You mentioned that we handle the amount of traffic that, for some customers out there, is technically a DDoS for them, and we handle that every day, right? So can you share a little bit on—from a design perspective, from architecture—how we can handle that? What have we done? What do we have in place that allows us to handle that volume?
Moji
I think, again, I go back to that one. I think the first rule is to keep your system simple. If you can avoid calling out to something else to execute a piece of logic, don’t make that call. Don’t try to create a complicated architecture where you call one node, and the other node calls another node. I don’t want to get into the fight between coupling and cohesion and the whole microservices thing, but keeping in mind that your system needs to be simple helps quite a lot.
And the whole “design for failure” part—that was also another episode that one of my colleagues did—is pretty cool. Again, the motto in this design is to use technology when it’s needed. Not just because there’s a tool, say SNS in AWS, and it looks good. No, that will not do it.
And then also, I would say, a layered system, in a way that the layers are there because they’re needed. You know, we’re also using AWS WAF when we need it. But at the same time, we have our own protection around how many requests we get. And then you have all the scalability and statelessness. Yeah, there are all these different factors in the design.
I would go first by saying my answer is simplicity, and having the design mindset that you’re creating a system that needs to be available 24/7. I didn’t talk about this earlier, but I think this is also one of the most important parts of what we offer our customers. We have a system that is reliable. And usually in the development team, the product team, whenever we talk, we say first things first: the system needs to be available 24/7. And reliable. Availability and reliability—the first thing.
And to achieve that, you put this on the wall, and then you say, “To achieve this, let’s find tactics and strategies. Do a simple design. Create it in a way that it can handle overloads, fallbacks, think about failures, and use the technology that is needed to handle the stuff without trying to complicate the solution.”
I don’t know how much I answered your question.
Jose
No, I think that was a good answer, thank you.
So now, I guess from a user journey perspective—we talked a little bit about our visitor journey, right? Kind of trying to access the website, being redirected to the waiting room, and then they’re there in the waiting room, waiting.
And we talked a little bit about that—the importance of using queuing psychology and providing as much information as possible to the visitors that are waiting.
But at a certain point, they’ll then be redirected to the website, right? Can you tell us a bit more about that part?
One of the things is how to determine the ideal flow—that’s what we usually call outflow, right? That’s outflow from the queuing perspective. So what is the outflow from the queue, the waiting room, to the website?
I guess there are two questions there. One is, how do we determine that? And second, how do we enforce that? Again, it's pretty easy to design a super simple system that enforces that—in a multi-server distributed system, it gets much more complicated.
Moji
From the informative part—user experience and user flow—as we discussed, the user knows they want to buy this specific product that is pretty hyped. He or she will go there and get redirected to the waiting room.
We have this—again, something we’re proud of—this custom theme. You design your waiting room with your brand, with all the information. You say, “Sorry” or “Not sorry—we’re happy to have you here. You’re seeing this page because you’re waiting for this specific product.”
That’s something static. As a customer, you design your system, and then we add some dynamic information to it: What is your expected wait time? How many people are in front of you?
Customers can add some information—we call that messaging. And we usually tell our customers: put in good, informative messaging. Say to people, “Don’t worry, we have stock available,” or “You’ll get what you came for,” and, “If not, we’ll find another way to reach you.”
So this is the whole good user experience. Again, I mentioned the static part, the kind of half-dynamic part that the customer brings and adds messaging, and the totally dynamic part related to how the waiting room is reacting—like your place in line, wait time, and all that—with a cool design, colors, and stuff like that.
That’s the user side.
On the backend, what we do is—this is a distributed system, and every one of the servers communicates with the others by sharing messages, in some parts using a database, in other parts using a cache—so they talk together through different technologies.
I can say that for some of the state, we’re using a database that’s pretty fast, like Dynamo, to allow different nodes—you could say the waiting room flow managers—to communicate with each other.
They report: “I’ve received this number of requests in the last minute,” or last 10 seconds, or last second. And then, based on how the outflow is set up, when all the nodes get this information, they make a decision—what should happen in the next 10 seconds? How many should we redirect?
And there’s a lot of interesting stuff in these challenges. One of them we call a no-show—a case where, for whatever reason, when we want to redirect the visitor, the visitor is not there. Maybe they switched tabs, or they’re looking at something else.
At the same time, we want to get the flow back to the origin—we don’t want to make people wait more than what is needed. So if the customer says, “My outflow is 500 per minute, I can handle that,” then in our system we try to keep it at 500 per minute.
So the servers need to react based on the requests they get from the client side. They talk to each other and make a decision: “Okay, I want to redirect this number of people.”
We have some cool naming—we call it open window, which means which queue numbers are allowed to get redirected in the next 10 seconds, or whatever that interval is.
And then the servers talk—one part of the system uses NoSQL Dynamo, and another part of the system can send messages between nodes to make the decision about what should be redirected.
And this should happen pretty fast—we’re talking about seconds and minutes in this scenario—in a reliable way as well.
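A simplified sketch of that open-window calculation might look like the following; the names, interval, and no-show handling are invented for illustration and only show the general idea.

```typescript
// Illustrative "open window" calculation: each interval, decide the highest queue
// number allowed to redirect, based on the configured outflow and recent no-shows.

interface WindowState {
  lastAdmittedNumber: number; // highest queue number already allowed through
  noShows: number;            // visitors whose turn came up but who never redirected
}

function nextOpenWindow(
  state: WindowState,
  outflowPerMinute: number,
  intervalSeconds: number
): number {
  // Base allowance for this interval, e.g. 500/min over 10 s is roughly 83 visitors.
  const baseAllowance = Math.floor((outflowPerMinute * intervalSeconds) / 60);
  // Re-admit capacity left unused by no-shows so the origin still sees ~500/min.
  const allowance = baseAllowance + state.noShows;
  return state.lastAdmittedNumber + allowance;
}

// Example: outflow of 500 per minute, 10-second windows, 12 no-shows last interval.
const upperBound = nextOpenWindow({ lastAdmittedNumber: 10_000, noShows: 12 }, 500, 10);
// Queue numbers up to `upperBound` may now be redirected to the origin.
```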
Jose
And so, when the user is redirected, how do we then kind of—how do we pass on that information?
So I think—is it—we have something, the Queue-it token? Can you tell us a little bit about that?
Moji
Yeah. You know, in this scenario, what we do is put some information in the redirect—we use standard internet mechanisms, so there’s nothing hidden.
The user or visitor gets redirected to the queue with a 302, and that request carries which waiting room they should see and when it will be their turn.
When it is their turn, we carry back their queue ID, the time of the redirect, and some other information. And we do a hashing of that information.
So one of the things our connector does is verify that the request—or the visitor—has passed the waiting room. That’s what we call the Queue-it token, which is hashed.
Inside that token, we can add all kinds of information related to that visitor, so we can recognize that this was the same visitor who was waiting in the waiting room.
We have different tooling for that, to make sure the person that started the journey is the person that lands in the connector—and the connector verifies that. It could be stateful or stateless.
The server can verify it, or the connector can verify it on the edge or on the origin, depending on the setup. Or it can communicate with us in a more stateful manner—asking, was it the same visitor or not? What was their experience when they got to the waiting room, and when they were redirected back?
So in short, we have both static verification and also dynamic verification of this visitor during the whole journey.
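The sketch below illustrates the general technique of signing and verifying redirect metadata with a shared secret so the connector can check it statically; the actual Queue-it token format and fields are not shown here, and everything in the example is invented for illustration.

```typescript
// Sketch of signed redirect metadata: the queue side signs it, the connector verifies it.

import { createHmac, timingSafeEqual } from "crypto";

const SECRET = "shared-secret-between-queue-and-connector"; // assumption for illustration

interface RedirectInfo {
  queueId: string;        // the visitor's ID in the waiting room
  redirectedAtMs: number; // when the visitor was let through
  waitingRoomId: string;
}

// The queue side signs the redirect metadata before sending the visitor back.
export function issueToken(info: RedirectInfo): string {
  const payload = Buffer.from(JSON.stringify(info)).toString("base64");
  const signature = createHmac("sha256", SECRET).update(payload).digest("hex");
  return `${payload}.${signature}`;
}

// The connector recomputes the hash to verify the visitor really passed the waiting room.
export function verifyToken(token: string, maxAgeMs = 5 * 60_000): RedirectInfo | null {
  const [payload, signature] = token.split(".");
  if (!payload || !signature) return null;
  const expected = createHmac("sha256", SECRET).update(payload).digest("hex");
  if (signature.length !== expected.length ||
      !timingSafeEqual(Buffer.from(signature), Buffer.from(expected))) {
    return null; // tampered with, or signed with a different secret
  }
  const info: RedirectInfo = JSON.parse(Buffer.from(payload, "base64").toString());
  // Reject stale tokens so a single redirect cannot be replayed forever.
  return Date.now() - info.redirectedAtMs <= maxAgeMs ? info : null;
}
```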
Jose
And then there’s, I guess, one step further—the user does get a cookie that keeps being used further along the journey, right?
Moji
Exactly.
We also add some kind of signature—that’s what we call it. Again, we use standard internet technologies, whatever is part of the request.
What we use here leverages the cookie or session, and also some kind of signature for the user. We do a static verification, and we do a dynamic verification of that request and cookie—or token—depending on the scenario and the system state.
So, in kind of a simple way: as a visitor, you get a ticket. That ticket is verified by the Queue-it side. You get the ticket, it’s passed to the connector, and the connector verifies it—in a static way or a dynamic way—based on the scenario.
Jose
Nice. Thank you, I think that was a good overview.
Is there anything you’d like to add, kind of when we look at the visitor journey? Was there any question I forgot to ask?
Moji
No, I think actually we covered quite a lot. We went through the whole user experience, the whole integration part, and touched a little bit on the development, the technology, and at least some of the architectural practices that we have.
Again, I mentioned this—and I’ll mention it again—I think it’s still the most important thing: reliability, reliability, reliability.
That is what we promise, and what we keep in mind when we design the system.
Jose
So Moji, now I’d just like to ask something—kind of, I think, as I said, it was a really good overview.
Can we talk a little bit about—this might be a bit random—but can you share with us some of the specific services that we’re using on AWS, and kind of why and how we put them together?
I know it would take forever to go through all of them, but what are the main things that come to mind?
Moji
Sure. From the AWS perspective—if I just want to name technologies that we use—we use Application Load Balancer, we use EC2 instances as compute, and we use ECS with Fargate for container hosting and orchestration, kind of in the way others use Kubernetes.
We use SQS. We use SNS. And, as I said, the most important one that I really personally like is DynamoDB—we use that heavily.
Then we are using the edge—CDN with CloudFront. We use Lambdas as well. We have a connector on Lambda@Edge—it’s not part of the heart of the queue engine, but as a connector, we use that. Actually, I think it was our first edge connector... or, no, I don’t remember. But it was something that existed back then.
We use AWS WAF. What else... we’re using all the different services related to permissions and compliance. Also Athena—all kinds of different technologies.
But the main thing here is: again, use the technology that you need in a way that helps you achieve what you want.
So when I’m in a meeting, I don’t say, “Oh, did we use this technology?” or “Did we connect this Lambda to that?” No—we say, “What do we want to achieve?” and then go and pick what helps us achieve that.
And I think we do that in the right way. We start by identifying what we need to solve, and then find the tool from the AWS toolbox to solve it.
Jose
And looking forward a little bit—is there, I don’t know if there’s... I would guess there’s a lot of stuff on your mind all the time—but are there a couple of things that are more highlighted when you think about the future, the next few years, that you’re considering?
Moji
Yeah, I think we are pretty excited and working on this cloud-agnostic or multi-cloud solution.
It’s becoming a really hot topic. Based on all the compliance, and all the different cloud providers, how can we design a system that is portable?
And Kubernetes helps us do that. But then there are all the different tooling pieces that we get out of the box in AWS—how can we create a system that’s plug-and-play? Where you plug in a new tool, and it just works?
That’s one part.
Also, tool choice—is it open source, is it managed or unmanaged? There’s a lot of discussion there, but it brings some really nice challenges.
I think we’ll talk more about those challenges later. I know one of my colleagues already talked about it—but that transition from managed to unmanaged to open source is pretty interesting.
That’s one part.
We’ve recently—I think we’re finalizing it, or maybe it’s already GA—released Bring Your Own Proxy at Queue-it. That means our customers are able to set up their own proxy in front of the waiting room.
I really like this kind of feature. It gives our customers more flexibility—the traffic goes through their proxy before hitting us. And that opens up quite a lot of possibilities.
I know we already talked about HEP in one of our podcasts. That’s one of the things our customers will be able to do—it means they can put their own specific bot protection tool in front of Queue-it’s waiting room.
So whatever tool they use for their user journey, they can also apply it when a visitor goes through the queue—which is pretty cool. I’m really proud of that release.
Also, we are thinking more and more—again, talking about a North Star vision—we’re doing some research into how much more we can leverage the edge for the queue core itself, not just the connector.
And I think something interesting will come out of that as well.
Jose
Exciting. I’m sure we’ll have you on the podcast again—maybe when we get to that decision, or after that project—it would be great to hear more about it.
I think we covered a lot. We went through the journey, got a little bit into the nitty-gritty, and got some details out of you—and some thoughts and considerations as well.
So I think it’s a good place to wrap it up. Thanks so much for coming.
Moji
Thank you for your time. Happy to be here, as always. Thank you.
Jose
And that’s it for this episode of the Smooth Scaling Podcast. Thank you so much for listening.
If you enjoyed, consider subscribing and perhaps share it with a friend or colleague.
If you want to share any thoughts or comments with us, send them to smoothscaling@queue-it.com.
This podcast is researched by Joseph Thwaites, produced by Perseu Mandillo, and brought to you by Queue-it—your virtual waiting room partner.
I’m your host, José Quaresma. Until next time, keep it smooth, keep it scalable.
[This transcript was generated using AI and may contain errors.]