Load Testing for Peak Traffic with RadView's Yam Shal-Bar | Smooth Scaling Podcast

In this episode, Yam Shal-Bar, CTO at RadView, discusses the evolving world of load testing and how it's used to prepare for peak traffic. He covers the most common system bottlenecks, the importance of iterative testing, and strategies for accurately simulating user journeys. Yam shares insights into common misconceptions around testing, best practices, and trends like AI for test analysis and API-level testing. Whether you're launching a new web app or tuning an existing one, this episode is packed with practical advice for testing systems for resilience and scalability.

Yam Shal-Bar is an experienced development leader and software architect with over two decades of expertise in managing distributed teams and delivering enterprise-scale software solutions. As CTO at RadView Software, he drives the company’s technical roadmap and leads the development of core products, including RadView’s performance testing platform and web dashboard. Throughout his career—including leadership roles at British Telecom, Reliance Infocomm, and Vodafone—he has championed Agile methodologies, DevOps practices, and CI/CD pipelines to deliver robust, scalable systems.

Episode transcript:

Jose
Hello and welcome to the Smooth Scaling Podcast, where we talk with industry experts to uncover how to design, build, and run stable, scalable, and resilient systems, with the ultimate goal of providing great user experiences. I'm your host, Jose Quaresma, and today with me I have Yam Shal-Bar, the CTO at RadView. We'll be talking about load testing and how to prepare for peak demand.

We discuss, among other things, what the usual suspects are when it comes to bottlenecks in Yam’s experience. And we also touch upon the impact of AI in load testing. Welcome, Yam. Welcome to the Smooth Scaling Podcast.

Yam
Thank you. So nice to be here. Thank you. Thank you for inviting me.

Jose
It's great to have you. So could we start by having you tell us a little bit about yourself and also about Radview?

Yam
Yes, sure. So RadView has been around for many years, 30-some years in the business of load testing. So that's what we do. And I've been there for almost 20 years now. Yeah, next year it's going to be 20.

We do load testing. This is our focus. We have our product, WebLOAD, which simulates user behavior and then scales it under load. We measure the behavior of your system and help you find the bottlenecks and make it run smoother.

Jose
You said it's been 20 years at RadView. Were you always in this space, also before that?

Yam
No, no, actually I moved into this as a new space. I was always in software development, but for different things. I worked in different places around the world. I lived in India, I lived in London—yeah, so I've been around. But the load testing part started for me at RadView.

Jose
And it sounds like it was a good move for you. You said it's almost 20 years—congratulations on that, by the way.

So you mentioned simulating actual user journeys. Can you tell us a little bit about how you do that? How does a customer do that using WebLOAD?

Yam
Sure, yeah. So like every good thing, you start with planning. The first step is—you need to plan. You need to try to map out what your users are going to do.

There's always a difference when you have an actual system running in production—you can see what it's doing. But sometimes you're planning to launch a new application that nobody has ever used. That's a different type of challenge: you need to guess what users will do. But it's still the same process: gather whatever information you have, either from your own systems or from similar sources, similar websites, to get a sense of what it's going to look like. And you want to map out the journeys that your users will take. Yeah, so that's the basics.

Then you drill down into creating scripts for each one of these journeys. I usually prefer—or recommend—starting small. Don't jump ahead and try to make your best assumption of how everybody is going to use it, running 20 different scripts at once. People have a tendency to do that because it's the most realistic, and they're right. But I do recommend that approach be your last test.

But for the first test, just take the most common scenario and see how it works. Because it’s just so much easier to read and understand the statistics that way. When you have one simple scenario, the behavior is pretty expected. And when it’s not, you can say, “Oh, this graph is not supposed to do that.”

But when you run 20 different things at the same time, it’s hard to say—maybe that’s predictable, maybe it’s fine. So that’s why I recommend starting simple. Then add another simple script, and another. Eventually, you get to the bigger ones.
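To make the "start small" advice concrete: a first test is often just one script walking the single most common journey, with a bit of think time between steps. WebLOAD has its own scripting environment, so the snippet below is not its syntax; it is only an illustrative sketch of the same idea using the open-source Locust tool in Python, with a hypothetical host and paths.

```python
# Minimal single-journey load script (illustrative sketch, not WebLOAD syntax).
# Requires: pip install locust. The paths below are hypothetical.
from locust import HttpUser, task, between

class BrowseAndSearch(HttpUser):
    wait_time = between(1, 5)  # think time between steps, in seconds

    @task(3)
    def view_homepage(self):
        self.client.get("/")

    @task(1)
    def search_catalog(self):
        self.client.get("/search", params={"q": "running shoes"})
```

Run it headless at a modest load first, for example `locust -f journey.py --headless -u 50 -r 5 -t 5m --host https://shop.example.com`, and only layer in additional scripts once this one produces graphs you can explain.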

Jose
Okay. And of course, we at Queue-it are big fans of load testing and performance testing. We have a lot of customers where the focus is on peaks—right? Sudden surges in traffic. Is there anything special, or specific, in how to approach that from a load testing perspective?

Yam
So it really depends on how your application works. Some applications build pressure gradually. So, you know, you can test that by going from one user, two users, then more.

And some applications, just by their nature, have a burst. A good example for us is universities. Many universities have course registration on a specific date—say, 9 a.m.—and everybody can register at that moment. Everyone wants to get the best course, so at 9 a.m. sharp—bam—everybody jumps in.

That’s usually a much bigger peak than you typically talk about. And that does require special treatment in load testing.

So for these cases, we usually recommend modeling your script the same way. Users log into the system over time—and then they wait for that 9 a.m. mark—and then all of them, bam, start firing requests. That simulates the behavior of many users hitting the system at the same time.
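One generic way to express that "everybody at 9 a.m. sharp" pattern, independent of any particular tool, is a rendezvous point: virtual users arrive and log in gradually, then all block on a shared barrier and release at the same instant. Below is a rough Python sketch of the concept only (placeholder URL and user count); a real load testing tool would handle sessions, pacing, and result collection for you.

```python
# Conceptual sketch of a burst ("rendezvous") scenario. Not production code:
# the URL is a placeholder and a real tool would manage sessions and metrics.
import threading
import time
import urllib.error
import urllib.request

NUM_USERS = 100
registration_opens = threading.Barrier(NUM_USERS)  # the shared "9 a.m." moment

def user_journey(user_id: int) -> None:
    time.sleep(user_id * 0.05)   # phase 1: users trickle in and log in over time
    registration_opens.wait()    # phase 2: everyone waits for the clock to hit 9 a.m.
    start = time.time()          # phase 3: all users fire at (nearly) the same instant
    try:
        with urllib.request.urlopen("https://example.com/register?course=algo101") as resp:
            outcome = resp.status
    except urllib.error.URLError as exc:
        outcome = exc
    print(f"user {user_id}: {outcome} after {time.time() - start:.2f}s")

threads = [threading.Thread(target=user_journey, args=(i,)) for i in range(NUM_USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```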

Jose
And in your experience—and you've been doing this for quite some time now—are there specific areas, types of infrastructure components, or types of scenarios that are more often the bottlenecks when customers start load testing?

Yam
Yes, so you have the usual suspects. Databases are notorious for being a problem—specifically locks. Most of the other components are easier to scale, but some things are inevitably going to conflict. Like, “I want this course,” “you want this course,”—only one can take it, so it’s going to involve locking. That’s the usual suspect.

But I think part of what we do—and the reason we do it—is you never know. Which is the interesting thing. We’ve seen customers say, “Oh, I want this website to support 5,000 users.” And they get blocked at 5. They think they’ll reach 5,000—but the first five users already hit a wall. They didn’t think about something that caused it to block. And that’s why you want to test.

And I also really like what you guys are doing. Because there’s this notion of planning—we try to plan for the best, try to know our limits. But every system has a limit. Sometimes you get caught by surprise. You have more volume than anticipated. And then having a user experience that’s not just “OK, the system’s crashing, bye”—having some kind of system that says, “All right, there’s a queue, we’ll let you in”—I think that’s a really brilliant approach.

Again, no matter what you do, you're going to hit a limit at some point.

And for most companies, having too many customers coming to their website is a good problem to have. But if it does happen, you probably want those customers to have a good experience—rather than see a 503 or a "system down" message.

Jose
Well, I very much agree with that—but of course, I’m also biased.

We actually had Martin Jensen, our Head of Engineering, on another episode of the podcast as well, and this was one of the things we were talking about—exactly that idea of scaling, and how that can be complemented with the virtual waiting room.

Because yes, you can scale—at least you can scale some components. But then it really varies: can you scale all the components that are required in a user journey?

We’re talking about bottlenecks—as you mentioned, the database being one of the most famous culprits from that perspective. So, yeah, very much agree.

Yam
And also, you’re right—it’s the flip side of the coin. Just like someone might say, “I have load testing, so I don’t need to worry about anything else”—I think the reverse is also true.

If you have a queue, you should also load test. Because you don’t want to reach that point unnecessarily. You don’t want a situation where you say, “Okay, we support 5,000 users,” and then the first five go in and the rest are just stuck waiting in line.

You want to optimize your system beforehand. So you load test, you optimize your system to the max, and then you can rely on your backup plan—if needed. That’s kind of the best of both worlds.

Jose
We see that quite often as well. We often work with our customers on getting to this number—right? This idea of capacity. What should the flow be—how many users per minute do you want to let into your website? And sometimes that’s a technical infrastructure constraint, but sometimes it’s more of an inventory constraint.

But going back to the infrastructure constraint—do you have a specific way to define or get to the capacity of a web application? How do you help your customers determine that number? I don’t know if that’s based on concurrent users on the website… so how do you think about that? It’d be very interesting to hear.

Yam
Yeah, this is a very good question. I think it's also a matter of terminology, and it's something that's good to think about. As a customer, you need to be aware of this, because even though all these numbers seem to represent the same thing, "how much can I support," they often don't.

We use so many different metrics across the industry, and it's important to realize they're not the same. For example, if you say, "My website supports 100,000 users," maybe that's over a whole day. That's completely different from saying it supports 100,000 users concurrently, which could mean serving millions over the course of a day.

So there's a big difference in throughput. Whether all those users are actively doing something or mostly idle, those differences can make a big impact. So when you're working with these numbers, you need to ask yourself: am I comparing the same thing?

There’s no real rule of thumb to convert one metric to another. But the key is just being conscious of what you're measuring, and how you're interpreting it.

And for different purposes, you need different measurements.

In load testing, we usually talk about concurrent users—that's kind of our standard language. But you can't easily convert that into flow—how many users can I let in per minute—which in turn is not the same as throughput in megabytes per second, which is what your network supports.

You do need all these answers. Your network guy will say, "I don't care about users—I just want to know how much bandwidth I need."

The queue, on the other hand, is more about flow—how many users per minute.

But you can infer one from the other.

We usually start with virtual users. And the test will show you the throughput: this is how many users you ran, and you'll also see a graph of how much network traffic they generated.

If you know how long your user journey is—that’s a key factor. Every new user comes in, but are they staying for a minute, or for half an hour?

Some of our customers have very short sessions: you look at the website, see something, and close it. Others—like insurance companies—can have scenarios where a single user is in there for an hour. They grab resources the entire time.

So the key takeaway is: know that there are different metrics in play. And you need to think about them carefully.

Jose
And it's definitely—at least to my knowledge and experience—not always easy to translate from one to the other. But I think, as you're saying, it’s very important to know that you’re not talking about the same thing.

Yam
Right. So we don’t have a solution for you guys like, “This many users equals this many users per minute,” but we do say—don’t assume the numbers mean the same thing.

You might say, “I have 100 users in my database,” or “I have 100 concurrent users,” or “I want to get 100 users flowing into the system”—these are all different numbers.

So at the very least, be aware that you need to think about the conversion. Use some estimates, whatever data you can gather to get to a meaningful number.

Jose
Yeah, and that's a little bit of what we do in the initial discussions with customers. We talk about—okay, one thing is the number of users, but then it’s also: what is the average journey time for the user on your website, right?

And from that—it’s always an estimate, unless you’re lucky enough to have exact data—but from that you can get to the flow, to the number of users per minute. It’s at least a starting point.

In our world, it’s a lot about flow—how many users per minute we’re letting through from the queue, what we call outflow. But quite often when we’re talking with customers or prospects, they’re more used to talking about transactions per minute. And we need to be careful that we’re not comparing the two, right?

Because one outflow, one user coming out of the queue, can vary. Back to what you said—it depends on both the journey time and also the number of transactions that user performs.
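To put rough numbers on the conversion Jose describes, the back-of-the-envelope relationship is Little's Law: concurrent users are roughly the arrival rate multiplied by the average time a user spends in the system. The figures below are made up purely to show the arithmetic; real estimates should come from your own analytics or test data.

```python
# Little's Law: concurrency ~ arrival_rate * average_time_in_system.
# All numbers are invented examples, not benchmarks.

arrivals_per_minute = 200      # "flow": new users entering the site per minute
avg_session_minutes = 7.5      # how long a typical journey lasts

concurrent_users = arrivals_per_minute * avg_session_minutes
print(f"~{concurrent_users:.0f} concurrent users")        # ~1500

# Going the other way: how much flow can a system tested at
# 5,000 concurrent users sustain if sessions average 30 minutes?
max_concurrent = 5_000
session_minutes = 30
sustainable_flow = max_concurrent / session_minutes
print(f"~{sustainable_flow:.0f} new users per minute")    # ~167
```

Transactions per minute is yet another number: it also depends on how many requests a single journey makes, which is exactly why the two metrics should not be compared directly.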

Yam
Yeah, and like we said—you may have a different metric in hand, and that’s fine. But you just need to know, “All right, so what does that mean in terms of the other metrics?”

Jose
Yeah. Maybe you can take us through a real example, if a customer comes to you and says, “Okay, I have this big event coming up. I’m expecting a peak of traffic, a big surge in one month.”

Do you have a specific playbook or at least some guidance on how to start approaching that, to get the highest confidence that by the time the event happens, they’re ready?

Yam
So the first advice is: start earlier than you think. If you think, “Okay, it’s one month away,”—start earlier.

Because people often don’t think about the full process. They say, “All right, I’ll do the load testing, and then I’ll go live.” But that’s not really the process. The actual process is: you test, and then you find the bottlenecks. Then you go back to development and say, “We need to make some changes.” Then you change something—it could be code, or it could be something in the infrastructure. Then you retest.

So it’s a cycle—and you need to leave enough space for that.

Otherwise, just saying, "Here's my number. How many users can I support?" and stopping there isn't as beneficial. The real value comes from treating it as a process—you want to get to the optimal number.

So usually, it’s an iterative cycle: you test, refine, and test again. The more time you have, the more you can reap the benefits of it.

Jose
And I guess it can be, as you said, a long-running game with a lot of iterations, because you do the first testing, you find the first bottleneck. And then, I guess if you're lucky—or if you have the opportunity—you address that, you lift that bottleneck. But then another one will show up, and I guess you can follow that chain for quite a while, lifting a lot of the bottlenecks.

Yam
And the retesting part is super critical, because as we all know in load testing, you fix one thing, you break something else—which is now much worse. You think it’s going to get better, but it’s not always like that. So definitely retest.

Jose
Okay. That's an interesting view. So is it often that you lift a bottleneck and then actually end up creating another one that has lower throughput? And then in those cases, do you ever go, "Oh, let’s put the other bottleneck back"?

Yam
Yes, it could happen. Again, because real life isn't that straightforward. It's not like a bottleneck is a fixed limit. You change something in the code, and then it behaves differently—not always the way you expected. That's the thing. It might actually be worse, and you see that only after testing.

Jose
Yeah. You mentioned the database as one of the clear suspects, one of the usual ones when we talk about bottlenecks. If you were to name the top three or top five bottlenecks in complex systems—so let’s say I fix the database bottleneck—where would you expect the next bottlenecks to show up?

Yam
It’s hard for me to say—it really depends on how your system is built. I think the database is a more common bottleneck for big monolithic systems, where everyone depends on a single system. That’s where you're going to see it.

But now people are shifting to microservices, so maybe your application is built from many different small components. Sometimes the problem is the orchestration between them. Each one is doing its own thing—which is great—but then to do something complex, you need to combine all of them together. And that’s sometimes where things get more complicated. But every system is different.

Jose
And that orchestration component is interesting. We often see—and we’ve also talked about—some external dependencies being potential bottlenecks too, like payment gateways or third-party APIs. Does that match your experience?

Yam
It’s a very good question, and it’s a tougher one to grapple with. Usually you can’t just include those in load testing freely—your partners will want to know when you’re doing it.

Jose
Yeah, we know. We also like to know when our customers are running load tests.

Yam
Exactly. So yeah, don’t just hit everything external during testing without coordination. That’s not really recommended.

Jose
No, no, of course—assuming it’s coordinated.

Yam
Right, but even before assuming that, it’s something you really need to think about. Your first priority is you—so if you can avoid calling third-party services during your load test and focus on the parts of the system you can change, that’s ideal.

Usually, conversations with partners are different. But it’s important to think about, because it could be a performance bottleneck. You need to know who’s in your ecosystem and how their performance impacts yours. It often becomes a conversation about SLAs and what they can provide you.

And yes, if it’s a critical part of your infrastructure, you should coordinate your load testing with them. But in many cases, you can’t. For example, if you're using Microsoft authentication—like many people do—you don’t really have control over those big giants. You can’t exactly ask them to install new servers for your test.

So it’s more about awareness. It is a challenge.

Jose
And in practice, what would you say is the simplest way to address that? Is it with mock services that simulate the responses, or do you try to completely...?

Yam
Yeah, it depends on what the service is doing. Let’s say you have ads that are popping up—but if you don’t have them, they’re just not there—then probably the first thing I’d try is to ignore it. You can filter them out from the script and say, “I don’t need those ads in my test.”

Or if you have, like, a Facebook button or something like that—you can just skip it. So the easiest approach is often to skip it for your test.

But sometimes, like with authentication, that’s harder. That’s a bigger one to tackle. You can’t just skip it—the system won’t work if you’re not logged in. So you have to test it.

In that case, either you mock it—you set something up that says, “Yeah, I’m authenticated,” and then continue for the rest of the test—or you coordinate with the provider and test against the real thing.
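As one illustration of the mocking option Yam mentions (not a recommendation of a specific tool or pattern), a test environment can point the application at a tiny stub that always answers with a canned token, so the load scripts exercise the rest of the journey without touching the real identity provider. The endpoint path, port, and token format below are invented for the example.

```python
# Minimal stand-in for a third-party auth service during a load test.
# Hypothetical sketch: the endpoint, port, and token shape are made up,
# and this offers no real security. For test environments only.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

CANNED_TOKEN = {"access_token": "load-test-token", "token_type": "Bearer", "expires_in": 3600}

class FakeAuthHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path == "/oauth/token":
            body = json.dumps(CANNED_TOKEN).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the load-test console quiet

if __name__ == "__main__":
    # For the duration of the test, configure the application (or the scripts)
    # to call http://localhost:9000/oauth/token instead of the real provider.
    HTTPServer(("0.0.0.0", 9000), FakeAuthHandler).serve_forever()
```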

Jose
I think that’s a very good point—having the discussion about what exactly you’re load testing. That’s a conversation we also often have with our customers.

You can load test us, but what do you want to achieve with that? Do you want to load test and understand how your system handles the load—or how our system handles it? We actually have some load test results we can share to show how our system handles traffic.

And quite often, there are ways for our customers to test their own systems without needing us directly involved. So it’s interesting hearing you share that perspective—because I’m thinking about it from the other side, as a provider.

Yam
Yes, you’re definitely correct. And it comes back to what we said at the beginning—start simple and test individual components.

It’s a common mistake to test two completely different systems at the same time. If you can—and most of the time you can—avoid that.

Make sure you test your own stuff first. Know your bottlenecks, your scalability, know everything about your environment. Then you can see, “Okay, how does this fit with my other providers?”

But if you try to test everything together, you end up muddying the results. If your system slows down a little and the other component is also a little slower, then it becomes much harder to interpret. You see that it’s slow—but you don’t know why.

Jose
You’re adding too many variables to the equation, right?

Yam
Exactly, exactly. If you can, try to minimize that.

And here’s another thing—in our world, the world of testing—there’s never really an end to it. How many unit tests can you write? You can just keep adding more and more.

So with load testing, too, start with the basics. Eventually, maybe, you’ll run that big test that includes everything together—but only once you’re already comfortable with all the other components. That’s when you can say, “All right, let’s see how everything plays together.”

But if you start there, it’s almost impossible to know what’s next. You just get some numbers, and you’re stuck with them.

Jose
And from your perspective, is there anything you see changing these days in the performance testing world—or specifically at RadView? Does anything come to mind around how the technology or the work is shifting?

Yam
RadView, like the industry—and like the world—is experiencing this massive AI wave.

AI is definitely the biggest buzzword everywhere, and it’s impacting the world, the industry, and us. So yeah, that’s definitely the biggest thing happening.

We already have integration with OpenAI—ChatGPT specifically. So when we show you the results, we can now give you kind of an AI opinion on them. It’ll provide an explanation like, “Your bottleneck starts after X number of users—that’s what I’m seeing.” So that’s already live.

And I’m seeing a future with many other things becoming more automated and smarter, thanks to AI.

That’s one thing, but there are other shifts happening in the industry too—for example, the trend of breaking up larger applications into smaller microservices. So we’re seeing more and more API testing. Each of those brings a different flavor to testing.

Jose
Would you say API testing is easier to do than testing full user journeys?

Yam
It’s like we said—it’s easier because each API is more controlled. Each one is isolated, so it's manageable on its own. But with the actual user journey, there are so many of them, and the way they behave together becomes tricky.

In some cases, API testing is easier, because if there's a bottleneck, it's simpler to narrow it down to that specific component. Whereas if you're just looking at one big request and saying, "This whole thing takes X amount of time," it’s much harder to figure out what’s behind it.

Jose
And you mentioned AI—the use case you described was helping interpret results. But I can imagine another use case where it helps validate or explore the user journey on the website. Do you see potential there—in the future or even the present?

Yam
Yeah, we’re actually working on that now.

In general, there are still a lot of areas in load testing that require expertise. Even with all the tools we have, it’s still not that easy. We try to help—we automate more and more—but there’s still a level of expertise you need.

One example is understanding the results. You run the test, you see some graphs—and you might not know what they mean. That’s a problem. But it’s not just interpreting results. Setting up the user journey, validating it, comparing it—there are many places where users still need help.

And yes, I see a future where AI will take over more and more of these steps. Eventually, you might just say, “Tell me how good my website is,” and it will figure out the relevant scenarios, run the tests, interpret the results.

So yeah, all of that will probably become more and more automated over time.

Jose
And the user journey is something we also spend a lot of time looking at with our customers. If you're protecting your website or a specific page, and you don't consider all the entry points to that page or resource, then you might be protecting one part, but missing another. It's not as good as it should be, right? So how confident are we that we've covered the full user journey—I think that's a hard problem, but a very interesting one.

Good. And Yam, just to wrap up, we like to end with a few rapid-fire questions. Just three or four—short answers, whatever comes to mind, all right? I promise, no trick questions. So I’ll start: for you, scalability is...?

Yam
Yeah, it's the ability to grow. It’s not about how big you are—it's how fast you can grow. I’d focus on what's blocking you from growing. Can you keep growing? Not can I reach 100 or 1,000 users, but can I keep growing? It’s not a number, it's really a process. That’s how I see it.

Jose
Yeah. Second question: is there a resource—a book, a website, a blog, a specific person—you follow and would recommend to our listeners?

Yam
Find something that's interesting for you. I'd say don't focus too much on your exact niche—look at broader tech. Find somebody whose voice you enjoy. I speak Hebrew, so I wouldn't necessarily recommend my podcast, a Hebrew tech podcast, to everyone. But I think it gives a broader view of what's happening in the world—not just your narrow slice.

Jose
But you’re welcome to share the name of that podcast—maybe there are others who speak Hebrew and would be interested.

Yam
Yeah, it’s called Making Technology—in Hebrew.

Jose
Okay. In one recent episode, we also got a Danish podcast recommendation, so I think at some point we’ll cover all languages. That’s pretty good.

Yam
Yeah. And my advice isn’t that this specific podcast is so great you should translate it—it’s more about finding something that’s easy for you to absorb. Like easy reading. Instead of going deep into a very narrow subject, find something almost pop-culture-like, but that still gives you a good overview of what’s going on in tech. I think that’s really valuable.

Jose
Very good. Two more to go. What advice would you give to yourself early in your career—or to someone just starting out now?

Yam
Be serious about what you do. Take it seriously. Think of yourself as a professional, even at the beginning, even when you feel small. Don’t just think, “I’m a small cog in the system.” You make a difference. In tech, every developer, no matter how junior, is doing something meaningful. So see yourself that way—an important part of the system.

Jose
And final question: which technology are you excited about these days? And there's an asterisk—you’re not allowed to say AI. So anything else you’re looking forward to in the next few years?

Yam
Not the next, but the previous one—everything around cloud and containers.

On the flip side of AI, we still have to run things, and I think we now have really interesting tech that makes it easier to build databases that scale infinitely, and machines that orchestrate themselves.

This whole world of scalable infrastructure—cloud, containers, orchestration—it’s just so much easier now to scale things.

Even though we talk about infinite scalability, and technically you can ask Amazon for an infinite number of machines, we’ve seen—and this goes back to your very first question about scalability—that throwing more machines at the problem doesn’t always translate to more throughput. That’s why we talk about scalability.

If you build things correctly for scale, then yes, you can keep going. Still, the fact that you can summon machines on the fly—I still find that magical.

Jose
Awesome. I think talking about this magical world is a good way to wrap up. Thank you so much, Yam, for dropping by the podcast. Really appreciate it—it was great talking load testing and learning from your perspective.

Yam
Thank you again for hosting me. Thank you.

Jose
And that’s it for this episode of the Smooth Scaling Podcast. Thank you so much for listening. If you enjoyed it, consider subscribing and maybe share it with a friend or colleague.

If you want to send us any thoughts or feedback, write to us at smoothscaling@queue-it.com.

This podcast is researched by Joseph Thwaites, produced by Perseu Mandillo, and brought to you by Queue-it—your virtual waiting room partner.

I’m your host, Jose Quaresma. Until next time—keep it smooth, keep it scalable.

[This transcript was generated using AI and may contain errors.]

 
