Infrastructure as The Product: Designing Data-Heavy Systems with VP of Product Maria Petrova

Infrastructure is often treated as a backend concern, but in practice it shapes how users experience a product. In this episode of Smooth Scaling, Product VP Maria Petrova explores what it means when infrastructure becomes the product, looking at real-world, data-heavy systems where decisions around compute, data resolution, scheduling, regions, and cost directly impact scalability and user experience. The conversation dives into scaling beyond the MVP, balancing accuracy with performance, and why both engineers and product managers need to think carefully about infrastructure trade-offs when operating at scale.

Maria Petrova is a product leader known for scaling data-driven platforms and building high-performing product teams. With over a decade of experience across AdTech, eCommerce, and green tech, she’s led teams at Supermetrics, Zalando, Smartly.io, and now TWAICE, where she’s shaping AI-powered energy intelligence solutions. Maria is also the founder of Value Lab, a consultancy that embeds expert product talent into growing teams. She’s passionate about building products that truly solve customer problems at scale.

Episode transcript:

Jose

Hello and welcome to the Smooth Scaling Podcast, where we speak with industry experts to uncover how to design, build, and run resilient and scalable systems. I'm your host, José Quaresma, and today I had a great conversation with Maria Petrova, VP of Product at TWAICE and previously at Supermetrics. We discussed her hands-on experience with projects where infrastructure is the product, and how infrastructure can have a huge impact on the end product, including the user experience. If you like this podcast episode, please subscribe and leave a review. It really helps the podcast. Enjoy.

Hi, Maria. Welcome to the podcast.

Maria

Hello, hello.

Jose

So I would like to start straight on talking about this concept of infrastructure being the product. When you talk about that, can you tell us a little bit what does that mean to you? And also, I mean, if you can add some concrete examples there, that would be great.

Maria

So infrastructure is a product. And we know that there are big companies selling infrastructure as a product. Let's say AWS or GCP and all these offerings around how you can buy infrastructure, some infrastructure components as a service. And that's one story.

But what I want to talk about today a bit more is how infrastructure and infrastructure-related decisions, when it comes to system design and the tools you use, actually affect the product, even if infrastructure is not the core product of the company. Even if the company focuses on something else, say data products, analytics products, or whatever user experience they build, infrastructure is essential to power the right user journey. And it sometimes gets overlooked, because the teams in charge of it, like DevOps, are usually very, very far away from the people who make decisions about user experience, product look and feel, and how the product is exposed to users. These teams never speak, and sometimes that's okay, but sometimes it leads to suboptimal experiences and some learnings. So let's talk more about it today.

Jose

I like that concept. So maybe unwittingly, then, you're saying that DevOps engineers might end up being UX designers in a way, right? The work that they do has that kind of impact.

Maria

Yeah, and they sometimes won’t even consider that the things they work on and make decisions about, say which database to use and how that database performs, can affect how users will work with the tools that you provide.

Jose

Do you have any concrete examples that you can share on that? That would be, I think, a really good starting point.

Maria

Yeah, so maybe let's start with the products where this effect of infrastructure, DevOps, and the way you scale is actually very noticeable. Those are usually products that rely on huge amounts of data or huge amounts of usage, like requests per second or per minute.

One specific example would be if you run some big chunk of calculations for the user. The way you orchestrate this calculation—whether it’s one big job that runs every day, or a number of separate, smaller, targeted jobs that are executed on a certain schedule—might affect the user experience at the end of the day. There is a user somewhere sitting, waiting for the results of these calculations.

Let’s say you schedule the whole thing in a specific time zone, and usually by default it’s UTC, right? Something gets calculated at, let’s say, 1:00 AM UTC. But that means that in the Australian time zone, that would be a totally different time. And the user might wait for the data that they need, and it’s essential for their work, and they might only get it at the end of the working day.

So that’s just a very specific example of how this architecture decision, how we orchestrate the calculations for our tool, might affect the end user. And sometimes it comes down to optimizing the number of hours your team puts in. It can be a very, very attractive idea to put everything into one big job, because it’s easier to maintain than five different scheduled jobs spread across time. But on the other hand, what it results in is that somewhere, some person needs to talk to a very unhappy customer, because they don’t perceive the data that your application provides as fresh and timely.
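To make the time-zone point concrete, here is a minimal sketch of what a single nightly batch job scheduled at 1:00 AM UTC looks like on users' wall clocks in different regions. The date and the list of zones are illustrative, not from the episode:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A single nightly batch job scheduled at 01:00 UTC (illustrative date).
run_at_utc = datetime(2025, 1, 15, 1, 0, tzinfo=timezone.utc)

# The same instant rendered in the user's local time zone.
for tz_name in ["Europe/Berlin", "America/New_York", "Australia/Sydney"]:
    local = run_at_utc.astimezone(ZoneInfo(tz_name))
    print(tz_name, local.strftime("%Y-%m-%d %H:%M"))
```

A Sydney user sees the batch land around midday, which is exactly the "fresh data only at lunchtime" problem described above. One common response is to schedule per-region jobs rather than one global run.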

Jose

I see that when you were sharing that example, it did remind me of the batch jobs in the mainframe that usually run overnight and also have that impact. But of course now, with the time zones, that gets even more interesting, right? Because, yeah, as you said, if you’re working in Australia, you might have to wait maybe until lunchtime for those to run, or until the end of the day. So that’s a big impact on the user experience. It’s a good point.

Maria

Yeah, and usually when it comes to applications that are not that, let’s say, compute-heavy, we tend to forget that how we execute some performance-related aspects of the system might affect the whole user experience and how users interact with the tool.

Another example would be the performance of your database. For most solutions, databases provide good enough performance that nobody even notices any sort of slowness, a problem with the response rate, something like that. But again, when it comes to a higher level of calculations needed to provide the response for the end user, you might end up in trouble as somebody who develops a product.

And I think now, as we are moving into this whole AI era and compute all of a sudden starts to matter, it’s so funny. When I was at school, we had all these desktop computers where you could accelerate calculation speed if you wanted to run a video game. Then maybe for 20 years, nobody cared about compute or how performant the system was. And nowadays, it all of a sudden becomes very essential again, because all these AI-powered applications—especially large language models and everything on top of that—actually require compute, and performance matters again.

Jose

Maybe trying to get a few more examples—I think it’s always interesting to hear those. I understand that at TWAICE, was there any kind of specific infrastructure change or decision that you experienced bringing noticeable changes to the product itself?

Maria

Yeah, for sure. So maybe we should explain to the audience what TWAICE as a product is all about. TWAICE is the product that I’ve been working on for the last two years, and it’s a very interesting system. It helps to navigate the performance of big battery energy storage systems, or BESS. And this is a new way for people to get energy, right? We now use more and more of these green sources of energy.

One of the ways to do it is to power all these big energy and grid systems with battery storage. When energy is not that expensive, you can accumulate a little bit more into the batteries, and then when it’s time to consume, those battery storages power the grid.

The issue is that this is relatively new equipment that people use for that. This equipment requires a lot of monitoring and a lot of tuning. And for that, you need some sort of analytic solution that actually guides this whole maintenance operation process. And that’s what TWAICE does.

So if we put it on a very abstract level, there is a lot of sensor data about how the plant operates over time. We take this data in, we run some algorithms on top to predict if something might go wrong pretty soon or is already going wrong, and point whoever manages this big storage to it. And that means a lot of data.

Every second, some sensor tracks essential information like voltage, temperature, all the things that matter to evaluate the performance of the storage. And for us, it’s always a journey to find the balance between the number of signals we take in and the quality of information we provide back to the customer. Because the sky is the limit in terms of data resolution, the sky is the limit in terms of the volumes of things that you track, but then there is a huge cost factor attached to it.

In that sense, we’ve been on a journey to figure out what would be the right resolution, where we can still give a good calculation, a good prediction that the customer can trust and actually use to guide decision-making, versus the number of signals we use. And this is the dilemma that we are constantly fighting.

Of course, when we started, we were very open to accumulating as much data as possible and trying to process it as fast as possible. But then we had to develop some algorithms where we’re a little bit smarter in terms of what resolution actually makes sense, where additional data creates more noise than clarity, and making all these trade-offs. And as I said, it’s a journey, because we learn all the time. Like, all right, those sensors are actually very essential for us and we need to use them with a certain data resolution or data structure. And there are some sensors that we can probably not consume data from that frequently.
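The resolution trade-off described here can be sketched as a simple per-signal downsampling policy. Everything below is hypothetical: the signal names, the window sizes, and the averaging strategy are assumptions for illustration, not TWAICE's actual pipeline:

```python
# Hypothetical policy: essential signals keep a fine resolution,
# slow-moving ones are averaged over coarser windows (seconds).
RESOLUTION_SECONDS = {"voltage": 10, "temperature": 60}

def downsample(samples, window):
    """Average 1 Hz samples over fixed windows of `window` samples."""
    return [
        sum(samples[i:i + window]) / len(samples[i:i + window])
        for i in range(0, len(samples), window)
    ]

# Two minutes of simulated 1 Hz voltage readings.
readings = [3.7 + 0.001 * i for i in range(120)]
coarse = downsample(readings, RESOLUTION_SECONDS["voltage"])
print(len(readings), "->", len(coarse), "points")  # 120 -> 12 points
```

The point of the sketch is the shape of the decision: ingest cost scales with the number of retained points, so per-signal windows let you spend resolution where the algorithms actually need it.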

Jose

The algorithm that TWAICE is using, is it mostly around maintenance, or is there some component of the distribution of energy as well and where it should be saved? And sorry if it’s a very stupid question.

Maria

No, it’s not a stupid question. We do not touch energy distribution. At this point in time, it’s not in scope for the solution. We focus on a different story. We try to understand how equipment works in order to ensure that the equipment performs according to what was promised to the customer.

If you own an electric vehicle, you know there is a battery in place and the battery on the screen tells you, okay, at this current state of charge you can drive for that many kilometers. But depending on how long you’ve been using this electric vehicle, you might notice that you can drive fewer kilometers with the exact same state of charge. Or you can find yourself in a situation where, when it gets colder, as it does in the Nordics where we all live, you also have a problem with battery performance.

So what we try to do is understand which factors affect that and give a heads up to the customer, saying, okay, you might run into problems, you need to fix something, you need to tune something, plug some stuff in and out, connect a disconnected equipment module, all things like that.

Jose

And I definitely experienced that on my electric car, and now with the winter coming, yeah, that’s always an interesting problem.

Maria

Yeah, yeah. And ideally, maybe as humanity, 10 years from now we will get to a state where batteries are more stable. Because we of course want everyone to use more green energy, but so far it requires a bit of tuning. And I think analytics is essential on this whole journey of enablement and grid transition.

Jose

But are you also evaluating car batteries? I mean, I know that cars themselves can sometimes be used as kind of a battery in a grid system as well. Are those also part of your system in some cases, or is it more just the Powerwalls of the world and other batteries?

Maria

So the majority of our customers are currently small industrial customers with big storage systems. But the algorithms that we develop are quite agnostic, so they could be applied to end-user consumer solutions as well. It’s a question of where you want to focus and where you can make more difference with what you have.

For now, we decided to focus more on enterprise-level storage and enterprise-level customers. But the same logic, the same algorithms, could be applied to regular users as well.

Jose

Yeah, very interesting.

And so when we were talking about this idea of infrastructure being the product, you explained it from the lens of a DevOps engineer making decisions that might affect the user experience. But if we turn it the other way around, we could look from another perspective, which is the product manager, right? The PM.

From that side, I guess one could also say that it’s important for a PM to grasp some of those infrastructure concepts as well. So is there any kind of top three, top five architecture choices that you think a PM should grasp?

Maria

Yes, for sure. And there is always a debate—if you go on LinkedIn, there are multiple posts discussing how technical product managers should be, and even how technical product designers should be. Of course, it really depends on the product that you’re dealing with.

But I think it doesn’t hurt to keep in mind that some of the technical concepts and decision-making on the system design side might affect how your end product is received by the user at the end of the day.

I would say there are a number of things product managers need to know about so they can ask the right questions at the right time when decisions are being made. And also to bring the right information to the table to empower decision-making within the technical team.

For example, the questions could be around: do you believe that with the decision you’re making, we can support that user experience? And ideally, at that point, you open up some sort of tool—be it a Miro board or a Figma prototype—and explain, “Hey, there is a user, and they expect, for example, to receive a certain notification or a certain report at a certain point in time.” Basically describing the target picture. That’s really important.

Then asking a question like, “Okay, this is my target picture, with all the complexity that we have. What needs to be in place to actually power this experience?” Do we believe that, for example, MongoDB can handle the number of users that we want to introduce to our tool? Or should we go with something more reliable? Maybe a different database would be the best choice for us.

If we move into more of an enterprise solution space, another question is: do you believe that with this choice of tools on the DevOps side, we can still be compliant? Because if you’re building enterprise solutions, that’s a question your customers usually ask. There are a lot of newly developed or open-source tools that might not yet be compliant for certain customer use cases.

So first, figuring out if the solution is performant enough to deliver the right customer experience. Second, if you’re dealing with B2B enterprise software, whether it is compliant. And then the third question: how do we scale?

We might be in a situation where we’re just pushing something to market, it’s a new product, and the customer base is small. There’s no immediate need to scale. But if we believe there is potential and we invest in marketing, there will be more users. How do we scale the system? Do we have enough capacity? Have we considered different options?

The fourth question brings us back to where we started this conversation: time zones and regions. There are many questions nowadays related to where you store the data and how you handle it. And again, where you perform the calculations might affect the speed of delivery of the solution to your customer.

So let’s talk about regions. Where do we host the solution? Where do we host customer data? How do we run all these queries? That also affects the overall setup we’re delivering.

For example, when I was working at Supermetrics, another data-powered solution, it’s essentially a data pipeline for digital advertising. By nature, it handles a lot of consumer and customer data, which comes with many regulations, like GDPR in Europe, and other requirements around data handling. Whoever handles this data often has strong opinions about how it should be managed—even where it’s stored.

One interesting challenge we had at Supermetrics was that if a customer was already an AWS customer, they preferred that whoever handled their data was also running on AWS. If they were a GCP customer, they preferred everything to stay within GCP, and similarly with Azure and others.

So how do you approach that? How do you become multi-cloud or multi-storage if needed? It feels like a very technical question, and you might initially approach it from a cost or efficiency perspective. But at the same time, there’s a very important customer perspective that a product manager needs to be aware of and bring to the table when those decisions are being made.
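One very reduced way to picture the multi-cloud question is a routing table that places a customer's workload in the cloud where their data already lives. The deployment names and the fallback choice below are made up for illustration; real placement decisions also involve compliance, capacity, and cost:

```python
# Hypothetical mapping from a customer's cloud preference to a deployment.
# Running where the customer's data lives avoids cross-cloud egress cost.
DEPLOYMENTS = {
    "aws": "aws-eu-west-1",
    "gcp": "gcp-europe-west4",
    "azure": "azure-westeurope",
}

def pick_deployment(customer_cloud, default="gcp"):
    # Fall back to a default when the customer has no stated preference.
    return DEPLOYMENTS.get(customer_cloud, DEPLOYMENTS[default])

print(pick_deployment("aws"))  # aws-eu-west-1
print(pick_deployment(None))   # gcp-europe-west4
```

Even this toy version shows why the decision is a product question as much as an infrastructure one: the lookup key is a customer preference, not an engineering parameter.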

Jose

And in Supermetrics, right, I understand that the fact that it’s so data-heavy, and as you’re saying, the different expectations from customers on which cloud to run on and all that, must have made it quite complex to work on. Do you have any examples of those discussions and how you went about decisions on where, in which cloud, to run it and how to handle such requirements?

Maria

Yeah, so actually we had to do a lot of user research. One thing that goes beyond product management and infrastructure-related engineering tasks is the factor of cost and how this whole thing affects the business.

If you look at the cost structure of a modern tech company, usually the first line item is people-related expenses, all the investment companies make in talent. And the second line item is infrastructure-related expenses, all the investment into compute and related things. So there are a lot of things to balance when you think about the right design for your infrastructure.

If I go back to my time at Supermetrics, without sharing non-public details, it was really important to understand how we could empower the best user experience. For example, there is one place where the data is handled, and another place where the data is analyzed. And the majority of customers, when it comes to data analytics, use a lot of publicly available tools, either within the Google or Microsoft ecosystem.

To name a few: Google Sheets, Looker Studio, and other solutions powered by Google, or Excel and Power BI powered by Microsoft. At the end of the day, we wanted people to be empowered and not sitting there pouring coffee and waiting for numbers to appear in a report, but actually working in these tools and getting insights as fast as possible.

So the question was: how can we make sure that whatever we need to deliver is delivered to that user interface as fast as possible? That led us to consider a multi-cloud approach and to figure out what would be the best place for a specific user to run their data operations. It’s actually a very interesting problem to solve, and I think more and more companies are facing it for various reasons.

One reason is to design the system in a way that’s most convenient for the customer. Another is compliance—some customers strongly prefer one cloud over another. So I see more and more multi-cloud solutions around me. Of course, big cloud providers will push you to be loyal and stick to one, but you need to think about what’s best for your business and what’s best for your user.

Jose

Yeah, and I think there’s also the geopolitical reality these days that exacerbates the need to be multi-cloud and able to cater for different requirements and wishes.

Maria

Yeah. And maybe for people building enterprise solutions who are listening to us, it’s also worth noting that if a customer handles their data in one cloud and your system runs in another, there will be an attached cost for moving data from the customer’s cloud to whichever solution you run. So again, considering a multi-cloud approach might be more cost-efficient in the long run.

Jose

You mentioned the cost-efficient side of things, and you’ve mentioned compute a few times. I’d actually like to go back to TWAICE and the work you’re doing there, because I understand that some of the interesting work you’ve been doing is around reducing cost.

You talked about the amount of data you’re getting from all the sensors, and as I understand, one of the challenges was figuring out how to get the right amount of data to compute, while at the same time reducing compute cost, but still giving users the right insights to help them.

Can you tell us a little bit about the key changes you made there, looking at reducing cost?

Maria

Yeah, so basically I think it’s a bit of a typical story when companies start to operate at scale. Usually, when you’re a tech startup, you begin with an MVP. What you’re trying to build is a proof of concept, just to justify that you can solve a pressing customer problem and that what you’ve built can actually produce a meaningful outcome.

But then, if you have enough customers and enough data to work with, the question of scale comes in. And it hits you differently depending on the specifics of your solution. In the case of TWAICE, we started by building this intelligence layer, by building the algorithms in the first place and investing very strongly in data science.

And if we have some listeners who are data scientists, this is not saying anything bad about them: they usually work in data notebooks, whether it’s Jupyter with pandas, Databricks, or something similar. Those tools are, by nature, built for ad hoc analysis. You get a lot of data, you spend a couple of days, you get your results. But it’s not the easiest thing to run the same algorithms over and over again in a repetitive and scalable way.

So for us, it’s been a big journey to figure out how we can optimize all the algorithms without losing quality or accuracy, while making sure they’re performant in a new environment where we have quite a few customers and quite a lot of data to work with. The first thing we did was optimize the calculations themselves—moving from more of a data science proof-of-concept approach to writing a robust calculation layer that can be applied at scale.

Jose

And when doing that, were there any notable trade-offs when you started making that change?

Maria

Yeah. So for us, it wasn’t really a trade-off in the classic sense. Accuracy—making sure we have the best possible calculations—is a hard requirement. But at the same time, you want to keep that accuracy intact and somehow figure out how to do it with less time and less computation.

That meant iterating on the algorithms quite a lot. Another challenge—again, not really a trade-off, but more a creative engineering challenge—was figuring out how you can sometimes decrease the amount of data or the resolution and still deliver meaningful results. And that’s where engineers really partner with user researchers, product managers, and product designers.

Sometimes it’s about understanding that something doesn’t need to be calculated every five minutes. In some cases, it does, but in others, it doesn’t. That’s how you free up capacity for things that actually need it.

I can give you an example from the battery world. Going back to the battery in your car, there’s a measurement that describes the state of the battery quite well, which is the state of health. Without going into too much detail, state of health tells you how old your battery is, how far it is from the initial nameplate capacity when it was first deployed, for example in your car.

This measurement doesn’t change every day. The battery degrades, but it degrades very slowly if everything is fine. Nobody needs this calculation every day because it won’t change that fast. You might need to update it weekly, or in some cases monthly.

So then you start to think: how do I orchestrate this whole calculation layer, which takes a lot of energy, capacity, and money, in a smarter way? How can I make sure that the things that don’t need to run frequently are calculated less often, while the things that are essential for operations every hour or every few minutes get more emphasis and are delivered in a timely manner?

That was a really good example where we spent time understanding usage patterns and the user journey, and based on that, optimized the system itself.
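The tiered-cadence idea, run slow-moving metrics like state of health rarely and operational checks often, can be sketched as a tiny scheduler. The metric names, cadences, and timestamps here are invented for illustration, not taken from TWAICE's system:

```python
from datetime import datetime, timedelta

# Hypothetical cadences: state of health changes slowly, so weekly is
# enough; operational alerting needs a run every few minutes.
CADENCE = {
    "state_of_health": timedelta(days=7),
    "temperature_alerting": timedelta(minutes=5),
}

def due_jobs(last_run, now):
    """Return the metrics whose cadence has elapsed since their last run."""
    return [m for m, period in CADENCE.items() if now - last_run[m] >= period]

now = datetime(2025, 6, 1, 12, 0)
last_run = {
    "state_of_health": now - timedelta(days=2),          # ran recently: skip
    "temperature_alerting": now - timedelta(minutes=7),  # overdue: run
}
print(due_jobs(last_run, now))  # ['temperature_alerting']
```

The compute saved by skipping the weekly metric on most ticks is exactly the capacity freed up for the calculations that do need to run every few minutes.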

Jose

And I guess that’s a bit like what we see with the batteries on our mobile phones as well, right? You can also see the state of health there.

Maria

Yeah. And again, if you don’t do anything crazy with your phone, the battery health shouldn’t change every day. And if it does, then you should probably get it checked, maybe even for safety reasons.

Jose

And before we go into the rapid-fire questions we usually wrap up with, do you have a set of rules for building data as a product? Is there anything you could share with us—some main principles you follow when doing that?

Maria

Well, very interesting question. I haven’t really thought about them as rules, more like guidelines or principles. One thing with data—when you build something based on data—is that you can make it as complex, or as some of my colleagues used to say, as clever as you want. But you always need to strive for a balance where you deliver clarity first, and context and details later.

So the main principle is: how can we make it simple? This is something I constantly bring to the table. Teams that work with me probably hate me for it, but I always ask, can we make it simpler? That would be the first thing I check. And it goes across everything: user experience, how people see screens, how they interact with them. It needs to be simple and straightforward.

But also, the simpler and more straightforward the system that powers it is, the easier it is to maintain and to change. So simplify as much as you can. That would be my main guiding principle.

The second one is remembering the user and the user experience in terms of regions and time zones. I even had this as a favorite interview question when I was hiring engineers some years ago: can you design a simple appointment booking and calendaring app? That’s where all the complexity comes in.

When we talk about data products, their main purpose is usually to inform and provide insights. Those insights need to be timely. So how you orchestrate schedules—when you pull data, when you calculate—can really affect the end-user experience, decision-making, and how much value users get from the product.

And then there’s a third principle, which might sound unusual in 2025, but when it comes to data and analytics products, pictures still speak a thousand words. Sometimes people go deep into tables or written narratives, but explaining something with a simple visualization often delivers better results and a better user experience.

Jose

So let’s go into the rapid-fire questions now. I think I only have three for you. Don’t overthink them—just share what comes to mind. Is there any book, podcast, or person you follow that you would recommend to our audience?

Maria

I’m very basic on that front, but I would definitely recommend everyone who is in product, or interested in product management, to listen to Lenny’s Podcast from California. I think it’s a great one.

In terms of books, when people ask me how to break into product management, I always recommend Inspired by Marty Cagan. I think it’s something every product manager should read.

There are also many other sources out there. My advice would be: don’t follow anything blindly. Try different approaches and see what works for your specific situation. There’s a lot of content online that says, “Do it this way or you’re doomed.” But from my more than 15 years in tech and product management, I’ve learned that every company is different, every business situation is different. There’s no ready-made recipe.

Jose

I love that. That’s a great insight, because often when we look at other companies, we only see the end result of years of work. There’s a lot underneath that we don’t see. If you just copy the surface, that can be a recipe for disaster.

You may have already answered this, but what professional advice would you give to your younger self?

Maria

When it comes to advice for my younger self—and maybe especially for women listening to this—don’t be afraid to be the only woman in the room. That happened to me a lot, especially at the beginning of my career.

And don’t be afraid to challenge people with questions. When you’re a junior product manager in a room full of DevOps engineers who are very serious and don’t always take your challenges seriously, just keep going. They will listen to you. Just make sure you bring valuable information to the table.

Jose

Thank you. I’ll definitely pass that advice on to my three-year-old daughter. I don’t know what she’ll be doing in 20 years, but I hope that advice still holds.

Last question: to you, scalability is…?

Maria

To me, scalability is flexibility. To be able to scale any system, you need to think from the start about how flexible it is. It’s really hard, especially with greenfield products, to anticipate all future use cases or problems. But what you can do is make the system flexible enough to adapt and build on top of it. That’s what enables scalability.

Jose

I think that’s a wonderful way to wrap up the episode. It’s always great to hear what scalability means to different people. Thank you so much for joining us.

Maria

Thank you for inviting me.

Jose

And that’s it for this episode of the Smooth Scaling Podcast. Thank you so much for listening. If you enjoyed it, consider subscribing and sharing it with a friend or colleague. If you’d like to share thoughts or comments, send them to smoothscaling@queue-it.com.

This podcast is researched by Joseph Thwaites, produced by Perseu Mandillo, and brought to you by Queue-it, your virtual waiting room partner. I’m your host, José Quaresma. Until next time, keep it smooth, keep it scalable.


[This transcript was generated using AI and may contain errors.]
