Making Databases Easier and Invisible

Tobias Ternström leads a team that helps enterprises manage databases with less effort and risk across hybrid multicloud environments.

By Jason Lopez October 20, 2022

Databases store digital information in ways that make it easy to find and use, but the rapid proliferation of databases is forcing database administrators (DBAs) to juggle a variety of formats, features, engines and applications.

Provisioning, patching, cloning, and copy data management are just some of the many labor-intensive tasks that DBAs deal with on any given day. New innovations and offerings such as database-as-a-service (DBaaS) are helping, but data creation is growing at a phenomenal rate. Industry analysts believe over 2.5 quintillion bytes of data are created every single day and expect worldwide data to more than double over the next few years.

Podcast


In this Tech Barometer podcast segment, Tobias Ternström talks about the kinds of database-centered issues technology leaders are grappling with and shares how he learned to master the nuances of intricate database ecosystems. Then he points out new innovations that are helping databases to become almost invisible.

Ternström is the vice president of engineering and product at Nutanix, whose software helps enterprises build hybrid multicloud IT operations that support changing business needs, including the development of new applications. He leads the team behind Nutanix Database Service, which helps simplify database management at scale across hybrid multicloud and supports developer productivity.

Ternström grew up in Stockholm, Sweden, where his techie dad introduced him to computers. Bitten by the PC bug since boyhood, he started his career in the 1990s dot-com boom, coding with SQL databases for a tech startup. The startup was self-funded, so Ternström taught SQL database classes to pay for rent and groceries.

He delved deeper and deeper into the nuances of database engines, formats and architectures. He saw that as computers could do more things, people demanded more from databases.

“I see things in terms of databases,” he said. “If I’m at the airport check-in and something goes wrong, my mind fixates on what could be going wrong with the airline database.”

Bringing Order to Database Ecosystems

Ternström said the companies (customers) he works with have straightforward objectives.

“They want to reduce the risk, effort and cost of operating hundreds to thousands of databases,” he said. “It’s critical to step back, focus on the right strategy, build from there and then optimize.”

Sure, it’d be great to have just one database to manage, but Ternström said that’s impractical because every application uses data in different ways. Moreover, keeping everything in a central database slows development, hampers scale and increases risk.

Nobody has one database (or even a dozen) anymore.

“They have hundreds, to thousands, to tens of thousands of databases,” he added.

Rein In Databases With Hybrid Multicloud Systems

He talks about automation and finding the best balance of risk and reward. His team has a lot to figure out because every automation capability must work across multiple database engines, operating systems and IT infrastructures. But all of that work is paying off, as DBAs and developers no longer need to master all the intricacies of databases, including backup, security, high availability, etc. They can automate tasks by partnering with a database-as-a-service (DBaaS) provider whose APIs let them plug fully functioning DBs into their apps.

“Nutanix Database Service gives you full control with the highest level of automation across the most popular databases, wherever they are running.”

Learn more about Nutanix Database Solutions in this video.

Transcript (unedited)

Tobias Ternström: You will start in the public cloud because the public cloud is very cost-effective for smaller workloads; even though they generally have a higher price per compute, if you will, per compute second, it still makes a lot of sense to start in the public cloud. Those most critical databases would be the last that you automate and probably also the databases where you're more selective into what kind of automation you use. You have a team of experts. You want those DBAs to focus on literally the top handful of databases. So let's say you have five to 10 out of your 10,000 databases. Those are super important. That's where your DBA team should spend the absolute majority of their time.

Jason Lopez: That’s Tobias Ternström. He runs engineering for database workloads at Nutanix. This is the Tech Barometer podcast, I’m Jason Lopez. What you’re about to hear are his thoughts on databases in the 2020s. So what makes his views important? He’s worked at Microsoft on Azure, at Google on cloud databases, and at AWS on Amazon Aurora. In this podcast, he covers topics like the explosion in databases, databases on hybrid multi-clouds, automation and careers in databases. Tobias emphasizes that a company’s databases are critical properties and thus company leaders and decision-makers, who may not have specific expertise in the technology, should absorb what they can about database engines. That’s one of the aims of this podcast, to provide insight. We started off the conversation on the kinds of databases out in the marketplace.

Tobias Ternström: So generally there are two types of databases in the world, as far as I see it. It's kind of simplistic, but I think it makes a lot of sense. You optimize for one of two things, generally. Either, when you build a database, you optimize for features, so the database can do lots of things. A good example here would be a relational database like PostgreSQL or a document database like MongoDB. Both, I would argue, are optimized for features. You can do joins. You can do lots of different types of queries. They have query optimizers. You don't have to think about exactly which index to use; the database kind of figures it out for you. But the problem is that a query, when you submit it to a database, is a little program. So you never know exactly how much time the query will take to run and how many machines, if you use multiple machines for the data, will need to be involved. So it's hard to get predictable performance, especially predictable write performance. So you have to be very careful with how you write your app. The other type of database is basically a database that's focused on predictable write performance at scale. Examples there would be Cassandra, or DynamoDB in AWS, or Bigtable in GCP. So you have to basically pick between these two. I would hope that over time, it becomes easier when you write an app to use a single API, independent of whether you want to use the feature-rich side of things or the write-scale-out side of things. Obviously, it needs to be clear to the app, because otherwise you kind of break the premise of feature-rich versus predictable write performance at scale. But right now I think it's fairly iffy as a developer, because you're building an app and you use multiple database engines. You use MySQL or PostgreSQL, and use, you know, Redis for cache, and maybe use Elastic for search, maybe use Mongo for documents.
There are a bunch of APIs to use. And your data gets kind of siloed across all of these APIs. And you also have to become an expert or learn all of these APIs. And that obviously slows down development. So it would be nice if we went towards a path where even if you have optimized database engines, you can have a simple API experience, independent of the backing technology.

Jason Lopez: Let’s add a dimension to this. What’s your sense of how the movement to hybrid and multi-cloud systems is affecting how people manage databases?

Tobias Ternström: Yeah, sure. I think it's important. When we talk about hybrid multi-cloud, or you talk about cloud generally, you talk about this movement of data and applications from on-premises to the public cloud. But I think that's really a misnomer. That's mainly because cloud is obviously newer than on-premises. But really, what I think we'll see more and more, and we already see, is you will start in the public cloud, because the public cloud is very cost-effective for smaller workloads. Even though they generally have a higher price per compute, if you will, per compute second, it still makes a lot of sense to start in the public cloud. Now, that could be because you're building a new app, or because you have something existing and you now want to launch in a new region. And obviously, in that region, you'll start in the public cloud. But as you gain more and more momentum, [moving] makes sense. If you have a very bursty application that, on average, for long periods of time, is not using a lot of resources, public cloud will be very, very cost-effective. But for workloads that are more steady state over time, as they grow, you'll find that it's more cost-effective to run them in something like a colo. And then, as it keeps growing, it starts making more and more sense to run them on-premises. Now, provided you can build on something that gives you the same cloud operating model, if you will, so I can get the same database automation in the colo and in the public cloud and on-premises, then I can scale and I can move my app as I evolve, if you will. That helps me basically run as cost-effectively as possible. So you want to make sure that both your application and your database are movable between these environments, either because you're moving between regions, right, and expanding a footprint between regions, or because you're growing your workload. And over time, you may find that it's more cost-effective to run it on your own servers, in a colo or on-premises.
You know, often, when you talk about cloud, you say, well, you know, no one runs their own power plant at home. You get the electricity through the power grid. But now, with solar, more and more folks are saving quite a lot by not using the utility, but instead using a combination of solar and batteries, which is very much akin to going from the public cloud towards on-premises. It's the same analogy as renting versus buying your own home. Over time, buying your own home is going to be more cost-effective than renting.

Think Global, Act Local: Controlling Distributed Cloud Databases

Jason Lopez: Has this changed how you look at or design databases?

Tobias Ternström: Yeah, I think it changes how lots of people look at it. Now, the nice thing is, if you start in the public cloud, which is also a natural way, if you will, you generally want to start using these slightly higher-level services, like a database as a service. And if you build on that, that also makes your app generally more portable, provided the database service can be found elsewhere. So if you build on a database service that's completely proprietary to a certain cloud, you obviously will have higher switching costs. It's going to be trickier to move. I think most people that have been in IT have experienced high switching costs, especially related to databases. You obviously want to pick a technology where you can find the same API, if you will, with a different vendor. So by building on something like PostgreSQL, for example, an open source database, even if I use a certain service for that, as long as I know that that same API with similar performance characteristics is available somewhere else, I'm fairly portable. And sometimes you hear people talk about failing over between, I don't know, Azure and GCP, or Azure and AWS. Obviously, there are apps where you want that type of availability, but those are extremely few and far between, where you say, I need to move in literally seconds here. It's more: can I move in a matter of days or weeks without a lot of risk? And that's really how you want to think about it.

Jason Lopez: Tobias Ternström grew up in Sweden. As a kid, he would accompany his dad to work at offices in Stockholm where in the basement were some older large computers and mainframes. These were the kinds of toys that would hook him on information technology.

Tobias Ternström: I started writing code in the eighties and nineties, but that was on a PC.

Jason Lopez: Well, what is it that led you to databases?

Tobias Ternström: Databases were completely by accident. One of the things was databases and using SQL Server. So I used to basically code in the evening and teach SQL Server during the daytime. And it was mostly because we had a startup and we were self-funded, and in order to basically make revenue while we were building applications, we taught. There was a lot of demand for databases, so they asked me to go and teach, basically, development and administration of SQL Server. So I started on that.

Jason Lopez: Well, to start simply, can you tell us, what is a database?

Tobias Ternström: Well, databases have been around for literally thousands of years. They just used to be written down. Look at the index of a book, for example. That's a type of database, right? You look in the back and you find out which page contains whatever it is you're looking for. So the first big thing that happened was obviously going digital. I think it was Bank of America, in the sixties, if I don't misremember, that first digitized banking. And from that, databases have gotten into absolutely everything. And there have obviously been lots and lots of revolutions along the way. One thing that's remained fairly constant: since, let's say, the seventies, when they developed the relational database model, it's still, you know, going strong today. So relational databases are mainly what the world runs on. As well, there was a big thing that happened at the end of the 2000s and the beginning of the 2010s, where NoSQL databases really became more and more used, which basically strayed away from the relational model, mainly for reasons of scale.

Jason Lopez: Well, let's talk about managing the explosion of databases. How did we get where we are now with thousands -- tens of thousands -- more databases than we used to have?

Tobias Ternström: You know, there is the famous quote that said, I see a market of, you know, a couple of computers in the world. And after that, obviously, you know, computers are everywhere. It's the same with databases. I'm sure most companies, if you go back a couple of decades, said, oh, you know, it would be great to just have one database. We have one database, and we have all of our information in it. We can assign permissions and let people query. But it turns out that trying to centrally plan this is very, very difficult, and it slows down development. It started with, you should have just one database. Well, then it becomes, oh, I need to add a table to this database. Do we have space for this table? How much compute power does the table need? And so on and so forth. And at some point, you realize that it's going to be very, very complex to try to keep everything in a single database. And back then, obviously, a single database meant a single computer. And for the most part, for most databases out there today, it tends to be a single writer, so a single computer handles the writes for the database. So the next step was, well, we should use just one database engine, at least. So we standardized on something like Oracle or Informix or SQL Server. But that also turned out to be tough, because as you're building applications, different applications have different requirements. It may be around cost. It may be around scale. It may be around features. So you start saying, well, if I'm building this application, maybe I should use a different database. Today, it's definitely very different, because you generally don't start from scratch when you build a new application. You generally start from some application framework, some app that has already been built. Obviously, there are lots of ISV apps out there that folks build on, or you take an open source project that's very popular, like WordPress.
So if you're building on WordPress as your basis, you're going to use MySQL, because that's what WordPress uses. WordPress supports other databases also, but by and large, for a very high percentage, it's MySQL. It's gone from, let's try to keep it to one, and then you realize you can't; then, let's try to stick to one database engine, and then you realize you can't. And then basically, as applications evolve, or even now as microservices evolve, you'll find that a single microservice may use one database, where another microservice in the same app, if you will, uses a different database.

Jason Lopez: As you’re now dealing with more than one database engine, what's the most sensible way to deal with the quantities and varieties of databases?

Tobias Ternström: Let's say you have 10,000 databases in your organization. Some of them are for development. Some of them are for test. Lots of them are production. First of all, all databases aren't equal. If you want all of these databases to have the maximum number of nines and performance and so on, you can obviously do that, but at a cost. At the end of the day, you want first to decide which of these databases are absolutely critical for the organization. Those you invest in one way, versus other databases that may be important, but not mission-critical, which you invest in a different way. And generally, what you want to do is, as you have a team of experts, of DBAs, you want those DBAs to focus on literally the top handful of databases. So let's say you have five to 10 out of your 10,000 databases. Those are super important. That's where your DBA team should spend the absolute majority of their time: optimizing, doing database design, the architecture of this. And the rest of all of these databases, you want to automate, to make sure that they are managed automatically, including patching, configuring disaster recovery, high availability, and so on. That's, at the end of the day, how you scale it.
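
The triage Ternström describes here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the criticality scores, the database names and the `dba_managed_count` cutoff are hypothetical, and nothing here reflects a Nutanix API.

```python
from typing import Dict, List, Tuple


def triage(
    criticality: Dict[str, int], dba_managed_count: int = 10
) -> Tuple[List[str], List[str]]:
    """Split an inventory into a DBA-managed top tier and an automated rest.

    `criticality` maps a database name to a hypothetical business-criticality
    score (higher means more critical).
    """
    ranked = sorted(criticality, key=lambda name: criticality[name], reverse=True)
    return ranked[:dba_managed_count], ranked[dba_managed_count:]


# Hypothetical inventory: name -> criticality score.
inventory = {"orders": 95, "billing": 90, "wiki": 10, "ci-test": 5, "dev-sandbox": 1}
dba_managed, automated = triage(inventory, dba_managed_count=2)
print(dba_managed)  # ['orders', 'billing']
print(automated)    # ['wiki', 'ci-test', 'dev-sandbox']
```

In practice the hard part is agreeing on the scores, not the split itself; the code only makes the "top handful versus everything else" idea concrete.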

Jason Lopez: That’s interesting, the idea of focusing on the top databases. Is the profusion of databases making life difficult for companies? Or, even, database administrators?

Tobias Ternström: Well, the profusion of databases is really driven by applications and developers. Developers are trying to solve a problem. They want to use the best tool for the job, and that applies to databases as well. So they pick the right database for the job. Obviously, because of this explosion of apps and microservices, and using the right database for the job, organizations just run more and more databases, and they run not just one or two, but multiple database engines. And at the end of the day, the tricky thing becomes, how do I make sure that all of these databases are configured optimally, are being patched, are being backed up, and are configured correctly for high availability and disaster recovery? And having, for example, a DBA team that's an expert in each of these database engines and can handle that at scale becomes very, very tricky.

Jason Lopez: The next part of our conversation turned to database automation. Automating frees people up to focus on higher-level tasks. Most database administrators would prefer to work on database design, database architecture, and database performance tuning, rather than patching or configuring high availability or configuring disaster recovery.

Tobias Ternström: The thing that I think hits people, and I know lots of folks have built automation scripts and solutions themselves, is that it becomes difficult to scale it. First of all, the devil is in the details. So there are lots of things, as you automate, where you realize there are special cases all over the place. What if this particular thing happened? Well, my automation script dealt with case A, but it didn't cater to case B. Then I'm in trouble.

Jason Lopez: So are you saying that if you have one thing that's wrong in the database automation it can render the whole thing unusable?

Tobias Ternström: Yeah, absolutely. With automation, at the end of the day, you're automating something, and that something is taking an action. It may issue a backup. It may create an index in the database. It does something. Whatever that action is, it can fail. So then what should you do if that action fails? And do you need to roll back part of the action, right? The action may have multiple steps. Those steps may not be possible to maintain in a database transaction, for example. So how do you walk back the steps if it fails halfway through, if you will? If it didn't have the intended consequences, for example, then how do you deal with that?
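
One common way to handle the "walk back the steps" problem is to pair every step with an undo action and, on failure, run the undo actions of the completed steps in reverse order. The sketch below is a minimal illustration of that pattern, not a description of how any particular product works; the step names and the `run_with_rollback` helper are hypothetical.

```python
from typing import Callable, List, Tuple

# A step is (name, action, undo). Both callables take no arguments.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_rollback(steps: List[Step]) -> List[str]:
    """Run steps in order; if one fails, undo the completed ones in reverse."""
    log: List[str] = []
    done: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            for undo_name, _, undo in reversed(done):
                undo()
                log.append(f"undone: {undo_name}")
            return log
        done.append(step)
        log.append(f"done: {name}")
    return log


def failing_patch() -> None:
    raise RuntimeError("patch failed")


# Hypothetical patching workflow in which the third step fails.
steps: List[Step] = [
    ("take_snapshot", lambda: None, lambda: None),
    ("stop_database", lambda: None, lambda: None),
    ("apply_patch", failing_patch, lambda: None),
]
for line in run_with_rollback(steps):
    print(line)
```

The sketch deliberately ignores the case where an undo action itself fails, which is exactly the kind of special case that makes home-grown automation hard to scale.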

Jason Lopez: Hmm, interesting. It's kind of like Christmas lights.

Tobias Ternström: Yeah, that's exactly right. And so that's part of the problem. Then you get into the, well, I run multiple database engines and I run multiple versions of each database engine, and I may also run them on different operating systems and that quickly expands the testing matrix and the issues that you can run into.
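
To see how quickly that testing matrix grows, here is a small sketch that enumerates every engine, version and operating system combination. The engines, versions and operating systems listed are illustrative assumptions, and in reality not every combination is valid, so the count is an upper bound.

```python
from itertools import product
from typing import Dict, List, Tuple


def build_matrix(
    engine_versions: Dict[str, List[str]], operating_systems: List[str]
) -> List[Tuple[str, str, str]]:
    """Return every (engine, version, operating system) combination."""
    return [
        (engine, version, os_name)
        for engine, versions in engine_versions.items()
        for version, os_name in product(versions, operating_systems)
    ]


# Illustrative inventory only.
engine_versions = {
    "PostgreSQL": ["13", "14", "15"],
    "MySQL": ["5.7", "8.0"],
    "Oracle": ["19c"],
}
operating_systems = ["Linux", "Windows"]

matrix = build_matrix(engine_versions, operating_systems)
print(len(matrix))  # 12 combinations: 6 engine versions x 2 operating systems
```

Every automated action (backup, patch, failover) has to be tested against each row, so adding one more engine version adds a row per operating system per action.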

Jason Lopez: I see. Well, how do you map out the business case for automation, especially when high-quality developers are so difficult to find and even keep on staff?

Tobias Ternström: Well, the business case, I think, goes in two directions. One business case is obviously to keep yourself safe. And that means having up-to-date patching, backup, DR, HA, and so on. It's the kind of what-if scenario, if you will. So that's part of the business case. The other part of the business case is, as you get more and more databases and more and more database engines, and the direction is pretty clear, that's what's happening, do you hire more and more individuals to manage those, or do you make sure that the folks that you have on staff can focus on higher-level tasks and look at the whole? Are we investing correctly? Are we, you know, focusing on the right database engines? Are we performance tuning the right databases, and so on and so forth? I think that's generally the business case: that you want to make sure that the DBA team, the administrators, and the developers that you have are effective and efficient. The other problem that you're running into with developers is generally that developers need databases as part of day-to-day activities: development, testing, and so on, as part of a continuous integration, continuous deployment pipeline. And then, is this all automated, or does the developer have to work with their DBA team in order to get those databases? And generally, DBAs don't want to be bothered by developers. They want to focus on making sure all of the databases run smoothly, and the developers definitely don't want to wait to get their database.

Jason Lopez: One of our writers, Tom Mangan, proposed this statement, and we wonder how you’d riff on it: Let's just automate the management of our massive database portfolio. What does that mean to you?

Tobias Ternström: It's not black and white; it's gray. So first of all, you want to automate your least critical databases first. You have your 10,000. You want to start at the bottom and do the first hundred or thousand, the lowest, first. And then you work your way up. And generally, those most critical databases would be the last that you automate, and probably also the databases where you're more selective in what kind of automation you use. In a high availability setup, it means that if one machine, for example, has an issue, I can fail over to another machine, or maybe I will fail over between buildings if one building, for example, loses power. I can only fail over automatically if I know I have no data loss. If I can have data loss, it becomes very, very complex. If I don't replicate my data at all, that's clearly a problem. Then you will just lose your data. But if the connection between the two computers is fast enough, I can do what's called synchronous replication, which basically means I send information over to the other computer, and before I tell the user that I actually saved their order or whatnot, I wait for an acknowledgment from that other machine. So I know that it's not just saved with me locally here. It's also saved on this other machine. And generally, I wait for the other side to say, okay. The problem is that at some point, the speed of light gets in the way, if you will, and I can't wait for this acknowledgment because it slows down the application too much. So between two sites, for example, within the same city, let's say, it's not a problem, and I can do synchronous and I can automatically fail over. But let's say I have a data center that goes down in a region, take Texas as an example, and I have maybe New York as my failover site, and I now do asynchronous replication, which means there may be a non-zero recovery point objective, meaning I may have lost some data.
It might be seconds, might be minutes, might be hours, depending on how I configured things. Then generally you can't just make an automatic decision on, do I fail over? Because let's say we lost one minute of data. We're in Texas, now contemplating moving over to New York. Is it good now? Should we wait another minute and see if Texas comes up, where we have no lost orders, or whatever is going on, or EKG records at the hospital, or whatever it might be, right? Or do we fail over now and take the data loss? And that's generally a human being that needs to make this decision, because it obviously depends on, do we think there is a chance that the Texas data center or this machine will come up again? While there is a chance, depending on, you know, how critical the application is, you may opt to wait or you may opt to fail over. And obviously, if you failed over to New York, now it continues on this new timeline. So if Texas later comes up again, you now have this one minute of data that you generally need to manually go in and find, and then get that information somehow imported into the New York dataset.
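
The rule of thumb in this answer (fail over automatically only when no data can be lost, otherwise let a person decide) can be written down as a small decision function. This is a simplified sketch of that reasoning under stated assumptions: the `Standby` type, the enum names and the site names are hypothetical, and a real system would also weigh health checks, timeouts and split-brain protection.

```python
from dataclasses import dataclass
from enum import Enum


class Replication(Enum):
    NONE = "none"
    SYNC = "synchronous"
    ASYNC = "asynchronous"


class Decision(Enum):
    AUTO_FAILOVER = "fail over automatically"
    ASK_HUMAN = "escalate to a human"
    NO_STANDBY = "no replica to fail over to"


@dataclass
class Standby:
    site: str
    replication: Replication


def failover_decision(standby: Standby) -> Decision:
    """Decide what to do when the primary site becomes unavailable."""
    if standby.replication is Replication.NONE:
        # Nothing was replicated, so there is nothing to fail over to.
        return Decision.NO_STANDBY
    if standby.replication is Replication.SYNC:
        # Every acknowledged write is already on the standby: no data loss.
        return Decision.AUTO_FAILOVER
    # Asynchronous replication: recent writes may be missing on the standby,
    # so someone has to weigh waiting against accepting the loss.
    return Decision.ASK_HUMAN


print(failover_decision(Standby("same-city site", Replication.SYNC)).value)
print(failover_decision(Standby("New York", Replication.ASYNC)).value)
```

The asynchronous branch returns a request for human judgment rather than an action because, as described above, the right call depends on how likely the primary is to come back and how much loss the application can tolerate.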

Jason Lopez: If a company wants to embrace database automation, what should they look for in an automation provider?

Tobias Ternström: This is generally called database as a service. There are a couple of things. Do they run the databases that you need? Which database engines are supported? Because you don't just run Oracle or just run MySQL or Postgres. You probably run a few of them. Do they support the database engines that you need? I think that's the first question. The second question is, do they support this automation, if you will, where you need it to be run? So if you run in your own data center, maybe you run in a co-location facility, and maybe you use a public cloud, can you use this across all of these, or is it in just one of the environments? Another thing to keep an eye on is, you know, how flexible the automation system is in this database as a service. If you have requirements that you need to run a certain, maybe, auditing plugin on your database server, or you need to maybe install your own extension that you built for the server, or maybe you need to install certain hotfixes from a vendor, can you then install those? Do you have control, and can you install those, or is the DBaaS or the database automation system locked down such that you can't install these things? And the more of these caveats that you find, the more you need to run multiple systems. Some databases will obviously be outside of automation always. When you run multiple systems, you have to make sure they're now configured the same. Let's say I'm setting up high availability in one system. Does high availability in that system mean the same as high availability in another system? So it becomes, you know, harder to manage and harder to make sure that you correctly live up to whatever business requirements you have.

Jason Lopez: Let’s close with your thoughts on careers in databases. How are all these changes and evolutions in DB technologies affecting people's career decisions? What would you tell people entering the field?

Tobias Ternström: It's a very good question. One answer might be, if you haven't looked at open source, there's clearly a strong trend toward open source databases, I would say, especially today, especially PostgreSQL. It's fashionable within databases also, so trends change. At the same time, we all know, I think, COBOL programmers make a lot of money today because they're few and far between. Databases is such a large space. There's lots of activity. And the top five database engines, if you look at dbengines.com, which I think is a fairly reasonable source, are, you know, Oracle, MySQL, SQL Server, PostgreSQL and MongoDB today. So if you're using one of those, you're probably in a pretty good spot. Now, you may want to diversify. It's probably a good idea to get a sense of that if you're very much into one database engine. And especially if it's a proprietary one like Oracle or SQL Server, I would definitely recommend venturing outside and learning a bit more about open source databases. At the end of the day, I think, as someone in the industry, really knowing the details still matters quite a bit, especially with databases. Most of the time you would pick someone that knows a lot, deeply, about Oracle, for example, versus someone that knows a little about five different databases.

Jason Lopez: At one time, there was some talk about the DBA jobs going away. But more realistically it’s probably changing. How do you think the DBA job is going to change in the next few years?

Tobias Ternström: Yeah, you know, the DBA job isn't going away. First of all, I might say that the definition of DBA is somewhat unclear or ambiguous, right? Because two DBAs next to each other may have very, very different jobs. So some DBA jobs lean more toward IT admin, you know, patching servers, configuring, setting things up, and some DBA jobs lean very much towards developer. They're writing, you know, SQL queries, they're optimizing queries, optimizing database, you know, physical structures and whatnot. And, you know, some of them focus very much on database design, database architecture, things like this. So I would suspect that the ones leaning more toward IT admin will want to move higher up the stack and, you know, go closer and closer to the application. And how do we make sure that the application is highly available, highly performant? That would be the general expectation, I think.

Jason Lopez: Tobias Ternström is VP and GM of Databases at Nutanix. This is the Tech Barometer Podcast, I’m Jason Lopez. Tech Barometer is produced by The Forecast. If you liked this interview with Tobias, check out his two-part blog post, Modernize Your Database with Cloud Simplicity, at nutanix.com/blog. And for more stories on technology, you’ll find them at theforecastbynutanix.com.

Jason Lopez is executive producer of Tech Barometer, the podcast outlet for The Forecast. He’s the founder of Connected Social Media. Previously, he was executive producer at PodTech and a reporter at NPR.
