What went wrong (& what went right) with AIO with Andres Freund

CLAIRE: 00:00:05
Welcome to Talking Postgres. It's a monthly podcast for developers who love Postgres. And I'm your host, Claire Giordano. In this podcast, we explore the human side of Postgres and databases, and open source, which means why do people who work with Postgres do what they do and, sometimes, how did they get there? Thank you to the team at Microsoft for sponsoring this conversation. Today's guest is Andres Freund. Andres is a Postgres major contributor and a committer and a member of the seven-person Postgres core team, which is like a steering committee for the Postgres open source project. And he's been working on the Postgres database for more than 15 years. He's employed by Microsoft, where he works full-time on Postgres open source, and he's lead of Microsoft's open source contributors team and has been since 2019. His fingerprints can be seen all over the Postgres source base, but in a good way, including things like logical decoding and scalability and most recently asynchronous I/O. Welcome Andres.

ANDRES: 00:01:11
Hi, thanks for having me.

CLAIRE: 00:01:16
I'm so glad you're here, and it's time to talk about today's topic which is what went wrong, and what went right, with AIO. Now, for regular listeners, they probably know that this is not your first time on this podcast. For people who are interested in your origin story, how you got started as an engineer, as well as in Postgres, you can just go ahead and listen to episode eight, and we'll be sure to drop that into the show notes. That was really about, well actually you weren't the only guest on that episode either, it was you and Heikki Linnakangas and you both shared your stories about how you got start, started, started, I can speak. But, before we dive in, I'm curious what do you work on mostly these days? Is it all AIO, or are there other things on your plate as well?

ANDRES: 00:02:06
It is primarily AIO related, but it's not so much the AIO subsystem itself, but working on the infrastructure to be able to use AIO in more parts of PostgreSQL, which is not really directly touching AIO pieces, but redesigning other subsystems so that they actually can use AIO. There's also some other performance related work and similar things and trying to help out some others too.

CLAIRE: 00:02:39
So some context for people who are listening, Postgres 18 is about to GA. It's about to release in General Availability. And Release Candidate 1 is available for anybody to download and check out the release notes for right now, correct?

ANDRES: 00:02:57
Yes, that is correct. And please test and report back if you find any problems.

CLAIRE: 00:03:02
And yes, because this is the chance, if there are any showstoppers, to catch them and get them fixed before the GA. But AIO, asynchronous I/O is part of the Postgres 18 release, but that doesn't mean it's over yet. And the goal of today's discussion is really to kind of explore your journey leading that project and what went wrong, what went right, what happened. So I guess we should start with why did we do it and when did it start and what's the beginning of the AIO project for Postgres?

ANDRES: 00:03:38
The beginning is probably even further back than me starting to work on it. Personally, I've been interested in adding AIO support to Postgres basically shortly after I started using Postgres, early in the 2010s. But I only started working on it like around 2019. It might have been late 2018, I don't know. And the reason why we did it was that Postgres until very recently basically relied on the operating system to do efficient reads from storage. For some things that works rather well. For example, if you have a sequential scan or something similar, the operating system, or at least most operating systems, can do reasonably efficient readahead. And that allows Postgres to not be blocked by storage. But if you have anything more complicated, like a bitmap index scan or a bitmap heap scan or a vacuum that skips blocks or something similar, then the operating system can't do this readahead for us because the operating system doesn't know as much. So it can't look into the future and do a readahead because it just doesn't know the future, even though Postgres could know the future because we know what we're going to do in the future. So the goal of this is to basically give the operating system and the storage the information to do more efficient reading. And one motivating factor for why I started around that time was that Linux had a new feature called io_uring, which allowed doing asynchronous I/O in more cases. Before that, Linux had asynchronous I/O support and had that for a long time, but it only worked with direct I/O. What that means is that it only worked if we did not use the kernel page cache. But it turns out that that's much harder to use. And there's also a lot of use cases where that's not really the right setup to use. So with the introduction of io_uring, it was suddenly possible to do native AIO in more cases. And that's kind of what made me start looking into it.
At the same time, there were two important changes, I think, afoot. One was that we had much faster storage than we used to have, due to NVMe storage. That's like fast local SSDs that can have very large bandwidth. And it turns out that the CPU overhead of doing I/O suddenly matters a lot more when having that fast storage. And at the same time, more and more workloads were migrating into cloud systems where you could have storage that has like a reasonably high number of IOPS or a reasonably high bandwidth, but the latency towards the storage is fairly high at the same time. And that means that to saturate the storage or to fully utilize the storage that we pay for, one actually needs to issue I/O more in parallel than we could do until recently. And that was the second motivation basically for investing time in working on I/O.
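
The point about high-latency cloud storage can be made concrete with a little arithmetic. The sketch below is only an illustration: the latency and IOPS figures are hypothetical, not numbers from the conversation, and it assumes the simple relationship that throughput is roughly the number of I/Os in flight divided by the latency of each one.

```python
import math

def iops_at_depth(latency_us: int, in_flight: int, device_max_iops: int) -> float:
    """I/Os per second achievable with `in_flight` concurrent I/Os,
    each taking `latency_us` microseconds, capped by the device limit."""
    return min(in_flight * 1_000_000 / latency_us, device_max_iops)

def depth_to_saturate(latency_us: int, device_max_iops: int) -> int:
    """Smallest number of concurrent I/Os needed to reach the device limit."""
    return math.ceil(device_max_iops * latency_us / 1_000_000)

# Hypothetical cloud volume: 1 ms per I/O, provisioned for 16,000 IOPS.
LATENCY_US = 1_000
MAX_IOPS = 16_000

one_at_a_time = iops_at_depth(LATENCY_US, 1, MAX_IOPS)   # synchronous, one I/O at a time
needed_depth = depth_to_saturate(LATENCY_US, MAX_IOPS)   # concurrency needed to use it all

print(f"one I/O at a time: {one_at_a_time:.0f} IOPS")
print(f"in-flight I/Os needed to saturate the volume: {needed_depth}")
```

Under these assumed numbers, a process that waits for each read before issuing the next can use only a small fraction of the storage it pays for, which is the gap that issuing I/O in parallel is meant to close.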

CLAIRE: 00:07:00
So let me see if I follow that properly. For both of those changes, both with NVMe as well as with workloads in the clouds with these large IOPS, are you saying that those were opportunities to take advantage of? That's not what I heard. I heard you saying that, wow, we really needed to fix this problem because it was becoming a bigger problem. Did I get it right, or not?

ANDRES: 00:07:27
I mean, that's kind of two sides of the same coin. Like either we would need to benefit to fully utilize the hardware or we're not performing as well because we are not utilizing it. But yeah, otherwise I think you're right.

CLAIRE: 00:07:43
I'm just looking at it the more negative way. Like we had to do it. Like we had no choice. Well, maybe it's true that you had to do it because you were so motivated and had been thinking about it for, it sounds like, eight years.

ANDRES: 00:07:59
Something like that, yeah.

CLAIRE: 00:08:01
Okay, so that's why. Had you ever led an architectural change as big as this?

ANDRES: 00:08:11
I don't think so. I've worked on reasonably big changes to Postgres, but they were all more narrow. They didn't need changes to as many parts of Postgres and they were more focused. And even though I think some of them might have actually been more lines of code or something like that, they were never quite as hard to integrate because they touched fewer places. So that's definitely the hardest project I've ever worked on.

CLAIRE: 00:08:50
So how did you even begin? How did you get started? I mean, obviously with a prototype.

ANDRES: 00:08:56
It's long enough ago that I am not 100% sure about all the details anymore. But I think I just started out doing some very minimal testing in the sense of like, I tried to do the minimal hacking on Postgres to use AIO in one very narrow place and then tried to see whether I could see any performance benefits from that. And initially I didn't. And then I just started trying to use it in other places to see whether there's bigger gains there. And eventually I learned more about what the problems were and where we can really gain performance. And I started to try to generalize what the AIO subsystem would look like in a somewhat understandable way. And over the next couple of years, I on and off worked on developing an AIO subsystem that was generic and tried to introduce users of it in more and more places. And as part of that, there were lots of subprojects that were somewhat independent and could be committed independently. And there were improvements to Postgres that could be committed, even though AIO was not merged. For example, I think Postgres 16 or 17, I don't fully remember, 16, I was able to commit a change to make relation extension, that's like making the table bigger, faster. And that actually was interesting to do because it allowed a fairly substantial part of the architectural changes that were necessary to be merged without the rest of the AIO changes. And generally that's something we tried to do, to find bits and pieces that we can merge earlier, because trying to merge the whole thing at once was just going to be infeasible. That was clear pretty early on.

CLAIRE: 00:11:07
I know that in Postgres 17 there was a feature or a collection of features under the umbrella name of streaming I/O was that considered part of the AIO project as well?

ANDRES: 00:11:15
Yes. That was definitely one fairly crucial part. And as part of the prototype for AIO, I had written something that was called at the time, I think, streaming read. It was streaming read, and then Thomas Munro tried to make what I had prototyped into something more general and independently mergeable. And that is what got merged into Postgres 17. And that had substantial benefits on its own, because what it added was the ability to merge multiple I/Os for [...] blocks (in Postgres those blocks are typically eight kilobytes large) into one larger read of up to, by default I think, 128 kilobytes, if they were neighboring blocks. And that alone can reduce the CPU overhead of doing I/O very substantially because fewer system calls are required.
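
The merging of neighboring blocks described here can be sketched in a few lines. This is a toy illustration, not the Postgres 17 code: the function name `combine_reads` is made up, and the constants simply take the 8 kB block size and the 128 kB limit from the conversation at face value.

```python
BLOCK_SIZE = 8 * 1024                                 # typical Postgres block size
MAX_READ_BYTES = 128 * 1024                           # assumed default limit for one combined read
MAX_BLOCKS_PER_READ = MAX_READ_BYTES // BLOCK_SIZE    # 16 blocks

def combine_reads(block_numbers):
    """Group block numbers, in request order, into (start_block, block_count) reads.

    A block is merged into the current read only if it directly follows the
    last block of that read and the read is still below the size limit.
    """
    reads = []
    for blk in block_numbers:
        if reads:
            start, count = reads[-1]
            if blk == start + count and count < MAX_BLOCKS_PER_READ:
                reads[-1] = (start, count + 1)
                continue
        reads.append((blk, 1))
    return reads

# Six requested blocks turn into three reads instead of six.
for start, count in combine_reads([10, 11, 12, 40, 41, 7]):
    print(f"read {count} block(s) starting at block {start} ({count * BLOCK_SIZE} bytes)")
```

Each combined read replaces several single-block reads, which is where the saving in system calls comes from.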

CLAIRE: 00:12:18
Okay. So the title of today's...

ANDRES: 00:12:19
The other part why that was interesting is that it allowed us to introduce uses of AIO without actually having AIO merged, because the whole idea behind this interface that was added to 17 was that it allowed code to have the same interface in 17 as it would have once AIO was merged, and then automatically get AIO, and also already get some other benefits before that. And I think that was a pretty good path to take. I think that's one of the things that went well.

CLAIRE: 00:12:59
Okay. So what we want to cover today are things that went wrong, and things that went right, which is a little bit different from what you did when you gave a talk a couple months ago in Montreal at PGConf.dev. And there you focused primarily, and the title of that talk was just what went wrong with AIO. So, but here we want to talk about both, you know, those big challenges, and things that went wrong, as well as what went right. You just talked about something that went right, right? That you seeded some of these changes into Postgres 16 and Postgres 17. So it wasn't a big bang code contribution, if you will, in Postgres 18. But I guess my question to you is, do you want to weave these two things that were good and things that were a problem together throughout today's conversation? Or should we actually start with the what went wrong part? How do you want to do it?

ANDRES: 00:13:56
I think either for me works okay. I think maybe it's easier for the audience if you do separate the two, but I'm sure that even if you do that, there will be some back and forth, just because that's how our brains work.

CLAIRE: 00:14:03
Okay, because you can't help it? [Yes.] Okay, so you talked about, in that talk at Montreal, which I listened to, I was there for part of it, but I also listened to it again this morning, you talked about a handful of mistakes. But you also talked about why it took so long, and are you happy or not happy with how long it took? Did it take the right amount of time as a project?

ANDRES: 00:14:34
I think it definitely took too long. I think there's like several reasons why I think it's too long. One is just that it's extremely hard to maintain motivation over that long a time. And I think some of the slowdown was just related to needing to do something else just because I couldn't see the word AIO anymore. And if the whole project had taken a shorter amount of time, then that would have been less of a factor, I think. But I also think that it took too long in the sense that it would have been good for Postgres for it to have happened sooner, and there were plenty of other projects that were kind of blocked by not having AIO. And also several people in the community and in our team at Microsoft were helping with AIO, and they would probably have been happy if that happened more quickly, because at the pace it was happening, they needed to switch back and forth between helping out with AIO or doing AIO related projects and doing other projects. And the more context switches one has, for a lot of us at least, the slower things go.

CLAIRE: 00:15:55
I guess I wonder if you're being too hard on yourself saying that it took too long? I mean, by definition, don't big projects go down dead ends? Isn't that like a normal part of the design process? That there will be dead ends or there will be things that you wish you hadn't done the way you did in the prototype?

ANDRES: 00:16:18
Yes, that's definitely part of it, and I think a good part. A project of this complexity I think is impossible to do without exploring dead ends because, otherwise, one wouldn't have been ambitious enough to find the actual right design and would have just gotten stuck in some local minimum of okayish design. But I think there probably were cases where I could have done better. Like, I invested a lot of time in trying to make the prototype kind of work in some edge cases, even though there were just known fundamental architectural mistakes in the prototype. And that cost at least a year. And if I had focused more on giving up the prototype at some point and just starting with a real thing, if I had done that earlier, I think it would have been better. But I think there's also a second aspect, of issues that were not related to me personally doing something wrong. I think a lot of the time that it took for AIO was that we had some aspects of Postgres where we just hadn't invested the time necessary to allow for faster-paced development. One aspect is that before I started working on AIO, Postgres did not have a CI that could be run by everybody. And for something that has so many portability effects like AIO has, that is just not feasible. If we can't automatically test Postgres on different operating systems and so on, then it's not really feasible to develop something like AIO. And so one of the large time sinks that was on the path to getting AIO anywhere was to merge CI infrastructure into Postgres. And that took a lot of time, and if that hadn't been the case, then the AIO project would have gone faster. And I think it's an area of Postgres that we, just as a project, had underinvested in. And I think even though there was some initial skepticism about adding CI, I think that has generally borne out to be a very crucial enabler for lots of different projects.

CLAIRE: 00:18:49
Now that's something that Bilal worked on right, Nazir Bilal Yavuz? Probably other people as well.

ANDRES: 00:18:53
Yes. Yeah, I think it was Bilal and me were doing a lot of the work initially and then since then, plenty of other people have chimed in. I think in the last couple of years, Bilal has done most of the work.

CLAIRE: 00:19:09
I had no idea that that was an enabler, if you will, for the AIO project. That's pretty cool. I thought it was a general improvement to the overall way the Postgres contributors and engineers tested the project, but I didn't realize there was an AIO connection.

ANDRES: 00:19:30
I started doing CI purely because of AIO, that was directly the motivation.

CLAIRE: 00:19:37
Okay then, that should go on Bilal's, I don't know, his next promotion justification or something like that.

ANDRES: 00:19:45
I'm pretty sure that I did that.

CLAIRE: 00:19:48
Okay, good, good, good. Do you want to walk us through some, since a lot of the people that listen to this show are engineers and developers, they probably are hungry to hear specific examples of things that, decisions you made, or dead ends you went down, that you wish you hadn't. Do you want to give us some of those examples? Do you remember?

ANDRES: 00:20:13
I can try. Some of them will be a bit far down into the weeds. I don't know how easy they are to explain on a podcast. I think one of the dependencies that probably should not have been a dependency was that I got very frustrated with running tests in Postgres while working on AIO, but also while working on other features. And that indirectly led me down to adding support for a new build system to Postgres. And I think that was a very good investment into Postgres, but I think it was not a very good investment in the sense of doing it before AIO was complete. And I think I knew that at the time, I just needed to do something other than AIO. So maybe it was the right thing to do, but it definitely did not help the timeline. Another example of like more technical--

CLAIRE: 00:21:09
Okay, and when you talk about [Go ahead.] just how big of a distraction was that? Are we talking about two or three months on your part or a year?

ANDRES: 00:21:24
It's hard to say because it was not like drop one thing and do only the other thing, but like the work definitely went on over like nine months to varying degrees or something, so it was a substantial time investment. On the more technical front, I think one very hard thing about adding support for AIO into something like Postgres, which just was not written with anything like asynchronicity in mind, is that one invariably needs something like callbacks or something to react to the completion of I/Os, and I definitely went down many, many different dead ends in how to make that correct and not super failure prone. And initially, one of the biggest mistakes was that I allowed those callbacks to start more I/O after the completion. And that turned out to have very complicated nesting issues, because it then meant that if an I/O completed while deep in some subsystem, then more I/O could be triggered, and that could then recursively reenter the same subsystem. And it made everything very fragile and hard to understand. And I think I intuitively knew that that wasn't quite the right direction to go in, but like, it was hard to go back and redo everything to get rid of that decision. And that was probably the wrongest decision about all of this on a technical level. And what Postgres now has is a much more restricted set of callbacks. One is not allowed to start new I/O inside those callbacks. One is not allowed to allocate memory inside those callbacks. And like, it's very restrictive, and that's good for some things, but it also makes it a lot more restricted. And that probably will make some other features harder, but it's the only way I could see to understand the feature well enough to believe in its correctness to some degree.
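
The rule that a completion callback may not start new I/O can be illustrated with a small toy model. Everything in this sketch is hypothetical: `ToyIoQueue`, `submit` and `complete_all` are invented names that do not correspond to the Postgres C API, and the restriction on memory allocation is not modeled at all. It only shows why forbidding submission inside callbacks removes the recursive re-entry described above.

```python
class ToyIoQueue:
    """Toy I/O queue whose completion callbacks may not submit new I/O."""

    def __init__(self):
        self._pending = []           # (request_name, callback) pairs not yet completed
        self._in_callback = False    # True while completion callbacks are running

    def submit(self, name, on_complete):
        if self._in_callback:
            # Refuse nested submission instead of re-entering the subsystem.
            raise RuntimeError("cannot start new I/O inside a completion callback")
        self._pending.append((name, on_complete))

    def complete_all(self):
        """Pretend every pending I/O finished and run its callback."""
        done, self._pending = self._pending, []
        self._in_callback = True
        try:
            for name, on_complete in done:
                on_complete(name)
        finally:
            self._in_callback = False
        return len(done)


queue = ToyIoQueue()
completed = []     # filled in by the callback
follow_ups = []    # follow-up requests noted by the callback, submitted later

def record(name):
    completed.append(name)
    if name == "block-1":
        follow_ups.append("block-2")   # only note the wish; do not submit here

queue.submit("block-1", record)
queue.complete_all()
for name in follow_ups:
    queue.submit(name, record)         # safe: no callback is running now
queue.complete_all()
print(completed)
```

In this model the callback can only record what happened; any follow-up I/O is started by the caller once completion processing is over, so a completion deep inside one subsystem can never trigger a nested call back into it.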

CLAIRE: 00:23:43
Obviously, if someone really wants to go deep on understanding some of the things you did that in hindsight, with 20/20 hindsight, you wish you hadn't done, they can go watch your talk, which is available on YouTube, from Montreal, the what went wrong with AIO. And I can't remember if that was a half hour long talk. I think it was, I think it was about a half hour, could have been longer.

ANDRES: 00:24:07
I think it was 45 minutes or something, it was the full length talk, but I'm not entirely sure.

CLAIRE: 00:24:11
Okay. So you dive deep in that talk. But are there a few other examples that we can kind of consider? I mean, what I want to get to after you share the examples is, what's your takeaway? Are there learnings that other developers can steal from you, or, that if you embark on a similar architectural project in the future, things you will know to do better next time. But before we get to the learnings, I just feel like we need to go through a few more examples if we can.

ANDRES: 00:24:50
Yeah. Another example of failures that were more like project failures rather than my personal failings is that there just are, and particularly were, significant parts of Postgres that just had no tests. And it turns out that if you then redesign parts of Postgres, it's very easy to break those other subsystems that had no tests. And like, for example, we found out very late in the development of, or merging of, AIO that it broke some stats that are emitted whenever there are checksum failures. But we just had no tests, so I just did not think about that until the last minute somehow. And I think as a project lesson, I think it's that we have to continue to invest more into testing infrastructure and different types of testing. And I think that's also, in a way, a personal lesson. Obviously I invested time in working on CI and stuff like that, but I should probably have done more testing infrastructure earlier on to find some of the gnarlier hard to find bugs and, yeah, that was not perfect. I think the development process was that I first wrote that prototype and then only like about a year ago turned that prototype into some, like rewrote the prototype from scratch, to get something mergeable, and I think we added too many features to the prototype. We basically had already learned nearly all the lessons that you could have learned, but I tried to make it better and better. Like one big part of what you eventually want to use AIO for is to do WAL writes. And I invested at least a year and a half into trying to make asynchronous WAL writes work very well in all situations, even though getting the performance of that exactly right was not all that important. It wasn't that important for the design of AIO. It was important to prototype that we could do asynchronous WAL writes, but it was not important to get the performance to be on par in all situations with current Postgres because it was always going to be a prototype, not the real thing.
So I invested inordinate amounts of time in that, and I think knowing when to stop with a prototype is probably something that I learned a lot about as part of this project. Well what is the answer to that?

CLAIRE: 00:27:49
Knowing when to stop with a prototype, that's hard to give a rule of thumb around.

ANDRES: 00:27:53
Yes, and I think, generally, the problems where things go wrong are not going to be hard and fast zero or one kind of things where like it's right or it's wrong. It's always a question of like a gradation, where like at some point you definitely invested too much time in it, at some point you invested too little time, but like where exactly the right spot is, there is a large bandwidth between those. And I think most of the things that went wrong were of that nature, and I don't think I know the answer right now. I just know that the spot I picked in some cases was definitely wrong. I don't know where the right spot would have been.

CLAIRE: 00:28:46
Okay, so more examples. I'm putting you on the spot.

ANDRES: 00:28:56
One, the way that the AIO subsystem works is that one can get an I/O handle, and then with that I/O handle one can associate like a read or write and some callbacks that are to be called when the AIO completes. Initially there was no hard limit in each backend on how many of those handles could be used, and it was actually somewhat expensive to get one of those handles. And because it was somewhat expensive, all the parts that used those handles cached them for reuse, and it turns out that if you cache a lot of handles in a lot of places, the total number of those handles can get very large. But because of PostgreSQL's multi-process design, the state for all of those handles has to be in shared memory, which then means that we have to pre-allocate them at the start of the server. So this caused a problem that we could run out of handles, and that then caused a lot of problems, because if we are in a place that wants to do, for example, a WAL write, which may not fail without taking down the server, and we ran out of handles, there was not really a good way forward. And that was like a multi-layered descent into a wronger and wronger design. And it turns out that the root cause basically was that it was expensive to get new handles. And because of that, we had to do the caching, and once it was cheap to get handles, the whole set of problems related to this went away. And I think I could have recognized that earlier. But I think that issue I feel not as bad about as some others because that was just a new design space that we needed to explore, and in hindsight, everything is easier.
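
A toy model may help picture the handle problem. The sketch below is an assumption-laden illustration, not how Postgres implements I/O handles: `HandlePool` and its methods are invented names, and the fixed-size list merely stands in for state pre-allocated in shared memory at server start.

```python
class HandlePool:
    """Fixed number of I/O handles, allocated once up front."""

    def __init__(self, size):
        self.size = size
        self._free = list(range(size))   # free list of handle ids

    def acquire(self):
        """Cheap O(1) acquisition; returns None when the pool is exhausted."""
        return self._free.pop() if self._free else None

    def release(self, handle):
        """Return a handle to the pool as soon as its I/O is finished."""
        self._free.append(handle)

    def free_count(self):
        return len(self._free)


pool = HandlePool(4)

# With caching: two hypothetical subsystems hold on to handles "for reuse".
cached_by_scan = [pool.acquire(), pool.acquire()]
cached_by_vacuum = [pool.acquire(), pool.acquire()]
wal_handle_with_caching = pool.acquire()      # nothing left for a WAL write

# Without caching: handles go back to the pool right after use.
for handle in cached_by_scan + cached_by_vacuum:
    pool.release(handle)
wal_handle_without_caching = pool.acquire()   # now a handle is available

print(wal_handle_with_caching, wal_handle_without_caching is not None)
```

The design point the sketch tries to capture is the one made above: once acquiring a handle is cheap, nobody needs to hoard handles, so a fixed pool is much less likely to be empty at the moment a must-not-fail operation needs one.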

CLAIRE: 00:31:00
Well, in hindsight, it's all obvious, right?

ANDRES: 00:31:03
It's not obvious, but more obvious maybe.

CLAIRE: 00:31:09
Okay, I like the phrase you just used. You said "a multi-layered descent into wronger and wronger design." I'll replace wronger with bad, but I like that quote. Is there any takeaway from that, or did you just have to go through that exploration to get to that result?

ANDRES: 00:31:35
I think we needed to go through that exploration, but I think I should have, or we should have, stopped earlier and done the necessary redesign to get rid of those problems. And that was one of the things that made it really hard to work with the prototype, because it would lead to these nested subsystems that had very complicated problems that were interacting with each other. And I tried to put more band-aids on more band-aids, and that just made it even harder. And I think it's related to the decision to stop earlier in the prototype and just rewrite in a cleaner way from scratch. And I think that's actually one of the positive lessons, that for complicated projects, it really often will be worth it to write a throwaway prototype where basically no code will survive from the prototype to the real thing, just because by the time the right design is clear, there will be so much garbage left in the prototype that it's not really worth trying to keep the code and going from there to something mergeable.

CLAIRE: 00:32:56
That's something that you said in your talk in Montreal, that in hindsight you wish you hadn't spent the time you spent trying to get to production level quality in the prototype. Like if you knew upfront that this is going to be a throwaway prototype, you might have saved a little bit of time there. Is that the right takeaway?

ANDRES: 00:33:19
Yeah, and the hard part of that is trying to know which of the problems in the prototype are architectural problems that need to, where it's not yet clear how the right solution looks like and which are architectural problems that we now can fix because we now know about them and so it's easy to avoid them while writing the real thing. Obviously, that's not an easily generally answerable question.

CLAIRE: 00:33:50
So before we flip to what went right with AIO, is there anything else that went wrong that gives you one of those takeaways, those learnings, those "I'm not gonna make that mistake again?"

ANDRES: 00:34:06
I don't know whether, like, I think one other big thing that didn't go right, but I don't know how wrong it went, and I don't know whether I really know how to do it better, is trying to tackle complicated architectural problems while collaborating with others. Because it is very hard while exploring something that is in an architectural void to share the problem space with somebody else and to try to delegate parts of the problem to them. Because it requires a fair amount of experience and a fair amount of tolerance for uncertainty, I would guess, is the best way of describing it, to work in that void. And I think that's something that didn't go right in all cases. I think I tried to delegate some projects that were too underspecified and perhaps were too early. And on the other side of the coin, I think there were projects where I made myself the bottleneck for far too long and did not delegate or did not, delegate is the wrong word, did not hand off subsets of the problem to others. But it's very hard ahead of time to know which side of the line some subset of some problem is going to be on. And I hope I am getting better at it, but like, I've been hoping to get better at particularly this task for a long time, so I don't know whether I am now better at it or whether I know the right solution. But yeah, I found that to be a very hard problem.

CLAIRE: 00:35:55
Well, I think that anybody listening, anyone listening who's a technical lead like you are, is probably identifying with what you're saying. Because like you said before, it's not like there's a right or wrong answer or a zero or one answer, right? It's figuring out what can be delegated and it's also who you're involving. Some people are very good at tolerating uncertainty and other people need things to be more clearly specified. And so kind of knowing that, right, knowing those people and figuring out what to carve up, that's just one of the big challenges of leading a project like this.

ANDRES: 00:36:36
Yeah. And quite often the problem is that it's not known whether something is a hard problem or an actually easy problem without first having spent the time to solve the problem. And that means that delegating the problem is like kind of a roll of the die, like, it might go well or it might not, but without you having ahead of time the information to decide whether it's a good match.

CLAIRE: 00:37:03
Okay, so before we flip to what went right and to look at the things that you want to celebrate or you want to repeat or you hope others repeat, is there anything else that went wrong that leads to a lesson that you want to share with other engineers?

ANDRES: 00:37:23
I think there's a lot more, but I don't know how much of those are worth investing time on this podcast. Maybe one interesting challenge around this was that it turns out that hardware is very diverse and has very many odd behaviors. I spent a fair bit of time trying to understand how different SSDs work across different workloads and it turns out there's very little information out there to understand that and some SSDs like much bigger writes but very little I/O concurrency, but other SSDs, even from the same manufacturer in some cases, want a lot of concurrent writes but not have them be very large because otherwise the latency increases dramatically, and that makes it very hard to have generally applicable auto-tuning systems. And I think we spent a fair bit of time trying to make subsets of the AIO project not have a lot of configuration knobs for every user, because like users are not going to know how to tune those. But I think we had a hard time finding good ways to do that. And that was definitely a challenge. And I think we went with very simple algorithms for now, but it's definitely not where it could be. And there's lots of challenges still remaining with dealing with different hardware, and particularly because no individual developer will ever have access to all kinds of different hardware.

CLAIRE: 00:39:14
Okay, so you're suggesting there's more work to do in the future, especially around tuning. [A lot more work, yes.] So before we dive into what went right with the project, maybe let's tell people, like, where is this project now? And how much work, how much change is going to happen in the future in Postgres 19, in Postgres 20? Like, let's just state of the world, AIO and Postgres, Postgres 18.

ANDRES: 00:39:45
In Postgres 18, there are quite a few uses of AIO. For example, sequential scans, bitmap heap scans, and vacuum all use AIO. And in several of those, it can lead to substantial speedups. The reasons for the speedups actually differ somewhat between the different uses of AIO, but the speedups are pretty decent. However, there are very important, heavily I/O-dependent paths in Postgres that do not use AIO yet. And the most crucial one is probably that index scans, like not bitmap index scans, but plain index scans, do not yet use AIO. And that means that if you have a workload that does a lot of ordered index scans, for example, you're not going to benefit from AIO, even though it's a workload that can, in theory, very, very heavily benefit from AIO. There's a prototype that's being worked on, a project to add readahead for index scans. And in some cases, the speedups are 8x, 9x, compared to not using readahead. And that also means that, let me backtrack a tiny bit, one of the motivations for adding AIO to Postgres was to be able to use direct I/O. Direct I/O means that we do not rely on the kernel caching and buffering I/O for us, and the kernel also does not do any readahead. Direct I/O can be a lot faster than relying on the kernel page cache, and it can avoid a lot of double buffering, where the same data is cached in Postgres' buffer pool and in the kernel page cache. But without supporting AIO in a few more places, direct I/O is just not viable to use in any non-toy workload. Today, when turning on direct I/O in Postgres 18, it is going to be faster for sequential scans in a lot of cases. However, if you ever have an index scan, it will be a lot slower than before, because before, that index scan could utilize readahead by the operating system. So I think one big part that is remaining is to just use AIO in more places. Often that will not actually require a lot of work on the AIO subsystem itself, but it will just require work in the subsystem that wants to use AIO.
For example, for the index scan, a big part of the work is in the index interface: how to represent the ability to do more readahead, how to handle the pinning of buffers for a longer time, and similar things. And I think there will be a lot of other areas like that.
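To make the readahead idea above concrete, here is a minimal Python sketch. It is not PostgreSQL code and not the read stream API. It assumes a hypothetical `read_block` callable and a hypothetical `lookahead_reads` helper that keeps a bounded number of block reads in flight while the consumer still receives results in scan order, which is roughly the benefit an index scan would get from readahead. The thread pool and the `distance` parameter are illustrative assumptions only.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of the block number sequence


def lookahead_reads(block_numbers, read_block, distance=4):
    """Yield (block_number, data) in order, keeping up to `distance` reads in flight.

    `read_block` is a hypothetical callable that performs one (slow) block read.
    """
    blocks = iter(block_numbers)
    in_flight = deque()  # (block_number, future) pairs, oldest first

    with ThreadPoolExecutor(max_workers=distance) as pool:

        def start_next():
            # Start one more read if the scan has more blocks to offer.
            blkno = next(blocks, _DONE)
            if blkno is _DONE:
                return False
            in_flight.append((blkno, pool.submit(read_block, blkno)))
            return True

        # Prime the pipeline with up to `distance` outstanding reads.
        while len(in_flight) < distance and start_next():
            pass

        while in_flight:
            blkno, future = in_flight.popleft()
            data = future.result()  # wait only for the oldest read
            start_next()            # top the pipeline back up
            yield blkno, data
```

A caller would iterate over `lookahead_reads(blocks_in_index_order, read_block)` instead of calling `read_block` one block at a time. The consumer code stays sequential, while the waiting for individual reads overlaps.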

CLAIRE: 00:43:14
So, let's pause for a second. For users that are listening to this, the story is not yet written. Postgres 19 is likely going to have, so the Postgres 19 is the release that will come out in the September-ish timeframe of 2026, a year from now. It's likely to have even more users of AIO, potentially such as index scans that will then reap the performance benefits for some workloads. That's what you're saying, right?

ANDRES: 00:43:47
Yes, and I suspect that that will go on considerably longer than Postgres 19. Although I think if you add a few more...

CLAIRE: 00:43:54
Okay, so also more things in Postgres 20, et cetera. And you were about to say, the second part...

ANDRES: 00:44:04
Is that AIO in 18 is only used for reads. There are no writes that are utilizing AIO.

CLAIRE: 00:44:16
What? Okay, but that's just because it hasn't been done yet, right, it's going to be used for writes, in the future?

ANDRES: 00:44:20
Correct, yes, but in 18 we're not there yet, and the reason for that is that there are lots of architectural issues outside of the AIO subsystem that need to be tackled. That's actually what I'm currently working on: making the buffer manager ready to do AIO writes. And it turns out that there's just a bunch of larger AIO-independent projects that need to be done to make that feasible. And then there are currently patches to do some of the preliminary work to make it easier to later do AIO writes. And some of them have substantial performance benefits on their own. Melanie posted a patch to do write combining for writes in the checkpointer, for example, and that can speed up checkpoints rather substantially, and I think it also does some of the work that we then later need to do to turn those into asynchronous I/O writes. That's another big, I think, set of improvements that we can do. And then, as mentioned earlier, one thing that I really want to do with AIO eventually is AIO for WAL writes, and that will be a pretty large project that requires infrastructure changes that are not really related to AIO but that will hopefully have their own performance benefits. Yeah, I think that's roughly the current state.
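As a rough illustration of what "write combining" means in general, the following Python sketch merges adjacent dirty block numbers into fewer, larger contiguous writes. It is an assumption-laden toy, not the patch mentioned above and not how the checkpointer is implemented. The function name `combine_writes` and the `max_blocks_per_write` limit are hypothetical.

```python
def combine_writes(dirty_blocks, max_blocks_per_write=16):
    """Group block numbers into contiguous (start_block, block_count) runs.

    Each run could then be issued as one larger write instead of
    `block_count` separate single-block writes.
    """
    runs = []
    for blkno in sorted(set(dirty_blocks)):
        if runs:
            start, count = runs[-1]
            # Extend the current run only if this block is directly adjacent
            # and the run has not reached the size limit yet.
            if blkno == start + count and count < max_blocks_per_write:
                runs[-1] = (start, count + 1)
                continue
        runs.append((blkno, 1))
    return runs
```

For example, `combine_writes([5, 1, 2, 3, 9, 10])` groups six single-block writes into three runs: `(1, 3)`, `(5, 1)`, and `(9, 2)`.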

CLAIRE: 00:45:54
And if somebody is listening to this and they are a contributor to Postgres and they're not already involved in helping drive all of this future work for AIO in Postgres 19 or Postgres 20, like how do they get involved? It's just via the mailing list, or via reaching out to you, or just starting to do the work, like what is that process like? Maybe there's a PhD student somewhere who's listening to this.

ANDRES: 00:46:23
I think all of those can work. You can just decide that you want to start using AIO in one more place, and some of those are not going to be very hard. And you can convert those to use a read stream to use AIO for reads. And that can be done fairly easily, I think, in some cases. And you can reach out to me or to the entire list to ask for suggestions or to get review for the idea or the actual patch. You can also go to the PostgreSQL Hackers Discord, it's linked on the community website, and ask for suggestions there. Another big area where I would definitely welcome help would be to review patches that are related to AIO. Like I, for example, posted patches for parts of the redesigns of the buffer manager. You would be more than welcome to review those. Yeah.

CLAIRE: 00:47:38
Okay. We'll definitely include a link to the PostgreSQL Hackers Discord in the show notes for this episode, as well as to the mailing list for anyone who's unfamiliar with that. I'm curious whether there's a list, like a punch list, you know how when a house is mostly built, but there's still this laundry list of some big, some small things that the builder still needs to finish? Is there a punch list for all of these pieces that still need to be built out to leverage AIO?

ANDRES: 00:48:14
There's a wiki page, but it's not, I should probably go and update it. I did some work on updating it after AIO got merged, but it needs some more work. But that's probably a good place to look, but just with a caveat that it might not be perfectly up to date.

CLAIRE: 00:48:33
Okay, so it's a work in progress, if you will, and it'll change over time, [Yes, definitely.] depending on when somebody listens to this. Okay, so let's pivot to things that went right in the project, things that you feel good about, you and other people who you collaborated with.

ANDRES: 00:48:56
I think the thing I feel best about is that we actually managed to get it done at all. And when I started the project, I was not at all confident that this was a project that we could succeed in. I thought it was important to try to succeed in, but yeah, I was not confident that it would actually work out. And that's definitely something I'm very happy and proud of. I think another thing that went well, and I think those are the parts... Let me restart that. I think what went well was that we found sub-projects that could be merged independently, and that help independently, like the relation extension part that I mentioned earlier. Being able to upstream that first was pretty important. I think otherwise having to also carry all those changes at the same time would have been very hard. And getting the read stream stuff into Postgres 17 and allowing various places to be converted to use the read streams was actually fairly crucial to merging AIO in Postgres 18, because that meant that with just merging the AIO subsystem and doing a few dozen lines of change in read stream, all of those places suddenly started to use AIO. And that made it a lot more reviewable than if, after merging the whole AIO subsystem, we had to also go into all these other places and change them to use the read stream interface, because that sometimes required non-trivial work in those places, in some cases just to get rid of other architectural debt and similar things. I think several people that worked on AIO gained a lot of experience, and I think that was a pretty good success. Even though, as I'm sure that some people that might be listening would confirm, it was not always pain-free. And I would like that to have been a more pleasant experience, but I think still a lot of knowledge was gained across all the people involved. And I think that's great. Yeah, I don't really have other thoughts.

CLAIRE: 00:51:33
I mean obviously the fact that you got it done is something to feel good about but I'm struck by what you said after that, that you were not at all confident this was a project we could succeed in. And I almost wonder, I wonder if that's your nature. Is it fair to say that you are inherently skeptical of something in the beginning, that you're like picking up that idea and looking at it from different angles to figure out what could go wrong and obviously try to prevent those things from going wrong? Isn't that how you're wired or am I misreading you?

ANDRES: 00:52:09
I think that's part of it, but I don't think that is all of it. I've definitely tackled projects where I was like 95% sure that I could succeed. Just because I've been working on Postgres for a long time by now, and I know the community, and I can roughly predict whether something has a chance or is going to be controversial or not. But with the AIO project I did not have confidence either in my own skills, from a technical point of view, that it would be doable, or in the community politics. I think that's perhaps one angle I forgot to mention earlier, which is that getting a change of this size into Postgres requires convincing a lot of people. And historically, our to-do list had a point that said we do not want to use direct I/O. Political things are a lot harder to predict than purely technical things. So I think, yeah, that's why I was more skeptical about this project than about other projects.

CLAIRE: 00:53:26
Okay, so maybe let's just tease that out as something else that went right. I mean you and the other people involved in the project were ultimately able to convince a lot of people. So it wasn't just a matter of doing the work, right, and getting it done correctly, but selling people and bringing the rest of the committer and contributor engineers along with you. Like, that's something to feel good about, too. Maybe that's what you meant before. [Yeah, that's true.] It was, like, implied. Are there, are there...

ANDRES: 00:54:06
I think it was implicit in what I said earlier, but another aspect I think that went right was to actually develop a prototype first. Because without being able to just explore crazy things and then roll back and not be too worried about getting everything right, we would also not have been able to actually get to a design point where it was mergeable. And that was, I think, one more of those things where it was important to do but I did too much of it, and where the exact right spot is hard to tell. But without the plan to write a prototype that would not be mergeable, I don't think it could have gone anywhere. I think one more aspect that I think went okay, could have gone better, could have gone a lot worse, is corporate politics. I worked on AIO, I think, while working at two different Postgres companies or three different Postgres companies. And you have to convince the companies to actually allow you to spend so much time on something that does not actually have very immediate benefit. Because it was always clear that it would take a while to get merged and that even then it would take more years for it to get adopted. And I think that's definitely also an angle where I've had to learn a lot about how to do that and how to get buy-in into investing this much into a project with unclear outcomes. I think that went okay. And I'm proud that it did not go horribly.

CLAIRE: 00:55:57
When you gave the talk at Montreal you actually gave a shout out to your boss Affan Dar for supporting you in your years working on this project, but I guess I've not been a fly on the wall in your one-on-ones with your boss, but it feels to me that there's a ton of support for what you and the team are working on, and the knowledge that like many of the decisions about what gets worked on in a future release, or an upcoming release of Postgres, it's a very bottoms up process. Is that fair to say? And I feel like Affan is supportive of that.

ANDRES: 00:56:37
Yes, I agree. It turns out there were several other managers over time. And I think they were all supportive, but in different ways. And I think managing expectations of the timelines and stuff like that is pretty important, to just not, otherwise you deceive your manager which is not necessarily a good idea.

CLAIRE: 00:57:05
All right. So other things that went right that you feel good about. I have one to throw out there. And you're going to shoot me for bringing this up because your moment of fame is behind you. It happened in like whatever that was, March, April, 2024, something like that. It was over a year ago. But were you actually, I think Thomas Munro had sent you something and asked you to do some performance testing on it and that's when you discovered the XZ Utils security backdoor [That's true.] and reported that security issue, and that kind of blew up the internet for a little while, but wasn't what Thomas Munro sent to you to investigate, wasn't that AIO related?

ANDRES: 00:57:49
That was the read stream interface. We were trying to figure out why it had some regression in some observed workloads, and as part of that I did all the tests where I then found that SSH was using too much CPU, and yeah, [So that's something that went right.] it turns out that it's good, very good, to learn about low-level benchmarking. It has unexpected benefits.

CLAIRE: 00:58:18
Yeah, I remember seeing an email from someone, I won't name names, but they were like "a database engineer wasn't going to be running low-level performance benchmarks like that, you've got to be kidding me," but they clearly have never met you, and are unaware of your commitment to investigating performance problems and getting to, I don't know, figuring them out. I mean, you can be very stubborn, can't you? Is that fair?

ANDRES: 00:58:49
I refuse to answer on the grounds that it might incriminate me.

CLAIRE: 00:58:55
[LAUGHS] All right. I'm going to go look at the chat really quickly because there's a bunch of other Postgres developers who are on the live parallel chat that's happening while we're doing this recording live, just to see if there are any other highlights of things that went right that you're not thinking of right now. Because I'm fishing, fishing for anything else you want to call out. I mean, for developers listening to this, is there anything else you did that you were like, huh, people making these kinds of large-scale architectural changes should definitely do this. And we did it, and you feel good about it. Fishing...

ANDRES: 00:59:38
I mean, I think one of the things that turned out to be very crucial for being able to merge AIO was that we got a lot of review by Noah Misch, and that was not something that I had actually planned for. And I think that went very well, and I'm very, very thankful to Noah that he invested so much time in it. And I think in hindsight I would have probably tried to do a bit more backroom dealing, like trading of reviews with other people, to line them up ahead of time so that I could be more confident that it would be reviewed. Because I think, from a community politics perspective, and from a diversity of thought perspective, maybe that sounds not quite right, but if you work closely on one project together, like the team at Microsoft on AIO, then you might not see problems that somebody who comes at the problem more freshly from the outside will see. And that was definitely the case with Noah. He found a lot of problems that I just did not think about. And I'm glad that that happened, but in hindsight, I should have invested more. That was luck, that was not skill, that that happened. And I think luck is a skill, but I would invest more in trying to line that up ahead of time next time.

CLAIRE: 01:01:07
So to try to put a fine point on what you just said, it wasn't luck that Noah found the problems, because that's something that he's probably good at, [He's very good at that.] it was luck that you enlisted Noah to help do the reviews and find the problems. Is that correct?

ANDRES: 01:01:25
I did not enlist Noah. He volunteered. That's the luck. He just did it.

CLAIRE: 01:01:29
Oh, he volunteered, even better. [Yes.] So for those of you who don't know Noah Misch, he's a Postgres committer and contributor, he works at Google. And I think the first place I ever met Noah was at PGConf.dev in Vancouver last year, and he was there again this year too. And that's the annual conference where a lot of the Postgres contributors and engineers come together. And some users, but I would say mostly contributors. Okay, so you've given shout outs to Thomas Munro, Bilal Yavuz, Melanie Plageman and now Noah Misch. Is there anyone else that you need to be sure to give a shout out to, or are there too many people to possibly list? This is like the Academy Awards now where you're trying to fit everybody in.

ANDRES: 01:02:17
Thomas Munro, I think, did a fair bit of work too, both on the actual AIO subsystem and upstreaming the read stream interface in a very different form than what I had prototyped. And then I think I had a lot of discussions with various people over the years about different aspects of it, but I think the people that were just mentioned are the most important ones.

CLAIRE: 01:02:53
One of the things that Melanie just chimed in on the chat and said is that getting someone experienced to review these architecturally significant patches is hard because it's just so much work, it takes forever to do, she said, and that is probably what makes Noah's contribution so, so good. That he not only did it, he volunteered for it, is what you're saying. Okay, so you've given a shout out to Thomas, to Bilal, to Melanie, to Noah, is there anybody else that you want to shout out to, or too many to list?

ANDRES: 01:03:33
I would probably have to look in the commit messages. There were lots of other, smaller, projects that were done. I think David Rowley did some prerequisite work. I had a lot of discussions with Robert Haas about architectural aspects. I had lots of discussions about parts of it with Heikki Linnakangas, and he also did some review. But I'm sure that there are many more that I'm just not thinking of right now. It's been, after all, like six or seven years.

CLAIRE: 01:04:09
Well, and that's one of the things that's nice, in the commit messages, the team does a really good job, and I'd say a better job this year than in the past, of including who reviewed this commit, who were the authors, who tested it, who was it reported by. Like, I think in a lot of open source projects giving credit where credit is due is an important part of the culture, and I think that's certainly true in Postgres as well. So yeah, it's fair to say that there's a lot of other names that are listed in the plethora of commits that are associated with this project. Okay, so are there any other lessons you want to highlight to someone who's listening, who maybe is about to embark on their own architectural project, and is trying to make sure they only make original mistakes?

ANDRES: 01:05:23
I think one other aspect that I only somewhat mentioned is to take care of yourself if you do something that takes this long. I found that development progress definitely was associated with how well I was doing in my personal life, and exercise, and all that kind of stuff, and that I feel that more strongly with projects that take this long because I can look back and remember I was working on this part when I was sick or something like that. And I think that is, particularly for long projects, something to remember that it's important to take care of yourself and not just invest into the project and do more and more hacking.

CLAIRE: 01:06:13
There was a woman that I used to spend a lot of time with at swim meets, both of my children were competitive swimmers growing up, and when they're at swim meets oftentimes you're literally standing by a pool for the entire day, like for hours and hours, like you're there for the whole day and they swim for, you know, three minutes or something like that. But she had just taken a job. It was a really big job as an executive. And what she realized in taking on all that additional responsibility is exactly what you said. She had to take care of her body in order to be successful in her job. So she had to change her diet. She had to find a way to exercise and have an exercise routine that she could do even when she was traveling and in hotel rooms. I think you're right, you got to take care of yourself or your brain isn't going to be able to do everything you need it to do. So that is worth shining a light on, I'm glad you brought it up. Anything else you would tell past Andres, if you could go back, and whisper in your own ear?

ANDRES: 01:07:29
I mean, all the mistakes I mentioned. If I knew ahead of time which architectural decisions would be wrong and which prerequisites could be tackled independently earlier, then I would probably tell myself that. But that feels like it's not really the answer to the question. [Oh, it's the answer to the question.] Yeah, I don't think I otherwise have anything very smart to say, unfortunately.

CLAIRE: 01:08:02
All right, well, before we wrap, I guess I'm curious. You're here on a podcast. This is the second time you've been on the Talking Postgres podcast. Thank you for that, I really appreciate it. I didn't know if you would say yes, but I thought it was important for you to share your learnings from this project so that it will benefit other people. But I'm curious, now that you've been a guest, you've been a guest on this podcast twice. You've also been a guest on Oxide and Friends, and another security related podcast, maybe even more that I don't know about. But I'm curious whether you listen to podcasts.

ANDRES: 01:08:46
I do listen to podcasts, but mostly non-technical ones. I think most of the time when I'm listening to podcasts I'm trying to let my brain do something else rather than focus on technical things because I already spent way too much time thinking about Postgres and stuff like that. I occasionally do listen to technical podcasts, but it's mostly when somebody mentions that something is particularly good when it's like very square in my interests. But most, yeah. I don't listen to too many technical ones.

CLAIRE: 01:09:31
Okay, well I won't put you on the spot and ask you what they are. But I will say thank you, for the work that you do on Postgres. For those of you who don't know Andres's origin story: like I said, you should go back and listen to, I think it was episode 8 of Talking Postgres, where he and Heikki dove into how they got started. But you got started in kind of an unusual path, and I don't know, it's almost happenstance that you landed in Postgres. And I think it's fair to say that I, and a ton of other people, are really glad that you did. So I feel very lucky to work with you, and I guess that's a little bit of a fangirl type of thing to say. But I do, so I'm saying it.

ANDRES: 01:10:15
Thank you.

CLAIRE: 01:10:17
Yeah, and thank you so much for joining the show. I don't have any other topics or questions for us today. So unless you do, we will give it a wrap.

ANDRES: 01:10:25
Cool. I don't think I do right now.

CLAIRE: 01:10:31
I want to say thank you to Andres Freund for joining us. And if you're listening and you liked today's episode, and I hope you did, and you want to hear more of these Talking Postgres episodes, you should subscribe, on Apple, or Spotify, or YouTube, or wherever you get your podcasts. And please tell your friends. If you tell your friends or leave a review, that helps more people discover the show. Word of mouth is the best way to discover a new podcast. You can always get to past episodes and get the links to subscribe at TalkingPostgres.com. And transcripts are included on the episode pages on TalkingPostgres.com too. And a big thank you to everybody who joined the live recording and participated in the live text chat on Discord.

Creators and Guests

Claire Giordano
Host
Head of open source community efforts for Postgres at Microsoft. Ex-Citus Data, Amazon, Sun Microsystems, and Brown University CS. Serves on PGCA board. Prolific Postgres conference speaker. Co-creator of POSETTE: An Event for Postgres. Loves sailing in Greece.
Aaron Wislang
Producer
Open Source Engineering + Developer Relations at Microsoft + Azure ☁️ | Go (golang), Cloud Native, Linux 🐧 🐍 🦀 ☕ 🍷📷 🎹 | Toronto 🇨🇦🌎 | 💨😷💉 | https://aaronw.dev/hello/