You're probably already using Postgres: What you need to know with Chelsea Dole & Floor Drees

You're probably already using Postgres: What you need to know
===

CLAIRE: Welcome to Path To Citus Con. It is the podcast for developers who love Postgres, where we discuss the human side of open source, databases, Postgres, and the many PG extensions. The podcast is now available on all your podcast platforms, and you can get to past episodes and get links to all the platforms at aka.ms/PathToCitusCon, all one word. Transcripts are included in the episode pages on Transistor, too, and I'm Claire Giordano.

PINO: And I'm Pino de Candia. Today's topic is You're probably already using Postgres: What you need to know. We have two great guests. Our first guest is Chelsea Dole, Staff Software Engineer at fintech startup Brex.

This past year, Chelsea's made the rounds in the Postgres community conference circuit, with lots of talks including at pgDay Paris, Nordic PGDay, Citus Con: An event for Postgres, Postgres at SCALE, and even a lightning talk at PGConf.EU. Hi, Chelsea. I understand you're also a Citus user.

CHELSEA: I used that my last job with great success.

CLAIRE: And our second guest today is Floor Drees. I hope I pronounced your name right. Floor is based in the Netherlands and works as a Community PM at Aiven. Floor is also a Devopsday core member and organizes the Devopsday Amsterdam and another conference that I don't think I can pronounce properly called Eindhoven.

FLOOR: Perfect.

CLAIRE: Okay. I first met Floor at Microsoft at FOSDEM specifically. And Citus was just acquired when I joined Microsoft and a whole bunch of my awesome Citus teammates live in the Netherlands and worked in Floor's co-working space. Welcome, Floor.

FLOOR: Thank you. Thank you for having me.

CLAIRE: So a little bit of background on this topic of "You're probably already using Postgres: What you need to know."

A lot of applications run on top of Postgres, obviously. But for many application developers, they, they're a little bit further from the database. They're probably working in a framework, whether that's Rails for Ruby or Hibernate or Spring; Django for Python. There's obviously a ton more. So as an application developer, you might feel you don't need to know a lot about the underlying database.

But what if you did? How could knowing more about Postgres help you? So, that's part of the backstory to this topic, but Floor, you came up with this podcast topic, so can you tell me your backstory?

FLOOR: Yeah, absolutely, because there is certainly a backstory there. I also don't want to take all of the credit, Claire, I think we came up with it together. But my Postgres story started in 2012 when I started learning Ruby on Rails. And we would load it in as the gem we would use for like database stuff, right? And it was just functional DB, migrate with create columns for stuff in your application to work the right way.

And you didn't think about Postgres ever, or MySQL or SQLite, which were also commonly used with Rails apps then, and so you would take for granted that it would do data management for you. And I used it for transactions and other things that companies that I worked at.

And then I started working at Aiven, and I needed to really deepen my understanding of data tools like Postgres which I had then probably most recently used as a data source for Grafana back when I worked at Grafana for a bit. And again, without really lifting the hood on what Postgres is, right and also Rails and Ruby conferences, which is really the world that I come from, didn't have Postgres people talk about data management or Postgres specifically, and in later life slash career, I guess, I found that Dev and Ops could be joined, right?

And had a lot in common, really. Had some common ground. But DBAs were still very much outside of the picture. It was not part of the gang. And if so, even with all of the companies that claim that they're, they're data-driven very few decision makers will actually really know what is happening within the data wranglers that they use.

And that's too bad because I think that having more of an understanding; what the data tools in your tool belt can do can really help you design better, more innovative solutions. So I think that's where the topics comes from. I just feel like there might be innovation and thinking creatively without constraints, but actually understanding the tools that are in your tool belt.

So, there there can be a lot of innovation in knowledge as well. So I think that's the long-winded explanation of the sort of backstory of why I wanted to talk about this topic specifically.

CLAIRE: Well, it sounds like you and Chelsea have something in common as well, because when I was talking to Chelsea beforehand, Chelsea, you said that you started out as a full stack backend engineer and that you have absolutely zero formal education in databases.

So... I'll circle back to Floor in a minute, but Chelsea, I'm curious, like, what were your first steps? How did you get started understanding Postgres better?

CHELSEA: Yeah, that's completely true. So I, beyond just having no formal education in databases, I have no undergraduate degree or beyond in software engineering or computer science in general.

I actually studied economics and political science in college, and I got into coding through research. I was on this path to want to be a journalist because I always loved writing analytically, which I also use in my work today. But the longer I was in university, the more I met people who were just so passionate about this whole computer science thing, which is something that I never had exposure to at a younger age.

Wasn't at my high school, wasn't anything my family was a part of, etc. So I kind of got a bug for it when I was in college, took some classes, but being from the U.S., you don't exactly change your major three fourths of the way through. The cost of that is a little bit insane, but after I graduated from college, I, only a couple months later, ended up doing a bootcamp.

I remember I was trying to decide if I should do it because I was applying for jobs and I had a final interview. Or, actually a Marketing Analyst position. And I decided, okay, if I get the job, I should take it. But if I don't, I'm going to do a bootcamp. And I did not get the job, but I remember feeling really relieved because I was really excited to delve more into it.

And actually, as a side note, I find it very ironic that the reason I actually failed the interview is I didn't know SQL. Which I definitely know now. So I remember I was trying to, you know, binge-learn SQL the day before this interview for a Marketing Analyst role. But, you know, I did a coding bootcamp, which coding bootcamps don't focus on backend or databases at all, because, and that's the correct thing, to be clear.

I think that they generally prepare you for a sort of full-stack role. The one I did, you can concentrate, and I, from the beginning, was a big fan of backend. I liked the clarity of it. And I found, I've never been really a designer, at least visually, so I think that being frontend didn't appeal to me very much, but as I started to go through jobs one after another, I found myself drifting further and further from full-stack on paper to actually doing backend at my first job, and my second job I was hired as a data engineer.

My second job, which is at Coffee Meets Bagel which is a dating app. I would say that's where I learned the vast majority of my database chops. And it's a smaller startup, I think it was around 50 people, and I was hired as a data engineer, which now, no longer working there, I can admit that I was utterly unqualified.

But they really took a chance, and also I would say, it's kind of special and unusual to hire your only data engineer at a mid level. So, at that job, where I started not even knowing really what a replica was, and being hired as a data engineer, I ended up learning so much through the process of, "oh crap, I really should know this."

Let me figure this out. And so I think that most of my Postgres knowledge, because Coffee Meets Bagel is a Postgres shop, was learned kind of trial-by-fire. I was technically a backend developer slash database engineer, and the more I started working there, the more I realized that the problems I was trying to solve, and the problems that I got myself into were often around either: I would say database design or access patterns or query specifics that I didn't really understand when I first started, or even just infrastructure that I wasn't, I had never really worked in infrastructure before.

So I think that that journey was how I got into it. I started out liking backend but having no education in it. And I got myself hired to a job I was unqualified for, and then taught myself how to be qualified. And also, I would add, that sounds a little bit... I would add to that, I also had some good mentorship at that job, and since then, so that's always a big part of it, but mostly it was just "Oh crap, what is this? Let me figure it out."

PINO: Chelsea, could you clarify for me, did you start learning Postgres at the first job or when you became a data engineer?

CHELSEA: I used it exactly as Floor did. Which is, we did use Postgres. Specifically, actually, and a bit of MySQL, but it was just this invisible thing in the background that "just works," you know, TM.

So, it was, oh, I'll save something to the database, and I had no idea where it was hosted, I had no idea if it was fast or slow, even observability wise. It was just something that didn't exist to me. So, I would say that... It existed and I did use it, but my knowledge was at most ORM-level.

PINO: And let me ask something else about, about being a data engineer, since you were the only data engineer at the company, what did your colleagues rely on you for and how did they interact with you?

CHELSEA: So, my job there was specifically as a data engineer for the machine learning team, because Coffee Meets Bagel is a dating app, and the thing that they specifically try to provide is machine learning- based matches. So I was really supporting data scientists by building infrastructure for model training and things like that.

I would say that I would interact, my users there were people on data science, so that was interesting trying to prepare databases and infrastructure and systems that would face internally where, you know, latency isn't as important and you're building potentially pretty high throughput systems.

And then I also had this second job, which is: make everything fast, which I think a lot of data engineers can probably relate to. How successful that is, hard to say, but those are two very different use cases. I would say generally people come to you as a data engineer when something is wrong, when something is slow, and so I think that that trial-by-fire element really came through.

Because no one likes not having the answer when somebody says, "why is this so slow? We really need to fix this. This is a problem now." So I would say that that's kind of how people interact with me from these two very different use cases, internal data scientists and model training where latency is not a problem and then very latency-sensitive user-facing applications.

CLAIRE: So you've covered a lot of ground and we want to talk about, obviously, Postgres and ORMs and how to give application developers maybe some places to go to better understand the underlying database, why that might help them, but you said something about hiring strategy that I just can't let that go.

You, you said basically that you were unqualified for that data engineer job. But what's really interesting is there is a hiring strategy where you hire someone not because they have done that job before, not because they have the requisite knowledge to do that job on day one, but because they have the skills, and the motivation. And that capability of learning. And maybe even inventing things that haven't been done before, right? Because they're not limited by, "oh, this is the way we always did it." So I don't know. I wonder if you are wrong when you say you were unqualified, maybe your skillset and your motivations actually made you uniquely qualified.

You just had some learning to do. And I think you can tell what my hiring strategy is when you hear that.

CHELSEA: Well, I appreciate that. I think that there is qualification and effort and willingness to learn. And, you know, it's worked out for me so far, so I guess I can't speak too badly of it. And obviously, I'm also very grateful that that happened.

CLAIRE: Yes, yes, because whoever that hiring manager was obviously did take some risk. The proof wasn't there yet. Clearly, they were right.

So, all right. So you also said something else. You said that your knowledge when you first got started in job one, the database was this invisible thing that just works and that your work was almost at the ORM level, right?

Beyond that, you weren't sure what was happening underneath the ORM, what those SQL queries were, etc. I think I heard the same thing from Floor. Is that right, Floor?

FLOOR: Yeah, that's absolutely correct. I just used it as, you know, taking something for granted. It's part of how to build an application, but I never really thought about it.

I also come from a very non-traditional background because I studied art actually. And then sort of fell in love with the internet and wanted to figure out how the internet works. And that put me on this trajectory of learning, learning to program and, and learning more about all of the underlying structures.

I had a similar sort of trial-by-fire where when I just joined Aiven we were migrating documentation from one platform to the other. And while doing the migration, checking all of the documentation as well. So had to do a lot of well, I had to build all of these things and try and figure out if that's if, you know, the documentation was still up to date, still valid. And that taught me a bunch of stuff that was really super useful. And then also at Aiven internally, we have a wonderful thing that is called a Let's Talk Aiven Tech, where we cover all of the data tools that Aiven offers. And there will be different people from the Solution A rchitecture or Developer Relations that will go through all of these tools, show a little demo, answer all kinds of questions, specifically geared to the roles at the company that aren't, like, traditionally technical.

And so that's such a useful primer. And the invitation is there, too, to follow up with the person who did the talk to learn more and to ask questions whenever you get stuck. So, likewise I also had a lot of help and mentoring but also, it just really helps when you're thrown into this project and you just, you have to figure it out and you have to ask for help.

And that's been sort of a pressure cooker to learn more about all of these tools. And there's so many things that I didn't know Postgres could even do, right? So, it would need to say a lot of things on the tin to do what it says on the tin. There's so many things that are natively supported that I didn't even know about.

I didn't know that there would be, you know, international character sets because that's something for, I've worked as a community manager for a long time and work with communities and like forms and input fields and names and things and characters are, it's super difficult.

And so I didn't even know that that's in the capabilities of Postgres. And I didn't know that outside of whatever is natively supported in Postgres, there is this vast extension network, all of these things that you can plug into. And so, my knowledge was so incredibly basic and I've learned, my learning curve over the last 12 years that I'm here now, it's it's just been incredible.

CLAIRE: As I was thinking about the audience for this podcast, right...application developers, maybe don't have a deep understanding of databases, maybe didn't study it in school or haven't actually worked in that part of the stack. And I was remembering that picture on Twitter that I think you might've posted, Floor.

And it was of Karen Jex giving a conference talk in the last year or some time to a completely packed standing-room-only room. And it was about Postgres, because Karen works for a Postgres company and that's her area of expertise. But I seem to remember, like, that was the only Postgres talk at that conference, and that was standing-room only.

Am I remembering that right?

FLOOR: I mean, almost, right? It was at it was at KubeCon in Amsterdam this year. And it was not standing-room, but it was... The room was packed and there were rows of people standing because there were no seats left. And I initially went to see Karen's talk because, I don't know Karen, and I wanted to, you know, support, but apparently my support was not needed because the room was packed, everyone was super interested, and indeed it was the only talk really about databases and, you know, specifically Postgres and Kubernetes. And there were so many questions afterwards. I wanted to, you know, like after a talk, I want to be like, "Oh, you did a wonderful job," but there was no way of getting to Karen because there were so many people lining up to ask even more questions.

So, it was really cool to see because as mentioned when, for the more programming-language- focused conferences like Ruby or Rails, there was never really a day. Our database or Postgres talk or MySQL or any of these in a mix. And so it's not a topic, that you, sort of organically you know get to learn more about.

And so this was just good to see that there's an interest. And that's just fantastic.

CLAIRE: Very cool. Wish I'd been there. Okay. So what, what are the key things you recommend that app developers learn about Postgres when they want to have that deeper understanding? Chelsea, you want to start?

CHELSEA: Sure. As I think that both Floor and I have mentioned, you know, when you start out, it's just this black box that just works.

And even if you are setting up a personal project using Postgres, you're really just doing something on localhost. There's no optimization needed. But I find that the two ways that developers usually start to interact with Postgres deeper are ORMs and migrations. So, this is me personally. I would say that the way I learn things, I'm very, well, I guess we're on a software engineering podcast, so I can say that I'm more of a depth-first person than a breadth-first person.

I like to find an entry point and just jump in deep. And so I think that these two entry points are a great place to start. So, for example, if you're used to using an ORM just in your day-to-day life, when you start running into things that feel confusing or the syntax is weird, then just ask why?

Let's figure out what's really being executed. So I think that taking those two very relevant pieces of technology: migration systems, such as whether you use Flyway or Ecto or, you know, anything in Django, and starting to actually learn that further as a springboard to understand what's actually being run.

Because, ORMs can be quite dumb. They often don't run things as you need to, and they often, in scale, lead to some pretty gnarly data problems. So, understanding that that can happen and starting to unfurl through there is great. With migrations, you similarly will run into issues eventually. When will they lock or not lock?

How do you actually see how long it's taking to run? What happens if a index fails? Why won't it just recreate? It seems like it should. Why does it say that it's invalid? You know, these are sort of natural questions that will come up through actually diving deeper into what you already use. I think that's a great approach, personally, because I think it can be overwhelming if you say, "Okay, well, I use the Django ORM, so let's look up what this really niche setting is on a Postgres server."

Because there's no real impetus for learning that at that point. So instead, I think it's great to take what you already do, and... Just intentionally dive deeper into the weird things that come up. You know, instead of just saying, Oh, it's slow, I guess that the query just doesn't run well. You can say, "okay, why slow? Let's Google little bits of this." "Okay, what's a wait event?" " Why is this ID so much slower than that ID? That seems really confusing." "Why would Postgres be different that way?" And those sort of depth-first conversations can be a great place to start compared to just... You know, throwing yourself off a cliff into something that feels like maybe a best practice place to start, but has no grounding in what you're working on.

Does that make sense?

CLAIRE: Yeah. I mean, you start with asking why, and it sounds like you're going down rabbit holes, right? Asking questions, peeling the onion, searching on Google. But it sounds like you're recommending, the way you like to learn is not just depth-first, but very specific to the problems you're working on.

It doesn't sound like your preference is to go learn things in the abstract, right, in theory, but you like learning them in the context of whatever the project is. Did I get that right?

CHELSEA: I think that there's a place for both. I think that when it comes to, when it comes to me, I do like to dive deep and I find that asking those questions, brings me around to the big picture.

If you, like me, when I first started, have no grounding in any particular databases, I would recommend reading, I believe it's, I don't have this in front of me, I believe it's "Designing Data-Intensive Applications." It's an O'Reilly book. Reading that before I really got into Postgres specifically, I found gave me a lot of grounding in general database topics.

So much that when I ran into those problems, I could remember the terminology that I just didn't know before. So, I would say that if you are not familiar with you know, I would say terminology like a write-ahead log, which of course I wasn't familiar with at some point or a replica or, what is even the purpose of replicas? Or, you know, things like that, that sort of broader knowledge application, I think is a good place to start.

I'm a personal advocate of there's no, even if you're starting with Postgres, you don't need to start Postgres-specific. If those are your questions, answering broader questions first is a great idea. And then diving into the specific technology with that depth-first approach. I think that a great thing you can do on a slightly more specific level is try to, if you're learning in this depth-first way, where you're kind of learning-by-doing, make sure that you or someone on your team has some basic observability, because things like logs help create those questions. If you don't have questions to answer and kind of rabbit hole into, like you said, you start kind of going in misguided directions. So having some logs, observability, I mean, that gives you the stuff to Google. So, and that helps you answer questions and dive deep like I was saying.

PINO: Chelsea, I just wanted to make sure I understood the recommendation. So there's a temptation since it's a black box. The database the ORM is talking to, just seems to work, that you don't need to dive deep. Do you wait? Do you suggest that people wait until they hit a problem and then they start to investigate this stuff?

Or really, as a way of working, the ORM is a layer that's worth understanding how the ORM creates queries, what the queries look like when executed, how fast they are, etcetera. So that's something just if you're working in an app that has an ORM integrated that you that you should be paying attention to both the database layer and the app development as just a standard way of of doing your development.

CHELSEA: I would say so. I think learning to be what I usually call "data aware." I mean, maybe not your spec- you might not be specialized in databases, but being aware and conscious of what you're interacting with more than a black box, that's good, even if there's no problems. I think that you can be asking yourself those questions and asking why this, why that, even if it's not actively on fire, might be the opposite way around.

Why do I have to use this keyword when I'm running a migration? Like, what does that actually do? How does that prevent, you know, locking. What is a lock? I think those questions are kind of good ones to follow. Yeah, does that make sense? I do think it's of course worthwhile to learn and understand without there being problems, so to speak.

I think that speaking for myself I'm very driven by I would say, like, Floor used the phrase "pressure cooker," I think. I'm very driven by the pressure cooker. And so even if it's not because something is actively on fire, I think that it helps me to feel like the motivation and the question coming from something tangible.

PINO: That does make sense. And I would, from what you and Floor have talked about, it sounds like there are many times when an app dev might not need to because, for example, there's a data engineer on their team. And so, for example, in the case of migrations, it kind of works in development on my test and and actually running it in the production environment.

That's not something that I'll have to do. So what is that interaction like? Perhaps, maybe you can make a copy of the of the of the actual production database to run the migration. You run into different problems. What do you recommend there?

CHELSEA: Well, I think the first thing is, if you are lucky to have a database engineer just start asking them questions.

Database engineers are often few, far between, and kind of stressed. And so they're always really thrilled to have people who are on the app development side asking questions and wanting to learn. So I think that is an amazing place to start if you have that network. Any kind of staging environment is usually a gift and a good thing to try out migrations and queries on usually more so than local development, because you wouldn't have any kind of scale on local host.

So you have access to a non-prod probably cloud-based or just a separate server. With "prod-esque" data, that's a more realistic place to start, I would say, but yeah, my first suggestion would be, you know, using your network, if you're not running the prod migrations, who is? So somebody is doing that, or at least knows how they're run and, you know, again, with software engineering jokes, they, you know, our databases are high availability. They want to be high availability too. You never know when they want to fail over and you need to start running the prod migrations.

PINO: Great. Thank you. And maybe we'll ask Floor the same question: what are the key things you recommend app devs learn about Postgres?

FLOOR: Yeah, sure. And first I want to say that "prod-esque" is a term that I'm going to steal and use a lot.

Chelsea, I love that.

CLAIRE: Yeah, and I'm trying to figure out, do I hyphenate it? Do I put a little apostrophe there? I'm loving that term too. And I've now written it three different ways in my notes.

FLOOR: I'm using a hyphen. So, you know, like, I don't know. I'm going to use that one. I think for me, it's important for app developers to check their assumptions.

I think there are some assumptions about what Postgres does and doesn't do that might not be entirely correct. So, for instance, when talking to app developers, I noticed that a lot of them don't really know that Postgres offers concurrent database access, for instance. And, you know, so there's no read log, readers can, you know readers don't block writers, writers don't block readers.

And that's, you know, the MVCC is not something that a lot of people are very aware of that are of the app development persuasion. So that should be something that people are looking into. And then on the other side, I think we had a talk at Devopsdays Amsterdam this year, where someone was talking about the shiny object syndrome.

Don't know if they coined that term, or if that has been around for longer but we tend to, as app developers, look at all of these new and shiny frameworks and tools, and we want to try and implement all of those because, you know, they're fun and exciting and all of that kind of stuff and there is so much strength in... can I say "boring technology?"

Like boring is not a bad thing, right? Boring is solid. It just works. And so I would want to say to app developers too, before adopting more complex data infrastructure, for instance, you'll see companies that are using Postgres to also add OpenSearch and Redis. While those are really super interestingset of technologies, it might make a lot of sense to investigate what's possible in Postgres before, you know, adding complexity to your system, because text search is definitely something that is possible, maybe not instant full text searching, but, you know, text search is, this is a possibility or specialized indexing those are all things that are just, you know, those are available in Postgres and there's no need to add more services to your infrastructure that also, like, keeping your infrastructure simple also means that there is easier onboarding when there's new people coming to your team. They don't need to learn a bunch of services if they're not aware of all of these. It prevents legacy maintenance. There is you know, security and all of these upgrades and the different upgrades scenarios for all of the different tools.

That's a consideration too. There is cost efficiency in that. And so, but maybe before that, even at a company level, try and see if you can do, like, little communities of practice and figure out what other teams might be using Postgres or might be using one of those other services and see if you can learn from them before you start adding tools, because it's absolutely necessary.

CLAIRE: Got it. One of the things I like about this podcast is there'll be good show notes afterwards and hopefully we'll be able to capture some of these some of these tips in a nice easy bulleted list. But, I think the key thing is you're going to have to be curious to understand more. And I like this notion of the "shiny object syndrome."

Although, if you live in the Postgres world, you kind of do think of a new Postgres release as a shiny object. There are obviously, even after 30 years, there's still a lot of new capabilities and features that come out every year.

PINO: Well, maybe I'll ask about this shiny object syndrome, even if you are just using Postgres and have more, because it can do a lot of things.

What are your recommendations, Floor, about adopting?

FLOOR: Yeah, excellent. So I think it's both, both the investigating the extension network but also know that, or get to know the open source project a little too, is that, I think that there's an annual release and I think the average amount of new features is like 180 per release, which that's a big number, yo. So that is huge.

And so checking release notes and maybe you will have to interpret them yourself a little too, because a lot of release notes for projects will say like, "fixes, blah," or "adds, blah," without really saying what the added benefit for an end-user might be. But checking release notes and maybe do that in a, like a little project group within the company might be super useful to find things that are features that you can use to your benefit. I have certainly seen it at companies to have these communities of practice come together and go through release notes together to see what it means for them.

Because that's often not really what comes across in release notes that project will put up.

CLAIRE: I was listening to your podcast interview with Corey Quinn, Floor. Screaming in the Cloud. And I know that was a completely different topic. You were talking about community together, but you did mention there that at Aiven, you have a speaker's corner. Where all of you who give talks will kind of get together, I don't know if it's physical or virtual, it's probably virtual given how spread apart people are, and you'll share your talks beforehand.

So are you suggesting that, like a Speaker's Corner in a company, that there be like a Postgres Corner, where people share their understanding of, you know, the latest release, or...

FLOOR: That sounds super nice, and I think that's actually something that is more towards that at Aiven too. So Speaker's Corner, first of all, that is a fantastic initiative for veteran speakers, but also newcomers, because you learn just so much from how others interpret your talk.

That you can't anticipate yourself. And so, I have used Speaker's Corner for all of my talks to, you know, do a dry run and get a bunch of feedback. And it's been super duper helpful. But that's, you know. Entirely besides the point. We do have a weekly demo session as well, where everyone can just tune in and they will demo all of the, you know, like internal tools, but external tools too.

And so, yes, it might be, for instance, Grafana. When Grafana 9 came out and brought all of this alerting technology, we would go through what that means for, what that would potentially mean for our customers to better understand what that brings. And so yes, there's a lot of that sort of polish sharing within Aiven, which is absolutely excellent.

I feel really privileged to be able to really take advantage of all of those things. Yeah.

CLAIRE: All right, so let's circle back to Pino's question for a second about extensions specifically. Not everybody listening to this podcast... I mean, I imagine a lot of people do, but some people probably don't know what a Postgres extension is. So who wants to define it here? Don't everybody go first?

CHELSEA: I can give a go at it. So a Postgres extension is usually an open source addition to vanilla Postgres. It can be something that is included already in the Postgres binary that you're able to install without any access or additional software added, or it can be something that is additional software that you use and need to actually alter your installation.

Depends. Postgres itself is designed to be the most simple version of what it provides. It's designed to not provide the bells and whistles but to give you access to installing them if you need them. The reason being that it wants to suit the best use case or the most general use case and allow people to configure it from there.

This is a different philosophy than some other relational databases, which try to suit themselves to a pretty specific purpose. The purpose of extensions is to basically model your database on the server level, not just the database access level to suit what you need. So some examples might be the classic pg_stat_statements, which can give you history of queries that are being run, which is a great first extension to try.

So it can be as simple as that, collecting more stats, or it can be as complex as Citus, for example, which I mean, I'm far from the expert in Citus in this call right now, but which will deeply and critically change the way Postgres even designs tables or answers queries. So it can be a huge spectrum of how much it alters the basic Postgres installation.

CLAIRE: One of the things about extensions that I think is so cool is that this ability of extent, or this capability was designed into Postgres from day one. Like, if you look at the original academic design paper for Postgres, I think it's number two or number three on the objectives for the database.

And one of the other things that's cool about it is it's enabled a lot of innovation, right? I mean, to get a new capability in core can sometimes take a long time, like years. Right. Whereas by enabling extensions that can modify the database. Runtime, like, as things are running, people can do things like Citus, but there's a hundreds of other extensions out there.

PostGIS is probably like number one on the list and hugely beneficial in the geospatial world. So yeah, I think it's contributed to a lot of the growth of Postgres.

CHELSEA: Agreed.

PINO: Maybe I'll add that it also maybe helped avoid forking. There are forks with Postgres, but perhaps the extensions have helped keep the community together by allowing that, by providing that flexibility.

FLOOR: Can you expand on that a little? I'd love to learn more.

PINO: Oh, yeah, sure. No, I've seen Postgres forks. In fact, Citus started as a Postgres fork. And then I can't remember what year it was well before I joined. 2016, unforked Postgres that got a lot of attention in the community and able to provide, I mean, it's a really, a very impressive thing to be able to provide a distributed version of Postgres as an extension.

A similar one is, for example, that you can do columnar storage in Postgres by using a different table access method. I think the standard one, the in-core one is the heap, but there's columnar storage. Citus has a columnar storage capability. I've seen other forks. So I've seen some forks that are very interesting in the community recently.

There's Neon database, which is serverless. That was a fork, and they've documented why, and their intention is to hopefully, and I don't really know the details, but to be able to get some changes into Postgres that will allow them to unfork. And there's some other examples, but I think that, you know, obviously, vendors will fork to provide some advanced features, but a lot of these really advanced capabilities then eventually make it into Postgres itself, thanks to the extensions. And later, there's a path to going into core, if that makes sense.

CLAIRE: Melanie Plageman is in the audience on text chat, and one of the things she's just typed in is that the philosophy of the Postgres contributors is that things that don't need to be in core to work shouldn't be in core.

And then she has the caveat "usually" underneath it. But, I think it's a pretty interesting philosophy. And, some things like PostGIS, like Citus, can persist as an extension. They don't need to go into core. And what's really cool about being an extension is that typically you can make it work on the latest Postgres release, within, you know, days or weeks of the latest Postgres release coming out. Whereas forks oftentimes, like those lag significantly behind in terms of months or even years. So, anyway, I'm a big fan of extensions, but. We're here to talk about: things that people need to know in order to better understand the underlying database if they're an app developer.

So maybe I'll reel us back for just a second. I know I had shared before the podcast this link to a talk that Louise Grandjonc gave at PyCon a couple of years ago now. And it was obviously geared to Python developers and it was called "Optimizing your Python and Django apps with Postgres superpowers."

And it was definitely had that performance angle to it, right? It's about optimizing performance. And I'm just curious, Chelsea, Floor, if you had a chance to look at it and what your thoughts are about whether those superpowers that Louise talked about in there are still relevant today?

CHELSEA: I watched the talk.

I wasn't familiar with it before, but I would say that the superpowers are extremely still relevant. For anybody who hasn't seen the talk, it is from the perspective of an engineer who works with Postgres and wants to teach how to optimize it through the perspective of the Django ORM. And she does get into some details that are, say, Postgres specific, separate of using the ORM, but that is sort of the entry point she uses.

I thought that was a great entry point because I think a problem I see with some Postgres talks is that they dive in deep to what feels kind of scary. They leave the GUI behind, they leave you know, they leave everything that connects an app dev to what they're familiar with and say, "oh, just jump in, it'll be fine."

And although I do agree, you know, I think, one big advice I would give people is to try to de-risk and stop being fearful of Psql. I'm a big fan, but I think that that is an entry point and she does a great job of talking about how to make that transition. The things that I think are good from that talk that are still relevant to this day are focusing on things like how both things that are pure Postgres, like, "here's how you read an explain plan. This is what it means." To things that are in between the ORM and Postgres. So not Postgres itself, but if you're an app dev, this is your entry point. So, for example "what is pagination?" And "how can I optimize my queries through using different styles of it?" Or "oh, I didn't realize my ORM is selecting every column."

How can I get better performance by selecting specific columns, or using a limit, and things that your ORM might be doing to "help" you, quote unquote, that could actually be shooting you in the foot. So I thought it was a great talk. If you're in the Django community, which, you know, the first conference I ever went to was actually a Django conference or a Python conference, so, I love that.

Then I think it's a great talk if you're looking to make the app dev jump. And even if you're not really into Python, you know, I'm sure that there are similar techniques to use in different languages as well.

CLAIRE: I know that you mentioned logging earlier as something that you pay a lot of attention to, Chelsea, and that helps you kind of come up with those questions of what you're going to need to investigate, what you're going to investigate. And I think she has a whole section on logs as well for Postgres.

CHELSEA: Yeah. Something she mentioned, which I am a huge fan of, if you're trying to get started and understanding what even questions to rabbit hole into is turning on auto_explain and EXPLAIN ANALYZE plan will tell you what your Postgres server is doing to return the answer to a query.

So it'll tell you, "Oh, I'm scanning in this direction and I'm filtering for this." It just tells you the game plan. And also if you run with with ANALYZE, it actually runs it and it gives you some runtime-specific answers. However, if you set up auto_explain, you're able to have this done automatically for you.

And this is a great place to start because running it yourself can run into some pitfalls. For example, things might be cached. You might be running it with an ID that has one row instead of five million rows. It's hard to say. I think a great place to start, if you're trying to answer these questions, you're trying to jump in, is actually just setting up auto_explain with some reasonable min duration, which means that it'll automatically collect those questions for you.

Because you know, running EXPLAIN ANALYZE is essentially, in some ways it's telling you the game plan of Postgres, in some ways it's giving you questions to ask and it'll run it for longer running queries. So I think that's something she mentioned as well that I really agree with.

PINO: Maybe I'll ask a question. I didn't see anything about this in the blog and talk, but do application developers need to be aware of connection poolers and maybe the built in connection pooler in their ORM? And at what point if they do?

CHELSEA: That's a great question.

I think there's two situations in which connection poolers probably just work or just don't exist, and that's with a very small company, possible, you just connect directly to the database or the big company. Somebody has probably already set up PgBouncer or, you know, there's modern or different solutions right now as well, such as Odyssey or PgCat and other options.

I think most of the time it's not actually where I'd recommend jumping in first, because I think it's possible to get around it, and especially if you have an ORM-side connection pooler you can get some of the benefit there. You won't get as much from a centralized pooler that's going to sit in between you and Postgres, so I think that that's not a pure answer, especially if you're running in, let's say, a Kubernetes environment where you have many, many pods or servers which will have their own connection poolers, which, of course cumulatively will pool less than one centralized pooler.

So I would say if you're thinking about where to start off, that's not where I'd see the most benefit to jump in. But I do think that the benefit of understanding that your ORM might have a pooler is pretty high. So I think that I would say if you're going to come into it from the app dev perspective, look at the ORM side first.

And then ask your, again, like I said, DBAs, they're out there to be reached out to. If you can figure out if your company or if your project has a centralized pooler, that will give you a lot of understanding. And if they don't, which, if you're a larger company with lots of connections who doesn't have it, then that is an amazing place to start off in suggesting it, because the benefits can be really huge.

That was a bit wandery. Did that make sense?

PINO: Yes, it did. Thanks.

CLAIRE: That was great. So, one of you, as we were preparing for the podcast, said, it might have been you, Chelsea, although I could see Floor saying this too that, "databases can be your best friend or your worst enemy." It's kind of dramatic, but I want to know more.

What did you mean?

CHELSEA: That was me. I think I do have a flair for the dramatic sometimes. But I do stand by this. I think that when I said that, what I was thinking about is how app developers try to kind of avoid jumping into the database sometimes. And sometimes there'll be so many code optimizations, even infrastructure optimizations before optimizing the database itself. So, you know, you think about a developer who's okay, I want to go from this O of N runtime, you know, I want to remove this for loop and I can make it faster by, you know, I can cache locally and this LRU cache and there's optimizations you can make on the code side, but they won't jump into the Postgres or, in this case Postgres, but the database side, whereas I find that databases can be your best friend, because if you notice that a lot of your timing is from the database, that is a huge place for optimization and improvement. That's going to pay dividends much faster, almost always, than, you know, optimizing a for loop.

But it can also be your worst enemy in that the complexity of doing so and the ability to kind of run into this kind of deep knowledge that feels very confusing can be very challenging.

So I would say that it's your best friend and that it's such an amazing place for optimization that needs less maintenance potentially, and needs less code work than optimizing a loop, but it can also just be a very high barrier to entry. And sometimes when, especially if you don't have observability or monitoring it is just this black box and you don't understand how to get started. Whereas, most developers can say, "Oh, I can remove that for loop." It becomes a higher barrier to entry.

CLAIRE: Okay, so you just said something really important. You said especially if you don't have observability or monitoring. So how do you recommend an app dev who wants to consider, who wants the database to become their best friend, who wants to be able to optimize at the database level?

Where do they get this observability and this monitoring? How do they figure out if the database is their bottleneck?

CHELSEA: At a high level, I would say the best thing that you can try to do is get some form of traces-per-request. So, just to see what percentage of your request time is spent on different steps, and that can be really illuminating, you know, everything from Grafana to Datadog to the observability that might be built into a cloud-managed service.

That will usually, well, not maybe the last one if it's only on databases, but that can tell you, "okay, your request took 500 milliseconds." And 80% of that time is coming from database queries, or there's actually 200 very fast requests in that 500 milliseconds. And I think that the end-to-end information is where I find the best place to start.

CLAIRE: Floor, I know you used to work at Grafana, too. I mean, do you want to pile onto that? Any tips for application developers who are trying to figure out if the database is organic?

FLOOR: Absolutely. I think you should definitely have a look at a tool like Grafana. And do some benchmarking there.

But also to tag along to the, or, pile on the earlier question, I think we don't know what we don't know. And I'm at Devopsdays Chicago right now. So I'd be also remiss to be like, "Hey, you know, we have dev and ops working together," but we're not talking about, it's not called "dev DB ops" or something.

And it shouldn't be called that because we should be talking to our DBAs and our data engineers, like, organically, I don't understand why they don't factor into a lot of these conversations that we're having. And I just want people on the dev and ops side and just generally application developers to be more curious about, you know, the little quirks of a particular data tool that you're using, if you have zero familiarity with any of these tools and just use them for granted as we did, you know, when we were still young and unknowing it's such a big miss, right? It's unfortunate. So yeah, we've learned.

CLAIRE: We've all been there, right? We've all been at those stages in our careers where we don't know what we don't know. And yeah. And at least I still have more to learn and lots of new areas. So, but yeah, it means I'm not bored, right? Hopefully none of us are. Okay, so be curious if you're in a larger organization and you have data engineers and DBAs and people on the operational infrastructure side, you're saying talk to them. Is that right? Obviously, some people are in small companies and they don't have specialists like that to talk to.

Obviously, yeah, how would they go about it? Well, obviously there's fewer people to talk to, which actually maybe makes it a little bit easier, right? That's kind of what Chelsea's situation in, was it your first or your second company, Chelsea? It was a smaller team, right?

CHELSEA: Yeah, it was my second company. And like I said, I was then hired as the only data engineer. However, there were definitely people that knew more context about what we were running. And so even if they're not, even though they don't have the name DBA in their title, people that have more context on why choices were made and what trade offs have been made in making these decisions is a good place to start.

Even if their title may not seem to be database-specific.

CLAIRE: So. If a developer was going to do just one thing, you've each made a number of recommendations based on your experiences, right? Becoming more savvy at the database level over the years. But if an app developer was going to do just one thing as a first step in becoming more Postgres savvy, what would you recommend it be? Pino and I are going to ask each of you this question.

CHELSEA: I can go first. I have already shilled a little bit about auto_explain and EXPLAIN ANALYZE, so I won't talk about that again. But I will say that the number one thing that I think is helpful is trying to remove the fear that comes with working with databases as an app developer. Because when you're an app developer, you work with an ORM, and it feels safe because it's code. And there's a certain fear that comes with, you know, using Psql, even on, for me, even on staging, I'd be running, you know, SELECT LIMIT one, I'd be like, "did I somehow include DROP TABLE?" I don't know, I gotta double check.

And you obviously have to be careful, and I'm talking about non-production environments here, but as you start to feel comfortable using a shell, as you start to feel comfortable maybe executing queries outside of the ORM, this opens up those questions that I mentioned. It opens up behavior and new commands that you didn't know you could run.

And, it opens up discrepancies between what your ORM is doing and what it's running and why is that so much slower or faster than what I try personally? So I think that although you don't need to use Psql directly to be a good database engineer, I know people that don't and prefer, you know, DataGrip or prefer something else or SQL PAD, I think it's a great place to start because it gives you that springboard into depth-first like I was talking about, and you're really able to start to drop the fear and to make databases feel like a known quantity instead of this black box.

PINO: And Chelsea, would you say pgAdmin is another tool that is approachable in the same way?

CHELSEA: Yeah, absolutely. I haven't personally used it as much, just circumstantially, but I would say everything from pgAdmin to DataGrip to anything like that, is really just the same. If you use a GUI, that's fine.

I think Psql will help with that a little bit more. But it's about, like I said, it's about dropping the fear. So whatever gets you there. And you can always go back and forth between different tooling as well.

FLOOR: I love that.

PINO: Thank you. Yeah.

FLOOR: So I love that Chelsea says dropping the fear. So I would say seek out the fear. I know controversial.

I would say, really actively seek out to learn more about something that is not in your immediate tool belt or something that you have immediate experience with. Now I'm not a great developer, but I'm a very extroverted developer. And so if I if I want to learn more, I want to seek out people inside my company, but maybe also outside my company to learn more about a topic that I really want to know more about. And so I would really love for to see more companies, these communities of practice doing Lunch and Learns and just learn more that way. I remember that, you know, like I moved to Berlin at one point from Vienna where I hosted a whole bunch of meetups when I was learning to program because you know, learning to program by yourself is well, for me it's incredibly hard.

I need a little bit of peer pressure. I need to learn with others in order and explain things to others in order to, that way it will it will stick with me more and quicker. And so I moved to Berlin to work most- sorry. Mainly as a developer. And didn't organize any of these events anymore.

And I just felt like my learning sort of, like, slowed down so much while, so I was thinking like, "Oh, I'll be working as a developer in a, you know, like, I'll be doing this every day," the whole day, and that will excel my learning because I'm working with this day in and day out and turns out it didn't, and when I started organizing events and that again started to accelerate my learning.

So yeah, no. Actively seek out, you know? Places outside your comfort zone and seek out knowledge from others. That's what I would say.

CLAIRE: So it's interesting because you said seek out the fear and Chelsea said remove the fear and you said "controversy," but it's actually not that controversial. Like, the thing that's going to help you remove the fear, I believe, is by rolling up your sleeves and just trying. And doing. And I used to work for someone who would say, when they were trying to drive change in our team and the company and our mission, was that "you guys have to get comfortable being uncomfortable, right?"

You can't just do the same, same thing that you know how to do blindfolded. You've got to get uncomfortable if you're going to learn and grow and succeed.

CHELSEA: So maybe we can combine forces and say: seek the fear so you can drop the fear.

PINO: It might be the case, too, that onboarding folks has to start including the data side and and then things like dev containers and pre-prepared environments for development help make it easier to just to get started because it's right there. You can connect to it. You can make that part of the of the first steps in joining a project.

CLAIRE: Well, I think that we have covered a lot of ground thinking about this. You're probably already using Postgres, you know, now what, right? What you need to know. I think you guys have both shared some super useful things that you've done, and is there anything else, a question that Pino and I, for example, did not ask that you wanted us to ask, that you really want to share with app developers? Maybe based on your experience or what you've seen somebody else do.

FLOOR: I thought Chelsea started talking and so I shut up.

CHELSEA: No, no, you go ahead. I need to think.

FLOOR: It's more a call to action to start learning more about databases. What they mean to you, what place they take in your world, right, or what place they could take in your world. I really want to encourage people to start, when they go on that learning journey, to start teaching others early and often and not think like this might be a journey that's only interesting to them because I assure you it's not, so it's more of a call to action for people that when you start that learning journey grab others to join you there. And you'll see, you'll end up with a group that is incredibly enthusiastic about tackling issues with all of this newfound knowledge.

CHELSEA: I think that a conversation that we didn't have that I think kind of almost ties into, it kind of mirrors the conversation that we were having about extensions in some ways and how to optimize, how to get started is: when you first jump into using Postgres, something that seems incredibly appealing is configuring your server to be hyperspecific.

And although I'm a big fan of Postgres configurations, and the first talk of any kind I did was a lightning talk about them. So I'm a big fan, but I think something that sometimes when people jump into Postgres, they get over excited about configurations when it's the queries and the access patterns that will show the quickest and the most direct performance improvement.

So I would say, if you're jumping into Postgres, and you're wanting to get the most benefit for your work, I think start with "why are these queries running?" "Do they need to run?" "Do they need to have this many parameters?" "Why are they using this plan?" And I think sometimes the less sexy work of going through query-by-query is kind of ignored a little bit or pushed off because people want to find the master configuration that will solve all the problems. And though there are definitely configurations that will help I think starting from that perspective of query optimization will bring you a lot of benefit.

PINO: Maybe I'd like to chime in and since Floor and Chelsea have told us about entry points that make the most sense for app devs to get started being more savvy with Postgres, there's another approach or you should at least know about where you can find a lot more resources. So in Episode 5 last month, we had guests, Ryan Booz and Grant Fritchey, and the topic was, "My favorite way is to learn more about PostgreSQL". And so in that episode you, you can find all the links that were mentioned in that episode at aka.ms/PathToCitusCon, and I'll drop that link in the chat.

CLAIRE: Yeah, there was just a huge amount of ideas because people have written blog posts and they've created tutorials, they've written books there's conferences and people, not just Grant and Ryan and Pino and me, but all the people on the chat were just incredibly generous with the links.

So, the show notes are just a useful treasure trove of places to learn more. Cool. Well I think we're at a wrap. Floor and Chelsea, thank you so much for joining Pino and me today. Really, really enjoyed the conversation and those of you who participated in the text chat, thank you so much as well.

The next episode of Path to Citus Con, the podcast for developers who love Postgres, is going to be on Wednesday, September 6th at 10 AM PDT. For those of you who want to join the live recording here on Discord. And this episode will be available in the next couple of days. Thanks to our producers, Carol Smith and Aaron Wislang.

They'll be making it available on all the podcast platforms. So shout out to Carol and Aaron for making this happen.

PINO: Chelsea and Floor, thank you so much. We learned so much from you.

CHELSEA: Thank you so much for having us.

Creators and Guests

Claire Giordano
Host
Claire Giordano
Claire Giordano is head of the Postgres open source community initiatives at Microsoft. Claire has served in leadership roles in engineering, product management, and product marketing at Sun Microsystems, Amazon/A9, and Citus Data. At Sun, Claire managed the engineering team that created Solaris Zones, and led the effort to open source Solaris.
Pino de Candia
Host
Pino de Candia
Pino de Candia is a software dev manager at Microsoft since 2020 and is currently working on the Citus open source project. Pino previously worked on the managed PostgreSQL database service in Azure Cosmos DB for PostgreSQL, which includes Citus on Azure support for distributed PostgreSQL. Pino has lived in New Orleans since 2017.
Aaron Wislang
Producer
Aaron Wislang
Open Source Engineering + Developer Relations at Microsoft + Azure ☁️ | Go (golang), Cloud Native, Linux 🐧 🐍 🦀 ☕ 🍷📷 🎹 | Toronto 🇨🇦🌎 | 💨😷💉 | https://aaronw.dev/hello/
Carol Smith
Producer
Carol Smith
Senior Program Manager at Microsoft in the Citus Community team. Previously at GitHub and Google. Horseback rider, cook, and armchair movie critic.
You're probably already using Postgres: What you need to know with Chelsea Dole & Floor Drees
Broadcast by