Fundamental concepts of PostgreSQL
PostgreSQL has been around for close to 40 years now. It started as a project at UC Berkeley and is an open-source project. It's been battle-tested in high-scale production environments for a long time (and for different use-cases), with a big community of users and contributors.
In this article, I'll cover the most important concepts of PostgreSQL, as well as some of the more advanced topics.
The first topic is the most important one: the database schema.
Database schema
PostgreSQL uses schemas to organize the database.
A schema is a namespace that contains tables, views, and other objects. The schema is a way to group related objects together.
If you are using two schemas, you can have tables with the same name in both schemas.
Schemas are useful for:
- Namespacing: You can use schemas to group related objects together.
- Access control: You can use schemas to control access to objects.
- Object sharing: You can use schemas to share objects between databases.
Tables
Just like regular SQL databases, each table has rows and columns.
Each column has a name and a data type, which means that each column in a table has a specific type of data it can hold.
There are certain limits to the number of columns in a table (from 250 to 1600 according to their docs).
Rows in Postgres doesn't have identifiers attached to them (like columns), which means that you can have multiple rows holding the exact same values.
In high-scale production environments, you can have billions of rows in a single table.
Data types
Postgres supports a wide range of data types.
Text, integer, boolean, date, and timestamp are some of the most common data types.
But you can also use more complex ones like arrays, JSON, geotypes, hstore, etc.
Besides the built-in types supported out-of-the-box, you can also create your own types.
You won't be able to insert a string into an integer column, for example. That's why PostgreSQL is known as a "type safe" database.
This allows to perform computations on the db side (like calculating the average of a integer-type column), which is usually faster and more efficient than doing it on the application side since it's closer to the data with no need for network back and forth.
RLS (row-level security)
One of the most important features of PostgreSQL is its row-level security (RLS).
RLS is a security feature that allows you to restrict access to rows in a table based on, for example, the user's role.
It's a great way to protect sensitive data from being accessed by unauthorized users.
It follows the principle of least privilege, which is a security best practice (only grant the bare minimum permissions).
You can leverage RLS to create a multi-tenant application, where all tenants have their data in the same table, but the data is isolated from each other without any need for complex joins, subqueries or adding logic on the application side.
PL/pgSQL
PL/pgSQL is the procedural language for PostgreSQL. The main difference between PL/pgSQL and SQL is that PL/pgSQL allows for more complex logic and control flow.
Using it allows for batching multiple operations together, which is faster and more efficient (less network overhead) than executing each operation separately like in a regular SQL statement.
Some features of PL/pgSQL include:
- Control flow: It allows if-else, while, for, case, exception handling, etc.
- Stored procedures: You can create stored procedures, but keep in mind they don't return anything.
- Functions: You can create functions to encapsulate logic, reuse them and return a result.
- Triggers: You can create specific triggers based on events on the db.
Indexes
Indexes are used to speed up data retrieval.
By creating indexes, you provide Postgres a more efficient way to locate the data associated with a query without having to go through every single row in the table.
You can think of indexes similar to an index in a document. The reader can easily find the location when provided with the index, same thing with Postgres.
There are several types of indexes in Postgres, including:
- B-tree: The most common type of index. It's a balanced tree structure.
- Hash: It's a hash table. It's faster for equality queries.
- GiST: It's a generalised search tree. It's used for text search and full-text search.
- GIN: It's a generalised inverted index. It's used for text search and full-text search.
- SP-GiST: It's a space-partitioned generalised search tree. It's used for text search and full-text search.
- BRIN: It's a block-range index. It's used for text search and full-text search.
You should choose the right type of index based on the type of query you are running and the data you are trying to retrieve.
There are some caveats to consider when creating indexes:
- They can make it slower for inserts and updates, since the index needs to be updated too.
- They require additional storage.
Usually indexes are used when you have:
- Frequent queries on the same data.
- Sort operations are frequent.
- The DB is read-heavy.
Transactions
Transactions ensure that the data is consistent and that concurrent transactions cannot read inconsistent data.
Basically, it groups multiple operations together and if all of them are successful, it commits the transaction, otherwise it rolls back the entire transaction (even if some of the operations were successful).
It follows the ACID properties:
- Atomicity: It ensures that all operations in a transaction are treated as a single unit (either all of them are successful or none of them are).
- Consistency: It ensures that the data is consistent before and after the transaction.
- Isolation: It ensures that concurrent transactions happens in isolation from each other and would render the same db-state as if the transactions were executed in sequence.
- Durability: It ensures that even if the system crashes, the transaction is still committed.
Concurrency
Postgres uses a multi-version concurrency control (MVCC) to handle concurrent transactions.
This allows for concurrent transactions to read the same data without having to lock the entire table (readers don't block writers and writers don't block readers).
Each transaction see a snapshot of the data at the moment it starts plus the changes made by the current transaction.
Extensions and plugins
Given that Postgres is an open-source project, it's possible to extend its functionality by installing extensions and plugins developed and maintained by the community.
Extensions are pieces of code that add new features to Postgres. They can be used to add new data types, functions, operators, etc.
Some of the most popular extensions include:
- PostGIS: It's an extension that adds support for geographic objects and spatial indexing.
- pgvector: it allows for vector-based operations, like finding the nearest neighbors (using cosine similarity and other similarity metrics).
- pg_stat_statements: tracks the execution of SQL queries and provides a lot of useful statistics like the most frequent queries, the most expensive queries, etc. so you can get insights and help debug your queries.
Hopefully, this article was a good introduction to the most important concepts of PostgreSQL.
If you want to learn more about PostgreSQL, I recommend you to check out the official documentation and build your own project using it :)