linkedin Skip to Main Content
 

MySQL JOIN Clause: A Detailed Guide

Development

MySQL joins undoubtedly belongs to the list of database techniques everyone should know. After all, a common need when using a relational database is to combine data from more than one table, creating a cohesive result. In today’s post, we’ll cover the JOIN clause in MySQL.

Requirements

We’ll start the post by covering what joins are in MySQL, and then we’ll quickly move on to the practical part. To follow along with that second part, you’ll need the following:

  • A MySQL instance installed and running on your machine (the Docker image is a great way to achieve that)
  • Any MySQL client to connect to said instance

We also assume basic database knowledge. Let’s begin!

What are joins in MySQL?

In SQL language, the JOIN clause is what you use to combine data from two or more tables. We can make things more straightforward with an example. Imagine a database for an e-commerce system with a categories table:

iddescription
1Toys
2Clothing
3Technology
4Food & Beverages
5Books

The database also contains a products table:

| id |           description          | category_id |   |   | |:--:|:------------------------------:|:-----------:|---|---| | 7  | Notebook Dell Inspiron         | 3           |   |   | | 9  | iPhone 14 Pro                  | 3           |   |   | | 10 | iPhone 14 Pro Max              | 3           |   |   | | 26 | Super Awesome Coffee           | 4           |   |   | | 37 | "Refactoring" By Martin Fowler | 5           |   |   |
Code language: JavaScript (javascript)

As you can see, the products table has a column called category_id, whose value for each row references a given row from the categories table. This is what we call a “foreign key relationship,” and through that relationship, we can perform a join between the two tables to display—for instance—a list of products along with their categories: 

|             Product            |      Category     |   |   |   | |:------------------------------:|:-----------------:|---|---|---| | Notebook Dell Inspiron         | Technology        |   |   |   | | iPhone 14 Pro                  | Technology        |   |   |   | | iPhone 14 Pro Max              | Technology        |   |   |   | | Super Awesome Coffee           | Foods & Beverages |   |   |   | | "Refactoring" By Martin Fowler | Books             |   |   |   |
Code language: JavaScript (javascript)

By this point, it’s possible that the following question has crossed your mind: Why don’t we put everything on the same table? That would eliminate the need for joins!

Why do we need MySQL joins?

We need joins because we split related data into multiple tables, and that is due to database normalization.

Database normalization, in the simplest possible terms, is a technique we use to eliminate data redundancy. Basically, we strive for every piece of data on our database to have a single instance. That way, we guarantee data integrity. Normalization also helps keeping storage costs down, because you don’t have to store many (potentially thousands or even millions) copies of the same piece of data. 

For example, suppose that in the previous examples, we didn’t have the categories table, having instead a category column on the products table that contained the category’s description. In this scenario, we would end up with the same category potentially listed several times, threatening data integrity. There could have been typos. We might have needed to update some category descriptions and then failed to update all rows accordingly, leaving the old name in several places.

With the categories belonging to a dedicated table, we ensure a canonical place for each category to which related tables can link if needed. If there’s a need to update, that happens once and in just one place.

In summary: even though it might seem odd and unintuitive for humans, splitting data across multiple tables is a great way for computer to organize data, in a fast, efficient way while ensuring data integrity.

Types of MySQL joins

There are three main types of MySQL joins:

  • Inner join
  • Outer join (which has subtypes)
  • Cross join

Let’s cover each type so you can learn how they work and which problems they solve.

Setting up the database

Using your favorite client, connect to your MySQL database and run the following query:

CREATE DATABASE commerce; USE commerce; CREATE TABLE categories (      id MEDIUMINT NOT NULL AUTO_INCREMENT,      description VARCHAR(250) NOT NULL,      PRIMARY KEY (id) ); INSERT INTO categories (description) VALUES ('Toys'); INSERT INTO categories (description) VALUES ('Clothing'); INSERT INTO categories (description) VALUES ('Technology'); INSERT INTO categories (description) VALUES ('Food & Beverages'); INSERT INTO categories (description) VALUES ('Books');
Code language: SQL (Structured Query Language) (sql)

This will create the categories table in the commerce database and insert five rows into it.

Now, let’s create the products table:

CREATE TABLE products ( id MEDIUMINT NOT NULL AUTO_INCREMENT,     description VARCHAR(250) NOT NULL,     category_id MEDIUMINT NULL,     PRIMARY KEY (id),     FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE SET NULL );
Code language: SQL (Structured Query Language) (sql)

Notice that when creating the foreign key reference, we used the ON DELETE SET NULL clause. That means that if we try to delete a row from categories with one or more “children” in products, the deletion will be allowed, and the category_id column from the child rows will be updated to null. The reason we’re doing this will become clear soon.

The next step is inserting the products:

INSERT INTO products (description, category_id) VALUES ('Notebook Dell Inspiron', 3); INSERT INTO products (description, category_id) VALUES ('iPhone 14 Pro', 3); INSERT INTO products (description, category_id) VALUES ('iPhone 14 Pro Max', 3); INSERT INTO products (description, category_id) VALUES ('Super Awesome Coffee', 4); INSERT INTO products (description, category_id) VALUES ('"Refactoring" By Martin Fowler', 5);
Code language: SQL (Structured Query Language) (sql)

Now, we’re finally ready to start using joins.

Inner join: the basic scenario

The inner join is arguably the most common type of join you’ll use in SQL. It consists of joining two tables based on one or more columns from each table matching. Values from both tables are retrieved only when a match occurs. Visually, you could represent an inner join as an intersection between two sets.

Let’s see a practical example. Since we want the product description and category description, let’s try to query for that:

SELECT p.description Product, c.description Category FROM products p, categories c
Code language: CSS (css)

Well, the result doesn’t look right: It seems that each row from products got paired with each row from categories:

List of products and their categories in the database.

The explanation is simple: We didn’t specify how we wanted those rows to match, so MySQL just retrieved every possible combination. Let’s add a simple WHERE clause to our query:

SELECT p.description Product, c.description Category FROM products p, categories c where p.category_id = c.id
Code language: SQL (Structured Query Language) (sql)

The result now looks right:

Products matching a select category.

The latest command is actually an inner join, even though the word “join” is nowhere to be seen. However, I don’t recommend that syntax; why is that?  First, it’s not standard syntax, unlike the syntax you’ll soon see, which is an ANSI standard.

The other reason has to do with making your intent clear. The syntax above doesn’t even feature the word “join”. Even though it works in performing a join operation, it doesn’t express that fact as explicitly as possible.

The standard syntax for an inner join is as follows:

SELECT p.description Product, c.description Category FROM products p INNER JOIN categories c ON p.category_id = c.id
Code language: SQL (Structured Query Language) (sql)

The outcome you achieve is the same, but in this case, you’re making it clear that you’re performing an inner join. 

Inner join when a row fails to match

As you can see from the example above, the query only retrieves categories that have at least one product belonging to them. The categories “Toys” and “Clothing” don’t have any associated products; thus, they don’t appear.

What would happen if we had a product that didn’t have a valid category? Let’s simulate that scenario by deleting one of the categories:

delete from categories where id = 5
Code language: SQL (Structured Query Language) (sql)
Updated products after deleting the category of the last item.

The last product no longer has a category. What will happen now if we perform the inner join again?

Updated products after performing an inner join.

The book by Martin Fowler is no longer listed. I believe the main property of an inner join is now clear: It only brings rows when there is a double match. But what if you wanted to list all products, regardless of whether they have a valid category?

That sounds like a job for the outer join.

Outer joins

An outer join enables you to list rows even when there are no matches from one or both sides. We’ll take a look at examples to make things clearer. Continuing with the previous example, let’s list all of the products regardless of whether they have a category. I’ll show the command first and then explain it:

SELECT p.description Product, c.description Category FROM products p LEFT OUTER JOIN categories c ON p.category_id = c.id
Code language: SQL (Structured Query Language) (sql)

The command above results in all products being listed, even if they have no category_id:

List of all categories present in the database.

So, that’s the property of an outer join: It brings all the rows from a given table regardless of whether there’s a match. In the example above, the priority went to the products table: all products were shown regardless of having a valid category. But why?

The answer is that we used the “left” variation of the inner join, which privileges the table that comes to the left—that is, first—of the LEFT OUTER JOIN clause. The following image makes it very clear:

An illustration of the LEFT variation of the INNER JOIN.

What if we used the right outer join? In this case, the query privileges the table on the right, displaying all categories—even the ones that don’t have any products:

All the categories contained in the database.

By now, it becomes clear that it would’ve been possible to achieve the first result—that is, privileging the products table—using the right join as well. You’d have to swap the two tables:

SELECT p.description Product, c.description Category FROM categories c RIGHT OUTER JOIN products p ON p.category_id = c.id
Code language: SQL (Structured Query Language) (sql)

As you’ve probably noticed, the left and right join operations are analogous: A LEFT OUTER JOIN B is the same as B RIGHT OUTER JOIN A. Which one should you prefer?

Though there are edge cases in which the right join serves a real purpose, I’d say that, anecdotally, most people, most of the time, use the left variation. This is also what the MySQL documentation suggests to increase the portability of queries to different databases.

Finally, when using the outer joins, the actual word OUTER is optional: You can use LEFT JOIN or RIGHT JOIN, and the outcome won’t change.

Full joins

What if you wanted to list all products and all categories regardless of whether each row has a match? The solution for this scenario is yet another type of outer join called a full outer join. Unfortunately, unlike some other database engines, MySQL doesn’t support this type of join out of the box.

The alternative is to emulate a full outer join by performing a union between a left and a right join:

SELECT p.description Product, c.description Category FROM products p LEFT JOIN categories c ON p.category_id = c.id UNION SELECT p.description Product, c.description Category FROM products p RIGHT JOIN categories c ON p.category_id = c.id
Code language: SQL (Structured Query Language) (sql)

The outcome is now that all rows from both tables are displayed, even when there aren’t matches:

All rows from the products and categories tables are displayed.

Cross joins

A cross join is when you have all rows from one table combined with the other. Here’s the syntax:

SELECT * FROM products p CROSS JOIN categories c
Code language: SQL (Structured Query Language) (sql)

This is syntactically equivalent to performing an INNER JOIN without the ON clause:

SELECT * FROM products p INNER JOIN categories c
Code language: SQL (Structured Query Language) (sql)

And both are equivalent to the very first example we looked at today:

SELECT p.description Product, c.description Category FROM products p, categories c
Code language: SQL (Structured Query Language) (sql)

All of the three above queries result in 20 rows being displayed, which makes sense if you remember that products has five rows and categories has four.

In MySQL, JOIN, INNER JOIN, and CROSS JOIN are effectively synonymous, which isn’t necessarily true when it comes to other database engines.

Wrapping up

Relational databases are all about relationships. That’s why joins are some of the most crucial operations you can learn. They make it possible to bring data together while enjoying the benefits of database normalization.

Though there are some cases in which joins can affect performance, generally speaking, databases are well-equipped to execute them.

Thanks for reading, and until the next time!

This post was written by Carlos Schults. Carlos is a consultant and software engineer with experience in desktop, web, and mobile development. Though his primary language is C#, he has experience with a number of languages and platforms. His main interests include automated testing, version control, and code quality.