How To Use Joins in SQL

Introduction

Many database designs separate information into different tables based on the relationships between certain data points. Even in cases like this, it’s likely that there will be times when someone will want to retrieve information from more than one table at a time.

A common way of accessing data from multiple tables in a single Structured Query Language (SQL) operation is to combine the tables with a JOIN clause. Based on join operations in relational algebra, a JOIN clause combines separate tables by matching up rows in each table that relate to one another. Usually, this relationship is based on a pair of columns — one from each table — that share common values, such as one table’s foreign key and the primary key of another table that the foreign key references.

This guide outlines how to construct a variety of SQL queries that include a JOIN clause. It also highlights different types of JOIN clauses, how they combine data from multiple tables, and how to alias column names to make writing JOIN operations less tedious.

Prerequisites for Joins in SQL

In order to follow this guide, you will need a computer running some type of relational database management system (RDBMS) that uses SQL. The instructions and examples in this guide were validated using the following environment:

  • A server running Ubuntu 20.04, with a non-root user with administrative privileges and a firewall configured with UFW, as described in our initial server setup guide for Ubuntu 20.04.
  • MySQL installed and secured on the server, as outlined in How To Install MySQL on Ubuntu 20.04. This guide was verified with a newly-created user, as described in Step 3.

Note: Please note that many RDBMSs use their own unique implementations of SQL. Although the commands outlined in this tutorial will work on most RDBMSs, the exact syntax or output may differ if you test them on a system other than MySQL.

You’ll also need a database with some tables loaded with sample data which you can use to practice using JOIN operations. We encourage you to go through the following Connecting to MySQL and Setting up a Sample Database section for details on how to connect to a MySQL server and create the testing database used in examples throughout this guide.

Connecting to MySQL and Setting up a Sample Database

If your SQL database system runs on a remote server, SSH into your server from your local machine:

Then open up the MySQL server prompt, replacing sammy with the name of your MySQL user account:

Create a database named joinsDB:

If the database was created successfully, you’ll receive output like this:

Query OK, 1 row affected (0.01 sec)

To select the joinsDB database, run the following USE statement:

Output:

After selecting joinsDB, create a few tables within it. For the examples used in this guide, imagine that you run a factory and have decided to begin tracking information about your product line, employees on your sales team, and your company’s sales in an SQL database. You plan to start off with three tables, the first of which will store information about your products. You decide this first table needs three columns:

  • productID: each product’s identification number, expressed with the int data type. This column will serve as the table’s primary key, meaning that each value will function as a unique identifier for its respective row. Because every value in a primary key must be unique, this column will also have a UNIQUE constraint applied to it
  • productName: each product’s name, expressed using the varchar data type with a maximum of 20 characters
  • price: the price of each product, expressed using the decimal data type. This statement specifies that any values in this column are limited to a maximum of four digits in length with two of those digits to the right of the decimal point. Thus, the range of values allowed in this column goes from -99.99 to 99.99

Create a table named products that has these three columns:

CREATE TABLE products (
productID int UNIQUE,
productName varchar(20),
price decimal (4,2),
PRIMARY KEY (productID)
);

The second table will store information about the employees on your company’s sales team. You decide this table also needs three columns:

  • empID: similar to the productID column, this column will hold a unique identification number for each employee on the sales team expressed with the int data type. Likewise, this column will have a UNIQUE constraint applied to it and will serve as the primary key for the team table
  • empName: the name of each salesperson, expressed using the varchar data type with a maximum of 20 characters
  • productSpecialty: each member of your sales team has been assigned a product as their specialty; they can sell any product your company makes, but their overall focus will be on whatever product they specialize in. To indicate this in the table, you create this column which holds the productID value of whatever product each employee specializes in

To ensure that the productSpecialty column only holds values that represent valid product ID numbers, you decide to apply a foreign key constraint to the column that references the products table’s productID column. A foreign key constraint is a way to express a relationship between two tables by requiring that values in the column on which it applies must already exist in the column that it references. In the following CREATE TABLE statement, the FOREIGN KEY constraint requires that any value added to the productSpecialty column in the team table must already exist in the products table’s productID column.

Create a table named team with these three columns:

CREATE TABLE team (
empID int UNIQUE,
empName varchar(20),
productSpecialty int,
PRIMARY KEY (empID),
FOREIGN KEY (productSpecialty) REFERENCES products (productID)
);

The last table you create will hold records of the company’s sales. This table will have four columns:

  • saleID: similar to the productID and empIDcolumns, this column will hold a unique identification number for each sale expressed with the int data type. This column will also have a UNIQUE constraint so it can serve as the primary key for the sales table
  • quantity: the number of units of each product sold, expressed with the int data type
  • productID: the identification number of the product sold, expressed as an int
  • salesperson: the identification number of the employee who made the sale

Like the productSpecialty column from the team table, you decide to apply FOREIGN KEY constraints to both the productID and salesperson columns. This will ensure that these columns only contain values that already exist in the products table’s productID column and the team table’s empID columns, respectively.

Create a table named sales with these four columns:

CREATE TABLE sales (
saleID int UNIQUE,
quantity int,
productID int,
salesperson int,
PRIMARY KEY (saleID),
FOREIGN KEY (productID) REFERENCES products (productID),
FOREIGN KEY (salesperson) REFERENCES team (empID)
);

Following that, load the products table with some sample data by running the following INSERT INTO operation:

INSERT INTO products
VALUES
(1, 'widget', 18.99),
(2, 'gizmo', 14.49),
(3, 'thingamajig', 39.99),
(4, 'doodad', 11.50),
(5, 'whatzit', 29.99);

Then load the team table with some sample data:

INSERT INTO team
VALUES
(1, 'Florence', 1),
(2, 'Mary', 4),
(3, 'Diana', 3),
(4, 'Betty', 2);

Load the sales table with some sample data as well:

INSERT INTO sales
VALUES
(1, 7, 1, 1),
(2, 10, 5, 4),
(3, 8, 2, 4),
(4, 1, 3, 3),
(5, 5, 1, 3);

Lastly, imagine that your company makes a few sales without the involvement of anyone on your sales team. To record these sales, run the following operation to add three rows to the sales table that don’t include a value for the salesperson column:

INSERT INTO sales (saleID, quantity, productID)
VALUES
(6, 1, 5),
(7, 3, 1),
(8, 4, 5);

With that, you’re ready to follow the rest of the guide and begin learning about how to join tables together in SQL.

Understanding the Syntax of Joins in SQL Operations

JOIN clauses can be used in a variety of SQL statements, including UPDATE and DELETE operations. For illustration purposes, though, the examples in this guide use SELECT queries to demonstrate how JOIN clauses work.

The following example shows the general syntax of a SELECT statement that includes a JOIN clause:

SELECT table1.column1, table2.column2
FROM table1 JOIN table2
ON search_condition;

This syntax begins with a SELECT statement that will return two columns from two separate tables. JOIN clauses compare multiple tables, so this syntax specifies each column’s table by using the table name. This is known as a fully qualified column reference.

Fully qualified column references are necessary when columns from different tables share the same name in operations. It’s good practice to use them when working with multiple tables, though, as they can help make JOIN operations easier to read and understand.

After the SELECT clause comes the FROM clause. In any query, the FROM clause is where you define the data set that should be searched in order to return the desired data. The only difference here is that the FROM clause includes two tables separated by the JOIN keyword. A helpful way to think of writing queries is to remember that you SELECT which columns to return FROM which table you’d like to query.

Following that is an ON clause, which describes how the query should join the two tables together by defining a search condition. A search condition is a set of one or more predicates, or expressions that can evaluate whether a certain condition is “true,” “false,” or “unknown.” It can be helpful to think of a JOIN operation as combining every row from both tables, and then returning any rows for which the search condition in the ON clause evaluates to “true”.

In an ON clause, it usually makes sense to include a search condition that tests whether two related columns — like one table’s foreign key and the primary key of another table that the foreign key references — have values that are equal. This is sometimes referred to as an equi join.

As an example of how equi joins match data from multiple tables, run the following query using the sample data you added previously. This statement will join the products and team tables with a search condition that tests for matching values in their respective productID and productSpecialty columns. It will then return the names of every member of the sales team, the name of each product they specialize in, and the price of those products:

SELECT team.empName, products.productName, products.price
FROM products JOIN team
ON products.productID = team.productSpecialty;

Here is this query’s result set:

+----------+-------------+-------+
| empName  | productName | price |
+----------+-------------+-------+
| Florence | widget      | 18.99 |
| Mary     | doodad      | 11.50 |
| Diana    | thingamajig | 39.99 |
| Betty    | gizmo       | 14.49 |
+----------+-------------+-------+
4 rows in set (0.00 sec)

To illustrate how SQL combines these tables to form this result set, let’s take a closer look at this process. JOIN operations may not work exactly like this, but it’s helpful to think of them as following this procedure.

First, the query prints every row and column in the first table in the FROM clause, products:

JOIN Process Example
+-----------+-------------+-------+
| productID | productName | price |
+-----------+-------------+-------+
|         1 | widget      | 18.99 |
|         2 | gizmo       | 14.49 |
|         3 | thingamajig | 39.99 |
|         4 | doodad      | 11.50 |
|         5 | whatzit     | 29.99 |
+-----------+-------------+-------+

It matches rows from the team table where productSpecialty equals the productID value in each row:

JOIN Process Example
+-----------+-------------+-------+-------+----------+------------------+
| productID | productName | price | empID | empName  | productSpecialty |
+-----------+-------------+-------+-------+----------+------------------+
|         1 | widget      | 18.99 |     1 | Florence |                1 |
|         2 | gizmo       | 14.49 |     4 | Betty    |                2 |
|         3 | thingamajig | 39.99 |     3 | Diana    |                3 |
|         4 | doodad      | 11.50 |     2 | Mary     |                4 |
|         5 | whatzit     | 29.99 |       |          |                  |
+-----------+-------------+-------+-------+----------+------------------+

Then, it cuts any rows that don’t have a match and rearranges the columns based on their order in the SELECT clause, drops any columns that weren’t specified, resorts the rows, and returns the final result set:

JOIN Process Example
+----------+-------------+-------+
| empName  | productName | price |
+----------+-------------+-------+
| Florence | widget      | 18.99 |
| Mary     | doodad      | 11.50 |
| Diana    | thingamajig | 39.99 |
| Betty    | gizmo       | 14.49 |
+----------+-------------+-------+
4 rows in set (0.00 sec)

Using equi joins is the most common way to join tables, but it’s possible to use other SQL operators such as <, >, LIKE, NOT LIKE, or even BETWEEN in ON clause search conditions. Using more complex search conditions can make it hard to predict which data will appear in the result set.

In most implementations, you can join tables with any set of columns that have what the SQL standard refers to as “JOIN eligible” data type. You can join numeric columns regardless of their specific data types, as long as they hold numeric data. Likewise, it’s usually possible to join any column that holds character values with any other column holding character data. Typically, columns used to join tables represent a relationship, such as a foreign key and primary key.

Many SQL implementations also allow you to join columns that have the same name with the USING keyword instead of ON. This is how the syntax for such an operation might look:

SELECT table1.column1, table2.column2
FROM table1 JOIN table2
USING (related_column);

In this example syntax, the USING clause is equivalent to ON table1.related_column = table2.related_column;

Sales and products can be joined by matching their productID columns using the USING keyword. The command returns saleID, quantity, product name, price, and sorts results by saleID in ascending order:

SELECT sales.saleID, sales.quantity, products.productName, products.price
FROM sales JOIN products
USING (productID)
ORDER BY saleID;

Output

+--------+----------+-------------+-------+
| saleID | quantity | productName | price |
+--------+----------+-------------+-------+
|      1 |        7 | widget     | 18.99 |
|      2 |       10 | whatzit     | 29.99 |
|      3 |        8 | gizmo       | 14.49 |
|      4 |        1 | thingamajig | 39.99 |
|      5 |        5 | widget      | 18.99 |
|      6 |        1 | whatzit     | 29.99 |
|      7 |        3 | widget      | 18.99 |
|      8 |        4 | whatzit     | 29.99 |
+--------+----------+-------------+-------+
8 rows in set (0.00 sec)

When joining tables, the database system will sometimes rearrange rows in ways that aren’t easy to predict. Including an ORDER BY clause like this can help make result sets more coherent and readable.

Joining More than Two Tables

There may be times when you need to combine data from more than just two tables. You can join any number of tables together by embedding JOIN clauses within other JOIN clauses. The following syntax is an example of how this can look when joining three tables:

SELECT table1.column1, table2.column2, table3.column3
FROM table1 JOIN table2
ON table1.related_column = table2.related_column
JOIN table3
ON table3.related_column = table1_or_2.related_column;

This example syntax’s FROM clause starts by joining table1 with table2. After this joining’s ON clause, it starts a second JOIN that combines the initial set of joined tables with table3. Note that the third table can be joined to a column in either the first or second table.

Imagine you want to know revenue from sales where employees sell products they specialize in.

To get this information, you could run the following query. This query starts by joining the products and sales tables together by matching their respective productID columns. It joins the team table by matching each row to its productSpecialty column in the initial JOIN. The query filters results with a WHERE clause to return rows where the matched employee made the sale. This query includes an ORDER BY clause to sort results in ascending order by saleID:

SELECT sales.saleID,
team.empName,
products.productName,
(sales.quantity * products.price)
FROM products JOIN sales
USING (productID)
JOIN team
ON team.productSpecialty = sales.productID
WHERE team.empID = sales.salesperson
ORDER BY sales.saleID;

This query’s SELECT clause includes an expression multiplying sales quantity by products price values. It returns the products of these values in the matched rows:

Output
+--------+----------+-------------+-----------------------------------+
| saleID | empName  | productName | (sales.quantity * products.price) |
+--------+----------+-------------+-----------------------------------+
|      1 | Florence | widget      |                            132.93 |
|      3 | Betty    | gizmo       |                            115.92 |
|      4 | Diana    | thingamajig |                             39.99 |
+--------+----------+-------------+-----------------------------------+
3 rows in set (0.00 sec)

All the examples so far have featured the same type of JOIN clause: the INNER JOIN. For an overview of INNER joins, OUTER joins, and how they differ, continue reading the next section.

Inner vs. Outer Joins in SQL Operations

There are two main types of JOIN clauses: INNER joins and OUTER joins. The difference between these two types of joins has to do with what data they return. INNER join operations return only matching rows from each joined table, while OUTER joins return both matching and non-matching rows.

The previous examples used INNER JOIN clauses, though none explicitly included the INNER keyword. Most SQL implementations treat any JOIN clause as an INNER join unless explicitly stated otherwise.

OUTER JOIN queries combine tables and return matching rows, along with non-matching rows from each table. This can be useful for finding rows with missing values, or in cases where partial matches are acceptable.

OUTER join operations can be further divided into three types: LEFT OUTER joins, RIGHT OUTER joins, and FULL OUTER joins. LEFT JOIN returns matching rows and non-matching rows from the “left” table. In JOIN operations, the “left” table is the first table after the FROM keyword, to the left of JOIN.  The “right” table follows JOIN, and RIGHT JOIN returns matching and non-matching rows from the right table. A FULL OUTER JOIN returns every row from both tables, including any rows from either table that don’t have matches.

Run the example queries on the tables from the previous “Connecting to and Setting up a Sample Database” section. These queries are identical except that each specifies a different type of JOIN clause.

This example uses an INNER JOIN to combine sales and team tables by matching salesperson and empID columns. Again, the INNER keyword is implied even though it’s not explicitly included:

SELECT sales.saleID, sales.quantity, sales.salesperson, team.empName 
FROM sales JOIN team
ON sales.salesperson = team.empID;

Because this query uses an INNER JOIN clause, it only returns matching rows from both tables:

Output
+--------+----------+-------------+----------+
| saleID | quantity | salesperson | empName  |
+--------+----------+-------------+----------+
|      1 |        7 |           1 | Florence |
|      4 |        1 |           3 | Diana    |
|      5 |        5 |           3 | Diana    |
|      2 |       10 |           4 | Betty    |
|      3 |        8 |           4 | Betty    |
+--------+----------+-------------+----------+
5 rows in set (0.00 sec)

This version of the query uses a LEFT OUTER JOIN clause instead:

SELECT sales.saleID, sales.quantity, sales.salesperson, team.empName
FROM sales LEFT OUTER JOIN team
ON sales.salesperson = team.empID;

Like the previous query, this one also returns every matching value from both tables. However, it also returns any values from the “left” table (in this case, sales) that don’t have matches in the “right” table (team). Because these rows in the left table don’t have matches in the right, the unmatched values are returned as NULL:

Output
+--------+----------+-------------+----------+
| saleID | quantity | salesperson | empName  |
+--------+----------+-------------+----------+
|      1 |        7 |           1 | Florence |
|      2 |       10 |           4 | Betty    |
|      3 |        8 |           4 | Betty    |
|      4 |        1 |           3 | Diana    |
|      5 |        5 |           3 | Diana    |
|      6 |        1 |        NULL | NULL     |
|      7 |        3 |        NULL | NULL     |
|      8 |        4 |        NULL | NULL     |
+--------+----------+-------------+----------+
8 rows in set (0.00 sec)

This next version of the query instead uses a RIGHT JOIN clause:

SELECT sales.saleID, sales.quantity, sales.salesperson, team.empName
FROM sales RIGHT JOIN team
ON sales.salesperson = team.empID;

Notice that this query’s JOIN clause reads RIGHT JOIN instead of RIGHT OUTER JOIN. Just like INNER JOIN, OUTER is implied with LEFT JOIN or RIGHT JOIN without needing the OUTER keyword.

This query returns all rows from both tables, but only unmatched rows from the “right” table:

Output
+--------+----------+-------------+----------+
| saleID | quantity | salesperson | empName  |
+--------+----------+-------------+----------+
|      1 |        7 |           1 | Florence |
|   NULL |     NULL |        NULL | Mary     |
|      4 |        1 |           3 | Diana    |
|      5 |        5 |           3 | Diana    |
|      2 |       10 |           4 | Betty    |
|      3 |        8 |           4 | Betty    |
+--------+----------+-------------+----------+
6 rows in set (0.00 sec)

Note: Be aware that MySQL doesn’t support FULL OUTER JOIN clauses. Here’s what the result would look like with a FULL OUTER JOIN in PostgreSQL:

SELECT sales.saleID, sales.quantity, sales.salesperson, team.empName
FROM sales FULL OUTER JOIN team
ON sales.salesperson = team.empID;

Output

 saleid | quantity | salesperson | empname  
--------+----------+-------------+----------
      1 |        7 |           1 | Florence
      2 |       10 |           4 | Betty
      3 |        8 |           4 | Betty
      4 |        1 |           3 | Diana
      5 |        5 |           3 | Diana
      6 |         1 |             | 
      7 |         3 |             | 
      8 |         4 |             | 
        |         |             | Mary
(9 rows)

As this output indicates, the FULL JOIN returns every row in both tables including the unmatched ones.

Aliasing Table and Column Names in Joins in SQL Clauses

When joining tables with long or highly descriptive names, having to write multiple fully qualified column references can become tedious. To avoid this, users sometimes find it helpful to provide table or column names with shorter aliases.

In SQL, follow the table definition in the FROM clause with AS and an alias of your choice:

SELECT t1.column1, t2.column2
FROM table1 AS t1 JOIN table2 AS t2
ON t1.related_column = t2.related_column;

This example syntax uses aliases in the SELECT clause even though they aren’t defined until the FROM clause. In SQL queries, the order of execution begins with the FROM clause. Thinking of aliases before writing the query helps with clarity.

For example, run the query that joins the sales and products tables with aliases S and P:

SELECT S.saleID, S.quantity,
P.productName,
(P.price * S.quantity) AS revenue 
FROM sales AS S JOIN products AS P
USING (productID);

This example creates a third alias, revenue, for the product of sales quantity and product price values. This is visible in the column name, but aliases help convey the meaning or purpose behind query results:

Output
+--------+----------+-------------+---------+
| saleID | quantity | productName | revenue |
+--------+----------+-------------+---------+
|      1 |        7 | widget      |  132.93 |
|      2 |       10 | whatzit     |  299.90 |
|      3 |        8 | gizmo       |  115.92 |
|      4 |        1 | thingamajig |   39.99 |
|      5 |        5 | widget      |   94.95 |
|      6 |        1 | whatzit     |   29.99 |
|      7 |        3 | widget      |   56.97 |
|      8 |        4 | whatzit     |  119.96 |
+--------+----------+-------------+---------+
8 rows in set (0.00 sec)

Note that when defining an alias the AS keyword is technically optional. The previous example could also be written like this:

SELECT S.saleID, S.quantity, P.productName, (P.price * S.quantity) revenue 
FROM sales S JOIN products P
USING (productID);

Even though the AS keyword isn’t needed to define an alias, it’s considered a good practice to include it. Doing so can help keep the query’s purpose clear and improve its readability.

Conclusion

This guide teaches you how to use JOIN operations to combine tables into a single query result. The commands here work on most relational databases, but each SQL database has its own unique implementation. Consult your DBMS documentation for a complete description of each command and its full set of options.

To learn more about SQL, check out the other tutorials in this series on How To Use SQL.

Create a Free Account

Register now and get access to our Cloud Services.

Posts you might be interested in: