Understanding Inner Joins and Strategies to Minimize Record Counts
When tables are joined, it is usual to encounter duplicate or increased record counts. This happens when there are several matching values in either of the tables being joined. This article explains the causes of increased record counts and methods to minimize them.
Understanding Inner Joins
An inner join retrieves only those rows where there are corresponding values in both tables involved in the join. If there are several matching values in one or both tables, duplicate records may occur in the output.
Consider the following tables as an example:
Orders Table:
Customers Table:
When you execute an inner join on the CustomerID
column, the outcome will contain only the records that have matching CustomerID
values in both tables.
Result:
The outcome of performing an inner join on the CustomerID
column would be:
Identifying Duplicate Records
Duplicate records frequently arise when there are several matching values in each table. For example, if you perform a join operation on a column such as City
, where multiple customers share the same city, the result might contain duplicates.
Example:
Execute an inner join operation on the ‘City’ column.
Orders Table:
Customers Table:
Executing an inner join based on the City
column will produce the table shown below:
Result:
In this scenario, the resulting table includes duplicate entries for London and Berlin because there are several customers associated with the same city in the Customers table.
How to Reduce Duplicate Records
-
Use Primary Keys for Joins
Joining tables on primary keys ensures a one-to-one relationship between rows.Example:
Joining the Orders and Customers tables using theCustomerID
column eliminates duplicates:Result:
OrderID OrderDate CustomerName City 1.00 01 Apr 2022 John Smith London 2.00 02 Apr 2022 Sarah Johnson Paris 3.00 03 Apr 2022 Mark Davis Berlin -
Apply Filters After Joins
If only specific data is needed, use a filter condition to reduce duplicates.Example:
Fetching orders for customers only from London:SELECT Orders.OrderID, Orders.OrderDate, Customers.CustomerName, Customers.City FROM Orders INNER JOIN Customer ON Orders.CustomerID = Customers.CustomerID WHERE Customers.City = 'London';
Result:
OrderID OrderDate CustomerName City 1.00 01 Apr 2022 John Smith London
For Extract mode data sources Filters and Joins can be applied only in Bold BI. For SQL live data sources Filters and joins can be applied either on Bold BI or on the database server.
Conclusion
To avoid duplicate rows when joining tables:
- Use primary keys to establish a unique match between rows.
- Apply filters to narrow down the results based on specific conditions.