How SQL Queries Are Executed And How You Can Optimize It
Here's How You Can Optimize Your SQL Queries (I Wish I Knew This When I Started)
This happened to me in the early days of studying SQL. When I'm preparing to tackle a problem, I sometimes get stuck trying to build a logical flow of the query.
This problem occurs when you have learned the commands but are still unsure about the order of execution in any SQL query.
Trust me, once I had a comprehensive understanding of the sequence of execution, it was extremely easy for me to solve any problem since I could envision the entire flow in my head.
Understanding the sequence of execution also allows you to optimize your queries to the greatest extent feasible.
Knowing this order is important for optimization.
You could be wondering how. - So, in this story, I'll explain how queries are run and how we might optimize them.
But, Why do we need to optimize?
You may look up several reasons on Google. But I'll simply highlight the three that I believe are the most important for each business.
Performance: It will improve query performance and accelerate the process. Consequently, customers will have a better experience with your products, such as apps.
Saves Resources: I've experienced this firsthand. A well-optimized query significantly saves system resources (such as CPU or RAM), allowing your database to manage more active users and queries at once.
Lastly, it is very cost-effective. You may conclude that from the two reasons listed above, right?
Finally, “Optimising your SQL queries is really necessary and a good practice.”
The SQL Query
Suppose, you work as a data analyst at XYZ Corporation. And you're tasked with determining the amount of the overall sales made by employees in each department last year. However, you are given the requirement that the department's average sales per employee exceed $500.
The table looks something like this:
employees
- employee_id (INT) - Unique identifier for each employee.
- name (VARCHAR) - Employee’s name.
- department_id (INT) - Foreign key referencing departments.department_id.
sales
- sale_id (INT) - Unique identifier for each sale.
- employee_id (INT) - Foreign key referencing employees.employee_id.
- sale_amount (DECIMAL) - Amount of the sale.
- sale_date (DATE) - Date of the sale.
departments
- department_id (INT) - Unique identifier for each department.
- department_name (VARCHAR) - Name of the department.
To solve this problem, I wrote the following query:
SELECT
d.department_name,
e.employee_id,
e.name,
SUM(s.sale_amount) AS total_sales,
AVG(s.sale_amount) AS average_sale_amount
FROM
employees e
JOIN
sales s ON e.employee_id = s.employee_id
JOIN
departments d ON e.department_id = d.department_id
WHERE
s.sale_date >= CURDATE() - INTERVAL 1 YEAR
GROUP BY
d.department_name, e.employee_id, e.name
HAVING
AVG(s.sale_amount) > 500
ORDER BY
total_sales DESC
LIMIT 5;
First, let’s take a closer look at our sample query.
As previously mentioned, this query returns the top 5 employees in each department who have a maximum total sales in the previous year, but only those whose average sale amount is greater than 500.
How SQL queries are executed
While we write SQL queries in a certain order, the order of execution by the database engine is different.
Here is the common order of execution:
Now, let’s detail each step and talk about some optimization tips.
Step 1: FROM and JOINs
FROM
employees e
JOIN
sales s ON e.employee_id = s.employee_id
JOIN
departments d ON e.department_id = d.department_id
At first, this query identifies which tables are needed- employees, sales, and departments -and executes the JOINs.
Optimization tips:
Ensure proper indexing on the join columns, employee_id, and department_id.
Consider Join order — start with the table that would yield the smallest amount of records after filters are applied.
Use appropriate join type INNER, LEFT, RIGHT according to the data needs.
Step 2: WHERE
WHERE
s.sale_date >= CURDATE() - INTERVAL 1 YEAR
We now have all of the tables we need. The WHERE clause filters the rows by the condition s.sale_date >= CURDATE() — INTERVAL 1 YEAR. This minimizes table size by filtering for more specific conditions.
How to optimize:
Add an index in the sales table to the sale_date column.
Use sargable conditions to perform effective use of indexes.Avoid using functions on the indexed columns in the WHERE clause if unnecessary. It may prevent index usage in certain scenarios. (Avoid non-sargable conditions as much as possible)
Step 3: GROUP BY
GROUP BY
d.department_name, e.employee_id, e.name
After filtering the table, this query groups the results on d.department_name, e.employee_id, and e.name.
How to optimize:
Ensure the columns being used in the GROUP BY clause are indexed.
Wherever possible, limit the number of columns in the GROUP BY clause.If you frequently group on the same columns, consider the use of composite indexes.
Step 4: HAVING
HAVING
AVG(s.sale_amount) > 500
Following the grouping of the results, we are now filtering down to specifics. The HAVING clause filters the results of the groups by the condition AVG(s.sale_amount) > 500.
Optimization tips:
Use HAVING only for conditions on aggregated values; other non-aggregated conditions should be moved into the WHERE clause.
Use complex calculations in the HAVING clause with caution, as it will be executed after grouping and may be resource-intensive.
Step 5: SELECT
SELECT
d.department_name,
e.employee_id,
e.name,
SUM(s.sale_amount) AS total_sales,
AVG(s.sale_amount) AS average_sale_amount
By now, we have the final table that meets all of the requirements. The SELECT clause specifies which columns should be returned in the result set and must do the aggregations (SUM and AVG).
Optimization tips:
Only select columns are needed to minimize data transfer and processing overhead.
Use column aliases to make the query easier to understand and to simplify references if you are using ORDER BY or another clause that refers to a column alias.
Consider building a covering index based on all columns that are used in the query to avoid the extra table lookup.
Step 6: ORDER BY
ORDER BY
total_sales DESC
We are sorting results in descending order by total_sales.
How to optimize:
Create an index on the column(s) associated with the ORDER BY.
Be very aware that sorting large results is extremely memory-intensive.Consider rewriting complex queries containing a sort, which are frequently reused to utilize indexed views or materialized views.
Step 7. LIMIT
LIMIT 5;
Finally, the LIMIT clause constrains the output to only the top 5 rows.
Optimization Tips:
Use LIMIT with ORDER BY for optimizing “Top N” queries.
Push LIMIT operations down to subqueries or derived tables when possible to reduce the amount of data processed by the main query.
General Optimization Strategies
Run ANALYZE and UPDATE STATISTICS regularly to keep the query optimizer best armed with the information with which to choose optimal execution plans.
Use EXPLAIN, or a related execution plan analysis tool to understand how your query is being executed and identify bottlenecks.
Consider denormalizing, or pre-aggregating, often accessed data to reduce join operations and complex calculations.
Optimization of database design and table structures for your most frequent query patterns is necessary.
Usage of data types that minimize storage and also enable more efficiency in supporting comparison operations is required.
Conclusion
Understanding how a SQL query is executed can greatly benefit in developing optimization techniques that are required for targeting it.
Utilizing these techniques appropriately can result in a good amount of value addition to a database, to its application, and in tuning its queries regularly when data volumes and access patterns continue to evolve.
This process is iterative and may differ according to the specific data, hardware involved, and the database management system being used.
It’s always important to test your optimizations in detail to make sure that expected performance improvements are obtained.
Connect: LinkedIn | Gumroad Shop | Medium | GitHub
Subscribe: Substack Newsletter | Appreciation Tip: Support