Introduction
Handling large datasets in MySQL, especially for pagination on a table with 10 million records, can be challenging. A poorly optimized query might take up to 7 seconds to return results, negatively impacting user experience. In this article, we’ll explore how to use a temporary table and indexing to reduce query time from 7 seconds to just 1 second, based on a real-world example.
The Problem: Slow Query with 10 Million Records
Consider a basic pagination query on the pre_go_crm_user table with 10 million records:
SELECT SQL_NO_CACHE
    usr_id, usr_created_at_data, usr_email, usr_phone, usr_username,
    usr_created_at, usr_create_ip_at, usr_last_login_at,
    usr_last_login_ip_at, usr_login_times, usr_status
FROM pre_go_crm_user
WHERE usr_created_at_data > '2024-07-23 00:00:00'
  AND usr_created_at_data < '2024-08-22 23:59:59'
ORDER BY usr_created_at_data ASC, usr_id ASC
LIMIT 9000000, 50;
- Issues:
- LIMIT 9000000, 50: With an OFFSET of 9 million, MySQL must read and discard 9,000,000 full rows before returning the 50 requested, which accounts for most of the 7-second runtime.
- No index on the usr_created_at_data column in the WHERE clause, forcing a full table scan.
- ORDER BY on multiple columns (usr_created_at_data, usr_id) increases sorting overhead.
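Before optimizing, it is worth confirming the execution plan. A quick sketch (the exact EXPLAIN output columns vary by MySQL version, but on an unindexed usr_created_at_data you would typically see type = ALL, i.e. a full table scan, with a very large rows estimate):

```sql
-- Inspect the execution plan for the slow pagination query.
EXPLAIN
SELECT usr_id, usr_created_at_data
FROM pre_go_crm_user
WHERE usr_created_at_data > '2024-07-23 00:00:00'
  AND usr_created_at_data < '2024-08-22 23:59:59'
ORDER BY usr_created_at_data ASC, usr_id ASC
LIMIT 9000000, 50;
```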
Solution: Use Temporary Tables and Indexing
Step 1: Add an Index on the WHERE Clause Column
To speed up filtering, create an index on the usr_created_at_data column in the pre_go_crm_user table:
CREATE INDEX idx_usr_created_at_data ON pre_go_crm_user (usr_created_at_data);
- Benefit: The index allows MySQL to quickly locate records that match the WHERE condition, avoiding a full table scan.
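Since the query also sorts by usr_id, a composite index that matches the full ORDER BY can let MySQL read rows in sorted order and skip the filesort entirely. A sketch (the index name is illustrative; verify the effect with EXPLAIN on your own data):

```sql
-- Composite index matching both the WHERE filter and the ORDER BY.
-- Note: with InnoDB, the primary key (usr_id) is implicitly appended to
-- every secondary index, so even the single-column index above can serve
-- the ID-only query in Step 2 as an index-only (covering) scan.
CREATE INDEX idx_usr_created_id
    ON pre_go_crm_user (usr_created_at_data, usr_id);
```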
Step 2: Use a Temporary Table to Query Only IDs
Instead of fetching all columns from the main table directly, first select only the usr_id values of the matching rows. This result set will act as the derived ("temporary") table, aliased temp, in the next step:
SELECT SQL_NO_CACHE usr_id
FROM pre_go_crm_user
WHERE usr_created_at_data > '2024-07-23 00:00:00'
  AND usr_created_at_data < '2024-08-22 23:59:59'
ORDER BY usr_created_at_data ASC, usr_id ASC
LIMIT 9000000, 50;
- Benefit:
- The temporary table only contains the usr_id column, reducing the data size.
- Querying the temporary table is faster since it doesn’t fetch all columns.
Step 3: Query the Main Table with the Temporary Table
Use the temporary table to join with the main table and retrieve all columns:
SELECT SQL_NO_CACHE
    pre.usr_id, usr_created_at_data, usr_email, usr_phone, usr_username,
    usr_created_at, usr_create_ip_at, usr_last_login_at,
    usr_last_login_ip_at, usr_login_times, usr_status
FROM (
    SELECT SQL_NO_CACHE usr_id
    FROM pre_go_crm_user
    WHERE usr_created_at_data > '2024-07-23 00:00:00'
      AND usr_created_at_data < '2024-08-22 23:59:59'
    ORDER BY usr_created_at_data ASC, usr_id ASC
    LIMIT 9000000, 50
) AS temp
INNER JOIN pre_go_crm_user AS pre ON temp.usr_id = pre.usr_id
ORDER BY usr_created_at_data ASC, usr_id ASC;
- Benefit:
- The temporary table query is fast since it only processes the usr_id column.
- The join with the main table retrieves only 50 records, significantly reducing the workload.
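The two steps above can be sketched end to end in a small, runnable example. This uses Python's built-in sqlite3 with synthetic data (the table and column names mirror the article; SQLite is used only to demonstrate the pattern, so MySQL-specific details like SQL_NO_CACHE are omitted):

```python
import sqlite3

# Minimal sketch of the "deferred join" pagination pattern from Steps 2-3,
# demonstrated with SQLite and synthetic data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pre_go_crm_user (
        usr_id INTEGER PRIMARY KEY,
        usr_created_at_data TEXT,
        usr_email TEXT
    )
""")
conn.executemany(
    "INSERT INTO pre_go_crm_user VALUES (?, ?, ?)",
    [(i, f"2024-08-{(i % 28) + 1:02d} 12:00:00", f"user{i}@example.com")
     for i in range(1, 1001)],
)
# Step 1: index the filter/sort column.
conn.execute("CREATE INDEX idx_usr_created ON pre_go_crm_user (usr_created_at_data)")

def page_deferred(offset, limit):
    # Step 2: page through IDs only, then Step 3: join back for full rows.
    return conn.execute("""
        SELECT pre.usr_id, pre.usr_created_at_data, pre.usr_email
        FROM (
            SELECT usr_id
            FROM pre_go_crm_user
            ORDER BY usr_created_at_data, usr_id
            LIMIT ? OFFSET ?
        ) AS temp
        JOIN pre_go_crm_user AS pre ON temp.usr_id = pre.usr_id
        ORDER BY pre.usr_created_at_data, pre.usr_id
    """, (limit, offset)).fetchall()

def page_naive(offset, limit):
    # The original single-query version, for comparison.
    return conn.execute("""
        SELECT usr_id, usr_created_at_data, usr_email
        FROM pre_go_crm_user
        ORDER BY usr_created_at_data, usr_id
        LIMIT ? OFFSET ?
    """, (limit, offset)).fetchall()

# Both strategies return identical pages; the deferred join simply does
# the expensive OFFSET walk over a narrow ID list instead of full rows.
assert page_deferred(500, 50) == page_naive(500, 50)
```

Because the ORDER BY includes the unique usr_id as a tiebreaker, both queries have a fully deterministic row order, so the results are guaranteed to match.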
Results
- Before Optimization: The query took 7 seconds due to a full table scan and large OFFSET.
- After Optimization: The query time dropped to 1 second thanks to:
- Indexing on usr_created_at_data for faster filtering.
- A temporary table to reduce data processing.
- Efficient joining with the main table.
Pros and Cons
- Pros:
- Significant query speed improvement (7s → 1s).
- Scalable for even larger datasets.
- Easy to implement in backend systems.
- Cons:
- The derived table may be materialized on every query, consuming extra memory or temp space.
- Even over an index, MySQL still walks past the entire OFFSET, so very deep pages remain relatively slow.
Conclusion
Optimizing MySQL queries for a table with 10 million records is achievable with temporary tables and proper indexing. This approach not only boosts performance but also enhances user experience in real-world applications. Try implementing it today to see the difference!