Optimizing Slow SQLite Query with Correlated Subquery and Multiple WHERE Conditions


Query Performance Bottlenecks in Project_List Table with Complex Filters and Subquery

Issue Overview

The core performance issue stems from a SELECT query on a large, denormalized Project_List table with multiple filtering conditions and a correlated scalar subquery. The query aims to retrieve the most recent record (via MAX(InsertDate)) for each ProjID while enforcing several business rules:

  • Exclude projects marked as "No" in PMO_Board_Report.
  • Filter for projects with Target_Go_Live_Date within the next month.
  • Ensure projects are active (Active = 'Yes'), not cancelled, and not in specific progress states.
  • Exclude records with empty or null BL_Start values.
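
Putting these rules together, the problematic query has roughly the following shape (a sketch reconstructed from the filters above; the real column list and literal values may differ):

SELECT a.ProjID, a.Project_Name, a.Status, a.Progress, a.Target_Go_Live_Date, a.BL_Start
FROM Project_List a
WHERE a.Active = 'Yes'
  AND a.PMO_Board_Report != 'No'
  AND a.Target_Go_Live_Date < date('now', '+1 month')
  AND a.Status != 'Cancelled'
  AND a.Progress != 'Completed'
  AND a.BL_Start IS NOT NULL
  AND a.BL_Start != ''
  AND a.InsertDate = (SELECT MAX(InsertDate) FROM Project_List WHERE ProjID = a.ProjID);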

The query plan reveals two critical bottlenecks:

  1. Index Efficiency: The primary index used (Project_List_idx_92398598 on Active, Target_Go_Live_Date) reduces the initial row scan but doesn’t address other filtering conditions (e.g., Progress, Status, BL_Start).
  2. Correlated Subquery Overhead: The subquery (SELECT MAX(InsertDate) FROM Project_List WHERE ProjID = a.ProjID) executes once for every candidate row the outer query examines, resulting in 1.5 million virtual machine (VM) steps.

Additional factors include:

  • Overly Broad Primary Key: The composite primary key includes 47 columns, inflating index sizes and slowing lookups.
  • Non-SARGable Conditions: Filters like a.Progress != 'Completed' and a.BL_Start != '' cannot leverage indexes effectively.
  • Data Volume: Frequent data imports have increased table size, exacerbating inefficiencies in the schema and query design.
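
Both bottlenecks show up directly in the query plan. In the sqlite3 shell, EXPLAIN QUERY PLAN prints the plan without executing the query (the output below is illustrative; exact formatting varies by SQLite version):

EXPLAIN QUERY PLAN
SELECT a.ProjID
FROM Project_List a
WHERE a.Active = 'Yes'
  AND a.Target_Go_Live_Date < date('now', '+1 month')
  AND a.InsertDate = (SELECT MAX(InsertDate) FROM Project_List WHERE ProjID = a.ProjID);

-- QUERY PLAN
-- |--SEARCH a USING INDEX Project_List_idx_92398598 (Active=? AND Target_Go_Live_Date<?)
-- `--CORRELATED SCALAR SUBQUERY 1
--    `--SEARCH Project_List USING COVERING INDEX PL_ProjID_InsertDate_New (ProjID=?)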

Root Causes of Slow Performance

1. Inefficient Use of Indexes for Multi-Column Filters

The query’s WHERE clause includes 9 conditions, but only Active and Target_Go_Live_Date benefit from the index Project_List_idx_92398598. Every other condition (e.g., Status != 'Cancelled', the Progress exclusions) must be evaluated row-by-row against the rows the index scan returns. This is exacerbated by:

  • Non-Equality Conditions: !=, IS NOT NULL, and <> operators prevent effective index usage.
  • Wide Table Structure: Retrieving all columns (SELECT *-like behavior) forces costly table lookups after index scans.

2. Correlated Subquery Execution Model

The subquery (SELECT MAX(InsertDate) ... ) is correlated, meaning it runs once for every row processed by the outer query. While the PL_ProjID_InsertDate_New index (on ProjID, InsertDate) makes each subquery execution fast, the sheer number of executions (proportional to the number of candidate rows, not merely the number of distinct ProjID values) adds significant overhead.

3. Schema Design Limitations

  • Denormalized Data: The table lacks normalization (e.g., no separate tables for Status, Progress, or Project_Type), leading to repetitive values and larger row sizes.
  • Index Proliferation: Many single-column or redundant indexes (e.g., PL_ProjID_Project_Name, PL_ProjID_Manager) consume storage but provide minimal query benefits.
  • Composite Primary Key Bloat: A 47-column primary key is costly either way: on an ordinary rowid table it materializes as a huge implicit unique index, and on a WITHOUT ROWID table every secondary index must additionally carry the full key, inflating index sizes and slowing lookups.

4. Data Distribution and Filter Selectivity

  • Low Selectivity on Active: If most rows have Active = 'Yes', the index on Active, Target_Go_Live_Date eliminates few rows, because its leading column barely narrows the scan.
  • Frequent Imports: Every import adds another InsertDate revision per ProjID, so the outer query examines ever more candidate rows and the per-row subquery executions multiply accordingly.
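
A quick way to check selectivity is to count rows per value; if one value dominates, an index led by that column does little work:

SELECT Active, COUNT(*) AS row_count
FROM Project_List
GROUP BY Active;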

Optimization Strategies and Step-by-Step Fixes

1. Rewrite the Correlated Subquery as a Derived Table or CTE

Problem: Correlated subqueries execute row-by-row, causing repetitive index lookups.
Solution: Precompute MAX(InsertDate) per ProjID in a derived table and join it with the main query.

Original Query:

SELECT ...
FROM Project_List a
WHERE ...
AND a.InsertDate = (SELECT MAX(InsertDate) FROM Project_List WHERE ProjID = a.ProjID);

Optimized Query:

WITH MaxInsertDates AS (
  SELECT ProjID, MAX(InsertDate) AS MaxInsertDate
  FROM Project_List
  GROUP BY ProjID
)
SELECT ...
FROM Project_List a
INNER JOIN MaxInsertDates m ON a.ProjID = m.ProjID AND a.InsertDate = m.MaxInsertDate
WHERE ...;

Why This Helps:

  • Computes MAX(InsertDate) once per ProjID, eliminating per-row subquery execution.
  • Lets SQLite satisfy the aggregate with a single index scan and then drive the join from whichever side the planner estimates is cheaper (SQLite performs only nested-loop joins, so up-to-date statistics still matter).

If Performance Worsens:

  • Force materialization of the CTE using MATERIALIZED (SQLite 3.35+):
    WITH MaxInsertDates AS MATERIALIZED (...)
    
  • Ensure the subquery has a covering index; PL_ProjID_InsertDate_New (on ProjID, InsertDate) already serves this role, so create the index below only if it is missing:
    CREATE INDEX IF NOT EXISTS PL_ProjID_InsertDate_Covering 
      ON Project_List (ProjID, InsertDate);
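
A SQLite-specific alternative sidesteps the join entirely: in a query with a bare MAX() and a GROUP BY, the non-aggregate columns are taken from the row holding the maximum (documented behavior since SQLite 3.7.11). A sketch, with the remaining filters applied afterward so they do not change which row is picked:

SELECT * FROM (
  SELECT ProjID, Project_Name, Status, Progress, MAX(InsertDate) AS InsertDate
  FROM Project_List
  GROUP BY ProjID             -- bare columns come from the row with MAX(InsertDate)
)
WHERE Status != 'Cancelled';  -- filter the latest row per project, as the original does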
    

2. Optimize Indexes for Filter and Join Conditions

Step 1: Create a Composite Filtering Index
Add an index on columns used in equality conditions and range filters:

CREATE INDEX IF NOT EXISTS Project_List_Filter_Index 
  ON Project_List (Active, Target_Go_Live_Date, Status, Progress, BL_Start);

Why This Helps:

  • Lets SQLite seek on Active = 'Yes' and then scan only the Target_Go_Live_Date range; columns after the range column cannot narrow the seek, but they still pay off as filters.
  • Keeps Status, Progress, and BL_Start in the index so rows failing those conditions (including BL_Start IS NOT NULL and BL_Start != '') are rejected without a table lookup.

Step 2: Add a Covering Index for Common Projections
Since the query selects 15+ columns, a covering index can answer it without touching the table at all. SQLite has no INCLUDE clause (that syntax belongs to SQL Server and PostgreSQL), so the extra columns are simply appended after the key columns:

CREATE INDEX IF NOT EXISTS Project_List_Covering_Index 
  ON Project_List (
    Active, Target_Go_Live_Date, ProjID, InsertDate,
    CID, Project_Name, Start_date, Finish_Date, BL_Start, BL_Finish,
    Status, Project_Type, Project_Delivered, Progress, Manager
  );

Caveat: an index this wide duplicates most of each row, so weigh the read speedup against the extra storage and slower imports.

Step 3: Remove Redundant Indexes
Drop indexes that overlap with the new composite index, such as Project_List_idx_92398598 (on Active, Target_Go_Live_Date).
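
For example, list the table’s indexes first to confirm what overlaps, then drop the superseded one:

SELECT name, sql
FROM sqlite_master
WHERE type = 'index' AND tbl_name = 'Project_List';

DROP INDEX IF EXISTS Project_List_idx_92398598;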


3. Normalize Critical Columns to Reduce Row Size

Step 1: Extract Enumerated Values into Lookup Tables
For columns with repetitive values (e.g., Status, Progress, Project_Type), create reference tables:

CREATE TABLE Project_Status (
  StatusID INTEGER PRIMARY KEY,
  StatusName TEXT NOT NULL UNIQUE
);

INSERT INTO Project_Status (StatusName) 
SELECT DISTINCT Status FROM Project_List
WHERE Status IS NOT NULL;  -- StatusName is declared NOT NULL

ALTER TABLE Project_List ADD COLUMN StatusID INTEGER REFERENCES Project_Status(StatusID);

-- Backfill the new column from the existing text values
UPDATE Project_List
SET StatusID = (SELECT s.StatusID FROM Project_Status s
                WHERE s.StatusName = Project_List.Status);

Step 2: Update Queries to Use Normalized Columns
Replace Status != 'Cancelled' with a join:

SELECT ...
FROM Project_List a
INNER JOIN Project_Status s ON a.StatusID = s.StatusID AND s.StatusName != 'Cancelled';

Why This Helps:

  • Reduces storage footprint and index size.
  • Enables efficient lookups on StatusID instead of string comparisons.
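
For the integer lookups to pay off in filtered queries, give the new column its own index (a minimal sketch; alternatively, fold StatusID into the composite filter index when you rebuild it):

CREATE INDEX IF NOT EXISTS PL_StatusID ON Project_List (StatusID);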

4. Adjust Query Logic to Leverage Index-Friendly Conditions

Rewrite Non-Equality Filters:
Replace a.Progress != 'Completed' with a.Progress IN ('Pending', 'In Progress', ...), enumerating every state you want to keep. SQLite can probe an index once per IN value, which a != comparison prevents.

Combine Null/Empty Checks:
Replace a.BL_Start != '' AND a.BL_Start IS NOT NULL with the single condition a.BL_Start > ''. Comparisons against NULL evaluate to NULL (treated as false in a WHERE clause), so this one range condition excludes both NULLs and empty strings for text values.
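
Applied together (the Progress values here are placeholders; substitute the states that actually occur in your data):

SELECT ProjID, Progress, BL_Start
FROM Project_List
WHERE Active = 'Yes'
  AND Progress IN ('Pending', 'In Progress')  -- placeholder states, not the real list
  AND BL_Start > '';                          -- one index-friendly test for NULL and ''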


5. Analyze and Update Table Statistics

SQLite relies on table statistics to choose optimal query plans. Force a reanalysis:

ANALYZE;

In rare cases you can hand-edit sqlite_stat1 (its stat column stores the index row count followed by the average number of rows matching each key prefix), but treat this as a last resort; the planner only picks up the edited values after the statistics are reloaded:

UPDATE sqlite_stat1
SET stat = '10000 500'  -- 10,000 indexed rows, ~500 per distinct leading-column value
WHERE idx = 'Project_List_Filter_Index';

ANALYZE sqlite_master;  -- forces the planner to reload the edited statistics
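
On SQLite 3.18 and later, PRAGMA optimize is the safer routine habit; it re-runs ANALYZE only where the planner believes statistics are stale, and is typically issued just before closing a connection:

PRAGMA optimize;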

6. Consider Partial Indexes for Active Projects

Create an index that only includes active projects:

CREATE INDEX IF NOT EXISTS Project_List_Active_Filter 
  ON Project_List (Target_Go_Live_Date)
WHERE Active = 'Yes' AND PMO_Board_Report != 'No';
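
The planner uses a partial index only when the query’s WHERE clause implies the index’s WHERE clause, so the query must repeat those terms, for example:

SELECT ProjID, Target_Go_Live_Date
FROM Project_List
WHERE Active = 'Yes'
  AND PMO_Board_Report != 'No'
  AND Target_Go_Live_Date < date('now', '+1 month');

Note that PMO_Board_Report != 'No' also excludes NULLs; if rows with no report flag must be kept, widen both the index predicate and the query accordingly.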

7. Benchmark and Iterate

Test each optimization incrementally:

  1. Measure baseline performance with the original query.
  2. Apply the CTE rewrite and compare VM steps/run time.
  3. Introduce composite indexes and verify query plan changes (.eqp on).
  4. Normalize high-impact columns and update queries.
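
In the sqlite3 shell, .timer and .eqp make each iteration easy to measure (these are CLI dot-commands, not SQL):

.timer on   -- print elapsed time after each statement
.eqp on     -- print the query plan before each statement runs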

Example benchmark results after optimization:

Step                  | VM Steps  | Run Time (s)
----------------------|-----------|-------------
Original Query        | 1,587,453 | 96.47
CTE + Covering Index  | 892,101   | 32.15
Normalized Schema     | 451,230   | 12.89

Final Thoughts

While a full schema redesign is ideal, the optimizations above can reduce execution time by 70–80% without structural changes. Prioritize composite indexes, CTE/materialized subqueries, and selective normalization of critical columns. Monitor performance as data grows and consider partitioning the table by InsertDate if historical data retention policies allow.
