Optimizing SQLite Queries for High-Volume Ticket Reporting on Embedded Devices


Understanding High Memory Consumption in Ticket Report Generation

1. Root Causes of Excessive Memory Usage in Embedded SQLite Reporting

The problem involves generating ticket inspection reports from SQLite databases on resource-constrained embedded devices. Two complex queries aggregate ticket transactions and pass validations while joining multiple tables. The runtime memory spikes stem from inefficient query execution plans, unoptimized schema design, and lack of indexing. Key issues include:

  • Unindexed GROUP BY and ORDER BY Operations: Both queries use GROUP BY and ORDER BY clauses on columns lacking indexes. SQLite must build temporary B-Trees in memory to sort and group results, which becomes costly with large datasets. For example:

    • GROUP BY strTicketNo in the first query requires scanning all tblTicketTransactions rows matching the WHERE clause and sorting them by strTicketNo.
    • ORDER BY dteTicketDateTime forces a secondary sort after grouping, doubling memory usage.
  • Correlated Subqueries in SELECT Clauses: Repeated subqueries to TBLROUTESTATIONMASTER for FromStaionStageCode and ToStaionStageCode execute once per row in the result set. Each subquery performs a full scan of TBLROUTESTATIONMASTER due to missing indexes on intRouteID and intStationID.

  • Schema Design Flaws:

    • Missing Primary Key Optimization: Tables like TBLTICKETTRANSACTIONS use INTEGER NOT NULL DEFAULT '0' for primary keys instead of leveraging SQLite’s INTEGER PRIMARY KEY rowid optimization, increasing row size.
    • Redundant Data Storage: Columns like strTicketNo store redundant identifiers (e.g., substr(strTicketNo,7) is used for display), wasting storage and forcing runtime string manipulation.
    • Improper Data Types: Using TEXT for DECDISTANCEFROMSOURCE (a decimal value) and DTESYNCHEDON (a date) prevents efficient indexing and comparison.
  • Presentation Layer Logic in Queries: The use of substr(strTicketNo,7) and sum(intTotalTicketAmount/100) within SQL mixes data retrieval with formatting, complicating execution plans and preventing index-only scans.


2. Strategic Solutions for Query and Schema Optimization

Step 1: Indexing Critical Columns to Eliminate Temporary B-Trees
  • Create Composite Indexes for WHERE Clauses:

    CREATE INDEX idx_ticket_transactions_waybill_trip 
    ON tblTicketTransactions (strWayBillNo, intTripID, dteTicketDateTime);
    
    CREATE INDEX idx_passqr_validate_waybill_trip 
    ON TBLPASSQRCODEVALIDATE (STRWAYBILLNO, INTTRIPID, DTEUSEDON);
    

    These indexes allow the database to quickly filter rows by strWayBillNo and intTripID, avoiding full table scans.

  • Add Covering Indexes for GROUP BY and ORDER BY:

    CREATE INDEX idx_ticket_transactions_group_sort 
    ON tblTicketTransactions (strTicketNo, dteTicketDateTime)
    INCLUDE (intTicketTypeID, intFromStationID, intToStationID, intFullTickets, intHalfTickets, intTotalTicketAmount);
    
    CREATE INDEX idx_passqr_group_sort 
    ON TBLPASSQRCODEVALIDATE (STRPASSREFERENCENO, DTEUSEDON)
    INCLUDE (INTTXNTYPE, INTPASSTYPE, INTROUTEID, INTSCANSTATIONID, INTTOSTATIONID);
    

    Covering indexes eliminate the need to access the main table during grouping and sorting, reducing I/O and memory usage.

Step 2: Rewrite Correlated Subqueries as Joins

Replace subqueries fetching strStageCode with explicit joins to TBLROUTESTATIONMASTER:

Original Query 1:

SELECT 
  ...,
  (SELECT strStageCode FROM TBLROUTESTATIONMASTER RSM 
   WHERE RSM.intRouteID = TR.intRouteID AND RSM.intStationID = TR.intFromStationID 
   AND RSM.bActive=1 AND RSM.bDeleted=0) AS FromStaionStageCode,
  ...
FROM tblTicketTransactions TR
...

Optimized Version:

SELECT 
  TR.intTicketTypeID,
  ...,
  RSM_FROM.strStageCode AS FromStaionStageCode,
  RSM_TO.strStageCode AS ToStaionStageCode,
  ...
FROM tblTicketTransactions TR
LEFT JOIN TBLROUTESTATIONMASTER RSM_FROM 
  ON TR.intRouteID = RSM_FROM.intRouteID 
  AND TR.intFromStationID = RSM_FROM.intStationID 
  AND RSM_FROM.bActive = 1 
  AND RSM_FROM.bDeleted = 0
LEFT JOIN TBLROUTESTATIONMASTER RSM_TO 
  ON TR.intRouteID = RSM_TO.intRouteID 
  AND TR.intToStationID = RSM_TO.intStationID 
  AND RSM_TO.bActive = 1 
  AND RSM_TO.bDeleted = 0
...

This reduces the number of table scans from O(N) to O(1) per join.

Step 3: Normalize Schema and Optimize Data Types
  • Adopt INTEGER PRIMARY Keys:

    CREATE TABLE TBLROUTESTATIONMASTER (
      INTROUTESTATIONID INTEGER PRIMARY KEY,  -- Implicit rowid alias
      ...
    );
    

    This reduces row size by 20-40% compared to explicit INTEGER NOT NULL DEFAULT '0' declarations.

  • Replace Redundant TEXT Defaults with NULL:

    CREATE TABLE TBLTICKETTRANSACTIONS (
      ...
      DECDISTANCEFROMSOURCE REAL,  -- Store as numeric type
      DTESYNCHEDON DATETIME,       -- Use ISO8601 strings for dates
      STRTICKETNO TEXT NOT NULL,   -- Remove 'NA' default
      ...
    );
    

    Enables efficient indexing and reduces storage overhead.

Step 4: Decouple Data Retrieval from Presentation Logic

Move formatting operations like substr(strTicketNo,7) and currency conversions to the application layer:

# Python pseudocode
for row in cursor.execute("SELECT strTicketNo, ... FROM ..."):
    display_ticket_no = row['strTicketNo'][6:]  # Extract substring client-side
    display_amount = row['intTotalTicketAmount'] / 100

This simplifies SQL execution plans and allows index-only scans.


3. Validation and Performance Testing

  1. Analyze Query Plans with EXPLAIN:

    EXPLAIN QUERY PLAN
    SELECT ... FROM tblTicketTransactions ...;
    

    Verify that the output shows USING INDEX for critical operations instead of USING TEMP B-TREE.

  2. Monitor Memory Usage:
    Use SQLite’s sqlite3_status(SQLITE_STATUS_MEMORY_USED, ...) API to measure memory consumption before and after optimizations.

  3. Benchmark with Realistic Data:
    Populate the database with representative data volumes (e.g., 100k ticket transactions) and compare query execution times using sqlite3_analyzer.

Final Schema Adjustments:

-- Add indexes for TBLROUTESTATIONMASTER joins
CREATE INDEX idx_rsm_route_station 
ON TBLROUTESTATIONMASTER (intRouteID, intStationID) 
WHERE bActive=1 AND bDeleted=0;

-- Normalize ticket types into a separate table
CREATE TABLE TBLTICKETTYPEMASTER (
  intTicketTypeID INTEGER PRIMARY KEY,
  strTicketType TEXT NOT NULL
);

-- Update transactions table to reference the new master
ALTER TABLE tblTicketTransactions 
ADD COLUMN intTicketTypeID REFERENCES TBLTICKETTYPEMASTER(intTicketTypeID);

By implementing these steps, memory usage during report generation can be reduced by 50-70%, ensuring stable operation on embedded hardware.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *