Cross Join Order with pragma_table_info Causes Missing Results in SQLite


Understanding the Impact of JOIN Order on Table-Valued Functions

Issue Overview: Mismatched Results When Reversing CROSS JOIN with pragma_table_info

The core problem arises when using CROSS JOIN with SQLite’s pragma_table_info function in different positions within a query. The user observed that reversing the order of CROSS JOIN between a standard table and pragma_table_info leads to unexpected results:

  • When pragma_table_info is on the right side of CROSS JOIN, the query returns the expected column names from dynamically referenced tables.
  • When pragma_table_info is on the left side of CROSS JOIN, the query returns no results.

Key Observations

  1. Behavior with CROSS JOIN vs. Plain JOIN
    Using CROSS JOIN enforces a specific loop order for query execution. In contrast, omitting CROSS (i.e., using JOIN without an ON clause) allows the query planner to reorder tables for optimization.

  2. Dependence on Parameter Passing
    pragma_table_info is a table-valued function requiring a table name as an input parameter. This parameter (t in the example) must come from a column in another table (tab.t). The parameter acts as a hidden filter, making the function’s output dependent on the outer loop’s data.

  3. Query Planner Constraints
    The CROSS JOIN keyword restricts the query planner’s ability to reorder tables. When pragma_table_info is placed in the outer loop (left side of CROSS JOIN), SQLite cannot resolve the parameter t because the inner loop (the tab table) has not been processed yet. This creates a logical dependency cycle, resulting in no output.

  4. Error Handling for Invalid Parameters
    Attempting to reference a non-existent column (e.g., badcol) in the pragma_table_info function correctly triggers an error. This confirms that SQLite validates column names in the outer context but does not warn about logical dependency mismatches in CROSS JOIN order.

Underlying Mechanics

  • Virtual Table Execution Flow: pragma_table_info is implemented as a virtual table. Virtual tables with parameters require those parameters to be resolved before they are queried. This is analogous to a function call where arguments must be evaluated before the function runs.
  • Loop Nesting in Joins: SQLite processes joins by nesting loops. For A CROSS JOIN B, SQLite iterates over all rows of A, and for each row, iterates over all rows of B. If B depends on values from A, this works. Reversing the order (B CROSS JOIN A) breaks the dependency chain if B requires data from A.

Root Causes: Why JOIN Order and CROSS JOIN Matter

1. Parameterized Virtual Tables Require Outer Loop Context

pragma_table_info is not a static table; it dynamically generates rows based on its input parameter (the table name). For the parameter to resolve correctly, the source of the parameter (the tab.t column) must be in an outer loop relative to the virtual table. This is enforced automatically when using JOIN but restricted by CROSS JOIN order.

Example:

-- Works: tab (outer loop) → pragma_table_info (inner loop)
SELECT name FROM tab CROSS JOIN pragma_table_info(t);

-- Fails: pragma_table_info (outer loop) ← tab (inner loop)
SELECT name FROM pragma_table_info(t) CROSS JOIN tab;

In the failing case, pragma_table_info is processed first and has no value for t because tab hasn’t been read yet.

2. CROSS JOIN Enforces Fixed Loop Ordering

The CROSS JOIN keyword explicitly tells SQLite to process the left table first (outer loop) and the right table second (inner loop). This overrides the query planner’s ability to reorder tables based on efficiency or dependencies.

Contrast with Implicit JOIN:
When CROSS is omitted, SQLite’s query planner analyzes the query and automatically places tab in the outer loop to satisfy the parameter dependency of pragma_table_info. This is why JOIN works while CROSS JOIN fails when the order is reversed.

3. Lack of Error Detection for Impossible Nesting

SQLite does not throw an error when a parameterized virtual table is placed in an outer loop without access to its required parameters. Instead, it silently returns no rows, which can be misleading. This is a known behavior in SQLite’s virtual table implementation, where missing parameters are treated as empty inputs rather than errors.


Resolving the Issue: Strategies for Reliable Parameter Passing

1. Use Implicit JOIN Instead of CROSS JOIN

Replace CROSS JOIN with JOIN to allow the query planner to reorder tables based on parameter dependencies:

-- Correct: Let the query planner choose the optimal loop order
SELECT name FROM pragma_table_info(t) JOIN tab;

This works because SQLite recognizes that tab must be in the outer loop to supply t to pragma_table_info.

2. Explicitly Enforce Correct Loop Order with Subqueries

If you must use CROSS JOIN, ensure the parameter source is in the outer loop. For complex queries, use subqueries to isolate the parameter resolution:

SELECT name FROM tab, (SELECT * FROM pragma_table_info(t));

Here, the comma join implicitly treats tab as the outer loop.

3. Avoid Parameterized Virtual Tables in Outer Loops

Design queries so that virtual tables with parameters are always in inner loops. This aligns with SQLite’s execution model and prevents unresolved dependencies.

4. Validate Parameters with COALESCE or Error Handling

Add checks to ensure parameters are not NULL or invalid:

SELECT name 
FROM tab 
JOIN pragma_table_info(COALESCE(t, 'fallback_table')) 
WHERE t IS NOT NULL;

5. Use Schema-Qualified Table Names

Explicitly specify the schema (usually main) to avoid ambiguity:

SELECT name 
FROM tab 
JOIN pragma_table_info(t, 'main');

6. Monitor Query Plans with EXPLAIN

Use EXPLAIN to verify loop ordering. Look for OpenRead (table scans) and VFilter (virtual table processing) operations:

EXPLAIN SELECT name FROM pragma_table_info(t) JOIN tab;

Ensure OpenRead for tab occurs before VFilter for pragma_table_info.

7. Upgrade to SQLite 3.39.0 or Later

The user’s example uses SQLite 3.39.0, but older versions may have different behaviors. Newer versions include optimizations for virtual table parameter handling.

8. Consider Alternative Approaches for Dynamic Metadata

If pragma_table_info proves unreliable in joins, extract table names first and process metadata in separate steps:

-- Step 1: Get table names
CREATE TEMP TABLE temp_tables AS SELECT t FROM tab;

-- Step 2: Process metadata
SELECT name 
FROM temp_tables 
JOIN pragma_table_info(t);

Deep Dive: How SQLite Processes Joins with Virtual Tables

Virtual Table Internals

SQLite treats virtual tables like pragma_table_info as functions that return rows. The arg and schema parameters are implicitly passed based on the query context. When the virtual table is in an inner loop, SQLite can bind parameters from the outer loop.

CROSS JOIN as a Query Planner Directive

CROSS JOIN is not just a syntactic sugar for a Cartesian product—it directly influences the query planner’s loop ordering. This is critical for parameterized virtual tables, where loop order determines parameter availability.

Comparison with Other Databases

  • PostgreSQL: Allows LATERAL JOIN to explicitly reference columns from preceding tables, similar to SQLite’s implicit parameter binding.
  • MySQL: Does not support parameterized virtual tables in the same way; metadata is accessed via INFORMATION_SCHEMA static tables.

Best Practices for pragma_table_info

  • Always place pragma_table_info in the inner loop.
  • Use JOIN without CROSS unless loop ordering is explicitly required.
  • Test queries with EXPLAIN to confirm parameter binding.

By understanding these principles, developers can avoid subtle bugs in query design and leverage SQLite’s flexibility effectively.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *