Writing Complex SQLite Queries for Film Database Analysis
Listing Directors Who Directed Films in Leap Years
To list all directors who directed a film in a leap year, we need to understand the relationship between the DIRECTOR
, MOVIE_DIRECTOR
, and MOVIE
tables. The MOVIE
table contains the year
column, which we will use to determine if a year is a leap year. A leap year is divisible by 4, but not by 100, unless it is also divisible by 400. This logic must be embedded in the SQL query.
The query involves joining the DIRECTOR
, MOVIE_DIRECTOR
, and MOVIE
tables. The MOVIE_DIRECTOR
table acts as a bridge between DIRECTOR
and MOVIE
, allowing us to link directors to the movies they directed. The MOVIE
table provides the year of the movie, which we will filter to include only leap years.
Here is the SQL query:
SELECT DISTINCT D.fname, D.lname
FROM DIRECTOR D
JOIN MOVIE_DIRECTOR MD ON D.did = MD.did
JOIN MOVIE M ON MD.mid = M.mid
WHERE (M.year % 4 = 0 AND M.year % 100 != 0) OR (M.year % 400 = 0);
This query selects distinct director names (fname
and lname
) from the DIRECTOR
table. It joins the DIRECTOR
table with the MOVIE_DIRECTOR
table on the did
column, and then joins the result with the MOVIE
table on the mid
column. The WHERE
clause filters the results to include only those movies made in leap years.
Identifying Films with the Largest Cast
Finding the film(s) with the largest cast involves counting the number of actors associated with each movie in the CAST
table and then identifying the movie(s) with the maximum count. This requires aggregating data from the CAST
table and joining it with the MOVIE
table to get the movie names.
The query uses a subquery to first determine the maximum number of actors in any movie. Then, it selects the movies that have this maximum number of actors.
Here is the SQL query:
SELECT M.name, COUNT(C.aid) AS cast_size
FROM MOVIE M
JOIN CAST C ON M.mid = C.mid
GROUP BY M.mid
HAVING cast_size = (SELECT MAX(cast_count)
FROM (SELECT COUNT(aid) AS cast_count
FROM CAST
GROUP BY mid));
This query groups the results by movie ID (mid
) and counts the number of actors (aid
) for each movie. The HAVING
clause filters the groups to include only those with a cast size equal to the maximum cast size found in the subquery.
Finding Actors Who Acted Only in Films Before 1960
To find actors who acted only in films before 1960, we need to ensure that all movies an actor has acted in were released before 1960. This involves checking the CAST
table for actor-movie pairs and then filtering based on the year
column in the MOVIE
table.
The query uses a NOT EXISTS
clause to exclude actors who have acted in any movie released in or after 1960. This ensures that only actors who have exclusively acted in pre-1960 films are selected.
Here is the SQL query:
SELECT A.fname, A.lname
FROM ACTOR A
WHERE NOT EXISTS (SELECT 1
FROM CAST C
JOIN MOVIE M ON C.mid = M.mid
WHERE C.aid = A.aid AND M.year >= 1960);
This query selects actor names from the ACTOR
table where there does not exist a movie in the CAST
and MOVIE
tables that the actor has acted in and that was released in or after 1960.
Discovering Films with More Women Actors Than Men
To find films with more women actors than men, we need to count the number of male and female actors in each film and compare these counts. This involves joining the CAST
table with the ACTOR
table to access the gender
column and then aggregating the data by movie.
The query uses conditional aggregation to count male and female actors separately and then compares these counts in the HAVING
clause.
Here is the SQL query:
SELECT M.name
FROM MOVIE M
JOIN CAST C ON M.mid = C.mid
JOIN ACTOR A ON C.aid = A.aid
GROUP BY M.mid
HAVING SUM(CASE WHEN A.gender = 'female' THEN 1 ELSE 0 END) >
SUM(CASE WHEN A.gender = 'male' THEN 1 ELSE 0 END);
This query groups the results by movie ID (mid
) and uses conditional aggregation to count the number of female and male actors. The HAVING
clause filters the groups to include only those where the count of female actors is greater than the count of male actors.
Implementing PRAGMA journal_mode and Database Backup
To ensure data integrity and prevent corruption, especially after unexpected shutdowns or power failures, it is crucial to implement proper journaling modes and regular database backups in SQLite. The PRAGMA journal_mode
command can be used to set the journaling mode, which controls how SQLite handles transactions and recovery.
The WAL
(Write-Ahead Logging) mode is often recommended for its performance benefits and robustness. It allows reads to proceed concurrently with writes, which can significantly improve performance in multi-user environments.
Here is how to set the journal mode to WAL
:
PRAGMA journal_mode=WAL;
Regular database backups are also essential. SQLite provides the .backup
command in the SQLite command-line interface (CLI) to create a backup of the database. This command copies the entire database to a specified file, ensuring that the backup is consistent and can be restored in case of failure.
Here is an example of how to create a backup using the SQLite CLI:
sqlite3 original.db ".backup backup.db"
This command creates a backup of original.db
and saves it as backup.db
.
By combining proper journaling modes with regular backups, you can significantly enhance the reliability and durability of your SQLite database, ensuring that your data remains safe and recoverable even in the face of unexpected failures.