CREATE INDEX ix_people_name ON people(name); CREATE INDEX ix_titles_type ON titles(type); CREATE INDEX ix_titles_primary_title ON titles(primary_title); CREATE INDEX ix_titles_original_title ON titles(original_title); CREATE INDEX ix_akas_title_id ON akas(title_id); CREATE INDEX ix_akas_title ON akas(title); CREATE INDEX ix_crew_title_id ON crew(title_id); CREATE INDEX ix_crew_person_id ON crew(person_id);
2 SQLite3的一些特性
SQLite3与MySQL的一些不同是一些命另和函数,其数据库的使用直接采用:
1 2 3
$ ls imdb-cmudb2022.db placeholder $ sqlite3 imdb-cmudb2022.db
查看表:
1 2
sqlite> .tables akas crew episodes people ratings titles
查看表结构
1 2 3 4 5 6 7 8
sqlite> .schema people CREATE TABLE people( person_id VARCHAR PRIMARY KEY, name VARCHAR, born INTEGER, died INTEGER ); CREATE INDEX ix_people_name ON people(name);
2022 #### 3.1 Q2 [5 points] (q2_sci_fi)Find the 10Sci-Fiworks with the longest runtimes.Details:Print the title of the work, the premiere date, and the runtime. The column listing the runtime should be suffixed with the string " (mins)", for example, if the runtime_mins value is12, you should output 12 (mins). Note a work is Sci-Fi even if it is categorized in multiple genres, as long as Sci-Fi is one of the genres.Your first row should look like this: Cicak-Man 2: Planet Hitam|2008|999 (mins)
SELECT PRIMARY_TITLE,PREMIERED,(RUNTIME_MINUTES||' (mins)' ) FROM TITLES WHERE GENRES LIKE '%Sci-Fi' order by RUNTIME_MINUTES DESC LIMIT 10;
3.2 q3_oldest_people
Determine the oldest people in the dataset who were born in or after 1900. You should assume that a person without a known death year is still alive.Details:Print the name and age of each person. People should be ordered by a compound value of their age and secondly their name in alphabetical order. Return the first 20 results.
select name, case when died is null then 2022-born else died-born end as age from people where born>=1900 order by age desc,name limit 20;
2023
3.1 q2_not_the_same_title
Find the 10 Action movies with the newest premiere date whose original title is not the same as its primary title.
Details: Print the premiere year, followed by the two titles in a special format. The column listing the two titles should be in the format of primary_title (original_title) Note a work is Action even if it is categorized in multiple genres, as long as Action is one of the genres. Also note that it's possible for the premiered year to be in the future. If multiple movies premiered in the same year, order them alphabetically. Your first row should look like this:
1 2 3 4 5
SELECT PREMIERED,(PRIMARY_TITLE||' ('||ORIGINAL_TITLE||')') AS T FROM TITLES WHERE PRIMARY_TITLE!=ORIGINAL_TITLE AND GENRES LIKE '%Action%' order by PREMIERED DESC,T LIMIT 10;
3.2 q3_longest_running_tv
Find the top 20 longest running tv series.
Details: Print the title and the years the series has been running for. The series must have a non NULL premiered year. If the ended date is NULL, assume it to be the current year (2023). If multiple tv series have been running the same number of years, order them alphabetically. Print the top 20 results.
Your output should have the format: TITLE|YEARS_RUNNING Your first row should look like this:Looney Tunes|93
1 2 3 4 5 6 7 8 9
SELECT PRIMARY_TITLE, CASE WHEN ENDED IS NULL THEN 2023-PREMIERED ELSE ENDED-PREMIERED END AS YEARS_RUNNING FROM TITLES WHERE PREMIERED IS NOT NULL AND TYPE='tvSeries' ORDER BY YEARS_RUNNING DESC,PRIMARY_TITLE LIMIT 20;
3.3 q4_directors_in_each_decade
List the number directors born in each decade since 1900.(计算自1900年以来每10年出生的导演人数)
Details: Print the decade in a fancier format by constructing a string that looks like this: 1990s. Order the results by decade.
Your output should look like this: DECADE|NUM_DIRECTORS Your first row should look like this: 1900s|376
SELECT decade || 's', COUNT(*) FROM ( SELECT DISTINCT(people.person_id), FLOOR(people.born / 10) * 10 AS decade FROM people INNER JOIN crew ON crew.person_id = people.person_id WHERE crew.category = 'director' AND people.born >= 1900 ) GROUP BY decade ORDER BY decade;
上面的group by decade1最重要,这样count()会以decade组区分进行统计
3.4 q5_german_type_ratings
Compute statistics about different type of works that has a German title.(计算具有德语标题的不同类型作品的统计数据。) Details:Compute the average (rounded to 2 decimal places), min, and max rating for each type of work that has a German title and the akas types is either imdbDisplay or original. Sort the output by the average rating of each title type.(计算具有德语标题且akas类型为imdbDisplay或original的每种类型的作品的平均值(舍入到小数点后2位)、最小值和最大值。按每个标题类型的平均评分对输出进行排序。)
Your output should have the format: TITLE_TYPE|AVG_RATING|MIN_RATING|MAX_RATING Your first row should look like this:movie|6.65|3.4|8.2
分析:titles表、akas和ratings表通过title_id进行联结,其中titles中作为主键,akas和ratings作为外键。要求akas.types in ('imdbDisplay','original') and akas.language='de',计算平均值、最大值、最小值。为三表查询
1 2 3 4 5 6 7 8
SELECT T.TYPE, ROUND(AVG(R.RATING),2),MIN(R.RATING),MAX(R.RATING) FROM TITLES AS T INNER JOIN AKAS AS A ON T.TITLE_ID=A.TITLE_ID INNER JOIN RATINGS AS R ON T.TITLE_ID=R.TITLE_ID WHERE A.TYPES IN ('imdbDisplay','original') AND A.LANGUAGE='de' GROUP BY T.TYPE ORDER BY T.TYPE;
3.5 q6_who_played_a_batman
List the 10 highest rated actors who played a character named "Batman".(请列出出演过“蝙蝠侠”角色的10位评价最高的演员) Details:Calculate the actor rating by taking the average rating of all their works. Return both the name of the actor and their rating and only list the top 10 results in order from highest to lowest rating. Round average rating to the nearest hundredth.(通过取所有作品的平均评分来计算演员的评分。返回演员的名字和他们的评分,并且只按评分从高到低的顺序列出前10个结果。四舍五入平均评级到最接近的百分之一。)
Make sure your output is formatted as follows: Kayd Currier|8.05
分析:name在people表,rating在ratings表,characters在crew表,其中三者的联结依赖主外键,ratings和crew通过title_id联结,crew和people通过person_id联结,要求crew.characters like '%Batman%'。可将people和crew先用person_id进行内连接得到actor,然后利用actor与ratings连接
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
WITH actors AS( SELECT DISTINCT(crew.person_id) AS person_id, people.name AS person_name FROM crew INNER JOIN people ON people.person_id = crew.person_id WHERE crew.characters LIKE '%"Batman"%' AND crew.category = "actor" ) SELECT a.person_name, ROUND(AVG(rating), 2) AS average FROM actors AS a INNER JOIN crew AS c ON c.person_id = a.person_id INNER JOIN ratings AS r ON c.title_id = r.title_id GROUP BY a.person_id ORDER BY average DESC LIMIT 10;
上述等价于:
1 2 3 4 5 6 7 8 9 10 11 12
SELECT A.name,ROUND(AVG(R.rating),2) AS SCORE FROM(SELECT DISTINCT(C.person_id),P.name,C.title_id FROM crew AS C INNER JOIN people AS P ON C.person_id=P.person_id WHERE C.characters LIKE '%"Batman"%' AND C.category = 'actor' ) AS A INNER JOIN crew AS C ON C.person_id=A.person_id INNER JOIN ratings AS R ON C.title_id=R.title_id GROUP BY A.person_id ORDER BY SCORE DESC LIMIT 10;
3.6 q7_born_with_prestige
List the number of actors or actress who were born on the year that "The Prestige" was premiered.(列出在《致命魔术》首映那一年出生的演员人数。) Details:Print only the total number of actors born that year. For this question, determine distinct people by their person_id, not their names. Do not hard code the query.(只打印当年出生的演员总数。对于这个问题,确定不同的人通过他们的person_id,而不是他们的名字。不要硬编码查询。)
分析:born在people表,primary_title在title表,因此需要将crew与people联结起来。加上条件category IN ('actor', 'actress')和primary_title = 'The Prestige'
1 2 3 4 5 6 7 8 9
SELECT COUNT(DISTINCT(p.person_id)) FROM people AS p INNER JOIN crew AS c ON p.person_id = c.person_id WHERE c.category IN('actor', 'actress') AND p.born IN( SELECT premiered FROM titles WHERE primary_title = 'The Prestige' );
3.7 q8_directing_rose.sql
Find all the directors who have worked with an actress with first name "Rose".(找出所有与名字为“Rose”的女演员合作过的导演。) Details:Print only the names of the directors in alphabetical order. Each name should only appear once in the output.
分析:category在表crew,name在people
1 2 3 4 5 6 7 8 9 10 11 12 13
WITH title_ids AS ( SELECT c.title_id AS title_id FROM crew AS c INNERJOIN people AS p ON p.person_id = c.person_id WHERE p.name like'Rose%' AND c.category = 'actress' ) SELECTDISTINCT(p.name) ASname FROM title_ids AS t INNERJOIN crew AS c ON t.title_id = c.title_id INNERJOIN people AS p ON p.person_id = c.person_id WHERE c.category = 'director' ORDERBYname;
当一个进程 A 要访问某资源时,必须先提出请求。如果此时该资源空闲,系统便可将之分配给请求进程 A 使用。此后若再有其它进程也要访问该资源时(只要 A 未用完),则必须等待。仅当 A 进程访问完并释放该资源后,才允许另一进程对该资源进行访问。我们把这种资源共享方式称为互斥式共享,而把在一段时间内只允许一个进程访问的资源称为临界资源或独占资源,例如打印机。