1 实验准备
实验当中提供的数据库,其中表关系如下:
索引创建如下: 1
2
3
4
5
6
7
8CREATE INDEX ix_people_name ON people (name);
CREATE INDEX ix_titles_type ON titles (type);
CREATE INDEX ix_titles_primary_title ON titles (primary_title);
CREATE INDEX ix_titles_original_title ON titles (original_title);
CREATE INDEX ix_akas_title_id ON akas (title_id);
CREATE INDEX ix_akas_title ON akas (title);
CREATE INDEX ix_crew_title_id ON crew (title_id);
CREATE INDEX ix_crew_person_id ON crew (person_id);
2 SQLite3的一些特性
SQLite3
与MySQL
的一些不同是一些命另和函数,其数据库的使用直接采用:1
2
3$ ls
imdb-cmudb2022.db placeholder
$ sqlite3 imdb-cmudb2022.db查看表:
1
2sqlite> .tables
akas crew episodes people ratings titles查看表结构
1
2
3
4
5
6
7
8sqlite> .schema people
CREATE TABLE people (
person_id VARCHAR PRIMARY KEY,
name VARCHAR,
born INTEGER,
died INTEGER
);
CREATE INDEX ix_people_name ON people (name);更大命令使用
.help
1
sqlite> .help
注意,MySQL当中的一些函数,SQLite未必有,就如
concat
我就发现没有,SQLite使用||
替代字符串拼接
3 实验问题
2022 #### 3.1
Q2 [5 points] (q2_sci_fi)
Find the 10
Sci-Fiworks with the longest runtimes.
Details:
Print the title of the work, the premiere date, and the runtime. The column listing the runtime should be suffixed with the string " (mins)", for example, if the runtime_mins value is
12, you should output 12 (mins). Note a work is Sci-Fi even if it is categorized in multiple genres, as long as Sci-Fi is one of the genres.
Your first row should look like this: Cicak-Man 2: Planet Hitam|2008|999 (mins)
找出10个运行时间最长的“科幻”作品。输出其
primary_title、primiered、runtime_minutes
,其中runtime_minutes
后边要接(mins)
。
1
2
3
4 SELECT PRIMARY_TITLE,PREMIERED,(RUNTIME_MINUTES||' (mins)' )
FROM TITLES
WHERE GENRES LIKE '%Sci-Fi'
order by RUNTIME_MINUTES DESC LIMIT 10;
3.2
q3_oldest_people
Determine the oldest people in the dataset who were born in or after 1900. You should assume that a person without a known death year is still alive.
Details:Print the name and age of each person. People should be ordered by a compound value of their age and secondly their name in alphabetical order. Return the first 20 results.
Your output should have the format:
NAME|AGE
找出出生于1900后(包含1900),年龄最大的20个人,按照年龄年龄和姓名的字母联合排序。注意没有死亡日期,证明还活着,你要能够处理这类数据
1
2
3
4
5
6
7 select name,
case
when died is null then 2022-born
else died-born
end as age
from people
where born>=1900 order by age desc,name limit 20;
2023
3.1 q2_not_the_same_title
Find the 10 Action movies with the newest premiere date whose original title is not the same as its primary title.
Details: Print the premiere year, followed by the two titles in a special format. The column listing the two titles should be in the format of primary_title (original_title) Note a work is Action even if it is categorized in multiple genres, as long as Action is one of the genres. Also note that it's possible for the premiered year to be in the future. If multiple movies premiered in the same year, order them alphabetically. Your first row should look like this:
1 | SELECT PREMIERED,(PRIMARY_TITLE||' ('||ORIGINAL_TITLE||')') AS T |
3.2 q3_longest_running_tv
Find the top 20 longest running tv series.
Details
: Print the title and the years the series has been running for. The series must have a non NULL premiered year. If the ended date is NULL, assume it to be the current year (2023). If multiple tv series have been running the same number of years, order them alphabetically. Print the top 20 results.
Your output should have the format: TITLE|YEARS_RUNNING Your first row should look like this:Looney Tunes|93
1 | SELECT PRIMARY_TITLE, |
3.3 q4_directors_in_each_decade
List the number directors born in each decade since 1900.(计算自1900年以来每10年出生的导演人数)
Details: Print the decade in a fancier format by constructing a string that looks like this: 1990s. Order the results by decade.
Your output should look like this: DECADE|NUM_DIRECTORS
Your first row should look like this: 1900s|376
分析:从事领域存储在crew
表的category
,其中人的出生时期存储在people
表,两表通过person_id
联结,people
的为主键,crew
的为外键
1 | SELECT decade || 's', |
上面的group by decade1最重要,这样count()
会以decade
组区分进行统计
3.4 q5_german_type_ratings
Compute statistics about different type of works that has a German title
.(计算具有德语标题的不同类型作品的统计数据。) Details:
Compute the average (rounded to 2 decimal places), min, and max rating for each type of work that has a German title and the akas types is either imdbDisplay or original. Sort the output by the average rating of each title type.
(计算具有德语标题且akas
类型为imdbDisplay
或original
的每种类型的作品的平均值(舍入到小数点后2位)、最小值和最大值。按每个标题类型的平均评分对输出进行排序。)
Your output should have the format: TITLE_TYPE|AVG_RATING|MIN_RATING|MAX_RATING
Your first row should look like this:movie|6.65|3.4|8.2
分析:titles
表、akas
和ratings
表通过title_id
进行联结,其中titles
中作为主键,akas
和ratings
作为外键。要求akas.types in ('imdbDisplay','original') and akas.language='de'
,计算平均值、最大值、最小值。为三表查询
1 | SELECT T.TYPE, ROUND(AVG(R.RATING),2),MIN(R.RATING),MAX(R.RATING) |
3.5 q6_who_played_a_batman
List the 10 highest rated actors who played a character named "Batman".
(请列出出演过“蝙蝠侠”角色的10位评价最高的演员) Details:
Calculate the actor rating by taking the average rating of all their works. Return both the name of the actor and their rating and only list the top 10 results in order from highest to lowest rating. Round average rating to the nearest hundredth.
(通过取所有作品的平均评分来计算演员的评分。返回演员的名字和他们的评分,并且只按评分从高到低的顺序列出前10个结果。四舍五入平均评级到最接近的百分之一。)
Make sure your output is formatted as follows: Kayd Currier|8.05
分析:name
在people
表,rating
在ratings
表,characters
在crew
表,其中三者的联结依赖主外键,ratings
和crew
通过title_id
联结,crew
和people
通过person_id
联结,要求crew.characters like '%Batman%'
。可将people
和crew
先用person_id
进行内连接得到actor,然后利用actor与ratings
连接
1 | WITH actors AS ( |
上述等价于: 1
2
3
4
5
6
7
8
9
10
11
12SELECT A.name,ROUND(AVG(R.rating),2) AS SCORE
FROM (SELECT DISTINCT(C.person_id),P.name,C.title_id
FROM crew AS C
INNER JOIN people AS P ON C.person_id=P.person_id
WHERE C.characters LIKE '%"Batman"%'
AND C.category = 'actor'
) AS A
INNER JOIN crew AS C ON C.person_id=A.person_id
INNER JOIN ratings AS R ON C.title_id=R.title_id
GROUP BY A.person_id
ORDER BY SCORE DESC
LIMIT 10;
3.6 q7_born_with_prestige
List the number of actors or actress who were born on the year that "The Prestige" was premiered.
(列出在《致命魔术》首映那一年出生的演员人数。) Details
: Print only the total number of actors born that year. For this question, determine distinct people by their person_id, not their names. Do not hard code the query.
(只打印当年出生的演员总数。对于这个问题,确定不同的人通过他们的person_id,而不是他们的名字。不要硬编码查询。)
分析:born
在people
表,primary_title
在title
表,因此需要将crew
与people
联结起来。加上条件category IN ('actor', 'actress')和primary_title = 'The Prestige'
1 | SELECT COUNT(DISTINCT(p.person_id)) |
3.7 q8_directing_rose.sql
Find all the directors who have worked with an actress with first name "Rose".
(找出所有与名字为“Rose”的女演员合作过的导演。) Details:
Print only the names of the directors in alphabetical order. Each name should only appear once in the output.
分析:category
在表crew
,name
在people
1 | WITH title_ids AS ( |