0%

CMU15-445-FALL2022-HOMEWORK1

1 实验准备

作业地址

实验当中提供的数据库,其中表关系如下:

索引创建如下:

1
2
3
4
5
6
7
8
CREATE INDEX ix_people_name ON people (name);
CREATE INDEX ix_titles_type ON titles (type);
CREATE INDEX ix_titles_primary_title ON titles (primary_title);
CREATE INDEX ix_titles_original_title ON titles (original_title);
CREATE INDEX ix_akas_title_id ON akas (title_id);
CREATE INDEX ix_akas_title ON akas (title);
CREATE INDEX ix_crew_title_id ON crew (title_id);
CREATE INDEX ix_crew_person_id ON crew (person_id);

2 SQLite3的一些特性

  • SQLite3MySQL的一些不同是一些命另和函数,其数据库的使用直接采用:

    1
    2
    3
    $ ls
    imdb-cmudb2022.db placeholder
    $ sqlite3 imdb-cmudb2022.db

  • 查看表:

    1
    2
    sqlite> .tables
    akas crew episodes people ratings titles

  • 查看表结构

    1
    2
    3
    4
    5
    6
    7
    8
    sqlite> .schema people
    CREATE TABLE people (
    person_id VARCHAR PRIMARY KEY,
    name VARCHAR,
    born INTEGER,
    died INTEGER
    );
    CREATE INDEX ix_people_name ON people (name);

  • 更大命令使用.help

    1
    sqlite> .help

注意,MySQL当中的一些函数,SQLite未必有,就如concat我就发现没有,SQLite使用||替代字符串拼接

3 实验问题

2022 #### 3.1 Q2 [5 points] (q2_sci_fi) Find the 10Sci-Fiworks with the longest runtimes. Details: Print the title of the work, the premiere date, and the runtime. The column listing the runtime should be suffixed with the string " (mins)", for example, if the runtime_mins value is12, you should output 12 (mins). Note a work is Sci-Fi even if it is categorized in multiple genres, as long as Sci-Fi is one of the genres. Your first row should look like this: Cicak-Man 2: Planet Hitam|2008|999 (mins)

找出10个运行时间最长的“科幻”作品。输出其primary_title、primiered、runtime_minutes,其中runtime_minutes后边要接(mins)

1
2
3
4
SELECT PRIMARY_TITLE,PREMIERED,(RUNTIME_MINUTES||' (mins)' )
FROM TITLES
WHERE GENRES LIKE '%Sci-Fi'
order by RUNTIME_MINUTES DESC LIMIT 10;

3.2 q3_oldest_people

Determine the oldest people in the dataset who were born in or after 1900. You should assume that a person without a known death year is still alive. Details: Print the name and age of each person. People should be ordered by a compound value of their age and secondly their name in alphabetical order. Return the first 20 results.

Your output should have the format: NAME|AGE

找出出生于1900后(包含1900),年龄最大的20个人,按照年龄年龄和姓名的字母联合排序。注意没有死亡日期,证明还活着,你要能够处理这类数据

1
2
3
4
5
6
7
select name,
case
when died is null then 2022-born
else died-born
end as age
from people
where born>=1900 order by age desc,name limit 20;

2023

3.1 q2_not_the_same_title

Find the 10 Action movies with the newest premiere date whose original title is not the same as its primary title.

Details: Print the premiere year, followed by the two titles in a special format. The column listing the two titles should be in the format of primary_title (original_title) Note a work is Action even if it is categorized in multiple genres, as long as Action is one of the genres. Also note that it's possible for the premiered year to be in the future. If multiple movies premiered in the same year, order them alphabetically. Your first row should look like this:

1
2
3
4
5
SELECT PREMIERED,(PRIMARY_TITLE||' ('||ORIGINAL_TITLE||')') AS T 
FROM TITLES
WHERE PRIMARY_TITLE!=ORIGINAL_TITLE
AND GENRES LIKE '%Action%'
order by PREMIERED DESC,T LIMIT 10;

3.2 q3_longest_running_tv

Find the top 20 longest running tv series.

Details: Print the title and the years the series has been running for. The series must have a non NULL premiered year. If the ended date is NULL, assume it to be the current year (2023). If multiple tv series have been running the same number of years, order them alphabetically. Print the top 20 results.

Your output should have the format: TITLE|YEARS_RUNNING Your first row should look like this:Looney Tunes|93

1
2
3
4
5
6
7
8
9
SELECT PRIMARY_TITLE,
CASE
WHEN ENDED IS NULL THEN 2023-PREMIERED
ELSE ENDED-PREMIERED
END AS YEARS_RUNNING
FROM TITLES
WHERE PREMIERED IS NOT NULL
AND TYPE='tvSeries'
ORDER BY YEARS_RUNNING DESC,PRIMARY_TITLE LIMIT 20;

3.3 q4_directors_in_each_decade

List the number directors born in each decade since 1900.(计算自1900年以来每10年出生的导演人数)

Details: Print the decade in a fancier format by constructing a string that looks like this: 1990s. Order the results by decade.

Your output should look like this: DECADE|NUM_DIRECTORS Your first row should look like this: 1900s|376

分析:从事领域存储在crew表的category,其中人的出生时期存储在people表,两表通过person_id联结,people的为主键,crew的为外键

1
2
3
4
5
6
7
8
9
10
11
12
SELECT decade || 's',
COUNT(*)
FROM (
SELECT DISTINCT(people.person_id),
FLOOR(people.born / 10) * 10 AS decade
FROM people
INNER JOIN crew ON crew.person_id = people.person_id
WHERE crew.category = 'director'
AND people.born >= 1900
)
GROUP BY decade
ORDER BY decade;

上面的group by decade1最重要,这样count()会以decade组区分进行统计

3.4 q5_german_type_ratings

Compute statistics about different type of works that has a German title.(计算具有德语标题的不同类型作品的统计数据。) Details: Compute the average (rounded to 2 decimal places), min, and max rating for each type of work that has a German title and the akas types is either imdbDisplay or original. Sort the output by the average rating of each title type.(计算具有德语标题且akas类型为imdbDisplayoriginal的每种类型的作品的平均值(舍入到小数点后2位)、最小值和最大值。按每个标题类型的平均评分对输出进行排序。)

Your output should have the format: TITLE_TYPE|AVG_RATING|MIN_RATING|MAX_RATING Your first row should look like this:movie|6.65|3.4|8.2

分析:titles表、akasratings表通过title_id进行联结,其中titles中作为主键,akasratings作为外键。要求akas.types in ('imdbDisplay','original') and akas.language='de',计算平均值、最大值、最小值。为三表查询

1
2
3
4
5
6
7
8
SELECT T.TYPE, ROUND(AVG(R.RATING),2),MIN(R.RATING),MAX(R.RATING)
FROM TITLES AS T
INNER JOIN AKAS AS A ON T.TITLE_ID=A.TITLE_ID
INNER JOIN RATINGS AS R ON T.TITLE_ID=R.TITLE_ID
WHERE A.TYPES IN ('imdbDisplay','original')
AND A.LANGUAGE='de'
GROUP BY T.TYPE
ORDER BY T.TYPE;

3.5 q6_who_played_a_batman

List the 10 highest rated actors who played a character named "Batman".(请列出出演过“蝙蝠侠”角色的10位评价最高的演员) Details: Calculate the actor rating by taking the average rating of all their works. Return both the name of the actor and their rating and only list the top 10 results in order from highest to lowest rating. Round average rating to the nearest hundredth.(通过取所有作品的平均评分来计算演员的评分。返回演员的名字和他们的评分,并且只按评分从高到低的顺序列出前10个结果。四舍五入平均评级到最接近的百分之一。)

Make sure your output is formatted as follows: Kayd Currier|8.05

分析namepeople表,ratingratings表,characterscrew表,其中三者的联结依赖主外键,ratingscrew通过title_id联结,crewpeople通过person_id联结,要求crew.characters like '%Batman%'可将peoplecrew先用person_id进行内连接得到actor,然后利用actor与ratings连接

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
WITH actors AS (
SELECT DISTINCT(crew.person_id) AS person_id,
people.name AS person_name
FROM crew
INNER JOIN people ON people.person_id = crew.person_id
WHERE crew.characters LIKE '%"Batman"%'
AND crew.category = "actor"
)
SELECT a.person_name,
ROUND(AVG(rating), 2) AS average
FROM actors AS a
INNER JOIN crew AS c ON c.person_id = a.person_id
INNER JOIN ratings AS r ON c.title_id = r.title_id
GROUP BY a.person_id
ORDER BY average DESC
LIMIT 10;

上述等价于:

1
2
3
4
5
6
7
8
9
10
11
12
SELECT A.name,ROUND(AVG(R.rating),2) AS SCORE
FROM (SELECT DISTINCT(C.person_id),P.name,C.title_id
FROM crew AS C
INNER JOIN people AS P ON C.person_id=P.person_id
WHERE C.characters LIKE '%"Batman"%'
AND C.category = 'actor'
) AS A
INNER JOIN crew AS C ON C.person_id=A.person_id
INNER JOIN ratings AS R ON C.title_id=R.title_id
GROUP BY A.person_id
ORDER BY SCORE DESC
LIMIT 10;

3.6 q7_born_with_prestige

List the number of actors or actress who were born on the year that "The Prestige" was premiered.(列出在《致命魔术》首映那一年出生的演员人数。) Details: Print only the total number of actors born that year. For this question, determine distinct people by their person_id, not their names. Do not hard code the query.(只打印当年出生的演员总数。对于这个问题,确定不同的人通过他们的person_id,而不是他们的名字。不要硬编码查询。)

分析bornpeople表,primary_titletitle表,因此需要将crewpeople联结起来。加上条件category IN ('actor', 'actress')和primary_title = 'The Prestige'

1
2
3
4
5
6
7
8
9
SELECT COUNT(DISTINCT(p.person_id))
FROM people AS p
INNER JOIN crew AS c ON p.person_id = c.person_id
WHERE c.category IN ('actor', 'actress')
AND p.born IN (
SELECT premiered
FROM titles
WHERE primary_title = 'The Prestige'
);

3.7 q8_directing_rose.sql

Find all the directors who have worked with an actress with first name "Rose".(找出所有与名字为“Rose”的女演员合作过的导演。) Details: Print only the names of the directors in alphabetical order. Each name should only appear once in the output.

分析:category在表crew,namepeople

1
2
3
4
5
6
7
8
9
10
11
12
13
WITH title_ids AS (
SELECT c.title_id AS title_id
FROM crew AS c
INNER JOIN people AS p ON p.person_id = c.person_id
WHERE p.name like 'Rose%'
AND c.category = 'actress'
)
SELECT DISTINCT(p.name) AS name
FROM title_ids AS t
INNER JOIN crew AS c ON t.title_id = c.title_id
INNER JOIN people AS p ON p.person_id = c.person_id
WHERE c.category = 'director'
ORDER BY name;

3.8