For this problem, I found number of actors and actresses two ways: The first time I used id from role_type and used UNION to join all names from the name table. In the role_id, I called only for ids that were either a 1 or a 2 because those are the ids of female and male actors.
I’m really weary of cast_info because it takes too long to run and it has crashed my computer many times. While I tried to create a table for actors and I used ids from cast_info, My computer crashed and I had to restart it twice.
The first table I created combined the cast_info role ids
with therole_type ids
, and only used 1 or 2, (female and male actors) from each id column from those two tables. It took about 10 minutes to run, but I only had to run it once for it to be forever saved in my information.
the second table combined the ids from role_type and names (role ids only calling for 1 or 2, so male and female actors). This made it easier for me to count the names of actors, or how many actors there are in the database. There really was no need to do this tho, because the number of observations given in the global environment is the same as when I used a command to count the output from the table that I created.
library(RSQLite)
setwd("E:/Fall 2015/141/Assignment 5")
imdb = dbConnect(SQLite(), dbname = "E:/Fall 2015/141/Assignment 5/lean_imdbpy.db")
#create table combining role ids from cast_info and role ids from role_type
#this will be used for multiple questions and eases time
#combine ids from role_type and cast_info (created a new table called 'new')
roles_1<-dbGetQuery(imdb,
"CREATE TABLE new AS
SELECT role_id FROM cast_info WHERE role_id = '1' or role_id ='2' UNION SELECT id FROM role_type
WHERE id ='1' or id ='2'
GROUP BY id")
#created a new table from name table combined with role_ids, only called for male and female actors
actors <-dbGetQuery(imdb,
"CREATE TABLE actors_1 AS
SELECT DISTINCT name FROM name UNION SELECT id FROM role_type WHERE id = '1' OR id = '2'
GROUP BY id")
dbListTables(imdb)
actors_cast_info <-dbGetQuery(imdb,
"SELECT DISTINCT name FROM name UNION SELECT role_id FROM cast_info WHERE role_id = '1' OR role_id = '2'
GROUP BY role_id")
#4505382 results using cast_info role ids and name
For the movies, I originally tried the aka_titles
table, but the amount I got was very small, and something felt off with the number I got back, so I tried the title
table and I got more results and this felt like better results to me. As someone who has used IMDB for over 10 years, getting back only 300k titles with aka_titles
seemed very off to me.
library(RSQLite)
## Loading required package: DBI
setwd("E:/Fall 2015/141/Assignment 5")
imdb = dbConnect(SQLite(), dbname = "E:/Fall 2015/141/Assignment 5/lean_imdbpy.db")
actors_from_actors_table <-dbGetQuery(imdb, "SELECT COUNT (*) FROM actors_1")
actors_from_actors_table
## COUNT (*)
## 1 4505382
#4505382 total actors counted for with my actors table that I created
movi3es <- dbGetQuery(imdb, "SELECT COUNT(title) total_movies FROM title")
movi3es
## total_movies
## 1 3527732
#3527732 total movies in the database
I called for the MIN , the earliest year recorded in the production_year
column from the title table, and I also called for the MAX or latest year recorded in the production_year
column from the title table. I then looked up the film from 1974 and 2025 as a means of exploring the data further.
When using the aka_title
column, I noticed that the MAX year was slightly different, I believe it was 2022 and not 2025.
I also thought the oldest film would be ‘the great train robbery’.
years_fun <- dbGetQuery(imdb, "SELECT MIN(production_year), MAX(production_year) FROM title")
years_fun
## MIN(production_year) MAX(production_year)
## 1 1874 2025
#which movie is from 1874?
dbGetQuery(imdb, "select production_year,id,title from title where production_year='1874';")
## production_year id title
## 1 1874 3087258 Passage de Venus
#which movie is from 2025?
dbGetQuery(imdb, "select production_year,id,title from title where production_year='2025';")
## production_year id title
## 1 2025 3249359 StreetDance 4
For this I used piazza and a google search and after several attempts, managed to find an example that would work with the cast_info
, specifically role_id column.
I did combine the cast_info role_ids with names, as a new table, but I had issues trying to get proportions of it out. I would like to in the future figure out a way to create a table, where at the same time too, create a column in the new table that I am creating/. I am sure there is a way to do this, but as of yet I have not figured that out.
The proportion I got back from the cast_info
table:
There are 33.6% male actors and 20% female actors
gender_proportion<-dbGetQuery(imdb, "Select role_id, (Count(role_id)* 100.0 / (Select Count(*) From cast_info)) as Gender
From cast_info
WHERE role_id ='1' or role_id = '2'
Group By role_id")
gender_proportion
## role_id Gender
## 1 1 33.64233
## 2 2 20.04070
#where 1 is male actor and 2 is female actor.
#There are 33.6% male actors and 20% female actors
library(RSQLite)
kind_test<-dbGetQuery(imdb, "Select kind_id, (Count(kind_id)* 100.0 / (Select Count(*) From title)) as Proportion
From title
WHERE kind_id ='1' or kind_id = '2' or kind_id ='3' or kind_id='4' or kind_id='5' or kind_id='6' or kind_id='7'
Group By kind_id")
kind_test
## kind_id Proportion
## 1 1 24.9111894
## 2 2 3.5273371
## 3 3 3.4126175
## 4 4 4.1563815
## 5 6 0.4341033
## 6 7 63.5583712
setwd("E:/Fall 2015/141/Assignment 5")
imdb = dbConnect(SQLite(), dbname = "E:/Fall 2015/141/Assignment 5/lean_imdbpy.db")
setwd("E:/Fall 2015/141/Assignment 5")
library(RSQLite)
imdb_old = dbConnect(SQLite(), dbname = "E:/Fall 2015/141/Assignment 5/imdb_data.txt")
dbGetQuery(imdb_old, "select * from genres")
## idgenres genre
## 1 1 Documentary
## 2 2 Reality
## 3 3 Horror
## 4 4 Drama
## 5 5 Comedy
## 6 6 Musical
## 7 7 Talk
## 8 8 Mystery
## 9 9 News
## 10 10 Sport
## 11 11 Sci
## 12 12 Romance
## 13 13 Family
## 14 14 Short
## 15 15 Biography
## 16 16 Music
## 17 17 Game
## 18 18 Adventure
## 19 19 Crime
## 20 20 War
## 21 21 Fantasy
## 22 22 Thriller
## 23 23 Animation
## 24 24 Action
## 25 25 History
## 26 26 Adult
## 27 27 Western
## 28 28 Lifestyle
## 29 29 Film
## 30 30 Experimental
## 31 31 Commercial
## 32 32 Erotica
#there are 32 genres