First I selected production_year and id from the title2 table; these give the year of each title. I then selected movie_id from movie_info2 and kept only the rows with info_type_id = '3', because that is how to get genres. I was trying to graph genres over time, so I needed both tables. Amount signifies the number of films of these “genres”; it was calculated by counting the ids from the title2 table.

I saved my genre_change output to a txt document, which I then read back in via read.table so I could graph from it. I included this txt file in Duncan’s zip file.

I made 3 graphs:
- one is filled by factors of years
- one is filled by factors of amount
- the last bar chart focuses more on years and amount

I had difficulty narrowing the y-axis range: I tried ylim(2010,2025) and got no data back on my graph. The most accurate description of the years as a y-limit is in the amount graph (the second graph, with a legend of amounts). The span is about 15 years.

We can see from each of the graphs that the ‘short’ film genre has the most hits, both across years and by amount. Romance comes second, followed by animation in third place. There seem to be a lot of animated films these days, so I was not shocked to see animation as one of the top genres among films of the past few years and films in production for the future. The same can be said for the other two top genres. A lot of short films have always been made, ever since the very early days of film production, so it makes sense to see that many short films have been made and will be made. Finally, romantic films are constantly being made; this is a never-ending genre.

I used two packages for this problem:
- library(plyr) (as advised by the help file in R for geom_bar)
- library(ggplot2) (for my graphs)
setwd("D:/Fall 2015/141/Assignment 5")
library(RSQLite)
## Warning: package 'RSQLite' was built under R version 3.2.5
imdb_new = dbConnect(SQLite(), dbname = "D:/Fall 2015/141/Assignment 5/lean_imdbpy_2010_idx.db")
# Count titles per genre (info) and production year; info_type_id '3' marks the genre rows.
genre_change <- dbGetQuery(imdb_new, "SELECT info, title2.production_year,
COUNT(title2.id) AS Amount
FROM movie_info2, title2
WHERE title2.id = movie_info2.movie_id AND info_type_id = '3'
GROUP BY info, title2.production_year;")
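The per-genre, per-year count described in the write-up can also be sketched in plain R on a toy example. The two small data frames below are made up for illustration; they only mimic the columns used in the query and are not taken from the database.

```r
# Toy stand-ins for the two tables (made-up rows, not from the real database).
title2 <- data.frame(
  id = 1:5,
  production_year = c(2010, 2010, 2011, 2011, 2011)
)
movie_info2 <- data.frame(
  movie_id = c(1, 2, 3, 4, 5, 5),
  info_type_id = c(3, 3, 3, 3, 3, 1),
  info = c("Short", "Short", "Romance", "Short", "Romance", "USA"),
  stringsAsFactors = FALSE
)

# Keep only the genre rows (info_type_id 3) and join them to the titles.
genre_rows <- movie_info2[movie_info2$info_type_id == 3, ]
joined <- merge(title2, genre_rows, by.x = "id", by.y = "movie_id")

# Count titles for each genre/year pair, like COUNT(...) grouped by genre and year.
toy_counts <- aggregate(id ~ info + production_year, data = joined, FUN = length)
names(toy_counts) <- c("genres", "years", "amount")
print(toy_counts)
```

Each row of toy_counts is one genre in one year, which is the three-column layout (genres, years, amount) that the graphs rely on.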
setwd("D:/Fall 2015/141/Assignment 5")
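# Read the saved query output back in; the columns are genre, year and count, in that order.
genre_change <- read.table("genre_change.txt")
genres <- genre_change[, 1]
years <- genre_change[, 2]
amount <- genre_change[, 3]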
library(plyr)
require(ggplot2)
## Loading required package: ggplot2
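# ddply() without a function just regroups the rows by genre.
mm <- ddply(genre_change, "genres")
# Map fill inside aes() so amount is treated as a variable rather than a fixed colour.
ggplot(mm, aes(x = factor(genres), y = years, fill = factor(amount))) +
  labs(title = "Genre change", x = "genre", y = "years") +
  geom_bar(stat = "identity")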
qplot(factor(genres), data = genre_change, geom = "bar", fill = factor(amount),
      xlab = "genres", ylab = "years", main = "genre change over years w/ amount")
qplot(factor(genres), data = genre_change, geom = "bar", fill = factor(years),
      xlab = "genres", ylab = "years", main = "Genre change over years w/years")
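On the ylim(2010,2025) problem: in ggplot2, ylim() sets the limits of the y scale, and observations that fall outside those limits are dropped rather than clipped, which may be why the graph came back empty. A possible alternative, sketched below on made-up counts, is to put years on the x-axis and amount on the y-axis, then zoom with coord_cartesian(), which changes the visible window without discarding data. The toy data frame and the chosen window are assumptions for illustration only.

```r
library(ggplot2)

# Made-up counts in the same three-column layout as genre_change.
toy <- data.frame(
  genres = rep(c("Short", "Romance", "Animation"), each = 4),
  years  = rep(c(2009, 2010, 2011, 2012), times = 3),
  amount = c(40, 50, 60, 55, 25, 30, 28, 35, 18, 20, 22, 25)
)

# Years on the x-axis, counts on the y-axis, one colour per genre.
p <- ggplot(toy, aes(x = years, y = amount, fill = genres)) +
  geom_col(position = "dodge") +
  labs(title = "Genre change", x = "year", y = "number of titles")

# coord_cartesian() zooms the view to 2010-2012 but keeps every row in the plot data.
p_zoomed <- p + coord_cartesian(xlim = c(2009.5, 2012.5))
print(p_zoomed)
```

With this layout the year range is controlled on the x-axis, so the bar heights (the amounts) are never cut off by a limit that starts above zero.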