Skip to content

nnvij/Webscrapping---IMDB-website

Repository files navigation

Webscrapping---IMDB-website

Introduction:

  • IMDB is an online database of information on films, television series and video games founded by Col needham in 1990. It os now a subsidiary of Amazon.
  • They have currenty 8.7 million titles, 11.4 million person records and 83 million registered users

Goals:

  • Webscrape movie data from https://www.imdb.com/
  • Perform exploratory data analysis on the data collected

Tools used:

• Beautiful Soup, Selenium (data collection) • Pandas (data processing) • Matplotlib, Seaborn

Data:

Data Collection:

  • Collected 10,000 movie title using function inputMovieData(), data was saved to Moviedataframe (shape :(10049, 11))

  • 94 Bestpicture data was collected using function InputSearchResultpage() and saving them into dataframe BestPictureadatframe (shape :(94,11))

    image image image

Data Cleanning:

  • Used regular expression to extract Year, Gross from respective columns.
  • Converted columns Year, Runtime, Rating, Votes,Gross from String to numeric.
  • Removed null values

Exploratory Data analysis:

gross gross_number top10_oscar

Summary:

  • Drama movies have won maximum Oscar Awards for Best Picture.
  • Action movies tend to make high revenues.
  • Average ratings for top gross movies is 7.4
  • Gross margin looks to plummet steeply in 2020 which may be an affect due to Covid.

About

Webscrape movie data from IMDB website and perform exploratory data analysis.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published