Contd. for Netflix EDA

  • Q. 1) For 'House of Cards', what is the Show ID and who is the Director of this show?

To retrieve the information for "House of Cards," specifically the Show ID and Director, you can use the following commands in Python with pandas:

Commands to Retrieve Information

Import the necessary libraries:

```python
import pandas as pd
```

Load the dataset:

```python
df = pd.read_csv('netflix_titles.csv')
```

Filter for "House of Cards":

```python
house_of_cards_info = df[df['title'] == 'House of Cards']
```

Retrieve Show ID and Director:

```python
show_id = house_of_cards_info['show_id'].values[0]
director = house_of_cards_info['director'].values[0]
```

Print the results:

```python
print(f"Show ID: {show_id}, Director: {director}")
```

Explanation of Commands

The first command imports the pandas library, which is essential for data manipulation.

The second command loads the dataset from a CSV file into a DataFrame.

The third command filters the DataFrame to find the row corresponding to "House of Cards."

The fourth command extracts the Show ID and Director from the filtered DataFrame.

The final command prints out the retrieved information.

Together, these steps print the Show ID and Director for "House of Cards." The title comparison is an exact, case-sensitive match, so the title must be written exactly as it appears in the CSV.
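The `.values[0]` lookup assumes exactly one row matches: a misspelled title raises an `IndexError`, and if several rows share the title, only the first is reported. A minimal sketch of a more defensive lookup, using a hypothetical helper `lookup_title` and a few made-up rows in place of `netflix_titles.csv`:

```python
import pandas as pd

def lookup_title(df: pd.DataFrame, title: str) -> list[tuple[str, str]]:
    """Return (show_id, director) pairs for every row whose title matches exactly.

    Hypothetical helper for illustration; a missing director is reported as 'Unknown'.
    """
    matches = df.loc[df['title'] == title, ['show_id', 'director']]
    matches = matches.fillna({'director': 'Unknown'})
    return list(matches.itertuples(index=False, name=None))

# Made-up sample rows standing in for netflix_titles.csv
sample = pd.DataFrame({
    'show_id': ['s1', 's2', 's3'],
    'title': ['Show A', 'Show B', 'Show B'],
    'director': ['Director X', None, 'Director Y'],
})

print(lookup_title(sample, 'Show A'))   # one match
print(lookup_title(sample, 'Show B'))   # two rows share the title
print(lookup_title(sample, 'Missing'))  # no match: empty list instead of an IndexError
```

On the real data, `lookup_title(df, 'House of Cards')` would return every matching pair, so an empty list signals a spelling mismatch rather than a crash.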

 • Q. 2) In which year were the highest number of TV Shows & Movies released? Show with a Bar Graph.

To determine the year with the highest number of TV shows and movies released from the dataset, you can follow these steps using Python with pandas and matplotlib for visualization. Here’s how you can do it:

Commands to Retrieve Information and Create a Bar Graph

Import the necessary libraries:

```python
import pandas as pd
import matplotlib.pyplot as plt
```

Load the dataset:

```python
df = pd.read_csv('netflix_titles.csv')
```

Group by release year and count the number of titles:

```python
release_counts = df['release_year'].value_counts().sort_index()
```

Identify the year with the highest number of releases:

```python
max_year = release_counts.idxmax()
max_count = release_counts.max()

print(f"The year with the highest number of releases is {max_year} with {max_count} titles.")
```

Create a bar graph to visualize the data:

```python
plt.figure(figsize=(12, 6))
release_counts.plot(kind='bar', color='skyblue')
plt.title('Number of TV Shows & Movies Released by Year')
plt.xlabel('Year')
plt.ylabel('Number of Releases')
plt.xticks(rotation=45)
plt.axhline(y=max_count, color='r', linestyle='--', label=f'Max Releases: {max_count} in {max_year}')
plt.legend()
plt.tight_layout()
plt.show()
```

Explanation of Commands

The first command imports pandas for data manipulation and matplotlib for plotting.

The second command loads the dataset from a CSV file.

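The third command counts how many titles were released each year and sorts them by year. Note that release_year is the title's original release year, not the year it was added to Netflix (that is recorded in the date_added column).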

The fourth command identifies which year had the most releases and prints that information.

The final commands create a bar graph showing the number of releases per year, highlighting the maximum count with a dashed line.

This sequence of commands will help you find out which year had the highest number of TV shows and movies released, along with a visual representation of the data.
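The count above merges Movies and TV Shows into one total per year. If you also want the split by type, `pd.crosstab` builds a year × type table that can be drawn as stacked bars. A minimal sketch on made-up rows, assuming the `release_year` and `type` columns of netflix_titles.csv; it also lists every tied peak year, whereas `idxmax()` returns only the first:

```python
import pandas as pd

# Made-up sample rows standing in for netflix_titles.csv
sample = pd.DataFrame({
    'release_year': [2017, 2018, 2018, 2018, 2019, 2019],
    'type': ['Movie', 'Movie', 'TV Show', 'Movie', 'TV Show', 'TV Show'],
})

# Rows = release year, columns = Movie / TV Show, cells = number of titles
by_year_type = pd.crosstab(sample['release_year'], sample['type'])

# Total per year, then every year that reaches the maximum (handles ties)
totals = by_year_type.sum(axis=1)
peak_years = totals[totals == totals.max()].index.tolist()

print(by_year_type)
print("Peak year(s):", peak_years, "with", totals.max(), "titles")

# To draw it (requires matplotlib): by_year_type.plot(kind='bar', stacked=True)
```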

 • Q. 3) How many Movies & TV Shows are in the dataset? Show with a Bar Graph.


To determine how many movies and TV shows are in the dataset and visualize this information with a bar graph, you can follow these steps using Python with pandas and matplotlib. Here’s how to do it:

Commands to Retrieve Information and Create a Bar Graph

Import the necessary libraries:

```python
import pandas as pd
import matplotlib.pyplot as plt
```

Load the dataset:

```python
df = pd.read_csv('netflix_titles.csv')
```

Count the number of Movies and TV Shows:

```python
counts = df['type'].value_counts()
```

Create a bar graph to visualize the data:

```python
plt.figure(figsize=(8, 5))
counts.plot(kind='bar', color=['blue', 'orange'])
plt.title('Number of Movies & TV Shows in the Dataset')
plt.xlabel('Type')
plt.ylabel('Count')
plt.xticks(rotation=0)
plt.show()
```

Explanation of Commands

The first command imports the pandas library for data manipulation and matplotlib for plotting.

The second command loads the dataset from a CSV file into a DataFrame.

The third command counts how many entries are categorized as 'Movie' and 'TV Show' using value_counts().

The final commands create a bar graph showing the count of movies and TV shows in the dataset.

This sequence of commands will help you find out how many movies and TV shows are present in the dataset and visualize that information effectively with a bar graph.
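If you also want each category's share of the catalogue, `value_counts(normalize=True)` returns proportions instead of counts. A short sketch on made-up rows that puts counts and percentages side by side:

```python
import pandas as pd

# Made-up sample rows standing in for netflix_titles.csv
sample = pd.DataFrame({'type': ['Movie', 'Movie', 'Movie', 'TV Show']})

counts = sample['type'].value_counts()                # absolute numbers
shares = sample['type'].value_counts(normalize=True)  # fractions that sum to 1

# Both Series are indexed by the type label, so they align row by row
summary = pd.DataFrame({'count': counts, 'percent': (shares * 100).round(1)})

print(summary)
```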


 • Q. 4) Show all the Movies that were released in the year 2000. 

 • Q. 5) Show only the Titles of all TV Shows that were released in India only. 

 • Q. 6) Show the Top 10 Directors who gave the highest number of TV Shows & Movies to Netflix.

To address the queries regarding the Netflix dataset, you can use the following commands in Python with pandas to retrieve the required information.

 4: Show all the Movies that were released in the year 2000

```python
import pandas as pd

# Load the dataset
df = pd.read_csv('netflix_titles.csv')

# Filter for movies released in the year 2000
movies_2000 = df[(df['type'] == 'Movie') & (df['release_year'] == 2000)]

# Display the results
print(movies_2000[['title', 'release_year']])
```

 5: Show only the Titles of all TV Shows that were released in India only

```python
# Filter for TV shows whose country value is exactly 'India' (co-productions are excluded)
tv_shows_india = df[(df['type'] == 'TV Show') & (df['country'] == 'India')]

# Display only the titles
print(tv_shows_india['title'])
```

 6: Show Top 10 Directors who gave the highest number of TV Shows & Movies to Netflix

```python
# Count the number of titles per director
top_directors = df['director'].value_counts().head(10)

# Display the top directors
print(top_directors)
```

Explanation of Commands

For 4, we filter the DataFrame for entries where the type is "Movie" and the release year is 2000, then display the relevant columns.

For 5, we filter for entries where the type is "TV Show" and the country is exactly "India," displaying only the title column. The exact match is what enforces "India only": titles that list India alongside other countries are left out.

For 6, we use value_counts() on the director column to count how many titles each director has contributed (rows with no director are ignored), then retrieve the top 10.

These commands will help you extract and display the required information from the dataset effectively.
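One caveat for Q6: in netflix_titles.csv, co-directed titles store several names in one comma-separated string, so `value_counts()` treats "A, B" as a single director. A minimal sketch that splits the names before counting, using a hypothetical helper `top_individuals` and assuming names are separated by commas:

```python
import pandas as pd

def top_individuals(series: pd.Series, n: int = 10) -> pd.Series:
    """Count individual names in a comma-separated column (hypothetical helper)."""
    names = (
        series.dropna()
        .str.split(',')   # 'A, B' -> ['A', ' B']
        .explode()        # one row per name
        .str.strip()      # remove the surrounding spaces
    )
    return names.value_counts().head(n)

# Made-up sample rows standing in for netflix_titles.csv
sample = pd.DataFrame({'director': ['Dir A', 'Dir A, Dir B', 'Dir B', 'Dir B', None]})

print(top_individuals(sample['director'], n=2))
```

With the real data, `top_individuals(df['director'])` gives per-person counts; the same helper works for other comma-separated columns such as `cast` or `country`.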

 • Q. 7) Show all the Records where "Category is Movie and Type is Comedies" or "Country is United Kingdom".

 • Q. 8) In how many movies/shows was Tom Cruise cast?

 • Q. 9) What are the different Ratings defined by Netflix?


Here are the commands to retrieve the required information for each of your queries regarding the Netflix dataset:

7: Show all the Records where "Category is Movie and Type is Comedies" or "Country is United Kingdom"

```python
# Movies listed under 'Comedies', or any title whose country is exactly 'United Kingdom'
# (na=False treats a missing listed_in value as "no match")
filtered_records = df[
    ((df['type'] == 'Movie') & (df['listed_in'].str.contains('Comedies', na=False)))
    | (df['country'] == 'United Kingdom')
]

# Display the results
print(filtered_records)
```

8: In how many movies/shows was Tom Cruise cast?

```python
# Count the number of movies/shows where Tom Cruise is in the cast
tom_cruise_count = df[df['cast'].str.contains('Tom Cruise', na=False)].shape[0]

# Display the count
print(f"Tom Cruise was cast in {tom_cruise_count} movies/shows.")
```
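`str.contains('Tom Cruise')` is a substring test, so it would also count any cast entry that merely contains those characters (for example a longer name beginning with them). A minimal sketch that only counts the name as a complete entry of the comma-separated cast list, using a hypothetical helper `count_cast_member` and made-up rows (the name "Tom Cruiser" is invented to show the difference):

```python
import re
import pandas as pd

def count_cast_member(df: pd.DataFrame, name: str) -> int:
    """Count titles listing `name` as a whole entry in the cast column (hypothetical helper)."""
    # The name must start the string or follow a comma, and end the string or precede a comma
    pattern = r'(?:^|,\s*)' + re.escape(name) + r'\s*(?:,|$)'
    return int(df['cast'].str.contains(pattern, regex=True, na=False).sum())

# Made-up sample rows standing in for netflix_titles.csv
sample = pd.DataFrame({'cast': ['Tom Cruise, Actor B', 'Actor C, Tom Cruise', 'Tom Cruiser', None]})

substring_count = int(sample['cast'].str.contains('Tom Cruise', na=False).sum())
exact_count = count_cast_member(sample, 'Tom Cruise')

print(substring_count, exact_count)
```

Non-capturing groups `(?:...)` are used so pandas does not warn about match groups in the pattern.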

 9: What are the different Ratings defined by Netflix?

```python
# Get the unique ratings; dropna() removes missing values so NaN is not listed as a rating
unique_ratings = df['rating'].dropna().unique()

# Display the unique ratings
print("Different Ratings defined by Netflix:")
print(unique_ratings)
```
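To see how often each rating occurs, and how many rows have no rating at all, `value_counts(dropna=False)` keeps missing values as their own entry. A short sketch on made-up rows:

```python
import pandas as pd

# Made-up sample rows standing in for netflix_titles.csv
sample = pd.DataFrame({'rating': ['TV-MA', 'TV-14', 'TV-MA', None, 'R']})

# Sorted list of distinct ratings, ignoring missing values
ratings = sorted(sample['rating'].dropna().unique())

# Frequency of each rating, with missing values counted separately
rating_counts = sample['rating'].value_counts(dropna=False)

print(ratings)
print(rating_counts)
```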

Explanation of Commands

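For 7, the question's "Category" corresponds to the type column and its "Type" (genre) to the listed_in column of netflix_titles.csv. We keep Movies whose listed_in value includes "Comedies," plus any entry whose country is exactly "United Kingdom." Because & binds more tightly than | in Python, the condition is evaluated as (Movie and Comedies) or United Kingdom.

For 8, we check the cast column for entries containing "Tom Cruise" (na=False treats a missing cast list as a non-match) and count them.

For 9, we retrieve the unique values of the rating column to list every rating that appears in the dataset.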

These commands will help you extract and display the required information from the dataset effectively. 


 ▪ Q. 9.1) How many Movies got the 'TV-14' rating in Canada?

 ▪ Q. 9.2) How many TV Shows got the 'R' rating after the year 2018?


9.1: How many Movies got the 'TV-14' rating in Canada?

```python
# Count the number of movies with 'TV-14' rating in Canada
tv14_movies_canada = df[(df['rating'] == 'TV-14') & (df['country'] == 'Canada') & (df['type'] == 'Movie')]

# Display the count
count_tv14_movies_canada = tv14_movies_canada.shape[0]
print(f"Number of Movies with 'TV-14' rating in Canada: {count_tv14_movies_canada}")
```

9.2: How many TV Shows got the 'R' rating after the year 2018?

```python
# Count the number of TV shows with 'R' rating released after 2018
r_tv_shows_after_2018 = df[(df['rating'] == 'R') & (df['release_year'] > 2018) & (df['type'] == 'TV Show')]

# Display the count
count_r_tv_shows_after_2018 = r_tv_shows_after_2018.shape[0]
print(f"Number of TV Shows with 'R' rating after 2018: {count_r_tv_shows_after_2018}")
```

Explanation of Commands

For 9.1, we filter the DataFrame for entries where the rating is "TV-14," the country is "Canada," and the type is "Movie." We then count these entries.

For 9.2, we filter for entries where the rating is "R," the release year is greater than 2018 (that is, 2019 onward), and the type is "TV Show." We then count these entries as well.

These commands will help you extract and display the required information from the dataset effectively.
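Both counts above use an exact `country == 'Canada'` match, so a title whose country value lists Canada together with other countries (a combined value like "Canada, United States") is excluded. If you want to count titles where Canada is one of the listed countries, the filter can be assembled from a boolean mask. A minimal sketch with a hypothetical helper `count_titles` on made-up rows:

```python
import pandas as pd

def count_titles(df, kind, rating, country=None, after_year=None, include_coproductions=False):
    """Count titles of a given type and rating, optionally filtered (hypothetical helper)."""
    mask = (df['type'] == kind) & (df['rating'] == rating)
    if country is not None:
        if include_coproductions:
            # Match the country as any one entry of the comma-separated list
            countries = df['country'].fillna('').str.split(',')
            mask &= countries.apply(lambda names: country in [n.strip() for n in names])
        else:
            # Exact match on the whole value, as in the commands above
            mask &= df['country'] == country
    if after_year is not None:
        mask &= df['release_year'] > after_year
    return int(mask.sum())

# Made-up sample rows standing in for netflix_titles.csv
sample = pd.DataFrame({
    'type': ['Movie', 'Movie', 'Movie', 'TV Show', 'TV Show'],
    'rating': ['TV-14', 'TV-14', 'TV-14', 'R', 'R'],
    'country': ['Canada', 'Canada, United States', None, 'United States', 'Canada'],
    'release_year': [2015, 2019, 2020, 2019, 2018],
})

print(count_titles(sample, 'Movie', 'TV-14', country='Canada'))
print(count_titles(sample, 'Movie', 'TV-14', country='Canada', include_coproductions=True))
print(count_titles(sample, 'TV Show', 'R', after_year=2018))
```

Which reading is right depends on what "in Canada" should mean for your analysis; the helper just makes both easy to compare.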


 • Q. 10) What is the maximum duration of a Movie/Show on Netflix?

 • Q. 11) Which individual country has the highest No. of TV Shows?

 • Q. 12) How can we sort the dataset by Year?

 • Q. 13) Find all the instances where: Category is 'Movie' and Type is 'Dramas' or Category is 'TV Show' & Type is 'Kids' TV'. 


 10: What is the maximum duration of a Movie/Show on Netflix?

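```python
# Duration is stored as text: minutes for movies ("90 min") and seasons for TV shows
# ("2 Seasons"), so a plain astype(int) fails and the two types must be handled separately.
# Extract the leading number as a float (float keeps missing durations as NaN).
df['duration_value'] = df['duration'].str.extract(r'(\d+)', expand=False).astype(float)

# Find the maximum for each type
max_movie_minutes = df.loc[df['type'] == 'Movie', 'duration_value'].max()
max_show_seasons = df.loc[df['type'] == 'TV Show', 'duration_value'].max()

# Display the maximum durations
print(f"The longest Movie on Netflix runs {max_movie_minutes:.0f} minutes.")
print(f"The longest TV Show on Netflix has {max_show_seasons:.0f} seasons.")
```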
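`max()` reports the longest duration but not which title it belongs to. `idxmax()` returns the row label of the maximum, which can then be used to look up the title. A minimal sketch on made-up rows, with a hypothetical helper `longest` that treats movies (minutes) and TV shows (seasons) separately:

```python
import pandas as pd

# Made-up sample rows standing in for netflix_titles.csv
sample = pd.DataFrame({
    'title': ['Movie A', 'Movie B', 'Show C', 'Show D'],
    'type': ['Movie', 'Movie', 'TV Show', 'TV Show'],
    'duration': ['95 min', '212 min', '1 Season', '4 Seasons'],
})

# Leading number of the duration text: minutes for movies, seasons for TV shows
sample['duration_value'] = sample['duration'].str.extract(r'(\d+)', expand=False).astype(float)

def longest(df: pd.DataFrame, kind: str) -> tuple[str, str]:
    """Return (title, duration) of the longest entry of one type (hypothetical helper)."""
    subset = df[df['type'] == kind]
    row = subset.loc[subset['duration_value'].idxmax()]  # idxmax skips NaN by default
    return row['title'], row['duration']

print(longest(sample, 'Movie'))
print(longest(sample, 'TV Show'))
```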

11: Which individual country has the highest number of TV Shows?

```python
# Count the number of TV Shows per country
tv_show_counts = df[df['type'] == 'TV Show']['country'].value_counts()

# Get the country with the highest number of TV Shows
highest_tv_shows_country = tv_show_counts.idxmax()
highest_tv_shows_count = tv_show_counts.max()

# Display the results
print(f"The country with the highest number of TV Shows is {highest_tv_shows_country} with {highest_tv_shows_count} shows.")
```
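The raw `value_counts()` counts a multi-country value (for example "A, B") as its own entry. If "individual country" is read as titles produced in a single country, one option is to drop multi-country rows before counting. A minimal sketch on made-up rows, assuming multiple countries are separated by commas:

```python
import pandas as pd

# Made-up sample rows standing in for netflix_titles.csv
sample = pd.DataFrame({
    'type': ['TV Show', 'TV Show', 'TV Show', 'TV Show', 'Movie'],
    'country': ['India', 'United States', 'United States, India', 'India', 'India'],
})

# TV Shows with a known country
tv = sample[(sample['type'] == 'TV Show') & sample['country'].notna()]

# Keep only rows listing exactly one country (no comma in the value)
single_country = tv[~tv['country'].str.contains(',', regex=False)]

counts = single_country['country'].value_counts()
top_country, top_count = counts.idxmax(), counts.max()
print(top_country, top_count)
```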

 12: How can we sort the dataset by Year?

```python
# Sort the dataset by release year
sorted_df = df.sort_values(by='release_year')

# Display the sorted dataset (optional)
print(sorted_df.head())  # Show first few rows of sorted dataset
```
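`sort_values` sorts in ascending order by default. It also accepts a list of columns and a matching list for `ascending`, which is handy for a newest-first view with a tie-breaker. A short sketch on made-up rows:

```python
import pandas as pd

# Made-up sample rows standing in for netflix_titles.csv
sample = pd.DataFrame({
    'title': ['Title B', 'Title A', 'Title C'],
    'release_year': [2019, 2019, 2005],
})

# Newest first; titles from the same year are ordered alphabetically
newest_first = (
    sample.sort_values(by=['release_year', 'title'], ascending=[False, True])
    .reset_index(drop=True)
)

print(newest_first)
```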

 13: Find all instances where Category is 'Movie' and Type is 'Dramas', or Category is 'TV Show' and Type is "Kids' TV".

```python
# Movies listed under 'Dramas', or TV Shows listed under "Kids' TV"
# (na=False treats a missing listed_in value as "no match")
filtered_instances = df[
    ((df['type'] == 'Movie') & (df['listed_in'].str.contains('Dramas', na=False)))
    | ((df['type'] == 'TV Show') & (df['listed_in'].str.contains("Kids' TV", na=False)))
]

# Display the filtered results
print(filtered_instances)
```

Explanation of Commands

For 10, we convert the duration column to numeric values, then find and display the maximum duration.

For 11, we count how many TV shows exist for each value of the country column and identify the value with the highest count. Titles listing several countries are counted under the combined string rather than under each country.

For 12, we sort the DataFrame by release_year (ascending by default) and optionally display the first few rows of the sorted data.

For 13, we keep rows that satisfy either condition (a Movie listed under Dramas, or a TV Show listed under Kids' TV), then display those records.

These commands will help you extract and display the required information from the dataset effectively. 

