STATS-1: Basic Statistics

Generic

Population vs. Sample

Variable Types

Numerical Data Analysis

Summarizing/Describing Data

Covariance / Correlation

Exploratory Data Analytics

Check List

You should be familiar with the following

  • Population vs. sample
  • Variable types (quantitative, discrete, continuous)
  • Plot basic graphs
  • Compute numerical summaries such as mean, median, variance, and standard deviation
  • Correlation

Exercises

These are simple exercises designed to reinforce your learning so far.

Difficulty Level

★☆☆ - Easy
★★☆ - Medium
★★★ - Challenging
★★★★ - Bonus

EX-1 - Mean / Median / STD (★☆☆)

We have some sample salary data (in thousands) from two cities.

city1 = [15, 12, 20, 25, 50, 35, 75, 80, 60, 45, 36]
city2 = [40, 42, 45, 60, 55, 52, 56, 52, 62, 57, 48]

Calculate the mean, median, variance, and standard deviation for each city.
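One possible way to approach this, sketched with Python's standard `statistics` module. The helper name `summarize` is an assumption made for illustration, and the sketch treats the data as a sample, so it uses the sample variance (dividing by n - 1). If you treat the lists as full populations, `statistics.pvariance` and `statistics.pstdev` would be the counterparts.

```python
import statistics

city1 = [15, 12, 20, 25, 50, 35, 75, 80, 60, 45, 36]
city2 = [40, 42, 45, 60, 55, 52, 56, 52, 62, 57, 48]


def summarize(data):
    """Return sample summary statistics for a list of numbers (hypothetical helper)."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "variance": statistics.variance(data),  # sample variance, divides by n - 1
        "std": statistics.stdev(data),          # square root of the sample variance
    }


summary1 = summarize(city1)
summary2 = summarize(city2)

for name, summary in (("city1", summary1), ("city2", summary2)):
    # Round only for display; the stored values keep full precision.
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Comparing the two standard deviations is a quick way to see which city has the wider salary spread.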

EX-2 - NBA Player Stats (★★☆)

  • Read the NBA player stats data
  • Extract Salary column
  • Find min/max/mean/median of salary
  • Is there a large variance in salary? How will you find out?
  • Find the 10% trimmed mean of salary
  • Plot the salary distribution
    Hint: use a boxplot and a histogram
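The trimmed mean step can be sketched in plain Python as follows. The function name `trimmed_mean` and the `salaries` list are assumptions for illustration only (the values are made up, not taken from the NBA data). The sketch also assumes that "10% trimmed" means dropping 10% of the observations from each end, which is one common convention; check which one your course or library uses.

```python
import statistics


def trimmed_mean(values, proportion=0.10):
    """Mean after dropping `proportion` of the sorted values from EACH end.

    Hypothetical helper; assumes the "cut from both tails" convention.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)       # observations to drop per tail
    trimmed = ordered[k:len(ordered) - k]    # k == 0 leaves the data unchanged
    return statistics.mean(trimmed)


# Made-up values with one extreme outlier, NOT real salary data.
salaries = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

print("mean:        ", statistics.mean(salaries))
print("trimmed mean:", trimmed_mean(salaries, 0.10))
```

A large gap between the plain mean and the trimmed mean suggests that a few extreme values are pulling the mean, which is also one hint that the variance is large.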

EX-3 - Correlation of NBA Player Stats (★★☆)

  • Read the NBA player stats data
  • Extract Height and Weight columns
  • You will notice that height is in feet-inches format, for example 6-4. You will need to convert it to a single numeric value.
    • Create a new column called height_cm
    • Conversion formula is:
      cm = feet * 30.48 + inches * 2.54
  • Is there a correlation between height and weight?
  • Create a plot to illustrate the relationship
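The height conversion above could be sketched like this. The function name `height_to_cm` is an assumption for illustration, and the sketch assumes every value is a clean "feet-inches" string such as "6-4"; real data may contain missing or malformed entries that need handling first.

```python
def height_to_cm(height):
    """Convert a 'feet-inches' string such as '6-4' to centimetres.

    Hypothetical helper; assumes exactly one '-' separating two integers.
    """
    feet, inches = height.split("-")
    return int(feet) * 30.48 + int(inches) * 2.54


# Example values in the format described in the exercise.
for raw in ("6-4", "7-0", "5-11"):
    print(raw, "->", round(height_to_cm(raw), 2), "cm")
```

If the data is in a pandas DataFrame, applying a function like this to the height column is one way to build the `height_cm` column.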

EX-4 - Correlation Matrix for House Sales Data (★★☆)

  • Read house-sales.csv
  • Create a correlation matrix for this data
  • Identify which attributes correlate most strongly with Saleprice
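To make the idea of a correlation matrix concrete, here is a minimal pure-Python sketch of pairwise Pearson correlation. The helpers `pearson` and `correlation_matrix` are illustrative names, and the `toy` table is made up: apart from Saleprice, its column names and all of its values are assumptions and do not come from house-sales.csv. In practice a data-frame library would usually compute this for you.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up toy table; NOT the contents of house-sales.csv.
toy = {
    "Saleprice": [200, 250, 300, 350, 400],
    "LivingArea": [100, 130, 150, 170, 210],  # hypothetical column
    "Age": [40, 35, 20, 22, 5],               # hypothetical column
}

matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Reading down the Saleprice row shows which attributes move with price (values near 1) or against it (values near -1). A strong correlation does not by itself show that an attribute causes the price to change.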