GitHub - lakshya90/DataScience101

1. Importing Data

Let’s get the data to our local machines

wget https://raw.githubusercontent.com/lakshya90/DataScience101/master/titanic.csv (curl -O for mac)

Let’s import relevant libraries

import pandas as pd

Let’s read the file into a dataframe

df = pd.read_csv('titanic.csv')

2. Viewing/Inspecting Data

Let’s see number of rows and columns

df.shape       # Output : 891,12

Let’s see top 5 rows

df.head(5)

Let’s see bottom 5 rows

df.tail(5)

Let’s see some more information on the rows and columns

df.info()       # Output : 891,12

Let’s see some statistics for numerical fields

df.describe()

Let’s see unique counts for each columns

df['Sex'].value_counts(); 
df['Survived'].value_counts();
df['Pclass'].value_counts()

3. Selection

Let’s see the type when we pick a column

d = df['PassengerId']; type(d)   #<class ‘pandas.core.series.Series’>

Let’s see any one row features (Pick any number between 0 and 890)

df.iloc[654,:]

4. Data Cleaning

A few columns have NaN values. How do I know?

df.describe() #Check count of all features
df_a = df; df_a['Age'] = df_a['Age'].fill na(df_a['Age'].mean())
df_a['Age'].count() #891 from previous 714

Let’s drop some irrelevant columns

df.drop(['PassengerId','Name','Ticket', 'Cabin'], axis=1)

Let’s see how many null values exist for a certain column

pd.isnull(df['Cabin'])
df[df['Embarked'].isnull()]

5. Filter, Sort, Group By

Let’s see the count of upper class values

df['Pclass'].value_counts(); s = df['Pclass'] < 2; s.value_counts()

Let’s sort all ages in ascending order

df.sort_values('Age')

Let’s see Survivors grouped by Sex.

df.groupby('Sex')['Survived'].value_counts()
df.groupby('Sex').Survived.mean()

6. Statistics

Let’s see some statistics for categorical fields

df.describe(include=['O'])

Let’s see some statistics for a few columns

df['Age'].mean()    #Mean of all values of the feature ‘Age’
df['Cabin'].count()    #Count of all valid ‘Cabin’ values
df['Fare'].max()   #Maximum fare paid for the ticket

7. Exporting the Data

Let's write the data to a csv file

df.to_csv('output.csv')

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
EDA.ipynb		EDA.ipynb
Matplotlib.ipynb		Matplotlib.ipynb
NumPy.ipynb		NumPy.ipynb
Pandas.ipynb		Pandas.ipynb
README.md		README.md
data.csv		data.csv
titanic.csv		titanic.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

1. Importing Data

Let’s get the data to our local machines

Let’s import relevant libraries

Let’s read the file into a dataframe

2. Viewing/Inspecting Data

Let’s see number of rows and columns

Let’s see top 5 rows

Let’s see bottom 5 rows

Let’s see some more information on the rows and columns

Let’s see some statistics for numerical fields

Let’s see unique counts for each columns

3. Selection

Let’s see the type when we pick a column

Let’s see any one row features (Pick any number between 0 and 890)

4. Data Cleaning

A few columns have NaN values. How do I know?

Let’s drop some irrelevant columns

Let’s see how many null values exist for a certain column

5. Filter, Sort, Group By

Let’s see the count of upper class values

Let’s sort all ages in ascending order

Let’s see Survivors grouped by Sex.

6. Statistics

Let’s see some statistics for categorical fields

Let’s see some statistics for a few columns

7. Exporting the Data

Let's write the data to a csv file

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

1. Importing Data

Let’s get the data to our local machines

Let’s import relevant libraries

Let’s read the file into a dataframe

2. Viewing/Inspecting Data

Let’s see number of rows and columns

Let’s see top 5 rows

Let’s see bottom 5 rows

Let’s see some more information on the rows and columns

Let’s see some statistics for numerical fields

Let’s see unique counts for each columns

3. Selection

Let’s see the type when we pick a column

Let’s see any one row features (Pick any number between 0 and 890)

4. Data Cleaning

A few columns have NaN values. How do I know?

Let’s drop some irrelevant columns

Let’s see how many null values exist for a certain column

5. Filter, Sort, Group By

Let’s see the count of upper class values

Let’s sort all ages in ascending order

Let’s see Survivors grouped by Sex.

6. Statistics

Let’s see some statistics for categorical fields

Let’s see some statistics for a few columns

7. Exporting the Data

Let's write the data to a csv file

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages