# Handling Missing Values in Pandas

In real life, many datasets arrive with missing data either because it exists and was not collected or it never existed.

In Pandas missing data is represented by two values:

* `None` : None is simply is `keyword` refer as empty or none.
* `NaN` : Acronym for `Not a Number`.

There are several useful functions for detecting, removing, and replacing null values in Pandas DataFrame:

1. `isnull()`
2. `notnull()`
3. `dropna()`
4. `fillna()`
5. `replace()`

## 2. Checking for missing values using `isnull()` and `notnull()`

Let's import pandas and our fancy car-sales dataset having some missing values.

```python
import pandas as pd

car_sales_missing_df = pd.read_csv("Datasets/car-sales-missing-data.csv")
print(car_sales_missing_df)
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue       NaN    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green       NaN    4.0   $4,500
    6   Honda    NaN       NaN    4.0   $7,500
    7   Honda   Blue       NaN    4.0      NaN
    8  Toyota  White   60000.0    NaN      NaN
    9     NaN  White   31600.0    4.0   $9,700
    

```python
## Using isnull()

print(car_sales_missing_df.isnull())
```

        Make  Colour  Odometer  Doors  Price
    0  False   False     False  False  False
    1  False   False     False  False  False
    2  False   False      True  False  False
    3  False   False     False  False  False
    4  False   False     False  False  False
    5  False   False      True  False  False
    6  False    True      True  False  False
    7  False   False      True  False   True
    8  False   False     False   True   True
    9   True   False     False  False  False
    

Note here:
* `True` means for `NaN` values
* `False` means for no `Nan` values

If we want to find the number of missing values in each column use `isnull().sum()`.


```python
print(car_sales_missing_df.isnull().sum())
```

    Make        1
    Colour      1
    Odometer    4
    Doors       1
    Price       2
    dtype: int64
    

You can also check presense of null values in a single column.


```python
print(car_sales_missing_df["Odometer"].isnull())
```

    0    False
    1    False
    2     True
    3    False
    4    False
    5     True
    6     True
    7     True
    8    False
    9    False
    Name: Odometer, dtype: bool
    

```python
## using notnull()

print(car_sales_missing_df.notnull())
```

        Make  Colour  Odometer  Doors  Price
    0   True    True      True   True   True
    1   True    True      True   True   True
    2   True    True     False   True   True
    3   True    True      True   True   True
    4   True    True      True   True   True
    5   True    True     False   True   True
    6   True   False     False   True   True
    7   True    True     False   True  False
    8   True    True      True  False  False
    9  False    True      True   True   True
    

Note here:
* `True` means no `NaN` values
* `False` means for `NaN` values

`isnull()` means having null values so it gives boolean `True` for NaN values. And `notnull()` means having no null values so it gives `True` for no NaN value.

## 2. Filling missing values using `fillna()`, `replace()`.


```python
## Filling missing values  with a single value using `fillna`
print(car_sales_missing_df.fillna(0))
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue       0.0    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green       0.0    4.0   $4,500
    6   Honda      0       0.0    4.0   $7,500
    7   Honda   Blue       0.0    4.0        0
    8  Toyota  White   60000.0    0.0        0
    9       0  White   31600.0    4.0   $9,700
    

```python
## Filling missing values with the previous value using `ffill()`
print(car_sales_missing_df.ffill())
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue   87899.0    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green  213095.0    4.0   $4,500
    6   Honda  Green  213095.0    4.0   $7,500
    7   Honda   Blue  213095.0    4.0   $7,500
    8  Toyota  White   60000.0    4.0   $7,500
    9  Toyota  White   31600.0    4.0   $9,700
    

```python
## illing null value with the next ones  using 'bfill()'
print(car_sales_missing_df.bfill())
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue   11179.0    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green   60000.0    4.0   $4,500
    6   Honda   Blue   60000.0    4.0   $7,500
    7   Honda   Blue   60000.0    4.0   $9,700
    8  Toyota  White   60000.0    4.0   $9,700
    9     NaN  White   31600.0    4.0   $9,700
    

#### Filling a null values using `replace()` method

Now we are going to replace the all `NaN` value in the data frame with -125 value

For this we will also need numpy


```python
import numpy as np

print(car_sales_missing_df.replace(to_replace = np.nan, value = -125))
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue    -125.0    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green    -125.0    4.0   $4,500
    6   Honda   -125    -125.0    4.0   $7,500
    7   Honda   Blue    -125.0    4.0     -125
    8  Toyota  White   60000.0 -125.0     -125
    9    -125  White   31600.0    4.0   $9,700
    

## 3. Dropping missing values using `dropna()`

In order to drop a null values from a dataframe, we used `dropna()` function this function drop Rows/Columns of datasets with Null values in different ways.

#### Dropping rows with at least 1 null value. 


```python
print(car_sales_missing_df.dropna(axis = 0))  ##Now we drop rows with at least one Nan value (Null value) 
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    

#### Dropping rows if all values in that row are missing.


```python
print(car_sales_missing_df.dropna(how = 'all',axis = 0))  ## If not have leave the row as it is
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue       NaN    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green       NaN    4.0   $4,500
    6   Honda    NaN       NaN    4.0   $7,500
    7   Honda   Blue       NaN    4.0      NaN
    8  Toyota  White   60000.0    NaN      NaN
    9     NaN  White   31600.0    4.0   $9,700
    

#### Dropping columns with at least 1 null value


```python
print(car_sales_missing_df.dropna(axis = 1))
```

    Empty DataFrame
    Columns: []
    Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    

Now we drop a columns which have at least 1 missing values.

Here the dataset becomes empty after `dropna()` because each column as atleast 1 null value so it remove that columns resulting in an empty dataframe.