Mirror of https://github.com/animator/learn-python

Merge branch 'main' into main

commit ce2f710835
# FastAPI

## Table of Contents

- [Introduction](#introduction)
- [Features](#features)
- [Installation](#installation)
- [Making First API](#making-first-api)
  - [GET Method](#get-method)
  - [Running Server and calling API](#running-server-and-calling-api)
- [Path Parameters](#path-parameters)
- [Query Parameters](#query-parameters)
- [POST Method](#post-method)
- [PUT Method](#put-method)
- [Additional Content](#additional-content)
  - [Swagger UI](#swagger-ui)

## Introduction

FastAPI is a modern web framework for building APIs with Python.

It requires Python 3.7+.

## Features

1. **Speed ⚡:** FastAPI is built on top of Starlette, a lightweight ASGI framework. It's designed for high performance and can handle thousands of requests per second.
2. **Easy to use 😃:** FastAPI is designed to be intuitive and easy to use, especially for developers familiar with Python. It uses standard Python type hints for request and response validation, making the code easy to understand and write.
3. **Automatic Interactive API Documentation generation 🤩:** FastAPI automatically generates interactive API documentation (Swagger UI or ReDoc) based on your code and type annotations. Swagger UI also allows you to test API endpoints.
4. **Asynchronous Support 🔁:** FastAPI fully supports asynchronous programming with the async/await syntax. This enables handling high-concurrency scenarios and improves overall performance.

Now, let's get hands-on with FastAPI.

## Installation

Make sure that you have Python version 3.7 or greater.

Then, simply open your command shell and give the following command:

```bash
pip install fastapi
```

After this, you need to install uvicorn. uvicorn is an ASGI server on which we will be running our API:

```bash
pip install uvicorn
```

## Making First API

After successful installation, we will make an API and see how to use it.

The first thing in an API is its root/index page, which is sent as the response when the API is called.

Follow the given steps to make your first FastAPI🫨

First, let's import FastAPI to get things started:

```python
from fastapi import FastAPI

app = FastAPI()
```

Now, we will write the ``GET`` method for the root of the API. The GET method is the ``HTTP request`` method used to fetch data from a source. In web development, it is primarily used to *retrieve data* from a server.

The root of the app is ``"/"``. When the API is called, the response will be served at this URL: ``localhost:8000``

### GET method

Following is the code for the GET method, which will handle calls to the API.

When the API is called, the ``read_root()`` function is hit and a JSON response is returned, which is shown in your web browser.

```python
@app.get("/")
def read_root():
    return {"Hello": "World"}
```

Tadaaa! You have made your first FastAPI! Now let's run it!

### Running Server and calling API

Open your terminal and give the following command:

```bash
uvicorn myapi:app --reload
```

Here, ``myapi`` is the name of your API, which is the name of your Python file. ``app`` is the name you have given to your API in the assignment ``app = FastAPI()``.

After running this command, the uvicorn server will be live and you can access your API.

As we have only written the root ``GET`` method so far, only its corresponding response will be displayed.

On running this API, we get the response in JSON form:

```json
{
    "Hello": "World"
}
```

## Path Parameters

Path parameters are a way to send variables to an API endpoint so that an operation may be performed on them.

This feature is particularly useful for defining routes that need to operate on resources identified by unique identifiers, such as user IDs, product IDs, or any other unique value.

### Example

Let's take an example to make this understandable.

Assume that we have some students 🧑‍🎓 in our class and we have saved their data in the form of a dictionary in our API (in practical scenarios the data would be saved in a database and the API would query the database).

So we have a students dictionary that looks something like this:

```python
students = {
    1: {
        "name": "John",
        "age": 17,
        "class": "year 12"
    },
    2: {
        "name": "Jane",
        "age": 16,
        "class": "year 11"
    },
    3: {
        "name": "Alice",
        "age": 17,
        "class": "year 12"
    }
}
```

Here, the keys are ``student_id``.

Let's say the user wants the data of the student whose ID is 2. Here, we will take the ID as a **path parameter** from the user and return the data of that ID.

Let's see how it is done!

```python
@app.get("/students/{student_id}")
def read_student(student_id: int):
    return students[student_id]
```

Here is the explanatory breakdown of the method:

- ``/students`` is the URL of the students endpoint in the API.
- ``{student_id}`` is the path parameter, which is a dynamic variable the user will give to fetch the record of a particular student.
- ``def read_student(student_id: int)`` is the signature of the function, which takes the student_id we got from the path parameter. Its type is defined as ``int`` as our ID is an integer.

**Note that the parameter is automatically type checked. If it does not match the type defined in the method, an Error response ⛔ is generated.**

- ``return students[student_id]`` returns the data of the required student from the dictionary.

When the user passes the URL ``http://127.0.0.1:8000/students/1``, the data of the student with student_id=1 is fetched and displayed.

In this case the following output is displayed:

```json
{
    "name": "John",
    "age": 17,
    "class": "year 12"
}
```

## Query Parameters

Query parameters in FastAPI allow you to pass data to your API endpoints via the URL's query string. This is useful for filtering, searching, and other operations that do not fit well with path parameters.

Query parameters are specified after the ``?`` symbol in the URL and are typically used for optional parameters.

### Example

Let's continue the example of students to understand query parameters.

Assume that we want to search students by name. In this case, we will send the name in a query parameter, which will be read by our method, and the respective result will be returned.

Let's see the method:

```python
@app.get("/get-by-name")
def read_student(name: str):
    for student_id in students:
        if students[student_id]["name"] == name:
            return students[student_id]
    return {"Error": "Student not found"}
```

Here is the explanatory breakdown of this process:

- ``/get-by-name`` is the URL of the endpoint. After this URL, the client will enter the query parameter(s).
- ``http://127.0.0.1:8000/get-by-name?name=Jane`` In this URL, ``name=Jane`` is the query parameter. It means that the user wants to search for the student whose name is Jane. When you hit this URL, the ``read_student(name: str)`` method is called and the respective response is returned.

In this case, the output will be:

```json
{
    "name": "Jane",
    "age": 16,
    "class": "year 11"
}
```

If we pass a name that doesn't exist in the dictionary, an Error response is returned.

## POST Method

The ``POST`` method in FastAPI is used to **create resources** or submit data to an API endpoint. This method typically involves sending data in the request body, which the server processes to create or modify resources.

**⛔ In the case of the ``GET`` method, the sent data is part of the URL, but in the case of the ``POST`` method, the sent data is part of the request body.**

### Example

Again continuing with the example of students. Now, let's assume we need to add a student. Following is the ``POST`` method to do this:

```python
@app.post("/create-student/{student_id}")
def create_student(student_id: int, student: dict):
    if student_id in students:
        return {"Error": "Student exists"}
    students[student_id] = student
    return students
```

Here is the explanation of the process:

- ``/create-student/{student_id}`` shows that only student_id is part of the URL; the rest of the data is sent in the request body.
- Data in the request body is in JSON format and is received in ``student: dict``.
- The data sent in JSON format is given as:

```json
{
    "name": "Seerat",
    "age": 22,
    "class": "8 sem"
}
```

*Note:* I have used Swagger UI to send data in the request body to test my ``POST`` method, but you may use any other API testing tool like Postman etc.

- This new student is added to the dictionary, and if the operation is successful, the new dictionary is returned as the response.

Following is the output of this ``POST`` method call:

```json
{
    "1": {
        "name": "John",
        "age": 17,
        "class": "year 12"
    },
    "2": {
        "name": "Jane",
        "age": 16,
        "class": "year 11"
    },
    "3": {
        "name": "Alice",
        "age": 17,
        "class": "year 12"
    },
    "4": {
        "name": "Seerat",
        "age": 22,
        "class": "8 sem"
    }
}
```

## PUT Method

The ``PUT`` method in FastAPI is used to **update** existing resources or create resources if they do not already exist. It is one of the standard HTTP methods and is idempotent, meaning that multiple identical requests should have the same effect as a single request.

### Example

Let's update the record of a student.

```python
@app.put("/update-student/{student_id}")
def update_student(student_id: int, student: dict):
    if student_id not in students:
        return {"Error": "Student does not exist"}
    students[student_id] = student
    return students
```

The ``PUT`` method is nearly the same as the ``POST`` method, but ``PUT`` is idempotent while ``POST`` is not.

The given method will update an existing student record, and if the student doesn't exist, it will send an error response.
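
To see the idempotency difference concretely, the create and update handlers can be exercised as plain Python functions against an in-memory dictionary. This is a minimal sketch outside FastAPI, using simplified framework-free copies of the handlers above:

```python
# Simplified, framework-free copies of the handlers above,
# acting on a plain in-memory dictionary.
students = {1: {"name": "John", "age": 17, "class": "year 12"}}

def create_student(student_id, student):   # POST-style: create only
    if student_id in students:
        return {"Error": "Student exists"}
    students[student_id] = student
    return students

def update_student(student_id, student):   # PUT-style: replace in place
    if student_id not in students:
        return {"Error": "Student does not exist"}
    students[student_id] = student
    return students

# Repeating the same PUT leaves the store in the same state (idempotent).
update_student(1, {"name": "John", "age": 18, "class": "year 13"})
state_after_first_put = dict(students)
update_student(1, {"name": "John", "age": 18, "class": "year 13"})
assert students == state_after_first_put

# Repeating the same POST does not succeed twice.
create_student(2, {"name": "Jane", "age": 16, "class": "year 11"})
assert create_student(2, {"name": "Jane", "age": 16, "class": "year 11"}) == {"Error": "Student exists"}
```

The same behavior holds over HTTP: sending the identical PUT request twice leaves the resource unchanged, while the second identical POST is rejected.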

## Additional Content

### Swagger UI

Swagger UI automatically generates a UI for API testing. Just append ``/docs`` to the URL and the Swagger UI will be launched.

The following screenshot shows the Swagger UI:



Here is how I tested the ``POST`` method in the UI:



That's all for FastAPI for now.... Happy Learning!

# List of sections

- [API Methods](api-methods.md)
- [FastAPI](fast-api.md)

- [Searching Algorithms](searching-algorithms.md)
- [Greedy Algorithms](greedy-algorithms.md)
- [Dynamic Programming](dynamic-programming.md)
- [Linked list](linked-list.md)

# Linked List Data Structure

A linked list is a linear data structure that can be defined as a collection of objects, called nodes, which are stored at arbitrary locations in memory.
A node contains two fields: the data stored at that particular address, and a pointer which contains the address of the next node in memory.

The last element in a linked list features a null pointer.

## Why use linked list over array?

From the beginning, we have used the array data structure to organize a group of elements stored individually in memory.
However, arrays have some advantages and disadvantages which should be known in order to decide which data structure to use throughout the program.

Limitations of arrays:

1. Before an array can be utilized in a program, its size must be established in advance.
2. Expanding an array's size is a lengthy process and is almost impossible to achieve during runtime.
3. Array elements must be stored in contiguous memory locations. To insert an element, all subsequent elements must be shifted.

So we introduce a new data structure to overcome these limitations.

A linked list is used because of:

1. Dynamic Memory Management: Linked lists allocate memory dynamically, meaning nodes can be located anywhere in memory and are connected through pointers, rather than being stored contiguously.
2. Adaptive Sizing: There is no need to predefine the size of a linked list. It can expand or contract during runtime, adapting to the program's requirements within the constraints of the available memory.

Let's code something!

The smallest unit: Node

```python
class Node:
    def __init__(self, data):
        self.data = data  # Assign the given data to the node
        self.next = None  # Initialize the next attribute to null
```
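
As a quick sanity check, here is a minimal sketch (repeating the Node class above so it is self-contained) that links two nodes by hand and follows their next pointers:

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

first = Node(10)
second = Node(20)
first.next = second      # Link the first node to the second

# Walk the chain: collect data until we hit the terminating null pointer
values = []
current = first
while current:
    values.append(current.data)
    current = current.next

assert values == [10, 20]
assert second.next is None  # The last node's pointer is null
```

This hand-wired chain is exactly what the insertion methods below automate.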

Now, we will see the types of linked lists.

There are mainly four types of linked list:

1. Singly linked list
2. Doubly linked list
3. Circular linked list
4. Doubly circular linked list

## 1. Singly linked list

Simply think of it as a chain of nodes in which each node remembers (contains) the address of its next node.

### Creating a linked list class

```python
class LinkedList:
    def __init__(self):
        self.head = None  # Initialize head as None
```

### Inserting a new node at the beginning of a linked list

```python
def insertAtBeginning(self, new_data):
    new_node = Node(new_data)  # Create a new node
    new_node.next = self.head  # Next for the new node becomes the current head
    self.head = new_node       # Head now points to the new node
```

### Inserting a new node at the end of a linked list

```python
def insertAtEnd(self, new_data):
    new_node = Node(new_data)  # Create a new node
    if self.head is None:
        self.head = new_node   # If the list is empty, make the new node the head
        return
    last = self.head
    while last.next:           # Otherwise, traverse the list to find the last node
        last = last.next
    last.next = new_node       # Make the new node the next node of the last node
```

### Inserting a new node at the middle of a linked list

```python
def insertAtPosition(self, data, position):
    new_node = Node(data)
    if position <= 0:          # Check whether the position is valid
        print("Position should be greater than 0")
        return
    if position == 1:
        new_node.next = self.head
        self.head = new_node
        return
    current_node = self.head
    current_position = 1
    while current_node and current_position < position - 1:  # Iterate to the node just before the position
        current_node = current_node.next
        current_position += 1
    if not current_node:       # Check whether the position is out of bounds
        print("Position is out of bounds")
        return
    new_node.next = current_node.next  # Connect the intermediate nodes
    current_node.next = new_node
```

### Printing the Linked list

```python
def printList(self):
    temp = self.head               # Start from the head of the list
    while temp:
        print(temp.data, end=' ')  # Print the data in the current node
        temp = temp.next           # Move to the next node
    print()                        # Ensure the output is followed by a new line
```

Let's complete the code and create a linked list.

Connect all the code:

```python
if __name__ == '__main__':
    llist = LinkedList()

    # Insert items at the beginning
    llist.insertAtBeginning(4)  # <4>
    llist.insertAtBeginning(3)  # <3> 4
    llist.insertAtBeginning(2)  # <2> 3 4
    llist.insertAtBeginning(1)  # <1> 2 3 4

    # Insert items at the end
    llist.insertAtEnd(10)  # 1 2 3 4 <10>
    llist.insertAtEnd(7)   # 1 2 3 4 10 <7>

    # Insert at a given position
    llist.insertAtPosition(9, 4)  # 1 2 3 <9> 4 10 7

    # Print the list
    llist.printList()
```

## Output:

1 2 3 9 4 10 7

### Deleting a node from the beginning of a linked list

Check whether the list is empty; otherwise, shift the head to the next node.

```python
def deleteFromBeginning(self):
    if self.head is None:
        return "The list is empty"  # If the list is empty, return this string
    self.head = self.head.next      # Otherwise, remove the head by making the next node the new head
```
### Deleting a node from the end of a linked list
|
||||
|
||||
```python
|
||||
def deleteFromEnd(self):
|
||||
if self.head is None:
|
||||
return "The list is empty"
|
||||
if self.head.next is None:
|
||||
self.head = None # If there's only one node, remove the head by making it None
|
||||
return
|
||||
temp = self.head
|
||||
while temp.next.next: # Otherwise, go to the second-last node
|
||||
temp = temp.next
|
||||
temp.next = None # Remove the last node by setting the next pointer of the second-last node to None
|
||||
```
|
||||
|
||||
|
||||
### Search in a linked list
|
||||
```python
|
||||
def search(self, value):
|
||||
current = self.head # Start with the head of the list
|
||||
position = 0 # Counter to keep track of the position
|
||||
while current: # Traverse the list
|
||||
if current.data == value: # Compare the list's data to the search value
|
||||
return f"Value '{value}' found at position {position}" # Print the value if a match is found
|
||||
current = current.next
|
||||
position += 1
|
||||
return f"Value '{value}' not found in the list"
|
||||
```
|
||||
|
||||
```python
|
||||
if __name__ == '__main__':
|
||||
llist = LinkedList()
|
||||
|
||||
# Insert words at the beginning
|
||||
llist.insertAtBeginning(4) # <4>
|
||||
llist.insertAtBeginning(3) # <3> 4
|
||||
llist.insertAtBeginning(2) # <2> 3 4
|
||||
llist.insertAtBeginning(1) # <1> 2 3 4
|
||||
|
||||
# Insert a word at the end
|
||||
llist.insertAtEnd(10) # 1 2 3 4 <10>
|
||||
llist.insertAtEnd(7) # 1 2 3 4 10 <7>
|
||||
|
||||
#Insert at a random position
|
||||
llist.insertAtPosition(9,4) # 1 2 3 <9> 4 10 7
|
||||
llist.insertAtPositon(56,4) # 1 2 3 <56> 9 4 10 7
|
||||
|
||||
#delete at the beginning
|
||||
llist.deleteFromBeginning() # 2 3 56 9 4 10 7
|
||||
|
||||
#delete at the end
|
||||
llist.deleteFromEnd() # 2 3 56 9 4 10
|
||||
# Print the list
|
||||
llist.printList()
|
||||
```
|
||||
## Output:
|
||||
|
||||
2 3 56 9 4 10
|
||||
|
||||
|
||||
|
||||
## Real Life uses of Linked List
|
||||
|
||||
|
||||
Here are a few practical applications of linked lists in various fields:
|
||||
|
||||
1. **Music Player**: In a music player, songs are often linked to the previous and next tracks. This allows for seamless navigation between songs, enabling you to play tracks either from the beginning or the end of the playlist. This is akin to a doubly linked list where each song node points to both the previous and the next song, enhancing the flexibility of song selection.
|
||||
|
||||
2. **GPS Navigation Systems**: Linked lists can be highly effective for managing lists of locations and routes in GPS navigation systems. Each location or waypoint can be represented as a node, making it easy to add or remove destinations and to navigate smoothly from one location to another. This is similar to how you might plan a road trip, plotting stops along the way in a flexible, dynamic manner.
|
||||
|
||||
3. **Task Scheduling**: Operating systems utilize linked lists to manage task scheduling. Each process waiting to be executed is represented as a node in a linked list. This organization allows the system to efficiently keep track of which processes need to be run, enabling fair and systematic scheduling of tasks. Think of it like a to-do list where each task is a node, and the system executes tasks in a structured order.
|
||||
|
||||
4. **Speech Recognition**: Speech recognition software uses linked lists to represent possible phonetic pronunciations of words. Each potential pronunciation is a node, allowing the software to dynamically explore different pronunciation paths as it processes spoken input. This method helps in accurately recognizing and understanding speech by considering multiple possibilities in a flexible manner, much like evaluating various potential meanings in a conversation.
|
||||
|
||||
These examples illustrate how linked lists provide a flexible, dynamic data structure that can be adapted to a wide range of practical applications, making them a valuable tool in both software development and real-world problem-solving.
# Optimizers in Machine Learning

Optimizers are algorithms or methods used to change the attributes of your neural network, such as weights and learning rate, in order to reduce the losses. Optimization algorithms minimize (or maximize) an objective function (also called a loss function), which is simply a mathematical function of the model's internal learnable parameters that are used in computing the target values from the set of features.

## Types of Optimizers

### 1. Gradient Descent

**Explanation:**
Gradient Descent is the simplest and most commonly used optimization algorithm. It works by iteratively updating the model parameters in the opposite direction of the gradient of the objective function with respect to the parameters. The idea is to find the minimum of a function by taking steps proportional to the negative of the gradient of the function at the current point.

**Mathematical Formulation:**

The update rule for the parameter vector θ in gradient descent is:

- $$\theta_{\text{new}} = \theta_{\text{old}} - \alpha \cdot \nabla J(\theta)$$

Where:
- $\theta_{\text{old}}$ is the old parameter vector.
- $\theta_{\text{new}}$ is the updated parameter vector.
- $\alpha$ is the learning rate.
- $\nabla J(\theta)$ is the gradient of the objective function with respect to the parameters.

**Intuition:**
- At each iteration, we calculate the gradient of the cost function.
- The parameters are updated in the opposite direction of the gradient.
- The size of the step is controlled by the learning rate $\alpha$.

**Advantages:**
- Simple to implement.
- Suitable for convex problems.

**Disadvantages:**
- Can be slow for large datasets.
- May get stuck in local minima for non-convex problems.
- Requires careful tuning of the learning rate.

**Python Implementation:**
```python
import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for epoch in range(epochs):
        gradient = np.dot(X.T, (np.dot(X, theta) - y)) / m
        theta -= lr * gradient
    return theta
```
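
As a usage sketch (the synthetic data here is invented for illustration, and the function is repeated so the snippet is self-contained), fitting the noiseless line y = 2x recovers a slope close to 2:

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for epoch in range(epochs):
        gradient = np.dot(X.T, (np.dot(X, theta) - y)) / m
        theta -= lr * gradient
    return theta

# Synthetic data for the line y = 2x (single feature, no intercept, no noise)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

theta = gradient_descent(X, y, lr=0.05, epochs=2000)
print(theta)  # Should be close to [2.]
```

Note how the learning rate trades off speed and stability: too small and convergence crawls; too large and the iterates diverge.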

### 2. Stochastic Gradient Descent (SGD)

**Explanation:**
SGD is a variation of gradient descent where we use only one training example to calculate the gradient and update the parameters. This introduces noise into the parameter updates, which can help to escape local minima but may cause the loss to fluctuate.

**Mathematical Formulation:**

- $$\theta = \theta - \alpha \cdot \frac{\partial J(\theta; x_i, y_i)}{\partial \theta}$$

Where:
- $x_i, y_i$ are a single training example and its target.

**Intuition:**
- At each iteration, a random training example is selected.
- The gradient is calculated and the parameters are updated for this single example.
- This process is repeated for a specified number of epochs.

**Advantages:**
- Faster updates compared to batch gradient descent.
- Can handle large datasets.
- Helps to escape local minima due to the noise in updates.

**Disadvantages:**
- The loss function may fluctuate.
- Requires more iterations to converge.

**Python Implementation:**
```python
def stochastic_gradient_descent(X, y, lr=0.01, epochs=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for epoch in range(epochs):
        for i in range(m):
            rand_index = np.random.randint(0, m)
            xi = X[rand_index:rand_index+1]
            yi = y[rand_index:rand_index+1]
            gradient = np.dot(xi.T, (np.dot(xi, theta) - yi))
            theta -= lr * gradient
    return theta
```

### 3. Mini-Batch Gradient Descent

**Explanation:**
Mini-Batch Gradient Descent is a variation where, instead of a single training example or the whole dataset, a mini-batch of examples is used to compute the gradient. This reduces the variance of the parameter updates, leading to more stable convergence.

**Mathematical Formulation:**

- $$\theta = \theta - \alpha \cdot \frac{1}{k} \sum_{i=1}^{k} \frac{\partial J(\theta; x_i, y_i)}{\partial \theta}$$

Where:
- $k$ is the batch size.

**Intuition:**
- At each iteration, a mini-batch of training examples is selected.
- The gradient is calculated for this mini-batch.
- The parameters are updated based on the average gradient of the mini-batch.

**Advantages:**
- More stable updates compared to SGD.
- Faster convergence than batch gradient descent.
- Efficient on large datasets.

**Disadvantages:**
- Requires tuning of the batch size.
- Computationally more expensive than SGD per iteration.

**Python Implementation:**
```python
def mini_batch_gradient_descent(X, y, lr=0.01, epochs=1000, batch_size=32):
    m, n = X.shape
    theta = np.zeros(n)
    for epoch in range(epochs):
        indices = np.random.permutation(m)
        X_shuffled = X[indices]
        y_shuffled = y[indices]
        for i in range(0, m, batch_size):
            X_i = X_shuffled[i:i+batch_size]
            y_i = y_shuffled[i:i+batch_size]
            gradient = np.dot(X_i.T, (np.dot(X_i, theta) - y_i)) / batch_size
            theta -= lr * gradient
    return theta
```

### 4. Momentum

**Explanation:**
Momentum helps accelerate gradient vectors in the right directions, leading to faster convergence. It accumulates a velocity vector in directions of persistent reduction in the objective function, which helps to smooth the path towards the minimum.

**Mathematical Formulation:**

- $$v_t = \gamma \cdot v_{t-1} + \alpha \cdot \nabla J(\theta)$$
- $$\theta = \theta - v_t$$

Where:
- $v_t$ is the velocity.
- $\gamma$ is the momentum term, typically set between 0.9 and 0.99.

**Intuition:**
- At each iteration, the gradient is calculated.
- The velocity is updated based on the current gradient and the previous velocity.
- The parameters are updated based on the velocity.

**Advantages:**
- Faster convergence.
- Reduces oscillations in the parameter updates.

**Disadvantages:**
- Requires tuning of the momentum term.

**Python Implementation:**
```python
def momentum_gradient_descent(X, y, lr=0.01, epochs=1000, gamma=0.9):
    m, n = X.shape
    theta = np.zeros(n)
    v = np.zeros(n)
    for epoch in range(epochs):
        gradient = np.dot(X.T, (np.dot(X, theta) - y)) / m
        v = gamma * v + lr * gradient
        theta -= v
    return theta
```
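
The same synthetic y = 2x fit can be run with momentum (a minimal sketch; the data is invented for illustration and the function is repeated so the snippet is self-contained). The velocity term carries information across iterations, so the slope is recovered with a smaller base learning rate than plain gradient descent would need:

```python
import numpy as np

def momentum_gradient_descent(X, y, lr=0.01, epochs=1000, gamma=0.9):
    m, n = X.shape
    theta = np.zeros(n)
    v = np.zeros(n)
    for epoch in range(epochs):
        gradient = np.dot(X.T, (np.dot(X, theta) - y)) / m
        v = gamma * v + lr * gradient  # Velocity accumulates past gradients
        theta -= v
    return theta

# Synthetic data for the line y = 2x
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

theta = momentum_gradient_descent(X, y, lr=0.01, epochs=2000, gamma=0.9)
print(theta)  # Should be close to [2.]
```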

### 5. Nesterov Accelerated Gradient (NAG)

**Explanation:**
NAG is a variant of gradient descent with momentum. It looks ahead by a step and calculates the gradient at that point, thus providing more accurate updates. This method helps to correct the overshooting problem seen in standard momentum.

**Mathematical Formulation:**

- $$v_t = \gamma v_{t-1} + \alpha \cdot \nabla J(\theta - \gamma \cdot v_{t-1})$$
- $$\theta = \theta - v_t$$

**Intuition:**
- At each iteration, the parameters are temporarily updated using the previous velocity.
- The gradient is calculated at this lookahead position.
- The velocity and parameters are then updated based on this gradient.

**Advantages:**
- More accurate updates compared to standard momentum.
- Faster convergence.

**Disadvantages:**
- Requires tuning of the momentum term.

**Python Implementation:**
```python
def nesterov_accelerated_gradient(X, y, lr=0.01, epochs=1000, gamma=0.9):
    m, n = X.shape
    theta = np.zeros(n)
    v = np.zeros(n)
    for epoch in range(epochs):
        lookahead_theta = theta - gamma * v
        gradient = np.dot(X.T, (np.dot(X, lookahead_theta) - y)) / m
        v = gamma * v + lr * gradient
        theta -= v
    return theta
```

### 6. AdaGrad

**Explanation:**
AdaGrad adapts the learning rate to the parameters, performing larger updates for infrequent parameters and smaller updates for frequent ones. It scales the learning rate inversely proportional to the square root of the sum of all historical squared values of the gradient.

**Mathematical Formulation:**

- $$G_t = G_{t-1} + \left(\frac{\partial J(\theta)}{\partial \theta}\right)^2$$
- $$\theta = \theta - \frac{\alpha}{\sqrt{G_t + \epsilon}} \cdot \nabla J(\theta)$$

Where:
- $G_t$ is the sum of the squares of the gradients up to time step $t$.
- $\epsilon$ is a small constant to avoid division by zero.

**Intuition:**
- Accumulates the sum of the squares of the gradients for each parameter.
- Uses this accumulated sum to scale the learning rate.
- Parameters with large gradients in the past have smaller learning rates.

**Advantages:**
- Effective for sparse data.
- Automatically adjusts the learning rate.

**Disadvantages:**
- The learning rate decreases continuously, which can lead to premature convergence.

**Python Implementation:**
```python
def adagrad(X, y, lr=0.01, epochs=1000, epsilon=1e-8):
    m, n = X.shape
    theta = np.zeros(n)
    G = np.zeros(n)
    for epoch in range(epochs):
        gradient = np.dot(X.T, (np.dot(X, theta) - y)) / m
        G += gradient**2
        adjusted_lr = lr / (np.sqrt(G) + epsilon)
        theta -= adjusted_lr * gradient
    return theta
```
|
||||
|
||||
### 7. RMSprop

**Explanation:**
RMSprop modifies AdaGrad to perform well in non-convex settings by using a moving average of squared gradients to scale the learning rate. It helps to keep the learning rate in check, especially in the presence of noisy gradients.

**Mathematical Formulation:**

- $$E[g^2]_t = βE[g^2]_{t-1} + (1 - β)(∂J(θ)/∂θ)^2$$

- $$θ = θ - \frac{α}{\sqrt{E[g^2]_t + ε}} \cdot ∇J(θ)$$

Where:
- \( E[g^2]_t \) is the exponentially decaying average of past squared gradients.
- β is the decay rate.

**Intuition:**
- Keeps a running average of the squared gradients.
- Uses this average to scale the learning rate.
- Parameters with large gradients have their learning rates reduced.

**Advantages:**
- Effective for non-convex problems.
- Reduces oscillations in parameter updates.

**Disadvantages:**
- Requires tuning of the decay rate.

**Python Implementation:**

```python
def rmsprop(X, y, lr=0.01, epochs=1000, beta=0.9, epsilon=1e-8):
    m, n = X.shape
    theta = np.zeros(n)
    E_g = np.zeros(n)
    for epoch in range(epochs):
        gradient = np.dot(X.T, (np.dot(X, theta) - y)) / m
        E_g = beta * E_g + (1 - beta) * gradient**2
        adjusted_lr = lr / (np.sqrt(E_g) + epsilon)
        theta -= adjusted_lr * gradient
    return theta
```
### 8. Adam

**Explanation:**
Adam (Adaptive Moment Estimation) combines the advantages of both RMSprop and AdaGrad by keeping an exponentially decaying average of past gradients and past squared gradients.

**Mathematical Formulation:**

- $$m_t = β_1m_{t-1} + (1 - β_1)(∂J(θ)/∂θ)$$
- $$v_t = β_2v_{t-1} + (1 - β_2)(∂J(θ)/∂θ)^2$$
- $$\hat{m}_t = \frac{m_t}{1 - β_1^t}$$
- $$\hat{v}_t = \frac{v_t}{1 - β_2^t}$$
- $$θ = θ - \frac{α\hat{m}_t}{\sqrt{\hat{v}_t} + ε}$$

Where:
- \( m_t \) is the first moment (mean) of the gradient.
- \( v_t \) is the second moment (uncentered variance) of the gradient.
- \( β_1, β_2 \) are the decay rates for the moment estimates.

**Intuition:**
- Keeps track of both the mean and the variance of the gradients.
- Uses these to adaptively scale the learning rate.
- Provides a balance between AdaGrad and RMSprop.

**Advantages:**
- Efficient for large datasets.
- Well-suited for non-convex optimization.
- Handles sparse gradients well.

**Disadvantages:**
- Requires careful tuning of hyperparameters.
- Can be computationally intensive.

**Python Implementation:**

```python
def adam(X, y, lr=0.01, epochs=1000, beta1=0.9, beta2=0.999, epsilon=1e-8):
    m, n = X.shape
    theta = np.zeros(n)
    m_t = np.zeros(n)
    v_t = np.zeros(n)
    for epoch in range(1, epochs+1):
        gradient = np.dot(X.T, (np.dot(X, theta) - y)) / m
        m_t = beta1 * m_t + (1 - beta1) * gradient
        v_t = beta2 * v_t + (1 - beta2) * gradient**2
        m_t_hat = m_t / (1 - beta1**epoch)
        v_t_hat = v_t / (1 - beta2**epoch)
        theta -= lr * m_t_hat / (np.sqrt(v_t_hat) + epsilon)
    return theta
```

These implementations are basic examples of how these optimizers can be implemented in Python using NumPy. In practice, libraries like TensorFlow and PyTorch provide highly optimized and more sophisticated implementations of these and other optimization algorithms.
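Since all of the optimizers above share the same `(X, y)` interface, they can be sanity-checked on a small synthetic least-squares problem. Below is a minimal sketch using the Adam variant (the function is repeated here so the snippet runs on its own; the data and hyperparameters are illustrative assumptions, not part of the tutorial's datasets):

```python
import numpy as np

def adam(X, y, lr=0.01, epochs=1000, beta1=0.9, beta2=0.999, epsilon=1e-8):
    # Repeated from the implementation above so this snippet is self-contained.
    m, n = X.shape
    theta = np.zeros(n)
    m_t = np.zeros(n)
    v_t = np.zeros(n)
    for epoch in range(1, epochs + 1):
        gradient = np.dot(X.T, (np.dot(X, theta) - y)) / m
        m_t = beta1 * m_t + (1 - beta1) * gradient
        v_t = beta2 * v_t + (1 - beta2) * gradient**2
        m_t_hat = m_t / (1 - beta1**epoch)
        v_t_hat = v_t / (1 - beta2**epoch)
        theta -= lr * m_t_hat / (np.sqrt(v_t_hat) + epsilon)
    return theta

# Synthetic linear data with a known parameter vector (illustrative values)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_theta = np.array([2.0, -3.0])
y = X @ true_theta

theta_hat = adam(X, y, lr=0.05, epochs=2000)

# The fitted parameters should reduce the mean squared error well below
# the loss at the zero initialization.
initial_loss = np.mean((X @ np.zeros(2) - y) ** 2)
final_loss = np.mean((X @ theta_hat - y) ** 2)
print(final_loss < initial_loss)
```

The same harness works for `sgd`-style variants, `adagrad`, and `rmsprop`, which makes it easy to compare how quickly each one drives the loss down.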
---
@@ -8,3 +8,4 @@
- [Artificial Neural Network from the Ground Up](ArtificialNeuralNetwork.md)
- [TensorFlow.md](tensorFlow.md)
- [PyTorch.md](pytorch.md)
- [Types of optimizers](Types_of_optimizers.md)
@@ -0,0 +1,11 @@
Make,Colour,Odometer,Doors,Price
Toyota,White,150043,4,"$4,000"
Honda,Red,87899,4,"$5,000"
Toyota,Blue,,3,"$7,000"
BMW,Black,11179,5,"$22,000"
Nissan,White,213095,4,"$3,500"
Toyota,Green,,4,"$4,500"
Honda,,,4,"$7,500"
Honda,Blue,,4,
Toyota,White,60000,,
,White,31600,4,"$9,700"
@@ -0,0 +1,11 @@
Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"
Honda,Red,87899,4,"$5,000.00"
Toyota,Blue,32549,3,"$7,000.00"
BMW,Black,11179,5,"$22,000.00"
Nissan,White,213095,4,"$3,500.00"
Toyota,Green,99213,4,"$4,500.00"
Honda,Blue,45698,4,"$7,500.00"
Honda,Blue,54738,4,"$7,000.00"
Toyota,White,60000,4,"$6,250.00"
Nissan,White,31600,4,"$9,700.00"
@@ -0,0 +1 @@
## This folder contains all the Datasets used in the content.
@@ -0,0 +1,264 @@
# Handling Missing Values in Pandas

In real life, many datasets arrive with missing data, either because it exists and was not collected or because it never existed.

In Pandas missing data is represented by two values:

* `None` : a Python `keyword` that refers to empty or missing data.
* `NaN` : Acronym for `Not a Number`.

There are several useful functions for detecting, removing, and replacing null values in a Pandas DataFrame:

1. `isnull()`
2. `notnull()`
3. `dropna()`
4. `fillna()`
5. `replace()`

## 1. Checking for missing values using `isnull()` and `notnull()`

Let's import pandas and our fancy car-sales dataset having some missing values.

```python
import pandas as pd

car_sales_missing_df = pd.read_csv("Datasets/car-sales-missing-data.csv")
print(car_sales_missing_df)
```
         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue       NaN    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green       NaN    4.0   $4,500
    6   Honda    NaN       NaN    4.0   $7,500
    7   Honda   Blue       NaN    4.0      NaN
    8  Toyota  White   60000.0    NaN      NaN
    9     NaN  White   31600.0    4.0   $9,700

```python
## Using isnull()

print(car_sales_missing_df.isnull())
```

        Make  Colour  Odometer  Doors  Price
    0  False   False     False  False  False
    1  False   False     False  False  False
    2  False   False      True  False  False
    3  False   False     False  False  False
    4  False   False     False  False  False
    5  False   False      True  False  False
    6  False    True      True  False  False
    7  False   False      True  False   True
    8  False   False     False   True   True
    9   True   False     False  False  False

Note here:
* `True` indicates a `NaN` value
* `False` indicates no `NaN` value

If we want to find the number of missing values in each column, use `isnull().sum()`.

```python
print(car_sales_missing_df.isnull().sum())
```

    Make        1
    Colour      1
    Odometer    4
    Doors       1
    Price       2
    dtype: int64

You can also check the presence of null values in a single column.

```python
print(car_sales_missing_df["Odometer"].isnull())
```

    0    False
    1    False
    2     True
    3    False
    4    False
    5     True
    6     True
    7     True
    8    False
    9    False
    Name: Odometer, dtype: bool

```python
## using notnull()

print(car_sales_missing_df.notnull())
```

        Make  Colour  Odometer  Doors  Price
    0   True    True      True   True   True
    1   True    True      True   True   True
    2   True    True     False   True   True
    3   True    True      True   True   True
    4   True    True      True   True   True
    5   True    True     False   True   True
    6   True   False     False   True   True
    7   True    True     False   True  False
    8   True    True      True  False  False
    9  False    True      True   True   True

Note here:
* `True` indicates no `NaN` value
* `False` indicates a `NaN` value

`isnull()` checks for null values, so it gives `True` for `NaN` values. `notnull()` checks for the absence of null values, so it gives `True` where there is no `NaN` value.
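These boolean results are not just for inspection: they can be used as a mask to filter rows. A small self-contained sketch (the inline frame below stands in for the car-sales data, with hypothetical values):

```python
import pandas as pd
import numpy as np

# A tiny frame standing in for the car-sales dataset
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", np.nan],
    "Odometer": [150043.0, np.nan, 31600.0],
})

# notnull() returns a boolean Series; indexing with it keeps only
# the rows where Odometer is present.
complete_odometer = df[df["Odometer"].notnull()]
print(complete_odometer)
```

The same pattern with `isnull()` selects the rows that *are* missing a value, which is handy for inspecting them before deciding how to fill or drop.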
## 2. Filling missing values using `fillna()` and `replace()`

```python
## Filling missing values with a single value using `fillna()`
print(car_sales_missing_df.fillna(0))
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue       0.0    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green       0.0    4.0   $4,500
    6   Honda      0       0.0    4.0   $7,500
    7   Honda   Blue       0.0    4.0        0
    8  Toyota  White   60000.0    0.0        0
    9       0  White   31600.0    4.0   $9,700

```python
## Filling missing values with the previous value using `ffill()`
print(car_sales_missing_df.ffill())
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue   87899.0    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green  213095.0    4.0   $4,500
    6   Honda  Green  213095.0    4.0   $7,500
    7   Honda   Blue  213095.0    4.0   $7,500
    8  Toyota  White   60000.0    4.0   $7,500
    9  Toyota  White   31600.0    4.0   $9,700

```python
## Filling null values with the next ones using `bfill()`
print(car_sales_missing_df.bfill())
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue   11179.0    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green   60000.0    4.0   $4,500
    6   Honda   Blue   60000.0    4.0   $7,500
    7   Honda   Blue   60000.0    4.0   $9,700
    8  Toyota  White   60000.0    4.0   $9,700
    9     NaN  White   31600.0    4.0   $9,700

#### Filling null values using the `replace()` method

Now we are going to replace all the `NaN` values in the data frame with the value -125.

For this we will also need NumPy.

```python
import numpy as np

print(car_sales_missing_df.replace(to_replace = np.nan, value = -125))
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue    -125.0    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green    -125.0    4.0   $4,500
    6   Honda   -125    -125.0    4.0   $7,500
    7   Honda   Blue    -125.0    4.0     -125
    8  Toyota  White   60000.0 -125.0     -125
    9    -125  White   31600.0    4.0   $9,700
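Besides constants and neighboring values, another common use of `fillna()` is filling a numeric column with its mean. A small self-contained sketch (the inline odometer readings are illustrative, not from the dataset):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"Odometer": [150043.0, np.nan, 31600.0]})

# mean() skips NaN values by default, so it can be computed
# directly on the column and then used as the fill value.
mean_value = df["Odometer"].mean()
filled = df["Odometer"].fillna(mean_value)
print(filled.tolist())  # [150043.0, 90821.5, 31600.0]
```

Mean-filling keeps the column average unchanged, which is why it is a popular default for numeric features.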
## 3. Dropping missing values using `dropna()`

In order to drop null values from a dataframe, we use the `dropna()` function. This function drops rows/columns of the dataset with null values in different ways.

#### Dropping rows with at least 1 null value.

```python
print(car_sales_missing_df.dropna(axis = 0)) ## Drop rows with at least one NaN (null) value
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500

#### Dropping rows if all values in that row are missing.

```python
print(car_sales_missing_df.dropna(how = 'all', axis = 0)) ## Rows with any non-null value are left as they are
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue       NaN    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green       NaN    4.0   $4,500
    6   Honda    NaN       NaN    4.0   $7,500
    7   Honda   Blue       NaN    4.0      NaN
    8  Toyota  White   60000.0    NaN      NaN
    9     NaN  White   31600.0    4.0   $9,700

#### Dropping columns with at least 1 null value

```python
print(car_sales_missing_df.dropna(axis = 1))
```

    Empty DataFrame
    Columns: []
    Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Now we drop columns which have at least 1 missing value.

Here the dataset becomes empty after `dropna(axis = 1)` because each column has at least one null value, so every column is removed, resulting in an empty dataframe.
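`dropna()` also accepts a `thresh` parameter, which keeps only the rows (or columns) with at least that many non-null values, a middle ground between `how='any'` and `how='all'`. A small self-contained sketch (inline values are illustrative):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "Make":     ["Toyota", np.nan, np.nan],
    "Odometer": [150043.0, np.nan, 31600.0],
    "Doors":    [4.0, np.nan, np.nan],
})

# thresh=2 keeps only rows that have at least 2 non-null values:
# row 0 has 3, row 1 has 0, row 2 has 1 -> only row 0 survives.
kept = df.dropna(thresh=2)
print(kept)
```

This is useful when you want to discard near-empty rows while tolerating the occasional missing field.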
@@ -0,0 +1,46 @@
# Importing and Exporting Data in Pandas

## Importing Data from a CSV

We can create `Series` and `DataFrame` objects in pandas, but often we have to import data which comes in the form of a `.csv` (Comma Separated Values) file, a spreadsheet file, or a similar tabular data file format.

`pandas` allows for easy importing of this data using functions such as `read_csv()` and `read_excel()` for Microsoft Excel files.

*Note: In case you want to get the information from a **Google Sheet** you can export it as a .csv file.*

The `read_csv()` function can be used to import a CSV file into a pandas DataFrame. The path can be a file system path or a URL where the CSV is available.

```python
import pandas as pd

car_sales_df = pd.read_csv("Datasets/car-sales.csv")
print(car_sales_df)
```
```
     Make Colour  Odometer (KM)  Doors       Price
0  Toyota  White         150043      4   $4,000.00
1   Honda    Red          87899      4   $5,000.00
2  Toyota   Blue          32549      3   $7,000.00
3     BMW  Black          11179      5  $22,000.00
4  Nissan  White         213095      4   $3,500.00
5  Toyota  Green          99213      4   $4,500.00
6   Honda   Blue          45698      4   $7,500.00
7   Honda   Blue          54738      4   $7,000.00
8  Toyota  White          60000      4   $6,250.00
9  Nissan  White          31600      4   $9,700.00
```
You can find the dataset used above in the `Datasets` folder.

*Note: If you want to import data from GitHub you can't directly use the page link; you first have to obtain the raw file URL by clicking on the **Raw** button present in the repo.*

## Exporting Data to a CSV

`pandas` allows you to export a `DataFrame` to the `.csv` format using `.to_csv()`, or to an Excel spreadsheet using `.to_excel()`.

```python
car_sales_df.to_csv("exported_car_sales.csv")
```

Running this will save a file called `exported_car_sales.csv` to the current folder.
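One detail worth knowing: by default `to_csv()` writes the row index as an extra unnamed column, and when called without a path it returns the CSV text instead of writing a file. A small sketch (the inline frame is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Make": ["Toyota", "Honda"], "Doors": [4, 4]})

# Without a path argument, to_csv() returns the CSV content as a string.
# index=False omits the row-index column that would otherwise be written.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing `index=False` is usually what you want when the index carries no meaning, otherwise re-importing the file produces a spurious `Unnamed: 0` column.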
@@ -5,3 +5,5 @@
- [Pandas Descriptive Statistics](Descriptive_Statistics.md)
- [Group By Functions with Pandas](GroupBy_Functions_Pandas.md)
- [Excel using Pandas DataFrame](excel_with_pandas.md)
- [Importing and Exporting Data in Pandas](import-export.md)
- [Handling Missing Values in Pandas](handling-missing-values.md)