Merge branch 'animator:main' into main

pull/614/head
Vinay Sagar 2024-05-26 13:13:17 +05:30 verified by GitHub
commit 824b388c58
No known key found in the database for this signature
GPG key ID: B5690EEEBB952194
49 changed files with 6643 additions and 6 deletions

@@ -0,0 +1,379 @@
In Python, object-oriented programming (OOP) is a programming paradigm
that uses objects and classes. It aims to model real-world concepts such
as inheritance, polymorphism, and encapsulation in code. The central idea
of OOP in Python is to bind data and the functions that work on it into a
single unit, so that no other part of the code can access that data
directly.
**OOPs Concepts in Python**
1. Class in Python
2. Objects in Python
3. Polymorphism in Python
4. Encapsulation in Python
5. Inheritance in Python
6. Data Abstraction in Python
**Python Class** A class is a collection of objects. It contains the
blueprint or prototype from which objects are created, and it is a
logical entity that holds attributes and methods.
```python
# Simple class in Python
class Dog:
    pass
```
**Python Objects** In Python object-oriented programming, an object is
an entity that has a state and behavior associated with it. It may be
any real-world object like a mouse, keyboard, chair, table, or pen.
Integers, strings, floating-point numbers, even arrays and dictionaries,
are all objects.
```python
obj = Dog()
```
This creates an instance of the class Dog.
**The Python `__init__` Method**
The `__init__` method is similar to constructors in C++ and Java. It
runs as soon as an object of a class is instantiated, and it is useful
for any initialization you want to do with your object.
```python
class Dog:
    # Class attribute
    attr1 = "mammal"

    # Instance attribute
    def __init__(self, name):
        self.name = name

# Object instantiation
Rodger = Dog("Rodger")
Tommy = Dog("Tommy")

# Accessing class attributes
print("Rodger is a {}".format(Rodger.__class__.attr1))
print("Tommy is also a {}".format(Tommy.__class__.attr1))

# Accessing instance attributes
print("My name is {}".format(Rodger.name))
print("My name is {}".format(Tommy.name))
```
In the above code, the `__init__` method is used to initialize the `name` attribute.
**Inheritance**
In Python object oriented Programming, Inheritance is the capability of
one class to derive or inherit the properties from another class. The
class that derives properties is called the derived class or child class
and the class from which the properties are being derived is called the
base class or parent class.
Types of inheritance:
- Single Inheritance
- Multilevel Inheritance
- Multiple Inheritance
- Hierarchical Inheritance
```python
# Single Inheritance

# Parent class
class Animal:
    def __init__(self, name, sound):
        self.name = name
        self.sound = sound

    def make_sound(self):
        print(f"{self.name} says {self.sound}")

# Child class inheriting from Animal
class Dog(Animal):
    def __init__(self, name):
        # Call the constructor of the parent class
        super().__init__(name, "Woof")

# Child class inheriting from Animal
class Cat(Animal):
    def __init__(self, name):
        # Call the constructor of the parent class
        super().__init__(name, "Meow")

# Creating objects of the derived classes
dog = Dog("Buddy")
cat = Cat("Whiskers")

# Accessing methods of the parent class
dog.make_sound()
cat.make_sound()
```
The above code depicts single inheritance: there is a single base class and one or more derived classes. Here, Dog and Cat are the derived classes and Animal is the parent class. They can access the methods of the base class or define their own.
```python
# Multilevel Inheritance

# Parent class
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        print(f"{self.name} speaks")

# Child class inheriting from Animal
class Dog(Animal):
    def bark(self):
        print(f"{self.name} barks")

# Grandchild class inheriting from Dog
class GermanShepherd(Dog):
    def guard(self):
        print(f"{self.name} guards")

# Creating an object of the most derived class
german_shepherd = GermanShepherd("Rocky")

# Accessing methods from all levels of inheritance
german_shepherd.speak()  # Method from the Animal class
german_shepherd.bark()   # Method from the Dog class
german_shepherd.guard()  # Method from the GermanShepherd class
```
Multilevel inheritance is a concept in object-oriented programming where a class inherits properties and behaviors from another class, which itself may inherit from yet another class. In other words, it involves a chain of inheritance in which a subclass inherits from a superclass and then becomes the superclass of another subclass, much like a grandfather, father, and son. In the above code, Animal is the superclass, Dog is derived from Animal, and GermanShepherd is the child class of Dog. GermanShepherd can access the methods of both Animal and Dog.
```python
# Hierarchical Inheritance

# Parent class
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        print(f"{self.name} speaks")

# Child class 1 inheriting from Animal
class Dog(Animal):
    def bark(self):
        print(f"{self.name} barks")

# Child class 2 inheriting from Animal
class Cat(Animal):
    def meow(self):
        print(f"{self.name} meows")

# Creating objects of the derived classes
dog = Dog("Buddy")
cat = Cat("Whiskers")

# Accessing methods from the parent and child classes
dog.speak()  # Method from the Animal class
dog.bark()   # Method from the Dog class
cat.speak()  # Method from the Animal class
cat.meow()   # Method from the Cat class
```
Hierarchical inheritance is a type of inheritance in object-oriented programming where one class serves as a superclass for multiple subclasses. In this inheritance model, each subclass inherits properties and behaviors from the same superclass, creating a hierarchical tree-like structure.
```python
# Multiple Inheritance

# Parent class 1
class Herbivore:
    def eat_plants(self):
        print("Eating plants")

# Parent class 2
class Carnivore:
    def eat_meat(self):
        print("Eating meat")

# Child class inheriting from both Herbivore and Carnivore
class Omnivore(Herbivore, Carnivore):
    def eat(self):
        print("Eating everything")

# Creating an object of the Omnivore class
omnivore = Omnivore()

# Accessing methods from both parent classes
omnivore.eat_plants()  # Method from Herbivore
omnivore.eat_meat()    # Method from Carnivore
omnivore.eat()         # Method from Omnivore
```
Multiple inheritance is a concept in object-oriented programming where a class can inherit properties and behaviors from more than one parent class. This means that a subclass can have multiple immediate parent classes, allowing it to inherit features from each of them.
**Polymorphism** In Python object-oriented programming, polymorphism
simply means having many forms.
```python
class Bird:
    def intro(self):
        print("There are many types of birds.")

    def flight(self):
        print("Most of the birds can fly but some cannot.")

class sparrow(Bird):
    def flight(self):
        print("Sparrows can fly.")

class ostrich(Bird):
    def flight(self):
        print("Ostriches cannot fly.")

obj_bird = Bird()
obj_spr = sparrow()
obj_ost = ostrich()

obj_bird.intro()
obj_bird.flight()
obj_spr.intro()
obj_spr.flight()
obj_ost.intro()
obj_ost.flight()
```
Poly stands for 'many' and morphism for 'forms'. In the above code, the method `flight()` takes many forms.
**Python Encapsulation**
Encapsulation is one of the fundamental concepts in object-oriented
programming (OOP). It describes the idea of wrapping data and the
methods that work on that data within one unit. This places restrictions
on accessing variables and methods directly and can prevent the
accidental modification of data. To prevent accidental changes, an
object's variable can only be changed through the object's methods. Such
variables are known as private variables.
```python
class Car:
    def __init__(self, make, model, year):
        self._make = make    # Encapsulated attribute (single underscore convention)
        self._model = model
        self._year = year
        self._odometer_reading = 0

    def get_make(self):
        return self._make

    def get_model(self):
        return self._model

    def get_year(self):
        return self._year

    def get_odometer_reading(self):
        return self._odometer_reading

    def update_odometer(self, mileage):
        if mileage >= self._odometer_reading:
            self._odometer_reading = mileage
        else:
            print("You can't roll back an odometer!")

    def increment_odometer(self, miles):
        self._odometer_reading += miles

# Creating an instance of the Car class
my_car = Car("Toyota", "Camry", 2021)

# Accessing encapsulated attributes through methods
print("Make:", my_car.get_make())
print("Model:", my_car.get_model())
print("Year:", my_car.get_year())

# Modifying an encapsulated attribute through a method
my_car.update_odometer(100)
print("Odometer Reading:", my_car.get_odometer_reading())

# Incrementing the odometer reading
my_car.increment_odometer(50)
print("Odometer Reading after increment:", my_car.get_odometer_reading())
```
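Note that the single underscore above is only a convention. Python also supports a double leading underscore, which triggers name mangling and makes accidental outside access harder; a minimal sketch (the `Account` class here is a hypothetical example, not from the text above):
```python
class Account:
    def __init__(self, balance):
        self.__balance = balance  # stored as _Account__balance via name mangling

    def get_balance(self):
        return self.__balance

acc = Account(100)
print(acc.get_balance())        # Output: 100
# print(acc.__balance)          # would raise AttributeError
print(acc._Account__balance)    # the mangled name is still reachable: 100
```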
**Data Abstraction** Abstraction hides unnecessary implementation
details from the user. It is also useful when we do not want to expose
sensitive parts of our code implementation, and this is where data
abstraction comes in.
```python
from abc import ABC, abstractmethod

# Abstract class defining the interface for a Shape
class Shape(ABC):
    def __init__(self, name):
        self.name = name

    @abstractmethod
    def area(self):
        pass

    @abstractmethod
    def perimeter(self):
        pass

# Concrete class implementing the Shape interface for a Rectangle
class Rectangle(Shape):
    def __init__(self, name, length, width):
        super().__init__(name)
        self.length = length
        self.width = width

    def area(self):
        return self.length * self.width

    def perimeter(self):
        return 2 * (self.length + self.width)

# Concrete class implementing the Shape interface for a Circle
class Circle(Shape):
    def __init__(self, name, radius):
        super().__init__(name)
        self.radius = radius

    def area(self):
        return 3.14 * self.radius * self.radius

    def perimeter(self):
        return 2 * 3.14 * self.radius

# Creating objects of the derived classes
rectangle = Rectangle("Rectangle", 5, 4)
circle = Circle("Circle", 3)

# Accessing methods defined by the Shape interface
print(f"{rectangle.name}: Area = {rectangle.area()}, Perimeter = {rectangle.perimeter()}")
print(f"{circle.name}: Area = {circle.area()}, Perimeter = {circle.perimeter()}")
```
To implement data abstraction, we have to import `abc`. ABC stands for Abstract Base Class, and every class that wants to implement data abstraction has to inherit from ABC.
`@abstractmethod` is a decorator provided by the `abc` module. It is used to define abstract methods within abstract base classes (ABCs). An abstract method is declared in a class but contains no implementation; it serves as a placeholder whose concrete implementation must be provided by subclasses.
Abstract methods can be implemented by the derived classes.

@@ -0,0 +1,117 @@
## Working with Dates and Times in Python
Handling dates and times is an essential aspect of many programming tasks.
Python provides robust modules to work with dates and times, making it easier to perform operations like formatting, parsing, and arithmetic.
This guide provides an overview of these modules and their key functionalities.
## 1. The `datetime` Module
The datetime module supplies classes for manipulating dates and times. The main classes in the datetime module are:
* date: Represents a date (year, month, day).
* time: Represents a time (hour, minute, second, microsecond).
* datetime: Combines date and time information.
* timedelta: Represents the difference between two dates or times.
* tzinfo: Provides time zone information objects.
**Key Concepts:**
* Naive vs. Aware: Naive datetime objects do not contain time zone information, while aware datetime objects do.
* Immutability: date and time objects are immutable; once created, they cannot be changed.
Example:
```python
import datetime
# Get the current date and time
now = datetime.datetime.now()
print("Current date and time:", now)
```
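To make the naive/aware distinction above concrete, here is a minimal sketch using the standard library's `datetime.timezone.utc`:
```python
import datetime

# Naive: carries no timezone information
naive = datetime.datetime.now()
print(naive.tzinfo)  # Output: None

# Aware: carries an explicit UTC offset
aware = datetime.datetime.now(datetime.timezone.utc)
print(aware.tzinfo)  # Output: UTC
```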
## 2. Formatting Dates and Times
Formatting involves converting datetime objects into human-readable strings. This is achieved using the strftime method, which stands for "string format time."
You can specify various format codes to dictate how the output string should be structured.
**Common Format Codes:**
* %Y: Year with century (e.g., 2024)
* %m: Month as a zero-padded decimal number (e.g., 01)
* %d: Day of the month as a zero-padded decimal number (e.g., 15)
* %H: Hour (24-hour clock) as a zero-padded decimal number (e.g., 13)
* %M: Minute as a zero-padded decimal number (e.g., 45)
* %S: Second as a zero-padded decimal number (e.g., 30)
Example:
```python
import datetime
now = datetime.datetime.now()
formatted_now = now.strftime("%Y-%m-%d %H:%M:%S")
print("Formatted current date and time:", formatted_now)
```
## 3. Parsing Dates and Times
Parsing is the process of converting strings representing dates and times into datetime objects. The strptime method, which stands for "string parse time,"
allows you to specify the format of the input string.
Example:
```python
import datetime
date_string = "2024-05-15 13:45:30"
date_object = datetime.datetime.strptime(date_string, "%Y-%m-%d %H:%M:%S")
print("Parsed date and time:", date_object)
```
## 4. Working with Time Differences
The timedelta class is used to represent the difference between two datetime objects. This is useful for calculations involving durations, such as finding the
number of days between two dates or adding a certain period to a date.
Example:
```python
import datetime
date1 = datetime.datetime(2024, 5, 15, 12, 0, 0)
date2 = datetime.datetime(2024, 5, 20, 14, 30, 0)
difference = date2 - date1
print("Difference:", difference)
print("Days:", difference.days)
print("Total seconds:", difference.total_seconds())
```
## 5. Time Zones
Time zone handling in Python is facilitated by the pytz library. It allows you to convert naive datetime objects into timezone-aware objects and perform
operations across different time zones.
**Key Concepts:**
* Timezone-aware: A datetime object that includes timezone information.
* Localization: The process of associating a naive datetime with a time zone.
Example:
```python
import datetime
import pytz
# Define a timezone
tz = pytz.timezone('Asia/Kolkata')
# Get the current time in a specific timezone
now = datetime.datetime.now(tz)
print("Current time in Asia/Kolkata:", now)
```
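On Python 3.9 and later, the standard library's `zoneinfo` module can serve the same purpose without a third-party dependency; an equivalent sketch (on some platforms the `tzdata` package may also need to be installed):
```python
import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

tz = ZoneInfo("Asia/Kolkata")
now = datetime.datetime.now(tz)
print("Current time in Asia/Kolkata:", now)
```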
## 6. Date Arithmetic
Date arithmetic involves performing operations like addition or subtraction on date or datetime objects using timedelta. This is useful for calculating future
or past dates based on a given date.
Example:
```python
import datetime
today = datetime.date.today()
future_date = today + datetime.timedelta(days=10)
print("Date after 10 days:", future_date)
```
## Summary
Python's `datetime` module and the `pytz` library provide comprehensive tools for working with dates, times, and time zones. They enable you to perform a wide range
of operations, from basic date manipulations to complex time zone conversions.

@@ -1,3 +1,8 @@
# List of sections
- [OOPs](OOPs.md)
- [Decorators/\*args/**kwargs](decorator-kwargs-args.md)
- [Lambda Function](lambda-function.md)
- [Working with Dates & Times in Python](dates_and_times.md)
- [Regular Expressions in Python](regular_expressions.md)
- [JSON module](json-module.md)

@@ -0,0 +1,289 @@
# JSON Module
## What is JSON?
- [JSON](https://www.json.org/json-en.html) (JavaScript Object Notation) is a format for structuring data.
- JSON is a lightweight, text-based data interchange format that is completely language-independent.
- Similar to XML, JSON is a format for structuring data commonly used by web applications to communicate with each other.
## Why JSON?
- Whenever we declare a variable and assign a value to it, the variable itself doesn't hold the value. Instead, the variable holds an address in memory where the value is stored. For example:
```python
age = 21
```
- When we use `age`, it gets replaced with `21`. However, *age doesn't contain 21; it contains the address of the memory location where 21 is stored*.
- While this works locally, transferring this data, such as through an API, poses a challenge. Sending your computer's entire memory along with the addresses is impractical and insecure. This is where JSON comes to the rescue.
### Example JSON
- JSON supports the most widely used data types, including String, Number, Boolean, Null, Array, and Object.
- Here is an example of a JSON file:
```json
{
    "name": "John Doe",
    "age": 21,
    "isStudent": true,
    "address": null,
    "courses": ["Math", "Science", "History"],
    "grades": {
        "Math": 95,
        "Science": 89,
        "History": 76
    }
}
```
# Python JSON
Python supports JSON natively with a built-in package called `json`. This package provides all the necessary tools for working with JSON objects, including parsing, serializing, and deserializing.
## 1. Parse a JSON String
- To parse a JSON string in Python, we first import the `json` module.
- A JSON string is converted to a Python object using the `json.loads()` method of the `json` module.
- Example Code:
```python
# Python program to convert JSON to Python
import json
# JSON string
students ='{"id":"01", "name": "Yatharth", "department":"Computer Science Engineering"}'
# Convert string to Python dict
students_dict = json.loads(students)
print(students_dict)
print(students_dict['name'])
```
- Output:
```
{'id': '01', 'name': 'Yatharth', 'department': 'Computer Science Engineering'}
Yatharth
```
## 2. Load a JSON File
- JSON data can also be read directly from a JSON file.
- Example:
```python
import json

# Opening the JSON file
f = open('input.json')

# Returns the JSON content as a dictionary
data = json.load(f)

# Iterating through the students list
for i in data['students']:
    print(i)

# Closing the file
f.close()
```
- JSON file: `input.json`
```json
{
"students":{
{
"id": "01",
"name": "Yatharth",
"department": "Computer Science Engineering"
},
{
"id": "02",
"name": "Raj",
"department": "Mechanical Engineering"
}
}
}
```
- Output:
```
{'id': '01', 'name': 'Yatharth', 'department': 'Computer Science Engineering'}
{'id': '02', 'name': 'Raj', 'department': 'Mechanical Engineering'}
```
- `json.load()`: Reads JSON data from a file object and deserializes it into a Python object.
- `json.loads()`: Deserializes JSON data from a string into a Python object.
## Additional Context
The relation between Python data types and JSON data types is given in the table below.
| Python Object | JSON Object |
|-----------------|-------------|
| Dict | object |
| list, tuple | array |
| str | string |
| int, float | number |
| True | true |
| False | false |
| None | null |
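A small sketch illustrating the table: serializing one value of each Python type shows its JSON counterpart (note that a tuple becomes a JSON array):
```python
import json

data = {"d": {"a": 1}, "l": [1, 2], "t": (3, 4), "s": "text",
        "i": 7, "f": 2.5, "yes": True, "no": False, "nothing": None}
print(json.dumps(data))
# Output: {"d": {"a": 1}, "l": [1, 2], "t": [3, 4], "s": "text", "i": 7, "f": 2.5, "yes": true, "no": false, "nothing": null}
```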
## 3. Python Dictionary to JSON String
- Convert a Python dictionary to a JSON string using `json.dumps()`.
- Example Code:
```python
import json
# Data to be written
dictionary = {
    "id": "03",
    "name": "Suraj",
    "department": "Civil Engineering"
}
# Serializing json
json_object = json.dumps(dictionary, indent = 4)
print(json_object)
```
- Output:
``` json
{
"department": "Civil Engineering",
"id": "02",
"name": "Suraj"
}
```
## 4. Python Dictionary to JSON File
- Write a Python dictionary to a JSON file using `json.dump()`.
- Example Code:
``` python
import json
# Data to be written
dictionary = {
    "name": "Satyendra",
    "rollno": 51,
    "cgpa": 8.8,
    "phonenumber": "123456789"
}

with open("sample.json", "w") as outfile:
    json.dump(dictionary, outfile)
```
- Output: `sample.json`
``` json
{"name": "Satyendra", "rollno": 51, "cgpa": 8.8, "phonenumber": "123456789"}
```
## 5. Append a Python Dictionary to a JSON String
- Append to an existing JSON string by parsing it into a dictionary, updating it with `dict.update()`, and serializing it back with `json.dumps()`.
- Example:
```python
import json

# JSON string
x = '{"id": "03", "name": "Suraj"}'

# Python dict to be appended
y = {"department": "Civil Engineering"}

# Parsing the JSON string into a dict
z = json.loads(x)

# Appending the data
z.update(y)

# The result is a JSON string:
print(json.dumps(z))
```
- Output:
```json
{"id": "03", "name": "Suraj", "department": "Civil Engineering"}
```
## 6. Append a Python Dictionary to a JSON File
- There is no direct function to append to a JSON file. Instead, we load the file into a dictionary, update that dictionary, and then write it back in JSON format.
- `data.json`
``` json
{
"students":{
{
"id": "01",
"name": "Yatharth",
"department": "Computer Science Engineering"
},
{
"id": "02",
"name": "Raj",
"department": "Mechanical Engineering"
}
}
}
```
- Example Code:
``` python
import json

# Function to append new data to the JSON file
def write_json(new_data, filename='data.json'):
    with open(filename, 'r+') as file:
        # First we load the existing data into a dict
        file_data = json.load(file)
        # Join new_data with file_data inside "students"
        file_data["students"].append(new_data)
        # Set the file's current position to the beginning
        file.seek(0)
        # Convert back to JSON and write
        json.dump(file_data, file, indent=4)

# Python object to be appended
y = {
    "id": "03",
    "name": "Suraj",
    "department": "Civil Engineering"
}

write_json(y)
```
- Output:
```json
{
"students":{
{
"id": "01",
"name": "Yatharth",
"department": "Computer Science Engineering"
},
{
"id": "02",
"name": "Raj",
"department": "Mechanical Engineering"
},
{
"id": "03",
"name": "Suraj",
"department": "Civil Engineering"
}
}
}
```
The Python json module simplifies the handling of JSON data, offering a bridge between Python data structures and JSON representations, vital for data exchange and storage in modern applications.

@@ -0,0 +1,88 @@
# Lambda Function
Lambda functions in Python are small, anonymous functions that can be created on-the-fly. They are defined using the `lambda` keyword instead of the `def` keyword used for regular functions. Lambda functions are typically used for simple tasks where a full-blown function definition is not necessary.
Here's an example of a lambda function that adds two numbers:
```python
add = lambda x, y: x + y
print(add(3, 5)) # Output: 8
```
The above lambda function is equivalent to the following regular function:
```python
def add(x, y):
return x + y
print(add(3, 5)) # Output: 8
```
The difference between a regular function and a lambda function lies mainly in syntax and usage. Here are some key distinctions:
1. **Syntax**: Lambda functions are defined using the `lambda` keyword, followed by parameters and a colon, while regular functions use the `def` keyword, followed by the function name, parameters, and a colon.
2. **Name**: Lambda functions are anonymous; they do not have a name like regular functions. Regular functions are defined with a name.
3. **Complexity**: Lambda functions are suitable for simple, one-liner tasks. They are not meant for complex operations or tasks that require multiple lines of code. Regular functions can handle more complex logic and can contain multiple statements and lines of code.
4. **Usage**: Lambda functions are often used in situations where a function is needed as an argument to another function (e.g., sorting, filtering, mapping), or when you want to write concise code without defining a separate function.
Lambda functions are used primarily for convenience and brevity in situations where a full function definition would be overkill or too cumbersome. They are handy for tasks that require a small, one-time function and can improve code readability when used judiciously.
## Use Cases
1. **Sorting**: Lambda functions are often used as key functions for sorting lists, dictionaries, or other data structures based on specific criteria. For example:
```python
students = [
    {"name": "Alice", "age": 20},
    {"name": "Bob", "age": 18},
    {"name": "Charlie", "age": 22}
]
sorted_students = sorted(students, key=lambda x: x["age"])
```
2. **Filtering**: Lambda functions can be used with filter() to selectively include elements from a collection based on a condition. For instance:
```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_numbers = list(filter(lambda x: x % 2 == 0, numbers))
```
3. **Mapping**: Lambda functions are useful with map() to apply a transformation to each element of a collection. For example:
```python
numbers = [1, 2, 3, 4, 5]
squared_numbers = list(map(lambda x: x**2, numbers))
```
4. **Event Handling**: In GUI programming or event-driven systems, lambda functions can be used as event handlers to execute specific actions when an event occurs. For instance:
```python
button.clicked.connect(lambda: self.on_button_click(argument))
```
5. **Callback Functions**: Lambda functions can be passed as callback functions to other functions, especially when a simple operation needs to be performed in response to an event. For example:
```python
def process_data(data, callback):
    # Process the data (placeholder for the real work)
    result = ...
    # Execute the callback function
    callback(result)

process_data(data, lambda x: print("Result:", x))
```
6. **Anonymous Functions in Higher-Order Functions**: Lambda functions are commonly used with higher-order functions such as reduce(), which applies a rolling computation to sequential pairs of values in a list. For example:
```python
from functools import reduce
numbers = [1, 2, 3, 4, 5]
sum_of_numbers = reduce(lambda x, y: x + y, numbers)
```
These are just a few examples of how lambda functions can be applied in Python to simplify code and make it more expressive. They are particularly useful in situations where a small, one-time function is needed and defining a separate named function would be excessive.
In conclusion, **lambda functions** in Python offer a concise and powerful way to handle simple tasks without the need for full function definitions. Their versatility, especially in scenarios like sorting, filtering, and event handling, makes them valuable tools for improving code readability and efficiency. By mastering lambda functions, you can enhance your Python programming skills and tackle various tasks with elegance and brevity.

@@ -0,0 +1,96 @@
## Regular Expressions in Python
Regular expressions (regex) are a powerful tool for pattern matching and text manipulation.
Python's re module provides comprehensive support for regular expressions, enabling efficient text processing and validation.
## 1. Introduction to Regular Expressions
A regular expression is a sequence of characters defining a search pattern. Common use cases include validating input, searching within text, and extracting
specific patterns.
## 2. Basic Syntax
* Literal Characters: Match exact characters (e.g., `abc` matches "abc").
* Metacharacters: Special characters like `.`, `*`, `?`, `+`, `^`, `$`, `[ ]`, and `|` used to build patterns.
**Common Metacharacters:**
* .: Any character except newline.
* ^: Start of the string.
* $: End of the string.
* *: 0 or more repetitions.
* +: 1 or more repetitions.
* ?: 0 or 1 repetition.
* []: Any one character inside brackets (e.g., [a-z]).
* |: Either the pattern before or after.
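A few quick illustrations of these metacharacters (a sketch using `re.findall`):
```python
import re

print(re.findall(r'^a.c', 'abc adc'))     # Output: ['abc']  ('^' anchors at the start, '.' matches any char)
print(re.findall(r'ab+', 'a ab abb'))     # Output: ['ab', 'abb']  ('+' means one or more)
print(re.findall(r'cat|dog', 'cat dog'))  # Output: ['cat', 'dog']  ('|' is alternation)
```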
## 3. Using the re Module
**Key functions in the re module:**
* re.match(): Checks for a match at the beginning of the string.
* re.search(): Searches for a match anywhere in the string.
* re.findall(): Returns a list of all matches.
* re.sub(): Replaces matches with a specified string.
Examples:
```python
import re
# Match at the beginning
print(re.match(r'\d+', '123abc').group()) # Output: 123
# Search anywhere
print(re.search(r'\d+', 'abc123').group()) # Output: 123
# Find all matches
print(re.findall(r'\d+', 'abc123def456')) # Output: ['123', '456']
# Substitute matches
print(re.sub(r'\d+', '#', 'abc123def456')) # Output: abc#def#
```
## 4. Compiling Regular Expressions
Compiling regular expressions improves performance for repeated use.
Example:
```python
import re
pattern = re.compile(r'\d+')
print(pattern.match('123abc').group()) # Output: 123
print(pattern.search('abc123').group()) # Output: 123
print(pattern.findall('abc123def456')) # Output: ['123', '456']
```
## 5. Groups and Capturing
Parentheses () group and capture parts of the match.
Example:
```python
import re
match = re.match(r'(\d{3})-(\d{2})-(\d{4})', '123-45-6789')
if match:
    print(match.group())   # Output: 123-45-6789
    print(match.group(1))  # Output: 123
    print(match.group(2))  # Output: 45
    print(match.group(3))  # Output: 6789
```
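Groups can also be given names with the `(?P<name>...)` syntax, which makes captures easier to read; a short sketch:
```python
import re

match = re.match(r'(?P<area>\d{3})-(?P<group>\d{2})-(?P<serial>\d{4})', '123-45-6789')
if match:
    print(match.group('area'))  # Output: 123
    print(match.groupdict())    # Output: {'area': '123', 'group': '45', 'serial': '6789'}
```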
## 6. Special Sequences
Special sequences are shortcuts for common patterns:
* \d: Any digit.
* \D: Any non-digit.
* \w: Any word character (letters, digits, and underscore).
* \W: Any non-word character.
* \s: Any whitespace character.
* \S: Any non-whitespace character.
Example:
```python
import re
print(re.search(r'\w+@\w+\.\w+', 'Contact: support@example.com').group()) # Output: support@example.com
```
## Summary
Regular expressions are a versatile tool for text processing in Python. The re module offers powerful functions and metacharacters for pattern matching,
searching, and manipulation, making it an essential skill for handling complex text processing tasks.

@@ -1,3 +1,3 @@
# List of sections
- [Section title](filename.md)
- [Introduction to MySQL and Queries](intro_mysql_queries.md)

@@ -0,0 +1,371 @@
# Introduction to MySQL Queries
MySQL is a widely-used open-source relational database management system (RDBMS) that utilizes SQL (Structured Query Language) for managing and querying data. In Python, the **mysql-connector-python** library allows you to connect to MySQL databases and execute SQL queries, providing a way to interact with the database from within a Python program.
## Prerequisites
* Python and MySQL Server must be installed and configured.
* The library: **mysql-connector-python** must be installed.
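The connector can typically be installed with pip:
```
pip install mysql-connector-python
```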
## Establishing connection with server
To establish a connection with the MySQL server, you need to import the **mysql.connector** module and create a connection object using the **connect()** function, providing the server details as shown below.
```python
import mysql.connector

con = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="12345"
)
print(con.is_connected())
```
Once the connection with the server is established, you get the following output:
```
True
```
## Creating a Database [CREATE]
To create a database, you need to execute the **CREATE DATABASE** query. The following code snippet demonstrates how to create a database named **GSSOC**.
```python
import mysql.connector

# Establish the connection
conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="12345"
)

# Create a cursor object
cursor = conn.cursor()

# Execute the query to show databases
cursor.execute("SHOW DATABASES")

# Fetch and print the databases
databases = cursor.fetchall()
for database in databases:
    print(database[0])

# Execute the query to create database GSSOC
cursor.execute("CREATE DATABASE GSSOC")
print("\nAfter creation of the database\n")

# Execute the query to show databases
cursor.execute("SHOW DATABASES")

# Fetch and print the databases
databases = cursor.fetchall()
for database in databases:
    print(database[0])

cursor.close()
conn.close()
```
As you can observe in the output below, after execution of the query a new database named **GSSOC** has been created.
#### Output:
```
information_schema
mysql
performance_schema
sakila
sys
world
After creation of the database
gssoc
information_schema
mysql
performance_schema
sakila
sys
world
```
## Creating a Table in the Database [CREATE]
Now, we will create a table named **example_table** in the database **GSSOC**. We will execute the **CREATE TABLE** query and provide the fields for the table as shown in the code below:
```python
import mysql.connector

# Establish the connection
conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="12345"
)

# Create a cursor object
cursor = conn.cursor()

# Execute the query to show tables
cursor.execute("USE GSSOC")
cursor.execute("SHOW TABLES")

# Fetch and print the tables
tables = cursor.fetchall()
print("Before creation of table\n")
for table in tables:
    print(table[0])

create_table_query = """
CREATE TABLE example_table (
    name VARCHAR(255) NOT NULL,
    age INT NOT NULL,
    email VARCHAR(255)
)
"""

# Execute the query
cursor.execute(create_table_query)

# Commit the changes
conn.commit()
print("\nAfter creation of Table\n")

# Execute the query to show tables in GSSOC
cursor.execute("SHOW TABLES")

# Fetch and print the tables
tables = cursor.fetchall()
for table in tables:
    print(table[0])

cursor.close()
conn.close()
```
#### Output:
```
Before creation of table
After creation of Table
example_table
```
## Inserting Data [INSERT]
To insert data into an existing table, the **INSERT INTO** query is used, followed by the name of the table. The following code demonstrates the insertion of multiple records using **executemany()**.
```python
import mysql.connector

# Establish the connection
conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="12345"
)

# Create a cursor object
cursor = conn.cursor()
cursor.execute("USE GSSOC")

# SQL query to insert data
insert_data_query = """
INSERT INTO example_table (name, age, email)
VALUES (%s, %s, %s)
"""

# Data to be inserted
data_to_insert = [
    ("John Doe", 28, "john.doe@example.com"),
    ("Jane Smith", 34, "jane.smith@example.com"),
    ("Sam Brown", 22, "sam.brown@example.com")
]

# Execute the query for each data entry
cursor.executemany(insert_data_query, data_to_insert)
conn.commit()

cursor.close()
conn.close()
```
## Displaying Data [SELECT]
To display the data from a table, the **SELECT** query is used. The following code demonstrates the display of data from the table.
```python
import mysql.connector

# Establish the connection
conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="12345"
)

# Create a cursor object
cursor = conn.cursor()
cursor.execute("USE GSSOC")

# SQL query to display data
display_data_query = "SELECT * FROM example_table"

# Execute the query
cursor.execute(display_data_query)

# Fetch all the rows
rows = cursor.fetchall()

# Print the column names
column_names = [desc[0] for desc in cursor.description]
print(column_names)

# Print the rows
for row in rows:
    print(row)

cursor.close()
conn.close()
```
#### Output :
```
['name', 'age', 'email']
('John Doe', 28, 'john.doe@example.com')
('Jane Smith', 34, 'jane.smith@example.com')
('Sam Brown', 22, 'sam.brown@example.com')
```
## Updating Data [UPDATE]
To update data in the table, the **UPDATE** query is used. In the following code, we update the email and age of the record where the name is John Doe.
```python
import mysql.connector

# Establish the connection
conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="12345"
)

# Create a cursor object
cursor = conn.cursor()
cursor.execute("USE GSSOC")

# SQL query to display data
display_data_query = "SELECT * FROM example_table"

# SQL query to update the data of John Doe
update_data_query = """
UPDATE example_table
SET age = %s, email = %s
WHERE name = %s
"""

# Data to be updated
data_to_update = (30, "new.email@example.com", "John Doe")

# Execute the query
cursor.execute(update_data_query, data_to_update)

# Commit the changes
conn.commit()

# Display the table after the update
cursor.execute(display_data_query)

# Fetch all the rows
rows = cursor.fetchall()

# Print the column names
column_names = [desc[0] for desc in cursor.description]
print(column_names)

# Print the rows
for row in rows:
    print(row)

cursor.close()
conn.close()
```
#### Output:
```
['name', 'age', 'email']
('John Doe', 30, 'new.email@example.com')
('Jane Smith', 34, 'jane.smith@example.com')
('Sam Brown', 22, 'sam.brown@example.com')
```
## Deleting Data [DELETE]
In this segment, we will delete the record named "John Doe" using the **DELETE** and **WHERE** statements in the query. The following code demonstrates this; observe the change in the output.
```python
import mysql.connector

# Establish the connection
conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="12345"
)

# Create a cursor object
cursor = conn.cursor()
cursor.execute("USE GSSOC")

# SQL query to display data
display_data_query = "SELECT * FROM example_table"

# SQL query to delete data
delete_data_query = "DELETE FROM example_table WHERE name = %s"

# Data to be deleted
data_to_delete = ("John Doe",)

# Execute the query
cursor.execute(delete_data_query, data_to_delete)

# Commit the changes
conn.commit()

# Display the table after the deletion
cursor.execute(display_data_query)

# Fetch all the rows
rows = cursor.fetchall()

# Print the column names
column_names = [desc[0] for desc in cursor.description]
print(column_names)

# Print the rows
for row in rows:
    print(row)

cursor.close()
conn.close()
```
#### Output:
```
['name', 'age', 'email']
('Jane Smith', 34, 'jane.smith@example.com')
('Sam Brown', 22, 'sam.brown@example.com')
```
## Deleting the Table/Database [DROP]
For deleting a table, you can use the **DROP** query in the following manner:
```python
import mysql.connector

# Establish the connection
conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="12345"
)

# Create a cursor object
cursor = conn.cursor()
cursor.execute("USE GSSOC")

# SQL query to delete the table
delete_table_query = "DROP TABLE IF EXISTS example_table"

# Execute the query
cursor.execute(delete_table_query)

# Verify the table deletion
cursor.execute("SHOW TABLES LIKE 'example_table'")
result = cursor.fetchone()

cursor.close()
conn.close()

if result:
    print("Table deletion failed.")
else:
    print("Table successfully deleted.")
```
#### Output:
```
Table successfully deleted.
```
Similarly, you can delete a whole database by using the **DROP DATABASE** query and changing the executed query accordingly.
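For instance, a minimal sketch (assuming the same connection details as in the examples above):
```python
import mysql.connector

conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="12345"
)
cursor = conn.cursor()

# Drop the whole GSSOC database
cursor.execute("DROP DATABASE IF EXISTS GSSOC")

cursor.close()
conn.close()
```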

@@ -0,0 +1,130 @@
# Queues in Python
A queue is a linear data structure where elements are added at the back (enqueue) and removed from the front (dequeue). Imagine a line at a coffee shop, the first in line (front) gets served first, and new customers join at the back. This FIFO approach ensures order and fairness in processing elements.
Queues offer efficient implementations for various scenarios. They are often used in:
- **Task Scheduling** - Operating systems utilize queues to manage processes waiting for CPU time.
- **Breadth-first search algorithms** - Traversing a tree or graph involves exploring neighbouring nodes level by level, often achieved using a queue.
- **Message passing** - Communication protocols leverage queues to buffer messages between applications for reliable delivery.
## Types of Queue
A queue can be classified into 4 types -
- **Simple Queue** - A simple queue is a queue, where we can only insert an element at the back and remove the element from the front of the queue, this type of queue follows the FIFO principle.
- **Double-Ended Queue (Deque)** - In this type of queue, insertions and deletions of elements can be performed from both ends of the queue.<br>
Double-ended queues can be classified into 2 types ->
- **Input-Restricted Queue**
- **Output-Restricted Queue**
- **Circular Queue** - It is a special type of queue where the back is connected to the front, where the operations follow the FIFO principle.
- **Priority Queue** - In this type of queue, elements are accessed based on their priority in the queue. <br>
Priority queues are of 2 types ->
- **Ascending Priority Queue**
- **Descending Priority Queue**
## Real Life Examples of Queues
- **Customer Service** - Consider how a customer service phone line works. Customers calling are put into a queue. The first customer to call is the first one to be served (FIFO). As more customers call, they are added to the end of the queue, and as customers are served, they are removed from the front. The entire process follows the queue data structure.
- **Printers** - Printers operate using a queue to manage print jobs. When a user sends a document to the printer, the job is added to the queue (enqueue). Once a job completes printing, it's removed from the queue (dequeue), and the next job in line starts. This sequential order of handling tasks perfectly exhibits the queue data structure.
- **Computer Memory** - Certain types of computer memory use a queue data structure to hold and process instructions. For example, in a computer's cache memory, the fetch-decode-execute cycle of an instruction follows a queue. The first instruction fetched is the first one to be decoded and executed, while new instructions fetched are added to the rear.
<hr>
# Important Terminologies in Queues
Understanding these terms is crucial for working with queues:
- **Enqueue** - Adding an element to the back of the queue.
- **Dequeue** - Removing the element at the front of the queue.
- **Front** - The first element in the queue, to be removed next.
- **Rear/Back** - The last element in the queue, where new elements are added.
- **Empty Queue** - A queue with no elements.
- **Overflow** - Attempting to enqueue an element when the queue is full.
- **Underflow** - Attempting to dequeue an element from an empty queue.
## Operations on a Queue
There are some key operations in a queue that include -
- **isFULL** - This operation checks if a queue is full.
- **isEMPTY** - This operation checks if a queue is empty.
- **Display** - This operation displays the queue elements.
- **Peek** - This operation is the process of getting the front value of a queue, without removing it. (i.e., Value at the front).
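In everyday Python code these operations map directly onto `collections.deque`, which provides O(1) appends and pops at both ends; a minimal sketch:
```python
from collections import deque

q = deque()
q.append(1)          # enqueue at the back
q.append(2)
print(q[0])          # peek at the front -> 1
print(q.popleft())   # dequeue from the front -> 1
print(len(q) == 0)   # isEmpty check -> False
```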
<hr>
# Implementation of Queue
```python
def isEmpty(Qu):
    if Qu == []:
        return True
    else:
        return False

def Enqueue(Qu, item):
    Qu.append(item)
    if len(Qu) == 1:
        front = rear = 0
    else:
        rear = len(Qu) - 1
    print(item, "enqueued to queue")

def Dequeue(Qu):
    if isEmpty(Qu):
        print("Underflow")
    else:
        item = Qu.pop(0)
        if len(Qu) == 0:  # if it was a single-element queue
            front = rear = None
        print(item, "dequeued from queue")

def Peek(Qu):
    if isEmpty(Qu):
        print("Underflow")
    else:
        front = 0
        print("Frontmost item is :", Qu[front])

def Display(Qu):
    if isEmpty(Qu):
        print("Queue Empty!")
    elif len(Qu) == 1:
        print(Qu[0], "<== front, rear")
    else:
        front = 0
        rear = len(Qu) - 1
        print(Qu[front], "<-front")
        for a in range(1, rear):
            print(Qu[a])
        print(Qu[rear], "<-rear")

queue = []  # initially the queue is empty
front = None

# Example Usage
Enqueue(queue, 1)
Enqueue(queue, 2)
Enqueue(queue, 3)
Dequeue(queue)
Peek(queue)
Display(queue)
```
## Output
```
1 enqueued to queue
2 enqueued to queue
3 enqueued to queue
1 dequeued from queue
Frontmost item is : 2
2 <-front
3 <-rear
```
## Complexity Analysis
- **Enqueue (append)**: `O(1)` amortized per operation.
- **Dequeue (`pop(0)`)**: `O(n)` per operation, since every remaining element shifts one position forward.
- **Peek**: `O(1)`; **Display**: `O(n)`, as it traverses the entire queue.

@@ -0,0 +1,54 @@
# Divide and Conquer Algorithms
Divide and Conquer is a paradigm for solving problems that involves breaking a problem into smaller sub-problems, solving the sub-problems recursively, and then combining their solutions to solve the original problem.
## Merge Sort
Merge Sort is a popular sorting algorithm that follows the divide and conquer strategy. It divides the input array into two halves, recursively sorts the halves, and then merges them.
**Algorithm Overview:**
- **Divide:** Divide the unsorted list into two sublists of about half the size.
- **Conquer:** Recursively sort each sublist.
- **Combine:** Merge the sorted sublists back into one sorted list.
```python
def merge_sort(arr):
    if len(arr) > 1:
        mid = len(arr) // 2
        left_half = arr[:mid]
        right_half = arr[mid:]

        # Recursively sort both halves
        merge_sort(left_half)
        merge_sort(right_half)

        # Merge the sorted halves back into arr
        i = j = k = 0
        while i < len(left_half) and j < len(right_half):
            if left_half[i] < right_half[j]:
                arr[k] = left_half[i]
                i += 1
            else:
                arr[k] = right_half[j]
                j += 1
            k += 1

        while i < len(left_half):
            arr[k] = left_half[i]
            i += 1
            k += 1

        while j < len(right_half):
            arr[k] = right_half[j]
            j += 1
            k += 1

arr = [12, 11, 13, 5, 6, 7]
merge_sort(arr)
print("Sorted array:", arr)
```
## Complexity Analysis
- **Time Complexity:** O(n log n) in all cases
- **Space Complexity:** O(n) additional space for the merge operation
---

@@ -0,0 +1,132 @@
# Dynamic Programming
Dynamic programming is a method for solving complex problems by breaking them down into simpler subproblems and solving each subproblem only once. It stores the solutions to subproblems to avoid redundant computations, making it particularly useful for optimization problems where the solution can be obtained by combining solutions to smaller subproblems.
## Real-Life Examples of Dynamic Programming
- **Fibonacci Sequence:** Computing the nth Fibonacci number efficiently.
- **Shortest Path:** Finding the shortest path in a graph from a source to a destination.
- **String Edit Distance:** Calculating the minimum number of operations required to transform one string into another.
- **Knapsack Problem:** Maximizing the value of items in a knapsack without exceeding its weight capacity.
# Some Common Dynamic Programming Techniques
# 1. Fibonacci Sequence
The Fibonacci sequence is a classic example used to illustrate dynamic programming. It is a series of numbers where each number is the sum of the two preceding ones, usually starting with 0 and 1.
**Algorithm Overview:**
- **Base Cases:** The first two numbers in the Fibonacci sequence are defined as 0 and 1.
- **Memoization:** Store the results of previously computed Fibonacci numbers to avoid redundant computations.
- **Recurrence Relation:** Compute each Fibonacci number by adding the two preceding numbers.
## Fibonacci Sequence Code in Python (Top-Down Approach with Memoization)
```python
def fibonacci(n, memo={}):
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    memo[n] = fibonacci(n - 1, memo) + fibonacci(n - 2, memo)
    return memo[n]

n = 10
print(f"The {n}th Fibonacci number is: {fibonacci(n)}.")
```
## Fibonacci Sequence Code in Python (Bottom-Up Approach)
```python
def fibonacci(n):
    fib = [0, 1]
    for i in range(2, n + 1):
        fib.append(fib[i - 1] + fib[i - 2])
    return fib[n]

n = 10
print(f"The {n}th Fibonacci number is: {fibonacci(n)}.")
```
## Complexity Analysis
- **Time Complexity**: O(n) for both approaches
- **Space Complexity**: O(n) for both approaches as written (the memoization table in the top-down version, the full list in the bottom-up version); the bottom-up approach drops to O(1) if only the last two values are kept
</br>
<hr>
</br>
# 2. Longest Common Subsequence
The longest common subsequence (LCS) problem is to find the longest subsequence common to two sequences. A subsequence is a sequence that appears in the same relative order but not necessarily contiguous.
**Algorithm Overview:**
- **Base Cases:** If one of the sequences is empty, the LCS is empty.
- **Memoization:** Store the results of previously computed LCS lengths to avoid redundant computations.
- **Recurrence Relation:** Compute the LCS length by comparing characters of the sequences and making decisions based on whether they match.
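Written out, the recurrence for sequences `X` and `Y` with prefix lengths `m` and `n` is:
```latex
LCS(m, n) =
\begin{cases}
0 & \text{if } m = 0 \text{ or } n = 0 \\
1 + LCS(m-1,\, n-1) & \text{if } X[m-1] = Y[n-1] \\
\max\bigl(LCS(m,\, n-1),\; LCS(m-1,\, n)\bigr) & \text{otherwise}
\end{cases}
```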
## Longest Common Subsequence Code in Python (Top-Down Approach with Memoization)
```python
def longest_common_subsequence(X, Y, m, n, memo={}):
    if (m, n) in memo:
        return memo[(m, n)]
    if m == 0 or n == 0:
        return 0
    if X[m - 1] == Y[n - 1]:
        memo[(m, n)] = 1 + longest_common_subsequence(X, Y, m - 1, n - 1, memo)
    else:
        memo[(m, n)] = max(longest_common_subsequence(X, Y, m, n - 1, memo),
                           longest_common_subsequence(X, Y, m - 1, n, memo))
    return memo[(m, n)]
X = "AGGTAB"
Y = "GXTXAYB"
print("Length of Longest Common Subsequence:", longest_common_subsequence(X, Y, len(X), len(Y)))
```
## Complexity Analysis
- **Time Complexity**: O(m * n) for the top-down approach, where m and n are the lengths of the input sequences
- **Space Complexity**: O(m * n) for the memoization table
</br>
<hr>
</br>
# 3. 0-1 Knapsack Problem
The 0-1 knapsack problem is a classic optimization problem where the goal is to maximize the total value of items selected while keeping the total weight within a specified limit.
**Algorithm Overview:**
- **Base Cases:** If the capacity of the knapsack is 0 or there are no items to select, the total value is 0.
- **Memoization:** Store the results of previously computed subproblems to avoid redundant computations.
- **Recurrence Relation:** Compute the maximum value by considering whether to include the current item or not.
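Written out, with `K(c, n)` denoting the best value achievable using the first `n` items under remaining capacity `c`, and `w` and `v` the weight and value lists:
```latex
K(c, n) =
\begin{cases}
0 & \text{if } n = 0 \text{ or } c = 0 \\
K(c,\, n-1) & \text{if } w_{n-1} > c \\
\max\bigl(v_{n-1} + K(c - w_{n-1},\, n-1),\; K(c,\, n-1)\bigr) & \text{otherwise}
\end{cases}
```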
## 0-1 Knapsack Problem Code in Python (Top-Down Approach with Memoization)
```python
def knapsack(weights, values, capacity, n, memo={}):
    if (capacity, n) in memo:
        return memo[(capacity, n)]
    if n == 0 or capacity == 0:
        return 0
    if weights[n - 1] > capacity:
        memo[(capacity, n)] = knapsack(weights, values, capacity, n - 1, memo)
    else:
        memo[(capacity, n)] = max(values[n - 1] + knapsack(weights, values, capacity - weights[n - 1], n - 1, memo),
                                  knapsack(weights, values, capacity, n - 1, memo))
    return memo[(capacity, n)]
weights = [10, 20, 30]
values = [60, 100, 120]
capacity = 50
n = len(weights)
print("Maximum value that can be obtained:", knapsack(weights, values, capacity, n))
```
## Complexity Analysis
- **Time Complexity**: O(n * W) for the top-down approach, where n is the number of items and W is the capacity of the knapsack
- **Space Complexity**: O(n * W) for the memoization table
</br>
<hr>
</br>

@@ -0,0 +1,219 @@
# Graph Data Structure
A graph is a non-linear data structure consisting of vertices and edges. It is a powerful tool for representing and analyzing complex relationships between objects or entities.
## Components of a Graph
1. **Vertices:** Vertices are the fundamental units of the graph. Sometimes, vertices are also known as vertex or nodes. Every node/vertex can be labeled or unlabeled.
2. **Edges:** Edges connect two nodes of the graph. In a directed graph, an edge is an ordered pair of nodes. Edges can connect any two nodes in any possible way; there are no rules. Every edge can be labelled/unlabelled.
## Basic Operations on Graphs
- Insertion of Nodes/Edges in the graph
- Deletion of Nodes/Edges in the graph
- Searching on Graphs
- Traversal of Graphs
## Types of Graph
**1. Undirected Graph:** In an undirected graph, edges have no direction, and they represent symmetric relationships between nodes. If there is an edge between node A and node B, you can travel from A to B and from B to A.
**2. Directed Graph (Digraph):** In a directed graph, edges have a direction, indicating a one-way relationship between nodes. If there is an edge from node A to node B, you can travel from A to B but not necessarily from B to A.
**3. Weighted Graph:** In a weighted graph, edges have associated weights or costs. These weights can represent various attributes such as distance, cost, or capacity. Weighted graphs are commonly used in applications like route planning or network optimization.
**4. Cyclic Graph:** A cyclic graph contains at least one cycle, which is a path that starts and ends at the same node. In other words, you can traverse the graph and return to a previously visited node by following the edges.
**5. Acyclic Graph:** An acyclic graph, as the name suggests, does not contain any cycles. This type of graph is often used in scenarios where a cycle would be nonsensical or undesirable, such as representing dependencies between tasks or events.
**6. Tree:** A tree is a special type of acyclic graph where each node has a unique parent except for the root node, which has no parent. Trees have a hierarchical structure and are frequently used in data structures like binary trees or decision trees.
## Representation of Graphs
There are two ways to store a graph:
1. **Adjacency Matrix:**
In this method, the graph is stored in the form of the 2D matrix where rows and columns denote vertices. Each entry in the matrix represents the weight of the edge between those vertices.
```python
def create_adjacency_matrix(graph):
    num_vertices = len(graph)

    adj_matrix = [[0] * num_vertices for _ in range(num_vertices)]
    for i in range(num_vertices):
        for j in range(num_vertices):
            if graph[i][j] == 1:
                adj_matrix[i][j] = 1
                adj_matrix[j][i] = 1
    return adj_matrix

graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0]
]

adj_matrix = create_adjacency_matrix(graph)
for row in adj_matrix:
    print(' '.join(map(str, row)))
```
2. **Adjacency List:**
In this method, the graph is represented as a collection of linked lists. There is an array of pointers, where each pointer points to the list of edges connected to that vertex.
```python
def create_adjacency_list(edges, num_vertices):
    adj_list = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj_list[u].append(v)
        adj_list[v].append(u)
    return adj_list

if __name__ == "__main__":
    num_vertices = 4
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 1)]
    adj_list = create_adjacency_list(edges, num_vertices)
    for i in range(num_vertices):
        print(f"{i} -> {' '.join(map(str, adj_list[i]))}")
```
`Output`
`0 -> 1 2`
`1 -> 0 2 3`
`2 -> 0 1 3`
`3 -> 2 1 `
# Traversal Techniques
## Breadth First Search (BFS)
- It is a graph traversal algorithm that explores all the vertices in a graph at the current depth before moving on to the vertices at the next depth level.
- It starts at a specified vertex and visits all its neighbors before moving on to the next level of neighbors.
BFS is commonly used in algorithms for pathfinding, connected components, and shortest path problems in graphs.
**Steps of BFS algorithms**
- **Step 1:** Initially queue and visited arrays are empty.
- **Step 2:** Push node 0 into queue and mark it visited.
- **Step 3:** Remove node 0 from the front of queue and visit the unvisited neighbours and push them into queue.
- **Step 4:** Remove node 1 from the front of queue and visit the unvisited neighbours and push them into queue.
- **Step 5:** Remove node 2 from the front of queue and visit the unvisited neighbours and push them into queue.
- **Step 6:** Remove node 3 from the front of queue and visit the unvisited neighbours and push them into queue.
- **Step 7:** Remove node 4 from the front of queue and visit the unvisited neighbours and push them into queue.
```python
from collections import deque

def bfs(adjList, startNode, visited):
    q = deque()
    visited[startNode] = True
    q.append(startNode)

    while q:
        currentNode = q.popleft()
        print(currentNode, end=" ")
        for neighbor in adjList[currentNode]:
            if not visited[neighbor]:
                visited[neighbor] = True
                q.append(neighbor)

def addEdge(adjList, u, v):
    adjList[u].append(v)

def main():
    vertices = 5
    adjList = [[] for _ in range(vertices)]
    addEdge(adjList, 0, 1)
    addEdge(adjList, 0, 2)
    addEdge(adjList, 1, 3)
    addEdge(adjList, 1, 4)
    addEdge(adjList, 2, 4)
    visited = [False] * vertices
    print("Breadth First Traversal", end=" ")
    bfs(adjList, 0, visited)

if __name__ == "__main__":  # Output: Breadth First Traversal 0 1 2 3 4
    main()
```
- **Time Complexity:** `O(V+E)`, where V is the number of nodes and E is the number of edges.
- **Auxiliary Space:** `O(V)`
## Depth-first search
Depth-first search is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible along each branch before backtracking.
**Steps of DFS algorithms**
- **Step 1:** Initially stack and visited arrays are empty.
- **Step 2:** Visit 0 and put its adjacent nodes which are not visited yet into the stack.
- **Step 3:** Now, Node 1 at the top of the stack, so visit node 1 and pop it from the stack and put all of its adjacent nodes which are not visited in the stack.
- **Step 4:** Now, Node 2 at the top of the stack, so visit node 2 and pop it from the stack and put all of its adjacent nodes which are not visited (i.e, 3, 4) in the stack.
- **Step 5:** Now, Node 4 at the top of the stack, so visit node 4 and pop it from the stack and put all of its adjacent nodes which are not visited in the stack.
- **Step 6:** Now, Node 3 at the top of the stack, so visit node 3 and pop it from the stack and put all of its adjacent nodes which are not visited in the stack.
```python
from collections import defaultdict

class Graph:
    def __init__(self):
        self.graph = defaultdict(list)

    def addEdge(self, u, v):
        self.graph[u].append(v)

    def DFSUtil(self, v, visited):
        visited.add(v)
        print(v, end=' ')
        for neighbour in self.graph[v]:
            if neighbour not in visited:
                self.DFSUtil(neighbour, visited)

    def DFS(self, v):
        visited = set()
        self.DFSUtil(v, visited)

if __name__ == "__main__":
    g = Graph()
    g.addEdge(0, 1)
    g.addEdge(0, 2)
    g.addEdge(1, 2)
    g.addEdge(2, 0)
    g.addEdge(2, 3)
    g.addEdge(3, 3)
    print("Depth First Traversal (starting from vertex 2):", end=" ")
    g.DFS(2)
```
`Output: Depth First Traversal (starting from vertex 2): 2 0 1 3 `
- **Time complexity:** `O(V + E)`, where V is the number of vertices and E is the number of edges in the graph.
- **Auxiliary Space:** `O(V)`, for the visited set plus the recursion stack, which can hold up to V calls in the worst case.
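The steps above describe a stack-based traversal, while the code uses recursion. For completeness, here is a minimal iterative sketch with an explicit stack (a hypothetical helper, not part of the class above; pushing neighbours in reverse keeps the output order identical to the recursive version for this example):
```python
def dfs_iterative(graph, v):
    """Iterative DFS over an adjacency list (dict of lists), e.g. g.graph."""
    visited = set()
    stack = [v]
    while stack:
        node = stack.pop()
        if node not in visited:
            visited.add(node)
            print(node, end=' ')
            # Push neighbours in reverse so they are visited in insertion order
            for neighbour in reversed(graph[node]):
                if neighbour not in visited:
                    stack.append(neighbour)

# dfs_iterative(g.graph, 2)  # Output: 2 0 1 3
```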
</br>
@@ -0,0 +1,135 @@
# Greedy Algorithms
Greedy algorithms are simple, intuitive algorithms that make a sequence of choices at each step with the hope of finding a global optimum. They are called "greedy" because at each step, they choose the most advantageous option without considering the future consequences. Despite their simplicity, greedy algorithms are powerful tools for solving optimization problems, especially when the problem exhibits the greedy-choice property.
## Real-Life Examples of Greedy Algorithms
- **Coin Change:** Finding the minimum number of coins to make a certain amount of change.
- **Job Scheduling:** Assigning tasks to machines to minimize completion time.
- **Huffman Coding:** Constructing an optimal prefix-free binary code for data compression.
- **Fractional Knapsack:** Selecting items to maximize the value within a weight limit.
# Some Common Greedy Algorithms
# 1. Coin Change Problem
The coin change problem is a classic example of a greedy algorithm. Given a set of coin denominations and a target amount, the objective is to find the minimum number of coins required to make up that amount. Note that the greedy strategy is only guaranteed to be optimal for canonical coin systems such as the US denominations used below; for arbitrary denominations it can return a suboptimal answer (see the example after the code).
**Algorithm Overview:**
- **Greedy Strategy:** At each step, the algorithm selects the largest denomination coin that is less than or equal to the remaining amount.
- **Repeat Until Amount is Zero:** The process continues until the remaining amount becomes zero.
## Coin Change Code in Python
```python
def coin_change(coins, amount):
coins.sort(reverse=True)
num_coins = 0
for coin in coins:
num_coins += amount // coin
amount %= coin
if amount == 0:
return num_coins
else:
return -1
coins = [1, 5, 10, 25]
amount = 63
result = coin_change(coins, amount)
if result != -1:
print(f"Minimum number of coins required: {result}.")
else:
print("It is not possible to make the amount with the given denominations.")
```
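A caveat worth demonstrating: for a non-canonical coin system, the greedy choice can be suboptimal. With the hypothetical denominations `[1, 3, 4]` and amount 6, the function above picks 4 first and ends up using three coins, while two coins suffice:
```python
# Greedy picks 4, then 1, then 1 -> 3 coins, but 3 + 3 needs only 2 coins
print(coin_change([1, 3, 4], 6))  # Output: 3 (suboptimal; the optimum is 2)
```
For arbitrary denominations, the dynamic programming formulation of coin change is needed to guarantee the true minimum.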
## Complexity Analysis
- **Time Complexity**: O(n log n) for sorting (if not pre-sorted), O(n) for iteration
- **Space Complexity**: O(1)
</br>
<hr>
</br>
# 2. Activity Selection Problem
The activity selection problem involves selecting the maximum number of mutually compatible activities that can be performed by a single person or machine, assuming that a person can only work on one activity at a time.
**Algorithm Overview:**
- **Greedy Strategy:** Sort the activities based on their finish times.
- **Selecting Activities:** Iterate through the sorted activities, selecting each activity if it doesn't conflict with the previously selected ones.
## Activity Selection Code in Python
```python
def activity_selection(start, finish):
    # Assumes activities are sorted by finish time (the sample data below is);
    # otherwise, sort the (start, finish) pairs by finish time first.
    n = len(start)
activities = []
i = 0
activities.append(i)
for j in range(1, n):
if start[j] >= finish[i]:
activities.append(j)
i = j
return activities
start = [1, 3, 0, 5, 8, 5]
finish = [2, 4, 6, 7, 9, 9]
selected_activities = activity_selection(start, finish)
print("Selected activities:", selected_activities)
```
## Complexity Analysis
- **Time Complexity**: O(n log n) for sorting (if not pre-sorted), O(n) for iteration
- **Space Complexity**: O(1)
</br>
<hr>
</br>
# 3. Huffman Coding
Huffman coding is a method of lossless data compression that efficiently represents characters or symbols in a file. It uses variable-length codes to represent characters, with shorter codes assigned to more frequent characters.
**Algorithm Overview:**
- **Frequency Analysis:** Determine the frequency of each character in the input data.
- **Building the Huffman Tree:** Construct a binary tree where each leaf node represents a character and the path to the leaf node determines its code.
- **Assigning Codes:** Traverse the Huffman tree to assign codes to each character, with shorter codes for more frequent characters.
## Huffman Coding Code in Python
```python
from heapq import heappush, heappop, heapify
from collections import defaultdict
def huffman_coding(data):
frequency = defaultdict(int)
for char in data:
frequency[char] += 1
heap = [[weight, [symbol, ""]] for symbol, weight in frequency.items()]
heapify(heap)
while len(heap) > 1:
lo = heappop(heap)
hi = heappop(heap)
for pair in lo[1:]:
pair[1] = '0' + pair[1]
for pair in hi[1:]:
pair[1] = '1' + pair[1]
heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
return sorted(heappop(heap)[1:], key=lambda p: (len(p[-1]), p))
data = "Huffman coding is a greedy algorithm"
encoded_data = huffman_coding(data)
print("Huffman Codes:")
for symbol, code in encoded_data:
print(f"{symbol}: {code}")
```
## Complexity Analysis
- **Time Complexity**: O(n log n) for heap operations, where n is the number of unique characters
- **Space Complexity**: O(n) for the heap
</br>
<hr>
</br>
@@ -1,4 +1,10 @@
# List of sections
- [Section title](filename.md)
- [Queues in Python](Queues.md)
- [Graphs](graph.md)
- [Sorting Algorithms](sorting-algorithms.md)
- [Recursion and Backtracking](recursion.md)
- [Divide and Conquer Algorithm](divide-and-conquer-algorithm.md)
- [Searching Algorithms](searching-algorithms.md)
- [Greedy Algorithms](greedy-algorithms.md)
- [Dynamic Programming](dynamic-programming.md)
@@ -0,0 +1,107 @@
# Introduction to Recursions
Recursion occurs when a function calls itself to solve smaller instances of the same problem until a specified condition is fulfilled. It is used for tasks that can be divided into smaller sub-tasks.
# How Recursion Works
To solve a problem using recursion we must define:
- Base condition: the condition under which the recursion ends.
- Recursive case: the part of the function that calls itself to solve a smaller instance of the problem.
**Steps of Recursion**
When a recursive function is called, the following sequence of events occurs:
- Function Call: The function is invoked with a specific argument.
- Base Condition Check: The function checks if the argument satisfies the base case.
- Recursive Call: If the base case is not met, the function performs some operations and makes a recursive call with a modified argument.
- Stack Management: Each recursive call is placed on the call stack. The stack keeps track of each function call, its argument, and the point to return to once the call completes.
- Unwinding the Stack: When the base case is eventually met, the function returns a value, and the stack starts unwinding, returning values to previous function calls until the initial call is resolved.
# What is Stack Overflow in Recursion
Stack overflow is an error that occurs when the call stack memory limit is exceeded. Each recursive call is kept on the call stack until its nested calls complete. Without a base case, the function would call itself indefinitely, leading to a stack overflow.
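As a quick illustration, here is a deliberately broken sketch: a recursive function with no base case exhausts Python's call stack and raises `RecursionError`.
```python
def countdown(n):
    # No base case: this calls itself forever
    return countdown(n - 1)

try:
    countdown(10)
except RecursionError as e:
    print("Stack overflow surfaces in Python as:", e)
```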
# Example
- Factorial of a Number
The factorial of a number i is i multiplied by the factorial of (i-1). The base case is i = 0, where we return 1, since the factorial of 0 is 1.
```python
def factorial(i):
#base case
if i==0 :
return 1
#recursive case
else :
return i * factorial(i-1)
i = 6
print("Factorial of i is :", factorial(i)) # Output- Factorial of i is :720
```
# What is Backtracking
Backtracking is a recursive algorithmic technique used to solve problems by exploring all possible solutions and discarding those that do not meet the problem's constraints. It is particularly useful for problems involving combinations, permutations, and finding paths in a grid.
# How Backtracking Works
- Incremental Solution Building: Solutions are built one step at a time.
- Feasibility Check: At each step, a check is made to see if the current partial solution is valid.
- Backtracking: If a partial solution is found to be invalid, the algorithm backtracks by removing the last added part of the solution and trying the next possibility.
- Exploration of All Possibilities: The process continues recursively, exploring all possible paths, until a solution is found or all possibilities are exhausted.
# Example
- Word Search
Given a 2D grid of characters and a word, determine if the word exists in the grid. The word can be constructed from letters of sequentially adjacent cells, where "adjacent" cells are horizontally or vertically neighboring. The same letter cell may not be used more than once.
Algorithm for Solving the Word Search Problem with Backtracking:
- Start at each cell: Attempt to find the word starting from each cell.
- Check all Directions: From each cell, try all four possible directions (up, down, left, right).
- Mark Visited Cells: Use a temporary marker to indicate cells that are part of the current path to avoid revisiting.
- Backtrack: If a path does not lead to a solution, backtrack by unmarking the visited cell and trying the next possibility.
```python
def exist(board, word):
rows, cols = len(board), len(board[0])
def backtrack(r, c, suffix):
if not suffix:
return True
if r < 0 or r >= rows or c < 0 or c >= cols or board[r][c] != suffix[0]:
return False
# Mark the cell as visited by replacing its character with a placeholder
ret = False
board[r][c], temp = '#', board[r][c]
# Explore the four possible directions
for row_offset, col_offset in [(0, 1), (1, 0), (0, -1), (-1, 0)]:
ret = backtrack(r + row_offset, c + col_offset, suffix[1:])
if ret:
break
# Restore the cell's original value
board[r][c] = temp
return ret
for row in range(rows):
for col in range(cols):
if backtrack(row, col, word):
return True
return False
# Test case
board = [
['A','B','C','E'],
['S','F','C','S'],
['A','D','E','E']
]
word = "ABCES"
print(exist(board, word)) # Output: True
```
@@ -0,0 +1,161 @@
# Searching Algorithms
Searching algorithms are techniques used to locate specific items within a collection of data. These algorithms are fundamental in computer science and are employed in various applications, from databases to web search engines.
## Real Life Example of Searching
- Searching for a word in a dictionary
- Searching for a specific book in a library
- Searching for a contact in your phone's address book
- Searching for a file on your computer, etc.
# Some common searching techniques
# 1. Linear Search
Linear search, also known as sequential search, is a straightforward searching algorithm that checks each element in a collection until the target element is found or the entire collection has been traversed. It is simple to implement but becomes inefficient for large datasets.
**Algorithm Overview:**
- **Sequential Checking:** The algorithm iterates through each element in the collection, starting from the first element.
- **Comparing Elements:** At each iteration, it compares the current element with the target element.
- **Finding the Target:** If the current element matches the target, the search terminates, and the index of the element is returned.
- **Completing the Search:** If the entire collection is traversed without finding the target, the algorithm indicates that the element is not present.
## Linear Search Code in Python
```python
def linear_search(arr, target):
for i in range(len(arr)):
if arr[i] == target:
return i
return -1
arr = [5, 3, 8, 1, 2]
target = 8
result = linear_search(arr, target)
if result != -1:
print(f"Element {target} found at index {result}.")
else:
print(f"Element {target} not found.")
```
## Complexity Analysis
- **Time Complexity**: O(n)
- **Space Complexity**: O(1)
</br>
<hr>
</br>
# 2. Binary Search
Binary search is an efficient searching algorithm that works on sorted collections. It repeatedly divides the search interval in half until the target element is found or the interval is empty. Binary search is significantly faster than linear search but requires the collection to be sorted beforehand.
**Algorithm Overview:**
- **Initial State:** Binary search starts with the entire collection as the search interval.
- **Divide and Conquer:** At each step, it calculates the middle element of the current interval and compares it with the target.
- **Narrowing Down the Interval:** If the middle element is equal to the target, the search terminates successfully. Otherwise, it discards half of the search interval based on the comparison result.
- **Repeating the Process:** The algorithm repeats this process on the remaining half of the interval until the target is found or the interval is empty.
## Binary Search Code in Python (Iterative)
```python
def binary_search(arr, target):
low = 0
high = len(arr) - 1
while low <= high:
mid = (low + high) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
low = mid + 1
else:
high = mid - 1
return -1
arr = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
target = 13
result = binary_search(arr, target)
if result != -1:
print(f"Element {target} found at index {result}.")
else:
print(f"Element {target} not found.")
```
## Binary Search Code in Python (Recursive)
```python
def binary_search_recursive(arr, target, low, high):
if low <= high:
mid = (low + high) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
return binary_search_recursive(arr, target, mid + 1, high)
else:
return binary_search_recursive(arr, target, low, mid - 1)
else:
return -1
arr = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
target = 13
result = binary_search_recursive(arr, target, 0, len(arr) - 1)
if result != -1:
print(f"Element {target} found at index {result}.")
else:
print(f"Element {target} not found.")
```
## Complexity Analysis
- **Time Complexity**: O(log n)
- **Space Complexity**: O(1) (Iterative), O(log n) (Recursive)
</br>
<hr>
</br>
# 3. Interpolation Search
Interpolation search is an improved version of binary search, especially useful when the elements in the collection are uniformly distributed. Instead of always dividing the search interval in half, interpolation search estimates the position of the target element based on its value and the values of the endpoints of the search interval.
**Algorithm Overview:**
- **Estimating Position:** Interpolation search calculates an approximate position of the target element within the search interval based on its value and the values of the endpoints.
- **Refining the Estimate:** It adjusts the estimated position based on whether the target value is likely to be closer to the beginning or end of the search interval.
- **Updating the Interval:** Using the refined estimate, it narrows down the search interval iteratively until the target is found or the interval becomes empty.
## Interpolation Search Code in Python
```python
def interpolation_search(arr, target):
low = 0
high = len(arr) - 1
while low <= high and arr[low] <= target <= arr[high]:
if low == high:
if arr[low] == target:
return low
return -1
pos = low + ((target - arr[low]) * (high - low)) // (arr[high] - arr[low])
if arr[pos] == target:
return pos
elif arr[pos] < target:
low = pos + 1
else:
high = pos - 1
return -1
arr = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
target = 60
result = interpolation_search(arr, target)
if result != -1:
print(f"Element {target} found at index {result}.")
else:
print(f"Element {target} not found.")
```
## Complexity Analysis
- **Time Complexity**: O(log log n) on average for uniformly distributed data; O(n) in the worst case
- **Space Complexity**: O(1)
</br>
<hr>
</br>
@@ -0,0 +1,153 @@
# Understanding the Neural Network
## Table of Contents
<details>
<summary>Click to expand</summary>
- [Introduction](#introduction)
- [Neuron to Perceptron](#neuron-to-perceptron)
- [Key Concepts](#key-concepts)
- [Layers](#layers)
- [Weights and Biases](#weights-and-biases)
- [Activation Functions](#activation-functions)
- [Forward and Backward Pass](#forward-and-backward-propagation)
- [Implementation](#building-from-scratch)
</details>
## Introduction
This guide will walk you through a fundamental neural network implementation in Python. We'll build a `Neural Network` from scratch, allowing you to grasp the core concepts of how neural networks learn and make predictions.
### Let's start by Understanding the Basic Architecture of Neural Nets
## Neuron to Perceptron
| `Neuron` cells forming the human nervous system | `Perceptron` inspired by the human brain |
| :----------------------------------------------- | -------------------------------------: |
| Neurons are nerve cells that send messages all over your body to allow you to do everything from breathing to talking, eating, walking, and thinking. | The perceptron is a mathematical model of a biological neuron, performing heavy computations to think like humans. |
| A neuron collects signals from its dendrites. | The first layer is known as the Input Layer, acting like dendrites to receive the input signal. |
| Synapses are the connections between neurons where signals are transmitted. | Weights represent synapses. |
| The axon terminal releases neurotransmitters to transmit the signal to other neurons. | The output is the final result – between 1 and 0, representing a classification or prediction. |
---
> The human brain has a network of about 86 billion neurons and more than 100 trillion synaptic connections!
## **Key Concepts**
Artificial neurons are the fundamental processing units in an ANN. They receive inputs, multiply them by weights (representing the strength of connections), sum those weighted inputs, and then apply an activation function to produce an output.
### Layers
Neurons in ANNs are organized into layers:
* **Input Layer:** Receives the raw data.
* **(n) Hidden Layers:** (Optional) Intermediate layers where complex transformations occur. They learn to detect patterns and features in the data.
* **Output Layer:** Produces the final result (prediction or classification).
### Weights and Biases
- For each input $(x_i)$, a weight $(w_i)$ is associated with it. Weights, multiplied with input units $(w_i \cdot x_i)$, determine the influence of one neuron's output on another.
- A bias $(b_i)$ is added to help influence the end product, giving the equation as $(w_i \cdot x_i + b_i)$.
- During training, the network adjusts these weights and biases to minimize errors and improve its predictions.
### Activation Functions
- An activation function is applied to the result to introduce non-linearity in the model, allowing ANNs to learn more complex relationships from the data.
- The resulting equation: $y = f(g(x))$, determines whether the neuron will "fire" or not, i.e., if its output will be used as input for the next neuron.
- Common activation functions include the sigmoid function, tanh (hyperbolic tangent), and ReLU (Rectified Linear Unit).
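A minimal sketch of these three activations with NumPy (the function names and the sample input are illustrative):
```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))  # squashes inputs into (0, 1)

def tanh(x):
    return np.tanh(x)            # squashes inputs into (-1, 1)

def relu(x):
    return np.maximum(0, x)      # zeroes out negative inputs

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```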
### Forward and Backward Propagation
- **Flow of Information:** All the above steps are part of Forward Propagation. It gives the output equation as $y = f\left(\sum_{i=1}^n w_i x_i + b\right)$
- **Error Correction:** Backpropagation is the algorithm used to train ANNs by calculating the gradient of error at the output layer and then propagating this error backward through the network. This allows the network to adjust its weights and biases in the direction that reduces the error.
- The chain rule of calculus is the foundational concept to compute the gradient of the error:
$$\delta_{ij}(E) = \frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial \hat{y}_j} \cdot \frac{\partial \hat{y}_j}{\partial \theta_j} \cdot \frac{\partial \theta_j}{\partial w_{ij}}$$
where $E$ is the error, $\hat{y}_j$ is the predicted output, $\theta_j$ is the input to the activation function of the $j^{th}$ neuron, and $w_{ij}$ is the weight from neuron $i$ to neuron $j$.
## Building From Scratch
```python
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
class SimpleNeuralNetwork:
def __init__(self, input_size, hidden_size, output_size):
self.input_size = input_size
self.hidden_size = hidden_size
self.output_size = output_size
# Initialize weights and biases
self.weights_input_hidden = np.random.randn(input_size, hidden_size)
self.bias_hidden = np.random.randn(hidden_size)
self.weights_hidden_output = np.random.randn(hidden_size, output_size)
self.bias_output = np.random.randn(output_size)
def sigmoid(self, x):
return 1 / (1 + np.exp(-x))
def sigmoid_derivative(self, x):
return x * (1 - x)
def forward(self, X):
self.hidden_layer_input = np.dot(X, self.weights_input_hidden) + self.bias_hidden
self.hidden_layer_output = self.sigmoid(self.hidden_layer_input)
self.output_layer_input = np.dot(self.hidden_layer_output, self.weights_hidden_output) + self.bias_output
self.output = self.sigmoid(self.output_layer_input)
return self.output
def backward(self, X, y, learning_rate):
output_error = y - self.output
output_delta = output_error * self.sigmoid_derivative(self.output)
hidden_error = output_delta.dot(self.weights_hidden_output.T)
hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_layer_output)
self.weights_hidden_output += self.hidden_layer_output.T.dot(output_delta) * learning_rate
self.bias_output += np.sum(output_delta, axis=0) * learning_rate
self.weights_input_hidden += X.T.dot(hidden_delta) * learning_rate
self.bias_hidden += np.sum(hidden_delta, axis=0) * learning_rate
def train(self, X, y, epochs, learning_rate):
self.losses = []
for epoch in range(epochs):
self.forward(X)
self.backward(X, y, learning_rate)
loss = np.mean(np.square(y - self.output))
self.losses.append(loss)
if epoch % 1000 == 0:
print(f"Epoch {epoch}, Loss: {loss}")
def plot_loss(self):
plt.plot(self.losses)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Training Loss Over Epochs')
plt.show()
```
### Creating the Input & Output Array
Let's create a dummy input and output dataset. Here, X holds the four possible pairs of binary inputs and y is their logical OR, giving the network a simple, learnable pattern.
```python
X = np.array([[0,0], [0,1], [1,0], [1,1]])
y = np.array([[0], [1], [1], [1]])
```
### Defining the Neural Network
With our input and output data ready, we'll define a simple neural network with one hidden layer containing two neurons.
```python
# neural network architecture: 2 inputs -> 2 hidden neurons -> 1 output
input_size = 2
hidden_size = 2
output_size = 1
```
### Visualizing the Training Loss
To understand how well our model is learning, let's train it and visualize the training loss over epochs.
```python
model = SimpleNeuralNetwork(input_size, hidden_size, output_size)
model.train(X, y, epochs=10000, learning_rate=0.1)
model.plot_loss()
```
@@ -0,0 +1,257 @@
# Decision Trees
Decision trees are a type of supervised machine learning algorithm that is mostly used in classification problems. They work for both categorical and continuous input and output variables.
A decision tree can also be interpreted as an acyclic graph used for decision-making. Every branching node in the graph examines a particular feature (j) of the feature vector. The left branch is taken when the feature's value is below a certain threshold; otherwise, the right branch is taken. As soon as a leaf node is reached, the class to which the example belongs is decided.
## Key Components of a Decision Tree
**Root Node:** This is the decision tree's first node, and it symbolizes the whole population or sample.
**Internal Nodes:** These are the nodes that make decisions and they stand in for the characteristics or features.
**Leaf Nodes:** These are the terminal nodes, and they represent the final decision or predicted class.
**Branches:** These are the lines that connect the nodes, and they show how the choice was made depending on the feature value.
### Example: Predicting Loan Approval
In this example, we will use a decision tree to forecast the approval or denial of a loan application based on a number of features, including job status, credit score, and income.
```
Root Node
(All Applications)
/ \
Internal Node Internal Node
(Credit Score) (Employment Status)
/ \ / \
Leaf Node Leaf Node Leaf Node Leaf Node
(Approve Loan) (Deny Loan) (Approve Loan) (Deny Loan)
```
> There are various formulations of the decision tree learning algorithm. Here, we consider just one, called ID3.
## Appropriate Problems For Decision Tree Learning
In general, decision tree learning works best on issues that have the following characteristics:
1. ***Instances*** are represented by ***attribute-value pairs***
2. The ***output values of the target function are discrete***. Each sample is given a Boolean categorization (yes or no) by the decision tree. Learning functions with multiple possible output values can be effortlessly integrated into decision tree approaches.
3. ***Disjunctive descriptions may be required***
4. The ***training data may contain errors*** – ***Decision tree learning methods are robust to errors,*** both errors in classifications of the training examples and errors in the attribute
values that describe these examples.
5. ***Missing attribute values could be present in the training data.*** Using decision tree approaches is possible even in cases where some training examples have missing values.
# Decision Tree Algorithm
The decision tree method classifies the data according to a tree structure. The root node, that holds the complete dataset, is where it all begins. The algorithm then determines which feature, according to a certain criterion like information gain or Gini impurity, is appropriate for splitting the dataset. Subsets of the dataset are then created according to the values of the chosen feature. Until a halting condition is satisfied—for example, obtaining a minimal number of samples per leaf node or a maximum tree depth—this procedure is repeated recursively for every subset.
### Which Attribute Is the Best Classifier?
- The ID3 algorithm's primary idea is to choose which attribute to test at each tree node.
- Information gain is a statistical measure that quantifies how well a given attribute divides the training samples into groups based on the target classification.
- When building the tree, ID3 chooses the candidate attribute using the information gain metric.
## Entropy & Information
**Entropy** is a metric that quantifies the level of impurity or uncertainty present in a given dataset. When it comes to decision trees, entropy measures how similar the target variable is within a specific node or subset of the data. It is utilized for assessing the quality of potential splits during the tree construction process.
The entropy of a node is calculated as:
__Entropy = -Σ(p<sub>i</sub> * log<sub>2</sub>(p<sub>i</sub>))__
where `p`<sub>`i`</sub> is the proportion of instances belonging to class `i` in the current node. The entropy is at its maximum when all classes are equally represented in the node, indicating maximum impurity or uncertainty.
**Information Gain** is a measure used to estimate the possible reduction in entropy achieved by separating the data according to a certain attribute. It quantifies the projected decrease in impurity or uncertainty after the separation.
The information gain for a feature `A` is calculated as:
__Information Gain = Entropy(parent) - Σ(weight(child) * Entropy(child))__
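As a quick sanity check on the numbers used in the example below, here is a small sketch that computes the dataset entropy and the information gain of Outlook for the PlayTennis data (class counts taken from the table that follows):
```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# PlayTennis: 9 Yes, 5 No
parent = entropy([9, 5])  # ~0.940

# Outlook splits: Sunny (3 No, 2 Yes), Overcast (4 Yes), Rain (2 No, 3 Yes)
children = [([3, 2], 5), ([4], 4), ([2, 3], 5)]
gain = parent - sum((w / 14) * entropy(c) for c, w in children)
print(round(parent, 3), round(gain, 3))  # 0.94 0.247 (the text rounds to 0.246)
```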
### Example of a Decision Tree
Let us look at a basic decision tree example that predicts a person's likelihood of playing tennis based on climate conditions
**Data Set:**
---
| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
|-----|---------|-------------|----------|------|------------|
| D1 | Sunny | Hot | High | Weak | No |
| D2 | Sunny | Hot | High | Strong | No |
| D3 | Overcast| Hot | High | Weak | Yes |
| D4 | Rain | Mild | High | Weak | Yes |
| D5 | Rain | Cool | Normal | Weak | Yes |
| D6 | Rain | Cool | Normal | Strong | No |
| D7 | Overcast| Cool | Normal | Strong | Yes |
| D8 | Sunny | Mild | High | Weak | No |
| D9 | Sunny | Cool | Normal | Weak | Yes |
| D10 | Rain | Mild | Normal | Weak | Yes |
| D11 | Sunny | Mild | Normal | Strong | Yes |
| D12 | Overcast| Mild | High | Strong | Yes |
| D13 | Overcast| Hot | Normal | Weak | Yes |
| D14 | Rain | Mild | High | Strong | No |
---
1. Calculate the entropy of the entire dataset.
2. For each feature, calculate the information gain by splitting the data based on that feature.
3. Select the feature with the highest information gain to create the root node.
4. Repeat steps 1-3 for each child node until a stopping criterion is met (e.g., all instances in a node belong to the same class, or the maximum depth is reached).
Let's start with calculating the entropy of the entire dataset:
Total instances: 14
No instances: 5
Yes instances: 9
**Entropy** = -((5/14) * log2(5/14) + (9/14) * log2(9/14)) = 0.940
Now, we'll calculate the information gain for each feature:
**Outlook**:
- Sunny: 3 No, 2 Yes (Entropy = 0.971)
- Overcast: 0 No, 4 Yes (Entropy = 0)
- Rain: 2 No, 3 Yes (Entropy = 0.971)
Information Gain = 0.940 - ((5/14) * 0.971 + (4/14) * 0 + (5/14) * 0.971) = 0.246
**Temperature**:
- Hot: 2 No, 2 Yes (Entropy = 1)
- Mild: 2 No, 4 Yes (Entropy = 0.918)
- Cool: 1 No, 3 Yes (Entropy = 0.811)
Information Gain = 0.940 - ((4/14) * 1 + (6/14) * 0.918 + (4/14) * 0.811) = 0.029
**Humidity**:
- High: 4 No, 3 Yes (Entropy = 0.985)
- Normal: 1 No, 6 Yes (Entropy = 0.592)
Information Gain = 0.940 - ((7/14) * 0.985 + (7/14) * 0.592) = 0.152
**Wind**:
- Weak: 2 No, 6 Yes (Entropy = 0.811)
- Strong: 3 No, 3 Yes (Entropy = 1)
Information Gain = 0.940 - ((8/14) * 0.811 + (6/14) * 1) = 0.048
The feature with the highest information gain is Outlook, so we'll create the root node based on that.
**Step 1: Root Node (Outlook)**
```
Root Node (Outlook)
/ | \
Sunny Overcast Rain
Entropy: 0.971 Entropy: 0 Entropy: 0.971
5 instances 4 instances 5 instances
```
Now, we'll continue building the tree by recursively splitting the child nodes based on the feature with the highest information gain within each subset.
**Step 2: Splitting the Sunny and Rain Nodes**
For the Sunny node (D1, D2, D8, D9, D11 — 3 No, 2 Yes; entropy 0.971):
- Humidity:
    - High: 3 No, 0 Yes (Entropy = 0)
    - Normal: 0 No, 2 Yes (Entropy = 0)
 Information Gain = 0.971
- Temperature:
    - Hot: 2 No, 0 Yes (Entropy = 0)
    - Mild: 1 No, 1 Yes (Entropy = 1)
    - Cool: 0 No, 1 Yes (Entropy = 0)
 Information Gain = 0.971 - (2/5) * 1 = 0.571
- Wind:
    - Weak: 2 No, 1 Yes (Entropy = 0.918)
    - Strong: 1 No, 1 Yes (Entropy = 1)
 Information Gain = 0.971 - ((3/5) * 0.918 + (2/5) * 1) = 0.020
The highest information gain is achieved by splitting on Humidity, so we'll create child nodes for Sunny based on Humidity.
For the Rain node (D4, D5, D6, D10, D14 — 2 No, 3 Yes; entropy 0.971):
- Wind:
    - Weak: 0 No, 3 Yes (Entropy = 0)
    - Strong: 2 No, 0 Yes (Entropy = 0)
 Information Gain = 0.971
Splitting Rain on Wind yields two pure child nodes, so we'll create child nodes for Rain based on Wind.
**Step 3: Updated Decision Tree**
```
                 Root Node (Outlook)
                /        |          \
           Sunny      Overcast       Rain
        (Humidity)   Entropy: 0     (Wind)
          /    \     4 instances    /    \
       High   Normal             Weak   Strong
  Entropy: 0  Entropy: 0   Entropy: 0   Entropy: 0
 3 instances  2 instances  3 instances  2 instances
```
At this point, all leaf nodes are pure (entropy = 0), so we can stop the tree construction process.
**Step 4: Pruning the Decision Tree**
The decision tree we constructed in the previous steps is a complete tree that perfectly classifies the training data. However, this can lead to overfitting, meaning the tree may perform poorly on new, unseen data due to its complexity and memorization of noise in the training set.
To address this, we can prune the tree by removing some of the leaf nodes or branches that contribute little to the overall classification accuracy. Pruning helps to generalize the tree and improve its performance on unseen data.
There are various pruning techniques, such as:
1. **Pre-pruning**: Stopping the tree growth based on a pre-defined criterion (e.g., maximum depth, minimum instances in a node, etc.).
2. **Post-pruning**: Growing the tree to its full depth and then removing subtrees or branches based on a pruning criterion.
>In our tree, every leaf is already pure, so pruning would not change the classification of the training set. Pruning becomes important on larger, noisier datasets, where a fully grown tree tends to memorize noise.
**Step 5: Final Decision Tree**
```
                 Root Node (Outlook)
                /        |          \
           Sunny      Overcast       Rain
        (Humidity)      Yes         (Wind)
          /    \                    /    \
       High   Normal             Weak   Strong
        No     Yes                Yes     No
```
**Step 6: Visualizing the Decision Tree**
Decision trees can be visualized graphically to provide a clear representation of the hierarchical structure and the decision rules. This visualization can aid in understanding the tree's logic and interpreting the results.
There are various tools and libraries available for visualizing decision trees. One popular library in Python is `graphviz`, which can create tree-like diagrams and visualizations.
Here's an example of how to visualize our final decision tree using `graphviz` in Python:
```python
import graphviz
from sklearn import tree
# Create a decision tree classifier
decision_tree_classifier = tree.DecisionTreeClassifier()
# Train the classifier on the dataset X and labels y
# (assumes X and y hold the PlayTennis table numerically encoded,
# e.g., via one-hot encoding of the categorical features)
decision_tree_classifier.fit(X, y)
# Visualize the decision tree
tree_dot_data = tree.export_graphviz(decision_tree_classifier, out_file=None,
feature_names=['Outlook', 'Temperature', 'Humidity', 'Wind'],
class_names=['No', 'Yes'], filled=True, rounded=True, special_characters=True)
# Create a graph from the DOT data
graph = graphviz.Source(tree_dot_data)
# Render and save the decision tree as an image file
graph.render("decision_tree")
```
```
Outlook
/ | \
Sunny Overcast Rain
/ | / \
Humidity Yes Wind Wind
/ \ / \
High Normal Weak Strong
No Yes Yes No
```
The final decision tree classifies instances based on the following rules:
- If Outlook is Overcast, PlayTennis is Yes
- If Outlook is Sunny and Humidity is High, PlayTennis is No
- If Outlook is Sunny and Humidity is Normal, PlayTennis is Yes
- If Outlook is Rain and Wind is Weak, PlayTennis is Yes
- If Outlook is Rain and Wind is Strong, PlayTennis is No
> Note that the calculated entropies and information gains may vary slightly depending on the specific implementation and rounding methods used.
@@ -0,0 +1,171 @@
# Regression
* Regression is a supervised machine learning technique which is used to predict continuous values.
> Now, Supervised learning is a category of machine learning that uses labeled datasets to train algorithms to predict outcomes and recognize patterns.
* Regression is a statistical method used to model the relationship between a dependent variable (often denoted as 'y') and one or more independent variables (often denoted as 'x'). The goal of regression analysis is to understand how the dependent variable changes as the independent variables change.
# Types Of Regression
1. Linear Regression
2. Polynomial Regression
3. Stepwise Regression
4. Decision Tree Regression
5. Random Forest Regression
6. Ridge Regression
7. Lasso Regression
8. ElasticNet Regression
9. Bayesian Linear Regression
10. Support Vector Regression
We'll start with Linear Regression.
# Linear Regression
* Linear regression is a fundamental statistical method used to model the relationship between a dependent variable (often denoted as Y) and one or more independent variables (often denoted as X). The relationship is assumed to be linear, meaning that changes in the independent variables are associated with changes in the dependent variable in a straight-line fashion.
The basic form of linear regression for a single independent variable is:
**Y = β₀ + β₁X + ε**
Where:
* Y is the dependent variable.
* X is the independent variable.
* β₀ is the intercept, representing the value of Y when X is zero.
* β₁ is the slope coefficient, representing the change in Y for a one-unit change in X.
* ε is the error term, representing the variability in Y that is not explained by the linear relationship with X.
# Basic Code of Linear Regression
* This line imports the numpy library, which is widely used for numerical operations in Python. We use np as an alias for numpy, making it easier to reference functions and objects from the library.
```
import numpy as np
```
* This line imports the LinearRegression class from the linear_model module of the scikit-learn library. scikit-learn is a powerful library for machine learning tasks in Python, and LinearRegression is a class it provides for linear regression.
```
from sklearn.linear_model import LinearRegression
```
* This line creates a NumPy array X containing the independent variable values. In this example, we have a simple one-dimensional array representing the independent variable. The reshape(-1, 1) method reshapes the array into a column vector, which is necessary for use with scikit-learn.
```
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
```
* This line creates a NumPy array Y containing the corresponding dependent variable values. These are the observed values of the dependent variable corresponding to the independent variable values in X.
```
Y = np.array([2, 4, 5, 8, 5])
```
* This line creates an instance of the LinearRegression class, which represents the linear regression model. We'll use this object to train the model and make predictions.
```
model = LinearRegression()
```
* This line fits the linear regression model to the data. The fit() method takes two arguments: the independent variable (X) and the dependent variable (Y). This method estimates the coefficients of the linear regression equation that best fit the given data.
```
model.fit(X, Y)
```
* These lines print out the intercept (beta_0) and coefficient (beta_1) of the linear regression model. model.intercept_ gives the intercept value, and model.coef_ gives an array of coefficients, where model.coef_[0] corresponds to the coefficient of the first independent variable (in this case, there's only one).
```
print("Intercept:", model.intercept_)
print("Coefficient:", model.coef_[0])
```
* These lines demonstrate how to use the trained model to make predictions for new data.
* We create a new NumPy array new_data containing the values of the independent variable for which we want to predict the dependent variable values.
* We then use the predict() method of the model to obtain the predictions for these new data points. Finally, we print out the predicted values.
```
new_data = np.array([[6], [7]])
predictions = model.predict(new_data)
print("Predictions:", predictions)
```
# Assumptions of Linear Regression
# Linearity:
* To assess the linearity assumption, we can visually inspect a scatter plot of the observed values versus the predicted values.
* If the relationship between them appears linear, it suggests that the linearity assumption is reasonable.
```
import matplotlib.pyplot as plt
predictions = model.predict(X)
plt.scatter(predictions,Y)
plt.xlabel("Predicted Values")
plt.ylabel("Observed Values")
plt.title("Linearity Check: Observed vs Predicted")
plt.show()
```
# Homoscedasticity:
* Homoscedasticity refers to the constant variance of the residuals across all levels of the independent variable(s). We can visually inspect a plot of residuals versus predicted values to check for homoscedasticity.
```
residuals = Y - predictions
plt.scatter(predictions, residuals)
plt.xlabel("Predicted Values")
plt.ylabel("Residuals")
plt.title("Homoscedasticity Check: Residuals vs Predicted Values")
plt.axhline(y=0, color='red', linestyle='--') # Add horizontal line at y=0
plt.show()
```
# Normality of Residuals:
* To assess the normality of residuals, we can visually inspect a histogram or a Q-Q plot of the residuals.
```
import seaborn as sns
sns.histplot(residuals, kde=True)
plt.xlabel("Residuals")
plt.ylabel("Frequency")
plt.title("Normality of Residuals: Histogram")
plt.show()
import scipy.stats as stats
stats.probplot(residuals, dist="norm", plot=plt)
plt.title("Normal Q-Q Plot")
plt.show()
```
# Metrics for Regression
# Mean Absolute Error (MAE)
* MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average of the absolute differences between predicted and actual values.
```
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(Y, predictions)
print(f"Mean Absolute Error (MAE): {mae}")
```
# Mean Squared Error (MSE)
* MSE measures the average of the squares of the errors. It gives more weight to larger errors, making it sensitive to outliers.
```
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(Y, predictions)
print(f"Mean Squared Error (MSE): {mse}")
```
# Root Mean Squared Error (RMSE)
* RMSE is the square root of the MSE. It provides an error metric that is in the same units as the dependent variable, making it more interpretable.
```
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error (RMSE): {rmse}")
```
# R-squared (Coefficient of Determination)
* R-squared measures the proportion of the variance in the dependent variable that is predictable from the independent variables. It ranges from 0 to 1, where 1 indicates a perfect fit.
```
from sklearn.metrics import r2_score
r2 = r2_score(Y, predictions)
print(f"R-squared (R^2): {r2}")
```
> In this tutorial, the sample dataset is used for learning purposes only.
@@ -0,0 +1,123 @@
# Binomial Distribution
## Introduction
The binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of independent Bernoulli trials, each with the same probability of success. It is commonly used in statistics and probability theory.
### Key Characteristics
- **Number of trials (n):** The number of independent experiments or trials.
- **Probability of success (p):** The probability of success on an individual trial.
- **Number of successes (k):** The number of successful outcomes in n trials.
The binomial distribution is defined by the probability mass function (PMF):
P(X = k) = (n choose k) p^k (1 - p)^(n - k)
where:
- (n choose k) is the binomial coefficient, calculated as n! / (k!(n-k)!).
## Properties of Binomial Distribution
- **Mean:** μ = np
- **Variance:** σ² = np(1 - p)
- **Standard Deviation:** σ = √(np(1 - p))
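These properties can be checked directly with a small sketch using only the standard library's `math.comb` (the parameters n = 10, p = 0.5 match the implementation below):
```python
from math import comb

n, p = 10, 0.5

# P(X = k) = (n choose k) * p^k * (1 - p)^(n - k)
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))             # n*p = 5.0
var = sum((k - mean)**2 * pk for k, pk in enumerate(pmf))  # n*p*(1-p) = 2.5
print(round(mean, 6), round(var, 6))  # 5.0 2.5
```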
## Python Implementation
Let's implement the binomial distribution using Python. We'll use the `scipy.stats` library to compute the binomial PMF and CDF, and `matplotlib` to visualize it.
### Step-by-Step Implementation
1. **Import necessary libraries:**
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom
```
2. **Define parameters:**
```python
# Number of trials
n = 10
# Probability of success
p = 0.5
# Number of successes
k = np.arange(0, n + 1)
```
3. **Compute the PMF:**
```python
pmf = binom.pmf(k, n, p)
```
4. **Plot the PMF:**
```python
plt.bar(k, pmf, color='blue')
plt.xlabel('Number of Successes')
plt.ylabel('Probability')
plt.title('Binomial Distribution PMF')
plt.show()
```
5. **Compute the CDF:**
```python
cdf = binom.cdf(k, n, p)
```
6. **Plot the CDF:**
```python
plt.plot(k, cdf, marker='o', linestyle='--', color='blue')
plt.xlabel('Number of Successes')
plt.ylabel('Cumulative Probability')
plt.title('Binomial Distribution CDF')
plt.grid(True)
plt.show()
```
### Complete Code
Here is the complete code for the binomial distribution implementation:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom
# Parameters
n = 10 # Number of trials
p = 0.5 # Probability of success
# Number of successes
k = np.arange(0, n + 1)
# Compute PMF
pmf = binom.pmf(k, n, p)
# Plot PMF
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.bar(k, pmf, color='blue')
plt.xlabel('Number of Successes')
plt.ylabel('Probability')
plt.title('Binomial Distribution PMF')
# Compute CDF
cdf = binom.cdf(k, n, p)
# Plot CDF
plt.subplot(1, 2, 2)
plt.plot(k, cdf, marker='o', linestyle='--', color='blue')
plt.xlabel('Number of Successes')
plt.ylabel('Cumulative Probability')
plt.title('Binomial Distribution CDF')
plt.grid(True)
plt.tight_layout()
plt.show()
```
@@ -0,0 +1,70 @@
## Confusion Matrix
A confusion matrix is a fundamental performance evaluation tool used in machine learning to assess the accuracy of a classification model. It is an N x N matrix, where N represents the number of target classes.
For binary classification, it results in a 2 x 2 matrix that outlines four key parameters:
1. True Positive (TP) - The predicted value matches the actual value, or the predicted class matches the actual class.
For example - the actual value was positive, and the model predicted a positive value.
2. True Negative (TN) - The predicted value matches the actual value, or the predicted class matches the actual class.
For example - the actual value was negative, and the model predicted a negative value.
3. False Positive (FP)/Type I Error - The predicted value was falsely predicted.
For example - the actual value was negative, but the model predicted a positive value.
4. False Negative (FN)/Type II Error - The predicted value was falsely predicted.
For example - the actual value was positive, but the model predicted a negative value.
The confusion matrix enables the calculation of various metrics like accuracy, precision, recall, F1-score, and specificity; a small computational sketch follows the list below.
1. Accuracy - It represents the proportion of correctly classified instances out of the total number of instances in the dataset.
2. Precision - It quantifies the accuracy of positive predictions made by the model.
3. Recall - It quantifies the ability of a model to correctly identify all positive instances in the dataset and is also known as sensitivity or true positive rate.
4. F1-Score - It is a single measure that combines precision and recall, offering a balanced evaluation of a classification model's effectiveness.
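These four metrics follow directly from the matrix counts. A minimal sketch, using the TP, FP, FN, and TN values from the binary case described above:
```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the standard metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Counts matching the Apple / Not Apple example below (Apple = positive)
print(classification_metrics(tp=5, fp=1, fn=1, tn=3))
# (0.8, 0.8333..., 0.8333..., 0.8333...)
```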
To implement the confusion matrix in Python, we can use the confusion_matrix() function from the sklearn.metrics module of the scikit-learn library.
The function returns a 2D array that represents the confusion matrix.
We can also visualize the confusion matrix using a heatmap.
```python
# Import necessary libraries
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns
import matplotlib.pyplot as plt
# Create the NumPy array for actual and predicted labels
actual = np.array(['Apple', 'Apple', 'Apple', 'Not Apple', 'Apple',
'Not Apple', 'Apple', 'Apple', 'Not Apple', 'Not Apple'])
predicted = np.array(['Apple', 'Not Apple', 'Apple', 'Not Apple', 'Apple',
'Apple', 'Apple', 'Apple', 'Not Apple', 'Not Apple'])
# Compute the confusion matrix
cm = confusion_matrix(actual,predicted)
# Plot the confusion matrix with the help of the seaborn heatmap
sns.heatmap(cm,
annot=True,
fmt='g',
xticklabels=['Apple', 'Not Apple'],
yticklabels=['Apple', 'Not Apple'])
plt.xlabel('Prediction', fontsize=13)
plt.ylabel('Actual', fontsize=13)
plt.title('Confusion Matrix', fontsize=17)
plt.show()
# Classification report based on the confusion matrix
print(classification_report(actual, predicted))
```
### Results
```
1. Confusion Matrix:
[[5 1]
[1 3]]
2. Classification Report:
precision recall f1-score support
Apple 0.83 0.83 0.83 6
Not Apple 0.75 0.75 0.75 4
accuracy 0.80 10
macro avg 0.79 0.79 0.79 10
weighted avg 0.80 0.80 0.80 10
```
@@ -1,3 +1,9 @@
# List of sections
- [Section title](filename.md)
- [Binomial Distribution](binomial_distribution.md)
- [Regression in Machine Learning](Regression.md)
- [Confusion Matrix](confusion-matrix.md)
- [Decision Tree Learning](Decision-Tree.md)
- [Support Vector Machine Algorithm](support-vector-machine.md)
- [Artificial Neural Network from the Ground Up](ArtificialNeuralNetwork.md)
- [TensorFlow](tensorFlow.md)
@@ -0,0 +1,62 @@
## Support Vector Machine
Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is used for Classification as well as Regression problems. However, primarily, it is used for Classification problems in Machine Learning.
SVM can be of two types -
1. Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes with a single straight line, it is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
2. Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified with a straight line, it is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
Working of SVM - The goal of SVM is to find a hyperplane that separates the data points into different classes. A hyperplane is a line in 2D space, a plane in 3D space, or a higher-dimensional surface in n-dimensional space. The hyperplane is chosen in such a way that it maximizes the margin, which is the distance between the hyperplane and the closest data points of each class. The closest data points are called the support vectors.
The distance between the hyperplane and a data point "x" can be calculated using the formula
```
distance = (w . x + b) / ||w||
```
where "w" is the weight vector, "b" is the bias term, and "||w||" is the Euclidean norm of the weight vector. The weight vector "w" is perpendicular to the hyperplane and determines its orientation, while the bias term "b" determines its position.
The optimal hyperplane is found by solving an optimization problem: maximize the margin subject to the constraint that all data points are correctly classified. In other words, we want the hyperplane that maximizes the margin between the two classes while ensuring that no data point is misclassified. This is a convex optimization problem that can be solved using quadratic programming. If the data points are not linearly separable, we can use a technique called the kernel trick to map the data points into a higher-dimensional space where they become separable. The kernel function computes the inner product between the mapped data points without computing the mapping itself. This allows us to work with the data points in the higher-dimensional space without incurring the computational cost of mapping them.
1. Hyperplane:
There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find out the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM.
The dimensions of the hyperplane depend on the features present in the dataset, which means if there are 2 features, then hyperplane will be a straight line. And if there are 3 features, then hyperplane will be a 2-dimension plane. We always create a hyperplane that has a maximum margin, which means the maximum distance between the data points.
2. Support Vectors:
The data points or vectors that are the closest to the hyperplane and which affect the position of the hyperplane are termed as Support Vector. Since these vectors support the hyperplane, hence called a Support vector.
3. Margin:
It may be defined as the gap between the two boundary lines through the closest data points of different classes. It can be calculated as the perpendicular distance from the hyperplane to the support vectors. A large margin is considered a good margin, and a small margin is considered a bad margin.
We will use the famous Iris dataset, which contains the sepal length, sepal width, petal length, and petal width of three species of iris flowers: Iris setosa, Iris versicolor, and Iris virginica. The goal is to classify the flowers into their respective species based on these four features. We load the iris dataset using load_iris and split the data into training and testing sets using train_test_split. We use a test size of 0.2, which means that 20% of the data will be used for testing and 80% for training. We set the random state to 42 to ensure reproducibility of the results.
### Implementation of SVM in Python
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# load the iris dataset
iris = load_iris()
# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data,
iris.target, test_size=0.2, random_state=42)
# create an SVM classifier with a linear kernel
svm = SVC(kernel='linear')
# train the SVM classifier on the training set
svm.fit(X_train, y_train)
# make predictions on the testing set
y_pred = svm.predict(X_test)
# calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```
#### Output
```
Accuracy: 1.0
```
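For the non-linear case described earlier, only the kernel argument changes. A hedged sketch using scikit-learn's `make_moons` toy dataset, which no straight line can separate:
```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# A two-class dataset that is not linearly separable
X, y = make_moons(n_samples=200, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The RBF kernel maps points into a higher-dimensional space implicitly
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred = svm_rbf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```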
@@ -0,0 +1,64 @@
# TensorFlow
Developed by the Google Brain team, TensorFlow is an open-source library that provides a comprehensive ecosystem for building and deploying machine learning models. It supports deep learning and neural networks and offers tools for both beginners and experts.
## Key Features
- **Flexible and comprehensive ecosystem**
- **Scalable for both production and research**
- **Supports CPUs, GPUs, and TPUs**
## Basic Example: Linear Regression
Let's start with a simple linear regression example in TensorFlow.
```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Generate synthetic data
X = np.array([1, 2, 3, 4, 5], dtype=np.float32)
Y = np.array([2, 4, 6, 8, 10], dtype=np.float32)
# Define the model
model = tf.keras.Sequential([
tf.keras.layers.Dense(units=1, input_shape=[1])
])
# Compile the model
model.compile(optimizer='sgd', loss='mean_squared_error')
# Train the model
history = model.fit(X, Y, epochs=500)
# Predict
predictions = model.predict(X)
# Plot the results
plt.plot(X, Y, 'ro', label='Original data')
plt.plot(X, predictions, 'b-', label='Fitted line')
plt.legend()
plt.show()
```
In this example:
1. We define a simple dataset with a linear relationship.
2. We build a sequential model with one dense layer (linear regression).
3. We compile the model with stochastic gradient descent (SGD) optimizer and mean squared error loss.
4. We train the model for 500 epochs and then plot the original data and the fitted line.
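After training, the same model can extrapolate to unseen inputs; a small follow-up sketch (the test value 6.0 is arbitrary):
```python
# The fitted line should predict a value close to 12 for x = 6
new_X = np.array([[6.0]], dtype=np.float32)
print(model.predict(new_X))
```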
## When to Use TensorFlow
TensorFlow is a great choice if you:
- **Need to deploy machine learning models in production:** TensorFlow's robust deployment options, including TensorFlow Serving, TensorFlow Lite, and TensorFlow.js, make it ideal for production environments.
- **Work on large-scale deep learning projects:** TensorFlow's comprehensive ecosystem supports distributed training and has tools like TensorBoard for visualization.
- **Require high performance and scalability:** TensorFlow is optimized for performance and can leverage GPUs and TPUs for accelerated computing.
- **Want extensive support and documentation:** TensorFlow has a large community and extensive documentation, which can be very helpful for both beginners and advanced users.
## Example Use Cases
- Building and deploying complex neural networks for image recognition, natural language processing, or recommendation systems.
- Developing models that need to be run on mobile or embedded devices.
@@ -0,0 +1,84 @@
# Rock Paper Scissors Game
This is a simple implementation of the classic rock-paper-scissors game in Python.
## Code Explanation:
In this section, we import the required libraries (`tkinter` for GUI and `random` for generating computer choices) and define two functions:
- `determine_winner(user_choice, computer_choice)`:
- This function determines the winner of the game based on the choices made by the user and the computer.
- It returns a tuple containing the result of the game and the computer's choice.
- `play_game()`:
- This function handles the gameplay logic.
- It gets the user's choice from the radio buttons, generates a random choice for the computer, determines the winner using the `determine_winner()` function, and updates the result and computer pick labels accordingly.
### Imports and Function Definitions:
```python
import tkinter as tk
import random
def determine_winner(user_choice, computer_choice):
"""Determine the winner of the game."""
if user_choice == computer_choice:
return "It's a tie!", computer_choice
elif (user_choice == "rock" and computer_choice == "scissors") or \
(user_choice == "paper" and computer_choice == "rock") or \
(user_choice == "scissors" and computer_choice == "paper"):
return "You win!", computer_choice
else:
return "Computer wins!", computer_choice
def play_game():
"""Play the game and display the result."""
user_choice = user_var.get()
computer_choice = random.choice(["rock", "paper", "scissors"])
result, computer_pick = determine_winner(user_choice, computer_choice)
result_label.config(text=result)
computer_label.config(text=f"Computer picked: {computer_pick}")
```
### GUI Setup:
```python
# Create main window
root = tk.Tk()
root.title("Rock Paper Scissors")
# User choice options
user_var = tk.StringVar()
user_var.set("rock") # Default choice
choices = ["rock", "paper", "scissors"]
for choice in choices:
rb = tk.Radiobutton(root, text=choice, variable=user_var, value=choice)
rb.pack()
```
- Here, we create the main window for the game using `tkinter.Tk()`. We set the title to "Rock Paper Scissors".
- We define a `StringVar` to store the user's choice and set the default choice to "rock".
- We create radio buttons for the user to choose from ("rock", "paper", "scissors") and pack them into the main window.
### Play Button and Result Labels:
```python
# Play button
play_button = tk.Button(root, text="Play", command=play_game)
play_button.pack()
# Result label
result_label = tk.Label(root, text="", font=("Helvetica", 16))
result_label.pack()
# Computer pick label
computer_label = tk.Label(root, text="", font=("Helvetica", 12))
computer_label.pack()
```
- We create a "Play" button that triggers the `play_game()` function when clicked, using `tkinter.Button`.
- We create two labels to display the result of the game (`result_label`) and the computer's choice (`computer_label`). Both labels initially display no text and are packed into the main window.
### Mainloop:
```python
root.mainloop()
```
- Finally, we start the Tkinter event loop using `root.mainloop()`, which keeps the GUI window open and responsive until the user closes it.

## Dice Roller
The aim of this project is to replicate a dice and generate a random number from the numbers 1 to 6.
For this first we will import the random library which will help make random choices.
```python
import random
def dice():
dice_no = random.choice([1,2,3,4,5,6])
return "You got " + str(dice_no)
```
The above snippet of code defines a function called `dice()` which makes the random choice and returns the number that is generated.
```python
def roll_dice():
print("Hey Guys, you will now roll a single dice using Python!")
while True:
start=input("Type \'k\' to roll the dice: ").lower()
if start != 'k':
print("Invalid input. Please try again.")
continue
print(dice())
roll_again = input("Do you want to reroll? (Yes/No): ").lower()
if roll_again != 'yes':
break
print("Thanks for rolling the dice.")
roll_dice()
```
The above code defines a function called `roll_dice()` which interacts with the user.
It prompts the user for input: if the input is `k`, the code proceeds to generate a random number; otherwise it prints an invalid-input message and asks the user to try again.
After the dice has been rolled once, the function asks whether the user wants a reroll in the form of a `yes`/`no` question. The dice is rolled again if the user answers `yes`, and the program exits if the user replies with anything other than yes.

# Hangman - Movies Edition
The Hangman game script is a simple Python program designed to let players guess movie titles. It starts by importing the random module to select a movie from a predefined list. The game displays the movie title as underscores and reveals correctly guessed letters. Players have six attempts to guess the entire title, entering one letter at a time. The script checks if the input is valid, updates the list of guessed letters, and adjusts the number of attempts based on the correctness of the guess. The game continues until the player either guesses the title correctly or runs out of attempts. Upon completion, it congratulates the player for a correct guess or reveals the movie title if the attempts are exhausted. The main execution block ensures the game runs only when the script is executed directly. Below is the code, followed by an explanation of its components.
## Code
```python
import random
def choose_movie():
movies = ['avatar', 'titanic', 'inception', 'jurassicpark', 'thegodfather', 'forrestgump', 'interstellar', 'pulpfiction', 'shawshank']
return random.choice(movies)
def display_word(movie, guessed_letters):
display = ""
for letter in movie:
if letter in guessed_letters:
display += letter + " "
else:
display += "_ "
return display
def hangman_movies():
movie = choose_movie()
guessed_letters = []
attempts = 6
print("Welcome to Hangman - Movies Edition!")
print("Try to guess the name of the movie. You have 6 attempts.")
while attempts > 0:
print("\n" + display_word(movie, guessed_letters))
guess = input("Guess a letter: ").lower()
if len(guess) != 1 or not guess.isalpha():
print("Please enter a single letter.")
continue
if guess in guessed_letters:
print("You've already guessed that letter.")
continue
guessed_letters.append(guess)
if guess not in movie:
attempts -= 1
print(f"Sorry, '{guess}' is not in the movie name. You have {attempts} attempts left.")
else:
print(f"Good guess! '{guess}' is in the movie name.")
if "_" not in display_word(movie, guessed_letters):
print(f"\nCongratulations! You guessed the movie '{movie.capitalize()}' correctly!")
break
if attempts == 0:
print(f"\nSorry, you ran out of attempts. The movie was '{movie.capitalize()}'.")
if __name__ == "__main__":
hangman_movies()
```
## Code Explanation
### Importing the Random Module
```python
import random
```
The `random` module is imported to use the `choice` function, which will help in selecting a random movie from a predefined list.
### Choosing a Movie
```python
def choose_movie():
movies = ['avatar', 'titanic', 'inception', 'jurassicpark', 'thegodfather', 'forrestgump', 'interstellar', 'pulpfiction', 'shawshank']
return random.choice(movies)
```
The `choose_movie` function returns a random movie title from the `movies` list.
### Displaying the Word
```python
def display_word(movie, guessed_letters):
display = ""
for letter in movie:
if letter in guessed_letters:
display += letter + " "
else:
display += "_ "
return display
```
The `display_word` function takes the movie title and a list of guessed letters as arguments. It constructs a string where correctly guessed letters are shown in their positions, and unknown letters are represented by underscores (`_`).
### Hangman Game Logic
```python
def hangman_movies():
movie = choose_movie()
guessed_letters = []
attempts = 6
print("Welcome to Hangman - Movies Edition!")
print("Try to guess the name of the movie. You have 6 attempts.")
while attempts > 0:
print("\n" + display_word(movie, guessed_letters))
guess = input("Guess a letter: ").lower()
if len(guess) != 1 or not guess.isalpha():
print("Please enter a single letter.")
continue
if guess in guessed_letters:
print("You've already guessed that letter.")
continue
guessed_letters.append(guess)
if guess not in movie:
attempts -= 1
print(f"Sorry, '{guess}' is not in the movie name. You have {attempts} attempts left.")
else:
print(f"Good guess! '{guess}' is in the movie name.")
if "_" not in display_word(movie, guessed_letters):
print(f"\nCongratulations! You guessed the movie '{movie.capitalize()}' correctly!")
break
if attempts == 0:
print(f"\nSorry, you ran out of attempts. The movie was '{movie.capitalize()}'.")
```
The `hangman_movies` function manages the game's flow:
1. It selects a random movie title using `choose_movie`.
2. Initializes an empty list `guessed_letters` and sets the number of attempts to 6.
3. Prints a welcome message and the initial game state.
4. Enters a loop that continues until the player runs out of attempts or guesses the movie title.
5. Displays the current state of the movie title with guessed letters revealed.
6. Prompts the player to guess a letter.
7. Validates the player's input:
- Ensures it is a single alphabetic character.
- Checks if the letter has already been guessed.
8. Adds the guessed letter to `guessed_letters`.
9. Updates the number of attempts if the guessed letter is not in the movie title.
10. Congratulates the player if they guess the movie correctly.
11. Informs the player of the correct movie title if they run out of attempts.
### Main Execution Block
```python
if __name__ == "__main__":
hangman_movies()
```
This block ensures that the game runs only when the script is executed directly, not when it is imported as a module.
## Output Screenshots:
![image](https://github.com/Aditi22Bansal/learn-python/assets/142652964/a7af1f7e-c80e-4f83-b1f7-c7c5c72158b4)
![image](https://github.com/Aditi22Bansal/learn-python/assets/142652964/082e54dc-ce68-48fd-85da-3252d7629df8)
## Conclusion
This script provides a simple yet entertaining Hangman game focused on guessing movie titles. It demonstrates the use of functions, loops, conditionals, and user input handling in Python.

# List of sections
- [Section title](filename.md)
- [Dice Roller](dice_roller.md)
- [Rock Paper Scissors Game](Rock_Paper_Scissors_Game.md)
- [Password strength checker](password_strength_checker.md)
- [Path Finder](path-finder.md)
- [Hangman Game Based on Movies](hangman_game.md)
- [Tic-tac-toe](tic-tac-toe.md)

# About Password Strength
> This code is a simple password strength checker.
It evaluates the strength of a user's password based on the presence of
uppercase letters, lowercase letters, digits, spaces, and special characters.
### About the code:
- The codebase is broken down into two files: `password_strength_checker.py` and `main.py`.
`password_strength_checker.py` defines a function that evaluates password strength based on character types (uppercase, lowercase, digits, spaces, special characters) and provides feedback on its security,
while `main.py` contains the basic driver code.
```python
import string
class password_checker:
def __init__(self, password):
self.password = password
def check_password_strength(self):
"""This function prompts the user to enter a password and then evaluates its strength."""
password_strength = 0
upper_count = 0
lower_count = 0
num_count = 0
space_count = 0
specialcharacter_count = 0
review = ""
        for char in self.password:
if char in string.ascii_uppercase:
upper_count += 1
elif char in string.ascii_lowercase:
lower_count += 1
elif char in string.digits:
num_count += 1
elif char == " ":
space_count += 1
else:
specialcharacter_count += 1
if upper_count >= 1:
password_strength += 1
if lower_count >= 1:
password_strength += 1
if num_count >= 1:
password_strength += 1
if space_count >= 1:
password_strength += 1
if specialcharacter_count >= 1:
password_strength += 1
if password_strength == 1:
review = "That's a very easy password, Not good for use"
elif password_strength == 2:
review = (
"That's a weak password, You should change it to some strong password."
)
elif password_strength == 3:
review = "Your password is just okay, you may change it."
elif password_strength == 4:
review = "Your password is hard to guess."
elif password_strength == 5:
review = "Its the strong password, No one can guess this password "
        about_password = {
            "uppercase_letters": upper_count,
            "lowercase_letters": lower_count,
            "num_count": num_count,
            "space_count": space_count,
            "specialcharacter_count": specialcharacter_count,
            "password_strength": password_strength,
            "about_password_strength": review,
        }
print(about_password)
def check_password():
"""This function prompts the user to decide if they want to check their password strength."""
choice = input("Do you want to check your password's strength? (Y/N): ")
if choice.upper() == "Y":
return True
elif choice.upper() == "N":
return False
else:
print("Invalid input. Please enter 'Y' for Yes or 'N' for No.")
return password_checker.check_password()
```
### Here's the implementation of 'main.py'
```python
from password_strength_checker import password_checker
while password_checker.check_password():
password = input("Enter your password: ")
p = password_checker(password)
p.check_password_strength()
```

# Path Finder
This Python script uses the curses library to visualize the process of finding a path through a maze in real-time within a terminal window. The program represents the maze as a list of lists, where each list represents a row in the maze, and each string element in the lists represents a cell in the maze. The maze includes walls (#), a start point (O), and an end point (X), with empty spaces ( ) that can be traversed.
## The script includes the following main components:
- Visualization Functions: <br>
print_maze(maze, stdscr, path=[]): This function is used to display the maze in the terminal. It utilizes color pairs to distinguish between the maze walls, the path, and unexplored spaces. The current path being explored is displayed with a different color to make it stand out.
- Utility Functions: <br>
find_start(maze, start): This function searches the maze for the starting point (marked as O) and returns its position as a tuple (row, col). <br>
find_neighbors(maze, row, col): This function identifies the valid adjacent cells (up, down, left, right) that can be moved to from the current position, ignoring any walls or out-of-bounds positions.
- Pathfinding Logic: <br>
find_path(maze, stdscr): This function implements a Breadth-First Search (BFS) algorithm to find a path from the start point to the end point (X). It uses a
queue to explore each possible path sequentially. As it explores the maze, it updates the display in real-time, allowing the viewer to follow the progress
visually. Each visited position is marked and not revisited, ensuring the algorithm efficiently covers all possible paths without repetition.
Overall, the script demonstrates an effective use of the curses library to create a dynamic visual representation of the BFS algorithm solving a maze, providing both an educational tool for understanding pathfinding and an example of real-time data visualization in a terminal.
#### Below is the code of the path finder
```python
import curses
from curses import wrapper
import queue
import time
# Define the structure of the maze as a list of lists where each inner list represents a row.
maze = [
["#", "O", "#", "#", "#", "#", "#", "#", "#"],
["#", " ", " ", " ", " ", " ", " ", " ", "#"],
["#", " ", "#", "#", " ", "#", "#", " ", "#"],
["#", " ", "#", " ", " ", " ", "#", " ", "#"],
["#", " ", "#", " ", "#", " ", "#", " ", "#"],
["#", " ", "#", " ", "#", " ", "#", " ", "#"],
["#", " ", "#", " ", "#", " ", "#", "#", "#"],
["#", " ", " ", " ", " ", " ", " ", " ", "#"],
["#", "#", "#", "#", "#", "#", "#", "X", "#"]
]
# Function to print the current state of the maze in the terminal.
def print_maze(maze, stdscr, path=[]):
BLUE = curses.color_pair(1) # Color pair for walls and free paths
RED = curses.color_pair(2) # Color pair for the current path
for i, row in enumerate(maze):
for j, value in enumerate(row):
if (i, j) in path:
stdscr.addstr(i, j*2, "X", RED) # Print path character with red color
else:
stdscr.addstr(i, j*2, value, BLUE) # Print walls and free paths with blue color
# Function to locate the starting point (marked 'O') in the maze.
def find_start(maze, start):
for i, row in enumerate(maze):
for j, value in enumerate(row):
if value == start:
return i, j
return None
# Function to find a path from start ('O') to end ('X') using BFS.
def find_path(maze, stdscr):
start = "O"
end = "X"
start_pos = find_start(maze, start) # Get the start position
q = queue.Queue()
q.put((start_pos, [start_pos])) # Initialize the queue with the start position
visited = set() # Set to keep track of visited positions
while not q.empty():
current_pos, path = q.get() # Get the current position and path
row, col = current_pos
stdscr.clear() # Clear the screen
print_maze(maze, stdscr, path) # Print the current state of the maze
time.sleep(0.2) # Delay for visibility
stdscr.refresh() # Refresh the screen
if maze[row][col] == end: # Check if the current position is the end
return path # Return the path if end is reached
# Get neighbors (up, down, left, right) that are not walls
neighbors = find_neighbors(maze, row, col)
for neighbor in neighbors:
if neighbor not in visited:
r, c = neighbor
if maze[r][c] != "#":
new_path = path + [neighbor]
q.put((neighbor, new_path))
visited.add(neighbor)
# Function to find the valid neighboring cells (not walls or out of bounds).
def find_neighbors(maze, row, col):
neighbors = []
if row > 0: # UP
neighbors.append((row - 1, col))
if row + 1 < len(maze): # DOWN
neighbors.append((row + 1, col))
if col > 0: # LEFT
neighbors.append((row, col - 1))
if col + 1 < len(maze[0]): # RIGHT
neighbors.append((row, col + 1))
return neighbors
# Main function to setup curses and run the pathfinding algorithm.
def main(stdscr):
curses.init_pair(1, curses.COLOR_BLUE, curses.COLOR_BLACK) # Initialize color pair for blue
curses.init_pair(2, curses.COLOR_RED, curses.COLOR_BLACK) # Initialize color pair for red
find_path(maze, stdscr) # Find the path using BFS
stdscr.getch() # Wait for a key press before exiting
wrapper(main) # Use the wrapper to initialize and finalize curses automatically.
```

# Tic Tac Toe Game
## Overview
### Objective
- Get three of your symbols (X or O) in a row (horizontally, vertically, or diagonally) on a 3x3 grid.
### Gameplay
- Two players take turns.
- Player 1 uses X, Player 2 uses O.
- Players mark an empty square in each turn.
### Winning
- The first player to align three of their symbols wins.
- If all squares are filled without any player aligning three symbols, the game is a draw.
```python
print("this game should be played by two people player1 takes x player2 takes o")
board = [['1','2','3'],['4','5','6'],['7','8','9']]
x = 'X'
o = 'O'
def displayBoard():
print(f" {board[0][0]} | {board[0][1]} | {board[0][2]}")
print("----------------------------------------")
print(f" {board[1][0]} | {board[1][1]} | {board[1][2]}")
print("----------------------------------------")
print(f" {board[2][0]} | {board[2][1]} | {board[2][2]}")
print("----------------------------------------")
def updateBoard(character,position):
row = (position-1)//3
column = (position-1)%3
board[row][column] = character
def check_win():
for i in range(3):
if board[i][0] == board[i][1] == board[i][2]:
return 1
elif board[0][i] == board[1][i] == board[2][i]:
return 1
if board[0][2] == board[1][1] == board[2][0]:
return 1
elif board[0][0] == board[1][1] == board[2][2]:
return 1
return 0
def check_position(position):
row = (position-1)//3
column = (position-1)%3
if board[row][column] == x or board[row][column] == o:
return 0
return 1
print("==============================welcome to tic tac toe game =====================")
counter = 0
while 1:
if counter % 2 == 0:
displayBoard()
while 1:
choice = int(input(f"player{(counter%2)+1},enter your position('{x}');"))
if choice < 1 or choice > 9:
print("invalid input oplease try againn")
if check_position(choice):
updateBoard(x,choice)
if check_win():
print(f"Congratulations !!!!!!!!!!!Player {(counter % 2)+1} won !!!!!!!!!!")
exit(0)
else :
counter += 1
break
else:
print(f"position{choice} is already occupied.Choose another position")
if counter == 9:
print("the match ended with draw better luck next time")
exit(0)
else:
displayBoard()
while 1:
choice = int(input(f"player{(counter%2)+1},enter your position('{o}'):"))
if choice < 1 or choice > 9:
print("invalid input please try again")
if check_position(choice):
updateBoard(o,choice)
if check_win():
print(f"congratulations !!!!!!!!!!!!!!! player{(counter%2)+1} won !!!!!!!!!!!!!1")
exit(0)
else:
counter += 1
break
else:
print(f"position {choice} is already occupied.choose another position")
print()
```

# Basic Mathematics
## What is a Matrix?
A matrix is a collection of numbers ordered in rows and columns. Here is one.
<table>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>6</td>
</tr>
<tr>
<td>7</td>
<td>8</td>
<td>9</td>
</tr>
</table>
A matrix is generally written within square brackets []. The dimensions of a matrix are represented as (number of rows x number of columns); the dimensions of the above matrix are 3x3.
Matrices are central to mathematical operations like addition and subtraction, especially as used in Pandas and NumPy. They can contain numbers, symbols, or expressions.
In order to refer to a particular element in the matrix we denote it by :
A<sub>ij</sub>
where i represents the ith row and j represents the jth column of the matrix.
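For instance, using a NumPy array (NumPy is introduced in the following sections), the element A<sub>23</sub> is the entry in row 2, column 3; since NumPy indexing is 0-based, it lives at index [1, 2]:
```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# A23 in 1-based mathematical notation is row 2, column 3,
# which is index [1, 2] in NumPy's 0-based indexing.
print(A[1, 2])  # 6
```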
## Scalars and Vectors
### Scalars
There exist specific cases of matrices that are also widely used.
A matrix with only one row and one column, i.e. containing a single element, is commonly referred to as a scalar.
The numbers ```[12] ; [-5] ; [0] ; [3.14]``` all represent scalars. Scalars have 0 dimensions.
### Vectors
Vectors are objects with 1 dimension. They sit somewhere between scalars and matrices. They can also be referred to as one dimensional matrices.
```[1 3 5]``` represents a vector with dimension 1x3.
A vector is the simplest linear algebraic object. A matrix can be referred to as a collection of vectors.
Vectors are broadly classified into 2 types:
- Row Vectors: of the form 1 x n, where n refers to the number of columns the vector has.
- Column Vectors: of the form m x 1, where m refers to the number of rows the vector has.
m and n are also called the lengths of the column and row vectors respectively; a small sketch of both forms follows.
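Both forms can be written as 2-D NumPy arrays:
```python
import numpy as np

row_vector = np.array([[1, 3, 5]])         # shape (1, 3): one row, three columns
column_vector = np.array([[1], [3], [5]])  # shape (3, 1): three rows, one column

print(row_vector.shape)     # (1, 3)
print(column_vector.shape)  # (3, 1)
```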
## Arrays in Python
To understand arrays, first let us start by declaring scalars, vectors and matrices in Python.
First we need to import NumPy. We import it as `np`: this alias provides better readability and namespace clarity, and aligns with community convention.
```python
import numpy as np
```
Next up, we declare a scalar s as,
```python
s = 5
```
Now we declare a vector,
```python
v = np.array([5,-2,4])
```
On printing v we get the following output,
```python
array([5,-2,4])
```
By default, a vector is declared as a **'row vector'**.
Finally, we declare matrices,
```python
m=np.array([[5,12,6],[-3,0,14]])
```
On printing m we get,
```python
array([[5,12,6],
[-3,0,14]])
```
> The type() function returns the data type of a given variable.
* type(s) returns **'int'**.
* type(v) returns **'numpy.ndarray'**, which represents an **n-dimensional array**; here it is a 1-dimensional array.
* type(m) also returns **'numpy.ndarray'**, since it is a 2-dimensional array.
These are some ways in which arrays are useful in Python.
> The shape attribute returns the shape of a given variable.
* m.shape returns (2,3) since we are dealing with a (2,3) matrix.
* v.shape returns (3,), indicating a 1-dimensional array that stores 3 elements in order.
* However, 'int' objects have no shape, so s.shape raises an error.
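A compact check of the type() function and the shape attribute on all three objects:
```python
import numpy as np

s = 5
v = np.array([5, -2, 4])
m = np.array([[5, 12, 6], [-3, 0, 14]])

print(type(s))           # <class 'int'>
print(type(v), v.shape)  # <class 'numpy.ndarray'> (3,)
print(type(m), m.shape)  # <class 'numpy.ndarray'> (2, 3)
```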
## What is a Tensor?
A Tensor can be thought of as a collection of matrices. It has dimensions k x m x n.
**NOTE:** Scalars, vectors and matrices are also tensors of rank 0,1,2 respectively.
Tensors can be stored in ndarrays.
Let's create a tensor with 2 matrices,
```python
m1=np.array([[5,12,6],[-3,0,14]])
m2=np.array([[2,1,8],[-6,2,0]])
t=np.array([m1,m2])
```
Upon printing t we get,
```python
array([[[5,12,6],
[-3,0,14]],
[[2,1,8],
[-6,2,0]]])
```
If we check its shape, we see that it is a **(2,2,3)** object.
If we want to manually create a tensor we write,
```python
t=np.array([[[5,12,6], [-3,0,14]],[[2,1,8], [-6,2,0]]])
```
## Addition and Subtraction in Matrices
### Addition
For 2 matrices to be added to one another they must have **same dimensions**.
If we have 2 matrices say,
```python
A=np.array([[5,12,6],[-3,0,14]])
B=np.array([[2,1,8],[-6,2,0]])
C= A+B
```
The element at position A<sub>ij</sub> gets added to the element at position B<sub>ij</sub>. It's that simple!
The above input will give the resultant C as:
```python
array([[7,13,14],
[-9,2,14]])
```
### Subtraction
Since subtraction is a form of addition, the same rules apply here.
If we have 2 matrices say,
```python
A=np.array([[5,12,6],[-3,0,14]])
B=np.array([[2,1,8],[-6,2,0]])
C= A-B
```
The element at position B<sub>ij</sub> gets subtracted from the element at position A<sub>ij</sub>.
The above input will give the resultant C as:
```python
array([[3,11,-2],
[3,-2,14]])
```
Similarly the same operations can be done with **floating point numbers** as well.
In a similar fashion, we can add or subtract vectors as well with the condition that they must be of the **same length**.
```python
A=np.array([1,2,3,4,5])
B=np.array([6,7,8,9,10])
C= A+B
```
The result is a vector of length 5 with C as,
```python
array([7,9,11,13,15])
```
### Addition of scalars with vectors & matrices
Scalars show unique behaviour when added to matrices or vectors.
To demonstrate their behaviour, let's use an example,
Let's declare a matrix,
```python
A=np.array([[5,12,6],[-3,0,14]])
A+1
```
We see that if we perform the above function, i.e. add scalar [1] to the matrix A we get the output,
```python
array([[6,13,7],[-2,1,15]])
```
We see that the scalar is added to the matrix elementwise, i.e. each element gets incremented by 1.
**The same applies to vectors as well.**
Mathematically this is not allowed, since the shape of a scalar differs from that of a vector or matrix, but NumPy makes it work by broadcasting the scalar across every element, as the quick check below shows.
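A minimal check, reusing the vector declared earlier:
```python
import numpy as np

v = np.array([5, -2, 4])
print(v + 1)  # [ 6 -1  5] – the scalar 1 is broadcast to every element
```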
## Transpose of Matrices & Vectors
### Transposing Vectors
If X is the vector, then the transpose of the vector is represented as X<sup>T</sup>. It changes a vector of dimension n x 1 into a vector of dimension 1 x n, i.e. a row vector to a column vector and vice versa.
> * The values are not changing or transforming; only their positions are.
> * Transposing the same vector (object) twice yields the initial vector (object).
```python
x = np.array([1,2,3])
```
Transposing this in python using ```x.T``` will give
```python
array([1, 2, 3])
```
which is the same vector as the one taken as input.
> 1-Dimensional arrays don't really get transposed (in the memory of the computer)
To transpose a vector, we need to reshape it first.
```python
x_new= x.reshape(1,3)
x_new.T
```
will now result in the vector getting transposed,
```python
array([[1],
[2],
[3]])
```
### Transposing Matrices
If M is a matrix, then the transpose of the matrix M is represented as M<sup>T</sup>. When transposed, a m x n matrix becomes a n x m matrix.
The element M<sub>ij</sub> of the initial matrix becomes the N<sub>ji</sub> where N is the transposed matrix of M.
Let's understand this further with the help of an example,
```python
A = np.array([[1,5,-6],[8,-2,0]])
```
The output for the above code snippet will be,
```python
array([[1,5,-6],
[8,-2,0]])
```
> **array.T** returns the transpose of an array (matrix).
```python
A.T
```
will give the output as,
```python
array([[1,8],
[5,-2],
[-6,0]])
```
Hope the above examples have clarified the concept of transposing.
## Dot Product
> **np.dot()** returns the dot product of two objects
> In this section, the dot product is represented by ( * ), i.e. x · y is written as x * y
>
### Scalar * Scalar
Let's start with scalar multiplication first.
```
[6] * [5] = [30]
[10] * [-2] = [-20]
```
It is the same multiplication that we have been familiar with since childhood.
Therefore, ```np.dot(6, 5)``` returns ```30```.
### Vector * Vector
To multiply vectors with one another, they must be of **same length**.
Now let's understand this with an example,
```python
x = np.array([2,8,-4])
y = np.array([1,-7,3])
```
The dot product multiplies the vectors elementwise and then sums the products, i.e.
x * y = ( x<sub>1</sub> * y<sub>1</sub> ) + ( x<sub>2</sub> * y<sub>2</sub> ) + ( x<sub>3</sub> * y<sub>3</sub> ) in the above example.
Therefore, ```np.dot(x,y)``` gives ```-66``` as the output.
We observe that **dot product of 2 vectors returns a scalar**.
### Scalar * Vector
When we multiply a scalar with a vector, each element of the vector gets multiplied by the scalar individually.
A scalar k multiplied by a vector v = [x1, x2, x3] gives the product k * v = [(k * x1), (k * x2), (k * x3)].
An example would bring further clarity,
```python
y = np.array([1,-7,3])
y*5
```
will give the following output
```python
array([  5, -35,  15])
```
We observe that **dot product of a vector and a scalar returns a vector**.
## Dot Product of Matrices
### Scalar * Matrix
The dot product of a scalar with a matrix works just like the dot product of a scalar with a vector:
each element of the matrix gets multiplied by the scalar individually. This is a very useful behaviour when working in Python.
```python
A = np.array([[1,5,-6],[8,-2,0]])
B = 3 * A
```
will give the resultant B as
```python
array([[3,15,-18],
[24,-6,0]])
```
Thus each element gets multiplied by 3.
> NOTE: The dot product of a scalar and a matrix gives a matrix of the same shape as the input matrix.
### Matrix * Matrix
A matrix can be multiplied by another matrix. However, there are certain compatibility requirements:
* We can only multiply an m x n matrix with an n x k matrix.
* In other words, the 2nd dimension of the first matrix has to match the 1st dimension of the 2nd matrix.
> The output of a m x n matrix with a n x k matrix gives a **m x k** matrix.
**Whenever we have a dot product of 2 matrices, we multiply row vectors within one matrix to the column vector of 2nd matrix.**
For example, let's multiply a row vector by a column vector to understand it further.
```
               [[1],
 ([2 8 4])  *   [2],   =  [(2*1) + (8*2) + (4*3)]  =  [30]
                [3]]
```
Now, let's multiply a 2 x 3 matrix with a 3 x 2 matrix.
```
 ([[A1,A2,A3],      ([[B1,B2],       ([[(A1*B1 + A2*B3 + A3*B5) , (A1*B2 + A2*B4 + A3*B6)],
   [A4,A5,A6]])  *    [B3,B4],    =    [(A4*B1 + A5*B3 + A6*B5) , (A4*B2 + A5*B4 + A6*B6)]])
                      [B5,B6]])
```
Thus we obtain a 2 x 2 matrix.
We use the np.dot() method to directly obtain the dot product of the 2 matrices.
Now let's do an example using python just to solidify our knowledge.
```python
A=np.array([[5,12,6],[-3,0,14]])
B=np.array([[2,-1],[8,0],[3,0]])
np.dot(A,B)
```
The output we obtain is,
```python
array([[124,  -5],
       [ 36,   3]])
```

# Numpy Data Types
In NumPy, data types play a crucial role in representing and manipulating numerical data.
Numpy supports the following data types:
- `i` - integer
- `b` - boolean
- `u` - unsigned integer
- `f` - float
- `c` - complex float
- `m` - timedelta
- `M` - datetime
- `O` - object
- `S` - string
- `U` - unicode string
_Referred from: W3schools_
## The dtype Attribute
The `dtype` attribute returns the data type of the elements of a NumPy array object.
Example 1
``` python
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr.dtype)
# Output: int64
```
Example 2
``` python
import numpy as np
arr = np.array(['apple', 'banana', 'cherry'])
print(arr.dtype)
# Output: <U6
```
## Example for integer type
The NumPy integer array can be defined in two ways.
Way 1: Using function `int_()`
``` python
import numpy as np
arr = np.int_([2,4,6])
# Size: int8, int16, int32, int64
print(arr.dtype)
# Output: int64
```
Way 2: Using `dtype()`
``` python
import numpy as np
arr = np.array([2,4,6], dtype='i4')
# Size: i1, i2, i4, i8
print(arr.dtype)
# Output: int32
```
Note: `np.intc()` has the same function as `int32()`.
## Example for float type
Way 1: Using function `float_()`
``` python
import numpy as np
arr = np.float_(1)
# Size: float16, float32, float64
print(arr)
print(arr.dtype)
# Output:
# 1.0
# float64
```
Way 2: Using `dtype()`
``` python
import numpy as np
arr = np.array([2,4,6], dtype='f4')
# Size: f1, f2, f4, f8
print(arr)
print(arr.dtype)
# Output:
# [2. 4. 6.]
# float32
```
Note: `np.single()` has the same function as `float32()`.
## Example for boolean type
``` python
import numpy as np
x = np.bool_(1)
print(x)
print(x.dtype)
# Output:
# True
# bool
```
## Example for unsigned integer type
``` python
import numpy as np
x = np.uintc(1)
print(x)
print(x.dtype)
# Output:
# 1
# uint32
```
## Example for complex type
Complex type is a combination of real number + imaginary number. The `complex_()` is used to define the complex type NumPy object.
``` python
import numpy as np
x = np.complex_(1)
# Size: complex64, complex128
print(x)
print(x.dtype)
# Output:
# (1+0j)
# complex128
```
## Example for datetime type
The `datetime64()` is used to define the date, month and year.
``` python
import numpy as np
x = np.datetime64('2024-05')
y = np.datetime64('2024-05-20')
z = np.datetime64('2024')
print(x,x.dtype)
print(y,y.dtype)
print(z,z.dtype)
# Output:
# 2024-05 datetime64[M]
# 2024-05-20 datetime64[D]
# 2024 datetime64[Y]
```
## Example for string type
``` python
import numpy as np
arr = np.str_("roopa")
print(arr.dtype)
# Output: <U5
```
## Example for object type
``` python
import numpy as np
arr = np.array([1, 2, 3, 4], dtype=object)
print(arr)
print(arr.dtype)
# Output:
# [1 2 3 4]
# object
```
## Example for unicode string type
``` python
import numpy as np
arr = np.array(['apple', 'banana', 'cherry'])
print(arr.dtype)
# Output: <U6
```
## Example for timedelta type
The `timedelta64()` used to find the difference between the `datetime64()`. The arguments for timedelta64 are a number, to represent the number of units, and a date/time unit, such as (D)ay, (M)onth, (Y)ear, (h)ours, (m)inutes, or (s)econds. The timedelta64 data type also accepts the string “NAT” in place of the number for a “Not A Time” value.
``` python
import numpy as np
x = np.datetime64('2024-05-20')
y = np.datetime64('2023-05-20')
res = x - y
print(res)
print(res.dtype)
# Output:
# 366 days
# timedelta64[D]
```
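The "NAT" value mentioned above can be sketched like this (output shown as comments, assuming a recent NumPy):
``` python
import numpy as np

nat = np.timedelta64('NAT')    # "Not A Time" placeholder duration
week = np.timedelta64(7, 'D')  # 7 units of (D)ays
print(nat, nat.dtype)          # NaT timedelta64
print(week, week.dtype)        # 7 days timedelta64[D]
```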
## Additional Data Type (`longdouble`)
`longdouble` is a data type that provides higher precision than the standard double-precision floating-point (`float64`) type.
``` python
import numpy as np
arr = np.longdouble([1.222222, 4.44, 45.55])
print(arr, arr.dtype)
# Output:
# [1.222222 4.44 45.55] float128
```
# Data Type Conversion
The `astype()` function is used to convert a NumPy object from one type to another.
It creates a copy of the array and allows us to specify the data type of our choice.
## Example 1
``` python
import numpy as np
x = np.array([1.2, 3.4, 5.6])
y = x.astype(int)
print(y,y.dtype)
# Output:
# [1 3 5] int64
```
## Example 2
``` python
import numpy as np
x = np.array([1, 3, 0])
y = x.astype(bool)
print(y,y.dtype)
# Output:
# [True True False] bool
```

# List of sections
- [Section title](filename.md)
- [Installing NumPy](installing-numpy.md)
- [Introduction](introduction.md)
- [NumPy Data Types](datatypes.md)
- [Basic Mathematics](basic_math.md)
- [Operations on Arrays in NumPy](operations-on-arrays.md)
- [Loading Arrays from Files](loading_arrays_from_files.md)
- [Saving Numpy Arrays into Files](saving_numpy_arrays_to_files.md)

# Installing NumPy
NumPy is the fundamental package for scientific computing in Python.
NumPy is used for working with arrays.
The only prerequisite for installing NumPy is Python itself.
#
**Step 1: Check if PIP is Installed**
Before installing NumPy, it's essential to ensure that PIP (Python Package Installer) is installed on your system. PIP is a package management system used to install and manage Python packages. You can verify if PIP is installed by running a simple command in your terminal or command prompt.
```bash
pip --version
```
If PIP is not currently installed on your system, you can install it by visiting the [pypi.org](https://pypi.org/project/pip/) webpage.
#
**Step 2: Installing PIP**
**get-pip.py**
This is a Python script that uses some bootstrapping logic to install pip. Download it from https://bootstrap.pypa.io/get-pip.py, then open a terminal / command prompt in the folder containing it and run:
**Linux**
```bash
python get-pip.py
```
**Windows**
```bash
py get-pip.py
```
**MacOS**
```bash
python get-pip.py
```
#
**Step 3: Installing NumPy**
NumPy can be installed either through conda or pip.
If you use pip, you can install NumPy with:
```bash
pip install numpy
```
If you use conda, you can install NumPy from the defaults or conda-forge channels:
```bash
# Best practice, use an environment rather than install in the base env
conda create -n my-env
conda activate my-env
```
```bash
# If you want to install from conda-forge
conda config --env --add channels conda-forge
```
```bash
# The actual install command
conda install numpy
```
You can find more information about how to install [NumPy](https://numpy.org/install/) on numpy.org.
#
**Step 4: Check if NumPy is Installed**
We can utilize the "pip show" command not only to display the version but also to determine whether NumPy is installed on the system.
```bash
pip show numpy
```

# Introduction
## What is NumPy?
NumPy is a powerful array-processing library in Python, essential for scientific computing. It provides efficient data structures and tools for working with multidimensional arrays.
## Key Features
1. **Efficient Arrays:** NumPy offers high-performance N-dimensional array objects for swift data manipulation.
2. **Broadcasting:** Advanced broadcasting enables seamless element-wise operations on arrays of varying shapes (see the sketch after this list).
3. **Interoperability:** NumPy seamlessly integrates with C, C++, and Fortran, enhancing performance and versatility.
4. **Mathematical Tools:** Comprehensive support for linear algebra, Fourier transforms, and random number generation.
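A minimal sketch of feature 2, broadcasting: NumPy stretches a column vector and a row vector to a common shape before adding them elementwise.
```python
import numpy as np

col = np.array([[1], [2], [3]])  # shape (3, 1)
row = np.array([10, 20, 30])     # shape (3,)

# Both operands are broadcast to shape (3, 3) before the addition.
print(col + row)
# [[11 21 31]
#  [12 22 32]
#  [13 23 33]]
```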
## Installation
Ensure Python is installed on your system; if not, you can install it from the [official Python website](https://www.python.org/). Then install NumPy via:
```bash
pip install numpy
```
## Importing NumPy
To access NumPy functions, import it with the alias `np`.
```python
import numpy as np
```
Using `np` as an alias enhances code readability and is a widely adopted convention.

# Loading Arrays From Files
Scientific computing and data analysis require the critical feature of being able to load data from different file formats. NumPy has several functionalities for reading data from various file types and converting them into arrays. This part of the content will show how one can load arrays from standard file formats.
## Here are the methods available:
### 1. numpy.loadtxt():
The `loadtxt` function allows you to load data from a text file. You can specify various parameters such as the file name, data type, delimiter,
and more. It reads the file line by line, splits it at the specified delimiter, and converts the values into an array.
- #### Syntax:
```python
numpy.loadtxt(fname, dtype = float, delimiter=None, converters=None, skiprows=0, usecols=None)
```
**fname** : Name of the file <br>
**dtype** : Data type of the resulting array. (By default is float) <br>
**delimiter**: String or character separating columns. (By default is whitespace) <br>
**converters**: Dictionary mapping column number to a function to convert that column's string to a float. <br>
**skiprows**: Number of lines to skip at the beginning of the file. <br>
**usecols**: Which columns to read starting from 0.
- #### Example for `loadtxt`:
**example.txt** <br>
![image](https://github.com/Santhosh-Siddhardha/learn-python/assets/103999924/a0148d29-5fba-45fa-b3f4-058406b3016b)
**Code** <br>
```python
import numpy as np
arr = np.loadtxt("example.txt", dtype=int)
print(arr)
```
**Output**<br>
```python
[1 2 3 4 5]
```
<br>
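The remaining parameters compose naturally. Here is a hedged sketch (the file `scores.csv` and its contents are hypothetical, not part of the example above) combining `delimiter`, `skiprows`, and `usecols`:
```python
import numpy as np

# Hypothetical scores.csv:
# name,math,physics
# A,90,85
# B,78,92
arr = np.loadtxt("scores.csv", dtype=int, delimiter=",",
                 skiprows=1,      # skip the header line
                 usecols=(1, 2))  # keep only the numeric columns
print(arr)
# [[90 85]
#  [78 92]]
```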
### 2. numpy.genfromtxt():
The `genfromtxt` function is similar to loadtxt but provides more flexibility. It handles missing values (such as NaNs), allows custom converters
for data parsing, and can handle different data types within the same file. It's particularly useful for handling complex data formats.
- #### Syntax:
```python
numpy.genfromtxt(fname, dtype=float, delimiter=None, converters=None, missing_values=None, filling_values=None, usecols=None)
```
**fname** : Name of the file <br>
**dtype** : Data type of the resulting array. (By default is float) <br>
**delimiter**: String or character separating columns; default is any whitespace. <br>
**converters**: Dictionary mapping column number to a function to convert that column's string to a float. <br>
**missing_values**: Set of strings corresponding to missing data.<br>
**filling_values**: Value used to fill in missing data. Default is NaN.<br>
**usecols**: Which columns to read starting from 0.
- #### Example for `genfromtxt`:
**example.txt** <br>
![image](https://github.com/Santhosh-Siddhardha/learn-python/assets/103999924/3f9cdd91-4255-4e30-923d-f29c5f237798)
**Code** <br>
```python
import numpy as np
arr = np.genfromtxt("example.txt", dtype='str', usecols=1)
print(arr)
```
**Output**<br>
```python
['Name' 'Kohli' 'Dhoni' 'Rohit']
```
<br>
### 3. numpy.load():
`load` method is used to load arrays saved in NumPy's native binary format (.npy or .npz). These files preserve the array structure, data types, and metadata.
It's an efficient way to store and load large arrays.
- #### Syntax:
```python
numpy.load(fname, mmap_mode=None, encoding='ASCII')
```
**fname** : Name of the file <br>
**mmap_mode** : Memory-map the file using the given mode (r, r+, w+, c)(By Default None).Memory-mapping only works with arrays stored in a binary file on disk, not with compressed archives like .npz.<br>
**encoding**:Encoding is used when reading Python2 strings only. (By Default ASCII) <br>
- #### Example for `load`:
**Code** <br>
```python
import numpy as np
arr = np.array(['a','b','c'])
np.savez('example.npz', array=arr) # stores arr in example.npz in NumPy's native binary format
data = np.load('example.npz')
print(data['array'])
```
**Output**<br>
```python
['a' 'b' 'c']
```
<br>
These methods empower users to seamlessly integrate data into their scientific workflows, whether from text files or binary formats.

# Operations on Arrays
## NumPy Arithmetic Operations
NumPy offers a broad array of operations for arrays, including arithmetic functions.
The arithmetic operations in NumPy are popular for their simplicity and efficiency in handling array calculations.
**Addition**
We can use the `+` operator to perform element-wise addition between two or more NumPy arrays.
**Code**
```python
import numpy as np
array_1 = np.array([9, 10, 11, 12])
array_2 = np.array([1, 3, 5, 7])
result_1 = array_1 + array_2
print("Utilizing the + operator:", result_1)
```
**Output:**
```
Utilizing the + operator: [10 13 16 19]
```
**Subtraction**
We can use the `-` operator to perform element-wise subtraction between two or more NumPy arrays.
**Code**
```python
import numpy as np
array_1 = np.array([9, 10, 11, 12])
array_2 = np.array([1, 3, 5, 7])
result_1 = array_1 - array_2
print("Utilizing the - operator:", result_1)
```
**Output:**
```
Utilizing the - operator: [8 7 6 5]
```
**Multiplication**
We can use the `*` operator to perform element-wise multiplication between two or more NumPy arrays.
**Code**
```python
import numpy as np
array_1 = np.array([9, 10, 11, 12])
array_2 = np.array([1, 3, 5, 7])
result_1 = array_1 * array_2
print("Utilizing the * operator:", result_1)
```
**Output:**
```
Utilizing the * operator: [9 30 55 84]
```
**Division**
We can use the `/` operator to perform element-wise division between two or more NumPy arrays.
**Code**
```python
import numpy as np
array_1 = np.array([9, 10, 11, 12])
array_2 = np.array([1, 3, 5, 7])
result_1 = array_1 / array_2
print("Utilizing the / operator:", result_1)
```
**Output:**
```
Utilizing the / operator: [9. 3.33333333 2.2 1.71428571]
```
**Exponentiation**
We can use the `**` operator to perform element-wise exponentiation between two or more NumPy arrays.
**Code**
```python
import numpy as np
array_1 = np.array([9, 10, 11, 12])
array_2 = np.array([1, 3, 5, 7])
result_1 = array_1 ** array_2
print("Utilizing the ** operator:", result_1)
```
**Output:**
```
Utilizing the ** operator: [9 1000 161051 35831808]
```
**Modulus**
We can use the `%` operator to perform element-wise modulus operations between two or more NumPy arrays.
**Code**
```python
import numpy as np
array_1 = np.array([9, 10, 11, 12])
array_2 = np.array([1, 3, 5, 7])
result_1 = array_1 % array_2
print("Utilizing the % operator:", result_1)
```
**Output:**
```
Utilizing the % operator: [0 1 1 5]
```
<br>
## NumPy Comparison Operations
<br>
NumPy provides various comparison operators that can compare elements across multiple NumPy arrays.
**less than operator**
The `<` operator returns `True` if the value of the operand on the left is less than the value of the operand on the right.
**Code**
```python
import numpy as np
array_1 = np.array([12,15,20])
array_2 = np.array([20,15,12])
result_1 = array_1 < array_2
print("array_1 < array_2:",result_1)
```
**Output:**
```
array_1 < array_2 : [True False False]
```
**less than or equal to operator**
The `<=` operator returns `True` if the value of the operand on the left is less than or equal to the value of the operand on the right.
**Code**
```python
import numpy as np
array_1 = np.array([12,15,20])
array_2 = np.array([20,15,12])
result_1 = array_1 <= array_2
print("array_1 <= array_2:",result_1)
```
**Output:**
```
array_1 <= array_2: [True True False]
```
**greater than operator**
The `>` operator returns `True` if the value of the operand on the left is greater than the value of the operand on the right.
**Code**
```python
import numpy as np
array_1 = np.array([12,15,20])
array_2 = np.array([20,15,12])
result_2 = array_1 > array_2
print("array_1 > array_2:",result_2)
```
**Output:**
```
array_1 > array_2 : [False False True]
```
**greater than or equal to operator**
The `>=` operator returns `True` if the value of the operand on the left is greater than or equal to the value of the operand on the right.
**Code**
```python
import numpy as np
array_1 = np.array([12,15,20])
array_2 = np.array([20,15,12])
result_2 = array_1 >= array_2
print("array_1 >= array_2:",result_2)
```
**Output:**
```
array_1 >= array_2: [False True True]
```
**equal to operator**
The `==` operator returns `True` if the value of the operand on the left is the same as the value of the operand on the right.
**Code**
```python
import numpy as np
array_1 = np.array([12,15,20])
array_2 = np.array([20,15,12])
result_3 = array_1 == array_2
print("array_1 == array_2:",result_3)
```
**Output:**
```
array_1 == array_2: [False True False]
```
**not equal to operator**
The `!=` operator returns `True` if the value of the operand on the left is not equal to the value of the operand on the right.
**Code**
```python
import numpy as np
array_1 = np.array([12,15,20])
array_2 = np.array([20,15,12])
result_3 = array_1 != array_2
print("array_1 != array_2:",result_3)
```
**Output:**
```
array_1 != array_2: [True False True]
```
<br>
## NumPy Logical Operations
Logical operators perform Boolean algebra, the branch of algebra that deals with `True` and `False` statements.
This section illustrates the logical operations AND, OR, and NOT using the np.logical_and(), np.logical_or(), and np.logical_not() functions, respectively.
**Logical AND**
Evaluates the element-wise truth value of `array_1` AND `array_2`
**Code**
```python
import numpy as np
array_1 = np.array([True, False, True])
array_2 = np.array([False, False, True])
print(np.logical_and(array_1, array_2))
```
**Output:**
```
[False False True]
```
**Logical OR**
Evaluates the element-wise truth value of `array_1` OR `array_2`
**Code**
```python
import numpy as np
array_1 = np.array([True, False, True])
array_2 = np.array([False, False, True])
print(np.logical_or(array_1, array_2))
```
**Output:**
```
[True False True]
```
**Logical NOT**
Evaluates the element-wise truth value of NOT `array_1` (np.logical_not takes a single array, so `array_2` is not used here)
**Code**
```python
import numpy as np
array_1 = np.array([True, False, True])
array_2 = np.array([False, False, True])
print(np.logical_not(array_1))
```
**Output:**
```
[False True False]
```

# Saving NumPy Arrays to Files
- Saving arrays in NumPy is important due to its efficiency in storage and speed, maintaining data integrity and precision, and offering convenience and interoperability.
- NumPy provides several methods to save arrays efficiently, either in binary or text formats.
- The primary methods are `save`, `savez`, and `savetxt`.
### 1. numpy.save():
The `np.save` function saves a single NumPy array to a binary file with a `.npy` extension. This format is efficient and preserves the array's data type and shape.
#### Syntax :
```python
numpy.save(file, arr, allow_pickle=True, fix_imports=True)
```
- **file** : Name of the file.
- **arr** : Array to be saved.
- **allow_pickle** : Optional parameter; allows saving object arrays using Python pickles. (Default: True)
- **fix_imports** : Optional parameter; fixes issues for Python 2 to Python 3 compatibility. (Default: True)
#### Example :
```python
import numpy as np
arr = np.array([1,2,3,4,5])
np.save("example.npy",arr) #saves arr into example.npy file in binary format
```
In order to load the array from example.npy:
```python
arr1 = np.load("example.npy")
print(arr1)
```
**Output** :
```python
[1 2 3 4 5]
```
### 2. numpy.savez():
The `np.savez` function saves multiple NumPy arrays into a single file with a `.npz` extension. Each array is stored with a unique name.
#### Syntax :
```python
numpy.savez(file, *args, **kwds)
```
- **file** : Name of the file.
- **args** : Arrays to be saved.( If arrays are unnamed, they are stored with default names like arr_0, arr_1, etc.)
- **kwds** : Named arrays to be saved.
#### Example :
```python
import numpy as np
arr1 = np.array([1,2,3,4,5])
arr2 = np.array(['a','b','c','d'])
arr3 = np.array([1.2,3.4,5])
np.savez('example.npz', a1=arr1, a2=arr2, a3 = arr3) #saves arrays in npz format
```
In order to load the arrays from example.npz:
```python
arr = np.load('example.npz')
print(arr['a1'])
print(arr['a2'])
print(arr['a3'])
```
**Output** :
```python
[1 2 3 4 5]
['a' 'b' 'c' 'd']
[1.2 3.4 5. ]
```
### 3. np.savetxt()
The `np.savetxt` function saves a NumPy array to a text file, such as `.txt` or `.csv`. This format is human-readable and can be used for interoperability with other tools.
#### Syntax :
```python
numpy.savetxt(fname, X, delimiter=' ', newline='\n', header='', footer='', encoding=None)
```
- **fname** : Name of the file.
- **X** : Array to be saved.
- **delimiter** : Optional parameter; character or string used to separate columns. (Default: " ")
- **newline** : Optional parameter; character separating lines. (Default: "\n")
- **header** : Optional parameter; string written at the beginning of the file.
- **footer** : Optional parameter; string written at the end of the file.
- **encoding** : Optional parameter; encoding of the output file. (Default: None)
#### Example :
```python
import numpy as np
arr = np.array([1.1,2.2,3,4.4,5])
np.savetxt("example.txt",arr) #saves the array in example.txt
```
In order to load the array from example.txt:
```python
arr1 = np.loadtxt("example.txt")
print(arr1)
```
**Output** :
```python
[1.1 2.2 3. 4.4 5. ]
```
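The optional parameters described above can be combined; a small sketch (the file name `example.csv` and the `fmt` format string are illustrative assumptions, not part of the example above):
```python
import numpy as np

arr = np.array([[1.5, 2.5], [3.5, 4.5]])
np.savetxt("example.csv", arr, delimiter=",",
           header="col_a,col_b",  # written as a comment line: "# col_a,col_b"
           fmt="%.2f")            # format each value with two decimal places
print(np.loadtxt("example.csv", delimiter=","))  # loadtxt skips '#' comment lines by default
# [[1.5 2.5]
#  [3.5 4.5]]
```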
By using these methods, you can efficiently save and load NumPy arrays in various formats suitable for your needs.

Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"
Honda,Red,87899,4,"$5,000.00"
Toyota,Blue,32549,3,"$7,000.00"
BMW,Black,11179,5,"$22,000.00"
Nissan,White,213095,4,"$3,500.00"
Toyota,Green,99213,4,"$4,500.00"
Honda,Blue,45698,4,"$7,500.00"
Honda,Blue,54738,4,"$7,000.00"
Toyota,White,60000,4,"$6,250.00"
Nissan,White,31600,4,"$9,700.00"

## This folder contains all the Datasets used in the content.

## Descriptive Statistics
In the realm of data science, understanding the characteristics of data is fundamental. Descriptive statistics provide the tools and techniques to succinctly summarize and present the key features of a dataset. It serves as the cornerstone for exploring, visualizing, and ultimately gaining insights from data.
Descriptive statistics encompasses a range of methods designed to describe the central tendency, dispersion, and shape of a dataset. Through measures such as mean, median, mode, standard deviation, and variance, descriptive statistics offer a comprehensive snapshot of the data's distribution and variability.
Data scientists utilize descriptive statistics to uncover patterns, identify outliers, and assess the overall structure of data before delving into more advanced analyses. By summarizing large and complex datasets into manageable and interpretable summaries, descriptive statistics facilitate informed decision-making and actionable insights.
```python
import pandas as pd
import numpy as np
df = pd.read_csv("Age-Income-Dataset.csv")
df
```
| | Age | Income |
| --- | ----------- | ------ |
| 0 | Young | 25000 |
| 1 | Middle Age | 54000 |
| 2 | Old | 60000 |
| 3 | Young | 15000 |
| 4 | Young | 45000 |
| 5 | Young | 65000 |
| 6 | Young | 70000 |
| 7 | Young | 30000 |
| 8 | Middle Age | 27000 |
| 9 | Young | 23000 |
| 10 | Young | 48000 |
| 11 | Old | 52000 |
| 12 | Young | 33000 |
| 13 | Old | 80000 |
| 14 | Old | 75000 |
| 15 | Old | 35000 |
| 16 | Middle Age | 29000 |
| 17 | Middle Age | 57000 |
| 18 | Old | 43000 |
| 19 | Middle Age | 56000 |
| 20 | Old | 63000 |
| 21 | Old | 32000 |
| 22 | Old | 45000 |
| 23 | Old | 89000 |
| 24 | Middle Age | 90000 |
| 25 | Middle Age | 93000 |
| 26 | Young | 80000 |
| 27 | Young | 87000 |
| 28 | Young | 38000 |
| 29 | Young | 23000 |
| 30 | Middle Age | 38900 |
| 31 | Middle Age | 53200 |
| 32 | Old | 43800 |
| 33 | Middle Age | 25600 |
| 34 | Middle Age | 65400 |
| 35 | Old | 76800 |
| 36 | Old | 89700 |
| 37 | Old | 41800 |
| 38 | Young | 31900 |
| 39 | Old | 25600 |
| 40 | Middle Age | 45700 |
| 41 | Old | 35600 |
| 42 | Young | 54300 |
| 43 | Middle Age | 65400 |
| 44 | Old | 67800 |
| 45 | Old | 24500 |
| 46 | Middle Age | 34900 |
| 47 | Old | 45300 |
| 48 | Young | 68400 |
| 49 | Middle Age | 51700 |
```python
df.describe()
```
| | Income |
|-------|-------------|
| count | 50.000000 |
| mean | 50966.000000 |
| std | 21096.683268 |
| min | 15000.000000 |
| 25% | 33475.000000 |
| 50% | 46850.000000 |
| 75% | 65400.000000 |
| max | 93000.000000 |
### Mean
The mean, also known as the average, is a measure of central tendency in a dataset. It represents the typical value of a set of numbers. The formula to calculate the mean of a dataset is:
$$ \overline{x} = \frac{\sum\limits_{i=1}^{n} x_i}{n} $$
* $\overline{x}$ (pronounced "x bar") represents the mean value.
* $x_i$ represents the individual value in the dataset (where i goes from 1 to n).
* $\sum$ (sigma) represents the summation symbol, indicating we add up all the values from i=1 to n.
* $n$ represents the total number of values in the dataset.
```python
df['Income'].mean()
```
#### Result
```
50966.0
```
#### Without pandas
```python
def mean_f(df):
for col in df.columns:
if df[col].dtype != 'O':
temp = 0
for i in df[col]:
temp = temp +i
print("Without pandas Library -> ")
print("Average of {} is {}".format(col,(temp/len(df[col]))))
print()
print("With pandas Library -> ")
print(df[col].mean())
mean_f(df)
```
Average of Income:
- Without pandas Library -> 50966.0
- With pandas Library -> 50966.0
### Median
The median is another measure of central tendency in a dataset. Unlike the mean, which is the average value of all data points, the median represents the middle value when the dataset is ordered from smallest to largest. If the dataset has an odd number of observations, the median is the middle value. If the dataset has an even number of observations, the median is the average of the two middle values.
The median represents the "middle" value in a dataset. There are two cases to consider depending on whether the number of observations (n) is odd or even:
**Odd number of observations (n):**
In this case, the median (M) is the value located at the middle position when the data is ordered from least to greatest. We can calculate the position using the following formula:
$$ M = x_{(n+1)/2} $$
**Even number of observations (n):**
When we have an even number of observations, there isn't a single "middle" value. Instead, the median is the average of the two middle values after ordering the data. Here's the formula to find the median:
$$ M = \frac{x_{n/2} + x_{(n/2)+1}}{2} $$
**Explanation:**
* M represents the median value.
* n represents the total number of observations in the dataset.
* $x$ represents the individual value.
```python
df['Income'].median()
```
#### Result
```
46850.0
```
#### Without pandas
```python
def median_f(df):
for col in df.columns:
if df[col].dtype != 'O':
sorted_data = sorted(df[col])
n = len(df[col])
            if n % 2 == 0:
                x1 = sorted_data[n//2 - 1]  # lower middle value (0-based indexing)
                x2 = sorted_data[n//2]      # upper middle value
                median = (x1 + x2) / 2
            else:
                median = sorted_data[n//2]  # middle value for odd n
print("Median without library ->")
print("Median of {} is {} ".format(col,median))
print("Median with library ->")
print(df[col].median())
median_f(df)
```
Median of Income:
- Median without library -> 46850.0
- Median with library -> 46850.0
### Mode
The mode is a measure of central tendency that represents the value or values that occur most frequently in a dataset. Unlike the mean and median, which focus on the average or middle value, the mode identifies the most common value(s) in the dataset.
```python
def mode_f(df):
    for col in df.columns:
        if df[col].dtype == 'O':
            print("Column:", col)
            # Sort the values so equal values form consecutive runs,
            # then find the longest run
            arr = sorted(df[col])
            ans = arr[0]
            best_cnt = 1
            cnt = 1
            for i in range(1, len(arr)):
                if arr[i] == arr[i - 1]:
                    cnt += 1
                else:
                    cnt = 1
                if cnt > best_cnt:
                    best_cnt = cnt
                    ans = arr[i]
            print("Without pandas Library -> ")
            print("Mode of {} is {}".format(col, ans))
            print()
            print("With pandas Library -> ")
            print(df[col].mode())
mode_f(df)
```
#### Result
```
Column: Age
Without pandas Library ->
Mode of Age is Old
With pandas Library ->
0 Old
Name: Age, dtype: object
```
### Standard Deviation
Standard deviation is a measure of the dispersion or spread of a dataset. It quantifies the amount of variation or dispersion of a set of values from the mean. In other words, it indicates how much individual values in a dataset deviate from the mean.
$$s = \sqrt{\frac{\sum(x_i-\overline{x})^{2}}{n-1}}$$
* $s$ represents the standard deviation.
* $\sum$ (sigma) represents the summation symbol, indicating we add up the values for all data points.
* $x_i$ represents the individual value in the dataset.
* $\overline{x}$ (x bar) represents the mean value of the dataset.
* $n$ represents the total number of values in the dataset.
```python
df['Income'].std()
```
#### Result
```
21096.683267707253
```
#### Without pandas
```python
import math
def std_f(df):
    for col in df.columns:
        if len(df[col]) == 0:
            print("Column is empty")
            continue
        if df[col].dtype != 'O':
            total = 0
            mean = df[col].mean()
            for i in df[col]:
                total = total + (i - mean) ** 2
            # Sample standard deviation (ddof = 1), matching the formula above
            # and pandas' default
            std = math.sqrt(total / (len(df[col]) - 1))
            print("Without pandas library ->")
            print("Std : ", std)
            print("With pandas library ->")
            print("Std : {}".format(df[col].std()))
std_f(df)
```
Without pandas library ->
Std : 21096.683267707253 \
With pandas library ->
Std : 21096.683267707253
### Count
```python
df['Income'].count()
```
#### Result
```
50
```
### Minimum
```python
df['Income'].min()
```
#### Result
```
15000
```
#### Without pandas
```python
def min_f(df):
    for col in df.columns:
        if df[col].dtype != "O":
            sorted_data = sorted(df[col])
            minimum = sorted_data[0]  # smallest value comes first after sorting
            print("Without pandas Library->", minimum)
            print("With pandas Library->", df[col].min())
min_f(df)
```
Without pandas Library-> 15000 \
With pandas Library-> 15000
### Maximum
```python
df['Income'].max()
```
#### Result
```
93000
```
#### Without pandas
```python
def max_f(df):
    for col in df.columns:
        if df[col].dtype != "O":
            sorted_data = sorted(df[col])
            maximum = sorted_data[len(sorted_data) - 1]  # largest value comes last
            print("Without pandas Library->", maximum)
            print("With pandas Library->", df[col].max())
max_f(df)
```
Without pandas Library-> 93000
With pandas Library-> 93000
### Percentile
```python
df['Income'].quantile(0.25)
```
#### Result
```
33475.0
```
```python
df['Income'].quantile(0.75)
```
#### Result
```
65400.0
```
#### Without pandas
```python
def percentile_f(df,percentile):
for col in df.columns:
if df[col].dtype != 'O':
sorted_data = sorted(df[col])
index = int(percentile*len(df[col]))
percentile_result = sorted_data[index]
print(f"{percentile} Percentile is : ",percentile_result)
percentile_f(df,0.25)
```
0.25 Percentile is : 33000
We used the nearest-rank method to calculate the percentile manually, whereas pandas interpolates linearly between data points.
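For reference, here is a minimal sketch of that linear-interpolation rule (a hypothetical helper, not pandas' actual implementation; it assumes a plain list or Series of numbers):
```python
def percentile_linear(values, p):
    # Fractional rank between positions 0 and n-1, then interpolate
    s = sorted(values)
    rank = p * (len(s) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(s) - 1)
    frac = rank - lo
    return s[lo] + (s[hi] - s[lo]) * frac

# percentile_linear(df['Income'], 0.25) reproduces the 33475.0 above
```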
## Correlation and Covariance
```python
df = pd.read_csv('Iris.csv')
df.head(5)
```
| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|----|---------------|--------------|---------------|--------------|-------------|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
```python
df.drop(['Id','Species'],axis=1,inplace= True)
```
### Covariance
Covariance measures the degree to which two variables change together. If the covariance between two variables is positive, it means that they tend to increase or decrease together. If the covariance is negative, it means that as one variable increases, the other tends to decrease. However, covariance does not provide a standardized measure, making it difficult to interpret the strength of the relationship between variables, especially if the variables are measured in different units.
$$ COV(X,Y) = \frac{\sum\limits_{i=1}^{n} (X_i - \overline{X}) (Y_i - \overline{Y})}{n - 1}$$
**Explanation:**
* $COV(X, Y)$ represents the covariance between variables X and Y.
* $X_i$ and $Y_i$ represent the individual values for variables X and Y in the i-th observation.
* $\overline{X}$ and $\overline{Y}$ represent the mean values for variables X and Y, respectively.
* $n$ represents the total number of observations in the dataset.
```python
df.cov()
```
| | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm |
|-------------------|-------------- |---------------|-----------------|--------------|
| **SepalLengthCm** | 0.685694 | -0.039268 | 1.273682 | 0.516904 |
| **SepalWidthCm** | -0.039268 | 0.188004 | -0.321713 | -0.117981 |
| **PetalLengthCm** | 1.273682 | -0.321713 | 3.113179 | 1.296387 |
| **PetalWidthCm** | 0.516904 | -0.117981 | 1.296387 | 0.582414 |
#### Without pandas
```python
def cov_f(df):
    for x in df.columns:
        for y in df.columns:
            mean_x = df[x].mean()
            mean_y = df[y].mean()
            total = 0
            n = len(df[x])
            for val in range(n):
                total += (df[x].iloc[val] - mean_x) * (df[y].iloc[val] - mean_y)
            # Sample covariance divides by n - 1
            print("Covariance of {} and {} is : {}".format(x, y, total / (n - 1)))
        print()
cov_f(df)
```
#### Result
```
Covariance of SepalLengthCm and SepalLengthCm is : 0.6856935123042504
Covariance of SepalLengthCm and SepalWidthCm is : -0.03926845637583892
Covariance of SepalLengthCm and PetalLengthCm is : 1.2736823266219246
Covariance of SepalLengthCm and PetalWidthCm is : 0.5169038031319911
Covariance of SepalWidthCm and SepalLengthCm is : -0.03926845637583892
Covariance of SepalWidthCm and SepalWidthCm is : 0.1880040268456377
Covariance of SepalWidthCm and PetalLengthCm is : -0.32171275167785235
Covariance of SepalWidthCm and PetalWidthCm is : -0.11798120805369115
Covariance of PetalLengthCm and SepalLengthCm is : 1.2736823266219246
Covariance of PetalLengthCm and SepalWidthCm is : -0.32171275167785235
Covariance of PetalLengthCm and PetalLengthCm is : 3.113179418344519
Covariance of PetalLengthCm and PetalWidthCm is : 1.2963874720357946
Covariance of PetalWidthCm and SepalLengthCm is : 0.5169038031319911
Covariance of PetalWidthCm and SepalWidthCm is : -0.11798120805369115
Covariance of PetalWidthCm and PetalLengthCm is : 1.2963874720357946
Covariance of PetalWidthCm and PetalWidthCm is : 0.5824143176733781
```
### Correlation
Correlation, on the other hand, standardizes the measure of the relationship between two variables, making it easier to interpret. It measures both the strength and direction of the linear relationship between two variables. Correlation values range between -1 and 1, where 1 indicates a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 no linear relationship.
$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2} \cdot \sqrt{n(\sum y^2) - (\sum y)^2}}$$
* $r$ represents the correlation coefficient.
* $n$ is the number of data points.
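As a quick cross-check (a sketch reusing the iris `df` loaded above), the same coefficient can be derived from the covariance and the two standard deviations:
```python
# r(X, Y) = cov(X, Y) / (std(X) * std(Y)), all sample statistics (ddof = 1)
cov_xy = df['SepalLengthCm'].cov(df['PetalLengthCm'])
r = cov_xy / (df['SepalLengthCm'].std() * df['PetalLengthCm'].std())
print(r)  # roughly 0.8718, matching the correlation table below
```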
```python
df.corr()
```
| | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm |
|-------------------|---------------|--------------|---------------|--------------|
| **SepalLengthCm** | 1.000000 | -0.109369 | 0.871754 | 0.817954 |
| **SepalWidthCm** | -0.109369 | 1.000000 | -0.420516 | -0.356544 |
| **PetalLengthCm** | 0.871754 | -0.420516 | 1.000000 | 0.962757 |
| **PetalWidthCm** | 0.817954 | -0.356544 | 0.962757 | 1.000000 |
#### Without using pandas
```python
import math
def corr_f(df):
for i in df.columns:
for j in df.columns:
n = len(df[i])
sumX = 0
for x in df[i]:
sumX += x
sumY = 0
for y in df[j]:
sumY += y
sumXY = 0
for xy in range(n):
sumXY += (df[i].iloc[xy] * df[j].iloc[xy])
sumX2 = 0
for x in df[i]:
sumX2 += (x**2)
sumY2 = 0
for y in df[j]:
sumY2 += (y**2)
NR = (n * sumXY) - (sumX*sumY)
DR = math.sqrt( ( (n * sumX2) - (sumX**2))*( (n * sumY2) - (sumY ** 2) ) )
print("Correlation of {} and {} :{}".format(i,j,NR/DR))
print()
corr_f(df)
```
#### Result
```
Correlation of SepalLengthCm and SepalLengthCm :1.0
Correlation of SepalLengthCm and SepalWidthCm :-0.10936924995067286
Correlation of SepalLengthCm and PetalLengthCm :0.8717541573048861
Correlation of SepalLengthCm and PetalWidthCm :0.8179536333691775
Correlation of SepalWidthCm and SepalLengthCm :-0.10936924995067286
Correlation of SepalWidthCm and SepalWidthCm :1.0
Correlation of SepalWidthCm and PetalLengthCm :-0.42051609640118826
Correlation of SepalWidthCm and PetalWidthCm :-0.3565440896138223
Correlation of PetalLengthCm and SepalLengthCm :0.8717541573048861
Correlation of PetalLengthCm and SepalWidthCm :-0.42051609640118826
Correlation of PetalLengthCm and PetalLengthCm :1.0
Correlation of PetalLengthCm and PetalWidthCm :0.9627570970509656
Correlation of PetalWidthCm and SepalLengthCm :0.8179536333691775
Correlation of PetalWidthCm and SepalWidthCm :-0.3565440896138223
Correlation of PetalWidthCm and PetalLengthCm :0.9627570970509656
Correlation of PetalWidthCm and PetalWidthCm :1.0
```
@ -0,0 +1,391 @@
## Group By Functions
GroupBy is a powerful function in pandas that allows you to split data into distinct groups based on one or more columns and perform operations on each group independently. It's a fundamental technique for data analysis and summarization.
Here's a step-by-step breakdown of how groupby functions work in pandas:
* __Splitting the Data:__ You can group your data based on one or more columns using the .groupby() method. This method takes a column name or a list of column names as input and splits the DataFrame into groups according to the values in those columns.
* __Applying a Function:__ Once the data is grouped, you can apply various functions to each group. Pandas offers a variety of built-in aggregation functions like sum(), mean(), count(), etc., that can be used to summarize the data within each group. You can also use custom functions or lambda functions for more specific operations.
* __Combining the Results:__ After applying the function to each group, the results are combined into a new DataFrame or Series, depending on the input data and the function used. This new data structure summarizes the data by group.
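Here is a minimal, self-contained sketch of these three steps; the `team`/`score` frame is invented purely for illustration:
```python
import pandas as pd

# Split by 'team', apply aggregations to each group, combine into one table
df = pd.DataFrame({
    "team": ["A", "A", "B", "B"],
    "score": [10, 20, 30, 50],
})
summary = df.groupby("team")["score"].agg(["mean", lambda s: s.max() - s.min()])
summary.columns = ["mean", "range"]  # rename the lambda's auto-generated column
print(summary)
#       mean  range
# team
# A     15.0     10
# B     40.0     20
```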
```python
import pandas as pd
import seaborn as sns
import numpy as np
```
```python
iris_data = sns.load_dataset('iris')
```
This code loads the built-in Iris dataset from seaborn and stores it in a pandas DataFrame named iris_data. The Iris dataset contains measurements of flower sepal and petal dimensions for three Iris species (Setosa, Versicolor, Virginica).
```python
iris_data
```
| | sepal_length | sepal_width | petal_length | petal_width | species |
|----|--------------|-------------|--------------|-------------|-----------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| ...| ... | ... | ... | ... | ... |
| 145| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 146| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 147| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 148| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 149| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
```python
iris_data.groupby(['species']).count()
```
| species | sepal_length | sepal_width | petal_length | petal_width |
|------------|--------------|-------------|--------------|-------------|
| setosa | 50 | 50 | 50 | 50 |
| versicolor | 50 | 50 | 50 | 50 |
| virginica | 50 | 50 | 50 | 50 |
* We group the data by the 'species' column.
* count() is applied to each group, counting the number of rows in each species category.
* The output is a DataFrame showing the count of each species in the dataset.
```python
iris_data.groupby(["species"])["sepal_length"].mean()
```
species
setosa 5.006\
versicolor 5.936\
virginica 6.588\
Name: sepal_length, dtype: float64
* This groups the data by 'species' and selects the 'sepal_length' column.
* mean() calculates the average sepal length for each species group.
* The output is a Series containing the mean sepal length for each species.
```python
iris_data.groupby(["species"])["sepal_length"].std()
```
species
setosa 0.352490\
versicolor 0.516171\
virginica 0.635880\
Name: sepal_length, dtype: float64
* Similar to the previous example, this groups by 'species' and selects the 'sepal_length' column.
* However, it calculates the standard deviation (spread) of sepal length for each species group using std().
* The output is a Series containing the standard deviation of sepal length for each species.
```python
iris_data.groupby(["species"])["sepal_length"].describe()
```
| species | count | mean | std | min | 25% | 50% | 75% | max |
|------------|-------|-------|----------|------|--------|------|------|------|
| setosa | 50.0 | 5.006 | 0.352490 | 4.3 | 4.800 | 5.0 | 5.2 | 5.8 |
| versicolor | 50.0 | 5.936 | 0.516171 | 4.9 | 5.600 | 5.9 | 6.3 | 7.0 |
| virginica | 50.0 | 6.588 | 0.635880 | 4.9 | 6.225 | 6.5 | 6.9 | 7.9 |
* We have used describe() to generate a more comprehensive summary of sepal length for each species group.
* It provides statistics like count, mean, standard deviation, minimum, maximum, and percentiles.
* The output is a DataFrame containing these descriptive statistics for each species.
```python
iris_data.groupby(["species"])["sepal_length"].quantile(q=0.25)
```
species\
setosa 4.800\
versicolor 5.600\
virginica 6.225\
Name: sepal_length, dtype: float64
```python
iris_data.groupby(["species"])["sepal_length"].quantile(q=0.75)
```
species\
setosa 5.2\
versicolor 6.3\
virginica 6.9\
Name: sepal_length, dtype: float64
* These calls calculate the quartiles (25th and 75th percentiles) of sepal length for each species group.
* quantile(q=0.25) gives the 25th percentile, the value below which 25% of the data points lie.
* quantile(q=0.75) gives the 75th percentile, the value below which 75% of the data points lie.
* The outputs are Series containing the respective quartile values for each species; a short follow-up sketch below combines them.
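Combining the two quantiles gives the interquartile range (IQR) per group:
```python
# IQR = Q3 - Q1 for each species, reusing the grouped quantiles above
q1 = iris_data.groupby("species")["sepal_length"].quantile(q=0.25)
q3 = iris_data.groupby("species")["sepal_length"].quantile(q=0.75)
print(q3 - q1)  # setosa 0.400, versicolor 0.700, virginica 0.675
```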
## Custom Function For Group By
```python
# Numeric columns to summarize ('species' is excluded since it is categorical)
nc = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
```
```python
nc
```
['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
```python
def species_stats(species_data,species_name):
print("Species Name: {}".format(species_name))
print()
print("Mean:\n",species_data[nc].mean())
print()
print("Median:\n",species_data[nc].median())
print()
print("std:\n",species_data[nc].std())
print()
print("25% percentile:\n",species_data[nc].quantile(0.25))
print()
print("75% percentile:\n",species_data[nc].quantile(0.75))
print()
print("Min:\n",species_data[nc].min())
print()
print("Max:\n",species_data[nc].max())
print()
```
```python
setosa_data = iris_data[iris_data['species'] == 'setosa']
```
```python
versicolor_data = iris_data[iris_data['species'] == 'versicolor']
```
```python
virginica_data = iris_data[iris_data['species'] == 'virginica']
```
```python
# Map each label to its DataFrame so each iteration summarizes the right species
species_datasets = {
    'setosa_data': setosa_data,
    'versicolor_data': versicolor_data,
    'virginica_data': virginica_data,
}
for name, data in species_datasets.items():
    print("************** Species name {} *****************".format(name))
    species_stats(data, name)
    print("------------------------------------")
```
************** Species name setosa_data *****************\
Species Name: setosa_data
Mean:\
sepal_length 5.006\
sepal_width 3.428\
petal_length 1.462\
petal_width 0.246\
dtype: float64
Median:\
sepal_length 5.0\
sepal_width 3.4\
petal_length 1.5\
petal_width 0.2\
dtype: float64
std:\
sepal_length 0.352490\
sepal_width 0.379064\
petal_length 0.173664\
petal_width 0.105386\
dtype: float64
25% percentile:\
sepal_length 4.8\
sepal_width 3.2\
petal_length 1.4\
petal_width 0.2\
Name: 0.25, dtype: float64
75% percentile:\
sepal_length 5.200\
sepal_width 3.675\
petal_length 1.575\
petal_width 0.300\
Name: 0.75, dtype: float64
Min:\
sepal_length 4.3\
sepal_width 2.3\
petal_length 1.0\
petal_width 0.1\
dtype: float64
Max:\
sepal_length 5.8\
sepal_width 4.4\
petal_length 1.9\
petal_width 0.6\
dtype: float64
------------------------------------\
*(Analogous blocks follow for versicolor_data and virginica_data, each reporting that species' own statistics.)*
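For comparison, the same per-species summary can be produced much more compactly with `groupby` and `agg` (a sketch reusing `iris_data` and the numeric column list `nc` from this section):
```python
# One call replaces the manual filtering and printing above
stats = iris_data.groupby("species")[nc].agg(["mean", "median", "std", "min", "max"])
print(stats)
```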
@ -0,0 +1,63 @@
# Pandas DataFrame
The Pandas DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labelled axes. A DataFrame organizes data in rows and columns and is made up of three main components: the data, the rows, and the columns.
In the real world, Pandas DataFrames are created by importing datasets from existing storage, such as an Excel file, a SQL database, or a CSV file. They may also be constructed from lists, dictionaries, lists of dictionaries, etc.
Features of Pandas `DataFrame`:
- **Size mutable**: DataFrames are mutable in size, meaning that new rows and columns can be added or removed as needed.
- **Labeled axes**: DataFrames have labeled axes, which makes it easy to keep track of the data.
- **Arithmetic operations**: DataFrames support arithmetic operations on rows and columns.
- **High performance**: DataFrames are highly performant, making them ideal for working with large datasets.
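A small sketch illustrating the size-mutable and arithmetic features (the column and row labels are invented for illustration):
```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df["c"] = df["a"] + df["b"]   # arithmetic across columns
df.loc[2] = [5, 6, 11]        # size-mutable: append a new labeled row
print(df)
```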
### Installation of libraries
`pip install pandas` <br/>
`pip install openpyxl`
- **Note**: The `openpyxl` library is what pandas uses for modern `.xlsx` Excel files; the older `xlrd` package now only reads legacy `.xls` files.
Example for reading data from an Excel File:
```python
import pandas as pd
# read_excel already returns a DataFrame
d = pd.read_excel('example.xlsx')
print(d)
```
Output:
```python
Name Age
0 John 12
```
Example for Inserting Data into Excel File:
```python
import pandas as pd
# Read the existing rows, append the new ones, and write the file back
existing = pd.read_excel('file_name.xlsx')
new_rows = pd.DataFrame({'Name': ['Bob', 'John'], 'Age': [12, 28]})
combined = pd.concat([existing, new_rows], ignore_index = True)
combined.to_excel('file_name.xlsx', index = False)
print(combined)
```
Output:
```python
Name Age
0 Bob 12
1 John 28
```
### Usage of Pandas DataFrame:
- Can be used to store and analyze financial data, such as stock prices, trading data, and economic data.
- Can be used to store and analyze sensor data, such as data from temperature sensors, motion sensors, and GPS sensors.
- Can be used to store and analyze log data, such as web server logs, application logs, and system logs
@ -0,0 +1,46 @@
# Importing and Exporting Data in Pandas
## Importing Data from a CSV
We can create `Series` and `DataFrame` in pandas, but often we have to import data that comes as a `.csv` (Comma Separated Values) file, a spreadsheet, or a similar tabular data file format.
`pandas` allows for easy importing of this data using functions such as `read_csv()` and `read_excel()` for Microsoft Excel files.
*Note: In case you want to get the information from a **Google Sheet** you can export it as a .csv file.*
The `read_csv()` function can be used to import a CSV file into a pandas DataFrame. The path can be a file system path or a URL where the CSV is available.
```python
import pandas as pd
car_sales_df = pd.read_csv("Datasets/car-sales.csv")
print(car_sales_df)
```
```
Make Colour Odometer (KM) Doors Price
0 Toyota White 150043 4 $4,000.00
1 Honda Red 87899 4 $5,000.00
2 Toyota Blue 32549 3 $7,000.00
3 BMW Black 11179 5 $22,000.00
4 Nissan White 213095 4 $3,500.00
5 Toyota Green 99213 4 $4,500.00
6 Honda Blue 45698 4 $7,500.00
7 Honda Blue 54738 4 $7,000.00
8 Toyota White 60000 4 $6,250.00
9 Nissan White 31600 4 $9,700.00
```
You can find the dataset used above in the `Datasets` folder.
*Note: If you want to import data from GitHub you can't directly use the page link; you first have to obtain the raw file URL by clicking the "Raw" button in the repo, as in the sketch below.*
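For example (the URL below is hypothetical, shown only to illustrate the pattern):
```python
# Replace with the raw-file URL of your own repository
url = "https://raw.githubusercontent.com/user/repo/main/car-sales.csv"
df_from_url = pd.read_csv(url)
```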
## Exporting Data to a CSV
`pandas` allows you to export a `DataFrame` to `.csv` format using `.to_csv()`, or to an Excel spreadsheet using `.to_excel()`.
```python
car_sales_df.to_csv("exported_car_sales.csv")
```
Running this will save a file called `exported_car_sales.csv` to the current folder.
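By default `.to_csv()` also writes the DataFrame's index as an extra unnamed column; passing `index=False` avoids that, and `.to_excel()` works the same way (writing `.xlsx` assumes `openpyxl` is installed):
```python
car_sales_df.to_csv("exported_car_sales.csv", index=False)
car_sales_df.to_excel("exported_car_sales.xlsx", index=False)
```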
@ -1,3 +1,8 @@
# List of sections
- [Pandas Introduction and Dataframes in Pandas](introduction.md)
- [Pandas Series Vs NumPy ndarray](pandas_series_vs_numpy_ndarray.md)
- [Pandas Descriptive Statistics](Descriptive_Statistics.md)
- [Group By Functions with Pandas](GroupBy_Functions_Pandas.md)
- [Excel using Pandas DataFrame](excel_with_pandas.md)
- [Importing and Exporting Data in Pandas](import-export.md)
@ -0,0 +1,244 @@
# Introduction to Pandas Library and DataFrames
**As you have learnt Python programming, it's now time for some applications.**
- Machine Learning and Data Science are the emerging fields of today's time; to work in these fields your first step should be `Data Science`, as Machine Learning is all about data.
- To begin with Data Science, your first tool will be the `Pandas Library`.
## Introduction of Pandas Library
Pandas is a data analysis and manipulation tool built on top of the Python programming language. Pandas got its name from the term panel data ("Pa" from panel and "da" from data). Panel data is data that has rows and columns, like Excel spreadsheets, CSV files, etc.
**To use Pandas, we first have to import it.**
## Why pandas?
* Pandas provides a simple-to-use but very capable set of functions that you can use on your data.
* It also integrates with other machine learning libraries, so it is important to learn.
* For example, it is widely used to transform the data that a machine learning model consumes during training.
```python
# Importing the pandas
import pandas as pd
```
*To import any module in Python use the `import module_name` command. I used `pd` as the pandas abbreviation because then we don't need to type pandas every time, only `pd`.*
```python
# To check available pandas version
print(f"Pandas Version is : {pd.__version__}")
```
Pandas Version is : 2.1.4
## Understanding Pandas data types
Pandas has two main data types: `Series` and `DataFrame`.
* `pandas.Series` is a 1-dimensional column of data.
* `pandas.DataFrame` is a 2-dimensional data table with rows and columns.
### 1. Series datatype
**To create a series you can use `pd.Series()` and pass a Python list inside the parentheses.**
Note: The "S" in Series is capital; if you use a lowercase "s" it will give you an error.
> Let's create a series
```python
# Creating a series of car companies
cars = pd.Series(["Honda","Audi","Thar","BMW"])
cars
```
0 Honda
1 Audi
2 Thar
3 BMW
dtype: object
The above code creates a Series of car companies named "cars". The code `pd.Series(["Honda", "Audi", "Thar", "BMW"])` means: Hey pandas (pd), create a Series of the cars "Honda", "Audi", "Thar" and "BMW".
The default index of a series is 0, 1, 2, ... (remember it starts from 0).
To change the index of any series, set the "index" parameter accordingly. It takes a list of index values:
```python
cars = pd.Series(["Honda","Audi","Thar","BMW"],index = ["A" , "B" , "C" ,"D"])
cars
```
A Honda
B Audi
C Thar
D BMW
dtype: object
You can see that the index has been changed from numbers to A, B, C and D.
The mentioned dtype tells us the type of data we have in the series.
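A custom index also lets you access elements by label; a quick sketch reusing the `cars` series above:
```python
# Access Series elements by their custom labels
print(cars["A"])      # Honda
print(cars.loc["C"])  # Thar
```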
### 2. DataFrames datatype
A DataFrame contains rows and columns, like a csv file has.
You can also create a DataFrame by using `pd.DataFrame()` and passing it a Python dictionary.
```python
# Let's create
cars_with_colours = pd.DataFrame({"Cars" : ["BMW","Audi","Thar","Honda"],
"Colour" : ["Black","White","Red","Green"]})
print(cars_with_colours)
```
Cars Colour
0 BMW Black
1 Audi White
2 Thar Red
3 Honda Green
The dictionary keys are the `column names` and the values are the `column data`.
*You can also create a DataFrame with the help of series.*
```python
# Let's create two series
students = pd.Series(["Ram","Mohan","Krishna","Shivam"])
age = pd.Series([19,20,21,24])
students
```
0 Ram
1 Mohan
2 Krishna
3 Shivam
dtype: object
```python
age
```
0 19
1 20
2 21
3 24
dtype: int64
```python
# Now let's create a dataframe with the help of above series
# pass the series name to the dictionary value
record = pd.DataFrame({"Student_Name":students ,
"Age" :age})
print(record)
```
Student_Name Age
0 Ram 19
1 Mohan 20
2 Krishna 21
3 Shivam 24
```python
# To print the list of column names
record.columns
```
Index(['Student_Name', 'Age'], dtype='object')
### Describe Data
**The good news is that pandas has many built-in functions which allow you to quickly get information about a DataFrame.**
Let's explore the `record` dataframe
#### 1. Use `.dtypes` to find what datatype a column contains
```python
record.dtypes
```
Student_Name object
Age int64
dtype: object
#### 2. Use `.describe()` for a statistical overview.
```python
print(record.describe()) # It only displays results for numeric columns
```
Age
count 4.000000
mean 21.000000
std 2.160247
min 19.000000
25% 19.750000
50% 20.500000
75% 21.750000
max 24.000000
#### 3. Use `.info()` to find information about the dataframe
```python
record.info()
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Student_Name 4 non-null object
1 Age 4 non-null int64
dtypes: int64(1), object(1)
memory usage: 196.0+ bytes
@ -1,3 +1,4 @@
# List of sections
- [Section title](filename.md)
- [Installation of Scipy and its key uses](installation_features.md)
@ -0,0 +1,173 @@
## Installation of Scipy
You can install scipy using the command:
```
$ pip install scipy
```
You can also use a Python distribution that already has SciPy installed, such as Anaconda (which also ships the Spyder IDE).
### Importing SciPy
```python
from scipy import constants
```
## Key Features of SciPy
### 1. Numerical Integration
It helps in computing definite integrals of functions numerically
```python
from scipy import integrate
#Define the function to integrate
def f(x):
return x**2
#Compute definite integral of f from 0 to 1
result, error = integrate.quad(f, 0, 1)
print(result)
```
#### Output
```
0.33333333333333337
```
### 2. Optimization
It can be used to minimize or maximize functions; here is an example of finding a local minimum of a function
```python
import numpy as np
from scipy.optimize import minimize
# Define an objective function to minimize
def objective(x):
    return x**2 + 10*np.sin(x)
# Minimize the objective function starting from x=0
result = minimize(objective, x0=0)
print(result.x)
```
#### Output
```
array([-1.30644012])
```
### 3. Linear Algebra
Solving systems of linear equations
```python
from scipy import linalg
import numpy as np
# Define a square matrix
A = np.array([[1, 2], [3, 4]])
# Define a vector
b = np.array([5, 6])
# Solve Ax = b for x
x = linalg.solve(A, b)
print(x)
```
#### Output
```
array([-4. , 4.5])
```
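You can verify the solution by substituting it back into the system (reusing `A`, `b`, and `x` from the snippet above):
```python
print(A @ x)  # array([5., 6.]), which recovers b
```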
### 4. Statistics
Performing statistical functions; here we fit a normal distribution to randomly generated data
```python
from scipy import stats
import numpy as np
# Generate random data from a normal distribution
data = stats.norm.rvs(loc=0, scale=1, size=1000)
# Fit a normal distribution to the data
mean, std = stats.norm.fit(data)
print(mean, std)  # estimates should be close to loc=0 and scale=1
```
### 5. Signal Processing
To process spectral signals, like EEG or MEG
```python
from scipy import signal
import numpy as np
# Create a noisy signal (a 5 Hz sine wave plus Gaussian noise);
# note the variable is not named `signal`, which would shadow the scipy module
t = np.linspace(0, 1, 1000)
noisy = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.randn(1000)
# Apply a low-pass Butterworth filter
b, a = signal.butter(4, 0.1, 'low')
filtered_signal = signal.filtfilt(b, a, noisy)
```
The filters applied here are part of signal analysis at a deeper level.
### 6. Sparse Matrix
The word 'sparse' means scattered or mostly empty: in a sparse matrix most entries are zero or unused during an operation or analysis. A sparse matrix stores such data efficiently.
There are two common types of sparse matrices:
1. CSC: Compressed Sparse Column, used for efficient arithmetic and column slicing
2. CSR: Compressed Sparse Row, used for fast row slicing
#### In CSC format
```python
from scipy import sparse
import numpy as np
# Build the matrix from (values, (row_indices, col_indices)) triplets
row_indices = np.array([1, 2, 1])
col_indices = np.array([1, 0, 2])
values = np.array([1, 2, 1])
sparse_matrix_csc = sparse.csc_matrix((values, (row_indices, col_indices)))
```
#### In CSR format
```python
from scipy import sparse
import numpy as np
data = np.array([[0, 0], [0, 1], [2, 0]])
sparse_matrix = sparse.csr_matrix(data)
```
### 7. Image Processing
It is used to process images, for example changing their dimensions or properties. In projects on medical imaging, for instance, this library is commonly used.
```python
from scipy import ndimage
import matplotlib.pyplot as plt
image = plt.imread('path/to/image.jpg')
plt.imshow(image)
plt.show()
# Apply Gaussian blur to the image
blurred_image = ndimage.gaussian_filter(image, sigma=1)
plt.imshow(blurred_image)
plt.show()
```
Gaussian blur is one of the functions of the `ndimage` package in SciPy; it is used to smooth an image.