Merge branch 'main' into introduction-to-line-charts-in-plotly
|
@ -1,36 +1,144 @@
|
|||
## Regular Expressions in Python
|
||||
Regular expressions (regex) are a powerful tool for pattern matching and text manipulation.
|
||||
Regular expressions (regex) are a powerful tool for pattern matching and text manipulation.
|
||||
Python's re module provides comprehensive support for regular expressions, enabling efficient text processing and validation.
|
||||
Regular expressions (regex) are a versatile tool for matching patterns in strings. In Python, the `re` module provides support for working with regular expressions.
|
||||
|
||||
## 1. Introduction to Regular Expressions
|
||||
A regular expression is a sequence of characters defining a search pattern. Common use cases include validating input, searching within text, and extracting
|
||||
A regular expression is a sequence of characters defining a search pattern. Common use cases include validating input, searching within text, and extracting
|
||||
specific patterns.
|
||||
|
||||
## 2. Basic Syntax
|
||||
Literal Characters: Match exact characters (e.g., abc matches "abc").
|
||||
Metacharacters: Special characters like ., *, ?, +, ^, $, [ ], and | used to build patterns.
|
||||
Metacharacters: Special characters like ., \*, ?, +, ^, $, [ ], and | used to build patterns.
|
||||
|
||||
**Common Metacharacters:**
|
||||
|
||||
* .: Any character except newline.
|
||||
* ^: Start of the string.
|
||||
* $: End of the string.
|
||||
* *: 0 or more repetitions.
|
||||
* +: 1 or more repetitions.
|
||||
* ?: 0 or 1 repetition.
|
||||
* []: Any one character inside brackets (e.g., [a-z]).
|
||||
* |: Either the pattern before or after.
|
||||
|
||||
- .: Any character except newline.
|
||||
- ^: Start of the string.
|
||||
- $: End of the string.
|
||||
- *: 0 or more repetitions.
|
||||
- +: 1 or more repetitions.
|
||||
- ?: 0 or 1 repetition.
|
||||
- []: Any one character inside brackets (e.g., [a-z]).
|
||||
- |: Either the pattern before or after.
|
||||
- \ : Drops the special meaning of the character that follows it.
|
||||
- {} : Indicates the number of occurrences of the preceding pattern to match.
|
||||
- () : Encloses a group of patterns.
|
||||
|
||||
Examples:
|
||||
|
||||
1. `.`
|
||||
|
||||
```python
|
||||
import re
|
||||
pattern = r'c.t'
|
||||
text = 'cat cot cut cit'
|
||||
matches = re.findall(pattern, text)
|
||||
print(matches) # Output: ['cat', 'cot', 'cut', 'cit']
|
||||
```
|
||||
|
||||
2. `^`
|
||||
|
||||
```python
|
||||
pattern = r'^Hello'
|
||||
text = 'Hello, world!'
|
||||
match = re.search(pattern, text)
|
||||
print(match.group() if match else 'No match') # Output: 'Hello'
|
||||
```
|
||||
|
||||
3. `$`
|
||||
|
||||
```python
|
||||
pattern = r'world!$'
|
||||
text = 'Hello, world!'
|
||||
match = re.search(pattern, text)
|
||||
print(match.group() if match else 'No match') # Output: 'world!'
|
||||
```
|
||||
|
||||
4. `*`
|
||||
|
||||
```python
|
||||
pattern = r'ab*'
|
||||
text = 'a ab abb abbb'
|
||||
matches = re.findall(pattern, text)
|
||||
print(matches) # Output: ['a', 'ab', 'abb', 'abbb']
|
||||
```
|
||||
|
||||
5. `+`
|
||||
|
||||
```python
|
||||
pattern = r'ab+'
|
||||
text = 'a ab abb abbb'
|
||||
matches = re.findall(pattern, text)
|
||||
print(matches) # Output: ['ab', 'abb', 'abbb']
|
||||
```
|
||||
|
||||
6. `?`
|
||||
|
||||
```python
|
||||
pattern = r'ab?'
|
||||
text = 'a ab abb abbb'
|
||||
matches = re.findall(pattern, text)
|
||||
print(matches) # Output: ['a', 'ab', 'ab', 'ab']
|
||||
```
|
||||
|
||||
7. `[]`
|
||||
|
||||
```python
|
||||
pattern = r'[aeiou]'
|
||||
text = 'hello world'
|
||||
matches = re.findall(pattern, text)
|
||||
print(matches) # Output: ['e', 'o', 'o']
|
||||
```
|
||||
|
||||
8. `|`
|
||||
|
||||
```python
|
||||
pattern = r'cat|dog'
|
||||
text = 'I have a cat and a dog.'
|
||||
matches = re.findall(pattern, text)
|
||||
print(matches) # Output: ['cat', 'dog']
|
||||
```
|
||||
|
||||
9. `\`
|
||||
|
||||
```python
|
||||
pattern = r'\$100'
|
||||
text = 'The price is $100.'
|
||||
match = re.search(pattern, text)
|
||||
print(match.group() if match else 'No match') # Output: '$100'
|
||||
```
|
||||
|
||||
10. `{}`
|
||||
|
||||
```python
|
||||
pattern = r'\d{3}'
|
||||
text = 'My number is 123456'
|
||||
matches = re.findall(pattern, text)
|
||||
print(matches) # Output: ['123', '456']
|
||||
```
|
||||
|
||||
11. `()`
|
||||
|
||||
```python
|
||||
pattern = r'(cat|dog)'
|
||||
text = 'I have a cat and a dog.'
|
||||
matches = re.findall(pattern, text)
|
||||
print(matches) # Output: ['cat', 'dog']
|
||||
```
|
||||
|
||||
## 3. Using the re Module
|
||||
|
||||
**Key functions in the re module:**
|
||||
|
||||
* re.match(): Checks for a match at the beginning of the string.
|
||||
* re.search(): Searches for a match anywhere in the string.
|
||||
* re.findall(): Returns a list of all matches.
|
||||
* re.sub(): Replaces matches with a specified string.
|
||||
- re.match(): Checks for a match at the beginning of the string.
|
||||
- re.search(): Searches for a match anywhere in the string.
|
||||
- re.findall(): Returns a list of all matches.
|
||||
- re.sub(): Replaces matches with a specified string.
|
||||
- re.split(): Returns a list where the string has been split at each match.
|
||||
- re.escape(): Escapes special characters in a string so that it can be matched literally.
|
||||
Examples:
|
||||
|
||||
Examples:
|
||||
```python
|
||||
import re
|
||||
|
||||
|
@ -45,12 +153,20 @@ print(re.findall(r'\d+', 'abc123def456')) # Output: ['123', '456']
|
|||
|
||||
# Substitute matches
|
||||
print(re.sub(r'\d+', '#', 'abc123def456')) # Output: abc#def#
|
||||
|
||||
# Split the string at each whitespace match
|
||||
print(re.split(r"\s", "The Donkey in the Town"))  # Output: ['The', 'Donkey', 'in', 'the', 'Town']
|
||||
|
||||
# Escape special character
|
||||
print(re.escape("We are good to go"))  # Output: We\ are\ good\ to\ go
|
||||
```
|
||||
|
||||
## 4. Compiling Regular Expressions
|
||||
|
||||
Compiling a pattern with re.compile() returns a reusable pattern object, which can improve performance when the same pattern is used many times.
|
||||
|
||||
Example:
|
||||
|
||||
```python
|
||||
import re
|
||||
|
||||
|
@ -58,12 +174,15 @@ pattern = re.compile(r'\d+')
|
|||
print(pattern.match('123abc').group()) # Output: 123
|
||||
print(pattern.search('abc123').group()) # Output: 123
|
||||
print(pattern.findall('abc123def456')) # Output: ['123', '456']
|
||||
|
||||
```
|
||||
|
||||
## 5. Groups and Capturing
|
||||
|
||||
Parentheses () group and capture parts of the match.
|
||||
|
||||
Example:
|
||||
|
||||
```python
|
||||
import re
|
||||
|
||||
|
@ -76,21 +195,46 @@ if match:
|
|||
```
|
||||
|
||||
## 6. Special Sequences
|
||||
|
||||
Special sequences are shortcuts for common patterns:
|
||||
|
||||
* \d: Any digit.
|
||||
* \D: Any non-digit.
|
||||
* \w: Any alphanumeric character.
|
||||
* \W: Any non-alphanumeric character.
|
||||
* \s: Any whitespace character.
|
||||
* \S: Any non-whitespace character.
|
||||
- \A: Matches only at the beginning of the string.
|
||||
- \b: Matches the empty string at a word boundary, that is, at the beginning or end of a word.
|
||||
- \B: Matches the empty string only where there is NOT a word boundary, that is, inside a word.
|
||||
- \d: Any digit.
|
||||
- \D: Any non-digit.
|
||||
- \w: Any word character (letters, digits, and the underscore).
|
||||
- \W: Any non-word character.
|
||||
- \s: Any whitespace character.
|
||||
- \S: Any non-whitespace character.
|
||||
- \Z: Matches only at the end of the string.
|
||||
|
||||
Example:
|
||||
|
||||
```python
|
||||
import re
|
||||
|
||||
print(re.search(r'\w+@\w+\.\w+', 'Contact: support@example.com').group()) # Output: support@example.com
|
||||
```
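
The anchoring sequences `\A`, `\b`, `\B`, and `\Z` from the list above are easier to see with match positions than with matched text. The following is a minimal sketch; the sample string and variable names are made up for illustration.

```python
import re

# Illustrative sample text (not from the original tutorial)
text = "cat concat catalog"

# \b requires a word boundary before "cat": the word "cat" and the start of "catalog"
starts_word = [m.start() for m in re.finditer(r"\bcat", text)]

# \B requires that there is NO word boundary before "cat": the tail of "concat"
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]

# \b on both sides matches "cat" only as a whole word
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \A anchors to the very start of the string, \Z to the very end
at_start = re.search(r"\Acat", text) is not None
at_end = re.search(r"log\Z", text) is not None

print(starts_word)       # [0, 11]
print(inside_word)       # [7]
print(whole_word)        # [0]
print(at_start, at_end)  # True True
```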
|
||||
|
||||
## 7. Sets
|
||||
|
||||
A set is a set of characters inside a pair of square brackets [] with a special meaning:
|
||||
|
||||
- [arn] : Returns a match where one of the specified characters (a, r, or n) is present.
|
||||
- [a-n] : Returns a match for any lower case character, alphabetically between a and n.
|
||||
- [^arn] : Returns a match for any character EXCEPT a, r, and n.
|
||||
- [0123] : Returns a match where any of the specified digits (0, 1, 2, or 3) are present.
|
||||
- [0-9] : Returns a match for any digit between 0 and 9.
|
||||
- [0-5][0-9] : Returns a match for any two-digit number from 00 to 59.
|
||||
- [a-zA-Z] : Returns a match for any character alphabetically between a and z, lower case OR upper case.
|
||||
- [+] : In sets, +, \*, ., |, (), $, and {} have no special meaning, so [+] returns a match for any + character in the string.
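
A short sketch of how a few of these sets behave with `re.findall()`. The sample sentence and variable names are invented for illustration.

```python
import re

# Illustrative sample text (not from the original tutorial)
text = "Train 7 leaves at 09:45 + 5 min"

# One of the listed characters: a, r, or n
arn = re.findall(r"[arn]", text)

# Two-digit numbers from 00 to 59 (a single digit such as "7" does not match)
minutes = re.findall(r"[0-5][0-9]", text)

# Negated set: anything that is not a letter, a digit, or a space
others = re.findall(r"[^a-zA-Z0-9 ]", text)

# Inside a set, + is a literal plus sign
plus = re.findall(r"[+]", text)

print(arn)      # ['r', 'a', 'n', 'a', 'a', 'n']
print(minutes)  # ['09', '45']
print(others)   # [':', '+']
print(plus)     # ['+']
```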
|
||||
|
||||
## Summary
|
||||
Regular expressions are a versatile tool for text processing in Python. The re module offers powerful functions and metacharacters for pattern matching,
|
||||
searching, and manipulation, making it an essential skill for handling complex text processing tasks.
|
||||
|
||||
Regular expressions (regex) are a powerful tool for text processing in Python, offering a flexible way to match, search, and manipulate text patterns. The re module provides a comprehensive set of functions and metacharacters to tackle complex text processing tasks.
|
||||
With regex, you can:
|
||||
1. Match patterns: Use metacharacters like ., \*, ?, and {} to match specific patterns in text.
|
||||
2. Search text: Employ functions like re.search() and re.match() to find occurrences of patterns in text.
|
||||
3. Manipulate text: Utilize functions like re.sub() to replace patterns with new text.
|
||||
|
|
|
@ -0,0 +1,216 @@
|
|||
# Deque in Python
|
||||
|
||||
## Definition
|
||||
A deque, short for double-ended queue, is an ordered collection of items that allows rapid insertion and deletion at both ends.
|
||||
|
||||
## Syntax
|
||||
In Python, deques are implemented in the collections module:
|
||||
|
||||
```py
|
||||
from collections import deque
|
||||
|
||||
# Creating a deque
|
||||
d = deque(iterable) # Create deque from iterable (optional)
|
||||
```
|
||||
|
||||
## Operations
|
||||
1. **Appending Elements**:
|
||||
|
||||
- append(x): Adds element x to the right end of the deque.
|
||||
- appendleft(x): Adds element x to the left end of the deque.
|
||||
|
||||
### Program
|
||||
```py
|
||||
from collections import deque
|
||||
|
||||
# Initialize a deque
|
||||
d = deque([1, 2, 3, 4, 5])
|
||||
print("Initial deque:", d)
|
||||
|
||||
# Append elements
|
||||
d.append(6)
|
||||
print("After append(6):", d)
|
||||
|
||||
# Append left
|
||||
d.appendleft(0)
|
||||
print("After appendleft(0):", d)
|
||||
|
||||
```
|
||||
### Output
|
||||
```py
|
||||
Initial deque: deque([1, 2, 3, 4, 5])
|
||||
After append(6): deque([1, 2, 3, 4, 5, 6])
|
||||
After appendleft(0): deque([0, 1, 2, 3, 4, 5, 6])
|
||||
```
|
||||
|
||||
2. **Removing Elements**:
|
||||
|
||||
- pop(): Removes and returns the rightmost element.
|
||||
- popleft(): Removes and returns the leftmost element.
|
||||
|
||||
### Program
|
||||
```py
|
||||
from collections import deque
|
||||
|
||||
# Initialize a deque
|
||||
d = deque([1, 2, 3, 4, 5])
|
||||
print("Initial deque:", d)
|
||||
|
||||
# Pop from the right end
|
||||
rightmost = d.pop()
|
||||
print("Popped from right end:", rightmost)
|
||||
print("Deque after pop():", d)
|
||||
|
||||
# Pop from the left end
|
||||
leftmost = d.popleft()
|
||||
print("Popped from left end:", leftmost)
|
||||
print("Deque after popleft():", d)
|
||||
|
||||
```
|
||||
|
||||
### Output
|
||||
```py
|
||||
Initial deque: deque([1, 2, 3, 4, 5])
|
||||
Popped from right end: 5
|
||||
Deque after pop(): deque([1, 2, 3, 4])
|
||||
Popped from left end: 1
|
||||
Deque after popleft(): deque([2, 3, 4])
|
||||
```
|
||||
|
||||
3. **Accessing Elements**:
|
||||
|
||||
- deque[index]: Accesses the element at index. Access is O(1) at either end but slows to O(n) toward the middle.
|
||||
|
||||
### Program
|
||||
```py
|
||||
from collections import deque
|
||||
|
||||
# Initialize a deque
|
||||
d = deque([1, 2, 3, 4, 5])
|
||||
print("Initial deque:", d)
|
||||
|
||||
# Accessing elements
|
||||
print("Element at index 2:", d[2])
|
||||
|
||||
```
|
||||
|
||||
### Output
|
||||
```py
|
||||
Initial deque: deque([1, 2, 3, 4, 5])
|
||||
Element at index 2: 3
|
||||
|
||||
```
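
Indexing works on a deque, but slice syntax does not. The sketch below shows one common workaround using `itertools.islice`; the variable names are illustrative.

```python
from collections import deque
from itertools import islice

d = deque([1, 2, 3, 4, 5])

# Indexing works, including negative indices
first, last = d[0], d[-1]
print(first, last)  # 1 5

# Slice syntax is not supported on deques and raises TypeError
try:
    d[1:3]
    slicing_supported = True
except TypeError:
    slicing_supported = False
print("Slicing supported:", slicing_supported)

# islice walks the deque and yields the requested sub-range
middle = list(islice(d, 1, 3))
print(middle)  # [2, 3]
```

If sub-ranges are needed often, converting to a list first may be simpler.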
|
||||
|
||||
4. **Other Operations**:
|
||||
|
||||
- extend(iterable): Extends deque by appending elements from iterable.
|
||||
- extendleft(iterable): Extends the deque by appending elements from iterable to the left, one at a time, so they end up in reversed order.
|
||||
- rotate(n): Rotates deque n steps to the right (negative n rotates left).
|
||||
|
||||
### Program
|
||||
```py
|
||||
from collections import deque
|
||||
|
||||
# Initialize a deque
|
||||
d = deque([1, 2, 3, 4, 5])
|
||||
print("Initial deque:", d)
|
||||
|
||||
# Extend deque
|
||||
d.extend([6, 7, 8])
|
||||
print("After extend([6, 7, 8]):", d)
|
||||
|
||||
# Extend left
|
||||
d.extendleft([-1, 0])
|
||||
print("After extendleft([-1, 0]):", d)
|
||||
|
||||
# Rotate deque
|
||||
d.rotate(2)
|
||||
print("After rotate(2):", d)
|
||||
|
||||
# Rotate left
|
||||
d.rotate(-3)
|
||||
print("After rotate(-3):", d)
|
||||
|
||||
```
|
||||
|
||||
### Output
|
||||
```py
|
||||
Initial deque: deque([1, 2, 3, 4, 5])
|
||||
After extend([6, 7, 8]): deque([1, 2, 3, 4, 5, 6, 7, 8])
|
||||
After extendleft([-1, 0]): deque([0, -1, 1, 2, 3, 4, 5, 6, 7, 8])
|
||||
After rotate(2): deque([7, 8, 0, -1, 1, 2, 3, 4, 5, 6])
|
||||
After rotate(-3): deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 0])
|
||||
|
||||
```
|
||||
|
||||
|
||||
## Example
|
||||
|
||||
### 1. Finding Maximum in Sliding Window
|
||||
```py
|
||||
from collections import deque
|
||||
|
||||
def max_sliding_window(nums, k):
    if not nums:
        return []

    d = deque()  # Holds indices whose values are in decreasing order
    result = []

    for i, num in enumerate(nums):
        # Remove elements from deque that are out of the current window
        if d and d[0] <= i - k:
            d.popleft()

        # Remove elements from deque smaller than the current element
        while d and nums[d[-1]] <= num:
            d.pop()

        d.append(i)

        # Add maximum for current window
        if i >= k - 1:
            result.append(nums[d[0]])

    return result
|
||||
|
||||
# Example usage:
|
||||
nums = [1, 3, -1, -3, 5, 3, 6, 7]
|
||||
k = 3
|
||||
print("Maximums in sliding window of size", k, "are:", max_sliding_window(nums, k))
|
||||
|
||||
```
|
||||
|
||||
Output
|
||||
```py
|
||||
Maximums in sliding window of size 3 are: [3, 3, 5, 5, 6, 7]
|
||||
```
|
||||
|
||||
|
||||
## Applications
|
||||
- **Efficient Queues and Stacks**: Deques allow fast O(1) append and pop operations from both ends,
|
||||
making them ideal for implementing queues and stacks.
|
||||
- **Sliding Window Maximum/Minimum**: Used in algorithms that require efficient windowed
|
||||
computations.
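
As a minimal sketch of the queue and stack use case, the same deque type can serve both roles depending on which end is popped. The job names are placeholders.

```python
from collections import deque

# FIFO queue: append on the right, pop from the left
queue = deque()
for job in ["a", "b", "c"]:
    queue.append(job)
served = [queue.popleft() for _ in range(len(queue))]
print(served)  # ['a', 'b', 'c']

# LIFO stack: append on the right, pop from the right
stack = deque()
for job in ["a", "b", "c"]:
    stack.append(job)
undone = [stack.pop() for _ in range(len(stack))]
print(undone)  # ['c', 'b', 'a']
```

A plain list also works as a stack, but `list.pop(0)` is O(n), which is why a deque is usually preferred for queues.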
|
||||
|
||||
|
||||
## Advantages
|
||||
- Efficiency: O(1) time complexity for append and pop operations from both ends.
|
||||
- Versatility: Can function both as a queue and as a stack.
|
||||
- Flexible: Supports rotation efficiently. Slice syntax such as d[1:3] is not supported, so use itertools.islice when a sub-range is needed.
|
||||
|
||||
|
||||
## Disadvantages
|
||||
- Memory Usage: Requires more memory compared to simple lists due to overhead in managing linked
|
||||
nodes.
|
||||
|
||||
## Conclusion
|
||||
- Deques in Python, provided by the deque class in the collections module, offer efficient double-ended queue
|
||||
operations with O(1) time complexity for append and pop operations on both ends. They are versatile
|
||||
data structures suitable for implementing queues, stacks, and more complex algorithms requiring
|
||||
efficient manipulation of elements at both ends.
|
||||
|
||||
- While deques excel in scenarios requiring fast append and pop operations from either end, they do
|
||||
consume more memory compared to simple lists due to their implementation using doubly-linked lists.
|
||||
However, their flexibility and efficiency make them invaluable for various programming tasks and
|
||||
algorithmic solutions.
|
|
@ -22,4 +22,5 @@
|
|||
- [AVL Trees](avl-trees.md)
|
||||
- [Splay Trees](splay-trees.md)
|
||||
- [Dijkstra's Algorithm](dijkstra.md)
|
||||
- [Deque](deque.md)
|
||||
- [Tree Traversals](tree-traversal.md)
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
# Linked List Data Structure
|
||||
|
||||
Link list is a linear data Structure which can be defined as collection of objects called nodes that are randomly stored in the memory.
|
||||
A linked list is a linear data structure that can be defined as a collection of objects called nodes, which are stored at arbitrary (non-contiguous) locations in memory.
|
||||
A node contains two fields: the data stored at that particular address and a pointer that holds the address of the next node in memory.
|
||||
|
||||
The last element in a linked list features a null pointer.
|
||||
|
@ -36,10 +36,10 @@ The smallest Unit: Node
|
|||
Now, we will see the types of linked list.
|
||||
|
||||
There are four main types of linked list:
|
||||
1. Singly Link list
|
||||
2. Doubly link list
|
||||
3. Circular link list
|
||||
4. Doubly circular link list
|
||||
1. Singly linked list
|
||||
2. Doubly linked list
|
||||
3. Circular linked list
|
||||
4. Doubly circular linked list
|
||||
|
||||
|
||||
## 1. Singly linked list.
|
||||
|
@ -160,6 +160,18 @@ check the list is empty otherwise shift the head to next node.
|
|||
temp.next = None # Remove the last node by setting the next pointer of the second-last node to None
|
||||
```
|
||||
|
||||
### Reversing the linked list
|
||||
```python
|
||||
    def reverseList(self):
        prev = None
        temp = self.head
        while temp:
            nextNode = temp.next  # Store the next node
            temp.next = prev      # Reverse the pointer of the current node
            prev = temp           # Move prev pointer one step forward
            temp = nextNode       # Move temp pointer one step forward
        self.head = prev          # Update the head pointer to the last node
|
||||
```
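
To try the reversal in isolation, here is a self-contained sketch. The `Node` and `LinkedList` classes below are minimal stand-ins, and `push_back` and `to_list` are hypothetical helpers added only for this illustration; they are not the methods defined elsewhere in this file.

```python
class Node:
    """Hypothetical minimal node: a value plus a pointer to the next node."""
    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    """Hypothetical minimal singly linked list, just enough to test reversal."""
    def __init__(self):
        self.head = None

    def push_back(self, data):
        # Illustrative helper: append a node at the tail
        node = Node(data)
        if self.head is None:
            self.head = node
            return
        temp = self.head
        while temp.next:
            temp = temp.next
        temp.next = node

    def to_list(self):
        # Illustrative helper: collect the values from head to tail
        values, temp = [], self.head
        while temp:
            values.append(temp.data)
            temp = temp.next
        return values

    def reverseList(self):
        # Same pointer-flipping loop as in the method above
        prev, temp = None, self.head
        while temp:
            nextNode = temp.next
            temp.next = prev
            prev = temp
            temp = nextNode
        self.head = prev


llist = LinkedList()
for value in [2, 3, 56, 9, 4, 10]:
    llist.push_back(value)

before = llist.to_list()
llist.reverseList()
after = llist.to_list()
print(before)  # [2, 3, 56, 9, 4, 10]
print(after)   # [10, 4, 9, 56, 3, 2]
```

The loop visits each node once, so the reversal runs in O(n) time with O(1) extra space.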
|
||||
|
||||
### Search in a linked list
|
||||
```python
|
||||
|
@ -174,6 +186,8 @@ check the list is empty otherwise shift the head to next node.
|
|||
return f"Value '{value}' not found in the list"
|
||||
```
|
||||
|
||||
Connect all the code.
|
||||
|
||||
```python
|
||||
if __name__ == '__main__':
|
||||
llist = LinkedList()
|
||||
|
@ -197,13 +211,17 @@ check the list is empty otherwise shift the head to next node.
|
|||
|
||||
#delete at the end
|
||||
llist.deleteFromEnd() # 2 3 56 9 4 10
|
||||
# Print the list
|
||||
# Print the original list
|
||||
llist.printList()
|
||||
llist.reverseList() #10 4 9 56 3 2
|
||||
# Print the reversed list
|
||||
llist.printList()
|
||||
```
|
||||
## Output:
|
||||
|
||||
2 3 56 9 4 10
|
||||
2 3 56 9 4 10
|
||||
|
||||
10 4 9 56 3 2
|
||||
|
||||
|
||||
## Real Life uses of Linked List
|
||||
|
|
After Width: | Height: | Size: 23 KiB |
After Width: | Height: | Size: 25 KiB |
After Width: | Height: | Size: 272 KiB |
After Width: | Height: | Size: 56 KiB |
After Width: | Height: | Size: 15 KiB |
|
@ -0,0 +1,184 @@
|
|||
# Exploratory Data Analysis
|
||||
|
||||
Exploratory Data Analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. It is used to get a sense of the data and to identify relationships between variables. EDA is a crucial step in the data analysis process and should be done before building a model.
|
||||
|
||||
## Why is EDA important?
|
||||
|
||||
1. **Understand the data**: EDA helps to understand the data, its structure, and its characteristics.
|
||||
|
||||
2. **Identify patterns and relationships**: EDA helps to identify patterns and relationships between variables.
|
||||
|
||||
3. **Detect outliers and anomalies**: EDA helps to detect outliers and anomalies in the data.
|
||||
|
||||
4. **Prepare data for modeling**: EDA helps to prepare the data for modeling by identifying and handling missing values and by transforming variables.
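
As a rough sketch of the outlier and missing-value checks mentioned above, the snippet below uses only the standard library. The measurements are made up, and the 1.5 × IQR rule is one common convention, not the only way to flag outliers.

```python
import statistics

# Made-up measurements; None marks a missing value
values = [4.9, 5.1, 5.0, 5.2, 4.8, 5.1, 9.7, None, 5.0]

# Identify missing values before computing any statistics
present = [v for v in values if v is not None]
missing = len(values) - len(present)

# Quartiles and interquartile range (IQR)
q1, _, q3 = statistics.quantiles(present, n=4)
iqr = q3 - q1

# Flag points that lie more than 1.5 * IQR outside the quartiles
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in present if v < low or v > high]

print("Missing values:", missing)
print("Outliers:", outliers)
```

Whether a flagged point is an error or a genuine extreme value still needs a judgment call based on domain knowledge.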
|
||||
|
||||
## Steps in EDA
|
||||
|
||||
1. **Data Collection**: Collect the data from various sources.
|
||||
|
||||
2. **Data Cleaning**: Clean the data by handling missing values, removing duplicates, and transforming variables.
|
||||
|
||||
3. **Data Exploration**: Explore the data by visualizing the data, summarizing the data, and identifying patterns and relationships.
|
||||
|
||||
4. **Data Analysis**: Analyze the data by performing statistical analysis, hypothesis testing, and building models.
|
||||
|
||||
5. **Data Visualization**: Visualize the data using various plots and charts to understand the data better.
|
||||
|
||||
## Tools for EDA
|
||||
|
||||
1. **Python**: Python is a popular programming language for data analysis and has many libraries for EDA, such as Pandas, NumPy, Matplotlib, Seaborn, and Plotly.
|
||||
|
||||
2. **Jupyter Notebook**: Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.
|
||||
|
||||
## Techniques for EDA
|
||||
|
||||
1. **Descriptive Statistics**: Descriptive statistics summarize the main characteristics of a data set, such as mean, median, mode, standard deviation, and variance.
|
||||
|
||||
2. **Data Visualization**: Data visualization is the graphical representation of data to understand the data better, such as histograms, scatter plots, box plots, and heat maps.
|
||||
|
||||
3. **Correlation Analysis**: Correlation analysis is used to measure the strength and direction of the relationship between two variables.
|
||||
|
||||
4. **Hypothesis Testing**: Hypothesis testing is used to test a hypothesis about a population parameter based on sample data.
|
||||
|
||||
5. **Dimensionality Reduction**: Dimensionality reduction is the process of reducing the number of variables in a data set while retaining as much information as possible.
|
||||
|
||||
6. **Clustering Analysis**: Clustering analysis is used to group similar data points together based on their characteristics.
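
The first and third techniques can be sketched with the standard library alone. The five sepal measurements below are the first rows of the Iris table shown later in this document; the helper name `pearson` is an illustrative choice, and in practice pandas provides the same statistics through `describe()` and `corr()`.

```python
import statistics
from math import sqrt

# First five Iris rows (sepal length and sepal width, in cm)
sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0]
sepal_width = [3.5, 3.0, 3.2, 3.1, 3.6]

# Descriptive statistics for a single variable
mean_len = statistics.mean(sepal_length)
median_len = statistics.median(sepal_length)
stdev_len = statistics.stdev(sepal_length)    # sample standard deviation
var_len = statistics.variance(sepal_length)   # sample variance


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r = pearson(sepal_length, sepal_width)

print(f"mean={mean_len:.2f} median={median_len:.2f} "
      f"stdev={stdev_len:.3f} variance={var_len:.3f}")
print(f"correlation={r:.2f}")
```

With only five rows the correlation is not representative of the full dataset; the point is only to show what the calculation involves.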
|
||||
|
||||
## Commonly Used Techniques in EDA
|
||||
|
||||
1. **Uni-variate Analysis**: Uni-variate analysis is the simplest form of data analysis that involves analyzing a single variable at a time.
|
||||
|
||||
2. **Bi-variate Analysis**: Bi-variate analysis involves analyzing two variables at a time to understand the relationship between them.
|
||||
|
||||
3. **Multi-variate Analysis**: Multi-variate analysis involves analyzing more than two variables at a time to understand the relationship between them.
|
||||
|
||||
## Understand with an Example
|
||||
|
||||
Let's understand EDA with an example. Here we use a famous dataset called Iris dataset.
|
||||
|
||||
The dataset consists of 150 samples of iris flowers, where each sample represents measurements of four features (variables) for three species of iris flowers.
|
||||
|
||||
The four features measured are:
|
||||
Sepal length (in cm), Sepal width (in cm), Petal length (in cm), and Petal width (in cm).
|
||||
|
||||
The three species of iris flowers included in the dataset are:
|
||||
**Setosa**, **Versicolor**, **Virginica**
|
||||
|
||||
```python
|
||||
# Import libraries
|
||||
import pandas as pd
|
||||
import numpy as np
|
||||
import matplotlib.pyplot as plt
|
||||
import seaborn as sns
|
||||
from sklearn import datasets
|
||||
|
||||
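# Load the Iris dataset
iris = datasets.load_iris()
# Use short column names so the later snippets can refer to e.g. 'sepal_length'
df = pd.DataFrame(iris.data, columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
# Add the species name of each sample (needed for the per-species plots)
df['species'] = iris.target_names[iris.target]
df.head()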
|
||||
```
|
||||
|
||||
| Sepal Length (cm) | Sepal Width (cm) | Petal Length (cm) | Petal Width (cm) |
|
||||
|-------------------|------------------|-------------------|------------------|
|
||||
| 5.1 | 3.5 | 1.4 | 0.2 |
|
||||
| 4.9 | 3.0 | 1.4 | 0.2 |
|
||||
| 4.7 | 3.2 | 1.3 | 0.2 |
|
||||
| 4.6 | 3.1 | 1.5 | 0.2 |
|
||||
| 5.0 | 3.6 | 1.4 | 0.2 |
|
||||
|
||||
|
||||
### Uni-variate Analysis
|
||||
|
||||
```python
|
||||
# Uni-variate Analysis
|
||||
df_setosa=df.loc[df['species']=='setosa']
|
||||
df_virginica=df.loc[df['species']=='virginica']
|
||||
df_versicolor=df.loc[df['species']=='versicolor']
|
||||
|
||||
plt.plot(df_setosa['sepal_length'])
|
||||
plt.plot(df_virginica['sepal_length'])
|
||||
plt.plot(df_versicolor['sepal_length'])
|
||||
plt.xlabel('sepal length')
|
||||
plt.show()
|
||||
```
|
||||
![Uni-variate Analysis](assets/eda/uni-variate-analysis1.png)
|
||||
|
||||
```python
|
||||
plt.hist(df_setosa['petal_length'])
|
||||
plt.hist(df_virginica['petal_length'])
|
||||
plt.hist(df_versicolor['petal_length'])
|
||||
plt.xlabel('petal length')
|
||||
plt.show()
|
||||
```
|
||||
![Uni-variate Analysis](assets/eda/uni-variate-analysis2.png)
|
||||
|
||||
### Bi-variate Analysis
|
||||
|
||||
```python
|
||||
# Bi-variate Analysis
|
||||
sns.FacetGrid(df,hue="species",height=5).map(plt.scatter,"petal_length","sepal_width").add_legend()
|
||||
plt.show()
|
||||
```
|
||||
![Bi-variate Analysis](assets/eda/bi-variate-analysis.png)
|
||||
|
||||
### Multi-variate Analysis
|
||||
|
||||
```python
|
||||
# Multi-variate Analysis
|
||||
sns.pairplot(df,hue="species",height=3)
|
||||
```
|
||||
![Multi-variate Analysis](assets/eda/multi-variate-analysis.png)
|
||||
|
||||
### Correlation Analysis
|
||||
|
||||
```python
|
||||
# Correlation Analysis
|
||||
corr_matrix = df.corr(numeric_only=True)  # Ignore any non-numeric columns such as species
|
||||
sns.heatmap(corr_matrix)
|
||||
```
|
||||
| | sepal_length | sepal_width | petal_length | petal_width |
|
||||
|-------------|--------------|-------------|--------------|-------------|
|
||||
| sepal_length| 1.000000 | -0.109369 | 0.871754 | 0.817954 |
|
||||
| sepal_width | -0.109369 | 1.000000 | -0.420516 | -0.356544 |
|
||||
| petal_length| 0.871754 | -0.420516 | 1.000000 | 0.962757 |
|
||||
| petal_width | 0.817954 | -0.356544 | 0.962757 | 1.000000 |
|
||||
|
||||
![Correlation Analysis](assets/eda/correlation-analysis.png)
|
||||
|
||||
## Exploratory Data Analysis (EDA) Report on Iris Dataset
|
||||
|
||||
### Introduction
|
||||
The Iris dataset consists of 150 samples of iris flowers, each characterized by four features: Sepal Length, Sepal Width, Petal Length, and Petal Width. These samples belong to three species of iris flowers: Setosa, Versicolor, and Virginica. In this EDA report, we explore the dataset to gain insights into the characteristics and relationships among the features and species.
|
||||
|
||||
### Uni-variate Analysis
|
||||
Uni-variate analysis examines each variable individually.
|
||||
- Sepal Length: The distribution of Sepal Length varies among the different species, with Setosa generally having shorter sepals compared to Versicolor and Virginica.
|
||||
- Petal Length: Setosa tends to have shorter petal lengths, while Versicolor and Virginica have relatively longer petal lengths.
|
||||
|
||||
### Bi-variate Analysis
|
||||
Bi-variate analysis explores the relationship between two variables.
|
||||
- Petal Length vs. Sepal Width: There is a noticeable separation between species, especially Setosa, which typically has shorter petals and wider sepals compared to Versicolor and Virginica.
|
||||
- This analysis suggests potential patterns distinguishing the species based on these two features.
|
||||
|
||||
### Multi-variate Analysis
|
||||
Multi-variate analysis considers interactions among multiple variables simultaneously.
|
||||
- Pairplot: The pairplot reveals distinctive clusters for each species, particularly in the combinations of Petal Length and Petal Width, indicating clear separation among species based on these features.
|
||||
|
||||
### Correlation Analysis
|
||||
Correlation analysis examines the relationship between variables.
|
||||
- Correlation Heatmap: There are strong positive correlations between Petal Length and Petal Width, as well as between Petal Length and Sepal Length. Sepal Width shows a weaker negative correlation with Petal Length and Petal Width.
|
||||
|
||||
### Insights
|
||||
1. Petal dimensions (length and width) exhibit strong correlations, suggesting that they may collectively contribute more significantly to distinguishing between iris species.
|
||||
2. Setosa tends to have shorter and wider sepals compared to Versicolor and Virginica.
|
||||
3. The combination of Petal Length and Petal Width appears to be a more effective discriminator among iris species, as indicated by the distinct clusters observed in multi-variate analysis.
|
||||
|
||||
### Conclusion
|
||||
Through comprehensive exploratory data analysis, we have gained valuable insights into the Iris dataset, highlighting key characteristics and relationships among features and species. Further analysis and modeling could leverage these insights to develop robust classification models for predicting iris species based on their measurements.
|
||||
|
||||
## Conclusion
|
||||
|
||||
Exploratory Data Analysis (EDA) is a critical step in the data analysis process that helps to understand the data, identify patterns and relationships, detect outliers, and prepare the data for modeling. By using various techniques and tools, such as descriptive statistics, data visualization, correlation analysis, and hypothesis testing, EDA provides valuable insights into the data, enabling data scientists to make informed decisions and build accurate models.
|
||||
|
||||
|
||||
|
|
@ -27,3 +27,4 @@
|
|||
- [Transformers](transformers.md)
|
||||
- [Reinforcement Learning](reinforcement-learning.md)
|
||||
- [Neural network regression](neural-network-regression.md)
|
||||
- [Exploratory Data Analysis](eda.md)
|
||||
|
|
|
@ -93,13 +93,9 @@ $$
|
|||
|
||||
- Rain:
|
||||
|
||||
$$
|
||||
P(Rain|Yes) = \frac{2}{6}
|
||||
$$
|
||||
$$P(Rain|Yes) = \frac{2}{6}$$
|
||||
|
||||
$$
|
||||
P(Rain|No) = \frac{4}{4}
|
||||
$$
|
||||
$$P(Rain|No) = \frac{4}{4}$$
|
||||
|
||||
- Overcast:
|
||||
|
||||
|
@ -111,10 +107,7 @@ $$
|
|||
$$
|
||||
|
||||
|
||||
Here, we can see that
|
||||
$$
|
||||
P(Overcast|No) = 0
|
||||
$$
|
||||
Here, we can see that P(Overcast|No) = 0
|
||||
This is a zero probability error!
|
||||
|
||||
Since this probability is 0, the whole product for the No class becomes 0 regardless of the other features, so the Naive Bayes model fails to predict.
|
||||
|
@ -124,13 +117,9 @@ Since probability is 0, naive bayes model fails to predict.
|
|||
In Laplace's correction, we scale the values for 1000 instances.
|
||||
- **Calculate prior probabilities**
|
||||
|
||||
$$
|
||||
P(Yes) = \frac{600}{1002}
|
||||
$$
|
||||
$$P(Yes) = \frac{600}{1002}$$
|
||||
|
||||
$$
|
||||
P(No) = \frac{402}{1002}
|
||||
$$
|
||||
$$P(No) = \frac{402}{1002}$$
|
||||
|
||||
- **Calculate likelihoods**
|
||||
|
||||
|
@ -151,21 +140,13 @@ Since probability is 0, naive bayes model fails to predict.
|
|||
|
||||
- **Rain:**
|
||||
|
||||
$$
|
||||
P(Rain|Yes) = \frac{200}{600}
|
||||
$$
|
||||
$$
|
||||
P(Rain|No) = \frac{401}{402}
|
||||
$$
|
||||
$$P(Rain|Yes) = \frac{200}{600}$$
|
||||
$$P(Rain|No) = \frac{401}{402}$$
|
||||
|
||||
- **Overcast:**
|
||||
|
||||
$$
|
||||
P(Overcast|Yes) = \frac{400}{600}
|
||||
$$
|
||||
$$
|
||||
P(Overcast|No) = \frac{1}{402}
|
||||
$$
|
||||
$$P(Overcast|Yes) = \frac{400}{600}$$
|
||||
$$P(Overcast|No) = \frac{1}{402}$$
|
||||
|
||||
|
||||
2. **Wind (B):**
|
||||
|
@ -181,49 +162,27 @@ Since probability is 0, naive bayes model fails to predict.
|
|||
|
||||
- **Weak:**
|
||||
|
||||
$$
|
||||
P(Weak|Yes) = \frac{500}{600}
|
||||
$$
|
||||
$$
|
||||
P(Weak|No) = \frac{200}{400}
|
||||
$$
|
||||
$$P(Weak|Yes) = \frac{500}{600}$$
|
||||
$$P(Weak|No) = \frac{200}{400}$$
|
||||
|
||||
- **Strong:**
|
||||
|
||||
$$
|
||||
P(Strong|Yes) = \frac{100}{600}
|
||||
$$
|
||||
$$
|
||||
P(Strong|No) = \frac{200}{400}
|
||||
$$
|
||||
$$P(Strong|Yes) = \frac{100}{600}$$
|
||||
$$P(Strong|No) = \frac{200}{400}$$
|
||||
|
||||
- **Calculating probabilities:**
|
||||
|
||||
$$
|
||||
P(PlayTennis|Yes) = P(Yes) * P(Overcast|Yes) * P(Weak|Yes)
|
||||
$$
|
||||
$$
|
||||
= \frac{600}{1002} * \frac{400}{600} * \frac{500}{600}
|
||||
$$
|
||||
$$
|
||||
= 0.3326
|
||||
$$
|
||||
$$P(PlayTennis|Yes) = P(Yes) * P(Overcast|Yes) * P(Weak|Yes)$$
|
||||
$$= \frac{600}{1002} * \frac{400}{600} * \frac{500}{600}$$
|
||||
$$= 0.3327$$
|
||||
|
||||
$$
|
||||
P(PlayTennis|No) = P(No) * P(Overcast|No) * P(Weak|No)
|
||||
$$
|
||||
$$
|
||||
= \frac{402}{1002} * \frac{1}{402} * \frac{200}{400}
|
||||
$$
|
||||
$$
|
||||
= 0.000499 = 0.0005
|
||||
$$
|
||||
$$P(PlayTennis|No) = P(No) * P(Overcast|No) * P(Weak|No)$$
|
||||
$$= \frac{402}{1002} * \frac{1}{402} * \frac{200}{400}$$
|
||||
$$= 0.000499 = 0.0005$$
|
||||
|
||||
|
||||
Since
|
||||
$$
|
||||
P(PlayTennis|Yes) > P(PlayTennis|No)
|
||||
$$
|
||||
$$P(PlayTennis|Yes) > P(PlayTennis|No)$$
|
||||
we can conclude that tennis can be played if outlook is overcast and wind is weak.
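
The arithmetic above can be checked with a short Python sketch. The counts are taken from the calculation above; the dictionary layout and the helper name `score` are illustrative choices made for this sketch.

```python
# Counts after Laplace's correction, as used in the calculation above.
total = 1002
class_counts = {"Yes": 600, "No": 402}

# Likelihoods P(feature value | class), written as (numerator, denominator).
likelihoods = {
    "Yes": {"Overcast": (400, 600), "Weak": (500, 600)},
    "No": {"Overcast": (1, 402), "Weak": (200, 400)},
}


def score(label, features):
    """Return P(label) multiplied by P(feature | label) for each feature."""
    result = class_counts[label] / total
    for feature in features:
        numerator, denominator = likelihoods[label][feature]
        result *= numerator / denominator
    return result


scores = {label: score(label, ["Overcast", "Weak"]) for label in class_counts}
prediction = max(scores, key=scores.get)

print(scores)      # unnormalised scores for "Yes" and "No"
print(prediction)  # the class with the larger score
```

The larger score belongs to `Yes`, which matches the conclusion above.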
|
||||
|
||||
|
||||
|
@ -366,4 +325,4 @@ print("Confusion matrix: \n",confusion_matrix(y_train,y_pred))
|
|||
## Conclusion
|
||||
|
||||
Naive Bayes can be limited in some cases by its assumption that the features are independent of each other, but it remains reliable in many practical settings. It is an efficient classifier and works even on small datasets.
|
||||
|
||||
|
||||
|
|
|
@ -6,9 +6,12 @@
|
|||
- [Pie Charts in Matplotlib](matplotlib-pie-charts.md)
|
||||
- [Line Charts in Matplotlib](matplotlib-line-plots.md)
|
||||
- [Scatter Plots in Matplotlib](matplotlib-scatter-plot.md)
|
||||
- [Violin Plots in Matplotlib](matplotlib-violin-plots.md)
|
||||
- [subplots in Matplotlib](matplotlib-sub-plot.md)
|
||||
- [Introduction to Seaborn and Installation](seaborn-intro.md)
|
||||
- [Seaborn Plotting Functions](seaborn-plotting.md)
|
||||
- [Getting started with Seaborn](seaborn-basics.md)
|
||||
- [Bar Plots in Plotly](plotly-bar-plots.md)
|
||||
- [Pie Charts in Plotly](plotly-pie-charts.md)
|
||||
- [Line Charts in Plotly](plotly-line-charts.md)
|
||||
- [Line Charts in Plotly](plotly-line-charts.md)
|
||||
- [Scatter Plots in Plotly](plotly-scatter-plots.md)
|
||||
|
|
|
@ -0,0 +1,130 @@
|
|||
### 1. Using `plt.subplots()`
|
||||
|
||||
The `plt.subplots()` function is a versatile and easy way to create a grid of subplots. It returns a figure and an array of Axes objects.
|
||||
|
||||
#### Code Explanation
|
||||
|
||||
1. **Import Libraries**:
|
||||
```python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
```
|
||||
|
||||
2. **Generate Sample Data**:
|
||||
```python
|
||||
x = np.linspace(0, 10, 100)
|
||||
y1 = np.sin(x)
|
||||
y2 = np.cos(x)
|
||||
y3 = np.tan(x)
|
||||
```
|
||||
|
||||
3. **Create Subplots**:
|
||||
```python
|
||||
fig, axs = plt.subplots(3, 1, figsize=(8, 12))
|
||||
```
|
||||
|
||||
- `3, 1` indicates a 3-row, 1-column grid.
|
||||
- `figsize` specifies the overall size of the figure.
|
||||
|
||||
4. **Plot Data**:
|
||||
```python
|
||||
axs[0].plot(x, y1, 'r')
|
||||
axs[0].set_title('Sine Function')
|
||||
|
||||
axs[1].plot(x, y2, 'g')
|
||||
axs[1].set_title('Cosine Function')
|
||||
|
||||
axs[2].plot(x, y3, 'b')
|
||||
axs[2].set_title('Tangent Function')
|
||||
```
|
||||
|
||||
5. **Adjust Layout and Show Plot**:
|
||||
```python
|
||||
plt.tight_layout()
|
||||
plt.show()
|
||||
```
|
||||
|
||||
#### Result
|
||||
|
||||
The result will be a figure with three vertically stacked subplots.
|
||||
![subplot Chart](images/subplots.png)
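
For grids with more than one column, `axs` becomes a two-dimensional array indexed as `axs[row, col]`. The sketch below is an illustrative variation on the example above; the 2×2 layout, the two extra curves and the `sharex=True` option are assumptions added for demonstration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# 2 rows x 2 columns; sharex=True makes all subplots use the same x-axis limits
fig, axs = plt.subplots(2, 2, figsize=(8, 6), sharex=True)

axs[0, 0].plot(x, np.sin(x), 'r')
axs[0, 0].set_title('Sine Function')

axs[0, 1].plot(x, np.cos(x), 'g')
axs[0, 1].set_title('Cosine Function')

axs[1, 0].plot(x, np.sin(2 * x), 'b')
axs[1, 0].set_title('Sine of 2x')

axs[1, 1].plot(x, np.cos(2 * x), 'm')
axs[1, 1].set_title('Cosine of 2x')

plt.tight_layout()
```

Call `plt.show()` afterwards to display the figure. `axs.flat` can be used to loop over all four Axes in order.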
|
||||
|
||||
### 2. Using `plt.subplot()`
|
||||
|
||||
The `plt.subplot()` function allows you to add a single subplot at a time to a figure.
|
||||
|
||||
#### Code Explanation
|
||||
|
||||
1. **Import Libraries and Generate Data** (same as above).
|
||||
|
||||
2. **Create Figure and Subplots**:
|
||||
```python
|
||||
plt.figure(figsize=(8, 12))
|
||||
|
||||
plt.subplot(3, 1, 1)
|
||||
plt.plot(x, y1, 'r')
|
||||
plt.title('Sine Function')
|
||||
|
||||
plt.subplot(3, 1, 2)
|
||||
plt.plot(x, y2, 'g')
|
||||
plt.title('Cosine Function')
|
||||
|
||||
plt.subplot(3, 1, 3)
|
||||
plt.plot(x, y3, 'b')
|
||||
plt.title('Tangent Function')
|
||||
```
|
||||
|
||||
3. **Adjust Layout and Show Plot** (same as above).
|
||||
|
||||
#### Result
|
||||
|
||||
The result will be similar to the first method but created using individual subplot commands.
|
||||
|
||||
![subplot Chart](images/subplots.png)
|
||||
|
||||
### 3. Using `GridSpec`
|
||||
|
||||
`GridSpec` allows for more complex subplot layouts.
|
||||
|
||||
#### Code Explanation
|
||||
|
||||
1. **Import Libraries and Generate Data** (same as above).
|
||||
|
||||
2. **Create Figure and GridSpec**:
|
||||
```python
|
||||
from matplotlib.gridspec import GridSpec
|
||||
|
||||
fig = plt.figure(figsize=(8, 12))
|
||||
gs = GridSpec(3, 1, figure=fig)
|
||||
```
|
||||
|
||||
3. **Create Subplots**:
|
||||
```python
|
||||
ax1 = fig.add_subplot(gs[0, 0])
|
||||
ax1.plot(x, y1, 'r')
|
||||
ax1.set_title('Sine Function')
|
||||
|
||||
ax2 = fig.add_subplot(gs[1, 0])
|
||||
ax2.plot(x, y2, 'g')
|
||||
ax2.set_title('Cosine Function')
|
||||
|
||||
ax3 = fig.add_subplot(gs[2, 0])
|
||||
ax3.plot(x, y3, 'b')
|
||||
ax3.set_title('Tangent Function')
|
||||
```
|
||||
|
||||
4. **Adjust Layout and Show Plot** (same as above).
|
||||
|
||||
#### Result
|
||||
|
||||
The result will again be three subplots in a vertical stack, created using the flexible `GridSpec`.
|
||||
|
||||
![subplot Chart](images/subplots.png)
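
The example above uses a plain 3×1 grid, which the other two methods can also produce. Where `GridSpec` differs is that a subplot can span several cells. The following is a minimal sketch of that idea; the 2×2 grid and the variable names `ax_top`, `ax_left` and `ax_right` are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.gridspec import GridSpec

x = np.linspace(0, 10, 100)

fig = plt.figure(figsize=(8, 6))
gs = GridSpec(2, 2, figure=fig)

# Top row: one wide subplot spanning both columns
ax_top = fig.add_subplot(gs[0, :])
ax_top.plot(x, np.sin(x), 'r')
ax_top.set_title('Sine Function (spans both columns)')

# Bottom row: two regular subplots
ax_left = fig.add_subplot(gs[1, 0])
ax_left.plot(x, np.cos(x), 'g')
ax_left.set_title('Cosine Function')

ax_right = fig.add_subplot(gs[1, 1])
ax_right.plot(x, np.sin(2 * x), 'b')
ax_right.set_title('Sine of 2x')

plt.tight_layout()
```

Slicing the `GridSpec` object (here `gs[0, :]`) selects the cells a subplot should cover. Call `plt.show()` to display the figure.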
|
||||
|
||||
### Summary
|
||||
|
||||
- **`plt.subplots()`**: Creates a figure and a grid of subplots in a single call (axes are shared only if `sharex` or `sharey` is set).
|
||||
- **`plt.subplot()`**: Adds individual subplots in a figure.
|
||||
- **`GridSpec`**: Allows for complex and custom subplot layouts.
|
||||
|
||||
By mastering these techniques, you can create detailed and organized visualizations, enhancing the clarity and comprehension of your data presentations.
|
|
@ -0,0 +1,277 @@
|
|||
# Violin Plots in Matplotlib
|
||||
|
||||
A violin plot is a method of plotting numeric data together with an estimate of its probability density. It combines a box plot with a kernel density plot, providing a richer visualization of the distribution of the data. In a violin plot, each group of data is represented by a kernel density estimate that is mirrored and joined to form a symmetrical shape resembling a violin, hence the name.
|
||||
|
||||
Violin plots are particularly useful when comparing distributions across different categories or groups. They provide insights into the shape, spread, and central tendency of the data, allowing for a more comprehensive understanding than traditional box plots.
|
||||
|
||||
Compared to box plots, violin plots offer a more detailed representation of the distribution: they combine summary statistics with a kernel density estimate, handle unequal sample sizes effectively, allow easy comparison across groups, and make multiple modes easier to identify.
|
||||
|
||||
![Violen plot 1](images/violen-plots1.webp)
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before creating violin charts in matplotlib you must ensure that you have Python as well as Matplotlib installed on your system.
|
||||
|
||||
## Creating a simple Violin Plot with `violinplot()` method
|
||||
|
||||
A basic violin plot can be created with `violinplot()` method in `matplotlib.pyplot`.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
# Creating dataset
|
||||
data = [np.random.normal(0, std, 100) for std in range(1, 5)]
|
||||
|
||||
# Creating Plot
|
||||
plt.violinplot(data)
|
||||
|
||||
# Show plot
|
||||
plt.show()
|
||||
|
||||
```
|
||||
|
||||
When executed, this would show the following violin plot:
|
||||
|
||||
|
||||
![Basic violin plot](images/violinplotnocolor.png)
|
||||
|
||||
|
||||
The `violinplot()` function in `matplotlib.pyplot` creates a violin plot, which is a graphical representation of the distribution of data across different levels of a categorical variable. Here's a breakdown of its usage:
|
||||
|
||||
```Python
|
||||
plt.violinplot(data, showmeans=False, showextrema=False)
|
||||
```
|
||||
|
||||
- `data`: This parameter represents the dataset used to create the violin plot. It can be a single array or a sequence of arrays.
|
||||
|
||||
- `showmeans`: This optional parameter, if set to True, marks the mean of each dataset with a line on the violin plot. Default is False.
|
||||
|
||||
- `showextrema`: This optional parameter, if set to True, marks the minimum and maximum values with lines on the violin plot. Default is True.
|
||||
|
||||
Additional parameters can be used to further customize the appearance of the violin plot, such as setting custom colors, adding labels, and adjusting the orientation. For instance:
|
||||
|
||||
```Python
|
||||
plt.violinplot(data, showmedians=True, showmeans=True, showextrema=True, vert=False, widths=0.9, bw_method=0.5)
|
||||
```
|
||||
- `showmedians`: Setting this parameter to True displays the median value as a line on the violin plot. Default is False.
|
||||
|
||||
- `vert`: This parameter determines the orientation of the violin plot. Setting it to False creates a horizontal violin plot. Default is True.
|
||||
|
||||
- `widths`: This parameter sets the width of the violins. Default is 0.5.
|
||||
|
||||
- `bw_method`: This parameter determines how the bandwidth for the kernel density estimation is calculated. It can be `'scott'`, `'silverman'`, a scalar or a callable. By default, Scott's rule is used.
|
||||
|
||||
Using these parameters, you can customize the violin plot according to your requirements, enhancing its readability and visual appeal.
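
`violinplot()` also returns a dictionary describing what it drew, which is useful for later styling. The sketch below inspects that dictionary; the seeded random generator and the chosen colors are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the figure is reproducible
data = [rng.normal(0, std, 100) for std in range(1, 5)]

fig, ax = plt.subplots()
parts = ax.violinplot(data, showmeans=True, showmedians=True, showextrema=True)

# 'bodies' is a list with one filled shape per dataset
print(len(parts['bodies']))
# The remaining entries are line collections for the requested statistics
print(sorted(parts.keys()))

# Example: draw the mean lines in red and the median lines in black
parts['cmeans'].set_color('red')
parts['cmedians'].set_color('black')
```

Entries such as `cmeans` or `cmedians` are only present when the corresponding `show...` option is enabled. Call `plt.show()` to display the figure.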
|
||||
|
||||
|
||||
## Customizing Violin Plots in Matplotlib
|
||||
|
||||
When customizing violin plots in Matplotlib, using `matplotlib.pyplot.subplots()` provides greater flexibility for applying customizations.
|
||||
|
||||
### Coloring Violin Plots
|
||||
|
||||
You can assign custom colors to the violins by calling `set_facecolor()` on each violin body returned by the `violinplot()` method.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
# Creating dataset
|
||||
data = [np.random.normal(0, std, 100) for std in range(1, 5)]
|
||||
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange']
|
||||
|
||||
# Creating plot using matplotlib.pyplot.subplots()
|
||||
fig, ax = plt.subplots()
|
||||
|
||||
# Customizing colors of violins
|
||||
for i in range(len(data)):
|
||||
    parts = ax.violinplot(data[i], positions=[i], vert=False, showmeans=False, showextrema=False, showmedians=True, widths=0.9, bw_method=0.5)
    for pc in parts['bodies']:
        pc.set_facecolor(colors[i])
|
||||
|
||||
# Show plot
|
||||
plt.show()
|
||||
```
|
||||
This code snippet creates a violin plot with custom colors assigned to each violin, enhancing the visual appeal and clarity of the plot.
|
||||
|
||||
|
||||
![Coloring violin](images/violenplotnormal.png)
|
||||
|
||||
|
||||
When customizing violin plots using `matplotlib.pyplot.subplots()`, you obtain a `Figure` object `fig` and an `Axes` object `ax`, allowing for extensive customization. Each violin plot consists of several components, including the violin bodies and, optionally, lines marking the medians, means and extrema. `violinplot()` returns these components in a dictionary, so you can style them directly or draw additional markers on the Axes object.
|
||||
|
||||
- Here's an example of how to customize violin plots:
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
# Creating dataset
|
||||
data = [np.random.normal(0, std, 100) for std in range(1, 5)]
|
||||
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange']
|
||||
|
||||
# Creating plot using matplotlib.pyplot.subplots()
|
||||
fig, ax = plt.subplots()
|
||||
|
||||
# Creating violin plots
|
||||
parts = ax.violinplot(data, showmeans=False, showextrema=False, showmedians=True, widths=0.9, bw_method=0.5)
|
||||
|
||||
# Customizing colors of violins
|
||||
for i, pc in enumerate(parts['bodies']):
|
||||
    pc.set_facecolor(colors[i])
|
||||
|
||||
# Customizing median lines
parts['cmedians'].set_color('black')

# Drawing quartile lines (computed from the data; the violins sit at x = 1, 2, 3, ...)
for pos, values in enumerate(data, start=1):
    q1, q3 = np.percentile(values, [25, 75])
    ax.vlines(pos, q1, q3, colors='black', linestyles='--', linewidth=2)

# Adding mean markers
for pos, values in enumerate(data, start=1):
    ax.scatter(pos, np.mean(values), marker='o', color='black', zorder=3)
|
||||
|
||||
# Customizing axes labels
|
||||
ax.set_xlabel('X Label')
|
||||
ax.set_ylabel('Y Label')
|
||||
|
||||
# Adding title
|
||||
ax.set_title('Customized Violin Plot')
|
||||
|
||||
# Show plot
|
||||
plt.show()
|
||||
```
|
||||
|
||||
![Customizing violin](images/violin-plot4.png)
|
||||
|
||||
In this example, we customize various components of the violin plot, such as colors, line styles, and markers, to enhance its visual appeal and clarity. Additionally, we modify the axes labels and add a title to provide context to the plot.
|
||||
|
||||
### Adding Hatching to Violin Plots
|
||||
|
||||
You can add hatching patterns to the violin plots to enhance their visual distinction. This can be achieved by calling `set_hatch()` on each violin body returned by the `violinplot()` function.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
# Creating dataset
|
||||
data = [np.random.normal(0, std, 100) for std in range(1, 5)]
|
||||
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange']
|
||||
hatches = ['/', '\\', '|', '-']
|
||||
|
||||
# Creating plot using matplotlib.pyplot.subplots()
|
||||
fig, ax = plt.subplots()
|
||||
|
||||
# Creating violin plots with hatching
|
||||
parts = ax.violinplot(data, showmeans=False, showextrema=False, showmedians=True, widths=0.9, bw_method=0.5)
|
||||
|
||||
for i, pc in enumerate(parts['bodies']):
|
||||
    pc.set_facecolor(colors[i])
    pc.set_hatch(hatches[i])
|
||||
|
||||
# Show plot
|
||||
plt.show()
|
||||
```
|
||||
|
||||
![violin_hatching](images/violin-hatching.png)
|
||||
|
||||
|
||||
|
||||
### Labeling Violin Plots
|
||||
|
||||
You can add labels to violin plots to provide additional information about the data. This can be achieved by calling `set_label()` on each violin body returned by the `violinplot()` function and then adding a legend.
|
||||
|
||||
An example is shown here:
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
# Creating dataset
|
||||
data = [np.random.normal(0, std, 100) for std in range(1, 5)]
|
||||
labels = ['Group {}'.format(i) for i in range(1, 5)]
|
||||
|
||||
# Creating plot using matplotlib.pyplot.subplots()
|
||||
fig, ax = plt.subplots()
|
||||
|
||||
# Creating violin plots
|
||||
parts = ax.violinplot(data, showmeans=False, showextrema=False, showmedians=True, widths=0.9, bw_method=0.5)
|
||||
|
||||
# Adding labels to violin plots
|
||||
for i, label in enumerate(labels):
|
||||
    parts['bodies'][i].set_label(label)
|
||||
|
||||
# Show plot
|
||||
plt.legend()
|
||||
plt.show()
|
||||
```
|
||||
![violin_labeling](images/violin-labelling.png)
|
||||
|
||||
In this example, each violin plot is labeled according to its group, providing context to the viewer.
|
||||
These customizations can be combined and further refined to create violin plots that effectively convey the underlying data distributions.
|
||||
|
||||
### Stacked Violin Plots
|
||||
|
||||
Stacked violin plots are useful when you want to compare the distribution of a single variable across different categories or groups. In the example below, the violins for the groups are placed next to each other on a shared value axis, allowing for easy visual comparison.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
# Generating sample data
|
||||
np.random.seed(0)
|
||||
data1 = np.random.normal(0, 1, 100)
|
||||
data2 = np.random.normal(2, 1, 100)
|
||||
data3 = np.random.normal(1, 1, 100)
|
||||
|
||||
# Creating a stacked violin plot
|
||||
plt.violinplot([data1, data2, data3], showmedians=True)
|
||||
|
||||
# Adding labels to x-axis ticks
|
||||
plt.xticks([1, 2, 3], ['Group 1', 'Group 2', 'Group 3'])
|
||||
|
||||
# Adding title and labels
|
||||
plt.title('Stacked Violin Plot')
|
||||
plt.xlabel('Groups')
|
||||
plt.ylabel('Values')
|
||||
|
||||
# Displaying the plot
|
||||
plt.show()
|
||||
```
|
||||
![stacked violin plots](images/stacked_violin_plots.png)
|
||||
|
||||
|
||||
### Split Violin Plots
|
||||
|
||||
`Split violin plots` are effective for comparing the distribution of a `single variable` across `two` different categories or groups. In a split violin plot, each violin is split into two parts representing the distributions of the variable for each category.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
# Generating sample data
|
||||
np.random.seed(0)
|
||||
data_male = np.random.normal(0, 1, 100)
|
||||
data_female = np.random.normal(2, 1, 100)
|
||||
|
||||
# Creating a split violin plot
|
||||
plt.violinplot([data_male, data_female], showmedians=True)
|
||||
|
||||
# Adding labels to x-axis ticks
|
||||
plt.xticks([1, 2], ['Male', 'Female'])
|
||||
|
||||
# Adding title and labels
|
||||
plt.title('Split Violin Plot')
|
||||
plt.xlabel('Gender')
|
||||
plt.ylabel('Values')
|
||||
|
||||
# Displaying the plot
|
||||
plt.show()
|
||||
```
|
||||
|
||||
![Shadow](images/split-violin-plot.png)
|
||||
|
||||
In both examples, we use Matplotlib's `violinplot()` function to create the violin plots. Note that `violinplot()` draws a full, symmetric violin for each dataset, so the split example above shows the two groups as separate violins rather than as two halves of a single violin.
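
One common way to obtain two halves that share a position is to draw both violins at the same x value and clip each body to one side. The sketch below illustrates that workaround; it is not a dedicated Matplotlib feature, and the variable names and colors are assumptions made for this example.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data_male = rng.normal(0, 1, 100)
data_female = rng.normal(2, 1, 100)

fig, ax = plt.subplots()
position = 1  # both violins are drawn at the same x position

left = ax.violinplot(data_male, positions=[position], showextrema=False)
right = ax.violinplot(data_female, positions=[position], showextrema=False)

# Keep only the left half of the first violin
for body in left['bodies']:
    verts = body.get_paths()[0].vertices
    verts[:, 0] = np.clip(verts[:, 0], -np.inf, position)
    body.set_facecolor('tab:blue')

# Keep only the right half of the second violin
for body in right['bodies']:
    verts = body.get_paths()[0].vertices
    verts[:, 0] = np.clip(verts[:, 0], position, np.inf)
    body.set_facecolor('tab:orange')

ax.set_xticks([position])
ax.set_xticklabels(['Male / Female'])
ax.set_ylabel('Values')
ax.set_title('Split Violin Plot')
```

Clipping the x coordinates of each body's outline at `position` removes the mirrored half, so the two distributions can be compared back to back. Call `plt.show()` to display the figure.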
|
||||
|
|
@ -0,0 +1,198 @@
|
|||
# Scatter Plots in Plotly
|
||||
|
||||
* A scatter plot is a type of data visualization that uses dots to show values for two variables, with one variable on the x-axis and the other on the y-axis. It's useful for identifying relationships, trends, and correlations, as well as spotting clusters and outliers.
|
||||
* The dots on the plot show how the variables are related. A scatter plot is made with the plotly library's `px.scatter()`.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before creating Scatter plots in Plotly you must ensure that you have Python, Plotly and Pandas installed on your system.
|
||||
|
||||
## Introduction
|
||||
|
||||
There are various ways to create Scatter plots in `plotly`. One of the most prominent and easiest is `plotly.express`. Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on a variety of types of data and produces easy-to-style figures. Alternatively, you can use `plotly.graph_objects` to create various plots.
|
||||
|
||||
Here, we'll be using `plotly.express` to create the Scatter Plots. Also we'll be converting our datasets into pandas DataFrames which makes it extremely convenient and easy to create charts.
|
||||
|
||||
Also, note that when you run the code from a plain Python file, the output plot will be shown in your **browser**, rather than in a pop-up window as in matplotlib. If you do not want that, it is **recommended to create the plots in a notebook (like Jupyter)**. For this, install the additional library `nbformat`. This way you can see the output in the notebook itself, and can also export it to formats such as png or jpg.
|
||||
|
||||
## Creating a simple Scatter Plot using `plotly.express.scatter`
|
||||
|
||||
In `plotly.express.scatter`, each data point is represented as a marker point, whose location is given by the x and y columns.
|
||||
|
||||
```Python
|
||||
import plotly.express as px
|
||||
import pandas as pd
|
||||
|
||||
# Creating dataset
|
||||
years = ['1998', '1999', '2000', '2001', '2002']
|
||||
num_of_cars_sold = [200, 300, 500, 700, 1000]
|
||||
|
||||
# Converting dataset to pandas DataFrame
|
||||
dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
|
||||
df = pd.DataFrame(dataset)
|
||||
|
||||
# Creating scatter plot
|
||||
fig = px.scatter(df, x='Years', y='Number of Cars sold')
|
||||
|
||||
# Showing plot
|
||||
fig.show()
|
||||
```
|
||||
![Basic Scatter Plot](images/plotly-basic-scatter-plot.png)
|
||||
|
||||
Here, we are first creating the dataset and converting it into a pandas DataFrame using a dictionary, with its keys being DataFrame columns. Next, we are plotting the scatter plot by using `px.scatter`. In the `x` and `y` parameters, we have to specify a column name in the DataFrame.
|
||||
|
||||
`px.scatter(df, x='Years', y='Number of Cars sold')` is used to specify that the scatter plot is to be plotted by taking the values from column `Years` for the x-axis and the values from column `Number of Cars sold` for the y-axis.
|
||||
|
||||
Note: When you generate the image using the above code, it will show you an interactive plot. If you want an image, you can download it from the interactive plot itself.
|
||||
|
||||
## Customizing Scatter Plots
|
||||
|
||||
### Adding title to the plot
|
||||
|
||||
Simply pass the title of your plot as a parameter in `px.scatter`.
|
||||
|
||||
```Python
|
||||
import plotly.express as px
|
||||
import pandas as pd
|
||||
|
||||
# Creating dataset
|
||||
years = ['1998', '1999', '2000', '2001', '2002']
|
||||
num_of_cars_sold = [200, 300, 500, 700, 1000]
|
||||
|
||||
# Converting dataset to pandas DataFrame
|
||||
dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
|
||||
df = pd.DataFrame(dataset)
|
||||
|
||||
# Creating scatter plot
|
||||
fig = px.scatter(df, x='Years', y='Number of Cars sold' ,title='Number of cars sold in various years')
|
||||
|
||||
# Showing plot
|
||||
fig.show()
|
||||
```
|
||||
![Scatter Plot title](images/plotly-scatter-title.png)
|
||||
|
||||
### Adding colors and legends
|
||||
|
||||
* To give the points different colors, simply pass the column name of the x-axis, or a custom column that groups the points, in the `color` parameter.
|
||||
* There are a lot of beautiful color scales available in plotly, which can be found here: [plotly color scales](https://plotly.com/python/builtin-colorscales/). Choose your favourite color scale and apply it like this:
|
||||
|
||||
```Python
|
||||
import plotly.express as px
|
||||
import pandas as pd
|
||||
|
||||
# Creating dataset
|
||||
flowers = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
|
||||
petals = [11,9,17,4,7]
|
||||
|
||||
# Converting dataset to pandas DataFrame
|
||||
dataset = {'flowers':flowers, 'petals':petals}
|
||||
df = pd.DataFrame(dataset)
|
||||
|
||||
# Creating scatter plot
fig = px.scatter(df, x='flowers', y='petals',
                 color='flowers',
                 title='Number of Petals in Flowers',
                 color_discrete_sequence=px.colors.sequential.Agsunset)
|
||||
|
||||
# Showing plot
|
||||
fig.show()
|
||||
```
|
||||
![Scatter Plot Colors-1](images/plotly-scatter-colour.png)
|
||||
|
||||
You can also set a custom color for each label by passing a dictionary (map) to `color_discrete_map`, like this:
|
||||
|
||||
```Python
|
||||
import plotly.express as px
|
||||
import pandas as pd
|
||||
|
||||
# Creating dataset
|
||||
years = ['1998', '1999', '2000', '2001', '2002']
|
||||
num_of_cars_sold = [200, 300, 500, 700, 1000]
|
||||
|
||||
# Converting dataset to pandas DataFrame
|
||||
dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
|
||||
df = pd.DataFrame(dataset)
|
||||
|
||||
# Creating scatter plot
|
||||
fig = px.scatter(df, x='Years',
|
||||
y='Number of Cars sold' ,
|
||||
title='Number of cars sold in various years',
|
||||
color='Years',
|
||||
color_discrete_map={'1998':'red',
|
||||
'1999':'magenta',
|
||||
'2000':'green',
|
||||
'2001':'yellow',
|
||||
'2002':'royalblue'})
|
||||
|
||||
# Showing plot
|
||||
fig.show()
|
||||
```
|
||||
![Scatter Plot Colors-2](images/plotly-scatter-colour-2.png)
|
||||
|
||||
### Setting Size of Scatter
|
||||
|
||||
We may want to vary the size of the markers to make differences between categories more visible. This can be done by using the `size` parameter in `px.scatter`, where we specify a column in the DataFrame that determines the size of each scatter point.
|
||||
|
||||
```Python
|
||||
import plotly.express as px
|
||||
import pandas as pd
|
||||
|
||||
# Creating dataset
|
||||
years = ['1998', '1999', '2000', '2001', '2002']
|
||||
num_of_cars_sold = [200, 300, 500, 700, 1000]
|
||||
|
||||
# Converting dataset to pandas DataFrame
|
||||
dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
|
||||
df = pd.DataFrame(dataset)
|
||||
|
||||
# Creating scatter plot
|
||||
fig = px.scatter(df, x='Years',
|
||||
y='Number of Cars sold' ,
|
||||
title='Number of cars sold in various years',
|
||||
color='Years',
|
||||
color_discrete_map={'1998':'red',
|
||||
'1999':'magenta',
|
||||
'2000':'green',
|
||||
'2001':'yellow',
|
||||
'2002':'royalblue'},
|
||||
size='Number of Cars sold')
|
||||
|
||||
# Showing plot
|
||||
fig.show()
|
||||
```
|
||||
![Scatter plot size](images/plotly-scatter-size.png)
|
||||
|
||||
### Giving a hover effect
|
||||
|
||||
You can use the `hover_name` and `hover_data` parameters in `px.scatter`. The `hover_name` parameter specifies the column to use as the title of the hover text, and the `hover_data` parameter allows you to specify additional data to display when hovering over a point.
|
||||
|
||||
```Python
|
||||
import plotly.express as px
|
||||
import pandas as pd
|
||||
|
||||
# Creating dataset
|
||||
years = ['1998', '1999', '2000', '2001', '2002']
|
||||
num_of_cars_sold = [200, 300, 500, 700, 1000]
|
||||
|
||||
# Converting dataset to pandas DataFrame
|
||||
dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
|
||||
df = pd.DataFrame(dataset)
|
||||
|
||||
# Creating scatter plot
|
||||
fig = px.scatter(df, x='Years',
|
||||
y='Number of Cars sold' ,
|
||||
title='Number of cars sold in various years',
|
||||
color='Years',
|
||||
color_discrete_map={'1998':'red',
|
||||
'1999':'magenta',
|
||||
'2000':'green',
|
||||
'2001':'yellow',
|
||||
'2002':'royalblue'},
|
||||
size='Number of Cars sold',
|
||||
hover_name='Years',
|
||||
hover_data={'Number of Cars sold': True})
|
||||
|
||||
# Showing plot
|
||||
fig.show()
|
||||
```
|
||||
![Scatter Hover](images/plotly-scatter-hover.png)
|
||||
|