Merge branch 'main' into introduction-to-line-charts-in-plotly

pull/1335/head
Ashita Prasad 2024-07-04 23:11:15 +05:30 committed by GitHub
commit f93a35e8f8
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
31 changed files with 1226 additions and 95 deletions


@ -1,36 +1,144 @@
## Regular Expressions in Python
Regular expressions (regex) are a versatile tool for matching patterns in strings and manipulating text. In Python, the `re` module provides support for working with regular expressions, enabling efficient text processing and validation.
## 1. Introduction to Regular Expressions
A regular expression is a sequence of characters defining a search pattern. Common use cases include validating input, searching within text, and extracting
specific patterns.
## 2. Basic Syntax
Literal Characters: Match exact characters (e.g., abc matches "abc").
Metacharacters: Special characters like `.`, `*`, `?`, `+`, `^`, `$`, `[ ]`, and `|` used to build patterns.
**Common Metacharacters:**
- `.`: Any character except newline.
- `^`: Start of the string.
- `$`: End of the string.
- `*`: 0 or more repetitions.
- `+`: 1 or more repetitions.
- `?`: 0 or 1 repetition.
- `[]`: Any one character inside brackets (e.g., `[a-z]`).
- `|`: Either the pattern before or after.
- `\`: Escapes the character following it, removing its special meaning.
- `{}`: Indicates the number of occurrences of the preceding pattern to match.
- `()`: Encloses a group within a regex.
Examples:
1. `.`
```python
import re
pattern = r'c.t'
text = 'cat cot cut cit'
matches = re.findall(pattern, text)
print(matches) # Output: ['cat', 'cot', 'cut', 'cit']
```
2. `^`
```python
pattern = r'^Hello'
text = 'Hello, world!'
match = re.search(pattern, text)
print(match.group() if match else 'No match') # Output: 'Hello'
```
3. `$`
```python
pattern = r'world!$'
text = 'Hello, world!'
match = re.search(pattern, text)
print(match.group() if match else 'No match') # Output: 'world!'
```
4. `*`
```python
pattern = r'ab*'
text = 'a ab abb abbb'
matches = re.findall(pattern, text)
print(matches) # Output: ['a', 'ab', 'abb', 'abbb']
```
5. `+`
```python
pattern = r'ab+'
text = 'a ab abb abbb'
matches = re.findall(pattern, text)
print(matches) # Output: ['ab', 'abb', 'abbb']
```
6. `?`
```python
pattern = r'ab?'
text = 'a ab abb abbb'
matches = re.findall(pattern, text)
print(matches) # Output: ['a', 'ab', 'ab', 'ab']
```
7. `[]`
```python
pattern = r'[aeiou]'
text = 'hello world'
matches = re.findall(pattern, text)
print(matches) # Output: ['e', 'o', 'o']
```
8. `|`
```python
pattern = r'cat|dog'
text = 'I have a cat and a dog.'
matches = re.findall(pattern, text)
print(matches) # Output: ['cat', 'dog']
```
9. `\`
```python
pattern = r'\$100'
text = 'The price is $100.'
match = re.search(pattern, text)
print(match.group() if match else 'No match') # Output: '$100'
```
10. `{}`
```python
pattern = r'\d{3}'
text = 'My number is 123456'
matches = re.findall(pattern, text)
print(matches) # Output: ['123', '456']
```
11. `()`
```python
pattern = r'(cat|dog)'
text = 'I have a cat and a dog.'
matches = re.findall(pattern, text)
print(matches) # Output: ['cat', 'dog']
```
## 3. Using the re Module
**Key functions in the re module:**
- `re.match()`: Checks for a match at the beginning of the string.
- `re.search()`: Searches for a match anywhere in the string.
- `re.findall()`: Returns a list of all non-overlapping matches.
- `re.sub()`: Replaces matches with a specified string.
- `re.split()`: Returns a list where the string has been split at each match.
- `re.escape()`: Escapes all special characters in a string so it can be matched literally.
Examples:
```python
import re
@ -45,12 +153,20 @@ print(re.findall(r'\d+', 'abc123def456')) # Output: ['123', '456']
# Substitute matches
print(re.sub(r'\d+', '#', 'abc123def456')) # Output: abc#def#
# Split the string at each match
txt = 'The Donkey in the Town'
print(re.split(r'\s', txt)) # Output: ['The', 'Donkey', 'in', 'the', 'Town']
# Escape special characters
print(re.escape("We are good to go")) # Output: We\ are\ good\ to\ go
```
## 4. Compiling Regular Expressions
Compiling a pattern once with `re.compile()` and reusing the resulting pattern object can improve performance when the same pattern is applied repeatedly.
Example:
```python
import re
@ -58,12 +174,15 @@ pattern = re.compile(r'\d+')
print(pattern.match('123abc').group()) # Output: 123
print(pattern.search('abc123').group()) # Output: 123
print(pattern.findall('abc123def456')) # Output: ['123', '456']
```
## 5. Groups and Capturing
Parentheses () group and capture parts of the match.
Example:
```python
import re
@ -76,21 +195,46 @@ if match:
```
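As a minimal sketch of how capturing groups behave, the snippet below pulls the parts of a date out of a string. The input text, the date format, and the group names `year` and `month` are assumptions made for illustration only.

```python
import re

# Illustrative input; the date format is an assumption for this sketch.
text = "Released on 2024-07-04"

# Three capturing groups: year, month, day.
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
if match:
    print(match.group(0))   # the whole match
    print(match.group(1))   # the first group (year)
    print(match.groups())   # all groups as a tuple

# Named groups (?P<name>...) make the intent of each group explicit.
named = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", text)
if named:
    print(named.group("year"), named.group("month"))
```

`group(0)` always refers to the entire match, while numbered or named groups refer only to the text matched inside the corresponding parentheses.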
## 6. Special Sequences
Special sequences are shortcuts for common patterns:
- `\A`: Matches only at the beginning of the string.
- `\b`: Matches at a word boundary, i.e. at the beginning or at the end of a word.
- `\B`: Matches at a position that is NOT at the beginning or end of a word.
- `\d`: Any digit.
- `\D`: Any non-digit.
- `\w`: Any word character (letters, digits, and underscore).
- `\W`: Any non-word character.
- `\s`: Any whitespace character.
- `\S`: Any non-whitespace character.
- `\Z`: Matches only at the end of the string.
Example:
```python
import re
print(re.search(r'\w+@\w+\.\w+', 'Contact: support@example.com').group()) # Output: support@example.com
```
## 7. Sets
A set is a group of characters inside a pair of square brackets `[]`, with a special meaning:
- `[arn]`: Matches any one of the specified characters (a, r, or n).
- `[a-n]`: Matches any lowercase character alphabetically between a and n.
- `[^arn]`: Matches any character EXCEPT a, r, and n.
- `[0123]`: Matches any one of the specified digits (0, 1, 2, or 3).
- `[0-9]`: Matches any digit between 0 and 9.
- `[0-5][0-9]`: Matches any two-digit number from 00 to 59.
- `[a-zA-Z]`: Matches any character alphabetically between a and z, lowercase OR uppercase.
- `[+]`: Matches a literal `+` character. Inside a set, `+`, `*`, `.`, `|`, `()`, `$`, and `{}` have no special meaning.
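The set rules above can be tried out with a short sketch. The sample strings here are made up for illustration and do not come from the original examples.

```python
import re

# Illustrative strings chosen for this sketch.
word = "banana"
text = "Order 66 shipped at 09:45, a+b=c"

in_set = re.findall(r"[arn]", word)        # any of a, r, n
not_in_set = re.findall(r"[^arn]", word)   # anything except a, r, n
minutes = re.findall(r"[0-5][0-9]", text)  # two-digit numbers from 00 to 59
plus_signs = re.findall(r"[+]", text)      # a literal plus sign

print(in_set)
print(not_in_set)
print(minutes)
print(plus_signs)
```

Note that `66` is not matched by `[0-5][0-9]` because its first digit falls outside the range `0-5`.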
## Summary
Regular expressions (regex) are a powerful tool for text processing in Python, offering a flexible way to match, search, and manipulate text patterns. The `re` module provides a comprehensive set of functions and metacharacters to tackle complex text processing tasks.
With regex, you can:
1. Match patterns: Use metacharacters like `.`, `*`, `?`, and `{}` to match specific patterns in text.
2. Search text: Employ functions like `re.search()` and `re.match()` to find occurrences of patterns in text.
3. Manipulate text: Utilize functions like `re.sub()` to replace patterns with new text.


@ -0,0 +1,216 @@
# Deque in Python
## Definition
A deque, short for double-ended queue, is an ordered collection of items that allows rapid insertion and deletion at both ends.
## Syntax
In Python, deques are implemented in the collections module:
```py
from collections import deque
# Creating a deque
d = deque(iterable) # Create deque from iterable (optional)
```
## Operations
1. **Appending Elements**:
- append(x): Adds element x to the right end of the deque.
- appendleft(x): Adds element x to the left end of the deque.
### Program
```py
from collections import deque
# Initialize a deque
d = deque([1, 2, 3, 4, 5])
print("Initial deque:", d)
# Append elements
d.append(6)
print("After append(6):", d)
# Append left
d.appendleft(0)
print("After appendleft(0):", d)
```
### Output
```py
Initial deque: deque([1, 2, 3, 4, 5])
After append(6): deque([1, 2, 3, 4, 5, 6])
After appendleft(0): deque([0, 1, 2, 3, 4, 5, 6])
```
2. **Removing Elements**:
- pop(): Removes and returns the rightmost element.
- popleft(): Removes and returns the leftmost element.
### Program
```py
from collections import deque
# Initialize a deque
d = deque([1, 2, 3, 4, 5])
print("Initial deque:", d)
# Pop from the right end
rightmost = d.pop()
print("Popped from right end:", rightmost)
print("Deque after pop():", d)
# Pop from the left end
leftmost = d.popleft()
print("Popped from left end:", leftmost)
print("Deque after popleft():", d)
```
### Output
```py
Initial deque: deque([1, 2, 3, 4, 5])
Popped from right end: 5
Deque after pop(): deque([1, 2, 3, 4])
Popped from left end: 1
Deque after popleft(): deque([2, 3, 4])
```
3. **Accessing Elements**:
- deque[index]: Accesses element at index (O(1) at either end, but O(n) toward the middle).
### Program
```py
from collections import deque
# Initialize a deque
d = deque([1, 2, 3, 4, 5])
print("Initial deque:", d)
# Accessing elements
print("Element at index 2:", d[2])
```
### Output
```py
Initial deque: deque([1, 2, 3, 4, 5])
Element at index 2: 3
```
4. **Other Operations**:
- extend(iterable): Extends deque by appending elements from iterable.
- extendleft(iterable): Extends deque by appending elements from iterable to the left, one at a time, so they end up in reverse order.
- rotate(n): Rotates deque n steps to the right (negative n rotates left).
### Program
```py
from collections import deque
# Initialize a deque
d = deque([1, 2, 3, 4, 5])
print("Initial deque:", d)
# Extend deque
d.extend([6, 7, 8])
print("After extend([6, 7, 8]):", d)
# Extend left
d.extendleft([-1, 0])
print("After extendleft([-1, 0]):", d)
# Rotate deque
d.rotate(2)
print("After rotate(2):", d)
# Rotate left
d.rotate(-3)
print("After rotate(-3):", d)
```
### Output
```py
Initial deque: deque([1, 2, 3, 4, 5])
After extend([6, 7, 8]): deque([1, 2, 3, 4, 5, 6, 7, 8])
After extendleft([-1, 0]): deque([0, -1, 1, 2, 3, 4, 5, 6, 7, 8])
After rotate(2): deque([7, 8, 0, -1, 1, 2, 3, 4, 5, 6])
After rotate(-3): deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 0])
```
## Example
### 1. Finding Maximum in Sliding Window
```py
from collections import deque
def max_sliding_window(nums, k):
    if not nums:
        return []
    d = deque()
    result = []
    for i, num in enumerate(nums):
        # Remove elements from deque that are out of the current window
        if d and d[0] <= i - k:
            d.popleft()
        # Remove elements from deque smaller than the current element
        while d and nums[d[-1]] <= num:
            d.pop()
        d.append(i)
        # Add maximum for current window
        if i >= k - 1:
            result.append(nums[d[0]])
    return result
# Example usage:
nums = [1, 3, -1, -3, 5, 3, 6, 7]
k = 3
print("Maximums in sliding window of size", k, "are:", max_sliding_window(nums, k))
```
### Output
```py
Maximums in sliding window of size 3 are: [3, 3, 5, 5, 6, 7]
```
## Applications
- **Efficient Queues and Stacks**: Deques allow fast O(1) append and pop operations from both ends,
making them ideal for implementing queues and stacks.
- **Sliding Window Maximum/Minimum**: Used in algorithms that require efficient windowed
computations.
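The queue and stack usage mentioned above can be sketched as follows. The job names and values are placeholders chosen for illustration.

```python
from collections import deque

# FIFO queue: enqueue on the right, dequeue from the left.
queue = deque()
for job in ["a", "b", "c"]:
    queue.append(job)
first_out = queue.popleft()   # the oldest item leaves first

# LIFO stack: push and pop on the same (right) end.
stack = deque()
for item in [1, 2, 3]:
    stack.append(item)
last_in = stack.pop()         # the newest item leaves first

print(first_out, list(queue))
print(last_in, list(stack))
```

Using `popleft()` on a deque avoids the O(n) cost of `list.pop(0)`, which is the main reason to prefer a deque for queue-like workloads.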
## Advantages
- Efficiency: O(1) time complexity for append and pop operations from both ends.
- Versatility: Can function both as a queue and as a stack.
- Flexible: Supports rotation efficiently; slicing is not supported directly, though `itertools.islice()` can be used to take a range of elements.
## Disadvantages
- Memory Usage: Can require more memory than simple lists due to the overhead of managing linked blocks of elements.
## Conclusion
- Deques in Python, provided by the `deque` class in the `collections` module, offer efficient double-ended queue
operations with O(1) time complexity for append and pop operations on both ends. They are versatile
data structures suitable for implementing queues, stacks, and more complex algorithms requiring
efficient manipulation of elements at both ends.
- While deques excel in scenarios requiring fast append and pop operations from either end, they can
consume more memory than simple lists because CPython implements them as a doubly linked list of fixed-size blocks.
However, their flexibility and efficiency make them invaluable for various programming tasks and
algorithmic solutions.


@ -22,4 +22,5 @@
- [AVL Trees](avl-trees.md)
- [Splay Trees](splay-trees.md)
- [Dijkstra's Algorithm](dijkstra.md)
- [Deque](deque.md)
- [Tree Traversals](tree-traversal.md)


@ -1,6 +1,6 @@
# Linked List Data Structure
A linked list is a linear data structure that can be defined as a collection of objects called nodes, which are stored at arbitrary (non-contiguous) locations in memory.
A node contains two fields: the data stored at that node and a pointer that holds the address of the next node in memory.
The last element in a linked list features a null pointer.
@ -36,10 +36,10 @@ The smallest Unit: Node
Now, we will see the types of linked list.
There are mainly four types of linked list:
1. Singly linked list
2. Doubly linked list
3. Circular linked list
4. Doubly circular linked list
## 1. Singly linked list.
@ -160,6 +160,18 @@ check the list is empty otherwise shift the head to next node.
temp.next = None # Remove the last node by setting the next pointer of the second-last node to None
```
### Reversing the linked list
```python
    def reverseList(self):
        prev = None
        temp = self.head
        while temp:
            nextNode = temp.next  # Store the next node
            temp.next = prev      # Reverse the pointer of the current node
            prev = temp           # Move prev one step forward
            temp = nextNode       # Move temp one step forward
        self.head = prev          # Update the head pointer to the last node
```
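To see the reversal run end to end, here is a self-contained sketch. The `Node` class and the `insertAtEnd` and `toList` helpers are assumptions written for this example only (the full class from the tutorial is not shown here); `reverseList` follows the logic above.

```python
class Node:
    """A single node holding a value and a pointer to the next node."""
    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    def __init__(self):
        self.head = None

    def insertAtEnd(self, data):
        # Hypothetical helper for this sketch: append a node at the tail.
        node = Node(data)
        if self.head is None:
            self.head = node
            return
        temp = self.head
        while temp.next:
            temp = temp.next
        temp.next = node

    def toList(self):
        # Hypothetical helper: collect the values in order for easy checking.
        values = []
        temp = self.head
        while temp:
            values.append(temp.data)
            temp = temp.next
        return values

    def reverseList(self):
        prev = None
        temp = self.head
        while temp:
            nextNode = temp.next  # remember the rest of the list
            temp.next = prev      # point the current node backwards
            prev = temp
            temp = nextNode
        self.head = prev          # the old tail becomes the new head


llist = LinkedList()
for value in [2, 3, 56, 9, 4, 10]:
    llist.insertAtEnd(value)

before = llist.toList()
llist.reverseList()
after = llist.toList()
print(before)
print(after)
```

The reversal visits each node once and only rewires pointers, so it runs in O(n) time with O(1) extra space.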
### Search in a linked list
```python
@ -174,6 +186,8 @@ check the list is empty otherwise shift the head to next node.
return f"Value '{value}' not found in the list"
```
Connect all the code.
```python
if __name__ == '__main__':
    llist = LinkedList()
@ -197,13 +211,17 @@ check the list is empty otherwise shift the head to next node.
    # Delete at the end
    llist.deleteFromEnd() # 2 3 56 9 4 10
    # Print the original list
    llist.printList()
    llist.reverseList() # 10 4 9 56 3 2
    # Print the reversed list
    llist.printList()
```
## Output:
2 3 56 9 4 10
10 4 9 56 3 2
## Real Life uses of Linked List


@ -0,0 +1,184 @@
# Exploratory Data Analysis
Exploratory Data Analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. EDA is used to understand the data and to identify relationships between variables. It is a crucial step in the data analysis process and should be done before building a model.
## Why is EDA important?
1. **Understand the data**: EDA helps to understand the data, its structure, and its characteristics.
2. **Identify patterns and relationships**: EDA helps to identify patterns and relationships between variables.
3. **Detect outliers and anomalies**: EDA helps to detect outliers and anomalies in the data.
4. **Prepare data for modeling**: EDA helps to prepare the data for modeling by identifying and handling missing values and transforming variables.
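
As a small illustration of outlier detection, the sketch below applies the common 1.5 × IQR rule of thumb using only the standard library. The data values are invented for the example, and the rule itself is one heuristic among several rather than a method prescribed by this guide.

```python
import statistics

# Illustrative measurements; 95 is planted as an obvious outlier.
data = [10, 12, 11, 13, 12, 11, 95]

# Quartiles split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Rule of thumb: flag points more than 1.5 * IQR beyond the quartiles.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]
print(outliers)
```

Flagged points are candidates for inspection, not automatic removal; whether an outlier is an error or a genuine observation depends on the data.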
## Steps in EDA
1. **Data Collection**: Collect the data from various sources.
2. **Data Cleaning**: Clean the data by handling missing values, removing duplicates, and transforming variables.
3. **Data Exploration**: Explore the data by visualizing the data, summarizing the data, and identifying patterns and relationships.
4. **Data Analysis**: Analyze the data by performing statistical analysis, hypothesis testing, and building models.
5. **Data Visualization**: Visualize the data using various plots and charts to understand the data better.
## Tools for EDA
1. **Python**: Python is a popular programming language for data analysis and has many libraries for EDA, such as Pandas, NumPy, Matplotlib, Seaborn, and Plotly.
2. **Jupyter Notebook**: Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.
## Techniques for EDA
1. **Descriptive Statistics**: Descriptive statistics summarize the main characteristics of a data set, such as mean, median, mode, standard deviation, and variance.
2. **Data Visualization**: Data visualization is the graphical representation of data to understand the data better, such as histograms, scatter plots, box plots, and heat maps.
3. **Correlation Analysis**: Correlation analysis is used to measure the strength and direction of the relationship between two variables.
4. **Hypothesis Testing**: Hypothesis testing is used to test a hypothesis about a population parameter based on sample data.
5. **Dimensionality Reduction**: Dimensionality reduction is the process of reducing the number of variables in a data set while retaining as much information as possible.
6. **Clustering Analysis**: Clustering analysis is used to group similar data points together based on their characteristics.
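
The first and third techniques can be sketched with the standard library alone, so the snippet runs without extra packages. The two short lists are illustrative samples, and the `pearson` helper is written for this example; with pandas, `DataFrame.describe()` and `DataFrame.corr()` provide comparable summaries.

```python
import math
import statistics

# Small illustrative samples (values in cm); not a full dataset.
sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0]
sepal_width = [3.5, 3.0, 3.2, 3.1, 3.6]

# Descriptive statistics for a single variable.
mean_len = statistics.mean(sepal_length)
median_len = statistics.median(sepal_length)
stdev_len = statistics.stdev(sepal_length)   # sample standard deviation


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Correlation analysis between two variables.
r = pearson(sepal_length, sepal_width)
print(mean_len, median_len, stdev_len, r)
```

A coefficient near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship.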
## Commonly Used Techniques in EDA
1. **Uni-variate Analysis**: Uni-variate analysis is the simplest form of data analysis that involves analyzing a single variable at a time.
2. **Bi-variate Analysis**: Bi-variate analysis involves analyzing two variables at a time to understand the relationship between them.
3. **Multi-variate Analysis**: Multi-variate analysis involves analyzing more than two variables at a time to understand the relationship between them.
## Understand with an Example
Let's understand EDA with an example. Here we use the famous Iris dataset.
The dataset consists of 150 samples of iris flowers, where each sample represents measurements of four features (variables) for three species of iris flowers.
The four features measured are:
- Sepal length (in cm)
- Sepal width (in cm)
- Petal length (in cm)
- Petal width (in cm)

The three species of iris flowers included in the dataset are:
**Setosa**, **Versicolor**, **Virginica**
```python
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
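# Load the Iris dataset
iris = datasets.load_iris()
# Use short column names and add the species label, as the analysis below expects
df = pd.DataFrame(iris.data, columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
df['species'] = iris.target_names[iris.target]
df.head()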
```
| Sepal Length (cm) | Sepal Width (cm) | Petal Length (cm) | Petal Width (cm) |
|-------------------|------------------|-------------------|------------------|
| 5.1 | 3.5 | 1.4 | 0.2 |
| 4.9 | 3.0 | 1.4 | 0.2 |
| 4.7 | 3.2 | 1.3 | 0.2 |
| 4.6 | 3.1 | 1.5 | 0.2 |
| 5.0 | 3.6 | 1.4 | 0.2 |
### Uni-variate Analysis
```python
# Uni-variate Analysis
df_setosa=df.loc[df['species']=='setosa']
df_virginica=df.loc[df['species']=='virginica']
df_versicolor=df.loc[df['species']=='versicolor']
plt.plot(df_setosa['sepal_length'])
plt.plot(df_virginica['sepal_length'])
plt.plot(df_versicolor['sepal_length'])
plt.xlabel('sepal length')
plt.show()
```
![Uni-variate Analysis](assets/eda/uni-variate-analysis1.png)
```python
plt.hist(df_setosa['petal_length'])
plt.hist(df_virginica['petal_length'])
plt.hist(df_versicolor['petal_length'])
plt.xlabel('petal length')
plt.show()
```
![Uni-variate Analysis](assets/eda/uni-variate-analysis2.png)
### Bi-variate Analysis
```python
# Bi-variate Analysis
sns.FacetGrid(df,hue="species",height=5).map(plt.scatter,"petal_length","sepal_width").add_legend()
plt.show()
```
![Bi-variate Analysis](assets/eda/bi-variate-analysis.png)
### Multi-variate Analysis
```python
# Multi-variate Analysis
sns.pairplot(df,hue="species",height=3)
```
![Multi-variate Analysis](assets/eda/multi-variate-analysis.png)
### Correlation Analysis
```python
# Correlation Analysis
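# Restrict to numeric columns so a non-numeric column such as species is ignored
corr_matrix = df.corr(numeric_only=True)
sns.heatmap(corr_matrix)
plt.show()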
```
| | sepal_length | sepal_width | petal_length | petal_width |
|-------------|--------------|-------------|--------------|-------------|
| sepal_length| 1.000000 | -0.109369 | 0.871754 | 0.817954 |
| sepal_width | -0.109369 | 1.000000 | -0.420516 | -0.356544 |
| petal_length| 0.871754 | -0.420516 | 1.000000 | 0.962757 |
| petal_width | 0.817954 | -0.356544 | 0.962757 | 1.000000 |
![Correlation Analysis](assets/eda/correlation-analysis.png)
## Exploratory Data Analysis (EDA) Report on Iris Dataset
### Introduction
The Iris dataset consists of 150 samples of iris flowers, each characterized by four features: Sepal Length, Sepal Width, Petal Length, and Petal Width. These samples belong to three species of iris flowers: Setosa, Versicolor, and Virginica. In this EDA report, we explore the dataset to gain insights into the characteristics and relationships among the features and species.
### Uni-variate Analysis
Uni-variate analysis examines each variable individually.
- Sepal Length: The distribution of Sepal Length varies among the different species, with Setosa generally having shorter sepals compared to Versicolor and Virginica.
- Petal Length: Setosa tends to have shorter petal lengths, while Versicolor and Virginica have relatively longer petal lengths.
### Bi-variate Analysis
Bi-variate analysis explores the relationship between two variables.
- Petal Length vs. Sepal Width: There is a noticeable separation between species, especially Setosa, which typically has shorter petals and wider sepals compared to Versicolor and Virginica.
- This analysis suggests potential patterns distinguishing the species based on these two features.
### Multi-variate Analysis
Multi-variate analysis considers interactions among multiple variables simultaneously.
- Pairplot: The pairplot reveals distinctive clusters for each species, particularly in the combinations of Petal Length and Petal Width, indicating clear separation among species based on these features.
### Correlation Analysis
Correlation analysis examines the relationship between variables.
- Correlation Heatmap: There are strong positive correlations between Petal Length and Petal Width, as well as between Petal Length and Sepal Length. Sepal Width shows a weaker negative correlation with Petal Length and Petal Width.
### Insights
1. Petal dimensions (length and width) exhibit strong correlations, suggesting that they may collectively contribute more significantly to distinguishing between iris species.
2. Setosa tends to have shorter and wider sepals compared to Versicolor and Virginica.
3. The combination of Petal Length and Petal Width appears to be a more effective discriminator among iris species, as indicated by the distinct clusters observed in multi-variate analysis.
### Conclusion
Through comprehensive exploratory data analysis, we have gained valuable insights into the Iris dataset, highlighting key characteristics and relationships among features and species. Further analysis and modeling could leverage these insights to develop robust classification models for predicting iris species based on their measurements.
## Conclusion
Exploratory Data Analysis (EDA) is a critical step in the data analysis process that helps to understand the data, identify patterns and relationships, detect outliers, and prepare the data for modeling. By using various techniques and tools, such as descriptive statistics, data visualization, correlation analysis, and hypothesis testing, EDA provides valuable insights into the data, enabling data scientists to make informed decisions and build accurate models.


@ -27,3 +27,4 @@
- [Transformers](transformers.md)
- [Reinforcement Learning](reinforcement-learning.md)
- [Neural network regression](neural-network-regression.md)
- [Exploratory Data Analysis](eda.md)


@ -93,13 +93,9 @@ $$
- Rain:
$$P(Rain|Yes) = \frac{2}{6}$$
$$P(Rain|No) = \frac{4}{4}$$
- Overcast:
@ -111,10 +107,7 @@ $$
$$
Here, we can see that P(Overcast|No) = 0
This is the zero-probability problem!
Because one likelihood is 0, the whole product for that class becomes 0, so the Naive Bayes model fails to predict.
@ -124,13 +117,9 @@ Since probability is 0, naive bayes model fails to predict.
In Laplace's correction, we scale the values for 1000 instances.
- **Calculate prior probabilities**
$$P(Yes) = \frac{600}{1002}$$
$$P(No) = \frac{402}{1002}$$
- **Calculate likelihoods**
@ -151,21 +140,13 @@ Since probability is 0, naive bayes model fails to predict.
- **Rain:**
$$P(Rain|Yes) = \frac{200}{600}$$
$$P(Rain|No) = \frac{401}{402}$$
- **Overcast:**
$$P(Overcast|Yes) = \frac{400}{600}$$
$$P(Overcast|No) = \frac{1}{402}$$
2. **Wind (B):**
@ -181,49 +162,27 @@ Since probability is 0, naive bayes model fails to predict.
- **Weak:**
$$P(Weak|Yes) = \frac{500}{600}$$
$$P(Weak|No) = \frac{200}{400}$$
- **Strong:**
$$P(Strong|Yes) = \frac{100}{600}$$
$$P(Strong|No) = \frac{200}{400}$$
- **Calculating probabilities:**
$$P(PlayTennis|Yes) = P(Yes) * P(Overcast|Yes) * P(Weak|Yes)$$
$$= \frac{600}{1002} * \frac{400}{600} * \frac{500}{600}$$
$$\approx 0.3327$$
$$P(PlayTennis|No) = P(No) * P(Overcast|No) * P(Weak|No)$$
$$= \frac{402}{1002} * \frac{1}{402} * \frac{200}{400}$$
$$= 0.000499 \approx 0.0005$$
Since
$$P(PlayTennis|Yes) > P(PlayTennis|No)$$
we can conclude that tennis can be played if the outlook is overcast and the wind is weak.
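
The arithmetic above can be checked with a short sketch that uses exact fractions. The variable names are chosen for this example; the counts are the scaled values listed above.

```python
from fractions import Fraction

# Scaled (Laplace-corrected) values taken from the calculation above.
p_yes = Fraction(600, 1002)
p_no = Fraction(402, 1002)
p_overcast_yes = Fraction(400, 600)
p_overcast_no = Fraction(1, 402)
p_weak_yes = Fraction(500, 600)
p_weak_no = Fraction(200, 400)

# Naive Bayes score for each class: prior times the likelihoods.
score_yes = p_yes * p_overcast_yes * p_weak_yes
score_no = p_no * p_overcast_no * p_weak_no

print(float(score_yes))
print(float(score_no))

# Predict the class with the larger score.
prediction = "Yes" if score_yes > score_no else "No"
print(prediction)
```

Because the correction replaces the zero count with a small non-zero value, the score for "No" is tiny but no longer exactly zero, so the comparison between the two classes stays meaningful.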
@ -366,4 +325,4 @@ print("Confusion matrix: \n",confusion_matrix(y_train,y_pred))
## Conclusion
We can conclude that Naive Bayes may be limited in some cases by its assumption that the features are independent of each other, but it remains reliable in many cases. Naive Bayes is an efficient classifier and works even on small datasets.


@ -6,9 +6,12 @@
- [Pie Charts in Matplotlib](matplotlib-pie-charts.md)
- [Line Charts in Matplotlib](matplotlib-line-plots.md)
- [Scatter Plots in Matplotlib](matplotlib-scatter-plot.md)
- [Violin Plots in Matplotlib](matplotlib-violin-plots.md)
- [subplots in Matplotlib](matplotlib-sub-plot.md)
- [Introduction to Seaborn and Installation](seaborn-intro.md)
- [Seaborn Plotting Functions](seaborn-plotting.md)
- [Getting started with Seaborn](seaborn-basics.md)
- [Bar Plots in Plotly](plotly-bar-plots.md)
- [Pie Charts in Plotly](plotly-pie-charts.md)
- [Line Charts in Plotly](plotly-line-charts.md)
- [Scatter Plots in Plotly](plotly-scatter-plots.md)


@ -0,0 +1,130 @@
### 1. Using `plt.subplots()`
The `plt.subplots()` function is a versatile and easy way to create a grid of subplots. It returns a figure and an array of Axes objects.
#### Code Explanation
1. **Import Libraries**:
```python
import matplotlib.pyplot as plt
import numpy as np
```
2. **Generate Sample Data**:
```python
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
y3 = np.tan(x)
```
3. **Create Subplots**:
```python
fig, axs = plt.subplots(3, 1, figsize=(8, 12))
```
- `3, 1` indicates a 3-row, 1-column grid.
- `figsize` specifies the overall size of the figure.
4. **Plot Data**:
```python
axs[0].plot(x, y1, 'r')
axs[0].set_title('Sine Function')
axs[1].plot(x, y2, 'g')
axs[1].set_title('Cosine Function')
axs[2].plot(x, y3, 'b')
axs[2].set_title('Tangent Function')
```
5. **Adjust Layout and Show Plot**:
```python
plt.tight_layout()
plt.show()
```
#### Result
The result will be a figure with three vertically stacked subplots.
![subplot Chart](images/subplots.png)
### 2. Using `plt.subplot()`
The `plt.subplot()` function allows you to add a single subplot at a time to a figure.
#### Code Explanation
1. **Import Libraries and Generate Data** (same as above).
2. **Create Figure and Subplots**:
```python
plt.figure(figsize=(8, 12))
plt.subplot(3, 1, 1)
plt.plot(x, y1, 'r')
plt.title('Sine Function')
plt.subplot(3, 1, 2)
plt.plot(x, y2, 'g')
plt.title('Cosine Function')
plt.subplot(3, 1, 3)
plt.plot(x, y3, 'b')
plt.title('Tangent Function')
```
3. **Adjust Layout and Show Plot** (same as above).
#### Result
The result will be similar to the first method but created using individual subplot commands.
![subplot Chart](images/subplots.png)
### 3. Using `GridSpec`
`GridSpec` allows for more complex subplot layouts.
#### Code Explanation
1. **Import Libraries and Generate Data** (same as above).
2. **Create Figure and GridSpec**:
```python
from matplotlib.gridspec import GridSpec
fig = plt.figure(figsize=(8, 12))
gs = GridSpec(3, 1, figure=fig)
```
3. **Create Subplots**:
```python
ax1 = fig.add_subplot(gs[0, 0])
ax1.plot(x, y1, 'r')
ax1.set_title('Sine Function')
ax2 = fig.add_subplot(gs[1, 0])
ax2.plot(x, y2, 'g')
ax2.set_title('Cosine Function')
ax3 = fig.add_subplot(gs[2, 0])
ax3.plot(x, y3, 'b')
ax3.set_title('Tangent Function')
```
4. **Adjust Layout and Show Plot** (same as above).
#### Result
The result will again be three subplots in a vertical stack, created using the flexible `GridSpec`.
![subplot Chart](images/subplots.png)
### Summary
- **`plt.subplots()`**: Creates a figure and a whole grid of subplots in a single call (axes are shared only if `sharex` or `sharey` is set).
- **`plt.subplot()`**: Adds individual subplots in a figure.
- **`GridSpec`**: Allows for complex and custom subplot layouts.
By mastering these techniques, you can create detailed and organized visualizations, enhancing the clarity and comprehension of your data presentations.


@ -0,0 +1,277 @@
# Violin Plots in Matplotlib
A violin plot is a method of plotting numeric data together with an estimate of its probability density. It combines a box plot with a kernel density plot, providing a richer view of the distribution of the data. In a violin plot, each group of data is represented by a kernel density estimate that is mirrored about a central axis, producing a symmetrical shape that resembles a violin, hence the name.
Violin plots are particularly useful when comparing distributions across different categories or groups. They provide insights into the shape, spread, and central tendency of the data, allowing for a more comprehensive understanding than traditional box plots.
Compared to box plots, violin plots show the distribution in more detail by combining summary statistics with a kernel density estimate. They remain readable when groups have unequal sample sizes, make comparison across groups easy, and reveal multiple modes that a box plot would hide.
![Violin plot 1](images/violen-plots1.webp)
## Prerequisites
Before creating violin plots in Matplotlib, make sure that Python, Matplotlib, and NumPy (used below to generate the sample data) are installed on your system.
## Creating a simple Violin Plot with the `violinplot()` function
A basic violin plot can be created with the `violinplot()` function in `matplotlib.pyplot`.
```Python
import matplotlib.pyplot as plt
import numpy as np
# Creating dataset
data = [np.random.normal(0, std, 100) for std in range(1, 5)]
# Creating Plot
plt.violinplot(data)
# Show plot
plt.show()
```
When executed, this would show the following violin plot:
![Basic violin plot](images/violinplotnocolor.png)
The `violinplot()` function in `matplotlib.pyplot` creates a violin plot, which is a graphical representation of the distribution of data across different levels of a categorical variable. Here's a breakdown of its usage:
```Python
plt.violinplot(data, showmeans=False, showextrema=False)
```
- `data`: The dataset used to create the violin plot. It can be a single array or a sequence of arrays, in which case one violin is drawn per array.
- `showmeans`: Optional. If set to True, the mean of each dataset is marked with a line across the violin. Default is False.
- `showextrema`: Optional. If set to True, the minimum and maximum of each dataset are marked with lines, joined by a bar along the centre of the violin. Default is True, so the call above switches these markers off.
Additional parameters can be used to further customize the violin plot, such as showing the median, changing the orientation, and adjusting the width and smoothing of the violins. For instance:
```Python
plt.violinplot(data, showmedians=True, showmeans=True, showextrema=True, vert=False, widths=0.9, bw_method=0.5)
```
- `showmedians`: Setting this parameter to True displays the median value as a line on the violin plot. Default is False.
- `vert`: This parameter determines the orientation of the violin plot. Setting it to False creates a horizontal violin plot. Default is True.
- `widths`: This parameter sets the maximal width of the violins, either as one value for all violins or as a sequence with one value per violin. Default is 0.5.
- `bw_method`: This parameter determines the method used to calculate the kernel bandwidth for the kernel density estimation. It can be `'scott'`, `'silverman'`, a scalar, or a callable. The default is None, which uses Scott's rule; the value 0.5 in the call above is an explicit choice, not the default.
Using these parameters, you can customize the violin plot according to your requirements, enhancing its readability and visual appeal.
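The parameters above decide which elements are drawn, and `violinplot()` returns those elements in a dictionary so they can be styled afterwards. The following is a minimal sketch of inspecting that dictionary; the seeded random data and the chosen colors are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np

# Assumed sample data: four groups with increasing spread (seeded for repeatability)
rng = np.random.default_rng(0)
data = [rng.normal(0, std, 100) for std in range(1, 5)]

fig, ax = plt.subplots()
parts = ax.violinplot(data, showmeans=True, showmedians=True, showextrema=True)

# The returned dictionary holds one entry per kind of element that was drawn
for name, artist in parts.items():
    print(name, type(artist).__name__)

# The line collections can be styled like any other Matplotlib artist
parts['cmeans'].set_color('tab:red')
parts['cmedians'].set_color('black')
```

The `'bodies'` entry is a list with one filled shape per violin, while the other entries are line collections covering all violins at once.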
## Customizing Violin Plots in Matplotlib
When customizing violin plots in Matplotlib, using `matplotlib.pyplot.subplots()` provides greater flexibility for applying customizations.
### Coloring Violin Plots
You can assign custom colors to the violins by setting the face color of each violin body returned by the `violinplot()` method; the method itself is not given the colors in this example.
```Python
import matplotlib.pyplot as plt
import numpy as np
# Creating dataset
data = [np.random.normal(0, std, 100) for std in range(1, 5)]
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange']
# Creating plot using matplotlib.pyplot.subplots()
fig, ax = plt.subplots()
# Customizing colors of violins
for i in range(len(data)):
    parts = ax.violinplot(data[i], positions=[i], vert=False, showmeans=False, showextrema=False, showmedians=True, widths=0.9, bw_method=0.5)
    for pc in parts['bodies']:
        pc.set_facecolor(colors[i])
# Show plot
plt.show()
```
This code snippet creates a violin plot with custom colors assigned to each violin, enhancing the visual appeal and clarity of the plot.
![Coloring violin](images/violenplotnormal.png)
When customizing violin plots using `matplotlib.pyplot.subplots()`, you obtain a `Figure` object `fig` and an `Axes` object `ax`, allowing for extensive customization. The `violinplot()` call returns a dictionary of the artists that make up the plot: the violin bodies under `'bodies'` and, when the corresponding options are enabled, line collections such as `'cmedians'`, `'cmeans'`, `'cmins'`, `'cmaxes'`, and `'cbars'`. You can style these artists directly and add further elements, such as quartile lines or mean markers, using the methods of the Axes object.
Here's an example of how to customize violin plots:
```Python
import matplotlib.pyplot as plt
import numpy as np
# Creating dataset
data = [np.random.normal(0, std, 100) for std in range(1, 5)]
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange']
# Creating plot using matplotlib.pyplot.subplots()
fig, ax = plt.subplots()
# Creating violin plots
parts = ax.violinplot(data, showmeans=False, showextrema=False, showmedians=True, widths=0.9, bw_method=0.5)
# Customizing colors of violins
for i, pc in enumerate(parts['bodies']):
    pc.set_facecolor(colors[i])
# Customizing median lines
parts['cmedians'].set_color('black')
# Adding quartile lines (computed from the data, since they are not drawn by default)
for pos, d in enumerate(data, start=1):
    q1, q3 = np.percentile(d, [25, 75])
    ax.hlines([q1, q3], pos - 0.2, pos + 0.2, colors='black', linestyles='--', linewidth=2)
# Adding mean markers at each violin position
for pos, d in enumerate(data, start=1):
    ax.scatter(pos, np.mean(d), marker='o', color='black', zorder=3)
# Customizing axes labels
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
# Adding title
ax.set_title('Customized Violin Plot')
# Show plot
plt.show()
```
![Customizing violin](images/violin-plot4.png)
In this example, we customize various components of the violin plot, such as colors, line styles, and markers, to enhance its visual appeal and clarity. Additionally, we modify the axes labels and add a title to provide context to the plot.
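Quartile lines can also be requested from `violinplot()` itself through its `quantiles` argument, which expects one list of quantiles per dataset. The sketch below assumes a Matplotlib version that supports this argument (it was added in the 3.x series); the seeded data and the styling are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

# Assumed sample data (seeded for repeatability)
rng = np.random.default_rng(0)
data = [rng.normal(0, std, 100) for std in range(1, 5)]

fig, ax = plt.subplots()

# One list of quantiles per dataset: here the 25th and 75th percentiles
quartiles = [[0.25, 0.75] for _ in data]
parts = ax.violinplot(data, showmedians=True, showextrema=False, quantiles=quartiles)

# The quantile lines are returned under the 'cquantiles' key
parts['cquantiles'].set_linestyle('--')
parts['cquantiles'].set_color('black')
parts['cmedians'].set_color('black')
ax.set_title('Violin plot with quartile lines')
```

Here the quantile lines are clipped to the outline of each violin, which is harder to achieve when drawing the lines by hand.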
### Adding Hatching to Violin Plots
You can add hatching patterns to the violin plots to enhance their visual distinction. This can be achieved by calling `set_hatch()` on each violin body returned by the `violinplot()` function.
```Python
import matplotlib.pyplot as plt
import numpy as np
# Creating dataset
data = [np.random.normal(0, std, 100) for std in range(1, 5)]
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange']
hatches = ['/', '\\', '|', '-']
# Creating plot using matplotlib.pyplot.subplots()
fig, ax = plt.subplots()
# Creating violin plots with hatching
parts = ax.violinplot(data, showmeans=False, showextrema=False, showmedians=True, widths=0.9, bw_method=0.5)
for i, pc in enumerate(parts['bodies']):
    pc.set_facecolor(colors[i])
    pc.set_hatch(hatches[i])
# Show plot
plt.show()
```
![violin_hatching](images/violin-hatching.png)
### Labeling Violin Plots
You can add labels to violin plots to provide additional information about the data. This can be achieved by calling `set_label()` on each violin body returned by the `violinplot()` function and then adding a legend.
An example is shown here:
```Python
import matplotlib.pyplot as plt
import numpy as np
# Creating dataset
data = [np.random.normal(0, std, 100) for std in range(1, 5)]
labels = ['Group {}'.format(i) for i in range(1, 5)]
# Creating plot using matplotlib.pyplot.subplots()
fig, ax = plt.subplots()
# Creating violin plots
parts = ax.violinplot(data, showmeans=False, showextrema=False, showmedians=True, widths=0.9, bw_method=0.5)
# Adding labels to violin plots
for i, label in enumerate(labels):
    parts['bodies'][i].set_label(label)
# Show plot
plt.legend()
plt.show()
```
![violin_labeling](images/violin-labelling.png)
In this example, each violin plot is labeled according to its group, providing context to the viewer.
These customizations can be combined and further refined to create violin plots that effectively convey the underlying data distributions.
### Stacked Violin Plots
Stacked violin plots are useful when you want to compare the distribution of a single variable across different categories or groups. The violins for all groups are drawn in one set of axes against a common value scale, so their shapes, spreads, and medians can be compared directly. In the example below, each group's violin sits at its own position along the x-axis.
```Python
import matplotlib.pyplot as plt
import numpy as np
# Generating sample data
np.random.seed(0)
data1 = np.random.normal(0, 1, 100)
data2 = np.random.normal(2, 1, 100)
data3 = np.random.normal(1, 1, 100)
# Creating a stacked violin plot
plt.violinplot([data1, data2, data3], showmedians=True)
# Adding labels to x-axis ticks
plt.xticks([1, 2, 3], ['Group 1', 'Group 2', 'Group 3'])
# Adding title and labels
plt.title('Stacked Violin Plot')
plt.xlabel('Groups')
plt.ylabel('Values')
# Displaying the plot
plt.show()
```
![stacked violin plots](images/stacked_violin_plots.png)
### Split Violin Plots
Split violin plots are effective for comparing the distribution of a single variable across two different categories or groups. In a true split violin plot, each half of a violin shows the distribution for one category. The simplified example below draws the two groups as adjacent violins on a shared value axis, which supports the same comparison.
```Python
import matplotlib.pyplot as plt
import numpy as np
# Generating sample data
np.random.seed(0)
data_male = np.random.normal(0, 1, 100)
data_female = np.random.normal(2, 1, 100)
# Creating a split violin plot
plt.violinplot([data_male, data_female], showmedians=True)
# Adding labels to x-axis ticks
plt.xticks([1, 2], ['Male', 'Female'])
# Adding title and labels
plt.title('Split Violin Plot')
plt.xlabel('Gender')
plt.ylabel('Values')
# Displaying the plot
plt.show()
```
![Split violin plot](images/split-violin-plot.png)
Both examples use Matplotlib's `violinplot()` function and place the groups on a common value axis, which makes it easy to compare data distributions across different groups or categories.
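To draw the two groups as two halves of one violin, a common workaround is to clip the outline of each violin body at its centre line. The sketch below is one possible approach under assumptions, not a built-in split mode: `half_violin` is a hypothetical helper written for this illustration, and the data is seeded sample data.

```python
import matplotlib.pyplot as plt
import numpy as np

# Assumed sample data (seeded for repeatability)
rng = np.random.default_rng(0)
data_male = rng.normal(0, 1, 100)
data_female = rng.normal(2, 1, 100)

def half_violin(ax, values, position, side, color, label):
    """Hypothetical helper: draw a violin and keep only its left or right half."""
    parts = ax.violinplot([values], positions=[position], showextrema=False)
    body = parts['bodies'][0]
    verts = body.get_paths()[0].vertices
    # For a vertical violin, column 0 holds the x-coordinates of the outline
    if side == 'left':
        verts[:, 0] = np.clip(verts[:, 0], -np.inf, position)
    else:
        verts[:, 0] = np.clip(verts[:, 0], position, np.inf)
    body.set_facecolor(color)
    body.set_label(label)
    return body

fig, ax = plt.subplots()
left_body = half_violin(ax, data_male, 1, 'left', 'tab:blue', 'Male')
right_body = half_violin(ax, data_female, 1, 'right', 'tab:orange', 'Female')

ax.set_xticks([1])
ax.set_xticklabels(['Gender'])
ax.set_ylabel('Values')
ax.set_title('Split Violin Plot')
ax.legend()
```

Both halves share position 1 on the x-axis, so the two distributions meet at a common centre line. Call `plt.show()` to display the figure.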


@ -0,0 +1,198 @@
# Scatter Plots in Plotly
* A scatter plot is a type of data visualization that uses dots to show values for two variables, with one variable on the x-axis and the other on the y-axis. It's useful for identifying relationships, trends, and correlations, as well as spotting clusters and outliers.
* The dots on the plot show how the variables are related. In Plotly, a scatter plot can be created with `px.scatter()` from `plotly.express`.
## Prerequisites
Before creating Scatter plots in Plotly you must ensure that you have Python, Plotly and Pandas installed on your system.
## Introduction
There are various ways to create Scatter plots in `plotly`. One of the most prominent and easiest is `plotly.express`. Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on a variety of types of data and produces easy-to-style figures. Alternatively, you can use `plotly.graph_objects` to create various plots.
Here, we'll be using `plotly.express` to create the Scatter Plots. We'll also convert our datasets into pandas DataFrames, which makes it convenient to create charts.
Also, note that when you run the code from a plain Python file, the output plot will be shown in your **browser**, rather than in a pop-up window as in matplotlib. If you do not want that, it is **recommended to create the plots in a notebook (like jupyter)**. For this, install an additional library, `nbformat`. This way you can see the output in the notebook itself and can also export it to formats such as png or jpg.
## Creating a simple Scatter Plot using `plotly.express.scatter`
In `plotly.express.scatter`, each data point is represented as a marker point, whose location is given by the x and y columns.
```Python
import plotly.express as px
import pandas as pd
# Creating dataset
years = ['1998', '1999', '2000', '2001', '2002']
num_of_cars_sold = [200, 300, 500, 700, 1000]
# Converting dataset to pandas DataFrame
dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
df = pd.DataFrame(dataset)
# Creating scatter plot
fig = px.scatter(df, x='Years', y='Number of Cars sold')
# Showing plot
fig.show()
```
![Basic Scatter Plot](images/plotly-basic-scatter-plot.png)
Here, we are first creating the dataset and converting it into a pandas DataFrame using a dictionary, with its keys being DataFrame columns. Next, we are plotting the scatter plot by using `px.scatter`. In the `x` and `y` parameters, we have to specify a column name in the DataFrame.
`px.scatter(df, x='Years', y='Number of Cars sold')` is used to specify that the scatter plot is to be plotted by taking the values from column `Years` for the x-axis and the values from column `Number of Cars sold` for the y-axis.
Note: The code above produces an interactive plot. If you want a static image, you can download it from the interactive plot itself.
## Customizing Scatter Plots
### Adding title to the plot
Simply pass the title of your plot as a parameter in `px.scatter`.
```Python
import plotly.express as px
import pandas as pd
# Creating dataset
years = ['1998', '1999', '2000', '2001', '2002']
num_of_cars_sold = [200, 300, 500, 700, 1000]
# Converting dataset to pandas DataFrame
dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
df = pd.DataFrame(dataset)
# Creating scatter plot
fig = px.scatter(df, x='Years', y='Number of Cars sold', title='Number of cars sold in various years')
# Showing plot
fig.show()
```
![Scatter Plot title](images/plotly-scatter-title.png)
### Adding colors and legends
* To give different points different colors, pass a column name to the `color` parameter. This can be the x-axis column or a custom column that groups the points; Plotly adds a legend entry for each distinct value.
* Plotly ships with many built-in color scales, which are listed here: [plotly color scales](https://plotly.com/python/builtin-colorscales/). Choose your favourite color scale and apply it like this:
```Python
import plotly.express as px
import pandas as pd
# Creating dataset
years = ['1998', '1999', '2000', '2001', '2002']
num_of_cars_sold = [200, 300, 500, 700, 1000]
# Converting dataset to pandas DataFrame
dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
df = pd.DataFrame(dataset)
# Creating scatter plot, coloring points by year with a built-in color sequence
fig = px.scatter(df, x='Years', y='Number of Cars sold',
                 title='Number of cars sold in various years',
                 color='Years',
                 color_discrete_sequence=px.colors.sequential.Agsunset)
# Showing plot
fig.show()
```
![Scatter Plot Colors-1](images/plotly-scatter-colour.png)
You can also set custom colors for each label by passing a dictionary (map) to `color_discrete_map`, like this:
```Python
import plotly.express as px
import pandas as pd
# Creating dataset
years = ['1998', '1999', '2000', '2001', '2002']
num_of_cars_sold = [200, 300, 500, 700, 1000]
# Converting dataset to pandas DataFrame
dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
df = pd.DataFrame(dataset)
# Creating scatter plot
fig = px.scatter(df, x='Years',
                 y='Number of Cars sold',
                 title='Number of cars sold in various years',
                 color='Years',
                 color_discrete_map={'1998': 'red',
                                     '1999': 'magenta',
                                     '2000': 'green',
                                     '2001': 'yellow',
                                     '2002': 'royalblue'})
# Showing plot
fig.show()
```
![Scatter Plot Colors-1](images/plotly-scatter-colour-2.png)
### Setting the Size of Scatter Points
We may want points of different sizes so that differences between categories are easier to see. This can be done with the `size` parameter in `px.scatter`, which takes a column in the DataFrame whose values determine the size of each scatter point.
```Python
import plotly.express as px
import pandas as pd
# Creating dataset
years = ['1998', '1999', '2000', '2001', '2002']
num_of_cars_sold = [200, 300, 500, 700, 1000]
# Converting dataset to pandas DataFrame
dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
df = pd.DataFrame(dataset)
# Creating scatter plot
fig = px.scatter(df, x='Years',
                 y='Number of Cars sold',
                 title='Number of cars sold in various years',
                 color='Years',
                 color_discrete_map={'1998': 'red',
                                     '1999': 'magenta',
                                     '2000': 'green',
                                     '2001': 'yellow',
                                     '2002': 'royalblue'},
                 size='Number of Cars sold')
# Showing plot
fig.show()
```
![Scatter plot size](images/plotly-scatter-size.png)
### Giving a hover effect
To control what appears when hovering over a point, use the `hover_name` and `hover_data` parameters in `px.scatter`. The `hover_name` parameter specifies the column whose values appear as the bold title of the hover label, and the `hover_data` parameter specifies additional data to display when hovering over a point.
```Python
import plotly.express as px
import pandas as pd
# Creating dataset
years = ['1998', '1999', '2000', '2001', '2002']
num_of_cars_sold = [200, 300, 500, 700, 1000]
# Converting dataset to pandas DataFrame
dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
df = pd.DataFrame(dataset)
# Creating scatter plot
fig = px.scatter(df, x='Years',
                 y='Number of Cars sold',
                 title='Number of cars sold in various years',
                 color='Years',
                 color_discrete_map={'1998': 'red',
                                     '1999': 'magenta',
                                     '2000': 'green',
                                     '2001': 'yellow',
                                     '2002': 'royalblue'},
                 size='Number of Cars sold',
                 hover_name='Years',
                 hover_data={'Number of Cars sold': True})
# Showing plot
fig.show()
```
![Scatter Hover](images/plotly-scatter-hover.png)