From 4afd4a98f263a411c0eb80c223e1a72886f34cff Mon Sep 17 00:00:00 2001 From: anamika123 Date: Sun, 9 Jun 2024 16:12:58 +0530 Subject: [PATCH 1/5] Pandas Series' --- contrib/pandas/index.md | 1 + contrib/pandas/pandas_series.md | 283 ++++++++++++++++++++++++++++++++ 2 files changed, 284 insertions(+) create mode 100644 contrib/pandas/pandas_series.md diff --git a/contrib/pandas/index.md b/contrib/pandas/index.md index e5a8353..2a15929 100644 --- a/contrib/pandas/index.md +++ b/contrib/pandas/index.md @@ -9,3 +9,4 @@ - [Working with Date & Time in Pandas](datetime.md) - [Importing and Exporting Data in Pandas](import-export.md) - [Handling Missing Values in Pandas](handling-missing-values.md) +- [Pandas Series](pandas_series.md) \ No newline at end of file diff --git a/contrib/pandas/pandas_series.md b/contrib/pandas/pandas_series.md new file mode 100644 index 0000000..8cc12c8 --- /dev/null +++ b/contrib/pandas/pandas_series.md @@ -0,0 +1,283 @@ +# Pandas Series + + + A series is a Panda data structures that represents a one dimensional array-like object containing an array of data and an associated array of data type labels, called index. + +## Creating a Series object: + +### Basic Series +To create a basic Series, you can pass a list or array of data to the `pd.Series()` function. + +```python +import pandas as pd + +s1 = pd.Series([4, 5, 2, 3]) +print(s1) + +#Output: +#0 4 +#1 5 +#2 2 +#3 3 +#dtype: int64 + +``` + +### Series from a Dictionary + +If you pass a dictionary to `pd.Series()`, the keys become the index and the values become the data of the Series. +```python +import pandas as pd + +s2 = pd.Series({'A': 1, 'B': 2, 'C': 3}) +print(s2) + +#Output: +#A 1 +#B 2 +#C 3 +#dtype: int64 +``` + + +## Additional Functionality + + +### Specifying Data Type and Index +You can specify the data type and index while creating a Series. 
+```python +import pandas as pd + +s4 = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype='float64') +print(s4) + +#output +#a 1.0 +#b 2.0 +#c 3.0 +#dtype: float64 +``` + +### Specifying NaN Values: +* Sometimes you need to create a Series of a certain size without having complete data available; in such cases you can fill the missing entries with NaN (Not a Number). +* When a Series contains NaN, its data type must be a floating point type. Integer data that includes NaN is stored as floating point automatically because NaN is not supported by the default integer types. + +```python +import pandas as pd +s3=pd.Series([1,np.Nan,2]) +print(s3) + +#output: + +#0 1.0 +#1 NaN +#2 2.0 +#dtype: float64 +``` + + +### Creating Data from Expressions +You can create a Series using an expression or function. + +``=pd.Series(data=,index=None) + +```python +import pandas as pd +a=np.arange(1,5) # [1,2,3,4] +s5=pd.Series(data=a**2,index=a) +print(s5) + +#output: +#1 1 +#2 4 +#3 9 +#4 16 +#dtype: int64 +``` + + +## Series Object Attributes + +| **Attribute** | **Description** | +|--------------------------|---------------------------------------------------| +| `.index` | Array of index of the Series | +| `.values` | Array of values of the Series | +| `.dtype` | Return the dtype of the data | +| `.shape` | Return a tuple representing the shape of the data | +| `.ndim` | Return the number of dimensions of the data | +| `.size` | Return the number of elements in the data | +| `.hasnans` | Return True if there is any NaN in the data | +| `.empty` | Return True if the Series object is empty | + + + +- If you use len() on a series object then it return total number of elements in the series object whereas .count() return only the number of non NaN elements. + +## Accessing a Series object and its elements + +### Accessing Individual Elements +You can access individual elements using their index. 
+'legal' indexes are used to access individual elements. +```python +import pandas as pd + +s7 = pd.Series(data=[13, 45, 67, 89], index=['A', 'B', 'C', 'D']) +print(s7['A']) + +#output +#13 +``` + + +### Slicing a Series + +- Slices are extracted based on their positional index, regardless of the custom index labels. +- Each element in the Series has a positional index starting from 0 (i.e., 0 for the first element, 1 for the second element, and so on). +- `[:]` will return the values of the elements between the start and end positions (excluding the end position). + +#### Example + +```python +import pandas as pd + +s = pd.Series(data=[13, 45, 67, 89], index=['A', 'B', 'C', 'D']) +print(s[:2]) + +#Output +#A 13 +#B 45 +#dtype: int64 +#This example demonstrates that the first two elements (positions 0 and 1) are returned, regardless of their custom index labels. +``` + + +## Operation on series object + +### Modifying elements and indexes +* [indexes]=< new data value > +* [start : end]=< new data value > +* .index=[new indexes] + +```python +import pandas as pd + +s8 = pd.Series([10, 20, 30], index=['a', 'b', 'c']) +s8['a'] = 100 +s8.index = ['x', 'y', 'z'] +print(s8) + +#output +#x 100 +#y 20 +#z 30 +#dtype: int64 +``` + +**Note: Series object are value-mutable but size immutable objects.** + +### vector operations +We can perform vector operations such as `+`,`-`,`/`,`%` etc. 
+```python +import pandas as pd + +s9 = pd.Series([1, 2, 3]) +print("addition:", s9 + 5) +print("subtraction:", s9 - 2) + +#output: +#addition: +#0 6 +#1 7 +#2 8 +#dtype: int64 +#subtraction: +#0 -1 +#1 0 +#2 1 +#dtype: int64 +``` + +### Arthmetic on series object +```python +import pandas as pd + +s10 = pd.Series([1, 2, 3]) +s11 = pd.Series([4, 5, 6]) +print("addition:", s10 + s11) + +print("multiplication:", s10 * s11) + +#output: +#addition: +#0 5 +#1 7 +#2 9 +#dtype: int64 +#multiplication: +#0 4 +#1 10 +#2 18 +#dtype: int64 +``` + +Here one thing we should keep in mind that both the series object should have same indexes otherwise it will return NaN value to all the indexes of two series object . + + +### Head and Tail Functions + +| **Functions** | **Description** | +|--------------------------|---------------------------------------------------| +| `.head(n)` | return the first n elements of the series | +| `.tail(n)` | return the last n elements of the series | + +```python +import pandas as pd + +s12 = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]) +print(s12.head(3)) +print(s12.tail(3)) + +#output +#0 10 +#1 20 +#2 30 +#dtype: int64 + +#7 80 +#8 90 +#9 100 +#dtype: int64 +``` +If you dont provide any value to n the by default it give results for `n=5`. + +### Few extra functions: +| **Function** | **Description** | +|----------------------------------------|------------------------------------------------------------------------| +| `.sort_values()` | Return the Series object in ascending order based on its values. | +| `.sort_index()` | Return the Series object in ascending order based on its index. | +| `.sort_drop()` | Return the Series with the deleted index and its corresponding value. 
| +```python +import pandas as pd + +s13 = pd.Series([3, 1, 2], index=['c', 'a', 'b']) +print(s13.sort_values()) +print(s13.sort_index()) +print(s13.drop('a')) +#Output +#a 1 +#b 2 +#c 3 +#dtype: int64 + +#a 1 +#b 2 +#c 3 +#dtype: int64 + +#c 3 +#b 2 +#dtype: int64 +``` + +## Conclusion +In short, Pandas Series is a fundamental data structure in Python for handling one-dimensional data. It combines an array of values with an index, offering efficient methods for data manipulation and analysis. With its ease of use and powerful functionality, Pandas Series is widely used in data science and analytics for tasks such as data cleaning, exploration, and visualization. \ No newline at end of file From 4916b99a3ad28b1d464894e50b09f9868d640856 Mon Sep 17 00:00:00 2001 From: anamika123 Date: Sat, 22 Jun 2024 18:19:33 +0530 Subject: [PATCH 2/5] Update pandas_series.md --- contrib/pandas/pandas_series.md | 201 +++++++++++++++++--------------- 1 file changed, 108 insertions(+), 93 deletions(-) diff --git a/contrib/pandas/pandas_series.md b/contrib/pandas/pandas_series.md index 8cc12c8..cbb9a0f 100644 --- a/contrib/pandas/pandas_series.md +++ b/contrib/pandas/pandas_series.md @@ -13,14 +13,14 @@ import pandas as pd s1 = pd.Series([4, 5, 2, 3]) print(s1) - -#Output: -#0 4 -#1 5 -#2 2 -#3 3 -#dtype: int64 - +``` +``` +Output: +0 4 +1 5 +2 2 +3 3 +dtype: int64 ``` ### Series from a Dictionary @@ -31,12 +31,13 @@ import pandas as pd s2 = pd.Series({'A': 1, 'B': 2, 'C': 3}) print(s2) - -#Output: -#A 1 -#B 2 -#C 3 -#dtype: int64 +``` +``` +Output: +A 1 +B 2 +C 3 +dtype: int64 ``` @@ -50,12 +51,13 @@ import pandas as pd s4 = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype='float64') print(s4) - -#output -#a 1.0 -#b 2.0 -#c 3.0 -#dtype: float64 +``` +``` +Output: +a 1.0 +b 2.0 +c 3.0 +dtype: float64 ``` ### Specifying NaN Values: @@ -66,13 +68,13 @@ print(s4) import pandas as pd s3=pd.Series([1,np.Nan,2]) print(s3) - -#output: - -#0 1.0 -#1 NaN -#2 2.0 -#dtype: float64 +``` +``` 
+Output: +0 1.0 +1 NaN +2 2.0 +dtype: float64 ``` @@ -86,15 +88,15 @@ import pandas as pd a=np.arange(1,5) # [1,2,3,4] s5=pd.Series(data=a**2,index=a) print(s5) - -#output: -#1 1 -#2 4 -#3 9 -#4 16 -#dtype: int64 ``` - +``` +Output: +1 1 +2 4 +3 9 +4 16 +dtype: int64 +``` ## Series Object Attributes @@ -123,9 +125,10 @@ import pandas as pd s7 = pd.Series(data=[13, 45, 67, 89], index=['A', 'B', 'C', 'D']) print(s7['A']) - -#output -#13 +``` +``` +Output: +13 ``` @@ -142,12 +145,14 @@ import pandas as pd s = pd.Series(data=[13, 45, 67, 89], index=['A', 'B', 'C', 'D']) print(s[:2]) +``` +``` +Output: +A 13 +B 45 +dtype: int64 -#Output -#A 13 -#B 45 -#dtype: int64 -#This example demonstrates that the first two elements (positions 0 and 1) are returned, regardless of their custom index labels. +This example demonstrates that the first two elements (positions 0 and 1) are returned, regardless of their custom index labels. ``` @@ -165,12 +170,13 @@ s8 = pd.Series([10, 20, 30], index=['a', 'b', 'c']) s8['a'] = 100 s8.index = ['x', 'y', 'z'] print(s8) - -#output -#x 100 -#y 20 -#z 30 -#dtype: int64 +``` +``` +Output: +x 100 +y 20 +z 30 +dtype: int64 ``` **Note: Series object are value-mutable but size immutable objects.** @@ -183,18 +189,21 @@ import pandas as pd s9 = pd.Series([1, 2, 3]) print("addition:", s9 + 5) print("subtraction:", s9 - 2) +``` +``` +output: -#output: -#addition: -#0 6 -#1 7 -#2 8 -#dtype: int64 -#subtraction: -#0 -1 -#1 0 -#2 1 -#dtype: int64 +addition: +0 6 +1 7 +2 8 +dtype: int64 + +subtraction: +0 -1 +1 0 +2 1 +dtype: int64 ``` ### Arthmetic on series object @@ -206,18 +215,21 @@ s11 = pd.Series([4, 5, 6]) print("addition:", s10 + s11) print("multiplication:", s10 * s11) +``` +``` +output: -#output: -#addition: -#0 5 -#1 7 -#2 9 -#dtype: int64 -#multiplication: -#0 4 -#1 10 -#2 18 -#dtype: int64 +addition: +0 5 +1 7 +2 9 +dtype: int64 + +multiplication: +0 4 +1 10 +2 18 +dtype: int64 ``` Here one thing we should keep in mind that both the series 
object should have same indexes otherwise it will return NaN value to all the indexes of two series object . @@ -236,17 +248,18 @@ import pandas as pd s12 = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]) print(s12.head(3)) print(s12.tail(3)) +``` +``` +Output: +0 10 +1 20 +2 30 +dtype: int64 -#output -#0 10 -#1 20 -#2 30 -#dtype: int64 - -#7 80 -#8 90 -#9 100 -#dtype: int64 +7 80 +8 90 +9 100 +dtype: int64 ``` If you dont provide any value to n the by default it give results for `n=5`. @@ -263,20 +276,22 @@ s13 = pd.Series([3, 1, 2], index=['c', 'a', 'b']) print(s13.sort_values()) print(s13.sort_index()) print(s13.drop('a')) -#Output -#a 1 -#b 2 -#c 3 -#dtype: int64 +``` +``` +Output: +a 1 +b 2 +c 3 +dtype: int64 -#a 1 -#b 2 -#c 3 -#dtype: int64 +a 1 +b 2 +c 3 +dtype: int64 -#c 3 -#b 2 -#dtype: int64 +c 3 +b 2 +dtype: int64 ``` ## Conclusion From f1c00b3443b3c1b0188d28424581c128c012d256 Mon Sep 17 00:00:00 2001 From: Ashita Prasad Date: Sun, 23 Jun 2024 12:50:58 +0530 Subject: [PATCH 3/5] Update and rename pandas_series.md to pandas-series.md --- .../{pandas_series.md => pandas-series.md} | 90 +++++++++++-------- 1 file changed, 54 insertions(+), 36 deletions(-) rename contrib/pandas/{pandas_series.md => pandas-series.md} (91%) diff --git a/contrib/pandas/pandas_series.md b/contrib/pandas/pandas-series.md similarity index 91% rename from contrib/pandas/pandas_series.md rename to contrib/pandas/pandas-series.md index cbb9a0f..a6fe042 100644 --- a/contrib/pandas/pandas_series.md +++ b/contrib/pandas/pandas-series.md @@ -1,7 +1,6 @@ # Pandas Series - - A series is a Panda data structures that represents a one dimensional array-like object containing an array of data and an associated array of data type labels, called index. +A series is a Panda data structures that represents a one dimensional array-like object containing an array of data and an associated array of data type labels, called index. 
## Creating a Series object: @@ -14,8 +13,9 @@ import pandas as pd s1 = pd.Series([4, 5, 2, 3]) print(s1) ``` + +#### Output ``` -Output: 0 4 1 5 2 2 @@ -32,8 +32,9 @@ import pandas as pd s2 = pd.Series({'A': 1, 'B': 2, 'C': 3}) print(s2) ``` + +#### Output ``` -Output: A 1 B 2 C 3 @@ -52,8 +53,9 @@ import pandas as pd s4 = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype='float64') print(s4) ``` + +#### Output ``` -Output: a 1.0 b 2.0 c 3.0 @@ -69,8 +71,9 @@ import pandas as pd s3=pd.Series([1,np.Nan,2]) print(s3) ``` + +#### Output ``` -Output: 0 1.0 1 NaN 2 2.0 @@ -89,8 +92,9 @@ a=np.arange(1,5) # [1,2,3,4] s5=pd.Series(data=a**2,index=a) print(s5) ``` + +#### Output ``` -Output: 1 1 2 4 3 9 @@ -111,8 +115,6 @@ dtype: int64 | `.hasnans` | Return True if there is any NaN in the data | | `.empty` | Return True if the Series object is empty | - - - If you use len() on a series object then it return total number of elements in the series object whereas .count() return only the number of non NaN elements. ## Accessing a Series object and its elements @@ -126,12 +128,12 @@ import pandas as pd s7 = pd.Series(data=[13, 45, 67, 89], index=['A', 'B', 'C', 'D']) print(s7['A']) ``` + +#### Output ``` -Output: 13 ``` - ### Slicing a Series - Slices are extracted based on their positional index, regardless of the custom index labels. @@ -146,15 +148,15 @@ import pandas as pd s = pd.Series(data=[13, 45, 67, 89], index=['A', 'B', 'C', 'D']) print(s[:2]) ``` + +#### Output ``` -Output: A 13 B 45 dtype: int64 - -This example demonstrates that the first two elements (positions 0 and 1) are returned, regardless of their custom index labels. ``` +This example demonstrates that the first two elements (positions 0 and 1) are returned, regardless of their custom index labels. 
## Operation on series object @@ -171,8 +173,9 @@ s8['a'] = 100 s8.index = ['x', 'y', 'z'] print(s8) ``` + +#### Output ``` -Output: x 100 y 20 z 30 @@ -181,25 +184,32 @@ dtype: int64 **Note: Series object are value-mutable but size immutable objects.** -### vector operations +### Vector operations We can perform vector operations such as `+`,`-`,`/`,`%` etc. + +#### Addition ```python import pandas as pd s9 = pd.Series([1, 2, 3]) -print("addition:", s9 + 5) -print("subtraction:", s9 - 2) +print(s9 + 5) ``` -``` -output: -addition: +#### Output +``` 0 6 1 7 2 8 dtype: int64 +``` -subtraction: +#### Subtraction +```python +print(s9 - 2) +``` + +#### Output +``` 0 -1 1 0 2 1 @@ -207,25 +217,32 @@ dtype: int64 ``` ### Arthmetic on series object + +#### Addition ```python import pandas as pd s10 = pd.Series([1, 2, 3]) s11 = pd.Series([4, 5, 6]) -print("addition:", s10 + s11) - -print("multiplication:", s10 * s11) +print(s10 + s11) ``` -``` -output: -addition: +#### Output +``` 0 5 1 7 2 9 dtype: int64 +``` -multiplication: +#### Multiplication + +```python +print(s10 * s11) +``` + +#### Output +``` 0 4 1 10 2 18 @@ -249,26 +266,28 @@ s12 = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]) print(s12.head(3)) print(s12.tail(3)) ``` + +#### Output ``` -Output: 0 10 1 20 2 30 dtype: int64 - 7 80 8 90 9 100 dtype: int64 ``` + If you dont provide any value to n the by default it give results for `n=5`. -### Few extra functions: +### Few extra functions | **Function** | **Description** | |----------------------------------------|------------------------------------------------------------------------| | `.sort_values()` | Return the Series object in ascending order based on its values. | | `.sort_index()` | Return the Series object in ascending order based on its index. | | `.sort_drop()` | Return the Series with the deleted index and its corresponding value. 
| + ```python import pandas as pd @@ -277,22 +296,21 @@ print(s13.sort_values()) print(s13.sort_index()) print(s13.drop('a')) ``` + +#### Output ``` -Output: a 1 b 2 c 3 dtype: int64 - a 1 b 2 c 3 dtype: int64 - c 3 b 2 dtype: int64 ``` ## Conclusion -In short, Pandas Series is a fundamental data structure in Python for handling one-dimensional data. It combines an array of values with an index, offering efficient methods for data manipulation and analysis. With its ease of use and powerful functionality, Pandas Series is widely used in data science and analytics for tasks such as data cleaning, exploration, and visualization. \ No newline at end of file +In short, Pandas Series is a fundamental data structure in Python for handling one-dimensional data. It combines an array of values with an index, offering efficient methods for data manipulation and analysis. With its ease of use and powerful functionality, Pandas Series is widely used in data science and analytics for tasks such as data cleaning, exploration, and visualization. 
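The function table in this patch lists `.sort_drop()`, while its own example calls `s13.drop('a')`; `drop` appears to be the method that actually exists on a Series. The following is a minimal sketch of the sorting, dropping, and head/tail behaviour described above, reusing the values from the patch. The variable names are illustrative and not part of the patch, and a reasonably recent pandas release is assumed.

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=['c', 'a', 'b'])

by_value = s.sort_values()                       # ascending by value
by_value_desc = s.sort_values(ascending=False)   # descending by value
by_label = s.sort_index()                        # ascending by index label
without_a = s.drop('a')                          # returns a new Series; s keeps label 'a'

long_series = pd.Series(range(10, 110, 10))      # 10, 20, ..., 100
first_five = long_series.head()                  # n defaults to 5 when omitted
last_three = long_series.tail(3)

print(by_value_desc.index.tolist())
print(without_a.index.tolist())
print(len(first_five), last_three.tolist())
```

Because `drop` returns a new object rather than modifying in place, the original Series still holds all three labels afterwards.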
From 372f5ff634d6f4b66aafe0440da2943ef0274724 Mon Sep 17 00:00:00 2001 From: Ashita Prasad Date: Sun, 23 Jun 2024 12:51:40 +0530 Subject: [PATCH 4/5] Update index.md --- contrib/pandas/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/contrib/pandas/index.md b/contrib/pandas/index.md index 2a15929..db008e2 100644 --- a/contrib/pandas/index.md +++ b/contrib/pandas/index.md @@ -9,4 +9,4 @@ - [Working with Date & Time in Pandas](datetime.md) - [Importing and Exporting Data in Pandas](import-export.md) - [Handling Missing Values in Pandas](handling-missing-values.md) -- [Pandas Series](pandas_series.md) \ No newline at end of file +- [Pandas Series](pandas-series.md) From 33be2407c8ef074649604f7ea743088195476df1 Mon Sep 17 00:00:00 2001 From: Ashita Prasad Date: Sun, 23 Jun 2024 12:52:35 +0530 Subject: [PATCH 5/5] Update pandas-series.md --- contrib/pandas/pandas-series.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/contrib/pandas/pandas-series.md b/contrib/pandas/pandas-series.md index a6fe042..88b1235 100644 --- a/contrib/pandas/pandas-series.md +++ b/contrib/pandas/pandas-series.md @@ -256,8 +256,8 @@ Here one thing we should keep in mind that both the series object should have sa | **Functions** | **Description** | |--------------------------|---------------------------------------------------| -| `.head(n)` | return the first n elements of the series | -| `.tail(n)` | return the last n elements of the series | +| `.head(n)` | return the first n elements of the series | +| `.tail(n)` | return the last n elements of the series | ```python import pandas as pd @@ -282,6 +282,7 @@ dtype: int64 If you dont provide any value to n the by default it give results for `n=5`. 
### Few extra functions + | **Function** | **Description** | |----------------------------------------|------------------------------------------------------------------------| | `.sort_values()` | Return the Series object in ascending order based on its values. |
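The arithmetic note carried through these patches says that two Series need the same indexes or every result becomes NaN. As far as pandas label alignment goes, only the labels missing from one side become NaN, and matching labels are still computed. The sketch below illustrates that under the assumption of a recent pandas and NumPy; the names `left`, `right`, `total`, `filled`, and `with_gap` are hypothetical. It also uses `np.nan` (lowercase, with NumPy imported explicitly) rather than the `np.Nan` spelling in the patched file, and shows the `len()` versus `.count()` difference.

```python
import numpy as np
import pandas as pd

left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
right = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Addition aligns on labels: 'a' and 'd' exist on one side only, so they become NaN.
total = left + right

# Series.add with fill_value treats a label missing on one side as 0 instead.
filled = left.add(right, fill_value=0)

# A Series holding NaN is stored as float64.
with_gap = pd.Series([1, np.nan, 2])

print(total)
print(filled)
print(len(with_gap), with_gap.count(), with_gap.hasnans)
```

`len()` counts every element, while `.count()` skips NaN entries, so the two differ by the number of missing values.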