In today's digital era, we depend on computers, smartphones and the internet to perform a plethora of tasks, like:
- A mathematical task, such as finding the square root of a number or solving a set of simultaneous equations.
- A text-based task such as reading a document and performing search/replace.
- Streaming and playing multimedia files containing audio and video.
- Using a search engine to find and visit a website.
- Playing an online multiplayer game with friends.
- and many more...
Software plays an important role, as it translates human activity into corresponding machine instructions which are executed to accomplish these tasks.
**Software** is a collection of programs, where each program provides a sequence of instructions specifying how the computer should act.
These instructions have to be provided in **machine language** or **low-level language** (0s and 1s), which is difficult for a human being to read or write.
This led to the invention of **high-level programming languages** in which programs can be easily written and managed. The human-readable programs written using high-level languages are converted into computer-readable machine code or bytecode using **compilers** or **interpreters**.
There are many high-level programming languages that are currently in wide use.
Guido van Rossum started the development of Python in December 1989. He released the first version (0.9.0) of Python to the general public on February 20, 1991.
Python is a **high-level programming language** with an English-like syntax, which makes programs readable, writable, shareable and manageable.
While developing a Python program one is not required to handle the various components of computer architecture like registers, memory addresses and call stacks which have to be handled if an assembly language or a low-level language is used for development.
Python includes high-level language features like variables, data structures (lists, dictionaries, etc.), objects, expressions, modules, classes, functions, loops, threads, file handling, string handling, error handling and other computer science abstraction concepts.
In traditional programming languages like C or C++, the source code is compiled into computer-readable machine code before it can be executed.
Python is an **interpreted language** where the Python interpreter reads and executes the program line by line.
This process is slower than executing compiled code, but allows faster development, as one does not have to go through the entire compilation step during testing and debugging. Also, the code can run on any platform that has a valid Python installation (which includes the interpreter), as no platform-dependent binaries are generated.
Python does not enforce **Object-oriented programming (OOP)**, but completely supports it.
A programmer can define Classes specifying the data in the form of attributes (or properties) and some programming logic in the form of member functions (or methods). Once a class is defined, the user can create an instance of that class which is known as an object.
In Python, everything (`int`, `list`, `dict`, etc.) is an object. We will cover more about objects in detail in the later sections.
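As a minimal sketch of these ideas, the hypothetical `Rectangle` class below (its name, attributes and method are illustrative assumptions, not taken from the text) stores data as attributes, keeps logic in a method, and is then used to create an object.

```python
class Rectangle:
    """A hypothetical class used only for illustration."""

    def __init__(self, length, breadth):
        # Attributes hold the data of each object
        self.length = length
        self.breadth = breadth

    def area(self):
        # A method holds logic that operates on the object's data
        return self.length * self.breadth


box = Rectangle(3, 2)            # create an instance (object) of the class
print(box.area())                # 6
print(isinstance(box, object))   # True
print(isinstance(1, object))     # True: even an int is an object
```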
As Python is an interpreted language in which the code is executed line-by-line, a Python statement or expression is evaluated at run-time. This allows **dynamic typing** (the type of a variable can change over its lifetime) and the creation of dynamic objects at run-time, which provides more flexibility, usability and fewer lines of code compared to statically-typed compiled languages like C/C++.
The Python programming language is easy to learn with low technical and conceptual overhead. This makes it an ideal language for beginners to learn programming.
Python has a rich and extensive Standard Library, a collection of predefined functions for various tasks.
Python programmers also have at their disposal the vast ecosystem of more than 250,000 community-contributed libraries in the Python Package Index (PyPI), where one can find packages for a very wide range of tasks.
Some of the most popular web development frameworks (Django, Flask, etc.) are written in Python. This, coupled with the availability of packages to connect to any database, makes Python a great choice for web application development.
After installing the latest version of the Python interpreter, we can now write and execute some basic Python code.
There are two ways to execute a Python program:
1. **Interactive Mode**: When the IDLE application is launched, the Python interpreter or the Python shell pops up on the screen. The user can interact with the Python interpreter and execute statements (single line or multiline code snippets) directly in this Python shell.
2. **Script Mode**: This is the most commonly used method for executing a Python program. The entire Python program is written and saved in a file (`.py` extension) which can be executed using the IDLE application.
To launch the IDLE application click `[Windows Start Menu Button] -> [Python 3.9 Folder] -> [IDLE (Python 3.9 64 bit)]`.
![Launch IDLE](images/0201c.png)
The Python interpreter or the Python shell will pop up on the screen.
![Python Shell](images/0201d.png)
The version (`3.9`) of the Python interpreter is displayed at the top of the window followed by the `>>>` symbol which indicates that the interpreter is ready to take instructions.
Python commands or statements can be input on this prompt. The input statements are executed instantaneously and any variable assignments are retained as long as the session is not terminated.
Interactive mode is not just restricted to basic arithmetic or assignments. Let us join two strings - `"Hello, "` and `"world!"`.
``` python
>>> "Hello, " + "world!"
'Hello, world!'
```
The complete functionality of Python is easily accessible to a user via the **Interactive Mode**.
This makes it convenient for testing and instant execution of small code snippets (a single line or a few lines of code), a feature not traditionally available in compiled languages like C, C++ and Java.
However, the statements cannot be saved for future use and have to be retyped for re-execution. This disadvantage is overcome by the use of Python in **Script Mode** as described in the next section.
Now run this script using `[Run] -> [Run Module]`.
![Execute File](images/0303c.png)
It can be observed that the code has been executed, but no output is displayed on the console (or the standard output), as all output has to be explicitly requested when running code in script mode.
This can be done by using the `print()` function which is used in Python scripts to display output on the output stream. Let us quickly add the `print()` function in the above code and execute it.
``` python
a = 2 + 2
print(a)
```
Now, when you run the script you will observe that the value of `a`, that is `4`, is now displayed on the console.
When Python code is executed, the Python interpreter reads each logical line and breaks it into a sequence of lexical units.
These lexical units are better known as **tokens**, the smallest individual units of a program. They are the building blocks of Python code and can be classified into one of the following categories:
- **Keywords** : Reserved words that convey special meaning when processed by the Python interpreter.
- **Identifiers** : Names defined by the programmer to refer to objects that can represent variables, functions, classes, etc.
- **Literals** : Values specified in the program which belong to exactly one of Python's built-in data types.
- **Delimiters** : Symbols that denote grouping, punctuation, and assignment/binding.
- **Operators** : Symbols that can operate on data and compute results.
Keywords are reserved words that have special meaning when processed by the Python interpreter. They are case-sensitive and cannot be used for naming identifiers (class, function, variable or structure names).
The list of keywords in Python is provided below:
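The exact set of keywords depends on the Python version in use. One way to inspect it for the running interpreter is the standard library's `keyword` module, as sketched here.

```python
import keyword

# kwlist holds the reserved words of the running interpreter
print(keyword.kwlist)

# iskeyword() checks a single name
print(keyword.iskeyword("while"))   # True
print(keyword.iskeyword("While"))   # False, keywords are case-sensitive
```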
Identifiers are used for defining the names of Python objects such as variables, functions, classes, modules, etc. The naming convention for identifiers is as follows:
- Must begin with a lowercase character (`a-z`) or an uppercase character (`A-Z`) or underscore sign (`_`).
- Followed by any number of letters (`a-z`, `A-Z`), digits (`0-9`), or underscores (`_`).
- Should not be a keyword.
- No special symbols are allowed like `!`, `@`, `#`, `$`, `%`, etc.
Some points to keep in mind while naming identifiers:
- Identifiers are case-sensitive in nature and any difference in case of any character refers to a different identifier. e.g., `length` and `Length` are different identifiers.
- Identifiers differing by only underscores are different. e.g., `unitlength` and `unit_length` are different identifiers.
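The rules above can be checked programmatically. The sketch below uses a hypothetical helper, `is_valid_name` (a name introduced here for illustration), built on `str.isidentifier()` and `keyword.iskeyword()`. Note that `str.isidentifier()` also accepts non-ASCII letters, which is slightly broader than the ASCII-only rules listed above.

```python
import keyword


def is_valid_name(name):
    # A usable name must be a syntactically valid identifier
    # and must not be a reserved keyword
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_valid_name("unit_length"))   # True
print(is_valid_name("_count2"))       # True
print(is_valid_name("2count"))        # False, starts with a digit
print(is_valid_name("unit$length"))   # False, contains a special symbol
print(is_valid_name("class"))         # False, reserved keyword

# Case and underscores both matter: these are four distinct names
print(len({"length", "Length", "unitlength", "unit_length"}))   # 4
```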
It is also a good practice (although not compulsory) to follow these conventions while naming identifiers:
- Identifiers should be named carefully with an emphasis on clarity and readability. For example, in a program that calculates the area of a rectangle, good choices for identifier names are `length`, `breadth` and `area`.
- Class names should start with an uppercase character.
- Identifiers starting with an underscore have special meaning in a program.
- Variable, function and method names should be in lowercase characters, with underscores separating multiple words like `area_of_square`, `area_of_triangle`, etc.
Literals are tokens in the source code which represent fixed or constant values. They are often used in assignment statements for initializing variables or in comparison expressions.
The various types of literals available in Python are as follows:
Numeric literals are used for representing numeric values in the source code. They can be of three types: integers, floating point numbers and imaginary numbers.
A decimal integer literal consists of one or more digits (`0-9`) and cannot have any zeros preceding the first non-zero digit, except when the number is `0`.
Example base-10 integers:
``` python
34
3283298
864
0
```
`092` is not a valid decimal integer literal as a zero precedes the first non-zero digit `9`.
Floating point literals are real numbers present in the source code. They contain a fractional component and/or an exponential component.
The fractional component includes the digits after the decimal point (`.`).
Example floating point literals:
```
3.4
.4
8.
3.4E2
3.4e-2
```
In the above example, `.4` is equivalent to `0.4` and `8.` is equivalent to `8.0`.
The exponential component can be identified by the letter `e` or `E` followed by an optional sign (`+` or `-`) and digits (`0-9`). This exponent is equivalent to multiplying the real number by a power of `10`.
For example, `3.4E2` is equivalent to `3.4 x 10^2` or `340.0`, whereas `3.4e-2` is equivalent to `3.4 x 10^-2` or `.034`.
To specify complex numbers and perform complex number mathematics, Python supports imaginary literals, which are given by a real or integer number followed by the letter `j` or `J`, representing the unit imaginary number.
- There is no specialized literal such as a complex literal. A complex number is actually represented in the program using an expression comprising a real number (integer/float numeric literal) and an imaginary number (imaginary literal). For example, `1 + 2j` consists of an integer literal (`1`) and an imaginary literal (`2j`).
- Numeric literals do not include the minus sign (`-`). `-` is actually a unary operator which combines with a numeric literal to represent negative numbers. For example, in `-3.14` the numeric literal is `3.14` and `-` is an operator.
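The points above about numeric literals can be verified in a short session. This is a small sketch using only built-in behaviour; the variable name `z` is chosen here for illustration.

```python
# Floating point literals with fractional and exponential components
print(3.4E2 == 340.0)   # True
print(.4 == 0.4)        # True
print(8. == 8.0)        # True

# A complex number is an expression built from an imaginary literal
z = 1 + 2j
print(type(2j))         # <class 'complex'>
print(z.real, z.imag)   # 1.0 2.0

# The minus sign is an operator, not part of the literal:
# ** binds tighter than unary minus, so this is -(2 ** 2)
print(-2 ** 2)          # -4
```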
Triple quoted strings can also span multiple lines.
Example:
``` python
s = "I am a String"
s1 = """A
multiline
String"""
s2 = '''Also a
multiline
String'''
```
The backslash (`\`) character can be used in a string literal to escape characters that otherwise have a special meaning, such as newline, the backslash itself, or the quote character.
| Escape Sequence | Meaning |
|--|--|
| `\\` | Backslash (`\`) |
| `\'` | Single quote (`'`) |
| `\"` | Double quote (`"`) |
| `\a` | ASCII Bell (BEL) |
| `\b` | ASCII Backspace (BS) |
| `\f` | ASCII Formfeed (FF) |
| `\n` | ASCII Linefeed (LF) |
| `\r` | ASCII Carriage Return (CR) |
| `\t` | ASCII Horizontal Tab (TAB) |
| `\v` | ASCII Vertical Tab (VT) |
Although `\'` and `\"` can be used to specify quote characters, Python allows embedding double quotes inside a single-quoted string (`'My name is "Python".'`) and single quotes inside a double-quoted string (`"Python's World"`).
String literals also support Unicode characters, which can be specified using the `\u` escape sequence followed by the character's four-digit hexadecimal code point.
``` python
>>> print("E = mc\u00B2")
E = mc²
```
In the above example, `\u00B2` is the escape sequence for the Unicode character 'SUPERSCRIPT TWO'.
Operators are tokens which can be combined with values and variables to create expressions which evaluate to a single value. Python supports a rich set of operators:
``` python
+ - * **
/ // % @
<< >>
& | ^ ~
:= < >
<= >= == !=
```
Each of the above operators is covered in detail in the chapter - Operators.
Delimiters are tokens which are useful for organizing a program and are used in statements, expressions, functions, literal collections, and various other code structures.
They can be classified based on utility as follows:
A set of valid characters that a programming language recognizes is known as its **character set**.
Python supports the Unicode encoding standard. The default encoding for Python source code is UTF-8 (Unicode Transformation Format – 8-bit), which enables developers to use Unicode characters not only in literals, but also in identifiers.
This allows identifiers and strings to be written in many natural languages, as shown in the example below:
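A minimal sketch of Unicode in identifiers and literals follows. The identifier names and values are hypothetical, chosen here only to show letters from different scripts.

```python
# Identifiers written with non-ASCII letters (hypothetical names)
π = 3.14159
größe = 10
名前 = "Python"

print(π * größe)
print(名前)

# The same names pass the identifier check
print("größe".isidentifier())   # True
```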
``` java
if (x < 10) {
    if (x <= 5) {
        System.out.println("x is less than or equal to 5");
    }
    else {
        System.out.println("x is more than 5 but less than 10");
    }
}
else {
    System.out.print("x is not less than 10");
}
```
It can be seen how indentation (a `tab` at the beginning of a line) is added to the code to increase readability, even though the programming language does not require it. This helps in guiding readers through the code.
Code blocks in Python are inspired by this idea, as it makes Python code easier to understand.
A block of code is denoted by line indentation, typically **4 spaces** (preferred) or a **tab**. This indentation is used to determine the logical group of statements, with all statements within a group having the same level of indentation.
The corresponding Python code for the above C++/Java examples is provided below.
Notice how the code blocks are indented according to the logic.
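A sketch of how the nested conditions might look in Python is given here. The conditions (`x < 10`, `x <= 5`) are inferred from the printed messages, and the value of `x` and the variable `message` are assumptions made for illustration.

```python
x = 7   # hypothetical value chosen for illustration

if x < 10:
    # Statements at this level belong to the outer "if" block
    if x <= 5:
        message = "x is less than or equal to 5"
    else:
        message = "x is more than 5 but less than 10"
else:
    message = "x is not less than 10"

print(message)   # x is more than 5 but less than 10
```

Unlike the braces in C++ or Java, the indentation here is not optional: changing it changes which block a statement belongs to.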
A program is a sequence of instructions which often acts on information (data) provided by the user.
The process of creating, storing and manipulating this data helps in the computation of new data or the end result.
**Variables are the fundamental building blocks of a program** which provide a way to store, access and modify values during the life-cycle of a program.
Each variable has:
- a name (handle),
- a type or data-type (kind of data), and
- a value (actual data).
In traditional programming languages like Java or C++, the type of a variable is declared in advance.
For example, if you want to use the value `1` inside the program, you can store it in a variable named `a` of type `int`.
```
int a = 1;
```
This `a` is analogous to a box of fixed dimensions (fixed type) holding something (value `1`) inside it.
![Box 'a'](images/0501a.png)
In case we want to change the contents of the box, we can replace it with something similar (same type).
```
a = 2;
```
![Filled box 'a'](images/0501b.png)
The contents of this box can be replicated and placed in a similar (same type) box:
```
int b = a;
```
![Copy box 'a' contents](images/0501c.png)
Multiple boxes can exist, each containing an item having the same value.
```
int x = 3;
int y = 3;
int z = 3;
```
![Boxes 'x', 'y' & 'z'](images/0501d.png)
As shown above, programming languages in which variables (named boxes) are declared along with their types (sizes of the boxes) are known as **statically typed** languages.
The type of such a variable (the size of its box) cannot change later in the program; holding a value of a different type requires declaring a new variable.
**Python is a dynamically-typed language**, where every value or data item (of any type like numeric, string, etc.) is an object.
The variable names are just name-tags pointing to the actual object containing data of any type.
As there is no need for any variable declaration in Python before usage, there is no concept of a default value (an empty box or `null`), which exists in other programming languages.
Whenever a new object is created in Python, it is assigned a unique identity (ID) which remains the same throughout the lifetime of that object. In CPython, this ID is the address of the object in memory, and the built-in function `id()` returns it.
``` python
>>> a = 1
>>> id(a)
140407745943856
>>> a = 2
>>> id(a)
140407745943888
```
In the above example, the ID of `a` changes as it points to a new object (`2`).
``` python
>>> b = a
>>> id(b)
140407745943888
```
Also, when `a` is assigned to `b`, instead of creating a new copy, `b` points to the same object as `a`.
Variables can be bound to a reference of an object (of any type) using assignment statements.
You can create an object (data) and bind its reference to a variable using the equal sign (`=`):
``` python
count = 100 # integer
pi = 3.141 # real number
name = "Python" # string
```
Here, L-value refers to the assignable variables (`count`, `pi`, `name`) on the left side of the assignment and R-value refers to the expression on the right side of the assignment operator that has a value (`100`, `3.141`, `"Python"`).
As variables are just references, you can rebind them to another object of the same or a different type:
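A minimal sketch of rebinding, using a hypothetical variable named `value`:

```python
value = 100
print(type(value))   # <class 'int'>

value = 3.141        # rebind the same name to a float object
print(type(value))   # <class 'float'>

value = "Python"     # rebind again, this time to a string object
print(type(value))   # <class 'str'>
```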
In Python, the type of a value is not linked to the variable, but to the actual object which contains it. This type is also known as the object's data type and is used for identifying the operations that can be performed on the data.
The following built-in data types are available in Python:
Sequence, set and mapping types are often collectively known as **iterables**, as they are collections of items which a user can traverse (iterate over).
Objects storing complex numbers like `2 + 1j, -3j, -1 + 2J` are of type `complex`.
Each complex number has two parts, the real part which is a numeric integer or floating point literal, and the imaginary part which is an imaginary literal.
The boolean data type (`bool`) is a subtype of `int`. It stores the evaluated value of expressions represented as keywords - `True` (integer value `1`) and `False` (integer value `0`).
An ordered collection of items where each item can be accessed using an integer index is known as a sequence. The following three sequence data types are available in Python:
`dict` is a mapping data type which stores values in the form of key-value pairs.
It is used for representing data where you can quickly access the value (of any data type) corresponding to a key (of any hashable data type, which excludes mutable types such as `list`, `set` or `dict`), just like a dictionary where you can look up the meaning of a given word.
Keys and corresponding values are separated by colon (`:`).
The key-value pairs are separated by comma (`,`) and enclosed within curly braces - `{ }`.
Some example dictionaries are - `{1: "a", 2: "b", 3: "c"}`, `{"name": "edpunk", "language": "python"}`.
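A short sketch of dictionary lookup and of the restriction on key types is shown here. The names `profile`, `grid` and `key_error` are introduced only for illustration.

```python
profile = {"name": "edpunk", "language": "python"}
print(profile["language"])   # python

# A tuple is allowed as a key
grid = {(0, 0): "origin"}
print(grid[(0, 0)])          # origin

# A list is not allowed as a key and raises TypeError
try:
    bad = {[0, 0]: "origin"}
    key_error = None
except TypeError as err:
    key_error = err
    print("TypeError:", err)
```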
When evaluating an expression, the Python interpreter automatically converts operands to a common data type where needed, without any user intervention, to determine the final data type. This is known as implicit type conversion.
In the below example the final type of `c` is automatically determined as `float` by the Python interpreter.
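A minimal sketch of such an expression, with values for `a` and `b` chosen here as assumptions:

```python
a = 1          # int
b = 2.5        # float
c = a + b      # a is converted to float before the addition

print(c)         # 3.5
print(type(c))   # <class 'float'>
```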
When the type conversion is explicitly specified by the user using the various built-in functions available in Python, it is known as explicit type casting.
The built-in functions which can be used for explicit type casting are as follows:
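As a brief illustration, the sketch below uses a selection of standard built-in conversion functions (`int()`, `float()`, `str()`, `bool()` and `list()`). It is not meant to be an exhaustive list.

```python
print(int("42") + 1)      # 43
print(float("3.5"))       # 3.5
print(int(3.99))          # 3, the fractional part is truncated
print(str(100) + "%")     # 100%
print(bool(0), bool(7))   # False True
print(list("abc"))        # ['a', 'b', 'c']
```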
A data type is said to be immutable when the value of an object of that type cannot be modified.
The following data types are immutable:
- `int`
- `float`
- `complex`
- `bool`
- `tuple`
- `str`
- `None`
You might be wondering: if some of the above types are immutable, then how are we able to modify the values of variables?
In case of variable re-assignment, the original objects are not modified, but new objects (with new values) are created in a new memory location and are bound to the variables. The object containing the old value is destroyed if no other variable references it.
Let us take an example,
``` python
>>> a = 1
>>> id_a = id(a)
>>> a = 2
>>> id_a2 = id(a)
>>> id_a == id_a2
False
```
You can witness in the above example how the object containing the value `1` is different from the object containing the value `2`, and `a` points to the latest object.
Sequence data types like strings and tuples are also immutable, i.e., no modifications are permitted to any item once it is created and any attempt to do so raises an error.
``` python
>>> s = "Hello"
>>> s[1] = "P"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>> t = (1, 2, 3)
>>> t[1] = 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
```
However, as with numeric types, the variables can be reassigned to new sequences.
In Python, the following data types are mutable, i.e., any modification does not create a new object but modifies the existing object:
- `list`
- `set`
- `dict`
Let us take a list and modify its contents.
``` python
>>> l = [1, 2, 3]
>>> id_l = id(l)
>>> l[0] = 0
>>> l
[0, 2, 3]
>>> id_l2 = id(l)
>>> id_l == id_l2
True
```
Let us take an example of a dictionary and add a new `key:value` pair.
``` python
>>> d = {"a": "apple", "b": "boy"}
>>> id_d = id(d)
>>> d["c"] = "cat"
>>> d
{'a': 'apple', 'b': 'boy', 'c': 'cat'}
>>> id_d2 = id(d)
>>> id_d == id_d2
True
```
Let us take an example of a set and add a new item.
``` python
>>> s = {"apple", "bat"}
>>> id_s = id(s)
>>> s.add("cat")
>>> s
{'cat', 'apple', 'bat'}
>>> id_s2 = id(s)
>>> id_s == id_s2
True
```
In the above examples, the `id` of each object (`list`, `dict`, `set`) does not change, which implies that no new objects are created and the original objects are modified.
The `input()` function is used to accept new input data from the user.
When this function is encountered in the code, the Python interpreter waits for the user to type a response, which is read as a string and assigned to a variable.
``` python
>>> name = input()
edpunk
>>> name
'edpunk'
```
The function also has an optional string argument which is used as a prompt message for the user.
``` python
>>> name2 = input("Enter name: ")
Enter name: EdPunk
>>> name2
'EdPunk'
```
User input can be converted into integer or floating point numbers using the type conversion functions `int()` and `float()`.
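The sketch below shows the conversion step. To keep it runnable without a keyboard, the strings `raw_age` and `raw_price` (hypothetical names) stand in for what `input()` would return.

```python
# Stand-ins for the text a user might type; input() always returns a str
raw_age = "25"
raw_price = "19.99"

age = int(raw_age)
price = float(raw_price)
print(age + 1)        # 26
print(type(price))    # <class 'float'>

# Text that does not represent a number raises ValueError
try:
    int("twenty")
    converted = True
except ValueError as err:
    converted = False
    print("ValueError:", err)
```

In an interactive program the first lines would typically read `age = int(input("Enter age: "))`.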
In the above code snippet, each `print()` function invocation creates a new line of output. This is because the `end` parameter of the `print()` function has the newline character (`'\n'`) as its default value.
All non-keyword arguments or expressions are converted to strings and written to the output stream by the `print()` function. They are separated by `sep` and followed by `end`. An empty `print()` invocation writes only the `end` parameter (an empty line, as `end` defaults to the newline character `'\n'`).
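A short sketch of how `sep` and `end` affect the output:

```python
print("a", "b", "c")             # a b c   (default sep is a space)
print("a", "b", "c", sep="-")    # a-b-c
print("Hello", end=" ")          # no newline after "Hello"
print("world")                   # continues on the same line: Hello world
print()                          # writes only the newline, an empty line
print(1, 2.5, True)              # non-string values are converted: 1 2.5 True
```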
The `*` operator multiplies the values of numeric operands.
``` python
>>> 2 * 3
6
```
In case the operands are of type `str`, `list` or `tuple`, the `*` operator returns a sequence or string self-concatenated the specified number of times.
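For instance, a small sketch of repetition with a string, a list and a tuple:

```python
print("ab" * 3)      # ababab
print([1, 2] * 2)    # [1, 2, 1, 2]
print((0,) * 3)      # (0, 0, 0)
```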
Relational operators are useful for comparing the values of the operands to determine their relationship. The following relational operators are available in Python:
The `>` operator returns `True` if the value of operand on left is greater than the value of operand on right.
``` python
>>> 3 > 2
True
>>> 2 > 2
False
```
In case of string operands, the `>` operator performs the comparison according to the Unicode code point (integer) of each character, one by one.
The Unicode code point of a character can be obtained using the `ord()` function in Python.
The code points of the first characters of both operands are compared. If they are equal, the code points of the next characters are compared, and the process continues.
For example,
``` python
>>> "python" > "Python"
True
```
The code point of `"p"` (`112`) is greater than the code point of `"P"` (`80`). As `112` is greater than `80` the expression evaluates to `True`.
Let us take another example:
``` python
>>> "pYthon" > "python"
False
```
The code point of the first character is the same (`112`), so the next pair of characters is compared. The code point of `"Y"` (`89`) is not greater than the code point of `"y"` (`121`), so the expression evaluates to `False`.
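The code points quoted in these examples can be confirmed with `ord()`. The built-in `chr()` performs the reverse lookup.

```python
print(ord("p"), ord("P"))   # 112 80
print(ord("y"), ord("Y"))   # 121 89
print(chr(112))             # p
```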
If two string operands `p` and `q` are of unequal lengths (`len(p) < len(q)`) and `p` is a prefix of `q` such that `q = pt`, where `t` is any string of length greater than `0`, then `q > p` returns `True`.
``` python
>>> "python" > "py"
True
```
In case of sequence operands like `list` or `tuple`, the items are compared one-by-one starting from index `0`.
``` python
>>> ["p","py","PY"] > ["p","Py","PY"]
True
>>> [1, 3] > [1, 2]
True
>>> [1, 3, 4] > [1, 2]
True
```
In the above examples, `"py"` is greater than `"Py"` and `3` is greater than `2` respectively.
If two sequences are of unequal lengths and the smaller sequence is the starting subsequence of the larger one, then the larger sequence is considered greater than the smaller one.
We have already witnessed how Python treats every value or data item as an object.
The relational operator `==` can be used to test whether the operands contain the same value.
``` python
>>> n = 1
>>> n2 = 1
>>> n == n2
True
```
This operator however does not check if both the operands are referring to the same object or different objects.
The identity operators `is` and `is not` are used to test whether two objects have the same or different identity (pointing to the same location in memory) respectively.
`a is b` is equivalent to `id(a) == id(b)`, where `id()` is the built-in function which returns the identity of an object.
``` python
>>> n = 1
>>> n2 = 1
>>> n is n2
True
```
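In the above example, both variables `n` and `n2` point to the same memory location (same object). This happens because CPython caches small integers; it should not be relied upon for arbitrary values.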
``` python
>>> l = [1, 2, 3]
>>> l2 = [1, 2, 3]
>>> l == l2
True
>>> l is l2
False
```
In the above example, although both lists `l` and `l2` contain items with the same values, they are actually two different objects occupying different memory locations.
Comparison operators can be chained together in Python.
For example, `lower <= age <= upper` is a valid chained expression which is equivalent to the expression -
`lower <= age and age <= upper`.
If `a`, `b`, `c`, …, `y`, `z` are expressions and `op1`, `op2`, …, `opN` are comparison operators, then the chained expression `a op1 b op2 c ... y opN z` is equivalent to `a op1 b and b op2 c and ... y opN z`.
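A minimal sketch of chaining, with values for `lower`, `age` and `upper` chosen here as assumptions:

```python
lower, age, upper = 18, 25, 60

print(lower <= age <= upper)           # True
print(lower <= age and age <= upper)   # True, the equivalent expanded form

# Different operators can be mixed in one chain
print(1 < 3 > 2)                       # True: (1 < 3) and (3 > 2)
print(1 < 3 < 2)                       # False: (3 < 2) fails
```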
Python does not have the `?:` ternary operator found in some other programming languages. Instead, the keywords `if` and `else` are used to create conditional expressions, which evaluate to a value based on the given condition.
For example,
``` python
var = t_val if cond else f_val
```
If the above condition `cond` evaluates to `True`, then the variable `var` is assigned `t_val`, else it is assigned `f_val`.
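A concrete sketch of a conditional expression, using a hypothetical `age` value and `category` variable:

```python
age = 20
category = "adult" if age >= 18 else "minor"
print(category)   # adult

age = 12
category = "adult" if age >= 18 else "minor"
print(category)   # minor
```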
While studying mathematics in middle school, we came across the **BODMAS** (Bracket, Of, Division, Multiplication, Addition, and Subtraction) rule which helps us in understanding how mathematical expressions are computed in the presence of multiple operators (`of`, `x`, `/`, `+`, `-`).
In Python, we have a large number of operators and a similar rule to determine the order of evaluation of an expression. This is known as **operator precedence** where the operator with higher precedence is evaluated before the operator with lower precedence in an expression.
The table below presents the precedence of operators in Python from highest to lowest. Operators in the same row have the same precedence, so in such cases the expression is evaluated from left to right (except for exponentiation `**`, which groups from right to left).
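A few expressions illustrate how precedence and grouping affect the result. This is a small sketch, not a complete tour of the precedence rules.

```python
print(2 + 3 * 4)      # 14, * is evaluated before +
print((2 + 3) * 4)    # 20, parentheses override precedence
print(10 - 4 - 3)     # 3, same precedence is evaluated left to right
print(2 ** 3 ** 2)    # 512, ** groups right to left: 2 ** (3 ** 2)
```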
A program contains **"bug(s)"** when it is unable to execute or produces an output which is different from what is expected. These bugs are generally introduced by a programmer unknowingly.
The process of identifying and eliminating these bugs or errors is known as **debugging**.
A syntax error occurs when the program contains a statement that does not follow the prescribed Python rules or syntax, which makes it impossible for the Python interpreter to parse (understand) and execute it.
When a syntactically incorrect statement is executed in the Python console (interactive mode), the Python interpreter displays the offending line and adds a little arrow (`^`) pointing at the token where the error was detected.
In the above example there is a syntax error, with `^` pointing to the `print` function, which the parser is unable to understand as there is a missing `:` (colon) after `True`.
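A statement of the kind described, with a missing colon after `True`, can be examined safely by passing it to the built-in `compile()` function as a string. The statement in `source` is an assumption about what such an example might look like.

```python
# A statement with a missing ":" after True (assumed for illustration)
source = 'while True print("Hello")'

try:
    compile(source, "<example>", "exec")
    parsed = True
except SyntaxError as err:
    parsed = False
    print("SyntaxError detected on line", err.lineno)
    print(err.text)
```

Using `compile()` here only keeps the sketch runnable; typing the same statement at the `>>>` prompt makes the interpreter report the error directly.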
A runtime error occurs when the program is terminated prematurely by the Python interpreter as it is unable to execute a statement although it is correct syntactically.
Some runtime error examples are:
- **ImportError**: Raised when the `import` statement has trouble loading a module or any definition from a module.
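- **IOError**: Raised when the interpreter is not able to open the file specified in the program. In Python 3, `IOError` is an alias of `OSError`.
- **ZeroDivisionError**: Raised when a number is divided by zero or the modulo operation is performed with zero as the divisor.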
- **NameError**: Raised when an identifier is encountered which has not been defined.
- **ValueError**: Raised when an argument or operand is of required data type, but has undesired value.
- **IndexError**: Raised when the provided index in a sequence (string, list, tuple, etc.) is out of range.
- **KeyError**: Raised when a dictionary key is not found in the set of existing keys.
- **TypeError**: Raised while performing an operation on incompatible types.
- **IndentationError**: Raised when the indentation of a statement or code block is incorrect.
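Several of the errors listed above can be triggered on purpose. The sketch below wraps a few failing expressions in functions and records which exception each one raises; the names `examples` and `raised` are introduced here for illustration.

```python
# Each entry maps an expected error name to an action that fails at run-time
examples = {
    "ZeroDivisionError": lambda: 1 / 0,
    "NameError": lambda: undefined_name,
    "ValueError": lambda: int("abc"),
    "IndexError": lambda: [1, 2, 3][5],
    "KeyError": lambda: {"a": 1}["b"],
    "TypeError": lambda: "a" + 1,
}

raised = {}
for label, action in examples.items():
    try:
        action()
    except Exception as err:
        # Record the name of the exception class that was actually raised
        raised[label] = type(err).__name__
        print(label, "->", err)
```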
We have witnessed that even if a program is syntactically correct, its execution may lead to a run-time error.
This error detected during execution is known as an **exception** which is an object created by the Python interpreter containing information regarding the error like type of error, file name and the location of the error (line number, token) in the program.
Some of the built-in exceptions that are raised by the Python interpreter are - `ImportError`, `ZeroDivisionError`, `NameError`, `ValueError`, `IndexError`, `KeyError`, `TypeError` and `IndentationError`.
Apart from the Python interpreter, a programmer can also trigger and raise an exception (along with a custom message) in the code using the `raise` or `assert` statement.
An `assert` statement is often used during code development to act like a safety valve which notifies the programmer in case the test expression is evaluated as `False`.
If the test expression’s value is `True`, the code execution continues normally.
An `AssertionError` is raised if the value is `False`.
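A minimal sketch of both statements follows. The helper `set_age` and the messages are hypothetical, introduced only for illustration.

```python
def set_age(age):
    # Hypothetical helper: reject values that make no sense
    if age < 0:
        raise ValueError("age cannot be negative")
    return age


try:
    set_age(-1)
except ValueError as err:
    print("ValueError:", err)

total = 2 + 2
assert total == 4, "total should be 4"   # condition is True, execution continues

try:
    assert total == 5, "total should be 5"
except AssertionError as err:
    print("AssertionError:", err)
```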
Exception handling is the process of properly handling an exception which can potentially crash a program during execution.
When an error occurs, the program throws an exception.
The runtime system attempts to find an **exception handler**, a block of code that can handle a particular type of error. Once located, the suitable exception handler **catches the exception** and executes the code block, which can attempt to recover from the error. In case the error is unrecoverable, the handler provides a way to gracefully exit the program.
The `try` statement in Python specifies the exception handlers and/or cleanup code for a code block.
The various parts of a try statement are:
- `try` block: The block of statements within which an exception might be thrown.
- `except` clause(s): One or more exception handlers. Each `except` clause handles a particular type of exception. If an exception of that type occurs in the `try` block, the corresponding `except` code block is executed.
- `else` clause: An optional `else` clause can be included after the last `except` block. If no exception is raised, none of the `except` blocks are executed and the `else` code block is executed instead.
- `finally` clause: An optional `finally` clause can be added at the end of the `try` statement. Its block of statements is executed regardless of whether or not any error occurred inside the `try` block. It is usually used for cleanup, such as closing open file objects.
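The four parts can be seen together in the following sketch. The function `safe_divide` and the `steps` list are hypothetical helpers used only to record which clauses were executed.

```python
def safe_divide(a, b):
    """Divide a by b and record which clauses of the try statement ran."""
    steps = []
    result = None
    try:
        result = a / b
    except ZeroDivisionError:
        steps.append("except")   # runs only when b is 0
    else:
        steps.append("else")     # runs only when no exception was raised
    finally:
        steps.append("finally")  # always runs
    return result, steps


print(safe_divide(10, 2))  # (5.0, ['else', 'finally'])
print(safe_divide(10, 0))  # (None, ['except', 'finally'])
```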
A simple Python program can be treated as a block of code where each statement is executed by the Python interpreter in a sequential order from top to bottom.
In real-world programs, however, we would like to have some control over the execution of code, such as:
- skip or execute a block (set of statements) based on certain conditions
- execute a block repeatedly
- redirect execution to another set of statements
- break out of the current flow of execution
This control over the flow of execution is provided by **Control Flow Statements**.
They can be categorized as:
- Sequential
- Selection
- Iteration/Repetition
- Jump
- Procedural Abstraction - A sequence of statements is referenced as a single function or method call
- Recursion - Calling a function or method from within the same function or method
Selection statements, also known as Decision making statements, control the flow of a program based on the outcome of one or many test expression(s). If the condition is satisfied (`True`) then the code block is executed. There is also a provision to execute another code block if the condition is not satisfied.
This process can be demonstrated using the below flowchart:
![Selection Flow](images/0803a.png)
Python supports the `if` compound statement, which provides this control. The `if` statement comprises:
- the `if` keyword followed by the test expression, a colon `:` and an indented block of code which gets executed if the condition is satisfied
- (optional) one or more `elif` clauses, each followed by its test condition and the corresponding code block
- (optional) `else` clause and the corresponding code block which gets executed if none of the above conditions (`if`, `elif`) are satisfied
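A small sketch of the three parts working together is shown below. The function name `grade` and the thresholds are made up for illustration.

```python
def grade(marks):
    if marks >= 90:      # first test expression
        return "A"
    elif marks >= 75:    # checked only if the condition above is not satisfied
        return "B"
    else:                # executed if none of the conditions are satisfied
        return "C"


print(grade(95))  # A
print(grade(80))  # B
print(grade(40))  # C
```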
Iteration statements, also known as Looping statements, allow repeated execution of a code block.
Python provides `for` and `while` statements to perform iteration.
The `for` statement can be used to iterate over the items of a sequence (`list`, `str`, `tuple`, `range`). It can also be used to iterate over other iterables such as `set` and `dict`.
This process can be demonstrated using the below flowchart:
![Iteration in Flow of Code Python](images/0804a.png)
Let us go through some code examples to demonstrate how `for` statement can be used to iterate over sequences.
The `range` type represents an immutable sequence of numbers that is usually used in `for` loops to loop a certain number of times. A `range` object always takes the same (small) amount of memory, no matter the size of the range it represents, which is an advantage over a regular `list` or `tuple`.
**Syntax**: `range(stop)` or
`range(start, stop[, step])`
``` python
>>> range(10)
range(0, 10)
>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(1, 10, 2))
[1, 3, 5, 7, 9]
```
`range()` function is widely used in a `for` statement to control the number of iterations and provide the index value (`i`) of each iteration.
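As a brief sketch, `range()` can supply the index `i` for each iteration, or control the start, stop and step of the loop. The list `colors` and the variable names are illustrative.

```python
colors = ["Red", "Green", "Blue"]
labels = []
for i in range(len(colors)):
    # i takes the values 0, 1 and 2 in turn
    labels.append(str(i) + ": " + colors[i])
print(labels)  # ['0: Red', '1: Green', '2: Blue']

# Sum the odd numbers below 10 using start, stop and step
total = 0
for i in range(1, 10, 2):
    total += i
print(total)  # 25
```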
`while` statement repeatedly executes a code block as long as the test condition is satisfied.
Usually there is a statement at the end of the code block which updates the value of the variable being used in the test expression, so that the loop does not execute infinitely.
A flowchart of the process is provided below:
![while loop Iteration in Flow of Code](images/0805a.png)
For example, let us traverse a list and print the position(index) and value of each element until we reach the end of the list.
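A minimal sketch of such a traversal, assuming a small sample list (the names `l` and `visited` are illustrative):

```python
l = ["Hi", "Ed", "Punk"]
visited = []

i = 0
while i < len(l):
    print(i, l[i])
    visited.append((i, l[i]))
    i += 1  # update the loop variable so that the loop terminates

print(visited)  # [(0, 'Hi'), (1, 'Ed'), (2, 'Punk')]
```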
When a loop is present inside another loop, it is known as a nested loop.
For each iteration of the outer loop, the inner loop undergoes complete iteration. Thus, if the outer loop has to undergo `n` iterations and the inner loop has to undergo `m` iterations, the code block inside the inner loop executes `n x m` times.
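The count can be verified with a short sketch, using arbitrary values of `n = 3` and `m = 4`:

```python
n, m = 3, 4
count = 0
pairs = []

for i in range(n):        # outer loop: n iterations
    for j in range(m):    # inner loop: m iterations for each value of i
        count += 1
        pairs.append((i, j))

print(count)  # 12
```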
The backslash (`\`) character can be used in a string to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character.
| Escape Sequence | Meaning |
|--|--|
| `\\` | Backslash (`\`) |
| `\'` | Single quote (`'`) |
| `\"` | Double quote (`"`) |
| `\a` | ASCII Bell (BEL) |
| `\b` | ASCII Backspace (BS) |
| `\f` | ASCII Formfeed (FF) |
| `\n` | ASCII Linefeed (LF) |
| `\r` | ASCII Carriage Return (CR) |
| `\t` | ASCII Horizontal Tab (TAB) |
| `\v` | ASCII Vertical Tab (VT) |
Although `\'` and `\"` can be used to specify quote characters, Python allows embedding double quotes inside a single-quoted string (`'My name is "Python".'`) and single quotes inside a double-quoted string (`"Python's World"`).
In Python, a character in a string can be easily accessed using its index.
``` python
>>> s = "Hello"
>>> s[1]
'e'
```
Python also provides a way to access a substring from a string. This substring is known as a **slice** and it can be obtained using the slice operator `[n:m]` which returns the part of the string from the start index (`n`) to the end index (`m`), including the first but excluding the last.
``` python
>>> s = "Hello"
>>> s[1:3]
'el'
```
If the start index (`n`) is omitted, the default value of `n` is set as `0` which denotes the beginning of the string. If the end index (`m`) is omitted, the substring ends at the last character of the string.
``` python
>>> s = "Hello"
>>> s[:3]
'Hel'
>>> s[3:]
'lo'
>>> s[:]
'Hello'
```
Negative indexing is also supported in the slice operator.
``` python
>>> s = "Hello"
>>> s[-4:-2]
'el'
```
In the above example, `-4` is equivalent to `len(s) - 4 = 5 - 4 = 1` and `-2` is equivalent to `5 - 2 = 3`. Thus, `s[-4:-2]` is the same as `s[1:3]`.
The slice operator also allows the usage of a third index which is known as step as it allows a user to step over (skip) characters.
``` python
>>> s = "Hello"
>>> s[0:5:2]
'Hlo'
```
In the above example, the substring begins at the start of the string, takes a step size of `2` skipping `e` and ends at the last character again skipping the 4th character `l`.
Apart from the built-in function `len()` which returns the length of the string, String objects have access to several specialized functions (methods) that can:
`isspace()` returns `True` if there are only whitespace characters in the string. Some popular whitespace characters are ` ` (space), `\t` (tab), `\n` (newline), `\r` (carriage return), `\f` (form feed) and `\v` (vertical tab).
`istitle()` returns `True` if the string is title-cased, i.e., the first character of every word in the string is uppercased and the remaining characters are lowercased.
`partition(sep)` method splits the string when the separator (`sep`) is encountered for the first time, and returns a tuple with three items `(string before separator, separator, string after separator)`.
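A short sketch of `partition()`, using made-up sample strings, shows that only the first occurrence of the separator is used and what is returned when the separator is absent:

```python
before, sep, after = "yr=2019=new".partition("=")
print((before, sep, after))  # ('yr', '=', '2019=new')

# If the separator is not found, the whole string comes first,
# followed by two empty strings
missing = "Hello".partition("=")
print(missing)  # ('Hello', '', '')
```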
`split(sep=None, maxsplit=-1)` method splits a string into a list based on a string separator (`sep`).
If `sep` is not specified, it defaults to `None`, where whitespace is regarded as separator, and the string is stripped of all leading and trailing whitespaces after which it is split into words contained in the string.
``` python
>>> "Hi|Ed|Punk".split('|')
['Hi', 'Ed', 'Punk']
>>> "Hi Ed Punk".split()
['Hi', 'Ed', 'Punk']
>>> " Hi Ed Punk ".split()
['Hi', 'Ed', 'Punk']
```
If `maxsplit` is provided, at most `maxsplit` number of splits are performed and the list will contain a maximum of `maxsplit+1` elements.
`maxsplit` when not specified defaults to `-1`, which implies that there is no limit on the number of splits.
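The effect of `maxsplit` can be seen in the sketch below, which uses an arbitrary sample string:

```python
s = "a,b,c,d"

no_limit = s.split(",")               # no limit on the number of splits
two_splits = s.split(",", 2)          # at most 2 splits, so 3 elements
one_split = s.split(",", maxsplit=1)  # maxsplit passed as a keyword argument

print(no_limit)    # ['a', 'b', 'c', 'd']
print(two_splits)  # ['a', 'b', 'c,d']
print(one_split)   # ['a', 'b,c,d']
```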
`count(sub[, start[, end]])` returns the number of non-overlapping occurrences of a substring `sub` in the range `[start, end]`.
`start` and `end` are optional parameters and they default to `0` and `len(string)` respectively.
``` python
>>> s = "she sells sea shells"
>>> s.count("she")
2
>>> s.count("she", 5)
1
>>> s.count("she", 5, 10)
0
>>> s.count("she", 5, 17)
1
```
It has to be noted that the method counts non-overlapping occurrences, so it does not start a new matching process until the current substring matching is complete.
``` python
>>> s = "valhala alala"
>>> s.count("al")
4
>>> s.count("ala")
2
```
In the above example, `ala` is counted twice as the first occurrence is in `valh"ala"` and the next occurrence is in `"ala"la`. Although `ala` can be located again in `al"ala"`, it overlaps with the occurrence `"ala"la`, hence it is not counted.
`index(sub[, start[, end]])` is similar to `find(sub[, start[, end]])`, but instead of returning `-1` it raises `ValueError` when the substring is not found.
`rindex(sub[, start[, end]])` is similar to `rfind(sub[, start[, end]])`, but instead of returning `-1` it raises `ValueError` when the substring is not found.
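The difference between the two families of methods is sketched below, reusing the sample string from the `count()` example:

```python
s = "she sells sea shells"

print(s.find("sea"))    # 10
print(s.rfind("she"))   # 14
print(s.find("xyz"))    # -1, substring not found

# index() and rindex() raise ValueError instead of returning -1
try:
    s.index("xyz")
    found = True
except ValueError:
    found = False
print(found)  # False
```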
The most common and widely used collections in Python are **lists** which store an ordered group of objects (of any datatype) which might have some logical relation. This marks a considerable difference from arrays (in traditional languages) and makes Python an ideal language for handling real-life data which is not type-bound.
Let us create a list of attributes of a particular vehicle available for sale in a car dealership:
``` python
>>> l = ["BMW", "Z4", 2019,
... 4, "Red", True]
```
In this list:
- `"BMW"` is the make of the vehicle,
- `"Z4"` is the model of the vehicle,
- `2019` is the year when the vehicle was manufactured,
- `4` represents the number of wheels,
- `"Red"` is the color of the vehicle, and
- `True` tells us that the vehicle up for sale is brand new.
This method of creating a list from a collection of literals is known as **list display**.
Notice how this list contains items of multiple data types - `str`, `int` and `bool`.
Apart from the list display shown above, the built-in `list()` function can also be used to create new lists.
If no arguments are provided to the `list()` function, an empty list is created.
``` python
>>> l = list()
>>> l
[]
```
If a string, tuple or set is passed as an argument, the `list()` function converts it into a list.
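A brief sketch of these conversions, using made-up sample values:

```python
from_string = list("Hello")    # each character becomes an item
from_tuple = list((1, 2, 3))
from_set = list({"a"})         # for larger sets the item order is not guaranteed

print(from_string)  # ['H', 'e', 'l', 'l', 'o']
print(from_tuple)   # [1, 2, 3]
print(from_set)     # ['a']
```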
A subset of list `l` can be obtained using the list slice notation given as `l[i:j]`, where the item at start index `i` is included, but the item at end index `j` is excluded.
For example, the slice notation `[1:4]` refers to items from index `1` to index `3` (i.e. `4-1`).
``` python
>>> l = ["BMW", "Z4", 2019,
... 4, "Red", True]
>>> l[1:4]
['Z4', 2019, 4]
```
The slice notation `l[i:j:k]` can also include a third number known as the stepper. Here, a list is sliced from start index `i` to end index `j - 1` with a step of `k` items.
``` python
>>> l = ["BMW", "Z4", 2019,
... 4, "Red", True]
>>> l[1:4:2]
['Z4', 4]
```
Slice notations also have some useful defaults. `0` is the default for the first number and the size of the list is the default for the second number.
``` python
>>> l = ["BMW", "Z4", 2019,
... 4, "Red", True]
>>> l[2:]
[2019, 4, 'Red', True]
>>> l[:4]
['BMW', 'Z4', 2019, 4]
```
Slice notation also supports negative indexing.
``` python
>>> l = ["BMW", "Z4", 2019,
... 4, "Red", True]
>>> l[-4:]
[2019, 4, 'Red', True]
>>> l[:-2]
['BMW', 'Z4', 2019, 4]
>>> l[-4:-1]
[2019, 4, 'Red']
```
Slice notation can be used to replace multiple items in a list.
``` python
>>> l = ["BMW", "Z4", 2019,
... 4, "Red", True]
>>> l[:2] = ["Kia", "Sonet"]
>>> l
['Kia', 'Sonet', 2019, 4, 'Red', True]
>>> l = ["BMW", "Z4", 2019,
... 4, "Red", True]
>>> l[1:5:2] = ["Sonet", 2]
>>> l
['BMW', 'Sonet', 2019, 2, 'Red', True]
```
Slice notation can also be used to delete multiple items in a list.
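A minimal sketch of deletion through slices is given below, using the `del` statement or assignment of an empty list. The variable names `l1`, `l2` and `l3` are illustrative.

```python
l1 = ["BMW", "Z4", 2019, 4, "Red", True]
del l1[:2]            # delete the first two items
print(l1)             # [2019, 4, 'Red', True]

l2 = ["BMW", "Z4", 2019, 4, "Red", True]
del l2[1:5:2]         # delete the items at index 1 and 3
print(l2)             # ['BMW', 2019, 'Red', True]

l3 = ["BMW", "Z4", 2019, 4, "Red", True]
l3[2:4] = []          # assigning an empty list also deletes the slice
print(l3)             # ['BMW', 'Z4', 'Red', True]
```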
List Traversal is the process of visiting every item in a list, usually from the first item to the last item, and executing some instruction on the accessed item.
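A short sketch of list traversal with the `for` statement follows. The use of the built-in `enumerate()` to obtain the index alongside each item is one common option, not the only one.

```python
l = ["BMW", "Z4", 2019, 4, "Red", True]

# Visit each item directly
for item in l:
    print(item)

# Visit each item along with its index
indexed = []
for i, item in enumerate(l):
    indexed.append((i, item))

print(indexed[0])   # (0, 'BMW')
print(indexed[-1])  # (5, True)
```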
An item `x` can be located in a list using the `index(x[, i[, j]])` method which returns the first occurrence of the item at or after index `i` and before index `j`.
In case `i` and `j` are not specified they default to `i=0` and `j=len(l)`.
``` python
>>> l = [34, 4, 6, 23, 4]
>>> l.index(4)
1
>>> l.index(4, 3)
4
>>> l.index(6, 1, 4)
2
```
`count()` method can be used to count the occurrence(s) of an item in a list.
`reverse()` method can be used to reverse a list in-place.
``` python
>>> l = ["T", "C", 2, 4, "S"]
>>> l.reverse()
>>> l
['S', 4, 2, 'C', 'T']
```
If you do not wish to modify the existing list and instead want to create a new list with items in reverse order, use the built-in function `reversed()` nested in the built-in `list()`.
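For instance, reusing the list from the `reverse()` example (the name `new_l` is illustrative):

```python
l = ["T", "C", 2, 4, "S"]
new_l = list(reversed(l))  # reversed() returns an iterator, list() builds the list

print(new_l)  # ['S', 4, 2, 'C', 'T']
print(l)      # ['T', 'C', 2, 4, 'S'] (the original list is unchanged)
```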
Python lists have a built-in `sort()` method which sorts the items in-place using `<` comparisons between items.
The method also accepts 2 keyword arguments:
- `key` is used to specify a function which is called on each list element prior to making the comparisons.
- `reverse` is a boolean which specifies whether the list is to be sorted in descending order.
``` python
>>> l = [34, 4, 6, 23]
>>> l.sort()
>>> l
[4, 6, 23, 34]
>>> l = [34, 4, 6, 23]
>>> l.sort(reverse=True)
>>> l
[34, 23, 6, 4]
>>> l = ["Oh", "Hi", "Py", "ed"]
>>> l.sort()
>>> l
['Hi', 'Oh', 'Py', 'ed']
>>> l = ["Oh", "Hi", "Py", "ed"]
# lowercase the words before sorting
>>> l.sort(key=str.lower)
>>> l
['ed', 'Hi', 'Oh', 'Py']
```
If you do not wish to modify the existing list and instead want to create a new list with sorted items, use the built-in `sorted()` function which returns a new sorted list.
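A brief sketch of `sorted()`, which accepts the same `key` and `reverse` keyword arguments as `sort()`:

```python
l = [34, 4, 6, 23]
asc = sorted(l)
desc = sorted(l, reverse=True)
words = sorted(["Oh", "Hi", "Py", "ed"], key=str.lower)

print(asc)    # [4, 6, 23, 34]
print(desc)   # [34, 23, 6, 4]
print(words)  # ['ed', 'Hi', 'Oh', 'Py']
print(l)      # [34, 4, 6, 23] (the original list is unchanged)
```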
In Python, we can create an object (data) and bind its reference to a variable using the assignment operator (`=`).
As multiple collections or items in collections can point to the same mutable object, a copy is required so one can change one copy without changing the other.
Let us take an example:
``` python
>>> old_l = [1, 2, 3]
# Copying old list into a new list
>>> new_l = old_l
# Checking if both lists are
# pointing to the same object
>>> id(new_l)==id(old_l)
True
# Adding element to new list
>>> new_l.append(4)
>>> new_l
[1, 2, 3, 4]
>>> old_l
[1, 2, 3, 4]
```
It can be seen how the assignment operator **does not create a new copy** of the list.
The `copy()` method can be used to create a new `list` containing the items of the original list.
``` python
>>> old_l = [1, 2, 3]
# Copying old list into a new list
>>> new_l = old_l.copy()
# Checking if both lists are
# pointing to the same object
>>> id(new_l)==id(old_l)
False
# Adding element to new list
>>> new_l.append(4)
>>> new_l
[1, 2, 3, 4]
>>> old_l
[1, 2, 3]
```
Assigning a slice of the entire list (`old_l[:]`) to a new variable is also equivalent to creating a new copy.
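The slice-based copy behaves like `copy()`, as the following sketch with the same sample list suggests:

```python
old_l = [1, 2, 3]
new_l = old_l[:]            # a full slice creates a new list object

same_object = id(new_l) == id(old_l)
new_l.append(4)

print(same_object)  # False
print(new_l)        # [1, 2, 3, 4]
print(old_l)        # [1, 2, 3]
```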
List comprehension can be used to make a new list where each element is the result of some operations applied to each member of another sequence or iterable.
For example, to create a new list where each item is squared.
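A minimal sketch of such a comprehension, next to the equivalent `for` loop (the sample list `l` is arbitrary):

```python
l = [1, 2, 3, 4]

# List comprehension: square each item of l
squares = [x * x for x in l]

# Equivalent for loop
squares_loop = []
for x in l:
    squares_loop.append(x * x)

print(squares)       # [1, 4, 9, 16]
print(squares_loop)  # [1, 4, 9, 16]
```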
Tuples are like lists, with a difference - they are immutable. This means that once initialized a user cannot modify its value, which makes it a useful feature to ensure the sanctity of data and guarantee that it is not being modified by the program.
In case a tuple has only 1 item, it is known as a singleton.
It is a good practice to include a trailing comma to prevent the Python interpreter from treating it as a value inside regular parentheses as shown in the examples below.
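The effect of the trailing comma can be checked with `type()`, as in this sketch (the variable names are illustrative):

```python
t = ("Hi",)      # trailing comma: a tuple with one item
s = ("Hi")       # no comma: just a string inside parentheses
also_t = "Hi",   # the comma, not the parentheses, makes the tuple

print(type(t))       # <class 'tuple'>
print(type(s))       # <class 'str'>
print(type(also_t))  # <class 'tuple'>
```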
Just like `list`, `tuple` supports negative indexing, i.e., you can access the values of tuple from the end. Index of -1 denotes the last item in the tuple, -2 is the second last item and so forth.
A subset of tuple `t` can be obtained using the tuple slice notation given as `t[i:j]`, where the item at index `i` is included, but the item at index `j` is excluded.
For example, the slice notation `[1:4]` refers to items from index `1` to index `3` (i.e. `4-1`).
``` python
>>> t = ("BMW", "Z4", 2019,
... 4, "Red", True)
>>> t[1:4]
('Z4', 2019, 4)
```
The slice notation `t[i:j:k]` can also include a third number known as the step. Here, a tuple is sliced from start index `i` to end index `j - 1` with a step of `k` items.
``` python
>>> t = ("BMW", "Z4", 2019,
... 4, "Red", True)
>>> t[1:4:2]
('Z4', 4)
```
Slice notations also have some useful defaults.
`0` is the default for the first number and size of the tuple is the default for the second number.
Tuple Traversal is the process of visiting every item in a tuple, usually from the first item to the last item, and executing some instruction on the accessed item.
An item `x` can be located in a tuple using the `index(x[, i[, j]])` method which returns the first occurrence of the item at or after index `i` and before index `j`. In case `i` and `j` are not specified they default to `i=0` and `j=len(t)`.
As compared to a nested list, a nested tuple is useful in representing a dataset (such as a table fetched from a database) where it is important to ensure the sanctity of data as the code cannot modify it because tuples are immutable.
Just like lists, the `+=` operation works for a tuple and adds new item(s) to it.
``` python
>>> t = ("Hi", "Ed", "Punk")
>>> t += (1, 2)
>>> t
('Hi', 'Ed', 'Punk', 1, 2)
```
But tuples are immutable, right?
Then how are we able to modify one?
To understand it better, let us revisit what happens when we apply `+=` operator on a list.
``` python
>>> l = ["Hi", "Ed", "Punk"]
>>> id_l = id(l)
>>> l += [1, 2]
>>> l
['Hi', 'Ed', 'Punk', 1, 2]
>>> id(l) == id_l
True
```
In case of a list, the modified list `l` still points at the same object.
Now, let us add items to a tuple.
``` python
>>> t = ("Hi", "Ed", "Punk")
>>> id_t = id(t)
>>> t += (1, 2)
>>> t
('Hi', 'Ed', 'Punk', 1, 2)
>>> id(t) == id_t
False
```
In case of a tuple, the modified tuple is actually a completely new tuple with contents of the original tuple and the extension.
The original tuple is not modified as it is immutable. However, as `t` no longer points to the original tuple, that tuple can be freed from memory once nothing else refers to it.
Thus, it is recommended that the `append()` and `extend()` methods be employed instead of `+=` to add new items programmatically, as they raise an error in case the code is trying to modify a tuple.
``` python
>>> l = ["Hi", "Ed", "Punk"]
>>> l.extend([1, 2])
>>> l
['Hi', 'Ed', 'Punk', 1, 2]
>>> t = ("Hi", "Ed", "Punk")
>>> t.extend((1, 2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'extend'
```
Python provides a mapping type collection which contains keys, and values corresponding to those keys.
This collection is known as a `dictionary` (type `dict`), where each key is unique and can be used to easily store or retrieve values (of any data type, including `str`, `int`, `float` and `list`).
Dictionaries are indexed by keys (any immutable type - numbers, string, tuple) as compared to lists which are indexed by a range of numbers.
Keys and values can be passed as keyword arguments to the `dict()` function.
``` python
>>> d = dict(yr=20, name="Ed",
... is18=True)
>>> d
{'yr': 20, 'name': 'Ed', 'is18': True}
```
One of the limitations of this method is that the keys of the dictionary can only be strings, and they must be valid Python identifiers.
`dict.fromkeys(iterable[, value])` can be used to create new dictionary with items of an `iterable` as keys with an optional value (default - `None`) corresponding to the keys.
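A short sketch of `dict.fromkeys()` with and without the optional value (the keys are made up):

```python
d1 = dict.fromkeys(["yr", "name"])   # values default to None
d2 = dict.fromkeys("ab", 0)          # each character of the string becomes a key

print(d1)  # {'yr': None, 'name': None}
print(d2)  # {'a': 0, 'b': 0}
```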
Python has a rich and extensive **Standard Library** which gives it an edge over traditional programming languages.
**Python Standard Library** contains around 70 built-in functions (the exact number depends on the Python version) that are commonly used by programmers, which saves a lot of time as they can be directly used in a program.
We have already used some functions in the previous sections like `input()`, `print()`, `len()`, `sum()`, `min()`, `max()`, `list()`, `dict()`, etc.
Some of the widely used functions can be categorized as follows:
`min(sequence)` returns the minimum value of items in a `sequence` of type `list`, `tuple`, `range`, `str` or `set`.
Apart from iterables, `min(arg1, arg2, ..)` also accepts multiple arguments `arg1`, `arg2` .. that can be compared with each other (for example, numbers) and returns the smallest among these arguments.
`max(sequence)` returns the maximum value of items in a `sequence` of type `list`, `tuple`, `range`, `str` or `set`.
Apart from iterables, `max(arg1, arg2, ..)` also accepts multiple arguments `arg1`, `arg2` .. that can be compared with each other (for example, numbers) and returns the largest among these arguments.
`divmod(a, b)` returns a tuple `(a // b, a % b)` consisting of the quotient and remainder when `a` (`int`, `float`) is divided by `b` (`int`, `float`).
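A few sample calls, with arbitrary operands, illustrate the returned tuple:

```python
print(divmod(7, 2))     # (3, 1)
print(divmod(-7, 2))    # (-4, 1), the quotient is rounded towards negative infinity
print(divmod(7.5, 2))   # (3.0, 1.5)

# The result matches the (a // b, a % b) pair
q, r = divmod(17, 5)
print(q == 17 // 5 and r == 17 % 5)  # True
```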
Apart from built-in functions, the **Python Standard Library** also contains a wide range of built-in modules which are a group of functions organized based on functionality.
Some of the commonly used modules are mentioned below:
A module can be accessed in a program using the `import` statement.
``` python
>>> import math
```
The above `import` statement loads all the functions available in the `math` module. To access any function in the module, simply type the module name followed by a period (`.`), followed by the function name.
``` python
>>> import math
>>> math.pow(3, 2)
9.0
```
Instead of loading all the functions in a module, the `from` statement can be used to access only specified functions.
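A minimal sketch of the `from` form is shown below. The choice of `pow` and `sqrt` is arbitrary. Note that importing `pow` this way hides the built-in `pow()` under the same name.

```python
from math import pow, sqrt

# The imported functions are used directly, without the module name
print(pow(3, 2))  # 9.0
print(sqrt(16))   # 4.0
```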
`fmod(x, y)` returns the value of the expression `x - n*y` such that the result has the same sign as `x` and magnitude less than `|y|` for some integer `n`.
This function should be preferred when working with floating point numbers as compared to `x % y` that should be used when working with integers.
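The difference in sign handling between `fmod()` and `%` can be seen in this sketch with arbitrary operands:

```python
import math

# fmod() keeps the sign of x, while % keeps the sign of y
print(math.fmod(-7, 3))  # -1.0
print(-7 % 3)            # 2

print(math.fmod(7, -3))  # 1.0
print(7 % -3)            # -2
```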
`randrange(stop)` is used to randomly select an integer from a range `0` to `stop` (excluding).
``` python
>>> import random
>>> random.randrange(10)
5
```
`randrange(start, stop)` is used to randomly select an integer from a range `start` to `stop` (excluding).
``` python
>>> import random
>>> random.randrange(5, 10)
8
```
`randint(a, b)` is an alias for `randrange(a, b+1)`. It provides an interface to generate a random integer `N` such that `a <= N <= b` (includes boundaries).
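As a sketch, simulating rolls of a six-sided die shows that both boundaries are valid results. The fixed seed is an assumption added only to make repeated runs produce the same numbers.

```python
import random

random.seed(0)  # fixed seed, used only for repeatable runs

# Simulate 1000 rolls of a six-sided die
draws = [random.randint(1, 6) for _ in range(1000)]

# Every value lies between 1 and 6, boundaries included
in_range = all(1 <= n <= 6 for n in draws)
print(in_range)  # True
```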
So far we have written programs which accept input from the user via keyboard and display the generated output on the standard output (console).
This activity has the following disadvantages:
- Feasible for small inputs, but cumbersome in case of large inputs.
- The entire input has to be entered every time the program is executed.
- The output generated is not saved for sharing or later use.
Just like words can be written on a piece of paper, information can be stored on a computer disk as a named location or a file.
Python comes with the in-built ability to interact with files. Through file handling, Python programs can read a file to provide input data and output the results into a file for later use.
A text file is the simplest way to store information in a human readable text format (sequence of ASCII or Unicode characters), which includes numeric values.
It is easy to view and modify the contents of a text file using any text editor like Notepad or even IDLE.
Even a Python script is stored as a text file, identified by the extension `.py`. `.txt` is the most popular extension used for a generic text file.
Although text files do not have any structure, there are international standards defining some rules for creating specialized text files like:
- `.csv`, where each row is a set of comma separated values. This is the most popular data exchange format.
- `.tsv`, similar to CSV but the values are separated by tabs instead of commas.
- `.xml` and `.json` are popular web data exchange formats.
- `.html` files are text files with contents written in the Hypertext Markup Language designed to display contents in a web browser.
Although the contents of a text file are human readable, in reality this information is stored on the disk in the form of machine readable bytes (1s and 0s), one character at a time. This makes the format simple to be operated on by any application, but it is less efficient and consumes more storage (greater file size).
In a binary file, the contents are also stored in form of bytes, but these bytes do not directly translate into ASCII or Unicode characters.
Rather these bytes can represent anything like:
- Complex Data Structures
- Image
- Audio
- Video
and more.
As there is no simple rule to determine the basic unit of information, opening a binary file in a text editor will display garbage values, and even a single bit of change can corrupt the entire file and make it unreadable. Hence, specialized software is required to read and write binary files.
Python has a built-in `pickle` module which has implemented protocols to read and write binary files having the `.dat` or `.pickle` file extension. `pickle` is a Python specific binary file format which can serialize any Python data structure (lists, dictionaries, etc.) or code object into a binary file. The byte content of this file can then be de-serialized and used later on any computer running Python.
`file_name` is a required string parameter specifying the path of the file to be opened.
It can either be an absolute path or a relative path as shown below:
- `'fname.txt'` is the relative path of a text file residing in the current working directory from where the Python script is being executed.
- `'../fname.txt'` is the relative path of a text file in the parent of the current working directory from where the Python script is being executed.
- `'/Users/edpunk/Documents/fname.txt'` is the absolute path of a text file which can be opened by the Python script from any location as long as it is in the same system.
`mode` is an optional string parameter which specifies the mode in which the file has to be opened. It defaults to `'r'` which means open for reading in text mode.
| Mode | Description |
|--|--|
| `'r'` | Opens the file for reading (default). |
| `'t'` | Opens the file in text mode (default). |
| `'w'` | Opens the file for writing, truncating (emptying) the file if it already exists. |
| `'x'` | Same as `'w'`, but it fails if the file already exists. |
| `'a'` | Opens the file for writing, where any new data is added at the end. It creates a new file if the file does not exist. |
| `'b'` | Opens the file in binary mode. |
| `'rb'` | Opens the file in binary and read-only mode. |
| `'wb'` | Opens the file for writing in binary mode, truncating (emptying) the file if it already exists. |
| `'+'` | Allows both read and write operations on a file. |
| `'r+'` | Opens the file in read and write mode. It throws an error in case the file does not exist. If the file already exists, new data overwrites the existing data unless the position of the stream is first moved to the end of the file. |
| `'r+b'` | Opens the file in binary read and write mode. It does not truncate the file if it already exists. |
| `'w+'` | Opens the file in read and write mode. It creates a new file or truncates the contents of the file if it already exists. |
| `'w+b'` | Opens the file in binary read and write mode. It creates a new file or truncates the contents of the file if it already exists. |
| `'a+'` | Opens the file in read and append mode. It creates a new file if it does not exist. If the file already exists, new data is automatically added at the end of the file after existing data. |
| `'a+b'` | Opens the file in binary read and append mode. |
Since `readline()` returns one row at a time, it can be used in a `while` statement to iterate over the data row-wise. Once it reaches the end of file it returns an empty string.
`for` statement can also be used to traverse the file row-wise without any requirement of the `readline()` method. Simply iterating over the file object returns the data one row at a time.
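Both approaches are sketched below. The sample file and its contents are made up for this sketch and written to a temporary directory, and the `with` statement is used so that each file is closed automatically.

```python
import os
import tempfile

# Create a small sample file (hypothetical contents)
path = os.path.join(tempfile.mkdtemp(), "info.txt")
with open(path, "w") as f:
    f.write("Hello\nPython\nWorld\n")

# Row-wise traversal using readline() in a while statement
rows_while = []
with open(path) as f:
    line = f.readline()
    while line != "":              # an empty string marks the end of file
        rows_while.append(line.rstrip("\n"))
        line = f.readline()

# Row-wise traversal by iterating over the file object
rows_for = []
with open(path) as f:
    for line in f:
        rows_for.append(line.rstrip("\n"))

print(rows_while)  # ['Hello', 'Python', 'World']
print(rows_for)    # ['Hello', 'Python', 'World']
```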
`seek(offset, reference=0)` can be used to move to a location in the file object which is `offset` bytes from the provided `reference` (Python's documentation calls this second parameter `whence`).
The default value for `reference` is `0` which stands for the beginning of the file. For this default `reference`, the `offset` has to be a whole number (`>=0`).
Other allowed values of `reference` are:
- `1`, which denotes that the offset will be calculated from the current position of the file object (`offset` can be positive or negative)
- `2`, which denotes that the offset is calculated from the end of the file (`offset` is usually negative or zero)
**Note**: In text files (those opened without a `b` in the mode string), only seeks relative to the beginning of the file (`reference = 0`) are allowed, the exception being a seek to the very end of the file using `seek(0, 2)`. Other uses of `reference = 1 or 2` are only valid when the file is opened in binary mode.
Let us consider the same file `info.txt` for the below example:
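A minimal sketch of `seek()` with all three reference values follows. The contents of `info.txt` used here are an assumption: the sketch writes its own sample file into a temporary directory and opens it in binary mode so that every `reference` value is allowed.

```python
import os
import tempfile

# Write a hypothetical info.txt containing 12 bytes
path = os.path.join(tempfile.mkdtemp(), "info.txt")
with open(path, "wb") as f:
    f.write(b"Hello Python")

with open(path, "rb") as f:
    f.seek(6)            # 6 bytes from the beginning of the file
    first = f.read(2)    # position is now 8
    f.seek(2, 1)         # 2 bytes ahead of the current position
    second = f.read()    # read up to the end of the file
    f.seek(-6, 2)        # 6 bytes before the end of the file
    third = f.read()

print(first)   # b'Py'
print(second)  # b'on'
print(third)   # b'Python'
```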
In Python, the protocols to read and write binary files (`.dat` or `.pickle` file extension) have been implemented in the built-in `pickle` module.
`pickle` is a Python specific binary file format which can not only be used to store binary data, but also store any Python object.
This process of translating data structures (lists, dictionaries, etc.) and code objects (classes, functions, etc.) into bytes that can be stored in a binary file is known as **Serialization**. This binary file can be stored on disk or shared and it can be de-serialized and used later via Python.
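Serialization and de-serialization can be sketched with `pickle.dump()` and `pickle.load()`. The dictionary `data` and the file name `student.pickle` are made up for illustration, and the file is written to a temporary directory.

```python
import os
import pickle
import tempfile

data = {"name": "Ed", "marks": [90, 85], "is18": True}
path = os.path.join(tempfile.mkdtemp(), "student.pickle")

# Serialize: write the object as bytes into a binary file
with open(path, "wb") as f:
    pickle.dump(data, f)

# De-serialize: read the bytes back into a Python object
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored)          # {'name': 'Ed', 'marks': [90, 85], 'is18': True}
print(restored == data)  # True
```

As a general precaution, only files from trusted sources should be de-serialized with `pickle`.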
The comma-separated values (CSV) file format is one of the most common data serialization and exchange formats, where each row is a set of values separated by a delimiter.
The main rules for creating a CSV file are:
- Each field is separated by a delimiter. Comma (`,`) is the default delimiter used in CSV files, but other delimiters (`;`, `\t`, `|`, etc.) can also be used.
- If any field value contains a delimiter, it must be surrounded by a pair of quote characters (usually double-quotes character `"` ).
- There is an optional single header line which contains the names of the fields.
An example CSV file containing the marks data is given below:
To demonstrate why quote characters are required, let us have a look at the below contents of a CSV file which contains the average marks of some subjects (separated by comma) for each student.
`csv.reader(csvfile, delimiter=',', quotechar='"')` function is used to return the reader object which iterates over CSV file line-by-line and returns it as a list of strings.
The `csvfile` file object should be opened with `newline=''` argument as the `csv` module has its own newline handling which correctly interprets the newlines depending on platform or in case they are embedded inside quoted fields.
`csv.writer(csvfile, delimiter=',', quotechar='"')` function returns a writer object which converts data in the form of a list into delimited strings which are passed down to the file object for writing.
All non-string data in the list is automatically converted into strings before it is written.
The methods `writerow(row)` or `writerows(rows)` can be used to write a row (list of strings) or list of rows to a CSV file.
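Writing and reading can be combined in one sketch. The rows and the file name `marks.csv` are hypothetical, and the file is written to a temporary directory. One field contains a comma, so the writer surrounds it with quote characters.

```python
import csv
import os
import tempfile

rows = [["Name", "Marks"], ["Ed", "78,85"], ["Punk", 92]]
path = os.path.join(tempfile.mkdtemp(), "marks.csv")

# Write all rows; the field "78,85" is quoted because it contains the delimiter
with open(path, "w", newline="") as f:
    writer = csv.writer(f, delimiter=",", quotechar='"')
    writer.writerows(rows)

# Read the rows back; every value is returned as a string
with open(path, newline="") as f:
    reader = csv.reader(f, delimiter=",", quotechar='"')
    read_rows = list(reader)

print(read_rows)
# [['Name', 'Marks'], ['Ed', '78,85'], ['Punk', '92']]
```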
The built-in functions and modules of the Python Standard Library add tremendous value by performing generic operations and can be utilised by developers for any task.
But, at times a programmer needs to perform a set of operations for a specific task multiple times. In such cases instead of rewriting the statements again and again, the block of code can be wrapped into a function which can be called anytime in the scope of the program.
**User-defined functions** make the program more:
- **Organized**, as a block of code is now replaced by a single function call which performs a specific task.
- **Manageable**, due to reduction of code length.
- **Reusable**, as the user defined function can be used anywhere in the current program or it can also be imported in another program.
The function header begins with the keyword `def`.
The function name follows the `def` keyword. As it is an identifier, the same naming rules apply to it. `adder` is the name of the function in the above example.
The function name is followed by a pair of parentheses `( )`.
In case any parameters are required by the function, they are enclosed inside the parentheses. `f, s, t = None` are the three parameters of the function.
The body of the function consists of one or more Python statements which have the same amount of indentation (4 spaces) from the function header.
It is a good practice to include the documentation string at the beginning of the function body that briefly explains how it works. This docstring can either be a single-line or a multiline string literal. In the above example the docstring is:
``` python
"""
Returns the sum of f, s and t.
If t is not provided,
return the sum of f and s.
"""
```
This docstring is followed by a series of statements which represent the set of instructions to be executed. The set of instructions in the above example are:
``` python
s = f + s
if t:
    s += t
```
The code block finally ends with a `return` statement which returns one or more values.
``` python
return s
```
In the above example the value of `s` is returned which is the sum.
A missing `return` statement, or a `return` statement with no value, means that the function returns `None`. Such functions are known as void functions; they are typically called for their side effects, such as displaying a result via the `print()` function.
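A short sketch of this behaviour follows. The function names `greet` and `do_nothing` are hypothetical and chosen only for illustration.

```python
def greet(name):
    """Print a greeting; there is no return statement."""
    print("Hello,", name)


def do_nothing():
    """Use a bare return, which returns no value."""
    return


result1 = greet("Asha")   # prints: Hello, Asha
result2 = do_nothing()

# Both calls evaluate to None.
print(result1, result2)   # None None
```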
After defining the above function, let us now invoke or call the function:
``` python
fst = 20
snd = 10
trd = 10
sm1 = adder(fst, snd, trd)
sm2 = adder(fst, snd)
```
`f`, `s` and `t` are known as positional parameters as they have a defined position in the function definition.
A default value can also be assigned to a parameter using `=`.
In the above example `t` has a default value of `None`.
Arguments are the values passed to a function (or method) when it is called. In the above example `fst`, `snd` and `trd` are the arguments. Since `t` has a default value, the function can be invoked with or without the `trd` argument as shown for `sm2`.
Argument values are assigned to the corresponding function parameters that are available as local variables inside the function.
Thus, the value of `fst` is assigned to `f`, `snd` to `s` and `trd` to `t`. In case there is no third argument, `t` keeps the default value `None`.
After executing the script, `sm1` holds `40` (20 + 10 + 10) and `sm2` holds `30` (20 + 10).
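For reference, the definition and the two calls can be combined into one runnable script. The `adder` body is assembled from the pieces described above; only the `print()` calls are added here.

```python
def adder(f, s, t=None):
    """
    Returns the sum of f, s and t.
    If t is not provided,
    return the sum of f and s.
    """
    s = f + s
    if t:
        s += t
    return s


fst = 20
snd = 10
trd = 10

sm1 = adder(fst, snd, trd)  # all three arguments supplied
sm2 = adder(fst, snd)       # t falls back to its default, None

print(sm1)  # 40
print(sm2)  # 30
```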
Variables defined outside any function or code block are known as global variables.
They are often used to specify mathematical constants, file path or other such values and can be accessed anywhere in the source code (by functions or code blocks).
The example below demonstrates how the global variable `n` can be accessed by all the functions.
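A minimal sketch of such access, assuming two hypothetical functions `square` and `cube` that only read the global variable `n`:

```python
n = 10  # global variable, defined outside any function


def square():
    # n is not defined locally, so the global n is used
    return n * n


def cube():
    return n * n * n


print(square())  # 100
print(cube())    # 1000
```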
In case a variable is defined inside a function with the same name as that of a global variable, then the variable is considered as a local variable and all references made to the variable point to this local variable.
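This shadowing can be sketched as follows; the function name `show_local` is hypothetical.

```python
n = 10  # global variable


def show_local():
    n = 5  # creates a new local variable; the global n is untouched
    return n


local_value = show_local()

print(local_value)  # 5
print(n)            # 10
```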
Any change made to a global variable inside a code block or a function persists for the rest of the session, which affects every code block or function that accesses it.
To modify the value of a global variable from inside a function, declare it with the `global` keyword, as shown in the example below.
When a mutable object (`list`, `dict`) is passed as an argument to a function, any in-place modification made through the corresponding parameter in the body of the function also modifies the original object.
Hence, care should be taken while passing mutable objects.
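The effect can be sketched with two hypothetical functions: `add_item` modifies the list it receives, whereas `add_item_safely` works on a copy.

```python
def add_item(items, item):
    items.append(item)  # modifies the caller's list in place
    return items


def add_item_safely(items, item):
    copied = list(items)  # shallow copy; the original stays unchanged
    copied.append(item)
    return copied


marks = [70, 80]

add_item(marks, 90)
print(marks)  # [70, 80, 90]

safe = add_item_safely(marks, 100)
print(marks)  # [70, 80, 90]
print(safe)   # [70, 80, 90, 100]
```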
Once a function is defined in the Python interpreter, it can be called any number of times. But, these function definitions are lost upon exiting the interpreter.
To solve this problem, we can create a Python script with the function definitions at the beginning of the file, followed by the rest of the code which includes statements invoking the defined functions.
But this process is tedious and not manageable, as what makes user-defined functions powerful is that the programmer can **write once, and use many times**.
Instead of repeating the function definition again and again for each new program, one can put all the function definitions in a file from which the required function can be imported and invoked either in script mode or interactive mode.
This file (`.py` extension) is known as a **module** and it is the most basic form of reusable code accessible by other programs.
Let us create a new file `basics.py` containing the following functions:
``` python
def adder(f, s, t = None):
    """
    Returns the sum of f, s and t.
    If t is not provided,
    return the sum of f and s.
    """
    s = f + s
    if t:
        s += t
    return s

def tripler(a):
    """
    Multiplies a by 3 and
    returns it
    """
    result = 3*a
    return result
```
After saving the `basics.py` file, reopen IDLE and create a new file `test.py` in the same directory as `basics.py`.
The name of the file (without the `.py` extension) is the module name, which is also available as the value of the global variable `__name__` in the module.
Import the functions of the `basics` module in `test.py` by executing the following statement.
``` python
import basics
```
The above `import` statement loads all the functions available in the `basics` module. To access any function in the module, simply type the module name followed by a period (`.`), followed by the function name.
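The same `module.function` notation applies to any imported module. The sketch below uses the standard library's `math` module, chosen here only so that it runs without `basics.py`; with `basics.py` in the same directory, `basics.adder(20, 10)` would be called in the same way.

```python
import math

# Access a function through the module name, a period and the function name.
root = math.sqrt(16)
print(root)  # 4.0

# The module name is available as the module's __name__ attribute.
print(math.__name__)  # math
```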
Importing everything from a module with `*` (for example, `from basics import *`) is not recommended, as it can clutter the namespace and cause issues if there are conflicts between the identifiers defined by the programmer and those defined in the module/package.
Apart from containing definitions, a module can also contain a block of code which is executed only when the file is run directly as a stand-alone script. The block has to be enclosed in an `if` statement as shown below:
``` python
if __name__ == '__main__':
    ...code to be executed...
```
Using the above pattern, a file can be imported or executed directly.
Let us consider an example module (save it as `multi.py`) which multiplies two numbers:
``` python
def multiply(a, b):
    return a * b

f = int(input("Enter a: "))
s = int(input("Enter b: "))
print(multiply(f, s))
```
Now, when we try to load all the functions from the module, it automatically executes the `input()` statements and prints the output.
``` python
>>> from multi import *
Enter a: 4
Enter b: 5
20
>>>
```
Let us modify the code:
``` python
def multiply(a, b):
    return a * b

if __name__ == '__main__':
    f = int(input("Enter a: "))
    s = int(input("Enter b: "))
    print(multiply(f, s))
```
Now, the block will execute only if the script is executed directly and not when the file is imported as a module.
A collection of modules which can work together to provide a common functionality is known as a **package**.
These modules are present in a folder along with the `__init__.py` file which tells Python that this folder is a package.
A package can also contain subfolders (sub-packages), each containing their respective `__init__.py` files.
Let us take the example of a package called `restaurant` which consists of various modules to perform activities such as order booking, reservation, staff attendance, etc.
Here is a possible structure for the package:
```
restaurant/
    __init__.py
    orders.py
    reservation.py
    employee.py
    inventory.py
```
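To see how a module inside such a package is imported, the sketch below builds a tiny `restaurant` package in a temporary directory and imports `orders` from it. The function `book` and its contents are hypothetical, as the text does not specify what the modules contain; in practice the files would be created by hand rather than from code.

```python
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create restaurant/__init__.py and restaurant/orders.py
    pkg = Path(tmp) / "restaurant"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "orders.py").write_text(
        "def book(item):\n"
        "    return 'Booked: ' + item\n"
    )

    # Make the parent folder of the package importable.
    sys.path.insert(0, tmp)
    try:
        from restaurant import orders  # package.module import
        message = orders.book("soup")
    finally:
        sys.path.remove(tmp)

print(message)  # Booked: soup
```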
A package is simply the directory containing sub-packages and modules, but when this package or a collection of packages is made available for others to use (e.g. via PyPI), it is known as a **library**.
For example, `restaurant` can be called a library if it provides reusable code to manage a restaurant and is built using multiple packages which handle the various aspects of a restaurant like human resource management, inventory management, order fulfilment and billing, etc.