Introduction
File handling is a core task in programming: it lets a program store data between runs and exchange data with other programs.
Objectives
In this article, you will learn:
how to open a file
how to close a file after using it
how to read the content of a file
how to write or append text content to a file
how to work with JSON files
Prerequisites
To follow along with this tutorial, you need to:
have Python installed on your computer
have a basic knowledge of Python
know how to run Python on the command line
know how to run basic commands on the command line
Scope
Our focus in this article will be on file operations in text mode.
How to open and close a file
You use the open() function to open a file. open() is a built-in Python function that returns a file object. It takes up to eight arguments, but only three are relevant to our current discussion.
open(filename, mode='r', encoding=None)
filename is a string that matches the name of the file you want to open. In read mode, a FileNotFoundError is raised if the file does not exist.
mode, another string, describes how you want to use the file. It is an optional argument with a default value of r, signifying that you want to read the content of the file. Other values mode can take include w for write and a for append. The r mode places the 'cursor' at the beginning of the file. You can omit the mode argument if you just want to read a file.
encoding describes the character encoding used to read or write the file. encoding='utf-8' sets the encoding to UTF-8, which is the recommended standard. If you omit the encoding argument, the default is platform-dependent.
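As a minimal sketch of these three arguments, the snippet below works in a temporary directory so that it does not touch your own files. The directory and the variable names are assumptions made for illustration; only open() and its arguments come from the discussion above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical path: a file named hello inside a throwaway directory.
    path = os.path.join(tmp_dir, "hello")

    # The file does not exist yet, so opening it for reading fails.
    try:
        open(path, encoding="utf-8")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

    # Create an empty file, then reopen it without a mode argument.
    open(path, mode="w", encoding="utf-8").close()
    file = open(path, encoding="utf-8")
    default_mode = file.mode  # the mode the file object was opened with
    file.close()

print(missing_raised)  # True
print(default_mode)    # r
```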
Let's say you have a file named hello. You can open the file for reading this way:
file = open('hello', mode='r', encoding='utf-8')
To read the entire content of the file, you use the read() method of the file object (more on this method in the How to read a file section): file.read().
You must close the file after using it: file.close().
Because reading may fail (for example, if the file content cannot be decoded), leading to some exception being raised, it is safer to use a try...except...finally block to handle file operations:
file = open('hello', mode='r', encoding='utf-8')
try:
    file.read()
except Exception:
    print("File can't be read")
finally:
    # Runs whether or not an exception was raised
    file.close()
The with statement
The with statement ensures that a file closes properly after it has been used, even if some exception is raised. It makes the try...except...finally statement unnecessary and accomplishes the same thing in fewer lines of code.
Apart from printing the error message, the above try...except...finally block is equivalent to:
with open('hello', mode='r', encoding='utf-8') as file:
    file.read()
You can check if the file indeed closed automatically using the closed attribute of the file object:
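A minimal sketch of that check follows. It assumes a throwaway file with placeholder content, created in a temporary directory so the example is self-contained; the variable names are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "hello")  # hypothetical file
    with open(path, mode="w", encoding="utf-8") as file:
        file.write("placeholder text\n")

    with open(path, mode="r", encoding="utf-8") as file:
        file.read()
        closed_inside = file.closed  # still open inside the with block

    closed_after = file.closed       # closed as soon as the block exits

print(closed_inside)  # False
print(closed_after)   # True
```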
How to read a file
You use the read(), readline() and readlines() methods of a file object to read the file content.
The read() method
The read() method accepts an optional numerical argument that describes the number of characters you want to read. If this argument is greater than the number of characters in the file, only the characters in the file are read. The entire file content is read if you omit this argument or if it is negative.
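The sketch below illustrates these rules on a small file whose content ("abcdefghij") is made up for the example. It also shows that each call to read() continues from where the previous one stopped.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file
    with open(path, mode="w", encoding="utf-8") as file:
        file.write("abcdefghij")  # 10 characters

    with open(path, encoding="utf-8") as file:
        first_four = file.read(4)    # reads 'abcd'
        next_chunk = file.read(100)  # asks for more than is left: 'efghij'
        at_end = file.read()         # nothing left to read: ''

    with open(path, encoding="utf-8") as file:
        everything = file.read(-1)   # a negative size reads the whole file

print(repr(first_four), repr(next_chunk), repr(at_end), repr(everything))
```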
Let's say you have a text file named hello with the following content:
You may read the first 25 characters of this file using the following code:
with open('hello', encoding='utf-8') as file:
    content = file.read(25)
    print(content)
Remember that the mode argument defaults to 'r' if it is not explicitly set. This is the output when you run the script:
The entire file content is read when no argument is passed to the read() method:
with open('hello', encoding='utf-8') as file:
    content = file.read()
    print(content)
This is the output:
The readline() method
The readline() method reads file content one line at a time.
The newline character \n at the end of each line describes the corresponding line break in the file content. The last output, which is just a newline character \n, is for line 3 (a blank line) in the hello file.
readline() returns an empty string when it has read all the lines of the file.
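To make this concrete, here is a small sketch that calls readline() repeatedly. The file content, including its blank third line, is invented for the example and is not the content of the hello file used above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lines.txt")  # hypothetical file
    with open(path, mode="w", encoding="utf-8") as file:
        file.write("first line\nsecond line\n\nlast line\n")

    with open(path, encoding="utf-8") as file:
        # Five calls for a four-line file: the fifth call hits the end.
        lines = [file.readline() for _ in range(5)]

print(lines)
# ['first line\n', 'second line\n', '\n', 'last line\n', '']
```

The blank line comes back as '\n', while the end of the file comes back as '', which is how you can tell the two apart.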
A more convenient way of reading the lines of a file is by looping over the file object.
with open('hello', encoding='utf-8') as file:
    for line in file:
        print(line, end='')
This is the output:
The readlines() method
The readlines() method returns a list of all the lines in a file.
with open('hello', encoding='utf-8') as file:
    lines = file.readlines()
    print(lines)
This is the output:
file.readlines() is equivalent to list(file).
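A quick way to convince yourself of this equivalence is to read the same file both ways and compare the results. The file name and content below are assumptions made for the sketch.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "colors.txt")  # hypothetical file
    with open(path, mode="w", encoding="utf-8") as file:
        file.write("red\ngreen\nblue\n")

    with open(path, encoding="utf-8") as file:
        from_readlines = file.readlines()

    with open(path, encoding="utf-8") as file:
        from_list = list(file)

print(from_readlines)               # ['red\n', 'green\n', 'blue\n']
print(from_readlines == from_list)  # True
```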
How to write to a file
To write to a file, open it in the w (write) or a (append) mode. The file is created if it does not exist. If the file already exists, opening in w mode deletes the file content. Opening in a mode (for an existing file) places the cursor at the end of the file. The cursor position in the w mode is irrelevant since the file is empty.
The write() method
The write() method of the file object is used to write (or append) to a file. The content you want to write must be a string passed to this method. In w mode, this string becomes the new content of the file; in a mode, the string is appended to the existing file content (if the file already exists). The write() method returns the number of characters written to the file.
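The difference between the two modes can be sketched as follows, using a temporary file and made-up strings. The variable names are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file

    with open(path, "w", encoding="utf-8") as file:
        first_count = file.write("one")    # creates the file, returns 3

    with open(path, "a", encoding="utf-8") as file:
        second_count = file.write(" two")  # appends, returns 4

    with open(path, encoding="utf-8") as file:
        after_append = file.read()

    with open(path, "w", encoding="utf-8") as file:
        file.write("three")                # w mode discards the old content

    with open(path, encoding="utf-8") as file:
        after_rewrite = file.read()

print(first_count, second_count)  # 3 4
print(after_append)               # one two
print(after_rewrite)              # three
```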
Let’s say you want to write to a file named textfile that doesn’t exist yet and print the number of characters written.
Both w and a modes will achieve this (because the file doesn’t exist yet), but let’s use w.
with open("textfile", "w", encoding="utf-8") as file:
    num_chars = file.write("But we cannot simply sit and stare at our wounds forever.")
    print(num_chars)
The script prints 57, the number of characters in the string.
Let’s check if the file was created and the string written to it:
You can open the same file in append mode if you want to add to the content.
with open("textfile", "a", encoding="utf-8") as file:
    num_chars = file.write(" - Haruki Murakami")
    print(num_chars)
Let’s see the current content of the file.
File operations in combined modes
You can open files for both reading and writing by combining modes thus: r+ (read and write), w+ (write and read) and a+ (append and read).
r+ mode
r+ will raise an exception if the file doesn’t exist. It places the cursor at the beginning of the file. Writing to the file overwrites as many characters from the beginning of the file as you write, replacing them with the string passed to the write() method. The cursor is now placed just after the last character of the written string. A read() operation begins from this new cursor position and continues until the end of the file (if you pass no argument to read()).
Let’s demonstrate this using the following code:
with open("textfile", "r+", encoding="utf-8") as file:
    file.write("'The things that we love tell us who we are.")
    print(file.read())
Below is the content of textfile before and after running the script, with the output of the script in between:
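The same overwrite-then-read behavior can be reproduced in a self-contained sketch. The file content ("abcdefgh") and the written string ("XYZ") are invented here to keep the effect easy to see; they are not the content of textfile.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file
    with open(path, "w", encoding="utf-8") as file:
        file.write("abcdefgh")

    with open(path, "r+", encoding="utf-8") as file:
        file.write("XYZ")   # overwrites 'abc' at the start of the file
        rest = file.read()  # reads from just after 'XYZ' to the end

    with open(path, encoding="utf-8") as file:
        final_content = file.read()

print(rest)           # defgh
print(final_content)  # XYZdefgh
```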
readline() and readlines() have a very interesting behavior in the r+ mode. Let’s demonstrate this for readline() using the code below:
with open("textfile", "r+", encoding="utf-8") as file:
    file.write("The things that we love tell us who we are.")
    print(file.readline())
The textfile content before and after running the script, as well as the output of the script, is shown below:
For all intents and purposes, readline() executes before write(), placing the cursor just after the last character in textfile before the write operation. write() now begins from this new cursor position, appending its string argument to the end of the file. Replacing readline() with readlines() will return the same output but as a list.
Check out this discussion on Stack Overflow for more on this behavior of readline() and readlines().
w+ and a+ modes
Calling open() in w+ or a+ mode creates the file if it doesn’t exist. If the file already exists, opening in w+ mode deletes its content, while opening in a+ mode places the cursor at the end of the file. In the w+ mode, calling write() on the file object writes a new string (the argument of write()) to the file; in the a+ mode, write() appends its string argument to the existing file content. In both modes, the cursor is placed at the end of the file after the write operation. Because the cursor is now at the end of the file, calling read() or readline() returns an empty string, while readlines() returns an empty list.
Let’s demonstrate these concepts using the w+ mode and the read() method.
with open("textfile", "w+", encoding="utf-8") as file:
    file.write("The things that we love tell us who we are.")
    print(file.read())
The script prints only an empty line, because read() starts at the end of the file and returns an empty string.
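The a+ case can be sketched in the same way. The example below assumes a temporary file with made-up content and checks both what read() returns right after the write and what the file holds afterwards.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file
    with open(path, "w", encoding="utf-8") as file:
        file.write("first")

    with open(path, "a+", encoding="utf-8") as file:
        file.write(" second")    # appended after the existing content
        read_back = file.read()  # cursor is at the end, so this is ''

    with open(path, encoding="utf-8") as file:
        final_content = file.read()

print(repr(read_back))  # ''
print(final_content)    # first second
```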
In the following table, we summarize the behavior of file objects when opened in different modes.
| Mode | Creates new file if file doesn't exist? | Deletes file content? | Cursor position | Acceptable operations |
| --- | --- | --- | --- | --- |
| r | No | No | Places the cursor at the beginning of the file | read(), readline(), readlines() |
| w | Yes | Yes | Does not apply since file content is deleted | write() |
| a | Yes | No | Places the cursor at the end of the file | write() |
| r+ | No | No | Places the cursor at the beginning of the file | read(), readline(), readlines(), write() |
| w+ | Yes | Yes | Does not apply since file content is deleted | read(), readline(), readlines(), write() |
| a+ | Yes | No | Places the cursor at the end of the file | read(), readline(), readlines(), write() |
Table 1. Behavior of file objects when opened in different modes
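As a rough illustration of what "acceptable operations" means in practice, the sketch below tries an operation that a mode does not support. In both cases Python raises io.UnsupportedOperation. The file name and the result variables are assumptions made for the example.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file

    # w mode accepts write() but not read().
    with open(path, "w", encoding="utf-8") as file:
        try:
            file.read()
            read_in_w = "allowed"
        except io.UnsupportedOperation:
            read_in_w = "not allowed"

    # r mode accepts read() but not write().
    with open(path, "r", encoding="utf-8") as file:
        try:
            file.write("text")
            write_in_r = "allowed"
        except io.UnsupportedOperation:
            write_in_r = "not allowed"

print(read_in_w)   # not allowed
print(write_in_r)  # not allowed
```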
Working with JSON
JSON (JavaScript Object Notation), a standard data interchange format, makes it possible to save complex data types. The Python json module can convert different data objects (lists, dictionaries, etc.) into string representations. This is called serialization (or encoding). Recreating the original data object from its JSON string representation is known as deserialization (or decoding).
Serialization and deserialization don’t necessarily involve only file objects. But in line with the main objective of this article, which is to learn how to work with text files in Python, our focus in this section will be on serializing to and deserializing from files.
Serialization
To work with JSON, you must first import the json module.
import json
To convert a Python object to its JSON string representation and store the output in a file, you use the dump() function of the json module. This function accepts a variable number of arguments, but only three are of interest to this discussion.
json.dump(obj, fp, indent=None)
obj is the Python object you want to convert to a JSON string, fp is the file object in which you want to save the JSON string, and indent describes the indentation level for each member of obj (assuming it’s a list, tuple or dictionary). indent can take an integer or a string (e.g. '\t') as its value. The default value, None, produces the most compact representation, with no newlines or indentation. To insert only newlines, with no indentation, set indent to zero, a negative number or an empty string.
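The effect of indent is easiest to see side by side. The sketch below writes the same (made-up) list three times with different indent values and reads the raw text back each time. The temporary file and the results dictionary are assumptions made for illustration.

```python
import json
import os
import tempfile

data = ["Python", "C"]
results = {}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "textfile")  # hypothetical file
    for indent in (None, 0, 2):
        with open(path, "w", encoding="utf-8") as file:
            json.dump(data, file, indent=indent)
        with open(path, encoding="utf-8") as file:
            results[indent] = file.read()  # the raw JSON text

print(repr(results[None]))  # '["Python", "C"]'
print(repr(results[0]))     # '[\n"Python",\n"C"\n]'
print(repr(results[2]))     # '[\n  "Python",\n  "C"\n]'
```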
Serialization can be seen as writing a JSON string to a file. This means that to use dump(), you must open the file in a mode that accepts a write operation (see Table 1).
This is how to write a list object to a file as a JSON string:
import json
with open("textfile", "w", encoding="utf-8") as file:
    json.dump(["Python", "C", "JS"], file)
This is the content of textfile after running the code:
["Python", "C", "JS"]
This is how to write a dictionary object to a file as a JSON string:
import json
with open("textfile", "w", encoding="utf-8") as file:
    json.dump(
        {"name": "Python", "department": "data science", "school": "computer science"},
        file,
        indent=4,
    )
This is the textfile content after running the code:
{
    "name": "Python",
    "department": "data science",
    "school": "computer science"
}
Notice the effect of the indent argument on the textfile content.
json.dump() treats tuple objects as lists. set objects are not accepted by dump() and raise a TypeError. See this table for all acceptable object types.
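Both points can be checked with a short sketch. The tuple, the set and the temporary file below are made up for the example.

```python
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "textfile")  # hypothetical file

    # A tuple is written as a JSON array, exactly like a list.
    with open(path, "w", encoding="utf-8") as file:
        json.dump(("Python", "C"), file)
    with open(path, encoding="utf-8") as file:
        tuple_as_json = file.read()

    # A set has no JSON equivalent, so dump() raises TypeError.
    try:
        with open(path, "w", encoding="utf-8") as file:
            json.dump({"Python", "C"}, file)
        set_error = None
    except TypeError as error:
        set_error = type(error).__name__

print(tuple_as_json)  # ["Python", "C"]
print(set_error)      # TypeError
```

If you need to store a set, one option is to convert it to a list first, for example with sorted(), and convert it back after loading.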
Deserialization
You use the load() function of the json module to convert a JSON string from a file to its equivalent Python object. It accepts a variable number of arguments, but only the fp argument, which describes the file object containing the JSON string, is relevant here.
json.load(fp)
Deserialization can be seen as reading a JSON string from a file. This implies that to use load(), you must open the file in a mode that accepts a read operation (read(), readline() or readlines()) (see Table 1).
import json
with open("textfile", "r", encoding="utf-8") as file:
    print(json.load(file))
This is the textfile content before running the script and the output of the script:
This table outlines all supported JSON to object type conversions for the load() function.
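To tie serialization and deserialization together, here is a hedged round-trip sketch. The dictionary is invented for the example; note that the tuple it contains comes back as a list, since JSON has no tuple type.

```python
import json
import os
import tempfile

# Hypothetical data covering several Python types.
original = {
    "name": "Python",
    "tags": ("language", "scripting"),
    "year": 1991,
    "stable": True,
    "successor": None,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "textfile")  # hypothetical file
    with open(path, "w", encoding="utf-8") as file:
        json.dump(original, file, indent=4)
    with open(path, "r", encoding="utf-8") as file:
        restored = json.load(file)

print(restored["tags"])      # ['language', 'scripting']
print(restored == original)  # False, because the tuple became a list
```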
Conclusion
We have looked at different modes of opening files in Python, and the acceptable operations for these modes. To some extent, appreciating how the different modes affect the file and cursor positions is key to a better understanding of the different file operations. This is why much emphasis has been laid on these effects in this article. We have also learned how to convert Python objects to JSON and save the JSON to a file, as well as how to decode the JSON back to its original Python object. As much as possible, comparisons have been made between related concepts to boost understanding.