Working with Files in Python

Python File I/O

Introduction

File handling is a core task in programming: it lets a program store data between runs and exchange data with other programs.

Objectives

In this article, you will learn:

  • how to open a file

  • how to close a file after using it

  • how to read the content of a file

  • how to write or append text content to a file

  • how to work with JSON files

Prerequisites

To follow along with this tutorial, you need to:

  • have Python installed on your computer

  • have a basic knowledge of Python

  • know how to run Python on the command line

  • know how to run basic commands on the command line

Scope

Our focus in this article will be on file operations in text mode.

How to open and close a file

You use the open() function to open a file. open() is a built-in Python function that returns a file object. It takes up to eight arguments, but only three are relevant to our current discussion.

open(filename, mode='r', encoding=None)

filename is a string holding the name (or path) of the file you want to open. In read mode, a FileNotFoundError is raised if the file does not exist.

mode, another string, describes how you want to use the file. It is an optional argument with a default value of r, signifying that you want to read the content of the file. Other values mode can take include w for write and a for append. The r mode places the 'cursor' at the beginning of the file. You can omit the mode argument if you just want to read a file.

encoding describes the character encoding standard in which the file is to be opened. encoding='utf-8' sets the encoding to UTF-8, which is the recommended standard. Omitting the encoding argument makes the default value platform-dependent.
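As a minimal sketch of how these three arguments fit together, the snippet below tries to open a file that is known not to exist and catches the resulting FileNotFoundError. The temporary directory and the file name missing.txt are assumptions, chosen only so the example is safe to run anywhere.

```python
import os
import tempfile

# Work inside a throwaway directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    missing_path = os.path.join(tmp_dir, "missing.txt")  # hypothetical name
    try:
        file = open(missing_path, mode="r", encoding="utf-8")
    except FileNotFoundError:
        # Read mode never creates a file, so a missing file raises here.
        opened = False
        print("No such file:", os.path.basename(missing_path))
    else:
        opened = True
        file.close()
```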

Let's say you have a file named hello. You can open the file for reading this way:

file = open('hello', mode='r', encoding='utf-8')

To read the entire content of the file, you use the read() method of the file object (more on this method in the How to read section): file.read().

You must close the file after using it: file.close().

Because a file operation may fail and raise an exception (for example, if the content cannot be decoded as UTF-8), it is safer to wrap the operation in a try...except...finally block so that the file is always closed:

file = open('hello', mode='r', encoding='utf-8')

try:
    file.read()
except (OSError, UnicodeError):
    print("File can't be read")
finally:
    file.close()

The with statement

The with statement ensures that a file is closed properly after use, even if an exception is raised. It makes the finally clause unnecessary and accomplishes the same cleanup in fewer lines of code.

With it, the try...except...finally block above reduces to the following (note that, unlike the except clause, with does not suppress the exception):

with open('hello', mode='r', encoding='utf-8') as file:
    file.read()

You can check if the file indeed closed automatically using the closed attribute of the file object:
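One way to check the closed attribute is sketched below. It creates its own small file in a temporary directory, so the file content here is a placeholder rather than the hello file used elsewhere in this article.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "hello")
    # Create a small file to read; the text is placeholder content.
    with open(path, mode="w", encoding="utf-8") as file:
        file.write("placeholder text\n")

    with open(path, mode="r", encoding="utf-8") as file:
        file.read()
        closed_inside = file.closed  # the file is still open inside the block

    closed_after = file.closed       # the file is closed once the block exits

print(closed_inside, closed_after)   # False True
```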

How to read a file

You use the read(), readline() and readlines() methods of a file object to read the file content.

The read() method

The read() method accepts an optional numerical argument that describes the number of characters you want to read. If this argument is greater than the file size, only as many characters as in the file would be read. The entire file content will be read if you omit this argument or if it is negative.
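The three cases described above can be sketched as follows. The file name sample.txt and its two-line content are assumptions made for illustration.

```python
import os
import tempfile

sample_text = "first line\nsecond line\n"  # placeholder content

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as file:
        file.write(sample_text)

    with open(path, encoding="utf-8") as file:
        first_five = file.read(5)    # exactly five characters
        the_rest = file.read(1000)   # more than remains: reads to the end
        at_end = file.read()         # nothing is left, so an empty string

    with open(path, encoding="utf-8") as file:
        everything = file.read(-1)   # a negative size reads the whole file

print(repr(first_five), repr(the_rest), repr(at_end))
```

Note that each call to read() continues from where the previous one stopped, which is why the second call returns only the remainder of the file.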

Let's say you have a text file named hello with the following content:

You may read the first 25 characters of this file using the following code:

with open('hello', encoding='utf-8') as file:
    content = file.read(25)
    print(content)

Remember that the mode argument defaults to 'r' if it is not explicitly reassigned. This is the output when you run the script:

The entire file content is read when no argument is passed to the read() method:

with open('hello', encoding='utf-8') as file:
    content = file.read()
    print(content)

This is the output:

The readline() method

The readline() method reads file content one line at a time.

The newline character \n at the end of each line describes the corresponding line break in the file content. The last output, which is just a newline character \n, is for line 3 (a blank line) in the hello file.

readline() outputs an empty string when it has read all the lines of the file.
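A minimal sketch of this behavior follows. The three-line content, whose third line is blank, is a stand-in and not necessarily the content of the hello file used in this article.

```python
import os
import tempfile

sample_text = "line one\nline two\n\n"  # stand-in content; line 3 is blank

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "hello")
    with open(path, "w", encoding="utf-8") as file:
        file.write(sample_text)

    lines_read = []
    with open(path, encoding="utf-8") as file:
        while True:
            line = file.readline()
            if line == "":           # an empty string signals end of file
                break
            lines_read.append(line)  # a blank line is '\n', not ''

print(lines_read)  # ['line one\n', 'line two\n', '\n']
```

The empty string returned at the end of the file is distinct from the '\n' returned for a blank line, which is what makes the loop's stop condition safe.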

A more convenient way of reading the lines of a file is by looping over the file object.

with open('hello', encoding='utf-8') as file:
    for line in file:
        print(line, end='')

This is the output:

The readlines() method

The readlines() method returns a list of all the lines in a file.

with open('hello', encoding='utf-8') as file:
    lines = file.readlines()
    print(lines)

This is the output:

file.readlines() is equivalent to list(file).
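The equivalence can be checked with a short sketch like the one below. The file name and its content are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "letters.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as file:
        file.write("alpha\nbeta\ngamma")          # no trailing newline

    with open(path, encoding="utf-8") as file:
        from_readlines = file.readlines()

    with open(path, encoding="utf-8") as file:
        from_list = list(file)                    # iterates line by line

print(from_readlines == from_list)  # True
```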

How to write to a file

To write to a file, open it in the w (write) or a (append) mode. The file is created if it does not exist. If the file already exists, opening in w mode deletes the file content. Opening in a mode (for an existing file) places the cursor at the end of the file. In w mode, the cursor starts at the beginning of the now-empty file.

The write() method

The write() method of the file object is used to write (or append) to a file. The content you want to write must be a string passed to this method. In w mode, this string becomes the new content of the file; in a mode, the string is appended to the existing file content (if the file already exists). The write() method returns the number of characters written to the file.
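One detail worth seeing in code is that write() adds nothing of its own, such as a newline, between calls. The sketch below uses a hypothetical file name and placeholder strings to show the return values and the resulting content in w and a modes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as file:
        count_first = file.write("first")      # returns 5
        count_second = file.write("second\n")  # returns 7; '\n' counts as one

    with open(path, "a", encoding="utf-8") as file:
        file.write("third\n")                  # appended after existing text

    with open(path, encoding="utf-8") as file:
        content = file.read()

print(repr(content))  # 'firstsecond\nthird\n'
```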

Let’s say you want to write to a file textfile that doesn’t exist, as can be seen below, and print the number of written characters.

Both w and a mode will achieve this (because the file doesn’t exist yet), but let’s use w.

with open("textfile", "w", encoding="utf-8") as file:
    num_chars = file.write("But we cannot simply sit and stare at our wounds forever.")
    print(num_chars)

This is the output:

Let’s check if the file was created and the string written to it:

You can open the same file in append mode if you want to add to the content.

with open("textfile", "a", encoding="utf-8") as file:
    num_chars = file.write(" - Haruki Murakami")
    print(num_chars)

Let’s see the current content of the file.

File operations in combined modes

You can open files for both reading and writing by combining modes thus: r+ (read and write), w+ (write and read) and a+ (append and read).

r+ mode

r+ raises a FileNotFoundError if the file doesn’t exist. It places the cursor at the beginning of the file. Writing to the file overwrites existing characters, starting from the beginning of the file: as many characters are replaced as there are in the string passed to write(), and the rest of the file is left untouched. The cursor then sits just after the last character of the written string. A read() operation begins from this new cursor position and continues to the end of the file (if you pass no argument to read()).
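A small sketch of the r+ behavior described above, using a hypothetical file greeting.txt with placeholder content so that the effect is easy to see:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "greeting.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as file:
        file.write("Hello, world!")

    with open(path, "r+", encoding="utf-8") as file:
        file.write("Jello")        # replaces the first five characters
        remainder = file.read()    # reads from just after the written text

    with open(path, encoding="utf-8") as file:
        final_content = file.read()

print(repr(remainder))      # ', world!'
print(repr(final_content))  # 'Jello, world!'
```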

Let’s demonstrate this using the following code:

with open("textfile", "r+", encoding="utf-8") as file:
    file.write("The things that we love tell us who we are.")
    print(file.read())

Below is the content of textfile before and after running the script, with the output of the script in between:

readline() and readlines() have a very interesting behavior in the r+ mode. Let’s demonstrate this for readline() using the code below:

with open("textfile", "r+", encoding="utf-8") as file:
    file.write("The things that we love tell us who we are.")
    print(file.readline())

The textfile content before and after running the script, as well as the output of the script, is shown below:

For all intents and purposes, readline() executes before write(), placing the cursor just after the last character in textfile before the write operation. write() now begins from this new cursor position, appending its string argument to the end of the file. Replacing readline() with readlines() will return the same output but as a list.

Check out this discussion on Stack Overflow for more on this behavior of readline() and readlines().

w+ and a+ mode

Calling open() in w+ or a+ mode creates the file if it doesn’t exist. If the file already exists, opening in w+ mode deletes its content, while opening in a+ mode places the cursor at the end of the file. In the w+ mode, calling write() on the file object writes a new string (the argument of write()) to the file; in the a+ mode, write() appends its string argument to the existing file content. In both modes, the cursor is placed at the end of the file after the write operation. Because the cursor is now at the end of the file, calling read() or readline() returns an empty string, while readlines() returns an empty list.
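If you want to read back what was just written, one option is to move the cursor yourself with the file object's seek() method, which this article does not otherwise cover. The sketch below assumes a hypothetical file quote.txt and compares read() with and without a preceding seek(0).

```python
import os
import tempfile

quote = "The things that we love tell us who we are."

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "quote.txt")  # hypothetical file name

    with open(path, "w+", encoding="utf-8") as file:
        file.write(quote)
        without_seek = file.read()  # cursor is at the end: empty string
        file.seek(0)                # move the cursor back to the beginning
        with_seek = file.read()     # now the whole content is read

print(repr(without_seek))  # ''
print(with_seek == quote)  # True
```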

Let’s demonstrate these concepts using the w+ mode and the read() method.

with open("textfile", "w+", encoding="utf-8") as file:
    file.write("The things that we love tell us who we are.")
    print(file.read())

This is the output:

In the following table, we summarize the behavior of file objects when opened in different modes.

Mode  Creates new file if file doesn't exist?  Deletes file content?  Cursor position                               Acceptable operations
r     No                                       No                     Beginning of the file                         read(), readline(), readlines()
w     Yes                                      Yes                    Does not apply since file content is deleted  write()
a     Yes                                      No                     End of the file                               write()
r+    No                                       No                     Beginning of the file                         read(), readline(), readlines(), write()
w+    Yes                                      Yes                    Does not apply since file content is deleted  read(), readline(), readlines(), write()
a+    Yes                                      No                     End of the file                               read(), readline(), readlines(), write()

Table 1. Behavior of file objects when opened in different modes

Working with JSON

JSON (JavaScript Object Notation), a standard data interchange format, makes it possible to save complex data types. The Python json module can convert different data objects (lists, dictionaries, etc.) into string representations. This is called serialization (or encoding). Recreating the original data object from its JSON string representation is known as deserialization (or decoding).

Serialization and deserialization don’t necessarily involve only file objects. But in line with the main objective of this article, which is to learn how to work with text files in Python, our focus in this section will be on serializing to and deserializing from files.

Serialization

To work with JSON, you must first import the json module.

import json

To convert a Python object to its JSON string representation and store the output in a file, you use the dump() function of the json module. This function accepts several arguments, but only three are of interest to this discussion.

json.dump(obj, fp, indent=None)

obj is the Python object you want to convert to a JSON string, fp is the file object in which you want to save the JSON string, and indent describes the indentation level for each member of obj (assuming it’s a list, tuple or dictionary). indent can take an integer or a string (e.g. '\t') as its value. The default value, None, introduces no indentation. To insert just a new line, assign indent a zero, a negative number or an empty string.

Serialization can be seen as writing a JSON string to a file. This means that to use dump(), you must open the file in a mode that accepts a write operation (see Table 1).

This is how to write a list object to a file as a JSON string:

import json

with open("textfile", "w", encoding="utf-8") as file:
    json.dump(["Python", "C", "JS"], file)

This is the content of textfile after running the code:

This is how to write a dictionary object to a file as a JSON string:

import json

with open("textfile", "w", encoding="utf-8") as file:
    json.dump(
        {"name": "Python", "department": "data science", "school": "computer science"},
        file,
        indent=4,
    )

This is the textfile content after running the code:

Notice the effect of the indent argument on the textfile content.

json.dump() treats tuple objects as lists. set objects are not acceptable as an argument of dump(). See this table for all acceptable object types.
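The two claims above can be checked with a short sketch. The file name data.json and the sample values are assumptions for illustration.

```python
import json
import os
import tempfile

set_error = None

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.json")  # hypothetical file name

    # A tuple is written as a JSON array, exactly like a list.
    with open(path, "w", encoding="utf-8") as file:
        json.dump(("Python", "C", "JS"), file)
    with open(path, encoding="utf-8") as file:
        tuple_as_json = file.read()

    # A set has no JSON equivalent, so dump() raises a TypeError.
    try:
        with open(path, "w", encoding="utf-8") as file:
            json.dump({"Python", "C"}, file)
    except TypeError as error:
        set_error = str(error)

print(tuple_as_json)  # ["Python", "C", "JS"]
print(set_error)
```

A common workaround, if the order of elements doesn't matter, is to convert the set to a list (for example with sorted()) before passing it to dump().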

Deserialization

You use the load() function of the json module to convert a JSON string from a file to its equivalent Python object. It accepts several arguments, but only the fp argument, which describes the file object containing the JSON string, is relevant here.

json.load(fp)

Deserialization can be seen as reading a JSON string from a file. This implies that to use load(), you must open the file in a mode that accepts a read operation (read(), readline() or readlines()) (see Table 1).

import json

with open("textfile", "r", encoding="utf-8") as file:
    print(json.load(file))

This is the textfile content before running the script and the output of the script:

This table outlines all supported JSON to object type conversions for the load() method.
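As a rough illustration of these conversions, the sketch below writes a dictionary with dump() and reads it back with load(). The file name and the sample dictionary are assumptions. Because JSON has no tuple type, the tuple comes back as a list, so the restored object is not always identical to the original.

```python
import json
import os
import tempfile

# Hypothetical sample data covering several JSON types.
original = {"name": "Python", "versions": (2, 3), "typed": False, "owner": None}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip.json")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as file:
        json.dump(original, file, indent=4)

    with open(path, "r", encoding="utf-8") as file:
        restored = json.load(file)

print(restored["versions"])  # [2, 3]  (the tuple became a list)
print(restored == original)  # False, because (2, 3) != [2, 3]
```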

Conclusion

We have looked at the different modes for opening files in Python and the operations each mode allows. Understanding how each mode affects the file content and the cursor position is key to understanding file operations, which is why this article has emphasized those effects. We have also learned how to convert Python objects to JSON and save the JSON to a file, and how to decode that JSON back into a Python object.