01. Pathlib

pathlib module is assed in Python 3.4 and it’s a great way to work with files and directories.

Most of the time that you’re going to be using pathlib is going to be through one of the following main classes: Path, PosixPath, PurePath, PurePosixPath, PureWindowsPath, and WindowsPath. In this lesson we’re mostly going to be dealing with the Path class because it exposes the maximum amount of functionality. However, if we need to do something more specific there are those alternatives (for example PosixPath are for people on unix-like systems, so that’s either Unix itself, Linux or Mac).

Let’s start by importing parhlib module and by creating a new object, that is an instance of the Path object using a method called .cwd():

>>> import pathlib
 
>>> pwd = pathlib.Path.cwd()
PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks')

The reason why we’re doing that is because pwd is the Linux-equivalent of cwd. The output is a PosixPath object and it gives the absolute path to the directory we’re worked in.

Another way to create a path object is:

>>> curr = pathlib.Path()
>>> curr
PosixPath('.')

where . refers to the current directory that we are in.

Actually, there’s some notable differences between pwd and curr. The biggest one is that pwd is an absolute directory while curr is a relative directory. Where you can run into issues here is if we look at one of the attributes exposed from this object, parts, you can break down pwd into its directory parts starting at / which is our root directory:

>>> pwd.parts
('/',
 'Users',
 'simonedangelo',
 'Documents',
 'obsidian-blog',
 'content',
 'Python',
 'Python Standard Library Tour (by Jake Callahan)',
 'jupyter_notebooks')

However if we try to do the same to curr we see nothing:

>>> curr.parts
()

but there is a way around this and that is:

>>> curr.absolute()
PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks')

with .absolute() method we can get the absolute directory from any valid relative directory and then we call parts attribute on it:

>>> curr.absolute().parts
('/',
 'Users',
 'simonedangelo',
 'Documents',
 'obsidian-blog',
 'content',
 'Python',
 'Python Standard Library Tour (by Jake Callahan)',
 'jupyter_notebooks')

So, you can convert relative paths into absolute paths anytime you want as long as the path is valid. One way to check if the path is valid is by using .exists() method:

pwd.exists()
True

There are a number of other different is_ helpers that are exposes from this class:

Let’s explore some of them:

>>> pwd.is_dir()
True
 
>>> pwd.is_file()
False

This result is pretty obvious because pwd is a directory and not a file.

We can also look at the parent directory with parent attribute:

>>> pwd.parent
PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)')

and also iterate:

>>> pwd.parent.parent
PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python')

and all the way up to the root directory:

>>> pwd.root
'/'

Let’s check one of those interesting properties by storing our parents as up and we’ll see a little bit more magic:

>>> up = pwd.parent
 
>>> up < pwd
True
>>> up > pwd
False

With < operator we can check if a directory is a parent of another directory; with > operator we can check if a directory is a child of another directory. If you want to link this concept with the operator you can think this way “up < pwd checks if the length of pwd is greater than the path of up; if so, it’s its parent directory”. Note that this is just a way to memorize the behaviour of < and > operators: they are not checking any paths length.

We can also list out the contents of the current directory we are in (only one file in our case):

>>> children = list(pwd.iterdir())
>>> children
[PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks/txt_test.txt'),
 PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks/yaml_test.yaml'),
 PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks/python_test.py'),
 PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks/1. Pathlib.ipynb'),
 PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks/folder_test')]

Hidden files (if any) would also be shown.
We’re calling list because the .iterdir() method creates an iterable, which you can iterate over. This is particularly useful when working with large directories; you might not want to list all the contents at once but rather iterate over them one by one to perform specific operations. For example we may want to list only files in the directory:

>>> child_files = [pth for pth in pwd.iterdir() if pth.is_file()]
>>> child_files
[PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks/txt_test.txt'),
 PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks/yaml_test.yaml'),
 PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks/python_test.py'),
 PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks/1. Pathlib.ipynb')]
 
>>> child_dir = [pth for pth in pwd.iterdir() if pth.is_dir()]
>>> child_dir
[PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks/folder_test')]

This is really useful for doing recursive directory traversal. So, if you’re writing some program that needs to go through every file under some subdirectory we can use .iterdir() if something’s a file that matches whatever you’re looking for, you can either register that or ignore if it’s a directory you can recursively do the same process with that directory. This is a common pattern.

If you need to go through each file within a subdirectory, you can use .iterdir() to iterate through its contents. For each item, you can check if it’s a file or a directory. If it’s a file and matches certain criteria, you can process it accordingly. If it’s a directory, you can use recursion to apply the same process to its contents. This approach is a common pattern. (*Here’s an example made with ChatGPT:

from pathlib import Path
 
def find_text_files(directory: Path):
    for item in directory.iterdir():
        if item.is_file() and item.suffix == '.txt':  # Check if it's a .txt file
            print(item)
        elif item.is_dir():  # If it's a directory, recurse into it
            find_text_files(item)
 
find_text_files(pwd)

There is also a really nice way to do this if you’re looking for particular types of files or fields with a particular pattern, that is by using .glob() method. Say we’re looking all yaml files in the directory:

>>> yaml_files = list(pwd.glob("*.yaml"))
>>> yaml_files
[PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks/yaml_test.yaml')]

Say we want to work on something that might not exists. Let’s create a new path with .joinpath() method:

>>> new_path = pwd.joinpath("scripts", "test_script.py")
>>> new_path
PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks/scripts/test_script.py')

We can pass an arbitrary number of arguments in this method that are going to be new parts in the new_path.

There’s also nicer looking way of putting these together and it involves some syntactic sugar and to do this, instead of passing arguments along like we just did, we’ll use the following notation with /:

>>> new_path = pwd / "scripts" / "test_script.py"
>>> new_path
PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks/scripts/test_script.py')

The result is the same as .joinpath() method.

Theoretically, new_path is pointing to a file instead of directory. Let’s prove it:

>>> new_path.is_file()
False

Actually, it return False. Why? That’s because .is_file() method works only if a directory exists, and that’s not the case:

>>> new_path.exists()
False

Even though a path doesn’t exists, we can still do some interesting things with it. For example:

>>> new_path.name
'test_script.py'
 
>>> new_path.suffix # it's the file extension
'.py'
 
>>> new_path.stem # it's everything before the suffix
'test_script'

We can also change them with with_ methods like:

>>> new_path = new_path.with_suffix(".sh")
>>> new_path
PosixPath('/Users/simonedangelo/Documents/obsidian-blog/content/Python/Python Standard Library Tour (by Jake Callahan)/jupyter_notebooks/scripts/test_script.sh')

Of course this file doesn’t exists, but we can remedy that. So, first let’s check if its parent exist:

>>> new_path.parent.exists()
False

However, pathlib module makes it really easy to also create a new directory with mkdir() function:

>>> new_path.parent.mkdir(parents=True, exist_ok=True)

parents=True means allows Python to recursively create multiple directories, if they need to be created;
exists_ok=True this ensures that an error is not thrown if that directory already exists.

Now, the script directory is created:

>>> new_path.parent.exists()
True

Now that we have a parent directory, to create a file we’re going to use .touch() method. This method simply creates a file if it doesn’t exist (if it already exists, it just updates the timestamp of when it was last modified without modifying the content of the file itself):

>>> new_path.touch()

Finally, we can prove that this file exists and see that is a file:

>>> new_path.exists()
True
 
>>> new_path.parent.exists()
True

Here’s an example of a script that uses the os and shutil modules, which, for a long time, were the standards with dealing with files and directories in Python. What this script does is pretty straightforward, so we won’t comment it:

import os
import shutil
 
# Creates a new directory
os.mkdir("mydir")
 
# Creates a new file in the directory
with open("mydir/myfile.txt", "w") as f:
    f.write("Hello, world!")
 
#Read the contents of the file
with open("mydir/myfile.txt", "r") as f:
    print(f.read()) # Hello, world!
 
# Updates the content of the file
with open("mydir/myfile.txt", "w") as f:
    f.write("Goodbye, world!")
 
# Read the contents of the file again
with open("mydir/myfile.txt", "r") as f:
    print(f.read())
 
# Delete the file
os.remove("mydir/myfile.txt")
 
# Delete the directory
shutil.rmtree("mydir")

We can definitely improve this code with pathlib module. Let’s first create a Path object:

from pathlib import Path
 
# Creates a new directory
my_file = Path("mydir/myfile.txt")
my_file.parent.mkdir()

To work with the file itself we have two main options when we’re working with the contents of a file:

using .open() method:

# Creates a new file in the directory
with my_file.open("w") as f:
    f.write("Hello, world!")

using .write_text() method (especially for more simple contents):

my_file.write_text("Hello, world!")

This more succinct way can be used to read files as well:

print(my_file.read_text())

Finally, to delete the file we can use .unlink() method:

my_file.unlink()

And to remove the directory we’ll use .rmdir() method:

my_file.parent.rmdir()

The full code is:

from pathlib import Path
 
# Creates a new directory
my_file = Path("mydir/myfile.txt")
my_file.parent.mkdir()
 
# Creates a new file in the directory
my_file.write_text("Hello, world!")
 
#Read the contents of the file
print(my_file.read_text())
 
# Updates the content of the file
my_file.write_text("Goodbye, world!")
 
# Read the contents of the file again
print(my_file.read_text())
 
# Delete the file
my_file.unlink()
 
# Delete the directory
my_file.parent.rmdir()

Let’s use a real use-case of pathlib module:

def load_file(file, warn=True):
    """Verify the existence of and load data from json amd yaml files."""
    file = Path(file)
    if not file.exists() or file.suffix not in (".json", ".yaml", ".yml"):
        if warn:
            logger.warning(f"File {file.absolute()} is invalid or does not exists.")
        return []
    ...

Let’s discuss only important points:

Line 3: it gives the user of this function the ability to either pass in a Path object or a path string. We’re just handling with an explicit conversion and we don’t need to worry about.
Line 6: in case they pass a relative directory, it may have been mistaken about where they were finding this from and .absolute() method would make it quite clear where Python was expecting the file to be.

Quartz 4

Explorer

01. Pathlib

Graph View