21. File Operations#
Pythonʼs view of files and directories derives from the Unix/Linux operating system variants. For more information on files, look at the Basics of the Unix Shell in Section 8, The Tools.
Python’s os module provides support for file operations and interacting with the operating system.
Python’s functionality largely mirrors that provided by various command-line programs and by the underlying standard C libraries upon which Python is implemented.
21.1. Existence#
To see whether or not a given file or directory exists, call os.path.exists() with the name as the argument.
import os
print("test_binary.dat", os.path.exists("test_binary.dat"))
print("binary.dat", os.path.exists("binary.dat"))
print(".", os.path.exists("."))    # current directory
print("..", os.path.exists(".."))  # parent directory
test_binary.dat False
binary.dat False
. True
.. True
21.2. Checking Filetype#
Use os.path.isfile() to return a Boolean indicating whether the argument is a file.
Use os.path.isdir() to return a Boolean indicating whether the argument is a directory.
print("isfile: test_binary.dat", os.path.isfile("test_binary.dat"))
print("isdir: test_binary.dat", os.path.isdir("test_binary.dat"))
isfile: test_binary.dat False
isdir: test_binary.dat False
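Both checks above return False simply because test_binary.dat did not exist when the cell ran. As a minimal sketch of what the three functions report for entries that do exist, the following creates a throwaway file inside a temporary directory. The file names example.dat and missing.dat are hypothetical and chosen only for illustration.

```python
import os
import tempfile

# Work inside a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "example.dat")  # hypothetical file name
    with open(file_path, "wb") as f:
        f.write(b"\x00\x01")

    results = {
        "file exists": os.path.exists(file_path),
        "file isfile": os.path.isfile(file_path),
        "file isdir": os.path.isdir(file_path),
        "dir isdir": os.path.isdir(tmp_dir),
        "missing exists": os.path.exists(os.path.join(tmp_dir, "missing.dat")),
    }

for label, value in results.items():
    print(label, value)
```

A path that does not exist is neither a file nor a directory, which is why isfile() and isdir() can both be False for the same name.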
21.3. Deleting Files#
To delete a file, use os.remove().
os.remove("test_binary.dat")
os.path.exists("test_binary.dat")  # verify that the file was removed
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[3], line 1
----> 1 os.remove("test_binary.dat")
2 os.path.exists("test_binary.dat") #verify that file was removed
FileNotFoundError: [Errno 2] No such file or directory: 'test_binary.dat'
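The traceback above appears because test_binary.dat did not exist when the cell ran: os.remove() raises FileNotFoundError for a missing file. One possible way to tolerate that case is to catch the exception, as in the sketch below. The helper name remove_if_present is hypothetical and not part of the os module.

```python
import os
import tempfile


def remove_if_present(path):
    """Delete the file at path; return True if something was removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Nothing to delete; treat this as a non-error.
        return False


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "test_binary.dat")
    open(target, "wb").close()          # create an empty file to delete

    first = remove_if_present(target)   # the file exists, so it is removed
    second = remove_if_present(target)  # already gone, so nothing happens
    still_there = os.path.exists(target)

print(first, second, still_there)
```

Catching the exception avoids the gap between a separate os.path.exists() check and the removal itself, during which another process could delete the file.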
21.4. File Information: stat#
To get details about a file (what Unix/Linux calls its “status”), call os.stat(). This returns an object with various fields that represent the permissions on the file, the file’s type, size, owner, group, and various timestamps.
stat_obj = os.stat('.')
print(stat_obj)
os.stat_result(st_mode=16877, st_ino=2271486, st_dev=16777230, st_nlink=41, st_uid=503, st_gid=20, st_size=1312, st_atime=1714354953, st_mtime=1714354953, st_ctime=1714354953)
Initially, that result looks very esoteric, but once we break down a few of the fields, it makes more sense.
The st_mode field contains the file type and the permissions associated with the file. Using ls -l, we see this data represented with a string that looks like ‘-rwxr-xr-x’. The first character specifies the type: ‘-’ for files and ‘d’ for directories. The next nine characters represent the user, group, and world permissions in terms of read, write, and execute. Typically, st_mode makes more sense in its octal representation.
print(oct(stat_obj.st_mode))
0o40755
The leading digits represent the file type: you will see 40 for a directory and 100 for a regular file. The last three digits correspond to the owner, group, and world permissions, each using a three-bit representation for read, write, and execute. For example, 111 in binary equals 7 in octal, so read, write, and execute permissions are all set for that class of user. 101 in binary equals 5 in octal, so only read and execute permissions are set. 100 in binary equals 4 in octal, so only read permission is set.
For more explanation, see the “Understanding and Modifying File Permissions” section of Overview of the Unix File System.
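Rather than decoding the octal digits by hand, the standard library's stat module can interpret a mode value. The sketch below applies it to the literal value 0o40755 shown above, so it does not depend on any particular directory; the variable names are illustrative only.

```python
import stat

mode = 0o40755  # the st_mode value printed above (16877 in decimal)

is_directory = stat.S_ISDIR(mode)      # inspects the file-type bits
permission_bits = stat.S_IMODE(mode)   # keeps only the permission bits
ls_style = stat.filemode(mode)         # the string that ls -l would show

print(is_directory)          # True
print(oct(permission_bits))  # 0o755
print(ls_style)              # drwxr-xr-x

# Individual permissions can be tested with the bit masks in the stat module.
owner_can_write = bool(mode & stat.S_IWUSR)
world_can_write = bool(mode & stat.S_IWOTH)
print(owner_can_write, world_can_write)  # True False
```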
st_size is the size of the file’s contents in bytes.
st_atime, st_mtime, and st_ctime record when the file was last accessed, when its contents were last modified, and when its metadata last changed (on Windows, st_ctime instead reports the creation time). The times are expressed as the number of seconds since the Unix epoch, which is midnight (UTC) on January 1st, 1970. While this fact seems esoteric, this is a ubiquitous representation of dates and times. Fortunately, as with other languages, Python provides APIs to convert these values into a datetime object.
import datetime
# Build a timezone-aware datetime directly; utcfromtimestamp() is deprecated.
accessed_dt = datetime.datetime.fromtimestamp(stat_obj.st_atime, tz=datetime.timezone.utc)
print(accessed_dt.isoformat())
2024-04-29T01:42:33.797208+00:00
21.5. Directory Operations#
As with files, Python supports various directory operations.
21.5.1. Create Directory#
Use os.mkdir() to create a new directory.
os.mkdir('newDir')
---------------------------------------------------------------------------
FileExistsError Traceback (most recent call last)
Cell In[7], line 1
----> 1 os.mkdir('newDir')
FileExistsError: [Errno 17] File exists: 'newDir'
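The FileExistsError above arises because newDir was already present when the cell ran: os.mkdir() refuses to create a directory that exists. As a sketch of one alternative, os.makedirs() with exist_ok=True creates the directory (and any missing parents) and does nothing if it is already there. The example runs in a temporary directory so it does not touch the notebook's own newDir.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    new_dir = os.path.join(tmp_dir, "newDir")
    os.mkdir(new_dir)

    # A second os.mkdir() on the same path fails, as in the traceback above.
    try:
        os.mkdir(new_dir)
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True

    # os.makedirs() creates intermediate directories and, with exist_ok=True,
    # can safely be called again.
    nested = os.path.join(new_dir, "newSubDir")
    os.makedirs(nested, exist_ok=True)
    os.makedirs(nested, exist_ok=True)
    nested_exists = os.path.isdir(nested)

print(second_call_failed, nested_exists)
```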
21.5.2. List Directory Contents#
Use os.listdir() to list the contents of a directory. This function returns a list of the names (strings) of the entries within that directory.
os.listdir('newDir')
['newSubDir']
os.listdir('.')
['02-ProblemSolvingByProgramming.ipynb',
'09.b-FormattingStrings-fStrings.ipynb',
'26-Comprehensions.ipynb',
'13-Iteration.ipynb',
'27-DateTime.ipynb',
'answers',
'25-ClassesAndObjects-Inheritance.ipynb',
'images',
'19-Recursion.ipynb',
'22-Testing.ipynb',
'14-Dictionaries.ipynb',
'09-FormattingStrings.ipynb',
'resources',
'17-Files.ipynb',
'07-Functions.ipynb',
'10-Lists.ipynb',
'21-Validation,ExceptionsAndErrorHandling.ipynb',
'03-Types,Values,Variables,Names.ipynb',
'08.a-Strings-OtherFunctions.ipynb',
'newDir',
'README.md',
'24-ClassesAndObjects.ipynb',
'04-Boolean,Numbers,Operations.ipynb',
'12-RandomNumbers.ipynb',
'06-Control.ipynb',
'18-FileOperations.ipynb',
'28-RegularExpressions.ipynb',
'20-Modules.ipynb',
'16-Miscellaneous.ipynb',
'01-Introduction.ipynb',
'11-Tuples.ipynb',
'test.txt',
'.ipynb_checkpoints',
'23-Debugging.ipynb',
'15-Sets.ipynb',
'data',
'05-BasicInputAndOutput.ipynb',
'09.a-FormattingStrings-OldStyle.ipynb',
'08-Strings.ipynb']
# now, make a subdirectory in newDir
os.mkdir('newDir/newSubDir')
os.listdir('newDir')
---------------------------------------------------------------------------
FileExistsError Traceback (most recent call last)
Cell In[10], line 2
1 # now, make a subdirectory in newDir
----> 2 os.mkdir('newDir/newSubDir')
3 os.listdir('newDir')
FileExistsError: [Errno 17] File exists: 'newDir/newSubDir'
with open("newDir/newSubDir/dickens.txt", 'w') as f:
    f.write('It was the best of times,\nit was the worst of times.\n')
os.listdir('newDir/newSubDir')
['dickens.txt']
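As the long listing earlier in this section suggests, os.listdir() returns names in arbitrary order and mixes files with directories. A minimal sketch of sorting the names and keeping only regular files follows; the entry names (a.txt, b.txt, images) are hypothetical and created in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a small directory: one subdirectory and two text files.
    os.mkdir(os.path.join(tmp_dir, "images"))
    for name in ("b.txt", "a.txt"):
        with open(os.path.join(tmp_dir, name), "w") as f:
            f.write("sample\n")

    # listdir() returns bare names, so join them with the directory
    # before asking questions about each entry.
    entries = sorted(os.listdir(tmp_dir))
    files_only = [
        name for name in entries
        if os.path.isfile(os.path.join(tmp_dir, name))
    ]

print(entries)     # ['a.txt', 'b.txt', 'images']
print(files_only)  # ['a.txt', 'b.txt']
```

Note that the names returned are relative to the listed directory, not to the current working directory, which is why the join is needed.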
21.5.3. Delete Directory#
To delete a directory, use os.rmdir(). However, the directory must be empty to be deleted: it cannot contain any files or other directories. You cannot use os.remove() to delete a directory, only a file.
# this will cause an error as remove can't be used on a directory
os.remove('newDir/newSubDir')
---------------------------------------------------------------------------
PermissionError Traceback (most recent call last)
Cell In[13], line 2
1 # this will cause an error as remove can't be used on directory
----> 2 os.remove('newDir/newSubDir')
PermissionError: [Errno 1] Operation not permitted: 'newDir/newSubDir'
# this will cause an error as the directory is not empty
os.rmdir('newDir/newSubDir')
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
Cell In[14], line 2
1 # this will cause an error as the directory is not empty
----> 2 os.rmdir('newDir/newSubDir')
OSError: [Errno 66] Directory not empty: 'newDir/newSubDir'
Fix the following code block to delete the text file created above first.
# add a method call here

# the following two lines of code are correct
os.rmdir('newDir/newSubDir')
os.path.exists('newDir/newSubDir')
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
Cell In[15], line 4
1 # add a method call here
2
3 # the following two lines of code are correct
----> 4 os.rmdir('newDir/newSubDir')
5 os.path.exists('newDir/newSubDir')
OSError: [Errno 66] Directory not empty: 'newDir/newSubDir'
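Once you have attempted the exercise, the general pattern can be compared against the sketch below, which works in a temporary directory with hypothetical names (subDir, note.txt) rather than the notebook's own files. It shows that os.rmdir() fails while the directory still has contents and succeeds once they are removed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    sub_dir = os.path.join(tmp_dir, "subDir")
    note = os.path.join(sub_dir, "note.txt")
    os.mkdir(sub_dir)
    with open(note, "w") as f:
        f.write("temporary contents\n")

    # rmdir() refuses to delete a directory that still contains entries.
    try:
        os.rmdir(sub_dir)
        failed_while_full = False
    except OSError:
        failed_while_full = True

    # Empty the directory first, then remove it.
    os.remove(note)
    os.rmdir(sub_dir)
    gone = not os.path.exists(sub_dir)

print(failed_while_full, gone)
```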
21.5.4. Change the Current Working Directory#
Use os.chdir() to change the current working directory.
os.chdir('newDir')
Now, enter the method call to list the contents of the current directory.
For other file and directory operations, look at the os module.
os.chdir('..')  # move the current directory back to our starting point
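Because os.chdir() changes the working directory for the whole process, relative pathnames used afterwards resolve against the new location. The following sketch, which assumes nothing beyond the standard library, remembers the starting directory and restores it in a finally block so that an error partway through cannot leave the program stranded elsewhere. The file name relative.txt is hypothetical.

```python
import os
import tempfile

start_dir = os.getcwd()  # remember where we began

with tempfile.TemporaryDirectory() as tmp_dir:
    os.chdir(tmp_dir)
    try:
        # realpath() resolves symbolic links so the two paths compare equal
        # even when the temporary directory is reached through a link.
        moved = os.path.realpath(os.getcwd()) == os.path.realpath(tmp_dir)

        # A relative name now refers to a file inside tmp_dir.
        with open("relative.txt", "w") as f:
            f.write("written via a relative pathname\n")
        created_in_tmp = os.path.exists(os.path.join(tmp_dir, "relative.txt"))
    finally:
        os.chdir(start_dir)  # always return to the starting point

back_home = os.getcwd() == start_dir
print(moved, created_in_tmp, back_home)
```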
21.6. Pathnames#
Most computers use a hierarchical file system. As such, we have a current working directory based on our current shell session. Other times, a setting applied when an executable starts establishes the working directory. At the command line (within the shell session), you can print the working directory with pwd. With Python, we get the current working directory with os.getcwd().
os.getcwd()
'/Users/jbslanka/Documents/GitHub/jupyternotebooks/1-Core Python'
Within Jupyter Notebooks, we can also call out to the operating system:
!pwd
/Users/jbslanka/Documents/GitHub/jupyternotebooks/1-Core Python
Throughout this notebook (and in most file and directory operations), we pass a directory name or file name as an argument to the various function calls. We can specify those names as either absolute or relative pathnames. Absolute pathnames start from the root (top) directory; on Unix-like systems, these pathnames start with a /. Relative pathnames start from the current directory. As demonstrated in this notebook’s first code block, . refers to the current directory, and .. refers to its parent.
To separate directories, most systems use a forward slash /. The exception is Windows, which uses a backslash \. The reasoning dates back to the early days of MS-DOS in the 1980s: the ‘/’ was already used to specify command-line arguments, whereas Unix typically uses a dash -. Windows is slowly migrating away from the \. Within PowerShell, you can specify names with a /, and PowerShell converts it automatically to \. PowerShell also uses - to specify arguments. This migration demonstrates how difficult it is to overcome an implemented decision.
21.6.1. Finding Absolute Pathnames#
From a relative pathname, we can determine the absolute pathname with os.path.abspath().
os.path.abspath('.')
'/Users/jbslanka/Documents/GitHub/jupyternotebooks/1-Core Python'
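To make the distinction between relative and absolute pathnames concrete, the sketch below resolves a relative pathname containing .. against the current working directory and then converts it back. The names data and notes.txt are hypothetical; none of these functions require the path to exist.

```python
import os

cwd = os.getcwd()

# A relative pathname that steps into "data" and straight back out again.
relative = os.path.join("data", "..", "notes.txt")
absolute = os.path.abspath(relative)

print(os.path.isabs(relative))  # False
print(os.path.isabs(absolute))  # True

# abspath() joins the name onto the working directory and collapses "..".
resolves_to_cwd = absolute == os.path.join(cwd, "notes.txt")

# relpath() goes the other way: from an absolute path back to a relative one.
round_trip = os.path.relpath(absolute, start=cwd)
print(resolves_to_cwd, round_trip)
```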
21.6.2. Creating Pathnames#
We can build a pathname from several parts (i.e., strings) by using os.path.join(). This function combines names with the proper path separation character for the current operating system.
os.path.join('stuff','foo','bar.txt')
'stuff/foo/bar.txt'
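The os.path module also provides the reverse operations for taking a pathname apart. As a brief sketch, the following joins the same three parts and then splits the result; os.sep holds the separator for the current operating system, so the comparison holds on both Unix-like systems and Windows.

```python
import os

path = os.path.join("stuff", "foo", "bar.txt")

# split() separates the final component from everything before it.
directory, filename = os.path.split(path)

# splitext() separates the extension (including the dot) from the rest.
stem, extension = os.path.splitext(filename)

print(directory == "stuff" + os.sep + "foo")  # True
print(filename)                               # bar.txt
print(stem, extension)                        # bar .txt
```

os.path.dirname() and os.path.basename() return the two halves of os.path.split() individually.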
21.7. Pathlib#
In Python 3.4, the language developers added the pathlib module. This module provides an alternative to the os module presented in this notebook.
The pathlib module introduced a Path class that treats files and directories as objects with methods we call on them, rather than as strings that we pass to functions under os.
Further details are available in the pathlib documentation. The very bottom of that page shows the correspondence between the two approaches.
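As a rough illustration of the object-oriented style, the sketch below repeats several operations from this notebook (create directories, write a file, test existence, list, stat, delete) using Path methods. It runs in a temporary directory, and the directory and file names simply echo those used earlier.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    base = Path(tmp_dir)

    # The / operator joins path components, much like os.path.join().
    sub_dir = base / "newDir" / "newSubDir"
    sub_dir.mkdir(parents=True, exist_ok=True)   # compare os.makedirs()

    text_file = sub_dir / "dickens.txt"
    text_file.write_text("It was the best of times,\n")

    exists = text_file.exists()                  # compare os.path.exists()
    is_file = text_file.is_file()                # compare os.path.isfile()
    is_dir = sub_dir.is_dir()                    # compare os.path.isdir()
    names = sorted(p.name for p in sub_dir.iterdir())  # compare os.listdir()
    size = text_file.stat().st_size              # compare os.stat()

    text_file.unlink()                           # compare os.remove()
    sub_dir.rmdir()                              # compare os.rmdir()
    removed = not sub_dir.exists()

print(exists, is_file, is_dir, names, removed)
print(text_file.name, text_file.stem, text_file.suffix)
```

A Path object can still be passed to open() and to most functions in os and os.path, so the two approaches can be mixed.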
21.8. Suggested LLM Prompts#
Explain how to work with file paths and directories in Python using the os and pathlib modules. Cover operations such as creating, renaming, moving, and deleting files and directories. Provide examples of joining paths and handling relative and absolute paths.
Create a beginner-friendly tutorial that introduces the basic file operations in Python using the os and pathlib modules. Cover topics such as creating, reading, writing, and deleting files and directories. Include clear examples and explanations for each operation. Ensure each section has a detailed explanation.
Write a detailed tutorial on managing file and directory permissions in Python using the os module. Cover topics such as checking permissions, modifying permissions (chmod), and handling permission-related exceptions. Provide examples for different operating systems.
Develop a comprehensive guide on advanced path manipulation techniques using the pathlib module in Python. Cover topics such as path normalization, joining paths, extracting components (parent, stem, suffix), and handling different path flavors (Windows vs. Unix).
Create a comprehensive article that compares the os.path module and the pathlib module in Python for handling file and directory operations. Discuss the advantages and disadvantages of each approach, and provide guidance on when to use one over the other.
21.9. Review Questions#
What is the purpose of the os module in Python, and what kind of operations does it support?
How do you check if a file exists before attempting to open it?
What is the purpose of the pathlib module, and how does it differ from the os module when working with file paths?
How do you create a new directory in Python?
How can you list the contents of a directory using the os module?
What is the difference between the os.remove() and os.rmdir() functions, and when would you use each one?
Explain the concept of the current working directory. How do you determine this within a Python program? How do you change it in Python?
List the various attributes of a file. How do you access these in Python? What about from the command line?
How can you handle file permissions using the os module and the stat module?
What is the significance of the Unix epoch (January 1st, 1970) in file operations, and how is it related to the timestamps returned by os.stat()?
21.10. Exercise#
Sizes: For the current working directory, print each of the files on a separate line. Each line should start with the file size in bytes, followed by a tab character, and then the file’s name. Do not display subdirectories. Sort this output by the file name. After all of the lines have been printed, print a blank line and then this line:
Directory size: XXXX
where XXXX is the total of all the file sizes (excluding subdirectories).
File Renaming and Moving: Develop a Python script that renames all files with the extension “.txt” in a specified directory to have the prefix “backup_” before the original filename. Additionally, move all renamed files to a new directory called “backup_files.”
Creating and Deleting Files and Directories: Write a Python script that creates a new directory called “project_files” in the current working directory. Within this new directory, create three subdirectories named “data,” “scripts,” and “output.” Additionally, create an empty file called “README.txt” in the “project_files” directory. Finally, delete the “scripts” subdirectory and its contents.