24. Modules#

Apart from the existing classes and functions we have used, every program in these notebooks so far could easily fit in a single file. However, most software projects are not as small as these examples and exercises. Fortunately, Python provides capabilities to organize code into multiple files.

Just as functions provide abstractions for a series of steps that perform a task, a module creates an abstraction of a group of related variables (data), functions, and classes. Modules are a crucial component for code reuse and for encapsulating functionality.

24.1. Using Modules#

To use a module, write the statement import moduleName, where moduleName is the name of an existing Python file (without the .py extension) or a directory.

import statistics

statistics.stdev([10,5,12,3,6,4,11,4,8,5,6,7,8,6,5,6,7])
2.4985289789841403

You can also rename a module as you import it. Renaming provides an alternate alias to refer to the module in the code.

import statistics as stat

stat.stdev([10,5,12,3,6,4,11,4,8,5,6,7,8,6,5,6,7])
2.4985289789841403

Why rename imported modules?

  • avoid duplicate names

  • mnemonic

  • follow convention (Example: import pandas as pd)

  • minimize typing

You can also import specific items from a module: from moduleName import name

from statistics import mean

mean([10,5,12,3,6,4,11,4,8,5,6,7,8,6,5,6,7])
6.647058823529412

To list all modules currently installed (including built-in modules):

help('modules')
Please wait a moment while I gather a list of all available modules...

test_sqlite3: testing with SQLite version 3.44.2
/opt/homebrew/Cellar/python@3.12/3.12.1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/pkgutil.py:78: UserWarning: The numpy.array_api submodule is still experimental. See NEP 47.
  __import__(info.name)
IPython             asyncio             jupyter_book        seaborn
PIL                 atexit              jupyter_cache       secrets
__future__          attr                jupyter_client      select
__hello__           attrs               jupyter_core        selectors
__phello__          audioop             jupyter_events      send2trash
_abc                babel               jupyter_lsp         setuptools
_aix_support        base64              jupyter_server      shelve
_argon2_cffi_bindings bdb                 jupyter_server_terminals shlex
_ast                binascii            jupyterlab          shutil
_asyncio            bisect              jupyterlab_pygments signal
_bisect             bleach              jupyterlab_server   site
_blake2             bs4                 keyword             sitecustomize
_bz2                builtins            kiwisolver          six
_cffi_backend       bz2                 latexcodec          sklearn
_codecs             cProfile            lib2to3             smtplib
_codecs_cn          calendar            linecache           sndhdr
_codecs_hk          certifi             linkify_it          sniffio
_codecs_iso2022     cffi                locale              snowballstemmer
_codecs_jp          cgi                 logging             socket
_codecs_kr          cgitb               lzma                socketserver
_codecs_tw          chardet             mailbox             soupsieve
_collections        charset_normalizer  mailcap             sphinx
_collections_abc    chunk               markdown_it         sphinx_book_theme
_compat_pickle      click               markupsafe          sphinx_comments
_compression        cmath               marshal             sphinx_copybutton
_contextvars        cmd                 math                sphinx_design
_crypt              code                matplotlib          sphinx_external_toc
_csv                codecs              matplotlib_inline   sphinx_jupyterbook_latex
_ctypes             codeop              mdit_py_plugins     sphinx_multitoc_numbering
_ctypes_test        collections         mdurl               sphinx_thebe
_curses             colorsys            mimetypes           sphinx_togglebutton
_curses_panel       comm                mistune             sqlalchemy
_datetime           compileall          mmap                sqlite3
_dbm                concurrent          modulefinder        sre_compile
_decimal            configparser        multiprocessing     sre_constants
_distutils_hack     contextlib          myst_nb             sre_parse
_elementtree        contextvars         myst_parser         ssl
_functools          contourpy           nbclient            stack_data
_hashlib            copy                nbconvert           stat
_heapq              copyreg             nbformat            statistics
_imp                coverage            nest_asyncio        string
_io                 crypt               netrc               stringprep
_json               csv                 nis                 struct
_locale             ctypes              nntplib             subprocess
_lsprof             curses              notebook_shim       sunau
_lzma               cycler              ntpath              symtable
_markupbase         dataclasses         nturl2path          sys
_md5                datetime            numbers             sysconfig
_multibytecodec     dateutil            numpy               syslog
_multiprocessing    dbm                 opcode              tabnanny
_opcode             debugpy             operator            tabulate
_operator           decimal             optparse            tarfile
_osx_support        decorator           os                  telnetlib
_pickle             defusedxml          overrides           tempfile
_posixshmem         difflib             packaging           terminado
_posixsubprocess    dis                 pandas              termios
_py_abc             doctest             pandocfilters       test
_pydatetime         docutils            parso               tests
_pydecimal          email               pathlib             textwrap
_pyio               encodings           pdb                 this
_pylong             ensurepip           pendulum            threading
_queue              enum                pexpect             threadpoolctl
_random             errno               pickle              time
_scproxy            executing           pickletools         time_machine
_sha1               fastjsonschema      pip                 timeit
_sha2               faulthandler        pipes               tinycss2
_sha3               fcntl               pkg_resources       tkinter
_signal             filecmp             pkgutil             token
_sitebuiltins       fileinput           platform            tokenize
_socket             fnmatch             platformdirs        tomllib
_sqlite3            fontTools           plistlib            tornado
_sre                fqdn                poplib              trace
_ssl                fractions           posix               traceback
_stat               ftplib              posixpath           tracemalloc
_statistics         functools           pprint              traitlets
_string             gc                  profile             tty
_strptime           genericpath         prometheus_client   turtle
_struct             getopt              prompt_toolkit      turtledemo
_symtable           getpass             pstats              types
_sysconfigdata__darwin_darwin gettext             psutil              typing
_testbuffer         glob                pty                 typing_extensions
_testcapi           googlesearch        ptyprocess          tzdata
_testclinic         graphlib            pure_eval           uc_micro
_testimportmultiple grp                 pwd                 unicodedata
_testinternalcapi   gzip                py_compile          unittest
_testmultiphase     h11                 pyaml               uri_template
_testsinglephase    hashlib             pybtex              urllib
_thread             heapq               pybtex_docutils     urllib3
_threading_local    hmac                pyclbr              uu
_time_machine       html                pycparser           uuid
_tokenize           http                pydata_sphinx_theme venv
_tracemalloc        httpcore            pydoc               warnings
_typing             httpx               pydoc_data          wave
_uuid               idlelib             pyexpat             wcwidth
_warnings           idna                pygments            weakref
_weakref            imagesize           pylab               webbrowser
_weakrefset         imaplib             pyparsing           webcolors
_xxinterpchannels   imghdr              pythonjsonlogger    webencodings
_xxsubinterpreters  importlib           pytz                websocket
_xxtestfuzz         importlib_metadata  queue               wheel
_yaml               inspect             quopri              wsgiref
_zoneinfo           io                  random              xdrlib
a11y_pygments       ipaddress           re                  xml
abc                 ipykernel           readline            xmlrpc
aifc                ipykernel_launcher  redis               xxlimited
alabaster           isoduration         referencing         xxlimited_35
antigravity         itertools           reprlib             xxsubtype
anyio               jedi                requests            yaml
appnope             jinja2              resource            zipapp
argon2              joblib              rfc3339_validator   zipfile
argparse            json                rfc3986_validator   zipimport
array               json5               rlcompleter         zipp
arrow               jsonpointer         rpds                zlib
ast                 jsonschema          runpy               zmq
asttokens           jsonschema_specifications sched               zoneinfo
async_lru           jupyter             scipy               

Enter any module name to get more help.  Or, type "modules spam" to search
for modules whose name or summary contain the string "spam".

To see the help documentation for a specific module, pass its name as a string to help:

help('chardet')
Help on package chardet:

NAME
    chardet

DESCRIPTION
    ######################## BEGIN LICENSE BLOCK ########################
    # This library is free software; you can redistribute it and/or
    # modify it under the terms of the GNU Lesser General Public
    # License as published by the Free Software Foundation; either
    # version 2.1 of the License, or (at your option) any later version.
    #
    # This library is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
    # Lesser General Public License for more details.
    #
    # You should have received a copy of the GNU Lesser General Public
    # License along with this library; if not, write to the Free Software
    # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
    # 02110-1301  USA
    ######################### END LICENSE BLOCK #########################

PACKAGE CONTENTS
    __main__
    big5freq
    big5prober
    chardistribution
    charsetgroupprober
    charsetprober
    cli (package)
    codingstatemachine
    codingstatemachinedict
    cp949prober
    enums
    escprober
    escsm
    eucjpprober
    euckrfreq
    euckrprober
    euctwfreq
    euctwprober
    gb2312freq
    gb2312prober
    hebrewprober
    jisfreq
    johabfreq
    johabprober
    jpcntx
    langbulgarianmodel
    langgreekmodel
    langhebrewmodel
    langhungarianmodel
    langrussianmodel
    langthaimodel
    langturkishmodel
    latin1prober
    macromanprober
    mbcharsetprober
    mbcsgroupprober
    mbcssm
    metadata (package)
    resultdict
    sbcharsetprober
    sbcsgroupprober
    sjisprober
    universaldetector
    utf1632prober
    utf8prober
    version

CLASSES
    builtins.object
        chardet.universaldetector.UniversalDetector

    class UniversalDetector(builtins.object)
     |  UniversalDetector(lang_filter: chardet.enums.LanguageFilter = <LanguageFilter.ALL: 31>, should_rename_legacy: bool = False) -> None
     |
     |  The ``UniversalDetector`` class underlies the ``chardet.detect`` function
     |  and coordinates all of the different charset probers.
     |
     |  To get a ``dict`` containing an encoding and its confidence, you can simply
     |  run:
     |
     |  .. code::
     |
     |          u = UniversalDetector()
     |          u.feed(some_bytes)
     |          u.close()
     |          detected = u.result
     |
     |  Methods defined here:
     |
     |  __init__(self, lang_filter: chardet.enums.LanguageFilter = <LanguageFilter.ALL: 31>, should_rename_legacy: bool = False) -> None
     |      Initialize self.  See help(type(self)) for accurate signature.
     |
     |  close(self) -> dict
     |      Stop analyzing the current document and come up with a final
     |      prediction.
     |
     |      :returns:  The ``result`` attribute, a ``dict`` with the keys
     |                 `encoding`, `confidence`, and `language`.
     |
     |  feed(self, byte_str: Union[bytes, bytearray]) -> None
     |      Takes a chunk of a document and feeds it through all of the relevant
     |      charset probers.
     |
     |      After calling ``feed``, you can check the value of the ``done``
     |      attribute to see if you need to continue feeding the
     |      ``UniversalDetector`` more data, or if it has made a prediction
     |      (in the ``result`` attribute).
     |
     |      .. note::
     |         You should always call ``close`` when you're done feeding in your
     |         document if ``done`` is not already ``True``.
     |
     |  reset(self) -> None
     |      Reset the UniversalDetector and all of its probers back to their
     |      initial states.  This is called by ``__init__``, so you only need to
     |      call this directly in between analyses of different documents.
     |
     |  ----------------------------------------------------------------------
     |  Readonly properties defined here:
     |
     |  charset_probers
     |
     |  has_win_bytes
     |
     |  input_state
     |
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |
     |  __dict__
     |      dictionary for instance variables
     |
     |  __weakref__
     |      list of weak references to the object
     |
     |  ----------------------------------------------------------------------
     |  Data and other attributes defined here:
     |
     |  ESC_DETECTOR = re.compile(b'(\x1b|~{)')
     |
     |  HIGH_BYTE_DETECTOR = re.compile(b'[\x80-\xff]')
     |
     |  ISO_WIN_MAP = {'iso-8859-1': 'Windows-1252', 'iso-8859-13': 'Windows-1...
     |
     |  LEGACY_MAP = {'ascii': 'Windows-1252', 'euc-kr': 'CP949', 'gb2312': 'G...
     |
     |  MINIMUM_THRESHOLD = 0.2
     |
     |  WIN_BYTE_DETECTOR = re.compile(b'[\x80-\x9f]')

FUNCTIONS
    detect(byte_str: Union[bytes, bytearray], should_rename_legacy: bool = False) -> dict
        Detect the encoding of the given byte string.

        :param byte_str:     The byte sequence to examine.
        :type byte_str:      ``bytes`` or ``bytearray``
        :param should_rename_legacy:  Should we rename legacy encodings
                                      to their more modern equivalents?
        :type should_rename_legacy:   ``bool``

    detect_all(byte_str: Union[bytes, bytearray], ignore_threshold: bool = False, should_rename_legacy: bool = False) -> List[dict]
        Detect all the possible encodings of the given byte string.

        :param byte_str:          The byte sequence to examine.
        :type byte_str:           ``bytes`` or ``bytearray``
        :param ignore_threshold:  Include encodings that are below
                                  ``UniversalDetector.MINIMUM_THRESHOLD``
                                  in results.
        :type ignore_threshold:   ``bool``
        :param should_rename_legacy:  Should we rename legacy encodings
                                      to their more modern equivalents?
        :type should_rename_legacy:   ``bool``

DATA
    VERSION = ['5', '2', '0']
    __all__ = ['UniversalDetector', 'detect', 'detect_all', '__version__',...

VERSION
    5.2.0

FILE
    /Users/jbslanka/Documents/GitHub/jupyternotebooks/venv/lib/python3.12/site-packages/chardet/__init__.py

24.2. Packages#

Python organizes modules by subdirectories into packages. The directory names form a hierarchy of names.

Before Python 3.3, developers had to create a file named __init__.py in a directory for the interpreter to consider the directory a Python package. __init__.py is typically empty but can contain any initialization code for the package. Since Python 3.3, a directory without an __init__.py file is treated as an implicit namespace package. The technical differences between regular packages and implicit namespace packages are irrelevant for most use cases. However, issues generally arise when the same package name appears in more than one location in the search path (see the section below, “How Import Works”).

View more details.

Note: the last two links are for informational purposes only.

The use of __init__.py is a common interview question.

Typically, programmers use the terms “modules” and “packages” interchangeably.
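
As a minimal sketch of how a directory with an __init__.py file becomes an importable package, the following builds a throwaway package in a temporary directory and imports a module from it. The names demo_shapes_pkg, area, rectangle, and PACKAGE_READY are hypothetical and exist only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package layout (all names here are hypothetical):
#   <tmp>/demo_shapes_pkg/__init__.py
#   <tmp>/demo_shapes_pkg/area.py
root = Path(tempfile.mkdtemp())
package_dir = root / "demo_shapes_pkg"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("PACKAGE_READY = True\n")
(package_dir / "area.py").write_text(
    "def rectangle(width, height):\n"
    "    return width * height\n"
)

# Make the parent directory searchable, then import using the dotted name.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
area = importlib.import_module("demo_shapes_pkg.area")

print(area.rectangle(3, 4))
# Importing a submodule also runs the package's __init__.py first.
print(sys.modules["demo_shapes_pkg"].PACKAGE_READY)
```

The directory name supplies the first part of the dotted name and the file name supplies the second, which is the hierarchy of names described above.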

24.2.1. Installing other Modules#

The de facto way to install additional modules and packages is to use pip.

As mentioned in both the Preliminaries and The Tools sections, you should use virtual environments for your projects, especially to prevent incompatibilities among different versions of a package across multiple projects.

Technically, you can use the pip command directly to install packages:

    pip install packageName

However, the recommended approach is to start the Python interpreter and run pip as a module with the -m option:

    python -m pip install packageName

Using the python executable ensures the package installs into the correct environment.

Similarly, for Jupyter Notebooks:

    import sys
    !{sys.executable} -m pip install packageName

or

    %pip install packageName

For notebooks, you may have seen

    !pip install packageName

However, this can install the package into the environment from which Jupyter was started rather than the environment of the running kernel. The %pip magic command shown above avoids this problem because it runs pip within the current kernel's environment.
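
To check which environment an install would target, one option is to inspect the running interpreter. This is a small sketch using only the sys module; the variable name in_virtual_env is an assumption made for the example.

```python
import sys

# The interpreter running this code; `python -m pip` installs into its environment.
print("Interpreter:", sys.executable)

# In a virtual environment, sys.prefix points at the environment while
# sys.base_prefix points at the Python installation it was created from.
in_virtual_env = sys.prefix != sys.base_prefix
print("Inside a virtual environment:", in_virtual_env)
```

If the interpreter path is not the one you expected, packages installed from this session will not land in the project you intended.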

You should ensure the current Python environment has the setuptools and wheel packages installed when using pip. wheel supports Python's pre-built package format (wheels), which install without a build step when a compatible one is available. setuptools helps to handle the installation of other packages from source code. The following code block ensures that the current environment has the most recent versions of these three packages (pip, setuptools, and wheel) installed.

import sys
!{sys.executable} -m pip install --upgrade pip setuptools wheel
Requirement already satisfied: pip in /Users/jbslanka/Documents/GitHub/jupyternotebooks/venv/lib/python3.12/site-packages (24.0)
Requirement already satisfied: setuptools in /Users/jbslanka/Documents/GitHub/jupyternotebooks/venv/lib/python3.12/site-packages (69.5.1)
Requirement already satisfied: wheel in /Users/jbslanka/Documents/GitHub/jupyternotebooks/venv/lib/python3.12/site-packages (0.43.0)

Security Note

The software supply chain has become one of the easier avenues to exploit for compromising software security. Often, developers include dependencies in their code without validating those dependencies first.

Possible ways to mitigate this attack vector:

  • Use trusted, well-known components.
  • Scan dependencies for known vulnerabilities.
  • Practice defense in depth. While you may not be able to prevent the software issue, can you minimize the damage?
  • Use a trusted source for components.

Supply Chain Attacks

Regarding a trusted source, Google announced in May 2022 that they would provide a new Google Cloud Service, “Assured Open Source Software”, distributing components curated by the company. https://cloud.google.com/blog/products/identity-security/introducing-assured-open-source-software-service
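
A first step toward scanning dependencies is knowing exactly what is installed. The sketch below lists every distribution visible to the current interpreter with the standard library's importlib.metadata module. It only builds an inventory and does not check for vulnerabilities, which requires a separate tool or service.

```python
from importlib import metadata

# Collect name -> version for every distribution visible to this interpreter.
installed = {}
for dist in metadata.distributions():
    name = dist.metadata.get("Name")
    if name:  # skip entries with damaged or missing metadata
        installed[name] = dist.version

# Print in the "name==version" form used by requirements files.
for name in sorted(installed, key=str.lower):
    print(f"{name}=={installed[name]}")
```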

24.2.2. Commonly Used Modules / Packages#

The following table contains a list of commonly used modules and a brief description.
Modules with a URL containing “python.org” belong to Python’s standard library, which is installed along with Python itself. See: Python Standard Library

| Package Name | Description | Import Alias | URL |
|---|---|---|---|
| datetime | Supplies classes to represent and manipulate dates and times | dt | https://docs.python.org/3/library/datetime.html |
| json | Exposes APIs to load, parse, and write JSON objects | | https://docs.python.org/3/library/json.html |
| math | Variety of math functions for floats and integers | | https://docs.python.org/3/library/math.html |
| matplotlib | Comprehensive visualization library | mpl | https://matplotlib.org |
| numpy | Foundational package for scientific computing. Supports multidimensional arrays and matrices | np | https://numpy.org |
| pandas | Data analysis and manipulation tool. Core library to perform data science in Python | pd | https://pandas.pydata.org |
| os | Provides access to common operating system functions | | https://docs.python.org/3/library/os.html |
| random | Implements random number generation for various distributions | | https://docs.python.org/3/library/random.html |
| scipy | Contains algorithms for optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics, and many other classes of problems | | https://scipy.org |
| seaborn | Visualization library built on top of matplotlib that provides attractive and informative statistical graphics | sns | https://seaborn.pydata.org |
| statistics | Provides functions to calculate common statistics | | https://docs.python.org/3/library/statistics.html |
| sys | Provides access to variables and functions used by the Python interpreter | | https://docs.python.org/3/library/sys.html |
| unittest | Automated testing framework | | https://docs.python.org/3/library/unittest.html |

The “Import Alias” column contains the conventional alias used for this package/module during an import.

24.3. Developing and Using Modules#

At the simplest level, a module is just a text file that contains Python code.

For example, we could create a statistics module. The code below exists in a file “mystatistics.py”.

    """mystatistics provides implementations of common descriptive statistical functions
     - min
     - max
     - range
     - mean
     - median
     - variance
     - std_dev

    Each function takes a single list. All contents of that list should be floats or integers.
    """

    def min(l):
        """Returns the minimum value in the list. Raises a ValueError if empty"""
        if l:
            s_list = sorted(l)
            return s_list[0]
        else:
            raise ValueError("list empty")

    def max(l):
        """Returns the maximum value in the list. Raises a ValueError if empty"""
        if l:
            s_list = sorted(l)
            return s_list[-1]
        else:
            raise ValueError("list empty")

    def range(l):
        """Returns the difference between the minimum and maximum value in the list.
        Raises a ValueError if empty"""
        if l:
            s_list = sorted(l)
            return s_list[-1] - s_list[0]
        else:
            raise ValueError("list empty")

    def mean(l):
        """Computes the mean of the list"""
        if l:
            return sum(l)/len(l)
        else:
            raise ValueError("list empty")

    def median(l):
        """Finds the median value of the list"""
        if l:
            s_list = sorted(l)
            return s_list[len(s_list)//2] if len(s_list)%2 == 1 else (s_list[len(s_list)//2 - 1] + s_list[len(s_list)//2])/2
        else:
            raise ValueError("list empty")

    def variance(l):
        """Calculates the population variance for the list"""
        m = mean(l)
        dif = 0
        for x in l:
            dif += (m-x)**2
        return dif/len(l)

    def std_dev(l):
        """Calculates the population standard deviation for the list"""
        return variance(l)**.5

    if __name__ == "__main__":
        test_list = [10,12,14]
        print("Min:", min(test_list))
        print("Max:", max(test_list))
        print("Range:", range(test_list))
        print("Mean:", mean(test_list))
        print("Median:", median(test_list))
        print("Variance:", variance(test_list))
        print("Std Dev:", std_dev(test_list))

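We can now use this module by importing it and then using the functions defined within it. The import succeeds only if mystatistics.py is in a directory on the module search path (for example, the current working directory); otherwise, Python raises a ModuleNotFoundError like the one shown in the output below.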

import mystatistics

test_list = [10,12,14]
print("Std Dev:",  mystatistics.std_dev(test_list))
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[7], line 1
----> 1 import mystatistics
      3 test_list = [10,12,14]
      4 print("Std Dev:",  mystatistics.std_dev(test_list))

ModuleNotFoundError: No module named 'mystatistics'
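
One way to see what makes such an import succeed is to create a module file and make its directory searchable. The sketch below writes a tiny stand-in module to a temporary directory. The name mini_stats is hypothetical and is not the mystatistics module itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny stand-in module to a temporary directory.
# "mini_stats" is a hypothetical name used only for this sketch.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "mini_stats.py").write_text(
    "def mean(values):\n"
    "    return sum(values) / len(values)\n"
)

# Without this line, the import below raises ModuleNotFoundError,
# because the temporary directory is not on the search path.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

import mini_stats

print("Mean:", mini_stats.mean([10, 12, 14]))
```

In everyday use, placing the module file next to the script or notebook that imports it is usually enough, so editing sys.path by hand is rarely needed.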

24.4. How Import Works#

When the Python interpreter executes the import moduleName statement, it first checks to see if it has previously imported that module. If not, the interpreter searches a list of directories for a file named moduleName.py or a directory with that name. This search list is available in a Python variable sys.path and is composed of the following sources:

  • the directory containing the script being run (or the current working directory in an interactive session)

  • the PYTHONPATH environment variable

  • an installation-dependent list of directories (created at install time or when creating a virtual environment)

import sys
print(sys.path)
['/opt/homebrew/Cellar/python@3.12/3.12.1/Frameworks/Python.framework/Versions/3.12/lib/python312.zip', '/opt/homebrew/Cellar/python@3.12/3.12.1/Frameworks/Python.framework/Versions/3.12/lib/python3.12', '/opt/homebrew/Cellar/python@3.12/3.12.1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload', '', '/Users/jbslanka/Documents/GitHub/jupyternotebooks/venv/lib/python3.12/site-packages']
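
The lookup described above can be inspected directly. This sketch uses sys.modules (the cache of previously imported modules) and importlib.util.find_spec (which reports where a module would be loaded from). The choice of statistics and json as examples is arbitrary.

```python
import importlib.util
import sys

# Modules that were already imported are cached in sys.modules,
# so a repeated import skips the search entirely.
import statistics
print("Cached:", "statistics" in sys.modules)

# find_spec reports where a top-level module would be loaded from.
spec = importlib.util.find_spec("json")
print("json would load from:", spec.origin)

# The search consults the sys.path entries in order.
for position, entry in enumerate(sys.path):
    print(position, repr(entry))
```

Because the entries are consulted in order, a file in an earlier directory hides a module of the same name in a later one.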

Next, the Python interpreter binds the imported module, or the specific item imported from it, to a name in the current local scope. This binding allows us to reference the module name, alias, or specific imported item within our code. The following code shows that the local namespace has grown by one entry after importing the pi constant:

print("Local namespace size:",len(locals()))
from math import pi
print("Local namespace size (after import of PI):",len(locals()))
Local namespace size: 39
Local namespace size (after import of PI): 40

The first time a module is imported, the Python interpreter also executes the code within the moduleName.py file; this execution completes before the name is bound in the importing scope. The execution creates any classes or functions defined within the file and runs any statements not contained within a class or function declaration. The latter is essential to allow the module to perform any necessary initialization steps before use.
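
The following sketch illustrates that a module's top-level statements run at import time, and only on the first import. The module name noisy_init and its contents are hypothetical.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

# A hypothetical module whose top-level code announces that it ran.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "noisy_init.py").write_text(
    "print('initializing noisy_init')\n"
    "READY = True\n"
)
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

# Capture everything printed while importing the module twice.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    import noisy_init   # first import: the file's top-level code runs
    import noisy_init   # second import: served from the sys.modules cache

# Count how many times the module's print statement executed.
print(captured.getvalue().count("initializing"))
```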

24.5. The “main” Method.#

Unlike many other languages that explicitly use a main method as the entry point for a program, Python does not explicitly have such a standard entry point. (C, C++, and Java all have main functions/methods.) As just mentioned, when a module (file) is loaded, all of the code within that module is interpreted and executed.

The standard convention for Python programmers is to use the following boilerplate code towards the bottom of module files. This code checks whether the file was started from the command line (through a command such as python moduleName.py).

if __name__ == "__main__":
    statements

If it was, then the statements in that block execute. This check lets the module run as a main program while skipping that code block when other programs import the module.
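
To see the check in action without leaving the notebook, the sketch below loads the same file in both ways: once with an import statement and once with runpy.run_path, which can run a file as if it were the main program. The file name name_demo.py and the ROLE variable are hypothetical.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# A hypothetical file that records how it was started.
module_dir = Path(tempfile.mkdtemp())
script_path = module_dir / "name_demo.py"
script_path.write_text(
    "ROLE = 'imported'\n"
    "if __name__ == '__main__':\n"
    "    ROLE = 'main program'\n"
)

# Case 1: import the file as a module; __name__ is 'name_demo'.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
import name_demo
print(name_demo.__name__, "->", name_demo.ROLE)

# Case 2: run the file as a script; runpy sets __name__ to '__main__',
# much as `python name_demo.py` would.
script_globals = runpy.run_path(str(script_path), run_name="__main__")
print(script_globals["__name__"], "->", script_globals["ROLE"])
```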

24.5.1. Command-line Arguments#

Quite often, programs can process arguments as they are executed to alter their behavior. You may have seen such arguments when looking at man pages. For example, the man page for head starts:

HEAD(1)                                           User Commands                                       

NAME
       head - output the first part of files

SYNOPSIS
       head [OPTION]... [FILE]...

DESCRIPTION
       Print  the  first  10 lines of each FILE to standard output.  With more than one FILE, 
       precede each with a header giving the file name.

       With no FILE, or when FILE is -, read standard input.

       Mandatory arguments to long options are mandatory for short options too.

       -c, --bytes=[-]NUM
              print the first NUM bytes of each file; with the leading '-', print all but the 
              last NUM  bytes  of each file

       -n, --lines=[-]NUM
              print the first NUM lines instead of the first 10; with the leading '-', 
              print all but the last NUM lines of each file

Executing head -100 information.txt prints the first 100 lines of the file information.txt.

Within Python, we can access these command-line arguments by using the argv list from the sys module:

from sys import argv
print(argv)
['/Users/jbslanka/Documents/GitHub/jupyternotebooks/venv/lib/python3.12/site-packages/ipykernel_launcher.py', '-f', '/private/var/folders/f0/69wncpqd02s3j1r_z5sfddd00000gq/T/tmpdhw8v2er.json', '--HistoryManager.hist_file=:memory:']

Running that cell displays the arguments passed to the Python interpreter that started Jupyter (or whatever other environment is executing this notebook). Running the same code from a command line, such as python test.py hello world, produces the following argv:

  • argv[0]: “test.py”

  • argv[1]: “hello”

  • argv[2]: “world”

Note that each entry in the argv list is a string.

Python provides a standard library module, argparse, that makes it easier to create command-line interfaces by defining the required arguments, generating help and usage statements, and then parsing the arguments.
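
As a brief sketch of argparse, the following defines a parser modeled loosely on the head synopsis shown earlier. The program name myhead is hypothetical, and the argument list is passed explicitly (in place of sys.argv[1:]) so that the example also runs inside a notebook.

```python
import argparse

# A hypothetical parser modeled loosely on the `head` synopsis above.
parser = argparse.ArgumentParser(
    prog="myhead", description="Print the first lines of each file."
)
parser.add_argument("-n", "--lines", type=int, default=10,
                    help="number of lines to print (default: 10)")
parser.add_argument("files", nargs="*", metavar="FILE",
                    help="files to read; standard input when omitted")

# Passing a list stands in for sys.argv[1:], so the sketch runs anywhere.
args = parser.parse_args(["-n", "100", "information.txt"])
print(args.lines, args.files)
```

Unlike the raw strings in sys.argv, args.lines arrives as an int because of the type=int setting. Calling parser.parse_args() with no argument reads the real command line instead.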

24.5.2. Exit Codes#

A common task in many programs is to return a value when the program exits. This is why the main() function in C returns an int. By default, a Python program that finishes normally returns 0, which most programs use to indicate success. man pages often list the return values. For example, man useradd contains the following section:

EXIT VALUES
       The useradd command exits with the following values:
       0   success
       1   can't update password file
       2   invalid command syntax
       3   invalid argument to option
       4   UID already in use (and no -o)
       ...

Within most shell environments, executing echo $? displays the exit code of the last process that completed.

Using a return value provides information to the environment in which your program executes such that others may take action based upon that result (i.e., the success or failure of your program).

Here’s an example of how this could look for a Python program expecting at least one command-line argument in addition to the Python file:

    import sys

    if len(sys.argv) < 2:  # only the script name was supplied
        print("Missing command-line argument")
        sys.exit(1)

    # various processing ... and then success
    sys.exit(0)
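
The exit code can also be read from another Python program. This sketch starts two child Python processes with subprocess.run and inspects their return codes; the exit value 3 is arbitrary and chosen only for the demonstration.

```python
import subprocess
import sys

# Run a child Python process that exits with status 3, then read that status.
# This mirrors what `echo $?` reports in a shell.
child = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])
print("Child exit code:", child.returncode)

# A process that finishes normally reports 0.
ok = subprocess.run([sys.executable, "-c", "pass"])
print("Child exit code:", ok.returncode)
```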

24.6. Module Docstrings#

To help other developers properly use your modules, you should use a docstring at the top of the file. The docstring should list the purpose of the module and then list the classes, functions, exceptions, and any other items exported by the module with a quick summary of each. Docstring conventions
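
As an illustration of where a module docstring lives and how tools read it, the sketch below builds a small module in memory. The module name temperature and its function are hypothetical; in a real project, the docstring is simply the first statement of the .py file.

```python
import inspect
import types

# Hypothetical module source; the string at the very top is the module docstring.
source = '''"""temperature: helpers for converting temperatures.

Functions:
    to_fahrenheit(celsius) -- convert degrees Celsius to Fahrenheit
"""

def to_fahrenheit(celsius):
    """Return the Fahrenheit equivalent of a Celsius value."""
    return celsius * 9 / 5 + 32
'''

# Execute the source inside a new, empty module object.
temperature = types.ModuleType("temperature")
exec(source, temperature.__dict__)

# help(temperature) would display this same text.
print(inspect.getdoc(temperature))
```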

24.7. Best Practices#

Although developers can configure modules to export only specific items when another programmer uses from module import *, using that statement is still considered bad practice. It imports all of the module’s exported objects into your local namespace, making it difficult to determine where each name came from. While typing the module. prefix is a bit more tedious, it makes clear where an object originated.
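
The export configuration mentioned above is done with a module-level list named __all__. The sketch below builds a hypothetical module named all_demo in memory and runs the star import in a separate namespace, which makes it easy to see exactly which names arrive.

```python
import sys
import types

# Build a hypothetical module in memory with one exported and one internal helper.
demo = types.ModuleType("all_demo")
exec(
    "__all__ = ['public_helper']\n"
    "def public_helper():\n"
    "    return 'public'\n"
    "def internal_helper():\n"
    "    return 'internal'\n",
    demo.__dict__,
)
sys.modules["all_demo"] = demo  # register it so import statements can find it

# Run the star import in a fresh namespace to see exactly which names arrive.
namespace = {}
exec("from all_demo import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)
```

Only the name listed in __all__ is copied. The internal helper remains reachable as all_demo.internal_helper after a plain import all_demo.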

As you create modules, you should only group things that logically belong together. Simply because you wrote two functions does not necessarily mean they should be within the same module. Quite often, “utility” packages violate this principle.

While you can distribute modules and packages by simply providing the source code to others, you should “package” them: Overview of Packaging for Python Tutorial

The Python interpreter will only load a module once into your program - even if the code imports the module in multiple locations. Thus any changes to that module can be seen by other code that uses that module. As with anything else with Python (or any programming language), such functionality can be beneficial or a curse. Unsurprisingly, programming languages expect developers to not behave maliciously, such as in this code:

    import statistics

    def bad_programmer(l):
        import random
        return random.random()

    statistics.stdev = bad_programmer

    my_list = [10,5,12,3,6,4,11,4,8,5,6,7,8,6,5,6,7]
    print("Std dev:",statistics.stdev(my_list))
    print("Std dev:",statistics.stdev(my_list))
    # good luck tracking down that one!

Std dev: 0.8549979233036142
Std dev: 0.3393732734989049
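The load-once behavior itself can be observed without doing anything destructive. The following sketch checks that two imports of statistics refer to the same module object, which Python keeps in the sys.modules dictionary. The alias stats_alias and the note attribute are hypothetical and exist only for this demonstration.

```python
import sys
import statistics
import statistics as stats_alias  # second import of the same module

# Both names refer to the single module object cached in sys.modules.
same_object = statistics is stats_alias
cached = sys.modules["statistics"] is statistics

# An attribute set through one name is visible through the other.
statistics.note = "set through the first name"  # hypothetical attribute
seen = stats_alias.note
del statistics.note  # clean up so the module is left unchanged

print(same_object, cached, seen)
```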

24.8. Suggested LLM Prompts#

  • Explain the different ways to import modules in Python, including the import statement, from ... import syntax, and the use of aliases. Provide examples of importing built-in modules like math and os, as well as importing custom modules from different directories. Discuss the importance of managing import statements and the best practices for organizing them.

  • Walk through the process of creating a custom Python module from scratch. Explain the structure of a module file, including the use of functions, classes, and variables. Discuss the importance of docstrings and how to properly document a module for others to use. Provide examples of writing reusable and modular code that can be easily imported into other projects.

  • Explain the purpose and significance of the __init__.py file in Python packages and modules. Discuss how it allows a directory to be treated as a package, and how it can be used to initialize package-level variables or execute code when the package is imported. Provide examples of using __init__.py to define package-level constants, functions, or aliases. Is this required in new versions of Python?

  • Explore the concepts of namespaces and scope in relation to Python modules. Explain how modules create their own namespace and how variables and functions within a module are accessible or inaccessible from other parts of the program. Discuss the use of the global and nonlocal keywords, and provide examples of how they can be used to modify variables from different scopes.

  • Describe the module search path in Python and how it determines where to look for imported modules. Explain the role of the sys.path list and how it can be modified to include additional directories. Discuss the use of environment variables like PYTHONPATH and how they can be used to specify additional search locations for modules.

  • Explain the purpose and usage of the if __name__ == "__main__" idiom in Python. Describe how it allows you to separate code that should only run when the script is executed directly, as opposed to being imported as a module. Provide examples of how this idiom is used to create an entry point or “main method” for a Python script.

  • Introduce the sys.argv list in Python, which contains the command-line arguments passed to a script. Explain how to access and iterate over the elements of sys.argv, and provide examples of scripts that take various command-line arguments and process them accordingly.

  • Explain what exit codes are and their purpose in programming. Discuss how they are used to communicate the success or failure of a program or script to the operating system or the calling environment. Provide examples of common exit codes and their meanings (e.g., 0 for success, non-zero for failure).

  • Demonstrate how to return exit codes from Python scripts using the sys.exit() function. Explain the difference between exit() and quit() as both allow you to specify a custom exit code. Provide examples of returning different exit codes based on different scenarios or error conditions. [Note: these two functions are virtually the same now, quit() is typically used by a human in a Python shell to stop the interpreter and return to the command-line.]

  • Explain how command-line arguments can be combined with configuration files and environment variables to provide a more flexible and extensible way of configuring applications. Discuss how to read configuration files using Python modules like configparser and how to access environment variables using os.environ.

  • Discuss the importance of logging and debugging in command-line scripts, especially when dealing with complex argument parsing and processing. Introduce the logging module in Python and demonstrate how to set up different logging levels, formats, and handlers for effective debugging and monitoring of your scripts.

  • Explain what supply chain attacks are and how they can specifically target Python packages and repositories. Discuss real-world examples of supply chain attacks, such as the infamous event-stream incident in the npm ecosystem, and their potential consequences, including data breaches, malware infections, and system compromises. Highlight the importance of verifying the integrity of Python packages from trusted sources and the potential risks associated with using third-party packages from untrusted repositories.

  • Discuss the various security measures and best practices that can be implemented to secure the Python package distribution ecosystem against supply chain attacks. Explore techniques like code signing, package integrity verification, and the use of secure package repositories. Highlight the role of organizations like the Python Software Foundation (PSF) and package maintainers in ensuring the security and integrity of popular Python packages. Provide practical recommendations for developers to mitigate supply chain risks when using third-party packages.

  • Explain the importance of proper dependency management in Python projects and how it relates to supply chain security. Discuss the potential risks of using outdated or vulnerable dependencies and the challenges of managing transitive dependencies (dependencies of dependencies). Introduce tools and techniques for managing dependencies, such as virtual environments, dependency lockers (e.g., pip-tools), and vulnerability scanners (e.g., safety). Provide guidance on best practices for maintaining up-to-date and secure dependencies in Python projects.

24.9. Review Questions#

  1. Explain the difference between a module and a package in Python.

  2. What is the purpose of the import statement in Python?

  3. What is the difference between import module and from module import function?

  4. How can you import a module with an alias?

  5. How do you define functions, classes, and variables within a module?

  6. How do you properly document a module for others to use?

  7. How can variables and functions within a module be accessed from other parts of the program?

  8. Discuss the pros and cons of from module import *.

  9. Do different Python files importing the same module/package have unique or shared instances of that module/package?

  10. What is the purpose of the if __name__ == "__main__" idiom in Python?

  11. What is the purpose of the sys.argv list in Python, and how is it used? How do you access and iterate over the elements of sys.argv?

24.10. Exercises#

  1. Create a Module and Import Functions: Write a Python module containing several functions (e.g., mathematical operations, string manipulation, etc.). In a separate script, import the module and use its functions to solve various problems or perform calculations.

  2. Echo Arguments: Write a program that echoes any arguments from the command line to standard output.

  3. Modules and the Path: Write a Python script that prints the current module search path (sys.path). Modify the search path by appending a new directory and create a custom module in that directory. Import and use the custom module in your script. In addition to modifying sys.path directly, add a path to the PYTHONPATH environment variable.