18. Sets#

A set is an unordered collection of objects with no duplicates. Objects must be hashable, which in practice means immutable, to be placed into a set, similar to keys in a dictionary.

Sets support many of the same operations that lists and tuples have, but set objects also support mathematical operations such as union, intersection, and difference. Since sets are unordered, index operations to retrieve and insert elements at specific points in the collection do not make sense and have no implementation.

Another way to think about sets is to equate them to a dictionary with its keys but no values corresponding to those keys.

Set theory plays a pivotal role in many computer science applications - most significantly relational databases, but programmers use sets for various purposes. For example, we can create a unique collection (no repeated elements) from a list by converting it to a set.

To create a set, use the built-in function set() or enclose a comma-separated list of values within curly brackets. The set() function iterates through the value of its argument and adds members to the set, removing duplicates.

To create an empty set, you must use the set() function, because {} creates an empty dictionary.

empty_set = set()
primes = {1, 2, 3, 5, 7, 13, 11, 13, 17, 19}
# set() expects a single iterable; without the inner parentheses it would receive four arguments
colors = set(("red", "green", "blue", "red"))

print(colors)
{'blue', 'green', 'red'}
set('abracadabra')
{'a', 'b', 'c', 'd', 'r'}

In the above example, we pass a string, an iterable sequence of characters. Only five unique characters exist in the string.
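As a minimal sketch of the de-duplication idea mentioned earlier, the snippet below converts a list to a set. The ticker values are made up for illustration. Because sets are unordered, the conversion may not preserve the original order; dict.fromkeys() is one common way to drop duplicates while keeping the first-seen order.

```python
# Hypothetical data: a list of ticker symbols with repeats.
tickers = ["IBM", "T", "IBM", "KO", "T"]

# Converting to a set drops duplicates, but the order is not guaranteed.
unique_tickers = set(tickers)
print(len(unique_tickers))          # 3

# Dictionary keys are unique and keep insertion order, so this preserves
# the order in which each value first appeared.
ordered_unique = list(dict.fromkeys(tickers))
print(ordered_unique)               # ['IBM', 'T', 'KO']
```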

As members must be unique within a set, we cannot place mutable objects into a set. Otherwise, the value of that item could change and affect the set’s uniqueness. In the examples below, the first attempts to build a set containing a list and fails, while the second builds a set containing a single tuple, which succeeds because tuples are immutable.

invalid_set = {['test', 'word']}    # raises a TypeError, sets cannot contain mutable members
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 invalid_set = {['test', 'word']}    # raises a TypeError, sets cannot contain mutable members

TypeError: unhashable type: 'list'
valid_set = {('test', 'word')}
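The error message refers to hashing: strictly speaking, set members must be hashable, which Python's built-in immutable types are. The sketch below, using made-up values, illustrates two consequences. A tuple that contains a list is still rejected, and a frozenset (the immutable counterpart of set) can be stored inside another set.

```python
# A tuple is only hashable if everything inside it is hashable.
try:
    bad = {("test", ["word"])}
except TypeError as err:
    print("TypeError:", err)

# frozenset is an immutable set, so it can be a member of another set.
pair = frozenset(["test", "word"])
set_of_sets = {pair, frozenset(["other"])}
print(pair in set_of_sets)                          # True

# Member order does not matter when comparing frozensets.
print(frozenset(["word", "test"]) in set_of_sets)   # True
```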

18.1. Getting the size (number of entries) of a set#

As with the other data structures, we can get the number of entries (length, size) of a set by using the len() function.

len(primes)
9

18.2. Adding to a set#

Use the add() method on a set to add another element. To add multiple items, use the update() method.

primes.add(23)
primes.update([29, 31, 37])
primes
{1, 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37}
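Two details are worth trying out: adding a value that is already present leaves the set unchanged, and update() accepts one or more iterables of any kind. The following is a small sketch with an illustrative set named evens (not part of the examples above).

```python
evens = {2, 4}

evens.add(4)                  # already present: the set is unchanged
print(len(evens))             # 2

# update() accepts one or more iterables (list, tuple, range, ...).
evens.update([6, 8], (8, 10), range(10, 14, 2))
print(sorted(evens))          # [2, 4, 6, 8, 10, 12]
```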

18.3. Deleting from a set#

Use remove() to delete an item from a set by its value. If the value does not exist, the Python interpreter raises a KeyError. discard() also removes a value from a set, but does not raise an error if the value does not exist.

To remove multiple items, use difference_update(). Values that are not in the set are ignored.

primes.add(4)
print(primes)
primes.remove(4)
print(primes)
primes.difference_update([6, 8, 29, 31, 37])
print(primes)
{1, 2, 3, 4, 5, 37, 7, 11, 13, 17, 19, 23, 29, 31}
{1, 2, 3, 5, 37, 7, 11, 13, 17, 19, 23, 29, 31}
{1, 2, 3, 5, 7, 11, 13, 17, 19, 23}
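The difference between remove() and discard() is easiest to see side by side. The sketch below uses an illustrative set named letters and also shows two related methods, pop() and clear().

```python
letters = {"a", "b", "c"}

letters.discard("z")          # "z" is absent: nothing happens

try:
    letters.remove("z")       # "z" is absent: raises KeyError
except KeyError as err:
    print("KeyError:", err)

letters.remove("a")
print(sorted(letters))        # ['b', 'c']

# pop() removes and returns an arbitrary member.
removed = letters.pop()
print(removed in {"b", "c"})  # True

letters.clear()               # remove everything that is left
print(len(letters))           # 0
```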

18.4. Iterating over Sets#

Just like the other built-in data structures, we can use the for statement (loop) to process all of the items in a set. Because sets are unordered, the items may not appear in the order in which they were added.

for color in colors:
    print(color)
blue
green
red
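When a predictable order matters, for example in reports or tests, one option is to iterate over sorted(colors) instead of the set itself. This is a minimal sketch that redefines colors so that it runs on its own.

```python
colors = {"red", "green", "blue"}

# sorted() returns a new list; the set itself stays unordered.
for color in sorted(colors):
    print(color)
```

This prints blue, green, and red in alphabetical order on every run.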

18.5. Checking if a value exists#

Use the in operator to test whether a set contains a value.

'magenta' in colors
False
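The negated form, not in, works the same way. Because sets are built on hashing, membership tests stay fast on average even for large sets, which makes a set a convenient lookup structure when filtering another collection. The sketch below assumes a hypothetical list named requested.

```python
colors = {"red", "green", "blue"}
requested = ["red", "magenta", "blue", "cyan"]   # hypothetical input

# `not in` is the negated form of `in`.
print("magenta" not in colors)      # True

# Keep only the requested values that are missing from the set.
missing = [name for name in requested if name not in colors]
print(missing)                      # ['magenta', 'cyan']
```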

18.6. Set Operations#

Python supports standard math operations on sets: union, intersection, difference, and symmetric difference.

Set operations can be performed either by method calls or by operators.

Source: https://www.datacamp.com/community/tutorials/sets-in-python

us_flag_colors = set(["red", "white", "blue"])
france_flag_colors = set(["blue", "white", "red"])
switzerland_flag_colors = set(["white", "red"])
mexico_flag_colors = {"green", "white", "red"}
germany_flag_colors = {"black", "red", "gold"}

18.6.1. Union#

Returns a new set with all of the unique elements from both sets. Use either the union() method or the | operator.

Note: unlike union(), the update() method modifies the set it is called on rather than creating a new set.

print(us_flag_colors | germany_flag_colors)
print(mexico_flag_colors.union(switzerland_flag_colors))
{'blue', 'red', 'gold', 'white', 'black'}
{'green', 'red', 'white'}
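To make the contrast between union() and update() concrete, here is a minimal sketch. It redefines two of the flag sets so that it runs on its own; the names combined, palette, and result are illustrative.

```python
mexico_flag_colors = {"green", "white", "red"}
germany_flag_colors = {"black", "red", "gold"}

# union() (or |) builds a new set and leaves both operands untouched.
combined = mexico_flag_colors | germany_flag_colors
print(len(combined), len(mexico_flag_colors))   # 5 3

# update() (or |=) changes the set it is called on and returns None.
palette = set(mexico_flag_colors)   # copy, so the original stays intact
result = palette.update(germany_flag_colors)
print(result)                       # None
print(palette == combined)          # True
```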

The union operation can be performed on multiple sets at once:

print(us_flag_colors | germany_flag_colors | france_flag_colors | mexico_flag_colors | switzerland_flag_colors)
print(us_flag_colors.union(germany_flag_colors, france_flag_colors, mexico_flag_colors, switzerland_flag_colors))
{'blue', 'green', 'red', 'gold', 'white', 'black'}
{'blue', 'green', 'red', 'gold', 'white', 'black'}
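When the sets are stored in a list whose length is not known in advance, one possible approach is to unpack the list into union(). The list name flags below is an assumption for illustration.

```python
flags = [
    {"red", "white", "blue"},
    {"black", "red", "gold"},
    {"green", "white", "red"},
]

# Unpack the list so each set becomes a separate argument to union().
all_colors = set().union(*flags)
print(sorted(all_colors))
# ['black', 'blue', 'gold', 'green', 'red', 'white']
```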

18.6.2. Intersection#

Returns a new set with the elements that exist in both sets. Use either the intersection() method or the & operator.

This operation can be performed on multiple sets as well.

print(us_flag_colors & germany_flag_colors)
print(mexico_flag_colors.intersection(switzerland_flag_colors))
{'red'}
{'white', 'red'}
print(us_flag_colors & germany_flag_colors & france_flag_colors & mexico_flag_colors & switzerland_flag_colors)
print(us_flag_colors.intersection(germany_flag_colors, france_flag_colors, mexico_flag_colors, switzerland_flag_colors))
{'red'}
{'red'}
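A related question is whether two sets share anything at all. The isdisjoint() method answers it directly, and an empty intersection is falsy. The sketch below adds a set named ireland_flag_colors, which is not part of the examples above, purely for illustration.

```python
us_flag_colors = {"red", "white", "blue"}
germany_flag_colors = {"black", "red", "gold"}
ireland_flag_colors = {"green", "white", "orange"}   # added for this sketch

shared = us_flag_colors & germany_flag_colors
print(shared)                                                # {'red'}

# isdisjoint() is True when the two sets have no members in common.
print(germany_flag_colors.isdisjoint(ireland_flag_colors))   # True

# An empty intersection is an empty set, which is falsy.
print(bool(germany_flag_colors & ireland_flag_colors))       # False
```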

18.6.3. Difference#

Returns a new set with the elements of the first set that are not in the other set. Use either the difference() method or the - operator.

Yes, you can chain these operators. The operations evaluate left to right.

print(us_flag_colors - germany_flag_colors)
print(mexico_flag_colors.difference(switzerland_flag_colors))

print(us_flag_colors - germany_flag_colors - switzerland_flag_colors)
{'blue', 'white'}
{'green'}
{'blue'}
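Unlike union and intersection, difference depends on the order of its operands, and grouping matters when chaining. The following sketch redefines three flag sets under shorter, illustrative names so that it runs on its own.

```python
us = {"red", "white", "blue"}
germany = {"black", "red", "gold"}
switzerland = {"white", "red"}

# Swapping the operands gives a different result.
print(sorted(us - germany))        # ['blue', 'white']
print(sorted(germany - us))        # ['black', 'gold']

# Chaining evaluates left to right: (us - germany) - switzerland.
print(us - germany - switzerland == (us - germany) - switzerland)   # True

# Grouping the other way changes the answer.
print(sorted(us - (germany - switzerland)))   # ['blue', 'red', 'white']

# difference() accepts several sets and removes the members of all of them.
print(us.difference(germany, switzerland))    # {'blue'}
```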

18.6.4. Symmetric Difference#

Returns a new set with the elements that are in either the first set or the second set, but not in both. Use either the symmetric_difference() method or the ^ operator.

print(us_flag_colors ^ germany_flag_colors)
print(mexico_flag_colors.symmetric_difference(switzerland_flag_colors))
{'blue', 'gold', 'white', 'black'}
{'green'}
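Symmetric difference can also be expressed through the other three operations, which is a useful way to check one's understanding. This sketch uses illustrative names a, b, and sym.

```python
a = {"red", "white", "blue"}
b = {"black", "red", "gold"}

sym = a ^ b

# Everything in either set, minus what the two sets share.
print(sym == (a | b) - (a & b))    # True

# Equivalently, the union of the two one-sided differences.
print(sym == (a - b) | (b - a))    # True

# Unlike difference, the operand order does not matter.
print(a ^ b == b ^ a)              # True
```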

18.6.5. Set Operation Discussion#

Given De Morgan’s Laws for union and intersection, demonstrate these two equations in Python using sets:

\(\overline{A \cup B} = \overline{A} \cap \overline{B}\)

\(\overline{A \cap B} = \overline{A} \cup \overline{B}\)

where

  • \(\overline{A}\) is the complement of set \(A\)

  • \(\cap\) is the intersection operator

  • \(\cup\) is the union operator

At first glance, this problem seems easy - these laws have been around since the 19th century. However, what does \(\overline{A}\) mean to a computer? Fundamentally, this is everything in the universe \(U\) not in set \(A\). Python, and most other general-purpose programming languages, do not have direct support for such a universal set. Therefore, we need to define the universal set for our problem domain.

So, let us practice these laws on the flag colors. We can define a universal set of colors:

\(U = \{ red, orange, yellow, green, blue, purple, pink, brown, grey, black, white, gold, silver \}\)

a = {"green", "white", "red"}
b = {"black", "red", "gold"}
U = {"red", "orange", "yellow", "green", "blue", "purple", "pink", "brown", "grey", "black", "white", "gold", "silver"}
print(U - (a | b) == (U - a) & (U - b))
print(U - (a | b))
print(a | b)
True
{'silver', 'brown', 'pink', 'grey', 'blue', 'orange', 'purple', 'yellow'}
{'green', 'red', 'gold', 'white', 'black'}
print(U - (a & b) == (U - a) | (U - b))
print(U - (a & b))
print(a & b)
True
{'yellow', 'silver', 'brown', 'pink', 'blue', 'orange', 'green', 'purple', 'gold', 'grey', 'black', 'white'}
{'red'}

Throughout your programming career, you will find De Morgan’s laws arising in Boolean operations: an if statement may have a condition like not (A or B), which is equivalent to (not A) and (not B).
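The Boolean form of both laws can be checked exhaustively, since two variables have only four combinations of truth values. This is a small sketch; the names cases, first_law, and second_law are illustrative.

```python
from itertools import product

# All four combinations of (a, b): FF, FT, TF, TT.
cases = list(product([False, True], repeat=2))

# not (a or b)  is equivalent to  (not a) and (not b)
first_law = all((not (a or b)) == ((not a) and (not b)) for a, b in cases)

# not (a and b)  is equivalent to  (not a) or (not b)
second_law = all((not (a and b)) == ((not a) or (not b)) for a, b in cases)

print(first_law, second_law)    # True True
```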

18.7. Comparing Sets#

As with lists, we can use the complete range of comparison operators on sets. For sets, however, the ordering operators test subset and superset relationships rather than magnitude.

Comparison    Method             Description
a == b                           Do sets a and b contain the same members?
a != b                           Is there at least one difference between sets a and b?
a < b                            Is set a a proper subset of b?
a <= b        a.issubset(b)      Is set a a subset of, or equal to, b?
a > b                            Is set a a proper superset of b?
a >= b        a.issuperset(b)    Is set a a superset of, or equal to, b?

print("us_flag_colors == france_flag_colors:", us_flag_colors == france_flag_colors)
print("us_flag_colors != france_flag_colors:", us_flag_colors != france_flag_colors)
print("us_flag_colors <  france_flag_colors:", us_flag_colors < france_flag_colors)
print("us_flag_colors <= france_flag_colors:", us_flag_colors <= france_flag_colors)
print("us_flag_colors >  france_flag_colors:", us_flag_colors > france_flag_colors)
print("us_flag_colors >= france_flag_colors:", us_flag_colors >= france_flag_colors)

print("switzerland_flag_colors < us_flag_colors:", switzerland_flag_colors < us_flag_colors)
print("us_flag_colors > switzerland_flag_colors:", us_flag_colors > switzerland_flag_colors)
us_flag_colors == france_flag_colors: True
us_flag_colors != france_flag_colors: False
us_flag_colors <  france_flag_colors: False
us_flag_colors <= france_flag_colors: True
us_flag_colors >  france_flag_colors: False
us_flag_colors >= france_flag_colors: True
switzerland_flag_colors < us_flag_colors: True
us_flag_colors > switzerland_flag_colors: True
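Two points about set comparisons are easy to miss. First, two sets can be unrelated, in which case <, >, and == are all False. Second, the methods issubset() and issuperset() accept any iterable, while the operators require both operands to be sets. The sketch below uses shortened, illustrative names.

```python
mexico = {"green", "white", "red"}
germany = {"black", "red", "gold"}

# Neither set contains the other, so every comparison is False.
print(mexico < germany, mexico > germany, mexico == germany)
# False False False

small = {"red", "white"}

# The method form accepts any iterable, such as a list.
print(small.issubset(["red", "white", "blue"]))    # True

# The operator form raises TypeError when the other operand is not a set.
try:
    small <= ["red", "white", "blue"]
except TypeError as err:
    print("TypeError:", err)
```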

18.8. Suggested LLM Prompts#

  1. Explain what a set is in Python, its mathematical definition, and how it differs from other data structures like lists and dictionaries. Discuss the key properties of sets, such as uniqueness, unorderedness, and their use cases in various programming scenarios.

  2. Provide financial examples of creating sets using different methods, such as the set() constructor and set literals ({...}). Demonstrate how to create sets from other iterables like lists and strings.

  3. Explore the basic set operations, including union (|), intersection (&), difference (-), and symmetric difference (^). Explain their functionalities and use cases, and demonstrate their implementation with code examples.

  4. Dive into the various built-in methods available for sets, such as add(), remove(), discard(), pop(), clear(), copy(), isdisjoint(), issubset(), and issuperset(). Explain their functionalities and use cases, and demonstrate their implementation with code examples.

  5. Discuss the importance of membership testing in sets using the in and not in operators. Provide examples of how to check for the existence of elements in a set and explain the time complexity of these operations.

  6. Explain the concept of frozen sets, which are immutable versions of regular sets. Discuss their use cases, such as serving as keys in dictionaries or elements in other sets, and demonstrate how to create and work with frozen sets.

  7. Discuss the performance considerations of sets in Python. Explain how sets are implemented under the hood (hash tables), and discuss the time complexities of various set operations. Provide guidelines for optimizing set usage in performance-critical applications.

18.9. Review Questions#

  1. What is a set in Python, and how is it different from other data structures like lists and dictionaries?

  2. How do you create a set from a list or a string in Python?

  3. What is the union operation on sets, and how do you perform it in Python?

  4. What is the intersection operation on sets, and how do you perform it in Python?

  5. How do you find the difference between two sets in Python?

  6. How do you add a new element to an existing set in Python?

  7. How do you remove an element from a set in Python? What happens if the value does not exist?

  8. How do you copy a set in Python?


18.10. Exercises#

  1. Investment Portfolio Diversification: Suppose you are a financial advisor, and one of your clients wants to diversify their investment portfolio. They have provided you with two lists: one containing the stocks they currently hold, and another containing the stocks they are interested in buying. Your task is to write a Python program that uses sets to:

    1. Identify the stocks that are common between the two lists (i.e., the intersection).
    2. Identify the stocks that are unique to each list (i.e., the differences).
    3. Suggest a new portfolio that includes all the unique stocks from both lists (i.e., the union).
    portfolio = {"WFC", "IBM", "AAPL", "T", "CAT", "CVX"}
    interest = {"T","F","TSLA","CVX","CSCO","KO"}
    
  2. Expense Tracking: Develop a Python program that helps track monthly expenses. The program should allow the user to input their expenses (e.g., rent, utilities, groceries, entertainment) for each day of the month. At the end of the month, the program should use sets to:

    1. Identify the unique expense categories for the month.
    2. Calculate the total amount spent for each expense category.
    3. Determine the most frequently occurring expense category.

    Use a dictionary called expenses to track the data as it is being entered. Use category_daynumber as the dictionary key.

  3. Bank Account Reconciliation: You are a bank teller, and you need to reconcile a customer’s account transactions. The customer has provided you with a list of transactions they believe they have made, and you have a list of transactions recorded by the bank. Write a Python program that uses sets to:

    1. Identify the transactions that are common between the customer's list and the bank's list.
    2. Identify the transactions that are unique to each list.
    3. Flag any discrepancies between the two lists for further investigation.
    You should also sum up the totals for each category from the union of the two transaction lists.
    customer = [
       {'id': '1012', 'date': '20240502', 'memo': 'salary deposit', 'amount': 5051.25, 'category': 'income'},
       {'id': '1013', 'date': '20240504', 'memo': 'supermarket', 'amount': 110.25, 'category': 'expense:food'},
       {'id': '1014', 'date': '20240505', 'memo': 'headphones', 'amount': 71.95, 'category': 'expense:household'},
       {'id': '1015', 'date': '20240506', 'memo': 'airfare', 'amount': 551.35, 'category': 'expense:travel'},
       {'id': '1016', 'date': '20240508', 'memo': 'rental car', 'amount': 176.34, 'category': 'expense:travel'}
    ]
    bank = [
       {'id': '1012', 'date': '20240502', 'memo': 'salary deposit', 'amount': 5051.25, 'category': 'income'},
       {'id': '1014', 'date': '20240505', 'memo': 'headphones', 'amount': 74.95, 'category': 'expense:household'},
       {'id': '1015', 'date': '20240506', 'memo': 'airfare', 'amount': 551.35, 'category': 'expense:travel'},
       {'id': '1016', 'date': '20240508', 'memo': 'rental car', 'amount': 176.34, 'category': 'expense:travel'},
       {'id': '1017', 'date': '20240509', 'memo': 'Wikipedia donation', 'amount': 25.00, 'category': 'expense:charity'}
    ]