2. Problem Solving by Programming#
aka Performing Tasks with Computers
One of the great satisfactions with computer programming is solving real-world tasks - especially when others can use our creations.
One approach to creating programs is to apply procedural programming. In this paradigm, programmers create a list of instructions that computers must perform to solve a particular problem or perform a specific task. For right now, we only consider small programs. However, as the course progresses and our programs become more extensive, we will adopt more techniques to approach those problems.
In the Introductory Notebook, the first task was to open a web page from the Wayback Machine. Suppose instead that you needed to find the homepage from The News & Observer from November 11th, 2018. One way to begin to create an algorithm for this process is to utilize “The Seven Steps” [1] (The following description is inspired by “All of Programming” by Drew Hilton and Anne Bracy.)
An algorithm is a well-defined procedure (series of steps) that takes some input and produces some value or set of values (output).[2]
2.1. Work an Instance Yourself#
Step One is to work through one instance of the problem manually. Then, think through the various actions that need to occur and the order of those actions.
One way we could work the instance ourselves is to visit https://archive.org/. Once there, we could enter https://www.newsobserver.com/ as the URL to search in the Wayback Machine. This search produces a timeline and calendar from which we can select a particular page. So we click on 2018. Then we click on November 11th in the calendar and select one of the dates. Success! The page refreshes with this URL: https://web.archive.org/web/20181111065826/https://www.newsobserver.com/
If you are stuck getting started, look at the problem statement. What nouns exist? Often, these nouns form the inputs to the process, possible intermediary states of the computation(process), and the possible output. The verbs often state the computation(procesing) that occurs.
Nouns: web archive, URL, date, archived page
Verbs: enter, select
Also, realize that the problem may require intermediary steps and information not immediately evident. Over time, you will gain the experience to work through these issues.
You may also find yourself not having information to proceed. Sometimes this may be due to vagueness in the problem statement and you need to get clarification in terms of excactly what needs to be done. Other times, you may need to research the problem domain itself to gain a deeper understanding.
2.2. Write Down Exactly What You Just Did#
Step Two is to write these steps down using “pseudocode” - an informal, natural language (i.e., English, Chinese, etc.) description of how the program will work. Do not worry if you are unsure how this works yet - the course has plenty of examples.
1. Visit https://archive.org/ 2. Type https://www.newsobserver.com into the Wayback Machine and press "Enter" 3. Select 2018 from the timeline 4. Select November 11th and one of the dates from the calendar 5. Wait for the browser to respond with the archived page
As you write these steps down, make sure you explicitly state everything. Are there implicit steps/instructions missing? A computer cannot infer actions that our mind implicitly places into steps. For example, in the above steps, how do I visit that website (open a browser window and enter the URL into the address/search bar). Many computer programming books like to use examples of recipes or knitting patterns to demonstrate the sequences of steps to be followed.
2.3. Generalize your Steps#
Step Three involves generalizing our steps. Our initial solution works for a specific site and date but requires manual intervention. We need to search for ways to access the data programmatically. Often, sites provide an application programming interface (API) to access data. Performing a web search on “Wayback machine API” leads to this page: https://archive.org/help/wayback_api.php. In reading this page, we see that the Wayback Machine has an API endpoint with URLs that follows this example: https://archive.org/wayback/available?url=example.com×tamp=20060101
In other situations, you will need to research the problem domain to understand how something functions and how you can automate that process.
Cool. We now know how to access the Wayback Machine programmatically, but we still lack a general algorithm. We have something that works for a specific problem, but we want to find something that works for all instances of this problem - any site and date. This step examines why you performed your steps, what patterns may exist in the possible solution, and how to handle various inputs.
To generalize this solution, we need to pass the URL and date as inputs into the process. Right now, we can just have the user enter these two values.
The URL and date inputs function as parameters (arguments) to our algorithm - they are placeholders for the values that make up a specific instance of the problem being solved. These inputs also have expected types (e.g., a number, text, a type of object, etc.). At this point, your steps do not have to explicitly check if the inputs have unexpected types. If inappropriate values are passed to your algorithm, just assume that the algorithm stops when the value is first examined. Later in these notebooks, we will make our code more robust and perform the necessary input validation.
The generalized set of steps:
1. Have the user enter a website URL (website_url) 2. Have the user enter a date in the format of YYYYMMDD (search_date) 3. Create a URL of this pattern: https://archive.org/wayback/available?url=website_url×tamp=searchdate 4. Open a connection to that URL 5. Read the results of the URL 6. From the results, get the URL of the closest page. 7. Open a browser with the closest URL.
2.4. Test your Steps#
In Step Four, we then manually perform the steps to ensure we have a correct, working algorithm. We test these steps by varying the inputs to see how our algorithm reacts and whether it will produce the expected output. This example is relatively straightforward, but in more complex situations, we will want to have at least some assurance that our approach is sound.
We also need to consider potential error conditions. What happens if we enter an empty URL or date? What if the URL does not exist in the Wayback Machine? What happens if the date is in the wrong format? How does the program behave if our network connection is down?
These types of tests are considered corner cases, where the algorithm’s (system’s) behavior may be unexpected due to extreme or missing values. As another example, consider the traveling salesperson problem from the previous notebook. What happens if no cities are passed into the algorithm? One? A hundred? The “extreme” value of one hundred may cause the algorithm to appear as it will never produce a solution due to the amount of possibilities being considered. You may need to write specific steps to handle corner cases.
For all but the most straightforward situations, it is impossible to determine how many test cases are required: tests can demonstrate that defects exist but cannot prove that no defects exist. Such proof requires formal verification using models and logical reasoning. Hardware manufacturers are probably the leading users of formal verification in industry, but widespread use among software developers is practically nonexistent. As we discuss testing in later notebooks, we will look at different ways to determine the appropriate test cases.
Side Note: One of the fundamental questions within Computer Science has been to prove if an arbitrary program will halt for a given input. This problem, known as the Halting Problem, has been used to show other problems are equivalent to it and hence unsolvable. Proving the absence of all defects is one such problem.
The goal of these first four steps is to have performed sufficient planning to minimize and eliminate issues before we convert our steps into computer code and test that code. This plan is referred to as “design”. Our “requirement” was to develop a routine to search the Wayback Machine for a particular URL and date, and then to open a web browser with the closest URL available in the archive.
From a documentation perspective, the expected outcome of the first four steps should contain:
Inputs to the routine
Outputs from the routine
Pseudocode (steps in the routine)
Test cases - various input and output combinations to ensure the proposed steps work.
This “outcome” is the basis of an algorithm, which is a well-defined procedure (series of steps) that takes some input and produces some value or set of values (output).[2] Following the steps precisely in the algorithm will solve a particular class of a problem.
Pseudocode Guidelines: [3]
Use natural language statements that precisely describe specific operations
Avoid syntactic elements from the target programming language
Write pseudocode at the level of intent - document the approach to solve the problem
Write pseudocode at a low enough level that generating code from it is straightforward
2.5. Steps 5-7, Convert to Code#
In Step 5, we convert the pseudocode into the code of our target language.
In Step 6, we test the program to see if it works. If all of our tests pass, we are complete. However, if we find issues, we need to debug our program (Step 7) to discover those issues, which we will correct by circling back to Step 5.
As you increase your coding skills, much of this process will become second nature, and you can minimize/combine steps. However, you should always spend an appropriate amount of time planning out your solution. This can just be a few scribbles or notes for more straightforward problems. For more complex problems and large systems, you will spend a substantial amount of time researching the best approach and then documenting that approach.
2.6. IPOT Chart#
One way to document our algorithms is to an IPOT Chart. The acronym stands for input, processing, output, and testing of an algorithm. These elements can either form section headers within a document or as a table:
Algorithm Name | ||
Input | Processing | Output |
List inputs List other known values |
Processing steps (pseudocode) in order | Output/results of the processing (including any messages/formatting required) |
Test Cases Input and expected output(result) combinations validating the algorithm |
The test cases form a foundation for our algorithm by ensuring the algorithm’s behavior is fully documented and how the algorithm behaves under different conditions. Test cases serve as concrete examples that help programmers understand the algorithm’s functionality.
2.7. Suggested LLM Prompts#
Break down how we can load a file of historical stock prices for a company and find the overall rate of return as well as the average standard deviation with a computer program. Do not provide any code, but rather provide a guide and an outline on how to get started.
Use Drew Hilton’s approach that consists of work an instance yourself, write down what you did, generalize your steps, and test your steps to solve the same problem to load a file of historical stock prices for a company and find the overall rate of return as well as the average standard deviation.
How can an IPO chart assist in starting to write a program do this problem? IPO stands for input, processing, and output.
2.8. Summary#
As we solve problems with computer programs, our first step is to analyze the problem and find a generalizable algorithm to solve that problem. From that algorithm, we then write a computer program that implements that algorithm to solve the problem. The algorithms we write may include:
sequences of steps
repetitions/loops of steps
decisions as to which series of steps to execute.
As we document our algorithms, we should include the following elements:
Algorithm Name
Inputs
Outputs
Pseudocode (description of the steps to solve the problem)
Test cases (inputs and expected behavior/output)
2.9. Review Questions#
Describe the steps outlined in “The Seven Steps” approach for creating algorithms. How does this method facilitate problem-solving in computer programming?
How does Step Three in “The Seven Steps” involve generalizing the solution to a problem? Provide an example of how this step can lead to more versatile algorithms.
What is the importance of testing algorithms, as mentioned in Step Four? How do corner cases contribute to testing the robustness of an algorithm?
What is an IPOT Chart, and how does it aid in documenting algorithms? Create a generalized version for the travelling saleperson problem listed in the previous notebook.
Why do we need to explicitly document the steps in an algorithm? Create an IPOT chart for making a bowl of cereal.
What is an algorithm?
- A clear set of steps to solve any instance of a particular class of problems.
- A mathematical function that helps you solve a specific problem.
- An American politician who warns about climate change.
- The output of a program.
If you are unable to complete Step 1 of the Seven Steps, which of the following could be a viable path forward?
- Consult reference material for your programming language to find more useful library functions.
- Ask a mathematician to help you find the function that describes a sequence of numbers.
- Gain more domain knowledge for the particular problem's domain.
- Skip Step 1 and move on to writing code.
What are parameters?
- The assumed conditions under which an algorithm is written. If these are not met, the algorithm will not function properly.
- Lines of code, such as "if" which let the program make decisions.
- Symbolic placeholders in an algorithm for the values that specify what particular instance of the problem is being solved.
- Invariants of an algorithm, which describe mathematical relationships between variables that are always guaranteed to be true at particular places in the code.
What are corner cases?
- Combinations of parameter values that require special handling.
- Patterns that are difficult to find due to their non-linear nature.
- Syntax errors that crash the program.
- Triangular storage units.
2.10. References#
[1] Andrew D. Hilton, Genevieve M. Lipp, and Susan H. Rodger. 2019. Translation from Problem to Code in Seven Steps. In Proceedings of the ACM Conference on Global Computing Education (CompEd ‘19). Association for Computing Machinery, New York, NY, USA, 78–84. https://doi.org/10.1145/3300115.3309508
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms, Third Edition (3rd. ed.). The MIT Press.
[3] Steve McConnell. 2004. Code Complete, Second Edition. Microsoft Press, USA.