Introduction to Algorithms
What's an Algorithm?
Definition - Algorithm: a set of instructions, or rules that, starting with initial data, solves a class of problems using precise operations, executed mechanically, without creative human intervention.
Cooking recipes could be thought of as algorithms. They enumerate a step-by-step procedure which achieves a task (the final dish) starting with initial data (ingredients) and asking you to execute relatively straight forward instructions. Yet, any of us who tried to use a cookbook know that 'straight forward' is such a relative term. Recipes tend to be ambiguous and could potentially be a nightmare for the beginner chef. So, then, what should we ask of a good recipe?
To qualify as an algorithm, the set of instructions should have some properties:
- Preciseness - Or non-ambiguity. Each step must be non-ambiguous and executable without creative intervention.
- Generality - It should solve a class of problems, not just one particular problem.
- Finiteness - The algorithm must finish in a finite amount of time. For all practical purposes it should finish in reasonable time (e.g. before we die).
- Correctness - The end result should be correct for all inputs.
Do recipes have the above properties? Supposedly they are correct, though some of your dinner guests might disagree (if they feed it to your dog while nobody's watching that might be a sign). They are also finite. What they lack is generality, since they cook one dish using fixed input. And what most of them lack is preciseness, which, on one hand makes them bad examples of algorithms, but on the other hand leaves way for creativity and uniqueness.
The word algorithm has interesting roots. Some say it comes from al-Khwārizmī, a Persian mathematician who lived at around 800 and wrote books on linear and quadratic equations. Other people say it is an anagram of logarithm, a mathematical operation which is an inverse of the exponential. In the old days there seemed to be little distinction between algorithms and 'math recipes'. For a long time the word was linked with Euclid's algorithm for computing the greatest common divisor. See The timeline of algorithms for more details. Also, for a more detailed introduction to the concept visit Algorithm. We'll move on to see how we can create and describe algorithms.
Flow charts are one way to describe algorithms (another way is pseudocode, but we won't be dealing with that one here). They consist of the following symbols:
As the name suggests, these boxes are used to compute things. They usually comprise mathematical computations. Examples:
- computes the expression and stores the result (in this case ) in .
- takes the old value of , adds to it and stores the result in variable , overwriting the old value. If was initially , the final value will be , after executing this block.
- negates the value stored in and assigns it to . In our case would be .
These boxes deal with the flow of data in and out of our algorithm. We can read (or input) initial values, as well as write (or output) the results. Examples:
- asks for a value to be entered and stores that value in . The 'R' is for 'read'.
- asks for two values, one to be stored in and the other in .
- writes (or outputs, displays) the value currently stored in . The 'W' is for 'write'.
A decision symbol is used to create a 'fork on the road'. It branches the execution depending on the result of the condition inside the box. If the expression is true, the execution of the algorithm continues on the right branch. If the expression is false the execution continues on the left branch. Examples:
- tests if is less than zero, when reaching this box in the execution of the algorithm. If is less than zero, we move on to the boxes on the right branch. If is greater than zero, or equal to zero, we continue on the left branch.
- tests if 's value is at least 's value plus . If so, we continue on the right branch. If not, we continue on the left branch.
We'll get a better idea about decision boxes when we design our first flow chart.
I wouldn't even call this a proper symbol, we could probably do without it just fine. Its only role is to set where the algorithm starts and where it ends. Here are, for now, the two terminators we will use:
- is where the flow chart starts.
- is where the flow chart ends.
Example one: Absolute value
We will now plunge into the hot waters of flow charts by designing our first one. And no, it's not the 'hello world' example, it's boring and so classical. Let us try to compute the following math function:
What function is this? Indeed, it's the absolute value, or modulus function. Let's build a flow chart that computes it:
To execute a flow chart, we start at the START symbol and follow the arrows. First thing we read , the number for which we want to compute the absolute value. Then we test if it's less than zero. If so, we follow the right branch, the branch, where we compute , the absolute value, as negative . If is greater or equal to zero than we follow the branch, making the value for . Variable holds the value of the function . In the end we display and we stop at the STOP symbol.
The flow chart implements the definition function exactly. One alternative is to display the result on each branch, without storing it in (which means we don't really need , we used it for extra clarity).
Example two: Sum of the first n integers
Having successfully survived the encounter with our first algorithm, let's go on with a second one, shall we? Let's compute the following summation:
In human language that is the sum of the first integers:
Number is the input of our algorithm, while number is the output.
One way to do this is to have a counter, a variable that counts from to ; let's name it . More precisely, will hold all values from to , one after another. For that, we create a loop in which is increased by on each iteration:
Remember, is interpreted as taking the old value of , then adding to it, and then storing it as the new value of . Therefore, this flow chart section will add one, to , in a loop. However, it will never end (and we know that is not quite an algorithm). To fix it, we have to add a condition to exit the loop when exceeds :
There is one more thing we need to do: our counter has to start somewhere. Let's initialize it with , the first number we need to add:
Now, that we have a counter which cycles through the numbers we need to add, all that's left to do is add it to our sum . The sum needs to be initialized, as well. What should its initial value be? Math teaches us that zero is probably a good candidate, since it is the neutral value for addition.
The counter has dual purpose, in our algorithm. First, it makes sure that the loop is executed times. Second, it is used in computing the sum. In general, counters control the number of times the loop gets executed, but they don't always take part in the computation, as in our case.
The summation variable, is an accumulator. It accumulates the result incrementally.
By this time I'd expect math geeks to jump up and down crying there's a simpler and more efficient way to do this. Indeed, there is a formula that computes the sum of the first n integers (derived from the general case of arithmetic progressions):
The flow chart for computing the sum becomes much simpler:
This is clearly a simpler and faster solution. There are two morals to this. First, we started with a more complicated solution to get our feet wet with a more complex example. Second, math is a powerful tool when building algorithms. Don't ignore it. Gauss certainly didn't. In elementary school his teacher, J.G. Büttner tried to occupy pupils by making them add up the integers from 1 to 100. The young Gauss produced the correct answer within seconds by a flash of mathematical insight, to the astonishment of all. Gauss had realized that pairwise addition of terms from opposite ends of the list yielded identical intermediate sums: 1 + 100 = 101, 2 + 99 = 101, 3 + 98 = 101, and so on, for a total sum of 50 × 101 = 5050. For more information, see Gauss' Biography and Gauss's Day of Reckoning.
To make everyone happy, including folks who might think "if he wanted a more complex example, why didn't he pick one that didn't have a trivial solution", imagine the problem of computing the product of the first integers, which mathematicians call the factorial:
Definition - Factorial:
Not surprisingly, the algorithm looks the same as our first solution, except the sign in becomes a ( is the symbol for multiplication) and has to be initialized to . There is no simple formula for computing .
This example introduces two common tools in algorithms: counters and accumulators. Let's try and define them.
Definition - Counter: a variable that keeps track of the number of times an event occurs. It is usually incremented by one at a time, and it is used frequently to count (and decide) the number of times a loop is executed.
Definition - Accumulator: a variable that holds a partial result and that aggregates values obtained incrementally along the computation process. The values are usually computed in a loop and they are usually added to the accumulator (though they could contribute in a more complex way). At the end of the process the accumulator contains the final result.
In other words, a counter counts, and an accumulator accumulates. What a surprise!
Solve the following problems: First Degree Equation, Second Degree Equation.
Absolute value of a number
Absolute value of a number - version 2
Finite counter counting from 1 to n
Sum of integers from 1 to n
Sum of integers from 1 to n
Table of Contents
1The Role of Algorithms in Computing5
1.2Algorithms as a technology11
3Growth of Functions43
3.2Standard notations and common functions53
4.1The maximum-subarray problem68
4.2Strassen's algorithm for matrix multiplication75
4.3The substitution method for solving recurrences83
4.4The recursion-tree method for solving recurrences88
4.5The master method for solving recurrences93
4.6Proof of the master theorem97
5Probabilistic Analysis and Randomized Algorithms114
5.1The hiring problem114
5.2Indicator random variables118
5.4Probabilistic analysis and further uses of indicator random variables130
IISorting and Order Statistics
6.2Maintaining the heap property154
6.3Building a heap156
6.4The heapsort algorithm159
7.1Description of quicksort170
7.2Performance of quicksort174
7.3A randomized version of quicksort179
7.4Analysis of quicksort180
8Sorting in Linear Time191
8.1Lower bounds for sorting191
9Medians and Order Statistics213
9.1Minimum and maximum214
9.2Selection in expected linear time215
9.3Selection in worst-case linear time220
10Elementary Data Structures232
10.1Stacks and queues232
10.3Implementing pointers and objects241
10.4Representing rooted trees246
12Binary Search Trees286
12.1What is a binary search tree?286
12.2Querying a binary search tree289
12.3Insertion and deletion294
12.4Randomly built binary search trees299
13.1Properties of red-black trees308
14Augmenting Data Structures339
14.1Dynamic order statistics339
14.2How to augment a data structure345
IVAdvanced Design and Analysis Techniques
15.3Elements of dynamic programming378
15.4Longest common subsequence390
15.5Optimal binary search trees397
16.1An activity-selection problem415
16.2Elements of the greedy strategy423
16.4Matroids and greedy methods437
16.5A task-scheduling problem as a matroid443
17.2The accounting method456
17.3The potential method459
VAdvanced Data Structures
18.1Definition of B-trees488
18.2Basic operations on B-trees491
18.3Deleting a key from a B-tree499
19.1Structure of Fibonacci heaps507
19.3Decreasing a key and deleting a node518
19.4Bounding the maximum degree523
20van Emde Boas Trees531
20.2A recursive structure536
20.3The van Emde Boas tree545
21Data Structures for Disjoint Sets561
21.2Linked-list representation of disjoint sets564
21.4Analysis of union by rank with path compression573
22Elementary Graph Algorithms589
22.1Representations of graphs589
22.5Strongly connected components615
23Minimum Spanning Trees624
23.1Growing a minimum spanning tree625
23.2The algorithms of Kruskal and Prim631
24Single-Source Shortest Paths643
24.1The Bellman-Ford algorithm651
24.2Single-source shortest paths in directed acyclic graphs655
24.4Difference constraints and shortest paths664
24.5Proofs of shortest-paths properties671
25All-Pairs Shortest Paths684
25.1Shortest paths and matrix multiplication686
25.2The Floyd-Warshall algorithm693
25.3Johnson's algorithm for sparse graphs700
26.2The Ford-Fulkerson method714
26.3Maximum bipartite matching732
26.5The relabel-to-front algorithm748
27.1The basics of dynamic multithreading774
27.2Multithreaded matrix multiplication792
27.3Multithreaded merge sort797
28.1Solving systems of linear equations813
28.3Symmetric positive-definite matrices and least-squares approximation832
29.1Standard and slack forms850
29.2Formulating problems as linear programs859
29.3The simplex algorithm864
29.5The initial basic feasible solution886
30Polynomials and the FFT898
30.2The DFT and FFT906
30.3Efficient FFT implementations915
31.1Elementary number-theoretic notions927
31.2Greatest common divisor933
31.4Solving modular linear equations946
31.5The Chinese remainder theorem950
31.6Powers of an element954
31.7The RSA public-key cryptosystem958
32.1The naive string-matching algorithm988
32.2The Rabin-Karp algorithm990
32.3String matching with finite automata995
32.4The Knuth-Morris-Pratt algorithm1002
33.2Determining whether any pair of segments intersects1021
33.3Finding the convex hull1029
33.4Finding the closest pair of points1039
34.3NP-completeness and reducibility1067
35.1The vertex-cover problem1108
35.2The traveling-salesman problem1111
35.3The set-covering problem1117
35.4Randomization and linear programming1123
35.5The subset-sum problem1128
VIIIAppendix: Mathematical Background
A.1Summation formulas and properties1145
CCounting and Probability1183
C.3Discrete random variables1196
C.4The geometric and binomial distributions1201
C.5The tails of the binomial distribution1208
D.1Matrices and matrix operations1217
D.2Basic matrix properties1222
A list of known bugs and errata may be found here. Please review this list before sending a new bug report. Reports of previously-unreported bugs, misprints, and other errata may be sent to email@example.com
As of the third edition, solutions for a select set of exercises and problems are available here in PDF format: Intro to Algorithms Solutions PDF.
This file contains the follow content:
Chapter 2: Getting Started
Chapter 3: Growth of Functions
Chapter 4: Divide-and-Conquer
Chapter 5: Probabilistic Analysis and Randomized Algorithms
Chapter 6: Heapsort
Chapter 7: Quicksort
Chapter 8: Sorting in Linear Time
Chapter 9: Medians and Order Statistics
Chapter 11: Hash Tables
Chapter 12: Binary Search Trees
Chapter 13: Red-Black Trees
Chapter 14: Augmenting Data Structures
Chapter 15: Dynamic Programming
Chapter 16: Greedy Algorithms
Chapter 17: Amortized Analysis
Chapter 21: Data Structures for Disjoint Sets
Chapter 22: Elementary Graph Algorithms
Chapter 23: Minimum Spanning Trees
Chapter 24: Single-Source Shortest Paths
Chapter 25: All-Pairs Shortest Paths
Chapter 26: Maximum Flow
Instructor's Manual and Figure Files
An instructor's manual is available to instructors who have adopted the book for course use. The manual is not available to students, or to those studying the book on their own. Instructors with access to the manual must agree not to share the manual's password, to make any of the solutions publicly accessible, on the Web or otherwise, or to make any of their own solutions publicly available.
Instructors using the MIT Press English language edition of the book may request access to the online instructor’s manual and figure file via the Instructor Resources link listed to the left under the Instructor Resources heading.
Instructors using the book in another language should contact the publisher licensing the book in that language.
The downloadable instructor's manual is updated periodically; it was last updated February 24, 2014.
Downloaded with the instructor's manual is a file of the figures/illustrations in the textbook and PDFs of pseudocode in the instructor's manual notes (not the solutions).
The clrscode3e package gives you pseudocode in the style of the third edition. You can download the package and its documentation here: clrscode3e package
Chapters 19 and 27 were removed from the second edition, and chapters 20, 26, and 28 were substantially revised from the second to the third edition. Those second edition chapters are available here.
Chapter 19 Bionomial Heaps
Chapter 20 Fibonacci Heaps
Chapter 26 Maximum Flow
Chapter 27 Sorting Networks
Chapter 28 Matrix Operations
Wondering about the professor names in the text? The jokes are explained here.
Request an Exam or Desk Copy
Use the Request Exam/Desk Copy link listed to the left under Instructor Resources
As of the third edition, this textbook is published exclusively by the MIT Press.
Please refer any inquiries to the appropriate representative of the MIT Press:
IN THE UNITED STATES AND CANADA, contact:
Michelle Pullano, Textbook Manager
The MIT Press
One Rogers Street
Cambridge, MA 02142
IN THE UK AND CONTINENTAL EUROPE, MIDDLE EAST, INDIA, ISRAEL, AND AFRICA, contact:
+44 (0) 20 7306 0603
The MIT Press
1 Duchess Street
London W1W 6AN
IN SOUTH AND CENTRAL AMERICA, AUSTRALIA, NEW ZEALAND, AND ASIA (EXCLUDING INDIA AND PAKISTAN), contact:
The MIT Press
One Rogers Street
Cambridge, MA 02142
IN INDIA AND PAKISTAN, contact:
PHI Learning Limited, Delhi, India