An **array** is, for our purposes, a list of numbers where each number has an index. We start indexing the numbers with 0. For example, in the array

[-19, 4, 5, 100]

we have that the number -19 is index 0, the number 4 is index 1, the number 5 is index 2, and the number 100 is index 3.

A **subarray** of an array is a subset of our list made of *consecutive elements* of the original list. For example, in

[-19, 4, 5, 100]

we have that

[4, 5, 100]

is a subarray but

[-19, 5, 100]

is **not a subarray** since the indexes here are 0, 2, 3 which are not consecutive. We include the entire array and the empty array as subarrays.

Given an array of integers, find the maximum value of the sum of a subarray.

Note that the empty subarray has value 0.

For example, given the array

[1,-3,2,3,4,-10]

The possible subarrays along with their sums are as follows:

| sum | array |
|-----|-------|
| 0 | [] |
| 1 | [1] |
| -2 | [1,-3] |
| 0 | [1,-3,2] |
| 3 | [1,-3,2,3] |
| 7 | [1,-3,2,3,4] |
| -3 | [1,-3,2,3,4,-10] |
| -3 | [-3] |
| -1 | [-3,2] |
| 2 | [-3,2,3] |
| 6 | [-3,2,3,4] |
| -4 | [-3,2,3,4,-10] |
| 2 | [2] |
| 5 | [2,3] |
| 9 | [2,3,4] |
| -1 | [2,3,4,-10] |
| 3 | [3] |
| 7 | [3,4] |
| -3 | [3,4,-10] |
| 4 | [4] |
| -6 | [4,-10] |
| -10 | [-10] |

Of course, it was probably easy to see from the start that the solution was just to use [2,3,4], since those positive numbers sit right next to each other. But not every array is this easy.

Notice the pattern here: in order to compute the sum of [2,3,4,-10], we *could* have used the information given by the sum of [3,4,-10] which, in turn, could have used the information given by the sum of [4,-10], and so on. The idea for the solution we're proposing uses something similar to this.

If going through all possible subarrays works, why don't we just do that? Done. Problem solved. The issue is that the number of subarrays becomes extremely large as our original array gets larger. For a 2-element array, there are 4 subarrays. For a 10-element array, there are 56 subarrays. In general, for an $n$-element array, there are \[\frac{n(n+1)}{2} + 1\] possible subarrays, meaning that this algorithm will run in $O(n^{2})$ time. If you don't know what that means, just note that even an array of size 1000 (which is relatively small) will require over 500,000 computations, and a modest one-million-element array would require 500,000,500,001 calculations: over five hundred billion. We can do better than this.
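For concreteness, the brute-force approach might look like the following sketch (the name `max_subarray_brute` is my own; it uses a running sum so each subarray costs $O(1)$ to extend, but there are still $O(n^2)$ of them):

```python
def max_subarray_brute(A):
    """Check every subarray explicitly: O(n^2) time."""
    best = 0  # the empty subarray has sum 0
    for i in range(len(A)):
        total = 0
        for j in range(i, len(A)):
            total += A[j]  # running sum of the subarray A[i..j]
            best = max(best, total)
    return best
```

Running this on the example above, `max_subarray_brute([1, -3, 2, 3, 4, -10])` gives 9, matching the table.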

In **Kadane's Algorithm**, which we'll describe formally in a second, the idea is that we'll cycle through the elements and compute, for each element, the best possible subarray ending at that element, reusing all of the calculations we've done so far to figure that out.

In Python (or pseudocode, if you don't know Python), the algorithm is given by the following, where $A$ is our array.

```python
def max_subarray(A):
    max_ending_here = max_so_far = 0
    for x in A:
        # Best sum of a subarray ending at x: either extend the previous
        # best, or start over with the empty subarray (sum 0).
        max_ending_here = max(0, max_ending_here + x)
        max_so_far = max(max_so_far, max_ending_here)
    return max_so_far
```

Let's do one example to go through the algorithm. I'm going to be purposely verbose for those who are not used to analyzing algorithms. If you're a bit more adventurous, you may want to pick an array and go through the algorithm step-by-step before reading on.

Let's follow along with the algorithm above. Suppose we have an array

A = [1,-2,3,4,-5,6]

We initialize *max_ending_here* and *max_so_far* as $0$.

Now we enter the for-loop, starting with the element 1 of $A$. Each iteration essentially gives us the maximum value over all subarrays ending with the current element (which, in this case, is just the subarray [1]). We see that *max_ending_here* is now $0 + 1 = 1$, and therefore *max_so_far* is also 1. There's nothing else to do, so we move on.

Next, our element is -2. Now *max_ending_here* is 0 (the maximum of 0 and $-1$), and *max_so_far* is still 1. Notice that we're still saying the best we can do is [1]: we've tried [1,-2], but this gave us a negative answer, so we cut it out. **Notice also that no subarray after this point should include [1,-2] as a prefix, since that is equivalent to adding -1 to its value; it is always better to exclude it.** This last sentence is one key point of the algorithm, and why it runs so fast: we throw out prefixes that would only add a negative value to a subarray.

It may be a bit clearer if we track the value of *max_ending_here* for each iteration of the for-loop. The table below gives each element of the array along with the values of *max_ending_here* and *max_so_far* after processing it.

| element | max_ending_here | max_so_far |
|---------|-----------------|------------|
| 1 | 1 | 1 |
| -2 | 0 | 1 |
| 3 | 3 | 3 |
| 4 | 7 | 7 |
| -5 | 2 | 7 |
| 6 | 8 | 8 |

We've already seen that the *max_ending_here* value for the second element is 0. Moving on to the element 3, we record 3 as our *max_so_far* and continue. Then we get to 4. This is great: our *max_ending_here* is now 7. Notice that we're not explicitly saying that the max subarray is [3,4], just that the maximum value for the sum of any subarray ending at 4 is 7. This is a minor but important distinction.

Next, we get to $-5$, and notice that our *max_ending_here* is not reset to 0; it becomes 2. We take the previous *max_ending_here* (7, given by [3,4]) and tack $-5$ onto it, so the best we can do with a subarray ending at $-5$ is 2. Unfortunately, this doesn't give us a new *max_so_far*, since 7 tops it: we'd be better off stopping at [3,4] than using [3,4,-5]. But because it's always possible for the next value to shoot our sum back up, we keep *max_ending_here* at 2.

Last, we get to add 6 on, which gives us a *max_ending_here* equal to 8. This is also the best we've seen overall. We're out of elements at this point, the for-loop ends, and we return 8.
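To double-check the walkthrough, here is a small sketch (the name `trace_max_subarray` is my own) that re-runs the loop and records the state after each iteration:

```python
def trace_max_subarray(A):
    """Run Kadane's algorithm on A, recording the tuple
    (x, max_ending_here, max_so_far) after each loop iteration."""
    steps = []
    max_ending_here = max_so_far = 0
    for x in A:
        max_ending_here = max(0, max_ending_here + x)
        max_so_far = max(max_so_far, max_ending_here)
        steps.append((x, max_ending_here, max_so_far))
    return steps

# For A = [1, -2, 3, 4, -5, 6] this matches the walkthrough:
# (1, 1, 1), (-2, 0, 1), (3, 3, 3), (4, 7, 7), (-5, 2, 7), (6, 8, 8)
```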

Notice that there is a significant amount of power contained in the line with *max_ending_here* which is perhaps not obvious at first glance.

Notice here that we have essentially trimmed the subarrays we check down to the following:

[1]

[1,-2]

[3]

[3,4]

[3,4,-5]

[3,4,-5,6]

This is significantly fewer than we had to check before. In fact, Kadane's Algorithm is $O(n)$, which is significantly better than our quadratic brute-force solution.
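Earlier we noted that the algorithm reports only the best sum, not which subarray achieves it. If you want the subarray too, the loop can carry indices along. Here is one hedged sketch (the names `start`, `best_start`, and `best_end` are my own, not part of the standard algorithm):

```python
def max_subarray_with_indices(A):
    """Kadane's algorithm, also recording a subarray achieving the max.

    Returns (max_sum, start, end), where A[start:end] is a best subarray
    (start == end means the empty subarray).
    """
    max_ending_here = max_so_far = 0
    start = 0  # start index of the best subarray ending at the current element
    best_start = best_end = 0
    for i, x in enumerate(A):
        if max_ending_here + x < 0:
            # A negative running sum would only hurt; restart after position i.
            max_ending_here = 0
            start = i + 1
        else:
            max_ending_here += x
        if max_ending_here > max_so_far:
            max_so_far = max_ending_here
            best_start, best_end = start, i + 1
    return max_so_far, best_start, best_end
```

On our example, `max_subarray_with_indices([1, -2, 3, 4, -5, 6])` returns `(8, 2, 6)`, i.e. the subarray [3,4,-5,6].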

As an exercise, one should think about the same problem but with a matrix. The algorithm can be modified (it's actually a fairly small modification) to look at each element and the "best-so-far" submatrices which contain that element as the element in their lower-right-corner (basically, the "ending" element).
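As a hint toward that exercise, one standard way to reduce the 2-D problem to the 1-D one is to fix a pair of top and bottom rows, collapse each column between them into a single number, and run Kadane's algorithm on the resulting array. This sketch (the function name is my own) assumes a rectangular list-of-lists matrix and runs in $O(\text{rows}^2 \cdot \text{cols})$:

```python
def max_submatrix_sum(M):
    """Maximum sum over all (possibly empty) submatrices of M.

    For every pair of top/bottom rows, collapse the columns in between
    into a 1-D array and apply Kadane's algorithm to it.
    """
    if not M:
        return 0
    rows, cols = len(M), len(M[0])
    best = 0  # the empty submatrix has sum 0
    for top in range(rows):
        col_sums = [0] * cols
        for bottom in range(top, rows):
            # Extend each column sum down to include row `bottom`.
            for c in range(cols):
                col_sums[c] += M[bottom][c]
            # 1-D Kadane's algorithm on the collapsed columns.
            max_ending_here = 0
            for x in col_sums:
                max_ending_here = max(0, max_ending_here + x)
                best = max(best, max_ending_here)
    return best
```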