It has been a very long time since I have written any posts because I’ve been doing a ton of competitive programming and school work. However, I am going to start a series on mostly advanced algorithms for competitive programming. The first topic I am going to cover is range updates and queries!
In this series of blog posts, I will try to explain general concepts, not minor details unless they are important. Most of the code will be available on my GitHub here. If you do not understand a concept or key term, just use your favorite search engine. They shouldn’t be very hard to find.
Range updates and queries are part of many coding problems, as they can be used in very, very many different ways. Some simple examples are querying for the sum, max, or min value in some range from l to r. L and r represent the interval or segment. Many different updates can also be applied to these intervals to modify the elements in that range. I will skip the naive implementations where you loop through all of the elements from l to r and update or access them, and directly go to discuss the advanced data structures needed, and how they can be used. These data structures provide much better time complexity than the naive solutions.
I will also cover some tricks that aren’t necessarily advanced data structures, but they have to do with range queries/updates. This post will be slightly easier, while Part 2 will offer some harder data structures.
Note that this post might get a bit long… but let’s start!
I will represent intervals using l and r (left and right). Indexes will usually be represented by i, j, or k. The square brackets denote accessing an array. For example arr[i] means accessing the element at index i in the array arr. Querying or updating the data structures to be discussed will use query() and update() with some parameters in those parentheses. Time and space complexity is represented using Big O notation. Finally, range, interval, and segment will be used interchangeably. They should represent the same idea most of the time in this post.
I want to start off with some simple concepts regarding solving problems on ranges or intervals. These are immensely useful in many problems even though they are simple.
Let’s represent each interval as a start event and an end event. Then sort all of these events for all of the intervals and go through the events in a sweeping line fashion. This can be used to find how many intervals are covering a point or for merging ranges. Usually, a counter for how many intervals are active (encountered its start event, but it hasn’t ended yet) is used. Interval sorting can also be used for some geometry problems that deal with lines, line segments or (convex) polygons.
When we need to quickly add some value to each element in an interval (specifically, in O(1) time), we can mark the start and the end of the interval by adding a value to array[start] and the negative of that value to array[end + 1]. This marks that a whole range/interval is incremented by that value we placed as a marker. The negative of that value signals the end of that interval. When all of the intervals are marked, the prefix sum can be calculated on that array of marked intervals. The prefix sum at each index is the final sum at that index after all of the intervals are processed.
Here is an example:
Before: 0 0 1 0 0 -1 0
After Calculating Prefix Sum: 0 0 1 1 1 0 0
Now what if we had many (unsorted) points instead of intervals? Our goal is to merge all immediate neighbors for each point into one continuous segment. The points can be very spread out, so using an array for the exact positions of the points is very wasteful. One easy way is sorting the points and then going through to find nearby points with contiguous indexes. Another way is using a disjoint set union data structure such as Union-Find. However, a possibly easier and slightly faster way is using Hash Tables. And no, I did not make a mistake! Each point is inserted into the Hash Table in arbitrary order, and if there already exists a point in the Hash Table that is directly next to the inserted point, the points are merged. Each inserted element has a key that is its index, and the length of the segment it is part of. When the new point is not part of any segment, the point is inserted with the segment length of 1. If it links two segments, we only update the beginning point of the left segment and ending point of the right segment to a new length that includes both segments. If the point inserted is at the start or at the end of a segment, the length is the size of that segment plus one, and both that point and the point of the opposite side (start -> end, end -> start) is updated with the new length. We do not need to update any points in between the starts or ends of the segments because they will never come into contact with any newly added points. This algorithm is O(N) time.
. . . x n x . . .
. . x x n . . . .
. . . n x x . . .
. . . n . . x x .
where x is an existing element and n is the new element
This is useful for data structures when the range of the indexes can be huge. For example, modifying an element at index 1 and at index 1,000,000 requires an array of at least 1,000,000 elements, which is very wasteful. We can just sort the elements and when we need to access an element, we can binary search for it and use the result of the binary search as the index. A Hash Table can also be used for finding the compressed index from the original index. Note that coordinate compression is necessary for many problems that deal with harder data structures!
Binary Index Tree or Fenwick Tree
Binary Index Tree (BIT) or Fenwick Tree is very commonly used and a very simple algorithm for calculating prefix sums. A prefix sum of an index is the sum of all elements up to that index in an array, inclusive. This gives us a very simple formula for calculating the sum of all elements in a range: query(r) – query(l – 1). Query(i) gives the prefix sum of some index i. What about updating? BIT supports single element modifications such as adding some value to a single element or setting that element to a new value. All of this can be realized in O(log N) time. Very efficient!
A BIT is represented as an implicit tree. It uses an array and manipulates the indexes to do query and update operations. Each node in the tree stores the sum of its entire subtree, and update or query operations traverse only log n nodes in the tree. The space complexity is O(N), where n is the number of elements.
BIT can still do more! Not only can single element update and range query be accomplished, range update and single element query can be implemented. The modification is very simple. We want to add to each element in an interval, and query for the element at a specific index. To do this, a range’s start and end index is marked using the same exact update function. Querying the prefix sum of an index is actually querying the value at that index. This is basically the dynamic version of placing markers in an array, as the value at each node can be queried or updated at any time.
What about the reverse of the query operations discussed so far? Can the index be found from some query value? The answer is yes, but with one catch: the prefix sums must be monotonically increasing, so elements cannot be negative! That means that the range update, single element query method cannot be used with this. There are two ways to answer this query. One way is binary searching for a guess of the index and checking the prefix sum of that guess to see if the binary search should go up or down. This yields a time complexity of O(log^2 N). Another way is to directly search the BIT by traversing it, yielding a complexity of O(log N). Both the largest and the smallest index where the prefix sum equals that query value can be located.
BIT provides a nice way for implementing sets. Adding a number means placing a 1 at that number in the BIT. Removing a number means subtracting 1 from that number in the BIT. Note that the numbers are used as indexes. Querying for the number of elements less than a number is just calculating the prefix sum up to that number. Finding the k-th largest or smallest number can be accomplished using the reverse of the query operation. If the set’s numbers can be very large or very small, coordinate compression can be used to save space.
A BIT can be dynamically updated while going through something. This adds another dimension to the queries/updates. For example, the classical problem of finding inversions in an array can be solved using a BIT. An inversion is two elements such that arr[i] < arr[j] and i > j. Using BIT as a set, numbers can be added into the BIT and for each number, we can check how many numbers are greater than that number in the BIT. The BIT is persistent over all of the numbers. so the only numbers in the BIT when a certain number is being processed are the numbers that appear before the currently processing number.
My general template for BITs is available here.
I think I’m going to end it here, as this post is getting a bit too long. Don’t worry, the post on Segment Trees and Square Root Decomposition is coming up next! Thanks for reading!