If you haven’t read the previous part 1, please do so, or else some things will not make sense in this blog post! As always, if you do not understand something, just search it up. There are many sources online that do in-depth explanations much better than me. Also, my code is available here on GitHub.
Let’s get started with some cool data structures and algorithms!
Sqrt (Square Root) Decomposition
Imagine we have N numbers. We wish to partition these numbers into blocks in some way to improve range query and update speed. Let’s say we want to calculate the sum of all numbers in a range and also be able to increment all of the elements in a range by some value. Why do we need to partition those numbers into blocks? The answer is because we can lazily update or query a block by keeping the result for the entire block as a separate value for fast access. The theoretically best value for the size of the partitions is sqrt(N). It offers a balance between the number of blocks and the number of elements in each block. Note that sqrt(N) is rounded to get an integer block size.
For every query or update, the range of the update or query could fully include multiple blocks and also partially overlap with one or two blocks. The blocks that partially overlap with the range can be the first and the last block covered by the range. To find the sum of all of the blocks, the blocks that are fully covered by the range can be easily accessed using each block’s precalculated sum. A naive algorithm that loops through the elements covered by the interval in the first and last block can be used. When updating, the procedure is similar, but the sums and lazy tags for each block needs to be changed, too. The time complexity is O(sqrt(N)) for both operations.
If the value to be queried is more complicated, such as the number (count) of some element x, then an array, Hash Table, BIT, or some other data structure can be used for each block. However, using BIT or another type of tree can increase the query time complexity by O(log N).
The idea for Sqrt Decomposition can be applied to other problems, too. For example, it can provide an O(sqrt(N))) way to find some node’s ancestor in a tree. More information on that will have to wait until I write a post on trees.
My template for Sqrt Decomposition is available here.
Another way to partition an array of numbers is by using powers of two. For each query, the query segment can be greedily cut up into segments that have lengths that are powers of two. The precomputed value for each of those smaller segments can be combined to calculate the final result. Of course, the most important part is precomputing values for each exponent (power of two) at each index. Dynamic programming can be used to calculate that in O(N * log N) time by combining previously calculated segments. When querying, the time complexity is O(log N), because the longest precalculated segments are greedily selected. Of course, updating will require some parts of the DP array to be recalculated, so this method is only good for static queries.
Segment Tree is a generalization of BIT that can handle both range query and range update. It is much more flexible than BIT, although it is slightly harder to write. I will cover the vanilla Segment Tree, lazy propagation, some ideas of “Segment Tree Beats,” and persistent Segment Tree. I will not discuss the exact implementation as there are many sources online. Some code for these data structures can also be found on my GitHub.
I will sometimes call the segments nodes, as each node represents a segment.
The classical segment tree implementation allows for range queries and point updates. Unlike BITs, the query can be almost any binary function. Sum, max, min, GCD, LCM, AND, XOR, etc. are all acceptable functions. The update operation can be set, increment, max, min, etc. Note that the segment tree is saved within an array of size 4*N, so it takes up linear space. The time complexity is O(log N) for both query and update. If necessary, each node can save multiple values.
One great thing about the vanilla version is that each node in the segment tree can be some other data structure. The data structure of each node can be an array, a Hash Table, another Segment Tree, a BIT, or some sorted tree structure. If an element is only saved/referenced within the nodes whose segment includes that element, then the space complexity is only O(N * log N). That is because each element can only belong to at most log N nodes in the tree. Using a structure with log N access time at each node, the query operation is O(log^2 N). While updating an element, every node containing that element must be changed, also yielding O(log^2 N) time.
Using Segment Tree with a sorted array (static, O(log N)) or tree structure (dynamic, O(log^2 N)) can answer queries that ask for how many elements are less than or greater than a threshold. That can be accomplished using binary search or just traversing the trees saved at the nodes that are covered by the range queried depending on the data structures used. This is commonly called a merge sort tree due to its likeness with the merging operation in merge sort. However, the data structures at each node shouldn’t actually be merged because that takes too long.
Sometimes, each node does not need to save all of the data that resides within its segment. A queue or another data structure at each node can allow that node to save a subset of the data of its children by dropping a few while merging, to maintain a constant queue size. This can be used to query for a few of the largest or smallest values at once. Note that if the queue size is too big, then merging the queues during the update/query operations will take too long.
Here is my general template.
This method, which is sometimes also known as lazy tag, allows range updates in O(log N) time. Each node must have a lazy tag value that will only be propagated to its children nodes when necessary. The lazy tags indicate that some value needs to be pushed down to a node’s children when necessary. Otherwise, too many nodes will have to be visited during the update operation. Both query and update operations have to propagate the lazy tag to the children and update the nodes with the lazy tag values, unlike vanilla Segment Trees. There is a downside to having lazy tags: the value at each node must be able to be calculated without looking at the children first. That means that updates that have a condition, such as max or min are much harder to implement. When updating a node or evaluating a lazy tag, the node might need to factor in the entire range it represents.
My general template is available here.
Isolating the Index
In some problems, the update equations is not as simple as modifying everything equally within a range. Usually, there will be an equation for how to update everything within a range based on the index of the updated item. An important step in solving these types of problems is factoring out the index for each update in the equation. That index can be computed and incorporated into the segment tree for the updates. More generally, if an equation is given for queries and updates, the equation can usually be manipulated to be calculated by saving one or more different values in the segment tree. The challenging part is figuring out how to manipulate the equation and how to handle lazy propagation, if necessary.
Traversing the Tree
A Segment Tree can be traversed just like any binary tree. One use for this is finding the element with a specific prefix sum in O(log N) time. For each node going to the left child means that the current prefix sum is too large, otherwise we go to the right child and add the left child’s sum. If the segment tree is used to keep track of empty and filled slots, then traversing the tree can help find the first or last occurrence of a contiguous interval of filled or empty slots. To do this, each node must keep track of the longest contiguous interval it contains and the longest prefix/suffix intervals. This is necessary when merging two intervals to calculate the 3 values for the parent node. The max could be the sum of the longest suffix of the left child and the longest prefix of the right child.
Like the BITs, binary search and repeatedly querying the segment tree can also be done, but it is slower than actually traversing the tree.
This was a cool technique I read about, which adds to the lazy tag and return conditions when updating. The goal is to do some conditional updates, such as max, min, or mod, along with lazy propagation. The condition to place the lazy tag (if a segment is completely in the update range) and return (if the segment is completely outside of the update range) can be modified to take into account extra information stored in the Segment Tree to realize some harder updates.
One example is calculating the mod of each element in a range by some other value. The segment tree needs to contain two extra fields: the max and the min value for each segment. The return condition is simple. If a node’s max value is less than the value to mod by, then there is no reason to go any further. If the max value and the min value for a node equals, then that segment has all of the same values and it should be bulk updated to be modded the mod value.
To do min or max updates, where each node gets set to max(v, x) or min(v, x) (x represents a update parameter and v represents the value for a certain segment), the second largest value or min value also needs to be saved. The number of elements with the max value or min value for each segment is also very important in determining how much to change the values at each node by and they should be kept. Let’s discuss min updates first. The return condition for min updates is if the max value in that segment is less than or equal to x. The tag/update condition is if the second largest value is less than x. That means that there will only be an update if the only value in a segment that is greater than to x is the max value. All of the max values will be changed to x afterwards due to the min operation. For max updates, the process is similar, except with max and min swapped.
Persistent Segment Tree
Just like any other persistent data structure, persistent Segment Trees keep immutable elements instead of elements that can be directly changed after every update. This provides a source control option that can be used for many different purposes. However, that does not mean that the Segment Tree cannot be efficiently updated. In fact, the time complexity is the same compared to regular Segment Trees! Lazy propagation can also be applied to persistent Segment Trees for range updates.
Let’s go over the general ideas of persistent Segment Trees. Each update operation should only modify around log N nodes or segments, or else we cannot keep the O(log N) time complexity. Also, instead of using a 4 * N sized array to represent the tree, actual nodes are used. These nodes can be represented by actual class objects, or allocated in a 8 * 31 * N (slightly more than N * log N) sized array. The array method is preferred because it is practically faster than using objects. Instead of changing each individual node’s value, new nodes are created. After updating, a pointer (some index in the large array or an actual object) to the root node needs to be returned so it can be kept for later. That pointer acts as a version ID for the state of the Segment Tree. If the tree is changed again, a new pointer (version ID) is created. The query operation mostly stays the same. When querying and updating, the version needs to be passed in along with other parameters. The query will only access the elements in that version, and the update operation will only build upon that version’s state. Lazy propagation just requires lazy tags to be stored along with a Segment Tree’s nodes’ values, similar to lazy propagation on regular Segment Trees. The time complexity is O(log N) and the space complexity is O(N * log N) for either algorithm.
The versioning system on persistent Segment Trees can be applied to some less obvious cases. For example, finding the number of elements less than or greater than a threshold x in a range, l to r, can easily be accomplished using persistent Segment Trees. By borrowing some ideas from range queries with BITs, the number of elements less than x in a range is just less_than_x(r, x) – less_than_x(l – 1, x). Here, less_than_x returns the number of elements less than x from indexes 0 to i (l – 1 or r). The actually calculate less_than_x, we can imagine each element in the array being queried as a Segment Tree. Each Segment Tree represents the frequency of each value that appears in the array from 0 up to its index. Querying for the number of elements less than x (in range 0 to i) is basically querying for the sum up to index x in the Segment Tree at index i. The last problem is how we should build the N Segment Trees. Because we know each neighboring pair of indexes in the array only differs by at most log N nodes in their Segment Trees, persistent Segment Trees can be used create a new version of the tree at each index in the array that is based on the version at the previous index. This algorithm only consumes O(N * log N) time, compared to the O(N * log^2 N) algorithm that uses merge sort trees.
Another interesting use for persistent Segment Trees is for solving some queries on n-ary trees (trees that can have any number of children nodes). This is very similar to using persistent Segment Trees on a linear array. In this case, we need a Segment Tree at each node in the actual tree. Since each node only differs by around log N nodes to that node’s parent, each node can be represented by a version in the persistent Segment Tree. Of course, the tree needs to be rooted first if a root is not provided.
Persistent Segment Trees also allows us to swap nodes in a range from some version A to nodes in some other version B. This is very simple, as the pointers in the range in version A are updated to the pointers in version B.
Vanilla persistent Segment Tree: here.
Persistent Segment Tree with lazy propagation: here.
Well, that is it for my ranges/intervals posts! I hope you learned something! Stay tuned for my next article and thanks for reading!