# Chapter 8 Incremental sorting

```
innovation("Incremental-Zacksort","E","Amortized Incremental-Zacksort")
dependency("Incremental-Zacksort","Zacksort")
innovation("Zackheaps","E","Symmetric (tie-handling) Quickheaps")
dependency("Zackheaps","Incremental-Zacksort")
innovation("Incremental-Frogsort","E","Amortized Incremental-Frogsort")
dependency("Incremental-Frogsort","Frogsort")
innovation("Frogsteps","E","Symmetric stable dictionary")
dependency("Frogsteps","Incremental-Frogsort")
```

So far we have looked at basic batch-sorting, where the set of elements to be sorted is known before sorting starts. *Incremental sorting* deals with expanding sorted sets as data arrives, potentially one element at a time (*Online sorting*).

## 8.1 Incremental insertion

The best-known Online sorting algorithm is *Insertionsort*, which by definition does exactly this: it integrates a single element into a sorted set. Unfortunately, Insertionsort has \(\mathcal{O}(N^2)\) cost, even if *binary search* is used to find the position in the sorted set at which to insert the new element (due to the \(\mathcal{O}(N)\) move cost per inserted element). No algorithm is known that can expand a contiguous sorted set by a single element at only \(\mathcal{O}(\log{N})\) cost. All \(\mathcal{O}(N\log{N})\) incremental sorting algorithms somehow relax the requirement that a sorted set is maintained in contiguous space at all times (after each single-element expansion). An example of relaxing the requirement of contiguous storage is *indexing*, such as *binary trees*. An example of relaxing the requirement that the set is sorted at all times is *amortized sorting*, which collects the changes and does the real work in batches.
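The asymmetry between search cost and move cost can be seen directly with Python's standard library: a minimal illustration (not greeNsort code) of expanding a contiguous sorted list by one element.

```python
import bisect

def insert_sorted(a, x):
    """Expand the contiguous sorted list a by one element x.
    bisect_left finds the position in O(log N) comparisons, but
    list.insert still shifts all following elements: O(N) moves."""
    i = bisect.bisect_left(a, x)
    a.insert(i, x)

a = []
for x in [5, 1, 4, 1, 3]:
    insert_sorted(a, x)
# a is now [1, 1, 3, 4, 5]
```

Over \(N\) insertions the shifting dominates, which is exactly the \(\mathcal{O}(N^2)\) total cost of Insertionsort.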

An example of a combined strategy is *Librarysort* (Bender, Farach-Colton, and Mosteiro (2006)), a variant of *Insertionsort* that keeps random gaps between the sorted elements (hence not contiguous). With enough random gaps, inserting an element requires only local moves and hence \(\mathcal{O}(1)\) insertion cost, like inserting a book into a book-shelf. Once the number of gaps gets too low, physical rearrangement in a larger piece of memory costs \(\mathcal{O}(N)\), which amortizes to \(\mathcal{O}(N\log{N})\) in total. In spite of everyone's experience with book-shelves, this algorithm was discovered as late as 2006, and it has the underappreciated ability to sort elements of varying size.
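The gap mechanism can be sketched in a few lines. The following toy class is an illustration of the principle only, not the published Librarysort (which uses binary search over the gapped layout and randomized, load-balanced rebuilds); the class name and methods are mine.

```python
class GappedArray:
    """Toy Librarysort-style structure: sorted elements interleaved
    with gaps (None). An insertion shifts elements only up to the
    nearest gap; when no gap is reachable, the array is rebuilt with
    fresh gaps at O(N) cost, which amortizes over the many cheap
    insertions in between."""

    def __init__(self):
        self.slots = [None]

    def _rebuild(self):
        items = [x for x in self.slots if x is not None]
        self.slots = []
        for x in items:                # one gap after every element
            self.slots.extend([x, None])
        self.slots.append(None)

    def insert(self, x):
        # find the insertion point (linear scan for brevity)
        i = 0
        while i < len(self.slots):
            y = self.slots[i]
            if y is not None and y >= x:
                break
            i += 1
        while i > 0 and self.slots[i - 1] is None:
            i -= 1                     # step back onto a preceding gap
        # shift elements into the nearest gap at or after i
        j = i
        while j < len(self.slots) and self.slots[j] is not None:
            j += 1
        if j == len(self.slots):       # no gap left: rebuild and retry
            self._rebuild()
            self.insert(x)
            return
        while j > i:
            self.slots[j] = self.slots[j - 1]
            j -= 1
        self.slots[i] = x

    def sorted(self):
        return [x for x in self.slots if x is not None]
```

After a rebuild every element is followed by a gap, so each insertion triggers at most one rebuild.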

A classic algorithm for real-time applications is the maintenance of a *heap*, which integrates a new element at a cost of only \(\mathcal{O}(\log{N})\), but sacrifices contiguous sorting (limited usefulness) and sacrifices total speed due to heavy random access.
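With Python's standard `heapq` this trade-off is visible in a few lines: insertion is cheap, but the structure is only partially ordered.

```python
import heapq

h = []
for x in [5, 1, 4, 2, 3]:
    heapq.heappush(h, x)   # O(log N) per insertion

# the heap is not a contiguous sorted set: only h[0] is guaranteed
# to be the minimum; draining it yields the elements in sorted order
out = [heapq.heappop(h) for _ in range(len(h))]
# out == [1, 2, 3, 4, 5]
```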

## 8.2 Incremental divide&conquer

A generic strategy for obtaining an amortized incremental *Divide&Conquer* algorithm is to convert it into a data-structure on which the algorithm can be paused and resumed. Once that is available, it is possible to run it with interruptions and inject new data at the position where the algorithm would naturally pick it up first. With this strategy each batch-algorithm has its incremental dual.

For example, the incremental dual of *Quicksort* is the *Incremental-Quicksort* of Navarro and Paredes (2010) with \(\mathcal{O}(N\log{N})\) amortized cost. By applying the *greeNsort*^{®} innovations one obtains an *Incremental-Zacksort*^{21} with \(\mathcal{O}(N\log{D})\) amortized cost, i.e. cheaper when duplicates reduce the number of distinct values such that \(D < N\). Navarro and Paredes (2010) turned *Incremental-Quicksort* into a fully-fledged *priority queue* named *Quickheaps*. By combining with *Incremental-Zacksort* one obtains *Zackheaps* with efficient tie-handling^{22}.
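The pause-and-resume idea behind Incremental-Quicksort can be sketched as follows: the stack of pivot positions is the suspended state of Quicksort, and each extraction resumes partitioning just far enough to fix the next smallest element. This is a minimal sketch of the idea, not the optimized algorithm of Navarro and Paredes; class and function names are mine.

```python
import random

def partition(a, lo, hi):
    """Lomuto partition of a[lo:hi] around a random pivot;
    returns the pivot's final index."""
    p = random.randrange(lo, hi)
    a[p], a[hi - 1] = a[hi - 1], a[p]
    pivot = a[hi - 1]
    i = lo
    for j in range(lo, hi - 1):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi - 1] = a[hi - 1], a[i]
    return i

class IncrementalQuicksort:
    """Sketch of Incremental-Quicksort: the pivot stack records the
    suspended state of Quicksort, so each call to next() partitions
    only the leftmost unsorted segment until a pivot lands at the
    current output position. Amortized O(log N) per extraction."""

    def __init__(self, a):
        self.a = a
        self.i = 0                 # next output position
        self.stack = [len(a)]      # sentinel pivot beyond the array

    def next(self):
        while self.stack[-1] != self.i:
            p = partition(self.a, self.i, self.stack[-1])
            self.stack.append(p)
        self.stack.pop()
        x = self.a[self.i]
        self.i += 1
        return x
```

Extracting all \(N\) elements this way costs \(\mathcal{O}(N\log{N})\) in total; extracting only the smallest \(k\) costs \(\mathcal{O}(N + k\log{k})\) in expectation.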

Of course *Quickheaps* and *Zackheaps* inherit all weaknesses of *Quicksort* (see Split&Merge Algorithms). The incremental dual of *Frogsort* – *Incremental-Frogsort* – by contrast is *stable*, can handle elements of varying size and has deterministic \(\mathcal{O}(N\log{N})\) amortized cost^{23}. Different from an *incremental Mergesort* that uses up to 100% buffer for maintaining a hierarchy of sorted runs, *Incremental-Frogsort* needs only 50% buffer. Different from *Librarysort* with its randomized gaps, *Incremental-Frogsort* has deterministic guarantees due to its *(F)lip (R)ecursive (O)rganized (G)aps (FROG)*.
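For contrast, the classic incremental Mergesort with its hierarchy of sorted runs can be sketched like a binary counter: run \(k\) holds \(2^k\) elements or is empty, and an insertion merges runs like carry propagation. This sketch illustrates the traditional structure *Incremental-Frogsort* is compared against, not greeNsort's algorithm; the class name is mine.

```python
def merge(a, b):
    """Stable two-way merge of sorted lists a and b (a is older)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if b[j] < a[i]:
            out.append(b[j]); j += 1
        else:
            out.append(a[i]); i += 1
    out.extend(a[i:]); out.extend(b[j:])
    return out

class IncrementalMergesort:
    """Classic amortized incremental Mergesort: runs[k] is either
    None or a sorted run of 2**k elements. Inserting works like
    incrementing a binary counter: carries trigger merges, giving
    O(log N) amortized comparisons and moves per insertion."""

    def __init__(self):
        self.runs = []

    def insert(self, x):
        carry = [x]
        k = 0
        while k < len(self.runs) and self.runs[k] is not None:
            carry = merge(self.runs[k], carry)   # stable: old before new
            self.runs[k] = None
            k += 1
        if k == len(self.runs):
            self.runs.append(None)
        self.runs[k] = carry

    def sorted(self):
        out = []
        for run in self.runs:
            if run is not None:
                out = merge(out, run)
        return out
```

In the worst case the merge buffers of this hierarchy approach 100% of the data size, which is the comparison point for *Incremental-Frogsort*'s 50% buffer.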

Like *incremental Mergesort*, *Incremental-Frogsort* can be turned into a fully-fledged *sorted dictionary* named *Frogsteps*^{24}. Note that the maximum memory requirement of *Frogsteps* is lower at 150% (compared to 200%), but the average memory requirement is not lower: when growing the structure to the next doubling, the traditional method must allocate 2x the current data size, whereas *Frogsteps* must allocate 3x the current data size. Furthermore, *Frogsteps* needs an extra copy to migrate from the old to the grown memory, whereas the traditional method can *merge* into the grown memory and hence is always *nocopy*. But size doubling is a rare event in the lifetime of such a dictionary; during normal operation, *Frogsteps* benefits from reduced comparisons and move-distance.

Such merge-based dictionaries do more work at insertion time than partition-based priority queues, but need less work for many queries; in other words, merge-based dictionaries have better read-amplification and partition-based priority queues have better write-amplification.

Such amortized data-structures and algorithms can further be developed into real-time indexes, such as *Lazy Search Trees* (Sandlund and Wild (2020)) or *LSM-Trees* (O’Neil et al. (1996)).

## 8.3 Incremental conclusion

The purpose of this section was to demonstrate that the *greeNsort*^{®} innovations are orthogonal to a large body of existing literature and code on binary *Divide&Conquer* sorting. Potentially all of these existing artefacts of the prior art could be checked and adapted for the symmetric *greeNsort*^{®} innovations. This could duplicate the number of these artefacts. Marketing on: *greeNsort*^{®} is a contribution to exponential growth in scientific publications. Marketing off: realistically, *greeNsort*^{®} is a contribution to correcting and optimizing, rather than duplicating, human knowledge, most importantly when it comes to textbooks on data-structures and algorithms. But still: a fundamental and huge tidy-up project in human knowledge. Plus: I announce yet another plan for a new incremental sorting algorithm.