Chapter 1 Introduction

1.1 Document-Structure

1.2 Motivation&Scope

Sorting is a fundamental operation in IT, in my eyes even more fundamental than hashing, see the empirical tests in Oehlschlägel and Silvestri (2012). Sorting emits CO2 by consuming electricity during operation and by requiring certain hardware that embodies CO2. The idea of greeNsort® is to bring CO2 savings to thousands of pieces of software and to billions of devices without a need to re-write software or a need to produce new hardware. The idea of greeNsort® is to reduce the CO2 during sorting and to reduce the required hardware, and to develop better sorting-algorithms that are so simple, that they can easily be implemented and replace inefficient algorithms that today are found in many central libraries of popular programming languages.

The core scope of greeNsort® is simple general binary sorting in contiguous space with Divide&Conquer algorithms that can be described by recursion. Why general? Because I don’t want to optimize just a special case, there are already some very efficient sorting-algorithms for special cases, for example radix-sorting algorithms. Why simple? Because simplicity enables quick low-cost implementation. Why binary? Because binary is simple, teaching sorting starts with binary and ends with k-ary; there are already some very efficient k-ary sorting-algorithms, but those are more complicated and pay-off only on hardware with particular properties. Why contiguous space? Because contiguous space is simple and algorithms operating in contiguous space can be designed to have good cache-properties; algorithms that operate in block-managed-memory1 can reduce hardware-memory requirements, but they are more complex and introduce random-access on block-level. Why Divide&Conquer? Because Divide&Conquer guarantees that the costs of sorting are limited at \(\mathcal{O}(log{N} \cdot f(N))\), for example \(\mathcal{O}(log{N} \cdot N)\) in the random-access-model. Why by recursion? Because recursion is an important mental method for developing algorithms, teaching algorithms, understanding algorithms and analyzing algorithms2.

Having said this, out-of-scope are for example in-place-merging such as Grailsort (complicated and slow), algorithms such as Funnelsort that leverage the continuous van-Emde-Boas-Layout (k-ary, complicated and not practical for N that is not a power of two), Lazy-Funnelsort (k-ary, not contiguous) or (I)n-place (P)arallel (S)uper (S)calar (S)ample(So)rt (IPS4o) (k-ary, complicated, not contiguous, not completely general, not yet stable). All of these have been academically successful, none of these has found widespread adoption in core-libraries. greeNsort® does not claim to beat these algorithms in their domain, although I expect that joining forces could create synergies, for example result in a stable IPS4o with still excellent CO2-properties.

1.3 Perspective on Innovation

greeNsort® is a web of interrelated intangible innovations with the intention to forever alter the way how sorting is taught and used. The changes are fundamental and deconstruct many prior art beliefs that stand since 1945 resp. 1961, hence from an academic perspective, it could be appropriate to classify greeNsort® as a Paradigm Change in the sense of Kuhn (1962). But could it be a Leapfrog Innovation? I believe greeNsort® is a leapfrog innovation, in spite of three aspects of leapfrog innovations that I believe are myths about leapfrog innovations. Let’s begin with common ground: Leapfrog innovations are innovations that change the world.

1.3.1 Are leapfrog innovations tangible products?

A common myth about leapfrog innovations is that they are tangible products. For example the list of published in the Atlantic The 50 Greatest Breakthroughs Since the Wheel mentions almost exclusively tangible innovations (45), with the exception of three borderline cases (paper money, electricity and the internet) and two clearly intangible innovations (alphabetization, the Gregorian calendar). Here is a counterexample missing from the list without the world today would not be possible: the logarithm, I could list hundreds of such intangible innovations that shaped the world as we know it today. Tangible innovations are just the visible part of the iceberg, they are born out of an interrelated web of intangible innovations that made tangible innovations possible. Leapfrog innovations often are intangible ideas that change the world.

1.3.2 Do leapfrog innovations create profitable markets?

The next myth about leapfrog innovations is that they create profitable markets and that they make private inventors rich. Here is a counterexample that that changed peoples lives towards more wealth: the invention of the three-field system did not create a market or made its inventor rich. The knowledge about three-field system was simply a public good. Leapfrog innovations, particularly intangible leapfrog innovations often create public goods and their inventors remain poor. Leapfrog innovations often have no business model and require upfront sponsoring or meritocratic rewards afterwards.

1.3.3 Are leapfrog innovations monolithic?

The last myth about leapfrog innovations is that they are single impressive innovations like a flashing idea of a genius. Here is a counterexample: the iPhone, not technically new, was rather a design driven, symbolic innovation, see Steffen (2010). Leapfrog innovations can result from a web of interrelated smaller innovations, where each of the smaller innovations does not look convincing, but in combination they change the rules of the game. As the iPhone did. And as greeNsort® can do, if supported.

1.3.4 The lackmus test

In 1998 John Chambers from Bell Labs was awarded the ACM Software System Award for The S system, which has forever altered how people analyze, visualize, and manipulate data. After the commercial S+ implementation of S we got the free open-source R interpreter, we got libraries that allow S-style statistical analysis in Python, and we got Julia, a language that tries to combine the best of many worlds (C++, R, Python, Matlab). Except for SQL, virtually anyone today doing statistical analyses is using one of R, Python, Julia3. I consider the S system a leapfrog innovation, although it is intangible, largely free and open-source and not a single monolithic invention. If you agree on this, you might well consider greeNsort® a leapfrog innovation (in line with its logo): greeNsort® is intangible, designed for the public good and is a composed web of interrelated innovations, that together enable more efficient, more sustainable algorithms.

1.4 Innovation-Taxonomy

In order to guide the reader through the greeNsort® web of innovations, I use the following symbols to classify the innovations:

Table 1.1: Innovation-classes and -counts
Type Symbol Level Count
paradigmatic change (*) implications beyond sorting 21
paradigmatic change D Definitions 1
paradigmatic change C Convictions (Core beliefs) 12
patentable methods B Bold techniques 20
patentable methods A Algorithms 21
patentable methods E Extended algorithms 9
patentable methods F Further algorithms 8
prior art T useful Techniques 12

Each section of this report that describe innovations is preceded by formal entries of those innovations and their dependencies to prior innovation or prior art. For the sake of simplicity - although the classification allows “Extended algorithms” - I have refrained from adding tuned versions of the algorithms to the dependency net. The complete web of innovations can be visualized as a (D)irected (A)cyclic (G)raph (DAG):

Figure 1.1: Dependencies of innovations (red), core-beliefs (green), methods (blue), algorithms (black), prior art dependencies (grey)

The greeNsort® innovations will be introduced step-by-step in the following sections.

  1. although with Jumpsort and Walksort greeNsort® has two particularly efficient algorithms that use random access to blocks of data (not strictly contiguous)↩︎

  2. notwithstanding the fact that each recursive algorithms can be re-written in loops↩︎

  3. Often using Jupyter notebooks: Project Jupyter’s name is a reference to the three core programming languages supported by Jupyter, which are Julia, Python and R, and also a homage to Galileo’s notebooks recording the discovery of the moons of Jupiter↩︎