From 0031f748d229a8926ee8bda39acfcdecf5ec9e8e Mon Sep 17 00:00:00 2001
From: William Clodius <w.clodius@icloud.com>
Date: Sat, 10 Apr 2021 21:23:43 -0600
Subject: [PATCH 01/18] Start the addition of the module stdlib_sorting

Added the draft markdown documentation file, stdlib_sorting.md.

[ticket: X]
---
 doc/specs/stdlib_sorting.md | 526 ++++++++++++++++++++++++++++++++++++
 1 file changed, 526 insertions(+)
 create mode 100644 doc/specs/stdlib_sorting.md
diff --git a/doc/specs/stdlib_sorting.md b/doc/specs/stdlib_sorting.md
new file mode 100644
index 000000000..b843b89be
--- /dev/null
+++ b/doc/specs/stdlib_sorting.md
@@ -0,0 +1,526 @@
+---
+title: Sorting Procedures
+---
+
+# The `stdlib_sorting` module
+
+(TOC)
+
+## Overview of sorting
+
+The sorting of collections of data is useful in the analysis of those
+collections.
+With its absence of generics and limited polymorphism, it is
+impractical, in current Fortran, to provide sorting routines for
+arbitrary collections of arbitrary types of data.
+However Fortran's arrays are by far its most widely used collection,
+and arrays of arbitrary types of data can often be sorted in terms of
+a single component of intrinsic type.
+The Fortran Standard Library therefore provides a module,
+`stdlib_sorting`, with procedures to sort arrays of single intrinsic
+numeric types, i.e. the different kinds of integers and reals.
+
+## Overview of the module
+
+The module `stdlib_sorting` defines several public entities one
+default integer parameter, `int_size`, and three overloaded
+subroutines: `ORD_SORT`, `UNORD_SORT`, and `ORD_SORTING`. The
+overloaded subroutines also each have seven specific names for
+versions corresponding to differend types of array arguments.
+
+### The `int_size` parameter
+
+The `int_size` parameter is used to specify the kind of integer used
+in indexing the various arrays. Currently the module sets `int_size`
+to the value of `int64` from the intrinsic `ISO_FORTRAN_ENV` module.
+For many applications a value of `INT32` would be sufficient for
+addressing and would save some stack space for the subroutines,
+
+### The module subroutines
+
+The `stdlib_sorting` module provides three different overloaded
+subroutines intended to sort three different kinds of arrays of
+data:
+* `ORD_SORT` is intended to sort simple arrays of intrinsic data
+  that have significant sections that were partially ordered before
+  the sort; and
+* `ORD_SORTING` is intended to provide indices for sorting arrays of
+  derived type data, based on the ordering of an intrinsic component
+  of the derived type.
+* `UNORD_SORT` is intended to sort simple arrays of intrinsic data 
+  that are effectively unordered before the sort; 
+
+#### The `ORD_SORT` subroutine
+
+`ORD_SORT` is a translation of the [`rust sort` sorting algorithm]
+(https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs)
+which in turn is inspired by the [`timsort` algorithm of Tim Peters]
+(http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+`ORD_SORT` is a hybrid stable comparison algorithm combining `merge sort`,
+and `insertion sort`. It has always at worst O(N Ln(N)) runtime
+performance in sorting random data, having a performance about 15-25%
+slower than `UNORD_SORT` on such data. However it has much better
+performance than `UNORD_SORT` on partially sorted data, having O(N)
+performance on uniformly increasing or decreasing data.
+
+The [`rust sort` implementation]
+(https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs)
+is distributed with the header:
+
+    Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
+    file at the top-level directory of this distribution and at
+    http://rust-lang.org/COPYRIGHT.
+
+    Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
+    http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+    <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
+    option. This file may not be copied, modified, or distributed
+    except according to those terms.
+
+so the license for the original code is compatible with the use of
+modified versions of the code in the Fortran Standard Library under
+the MIT license.
+
+As with `timsort`, `ORD_SORT` is a stable hybrid algorithm.
+It begins by traversing the array starting in its tail attempting to
+identify `runs` in the array, where a run is either a uniformly
+decreasing sequence, `ARRAY(i-1) > ARRAY(i)`, or non-decreasing,
+`ARRAY(i-1) <= ARRAY(i)`, sequence. Decreasing sequences are reversed.
+Then, if the sequence has less than `MIN_RUN` elements, previous
+elements in the array are added to the run using `insertion sort`
+until the run contains `MIN_RUN` elements or the array is completely
+processed. As each run is identified the start and length of the run
+is then pushed onto a stack and the stack is then processed using
+`merge` until it obeys the stack invariants:
+
+1. len(i-2) > len(i-1) + len(i)
+2. len(i-1) > len(i)
+
+ensuring that processing the stack is, at worst, of order `O(N
+Ln(N))`. However, because of the identification of decreasing and
+non-decreasing runs, processing of structured data can be much faster,
+with processing of uniformly decreasing or non-decreasing arrays being
+of order O(N). The result in our tests is that `ORD_SORT` is about
+15-25% slower than `UNORD_SORT` on purely random data, depending on
+the compiler, but can be more than an order of magnitude faster than
+`UNORD_SORT` on highly structured data. As a modified `merge sort`,
+`ORD_SORT` requires the use of a "scratch" array, that may be provided
+as an optional `work` argument or allocated internally on the stack.
+
+#### The `ORD_SORTING` subroutine
+
+The `UNORD_SORT` and `ORD_SORT` subroutines can sort isolated arrays
+of intrinsic types, but do nothing for the sorting of arrays of
+derived types. For arrays of derived types what is useful is an array
+of indices that maps the original array to an array sorted based on the
+value of a component of the derived type. For such a sort, a stable
+sort is useful, therefore the module provides a subroutine,
+`ORD_SORTING`, that generates such an array of indices based on
+the `ORD_SORT` algorithm.
+
+As `ORD_SORT` is also based on the `rust sort` algorithm the `rust
+sort` license must be acknowledged:
+
+    Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
+    file at the top-level directory of this distribution and at
+    http://rust-lang.org/COPYRIGHT.
+
+    Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
+    http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+    <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
+    option. This file may not be copied, modified, or distributed
+    except according to those terms.
+
+noting that the Fortran Standard Library is released under the MIT
+license so that incorporation of the `rust sort` algorithm is
+compatible with its license.
+
+The logic of `ORD_SORTING` parallels that of `ORD_SORT`, with
+additional housekeeping to keep the array of indices consistent with
+the sorted positions of the input array. Because of this additional
+housekeeping it has slower runtime performance than `ORD_SORT`.
+`ORD_SORTING` requires the use of two "scratch" arrays, that may be
+provided as optional `work` and `iwork` arguments or allocated
+internally on the stack.
+
+#### The `UNORD_SORT` subroutines 
+
+`UNORD_SORT` uses the [`introsort` sorting algorithm of David Musser]
+(http://www.cs.rpi.edu/~musser/gp/introsort.ps). `introsort` is a hybrid 
+unstable comparison algorithm combining `quicksort`, `insertion sort`,
+and `heap sort`. While this algorithm's runtime performance is always
+O(N Ln(N)), it is relatively fast on randomly ordered data, but
+inconsistent in performance on partly sorted data. David Musser has
+given permission to include a variant of `introsort` in the Fortran
+Standard Library under the MIT license provided we cite: 
+
+    Musser, D.R., “Introspective Sorting and Selection Algorithms,”
+    Software—Practice and Experience, Vol. 27(8), 983–993 (August 1997). 
+
+as the official source of the algorithm. 
+
+As with `introsort`, `UNORD_SORT` is an unstable hybrid algorithm.
+First it examines the array and estimates the depth of recursion a
+quick sort would require for ideal (random) data, `D =
+Ceiling(Ln(N)/Ln(2))`. It then defines a limit to the number of
+quicksort recursions to be allowed in processing,
+`D_limit = factor * D`, where factor is currently 2, and
+calls `introsort` proper. `introsort` proper then:
+
+1. Examines the number of elements remaining to be sorted, and, if
+   they are less than 16, sorts them using insertion sort and returns;
+2. If they are not less than 16, checks whether the current depth of
+   recursion exceeds `D_limit` and, if it does, processes the remaining
+   elements with heap sort and returns;
+3. If the current depth of recursion does not exceed `D_limit`, then
+   in effect does a quicksort step:
+
+    * Partitions the remaining array using a median of three,
+    * Calls introsort proper on the leftmost partition,
+    * Calls introsort proper on the rightmost partition, and then
+      returns.
+
+The resulting algorithm is of order O(N Ln(N)) run time
+performance for all inputs. Because it relies on `quicksort`, the
+coefficient of the O(N Ln(N)) behavior is typically small compared to
+other sorting algorithms on random data. On partially sorted data it
+can show either slower `heap sort` performance, or enhanced
+performance by up to a factor of six. Still, even when it shows
+enhanced performance, its performance on partially sorted data is
+typically an order of magnitude slower then `ORD_SORT`.
+
+### Tentative specifications of the `stdlib_sorting` procedures
+
+#### `ord_sort` - sorts an input array
+
+##### Status
+
+Experimental
+
+##### Description
+
+Returns an input `array` with the elements sorted in order of
+increasing value.
+
+##### Syntax
+
+`call [[stdlib_sorting(module):ord_sort(subroutine)]]ord_sort ( array[, work ] )`
+
+##### Class
+
+Generic subroutine.
+
+##### Arguments
+
+`array` : shall be a rank one array of any of the types:
+`integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`,
+`real(real32)`, `real(real64)`, or `real(real128)`. It is an
+`intent(inout)` argument. On input it is the array to be sorted. If
+both the type of `array` is real and at least one of the elements is a
+`NaN`, then the ordering of the result is undefined. Otherwise on
+return its elements will be sorted in order of non-decreasing value.
+
+`work` (optional): shall be a rank one array of any of the same type as
+array, and shall have at least `size(array)/2` elements. It is an
+`intent(inout)` argument. It is intended to be used as "scratch"
+memory for internal record keeping. If associated with an array in
+static storage, its use can significantly reduce the stack memory
+requirements for the code. Its contents on return are undefined.
+
+##### Notes
+
+`ORD_SORT` implements a hybrid sorting algorithm combining
+`merge sort`, and `insertion sort`. For most purposes it behaves like
+a `merge sort`, providing worst case `O(N Ln(N))` run time performance
+for most random arrays, that is typically slower than `UNORD_SORT`.
+However, if the array has significant runs of decreasing or
+non-decreasing values, performance can be much better than
+`UNORD_SORT`, with `O(N)` behavior on uniformly decreasing, or
+non-decreasing arrays. The optional `work` array replaces "scratch"
+memory that would otherwise be allocated on the stack. If `array` is of
+any type `REAL` the order of its elements on return undefined if any
+element of `array` is a `NaN`.
+
+
+##### Example
+
+```Fortran
+	...
+	! Read arrays from sorted files
+	call read_sorted_file( 'dummy_file1', array1 )
+	call read_sorted_file( 'dummy_file2', array2 )
+	! Concatenate the arrays
+	allocate( array( size(array1) + size(array2) ) )
+	array( 1:size(array1) ) = array1(:)
+	array( size(array1)+1:size(array1)+size(array2) ) = array2(:)
+	! Sort the resulting array
+	call ord_sort( array, work )
+	! Process the sorted array
+	call array_search( array, values )
+	...
+```
+
+#### `ord_sorting` - creates an arry of sorting indices for an input array.
+
+##### Status
+
+Experimental
+
+##### Description
+
+Returns an integer array whose elements would sort the input array in
+the specified direction retaining order stability.
+
+##### Syntax
+
+`call [[stdlib_sorting(module):ord_sorting(subroutine)]]ord_sorting ( array, index[, work, iwork, reverse, zero_based ] )`
+
+##### Class
+
+Generic subroutine.
+
+##### Arguments
+
+`array`: shall be a rank one array of any of the types:
+`integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`,
+`real(real32)`, `real(real64)`, or `real(real128)`. It is an
+`intent(inout)` argument. On input it will be an array whose sorting
+indices are to be determined. On return it will be the sorted
+array.
+
+`index`: shall be a rank one integer array of kind `int_size` and of
+the size of `array`. It is an `intent(out)` argument. On return it
+shall have values that are the indices needed to sort the original
+array in the desired direction.
+
+`work` (optional): shall be a rank one array of any of the same type as
+array, and shall have at least `size(array)/2` elements. It is an
+`intent(inout)` argument. It is intended to be used as "scratch"
+memory for internal record keeping. If associated with an array in
+static storage, its use can significantly reduce the stack memory
+requirements for the code. Its contents on return are undefined.
+
+`iwork` (optional): shall be a rank one integer array of kind
+`int_size`, and shall have at least `size(array)/2` elements. It
+is an `intent(inout)` argument. Its contents on return are undefined.
+
+`reverse` (optional): shall be a scalar of type default logical. It
+is an `intent(in)` argument. If present with a value of `.true.` then
+`index` will sort `array` in order of non-increasing values in stable
+order. Otherwise index will sort `array` in order of non-decreasing
+values in stable order.
+
+`zero_based` (optional): shall be a scalar of type default logical.
+It is an `intent(in)` argument. If present with the value `.true.`
+the values of `index` will be for zero based array indexing,
+otherwise the values of index will be for one's based array
+indexing.
+
+##### Notes
+
+`ORD_SORTING` implements the hybrid sorting algorithm of `ORD_SORT`,
+keeping the values of `index` consistent with the elements of `array`
+as it is sorted. As a `merge sort` based algorithm, it is a stable
+sorting comparison algorithm. The optional `work` and `iwork` arrays
+replace "scratch" memory that would otherwise be allocated on the
+stack. If `array` is of any kind of `REAL` the order of the elements in
+`index` and `array` on return are undefined if any element of `array`
+is a `NaN`. It should be emphasized that the order of `array` will
+typically be different on return.
+
+
+##### Example
+
+```Fortran
+	subroutine sort_a_data( a_data, a, work, index, iwork )
+	    ! Sort `a_data` in terms or its component `a`
+	    type(a_type), intent(inout)      :: a_data(:)
+		integer(int32), intent(inout)    :: a(:)
+		integer(int32), intent(inout)    :: work(:)
+		integer(int_size), intent(inout) :: index(:)
+		integer(int_size), intent(inout) :: iwork(:)
+		! Extract a component of `a_data`
+		a(1:size(a_data)) = a_data(:) % a
+		! Find the indices to sort the component
+		call ord_sorting(a(1:size(a_data)), index(1:size(a_data)),&
+		    work(1:size(a_data)/2), iwork(1:size(a_data)/2))
+		! Sort a_data based on the sorting of that component
+		a_data(:) = a_data( index(1:size(a_data)) )
+	end subroutine sort_a_data
+```
+
+#### `unord_sort` - sorts an input array
+
+##### Status
+
+Experimental
+
+##### Description
+
+Returns an input array with the elements sorted in order of increasing
+value.
+
+##### Syntax
+
+`call [[stdlib_sorting(module):unord_sort(subroutine)]]unord_sort ( array )`
+
+##### Class
+
+Pure generic subroutine.
+
+##### Arguments
+
+`array` : shall be a rank one array of any of the types:
+`integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`,
+`real(real32)`, `real(real64)`, or `real(real128)`. It
+is an `intent(inout)` argument. On return its input elements will be
+sorted in order of non-decreasing value.
+
+##### Notes
+
+`UNORD_SORT` implements a hybrid sorting algorithm combining
+`quicksort`, `merge sort`, and `insertion sort`. For most purposes it
+behaves like a `quicksort` with a median of three partition, providing
+good, `O(N Ln(N))`, run time performance for most random arrays, but
+defaulting to `merge sort` if the structure of the array results in
+the `quicksort` not converging as rapidly as expected. If `array` is of
+any type `REAL`, the behavior of the sorting is undefined if any
+element of `array` is a `NaN`.
+
+##### Example
+
+```Fortran
+	...
+	! Read random data from a file
+	call read_file( 'dummy_file', array )
+	! Sort the random data
+	call unord_sort( array )
+	! Process the sorted data
+	call array_search( array, values )
+	...
+```
+
+#### Specific procedures
+
+Usually the name of a generic procedure is the most convenient way of
+invoking it. However sometimes it is necessary to pass a procedure as
+an argument to another procedure. In that case it is usually necessary
+to know the name of the specific procedure desired. The following
+table lists the specific subroutines and the corresponding types of
+their `array` arguments.
+
+| Generic Subroutine | Specific Subroutine | Array type      |
+|--------------------|---------------------|-----------------|
+| `ORD_SORT`         | `INT8_ORD_SORT`     | `INTEGER(INT8)` |
+|                    | `INT16_ORD_SORT`    | `INTEGER(INT16)` |
+|                    | `INT32_ORD_SORT`    | `INTEGER(INT32)` |
+|                    | `INT64_ORD_SORT`    | `INTEGER(INT64)` |
+|                    | `SP_ORD_SORT`    | `REAL(REAL32)` |
+|                    | `DP_ORD_SORT`    | `REAL(REAL64)` |
+|                    | `QP_ORD_SORT`    | `REAL(REAL128)` |
+| `ORD_SORTING`      | `INT8_ORD_SORTING`     | `INTEGER(INT8)` |
+|                    | `INT16_ORD_SORTING`    | `INTEGER(INT16)` |
+|                    | `INT32_ORD_SORTING`    | `INTEGER(INT32)` |
+|                    | `INT64_ORD_SORTING`    | `INTEGER(INT64)` |
+|                    | `SP_ORD_SORTING`    | `REAL(REAL32)` |
+|                    | `DP_ORD_SORTING`    | `REAL(REAL64)` |
+|                    | `QP_ORD_SORTING`    | `REAL(REAL128)` |
+| `UNORD_SORT`         | `INT8_UNORD_SORT`     | `INTEGER(INT8)` |
+|                    | `INT16_UNORD_SORT`    | `INTEGER(INT16)` |
+|                    | `INT32_UNORD_SORT`    | `INTEGER(INT32)` |
+|                    | `INT64_UNORD_SORT`    | `INTEGER(INT64)` |
+|                    | `SP_UNORD_SORT`    | `REAL(REAL32)` |
+|                    | `DP_UNORD_SORT`    | `REAL(REAL64)` |
+|                    | `QP_UNORD_SORT`    | `REAL(REAL128)` |
+
+
+### Performance benchmarks
+
+We have performed benchmarks of the procedures on nine different
+arrays each of size `2**20`:
+
+* Blocks - the array is divided into siz blocks, each of distinct
+  uniformly increasing integers. 
+* Decreasing - values decrease uniformly from `2**20-1` to `0`. 
+* Identical - all integers have the same value of 10.
+* Increasing - values increase uniformly from `0` to `2**20-1`.
+* Random dense - the integers are generated randomly from a set of 
+  values from `0` to `2**18-1` so duplicates are dense. 
+* Random order - a set of integers from `0` to `2**20 - 1` in random 
+  order. 
+* Random sparse - the integers are generated randomly from a set of
+  values from `0` to `2**22-1` so duplicates are sparse.
+* Random-3 - the increasing array has 3 random exchanges of individual
+  elements.
+* Random-10 - the final ten elements of the increasing array are
+  replaced by random values.
+
+These benchmarks have been performed on two different processors, both
+on a MacBook Pro, featuring a 2.3 GHz Quad-Core Intel Core i5, with 8
+GB 2133 MHz LPDDR3 memory. The first processor was GNU Fortran
+(GCC) 10.2.0, with the following results:
+
+|   Type  | Elements |   Order       |    Method     | Time (s)  |
+|---------|----------|---------------|---------------|-----------|
+| Integer | 1048576  |        Blocks |      Ord_Sort |   0.00692 |
+| Integer | 1048576  |    Decreasing |      Ord_Sort |   0.00377 |
+| Integer | 1048576  |     Identical |      Ord_Sort |   0.00236 |
+| Integer | 1048576  |    Increasing |      Ord_Sort |   0.00205 |
+| Integer | 1048576  |  Random dense |      Ord_Sort |   0.16675 |
+| Integer | 1048576  |  Random order |      Ord_Sort |   0.16780 |
+| Integer | 1048576  | Random sparse |      Ord_Sort |   0.16715 |
+| Integer | 1048576  |      Random 3 |      Ord_Sort |   0.00907 |
+| Integer | 1048576  |     Random 10 |      Ord_Sort |   0.00435 |
+| Integer | 1048576  |        Blocks |   Ord_Sorting |   0.01177 |
+| Integer | 1048576  |    Decreasing |   Ord_Sorting |   0.00695 |
+| Integer | 1048576  |     Identical |   Ord_Sorting |   0.00438 |
+| Integer | 1048576  |    Increasing |   Ord_Sorting |   0.00429 |
+| Integer | 1048576  |  Random dense |   Ord_Sorting |   0.19079 |
+| Integer | 1048576  |  Random order |   Ord_Sorting |   0.19242 |
+| Integer | 1048576  | Random sparse |   Ord_Sorting |   0.19212 |
+| Integer | 1048576  |      Random 3 |   Ord_Sorting |   0.01504 |
+| Integer | 1048576  |     Random 10 |   Ord_Sorting |   0.00794 |
+| Integer | 1048576  |        Blocks |    Unord_Sort |   0.20889 |
+| Integer | 1048576  |    Decreasing |    Unord_Sort |   0.12826 |
+| Integer | 1048576  |     Identical |    Unord_Sort |   0.15646 |
+| Integer | 1048576  |    Increasing |    Unord_Sort |   0.05012 |
+| Integer | 1048576  |  Random dense |    Unord_Sort |   0.14359 |
+| Integer | 1048576  |  Random order |    Unord_Sort |   0.14613 |
+| Integer | 1048576  | Random sparse |    Unord_Sort |   0.15159 |
+| Integer | 1048576  |      Random 3 |    Unord_Sort |   0.13826 |
+| Integer | 1048576  |     Random 10 |    Unord_Sort |   0.35356 |
+
+The second processor was ifort (IFORT) 18.0.3 20180410, with the
+following results:
+
+|   Type  | Elements |   Order       |    Method     | Time (s)  |
+|---------|----------|---------------|---------------|-----------|
+| Integer | 1048576  |        Blocks |      Ord_Sort |   0.00231 |
+| Integer | 1048576  |    Decreasing |      Ord_Sort |   0.00123 |
+| Integer | 1048576  |     Identical |      Ord_Sort |   0.00088 |
+| Integer | 1048576  |    Increasing |      Ord_Sort |   0.00088 |
+| Integer | 1048576  |  Random dense |      Ord_Sort |   0.08914 |
+| Integer | 1048576  |  Random order |      Ord_Sort |   0.08779 |
+| Integer | 1048576  | Random sparse |      Ord_Sort |   0.08357 |
+| Integer | 1048576  |      Random 3 |      Ord_Sort |   0.00335 |
+| Integer | 1048576  |     Random 10 |      Ord_Sort |   0.00196 |
+| Integer | 1048576  |        Blocks |   Ord_Sorting |   0.00519 |
+| Integer | 1048576  |    Decreasing |   Ord_Sorting |   0.00240 |
+| Integer | 1048576  |     Identical |   Ord_Sorting |   0.00110 |
+| Integer | 1048576  |    Increasing |   Ord_Sorting |   0.00113 |
+| Integer | 1048576  |  Random dense |   Ord_Sorting |   0.10185 |
+| Integer | 1048576  |  Random order |   Ord_Sorting |   0.10222 |
+| Integer | 1048576  | Random sparse |   Ord_Sorting |   0.10376 |
+| Integer | 1048576  |      Random 3 |   Ord_Sorting |   0.00628 |
+| Integer | 1048576  |     Random 10 |   Ord_Sorting |   0.00264 |
+| Integer | 1048576  |        Blocks |    Unord_Sort |   0.07272 |
+| Integer | 1048576  |    Decreasing |    Unord_Sort |   0.03873 |
+| Integer | 1048576  |     Identical |    Unord_Sort |   0.03677 |
+| Integer | 1048576  |    Increasing |    Unord_Sort |   0.01153 |
+| Integer | 1048576  |  Random dense |    Unord_Sort |   0.06706 |
+| Integer | 1048576  |  Random order |    Unord_Sort |   0.06808 |
+| Integer | 1048576  | Random sparse |    Unord_Sort |   0.06798 |
+| Integer | 1048576  |      Random 3 |    Unord_Sort |   0.07230 |
+| Integer | 1048576  |     Random 10 |    Unord_Sort |   0.13640 |
+
+

From 56fd86f8ea832839ee9f89fcb6a5cb2efebbf7b9 Mon Sep 17 00:00:00 2001
From: William Clodius <w.clodius@icloud.com>
Date: Sun, 11 Apr 2021 11:27:01 -0600
Subject: [PATCH 02/18] Removed trailing whitespace

Removed trailing whitespace from the markdown document, stdlib_sorting.md.

[ticket: X]
---
 doc/specs/stdlib_sorting.md | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/doc/specs/stdlib_sorting.md b/doc/specs/stdlib_sorting.md
index b843b89be..dcd6058e8 100644
--- a/doc/specs/stdlib_sorting.md
+++ b/doc/specs/stdlib_sorting.md
@@ -47,8 +47,8 @@ data:
 * `ORD_SORTING` is intended to provide indices for sorting arrays of
   derived type data, based on the ordering of an intrinsic component
   of the derived type.
-* `UNORD_SORT` is intended to sort simple arrays of intrinsic data 
-  that are effectively unordered before the sort; 
+* `UNORD_SORT` is intended to sort simple arrays of intrinsic data
+  that are effectively unordered before the sort;
 
 #### The `ORD_SORT` subroutine
 
@@ -143,21 +143,21 @@ housekeeping it has slower runtime performance than `ORD_SORT`.
 provided as optional `work` and `iwork` arguments or allocated
 internally on the stack.
 
-#### The `UNORD_SORT` subroutines 
+#### The `UNORD_SORT` subroutines
 
 `UNORD_SORT` uses the [`introsort` sorting algorithm of David Musser]
-(http://www.cs.rpi.edu/~musser/gp/introsort.ps). `introsort` is a hybrid 
+(http://www.cs.rpi.edu/~musser/gp/introsort.ps). `introsort` is a hybrid
 unstable comparison algorithm combining `quicksort`, `insertion sort`,
 and `heap sort`. While this algorithm's runtime performance is always
 O(N Ln(N)), it is relatively fast on randomly ordered data, but
 inconsistent in performance on partly sorted data. David Musser has
 given permission to include a variant of `introsort` in the Fortran
-Standard Library under the MIT license provided we cite: 
+Standard Library under the MIT license provided we cite:
 
     Musser, D.R., “Introspective Sorting and Selection Algorithms,”
-    Software—Practice and Experience, Vol. 27(8), 983–993 (August 1997). 
+    Software—Practice and Experience, Vol. 27(8), 983–993 (August 1997).
 
-as the official source of the algorithm. 
+as the official source of the algorithm.
 
 As with `introsort`, `UNORD_SORT` is an unstable hybrid algorithm.
 First it examines the array and estimates the depth of recursion a
@@ -440,14 +440,14 @@ We have performed benchmarks of the procedures on nine different
 arrays each of size `2**20`:
 
 * Blocks - the array is divided into siz blocks, each of distinct
-  uniformly increasing integers. 
-* Decreasing - values decrease uniformly from `2**20-1` to `0`. 
+  uniformly increasing integers.
+* Decreasing - values decrease uniformly from `2**20-1` to `0`.
 * Identical - all integers have the same value of 10.
 * Increasing - values increase uniformly from `0` to `2**20-1`.
-* Random dense - the integers are generated randomly from a set of 
-  values from `0` to `2**18-1` so duplicates are dense. 
-* Random order - a set of integers from `0` to `2**20 - 1` in random 
-  order. 
+* Random dense - the integers are generated randomly from a set of
+  values from `0` to `2**18-1` so duplicates are dense.
+* Random order - a set of integers from `0` to `2**20 - 1` in random
+  order.
 * Random sparse - the integers are generated randomly from a set of
   values from `0` to `2**22-1` so duplicates are sparse.
 * Random-3 - the increasing array has 3 random exchanges of individual

From 9b28d8fb44c134c21ab242ec909d6498a472d442 Mon Sep 17 00:00:00 2001
From: William Clodius <w.clodius@icloud.com>
Date: Sun, 11 Apr 2021 11:30:18 -0600
Subject: [PATCH 03/18] Added stdlib_sorting to the index

Added a reference to stdlib_sorting.md to index.md

[ticket: X]
---
 doc/specs/index.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/specs/index.md b/doc/specs/index.md
index 098f2ce21..4238dc35e 100644
--- a/doc/specs/index.md
+++ b/doc/specs/index.md
@@ -19,6 +19,7 @@ This is and index/directory of the specifications (specs) for each new module/fe
  - [logger](./stdlib_logger.html) - Runtime logging system
  - [optval](./stdlib_optval.html) - Fallback value for optional arguments
  - [quadrature](./stdlib_quadrature.html) - Numerical integration
+ - [sorting](.stdlib_sorting.html) - Sorting of numerical arrays
  - [stats](./stdlib_stats.html) - Descriptive Statistics
  - [stats_distribution_PRNG](./stdlib_stats_distribution_PRNG.html) - Probability Distributions random number generator
  - [string\_type](./stdlib_string_type.html) - Basic string support

From f2944224db1bad9de14d09fcc70f77223ded0da0 Mon Sep 17 00:00:00 2001
From: William Clodius <w.clodius@icloud.com>
Date: Mon, 12 Apr 2021 20:36:52 -0600
Subject: [PATCH 04/18] Incorproated suggestions by Jeremie Vandenplas

Added a licnsing section. Changed the description of `ORD_SORT`, `ORD_SORTING`,
and `UNORD_SORT`. Added more examples for `ORD_SORTING`. Simplified the
discussion of `INT_SIZE`. Used `stdlib_kinds` instead of `iso_fortran_env`.
Changed `processor` to `compiler`. Fixed spelling of array.

[ticket: X]
---
 doc/specs/stdlib_sorting.md | 200 ++++++++++++++++++++++--------------
 1 file changed, 122 insertions(+), 78 deletions(-)

diff --git a/doc/specs/stdlib_sorting.md b/doc/specs/stdlib_sorting.md
index dcd6058e8..5ea6da868 100644
--- a/doc/specs/stdlib_sorting.md
+++ b/doc/specs/stdlib_sorting.md
@@ -32,9 +32,7 @@ versions corresponding to differend types of array arguments.
 
 The `int_size` parameter is used to specify the kind of integer used
 in indexing the various arrays. Currently the module sets `int_size`
-to the value of `int64` from the intrinsic `ISO_FORTRAN_ENV` module.
-For many applications a value of `INT32` would be sufficient for
-addressing and would save some stack space for the subroutines,
+to the value of `int64` from the `stdlib_kinds` module.
 
 ### The module subroutines
 
@@ -43,29 +41,26 @@ subroutines intended to sort three different kinds of arrays of
 data:
 * `ORD_SORT` is intended to sort simple arrays of intrinsic data
   that have significant sections that were partially ordered before
-  the sort; and
+  the sort;
 * `ORD_SORTING` is intended to provide indices for sorting arrays of
   derived type data, based on the ordering of an intrinsic component
-  of the derived type.
+  of the derived type; and
 * `UNORD_SORT` is intended to sort simple arrays of intrinsic data
-  that are effectively unordered before the sort;
+  that are effectively unordered before the sort.
 
-#### The `ORD_SORT` subroutine
+#### Licensing
 
-`ORD_SORT` is a translation of the [`rust sort` sorting algorithm]
-(https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs)
-which in turn is inspired by the [`timsort` algorithm of Tim Peters]
-(http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
-`ORD_SORT` is a hybrid stable comparison algorithm combining `merge sort`,
-and `insertion sort`. It has always at worst O(N Ln(N)) runtime
-performance in sorting random data, having a performance about 15-25%
-slower than `UNORD_SORT` on such data. However it has much better
-performance than `UNORD_SORT` on partially sorted data, having O(N)
-performance on uniformly increasing or decreasing data.
+The Fortran Standard Library is distributed under the MIT
+License. However components of the library may be based on code with
+additional licensing restriction. In particular `ORD_SORT`,
+`ORD_SORTING`, and `UNORD_SORT` are translations of codes with their
+own distribution restrictions.
 
-The [`rust sort` implementation]
-(https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs)
-is distributed with the header:
+The `ORD_SORT` and `ORD_SORTING` subroutines are essentially
+translations to Fortran 2008 of the `"rust" sort` of the Rust Lsnguage
+distributed as part of
+[`slice.rs`](https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs).
+The header of the `slice.rs` file has as its licensing requirements:
 
     Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
     file at the top-level directory of this distribution and at
@@ -77,19 +72,48 @@ is distributed with the header:
     option. This file may not be copied, modified, or distributed
     except according to those terms.
 
-so the license for the original code is compatible with the use of
+so the license for the `slice.rs` code is compatible with the use of
 modified versions of the code in the Fortran Standard Library under
 the MIT license.
 
-As with `timsort`, `ORD_SORT` is a stable hybrid algorithm.
-It begins by traversing the array starting in its tail attempting to
-identify `runs` in the array, where a run is either a uniformly
-decreasing sequence, `ARRAY(i-1) > ARRAY(i)`, or non-decreasing,
-`ARRAY(i-1) <= ARRAY(i)`, sequence. Decreasing sequences are reversed.
-Then, if the sequence has less than `MIN_RUN` elements, previous
-elements in the array are added to the run using `insertion sort`
-until the run contains `MIN_RUN` elements or the array is completely
-processed. As each run is identified the start and length of the run
+The `UNORD_SORT` subroutine is essentially a translation to Fortran
+2008 of the
+[`introsort`]((http://www.cs.rpi.edu/~musser/gp/introsort.ps) of David
+Musser.  David Musser has given permission to include a variant of
+`introsort` in the Fortran Standard Library under the MIT license
+provided we cite:
+
+    Musser, D.R., “Introspective Sorting and Selection Algorithms,”
+    Software—Practice and Experience, Vol. 27(8), 983–993 (August 1997).
+
+as the official source of the algorithm.
+
+
+#### The `ORD_SORT` subroutine
+
+`ORD_SORT` is a translation of the `"Rust" sort` sorting algorithm
+contained in [`slice.rs`]
+(https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs).
+`"Rust" sort`, in turn, is inspired by the [`timsort` algorithm]
+(http://svn.python.org/projects/python/trunk/Objects/listsort.txt)
+that Tim Peters created for the Python Language.
+`ORD_SORT` is a hybrid stable comparison algorithm combining `merge sort`,
+and `insertion sort`. It has always at worst O(N Ln(N)) runtime
+performance in sorting random data, having a performance about 15-25%
+slower than `UNORD_SORT` on such data. However it has much better
+performance than `UNORD_SORT` on partially sorted data, having O(N)
+performance on uniformly increasing or decreasing data.
+
+
+`ORD_SORt` begins by traversing the array starting in its tail
+attempting to identify `runs` in the array, where a run is either a
+uniformly decreasing sequence, `ARRAY(i-1) > ARRAY(i)`, or a
+non-decreasing, `ARRAY(i-1) <= ARRAY(i)`, sequence. Once deliminated
+decreasing sequences are reversed in their order. Then, if the
+sequence has less than `MIN_RUN` elements, previous elements in the
+array are added to the run using `insertion sort` until the run
+contains `MIN_RUN` elements or the array is completely processed. As
+each run is identified the start and length of the run 
 is then pushed onto a stack and the stack is then processed using
 `merge` until it obeys the stack invariants:
 
@@ -101,39 +125,22 @@ Ln(N))`. However, because of the identification of decreasing and
 non-decreasing runs, processing of structured data can be much faster,
 with processing of uniformly decreasing or non-decreasing arrays being
 of order O(N). The result in our tests is that `ORD_SORT` is about
-15-25% slower than `UNORD_SORT` on purely random data, depending on
-the compiler, but can be more than an order of magnitude faster than
-`UNORD_SORT` on highly structured data. As a modified `merge sort`,
-`ORD_SORT` requires the use of a "scratch" array, that may be provided
-as an optional `work` argument or allocated internally on the stack.
+25% slower than `UNORD_SORT` on purely random data, depending on
+the compiler, but can be `Ln(N)` faster than `UNORD_SORT` on highly
+structured data. As a modified `merge sort`, `ORD_SORT` requires the
+use of a "scratch" array, that may be provided as an optional `work`
+argument or allocated internally on the stack.
 
 #### The `ORD_SORTING` subroutine
 
-The `UNORD_SORT` and `ORD_SORT` subroutines can sort isolated arrays
-of intrinsic types, but do nothing for the sorting of arrays of
-derived types. For arrays of derived types what is useful is an array
-of indices that maps the original array to an array sorted based on the
-value of a component of the derived type. For such a sort, a stable
-sort is useful, therefore the module provides a subroutine,
-`ORD_SORTING`, that generates such an array of indices based on
-the `ORD_SORT` algorithm.
-
-As `ORD_SORT` is also based on the `rust sort` algorithm the `rust
-sort` license must be acknowledged:
-
-    Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
-    file at the top-level directory of this distribution and at
-    http://rust-lang.org/COPYRIGHT.
-
-    Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
-    http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
-    <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
-    option. This file may not be copied, modified, or distributed
-    except according to those terms.
-
-noting that the Fortran Standard Library is released under the MIT
-license so that incorporation of the `rust sort` algorithm is
-compatible with its license.
+The `UNORD_SORT` and `ORD_SORT` subroutines can sort rank 1 isolated
+arrays of intrinsic types, but do nothing for the coordinated sorting
+of related data, e.g., multiple related rank 1 arrays, higher rank
+arrays, or arrays of derived types. For such related data, what is
+useful is an array of indices that maps a rank 1 array to its sorted
+form. For such a sort, a stable sort is useful, therefore the module
+provides a subroutine, `ORD_SORTING`, that generates such an array of
+indices based on the `ORD_SORT` algorithm.
 
 The logic of `ORD_SORTING` parallels that of `ORD_SORT`, with
 additional housekeeping to keep the array of indices consistent with
@@ -145,19 +152,12 @@ internally on the stack.
 
 #### The `UNORD_SORT` subroutines
 
-`UNORD_SORT` uses the [`introsort` sorting algorithm of David Musser]
-(http://www.cs.rpi.edu/~musser/gp/introsort.ps). `introsort` is a hybrid
-unstable comparison algorithm combining `quicksort`, `insertion sort`,
-and `heap sort`. While this algorithm's runtime performance is always
-O(N Ln(N)), it is relatively fast on randomly ordered data, but
-inconsistent in performance on partly sorted data. David Musser has
-given permission to include a variant of `introsort` in the Fortran
-Standard Library under the MIT license provided we cite:
-
-    Musser, D.R., “Introspective Sorting and Selection Algorithms,”
-    Software—Practice and Experience, Vol. 27(8), 983–993 (August 1997).
-
-as the official source of the algorithm.
+`UNORD_SORT` uses the `introsort` sorting algorithm of David Musser.
+`introsort` is a hybrid unstable comparison algorithm combining
+`quicksort`, `insertion sort`, and `heap sort`. While this algorithm's
+runtime performance is always O(N Ln(N)), it is relatively fast on
+randomly ordered data, but inconsistent in performance on partly
+sorted data.as the official source of the algorithm.
 
 As with `introsort`, `UNORD_SORT` is an unstable hybrid algorithm.
 First it examines the array and estimates the depth of recursion a
@@ -260,7 +260,7 @@ element of `array` is a `NaN`.
 	...
 ```
 
-#### `ord_sorting` - creates an arry of sorting indices for an input array.
+#### `ord_sorting` - creates an array of sorting indices for an input array.
 
 ##### Status
 
@@ -329,8 +329,52 @@ is a `NaN`. It should be emphasized that the order of `array` will
 typically be different on return.
 
 
-##### Example
+##### Examples
+
+Sorting a related rank one array:
+
+```Fortran
+	subroutine sort_related_data( a, b, work, index, iwork )
+	    ! Sort `b` in terms or its related array `a`
+	    integer, intent(inout)           :: a(:)
+		integer(int32), intent(inout)    :: b(:) ! The same size as a
+		integer(int32), intent(inout)    :: work(:)
+		integer(int_size), intent(inout) :: index(:)
+		integer(int_size), intent(inout) :: iwork(:)
+		! Find the indices to sort a
+		call ord_sorting(a, index(1:size(a)),&
+		    work(1:size(a)/2), iwork(1:size(a)/2))
+		! Sort b based on the sorting of a
+		b(:) = b( index(1:size(a)) )
+	end subroutine sort_related_data
+```
+
+Sorting a rank 2 array based on the data in a column
+
+```Fortran
+	subroutine sort_related_data( array, column, work, index, iwork )
+	    ! Sort `a_data` in terms or its component `a`
+	    integer, intent(inout)           :: a(:,:)
+		integer(int32), intent(in)       :: column
+		integer(int32), intent(inout)    :: work(:)
+		integer(int_size), intent(inout) :: index(:)
+		integer(int_size), intent(inout) :: iwork(:)
+		integer, allocatable             :: dummy(:)
+		integer :: i
+		allocate(dummy(size(a, dim=1)))
+		! Extract a component of `a_data`
+		dummy(:) = a(:, column)
+		! Find the indices to sort the column
+		call ord_sorting(dummy, index(1:size(dummy)),&
+		    work(1:size(dummy)/2), iwork(1:size(dummy)/2))
+		! Sort a based on the sorting of its column
+		do i=1, size(a, dim=2)
+		    a(:, i) = a(index(1:size(a, dim=1)), i)
+		end do
+	end subroutine sort_related_data
+```
 
+Sorting an array of a derived type based on the dsta in one component
 ```Fortran
 	subroutine sort_a_data( a_data, a, work, index, iwork )
 	    ! Sort `a_data` in terms or its component `a`
@@ -455,9 +499,9 @@ arrays each of size `2**20`:
 * Random-10 - the final ten elements of the increasing array are
   replaced by random values.
 
-These benchmarks have been performed on two different processors, both
+These benchmarks have been performed on two different compilers, both
 on a MacBook Pro, featuring a 2.3 GHz Quad-Core Intel Core i5, with 8
-GB 2133 MHz LPDDR3 memory. The first processor was GNU Fortran
+GB 2133 MHz LPDDR3 memory. The first compiler was GNU Fortran
 (GCC) 10.2.0, with the following results:
 
 |   Type  | Elements |   Order       |    Method     | Time (s)  |
@@ -490,7 +534,7 @@ GB 2133 MHz LPDDR3 memory. The first processor was GNU Fortran
 | Integer | 1048576  |      Random 3 |    Unord_Sort |   0.13826 |
 | Integer | 1048576  |     Random 10 |    Unord_Sort |   0.35356 |
 
-The second processor was ifort (IFORT) 18.0.3 20180410, with the
+The second compiler was ifort (IFORT) 18.0.3 20180410, with the
 following results:
 
 |   Type  | Elements |   Order       |    Method     | Time (s)  |

From 28afb184e7752cea2aeb8064d17c7d9db7812b91 Mon Sep 17 00:00:00 2001
From: William Clodius <w.clodius@icloud.com>
Date: Tue, 20 Apr 2021 22:51:21 -0600
Subject: [PATCH 05/18] Added sorting for character(*) and string_type

Added sorting for `character(*)` and `string_type` rank 1 arrays. Improved the
overall documentation. Added newer benchmarks for the addiitons and the Intel
One API compiler system.

[ticket: X]
---
 doc/specs/stdlib_sorting.md | 490 ++++++++++++++++++++----------------
 1 file changed, 280 insertions(+), 210 deletions(-)

diff --git a/doc/specs/stdlib_sorting.md b/doc/specs/stdlib_sorting.md
index 5ea6da868..c573c5876 100644
--- a/doc/specs/stdlib_sorting.md
+++ b/doc/specs/stdlib_sorting.md
@@ -17,16 +17,18 @@ However Fortran's arrays are by far its most widely used collection,
 and arrays of arbitrary types of data can often be sorted in terms of
 a single component of intrinsic type.
 The Fortran Standard Library therefore provides a module,
-`stdlib_sorting`, with procedures to sort arrays of single intrinsic
-numeric types, i.e. the different kinds of integers and reals.
+`stdlib_sorting`, with procedures to sort arrays of simple intrinsic
+numeric types, i.e. the different kinds of integers and reals, the
+default assumed length character, and the `stdlib_string_type`
+module's `string_type` type.
 
 ## Overview of the module
 
 The module `stdlib_sorting` defines several public entities one
 default integer parameter, `int_size`, and three overloaded
-subroutines: `ORD_SORT`, `UNORD_SORT`, and `ORD_SORTING`. The
+subroutines: `ORD_SORT`, `SORT`, and `SORT_INDEX`. The
 overloaded subroutines also each have seven specific names for
-versions corresponding to differend types of array arguments.
+versions corresponding to different types of array arguments.
 
 ### The `int_size` parameter
 
@@ -42,10 +44,10 @@ data:
 * `ORD_SORT` is intended to sort simple arrays of intrinsic data
   that have significant sections that were partially ordered before
   the sort;
-* `ORD_SORTING` is intended to provide indices for sorting arrays of
+* `SORT_INDEX` is intended to provide indices for sorting arrays of
   derived type data, based on the ordering of an intrinsic component
   of the derived type; and
-* `UNORD_SORT` is intended to sort simple arrays of intrinsic data
+* `SORT` is intended to sort simple arrays of intrinsic data
   that are effectively unordered before the sort.
 
 #### Licensing
@@ -53,11 +55,11 @@ data:
 The Fortran Standard Library is distributed under the MIT
 License. However components of the library may be based on code with
 additional licensing restriction. In particular `ORD_SORT`,
-`ORD_SORTING`, and `UNORD_SORT` are translations of codes with their
+`SORT_INDEX`, and `SORT` are translations of codes with their
 own distribution restrictions.
 
-The `ORD_SORT` and `ORD_SORTING` subroutines are essentially
-translations to Fortran 2008 of the `"rust" sort` of the Rust Lsnguage
+The `ORD_SORT` and `SORT_INDEX` subroutines are essentially
+translations to Fortran 2008 of the `"rust" sort` of the Rust Language
 distributed as part of
 [`slice.rs`](https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs).
 The header of the `slice.rs` file has as its licensing requirements:
@@ -76,7 +78,7 @@ so the license for the `slice.rs` code is compatible with the use of
 modified versions of the code in the Fortran Standard Library under
 the MIT license.
 
-The `UNORD_SORT` subroutine is essentially a translation to Fortran
+The `SORT` subroutine is essentially a translation to Fortran
 2008 of the
 [`introsort`]((http://www.cs.rpi.edu/~musser/gp/introsort.ps) of David
 Musser.  David Musser has given permission to include a variant of
@@ -100,15 +102,15 @@ that Tim Peters created for the Python Language.
 `ORD_SORT` is a hybrid stable comparison algorithm combining `merge sort`,
 and `insertion sort`. It has always at worst O(N Ln(N)) runtime
 performance in sorting random data, having a performance about 15-25%
-slower than `UNORD_SORT` on such data. However it has much better
-performance than `UNORD_SORT` on partially sorted data, having O(N)
+slower than `SORT` on such data. However it has much better
+performance than `SORT` on partially sorted data, having O(N)
 performance on uniformly increasing or decreasing data.
 
 
-`ORD_SORt` begins by traversing the array starting in its tail
+`ORD_SORT` begins by traversing the array starting in its tail
 attempting to identify `runs` in the array, where a run is either a
 uniformly decreasing sequence, `ARRAY(i-1) > ARRAY(i)`, or a
-non-decreasing, `ARRAY(i-1) <= ARRAY(i)`, sequence. Once deliminated
+non-decreasing, `ARRAY(i-1) <= ARRAY(i)`, sequence. Once delimitated
 decreasing sequences are reversed in their order. Then, if the
 sequence has less than `MIN_RUN` elements, previous elements in the
 array are added to the run using `insertion sort` until the run
@@ -125,45 +127,45 @@ Ln(N))`. However, because of the identification of decreasing and
 non-decreasing runs, processing of structured data can be much faster,
 with processing of uniformly decreasing or non-decreasing arrays being
 of order O(N). The result in our tests is that `ORD_SORT` is about
-25% slower than `UNORD_SORT` on purely random data, depending on
-the compiler, but can be `Ln(N)` faster than `UNORD_SORT` on highly
+25% slower than `SORT` on purely random data, depending on
+the compiler, but can be `Ln(N)` faster than `SORT` on highly
 structured data. As a modified `merge sort`, `ORD_SORT` requires the
 use of a "scratch" array, that may be provided as an optional `work`
 argument or allocated internally on the stack.
 
-#### The `ORD_SORTING` subroutine
+#### The `SORT_INDEX` subroutine
 
-The `UNORD_SORT` and `ORD_SORT` subroutines can sort rank 1 isolated
+The `SORT` and `ORD_SORT` subroutines can sort rank 1 isolated
 arrays of intrinsic types, but do nothing for the coordinated sorting
 of related data, e.g., multiple related rank 1 arrays, higher rank
 arrays, or arrays of derived types. For such related data, what is
 useful is an array of indices that maps a rank 1 array to its sorted
 form. For such a sort, a stable sort is useful, therefore the module
-provides a subroutine, `ORD_SORTING`, that generates such an array of
+provides a subroutine, `SORT_INDEX`, that generates such an array of
 indices based on the `ORD_SORT` algorithm.
 
-The logic of `ORD_SORTING` parallels that of `ORD_SORT`, with
+The logic of `SORT_INDEX` parallels that of `ORD_SORT`, with
 additional housekeeping to keep the array of indices consistent with
 the sorted positions of the input array. Because of this additional
 housekeeping it has slower runtime performance than `ORD_SORT`.
-`ORD_SORTING` requires the use of two "scratch" arrays, that may be
+`SORT_INDEX` requires the use of two "scratch" arrays, that may be
 provided as optional `work` and `iwork` arguments or allocated
 internally on the stack.
 
-#### The `UNORD_SORT` subroutines
+#### The `SORT` subroutines
 
-`UNORD_SORT` uses the `introsort` sorting algorithm of David Musser.
+`SORT` uses the `introsort` sorting algorithm of David Musser.
 `introsort` is a hybrid unstable comparison algorithm combining
 `quicksort`, `insertion sort`, and `heap sort`. While this algorithm's
 runtime performance is always O(N Ln(N)), it is relatively fast on
 randomly ordered data, but inconsistent in performance on partly
 sorted data.as the official source of the algorithm.
 
-As with `introsort`, `UNORD_SORT` is an unstable hybrid algorithm.
+As with `introsort`, `SORT` is an unstable hybrid algorithm.
 First it examines the array and estimates the depth of recursion a
 quick sort would require for ideal (random) data, `D =
 Ceiling(Ln(N)/Ln(2))`. It then defines a limit to the number of
-quicksort recursions to be allowed in processing,
+`quicksort` recursions to be allowed in processing,
 `D_limit = factor * D`, where factor is currently 2, and
 calls `introsort` proper. `introsort` proper then:
 
@@ -173,11 +175,11 @@ calls `introsort` proper. `introsort` proper then:
    recursion exceeds `D_limit` and, if it does, processes the remaining
    elements with heap sort and returns;
 3. If the current depth of recursion does not exceed `D_limit`, then
-   in effect does a quicksort step:
+   in effect does a `quicksort` step:
 
     * Partitions the remaining array using a median of three,
-    * Calls introsort proper on the leftmost partition,
-    * Calls introsort proper on the rightmost partition, and then
+    * Calls `introsort` proper on the leftmost partition,
+    * Calls `introsort` proper on the rightmost partition, and then
       returns.
 
 The resulting algorithm is of order O(N Ln(N)) run time
@@ -187,7 +189,7 @@ other sorting algorithms on random data. On partially sorted data it
 can show either slower `heap sort` performance, or enhanced
 performance by up to a factor of six. Still, even when it shows
 enhanced performance, its performance on partially sorted data is
-typically an order of magnitude slower then `ORD_SORT`.
+typically an order of magnitude slower than `ORD_SORT`.
 
 ### Tentative specifications of the `stdlib_sorting` procedures
 
@@ -214,13 +216,14 @@ Generic subroutine.
 
 `array` : shall be a rank one array of any of the types:
 `integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`,
-`real(real32)`, `real(real64)`, or `real(real128)`. It is an
-`intent(inout)` argument. On input it is the array to be sorted. If
-both the type of `array` is real and at least one of the elements is a
-`NaN`, then the ordering of the result is undefined. Otherwise on
-return its elements will be sorted in order of non-decreasing value.
-
-`work` (optional): shall be a rank one array of any of the same type as
+`real(sp)`, `real(dp)`, `real(qp)`, `character(*)`, or
+`type(string_type)`. It is an `intent(inout)` argument. On input it is
+the array to be sorted. If both the type of `array` is real and at
+least one of the elements is a `NaN`, then the ordering of the result
+is undefined. Otherwise on return its elements will be sorted in order
+of non-decreasing value.
+
+`work` (optional): shall be a rank one array of the same type as
 array, and shall have at least `size(array)/2` elements. It is an
 `intent(inout)` argument. It is intended to be used as "scratch"
 memory for internal record keeping. If associated with an array in
@@ -232,35 +235,90 @@ requirements for the code. Its contents on return are undefined.
 `ORD_SORT` implements a hybrid sorting algorithm combining
 `merge sort`, and `insertion sort`. For most purposes it behaves like
 a `merge sort`, providing worst case `O(N Ln(N))` run time performance
-for most random arrays, that is typically slower than `UNORD_SORT`.
+for most random arrays, that is typically slower than `SORT`.
 However, if the array has significant runs of decreasing or
 non-decreasing values, performance can be much better than
-`UNORD_SORT`, with `O(N)` behavior on uniformly decreasing, or
+`SORT`, with `O(N)` behavior on uniformly decreasing, or
 non-decreasing arrays. The optional `work` array replaces "scratch"
 memory that would otherwise be allocated on the stack. If `array` is of
 any type `REAL` the order of its elements on return undefined if any
-element of `array` is a `NaN`.
+element of `array` is a `NaN`.  Sorting of `CHARACTER(*)` and
+`STRING_TYPE` arrays are based on the operator `>`, and not on the
+function `LGT`.
 
 
 ##### Example
 
-```Fortran
-	...
-	! Read arrays from sorted files
-	call read_sorted_file( 'dummy_file1', array1 )
-	call read_sorted_file( 'dummy_file2', array2 )
-	! Concatenate the arrays
-	allocate( array( size(array1) + size(array2) ) )
-	array( 1:size(array1) ) = array1(:)
-	array( size(array1)+1:size(array1)+size(array2) ) = array2(:)
-	! Sort the resulting array
-	call ord_sort( array, work )
-	! Process the sorted array
-	call array_search( array, values )
-	...
+```fortran
+    ...
+    ! Read arrays from sorted files
+    call read_sorted_file( 'dummy_file1', array1 )
+    call read_sorted_file( 'dummy_file2', array2 )
+    ! Concatenate the arrays
+    allocate( array( size(array1) + size(array2) ) )
+    array( 1:size(array1) ) = array1(:)
+    array( size(array1)+1:size(array1)+size(array2) ) = array2(:)
+    ! Sort the resulting array
+    call ord_sort( array, work )
+    ! Process the sorted array
+    call array_search( array, values )
+    ...
 ```
 
-#### `ord_sorting` - creates an array of sorting indices for an input array.
+#### `sort` - sorts an input array
+
+##### Status
+
+Experimental
+
+##### Description
+
+Returns an input array with the elements sorted in order of increasing
+value.
+
+##### Syntax
+
+`call [[stdlib_sorting(module):sort(subroutine)]]sort ( array )`
+
+##### Class
+
+Pure generic subroutine.
+
+##### Arguments
+
+`array` : shall be a rank one array of any of the types:
+`integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`,
+`real(sp)`, `real(dp)`, `real(qp)`. `character(*)`, or
+`type(string_type)`. It is an `intent(inout)` argument. On return its
+input elements will be sorted in order of non-decreasing value.
+
+##### Notes
+
+`SORT` implements a hybrid sorting algorithm combining
+`quicksort`, `merge sort`, and `insertion sort`. For most purposes it
+behaves like a `quicksort` with a median of three partition, providing
+good, `O(N Ln(N))`, run time performance for most random arrays, but
+defaulting to `merge sort` if the structure of the array results in
+the `quicksort` not converging as rapidly as expected. If `array` is of
+any type `REAL`, the behavior of the sorting is undefined if any
+element of `array` is a `NaN`.  Sorting of `CHARACTER(*)` and
+`STRING_TYPE` arrays are based on the operators `<`, `<=`, `>`, and
+`>=`, and not on the functions `LLT`, `LLE`, `LGT`, or `LGE`.
+
+##### Example
+
+```fortran
+    ...
+    ! Read random data from a file
+    call read_file( 'dummy_file', array )
+    ! Sort the random data
+    call sort( array )
+    ! Process the sorted data
+    call array_search( array, values )
+    ...
+```
+
+#### `sort_index` - creates an array of sorting indices for an input array.
 
 ##### Status
 
@@ -273,7 +331,7 @@ the specified direction retaining order stability.
 
 ##### Syntax
 
-`call [[stdlib_sorting(module):ord_sorting(subroutine)]]ord_sorting ( array, index[, work, iwork, reverse, zero_based ] )`
+`call [[stdlib_sorting(module):sort_index(subroutine)]]sort_index ( array, index[, work, iwork, reverse ] )`
 
 ##### Class
 
@@ -283,10 +341,10 @@ Generic subroutine.
 
 `array`: shall be a rank one array of any of the types:
 `integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`,
-`real(real32)`, `real(real64)`, or `real(real128)`. It is an
-`intent(inout)` argument. On input it will be an array whose sorting
-indices are to be determined. On return it will be the sorted
-array.
+`real(sp)`, `real(dp)`, `real(qp)`, `character(*)`, or
+`type(string_type)`. It is an `intent(inout)` argument. On input it
+will be an array whose sorting indices are to be determined. On return
+it will be the sorted array.
 
 `index`: shall be a rank one integer array of kind `int_size` and of
 the size of `array`. It is an `intent(out)` argument. On return it
@@ -294,7 +352,7 @@ shall have values that are the indices needed to sort the original
 array in the desired direction.
 
 `work` (optional): shall be a rank one array of any of the same type as
-array, and shall have at least `size(array)/2` elements. It is an
+`array`, and shall have at least `size(array)/2` elements. It is an
 `intent(inout)` argument. It is intended to be used as "scratch"
 memory for internal record keeping. If associated with an array in
 static storage, its use can significantly reduce the stack memory
@@ -302,7 +360,10 @@ requirements for the code. Its contents on return are undefined.
 
 `iwork` (optional): shall be a rank one integer array of kind
 `int_size`, and shall have at least `size(array)/2` elements. It
-is an `intent(inout)` argument. Its contents on return are undefined.
+is an `intent(inout)` argument.  It is intended to be used as "scratch"
+memory for internal record keeping. If associated with an array in
+static storage, its use can significantly reduce the stack memory
+requirements for the code. Its contents on return are undefined.
 
 `reverse` (optional): shall be a scalar of type default logical. It
 is an `intent(in)` argument. If present with a value of `.true.` then
@@ -310,23 +371,19 @@ is an `intent(in)` argument. If present with a value of `.true.` then
 order. Otherwise index will sort `array` in order of non-decreasing
 values in stable order.
 
-`zero_based` (optional): shall be a scalar of type default logical.
-It is an `intent(in)` argument. If present with the value `.true.`
-the values of `index` will be for zero based array indexing,
-otherwise the values of index will be for one's based array
-indexing.
-
 ##### Notes
 
-`ORD_SORTING` implements the hybrid sorting algorithm of `ORD_SORT`,
+`SORT_INDEX` implements the hybrid sorting algorithm of `ORD_SORT`,
 keeping the values of `index` consistent with the elements of `array`
 as it is sorted. As a `merge sort` based algorithm, it is a stable
 sorting comparison algorithm. The optional `work` and `iwork` arrays
 replace "scratch" memory that would otherwise be allocated on the
 stack. If `array` is of any kind of `REAL` the order of the elements in
 `index` and `array` on return are undefined if any element of `array`
-is a `NaN`. It should be emphasized that the order of `array` will
-typically be different on return.
+is a `NaN`. Sorting of `CHARACTER(*)` and `STRING_TYPE` arrays are
+based on the operator `>`, and not on the function `LGT`. It should be
+emphasized that the order of `array` will typically be different on
+return.
 
 
 ##### Examples
@@ -334,15 +391,15 @@ typically be different on return.
 Sorting a related rank one array:
 
 ```Fortran
-	subroutine sort_related_data( a, b, work, index, iwork )
-	    ! Sort `b` in terms or its related array `a`
-	    integer, intent(inout)           :: a(:)
-		integer(int32), intent(inout)    :: b(:) ! The same size as a
+    subroutine sort_related_data( a, b, work, index, iwork )
+        ! Sort `b` in terms or its related array `a`
+        integer, intent(inout)           :: a(:)
+        integer(int32), intent(inout)    :: b(:) ! The same size as a
 		integer(int32), intent(inout)    :: work(:)
 		integer(int_size), intent(inout) :: index(:)
 		integer(int_size), intent(inout) :: iwork(:)
 		! Find the indices to sort a
-		call ord_sorting(a, index(1:size(a)),&
+		call sort_index(a, index(1:size(a)),&
 		    work(1:size(a)/2), iwork(1:size(a)/2))
 		! Sort b based on the sorting of a
 		b(:) = b( index(1:size(a)) )
@@ -365,7 +422,7 @@ Sorting a rank 2 array based on the data in a column
 		! Extract a component of `a_data`
 		dummy(:) = a(:, column)
 		! Find the indices to sort the column
-		call ord_sorting(dummy, index(1:size(dummy)),&
+		call sort_index(dummy, index(1:size(dummy)),&
 		    work(1:size(dummy)/2), iwork(1:size(dummy)/2))
 		! Sort a based on the sorting of its column
 		do i=1, size(a, dim=2)
@@ -375,7 +432,8 @@ Sorting a rank 2 array based on the data in a column
 ```
 
 Sorting an array of a derived type based on the dsta in one component
-```Fortran
+
+```fortran
 	subroutine sort_a_data( a_data, a, work, index, iwork )
 	    ! Sort `a_data` in terms or its component `a`
 	    type(a_type), intent(inout)      :: a_data(:)
@@ -386,64 +444,13 @@ Sorting an array of a derived type based on the dsta in one component
 		! Extract a component of `a_data`
 		a(1:size(a_data)) = a_data(:) % a
 		! Find the indices to sort the component
-		call ord_sorting(a(1:size(a_data)), index(1:size(a_data)),&
+		call sort_index(a(1:size(a_data)), index(1:size(a_data)),&
 		    work(1:size(a_data)/2), iwork(1:size(a_data)/2))
 		! Sort a_data based on the sorting of that component
 		a_data(:) = a_data( index(1:size(a_data)) )
 	end subroutine sort_a_data
 ```
 
-#### `unord_sort` - sorts an input array
-
-##### Status
-
-Experimental
-
-##### Description
-
-Returns an input array with the elements sorted in order of increasing
-value.
-
-##### Syntax
-
-`call [[stdlib_sorting(module):unord_sort(subroutine)]]unord_sort ( array )`
-
-##### Class
-
-Pure generic subroutine.
-
-##### Arguments
-
-`array` : shall be a rank one array of any of the types:
-`integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`,
-`real(real32)`, `real(real64)`, or `real(real128)`. It
-is an `intent(inout)` argument. On return its input elements will be
-sorted in order of non-decreasing value.
-
-##### Notes
-
-`UNORD_SORT` implements a hybrid sorting algorithm combining
-`quicksort`, `merge sort`, and `insertion sort`. For most purposes it
-behaves like a `quicksort` with a median of three partition, providing
-good, `O(N Ln(N))`, run time performance for most random arrays, but
-defaulting to `merge sort` if the structure of the array results in
-the `quicksort` not converging as rapidly as expected. If `array` is of
-any type `REAL`, the behavior of the sorting is undefined if any
-element of `array` is a `NaN`.
-
-##### Example
-
-```Fortran
-	...
-	! Read random data from a file
-	call read_file( 'dummy_file', array )
-	! Sort the random data
-	call unord_sort( array )
-	! Process the sorted data
-	call array_search( array, values )
-	...
-```
-
 #### Specific procedures
 
 Usually the name of a generic procedure is the most convenient way of
@@ -459,31 +466,37 @@ their `array` arguments.
 |                    | `INT16_ORD_SORT`    | `INTEGER(INT16)` |
 |                    | `INT32_ORD_SORT`    | `INTEGER(INT32)` |
 |                    | `INT64_ORD_SORT`    | `INTEGER(INT64)` |
-|                    | `SP_ORD_SORT`    | `REAL(REAL32)` |
-|                    | `DP_ORD_SORT`    | `REAL(REAL64)` |
-|                    | `QP_ORD_SORT`    | `REAL(REAL128)` |
-| `ORD_SORTING`      | `INT8_ORD_SORTING`     | `INTEGER(INT8)` |
-|                    | `INT16_ORD_SORTING`    | `INTEGER(INT16)` |
-|                    | `INT32_ORD_SORTING`    | `INTEGER(INT32)` |
-|                    | `INT64_ORD_SORTING`    | `INTEGER(INT64)` |
-|                    | `SP_ORD_SORTING`    | `REAL(REAL32)` |
-|                    | `DP_ORD_SORTING`    | `REAL(REAL64)` |
-|                    | `QP_ORD_SORTING`    | `REAL(REAL128)` |
-| `UNORD_SORT`         | `INT8_UNORD_SORT`     | `INTEGER(INT8)` |
-|                    | `INT16_UNORD_SORT`    | `INTEGER(INT16)` |
-|                    | `INT32_UNORD_SORT`    | `INTEGER(INT32)` |
-|                    | `INT64_UNORD_SORT`    | `INTEGER(INT64)` |
-|                    | `SP_UNORD_SORT`    | `REAL(REAL32)` |
-|                    | `DP_UNORD_SORT`    | `REAL(REAL64)` |
-|                    | `QP_UNORD_SORT`    | `REAL(REAL128)` |
+|                    | `SP_ORD_SORT`    | `REAL(SP)` |
+|                    | `DP_ORD_SORT`    | `REAL(DP)` |
+|                    | `QP_ORD_SORT`    | `REAL(QP)` |
+|                    | `CHAR_ORD_SORT`  | `CHARACTER(*)`  |
+|                    | `STRING_ORD_SORT`  | `STRING_TYPE`  |
+| `SORT`         | `INT8_SORT`     | `INTEGER(INT8)` |
+|                    | `INT16_SORT`    | `INTEGER(INT16)` |
+|                    | `INT32_SORT`    | `INTEGER(INT32)` |
+|                    | `INT64_SORT`    | `INTEGER(INT64)` |
+|                    | `SP_SORT`    | `REAL(SP)` |
+|                    | `DP_SORT`    | `REAL(DP)` |
+|                    | `QP_SORT`    | `REAL(QP)` |
+|                    | `CHAR_SORT`  | `CHARACTER(*)` |
+|                    | `STRING_SORT`  | `STRING_TYPE` |
+| `SORT_INDEX`      | `INT8_SORT_INDEX`     | `INTEGER(INT8)` |
+|                    | `INT16_SORT_INDEX`    | `INTEGER(INT16)` |
+|                    | `INT32_SORT_INDEX`    | `INTEGER(INT32)` |
+|                    | `INT64_SORT_INDEX`    | `INTEGER(INT64)` |
+|                    | `SP_SORT_INDEX`    | `REAL(SP)` |
+|                    | `DP_SORT_INDEX`    | `REAL(DP)` |
+|                    | `QP_SORT_INDEX`    | `REAL(QP)` |
+|                    | `CHAR_SORT_INDEX`    | `CHARACTER(*)` |
+|                    | `STRING_SORT_INDEX`    | `STRING_TYPE` |
 
 
 ### Performance benchmarks
 
 We have performed benchmarks of the procedures on nine different
-arrays each of size `2**20`:
+integer arrays each of size `2**20`:
 
-* Blocks - the array is divided into siz blocks, each of distinct
+* Blocks - the array is divided into six blocks, each of distinct
   uniformly increasing integers.
 * Decreasing - values decrease uniformly from `2**20-1` to `0`.
 * Identical - all integers have the same value of 10.
@@ -499,72 +512,129 @@ arrays each of size `2**20`:
 * Random-10 - the final ten elements of the increasing array are
   replaced by random values.
 
+On three different default character arrays, each of length 4 and of
+size `26**4`:
+
+* Char. Decreasing - values decrease uniformly from `"zzzz"` to
+  `"aaaa"`.
+* Char. Increasing - values decrease uniformly from `"aaaa"` to
+  `"zzzz"`.
+* Char. Random - the set of strings from `"aaaa"` to `"zzzz"` in
+  random order.
+
+On three different `string_type` arrays, each of length 4 elements and
+of size `26**4`:
+
+* String Decreasing - values decrease uniformly from `"zzzz"` to
+  `"aaaa"`.
+* String Increasing - values decrease uniformly from `"aaaa"` to
+  `"zzzz"`.
+* String Random - the set of strings from `"aaaa"` to `"zzzz"` in
+  random order.
+
 These benchmarks have been performed on two different compilers, both
 on a MacBook Pro, featuring a 2.3 GHz Quad-Core Intel Core i5, with 8
 GB 2133 MHz LPDDR3 memory. The first compiler was GNU Fortran
 (GCC) 10.2.0, with the following results:
 
-|   Type  | Elements |   Order       |    Method     | Time (s)  |
-|---------|----------|---------------|---------------|-----------|
-| Integer | 1048576  |        Blocks |      Ord_Sort |   0.00692 |
-| Integer | 1048576  |    Decreasing |      Ord_Sort |   0.00377 |
-| Integer | 1048576  |     Identical |      Ord_Sort |   0.00236 |
-| Integer | 1048576  |    Increasing |      Ord_Sort |   0.00205 |
-| Integer | 1048576  |  Random dense |      Ord_Sort |   0.16675 |
-| Integer | 1048576  |  Random order |      Ord_Sort |   0.16780 |
-| Integer | 1048576  | Random sparse |      Ord_Sort |   0.16715 |
-| Integer | 1048576  |      Random 3 |      Ord_Sort |   0.00907 |
-| Integer | 1048576  |     Random 10 |      Ord_Sort |   0.00435 |
-| Integer | 1048576  |        Blocks |   Ord_Sorting |   0.01177 |
-| Integer | 1048576  |    Decreasing |   Ord_Sorting |   0.00695 |
-| Integer | 1048576  |     Identical |   Ord_Sorting |   0.00438 |
-| Integer | 1048576  |    Increasing |   Ord_Sorting |   0.00429 |
-| Integer | 1048576  |  Random dense |   Ord_Sorting |   0.19079 |
-| Integer | 1048576  |  Random order |   Ord_Sorting |   0.19242 |
-| Integer | 1048576  | Random sparse |   Ord_Sorting |   0.19212 |
-| Integer | 1048576  |      Random 3 |   Ord_Sorting |   0.01504 |
-| Integer | 1048576  |     Random 10 |   Ord_Sorting |   0.00794 |
-| Integer | 1048576  |        Blocks |    Unord_Sort |   0.20889 |
-| Integer | 1048576  |    Decreasing |    Unord_Sort |   0.12826 |
-| Integer | 1048576  |     Identical |    Unord_Sort |   0.15646 |
-| Integer | 1048576  |    Increasing |    Unord_Sort |   0.05012 |
-| Integer | 1048576  |  Random dense |    Unord_Sort |   0.14359 |
-| Integer | 1048576  |  Random order |    Unord_Sort |   0.14613 |
-| Integer | 1048576  | Random sparse |    Unord_Sort |   0.15159 |
-| Integer | 1048576  |      Random 3 |    Unord_Sort |   0.13826 |
-| Integer | 1048576  |     Random 10 |    Unord_Sort |   0.35356 |
-
-The second compiler was ifort (IFORT) 18.0.3 20180410, with the
-following results:
-
-|   Type  | Elements |   Order       |    Method     | Time (s)  |
-|---------|----------|---------------|---------------|-----------|
-| Integer | 1048576  |        Blocks |      Ord_Sort |   0.00231 |
-| Integer | 1048576  |    Decreasing |      Ord_Sort |   0.00123 |
-| Integer | 1048576  |     Identical |      Ord_Sort |   0.00088 |
-| Integer | 1048576  |    Increasing |      Ord_Sort |   0.00088 |
-| Integer | 1048576  |  Random dense |      Ord_Sort |   0.08914 |
-| Integer | 1048576  |  Random order |      Ord_Sort |   0.08779 |
-| Integer | 1048576  | Random sparse |      Ord_Sort |   0.08357 |
-| Integer | 1048576  |      Random 3 |      Ord_Sort |   0.00335 |
-| Integer | 1048576  |     Random 10 |      Ord_Sort |   0.00196 |
-| Integer | 1048576  |        Blocks |   Ord_Sorting |   0.00519 |
-| Integer | 1048576  |    Decreasing |   Ord_Sorting |   0.00240 |
-| Integer | 1048576  |     Identical |   Ord_Sorting |   0.00110 |
-| Integer | 1048576  |    Increasing |   Ord_Sorting |   0.00113 |
-| Integer | 1048576  |  Random dense |   Ord_Sorting |   0.10185 |
-| Integer | 1048576  |  Random order |   Ord_Sorting |   0.10222 |
-| Integer | 1048576  | Random sparse |   Ord_Sorting |   0.10376 |
-| Integer | 1048576  |      Random 3 |   Ord_Sorting |   0.00628 |
-| Integer | 1048576  |     Random 10 |   Ord_Sorting |   0.00264 |
-| Integer | 1048576  |        Blocks |    Unord_Sort |   0.07272 |
-| Integer | 1048576  |    Decreasing |    Unord_Sort |   0.03873 |
-| Integer | 1048576  |     Identical |    Unord_Sort |   0.03677 |
-| Integer | 1048576  |    Increasing |    Unord_Sort |   0.01153 |
-| Integer | 1048576  |  Random dense |    Unord_Sort |   0.06706 |
-| Integer | 1048576  |  Random order |    Unord_Sort |   0.06808 |
-| Integer | 1048576  | Random sparse |    Unord_Sort |   0.06798 |
-| Integer | 1048576  |      Random 3 |    Unord_Sort |   0.07230 |
-| Integer | 1048576  |     Random 10 |    Unord_Sort |   0.13640 |
+|    Type     | Elements |      Order      |    Method   | Time (s)  |
+|-------------|----------|-----------------|-------------|-----------|
+|     Integer | 1048576  |          Blocks |    Ord_Sort |   0.00694 |
+|     Integer | 1048576  |      Decreasing |    Ord_Sort |   0.00363 |
+|     Integer | 1048576  |       Identical |    Ord_Sort |   0.00206 |
+|     Integer | 1048576  |      Increasing |    Ord_Sort |   0.00206 |
+|     Integer | 1048576  |    Random dense |    Ord_Sort |   0.17705 |
+|     Integer | 1048576  |    Random order |    Ord_Sort |   0.17749 |
+|     Integer | 1048576  |   Random sparse |    Ord_Sort |   0.17641 |
+|     Integer | 1048576  |        Random 3 |    Ord_Sort |   0.00943 |
+|     Integer | 1048576  |       Random 10 |    Ord_Sort |   0.00378 |
+|   Character |  456976  |  Char. Decrease |    Ord_Sort |   0.00766 |
+|   Character |  456976  |  Char. Increase |    Ord_Sort |   0.00416 |
+|   Character |  456976  |    Char. Random |    Ord_Sort |   0.23689 |
+| String_type |  456976  | String Decrease |    Ord_Sort |   0.15467 |
+| String_type |  456976  | String Increase |    Ord_Sort |   0.09072 |
+| String_type |  456976  |   String Random |    Ord_Sort |   4.04153 |
+|     Integer | 1048576  |          Blocks |        Sort |   0.11175 |
+|     Integer | 1048576  |      Decreasing |        Sort |   0.14222 |
+|     Integer | 1048576  |       Identical |        Sort |   0.16076 |
+|     Integer | 1048576  |      Increasing |        Sort |   0.05380 |
+|     Integer | 1048576  |    Random dense |        Sort |   0.15519 |
+|     Integer | 1048576  |    Random order |        Sort |   0.15907 |
+|     Integer | 1048576  |   Random sparse |        Sort |   0.15792 |
+|     Integer | 1048576  |        Random 3 |        Sort |   0.23041 |
+|     Integer | 1048576  |       Random 10 |        Sort |   0.24668 |
+|   Character |  456976  |  Char. Decrease |        Sort |   0.31256 |
+|   Character |  456976  |  Char. Increase |        Sort |   0.11330 |
+|   Character |  456976  |    Char. Random |        Sort |   0.20166 |
+| String_type |  456976  | String Decrease |        Sort |   6.41878 |
+| String_type |  456976  | String Increase |        Sort |   2.26954 |
+| String_type |  456976  |   String Random |        Sort |   3.58182 |
+|     Integer | 1048576  |          Blocks |  Sort_Index |   0.01175 |
+|     Integer | 1048576  |      Decreasing |  Sort_Index |   0.00732 |
+|     Integer | 1048576  |       Identical |  Sort_Index |   0.00453 |
+|     Integer | 1048576  |      Increasing |  Sort_Index |   0.00495 |
+|     Integer | 1048576  |    Random dense |  Sort_Index |   0.20723 |
+|     Integer | 1048576  |    Random order |  Sort_Index |   0.20818 |
+|     Integer | 1048576  |   Random sparse |  Sort_Index |   0.20441 |
+|     Integer | 1048576  |        Random 3 |  Sort_Index |   0.01468 |
+|     Integer | 1048576  |       Random 10 |  Sort_Index |   0.00669 |
+|   Character |  456976  |  Char. Decrease |  Sort_Index |   0.00915 |
+|   Character |  456976  |  Char. Increase |  Sort_Index |   0.00522 |
+|   Character |  456976  |    Char. Random |  Sort_Index |   0.26615 |
+| String_type |  456976  | String Decrease |  Sort_Index |   0.19026 |
+| String_type |  456976  | String Increase |  Sort_Index |   0.09407 |
+| String_type |  456976  |   String Random |  Sort_Index |   4.12616 |
+
+The second compiler was Intel(R) Fortran Intel(R) 64 Compiler Classic
+for applications running on Intel(R) 64, Version 2021.2.0 Build
+20210228_000000, with the following results:
+
+|    Type     | Elements |      Order      |    Method   | Time (s)  |
+|-------------|----------|-----------------|-------------|-----------|
+|     Integer | 1048576  |          Blocks |    Ord_Sort |   0.00287 |
+|     Integer | 1048576  |      Decreasing |    Ord_Sort |   0.00123 |
+|     Integer | 1048576  |       Identical |    Ord_Sort |   0.00094 |
+|     Integer | 1048576  |      Increasing |    Ord_Sort |   0.00095 |
+|     Integer | 1048576  |    Random dense |    Ord_Sort |   0.09865 |
+|     Integer | 1048576  |    Random order |    Ord_Sort |   0.10138 |
+|     Integer | 1048576  |   Random sparse |    Ord_Sort |   0.10223 |
+|     Integer | 1048576  |        Random 3 |    Ord_Sort |   0.00420 |
+|     Integer | 1048576  |       Random 10 |    Ord_Sort |   0.00174 |
+|   Character |  456976  |  Char. Decrease |    Ord_Sort |   0.00721 |
+|   Character |  456976  |  Char. Increase |    Ord_Sort |   0.00354 |
+|   Character |  456976  |    Char. Random |    Ord_Sort |   0.21287 |
+| String_type |  456976  | String Decrease |    Ord_Sort |   0.50840 |
+| String_type |  456976  | String Increase |    Ord_Sort |   0.14912 |
+| String_type |  456976  |   String Random |    Ord_Sort |  15.28657 |
+|     Integer | 1048576  |          Blocks |        Sort |   0.03632 |
+|     Integer | 1048576  |      Decreasing |        Sort |   0.04135 |
+|     Integer | 1048576  |       Identical |        Sort |   0.03672 |
+|     Integer | 1048576  |      Increasing |        Sort |   0.01346 |
+|     Integer | 1048576  |    Random dense |        Sort |   0.07099 |
+|     Integer | 1048576  |    Random order |        Sort |   0.07172 |
+|     Integer | 1048576  |   Random sparse |        Sort |   0.07196 |
+|     Integer | 1048576  |        Random 3 |        Sort |   0.07729 |
+|     Integer | 1048576  |       Random 10 |        Sort |   0.14964 |
+|   Character |  456976  |  Char. Decrease |        Sort |   0.31387 |
+|   Character |  456976  |  Char. Increase |        Sort |   0.12050 |
+|   Character |  456976  |    Char. Random |        Sort |   0.20228 |
+| String_type |  456976  | String Decrease |        Sort |  23.87099 |
+| String_type |  456976  | String Increase |        Sort |   8.71762 |
+| String_type |  456976  |   String Random |        Sort |  11.50930 |
+|     Integer | 1048576  |          Blocks |  Sort_Index |   0.00563 |
+|     Integer | 1048576  |      Decreasing |  Sort_Index |   0.00230 |
+|     Integer | 1048576  |       Identical |  Sort_Index |   0.00142 |
+|     Integer | 1048576  |      Increasing |  Sort_Index |   0.00142 |
+|     Integer | 1048576  |    Random dense |  Sort_Index |   0.11723 |
+|     Integer | 1048576  |    Random order |  Sort_Index |   0.12539 |
+|     Integer | 1048576  |   Random sparse |  Sort_Index |   0.12706 |
+|     Integer | 1048576  |        Random 3 |  Sort_Index |   0.00718 |
+|     Integer | 1048576  |       Random 10 |  Sort_Index |   0.00316 |
+|   Character |  456976  |  Char. Decrease |  Sort_Index |   0.00790 |
+|   Character |  456976  |  Char. Increase |  Sort_Index |   0.00405 |
+|   Character |  456976  |    Char. Random |  Sort_Index |   0.22963 |
+| String_type |  456976  | String Decrease |  Sort_Index |   0.51546 |
+| String_type |  456976  | String Increase |  Sort_Index |   0.14960 |
+| String_type |  456976  |   String Random |  Sort_Index |  15.59673 |
 
 

From 57273986b812ab5179586c277b39f6cfdc5a3283 Mon Sep 17 00:00:00 2001
From: William Clodius <w.clodius@icloud.com>
Date: Fri, 23 Apr 2021 17:37:11 -0600
Subject: [PATCH 06/18] Commited main sorting files

Added and commited the main sorting source codes: stdlib_sorting.fypp,
stdlib_sorting_sort.fypp, stdlib_sorting_ord_sort.f90, and
stdlib_sorting_sort_index.fypp. Also commited the revised CMakeLists.txt and
Makefile.manual files needed co compile the library including the new codes.

[ticket: X]
---
 src/CMakeLists.txt                 |    4 +
 src/Makefile.manual                |   13 +
 src/stdlib_sorting.fypp            |  577 ++++++++++
 src/stdlib_sorting_ord_sort.fypp   | 1419 +++++++++++++++++++++++
 src/stdlib_sorting_sort.fypp       |  736 ++++++++++++
 src/stdlib_sorting_sort_index.fypp | 1684 ++++++++++++++++++++++++++++
 6 files changed, 4433 insertions(+)
 create mode 100644 src/stdlib_sorting.fypp
 create mode 100644 src/stdlib_sorting_ord_sort.fypp
 create mode 100644 src/stdlib_sorting_sort.fypp
 create mode 100644 src/stdlib_sorting_sort_index.fypp

diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 96a6ebcce..92f06fc04 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -9,6 +9,10 @@ set(fppFiles
     stdlib_linalg.fypp
     stdlib_linalg_diag.fypp
     stdlib_optval.fypp
+    stdlib_sorting.fypp
+    stdlib_sorting_ord_sort.fypp
+    stdlib_sorting_sort.fypp
+    stdlib_sorting_sort_index.fypp 
     stdlib_stats.fypp
     stdlib_stats_corr.fypp
     stdlib_stats_cov.fypp
diff --git a/src/Makefile.manual b/src/Makefile.manual
index 85541e038..fb176a235 100644
--- a/src/Makefile.manual
+++ b/src/Makefile.manual
@@ -9,6 +9,10 @@ SRCFYPP =\
         stdlib_quadrature.fypp \
         stdlib_quadrature_trapz.fypp \
         stdlib_quadrature_simps.fypp \
+        stdlib_sorting.fypp \
+        stdlib_sorting_ord_sort.fypp \
+        stdlib_sorting_sort.fypp \
+        stdlib_sorting_sort_index.fypp \
         stdlib_stats.fypp \
         stdlib_stats_corr.fypp \
         stdlib_stats_cov.fypp \
@@ -79,6 +83,15 @@ stdlib_quadrature_trapz.o: \
 	stdlib_quadrature.o \
 	stdlib_error.o \
 	stdlib_kinds.o
+stdlib_sorting.o: \
+	stdlib_kinds.o \
+	stdlib_string_type.o
+stdlib_sorting_ord_sort.o: \
+	stdlib_sorting.o
+stdlib_sorting_sort.o: \
+	stdlib_sorting.o
+stdlib_sorting_sort_index.o: \
+	stdlib_sorting.o
 stdlib_stats.o: \
 	stdlib_kinds.o
 stdlib_stats_corr.o: \
diff --git a/src/stdlib_sorting.fypp b/src/stdlib_sorting.fypp
new file mode 100644
index 000000000..e7fe92589
--- /dev/null
+++ b/src/stdlib_sorting.fypp
@@ -0,0 +1,577 @@
+#! Integer kinds to be considered during templating
+#:set INT_KINDS = ["int8", "int16", "int32", "int64"]
+
+#! Real kinds to be considered during templating
+#:set REAL_KINDS = ["sp", "dp", "qp"]
+
+!! Licensing:
+!!
+!! This file is subjec† both to the Fortran Standard Library license, and
+!! to additional licensing requirements as it contains translations of
+!! other software.
+!!
+!! The Fortran Standard Library, including this file, is distributed under
+!! the MIT license that should be included with the library's distribution.
+!!
+!!   Copyright (c) 2021 Fortran stdlib developers
+!!
+!!   Permission is hereby granted, free of charge, to any person obtaining a
+!!   copy of this software and associated documentation files (the
+!!   "Software"),  to deal in the Software without restriction, including
+!!   without limitation the rights to use, copy, modify, merge, publish,
+!!   distribute, sublicense, and/or sellcopies of the Software, and to permit
+!!   persons to whom the Software is furnished to do so, subject to the
+!!   following conditions:
+!!
+!!   The above copyright notice and this permission notice shall be included
+!!   in all copies or substantial portions of the Software.
+!!
+!!   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+!!   OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+!!   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+!!   IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+!!   CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+!!   TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+!!   SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+!!
+!! Two of the generic subroutines, `ORD_SORT` and `SORT_INDEX`, are
+!! substantially translations to Fortran 2008 of the `"Rust" sort` sorting
+!! routines in
+!! [`slice.rs`](https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs)
+!! The `rust sort` implementation is distributed with the header:
+!!
+!!   Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
+!!   file at the top-level directory of this distribution and at
+!!   http://rust-lang.org/COPYRIGHT.
+!!
+!!   Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
+!!   http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+!!   <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
+!!   option. This file may not be copied, modified, or distributed
+!!   except according to those terms.
+!!
+!! so the licence for the original`slice.rs` code is compatible with the use
+!! of modified versions of the code in the Fortran Standard Library under
+!! the MIT license.
+!!
+!! One of the generic subroutines, `SORT`, is substantially a
+!! translation to Fortran 2008, of the `introsort` of David Musser.
+!! David Musser has given permission to include a variant of `introsort`
+!! in the Fortran Standard Library under the MIT license provided
+!! we cite:
+!!
+!!   Musser, D.R., “Introspective Sorting and Selection Algorithms,”
+!!   Software—Practice and Experience, Vol. 27(8), 983–993 (August 1997).
+!!
+!! as the official source of the algorithm.
+
+module stdlib_sorting
+!! This module implements overloaded sorting subroutines named `ORD_SORT`,
+!! `SORT_INDEX`, and `SORT`, that each can be used to sort four kinds
+!! of `INTEGER` arrays and three kinds of `REAL` arrays. By default, sorting
+!! is in order of increasing value, though `SORT_INDEX` has the option of
+!! sorting in order of decresasing value. All the subroutines have worst
+!! case run time performance of `O(N Ln(N))`, but on largely sorted data
+!! `ORD_SORT` and `SORT_INDEX` can have a run time performance of `O(N)`.
+!!
+!! `ORD_SORT` is a translation of the `"Rust" sort` sorting algorithm in
+!! `slice.rs`:
+!! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs
+!! which in turn is inspired by the `timsort` algorithm of Tim Peters,
+!! http://svn.python.org/projects/python/trunk/Objects/listsort.txt.
+!! `ORD_SORT` is a hybrid stable comparison algorithm combining `merge sort`,
+!! and `insertion sort`. It is always at worst O(N Ln(N)) in sorting random
+!! data, having a performance about 25% slower than `SORT` on such
+!! data, but has much better performance than `SORT` on partially
+!! sorted data, having O(N) performance on uniformly non-increasing or
+!! non-decreasing data.
+!!
+!! `SORT_INDEX` is a modification of `ORD_SORT` so that in addition to
+!! sorting the input array, it returns the indices that map to a
+!! stable sort of the original array. These indices are intended to be
+!! intended to be used to sort data that is correlated with the input
+!! array, e.g., different arrays in a database, different columns of a
+!! rank 2 array, different elements of a derived type. It is less
+!! efficient than `ORD_SORT` at sorting a simple array.
+!!
+!! `SORT` uses the `INTROSORT` sorting algorithm of David Musser,
+!! http://www.cs.rpi.edu/~musser/gp/introsort.ps. `introsort` is a hybrid
+!! unstable comparison algorithm combining `quicksort`, `insertion sort`, and
+!! `heap sort`. While this algorithm is always O(N Ln(N)) it is relatively
+!! fast on randomly ordered data, but inconsistent in performance on partly
+!! sorted data, sometimes having `merge sort` performance, sometimes having
+!! better than `quicksort` performance. `UNORD_SOORT` is about 25%
+!! more efficient than `ORD_SORT` at sorting purely random data, but af an
+!! order of `Ln(N)` less efficient at sorting partially sorted data.
+
+    use stdlib_kinds, only: &
+        int8,               &
+        int16,              &
+        int32,              &
+        int64,              &
+        sp,                 &
+        dp,                 &
+        qp
+
+    use stdlib_string_type, only: string_type, assignment(=), operator(>), &
+        operator(>=), operator(<), operator(<=)
+
+    implicit none
+    private
+
+    public :: int_size
+
+    integer, parameter :: int_size = int64 !! Integer kind for indexing
+
+! Constants for use by tim_sort
+    integer, parameter :: &
+! The maximum number of entries in a run stack, good for an array of
+! 2**64 elements see
+! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+        max_merge_stack = int( ceiling( log( 2._dp**64 ) / &
+                               log(1.6180339887_dp) ) )
+
+    type run_type
+! Version: experimental
+!
+! Used to pass state around in a stack among helper functions for the
+! `ORD_SORT` and `SORT_INDEX` algorithms
+        integer(int_size) :: base = 0
+        integer(int_size) :: len = 0
+    end type run_type
+
+    public ord_sort
+!! Version: experimental
+!!
+!! The generic subroutine implementing the `ORD_SORT` algorithm to return
+!! an input array with its elements sorted in order of non-decreasing
+!! value. Its use has the syntax:
+!!
+!!     call ord_sort( array[, work] )
+!!
+!! with the arguments:
+!!
+!! * array: the rank 1 array to be sorted. It is an `intent(inout)`
+!!   argument of any of the types `integer(int8)`, `integer(int16)`,
+!!   `integer(int32)`, `integer(int64)`, `real(real32)`, `real(real64)`,
+!!   `real(real128)`, or `character(*)`. If both the type of `array` is
+!!   real and at least one of the elements is a `NaN`, then the ordering
+!!   of the result is undefined. Otherwise it is defined to be the
+!!   original elements in non-decreasing order.
+!!
+!! * work (optional): shall be a rank 1 array of the same type as
+!!   `array`, and shall have at least `size(array)/2` elements. It is an
+!!   `intent(inout)` argument to be used as "scratch" memory
+!!   for internal record keeping. If associated with an array in static
+!!   storage, its use can significantly reduce the stack memory requirements
+!!   for the code. Its value on return is undefined.
+!!
+!!#### Example
+!!
+!!```fortran
+!!    ...
+!!    ! Read arrays from sorted files
+!!    call read_sorted_file( 'dummy_file1', array1 )
+!!    call read_sorted_file( 'dummy_file2', array2 )
+!!    ! Concatenate the arrays
+!!    allocate( array( size(array1) + size(array2) ) )
+!!    array( 1:size(array1) ) = array1(:)
+!!    array( size(array1)+1:size(array1)+size(array2) ) = array2(:)
+!!    ! Sort the resulting array
+!!    call ord_sort( array, work )
+!!    ! Process the sorted array
+!!    call array_search( array, values )
+!!    ...
+!!```
+
+    public sort
+!! Version: experimental
+!!
+!! The generic subroutine implementing the `SORT` algorithm to return
+!! an input array with its elements sorted in order of non-decreasing
+!! value. Its use has the syntax:
+!!
+!!     call sort( array )
+!!
+!! with the arguments:
+!!
+!! * array: the rank 1 array to be sorted. It is an `intent(inout)`
+!!   argument of any of the types `integer(int8)`, `integer(int16)`,
+!!   `integer(int32)`, `integer(int64)`, `real(real32)`, `real(real64)`,
+!!   `real(real128)`, or `character(*)`. If both the type of `array` is
+!!   real and at least one of the elements is a `NaN`, then the ordering
+!!   of the result is undefined. Otherwise it is defined to be the
+!!   original elements in non-decreasing order.
+!!
+!!#### Example
+!!
+!!```fortran
+!!    ...
+!!    ! Read random data from a file
+!!    call read_file( 'dummy_file', array )
+!!    ! Sort the random data
+!!    call sort( array )
+!!    ! Process the sorted data
+!!    call array_search( array, values )
+!!    ...
+!!```
+
+    public sort_index
+!! Version: experimental
+!!
+!! The generic subroutine implementing the `SORT_INDEX` algorithm to
+!! return an index array whose elements would sort the input array in the
+!! desired direction. It is primarilly intended to be used to sort a
+!! derived type array based on the values of a component of the array.
+!! Its use has the syntax:
+!!
+!!     call sort_index( array, index[, work, iwork, reverse ] )
+!!
+!! with the arguments:
+!!
+!! * array: the rank 1 array to be sorted. It is an `intent(inout)`
+!!   argument of any of the types `integer(int8)`, `integer(int16)`,
+!!   `integer(int32)`, `integer(int64)`, `real(real32)`, `real(real64)`,
+!!   `real(real128)`, or `character(*)`. If both the type of `array` is
+!!   real and at least one of the elements is a `NaN`, then the ordering
+!!   of the `array` and `index` results is undefined. Otherwise it is
+!!   defined to be as specified by reverse.
+!!
+!! * index: a rank 1 array of sorting indices. It is an `intent(inout)`
+!!   argument of the type `integer(int_size)`. Its size shall be the
+!!   same as `array`. On return, if defined, its elements would
+!!   sort the input `array` in the direction specified by `reverse`.
+!!
+!! * work (optional): shall be a rank 1 array of the same type as
+!!   `array`, and shall have at least `size(array)/2` elements. It is an
+!!   `intent(inout)` argument to be used as "scratch" memory
+!!   for internal record keeping. If associated with an array in static
+!!   storage, its use can significantly reduce the stack memory requirements
+!!   for the code. Its value on return is undefined.
+!!
+!! * iwork (optional): shall be a rank 1 integer array of kind `int_size`,
+!!   and shall have at least `size(array)/2` elements. It is an
+!!   `intent(inout)` argument to be used as "scratch" memory
+!!   for internal record keeping. If associated with an array in static
+!!   storage, its use can significantly reduce the stack memory requirements
+!!   for the code. Its value on return is undefined.
+!!
+!! * `reverse` (optional): shall be a scalar of type default logical. It
+!!   is an `intent(in)` argument. If present with a value of `.true.` then
+!!   `index` will sort `array` in order of non-increasing values in stable
+!!   order. Otherwise index will sort `array` in order of non-decreasing
+!!   values in stable order.
+!!
+!!#### Examples
+!!
+!! Sorting a related rank one array:
+!!
+!!```Fortran
+!!    subroutine sort_related_data( a, b, work, index, iwork )
+!!        ! Sort `b` in terms or its related array `a`
+!!        integer, intent(inout)           :: a(:)
+!!        integer(int32), intent(inout)    :: b(:) ! The same size as a
+!!        integer(int32), intent(inout)    :: work(:)
+!!        integer(int_size), intent(inout) :: index(:)
+!!        integer(int_size), intent(inout) :: iwork(:)
+!!    ! Find the indices to sort a
+!!        call sort_index(a, index(1:size(a)),&
+!!            work(1:size(a)/2), iwork(1:size(a)/2))
+!!    ! Sort b based on the sorting of a
+!!        b(:) = b( index(1:size(a)) )
+!!    end subroutine sort_related_data
+!!```
+!!
+!! Sorting a rank 2 array based on the data in a column
+!!
+!!```Fortran
+!!    subroutine sort_related_data( array, column, work, index, iwork )
+!!    ! Sort `a_data` in terms or its component `a`
+!!        integer, intent(inout)           :: a(:,:)
+!!        integer(int32), intent(in)       :: column
+!!        integer(int32), intent(inout)    :: work(:)
+!!        integer(int_size), intent(inout) :: index(:)
+!!        integer(int_size), intent(inout) :: iwork(:)
+!!        integer, allocatable             :: dummy(:)
+!!        integer :: i
+!!        allocate(dummy(size(a, dim=1)))
+!!    ! Extract a component of `a_data`
+!!        dummy(:) = a(:, column)
+!!    ! Find the indices to sort the column
+!!        call sort_index(dummy, index(1:size(dummy)),&
+!!                        work(1:size(dummy)/2), iwork(1:size(dummy)/2))
+!!    ! Sort a based on the sorting of its column
+!!        do i=1, size(a, dim=2)
+!!            a(:, i) = a(index(1:size(a, dim=1)), i)
+!!        end do
+!!    end subroutine sort_related_data
+!!```
+!!
+!! Sorting an array of a derived type based on the dsta in one component
+!!```fortran
+!!    subroutine sort_a_data( a_data, a, work, index, iwork )
+!!    ! Sort `a_data` in terms or its component `a`
+!!        type(a_type), intent(inout)      :: a_data(:)
+!!        integer(int32), intent(inout)    :: a(:)
+!!        integer(int32), intent(inout)    :: work(:)
+!!        integer(int_size), intent(inout) :: index(:)
+!!        integer(int_size), intent(inout) :: iwork(:)
+!!    ! Extract a component of `a_data`
+!!        a(1:size(a_data)) = a_data(:) % a
+!!    ! Find the indices to sort the component
+!!        call sort_index(a(1:size(a_data)), index(1:size(a_data)),&
+!!                        work(1:size(a_data)/2), iwork(1:size(a_data)/2))
+!!    ! Sort a_data based on the sorting of that component
+!!        a_data(:) = a_data( index(1:size(a_data)) )
+!!    end subroutine sort_a_data
+!!```
+
+#:for k1 in INT_KINDS
+    public ${k1}$_ord_sort
+!! Version: experimental
+!!
+!! `${k1}$_ord_sort` is the specific `ORD_SORT` subroutine for an `array`
+!! argument of type `integer(${k1}$)`.
+
+    public ${k1}$_sort
+!! Version: experimental
+!!
+!! `${k1}$_sort` is the specific `SORT` subroutine for an `array`
+!! argument of type `integer(${k1}$)`.
+
+    public ${k1}$_sort_index
+!! Version: experimental
+!!
+!! `${k1}$_sort_index` is the specific `SORT_INDEX` subroutine for an `array`
+!! argument of type `integer(${k1}$)`.
+
+#:endfor
+
+#:for k1 in REAL_KINDS
+    public ${k1}$_ord_sort
+!! Version: experimental
+!!
+!! `${k1}$_ord_sort` is the specific `ORD_SORT` subroutine for an `array`
+!! argument of type `real(${k1}$)`.
+
+    public ${k1}$_sort
+!! Version: experimental
+!!
+!! `${k1}$_sort` is the specific `SORT` subroutine for an `array`
+!! argument of type `real(${k1}$)`.
+
+    public ${k1}$_sort_index
+!! Version: experimental
+!!
+!! `${k1}$_sort_index` is the specific `SORT_INDEX` subroutine for an `array`
+!! argument of type `real(${k1}$)`.
+
+#:endfor
+
+    public char_ord_sort
+
+    public char_sort
+
+    public char_sort_index
+
+    public string_ord_sort
+
+    public string_sort
+
+    public string_sort_index
+
+    interface ord_sort
+!! Version: experimental
+!!
+!! The generic subroutine interface implementing the `ORD_SORT` algorithm,
+!! a translation to Fortran 2008, of the `"Rust" sort` algorithm found in
+!! `slice.rs`
+!! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+!! `ORD_SORT` is a hybrid stable comparison algorithm combining `merge sort`,
+!! and `insertion sort`. It is always at worst O(N Ln(N)) in sorting random
+!! data, having a performance about 25% slower than `SORT` on such
+!! data, but has much better performance than `SORT` on partially
+!! sorted data, having O(N) performance on uniformly non-increasing or
+!! non-decreasing data.
+
+#:for k1 in INT_KINDS
+        module subroutine ${k1}$_ord_sort( array, work )
+!! Version: experimental
+!!
+!! `${k1}$_ord_sort( array )` sorts the input `ARRAY` of type `INTEGER(${k1}$)`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+            integer(${k1}$), intent(inout)           :: array(0:)
+            integer(${k1}$), intent(inout), optional :: work(0:)
+        end subroutine ${k1}$_ord_sort
+
+#:endfor
+
+#:for k1 in REAL_KINDS
+        module subroutine ${k1}$_ord_sort( array, work )
+ !! Version: experimental
+!!
+!! `${k1}$_ord_sort( array )` sorts the input `ARRAY` of type `REAL(${k1}$)`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+            real(${k1}$), intent(inout)           :: array(0:)
+            real(${k1}$), intent(inout), optional :: work(0:)
+        end subroutine ${k1}$_ord_sort
+
+#:endfor
+        module subroutine char_ord_sort( array, work )
+!! Version: experimental
+!!
+!! `char_ord_sort( array )` sorts the input `ARRAY` of type `CHARACTER(*)`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+            character(len=*), intent(inout)                    :: array(0:)
+            character(len=len(array)), intent(inout), optional :: work(0:)
+        end subroutine char_ord_sort
+
+        module subroutine string_ord_sort( array, work )
+!! Version: experimental
+!!
+!! `char_ord_sort( array )` sorts the input `ARRAY` of type `CHARACTER(*)`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+            type(string_type), intent(inout)           :: array(0:)
+            type(string_type), intent(inout), optional :: work(0:)
+        end subroutine string_ord_sort
+
+    end interface ord_sort
+
+    interface sort
+!! Version: experimental
+!!
+!! The generic subroutine interface implementing the `SORT` algorithm, based
+!! on the `introsort` of David Musser.
+
+#:for k1 in INT_KINDS
+        pure module subroutine ${k1}$_sort( array )
+!! Version: experimental
+!!
+!! `${k1}$_sort( array )` sorts the input `ARRAY` of type `INTEGER(${k1}$)`
+!! using a hybrid sort based on the `introsort` of David Musser.
+!! The algorithm is of order O(N Ln(N)) for all inputs.
+!! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+!! behavior is small for random data compared to other sorting algorithms.
+            integer(${k1}$), intent(inout) :: array(0:)
+        end subroutine ${k1}$_sort
+
+#:endfor
+
+#:for k1 in REAL_KINDS
+        pure module subroutine ${k1}$_sort( array )
+!! Version: experimental
+!!
+!! `${k1}$_sort( array )` sorts the input `ARRAY` of type `REAL(${k1}$)`
+!! using a hybrid sort based on the `introsort` of David Musser.
+!! The algorithm is of order O(N Ln(N)) for all inputs.
+!! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+!! behavior is small for random data compared to other sorting algorithms.
+            real(${k1}$), intent(inout) :: array(0:)
+        end subroutine ${k1}$_sort
+
+#:endfor
+
+        pure module subroutine char_sort( array )
+!! Version: experimental
+!!
+!! `char_sort( array )` sorts the input `ARRAY` of type `CHARACTER(*)`
+!! using a hybrid sort based on the `introsort` of David Musser.
+!! The algorithm is of order O(N Ln(N)) for all inputs.
+!! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+!! behavior is small for random data compared to other sorting algorithms.
+            character(len=*), intent(inout) :: array(0:)
+        end subroutine char_sort
+
+        pure module subroutine string_sort( array )
+!! Version: experimental
+!!
+!! `string_sort( array )` sorts the input `ARRAY` of type `STRING_TYPE`
+!! using a hybrid sort based on the `introsort` of David Musser.
+!! The algorithm is of order O(N Ln(N)) for all inputs.
+!! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+!! behavior is small for random data compared to other sorting algorithms.
+            type(string_type), intent(inout) :: array(0:)
+        end subroutine string_sort
+
+    end interface sort
+
+    interface sort_index
+!! Version: experimental
+!!
+!! The generic subroutine interface implementing the `SORT_INDEX` algorithm,
+!! based on the `"Rust" sort` algorithm found in `slice.rs`
+!! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+!! but modified to return an array of indices that would provide a stable
+!! sort of the rank one `ARRAY` input. The indices by default correspond to a
+!! non-decreasing sort, but if the optional argument `REVERSE` is present
+!! with a value of `.TRUE.` the indices correspond to a non-increasing sort.
+
+#:for k1 in INT_KINDS
+        module subroutine ${k1}$_sort_index( array, index, work, iwork, &
+            reverse )
+!! Version: experimental
+!!
+!! `${k1}$_sort_index( array )` sorts an input `ARRAY` of type `INTEGER(${k1}$)`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+!! and returns the sorted `ARRAY` and an array `INDEX of indices in the
+!! order that would sort the input `ARRAY` in the desired direction.
+            integer(${k1}$), intent(inout)             :: array(0:)
+            integer(int_size), intent(inout)           :: index(0:)
+            integer(${k1}$), intent(inout), optional   :: work(0:)
+            integer(int_size), intent(inout), optional :: iwork(0:)
+            logical, intent(in), optional              :: reverse
+        end subroutine ${k1}$_sort_index
+
+#:endfor
+
+#:for k1 in REAL_KINDS
+        module subroutine ${k1}$_sort_index( array, index, work, iwork, &
+            reverse )
+!! Version: experimental
+!!
+!! `${k1}$_sort_index( array )` sorts an input `ARRAY` of type `REAL(${k1}$)`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+!! and returns the sorted `ARRAY` and an array `INDEX of indices in the
+!! order that would sort the input `ARRAY` in the desired direction.
+            real(${k1}$), intent(inout)                :: array(0:)
+            integer(int_size), intent(inout)           :: index(0:)
+            real(${k1}$), intent(inout), optional      :: work(0:)
+            integer(int_size), intent(inout), optional :: iwork(0:)
+            logical, intent(in), optional              :: reverse
+        end subroutine ${k1}$_sort_index
+
+#:endfor
+        module subroutine char_sort_index( array, index, work, iwork, &
+            reverse )
+!! Version: experimental
+!!
+!! `char_sort_index( array )` sorts an input `ARRAY` of type `CHARACTER(*)`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+!! and returns the sorted `ARRAY` and an array `INDEX of indices in the
+!! order that would sort the input `ARRAY` in the desired direction.
+            character(len=*), intent(inout)                    :: array(0:)
+            integer(int_size), intent(inout)                   :: index(0:)
+            character(len=len(array)), intent(inout), optional :: work(0:)
+            integer(int_size), intent(inout), optional         :: iwork(0:)
+            logical, intent(in), optional                      :: reverse
+        end subroutine char_sort_index
+
+        module subroutine string_sort_index( array, index, work, iwork, &
+            reverse )
+!! Version: experimental
+!!
+!! `string_sort_index( array )` sorts an input `ARRAY` of type `STRING_TYPE`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+!! and returns the sorted `ARRAY` and an array `INDEX of indices in the
+!! order that would sort the input `ARRAY` in the desired direction.
+            type(string_type), intent(inout)           :: array(0:)
+            integer(int_size), intent(inout)           :: index(0:)
+            type(string_type), intent(inout), optional :: work(0:)
+            integer(int_size), intent(inout), optional :: iwork(0:)
+            logical, intent(in), optional              :: reverse
+        end subroutine string_sort_index
+
+    end interface sort_index
+
+
+end module stdlib_sorting
diff --git a/src/stdlib_sorting_ord_sort.fypp b/src/stdlib_sorting_ord_sort.fypp
new file mode 100644
index 000000000..0e8c59a47
--- /dev/null
+++ b/src/stdlib_sorting_ord_sort.fypp
@@ -0,0 +1,1419 @@
+#! Integer kinds to be considered during templating
+#:set INT_KINDS = ["int8", "int16", "int32", "int64"]
+
+#! Real kinds to be considered during templating
+#:set REAL_KINDS = ["sp", "dp", "qp"]
+
+!! Licensing:
+!!
+!! This file is subjec† both to the Fortran Standard Library license, and
+!! to additional licensing requirements as it contains translations of
+!! other software.
+!!
+!! The Fortran Standard Library, including this file, is distributed under
+!! the MIT license that should be included with the library's distribution.
+!!
+!!   Copyright (c) 2021 Fortran stdlib developers
+!!
+!!   Permission is hereby granted, free of charge, to any person obtaining a
+!!   copy of this software and associated documentation files (the
+!!   "Software"),  to deal in the Software without restriction, including
+!!   without limitation the rights to use, copy, modify, merge, publish,
+!!   distribute, sublicense, and/or sellcopies of the Software, and to permit
+!!   persons to whom the Software is furnished to do so, subject to the
+!!   following conditions:
+!!
+!!   The above copyright notice and this permission notice shall be included
+!!   in all copies or substantial portions of the Software.
+!!
+!!   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+!!   OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+!!   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+!!   IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+!!   CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+!!   TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+!!   SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+!!
+!! The generic subroutine, `ORD_SORT`, is substantially a translation to
+!! Fortran 2008 of the `"Rust" sort` sorting routines in
+!! [`slice.rs`](https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs)
+!! The `rust sort` implementation is distributed with the header:
+!!
+!!   Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
+!!   file at the top-level directory of this distribution and at
+!!   http://rust-lang.org/COPYRIGHT.
+!!
+!!   Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
+!!   http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+!!   <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
+!!   option. This file may not be copied, modified, or distributed
+!!   except according to those terms.
+!!
+!! so the licence for the original`slice.rs` code is compatible with the use
+!! of modified versions of the code in the Fortran Standard Library under
+!! the MIT license.
+
+submodule(stdlib_sorting) stdlib_sorting_ord_sort
+
+    implicit none
+
+contains
+
+#:for k1 in INT_KINDS
+
+    module subroutine ${k1}$_ord_sort( array, work )
+! A translation to Fortran 2008, of the `"Rust" sort` algorithm found in
+! `slice.rs`
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version in turn is a simplification of the Timsort algorithm
+! described in
+! https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size and
+! initialy processed by an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original `listsort.txt`, and an optional `work` array to be used as
+! scratch memory.
+        integer(${k1}$), intent(inout)           :: array(0:)
+        integer(${k1}$), intent(inout), optional :: work(0:)
+
+        integer(${k1}$), allocatable :: buf(:)
+        integer(int_size) :: array_size
+        integer :: stat
+
+        if ( present(work) ) then
+! Use the work array as scratch memory
+            call merge_sort( array, work )
+        else
+! Allocate a buffer to use as scratch memory.
+            array_size = size( array, kind=int_size )
+            allocate( buf(0:array_size/2-1), stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of buffer failed."
+            call merge_sort( array, buf )
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array )
+! Sorts `ARRAY` using an insertion sort.
+            integer(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, j
+            integer(${k1}$) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+
+            integer(${k1}$), intent(inout) :: array(0:)
+
+            integer(${k1}$) :: tmp
+            integer(int_size) :: i
+
+            tmp = array(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+            end do find_hole
+            array(i-1) = tmp
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, buf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case.
+
+            integer(${k1}$), intent(inout) :: array(0:)
+            integer(${k1}$), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type) :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least
+! min_run elements. Slices of up to this length are sorted using insertion
+! sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array )
+                return
+            end if
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                       right % base + right % len - 1 ), &
+                                left % len, buf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            integer(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(in)  :: mid
+            integer(${k1}$), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter ! check that it is stable
+                buf(0:array_len-mid) = array(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array )
+! Reverse a segment of an array in place
+            integer(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: lo, hi
+            integer(${k1}$) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine ${k1}$_ord_sort
+
+#:endfor
+
+
+#:for k1 in REAL_KINDS
+
+    module subroutine ${k1}$_ord_sort( array, work )
+! A translation to Fortran 2008, of the `"Rust" sort` algorithm found in
+! `slice.rs`
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version in turn is a simplification of the Timsort algorithm
+! described in
+! https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size and
+! initialy processed by an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original `listsort.txt`, and an optional `work` array to be used as
+! scratch memory.
+        real(${k1}$), intent(inout)           :: array(0:)
+        real(${k1}$), intent(inout), optional :: work(0:)
+
+        real(${k1}$), allocatable :: buf(:)
+        integer(int_size) :: array_size
+        integer :: stat
+
+        if ( present(work) ) then
+! Use the work array as scratch memory
+            call merge_sort( array, work )
+        else
+! Allocate a buffer to use as scratch memory.
+            array_size = size( array, kind=int_size )
+            allocate( buf(0:array_size/2-1), stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of buffer failed."
+            call merge_sort( array, buf )
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array )
+! Sorts `ARRAY` using an insertion sort.
+            real(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, j
+            real(${k1}$) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+
+            real(${k1}$), intent(inout) :: array(0:)
+
+            real(${k1}$) :: tmp
+            integer(int_size) :: i
+
+            tmp = array(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+            end do find_hole
+            array(i-1) = tmp
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, buf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case.
+
+            real(${k1}$), intent(inout) :: array(0:)
+            real(${k1}$), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type) :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least
+! min_run elements. Slices of up to this length are sorted using insertion
+! sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array )
+                return
+            end if
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                       right % base + right % len - 1 ), &
+                                left % len, buf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            real(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(in)  :: mid
+            real(${k1}$), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter ! check that it is stable
+                buf(0:array_len-mid) = array(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array )
+! Reverse a segment of an array in place
+            real(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: lo, hi
+            real(${k1}$) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine ${k1}$_ord_sort
+
+#:endfor
+
+
+    module subroutine char_ord_sort( array, work )
+! A translation to Fortran 2008, of the `"Rust" sort` algorithm found in
+! `slice.rs`
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version in turn is a simplification of the Timsort algorithm
+! described in
+! https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size and
+! initialy processed by an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original `listsort.txt`, and an optional `work` array to be used as
+! scratch memory.
+        character(len=*), intent(inout)                    :: array(0:)
+        character(len=len(array)), intent(inout), optional :: work(0:)
+
+        character(len=:), allocatable :: buf(:)
+        integer(int_size) :: array_size
+        integer :: stat
+
+        if ( present(work) ) then
+! Use the work array as scratch memory
+            call merge_sort( array, work )
+        else
+! Allocate a buffer to use as scratch memory.
+            array_size = size( array, kind=int_size )
+            allocate( character(len=len(array)) :: buf(0:array_size/2-1), &
+                      stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of buffer failed."
+            call merge_sort( array, buf )
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array )
+! Sorts `ARRAY` using an insertion sort.
+            character(len=*), intent(inout) :: array(0:)
+
+            integer(int_size)         :: i, j
+            character(len=len(array)) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+
+            character(len=*), intent(inout) :: array(0:)
+
+            character(len=len(array)) :: tmp
+            integer(int_size)         :: i
+
+            tmp = array(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+            end do find_hole
+            array(i-1) = tmp
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, buf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case.
+
+            character(len=*), intent(inout)          :: array(0:)
+            character(len=len(array)), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type) :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least
+! min_run elements. Slices of up to this length are sorted using insertion
+! sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array )
+                return
+            end if
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                       right % base + right % len - 1 ), &
+                                left % len, buf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            character(len=*), intent(inout)          :: array(0:)
+            integer(int_size), intent(in)            :: mid
+            character(len=len(array)), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter ! check that it is stable
+                buf(0:array_len-mid) = array(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array )
+! Reverse a segment of an array in place
+            character(len=*), intent(inout) :: array(0:)
+
+            integer(int_size)         :: lo, hi
+            character(len=len(array)) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine char_ord_sort
+
+
+    module subroutine string_ord_sort( array, work )
+! A translation to Fortran 2008, of the `"Rust" sort` algorithm found in
+! `slice.rs`
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version in turn is a simplification of the Timsort algorithm
+! described in
+! https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size and
+! initialy processed by an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original `listsort.txt`, and an optional `work` array to be used as
+! scratch memory.
+        type(string_type), intent(inout)           :: array(0:)
+        type(string_type), intent(inout), optional :: work(0:)
+
+        type(string_type), allocatable :: buf(:)
+        integer(int_size)              :: array_size
+        integer                        :: stat
+
+        if ( present(work) ) then
+! Use the work array as scratch memory
+            call merge_sort( array, work )
+        else
+! Allocate a buffer to use as scratch memory.
+            array_size = size( array, kind=int_size )
+            allocate( buf(0:array_size/2-1), stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of buffer failed."
+            call merge_sort( array, buf )
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array )
+! Sorts `ARRAY` using an insertion sort.
+            type(string_type), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, j
+            type(string_type) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical           :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+
+            type(string_type), intent(inout) :: array(0:)
+
+            type(string_type) :: tmp
+            integer(int_size) :: i
+
+            tmp = array(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+            end do find_hole
+            array(i-1) = tmp
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, buf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case.
+
+            type(string_type), intent(inout) :: array(0:)
+            type(string_type), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type)    :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least
+! min_run elements. Slices of up to this length are sorted using insertion
+! sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array )
+                return
+            end if
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                       right % base + right % len - 1 ), &
+                                left % len, buf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            type(string_type), intent(inout) :: array(0:)
+            integer(int_size), intent(in)    :: mid
+            type(string_type), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter ! check that it is stable
+                buf(0:array_len-mid) = array(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array )
+! Reverse a segment of an array in place
+            type(string_type), intent(inout) :: array(0:)
+
+            integer(int_size) :: lo, hi
+            type(string_type) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine string_ord_sort
+
+end submodule stdlib_sorting_ord_sort
+
diff --git a/src/stdlib_sorting_sort.fypp b/src/stdlib_sorting_sort.fypp
new file mode 100644
index 000000000..fab24aa1c
--- /dev/null
+++ b/src/stdlib_sorting_sort.fypp
@@ -0,0 +1,736 @@
+#! Integer kinds to be considered during templating
+#:set INT_KINDS = ["int8", "int16", "int32", "int64"]
+
+#! Real kinds to be considered during templating
+#:set REAL_KINDS = ["sp", "dp", "qp"]
+
+!! Licensing:
+!!
+!! This file is subjec† both to the Fortran Standard Library license, and
+!! to additional licensing requirements as it contains translations of
+!! other software.
+!!
+!! The Fortran Standard Library, including this file, is distributed under
+!! the MIT license that should be included with the library's distribution.
+!!
+!!   Copyright (c) 2021 Fortran stdlib developers
+!!
+!!   Permission is hereby granted, free of charge, to any person obtaining a
+!!   copy of this software and associated documentation files (the
+!!   "Software"),  to deal in the Software without restriction, including
+!!   without limitation the rights to use, copy, modify, merge, publish,
+!!   distribute, sublicense, and/or sellcopies of the Software, and to permit
+!!   persons to whom the Software is furnished to do so, subject to the
+!!   following conditions:
+!!
+!!   The above copyright notice and this permission notice shall be included
+!!   in all copies or substantial portions of the Software.
+!!
+!!   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+!!   OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+!!   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+!!   IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+!!   CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+!!   TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+!!   SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+!!
+!! The generic subroutine, `SORT`, is substantially a
+!! translation to Fortran 2008, of the `introsort` of David Musser.
+!! David Musser has given permission to include a variant of `introsort`
+!! in the Fortran Standard Library under the MIT license provided
+!! we cite:
+!!
+!!   Musser, D.R., “Introspective Sorting and Selection Algorithms,”
+!!   Software—Practice and Experience, Vol. 27(8), 983–993 (August 1997).
+!!
+!! as the official source of the algorithm.
+
+submodule(stdlib_sorting) stdlib_sorting_sort
+!! This submodule implements the overloaded sorting subroutine `SORT`
+!! that can be used to sort four kinds of `INTEGER` arrays and three kinds
+!! of `REAL` arrays. Sorting is in order of increasing value, with the worst
+!! case run time performance of `O(N Ln(N))`.
+!!
+!! `SORT` uses the `INTROSORT` sorting algorithm of David Musser,
+!! http://www.cs.rpi.edu/~musser/gp/introsort.ps. `introsort` is a hybrid
+!! unstable comparison algorithm combining `quicksort`, `insertion sort`, and
+!! `heap sort`. While this algorithm is always O(N Ln(N)) it is relatively
+!! fast on randomly ordered data, but inconsistent in performance on partly
+!! sorted data, sometimes having `merge sort` performance, sometimes having
+!! better than `quicksort` performance.
+
+    implicit none
+
+contains
+
+
+#:for k1 in INT_KINDS
+
+    pure module subroutine ${k1}$_sort( array )
+! `${k1}$_sort( array )` sorts the input `ARRAY` of type `INTEGER(${k1}$)`
+! using a hybrid sort based on the `introsort` of David Musser. As with
+! `introsort`, `${k1}$_sort( array )` is an unstable hybrid comparison
+! algorithm using `quicksort` for the main body of the sort tree,
+! supplemented by `insertion sort` for the outer brances, but if
+! `quicksort` is converging too slowly the algorithm resorts
+! to `heapsort`. The algorithm is of order O(N Ln(N)) for all inputs.
+! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+! behavior is typically small compared to other sorting algorithms.
+
+        integer(${k1}$), intent(inout) :: array(0:)
+
+        integer(int32) :: depth_limit
+
+        depth_limit = 2 * int( floor( log( real( size( array, kind=int64 ),  &
+                                                 kind=dp) ) / log(2.0_dp) ), &
+                               kind=int32 )
+        call introsort(array, depth_limit)
+
+    contains
+
+        pure recursive subroutine introsort( array, depth_limit )
+! It devolves to `insertionsort` if the remaining number of elements
+! is fewer than or equal to `INSERT_SIZE`, `heapsort` if the completion
+! of the `quicksort` is too slow as estimated from `DEPTH_LIMIT`,
+! otherwise sorting is done by a `quicksort`.
+            integer(${k1}$), intent(inout) :: array(0:)
+            integer(int32), intent(in)     :: depth_limit
+
+            integer(int_size), parameter :: insert_size = 16_int_size
+            integer(int_size)            :: index
+
+            if ( size(array, kind=int_size) <= insert_size ) then
+                ! May be best at the end of SORT processing the whole array
+                ! See Musser, D.R., “Introspective Sorting and Selection
+                ! Algorithms,” Software—Practice and Experience, Vol. 27(8),
+                ! 983–993 (August 1997).
+
+                call insertion_sort( array )
+            else if ( depth_limit == 0 ) then
+                call heap_sort( array )
+            else
+                call partition( array, index )
+                call introsort( array(0:index-1), depth_limit-1 )
+                call introsort( array(index+1:), depth_limit-1 )
+            end if
+
+        end subroutine introsort
+
+
+        pure subroutine partition( array, index )
+! quicksort partition using median of three.
+            integer(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(out) :: index
+
+            integer(${k1}$) :: u, v, w, x, y
+            integer(int_size) :: i, j
+
+! Determine median of three and exchange it with the end.
+            u = array( 0 )
+            v = array( size(array, kind=int_size)/2-1 )
+            w = array( size(array, kind=int_size)-1 )
+            if ( (u > v) .neqv. (u > w) ) then
+                x = u
+                y = array(0)
+                array(0) = array( size( array, kind=int_size ) - 1 )
+                array( size( array, kind=int_size ) - 1 ) = y
+            else if ( (v < u) .neqv. (v < w) ) then
+                x = v
+                y = array(size( array, kind=int_size )/2-1)
+                array( size( array, kind=int_size )/2-1 ) = &
+                    array( size( array, kind=int_size )-1 )
+                array( size( array, kind=int_size )-1 ) = y
+            else
+                x = w
+            end if
+! Partition the array.
+            i = -1_int_size
+            do j = 0_int_size, size(array, kind=int_size)-2
+                if ( array(j) <= x ) then
+                    i = i + 1
+                    y = array(i)
+                    array(i) = array(j)
+                    array(j) = y
+                end if
+            end do
+            y = array(i+1)
+            array(i+1) = array(size(array, kind=int_size)-1)
+            array(size(array, kind=int_size)-1) = y
+            index = i + 1
+
+        end subroutine partition
+
+        pure subroutine insertion_sort( array )
+! Bog standard insertion sort.
+            integer(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, j
+            integer(${k1}$)   :: key
+
+            do j=1_int_size, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+        pure subroutine heap_sort( array )
+! A bog standard heap sort
+            integer(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, heap_size
+            integer(${k1}$)   :: y
+
+            heap_size = size( array, kind=int_size )
+! Build the max heap
+            do i = (heap_size-2)/2_int_size, 0_int_size, -1_int_size
+                call max_heapify( array, i, heap_size )
+            end do
+            do i = heap_size-1, 1_int_size, -1_int_size
+! Swap the first element with the current final element
+                y = array(0)
+                array(0) = array(i)
+                array(i) = y
+! Sift down using max_heapify
+                call max_heapify( array, 0_int_size, i )
+            end do
+
+        end subroutine heap_sort
+
+        pure recursive subroutine max_heapify( array, i, heap_size )
+! Transform the array into a max heap
+            integer(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(in)  :: i, heap_size
+
+            integer(int_size) :: l, r, largest
+            integer(${k1}$)   :: y
+
+            largest = i
+            l = 2_int_size * i + 1_int_size
+            r = l + 1_int_size
+            if ( l < heap_size ) then
+                if ( array(l) > array(largest) ) largest = l
+            end if
+            if ( r < heap_size ) then
+                if ( array(r) > array(largest) ) largest = r
+            end if
+            if ( largest /= i ) then
+                y = array(i)
+                array(i) = array(largest)
+                array(largest) = y
+                call max_heapify( array, largest, heap_size )
+            end if
+
+        end subroutine max_heapify
+
+    end subroutine ${k1}$_sort
+
+#:endfor
+
+
+#:for k1 in REAL_KINDS
+
+    pure module subroutine ${k1}$_sort( array )
+
+! `${k1}$_sort( array )` sorts the input `ARRAY` of type `REAL(${k1}$)`
+! using a hybrid sort based on the `introsort` of David Musser. As with
+! `introsort`, `${k1}$_sort( array )` is an unstable hybrid comparison
+! algorithm using `quicksort` for the main body of the sort tree,
+! supplemented by `insertion sort` for the outer brances, but if
+! `quicksort` is converging too slowly the algorithm resorts
+! to `heapsort`. The algorithm is of order O(N Ln(N)) for all inputs.
+! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+! behavior is typically small compared to other sorting algorithms.
+
+        real(${k1}$), intent(inout) :: array(0:)
+        integer(int32)              :: depth_limit
+
+        depth_limit = &
+            2 * int( floor( log( real( size( array, kind=int_size ),   &
+                                             kind=dp) ) / log(2.0_dp) ), &
+                               kind=int32 )
+        call introsort(array, depth_limit)
+
+    contains
+
+        pure recursive subroutine introsort( array, depth_limit )
+! It devolves to `insertionsort` if the remaining number of elements
+! is fewer than or equal to `INSERT_SIZE`, `heapsort` if the completion of
+! the quicksort is too slow as estimated from `DEPTH_LIMIT`, otherwise
+! sorting is done by a `quicksort`.
+            real(${k1}$), intent(inout) :: array(0:)
+            integer(int32), intent(in)  :: depth_limit
+
+            integer(int_size), parameter :: insert_size = 16_int_size
+            integer(int_size)            :: index
+
+            if ( size(array, kind=int_size) <= insert_size ) then
+                ! May be best at the end of SORT processing the whole array
+                ! See Musser, D.R., “Introspective Sorting and Selection
+                ! Algorithms,” Software—Practice and Experience, Vol. 27(8),
+                ! 983–993 (August 1997).
+
+                call insertion_sort( array )
+
+            else if ( depth_limit == 0 ) then
+                call heap_sort( array )
+
+            else
+                call partition( array, index )
+                call introsort( array(0:index-1), depth_limit-1 )
+                call introsort( array(index+1:), depth_limit-1 )
+            end if
+
+        end subroutine introsort
+
+
+        pure subroutine partition( array, index )
+! quicksort partition using median of three.
+            real(${k1}$), intent(inout)    :: array(0:)
+            integer(int_size), intent(out) :: index
+
+            real(${k1}$)      :: u, v, w, x, y
+            integer(int_size) :: i, j
+
+! Determine median of three and exchange it with the end.
+            u = array( 0 )
+            v = array( size(array, kind=int_size)/2-1 )
+            w = array( size(array, kind=int_size)-1 )
+            if ( (u > v) .neqv. (u > w) ) then
+                x = u
+                y = array(0)
+                array(0) = array( size( array, kind=int_size ) - 1 )
+                array( size( array, kind=int_size ) - 1 ) = y
+            else if ( (v < u) .neqv. (v < w) ) then
+                x = v
+                y = array(size( array, kind=int_size )/2-1)
+                array(size( array, kind=int_size )/2-1) = &
+                    array(size( array, kind=int_size )-1)
+                array( size( array, kind=int_size )-1 ) = y
+            else
+                x = w
+            end if
+! Partition the array
+            i = -1_int_size
+            do j = 0_int_size, size(array, kind=int_size)-2
+                if ( array(j) <= x ) then
+                    i = i + 1
+                    y = array(i)
+                    array(i) = array(j)
+                    array(j) = y
+                end if
+            end do
+            y = array(i+1)
+            array(i+1) = array(size(array, kind=int_size)-1)
+            array(size(array, kind=int_size)-1) = y
+            index = i + 1
+
+        end subroutine partition
+
+        pure subroutine insertion_sort( array )
+! Bog standard insertion sort.
+            real(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, j
+            real(${k1}$)      :: key
+
+            do j=1_int_size, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+        pure subroutine heap_sort( array )
+! A bog standard heap sort
+            real(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, heap_size
+            real(${k1}$)      :: y
+
+            heap_size = size( array, kind=int_size )
+! Build the max heap
+            do i = (heap_size-2)/2_int_size, 0_int_size, -1_int_size
+                call max_heapify( array, i, heap_size )
+            end do
+            do i = heap_size-1, 1_int_size, -1_int_size
+! Swap the first element with the current final element
+                y = array(0)
+                array(0) = array(i)
+                array(i) = y
+! Sift down using max_heapify
+                call max_heapify( array, 0_int_size, i )
+            end do
+
+        end subroutine heap_sort
+
+        pure recursive subroutine max_heapify( array, i, heap_size )
+! Transform the array into a max heap
+            real(${k1}$), intent(inout)   :: array(0:)
+            integer(int_size), intent(in) :: i, heap_size
+
+            integer(int_size) :: l, r, largest
+            real(${k1}$)      :: y
+
+            largest = i
+            l = 2_int_size * i + 1_int_size
+            r = l + 1_int_size
+            if ( l < heap_size ) then
+                if ( array(l) > array(largest) ) largest = l
+            end if
+            if ( r < heap_size ) then
+                if ( array(r) > array(largest) ) largest = r
+            end if
+            if ( largest /= i ) then
+                y = array(i)
+                array(i) = array(largest)
+                array(largest) = y
+                call max_heapify( array, largest, heap_size )
+            end if
+
+        end subroutine max_heapify
+
+    end subroutine ${k1}$_sort
+
+#:endfor
+
+
+    pure module subroutine char_sort( array )
+! `char_sort( array )` sorts the input `ARRAY` of type `CHARACTER(*)`
+! using a hybrid sort based on the `introsort` of David Musser. As with
+! `introsort`, `char_sort( array )` is an unstable hybrid comparison
+! algorithm using `quicksort` for the main body of the sort tree,
+! supplemented by `insertion sort` for the outer brances, but if
+! `quicksort` is converging too slowly the algorithm resorts
+! to `heapsort`. The algorithm is of order O(N Ln(N)) for all inputs.
+! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+! behavior is typically small compared to other sorting algorithms.
+
+        character(len=*), intent(inout) :: array(0:)
+
+        integer(int32) :: depth_limit
+
+        depth_limit = 2 * int( floor( log( real( size( array, kind=int64 ),  &
+                                                 kind=dp) ) / log(2.0_dp) ), &
+                               kind=int32 )
+        call introsort(array, depth_limit)
+
+    contains
+
+        pure recursive subroutine introsort( array, depth_limit )
+! It devolves to `insertionsort` if the remaining number of elements
+! is fewer than or equal to `INSERT_SIZE`, `heapsort` if the completion
+! of the `quicksort` is too slow as estimated from `DEPTH_LIMIT`,
+! otherwise sorting is done by a `quicksort`.
+            character(len=*), intent(inout) :: array(0:)
+            integer(int32), intent(in)      :: depth_limit
+
+            integer(int_size), parameter :: insert_size = 16_int_size
+            integer(int_size)            :: index
+
+            if ( size(array, kind=int_size) <= insert_size ) then
+                ! May be best at the end of SORT processing the whole array
+                ! See Musser, D.R., “Introspective Sorting and Selection
+                ! Algorithms,” Software—Practice and Experience, Vol. 27(8),
+                ! 983–993 (August 1997).
+
+                call insertion_sort( array )
+            else if ( depth_limit == 0 ) then
+                call heap_sort( array )
+            else
+                call partition( array, index )
+                call introsort( array(0:index-1), depth_limit-1 )
+                call introsort( array(index+1:), depth_limit-1 )
+            end if
+
+        end subroutine introsort
+
+
+        pure subroutine partition( array, index )
+! quicksort partition using median of three.
+            character(len=*), intent(inout) :: array(0:)
+            integer(int_size), intent(out)  :: index
+
+            integer(int_size)         :: i, j
+            character(len=len(array)) :: u, v, w, x, y
+
+! Determine median of three and exchange it with the end.
+            u = array( 0 )
+            v = array( size(array, kind=int_size)/2-1 )
+            w = array( size(array, kind=int_size)-1 )
+            if ( (u > v) .neqv. (u > w) ) then
+                x = u
+                y = array(0)
+                array(0) = array( size( array, kind=int_size ) - 1 )
+                array( size( array, kind=int_size ) - 1 ) = y
+            else if ( (v < u) .neqv. (v < w) ) then
+                x = v
+                y = array(size( array, kind=int_size )/2-1)
+                array( size( array, kind=int_size )/2-1 ) = &
+                    array( size( array, kind=int_size )-1 )
+                array( size( array, kind=int_size )-1 ) = y
+            else
+                x = w
+            end if
+! Partition the array.
+            i = -1_int_size
+            do j = 0_int_size, size(array, kind=int_size)-2
+                if ( array(j) <= x ) then
+                    i = i + 1
+                    y = array(i)
+                    array(i) = array(j)
+                    array(j) = y
+                end if
+            end do
+            y = array(i+1)
+            array(i+1) = array(size(array, kind=int_size)-1)
+            array(size(array, kind=int_size)-1) = y
+            index = i + 1
+
+        end subroutine partition
+
+        pure subroutine insertion_sort( array )
+! Bog standard insertion sort.
+            character(len=*), intent(inout) :: array(0:)
+
+            integer(int_size)         :: i, j
+            character(len=len(array)) :: key
+
+            do j=1_int_size, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+        pure subroutine heap_sort( array )
+! A bog standard heap sort
+            character(len=*), intent(inout) :: array(0:)
+
+            integer(int_size)         :: i, heap_size
+            character(len=len(array)) :: y
+
+            heap_size = size( array, kind=int_size )
+! Build the max heap
+            do i = (heap_size-2)/2_int_size, 0_int_size, -1_int_size
+                call max_heapify( array, i, heap_size )
+            end do
+            do i = heap_size-1, 1_int_size, -1_int_size
+! Swap the first element with the current final element
+                y = array(0)
+                array(0) = array(i)
+                array(i) = y
+! Sift down using max_heapify
+                call max_heapify( array, 0_int_size, i )
+            end do
+
+        end subroutine heap_sort
+
+        pure recursive subroutine max_heapify( array, i, heap_size )
+! Transform the array into a max heap
+            character(len=*), intent(inout) :: array(0:)
+            integer(int_size), intent(in)   :: i, heap_size
+
+            integer(int_size)         :: l, r, largest
+            character(len=len(array)) :: y
+
+            largest = i
+            l = 2_int_size * i + 1_int_size
+            r = l + 1_int_size
+            if ( l < heap_size ) then
+                if ( array(l) > array(largest) ) largest = l
+            end if
+            if ( r < heap_size ) then
+                if ( array(r) > array(largest) ) largest = r
+            end if
+            if ( largest /= i ) then
+                y = array(i)
+                array(i) = array(largest)
+                array(largest) = y
+                call max_heapify( array, largest, heap_size )
+            end if
+
+        end subroutine max_heapify
+
+    end subroutine char_sort
+
+    pure module subroutine string_sort( array )
+! `string_sort( array )` sorts the input `ARRAY` of type `STRING_TyPE`
+! using a hybrid sort based on the `introsort` of David Musser. As with
+! `introsort`, `string_sort( array )` is an unstable hybrid comparison
+! algorithm using `quicksort` for the main body of the sort tree,
+! supplemented by `insertion sort` for the outer brances, but if
+! `quicksort` is converging too slowly the algorithm resorts
+! to `heapsort`. The algorithm is of order O(N Ln(N)) for all inputs.
+! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+! behavior is typically small compared to other sorting algorithms.
+
+        type(string_type), intent(inout) :: array(0:)
+
+        integer(int32) :: depth_limit
+
+        depth_limit = 2 * int( floor( log( real( size( array, kind=int64 ),  &
+                                                 kind=dp) ) / log(2.0_dp) ), &
+                               kind=int32 )
+        call introsort(array, depth_limit)
+
+    contains
+
+        pure recursive subroutine introsort( array, depth_limit )
+! It devolves to `insertionsort` if the remaining number of elements
+! is fewer than or equal to `INSERT_SIZE`, `heapsort` if the completion
+! of the `quicksort` is too slow as estimated from `DEPTH_LIMIT`,
+! otherwise sorting is done by a `quicksort`.
+            type(string_type), intent(inout) :: array(0:)
+            integer(int32), intent(in)      :: depth_limit
+
+            integer(int_size), parameter :: insert_size = 16_int_size
+            integer(int_size)            :: index
+
+            if ( size(array, kind=int_size) <= insert_size ) then
+                ! May be best at the end of SORT processing the whole array
+                ! See Musser, D.R., “Introspective Sorting and Selection
+                ! Algorithms,” Software—Practice and Experience, Vol. 27(8),
+                ! 983–993 (August 1997).
+
+                call insertion_sort( array )
+            else if ( depth_limit == 0 ) then
+                call heap_sort( array )
+            else
+                call partition( array, index )
+                call introsort( array(0:index-1), depth_limit-1 )
+                call introsort( array(index+1:), depth_limit-1 )
+            end if
+
+        end subroutine introsort
+
+
+        pure subroutine partition( array, index )
+! quicksort partition using median of three.
+            type(string_type), intent(inout) :: array(0:)
+            integer(int_size), intent(out)  :: index
+
+            integer(int_size) :: i, j
+            type(string_type) :: u, v, w, x, y
+
+! Determine median of three and exchange it with the end.
+            u = array( 0 )
+            v = array( size(array, kind=int_size)/2-1 )
+            w = array( size(array, kind=int_size)-1 )
+            if ( (u > v) .neqv. (u > w) ) then
+                x = u
+                y = array(0)
+                array(0) = array( size( array, kind=int_size ) - 1 )
+                array( size( array, kind=int_size ) - 1 ) = y
+            else if ( (v < u) .neqv. (v < w) ) then
+                x = v
+                y = array(size( array, kind=int_size )/2-1)
+                array( size( array, kind=int_size )/2-1 ) = &
+                    array( size( array, kind=int_size )-1 )
+                array( size( array, kind=int_size )-1 ) = y
+            else
+                x = w
+            end if
+! Partition the array.
+            i = -1_int_size
+            do j = 0_int_size, size(array, kind=int_size)-2
+                if ( array(j) <= x ) then
+                    i = i + 1
+                    y = array(i)
+                    array(i) = array(j)
+                    array(j) = y
+                end if
+            end do
+            y = array(i+1)
+            array(i+1) = array(size(array, kind=int_size)-1)
+            array(size(array, kind=int_size)-1) = y
+            index = i + 1
+
+        end subroutine partition
+
+        pure subroutine insertion_sort( array )
+! Bog standard insertion sort.
+            type(string_type), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, j
+            type(string_type) :: key
+
+            do j=1_int_size, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+        pure subroutine heap_sort( array )
+! A bog standard heap sort
+            type(string_type), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, heap_size
+            type(string_type) :: y
+
+            heap_size = size( array, kind=int_size )
+! Build the max heap
+            do i = (heap_size-2)/2_int_size, 0_int_size, -1_int_size
+                call max_heapify( array, i, heap_size )
+            end do
+            do i = heap_size-1, 1_int_size, -1_int_size
+! Swap the first element with the current final element
+                y = array(0)
+                array(0) = array(i)
+                array(i) = y
+! Sift down using max_heapify
+                call max_heapify( array, 0_int_size, i )
+            end do
+
+        end subroutine heap_sort
+
+        pure recursive subroutine max_heapify( array, i, heap_size )
+! Transform the array into a max heap
+            type(string_type), intent(inout) :: array(0:)
+            integer(int_size), intent(in)    :: i, heap_size
+
+            integer(int_size) :: l, r, largest
+            type(string_type) :: y
+
+            largest = i
+            l = 2_int_size * i + 1_int_size
+            r = l + 1_int_size
+            if ( l < heap_size ) then
+                if ( array(l) > array(largest) ) largest = l
+            end if
+            if ( r < heap_size ) then
+                if ( array(r) > array(largest) ) largest = r
+            end if
+            if ( largest /= i ) then
+                y = array(i)
+                array(i) = array(largest)
+                array(largest) = y
+                call max_heapify( array, largest, heap_size )
+            end if
+
+        end subroutine max_heapify
+
+    end subroutine string_sort
+
+end submodule stdlib_sorting_sort
diff --git a/src/stdlib_sorting_sort_index.fypp b/src/stdlib_sorting_sort_index.fypp
new file mode 100644
index 000000000..4146e77f1
--- /dev/null
+++ b/src/stdlib_sorting_sort_index.fypp
@@ -0,0 +1,1684 @@
+#! Integer kinds to be considered during templating
+#:set INT_KINDS = ["int8", "int16", "int32", "int64"]
+
+#! Real kinds to be considered during templating
+#:set REAL_KINDS = ["sp", "dp", "qp"]
+
+!! Licensing:
+!!
+!! This file is subjec† both to the Fortran Standard Library license, and
+!! to additional licensing requirements as it contains translations of
+!! other software.
+!!
+!! The Fortran Standard Library, including this file, is distributed under
+!! the MIT license that should be included with the library's distribution.
+!!
+!!   Copyright (c) 2021 Fortran stdlib developers
+!!
+!!   Permission is hereby granted, free of charge, to any person obtaining a
+!!   copy of this software and associated documentation files (the
+!!   "Software"),  to deal in the Software without restriction, including
+!!   without limitation the rights to use, copy, modify, merge, publish,
+!!   distribute, sublicense, and/or sellcopies of the Software, and to permit
+!!   persons to whom the Software is furnished to do so, subject to the
+!!   following conditions:
+!!
+!!   The above copyright notice and this permission notice shall be included
+!!   in all copies or substantial portions of the Software.
+!!
+!!   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+!!   OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+!!   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+!!   IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+!!   CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+!!   TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+!!   SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+!!
+!! The generic subroutine, `SORT_INDEX`, is substantially a translation to
+!! Fortran 2008 of the `"Rust" sort` sorting routines in
+!! [`slice.rs`](https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs)
+!! The `rust sort` implementation is distributed with the header:
+!!
+!!   Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
+!!   file at the top-level directory of this distribution and at
+!!   http://rust-lang.org/COPYRIGHT.
+!!
+!!   Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
+!!   http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+!!   <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
+!!   option. This file may not be copied, modified, or distributed
+!!   except according to those terms.
+!!
+!! so the licence for the original`slice.rs` code is compatible with the use
+!! of modified versions of the code in the Fortran Standard Library under
+!! the MIT license.
+
+submodule(stdlib_sorting) stdlib_sorting_sort_index
+
+    implicit none
+
+contains
+
+#:for k1 in INT_KINDS
+
+    module subroutine ${k1}$_sort_index( array, index, work, iwork, reverse )
+! A modification of `${k1}$_ord_sort` to return an array of indices that
+! would provide a stable sort of the `ARRAY` input. The indices by default
+! correspond to a non-decreasing sort, but if the optional argument
+! `REVERSE` is present with a value of `.TRUE.` the indices correspond to
+! a non-increasing sort. The logic of the determination of indexing largely
+! follows the `"Rust" sort` found in `slice.rs`:
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version is a simplification of the Timsort algorithm described
+! in https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size, with
+! an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original listsort.txt, and the optional `work` and `iwork` arraya to be
+! used as scratch memory.
+
+        integer(${k1}$), intent(inout)             :: array(0:)
+        integer(int_size), intent(inout)           :: index(0:)
+        integer(${k1}$), intent(inout), optional   :: work(0:)
+        integer(int_size), intent(inout), optional :: iwork(0:)
+        logical, intent(in), optional              :: reverse
+
+        integer(int_size) :: array_size, i, stat
+        integer(${k1}$), allocatable :: buf(:)
+        integer(int_size), allocatable :: ibuf(:)
+
+        array_size = size(array, kind=int_size)
+
+        do i = 0, array_size-1
+            index(i) = i+1
+        end do
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+! If necessary allocate buffers to serve as scratch memory.
+        if ( present(work) ) then
+            if ( present(iwork) ) then
+                call merge_sort( array, index, work, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, work, ibuf )
+            end if
+        else
+            allocate( buf(0:array_size/2-1), stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of array buffer failed."
+            if ( present(iwork) ) then
+                call merge_sort( array, index, buf, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, buf, ibuf )
+            end if
+        end if
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array, index )
+! Sorts `ARRAY` using an insertion sort, while maintaining consistency in
+! location of the indices in `INDEX` to the elements of `ARRAY`.
+            integer(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size) :: i, j, key_index
+            integer(${k1}$) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                key_index = index(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    index(i+1) = index(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+                index(i+1) = key_index
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array, index )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+! Consistency of the indices in `index` with the elements of `array`
+! are maintained.
+
+            integer(${k1}$), intent(inout)   :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(${k1}$) :: tmp
+            integer(int_size) :: i, tmp_index
+
+            tmp = array(0)
+            tmp_index = index(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+                index(i-1) = index(i)
+            end do find_hole
+            array(i-1) = tmp
+            index(i-1) = tmp_index
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, index, buf, ibuf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case. Consistency of the indices in `index` with the elements of
+! `array` are maintained.
+
+            integer(${k1}$), intent(inout)   :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+            integer(${k1}$), intent(inout)   :: buf(0:)
+            integer(int_size), intent(inout) :: ibuf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type) :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least this
+! many elements. Slices of up to this length are sorted using insertion sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array, index )
+                return
+            end if
+
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish), &
+                                            index(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish), index(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                     right % base + right % len - 1 ), &
+                                left % len, buf, &
+                                index( left % base: &
+                                     right % base + right % len - 1 ), ibuf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf, index, ibuf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            integer(${k1}$), intent(inout)   :: array(0:)
+            integer(int_size), intent(in)    :: mid
+            integer(${k1}$), intent(inout)   :: buf(0:)
+            integer(int_size), intent(inout) :: index(0:)
+            integer(int_size), intent(inout) :: ibuf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                ibuf(0:mid-1) = index(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        index(k) = ibuf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        index(k) = index(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            index(k+1:) = ibuf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter
+                buf(0:array_len-mid) = array(mid:array_len-1)
+                ibuf(0:array_len-mid) = index(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        index(k) = ibuf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        index(k) = index(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            index(0:k-1) = ibuf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array, index )
+! Reverse a segment of an array in place
+            integer(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size) :: itemp, lo, hi
+            integer(${k1}$) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                itemp = index(lo)
+                index(lo) = index(hi)
+                index(hi) = itemp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine ${k1}$_sort_index
+
+#:endfor
+
+
+#:for k1 in REAL_KINDS
+
+    module subroutine ${k1}$_sort_index( array, index, work, iwork, reverse )
+! A modification of `${k1}$_ord_sort` to return an array of indices that
+! would provide a stable sort of the `ARRAY` input. The indices by default
+! correspond to a non-decreasing sort, but if the optional argument
+! `REVERSE` is present with a value of `.TRUE.` the indices correspond to
+! a non-increasing sort. The logic of the determination of indexing largely
+! follows the `"Rust" sort` found in `slice.rs`:
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version is a simplification of the Timsort algorithm described
+! in https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size, with
+! an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original listsort.txt, and the optional `work` and `iwork` arraya to be
+! used as scratch memory.
+
+        real(${k1}$), intent(inout)             :: array(0:)
+        integer(int_size), intent(inout)           :: index(0:)
+        real(${k1}$), intent(inout), optional   :: work(0:)
+        integer(int_size), intent(inout), optional :: iwork(0:)
+        logical, intent(in), optional              :: reverse
+
+        integer(int_size) :: array_size, i, stat
+        real(${k1}$), allocatable :: buf(:)
+        integer(int_size), allocatable :: ibuf(:)
+
+        array_size = size(array, kind=int_size)
+
+        do i = 0, array_size-1
+            index(i) = i+1
+        end do
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+! If necessary allocate buffers to serve as scratch memory.
+        if ( present(work) ) then
+            if ( present(iwork) ) then
+                call merge_sort( array, index, work, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, work, ibuf )
+            end if
+        else
+            allocate( buf(0:array_size/2-1), stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of array buffer failed."
+            if ( present(iwork) ) then
+                call merge_sort( array, index, buf, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, buf, ibuf )
+            end if
+        end if
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array, index )
+! Sorts `ARRAY` using an insertion sort, while maintaining consistency in
+! location of the indices in `INDEX` to the elements of `ARRAY`.
+            real(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size) :: i, j, key_index
+            real(${k1}$) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                key_index = index(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    index(i+1) = index(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+                index(i+1) = key_index
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array, index )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+! Consistency of the indices in `index` with the elements of `array`
+! are maintained.
+
+            real(${k1}$), intent(inout)   :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            real(${k1}$) :: tmp
+            integer(int_size) :: i, tmp_index
+
+            tmp = array(0)
+            tmp_index = index(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+                index(i-1) = index(i)
+            end do find_hole
+            array(i-1) = tmp
+            index(i-1) = tmp_index
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, index, buf, ibuf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case. Consistency of the indices in `index` with the elements of
+! `array` are maintained.
+
+            real(${k1}$), intent(inout)   :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+            real(${k1}$), intent(inout)   :: buf(0:)
+            integer(int_size), intent(inout) :: ibuf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type) :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least this
+! many elements. Slices of up to this length are sorted using insertion sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array, index )
+                return
+            end if
+
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish), &
+                                            index(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish), index(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                     right % base + right % len - 1 ), &
+                                left % len, buf, &
+                                index( left % base: &
+                                     right % base + right % len - 1 ), ibuf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf, index, ibuf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            real(${k1}$), intent(inout)   :: array(0:)
+            integer(int_size), intent(in)    :: mid
+            real(${k1}$), intent(inout)   :: buf(0:)
+            integer(int_size), intent(inout) :: index(0:)
+            integer(int_size), intent(inout) :: ibuf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                ibuf(0:mid-1) = index(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        index(k) = ibuf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        index(k) = index(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            index(k+1:) = ibuf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter
+                buf(0:array_len-mid) = array(mid:array_len-1)
+                ibuf(0:array_len-mid) = index(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        index(k) = ibuf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        index(k) = index(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            index(0:k-1) = ibuf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array, index )
+! Reverse a segment of an array in place
+            real(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size) :: itemp, lo, hi
+            real(${k1}$) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                itemp = index(lo)
+                index(lo) = index(hi)
+                index(hi) = itemp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine ${k1}$_sort_index
+
+#:endfor
+
+    module subroutine char_sort_index( array, index, work, iwork, reverse )
+! A modification of `char_ord_sort` to return an array of indices that
+! would provide a stable sort of the `ARRAY` input. The indices by default
+! correspond to a non-decreasing sort, but if the optional argument
+! `REVERSE` is present with a value of `.TRUE.` the indices correspond to
+! a non-increasing sort. The logic of the determination of indexing largely
+! follows the `"Rust" sort` found in `slice.rs`:
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version is a simplification of the Timsort algorithm described
+! in https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size, with
+! an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original listsort.txt, and the optional `work` and `iwork` arraya to be
+! used as scratch memory.
+
+        character(len=*), intent(inout)                      :: array(0:)
+        integer(int_size), intent(inout)                     :: index(0:)
+        character(len=len(array)), intent(inout), optional   :: work(0:)
+        integer(int_size), intent(inout), optional           :: iwork(0:)
+        logical, intent(in), optional                        :: reverse
+
+        integer(int_size) :: array_size, i, stat
+        character(len=:), allocatable :: buf(:)
+        integer(int_size), allocatable :: ibuf(:)
+
+        array_size = size(array, kind=int_size)
+
+        do i = 0, array_size-1
+            index(i) = i+1
+        end do
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+! If necessary allocate buffers to serve as scratch memory.
+        if ( present(work) ) then
+            if ( present(iwork) ) then
+                call merge_sort( array, index, work, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, work, ibuf )
+            end if
+        else
+            allocate( character(len=len(array)) :: buf(0:array_size/2-1), &
+                      stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of array buffer failed."
+            if ( present(iwork) ) then
+                call merge_sort( array, index, buf, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, buf, ibuf )
+            end if
+        end if
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array, index )
+! Sorts `ARRAY` using an insertion sort, while maintaining consistency in
+! location of the indices in `INDEX` to the elements of `ARRAY`.
+            character(len=*), intent(inout)  :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size)         :: i, j, key_index
+            character(len=len(array)) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                key_index = index(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    index(i+1) = index(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+                index(i+1) = key_index
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array, index )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+! Consistency of the indices in `index` with the elements of `array`
+! are maintained.
+
+            character(len=*), intent(inout)  :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            character(len=len(array)) :: tmp
+            integer(int_size) :: i, tmp_index
+
+            tmp = array(0)
+            tmp_index = index(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+                index(i-1) = index(i)
+            end do find_hole
+            array(i-1) = tmp
+            index(i-1) = tmp_index
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, index, buf, ibuf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case. Consistency of the indices in `index` with the elements of
+! `array` are maintained.
+
+            character(len=*), intent(inout)          :: array(0:)
+            integer(int_size), intent(inout)         :: index(0:)
+            character(len=len(array)), intent(inout) :: buf(0:)
+            integer(int_size), intent(inout)         :: ibuf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type) :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least this
+! many elements. Slices of up to this length are sorted using insertion sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array, index )
+                return
+            end if
+
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish), &
+                                            index(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish), index(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                     right % base + right % len - 1 ), &
+                                left % len, buf, &
+                                index( left % base: &
+                                     right % base + right % len - 1 ), ibuf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf, index, ibuf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            character(len=*), intent(inout)          :: array(0:)
+            integer(int_size), intent(in)            :: mid
+            character(len=len(array)), intent(inout) :: buf(0:)
+            integer(int_size), intent(inout)         :: index(0:)
+            integer(int_size), intent(inout)         :: ibuf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                ibuf(0:mid-1) = index(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        index(k) = ibuf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        index(k) = index(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            index(k+1:) = ibuf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter
+                buf(0:array_len-mid) = array(mid:array_len-1)
+                ibuf(0:array_len-mid) = index(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        index(k) = ibuf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        index(k) = index(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            index(0:k-1) = ibuf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array, index )
+! Reverse a segment of an array in place
+            character(len=*), intent(inout)  :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size)         :: itemp, lo, hi
+            character(len=len(array)) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                itemp = index(lo)
+                index(lo) = index(hi)
+                index(hi) = itemp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine char_sort_index
+
+    module subroutine string_sort_index( array, index, work, iwork, reverse )
+! A modification of `string_ord_sort` to return an array of indices that
+! would provide a stable sort of the `ARRAY` input. The indices by default
+! correspond to a non-decreasing sort, but if the optional argument
+! `REVERSE` is present with a value of `.TRUE.` the indices correspond to
+! a non-increasing sort. The logic of the determination of indexing largely
+! follows the `"Rust" sort` found in `slice.rs`:
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version is a simplification of the Timsort algorithm described
+! in https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size, with
+! an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original listsort.txt, and the optional `work` and `iwork` arraya to be
+! used as scratch memory.
+
+        type(string_type), intent(inout)           :: array(0:)
+        integer(int_size), intent(inout)           :: index(0:)
+        type(string_type), intent(inout), optional :: work(0:)
+        integer(int_size), intent(inout), optional :: iwork(0:)
+        logical, intent(in), optional              :: reverse
+
+        integer(int_size)              :: array_size, i, stat
+        type(string_type), allocatable :: buf(:)
+        integer(int_size), allocatable :: ibuf(:)
+
+        array_size = size(array, kind=int_size)
+
+        do i = 0, array_size-1
+            index(i) = i+1
+        end do
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+! If necessary allocate buffers to serve as scratch memory.
+        if ( present(work) ) then
+            if ( present(iwork) ) then
+                call merge_sort( array, index, work, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, work, ibuf )
+            end if
+        else
+            allocate( buf(0:array_size/2-1), stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of array buffer failed."
+            if ( present(iwork) ) then
+                call merge_sort( array, index, buf, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, buf, ibuf )
+            end if
+        end if
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array, index )
+! Sorts `ARRAY` using an insertion sort, while maintaining consistency in
+! location of the indices in `INDEX` to the elements of `ARRAY`.
+            type(string_type), intent(inout) :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size) :: i, j, key_index
+            type(string_type) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                key_index = index(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    index(i+1) = index(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+                index(i+1) = key_index
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array, index )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+! Consistency of the indices in `index` with the elements of `array`
+! are maintained.
+
+            type(string_type), intent(inout) :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            type(string_type) :: tmp
+            integer(int_size) :: i, tmp_index
+
+            tmp = array(0)
+            tmp_index = index(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+                index(i-1) = index(i)
+            end do find_hole
+            array(i-1) = tmp
+            index(i-1) = tmp_index
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, index, buf, ibuf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case. Consistency of the indices in `index` with the elements of
+! `array` are maintained.
+
+            type(string_type), intent(inout) :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+            type(string_type), intent(inout) :: buf(0:)
+            integer(int_size), intent(inout) :: ibuf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type)    :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least this
+! many elements. Slices of up to this length are sorted using insertion sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array, index )
+                return
+            end if
+
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish), &
+                                            index(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish), index(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                     right % base + right % len - 1 ), &
+                                left % len, buf, &
+                                index( left % base: &
+                                     right % base + right % len - 1 ), ibuf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf, index, ibuf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            type(string_type), intent(inout) :: array(0:)
+            integer(int_size), intent(in)    :: mid
+            type(string_type), intent(inout) :: buf(0:)
+            integer(int_size), intent(inout) :: index(0:)
+            integer(int_size), intent(inout) :: ibuf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                ibuf(0:mid-1) = index(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        index(k) = ibuf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        index(k) = index(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            index(k+1:) = ibuf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter
+                buf(0:array_len-mid) = array(mid:array_len-1)
+                ibuf(0:array_len-mid) = index(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        index(k) = ibuf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        index(k) = index(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            index(0:k-1) = ibuf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array, index )
+! Reverse a segment of an array in place
+            type(string_type), intent(inout)  :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size) :: itemp, lo, hi
+            type(string_type) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                itemp = index(lo)
+                index(lo) = index(hi)
+                index(hi) = itemp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine string_sort_index
+
+end submodule stdlib_sorting_sort_index

From bc3d6524459d4e2655b1762c83038cf1c218e392 Mon Sep 17 00:00:00 2001
From: William Clodius <w.clodius@icloud.com>
Date: Fri, 23 Apr 2021 17:48:45 -0600
Subject: [PATCH 07/18] Commited files related to testing the sorting
 procedures.

Commited the stdlib_sorting test code tests/sorting/test_sorting.f90, and related
CMakeLists.txt, and Makefile.manual files needed to compile the test code:
tests/CMakeLists.txt, tests/Makefile.manual, tests/sorting/CMakeLists.txt, and
tests/sorting/Makefile.manual.

[ticket: X]
---
 src/tests/CMakeLists.txt           |   1 +
 src/tests/Makefile.manual          |   1 +
 src/tests/sorting/CMakeLists.txt   |   2 +
 src/tests/sorting/Makefile.manual  |   3 +
 src/tests/sorting/test_sorting.f90 | 625 +++++++++++++++++++++++++++++
 5 files changed, 632 insertions(+)
 create mode 100644 src/tests/sorting/CMakeLists.txt
 create mode 100644 src/tests/sorting/Makefile.manual
 create mode 100644 src/tests/sorting/test_sorting.f90

diff --git a/src/tests/CMakeLists.txt b/src/tests/CMakeLists.txt
index 288445de9..381226222 100644
--- a/src/tests/CMakeLists.txt
+++ b/src/tests/CMakeLists.txt
@@ -12,6 +12,7 @@ add_subdirectory(io)
 add_subdirectory(linalg)
 add_subdirectory(logger)
 add_subdirectory(optval)
+add_subdirectory(sorting)
 add_subdirectory(stats)
 add_subdirectory(string)
 add_subdirectory(system)
diff --git a/src/tests/Makefile.manual b/src/tests/Makefile.manual
index 773066b1c..57f663bef 100644
--- a/src/tests/Makefile.manual
+++ b/src/tests/Makefile.manual
@@ -6,6 +6,7 @@ all test clean:
 	$(MAKE) -f Makefile.manual --directory=io $@
 	$(MAKE) -f Makefile.manual --directory=logger $@
 	$(MAKE) -f Makefile.manual --directory=optval $@
+	$(MAKE) -f Makefile.manual --directory=sorting $@
 	$(MAKE) -f Makefile.manual --directory=quadrature $@
 	$(MAKE) -f Makefile.manual --directory=stats $@
 	$(MAKE) -f Makefile.manual --directory=string $@
diff --git a/src/tests/sorting/CMakeLists.txt b/src/tests/sorting/CMakeLists.txt
new file mode 100644
index 000000000..bb4b27896
--- /dev/null
+++ b/src/tests/sorting/CMakeLists.txt
@@ -0,0 +1,2 @@
+ADDTEST(sorting)
+
diff --git a/src/tests/sorting/Makefile.manual b/src/tests/sorting/Makefile.manual
new file mode 100644
index 000000000..39657c8f3
--- /dev/null
+++ b/src/tests/sorting/Makefile.manual
@@ -0,0 +1,3 @@
+PROGS_SRC = test_sorting.f90
+
+include ../Makefile.manual.test.mk
diff --git a/src/tests/sorting/test_sorting.f90 b/src/tests/sorting/test_sorting.f90
new file mode 100644
index 000000000..4025aef42
--- /dev/null
+++ b/src/tests/sorting/test_sorting.f90
@@ -0,0 +1,625 @@
+program test_sorting
+
+    use, intrinsic :: iso_fortran_env, only: &
+        compiler_version
+    use stdlib_kinds, only: int32, int64, dp, sp
+    use stdlib_sorting
+    use stdlib_string_type, only: string_type, assignment(=), operator(>), &
+        write(formatted)
+
+    implicit none
+
+    integer(int32), parameter :: test_size = 2_int32**20
+    integer(int32), parameter :: char_size = 26**4
+    integer(int32), parameter :: string_size = 26**3
+    integer(int32), parameter :: block_size = test_size/6
+    integer, parameter        :: repeat = 8
+
+    integer(int32) ::             &
+        blocks(0:test_size-1),    &
+        decrease(0:test_size-1),  &
+        identical(0:test_size-1), &
+        increase(0:test_size-1),  &
+        rand0(0:test_size-1),     &
+        rand1(0:test_size-1),     &
+        rand2(0:test_size-1),     &
+        rand3(0:test_size-1),     &
+        rand10(0:test_size-1)
+    character(len=4) ::               &
+        char_decrease(0:char_size-1), &
+        char_increase(0:char_size-1), &
+        char_rand(0:char_size-1)
+    type(string_type) ::                  &
+        string_decrease(0:string_size-1), &
+        string_increase(0:string_size-1), &
+        string_rand(0:string_size-1)
+
+    integer(int32)          :: dummy(0:test_size-1)
+    character(len=4)        :: char_dummy(0:char_size-1)
+    type(string_type)       :: string_dummy(0:string_size-1)
+    integer(int_size)       :: index(0:test_size-1)
+    integer(int32)          :: work(0:test_size/2-1)
+    character(len=4)        :: char_work(0:char_size/2-1)
+    type(string_type)       :: string_work(0:string_size/2-1)
+    integer(int_size)       :: iwork(0:test_size/2-1)
+    integer                 :: count, i, index1, index2, j, k, l, temp
+    real(sp)                :: arand, brand
+    character(*), parameter :: filename = 'test_sorting.txt'
+    integer                 :: lun
+    character(len=4)        :: char_temp
+    type(string_type)       :: string_temp
+
+! Create the test arrays
+    identical(:) = 10
+    do i=0, test_size-1
+        increase(i) = i
+        decrease(i) = test_size - 1 - i
+        call random_number( arand )
+        rand0(i) = int( floor( 4 * arand * test_size ), kind=int32 )
+        rand1(i) = int( floor( arand * test_size / 4 ), kind=int32 )
+    end do
+    blocks(:) = increase(:)
+    blocks(0:block_size-1) = increase(4*block_size:5*block_size-1)
+    blocks(block_size:2*block_size-1) = increase(0:block_size-1)
+    blocks(2*block_size:3*block_size-1) = increase(2*block_size:3*block_size-1)
+    blocks(3*block_size:4*block_size-1) = increase(block_size:2*block_size-1)
+    blocks(4*block_size:5*block_size-1) = increase(3*block_size:4*block_size-1)
+    rand2(:) = increase(:)
+    do i=0, test_size-1
+        call random_number( arand )
+        index1 = int( floor( arand * test_size ), kind=int32 )
+        temp = rand2(i)
+        rand2(i) = rand2(index1)
+        rand2(index1) = temp
+    end do
+    rand3(:) = increase(:)
+    do i=0, 2
+        call random_number( arand )
+        call random_number( brand )
+        index1 = int( floor( arand * test_size ), kind=int32 )
+        index2 = int( floor( brand * test_size ), kind=int32 )
+        temp = rand3(index1)
+        rand3(index1) = rand3(index2)
+        rand3(index2) = temp
+    end do
+    rand10(:) = increase(:)
+    do i=test_size-10, test_size-1
+        call random_number( arand )
+        rand10(i) = int( floor( arand * test_size ), kind=int32 )
+    end do
+
+    count = 0
+    do i=0, 25
+        do j=0, 25
+            do k=0, 25
+                do l=0, 25
+                    char_increase(count) = achar(97+i) // achar(97+j) // &
+                        achar(97+k) // achar(97+l)
+                    count = count + 1
+                end do
+            end do
+        end do
+    end do
+
+    do i=0, char_size-1
+        char_decrease(char_size-1-i) = char_increase(i)
+    end do
+
+    char_rand(:) = char_increase(:)
+    do i=0, char_size-1
+        call random_number( arand )
+        index1 = int( floor( arand * char_size ), kind=int32 )
+        char_temp = char_rand(i)
+        char_rand(i) = char_rand(index1)
+        char_rand(index1) = char_temp
+    end do
+
+    count = 0
+    do i=0, 25
+        do j=0, 25
+            do k=0, 25
+                string_increase(count) = achar(97+i) // achar(97+j) // &
+                    achar(97+k)
+                count = count + 1
+            end do
+        end do
+    end do
+
+    do i=0, string_size-1
+        string_decrease(string_size - 1 - i) = char_increase(i)
+    end do
+
+    string_rand(:) = string_increase(:)
+    do i=0, string_size-1
+        call random_number( arand )
+        index1 = int( floor( arand * string_size ), kind=int32 )
+        string_temp = string_rand(i)
+        string_rand(i) = string_rand(index1)
+        string_rand(index1) = string_temp
+    end do
+
+! Create and intialize file to report the results of the sortings
+    open( newunit=lun, file=filename, access='sequential', action='write', &
+        form='formatted', status='replace' )
+    write( lun, '(a)' ) trim(compiler_version())
+    write( lun, * )
+    write( lun, '("|    Type     | Elements |    Array Name   |    Method ' // &
+        '  |  Time (s) |")' )
+    write( lun, '("|-------------|----------|-----------------|-----------' // &
+        '--|-----------|")' )
+
+! test the sorting routines on the test arrays
+    call test_int_ord_sorts( )
+
+    call test_char_ord_sorts( )
+
+    call test_string_ord_sorts( )
+
+    call test_int_sorts( )
+
+    call test_char_sorts( )
+
+    call test_string_sorts( )
+
+    call test_int_sort_indexes( )
+
+    call test_char_sort_indexes( )
+
+    call test_string_sort_indexes( )
+
+contains
+
+    subroutine test_int_ord_sorts( )
+
+        call test_int_ord_sort( blocks, "Blocks" )
+        call test_int_ord_sort( decrease, "Decreasing" )
+        call test_int_ord_sort( identical, "Identical" )
+        call test_int_ord_sort( increase, "Increasing" )
+        call test_int_ord_sort( rand1, "Random dense" )
+        call test_int_ord_sort( rand2, "Random order" )
+        call test_int_ord_sort( rand0, "Random sparse" )
+        call test_int_ord_sort( rand3, "Random 3" )
+        call test_int_ord_sort( rand10, "Random 10" )
+
+    end subroutine test_int_ord_sorts
+
+
+    subroutine test_int_ord_sort( a, a_name )
+        integer(int32), intent(in) :: a(:)
+        character(*), intent(in)   :: a_name 
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            dummy = a
+            call system_clock( t0, rate )
+            call ord_sort( dummy, work )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_sort( dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "ORD_SORT did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a12, 2i7)') 'dummy(i-1:i) = ', dummy(i-1:i)
+        end if
+        write( lun, '("|     Integer |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            test_size, a_name, "Ord_Sort", tdiff/rate
+
+    end subroutine test_int_ord_sort
+
+    subroutine test_char_ord_sorts( )
+
+        call test_char_ord_sort( char_decrease, "Char. Decrease" )
+        call test_char_ord_sort( char_increase, "Char. Increase" )
+        call test_char_ord_sort( char_rand, "Char. Random" )
+
+    end subroutine test_char_ord_sorts
+
+    subroutine test_char_ord_sort( a, a_name )
+        character(len=4), intent(in) :: a(0:)
+        character(*), intent(in) :: a_name
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            char_dummy = a
+            call system_clock( t0, rate )
+            call ord_sort( char_dummy, char_work )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_char_sort( char_dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "ORD_SORT did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a17, 2(1x,a4))') 'char_dummy(i-1:i) = ', char_dummy(i-1:i)
+        end if
+        write( lun, '("|   Character |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            char_size, a_name, "Ord_Sort", tdiff/rate
+
+    end subroutine test_char_ord_sort
+
+    subroutine test_string_ord_sorts( )
+
+        call test_string_ord_sort( string_decrease, "String Decrease" )
+        call test_string_ord_sort( string_increase, "String Increase" )
+        call test_string_ord_sort( string_rand, "String Random" )
+
+    end subroutine test_string_ord_sorts
+
+    subroutine test_string_ord_sort( a, a_name )
+        type(string_type), intent(in) :: a(0:)
+        character(*), intent(in)      :: a_name
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            string_dummy = a
+            call system_clock( t0, rate )
+            call ord_sort( string_dummy, string_work )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_string_sort( string_dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "ORD_SORT did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a17, 2(1x,a4))') 'string_dummy(i-1:i) = ', &
+                string_dummy(i-1:i)
+        end if
+        write( lun, '("| String_type |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            string_size, a_name, "Ord_Sort", tdiff/rate
+
+    end subroutine test_string_ord_sort
+
+
+    subroutine test_int_sorts( )
+
+        call test_int_sort( blocks, "Blocks" )
+        call test_int_sort( decrease, "Decreasing" )
+        call test_int_sort( identical, "Identical" )
+        call test_int_sort( increase, "Increasing" )
+        call test_int_sort( rand1, "Random dense" )
+        call test_int_sort( rand2, "Random order" )
+        call test_int_sort( rand0, "Random sparse" )
+        call test_int_sort( rand3, "Random 3" )
+        call test_int_sort( rand10, "Random 10" )
+
+    end subroutine test_int_sorts
+
+    subroutine test_int_sort( a, a_name )
+        integer(int32), intent(in) :: a(:)
+        character(*), intent(in)   :: a_name
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            dummy = a
+            call system_clock( t0, rate )
+            call sort( dummy )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_sort( dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "SORT did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a12, 2i7)') 'dummy(i-1:i) = ', dummy(i-1:i)
+        end if
+        write( lun, '("|     Integer |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            test_size, a_name, "Sort", tdiff/rate
+
+    end subroutine test_int_sort
+
+    subroutine test_char_sorts( )
+
+        call test_char_sort( char_decrease, "Char. Decrease" )
+        call test_char_sort( char_increase, "Char. Increase" )
+        call test_char_sort( char_rand, "Char. Random" )
+
+    end subroutine test_char_sorts
+
+    subroutine test_char_sort( a, a_name )
+        character(len=4), intent(in) :: a(0:)
+        character(*), intent(in) :: a_name
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            char_dummy = a
+            call system_clock( t0, rate )
+            call sort( char_dummy )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_char_sort( char_dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "SORT did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a17, 2(1x,a4))') 'char_dummy(i-1:i) = ', char_dummy(i-1:i)
+        end if
+        write( lun, '("|   Character |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            char_size, a_name, "Sort", tdiff/rate
+
+    end subroutine test_char_sort
+
+    subroutine test_string_sorts( )
+
+        call test_string_sort( string_decrease, "String Decrease" )
+        call test_string_sort( string_increase, "String Increase" )
+        call test_string_sort( string_rand, "String Random" )
+
+    end subroutine test_string_sorts
+
+    subroutine test_string_sort( a, a_name )
+        type(string_type), intent(in) :: a(0:)
+        character(*), intent(in) :: a_name
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            string_dummy = a
+            call system_clock( t0, rate )
+            call sort( string_dummy )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_string_sort( string_dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "SORT did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a17, 2(1x,a4))') 'string_dummy(i-1:i) = ', &
+                string_dummy(i-1:i)
+        end if
+        write( lun, '("| String_type |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            string_size, a_name, "Sort", tdiff/rate
+
+    end subroutine test_string_sort
+
+    subroutine test_int_sort_indexes( )
+
+        call test_int_sort_index( blocks, "Blocks" )
+        call test_int_sort_index( decrease, "Decreasing" )
+        call test_int_sort_index( identical, "Identical" )
+        call test_int_sort_index( increase, "Increasing" )
+        call test_int_sort_index( rand1, "Random dense" )
+        call test_int_sort_index( rand2, "Random order" )
+        call test_int_sort_index( rand0, "Random sparse" )
+        call test_int_sort_index( rand3, "Random 3" )
+        call test_int_sort_index( rand10, "Random 10" )
+
+    end subroutine test_int_sort_indexes
+
+    subroutine test_int_sort_index( a, a_name )
+        integer(int32), intent(inout) :: a(:)
+        character(*), intent(in)      :: a_name
+
+        integer(int64)                 :: t0, t1, tdiff
+        real(dp)                       :: rate
+        integer(int64)                 :: i
+        logical                        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            dummy = a
+            call system_clock( t0, rate )
+            call sort_index( dummy, index, work, iwork )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        dummy = a(index)
+        call verify_sort( dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "SORT_INDEX did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a18, 2i7)') 'a(index(i-1:i)) = ', a(index(i-1:i))
+        end if
+        write( lun, '("|     Integer |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            test_size, a_name, "Sort_Index", tdiff/rate
+
+        dummy = a
+        call sort_index( dummy, index, work, iwork, reverse=.true. )
+        dummy = a(index)
+        call verify_reverse_sort( dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "SORT_INDEX did not reverse sort " // &
+                a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a18, 2i7)') 'a(index(i-1:i)) = ', a(index(i-1:i))
+        end if
+
+    end subroutine test_int_sort_index
+
+    subroutine test_char_sort_indexes( )
+
+        call test_char_sort_index( char_decrease, "Char. Decrease" )
+        call test_char_sort_index( char_increase, "Char. Increase" )
+        call test_char_sort_index( char_rand, "Char. Random" )
+
+    end subroutine test_char_sort_indexes
+
+    subroutine test_char_sort_index( a, a_name )
+        character(len=4), intent(in) :: a(0:)
+        character(*), intent(in) :: a_name
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            char_dummy = a
+            call system_clock( t0, rate )
+            call sort_index( char_dummy, index, char_work, iwork )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_char_sort( char_dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "SORT_INDEX did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a17, 2(1x,a4))') 'char_dummy(i-1:i) = ', char_dummy(i-1:i)
+        end if
+        write( lun, '("|   Character |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            char_size, a_name, "Sort_Index", tdiff/rate
+
+    end subroutine test_char_sort_index
+
+    subroutine test_string_sort_indexes( )
+
+        call test_string_sort_index( string_decrease, "String Decrease" )
+        call test_string_sort_index( string_increase, "String Increase" )
+        call test_string_sort_index( string_rand, "String Random" )
+
+    end subroutine test_string_sort_indexes
+
+    subroutine test_string_sort_index( a, a_name )
+        type(string_type), intent(in) :: a(0:)
+        character(*), intent(in) :: a_name
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            string_dummy = a
+            call system_clock( t0, rate )
+            call sort_index( string_dummy, index, string_work, iwork )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_string_sort( string_dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "SORT_INDEX did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a17, 2(1x,a4))') 'string_dummy(i-1:i) = ', &
+                string_dummy(i-1:i)
+        end if
+        write( lun, '("| String_type |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            string_size, a_name, "Sort_Index", tdiff/rate
+
+    end subroutine test_string_sort_index
+
+
+    subroutine verify_sort( a, valid, i )
+        integer(int32), intent(in) :: a(0:)
+        logical, intent(out) :: valid
+        integer(int64), intent(out) :: i
+
+        integer(int64) :: n
+
+        n = size( a, kind=int64 )
+        valid = .false.
+        do i=1, n-1
+            if ( a(i-1) > a(i) ) return
+        end do
+        valid = .true.
+
+    end subroutine verify_sort
+
+
+    subroutine verify_string_sort( a, valid, i )
+        type(string_type), intent(in) :: a(0:)
+        logical, intent(out) :: valid
+        integer(int64), intent(out) :: i
+
+        integer(int64) :: n
+
+        n = size( a, kind=int64 )
+        valid = .false.
+        do i=1, n-1
+            if ( a(i-1) > a(i) ) return
+        end do
+        valid = .true.
+
+    end subroutine verify_string_sort
+
+    subroutine verify_char_sort( a, valid, i )
+        character(len=4), intent(in) :: a(0:)
+        logical, intent(out) :: valid
+        integer(int64), intent(out) :: i
+
+        integer(int64) :: n
+
+        n = size( a, kind=int64 )
+        valid = .false.
+        do i=1, n-1
+            if ( a(i-1) > a(i) ) return
+        end do
+        valid = .true.
+
+    end subroutine verify_char_sort
+
+
+    subroutine verify_reverse_sort( a, valid, i )
+        integer(int32), intent(in) :: a(0:)
+        logical, intent(out) :: valid
+        integer(int64), intent(out) :: i
+
+        integer(int64) :: n
+
+        n = size( a, kind=int64 )
+        valid = .false.
+        do i=1, n-1
+            if ( a(i-1) < a(i) ) return
+        end do
+        valid = .true.
+
+    end subroutine verify_reverse_sort
+
+end program test_sorting

From cb2f0d85e72364e7bf3220b179362bed2a29f333 Mon Sep 17 00:00:00 2001
From: William Clodius <w.clodius@icloud.com>
Date: Fri, 23 Apr 2021 17:56:30 -0600
Subject: [PATCH 08/18] Commited the revised stdlib/doc/specs/stdlib_sorting.md

Commited the revised stdlib_sorting markdown document stdlib_sorting.md. It was
revised largely to reflect changes in test_sorting.f90 intended to shorten the
processor time devoted to string_type sorting.

[ticket: X]
---
 doc/specs/stdlib_sorting.md | 196 ++++++++++++++++++------------------
 1 file changed, 98 insertions(+), 98 deletions(-)

diff --git a/doc/specs/stdlib_sorting.md b/doc/specs/stdlib_sorting.md
index c573c5876..346a95448 100644
--- a/doc/specs/stdlib_sorting.md
+++ b/doc/specs/stdlib_sorting.md
@@ -523,13 +523,13 @@ size `26**4`:
   random order.
 
 On three different `string_type` arrays, each of length 4 elements and
-of size `26**4`:
+of size `26**3`:
 
-* String Decreasing - values decrease uniformly from `"zzzz"` to
-  `"aaaa"`.
-* String Increasing - values decrease uniformly from `"aaaa"` to
-  `"zzzz"`.
-* String Random - the set of strings from `"aaaa"` to `"zzzz"` in
+* String Decreasing - values decrease uniformly from `"zzz"` to
+  `"aaa"`.
+* String Increasing - values decrease uniformly from `"aaa"` to
+  `"zzz"`.
+* String Random - the set of strings from `"aaa"` to `"zzz"` in
   random order.
 
 These benchmarks have been performed on two different compilers, both
@@ -537,104 +537,104 @@ on a MacBook Pro, featuring a 2.3 GHz Quad-Core Intel Core i5, with 8
 GB 2133 MHz LPDDR3 memory. The first compiler was GNU Fortran
 (GCC) 10.2.0, with the following results:
 
-|    Type     | Elements |      Order      |    Method   | Time (s)  |
+|    Type     | Elements |    Array Name   |    Method   |  Time (s) |
 |-------------|----------|-----------------|-------------|-----------|
-|     Integer | 1048576  |          Blocks |    Ord_Sort |   0.00694 |
-|     Integer | 1048576  |      Decreasing |    Ord_Sort |   0.00363 |
-|     Integer | 1048576  |       Identical |    Ord_Sort |   0.00206 |
-|     Integer | 1048576  |      Increasing |    Ord_Sort |   0.00206 |
-|     Integer | 1048576  |    Random dense |    Ord_Sort |   0.17705 |
-|     Integer | 1048576  |    Random order |    Ord_Sort |   0.17749 |
-|     Integer | 1048576  |   Random sparse |    Ord_Sort |   0.17641 |
-|     Integer | 1048576  |        Random 3 |    Ord_Sort |   0.00943 |
-|     Integer | 1048576  |       Random 10 |    Ord_Sort |   0.00378 |
-|   Character |  456976  |  Char. Decrease |    Ord_Sort |   0.00766 |
-|   Character |  456976  |  Char. Increase |    Ord_Sort |   0.00416 |
-|   Character |  456976  |    Char. Random |    Ord_Sort |   0.23689 |
-| String_type |  456976  | String Decrease |    Ord_Sort |   0.15467 |
-| String_type |  456976  | String Increase |    Ord_Sort |   0.09072 |
-| String_type |  456976  |   String Random |    Ord_Sort |   4.04153 |
-|     Integer | 1048576  |          Blocks |        Sort |   0.11175 |
-|     Integer | 1048576  |      Decreasing |        Sort |   0.14222 |
-|     Integer | 1048576  |       Identical |        Sort |   0.16076 |
-|     Integer | 1048576  |      Increasing |        Sort |   0.05380 |
-|     Integer | 1048576  |    Random dense |        Sort |   0.15519 |
-|     Integer | 1048576  |    Random order |        Sort |   0.15907 |
-|     Integer | 1048576  |   Random sparse |        Sort |   0.15792 |
-|     Integer | 1048576  |        Random 3 |        Sort |   0.23041 |
-|     Integer | 1048576  |       Random 10 |        Sort |   0.24668 |
-|   Character |  456976  |  Char. Decrease |        Sort |   0.31256 |
-|   Character |  456976  |  Char. Increase |        Sort |   0.11330 |
-|   Character |  456976  |    Char. Random |        Sort |   0.20166 |
-| String_type |  456976  | String Decrease |        Sort |   6.41878 |
-| String_type |  456976  | String Increase |        Sort |   2.26954 |
-| String_type |  456976  |   String Random |        Sort |   3.58182 |
-|     Integer | 1048576  |          Blocks |  Sort_Index |   0.01175 |
-|     Integer | 1048576  |      Decreasing |  Sort_Index |   0.00732 |
-|     Integer | 1048576  |       Identical |  Sort_Index |   0.00453 |
-|     Integer | 1048576  |      Increasing |  Sort_Index |   0.00495 |
-|     Integer | 1048576  |    Random dense |  Sort_Index |   0.20723 |
-|     Integer | 1048576  |    Random order |  Sort_Index |   0.20818 |
-|     Integer | 1048576  |   Random sparse |  Sort_Index |   0.20441 |
-|     Integer | 1048576  |        Random 3 |  Sort_Index |   0.01468 |
-|     Integer | 1048576  |       Random 10 |  Sort_Index |   0.00669 |
-|   Character |  456976  |  Char. Decrease |  Sort_Index |   0.00915 |
-|   Character |  456976  |  Char. Increase |  Sort_Index |   0.00522 |
-|   Character |  456976  |    Char. Random |  Sort_Index |   0.26615 |
-| String_type |  456976  | String Decrease |  Sort_Index |   0.19026 |
-| String_type |  456976  | String Increase |  Sort_Index |   0.09407 |
-| String_type |  456976  |   String Random |  Sort_Index |   4.12616 |
+|     Integer | 1048576  |          Blocks |    Ord_Sort |   0.00738 |
+|     Integer | 1048576  |      Decreasing |    Ord_Sort |   0.00380 |
+|     Integer | 1048576  |       Identical |    Ord_Sort |   0.00220 |
+|     Integer | 1048576  |      Increasing |    Ord_Sort |   0.00209 |
+|     Integer | 1048576  |    Random dense |    Ord_Sort |   0.17972 |
+|     Integer | 1048576  |    Random order |    Ord_Sort |   0.17503 |
+|     Integer | 1048576  |   Random sparse |    Ord_Sort |   0.17340 |
+|     Integer | 1048576  |        Random 3 |    Ord_Sort |   0.00847 |
+|     Integer | 1048576  |       Random 10 |    Ord_Sort |   0.00484 |
+|   Character |  456976  |  Char. Decrease |    Ord_Sort |   0.00763 |
+|   Character |  456976  |  Char. Increase |    Ord_Sort |   0.00414 |
+|   Character |  456976  |    Char. Random |    Ord_Sort |   0.23746 |
+| String_type |   17576  | String Decrease |    Ord_Sort |   0.00543 |
+| String_type |   17576  | String Increase |    Ord_Sort |   0.00347 |
+| String_type |   17576  |   String Random |    Ord_Sort |   0.09461 |
+|     Integer | 1048576  |          Blocks |        Sort |   0.10556 |
+|     Integer | 1048576  |      Decreasing |        Sort |   0.13348 |
+|     Integer | 1048576  |       Identical |        Sort |   0.15719 |
+|     Integer | 1048576  |      Increasing |        Sort |   0.05316 |
+|     Integer | 1048576  |    Random dense |        Sort |   0.15047 |
+|     Integer | 1048576  |    Random order |        Sort |   0.15176 |
+|     Integer | 1048576  |   Random sparse |        Sort |   0.15767 |
+|     Integer | 1048576  |        Random 3 |        Sort |   0.19907 |
+|     Integer | 1048576  |       Random 10 |        Sort |   0.34244 |
+|   Character |  456976  |  Char. Decrease |        Sort |   0.30723 |
+|   Character |  456976  |  Char. Increase |        Sort |   0.10984 |
+|   Character |  456976  |    Char. Random |        Sort |   0.20642 |
+| String_type |   17576  | String Decrease |        Sort |   0.15101 |
+| String_type |   17576  | String Increase |        Sort |   0.05569 |
+| String_type |   17576  |   String Random |        Sort |   0.08499 |
+|     Integer | 1048576  |          Blocks |  Sort_Index |   0.01163 |
+|     Integer | 1048576  |      Decreasing |  Sort_Index |   0.00720 |
+|     Integer | 1048576  |       Identical |  Sort_Index |   0.00451 |
+|     Integer | 1048576  |      Increasing |  Sort_Index |   0.00452 |
+|     Integer | 1048576  |    Random dense |  Sort_Index |   0.20295 |
+|     Integer | 1048576  |    Random order |  Sort_Index |   0.20190 |
+|     Integer | 1048576  |   Random sparse |  Sort_Index |   0.20221 |
+|     Integer | 1048576  |        Random 3 |  Sort_Index |   0.01406 |
+|     Integer | 1048576  |       Random 10 |  Sort_Index |   0.00765 |
+|   Character |  456976  |  Char. Decrease |  Sort_Index |   0.00912 |
+|   Character |  456976  |  Char. Increase |  Sort_Index |   0.00515 |
+|   Character |  456976  |    Char. Random |  Sort_Index |   0.24693 |
+| String_type |   17576  | String Decrease |  Sort_Index |   0.00528 |
+| String_type |   17576  | String Increase |  Sort_Index |   0.00341 |
+| String_type |   17576  |   String Random |  Sort_Index |   0.09554 |
 
 The second compiler was Intel(R) Fortran Intel(R) 64 Compiler Classic
 for applications running on Intel(R) 64, Version 2021.2.0 Build
 20210228_000000, with the following results:
 
-|    Type     | Elements |      Order      |    Method   | Time (s)  |
+|    Type     | Elements |   Array Name    |    Method   |  Time (s) |
 |-------------|----------|-----------------|-------------|-----------|
-|     Integer | 1048576  |          Blocks |    Ord_Sort |   0.00287 |
-|     Integer | 1048576  |      Decreasing |    Ord_Sort |   0.00123 |
-|     Integer | 1048576  |       Identical |    Ord_Sort |   0.00094 |
-|     Integer | 1048576  |      Increasing |    Ord_Sort |   0.00095 |
-|     Integer | 1048576  |    Random dense |    Ord_Sort |   0.09865 |
-|     Integer | 1048576  |    Random order |    Ord_Sort |   0.10138 |
-|     Integer | 1048576  |   Random sparse |    Ord_Sort |   0.10223 |
-|     Integer | 1048576  |        Random 3 |    Ord_Sort |   0.00420 |
-|     Integer | 1048576  |       Random 10 |    Ord_Sort |   0.00174 |
-|   Character |  456976  |  Char. Decrease |    Ord_Sort |   0.00721 |
-|   Character |  456976  |  Char. Increase |    Ord_Sort |   0.00354 |
-|   Character |  456976  |    Char. Random |    Ord_Sort |   0.21287 |
-| String_type |  456976  | String Decrease |    Ord_Sort |   0.50840 |
-| String_type |  456976  | String Increase |    Ord_Sort |   0.14912 |
-| String_type |  456976  |   String Random |    Ord_Sort |  15.28657 |
-|     Integer | 1048576  |          Blocks |        Sort |   0.03632 |
-|     Integer | 1048576  |      Decreasing |        Sort |   0.04135 |
-|     Integer | 1048576  |       Identical |        Sort |   0.03672 |
-|     Integer | 1048576  |      Increasing |        Sort |   0.01346 |
-|     Integer | 1048576  |    Random dense |        Sort |   0.07099 |
-|     Integer | 1048576  |    Random order |        Sort |   0.07172 |
-|     Integer | 1048576  |   Random sparse |        Sort |   0.07196 |
-|     Integer | 1048576  |        Random 3 |        Sort |   0.07729 |
-|     Integer | 1048576  |       Random 10 |        Sort |   0.14964 |
-|   Character |  456976  |  Char. Decrease |        Sort |   0.31387 |
-|   Character |  456976  |  Char. Increase |        Sort |   0.12050 |
-|   Character |  456976  |    Char. Random |        Sort |   0.20228 |
-| String_type |  456976  | String Decrease |        Sort |  23.87099 |
-| String_type |  456976  | String Increase |        Sort |   8.71762 |
-| String_type |  456976  |   String Random |        Sort |  11.50930 |
-|     Integer | 1048576  |          Blocks |  Sort_Index |   0.00563 |
-|     Integer | 1048576  |      Decreasing |  Sort_Index |   0.00230 |
-|     Integer | 1048576  |       Identical |  Sort_Index |   0.00142 |
-|     Integer | 1048576  |      Increasing |  Sort_Index |   0.00142 |
-|     Integer | 1048576  |    Random dense |  Sort_Index |   0.11723 |
-|     Integer | 1048576  |    Random order |  Sort_Index |   0.12539 |
-|     Integer | 1048576  |   Random sparse |  Sort_Index |   0.12706 |
-|     Integer | 1048576  |        Random 3 |  Sort_Index |   0.00718 |
-|     Integer | 1048576  |       Random 10 |  Sort_Index |   0.00316 |
-|   Character |  456976  |  Char. Decrease |  Sort_Index |   0.00790 |
-|   Character |  456976  |  Char. Increase |  Sort_Index |   0.00405 |
-|   Character |  456976  |    Char. Random |  Sort_Index |   0.22963 |
-| String_type |  456976  | String Decrease |  Sort_Index |   0.51546 |
-| String_type |  456976  | String Increase |  Sort_Index |   0.14960 |
-| String_type |  456976  |   String Random |  Sort_Index |  15.59673 |
+|     Integer | 1048576  |          Blocks |    Ord_Sort |   0.00320 |
+|     Integer | 1048576  |      Decreasing |    Ord_Sort |   0.00142 |
+|     Integer | 1048576  |       Identical |    Ord_Sort |   0.00102 |
+|     Integer | 1048576  |      Increasing |    Ord_Sort |   0.00158 |
+|     Integer | 1048576  |    Random dense |    Ord_Sort |   0.09859 |
+|     Integer | 1048576  |    Random order |    Ord_Sort |   0.09704 |
+|     Integer | 1048576  |   Random sparse |    Ord_Sort |   0.09599 |
+|     Integer | 1048576  |        Random 3 |    Ord_Sort |   0.00396 |
+|     Integer | 1048576  |       Random 10 |    Ord_Sort |   0.00183 |
+|   Character |  456976  |  Char. Decrease |    Ord_Sort |   0.00763 |
+|   Character |  456976  |  Char. Increase |    Ord_Sort |   0.00341 |
+|   Character |  456976  |    Char. Random |    Ord_Sort |   0.21991 |
+| String_type |   17576  | String Decrease |    Ord_Sort |   0.01957 |
+| String_type |   17576  | String Increase |    Ord_Sort |   0.00573 |
+| String_type |   17576  |   String Random |    Ord_Sort |   0.37850 |
+|     Integer | 1048576  |          Blocks |        Sort |   0.03668 |
+|     Integer | 1048576  |      Decreasing |        Sort |   0.04073 |
+|     Integer | 1048576  |       Identical |        Sort |   0.03884 |
+|     Integer | 1048576  |      Increasing |        Sort |   0.01279 |
+|     Integer | 1048576  |    Random dense |        Sort |   0.06945 |
+|     Integer | 1048576  |    Random order |        Sort |   0.07151 |
+|     Integer | 1048576  |   Random sparse |        Sort |   0.07224 |
+|     Integer | 1048576  |        Random 3 |        Sort |   0.07954 |
+|     Integer | 1048576  |       Random 10 |        Sort |   0.14395 |
+|   Character |  456976  |  Char. Decrease |        Sort |   0.30367 |
+|   Character |  456976  |  Char. Increase |        Sort |   0.11316 |
+|   Character |  456976  |    Char. Random |        Sort |   0.20233 |
+| String_type |   17576  | String Decrease |        Sort |   0.64479 |
+| String_type |   17576  | String Increase |        Sort |   0.23737 |
+| String_type |   17576  |   String Random |        Sort |   0.31361 |
+|     Integer | 1048576  |          Blocks |  Sort_Index |   0.00643 |
+|     Integer | 1048576  |      Decreasing |  Sort_Index |   0.00219 |
+|     Integer | 1048576  |       Identical |  Sort_Index |   0.00126 |
+|     Integer | 1048576  |      Increasing |  Sort_Index |   0.00130 |
+|     Integer | 1048576  |    Random dense |  Sort_Index |   0.12911 |
+|     Integer | 1048576  |    Random order |  Sort_Index |   0.13024 |
+|     Integer | 1048576  |   Random sparse |  Sort_Index |   0.12956 |
+|     Integer | 1048576  |        Random 3 |  Sort_Index |   0.00781 |
+|     Integer | 1048576  |       Random 10 |  Sort_Index |   0.00281 |
+|   Character |  456976  |  Char. Decrease |  Sort_Index |   0.00779 |
+|   Character |  456976  |  Char. Increase |  Sort_Index |   0.00393 |
+|   Character |  456976  |    Char. Random |  Sort_Index |   0.22561 |
+| String_type |   17576  | String Decrease |  Sort_Index |   0.01878 |
+| String_type |   17576  | String Increase |  Sort_Index |   0.00543 |
+| String_type |   17576  |   String Random |  Sort_Index |   0.37748 |
 
 

From d84da28bbcf1d9d5671008154395b770d1367239 Mon Sep 17 00:00:00 2001
From: William Clodius <w.clodius@icloud.com>
Date: Fri, 23 Apr 2021 21:32:37 -0600
Subject: [PATCH 09/18] Fixed an off by one error in an array assignment

In both stdlib_sorting_ord_sort.fypp and stdlib_sorting_sort_index.fypp there
was an off by one array assignment, that was repeated several times throough
copy/paste/edit.
                buf(0:array_len-mid) = array(mid:array_len-1)
should have been
                buf(0:array_len-mid-1) = array(mid:array_len-1)
Fixed in both files.

[ticket: X]
---
 src/stdlib_sorting_ord_sort.fypp   |  8 ++++----
 src/stdlib_sorting_sort_index.fypp | 16 ++++++++--------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/src/stdlib_sorting_ord_sort.fypp b/src/stdlib_sorting_ord_sort.fypp
index 0e8c59a47..39f8866a8 100644
--- a/src/stdlib_sorting_ord_sort.fypp
+++ b/src/stdlib_sorting_ord_sort.fypp
@@ -355,7 +355,7 @@ contains
                     end if
                 end do merge_lower
             else ! The right run is shorter ! check that it is stable
-                buf(0:array_len-mid) = array(mid:array_len-1)
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
                 i = mid - 1
                 j = array_len - mid -1
                 merge_upper: do k = array_len-1, 0, -1
@@ -696,7 +696,7 @@ contains
                     end if
                 end do merge_lower
             else ! The right run is shorter ! check that it is stable
-                buf(0:array_len-mid) = array(mid:array_len-1)
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
                 i = mid - 1
                 j = array_len - mid -1
                 merge_upper: do k = array_len-1, 0, -1
@@ -1036,7 +1036,7 @@ contains
                     end if
                 end do merge_lower
             else ! The right run is shorter ! check that it is stable
-                buf(0:array_len-mid) = array(mid:array_len-1)
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
                 i = mid - 1
                 j = array_len - mid -1
                 merge_upper: do k = array_len-1, 0, -1
@@ -1373,7 +1373,7 @@ contains
                     end if
                 end do merge_lower
             else ! The right run is shorter ! check that it is stable
-                buf(0:array_len-mid) = array(mid:array_len-1)
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
                 i = mid - 1
                 j = array_len - mid -1
                 merge_upper: do k = array_len-1, 0, -1
diff --git a/src/stdlib_sorting_sort_index.fypp b/src/stdlib_sorting_sort_index.fypp
index 4146e77f1..158baa1f3 100644
--- a/src/stdlib_sorting_sort_index.fypp
+++ b/src/stdlib_sorting_sort_index.fypp
@@ -414,8 +414,8 @@ contains
                     end if
                 end do merge_lower
             else ! The right run is shorter
-                buf(0:array_len-mid) = array(mid:array_len-1)
-                ibuf(0:array_len-mid) = index(mid:array_len-1)
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
+                ibuf(0:array_len-mid-1) = index(mid:array_len-1)
                 i = mid - 1
                 j = array_len - mid -1
                 merge_upper: do k = array_len-1, 0, -1
@@ -822,8 +822,8 @@ contains
                     end if
                 end do merge_lower
             else ! The right run is shorter
-                buf(0:array_len-mid) = array(mid:array_len-1)
-                ibuf(0:array_len-mid) = index(mid:array_len-1)
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
+                ibuf(0:array_len-mid-1) = index(mid:array_len-1)
                 i = mid - 1
                 j = array_len - mid -1
                 merge_upper: do k = array_len-1, 0, -1
@@ -1228,8 +1228,8 @@ contains
                     end if
                 end do merge_lower
             else ! The right run is shorter
-                buf(0:array_len-mid) = array(mid:array_len-1)
-                ibuf(0:array_len-mid) = index(mid:array_len-1)
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
+                ibuf(0:array_len-mid-1) = index(mid:array_len-1)
                 i = mid - 1
                 j = array_len - mid -1
                 merge_upper: do k = array_len-1, 0, -1
@@ -1631,8 +1631,8 @@ contains
                     end if
                 end do merge_lower
             else ! The right run is shorter
-                buf(0:array_len-mid) = array(mid:array_len-1)
-                ibuf(0:array_len-mid) = index(mid:array_len-1)
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
+                ibuf(0:array_len-mid-1) = index(mid:array_len-1)
                 i = mid - 1
                 j = array_len - mid -1
                 merge_upper: do k = array_len-1, 0, -1

From b2a2149e470ddc6770e0e77ec4c22a46524ed462 Mon Sep 17 00:00:00 2001
From: William Clodius <w.clodius@icloud.com>
Date: Fri, 23 Apr 2021 21:54:33 -0600
Subject: [PATCH 10/18] Fixed improperly assumed short circuiting.

The code stdlib_sorting_sort.f90 was failing on some systems becasue it
improperlly assumed short circuiting
                do while( i >= 0 .and. array(i) > key )
should have been
                do while( i >= 0 )
		    if (array(i) <= key ) exit

[ticket: X]
---
 src/stdlib_sorting_sort.fypp | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/stdlib_sorting_sort.fypp b/src/stdlib_sorting_sort.fypp
index fab24aa1c..d0b803bf9 100644
--- a/src/stdlib_sorting_sort.fypp
+++ b/src/stdlib_sorting_sort.fypp
@@ -170,7 +170,8 @@ contains
             do j=1_int_size, size(array, kind=int_size)-1
                 key = array(j)
                 i = j - 1
-                do while( i >= 0 .and. array(i) > key )
+                do while( i >= 0 )
+                    if ( array(i) <= key ) exit
                     array(i+1) = array(i)
                     i = i - 1
                 end do
@@ -342,7 +343,8 @@ contains
             do j=1_int_size, size(array, kind=int_size)-1
                 key = array(j)
                 i = j - 1
-                do while( i >= 0 .and. array(i) > key )
+                do while( i >= 0 )
+                    if ( array(i) <= key ) exit
                     array(i+1) = array(i)
                     i = i - 1
                 end do
@@ -509,7 +511,8 @@ contains
             do j=1_int_size, size(array, kind=int_size)-1
                 key = array(j)
                 i = j - 1
-                do while( i >= 0 .and. array(i) > key )
+                do while( i >= 0 )
+                    if ( array(i) <= key ) exit
                     array(i+1) = array(i)
                     i = i - 1
                 end do
@@ -673,7 +676,8 @@ contains
             do j=1_int_size, size(array, kind=int_size)-1
                 key = array(j)
                 i = j - 1
-                do while( i >= 0 .and. array(i) > key )
+                do while( i >= 0 )
+                    if ( array(i) <= key ) exit
                     array(i+1) = array(i)
                     i = i - 1
                 end do

From 4280f3ee7d2963c2cd834a2ee6a475372f0942f9 Mon Sep 17 00:00:00 2001
From: William Clodius <w.clodius@icloud.com>
Date: Fri, 30 Apr 2021 12:55:16 -0600
Subject: [PATCH 11/18] Fixed documentation problems noted by gareth_nx

Changed the markdown document docs/specs/stdlib_sorting.md to correct problems
noted by gareth_nx. Changed comments in the FYPP file
src/stlib_sorting_sort_index.fypp to note that the input array is sorted on
output.

[ticket: X]
---
 doc/specs/stdlib_sorting.md        | 62 +++++++++++++++---------------
 src/stdlib_sorting_sort_index.fypp | 12 ++++--
 2 files changed, 40 insertions(+), 34 deletions(-)

diff --git a/doc/specs/stdlib_sorting.md b/doc/specs/stdlib_sorting.md
index 346a95448..9b190b05c 100644
--- a/doc/specs/stdlib_sorting.md
+++ b/doc/specs/stdlib_sorting.md
@@ -152,14 +152,14 @@ housekeeping it has slower runtime performance than `ORD_SORT`.
 provided as optional `work` and `iwork` arguments or allocated
 internally on the stack.
 
-#### The `SORT` subroutines
+#### The `SORT` subroutine
 
 `SORT` uses the `introsort` sorting algorithm of David Musser.
 `introsort` is a hybrid unstable comparison algorithm combining
 `quicksort`, `insertion sort`, and `heap sort`. While this algorithm's
 runtime performance is always O(N Ln(N)), it is relatively fast on
-randomly ordered data, but inconsistent in performance on partly
-sorted data.as the official source of the algorithm.
+randomly ordered data, but does not show the improvement in
+performance on partly sorted data found for `ORD_SORT`.
 
 As with `introsort`, `SORT` is an unstable hybrid algorithm.
 First it examines the array and estimates the depth of recursion a
@@ -182,14 +182,16 @@ calls `introsort` proper. `introsort` proper then:
     * Calls `introsort` proper on the rightmost partition, and then
       returns.
 
-The resulting algorithm is of order O(N Ln(N)) run time
-performance for all inputs. Because it relies on `quicksort`, the
-coefficient of the O(N Ln(N)) behavior is typically small compared to
-other sorting algorithms on random data. On partially sorted data it
-can show either slower `heap sort` performance, or enhanced
-performance by up to a factor of six. Still, even when it shows
-enhanced performance, its performance on partially sorted data is
-typically an order of magnitude slower than `ORD_SORT`.
+The resulting algorithm is of order O(N Ln(N)) run time performance
+for all inputs. Because it relies on `quicksort`, the coefficient of
+the O(N Ln(N)) behavior is typically small compared to other sorting
+algorithms on random data. On partially sorted data it can show either
+slower `heap sort` performance, or enhanced performance by up to a
+factor of six. Still, even when it shows enhanced performance, its
+performance on partially sorted data is typically an order of
+magnitude slower than `ORD_SORT`. Its memory requirements are also
+low, being of order O(Ln(N)), while the memory requirements of
+`ORD_SORT` and `SORT_INDEX` are of order O(N).
 
 ### Tentative specifications of the `stdlib_sorting` procedures
 
@@ -255,9 +257,7 @@ function `LGT`.
     call read_sorted_file( 'dummy_file1', array1 )
     call read_sorted_file( 'dummy_file2', array2 )
     ! Concatenate the arrays
-    allocate( array( size(array1) + size(array2) ) )
-    array( 1:size(array1) ) = array1(:)
-    array( size(array1)+1:size(array1)+size(array2) ) = array2(:)
+	array = [ array1, array2 ]
     ! Sort the resulting array
     call ord_sort( array, work )
     ! Process the sorted array
@@ -318,7 +318,7 @@ element of `array` is a `NaN`.  Sorting of `CHARACTER(*)` and
     ...
 ```
 
-#### `sort_index` - creates an array of sorting indices for an input array.
+#### `sort_index` - creates an array of sorting indices for an input array, while also sorting the array.
 
 ##### Status
 
@@ -326,8 +326,9 @@ Experimental
 
 ##### Description
 
-Returns an integer array whose elements would sort the input array in
-the specified direction retaining order stability.
+Returns the input `array` sorted in the direction requested while
+retaining order stability, and an integer array whose elements would
+sort the input `array` to produce the output `array`.
 
 ##### Syntax
 
@@ -381,9 +382,10 @@ replace "scratch" memory that would otherwise be allocated on the
 stack. If `array` is of any kind of `REAL` the order of the elements in
 `index` and `array` on return are undefined if any element of `array`
 is a `NaN`. Sorting of `CHARACTER(*)` and `STRING_TYPE` arrays are
-based on the operator `>`, and not on the function `LGT`. It should be
-emphasized that the order of `array` will typically be different on
-return.
+based on the operator `>`, and not on the function `LGT`.
+
+It should be emphasized that the order of `array` will typically be
+different on return
 
 
 ##### Examples
@@ -392,15 +394,15 @@ Sorting a related rank one array:
 
 ```Fortran
     subroutine sort_related_data( a, b, work, index, iwork )
-        ! Sort `b` in terms or its related array `a`
+        ! Sort `a`, and  also  sort `b` to be reorderd the same way as `a`
         integer, intent(inout)           :: a(:)
         integer(int32), intent(inout)    :: b(:) ! The same size as a
 		integer(int32), intent(inout)    :: work(:)
 		integer(int_size), intent(inout) :: index(:)
 		integer(int_size), intent(inout) :: iwork(:)
 		! Find the indices to sort a
-		call sort_index(a, index(1:size(a)),&
-		    work(1:size(a)/2), iwork(1:size(a)/2))
+        call sort_index(a, index(1:size(a)),&
+            work(1:size(a)/2), iwork(1:size(a)/2))
 		! Sort b based on the sorting of a
 		b(:) = b( index(1:size(a)) )
 	end subroutine sort_related_data
@@ -410,23 +412,23 @@ Sorting a rank 2 array based on the data in a column
 
 ```Fortran
 	subroutine sort_related_data( array, column, work, index, iwork )
-	    ! Sort `a_data` in terms or its component `a`
-	    integer, intent(inout)           :: a(:,:)
+	    ! Reorder rows of `array` such that `array(:, column)` is  sorted
+	    integer, intent(inout)           :: array(:,:)
 		integer(int32), intent(in)       :: column
 		integer(int32), intent(inout)    :: work(:)
 		integer(int_size), intent(inout) :: index(:)
 		integer(int_size), intent(inout) :: iwork(:)
 		integer, allocatable             :: dummy(:)
 		integer :: i
-		allocate(dummy(size(a, dim=1)))
-		! Extract a component of `a_data`
-		dummy(:) = a(:, column)
+		allocate(dummy(size(array, dim=1)))
+		! Extract a column of `array`
+		dummy(:) = array(:, column)
 		! Find the indices to sort the column
 		call sort_index(dummy, index(1:size(dummy)),&
 		    work(1:size(dummy)/2), iwork(1:size(dummy)/2))
 		! Sort a based on the sorting of its column
-		do i=1, size(a, dim=2)
-		    a(:, i) = a(index(1:size(a, dim=1)), i)
+		do i=1, size(array, dim=2)
+		    array(:, i) = array(index(1:size(array, dim=1)), i)
 		end do
 	end subroutine sort_related_data
 ```
diff --git a/src/stdlib_sorting_sort_index.fypp b/src/stdlib_sorting_sort_index.fypp
index 158baa1f3..6aa078b2b 100644
--- a/src/stdlib_sorting_sort_index.fypp
+++ b/src/stdlib_sorting_sort_index.fypp
@@ -63,7 +63,8 @@ contains
 
     module subroutine ${k1}$_sort_index( array, index, work, iwork, reverse )
 ! A modification of `${k1}$_ord_sort` to return an array of indices that
-! would provide a stable sort of the `ARRAY` input. The indices by default
+! would perform a stable sort of the `ARRAY` as input, and also sort `ARRAY`
+! as desired. The indices by default
 ! correspond to a non-decreasing sort, but if the optional argument
 ! `REVERSE` is present with a value of `.TRUE.` the indices correspond to
 ! a non-increasing sort. The logic of the determination of indexing largely
@@ -471,7 +472,8 @@ contains
 
     module subroutine ${k1}$_sort_index( array, index, work, iwork, reverse )
 ! A modification of `${k1}$_ord_sort` to return an array of indices that
-! would provide a stable sort of the `ARRAY` input. The indices by default
+! would perform a stable sort of the `ARRAY` as input, and also sort `ARRAY`
+! as desired. The indices by default
 ! correspond to a non-decreasing sort, but if the optional argument
 ! `REVERSE` is present with a value of `.TRUE.` the indices correspond to
 ! a non-increasing sort. The logic of the determination of indexing largely
@@ -876,7 +878,8 @@ contains
 
     module subroutine char_sort_index( array, index, work, iwork, reverse )
 ! A modification of `char_ord_sort` to return an array of indices that
-! would provide a stable sort of the `ARRAY` input. The indices by default
+! would perform a stable sort of the `ARRAY` as input, and also sort `ARRAY`
+! as desired. The indices by default
 ! correspond to a non-decreasing sort, but if the optional argument
 ! `REVERSE` is present with a value of `.TRUE.` the indices correspond to
 ! a non-increasing sort. The logic of the determination of indexing largely
@@ -1280,7 +1283,8 @@ contains
 
     module subroutine string_sort_index( array, index, work, iwork, reverse )
 ! A modification of `string_ord_sort` to return an array of indices that
-! would provide a stable sort of the `ARRAY` input. The indices by default
+! would perform a stable sort of the `ARRAY` as input, and also sort `ARRAY`
+! as desired. The indices by default
 ! correspond to a non-decreasing sort, but if the optional argument
 ! `REVERSE` is present with a value of `.TRUE.` the indices correspond to
 ! a non-increasing sort. The logic of the determination of indexing largely

From 62c678148c84104d40afa957d28c15df15cfbcac Mon Sep 17 00:00:00 2001
From: William Clodius <w.clodius@icloud.com>
Date: Fri, 30 Apr 2021 13:04:23 -0600
Subject: [PATCH 12/18] Commited final edit

Forgot to save an edit prior to the previous commit of
docs/specs/stdlib_sorting.md. Saved and commited that edit.

[ticket: X]
---
 doc/specs/stdlib_sorting.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/specs/stdlib_sorting.md b/doc/specs/stdlib_sorting.md
index 9b190b05c..85823052f 100644
--- a/doc/specs/stdlib_sorting.md
+++ b/doc/specs/stdlib_sorting.md
@@ -433,7 +433,7 @@ Sorting a rank 2 array based on the data in a column
 	end subroutine sort_related_data
 ```
 
-Sorting an array of a derived type based on the dsta in one component
+Sorting an array of a derived type based on the data in one component
 
 ```fortran
 	subroutine sort_a_data( a_data, a, work, index, iwork )

From 0a73bbc994dfc99a70777fa41be007d14d5c8819 Mon Sep 17 00:00:00 2001
From: William Clodius <w.clodius@icloud.com>
Date: Fri, 30 Apr 2021 13:21:41 -0600
Subject: [PATCH 13/18] Fixed misspellings

Review-dog caught several misspellings in the comments of the source codes.

[ticket: X]
---
 src/stdlib_sorting.fypp            |  4 ++--
 src/stdlib_sorting_ord_sort.fypp   | 10 +++++-----
 src/stdlib_sorting_sort_index.fypp |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/stdlib_sorting.fypp b/src/stdlib_sorting.fypp
index e7fe92589..00d153d43 100644
--- a/src/stdlib_sorting.fypp
+++ b/src/stdlib_sorting.fypp
@@ -50,7 +50,7 @@
 !!   option. This file may not be copied, modified, or distributed
 !!   except according to those terms.
 !!
-!! so the licence for the original`slice.rs` code is compatible with the use
+!! so the license for the original`slice.rs` code is compatible with the use
 !! of modified versions of the code in the Fortran Standard Library under
 !! the MIT license.
 !!
@@ -221,7 +221,7 @@ module stdlib_sorting
 !!
 !! The generic subroutine implementing the `SORT_INDEX` algorithm to
 !! return an index array whose elements would sort the input array in the
-!! desired direction. It is primarilly intended to be used to sort a
+!! desired direction. It is primarily intended to be used to sort a
 !! derived type array based on the values of a component of the array.
 !! Its use has the syntax:
 !!
diff --git a/src/stdlib_sorting_ord_sort.fypp b/src/stdlib_sorting_ord_sort.fypp
index 39f8866a8..eb0e58480 100644
--- a/src/stdlib_sorting_ord_sort.fypp
+++ b/src/stdlib_sorting_ord_sort.fypp
@@ -49,7 +49,7 @@
 !!   option. This file may not be copied, modified, or distributed
 !!   except according to those terms.
 !!
-!! so the licence for the original`slice.rs` code is compatible with the use
+!! so the license for the original`slice.rs` code is compatible with the use
 !! of modified versions of the code in the Fortran Standard Library under
 !! the MIT license.
 
@@ -73,7 +73,7 @@ contains
 ! a hybrid sorting algorithm combining an iterative Merge sort controlled
 ! by a stack of `RUNS` identified by regions of uniformly decreasing or
 ! non-decreasing sequences that may be expanded to a minimum run size and
-! initialy processed by an insertion sort.
+! initially processed by an insertion sort.
 !
 ! Note the Fortran implementation simplifies the logic as it only has to
 ! deal with Fortran arrays of intrinsic types and not the full generality
@@ -414,7 +414,7 @@ contains
 ! a hybrid sorting algorithm combining an iterative Merge sort controlled
 ! by a stack of `RUNS` identified by regions of uniformly decreasing or
 ! non-decreasing sequences that may be expanded to a minimum run size and
-! initialy processed by an insertion sort.
+! initially processed by an insertion sort.
 !
 ! Note the Fortran implementation simplifies the logic as it only has to
 ! deal with Fortran arrays of intrinsic types and not the full generality
@@ -753,7 +753,7 @@ contains
 ! a hybrid sorting algorithm combining an iterative Merge sort controlled
 ! by a stack of `RUNS` identified by regions of uniformly decreasing or
 ! non-decreasing sequences that may be expanded to a minimum run size and
-! initialy processed by an insertion sort.
+! initially processed by an insertion sort.
 !
 ! Note the Fortran implementation simplifies the logic as it only has to
 ! deal with Fortran arrays of intrinsic types and not the full generality
@@ -1091,7 +1091,7 @@ contains
 ! a hybrid sorting algorithm combining an iterative Merge sort controlled
 ! by a stack of `RUNS` identified by regions of uniformly decreasing or
 ! non-decreasing sequences that may be expanded to a minimum run size and
-! initialy processed by an insertion sort.
+! initially processed by an insertion sort.
 !
 ! Note the Fortran implementation simplifies the logic as it only has to
 ! deal with Fortran arrays of intrinsic types and not the full generality
diff --git a/src/stdlib_sorting_sort_index.fypp b/src/stdlib_sorting_sort_index.fypp
index 6aa078b2b..222c2cd97 100644
--- a/src/stdlib_sorting_sort_index.fypp
+++ b/src/stdlib_sorting_sort_index.fypp
@@ -49,7 +49,7 @@
 !!   option. This file may not be copied, modified, or distributed
 !!   except according to those terms.
 !!
-!! so the licence for the original`slice.rs` code is compatible with the use
+!! so the license for the original`slice.rs` code is compatible with the use
 !! of modified versions of the code in the Fortran Standard Library under
 !! the MIT license.
 

From a0a5d0c6e78502da1d500c9d1a2d35cd669ef8e5 Mon Sep 17 00:00:00 2001
From: William Clodius <w.clodius@icloud.com>
Date: Sun, 2 May 2021 07:56:12 -0600
Subject: [PATCH 14/18] Commit modified index.md

Modified doc/specs/index.md so it mentioned stdlib_sorting.md.

[ticket: X]
---
 doc/specs/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/specs/index.md b/doc/specs/index.md
index 4238dc35e..0883ea07c 100644
--- a/doc/specs/index.md
+++ b/doc/specs/index.md
@@ -19,7 +19,7 @@ This is and index/directory of the specifications (specs) for each new module/fe
  - [logger](./stdlib_logger.html) - Runtime logging system
  - [optval](./stdlib_optval.html) - Fallback value for optional arguments
  - [quadrature](./stdlib_quadrature.html) - Numerical integration
- - [sorting](.stdlib_sorting.html) - Sorting of numerical arrays
+ - [sorting](.stdlib_sorting.html) - Sorting of rank one arrays
  - [stats](./stdlib_stats.html) - Descriptive Statistics
  - [stats_distribution_PRNG](./stdlib_stats_distribution_PRNG.html) - Probability Distributions random number generator
  - [string\_type](./stdlib_string_type.html) - Basic string support

From b45441b193a5491731593b8d623fa9db66464480 Mon Sep 17 00:00:00 2001
From: William Clodius <w.clodius@icloud.com>
Date: Sun, 2 May 2021 08:12:22 -0600
Subject: [PATCH 15/18] Added changes suggested by gareth-nx.

Added changes in the description of `SORT_INDEX`.

[ticket: X]
---
 doc/specs/stdlib_sorting.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/doc/specs/stdlib_sorting.md b/doc/specs/stdlib_sorting.md
index 85823052f..83429166d 100644
--- a/doc/specs/stdlib_sorting.md
+++ b/doc/specs/stdlib_sorting.md
@@ -44,9 +44,10 @@ data:
 * `ORD_SORT` is intended to sort simple arrays of intrinsic data
   that have significant sections that were partially ordered before
   the sort;
-* `SORT_INDEX` is intended to provide indices for sorting arrays of
-  derived type data, based on the ordering of an intrinsic component
-  of the derived type; and
+* `SORT_INDEX` is based on ORD_SORT, but in addition to sorting the
+  input array, it returns indices that map the original array to its
+  sorted version. This enables related arrays to be re-ordered in the
+  same way; and
 * `SORT` is intended to sort simple arrays of intrinsic data
   that are effectively unordered before the sort.
 
@@ -142,7 +143,8 @@ arrays, or arrays of derived types. For such related data, what is
 useful is an array of indices that maps a rank 1 array to its sorted
 form. For such a sort, a stable sort is useful, therefore the module
 provides a subroutine, `SORT_INDEX`, that generates such an array of
-indices based on the `ORD_SORT` algorithm.
+indices based on the `ORD_SORT` algorithm, in addition to sorting
+the input array.
 
 The logic of `SORT_INDEX` parallels that of `ORD_SORT`, with
 additional housekeeping to keep the array of indices consistent with

From 7087255513f4a9248e941d87b09aa45070756309 Mon Sep 17 00:00:00 2001
From: "William B. Clodius" <65470906+wclodius2@users.noreply.github.com>
Date: Sat, 8 May 2021 19:59:12 -0600
Subject: [PATCH 16/18] Update src/stdlib_sorting.fypp

Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
---
 src/stdlib_sorting.fypp | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/stdlib_sorting.fypp b/src/stdlib_sorting.fypp
index 00d153d43..fe96afa43 100644
--- a/src/stdlib_sorting.fypp
+++ b/src/stdlib_sorting.fypp
@@ -1,8 +1,4 @@
-#! Integer kinds to be considered during templating
-#:set INT_KINDS = ["int8", "int16", "int32", "int64"]
-
-#! Real kinds to be considered during templating
-#:set REAL_KINDS = ["sp", "dp", "qp"]
+#:include "common.fypp"
 
 !! Licensing:
 !!

From 6f0f189344469f8f4d9685945e4ad14afed6bc1b Mon Sep 17 00:00:00 2001
From: "William B. Clodius" <65470906+wclodius2@users.noreply.github.com>
Date: Sat, 8 May 2021 20:10:30 -0600
Subject: [PATCH 17/18] Update src/stdlib_sorting.fypp

Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
---
 src/stdlib_sorting.fypp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/stdlib_sorting.fypp b/src/stdlib_sorting.fypp
index fe96afa43..8a1020d81 100644
--- a/src/stdlib_sorting.fypp
+++ b/src/stdlib_sorting.fypp
@@ -84,7 +84,7 @@ module stdlib_sorting
 !!
 !! `SORT_INDEX` is a modification of `ORD_SORT` so that in addition to
 !! sorting the input array, it returns the indices that map to a
-!! stable sort of the original array. These indices are intended to be
+!! stable sort of the original array. These indices are
 !! intended to be used to sort data that is correlated with the input
 !! array, e.g., different arrays in a database, different columns of a
 !! rank 2 array, different elements of a derived type. It is less

From 7c8ed79ce3fe25445ba8f187c552b2bd35d86d69 Mon Sep 17 00:00:00 2001
From: "William B. Clodius" <65470906+wclodius2@users.noreply.github.com>
Date: Sat, 8 May 2021 20:12:01 -0600
Subject: [PATCH 18/18] Update src/stdlib_sorting.fypp

Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
---
 src/stdlib_sorting.fypp | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/stdlib_sorting.fypp b/src/stdlib_sorting.fypp
index 8a1020d81..8908ccfb7 100644
--- a/src/stdlib_sorting.fypp
+++ b/src/stdlib_sorting.fypp
@@ -128,10 +128,10 @@ module stdlib_sorting
                                log(1.6180339887_dp) ) )
 
     type run_type
-! Version: experimental
-!
-! Used to pass state around in a stack among helper functions for the
-! `ORD_SORT` and `SORT_INDEX` algorithms
+!! Version: experimental
+!!
+!! Used to pass state around in a stack among helper functions for the
+!! `ORD_SORT` and `SORT_INDEX` algorithms
         integer(int_size) :: base = 0
         integer(int_size) :: len = 0
     end type run_type