09_iterators.html

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width">
<title>9. Iterators</title>
<link rel="stylesheet" href="template/style.css">
</head>
<body>


<span style="float: right">
[
<a href="08_lisp.html">previous</a>, 
<a href="index.html">contents</a>, 
<a href="10_binary_counter.html">next</a>
]
</span>


<a name="L9.-Iterators"></a>
<h1>9. Iterators</h1>

<a name="History-of-iterators"></a>
<h2>History of iterators</h2>

<p>When we wrote <code>list_pool</code> before, we didn&rsquo;t provide it with iterators,
because we are learning to be lazy.
Now is a good time to do it, so we can use
other algorithms and learn to write iterators right once
and for all.</p>

<p>Let me tell you a little about them.
Anybody who programs in C++ is forced to use <code>std::vector</code>
and those have iterators.
But, there are lots of people who do not quite understand what they are,
partially because iterators are called iterators.
Someone told me that once I shared with them the proper
name for them, they immediately understood.</p>

<p>There is a thing in computer science called iterators which first
appeared in the mid-seventies around 1975 or 1976 at MIT.
It was invented
by <a href="https://en.wikipedia.org/wiki/Barbara_Liskov">Barbara Liskov</a> who was designing a language called <a href="https://en.wikipedia.org/wiki/CLU_(programming_language)">CLU</a> and she was
interested by another &ldquo;great and very successful language&rdquo; (joke) called <a href="https://en.wikipedia.org/wiki/Alphard_(programming_language)">Alphard</a>.
It never existed.
It&rsquo;s a mythical language. It was never implemented.
But, many people at <a href="https://en.wikipedia.org/wiki/Carnegie_Mellon_University">CMU</a> got tenure because of it.
It has some interesting ideas, including the idea of a generator.
For those of you who know <a href="https://en.wikipedia.org/wiki/Python_(programming_language)">Python</a>, it is like an iterator in Python<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>.
Barbara Liskov said, &ldquo;wouldn&rsquo;t it be nice to write something like: <code>for x in thing</code>&rdquo;.
Iterators allow you to do that<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>.
It is like a generator in Alphard.
It is a procedure which returns multiple values, one by one.
It&rsquo;s a procedural abstraction.
In CLU you could only have one iterator to a data
structure and it extended the <code>for</code>.
It was a generalization of a control structure.</p>

<p>At the same time, I was working on how to do algorithms and I
introduced the notion of position.
A better name is coordinate, the name which Paul and I use
in our book &ldquo;Elements of Programming&rdquo;.
A coordinate is some way of indicating where in the data structure you are.
It is not a control structure, it&rsquo;s just the pointer into a data structure,
or generalized notion of a coordinate.
It is something which allows me to navigate through the data structure in a
natural way<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup>.</p>

<p>Eventually I started talking to C++ guys and showed them coordinates
and they said, &ldquo;we call it iterator&rdquo;.
Of course they didn&rsquo;t call what I had an iterator.
They had some C++ code where they
were trying to do CLU like iterators, heavy state procedural
things instead of lightweight pointers.
So, I chickened out.
I said, &ldquo;Yes, that&rsquo;s exactly what you guys have&rdquo;.
I thought it was much better to win with the wrong name
than lose with the right name, so the name is stuck.
It&rsquo;s in the standard. But again, the concept which it designates is not a
concept of iterator in CLU or iterator in Python or iterator in Java.
Our iterator is a generalization of a coordinate in a data structure.
It&rsquo;s a lightweight thing. It doesn&rsquo;t <em>do</em> anything,
it just <em>points</em> to something.</p>

<p>There are these arguments which I hear from people like the
Boost<sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup> guys, who say &ldquo;Iterators are all wrong. Let us go back
and do ranges.&rdquo;
Guess what? They&rsquo;re reinventing Barbara Liskov&rsquo;s iterators.
Just for the record, when I introduced my iterators
I was very familiar with the iterators in CLU.
Moreover I was even very familiar with Barbara herself
and with <a href="https://dblp.org/pid/04/4444.html">Alan Snyder</a> who co-designed the iterators in CLU.
I didn&rsquo;t do what they did because I wanted to do something else.
It wasn&rsquo;t out of ignorance.
Maybe I was stupid,
but I wasn&rsquo;t ignorant.</p>

<a name="Affiliated-types-for-iterators"></a>
<h2>Affiliated types for iterators</h2>

<p>We always need to distinguish between how we type
code in C++ and what the notion is behind it.
A key part of iterators is <strong>affiliated types</strong>.
Iterators point to values and you want to know what those values are.
Types don&rsquo;t work on their own.
They come in clusters, or connected families of types.
If you have a type <code>int*</code>, there is an affiliated type <code>int</code>.
The two types are related. It would be terribly nice if we had a
way to obtain <code>int</code> from <code>int*</code>. That is, if somebody
gives me a pointer type, I want a way to find out what type
it points to.</p>

<p>To do this we need this notion of <strong>type functions</strong> which accept one type
and return a different type.
This problem of needing to obtain affiliated types is not specific to C and C++.
It appears in Java and Python as well.
In spite of Python&rsquo;s <a href="https://en.wikipedia.org/wiki/Duck_typing">duck typing</a> there&rsquo;s <em>still</em> a connection
between types, even if they are duck types.</p>

<p>So we need this notion of type functions,
but C doesn&rsquo;t let us do this,
and neither does C++.
Instead of type functions we are going to solve
this problem for iterators by using <code>typedef</code>s.</p>

<p>For an iterator, there are <a href="https://en.cppreference.com/w/cpp/iterator/iterator_traits">5 types</a> that we always need to define.
Three of them are primary, two are secondary.
Let&rsquo;s start with the primary:</p>

<ol>
<li><p><code>value_type</code>: the type of the value it points to.</p></li>
<li><p><code>difference_type</code>: Difference between pointers (<a href="https://en.cppreference.com/w/c/types/ptrdiff_t"><code>ptrdiff_t</code></a> in C).
 Unlike C we might need a different type.
 The length between elements in a range depends on the range type.
 It is an integral type large enough to encode any valid
 range of a given iterator.</p></li>
<li><p><code>iterator_category</code>:
 Once again we need to distinguish between how we type this in C++ and what
 the notion behind it is.
 The notion here is that there are different categories, or theories,
 of iterators: <code>ForwardIterators</code>, <code>RandomAccessIterators</code>,
 <code>InputIterators</code>, etc&hellip;</p>

<p> In C++ (without concepts) we use tag types to designate the iterator categories.
 Every iterator uses a tag type to denote which theory it supports.
 The tag lets you do compile time dispatch<sup id="fnref:5"><a href="#fn:5" rel="footnote">5</a></sup>.</p></li>
</ol>


<a name="Historical-artifacts"></a>
<h3>Historical artifacts</h3>

<p>The fourth and fifth types to be defined are required only for historical reasons.
Microsoft said they were going to vote against STL unless it accommodated
multiple memory models.
At the time, they had tiny pointers, huge pointers and
far pointers.
They wanted STL to somehow work with all of them.
I had to figure out
how they work.
The difference between far pointer and huge pointer is really
weird.
They are both 32 bits.
But, with far pointer if you add
one to it, and the two lowest bytes overflow,
they wrap without propagation to the upper bytes.
With a huge pointer, the carry <em>is</em> propagated to the upper bytes, but by adding
8 to them<sup id="fnref:6"><a href="#fn:6" rel="footnote">6</a></sup>.</p>

<p>So they demanded that I change the whole architecture to accommodate them.
Guess how they voted?
No.
Now we&rsquo;re stuck for the next hundred years with stuff which was included to placate people that
couldn&rsquo;t have been placated.</p>

<p>So what does an iterator return when you dereference it?
Normally a reference. It&rsquo;s an <a href="https://en.wikipedia.org/wiki/Value_(computer_science)#lrvalue"><code>lvalue</code></a>
so you can modify it.
But, with far and tiny pointers, you don&rsquo;t know what the reference
type should be.
So now we need to provide it.</p>

<p>&emsp;4. <code>reference</code>: the type of a reference to the value. <br/>
&emsp;5. <code>pointer</code>: the type of a pointer to the value.</p>

<p>It&rsquo;s not particularly harmful, but it obfuscates things
and it provides &ldquo;language experts&rdquo; with steady employment.</p>

<a name="List-pool-iterators"></a>
<h2>List pool iterators</h2>

<p>Let&rsquo;s define these types in the <code>list_pool</code> iterator.
What category is the iterator for <code>list_pool</code> from last chapter?
We need <code>ForwardIterator</code>, because there is no way in a singly linked list
to go backwards.</p>

<pre><code>#include &lt;iterator&gt;

// class list_pool {
// ...

     struct iterator {
       typedef list_pool::value_type value_type;
       typedef list_pool::list_type difference_type;
       typedef std::forward_iterator_tag iterator_category;
       typedef value_type&amp; reference;
       typedef value_type* pointer;
     };

// };
</code></pre>

<a name="Constructors"></a>
<h3>Constructors</h3>

<p>Let&rsquo;s write constructors for our iterator:</p>

<pre><code>iterator() {}
iterator(list_pool&amp; p, list_pool::list_type node) :
    pool(&amp;p), node(node) {}
</code></pre>

<p>We should explicitly call a constructor when we can.
Default constructor shouldn&rsquo;t really be used
because it guarantees only a partially formed value.
It is only valid in that it can be destructed or assigned to.</p>

<p>This is convenience constructor:</p>

<pre><code>iterator(list_pool&amp; p) :
    pool(&amp;p), node(p.empty()) {}
</code></pre>

<a name="Dereference"></a>
<h3>Dereference</h3>

<p>What goes inside an iterator? Obviously
an index which lets us get nodes.</p>

<pre><code>list_pool* pool;
list_pool::list_type node;

reference operator*() const {
  return pool-&gt;value(node);
};

pointer operator-&gt;() const {
  &amp;**this;
};
</code></pre>

<p>The arrow operator could be automatically
generated. Sadly it isn&rsquo;t.</p>

<p>You may notice I have a lot of minor annoyances like this.
Whenever you enter a language
you have to put aside thoughts that things
could be done differently.
If you work in C, you write C.</p>

<a name="Pre-2d-increment-2c--post-2d-increment"></a>
<h3>Pre-increment, post-increment</h3>

<p>When you increment, the iterator should
move to the next node.</p>

<pre><code>iterator&amp; operator++() {
  node = pool-&gt;next(node);
  return *this;
};


iterator operator++(int) {
  iterator tmp(*this);
  ++*this;
  return tmp;
};
</code></pre>

<p><code>int</code> here doesn&rsquo;t do anything, it&rsquo;s
just to distinguish pre and post.
Post-increment could be automatically generated
and it would be criminal to do anything else.</p>

<a name="Equality"></a>
<h3>Equality</h3>

<p>It&rsquo;s customary here to
pass the arguments by reference.
Why not by value?
Passing by value makes a copy.
At the metaphysical level, what is a copy?
A copy is something which is equal to the original.
<em>Equality is a more primitive thing</em>.
So let&rsquo;s try to define it without referring to copy.</p>

<pre><code>friend
bool operator==(const iterator&amp; x, const iterator&amp; y) {
  // assert(x.pool == y.pool);
  return x.node == y.node;
}

friend
bool operator!=(const iterator&amp; x, const iterator&amp; y) {
  return !(x == y);
}
</code></pre>

<p>We could also complain about using <code>==</code> instead of <code>=</code>
for equality, as it violates mathematical tradition.
But, oh well.
When you took Algebra in grade school,
they used <code>=</code> and they used <code>x</code> and <code>y</code> and I think it&rsquo;s good.</p>

<p><strong>Exercise:</strong> Experiment with list pool iterators by using
  a standard library algorithm on them, such as <code>find</code> or <code>copy</code>
  (see <code>test_list_pool_iterator.cpp</code> at the end of the chapter).</p>

<a name="Thoughts-about-iterator-design"></a>
<h2>Thoughts about iterator design</h2>

<a name="Should-we-add-safety-guards-3f-"></a>
<h3>Should we add safety guards?</h3>

<p>Notice I sometimes write assertions in comments:</p>

<pre><code>// assert(x.pool == y.pool);
</code></pre>

<p>I don&rsquo;t use a real assert because it takes too long to check<sup id="fnref:7"><a href="#fn:7" rel="footnote">7</a></sup>.
There is nobody who should be comparing iterators from separate pools.
If he does, he deserves what he gets.
But, wouldn&rsquo;t it be good to guarantee safety?
No. It cannot be done.
Safe programming is bogus programming.
Programming has to be done right.
You have to write correct code.
Checks are good, but you can&rsquo;t
make wrong code correct with syntactic limitations.
Turing machines are fundamentally unsafe.</p>

<p>What about compiling debug mode?
It&rsquo;s mildly useful.
The only truly useful thing is to decompose
your program into clear subroutines and clear units
which you understand.
That&rsquo;s the only thing I know that works.</p>

<p>The reason to program in C++ is to have access to the machine.
You want to have these unsafe things called pointers.
If you want a language which hides the machine,
then use it. It has its advantages.
Python is good for writing scripts.
But don&rsquo;t write your operating system in Python.
Don&rsquo;t write your search engine.
<a href="https://en.wikipedia.org/wiki/BASIC">BASIC</a> and <a href="https://en.wikipedia.org/wiki/COBOL">COBOL</a> were wonderful for what they are.
I wouldn&rsquo;t use COBOL to write an operating system.</p>

<a name="Why-are-forward-iterators-not-comparable-3f-"></a>
<h3>Why are forward iterators not comparable?</h3>

<p>What is the basic example of a <code>ForwardIterator</code>?
A singly linked list.
Forward iterators do not have &lsquo;less-than&rsquo; defined on them.
Is that a good idea?
You could do it for linked lists, but it&rsquo;s very expensive
and not even guaranteed to terminate<sup id="fnref:8"><a href="#fn:8" rel="footnote">8</a></sup>.
But you&rsquo;re assuming &ldquo;If I am less-than, then I am before&rdquo;.
This is only one interpretation.
Let me explain.</p>

<p>I cannot effectively implement
<code>&lt;</code> on <code>ForwardIterator</code>, but I was very tempted to include it.
It would do a very simple thing, which is to just compare pointers and
return that comparison.
This would allow me to sort them.
This is an amazing thing.
I don&rsquo;t need the semantics of before or after to be able to use them for fast binary
search, whether in a map, or whether in a sorted array.
For example, suppose I want to see if an iterator
is in the structure, such as inclusion in a set.
The only effective way is comparison.
I was torn because I knew I could not include it,
because most people would attempt to write
code in STL the old way they learned which is:</p>

<pre><code>while (i &lt; j) 
</code></pre>

<p>They taught you that this was better code than STL style,
but it doesn&rsquo;t work on linked lists or map.
If I provide it for linked lists, people will
write it, and it will compile and it will work.
(I don&rsquo;t know what that will do.)
So, that was not an option.</p>

<a name="Everything-on-a-computer-is-totally-ordered"></a>
<h3>Everything on a computer is totally ordered</h3>

<p>But, I still want to emphasize the fact that total ordering is a universally useful thing.
Fortunately on a computer
it is fully defined on every byte.
It&rsquo;s fully defined on every word as an integer.
It&rsquo;s even defined, somewhat on <code>double</code> (ignoring bad values).</p>

<p>Equality is naturally extendable to struct.
It was an oversight on the part of C and still C++ again.
The compiler should generate equality,
and it could generate inequality using
<a href="https://en.wikipedia.org/wiki/Lexicographic_order">lexicographical</a> ordering.</p>

<p>So if you want to use iterators
on set, you can define a custom comparator.
One which compares the addresses of the elements to which the iterator points.
It has nothing to do with before and after,
but it establishes a total ordering.
A total ordering does not have to be topologically induced by the traversal.</p>

<p><strong>Exercise:</strong> Extend the list pool iterator
    with the ability to modify the <code>next</code> of the node it points to
    (this is discussed and solved in chapter 12).</p>

<a name="Code"></a>
<h2>Code</h2>

<ul>
<li><a href="code/list_pool_iterator.h">list_pool_iterator.h</a></li>
<li><a href="code/test_list_pool_iterator.cpp">test_list_pool_iterator.cpp</a></li>
</ul>

<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>Alex almost certainly means Python generators, not iterators, but I will describe both.</p>

<p>An <a href="https://wiki.python.org/moin/Iterator">iterator</a> in Python is any object which implements a method called <code>__next__()</code>.
Unlike C++ iterators, the <code>__next__()</code> always returns another element of the sequence, not an iterator,
so they do not resemble pointers or coordinates, neither are they comparable.
They are most similar to <code>InputIterator</code> in that previous values in the sequence become inaccessible after advancing.
The only thing special about the <code>__next__()</code> method is its compatibility with language constructs like <code>for</code> loops.</p>

<p>A <a href="https://wiki.python.org/moin/Generators">generator</a> in Python is a kind of <code>iterator</code> that is typically implemented
as a function with some helpful syntax additions.
In a generator function, the <code>yield</code> keyword is used to return the next value in the sequence.
If an additional value is requested after yielding,
the function will resume at the point of the previous call to <code>yield</code>.
This makes writing complex sequences more natural, as the control flow operates like other code.
For example, the following returns square numbers:</p>

<pre><code>def square_nums(count):
    k = 0
    while k &lt; count:
        yield k * k
        k += 1
</code></pre>

<p>It can be used in a <code>for</code> loop:</p>

<pre><code>for x in square_nums(10):
    print(x)
# 0 1 4 ...
</code></pre><a href="#fnref:1" rev="footnote">&#8617;</a></li>
<li id="fn:2">
See this <a href="http://web.mit.edu/ghudson/info/iterators">brief description of CLU iterators</a>.<a href="#fnref:2" rev="footnote">&#8617;</a></li>
<li id="fn:3">
See chapter 7 of &ldquo;Elements of Programming&rdquo; on coordinate structures.
An interesting discussion on the general idea of &ldquo;coordinatisation&rdquo;
is found in chapter 1 of &ldquo;Basic Notions of Algebra&rdquo; by Shafarevich.<a href="#fnref:3" rev="footnote">&#8617;</a></li>
<li id="fn:4">
<a href="https://www.boost.org/">Boost</a> is a popular collection of C++ libraries, covering a wide range of uses,
 generally accepted as the next tool to reach for beyond the standard library.
 Many standard library features, such as smart pointers, were initially developed in Boost.
 Alex speaks positively of some parts (see <a href="http://stepanovpapers.com/siekforeword.pdf">his foreword</a> for &ldquo;The Boost Graph Library&rdquo;), but others he is more critical of.<a href="#fnref:4" rev="footnote">&#8617;</a></li>
<li id="fn:5">
Some algorithms can be implemented more efficiently for certain
iterator categories. For example <a href="https://en.cppreference.com/w/cpp/iterator/distance"><code>std::distance</code></a> can be
implemented as a constant time algorithm for <code>RandomAccessIterators</code> but
only a linear time algorithm for other iterator categories. The
<code>iterator_category</code> tag allows the appropriate algorithm to be selected at
compile time. This technique is known as <a href="https://quuxplusone.github.io/blog/2021/06/07/tag-dispatch-and-concept-overloading">tag dispatch</a>.<a href="#fnref:5" rev="footnote">&#8617;</a></li>
<li id="fn:6">
See <a href="https://devblogs.microsoft.com/oldnewthing/20200728-00/?p=104012">&ldquo;A look back at memory models in 16-bit MS-DOS&rdquo;</a>
for a brief overview of these various pointer types.
The more general concept behind them is <a href="https://en.wikipedia.org/wiki/Memory_segmentation">memory segmentation</a>.<a href="#fnref:6" rev="footnote">&#8617;</a></li>
<li id="fn:7">
I think modern compilers have fixed this so you can
write checks with more confidence that they won&rsquo;t affect release builds.
Specifically the standard has mandated some rules about when
<a href="https://en.cppreference.com/w/cpp/error/assert">assert</a> is enabled.<a href="#fnref:7" rev="footnote">&#8617;</a></li>
<li id="fn:8">
The <a href="https://en.wikipedia.org/wiki/Swift_(programming_language)">Swift</a> standard library actually takes a lot of inspiration from C++ and Alex&rsquo;s work.
For example, the protocol <a href="https://developer.apple.com/documentation/swift/collection"><code>Collection</code></a> has an <code>Index</code> type which is equivalent to <code>ForwardIterator</code>,
with some differences.
One of these is the <code>Index</code> type <a href="https://forums.swift.org/t/dropping-comparable-requirement-for-indices/3290">must be comparable</a>,
in order to support safety features like bounds checking.
This restriction makes it difficult to write data structures like linked lists
and probably makes Alex&rsquo;s pointer comparison trick impossible.<a href="#fnref:8" rev="footnote">&#8617;</a></li>
</ol>
</div>


<span style="float: right">
[
<a href="08_lisp.html">previous</a>, 
<a href="index.html">contents</a>, 
<a href="10_binary_counter.html">next</a>
]
</span>


</body>
</html>