## Primary Backup (P/B) Replication
In P/B Replication, the clients only ever talk to the primary node `P`.
Any time `P` receives a write request, that request is broadcast to all the backup nodes, which independently send their `ack`s back to the primary.
When the primary has received `ack`s from all its backups, it then delivers the write to itself and sends an `ack` back to the client.
This point in time is known as the ***commit point***.
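Here is a minimal sketch of this write path, assuming synchronous, reliable message delivery and simple in-memory key/value stores (none of these class or method names come from the lecture):

```python
class Backup:
    def __init__(self):
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value   # the backup applies the write...
        return True               # ...and sends its ack back to the primary


class Primary:
    def __init__(self, backups):
        self.backups = backups
        self.store = {}

    def handle_write(self, key, value):
        # Broadcast the write to every backup and collect all of their acks
        acks = [backup.apply(key, value) for backup in self.backups]
        if all(acks):
            self.store[key] = value   # only now is the write delivered locally:
            return "ack"              # this is the commit point, and the client is acked


# Example: two backups, one write
primary = Primary([Backup(), Backup()])
print(primary.handle_write("x", 1))   # -> "ack"
```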
The write latency time experienced by the client is the sum of the times taken to complete four steps: the client's request reaching the primary, the primary broadcasting the write to a backup, the backup's `ack` returning to the primary, and the primary's `ack` returning to the client (imagine we have some function `rt(From, To)` that can measure the response time between two nodes).
Read requests are handled directly by the primary.
The read latency time is the sum of the times taken to complete two steps:
`rt(C, P) + rt(P, C)`
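Putting the two latencies side by side, here is a rough sketch of the arithmetic, assuming a single backup `B` for simplicity and treating the hypothetical `rt(From, To)` as an ordinary function of two node identifiers:

```python
def write_latency(rt, C, P, B):
    # client -> primary, primary -> backup, backup ack -> primary, primary ack -> client
    return rt(C, P) + rt(P, B) + rt(B, P) + rt(P, C)


def read_latency(rt, C, P):
    # client -> primary, primary's response -> client
    return rt(C, P) + rt(P, C)
```

With several backups being updated concurrently, the two middle terms of the write latency would instead be governed by the slowest backup, since the primary must wait for every `ack` before committing.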
These graphs compare the request throughput times of three different backup strategies.
In P/B Replication, all client requests are served by the primary (indicated in the graphs by the dotted line with `*` signs).
As you can see, Weak Replication offers the highest throughput because any client can talk to any replica.
So, this is a good illustration of how throughput can be improved simply by throwing more resources at the problem.
However, it must also be understood that Weak Replication cannot offer the same strong consistency guarantees as either Primary Backup or Chain Replication.
Weak Replication therefore is only valuable in situations where access to the data is *"read mostly"*, and you're not overly concerned if different replicas occasionally give different answers to the same read request.
Comparing the Chain and P/B Replication curves, notice that if none of the requests are updates, then their performance is identical.
The same is true when the update percentage starts to exceed about 40%.
However, look at the Chain Replication curve.
Instead of descending in a gradually flattening curve, there is a hump at around the 10-15% mark.
This is where the benefits of Chain Replication can be seen.
But why should this improvement be seen at this particular ratio of writes to reads?
The answer here lies in understanding how the workload is distributed between the head and tail processes in Chain Replication.
The experiments reported by Renesse and Schneider showed that the best throughput occurs when 10-15% of the requests are writes, presumably because the workload is then distributed evenly between the head and tail processes.
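A minimal sketch of how requests are routed in Chain Replication helps to show why, assuming in-memory stores and ignoring `ack` propagation and failures (the names are illustrative only):

```python
class ChainNode:
    def __init__(self, name, successor=None):
        self.name = name
        self.successor = successor
        self.store = {}

    def write(self, key, value):
        self.store[key] = value                 # apply the update locally...
        if self.successor:
            self.successor.write(key, value)    # ...then pass it down the chain

    def read(self, key):
        return self.store.get(key)              # reads are answered from local state


# head -> middle -> tail
tail = ChainNode("tail")
middle = ChainNode("middle", successor=tail)
head = ChainNode("head", successor=middle)

head.write("x", 42)      # clients send writes to the head, and they flow down the chain
print(tail.read("x"))    # clients send reads to the tail -> 42
```

With very few writes the tail does almost all the work, and with very many writes the head and the propagation down the chain dominate; at around the 10-15% mark the two ends share the load.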
It turns out that in practice, this ratio of writes to reads is quite representative of many distributed systems that are *"out there in the wild"*.
## Dealing with Failure
If the primary process in a P/B Replication system fails, who is responsible for informing the clients that one of the backups has now taken on the role of primary?
In Chain Replication, coordination is slightly more involved.
So, in both situations, it is necessary to have some sort of internal coordinating process whose job it is to know who all the replicas are, and what role they are playing at any given time.
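A minimal sketch of what such a coordinating process might track, assuming it is told about failures by some external detector and can message every client (all of the names here are hypothetical):

```python
class Coordinator:
    def __init__(self, primary, backups, clients):
        self.primary = primary         # which replica currently plays the primary role
        self.backups = list(backups)   # which replicas are backups
        self.clients = list(clients)

    def on_node_failure(self, node):
        if node == self.primary:
            # Promote a surviving backup and tell every client about the change
            self.primary = self.backups.pop(0)
            for client in self.clients:
                client.notify_new_primary(self.primary)   # hypothetical client API
        elif node in self.backups:
            # A failed backup is simply dropped from the replica set
            self.backups.remove(node)
```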
> ***Assumptions***
>
> * Not all the processes in our system will crash.
> For a system containing `n` processes, we are relying on the fact that no more than `n-1` processes will ever crash (Ha ha!!)
> * The coordinator process is able to detect when a process crashes.
>
> However, we have not discussed how such assumptions could possibly be true because the term *"crash"* can mean a wide variety of things: perhaps software execution has terminated, or execution continues but the process simply stops responding to messages, or responds very slowly...
>
> Failure detection is a deep topic in itself that we cannot venture into at the moment; suffice it to say that in an asynchronous distributed system, perfect failure detection is impossible.
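>
> In practice, a coordinator typically falls back on a timeout-based check such as the sketch below, where `ping(node, timeout)` is a hypothetical helper that returns `True` only if `node` replies within `timeout` seconds:
>
> ```python
> def suspect_crashed(ping, node, timeout=1.0):
>     replied = ping(node, timeout=timeout)
>     # No reply in time means we *suspect* a crash, but the node may simply be
>     # slow, overloaded, or temporarily unreachable -- we cannot tell which
>     return not replied
> ```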
#### Coordinator Role in Chain Replication
The coordinator must perform a similar set of tasks if failure occurs in a Chain Replication system.
If we assume that the head process fails, then the coordinator must keep the system running by:
* Nominating the head's successor to act as the new head
* Informing all clients to direct their write requests to the new head
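Under similar assumptions to the earlier coordinator sketch (the coordinator holds the chain as an ordered list of nodes and can message every client), the head-failure case might look like this:

```python
def handle_head_failure(chain, clients):
    chain.pop(0)                  # drop the failed head from the chain
    new_head = chain[0]           # its successor takes over as the new head
    for client in clients:
        client.redirect_writes_to(new_head)   # hypothetical client API
    return new_head
```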
So, what steps can we take to be more tolerant of coordinator failure…
* Simply spin up some replicas of the coordinator?
And should we do this in just one data centre, or across multiple data centres?
* But then how do you keep the coordinators coordinated?
* Do you have a coordinator coordinator process?
If so, who coordinates the coordinator coordinator process?
This quickly leads either to an infinite regress of coordinators, or another [Monty Python sketch](./img/very_silly.png)... (Spam! spam! spam! spam!)
This question then leads us very nicely into the next topic of ***Consensus***, but we won't start that now.
It is amusing to notice that in Renesse and Schneider's paper, one of the first things they state is *"We assume the coordinator doesn't fail!"* which they then admit is an unrealistic assumption.
They then go on to describe how in their tests, they had a set of coordinator processes that were able to behave as a single process by running a consensus protocol between them.
It is sobering to realise that if we wish to implement both strong consistency between replicas ***and*** fault tolerance (which was the problem we wanted to avoid in the first place), then ultimately, we are forced to rely upon some form of consensus protocol.
But consensus is both ***hard*** and ***expensive*** to implement.
This difficulty might then become a factor in deciding ***not*** to implement strong consistency.
Now it looks very appealing to say *"If we can get away with a weaker form of consistency such as Causal Consistency, then shouldn't we look at this option?"*
That said, there are times when consensus really is vitally important.