src/routes/(pages)/blog/posts/changing-service-mesh/+page.md
+17-6
@@ -5,9 +5,11 @@ date: 2021-05-04T20:37:13+02:00
 draft: false
 author: Frode Sundby
 tags: [istio, linkerd, LoadBalancing]
+language: en
 ---
 
 ## Why change?
+
 With an ambition of making our environments as secure as possible, we jumped on the service-mesh bandwagon in 2018 with Istio 0.7 and have stuck with it since.
 
 Istio is a large and feature rich system that brings capabilities aplenty.
@@ -27,7 +29,9 @@ Rarely has a meme depicted a feeling more strongly
 Even though we'd invested a lot of time and built quite a bit of Istio into our platform, we knew we had to make the change.
 
 ## How did we do it?
-### Original architecture:
+
+### Original architecture:
+
 Let's first have a quick look at what we were dealing with:
 
 The first thing an end user encountered was our Google LoadBalancer configured by an IstioOperator.
@@ -38,20 +42,22 @@ We used an [operator](https://github.com/nais/naiserator) to configure these pol
 We tied the Cloud Armor security policy to the Loadbalancer with a `BackendConfig` on the Ingress Controller's service:
+
 ```yaml
 apiVersion: v1
 kind: Service
@@ -89,6 +96,7 @@ Alrighty. We'd now gotten ourselves a brand new set of independently configured
 
 However - if we'd started shipping traffic to the new components at this stage, things would start breaking as there were no ingresses in the cluster - only VirtualServices.
 To avoid downtime, we created an interim ingress that forwarded all traffic to the Istio IngressGateway:
 With this ingress in place, we could reach all the existing VirtualServices exposed by the Istio Ingressgateway via the new Loadbalancers and Nginx.
 And we could point our DNS records to the new rig without anyone noticing a thing.
 
 ### Migrating workloads from Istio to Linkerd
-Once LoadBalancing and ingress traffic were closed chapters, we turned our attention to migrating workloads from Istio to Linkerd.
+
+Once LoadBalancing and ingress traffic were closed chapters, we turned our attention to migrating workloads from Istio to Linkerd.
 When moving a workload to a new service-mesh, there's a bit more to it than swapping out the sidecar with a new namespace annotation.
 Our primary concerns were:
-- The new sidecar would require NetworkPolicies to allow traffic to and from Linkerd.
+
+- The new sidecar would require NetworkPolicies to allow traffic to and from Linkerd.
 - The application's VirtualService would have to be transformed into an Ingress.
 - In applications that used [scuttle](https://github.com/redboxllc/scuttle) to wait for the Istio sidecar to be ready, scuttle had to be disabled.
 - We couldn't possibly migrate all workloads simultaneously due to scale.
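
The first concern in the list above can be illustrated with a NetworkPolicy that lets a workload's sidecar talk to the Linkerd control plane. This is only a minimal sketch, not the policy the operator actually generated: the policy name, the `my-team` namespace, and the assumption that the control plane lives in a namespace called `linkerd` are all hypothetical. It also assumes a Kubernetes version recent enough to set the `kubernetes.io/metadata.name` label on namespaces automatically.

```yaml
# Hypothetical example: allow traffic between the pods in one namespace
# and the Linkerd control plane. All names here are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-linkerd        # assumed name
  namespace: my-team         # assumed application namespace
spec:
  podSelector: {}            # applies to every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # assumes the control plane runs in a namespace named "linkerd"
              kubernetes.io/metadata.name: linkerd
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: linkerd
```

A real policy would likely be narrower, for instance by restricting ports or selecting only meshed pods; the broad selectors here just keep the sketch short.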
@@ -122,7 +133,7 @@ Using the NetworkPolicies to map out who were communicating with whom, we found
 
 We then gave this list to our [operator](https://github.com/nais/naiserator), who in turn removed Istio resources, updated NetworkPolicies, created Ingresses and restarted workloads.
 Slowly but surely our Linkerd dashboard was starting to populate, and the only downtime was during the seconds it took for the first Linkerd pod to be ready.
-One thing we didn't take into consideration (but should have) was that some applications shared a hostname.
+One thing we didn't take into consideration (but should have) was that some applications shared a hostname.
 When an ingress was created for a shared hostname, Nginx would stop forwarding requests for these hosts to Istio Ingressgateway, resulting in non-migrated applications not getting any traffic.
 Realizing this, we started migrating applications on the same hostname simultaneously too.
 
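
To make the shared-hostname problem in the hunk above concrete, here is a hypothetical Ingress for one of two applications served from the same host. The host, paths, service name and port are invented for illustration, and the `networking.k8s.io/v1` schema is an assumption (the clusters at the time may well have used an older Ingress API version). As the post describes, once Nginx has a rule for the host, requests for that host no longer fall through to the interim ingress, so a second, not-yet-migrated application on `/app-b` would stop receiving traffic.

```yaml
# Hypothetical example: app-a has been migrated and gets its own Ingress.
# app-b shares the hostname but is still exposed only via an Istio VirtualService.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-a                      # assumed name
  namespace: my-team               # assumed namespace
spec:
  ingressClassName: nginx          # assumed ingress class
  rules:
    - host: shared.example.com     # assumed hostname shared by app-a and app-b
      http:
        paths:
          - path: /app-a
            pathType: Prefix
            backend:
              service:
                name: app-a
                port:
                  number: 80
          # No rule for /app-b exists yet, so requests for it are no longer
          # forwarded to the Istio IngressGateway by the interim ingress.
```

Migrating every application on a hostname in the same batch, as described above, avoids this window where one path is served and its siblings are not.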
@@ -133,8 +144,8 @@ And then they all lived happily ever after...
 Except that we had to clean up Istio's mess.
 
 ### Cleaning up
-What was left after the party was a fully operational Istio control plane, a whole bunch of Istio CRDs and a completely unused set of LoadBalancers. In addition, we had to clean up everything related to Istio in a whole lot of pipelines and components.
 
+What was left after the party was a fully operational Istio control plane, a whole bunch of Istio CRDs and a completely unused set of LoadBalancers. In addition, we had to clean up everything related to Istio in a whole lot of pipelines and components.
src/routes/(pages)/blog/posts/data_mesh_governance/+page.md
+33-20
@@ -5,7 +5,9 @@ date: 2022-05-11T09:27:57+02:00
 draft: false
 author: Louis Dieffenthaler
 tags: []
+language: en
 ---
+
 Using data to develop the world's best welfare system is the bold mission we have at the Norwegian Labour and Welfare Administration (NAV).
 Being able to estimate the impact of measures on youth unemployment to avoid social exclusion.
 Gaining more insight on how we can adjust our communication to different user groups so that we are more certain that the citizens get their benefits.
@@ -19,17 +21,20 @@ Enabling the teams to explore problems without being tied to a specified output
 Our ambition is to enable the teams to treat data in the same manner: as a product.
 
 ## The case for data governance
+
 The shift from building data pipelines answering specific questions to sharing data as products that can solve a variety of problems not yet known is massive.
 To make this leap, we look to the [data mesh approach](https://martinfowler.com/articles/data-mesh-principles.html) described by Zhamak Dehghani.
 The principle of shifting ownership upstream to domain teams supported by a platform reducing the cognitive burden resonates with how we do it on the operational side.
 We firmly believe that the product teams are best positioned to produce data that meets the consumers' needs, for instance regarding business definitions and quality.
 On the other side, we acknowledge that there are justifications for collaborating with other teams on reaching decisions.
 A few of those are:
-- The need for coordination between teams. For instance, related to a common standard for communicating data quality
-- The need to make sure the teams' internal decisions do not conflict with the “greater good” of NAV. Policies could for instance be applied to a data product with multiple critical dependencies downstream
-- The need to define policies related to compliance. GDPR is the prime example
+
+- The need for coordination between teams. For instance, related to a common standard for communicating data quality
+- The need to make sure the teams' internal decisions do not conflict with the “greater good” of NAV. Policies could for instance be applied to a data product with multiple critical dependencies downstream
+- The need to define policies related to compliance. GDPR is the prime example
 
 ## Governance should not only control but enable.
+
 The word “governance” does not make people jump out of bed in the morning.
 Traditionally, governance has been focused more on making sure people don't do stuff they should not do.
 Furthermore, data management – the management of business logic, access controls, etc. – has typically been the responsibility of a centralized team.
@@ -41,21 +46,24 @@ At the same time, the usage of data needs to adhere to the organization's polici
 In other words, data governance should enable teams to a) produce data matching the analytical needs of NAV and b) stay compliant.
 
 ## Principles and representation
+
 The data mesh paradigm describes a federated model of data governance.
 The actors on the platform need to collaborate on developing policies that provide value and are feasible.
 The data producers are supported by the data platform in performing the data management.
 At NAV, we have approached this by setting up a “ground rules forum”.
 The principles underlying the group's work are:
-1. The responsibility of producing data as a product should be placed as close to the origin as possible
-2. Policies that move responsibility away from the data producers should be justified by value either in the form of analytical use or compliance
-3. The policies should be supported by the platform when this is feasible
+
+1. The responsibility of producing data as a product should be placed as close to the origin as possible
+2. Policies that move responsibility away from the data producers should be justified by value either in the form of analytical use or compliance
+3. The policies should be supported by the platform when this is feasible
 
 The forum consists of team members with different perspectives:
-- Consumer: The value from an analytical perspective
-- Producer: The value from a producer's perspective in addition to the cost of implementing policies
-- Platform: The knowledge of how the policies can be supported by the platform
-- Legal: The value in terms of compliance
-- Business owner: The overall business value
+
+- Consumer: The value from an analytical perspective
+- Producer: The value from a producer's perspective in addition to the cost of implementing policies
+- Platform: The knowledge of how the policies can be supported by the platform
+- Legal: The value in terms of compliance
+- Business owner: The overall business value
 
 In a complex organization with close to 100 product teams, it is important to scope down the complexity.
 Luckily for us, NAV is currently replacing legacy systems.
@@ -65,29 +73,34 @@ The plan is to calibrate the organization and processes before scaling out to ot
 At the same time, we are also seeking input from other parts of the organization when forming policies.
 
 ## Governance affects everyone – and should include everyone!
+
 Broad involvement of the people sharing and using data is crucial for at least two reasons:
 It increases both the chance of forming better policies and the legitimacy of these policies.
 The process we have set up from the start is deliberately simple:
-1. People on the platform use Slack to discuss problems governance policies might solve
-2. The forum discusses suggested policies when enough input has piled up. The suggested policies are documented using an architectural decision record
-3. People on the platform share their opinions on the suggested policies
-4. Decisions are reached
-5. The policy is implemented, preferably supported by the platform.
+
+1. People on the platform use Slack to discuss problems governance policies might solve
+2. The forum discusses suggested policies when enough input has piled up. The suggested policies are documented using an architectural decision record
+3. People on the platform share their opinions on the suggested policies
+4. Decisions are reached
+5. The policy is implemented, preferably supported by the platform.
 
 Instead of engineering a complex process upfront, we start out with this and will calibrate it to fit our needs.
 
 ## How technology supports process
+
 We are currently working on a feature that illustrates how the implementation of policy can be supported by technology.
 To protect privacy, we have a policy that states that the consumer of data needs to document the legal basis for using the data.
 On our marketplace – where the producers share data the consumers can find and understand – we are now working on a feature where the consumers can fill out a form to request access.
 This feature is integrated with the databases where the legal basis is documented, easing the process for the consumer.
 The request is sent to the data product owner's site on the marketplace, where they can grant/reject access.
 The policy was already there, but by adding the platform to the mix, we:
-- Eased the process for the consumer and the producer as the requests are now part of their workflow – not a Jira ticket completely on the side of everything
-- Eased the evaluation of assessments done by the consumers since we are logging them
-- Allowed for product thinking of this process. We can now set up metrics to see how long it takes for requests to be processed, the share of requests being rejected, etc. In turn, this can be used as input into the next iteration of the platform. This also covers the non-technical sides of the process: How can we change the policies to reduce friction?
+
+- Eased the process for the consumer and the producer as the requests are now part of their workflow – not a Jira ticket completely on the side of everything
+- Eased the evaluation of assessments done by the consumers since we are logging them
+- Allowed for product thinking of this process. We can now set up metrics to see how long it takes for requests to be processed, the share of requests being rejected, etc. In turn, this can be used as input into the next iteration of the platform. This also covers the non-technical sides of the process: How can we change the policies to reduce friction?
 
 ## Summing up
+
 Governance is key to unlocking the value of data in a secure way.
 To achieve this, we need to approach this by supporting teams instead of controlling them.
-The approach we describe here is intentionally lightweight, and we will share our learning points as we progress.
+The approach we describe here is intentionally lightweight, and we will share our learning points as we progress.