
Commit 87ba91b
wip: blog
1 parent 87a90bf commit 87ba91b

File tree: 21 files changed, +1613 -5 lines

package.json (+1)

```diff
@@ -21,6 +21,7 @@
     "eslint-config-prettier": "^9.1.0",
     "eslint-plugin-svelte": "^2.36.0",
     "globals": "^15.0.0",
+    "mdsvex": "^0.12.3",
     "prettier": "^3.3.2",
     "prettier-plugin-svelte": "^3.2.6",
     "svelte": "^5.0.0",
```

src/lib/Header.svelte (+1 -1)

```diff
@@ -33,7 +33,7 @@
 	</button>
 	<nav class="main-menu" class:isOpen>
 		<ul class="main-menu-list">
-			<li><a class="main-menu-item" href="https://nais.io/blog/">Artikler</a></li>
+			<li><a class:isActive={isActive("blogg")} href="/blog">Blogg</a></li>
 			<li><a class="main-menu-item" href="https://docs.nais.io">Dokumentasjon</a></li>
 			<!-- <li><a class="main-menu-item" href="/annonseringer" class:isActive={isActive('annonseringer')}>Annonseringer</a></li> -->
 			<!-- <li>
```

src/routes/blog/+page.server.ts (+23)

```ts
interface Post {
	route: string;
	content: string;
	metadata: {
		title: string;
		date: string;
		description: string;
		tags: string[];
		author: string;
	};
}

export async function load() {
	const postFiles = import.meta.glob<boolean, string, Post>("./posts/*/+page.md");
	let posts = await Promise.all(
		Object.entries(postFiles).map(async ([path, post]) => {
			const { metadata } = await post();
			const route = "/blog/" + path.split("/").slice(1, 3).join("/");
			return { metadata, route } as Post;
		}),
	);
	posts = posts.sort((a, b) => new Date(b.metadata.date).getTime() - new Date(a.metadata.date).getTime());
	return { posts };
}
```
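
For context, the loader assumes each post at `./posts/<slug>/+page.md` exposes its frontmatter via mdsvex as the `metadata` export, matching the `Post["metadata"]` shape. For example, the frontmatter of one of the posts added in this very commit:

```yaml
---
title: "Changing Service Mesh"
description: "How we swapped Istio with Linkerd with hardly any downtime"
date: 2021-05-04T20:37:13+02:00
author: Frode Sundby
tags: [istio, linkerd, LoadBalancing]
---
```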

src/routes/blog/+page.svelte (+11)

```svelte
<script lang="ts">
	import type { PageData } from "./$types";

	let { data }: { data: PageData } = $props();
</script>

{#each data.posts as post}
	<a href={post.route}><h1>{post.metadata.title}</h1></a>
	<p>{post.metadata.description}</p>
{/each}
```
New file (+143):

---
title: "Changing Service Mesh"
description: "How we swapped Istio with Linkerd with hardly any downtime"
date: 2021-05-04T20:37:13+02:00
draft: false
author: Frode Sundby
tags: [istio, linkerd, LoadBalancing]
---

## Why change?
With an ambition of making our environments as secure as possible, we jumped on the service-mesh bandwagon in 2018 with Istio 0.7 and have stuck with it since.

Istio is a large, feature-rich system that brings capabilities aplenty.
Although there is a plethora of nifty and useful things we could do with Istio, we've primarily used it for mTLS and authorization policies.

One might think that having lots of features available but not using them couldn't possibly be a problem.
However, all these extra capabilities come at a cost - namely complexity - and we've felt encumbered by this complexity every time we configured, maintained or troubleshot anything in our clusters.
Our suspicion was that since we hardly used any of the capabilities, we could probably make do with a much simpler alternative.
So, after yet another _"Oh... This problem was caused by Istio!"_-moment, we decided the time was ripe to consider the alternatives out there.

We looked to the grand ol' Internet for alternatives and fixed our gaze on the rising star Linkerd 2.
Having homed in on our preferred candidate, we took it for a quick spin in a cluster and found our suspicions to be accurate.

Rarely has a meme depicted a feeling more strongly:
<!--![service-mesh-experiance](/blog/images/service-mesh-experience.jpg)-->

Even though we'd invested a lot of time and built quite a bit of Istio into our platform, we knew we had to make the change.

## How did we do it?
### Original architecture
Let's first have a quick look at what we were dealing with:

The first thing an end user encountered was our Google LoadBalancer, configured by an IstioOperator.
The traffic was then forwarded to the Istio Ingressgateway, which in turn sent it along via an mTLS connection to the application.
Before the Ingressgateway could reach the application, both NetworkPolicies and AuthorizationPolicies had to allow the traffic.
We used an [operator](https://github.com/nais/naiserator) to configure these policies when an application was deployed.

<!-- ![changing-service-mesh](/blog/images/changing-service-mesh-1.png) -->
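
For illustration, a minimal sketch of the two policy kinds our operator managed per application - the names, selectors and labels here are placeholders, not our actual manifests:

```yaml
# Illustrative only - not the actual manifests our operator generated.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: <app-name>
  namespace: <app-namespace>
spec:
  selector:
    matchLabels:
      app: <app-name>
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account"]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: <app-name>
  namespace: <app-namespace>
spec:
  podSelector:
    matchLabels:
      app: <app-name>
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: istio-system
```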
### New LoadBalancers and ingress controllers
42+
Since our LoadBalancers were configured by (and sent traffic to) Istio, we had to change the way we configured them.
43+
Separating LoadBalancing from mesh is a healthy separation of concern that will give us greater flexibility in the future as well.
44+
We also had to swap out Istio Ingressgateway with an Ingress Controller - we opted for NGINX.
45+
46+
We started by creating IP-addresses and Cloud Armor security policies for our new LoadBalancers with [Terraform](https://www.terraform.io/).
47+
48+
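
A sketch of what those Terraform resources could look like, assuming the `google` provider - the names and the permissive default rule are illustrative, not our actual configuration:

```hcl
# Illustrative sketch - names and rules are placeholders.
resource "google_compute_global_address" "ingress" {
  name = "ingress-ip"
}

resource "google_compute_security_policy" "ingress" {
  name = "ingress-security-policy"

  rule {
    action      = "allow"
    priority    = 2147483647
    description = "default rule"
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["*"]
      }
    }
  }
}
```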

The LoadBalancers themselves were created by an Ingress object:

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    networking.gke.io/v1beta1.FrontendConfig: <tls-config>
    kubernetes.io/ingress.global-static-ip-name: <global-ip-name>
    kubernetes.io/ingress.allow-http: "false"
  name: <loadbalancer-name>
  namespace: <ingress-controller-namespace>
spec:
  backend:
    serviceName: <ingress-controller-service>
    servicePort: 443
  tls:
    - secretName: <kubernetes-secret-with-certificates>
```

We tied the Cloud Armor security policy to the LoadBalancer with a `BackendConfig` on the Ingress Controller's service:

```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/app-protocols: '{"https": "HTTP2"}'
    cloud.google.com/backend-config: '{"default": "<backendconfig-name>"}'
    cloud.google.com/neg: '{"ingress": true}'
  ...
---
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: <backendconfig-name>
spec:
  securityPolicy:
    name: <security-policy-name>
  ...
```

Alrighty. We'd now gotten ourselves a brand new set of independently configured LoadBalancers and a shiny new Ingress Controller.
<!-- ![changing-service-mesh](/blog/images/changing-service-mesh-2.png) -->

However - if we'd started shipping traffic to the new components at this stage, things would have started breaking, as there were no Ingresses in the cluster - only VirtualServices.
To avoid downtime, we created an interim ingress that forwarded all traffic to the Istio Ingressgateway:

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
spec:
  rules:
    - host: '<domain-name>'
      http:
        paths:
          - backend:
              serviceName: <istio-ingressgateway-service>
              servicePort: 443
            path: /
  ...
```

<!-- ![changing-service-mesh](/blog/images/changing-service-mesh-3.png) -->
With this ingress in place, we could reach all the existing VirtualServices exposed by the Istio Ingressgateway via the new LoadBalancers and NGINX.
And we could point our DNS records to the new rig without anyone noticing a thing.

### Migrating workloads from Istio to Linkerd
Once LoadBalancing and ingress traffic were closed chapters, we turned our attention to migrating workloads from Istio to Linkerd.
When moving a workload to a new service-mesh, there's a bit more to it than swapping out the sidecar by annotating the namespace.
Our primary concerns were:
- The new sidecar would require NetworkPolicies allowing traffic to and from Linkerd.
- The application's VirtualService would have to be transformed into an Ingress.
- [scuttle](https://github.com/redboxllc/scuttle) - which some applications used to wait for the Istio sidecar to be ready - had to be disabled.
- We couldn't possibly migrate all workloads simultaneously due to scale.
- Applications have to communicate, but they can't when they're in different service-meshes.

Since applications have a tendency to communicate with each other, and communication between different service-meshes was a bit of a bother, we decided to migrate workloads based on who they were communicating with, to avoid causing trouble.
Using the NetworkPolicies to map out who was communicating with whom, we found a suitable order in which to migrate workloads.
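
The grouping step above can be sketched as a connected-components pass over the NetworkPolicy peer lists - applications that (transitively) talk to each other end up in the same migration batch. This is a hypothetical illustration; `PolicyPeers` and `migrationGroups` are invented names, not part of our tooling:

```typescript
// Hypothetical sketch: group applications into connected components based on
// their NetworkPolicy peers, so each group can be migrated together.
type PolicyPeers = { app: string; peers: string[] };

function migrationGroups(policies: PolicyPeers[]): string[][] {
  // Build an undirected adjacency map from the policy peer lists.
  const adj = new Map<string, Set<string>>();
  const ensure = (a: string): Set<string> => {
    if (!adj.has(a)) adj.set(a, new Set());
    return adj.get(a)!;
  };
  for (const { app, peers } of policies) {
    ensure(app);
    for (const p of peers) {
      ensure(app).add(p);
      ensure(p).add(app);
    }
  }
  // Collect connected components with a depth-first search.
  const seen = new Set<string>();
  const groups: string[][] = [];
  for (const start of adj.keys()) {
    if (seen.has(start)) continue;
    const group: string[] = [];
    const stack = [start];
    while (stack.length > 0) {
      const app = stack.pop()!;
      if (seen.has(app)) continue;
      seen.add(app);
      group.push(app);
      stack.push(...adj.get(app)!);
    }
    groups.push(group.sort());
  }
  return groups;
}
```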

We then gave this list to our [operator](https://github.com/nais/naiserator), which in turn removed Istio resources, updated NetworkPolicies, created Ingresses and restarted workloads.
Slowly but surely our Linkerd dashboard started to populate, and the only downtime was the few seconds it took for the first Linkerd pod to become ready.
One thing we didn't take into consideration (but should have) was that some applications shared a hostname.
When an ingress was created for a shared hostname, NGINX would stop forwarding requests for that host to the Istio Ingressgateway, leaving non-migrated applications without traffic.
Realizing this, we started migrating applications on the same hostname simultaneously as well.

<!-- ![changing-service-mesh](/blog/images/changing-service-mesh-4.png) -->
And within a couple of hours, all workloads were migrated and we had ourselves a brand spanking new service-mesh in production.
And then they all lived happily ever after...

Except that we had to clean up Istio's mess.

### Cleaning up
What was left after the party was a fully operational Istio control plane, a whole bunch of Istio CRDs and a completely unused set of LoadBalancers. In addition, we had to clean up everything related to Istio in a whole lot of pipelines and components.

<!-- ![changing-service-mesh](/blog/images/changing-service-mesh-5.png)
![changing-service-mesh](/blog/images/changing-service-mesh-6.png) -->

It has to be said - there is a certain satisfaction in cleaning up after a party that has been going on for too long.
New file (+93):

---
title: "Enable data teams to deliver high-quality data products"
description: "How data governance adds value in a data mesh"
date: 2022-05-11T09:27:57+02:00
draft: false
author: Louis Dieffenthaler
tags: []
---

Using data to develop the world's best welfare system is a bold mission we at the Norwegian Welfare and Labour Administration (NAV) have.
Being able to estimate the impact of measures on youth unemployment to avoid social exclusion.
Gaining more insight into how we can adjust our communication to different user groups so that we are more certain that citizens get their benefits.
Seeing how a new feature affects the click-through rate.
Data plays a central role in solving each of these cases.
So how do we get there?

NAV has in recent years undergone a radical change in its approach to developing products.
The traditional project-oriented approach of specifying “everything” up front has been replaced by a model where product teams are given the freedom to solve problems within boundaries.
Enabling the teams to explore problems without being tied to a specified output allows us to gain knowledge about the value, the feasibility and other aspects of the product.
Our ambition is to enable the teams to treat data in the same manner: as a product.

## The case for data governance
The shift from building data pipelines answering specific questions to sharing data as products that can solve a variety of problems not yet known is massive.
To make this leap, we look to the [data mesh approach](https://martinfowler.com/articles/data-mesh-principles.html) described by Zhamak Dehghani.
The principle of shifting ownership upstream to domain teams, supported by a platform that reduces the cognitive burden, resonates with how we do it on the operational side.
We firmly believe that the product teams are best positioned to produce data that meets the consumers' needs related to, for instance, business definitions and quality.
On the other hand, we acknowledge that there are justifications for collaborating with other teams on reaching decisions.
A few of those are:
- The need for coordination between teams, for instance related to a common standard for communicating data quality
- The need to make sure the teams' internal decisions do not conflict with the “greater good” of NAV. Policies could for instance be applied to a data product with multiple critical dependencies downstream
- The need to define policies related to compliance. GDPR is the prime example

## Governance should not only control, but enable
The word “governance” does not make people jump out of bed in the morning.
Traditionally, governance has focused more on making sure people don't do stuff they should not do.
Furthermore, data management – the management of business logic, access controls, etc. – has typically been the responsibility of a centralized team.
This might work in a static environment.

The cases presented in the introduction are not solved in a static environment.
Data scientists, decision-makers and product teams all need to find, understand and get access to data quickly in order to explore solutions.
At the same time, the usage of data needs to adhere to the organization's policies.
In other words, data governance should enable teams to a) produce data matching the analytical needs of NAV and b) stay compliant.

## Principles and representation
The data mesh paradigm describes a federated model of data governance.
The actors on the platform need to collaborate on developing policies that provide value and are feasible.
The data producers are supported by the data platform in performing the data management.
At NAV, we have approached this by setting up a “ground rules forum”.
The principles underlying the group's work are:
1. The responsibility of producing data as a product should be placed as close to the origin as possible
2. Policies that move responsibility away from the data producers should be justified by value, either in the form of analytical use or compliance
3. The policies should be supported by the platform when this is feasible

The forum consists of team members with different perspectives:
- Consumer: The value from an analytical perspective
- Producer: The value from a producer's perspective, in addition to the cost of implementing policies
- Platform: The knowledge of how the policies can be supported by the platform
- Legal: The value in terms of compliance
- Business owner: The overall business value

In a complex organization with close to 100 product teams, it is important to scope down the complexity.
Luckily for us, NAV is currently replacing legacy systems.
We use people from teams affected by and involved in this effort to set up the forum.
This makes it easier to reach decisions in the forum.
The plan is to calibrate the organization and processes before scaling out to other parts of the organization.
At the same time, we are also seeking input from other parts of the organization when forming policies.
## Governance affects everyone – and should include everyone!
Broad involvement of the people sharing and using data is crucial for at least two reasons: it increases both the chance of forming better policies and the legitimacy of those policies.
The process we have set up from the start is deliberately simple:
1. People on the platform use Slack to discuss problems governance policies might solve
2. The forum discusses suggested policies when enough input has piled up. The suggested policies are documented using an architectural decision record
3. People on the platform share their opinions on the suggested policies
4. Decisions are reached
5. The policy is implemented, preferably supported by the platform

Instead of engineering a complex process upfront, we start out with this and will calibrate it to fit our needs.

## How technology supports process
We are currently working on a feature that illustrates how the implementation of a policy can be supported by technology.
To protect privacy, we have a policy stating that the consumer of data needs to document the legal basis for using the data.
On our marketplace – where producers share data that consumers can find and understand – we are now building a feature where consumers can fill out a form to request access.
This feature is integrated with the databases where the legal basis is documented, easing the process for the consumer.
The request is sent to the data product owner's site on the marketplace, where they can grant or reject access.
The policy was already there, but by adding the platform to the mix, we:
- Eased the process for the consumer and the producer, as the requests are now part of their workflow – not a Jira ticket completely on the side of everything
- Eased the evaluation of assessments done by the consumers, since we are logging it
- Allowed for product thinking about this process. We can now set up metrics to see how long it takes for requests to be processed, the share of requests being rejected, etc. In turn, this can be used as input into the next iteration of the platform. This also covers the non-technical sides of the process: how can we change the policies to reduce friction?
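
The metrics mentioned above can be sketched from the logged requests. This is a hypothetical illustration in TypeScript - `AccessRequest` and `requestMetrics` are invented names, not the marketplace's actual API:

```typescript
// Hypothetical model of a logged access request; names are illustrative.
type AccessRequest = {
  dataProduct: string;
  requester: string;
  legalBasis: string; // reference to the documented legal basis
  submittedAt: Date;
  decision?: { decidedAt: Date; granted: boolean };
};

// Metrics the text mentions: average processing time and share of rejections.
function requestMetrics(requests: AccessRequest[]) {
  const decided = requests.filter((r) => r.decision !== undefined);
  const avgHours =
    decided.reduce(
      (sum, r) =>
        sum + (r.decision!.decidedAt.getTime() - r.submittedAt.getTime()) / 3_600_000,
      0,
    ) / decided.length;
  const rejectedShare =
    decided.filter((r) => !r.decision!.granted).length / decided.length;
  return { avgHours, rejectedShare };
}
```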

## Summing up
Governance is key to unlocking the value of data in a secure way.
To achieve this, we need to support teams instead of controlling them.
The approach we describe here is intentionally lightweight, and we will share our learning points as we progress.
