39 commits
896ff1f
Add SwiftV2 long-running pipeline with scheduled tests
Nov 22, 2025
873c05e
Update readme file.
Nov 24, 2025
3395415
fix syntax for pe test.
Nov 24, 2025
b34b332
Create NSG rules with unique priority.
Dec 2, 2025
e9f50e6
Update go.mod
sivakami-projects Dec 5, 2025
8364bf5
Update test/integration/swiftv2/longRunningCluster/datapath_create_te…
sivakami-projects Dec 5, 2025
1d2ed59
Update test/integration/swiftv2/longRunningCluster/datapath_delete_te…
sivakami-projects Dec 5, 2025
efbfb02
Update test/integration/swiftv2/longRunningCluster/datapath_connectiv…
sivakami-projects Dec 5, 2025
04a22a0
Update test/integration/swiftv2/longRunningCluster/datapath_delete_te…
sivakami-projects Dec 5, 2025
56fbeb2
Error handling for private endpoint tests.
Dec 8, 2025
4d29aec
Private endpoint tests.
Dec 8, 2025
a1baf08
update private endpoint test.
Dec 8, 2025
0945c2c
update pod.yaml
Dec 8, 2025
7df1c79
Check if mtpnc is cleaned up after pods are deleted.
Dec 8, 2025
b37b033
Update vnet names.
Dec 8, 2025
2672caa
add container readiness check.
Dec 8, 2025
9d27d43
update pod.yaml
Dec 8, 2025
95ff010
Update pod.yaml
Dec 8, 2025
c1bd2e6
Update connectivity test.
Dec 8, 2025
7bdf1b0
Update netcat curl test.
Dec 9, 2025
a27aa52
Enable delete pods.
Dec 9, 2025
4f32773
Remove test changes.
Dec 9, 2025
feb46e4
remove test changes for storage accounts.
Dec 9, 2025
3b9bc5c
update go.mod
Dec 11, 2025
de09b98
Make dockerfiles.
Dec 11, 2025
d3c4686
lint fixes
Dec 12, 2025
4adcb1a
Merge branch 'master' into sv2-long-running-pipeline-stage2
sivakami-projects Dec 12, 2025
e08fa01
update dockerfiles.
Dec 12, 2025
066ba2c
Lint fix.
Dec 12, 2025
6688685
reset package name.
Dec 12, 2025
cf7173b
fix package name.
Dec 12, 2025
e51230d
refactor: clean up long-running pipeline and update tests
Dec 13, 2025
5f53e7c
Assign and remove rbac role after every run.
Dec 15, 2025
a55deb9
Make dockerfiles
Dec 16, 2025
ff25707
fetch go.sum from master branch.
Dec 16, 2025
c5b3000
Check if rbac roles are cleaned up after delete.
Dec 16, 2025
c100abc
TCP netcat connectivity test improvements.
Dec 16, 2025
c379c32
lint fix.
Dec 16, 2025
7bc70dc
workload inputs are validated skip g204 check.
Dec 16, 2025
4 changes: 2 additions & 2 deletions .pipelines/build/dockerfiles/cns.Dockerfile
@@ -11,11 +11,11 @@ ENTRYPOINT ["azure-cns.exe"]
EXPOSE 10090

# mcr.microsoft.com/azurelinux/base/core:3.0
FROM --platform=linux/${ARCH} mcr.microsoft.com/azurelinux/base/core@sha256:3d53b96f4e336a197023bda703a056eaefecc6728e9a2b0c1ef42f7dce183338 AS build-helper
FROM --platform=linux/${ARCH} mcr.microsoft.com/azurelinux/base/core@sha256:72b38469d220910172d0f52dcb029b3cbbb226c63522103c955778ad7ace06a8 AS build-helper
RUN tdnf install -y iptables

# mcr.microsoft.com/azurelinux/distroless/minimal:3.0
FROM --platform=linux/${ARCH} mcr.microsoft.com/azurelinux/distroless/minimal@sha256:6b78aa535a2a5107ee308b767c0f1f5055a58d0e751f9d87543bc504da6d0ed3 AS linux
FROM --platform=linux/${ARCH} mcr.microsoft.com/azurelinux/distroless/minimal@sha256:c8917363fff9f80859b0e4140cf1d3677fb29cc8f48c0337e53dd58797e9f3ee AS linux
ARG ARTIFACT_DIR .

COPY --from=build-helper /usr/sbin/*tables* /usr/sbin/
237 changes: 237 additions & 0 deletions .pipelines/swiftv2-long-running/README.md
@@ -0,0 +1,237 @@
# SwiftV2 Long-Running Pipeline

This pipeline tests SwiftV2 pod networking in a persistent environment with scheduled test runs.

## Architecture Overview

**Infrastructure (Persistent)**:
- **2 AKS Clusters**: aks-1, aks-2 (4 nodes each: 2 low-NIC default pool, 2 high-NIC nplinux pool)
- **4 VNets**: cx_vnet_v1, cx_vnet_v2, cx_vnet_v3 (Customer 1; cx_vnet_v1 hosts the private endpoint to storage), cx_vnet_v4 (Customer 2)
- **VNet Peerings**: Customer 1 VNets (cx_vnet_v1, v2, v3) are peered with each other; cx_vnet_v4 is not peered (see the peering sketch below).
- **Storage Account**: With private endpoint from cx_vnet_v1
- **NSGs**: Restricting traffic between subnets (s1, s2) in vnet cx_vnet_v1.
- **Node Labels**: All nodes labeled with `workload-type` and `nic-capacity` for targeted test execution
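
Peerings are created pairwise, in both directions. A minimal sketch of one pair, assuming the VNet names above and an illustrative resource group (the real commands live in `create_peerings.sh`):

```bash
RG="sv2-long-run-centraluseuap"   # illustrative resource group name

# cx_vnet_v1 -> cx_vnet_v2
az network vnet peering create \
  --resource-group "$RG" --vnet-name cx_vnet_v1 \
  --name v1-to-v2 --remote-vnet cx_vnet_v2 \
  --allow-vnet-access

# cx_vnet_v2 -> cx_vnet_v1 (repeat both directions for every Customer 1 VNet pair)
az network vnet peering create \
  --resource-group "$RG" --vnet-name cx_vnet_v2 \
  --name v2-to-v1 --remote-vnet cx_vnet_v1 \
  --allow-vnet-access
```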


**Node Labeling for Multiple Workload Types**:
Each node pool gets labeled with its designated workload type during setup:
```bash
# During cluster creation or node pool addition, each pool is labeled with its
# designated workload type by selecting that pool's nodes and applying the label:
kubectl label nodes -l agentpool=<pool-name> workload-type=swiftv2-linux --overwrite
kubectl label nodes -l agentpool=<pool-name> workload-type=swiftv2-linuxbyon --overwrite
kubectl label nodes -l agentpool=<pool-name> workload-type=swiftv2-l1vhaccelnet --overwrite
kubectl label nodes -l agentpool=<pool-name> workload-type=swiftv2-l1vhib --overwrite
```

## How It Works

### Scheduled Test Flow
On every scheduled run, the pipeline:
1. Skips setup stages (infrastructure already exists)
2. **Job 1 - Create Resources**: Creates 8 test scenarios (PodNetwork, PNI, Pods with TCP netcat listeners on port 8080)
3. **Job 2 - Connectivity Tests**: Tests TCP connectivity between pods (9 test cases), then waits 20 minutes
4. **Job 3 - Private Endpoint Tests**: Tests private endpoint access and tenant isolation (5 test cases)
5. **Job 4 - Delete Resources**: Deletes all test resources (Phase 1: Pods; Phase 2: PNI/PN/Namespaces); cleanup can be spot-checked as sketched after this list
6. Reports results
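
After the delete job, leftover SwiftV2 resources can be spot-checked from either cluster. A rough sketch, assuming the usual short resource names for the SwiftV2 CRDs (`podnetwork`, `podnetworkinstance`, and `mtpnc` are assumptions here; verify against `kubectl api-resources`):

```bash
# PodNetworks and PodNetworkInstances created by Job 1 should be gone after Job 4.
kubectl get podnetwork
kubectl get podnetworkinstance --all-namespaces

# MTPNCs are created per pod and should disappear once the pods are deleted.
kubectl get mtpnc --all-namespaces

# Per-scenario namespaces follow the pn-<BUILD_ID>-<vnet>-<subnet> naming pattern.
kubectl get namespaces | grep '^pn-'
```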


## Test Case Details

### 8 Pod Scenarios (Created in Job 1)

All test scenarios create the following resources:
- **PodNetwork**: Defines the network configuration for a VNet/subnet combination
- **PodNetworkInstance**: Instance-level configuration with IP allocation
- **Pod**: Test pod running nicolaka/netshoot with a TCP netcat listener on port 8080 (a sketch of such a pod follows the scenario table)

| # | Scenario | Cluster | VNet | Subnet | Node Type | Pod Name | Purpose |
|---|----------|---------|------|--------|-----------|----------|---------|
| 1 | Customer2-AKS2-VnetV4-S1-LowNic | aks-2 | cx_vnet_v4 | s1 | low-nic | pod-c2-aks2-v4s1-low | Tenant B pod for isolation testing |
| 2 | Customer2-AKS2-VnetV4-S1-HighNic | aks-2 | cx_vnet_v4 | s1 | high-nic | pod-c2-aks2-v4s1-high | Tenant B pod on high-NIC node |
| 3 | Customer1-AKS1-VnetV1-S1-LowNic | aks-1 | cx_vnet_v1 | s1 | low-nic | pod-c1-aks1-v1s1-low | Tenant A pod in NSG-protected subnet |
| 4 | Customer1-AKS1-VnetV1-S2-LowNic | aks-1 | cx_vnet_v1 | s2 | low-nic | pod-c1-aks1-v1s2-low | Tenant A pod for NSG isolation test |
| 5 | Customer1-AKS1-VnetV1-S2-HighNic | aks-1 | cx_vnet_v1 | s2 | high-nic | pod-c1-aks1-v1s2-high | Tenant A pod on high-NIC node |
| 6 | Customer1-AKS1-VnetV2-S1-HighNic | aks-1 | cx_vnet_v2 | s1 | high-nic | pod-c1-aks1-v2s1-high | Tenant A pod in peered VNet |
| 7 | Customer1-AKS2-VnetV2-S1-LowNic | aks-2 | cx_vnet_v2 | s1 | low-nic | pod-c1-aks2-v2s1-low | Cross-cluster same VNet test |
| 8 | Customer1-AKS2-VnetV3-S1-HighNic | aks-2 | cx_vnet_v3 | s1 | high-nic | pod-c1-aks2-v3s1-high | Private endpoint access test |
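
For illustration, a stripped-down version of what one of these pods could look like. The SwiftV2 annotation keys and the exact netcat flags are assumptions (they vary by CNI version and nc variant); the pipeline's own pod.yaml is the source of truth:

```bash
# Hypothetical manifest for scenario 3 (pod-c1-aks1-v1s1-low); annotation keys are assumed.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pod-c1-aks1-v1s1-low
  namespace: pn-sv2-long-run-centraluseuap-v1-s1
  annotations:
    kubernetes.azure.com/pod-network: pn-sv2-long-run-centraluseuap-v1-s1
    kubernetes.azure.com/pod-network-instance: pni-sv2-long-run-centraluseuap-v1-s1
spec:
  nodeSelector:
    workload-type: swiftv2-linux
    nic-capacity: low-nic
  containers:
    - name: netshoot
      image: nicolaka/netshoot
      # TCP listener on port 8080 (flags depend on the nc variant in the image)
      command: ["/bin/sh", "-c", "nc -lk -p 8080"]
EOF
```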

### Connectivity Tests (9 Test Cases in Job 2)

Tests TCP connectivity between pods using netcat with a 3-second timeout:
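
Each check boils down to a netcat probe from the source pod toward the destination pod's IP on port 8080. A minimal sketch (namespace and address lookup are placeholders; the tests may use the SwiftV2 interface address rather than `status.podIP`):

```bash
NS="pn-sv2-long-run-centraluseuap-v1-s2"   # illustrative namespace

# Destination address (placeholder lookup).
DEST_IP=$(kubectl get pod pod-c1-aks1-v1s2-high -n "$NS" -o jsonpath='{.status.podIP}')

# Probe with a 3-second timeout: exit code 0 means TCP connected,
# a timeout means blocked (expected for the NSG and isolation cases).
kubectl exec -n "$NS" pod-c1-aks1-v1s2-low -- nc -vz -w 3 "$DEST_IP" 8080
```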

**Expected to SUCCEED (4 tests)**:

| Test | Source → Destination | Validation | Purpose |
|------|---------------------|------------|---------|
| SameVNetSameSubnet | pod-c1-aks1-v1s2-low → pod-c1-aks1-v1s2-high | TCP Connected | Basic same-subnet connectivity |
| DifferentVNetSameCustomer | pod-c1-aks1-v2s1-high → pod-c1-aks2-v2s1-low | TCP Connected | Cross-cluster, same VNet (v2) |
| PeeredVNets | pod-c1-aks1-v1s2-low → pod-c1-aks1-v2s1-high | TCP Connected | VNet peering (v1 ↔ v2) |
| PeeredVNets_v2tov3 | pod-c1-aks1-v2s1-high → pod-c1-aks2-v3s1-high | TCP Connected | VNet peering across clusters |

**Expected to FAIL (5 tests)**:

| Test | Source → Destination | Expected Error | Purpose |
|------|---------------------|----------------|---------|
| NSGBlocked_S1toS2 | pod-c1-aks1-v1s1-low → pod-c1-aks1-v1s2-high | Connection timeout | NSG blocks s1→s2 in cx_vnet_v1 |
| NSGBlocked_S2toS1 | pod-c1-aks1-v1s2-low → pod-c1-aks1-v1s1-low | Connection timeout | NSG blocks s2→s1 (bidirectional) |
| DifferentCustomers_V1toV4 | pod-c1-aks1-v1s2-low → pod-c2-aks2-v4s1-low | Connection timeout | Customer isolation (no peering) |
| DifferentCustomers_V2toV4 | pod-c1-aks1-v2s1-high → pod-c2-aks2-v4s1-high | Connection timeout | Customer isolation (no peering) |
| UnpeeredVNets_V3toV4 | pod-c1-aks2-v3s1-high → pod-c2-aks2-v4s1-low | Connection timeout | No peering between v3 and v4 |

**NSG Rules Configuration**:
- cx_vnet_v1 has NSG rules blocking traffic between the s1 and s2 subnets (see the sketch after this list):
- Deny outbound from s1 to s2 (priority 100)
- Deny inbound from s1 to s2 (priority 110)
- Deny outbound from s2 to s1 (priority 100)
- Deny inbound from s2 to s1 (priority 110)
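
A sketch of how one of these deny rules could be created. The NSG name and subnet CIDRs are illustrative; `create_nsg.sh` holds the real values, and each rule in a given NSG needs a unique priority:

```bash
RG="sv2-long-run-centraluseuap"   # illustrative
S1_PREFIX="10.1.1.0/24"           # illustrative subnet CIDRs
S2_PREFIX="10.1.2.0/24"

# Deny outbound s1 -> s2 on the NSG attached to subnet s1.
az network nsg rule create \
  --resource-group "$RG" --nsg-name nsg-cx-vnet-v1-s1 \
  --name deny-s1-to-s2-outbound --priority 100 \
  --direction Outbound --access Deny --protocol '*' \
  --source-address-prefixes "$S1_PREFIX" --source-port-ranges '*' \
  --destination-address-prefixes "$S2_PREFIX" --destination-port-ranges '*'
```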

### Private Endpoint Tests (5 Test Cases in Job 3)

Tests access to Azure Storage Account via Private Endpoint with public network access disabled:

**Expected to SUCCEED (4 tests)**:

| Test | Source → Storage | Validation | Purpose |
|------|-----------------|------------|---------|
| TenantA_VNetV1_S1_to_StorageA | pod-c1-aks1-v1s1-low → Storage-A | Blob download via SAS | Access via private endpoint from VNet V1 |
| TenantA_VNetV1_S2_to_StorageA | pod-c1-aks1-v1s2-low → Storage-A | Blob download via SAS | Access via private endpoint from VNet V1 |
| TenantA_VNetV2_to_StorageA | pod-c1-aks1-v2s1-high → Storage-A | Blob download via SAS | Access via peered VNet (V2 peered with V1) |
| TenantA_VNetV3_to_StorageA | pod-c1-aks2-v3s1-high → Storage-A | Blob download via SAS | Access via peered VNet from different cluster |

**Expected to FAIL (1 test)**:

| Test | Source → Storage | Expected Error | Purpose |
|------|-----------------|----------------|---------|
| TenantB_to_StorageA_Isolation | pod-c2-aks2-v4s1-low → Storage-A | Connection timeout/failed | Tenant isolation - no private endpoint access, public blocked |

**Private Endpoint Configuration**:
- Private endpoint created in cx_vnet_v1, subnet `pe` (see the sketch after this list)
- Private DNS zone `privatelink.blob.core.windows.net` linked to:
- cx_vnet_v1, cx_vnet_v2, cx_vnet_v3 (Tenant A VNets)
- aks-1 and aks-2 cluster VNets
- Storage Account 1 (Tenant A):
- Public network access: **Disabled**
- Shared key access: Disabled (Azure AD only)
- Blob public access: Disabled
- Storage Account 2 (Tenant B): Public access enabled (for future tests)
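
A rough sketch of the private endpoint and DNS wiring, with placeholder names (the real commands, including the DNS record/zone-group step, are in `create_pe.sh`):

```bash
RG="sv2-long-run-centraluseuap"          # illustrative
STORAGE_ACCOUNT="<tenant-a-storage>"     # placeholder
STORAGE_ID=$(az storage account show -g "$RG" -n "$STORAGE_ACCOUNT" --query id -o tsv)

# Private endpoint for the blob sub-resource, in the 'pe' subnet of cx_vnet_v1.
az network private-endpoint create -g "$RG" --name pe-storage-a \
  --vnet-name cx_vnet_v1 --subnet pe \
  --private-connection-resource-id "$STORAGE_ID" \
  --group-id blob --connection-name pe-storage-a-conn

# Private DNS zone plus one link per VNet that must resolve the private IP.
az network private-dns zone create -g "$RG" --name privatelink.blob.core.windows.net
az network private-dns link vnet create -g "$RG" \
  --zone-name privatelink.blob.core.windows.net \
  --name link-cx-vnet-v1 --virtual-network cx_vnet_v1 --registration-enabled false
```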

**Test Flow**:
1. DNS resolution: the storage FQDN resolves to the private endpoint IP for Tenant A pods; for Tenant B it resolves to the public IP (or fails), where access is blocked
2. Generate SAS token: Azure AD authentication via management plane
3. Download blob: Using curl with SAS token via data plane
4. Validation: Verify the blob content matches the expected value (a condensed sketch of these steps follows)
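
Condensed into commands, the per-pod check might look like the sketch below. Storage account, container, and blob names are placeholders, and the SAS is a user-delegation SAS generated on the pipeline agent, not inside the pod:

```bash
ACCOUNT="<tenant-a-storage>"   # placeholder
CONTAINER="testdata"           # placeholder
BLOB="hello.txt"               # placeholder
NS="pn-sv2-long-run-centraluseuap-v1-s1"
POD="pod-c1-aks1-v1s1-low"

# 1. DNS: should resolve to the private endpoint IP for Tenant A pods.
kubectl exec -n "$NS" "$POD" -- nslookup "$ACCOUNT.blob.core.windows.net"

# 2. SAS: user-delegation SAS via Azure AD (management plane, run from the agent).
EXPIRY=$(date -u -d '+1 hour' '+%Y-%m-%dT%H:%MZ')
SAS=$(az storage blob generate-sas --auth-mode login --as-user \
  --account-name "$ACCOUNT" -c "$CONTAINER" -n "$BLOB" \
  --permissions r --expiry "$EXPIRY" -o tsv)

# 3. Data plane: download the blob from inside the pod and compare with the expected content.
kubectl exec -n "$NS" "$POD" -- \
  curl -fsS "https://$ACCOUNT.blob.core.windows.net/$CONTAINER/$BLOB?$SAS"
```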

### Resource Creation Patterns

**Naming Convention**:
```
BUILD_ID = <resourceGroupName>

PodNetwork: pn-<BUILD_ID>-<vnet>-<subnet>
PodNetworkInstance: pni-<BUILD_ID>-<vnet>-<subnet>
Namespace: pn-<BUILD_ID>-<vnet>-<subnet>
Pod: pod-<scenario-suffix>
```

**Example** (for `resourceGroupName=sv2-long-run-centraluseuap`):
```
pn-sv2-long-run-centraluseuap-v1-s1
pni-sv2-long-run-centraluseuap-v1-s1
pn-sv2-long-run-centraluseuap-v1-s1 (namespace)
pod-c1-aks1-v1s1-low
```

**VNet Name Simplification** (a shell sketch of the mapping follows this list):
- `cx_vnet_v1` → `v1`
- `cx_vnet_v2` → `v2`
- `cx_vnet_v3` → `v3`
- `cx_vnet_v4` → `v4`
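
In shell terms, one way this shortening could be derived (an assumption about the implementation; only the resulting names matter):

```bash
VNET_NAME="cx_vnet_v1"
SHORT_NAME="${VNET_NAME##*_}"   # strip everything up to the last underscore -> "v1"
echo "$SHORT_NAME"
```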


## Node Pool Configuration

### Node Labels and Architecture

All nodes in the clusters carry two labels, one identifying the workload type and one the NIC capacity. These labels are applied during cluster creation by the `create_aks.sh` script.

**1. Workload Type Label** (`workload-type`):
- Purpose: Identifies which test scenario group the node belongs to
- Current value: `swiftv2-linux` (applied to all nodes in current setup)
- Applied during: Cluster creation in Stage 1 (AKSClusterAndNetworking)
- Applied by: `.pipelines/swiftv2-long-running/scripts/create_aks.sh`
- Future use: Supports multiple workload types running as separate stages (e.g., `swiftv2-windows`, `swiftv2-byonodeid`)
- Stage isolation: Each test stage uses `WORKLOAD_TYPE` environment variable to filter nodes

**2. NIC Capacity Label** (`nic-capacity`):
- Purpose: Identifies the NIC capacity tier of the node
- Applied during: Cluster creation in Stage 1 (AKSClusterAndNetworking)
- Applied by: `.pipelines/swiftv2-long-running/scripts/create_aks.sh`
- Values:
- `low-nic`: Default nodepool (nodepool1) with `Standard_D4s_v3` (1 NIC)
- `high-nic`: NPLinux nodepool (nplinux) with `Standard_D16s_v3` (7 NICs)

**Label Application in create_aks.sh**:
```bash
# Step 1: All nodes get workload-type label
kubectl label nodes --all workload-type=swiftv2-linux --overwrite

# Step 2: Default nodepool gets low-nic capacity label
kubectl label nodes -l agentpool=nodepool1 nic-capacity=low-nic --overwrite

# Step 3: NPLinux nodepool gets high-nic capacity label
kubectl label nodes -l agentpool=nplinux nic-capacity=high-nic --overwrite
```

### Node Selection in Tests

Tests use these labels to select appropriate nodes dynamically:
- **Function**: `GetNodesByNicCount()` in `test/integration/swiftv2/longRunningCluster/datapath.go`
- **Filtering**: Nodes filtered by BOTH `workload-type` AND `nic-capacity` labels
- **Environment Variable**: `WORKLOAD_TYPE` (set by each test stage) determines which nodes are used
- Current: `WORKLOAD_TYPE=swiftv2-linux` in ManagedNodeDataPathTests stage
- Future: Different values for each stage (e.g., `swiftv2-byonodeid`, `swiftv2-windows`)
- **Selection Logic**:
```bash
# Get low-nic nodes with matching workload type
kubectl get nodes -l "nic-capacity=low-nic,workload-type=$WORKLOAD_TYPE"

# Get high-nic nodes with matching workload type
kubectl get nodes -l "nic-capacity=high-nic,workload-type=$WORKLOAD_TYPE"
```
- **Pod Assignment**:
- Low-NIC nodes: Limited to 1 pod per node
- High-NIC nodes: Currently limited to 1 pod per node in test logic

**Node Pool Configuration**:

| Node Pool | VM SKU | NICs | Label | Pods per Node |
|-----------|--------|------|-------|---------------|
| nodepool1 (default) | `Standard_D4s_v3` | 1 | `nic-capacity=low-nic` | 1 |
| nplinux | `Standard_D16s_v3` | 7 | `nic-capacity=high-nic` | 1 (current test logic) |

**Note**: VM SKUs are hardcoded as constants in the pipeline template and cannot be changed by users.

## File Structure

```
.pipelines/swiftv2-long-running/
├── pipeline.yaml # Main pipeline with schedule
├── README.md # This file
├── template/
│ └── long-running-pipeline-template.yaml # Stage definitions (4 jobs)
└── scripts/
├── create_aks.sh # AKS cluster creation
├── create_vnets.sh # VNet and subnet creation
├── create_peerings.sh # VNet peering setup
├── create_storage.sh # Storage account creation
├── create_nsg.sh # Network security groups
└── create_pe.sh # Private endpoint setup

test/integration/swiftv2/longRunningCluster/
├── datapath_test.go # Original combined test (deprecated)
├── datapath_create_test.go # Create test scenarios (Job 1)
├── datapath_delete_test.go # Delete test scenarios (Job 4)
├── datapath.go # Resource orchestration
└── helpers/
└── az_helpers.go # Azure/kubectl helper functions
```
39 changes: 19 additions & 20 deletions .pipelines/swiftv2-long-running/pipeline.yaml
@@ -1,4 +1,14 @@
trigger: none
pr: none

# Schedule: Run every 3 hours
schedules:
- cron: "0 */3 * * *" # Every 3 hours at minute 0
displayName: "Run tests every 3 hours"
branches:
include:
- sv2-long-running-pipeline-stage2
always: true # Run even if there are no code changes

parameters:
- name: subscriptionId
@@ -11,32 +21,21 @@ parameters:
type: string
default: "centraluseuap"

- name: resourceGroupName
displayName: "Resource Group Name"
type: string
default: "long-run-$(Build.BuildId)"

- name: vmSkuDefault
displayName: "VM SKU for Default Node Pool"
type: string
default: "Standard_D2s_v3"

- name: vmSkuHighNIC
displayName: "VM SKU for High NIC Node Pool"
type: string
default: "Standard_D16s_v3"
- name: runSetupStages
displayName: "Create New Infrastructure Setup"
type: boolean
default: false

- name: serviceConnection
displayName: "Azure Service Connection"
# Setup-only parameters (only used when runSetupStages=true)
- name: resourceGroupName
displayName: "Resource Group Name used when Create new Infrastructure Setup is selected"
type: string
default: "Azure Container Networking - Standalone Test Service Connection"
default: "sv2-long-run-$(Build.BuildId)"

extends:
template: template/long-running-pipeline-template.yaml
parameters:
subscriptionId: ${{ parameters.subscriptionId }}
location: ${{ parameters.location }}
resourceGroupName: ${{ parameters.resourceGroupName }}
vmSkuDefault: ${{ parameters.vmSkuDefault }}
vmSkuHighNIC: ${{ parameters.vmSkuHighNIC }}
serviceConnection: ${{ parameters.serviceConnection }}
runSetupStages: ${{ parameters.runSetupStages }}