Commit 04ef8b4

AusNOG 2025 notes
1 parent 4f1cc74 commit 04ef8b4

File tree

1 file changed

+287
-0
lines changed

@@ -0,0 +1,287 @@
1+
+++
2+
title = "AUSNOG 2025 Notes"
3+
date = "2025-05-17"
4+
description = "Just some general notes from attending AUSNOG 2025"
5+
tags = [
6+
"ausnog",
7+
"network automation",
8+
"python",
9+
]
10+
showComments = "true"
11+
robots = "all"
12+
draft = "true"
13+
+++
14+
15+
## Nexthop
16+
17+
- Strategies to ping everything in/out of our network from generic monitoring and routers themselves
18+
- Arista allows you to run containers within their routers/switches as it's just an AlmaLinux base
19+
- Can also just run scripts on the switch itself
20+
- Running a Python daemon that connects to Kafka
21+
- Arista comes with ProcMgr inbuilt to monitor your custom script/process
22+
23+
## Reannz
24+
25+
Network State Checking
26+
27+
- perfSONAR
28+
- config creation, then validation during migration using an inbuilt tool
29+
30+
## AWS
31+
32+
No Packet Left Behind
33+
34+
- Network owned end to end
35+
- 96% automation due to scale (1 million network devices)
36+
- Single chip routers, fixed ports
37+
- No fabric, cellification, dual REs, state sync, etc.
38+
- 32 x 400G 12.8 Tbps switch (can break out to 100G)
39+
- Deployed 1 rack at a time in a Clos fabric, pre-cabled
40+
- Auto remediation stages
41+
- Detect/Isolate
42+
- Identify Root Cause
43+
- Mitigate impact
44+
- Remediate underlying problem
45+
- Return to Service
46+
- Discard counters vary per vendor, and even per platform within the same vendor
47+
- No standards
48+
- draft-evans-discardclass
49+
- Shift / Drain / etc.
50+
- Take device out of service (AS path, IS-IS pref, etc.)
51+
- ECMP allows a single link to be taken out
52+
- Rollback no matter the change, can always roll forward again
53+
- Deployment
54+
- Change -> Validate -> Intended State -> Deployment System -> Applied state -> Observed State
55+
- Batfish for Correctness of Intent
56+
- Containerlab / NetLab / etc. for actual implementation testing
57+
- At scale
58+
- 2500 lines of instructions to an LLM agent
59+
- This is just tribal knowledge
60+
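One way to read the deployment flow above is that the observed state is eventually compared back against the intended state. A minimal sketch of that comparison, assuming state is modelled as a flat dictionary; the `state_drift` helper and the keys and values are hypothetical and not from the talk:

```python
def state_drift(intended: dict, observed: dict) -> dict:
    """Return {key: (intended, observed)} for every key whose values differ."""
    drift = {}
    for key in sorted(set(intended) | set(observed)):
        want = intended.get(key)
        have = observed.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical per-device state; names and values are made up for the example.
intended = {"port1": "up", "port2": "drained", "overload": "off"}
observed = {"port1": "up", "port2": "up", "overload": "off"}

print(state_drift(intended, observed))
```

An empty result means the applied change matches the intent; anything else is a candidate for rollback or remediation.
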
61+
## Nokia
62+
63+
- 8 x GPU collective operations (RCCL, NCCL, etc.)
64+
- Dataset with Weights / Gradients
65+
- All-reduce, all-gather for distributed parallel flows
66+
- All of these have a large impact on the network
67+
- 3 Phases (Sync, Compute, Communicate)
68+
- Periodic bursts of large traffic (elephant flow problem)
69+
- Rail network is used to build a GPU -> GPU topology (ultra ethernet)
70+
- Consider Intra and Inter GPU server comms
71+
- Inter GPU uses RoCEv2
72+
- Physical (Modular AI clusters)
73+
- SU (Scalable unit) aka Stripe
74+
- Line rate across the whole cluster
75+
- 32 x GPU Servers (other configs available)
76+
- 3 x Mgmt TOR
77+
- 2 spine 3 leaf
78+
- Logical Networks
79+
- Interconnect
80+
- In-band
81+
- Out-of-band
82+
- Backend GPU
83+
- Storage
84+
- Considerations
85+
- RDMA primary traffic
86+
- Lossless RoCEv2 -> UEC
87+
- Getting to lossless ethernet
88+
- ECN + ECN Bits
89+
- PFC (microbursts) 802.1Qbb
90+
- Pause frames
91+
- Storage
92+
- Checkpointing to compensate for GPU failures
93+
- Each SU has a dedicated storage fabric (clos)
94+
- NVMe-oF
95+
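To get a feel for why all-reduce has such a large impact on the network: in a ring all-reduce, each participant sends roughly `2 * (N - 1) / N` times the payload size per operation. A small sketch, assuming the ring algorithm (collective libraries may pick other algorithms such as trees); the function name and the example sizes are illustrative only:

```python
def ring_allreduce_bytes_per_gpu(payload_bytes: int, num_gpus: int) -> float:
    """Bytes each GPU transmits for one ring all-reduce of payload_bytes."""
    if num_gpus < 2:
        return 0.0  # nothing to exchange with a single participant
    # Reduce-scatter and all-gather each move (N - 1) / N of the payload.
    return 2 * (num_gpus - 1) / num_gpus * payload_bytes


# Example: 8 GB of gradients across 8 GPUs (made-up numbers).
print(ring_allreduce_bytes_per_gpu(8_000_000_000, 8))
```

Since this repeats every training step, it lines up with the periodic bursts of large flows noted above.
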
96+
## Nodal/ys
97+
98+
How geopolitics is rerouting submarine cables
99+
100+
- Hyperscalers beginning to own cables outright
101+
- US/China
102+
- PLCN, BtoBE, SEA-ME-WE 6, SJC2
103+
104+
## Cloudflare
105+
106+
Beyond the Firewall
107+
108+
- Sarah Armstrong-Smith books
109+
- Human hacking
110+
- Social Engineering
111+
- https://radar.cloudflare.com
112+
- No blame culture
113+
- Move beyond generic annual training, focus on the why and make it relevant
114+
- Know your business, communicate with employees
115+
- What are they doing and need to do (no workarounds/shadow IT)
116+
- Project Zero Trust - George Finney
117+
- Flipper
118+
119+
## Kentik
120+
121+
The Scourge of Excessive AS-Sets
122+
123+
- AS-SET in IRR
124+
- bgpq4 to build BGP filter lists from AS-SETs
125+
- Check the authoritative source (APNIC/RIPE/etc.) of the AS-SET in PeeringDB
126+
- When making your AS-SET
127+
- Remove recursive AS-SETs where possible
128+
- Otherwise keep to minimum
129+
- Long term solution
130+
- In-band BGP signaling (RFC 9234)
131+
- RPKI-based signaling using ASPA verification
132+
- Future RPKI extensions
133+
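The problem with recursive AS-SETs is easier to see with a toy expansion. A minimal sketch, assuming a hypothetical in-memory IRR where any member starting with `AS-` is another AS-SET; the set names and ASNs (documentation range) are made up, and this is not how bgpq4 is implemented:

```python
def expand_as_set(name, irr, _seen=None):
    """Recursively expand an AS-SET into the set of member ASNs."""
    seen = set() if _seen is None else _seen
    if name in seen:  # guard against AS-SETs that reference each other
        return set()
    seen.add(name)
    asns = set()
    for member in irr.get(name, []):
        if member.startswith("AS-"):
            asns |= expand_as_set(member, irr, seen)
        else:
            asns.add(member)
    return asns


# Hypothetical IRR contents; AS-UPSTREAM loops back to AS-EXAMPLE.
irr = {
    "AS-EXAMPLE": ["AS64496", "AS-CUSTOMER"],
    "AS-CUSTOMER": ["AS64497", "AS-UPSTREAM"],
    "AS-UPSTREAM": ["AS64498", "AS64499", "AS-EXAMPLE"],
}

print(sorted(expand_as_set("AS-EXAMPLE", irr)))
```

One nested set that includes an upstream is enough to pull in ASNs the owner never intended to announce, which is why keeping recursion to a minimum matters.
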
134+
## BGP Tools
135+
136+
How far can you go with IX route servers only?
137+
138+
- IX Route servers
139+
- Solves the problem of reaching out to individuals for peering at IXPs
140+
- Generally far safer than bilateral peering
141+
- Cumulative IX peers diminishing returns
142+
- map.bgp.tools
143+
- CGNAT /24s generally don't respond to pings but account for lots of traffic
144+
145+
## 5G Networks
146+
147+
BFD going down, from BGP timers expiring?
148+
149+
- JTAC Bug and process of case
150+
151+
## Telair
152+
153+
A smaller NBN rollout
154+
155+
- NBN NNI attached to EVPN PWE
156+
- Terminated on Juniper BNG
157+
158+
159+
# Day 2
160+
161+
## Nokia
162+
163+
Quantum Technology
164+
165+
- Qubits / Superposition / Entanglement
166+
- A 4-qubit word spans 2^4 = 16 states (all possible combinations)
167+
- HNDL Attacks post Q-Day (RSA-2048 cracked in 24h)
168+
- Harvest data now, decrypt later
169+
- Symmetric Crypto in use today, either PSK or asymmetric Key Sharing
170+
- Move to AES256 & SHA512 for safety
171+
- Public Key Cryptography
172+
- Larger keys, one way Algos (RSA, DH, etc.)
173+
- Algos rely on one-way functions over large primes; reversing this (e.g. factoring) is compute heavy
174+
- Shor's Algorithm
175+
- QKD may be used in future to share these keys
176+
- Otherwise PQC coming up with new Algorithm for key exchange
177+
- Mosca's theorem
178+
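Mosca's theorem is usually stated as an inequality: if the time data must stay secret (X) plus the time needed to migrate to quantum-safe crypto (Y) exceeds the time until a capable quantum computer exists (Z), the data is already exposed to harvest-now-decrypt-later. A small sketch with hypothetical function names and made-up year values:

```python
def basis_states(qubits: int) -> int:
    """Number of basis states an n-qubit register can hold in superposition."""
    return 2 ** qubits


def mosca_at_risk(shelf_life_years: float,
                  migration_years: float,
                  years_to_q_day: float) -> bool:
    """True when X + Y > Z, i.e. migration will finish too late."""
    return shelf_life_years + migration_years > years_to_q_day


print(basis_states(4))
print(mosca_at_risk(shelf_life_years=10, migration_years=5, years_to_q_day=12))
```

The inputs are estimates, so the result is best treated as a planning prompt rather than a prediction.
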
179+
## Leaptel
180+
181+
Mikrotiks doing CGNAT
182+
183+
- ~40Gb
184+
185+
## Juniper
186+
187+
Evolving Broadband Design in Australia
188+
189+
- Cloudified BGP network (Spine/Leaf for access + BNG for scaleout)
190+
- ISIS SRv4
191+
- EVPN-VPWS + ESI
192+
- BNG CUPS
193+
- DBNG-MP (Mgmt)
194+
- DBNG-CP (Ctrl)
195+
- DBNG-UP (User)
196+
- Allows for Local resiliency and Geo Redundancy designs
197+
- IPv6, QoS, Merchant, Automation
198+
199+
## Cisco
200+
201+
Transport Protocols Evolution
202+
203+
- MPLS TE Challenges
204+
- RSVP state is hard to scale (large headend / midpoint state)
205+
- Core device state k*n^2
206+
- SR/SRv6
207+
- State is in packet
208+
- Eliminates LDP/RSVP
209+
- No Tunnel interfaces
210+
- Multi-domain with PCE/BSID
211+
- SRv6 goes back to OSI model due to IPv6 headers
212+
- 80-90% of engineers used RSVP-TE for FRR
213+
- Achieved in SR with TI-LFA
214+
- Flex Algo (native steering of traffic onto the FA path)
215+
- Can use Metric / Constraints
216+
- On Demand Nexthop
217+
- HE automatically creates SR policy to BGP NH
218+
- Uses SR-PCE (when different HE and TE)
219+
- Can do per-flow
220+
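The k*n^2 figure for RSVP-TE state can be made concrete with a quick count. A minimal sketch, assuming a full mesh of unidirectional LSPs between n edge routers with k LSPs per pair; the exact count is k*n*(n-1), which grows as k*n^2. The function name and numbers are illustrative:

```python
def full_mesh_lsps(edge_routers: int, lsps_per_pair: int = 1) -> int:
    """Total unidirectional LSPs in a full mesh: k * n * (n - 1)."""
    return lsps_per_pair * edge_routers * (edge_routers - 1)


# Doubling the number of edge routers roughly quadruples the LSP count.
for n in (50, 100, 200):
    print(n, full_mesh_lsps(n, lsps_per_pair=2))
```

With SR the path is encoded in the packet, so midpoints do not hold this per-LSP state.
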
221+
## APNIC Labs
222+
223+
Evolution of TCP Transport Protocols
224+
225+
- Speed evolving, plateaus recently
226+
- Optical Transmission outpacing TCP speed
227+
- Reno increases conservatively, so it is not well suited to Gbit networks these days
228+
- Ramps up by one MSS per RTT, halves the window on loss
229+
- Would take 3 years on a Tbit link
230+
- Cubic used on modern Linux
231+
- Non-linear algorithm
232+
- Reacts quickly to capacity in network
233+
- Buffer bloat causes delay loops (queue never drains)
234+
- Small buffers also bad (can't utilize link)
235+
- Buffer sizing rule of thumb: BW * RTT / sqrt(N), where N is the number of concurrent flows
236+
- Turn on Pacing on servers
237+
- ECN (network should hit the point of just beginning to buffer)
238+
- ECN not in use
239+
- TCP BBR tries to accommodate this
240+
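The buffer sizing rule and Reno's slow recovery can both be put into numbers. A rough sketch, assuming an idealised Reno that grows by one MSS per RTT after halving its window, and that a full window equals the bandwidth-delay product; function names, the default MSS, and the example link are assumptions for illustration:

```python
import math


def buffer_bytes(bandwidth_bps: float, rtt_s: float, flows: int = 1) -> float:
    """Buffer size from BW * RTT / sqrt(N), converted from bits to bytes."""
    return bandwidth_bps * rtt_s / math.sqrt(flows) / 8


def reno_recovery_seconds(bandwidth_bps: float, rtt_s: float,
                          mss_bytes: int = 1460) -> float:
    """Time to regrow from half a window to a full one at 1 MSS per RTT."""
    window_segments = bandwidth_bps * rtt_s / (mss_bytes * 8)
    return (window_segments / 2) * rtt_s


# Example: 10 Gbit/s link, 100 ms RTT, 10,000 concurrent flows (made-up values).
print(buffer_bytes(10e9, 0.1, flows=10_000))
print(reno_recovery_seconds(10e9, 0.1))
```

Recovery time scales with bandwidth times RTT squared, which is why linear growth falls behind as link speeds increase.
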
241+
## Telstra
242+
243+
Breaking the light barrier: Optical spectrum sharing
244+
245+
- WDM expensive to deploy
246+
- Providers now offering spectrum with users owning their transponders
247+
- Services (Wavelength / Spectrum sharing / Dark Fibre)
248+
249+
## AARNet
250+
251+
AARNet Network Architecture
252+
253+
- Automation
254+
- Model Driven approach (services, lifecycle, etc.)
255+
- Cisco NSO implementation
256+
- Northbound RESTCONF API
257+
- Multivendor, flexible (orchestrator or device can be master)
258+
- Service defined as YANG
259+
- YANG model then used to build device config template
260+
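The model-to-template step can be illustrated without NSO itself. A minimal sketch using Python's `string.Template`, assuming a service instance that has already been validated against a YANG service model; the field names and the config syntax are hypothetical, and this is not the NSO template format or anything AARNet showed:

```python
from string import Template

# Hypothetical service instance, shaped like data from a YANG service model.
service = {"name": "cust-a-l2vpn", "interface": "eth1", "vlan": 100}

# Hypothetical device template; the CLI syntax is illustrative only.
DEVICE_TEMPLATE = Template(
    "interface $interface.$vlan\n"
    " description $name\n"
    " encapsulation dot1q $vlan\n"
)


def render(service_instance: dict) -> str:
    """Fill the device template from the service instance fields."""
    return DEVICE_TEMPLATE.substitute(service_instance)


print(render(service))
```

Keeping the service definition separate from the per-device template is what lets the same service be rendered for more than one vendor.
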
261+
## Vocus
262+
263+
Defending Telco Networks
264+
265+
- Essentials
266+
- Separate networks
267+
- No Generic credentials
268+
- Harden
269+
- MFA, complex passwords that are rotated
270+
- CLI audits
271+
- Jumphosts (MFA)
272+
- Zero Trust
273+
274+
## Arista
275+
276+
Comfortable Complexity of Overlays
277+
278+
- EVPN single service plane
279+
- Protocol reduction
280+
- Repeatable Model
281+
- Flexible multi-homing
282+
- Any Encapsulation
283+
- Converged teams
284+
- EVPN Gateway to stitch transport domains (i.e. dc to wan to campus)
285+
- RFC 9014
286+
- Type-5 used and readvertised using GW next-hop
287+
