ipsec: libreswan: Reduce chances for crossing streams.
Normally ovs-monitor-ipsec will start all the connections it manages.
This is required, because we do not generally know if the other side of
the tunnel is going to initiate the IPsec connection or not.
For example, the other side might not belong to an OVS setup, so it may
not be managed by the other instance of ovs-monitor-ipsec.  There are
also issues in Libreswan that may cause the other side to fail the
connection initiation in a way that it will not try again.

However, in many cases the other side is managed by ovs-monitor-ipsec.
And in that scenario there is a high chance that both sides will try
to initiate the connection at the same time.  This is known as
crossing streams.  Unfortunately, Libreswan, 4.x in particular, doesn't
handle this well and either crashes or ends up in a state where
connections are reported as active, but no traffic can actually go
through.

For tunnels, where we create separate incoming and outgoing connections
(geneve), we may start (add + up) the outgoing connection and only add
the incoming one.  This would give the other side some time to initiate,
avoiding the crossing streams and giving Libreswan a higher chance to
survive.

We still have to try to bring the incoming connections up at some point
if they do not become active.  Reconciliation logic will take care of
this.  Next time we check the active connections, we'll try to reconcile
and will bring all the loaded but not active connections up.  So, we're
losing at most 15 seconds if something goes wrong.
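
The reconciliation idea above can be sketched roughly as follows.  This
is only an illustration, not the code from ovs-monitor-ipsec: the
function name conns_to_bring_up and the connection names are
hypothetical, and the two sets stand in for whatever the daemon learns
from pluto about loaded and active connections.

```python
def conns_to_bring_up(loaded, active):
    """Return loaded connections that are not active, in a stable order."""
    return sorted(set(loaded) - set(active))


# Hypothetical state: the outgoing connection came up on its own, while
# the incoming one was only added and the peer never initiated it.
loaded = {"tun-in-1", "tun-out-1"}
active = {"tun-out-1"}

# These are the connections a later refresh would ask pluto to bring up.
pending = conns_to_bring_up(loaded, active)
```

In this sketch the incoming connection is the only one left pending, so
it would be brought up on the next check instead of staying down
indefinitely.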

This change greatly improves stability with Libreswan 4.x.  It's still
not enough to enable the ping test for it, but hopefully enough for
real world setups to not hit the Libreswan issues often.

GRE connections will still be started from both sides.  We do already
have some issues in case users name their tunnels with -in- or -out-
in the name, so it's not a new problem, but if the regex accidentally
matches on such a GRE tunnel, we'll again lose at most 15 seconds
before it is brought up during reconciliation.  So, it should not
be a big deal.
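
To make the accidental match concrete, here is a small sketch that runs
the pattern from this change against a few connection names.  All the
names are made up for illustration, and the helper
is_treated_as_incoming is hypothetical; "dc-in-1" assumes a GRE
connection whose user-chosen name happens to end in -in- followed by
digits.

```python
import re

# Pattern used by this change to recognize incoming connections.
INCOMING = re.compile(r".*-in-\d+$")


def is_treated_as_incoming(conn):
    """True if the connection would only be added, not started."""
    return INCOMING.match(conn) is not None


# Hypothetical connection names, mapped to how the pattern classifies them.
names = ["tun-in-1", "tun-out-1", "gre0-1", "dc-in-1"]
classified = {name: is_treated_as_incoming(name) for name in names}
```

Here "dc-in-1" is classified as incoming even though it is assumed to
be a single GRE connection, which is the case where up to 15 seconds
are lost before reconciliation brings it up.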

Note: ipsec auto in Libreswan < 5 accepts --asynchronous together with
--add, even though the --asynchronous flag is only for up/down/start,
but Libreswan 5 fails the command, so we need to add it conditionally.
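
A minimal sketch of the conditional flag handling, separated from the
daemon so the resulting argument lists are easy to inspect.  The helper
name build_auto_command and the binary, config and socket paths are
placeholders, not the values ovs-monitor-ipsec uses; only the flags
themselves come from this change.

```python
def build_auto_command(ipsec, conf, ctl, action, conn):
    """Build the 'ipsec auto' argument list for one connection."""
    # --asynchronous only applies to up/down/start, and Libreswan 5
    # rejects it together with --add, so leave it out for "add".
    asynchronous = [] if action == "add" else ["--asynchronous"]
    return [ipsec, "auto",
            "--config", conf,
            "--ctlsocket", ctl,
            "--" + action,
            *asynchronous, conn]


# Placeholder paths, for illustration only.
add_cmd = build_auto_command("ipsec", "/tmp/ipsec.conf", "/tmp/pluto.ctl",
                             "add", "tun-in-1")
start_cmd = build_auto_command("ipsec", "/tmp/ipsec.conf", "/tmp/pluto.ctl",
                               "start", "tun-out-1")
```

Unpacking an empty list keeps the call site identical for both actions,
so no separate code path is needed for "add".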

Acked-by: Eelco Chaudron <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
igsilya committed Nov 3, 2024
1 parent 12596c2 commit 7d7e001
Showing 1 changed file with 12 additions and 3 deletions.
15 changes: 12 additions & 3 deletions ipsec/ovs-monitor-ipsec.in
@@ -675,8 +675,16 @@ conn prevent_unencrypted_vxlan
                     active.discard(conn)

             for conn in desired:
-                vlog.info("Starting ipsec connection %s" % conn)
-                self._start_ipsec_connection(conn, "start")
+                # Start (add + up) outgoing connections and only add
+                # incoming ones. If the other side will not initiate
+                # the connection and it will not become active, we'll
+                # bring it up during the next refresh.
+                if re.match(r".*-in-\d+$", conn):
+                    vlog.info("Adding ipsec connection %s" % conn)
+                    self._start_ipsec_connection(conn, "add")
+                else:
+                    vlog.info("Starting ipsec connection %s" % conn)
+                    self._start_ipsec_connection(conn, "start")
         else:
             # Ask pluto to bring UP connections that are loaded,
             # but not active for some reason.
@@ -827,11 +835,12 @@ conn prevent_unencrypted_vxlan
                                        "--delete", conn], "delete %s" % conn)

     def _start_ipsec_connection(self, conn, action):
+        asynchronous = [] if action == "add" else ["--asynchronous"]
         ret, pout, perr = run_command([self.IPSEC, "auto",
                                        "--config", self.IPSEC_CONF,
                                        "--ctlsocket", self.IPSEC_CTL,
                                        "--" + action,
-                                       "--asynchronous", conn],
+                                       *asynchronous, conn],
                                       "%s %s" % (action, conn))

         if re.match(r".*[F|f]ailed to initiate connection.*", pout):
