Orleans Clustering Fails with Multiple Instances in Azure App Service using a Linux Container #9293

Open
@KobusInversion

I have the following setup with Orleans to allow for clustering. Using Docker locally, I can spin up two instances with the code below, and both instances join the cluster successfully.

builder.Host.UseOrleans((context, siloBuilder) =>
{
    siloBuilder.Services.AddSerializer(serializerBuilder => { serializerBuilder.ConfigureSerializer(); });

    siloBuilder
        .ConfigureEndpoints(11111, 30000, listenOnAnyHostAddress: true)
        .UseAzureStorageClustering(opt =>
        {
            opt.TableName = "linuxOrleansSiloInstances";
            opt.TableServiceClient =
                new TableServiceClient(settings.TableStorageConfig.ConnectionString);
        })
        .Configure<ClusterOptions>(options =>
        {
            options.ClusterId = settings.ClusterConfiguration.ClusterId;
            options.ServiceId = settings.ClusterConfiguration.ServiceId;
        });
});

I want to host it in an Azure App Service within a Linux container. The first instance spins up correctly, but I start seeing the errors below when the second instance attempts to join the cluster.

Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to endpoint S169.254.129.15:11111:94653375. See InnerException
 ---> Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 169.254.129.15:11111. Error: ConnectionRefused
   at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 65

Here is my partial Dockerfile setup attempting to expose ports 11111 and 30000:

FROM mcr.microsoft.com/dotnet/aspnet:9.0 AS base
WORKDIR /app

EXPOSE 5000
EXPOSE 11111
EXPOSE 30000

# Build stage uses the .NET SDK for build
FROM mcr.microsoft.com/dotnet/sdk:9.0 AS build
WORKDIR /src

...

The only documentation I could find was here https://learn.microsoft.com/en-us/dotnet/orleans/deployment/deploy-to-azure-app-service#configure-host-networking, which mentions the following:

If deploying to Linux, ensure that your hosts are listening on all IP addresses as described in the Configure host networking section.
The article above is accurate for Windows-based environments, which I have set up successfully before, but the instructions do not seem to work when scaling out in a Docker Linux-based environment.
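
For reference, here is a minimal sketch of the approach that the linked section appears to describe: advertising the instance's private address and ports instead of fixed ones. It assumes App Service sets the WEBSITE_PRIVATE_IP and WEBSITE_PRIVATE_PORTS environment variables (which, as far as I understand, requires virtual network integration). Whether they are populated inside a Linux container is not confirmed here. ReadPrivateEndpoint is a hypothetical helper name, not part of Orleans.

```csharp
using System;
using System.Net;

// Hypothetical helper: reads the private address and the first two private
// ports from the environment. The variable names are an assumption taken
// from the linked article, not verified for Linux containers.
static (IPAddress Address, int SiloPort, int GatewayPort) ReadPrivateEndpoint()
{
    var ip = Environment.GetEnvironmentVariable("WEBSITE_PRIVATE_IP")
        ?? throw new InvalidOperationException("WEBSITE_PRIVATE_IP is not set.");

    var ports = (Environment.GetEnvironmentVariable("WEBSITE_PRIVATE_PORTS") ?? string.Empty)
        .Split(',', StringSplitOptions.RemoveEmptyEntries);
    if (ports.Length < 2)
    {
        throw new InvalidOperationException("Fewer than two private ports are configured.");
    }

    return (IPAddress.Parse(ip), int.Parse(ports[0]), int.Parse(ports[1]));
}

// Fragment: 'builder' is the WebApplicationBuilder from the setup above.
builder.Host.UseOrleans((context, siloBuilder) =>
{
    var (address, siloPort, gatewayPort) = ReadPrivateEndpoint();

    // Advertise the private address to other silos while still listening
    // on all host addresses inside the container.
    siloBuilder.ConfigureEndpoints(address, siloPort, gatewayPort, listenOnAnyHostAddress: true);

    // UseAzureStorageClustering and ClusterOptions stay as in the setup above.
});
```

With this shape, the address written to the membership table would be the private one rather than the 169.254.x.x address shown in the error above, assuming the variables are present.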

What configuration am I missing to enable instance communication and ensure proper silo coordination?
