High Availability

Overview

Many organizations have requirements to run services in a high-availability configuration. This means that if a single instance of a service fails, another instance will transparently take over to provide service and clients will not experience any interruption of service. Singular services that are not highly available are known as a “single point of failure”.

A single Sundeck Private Broker instance can handle thousands of concurrent user sessions and queries, so we recommend starting with a single standalone instance of the Private Broker.

In a production setup, you may choose to run more than one instance of the Private Broker. Multiple instances can be deployed behind an off-the-shelf HTTP load balancer. This deployment model prevents any single Private Broker instance from being a single point of failure.

Load Balancer Requirements

The Sundeck Private Broker maintains some state in memory as clients execute queries. As such, the Private Broker requires a load balancer that provides the following:

  1. Sticky-sessions: if a client executes a query via one Private Broker instance, the load balancer should ensure that all future queries from that client are executed on the same Private Broker instance.
  2. Client-IP stickiness: the load balancer must implement sticky-sessions using the client’s IP address.

There are many load balancers, freely or commercially available, which satisfy these requirements.
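
The two requirements above can be illustrated with a minimal Python sketch of source-IP stickiness. This is not how any particular load balancer is implemented; the `pick_broker` function and the broker hostnames are hypothetical names chosen for illustration. The point it shows is that the routing decision depends only on the client’s IP address, so repeated queries from one client land on the same instance.

```python
import hashlib
from typing import Sequence

# Hypothetical broker hostnames, used only for illustration.
BROKERS = ["broker1.svc.internal", "broker2.svc.internal"]


def pick_broker(client_ip: str, brokers: Sequence[str]) -> str:
    """Map a client IP address to one broker deterministically."""
    # A stable hash (unlike Python's built-in hash()) gives the same
    # result across processes and restarts.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(brokers)
    return brokers[index]


if __name__ == "__main__":
    # The same IP always maps to the same broker.
    for ip in ["10.0.0.5", "10.0.0.5", "10.0.0.9"]:
        print(ip, "->", pick_broker(ip, BROKERS))
```

Because the client’s IP address is the only input, every client that appears to share one address (for example, behind a NAT) would be sent to the same broker.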

HTTP cookie-based sticky-sessions are not supported by some Snowflake clients. As a result, sticky-sessions that rely on HTTP cookies are ineffective, and each query is routed to an arbitrary Private Broker instance.

TLS Termination

Most load balancer configurations terminate the TLS connection and forward plain HTTP requests to the Private Brokers. This is not a requirement of the load balancer, but it is a common deployment configuration. If the load balancer instead forwards requests to the Private Brokers over TLS, each Private Broker instance must be configured with its own TLS certificate and corresponding hostname, and the load balancer is responsible for negotiating the TLS connection between itself and the Private Broker.

Client IP Translation

Be aware of any other load balancers and network address translation (NAT) between your clients and the load balancer in front of the Private Broker. If the client’s IP address is not preserved, all clients will appear to come from the same IP address, which results in all clients being routed to the same Private Broker instance.

Before deploying production workloads to Private Broker, ensure that client requests are evenly spread across all Private Broker instances. This can be done by inspecting the CPU usage of each Private Broker instance and confirming that CPU load is generally uniform.
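
As a rough illustration of that check, the following Python sketch compares per-instance CPU readings against their average. It assumes you have already collected a CPU percentage for each instance from your own monitoring system; the `cpu_load_is_uniform` function and the 25% tolerance are hypothetical choices for this example, not values recommended by Sundeck.

```python
from statistics import mean
from typing import Dict


def cpu_load_is_uniform(cpu_percent_by_broker: Dict[str, float],
                        tolerance: float = 0.25) -> bool:
    """Return True if every reading is within `tolerance` of the mean.

    `tolerance` is a fraction of the mean (0.25 means +/- 25%).
    """
    loads = list(cpu_percent_by_broker.values())
    if not loads:
        raise ValueError("no CPU readings supplied")
    average = mean(loads)
    if average == 0:
        return True  # every instance is idle, so nothing is skewed
    return all(abs(load - average) / average <= tolerance for load in loads)


if __name__ == "__main__":
    readings = {"broker1": 40.0, "broker2": 44.0}  # made-up sample values
    print("uniform" if cpu_load_is_uniform(readings) else "skewed")
```

A strongly skewed result, such as one instance near full load while the others sit idle, suggests that client IP addresses are being collapsed somewhere upstream of the load balancer.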

Examples

Most HTTP load balancers support sticky-sessions based on the client’s IP address. Below are some specific examples of load balancers that are known to work with the Sundeck Private Broker.

Red Hat OpenShift

A Route in Red Hat OpenShift exposes an application running on OpenShift to clients running outside of OpenShift. The Route’s load-balancing behavior is configured using annotations on the Route.

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  annotations:
    haproxy.router.openshift.io/balance: source
    
    # ...
spec:
  host: sundeck-private-broker.apps.company.com
  port:
    targetPort: 8080-tcp
  tls:
    insecureEdgeTerminationPolicy: Redirect
    termination: edge
  to:
    kind: Service
    name: broker
    weight: 100
  wildcardPolicy: None

In the above example, we rely on OpenShift’s ability to terminate the TLS connection and then route the request to one of the Private Broker Pods running in the broker Service. The haproxy.router.openshift.io/balance: source annotation tells OpenShift to use the client’s IP address for sticky-sessions. Snowflake clients can use the host sundeck-private-broker.apps.company.com to connect to the Private Broker.

HAProxy

HAProxy is a well-known open-source load balancer that supports sticky-sessions based on the client’s IP address.

The following is an example HAProxy configuration that listens on port 8080 and forwards requests to two Private Broker instances. The instances run on broker1.svc.internal and broker2.svc.internal, and each also listens on port 8080.

defaults
  mode http
  timeout connect 5s
  timeout client 30s
  timeout server 50s

frontend http-in
  bind *:8080
  default_backend sundeck-brokers

backend sundeck-brokers
  balance roundrobin

  # Store up to 1M IP addresses; entries expire after 1 hour
  stick-table type ip size 1m expire 1h
  # Add source IP to the stick table
  stick on src

  # GET /heartbeat as the health check
  option httpchk GET /heartbeat

  # Run health checks every 5 seconds
  server broker1 broker1.svc.internal:8080 check inter 5s
  server broker2 broker2.svc.internal:8080 check inter 5s

Private Broker Fail-over

In the event that a Private Broker instance stops running, most load balancers will automatically detect this and stop routing new requests to that instance. When this happens, most load balancers will automatically re-compute the mapping from client to Private Broker. This means that a client may be directed to a new Private Broker even if the Private Broker instance they were previously using is still running.
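
The re-mapping effect described above can be sketched in Python. The example assumes a simple hash-modulo mapping from client IP address to instance, which is only one possible strategy; real load balancers may use other schemes that move fewer clients. The names `pick_broker` and `moved_clients` and the broker names are hypothetical.

```python
import hashlib
from typing import List, Sequence


def pick_broker(client_ip: str, brokers: Sequence[str]) -> str:
    """Map a client IP to a broker using a stable hash modulo the pool size."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return brokers[int.from_bytes(digest[:8], "big") % len(brokers)]


def moved_clients(client_ips: Sequence[str],
                  before: Sequence[str],
                  after: Sequence[str]) -> List[str]:
    """Return the clients whose broker differs between the two pools."""
    return [ip for ip in client_ips
            if pick_broker(ip, before) != pick_broker(ip, after)]


if __name__ == "__main__":
    before = ["broker1", "broker2", "broker3"]
    after = ["broker1", "broker2"]  # broker3 has stopped running
    clients = [f"10.0.0.{n}" for n in range(1, 201)]

    moved = moved_clients(clients, before, after)
    # Clients that moved even though their original broker is still up.
    survivors_moved = [ip for ip in moved
                       if pick_broker(ip, before) != "broker3"]
    print(f"{len(moved)} of {len(clients)} clients changed broker")
    print(f"{len(survivors_moved)} of those were on a broker that is still running")
```

Under this assumed mapping, changing the pool size changes the modulus, so some clients move even though the instance they were using never went down.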

The state that each Private Broker maintains for each Snowflake client is known as the “Sundeck Session”. The Private Broker transparently re-generates the Sundeck Session when it is missing; however, there is a small latency penalty (~100 ms) when a client is transitioned to a different Private Broker instance. As such, it is not recommended to frequently add or remove Private Broker instances.

By default, the Sundeck Session is retained for one hour in the Private Broker. If a client does not execute any queries for one hour, the Private Broker automatically re-creates the Sundeck Session on the client’s next query and then executes that query. This is sufficient for most customers. Please contact Sundeck Support if you need to re-configure this timeout.
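
The retention behavior can be pictured with a small Python sketch of an idle-timeout cache. This is not the Private Broker’s actual implementation; the `SessionCache` class, its methods, and the session contents are hypothetical, and the sketch assumes the one-hour timeout is measured from the client’s most recent query.

```python
import time
from typing import Callable, Dict, Tuple

SESSION_TTL_SECONDS = 3600  # one hour, the default stated above


class SessionCache:
    """Keeps one session per client and re-creates it after an idle period."""

    def __init__(self, ttl_seconds: float = SESSION_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # client_id -> (session, time of the client's last query)
        self._sessions: Dict[str, Tuple[dict, float]] = {}
        self.regenerations = 0

    def get_session(self, client_id: str) -> dict:
        """Return the client's session, re-creating it if missing or expired."""
        now = self._clock()
        entry = self._sessions.get(client_id)
        if entry is None or now - entry[1] >= self._ttl:
            session = self._create_session(client_id)
        else:
            session = entry[0]
        # Every query refreshes the idle timer.
        self._sessions[client_id] = (session, now)
        return session

    def _create_session(self, client_id: str) -> dict:
        # Placeholder for the real (and slower) session set-up work.
        self.regenerations += 1
        return {"client": client_id, "generation": self.regenerations}
```

In this sketch, a client that was moved to a different instance looks the same as one whose session expired: the lookup misses and the session is re-created before the query runs.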