
How to maintain Session Persistence (Sticky Session) in Docker Swarm


Introduction

Stateless services are in vogue, and rightfully so: they are easy to scale up and loosely coupled. However, it is practically impossible to stay away from stateful services completely. For example, a login application needs to maintain user session details across several pages.

Session state can be maintained either using

  • Session Replication
  • Session Stickiness

or a combination of both.

 

Maintaining a user session is relatively easy in a typical monolithic architecture, where your application is installed on a couple of servers and you can change the server configuration to facilitate session replication via some cache mechanism, or session stickiness via a load balancer/reverse proxy.

In the case of microservices, however, where the scale can range from 10 to 10,000s of instances, session replication can slow things down, since each and every service needs to look up session information in the centralised cache.

This article looks at the other approach, session stickiness, where each subsequent request keeps going to the same server (Docker container), thereby preserving the session.

Why session persistence is hard to maintain with containers

A load balancer typically works at Layer 7 of the OSI model, the application layer (where HTTP lives), and distributes requests across multiple machines. Docker's ingress routing mesh, however, works at Layer 4 of the OSI model, so it cannot inspect the HTTP traffic it is balancing.

Someone on Stack Overflow summarized the solution to this problem as follows: to implement sticky sessions, you would need to run a reverse proxy inside of Docker that supports sticky sessions and communicates directly with the containers by their container ID (rather than doing a DNS lookup on the service name, which would again go to the round-robin load balancer). Implementing that load balancer would also require you to implement your own service discovery tool so that it knows which containers are available.

Possible options explored

Take -1

So I tried implementing the reverse proxy with Nginx. It worked with multiple containers on a single machine, but when deployed on Docker Swarm it didn't work, probably because I was using service discovery by name; as suggested above, I should have been communicating via container IDs, not container names.

Take -2

I read about the jwilder/nginx-proxy image, which seems to work for everyone. It worked locally, but when deployed on Swarm it wouldn't generate any container IPs inside the upstream{} server block.

Take -3

Desperate by this time, I was going through all the possible solutions people had to offer on the internet (Stack Overflow, Docker community forums…), and one gentleman mentioned something about Traefik. My eyes glittered when I read that it works on Swarm, and here we go.

Sticky Session with Traefik in Docker Swarm with multiple containers

Even though I was very comfortable with Nginx and assumed that learning Traefik would be yet another overhead, that wasn't the case: Traefik is simple to learn and easy to understand, and the good thing is that you need not fiddle with any conf files.

The only constraint is that Traefik must run on a manager node.

I have tested this configuration with Docker Compose file version 3 (the latest at the time of writing), deployed using docker stack deploy.

To start off, create a docker-compose.yml (version 3) and add the Traefik image as the load balancer. This is how it looks:
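The embedded snippet is missing from this copy; a minimal sketch of the load balancer service, reconstructed from the options called out in the notes that follow (Traefik v1 flags), would look roughly like this:

```yaml
version: "3"
services:
  loadbalancer:
    image: traefik
    # Traefik v1 flags: watch the Docker daemon in swarm mode, enable the web dashboard
    command: --docker --docker.swarmmode --docker.watch --web --loglevel=DEBUG
    ports:
      - "80:80"      # entry point for application traffic
      - "9090:8080"  # Traefik dashboard
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock  # listen to the Docker daemon
    deploy:
      replicas: 1
      placement:
        constraints: [node.role == manager]  # Traefik must run on a manager node
    networks:
      - net
networks:
  net:
```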

A few things to note here:

  • Traefik listens to the Docker daemon on the manager node and stays aware of new worker nodes, so there is no need to restart it when you scale your services.
    volumes: - /var/run/docker.sock:/var/run/docker.sock
  • Traefik provides a dashboard to check the worker nodes' health, so port 9090 can be kept behind a firewall for monitoring purposes.
    Also note that placement: constraints: [node.role == manager] specifies that Traefik runs only on a manager node.

Adding the image for the sticky session

To add a Docker image that will hold session stickiness, we need to add something like this:

This is a hello-world image that displays the name of the container it's running on. The file defines 5 replicas of this container. The important section, where Traefik does the magic, is "labels":
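The embedded snippet is not shown in this copy; a sketch of the service, to be added under services: alongside the load balancer (the tutum/hello-world image is an assumption taken from the discussion in the comments), with the labels explained below:

```yaml
  whoami:
    image: tutum/hello-world
    networks:
      - net
    ports:
      - "80"
    deploy:
      replicas: 5
      labels:
        - "traefik.docker.network=test_net"
        - "traefik.port=80"
        - "traefik.frontend.rule=PathPrefix:/hello"
        - "traefik.backend.loadbalancer.sticky=true"
```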

  • - "traefik.docker.network=test_net" tells Traefik which network this service runs on. Note that the network name is test_net, where test is the stack name; in the load balancer service we only gave net as the name.
  • - "traefik.port=80": this hello-world app runs on container port 80, so we map the Traefik port to 80.
  • - "traefik.frontend.rule=PathPrefix:/hello": all URLs starting with {domainname}/hello/ will be redirected to this container/application.
  • - "traefik.backend.loadbalancer.sticky=true": the magic happens here; we are telling Traefik to make sessions sticky.

The Complete Picture

Try using the file below as is and see if it works; if it does, then fiddle with it and make your changes accordingly.
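The embedded file is missing from this copy; reconstructed from the pieces described above and the compose files discussed in the comments below, the full stack file would look roughly like this (the tutum/hello-world image and the worker-placement constraint are assumptions):

```yaml
version: "3"
services:
  whoami:
    image: tutum/hello-world
    networks:
      - net
    ports:
      - "80"
    deploy:
      mode: replicated
      replicas: 5
      placement:
        constraints: [node.role == worker]  # drop this if all your nodes are managers
      labels:
        - "traefik.docker.network=test_net"
        - "traefik.port=80"
        - "traefik.frontend.rule=PathPrefix:/hello"
        - "traefik.backend.loadbalancer.sticky=true"
  loadbalancer:
    image: traefik
    command: --docker --docker.swarmmode --docker.watch --web --loglevel=DEBUG
    ports:
      - "80:80"
      - "9090:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 1
      update_config:
        delay: 2s
      placement:
        constraints: [node.role == manager]
    networks:
      - net
networks:
  net:
```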

You will need to create a file called docker-compose.yml on your Docker manager node and run this command:
docker stack deploy -c docker-compose.yml test, where "test" is the namespace.

Read Here about deploying in Swarm: How to Install Stack of services in Docker Swarm

Now you can test the service at http://{Your-Domain-name}/hello, and http://{Your-Domain-name}:9090 should show the Traefik dashboard.

Though there are 5 replicas of the "whoami" service above, it should always display the same container ID. If it does: congratulations, your session persistence is working.

This is what the Traefik dashboard looks like:

Testing session stickiness on a local machine

In case you don't have a swarm node and just want to test on your local machine, you can use the following docker-compose file. For it to run successfully, create a directory called test (required for the namespace, as we have given our network name as test_net in "traefik.docker.network=test_net"; change the directory name if you have a different network) and run
docker-compose up -d
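The referenced local compose file is not embedded in this copy; a sketch of a non-swarm variant (assuming Traefik v1 and the tutum/hello-world image; the deploy section is dropped, --docker.swarmmode is removed, and the labels move to the service level so plain docker-compose picks them up) could look like this:

```yaml
version: "3"
services:
  whoami:
    image: tutum/hello-world
    networks:
      - net
    labels:
      - "traefik.docker.network=test_net"
      - "traefik.port=80"
      - "traefik.frontend.rule=PathPrefix:/hello"
      - "traefik.backend.loadbalancer.sticky=true"
  loadbalancer:
    image: traefik
    command: --docker --docker.watch --web --loglevel=DEBUG
    ports:
      - "80:80"
      - "9090:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
networks:
  net:
```

Run it from a directory named test so that Compose names the network test_net, matching the traefik.docker.network label.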

Docker Compose should create the required services, and the whoami service should be available at http://localhost/hello.

Then scale the service to 5 replicas with docker-compose scale whoami=5 and test again.

Follow this video to see things in action.


17 thoughts on "How to maintain Session Persistence (Sticky Session) in Docker Swarm"

  1. Hi! Great article! I’ve tried following your steps, but it doesn’t seem to work for me.
    I’m trying to run 3 replicas of GOGS (git repo) and maintain sessions

    I used the following:

    services:
      gogs:
        image: registry:5000/gogs:latest
        deploy:
          replicas: 3
          restart_policy:
            condition: any
          labels:
            - "traefik.docker.network=gogs_net"
            - "traefik.port=3000"
            - "traefik.frontend.rule=PathPrefix:/git;"
            - "traefik.backend.loadbalancer.sticky=true"
        volumes:
          - /nfs/apps/gogs:/gogs:rw
          - /nfs/apps/gogs/data/repos:/repos:rw
          - /nfs/apps/gogs/data/db:/db:rw
          - /nfs/apps/gogs/custom/conf:/data/gogs/conf:rw
        ports:
          - "10022:22"
          - "3000:3000"
        networks:
          - net
      loadbalancer:
        image: registry:5000/traefik
        command: --docker \
          --docker.swarmmode \
          --docker.watch \
          --web \
          --loglevel=DEBUG
        ports:
          - "80:3000"
          - "9090:8080"
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
        deploy:
          restart_policy:
            condition: any
          mode: replicated
          replicas: 1
          update_config:
            delay: 2s
          placement:
            constraints: [node.role == manager]
        networks:
          - net
    networks:
      net:

    But no luck…

    If I only run a stack with GOGS using the following docker-compose file, i’m able to access the GIT Repo:

    ################ START docker-compose.yml ################
    version: "3"
    services:
      gogs:
        image: registry:5000/gogs:latest # This specifies the image on our private registry.
        deploy:
          replicas: 3 # This generates three replicas.
        volumes: # We map various folders from the NFS share on the HOST to the CONTAINER.
          - /nfs/apps/gogs:/gogs:rw
          - /nfs/apps/gogs/data/repos:/repos:rw
          - /nfs/apps/gogs/data/db:/db:rw
          - /nfs/apps/gogs/custom/conf:/data/gogs/conf:rw
        ports: # We map HOST ports with CONTAINER ports.
          - "10022:22"
          - "80:3000"
    ################ END docker-compose.yml ################

    Any idea/suggestion?

    Thanks in advance!

    1. Can you please check a couple of things:

      • When deploying the stack, make sure you use this command:
        docker stack deploy -c docker-compose.yml gogs. Make sure the namespace is gogs.
      • I noticed that you are using networks for the gogs service; it is not needed. Please remove these lines from the gogs service:
        networks:
          - net
        (REMOVE ONLY FOR GOGS, NOT FOR THE LOADBALANCER)
        The network is already defined here: "traefik.docker.network=gogs_net"
      • What happens if you run the docker-compose.yml that was provided as an example?
      1. Hello Abhi,

        Thank you very much for the response!

        Responding to your questions:
        - I do use the namespace "gogs" when deploying the stack.

        - I've removed the "network" lines from the gogs service:

        I had it there because in your examples the "hello-world" service also includes it…

        ######## START docker-compose.yml ########
        version: "3"
        services:
          gogs:
            image: gogs/gogs
            deploy:
              replicas: 3
              restart_policy:
                condition: any
              labels:
                - "traefik.docker.network=gogs_net"
                - "traefik.port=3000"
                - "traefik.frontend.rule=PathPrefix:/git;"
                - "traefik.backend.loadbalancer.sticky=true"
            volumes:
              - /nfs/apps/gogs:/gogs:rw
              - /nfs/apps/gogs/data/repos:/repos:rw
              - /nfs/apps/gogs/data/db:/db:rw
              - /nfs/apps/gogs/custom/conf:/data/gogs/conf:rw
            ports:
              - "10022:22"
              - "3000:3000"
          loadbalancer:
            image: traefik
            command: --docker \
              --docker.swarmmode \
              --docker.watch \
              --web \
              --loglevel=DEBUG
            ports:
              - "80:3000"
              - "9090:8080"
            volumes:
              - /var/run/docker.sock:/var/run/docker.sock
            deploy:
              restart_policy:
                condition: any
              mode: replicated
              replicas: 1
              update_config:
                delay: 2s
              placement:
                constraints: [node.role == manager]
            networks:
              - net
        networks:
          net:
        ######## END docker-compose.yml ########

        Doing this, I’m able to get to the Traefik dashboard, which I wasn’t before. But trying to access gogs through “http://host:80” or “http://host:3000” still doesn’t work… Looks like I’m getting a “connection refused error”.

        – Running the docker-compose file provided as example I can get to the dashboard but it just half loads and doesn’t show the lower part (only header bar). And going to “http://host/” or “http://host/hello” I only get an 404 Error.

        Kind Regards,
        Kevin

        1. Hi Kevin,
          Apologies, I overlooked the -networks tag; yes, it should be there, so put it back.
          Let's first fix the example compose on your machine and then your gogs service.
          Is your tutum/hello-world image getting downloaded? Can you see any errors while running the stack?
          Can you check the following things for me:
          1> docker service ls (What's the output for this?)
          2> docker service ps test_whoami (Output for this; you need to run "docker stack deploy -c docker-compose.yml test")
          3> docker node ls (Just to check that you are on a swarm and manager node setup)
          4> Traefik dashboard, does it show any backends?

  2. Hello Abhi,

    I’m using a private registry and I think the image was not being downloaded, when deploying the stack. The containers were pending. I manually pulled the image from the private registry on each node and then they changed into “running” state. I also removed the following line from the “hello-world” service:

    placement:
    constraints: [node.role == worker]

    Because at this moment all my nodes are managers.

    This way now I got the following “docker-compose.yml” file which works. Traefik is accessible and shows frontend and backend, the whoami containers are running fine, and I can navigate to “http://host/hello” which displays the container ID and refreshing does show the same ID:

    version: "3"
    services:
      whoami:
        image: registry:5000/hello-world
        networks:
          - net
        ports:
          - "80"
        deploy:
          restart_policy:
            condition: any
          mode: replicated
          replicas: 3
          update_config:
            delay: 2s
          labels:
            - "traefik.docker.network=test_net"
            - "traefik.port=80"
            - "traefik.frontend.rule=PathPrefix:/hello;"
            - "traefik.backend.loadbalancer.sticky=true"
      loadbalancer:
        image: traefik
        command: --docker \
          --docker.swarmmode \
          --docker.watch \
          --web \
          --loglevel=DEBUG
        ports:
          - 80:80
          - 9090:8080
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
        deploy:
          restart_policy:
            condition: any
          mode: replicated
          replicas: 1
          update_config:
            delay: 2s
          placement:
            constraints: [node.role == manager]
        networks:
          - net
    networks:
      net:

    – running docker service ls

    ID NAME MODE REPLICAS IMAGE PORTS
    6cp3z55e9ws5 test_whoami replicated 3/3 registry:5000/hello-world:latest *:0->80/tcp
    e2vd98781itt test_loadbalancer replicated 1/1 traefik:latest *:80->80/tcp,*:9090->8080/tcp
    xrdivswie93l visualizer_visualizer replicated 1/1 registry:5000/visualizer:latest *:8080->8080/tcp

    – running docker node ls

    ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
    j4m25tolkaamj2jr263qoqu56 * e02dkr01 Ready Active Reachable
    v8dswzlyj4f499k12rkrivbcv mars Ready Active Reachable
    vtsk3j9jbmg7tdqnhoqnois89 e02dkr02 Ready Active Leader

    So the Whoami example seems to work….

      1. P.S.: as for the private registry, I'm using an internal self-hosted private registry, not a private registry on Docker Hub 🙂

  3. Hi Abhi,

    So the below is the docker-compose file I currently have.

    - Traefik is accessible and shows backends and frontends.

    - Hello world is accessible, but on every refresh the container ID changes… It was "sticky" before; why not now, if nothing has changed in this service?

    - GOGS is accessible on "http://host1/", "http://host2/" and "http://host3" (where each host is a node of the swarm). If I use "http://host1/" and log in, I can refresh as many times as I want; it won't log me out, and everything works fine. If I then go to "http://host2/" I have to re-login, same with host3. Once logged in, I'm not logged out and can switch between hosts (I suppose it stores the cookies for each of them). If I remove or add replicas, I have to re-login, no matter which host I use…

    So basically right now it doesn't seem to be sticky for some reason, but GOGS seems to work fine somehow…

    version: "3"
    services:
      whoami:
        image: registry:5000/hello-world
        networks:
          - net
        ports:
          - "80"
        deploy:
          restart_policy:
            condition: any
          mode: replicated
          replicas: 3
          update_config:
            delay: 2s
          labels:
            - "traefik.docker.network=test_net"
            - "traefik.port=80"
            - "traefik.frontend.rule=PathPrefix:/hello;"
            - "traefik.backend.loadbalancer.sticky=true"
      gogs:
        image: registry:5000/gogs
        deploy:
          replicas: 1
          restart_policy:
            condition: any
          labels:
            - "traefik.docker.network=test_net"
            - "traefik.port=3000"
            - "traefik.frontend.rule=PathPrefix:/;"
            - "traefik.backend.loadbalancer.sticky=true"
        volumes:
          - /nfs/apps/gogs:/gogs:rw
          - /nfs/apps/gogs/data/repos:/repos:rw
          - /nfs/apps/gogs/data/db:/db:rw
          - /nfs/apps/gogs/custom/conf:/data/gogs/conf:rw
        networks:
          - net
        ports:
          - "10022:22"
          - "3000:3000"
      loadbalancer:
        image: registry:5000/traefik
        command: --docker \
          --docker.swarmmode \
          --docker.watch \
          --web \
          --loglevel=DEBUG
        ports:
          - 80:80
          - 25581:3000
          - 9090:8080
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
        deploy:
          restart_policy:
            condition: any
          mode: replicated
          replicas: 1
          update_config:
            delay: 2s
          placement:
            constraints: [node.role == manager]
        networks:
          - net
    networks:
      net:

    1. I have faced a similar problem where I had 2 images that I wanted to be sticky, but it was giving issues.
      Can you add the following rules in the labels and try:
      - "traefik.frontend.priority=2" for whoami and

      - "traefik.frontend.priority=1" for gogs

      If it doesn't fix the problem, take the "whoami" service out; you don't need it anyway. I think there is some problem when you use "traefik.frontend.rule=PathPrefix:/" and "traefik.frontend.rule=PathPrefix:/whatever***" together with sticky sessions.

      If you remove the label "traefik.frontend.rule=PathPrefix:/" from the gogs service, I believe your whoami service will again be sticky.

      I had asked the Traefik support channel about this but no one replied; feel free to raise an issue.

      1. Hi Abhi,

        I’ve tried with the priority but nothing changed.

        I’ve also tried removing the whoami service, but am not able to get gogs running and accessible through http://host/git

        The app is configured to listen on port 3000, and it’s default root is http://host:3000/git

        I set it up in the following way below, but am not able to access GOGs at all…

        version: "3"

        services:

          gogs:
            image: registry:5000/gogs
            deploy:
              replicas: 3
              restart_policy:
                condition: any
              labels:
                - "traefik.docker.network=gogs_net"
                - "traefik.port=3000"
                - "traefik.frontend.priority=1"
                - "traefik.frontend.rule=PathPrefix:/;"
                - "traefik.backend.loadbalancer.sticky=true"
            volumes:
              - /nfs/apps/gogs:/gogs:rw
              - /nfs/apps/gogs/data/repos:/repos:rw
              - /nfs/apps/gogs/data/db:/db:rw
              - /nfs/apps/gogs/custom/conf:/data/gogs/conf:rw
            networks:
              - net
            ports:
              - "10022:22"
              - "3000"

          loadbalancer:
            image: registry:5000/traefik
            command: --docker \
              --docker.swarmmode \
              --docker.watch \
              --web \
              --loglevel=DEBUG
            ports:
              - "80:3000"
              - "9090:8080"
            volumes:
              - /var/run/docker.sock:/var/run/docker.sock
            deploy:
              restart_policy:
                condition: any
              mode: replicated
              replicas: 1
              update_config:
                delay: 2s
              placement:
                constraints: [node.role == manager]
            networks:
              - net

        networks:
          net:

        Any idea on what could I be missing?

        1. Is there any error when you run
          docker service ps?
          What about exposing port 3000 of gogs on the Docker host machine too, like 3000:3000? That way you can test via http://host:3000/app and see if the application itself is fine.

          1. Hello Abhi,

            No, there is no error whatsoever when running docker service ps. All services are running without issues supposedly.

            I've been doing some testing, and the docker-compose file below is what I have.

            - As you can see, I've removed "traefik.backend.loadbalancer.sticky=true" from the whoami service. If I leave this label on both services, I can log in to GOGS but every refresh logs me out and I have to re-login. Without this label in the whoami service, going to http://host1/hello shows a different container ID every time (as expected), but at least going to http://host1/ shows me GOGS, and I can log in without issues and refresh. If I go to http://host2/ or http://host3/ I have to re-login once, but it seems that for each hostname used, the sessions are preserved.

            - I don't know why, but if I remove/comment out the whoami service from the docker-compose file and deploy it, GOGS is not accessible. I don't understand how they are related. If I remove one, the other should work fine, no?

            ################# DOCKER-COMPOSE START ####################

            version: "3"
            services:
              whoami:
                image: registry:5000/hello-world
                networks:
                  - net
                ports:
                  - "80"
                deploy:
                  restart_policy:
                    condition: any
                  mode: replicated
                  replicas: 3
                  update_config:
                    delay: 2s
                  labels:
                    - "traefik.docker.network=gogs_net"
                    - "traefik.port=80"
                    - "traefik.frontend.priority=2"
                    - "traefik.frontend.rule=PathPrefix:/hello;"
              gogs:
                image: registry:5000/gogs
                deploy:
                  replicas: 3
                  restart_policy:
                    condition: any
                  labels:
                    - "traefik.docker.network=gogs_net"
                    - "traefik.port=3000"
                    - "traefik.frontend.priority=1"
                    - "traefik.frontend.rule=PathPrefix:/;"
                    - "traefik.backend.loadbalancer.sticky=true"
                volumes:
                  - /nfs/apps/gogs:/gogs:rw
                  - /nfs/apps/gogs/data/repos:/repos:rw
                  - /nfs/apps/gogs/data/db:/db:rw
                  - /nfs/apps/gogs/custom/conf:/data/gogs/conf:rw
                networks:
                  - net
                ports:
                  - "10022:22"
                  - "3000"
              loadbalancer:
                image: registry:5000/traefik
                command: --docker \
                  --docker.swarmmode \
                  --docker.watch \
                  --web \
                  --loglevel=DEBUG
                ports:
                  - 80:80
                  - 3000:3000
                  - 9090:8080
                volumes:
                  - /var/run/docker.sock:/var/run/docker.sock
                deploy:
                  restart_policy:
                    condition: any
                  mode: replicated
                  replicas: 1
                  update_config:
                    delay: 2s
                networks:
                  - net
            networks:
              net:

          2. Hi,
            Glad that your GOGS service is working fine with session persistence. The reason it won't work with the whoami service is probably that Traefik uses a cookie called "traefik_backend" to store the container IP. I think in the case of 2 services using "sticky=true", it overrides the cookie value with the new container.
            Maybe the Traefik team needs to consider this scenario, and perhaps write multiple cookies when sticky=true is set for multiple services.

          3. Yeah, maybe.

            But why does GOGS stop working if I remove the whoami service? That doesn't make any sense to me…

          4. True, it doesn't make any sense. Did you try removing the priority labels from gogs and taking down the whoami service?

          5. That worked! Removing the priority labels and the whoami service, GOGS works fine!

            So it seems like there is some issue with Traefik and multiple services + priority labels…

  4. Hi Abhi, I am trying to implement load balancing while maintaining session stickiness for Docker containers deployed in a swarm. I am unable to reach specific containers when I browse to http://myip/containername; in my case it is 10.244.102.243/TestManager.

    Here's my docker-compose file:
    version: "3"
    services:
      test_manager:
        image: 10.244.102.10:5000/testmanager
        networks:
          - net
        deploy:
          mode: replicated
          replicas: 3
          restart_policy:
            condition: on-failure
            delay: 5s
            max_attempts: 3
            window: 120s
          labels:
            - "traefik.docker.network=autoframework_net"
            - "traefik.port=80"
            - "traefik.frontend.rule=Host:10.244.102.243; PathPrefix:/TestManager"
            - "traefik.backend.loadbalancer.sticky=true"
        ports:
          - "8080"

      loadbalancer:
        image: traefik
        command: --docker \
          --docker.swarmmode \
          --docker.watch \
          --web \
          --loglevel=DEBUG
        ports:
          - 80:80
          - 9090:8080
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
        deploy:
          restart_policy:
            condition: any
          mode: replicated
          replicas: 1
          update_config:
            delay: 2s
          placement:
            constraints: [node.role == manager]
        networks:
          - net

    networks:
      net:
