High-Availability Pair (Binary Star Pattern)

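The Binary Star pattern increases reliability by running two servers as a primary-backup pair. At any given time, one server (the active) accepts connections from clients, while the other (the passive) does nothing. The two servers do, however, monitor each other. If the active server becomes unreachable on the network, the passive server takes over the active role after a short timeout.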

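The Binary Star pattern was designed with these goals:

Provide a straightforward high-availability solution.
Be simple enough to understand, and therefore easy to use.
Fail over reliably when needed, and only when needed.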

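In the Binary Star pattern, failover happens in the following cases:

The primary server suffers a catastrophic failure (it explodes, catches fire, is accidentally unplugged, and so on). Applications notice this and reconnect to the backup server.
The network segment where the primary server sits fails, perhaps because a router is overloaded, and applications start reconnecting to the backup server.
The primary server crashes, or is restarted, and does not come back up automatically.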

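To recover from a failover, do the following:

Restart the primary server and make sure it is reachable on the network.
Stop the backup server briefly. This disconnects the applications from it.
Once the applications have reconnected to the primary server, restart the backup server.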

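Recovery from failover is a manual process. We have learned from experience that automatic recovery is undesirable, for these reasons:

Failover interrupts service to applications, possibly for 10 to 30 seconds, and recovery causes another interruption of about the same length. It is better to recover during a period when few users are connected.
In an emergency, the most important thing is to be certain that recovery has actually happened. Even with automatic recovery, a system administrator cannot confirm it without double-checking.
If a transient network failure triggers a failover followed by an automatic recovery, it becomes hard to determine what caused the outage.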

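That said, if the primary server is running again and the backup server then fails, a Binary Star pair fails back to the primary, which in effect is an automatic recovery. This is exactly how the manual recovery procedure works: stopping the backup makes clients fail back to the primary.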

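There are two ways to shut down a Binary Star pair:

Stop the passive server first, then the active server.
Stop both servers at more or less the same time.

If you stop the active server first and the passive one only some time later, applications will disconnect, reconnect, and then disconnect again, which confuses users.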

Detailed Requirements

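Binary Star is kept as simple as possible. A high-availability architecture needs to meet the following requirements:

Failover is insurance against catastrophic system failures such as hardware breakdown, fire, or other accidents. Ordinary server crashes can be handled with the simpler recovery techniques covered earlier.
Failover should take less than 60 seconds, and preferably less than 10 seconds.
Failover happens automatically, whereas recovery must be done manually. Switching to the backup server automatically is fine, but switching back to the primary requires an operator to confirm that the problem has been fixed and to choose a suitable moment.
The protocol should be simple and easy for developers to understand, and ideally it should be hidden inside the client API.
Network designers need clear guidelines for avoiding split-brain syndrome, which arises when the network is partitioned: the nodes of a cluster lose contact with each other and each one believes it is the active server.
The pair must work regardless of the order in which the servers are started.
It must be possible to stop or restart either server without stopping client applications (although they will have to reconnect).
It must be possible for operators to monitor both servers at all times.
It must be possible to connect the two servers over a high-speed dedicated network link; that is, failover synchronization must be able to use a specific IP route.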

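We make the following assumptions:

A single backup server provides enough insurance; we do not need multiple backup servers.
The primary and backup servers are each capable of carrying the full application load on their own. We do not attempt to balance load across them.
There is enough budget to run a backup server that does nothing almost all the time.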

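The following are not covered here:

The use of an active backup server or load balancing. In Binary Star, the backup server is passive and cannot be used unless the primary server goes offline.
The handling of persistent messages or transactions in any way. We assume a network of unreliable (and probably untrusted) servers or Binary Star pairs.
Automatic discovery of servers. The Binary Star pair is configured manually and explicitly on the network, and applications know that configuration.
Replication of state or messages between servers. After a failover, sessions must be recreated from scratch.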

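Binary Star uses the following terms:

Primary: the server that is active initially or under normal conditions.
Backup: the server that is passive under normal conditions. It becomes active when the primary server is unreachable on the network, and clients then connect to it.
Active: the server that accepts client connections. At most one server is active.
Passive: the server that takes over if the active server becomes unreachable. Normally the primary server is active and the backup server is passive; after a failover, the roles are reversed.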

A Binary Star pair needs the following configuration:

The primary server must know the backup server's address.
The backup server must know the primary server's address.
The failover response time must be the same on both servers.
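As a rough illustration of these three settings, here is a minimal sketch that checks whether two server configurations describe the same pair. The ha_config_t struct and ha_pair_consistent function are hypothetical. It compares only port numbers, which assumes the two servers publish state on distinct ports, as the example servers later in this section do (5003 and 5004).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical configuration for one server of a Binary Star pair. */
typedef struct {
    const char *state_bind;      /* endpoint where we publish our state */
    const char *state_connect;   /* endpoint where the peer publishes its state */
    int failover_timeout_msec;   /* peer is considered dead after this long */
} ha_config_t;

/* Returns the text after the last ':' (the port), or NULL if there is none. */
static const char *s_port_of (const char *endpoint)
{
    const char *colon = strrchr (endpoint, ':');
    return colon ? colon + 1 : NULL;
}

/* Each server must subscribe to the port the other one binds, and both
   must use the same failover timeout. Host names are deliberately ignored. */
static bool ha_pair_consistent (const ha_config_t *primary,
                                const ha_config_t *backup)
{
    const char *p_bind = s_port_of (primary->state_bind);
    const char *p_conn = s_port_of (primary->state_connect);
    const char *b_bind = s_port_of (backup->state_bind);
    const char *b_conn = s_port_of (backup->state_connect);
    if (!p_bind || !p_conn || !b_bind || !b_conn)
        return false;
    return strcmp (p_conn, b_bind) == 0
        && strcmp (b_conn, p_bind) == 0
        && primary->failover_timeout_msec == backup->failover_timeout_msec;
}
```

A deployment script could run a check like this on both configuration files before starting either server.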

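The main tuning parameter is how often the servers check each other's state, which determines how quickly failover happens. In this example, the failover timeout is fixed at 2 seconds (two missed 1-second heartbeats). A smaller value lets the backup server take over as active more quickly, but it also raises the risk of unwanted failovers. For example, if the primary server is wrapped in a script that restarts it automatically after a crash, the timeout must be longer than the time the primary needs to restart.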
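The arithmetic behind this trade-off is small enough to write down. This sketch mirrors the example server, which sends a heartbeat every HEARTBEAT (1000) msec and declares the peer dead after two missed heartbeats. The restart_msec argument is a hypothetical measurement of how long your restart script takes to bring a crashed primary back.

```c
#include <assert.h>
#include <stdbool.h>

#define HEARTBEAT 1000   /* msec, same value as in the example server */

/* The peer is considered dead after two heartbeats without a state message. */
static int s_failover_timeout (int heartbeat_msec)
{
    return 2 * heartbeat_msec;
}

/* If a crashed primary is restarted automatically, the failover timeout
   should exceed the restart time, or the backup may take over needlessly. */
static bool s_timeout_covers_restart (int heartbeat_msec, int restart_msec)
{
    return s_failover_timeout (heartbeat_msec) > restart_msec;
}
```

With a 5-second restart, for instance, the heartbeat would have to be more than 2,500 msec for the primary to come back before the backup takes over.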

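For client applications to work reliably with a Binary Star pair, they must be implemented as follows:

Know both server addresses.
Connect to the primary server first and, if that fails, to the backup server.
Detect a broken connection using heartbeats.
When reconnecting, try the primary server first and then the backup server, with a retry interval equal to the failover timeout.
Recreate the session after reconnecting.
If messages must be reliable, retransmit any messages lost during the failover.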
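The server-selection part of these rules can be kept apart from the socket code. The sketch below is hypothetical (failover_client_t and its functions are not part of CZMQ). It starts with the primary, alternates between the two servers on each failure as the example client does, and uses a retry delay equal to the server failover timeout. Heartbeating, session recreation, and retransmission are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client-side failover bookkeeping:
   index 0 is the primary, index 1 is the backup. */
typedef struct {
    const char *servers [2];
    size_t current;           /* server we are talking to now */
    int retry_delay_msec;     /* how long to wait before reconnecting */
} failover_client_t;

static void failover_client_init (failover_client_t *self,
                                  const char *primary, const char *backup,
                                  int failover_timeout_msec)
{
    self->servers [0] = primary;
    self->servers [1] = backup;
    self->current = 0;                               /* start with the primary */
    self->retry_delay_msec = failover_timeout_msec;  /* never retry faster */
}

/* Called after a request times out; returns the endpoint to try next. */
static const char *failover_client_next (failover_client_t *self)
{
    self->current = (self->current + 1) % 2;
    return self->servers [self->current];
}
```

In the example client later in this section, SETTLE_DELAY (2000 msec) plays the role of retry_delay_msec.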

This is not trivial work, so it is usually best to hide it inside an API.

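The main limitations of the Binary Star pattern are:

A server process cannot be part of more than one Binary Star pair.
A primary server has exactly one backup server, and no more.
The passive server normally does no useful work.
The primary and backup servers must each be able to handle the full application load.
The failover configuration cannot be changed at runtime.
Client applications must implement some logic to handle failover.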

Preventing Split-Brain Syndrome

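Split-brain syndrome occurs when a cluster is partitioned and different parts of it become active at the same time. It causes applications to stop seeing each other. Binary Star has an algorithm for detecting and eliminating split brain. It does not rely on the servers talking to each other alone: a server decides to become active only when it receives client requests and it cannot see its peer server.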

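However, it is still possible to (mis)design a network that fools this algorithm. A typical scenario is a Binary Star pair split across two buildings, where each building also has its own client applications. If the network link between the buildings breaks, both servers of the pair become active.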
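The decision rule can be written as a small predicate. This is a sketch rather than the bstar code itself. The arguments now_msec and peer_expiry_msec stand in for zclock_time () and the peer_expiry field used by the servers later in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A passive server takes over only when a client request arrives AND the
   peer has sent no state message before its expiry time. */
static bool s_should_become_active (bool is_passive, bool client_request,
                                    int64_t now_msec, int64_t peer_expiry_msec)
{
    return is_passive && client_request && now_msec >= peer_expiry_msec;
}
```

In the two-building network, the passive server on the far side of the broken link still receives requests from its local clients but no longer hears its peer. The predicate is therefore true, and the pair ends up with two active servers.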

To prevent split brain, simply connect both servers of a Binary Star pair to the same network switch or, better, connect them directly with a crossover cable.

Do not split a Binary Star architecture into two islands, each with its own set of applications. If your network is built that way, use the Federation pattern rather than failover.

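A suitably paranoid network configuration uses two private cluster interconnects rather than a single one. In some setups, the network cards used for the cluster interconnect are also separate from those used for message traffic. The main goal is to separate network failures from failures inside the cluster, because network ports fail fairly often.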

Binary Star Implementation

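Here is a working proof-of-concept implementation of the Binary Star server. The primary and backup servers run the same code; you choose the role when you start the program.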

//  Binary Star proof-of-concept implementation. This server does no
//  real work; it just demonstrates the Binary Star failover model.

#include "czmq.h"

//  States we can be in at any point in time
typedef enum {
    STATE_PRIMARY = 1,          //  Primary, waiting for peer to connect
    STATE_BACKUP = 2,           //  Backup, waiting for peer to connect
    STATE_ACTIVE = 3,           //  Active - accepting connections
    STATE_PASSIVE = 4           //  Passive - not accepting connections
} state_t;

//  Events, which start with the states our peer can be in
typedef enum {
    PEER_PRIMARY = 1,           //  HA peer is pending primary
    PEER_BACKUP = 2,            //  HA peer is pending backup
    PEER_ACTIVE = 3,            //  HA peer is active
    PEER_PASSIVE = 4,           //  HA peer is passive
    CLIENT_REQUEST = 5          //  Client makes request
} event_t;

//  Our finite state machine
typedef struct {
    state_t state;              //  Current state
    event_t event;              //  Current event
    int64_t peer_expiry;        //  When peer is considered 'dead'
} bstar_t;

//  We send state information every this often
//  If peer doesn't respond in two heartbeats, it is 'dead'
#define HEARTBEAT 1000          //  In msecs

//  The heart of the Binary Star design is its finite-state machine (FSM).
//  The FSM runs one event at a time. We apply an event to the current state,
//  which checks if the event is accepted, and if so, sets a new state.
//  Returns true if the event raised an exception (rejected or fatal):

static bool
s_state_machine (bstar_t *fsm)
{
    bool exception = false;
    
    //  These are the PRIMARY and BACKUP states; we're waiting to become
    //  ACTIVE or PASSIVE depending on events we get from our peer:
    if (fsm->state == STATE_PRIMARY) {
        if (fsm->event == PEER_BACKUP) {
            printf ("I: connected to backup (passive), ready active\n");
            fsm->state = STATE_ACTIVE;
        }
        else
        if (fsm->event == PEER_ACTIVE) {
            printf ("I: connected to backup (active), ready passive\n");
            fsm->state = STATE_PASSIVE;
        }
        //  Accept client connections (CLIENT_REQUEST raises no exception)
    }
    else
    if (fsm->state == STATE_BACKUP) {
        if (fsm->event == PEER_ACTIVE) {
            printf ("I: connected to primary (active), ready passive\n");
            fsm->state = STATE_PASSIVE;
        }
        else
        //  Reject client connections when acting as backup
        if (fsm->event == CLIENT_REQUEST)
            exception = true;
    }
    else
    //  These are the ACTIVE and PASSIVE states:
    if (fsm->state == STATE_ACTIVE) {
        if (fsm->event == PEER_ACTIVE) {
            //  Two actives would mean split-brain
            printf ("E: fatal error - dual actives, aborting\n");
            exception = true;
        }
    }
    else
    //  Server is passive
    //  CLIENT_REQUEST events can trigger failover if peer looks dead
    if (fsm->state == STATE_PASSIVE) {
        if (fsm->event == PEER_PRIMARY) {
            //  Peer is restarting - become active, peer will go passive
            printf ("I: primary (passive) is restarting, ready active\n");
            fsm->state = STATE_ACTIVE;
        }
        else
        if (fsm->event == PEER_BACKUP) {
            //  Peer is restarting - become active, peer will go passive
            printf ("I: backup (passive) is restarting, ready active\n");
            fsm->state = STATE_ACTIVE;
        }
        else
        if (fsm->event == PEER_PASSIVE) {
            //  Two passives would mean cluster would be non-responsive
            printf ("E: fatal error - dual passives, aborting\n");
            exception = true;
        }
        else
        if (fsm->event == CLIENT_REQUEST) {
            //  Peer becomes active if timeout has passed
            //  It's the client request that triggers the failover
            assert (fsm->peer_expiry > 0);
            if (zclock_time () >= fsm->peer_expiry) {
                //  If peer is dead, switch to the active state
                printf ("I: failover successful, ready active\n");
                fsm->state = STATE_ACTIVE;
            }
            else
                //  If peer is alive, reject connections
                exception = true;
        }
    }
    return exception;
}

//  This is our main task. First we bind/connect our sockets with our
//  peer and make sure we will get state messages correctly. We use
//  three sockets; one to publish state, one to subscribe to state, and
//  one for client requests/replies:

int main (int argc, char *argv [])
{
    //  Arguments can be either of:
    //      -p  primary server, at tcp://localhost:5001
    //      -b  backup server, at tcp://localhost:5002
    zctx_t *ctx = zctx_new ();
    void *statepub = zsocket_new (ctx, ZMQ_PUB);
    void *statesub = zsocket_new (ctx, ZMQ_SUB);
    zsocket_set_subscribe (statesub, "");
    void *frontend = zsocket_new (ctx, ZMQ_ROUTER);
    bstar_t fsm = { 0 };

    if (argc == 2 && streq (argv [1], "-p")) {
        printf ("I: Primary active, waiting for backup (passive)\n");
        zsocket_bind (frontend, "tcp://*:5001");
        zsocket_bind (statepub, "tcp://*:5003");
        zsocket_connect (statesub, "tcp://localhost:5004");
        fsm.state = STATE_PRIMARY;
    }
    else
    if (argc == 2 && streq (argv [1], "-b")) {
        printf ("I: Backup passive, waiting for primary (active)\n");
        zsocket_bind (frontend, "tcp://*:5002");
        zsocket_bind (statepub, "tcp://*:5004");
        zsocket_connect (statesub, "tcp://localhost:5003");
        fsm.state = STATE_BACKUP;
    }
    else {
        printf ("Usage: bstarsrv { -p | -b }\n");
        zctx_destroy (&ctx);
        exit (0);
    }
    //  We now process events on our two input sockets, and process these
    //  events one at a time via our finite-state machine. Our "work" for
    //  a client request is simply to echo it back:

    //  Set timer for next outgoing state message
    int64_t send_state_at = zclock_time () + HEARTBEAT;
    while (!zctx_interrupted) {
        zmq_pollitem_t items [] = {
            { frontend, 0, ZMQ_POLLIN, 0 },
            { statesub, 0, ZMQ_POLLIN, 0 }
        };
        int time_left = (int) ((send_state_at - zclock_time ()));
        if (time_left < 0)
            time_left = 0;
        int rc = zmq_poll (items, 2, time_left * ZMQ_POLL_MSEC);
        if (rc == -1)
            break;              //  Context has been shut down

        if (items [0].revents & ZMQ_POLLIN) {
            //  Have a client request
            zmsg_t *msg = zmsg_recv (frontend);
            if (!msg)
                break;          //  Interrupted
            fsm.event = CLIENT_REQUEST;
            if (s_state_machine (&fsm) == false)
                //  Request accepted, so echo it back
                zmsg_send (&msg, frontend);
            else
                zmsg_destroy (&msg);
        }
        if (items [1].revents & ZMQ_POLLIN) {
            //  Have state from our peer, execute as event
            char *message = zstr_recv (statesub);
            if (!message)
                break;          //  Interrupted
            fsm.event = atoi (message);
            free (message);
            if (s_state_machine (&fsm))
                break;          //  Error, so exit
            fsm.peer_expiry = zclock_time () + 2 * HEARTBEAT;
        }
        //  If we timed out, send state to peer
        if (zclock_time () >= send_state_at) {
            char message [2];
            sprintf (message, "%d", fsm.state);
            zstr_send (statepub, message);
            send_state_at = zclock_time () + HEARTBEAT;
        }
    }
    if (zctx_interrupted)
        printf ("W: interrupted\n");

    //  Shutdown sockets and context
    zctx_destroy (&ctx);
    return 0;
}
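The server turns each state message into an event with atoi (), which quietly returns 0 for anything that is not a number. If you would rather reject malformed state messages, a stricter parser could look like the sketch below. s_parse_peer_state and s_format_state are hypothetical helpers, not part of CZMQ. The parser returns 0 for anything outside the four peer states.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats a state the same way the server does before zstr_send (). */
static void s_format_state (int state, char *buffer, size_t size)
{
    snprintf (buffer, size, "%d", state);
}

/* Parses a peer state message. Returns 1..4 (PEER_PRIMARY..PEER_PASSIVE),
   or 0 if the text is missing, not a whole number, or out of range. */
static int s_parse_peer_state (const char *text)
{
    if (text == NULL)
        return 0;
    char *end;
    errno = 0;
    long value = strtol (text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return 0;
    return (value >= 1 && value <= 4) ? (int) value : 0;
}
```

In the main loop you would then skip, rather than feed to s_state_machine (), any message for which the parser returns 0.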

And here is the client:

//  Binary Star client proof-of-concept implementation. This client does no
//  real work; it just demonstrates the Binary Star failover model.

#include "czmq.h"
#define REQUEST_TIMEOUT     1000    //  msecs
#define SETTLE_DELAY        2000    //  Before failing over

int main (void)
{
    zctx_t *ctx = zctx_new ();

    char *server [] = { "tcp://localhost:5001", "tcp://localhost:5002" };
    uint server_nbr = 0;

    printf ("I: connecting to server at %s…\n", server [server_nbr]);
    void *client = zsocket_new (ctx, ZMQ_REQ);
    zsocket_connect (client, server [server_nbr]);

    int sequence = 0;
    while (!zctx_interrupted) {
        //  We send a request, then we work to get a reply
        char request [10];
        sprintf (request, "%d", ++sequence);
        zstr_send (client, request);

        int expect_reply = 1;
        while (expect_reply) {
            //  Poll socket for a reply, with timeout
            zmq_pollitem_t items [] = { { client, 0, ZMQ_POLLIN, 0 } };
            int rc = zmq_poll (items, 1, REQUEST_TIMEOUT * ZMQ_POLL_MSEC);
            if (rc == -1)
                break;          //  Interrupted

            //  We use a Lazy Pirate strategy in the client. If there's no
            //  reply within our timeout, we close the socket and try again.
            //  In Binary Star, it's the client vote that decides which
            //  server is active; the client must therefore try to connect
            //  to each server in turn:

            if (items [0].revents & ZMQ_POLLIN) {
                //  We got a reply from the server, must match sequence
                char *reply = zstr_recv (client);
                if (atoi (reply) == sequence) {
                    printf ("I: server replied OK (%s)\n", reply);
                    expect_reply = 0;
                    zclock_sleep (1000);  //  One request per second
                }
                else
                    printf ("E: bad reply from server: %s\n", reply);
                free (reply);
            }
            else {
                printf ("W: no response from server, failing over\n");
                
                //  No reply: close the old socket and reconnect on a new one
                zsocket_destroy (ctx, client);
                server_nbr = (server_nbr + 1) % 2;
                zclock_sleep (SETTLE_DELAY);
                printf ("I: connecting to server at %s…\n",
                        server [server_nbr]);
                client = zsocket_new (ctx, ZMQ_REQ);
                zsocket_connect (client, server [server_nbr]);

                //  Send the request again, on the new socket
                zstr_send (client, request);
            }
        }
    }
    zctx_destroy (&ctx);
    return 0;
}

To test Binary Star, start the two servers and the client as shown below. The order in which you start them does not matter.

bstarsrv -p # Start primary
bstarsrv -b # Start backup
bstarcli

You can then provoke a failover by killing the primary server, and recovery by restarting the primary and then killing the backup. Note that it is the client's vote that triggers both failover and recovery.

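Binary Star is driven by a finite-state machine. Events are the peer's states, so "Peer Active" means the other server has told us it is active. "Client Request" means we have received a client request. "Client Vote" means a passive server has received a client request after its peer has been silent for two heartbeats, so it becomes active.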
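To see the transitions without any sockets, here is the proof-of-concept state machine restated as a pure function. It is a sketch: fsm_apply is a hypothetical name. Its peer_dead flag replaces the zclock_time () >= peer_expiry check, so the "Client Vote" transition can be exercised directly. The function returns false wherever s_state_machine () would raise an exception.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { STATE_PRIMARY = 1, STATE_BACKUP, STATE_ACTIVE, STATE_PASSIVE } state_t;
typedef enum { PEER_PRIMARY = 1, PEER_BACKUP, PEER_ACTIVE, PEER_PASSIVE,
               CLIENT_REQUEST } event_t;

/* Applies one event; may change *state. Returns false on an exception. */
static bool fsm_apply (state_t *state, event_t event, bool peer_dead)
{
    switch (*state) {
    case STATE_PRIMARY:
        if (event == PEER_BACKUP)
            *state = STATE_ACTIVE;
        else
        if (event == PEER_ACTIVE)
            *state = STATE_PASSIVE;
        return true;                       /* primary accepts clients */
    case STATE_BACKUP:
        if (event == PEER_ACTIVE)
            *state = STATE_PASSIVE;
        return event != CLIENT_REQUEST;    /* backup rejects clients */
    case STATE_ACTIVE:
        return event != PEER_ACTIVE;       /* dual actives: split brain */
    case STATE_PASSIVE:
        if (event == PEER_PRIMARY || event == PEER_BACKUP) {
            *state = STATE_ACTIVE;         /* peer is restarting */
            return true;
        }
        if (event == PEER_PASSIVE)
            return false;                  /* dual passives */
        if (event == CLIENT_REQUEST) {
            if (!peer_dead)
                return false;              /* peer alive: reject */
            *state = STATE_ACTIVE;         /* client vote: failover */
        }
        return true;
    }
    return false;                          /* unknown state */
}
```

Because the function has no I/O, a table of (state, event, peer_dead) cases is enough to check every transition.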

The servers use PUB-SUB sockets to exchange state. No other socket combination works here. PUSH and DEALER block if the peer is not ready to receive a message. PAIR does not reconnect if the peer disappears and comes back. ROUTER needs the peer's address before it can send it a message.

Binary Star Reactor

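Binary Star is useful and generic enough to package as a reusable reactor class. You pass the reactor a function that processes messages, and the reactor calls it when there is work to do. This is much better than copying and pasting the Binary Star code into every existing server.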

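In C, we use the CZMQ zloop class introduced earlier. zloop lets you register handlers that react to socket and timer events. For Binary Star, we also register handlers for state changes, such as going from active to passive. Here is the implementation of the bstar class: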

//  bstar class - Binary Star reactor

#include "bstar.h"

//  States we can be in at any point in time
typedef enum {
    STATE_PRIMARY = 1,          //  Primary, waiting for peer to connect
    STATE_BACKUP = 2,           //  Backup, waiting for peer to connect
    STATE_ACTIVE = 3,           //  Active - accepting connections
    STATE_PASSIVE = 4           //  Passive - not accepting connections
} state_t;

//  Events, which start with the states our peer can be in
typedef enum {
    PEER_PRIMARY = 1,           //  HA peer is pending primary
    PEER_BACKUP = 2,            //  HA peer is pending backup
    PEER_ACTIVE = 3,            //  HA peer is active
    PEER_PASSIVE = 4,           //  HA peer is passive
    CLIENT_REQUEST = 5          //  Client makes request
} event_t;

//  Structure of our class

struct _bstar_t {
    zctx_t *ctx;                //  Our private context
    zloop_t *loop;              //  Reactor loop
    void *statepub;             //  State publisher
    void *statesub;             //  State subscriber
    state_t state;              //  Current state
    event_t event;              //  Current event
    int64_t peer_expiry;        //  When peer is considered 'dead'
    zloop_fn *voter_fn;         //  Voting socket handler
    void *voter_arg;            //  Arguments for voting handler
    zloop_fn *active_fn;        //  Call when become active
    void *active_arg;           //  Arguments for handler
    zloop_fn *passive_fn;         //  Call when become passive
    void *passive_arg;            //  Arguments for handler
};

//  The finite-state machine is the same as in the proof-of-concept server.
//  To understand this reactor in detail, first read the CZMQ zloop class.

//  We send state information every this often
//  If peer doesn't respond in two heartbeats, it is 'dead'
#define BSTAR_HEARTBEAT     1000        //  In msecs

//  Binary Star finite state machine (applies event to state)
//  Returns -1 if there was an exception, 0 if event was valid.

static int
s_execute_fsm (bstar_t *self)
{
    int rc = 0;
    //  Primary server is waiting for peer to connect
    //  Accepts CLIENT_REQUEST events in this state
    if (self->state == STATE_PRIMARY) {
        if (self->event == PEER_BACKUP) {
            zclock_log ("I: connected to backup (passive), ready as active");
            self->state = STATE_ACTIVE;
            if (self->active_fn)
                (self->active_fn) (self->loop, NULL, self->active_arg);
        }
        else
        if (self->event == PEER_ACTIVE) {
            zclock_log ("I: connected to backup (active), ready as passive");
            self->state = STATE_PASSIVE;
            if (self->passive_fn)
                (self->passive_fn) (self->loop, NULL, self->passive_arg);
        }
        else
        if (self->event == CLIENT_REQUEST) {
            // Allow client requests to turn us into the active if we've
            // waited sufficiently long to believe the backup is not
            // currently acting as active (i.e., after a failover)
            assert (self->peer_expiry > 0);
            if (zclock_time () >= self->peer_expiry) {
                zclock_log ("I: request from client, ready as active");
                self->state = STATE_ACTIVE;
                if (self->active_fn)
                    (self->active_fn) (self->loop, NULL, self->active_arg);
            } else
                // Don't respond to clients yet - it's possible we're
                // performing a failback and the backup is currently active
                rc = -1;
        }
    }
    else
    //  Backup server is waiting for peer to connect
    //  Rejects CLIENT_REQUEST events in this state
    if (self->state == STATE_BACKUP) {
        if (self->event == PEER_ACTIVE) {
            zclock_log ("I: connected to primary (active), ready as passive");
            self->state = STATE_PASSIVE;
            if (self->passive_fn)
                (self->passive_fn) (self->loop, NULL, self->passive_arg);
        }
        else
        if (self->event == CLIENT_REQUEST)
            rc = -1;
    }
    else
    //  Server is active
    //  Accepts CLIENT_REQUEST events in this state
    //  The only way out of ACTIVE is death
    if (self->state == STATE_ACTIVE) {
        if (self->event == PEER_ACTIVE) {
            //  Two actives would mean split-brain
            zclock_log ("E: fatal error - dual actives, aborting");
            rc = -1;
        }
    }
    else
    //  Server is passive
    //  CLIENT_REQUEST events can trigger failover if peer looks dead
    if (self->state == STATE_PASSIVE) {
        if (self->event == PEER_PRIMARY) {
            //  Peer is restarting - become active, peer will go passive
            zclock_log ("I: primary (passive) is restarting, ready as active");
            self->state = STATE_ACTIVE;
        }
        else
        if (self->event == PEER_BACKUP) {
            //  Peer is restarting - become active, peer will go passive
            zclock_log ("I: backup (passive) is restarting, ready as active");
            self->state = STATE_ACTIVE;
        }
        else
        if (self->event == PEER_PASSIVE) {
            //  Two passives would mean cluster would be non-responsive
            zclock_log ("E: fatal error - dual passives, aborting");
            rc = -1;
        }
        else
        if (self->event == CLIENT_REQUEST) {
            //  Peer becomes active if timeout has passed
            //  It's the client request that triggers the failover
            assert (self->peer_expiry > 0);
            if (zclock_time () >= self->peer_expiry) {
                //  If peer is dead, switch to the active state
                zclock_log ("I: failover successful, ready as active");
                self->state = STATE_ACTIVE;
            }
            else
                //  If peer is alive, reject connections
                rc = -1;
        }
        //  Call state change handler if necessary
        if (self->state == STATE_ACTIVE && self->active_fn)
            (self->active_fn) (self->loop, NULL, self->active_arg);
    }
    return rc;
}

static void
s_update_peer_expiry (bstar_t *self)
{
    self->peer_expiry = zclock_time () + 2 * BSTAR_HEARTBEAT;
}

//  Reactor event handlers…

//  Publish our state to peer
int s_send_state (zloop_t *loop, int timer_id, void *arg)
{
    bstar_t *self = (bstar_t *) arg;
    zstr_sendf (self->statepub, "%d", self->state);
    return 0;
}

//  Receive state from peer, execute finite state machine
int s_recv_state (zloop_t *loop, zmq_pollitem_t *poller, void *arg)
{
    bstar_t *self = (bstar_t *) arg;
    char *state = zstr_recv (poller->socket);
    if (state) {
        self->event = atoi (state);
        s_update_peer_expiry (self);
        free (state);
    }
    return s_execute_fsm (self);
}

//  Application wants to speak to us, see if it's possible
int s_voter_ready (zloop_t *loop, zmq_pollitem_t *poller, void *arg)
{
    bstar_t *self = (bstar_t *) arg;
    //  If server can accept input now, call appl handler
    self->event = CLIENT_REQUEST;
    if (s_execute_fsm (self) == 0)
        (self->voter_fn) (self->loop, poller, self->voter_arg);
    else {
        //  Destroy waiting message, no-one to read it
        zmsg_t *msg = zmsg_recv (poller->socket);
        zmsg_destroy (&msg);
    }
    return 0;
}

//  This is the constructor for our bstar class. We have to tell it
//  whether we're primary or backup server, as well as our local and
//  remote endpoints to bind and connect to:

bstar_t *
bstar_new (int primary, char *local, char *remote)
{
    bstar_t
        *self;

    self = (bstar_t *) zmalloc (sizeof (bstar_t));

    //  Initialize the Binary Star
    self->ctx = zctx_new ();
    self->loop = zloop_new ();
    self->state = primary? STATE_PRIMARY: STATE_BACKUP;

    //  Create publisher for state going to peer
    self->statepub = zsocket_new (self->ctx, ZMQ_PUB);
    zsocket_bind (self->statepub, local);

    //  Create subscriber for state coming from peer
    self->statesub = zsocket_new (self->ctx, ZMQ_SUB);
    zsocket_set_subscribe (self->statesub, "");
    zsocket_connect (self->statesub, remote);

    //  Set-up basic reactor events
    zloop_timer (self->loop, BSTAR_HEARTBEAT, 0, s_send_state, self);
    zmq_pollitem_t poller = { self->statesub, 0, ZMQ_POLLIN };
    zloop_poller (self->loop, &poller, s_recv_state, self);
    return self;
}

//  The destructor shuts down the bstar reactor:

void
bstar_destroy (bstar_t **self_p)
{
    assert (self_p);
    if (*self_p) {
        bstar_t *self = *self_p;
        zloop_destroy (&self->loop);
        zctx_destroy (&self->ctx);
        free (self);
        *self_p = NULL;
    }
}

//  This method returns the underlying zloop reactor, so we can add
//  additional timers and readers:

zloop_t *
bstar_zloop (bstar_t *self)
{
    return self->loop;
}

//  This method registers a client voter socket. Messages received
//  on this socket provide the CLIENT_REQUEST events for the Binary Star
//  FSM and are passed to the provided application handler. We require
//  exactly one voter per bstar instance:

int
bstar_voter (bstar_t *self, char *endpoint, int type, zloop_fn handler,
             void *arg)
{
    //  Hold actual handler+arg so we can call this later
    void *socket = zsocket_new (self->ctx, type);
    zsocket_bind (socket, endpoint);
    assert (!self->voter_fn);
    self->voter_fn = handler;
    self->voter_arg = arg;
    zmq_pollitem_t poller = { socket, 0, ZMQ_POLLIN };
    return zloop_poller (self->loop, &poller, s_voter_ready, self);
}

//  Register handlers to be called each time there's a state change:

void
bstar_new_active (bstar_t *self, zloop_fn handler, void *arg)
{
    assert (!self->active_fn);
    self->active_fn = handler;
    self->active_arg = arg;
}

void
bstar_new_passive (bstar_t *self, zloop_fn handler, void *arg)
{
    assert (!self->passive_fn);
    self->passive_fn = handler;
    self->passive_arg = arg;
}

//  Enable/disable verbose tracing, for debugging:

void bstar_set_verbose (bstar_t *self, bool verbose)
{
    zloop_set_verbose (self->loop, verbose);
}

//  Finally, start the configured reactor. It will end if any handler
//  returns -1 to the reactor, or if the process receives SIGINT or SIGTERM:

int
bstar_start (bstar_t *self)
{
    assert (self->voter_fn);
    s_update_peer_expiry (self);
    return zloop_start (self->loop);
}
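The active_fn and passive_fn slots in bstar_t, filled by bstar_new_active () and bstar_new_passive (), are what let applications react to state changes. Without zloop, the pattern looks roughly like the hypothetical sketch below. handler_fn, ha_handlers_t, and the ha_* functions are illustrative names. The real handlers also receive the zloop_t * reactor and a poll item, as the calls in s_execute_fsm () show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for zloop_fn: a handler with a user argument. */
typedef int (handler_fn) (void *arg);

/* Miniature of the bstar handler slots: one handler per state change. */
typedef struct {
    handler_fn *active_fn;
    void *active_arg;
    handler_fn *passive_fn;
    void *passive_arg;
} ha_handlers_t;

/* Like bstar_new_active (): a handler may be registered only once. */
static void ha_on_active (ha_handlers_t *self, handler_fn *fn, void *arg)
{
    assert (self->active_fn == NULL);
    self->active_fn = fn;
    self->active_arg = arg;
}

static void ha_on_passive (ha_handlers_t *self, handler_fn *fn, void *arg)
{
    assert (self->passive_fn == NULL);
    self->passive_fn = fn;
    self->passive_arg = arg;
}

/* Called by the state machine after entering ACTIVE (nonzero) or PASSIVE (0). */
static void ha_state_changed (const ha_handlers_t *self, int now_active)
{
    if (now_active && self->active_fn)
        self->active_fn (self->active_arg);
    else
    if (!now_active && self->passive_fn)
        self->passive_fn (self->passive_arg);
}

/* Example handler: counts how often it was called. */
static int s_count_calls (void *arg)
{
    ++*(int *) arg;
    return 0;
}
```

Keeping the handlers in the reactor object means the state machine never needs to know what the application does when it becomes active or passive.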

With this class, the server's main program becomes much shorter:

//  Binary Star server, using bstar reactor

//  Lets us build this source without creating a library
#include "bstar.c"

//  Echo service
int s_echo (zloop_t *loop, zmq_pollitem_t *poller, void *arg)
{
    zmsg_t *msg = zmsg_recv (poller->socket);
    zmsg_send (&msg, poller->socket);
    return 0;
}

int main (int argc, char *argv [])
{
    //  Arguments can be either of:
    //      -p  primary server, at tcp://localhost:5001
    //      -b  backup server, at tcp://localhost:5002
    bstar_t *bstar;
    if (argc == 2 && streq (argv [1], "-p")) {
        printf ("I: Primary active, waiting for backup (passive)\n");
        bstar = bstar_new (BSTAR_PRIMARY,
            "tcp://*:5003", "tcp://localhost:5004");
        bstar_voter (bstar, "tcp://*:5001", ZMQ_ROUTER, s_echo, NULL);
    }
    else
    if (argc == 2 && streq (argv [1], "-b")) {
        printf ("I: Backup passive, waiting for primary (active)\n");
        bstar = bstar_new (BSTAR_BACKUP,
            "tcp://*:5004", "tcp://localhost:5003");
        bstar_voter (bstar, "tcp://*:5002", ZMQ_ROUTER, s_echo, NULL);
    }
    else {
        printf ("Usage: bstarsrvs { -p | -b }\n");
        exit (0);
    }
    bstar_start (bstar);
    bstar_destroy (&bstar);
    return 0;
}
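The two branches of main () differ only in their endpoints, so the roles can also be expressed as data. In this hypothetical sketch, role_config_t and s_parse_role are illustrative names; the endpoint strings are the ones used in main () above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Endpoints for one role: state publisher, peer's publisher, client voter. */
typedef struct {
    const char *flag;        /* command-line switch selecting this role */
    bool primary;
    const char *local;       /* bind address for our state publisher */
    const char *remote;      /* connect address for the peer's state */
    const char *frontend;    /* bind address for client requests */
} role_config_t;

static const role_config_t s_roles [] = {
    { "-p", true,  "tcp://*:5003", "tcp://localhost:5004", "tcp://*:5001" },
    { "-b", false, "tcp://*:5004", "tcp://localhost:5003", "tcp://*:5002" }
};

/* Returns the matching role, or NULL if the arguments are not valid. */
static const role_config_t *s_parse_role (int argc, char *argv [])
{
    if (argc != 2)
        return NULL;
    for (size_t i = 0; i < sizeof s_roles / sizeof s_roles [0]; i++)
        if (strcmp (argv [1], s_roles [i].flag) == 0)
            return &s_roles [i];
    return NULL;
}
```

main () would then call bstar_new (role->primary? BSTAR_PRIMARY: BSTAR_BACKUP, role->local, role->remote) followed by bstar_voter (bstar, role->frontend, ZMQ_ROUTER, s_echo, NULL). Adding a role then becomes a one-line change to the table.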

The Coding Machine's Warehouse
