<img height="1" width="1" style="display:none" src="https://www.facebook.com/tr?id=1194005000700189&amp;ev=PageView&amp;noscript=1">
VizuriBlogHero.jpg

From Vizuri's Experts

Configuring HornetQ for failover (part 2)


This is part two of a two-part post on configuring HornetQ for failover. In part one we discussed the properties a clustered messaging configuration should exhibit, how in-flight messages can be picked up from message journals via replication or shared storage, and a proposed configuration for HornetQ failover.

Split Brain Issues

Before we dive into the configuration, let's consider some of the risks inherent in any replication-based approach, in the commonly called “split-brain” situations. The shared storage approach is not prone to these situations, since there is a single source of the in-flight messages. When using replication, they can occur in the following scenarios:

  • The network between the nodes fails and a backup becomes active even though the original node has not failed, resulting in messages that have already been replicated being consumed by multiple nodes.
  • A node fails, has its messages processed by a backup, the backup then fails, and the original node comes back online. The original node will not see that its messages have already been processed, and will attempt to process them again.
    • With all nodes colocated this is already an unlikely edge case, and having two backups for each node marginalizes its likelihood even more.

The ideal situation for any asynchronous system is to have its messages be idempotent, meaning that replaying the same message does not corrupt the state of the application. This can be achieved by using unique IDs and/or counters in the message payload, along with additional application logic.
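To make the idea concrete, here is a minimal sketch of an idempotent JMS listener. It assumes the producer sets a "uniqueId" property on each message; the class name, the property name, and the in-memory set of processed IDs are illustrative assumptions (a real application would typically persist the IDs, for example in a database).

import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;

// Illustrative sketch only: skip messages whose ID has already been processed,
// so a replayed message after a failover does not corrupt application state.
public class IdempotentListener implements MessageListener {

    // Assumption: an in-memory set is enough for the example; real applications
    // would usually record processed IDs in durable storage.
    private final Set<String> processedIds =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    @Override
    public void onMessage(Message message) {
        try {
            // Prefer an application-supplied ID; fall back to the JMS message ID.
            String id = message.getStringProperty("uniqueId");
            if (id == null) {
                id = message.getJMSMessageID();
            }
            if (!processedIds.add(id)) {
                return; // duplicate delivery: already handled, ignore it
            }
            process(message);
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }

    private void process(Message message) {
        // application-specific work goes here
    }
}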

Most of the time, these situations will arise in the midst of a catastrophic failure, where some manual intervention to sort through the message journals should be expected.

This HornetQ configuration is adapted from Red Hat’s “reference architecture” and is thoroughly explained in the accompanying document. The configurations will be presented and briefly examined here:

Active HornetQ Server

The active HornetQ server is defined with the following excerpt in domain.xml:

<hornetq-server>
<persistence-enabled>true</persistence-enabled>
<cluster-password>ClusterPassword!</cluster-password>
<backup>false</backup>
<allow-failback>true</allow-failback>
<failover-on-shutdown>true</failover-on-shutdown>
<shared-store>false</shared-store>
<journal-type>NIO</journal-type>
<journal-min-files>2</journal-min-files>
<check-for-live-server>true</check-for-live-server>
<backup-group-name>${messaging.backup.group.a:backup-group-1}</backup-group-name>

<connectors>
<netty-connector name="netty" socket-binding="messaging"/>
<netty-connector name="netty-throughput" socket-binding="messaging-throughput">
<param key="batch-delay" value="50"/>
</netty-connector>
<in-vm-connector name="in-vm" server-id="1"/>
</connectors>

<acceptors>
<netty-acceptor name="netty" socket-binding="messaging"/>
<netty-acceptor name="netty-throughput" socket-binding="messaging-throughput">
<param key="batch-delay" value="50"/>
<param key="direct-deliver" value="false"/>
</netty-acceptor>
<in-vm-acceptor name="in-vm" server-id="1"/>
</acceptors>

<broadcast-groups>
<broadcast-group name="bg-group1">
<socket-binding>messaging-group</socket-binding>
<broadcast-period>5000</broadcast-period>
<connector-ref>
netty
</connector-ref>
</broadcast-group>
</broadcast-groups>

<discovery-groups>
<discovery-group name="dg-group1">
<socket-binding>messaging-group</socket-binding>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>

<cluster-connections>
<cluster-connection name="my-cluster">
<address>jms</address>
<connector-ref>netty</connector-ref>
<check-period>35000</check-period>
<discovery-group-ref discovery-group-name="dg-group1"/>
</cluster-connection>
</cluster-connections>

<security-settings>
<security-setting match="#">
<permission type="send" roles="guest"/>
<permission type="consume" roles="guest"/>
<permission type="createNonDurableQueue" roles="guest"/>
<permission type="deleteNonDurableQueue" roles="guest"/>
</security-setting>
</security-settings>

<address-settings>
<address-setting match="#">
<dead-letter-address>jms.queue.DLQ</dead-letter-address>
<expiry-address>jms.queue.ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<max-size-bytes>10485760</max-size-bytes>
<address-full-policy>BLOCK</address-full-policy>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<redistribution-delay>1000</redistribution-delay>
</address-setting>
</address-settings>

<jms-connection-factories>
<connection-factory name="InVmConnectionFactory">
<connectors>
<connector-ref connector-name="in-vm"/>
</connectors>
<entries>
<entry name="java:/ConnectionFactory"/>
</entries>
</connection-factory>
<connection-factory name="RemoteConnectionFactory">
<connectors>
<connector-ref connector-name="netty"/>
</connectors>
<entries>
<entry name="java:jboss/exported/jms/RemoteConnectionFactory"/>
</entries>
<ha>true</ha>
<block-on-acknowledge>true</block-on-acknowledge>
<retry-interval>1000</retry-interval>
<retry-interval-multiplier>1.0</retry-interval-multiplier>
<reconnect-attempts>-1</reconnect-attempts>
</connection-factory>
<pooled-connection-factory name="hornetq-ra">
<transaction mode="xa"/>
<connectors>
<connector-ref connector-name="in-vm"/>
</connectors>
<entries>
<entry name="java:/JmsXA"/>
</entries>
<ha>true</ha>
<block-on-acknowledge>true</block-on-acknowledge>
<retry-interval>1000</retry-interval>
<retry-interval-multiplier>1</retry-interval-multiplier>
<reconnect-attempts>-1</reconnect-attempts>
</pooled-connection-factory>
</jms-connection-factories>

<jms-destinations>
<jms-queue name="HELLOWORLDMDBQueue">
<entry name="/queue/HELLOWORLDMDBQueue"/>
<durable>true</durable>
</jms-queue>
<jms-topic name="HELLOWORLDMDBTopic">
<entry name="/topic/HELLOWORLDMDBTopic"/>
</jms-topic>
</jms-destinations>
</hornetq-server>

Some highlights of this configuration are:

backup = false
    • This marks the server as an active node
cluster-password
    • This must match on all cluster nodes
    • The cluster-user setting has a sensible default and is omitted here, but if set, it must match on all nodes
    • The deprecated clustered=true setting is omitted as well
allow-failback = true
    • A backup that is promoted to active will give control back if the original active node comes back online
failover-on-shutdown = true
    • Allows failover to occur during “normal” shutdown. If false, failover will only occur on unexpected loss of a node
    • Whether you want this on or not is an operations decision: some feel that if they purposely shut a machine down, they want to cease processing its messages
    • This is set to true here to facilitate testing
shared-store = false
    • Enables replication
check-for-live-server = true
    • A big part of this solution: this informs the server to check with its cluster peers (which may be outside of its backup group) to find a node that can process messages when the current node has no means to do so
backup-group-name = ${messaging.backup.group.a:backup-group-1}
    • The backup group name links the active and backup servers together for replication
    • Parameterizing this allows us to achieve the solution with a single profile
    • The active node will use the value of system property “messaging.backup.group.a” and will default to “backup-group-1” when absent
connectors/acceptors
    • Must have a unique “server-id” within the same JVM
pooled-connection-factory
    • Has some additional settings for HA, including retry timeouts and counts (see the usage sketch after this list)
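As a usage sketch, the following shows how an application component might send messages through the HA-enabled pooled connection factory defined above. The bean itself is not part of the reference architecture, and the assumption that the queue entry from the configuration resolves to java:/queue/HELLOWORLDMDBQueue is ours; adjust the lookup names to your environment.

import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

// Illustrative sketch: sends through java:/JmsXA (the pooled-connection-factory
// above), which carries the HA retry settings. The queue binding below is an
// assumption based on the <entry name="/queue/HELLOWORLDMDBQueue"/> element.
@Stateless
public class MessageSender {

    @Resource(mappedName = "java:/JmsXA")
    private ConnectionFactory connectionFactory;

    @Resource(mappedName = "java:/queue/HELLOWORLDMDBQueue")
    private Queue queue;

    public void send(String text) throws JMSException {
        Connection connection = connectionFactory.createConnection();
        try {
            // Session arguments are largely ignored inside a JTA transaction.
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage(text));
        } finally {
            connection.close(); // also closes the session and producer
        }
    }
}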

First Backup HornetQ Server

The first backup configuration excerpt is here:

<hornetq-server name="backup-server-a">
                   <persistence-enabled>true</persistence-enabled>
                   <cluster-password>ClusterPassword!</cluster-password>
                   <backup>true</backup>
                   <allow-failback>true</allow-failback>
                   <failover-on-shutdown>true</failover-on-shutdown>
                   <shared-store>false</shared-store>
                   <journal-type>NIO</journal-type>
                   <journal-min-files>2</journal-min-files>
                   <check-for-live-server>true</check-for-live-server>
                   <backup-group-name>${messaging.backup.group.b:backup-group-2}</backup-group-name>
                   <paging-directory path="messagingpagingbackupa"/>
                   <bindings-directory path="messagingbindingbackupa"/>
                   <journal-directory path="messagingjournalbackupa"/>
                   <large-messages-directory path="messaginglargemessagesbackupa"/>

                   <connectors>
                       <netty-connector name="netty" socket-binding="messaginga"/>
                       <in-vm-connector name="in-vm" server-id="2"/>
                   </connectors>

                   <acceptors>
                       <netty-acceptor name="netty" socket-binding="messaginga"/>
                       <in-vm-acceptor name="in-vm" server-id="2"/>
                   </acceptors>

                   <broadcast-groups>
                       <broadcast-group name="bg-group1">
                           <socket-binding>messaging-group</socket-binding>
                           <broadcast-period>5000</broadcast-period>
                           <connector-ref>
                               netty
                           </connector-ref>
                       </broadcast-group>
                   </broadcast-groups>

                   <discovery-groups>
                       <discovery-group name="dg-group1">
                           <socket-binding>messaging-group</socket-binding>
                           <refresh-timeout>10000</refresh-timeout>
                       </discovery-group>
                   </discovery-groups>

                   <cluster-connections>
                       <cluster-connection name="my-cluster">
                           <address>jms</address>
                           <connector-ref>netty</connector-ref>
                           <discovery-group-ref discovery-group-name="dg-group1"/>
                       </cluster-connection>
                   </cluster-connections>

                   <address-settings>
                       <address-setting match="#">
                           <dead-letter-address>jms.queue.DLQ</dead-letter-address>
                           <expiry-address>jms.queue.ExpiryQueue</expiry-address>
                           <redelivery-delay>0</redelivery-delay>
                           <max-size-bytes>10485760</max-size-bytes>
                           <address-full-policy>BLOCK</address-full-policy>
                           <message-counter-history-day-limit>10</message-counter-history-day-limit>
                           <redistribution-delay>1000</redistribution-delay>
                       </address-setting>
                   </address-settings>
</hornetq-server>

The differences between this and the active server:

Different server name
backup = true
backup-group-name
    • Parameterized to use value of system property “messaging.backup.group.b”, defaulting to backup-group-2 when absent
Uses different directories for storing its journals so it doesn’t overwrite the active server
connectors/acceptors
    • Uses custom socket binding port “messaginga” which is added to the full-ha-sockets binding definition
    • Uses a different server-id value
    • netty-throughput values are removed
Removed all jms-connection-factories and jms-destinations
    • Again, all the actual work is done by one of the default HornetQ servers

Second Backup HornetQ Server

The second backup is nearly identical to the first:

<hornetq-server name="backup-server-b">
                   <persistence-enabled>true</persistence-enabled>
                   <cluster-password>ClusterPassword!</cluster-password>
                   <backup>true</backup>
                   <allow-failback>true</allow-failback>
                   <failover-on-shutdown>true</failover-on-shutdown>
                   <shared-store>false</shared-store>
                   <journal-type>NIO</journal-type>
                   <journal-min-files>2</journal-min-files>
                   <check-for-live-server>true</check-for-live-server>
                   <backup-group-name>${messaging.backup.group.c:backup-group-3}</backup-group-name>
                   <paging-directory path="messagingpagingbackupb"/>
                   <bindings-directory path="messagingbindingbackupb"/>
                   <journal-directory path="messagingjournalbackupb"/>
                   <large-messages-directory path="messaginglargemessagesbackupb"/>

                   <connectors>
                       <netty-connector name="netty" socket-binding="messagingb"/>
                       <in-vm-connector name="in-vm" server-id="3"/>
                   </connectors>

                   <acceptors>
                       <netty-acceptor name="netty" socket-binding="messagingb"/>
                       <in-vm-acceptor name="in-vm" server-id="3"/>
                   </acceptors>

                   <broadcast-groups>
                       <broadcast-group name="bg-group1">
                           <socket-binding>messaging-group</socket-binding>
                           <broadcast-period>5000</broadcast-period>
                           <connector-ref>
                               netty
                           </connector-ref>
                       </broadcast-group>
                   </broadcast-groups>

                   <discovery-groups>
                       <discovery-group name="dg-group1">
                           <socket-binding>messaging-group</socket-binding>
                           <refresh-timeout>10000</refresh-timeout>
                       </discovery-group>
                   </discovery-groups>

                   <cluster-connections>
                       <cluster-connection name="my-cluster">
                           <address>jms</address>
                           <connector-ref>netty</connector-ref>
                           <discovery-group-ref discovery-group-name="dg-group1"/>
                       </cluster-connection>
                   </cluster-connections>

                   <address-settings>
                       <address-setting match="#">
                           <dead-letter-address>jms.queue.DLQ</dead-letter-address>
                           <expiry-address>jms.queue.ExpiryQueue</expiry-address>
                           <redelivery-delay>0</redelivery-delay>
                           <max-size-bytes>10485760</max-size-bytes>
                           <address-full-policy>BLOCK</address-full-policy>
                           <message-counter-history-day-limit>10</message-counter-history-day-limit>
                           <redistribution-delay>1000</redistribution-delay>
                       </address-setting>
                   </address-settings>
</hornetq-server>

The only differences between this and the other backup are:

Different server name
backup-group-name
    • Parameterized to use value of system property “messaging.backup.group.c”, defaulting to backup-group-3 when absent
Uses different directories for storing its journals so it doesn’t overwrite the others
connectors/acceptors
    • Uses custom socket binding port “messagingb” which is added to the full-ha-sockets binding definition
    • Uses a different server-id value

Socket Binding Group

The following items are referenced in the socket binding group to enable all 3 HornetQ servers to run without port conflicts:

<socket-binding-group name="full-ha-sockets" default-interface="public">
          ...
          <socket-binding name="messaging"  port="5445"/>
          <socket-binding name="messaginga" port="5446"/>
          <socket-binding name="messagingb" port="5447"/>
          ...
</socket-binding-group>

JBoss Server Definitions

A minimal server group definition in domain.xml would be:

<server-group name="cluster-server-group-1" profile="full-ha">
<socket-binding-group ref="full-ha-sockets"/>
</server-group>

A server definition using the default values would look like this in host.xml:

<servers>
<server name="server-one" group="cluster-server-group-1" />
</servers>

A typical “second” node would look like:

<server name="server-two" group="cluster-server-group-1">
<system-properties>
<property name="messaging.backup.group.a" value="backup-group-2" boot-time="true"/>
<property name="messaging.backup.group.b" value="backup-group-3" boot-time="true"/>
<property name="messaging.backup.group.c" value="backup-group-1" boot-time="true"/>
</system-properties>
</server>

And finally a third node:

<server name="server-three" group="cluster-server-group-1">
<system-properties>
<property name="messaging.backup.group.a" value="backup-group-3" boot-time="true"/>
<property name="messaging.backup.group.b" value="backup-group-1" boot-time="true"/>
<property name="messaging.backup.group.c" value="backup-group-2" boot-time="true"/>
</system-properties>
</server>

Test Application

A small test application is available to manually walk through failover scenarios.   This is a modified version of the JBoss quickstart “Hello World MDB”. It’s available at https://github.com/Vizuri/hornetq-failover.git
When deployed, it provides a URL that can be opened that will post a number of messages to either a queue or a topic.

This application has been modified in the following manner:

  • Drops 20 messages on the queue/topic
  • The MDB is restricted to a single instance, so each JBoss node processes just one message at a time
  • The MDB’s onMessage method sleeps for 5 seconds to slow things down and make it easier to shut nodes down with messages still in flight (see the sketch after this list)
  • The queue/topic definitions are removed from the deployment and moved to the server configuration
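For reference, a hedged approximation of what such an MDB could look like is shown below; the class name, destination value, and activation-config properties are illustrative assumptions and may not match the actual source in the Git repository.

import java.util.logging.Logger;
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// Illustrative approximation of the modified quickstart MDB: a single session
// so each node works one message at a time, and a 5-second sleep in onMessage
// to keep messages in flight long enough to stop a node mid-run.
@MessageDriven(name = "HelloWorldQueueMDB", activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/HELLOWORLDMDBQueue"),
        @ActivationConfigProperty(propertyName = "maxSession", propertyValue = "1"),
        @ActivationConfigProperty(propertyName = "acknowledgeMode", propertyValue = "Auto-acknowledge") })
public class HelloWorldQueueMDB implements MessageListener {

    private static final Logger LOGGER = Logger.getLogger(HelloWorldQueueMDB.class.getName());

    @Override
    public void onMessage(Message message) {
        try {
            if (message instanceof TextMessage) {
                LOGGER.info("Received message: " + ((TextMessage) message).getText());
            }
            Thread.sleep(5000); // slow processing so failover can be observed mid-run
        } catch (JMSException e) {
            throw new RuntimeException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}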

Setup

In order to run this example, we need a cluster of nodes. This can be achieved in the following manner:

  • Run on the target servers
  • Run two or more instances locally, bound to different IP addresses
    • You can bind one to your wireless NIC and one to your wired NIC
    • You can also set up IP aliases

In a clean environment, use the following steps to run the failover scenario, replacing IP1 and IP2 with the actual values:

  1. Unzip a clean JBoss EAP 6.1.1 instance to be the domain controller/server-one instance
    1. Copy domain.xml.failover into its domain/configuration folder and rename it to domain.xml
    2. Copy host.xml.failover.host1 into its domain/configuration folder and rename it to host.xml
    3. Copy the mgmt-users.properties file into its domain/configuration folder
    4. Start it up with:
      1. ./domain.sh -b IP1 -bmanagement IP1
  2. Unzip a clean JBoss EAP 6.1.1 instance into another directory to host server two
    1. Copy host.xml.failover.host2 into its domain/configuration folder and rename it to host.xml
    2. Start it up with:
      1. ./domain.sh -b IP2 -bmanagement IP2 -Djboss.domain.master.address=IP1
      2. You should see cluster messages in both server console outputs indicating that the HornetQ servers have found their peers and backups
  3. Deploy the application
    1. Log in to http://IP1:9990
      1. admin/admin99!
    2. Add jboss-as-helloworld-mdb.war and assign it to cluster-server-group-1

Test Run

  1. Open a browser to http://IP1:8080/jboss-as-helloworld-mdb/HelloWorldMDBServletClient to drop 20 messages on the queue.   
  2. Console windows for both servers should show messages being processed at 5-second intervals
    1. Typically, one will be doing the “odd” numbers while the other does the “even”
  3. About halfway through, stop one of the servers
    1. You can cleanly stop it from the overview page at http://IP:9990
  4. The other node should complete its own work (up to message 19 or 20) and then immediately start processing the messages that were not completed by the stopped server

Clean Up

When running a test that uses a different configuration, it is recommended to wipe the slate clean. This is accomplished by:

  1. Shutting down all JBoss processes
  2. Removing tmp, data, log and servers directories from each jboss-eap-6.1/domain folder
  3. Removing the deployments from the domain.xml in both the general “deployments” section as well as for each relevant server-group

Comparing with Current Setup

It is possible to demonstrate the failover behavior of the current setup by using the appropriate configuration files (e.g. domain.xml.nofailover, host.xml.nofailover.host1, etc.) and following the same setup instructions.

Conclusion

Hopefully you found this post helpful.  

Are you looking for JBoss consulting services? We have a team of experts on staff ready to help - click here to schedule a complimentary consultation today.

References
Red Hat JBoss EAP Official Documentation
Reference Implementation for HornetQ Cluster

 

Jiehuan Li

A former Vizuri developer, Jiehuan Li brought more than fifteen years of experience in designing customized IT solutions to Vizuri’s projects.