An Open View

How to detect an infinite loop in your BRMS Rules and auto recover - part 1

So you just got into building rules and you finally got something to work, when suddenly you hear your laptop fan start spinning out of control. You have no idea what is going on, and soon your laptop becomes completely unresponsive. All you remember is that the last thing you did was execute a set of business rules running on your local application server, whether JBoss/WildFly or Tomcat.

Well, chances are good that you just hit your first infinite loop! This is part one of a three-part blog post that covers how to detect an infinite loop in your rules and how to auto recover.

What is an infinite loop in Rules

An infinite loop occurs when one rule causes itself to fire over and over again (self-loop), or when two or more rules play Ping-Pong and cause one another to re-fire (compound-loop). More precisely, the question is: what can a rule do to cause another rule, or itself, to re-fire? It is a combination of the when part of the rule (condition: LHS - Left Hand Side) and the then part of the rule (action: RHS - Right Hand Side). More specifically, it all starts when you use the update/modify, delete/retract and insert commands in the RHS, which modify the Working Memory of the BRMS rules engine. The moment the Working Memory changes, the rules engine updates its indexes, which you can think of as a huge hash map that gets updated and validated. During this process, the rules engine re-evaluates all rules that can potentially be affected by the domain model object that was just modified, deleted or inserted. Combine this with poorly defined conditions/constraints in the LHS of the rules, and the dreaded infinite loop is born.

Great! That was a mouthful, but what does it really mean?

How to create an infinite loop

I don't know about you, but I learn best by doing and executing the code myself and tweaking it until something breaks, so let's look at an example to demonstrate:

[Screenshot: Code Block 1 - Single Rule]

In the above DRL file, we have declared a class called Counter and a rule (name: "Update counter to test infinite loop") that fires if there is a Counter with a count greater than or equal to zero (see line 13). However, on line 16 we modify the Counter by incrementing its count by one, so the condition stays true and we have created an infinite loop. After every change, the rules engine re-evaluates all rule conditions to see whether any rules are affected by the change and, if so, re-activates them. In our case, the rule causes itself to be re-activated and thus continues to fire until you kill the Java process with a "kill -9" command.
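If you would rather see the self-loop without risking your application server, the behaviour can be mimicked with a toy match-fire cycle. The Python sketch below is not the BRMS engine or its API: `Counter`, `fire_all_rules` and the `max_firings` safety cap are hypothetical names introduced for illustration, and the cap exists only so that the demonstration terminates.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    count: int = 0


def fire_all_rules(counter, max_firings):
    """Toy match-fire cycle for a single rule (not the real engine).

    Rule: when count >= 0, then increment count. Every modification
    makes the loop re-check the condition, which is still true.
    """
    firings = 0
    while counter.count >= 0:          # LHS: condition is re-evaluated
        if firings >= max_firings:     # safety cap, assumed for the demo only
            return firings, True       # True: we gave up, the loop never ended
        counter.count += 1             # RHS: modify the fact
        firings += 1
    return firings, False


counter = Counter()
firings, capped = fire_all_rules(counter, max_firings=1000)
print(firings, capped, counter.count)  # 1000 True 1000
```

However large you make `max_firings`, the cap is what stops the run, never the rule condition. That is the signature of a self-loop.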

In the above example, we can see how easy it is to create an infinite loop in a rules engine. I would not suggest that you create this example in your rules engine, but if you want to see what happens, go ahead. I do recommend that you keep two terminal windows open: one for your application server and one for killing the application server process. Look up the process id of your running server with the "ps -ef | grep java" command and type the "kill -9 ..." command in the second terminal, but do not execute it yet. Then execute the rule(s) and be ready to run that kill command. On a Mac, you will hear the fan spinning up and you should have no problem switching to the second terminal to kill the process. In my experience, a Windows machine can end up frozen/unresponsive, so be ready to stop the process right away.

There is a better way of simulating an infinite loop with less severe consequences. Look at the following rule change, which is a small modification of the constraints on the rule:

[Screenshot: Code Block 2 - Improvement]

In the above code change, we improved the rule condition or Left Hand Side (LHS) to keep the count attribute from going higher than 10. This way, you should only see the rule re-activated 9 times before the rules engine quits. You can change the value from 10 to 20 or 30 to see the rule fire more times. In the action (RHS) of the rule, you will see that we print the message "Bad dog ..." to the system log, and you can verify this output in your server log or console when you fire all rules in your rules engine.
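As a rough check of the firing count, here is a toy Python model of the bounded rule. It assumes the counter starts at 0 and that the constraint amounts to "count >= 0 and count < 10"; the exact constraint is an assumption, and `fire_bounded_rule` is a hypothetical helper, not part of BRMS.

```python
def fire_bounded_rule(count=0, upper=10):
    """Toy model of one rule: when 0 <= count < upper, then count += 1."""
    fired_at = []                      # count value seen by each firing
    while 0 <= count < upper:          # LHS, re-checked after every modify
        fired_at.append(count)
        print("Bad dog ...")           # stand-in for the rule's log message
        count += 1                     # RHS: modify the Counter
    return count, fired_at


final_count, fired_at = fire_bounded_rule()
print(final_count, len(fired_at))      # 10 10
```

Under these assumptions the rule fires 10 times in total: the initial activation plus 9 re-activations. Calling `fire_bounded_rule(upper=20)` gives 20 firings, which mirrors changing the value in the rule from 10 to 20.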

Now, let's look at another example where an infinite loop is created. In this case, it is when two or more rules trigger one another (compound-loop) to re-fire. This is the more common case of infinite loop generation and is also the more difficult to identify:

[Screenshot: Code Block 3 - Multiple Rules]

In the above DRL file, we have added a new rule (name: "Update counter to test infinite loop - cat") and renamed the first rule ("Update counter to test infinite loop - dog"). I am going to pretend that these two rules represent a cat and a dog fighting, each one re-activating the other.

Let's look at what is going on here. You will notice that both rules have conditions/constraints that look at the Counter object. Also, both rules modify the Counter object. Given the same declaration of the Counter object as before (Code Block 1 - line 5) the first rule (dog) will fire. The second rule (cat) will not fire first, since its condition requires that count has to be greater than 0 and our initial count was set to 0 in the declaration on line 7 of Code Block 1. So in round 1, we have 1 for the dog and 0 for the cat.

When the dog rule fires and modifies the Counter object, it causes the rules engine to re-evaluate all the rules with any conditions on the Counter domain model class. In our case, we only have these two rules and they both have a condition on the Counter class. So does that mean both will fire? Let's look at the conditions. The dog rule fires if count is less than 10, so yes, its condition is still satisfied. But on line 2 of Code Block 3, I have added the rule attribute no-loop and set its value to true. This prevents the dog rule from firing again when the dog rule itself caused the re-activation (self-loop), so no, the dog rule will not fire (we will talk more about no-loop in Part 2).

For the cat rule, the condition/constraint only requires count to be greater than 0, so if the dog rule just made it 1, the cat rule will definitely fire. So for round 2 we have 1 for the cat and 0 for the dog. The moment the cat rule updates the Counter, it triggers the dog rule again, and so you can see how the two rules keep on triggering one another. The dog rule will stop at count equal to 9, since the restriction on its condition makes it give up and stop incrementing the Counter. It is also important to note that the no-loop attribute has no effect on the dog rule if it was the cat rule that triggered the rules engine to re-evaluate all rules. With a complex rule set, this kind of situation can be amplified and you can potentially have a chain of rules causing one another to re-fire. So what now?
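To trace the cat-and-dog Ping-Pong step by step, the sketch below models a tiny agenda in Python. It is a simplification, not the BRMS engine. It assumes that both rules simply increment the counter, that the dog rule's constraint is "count >= 0 and count < 10", and that the cat rule also carries no-loop (the text only confirms no-loop on the dog rule). `Rule` and `fire_all_rules` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[int], bool]   # LHS over the counter value
    no_loop: bool = False


def fire_all_rules(rules, count=0, max_firings=100):
    """Toy agenda: after each firing, every rule is re-evaluated."""
    trace = []
    agenda = [r for r in rules if r.condition(count)]
    while agenda and len(trace) < max_firings:   # cap is a demo safety net
        rule = agenda.pop(0)
        count += 1                     # RHS assumed for both rules: increment
        trace.append(rule.name)
        # Re-evaluate all rules; no-loop only blocks the rule that just fired.
        agenda = [r for r in rules
                  if r.condition(count) and not (r.no_loop and r is rule)]
    return count, trace


dog = Rule("dog", lambda c: 0 <= c < 10, no_loop=True)
cat = Rule("cat", lambda c: c > 0, no_loop=True)   # no-loop on cat is assumed
count, trace = fire_all_rules([dog, cat])
print(count, trace)
```

In this model the rules alternate dog, cat, dog, cat for ten firings. The dog's last firing takes count to 9, the cat takes it to 10, and then the dog's condition fails while no-loop keeps the cat from re-firing on its own change. If you drop `no_loop=True` from the cat, the cat keeps re-activating itself and the run only ends at the `max_firings` cap.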

Coming up in Part 2

Be sure to check back for Part 2, where I will cover best practices for preventing an infinite loop.

Ben-Johan van der Walt

Ben-Johan van der Walt is a Software Architect/Engineer with over 20 years of experience leading successful projects of various sizes and scopes. He is a seasoned professional, with outstanding project planning, execution, mentoring and support skills. He is always ready for a challenge.