There are different types of software bugs listed on Wikipedia. They are categorized on the type of errors such as logic, syntax, resource, etc… None carries personality and express the frustration they cause. And it needs to change… especially when we consider that the term itself, bug, is a colorful way for naming a software error.

So I’d like to contribute more colors to the software landscape by suggesting one more type of bugs: the bed ones. The kind of minor and pesky bugs that won’t kill but leads to sleep deprivation:

Bed Bug

The characteristics of such bug is as follows:

  1. It hides well, you don’t see it easily.
  2. You need special tooling and attention to zoom in on it.
  3. It doesn’t cause big harm, but it possibly has greater side-effects.
  4. The fix to get rid of this bug is generally simple.

I’ve encountered one this week. Let’s examine a special case that led to this one character change fix:

bed_bug
Java supports boolean values in two different forms: primitive and object. The fix implied a switch from the Object type (denoted with a camel case, the big B version) to the primitive type. Often this type of change is done for a taxed method to improve performance. But that is not the case here. This method is part of a class that implements an Apache log4j Filter abstract class, hence this is a method that will get called very few times in the application’s life cycle.

But let’s rewind to the encountered symptoms prior to the fix, so we can review the bed bug afterward:

  1. The web application start-up crashed with Guice injection errors.
  2. The Guice configuration did not change in the past months and was working fine during all this time.
  3. The error did not always occur, sometimes the application started just well.
  4. The error occurred mostly in our deployed local instance of Tomcat.
  5. The error did not occur in our deployed online instance of the application (also in Tomcat).
  6. The error occurred once in a while with the GWT Eclipse plug-in, but a refresh/rebuild seemed to get rid of it.

There were several errors at start-up and some of these were red-herring to the real error. First, the traditional Guice injection error report:

oct. 26, 2013 3:38:46 PM org.apache.catalina.core.StandardContext listenerStart
SEVERE: Exception sending context initialized event to listener instance of class a.company.ui.server.guice.WebAppModuleContextListener
com.google.inject.CreationException: Guice creation errors:

1) No implementation for a.company.domain.dao.DomainDao was bound.
while locating a.company.domain.dao.DomainDao
for parameter 1 at a.company.domain.server.ApplicationServiceImpl.<init>(ApplicationServiceImpl.java:128)
at a.company.domain.server.service.RPCModule.bindServices(RPCModule.java:51)
(+ 10 more..)

Or other thread crashing for file management, reporting that an attribute for the log4j filter configuration was missing:

SEVERE: Exception sending context destroyed event to listener instance of class a.company.domain.server.FileCleanerListener
java.lang.NullPointerException: The presence option was not configured for key 'SessionId'.
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208)
at com.modelsolv.log.MdcPresenceFilter.decide(MdcPresenceFilter.java:33)
at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:244)
at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
at org.apache.log4j.Category.callAppenders(Category.java:206)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:209)

There was a lot of these exceptions reported and our team couldn’t figure out what the real problem was. Keep in mind that these symptoms showed up randomly too. But at some point, I noticed these logs that seemed related to the previous error reported:

log4j:WARN Failed to set property [present] to value "false".
log4j:WARN Failed to set property [present] to value "true".
log4j:WARN Failed to set property [present] to value "false".
log4j:WARN Failed to set property [present] to value "true".

Why would this simple property of the log4j configuration failed to be set? Is it really set in the log?

<appender name="consoleMdcAppender">
  <layout>
  <param name="ConversionPattern"
    value="%d{MM/dd/yyyy HH:mm} %5p [%t] (%F:%L) (Session: %X{SessionId}, User: %X{UserId}) - %m%n" />
  </layout>
  <filter>
    <param name="key" value="SessionId" />
    <param name="present" value="true" />
  </filter>
</appender>

This property is well configured. But not well set as the logs report. And as it was not correctly set, the safe guard condition within the MdcPresenceFilter threw an exception as it has a mis-configured present member, which led to the thread crashing:

@Override
public int decide(LoggingEvent event) {
  Preconditions.checkNotNull(key, "The MDC key was not configured.");
  Preconditions.checkNotNull(present, format("The presence option was not configured for key '%s'.", key));

Attaching a debugger and setting a breakpoint to the code location where the parameters were set for log4j was the next logical step. The org.apache.log4j.config.PropertySette class was the right place to look into. After some debug sessions, it turned out that the boolean value of the present filter parameter could not always be correctly set. There is a PropertySetter#convertArg method that is responsible for converting the textual value of the log4g configuration into the proper Java type. If you looked at my previous posted configuration, you’ll get this should be converted to a boolean value. This is a snapshot of the debugger at the exact condition where log4j had to realize the same thing:

condition-not-picked

But it never got into this condition. Ever. The type parameter presented in this snapshot was determined by previous log4j logic that introspected the MdcPresenceFilter class and its methods. It found this excerpt:

public void setPresent(String present) {
  setPresent(Boolean.parseBoolean(present));
}

Hence it asserted that the correct type to set the textual “true” or “false” value had to be java.lang.Boolean. But although a quick glance to the condition in the snapshot could make you think that it does support the Boolean object type, it does not:

} else if (Boolean.TYPE.isAssignableFrom(type)) {

The key element of the faulty line is that it refers to Boolean.TYPE. The latter refers, in fact, to the primitive boolean type of Java. It’s quite different from the full-blown Object java.lang.Boolean type. Hence it means that only the primitive type of boolean is supported by the log4j filter parameters. It’s no big deal so far. But here’s the breaker that made this bug inconsistent in its behavior and why it worked sometimes… and some others not:

public void setPresent(boolean present) {
  this.present = present;
}

public void setPresent(String present) {
  setPresent(Boolean.parseBoolean(present));
}

At the time I wrote the MdcFilterPresence class, I overloaded the setPresent method with a version that accepts a String parameter and that automatically converts it to a Boolean object. As it turns out, the log4j library decides to convert the textual filter parameter of “true” or “false” to, sometimes a String, or some other times a Boolean. Yes that’s right, it’s not always the same! When it decides to convert it to a String, the application starts correctly, else when it picks the Boolean conversion, the application crashes, as the latter type conversion is not supported by log4j. That made the diagnosis of this bug a bit harder because no clear constants were obvious and it all seemed like red herrings.

Why the log4j library arbitrarily decides to pick either Boolean or String as a conversion is non-deterministic per my observation and entirely depends on the java.beans.GenericBeanInfo class, as shown by these screenshots:

Boolean was picked.

Boolean was picked.

String was picked.

String was picked.

These are the kind of bugs that creep in your bed and make your work a bit harder. All of those swayed away with a single character change fix. As I like to put it about bug fixes: “the amount of time required to find a bug is opposite to the amount of changes required to fix that bug.” Not always true, but once in a while, it bites you hard.

Advertisements