Thoughts on good software: Logging: Structuring debug output

When we begin learning to program, we all write debugging output like

printf("found string\n");

or, in the parlance of our times,

System.out.println("found string");

For a certain time, this sort of debugging output is enough. We remember where we put that output statement.

Things get harder when the code grows beyond a few hundred lines, or when more than one developer is working on it. Someone else just does not understand what train of thought the message comes from. Especially hard to understand are low-level control flow debugging outputs like AAAABACCDDD where each letter means that one specific branch of an if statement or other control flow statement has been taken (don't laugh, I've seen this sort of output in a release version of Acrobat Reader on Linux). The more your application grows, the more you are in need of a fundamental mechanism: Logging.

Logging should be the first thing you lay out before writing the first debug message of a code. A good log message consists of at least 5 elements per line of output:

WHEN was the message caused?
WHAT happened?
WHO caused the message (which object/method/function)?
WHICH THREAD ran the code?
HOW SEVERE was the message in the context of the task to accomplish?

(This only sounds like the rules when calling 911.)

Our little example from above may now look like this:

21 Jun 08 11:24:03.434 INFO [search_thread] TextSearch::searchTopic(): Found topic 'Jungle'

Let me explain why each element of this line is important.

The WHEN

As irrelevant as it may seem at first, the point of time a log message is caused is extremely important. Especially in multithreaded or distributed applications with the danger of race conditions (two or more threads or processes concurrently racing to use a data field / ressource) you have to know the exact point of time a thread tried to access something. Depending on your application, millisecond precision may be important. You can also detect time wasting in your code that way ("Gee, that call really takes 100ms? Why?")

Time runneth fast. Always time the log message at the very beginning of your log method. If you write all sort of code before (like string formatting, output preparation etc.), a context switch may already have cost you an unknown amount of time, and the "real" point of time of the log message is lost.

A very hard challenge is synchronizing log messages between different computers. Although NTP and time synching has been around for quite some time, synching of system clocks is still not precise to the millisecond. If you are on a fast local network, you can use a broadcast message over a network to cause a message at one point of time. Even if you have diverging clocks, you can then find out the leap between all clocks because the message was received at almost the same millisecond.

The WHAT

All log mechanisms allow one or another sort of output formatting where the language does not offer on-the-fly string building like in Java. In C/C++ the usual way is offering printf-style formatting with varargs. A call may then look similar to a printf call:

log("Now looking for text %s",searchstring);

Good log messages are like good algorithms: They are clear to read and describe what is being done. Obscure log messages, like obscure algorithms, may look ingenious, but are usually a pain to read and to maintain. Be clear when formulating log messages. Neither be too short ("Value: "+acceleration) nor too chatty ("The algorithm is just now processing the value acceleration which is of value "+acceleration). One rule of thumb is: Values alone are confusing. Always attach the name of the variable to it, and a unit if possible ("acceleration=3.5 m/s")

The WHO

You definitively want to know which function or method is sending you a log message. In C/C++, you can automatically insert the function/method name by using compile-time directives like __PRETTY_FUNCTION__ (gcc) or __FUNCTION__ (Visual Studio compiler). These directives contain the current function (try printf(__PRETTY_FUNCTION__);) With another preprocessor wrapper like

#if defined(WIN32)

#define __PRETTY_FUNCTION__ __FUNCTION__

#endif

#define LL(l) Logging::l,__PRETTY_FUNCTION__

you can avoid to have to write the long directive in each log message. LL(loglevel) will evaluate to 'Logging::loglevel,__PRETTY_FUNCTION__' and thus fill two of the arguments of a call to the log function:

log(LL(SEVERE),'test');

would satisfy a log(loglevel,method,string) function with three arguments , and the current method or function name would be inserted automatically. (The Logging class has to have constant values for the log levels as static variables to make the example work of course.)

Java offers the java.util.Logging class where you have to enter the method name yourself. And there's LOG4J of course.

If you have to keep track of which object executes what method, use either its name (with uniquely named objects) or its unique pointer if you can get hold of it (this is easy in C/C++ but may be difficult in other languages).

The THREAD

Ten years ago most software was written as one long singular control flow. If the code did something, it was done in a strictly linear way. Only sometimes a worker thread was started so that the GUI could still be handled while calculating something.

Although there are some applications that still work in this peaceful way, most modern applications have an abundance of threads. In such an environment it is of key importance to know which thread caused a log message, especially when multiple threads of the same kind are running. If possible, use named threads and make sure that all thread names are unique. In Java and C# naming threads is a standard feature, while in pthreads, which many portable C/C++ applications use nowadays, you have to use TLS (thread local storage) to attach a name to a thread.

The SEVERITY

Log messages vary a lot in terms of importance. All serious log mechanisms take this into account and require an importance measure be attached to each message: the log level. Names of log levels vary, but the general layout is usually like this:

ERROR: a message that MUST be shown to the user, and which is fatal
WARNING: a message that something has gone wrong which the user should know. Non-fatal.
INFO: a message of an informative character that the user should know about.
DEBUG: messages only important to the developer. Filtered in production code.
TRACE: verbose debugging output that is usually annoying even to the developer except in special cases. Filtered in production code.

Log mechanisms vary in their number of loglevels. Some add a FATAL level above ERROR, others use more fine-grained log levels at the DEBUG or TRACE level. This is a designer's choice.

Most GUI applications also include logging, but they strictly hide it from the end-user. Then of course appropriate GUI action has to be taken to inform the user.

Design details of a logging mechanism

One beginner's mistake when building a log function is using methods in it that issue log messages themselves. Depending on the use of mutexes (none/recursive vs. non-recursive) this either causes a stack explosion by infinite recursion or a deadlock (can't lock the log mutex ever because the log method is locking it already) Be very careful when writing the log output/transmission function when you have no way of determining whether a logging caused it itself. A simple (yet not thread-safe) way to avoid recursion is using a static variable logging_in_progress that is set to true whenever a logging is just going on. If the log method is called while logging_in_progress is true, it immediately returns. In a multithreaded application, this behavior is of course not enough. We'll come to that in a minute.

The log queue

Multiple threads may cause a lot of messages at almost the same time. As a log mechanism usually has to use a mutex to protect the integrity of its output, it can become a needle eye in your code. One log message may be output on a terminal, delivered over network connections and entered into a database. Much too long for one thread to lock a mutex.

The technique to reduce the cost of sending log messages is using a special log thread for the actual output/transmission of log messages and a stub log() method that just appends log messages to a log queue. The log thread basically does two things in an endless loop:

Look for log messages
If log messages in the queue, output/transmit them

and so on. Both methods, the log sender and the log receiver, need to be extremely carefully designed. The time spent locking the log queue mutex must be kept as low as possible at any cost. It should go without saying that the timing of the log message must be done in the log sender method, not in the log thread.

Extra care must be taken that the log thread does not starve because other threads take over all the CPU time. In massively multithreaded applications with hundreds of log messages every second, care has to be taken that the log queue does not grow infinitely and eats up all of the address space available to the application. A way to avoid trouble is dropping older log messages (possibly retaining ERROR and WARNING messages) in an overflow situation.

Instead of checking a logging_in_progress static variable, you have to make sure the log thread never queues in log messages in its own queue. A good way to ensure this is checking for the logging thread's unique name when using named threads.

The log queue is so important in multithreaded applications that, in my opinion, it should be declared a design pattern. Conversely, the non-queued logging should be declared an anti-pattern for multithreaded applications because it is a real time waster in all but the most basic situations (everything more complicated than just issuing a printf() statement).

Separating content from layout

For web pages, content management systems separate the content to be displayed and the actual layout. It is a good rule to do the same to log messages: Keep them in a structured way as long as possible, i.e. transform them to a flat string as late as you can. If your log method just builds a flat string, structured entry of log messages e.g. in a database becomes a real pain. If you use a log queue, make sure it uses structured data fields so you can easily process them in the log thread.

Conclusion

Logging is a key element in successful software debugging. It helps all developers and should be designed and used carefully. A good log mechanism implements log levels, thread debugging (where needed) and log message formatting. Log messages should be kept short and informative. Good logging output gives any developer a good understanding of what the code is trying to achieve and how it has succeeded achieving its goals.

Thoughts on good software

Sonntag, 22. Juni 2008

Logging: Structuring debug output

1 Kommentar:

Blog-Archiv

Über mich