Dynamic Logging - Log Level Per Request

15 May 2018 on logs, operability, pickaroon

Whilst checking out the latest Thoughtworks Tech radar I noticed a technique that had been on my mind recently as well: Log Level per request.

Logging is a vital tool for operations, without appropriate logging you're blind, but you always have to balance granularity and detail vs storage costs/overheads. The benefits of logging and why you should log more are best served for (several) other blog posts, for now I want to explore what varying Log Levels per Request can look like, and whether any log libraries actually support this technique.

I'm compiling a list of libraties that support Log Levels per Request and will update this post over time, so please tweet me if you have any libraries you know of!

The General Principle

To balance storage contraints when logging we'll often restrict the amount we log, often by filtering based on log levels. These log levels will often look something like this:

DEBUG - Low level, extremely detailed log entries, would often have tens per request, such as starting operation x, completing operation x, calling Database Query y.
INFO - Do not indicate any level of fault within the system, can indicate standard operation feedback such as scheduled tasks being triggered or completing, requests being handled, startup/shutdown operations.
WARNING - This is something to be aware of, but you dont necesarily need to wake anyone up about it.
ERROR - Something's wrong and someone would need to be woken up to fix it.

In production it's extremely rare the log level of Debug would be enabled, as the number of logs per request would be extremely high, if you're lucky you can cope with the load from having INFO level enabled, but that may not be the case. Which often leaves a lot of systems I've seen only logging Warning/Errors. This is generally fine for detecting something goes wrong in a way you've predicted, but it's rubbish for diagnosising unknown error states. This aspect of a system - the ability to observe how it's operating, not how it's failing in known ways - has become a more popular are of late, termed a system's Observability. I've encountered systems where it's hard to determine this, something goes wrong and you end up having to add logging in iteratively until you start to zero in on the problem, I imagine you may have similar experiences.

Observability good then, but balancing it within a high traffic system can be challenging, this is where being able to dynamically adjust the level you're logging becomes key. With HTTP services the perfect method is available in Request Headers, e.g.:

X-Log-Level: DEBUG

The logger would need to check this header when logging an item, with languages where functions are a first class citizen (e.g. JS, .NET, etc) this can be quite simple:

But the problem is it needs to be built into the logging framework. And most log framework's I've encountered treat log levels as static entities, or at best something that be modified for the entire system dynamically, not for specific requests (helpful, but would still result in A LOT of logging potentially).

Distributed Systems, Tracing and More

Logging and Observability becomes even more critical in distributed systems. Correlation or Request IDs are extremely helpful in linking requests between systems so that all logs can be seen as one - what error sent to user was caused by a lower level microservice?

For this purpose request-id's are passed between services, in this case I'd recommend passing the request log level header as well, so that the expanded logging propagates for a request to all depedant services, not just the first contacted endpoint.

Of course the example of varying by explicit header is just one possible application, log level could be adjusted in this manner via cookie (include all users in an A/B test), or via random number generation (high detail logging for 40% of users), or more. The only usecase that would be particulary difficult to solve would be a common APM implementation: log outliers, as this would require the logs to be buffered outside of the application.

As mentioned in the ThoughtWorks Tech Radar, this whole area also links in with Open Tracing project - which is defintely worth checking out for more information.

In the wild

No libraries I've encountered in NodeJS or .NET support dynamic log levels, if you know of a library in any language that supports the technique please contact me or tweet me.

NodeJS

No popular libraries (winston/bunyan, etc.) support the technique, but:

Pickaroon/pickaroon-express

Disclaimer: I wrote this to experiment with dynamic log levels and other logging techniques.
https://github.com/tegud/pickaroon / https://github.com/tegud/pickaroon-express
I started writing Pickaroon as a way of trying out some more advanced logging techniques around enforcing consistent log field names, request context and dynamic log level, for this reason the base library can accept a function as a log level, and the pickaroon-express library sets the log level from the x-log-level header (the code above is from that library).

.NET

No libraries support Request Log Level, but serilog can change the log level during runtime and Matthew Skelton blogged in 2012 above changing the log level of individual log lines in the codebase by configuration.

Serilog

https://serilog.net/
Serilog can't do request based log levels, but it can change the log level of the entire system dynamically.

tegud.net