Where to log

Related to Opinionated guide on how to write a log

I would like to write in more detail about where to put logs, the benefits and disadvantages of each location, and how to mitigate the drawbacks.

For context, I assume we're developing web services, whether publicly accessible or for internal use. The protocol doesn't matter: HTTP with JSON, gRPC, Thrift, etc. I think this guide can be adapted to all of these technologies.

So, where to put logs?

1. Incoming requests – aka. access log

Usually, there are two types of requests: read and write. You can write a middleware, interceptor, AOP, or whatever is suitable for your situation and log all necessary data such as HTTP request data, response status, process time, caller information, controller and handler name, etc.

This is very useful in many situations:

Some people take it to the extreme by always logging the entire request and response body. It depends on the context and the company, but I suggest logging those only when there is a serious error such as an HTTP 500. That way you avoid violating user privacy, reduce security risk, and save cost.
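To make the idea concrete, here is a minimal sketch of an access log written as a WSGI middleware in Python. The field names, the logger name "access", and the X-Caller header are my assumptions for illustration, not a standard. Note that it measures time until the app returns its response iterable, so a streaming response would need more care.

```python
import json
import logging
import time

access_logger = logging.getLogger("access")


class AccessLogMiddleware:
    """Wraps a WSGI app and writes one JSON log line per request."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        started = time.perf_counter()
        seen = {"status": 500}  # assume failure until the app reports a status

        def recording_start_response(status, headers, exc_info=None):
            # status looks like "200 OK"; keep only the numeric code
            seen["status"] = int(status.split(" ", 1)[0])
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, recording_start_response)
        finally:
            entry = {
                "method": environ.get("REQUEST_METHOD"),
                "path": environ.get("PATH_INFO"),
                "status": seen["status"],
                "duration_ms": round((time.perf_counter() - started) * 1000, 2),
                # hypothetical header carrying the caller's identity
                "caller": environ.get("HTTP_X_CALLER", "unknown"),
            }
            level = logging.ERROR if seen["status"] >= 500 else logging.INFO
            access_logger.log(level, json.dumps(entry))
```

You would wrap your application once, for example app = AccessLogMiddleware(app), so every handler gets the same log line without any extra code.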

2. At the end of handlers that are not read-only

You should write a log to record an event (what happened) at the end of every handler that is not read-only, such as POST, PUT, PATCH, and DELETE in HTTP, or the equivalent in the RPC technology you are using.

I don’t suggest doing this on read-only handlers because most systems are very read-heavy, and you already have the access log from above, so it doesn’t add much value to do that on all handlers.

What should be logged on those non-read-only handlers?

To log that much information, a structured log format such as JSON is a must.

The benefit of doing this is that you can build real-time business metrics to show to all stakeholders. It is pretty exciting to feed in the data, visualize it, and see the impact of what you did in real time.
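As a rough sketch of such an event log, the Python snippet below emits one JSON line at the end of a write handler. The handler create_order, the event name, and the fields are all hypothetical; pick whatever fields your stakeholders need.

```python
import json
import logging
from datetime import datetime, timezone

event_logger = logging.getLogger("events")


def log_event(name, **fields):
    """Emit one JSON line describing a business event and return the record."""
    record = {"event": name, "at": datetime.now(timezone.utc).isoformat(), **fields}
    event_logger.info(json.dumps(record, sort_keys=True))
    return record


def create_order(user_id, items):
    """Hypothetical POST handler: do the write, then record what happened."""
    order = {"id": "ord-1", "user_id": user_id, "item_count": len(items)}
    # ... persist the order here ...
    log_event(
        "order_created",
        order_id=order["id"],
        user_id=user_id,
        item_count=order["item_count"],
    )
    return order
```

Because every line is a JSON object with an "event" key, a log pipeline can count events by name and chart them without parsing free text.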

3. When a handler gets an error or exception

The access log with its response status is not enough to investigate an error or exception on a request. Handling errors and writing a log in each handler is very helpful for knowing what went wrong with that request.

You should log the error message, the stack trace, why the error happened, and in which block of code. Don't write a generic error log or catch a generic exception, because you won't get enough detail to understand the problem.
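Here is a small Python sketch of what I mean. The InsufficientFunds error and the withdraw handler are made up for illustration. The point is that the handler catches a specific exception and logs the context together with the stack trace.

```python
import logging

logger = logging.getLogger("handlers")


class InsufficientFunds(Exception):
    """Hypothetical domain error raised by the business logic."""


def withdraw(balance, amount):
    if amount > balance:
        raise InsufficientFunds(f"balance={balance} amount={amount}")
    return balance - amount


def handle_withdraw(account_id, balance, amount):
    try:
        return {"status": 200, "balance": withdraw(balance, amount)}
    except InsufficientFunds:
        # logger.exception logs at ERROR level and appends the stack trace
        # of the exception currently being handled.
        logger.exception(
            "withdraw rejected: insufficient funds account_id=%s amount=%s",
            account_id,
            amount,
        )
        return {"status": 422, "error": "insufficient_funds"}
```

Catching InsufficientFunds instead of a bare Exception keeps unexpected errors visible, and the message says why the request failed rather than just "error".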

4. When the system calls external dependencies

In a microservice architecture, it is common for your system to call the APIs of other services. Recall that there are two types of requests: read and write. If you send a request to get data back without any side effects, it is a read, which you may also call a query request.

Since your system is the client calling another service, I suggest you write a log for every request sent out from your service. Again, the information to log is much the same as in the access log.

There are disadvantages to logging every call, especially for query requests. The most significant drawback is overhead. Around 80% of requests are query requests; logging all of them will cost you storage and compute power and will increase response time. In this case, metrics are a better solution, as you can count the number of requests and record a duration histogram.

That said, if a query request fails, I still suggest logging it for investigation. You may also apply log sampling or a rate limit to mitigate log flooding in case 100% of requests are errors.
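A minimal sketch of that combination in Python could look like this: count every outbound call as a metric, and log only failures, capped by a simple fixed-window rate limit. The Counter is a stand-in for a real metrics client, and call_dependency, RateLimitedLog, and the limit of 5 lines per second are names and numbers I made up for the example.

```python
import logging
import time
from collections import Counter

logger = logging.getLogger("outbound")
request_counts = Counter()  # stand-in for a real metrics client


class RateLimitedLog:
    """Allows at most `limit` log lines per `window` seconds."""

    def __init__(self, limit, window=1.0, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.window_start = clock()
        self.count = 0

    def allow(self):
        now = self.clock()
        if now - self.window_start >= self.window:
            # a new window begins, so reset the counter
            self.window_start, self.count = now, 0
        self.count += 1
        return self.count <= self.limit


error_log_limit = RateLimitedLog(limit=5)


def call_dependency(name, func, *args):
    """Call an external dependency; always count it, log only failures."""
    started = time.perf_counter()
    try:
        result = func(*args)
        request_counts[(name, "ok")] += 1
        return result
    except Exception:
        request_counts[(name, "error")] += 1
        if error_log_limit.allow():
            elapsed_ms = (time.perf_counter() - started) * 1000
            logger.exception("call to %s failed after %.1f ms", name, elapsed_ms)
        raise  # let the caller handle the specific error
```

The metric still counts every error even when the log line is dropped, so you keep the true error rate while the log volume stays bounded.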

5. After the system did something important

Even though you have event logging at the end of handlers, there is still a gap: work that runs outside of handlers, such as cron jobs, watchdogs, and housekeeping tasks. Those are not logged yet, and in many cases it is useful to know what the system did so you can correlate events when something bad happens.
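One way to cover this gap, sketched in Python, is a small context manager that logs when a background job starts, finishes, or fails. The name logged_job and the fields (job name, host, duration) are my assumptions about what is worth recording.

```python
import logging
import socket
import time
from contextlib import contextmanager

job_logger = logging.getLogger("jobs")


@contextmanager
def logged_job(name):
    """Log the start, the end, and any failure of a background job."""
    host = socket.gethostname()
    started = time.perf_counter()
    job_logger.info("job started name=%s host=%s", name, host)
    try:
        yield
    except Exception:
        job_logger.exception("job failed name=%s host=%s", name, host)
        raise
    else:
        duration_ms = (time.perf_counter() - started) * 1000
        job_logger.info(
            "job finished name=%s host=%s duration_ms=%.1f", name, host, duration_ms
        )
```

A cron task would then run inside "with logged_job("cleanup"):", and the start line from each host carries a timestamp you can line up against a resource spike.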

I once worked on a system where cron jobs ran in the same process as the web server. The application was deployed to many physical servers. When it was time to run a cron job, every server triggered it simultaneously, which caused a sudden spike in system resource usage and database load.

I was lucky: once I looked at the centralized log system around that time, it was obvious that the cron job was the problem.

Here's a list of what should be logged

6. When you want to record what happened

You can log anywhere in the code, so don't limit yourself to logging only inside handlers. I usually log security-related information, such as someone trying to log in with the wrong password, requesting an OTP, or trying to delete a record without permission.

You can also log for audit purposes: database records with their before and after values, login success, email verified, etc.

This might sound like a duplicate of No. 2, but it is not. Since a single request can perform many tasks, the log at the end of the handler records the overall event. This kind of log is more granular, so it records what happened within a single request.
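For the audit case, here is a minimal Python sketch of a before and after log for a database record. The helper names and the entry layout are hypothetical. It records only the fields that changed, which keeps the log small; you would still want to exclude sensitive fields such as passwords.

```python
import json
import logging

audit_logger = logging.getLogger("audit")


def changed_fields(before, after):
    """Return {field: {"before": x, "after": y}} for every field that differs."""
    keys = sorted(set(before) | set(after))
    return {
        key: {"before": before.get(key), "after": after.get(key)}
        for key in keys
        if before.get(key) != after.get(key)
    }


def audit_update(actor, entity, entity_id, before, after):
    """Write one JSON audit line for an update and return the entry."""
    entry = {
        "action": "update",
        "actor": actor,
        "entity": entity,
        "entity_id": entity_id,
        "changes": changed_fields(before, after),
    }
    audit_logger.info(json.dumps(entry, sort_keys=True))
    return entry
```

A handler that updates several records can call audit_update once per record, while the log at the end of the handler still records the overall event.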

Closing

I have found that logging is more helpful than metrics when you need very detailed information, but it comes with costs for writing, transit, processing, and storage. For a system under heavy load, you should reduce the number of logs by using metrics or by sampling the logs. But for non-read-only requests, you should always write logs, as they pay off when investigating problems and when building real-time business metrics for monitoring.

These guidelines on where to log are what I think you should always do. Beyond that, feel free to put a log at any level in any location where it might be helpful to you.

#codestyle