Thoughts on Logging Strategy
Like many things in programming, doing a decent job logging isn't rocket science. And yet we've all seen it done badly so many times. Logging is often a critical part of your application. Fellow developers, here are a few thoughts to help you get it right and get some love from your support teams (and customers).
Backing up - why are we logging?
- Monitor the application lifecycle: Administrators need to know if the application is up, initialized and ready to go. They also want to see that it shuts down cleanly.
- Monitor errors: Serious system errors need to be logged for administrators and support. Stacktraces and other relevant details should be included in each problem report. The standard for ERROR logs should mean that action is necessary by the administrator.
- Debugging: Developers add debug logs to help see the state of the application. These logs should not be enabled in production and are likely meaningless to customers. Customers should not rely on these logs for any reason - they may change or go away.
- Tracing: When debugging particular use cases, administrators, support and developers want to trace the execution of the program. This means entry and exit messages at the method level. Ideally, this is not something you have to hand-code. Don't do this in production unless you have a way to trace just the transaction in question lest you kill performance.
- Checkpointing: A special case of tracing that is customer focused. Well-known status logs are written at important points in the execution of certain use cases. These logs are documented and map to documented use cases.
- Analytics: Customers will use the log files to find trends in user and system behavior. See "Big Data".
Good ideas in logging
Customers should control log configuration
Give customers control over what is logged and how. You will not be able to anticipate all their needs. Typically this means a log config file that controls what is logged, at what level and to which file (or other target). Allow this file to be changed at runtime - don't force them to restart the application to change the log config. If you're using a well-known logging framework (e.g., Log4j), customers will know how to configure logging - don't abstract this away in an attempt to "simplify" log control.
Quiet logs in production
In production, administrators want to see lifecycle events and serious errors - and nothing else. If they want more, they need to change the log configuration. Log files can get very large and become hard to analyze. Excessive logging can seriously slow a system. So minimal logs should be the default.
Rotate log files
Make sure the log files can be rotated based on time or size to avoid creating monstrously large files. Don't fill up a disk and crash the application with a run away log file. Customers should be in control of the parameters for rotation.
Well-defined log levels
Use a set of distinct, standardized log levels. Many apps use the following:
- ERROR indicates a system error (not a user error) and that administrator action is necessary. The log should indicate the action to take. Perhaps support should be notified.
- WARNING means there is a condition that will degrade functionality or performance. Administrators may want to take action. Logs should indicate an action to take (if any).
- INFO indicates lifecycle events like startup, shutdown and configuration changes. The absence of a particular INFO event may indicate trouble for administrators. Normal use by end users should rarely result in INFO logs.
- DEBUG is for developers only. Customers shouldn't ever need to turn this on.
Don't depend on debug logs
Customers should never need debug logs and should never depend on them. Other mechanisms should help customers configure their systems. Tracing/checkpointing tools are helpful for debugging. The application or accessory programs should help them validate and test their configuration.
Don't force customers to turn on system-wide debug logs to get a grip on a specific failure. Provide them a way to trace and debug execution for a single user, an IP address, a specific URL, a single use case, etc. This will allow them to trace without killing performance or generating too many log entries.
Allow customers to adjust log levels for specific log sources ("categories" or "loggers" in log4j terminology) . Often the category is the file or class producing the log. This is handy for developers, but not always best for customers. Consider adding categories for customer-focused concepts like "security" or for system components that the administrator understands like "caching".
System errors vs. User errors
Users provide bad inputs all the time. These are not problems that the administrator needs to correct and should not be logged as ERRORs. Operation Measures (e.g., event counters) give administrators a view of the number and type of user errors. Checkpointing and tracing mechanisms help administrators debug bad inputs.
Logs are an attack vector
Logs should never contain sensitive information like passwords. Even debug logs have to obey this rule if customers can turn them on. Customers should be able to share logs with support without fear of exposing sensitive info.
Log the system's environment
To help support, log the system's environment at startup:
- OS type and version
- Software version
- Host name, IP
- Versions of supporting software (e.g., Java)
- Relevant environment variables, system properties
- Locations of config files
- Values in config files
Number of log files
Not all logs should be written to a single file. Trace and debug messages often belong in a separate file from lifecycle logs. The reason is that these logs have different audiences. Admins want to lifecycle info and errors. Support wants to see the tracing for a specific use case. If you use multiple logs files, be very clear about which files support will need. Consider including a tool to bundle and send the files to support.
When logging an error, give a short, stable id as well as a descriptive message like "COMPX01234 Missing password, authentication failed." The "COMPX01234" value makes communication easier between administrators, support and developers. Be careful re-using error identifiers. Ideally, they should lead developers to a single source code location.
Especially in clustered or distributed environments, customers may want to aggregate their logs to a central place for analysis. Many logging frameworks allow logging to syslog, JMX, a database or a network pipe. Allow customers the flexibility to configure these features.
Log entry structure
Both humans and machines will be reading the log files. You need enough structure to support machine analysis, but the files should also be readable by human beings. A line-per-log format helps very much when using grep and search tools. XML is amenable to machine search, but is terrible for humans to read.
What goes in each log entry?
- Datetime - To the millisecond
- Thread Id
- Transaction or Session Id - Allows you to distinguish between transactions. In a distributed system, use the same transaction id across components.
- Source code location
- If error, Error Id
- A descriptive message - Take care to include the context. If this is a NumberFormatExecption, include the value that failed to parse as a number.
Too many debug logs are a code smell
I've worked on systems that had huge numbers of debug logs. They were all added by well-intentioned developers, but they cluttered both the source and log files. I think this may be symptomatic of a team that doesn't utilize unit testing and good debuggers sufficiently. Developers should consider removing the truly extraneous logs before check-in.
Logs are not a complete monitoring solution
Many aspects of the system should be visible to administrators without log analysis. Monitoring systems should be able to give them the current system state. Operational measures (e.g., counters and statistics over them) and Alarms are other ways to indicate system state. For example, an Alarm may indicate the connection to a back-end system is down. An Operational Measure can indicate how many times the connection has dropped in the last hour. Of course, it still might be appropriate to log the "connection down" event for auditing purposes.
What have I missed?
I'm a developer, though an experienced one who has shipped a number of servers. Admins, DevOps guys - what am I missing? What advice would you give developers about logging in enterprise-y environments?
Announce: mk-project 1.5
- Previously, a project's compile command could only be a string to be interpreted as a shell command (think "make"). Now the compile command can also be a Emacs lisp function name. Flexibility is good.
- The project-status command now prints the project status to buffer *project-status* instead of *Messages*.
- Several commands now have optional arguments which allow them to be called programmatically as well as interactively. The affected commands are project-load, project-grep, and project-ack. So now you can call
(project-grep "TODO")from your own code.
- The code will now ignore "_darcs" directories if you set the vcs type to 'darcs.
- Finally, mk-project now has a menu!
Emacs: Custom columns for Ibuffer
Emacs 23.2 is out! To celebrate, I'll write (yet) another Emacs post. Please forgive the fact that the content has nothing to do with version 23.2 in particular (you can read about all new stuff in the NEWS file (C-h n)).
I'm very fond of the Ibuffer package. It lists all your buffers, and allows you to sort, filter, and group them as you please. You can operate on multiple buffers by marking a subset and then calling a command (think "delete" or "replace-regexp"). Amazing! Ibuffer's got it all.
I was reading Ibuffer's source to see if I could add my own
information to the buffer listing. Specifically, I wanted to
indicate that a file was part of my "project" (as defined
by mk-project). Ibuffer
did not disappoint. It was trivial to create a new custom column with
In the "screenshot" below, I have a mk-project defined which includes all the files in my ~/elisp directory. You'll see a "p" next to each of these files.
MRP Name Size Mode Filename/Process --- ---- ---- ---- ---------------- [ Default ] z.clj 2210 Clojure ~/z.clj tmp.txt 2915 Text ~/tmp.txt p projects.el 3127 Emacs-Lisp ~/elisp/projects.el p mk-utils.el 9968 Emacs-Lisp ~/elisp/mk-utils.el p basic.el 7245 Emacs-Lisp ~/elisp/basic.el ...
In the code below, I create a column
mk-proj-col, the column heading of which is simply
"P", and if a particular buffer is a part of my project, it will print
"p" next to it.
(defun mk/proj-buffer-p (b) "Is the buffer `b' part of the project?" (and mk-proj-name (or (mk-proj-buffer-p b) (string= (buffer-name b) mk-proj-fib-name) (string= (buffer-file-name b) mk-proj-tags-file)))) (define-ibuffer-column mk-proj-col (:name "P") (if (mk/proj-buffer-p buffer) "p" "")) ;; Define 3 formats, each including the new mk-proj-col ;; column. Switch formats with ibuffer-switch-format (bound to "`"). (setq ibuffer-formats '((mark modified read-only (mk-proj-col 2 2 :left :elide) " " (name 30 30 :left :elide) " " (size 9 -1 :right) " " (mode 16 16 :left :elide) " " filename-and-process) (mark modified read-only (mk-proj-col 2 2 :left :elide) " " (name 45 -1 :left) " " filename-and-process) (mark modified read-only (mk-proj-col 2 2 :left :elide) " " filename-and-process)))
And it was even easier to define a filter command that displays just the files in my project.
(define-ibuffer-filter project "Toggle current view to buffers in the defined mk-project." (:description "mk-project") (mk/proj-buffer-p buf)) (define-key ibuffer-mode-map (kbd "/ k") 'ibuffer-filter-by-project)
Announce: mk-project 1.4.0
- project-find-file-ido now displays relative filenames (relative to the project directory) rather than the absolute filenames. This cuts the visual clutter way down.
- A brand new project function, project-multi-occur, searches all open project files for a regex using multi-occur.
- A message is printed when the project is finished loading.
- A new README file provides details of all the configuration options.
- Now quoting filename arguments used in find commands. Filenames with spaces, for example, now won't break the find commands we use to generate TAGS or the index file.
Announce: mk-project 1.3.1
Version 1.3.1 of mk-project.el is available from github or the Emacs Wiki. David Findlay reported that project-grep and project-ack where running from the project basedir's parent directory, not the project basedir. This behavior was possible on certain versions of Emacs if the project's basedir didn't include a trailing slash -- like this:
(project-def "test-project" ((basedir "/home/me/test-project") ; no trailing slash ... ))
The new version ensures that the trailing slash is appended to basedir when it is used as the default-directory value.
Thanks much to David for the bug report and helping me test potential solutions!
Announce: mk-project 1.3
I'm pleased to announce the release of version 1.3 of mk-project. It offers several new features and a bug fix.
Feature: Custom Find Commands
Mk-project uses a "find" command in several scenarios: finding files to pipe to etags (project-tags), finding files to pipe to grep (project-grep) and finding files to index (project-index, project-find-file). Until this release, the find commands used in these situations where calculated using a combination of the "basedir", "src-patterns", "ignore-patterns" and "vcs" project settings. As nice and simple as this scheme was, there was no way to customize these find commands for more complicated project structures. For example, I'm currently working on a project with a very large $basedir/thirdparty directory that I do not want to include in TAGS, grep actions or the project index. With version 1.3, I can specify custom find commands that will omit the thirdparty directory:
(project-def "big-project" `((basedir "~/big-project") (src-patterns ("*.java")) (ignore-patterns ("*.class")) (src-find-cmd ,(concat "find ~/big-project \\( -path ~/big-project/thirdparty -prune \\) -o " "\\( -type f -name \"*.java\" -o -print \\)")) (grep-find-cmd "find . -type f | egrep -v thirdparty ") (index-find-cmd (lambda (context) (concat "find ~/big-project " "\\( -path ~/big-project/thirdparty -prune \\) -o -print"))) ...))
As you can see, there are 3 new "-find-cmd" project settings. The values can be simple strings specifying a "find" command or a function of 1 argument that returns the find command. The argument will be 'src, 'grep or 'index as appropriate, which allows you to write a single function to generate all 3 find commands if you'd like.
Feature: Relative paths in TAGS files
If your tags-file is located in your basedir (directly in the basedir, not a subdirectory of basedir), the generated TAGS file will now use relative file names. This makes the TAGS file portable. For example, if you copied the basedir to a new location, you could copy the TAGS file to the new directory and it would work without modification.
Feature: Custom ack command name
You can customize the ack-command name for your system. It defaults to "ack.pl" (or "ack" on Windows systems).
Fixed issue #1: project-ack does not use the "confirmed" command.
I've pushed mk-project version 1.2.1 to both the Emacs Wiki and github. It fixes a bug that was preventing mk-project from killing dired buffers open to a project's basedir (or a subdirectory of the basedir) when the project was closed (and the user elects to close all project files). You can now also expect dired buffers belonging to the project to be restored when the project is re-opened.
Thanks much to email@example.com for the bug report and code.
For more info on mk-project, see its homepage.
Switching to Atom from RSS
I've added an Atom feed for this blog that carries complete xHTML content. The old RSS feed carries just simple text. I'll leave the RSS feed in place for the time being, but I will likely pull the plug on it in the future.
Both feeds should be auto-discoverable by your browser.
etags-update: update TAGS when saving a file
Here comes another Emacs post!
While we're on the subject of TAGS, I recommend etags-select.el which will show a menu of matching tags when you run etags-select-find-tag (which I've bound to M-.). Very handy.
Technorati tags for this post: emacs