Welcome to Matt Keller's little home on the web. This site is a work in progress, a labor of love, a galactically large time suck. If something is broken then I'm certainly the one who just broke it. Which means I'm probably improving something. Alas.
Last 5 Blog Entries:
Thoughts on Logging Strategy
Like many things in programming, doing a decent job logging isn't rocket science. And yet we've all seen it done badly so many times. Logging is often a critical part of your application. Fellow developers, here are a few thoughts to help you get it right and get some love from your support teams (and customers).
Backing up - why are we logging?
- Monitor the application lifecycle: Administrators need to know if the application is up, initialized and ready to go. They also want to see that it shuts down cleanly.
- Monitor errors: Serious system errors need to be logged for administrators and support. Stacktraces and other relevant details should be included in each problem report. The standard should be that an ERROR log means administrator action is necessary.
- Debugging: Developers add debug logs to help see the state of the application. These logs should not be enabled in production and are likely meaningless to customers. Customers should not rely on these logs for any reason - they may change or go away.
- Tracing: When debugging particular use cases, administrators, support and developers want to trace the execution of the program. This means entry and exit messages at the method level. Ideally, this is not something you have to hand-code. Don't do this in production unless you have a way to trace just the transaction in question lest you kill performance.
- Checkpointing: A special case of tracing that is customer focused. Well-known status logs are written at important points in the execution of certain use cases. These logs are documented and map to documented use cases.
- Analytics: Customers will use the log files to find trends in user and system behavior. See "Big Data".
Good ideas in logging
Customers should control log configuration
Give customers control over what is logged and how. You will not be able to anticipate all their needs. Typically this means a log config file that controls what is logged, at what level and to which file (or other target). Allow this file to be changed at runtime - don't force them to restart the application to change the log config. If you're using a well-known logging framework (e.g., Log4j), customers will know how to configure logging - don't abstract this away in an attempt to "simplify" log control.
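The post is framework-agnostic, so as a sketch, here is what runtime-changeable configuration can look like with Python's stdlib logging module (whose design descends from Log4j). The dict stands in for the customer-editable config file, and `apply_config` is a made-up helper name:

```python
import logging
import logging.config

def apply_config(level_name):
    """Re-apply logging configuration. In a real app the dict would be
    loaded from a customer-editable file and re-applied when it changes,
    with no restart required."""
    logging.config.dictConfig({
        "version": 1,
        # Keep loggers created before this call working.
        "disable_existing_loggers": False,
        "root": {"level": level_name},
    })

apply_config("WARNING")                   # quiet default for production
root = logging.getLogger()
print(root.isEnabledFor(logging.INFO))    # False: INFO is filtered out

apply_config("DEBUG")                     # customer turns up verbosity at runtime
print(root.isEnabledFor(logging.INFO))    # True: no restart needed
```

Log4j's `PropertyConfigurator.configureAndWatch` offers the same "reload on change" behavior for Java shops.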
Quiet logs in production
In production, administrators want to see lifecycle events and serious errors - and nothing else. If they want more, they need to change the log configuration. Log files can get very large and become hard to analyze. Excessive logging can seriously slow a system. So minimal logs should be the default.
Rotate log files
Make sure the log files can be rotated based on time or size to avoid creating monstrously large files. Don't fill up a disk and crash the application with a runaway log file. Customers should be in control of the parameters for rotation.
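As a sketch of the same idea in Python's stdlib (Log4j's RollingFileAppender is the analogous facility); the size and count values here are illustrative and would really come from customer config:

```python
import logging
import logging.handlers
import os
import tempfile

# Rotation parameters like these belong in customer-editable config,
# not hard-coded; the values here are only illustrative.
log_path = os.path.join(tempfile.mkdtemp(), "app.log")
handler = logging.handlers.RotatingFileHandler(
    log_path,
    maxBytes=10 * 1024 * 1024,  # start a new file at ~10 MB
    backupCount=5,              # keep app.log.1 .. app.log.5, then delete oldest
)

logger = logging.getLogger("app")
logger.propagate = False        # keep this demo's output off stderr
logger.addHandler(handler)
logger.warning("rotation configured")  # written to app.log
```

`TimedRotatingFileHandler` covers time-based rotation (e.g., at midnight) the same way.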
Well-defined log levels
Use a set of distinct, standardized log levels. Many apps use the following:
- ERROR indicates a system error (not a user error) and that administrator action is necessary. The log should indicate the action to take. Perhaps support should be notified.
- WARNING means there is a condition that will degrade functionality or performance. Administrators may want to take action. Logs should indicate an action to take (if any).
- INFO indicates lifecycle events like startup, shutdown and configuration changes. The absence of a particular INFO event may indicate trouble for administrators. Normal use by end users should rarely result in INFO logs.
- DEBUG is for developers only. Customers shouldn't ever need to turn this on.
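Here is a minimal sketch of those levels in action, using Python's stdlib logging with INFO as the quiet production default; the `ListHandler` is an illustrative stand-in for a real file appender:

```python
import logging

# Collect emitted records in a list so we can see what gets through.
class ListHandler(logging.Handler):
    def __init__(self):
        super().__init__()
        self.records = []
    def emit(self, record):
        self.records.append(record)

logger = logging.getLogger("demo.levels")
logger.setLevel(logging.INFO)   # production default: INFO and above
logger.propagate = False        # keep this demo's output off stderr
handler = ListHandler()
logger.addHandler(handler)

logger.debug("developer-only detail")        # suppressed at INFO
logger.info("service started on port 8080")  # lifecycle event
logger.warning("cache disabled; performance degraded")
logger.error("cannot reach database; check connection settings")

print([r.levelname for r in handler.records])
# ['INFO', 'WARNING', 'ERROR'] - the DEBUG entry never reached the handler
```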
Don't depend on debug logs
Customers should never need debug logs and should never depend on them. Other mechanisms should help customers configure their systems. Tracing/checkpointing tools are helpful for debugging. The application or accessory programs should help them validate and test their configuration.
Don't force customers to turn on system-wide debug logs to get a grip on a specific failure. Provide them a way to trace and debug execution for a single user, an IP address, a specific URL, a single use case, etc. This will allow them to trace without killing performance or generating too many log entries.
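One way to sketch this kind of scoped tracing is a filter that passes only entries tagged with the user being traced. This uses Python's `logging.Filter`; the `user` field and the class names are illustrative:

```python
import logging

class UserTraceFilter(logging.Filter):
    """Pass only records tagged with the user we are tracing."""
    def __init__(self, user):
        super().__init__()
        self.user = user
    def filter(self, record):
        return getattr(record, "user", None) == self.user

class ListHandler(logging.Handler):
    """Stand-in for a real appender; collects formatted messages."""
    def __init__(self):
        super().__init__()
        self.messages = []
    def emit(self, record):
        self.messages.append(record.getMessage())

trace = logging.getLogger("demo.trace")
trace.setLevel(logging.DEBUG)
handler = ListHandler()
handler.addFilter(UserTraceFilter("alice"))   # trace alice's requests only
trace.addHandler(handler)

# Request-handling code tags each entry with the current user.
trace.debug("entering checkout()", extra={"user": "alice"})
trace.debug("entering checkout()", extra={"user": "bob"})
trace.debug("exiting checkout()", extra={"user": "alice"})

print(handler.messages)   # only alice's entries were kept
```

The same shape works for filtering by IP address, URL or transaction id.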
Allow customers to adjust log levels for specific log sources ("categories" or "loggers" in log4j terminology). Often the category is the file or class producing the log. This is handy for developers, but not always best for customers. Consider adding categories for customer-focused concepts like "security" or for system components that the administrator understands like "caching".
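Sketched with Python's logger hierarchy, the direct analogue of log4j categories; the category names mirror the ones above:

```python
import logging

logging.getLogger().setLevel(logging.INFO)             # quiet default overall
logging.getLogger("security").setLevel(logging.DEBUG)  # but trace security

caching = logging.getLogger("caching")    # no explicit level set
security = logging.getLogger("security")

print(caching.isEnabledFor(logging.DEBUG))   # False: inherits INFO from root
print(security.isEnabledFor(logging.DEBUG))  # True: its own DEBUG level wins
```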
System errors vs. User errors
Users provide bad inputs all the time. These are not problems that the administrator needs to correct, so they should not be logged as ERRORs. Operational measures (e.g., event counters) give administrators a view of the number and type of user errors. Checkpointing and tracing mechanisms help administrators debug bad inputs.
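A minimal sketch of counting user errors instead of logging them at ERROR; the error kinds and helper name are illustrative, and a real system would expose these counts through its monitoring interface:

```python
from collections import Counter

# Count user errors instead of logging them at ERROR; an admin dashboard
# (or a periodic INFO summary log) can expose these numbers.
user_errors = Counter()

def record_user_error(kind):
    user_errors[kind] += 1

record_user_error("bad_password")
record_user_error("bad_password")
record_user_error("malformed_date")

print(user_errors["bad_password"])    # 2
print(user_errors["malformed_date"])  # 1
```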
Logs are an attack vector
Logs should never contain sensitive information like passwords. Even debug logs have to obey this rule if customers can turn them on. Customers should be able to share logs with support without fear of exposing sensitive info.
Log the system's environment
To help support, log the system's environment at startup:
- OS type and version
- Software version
- Host name, IP
- Versions of supporting software (e.g., Java)
- Relevant environment variables, system properties
- Locations of config files
- Values in config files
Number of log files
Not all logs should be written to a single file. Trace and debug messages often belong in a separate file from lifecycle logs, because these logs have different audiences. Admins want to see lifecycle info and errors. Support wants to see the tracing for a specific use case. If you use multiple log files, be very clear about which files support will need. Consider including a tool to bundle and send the files to support.
Use stable error ids
When logging an error, give a short, stable id as well as a descriptive message like "COMPX01234 Missing password, authentication failed." The "COMPX01234" value makes communication easier between administrators, support and developers. Be careful about re-using error identifiers. Ideally, each id should lead developers to a single source code location.
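A trivial sketch of pairing a short, stable id with a descriptive message; the helper name is made up, and the id simply follows the example above:

```python
# Hypothetical helper that pairs a stable id with a descriptive message.
# The id stays fixed across releases even if the wording changes, so
# admins, support and developers can all refer to "COMPX01234".
def error_message(error_id, description):
    return "%s %s" % (error_id, description)

msg = error_message("COMPX01234", "Missing password, authentication failed.")
print(msg)  # COMPX01234 Missing password, authentication failed.
```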
Support log aggregation
Especially in clustered or distributed environments, customers may want to aggregate their logs in a central place for analysis. Many logging frameworks can log to syslog, JMX, a database or a network pipe. Give customers the flexibility to configure these features.
Log entry structure
Both humans and machines will be reading the log files. You need enough structure to support machine analysis, but the files should also be readable by human beings. A line-per-log format helps very much when using grep and search tools. XML is amenable to machine search, but is terrible for humans to read.
What goes in each log entry?
- Datetime - To the millisecond
- Thread Id
- Transaction or Session Id - Allows you to distinguish between transactions. In a distributed system, use the same transaction id across components.
- Source code location
- If error, Error Id
- A descriptive message - Take care to include the context. If this is a NumberFormatException, include the value that failed to parse as a number.
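Putting those fields together, here is a sketch using a Python `logging.Formatter` with a line-per-entry layout; the `txn` field, the error id and the exact layout are all illustrative:

```python
import logging

# A line-per-entry format carrying the fields above. The transaction id
# arrives via the `extra` mechanism; field names are illustrative.
formatter = logging.Formatter(
    fmt="%(asctime)s [%(thread)d] [txn:%(txn)s] %(levelname)s "
        "%(module)s:%(lineno)d %(message)s"
)

class LineHandler(logging.Handler):
    """Stand-in for a real appender; collects formatted lines."""
    def __init__(self):
        super().__init__()
        self.lines = []
    def emit(self, record):
        self.lines.append(self.format(record))

logger = logging.getLogger("demo.format")
logger.setLevel(logging.INFO)
logger.propagate = False        # keep this demo's output off stderr
handler = LineHandler()
handler.setFormatter(formatter)
logger.addHandler(handler)

logger.error("COMPX01235 Could not parse 'abc123' as a port number.",
             extra={"txn": "7f3a"})
print(handler.lines[0])
```

The default `asctime` already carries milliseconds, and one entry per line keeps the file friendly to grep.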
Too many debug logs are a code smell
I've worked on systems that had huge numbers of debug logs. They were all added by well-intentioned developers, but they cluttered both the source and log files. I think this may be symptomatic of a team that doesn't utilize unit testing and good debuggers sufficiently. Developers should consider removing the truly extraneous logs before check-in.
Logs are not a complete monitoring solution
Many aspects of the system should be visible to administrators without log analysis. Monitoring systems should be able to give them the current system state. Operational measures (e.g., counters and statistics over them) and Alarms are other ways to indicate system state. For example, an Alarm may indicate the connection to a back-end system is down. An Operational Measure can indicate how many times the connection has dropped in the last hour. Of course, it still might be appropriate to log the "connection down" event for auditing purposes.
What have I missed?
I'm a developer, though an experienced one who has shipped a number of servers. Admins, DevOps guys - what am I missing? What advice would you give developers about logging in enterprise-y environments?
Announce: mk-project 1.5
- Previously, a project's compile command could only be a string to be interpreted as a shell command (think "make"). Now the compile command can also be an Emacs Lisp function name. Flexibility is good.
- The project-status command now prints the project status to buffer *project-status* instead of *Messages*.
- Several commands now have optional arguments which allow them to be called programmatically as well as interactively. The affected commands are project-load, project-grep, and project-ack. So now you can call
(project-grep "TODO") from your own code.
- The code will now ignore "_darcs" directories if you set the vcs type to 'darcs.
- Finally, mk-project now has a menu!
Emacs: Custom columns for Ibuffer
Emacs 23.2 is out! To celebrate, I'll write (yet) another Emacs post. Please forgive the fact that the content has nothing to do with version 23.2 in particular (you can read about all the new stuff in the NEWS file (C-h n)).
I'm very fond of the Ibuffer package. It lists all your buffers, and allows you to sort, filter, and group them as you please. You can operate on multiple buffers by marking a subset and then calling a command (think "delete" or "replace-regexp"). Amazing! Ibuffer's got it all.
I was reading Ibuffer's source to see if I could add my own information to the buffer listing. Specifically, I wanted to indicate that a file was part of my "project" (as defined by mk-project). Ibuffer did not disappoint. It was trivial to create a new custom column with define-ibuffer-column.
In the "screenshot" below, I have a mk-project defined which includes all the files in my ~/elisp directory. You'll see a "p" next to each of these files.
MRP   Name          Size  Mode        Filename/Process
---   ----          ----  ----        ----------------
[ Default ]
      z.clj         2210  Clojure     ~/z.clj
      tmp.txt       2915  Text        ~/tmp.txt
p     projects.el   3127  Emacs-Lisp  ~/elisp/projects.el
p     mk-utils.el   9968  Emacs-Lisp  ~/elisp/mk-utils.el
p     basic.el      7245  Emacs-Lisp  ~/elisp/basic.el
...
In the code below, I create a column mk-proj-col, the column heading of which is simply "P", and if a particular buffer is a part of my project, it will print "p" next to it.
(defun mk/proj-buffer-p (b)
  "Is the buffer `b' part of the project?"
  (and mk-proj-name
       (or (mk-proj-buffer-p b)
           (string= (buffer-name b) mk-proj-fib-name)
           (string= (buffer-file-name b) mk-proj-tags-file))))

(define-ibuffer-column mk-proj-col (:name "P")
  (if (mk/proj-buffer-p buffer) "p" ""))

;; Define 3 formats, each including the new mk-proj-col
;; column. Switch formats with ibuffer-switch-format (bound to "`").
(setq ibuffer-formats
      '((mark modified read-only (mk-proj-col 2 2 :left :elide)
              " " (name 30 30 :left :elide)
              " " (size 9 -1 :right)
              " " (mode 16 16 :left :elide)
              " " filename-and-process)
        (mark modified read-only (mk-proj-col 2 2 :left :elide)
              " " (name 45 -1 :left)
              " " filename-and-process)
        (mark modified read-only (mk-proj-col 2 2 :left :elide)
              " " filename-and-process)))
And it was even easier to define a filter command that displays just the files in my project.
(define-ibuffer-filter project
    "Toggle current view to buffers in the defined mk-project."
  (:description "mk-project")
  (mk/proj-buffer-p buf))

(define-key ibuffer-mode-map (kbd "/ k") 'ibuffer-filter-by-project)
Announce: mk-project 1.4.0
- project-find-file-ido now displays relative filenames (relative to the project directory) rather than the absolute filenames. This cuts the visual clutter way down.
- A brand new project function, project-multi-occur, searches all open project files for a regex using multi-occur.
- A message is printed when the project is finished loading.
- A new README file provides details of all the configuration options.
- Now quoting filename arguments used in find commands. Filenames with spaces, for example, now won't break the find commands we use to generate TAGS or the index file.