Your application is elegant and well designed. It's covered by unit tests, the domain is effectively modeled, and your key algorithms are scalable and optimized.
You know the methods like Jules Winfield knows Ezekiel 25:17. You may even be looking forward to pushing it to production, thinking, "Hey, we got it right for once!" However, unless you have littered your beautiful codebase with thousands of hacked-up log lines for the most trivial operations, you don't stand a chance in hell of supporting this app.
What's the biggest difference between how developers and QA use a system versus how actual production users use the system? It's simple: developer and QA know what the hell they're doing. Production users are going to click every button at the wrong time, enter bizarre and incorrect information into forms, and fail to read any and all messaging from the system. Not only will these barbarians be savaging your system, but there are thousands of them and they expect you to help them.
You can certainly try to support these users and fix the bugs they report without any logs. After all, you have their data, you have your app, and you can just reproduce their errors manually. That's nice thinking, but you're wrong. Remember, you know how to use your system. You're not going to load 87,000 records into your grid, or try to delete 87,000 records at once, or type an 87,000 word stream-of-consciousness diatribe about gluten-free products into a memo field. Good luck reproducing that big ball of insanity via the user's data alone.
What are some of the benefits of excessive logging across your application?
First, you know exactly what users were doing when something blew up. This is always much more helpful than a user-submitted bug report; the users submitting these bug reports have an imperfect knowledge of the system and they're not going to remember exactly what they were doing when catastrophe struck. You can reproduce erroneous conditions exactly if you've been logging like a fiend, though.
Second, it's much easier to determine if a production issue is an actual bug or if it's some random hiccup in your production environment. Databases crash, routers get rebooted, and third-party APIs go down for updates. Maybe your production environment only experienced this hiccup for 1.5 seconds, but it could be the precise period where the CEO was hitting the Save button. Good luck figuring that out without logs.
Third, voluminous, time-stamped logs can give you great insight into production performance. If you write software like the rest of us, the load your system is under in the development and QA environment is nothing at all like the load it's under in production. Once under that load, trivial things can slow down... a lot. Think about a wooly mammoth, stuck in tar, dragging a Winnebago behind him, and know that'll be some random part of your system that you never expected. If you collect time-stamped logs like my great aunt collects stray cats, then it's a cinch to find these culprits and then begin to tune them appropriately.
I think we all agree now that logs are tremendously important.
If not, I encourage you to head over to check out the latest Ziggy comic and to leave the grown-up talk to the rest of us. What exactly do we want to log then?
First, log every time your system goes out of process. Thus, log all database, filesystem, OS, API, and web service calls. Since you typically don't control these dependencies, you need to know exactly what you sent to them and what they sent back in order to troubleshoot any weirdness.
Second, log all user input. It's hard to overestimate the variance you'll see when you start asking your users for information. Don't be s
hocked when you see that people are filling in the First Name prompt with 'hi my name is anthony and i am in the market for that crystal skull vodka that dan akroyd sells on the internet'. Real users can submit strange, strange stuff.
Third, log the input and output for any non-trivial algorithms in your code. Sure, you've tested this logic to Hades and back, but you'll still find some weird edge cases in production. For example, one of my beautiful scheduling algorithms one time blew up in production because of a weirdness with Argentinian timezones. I certainly didn't anticipate that.
Adding all of this logging into your code is certainly going to ugly it up. It's a good kind of ugliness though, an ugliness that shows preparation for any and all crazy ass questions you'll have to field once you have real users. And this ugliness definitely beats the alternative, which is slowly being driven insane by trying to fix bugs you can't reproduce. Think of it as logging as self defense.