> Realizing that I spend more time troubleshooting than I do building or doing ...
That's not good. The problem with troubleshooting is that it messes with your reward system. After you fix a hard-to-debug problem, you feel a sense of accomplishment. Which would be ok, but the problem is that this sense of accomplishment is often higher than it should be. You go home at the end of the day thinking "well, today I didn't build anything, but it's fine, because I fixed that bug". You are becoming complacent.
If you end up saying to yourself, like the author of this blog here, that you troubleshoot more than you build or you do, then you have a problem. Soon you'll be seen by others as a car mechanic. Maybe a reliable car mechanic. But reliable car mechanics don't get paid a lot.
This might be a controversial take but here it is: being proud of your troubleshooting skills sits somewhere between being proud of your typing speed and being proud of your word document formatting skills. These things never go obsolete, but don't fool yourself into thinking they are gold currency on the job market.
I think you may be leaning too far in the other direction.
I'm a troubleshooter. I fix problems. I keep my head straight in a crisis. Every job I've had across 3 decades, regardless of my actual title or formal responsibilities, I'm the firefighter. People call me when they can't figure something out. People call me when something big breaks and needs to be fixed urgently. Even if I'm not an expert in the broken thing, they call me in. They call me because the experts are often floundering and not making any progress because they can't troubleshoot their way out of a wet paper bag.
I do not feel this has held me back professionally. I have been loved by management and peers in all of these jobs. When I nearly left a prior employer because much of the work wasn't aligned with what I wanted to do, management created a new role with better aligned work and higher pay to convince me to stay. In my current role, I'm very happy with my salary, working environment, management, and team.
I wish troubleshooting skills were as common as typing and document formatting skills. I wouldn't need to help out nearly as many people because they could handle their own crises.
> I'm a troubleshooter. I fix problems. I keep my head straight in a crisis. ... People call me when they can't figure something out. ... Even if I'm not an expert in the broken thing, they call me in. They call me because the experts are often floundering ...
This describes a sizable portion of my career. It's lucrative, it's gratifying, and it's fun. It's as close as I'm going to get to being a "kick-ass mercenary".
Seeing new environments, new applications, and new problems never gets old. The stories that come from the work are priceless, too.
> I wish troubleshooting skills were as common as typing and document formatting skills.
When I conduct interviews this is the main skill I screen for. I think it can be taught, but somebody who already has it and is missing some particular technical experience is vastly more valuable.
I've found past a certain point career-wise, troubleshooting really can't be taught. It's sort of a mindset/attitude to me. If you are 5+ years into your career and haven't gotten there, you probably just don't care. It's the attitude of a developer who is indifferent to the craft and just wants to cobble together found code as quickly as possible to move onto the next thing.
A good troubleshooter can enable higher output across a team because they are like grease in the machine. Particularly indifferent troubleshooters become a net drag because instead of being able to help others they are always interrupting others for help.
I think it has to do with interests. Some people have an innate interest in how stuff works, and specifically how it breaks.
I think you can teach someone to troubleshoot in a procedural and methodical manner, but they will always lack the creative "spark" that comes from being actually interested. Procedural troubleshooters are useful, but they won't exceed the bounds of the model they've been taught to work under.
I don’t believe that’s true. It’s an attitude, not some kind of innate skill like reflexes. You can learn to believe in yourself, plus it’s teachable in my experience.
That would be more of a psychological hack. I've never seen this happen. My experience is people behave a certain way (care about what they do up to a roughly defined level) and 10 years later they behave the same. Self-esteem tends to change or fluctuate and can be taught, but personally I believe that is not enough for a non-troubleshooting mindset to turn around. Unless you could convince me otherwise?
> I do not feel this has held me back professionally. I have been loved by management and peers in all of these jobs.
If only your experience was universal in that regard! I once had that role in an early-career job -- but I was looked down upon by peers and management because I was doing mostly maintenance work. The "good" developers, in their minds, were the ones shipping the most new features -- the irony being that those features would then blow up out in the field, at which time they landed on my desk to turn them into production-worthy code.
That's just poor management, IMO. The good ones will have your number in their cell phone to call when the stuff they shipped breaks (or even better, allow you to take the time you need to not ship broken code to begin with). Plus it doesn't take much time in the industry to realize that shipping a broken product is a far worse look than shipping slower, and that the faster you can fix a broken product the less money you'll bleed.
The word retainer has an appealing mercenary quality to it. The dream is that your knowledge of an esoteric system set up in the 1980s gets you warehoused in a data closet at a mid-sized organization, where you can spend the rest of your days browsing Hacker News and watching pirated films.
Troubleshooting skills are really valuable but hard to market. You can deal with lots of different technologies and effortlessly draw conclusions across domains that look totally disconnected to others. Sadly the tech market values expertise that is based on keywords. So while it is fun and creates huge value, it is worth staying mostly on a path that can be explained to less mentally flexible mortals.
I agree the need for troubleshooting can be born from poor decisions, but it's still a marketable skill for places that need it at scale. One of my roles was head of Linux Engineering for a Fortune 50. Sure there's the pets vs cattle thing and we all prefer cattle, but particularly in places with lots of legacy apps and infrastructure there are plenty of both that need more nuance than turning it off and back on again.
There's value in fixing things in the moment and then feeding them back to your engineering and architecture functions to address endemic issues so that everyone benefits.
This mentality through most of my career has left me trapped as technical support, and it's damn near impossible to climb out of the pit I've dug for myself. What you say about being seen as a car mechanic is true.
This played out at my last place. My boss would assign my co-worker to build the world's crappiest car in the least amount of time and when it broke down I would be the only one that seemed to be able to fix it (while my co-worker was busy building some other crappy car). I would have built a much better car in the first place! However I would have taken more time and the goal was to build and release as fast as possible. My boss was okay with the risk of said crappy car, my co-worker got promoted and I slowly burned out.
It's a tough balancing act to make sure you sell yourself correctly and fight to work on the things you want to!
We had a guy like this on our team once, it took a year to convince management he was a net drag on the team. Half the team quit, the other half said they would if they had to work with him any longer.
To prove the point we put him on a strategic rewrite and gave him master/trunk while the entire team moved to a feature branch for 6 months. This was complimentary to his ego as he was sick of us bureaucrats in the rest of the team telling him what to do and being such a burden on his genius creativity.
By the end he was unable to build / run his own branch, while the remaining team lost no velocity and was making regular releases to end users. The choice was easy at that point.
> and it's damn near impossible to climb out of the pit I've dug for myself.
By far the easiest way to do so will be to find another job. If you can't do this, yeah, that mentality will lock you into positions you don't want to be in.
I feel this way about documentation. I do it, a lot. I get compliments and positive feedback on it. It helps me remember things I would otherwise forget. I hope that others would be inspired by my example but it hasn't happened. I could be selfish and hoard my own documentation and let others sink or swim. But that hurts me too as I'd have to pick up their slack.
I'm reminded of the Gervais Principle. Doing the work is not the way to "win," but not winning might be the better lifestyle. Depends on your motivations, aspirations, and ethics. It's easy to chase the total compensation number, because it's just _there_ and like what are we doing anyway? But then what are you doing, anyway?
That's been my career mindset. I've been in roles where my management has referred to me as a "rockstar" and it was a burden not a compliment. I'd rather be in a supportive team environment where everyone carries their own weight.
Compensation has been decent with this approach over the years. I could have made more staying longer in a darwinian bigco but the work was not fulfilling.
I dunno, the mechanic I go to is reliable and so busy it's hard to get a slot these days, and he seems to be doing very well for himself. So many mechanics are unreliable
I think I know what you mean, in an ideal world - we wouldn't have bugs. You'd build a feature and it would work forever but that's rarely the case.
I think of it as "offensive" and "defensive" building, ideally you want to be on the offensive (i.e. building stuff that wasn't there yesterday) but you have to balance it with good defence (i.e. adding a certain type of anti-fragility to your system by fixing bugs due to your features being exposed to the real world).
Saying this, I've never met a good engineer who wasn't very good at troubleshooting so perhaps it's more of a consequence of building than a skillset.
If you are debugging your work too much, maybe it's you.
Obligatory Kernighan’s law: “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”
I think that's just having a job. There's nothing inherently better about building than fixing. Hell, anyone can build something these days. You can get chatgpt to write you a fully functional bootloader without knowing a single bit of assembly or how booting operates. Being able to grasp and fix things is already the superlative talent worth hiring for.
> But reliable car mechanics don't get paid a lot.
The equivalent in our industry is worth a lot more money than someone who can only build. I think "building" is a lot closer to your analogy of using a word processor than "fixing" things is and you've got the reputation of the two skills completely swapped.
> But reliable car mechanics don't get paid a lot.
Not a great analogy. Reliable car mechanics often get paid very well in comparison to their peers. Used to be one. Got into tech as a result of how much tech got into cars. Do they pay as well as tech jobs? Depends. I made more money and worked fewer hours than my buddy in IT at one of the largest corporations in America in the same city (granted not a tech hub like SV, Seattle, NYC, etc, but most places aren't).
The key differentiator here is not time spent building vs time spent repairing (troubleshooting). It's knowing what's worth spending your time on, and when to say "No", because not everything needs fixed, nor is every problem necessarily yours to solve.
Truly good diagnostic skill is knowing what's worth spending time on, regardless of whether it's repairing something that exists or building something that doesn't. A tire with only 10% of treadwear could technically be replaced with something better, but is that worth anyone's time or money? Probably not. But if the tire on the opposite side is still brand new, and they were replaced at the same time, diagnostics tells you the alignment is off, and that issue - whatever it may be - very well could be worth everyone's time and money to fix.
Code is no different. Don't try to fix/improve/build everything. Focus on what matters. Good troubleshooting/diagnostic skills are a big part of knowing what does, and doesn't.
And it isn’t just your _appearance_ or _pay_ you should be worrying about. If you fix a nitpicky bug that affected 5% of the users, congrats, that’s a pretty big bug! But could you have built a new feature that would roll out to 50% of users in that same time? In many situations, building the new feature will have a bigger impact on the world than the bug fix. Obviously it will depend on the exact circumstances. But you should consider the opportunity cost.
Hmm. Well, of course it isn’t on any one person to fix a systemic issue. But, I really would not want to use a system where the institutional decision was made to focus on new features instead of fixing bugs that hit 5% of users.
The problem is that the real impact and the perceived value of each path are not necessarily the same.
Building new things is sexy and highly visible. It's easy to say about yourself "I built that cool new feature" or better, to promote "I might build something of incredible value". You're front-and-center with decision makers, shaping the future.
Conversely, debugging is perceived as a cost center. "I fixed that critical infrastructure that used to work with only minor interruption" or "Maybe everything will break and you'll need me to fix it" are not nearly as exciting. Worse, the best maintenance is completely invisible, fixing problems before they are felt. You're in the background, dealing with the legacy of the past.
If your organization is like that then let them know. Suggest some leading indicators that they could track so you can take credit for your preventative work. If all else fails, then you can decide if you want to just do visible work or leave.
> Soon you'll be seen by others as a car mechanic. Maybe a reliable car mechanic. But reliable car mechanics don't get paid a lot.
This argues for a messed up reward system for the company, not the engineer. All sufficiently large systems have bugs and performance issues, and adding a measure of reliability, stability, and speed is just as important as adding features.
Reliable car mechanics with ambition actually get paid fairly well compared to actually complacent car mechanics. It's not staff software engineer money, but it's pretty decent, and a hell of a lot less of whichever qualities one associates with working in tech for better or worse. I don't think those qualities are mutually exclusive. But, much like everything, it goes without saying almost that generally you should have a variety of skills you can bring to the table.
This is a fascinating take. I have been thinking about your comment for two days.
I think you're right in some cases (when working in a field one has mastered, for example), and I think I could probably go in the direction of getting it right the first time.
But the way I see it, any time I'm doing something new or innovative, I'm doing something I don't know how to do, which takes trial and error; and troubleshooting is basically figuring things out by trial and error, in a systematic way.
Though a lot of the time it is used for fixing bugs, I think troubleshooting as a skill and mindset is equally useful for creating new things, where you are solving for something.
Notice that if you mouse over "9 hours ago" on the story it shows the timestamp 2025-02-25. 9 hours ago was not 2025-02-25. If you mouse over the "7 hours ago" on credit_guy's comment, the timestamp shows 2025-02-26. One day after it was submitted, two days before it made the frontpage.
> But reliable car mechanics don't get paid a lot.
I honestly don't know: do they not? The reliable mechanics in my city seem to do tons of business and charge significantly higher prices than the competition.
I can think of a lot of software that I have stopped paying for because they did not fix bugs or performance issues. I can think of far less software that I stopped paying for because although it was reliable, they did not add more features to it - but I am an individual, not a business, and likely am not representative of the average software-buying individual.
In software and systems that I have built for myself, the impact of fixing a bothersome bug is usually far higher than adding a new capability, but I may just be more bothered by bugs than most. Reliability and smooth, predictable operation are very important to me.
I spend way more time troubleshooting than I do building new things. I'm a lead programmer on my team. I regularly hop on impromptu zoom calls to help people out with thorny problems or jump into slack conversations. I don't get a lot of focused time for building new stuff. I'm more valuable to my team keeping them running smoothly and enforcing standards that avoid some of the nastier troubleshooting.
I have found myself to be someone that loves learning a little bit about various tech disciplines, but I've ended up as a "master of none" because I get bored quickly. Any advice for a career path that would reward such a thing?
I'm not sure that I agree with this, not that I'm big on troubleshooting, but experienced SREs (Site Reliability Engineers) are worth their weight in gold and get paid an insane amount of money. Perhaps the key is to debug a vendor's solution rather than an in-house one.
Making troubleshooting skills a profession in itself makes reliability a property of a specific person or team and not a property of the system. The former doesn't scale.
You’ll never be able to build a (large, complex) system that is consistently, inherently reliable over time and in response to change. You want to aim for such reliability but you still need troubleshooting ability.
My skill at troubleshooting has caused me to be the go-to guy in every project, which lends great credibility and opportunities for leadership. Your pride in your troubleshooting skills isn't pride in a side-quest, it's pride in having a deep understanding of how systems work in general and in the specific.
"Good troubleshooter" might not look great on a CV, but all of your coworkers naming you as the most valuable member of the team, and a natural leader, is worth more than any feature launches.
The value is in leadership, and being able to avoid certain classes of bugs from appearing in the first place. Troubleshooting just happens to be the skill that allows you to gain the knowledge to lead.
This take seems pretty short-sighted. You may well have a point regarding what hiring managers actually value but speaking from experience having a couple people on hand that are better at solving problems than manufacturing them is pretty clutch in most settings where results are actually important.
> don't fool yourself into thinking they are gold currency on the job market.
I must have gotten lucky then because I’ve built most of my career off my incredible troubleshooting capacity and my communication capability.
I’m not earning Silicon Valley money (because thankfully I don’t live there) but I’m at the top end for salaries in my country. I out earn the vast majority of devs/devops people I know.
Maybe it’s a bit unfair though because I troubleshoot at a very different level to most I suspect.
Or maybe I’ve been applying the wrong word to what I do my entire career…or misunderstanding why I’m valued. This is more possible than it might seem.
For further context, I've held a wide variety of positions now within IT, including executive management. The problem is still, and I know this will sound stupid, I don't know how it is I got here, or why people valued me enough to keep promoting me. I never even asked for the promotions, they just...happened. This has made applying for new jobs harder, because I've never actually been entirely sure what my value is, so I often take jobs that are lower than my last job, but then every time it's not long before I end up well above that again.
I eventually assumed it must be my troubleshooting capacity. I asked a CEO I worked with once at a smaller startup why it is he kept promoting me, and I got this story about how by just being in the room, everyone around me wants to do better work. Not because they are being told to do so, but because the work I do apparently just inspires people around me to do better. I was the truest example he had seen apparently of 'lead by example.'
It's been very problematic because despite earning good money, and having never struggled to find, retain or advance in a job, I still don't truly know what it is exactly I'm good at.
I think I'm terrible at explaining this. Every time I have tried to talk to someone about it in real life they just end up telling me I have impostor syndrome. Of course I do, I don't really know what the fuck it is I do.
There is some truth to what you’re saying, but that’s just another example of (particularly American) capitalism’s masterful misallocation of resources when it comes to compensation.
What I missed here was the importance of keeping careful notes as you go. What exactly happened when we constructed that weird input and commented out line 353? What hypotheses are we entertaining? Can we rule out any of them based on our evidence? It’s very easy to dupe yourself if you’re doing it all in your head.
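For what it's worth, here is a minimal sketch of what such notes could look like if you wanted them in a structured form rather than a text file. Everything in it (the `Notebook` and `Hypothesis` names, the status strings, the example bug) is made up for illustration; it is not from the article.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hypothesis:
    statement: str
    status: str = "open"  # one of "open", "ruled_out", "supported"
    evidence: List[str] = field(default_factory=list)


@dataclass
class Notebook:
    """Hypothetical troubleshooting log: what we tried, what we still believe."""
    hypotheses: List[Hypothesis] = field(default_factory=list)
    observations: List[str] = field(default_factory=list)

    def propose(self, statement: str) -> Hypothesis:
        hypothesis = Hypothesis(statement)
        self.hypotheses.append(hypothesis)
        return hypothesis

    def observe(self, action: str, result: str) -> str:
        # Record exactly what was done and what happened, in order.
        note = f"{action} -> {result}"
        self.observations.append(note)
        return note

    def rule_out(self, hypothesis: Hypothesis, note: str) -> None:
        # Attach the evidence so the reasoning can be re-checked later.
        hypothesis.status = "ruled_out"
        hypothesis.evidence.append(note)

    def still_open(self) -> List[str]:
        return [h.statement for h in self.hypotheses if h.status == "open"]


# Made-up session, loosely following the "commented out line 353" example.
nb = Notebook()
parser_bug = nb.propose("parser chokes on the weird input")
cache_bug = nb.propose("stale cache entry is served")
note = nb.observe("commented out line 353, re-ran weird input", "still crashes")
nb.rule_out(parser_bug, note)
print(nb.still_open())  # ['stale cache entry is served']
```

The point is only that each ruled-out hypothesis carries the observation that killed it, which makes it harder to dupe yourself later.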
Thanks for posting an archive link. My site has survived previous HN traffic spikes on's free tier, but 256mb of RAM wasn't quite adequate this time :)
Just finished the book recently. It's very insightful. As someone who considers himself good at debugging* and is still trying to improve debugging skills and efficiency, I view this book (and similar resources like this article) as a guide that also helps me reflect on what could have been done in a better way next time.
* Multiple times, I have helped others find the root cause of a bug after they had spent hours on it and had no clue what was happening
Seconded. I don't think I learned much from the book, but it helped make my thought process more structured and methodical. I had a friend recently start as a developer, and I strongly recommended this to them.
I believe Hillel Wayne gives a copy to every junior dev he meets.
What do you think will be the last skills/jobs to go obsolete?
I think it's "Wanting the right thing" (This includes figuring out what the right thing is) and "Being able to articulate your wish clearly" (This includes clarifying your thoughts).
Are there people with those skills today? They seem to be in terribly short supply. I’ve seen more than one company spin its wheels for ages because nobody could clearly express an operational vision.
> I’ll define troubleshooting as systematically determining the cause of unwanted behaviour in a system, and fixing it.
Or debugging and understanding the reason why a system isn’t behaving as expected. And pinpointing the part of the code that causes the behaviour that is not desired.
In another field, IT Service Management (ITSM), there is a distinction between incidents and problems. If you see many related incidents coming in, you sit down and start doing a root-cause analysis, which is basically a form of debugging. Or troubleshooting.
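A rough sketch of that incident-to-problem step, assuming each incident record already carries some shared `signature` field (a hypothetical name, as is the threshold of three). Real ITSM tooling does this differently; this only shows the idea of spotting related incidents that deserve one root-cause analysis.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def problem_candidates(
    incidents: Iterable[Dict[str, str]], min_count: int = 3
) -> List[Tuple[str, int]]:
    """Return (signature, count) pairs seen at least `min_count` times."""
    counts = Counter(incident["signature"] for incident in incidents)
    # most_common() orders by count, highest first.
    return [(sig, n) for sig, n in counts.most_common() if n >= min_count]


# Made-up incident tickets.
incidents = [
    {"id": "INC-1", "signature": "db-timeout"},
    {"id": "INC-2", "signature": "disk-full"},
    {"id": "INC-3", "signature": "db-timeout"},
    {"id": "INC-4", "signature": "db-timeout"},
    {"id": "INC-5", "signature": "cert-expired"},
]

print(problem_candidates(incidents))  # [('db-timeout', 3)]
```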
Half the comments here are nitpicking the car mechanic analogy (naturally), the other half are complaining about the site shitting the bed.
Yes, debugging is important, and too many people can't do it, which is unsettling considering how many bugs those people are putting into the code in the first place.
I feel like the way to think about troubleshooting is as an umbrella encompassing reliability and quality engineering in software. If you can find ways of showing how the reliability and quality of software can be broken and how they can be improved (simultaneously), then you have a career to make.
Don't wait for stuff to break and react. Be proactive and find ways to demonstrate how it can break and how to fix it.
When I'm stumped troubleshooting production problems, I try to think how I can get more information out of the system.
Exporting telemetry events with a wide set of attributes to an observability platform is a great approach which can provide an extensible way to expose additional information about the events.
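As a sketch of the "wide event" idea, assuming nothing about any particular observability platform: one event per unit of work, carrying as many attributes as you can cheaply attach, written out as a JSON line. The function names and the attribute names below are invented for the example.

```python
import io
import json
import sys
import time
from typing import Any, Dict, TextIO


def build_event(name: str, **attributes: Any) -> Dict[str, Any]:
    """Build one wide event: a flat dict with arbitrary extra attributes."""
    event: Dict[str, Any] = dict(attributes)
    # Reserved fields are set last so attributes cannot overwrite them.
    event["event"] = name
    event["timestamp"] = time.time()
    return event


def emit(event: Dict[str, Any], stream: TextIO = sys.stdout) -> None:
    # One JSON object per line; default=str keeps odd values from crashing the export.
    stream.write(json.dumps(event, sort_keys=True, default=str) + "\n")


# Made-up request with whatever context happens to be available.
buffer = io.StringIO()
emit(
    build_event(
        "checkout.request",
        user_tier="free",
        region="eu-west",
        cache_hit=False,
        duration_ms=812,
        build="hypothetical-sha",
    ),
    stream=buffer,
)
parsed = json.loads(buffer.getvalue())
print(parsed["event"], parsed["duration_ms"])  # checkout.request 812
```

Adding a new attribute later is just another keyword argument, which is where the extensibility comes from.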
There are problem areas where it is a lot easier to assume everything is a 10/10 monster.
If you start every journey with "power cycle the device" and always wind up with a bridge call between 3 vendors, you might as well get the bridge warmed up the moment something throws a warning.
Oftentimes, getting someone on the phone can be a bit of a circus act regardless of what the contracts say. Overreacting early on can minimize total time to resolution.
When the CEO's vibe-coded slop gets chucked over the wall to become someone else's problem once completed in rough prototype form, and the ensuing bugs and scalability/reliability issues manifest, troubleshooting is going to be a more valuable skill than ever!
We'll get paid peanuts for it, but hey, we should be thankful for the work in the first place!
From Zen and the Art of Motorcycle Maintenance: ‘There’s no fault isolation problem in motorcycle maintenance that can stand up to it. When you’ve hit a really tough one, tried everything, racked your brain and nothing works, and you know that this time Nature has really decided to be difficult, you say, “Okay, Nature, that’s the end of the nice guy,” and you crank up the formal scientific method.’
The more I hear about this book, the more I realize that I was way too young when I read it.
I read it at age 18 and thought, "I should go buy a motorcycle and ride it around. That's the answer." Then I read it at age 30 and thought..."Oh, that wasn't the point at all."
Troubleshooting is one of my main comparative advantages - I'm better at it than I am at programming and I enjoy it more. It's also a relatively independent skill, not everyone is good at it or likes it. It reminds me of the lateral thinking puzzles I did as a kid where you had to ask questions to uncover whatever the weird situation was. You have to question your assumptions - think about how you might be wrong, something I like to do in general anyway.
There's a certain way of reasoning about the problem and thinking about what it might be with limited information in a systematic way. It's also a bit broader than debugging - you can do a lot of troubleshooting (sometimes faster and more effectively) by doing things other than reading the code.
It's also been somewhat of a career advantage because being really good at it seems to be rarer than being good at standard dev, and it's something most people dislike (while it's my favorite thing to do). It also overlaps a lot with other more general types of problem solving.
Anyway - a lot of the article resonates with how I think about it too.
I feel like the article is just a very long-winded way to say what I try to help junior devs understand and that's: take a step back and start from the very top, change only one thing at a time, don't become fixated.
So many issues turn out to be the smallest configuration mishaps, this is why I also promote using a debugger as much as I can as well - there's nothing quite like being able to see _exactly_ what's going on.
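To make "change only one thing at a time" concrete for the configuration case, here is a minimal sketch. It assumes you have a known-good config, a broken one, and some check you can run against a candidate; the config keys and the `is_broken` check are made up.

```python
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]


def isolate_single_changes(
    good: Config, bad: Config, is_broken: Callable[[Config], bool]
) -> List[str]:
    """Return keys whose change, applied alone to `good`, triggers the failure."""
    culprits: List[str] = []
    for key in sorted(set(good) | set(bad)):
        if good.get(key) == bad.get(key):
            continue  # identical in both configs, nothing to test
        candidate = dict(good)  # always restart from the known-good baseline
        if key in bad:
            candidate[key] = bad[key]
        else:
            candidate.pop(key, None)  # the setting was removed in the bad config
        if is_broken(candidate):
            culprits.append(key)
    return culprits


# Made-up configs and a made-up failure condition.
good = {"timeout_s": 30, "retries": 3, "tls": True}
bad = {"timeout_s": 0, "retries": 5, "tls": True}


def is_broken(cfg: Config) -> bool:
    return cfg["timeout_s"] <= 0


print(isolate_single_changes(good, bad, is_broken))  # ['timeout_s']
```

One caveat: this only finds failures caused by a single setting. If two settings have to change together to break things, it returns nothing, which is itself a useful clue.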
> this is why I also promote using a debugger as much as I can as well - there's nothing quite like being able to see _exactly_ what's going on.
I don't know, the speed of just reading code and maybe inserting some diagnostic messages is hard to beat. It's a pretty bad day if I feel like I need to bust out a debugger—99% of "seeing _exactly_ what's going on" is not going to be relevant and will just distract you.
The code rarely resembles the runtime state of the system. Debuggers are an incredible shortcut almost all the time.
I strongly and emphatically disagree on every point (I don't even know what your statement about runtime state and code even means actually, it just seems like a category error, but how do you even write code without being able to reason about runtime state? It just makes no sense.), but I understand there are people who love their debugger and I respect them.
Basically the only time I pop open the debugger is when it's otherwise difficult to see runtime behavior—say, a certain condition in a server for which you can't easily access logs. Outside of that it feels like major overhead and distraction from getting the bug fixed. Plus iterating without print statements is a tedious, tedious affair.
Don't get me wrong, it's a critical and necessary skill that junior devs often struggle to understand and master. I just think overreliance on the debugger will slow your velocity over time when most bugs have straightforward causes easiest to see by simply reading the code (which you'll have to do anyway with a debugger). I can't tell you how many times I've had to tell devs to put away their tools so we can calmly analyze the code without flipping back and forth between views. The vast majority of the time it's the second pair of eyes that resolves the problem.
Depends on the platform. Using a JS debugger is so easy, yet few devs I know use it.
> Realizing that I spend more time troubleshooting than I do building or doing ...
That's not good. The problem with troubleshooting is that it messes with your reward system. After you fix a hard-to-debug problem, you feel a sense of accomplishment. Which would be ok, but the problem is that this sense of accomplishment is often higher than it should be. You go home at the end of the day thinking "well, today I didn't build anything, but it's fine, because I fixed that bug". You are becoming complacent.
If you end up saying to yourself, like the author of this blog here, that you troubleshoot more than you build or you do, then you have a problem. Soon you'll be seen by others as a car mechanic. Maybe a reliable car mechanic. But reliable car mechanics don't get paid a lot.
This might be a controversial take but here it is: being proud of your troubleshooting skills sits somewhere between being proud of your typing speed and being proud of your word document formatting skills. These things never go obsolete, but don't fool yourself into thinking they are gold currency on the job market.
I think you may be leaning too far in the other direction.
I'm a troubleshooter. I fix problems. I keep my head straight in a crisis. Every job I've had across 3 decades, regardless of my actual title or formal responsibilities, I'm the firefighter. People call me when they can't figure something out. People call me when something big breaks and needs to be fixed urgently. Even if I'm not an expert in the broken thing, they call me in. They call me because the experts are often floundering and not making any progress because they can't troubleshoot their way out of a wet paper bag.
I do not feel this has held me back professionally. I have been loved by management and peers in all of these jobs. When I nearly left a prior employer because much of the work wasn't aligned with what I wanted to do, management created a new role with better aligned work and higher pay to convince me to stay. In my current role, I'm very happy with my salary, working environment, management, and team.
I wish troubleshooting skills were as common as typing and document formatting skills. I wouldn't need to help out nearly as many people because they could handle their own crises.
> I'm a troubleshooter. I fix problems. I keep my head straight in a crisis. ... People call me when they can't figure something out. ... Even if I'm not an expert in the broken thing, they call me in. They call me because the experts are often floundering ...
This describes a sizable portion of my career. It's lucrative, it's gratifying, and it's fun. It's as close as I'm going to get to being a "kick-ass mercenary".
Seeing new environments, new applications, and new problems never gets old. The stories that come from the work are priceless, too.
> I wish troubleshooting skills were as common as typing and document formatting skills.
When I conduct interviews this is the main skill I screen for. I think it can be taught, but somebody who already has it and is missing some particular technical experience is vastly more valuable.
I've found past a certain point career-wise, troubleshooting really can't be taught. It's sort of a mindset/attitude to me. If you are 5+ years into your career and haven't gotten there, you probably just don't care. It's the attitude of a developer who is indifferent to the craft and just wants to cobble together found code as quickly as possible to move onto the next thing.
A good troubleshooter can enable higher output across a team because they are like grease in the machine. Particularly indifferent troubleshooters become a net drag because instead of being able to help others they are always interrupting others for help.
"troubleshooting really can't be taught" Exactly: it is a gift. You have "the Knack". (Dilbert - The Knack "The Curse of the Engineer")
I think it has to do with interests. Some people have an innate interest in how stuff works, and specifically how it breaks.
I think you can teach someone to troubleshoot in a procedural and methodical manner, but they will always lack the creative "spark" that comes from being actually interested. Procedural troubleshooters are useful, but they won't exceed the bounds of the model they've been taught to work under.
Right - can you teach people to like different things? Maybe? Generally, no.
I don’t believe that’s true. It’s an attitude, not some kind of innate skill like reflexes. You can learn to believe in yourself, plus it’s teachable in my experience.
Yes attitudes are harder to learn than skills aren't they?
Ever notice people get more stubborn and stuck in their ways over time?
It's possible you cannot teach people to want different things.
That would be more of a psychological hack. I've never seen this happen. My experience is people behave a certain way (care about what they do up to a roughly defined level) and 10 years later they behave the same. Self esteem tends to change or fluctuate and can be taught, but personally I believe that is not enough for a non-troubleshooting mindset to turn around. Unless you could convince me otherwise?
I was a manager for over 25 years, and this was exactly the type of thing that I looked for.
LeetCode tests actually tend to bias against that kind of skill.
> I do not feel this has held me back professionally. I have been loved by management and peers in all of these jobs.
If only your experience was universal in that regard! I once had that role in an early-career job -- but I was looked down upon by peers and management because I was doing mostly maintenance work. The "good" developers, in their minds, were the ones shipping the most new features -- the irony being that those features would then blow up out in the field, at which time they landed on my desk to turn them into production-worthy code.
That's just poor management, IMO. The good ones will have your number in their cell phone to call when the stuff they shipped breaks (or even better, allow you to take the time you need to not ship broken code to begin with). Plus it doesn't take much time in the industry to realize that shipping a broken product is a far worse look than shipping slower, and that the faster you can fix a broken product the less money you'll bleed.
That's your problem. You keep the fixes for a rainy day when production is down and the business is losing $10m an hour.
> Yeah boss I can fix it, but how much is it worth to you since this isn't in my job description.
Textbook survivorship bias.
> I wouldn't need to help out nearly as many people because they could handle their own crises.
They don't need to, because there's always you who can figure out boring minutiae for them while they deliver business value.
Wonderful description. Thank you for capturing a snap shot that conveys the power of troubleshooting.
In fact you could easily be the guy they keep on a monthly retainer just for peace of mind.
The word retainer has an appealing mercenary quality to it. The dream is that your knowledge of an esoteric system set up in the 1980s gets you warehoused in a data closet at a mid-sized organization, where you can spend the rest of your days browsing Hacker News and watching pirated films.
After 3 years the finish gets dull, but it’s still not bad. Not the worst contract I ever worked.
Troubleshooting skills are really valuable but hard to market. You can deal with lots of different technologies and effortlessly draw conclusions across domains that look totally disconnected to others. Sadly the tech market values expertise that is based on keywords. So while it is fun and creates huge value, it is worth staying mostly on a path that can be explained to less mentally flexible mortals.
I agree the need for troubleshooting can be born from poor decisions, but it's still a marketable skill for places that need it at scale. One of my roles was head of Linux Engineering for a Fortune 50. Sure there's the pets vs cattle thing and we all prefer cattle, but particularly in places with lots of legacy apps and infrastructure there are plenty of both that need more nuance than turning it off and back on again.
There's value in fixing things in the moment and then feeding them back to your engineering and architecture functions to address endemic issues so that everyone benefits.
This mentality through most of my career has left me trapped as technical support, and it's damn near impossible to climb out of the pit I've dug for myself. What you say about being seen as a car mechanic is true.
This played out at my last place. My boss would assign my co-worker to build the world's crappiest car in the least amount of time and when it broke down I would be the only one that seemed to be able to fix it (while my co-worker was busy building some other crappy car). I would have built a much better car in the first place! However I would have taken more time and the goal was to build and release as fast as possible. My boss was okay with the risk of said crappy car, my co-worker got promoted and I slowly burned out.
It's a tough balancing act to make sure you sell yourself correctly and fight to work on things you want to!
We had a guy like this on our team once, it took a year to convince management he was a net drag on the team. Half the team quit, the other half said they would if they had to work with him any longer.
To prove the point we put him on a strategic rewrite and gave him master/trunk while the entire team moved to a feature branch for 6 months. This was complimentary to his ego as he was sick of us bureaucrats in the rest of the team telling him what to do and being such a burden on his genius creativity.
By the end he was unable to build / run his own branch, while the remaining team lost no velocity and was making regular releases to end users. The choice was easy at that point.
> and it's damn near impossible to climb out of the pit I've dug for myself.
By far the easiest way to do so will be to find another job. If you can't do this, yea, mentality will lock you in to positions you don't want to be in.
Until you start your own company, even if it is just you.
I feel this way about documentation. I do it, a lot. I get compliments and positive feedback on it. It helps me remember things I would otherwise forget. I hope that others would be inspired by my example but it hasn't happened. I could be selfish and hoard my own documentation and let others sink or swim. But that hurts me too as I'd have to pick up their slack.
I'm reminded of the Gervais Principle. Doing the work is not the way to "win," but not winning might be the better lifestyle. Depends on your motivations, aspirations, and ethics. It's easy to chase the total compensation number, because it's just _there_ and like what are we doing anyway? But then what are you doing, anyway?
That's been my career mindset. I've been in roles where my management has referred to me as a "rockstar" and it was a burden not a compliment. I'd rather be in a supportive team environment where everyone carries their own weight.
Compensation has been decent with this approach over the years. I could have made more staying longer in a darwinian bigco but the work was not fulfilling.
> But reliable car mechanics don't get paid a lot
I dunno, the mechanic I go to is reliable and so busy it's hard to get a slot these days, and he seems to be doing very well for himself. So many mechanics are unreliable
I think I know what you mean, in an ideal world - we wouldn't have bugs. You'd build a feature and it would work forever but that's rarely the case.
I think of it as "offensive" and "defensive" building, ideally you want to be on the offensive (i.e. building stuff that wasn't there yesterday) but you have to balance it with good defence (i.e. adding a certain type of anti-fragility to your system by fixing bugs due to your features being exposed to the real world).
Saying this, I've never met a good engineer who wasn't very good at troubleshooting so perhaps it's more of a consequence of building than a skillset.
If you are debugging your work too much, maybe it's you.
Obligatory Kernighan’s law: “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”
> You are becoming complacent.
I think that's just having a job. There's nothing inherently better about building than fixing. Hell, anyone can build something these days. You can get chatgpt to write you a fully functional bootloader without knowing a single bit of assembly or how booting operates. Being able to grasp and fix things is already the superlative talent worth hiring for.
> But reliable car mechanics don't get paid a lot.
The equivalent in our industry is worth a lot more money than someone who can only build. I think "building" is a lot closer to your analogy of using a word processor than "fixing" things is and you've got the reputation of the two skills completely swapped.
> But reliable car mechanics don't get paid a lot.
Not a great analogy. Reliable car mechanics often get paid very well in comparison to their peers. Used to be one. Got into tech as a result of how much tech got into cars. Do they pay as well as tech jobs? Depends. I made more money and worked fewer hours than my buddy in IT at one of the largest corporations in America in the same city (granted not a tech hub like SV, Seattle, NYC, etc, but most places aren't).
The key differentiator here is not time spent building vs time spent repairing (troubleshooting). It's knowing what's worth spending your time on, and when to say "No", because not everything needs fixed, nor is every problem necessarily yours to solve.
Truly good diagnostics skills is knowing what's worth spending time on, regardless of whether it's repairing something that exists or building something that doesn't. A tire with only 10% of treadwear could technically be replaced with something better, but is that worth anyone's time or money? Probably not. But if the tire on the opposite side is still brand new, and they were replaced at the same time, diagnostics tells you the alignment is off, and that issue - whatever it may be - very well could be worth everyone's time and money to fix.
Code is no different. Don't try to fix/improve/build everything. Focus on what matters. Good troubleshooting/diagnostic skills is a big part of knowing what does, and doesn't.
And it isn’t just your _appearance_ or _pay_ you should be worrying about. If you fix a nitpicky bug that affected 5% of the users — congrats that’s a pretty big bug! But could you have built a new feature that would roll out to 50% of users in that same time? In many situations, building the new feature will have a bigger impact on the world than the bug fix. Obviously will depend on the exact circumstance. But you should consider the opportunity cost.
Hmm. Well, of course it isn’t on any one person to fix a systemic issue. But, I really would not want to use a system where the institutional decision was made to focus on new features instead of fixing bugs that hit 5% of users.
> But could you have built a new feature that would roll out to 50% of users in that same time?
If the impact of debugging is expected to be larger than building something new, then debug. Else, build something new.
The problem is that the real impact and the perceived value of each path are not necessarily the same.
Building new things is sexy and highly visible. It's easy to say about yourself "I built that cool new feature" or better, to promote "I might build something of incredible value". You're front-and-center with decision makers, shaping the future.
Conversely, debugging is perceived as a cost center. "I fixed that critical infrastructure that used to work with only minor interruption" or "Maybe everything will break and you'll need me to fix it" are not nearly as exciting. Worse, the best maintenance is completely invisible, fixing problems before they are felt. You're in the background, dealing with the legacy of the past.
If your organization is like that then let them know. Suggest some leading indicators that they could track so you can take credit for your preventative work. If all else fails, then you can decide if you want to just do visible work or leave.
More like: why fix a 5% bug when you can deploy a new bug to the other 50%
> Soon you'll be seen by others as a car mechanic. Maybe a reliable car mechanic. But reliable car mechanics don't get paid a lot.
this argues for a messed up reward system for the company, not the engineer. all sufficiently large systems have bugs and performance issues, and adding a measure of reliability, stability, and speed is just as important as adding features.
> But reliable car mechanics don't get paid a lot.
If you're a wizbang mechanic working for an import car repair, you can get paid a lot more in hourly labor fees than some of us make.
They were talking about reliable, 10x is worth a lot in most complex jobs.
Reliable car mechanics with ambition actually get paid fairly well compared to actually complacent car mechanics. It's not staff software engineer money, but it's pretty decent, and a hell of a lot less of whichever qualities one associates with working in tech for better or worse. I don't think those qualities are mutually exclusive. But, much like everything, it goes without saying almost that generally you should have a variety of skills you can bring to the table.
This is a fascinating take. I have been thinking about your comment for two days.
I think you're right in some cases (when working in a field one has mastered, for example), and I think I could probably go in the direction of getting it right the first time.
But the way I see it, any time I'm doing something new or innovative, I'm doing something I don't know how to do, which takes trial and error; and troubleshooting is basically figuring things out by trial and error, in a systematic way.
Though a lot of time it is used for fixing bugs, I think troubleshooting as a skill and mindset is equally useful for creating new things, where you are solving for something.
time dilation
Based on the timestamps, it could only be. But this story, timestamps notwithstanding, was submitted by suprisetalk ~2 days ago.
Then it was placed in (, and got two comments; credit_guy's comment was one of them.
Then, today, it hit the frontpage.
Notice that if you mouse over "9 hours ago" on the story it shows the timestamp 2025-02-25. 9 hours ago was not 2025-02-25. If you mouse over the "7 hours ago" on credit_guy's comment, the timestamp shows 2025-02-26. One day after it was submitted, two days before it made the frontpage.
Reliable mechanics have high rates and long waits everywhere I've lived.
Hell, I've seen some that are so busy that they won't take new customers!
> But reliable car mechanics don't get paid a lot.
I honestly don't know: do they not? The reliable mechanics in my city seem to do tons of business and charge significantly higher prices than the competition.
I can think of a lot of software that I have stopped paying for because they did not fix bugs or performance issues. I can think of far less software that I stopped paying for because although it was reliable, they did not add more features to it - but I am an individual, not a business, and likely am not representative of the average software-buying individual.
In software and systems that I have built for myself, the impact of fixing a bothersome bug is usually far higher than adding a new capability, but I may just be more bothered by bugs than most. Reliability and smooth, predictable operation are very important to me.
Complacency is not just feeling good, but doing so while ignoring a risk.
You can feel good about addressing risks, the opposite of complacency.
I spend way more time troubleshooting than I do building new things. I'm a lead programmer on my team. I regularly hop on impromptu zoom calls to help people out with thorny problems or jump into slack conversations. I don't get a lot of focused time for building new stuff. I'm more valuable to my team keeping them running smoothly and enforcing standards that avoid some of the nastier troubleshooting.
I have found myself to be someone that loves learning a little bit about various tech disciplines, but I've ended up as a "master of none" because I get bored quickly. Any advice for a career path that would reward such a thing?
CTO :)
Some kinds of troubleshooting are pretty important, though (e.g., medicine).
Funny. Sounds like you’re troubleshooting their reward system.
I'm not sure that I agree with this, not that I'm big on troubleshooting, but experienced SREs (Site Reliability Engineers) are worth their weight in gold and get paid an insane amount of money. Perhaps the key is to debug a vendor's solution rather than an in-house one.
It's even worse than that.
Making troubleshooting skills a profession in itself makes reliability a property of a specific person or team and not a property of the system. The former doesn't scale.
You’ll never be able to build a (large, complex) system that is consistently, inherently reliable over time and in response to change. You want to aim for such reliability but you still need troubleshooting ability.
User credit_guy advocates for technical debt.
My skill at troubleshooting has caused me to be the go-to guy in every project, which lends great credibility and opportunities for leadership. Your pride in your troubleshooting skills isn't pride in a side-quest, it's pride in having a deep understanding of how systems work in general and in the specific.
"Good troubleshooter" might not look great on a CV, but all of your coworkers naming you as the most valuable member of the team, and a natural leader, is worth more than any feature launches.
The value is in leadership, and being able to avoid certain classes of bugs from appearing in the first place. Troubleshooting just happens to be the skill that allows you to gain the knowledge to lead.
"I found the root cause and corrected it. It may be an issue in these other two places so we should check there as well."
"Great. How do we avoid issues like this in the future?"
"By doing X thing a different way, and ensuring that Y thing is also in place."
You get more credit for putting out fires you started than building something that doesn't catch on fire in the first place.
This take seems pretty short-sighted. You may well have a point regarding what hiring managers actually value but speaking from experience having a couple people on hand that are better at solving problems than manufacturing them is pretty clutch in most settings where results are actually important.
> don't fool yourself into thinking they are gold currency on the job market.
I must have gotten lucky then because I’ve built most of my career off my incredible troubleshooting capacity and my communication capability.
I’m not earning Silicon Valley money (because thankfully I don’t live there) but I’m at the top end for salaries in my country. I out earn the vast majority of devs/devops people I know.
Maybe it’s a bit unfair though because I troubleshoot at a very different level to most I suspect.
Or maybe I’ve been applying the wrong word to what I do my entire career…or misunderstanding why I’m valued. This is more possible than it might seem.
For further context, I've held a wide variety of positions now within IT, including executive management. The problem is still, and I know this will sound stupid, I don't know how it is I got here, or why people valued me enough to keep promoting me. I never even asked for the promotions they just...happened. This has made applying for new jobs harder, because I've never actually been entirely sure what my value is, so I often take jobs that are lower than my last job, but then every time it's not long before I end up well above that again.
I eventually assumed it must be my troubleshooting capacity. I asked a CEO I worked with once at a smaller startup why it is he kept promoting me, and I got this story about how by just being in the room, everyone around me wants to do better work. Not because they are being told to do so, but because the work I do apparently just inspires people around me to do better. I was the truest example he had seen apparently of 'lead by example.'
It's been very problematic because despite earning good money, and having never struggled to find, retain or advance in a job, I still don't truly know what it is exactly I'm good at.
I think I'm terrible at explaining this. Every time I have tried to talk to someone about it in real life they just end up telling me I have impostor syndrome. Of course I do, I don't really know what the fuck it is I do.
There is some truth to what you’re saying, but that’s just another example of (particularly American) capitalism’s masterful misallocation of resources when it comes to compensation.
What I missed here was the importance of keeping careful notes as you go. What exactly happened when we constructed that weird input and commented out line 353? What hypotheses are we entertaining? Can we rule out any of them based on our evidence? It’s very easy to dupe yourself if you’re doing it all in your head.
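One way to keep those notes honest is to write each hypothesis down with the evidence for or against it, so nothing gets ruled out only in your head. A rough sketch of what that could look like; the `Notebook` and `Hypothesis` classes and the sample claims are hypothetical, not anything from the article.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hypothesis:
    claim: str
    status: str = "open"  # "open", "ruled_out", or "confirmed"
    evidence: List[str] = field(default_factory=list)


@dataclass
class Notebook:
    hypotheses: List[Hypothesis] = field(default_factory=list)

    def propose(self, claim: str) -> Hypothesis:
        """Write down a new hypothesis before testing it."""
        h = Hypothesis(claim)
        self.hypotheses.append(h)
        return h

    def record(self, h: Hypothesis, observation: str,
               status: Optional[str] = None) -> None:
        """Attach an observation, and optionally update the verdict."""
        h.evidence.append(observation)
        if status is not None:
            h.status = status

    def still_open(self) -> List[str]:
        """Claims that the evidence so far has neither confirmed nor ruled out."""
        return [h.claim for h in self.hypotheses if h.status == "open"]


# Hypothetical session notes.
notes = Notebook()
parser = notes.propose("The parser chokes on the weird input")
cache = notes.propose("A stale cache entry is being served")
notes.record(parser, "Commented out line 353: failure still reproduces",
             status="ruled_out")
print(notes.still_open())  # ['A stale cache entry is being served']
```

A plain text file does the same job. The point is only that each experiment and its result is written down next to the hypothesis it was meant to test.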
Hugged to death for me right now, here's an archive link:
Thanks for posting an archive link. My site has survived previous HN traffic spikes on's free tier, but 256mb of RAM wasn't quite adequate this time :)
No disrespect, but I thought the whole point of these magic cloud platforms was that this situation never happens.
The whole point of magic cloud platforms is to upcharge for everything and convince people there's no other way to run software.
Just finished the book recently. It's very insightful. As someone who considers himself good at debugging* and is still trying to improve debugging skills and efficiency, I view this book (and similar resources like this article) as a guide that also helps me reflect on what could have been done in a better way next time.
* Multiple times, I helped others find the root cause of a bug after they spend hours at it and have no clue what is happening
Seconded. I don't think I learned much from the book, but it helped make my thought process more structured and methodical. I had a friend recently start as a developer, and I strongly recommended this to them.
I believe Hillel Wayne gives a copy to every junior dev he meets.
Brendan Gregg's USE method is for performance troubleshooting but could work in any situation (broken is just the worst performance, right?)
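For anyone who hasn't seen it: USE stands for utilization, saturation, and errors, and the idea is to check all three for every resource. A minimal sketch of that checklist follows. The `ResourceSample` type, the 80% utilization threshold, and the sample numbers are assumptions for illustration; in practice the values would come from whatever metrics your system exposes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceSample:
    name: str
    utilization: float  # fraction of time the resource was busy, 0.0 to 1.0
    saturation: float   # queued work, e.g. run-queue or wait-queue length
    errors: int         # error events counted in the interval


def use_findings(samples: List[ResourceSample],
                 util_limit: float = 0.8) -> List[str]:
    """Walk every resource and flag errors, saturation, and high utilization."""
    findings = []
    for s in samples:
        if s.errors > 0:
            findings.append(f"{s.name}: {s.errors} errors")
        if s.saturation > 0:
            findings.append(f"{s.name}: saturated (queue={s.saturation:g})")
        if s.utilization >= util_limit:
            findings.append(f"{s.name}: utilization {s.utilization:.0%}")
    return findings


# Made-up numbers, purely for illustration.
samples = [
    ResourceSample("cpu", utilization=0.95, saturation=3, errors=0),
    ResourceSample("disk", utilization=0.20, saturation=0, errors=2),
    ResourceSample("network", utilization=0.10, saturation=0, errors=0),
]
for finding in use_findings(samples):
    print(finding)
```

The value is less in the thresholds than in the loop: going through every resource systematically makes it harder to fixate on the one you already suspect.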
He promotes many more methods, based on the exact needs:
If one is to be pedantic, troubleshooting does not involve fixing which is a separate and also valuable skill
What do you think will be the last skills/jobs to go obsolete?
I think it's "Wanting the right thing" (This includes figuring out what the right thing is) and "Being able to articulate your wish clearly" (This includes clarifying your thoughts).
Are there people with those skills today? They seem to be in terribly short supply. I’ve seen more than one company spin its wheels for ages because nobody could clearly express an operational vision.
There will always be work for people who are smart, hard-working, creative, and willing to do exactly what their billionaire-owner asks.
> I’ll define troubleshooting as systematically determining the cause of unwanted behaviour in a system, and fixing it.
Or debugging and understanding the reason why a system isn’t behaving as expected. And pinpointing the part of the code that causes the behaviour that is not desired.
In another field, IT Service Management (ITSM), there is the distinction between incidents and problems. If you see many incidents coming in that are related, you sit down and start doing a root-cause analysis, basically a form of debugging. Or troubleshooting.
So yes, this is a skill that is timeless.
Half the comments here are nitpicking the car mechanic analogy (naturally), the other half are complaining about the site shitting the bed.
Yes, debugging is important, and too many people can't do it, which is unsettling considering how many bugs those people are putting into the code in the first place.
I feel like the way to think about troubleshooting is to think about it as an umbrella encompassing reliability and quality engineering in software. If you can find ways of showing how reliability and quality of a software can be broken and how it can be improved (simultaneously), then you have a career to make.
Don't wait for stuff to break and react. Be proactive and find ways to demonstrate how it can break and how to fix it.
Technical troubleshooting is one thing. Organizational troubleshooting is another, and IME it is neither valued nor rewarded. YMMV.
When I'm stumped troubleshooting production problems, I try to think how I can get more information out of the system.
Exporting telemetry events with a wide set of attributes to an observability platform is a great approach which can provide an extensible way to expose additional information about the events.
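As a rough illustration of a "wide" event: one structured record per unit of work, carrying every attribute that might matter later. This sketch just writes JSON lines to stdout using the standard library; the `emit_event` helper and the attribute names are hypothetical, and a real setup would send the event to whatever observability platform you use instead.

```python
import json
import sys
import time
from typing import Any, Dict, TextIO


def emit_event(name: str, attrs: Dict[str, Any],
               out: TextIO = sys.stdout) -> Dict[str, Any]:
    """Write one JSON line holding the event name, a timestamp, and all attributes."""
    event = {"event": name, "ts": time.time(), **attrs}
    # default=str keeps the call from failing on values JSON cannot encode natively
    out.write(json.dumps(event, sort_keys=True, default=str) + "\n")
    return event


# Hypothetical attributes; adding a new one later needs no schema change.
checkout = emit_event("checkout.completed", {
    "user_id": "u-123",
    "cart_items": 4,
    "payment_provider": "example-pay",
    "duration_ms": 187.5,
    "feature_flags": ["new_cart"],
    "build": "2025.02.1",
})
```

Because each event is a flat bag of attributes, you can add fields as new questions come up and slice by any of them afterwards, which is what makes this useful when you're stumped.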
My first job was as a bench tech, for a microwave communication device manufacturer.
That really helped me to become a good troubleshooter.
(SadServers guy here) Good article and although I don't see the author here, thanks for the mention :-)
Thank you :) I'm here just slow at typing.
> Don’t assume it’s complicated
There are problem areas where it is a lot easier to assume everything is a 10/10 monster.
If you start every journey with "power cycle the device" and always wind up with a bridge call between 3 vendors, you might as well get the bridge warmed up the moment something throws a warning.
Oftentimes, getting someone on the phone can be a bit of a circus act regardless of what the contracts say. Overreacting early on can minimize total time to resolution.
This strategy will also make the rank-and-file at your vendors hate and distrust you, so proceed with caution.
The skill is "troubleshooting". Not to be confused with its close cousin, "troublemaking".
The site appears to be hugged to death for me.
Yes, sorry. I have scaled up my hosting and it's back :)
When the CEO's vibe-coded slop gets chucked over the wall to become someone else's problem once completed in rough prototype form, and the ensuing bugs and scalability/reliability issues manifest, troubleshooting is going to be a more valuable skill than ever!
We'll get paid peanuts for it, but hey, we should be thankful for the work in the first place!
They should troubleshoot the long loading times for the site.