niccl 9 hours ago

In the course of interviewing a bunch of developers, and employing a few of them, I've concluded that this ability/inclination/something to do this deeper digging is one of the things I prize most in a developer. They have to know when to go deep and when not to, though, and that's sometimes a hard balancing act.

I've never found a good way of screening for the ability, and even more so, for knowing when not to go deep, because everyone will come up with some example if you ask, and it's not the sort of thing that I can see highlighting in a coding test (and _certainly_ not in a leet-code test!). If anyone has any suggestions on how to uncover it during the hiring process I'd be ecstatic!

  • giantg2 9 hours ago

    "I've concluded that this ability/inclination/something to do this deeper digging is one of the things I prize most in a developer."

    Where have you been all my life? It seems most of the teams I've been on value speed over future-proofing bugs. The systems thinking approach is rare.

    If you want to test for this, you can create a PR for a fake project. Make sure the project runs but has errors, code smells, etc. Have a few things like they talk about in the article, like a message about being out of disk space but missing the critical message/logging infrastructure to cover other scenarios. The best part is, you can use the same PR for all the levels you're hiring for by expecting seniors to catch X% of the bugs, mids X/2%, and noobs X/4%.
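
    For example (a purely hypothetical seed, not anything from the article), one planted smell could look like this, where the disk-full case gets a friendly message while every other failure is silently swallowed:

        import shutil

        def save_report(path, data):
            # Hypothetical seeded bug for the review exercise: only one
            # failure mode is reported, everything else just disappears.
            try:
                with open(path, "wb") as f:
                    f.write(data)
                return True
            except OSError:
                if shutil.disk_usage(".").free == 0:
                    print("Error: out of disk space")
                # permission errors, bad paths, etc. are swallowed here:
                # no logging, no re-raise, the caller just sees False
                return False

    Then see who flags the silent failure path, not just the obvious style nits.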

    • jerf 6 hours ago

      "It seems most of teams I've been on value speed over future proofing bugs."

      So, obviously, if one team is future-proofing bugs, and the other team just blasts out localized short-term fixes as quickly as possible, there will come a point where the first team overtakes the second, because the second team's velocity will, by necessity, slow down more than the first's as the code base grows.

      If the crossover point is ten years hence, then it only makes sense to be the second team.

      However, what I find a bit horrifying as a developer is that my estimate of the crossover point keeps moving closer. When I'm working by myself on greenfield code, I'd put it at about three weeks; yes, I'll go somewhat faster today if I just blast out code and skip the unit tests, but it's only weeks before I'm getting bitten by that. Bigger teams may have a somewhat farther crossover point, but it's still likely to be small single-digit months.

      There is of course overdoing it and being too perfectionist, and that does get some people, but the people, teams, managers, and companies who always vote for short-term code blasting simply have no idea how much performance they are leaving on the table almost immediately.

      Established code bases are slower to turn, naturally. But even so, I still think the constant short-term focus is vastly more expensive than those who choose it understand. And I don't even mean obvious stuff like "oh, you'll have more bugs" or "oh, it's so much harder to onboard", even if that's true... no, I mean, even by the only metric you seem to care about, the team that takes the time to fix fundamental issues and invests in better logging and metrics and all those things you think just slow you down can also smoke you on dev speed after a couple of months... and they'll have the solid code base, too!

      "Make sure the project runs but has error, code smells, etc."

      It is a hard problem to construct a test for this, but it would be interesting to provide the candidate some code that compiles with warnings and just watch how they react to the warnings. You may not learn everything you need, but it'll certainly teach you something.

      • ozim 2 hours ago

        Unfortunately I believe there is no crossover point even in 10 years.

        If a quick fix works, it is most likely a proper fix; if it doesn't work, then you dig deeper. It is also a question of whether the feature being fixed is even worth spending that much time on.

        • rocqua 27 minutes ago

          A quick fix works now. It makes the next fix or change much harder because it just added a special case, or ignored an edge case that wasn't possible in the configuration at that time.

      • daelon 2 hours ago

        Slow is smooth, smooth is fast.

    • ozim 2 hours ago

      I have seen enough BSers who claimed that they needed to "do the proper fix", doing analysis and wasting everyone's time.

      They would be vocal about it and then spend weeks delivering nothing, "tweaking db indexes", while I could see immediately that the code was crap and only needed slight changes. But I also don't have time to fight all the fights in the company.

    • niccl 8 hours ago

      That's a really good idea. Thanks

  • bongodongobob 8 hours ago

    Knowing when to go down the rabbit hole is probably more about experience/age than anything. I work with a very intelligent junior who is constantly going down rabbit holes. His heart is in the right spot but sometimes you just need to make things work/get things done.

    I used to do it a lot too and I kind of had a "shit, I'm getting old" moment the other day when I was telling him something along the lines of "yeah, we could probably fix that deeper but it's going to take 6 weeks of meetings and 3 departments to approve this. Is that really what you want to spend your time on?"

    Like you said, it's definitely a balancing act and the older I get, the less I care about "doing things the right way" when no one actually cares or will know.

    I get paid to knock out tickets, so that's what I'm going to do. I'll let the juniors spin their wheels and burn mental CPU on the deep dives and I'm around to lend a hand when they need it.

    • layer8 8 hours ago

      However, you have to overdo it a sufficient number of times when you’re still inexperienced, in order to gain the experience of when it’s worth it and when it’s not. You have to make mistakes in order to learn from them.

      • giantg2 8 hours ago

        When it's worth it and when it's not seems to be more of a business question for the product owner. It's all opinion.

        I've been on a team where I had 2 weeks left and they didn't want me working on anything high priority during that time so it wouldn't be half finished when I left. I had a couple small stories I was assigned. Then I decided to cherry-pick the backlog to see how much tech debt I could close for the team before I left. I cleared something like 11 stories out of 100. I was then chewed out by the product owner because she "would have assigned [me] other higher priority stories". But the whole point was that I wasn't supposed to be on high priority tasks because I was leaving...

        • layer8 8 hours ago

          The product owner often isn’t technical enough, or into the technical weeds enough, to be able to assess how long it might take. You need the technical experience to have a feel for the effort/risk/benefit profile. You also may have to start going down the hole to assess the situation in the first place.

          The product owner can decide how much time would be worth it given a probable timeline, risks and benefits, but the experienced developer is needed to provide that input information. The developer has to present the case to the product owner, who can then make the decision about if, when, and how to proceed. Or, if the developer has sufficient slack and leeway, they can make the decision themselves within the latitude they’ve been given.

          • giantg2 7 hours ago

            Yeah. The team agreed I should just do the two stories, which was what was committed to in that sprint. I got that done and then ripped through those other 11 stories in the slack time before I left the team. My TL supported that I didn't do anything wrong in picking up the stories. The PO still didn't like it.

        • seadan83 8 hours ago

          Why the product owner? (Rather than, say, the team lead?)

          Are these deeply technical product owners? Which ones would be best placed to make this decision, and which less so?

          • giantg2 7 hours ago

            In a non-technical company with IT being a cost center, it seems that the product owner gets the final say. My TL supported me, but the PO was still upset.

      • rocqua 26 minutes ago

        Regardless, these deep dives are so valuable in teaching yourself, they can be worth it just for that.

    • userbinator 8 hours ago

      Have you been asked "why do we never have the time to do it right, but always time to do it twice?"

      • sqeaky 8 hours ago

        His response is likely something like "I am an hourly contractor, I have however much time they want", or something with the same no-longer-gives-a-shit energy.

        But their manager likely believes that deeper fixes aren't possible or useful for some shortsighted bean-counter reason. Not that bean counting isn't important, but they are often counted early and wrong.

        • bongodongobob 6 hours ago

          Yeah, don't get me wrong, I'm not saying "don't care about anything and do a shitty job", but sometimes the extra effort just isn't worth it. I'm a perfectionist at heart, but I have to weigh the cost of meeting my manager's goals against getting behind because I want it to be perfect. Then 6 months later my perfect thing gets hacked apart by a new request/change. Knowing when and where to go deeper and when to polish things is a learned skill and has more to do with politics and the internal workings of your company than with some ideal. Everything is in constant flux, and having insight into smart deep dives isn't some black and white general issue. It's completely context dependent.

    • thelostdragon an hour ago

      "yeah, we could probably fix that deeper but it's going to take 6 weeks of meetings and 3 departments to approve this. Is that really what you want to spend your time on?"

      This is where a developer goes from junior to senior.

andai 9 hours ago

I was reading about NASA's software engineering practices.

When they find a bug, they don't just fix the bug, they fix the engineering process that allowed the bug to occur in the first place.

  • xelxebar 7 hours ago

    This is such a powerful frame of mind. Bugs, software architecture, tooling choices, etc. all happen within organizational, social, political, and market machinery. A bug isn't just a technical failure, but a potential issue with the meta-structures in which the software is embedded.

    Code review is one example of addressing the engineering process, but I also find it very helpful to consider business and political processes as well. Granted, NASA's concerns are very different from those of most companies, but as engineers and consultants, we have leeway to choose where and how to address bugs, beyond just the technical and immediate dev habits.

    Soft skills matter hard.

  • anotherhue 9 hours ago

    Maintenance is never as rewarded as new features, there's probably some MBA logic behind it to do with avoiding commoditisation.

    It's true in software, it's true in physical infrastructure (read about the sorry state of most dams).

    Until we root-cause that process, I don't see much progress coming from this direction. On the plus side, CS principles are making their way into compilers; we're a long way from C.

    • hedvig23 7 hours ago

      Speaking of digging deeper, can you expand on that theory on why focus/man hours spent on maintenance leads to commoditization and why a company wants to avoid that?

      • anotherhue 5 hours ago

        Top of my head, new things have unbounded potential, existing ones have known potential. We assume the new will be better.

        I think it's part of the reason stocks almost always dip after positive earnings reports. No matter how positive, it's always less than idealised.

        You might think there's a trick where you can sell maintenance as a new thing but you've just invented the unnecessary rewrite.

        To answer your question more directly, once something has been achieved it's safe to assume someone else can achieve it also, so the focus turns to the new thing. Why else would we develop hydrogen or neutron bombs when we already had perfectly good fission ones? (They got commoditised.)

    • giantg2 8 hours ago

      "Maintenance is never as rewarded as new features,"

      And security work is rewarded even less!

      • riknos314 6 hours ago

        > And security work is rewarded even less

        While I do recognize that this is a pervasive problem, it seems counter-intuitive to me based on the tendency of the human brain to be risk averse.

        It raises an interesting question of "why doesn't the risk of security breaches trigger the emotions associated with risk in those making the decision of how much to invest in security?".

        Downstream of that is likely "Can we communicate the security risk story in a way that more appropriately triggers the associated risk emotions?"

        • SAI_Peregrinus 5 hours ago

          What is the consequence for security breaches? Usually some negative press everyone forgets in a week. Maybe a lost sale or two, but that's hard to measure. If you're exceedingly unlucky, an inconsequential fine. At worst paying for two years of credit monitoring for your users.

          What's the risk? The stock price will be back up by next week.

        • giantg2 6 hours ago

          The people making the decision don't suffer a direct negative impact. Someone's head might roll, but that's usually far up the chain, where the comp and connections are high enough not to care. The POs making the day-to-day decisions are under more pressure for new features than they are for security.

  • toolz 6 hours ago

    To that example though, is NASA really the pinnacle of achievement in their field? Sure, it's not a very competitive field (e.g. compared to something like the restaurant industry) and most of their existence has been about R&D for tech there wasn't really a market for yet, but still, SpaceX comes along and in a fraction of the time they're landing and reusing rockets, making space launches more attainable and significantly cheaper.

    I'm hoping that example holds up, but I'm not well versed in that area, so it may be a terrible counter-example. My overarching point is this: overly engineered code often produces less value than quickly executed code. We're not in the business of making computers do things artfully just for the beauty of the rigor and correctness of our systems. We're doing it to make computers do useful things for humanity.

    You may think that spending an extra year perfecting a pacemaker might end up saving lives, but what if more people die in the year before you go to market than would've ended up dying had you launched with something almost perfect, but with potential defects?

    Time is expensive in so many more ways than just capital spent.

    • the_other 38 minutes ago

      SpaceX came along decades after NASA’s most famous projects. Would SpaceX have been able to do what they did if NASA hadn’t engineered to their standard earlier on?

      My argument (and I’m just thought-experimenting here) is that without NASA’s rigor, their programmes would have failed. Public support, and thus the market for space projects, would have dried up before SpaceX was able to “do it faster”.

      (Feel free to shoot this down: I wasn’t there and I haven’t read any deep histories of the companies. I’m just brainstorming to explore the problem space.)

    • sfn42 12 minutes ago

      The fallacy here is that you're assuming that doing things right takes more time.

      Doing things right takes less time in my experience. You spend a little more time up front to figure out the right way to do something, and a lot of the time that investment pays dividends. The alternative is to just choose the quickest fix every time until eventually your code is so riddled with quick fixes that nobody knows how it works and it's impossible to get anything done.

  • sendfoods an hour ago

    Which blog/post/book was this? Thanks

brody_hamer 9 hours ago

I learned a similar mantra that I keep returning to: “there’s never just one problem.”

- How did this bug make it to production? Where’s the missing unit test? Code review?

- Could the error have been handled automatically? Or more gracefully?

Terr_ 5 hours ago

IMO it may be worth distinguishing between:

1. Diagnosing the "real causes" one level deeper

2. Implementing a "real fix" one level deeper

Sometimes they have huge overlap, but the first is much more consistently desirable.

For example, it might be that the most practical fix is to add some "if this happens just retry" logic, but it would be beneficial to know--and leave a comment--that it occurs because of a race condition.
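
A minimal sketch of what I mean (the file name and error types are hypothetical): the practical fix is the retry loop, while the diagnosis one level deeper lives in the comment.

    import json
    import time

    def read_setting(key, path="settings.json", attempts=3):
        # Practical fix: just retry.
        # Real cause, diagnosed but not fixed here: another process rewrites
        # the file non-atomically, so a read can race the write and briefly
        # see a truncated or missing file.
        for i in range(attempts):
            try:
                with open(path) as f:
                    return json.load(f)[key]
            except (json.JSONDecodeError, FileNotFoundError):
                time.sleep(0.05 * (i + 1))  # brief backoff before retrying
        raise RuntimeError(f"could not read {key!r} after {attempts} attempts")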

peter_d_sherman 5 hours ago

>"There’s a bug! And it is sort-of obvious how to fix it. But if you don’t laser-focus on that, and try to perceive the surrounding context, it turns out that the bug is valuable, and it is pointing in the direction of a bigger related problem."

That is an absolutely stellar quote!

It's also more broadly applicable to life / problem solving / goal setting (if we replace the word 'bug' with 'problem' in the above quote):

"There’s a problem! And it is sort-of obvious how to fix it. But if you don’t laser-focus on that, and try to perceive the surrounding context, it turns out that the problem is valuable, and it is pointing in the direction of a bigger related problem."

In other words, in life / problem solving / goal setting -- smaller problems can be really valuable, because they can be pointers/signs/omens/subcases/indicators of/to larger surrounding problems in larger surrounding contexts...

(Just like bugs can be, in Software Engineering!)

Now if only our political classes (on both sides!) could see the problems that they typically see as problems -- as effects, not causes (because that's what they all are: effects), of as-of-yet unseen larger problems, to which those smaller problems are pointers, "hints", subcases, "indicators" (use whatever terminology you prefer...)

Phrased another way, in life/legislation/problem solving/Software Engineering -- you always have to nail down first causes -- otherwise you're always in "Effectsville"... :-)

You don't want to live in "Effectsville" -- because anything you change will be changed back to what it was previously in the shortest time possible, because everything is an effect in Effectsville! :-)

Legislating against a problem that is seen, when it is the effect of another, greater, as-of-yet unseen problem -- will not fix the seen problem!

Finally, all problems are always valuable -- but if and only if their surrounding context is properly perceived...

So, an excellent observation by the author, in the context of Software Engineering!