Charity Majors
How can you predict what data Future-You will need to have gathered to debug an unpredictable problem? (And if you *could* predict the problem, why do you need a fancy debugging tool?) I hear this a lot, and the answer gets at the heart of the difference btwn monitoring & o11y.
Joe Stein Jan 10
Replying to @mipsytipsy
o11y 🤟
Charity Majors Jan 10
Replying to @mipsytipsy
Monitoring is very much biased towards actionable alerts, and as such it trucks in rapidly identifying the same complex failure conditions repeatedly. Your known-unknowns.
Charity Majors Jan 10
Replying to @mipsytipsy
Good observability tools are very different. They are exploratory, letting you slice and dice and play with the data, and follow the bread crumbs wherever they lead you. Think about how business intelligence tools work. You don’t know where you’re going til you get there.
Charity Majors Jan 10
Replying to @mipsytipsy
This exploratory approach can be slower at dealing with known-unknowns. But it’s the only game in town for unknown-unknowns. You’re screwed if you have to rely on a few humans magically holding the system in their heads and reasoning far beyond anything your tools can show.
Michael Hausenb🎃🎃 Jan 10
Replying to @charmalloc @mipsytipsy
I'm guessing observability ;)
Charity Majors Jan 10
Replying to @mipsytipsy
So: how do you predict what data you will need to gather? Unlike with monitoring, you don’t have to know nearly as much in advance. First: the q is often “which {service,node,version,shard,cluster,etc} is this coming from?” ... so instrument the shit outta all network hops
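A minimal sketch of what instrumenting a single network hop could look like, assuming a Python service. `traced_hop`, the in-memory `EVENTS` list, and every field value are hypothetical and made up for illustration; a real setup would ship these events to whatever pipeline you already use.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory sink; stands in for a real event pipeline.
EVENTS = []

@contextmanager
def traced_hop(request_id, **where):
    """Emit one event per network hop, tagged with where it ran."""
    event = {"request_id": request_id, **where}
    start = time.monotonic()
    try:
        yield event  # the caller may attach more fields while handling the hop
    except Exception as exc:
        event["error"] = type(exc).__name__
        raise
    finally:
        event.setdefault("error", None)
        event["duration_ms"] = (time.monotonic() - start) * 1000.0
        EVENTS.append(event)

def handle(request_id):
    # Field names follow the tweet's list; the values are invented examples.
    with traced_hop(request_id, service="api", node="api-7",
                    version="v42", shard="s3", cluster="c1") as event:
        event["status"] = 200
    return event
```

Because the event is appended in `finally`, a hop that raises still leaves a record behind, with the exception type in its `error` field.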
Charity Majors Jan 10
Replying to @mipsytipsy
Second: the next question is often “ok I found an error ... who or what else is experiencing this and what do they have in common?” You’ll want to tag everything with uuid, query, unique request id, build id, region, any other hi-card slice you can think of.
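The "what do they have in common?" question can be sketched as a small function over tagged events. This is an illustration only: `common_traits` and the sample events are hypothetical, and it assumes every event is a flat dict with hashable values.

```python
from collections import Counter

def common_traits(events, is_bad):
    """Return the field/value pairs shared by every event that is_bad selects."""
    bad = [e for e in events if is_bad(e)]
    if not bad:
        return {}
    counts = Counter((field, value) for e in bad for field, value in e.items())
    return {field: value for (field, value), n in counts.items() if n == len(bad)}

# Invented sample data: two failures and one success.
events = [
    {"request_id": "r1", "user_id": "u1", "build_id": "b-981",
     "region": "us-east-1", "status": 500},
    {"request_id": "r2", "user_id": "u2", "build_id": "b-981",
     "region": "eu-west-1", "status": 500},
    {"request_id": "r3", "user_id": "u3", "build_id": "b-980",
     "region": "us-east-1", "status": 200},
]

# The failing requests differ in region and user but share one build id.
shared = common_traits(events, lambda e: e["status"] >= 500)
```

The more fields each event carries, the more candidate explanations a query like this can surface without anyone having predicted them in advance.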
Michael P. Jan 10
Replying to @mipsytipsy
Numeronyms are fun.
Josh Wills Jan 10
Replying to @mipsytipsy
My solution to this is to just know everything. I don’t know what other people do.
Josh Wills Jan 10
Replying to @mipsytipsy
Perfect Foresight As A Service
Charity Majors Jan 10
Replying to @mipsytipsy
Low cardinality details rarely tell you shit worth knowing. (“Ohh, boyeee, all the errors are in us-east-1??!” And using mysql) High cardinality attributes give you the exact fucking residential address of the needle you seek, and all the losers on its block too.
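To make the cardinality point concrete, here is a rough sketch with synthetic data. The helper names and the event shape are assumptions for illustration, not anything from a specific tool.

```python
def cardinality(events, field):
    """Number of distinct values a field takes across the events."""
    return len({e.get(field) for e in events})

def narrows_to(events, field, value):
    """Events left after filtering on a single field value."""
    return [e for e in events if e.get(field) == value]

# Synthetic events: 1000 requests, two regions, one database.
events = [
    {
        "request_id": f"req-{i}",
        "region": "eu-west-1" if i % 4 == 0 else "us-east-1",
        "db": "mysql",
    }
    for i in range(1000)
]

by_region = narrows_to(events, "region", "us-east-1")    # still a big haystack
by_request = narrows_to(events, "request_id", "req-417")  # one exact event
```

Filtering on a low-cardinality field like `region` barely shrinks the search space, while a high-cardinality field like `request_id` lands on one exact event.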
Charity Majors Jan 10
Replying to @josh_wills
Same. If I’m not the first one on the team, I don’t know how to function.
Josh Wills Jan 10
Replying to @mipsytipsy
right there with you. What if we were on the same team?
Charity Majors Jan 10
Replying to @mipsytipsy
In summary: gather all the information you can about the movement of the request, and all the detail you can about the context of the request. System metrics and language internals are nice-to-have, usually useless and mostly a crutch.
Josh Wills Jan 10
Replying to @mipsytipsy
High cardinality? More like fly shardinality, amirite? (I’m sorry about this, I’ve been drinking for several hours now.)
Charity Majors Jan 10
Replying to @mipsytipsy
Dirty secret of distributed systems is how often “fixing” the errors just means “identify the component with a problem super fast, and route around it or decommission it or programmatically attempt to return it to a known good state”
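One way the "identify it fast and route around it" pattern might look, as a toy sketch. The `Router` class, its thresholds, and the node names are all hypothetical; real systems typically use time-windowed health checks rather than the lifetime counters used here.

```python
from collections import defaultdict
from itertools import count

class Router:
    """Round-robin over nodes, skipping any whose observed error rate is too high."""

    def __init__(self, nodes, max_error_rate=0.5, min_requests=5):
        self.nodes = list(nodes)
        self.max_error_rate = max_error_rate
        self.min_requests = min_requests
        self.total = defaultdict(int)
        self.errors = defaultdict(int)
        self._turn = count()

    def record(self, node, ok):
        """Note the outcome of one request served by node."""
        self.total[node] += 1
        if not ok:
            self.errors[node] += 1

    def is_healthy(self, node):
        seen = self.total[node]
        if seen < self.min_requests:
            return True  # not enough data to judge yet
        return self.errors[node] / seen <= self.max_error_rate

    def reset(self, node):
        """Call once the node has been returned to a known good state."""
        self.total[node] = 0
        self.errors[node] = 0

    def pick(self):
        """Choose the next healthy node, routing around unhealthy ones."""
        healthy = [n for n in self.nodes if self.is_healthy(n)]
        if not healthy:
            raise RuntimeError("no healthy nodes to route to")
        return healthy[next(self._turn) % len(healthy)]
```

Nothing here explains why a node is failing. It only stops sending traffic there, which is the point the tweet is making.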
Charity Majors Jan 10
Replying to @mipsytipsy
Understanding and diving deep into weird problems is *hard*, man. Takes time. Getting to take the time to really deeply explore a weird edge case is a luxury.
Charity Majors Jan 10
Replying to @josh_wills
Josh Wills Jan 10
Replying to @mipsytipsy
Don’t tease me, you know what I do for a living.