|
@ESYudkowsky
|
|
Ours is the era of inadequate AI alignment theory. Any other facts about this era are relatively unimportant, but sometimes I tweet about them anyway.
|
|
|
4,570
Tweets
|
62
Following
|
29,699
Followers
|
| Tweets |
|
Eliezer Yudkowsky
@ESYudkowsky
|
6 h |
|
If you zoom out even further, they're just some people trapped executing an adaptation that doesn't really make them happy and you should interrupt their Gray Boy loop.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
6 h |
|
If someone on Twitter seems to really dislike you, do the compassionate thing and block them, because it's almost certain they lack the tiny shreds of internal competence required to just stop reading your damn tweets.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
8 h |
|
#include "SteelmanningConsideredHarmfulAspireToPassTheIdeologicalTuringTestInstead.cpp"
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
9 h |
|
It's all over the place. Anytime you try to espouse a different standard for X, you get a huge crowd of applicants who couldn't hack the standard system for one reason or another. Potentially deadly if you don't realize in advance that's what you're doing.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
9 h |
|
Thank you, random person online I don't know, for correctly performing the neverending and often thankless task of trying to rectify a straightforward mathematical misstep in popular understanding of something with political implications. It's not Right or Left, it's Math.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
18 h |
|
In other words, the constraint doesn't prohibit Omega from guessing I'll take Box B because that's what my twin brother did in the same situation or because I have gene BXB18; it prohibits Omega from caring whether I decided that by alphabetizing or EU maximizing.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
18 h |
|
The constraint is that Omega only cares about the algorithm's pattern of behavior, not that Omega can only predict future patterns of behavior by observing past patterns of behavior (and what would that even mean in real life where nothing ever repeats exactly?)
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
19 h |
|
That's definitely not one of my constraints. My agents live in a world with genes, facial expressions, MRI scans, widespread theories of mind, and so on. Even the humble notion of a Nash equilibrium presumes that CDT agents have a prior idea they're interacting with CDT agents.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
19 h |
|
CDT fears the truth if Omega can predict its outputs in advance of measuring them, even if Omega has no care for the algorithm behind the output.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
Feb 2 |
|
setting concept: in a fantasy world, the one god who’s an okay person is “worshipped” by all the complete bastards because that god never sends anyone to hell
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
Feb 2 |
|
Deepfakes, algorithms and datasets implementing social biases, autonomous weapons, these all seem orthogonal to AGI alignment issues AFAICT.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
Feb 2 |
|
"adversarial examples" are one of the very rare cases that seem on-point to me; it touches on Goodhart and optimizing for vs around a criterion.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
Feb 2 |
|
Virtually all of them; everything except transparency issues and "reward hacking" (poorly named, it's just reward obtaining) that I can think of offhand has approximately zero bearing on AGI alignment.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
Feb 2 |
|
In other words, the key threshold for "modesty" is not the average person's status, but how much status the average person thinks you have. A billionaire who acts like a millionaire will be called modest; a slave who acts like a peasant will be whipped for arrogance.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
Feb 2 |
|
I'd say they're aspiring to "humility" (whether they're successfully humble depends on how well they guard against errors). Successful "modest" performance is when you act lower-status than the status people think you're entitled to, in a way that actually raises your status.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
Feb 2 |
|
Many people do! People with strong social brains, anyone with math trauma, people raised with squicky feelings about money. It's just important to keep in mind that Mileage Varies.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
Feb 2 |
|
If by life or by death I can save you, I will, but you can't make me do it with dignity.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
Feb 2 |
|
I feel like I should have predicted faster that the sort of person who hates me will loudly complain about having seen a weight-loss pic of me, then force their fellows to look at it too. I'd regret generating so much disutility, but I don't even know if that's what it is.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
Feb 1 |
|
I've long since forgotten the first person to tell me I was onto something, and will never ever forget the first person who sent me $1000 to pursue it.
|
||
|
|
||
|
Eliezer Yudkowsky
@ESYudkowsky
|
Feb 1 |
|
*Sigh.* If you're actually at $500/month then I'm not going to spring the trap on you for $500. Here's a picture of myself I just took. Meditate upon it and become wiser. pic.twitter.com/sw144pdZlO
|
||
|
|
||