Rohin Shah
@rohinmshah
Berkeley, CA
PhD student at the Center for Human-Compatible AI at UC Berkeley. I publish the Alignment Newsletter.

100 Tweets · 85 Following · 1,052 Followers

Tweets
Rohin Shah @rohinmshah · Jan 29
Wondering what the field of long-term AI safety does, but don't want to read hundreds of posts? Check out my review of work done in 2018-19! Please do leave comments and suggestions: docs.google.com/document/d/1Fn…
The summary is also Alignment Newsletter #84: mailchi.mp/1af38085edc5/a…
Rohin Shah @rohinmshah · Jan 22
[Alignment Newsletter #83]: Sample-efficient deep learning with ReMixMatch - mailchi.mp/ff565f097630/a…
Rohin Shah @rohinmshah · Jan 15
[Alignment Newsletter #82]: How OpenAI Five distributed their training computation - mailchi.mp/7ba40faa7eed/a…
Rohin Shah @rohinmshah · Jan 8
[Alignment Newsletter #81]: Universality as a potential solution to conceptual difficulties in intent alignment - mailchi.mp/6078fe4f9928/a…
Rohin Shah @rohinmshah · Jan 2
Alignment Newsletter #80: Why AI risk might be solved without additional intervention from longtermists - mailchi.mp/b3dc916ac7e2/a…
Rohin Shah @rohinmshah · Jan 1
Alignment Newsletter #79: Recursive reward modeling as an alignment technique integrated with deep RL - mailchi.mp/8d9e3703fbde/a…
Rohin Shah @rohinmshah · Dec 26
Alignment Newsletter #78: Formalizing power and instrumental convergence, and the end-of-year AI safety charity comparison - mailchi.mp/eef1d6c95d7c/a…
Rohin Shah @rohinmshah · Dec 18
[Alignment Newsletter #77]: Double descent: a unification of statistical theory and modern ML practice - mailchi.mp/d2f2d15b7114/a…
Rohin Shah Retweeted
Niel Bowerman @NielBowerman · Dec 12
Shout out to @rohinmshah for his impressive AI alignment newsletter. If you want to keep up to speed with what is going on in the field of AI alignment, there's nothing better: rohinshah.com/alignment-news…
His team has summarised 1,200 papers to date!
Rohin Shah Retweeted
Adam Gleave @ARGleave · Dec 11
Want to ensure AI is beneficial for society? Come talk to like-minded people at the Human-Aligned AI Social at #NeurIPS2019, Thursday 7-10 pm, room West 205-207. nips.cc/Conferences/20… @claudia_shi57 @victorveitch pic.twitter.com/0KgrHGZSiu
Rohin Shah @rohinmshah · Dec 4
[Alignment Newsletter #76]: How dataset size affects robustness, and benchmarking safe exploration by measuring constraint violations - mailchi.mp/1106d0ce6766/a…
Rohin Shah @rohinmshah · Nov 27
[Alignment Newsletter #75]: Solving Atari and Go with learned game models, and thoughts from a MIRI employee - mailchi.mp/3e34fa1f299a/a…
Rohin Shah @rohinmshah · Nov 20
[Alignment Newsletter #74]: Separating beneficial AI into competence, alignment, and coping with impacts - mailchi.mp/49c956f84771/a…
Rohin Shah @rohinmshah · Nov 13
[Alignment Newsletter #73]: Detecting catastrophic failures by learning how agents tend to break - mailchi.mp/ef55eb52b0fd/a…
Rohin Shah @rohinmshah · Nov 6
[Alignment Newsletter #72]: Alignment, robustness, methodology, and system building as research priorities for AI safety - mailchi.mp/cac125522aa3/a…
Rohin Shah @rohinmshah · Oct 30
[Alignment Newsletter #71]: Avoiding reward tampering through current-RF optimization - mailchi.mp/938a7eed18c3/a…
Rohin Shah @rohinmshah · Oct 23
[Alignment Newsletter #70]: Agents that help humans who are still learning about their own preferences - mailchi.mp/732eaa192df0/a…
Rohin Shah @rohinmshah · Oct 21
Real humans adapt to the opaque protocols that SP learns, and play differently than the naive behavior-cloned model that our agent was trained against, so the effect is smaller. Nonetheless, the human-aware agent still does better, sometimes beating human performance! (4/4) pic.twitter.com/FmR9Mn2Xwx
Rohin Shah @rohinmshah · Oct 21
We need an agent that has the right “expectation” about its partner. Obvious solution: train a human model with behavior cloning, and then train an agent to play well with that model. This does way better than SP in simulation (i.e. evaluated against a “test” human model). (3/4) pic.twitter.com/v1ykAkLpkE
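[Editor's note] The recipe in the tweet above — clone a human model from human data, then train the agent against that fixed partner rather than against a copy of itself — can be illustrated with a deliberately tiny sketch. This is not the thread's actual environment or code: the three-action coordination game, the human preference distribution, and every name below are invented for illustration, and both "training" steps are collapsed into closed-form computations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-player coordination game standing in for a collaborative task:
# both players pick one of three actions and score 1 only if they match.
N_ACTIONS = 3
HUMAN_PREFS = [0.8, 0.15, 0.05]  # hypothetical: humans strongly favor action 0

def reward(agent_action, partner_action):
    return 1.0 if agent_action == partner_action else 0.0

# Step 1: "behavior-clone" the human policy from logged human actions.
# In this discrete toy game that reduces to an empirical action distribution.
human_data = rng.choice(N_ACTIONS, size=500, p=HUMAN_PREFS)
bc_policy = np.bincount(human_data, minlength=N_ACTIONS) / len(human_data)

# Step 2: train the agent to play well WITH the fixed BC partner. Here the
# best response is simply the action with the highest expected reward.
expected = [sum(bc_policy[p] * reward(a, p) for p in range(N_ACTIONS))
            for a in range(N_ACTIONS)]
human_aware_action = int(np.argmax(expected))

# A self-play (SP) pair could converge on *any* matching action, e.g. action 2,
# which is fine against a copy of itself but poor with a human-like partner.
self_play_action = 2

def eval_with_human(action, n=10_000):
    partners = rng.choice(N_ACTIONS, size=n, p=HUMAN_PREFS)
    return float(np.mean([reward(action, p) for p in partners]))

print("human-aware agent vs. human:", eval_with_human(human_aware_action))  # ~0.80
print("self-play agent vs. human:  ", eval_with_human(self_play_action))    # ~0.05
```

In the actual work the human model is a neural network fit by supervised learning on human trajectories, and the agent is trained with deep RL while paired with that model; the closed-form best response above is only a stand-in for that RL step.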
Rohin Shah @rohinmshah · Oct 21
In competitive games, the minimax theorem allows self-play to be agnostic to its opponent: if they are suboptimal, SP will crush them even harder. That doesn’t work in collaborative games, where the partner’s suboptimal move and SP’s failure to anticipate it will hurt. (2/4) pic.twitter.com/6I6KwLOp0Z