Rohin Shah
PhD student at the Center for Human-Compatible AI at UC Berkeley. I publish the Alignment Newsletter.
100 Tweets
85 Following
1,052 Followers
Tweets
Rohin Shah Jan 29
Wondering what the field of long-term AI safety does, but don't want to read hundreds of posts? Check out my review of work done in 2018-19! Please do leave comments and suggestions: The summary is also Alignment Newsletter #84:
Rohin Shah Jan 22
[Alignment Newsletter #83]: Sample-efficient deep learning with ReMixMatch -
Rohin Shah Jan 15
[Alignment Newsletter #82]: How OpenAI Five distributed their training computation -
Rohin Shah Jan 8
[Alignment Newsletter #81]: Universality as a potential solution to conceptual difficulties in intent alignment -
Rohin Shah Jan 2
Alignment Newsletter #80: Why AI risk might be solved without additional intervention from longtermists -
Rohin Shah Jan 1
Alignment Newsletter #79: Recursive reward modeling as an alignment technique integrated with deep RL -
Rohin Shah Dec 26
Alignment Newsletter #78: Formalizing power and instrumental convergence, and the end-of-year AI safety charity comparison -
Rohin Shah Dec 18
[Alignment Newsletter #77]: Double descent: a unification of statistical theory and modern ML practice -
Rohin Shah Retweeted
Niel Bowerman Dec 12
Shout out to @rohinmshah for his impressive AI alignment newsletter. If you want to keep up to speed with what is going on in the field of AI alignment, there's nothing better: His team has summarised 1,200 papers to date!
Rohin Shah Retweeted
Adam Gleave Dec 11
Want to ensure AI is beneficial for society? Come talk to like-minded people at the Human-Aligned AI Social at , Thursday 7-10 pm, room West 205-207.
Rohin Shah Dec 4
[Alignment Newsletter #76]: How dataset size affects robustness, and benchmarking safe exploration by measuring constraint violations -
Rohin Shah Nov 27
[Alignment Newsletter #75]: Solving Atari and Go with learned game models, and thoughts from a MIRI employee -
Rohin Shah Nov 20
[Alignment Newsletter #74]: Separating beneficial AI into competence, alignment, and coping with impacts -
Rohin Shah Nov 13
[Alignment Newsletter #73]: Detecting catastrophic failures by learning how agents tend to break -
Rohin Shah Nov 6
[Alignment Newsletter #72]: Alignment, robustness, methodology, and system building as research priorities for AI safety -
Rohin Shah Oct 30
[Alignment Newsletter #71]: Avoiding reward tampering through current-RF optimization -
Rohin Shah Oct 23
[Alignment Newsletter #70]: Agents that help humans who are still learning about their own preferences -
Rohin Shah Oct 21
Replying to @rohinmshah
Real humans adapt to the opaque protocols that SP learns, and play differently from the naive behavior-cloned model that our agent was trained against, so the effect is smaller. Nonetheless, the human-aware agent still does better, sometimes beating human performance! (4/4)
Rohin Shah Oct 21
Replying to @rohinmshah
We need an agent that has the right “expectation” about its partner. Obvious solution: train a human model with behavior cloning, and then train an agent to play well with that model. This does way better than SP in simulation (i.e. evaluated against a “test” human model). (3/4)
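[A minimal sketch of the two-stage recipe in this tweet, not the paper's code. The environment, data, reward, and network sizes are all hypothetical stand-ins: first behavior-clone a "human" policy from logged state-action pairs, then freeze it and train a partner agent against it with a simple REINFORCE loop on a toy coordination reward.]

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 4

def make_policy():
    # Small MLP mapping a state vector to action logits.
    return nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(),
                         nn.Linear(32, N_ACTIONS))

# Step 1: behavior-clone a human model from logged (state, action) pairs.
# Random tensors stand in for real human demonstration data.
human_states = torch.randn(512, STATE_DIM)
human_actions = torch.randint(0, N_ACTIONS, (512,))

bc_model = make_policy()
opt = torch.optim.Adam(bc_model.parameters(), lr=1e-3)
for _ in range(200):
    loss = nn.functional.cross_entropy(bc_model(human_states), human_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
bc_model.requires_grad_(False)  # freeze: this is the fixed "human" partner

# Step 2: train an agent to play well *with* the frozen clone (REINFORCE).
# A one-step matching game stands in for a real collaborative environment.
agent = make_policy()
opt = torch.optim.Adam(agent.parameters(), lr=1e-3)
for _ in range(500):
    state = torch.randn(1, STATE_DIM)
    agent_dist = torch.distributions.Categorical(logits=agent(state))
    human_dist = torch.distributions.Categorical(logits=bc_model(state))
    a, h = agent_dist.sample(), human_dist.sample()
    reward = (a == h).float()  # toy shared reward: coordinate with partner
    loss = (-agent_dist.log_prob(a) * reward).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

[Evaluating the trained agent against a held-out "test" human model, rather than the one it trained with, mirrors the simulation evaluation the tweet describes.]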
Rohin Shah Oct 21
Replying to @rohinmshah
In competitive games, the minimax theorem allows self-play to be agnostic to its opponent: if they are suboptimal, SP will crush them even harder. That doesn’t work in collaborative games, where the partner’s suboptimal move and SP’s failure to anticipate it will hurt. (2/4)
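[A toy numeric illustration of this point, mine rather than the thread's: in a two-action pure coordination game, a self-play agent that settled on one convention scores poorly with a human who mostly follows the other, while an agent that best-responds to a model of the human recovers most of the value. The human's action probabilities are hypothetical.]

```python
import numpy as np

# Pure coordination game: reward 1 iff both players pick the same action.
payoff = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

sp_action = 0                        # self-play's arbitrary convention
human_probs = np.array([0.2, 0.8])   # hypothetical human: prefers action 1

# Self-play ignores the partner and plays its convention regardless.
sp_value = human_probs @ payoff[sp_action]            # -> 0.2

# A human-aware agent best-responds to its model of the partner.
best_response = int(np.argmax(payoff @ human_probs))  # -> action 1
adapted_value = human_probs @ payoff[best_response]   # -> 0.8

print(f"self-play value: {sp_value:.1f}, human-aware value: {adapted_value:.1f}")
```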