Vicki Boykis
Stats/data people: Tired of iris and mtcars? Tell me about your favorite heterogeneous, small dataset! (I.e., it has both numerical and text-value columns, is ideally smaller than 500 rows or so, and is interesting to work with.)
Vicki Boykis Jul 22
Replying to @vboykis
Ok, I'll start. This forest fires dataset looks like a good contender: 517 values, 13 attributes, 2 of which are text.
Hadley Wickham Jul 22
Replying to @vboykis
Anything in UCI is automatically disqualified
Thomas Mock Jul 22
Replying to @vboykis
I like dplyr::starwars, as it has a lot of text/factors for counting/summarizing in addition to a few numeric columns. 87 rows by 13 columns.
Dr Jenny Richmond Jul 22
Replying to @vboykis @EmmaVitz
This list has lots of cool datasets
Vicki Boykis Jul 22
Replying to @hadleywickham
Why’s that?
Neil Kodner Jul 22
Replying to @vboykis
I'm in the middle of moving so I can't dig up a link right now, but see if you can find the Emergency Room Visits data. It's gold. I think Gizmodo analyzes the results each year.
Ph.Demetri Jul 22
Replying to @vboykis
My own tweets, via Rtweet
Matt Harris Jul 22
Replying to @vboykis
There are some unique datasets of similar description in the archdata package. data is cool, but I am biased :)
Ken Butler Jul 22
Replying to @vboykis @R4DScommunity
Australian athletes data at (I forget where I got it from originally)
Alison Hill Jul 22
Replying to @vboykis @YhatHQ
This list has some good ones (pigeon racing, marijuana street prices): from
Ken Butler Jul 22
Replying to @vboykis @R4DScommunity
I have example problems that use the data set. Somewhere.
Emma Vitz Jul 22
This is basically where I got all the datasets that I used for writing the tutorial questions for a first year university stats course
Peter Nosko Jul 22
My retirement account.
Melissa Jul 22
Replying to @vboykis
Glassdoor company stats via their api.
Hadley Wickham Jul 22
Replying to @vboykis
They’re the iris of ML
David Robinson Jul 22
Replying to @hadleywickham @vboykis
1. Isn’t iris the iris of ML 2. Isn’t gapminder the iris of EDA
Hadley Wickham Jul 22
Replying to @drob @vboykis
diamonds is the iris of EDA
George McIntire Jul 22
Replying to @vboykis
I got tired of using iris, titanic, Boston housing, so that helped inspire me to create my own ML dataset
Joseph Cook Jul 22
Replying to @vboykis
I’ll just grab a season from basketball reference.