Hilary Parker Jan 31
I had a bit of a breakthrough in terms of my thinking of data science thanks to all the interesting discussions at --
Hilary Parker
The way we talk about data science and focus so much on methods, we actually incentivize working with *bad* data, rather than spending the time to collect good data and then use easy methods with it
Hilary Parker Jan 31
Replying to @hspter
I want us to have a conference solely focused on how people collect data, and all the politics / product negotiations / etc that go along with that
Hilary Parker Jan 31
Replying to @hspter
One small thing I am doing at Stitch Fix -- for the datasets we use, I'm referencing them as e.g. "the data that Cindy, Ping and Francesca created" rather than "the stylecard data".
Patrick Blanchenay Feb 1
Replying to @hspter @statwonk
This is a false dichotomy. In an ideal world, data collection informs the method, and the choice of method might influence data collection. And in reality, one usually has more choice over method than over data collection.
Hilary Parker Feb 2
Replying to @PBlanchenay @statwonk
I don't disagree that it is a false dichotomy. I think data scientists in general have much more influence than they might think, but it just takes more time and politics than most will tolerate
Ben Greve Jan 31
Replying to @hspter
Agreed. Spending time on dealing with data quality problems and calculating better, more relevant features will almost always contribute more to the predictive power of a model than tuning the model or increasing its complexity.
Cece🌊🌊🌊 Jan 31
Replying to @benjamingreve @hspter
Part of the problem is data cleaning is seen as “less than”. But I know the scientific decisions that are forfeited if you just let some low-pay lackey do it. Oh wait that’s me.
Arman Oganisian Feb 1
Replying to @hspter
I agree there's over-hype around cool methods and not enough thought about the data generating process that our sampling scheme should be capturing. But fancy methods are necessary too because accurately capturing the process is often infeasible (cost, ethical constraints, etc)
Hilary Parker Feb 1
Replying to @StableMarkets
Oh for sure, no doubt about that. But for many tech applications specifically, the juice is not worth the squeeze
Guy Maskall 🇪🇺 🔶 #FBPE Feb 1
Replying to @hspter
That's starting to sound like putting "science" into "data science".
Hilary Parker Feb 1
Replying to @GuyMaskall
it's funny bc I talked at a plant pathology conf recently and was like "I guess I don't have to tell you all to care about the underlying question"