Roel
Over the past few days I have been experimenting with all the different ways you can run scripts on a schedule. I'm not done yet, but I have come a long way. Get ready for an exciting thread with heroku, github, gitlab and docker! ⏬ 1/
Roel Sep 23
Replying to @RoelMHogervorst
Why did I do this? There is a lot of R content on Twitter, but a huge amount of it is focused on graphics or shiny. And that is awesome! But I think R is absolutely ready for production workflows, and one of the first steps toward that is scheduling a script. 2/
Roel Sep 23
R is my favorite tool. I use it interactively: for quick explorations of data, for machine learning competitions, or for interacting with a bazillion APIs. I have created several packages to do just that. I also use it to create APIs with {plumber} or graphical interaction with {shiny}. /3
Roel Sep 23
But as with all programming languages, you can script your actions too. How do you actually save time and let the script run itself? I think we could use more tutorials for that. /4
Roel Sep 23
I distinguish between running a script on your own computer 👩‍💻, your own server☁️💻 or other places that are not under your control🪄. For an overview see this github page: . /5
Roel Sep 23
Running an R script on your own computer 👩‍💻 is easy: just use `source("script.R")`. But that doesn't make your script run automatically. We have tools for that. 🕑 On Windows there is Task Scheduler, and for Linux and macOS we have cron. /6
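A minimal sketch of what such a schedule can look like on Linux or macOS, where the usual scheduler is cron. The helper below only builds the text of a crontab entry; it does not install anything. The name `cron_line()`, the paths, and the 07:00 schedule are made up for illustration; `Rscript` is the command-line front end that ships with R.

```r
# Build one crontab entry that runs an R script with Rscript.
# `cron_line()` is a hypothetical helper: it only returns text.
cron_line <- function(schedule, script, log) {
  sprintf(
    "%s Rscript %s >> %s 2>&1",
    schedule,
    shQuote(script, type = "sh"),  # quote the paths for the shell
    shQuote(log, type = "sh")
  )
}

# "0 7 * * *" means: minute 0, hour 7, every day of every month.
entry <- cron_line("0 7 * * *", "/home/me/script.R", "/home/me/script.log")
cat(entry, "\n")
```

You would paste the printed line into the file that `crontab -e` opens. Appending output to a log file is a design choice here: a scheduled script has no console, so the log is the only place you will see its errors.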
Roel Sep 23
But the scripts will not run if your computer is offline 😭. What now? It is time for a server! ☁️💻 This sounds more impressive than it is. A server is just another laptop that is always on (usually w/o a screen and sometimes in the cloud), anything from a Raspberry Pi to big computers. /7
Roel Sep 23
You can use cron on your server too. Most servers are Linux based, but there are Windows versions too. That makes running a script on that 'cloud laptop' about as difficult as on your local computer. But you do need to get the files onto the server and the results back. /8
Roel Sep 23
When the number of scripts increases and they start to depend on each other, we need more advanced options: for example, workflow managers like {drake}, or schedulers like Airflow or Luigi. Within larger companies these schedulers are set up for you. /9
Roel Sep 23
But you can set them up on your own server too. Getting R scripts to work on Airflow is not trivial, but once you have the first one done it gets easier. (If you have examples, please ping me.) These tools give you automatic retries, notifications on failures, and more stability. /10
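To make "automatic retries" concrete, here is a small base R sketch of the idea. This is not how Airflow or Luigi implement it, just the same behavior in miniature; `with_retries()` and the `flaky()` task are hypothetical names invented for this example.

```r
# Call `task()` until it succeeds or `max_tries` attempts are used up.
# `with_retries()` is a hypothetical helper, not part of any scheduler.
with_retries <- function(task, max_tries = 3, wait_seconds = 0) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(
      list(ok = TRUE, value = task()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) {
      return(result$value)
    }
    # A real scheduler would send a notification here; we only log.
    message(sprintf(
      "Attempt %d of %d failed: %s",
      attempt, max_tries, conditionMessage(result$error)
    ))
    if (attempt < max_tries) Sys.sleep(wait_seconds)
  }
  stop("Task failed after ", max_tries, " attempts: ",
       conditionMessage(result$error))
}

# Made-up flaky task: fails on the first two calls, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("temporary failure")
  "done"
}

result <- with_retries(flaky, max_tries = 5)
```

Here `result` ends up as `"done"` after three calls. A workflow manager adds a lot on top of this (dependencies between scripts, dashboards, alerts), which is why it becomes worth the setup once you have many scripts.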
Roel Sep 23
But maybe you only have a few scripts, you don't have computers lying around, and you don't want to rent one from the big cloud providers. There is another option, one I call ephemeral computing 🪄 (because I love the way it sounds). Others call it serverless (.. but with servers ...) /11
Roel Sep 23
🪄 This is like magic, because we don't care about the underlying computing. We tell the 🪄 what to do, and we only pay for execution time and memory. And for several of these, we use so little that we stay in the free tier and don't pay at all. /12
Roel Sep 23
First the ones I feel are easiest to set up: integrations with version control websites. You are using version control, right? Here is an intro . /13
Roel Sep 23
GitLab and GitHub have created a lot of integrations and now have tools of their own to help with continuous integration / continuous deployment (CI/CD): fancy words for running automatic processes after every push to the repo. You can use them to test your packages, deploy sites ... /14
Roel Sep 23
Replying to @invertedushape1
... or run other arbitrary code. And that is what I will show you next. I created a script that creates an image and tweets it from @invertedushape1. Is it serious? No. Is it useful? Also no. But it IS an R script. (find the repo here: ) /15
Roel Sep 23
If you use GitHub to sync and share your git repositories, there is a new tool called GitHub Actions. There are many examples in this repo and more explanations in this book /16
Roel Sep 23
But apart from the actions on every push, you can also schedule github actions, and that is what I did here: /17
Roel Sep 23
On to the next 🪄one: gitlab was the first to introduce their complete ci/cd toolset. They had the option to run scripts way earlier than github. I have created an example of running a script here /18
Roel Sep 23
Both the github and gitlab scripts are hacked together from examples online. (If you have improvements I'd love to hear them!) It is quite slow because of the package installation, maybe I should create an intermediate container and use that one... (let me know) /19
Roel Sep 23
This whole deep dive started when someone asked me about my heroku script from the past. Heroku is not really a version control site, but it does work with git. Heroku takes over a lot of work for you: you only need a script and an renv.lock file, thanks to:
Roel Sep 23
Heroku just works*. Find my more detailed post about running something on heroku here (* Alright I broke it, but I tried something advanced) /21
Roel Sep 23
It is also quite doable to pack up your script in a docker container and schedule that. The rocker project maintains docker containers for many R versions and even complete Rstudio instances in a docker container. /22
Roel Sep 23
I also created a docker container for my tweet project. If I run this container, a new random tweet is sent. /23
Roel Sep 23
Replying to @_ColinFay @rOpenSci
For an introduction to R and docker I recommend @_ColinFay's excellent introductory tutorial and the more extensive tutorial by @rOpenSci. /24
Roel Sep 23
So what should you do to get from an R script to production? 1. Make sure it works locally. 2. Remove all secrets you hardcoded in your script. 3. Replace them with `Sys.getenv()` calls or other solutions (let me know). 4. I really, really recommend using renv /25
Roel Sep 23
(continued) renv tells you exactly which packages you need for your script and can be used to install those packages on another machine too! 🪄!! And now there are choices: is it one script, and does it run during your work hours? Then the easiest way is to schedule it on your laptop 👩‍💻 /26
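Steps 2 and 3 above can be sketched like this: the script asks the environment for the secret and stops with a clear message if it is missing. The helper name `read_secret()` and the variable `MY_API_TOKEN` are made up for this example.

```r
# Read a secret from an environment variable instead of hardcoding it.
# `read_secret()` is a hypothetical helper built on base R only.
read_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA_character_)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set.", call. = FALSE)
  }
  value
}

# For this demo only, we set the variable from inside R.
# In practice it is set outside the script (for example in .Renviron
# or in the secret settings of the platform that runs the script).
Sys.setenv(MY_API_TOKEN = "not-a-real-token")
token <- read_secret("MY_API_TOKEN")
```

Failing loudly on a missing variable is a deliberate choice: a scheduled script that silently runs with an empty token is much harder to debug. For step 4, `renv::snapshot()` writes the renv.lock file and `renv::restore()` reinstalls the recorded packages on the other machine.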
Roel Sep 23
Does it need to run without your computer, and you don't have a computer lying around? Go for the 🪄 heroku or github/gitlab options. If you and your coworkers need to run multiple scripts, it might be cheaper and easier to set up a server ☁️💻 for yourselves; an extra laptop will do. /27
Roel Sep 23
There is an entire set of options that I haven't even mentioned yet 😳: serverless, or function as a service (FaaS). All the major cloud services provide these. /28
Roel Sep 23
You upload a docker container or something similar to the cloud provider, and they turn that into a 'function': something that runs when called. Examples are AWS Lambda, Google Cloud Functions and Azure Functions. See the link for packages to make it work for you. /29
Roel Sep 23
Continuing the advice: if you have a credit card ready and are willing to do a lot of setup, these FaaS options can be a nice intermediate solution. You don't need to maintain a server, and it will just work until you turn it off or stop paying your bills. /30
Roel Sep 23
I believe all of these things (well, maybe setting up Airflow is too much) can be done by regular useRs. It is not as easy as creating a plot, and it requires some command-line skills, but you can do it! Of course it would be nice if your IT department helps out. /31