Recreate the graphic from the RStudio exercise from the week 2 lab.
Save the output in a file called mtcars.png.
- Write an R script that generates an `sqlite3` database containing the `mtcars` data.
- Write a Python script that reads in the data using `pandas` and makes the plot using `seaborn`.
- This "pipeline" will be written and carried out using `snakemake`.
- The pipeline must be robust to change. In other words, if you `touch` any of the inputs, then the work flow should restart from that point and regenerate the necessary outputs. `touch` is a Unix command. If you are not familiar with it, Google it.
- You'll need to read the `snakemake` docs.
- You'll have to figure out how to organize the "rules".
- A correct work flow will generate the final output file starting from a directory containing nothing other than the `Snakefile` and the one `R` and the one `Python` script.
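As a rough illustration of the hand-off between the two scripts, the sketch below builds a tiny in-memory SQLite table and reads it back with Python's standard `sqlite3` module. It is a stand-in under stated assumptions, not the assignment's solution: the table name `mtcars`, the column name `model`, and the choice of two rows and three columns are all hypothetical, and the real database would be a file written by your R script. In the actual pipeline you would hand a similar connection and query to `pandas` (for example via `pandas.read_sql_query`) rather than building dictionaries by hand.

```python
import sqlite3

# Hypothetical stand-in for the database the R script would write.
# Only two rows and three columns are used, purely for illustration.
ROWS = [
    ("Mazda RX4", 21.0, 6),
    ("Datsun 710", 22.8, 4),
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with an assumed `mtcars` table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mtcars (model TEXT, mpg REAL, cyl INTEGER)")
    con.executemany("INSERT INTO mtcars VALUES (?, ?, ?)", ROWS)
    con.commit()
    return con


def read_mtcars(con: sqlite3.Connection) -> list[dict]:
    """Read the table back as a list of dictionaries, ordered by mpg."""
    con.row_factory = sqlite3.Row  # rows become addressable by column name
    cur = con.execute("SELECT model, mpg, cyl FROM mtcars ORDER BY mpg")
    return [dict(row) for row in cur]


if __name__ == "__main__":
    connection = build_demo_db()
    for record in read_mtcars(connection):
        print(record)
    connection.close()
```

Whether the car names end up in a column at all depends on how the R script writes the table, so check the schema of your own database before relying on any column name.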
A correct work flow will only execute the necessary steps when a script/input file is "touched". In other words:
- If you `touch` your R script, the sqlite3 database and figure will be regenerated.
- If you `touch` your sqlite3 database or the Python script, then the figure will be regenerated, but not the database.
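The behaviour described above rests on file modification times: `touch` updates a file's timestamp, and a build tool treats an output as out of date when any of its inputs is newer. The sketch below is a simplified model of that idea, not Snakemake's actual implementation, which may take more than timestamps into account. The file names `make_db.R` and `mtcars.sqlite3` are hypothetical, and `os.utime` is used with explicit timestamps in place of a real `touch` so the result does not depend on filesystem clock resolution.

```python
import os
import tempfile
from pathlib import Path


def is_stale(output: Path, inputs: list[Path]) -> bool:
    """Return True if the output is missing or older than any input."""
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime_ns
    return any(p.stat().st_mtime_ns > out_mtime for p in inputs)


def set_mtime(path: Path, seconds: int) -> None:
    """Set access and modification time explicitly (a controlled 'touch')."""
    ns = seconds * 10**9
    os.utime(path, ns=(ns, ns))


def demo() -> tuple[bool, bool, bool]:
    """Walk through: nothing built, freshly built, input touched."""
    base = 1_700_000_000  # arbitrary reference time in seconds
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "make_db.R"         # hypothetical input
        database = Path(tmp) / "mtcars.sqlite3"  # hypothetical output
        script.write_text("# placeholder\n")

        before_build = is_stale(database, [script])  # output does not exist

        database.write_text("placeholder")
        set_mtime(script, base)
        set_mtime(database, base + 10)               # output newer than input
        after_build = is_stale(database, [script])

        set_mtime(script, base + 20)                 # simulate `touch make_db.R`
        after_touch = is_stale(database, [script])

        return before_build, after_build, after_touch


if __name__ == "__main__":
    print(demo())
```

Applying the same check along the chain (script, then database, then figure) is what makes a touched R script invalidate both downstream files, while a touched Python script invalidates only the figure.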
Most of the steps you need are in the material from previous weeks.
You need to discover how to save a seaborn plot to a file, though!
- How do you delete all output from a `snakemake` work flow?
- How do you delete output from a single `snakemake` rule?
- What is the citation for `snakemake`?
In a new repository:
- The `Snakefile`
- The R and Python script.
- `mtcars.png`
- The `README.md` for the repo should display the image.
- The `README.md` should contain the answers to the questions listed above.
- The `README.md` should contain evidence that `touch`ing the various files does the right thing. You can copy-paste the shell output that happens after you `touch` and rerun the jobs. Place the output in a code fence in the README:

  ```
  Output
  ```

  Do not use screen shots.
The other mechanics are the same. Link to the new work from your homework web page, etc.