class: right, middle

# If I have seen further it is by standing on the sholders [sic] of Giants..red[*]

## [Casey Greene](http://twitter.com/greenescientist)

## Contribute on [GitHub](https://github.com/greenelab/computational-reagents)

.footnote[.red[*] [Letter from Newton to Hooke](https://en.wikipedia.org/wiki/Standing_on_the_shoulders_of_giants#References_during_the_sixteenth_to_nineteenth_centuries) expressing a sentiment [earlier attributed to Bernard of Chartres](https://en.wikipedia.org/wiki/Standing_on_the_shoulders_of_giants#Attribution_and_meaning) via [Wikipedia](https://en.wikipedia.org/wiki/Standing_on_the_shoulders_of_giants).]

---

Tasks that participants must complete are set apart from other elements of the presentation, as shown below. Due dates are indicated, and participants must complete all elements by these due dates to participate in the in-person discussions:

.task[
**Before 8:00 AM on January 26**:

- Count to ten.
- Read [Stodden et al.](https://doi.org/10.1126/science.aah6168)
]

---

# Why make my work reproducible and transparent?

- Reproducibility is a basic principle of science.
- Repeatable workflows make our lives easier.
- We want to graduate one day, so we can't be irreplaceable to our lab.
- You are a science giant and can help others see further.

---

# Why make my work reproducible and transparent?

.task[
**Before 8:00 AM on February 1**, carefully consider important reasons and file a [pull request](https://github.com/greenelab/computational-reagents/edit/master/index.html) to add your own reasons to this slide. If you need more information about how to do this, see the [README](https://github.com/greenelab/computational-reagents/). Add a new bullet point with your reason. Reasons added so far:

- Facilitates careful review of conclusions by reviewers and more rapid improvements by colleagues, allowing science to progress more quickly and reliably.
- Because I want to be accountable for my conclusions. If my work is solid, transparent, and reproducible but my results are misleading, the faulty conclusions are defensible: they will be corrected over time with more samples, additional tests, etc.
- Because we would not want to end up like our colleagues in Psychology: ["Over half of psychology studies fail reproducibility test"](https://goo.gl/IJw4Nh).
- I want to make my work reproducible and transparent so that I won't have to think about it ever again and can move forward in my life.
- Ethical obligation.
- To foster a positive reputation among colleagues.
- So that when I or others want to repeat an experiment (especially after a long period of time), there is a reference that can be used to repeat it exactly, troubleshoot it, and/or improve upon it.
- By making my work transparent, I can get input from others and save time by avoiding methods that don't work or by finding gaps in the way I am thinking about a problem.
- To make sure my work is consistent with my scientific ethics.
]

---
layout: false

.left-column[
### What's needed?
]

.right-column[
I want to be reproducible and transparent. How do I start?
]

---
layout: false

.left-column[
### What's needed?
#### Source Code
]

.right-column[
Any source code that is used from data collection to the conclusion of your analyses. This includes:

- pre-processing code (adapter trimming, etc.).
- the nuts and bolts of your analysis.
- code that generates the plots used to compose figures.

This is a non-exclusive list. If you can't get the same results without it, you need it.
]

---
layout: false

.left-column[
### What's needed?
#### Source Code
#### Data
]

.right-column[
Any data that you analyze, to the extent possible. Some genetic data cannot be shared without restrictions. In these cases, you should take alternate steps to help others confirm that they have the same data.

Readers should be able to easily obtain:

- any data that does not have sharing restrictions.
- for data that can't be shared, a [hash](https://en.wikipedia.org/wiki/Sha1sum) of each file, as sketched below.
- for data that can't be shared, an example file with dummy values.
- the [random seed(s)](https://en.wikipedia.org/wiki/Random_seed) that you used.

This is also a non-exclusive list. If you can't get the same results without it, you need to preserve and, if possible, share it under an [open license](http://opendefinition.org/) that permits re-use.
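One way to provide a hash and a recorded seed is sketched below in Python; the file name `restricted_data.tsv` and the seed value are hypothetical placeholders, so substitute whatever your analysis actually uses.

```python
# Minimal sketch: record a checksum for data you can't share and fix a random seed.
# The file name and seed value below are placeholders.
import hashlib
import random

DATA_FILE = "restricted_data.tsv"  # hypothetical restricted-access file
RANDOM_SEED = 42                   # report this value alongside your results


def sha1_of_file(path):
    """Return the SHA-1 checksum of a file, read in chunks to handle large data."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


print("SHA-1:", sha1_of_file(DATA_FILE))  # readers with access can confirm a match

random.seed(RANDOM_SEED)  # makes downstream sampling repeatable
```

The command-line `sha1sum` utility produces the same checksum; either way, readers with legitimate access can confirm that they are analyzing the same files you did.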
]

---
layout: false

.left-column[
### What's needed?
#### Source Code
#### Data
#### Environment
]

.right-column[
Don't overlook your computing environment. The software that you have installed can change the results that you observe. This is particularly true for:

- [versions](http://www.informit.com/articles/article.aspx?p=1439189) of your programming language(s).
- [versions](http://scikit-learn.org/stable/whats_new.html#enhancements) of the packages and software libraries that you use (see the sketch below).
- items that are [tied to our understanding of the genome](http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp).

This is also a non-exclusive list. If you can't get the same results without it, you need to preserve and share it, ideally under an [open license](https://opensource.org/) that permits re-use.
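As a minimal sketch of recording your environment in Python, the snippet below writes the interpreter and package versions to a small text file; the package names are hypothetical examples, so list whatever your analysis actually imports.

```python
# Minimal sketch: record the interpreter and package versions used for an analysis.
# The package names below are examples; replace them with your actual dependencies.
import platform
import sys
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ["numpy", "pandas", "scikit-learn"]  # hypothetical dependency list

with open("environment_versions.txt", "w") as log:
    log.write(f"Python {platform.python_version()} on {sys.platform}\n")
    for name in PACKAGES:
        try:
            log.write(f"{name}=={version(name)}\n")
        except PackageNotFoundError:
            log.write(f"{name} is not installed\n")
```

A `pip freeze` listing or a conda environment file captures the same information more completely; whichever you use, archive the record alongside your code and results.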
]

---
layout: false

.left-column[
### What's needed?
#### Source Code
#### Data
#### Environment
#### Transparency
]

.right-column[
What did you do and when did you do it? The lab notebook is a critically important tool for scientists to track and share this information. In the computational sciences, we have a number of means available to us. But we have to use them carefully! Inadequate use may compromise our ability to nail down what we did and when we did it.

- [Version control](http://doi.org/10.1371/journal.pcbi.1004947) can record what code existed at which time.
- [Preprints](https://doi.org/10.15252/embj.201670030) can result in [feedback above and beyond](https://github.com/greenelab/deep-review/issues/110) what you receive during peer review.

We must aim to get things right. Preprints are a tool to help us do this.
]

---
layout: false

.left-column[
### What's needed?
#### Source Code
#### Data
#### Environment
#### Transparency
#### Archiving
]

.right-column[
We haven't discussed how you should store and share these items. Digital artifacts (your source code, data, and compute environment) should be archived and disseminated. Where can you store these items?

- [Zenodo](http://zenodo.org) offers a service at no cost that connects to GitHub to [capture releases as citable objects](https://guides.github.com/activities/citable-code/).
- [figshare](https://figshare.com/) offers a [similar service that can auto-sync each release](https://support.figshare.com/support/solutions/articles/6000150264-how-to-connect-figshare-with-your-github-account).
- [Institutional repositories](http://repository.upenn.edu/about.html). I'm not currently aware of a Penn service designed for code, data, or compute environments.

Things to look for in an archiving service: Are authors unable to delete their own uploads? Does each artifact receive a digital object identifier (DOI)?

[This paper](http://doi.org/10.1371/journal.pcbi.1005097) discusses important considerations.
]

---

# What tools exist to help out?

Find and share:

.task[
**Before 8:00 AM on January 26**, consider tools for reproducible and transparent research that you have used or heard of before. Add an [issue](https://github.com/greenelab/computational-reagents/issues) for a tool that nobody else has filed an issue on yet. Put the tool's name in the issue's title. In the issue text, explain whether you think the tool is necessary, sufficient, helpful, or irrelevant to transparent and reproducible computational research.
]

And evaluate other contributions:

.task[
**Before 8:00 AM on February 1**, comment on at least three other [issues](https://github.com/greenelab/computational-reagents/issues). Learn about the tool if necessary, and explain whether you think the tool is necessary, sufficient, helpful, or irrelevant to transparent and reproducible computational research.
]

---

# Evaluate the literature.

.task[
**Before 8:00 AM on February 1**, evaluate preprints or peer-reviewed papers:

- Find one that you think exemplifies reproducibility and transparency.
- Find one that you think exhibits poor reproducibility, transparency, or both.
- Find one that you are a coauthor on. If you haven't written a paper yet, find one from your current lab.

Post links to each paper in this [GitHub issue](https://github.com/greenelab/computational-reagents/issues/4).
]