<div dir="ltr"><h3 class="gmail-storytitle" id="gmail-post-21414">A long comment section discussion follows this article at website below.</h3><div class="gmail-storytitle">Vision2020 Post: Ted Moffett</div><h3 class="gmail-storytitle">------------------------------------------</h3><h3 class="gmail-storytitle"><a href="http://www.realclimate.org/index.php/archives/2018/05/transparency-in-climate-science/">http://www.realclimate.org/index.php/archives/2018/05/transparency-in-climate-science/</a><span></span></h3><h3 class="gmail-storytitle">Transparency in climate science</h3><div class="gmail-meta">Filed under: <ul class="gmail-post-categories"><li><a href="http://www.realclimate.org/index.php/archives/category/climate-science/climate-modelling/" rel="category tag"><font color="#0066cc">Climate modelling</font></a></li>      <li><a href="http://www.realclimate.org/index.php/archives/category/climate-science/" rel="category tag"><font color="#0066cc">Climate Science</font></a></li>        <li><a href="http://www.realclimate.org/index.php/archives/category/climate-science/instrumental-record/" rel="category tag"><font color="#0066cc">Instrumental  Record</font></a></li>       <li><a href="http://www.realclimate.org/index.php/archives/category/climate-science/paleoclimate/" rel="category tag"><font color="#0066cc">Paleoclimate</font></a></li></ul> — gavin @ 12 May 2018  </div><div class="entry"><div class="gmail-at-above-post gmail-addthis_tool"><div class="gmail-kcite-section"><p>Good thing? Of course.<sup><font size="2">*</font></sup> </p><p><span id="gmail-more-21414"></span></p><p>I was invited to give a short presentation to a committee at the National Academies last week on issues of reproducibility and replicability in climate science for <a href="http://sites.nationalacademies.org/dbasse/bbcss/reproducibility_and_replicability_in_science/index.htm"><font color="#0066cc">a report they have been asked to prepare by Congress</font></a>. 
My <a href="http://www.realclimate.org/images//NRC_RR_Schmidt.pdf"><font color="#0066cc">slides</font></a> give a brief overview of the points I made, but basically the issue is <b>not</b> that there isn’t enough data being made available, but rather that there is too much! </p><p>A small selection of climate data sources is given on our (cleverly named) “<a href="http://www.realclimate.org/index.php/data-sources/"><font color="#0066cc">Data Sources</font></a>” page and these and others are enormously rich repositories of useful stuff that climate scientists and the interested public have been diving into for years. Claims that have persisted for decades that “data” aren’t available are mostly bogus (to save the commenters the trouble of angrily demanding it, here is a <a href="http://www.meteo.psu.edu/holocene/public_html/shared/research/old/mbh98.html"><font color="#0066cc">link for data from the original hockey stick paper</font></a>. You’re welcome!).</p><p>The issues worth talking about are, however, a little more subtle. First off, what definitions are being used here? This committee has decided that formally:</p><ul><li><b>Reproducibility</b> is the ability to test a result using independent methods and alternate choices in data processing. This is akin to a different laboratory testing an experimental result or a different climate model showing the same phenomena, etc. </li><li><b>Replicability</b> is the ability to check and rerun the analysis and get the same answer. </li></ul><p>[Note that these definitions are sometimes swapped in other discussions.] The two ideas are probably best described as checking the robustness of a result, or rerunning the analysis. Both are useful in different ways. Robustness is key if you want to make a case that any particular result is relevant to the real world (though that is necessary, not sufficient) and if a result is robust, there’s not much to be gained from rerunning the specifics of one person’s/one group’s analysis. 
For sure, rerunning the analysis is useful for checking that the conclusions stem from the raw data, and is a great platform for subsequently testing its robustness (by making different choices for input data, analysis methods, etc.) as efficiently as possible. </p><p>So what issues are worth talking about? First, the big success in climate science with respect to robustness/reproducibility is the <a href="https://www.wcrp-climate.org/wgcm-cmip"><font color="#0066cc">Coupled Model Intercomparison Project</font></a> – all of the climate models from labs across the world running the same basic experiments with an open data platform that makes it easy to compare and contrast many aspects of the simulations. However, this data set is growing very quickly and the tools to analyse it have not scaled as well. So, while everything is testable in theory, bandwidth and computational restrictions make it difficult to do so in practice. This could be improved with appropriate server-side analytics (which are promised this time around) and the organized archiving of intermediate and derived data. Sharing analysis code in a more organized way would also be useful. </p><p>One minor issue is that while climate models are bit-reproducible at the local scale (something essential for testing and debugging), the environments for which that is true are fragile. Compilers, libraries, and operating systems change over time and preclude taking a code from, say, 2000, along with its input files, and getting exactly the same results (bit-for-bit) with simulations that are sensitive to initial conditions (like climate models). The emergent properties should be robust, and that is worth testing. There are ways to archive the run environment in digital ‘containers’, so this isn’t necessarily always going to be a problem, but this has not yet become standard practice. 
Most GCM codes are freely available (for instance, <a href="https://www.giss.nasa.gov/tools/modelE/"><font color="#0066cc">GISS ModelE</font></a>, and the officially open source <a href="https://github.com/E3SM-Project"><font color="#0066cc">DOE E3SM</font></a>). </p><p>There is more to climate science than GCMs of course. There are operational products (like <a href="http://data.giss.nasa.gov/gistemp"><font color="#0066cc">GISTEMP</font></a> – which is both replicable and reproducible), and paleo-climate records (such as are put together in projects like <a href="http://www.pages-igbp.org/ini/wg/2k-network/intro"><font color="#0066cc">PAGES2K</font></a>). What the right standards are for those projects is being actively discussed (see <a href="https://www.clim-past.net/14/593/2018/cp-14-593-2018-discussion.html"><font color="#0066cc">this string of comments</font></a> or the <a href="https://lipd.net/"><font color="#0066cc">LiPD project</font></a> for instance).</p><p>In all of the real discussions, the issue is not <em>whether</em> to strive for R&R, but <em>how</em> to do it efficiently, usably, and without unfairly burdening data producers. The costs (if any) of making an analysis replicable are borne by the original scientists, while the benefits are shared across the community. Conversely, the costs of reproducing research are borne by the community, while benefits accrue to the original authors (if the research is robust) or to the community (if it isn’t). </p><p>One aspect that is perhaps under-appreciated is that if research is done knowing from the start that there will be a code and data archive, it is much easier to build that into your workflow. Creating usable archives as an afterthought is much harder. This lesson is one that is also true for specific communities – if we build an expectation for organized community archives and repositories it’s much easier for everyone to do the right thing. 
</p><p>[<strong>Update:</strong> My fault I expect, but for folks not completely familiar with the history here, this is an old discussion – for instance, “<a href="http://www.realclimate.org/index.php/archives/2009/02/on-replication/"><font color="#0066cc">On Replication</font></a>” from 2009, a suggestion for an <a href="http://www.realclimate.org/index.php/archives/2017/02/someone-c-a-r-e-s/"><font color="#0066cc">online replication journal</font></a> last year, multiple posts focused on replicating previously published work (<a href="http://www.realclimate.org/index.php/archives/2015/08/lets-learn-from-mistakes/"><font color="#0066cc">e.g.</font></a>) etc…]</p><p><small><font size="2"><sup>*</sup> For the record, this does not imply support for the new EPA proposed rule on ‘transparency’<sup>**</sup>. This is an appallingly crafted ‘solution’ in search of a problem, promoted by people who really think that the science of air pollution impacts on health can be disappeared by adding arbitrary hoops for researchers to jump through. </font><a href="https://www.healtheffects.org/publication/reanalysis-harvard-six-cities-study-and-american-cancer-society-study-particulate-air"><font color="#0066cc" size="2">They are wrong</font></a><font size="2">.</font></small></p><small><p><small><small><sup><font size="1">**</font></sup><font size="2"> Obviously this is my personal opinion, not an official statement.<span></span></font></small></small></p></small></div></div></div></div>