{"id":3574,"date":"2023-07-20T12:28:38","date_gmt":"2023-07-20T11:28:38","guid":{"rendered":"https:\/\/lookingatnothing.com\/?p=3574"},"modified":"2023-07-20T12:28:40","modified_gmt":"2023-07-20T11:28:40","slug":"traceable-trustworthy-science-tackling-metadata-holistically","status":"publish","type":"post","link":"https:\/\/lookingatnothing.com\/index.php\/archives\/3574","title":{"rendered":"Traceable, trustworthy science: tackling metadata holistically"},"content":{"rendered":"\n<p>Time to thoroughly tackle a tough topic: trust and traceability in science. I&#8217;m sure we won&#8217;t tackle it for science in general, but at least we can think about it within the framework of the workflows in our lab&#8230; <\/p>\n\n\n\n<p>Generally speaking, scientific findings are conclusions drawn from an interrelated amalgamation of information. When writing a paper, the scientist makes a reasonable effort to document this amalgamation, but a lot of information is omitted from these publications, whether by convention, for brevity, or for lack of space. <\/p>\n\n\n\n<p>If we want people to trust these scientific findings more, we need to make the entire argumentation path traceable. Once that is done, a level of trust can be assigned independently of the scientist who did the work (in an ideal world, at least). Bonus points if your work is so traceable that it can be reproduced. 
<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"461\" src=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-1024x461.png\" alt=\"\" class=\"wp-image-3577\" srcset=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-1024x461.png 1024w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-300x135.png 300w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-768x346.png 768w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-1536x692.png 1536w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image.png 1854w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>There have been plenty of efforts to make science more traceable and reproducible, but many of these are limited to a single technique, dataset, or analysis. As a gedankenexperiment*, let&#8217;s see what would be needed to traceably document a complete experiment in our MOUSE lab. <\/p>\n\n\n\n<p>In my presentations, I have always shown the following simplified workflow, so we can use that as a starting point. 
We need to cover sample preparation and selection, the measurement process, the data corrections, the details of the (multiple) analyses, and how those analyses are turned into an interpretation (or a finding): <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"154\" src=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-1-1024x154.png\" alt=\"\" class=\"wp-image-3578\" srcset=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-1-1024x154.png 1024w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-1-300x45.png 300w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-1-768x115.png 768w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-1-1536x230.png 1536w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-1.png 1840w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>That, of course, is overly simplified. 
Working this out in more detail, we can expand it into the following workflow (click to enlarge): <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-3.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"125\" src=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-3-1024x125.png\" alt=\"\" class=\"wp-image-3580\" srcset=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-3-1024x125.png 1024w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-3-300x37.png 300w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-3-768x94.png 768w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-3-1536x187.png 1536w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-3-2048x250.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>This shows the individual steps and the number of files we are dealing with for a batch of measurements. 
Fortunately, a significant chunk of this can be processed automatically with our RunDeck instance: <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-4.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"142\" src=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-4-1024x142.png\" alt=\"\" class=\"wp-image-3581\" srcset=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-4-1024x142.png 1024w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-4-300x42.png 300w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-4-768x107.png 768w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-4-1536x213.png 1536w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-4-2048x284.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>For traceability, we are storing several pieces of information in our SciCat measurement catalog, including information on the user, the samples, and the project:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-5.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"180\" src=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-5-1024x180.png\" alt=\"\" class=\"wp-image-3582\" srcset=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-5-1024x180.png 1024w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-5-300x53.png 300w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-5-768x135.png 768w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-5-1536x269.png 1536w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-5-2048x359.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 
1024px\" \/><\/a><\/figure>\n\n\n\n<p>This already looks quite promising, but once you take another step back, you will see that we are still not capturing a lot of scientific (meta-)data and domain knowledge essential to arriving at the conclusion. Examples include information on the codebases used (repository links and commit IDs would suffice here), but also sample history, project scope and timeline, configuration files, masks, other analyses, domain knowledge, and literature links: <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-6.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"231\" src=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-6-1024x231.png\" alt=\"\" class=\"wp-image-3583\" srcset=\"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-6-1024x231.png 1024w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-6-300x68.png 300w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-6-768x173.png 768w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-6-1536x346.png 1536w, https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/image-6-2048x462.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>In an ideal world, we would catalog many of these auxiliary files and links, and connect them to the workflow that was used to arrive at particular scientific findings. This requires only time and an in-depth understanding of the workflow the scientists followed and the information they used to get there. We will spend some time trying to capture more of this in SciCat, and see how far we can get. 
<\/p>\n\n\n\n<p>*) and, in fact, for a presentation I gave for the SciCat user meeting this week&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<div class=\"mh-excerpt\"><p>Time to thoroughly tackle a tough topic: trust and traceability in science. I&#8217;m sure we won&#8217;t tackle it for science in general, but at least <a class=\"mh-excerpt-more\" href=\"https:\/\/lookingatnothing.com\/index.php\/archives\/3574\" title=\"Traceable, trustworthy science: tackling metadata holistically\">[&#8230;]<\/a><\/p>\n<\/div>","protected":false},"author":2,"featured_media":3584,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":4,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1],"tags":[],"class_list":["post-3574","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/lookingatnothing.com\/wp-content\/uploads\/2023\/07\/4344024048_b4f2560389_b.jpg","jetpack_shortlink":"https:\/\/wp.me\/p1gZ2v-VE","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/lookingatnothing.com\/index.php\/wp-json\/wp\/v2\/posts\/3574","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lookingatnothing.com\/index.php\/wp-json\/wp\
/v2\/posts"}],"about":[{"href":"https:\/\/lookingatnothing.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lookingatnothing.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lookingatnothing.com\/index.php\/wp-json\/wp\/v2\/comments?post=3574"}],"version-history":[{"count":10,"href":"https:\/\/lookingatnothing.com\/index.php\/wp-json\/wp\/v2\/posts\/3574\/revisions"}],"predecessor-version":[{"id":3592,"href":"https:\/\/lookingatnothing.com\/index.php\/wp-json\/wp\/v2\/posts\/3574\/revisions\/3592"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/lookingatnothing.com\/index.php\/wp-json\/wp\/v2\/media\/3584"}],"wp:attachment":[{"href":"https:\/\/lookingatnothing.com\/index.php\/wp-json\/wp\/v2\/media?parent=3574"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lookingatnothing.com\/index.php\/wp-json\/wp\/v2\/categories?post=3574"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lookingatnothing.com\/index.php\/wp-json\/wp\/v2\/tags?post=3574"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}