2024-06-23 14:43:31 +02:00
\title{Data sharing}
\author{Nora Wickelmaier}
\date{June 24, 2024}
\begin{frame}{What are the benefits of sharing your data?}
\includegraphics[width = 5cm]{../figures/QR Code for Methodenseminar SS 2024 - Session 4}
\begin{frame}[<+->]{Benefits of sharing data}
Sharing data
\item[\dots] ensures that data are not ultimately lost (save data for posterity)
\item[\dots] is consistent with scientific norms of openness and rigor
\item[\dots] increases citation scores of papers
\item[\dots] encourages more research because it enables secondary analyses
\item[\dots] facilitates subsequent reanalyses (correct errors, emphasize
robustness of original results)
\item[\dots] is demanded by most third-party funding agencies
\begin{tabular}{ll}
Date & Topic \\
\hline
2024-05-13 & Introduction to data management \\
2024-05-27 & Workflow \\
2024-06-10 & Data organisation \\
2024-06-24 & Data sharing \\
2024-07-08 & Clean coding \\
2024-07-22 & Version control \\
\end{tabular}
\section{Data organisation}
\begin{frame}[<+->]{What we covered so far}
\item What habits do we need for effective research data management?
\item What is a workflow and why do we need one?
\item What needs to be considered when naming files of a research project?
\item How to organize folders for a research project?
\item What metadata should be added to my research project?
\item What are good ways to document a data set?
\begin{frame}{Examples for documenting data sets}
\item A recent paper with published data by \citet{Ngo2023} investigating
which cues Twitter users consider when identifying social bots
\item A multi-cohort, longitudinal study by the Hector Research Institute of
Education Sciences and Psychology at the University of Tübingen:
Transformation of the secondary school system and academic careers
\item Editorial by \citet{Wicherts2012} on why you should publish your data,
with an accompanying data set
They provide
\item A data set with 221 observations and 633 variables
\item A PDF with all measures and the scenario used for collecting the data
\item Go to \url{} and download the files
\texttt{data.csv} and
\texttt{Experimental-Study-Measures and scenario.pdf}
\item Read the data into R using \texttt{read.csv()}
\item Find out which variables in the data correspond to which measures
(BTW: Sharing the data in this form is better than \emph{not} sharing them,
in my opinion)
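A minimal R sketch of the two steps above: reading the file and matching
variables to measures. Because the download is not bundled here, the sketch
first writes a small toy file. The variable names (\texttt{trust\_01},
\texttt{age}, \dots) and the prefix pattern are hypothetical and will differ in
the real \texttt{data.csv}.

```r
# Toy stand-in for the downloaded data.csv (hypothetical variable names)
toy <- data.frame(
  id       = 1:4,
  trust_01 = c(3, 4, 2, 5),
  trust_02 = c(2, 4, 3, 5),
  age      = c(23, 31, 45, 28)
)
path <- file.path(tempdir(), "data.csv")
write.csv(toy, path, row.names = FALSE)

# Read the data and get a first overview
dat <- read.csv(path)
dim(dat)    # number of observations and variables
str(dat)    # variable names and types

# Collect all variables sharing a prefix, e.g. the items of one scale,
# so they can be compared with the measures listed in the PDF
trust_items <- grep("^trust_", names(dat), value = TRUE)
trust_items
```

With the real file, replace \texttt{path} by the location of the downloaded
\texttt{data.csv} and adapt the pattern to the naming scheme you find there.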
\begin{frame}{What additional information do we need to use these data?}
\includegraphics[width = 5cm]{../figures/QR Code for Methodenseminar SS 2024 - Session 4}
\item Multi-cohort study that includes longitudinal data for several cohorts
\item Broad spectrum of achievement test data and psycho-social variables
\item Large number of publications on different topics using these data
\item This is not the original data set, but a prepared version for teaching
statistics (hence, proportions in the data and the codebook are not
\item Read the data set \texttt{TOSCAtoTeach\_W123.sav} into R using
\texttt{foreign::read.spss()} or \texttt{haven::read\_spss()}
\item Create contingency tables for the variables \texttt{sform} and
\texttt{szweig1} and compare the results to the codebook
\hfill{\tiny \url{}}
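A minimal R sketch of the contingency-table step, assuming the data have
already been read into a data frame. Since the \texttt{.sav} file is not
bundled here, a toy data frame stands in for it; the category labels are
made up and do not reflect the real coding of \texttt{sform} and
\texttt{szweig1}.

```r
# With the real file, one option would be:
# dat <- foreign::read.spss("TOSCAtoTeach_W123.sav", to.data.frame = TRUE)

# Toy stand-in with made-up category labels
dat <- data.frame(
  sform   = factor(c("A", "A", "B", "B", "B", NA)),
  szweig1 = factor(c("x", "y", "x", "x", "y", "y"))
)

# One-way table, keeping missing values visible
tab_sform <- table(dat$sform, useNA = "ifany")
tab_sform

# Percentages, which are easier to compare with a codebook
round(100 * prop.table(tab_sform), 1)

# Two-way contingency table of both variables
tab_cross <- table(dat$sform, dat$szweig1, useNA = "ifany")
tab_cross
```

Keeping \texttt{useNA = "ifany"} makes it visible whether differences to the
codebook stem from missing values.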
They provide
\item A data set with 537 observations and 79 variables
\item A codebook with variable names and some descriptive statistics for
the scales (\texttt{1-s2.0-S0160289612000050-mmc1.doc})
\item ``Publish (your data) or (let the data) perish! Why not publish your
data too?''
\item Data come from a freshman-testing program called ``Testweek''
\item (Try \texttt{readxl::read\_excel()} to read the data into R)
\begin{frame}{What is the one thing that would make sharing these data
infinitely better?}
\includegraphics[width = 5cm]{../figures/QR Code for Methodenseminar SS 2024 - Session 4}
\begin{frame}[<+->]{Non-anonymous data}
\item Before putting data into any cloud, you should always take a moment to
reflect on whether your data are anonymous
\item No (third-party) cloud storage, even if it is not publicly accessible
\item If your data contain personal data, they should always be stored
locally, ideally on an encrypted device
\item You should have a plan, before ever collecting the data, for how,
when, and by whom the data will be anonymized
\item All data should eventually be anonymized! (Yes, even audio and video
\item IWM servers can be considered local
\section[Collaborative use]{Sharing data for collaborative use}
\begin{frame}[<+->]{Working together with the same data}
\item Part of data organisation is to think about who needs access to
your data
\item Often these are colleagues from the same lab and there is
infrastructure to share files and scripts easily
\item The IWM offers several solutions for sharing your data (internally and
externally)
\item When the end goal is to make the data public, it might be a good idea
to work together at a place where the data can go public at a certain
point in time
\item We will look at two possibilities: OSF and GitHub
\begin{frame}{IWM solutions}
IWM servers
\item Nextcloud: \url{}
\item Gitea: \url{}
\item Shared drive: \texttt{Y:/}
Microsoft servers
\item OneDrive
\item Teams
\begin{frame}{Open Science Framework}
\item ``OSF is a free and open source project management tool that supports
researchers throughout their entire project lifecycle.''
\item Founded in 2012 and constantly developed: \url{}
\item Meant to integrate all research steps
\includegraphics[scale = .2]{../figures/osf_workflow.png}
\begin{frame}[fragile]{Let's try it out}
\item You need an OSF account -- just sign up with an e-mail address or use ORCID
\item Sign in
\item Create a project
\item Upload (or link) your files
\item Invite contributors
\includegraphics[scale = .4]{../figures/licenses_osf.png}
\item OSF offers you several options for licenses
\item For data, the Creative Commons (CC) licenses are usually a good option
\item For software, other options might be better suited
\item For code (e.\,g., analysis scripts) CC licenses are also a good option
\hfill{\footnotesize \url{}}
\hfill{\footnotesize \url{}}
\hfill{\footnotesize \url{}}
\item Developer platform that allows developers to create, store, manage and
share code
\item Based on Git software providing version control
\item[+] access control
\item[+] bug tracking
\item[+] software feature requests
\item[+] task management
\item[+] continuous integration
\item[+] wikis
\item Commonly used to host open source software development projects
\item Bought by Microsoft in 2018
\includegraphics[scale = .2]{../figures/github.png}
\begin{frame}{GitHub workflow}
\includegraphics[scale = .3]{../figures/workflow_git-github.png}
\hfill{\tiny \url{}}
\section[Repositories]{Sharing data in repositories}
\begin{frame}{Data repositories}
\item \url{}
\item \url{}
\item \url{}
\item \url{}
\item \url{}
\item \url{}
\item \url{}
\hfill{\footnotesize \url{}}
\begin{frame}{A codebook should include}
\begin{tabular}{lp{0.65\textwidth}}
Variable name & Usually some abbreviation like \texttt{pna01} \\
Variable label & Brief description to identify variable \\
Question text & If applicable, exact wording from survey question \\
Values & Values variable can take (e.\,g., 1 to 5) \\
Value labels & If applicable, textual descriptions of the values \\
Statistics & For example, range, mean, standard deviation for
numeric variables; frequencies and percentages for categorical variables \\
Missing data & If applicable, values and labels of missing data \\
Notes & Additional notes, remarks, or comments; for measures or
questions from copyrighted instruments, the notes field can be used to
cite the source \\
\end{tabular}
\hfill\tiny \url{}
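Parts of such a codebook can be generated from the data themselves. The
following is a minimal sketch in base R, not a complete solution: the helper
\texttt{make\_codebook()} is hypothetical, it assumes that variable labels (if
any) are stored in a \texttt{"label"} attribute, and it leaves question text,
value labels, and notes to be added by hand.

```r
# Hypothetical helper: one codebook row per variable
make_codebook <- function(dat) {
  rows <- lapply(names(dat), function(v) {
    x   <- dat[[v]]
    lab <- attr(x, "label")   # assumed location of the variable label
    data.frame(
      variable  = v,
      label     = if (is.null(lab)) NA_character_ else lab,
      type      = class(x)[1],
      values    = if (is.numeric(x)) {
        paste(range(x, na.rm = TRUE), collapse = " to ")
      } else {
        paste(sort(unique(as.character(x))), collapse = ", ")
      },
      mean      = if (is.numeric(x)) mean(x, na.rm = TRUE) else NA_real_,
      sd        = if (is.numeric(x)) sd(x, na.rm = TRUE) else NA_real_,
      n_missing = sum(is.na(x)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy data; the label text is made up for illustration
dat <- data.frame(pna01 = c(1, 5, 3, NA), group = c("a", "b", "a", "b"))
attr(dat$pna01, "label") <- "Hypothetical item label"

cb <- make_codebook(dat)
cb
```

The resulting data frame can be written to a file with \texttt{write.csv()} and
shared alongside the data.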