\documentclass{article}

\usepackage[margin = 2.4cm]{geometry}
\usepackage[utf8]{inputenc}
\usepackage{Sweave}
\usepackage{url}
\usepackage{authblk}
\usepackage[style = apa, backend = biber, natbib = true]{biblatex}
\addbibresource{lit.bib}

\title{Working title: Data Descriptor for HMC Data Set}
\author{Angelica Henestrosa}
\affil{Leibniz-Institut für Wissensmedien, Tübingen}

\begin{document}
\SweaveOpts{concordance=TRUE}
%\SweaveOpts{concordance=TRUE}

\maketitle

\begin{abstract}
Since the public release of ChatGPT in 2022, generative AI based on large language models (LLMs) has rapidly expanded into mainstream applications; in 2024, for example, Apple began integrating Apple Intelligence into its consumer devices.
This integration into personal technology marks a significant shift: it further lowers the barriers to use by bringing advanced AI capabilities into everyday devices and making them accessible to private individuals.
Consequently, the use of generative AI, whether conscious or unconscious, including interaction with LLM-powered (voice) assistants and engagement with AI-generated content, is expected to increase substantially.
However, data that link this use to psychological variables and track it over time remain scarce.
This data set comprises six waves of a longitudinal survey of a US sample, collected at two-month intervals between September 2024 and July 2025, and covers user behavior, attitudes, knowledge, and perceptions related to generative AI.
It thus enables future research on the psychological and behavioral dynamics of AI use over time, offering insights into user engagement and the individual factors associated with it.

% Should not exceed 170 words
\end{abstract}

\section{Background and Summary}

% Overview of Dataset
%
% * Provide a clear overview of the dataset
% * Explain the motivation for creating the dataset
% * Outline the potential reuse value of the dataset

The introduction of the transformer architecture in 2017 marked a major breakthrough in natural language processing (NLP), enabling significant advances in machine learning (ML) and the development of large language models (LLMs). These models, trained on vast corpora of text, have demonstrated unprecedented capabilities in generating coherent and contextually relevant language. A milestone in public engagement with generative AI (GenAI) was the release of ChatGPT in November 2022, which made LLMs widely accessible to non-expert users.
Since then, millions of individuals have interacted with conversational agents and other GenAI tools, often integrating them regularly into everyday tasks such as writing, coding, learning, and decision-making (LIT).
This widespread proliferation of AI technologies, coupled with their increasingly diverse applications and personalized user experiences, raises the question of how psychological factors shape, and might explain, differences in AI adoption and use.
As AI systems become more adaptive and embedded in everyday life, understanding the determinants of usage intensity, behavioral patterns, and types of use becomes essential.
Moreover, the field of AI is evolving rapidly, and user characteristics such as attitudes and trust are subject to change over time. Longitudinal research that captures temporal fluctuations in user traits and behaviors is therefore crucial.

Accordingly, this longitudinal data set aims to capture the evolving perceptions of opportunities and risks associated with AI, perceived capabilities of AI systems, attitudes toward AI, trust in AI, willingness to delegate tasks to AI, areas of application, (to be continued) and the interrelationships among these constructs over time, thereby providing initial hints regarding causal relations. Longitudinal studies are more likely to detect changes when a potential change trigger is present \citep{Zhao2024}; the release of Apple Intelligence during the study period may constitute such a trigger.

Central questions are whether predictors of technology acceptance and technology use change over time; whether the perception of AI tools as tools or as agents (and, if as agents, in which type of role or relationship) changes over time; whether this perception is related to concepts such as credibility, trustworthiness, or task delegation; and whether factors such as social presence or perceived anthropomorphism mediate such processes. The data can also be used to explore long-term effects of delegating tasks to AI tools on outcomes such as perceived self-efficacy (e.g., regarding writing skills), loneliness, or cognitive self-esteem, as well as the moderating role of personality.

% Note: lets all reflect on which term and why we want to use, and how we define it: usage vs. use

The data were collected in a joint project of the human-computer interaction group at
the Leibniz-Institut für Wissensmedien (IWM) in Tübingen. Group members have preregistered several (how many should we mention?) studies focusing on their individual subquestions. For an overview of the work packages and their research questions, please visit our repository \url{https://gitea.iwm-tuebingen.de/HMC/data}.
Beyond these work packages, this data descriptor makes it possible to examine research questions that span several work packages and to extract and analyze specific subgroups or individual trajectories not addressed in the work packages.

Because data collection started shortly before the public release of Apple Intelligence on consumer devices, the data set offers a timely snapshot of user attitudes and behaviors at a pivotal moment in AI adoption. This context enhances the relevance of the data for understanding emerging patterns in human-AI interaction. Moreover, analyses of these data may provide early indications of how psychological variables such as trust, perceived usefulness, and willingness to delegate tasks relate to AI use, potentially foreshadowing similar developments in other countries.

% Previous Publications
%
% * Cite any previous publications that utilized these data, in whole or in part
% * Briefly summarize the findings or contributions of those publications
%
% Introductions for Articles and Comments
%
% * Explain the purpose of the work performed
% * Describe the value that the work adds to the field
%
% Citing Prior Art
%
% * Include citations of relevant datasets or outputs in the field for reader
% interest
% * Avoid subjective claims regarding novelty, impact, or utility

\section{Methods}

% Description of Data Creation
%
% * Describe the steps or procedures used to create the data.
% * Include full descriptions of the experimental design.
% * Detail the data acquisition methods.
% * Explain any computational processing involved.
%
\subsection{e.g.: Participants and Data Collection}

To examine these changes and relationships, a US sample consisting mainly of AI
users (specify) was invited to participate in this survey at two-month
intervals between September 2024 and July 2025.

We targeted a US sample because Apple had announced the release of its new AI
platform, Apple Intelligence, for autumn 2024 (in the US, owing to the stricter
regulations in the EU), and we expected many people to be exposed to this AI on
their Apple devices. Data collection started at the end of August 2024?? (six
waves, roughly one year).

\begin{itemize}
\item Prolific
\item Invitation
\item Time and intervals
\item Retention rate
\item Second sample $\rightarrow$ invitation of wave~1 participants
\item Focus on users $\rightarrow$ exclusion of non-users without intention
\item Ethics approval
\end{itemize}

\subsection{e.g.: Measurements}

\begin{itemize}
\item List of all measures by wave
\end{itemize}

At wave~1, we collected sociodemographic information from all participants,
including age, gender, educational level, and household income.

% Input Data for Secondary Datasets
%
% * Provide detailed descriptions of all input data.
% * Use a sub-heading such as 'Input Data' if desired.
% * Ensure details allow readers to source the exact data used (avoid non-specific
% URLs or homepages).
% * For continuously updated input data, include version numbers or search terms
% used.
% * Reference input datasets with DOIs or formal metadata using the appropriate
% citation format.
% * Embed URLs in the text for datasets without formal metadata.
%
% Focus on Practical Tasks
%
% * Avoid including general results or analyses in this section.
% * If data have been analyzed or published elsewhere, cite the experimental
% methods instead of restating them.
% * Focus on documenting practical tasks and technical or processing steps.
%
% Scientific Process Description
%
% * Describe the full scientific process for generating the output or study.
% * Limit discussion of operational aspects like software development or project
% management unless relevant to the science.
%
% Consortia and Multi-Stakeholder Projects
%
% * Be mindful of scientific relevance and reader interest when describing
% administration, management, and funding.
% * State funder details as a practical requirement but avoid excessive focus on
% organization unless relevant to the science.

\section{Data Records}

% * Explain what the dataset contains.
% * Specify the repository where the dataset is stored.
% * Provide an overview of the data files and their formats.
% * Describe the folder structure of the dataset.
% * Cite each external dataset using the appropriate data citation format.
% * Limit extensive summary statistics to less than half a page.
% * Include 1-2 tables or figures if necessary, but avoid summarizing data that
% can be generated from the dataset.

Data records for each of the six waves are available in CSV format at
\url{https://gitea.iwm-tuebingen.de/HMC/data} together with the R scripts for
data anonymization and data cleaning.

In a first step, the data were anonymized by removing participants' Prolific
IDs; in addition, unused variables and variables containing only \texttt{NA}
(resulting from faulty questionnaire programming) were removed. The result is
six files (one per wave) with the primary data, containing the single items of
each measured scale. Furthermore, variable names were harmonized, and
participants who filled in the survey several times were excluded. The final
data sets are ready for analysis; only building the scales (if desired)
requires additional data preparation steps.
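
The anonymization and cleaning steps described above can be illustrated with
the following minimal R sketch. It is not the code of \texttt{anonymization.R}
or \texttt{cleaning.R}: the column name \texttt{prolific\_id} is a hypothetical
placeholder for the variable holding the Prolific ID, and the sketch assumes
that all records of a participant who submitted a survey more than once are
dropped. Harmonizing variable names is omitted because the required renaming
depends on the individual questionnaires.

<<eval = false>>=
# Minimal sketch, not the project's anonymization.R / cleaning.R.
# Assumptions: the Prolific ID is stored in "prolific_id" (hypothetical name),
# and all records of participants with repeated submissions are excluded.
anonymize_and_clean <- function(dat, id_col = "prolific_id",
                                subj_col = "subj_id") {
  # Step 1: remove the direct identifier
  dat[[id_col]] <- NULL
  # Step 2: remove variables that contain only NA
  only_na <- vapply(dat, function(x) all(is.na(x)), logical(1))
  dat <- dat[, !only_na, drop = FALSE]
  # Step 3: exclude participants who submitted the survey more than once
  repeated <- unique(dat[[subj_col]][duplicated(dat[[subj_col]])])
  dat[!dat[[subj_col]] %in% repeated, , drop = FALSE]
}
@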

Figure~\ref{fig:folderstruc} shows the folder structure and the files contained
in the repository of the data records. This repository is generated from the
local project folder that all project collaborators can access. All files are
text files or PDFs, with the exception of the codebook, which is an Excel file.
However, an export of the information contained in the Excel codebook to a
Markdown file is also included, for easier reading online and to ensure that
all files are available in non-proprietary formats.

\begin{figure}
\begin{verbatim}
https://gitea.iwm-tuebingen.de/HMC/data
|-- 01_project_management
|   |-- workpackages
|   |   |-- workpackages.md
|-- 02_material
|   |-- AI_Trends_Wave1_Survey.pdf
|   |-- AI_Trends_Wave2_Survey.pdf
|   |-- AI_Trends_Wave3_Survey.pdf
|   |-- AI_Trends_Wave4_Survey.pdf
|   |-- AI_Trends_Wave5_Survey.pdf
|   |-- AI_Trends_Wave6_Survey.pdf
|-- 03_data
|   |-- 01_raw_data
|   |   |-- anonymization.R
|   |-- 02_anonymized_data
|   |   |-- cleaning.R
|   |-- 03_cleaned_data
|   |   |-- HMC_wave1_cleaned.csv
|   |   |-- HMC_wave2_cleaned.csv
|   |   |-- HMC_wave3_cleaned.csv
|   |   |-- HMC_wave4_cleaned.csv
|   |   |-- HMC_wave5_cleaned.csv
|   |   |-- HMC_wave6_cleaned.csv
|   |-- HMC_codebook.xlsx
|   |-- item_reference.md
|   |-- README.md
|-- README.md
\end{verbatim}
\caption{Folder structure of the repository containing the data records.}
\label{fig:folderstruc}
\end{figure}

Furthermore, a codebook is available at
\url{https://gitea.iwm-tuebingen.de/HMC/data/src/branch/main/03_data/item_reference.md};
it explains variable abbreviations and coding and provides, for each variable,
references and information about the waves in which it was collected.

% TODO: Where should the demographics table go? Here or above in the Methods
% section?
Table~\ref{tab:demographics} provides an overview of the demographic variables
across all six waves. Because demographics were assessed only in wave~1, the
gender, age, education, and income values for waves 2 to 6 summarize the
wave-1 responses of those participants who took part in the respective wave;
\emph{Other} comprises participants who reported a non-binary or third gender
or preferred not to state their gender. Education and income were collected on
six-point scales with an additional ``Prefer not to say'' option; the table
reports means and standard deviations of the numeric codes 1 to 6, with
``Prefer not to say'' treated as missing. The answer options for education are
%
\begin{enumerate}
\item Some high school or less
\item High school diploma or GED
\item Some college, but no degree
\item Associates or technical degree
\item Bachelor's degree
\item Graduate or professional degree (MA, MS, MBA, PhD, JD, MD, DDS etc.)
\end{enumerate}
%
and for income
%
\begin{enumerate}
\item Less than \$25,000
\item \$25,000--\$49,999
\item \$50,000--\$74,999
\item \$75,000--\$99,999
\item \$100,000--\$149,999
\item \$150,000 or more
\end{enumerate}
%
The proportion of AI users increased across the six waves, from about 76\% in
wave~1 to almost 90\% in wave~6.
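
Reusers who prefer labeled categories to numeric codes could convert education
and income along the lines of the following minimal R sketch. It assumes that
codes 1 to 6 follow the answer options listed above and that code 7 stands for
``Prefer not to say'', which becomes \texttt{NA} (as in
Table~\ref{tab:demographics}, where it is treated as missing); the function
name \texttt{label\_demographics} is hypothetical.

<<eval = false>>=
# Minimal sketch: turn the numeric education and income codes into ordered
# factors. Assumption: codes 1-6 follow the answer options listed in the text;
# code 7 ("Prefer not to say") is not among the levels and therefore becomes NA.
label_demographics <- function(dat) {
  edu_labels <- c("Some high school or less",
                  "High school diploma or GED",
                  "Some college, but no degree",
                  "Associates or technical degree",
                  "Bachelor's degree",
                  "Graduate or professional degree")
  inc_labels <- c("Less than $25,000", "$25,000-$49,999",
                  "$50,000-$74,999", "$75,000-$99,999",
                  "$100,000-$149,999", "$150,000 or more")
  dat$education <- factor(dat$education, levels = 1:6,
                          labels = edu_labels, ordered = TRUE)
  dat$income <- factor(dat$income, levels = 1:6,
                       labels = inc_labels, ordered = TRUE)
  dat
}

# Example usage with the wave-1 file:
# dat1 <- label_demographics(read.csv("HMC_wave1_cleaned.csv"))
# table(dat1$education, useNA = "ifany")
@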

<<echo = false, results = tex>>=
# Read the cleaned data of all six waves
files <- sprintf("../data/03_data/03_cleaned_data/HMC_wave%d_cleaned.csv", 1:6)
waves <- lapply(files, read.csv)

# Label AI use in every wave (1 = user, 2 = non-user)
waves <- lapply(waves, function(d) {
  d$use <- factor(d$use, levels = 1:2, labels = c("user", "noUser"))
  d
})

# Demographics were collected in wave 1
dat <- subset(waves[[1]], select = c(subj_id, age, gender, education, income,
                                     apple_use, apple_spprt_SiriAI,
                                     apple_AI_intent_use, use))

dat$gender <- factor(dat$gender,
                     levels = 1:4,
                     labels = c("Male", "Female", "Non-binary / third gender",
                                "Prefer not to say"))

# Education and income: codes 1-6 correspond to the answer options listed in
# the text, code 7 = "Prefer not to say".
# TODO: What to do about these? Reported means in table, since it is too
# detailed otherwise - but is this what we want?

# Remove categories that are not informative for means and SDs
dat$education <- ifelse(dat$education == 7, NA, dat$education)
dat$income <- ifelse(dat$income == 7, NA, dat$income)

# TODO: Put in separate table? Left out for now!
dat$apple_use <- factor(dat$apple_use, levels = 1:2, labels = c("Yes", "No"))
dat$apple_spprt_SiriAI <- factor(dat$apple_spprt_SiriAI, levels = 1:3,
                                 labels = c("Yes", "No", "I don't know"))
dat$apple_AI_intent_use <- factor(dat$apple_AI_intent_use, levels = 1:3,
                                  labels = c("Yes", "No", "Maybe"))

# Format a variable as "M (SD)" with two decimals
m_sd <- function(x) {
  sprintf("%.2f (%.2f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
}

# Create table for demographics: one row per wave. Demographics (from wave 1)
# are summarized for the participants who took part in the respective wave.
other_gender <- c("Non-binary / third gender", "Prefer not to say")
tab_demo <- t(vapply(waves, function(d) {
  sub <- subset(dat, subj_id %in% unique(d$subj_id))
  c(nrow(d),
    sprintf("%.2f%%", sum(d$use == "user", na.rm = TRUE) / nrow(d) * 100),
    sum(sub$gender == "Male", na.rm = TRUE),
    sum(sub$gender == "Female", na.rm = TRUE),
    sum(sub$gender %in% other_gender),
    m_sd(sub$age),
    m_sd(sub$education),
    m_sd(sub$income))
}, FUN.VALUE = character(8)))

rownames(tab_demo) <- paste("wave", 1:6)
colnames(tab_demo) <- c("Total N", "User", "Male", "Female", "Other",
                        "Age M(SD)", "Education M(SD)", "Income M(SD)")

xtable::xtable(tab_demo,
               align = c("l", "r", "r", "r", "r", "r", "c", "c", "c"),
               caption = "Demographic variables per wave",
               label = "tab:demographics", auto = TRUE)
@

\section{Technical Validation}

Wave~1 was conducted shortly before iOS~18?? was released. $\rightarrow$ Were
there any other external events that potentially influenced the survey?

\begin{itemize}
\item Analysis of sample differences across waves $\rightarrow$ Was the sample
equally distributed regarding sociodemographic characteristics?
\end{itemize}

\begin{itemize}
\item Attention check
\item Bot detection question
\item Forced response
\end{itemize}

% * Describe the experiments, analyses, or checks performed to support the
% technical quality of the dataset.
% * Include any supporting figures and tables as needed.

\section{Usage Notes (optional)}

Maybe here elaborate on limitations:
\begin{itemize}
\item No data on non-users for waves 1 to 3
\item Not representative regarding age, gender, education, or region due to the
focus on users
\item Online survey: inattentive participants; fatigue effects, especially in
waves 1 and 6 (more variables)
\item Retention rate/dropout rate across waves
\end{itemize}
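
Because each wave is stored in a separate file and participants are linked
across files via \texttt{subj\_id}, retention across waves can be computed and
the waves can be stacked into a long format for longitudinal analyses. The
following minimal R sketch illustrates both steps. It assumes that each cleaned
file contains one row per participant and a \texttt{subj\_id} column; the
function names \texttt{retention\_by\_wave} and \texttt{stack\_waves} are
hypothetical, and retention is expressed relative to the wave-1 participants.

<<eval = false>>=
# Minimal sketch: link the six waves via subj_id.
# Assumption: 'waves' is a list of the cleaned data frames (wave 1 first),
# each with one row per participant and a 'subj_id' column.

# Share of wave-1 participants who also took part in each wave
retention_by_wave <- function(waves) {
  ids_w1 <- unique(waves[[1]]$subj_id)
  vapply(waves, function(d) mean(ids_w1 %in% d$subj_id), numeric(1))
}

# Stack selected variables of all waves into one long data frame;
# every variable in 'vars' must exist in every wave
stack_waves <- function(waves, vars) {
  long <- lapply(seq_along(waves), function(w) {
    d <- waves[[w]][, c("subj_id", vars), drop = FALSE]
    d$wave <- w
    d
  })
  do.call(rbind, long)
}

# Example usage with the cleaned files from 03_data/03_cleaned_data:
# waves <- lapply(sprintf("HMC_wave%d_cleaned.csv", 1:6), read.csv)
# retention_by_wave(waves)
# long <- stack_waves(waves, vars = "use")
@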

% * Provide optional information that may assist other researchers in reusing the
% data.
% * Include additional technical notes on how to access or process the data.
% * Avoid using this section for conclusions, general selling points, or worked
% case studies.

\section{Code Availability}

% * Include a subheading titled "Code Availability" in the publication.
% * Indicate whether custom code can be accessed.
% * Provide details on how to access the custom code, including any restrictions
% * Include information on the versions of any software used, if relevant.
% * Specify any particular variables or parameters used to generate, test, or
% process the dataset, if not included in the Methods.
% * Place the code availability statement at the end of the manuscript,
% immediately before the references.
% * If no custom code has been used, include a statement confirming this.

The cleaned primary data and the accompanying R code for data anonymization and
cleaning for all six waves are available at
\url{https://gitea.iwm-tuebingen.de/HMC/data}. The repository and all material
can be downloaded directly or cloned as a Git repository. All additional R
packages used for data cleaning (e.\,g., \texttt{dplyr},
\texttt{qualtRics}, or \texttt{openxlsx}) are available on CRAN
(\url{https://cran.r-project.org/}) and can be downloaded there free of charge.
However, the scripts are mainly provided to make transparent which steps have
been taken for data anonymization and data cleaning. The data files and the
codebook can be downloaded and used without rerunning any of the scripts. We
provide the data at the item level so that they can be used for any kind of
analysis. The codebook provides the information needed to aggregate items into
scales, e.\,g., which items belong to a scale and which items must be
reverse-coded before being included in the scale.
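
A minimal R sketch of this aggregation step is given below. The item names, the
response range from 1 to 5, and the use of the item mean as scale score are
assumptions for illustration only; scale membership, reverse-coded items, and
response ranges have to be taken from the codebook. On a response scale from
$a$ to $b$, reverse coding maps a response $x$ to $a + b - x$. The function name
\texttt{score\_scale} is hypothetical.

<<eval = false>>=
# Minimal sketch: compute a scale score from item-level data.
# Assumptions (hypothetical): 'items' lists the items of one scale,
# 'reversed' lists the items to be reverse-coded, responses range from
# 'lowest' to 'highest', and the scale score is the item mean.
score_scale <- function(dat, items, reversed = character(0),
                        lowest = 1, highest = 5, na.rm = FALSE) {
  x <- dat[, items, drop = FALSE]
  # Reverse coding: a response r becomes (lowest + highest) - r
  for (item in reversed) {
    x[[item]] <- (lowest + highest) - x[[item]]
  }
  # Mean across the items of the scale, one value per participant
  unname(rowMeans(x, na.rm = na.rm))
}

# Example with hypothetical items trust_1 to trust_3 (trust_2 reversed):
# dat$trust <- score_scale(dat, paste0("trust_", 1:3), reversed = "trust_2")
@

Whether the mean or the sum of the items is the appropriate scale score depends
on the original instrument and should be checked against the references listed
in the codebook.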

% TODO: Should we maybe add information on how to build the scale? Like "take
% the mean", "take the sum" - does this differ? --> Check YAMLs

\printbibliography

\section*{Author Contributions}

\section*{Competing Interests}

\section*{Acknowledgements}

\end{document}