\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{Sweave}
\usepackage{authblk}

\title{Working title: Data Descriptor for HMC Data Set}
\author{Angelica Henestrosa}
\affil{Leibniz-Institut für Wissensmedien, Tübingen}

\begin{document}
\SweaveOpts{concordance=TRUE}

\maketitle

\begin{abstract}
Since the emergence of large language models (LLMs) in 2022, generative AI has rapidly expanded into mainstream applications, leading, for example, to the integration of Apple Intelligence into consumer devices in 2024. This integration into personal technology marks a significant shift and a further reduction in barriers to use, bringing advanced AI capabilities into everyday devices and making them accessible to private individuals. Thus, the use of generative AI--consciously or unconsciously--along with interaction through LLM-powered (voice) assistants and engagement with AI-generated content, is expected to increase significantly. However, data that link this usage to psychological variables and track it over time remain scarce. This longitudinal study comprises data from a US-American sample collected across six waves at two-month intervals between September 2024 and July 2025. It examines user behavior, attitudes, knowledge, and perceptions related to generative AI. The resulting data set allows for future research on the psychological and behavioral dynamics of AI use over time, offering insights into user engagement and the individual factors connected to it.
% Should not exceed 170 words
\end{abstract}

\section{Background and Summary}

% Overview of Dataset
% * Provide a clear overview of the dataset
% * Explain the motivation for creating the dataset
% * Outline the potential reuse value of the dataset

The introduction of transformer architectures in 2017 marked a major breakthrough in natural language processing (NLP), enabling significant advances in machine learning (ML) and the development of large language models (LLMs).
These models, trained on vast corpora of text data, have demonstrated unprecedented capabilities in generating coherent and contextually relevant language. A milestone in public engagement with generative AI (GenAI) was the release of ChatGPT in November 2022, which made LLMs widely accessible to non-expert users. Since then, millions of individuals have interacted with conversational agents and other GenAI tools, often integrating them regularly into everyday tasks such as writing, coding, learning, and decision-making (LIT). This widespread proliferation of AI technologies, coupled with their increasingly diverse applications and personalized user experiences, raises the question of how psychological factors shape and might explain differences in AI adoption and usage. As AI systems become more adaptive and embedded in everyday life, understanding the determinants of usage intensity, behavioral patterns, and types of use becomes essential. Moreover, the field of AI is evolving at a fast pace, and user characteristics such as attitudes and trust are subject to change over time. Therefore, longitudinal research that captures temporal fluctuations in user traits and behaviors is crucial. To this end, this longitudinally designed data set aims to capture the evolving perceptions of opportunities and risks associated with AI, the perceived capabilities of AI systems, attitudes toward AI, trust in AI, willingness to delegate tasks to AI, areas of application, (to be continued) and the interrelationships among these constructs over time, and to provide preliminary evidence on causal directions. Longitudinal studies are more likely to detect changes when a potential change trigger is present (Zhao et al., 2024). Central questions are whether predictors of technology acceptance as well as technology use change over time, whether the perception of AI tools as tools vs.
agents (if so: what type of role/relationship) changes over time, whether this perception is related to concepts such as credibility, trustworthiness, or task delegation, and whether factors such as social presence or perceived anthropomorphism mediate such processes. We also explore the long-term effects of delegating tasks to AI tools on outcomes such as perceived self-efficacy (writing skills), loneliness, or cognitive self-esteem, and examine the moderating role of personality.
% Note: let's all reflect on which term we want to use and why, and how we define it: usage vs. use

This project is a joint effort of the human-computer interaction group at the Leibniz-Institut für Wissensmedien (IWM) in Tübingen. Several preregistrations by group members focus on their individual subquestions. % How many should we mention?
For an overview of the work packages and their research questions, please visit our repository [LINK]. % --> create workpackages.md
Thus, this data set may be used to examine research questions across the individual work packages, as well as to extract and analyze specific subgroups or individual trajectories not covered by the work packages. Because the data set was collected shortly before the public release of Apple Intelligence on consumer devices, it offers a timely snapshot of user attitudes and behaviors at a pivotal moment in AI adoption. This context enhances the relevance of the data for understanding emerging patterns in human-AI interaction. Moreover, the findings may provide early indicators of how psychological variables such as trust, perceived usefulness, and willingness to delegate tasks relate to AI usage, potentially offering a prognosis of similar developments in other countries.
% WP1 Teresa/Nico/Vanessa/Angelica https://osf.io/58tqc
% WP2 Teresa
% WP3 Sonja https://aspredicted.org/4g3d-rqkt.pdf
% WP4 Büsra https://aspredicted.org/m6zv9.pdf
% WP5 Büsra/Teresa https://aspredicted.org/kx5r-4pxq.pdf
% WP6 Angelica/Gerrit https://doi.org/10.17605/OSF.IO/JAUD4
% WP7 Mike ???
% WP8 Steffi/Sonja https://osf.io/f3jyc?view_only=d8d009e575c64dc2bd453f969c3cb7b1
% WP9 Steffi https://osf.io/h5fwe?view_only=8c5bc9e62074469ebdb3d72b38f4716d
% --> Are these all WPs? Are there any missing?

% Previous Publications
% * Cite any previous publications that utilized these data, in whole or in part
% * Briefly summarize the findings or contributions of those publications

% Introductions for Articles and Comments
% * Explain the purpose of the work performed
% * Describe the value that the work adds to the field

% Citing Prior Art
% * Include citations of relevant datasets or outputs in the field for reader interest
% * Avoid subjective claims regarding novelty, impact, or utility

\section{Methods}

% Description of Data Creation
% * Describe the steps or procedures used to create the data.
% * Include full descriptions of the experimental design.
% * Detail the data acquisition methods.
% * Explain any computational processing involved.

\subsection{e.g.: Participants and Data Collection}

To examine these changes and relationships, a US-American sample mainly consisting of AI users (specify) was invited to participate in this survey at two-month intervals between September 2024 and July 2025. The study targets a US-American sample because Apple had announced the release of its new AI platform, Apple Intelligence, for autumn 2024 (initially in the US, owing to stricter regulations in the EU), and we expected many people to be exposed to this AI on their Apple devices. Data collection started at the end of August 2024?? (six waves, roughly one year).
\begin{itemize}
\item Prolific
\item invitation
\item time and intervals
\item retention rate
\item second sample: invitation of wave 1 participants
\item focus on users: exclusion of non-users without intention to use
\item ethics approval
\end{itemize}

\subsection{e.g.: Measurements}

\begin{itemize}
\item list of all measures by wave
\end{itemize}

We collected sociodemographic information, including age, gender, educational level, and household income, from all participants at wave 1.

% Input Data for Secondary Datasets
% * Provide detailed descriptions of all input data.
% * Use a sub-heading such as 'Input Data' if desired.
% * Ensure details allow readers to source the exact data used (avoid non-specific URLs or homepages).
% * For continuously updated input data, include version numbers or search terms used.
% * Reference input datasets with DOIs or formal metadata using the appropriate citation format.
% * Embed URLs in the text for datasets without formal metadata.

% Focus on Practical Tasks
% * Avoid including general results or analyses in this section.
% * If data have been analyzed or published elsewhere, cite the experimental methods instead of restating them.
% * Focus on documenting practical tasks and technical or processing steps.

% Scientific Process Description
% * Describe the full scientific process for generating the output or study.
% * Limit discussion of operational aspects like software development or project management unless relevant to the science.

% Consortia and Multi-Stakeholder Projects
% * Be mindful of scientific relevance and reader interest when describing administration, management, and funding.
% * State funder details as a practical requirement but avoid excessive focus on organization unless relevant to the science.

\section{Data Records}

% * @Nora could you perhaps elaborate on this?
Data records for each of the six waves are available in CSV format at (tbd), together with the R/Python scripts for data anonymization, data cleaning, and data preprocessing.
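To illustrate the kind of processing these scripts perform, a minimal sketch in Python/pandas is given below. This is not the project's actual code; all file, variable, and scale names (e.g., \texttt{prolific\_id}, \texttt{trust\_1}) are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical raw wave data; the real variable names differ.
raw = pd.DataFrame({
    "prolific_id": ["a1", "b2", "c3"],          # identifying variable to be removed
    "trust_1": [4, 5, 3],                        # single items of a hypothetical scale
    "trust_2": [3, 4, 4],
    "empty_var": [None, None, None],             # artifact of faulty questionnaire programming
})

# Step 1: anonymization -- drop identifying variables and fully empty variables.
anon = raw.drop(columns=["prolific_id"]).dropna(axis=1, how="all")

# Step 2: preprocessing -- harmonize variable names and compute scale scores.
anon = anon.rename(columns={"trust_1": "trust_ai_1", "trust_2": "trust_ai_2"})
anon["trust_ai"] = anon[["trust_ai_1", "trust_ai_2"]].mean(axis=1)
```

The cleaned-and-anonymized file would correspond to \texttt{anon} before step 2, and the preprocessed file to \texttt{anon} after the scale scores are added.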
First, the data were anonymized: participants' Prolific IDs, unused variables, empty variables resulting from faulty questionnaire programming, and xy were removed. Thus, (filename) represents the cleaned and anonymized raw data, including the single items of each measurement. Second, variable names were harmonized and scale scores were calculated, resulting in the preprocessed data set xy, ready for analyses across scales. Moreover, a codebook explaining the variable abbreviations and containing information about the waves in which each variable was collected (what else?) is available at (tbd).

% * Explain what the dataset contains.
% * Specify the repository where the dataset is stored.
% * Provide an overview of the data files and their formats.
% * Describe the folder structure of the dataset.
% * Cite each external dataset using the appropriate data citation format.
% * Limit extensive summary statistics to less than half a page.
% * Include 1-2 tables or figures if necessary, but avoid summarizing data that can be generated from the dataset.
% * How should we report on the variables and scales:
%   ** item and scale level OR just scale level?
%   ** link to Gerrit's scale list: https://gitea.iwm-tuebingen.de/AG4/project_HMC_preprocessing/src/branch/main/results/database_api_reference.md ?
%   ** extra codebook or merge that information into Gerrit's list?
% --> An overview of all variables, their calculation, their measurement format, and ideally their M, SD, and Cronbach's alpha would be ideal!

\section{Technical Validation}

% * Describe the experiments, analyses, or checks performed to support the technical quality of the dataset.

Wave 1 was conducted shortly before iOS 18?? was published. % -> Were there any other external events potentially influencing the survey?
\begin{itemize}
\item analysis of sample differences across waves: was the sample equally distributed with regard to sociodemographic characteristics?
\item attention check
\item bot detection question
\item forced response
\end{itemize}
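As an illustration of how such quality checks could translate into exclusion criteria, a minimal Python/pandas sketch follows. The variable names and coding (e.g., \texttt{attention\_check}, \texttt{bot\_check}) are hypothetical assumptions, not the actual survey variables.

```python
import pandas as pd

# Hypothetical wave data with quality-check variables; real names and codings differ.
wave = pd.DataFrame({
    "pid": [1, 2, 3, 4],
    "attention_check": ["pass", "fail", "pass", "pass"],
    "bot_check": [True, True, False, True],   # True = bot-detection item answered correctly
    "age": [25, 40, 31, 52],
})

# Exclude respondents who failed the attention check or the bot-detection item.
clean = wave[(wave["attention_check"] == "pass") & wave["bot_check"]]
```

The same cleaned frames could then be compared wave by wave (e.g., distributions of age, gender, or education) to assess whether attrition changed the sample composition.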
% * Include any supporting figures and tables as needed.

\section{Usage Notes (optional)}

Limitations to elaborate on here:
\begin{itemize}
\item no data on non-users for waves 1-3
\item not representative regarding age/gender/education/region due to the focus on users
\item online survey: inattentive participants; fatigue effects, especially in waves 1 and 6 (more variables)
\item retention rate/dropout rate across waves
\end{itemize}

% * Provide optional information that may assist other researchers in reusing the data.
% * Include additional technical notes on how to access or process the data.
% * Avoid using this section for conclusions, general selling points, or worked case studies.

\section{Code Availability}

All Python (version x) and R (version x) code for data anonymization, data cleaning, and preprocessing, as well as the cleaned and the preprocessed data sets for each wave, are stored in the public repository [link].
% Consider whether to link separately to Gitea and to OSF (+ materials) here.

% * Include a subheading titled "Code Availability" in the publication.
% * Indicate whether custom code can be accessed.
% * Provide details on how to access the custom code, including any restrictions.
% * Include information on the versions of any software used, if relevant.
% * Specify any particular variables or parameters used to generate, test, or process the dataset, if not included in the Methods.
% * Place the code availability statement at the end of the manuscript, immediately before the references.
% * If no custom code has been used, include a statement confirming this.

\section*{References}
\section*{Author Contributions}
\section*{Competing Interests}
\section*{Acknowledgements}

\end{document}