From 6921203765c7d55898fbb761fa8a0d803ecba1bc Mon Sep 17 00:00:00 2001
From: nwickel <n.wickelmaier@iwm-tuebingen.de>
Date: Wed, 10 Dec 2025 17:19:07 +0100
Subject: [PATCH] Finalize data README

---
 03_data/README.md | 45 +++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 22 deletions(-)
diff --git a/03_data/README.md b/03_data/README.md
index ae26359..7877b56 100644
--- a/03_data/README.md
+++ b/03_data/README.md
@@ -54,8 +54,7 @@ immediately preceding wave because ongoing monitoring showed that many non-users
 remained non-users and that relatively few participants perceived AI as a social
 actor. To capture more contemporary usage and obtain sufficient variation for
 research questions filtering for individuals that perceived AI as a social
-actor, we broadened recruitment in wave 4 to all wave-1 participants. Sample 2
-therefore contains only participants with at least one missing wave.
+actor, we broadened recruitment in wave 4 to all wave-1 participants.
 
 
 ## Download settings in Qualtrics
@@ -82,7 +81,7 @@ wave in `03_data/01_raw_data/wave*`. The script
 data and adds an anonymized ID `subj_id` with entries `subj0001 - sub1009` to
 all data sets.
 
-Irrelevant columns -- mostly automatically created by Qualtrics -- are also
+Irrelevant columns (mostly created automatically by Qualtrics) are also
 removed. See `anonymization.R` for details.
 
 The anonymized data files are saved to `03_data/02_anonymized_data/` as
@@ -90,10 +89,10 @@ CSV files with file names `HMC_<wave>_anonymized.csv`.
 
 # Data cleaning
 
-After data anonymization, some more rudimentary preprocessing was done on the
-data with the script `03_data/02_anonymized_data/cleaning.R`. Especially,
-the original variable names in Qualtrics were harmonized so they all follow the
-same structure.
+After data anonymization, some further rudimentary data cleaning was done with
+the script `03_data/02_anonymized_data/cleaning.R`. In particular, the original
+variable names from Qualtrics were harmonized so that they all follow the same
+structure.
 
 The cleaned data files are saved to `03_data/03_cleaned_data/` as CSV files with
 file names `HMC_<wave>_cleaned.csv`.
@@ -103,21 +102,31 @@ some cleaning.
 
 ## Problems
 
-### with variable names over waves
+### with variable names
 
-* `trust_fav` and `Q161` and `Q162`
-* `obj_know` and `Q158`
-* the labels of the intention variables were swapped
-  --> `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa
-* ...
+* For the variables asking which tasks subjects would delegate to AI, there
+  were some inconsistencies in the naming. These affected _only_ the variable
+  names; the items themselves were presented correctly to the subjects. The
+  following variables were renamed:
+  - `delg_tsk_typs_4 --> delg_tsk_typs_3`
+  - `delg_tsk_typs_5 --> delg_tsk_typs_4`
+  - `delg_tsk_typs_6 --> delg_tsk_typs_5`
+  - `delg_tsk_typs_7 --> delg_tsk_typs_6`
+  - `delg_tsk_typs_8 --> delg_tsk_typs_7`
+  - `delg_tsk_typs_8` was deleted
 
-<!-- TODO: Add more details -->
+* The labels of the two intention variables had been swapped by accident; this
+  was corrected:
+  - the column labeled `int_use_bhvr_fav` is `int_use_bhvr_noUser` and vice versa
 
 ### with subjects
 
 * Two entries in wave 1: `subj0762`
 * Three entries in wave 3: `subj1009`
 * We kept the first entry for each subject
+* `subj1009` was removed from the data set because it appears only in wave 3,
+  and it is unclear how this happened: only subjects who participated in wave 1
+  were invited to participate in later waves.
 
 # Data preprocessing
 
@@ -128,11 +137,3 @@ with the final variables for each scale are then saved in the folder
 `03_data/04_preprocessed_data` as CSV files with file names
 `HMC_<wave>_preprocessed.csv`.
 
-# TODOs
-
-* Add more preprocessing steps like variable renaming?
-
-* Get age (and other descriptives?) for subj1008 and subj1009 from Profilic
-  data?
-
-