Finalize data README

2025-12-10 17:19:07 +01:00 · 2025-12-10 17:19:07 +01:00 · 6921203765
commit 6921203765
parent bbdb35559c
1 changed file with 23 additions and 22 deletions
--- a/03_data/README.md
+++ b/03_data/README.md
@@ -54,8 +54,7 @@ immediately preceding wave because ongoing monitoring showed that many non-users
 remained non-users and that relatively few participants perceived AI as a social
 actor. To capture more contemporary usage and obtain sufficient variation for
 research questions filtering for individuals that perceived AI as a social
-actor, we broadened recruitment in wave 4 to all wave-1 participants. Sample 2
-therefore contains only participants with at least one missing wave.
+actor, we broadened recruitment in wave 4 to all wave-1 participants.
 ## Download settings in Qualtrics
@@ -82,7 +81,7 @@ wave in `03_data/01_raw_data/wave*`. The script
 data and adds an anonymized ID `subj_id` (values `subj0001` to `subj1009`) to
 all data sets.
-Irrelevant columns -- mostly automatically created by Qualtrics -- are also
+Irrelevant columns - mostly automatically created by Qualtrics - are also
 removed. See `anonymization.R` for details.
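A hypothetical Python sketch of the two anonymization steps described above (the project itself does this in `anonymization.R`): assigning sequential, zero-padded IDs and dropping columns that are not needed. The raw key column `participant_key`, the example column `StartDate`, and both helper names are placeholders, not names from the project. The sketch assumes one ID map is built over all waves in order, so a participant who first appears in a later wave still receives an ID and the same person gets the same `subj_id` in every wave.

```python
from typing import Dict, Iterable, List


def build_id_map(raw_keys: Iterable[str]) -> Dict[str, str]:
    """Assign subj0001, subj0002, ... to raw keys in order of first appearance."""
    id_map: Dict[str, str] = {}
    for key in raw_keys:
        if key not in id_map:
            id_map[key] = f"subj{len(id_map) + 1:04d}"
    return id_map


def anonymize(rows: List[dict], id_map: Dict[str, str],
              key_col: str = "participant_key",
              drop_cols: Iterable[str] = ()) -> List[dict]:
    """Replace the raw key with subj_id and drop the given columns."""
    drop = set(drop_cols)
    anonymized = []
    for row in rows:
        new_row = {"subj_id": id_map[row[key_col]]}
        # Copy every remaining column except the raw key and the dropped ones.
        new_row.update({col: val for col, val in row.items()
                        if col != key_col and col not in drop})
        anonymized.append(new_row)
    return anonymized


# Hypothetical usage: build one map across all waves, then apply it per wave.
wave1 = [{"participant_key": "A", "StartDate": "2024-01-01", "Q1": "5"},
         {"participant_key": "B", "StartDate": "2024-01-02", "Q1": "3"}]
wave3 = [{"participant_key": "B", "StartDate": "2024-06-01", "Q1": "4"},
         {"participant_key": "C", "StartDate": "2024-06-02", "Q1": "2"}]
id_map = build_id_map(r["participant_key"] for r in wave1 + wave3)
wave3_anon = anonymize(wave3, id_map, drop_cols={"StartDate"})
```

Building the map once and reusing it for every wave is what keeps `subj_id` stable across waves; building a separate map per wave would renumber subjects.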
 The anonymized data files are saved to `03_data/02_anonymized_data/` as
@@ -90,10 +89,10 @@ CSV files with file names `HMC_<wave>_anonymized.csv`.
 # Data cleaning
-After data anonymization, some more rudimentary preprocessing was done on the
-data with the script `03_data/02_anonymized_data/cleaning.R`. Especially,
-the original variable names in Qualtrics were harmonized so they all follow the
-same structure.
+After data anonymization, some additional basic data cleaning was done with the
+script `03_data/02_anonymized_data/cleaning.R`. In particular, the original
+variable names in Qualtrics were harmonized so they all follow the same
+structure.
 The cleaned data files are saved to `03_data/03_cleaned_data/` as CSV files with
 file names `HMC_<wave>_cleaned.csv`.
@@ -103,21 +102,31 @@ some cleaning.
 ## Problems
-### with variable names over waves
-* `trust_fav` and `Q161` and `Q162`
-* `obj_know` and `Q158`
-* the labels of the intention variables were swapped
-  --> `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa
-* ...
-<!-- TODO: Add more details -->
+### with variable names
+* For the variables asking which tasks subjects would delegate to AI, there
+  were some inconsistencies in the naming. This affected _only_ the variable
+  naming; the items were presented correctly to the subjects. The following
+  variables were renamed:
+  - `delg_tsk_typs_4 --> delg_tsk_typs_3`
+  - `delg_tsk_typs_5 --> delg_tsk_typs_4`
+  - `delg_tsk_typs_6 --> delg_tsk_typs_5`
+  - `delg_tsk_typs_7 --> delg_tsk_typs_6`
+  - `delg_tsk_typs_8 --> delg_tsk_typs_7`
+  - `delg_tsk_typs_8` was deleted
+* The labels of the intention variables were accidentally swapped; this was
+  corrected:
+  - `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa
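Both fixes above are pure renames, and each set only works if it is applied in one pass: doing the swap as two sequential renames would overwrite one of the two columns. The following is a hypothetical Python sketch, not the project's code (its cleaning is done in R, in `cleaning.R`); `rename_columns`, `DELG_RENAMES`, and `INTENTION_SWAP` are made-up names. It treats a row as a dict, assumes each variable is a single column with exactly the names listed above, and raises an error if a rename would merge two columns.

```python
from typing import Dict

# Renames listed in the README: delg_tsk_typs_4 -> _3, ..., delg_tsk_typs_8 -> _7.
DELG_RENAMES: Dict[str, str] = {
    f"delg_tsk_typs_{i}": f"delg_tsk_typs_{i - 1}" for i in range(4, 9)
}

# The two intention variables carried each other's labels, so swap the names.
INTENTION_SWAP: Dict[str, str] = {
    "int_use_bhvr_fav": "int_use_bhvr_noUser",
    "int_use_bhvr_noUser": "int_use_bhvr_fav",
}


def rename_columns(row: dict, mapping: Dict[str, str]) -> dict:
    """Return a copy of row with all keys in mapping renamed in a single pass.

    Building the new dict from the original keys means a swap, or a shift
    such as _5 -> _4 while _4 -> _3, never overwrites a column mid-way.
    """
    renamed = {mapping.get(col, col): value for col, value in row.items()}
    if len(renamed) != len(row):
        # Two source columns ended up with the same target name.
        raise ValueError("rename would merge two columns into one name")
    return renamed


# Hypothetical single row, before the fixes.
row = {"subj_id": "subj0001", "delg_tsk_typs_4": "a", "delg_tsk_typs_5": "b",
       "delg_tsk_typs_8": "e", "int_use_bhvr_fav": 1, "int_use_bhvr_noUser": 2}
fixed = rename_columns(rename_columns(row, DELG_RENAMES), INTENTION_SWAP)
```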
 ### with subjects
 * Two entries in wave 1: `subj0762`
 * Three entries in wave 3: `subj1009`
 * We kept the first entry for each subject
 * `subj1009` was removed from the dataset because it appeared only in wave 3
  and it is unclear how this happened; only subjects who participated in wave 1
  were invited to participate in later waves.
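The two subject fixes can be sketched the same way. This hypothetical Python version (the helper names and the `answer` column are made up; it is not the project's code) keeps the first row per `subj_id`, assuming rows are already in submission order so that "first" means earliest, and then generalizes the `subj1009` decision into a filter that keeps only subjects present in wave 1.

```python
from typing import Iterable, List


def keep_first_entry(rows: List[dict], id_col: str = "subj_id") -> List[dict]:
    """Keep only the first row seen for each subject (rows assumed in submission order)."""
    seen = set()
    kept = []
    for row in rows:
        if row[id_col] not in seen:
            seen.add(row[id_col])
            kept.append(row)
    return kept


def restrict_to_wave1(rows: List[dict], wave1_ids: Iterable[str],
                      id_col: str = "subj_id") -> List[dict]:
    """Drop subjects absent from wave 1, since only wave-1 participants were invited later."""
    allowed = set(wave1_ids)
    return [row for row in rows if row[id_col] in allowed]


# Hypothetical wave-3 rows: subj1009 appears three times and is not in wave 1.
wave1_ids = {"subj0001", "subj0762"}
wave3 = [{"subj_id": "subj0001", "answer": 1},
         {"subj_id": "subj1009", "answer": 2},
         {"subj_id": "subj1009", "answer": 3},
         {"subj_id": "subj1009", "answer": 4}]
cleaned = restrict_to_wave1(keep_first_entry(wave3), wave1_ids)
```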
 # Data preprocessing
@@ -128,11 +137,3 @@ with the final variables for each scale are then saved in the folder
 `03_data/04_preprocessed_data` as CSV files with file names
 `HMC_<wave>_preprocessed.csv`.
-# TODOs
-* Add more preprocessing steps like variable renaming?
-* Get age (and other descriptives?) for subj1008 and subj1009 from Prolific
-  data?