Finalize data README

2025-12-10 17:19:07 +01:00
parent bbdb35559c
commit 6921203765
1 changed files with 23 additions and 22 deletions
@@ -54,8 +54,7 @@ immediately preceding wave because ongoing monitoring showed that many non-users
 remained non-users and that relatively few participants perceived AI as a social
 actor. To capture more contemporary usage and obtain sufficient variation for
 research questions filtering for individuals that perceived AI as a social
-actor, we broadened recruitment in wave 4 to all wave-1 participants. Sample 2
-therefore contains only participants with at least one missing wave.
+actor, we broadened recruitment in wave 4 to all wave-1 participants.


 ## Download settings in Qualtrics
@@ -82,7 +81,7 @@ wave in `03_data/01_raw_data/wave*`. The script
 data and adds an anonymized ID `subj_id` with entries `subj0001 - subj1009` to
 all data sets.

-Irrelevant columns -- mostly automatically created by Qualtrics -- are also
+Irrelevant columns - mostly automatically created by Qualtrics - are also
 removed. See `anonymization.R` for details.

 The anonymized data files are saved to `03_data/02_anonymized_data/` as
@@ -90,10 +89,10 @@ CSV files with file names `HMC_<wave>_anonymized.csv`.

 # Data cleaning

-After data anonymization, some more rudimentary preprocessing was done on the
-data with the script `03_data/02_anonymized_data/cleaning.R`. Especially,
-the original variable names in Qualtrics were harmonized so they all follow the
-same structure.
+After data anonymization, some rudimentary data cleaning was done with the
+script `03_data/02_anonymized_data/cleaning.R`. In particular, the original
+variable names in Qualtrics were harmonized so they all follow the same
+structure.

 The cleaned data files are saved to `03_data/03_cleaned_data/` as CSV files with
 file names `HMC_<wave>_cleaned.csv`.
@@ -103,21 +102,31 @@ some cleaning.

 ## Problems

-### with variable names over waves
+### with variable names

-* `trust_fav` and `Q161` and `Q162`
-* `obj_know` and `Q158`
-* the labels of the intention variables were swapped
-  --> `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa
-* ...
+* For the variables recording which tasks subjects would delegate to AI, there
+  were some inconsistencies in the naming. This affected _only_ the variable
+  names; the items were presented correctly to the subjects. The following
+  variables were renamed:
+  - `delg_tsk_typs_4 --> delg_tsk_typs_3`
+  - `delg_tsk_typs_5 --> delg_tsk_typs_4`
+  - `delg_tsk_typs_6 --> delg_tsk_typs_5`
+  - `delg_tsk_typs_7 --> delg_tsk_typs_6`
+  - `delg_tsk_typs_8 --> delg_tsk_typs_7`
+  - `delg_tsk_typs_8` was deleted

-<!-- TODO: Add more details -->
+* The labels of the intention variables were swapped by accident and this was
+  corrected:
+  - `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa

 ### with subjects

 * Two entries in wave 1: `subj0762`
 * Three entries in wave 3: `subj1009`
 * We kept the first entry for each subject
+* `subj1009` has been removed from the dataset since it only appeared in wave 3
+  and it is unclear how this happened; only subjects who participated in wave 1
+  were invited to participate in further waves.

 # Data preprocessing

@@ -128,11 +137,3 @@ with the final variables for each scale are then saved in the folder
 `03_data/04_preprocessed_data` as CSV files with file names
 `HMC_<wave>_preprocessed.csv`.

-# TODOs
-
-* Add more preprocessing steps like variable renaming?
-
-* Get age (and other descriptives?) for subj1008 and subj1009 from Profilic
-  data?
-
-
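
The cleaning rules documented in the README changes above (renaming the delegation variables, swapping the intention labels back, keeping the first entry per subject, and dropping subjects who were not in wave 1) can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the contents of `cleaning.R`: the names `RENAMES`, `rename_columns`, `keep_first_entry`, and `drop_uninvited` are hypothetical, rows are assumed to be plain dictionaries keyed by column name, and the sketch assumes no `delg_tsk_typs_3` column exists before renaming.

```python
# Hypothetical mapping built from the renames listed in the README.
# Assumption: the raw data has no `delg_tsk_typs_3` column, so shifting
# every name down by one does not overwrite an existing column.
RENAMES = {
    "delg_tsk_typs_4": "delg_tsk_typs_3",
    "delg_tsk_typs_5": "delg_tsk_typs_4",
    "delg_tsk_typs_6": "delg_tsk_typs_5",
    "delg_tsk_typs_7": "delg_tsk_typs_6",
    "delg_tsk_typs_8": "delg_tsk_typs_7",
    # The two intention labels were swapped, so each maps to the other.
    "int_use_bhvr_fav": "int_use_bhvr_noUser",
    "int_use_bhvr_noUser": "int_use_bhvr_fav",
}


def rename_columns(row):
    """Return a copy of one data row with all renames applied at once."""
    # Building a new dict applies every rename simultaneously, so the
    # swap and the shifted names cannot clobber each other.
    return {RENAMES.get(name, name): value for name, value in row.items()}


def keep_first_entry(rows):
    """Keep only the first row for each `subj_id`, preserving row order."""
    seen = set()
    kept = []
    for row in rows:
        if row["subj_id"] not in seen:
            seen.add(row["subj_id"])
            kept.append(row)
    return kept


def drop_uninvited(rows, wave1_ids):
    """Drop rows whose subject did not take part in wave 1."""
    return [row for row in rows if row["subj_id"] in wave1_ids]
```

Applying all renames in a single pass, rather than one after another, is a deliberate choice here: a sequential swap of `int_use_bhvr_fav` and `int_use_bhvr_noUser` would need a temporary name, whereas a single mapping does not.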