Finalize data README

Nora Wickelmaier 2025-12-10 17:19:07 +01:00
parent bbdb35559c
commit 6921203765


@@ -54,8 +54,7 @@ immediately preceding wave because ongoing monitoring showed that many non-users
remained non-users and that relatively few participants perceived AI as a social
actor. To capture more contemporary usage and obtain sufficient variation for
research questions filtering for individuals that perceived AI as a social
actor, we broadened recruitment in wave 4 to all wave-1 participants. Sample 2
therefore contains only participants with at least one missing wave.
actor, we broadened recruitment in wave 4 to all wave-1 participants.
## Download settings in Qualtrics
@@ -82,7 +81,7 @@ wave in `03_data/01_raw_data/wave*`. The script
data and adds an anonymized ID `subj_id` with entries `subj0001 - subj1009` to
all data sets.
Irrelevant columns -- mostly automatically created by Qualtrics -- are also
Irrelevant columns - mostly automatically created by Qualtrics - are also
removed. See `anonymization.R` for details.
The anonymized data files are saved to `03_data/02_anonymized_data/` as
@@ -90,10 +89,10 @@ CSV files with file names `HMC_<wave>_anonymized.csv`.
# Data cleaning
After data anonymization, some further rudimentary preprocessing was done on the
data with the script `03_data/02_anonymized_data/cleaning.R`. In particular,
the original variable names in Qualtrics were harmonized so they all follow the
same structure.
After data anonymization, some further rudimentary data cleaning was done with
the script `03_data/02_anonymized_data/cleaning.R`. In particular, the original
variable names in Qualtrics were harmonized so they all follow the same
structure.
The cleaned data files are saved to `03_data/03_cleaned_data/` as CSV files with
file names `HMC_<wave>_cleaned.csv`.
@@ -103,21 +102,31 @@ some cleaning.
## Problems
### with variable names over waves
### with variable names
* `trust_fav` and `Q161` and `Q162`
* `obj_know` and `Q158`
* the labels of the intention variables were swapped
--> `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa
* ...
* For the variables looking at what tasks subjects would delegate to AI, there
were some inconsistencies in the naming. This affected _only_ the variable
naming; the items were presented correctly to the subjects. The following
variables were renamed:
- `delg_tsk_typs_4 --> delg_tsk_typs_3`
- `delg_tsk_typs_5 --> delg_tsk_typs_4`
- `delg_tsk_typs_6 --> delg_tsk_typs_5`
- `delg_tsk_typs_7 --> delg_tsk_typs_6`
- `delg_tsk_typs_8 --> delg_tsk_typs_7`
- `delg_tsk_typs_8` was deleted
<!-- TODO: Add more details -->
* The labels of the intention variables were swapped by accident and this was
corrected:
- `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa
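
The renaming and the label swap listed above could be done along the following
lines. This is only a minimal sketch in base R and not necessarily what
`cleaning.R` does: the toy data frame `dat`, the vector `rename_map`, and the
helper `rename_columns()` are hypothetical names introduced for illustration.
It covers the five renames and the swap, not the deletion noted above.

```r
# Toy data standing in for one wave; all values are made up
dat <- data.frame(
  subj_id             = c("subj0001", "subj0002"),
  delg_tsk_typs_4     = c(1, 2),
  delg_tsk_typs_5     = c(3, 4),
  delg_tsk_typs_6     = c(5, 6),
  delg_tsk_typs_7     = c(7, 8),
  delg_tsk_typs_8     = c(9, 10),
  int_use_bhvr_fav    = c(1, 0),
  int_use_bhvr_noUser = c(0, 1),
  stringsAsFactors    = FALSE
)

# Old name (left) -> new name (right); hypothetical mapping vector
rename_map <- c(
  delg_tsk_typs_4     = "delg_tsk_typs_3",
  delg_tsk_typs_5     = "delg_tsk_typs_4",
  delg_tsk_typs_6     = "delg_tsk_typs_5",
  delg_tsk_typs_7     = "delg_tsk_typs_6",
  delg_tsk_typs_8     = "delg_tsk_typs_7",
  int_use_bhvr_fav    = "int_use_bhvr_noUser",
  int_use_bhvr_noUser = "int_use_bhvr_fav"
)

# Hypothetical helper: looks up all column positions first and then assigns
# the new names in one step, so shifted and swapped names cannot collide
rename_columns <- function(data, map) {
  idx   <- match(names(map), names(data))
  found <- !is.na(idx)          # ignore old names missing from this data set
  names(data)[idx[found]] <- unname(map[found])
  data
}

dat_renamed <- rename_columns(dat, rename_map)
```

Renaming one column at a time in the listed order would also work for the
`delg_tsk_typs_*` variables, but the swap of the two intention variables needs
either a temporary name or a one-step mapping like the one above.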
### with subjects
* Two entries in wave 1: `subj0762`
* Three entries in wave 3: `subj1009`
* We kept the first entry for each subject
* `subj1009` was removed from the dataset because it appeared only in wave 3
and it is unclear how this happened; only subjects who participated in wave 1
were invited to participate in further waves
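
The handling of duplicated subjects described above can be illustrated with a
small base R sketch. It assumes that the row order of the data reflects the
order in which the entries were recorded, so that "first entry" means "first
row"; the toy data frame `wave3` and the column `resp` are made up for
illustration and are not taken from `cleaning.R`.

```r
# Toy wave-3 data with three entries for subj1009; all values are made up
wave3 <- data.frame(
  subj_id = c("subj0001", "subj1009", "subj1009", "subj1009", "subj0002"),
  resp    = c(4, 2, 5, 3, 1),
  stringsAsFactors = FALSE
)

# Keep only the first entry per subject:
# duplicated() flags the second and later occurrences of each subj_id
wave3_dedup <- wave3[!duplicated(wave3$subj_id), ]

# Drop the subject that only appeared in wave 3
wave3_clean <- wave3_dedup[wave3_dedup$subj_id != "subj1009", ]
```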
# Data preprocessing
@@ -128,11 +137,3 @@ with the final variables for each scale are then saved in the folder
`03_data/04_preprocessed_data` as CSV files with file names
`HMC_<wave>_preprocessed.csv`.
# TODOs
* Add more preprocessing steps like variable renaming?
* Get age (and other descriptives?) for subj1008 and subj1009 from Prolific
data?