Finalize data README

Nora Wickelmaier 2025-12-10 17:19:07 +01:00
parent bbdb35559c
commit 6921203765

@@ -54,8 +54,7 @@ immediately preceding wave because ongoing monitoring showed that many non-users
 remained non-users and that relatively few participants perceived AI as a social
 actor. To capture more contemporary usage and obtain sufficient variation for
 research questions filtering for individuals that perceived AI as a social
-actor, we broadened recruitment in wave 4 to all wave-1 participants. Sample 2
-therefore contains only participants with at least one missing wave.
+actor, we broadened recruitment in wave 4 to all wave-1 participants.
 ## Download settings in Qualtrics
@@ -82,7 +81,7 @@ wave in `03_data/01_raw_data/wave*`. The script
 data and adds an anonymized ID `subj_id` with entries `subj0001 - sub1009` to
 all data sets.
-Irrelevant columns -- mostly automatically created by Qualtrics -- are also
+Irrelevant columns - mostly automatically created by Qualtrics - are also
 removed. See `anonymization.R` for details.
 The anonymized data files are saved to `03_data/02_anonymized_data/` as
@@ -90,10 +89,10 @@ CSV files with file names `HMC_<wave>_anonymized.csv`.
 # Data cleaning
-After data anonymization, some more rudimentary preprocessing was done on the
-data with the script `03_data/02_anonymized_data/cleaning.R`. Especially,
-the original variable names in Qualtrics were harmonized so they all follow the
-same structure.
+After data anonymization, some more rudimentary data cleaning was done with the
+script `03_data/02_anonymized_data/cleaning.R`. Especially, the original
+variable names in Qualtrics were harmonized so they all follow the same
+structure.
 The cleaned data files are saved to `03_data/03_cleaned_data/` as CSV files with
 file names `HMC_<wave>_cleaned.csv`.
@@ -103,21 +102,31 @@ some cleaning.
 ## Problems
-### with variable names over waves
+### with variable names
-* `trust_fav` and `Q161` and `Q162`
-* `obj_know` and `Q158`
-* the labels of the intention variables were swapped
-  --> `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa
-* ...
-<!-- TODO: Add more details -->
+* For the variables looking at what tasks subjects would delegate to AI, there
+  were some inconsistencies in the naming. This was _only_ in the variable
+  naming, the items were presented correctly to the subjects. The following
+  variables were renamed:
+  - `delg_tsk_typs_4 --> delg_tsk_typs_3`
+  - `delg_tsk_typs_5 --> delg_tsk_typs_4`
+  - `delg_tsk_typs_6 --> delg_tsk_typs_5`
+  - `delg_tsk_typs_7 --> delg_tsk_typs_6`
+  - `delg_tsk_typs_8 --> delg_tsk_typs_7`
+  - `delg_tsk_typs_8` was deleted
+* The labels of the intention variables were swapped by accident and this was
+  corrected:
+  - `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa
 ### with subjects
 * Two entries in wave 1: `subj0762`
 * Three entries in wave 3: `subj1009`
 * We kept the first entry for each subject
+* `subj1009` has been removed from the dataset since it only appeared in wave 3
+  and it is unclear how this happened; only subjects who participated in wave 1
+  have been invited to participate in further waves
 # Data preprocessing
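The renaming and duplicate handling described in the "Problems" hunk above can be sketched in a few lines. This is only an illustration in Python, not the project's code: the actual logic lives in `cleaning.R`, and the helper names (`rename_columns`, `keep_first_entry`, `drop_subjects`) and the toy rows are hypothetical. The column names and subject IDs are taken from the README text.

```python
# Illustrative only: the project's real cleaning logic is in cleaning.R.
# All helper names and the toy rows below are hypothetical.

# Old -> new column names for the delegation items listed in the README.
DELEGATION_RENAMES = {
    "delg_tsk_typs_4": "delg_tsk_typs_3",
    "delg_tsk_typs_5": "delg_tsk_typs_4",
    "delg_tsk_typs_6": "delg_tsk_typs_5",
    "delg_tsk_typs_7": "delg_tsk_typs_6",
    "delg_tsk_typs_8": "delg_tsk_typs_7",
}

# The two intention labels were swapped, so each one maps to the other.
INTENTION_SWAP = {
    "int_use_bhvr_fav": "int_use_bhvr_noUser",
    "int_use_bhvr_noUser": "int_use_bhvr_fav",
}


def rename_columns(row, renames):
    """Return a copy of `row` with keys renamed in a single pass.

    Each new key is derived from the original key, so chained renames
    (4 -> 3, 5 -> 4, ...) and swaps cannot overwrite each other midway.
    """
    return {renames.get(key, key): value for key, value in row.items()}


def keep_first_entry(rows, id_col="subj_id"):
    """Keep only the first row seen for each subject ID, preserving order."""
    seen = set()
    kept = []
    for row in rows:
        if row[id_col] not in seen:
            seen.add(row[id_col])
            kept.append(row)
    return kept


def drop_subjects(rows, excluded_ids, id_col="subj_id"):
    """Remove all rows whose subject ID is in `excluded_ids`."""
    return [row for row in rows if row[id_col] not in excluded_ids]


if __name__ == "__main__":
    # Toy stand-in for one wave; the values are made up.
    wave3 = [
        {"subj_id": "subj0001", "delg_tsk_typs_4": 2, "int_use_bhvr_fav": 1},
        {"subj_id": "subj1009", "delg_tsk_typs_4": 3, "int_use_bhvr_fav": 4},
        {"subj_id": "subj1009", "delg_tsk_typs_4": 5, "int_use_bhvr_fav": 2},
    ]
    renames = {**DELEGATION_RENAMES, **INTENTION_SWAP}
    cleaned = [rename_columns(row, renames) for row in wave3]
    cleaned = keep_first_entry(cleaned)
    cleaned = drop_subjects(cleaned, {"subj1009"})
    print(cleaned)
```

Building the renamed row in one pass is a deliberate choice here: renaming columns one at a time in place would let `delg_tsk_typs_5 --> delg_tsk_typs_4` collide with the not-yet-renamed `delg_tsk_typs_4`, and would make the label swap need a temporary name.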
@@ -128,11 +137,3 @@ with the final variables for each scale are then saved in the folder
 `03_data/04_preprocessed_data` as CSV files with file names
 `HMC_<wave>_preprocessed.csv`.
-# TODOs
-* Add more preprocessing steps like variable renaming?
-* Get age (and other descriptives?) for subj1008 and subj1009 from Prolific
-  data?