Finalize data README
parent
bbdb35559c
commit
6921203765
@ -54,8 +54,7 @@ immediately preceding wave because ongoing monitoring showed that many non-users
|
|||||||
remained non-users and that relatively few participants perceived AI as a social
|
remained non-users and that relatively few participants perceived AI as a social
|
||||||
actor. To capture more contemporary usage and obtain sufficient variation for
|
actor. To capture more contemporary usage and obtain sufficient variation for
|
||||||
research questions filtering for individuals that perceived AI as a social
|
research questions filtering for individuals that perceived AI as a social
|
||||||
actor, we broadened recruitment in wave 4 to all wave-1 participants. Sample 2
|
actor, we broadened recruitment in wave 4 to all wave-1 participants.
|
||||||
therefore contains only participants with at least one missing wave.
|
|
||||||
|
|
||||||
|
|
||||||
## Download settings in Qualtrics
|
## Download settings in Qualtrics
|
||||||
@ -82,7 +81,7 @@ wave in `03_data/01_raw_data/wave*`. The script
|
|||||||
data and adds an anonymized ID `subj_id` with entries `subj0001 - subj1009` to
|
data and adds an anonymized ID `subj_id` with entries `subj0001 - subj1009` to
|
||||||
all data sets.
|
all data sets.
|
||||||
|
|
||||||
Irrelevant columns -- mostly automatically created by Qualtrics -- are also
|
Irrelevant columns - mostly automatically created by Qualtrics - are also
|
||||||
removed. See `anonymization.R` for details.
|
removed. See `anonymization.R` for details.
|
||||||
|
|
||||||
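The ID assignment described above could look roughly like the following. This is a minimal Python sketch for illustration only, not the contents of `anonymization.R`; the function names and the column names `pid` and `IPAddress` are assumptions, and only the `subj_id` column and the `subjNNNN` label format come from the README.

```python
def make_subject_ids(raw_ids):
    """Map each distinct raw ID to a label like 'subj0001', in order of first appearance."""
    mapping = {}
    for raw in raw_ids:
        if raw not in mapping:
            mapping[raw] = f"subj{len(mapping) + 1:04d}"
    return mapping


def anonymize(rows, mapping, id_column, drop_columns):
    """Replace the raw ID with `subj_id` and drop the listed columns.

    `id_column` and `drop_columns` are hypothetical parameters; which columns
    count as irrelevant is decided in the project's own script.
    """
    out = []
    for row in rows:
        kept = {
            name: value
            for name, value in row.items()
            if name != id_column and name not in drop_columns
        }
        out.append({"subj_id": mapping[row[id_column]], **kept})
    return out


# Build the mapping once from all waves so a subject keeps the same ID everywhere.
wave1 = [{"pid": "a", "IPAddress": "x", "item": 1}, {"pid": "b", "IPAddress": "y", "item": 2}]
wave2 = [{"pid": "b", "IPAddress": "z", "item": 3}]
mapping = make_subject_ids(row["pid"] for wave in (wave1, wave2) for row in wave)
wave2_anonymized = anonymize(wave2, mapping, id_column="pid", drop_columns={"IPAddress"})
```

Building the mapping once across all waves, rather than per file, is what keeps the same subject under the same `subj_id` in every data set.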
The anonymized data files are saved to `03_data/02_anonymized_data/` as
|
The anonymized data files are saved to `03_data/02_anonymized_data/` as
|
||||||
@ -90,10 +89,10 @@ CSV files with file names `HMC_<wave>_anonymized.csv`.
|
|||||||
|
|
||||||
# Data cleaning
|
# Data cleaning
|
||||||
|
|
||||||
After data anonymization, some more rudimentary preprocessing was done on the
|
After data anonymization, some more rudimentary data cleaning was done with the
|
||||||
data with the script `03_data/02_anonymized_data/cleaning.R`. Especially,
|
script `03_data/02_anonymized_data/cleaning.R`. In particular, the original
|
||||||
the original variable names in Qualtrics were harmonized so they all follow the
|
variable names in Qualtrics were harmonized so they all follow the same
|
||||||
same structure.
|
structure.
|
||||||
|
|
||||||
The cleaned data files are saved to `03_data/03_cleaned_data/` as CSV files with
|
The cleaned data files are saved to `03_data/03_cleaned_data/` as CSV files with
|
||||||
file names `HMC_<wave>_cleaned.csv`.
|
file names `HMC_<wave>_cleaned.csv`.
|
||||||
@ -103,21 +102,31 @@ some cleaning.
|
|||||||
|
|
||||||
## Problems
|
## Problems
|
||||||
|
|
||||||
### with variable names over waves
|
### with variable names
|
||||||
|
|
||||||
* `trust_fav` and `Q161` and `Q162`
|
* For the variables looking at what tasks subjects would delegate to AI, there
|
||||||
* `obj_know` and `Q158`
|
were some inconsistencies in the naming. This was _only_ in the variable
|
||||||
* the labels of the intention variables were swapped
|
naming; the items were presented correctly to the subjects. The following
|
||||||
--> `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa
|
variables were renamed:
|
||||||
* ...
|
- `delg_tsk_typs_4 --> delg_tsk_typs_3`
|
||||||
|
- `delg_tsk_typs_5 --> delg_tsk_typs_4`
|
||||||
|
- `delg_tsk_typs_6 --> delg_tsk_typs_5`
|
||||||
|
- `delg_tsk_typs_7 --> delg_tsk_typs_6`
|
||||||
|
- `delg_tsk_typs_8 --> delg_tsk_typs_7`
|
||||||
|
- `delg_tsk_typs_8` was deleted
|
||||||
|
|
||||||
<!-- TODO: Add more details -->
|
* The labels of the intention variables were swapped by accident and this was
|
||||||
|
corrected:
|
||||||
|
- `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa
|
||||||
|
|
||||||
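The renames listed above shift names down by one and swap two labels, so they have to be applied in one step from the original names; renaming one column at a time could overwrite or double-rename a column. A minimal Python sketch of that idea follows. It is an illustration, not the code in `cleaning.R`: the function name `rename_columns` is hypothetical, and the deletion of `delg_tsk_typs_8` is left out because the README gives no further detail on it.

```python
# Old name -> new name, taken from the list above.
RENAMES = {
    "delg_tsk_typs_4": "delg_tsk_typs_3",
    "delg_tsk_typs_5": "delg_tsk_typs_4",
    "delg_tsk_typs_6": "delg_tsk_typs_5",
    "delg_tsk_typs_7": "delg_tsk_typs_6",
    "delg_tsk_typs_8": "delg_tsk_typs_7",
    # The two intention labels were swapped, so each maps to the other.
    "int_use_bhvr_fav": "int_use_bhvr_noUser",
    "int_use_bhvr_noUser": "int_use_bhvr_fav",
}


def rename_columns(row, renames):
    """Rename all keys at once, looking every key up by its ORIGINAL name."""
    renamed = {renames.get(name, name): value for name, value in row.items()}
    if len(renamed) != len(row):
        # Two columns ended up with the same name, so data would be lost.
        raise ValueError("renaming produced duplicate column names")
    return renamed


row = {
    "subj_id": "subj0001",
    "delg_tsk_typs_4": 1,
    "delg_tsk_typs_5": 2,
    "int_use_bhvr_fav": "a",
    "int_use_bhvr_noUser": "b",
}
fixed = rename_columns(row, RENAMES)
```

The duplicate check is a small safeguard: if a data set still contained a column already named like one of the targets, the function fails instead of silently dropping values.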
### with subjects
|
### with subjects
|
||||||
|
|
||||||
* Two entries in wave 1: `subj0762`
|
* Two entries in wave 1: `subj0762`
|
||||||
* Three entries in wave 3: `subj1009`
|
* Three entries in wave 3: `subj1009`
|
||||||
* We kept the first entry for each subject
|
* We kept the first entry for each subject
|
||||||
|
* `subj1009` has been removed from the dataset since it only appeared in wave 3
|
||||||
|
and it is unclear how this happened; only subjects who participated in wave 1
|
||||||
|
have been invited to participate in further waves
|
||||||
|
|
||||||
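The two subject-level rules above (keep the first entry per subject, and drop subjects who never appeared in wave 1) can be sketched as follows. This is a hedged Python illustration, not the project's R code; the function names are hypothetical, and "first" is assumed to mean the first row in file order.

```python
def keep_first_entry(rows, id_column="subj_id"):
    """Keep only the first row per subject, preserving the original row order."""
    seen = set()
    unique_rows = []
    for row in rows:
        if row[id_column] not in seen:
            seen.add(row[id_column])
            unique_rows.append(row)
    return unique_rows


def restrict_to_wave1(rows, wave1_rows, id_column="subj_id"):
    """Drop rows of subjects that do not occur in wave 1."""
    wave1_ids = {row[id_column] for row in wave1_rows}
    return [row for row in rows if row[id_column] in wave1_ids]


wave1 = [
    {"subj_id": "subj0762", "entry": "first"},
    {"subj_id": "subj0762", "entry": "second"},
    {"subj_id": "subj0001", "entry": "first"},
]
wave3 = [
    {"subj_id": "subj0001", "entry": "first"},
    {"subj_id": "subj1009", "entry": "first"},
    {"subj_id": "subj1009", "entry": "second"},
    {"subj_id": "subj1009", "entry": "third"},
]

wave1_clean = keep_first_entry(wave1)
wave3_clean = restrict_to_wave1(keep_first_entry(wave3), wave1_clean)
```

With the toy rows above, `subj0762` keeps one wave-1 row and `subj1009` disappears from wave 3 because it has no wave-1 record.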
# Data preprocessing
|
# Data preprocessing
|
||||||
|
|
||||||
@ -128,11 +137,3 @@ with the final variables for each scale are then saved in the folder
|
|||||||
`03_data/04_preprocessed_data` as CSV files with file names
|
`03_data/04_preprocessed_data` as CSV files with file names
|
||||||
`HMC_<wave>_preprocessed.csv`.
|
`HMC_<wave>_preprocessed.csv`.
|
||||||
|
|
||||||
# TODOs
|
|
||||||
|
|
||||||
* Add more preprocessing steps like variable renaming?
|
|
||||||
|
|
||||||
* Get age (and other descriptives?) for subj1008 and subj1009 from Prolific
|
|
||||||
data?
|
|
||||||
|
|
||||||
|
|
||||||
|
|||||||