Data for the HMC (Human Machine Communication) project
================

# Variables

An overview of all variables at item level can be found in the [item
reference](item_refrence.md) and in the [Excel codebook](HMC_codebook.xlsx).
These files show which variables were collected in each wave.

# Folder and file organisation

## Folders

* `01_raw_data` contains the files downloaded from Qualtrics
* `02_anonymized_data` contains the anonymized data files (apart from the
  anonymization, these can still be considered raw data)
* `03_cleaned` contains data files with harmonized variable names; in
  addition, some incorrect variable names were fixed and duplicate entries from
  subjects who completed a wave two or more times were removed; see
  `cleaning.R` and below for more details

## Files

* `HMC_codebook.xlsx` contains all variable names for all waves, with the
  original descriptions as presented in Qualtrics, the original variable
  names, and the harmonized variable names
* `HMC_variables.xlsx` contains an overview of the variables, their origin,
  who requested them, etc. This file is for internal use and is not
  included in the public version

# Data collection and data files

Data were collected in Qualtrics. The following projects are on
https://kmrc.qualtrics.com/:

* `AI_Trends_Wave1`
* `AI_Trends_Wave2`
* `AI_Trends_Wave3`
* `AI_Trends_Wave4`
* `AI_Trends_Wave4_sample2`
* `AI_Trends_Wave5`
* `AI_Trends_Wave5_sample2`
* `AI_Trends_Wave6`
* `AI_Trends_Wave6_sample2`

## Sample

### Sample 2 data files

After wave 3, we re-invited wave-1 participants for waves 4–6 to increase
statistical power for questions that did not require participation in all six
waves. This departed from our original plan to invite only participants from
the immediately preceding wave, because ongoing monitoring had shown that many
non-users remained non-users and that relatively few participants perceived AI
as a social actor. To capture more contemporary usage and to obtain sufficient
variation for research questions that filter for individuals who perceived AI
as a social actor, we broadened recruitment in wave 4 to all wave-1
participants.

## Download settings in Qualtrics

The data were downloaded from Qualtrics as CSV files with the following
settings.

### Overall

- Download all fields
- Export values

### CSV

- Recode seen but unanswered questions as -99
- Recode seen but unanswered multi-value fields as 0
- Split multi-value fields into columns

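With these settings, -99 marks a question that was seen but left unanswered, while a 0 in a split multi-value column marks an option that was not selected. A minimal Python sketch of reading such an export with the standard `csv` module and turning -99 into a missing value might look as follows; the `extra_header_rows` parameter and the example column names are assumptions for illustration, not part of the project's scripts.

```python
import csv
import io

# Code used in the export for "seen but unanswered" questions.
MISSING_CODE = "-99"


def read_export(text, extra_header_rows=0):
    """Parse an exported CSV string into a list of dicts.

    Cells equal to -99 become None; all other cells, including the 0s in
    split multi-value columns, are kept as strings.
    extra_header_rows (hypothetical) skips any additional header rows
    that follow the column-name row.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    for _ in range(extra_header_rows):
        next(reader)
    rows = []
    for values in reader:
        rows.append({
            name: None if value == MISSING_CODE else value
            for name, value in zip(header, values)
        })
    return rows


# Hypothetical columns: q1 is a single-choice item, q2_1 a split multi-value option.
sample = "id,q1,q2_1\nr1,3,1\nr2,-99,0\n"
rows = read_export(sample)
print(rows[1])  # {'id': 'r2', 'q1': None, 'q2_1': '0'}
```

Keeping the 0s as answers and only mapping -99 to missing preserves the distinction the export settings were chosen to make.
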
# Data anonymization

After downloading the data from Qualtrics, the files were placed in the
respective folder for each wave in `03_data/01_raw_data/wave*`. The script
`03_data/01_raw_data/anonymization.R` mainly removes the `PROLIFIC_IDs` from the
data and adds an anonymized ID `subj_id` with values `subj0001` to `subj1009` to
all data sets.

Irrelevant columns, mostly created automatically by Qualtrics, are also
removed. See `anonymization.R` for details.

The anonymized data files are saved to `03_data/02_anonymized_data/` as
CSV files with file names `HMC_<wave>_anonymized.csv`.

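Replacing the Prolific IDs with `subj_id` values can be sketched as a mapping that hands out the next sequential ID the first time a Prolific ID is seen, so that the same person keeps the same `subj_id` in every wave. This is a hypothetical illustration, not the logic of `anonymization.R`: the column name `PROLIFIC_ID`, the order in which IDs are assigned, and the `drop_columns` parameter for irrelevant Qualtrics columns are all assumptions.

```python
def build_id_map(waves, id_column="PROLIFIC_ID"):
    """Map each Prolific ID to subj0001, subj0002, ... in order of first appearance.

    waves: list of waves, each a list of row dicts, ordered from wave 1 onwards.
    """
    id_map = {}
    for rows in waves:
        for row in rows:
            pid = row[id_column]
            if pid not in id_map:
                id_map[pid] = f"subj{len(id_map) + 1:04d}"
    return id_map


def anonymize(rows, id_map, id_column="PROLIFIC_ID", drop_columns=()):
    """Replace the Prolific ID by subj_id and drop the given (irrelevant) columns."""
    removed = {id_column, *drop_columns}
    anonymized = []
    for row in rows:
        new_row = {"subj_id": id_map[row[id_column]]}
        new_row.update({k: v for k, v in row.items() if k not in removed})
        anonymized.append(new_row)
    return anonymized


# Toy rows; "meta" stands in for an automatically created Qualtrics column.
wave1 = [{"PROLIFIC_ID": "p_a", "meta": "x", "q1": "1"},
         {"PROLIFIC_ID": "p_b", "meta": "y", "q1": "2"}]
wave2 = [{"PROLIFIC_ID": "p_b", "meta": "z", "q1": "3"}]
ids = build_id_map([wave1, wave2])
print(anonymize(wave2, ids, drop_columns=("meta",)))
# [{'subj_id': 'subj0002', 'q1': '3'}]
```

The mapping itself links `subj_id` back to the Prolific IDs, so it would have to stay out of the published data.
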
# Data cleaning

After anonymization, some further basic data cleaning was done with the script
`03_data/02_anonymized_data/cleaning.R`. In particular, the original Qualtrics
variable names were harmonized so that they all follow the same structure.

The cleaned data files are saved to `03_data/03_cleaned_data/` as CSV files with
file names `HMC_<wave>_cleaned.csv`.

The following section gives an overview of the problems in the data that
required cleaning.

## Problems

### with variable names

* For the variables on which tasks subjects would delegate to AI, there were
  some inconsistencies in the naming. This affected _only_ the variable names;
  the items were presented correctly to the subjects. The following variables
  were renamed:
    - `delg_tsk_typs_4 --> delg_tsk_typs_3`
    - `delg_tsk_typs_5 --> delg_tsk_typs_4`
    - `delg_tsk_typs_6 --> delg_tsk_typs_5`
    - `delg_tsk_typs_7 --> delg_tsk_typs_6`
    - `delg_tsk_typs_8 --> delg_tsk_typs_7`
    - `delg_tsk_typs_8` was deleted

* The labels of the intention variables were accidentally swapped; this was
  corrected:
    - `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa

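Both fixes move variables onto names that other variables used before: most `delg_tsk_typs_*` variables take over a neighbour's old name, and the two intention variables trade names. Applying such renames one at a time can collide (after renaming `int_use_bhvr_fav` to `int_use_bhvr_noUser`, two columns would share a name). A minimal Python sketch that applies the whole mapping to a header in a single pass avoids this. It treats the swap as a column rename, assumes the affected waves contain exactly these original names, and does not model the deletion of `delg_tsk_typs_8`; it is an illustration, not the code in `cleaning.R`.

```python
# Renames listed above; applied simultaneously, so the order of entries does not matter.
RENAMES = {
    "delg_tsk_typs_4": "delg_tsk_typs_3",
    "delg_tsk_typs_5": "delg_tsk_typs_4",
    "delg_tsk_typs_6": "delg_tsk_typs_5",
    "delg_tsk_typs_7": "delg_tsk_typs_6",
    "delg_tsk_typs_8": "delg_tsk_typs_7",
    # The swapped intention variables exchange names.
    "int_use_bhvr_fav": "int_use_bhvr_noUser",
    "int_use_bhvr_noUser": "int_use_bhvr_fav",
}


def rename_columns(header, mapping):
    """Return a new header with every name looked up in mapping at once."""
    new_header = [mapping.get(name, name) for name in header]
    # A duplicate means two columns would end up with the same name.
    if len(set(new_header)) != len(new_header):
        raise ValueError("renaming would produce duplicate column names")
    return new_header


header = ["subj_id", "delg_tsk_typs_4", "delg_tsk_typs_8",
          "int_use_bhvr_fav", "int_use_bhvr_noUser"]
print(rename_columns(header, RENAMES))
# ['subj_id', 'delg_tsk_typs_3', 'delg_tsk_typs_7', 'int_use_bhvr_noUser', 'int_use_bhvr_fav']
```

The duplicate check guards against a wave that unexpectedly already contains a target name such as `delg_tsk_typs_3`.
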
### with subjects

* Two entries in wave 1: `subj0762`
* Three entries in wave 3: `subj1009`
* We kept the first entry for each subject
* `subj1009` was removed from the dataset because it appeared only in wave 3,
  and it is unclear how this happened; only subjects who participated in wave 1
  were invited to participate in further waves

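The two subject-level rules (keep the first entry per subject, and keep only subjects who took part in wave 1) can be sketched in Python as below. Here "first" simply means the first row in file order, which assumes the rows are sorted by completion time; the function names and the toy rows are hypothetical.

```python
def keep_first_entry(rows, id_column="subj_id"):
    """Drop repeated entries, keeping the first row for each subject (file order)."""
    seen = set()
    kept = []
    for row in rows:
        if row[id_column] not in seen:
            seen.add(row[id_column])
            kept.append(row)
    return kept


def restrict_to_wave1(rows, wave1_ids, id_column="subj_id"):
    """Keep only subjects whose ID also appears in the wave-1 data."""
    return [row for row in rows if row[id_column] in wave1_ids]


# Toy rows mimicking the two cases described above.
wave1 = [{"subj_id": "subj0762", "q1": "1"}, {"subj_id": "subj0001", "q1": "2"},
         {"subj_id": "subj0762", "q1": "3"}]
wave3 = [{"subj_id": "subj0001", "q1": "4"}, {"subj_id": "subj1009", "q1": "5"},
         {"subj_id": "subj1009", "q1": "6"}]

wave1 = keep_first_entry(wave1)
wave1_ids = {row["subj_id"] for row in wave1}
wave3 = restrict_to_wave1(keep_first_entry(wave3), wave1_ids)
print(wave1)  # [{'subj_id': 'subj0762', 'q1': '1'}, {'subj_id': 'subj0001', 'q1': '2'}]
print(wave3)  # [{'subj_id': 'subj0001', 'q1': '4'}]
```
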
# Data preprocessing

The final data preprocessing creates scales from the collected items. It was
done in Python, and the code is available in a separate repository:
https://gitea.iwm-tuebingen.de/HMC/preprocessing. The files with the final
variables for each scale are saved in the folder
`03_data/04_preprocessed_data` as CSV files with file names
`HMC_<wave>_preprocessed.csv`.
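
One common way to turn items into a scale score is the mean of the answered items. The sketch below assumes that scoring rule, treats -99 and empty cells as missing, and uses hypothetical item names; the actual scales and scoring rules (for example reverse-coded items or a minimum number of answered items) are defined in the preprocessing repository and may differ.

```python
from statistics import mean

# Values treated as missing: None, empty cells and the -99 export code.
MISSING = {None, "", "-99"}


def scale_score(row, items, min_items=1):
    """Mean of the answered items, or None if fewer than min_items were answered."""
    answered = [float(row[item]) for item in items if row[item] not in MISSING]
    if len(answered) < min_items:
        return None
    return mean(answered)


# Hypothetical three-item scale.
row = {"subj_id": "subj0001", "trust_1": "4", "trust_2": "-99", "trust_3": "2"}
items = ["trust_1", "trust_2", "trust_3"]
print(scale_score(row, items))               # 3.0
print(scale_score(row, items, min_items=3))  # None
```

The `min_items` threshold is one way to decide how many missing answers a scale score can tolerate; whether and how the project applies such a rule is not stated here.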