Data for the HMC (Human Machine Communication) project
================

# Variables

An overview of all variables at the item level can be found in the [item
reference](item_refrence.md) and in the [Excel codebook](HMC_codebook.xlsx).
These files show which variables were collected in each wave.

# Folder and file organisation

## Folders

* `01_raw_data` contains the files downloaded from Qualtrics
* `02_anonymized_data` contains the anonymized data files (apart from the
anonymization, these can still be considered raw data)
* `03_cleaned` contains data files with harmonized variable names; additionally,
some incorrect variable names were fixed and duplicate entries from subjects who
completed a wave two or more times were removed; see `cleaning.R` and the
section on data preprocessing below for more details

## Files

* `HMC_codebook.xlsx` contains all variable names for all waves, with the
original descriptions as presented in Qualtrics, the original variable
names, and the harmonized variable names
* `HMC_variables.xlsx` contains an overview of the variables, their origin, who
requested them, etc. This file is for internal use and is not
committed to the public version

# Data collection and data files

The data were collected in Qualtrics. The following projects are on
https://kmrc.qualtrics.com/:

* `AI_Trends_Wave1`
* `AI_Trends_Wave2`
* `AI_Trends_Wave3`
* `AI_Trends_Wave4`
* `AI_Trends_Wave4_sample2`
* `AI_Trends_Wave5`
* `AI_Trends_Wave5_sample2`
* `AI_Trends_Wave6`
* `AI_Trends_Wave6_sample2`

## Sample

### Sample 2 data files

After wave 3, we re-invited wave-1 participants for waves 4–6 to increase
statistical power for questions that did not require participation in all six
waves. This departed from our original plan to invite only participants from the
immediately preceding wave, because ongoing monitoring showed that many non-users
remained non-users and that relatively few participants perceived AI as a social
actor. To capture more contemporary usage and to obtain sufficient variation for
research questions that filter for individuals who perceived AI as a social
actor, we broadened recruitment in wave 4 to all wave-1 participants. Sample 2
therefore contains only participants who missed at least one wave.

## Download settings in Qualtrics

The data were downloaded from Qualtrics as CSV files with the following
settings.

### Overall

- Download all fields
- Export values

### CSV

- Recode seen but unanswered questions as -99
- Recode seen but unanswered multi-value fields as 0
- Split multi-value fields into columns
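
With these settings, -99 marks a question that was seen but not answered and therefore has to be treated as missing before analysis, while a 0 in a split multi-value column means "seen but not selected" and should be kept. The project's own scripts are written in R; the following is only a minimal Python sketch of that rule using the standard library. It assumes a single header row (raw Qualtrics exports may contain additional header rows that would need to be skipped first), and the column names in the example are hypothetical.

```python
import csv
import io

# Qualtrics code for "seen but unanswered" questions (see the CSV settings above).
SEEN_UNANSWERED = "-99"


def read_wave(lines):
    """Parse an exported wave into a list of row dicts, with -99 mapped to None.

    `lines` is any iterable of CSV lines, e.g. an open file or io.StringIO.
    Assumes exactly one header row. A "0" in a split multi-value column is
    kept as is, because it means "seen but not selected", not "missing".
    """
    return [
        {name: (None if value == SEEN_UNANSWERED else value) for name, value in row.items()}
        for row in csv.DictReader(lines)
    ]


# Hypothetical columns: a single-choice item (Q1) and a split multi-value item (Q2_*).
example = io.StringIO("subj_id,Q1,Q2_1,Q2_2\nsubj0001,-99,1,0\n")
print(read_wave(example))
# [{'subj_id': 'subj0001', 'Q1': None, 'Q2_1': '1', 'Q2_2': '0'}]
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `read_wave`.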

# Data anonymization

After download from Qualtrics, the files were put in the respective folder for each
wave in `03_data/01_raw_data/wave*`. The script
`03_data/01_raw_data/anonymization.R` mainly removes the `PROLIFIC_IDs` from the
data and adds an anonymized ID `subj_id` with entries `subj0001` to `subj1009` to
all data sets.

Irrelevant columns (mostly created automatically by Qualtrics) are also
removed. See `anonymization.R` for details.

The anonymized data files are saved to `03_data/02_anonymized_data/` as
CSV files with file names `HMC_<wave>_anonymized.csv`.
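
The key requirement of this step is that the same participant receives the same `subj_id` in every wave. A minimal Python sketch (not a port of `anonymization.R`) could number Prolific IDs in order of first appearance across the chronologically ordered waves; both that ordering rule and the column name `PROLIFIC_ID` are assumptions made for illustration.

```python
# Assumed name of the Prolific ID column; check the raw files for the actual name.
ID_COL = "PROLIFIC_ID"


def build_id_map(waves):
    """Map each Prolific ID to a stable pseudonym subj0001, subj0002, ...

    `waves` is a list of waves in chronological order, each a list of row dicts.
    IDs are numbered in order of first appearance (an assumption of this sketch).
    """
    id_map = {}
    for rows in waves:
        for row in rows:
            pid = row[ID_COL]
            if pid not in id_map:
                id_map[pid] = f"subj{len(id_map) + 1:04d}"
    return id_map


def anonymize(rows, id_map):
    """Return new rows with subj_id as the first column and the Prolific ID removed."""
    result = []
    for row in rows:
        new_row = {"subj_id": id_map[row[ID_COL]]}
        new_row.update({name: value for name, value in row.items() if name != ID_COL})
        result.append(new_row)
    return result


# Hypothetical data: "xyz" takes part in both waves and keeps subj0002.
wave1 = [{"PROLIFIC_ID": "abc", "Q1": "3"}, {"PROLIFIC_ID": "xyz", "Q1": "5"}]
wave2 = [{"PROLIFIC_ID": "xyz", "Q1": "4"}]
ids = build_id_map([wave1, wave2])
print(anonymize(wave2, ids))
# [{'subj_id': 'subj0002', 'Q1': '4'}]
```

A participant who completed a wave twice receives the same `subj_id` on both rows, so such duplicates remain detectable after anonymization.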

# Data preprocessing

After anonymization, some basic preprocessing was applied to the
data with the script `03_data/02_anonymized_data/cleaning.R`. In particular,
the original Qualtrics variable names were harmonized so that they all follow the
same structure.

The cleaned data files are saved to `03_data/03_cleaned_data/` as
CSV files with file names `HMC_<wave>_cleaned.csv`.
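
Harmonizing names amounts to applying an original-to-harmonized mapping, such as the one documented in `HMC_codebook.xlsx`, to each wave and then writing the result under the naming scheme above. The sketch below is a hypothetical Python illustration rather than a translation of `cleaning.R`; it assumes the mapping has already been extracted from the codebook into a dictionary, and the example variable names are made up.

```python
import csv
from pathlib import Path

CLEANED_DIR = Path("03_data/03_cleaned_data")


def harmonize(rows, mapping):
    """Rename columns using an {original_name: harmonized_name} mapping.

    Columns not listed in the mapping keep their name. If two original
    names map to the same harmonized name, the value of the later column wins.
    """
    return [{mapping.get(name, name): value for name, value in row.items()} for row in rows]


def write_cleaned(rows, wave, out_dir=CLEANED_DIR):
    """Write non-empty `rows` to <out_dir>/HMC_<wave>_cleaned.csv; return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"HMC_{wave}_cleaned.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        # The header is taken from the first row, so all rows must share its columns.
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path


# Hypothetical mapping and data.
rows = harmonize([{"subj_id": "subj0001", "Q12": "2"}], {"Q12": "example_var"})
print(rows)
# [{'subj_id': 'subj0001', 'example_var': '2'}]
```

Calling `write_cleaned(rows, "wave1")` would then create `03_data/03_cleaned_data/HMC_wave1_cleaned.csv`; the wave label format `wave1` is itself an assumption.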

The following section gives an overview of the problems in the data that required
cleaning.

## Problems

### With variable names across waves

* `trust_fav` and `Q161` and `Q162`
* `obj_know` and `Q158`
* Intention labels are swapped:
`int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa
* ...

<!-- TODO: Add more details -->
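
Because only the labels of the two intention variables are swapped, exchanging the two column names is enough to restore the correct assignment. A minimal Python sketch, assuming each label corresponds to a single column; if these variables consist of several item columns, the function would have to be applied to every item pair, and only to the waves in which the labels are actually swapped.

```python
def swap_labels(rows, a, b):
    """Exchange the column names a and b in every row.

    Renaming is equivalent to moving each column's values to the other name.
    """
    swap = {a: b, b: a}
    return [{swap.get(name, name): value for name, value in row.items()} for row in rows]


# Hypothetical row: the value stored under "fav" actually belongs to "noUser".
rows = [{"subj_id": "subj0001", "int_use_bhvr_fav": "1", "int_use_bhvr_noUser": "5"}]
fixed = swap_labels(rows, "int_use_bhvr_fav", "int_use_bhvr_noUser")
print(fixed[0]["int_use_bhvr_noUser"], fixed[0]["int_use_bhvr_fav"])
# 1 5
```

Renaming rather than copying values keeps the operation its own inverse: applying it twice restores the original labels.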

### With subjects

* Two entries in wave 1: `subj0762`
* Three entries in wave 3: `subj1009`
* We kept the first entry for each subject
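
Detecting and removing such repeated entries can be sketched in Python as follows. This is an illustration, not the logic of `cleaning.R`: "first entry" is interpreted here as the first row in file order, which is an assumption; if a file is not in chronological order, it would have to be sorted by a submission timestamp first. The example rows are hypothetical.

```python
from collections import Counter


def duplicated_ids(rows, id_col="subj_id"):
    """Return {subject_id: number_of_entries} for subjects with more than one entry."""
    counts = Counter(row[id_col] for row in rows)
    return {sid: n for sid, n in counts.items() if n > 1}


def keep_first_entry(rows, id_col="subj_id"):
    """Keep only the first row (in file order) for each subject."""
    seen = set()
    kept = []
    for row in rows:
        if row[id_col] not in seen:
            seen.add(row[id_col])
            kept.append(row)
    return kept


# Hypothetical wave in which subj0762 appears twice.
wave = [
    {"subj_id": "subj0762", "Q1": "2"},
    {"subj_id": "subj0001", "Q1": "4"},
    {"subj_id": "subj0762", "Q1": "3"},
]
print(duplicated_ids(wave))
# {'subj0762': 2}
print(keep_first_entry(wave))
# [{'subj_id': 'subj0762', 'Q1': '2'}, {'subj_id': 'subj0001', 'Q1': '4'}]
```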

# TODOs

* Add more preprocessing steps like variable renaming?

* Get age (and other descriptives?) for subj1008 and subj1009 from Prolific
data?