Data for the HMC (Human Machine Communication) project
================

# Variables

An overview of all variables at item level can be found in the [item
reference](item_refrence.md) and in the [Excel codebook](HMC_codebook.xlsx).
These files show which variables were collected in each wave.

# Folder and file organisation

## Folders

* `01_raw_data` contains the files downloaded from Qualtrics
* `02_anonymized_data` contains the anonymized data files (otherwise this can
  still be considered raw data)
* `03_cleaned` contains data files with harmonized variable names; additionally,
  some incorrect variable names were fixed and duplicate entries from subjects
  who completed a wave more than once were removed; see `cleaning.R` and below
  for more details

## Files

* `HMC_codebook.xlsx` contains all variable names for all waves, with the
  original descriptions as presented in Qualtrics and the original variable
  names as well as the harmonized variable names
* `HMC_variables.xlsx` contains an overview of the variables, their origin, who
  wanted them in the data, etc. This file is for internal use and is not
  committed with the public version

# Data collection and data files

The data collection was done in Qualtrics. The following projects are on
https://kmrc.qualtrics.com/:

* `AI_Trends_Wave1`
* `AI_Trends_Wave2`
* `AI_Trends_Wave3`
* `AI_Trends_Wave4`
* `AI_Trends_Wave4_sample2`
* `AI_Trends_Wave5`
* `AI_Trends_Wave5_sample2`
* `AI_Trends_Wave6`
* `AI_Trends_Wave6_sample2`

## Sample

### Sample 2 data files

Subjects from the first wave who did not participate in the following waves
were invited again after...

<!-- TODO: Add more details -->

## Download settings in Qualtrics

The data were downloaded from Qualtrics as CSV files with the following
settings.

### Overall

- Download all fields
- Export values

### CSV

- Recode seen but unanswered questions as -99
- Recode seen but unanswered multi-value fields as 0
- Split multi-value fields into columns
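With these export settings, a value of -99 marks a question that was shown but not answered. The following is a minimal Python sketch of how such codes could be turned into missing values when reading a file. It is not part of the project's R scripts; the function name `read_with_missing` and the column names in the example are made up for illustration. Only -99 is treated as missing here; the 0 used for unanswered multi-value fields is left unchanged.

```python
import csv
import io

# Code written by the export for questions that were seen but not answered.
MISSING_CODE = "-99"


def read_with_missing(handle):
    """Read a CSV export and replace the -99 code with None in every cell."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({key: (None if value == MISSING_CODE else value)
                     for key, value in row.items()})
    return rows


# Hypothetical two-row export; the column names are invented for this example.
example = io.StringIO("subj_id,item_a,item_b\nsubj0001,3,-99\nsubj0002,-99,5\n")
rows = read_with_missing(example)
print(rows)
```

All other values stay as strings, so any conversion to numbers would still have to happen in a later step.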

# Data anonymization

After download from Qualtrics, the files were put in the respective folders for
each wave in `03_data/01_raw_data/wave*`. The script
`03_data/01_raw_data/anonymization.R` mainly removes the `PROLIFIC_IDs` from the
data and adds an anonymized ID `subj_id` with entries `subj0001` to `subj1009`
to all data sets.

Irrelevant columns, mostly created automatically by Qualtrics, are also
removed. See `anonymization.R` for details.

The anonymized data files are saved to `03_data/02_anonymized_data/` as
CSV files with file names `HMC_<wave>_anonymized.csv`.
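The core idea of the anonymization step can be sketched as follows. This is an illustrative Python sketch under assumptions, not a port of `anonymization.R`: the column name `PROLIFIC_ID`, the function name `anonymize`, and the example values are hypothetical, and the real script may assign IDs in a different order.

```python
import csv
import io


def anonymize(rows, id_column="PROLIFIC_ID", mapping=None):
    """Replace the Prolific ID with a running subj_id such as subj0001."""
    mapping = {} if mapping is None else mapping
    anonymized_rows = []
    for row in rows:
        prolific_id = row[id_column]
        # Reuse the ID if this subject was already seen, e.g. in an earlier wave.
        if prolific_id not in mapping:
            mapping[prolific_id] = f"subj{len(mapping) + 1:04d}"
        anonymized = {"subj_id": mapping[prolific_id]}
        # Copy every column except the original identifier.
        anonymized.update({k: v for k, v in row.items() if k != id_column})
        anonymized_rows.append(anonymized)
    return anonymized_rows, mapping


# Hypothetical raw data; IDs and values are invented for this example.
raw = io.StringIO("PROLIFIC_ID,item_a\nabc123,4\ndef456,2\nabc123,5\n")
rows, mapping = anonymize(list(csv.DictReader(raw)))
print([row["subj_id"] for row in rows])
```

Passing the same `mapping` to each wave would keep the `subj_id` of a subject stable across files. The mapping itself links back to the original IDs, so in this sketch it would have to stay out of any public data.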

# Data preprocessing

After data anonymization, some further basic preprocessing was done on the
data with the script `03_data/02_anonymized_data/cleaning.R`. In particular,
the original variable names in Qualtrics were harmonized so that they all follow
the same structure.

The cleaned data files are saved to `03_data/03_cleaned_data/` as
CSV files with file names `HMC_<wave>_cleaned.csv`.

The following section gives an overview of the problems in the data that
required cleaning.

## Problems

### with variable names across waves

* `trust_fav` and `Q161` and `Q162`
* `obj_know` and `Q158`
* intention labels are swapped
  --> `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa
* ...
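The swapped intention labels could be corrected by exchanging the values stored under the two names. The sketch below is written in Python for illustration only; the helper `swap_columns` and the example values are hypothetical, and this README does not state which waves are affected, so the sketch leaves that choice to the caller.

```python
def swap_columns(row, name_a, name_b):
    """Return a copy of row in which the values under name_a and name_b are exchanged."""
    fixed = dict(row)
    fixed[name_a], fixed[name_b] = row[name_b], row[name_a]
    return fixed


# Hypothetical row; the values are invented for this example.
row = {"subj_id": "subj0001", "int_use_bhvr_fav": "1", "int_use_bhvr_noUser": "7"}
fixed = swap_columns(row, "int_use_bhvr_fav", "int_use_bhvr_noUser")
print(fixed)
```

The original row is left unchanged, which makes it easy to compare the data before and after the correction.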

<!-- TODO: Add more details -->

### with subjects

* Two entries in wave 1: `subj0762`
* Three entries in wave 3: `subj1009`
* We kept the first entry for each subject
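Keeping only the first entry per subject can be sketched as below. This is an illustrative Python version, not the code in `cleaning.R`; the function name `keep_first_entry` and the example rows are hypothetical, and the sketch assumes that "first" means the first row in file order.

```python
def keep_first_entry(rows, id_column="subj_id"):
    """Keep the first row for each subject and drop later duplicates."""
    seen = set()
    kept = []
    for row in rows:
        if row[id_column] in seen:
            continue  # a later entry of a subject who did the wave again
        seen.add(row[id_column])
        kept.append(row)
    return kept


# Hypothetical wave with three entries for one subject; values are invented.
wave = [
    {"subj_id": "subj1008", "item_a": "2"},
    {"subj_id": "subj1009", "item_a": "4"},
    {"subj_id": "subj1009", "item_a": "5"},
    {"subj_id": "subj1009", "item_a": "1"},
]
kept = keep_first_entry(wave)
print(kept)
```

If the rows are not stored in chronological order, they would need to be sorted by a timestamp column before this step.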

# TODOs

* Add more preprocessing steps like variable renaming?

* Get age (and other descriptives?) for subj1008 and subj1009 from Prolific
  data?