Data for the HMC (Human Machine Communication) project
================

# Variables

An overview of all variables at item level can be found in the [item reference](item_refrence.md) and in the [Excel codebook](HMC_codebook.xlsx). These files show which variables were collected in each wave.

# Folder and file organisation

## Folders

* `01_raw_data` contains the files downloaded from Qualtrics
* `02_anonymized_data` contains the anonymized data files (apart from the anonymization, these can still be considered raw data)
* `03_cleaned` contains data files with harmonized variable names; in addition, some incorrect variable names were fixed, and duplicate entries from subjects who completed a wave two or more times were removed; see `cleaning.R` and below for more details

## Files

* `HMC_codebook.xlsx` contains all variable names for all waves, with the original descriptions as presented in Qualtrics, the original variable names, and the harmonized variable names
* `HMC_variables.xlsx` contains an overview of the variables, their origin, who wanted them in the data, etc. This file is for internal use and is not committed with the public version

# Data collection and data files

The data collection was done in Qualtrics. The following projects are on https://kmrc.qualtrics.com/:

* `AI_Trends_Wave1`
* `AI_Trends_Wave2`
* `AI_Trends_Wave3`
* `AI_Trends_Wave4`
* `AI_Trends_Wave4_sample2`
* `AI_Trends_Wave5`
* `AI_Trends_Wave5_sample2`
* `AI_Trends_Wave6`
* `AI_Trends_Wave6_sample2`

## Sample

### Sample 2 data files

Subjects from the first wave who did not participate in the following waves were invited again after...

## Download settings in Qualtrics

The data were downloaded from Qualtrics as CSV files with the following settings.

### Overall

- Download all fields
- Export values

### CSV

- Recode seen but unanswered questions as -99
- Recode seen but unanswered multi-value fields as 0
- Split multi-value fields into columns

# Data anonymization

After download from Qualtrics, the files were put in the respective folder for each wave in `03_data/01_raw_data/wave*`. The script `03_data/01_raw_data/anonymization.R` mainly removes the `PROLIFIC_IDs` from the data and adds an anonymized ID `subj_id` with entries `subj0001` to `subj1009` to all data sets. Irrelevant columns (mostly created automatically by Qualtrics) are also removed. See `anonymization.R` for details.

The anonymized data files are saved to `03_data/02_anonymized_data/` as CSV files with file names `HMC__anonymized.csv`.

# Data preprocessing

After data anonymization, some further rudimentary preprocessing was done on the data with the script `03_data/02_anonymized_data/cleaning.R`. In particular, the original variable names from Qualtrics were harmonized so that they all follow the same structure.

The cleaned data files are saved to `03_data/03_cleaned_data/` as CSV files with file names `HMC__cleaned.csv`.

The following section gives an overview of the problems in the data that required cleaning.

## Problems

### with variable names over waves

* `trust_fav` and `Q161` and `Q162`
* `obj_know` and `Q158`
* intention labels are swapped --> `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa
* ...

### with subjects

* Two entries in wave 1: `subj0762`
* Three entries in wave 3: `subj1009`
* We kept the first entry for each subject

# TODOs

* Add more preprocessing steps like variable renaming?
* Get age (and other descriptives?) for subj1008 and subj1009 from Prolific data?
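
# Illustration: keeping the first entry per subject

The rule stated under "Problems with subjects" (keep the first entry for each subject within a wave) can be illustrated with a minimal sketch. This is not the code in `cleaning.R`, which is written in R; it is a standalone Python illustration. It assumes that "first" means first in file order, and the function name `keep_first_entry`, the `response` column, and the example rows are hypothetical.

```python
def keep_first_entry(rows, id_key="subj_id"):
    """Return rows keeping only the first occurrence of each subject ID.

    Assumption: "first" means first in file order; the original order
    of the remaining rows is preserved.
    """
    seen = set()
    kept = []
    for row in rows:
        subj = row[id_key]
        if subj in seen:
            continue  # later duplicate entry for this subject: drop it
        seen.add(subj)
        kept.append(row)
    return kept


# Hypothetical rows mimicking a subject with three entries in one wave
wave3 = [
    {"subj_id": "subj1008", "response": "a"},
    {"subj_id": "subj1009", "response": "b"},
    {"subj_id": "subj1009", "response": "c"},
    {"subj_id": "subj1009", "response": "d"},
]

deduplicated = keep_first_entry(wave3)
for row in deduplicated:
    print(row["subj_id"], row["response"])
```

If the intended rule were instead "earliest by timestamp", the rows would need to be sorted by the relevant date column before applying the function.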