Data for the HMC (Human Machine Communication) project
================

# Variables

An overview of all variables on item level can be found in the [item reference](item_refrence.md) and in the [Excel codebook](HMC_codebook.xlsx). These files show which variables have been collected in each wave.

# Folder and file organisation

## Folders

* `01_raw_data` contains the downloaded files from Qualtrics
* `02_anonymized_data` contains the anonymized data files (otherwise this can still be considered raw data)
* `03_cleaned` contains data files with harmonized variable names; additionally, some incorrect variable names were fixed and duplicate entries from subjects who completed a wave two or more times were removed; see `cleaning.R` and below for more details

## Files

* `HMC_codebook.xlsx` contains all variable names for all waves, with the original descriptions as presented in Qualtrics, the original variable names, and the harmonized variable names
* `HMC_variables.xlsx` contains an overview of the variables, their origin, who wanted them in the data, etc. This file is for internal use and is not committed with the public version

# Data collection and data files

The data collection was done in Qualtrics. The following projects are on https://kmrc.qualtrics.com/:

* `AI_Trends_Wave1`
* `AI_Trends_Wave2`
* `AI_Trends_Wave3`
* `AI_Trends_Wave4`
* `AI_Trends_Wave4_sample2`
* `AI_Trends_Wave5`
* `AI_Trends_Wave5_sample2`
* `AI_Trends_Wave6`
* `AI_Trends_Wave6_sample2`

## Sample

### Sample 2 data files

After wave 3, we re-invited wave-1 participants for waves 4–6 to increase statistical power for questions that did not require participation in all six waves. This departed from our original plan to invite only participants from the immediately preceding wave, because ongoing monitoring showed that many non-users remained non-users and that relatively few participants perceived AI as a social actor.
To capture more contemporary usage and obtain sufficient variation for research questions that filter for individuals who perceived AI as a social actor, we broadened recruitment in wave 4 to all wave-1 participants. Sample 2 therefore contains only participants with at least one missing wave.

## Download settings in Qualtrics

The data were downloaded from Qualtrics as CSV files with the following settings.

### Overall

- Download all fields
- Export values

### CSV

- Recode seen but unanswered questions as -99
- Recode seen but unanswered multi-value fields as 0
- Split multi-value fields into columns

# Data anonymization

After download from Qualtrics, the files were put in the respective folders for each wave in `03_data/01_raw_data/wave*`. The script `03_data/01_raw_data/anonymization.R` mainly removes the `PROLIFIC_IDs` from the data and adds an anonymized ID `subj_id` with entries `subj0001` to `subj1009` to all data sets. Irrelevant columns (mostly created automatically by Qualtrics) are also removed. See `anonymization.R` for details.

The anonymized data files are saved to `03_data/02_anonymized_data/` as CSV files with file names `HMC__anonymized.csv`.

# Data preprocessing

After data anonymization, some rudimentary preprocessing was done with the script `03_data/02_anonymized_data/cleaning.R`. In particular, the original variable names from Qualtrics were harmonized so that they all follow the same structure.

The cleaned data files are saved to `03_data/03_cleaned_data/` as CSV files with file names `HMC__cleaned.csv`.

The following section gives an overview of the problems in the data that required cleaning.

## Problems

### with variable names over waves

* `trust_fav` and `Q161` and `Q162`
* `obj_know` and `Q158`
* intention labels are swapped --> `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa
* ...

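
The swapped intention labels listed above could be corrected along the following lines. This is a minimal sketch in base R and not the code from `cleaning.R`: the helper `swap_names()` and the toy data frame are hypothetical, and only the two column names are taken from the list above. Which waves are affected is not stated here, so the sketch works on a single made-up data frame.

```r
# Hypothetical helper: exchange the names of two columns in a data frame.
# Column contents and column order stay untouched; only the labels move.
swap_names <- function(df, a, b) {
  stopifnot(a %in% names(df), b %in% names(df))
  nms <- names(df)
  i <- match(a, nms)
  j <- match(b, nms)
  nms[c(i, j)] <- c(b, a)
  names(df) <- nms
  df
}

# Toy data standing in for one wave (values are made up for illustration).
wave <- data.frame(
  subj_id = c("subj0001", "subj0002"),
  int_use_bhvr_fav = c(1, 2),
  int_use_bhvr_noUser = c(5, 4)
)

# After the swap, each column carries the label the other one had before.
fixed <- swap_names(wave, "int_use_bhvr_fav", "int_use_bhvr_noUser")
```

Swapping the names rather than the columns keeps the column order of the exported file intact, which makes it easier to compare the cleaned file against the anonymized one.
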
### with subjects

* Two entries in wave 1: `subj0762`
* Three entries in wave 3: `subj1009`
* We kept the first entry for each subject

# TODOs

* Add more preprocessing steps like variable renaming?
* Get age (and other descriptives?) for subj1008 and subj1009 from Prolific data?
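
For reference, the rule described under "with subjects" (keep the first entry for each subject) can be sketched in base R as follows. This is an illustration under assumptions and not the code from `cleaning.R`: the toy data frame and its `answer` column are made up, and the rows are assumed to be in the order in which "first" is meant (for example, sorted by recording time).

```r
# Toy data: subj0762 appears twice, as in wave 1 (answers are made up).
wave1 <- data.frame(
  subj_id = c("subj0001", "subj0762", "subj0762", "subj0003"),
  answer  = c(3, 4, 2, 5)
)

# duplicated() flags every repeat of an ID after its first occurrence,
# so negating it keeps exactly the first row per subject.
wave1_dedup <- wave1[!duplicated(wave1$subj_id), ]
```
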