Initialize repository

2025-10-17 11:19:18 +02:00
commit a3ac72c7a3
20 changed files with 5876 additions and 0 deletions
@@ -0,0 +1,124 @@
+---
+title: "Data for the HMC (Human Machine Communication) project"
+---
+
+# Variables
+
+An overview of all variables at the item level can be found in the [item
+reference](item_refrence.md) and in the [Excel codebook](HMC_codebook.xlsx).
+These files show which variables were collected in each wave.
+
+# Folder and file organisation
+
+## Folders
+
+* `01_raw_data` contains the files as downloaded from Qualtrics
+* `02_anonymized_data` contains the anonymized data files (apart from the
+  anonymization, these are still raw data)
+* `03_cleaned_data` contains data files with harmonized variable names;
+  additionally, some incorrect variable names were fixed and duplicate entries
+  from subjects who completed a wave more than once were removed; see
+  `cleaning.R` and the section on data preprocessing below for details
+
+## Files
+
+* `HMC_codebook.xlsx` contains all variable names for all waves, with the
+  original descriptions as presented in Qualtrics and the original variable
+  names as well as the harmonized variable names
+* `HMC_variables.xlsx` contains an overview of the variables, their origin, who
+  requested them, etc. This file is for internal use and is not
+  committed to the public version
+
+# Data collection and data files
+
+The data were collected with Qualtrics. The following projects are on
+https://kmrc.qualtrics.com/:
+
+* `AI_Trends_Wave1`
+* `AI_Trends_Wave2`
+* `AI_Trends_Wave3`
+* `AI_Trends_Wave4`
+* `AI_Trends_Wave4_sample2`
+* `AI_Trends_Wave5`
+* `AI_Trends_Wave5_sample2`
+* `AI_Trends_Wave6`
+* `AI_Trends_Wave6_sample2`
+
+## Sample
+
+### Sample 2 data files
+
+Subjects from the first wave who did not participate in the following waves
+were invited again after...
+
+<!-- TODO: Add more details -->
+
+## Download settings in Qualtrics
+
+The data were downloaded from Qualtrics as CSV files with the following
+settings.
+
+### Overall
+
+- Download all fields
+- Export values
+
+### CSV
+
+- Recode seen but unanswered questions as -99
+- Recode seen but unanswered multi-value fields as 0
+- Split multi-value fields into columns
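
With these settings, analyses have to treat -99 as missing rather than as a numeric answer, whereas 0 in a split multi-value column is a genuine value. A minimal Python sketch (the project's own scripts are in R) using only the standard library; the column names `Q1`, `Q2_1` and `Q2_2` and the example rows are hypothetical:

```python
import csv
import io
from typing import Dict, List, Optional

# Code for "seen but unanswered" questions, as set in the download settings
SEEN_UNANSWERED = "-99"

def read_responses(text: str) -> List[Dict[str, Optional[str]]]:
    """Parse an exported CSV and turn -99 codes into None (missing).

    Assumes a single header row; Qualtrics exports usually contain extra
    header rows (question text etc.) that would have to be skipped first.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({key: (None if value == SEEN_UNANSWERED else value)
                     for key, value in row.items()})
    return rows

# Hypothetical columns: Q1 is a single-choice item, Q2_1/Q2_2 are the split
# columns of one multi-value item (0 = option seen but not selected)
example = "subj_id,Q1,Q2_1,Q2_2\nsubj0001,3,1,0\nsubj0002,-99,0,0\n"
responses = read_responses(example)
print(responses[1]["Q1"])    # None, seen but unanswered
print(responses[1]["Q2_1"])  # 0, kept as a valid value
```

In R, the same effect is usually achieved by declaring -99 as an NA value when reading the file.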
+
+# Data anonymization
+
+After downloading from Qualtrics, the files were placed in the folder for the
+respective wave, `03_data/01_raw_data/wave*`. The script
+`03_data/01_raw_data/anonymization.R` mainly removes the `PROLIFIC_IDs` from the
+data and adds an anonymized ID `subj_id`, ranging from `subj0001` to `subj1009`,
+to all data sets.
+
+Irrelevant columns, mostly created automatically by Qualtrics, are also
+removed. See `anonymization.R` for details.
+
+The anonymized data files are saved to `03_data/02_anonymized_data/` as
+CSV files with file names `HMC_<wave>_anonymized.csv`.
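
The ID replacement can be sketched as follows. This is a Python illustration, not the logic of `anonymization.R`: it assumes a hypothetical ID column named `PROLIFIC_ID`, a hypothetical set of dropped Qualtrics columns, and that IDs are numbered in order of first appearance across all waves, so that a subject keeps the same `subj_id` in every wave.

```python
from typing import Dict, List

# Hypothetical names: the real ID column and dropped columns are set in anonymization.R
ID_COLUMN = "PROLIFIC_ID"
DROP_COLUMNS = {"IPAddress", "LocationLatitude", "LocationLongitude"}

Row = Dict[str, str]

def build_id_map(waves: List[List[Row]]) -> Dict[str, str]:
    """Assign subj0001, subj0002, ... in order of first appearance across waves."""
    id_map: Dict[str, str] = {}
    for wave in waves:
        for row in wave:
            pid = row[ID_COLUMN]
            if pid not in id_map:
                id_map[pid] = f"subj{len(id_map) + 1:04d}"
    return id_map

def anonymize(wave: List[Row], id_map: Dict[str, str]) -> List[Row]:
    """Replace the Prolific ID with subj_id and drop identifying columns."""
    result = []
    for row in wave:
        clean = {"subj_id": id_map[row[ID_COLUMN]]}
        clean.update({k: v for k, v in row.items()
                      if k != ID_COLUMN and k not in DROP_COLUMNS})
        result.append(clean)
    return result

# Hypothetical example data (IP addresses from the documentation range)
wave1 = [{"PROLIFIC_ID": "abc", "IPAddress": "192.0.2.1", "Q1": "3"},
         {"PROLIFIC_ID": "xyz", "IPAddress": "192.0.2.2", "Q1": "5"}]
wave2 = [{"PROLIFIC_ID": "xyz", "IPAddress": "192.0.2.2", "Q1": "4"}]
ids = build_id_map([wave1, wave2])
print(anonymize(wave2, ids))  # [{'subj_id': 'subj0002', 'Q1': '4'}]
```

Building one map over all waves before anonymizing any of them is what keeps the IDs consistent between waves; the map itself links IDs back to participants and would have to stay private.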
+
+# Data preprocessing
+
+After anonymization, some basic preprocessing was applied to the data with
+the script `03_data/02_anonymized_data/cleaning.R`. In particular, the original
+Qualtrics variable names were harmonized so that they all follow the same
+structure.
+
+The cleaned data files are saved to `03_data/03_cleaned_data/` as
+CSV files with file names `HMC_<wave>_cleaned.csv`.
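
Harmonizing names amounts to applying an original-to-harmonized mapping per wave, such as the one recorded in `HMC_codebook.xlsx`. A minimal Python sketch with a hypothetical mapping (the real names are in the codebook and `cleaning.R`), plus a check that two original columns were not mapped to the same new name:

```python
from typing import Dict, List

def harmonize_names(header: List[str], mapping: Dict[str, str]) -> List[str]:
    """Rename columns via an original -> harmonized mapping; unmapped names are kept."""
    return [mapping.get(name, name) for name in header]

def check_duplicates(header: List[str]) -> None:
    """Fail loudly if two columns ended up with the same harmonized name."""
    seen = set()
    for name in header:
        if name in seen:
            raise ValueError(f"duplicate column after renaming: {name}")
        seen.add(name)

# Hypothetical mapping for one wave, e.g. built from the codebook's name columns
mapping_wave2 = {"Q12_1": "item_a", "Q12_2": "item_b"}
header = ["subj_id", "Q12_1", "Q12_2"]
new_header = harmonize_names(header, mapping_wave2)
check_duplicates(new_header)
print(new_header)  # ['subj_id', 'item_a', 'item_b']
```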
+
+The following section gives an overview of the problems in the data that
+required cleaning.
+
+## Problems
+
+### With variable names across waves
+
+* `trust_fav` and `Q161` and `Q162`
+* `obj_know` and `Q158`
+* the intention labels are swapped: `int_use_bhvr_fav` corresponds to
+  `int_use_bhvr_noUser` and vice versa
+* ...
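
Because the two intention columns only carry each other's labels, one way to fix them is to exchange the labels and leave the data untouched. A Python sketch under that assumption; which waves are affected is not stated here, so the function would be applied only to those waves:

```python
from typing import List

def swap_columns(header: List[str], a: str, b: str) -> List[str]:
    """Exchange two column labels; all other labels are left unchanged."""
    return [b if name == a else a if name == b else name for name in header]

header = ["subj_id", "int_use_bhvr_fav", "int_use_bhvr_noUser"]
print(swap_columns(header, "int_use_bhvr_fav", "int_use_bhvr_noUser"))
# ['subj_id', 'int_use_bhvr_noUser', 'int_use_bhvr_fav']
```

Swapping both names in one pass avoids the intermediate state in which two columns share a name, which a sequential rename would create.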
+
+<!-- TODO: Add more details -->
+
+### With subjects
+
+* Two entries in wave 1: `subj0762`
+* Three entries in wave 3: `subj1009`
+* We kept the first entry for each subject
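
Finding repeated subjects and keeping only the first entry can be sketched as below. This Python version assumes the rows are already in chronological order (for example sorted by the Qualtrics `StartDate` column); the rows shown are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]

def find_duplicates(rows: List[Row], key: str = "subj_id") -> Dict[str, int]:
    """Return subjects with more than one entry and their entry counts."""
    counts = Counter(row[key] for row in rows)
    return {subj: n for subj, n in counts.items() if n > 1}

def keep_first_entry(rows: List[Row], key: str = "subj_id") -> List[Row]:
    """Keep only the first row per subject; later rows are dropped."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept

# Hypothetical wave in which one subject answered three times
wave = [{"subj_id": "subj1009", "Q1": "2"},
        {"subj_id": "subj0001", "Q1": "4"},
        {"subj_id": "subj1009", "Q1": "5"},
        {"subj_id": "subj1009", "Q1": "1"}]
print(find_duplicates(wave))                          # {'subj1009': 3}
print([row["Q1"] for row in keep_first_entry(wave)])  # ['2', '4']
```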
+
+# TODOs
+
+* Add more preprocessing steps like variable renaming?
+
+* Get age (and other descriptives?) for subj1008 and subj1009 from the Prolific
+  data?
+
+