Nora Wickelmaier 5fd0fb2326 Update READMEs

2025-10-17 11:23:52 +02:00

3.5 KiB

Data for the HMC (Human Machine Communication) project

Variables

An overview of all item-level variables can be found in the item reference and in the Excel codebook. These files show which variables were collected in each wave.

Folder and file organisation

Folders

01_raw_data contains the downloaded files from Qualtrics
02_anonymized_data contains the anonymized data files (otherwise this can still be considered raw data)
03_cleaned_data contains data files with harmonized variable names; in addition, some incorrect variable names were fixed and duplicate entries from subjects who completed a wave two or more times were removed; see cleaning.R and below for more details

Files

HMC_codebook.xlsx contains all variable names for all waves, with the original descriptions as presented in Qualtrics and the original variable names as well as the harmonized variable names
HMC_variables.xlsx contains an overview of the variables, their origin, who requested them, etc. This file is for internal use and is not committed with the public version

Data collection and data files

The data collection was done in Qualtrics. The following projects are on https://kmrc.qualtrics.com/:

AI_Trends_Wave1
AI_Trends_Wave2
AI_Trends_Wave3
AI_Trends_Wave4
AI_Trends_Wave4_sample2
AI_Trends_Wave5
AI_Trends_Wave5_sample2
AI_Trends_Wave6
AI_Trends_Wave6_sample2

Sample

Sample 2 data files

Subjects from the first wave who did not participate in the following waves were invited again after...

Download settings in Qualtrics

The data were downloaded from Qualtrics as CSV files with the following settings.

Overall

Download all fields
Export values

CSV

Recode seen but unanswered questions as -99
Recode seen but unanswered multi-value fields as 0
Split multi-value fields into columns
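
With these settings, -99 in a downloaded file marks a question that was seen but not answered, so it should be read as missing rather than as a response value. The following is a minimal sketch in Python of how this could be handled when reading a file; it is not taken from the project's R scripts, and the column names and values are invented for illustration.

```python
import csv
import io

MISSING_CODE = "-99"  # "seen but unanswered" code from the export settings


def read_with_missing(text):
    """Parse CSV text and replace the -99 code with None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: (None if value == MISSING_CODE else value)
         for name, value in row.items()}
        for row in reader
    ]


# Hypothetical two-row export; column names are invented for illustration.
example = "subj_id,item_a,item_b\nsubj0001,4,-99\nsubj0002,-99,2\n"
rows = read_with_missing(example)
```

The 0 code used for unanswered multi-value fields is left untouched in this sketch.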

Data anonymization

After download from Qualtrics, the files were placed in the respective folder for each wave in 03_data/01_raw_data/wave*. The script 03_data/01_raw_data/anonymization.R mainly removes the PROLIFIC_IDs from the data and adds an anonymized ID subj_id with entries subj0001 to subj1009 to all data sets.

Irrelevant columns -- mostly automatically created by Qualtrics -- are also removed. See anonymization.R for details.

The anonymized data files are saved to 03_data/02_anonymized_data/ as CSV files with file names HMC_<wave>_anonymized.csv.
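
The anonymization step can be pictured roughly as follows. This is a sketch in Python under stated assumptions, not the logic of anonymization.R: it assumes that IDs are assigned in order of first appearance across all waves, and the dropped column name and all data values are hypothetical.

```python
def build_id_map(prolific_ids):
    """Assign subj0001, subj0002, ... in order of first appearance."""
    id_map = {}
    for pid in prolific_ids:
        if pid not in id_map:
            id_map[pid] = f"subj{len(id_map) + 1:04d}"
    return id_map


def anonymize(rows, id_map, drop=()):
    """Replace PROLIFIC_ID with subj_id and drop the listed columns."""
    out = []
    for row in rows:
        new_row = {"subj_id": id_map[row["PROLIFIC_ID"]]}
        for name, value in row.items():
            if name != "PROLIFIC_ID" and name not in drop:
                new_row[name] = value
        out.append(new_row)
    return out


# Hypothetical data for two waves; IDs and column names are invented.
wave1 = [{"PROLIFIC_ID": "abc", "IPAddress": "x", "item_a": "3"},
         {"PROLIFIC_ID": "def", "IPAddress": "y", "item_a": "5"}]
wave2 = [{"PROLIFIC_ID": "def", "IPAddress": "z", "item_a": "4"}]

# One shared map, so a subject keeps the same subj_id in every wave.
id_map = build_id_map(r["PROLIFIC_ID"] for wave in (wave1, wave2) for r in wave)
anon1 = anonymize(wave1, id_map, drop=("IPAddress",))
anon2 = anonymize(wave2, id_map, drop=("IPAddress",))
```

Building the map once over all waves is what keeps subj_id consistent across data sets, which is needed to link a subject's responses over time.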

Data preprocessing

After data anonymization, some further basic preprocessing was done with the script 03_data/02_anonymized_data/cleaning.R. In particular, the original variable names from Qualtrics were harmonized so that they all follow the same structure.

The cleaned data files are saved to 03_data/03_cleaned_data/ as CSV files with file names HMC_<wave>_cleaned.csv.

The following section gives an overview of the problems in the data that required cleaning.

Problems

with variable names over waves

trust_fav and Q161 and Q162
obj_know and Q158
intention labels are swapped --> int_use_bhvr_fav = int_use_bhvr_noUser and vice versa
...
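
Correcting swapped labels such as int_use_bhvr_fav and int_use_bhvr_noUser amounts to exchanging the values stored under the two names. A minimal Python sketch of that idea follows; it is illustrative only, does not reproduce cleaning.R, and makes no claim about which waves are affected. The data values are invented.

```python
def swap_columns(rows, a, b):
    """Exchange the values stored under column names a and b in every row."""
    swapped = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay unchanged
        new_row[a], new_row[b] = row[b], row[a]
        swapped.append(new_row)
    return swapped


# Hypothetical row; the values are invented for illustration.
rows = [{"subj_id": "subj0001",
         "int_use_bhvr_fav": "1",
         "int_use_bhvr_noUser": "5"}]
fixed = swap_columns(rows, "int_use_bhvr_fav", "int_use_bhvr_noUser")
```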

with subjects

Two entries in wave 1: subj0762
Three entries in wave 3: subj1009
We kept the first entry for each subject
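
Keeping the first entry per subject can be sketched as below. This Python example is an illustration rather than the code in cleaning.R, and it assumes that "first" means first in file order; the item column and its values are invented.

```python
def keep_first_entry(rows, key="subj_id"):
    """Return rows with only the first occurrence of each key value."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


# Hypothetical wave with one subject appearing twice.
rows = [
    {"subj_id": "subj0762", "item_a": "2"},
    {"subj_id": "subj0001", "item_a": "4"},
    {"subj_id": "subj0762", "item_a": "3"},
]
deduped = keep_first_entry(rows)
```

If the files are not already sorted by recording time, they would need to be sorted first for this to keep the earliest entry.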

TODOs

Add more preprocessing steps like variable renaming?
Get age (and other descriptives?) for subj1008 and subj1009 from Prolific data?
