Initialize repository
@@ -0,0 +1,124 @@
---
title: "Data for the HMC (Human Machine Communication) project"
---

# Variables

An overview of all variables at item level can be found in the [item
reference](item_refrence.md) and in the [Excel codebook](HMC_codebook.xlsx).
These files show which variables were collected in each wave.

# Folder and file organisation

## Folders

* `01_raw_data` contains the files downloaded from Qualtrics
* `02_anonymized_data` contains the anonymized data files (apart from the
  anonymization, these can still be considered raw data)
* `03_cleaned` contains data files with harmonized variable names; in addition,
  some incorrect variable names were fixed and duplicate entries from subjects
  who completed a wave two or more times were removed; see `cleaning.R` and
  below for more details

## Files

* `HMC_codebook.xlsx` contains all variable names for all waves, with the
  original descriptions as presented in Qualtrics and the original variable
  names as well as the harmonized variable names
* `HMC_variables.xlsx` contains an overview of the variables, their origin, who
  wanted them in the data, etc. This file is for internal use and is not
  committed with the public version

# Data collection and data files

The data were collected in Qualtrics. The following projects are on
https://kmrc.qualtrics.com/:

* `AI_Trends_Wave1`
* `AI_Trends_Wave2`
* `AI_Trends_Wave3`
* `AI_Trends_Wave4`
* `AI_Trends_Wave4_sample2`
* `AI_Trends_Wave5`
* `AI_Trends_Wave5_sample2`
* `AI_Trends_Wave6`
* `AI_Trends_Wave6_sample2`

## Sample

### Sample 2 data files

Subjects from the first wave who did not participate in the following waves
were invited again after...

<!-- TODO: Add more details -->

## Download settings in Qualtrics

The data were downloaded from Qualtrics as CSV files with the following
settings.

### Overall

- Download all fields
- Export values

### CSV

- Recode seen but unanswered questions as -99
- Recode seen but unanswered multi-value fields as 0
- Split multi-value fields into columns

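Because seen but unanswered questions are exported as -99, that code has to be treated as missing before any analysis. The following is a minimal Python sketch of one way to do this, not the project's own code (the project scripts are written in R). The column names `resp_id`, `item_a`, and `item_b` and the helper `read_with_missing` are hypothetical and used only for illustration.

```python
import csv
import io

MISSING_CODE = "-99"  # export setting: seen but unanswered questions


def read_with_missing(handle, missing_code=MISSING_CODE):
    """Read CSV rows as dicts, replacing the missing code with None.

    Values of 0 are left untouched: for multi-value fields the export
    settings use 0 for seen but unanswered options, not -99.
    """
    reader = csv.DictReader(handle)
    return [
        {name: (None if value == missing_code else value)
         for name, value in row.items()}
        for row in reader
    ]


# Hypothetical two-row export; the column names are made up.
sample = io.StringIO(
    "resp_id,item_a,item_b\n"
    "r1,3,-99\n"
    "r2,-99,0\n"
)
rows = read_with_missing(sample)
```

Values stay strings here; converting them to numbers would be a separate step.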
# Data anonymization

After download from Qualtrics, the files were placed in the respective folder
for each wave in `03_data/01_raw_data/wave*`. The script
`03_data/01_raw_data/anonymization.R` mainly removes the `PROLIFIC_IDs` from the
data and adds an anonymized ID `subj_id` with entries `subj0001` to `subj1009`
to all data sets.

Irrelevant columns, mostly created automatically by Qualtrics, are also
removed. See `anonymization.R` for details.

The anonymized data files are saved to `03_data/02_anonymized_data/` as
CSV files with file names `HMC_<wave>_anonymized.csv`.

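The anonymization step can be illustrated with a short Python sketch. This is an assumption-laden illustration, not a port of `anonymization.R`: the column names `PROLIFIC_ID` and `IPAddress`, the helper `anonymize`, and the rule of assigning IDs in order of first appearance are all hypothetical. The one property the sketch tries to show is that the ID mapping is shared across waves, so a participant keeps the same `subj_id` in every data set.

```python
def anonymize(rows, id_map, id_column="PROLIFIC_ID", drop_columns=()):
    """Replace the participant ID column with a stable `subj_id`.

    `id_map` is shared between calls so that the same participant gets
    the same anonymized ID in every wave. New participants receive the
    next free number (assumption: numbering by order of appearance).
    """
    anonymized = []
    for row in rows:
        original = row[id_column]
        if original not in id_map:
            id_map[original] = f"subj{len(id_map) + 1:04d}"
        clean = {"subj_id": id_map[original]}
        for name, value in row.items():
            if name != id_column and name not in drop_columns:
                clean[name] = value
        anonymized.append(clean)
    return anonymized


# Hypothetical rows for two waves; IDs and columns are made up.
wave1 = [
    {"PROLIFIC_ID": "abc", "IPAddress": "192.0.2.1", "age": "31"},
    {"PROLIFIC_ID": "def", "IPAddress": "192.0.2.2", "age": "47"},
]
wave2 = [{"PROLIFIC_ID": "def", "IPAddress": "192.0.2.9", "age": "47"}]

id_map = {}
wave1_anon = anonymize(wave1, id_map, drop_columns=("IPAddress",))
wave2_anon = anonymize(wave2, id_map, drop_columns=("IPAddress",))
```

The mapping `id_map` links real IDs to anonymized ones, so it must stay out of the public repository.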
# Data preprocessing

After data anonymization, some further basic preprocessing was done on the
data with the script `03_data/02_anonymized_data/cleaning.R`. In particular,
the original variable names in Qualtrics were harmonized so they all follow the
same structure.

The cleaned data files are saved to `03_data/03_cleaned_data/` as
CSV files with file names `HMC_<wave>_cleaned.csv`.

The following section gives an overview of the problems in the data that
required cleaning.

## Problems

### with variable names across waves

* `trust_fav` and `Q161` and `Q162`
* `obj_know` and `Q158`
* intention labels are swapped
  --> `int_use_bhvr_fav = int_use_bhvr_noUser` and vice versa
* ...

<!-- TODO: Add more details -->

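Swapped labels such as `int_use_bhvr_fav` and `int_use_bhvr_noUser` have to be exchanged in a single step. Renaming one after the other would overwrite a column. The following Python sketch shows the idea under stated assumptions: the helper `rename_columns` and the sample row are hypothetical, and the source does not say which waves are affected, so `cleaning.R` remains the reference for the actual fix.

```python
# Both directions are listed so the two labels are exchanged at once.
SWAPPED_LABELS = {
    "int_use_bhvr_fav": "int_use_bhvr_noUser",
    "int_use_bhvr_noUser": "int_use_bhvr_fav",
}


def rename_columns(row, mapping):
    """Return a copy of `row` with column names replaced via `mapping`.

    Building a new dict applies all renames simultaneously, so a swap
    does not lose either value.
    """
    return {mapping.get(name, name): value for name, value in row.items()}


# Hypothetical row with made-up values.
row = {"subj_id": "subj0001", "int_use_bhvr_fav": "1", "int_use_bhvr_noUser": "5"}
fixed = rename_columns(row, SWAPPED_LABELS)
```

The same helper could take a mapping from Qualtrics default names such as `Q158` to harmonized names, once that mapping is documented.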
### with subjects

* Two entries in wave 1: `subj0762`
* Three entries in wave 3: `subj1009`
* We kept the first entry for each subject

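The rule of keeping the first entry per subject can be sketched in a few lines of Python. This is an illustration only, not the code in `cleaning.R`. It assumes the rows are already in the order in which the entries were recorded; the helper `keep_first_entry` and the `score` column are hypothetical.

```python
def keep_first_entry(rows, key="subj_id"):
    """Keep only the first row for each subject, preserving row order."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] in seen:
            continue  # a later duplicate entry for this subject
        seen.add(row[key])
        kept.append(row)
    return kept


# Hypothetical wave with three entries for one subject; values are made up.
wave3 = [
    {"subj_id": "subj1009", "score": "4"},
    {"subj_id": "subj0001", "score": "2"},
    {"subj_id": "subj1009", "score": "5"},
    {"subj_id": "subj1009", "score": "1"},
]
deduplicated = keep_first_entry(wave3)
```

If the export order does not reflect the recording order, the rows would first need to be sorted by a timestamp column.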
# TODOs

* Add more preprocessing steps like variable renaming?

* Get age (and other descriptives?) for subj1008 and subj1009 from Prolific
  data?