Update READMEs

2025-12-08 16:02:14 +01:00
parent 2aa6441eba
commit 973d552050
2 changed files with 23 additions and 6 deletions
@@ -88,15 +88,15 @@ removed. See `anonymization.R` for details.
 The anonymized data files are saved to `03_data/02_anonymized_data/` as
 CSV files with file names `HMC_<wave>_anonymized.csv`.

-# Data preprocessing
+# Data cleaning

 After data anonymization, some more rudimentary preprocessing was done on the
 data with the script `03_data/02_anonymized_data/cleaning.R`. Especially,
 the original variable names in Qualtrics were harmonized so they all follow the
 same structure.

-The cleaned data files are saved to `03_data/03_cleaned_data/`as
-CSV files with file names `HMC_<wave>_cleaned.csv`.
+The cleaned data files are saved to `03_data/03_cleaned_data/` as CSV files with
+file names `HMC_<wave>_cleaned.csv`.

 The following section gives an overview of the problems in the data, that needed
 some cleaning.
@@ -119,6 +119,15 @@ some cleaning.
 * Three entries in wave 3: `subj1009`
 * We kept the first entry for each subject

+# Data preprocessing
+
+The final preprocessing step computes scales from the collected items. It was
+done in Python; the code is kept in a separate repository:
+https://gitea.iwm-tuebingen.de/HMC/preprocessing. The resulting files, which
+contain the final variables for each scale, are saved to
+`03_data/04_preprocessed_data/` as CSV files with file names
+`HMC_<wave>_preprocessed.csv`.
+
 # TODOs

 * Add more preprocessing steps like variable renaming?