# Accompanying Analysis Code for the Master Thesis "Analyzing Log Data from Multi-Touch Tables"

The multi-touch table at the Herzog-Anton-Ulrich-Museum (HAUM) in Braunschweig
lets museum visitors interact with about 70 artworks and 3 virtual cards
containing information about the museum and its layout. The table was
installed at the museum in October 2016, and log files of visitor
interactions have been collected since November 2016. The master thesis for
which this repository was created analyzes data collected between December
14, 2016 and July 5, 2023. In total, the data set consists of 39,767 log
files containing 6,700,176 events.

The following gives a short overview of the analyses conducted. All analysis
scripts can be found in the `/code/` folder. The complete folder structure of
this project looks like this (not all folders are committed to the
repository):

```
//
  |
  |- /code/
  |- /data/
       |- /haum/
            |- /ContentEyevisit/
            |- /LogFiles/
       |- /metadata/
  |- /figures/
  |- /results/
  |-- README.md
```

## Preprocessing and Descriptives

The first script `01_preprocessing.R` preprocesses the raw log files by first
parsing them so they are readable by standard statistics software like R or
Python and then converting them to event logs. A short R package doing the
preprocessing and more information can be found at .

The second script `02_descriptives.R` calculates some descriptive statistics
and creates plots to get an overall feel for the data set.

## Conformance Checking

A normative Petri net for testing the data quality after preprocessing is
created in `03_create-petrinet.py`, and the actual data quality check is done
in `04_conformance-checking.py`. Both scripts are written in Python using the
pm4py library. For more information and the full documentation go to .
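
To illustrate the idea of checking traces against a normative Petri net, here is a minimal, library-free sketch of token-based replay, one common conformance checking technique. It is not the code from `04_conformance-checking.py`, and it does not use pm4py. The toy net, the activity names (`open`, `flip`, `close`) and the function `replay_fitness` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy Petri net (NOT the normative net from the thesis).
# Each transition (activity) maps to (input places, output places).
NET = {
    "open": (["start"], ["active"]),
    "flip": (["active"], ["active"]),
    "close": (["active"], ["end"]),
}
INITIAL = ["start"]  # places holding a token before replay
FINAL = ["end"]      # places expected to hold a token after replay


def replay_fitness(trace, net=NET, initial=INITIAL, final=FINAL):
    """Replay one trace on the net and return a fitness value in [0, 1].

    Counts produced (p), consumed (c), missing (m) and remaining (r)
    tokens and combines them as 0.5 * (1 - m/c) + 0.5 * (1 - r/p).
    """
    marking = Counter(initial)
    produced = len(initial)  # tokens placed in the initial marking
    consumed = 0
    missing = 0

    for activity in trace:
        if activity not in net:
            # Simplification: activities unknown to the model are an error.
            raise ValueError(f"activity not in model: {activity!r}")
        inputs, outputs = net[activity]
        for place in inputs:
            if marking[place] == 0:
                missing += 1  # token had to be added artificially
            else:
                marking[place] -= 1
            consumed += 1
        for place in outputs:
            marking[place] += 1
            produced += 1

    # At the end, the tokens of the final marking are consumed.
    for place in final:
        if marking[place] == 0:
            missing += 1
        else:
            marking[place] -= 1
        consumed += 1

    remaining = sum(marking.values())
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


if __name__ == "__main__":
    print(replay_fitness(["open", "flip", "close"]))  # conforming trace
    print(replay_fitness(["flip", "close"]))          # trace without "open"
```

A trace that the net can replay without missing or remaining tokens gets a fitness of 1; traces with lower values are candidates for closer inspection.
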
The next script `05_check-traces.R` (written in R again) checks the corrupt
trace found during conformance checking and exports the cleaned data sets
used for the following analyses.

## Clustering of Items

To answer the first research question in the thesis, "Do interaction patterns
look different for different artworks? (Control-flow perspective)", process
mining was applied to all paths separately for each item on the multi-touch
table. Fitness, precision, generalizability, simplicity, soundness, number of
connecting arcs, number of transitions, number of places, number of different
variants, and the most frequent variant were obtained and saved to a CSV file
(Python script `06_infos-items.py`). This information was then read into R in
the next script (`07_item-clustering.R`) and used (together with other
features) for hierarchical clustering.

## Clustering of Cases

For the second research question, "What kind of patterns exist and are there
typical user behaviors? (Case perspective)", six indicator variables for five
proposed user navigation types were calculated in
`08_case-characteristics.R` and then used for hierarchical clustering and
recursive partitioning to extract the different navigation types in script
`09_user-navigation.R`. A validation of the results with data from 2018 was
done in `10_validation.R`. The different case variants in the complete data
set and in the data used for investigating the navigation types (all log
files from 2019) were examined in `11_investigate-variants.R`, and the
resulting clusters of navigation types were further investigated with process
mining techniques in R (`12_dfgs-case-clusters.R`) and Python
(`13_pm-case-clusters.py`).

# Software versions

## R

The following gives the output of `sessionInfo()` with all relevant packages
attached.

```
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.utf8
[2] LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=German_Germany.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=German_Germany.utf8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils
[6] datasets  methods   base

other attached packages:
 [1] igraph_2.0.3      DiagrammeR_1.0.11 partykit_1.2-20
 [4] mvtnorm_1.2-4     libcoin_1.0-10    rpart_4.1.23
 [7] rgl_1.3.1         smacof_2.1-6      e1071_1.7-14
[10] colorspace_2.1-0  plotrix_3.8-4     vioplot_0.4.0
[13] zoo_1.8-12        sm_2.2-6.0        cluster_2.1.6
[16] factoextra_1.0.7  ggplot2_3.5.0     dplyr_1.1.4
[19] pbapply_1.7-2     petrinetR_0.3.0   processmapR_0.5.3
[22] eventdataR_0.3.1  edeaR_0.9.4       bupaR_0.5.4
[25] lattice_0.22-6    devtools_2.4.5    usethis_2.2.3
[28] mtt_0.0-1

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3 rstudioapi_0.16.0
  [3] jsonlite_1.8.8     shape_1.4.6.1
  [5] magrittr_2.0.3     jomo_2.7-6
  [7] nloptr_2.0.3       rmarkdown_2.26
  [9] fs_1.6.3           vctrs_0.6.5
 [11] heplots_1.6.2      memoise_2.0.1
 [13] minqa_1.2.6        base64enc_0.1-3
 [15] htmltools_0.5.8.1  forcats_1.0.0
 [17] polynom_1.4-1      weights_1.0.4
 [19] broom_1.0.5        Formula_1.2-5
 [21] mitml_0.4-5        htmlwidgets_1.6.4
 [23] plotly_4.10.4      lubridate_1.9.3
 [25] cachem_1.0.8       mime_0.12
 [27] lifecycle_1.0.4    iterators_1.0.14
 [29] pkgconfig_2.0.3    Matrix_1.6-5
 [31] R6_2.5.1           fastmap_1.1.1
 [33] shiny_1.8.1.1      digest_0.6.35
 [35] shinyTime_1.0.3    pkgload_1.3.4
 [37] ellipse_0.5.0      Hmisc_5.1-2
 [39] fansi_1.0.6        timechange_0.3.0
 [41] nnls_1.5           gdata_3.0.0
 [43] abind_1.4-5        httr_1.4.7
 [45] compiler_4.3.3     proxy_0.4-27
 [47] remotes_2.5.0      withr_3.0.0
 [49] doParallel_1.0.17  htmlTable_2.4.2
 [51] backports_1.4.1    carData_3.0-5
 [53] pkgbuild_1.4.4     pan_1.9
 [55] MASS_7.3-60.0.1    sessioninfo_1.2.2
 [57] gtools_3.9.5       tools_4.3.3
 [59] foreign_0.8-86     httpuv_1.6.15
 [61] nnet_7.3-19        glue_1.7.0
 [63] inum_1.0-5         nlme_3.1-164
 [65] promises_1.3.0     checkmate_2.3.1
 [67] generics_0.1.3     gtable_0.3.4
 [69] class_7.3-22       tidyr_1.3.1
 [71] data.table_1.15.4  hms_1.1.3
 [73] car_3.1-2          xml2_1.3.6
 [75] utf8_1.2.4         ggrepel_0.9.5
 [77] foreach_1.5.2      pillar_1.9.0
 [79] stringr_1.5.1      later_1.3.2
 [81] splines_4.3.3      survival_3.5-8
 [83] tidyselect_1.2.1   miniUI_0.1.1.1
 [85] knitr_1.46         gridExtra_2.3
 [87] xfun_0.43          visNetwork_2.1.2
 [89] stringi_1.8.3      lazyeval_0.2.2
 [91] boot_1.3-30        evaluate_0.23
 [93] codetools_0.2-20   wordcloud_2.6
 [95] tibble_3.2.1       cli_3.6.2
 [97] xtable_1.8-4       munsell_0.5.1
 [99] candisc_0.8-6      Rcpp_1.0.12
[101] parallel_4.3.3     ellipsis_0.3.2
[103] profvis_0.3.8      urlchecker_1.0.1
[105] lme4_1.1-35.3      glmnet_4.1-8
[107] viridisLite_0.4.2  ggthemes_5.1.0
[109] scales_1.3.0       purrr_1.0.2
[111] rlang_1.1.3        mice_3.16.0
```

## Python

The Python version used is `Python 3.12.0`. The following gives the output of
`pip list` for the virtual environment all analyses have been run in.

```
Package           Version
----------------- ------------
asttokens         2.4.1
colorama          0.4.6
contourpy         1.2.0
cycler            0.12.1
decorator         5.1.1
deprecation       2.1.0
executing         2.0.1
fonttools         4.46.0
graphviz          0.20.1
intervaltree      3.1.0
ipython           8.18.1
jedi              0.19.1
joblib            1.3.2
kiwisolver        1.4.5
lxml              4.9.3
matplotlib        3.8.2
matplotlib-inline 0.1.6
networkx          3.2.1
numpy             1.26.2
packaging         23.2
pandas            2.1.3
parso             0.8.3
Pillow            10.1.0
pip               24.0
pm4py             2.7.8.4
prompt-toolkit    3.0.41
PuLP              2.8.0
pure-eval         0.2.2
pydotplus         2.0.2
Pygments          2.17.2
pyparsing         3.1.1
python-dateutil   2.8.2
pytz              2023.3.post1
scikit-learn      1.4.0
scipy             1.11.4
six               1.16.0
sortedcontainers  2.4.0
stack-data        0.6.3
StringDist        1.0.9
threadpoolctl     3.2.0
tqdm              4.66.1
traitlets         5.14.0
tzdata            2023.3
wcwidth           0.2.12
```