Analysis of log data from Multi-Touch-Table at Herzog-Anton-Ulrich-Museum (HAUM)

Accompanying Analysis Code for the Master Thesis "Analyzing Log Data from Multi-Touch Tables"

The multi-touch table at the Herzog-Anton-Ulrich-Museum (HAUM) in Braunschweig lets museum visitors interact with about 70 artworks and 3 virtual cards containing information about the museum and its layout. The table was installed at the museum in October 2016, and log files of visitor interactions have been collected since November 2016. The master's thesis for which this repository was created analyzes data collected between December 14, 2016 and July 5, 2023. In total, the data set consists of 39,767 log files containing 6,700,176 events.

The following gives a short overview of the analyses conducted. All analysis scripts can be found in the /code/ folder. The complete folder structure of this project looks like this (not all folders are committed to the repository):

/<parent_folder>/
|
|- /code/
|- /data/
    |- /haum/
        |- /ContentEyevisit/
        |- /LogFiles/
    |- /metadata/
|- /figures/
|- /results/
|- README.md

Preprocessing and Descriptives

The first script, 01_preprocessing.R, preprocesses the raw log files: it first parses them into a format readable by standard statistics software such as R or Python and then converts them to event logs. A small R package that does the preprocessing, along with more information, can be found at https://gitea.iwm-tuebingen.de/R/mtt.

The second script, 02_descriptives.R, calculates descriptive statistics and creates plots to give an overall impression of the data set.

Conformance Checking

A normative Petri net for testing the data quality after preprocessing is created in 03_create-petrinet.py, and the actual data quality check is done in 04_conformance-checking.py. Both scripts are written in Python using the pm4py library. For more information and the full documentation, see https://pm4py.fit.fraunhofer.de/.
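To illustrate the idea behind such a conformance check, here is a minimal pure-Python sketch of token-based replay on a tiny, made-up Petri net. The net, its place names, and the activity names (`open`, `inspect`, `close`) are hypothetical and are not the normative net from 03_create-petrinet.py. The actual scripts rely on pm4py, whose replay is considerably more complete (for example, it handles silent transitions).

```python
from collections import Counter

# Hypothetical net: each transition (activity) consumes one token from every
# input place and produces one token on every output place.
NET = {
    "open":    {"inputs": ["p_start"],  "outputs": ["p_active"]},
    "inspect": {"inputs": ["p_active"], "outputs": ["p_active"]},
    "close":   {"inputs": ["p_active"], "outputs": ["p_end"]},
}
INITIAL = Counter({"p_start": 1})
FINAL = Counter({"p_end": 1})


def replay_fitness(trace, net=NET, initial=INITIAL, final=FINAL):
    """Replay one trace and return its token-based fitness in [0, 1]."""
    marking = Counter(initial)
    produced = sum(initial.values())  # initial tokens count as produced
    consumed = 0
    missing = 0
    for activity in trace:
        transition = net[activity]  # unknown activities raise KeyError
        for place in transition["inputs"]:
            if marking[place] == 0:
                missing += 1  # token had to be created artificially
            else:
                marking[place] -= 1
            consumed += 1
        for place in transition["outputs"]:
            marking[place] += 1
            produced += 1
    # Consume the final marking, then count the tokens left behind.
    for place, count in final.items():
        for _ in range(count):
            if marking[place] == 0:
                missing += 1
            else:
                marking[place] -= 1
            consumed += 1
    remaining = sum(marking.values())
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


fit_trace = replay_fitness(["open", "inspect", "close"])
unfit_trace = replay_fitness(["inspect", "close"])  # skips "open"
```

A trace that the net can replay without missing or remaining tokens gets a fitness of 1; traces that deviate from the net get a lower value, which is what flags them for closer inspection.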

The next script 05_check-traces.R (written in R again) checks the corrupt trace found during conformance checking and exports the cleaned data sets used for the following analyses.

Clustering of Items

To answer the first research question of the thesis, "Do interaction patterns look different for different artworks? (Control-flow perspective)", process mining was applied to all paths separately for each item on the multi-touch table. Fitness, precision, generalizability, simplicity, soundness, number of connecting arcs, number of transitions, number of places, number of different variants, and the most frequent variant were obtained and saved to a CSV file (Python script 06_infos-items.py). This information was then read into R in the next script (07_item-clustering.R) and used (together with other features) for hierarchical clustering.
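Two of the per-item features listed above, the number of different variants and the most frequent variant, can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function name `variant_summary`, the example paths, and the activity names are hypothetical, and 06_infos-items.py itself uses pm4py.

```python
from collections import Counter


def variant_summary(traces):
    """Return (number of distinct variants, most frequent variant, its count)."""
    # A variant is the exact sequence of activities in a trace.
    variants = Counter(tuple(trace) for trace in traces)
    top_variant, top_count = variants.most_common(1)[0]
    return len(variants), top_variant, top_count


# Hypothetical paths for a single item; activity names are placeholders.
paths = [
    ["open", "inspect", "close"],
    ["open", "close"],
    ["open", "inspect", "close"],
]
n_variants, top_variant, top_count = variant_summary(paths)
```

Running the same function once per item yields one row of features per item, which is the shape a clustering step needs.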

Clustering of Cases

For the second research question, "What kind of patterns exist and are there typical user behaviors? (Case perspective)", six indicator variables for five proposed user navigation types were calculated in 08_case-characteristics.R and then used for hierarchical clustering and recursive partitioning to extract the different navigation types in script 09_user-navigation.R. The results were validated on data from 2018 in 10_validation.R. The variants of the cases were investigated in 11_investigate-variants.R, both for the complete data set and for the data used to investigate the navigation types (all log files from 2019). The clusters found for the navigation types were further investigated with process mining techniques in R (12_dfgs-case-clusters.R) and Python (13_pm-case-clusters.py).
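A directly-follows graph (DFG), as referenced in the name of script 12_dfgs-case-clusters.R, counts how often one activity is immediately followed by another. The following is a minimal sketch of computing one DFG per cluster. The cluster labels, traces, and activity names are hypothetical; the actual scripts use dedicated process mining libraries rather than this hand-rolled version.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in traces:
        # Pair each activity with its successor within the same trace.
        dfg.update(zip(trace, trace[1:]))
    return dfg


# Hypothetical cases grouped by an already computed cluster label.
clustered_cases = {
    "cluster_1": [["A", "B", "C"], ["A", "B", "B", "C"]],
    "cluster_2": [["A", "C"]],
}
dfgs = {
    label: directly_follows(traces)
    for label, traces in clustered_cases.items()
}
```

Comparing the edge counts between clusters shows where their typical navigation differs, for example whether an activity is repeated or skipped.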

Software versions

R

The following gives the output of sessionInfo() with all relevant packages attached.

R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=German_Germany.utf8 
[2] LC_CTYPE=German_Germany.utf8   
[3] LC_MONETARY=German_Germany.utf8
[4] LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils    
[6] datasets  methods   base     

other attached packages:
 [1] igraph_2.0.3      DiagrammeR_1.0.11 partykit_1.2-20  
 [4] mvtnorm_1.2-4     libcoin_1.0-10    rpart_4.1.23     
 [7] rgl_1.3.1         smacof_2.1-6      e1071_1.7-14     
[10] colorspace_2.1-0  plotrix_3.8-4     vioplot_0.4.0    
[13] zoo_1.8-12        sm_2.2-6.0        cluster_2.1.6    
[16] factoextra_1.0.7  ggplot2_3.5.0     dplyr_1.1.4      
[19] pbapply_1.7-2     petrinetR_0.3.0   processmapR_0.5.3
[22] eventdataR_0.3.1  edeaR_0.9.4       bupaR_0.5.4      
[25] lattice_0.22-6    devtools_2.4.5    usethis_2.2.3    
[28] mtt_0.0-1        

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3 rstudioapi_0.16.0 
  [3] jsonlite_1.8.8     shape_1.4.6.1     
  [5] magrittr_2.0.3     jomo_2.7-6        
  [7] nloptr_2.0.3       rmarkdown_2.26    
  [9] fs_1.6.3           vctrs_0.6.5       
 [11] heplots_1.6.2      memoise_2.0.1     
 [13] minqa_1.2.6        base64enc_0.1-3   
 [15] htmltools_0.5.8.1  forcats_1.0.0     
 [17] polynom_1.4-1      weights_1.0.4     
 [19] broom_1.0.5        Formula_1.2-5     
 [21] mitml_0.4-5        htmlwidgets_1.6.4 
 [23] plotly_4.10.4      lubridate_1.9.3   
 [25] cachem_1.0.8       mime_0.12         
 [27] lifecycle_1.0.4    iterators_1.0.14  
 [29] pkgconfig_2.0.3    Matrix_1.6-5      
 [31] R6_2.5.1           fastmap_1.1.1     
 [33] shiny_1.8.1.1      digest_0.6.35     
 [35] shinyTime_1.0.3    pkgload_1.3.4     
 [37] ellipse_0.5.0      Hmisc_5.1-2       
 [39] fansi_1.0.6        timechange_0.3.0  
 [41] nnls_1.5           gdata_3.0.0       
 [43] abind_1.4-5        httr_1.4.7        
 [45] compiler_4.3.3     proxy_0.4-27      
 [47] remotes_2.5.0      withr_3.0.0       
 [49] doParallel_1.0.17  htmlTable_2.4.2   
 [51] backports_1.4.1    carData_3.0-5     
 [53] pkgbuild_1.4.4     pan_1.9           
 [55] MASS_7.3-60.0.1    sessioninfo_1.2.2 
 [57] gtools_3.9.5       tools_4.3.3       
 [59] foreign_0.8-86     httpuv_1.6.15     
 [61] nnet_7.3-19        glue_1.7.0        
 [63] inum_1.0-5         nlme_3.1-164      
 [65] promises_1.3.0     checkmate_2.3.1   
 [67] generics_0.1.3     gtable_0.3.4      
 [69] class_7.3-22       tidyr_1.3.1       
 [71] data.table_1.15.4  hms_1.1.3         
 [73] car_3.1-2          xml2_1.3.6        
 [75] utf8_1.2.4         ggrepel_0.9.5     
 [77] foreach_1.5.2      pillar_1.9.0      
 [79] stringr_1.5.1      later_1.3.2       
 [81] splines_4.3.3      survival_3.5-8    
 [83] tidyselect_1.2.1   miniUI_0.1.1.1    
 [85] knitr_1.46         gridExtra_2.3     
 [87] xfun_0.43          visNetwork_2.1.2  
 [89] stringi_1.8.3      lazyeval_0.2.2    
 [91] boot_1.3-30        evaluate_0.23     
 [93] codetools_0.2-20   wordcloud_2.6     
 [95] tibble_3.2.1       cli_3.6.2         
 [97] xtable_1.8-4       munsell_0.5.1     
 [99] candisc_0.8-6      Rcpp_1.0.12       
[101] parallel_4.3.3     ellipsis_0.3.2    
[103] profvis_0.3.8      urlchecker_1.0.1  
[105] lme4_1.1-35.3      glmnet_4.1-8      
[107] viridisLite_0.4.2  ggthemes_5.1.0    
[109] scales_1.3.0       purrr_1.0.2       
[111] rlang_1.1.3        mice_3.16.0       

Python

The Python version used is Python 3.12.0. The following gives the output of pip list for the virtual environment in which all analyses were run.

Package           Version
----------------- ------------
asttokens         2.4.1
colorama          0.4.6
contourpy         1.2.0
cycler            0.12.1
decorator         5.1.1
deprecation       2.1.0
executing         2.0.1
fonttools         4.46.0
graphviz          0.20.1
intervaltree      3.1.0
ipython           8.18.1
jedi              0.19.1
joblib            1.3.2
kiwisolver        1.4.5
lxml              4.9.3
matplotlib        3.8.2
matplotlib-inline 0.1.6
networkx          3.2.1
numpy             1.26.2
packaging         23.2
pandas            2.1.3
parso             0.8.3
Pillow            10.1.0
pip               24.0
pm4py             2.7.8.4
prompt-toolkit    3.0.41
PuLP              2.8.0
pure-eval         0.2.2
pydotplus         2.0.2
Pygments          2.17.2
pyparsing         3.1.1
python-dateutil   2.8.2
pytz              2023.3.post1
scikit-learn      1.4.0
scipy             1.11.4
six               1.16.0
sortedcontainers  2.4.0
stack-data        0.6.3
StringDist        1.0.9
threadpoolctl     3.2.0
tqdm              4.66.1
traitlets         5.14.0
tzdata            2023.3
wcwidth           0.2.12
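To recreate a comparable environment, the table above can be turned into pinned requirements. The following is a small sketch under assumptions: the function name `pip_list_to_requirements` is hypothetical, and the repository is not known to ship a requirements file.

```python
def pip_list_to_requirements(pip_list_output):
    """Turn the two-column `pip list` table into `name==version` lines."""
    pins = []
    for line in pip_list_output.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        pins.append(f"{parts[0]}=={parts[1]}")
    return pins


# Shortened excerpt of the table above.
sample = """\
Package           Version
----------------- ------------
pandas            2.1.3
pm4py             2.7.8.4
"""
requirements = pip_list_to_requirements(sample)
```

Writing the resulting lines to a file such as requirements.txt (an assumed file name) would allow installing them with `pip install -r requirements.txt` in a fresh Python 3.12 virtual environment.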