# Accompanying Analysis Code for the Master Thesis "Analyzing Log Data from Multi-Touch Tables"

The multi-touch table at the Herzog-Anton-Ulrich-Museum (HAUM) in
Braunschweig gives museum visitors the opportunity to interact with
about 70 artworks and 3 virtual cards containing information about the
museum and its layout. The table was installed at the museum in October
2016, and log files of visitor interactions have been collected since
November 2016. The master thesis for which this repository was
created analyzes data collected between December 14, 2016 and July 5, 2023.
In total, the data set consists of 39,767 log files containing 6,700,176
events.

The following gives a short overview of the analyses conducted. All
analysis scripts can be found in the `/code/` folder. The complete folder
structure of this project looks like this (not all folders are committed to
the repository):

```
/<parent_folder>/
|
|- /code/
|- /data/
|- /haum/
|- /ContentEyevisit/
|- /LogFiles/
|- /metadata/
|- /figures/
|- /results/
|-- README.md
```

## Preprocessing and Descriptives

The first script `01_preprocessing.R` preprocesses the raw log files. It
first parses them so that they are readable by standard statistics software
like R or Python and then converts them to event logs. A small R package
that does the preprocessing, along with more information, can be found at
<https://gitea.iwm-tuebingen.de/R/mtt>.
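
The conversion from parsed log rows to an event log can be illustrated with a
minimal Python sketch. It is not the implementation of the `mtt` package: the
row layout, the `_start`/`_stop` naming, and the item identifiers are
assumptions made up for this example, since the raw log format is not
described here.

```python
from datetime import datetime

# Hypothetical parsed log rows: (timestamp, action, item). The real raw
# format and the real action names are assumptions for this sketch.
RAW_ROWS = [
    ("2016-12-14 10:00:00", "move_start", "item_001"),
    ("2016-12-14 10:00:04", "move_stop", "item_001"),
    ("2016-12-14 10:00:05", "flip_start", "item_001"),
    ("2016-12-14 10:00:30", "flip_stop", "item_001"),
]


def to_event_log(rows):
    """Pair each '<activity>_start' row with the next matching '_stop' row."""
    open_events = {}  # (activity, item) -> start timestamp
    events = []
    for stamp, action, item in rows:
        time = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        activity, _, phase = action.rpartition("_")
        key = (activity, item)
        if phase == "start":
            open_events[key] = time
        elif phase == "stop" and key in open_events:
            start = open_events.pop(key)
            events.append({
                "item": item,
                "activity": activity,
                "start": start,
                "complete": time,
                "duration_s": (time - start).total_seconds(),
            })
    return events


event_log = to_event_log(RAW_ROWS)
for event in event_log:
    print(event["item"], event["activity"], event["duration_s"])
```

Each resulting event carries a start and a completion timestamp, which is the
shape that process mining tools typically expect from an event log.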

The second script `02_descriptives.R` calculates some descriptive
statistics and creates plots to give an overall impression of the data set.

## Conformance Checking

A normative Petri net for testing the data quality after preprocessing is
created in `03_create-petrinet.py`, and the actual data quality check is
done in `04_conformance-checking.py`. Both scripts are written in Python
using the pm4py library. For more information and the full documentation, go
to <https://pm4py.fit.fraunhofer.de/>.
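
The idea behind checking traces against a normative Petri net can be sketched
in plain Python. This is a simplified stand-in for token-based replay, not the
pm4py API and not the net built in `03_create-petrinet.py`; the places,
transitions, and activity names below are hypothetical.

```python
from collections import Counter

# Hypothetical normative net: an item can be moved any number of times, may
# be flipped open, and has to be closed again afterwards.
# Each transition maps to (input places, output places).
NET = {
    "move": (["idle"], ["idle"]),
    "flipCard": (["idle"], ["open"]),
    "closeCard": (["open"], ["idle"]),
}
INITIAL = Counter({"idle": 1})
FINAL = Counter({"idle": 1})


def replay(trace, net=NET, initial=INITIAL, final=FINAL):
    """Replay one trace on the net; return (fits, number of missing tokens).

    Every activity in the trace is assumed to be a transition of the net.
    """
    marking = Counter(initial)
    missing = 0
    for activity in trace:
        inputs, outputs = net[activity]
        for place in inputs:
            if marking[place] > 0:
                marking[place] -= 1
            else:
                missing += 1  # the token had to be created artificially
        for place in outputs:
            marking[place] += 1
    marking = +marking  # drop places that hold no tokens
    return missing == 0 and marking == final, missing


print(replay(["move", "flipCard", "closeCard"]))
print(replay(["move", "closeCard"]))
```

A trace fits only if no token had to be created and the replay ends in the
final marking; traces that fail either condition are candidates for a closer
look at data quality.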

The next script `05_check-traces.R` (written in R again) checks the corrupt
trace found during conformance checking and exports the cleaned data sets
used for the following analyses.

## Clustering of Items

To answer the first research question of the thesis, "Do interaction
patterns look different for different artworks? (Control-flow perspective)",
process mining was applied to all paths separately for each item on the
multi-touch table. Fitness, precision, generalizability, simplicity,
soundness, number of connecting arcs, number of transitions, number of
places, number of different variants, and the most frequent variant were
obtained and saved to a CSV file (Python script `06_infos-items.py`). This
information was then read into R in the next script
(`07_item-clustering.R`) and used (together with other features) for
hierarchical clustering.
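
Two of the listed measures, the number of different variants and the most
frequent variant, can be illustrated with a short Python sketch. It does not
reproduce `06_infos-items.py`; the event rows, item identifiers, and activity
names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical event log rows: (item, path id, activity), already in time
# order within each path.
EVENTS = [
    ("item_001", 1, "move"), ("item_001", 1, "flipCard"),
    ("item_001", 2, "move"), ("item_001", 2, "flipCard"),
    ("item_001", 3, "move"),
    ("item_002", 4, "move"),
]


def variants_per_item(events):
    """Count distinct activity sequences (variants) per item."""
    traces = defaultdict(list)
    for item, path, activity in events:
        traces[(item, path)].append(activity)

    counts = defaultdict(Counter)
    for (item, _path), trace in traces.items():
        counts[item][tuple(trace)] += 1

    return {
        item: {
            "n_variants": len(counter),
            "most_frequent": counter.most_common(1)[0][0],
        }
        for item, counter in counts.items()
    }


item_infos = variants_per_item(EVENTS)
for item, info in item_infos.items():
    print(item, info["n_variants"], info["most_frequent"])
```

A variant here is simply the ordered tuple of activities of one path, so two
paths belong to the same variant when their activity sequences are identical.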

## Clustering of Cases

For the second research question, "What kind of patterns exist and are there
typical user behaviors? (Case perspective)", six indicator variables for
five proposed user navigation types were calculated in
`08_case-characteristics.R` and then used for hierarchical clustering and
recursive partitioning to extract the different navigation types in script
`09_user-navigation.R`. A validation of the results for data from 2018 was
done in `10_validation.R`. The different variants of the cases were
investigated in `11_investigate-variants.R`, both for the complete data set
and for the data used for investigating the navigation types (all log files
from 2019). The clusters found for the navigation types were further
investigated with process mining techniques in R (`12_dfgs-case-clusters.R`)
and Python (`13_pm-case-clusters.py`).
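
The thesis scripts perform the clustering in R. For readers who prefer
Python, the principle of agglomerative hierarchical clustering can be sketched
as follows. The case features, the Euclidean distance, and the complete
linkage criterion are assumptions chosen for this example and are not taken
from `09_user-navigation.R`.

```python
from itertools import combinations
from math import dist

# Hypothetical case features, e.g. two standardized indicator variables.
CASES = {
    "case_1": (0.1, 0.2),
    "case_2": (0.2, 0.1),
    "case_3": (0.9, 0.8),
    "case_4": (1.0, 0.9),
    "case_5": (0.15, 0.15),
}


def complete_linkage(a, b, points):
    """Distance between two clusters: the largest pairwise point distance."""
    return max(dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [frozenset([name]) for name in points]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: complete_linkage(pair[0], pair[1], points),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


clusters = agglomerate(CASES, n_clusters=2)
for cluster in clusters:
    print(sorted(cluster))
```

Stopping at a fixed number of clusters corresponds to cutting the dendrogram
at a chosen height; which cut is appropriate depends on the data.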

# Software versions

## R

The following gives the output of `sessionInfo()` with all relevant
packages attached.

```
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=German_Germany.utf8
[2] LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=German_Germany.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=German_Germany.utf8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils
[6] datasets  methods   base

other attached packages:
 [1] igraph_2.0.3      DiagrammeR_1.0.11 partykit_1.2-20
 [4] mvtnorm_1.2-4     libcoin_1.0-10    rpart_4.1.23
 [7] rgl_1.3.1         smacof_2.1-6      e1071_1.7-14
[10] colorspace_2.1-0  plotrix_3.8-4     vioplot_0.4.0
[13] zoo_1.8-12        sm_2.2-6.0        cluster_2.1.6
[16] factoextra_1.0.7  ggplot2_3.5.0     dplyr_1.1.4
[19] pbapply_1.7-2     petrinetR_0.3.0   processmapR_0.5.3
[22] eventdataR_0.3.1  edeaR_0.9.4       bupaR_0.5.4
[25] lattice_0.22-6    devtools_2.4.5    usethis_2.2.3
[28] mtt_0.0-1

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3 rstudioapi_0.16.0
  [3] jsonlite_1.8.8     shape_1.4.6.1
  [5] magrittr_2.0.3     jomo_2.7-6
  [7] nloptr_2.0.3       rmarkdown_2.26
  [9] fs_1.6.3           vctrs_0.6.5
 [11] heplots_1.6.2      memoise_2.0.1
 [13] minqa_1.2.6        base64enc_0.1-3
 [15] htmltools_0.5.8.1  forcats_1.0.0
 [17] polynom_1.4-1      weights_1.0.4
 [19] broom_1.0.5        Formula_1.2-5
 [21] mitml_0.4-5        htmlwidgets_1.6.4
 [23] plotly_4.10.4      lubridate_1.9.3
 [25] cachem_1.0.8       mime_0.12
 [27] lifecycle_1.0.4    iterators_1.0.14
 [29] pkgconfig_2.0.3    Matrix_1.6-5
 [31] R6_2.5.1           fastmap_1.1.1
 [33] shiny_1.8.1.1      digest_0.6.35
 [35] shinyTime_1.0.3    pkgload_1.3.4
 [37] ellipse_0.5.0      Hmisc_5.1-2
 [39] fansi_1.0.6        timechange_0.3.0
 [41] nnls_1.5           gdata_3.0.0
 [43] abind_1.4-5        httr_1.4.7
 [45] compiler_4.3.3     proxy_0.4-27
 [47] remotes_2.5.0      withr_3.0.0
 [49] doParallel_1.0.17  htmlTable_2.4.2
 [51] backports_1.4.1    carData_3.0-5
 [53] pkgbuild_1.4.4     pan_1.9
 [55] MASS_7.3-60.0.1    sessioninfo_1.2.2
 [57] gtools_3.9.5       tools_4.3.3
 [59] foreign_0.8-86     httpuv_1.6.15
 [61] nnet_7.3-19        glue_1.7.0
 [63] inum_1.0-5         nlme_3.1-164
 [65] promises_1.3.0     checkmate_2.3.1
 [67] generics_0.1.3     gtable_0.3.4
 [69] class_7.3-22       tidyr_1.3.1
 [71] data.table_1.15.4  hms_1.1.3
 [73] car_3.1-2          xml2_1.3.6
 [75] utf8_1.2.4         ggrepel_0.9.5
 [77] foreach_1.5.2      pillar_1.9.0
 [79] stringr_1.5.1      later_1.3.2
 [81] splines_4.3.3      survival_3.5-8
 [83] tidyselect_1.2.1   miniUI_0.1.1.1
 [85] knitr_1.46         gridExtra_2.3
 [87] xfun_0.43          visNetwork_2.1.2
 [89] stringi_1.8.3      lazyeval_0.2.2
 [91] boot_1.3-30        evaluate_0.23
 [93] codetools_0.2-20   wordcloud_2.6
 [95] tibble_3.2.1       cli_3.6.2
 [97] xtable_1.8-4       munsell_0.5.1
 [99] candisc_0.8-6      Rcpp_1.0.12
[101] parallel_4.3.3     ellipsis_0.3.2
[103] profvis_0.3.8      urlchecker_1.0.1
[105] lme4_1.1-35.3      glmnet_4.1-8
[107] viridisLite_0.4.2  ggthemes_5.1.0
[109] scales_1.3.0       purrr_1.0.2
[111] rlang_1.1.3        mice_3.16.0
```

## Python

The Python version used is `Python 3.12.0`. The following gives the output
of `pip list` for the virtual environment all analyses have been run in.

```
Package           Version
----------------- ------------
asttokens         2.4.1
colorama          0.4.6
contourpy         1.2.0
cycler            0.12.1
decorator         5.1.1
deprecation       2.1.0
executing         2.0.1
fonttools         4.46.0
graphviz          0.20.1
intervaltree      3.1.0
ipython           8.18.1
jedi              0.19.1
joblib            1.3.2
kiwisolver        1.4.5
lxml              4.9.3
matplotlib        3.8.2
matplotlib-inline 0.1.6
networkx          3.2.1
numpy             1.26.2
packaging         23.2
pandas            2.1.3
parso             0.8.3
Pillow            10.1.0
pip               24.0
pm4py             2.7.8.4
prompt-toolkit    3.0.41
PuLP              2.8.0
pure-eval         0.2.2
pydotplus         2.0.2
Pygments          2.17.2
pyparsing         3.1.1
python-dateutil   2.8.2
pytz              2023.3.post1
scikit-learn      1.4.0
scipy             1.11.4
six               1.16.0
sortedcontainers  2.4.0
stack-data        0.6.3
StringDist        1.0.9
threadpoolctl     3.2.0
tqdm              4.66.1
traitlets         5.14.0
tzdata            2023.3
wcwidth           0.2.12
```