Accompanying Analysis Code for the Master Thesis "Analyzing Log Data from Multi-Touch Tables"
The multi-touch table at the Herzog-Anton-Ulrich-Museum (HAUM) in Braunschweig gives museum visitors the opportunity to interact with about 70 artworks and 3 virtual cards containing information about the museum and its layout. The table was installed in October 2016, and log files of visitor interactions have been collected since November 2016. The master thesis for which this repository was created analyzes data collected between December 14, 2016 and July 5, 2023. In total, the data set consists of 39,767 log files containing 6,700,176 events.
The following gives a short overview of the analyses conducted. All
analysis scripts can be found in the /code/ folder. The complete folder
structure of this project looks like this (not all folders are committed to
the repository):
/<parent_folder>/
|
|- /code/
|- /data/
|- /haum/
|- /ContentEyevisit/
|- /LogFiles/
|- /metadata/
|- /figures/
|- /results/
|-- README.md
Preprocessing and Descriptives
The first script, 01_preprocessing.R, preprocesses the raw log files by
first parsing them so that they are readable by standard statistics software
like R or Python and then converting them to event logs. A small R package
that does the preprocessing, along with more information, can be found at
https://gitea.iwm-tuebingen.de/R/mtt.
The second script, 02_descriptives.R, calculates some descriptive
statistics and creates plots to get an overall feeling for the data set.
Conformance Checking
A normative Petri net to test the data quality after the preprocessing is
created in 03_create-petrinet.py, and the actual data quality check is
done in 04_conformance-checking.py. Both scripts are written in Python
using the pm4py library. For more information and the full documentation go
to https://pm4py.fit.fraunhofer.de/.
The next script, 05_check-traces.R (written in R again), checks the corrupt
trace found during conformance checking and exports the cleaned data sets
used for the following analyses.
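To illustrate the idea behind the conformance check, the following is a minimal sketch of token-based replay on a tiny sequential net, written with the Python standard library only. It is not the code from 03_create-petrinet.py or 04_conformance-checking.py, which use pm4py. The net, the place names and the activity names (touch, open, browse, close) are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical normative net (NOT the net from the thesis): each activity maps to
# (input places, output places) of the transition with that label.
NET = {
    "touch": (["start"], ["touched"]),
    "open": (["touched"], ["opened"]),
    "browse": (["opened"], ["opened"]),  # self-loop: browsing may repeat
    "close": (["opened"], ["end"]),
}


def replay_trace(trace, net, initial="start", final="end"):
    """Replay one trace on the net and count missing and remaining tokens.

    Assumes every activity in the trace is a transition label of the net.
    """
    marking = Counter({initial: 1})
    produced, consumed, missing = 1, 0, 0  # the initial token counts as produced

    for activity in trace:
        inputs, outputs = net[activity]
        for place in inputs:
            if marking[place] == 0:
                missing += 1  # token was not there and is added artificially
            else:
                marking[place] -= 1
            consumed += 1
        for place in outputs:
            marking[place] += 1
            produced += 1

    # At the end of the trace, the token in the final place is consumed.
    if marking[final] == 0:
        missing += 1
    else:
        marking[final] -= 1
    consumed += 1

    remaining = sum(marking.values())
    fitness = 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)
    return {
        "missing": missing,
        "remaining": remaining,
        "fitness": fitness,
        "fits": missing == 0 and remaining == 0,
    }


good = replay_trace(["touch", "open", "browse", "browse", "close"], NET)
bad = replay_trace(["touch", "close"], NET)  # "open" is skipped
```

In this sketch, a trace that skips a step needs an artificially added token and leaves another token behind, so its fitness drops below 1. Traces flagged this way are the kind of candidates that a follow-up check such as 05_check-traces.R could inspect.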
Clustering of Items
To answer the first research question of the thesis, "Do interaction
patterns look different for different artworks? (Control-flow perspective)",
process mining was applied to all paths separately for each item on the
multi-touch table. Fitness, precision, generalizability, simplicity,
soundness, number of connecting arcs, number of transitions, number of
places, number of different variants, and the most frequent variant were
obtained and saved to a CSV file (Python script 06_infos-items.py). This
information was then read into R in the next script (07_item-clustering.R)
and used (together with other features) for hierarchical clustering.
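Two of the per-item measures listed above, the number of different variants and the most frequent variant, can be sketched with the Python standard library alone. The following is an illustration under assumptions, not the code of 06_infos-items.py: the event tuples, item and path identifiers, activity names and CSV column names are all hypothetical, and events are assumed to be ordered by time within each path.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical event log: (item, path id, activity), ordered by time within a path.
EVENTS = [
    ("item_001", "path_1", "touch"),
    ("item_001", "path_1", "open"),
    ("item_001", "path_1", "close"),
    ("item_001", "path_2", "touch"),
    ("item_001", "path_2", "open"),
    ("item_001", "path_2", "close"),
    ("item_001", "path_3", "touch"),
    ("item_001", "path_3", "close"),
    ("item_002", "path_4", "touch"),
    ("item_002", "path_4", "close"),
]


def variant_summary(events):
    """Return one row per item with its variant count and most frequent variant."""
    # Collect the activity sequence (trace) of every path.
    traces = defaultdict(list)
    for item, path, activity in events:
        traces[(item, path)].append(activity)

    # A variant is a distinct activity sequence; count variants per item.
    per_item = defaultdict(Counter)
    for (item, _path), activities in traces.items():
        per_item[item][tuple(activities)] += 1

    rows = []
    for item in sorted(per_item):
        variants = per_item[item]
        top_variant, top_count = variants.most_common(1)[0]
        rows.append({
            "item": item,
            "n_variants": len(variants),
            "most_frequent_variant": " -> ".join(top_variant),
            "n_paths_most_frequent": top_count,
        })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text (written to a string instead of a file)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = variant_summary(EVENTS)
csv_text = to_csv(rows)
```

A table of this shape, with one row per item, is the kind of input that can then be read into R and combined with other features for clustering.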
Clustering of Cases
For the second research question, "What kind of patterns exist and are there
typical user behaviors? (Case perspective)", six indicator variables for
five proposed user navigation types were calculated in
08_case-characteristics.R and then used for hierarchical clustering and
recursive partitioning to extract the different navigation types in script
09_user-navigation.R. A validation of the results for data from 2018 was
done in 10_validation.R. The different variants of the cases were
investigated in 11_investigate-variants.R, both for the complete data set
and for the data used for investigating the navigation types (all log files
from 2019). The clusters found for the navigation types were further
investigated with process mining techniques in R (12_dfgs-case-clusters.R)
and Python (13_pm-case-clusters.py).
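Assuming that "dfgs" in the script name 12_dfgs-case-clusters.R stands for directly-follows graphs, the core of such a graph can be sketched as a count of how often one activity is immediately followed by another. The sketch below uses only the Python standard library; the traces and activity names are hypothetical, and it is not the code from the repository.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often each activity is directly followed by another one."""
    dfg = Counter()
    for trace in traces:
        # Pair every activity with its immediate successor in the same trace.
        for source, target in zip(trace, trace[1:]):
            dfg[(source, target)] += 1
    return dfg


# Hypothetical traces of the cases in one cluster.
cluster_traces = [
    ["touch", "open", "close"],
    ["touch", "open", "browse", "close"],
    ["touch", "close"],
]

dfg = directly_follows(cluster_traces)
for (source, target), count in sorted(dfg.items()):
    print(f"{source} -> {target}: {count}")
```

Computing one such graph per cluster makes it possible to compare which transitions dominate in each navigation type.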
Software versions
R
The following gives the output of sessionInfo() with all relevant
packages attached.
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.utf8
[2] LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=German_Germany.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=German_Germany.utf8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] grid stats graphics grDevices utils
[6] datasets methods base
other attached packages:
[1] igraph_2.0.3 DiagrammeR_1.0.11 partykit_1.2-20
[4] mvtnorm_1.2-4 libcoin_1.0-10 rpart_4.1.23
[7] rgl_1.3.1 smacof_2.1-6 e1071_1.7-14
[10] colorspace_2.1-0 plotrix_3.8-4 vioplot_0.4.0
[13] zoo_1.8-12 sm_2.2-6.0 cluster_2.1.6
[16] factoextra_1.0.7 ggplot2_3.5.0 dplyr_1.1.4
[19] pbapply_1.7-2 petrinetR_0.3.0 processmapR_0.5.3
[22] eventdataR_0.3.1 edeaR_0.9.4 bupaR_0.5.4
[25] lattice_0.22-6 devtools_2.4.5 usethis_2.2.3
[28] mtt_0.0-1
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.16.0
[3] jsonlite_1.8.8 shape_1.4.6.1
[5] magrittr_2.0.3 jomo_2.7-6
[7] nloptr_2.0.3 rmarkdown_2.26
[9] fs_1.6.3 vctrs_0.6.5
[11] heplots_1.6.2 memoise_2.0.1
[13] minqa_1.2.6 base64enc_0.1-3
[15] htmltools_0.5.8.1 forcats_1.0.0
[17] polynom_1.4-1 weights_1.0.4
[19] broom_1.0.5 Formula_1.2-5
[21] mitml_0.4-5 htmlwidgets_1.6.4
[23] plotly_4.10.4 lubridate_1.9.3
[25] cachem_1.0.8 mime_0.12
[27] lifecycle_1.0.4 iterators_1.0.14
[29] pkgconfig_2.0.3 Matrix_1.6-5
[31] R6_2.5.1 fastmap_1.1.1
[33] shiny_1.8.1.1 digest_0.6.35
[35] shinyTime_1.0.3 pkgload_1.3.4
[37] ellipse_0.5.0 Hmisc_5.1-2
[39] fansi_1.0.6 timechange_0.3.0
[41] nnls_1.5 gdata_3.0.0
[43] abind_1.4-5 httr_1.4.7
[45] compiler_4.3.3 proxy_0.4-27
[47] remotes_2.5.0 withr_3.0.0
[49] doParallel_1.0.17 htmlTable_2.4.2
[51] backports_1.4.1 carData_3.0-5
[53] pkgbuild_1.4.4 pan_1.9
[55] MASS_7.3-60.0.1 sessioninfo_1.2.2
[57] gtools_3.9.5 tools_4.3.3
[59] foreign_0.8-86 httpuv_1.6.15
[61] nnet_7.3-19 glue_1.7.0
[63] inum_1.0-5 nlme_3.1-164
[65] promises_1.3.0 checkmate_2.3.1
[67] generics_0.1.3 gtable_0.3.4
[69] class_7.3-22 tidyr_1.3.1
[71] data.table_1.15.4 hms_1.1.3
[73] car_3.1-2 xml2_1.3.6
[75] utf8_1.2.4 ggrepel_0.9.5
[77] foreach_1.5.2 pillar_1.9.0
[79] stringr_1.5.1 later_1.3.2
[81] splines_4.3.3 survival_3.5-8
[83] tidyselect_1.2.1 miniUI_0.1.1.1
[85] knitr_1.46 gridExtra_2.3
[87] xfun_0.43 visNetwork_2.1.2
[89] stringi_1.8.3 lazyeval_0.2.2
[91] boot_1.3-30 evaluate_0.23
[93] codetools_0.2-20 wordcloud_2.6
[95] tibble_3.2.1 cli_3.6.2
[97] xtable_1.8-4 munsell_0.5.1
[99] candisc_0.8-6 Rcpp_1.0.12
[101] parallel_4.3.3 ellipsis_0.3.2
[103] profvis_0.3.8 urlchecker_1.0.1
[105] lme4_1.1-35.3 glmnet_4.1-8
[107] viridisLite_0.4.2 ggthemes_5.1.0
[109] scales_1.3.0 purrr_1.0.2
[111] rlang_1.1.3 mice_3.16.0
Python
The Python version used is Python 3.12.0. The following gives the output
of pip list for the virtual environment in which all analyses were run.
Package Version
----------------- ------------
asttokens 2.4.1
colorama 0.4.6
contourpy 1.2.0
cycler 0.12.1
decorator 5.1.1
deprecation 2.1.0
executing 2.0.1
fonttools 4.46.0
graphviz 0.20.1
intervaltree 3.1.0
ipython 8.18.1
jedi 0.19.1
joblib 1.3.2
kiwisolver 1.4.5
lxml 4.9.3
matplotlib 3.8.2
matplotlib-inline 0.1.6
networkx 3.2.1
numpy 1.26.2
packaging 23.2
pandas 2.1.3
parso 0.8.3
Pillow 10.1.0
pip 24.0
pm4py 2.7.8.4
prompt-toolkit 3.0.41
PuLP 2.8.0
pure-eval 0.2.2
pydotplus 2.0.2
Pygments 2.17.2
pyparsing 3.1.1
python-dateutil 2.8.2
pytz 2023.3.post1
scikit-learn 1.4.0
scipy 1.11.4
six 1.16.0
sortedcontainers 2.4.0
stack-data 0.6.3
StringDist 1.0.9
threadpoolctl 3.2.0
tqdm 4.66.1
traitlets 5.14.0
tzdata 2023.3
wcwidth 0.2.12