# Accompanying Analysis Code for the Master Thesis "Analyzing Log Data from Multi-Touch Tables"

The multi-touch table at the Herzog-Anton-Ulrich-Museum (HAUM) in
Braunschweig lets museum visitors interact with about 70 artworks and 3
virtual cards containing information about the museum and its layout. The
table was installed in October 2016, and log files of visitor interactions
have been collected since November 2016. The master thesis for which this
repository was created analyzes data collected between December 14, 2016
and July 5, 2023. In total, the data set consists of 39,767 log files
containing 6,700,176 events.

The following gives a short overview of the analyses conducted. All
analysis scripts can be found in the `/code/` folder. The complete folder
structure of this project looks like this (not all folders are committed to
the repository):

```
/<parent_folder>/
|
|- /code/
|- /data/
|- /haum/
|- /ContentEyevisit/
|- /LogFiles/
|- /metadata/
|- /figures/
|- /results/
|-- README.md
```

## Preprocessing and Descriptives

The first script, `01_preprocessing.R`, preprocesses the raw log files by
first parsing them so that they are readable by standard statistics
software like R or Python and then converting them to event logs. A small R
package that does the preprocessing, along with more information, can be
found at <https://gitea.iwm-tuebingen.de/R/mtt>.

The second script, `02_descriptives.R`, calculates descriptive statistics
and creates plots to get an overall feeling for the data set.

## Conformance Checking

A normative Petri net for testing the data quality after preprocessing is
created in `03_create-petrinet.py`, and the actual data quality check is
done in `04_conformance-checking.py`. Both scripts are written in Python
using the pm4py library. For more information and the full documentation,
see <https://pm4py.fit.fraunhofer.de/>.

The next script, `05_check-traces.R` (written in R again), checks the
corrupt trace found during conformance checking and exports the cleaned
data sets used for the following analyses.

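The idea behind checking traces against a normative Petri net can be
illustrated with a minimal sketch in plain Python. It does not use pm4py and
is not the net from `03_create-petrinet.py`: the places, the activity names
(`touch`, `open_info`, `close_info`, `release`), and the helper `fits` are
hypothetical. The replay is strict, so a trace either fits completely or is
flagged.

```python
from collections import Counter

# Hypothetical normative net (an assumption, not the thesis model):
# each activity maps to (places it consumes from, places it produces to).
TRANSITIONS = {
    "touch": (["start"], ["active"]),
    "open_info": (["active"], ["info_open"]),
    "close_info": (["info_open"], ["active"]),
    "release": (["active"], ["end"]),
}
INITIAL = Counter({"start": 1})
FINAL = Counter({"end": 1})


def fits(trace):
    """Return True if the trace can be replayed from INITIAL to FINAL."""
    marking = Counter(INITIAL)
    for activity in trace:
        if activity not in TRANSITIONS:
            return False  # activity is unknown to the model
        inputs, outputs = TRANSITIONS[activity]
        if any(marking[place] < 1 for place in inputs):
            return False  # transition is not enabled: a token is missing
        for place in inputs:
            marking[place] -= 1
        for place in outputs:
            marking[place] += 1
    # Unary plus drops places with zero tokens before the comparison.
    return +marking == FINAL


# Hypothetical traces: one conforming, one with a skipped step.
good = ["touch", "open_info", "close_info", "release"]
bad = ["touch", "close_info", "release"]
print(fits(good), fits(bad))
```

Traces for which `fits` returns `False` are the kind of candidates that
would then be inspected by hand.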
## Clustering of Items

To answer the first research question of the thesis, "Do interaction
patterns look different for different artworks? (Control-flow
perspective)", process mining was applied to all paths separately for each
item on the multi-touch table. Fitness, precision, generalizability,
simplicity, soundness, number of connecting arcs, number of transitions,
number of places, number of different variants, and the most frequent
variant were obtained and saved to a CSV file (Python script
`06_infos-items.py`). This information was then read into R in the next
script (`07_item-clustering.R`) and used (together with other features) for
hierarchical clustering.

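Two of the features listed above, the number of different variants and the
most frequent variant, can be sketched in plain Python. This is only an
illustration under assumptions: the item IDs, path IDs, activity names, and
the helpers `variants_per_item` and `summarize` are hypothetical and are not
taken from `06_infos-items.py`.

```python
from collections import Counter, defaultdict

# Hypothetical event log rows: (item, path, activity), already in time order.
EVENTS = [
    ("item_001", "p1", "touch"), ("item_001", "p1", "flip"),
    ("item_001", "p1", "release"),
    ("item_001", "p2", "touch"), ("item_001", "p2", "flip"),
    ("item_001", "p2", "release"),
    ("item_001", "p3", "touch"), ("item_001", "p3", "release"),
    ("item_002", "p4", "touch"), ("item_002", "p4", "release"),
]


def variants_per_item(events):
    """Count, for each item, how often each activity sequence occurs."""
    traces = defaultdict(list)  # (item, path) -> ordered list of activities
    for item, path, activity in events:
        traces[(item, path)].append(activity)
    variants = defaultdict(Counter)  # item -> Counter of variants
    for (item, _path), trace in traces.items():
        variants[item][tuple(trace)] += 1
    return variants


def summarize(variants):
    """Return the number of variants and the most frequent one per item."""
    return {
        item: {
            "n_variants": len(counts),
            "most_frequent": counts.most_common(1)[0][0],
        }
        for item, counts in variants.items()
    }


for item, info in summarize(variants_per_item(EVENTS)).items():
    print(item, info["n_variants"], " -> ".join(info["most_frequent"]))
```

A variant here is simply the ordered tuple of activities in one path, so two
paths belong to the same variant exactly when their activity sequences are
identical.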
## Clustering of Cases

For the second research question, "What kind of patterns exist and are
there typical user behaviors? (Case perspective)", six indicator variables
for five proposed user navigation types were calculated in
`08_case-characteristics.R` and then used for hierarchical clustering and
recursive partitioning to extract the different navigation types in
`09_user-navigation.R`. The results were validated on data from 2018 in
`10_validation.R`. The different case variants in the complete data set and
in the data used for investigating the navigation types (all log files from
2019) were examined in `11_investigate-variants.R`, and the resulting
clusters of navigation types were further investigated with process mining
techniques in R (`12_dfgs-case-clusters.R`) and Python
(`13_pm-case-clusters.py`).

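The file name `12_dfgs-case-clusters.R` suggests that directly-follows
graphs (DFGs) are used for the cluster comparison; that reading is an
assumption. A DFG counts how often one activity is immediately followed by
another within a case. The following plain-Python sketch shows the counting
step only, with hypothetical traces and a hypothetical helper
`directly_follows`; it is not the code from the repository.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in traces:
        # Pair every activity with its direct successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


# Hypothetical traces of one case cluster (activity names are assumptions).
cluster_traces = [
    ["touch", "flip", "release"],
    ["touch", "flip", "flip", "release"],
    ["touch", "release"],
]

for (a, b), count in sorted(directly_follows(cluster_traces).items()):
    print(f"{a} -> {b}: {count}")
```

Computing one such counter per cluster makes it possible to compare which
transitions dominate in each navigation type.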