# Accompanying Analysis Code for the Master Thesis "Analyzing Log Data from Multi-Touch Tables"
The multi-touch table at the Herzog-Anton-Ulrich-Museum (HAUM) in
Braunschweig lets museum visitors interact with about 70 artworks and
3 virtual cards containing information about the museum and its layout.
The table was installed at the museum in October 2016, and log files of
visitor interactions have been collected since November 2016. The master
thesis for which this repository was
created analyzes data collected between December 14, 2016 and July 5, 2023.
In total, the data set consists of 39,767 log files containing 6,700,176
events.
The following gives a short overview of the analyses conducted. All
analysis scripts can be found in the `/code/` folder. The complete folder
structure of this project looks like this (not all folders are committed to
the repository):
```
/<parent_folder>/
|
|- /code/
|- /data/
|- /haum/
|- /ContentEyevisit/
|- /LogFiles/
|- /metadata/
|- /figures/
|- /results/
|-- README.md
```
## Preprocessing and Descriptives
The first script `01_preprocessing.R` preprocesses the raw log files: it
first parses them so that they are readable by standard statistics software
like R or Python and then converts them to event logs. A small R package
that does the preprocessing, along with more information, can be found at
<https://gitea.iwm-tuebingen.de/R/mtt>.
The second script `02_descriptives.R` calculates some descriptive
statistics and creates plots to get an overall feel for the data set.
## Conformance Checking
A normative Petri net to test the data quality after the preprocessing is
created in `03_create-petrinet.py` and the actual data quality check is
done in `04_conformance-checking.py`. Both scripts are written in Python
using the pm4py library. For more information and the full documentation,
see <https://pm4py.fit.fraunhofer.de/>.
The next script `05_check-traces.R` (written in R again) checks the corrupt
trace found during conformance checking and exports the cleaned data sets
used for the following analyses.
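
One common way to check an event log against a normative Petri net is token-based replay. The following is a minimal plain-Python sketch of that idea, not the code in `04_conformance-checking.py` (which uses pm4py and may rely on a different technique). The net, the place names, and the activity names `open`, `read`, and `close` are hypothetical and only serve as an illustration.

```python
from collections import Counter


def replay_fitness(trace, net, initial, final):
    """Token-based replay fitness of one trace on a simple Petri net.

    net maps each activity to (input places, output places).
    Assumption: every activity in the trace has a transition in the net.
    """
    marking = Counter(initial)
    produced = sum(initial.values())  # initial tokens count as produced
    consumed = 0
    missing = 0
    for activity in trace:
        inputs, outputs = net[activity]
        for place in inputs:
            if marking[place] > 0:
                marking[place] -= 1
            else:
                missing += 1  # token had to be added artificially
            consumed += 1
        for place in outputs:
            marking[place] += 1
            produced += 1
    # Consume the tokens required by the final marking.
    for place, count in final.items():
        for _ in range(count):
            if marking[place] > 0:
                marking[place] -= 1
            else:
                missing += 1
            consumed += 1
    remaining = sum(marking.values())
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


# Hypothetical sequential net: start -> open -> p1 -> read -> p2 -> close -> end
net = {
    "open": (["start"], ["p1"]),
    "read": (["p1"], ["p2"]),
    "close": (["p2"], ["end"]),
}
initial = {"start": 1}
final = {"end": 1}

fit_ok = replay_fitness(["open", "read", "close"], net, initial, final)
fit_bad = replay_fitness(["open", "close"], net, initial, final)
```

A trace that follows the net exactly yields a fitness of 1, while a trace that skips a step leaves tokens behind and needs missing tokens, which lowers the value.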
## Clustering of Items
To answer the first research question in the thesis, "Do interaction
patterns look different for different artworks? (Control-flow perspective)",
process mining was applied to all paths separately for each item on the
multi-touch table. Fitness, precision, generalizability, simplicity,
soundness, number of connecting arcs, number of transitions, number of
places, number of different variants, and the most frequent variant were
obtained and saved to a CSV file (Python script `06_infos-items.py`). This
information was then read into R in the next script
(`07_item-clustering.R`) and used (together with other features) for
hierarchical clustering.
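
Two of the per-item features listed above, the number of different variants and the most frequent variant, can be illustrated without any process mining library. The sketch below assumes the event log is available as chronologically ordered `(item, path_id, activity)` tuples; this layout, the function name `variants_per_item`, and the activity names are hypothetical and not taken from `06_infos-items.py`.

```python
from collections import Counter, defaultdict


def variants_per_item(events):
    """Count trace variants per item.

    events: iterable of (item, path_id, activity) tuples, assumed to be
    in chronological order within each path.
    """
    # Collect the activity sequence (trace) of every path.
    traces = defaultdict(list)
    for item, path_id, activity in events:
        traces[(item, path_id)].append(activity)

    # A variant is a distinct activity sequence; count them per item.
    variants = defaultdict(Counter)
    for (item, _path_id), activities in traces.items():
        variants[item][tuple(activities)] += 1

    return {
        item: {
            "n_variants": len(counter),
            "most_frequent": counter.most_common(1)[0][0],
        }
        for item, counter in variants.items()
    }


# Hypothetical toy log with two items and four paths.
events = [
    ("item_1", "p1", "flipCard"),
    ("item_1", "p1", "openTopic"),
    ("item_1", "p2", "flipCard"),
    ("item_1", "p2", "openTopic"),
    ("item_1", "p3", "flipCard"),
    ("item_2", "p4", "move"),
]
info = variants_per_item(events)
```

The resulting dictionary has one entry per item and could be written to a CSV file alongside the model-based measures.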
## Clustering of Cases
For the second research question, "What kind of patterns exist and are there
typical user behaviors? (Case perspective)", six indicator variables for
five proposed user navigation types were calculated in
`08_case-characteristics.R` and then used for hierarchical clustering and
recursive partitioning to extract the different navigation types in script
`09_user-navigation.R`. A validation of the results for data from 2018 was
done in `10_validation.R`. The different variants of the cases, both for the
complete data set and for the data used for investigating the navigation
types (all log files from 2019), were examined in
`11_investigate-variants.R`, and the clusters found for the navigation types
were further investigated with process mining techniques in R
(`12_dfgs-case-clusters.R`) and Python (`13_pm-case-clusters.py`).
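
The script name `12_dfgs-case-clusters.R` suggests that directly-follows graphs (DFGs) are compared across case clusters. As a rough sketch of what such a graph contains, the following plain-Python example counts how often one activity directly follows another within each cluster. The cluster labels, the activity names, and the function `directly_follows` are hypothetical; the actual scripts use R and pm4py.

```python
from collections import Counter


def directly_follows(traces):
    """Count directly-follows pairs (a, b) over a list of traces."""
    dfg = Counter()
    for trace in traces:
        # Pair each activity with its immediate successor.
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


# Hypothetical traces grouped by case cluster.
traces_by_cluster = {
    "cluster_1": [["move", "flipCard", "openTopic"], ["move", "flipCard"]],
    "cluster_2": [["flipCard", "openTopic", "openTopic"]],
}
dfgs = {
    cluster: directly_follows(traces)
    for cluster, traces in traces_by_cluster.items()
}
```

Comparing the edge counts between clusters gives a first impression of how the navigation types differ in their control flow.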