# Accompanying Analysis Code for the Master Thesis "XXX"

The multi-touch table at the Herzog-Anton-Ulrich-Museum (HAUM) in
Braunschweig gives museum visitors the opportunity to interact with
about 70 artworks and 3 virtual cards containing information about the
museum and its layout. The table was installed in October 2016, and log
files of visitor interactions have been collected since November 2016. The
master's thesis for which this repository was created analyzed data
collected between December 14, 2016 and July 5, 2023. In total, the data
set consists of 39,767 log files containing 6,700,176 events.

The following gives a short overview of the analyses conducted. All
analysis scripts can be found in the `/code/` folder.

## Preprocessing and Descriptives

The first script, `01_preprocessing.R`, preprocesses the raw log files: it
first parses them so that they are readable by standard statistics software
such as R or Python and then converts them to event logs. A small R package
that does the preprocessing, along with more information, can be found at
<https://gitea.iwm-tuebingen.de/R/mtt>.
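
As a rough illustration of what an event log looks like in the process
mining sense: every event carries a case, an activity, and a timestamp, so
that events can be grouped into one ordered trace per case. The sketch below
uses made-up rows and hypothetical field names (`case`, `activity`,
`timestamp`); it is not the code of the mtt package and does not reflect the
actual log format.

```python
from collections import defaultdict

# Made-up example rows. The field names and activity labels are assumptions
# for illustration, not the columns produced by 01_preprocessing.R.
events = [
    {"case": 1, "activity": "read", "timestamp": "2016-12-14 10:00:05"},
    {"case": 1, "activity": "open", "timestamp": "2016-12-14 10:00:01"},
    {"case": 2, "activity": "open", "timestamp": "2016-12-14 10:02:00"},
    {"case": 1, "activity": "close", "timestamp": "2016-12-14 10:00:09"},
]


def to_traces(rows):
    """Group events by case and order each case by timestamp."""
    by_case = defaultdict(list)
    # ISO-like timestamps sort correctly as plain strings.
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        by_case[row["case"]].append(row["activity"])
    return dict(by_case)


traces = to_traces(events)
```

Here `traces` maps each case to its ordered list of activities, which is the
form most process mining techniques start from.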

The second script, `02_descriptives.R`, calculates descriptive statistics
and creates plots to give an overall impression of the data set.

## Conformance Checking

A normative Petri net for testing the data quality after preprocessing is
created in `03_create-petrinet.py`, and the actual data quality check is
done in `04_conformance-checking.py`. Both scripts are written in Python
using the pm4py library. For more information and the full documentation,
see <https://pm4py.fit.fraunhofer.de/>.
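
To make the idea of checking traces against a normative Petri net concrete,
here is a minimal sketch in plain Python. It is a strict replay check on a
tiny hand-written net, not the pm4py-based procedure used in the scripts. The
places, activities, and the net itself are invented for illustration, and the
sketch assumes exactly one transition per activity label and no silent
transitions.

```python
# A tiny hand-written Petri net: each transition consumes one token from
# every input place and produces one token in every output place.
# Place and activity names are hypothetical, not the thesis model.
NET = {
    "open": {"in": ["start"], "out": ["opened"]},
    "read": {"in": ["opened"], "out": ["opened"]},
    "close": {"in": ["opened"], "out": ["end"]},
}
INITIAL = {"start": 1}
FINAL = {"end": 1}


def fits(trace, net=NET, initial=INITIAL, final=FINAL):
    """Return True if the trace replays from the initial to the final marking."""
    marking = dict(initial)
    for activity in trace:
        transition = net.get(activity)
        if transition is None:
            return False  # activity is not part of the normative model
        if any(marking.get(p, 0) < 1 for p in transition["in"]):
            return False  # a required token is missing
        for p in transition["in"]:
            marking[p] -= 1
        for p in transition["out"]:
            marking[p] = marking.get(p, 0) + 1
    # Ignore empty places when comparing with the final marking.
    return {p: n for p, n in marking.items() if n > 0} == final


ok = fits(["open", "read", "read", "close"])
broken = fits(["read", "close"])
```

A trace that cannot be replayed, such as the second one, is a candidate for
closer inspection. Conformance checking libraries typically report graded
fitness values rather than this all-or-nothing answer.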

The next script, `05_check-traces.R` (written in R again), checks the
corrupt trace found during conformance checking and exports the cleaned data
sets used for the following analyses.

## Clustering of Items

To answer the first research question of the thesis, "Do interaction
patterns look different for different artworks? (Control-flow perspective)",
process mining was applied to all paths, separately for each item on the
multi-touch table. Fitness, precision, generalizability, simplicity,
soundness, number of connecting arcs, number of transitions, number of
places, number of different variants, and the most frequent variant were
obtained and saved to a CSV file (Python script `06_infos-items.py`). This
information was then read into R in the next script
(`07_item-clustering.R`) and used (together with other features) for
hierarchical clustering.
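
Two of the features listed above, the number of different variants and the
most frequent variant, can be sketched in a few lines. A variant is a
distinct sequence of activities. The traces below are made up, and the
function and key names are hypothetical; this is not the code in
`06_infos-items.py`.

```python
from collections import Counter

# Made-up traces (one list of activities per path); not real data.
item_traces = [
    ["open", "read", "close"],
    ["open", "close"],
    ["open", "read", "close"],
]


def variant_summary(traces):
    """Count distinct variants and find the most frequent one."""
    counts = Counter(tuple(trace) for trace in traces)
    variant, frequency = counts.most_common(1)[0]
    return {
        "n_variants": len(counts),
        "most_frequent": variant,
        "frequency": frequency,
    }


summary = variant_summary(item_traces)
```

Computing such a summary once per item yields one row of features per item,
which is the shape a clustering step expects.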

## Clustering of Cases

For the second research question, "What kind of patterns exist and are there
typical user behaviors? (Case perspective)", six indicator variables for
five proposed user navigation types were calculated in
`08_case-characteristics.R` and then used for hierarchical clustering and
recursive partitioning to extract the different navigation types in script
`09_user-navigation.R`. The results were validated with data from 2018 in
`10_validation.R`. The variants of the cases were investigated in
`11_investigate-variants.R`, both for the complete data set and for the data
used to investigate the navigation types (all log files from 2019). The
clusters of navigation types that were found were further investigated with
process mining techniques in R (`12_dfgs-case-clusters.R`) and Python
(`13_pm-case-clusters.py`).
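
One common process mining view for comparing clusters is the directly-follows
graph (DFG), which counts how often one activity is immediately followed by
another. The sketch below shows the idea in plain Python with made-up traces;
it assumes that each cluster is available as a list of traces and is not
taken from the scripts named above.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


# Made-up traces for one hypothetical cluster; not real data.
cluster_traces = [
    ["open", "read", "close"],
    ["open", "close"],
    ["open", "read", "read", "close"],
]

dfg = directly_follows(cluster_traces)
```

Building one such graph per cluster and comparing the edge counts gives a
first impression of how navigation differs between clusters.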