Updated README with all infos on decisions as RMD and MD

2024-04-09 16:55:05 +02:00
parent a7316eacd6
commit a88917459a
6 changed files with 1135 additions and 7 deletions
@@ -0,0 +1 @@
+^README\.Rmd$
@@ -0,0 +1,541 @@
+---
+output: github_document
+---
+
+<!-- README.md is generated from README.Rmd. Please edit that file -->
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>",
+  fig.path = "man/figures/README-",
+  out.width = "100%"
+)
+```
+
+# R package mtt
+
+![mtt package](man/figures/logo.png)
+
+This package was created to process log files obtained from multi-touch
+tables at the Leibniz-Institut für Wissensmedien (IWM).
+
+## Installation
+
+It can be installed via
+
+`devtools::install_git("https://gitea.iwm-tuebingen.de/R/mtt.git")`
+
+If you get an error message, you probably need to install `git2r` first with
+
+`install.packages("git2r")`.
+
+The package depends on the following R packages
+
+* `dplyr`
+* `pbapply`
+* `XML`
+* `lubridate`
+
+so make sure they are installed as well.
+
+# Multi-Touch Table
+
+The multi-touch table at the Herzog-Anton-Ulrich-Museum (HAUM) in
+Braunschweig gives visitors of the museum the opportunity to interact with
+about 70 artworks and 3 virtual cards containing information about the
+museum and its layout. The table was installed at the museum in October
+2016 and since November 2016 log files from interactions of visitors of the
+museum have been collected. These log files are in an unstructured format
+and cannot be easily analyzed. The purpose of the following document is to
+describe how the data have been transformed and which decisions have been
+made along the way.
+
+<!--
+The implementation of the steps described here can be found at:
+https://gitea.iwm-tuebingen.de/R/mtt.
+-->
+
+# Data structure
+
+The log files contain lines that indicate the beginning and end of possible
+activities that can be performed when interacting with the artworks on the
+table. The layout of the table looks like pictures have been tossed on a
+large table. Every artwork is visible at the start configuration. People
+can move the pictures on the table, and they can scale and rotate them.
+Additionally, the virtual picture cards can be flipped in order to find
+more information about the artwork on the "back" of the card. One has to press
+a little `i` for more information in one of the bottom corners of the card.
+On the back of the card two to six information cards can be found with a
+teaser text about a certain topic. These topic cards can be opened and a
+hypertext with detailed information opens. Within these hypertexts certain
+technical terms can be clicked for lay people to get more information. This
+also opens up a pop-up. The events encoded in the raw log files therefore
+have the following structure.
+
+```
+"Start Application"     --> Start Application
+"Show Application"
+"Transform start"       --> Move
+"Transform stop"
+"Show Info"             --> Flip Card
+"Show Front"
+"Artwork/OpenCard"      --> Open Topic
+"Artwork/CloseCard"
+"ShowPopup"             --> Open Popup
+"HidePopup"
+```
+
+The right side shows what events can be extracted from these raw lines. The
+"Start Application" is not an event in the original sense since it only
+indicates if the table was started or maybe reset itself. This is not an
+interaction with the table and therefore not interesting in itself. All
+"Start Application" and "Show Application" are therefore excluded from the
+data when further processed and are only in the raw log files.
+
+# Parsing the raw log files
+
+The first step is to parse the raw log files that are stored by the
+application as text files in a rather unstructured format to a format that
+can be read by common statistics software packages. The data are therefore
+transferred to a spreadsheet format. The following section describes what
+problems were encountered while doing this.
+
+## Corrupt lines
+
+When reading the files containing the raw logs into R, a warning appears
+that says
+
+```
+Warning messages:
+  incomplete final line found on '2016/2016_11_18-11_31_0.log'
+  incomplete final line found on '2016/2016_11_18-11_38_30.log'
+  incomplete final line found on '2016/2016_11_18-11_40_36.log'
+  ...
+```
+
+When you open these files, it looks like the last line contains some binary
+content. It is unclear why and how this happens. So when reading the data,
+these lines were removed. A warning will be given that indicates how many
+files have been affected.
+
+## Extracted variables from raw log files
+
+The following variables (columns in the data frame) are extracted from the
+raw log file:
+
+* `fileId`: Containing the zero-left-padded file name of the raw log file
+  the data line has been extracted from
+
+* `folder`: The names of the folders in which the raw log files have been
+  organized. For the HAUM data set, the data are sorted by year (folders
+  2016, 2017, 2018, 2019, 2020, 2021, 2022, and 2023).
+
+* `date`: Extracted timestamp from the raw log file in the format
+  `yyyy-mm-dd hh:mm:ss`.
+
+* `timeMs`: Containing a timestamp in milliseconds that restarts with
+  every new raw log file.
+
+* `event`: Start and stop event tags. See above for possible values.
+
+* `item`: Identifier of the different items. This is a three-digit
+  (left-padded) number. The numbers of the items correspond to the
+  folder names in `/ContentEyevisit/eyevisit_cards_light/` and were
+  originally taken from the museum's catalogue.
+
+* `popup`: Name of the pop-up opened. This is only interesting for
+  "openPopup" events.
+
+* `topic`: The number of the topic card that has been opened at the back of
+  the item card. See below for a more detailed description of what these
+  numbers mean.
+
+* `x`: Value of x-coordinate in pixels on the 4K display ($3840 \times 2160$).
+
+* `y`: Value of y-coordinate in pixels.
+
+* `scale`: Number in 128 bit that indicates how much the card has been
+  scaled.
+
+* `rotation`: Degree of rotation from start configuration.
+
+<!-- TODO: After what time interval does the table reset itself to the
+  start configuration? -->
+
+## Variables after "closing of events"
+
+The raw log data consist of start and stop events for each event type.
+After preprocessing four event types are extracted: `move`, `flipCard`,
+`openTopic`, and `openPopup`. Except for the `move` events, which can occur
+at any time when interacting with an item card on the table, the events
+have a hierarchical order: An item card first needs to be flipped
+(`flipCard`), then the topic cards on the back of the card can be opened
+(`openTopic`), and finally pop-ups on these topic cards can be opened
+(`openPopup`). This implies that the event `openPopup` can only be present
+for a certain item, if the card has already been flipped (i.e., an event
+`flipCard` for the same item has already occurred).
+
+After preprocessing, the data frame is now in a wide format with columns
+for the start and the stop of each event and contains the following
+variables:
+
+* `fileId.start` / `fileId.stop`: See above.
+
+* `date.start` / `date.stop`: See above.
+
+* `folder`: Containing the folder name (see above).
+
+* `case`: A numerical variable indicating cases in the data. A "case"
+  indicates an interaction interval and could be defined in different ways.
+  Right now, a new case begins when no event has occurred for 20 seconds
+  or longer.
+
+* `path`: A path is defined as one interaction with one item. A path
+  can either start with a `flipCard` event or when an item has been
+  touched for the first time within this case. A path ends with the
+  item card being flipped close again or with the last movement of the
+  card within this case. One case can contain several paths with the same
+  item when the item is flipped open and flipped close again several
+  times within a short time.
+
+* `glossar`: An indicator variable with values 0/1 that tracks if a pop-up
+  has been opened from the glossar folder. These pop-ups can be assigned to
+  the wrong item since it is not possible to do this algorithmically.
+  It is possible that two items are flipped open that could both link to
+  the same pop-up from a glossar. The indicator variable is left as a
+  variable, so that these pop-ups can be easily deleted from the data.
+  Right now, glossar entries can be ignored completely by setting an
+  argument and this is done by default. Using the pop-ups from the glossar
+  will need a lot more love, before it behaves satisfactorily.
+
+* `event`: Indicating the event. Can take the values `move`, `flipCard`,
+  `openTopic`, and `openPopup`.
+
+* `item`: Identifier of the different artworks and information cards. This
+  is a three-digit (left-padded) number. See above.
+
+* `timeMs.start` / `timeMs.stop`: See above.
+
+* `duration`: Calculated by $timeMs.stop - timeMs.start$ in milliseconds.
+  For events spanning more than one log file, $600,000$ ms must be added
+  for each log file after the first. See below for details.
+
+* `topic`: See above.
+
+* `popup`: See above.
+
+* `x.start` / `x.stop`: See above.
+
+* `y.start` / `y.stop`: See above.
+
+* `distance`: Euclidean distance calculated from $(x.start, y.start)$ and
+  $(x.stop, y.stop)$.
+
+* `scale.start` / `scale.stop`: See above.
+
+* `scaleSize`: Relative scaling of item card, calculated by
+  $\frac{scale.stop}{scale.start}$.
+
+* `rotation.start` / `rotation.stop`: See above.
+
+* `rotationDegree`: Difference of rotation from $rotation.stop$ to
+  $rotation.start$.
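+
+The derived variables in the list above can be sketched in base R as
+follows. This is only an illustration with invented values, not the code
+used in the package. In particular, the sign convention
+`rotation.stop - rotation.start` for `rotationDegree` is an assumption.
+
+```r
+# Hypothetical single closed move event (values invented for illustration)
+ev <- data.frame(x.start = 100, y.start = 200, x.stop = 400, y.stop = 600,
+                 scale.start = 0.3, scale.stop = 0.6,
+                 rotation.start = 10, rotation.stop = 25)
+
+# Euclidean distance between start and stop position
+ev$distance <- sqrt((ev$x.stop - ev$x.start)^2 + (ev$y.stop - ev$y.start)^2)
+# Relative scaling of the item card
+ev$scaleSize <- ev$scale.stop / ev$scale.start
+# Change in rotation (assumed sign: stop minus start)
+ev$rotationDegree <- ev$rotation.stop - ev$rotation.start
+```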
+
+## How unclosed events are handled
+
+Events do not necessarily need to be completed. A person can, e.g., leave
+the table and not flip the item card close again. For `flipCard`,
+`openTopic`, and `openPopup` the data frame contains `NA` when the event
+does not complete. For `move` events it happens quite often that a start
+event follows a start event and a stop event follows a stop event.
+Technically a move event cannot *not* be finished and the number of events
+without a start or stop indicates that the time resolution was not
+sufficient to catch all these events accurately. Double start and stop
+`move` events have therefore been deleted from the data set.
+
+## Additional meta data
+
+For the HAUM data, I added meta data on state holidays and school
+vacations. 
+
+This led to the following additional variables:
+
+* `holiday`
+
+* `vacations`
+
+# Problems and how I handled them
+
+This lists some problems with the log data that required decisions. These
+decisions influence the outcome and maybe even the data quality. Hence, I
+tried to document how I handled these problems and explain the decisions I
+made.
+
+## Weird behavior of `timeMs` and neg. `duration` values
+
+`timeMs` resets itself every time a new log file starts. This means that
+the durations of events spanning more than one log file must be adjusted.
+Instead of just calculating $timeMs.stop - timeMs.start$, `timeMs.start`
+must be subtracted from the maximum duration of the log file where the
+event started ($600,000$ ms) and the `timeMs.stop` must be added. If the
+event spans more than two log files, a multiple of $600,000$ must be taken,
+e.g. for three log files it must be: $2 \times 600,000 - timeMs.start +
+timeMs.stop$ and so on.
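+
+A minimal sketch of this adjustment, assuming that every log file covers
+exactly 600,000 ms. The function name `adjust_duration` and its argument
+`n_files` (the number of log files an event spans) are hypothetical and
+only serve as illustration; they are not part of the package.
+
+```r
+# Hypothetical helper: n_files is the number of log files the event spans
+# (1 means that start and stop are in the same log file)
+adjust_duration <- function(timeMs.start, timeMs.stop, n_files = 1,
+                            file_ms = 600000) {
+  timeMs.stop - timeMs.start + (n_files - 1) * file_ms
+}
+
+adjust_duration(1000, 4000)                 # start and stop in the same file
+adjust_duration(599000, 2000, n_files = 2)  # event spans two log files
+```
+
+Both calls describe an event that lasted 3,000 ms. Without the adjustment
+the second one would get a negative duration.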
+
+```{r timems, echo = FALSE, results = FALSE, fig.show = TRUE}
+# Read data
+datraw <- read.table("../../MDS/2023ss/60100_master_thesis/analysis/code/results/raw_logfiles_2024-02-21_16-07-33.csv", sep = ";",
+                     header = TRUE)
+
+plot(timeMs ~ as.factor(fileId), datraw[1:5000,], xlab = "fileId")
+```
+
+The boxplot shows that we have a continuous range of values within one log
+file but that `timeMs` does not increase over log files. I kept
+`timeMs.start` and `timeMs.stop` and also `fileId.start` and `fileId.stop`
+in the data frame, so it is clear when events span more than one log file.
+
+<!--
+Infos from the programmer:
+
+"I also just went through the code from back then. The logging works like
+this: Once the application starts, a new log file is created every 10
+minutes. The start time from which the duration is calculated is reset
+each time. So duration is not "time since start of the application" but
+"time since restart of the logger". Your assumption is therefore correct:
+there should be no durations >10 minutes. The first entry of a log file
+can be anything between 0 and 10 minutes (depending on whether the table
+was in use at the time of the new logging interval). So if a case is
+spread over 2+ logs, you have to add 10 minutes to the duration for each
+log file after the first so that it fits."
+-->
+
+## Left padding of file IDs
+
+The file names of the raw log files are automatically generated and contain
+a timestamp. This timestamp is not well formed. First, it contains an
+incorrect month. The months go from 0 to 11, which means that the file name
+`2016_11_15-12_12_57.log` was collected on December 15, 2016 at 12:12 pm.
+Another problem is that the file names are not zero left padded, e.g.,
+`2016_11_15-12_2_57.log`. This file was collected on December 15, 2016 at
+12:02 pm and therefore before the file above. But most sorting algorithms
+will sort these files in the order shown below. In order to preprocess the
+data and close events that belong together, the data need to be sorted by
+events and artworks repeatedly. In order to get them back in the correct
+time order, it is necessary to order them based on three variables:
+`fileId.start`, `date.start` and `timeMs.start`. The file IDs therefore
+need to sort in the correct order (again see below for example). I zero
+left padded the log file names within the data frame using it as an
+identifier. These "file names" do not correspond exactly to the original
+raw log file names. This needs to be kept in mind when doing any kind of
+matching etc.
+
+```
+## what it looked like before left padding
+# 1422  ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:56 599671 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26874254
+# 1423 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57    621 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26523465
+# 1424 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57    677  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997736 13.26239605
+# 1425 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57    774 Transform start 076 076.xml NA 2092.25 2008.00 0.2999345 13.26239605
+# 1426 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57    850  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997107 13.26223362
+# 1427  ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:57 599916  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997771 13.26523465
+
+## what it looks like now
+# 1422 2016_11_15-12_02_57.log 2016-12-15 12:12:56 599671 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26874254
+# 1423 2016_11_15-12_02_57.log 2016-12-15 12:12:57 599916  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997771 13.26523465
+# 1424 2016_11_15-12_12_57.log 2016-12-15 12:12:57    621 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26523465
+# 1425 2016_11_15-12_12_57.log 2016-12-15 12:12:57    677  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997736 13.26239605
+# 1426 2016_11_15-12_12_57.log 2016-12-15 12:12:57    774 Transform start 076 076.xml NA 2092.25 2008.00 0.2999345 13.26239605
+# 1427 2016_11_15-12_12_57.log 2016-12-15 12:12:57    850  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997107 13.26223362
+```
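+
+The zero left padding can be sketched in base R as follows. The function
+name `pad_file_id` is hypothetical and the code only illustrates the idea
+under the assumption that every file name contains exactly six numeric
+parts (year, month, day, hour, minute, second); the package may do this
+differently.
+
+```r
+# Hypothetical helper that zero left pads all parts of a log file name
+pad_file_id <- function(x) {
+  x <- basename(x)  # drop the folder part of the path
+  # extract the six numeric parts of each file name
+  parts <- regmatches(x, gregexpr("[0-9]+", x))
+  vapply(parts, function(p) {
+    p <- as.integer(p)
+    sprintf("%04d_%02d_%02d-%02d_%02d_%02d.log",
+            p[1], p[2], p[3], p[4], p[5], p[6])
+  }, character(1))
+}
+
+files <- c("2016_11_15-12_12_57.log", "2016_11_15-12_2_57.log")
+sort(pad_file_id(files))
+```
+
+After padding, the file from 12:02 pm sorts before the file from 12:12 pm.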
+
+## Timestamps repeat
+
+The timestamps in the `date` variable record year, month, day, hour,
+minute and seconds. Since one second is not a very short time interval for
+a move on a touch display, this is not fine grained enough to bring events
+into the correct order, meaning there are events from the same log file
+having the same timestamp and even events from different log files having
+the same timestamp. The log files get written about every 10 minutes
+(which can easily be seen when looking at the file names of the raw log
+files). So in order to get events in the correct order, it is necessary to
+first order by file ID, within file ID then sort by timestamp `date` and
+then within these more coarse grained timestamps sort by `timeMs`. But as
+explained above, `timeMs` can only be sorted within one file ID, since the
+values do not increase consistently over log files, but have a new offset
+for each raw log file.
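+
+The sorting described above can be sketched with `order()` from base R.
+The small data frame is made up from values in the example output shown
+earlier and assumes that the file IDs are already zero left padded.
+
+```r
+# Four raw lines in scrambled order (values taken from the example above)
+dat <- data.frame(
+  fileId = c("2016_11_15-12_12_57.log", "2016_11_15-12_02_57.log",
+             "2016_11_15-12_12_57.log", "2016_11_15-12_02_57.log"),
+  date   = c("2016-12-15 12:12:57", "2016-12-15 12:12:57",
+             "2016-12-15 12:12:57", "2016-12-15 12:12:56"),
+  timeMs = c(677, 599916, 621, 599671)
+)
+
+# Sort by file ID first, then by timestamp, then by timeMs within the file
+dat_sorted <- dat[order(dat$fileId, dat$date, dat$timeMs), ]
+```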
+
+## x,y-coordinates outside of display range
+
+The display of the Multi-Touch-Table is a 4K-display with 3840 x 2160
+pixels. When you plot the start and stop coordinates, the display is
+clearly distinguishable. However, a lot of points are outside of the
+display range. This can happen when the art objects are scaled and then
+moved to the very edge of the table. Then it will record pixels outside of
+the table. These are actually valid data points and I will leave them as
+is.
+
+```{r xycoord}
+datlogs <- read.table("../../MDS/2023ss/60100_master_thesis/analysis/code/results/event_logfiles_2024-02-21_16-07-33.csv", sep = ";",
+                      header = TRUE)
+
+par(mfrow = c(1, 2))
+plot(y.start ~ x.start, datlogs)
+abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
+plot(y.stop ~ x.stop, datlogs)
+abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
+
+aggregate(cbind(x.start, x.stop, y.start, y.stop) ~ 1, datlogs, mean)
+```
+
+## Pop-ups from glossar cannot be assigned to a specific item
+
+All the information, pictures and texts for the topics and pop-ups are
+stored in `/data/haum/ContentEyevisit/eyevisit_cards_light/<item_number>`.
+Among other things, each folder contains XML-files with the information
+about any technical terms that can be opened from the hypertexts on the
+topic cards. Often this information is item dependent and then the
+corresponding XML-file is in the folder for this item. Sometimes, however,
+more general terms can be opened. In order to avoid multiple files
+containing the same information, these were stored in a folder called
+`glossar` and get accessed from there. The raw log files only contain the
+path to this glossar entry and did not record from which item it was
+accessed. I tried to assign these glossar entries to the correct items. The
+(very heuristic) approach was this:
+
+1. Create a lookup table with all XML-file names (possible pop-ups) from
+   the glossar folder and what items possibly call them. This was stored
+   as an `RData` object for easier handling but should maybe be stored in a
+   more interoperable format.
+
+2. I went through all possible pop-ups in this lookup table and stored the
+   items that are associated with it.
+
+3. I created a sub data frame without move events (since they can never be
+   associated with a pop-up) and went through every line and looked up if
+   an item and a topic card had been opened. If this was the case and a
+   glossar entry came up before the item was closed again, I assigned
+   this item to the glossar entry.
+
+This is heuristic since it is possible that several topic cards from
+different items are opened simultaneously and the glossar pop-up could
+be opened from either one (it could even be more than two, of course). In
+these cases the item that was opened closest to the glossar pop-up has
+been assigned, but this can never be completely error free.
+
+In addition, this heuristic only assigns a little more than half of the
+glossar entries. Since my heuristic only checks whether the last item that
+has been opened is a possible candidate, it misses all glossar pop-ups
+where another item has been opened in between. Writing a more elaborate
+algorithm is still an open TODO.
+
+All glossar pop-ups that do not get matched with an item are removed
+from the data set with a warning if the argument `glossar = TRUE` is set.
+Otherwise the glossar entries will be ignored completely.
+
+## Assign a `case` variable based on "time heuristic"
+
+One thing needed in order to work with the data set and use it for machine
+learning algorithms like process mining, is a variable that tries to
+identify a case. A case variable will structure the data frame in a way
+that navigation behavior can actually be investigated. However, we do not
+know if several people are standing around the table interacting with it or
+just one very active person. The simplest way to define a case variable is
+to just use a time limit between events. This means that when the table has
+not been interacted with for, e.g., 20 seconds, it is assumed that a
+person moved on and a new person started interacting with the table. This
+is the easiest heuristic and implemented at the moment. Process mining
+shows that this simple approach works in a way that the correct process
+gets extracted by the algorithm.
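+
+The time heuristic can be sketched as follows. This is a simplified
+illustration with invented timestamps that measures the gap between the
+start times of consecutive events; how exactly the package measures the
+gap is not shown here, so this detail is an assumption.
+
+```r
+# Hypothetical start times of consecutive events, already in time order
+date.start <- as.POSIXct(c("2016-12-15 12:12:56", "2016-12-15 12:12:57",
+                           "2016-12-15 12:13:05", "2016-12-15 12:13:40",
+                           "2016-12-15 12:13:45"), tz = "UTC")
+
+# Gap to the previous event in seconds
+gap <- diff(as.numeric(date.start))
+# A new case starts with the first event and after every gap of 20 seconds
+# or longer
+case <- cumsum(c(TRUE, gap >= 20))
+```
+
+Here the first three events form case 1 and the last two events form case
+2, because 35 seconds pass between the third and the fourth event.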
+
+In order to investigate user behavior on a more fine grained level, it will
+be necessary to come up with a more elaborate approach. A better, still
+simple approach, could be to use this kind of time limit and additionally
+look at the distance between items interacted with within one time window.
+When items are far apart it seems plausible that more than one person
+interacted with them. Very short time lapses between events on different
+items could also be an indicator that more than one person is interacting
+with the table.
+
+## Assign a `path` variable
+
+The `path` variable is supposed to show one interaction trace with one
+artwork, meaning it starts when an artwork is touched or flipped and stops
+when it is closed again. It is easy to assign a path from flipping a card
+over opening (maybe several) topics and pop-ups for this artwork card until
+closing this card again. But one would like to assign the same path to
+move events surrounding this interaction. Again, this is not possible in an
+algorithmic way but only heuristically.
+
+Again, I used a time cutoff for this. First, if a `move` event occurs, it
+is checked, if the same item has been flipped less than 20 seconds
+beforehand. If yes, the same path indicator is assigned to this `move`. If
+not, temporarily a new "move indicator" is assigned. Then, a "backward
+pass" is applied, where it is checked if the same item is opened less than
+20 seconds _after_ the event occurs. If yes, that path indicator is
+assigned. For all the remaining moves, a new path number is assigned. This
+corresponds to items being moved without being flipped.
+
+## A `move` event does not record any change
+
+Most of the events in the log files are move events. Additionally, many of
+these move events are recorded but they do not indicate any change, meaning
+the only difference is the timestamp. All other variables indicating moves
+like `x.start` and `x.stop`, `rotation.start` and `rotation.stop` etc. do
+not show _any_ change. They represent about 2/3 of all move events. These
+events are probably short touches of the table without an actual
+interaction. They were therefore removed from the data set.
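+
+A minimal sketch of such a filter in base R, using invented values. It
+assumes that "no change" means identical start and stop values for
+position, scale, and rotation; the package may check this differently.
+
+```r
+# Hypothetical closed move events (values invented for illustration)
+moves <- data.frame(
+  x.start = c(100, 100, 250), x.stop = c(100, 180, 250),
+  y.start = c(200, 200, 300), y.stop = c(200, 200, 300),
+  scale.start = c(0.3, 0.3, 0.3), scale.stop = c(0.3, 0.3, 0.4),
+  rotation.start = c(10, 10, 0), rotation.stop = c(10, 10, 0)
+)
+
+# TRUE if position, scale, and rotation are identical at start and stop
+no_change <- with(moves, x.start == x.stop & y.start == y.stop &
+                    scale.start == scale.stop &
+                    rotation.start == rotation.stop)
+
+# Keep only the moves that record an actual change
+moves_kept <- moves[!no_change, ]
+```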
+
+## Card indices go from 0 to 7 (instead of 0 to 5 as expected)
+
+In the beginning I thought that the number for topics was the index of
+where the card was presented on the back of the item. But this is not
+correct. It is the number of the topic. There are eight topics in total:
+
+```
+Indices for topics:
+0   artist
+1   thema
+2   komposition
+3   leben des kunstwerks
+4   details
+5   licht und farbe
+6   extra info
+7   technik
+```
+On the back of items, there can be between 2 and 6 topic cards. Several of
+these topic cards can be about the same topic, e.g., there can be two topic
+cards assigned to the topic `thema`. It is impossible to find out if the
+same topic card was opened several times or if different topic cards with
+the same topic were opened from the same item. See example below for item
+"001".
+
+```{r topics, echo = FALSE}
+devtools::load_all()
+items <- sprintf("%03d", unique(datlogs$item))
+topics <- extract_topics(items, xmlfiles = paste0(items, ".xml"),
+                         xmlpath = "../../MDS/2023ss/60100_master_thesis/analysis/data/haum/ContentEyevisit/eyevisit_cards_light/")
+head(topics)
+```
+
+## New artworks "504" and "505" starting October 2022
+
+When I read in the complete data frame for the first time, all of a
+sudden there were 72 instead of 70 items. It seems like these two
+artworks appear on October 21, 2022.
+
+```{r newitems}
+summary(as.Date(datraw[datraw$item %in% c("504", "505"), "date"]))
+```
+
+The artworks seem to have been updated in general after October 21, 2022. The
+following table shows which items were presented in which years.
+
+```{r years}
+xtabs(~ item + lubridate::year(date.start), datlogs)
+```
+
+It shows that the artworks have been updated after the Corona pandemic. I
+think the table was also moved to a different location at that point.
+
@@ -1,7 +1,12 @@
+
+<!-- README.md is generated from README.Rmd. Please edit that file -->
+
 # R package mtt

-This package was created to process log files obtained from
-Multi-Touch-Tables at the IWM.
+![mtt package](man/figures/logo.png)
+
+This package was created to process log files obtained from multi-touch
+tables at the Leibniz-Institut für Wissensmedien (IWM).

 ## Installation

@@ -9,16 +14,597 @@ It can be installed via

 `devtools::install_git("https://gitea.iwm-tuebingen.de/R/mtt.git")`

-If you get an error message, you probably need to install `git2r`first with
+If you get an error message, you probably need to install `git2r` first
+with

 `install.packages("git2r")`.

 The package depends on the following R packages

-* `dplyr`
-* `pbapply`
-* `XML`
-* `lubridate`
+- `dplyr`
+- `pbapply`
+- `XML`
+- `lubridate`

 so make sure they are installed as well.

+# Multi-Touch Table
+
+The multi-touch table at the Herzog-Anton-Ulrich-Museum (HAUM) in
+Braunschweig gives visitors of the museum the opportunity to interact
+with about 70 artworks and 3 virtual cards containing information about
+the museum and its layout. The table was installed at the museum in
+October 2016 and since November 2016 log files from interactions of
+visitors of the museum have been collected. These log files are in an
+unstructured format and cannot be easily analyzed. The purpose of the
+following document is to describe how the data have been transformed
+and which decisions have been made along the way.
+
+<!--
+The implementation of the steps described here can be found at:
+https://gitea.iwm-tuebingen.de/R/mtt.
+-->
+
+# Data structure
+
+The log files contain lines that indicate the beginning and end of
+possible activities that can be performed when interacting with the
+artworks on the table. The layout of the table looks like pictures have
+been tossed on a large table. Every artwork is visible at the start
+configuration. People can move the pictures on the table, and they can
+scale and rotate them. Additionally, the virtual picture cards can be
+flipped in order to find more information about the artwork on the “back”
+of the card. One has to press a little `i` for more information in one
+of the bottom corners of the card. On the back of the card two to six
+information cards can be found with a teaser text about a certain topic.
+These topic cards can be opened and a hypertext with detailed
+information opens. Within these hypertexts certain technical terms can
+be clicked for lay people to get more information. This also opens up a
+pop-up. The events encoded in the raw log files therefore have the
+following structure.
+
+    "Start Application"     --> Start Application
+    "Show Application"
+    "Transform start"       --> Move
+    "Transform stop"
+    "Show Info"             --> Flip Card
+    "Show Front"
+    "Artwork/OpenCard"      --> Open Topic
+    "Artwork/CloseCard"
+    "ShowPopup"             --> Open Popup
+    "HidePopup"
+
+The right side shows what events can be extracted from these raw lines.
+The “Start Application” is not an event in the original sense since it
+only indicates if the table was started or maybe reset itself. This is
+not an interaction with the table and therefore not interesting in
+itself. All “Start Application” and “Show Application” are therefore
+excluded from the data when further processed and are only in the raw
+log files.
+
+# Parsing the raw log files
+
+The first step is to parse the raw log files that are stored by the
+application as text files in a rather unstructured format to a format
+that can be read by common statistics software packages. The data are
+therefore transferred to a spreadsheet format. The following section
+describes what problems were encountered while doing this.
+
+## Corrupt lines
+
+When reading the files containing the raw logs into R, a warning appears
+that says
+
+    Warning messages:
+      incomplete final line found on '2016/2016_11_18-11_31_0.log'
+      incomplete final line found on '2016/2016_11_18-11_38_30.log'
+      incomplete final line found on '2016/2016_11_18-11_40_36.log'
+      ...
+
+When you open these files, it looks like the last line contains some
+binary content. It is unclear why and how this happens. So when reading
+the data, these lines were removed. A warning will be given that
+indicates how many files have been affected.
+
+## Extracted variables from raw log files
+
+The following variables (columns in the data frame) are extracted from
+the raw log file:
+
+- `fileId`: Containing the zero-left-padded file name of the raw log
+  file the data line has been extracted from
+
+- `folder`: The names of the folders in which the raw log files have been
+  organized. For the HAUM data set, the data are sorted by year
+  (folders 2016, 2017, 2018, 2019, 2020, 2021, 2022, and 2023).
+
+- `date`: Extracted timestamp from the raw log file in the format
+  `yyyy-mm-dd hh:mm:ss`.
+
+- `timeMs`: Containing a timestamp in milliseconds that restarts with
+  every new raw log file.
+
+- `event`: Start and stop event tags. See above for possible values.
+
+- `item`: Identifier of the different items. This is a three-digit
+  (left-padded) number. The numbers of the items correspond to the
+  folder names in `/ContentEyevisit/eyevisit_cards_light/` and were
+  originally taken from the museum's catalogue.
+
+- `popup`: Name of the pop-up opened. This is only interesting for
+  “openPopup” events.
+
+- `topic`: The number of the topic card that has been opened at the back
+  of the item card. See below for a more detailed description of what these
+  numbers mean.
+
+- `x`: Value of x-coordinate in pixels on the 4K display
+  ($3840 \times 2160$).
+
+- `y`: Value of y-coordinate in pixels.
+
+- `scale`: Number in 128 bit that indicates how much the card has been
+  scaled.
+
+- `rotation`: Degree of rotation from start configuration.
+
+<!-- TODO: After what time interval does the table reset itself to the
+  start configuration? -->
+
+## Variables after “closing of events”
+
+The raw log data consist of start and stop events for each event type.
+After preprocessing four event types are extracted: `move`, `flipCard`,
+`openTopic`, and `openPopup`. Except for the `move` events, which can
+occur at any time when interacting with an item card on the table, the
+events have a hierarchical order: An item card first needs to be flipped
+(`flipCard`), then the topic cards on the back of the card can be opened
+(`openTopic`), and finally pop-ups on these topic cards can be opened
+(`openPopup`). This implies that the event `openPopup` can only be
+present for a certain item, if the card has already been flipped (i.e.,
+an event `flipCard` for the same item has already occurred).
+
+After preprocessing, the data frame is now in a wide format with columns
+for the start and the stop of each event and contains the following
+variables:
+
+- `fileId.start` / `fileId.stop`: See above.
+
+- `date.start` / `date.stop`: See above.
+
+- `folder`: Containing the folder name (see above).
+
+- `case`: A numerical variable indicating cases in the data. A “case”
+  indicates an interaction interval and could be defined in different
+  ways. Right now, a new case begins when no event has occurred for 20
+  seconds or longer.
+
+- `path`: A path is defined as one interaction with one item. A path can
+  either start with a `flipCard` event or when an item has been touched
+  for the first time within this case. A path ends with the item card
+  being flipped closed again or with the last movement of the card within
+  this case. One case can contain several paths with the same item when
+  the item is flipped open and flipped closed again several times within
+  a short time.
+
+- `glossar`: An indicator variable with values 0/1 that tracks if a
+  pop-up has been opened from the glossar folder. These pop-ups can be
+  assigned to the wrong item, since the assignment cannot be done
+  algorithmically with certainty: two items can be flipped open at the
+  same time that both link to the same pop-up from the glossar. The
+  indicator is kept as a variable so that these pop-ups can easily be
+  deleted from the data. Right now, glossar entries can be ignored
+  completely by setting an argument, and this is done by default.
+  Handling the pop-ups from the glossar needs a lot more work before it
+  behaves satisfactorily.
+
+- `event`: Indicating the event. Can take the values `move`, `flipCard`,
+  `openTopic`, and `openPopup`.
+
+- `item`: Identifier of the different artworks and information cards.
+  This is a three-digit (left-padded) number. See above.
+
+- `timeMs.start` / `timeMs.stop`: See above.
+
+- `duration`: Calculated by $timeMs.stop - timeMs.start$ in
+  milliseconds. For events spanning more than one log file, it needs to
+  be adjusted by adding $600,000 \times (\text{number of log files} - 1)$.
+  See below for details.
+
+- `topic`: See above.
+
+- `popup`: See above.
+
+- `x.start` / `x.stop`: See above.
+
+- `y.start` / `y.stop`: See above.
+
+- `distance`: Euclidean distance calculated from $(x.start, y.start)$
+  and $(x.stop, y.stop)$.
+
+- `scale.start` / `scale.stop`: See above.
+
+- `scaleSize`: Relative scaling of item card, calculated by
+  $\frac{scale.stop}{scale.start}$.
+
+- `rotation.start` / `rotation.stop`: See above.
+
+- `rotationDegree`: Difference of rotation from $rotation.stop$ to
+  $rotation.start$.
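+
+The derived variables in this list can be sketched in R as follows. This
+is a minimal sketch on a made-up toy data frame, not the package code.
+The sign convention for `rotationDegree` (stop minus start) is an
+assumption, and `duration` is shown only for events within one log file.
+
+``` r
+# Toy data with two events (all values are made up for illustration)
+dat <- data.frame(
+  timeMs.start   = c(1000, 2000), timeMs.stop   = c(1500, 2000),
+  x.start        = c(0, 100),     x.stop        = c(3, 100),
+  y.start        = c(0, 100),     y.stop        = c(4, 100),
+  scale.start    = c(0.25, 0.5),  scale.stop    = c(0.5, 0.5),
+  rotation.start = c(10, 0),      rotation.stop = c(25, 0)
+)
+
+# Duration in milliseconds (valid only within a single log file)
+dat$duration <- dat$timeMs.stop - dat$timeMs.start
+
+# Euclidean distance between start and stop position
+dat$distance <- sqrt((dat$x.stop - dat$x.start)^2 +
+                     (dat$y.stop - dat$y.start)^2)
+
+# Relative scaling of the item card
+dat$scaleSize <- dat$scale.stop / dat$scale.start
+
+# Change in rotation (assumed convention: stop minus start)
+dat$rotationDegree <- dat$rotation.stop - dat$rotation.start
+```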
+
+## How unclosed events are handled
+
+Events do not necessarily need to be completed. A person can, e.g.,
+leave the table and not flip the item card closed again. For `flipCard`,
+`openTopic`, and `openPopup` the data frame contains `NA` when the event
+does not complete. For `move` events it happens quite often that a start
+event follows a start event and a stop event follows a stop event.
+Technically a move event cannot *not* be finished, and the number of
+events without a start or stop indicates that the time resolution was
+not sufficient to catch all these events accurately. Double start and
+stop `move` events have therefore been deleted from the data set.
+
+## Additional meta data
+
+For the HAUM data, I added meta data on state holidays and school
+vacations.
+
+This led to the following additional variables:
+
+- `holiday`
+
+- `vacations`
+
+# Problems and how I handled them
+
+This lists some problems with the log data that required decisions.
+These decisions influence the outcome and maybe even the data quality.
+Hence, I tried to document how I handled these problems and explain the
+decisions I made.
+
+## Weird behavior of `timeMs` and neg. `duration` values
+
+`timeMs` resets itself every time a new log file starts. This means that
+the durations of events spanning more than one log file must be
+adjusted. Instead of just calculating $timeMs.stop - timeMs.start$,
+`timeMs.start` must be subtracted from the maximum duration of the log
+file where the event started ($600,000$ ms) and `timeMs.stop` must be
+added. If the event spans more than two log files, a multiple of
+$600,000$ must be taken, e.g., for three log files it must be
+$2 \times 600,000 - timeMs.start + timeMs.stop$, and so on.
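+
+A minimal sketch of this adjustment in R is given here. The function
+name `adjust_duration()` and its argument `n_files` (the number of log
+files an event spans, 1 if it starts and stops in the same file) are
+hypothetical and not part of the package; the sketch also assumes that
+every log file covers exactly 600,000 ms.
+
+``` r
+# Hypothetical helper: duration in ms for events spanning n_files log files
+adjust_duration <- function(timeMs.start, timeMs.stop, n_files,
+                            file_ms = 600000) {
+  stopifnot(all(n_files >= 1))
+  # add one full log file length for every file after the first
+  (n_files - 1) * file_ms - timeMs.start + timeMs.stop
+}
+
+adjust_duration(1000, 1500, n_files = 1)    # within one log file
+#> [1] 500
+adjust_duration(599671, 621, n_files = 2)   # spans two log files
+#> [1] 950
+adjust_duration(599671, 621, n_files = 3)   # spans three log files
+#> [1] 600950
+```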
+
+<img src="man/figures/README-timems-1.png" width="100%" />
+
+The boxplot shows that we have a continuous range of values within one
+log file but that `timeMs` does not increase over log files. I kept
+`timeMs.start` and `timeMs.stop` and also `fileId.start` and
+`fileId.stop` in the data frame, so it is clear when events span more
+than one log file.
+
+<!--
+Info from the programmer:
+
+"I also just went through the code from back then. The logging works like
+this: once the application starts, a new log file is created every 10
+minutes. The start time from which the duration is calculated is reset
+each time. So duration is not "time since start of the application" but
+"time since restart of the logger". Your assumption is therefore correct:
+there should be no durations >10 minutes. The first entry of a log file
+can be anything between 0 and 10 minutes (depending on whether the table
+was in use at the time of the new logging interval). So if a case is
+spread over 2+ logs, you have to add 10 minutes per log file after the
+first to the duration to make it fit."
+-->
+
+## Left padding of file IDs
+
+The file names of the raw log files are automatically generated and
+contain a timestamp. This timestamp is not well formed. First, it
+contains an incorrect month. The months go from 0 to 11, which means
+that the file named `2016_11_15-12_12_57.log` was collected on December
+15, 2016 at 12:12 pm. Another problem is that the file names are not
+zero left padded, e.g., `2016_11_15-12_2_57.log`. This file was
+collected on December 15, 2016 at 12:02 pm and therefore before the file
+above. But most sorting algorithms will sort these files in the wrong
+order, as shown below. In order to preprocess the data and close events
+that belong together, the data need to be sorted by events and artworks
+repeatedly. To get them back into the correct time order, it is
+necessary to order them based on three variables: `fileId.start`,
+`date.start`, and `timeMs.start`. The file IDs therefore need to sort in
+the correct order (again, see the example below). I zero left padded the
+log file names within the data frame and use them as an identifier.
+These “file names” do not correspond exactly to the original raw log
+file names. This needs to be kept in mind when doing any kind of
+matching.
+
+    ## what it looked like before left padding
+    # 1422  ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:56 599671 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26874254
+    # 1423 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57    621 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26523465
+    # 1424 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57    677  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997736 13.26239605
+    # 1425 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57    774 Transform start 076 076.xml NA 2092.25 2008.00 0.2999345 13.26239605
+    # 1426 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57    850  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997107 13.26223362
+    # 1427  ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:57 599916  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997771 13.26523465
+
+    ## what it looks like now
+    # 1422 2016_11_15-12_02_57.log 2016-12-15 12:12:56 599671 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26874254
+    # 1423 2016_11_15-12_02_57.log 2016-12-15 12:12:57 599916  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997771 13.26523465
+    # 1424 2016_11_15-12_12_57.log 2016-12-15 12:12:57    621 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26523465
+    # 1425 2016_11_15-12_12_57.log 2016-12-15 12:12:57    677  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997736 13.26239605
+    # 1426 2016_11_15-12_12_57.log 2016-12-15 12:12:57    774 Transform start 076 076.xml NA 2092.25 2008.00 0.2999345 13.26239605
+    # 1427 2016_11_15-12_12_57.log 2016-12-15 12:12:57    850  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997107 13.26223362
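+
+The zero left padding could be done along the following lines. This is
+only a sketch: the helper name `pad_file_id()` is hypothetical, and it
+assumes that every file name consists of exactly six numeric parts (year,
+month, day, hour, minute, second). The zero-based month is left
+untouched.
+
+``` r
+# Hypothetical helper: strip the directory and zero-pad all date parts
+pad_file_id <- function(path) {
+  fname <- basename(path)
+  # extract the six numeric parts of each file name
+  parts <- regmatches(fname, gregexpr("[0-9]+", fname))
+  vapply(parts, function(p) {
+    p <- as.integer(p)
+    sprintf("%04d_%02d_%02d-%02d_%02d_%02d.log",
+            p[1], p[2], p[3], p[4], p[5], p[6])
+  }, character(1))
+}
+
+ids <- pad_file_id(c("../data/2016_11_15-12_12_57.log",
+                     "../data/2016_11_15-12_2_57.log"))
+sort(ids)
+#> [1] "2016_11_15-12_02_57.log" "2016_11_15-12_12_57.log"
+```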
+
+## Timestamps repeat
+
+The timestamps in the `date` variable record year, month, day, hour,
+minute, and second. Since one second is not a very short time interval
+for a move on a touch display, this is not fine grained enough to bring
+events into the correct order: there are events from the same log file
+with the same timestamp and even events from different log files with
+the same timestamp. The log files get written about every 10 minutes
+(which can easily be seen when looking at the file names of the raw log
+files). So in order to get events in the correct order, it is necessary
+to first order by file ID, then within file ID by the timestamp `date`,
+and then within these more coarse-grained timestamps by `timeMs`. But as
+explained above, `timeMs` can only be sorted within one file ID, since
+it does not increase consistently over log files but restarts with each
+raw log file.
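+
+The three-level ordering can be sketched with `order()`. The toy data
+frame uses values from the example in the previous section, and the
+column names follow the raw data described above; it assumes the file
+IDs are already zero left padded.
+
+``` r
+# Toy data in scrambled order
+dat <- data.frame(
+  fileId = c("2016_11_15-12_12_57.log", "2016_11_15-12_02_57.log",
+             "2016_11_15-12_12_57.log", "2016_11_15-12_02_57.log"),
+  date   = as.POSIXct(c("2016-12-15 12:12:57", "2016-12-15 12:12:57",
+                        "2016-12-15 12:12:57", "2016-12-15 12:12:56"),
+                      tz = "UTC"),
+  timeMs = c(677, 599916, 621, 599671)
+)
+
+# Order by file ID first, then by date, then by timeMs within the file
+dat_sorted <- dat[order(dat$fileId, dat$date, dat$timeMs), ]
+dat_sorted$timeMs
+#> [1] 599671 599916    621    677
+```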
+
+## x,y-coordinates outside of display range
+
+The display of the Multi-Touch-Table is a 4K-display with 3840 x 2160
+pixels. When you plot the start and stop coordinates, the display is
+clearly distinguishable. However, a lot of points are outside of the
+display range. This can happen when the art objects are scaled and then
+moved to the very edge of the table. The table then records pixels
+outside of the display. These are actually valid data points and I will
+leave them as is.
+
+``` r
+datlogs <- read.table("../../MDS/2023ss/60100_master_thesis/analysis/code/results/event_logfiles_2024-02-21_16-07-33.csv", sep = ";",
+                      header = TRUE)
+
+par(mfrow = c(1, 2))
+plot(y.start ~ x.start, datlogs)
+abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
+plot(y.stop ~ x.stop, datlogs)
+abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
+```
+
+<img src="man/figures/README-xycoord-1.png" width="100%" />
+
+``` r
+
+aggregate(cbind(x.start, x.stop, y.start, y.stop) ~ 1, datlogs, mean)
+#>    x.start   x.stop  y.start   y.stop
+#> 1 1978.202 1975.876 1137.481 1133.494
+```
+
+## Pop-ups from glossar cannot be assigned to a specific item
+
+All the information, pictures and texts for the topics and pop-ups are
+stored in
+`/data/haum/ContentEyevisit/eyevisit_cards_light/<item_number>`. Among
+other things, each folder contains XML-files with the information about
+any technical terms that can be opened from the hypertexts on the topic
+cards. Often this information is item dependent and then the
+corresponding XML-file is in the folder for this item. Sometimes,
+however, more general terms can be opened. In order to avoid multiple
+files containing the same information, these were stored in a folder
+called `glossar` and get accessed from there. The raw log files only
+contain the path to this glossar entry and did not record from which
+item it was accessed. I tried to assign these glossar entries to the
+correct items. The (very heuristic) approach was this:
+
+1.  Create a lookup table with all XML-file names (possible pop-ups)
+    from the glossar folder and what items possibly call them. This was
+    stored as an `RData` object for easier handling but should maybe be
+    stored in a more interoperable format.
+
+2.  I went through all possible pop-ups in this lookup table and stored
+    the items that are associated with it.
+
+3.  I created a sub data frame without move events (since they can never
+    be associated with a pop-up) and went through every line and looked
+    up if an item and a topic card had been opened. If this was the case
+    and a glossar entry came up before the item was closed again, I
+    assigned this item to the glossar entry.
+
+This is heuristic since it is possible that several topic cards from
+different items are opened simultaneously and the glossar pop-up could
+be opened from either one (it could even be more than two, of course).
+In these cases the item that was opened closest to the glossar pop-up
+has been assigned, but this can never be completely error free.
+
+In addition, this heuristic only assigns a little more than half of the
+glossar entries. Since it only looks at the last item that has been
+opened and checks whether this item is a possible candidate, it misses
+all glossar pop-ups where another item has been opened in between.
+Writing a more elaborate algorithm is still an open TODO.
+
+All glossar pop-ups that do not get matched with an item are removed
+from the data set with a warning if the argument `glossar = TRUE` is
+set. Otherwise the glossar entries will be ignored completely.
+
+## Assign a `case` variable based on “time heuristic”
+
+One thing needed in order to work with the data set and use it for
+machine learning algorithms like process mining, is a variable that
+tries to identify a case. A case variable will structure the data frame
+in a way that navigation behavior can actually be investigated. However,
+we do not know if several people are standing around the table
+interacting with it or just one very active person. The simplest way to
+define a case variable is to just use a time limit between events. This
+means that when the table has not been interacted with for, e.g., 20
+seconds, it is assumed that a person moved on and a new person
+started interacting with the table. This is the easiest heuristic and
+the one implemented at the moment. Process mining shows that this simple
+approach works in a way that the correct process gets extracted by the
+algorithm.
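+
+A minimal sketch of such a time heuristic is shown here. It is not the
+package implementation: the function name `assign_case()` is
+hypothetical, and it assumes a sorted numeric vector of event times in
+seconds, whereas the real data would first need a time variable that is
+continuous across log files.
+
+``` r
+# Hypothetical helper: start a new case after a gap of `cutoff` seconds
+assign_case <- function(time_sec, cutoff = 20) {
+  gaps <- c(0, diff(time_sec))   # time since the previous event
+  cumsum(gaps >= cutoff) + 1     # increase the case number at every long gap
+}
+
+assign_case(c(0, 5, 30, 31, 60))
+#> [1] 1 1 2 2 3
+```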
+
+In order to investigate user behavior on a more fine-grained level, it
+will be necessary to come up with a more elaborate approach. A better,
+still simple approach could be to use this kind of time limit and
+additionally look at the distance between items interacted with within
+one time window. When items are far apart it seems plausible that more
+than one person interacted with them. Very short time lapses between
+events on different items could also be an indicator that more than one
+person is interacting with the table.
+
+## Assign a `path` variable
+
+The `path` variable is supposed to show one interaction trace with one
+artwork, meaning it starts when an artwork is touched or flipped and
+stops when it is closed again. It is easy to assign a path from flipping
+a card, through opening (maybe several) topics and pop-ups for this
+artwork card, until closing this card again. But one would like to
+assign the same path to move events surrounding this interaction. Again,
+this is not possible in an algorithmic way but only heuristically.
+
+Again, I used a time cutoff for this. First, if a `move` event occurs,
+it is checked whether the same item has been flipped less than 20
+seconds beforehand. If yes, the same path indicator is assigned to this
+`move`. If not, a new “move indicator” is assigned temporarily. Then, a
+“backward pass” is applied, where it is checked whether the same item is
+opened less than 20 seconds *after* the event occurs. If yes, that path
+indicator is assigned. For all the remaining moves, a new path number is
+assigned. This corresponds to items being moved without being flipped.
+
+## A `move` event does not record any change
+
+Most of the events in the log files are move events. Additionally, many
+of these move events are recorded but they do not indicate any change,
+meaning the only difference is the timestamp. All other variables
+indicating moves like `x.start` and `x.stop`, `rotation.start` and
+`rotation.stop` etc. do not show *any* change. They represent about 2/3
+of all move events. These events are probably short touches of the table
+without an actual interaction. They were therefore removed from the data
+set.
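+
+Such a filter could look like the following sketch. The toy data frame
+is made up, and which variables exactly are compared in the package is
+an assumption; the sketch also assumes that `move` rows contain no
+missing values in the compared columns.
+
+``` r
+# Toy data: one unchanged move, one real move, one flipCard event
+dat <- data.frame(
+  event          = c("move", "move", "flipCard"),
+  x.start        = c(100, 100, 50),   x.stop        = c(100, 180, 50),
+  y.start        = c(200, 200, 60),   y.stop        = c(200, 200, 60),
+  scale.start    = c(0.3, 0.3, 0.3),  scale.stop    = c(0.3, 0.3, 0.3),
+  rotation.start = c(13.2, 13.2, 0),  rotation.stop = c(13.2, 13.2, 0)
+)
+
+# A move is "unchanged" if position, scale, and rotation are all identical
+unchanged <- with(dat, event == "move" &
+                    x.start == x.stop & y.start == y.stop &
+                    scale.start == scale.stop &
+                    rotation.start == rotation.stop)
+
+dat_clean <- dat[!unchanged, ]
+nrow(dat_clean)
+#> [1] 2
+```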
+
+## Card indices go from 0 to 7 (instead of 0 to 5 as expected)
+
+In the beginning I thought that the number for topics was the index of
+where the card was presented on the back of the item. But this is not
+correct. It is the number of the topic. There are eight topics in total:
+
+    Indices for topics:
+    0   artist
+    1   thema
+    2   komposition
+    3   leben des kunstwerks
+    4   details
+    5   licht und farbe
+    6   extra info
+    7   technik
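+
+For analysis it can be convenient to translate these indices into
+labels. The following is a small sketch using the list above; the names
+`topic_names` and `topic_label()` are hypothetical and not part of the
+package.
+
+``` r
+# Topic labels in the order of their (zero-based) indices
+topic_names <- c("artist", "thema", "komposition", "leben des kunstwerks",
+                 "details", "licht und farbe", "extra info", "technik")
+
+# Hypothetical helper: R vectors are one-based, so shift the index by one
+topic_label <- function(topic) topic_names[topic + 1]
+
+topic_label(c(0, 3, 7))
+#> [1] "artist"               "leben des kunstwerks" "technik"
+```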
+
+On the back of items, there can be between 2 and 6 topic cards. Several
+of these topic cards can be about the same topic, e.g., there can be two
+topic cards assigned to the topic `thema`. It is impossible to find out
+if the same topic card was opened several times or if different topic
+cards with the same topic were opened from the same item. See example
+below for item “001”.
+
+    #> ℹ Loading mtt
+    #>   item            file_name                topic
+    #> 1  001 001_dargestellte.xml                thema
+    #> 2  001       001_thema1.xml                thema
+    #> 3  001        001_leben.xml leben des kunstwerks
+    #> 4  001       001_leben3.xml leben des kunstwerks
+    #> 5  001       001_thema2.xml                thema
+    #> 6  001        001_thema.xml                thema
+
+## New artworks “504” and “505” starting October 2022
+
+When I read in the complete data frame for the first time, all of a
+sudden there were 72 instead of 70 items. It seems like these two
+artworks first appear on October 21, 2022.
+
+``` r
+summary(as.Date(datraw[datraw$item %in% c("504", "505"), "date"]))
+#>         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
+#> "2022-10-21" "2023-01-11" "2023-03-08" "2023-03-09" "2023-05-21" "2023-07-05"
+```
+
+The artworks seem to have been updated in general after October 21, 2022.
+The following table shows which items were presented in which years.
+
+``` r
+xtabs(~ item + lubridate::year(date.start), datlogs)
+#>      lubridate::year(date.start)
+#> item   2016  2017  2018  2019  2020  2022  2023
+#>   1     277  4082  1912  1434   424   394  1315
+#>   3     485  6730  3126  2356   528   457  1124
+#>   19    714  8656  4028  2743   660   698  1595
+#>   20    595  8461  3996  2983   938   657  1355
+#>   24    497  6638  2912  2251   649   439  1028
+#>   27    567  5959  3112  2318   651   711  1324
+#>   28    601  9329  4394  3056   778   762  1570
+#>   29    425  6865  3830  2365   516   615  1174
+#>   31    289  4118  2051  1218   291   296   675
+#>   32    562  7016  3477  2253   726   766  1647
+#>   33    509  4936  2242  1449   555   358   666
+#>   36    434  4505  2276  1668   373   387   976
+#>   37    242  4478  2182  1554   339   423  1168
+#>   38    480  4617  2144  1397   371   381   784
+#>   39    395  3227  1313  1003   237   161   622
+#>   41    282  3329  1303  1022   225   209   701
+#>   42    203  3113  1307   903   242   191   421
+#>   43    115  2420  1089   806   176   219   486
+#>   45   1491 13561  5924  4474   966   585  1828
+#>   46    903  9181  5340  3812   961   944  1648
+#>   47    306  4949  2395  1510   750   297   675
+#>   48    723 10455  5384  4162  1328   948  2031
+#>   49    433  4326  2124  1414   434   431   809
+#>   51    564  7837  4577  2991   884   659  1370
+#>   52    447  5021  2104  1729   471   349   840
+#>   54    424  5068  2816  2008   529   370   918
+#>   55    358  4859  2069  1428   341   403  1303
+#>   57    860 14264  6625  5092  1410  1221  2714
+#>   60    555  6865  3539  2336   639   586  1415
+#>   62    547  6736  3803  2210   795   633  1322
+#>   63    251  3677  1827  1241   300   282   527
+#>   66    552  6004  2774  1977   505   373   932
+#>   69    394  3730  1827  1438   272   206   680
+#>   70    226  3766  1843   973   293   268   703
+#>   71    557  6160  2490  1846   570   323   839
+#>   72    426  6194  2857  2129   508   635  1553
+#>   73    432  6125  2880  1821   583   395   939
+#>   75    258  5885  2418  1562   369   257   645
+#>   76    861 12435  6253  4214  1753  1153  2268
+#>   77    816  8595  4197  2897   699   674  1452
+#>   78    410  5632  2498  1924   394   408   850
+#>   80   1650 25687 12429  7782  1975  1712  4433
+#>   83    644  8618  4720  3026   987  1027  2294
+#>   84    184  2121  1231   759   231   254   465
+#>   87    149  1618   722   632    99     0     0
+#>   88    513  6996  3493  2272   539   533  1420
+#>   89    214  2204   950   723   156     0     0
+#>   90    281  3756  1372  1143   403   320   932
+#>   93    613  8528  4224  3015   696  1174  2058
+#>   98    462  6662  3265  2565   704   670  1453
+#>   99    180  4162  1653  1454   363   411   868
+#>   101   414  4209  1859  1282   392   411   981
+#>   103   677  8758  4366  3165  1045   909  1871
+#>   104   423  5256  2381  1865   463   467   933
+#>   107   181  2101  1106   788   205   146   339
+#>   109   321  4001  1619  1106   292   188   453
+#>   110   489  5846  2785  2008   494   387   923
+#>   125   640  8435  4519  3334   926     0     0
+#>   129   598 11322  5046  3369   910  1131  1682
+#>   145   419  7821  3945  2694   706   740  1396
+#>   176   507  8465  3968  2787   687   552  1544
+#>   180   516  7563  3720  2765   585   550  1272
+#>   183   377  4014  1819  1741   346   251   675
+#>   187   340  4222  2165  1753   319   312   734
+#>   197   426  7710  3603  2510   671   602  1217
+#>   229   303  4872  2360  1891   482   389  1005
+#>   231   271  3606  1851  1239   318   236   467
+#>   501  1915 15968  7849  5060  1157   890  2989
+#>   502  1212 14550  7111  4749  1105   883  2752
+#>   503  1308 15218  8632  6399  1626   870  2558
+#>   504     0     0     0     0     0   363   662
+#>   505     0     0     0     0     0   426  1533
+```
+
+It shows that the artworks have been updated after the Corona pandemic.
+I think, the table was also moved to a different location at that point.