<!-- README.md is generated from README.Rmd. Please edit that file -->

# R package mtt

This package was created to process log files obtained from multi-touch
tables at the Leibniz-Institut für Wissensmedien (IWM).

## Installation

It can be installed via

`devtools::install_git("https://gitea.iwm-tuebingen.de/R/mtt.git")`

If you get an error message, you probably need to install `git2r` first
with

`install.packages("git2r")`.

The package depends on the following R packages

- `dplyr`
- `pbapply`
- `XML`
- `lubridate`

so make sure they are installed as well.
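
A quick way to see which of the dependencies listed above are still
missing is sketched here. The object names `deps` and `missing_deps` are
only chosen for illustration and are not part of the package.

``` r
# Packages named in this README as dependencies of mtt
deps <- c("dplyr", "pbapply", "XML", "lubridate")

# requireNamespace() returns FALSE (instead of failing) if a package is missing
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
missing_deps <- deps[!installed]

if (length(missing_deps) > 0) {
  message("Missing packages: ", paste(missing_deps, collapse = ", "))
  # install.packages(missing_deps)  # uncomment to install them from CRAN
}
```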

# Multi-Touch Table

The multi-touch table at the Herzog-Anton-Ulrich-Museum (HAUM) in
Braunschweig gives visitors of the museum the opportunity to interact
with about 70 artworks and 3 virtual cards containing information about
the museum and its layout. The table was installed at the museum in
October 2016, and log files from interactions of visitors of the museum
have been collected since November 2016. These log files are in an
unstructured format and cannot be easily analyzed. The purpose of the
following document is to describe how the data have been transformed
and which decisions have been made along the way.

<!--
The implementation of the steps described here can be found at:
https://gitea.iwm-tuebingen.de/R/mtt.
-->

# Data structure

The log files contain lines that indicate the beginning and end of
possible activities that can be performed when interacting with the
artworks on the table. The layout of the table looks as if pictures had
been tossed on a large table. Every artwork is visible in the start
configuration. People can move the pictures on the table, and the
pictures can be scaled and rotated. Additionally, the virtual picture
cards can be flipped in order to find more information about the artwork
on the “back” of the card. To do so, one has to press a little `i` in
one of the bottom corners of the card. On the back of the card, two to
six information cards can be found, each with a teaser text about a
certain topic. These topic cards can be opened, which displays a
hypertext with detailed information. Within these hypertexts, certain
technical terms can be clicked so that lay people can get more
information. This opens a pop-up. The events encoded in the raw log
files therefore have the following structure.

    "Start Application"     --> Start Application
    "Show Application"
    "Transform start"       --> Move
    "Transform stop"
    "Show Info"             --> Flip Card
    "Show Front"
    "Artwork/OpenCard"      --> Open Topic
    "Artwork/CloseCard"
    "ShowPopup"             --> Open Popup
    "HidePopup"

The right side shows which events can be extracted from these raw lines.
“Start Application” is not an event in the original sense since it only
indicates that the table was started or maybe reset itself. This is not
an interaction with the table and therefore not interesting in itself.
All “Start Application” and “Show Application” lines are therefore
excluded from the data during further processing and are only present in
the raw log files.
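
The pairing of raw start and stop tags with events shown above can be
written down as a small lookup table. This is only a sketch: the tags
are taken from the listing, the event names are the ones used after
preprocessing (`move`, `flipCard`, `openTopic`, `openPopup`), and the
names `event_tags` and `tag_to_event()` are made up for illustration.

``` r
# Raw start/stop tags paired with the event they delimit (from the listing above)
event_tags <- data.frame(
  start = c("Start Application", "Transform start", "Show Info",
            "Artwork/OpenCard", "ShowPopup"),
  stop  = c("Show Application", "Transform stop", "Show Front",
            "Artwork/CloseCard", "HidePopup"),
  event = c("Start Application", "move", "flipCard", "openTopic", "openPopup"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate a raw tag into the name of its event
tag_to_event <- function(tag) {
  idx <- match(tag, c(event_tags$start, event_tags$stop))
  rep(event_tags$event, 2)[idx]
}

tag_to_event(c("Show Info", "Transform stop", "HidePopup"))
```

Tags that are not in the table yield `NA`.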

# Parsing the raw log files

The first step is to parse the raw log files, which the application
stores as text files in a rather unstructured format, into a format that
can be read by common statistics software packages. The data are
therefore transferred to a spreadsheet format. The following sections
describe the problems that were encountered while doing this.

## Corrupt lines

When reading the files containing the raw logs into R, a warning appears
that says

    Warning messages:
      incomplete final line found on '2016/2016_11_18-11_31_0.log'
      incomplete final line found on '2016/2016_11_18-11_38_30.log'
      incomplete final line found on '2016/2016_11_18-11_40_36.log'
      ...

When you open these files, it looks like the last line contains some
binary content. It is unclear why and how this happens. These lines are
therefore removed when reading the data. A warning is given that
indicates how many files have been affected.

## Extracted variables from raw log files

The following variables (columns in the data frame) are extracted from
the raw log file:

- `fileId`: Containing the zero-left-padded file name of the raw log
  file the data line has been extracted from.

- `folder`: The names of the folders in which the raw log files have
  been organized. For the HAUM data set, the data are sorted by year
  (folders 2016, 2017, 2018, 2019, 2020, 2021, 2022, and 2023).

- `date`: Extracted timestamp from the raw log file in the format
  `yyyy-mm-dd hh:mm:ss`.

- `timeMs`: Containing a timestamp in milliseconds that restarts with
  every new raw log file.

- `event`: Start and stop event tags. See above for possible values.

- `item`: Identifier of the different items. This is a three-digit
  (left-padded) number. The numbers of the items correspond to the
  folder names in `/ContentEyevisit/eyevisit_cards_light/` and were
  originally taken from the museum's catalogue.

- `popup`: Name of the pop-up opened. This is only interesting for
  “openPopup” events.

- `topic`: The number of the topic card that has been opened at the back
  of the item card. See below for a more detailed description of what
  these numbers mean.

- `x`: Value of x-coordinate in pixels on the 4K display
  ($3840 \times 2160$).

- `y`: Value of y-coordinate in pixels.

- `scale`: Number in 128 bit that indicates how much the card has been
  scaled.

- `rotation`: Degree of rotation from start configuration.

<!-- TODO: After what time interval does the table reset itself to the
  start configuration? -->

## Variables after “closing of events”

The raw log data consist of start and stop events for each event type.
After preprocessing, four event types are extracted: `move`, `flipCard`,
`openTopic`, and `openPopup`. Except for the `move` events, which can
occur at any time when interacting with an item card on the table, the
events have a hierarchical order: An item card first needs to be flipped
(`flipCard`), then the topic cards on the back of the card can be opened
(`openTopic`), and finally pop-ups on these topic cards can be opened
(`openPopup`). This implies that the event `openPopup` can only be
present for a certain item if the card has already been flipped (i.e.,
an event `flipCard` for the same item has already occurred).

After preprocessing, the data frame is in a wide format with columns for
the start and the stop of each event and contains the following
variables:

- `fileId.start` / `fileId.stop`: See above.

- `date.start` / `date.stop`: See above.

- `folder`: Containing the folder name (see above).

- `case`: A numerical variable indicating cases in the data. A “case”
  indicates an interaction interval and could be defined in different
  ways. Right now, a new case begins when no event has occurred (i.e.,
  no new path has started) for 20 seconds or longer.

- `path`: A path is defined as one interaction with one item. A path can
  either start with a `flipCard` event or when an item has been touched
  for the first time within this case. A path ends with the item card
  being flipped closed again or with the last movement of the card
  within this case. One case can contain several paths with the same
  item when the item is flipped open and flipped closed again several
  times within a short time.

- `glossar`: An indicator variable with values 0/1 that tracks whether a
  pop-up has been opened from the glossar folder. These pop-ups can be
  assigned to the wrong item since the assignment cannot be done
  unambiguously by an algorithm. It is possible that two items are
  flipped open that could both link to the same pop-up from the glossar.
  The indicator variable is kept so that these pop-ups can be easily
  deleted from the data. Right now, glossar entries can be ignored
  completely by setting an argument, and this is done by default. Using
  the pop-ups from the glossar will need a lot more love before it
  behaves satisfactorily.

- `event`: Indicating the event. Can take the values `move`, `flipCard`,
  `openTopic`, and `openPopup`.

- `item`: Identifier of the different artworks and information cards.
  This is a three-digit (left-padded) number. See above.

- `timeMs.start` / `timeMs.stop`: See above.

- `duration`: Calculated by $timeMs.stop - timeMs.start$ in
  milliseconds. For events spanning more than one log file,
  $600,000 \times (\text{number of log files} - 1)$ needs to be added.
  See below for details.

- `topic`: See above.

- `popup`: See above.

- `x.start` / `x.stop`: See above.

- `y.start` / `y.stop`: See above.

- `distance`: Euclidean distance calculated from $(x.start, y.start)$
  and $(x.stop, y.stop)$.

- `scale.start` / `scale.stop`: See above.

- `scaleSize`: Relative scaling of item card, calculated by
  $\frac{scale.stop}{scale.start}$.

- `rotation.start` / `rotation.stop`: See above.

- `rotationDegree`: Difference of rotation from $rotation.stop$ to
  $rotation.start$.
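
The derived variables `distance`, `scaleSize` and `rotationDegree` from
the list above can be computed along these lines. This is a minimal
sketch on made-up values, not the package code. In particular, taking
the rotation difference as stop minus start is an assumption.

``` r
# Toy data frame with the start/stop columns described above (values are made up)
dat <- data.frame(
  x.start = c(100, 2092.25), y.start = c(200, 2008),
  x.stop  = c(400, 2092.25), y.stop  = c(600, 2008),
  scale.start = c(0.30, 0.30), scale.stop = c(0.60, 0.30),
  rotation.start = c(10, 13.27), rotation.stop = c(25, 13.27)
)

# Euclidean distance between start and stop position
dat$distance <- sqrt((dat$x.stop - dat$x.start)^2 +
                     (dat$y.stop - dat$y.start)^2)

# Relative scaling: values > 1 mean the card was enlarged
dat$scaleSize <- dat$scale.stop / dat$scale.start

# Assumption: the difference is taken as stop minus start
dat$rotationDegree <- dat$rotation.stop - dat$rotation.start

dat[, c("distance", "scaleSize", "rotationDegree")]
```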

## How unclosed events are handled

Events do not necessarily need to be completed. A person can, e.g.,
leave the table and not flip the item card closed again. For `flipCard`,
`openTopic`, and `openPopup`, the data frame contains `NA` when the
event does not complete. For `move` events, it happens quite often that
a start event follows a start event and a stop event follows a stop
event. Technically, a move event cannot *not* be finished, and the
number of events without a start or stop indicates that the time
resolution was not sufficient to catch all these events accurately.
Double start and stop `move` events have therefore been deleted from the
data set.

## Additional meta data

For the HAUM data, I added meta data on state holidays and school
vacations.

This led to the following additional variables:

- `holiday`

- `vacations`

# Problems and how I handled them

This section lists some problems with the log data that required
decisions. These decisions influence the outcome and maybe even the data
quality. Hence, I tried to document how I handled these problems and
explain the decisions I made.

## Weird behavior of `timeMs` and negative `duration` values

`timeMs` resets itself every time a new log file starts. This means that
the durations of events spanning more than one log file must be
adjusted. Instead of just calculating $timeMs.stop - timeMs.start$,
`timeMs.start` must be subtracted from the maximum duration of the log
file where the event started ($600,000$ ms) and `timeMs.stop` must be
added. If the event spans more than two log files, a multiple of
$600,000$ must be used, e.g., for three log files it must be
$2 \times 600,000 - timeMs.start + timeMs.stop$, and so on.
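
The adjustment described above can be written as a small function. This
is a sketch under the assumption that the number of log files an event
spans (`n_files`) is already known. The names `adjust_duration()`,
`n_files` and `log_interval_ms` are hypothetical and only used here.

``` r
# Length of one logging interval in ms (10 minutes, as stated above)
log_interval_ms <- 600000

# n_files: number of log files the event spans
#          (1 = event starts and stops within the same log file)
adjust_duration <- function(time_start, time_stop, n_files = 1) {
  (n_files - 1) * log_interval_ms - time_start + time_stop
}

adjust_duration(621, 677)                    # same log file
adjust_duration(599671, 621, n_files = 2)    # spans two log files
adjust_duration(599671, 621, n_files = 3)    # spans three log files
```

For `n_files = 1` this reduces to $timeMs.stop - timeMs.start$, so the
same function covers events within one log file as well.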

<!-- -->

The boxplot shows that we have a continuous range of values within one
log file but that `timeMs` does not increase across log files. I kept
`timeMs.start` and `timeMs.stop` as well as `fileId.start` and
`fileId.stop` in the data frame, so it is clear when events span more
than one log file.

<!--
Info from the programmer:

"I also just went through the code from back then. The logging works like
this: Once the application starts, a new log file is created every 10
minutes. The start time from which the duration is calculated is reset
each time. So duration is not "time since start of the application" but
"time since restart of the logger". Your assumption is therefore correct:
there should be no durations >10 minutes. The first entry of a log file
can be anything between 0 and 10 minutes (depending on whether the table
was in use at the time of the new logging interval). So if a case is
spread over 2+ logs, you have to add 10 minutes to the duration for each
log file after the first one to make it fit."
-->

## Left padding of file IDs

The file names of the raw log files are automatically generated and
contain a timestamp. This timestamp is not well formed. First, it
contains an incorrect month. The months go from 0 to 11, which means
that the file `2016_11_15-12_12_57.log` was collected on December 15,
2016 at 12:12 pm. Another problem is that the file names are not zero
left padded, e.g., `2016_11_15-12_2_57.log`. This file was collected on
December 15, 2016 at 12:02 pm and therefore before the file above. But
most sorting algorithms will sort these files in the order shown below.
In order to preprocess the data and close events that belong together,
the data need to be sorted by events and artworks repeatedly. In order
to get them back into the correct time order, it is necessary to order
them based on three variables: `fileId.start`, `date.start` and
`timeMs.start`. The file IDs therefore need to sort in the correct order
(again, see below for an example). I zero left padded the log file names
within the data frame and use them as an identifier. These “file names”
do not correspond exactly to the original raw log file names. This needs
to be kept in mind when doing any kind of matching etc.

    ## what it looked like before left padding
    # 1422  ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:56 599671 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26874254
    # 1423 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57    621 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26523465
    # 1424 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57    677  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997736 13.26239605
    # 1425 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57    774 Transform start 076 076.xml NA 2092.25 2008.00 0.2999345 13.26239605
    # 1426 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57    850  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997107 13.26223362
    # 1427  ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:57 599916  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997771 13.26523465

    ## what it looks like now
    # 1422 2016_11_15-12_02_57.log 2016-12-15 12:12:56 599671 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26874254
    # 1423 2016_11_15-12_02_57.log 2016-12-15 12:12:57 599916  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997771 13.26523465
    # 1424 2016_11_15-12_12_57.log 2016-12-15 12:12:57    621 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26523465
    # 1425 2016_11_15-12_12_57.log 2016-12-15 12:12:57    677  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997736 13.26239605
    # 1426 2016_11_15-12_12_57.log 2016-12-15 12:12:57    774 Transform start 076 076.xml NA 2092.25 2008.00 0.2999345 13.26239605
    # 1427 2016_11_15-12_12_57.log 2016-12-15 12:12:57    850  Transform stop 076 076.xml NA 2092.25 2008.00 0.2997107 13.26223362
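
The zero left padding illustrated above could be done roughly like this.
It is a sketch that assumes every file name follows the pattern
`year_month_day-hour_minute_second.log`. The function name
`pad_file_id()` is hypothetical and not necessarily what the package
uses. Note that the month is deliberately left as it is (0 to 11).

``` r
# Hypothetical helper: strip the folder and zero left pad all timestamp parts
pad_file_id <- function(file_name) {
  base <- basename(file_name)
  vapply(base, function(f) {
    stem <- sub("\\.log$", "", f)
    # split "2016_11_15-12_2_57" into its six numeric parts
    nums <- as.integer(strsplit(stem, "[_-]")[[1]])
    sprintf("%04d_%02d_%02d-%02d_%02d_%02d.log",
            nums[1], nums[2], nums[3], nums[4], nums[5], nums[6])
  }, character(1), USE.NAMES = FALSE)
}

raw_names <- c("../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log",
               "../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log")

# After padding, alphabetical order equals chronological order
sort(pad_file_id(raw_names))
```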

## Timestamps repeat

The timestamps in the `date` variable record year, month, day, hour,
minute and second. Since one second is not a very short time interval
for a move on a touch display, this is not fine grained enough to bring
events into the correct order: there are events from the same log file
with the same timestamp and even events from different log files with
the same timestamp. The log files get written about every 10 minutes
(which can easily be seen when looking at the file names of the raw log
files). So in order to get events into the correct order, it is
necessary to first order by file ID, then within file ID sort by
timestamp `date`, and then within these more coarse grained timestamps
sort by `timeMs`. But as explained above, `timeMs` can only be sorted
within one file ID, since the values do not increase consistently across
log files but start from a new offset for each raw log file.
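
In base R, this three-level ordering can be expressed with `order()`.
The following sketch uses a toy data frame with values borrowed from the
left-padding example. It assumes that `fileId` is already zero left
padded, since otherwise the first sort key would be wrong.

``` r
# Toy data: four events in scrambled order
dat <- data.frame(
  fileId = c("2016_11_15-12_12_57.log", "2016_11_15-12_02_57.log",
             "2016_11_15-12_12_57.log", "2016_11_15-12_02_57.log"),
  date   = as.POSIXct(c("2016-12-15 12:12:57", "2016-12-15 12:12:57",
                        "2016-12-15 12:12:57", "2016-12-15 12:12:56"),
                      tz = "UTC"),
  timeMs = c(677, 599916, 621, 599671),
  stringsAsFactors = FALSE
)

# Sort by file ID first, then by timestamp, then by timeMs within a timestamp
dat_sorted <- dat[order(dat$fileId, dat$date, dat$timeMs), ]
dat_sorted
```

Sorting by `date` and `timeMs` alone would put the events with
`timeMs` 621 and 677 before the one with 599916, although they come from
the later log file.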

## x,y-coordinates outside of display range

The display of the multi-touch table is a 4K display with 3840 x 2160
pixels. When you plot the start and stop coordinates, the display is
clearly distinguishable. However, a lot of points are outside of the
display range. This can happen when the art objects are scaled and then
moved to the very edge of the table. The application then records pixels
outside of the table. These are actually valid data points, and I will
leave them as is.

``` r
datlogs <- read.table("../../MDS/2023ss/60100_master_thesis/analysis/code/results/event_logfiles_2024-02-21_16-07-33.csv", sep = ";",
                      header = TRUE)

# Start and stop coordinates side by side;
# the blue lines mark the edges of the 3840 x 2160 display
par(mfrow = c(1, 2))
plot(y.start ~ x.start, datlogs)
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
plot(y.stop ~ x.stop, datlogs)
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
```

<!-- -->

``` r
# Mean start and stop coordinates over all events
aggregate(cbind(x.start, x.stop, y.start, y.stop) ~ 1, datlogs, mean)
##    x.start   x.stop  y.start   y.stop
## 1 1978.202 1975.876 1137.481 1133.494
```
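
To quantify how many points lie outside of the display, one could flag
them as sketched below. The coordinates are made up for illustration,
and `outside_display()` is a hypothetical helper, not a package
function.

``` r
display_width  <- 3840
display_height <- 2160

# TRUE for coordinates that fall outside of the visible display area
outside_display <- function(x, y) {
  x < 0 | x > display_width | y < 0 | y > display_height
}

# Made-up coordinates for illustration
x <- c(1978.2, -35.0, 3900.5, 2092.25)
y <- c(1137.5, 500.0, 1000.0, 2008.00)

outside_display(x, y)
mean(outside_display(x, y))   # share of points outside of the display
```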

## Pop-ups from glossar cannot be assigned to a specific item

All the information, pictures and texts for the topics and pop-ups are
stored in
`/data/haum/ContentEyevisit/eyevisit_cards_light/<item_number>`. Among
other things, each folder contains XML files with the information about
any technical terms that can be opened from the hypertexts on the topic
cards. Often this information is item dependent, and then the
corresponding XML file is in the folder for this item. Sometimes,
however, more general terms can be opened. In order to avoid multiple
files containing the same information, these were stored in a folder
called `glossar` and get accessed from there. The raw log files only
contain the path to this glossar entry and do not record from which item
it was accessed. I tried to assign these glossar entries to the correct
items. The (very heuristic) approach was this:

1.  Create a lookup table with all XML file names (possible pop-ups)
    from the glossar folder and which items possibly call them. This was
    stored as an `RData` object for easier handling but should maybe be
    stored in a more interoperable format.

2.  I went through all possible pop-ups in this lookup table and stored
    the items that are associated with them.

3.  I created a sub data frame without move events (since they can never
    be associated with a pop-up), went through every line and looked up
    whether an item and a topic card had been opened. If this was the
    case and a glossar entry came up before the item was closed again, I
    assigned this item to the glossar entry.

This is heuristic since it is possible that several topic cards from
different items are opened simultaneously and the glossar pop-up could
be opened from either one (it could even be more than two, of course).
In these cases, the item that was opened closest to the glossar pop-up
has been assigned, but this can never be completely error free.

Moreover, this heuristic only assigns a little more than half of the
glossar entries. Since my heuristic only looks at the last item that has
been opened and checks whether this item is a possible candidate, it
misses all glossar pop-ups where another item has been opened in
between. Writing a more elaborate algorithm is still an open TODO.

All glossar pop-ups that do not get matched with an item are removed
from the data set with a warning if the argument `glossar = TRUE` is
set. Otherwise the glossar entries will be ignored completely.
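
The core of this heuristic (look only at the last opened item and check
whether it is a candidate) can be sketched as follows. The lookup table,
the file names in it, and the function `assign_glossar_item()` are
invented for illustration. The actual lookup is built from the XML files
as described in step 1.

``` r
# Hypothetical lookup table: glossar pop-up file -> items that may link to it
glossar_lookup <- list(
  "term_a.xml" = c("001", "076"),
  "term_b.xml" = c("045")
)

# open_items: items with an open topic card, ordered from first to last opened
assign_glossar_item <- function(popup, open_items, lookup = glossar_lookup) {
  if (length(open_items) == 0) return(NA_character_)
  last_opened <- open_items[length(open_items)]
  # only assign if the last opened item is a possible candidate
  if (last_opened %in% lookup[[popup]]) last_opened else NA_character_
}

# The last opened item is a candidate and gets assigned
assign_glossar_item("term_a.xml", c("045", "076"))

# Another item has been opened in between: no assignment is made
assign_glossar_item("term_a.xml", c("076", "045"))
```

The second call shows why a considerable share of glossar pop-ups stays
unassigned: item “076” would have been a candidate, but it is not the
last item that was opened.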

## Assign a `case` variable based on “time heuristic”

One thing needed in order to work with the data set and use it for
machine learning algorithms like process mining is a variable that tries
to identify a case. A case variable will structure the data frame in a
way that navigation behavior can actually be investigated. However, we
do not know if several people are standing around the table interacting
with it or just one very active person. The simplest way to define a
case variable is to just use a time limit between events. This means
that when the table has not been interacted with for, e.g., 20 seconds,
then it is assumed that a person moved on and a new person started
interacting with the table. This is the easiest heuristic and the one
implemented at the moment. Process mining shows that this simple
approach works in the sense that the correct process gets extracted by
the algorithm.

In order to investigate user behavior on a more fine grained level, it
will be necessary to come up with a more elaborate approach. A better,
still simple approach could be to use this kind of time limit and
additionally look at the distance between items interacted with within
one time window. When items are far apart, it seems plausible that more
than one person interacted with them. Very short time lapses between
events on different items could also be an indicator that more than one
person is interacting with the table.
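
The simple time-limit heuristic from the first paragraph of this section
(not the distance-based extension) can be sketched in a few lines. The
sketch assumes that the events are already in the correct time order and
that `time` is given in seconds. For simplicity it only looks at the gap
between consecutive event times. `assign_case()` is a hypothetical
name.

``` r
# Assumption: events are already sorted in time and `time` is in seconds
assign_case <- function(time, cutoff = 20) {
  gaps <- c(0, diff(time))
  # every gap of `cutoff` seconds or longer starts a new case
  cumsum(gaps >= cutoff) + 1
}

event_time <- c(0, 5, 12, 40, 45, 70, 71)
assign_case(event_time)
```

Here the gaps of 28 and 25 seconds each open a new case, so the seven
events are split into three cases.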

## Assign a `path` variable

The `path` variable is supposed to show one interaction trace with one
artwork, meaning it starts when an artwork is touched or flipped and
stops when it is closed again. It is easy to assign a path from flipping
a card, over opening (maybe several) topics and pop-ups for this artwork
card, until closing this card again. But one would like to assign the
same path to move events surrounding this interaction. Again, this is
not possible in an exact algorithmic way but only heuristically.

Again, I used a time cutoff for this. First, if a `move` event occurs,
it is checked whether the same item has been flipped less than 20
seconds beforehand. If yes, the same path indicator is assigned to this
`move`. If not, a temporary new “move indicator” is assigned. Then, a
“backward pass” is applied, where it is checked whether the same item is
opened less than 20 seconds *after* the event occurs. If yes, that path
indicator is assigned. For all the remaining moves, a new path number is
assigned. This corresponds to items being moved without being flipped.
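
A strongly simplified sketch of this procedure is given below. It works
on a toy data frame for a single case, uses one time stamp per event (in
seconds), and handles the forward and the backward check in one loop
instead of two separate passes. The function `assign_move_paths()` and
the column layout are assumptions made for this example only.

``` r
# Toy events for one case; `path` is already set for the flipCard event
events <- data.frame(
  event = c("move", "flipCard", "move", "move", "move"),
  item  = c("076", "076", "076", "045", "076"),
  time  = c(2, 10, 15, 16, 60),
  path  = c(NA, 1, NA, NA, NA),
  stringsAsFactors = FALSE
)

assign_move_paths <- function(events, cutoff = 20) {
  flips <- events[events$event == "flipCard", ]
  for (i in which(events$event == "move")) {
    same_item <- flips[flips$item == events$item[i], ]
    delta <- events$time[i] - same_item$time  # >= 0: flip was before the move
    before <- which(delta >= 0 & delta < cutoff)
    after  <- which(delta < 0 & -delta < cutoff)
    if (length(before) > 0) {
      # forward check: closest flip of the same item before the move
      events$path[i] <- same_item$path[before[which.min(delta[before])]]
    } else if (length(after) > 0) {
      # backward check: closest flip of the same item after the move
      events$path[i] <- same_item$path[after[which.max(delta[after])]]
    }
  }
  # remaining moves (items moved without being flipped) get new path numbers
  rest <- which(events$event == "move" & is.na(events$path))
  start <- if (all(is.na(events$path))) 0 else max(events$path, na.rm = TRUE)
  events$path[rest] <- start + seq_along(rest)
  events
}

assign_move_paths(events)$path
```

In the toy data, the first and third events are moves of item “076”
within 20 seconds of its flip and inherit path 1. The move of item “045”
and the late move of item “076” each get a new path number.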

## A `move` event does not record any change

Most of the events in the log files are move events. Additionally, many
of these move events are recorded but do not indicate any change,
meaning the only difference is the timestamp. All other variables
indicating moves, like `x.start` and `x.stop`, `rotation.start` and
`rotation.stop` etc., do not show *any* change. They represent about 2/3
of all move events. These events are probably short touches of the table
without an actual interaction. They were therefore removed from the data
set.
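
Such move events without any change can be identified by comparing the
start and stop columns. The following sketch uses a made-up data frame
without missing values. With real data, `NA` values in the compared
columns would need extra care.

``` r
# Toy data: row 1 is a move without any change, rows 2 and 4 are real moves
dat <- data.frame(
  event = c("move", "move", "flipCard", "move"),
  x.start = c(100, 100, 100, 300), x.stop = c(100, 150, 100, 300),
  y.start = c(200, 200, 200, 400), y.stop = c(200, 200, 200, 400),
  scale.start = c(0.3, 0.3, 0.3, 0.3), scale.stop = c(0.3, 0.3, 0.3, 0.35),
  rotation.start = c(13, 13, 13, 0), rotation.stop = c(13, 13, 13, 0),
  stringsAsFactors = FALSE
)

# A move records no change if position, scale and rotation are all identical
no_change <- dat$event == "move" &
  dat$x.start == dat$x.stop &
  dat$y.start == dat$y.stop &
  dat$scale.start == dat$scale.stop &
  dat$rotation.start == dat$rotation.stop

dat_clean <- dat[!no_change, ]
mean(no_change[dat$event == "move"])   # share of moves without change
```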

## Card indices go from 0 to 7 (instead of 0 to 5 as expected)

In the beginning I thought that the number for topics was the index of
the position where the card was presented on the back of the item. But
this is not correct. It is the number of the topic. There are eight
topics in total:

    Indices for topics:
    0   artist
    1   thema
    2   komposition
    3   leben des kunstwerks
    4   details
    5   licht und farbe
    6   extra info
    7   technik

On the back of items, there can be between 2 and 6 topic cards. Several
of these topic cards can be about the same topic, e.g., there can be two
topic cards assigned to the topic `thema`. It is impossible to find out
if the same topic card was opened several times or if different topic
cards with the same topic were opened from the same item. See example
below for item “001”.

    ##   item            file_name                topic
    ## 1  001 001_dargestellte.xml                thema
    ## 2  001       001_thema1.xml                thema
    ## 3  001        001_leben.xml leben des kunstwerks
    ## 4  001       001_leben3.xml leben des kunstwerks
    ## 5  001       001_thema2.xml                thema
    ## 6  001        001_thema.xml                thema
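
For analyses it can be handy to translate the numeric `topic` variable
into the topic names from the index table above. A minimal sketch, where
`topic_labels` and `topic_name()` are names chosen for this example:

``` r
# Topic names in the order of their indices 0 to 7 (taken from the table above)
topic_labels <- c("artist", "thema", "komposition", "leben des kunstwerks",
                  "details", "licht und farbe", "extra info", "technik")

# Topic indices are zero-based, R vectors are one-based
topic_name <- function(topic) topic_labels[topic + 1]

topic_name(c(1, 3, 7))
```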

## New artworks “504” and “505” starting October 2022

When I read in the complete data frame for the first time, all of a
sudden there were 72 instead of 70 items. It seems like these two
artworks first appear on October 21, 2022.

``` r
summary(as.Date(datraw[datraw$item %in% c("504", "505"), "date"]))
##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## "2022-10-21" "2023-01-11" "2023-03-08" "2023-03-09" "2023-05-21" "2023-07-05"
```

The artworks seem to have been updated in general after October 21, 2022.
The following table shows which items were presented in which years.

``` r
xtabs(~ item + lubridate::year(date.start), datlogs)
##      lubridate::year(date.start)
## item   2016  2017  2018  2019  2020  2022  2023
##   1     277  4082  1912  1434   424   394  1315
##   3     485  6730  3126  2356   528   457  1124
##   19    714  8656  4028  2743   660   698  1595
##   20    595  8461  3996  2983   938   657  1355
##   24    497  6638  2912  2251   649   439  1028
##   27    567  5959  3112  2318   651   711  1324
##   28    601  9329  4394  3056   778   762  1570
##   29    425  6865  3830  2365   516   615  1174
##   31    289  4118  2051  1218   291   296   675
##   32    562  7016  3477  2253   726   766  1647
##   33    509  4936  2242  1449   555   358   666
##   36    434  4505  2276  1668   373   387   976
##   37    242  4478  2182  1554   339   423  1168
##   38    480  4617  2144  1397   371   381   784
##   39    395  3227  1313  1003   237   161   622
##   41    282  3329  1303  1022   225   209   701
##   42    203  3113  1307   903   242   191   421
##   43    115  2420  1089   806   176   219   486
##   45   1491 13561  5924  4474   966   585  1828
##   46    903  9181  5340  3812   961   944  1648
##   47    306  4949  2395  1510   750   297   675
##   48    723 10455  5384  4162  1328   948  2031
##   49    433  4326  2124  1414   434   431   809
##   51    564  7837  4577  2991   884   659  1370
##   52    447  5021  2104  1729   471   349   840
##   54    424  5068  2816  2008   529   370   918
##   55    358  4859  2069  1428   341   403  1303
##   57    860 14264  6625  5092  1410  1221  2714
##   60    555  6865  3539  2336   639   586  1415
##   62    547  6736  3803  2210   795   633  1322
##   63    251  3677  1827  1241   300   282   527
##   66    552  6004  2774  1977   505   373   932
##   69    394  3730  1827  1438   272   206   680
##   70    226  3766  1843   973   293   268   703
##   71    557  6160  2490  1846   570   323   839
##   72    426  6194  2857  2129   508   635  1553
##   73    432  6125  2880  1821   583   395   939
##   75    258  5885  2418  1562   369   257   645
##   76    861 12435  6253  4214  1753  1153  2268
##   77    816  8595  4197  2897   699   674  1452
##   78    410  5632  2498  1924   394   408   850
##   80   1650 25687 12429  7782  1975  1712  4433
##   83    644  8618  4720  3026   987  1027  2294
##   84    184  2121  1231   759   231   254   465
##   87    149  1618   722   632    99     0     0
##   88    513  6996  3493  2272   539   533  1420
##   89    214  2204   950   723   156     0     0
##   90    281  3756  1372  1143   403   320   932
##   93    613  8528  4224  3015   696  1174  2058
##   98    462  6662  3265  2565   704   670  1453
##   99    180  4162  1653  1454   363   411   868
##   101   414  4209  1859  1282   392   411   981
##   103   677  8758  4366  3165  1045   909  1871
##   104   423  5256  2381  1865   463   467   933
##   107   181  2101  1106   788   205   146   339
##   109   321  4001  1619  1106   292   188   453
##   110   489  5846  2785  2008   494   387   923
##   125   640  8435  4519  3334   926     0     0
##   129   598 11322  5046  3369   910  1131  1682
##   145   419  7821  3945  2694   706   740  1396
##   176   507  8465  3968  2787   687   552  1544
##   180   516  7563  3720  2765   585   550  1272
##   183   377  4014  1819  1741   346   251   675
##   187   340  4222  2165  1753   319   312   734
##   197   426  7710  3603  2510   671   602  1217
##   229   303  4872  2360  1891   482   389  1005
##   231   271  3606  1851  1239   318   236   467
##   501  1915 15968  7849  5060  1157   890  2989
##   502  1212 14550  7111  4749  1105   883  2752
##   503  1308 15218  8632  6399  1626   870  2558
##   504     0     0     0     0     0   363   662
##   505     0     0     0     0     0   426  1533
```

It shows that the artworks have been updated after the Corona pandemic.
I think the table was also moved to a different location at that point.