---
title: "Log data from the Multi-Touch Table at the HAUM"
output: github_document
---

```{r, include = FALSE}
devtools::load_all("../../../../software/mtt")
```

The Multi-Touch Table at the Herzog-Anton-Ulrich-Museum (HAUM) in
Braunschweig gives museum visitors the opportunity to interact with
about 70 artworks and 3 virtual cards containing information about the
museum and its layout. The table was installed at the museum in October
2016, and log files of visitor interactions have been collected since
November 2016. These log files are in an unstructured format
and cannot be easily analyzed. The purpose of this document is to
describe how the data have been transformed and which decisions have been
made along the way.

The implementation of the steps described here can be found at:
https://gitea.iwm-tuebingen.de/R/mtt.

# Data structure

The log files contain lines that indicate the beginning and end of the
activities that can be performed when interacting with the artworks on the
table. The layout looks as if pictures had been tossed onto a large table.
Every artwork is visible in the start configuration. People can move,
scale, and rotate the pictures on the table. Additionally, the virtual
picture cards can be flipped in order to find more information about the
artwork on the "back" of the card. To do so, one has to press a little `i`
in one of the bottom corners of the card. On the back of the card, two to
six cards can be found, each with a teaser text about a certain topic.
These topic cards can be opened, which displays a hypertext with detailed
information. Within these hypertexts, certain technical terms can be
clicked to get an explanation aimed at lay people. This opens a pop-up.
The events encoded in the raw log files therefore have the following
structure.

```
"Start Application"  --> Start Application
"Show Application"
"Transform start"    --> Move
"Transform stop"
"Show Info"          --> Flip Card
"Show Front"
"Artwork/OpenCard"   --> Open Topic
"Artwork/CloseCard"
"ShowPopup"          --> Open Popup
"HidePopup"
```

The right side shows which events can be extracted from these raw lines.
"Start Application" is not an event in the original sense since it only
indicates that the table was started or maybe reset itself. This is not an
interaction with the table and therefore not interesting in itself. All
"Start Application" and "Show Application" lines are therefore excluded
from further processing and remain only in the raw log files.

# Parsing the raw log files

The first step is to parse the raw log files, which the application stores
as text files in a rather unstructured format, into a format that can be
read by common statistics software packages. The data are therefore
transferred to a spreadsheet format. The following sections describe which
problems were encountered while doing this.

## Corrupt lines

When reading the files containing the raw logs into R, warnings like the
following appear:

```
Warning messages:
incomplete final line found on '2016/2016_11_18-11_31_0.log'
incomplete final line found on '2016/2016_11_18-11_38_30.log'
incomplete final line found on '2016/2016_11_18-11_40_36.log'
...
```

When you open these files, it looks like the last line contains some binary
content. It is unclear why and how this happens. These lines are therefore
removed when the data are read, and a warning indicates how many files have
been affected.

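The removal of a corrupt final line could look roughly like the following
sketch. It is not the code of the mtt package: the helper name
`read_log_lines()` is hypothetical, and treating "binary content" as "not
valid UTF-8" is an assumption that may not cover every corrupt file.

```r
# Hypothetical helper (not part of mtt): read one raw log file and drop the
# final line if it is not valid UTF-8 (assumed proxy for "binary content").
read_log_lines <- function(path) {
  # warn = FALSE silences the "incomplete final line" warning
  lines <- readLines(path, warn = FALSE)
  n <- length(lines)
  corrupt <- n > 0 && !validUTF8(lines[n])
  if (corrupt) lines <- lines[-n]
  list(lines = lines, corrupt = corrupt)
}
```

Summing the `corrupt` flag over all files would give the number of affected
files for the warning mentioned above.
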
## Extracted variables from raw log files

The following variables (columns in the data frame) are extracted from the
raw log file:

* `fileId`: The zero-left-padded file name of the raw log file from which
  the data line has been extracted.

* `folder`: The name of the folder in which the raw log file is stored.
  For the HAUM data set, the data are sorted by year (folders
  2016, 2017, 2018, 2019, 2020, 2021, 2022, and 2023).

* `date`: Extracted timestamp from the raw log file in the format
  `yyyy-mm-dd hh:mm:ss`.

* `timeMs`: A timestamp in milliseconds that restarts with
  every new raw log file.

* `event`: Start and stop event tags. See above for possible values.

* `item`: Identifier of the different items. This is a three-digit
  (left-padded) number. The numbers of the items correspond to the
  folder names in `/ContentEyevisit/eyevisit_cards_light/` and were
  originally taken from the museum's catalogue.

* `popup`: Name of the pop-up opened. This is only interesting for
  "openPopup" events.

* `topic`: The number of the topic card that has been opened at the back of
  the item card. See below for a more detailed description of what these
  numbers mean.

* `x`: Value of the x-coordinate in pixels on the 4K display
  ($3840 \times 2160$).

* `y`: Value of the y-coordinate in pixels.

* `scale`: Number in 128 bit that indicates how much the card has been
  scaled.

* `rotation`: Degree of rotation in start configuration.

<!-- TODO: After what time interval does the table reset itself to the
start configuration? -> PM needs to look it up -->

## Variables after "closing of events"

The raw log data consist of start and stop events for each event type.
After preprocessing, four event types are extracted: `move`, `flipCard`,
`openTopic`, and `openPopup`. Except for the `move` events, which can occur
at any time when interacting with an item card on the table, the events
have a hierarchical order: An item card first needs to be flipped
(`flipCard`), then the topic cards on the back of the card can be opened
(`openTopic`), and finally pop-ups on these topic cards can be opened
(`openPopup`). This implies that the event `openPopup` can only be present
for a certain item if the card has already been flipped (i.e., an event
`flipCard` for the same item has already occurred).

After preprocessing, the data frame is in a wide format with columns
for the start and the stop of each event and contains the following
variables:

* `fileId.start` / `fileId.stop`: See above.

* `date.start` / `date.stop`: See above.

* `folder`: The folder name (see above).

* `case`: A numerical variable indicating cases in the data. A "case"
  indicates an interaction interval and could be defined in different ways.
  Right now, a new case begins when no event occurred for 20 seconds or
  longer.

* `path`: A path is defined as one interaction with one item. A path
  can start either with a `flipCard` event or when an item is
  touched for the first time within this case. A path ends with the
  item card being flipped closed again or with the last movement of the
  card within this case. One case can contain several paths with the same
  item when the item is flipped open and flipped closed again several
  times within a short time.

* `glossar`: An indicator variable with values 0/1 that tracks whether a
  pop-up has been opened from the `glossar` folder. These pop-ups can be
  assigned to the wrong item since an exact algorithmic assignment is not
  possible: two items that are flipped open at the same time could both
  link to the same pop-up from the `glossar` folder. The indicator is kept
  as a variable so that these pop-ups can easily be deleted from the data.
  Right now, glossary entries can be ignored completely by setting an
  argument, and this is done by default. Using the pop-ups from the
  `glossar` folder needs considerably more work before it behaves
  satisfactorily.

* `event`: Indicating the event. Can take the values `move`, `flipCard`,
  `openTopic`, and `openPopup`.

* `item`: Identifier of the different artworks and information cards. This
  is a three-digit (left-padded) number. See above.

* `timeMs.start` / `timeMs.stop`: See above.

* `duration`: Calculated by $timeMs.stop - timeMs.start$ in milliseconds.
  For events spanning more than one log file, $600,000$ ms must be added
  for each log file after the first. See below for details.

* `topic`: See above.

* `popup`: See above.

* `x.start` / `x.stop`: See above.

* `y.start` / `y.stop`: See above.

* `distance`: Euclidean distance calculated from $(x.start, y.start)$ and
  $(x.stop, y.stop)$.

* `scale.start` / `scale.stop`: See above.

* `scaleSize`: Relative scaling of item card, calculated by
  $\frac{scale.stop}{scale.start}$.

* `rotation.start` / `rotation.stop`: See above.

* `rotationDegree`: Difference of rotation from $rotation.stop$ to
  $rotation.start$.

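The three derived movement variables in this list can be computed directly
from the start and stop columns. The following is a minimal sketch, not the
code of the mtt package: the function name `add_move_summaries()` is
hypothetical, and the sign convention for `rotationDegree` (stop minus
start) is an assumption.

```r
# Hypothetical helper (not part of mtt): add distance, scaleSize and
# rotationDegree to a data frame in the wide format described above.
add_move_summaries <- function(dat) {
  # Euclidean distance between start and stop position
  dat$distance <- sqrt((dat$x.stop - dat$x.start)^2 +
                       (dat$y.stop - dat$y.start)^2)
  # Relative scaling of the item card
  dat$scaleSize <- dat$scale.stop / dat$scale.start
  # Assumed sign convention: stop minus start
  dat$rotationDegree <- dat$rotation.stop - dat$rotation.start
  dat
}

# Toy data, not taken from the HAUM logs
toy <- data.frame(x.start = 0, x.stop = 3, y.start = 0, y.stop = 4,
                  scale.start = 0.5, scale.stop = 1,
                  rotation.start = 10, rotation.stop = 25)
add_move_summaries(toy)[, c("distance", "scaleSize", "rotationDegree")]
```
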
## How unclosed events are handled

Events do not necessarily need to be completed. A person can, e.g., leave
the table and not flip the item card closed again. For `flipCard`,
`openTopic`, and `openPopup` the data frame contains `NA` when the event
does not complete. For `move` events it happens quite often that a start
event follows a start event and a stop event follows a stop event.
Technically a move event cannot *not* be finished, and the number of events
without a start or stop indicates that the time resolution was not
sufficient to catch all these events accurately. Double start and stop
`move` events have therefore been deleted from the data set.

## Additional meta data

For the HAUM data, I added meta data on state holidays and school
vacations.

This led to the following additional variables:

* `holiday`

* `vacations`

# Problems and how I handled them

This section lists some problems with the log data that required
decisions. These decisions influence the outcome and maybe even the data
quality. Hence, I tried to document how I handled these problems and
explain the decisions I made.

## Weird behavior of `timeMs` and negative `duration` values

`timeMs` resets itself every time a new log file starts. This means that
the durations of events spanning more than one log file must be adjusted.
Instead of just calculating $timeMs.stop - timeMs.start$, `timeMs.start`
must be subtracted from the maximum duration of the log file where the
event started ($600,000$ ms) and `timeMs.stop` must be added. If the
event spans more than two log files, a multiple of $600,000$ must be taken,
e.g., for three log files it must be
$2 \times 600,000 - timeMs.start + timeMs.stop$ and so on.

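The adjustment described above can be written as a small function. This is
only a sketch: the names `log_length_ms`, `adjust_duration()` and `n_files`
are hypothetical, and how the number of spanned log files is determined is
left open here.

```r
# Maximum duration of one log file in ms (10 minutes, as stated above)
log_length_ms <- 600000

# Hypothetical helper (not part of mtt). `n_files` is the number of log
# files an event touches: 1 means start and stop are in the same file.
adjust_duration <- function(time_start, time_stop, n_files = 1) {
  (n_files - 1) * log_length_ms - time_start + time_stop
}

adjust_duration(1000, 4000)                 # same file: 3000
adjust_duration(599671, 621, n_files = 2)   # two files: 950
```
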
```{r timems, echo = FALSE, results = FALSE, fig.show = TRUE}
# Read data
datraw <- read.table("code/results/raw_logfiles_2024-02-21_16-07-33.csv", sep = ";",
                     header = TRUE)

plot(timeMs ~ as.factor(fileId), datraw[1:5000,], xlab = "fileId")
```

The boxplot shows that we have a continuous range of values within one log
file but that `timeMs` does not increase over log files. I kept
`timeMs.start` and `timeMs.stop` and also `fileId.start` and `fileId.stop`
in the data frame, so it is clear when events span more than one log file.

<!--
Info from the programmer:

"I also just went through the code from back then. The logging works like
this: When the application starts, a new log file is created every 10
minutes. The start time from which the duration is calculated is reset each
time. So duration is not "time since start of the application" but "time
since restart of the logger". Your assumption is therefore correct: there
should be no durations >10 minutes. The first entry of a log file can be
anything between 0 and 10 minutes (depending on whether the table was in
use at the time of the new logging interval). So if a case is spread over
2+ logs, you have to add 10 minutes to the duration for each log file after
the first to make it fit."
-->

## Left padding of file IDs

The file names of the raw log files are automatically generated and contain
a timestamp. This timestamp is not well formed. First, it contains an
incorrect month. The months go from 0 to 11, which means that the file
`2016_11_15-12_12_57.log` was collected on December 15, 2016 at 12:12 pm.
Another problem is that the file names are not zero left padded, e.g.,
`2016_11_15-12_2_57.log`. This file was collected on December 15, 2016 at
12:02 pm and therefore before the file above. But most sorting algorithms
will sort these files in the order shown below. In order to preprocess the
data and close events that belong together, the data need to be sorted by
events and artworks repeatedly. In order to get them back in the correct
time order, it is necessary to order them based on three variables:
`fileId.start`, `date.start` and `timeMs.start`. The file IDs therefore
need to sort in the correct order (again, see below for an example). I zero
left padded the log file names within the data frame and use them as an
identifier. These "file names" do not correspond exactly to the original
raw log file names. This needs to be kept in mind when doing any kind of
matching etc.

```
## what it looked like before left padding
# 1422 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:56 599671 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26874254
# 1423 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57 621 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26523465
# 1424 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57 677 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997736 13.26239605
# 1425 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57 774 Transform start 076 076.xml NA 2092.25 2008.00 0.2999345 13.26239605
# 1426 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57 850 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997107 13.26223362
# 1427 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:57 599916 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997771 13.26523465

## what it looks like now
# 1422 2016_11_15-12_02_57.log 2016-12-15 12:12:56 599671 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26874254
# 1423 2016_11_15-12_02_57.log 2016-12-15 12:12:57 599916 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997771 13.26523465
# 1424 2016_11_15-12_12_57.log 2016-12-15 12:12:57 621 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26523465
# 1425 2016_11_15-12_12_57.log 2016-12-15 12:12:57 677 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997736 13.26239605
# 1426 2016_11_15-12_12_57.log 2016-12-15 12:12:57 774 Transform start 076 076.xml NA 2092.25 2008.00 0.2999345 13.26239605
# 1427 2016_11_15-12_12_57.log 2016-12-15 12:12:57 850 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997107 13.26223362
```

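One way to zero left pad the file names is a regular expression that
prefixes every single-digit component with a `0`. This is a sketch under
the assumption that only isolated single digits need padding; the helper
name `pad_file_id()` is hypothetical and not taken from the mtt package.

```r
# Hypothetical helper (not part of mtt): strip the directory and pad every
# digit that has no digit directly before or after it.
pad_file_id <- function(path) {
  gsub("(?<![0-9])([0-9])(?![0-9])", "0\\1", basename(path), perl = TRUE)
}

pad_file_id("../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log")
# "2016_11_15-12_02_57.log"
pad_file_id("2016/2016_11_18-11_31_0.log")
# "2016_11_18-11_31_00.log"
```

The month component is left untouched by this sketch, so it still runs from
0 to 11.
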
## Timestamps repeat

The timestamps in the `date` variable record year, month, day, hour,
minute and seconds. Since one second is not a very short time interval for
a move on a touch display, this is not fine grained enough to bring events
into the correct order, meaning there are events from the same log file
having the same timestamp and even events from different log files having
the same timestamp. The log files get written about every 10 minutes
(which can easily be seen when looking at the file names of the raw log
files). So in order to get events in the correct order, it is necessary to
first order by file ID, then within file ID sort by the timestamp `date`,
and then within these more coarse grained timestamps sort by `timeMs`. But
as explained above, `timeMs` can only be sorted within one file ID, since
the values do not increase consistently over log files but start from a new
offset for each raw log file.

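The three-level ordering can be expressed with a single call to `order()`.
The sketch below assumes a data frame with the raw-log columns `fileId`
(already zero left padded), `date` and `timeMs`; the helper name
`sort_events()` is hypothetical. `method = "radix"` is used so that the
character columns are compared in a locale-independent way.

```r
# Hypothetical helper (not part of mtt): order by file ID, then by
# timestamp, then by the millisecond counter within the file.
sort_events <- function(dat) {
  dat[order(dat$fileId, dat$date, dat$timeMs, method = "radix"), ]
}

# Toy rows modelled on the example in the section on left padding
toy <- data.frame(
  fileId = c("2016_11_15-12_12_57.log", "2016_11_15-12_02_57.log",
             "2016_11_15-12_02_57.log"),
  date   = c("2016-12-15 12:12:57", "2016-12-15 12:12:57",
             "2016-12-15 12:12:56"),
  timeMs = c(621, 599916, 599671)
)
sort_events(toy)$timeMs   # 599671 599916 621
```
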
## x,y-coordinates outside of display range

The display of the Multi-Touch Table is a 4K display with 3840 x 2160
pixels. When you plot the start and stop coordinates, the display is
clearly distinguishable. However, a lot of points are outside of the
display range. This can happen when the art objects are scaled and then
moved to the very edge of the table. The table then records pixels outside
of the display. These are actually valid data points and I will leave them
as is.

```{r xycoord}
datlogs <- read.table("code/results/event_logfiles_2024-02-21_16-07-33.csv", sep = ";",
                      header = TRUE)

par(mfrow = c(1, 2))
plot(y.start ~ x.start, datlogs)
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
plot(y.stop ~ x.stop, datlogs)
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)

aggregate(cbind(x.start, x.stop, y.start, y.stop) ~ 1, datlogs, mean)
```

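To put a number on how many points lie outside of the display, a small
indicator function can be used. This is a sketch with a hypothetical helper
name (`outside_display()`) and toy coordinates; it is not part of the mtt
package.

```r
# Hypothetical helper: TRUE for coordinates outside the 3840 x 2160 display
outside_display <- function(x, y, width = 3840, height = 2160) {
  x < 0 | x > width | y < 0 | y > height
}

# Toy coordinates, not taken from the HAUM logs
x <- c(-12.5, 2092.25, 3900)
y <- c(100, 2008, 500)
outside_display(x, y)   # TRUE FALSE TRUE
```

Applied to the event data, something like
`mean(outside_display(datlogs$x.start, datlogs$y.start), na.rm = TRUE)`
would give the share of start positions outside of the display.
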
## Pop-ups from the `glossar` folder cannot be assigned to a specific item

All the information, pictures and texts for the topics and pop-ups are
stored in `/data/haum/ContentEyevisit/eyevisit_cards_light/<item_number>`.
Among other things, each folder contains XML files with the information
about any technical terms that can be opened from the hypertexts on the
topic cards. Often this information is item dependent and then the
corresponding XML file is in the folder for this item. Sometimes, however,
more general terms can be opened. In order to avoid multiple files
containing the same information, these were stored in a folder called
`glossar` and get accessed from there. The raw log files only contain the
path to this glossary entry and do not record from which item it was
accessed. I tried to assign these glossary entries to the correct items.
The (very heuristic) approach was this:

1. Create a lookup table with all XML file names (possible pop-ups) from
   the `glossar` folder and the items that possibly call them. This was
   stored as an `RData` object for easier handling but should maybe be
   stored in a more interoperable format.

2. I went through all possible pop-ups in this lookup table and stored the
   items that are associated with each of them.

3. I created a sub data frame without move events (since they can never be
   associated with a pop-up), went through every line, and looked up
   whether an item and a topic card had been opened. If this was the case
   and a glossary entry came up before the item was closed again, I
   assigned this item to the glossary entry.

This is heuristic since it is possible that several topic cards from
different items are opened simultaneously and the glossary pop-up could
be opened from either one (it could even be more than two, of course). In
these cases the item that was opened closest to the glossary pop-up has
been assigned, but this can never be completely error free.

In addition, this heuristic only assigns a little more than half of the
glossary entries. Since it only looks at the last item that has been
opened and checks whether this item is a possible candidate, it misses all
glossary pop-ups where another item has been opened in between. Writing a
more elaborate algorithm is still an open TODO.

All glossary pop-ups that do not get matched with an item are removed
from the data set with a warning if the argument `glossar = TRUE` is set.
Otherwise the glossary entries will be ignored completely.

## Assign a `case` variable based on "time heuristic"

One thing needed in order to work with the data set and use it for machine
learning algorithms like process mining is a variable that tries to
identify a case. A case variable will structure the data frame in a way
that navigation behavior can actually be investigated. However, we do not
know if several people are standing around the table interacting with it or
just one very active person. The simplest way to define a case variable is
to just use a time limit between events. This means that when the table has
not been interacted with for, e.g., 20 seconds, then it is assumed that a
person moved on and a new person started interacting with the table. This
is the easiest heuristic and the one implemented at the moment. Process
mining shows that this simple approach works in the sense that the
algorithm extracts the correct process.

In order to investigate user behavior on a more fine grained level, it will
be necessary to come up with a more elaborate approach. A better, still
simple approach could be to use this kind of time limit and additionally
look at the distance between items interacted with within one time window.
When items are far apart, it seems plausible that more than one person
interacted with them. Very short time lapses between events on different
items could also be an indicator that more than one person is interacting
with the table.

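The time heuristic can be sketched in a few lines. The helper name
`assign_case()` is hypothetical, and the sketch makes two assumptions that
the text above does not settle: the events are already in the correct time
order, and the gap is measured between consecutive event timestamps.

```r
# Hypothetical helper (not part of mtt): start a new case whenever the gap
# to the previous event is `gap_s` seconds or longer.
assign_case <- function(date, gap_s = 20) {
  if (length(date) == 0) return(integer(0))
  t <- as.numeric(as.POSIXct(date, tz = "UTC"))
  cumsum(c(TRUE, diff(t) >= gap_s))
}

# Toy timestamps, not taken from the HAUM logs
dates <- c("2016-12-15 12:00:00", "2016-12-15 12:00:05",
           "2016-12-15 12:00:25", "2016-12-15 12:00:30")
assign_case(dates)   # 1 1 2 2
```
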
## Assign a `path` variable

The `path` variable is supposed to show one interaction trace with one
artwork: It starts when an artwork is touched or flipped and stops
when it is closed again. It is easy to assign a path that runs from
flipping a card, through opening (maybe several) topics and pop-ups for
this artwork card, to closing this card again. But one would also like to
assign the same path to `move` events surrounding this interaction. Again,
this is not possible in an exact algorithmic way but only heuristically.

Again, I used a time cutoff for this. First, if a `move` event occurs, it
is checked whether the same item has been flipped less than 20 seconds
beforehand. If yes, the same path indicator is assigned to this `move`. If
not, a temporary new "move indicator" is assigned. Then, a "backward
pass" is applied, which checks whether the same item is opened less than
20 seconds _after_ the event occurs. If yes, that path indicator is
assigned. All remaining moves are assigned a new path number. This
corresponds to items being moved without being flipped.

## A `move` event does not record any change

Most of the events in the log files are move events. Many of these move
events are recorded but do not indicate any change, meaning
the only difference is the timestamp. All other variables indicating moves,
like `x.start` and `x.stop`, `rotation.start` and `rotation.stop` etc., do
not show _any_ change. They represent about 2/3 of all move events. These
events are probably short touches of the table without an actual
interaction. They were therefore removed from the data set.

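Filtering out such `move` events could look like the following sketch. The
helper name `drop_static_moves()` is hypothetical, and comparing start and
stop values with exact equality is an assumption about how "no change" is
defined.

```r
# Hypothetical helper (not part of mtt): remove move events whose position,
# scale and rotation are identical at start and stop.
drop_static_moves <- function(dat) {
  static <- dat$event == "move" &
    dat$x.start == dat$x.stop &
    dat$y.start == dat$y.stop &
    dat$scale.start == dat$scale.stop &
    dat$rotation.start == dat$rotation.stop
  # Rows where the comparison is NA are kept
  dat[!(static %in% TRUE), ]
}

# Toy data, not taken from the HAUM logs
toy <- data.frame(
  event = c("move", "move", "flipCard"),
  x.start = c(1, 1, NA), x.stop = c(1, 2, NA),
  y.start = c(1, 1, NA), y.stop = c(1, 1, NA),
  scale.start = c(0.3, 0.3, NA), scale.stop = c(0.3, 0.3, NA),
  rotation.start = c(0, 0, NA), rotation.stop = c(0, 0, NA)
)
drop_static_moves(toy)   # keeps the second move and the flipCard row
```
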
## Card indices go from 0 to 7 (instead of 0 to 5 as expected)

In the beginning I thought that the number for topics was the index of
where the card was presented on the back of the item. But this is not
correct. It is the number of the topic. There are eight topics in total:

```
Indices for topics:
0 artist
1 thema
2 komposition
3 leben des kunstwerks
4 details
5 licht und farbe
6 extra info
7 technik
```

On the back of items, there can be between 2 and 6 topic cards. Several of
these topic cards can be about the same topic, e.g., there can be two topic
cards assigned to the topic `thema`. It is impossible to find out if the
same topic card was opened several times or if different topic cards with
the same topic were opened from the same item. See the example below for
item "001".

```{r topics, echo = FALSE}
items <- sprintf("%03d", unique(datlogs$item))
topics <- extract_topics(items, xmlfiles = paste0(items, ".xml"),
                         xmlpath = "data/haum/ContentEyevisit/eyevisit_cards_light/")
head(topics)
```

## New artworks "504" and "505" starting October 2022

When I read in the complete data frame for the first time, all of a
sudden there were 72 instead of 70 items. It seems like these two
artworks appear on October 21, 2022.

```{r newitems}
summary(as.Date(datraw[datraw$item %in% c("504", "505"), "date"]))
```

The artworks seem to have been updated in general after October 21, 2022.
The following table shows which items were presented in which years.

```{r years}
xtabs(~ item + lubridate::year(date.start), datlogs)
```

It shows that the artworks have been updated after the COVID-19 pandemic. I
think the table was also moved to a different location at that point.