Updated README.Rmd and exported as github_document
parent
37e67bfa69
commit
9762c61a8d
537
README.Rmd
@ -1,46 +1,38 @@
|
|||||||
---
|
---
|
||||||
title: "Background information about MTT data"
|
title: "Log data from the Multi-Touch Table at the HAUM"
|
||||||
author: "Nora Wickelmaier"
|
output: github_document
|
||||||
date: "`r Sys.Date()`"
|
|
||||||
output:
|
|
||||||
html_document:
|
|
||||||
number_sections: true
|
|
||||||
toc: true
|
|
||||||
---
|
---
|
||||||
|
|
||||||
```{r, include = FALSE}
|
```{r, include = FALSE}
|
||||||
# setwd("C:/Users/nwickelmaier/Nextcloud/Documents/MDS/2023ss/60100_master_thesis")
|
devtools::load_all("../../../../software/mtt")
|
||||||
devtools::load_all("../../../software/mtt")
|
|
||||||
```
|
```
|
||||||
|
|
||||||
# Log data from the Multi-Touch Table at the HAUM
|
|
||||||
|
|
||||||
The Multi Touch Table at the Herzog-Anton-Ulrich-Museum (HAUM) in
|
The Multi Touch Table at the Herzog-Anton-Ulrich-Museum (HAUM) in
|
||||||
Braunschweig gives visitors of the Museum the opportunity to interact with
|
Braunschweig gives visitors of the Museum the opportunity to interact with
|
||||||
67 artworks and 3 tiles containing information about the museum and its
|
about 70 artworks and 3 virtual cards containing information about the
|
||||||
layout. The table was installed at the institute in October 2016 and since
|
museum and its layout. The table was installed at the institute in October
|
||||||
November 2016 log files from interactions of visitors of the museum have
|
2016 and since November 2016 log files from interactions of visitors of the
|
||||||
been collected. These log files are in an unstructured format and cannot be
|
museum have been collected. These log files are in an unstructured format
|
||||||
easily analyzed. The purpose of the following document is to describe how
|
and cannot be easily analyzed. The purpose of the following document is to
|
||||||
the data haven been transformed and which decisions have been made along
|
describe how the data have been transformed and which decisions have been
|
||||||
the way.
|
made along the way.
|
||||||
|
|
||||||
# Data structure
|
# Data structure
|
||||||
|
|
||||||
The log files contain lines that indicate the beginning and end of possible
|
The log files contain lines that indicate the beginning and end of possible
|
||||||
actions that can be performed when interacting with the artworks on the
|
activities that can be performed when interacting with the artworks on the
|
||||||
table. The layout of the table looks like 70 pictures have been tossed on a
|
table. The layout of the table looks like pictures have been tossed on a
|
||||||
large table. Every artwork is visible at the start configuration. People
|
large table. Every artwork is visible at the start configuration. People
|
||||||
can move the pictures on the table, they can be scaled and rotated.
|
can move the pictures on the table, they can be scaled and rotated.
|
||||||
Additionally, the virtual picture cards can be flipped in order to find
|
Additionally, the virtual picture cards can be flipped in order to find
|
||||||
more information of the artwork on the "back" of the card. One has to press
|
more information of the artwork on the "back" of the card. One has to press
|
||||||
a little `i` for more information in one of the bottom corners of the card.
|
a little `i` for more information in one of the bottom corners of the card.
|
||||||
On the back of the card two (?) to six information cards can be found with
|
On the back of the card two to six information cards can be found with a
|
||||||
a teaser text about a certain topic. These topic cards can be opened and a
|
teaser text about a certain topic. These topic cards can be opened and a
|
||||||
hypertext with detailed information pops up. Within these hypertexts
|
hypertext with detailed information opens. Within these hypertexts certain
|
||||||
certain technical terms can be clicked for lay people to get more
|
technical terms can be clicked for lay people to get more information. This
|
||||||
information. This also opens up a pop-up. The events encoded in the raw log
|
also opens up a pop-up. The events encoded in the raw log files therefore
|
||||||
files therefore have the following structure.
|
have the following structure.
|
||||||
|
|
||||||
```
|
```
|
||||||
"Start Application" --> Start Application
|
"Start Application" --> Start Application
|
||||||
@ -100,32 +92,32 @@ raw log file:
|
|||||||
organized in. For the HAUM data set, the data are sorted by year (folders
|
organized in. For the HAUM data set, the data are sorted by year (folders
|
||||||
2016, 2017, 2018, 2019, 2020, 2021, 2022, and 2023).
|
2016, 2017, 2018, 2019, 2020, 2021, 2022, and 2023).
|
||||||
|
|
||||||
* `data`: Extracted time stamp from the raw log file in the format
|
* `date`: Extracted timestamp from the raw log file in the format
|
||||||
`yyyy-mm-dd hh:mm:ss`.
|
`yyyy-mm-dd hh:mm:ss`.
|
||||||
|
|
||||||
* `timeMs`: Containing a time stamp in Milliseconds that restarts with
|
* `timeMs`: Containing a timestamp in Milliseconds that restarts with
|
||||||
every new raw log files.
|
every new raw log file.
|
||||||
|
|
||||||
* `event`: Start and stop event tags. See above for possible values.
|
* `event`: Start and stop event tags. See above for possible values.
|
||||||
|
|
||||||
* `artwork`: Identifier of the different artworks. This is a 3 digit
|
* `item`: Identifier of the different items. This is a three-digit
|
||||||
(left-padded) number. The numbers of the artworks correspond to the
|
(left-padded) number. The numbers of the items correspond to the
|
||||||
folder names in `/ContentEyevisit/eyevisit_cards_light/` and were
|
folder names in `/ContentEyevisit/eyevisit_cards_light/` and were
|
||||||
orginally taken from the museums catalogue.
|
originally taken from the museum's catalogue.
|
||||||
|
|
||||||
* `popup`: Name of the pop-up opened. This is only interestin for
|
* `popup`: Name of the pop-up opened. This is only interesting for
|
||||||
"openPopup" events.
|
"openPopup" events.
|
||||||
|
|
||||||
* `topicNumber`: The number of the topic card that has been opened at the back of
|
* `topic`: The number of the topic card that has been opened at the back of
|
||||||
the artwork card. See below for a more detailed descripttion what these
|
the item card. See below for a more detailed description of what these
|
||||||
numbers possibly mean.
|
numbers mean.
|
||||||
|
|
||||||
* `x`: Value of x-coordinate in pixel on the 4K-Display ($3840 \times 2160$)
|
* `x`: Value of x-coordinate in pixel on the 4K-Display ($3840 \times 2160$)
|
||||||
|
|
||||||
* `y`: Value of y-coordinate in pixel
|
* `y`: Value of y-coordinate in pixel
|
||||||
|
|
||||||
* `scale`: Number in 128 bit that indicates how much the artwork card has
|
* `scale`: Number in 128 bit that indicates how much the card has been
|
||||||
been scaled (????)
|
scaled
|
||||||
|
|
||||||
* `rotation`: Degree of rotation in start configuration.
|
* `rotation`: Degree of rotation in start configuration.
|
||||||
|
|
||||||
@ -134,43 +126,45 @@ raw log file:
|
|||||||
|
|
||||||
## Variables after "closing of events"
|
## Variables after "closing of events"
|
||||||
|
|
||||||
The raw log data consists of start and stop events for each event type.
|
The raw log data consist of start and stop events for each event type.
|
||||||
After preprocessing for event types are extracted: `move`, `flipCard`,
|
After preprocessing four event types are extracted: `move`, `flipCard`,
|
||||||
`openTopic`, and `openPopup`. Except for the `move` events, which can occur
|
`openTopic`, and `openPopup`. Except for the `move` events, which can occur
|
||||||
at any time when interacting with an artwork card on the table, the events
|
at any time when interacting with an item card on the table, the events
|
||||||
have a hierachical order: An artwork card first needs to be flipped
|
have a hierarchical order: An item card first needs to be flipped
|
||||||
(`flipCard`), then the topic cards on the back of the card can be opened
|
(`flipCard`), then the topic cards on the back of the card can be opened
|
||||||
(`openTopic`), and finally pop-ups on these topic cards can be opened
|
(`openTopic`), and finally pop-ups on these topic cards can be opened
|
||||||
(`openPopup`). This implies that the event `openPopup` can only be present
|
(`openPopup`). This implies that the event `openPopup` can only be present
|
||||||
for a certain artwork, if the card has already been flipped (i.e., an event
|
for a certain item, if the card has already been flipped (i.e., an event
|
||||||
`flipCard` for the same artwork has already occured).
|
`flipCard` for the same item has already occurred).
|
||||||
|
|
||||||
After preprocessing, the data frame is now in a wide format with columns
|
After preprocessing, the data frame is now in a wide format with columns
|
||||||
for the start and the stop of each event and contains the following
|
for the start and the stop of each event and contains the following
|
||||||
variables:
|
variables:
|
||||||
|
|
||||||
* `folder`: Containing the folder name (see above)
|
* `fileId.start` / `fileId.stop`: See above.
|
||||||
|
|
||||||
* `eventId`: A numerical variable that indicates the number of the event.
|
* `date.start` / `date.stop`: See above.
|
||||||
Starts at 1 and ends with the total number of events, counting up by 1.
|
|
||||||
|
* `folder`: Containing the folder name (see above)
|
||||||
|
|
||||||
* `case`: A numerical variable indicating cases in the data. A "case"
|
* `case`: A numerical variable indicating cases in the data. A "case"
|
||||||
indicates an interaction interval and could be defined in different ways.
|
indicates an interaction interval and could be defined in different ways.
|
||||||
Right now a new case begins, when no event occured for 20 seconds.
|
Right now a new case begins, when no event occurred for 20 seconds or
|
||||||
|
longer.
|
||||||
|
|
||||||
* `trace`: A trace is defined as one interaction with one artwork. A trace
|
* `path`: A path is defined as one interaction with one item. A path
|
||||||
can either start with a `flipCard` event or when an artwork has been
|
can either start with a `flipCard` event or when an item has been
|
||||||
touched for the first time within this case. A trace ends with the
|
touched for the first time within this case. A path ends with the
|
||||||
artwork card being flipped close again or with the last movement of the
|
item card being flipped close again or with the last movement of the
|
||||||
card within this case. One case can contain several traces with the same
|
card within this case. One case can contain several paths with the same
|
||||||
artwork when the artwork is flipped open and slipped close again several
|
item when the item is flipped open and flipped close again several
|
||||||
times within a short time.
|
times within a short time.
|
||||||
|
|
||||||
* `glossar`: An indicator variable with values 0/1 that tracks if a pop-up
|
* `glossar`: An indicator variable with values 0/1 that tracks if a pop-up
|
||||||
has been opened from the glossar folder. These pop-ups can be assigned to
|
has been opened from the glossar folder. These pop-ups can be assigned to
|
||||||
the wronge artwork since it is not possible to do this algorithmically.
|
the wrong item since it is not possible to do this algorithmically.
|
||||||
It is possible that two artworks are flipped open that could both link to
|
It is possible that two items are flipped open that could both link to
|
||||||
the same popup from a glossar. The indicator variable is left as a
|
the same pop-up from a glossar. The indicator variable is left as a
|
||||||
variable, so that these pop-ups can be easily deleted from the data.
|
variable, so that these pop-ups can be easily deleted from the data.
|
||||||
Right now, glossar entries can be ignored completely by setting an
|
Right now, glossar entries can be ignored completely by setting an
|
||||||
argument and this is done by default. Using the pop-ups from the glossar
|
argument and this is done by default. Using the pop-ups from the glossar
|
||||||
@ -179,20 +173,16 @@ variables:
|
|||||||
* `event`: Indicating the event. Can take tha values `move`, `flipCard`,
|
* `event`: Indicating the event. Can take the values `move`, `flipCard`,
|
||||||
`openTopic`, and `openPopup`.
|
`openTopic`, and `openPopup`.
|
||||||
|
|
||||||
* `artwork`: Identifier of the different artworks. This is a 3 digit
|
* `item`: Identifier of the different artworks and information cards. This
|
||||||
(left-padded) number. See above.
|
is a three-digit (left-padded) number. See above.
|
||||||
|
|
||||||
* `fileId.start` / `fileId.stop`: See above.
|
|
||||||
|
|
||||||
* `date.start` / `date.stop`: See above.
|
|
||||||
|
|
||||||
* `timeMs.start` / `timeMs.stop`: See above.
|
* `timeMs.start` / `timeMs.stop`: See above.
|
||||||
|
|
||||||
* `duration`: Calculated by $timeMs.stop - timeMs.start$ in Milliseconds.
|
* `duration`: Calculated by $timeMs.stop - timeMs.start$ in Milliseconds.
|
||||||
Needs to be adjusted for events spanning more than one log file by a
|
Needs to be adjusted for events spanning more than one log file by a
|
||||||
factor of $60,000 \times #logfiles$. See below for details.
|
factor of $600,000 \times \text{number of logfiles}$. See below for details.
|
||||||
|
|
||||||
* `topicNumber`: See above.
|
* `topic`: See above.
|
||||||
|
|
||||||
* `popup`: See above.
|
* `popup`: See above.
|
||||||
|
|
||||||
@ -200,11 +190,12 @@ variables:
|
|||||||
|
|
||||||
* `y.start` / `y.stop`: See above.
|
* `y.start` / `y.stop`: See above.
|
||||||
|
|
||||||
* `distance`: Euclidean distande calculated from $(x.start, y.start)$ and $(x.stop, y.stop)$.
|
* `distance`: Euclidean distance calculated from $(x.start, y.start)$ and
|
||||||
|
$(x.stop, y.stop)$.
|
||||||
|
|
||||||
* `scale.start` / `scale.stop`: See above.
|
* `scale.start` / `scale.stop`: See above.
|
||||||
|
|
||||||
* `scaleSize`: Relative scaling of artwork card, calculated by
|
* `scaleSize`: Relative scaling of item card, calculated by
|
||||||
$\frac{scale.stop}{scale.start}$.
|
$\frac{scale.stop}{scale.start}$.
|
||||||
|
|
||||||
* `rotation.start` / `rotation.stop`: See above.
|
* `rotation.start` / `rotation.stop`: See above.
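
The derived variables `distance` and `scaleSize` follow directly from the formulas given in the list. A minimal sketch on a made-up data frame (the values are purely illustrative and not taken from the HAUM logs; only the column names come from the description above):

```r
# Toy data with the start/stop columns described above (values are invented)
dat <- data.frame(
  x.start = c(0, 100), y.start = c(0, 100),
  x.stop  = c(3, 100), y.stop  = c(4, 100),
  scale.start = c(1, 2), scale.stop = c(2, 1)
)

# Euclidean distance between (x.start, y.start) and (x.stop, y.stop)
dat$distance <- sqrt((dat$x.stop - dat$x.start)^2 +
                     (dat$y.stop - dat$y.start)^2)

# Relative scaling: values > 1 mean the card was enlarged, < 1 shrunk
dat$scaleSize <- dat$scale.stop / dat$scale.start

dat[, c("distance", "scaleSize")]
```

This is only meant to make the definitions concrete; the package may compute these columns differently in detail.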
|
||||||
@ -215,60 +206,26 @@ variables:
|
|||||||
## How unclosed events are handled
|
## How unclosed events are handled
|
||||||
|
|
||||||
Events do not necessarily need to be completed. A person can, e.g., leave
|
Events do not necessarily need to be completed. A person can, e.g., leave
|
||||||
the table and not flip the artwork card close again. For `flipCard`,
|
the table and not flip the item card close again. For `flipCard`,
|
||||||
`openTopic`, and `openPopup` the data frame contains `NA` when the event
|
`openTopic`, and `openPopup` the data frame contains `NA` when the event
|
||||||
does not complete. For `move` events is happens quite often that a start
|
does not complete. For `move` events it happens quite often that a start
|
||||||
event follows a start event and a stop event follows a stop event.
|
event follows a start event and a stop event follows a stop event.
|
||||||
Technically a move event cannot *not* be finished and the number of events
|
Technically a move event cannot *not* be finished and the number of events
|
||||||
without a start or stop indicated that the time resolution was not
|
without a start or stop indicate that the time resolution was not
|
||||||
sufficient to catch all these events accurately. Double start and stop
|
sufficient to catch all these events accurately. Double start and stop
|
||||||
`move`events have therefore been deleted from the data set.
|
`move` events have therefore been deleted from the data set.
|
||||||
|
|
||||||
<!--
|
|
||||||
## How a case is defined
|
|
||||||
|
|
||||||
* Find out whether more than one person is standing at the table?
|
|
||||||
- Sliding window in which the number of artworks is counted? Or how far
|
|
||||||
apart the touched artworks are from each other?
|
|
||||||
- You can already "see" this kind of thing in the logs, but how can I
|
|
||||||
extract it automatically? What is my definition of
|
|
||||||
an "interaction boost"?
|
|
||||||
- No matter how we do it, does it work on the "event log data"?
|
|
||||||
-->
|
|
||||||
|
|
||||||
## Additional meta data
|
## Additional meta data
|
||||||
|
|
||||||
For the HAUM data, I added meta data on state holidays and school
|
For the HAUM data, I added meta data on state holidays and school
|
||||||
vacations. Additionally, the topic categories of the topic cards were
|
vacations.
|
||||||
extracted from the XML files and added to the data frame.
|
|
||||||
|
|
||||||
This led to the following additional variables:
|
This led to the following additional variables:
|
||||||
|
|
||||||
* `topicIndex`
|
|
||||||
|
|
||||||
* `topicFile`
|
|
||||||
|
|
||||||
* `topic`
|
|
||||||
|
|
||||||
* `state` (Niedersachsen for complete HAUM data set)
|
|
||||||
|
|
||||||
* `stateCode` (NI)
|
|
||||||
|
|
||||||
* `holiday`
|
* `holiday`
|
||||||
|
|
||||||
* `vacations`
|
* `vacations`
|
||||||
|
|
||||||
* `stateCodeVacations`
|
|
||||||
|
|
||||||
<!--
|
|
||||||
- Metadata on artworks like, name, artist, type of artwork, epoch, etc.
|
|
||||||
- School vacations and holidays
|
|
||||||
- Special exhibits at the museum
|
|
||||||
- Number of visitors per day (follow up with Sven again?)
|
|
||||||
- Age structure of visitors per day?
|
|
||||||
- ... ????
|
|
||||||
-->
|
|
||||||
|
|
||||||
# Problems and how I handled them
|
# Problems and how I handled them
|
||||||
|
|
||||||
This lists some problems with the log data that required decisions. These
|
This lists some problems with the log data that required decisions. These
|
||||||
@ -287,33 +244,12 @@ event spans more than two log files, a multiple of $600,000$ must be taken,
|
|||||||
e.g. for three log files it must be: $2 \times 600,000 - timeMs.start +
|
e.g. for three log files it must be: $2 \times 600,000 - timeMs.start +
|
||||||
timeMs.stop$ and so on.
|
timeMs.stop$ and so on.
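
The adjustment described above can be sketched as a small function. This is a minimal sketch, assuming that every log file covers exactly 10 minutes (600,000 ms) and that the number of log files an event spans is already known; the function name `adjust_duration()` and its arguments are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): duration in milliseconds
# for an event that spans `n_files` consecutive log files.
# Assumes each log file covers exactly 600,000 ms (10 minutes).
adjust_duration <- function(time_start, time_stop, n_files = 1,
                            file_length_ms = 600000) {
  (n_files - 1) * file_length_ms - time_start + time_stop
}

adjust_duration(1000, 4000)                 # start and stop in the same file
adjust_duration(590000, 5000, n_files = 2)  # event spans two log files
adjust_duration(590000, 5000, n_files = 3)  # event spans three log files
```

For `n_files = 1` this reduces to the plain difference `timeMs.stop - timeMs.start`. Since the log files are only written about every 10 minutes, the result is an approximation for events that cross file boundaries.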
|
||||||
|
|
||||||
```{r, results = FALSE, fig.show = TRUE}
|
```{r timems, echo = FALSE, results = FALSE, fig.show = TRUE}
|
||||||
# Read data
|
# Read data
|
||||||
dat0 <- read.table("data/haum/raw_logfiles_small_2023-09-26_13-50-20.csv", sep = ";",
|
datraw <- read.table("code/results/raw_logfiles_2024-02-21_16-07-33.csv", sep = ";",
|
||||||
header = TRUE)
|
header = TRUE)
|
||||||
dat0$date <- as.POSIXct(dat0$date)
|
|
||||||
dat0$glossar <- ifelse(dat0$artwork == "glossar", 1, 0)
|
|
||||||
|
|
||||||
# Remove irrelevant events
|
plot(timeMs ~ as.factor(fileId), datraw[1:5000,], xlab = "fileId")
|
||||||
dat <- subset(dat0, !(dat0$event %in% c("Start Application",
|
|
||||||
"Show Application")))
|
|
||||||
|
|
||||||
# Add trace variable
|
|
||||||
artworks <- unique(stats::na.omit(dat$artwork))
|
|
||||||
artworks <- artworks[artworks != "glossar"]
|
|
||||||
glossar_files <- unique(subset(dat, dat$artwork == "glossar")$popup)
|
|
||||||
glossar_dict <- create_glossardict(artworks, glossar_files,
|
|
||||||
xmlpath = "data/haum/ContentEyevisit/eyevisit_cards_light/")
|
|
||||||
dat1 <- add_trace(dat, glossar_dict)
|
|
||||||
|
|
||||||
# Close events
|
|
||||||
dat2 <- rbind(close_events(dat1, "move", rm_nochange_moves = TRUE),
|
|
||||||
close_events(dat1, "flipCard", rm_nochange_moves = TRUE),
|
|
||||||
close_events(dat1, "openTopic", rm_nochange_moves = TRUE),
|
|
||||||
close_events(dat1, "openPopup", rm_nochange_moves = TRUE))
|
|
||||||
dat2 <- dat2[order(dat2$fileId.start, dat2$date.start, dat2$timeMs.start), ]
|
|
||||||
|
|
||||||
plot(timeMs ~ as.factor(fileId), dat[1:5000,], xlab = "fileId")
|
|
||||||
```
|
```
|
||||||
|
|
||||||
The boxplot shows that we have a continuous range of values within one log
|
The boxplot shows that we have a continuous range of values within one log
|
||||||
@ -322,7 +258,7 @@ file but that `timeMs` does not increase over log files. I kept
|
|||||||
in the data frame, so it is clear when events span more than one log file.
|
in the data frame, so it is clear when events span more than one log file.
|
||||||
|
|
||||||
<!--
|
<!--
|
||||||
Infos from Philipp:
|
Infos from the programmer:
|
||||||
|
|
||||||
"I also just went through the code from back then. The logging works
|
"I also just went through the code from back then. The logging works
|
||||||
like this: once the application starts, every 10 minutes a new log file
|
like this: once the application starts, every 10 minutes a new log file
|
||||||
@ -340,7 +276,7 @@ es passt."
|
|||||||
## Left padding of file IDs
|
## Left padding of file IDs
|
||||||
|
|
||||||
The file names of the raw log files are automatically generated and contain
|
The file names of the raw log files are automatically generated and contain
|
||||||
a time stamp. This time stamp is not well formed. First, it contains an
|
a timestamp. This timestamp is not well formed. First, it contains an
|
||||||
incorrect month. The months go from 0 to 11 which means, that the file name
|
incorrect month. The months go from 0 to 11, which means that the file name
|
||||||
`2016_11_15-12_12_57.log` was collected on December 15, 2016 at 12:12 pm.
|
`2016_11_15-12_12_57.log` was collected on December 15, 2016 at 12:12 pm.
|
||||||
Another problem is that the file names are not zero left padded, e.g.,
|
Another problem is that the file names are not zero left padded, e.g.,
|
||||||
@ -350,11 +286,12 @@ will sort these files in the order shown below. In order to preprocess the
|
|||||||
data and close events that belong together, the data need to be sorted by
|
data and close events that belong together, the data need to be sorted by
|
||||||
events and artworks repeatedly. In order to get them back in the correct
|
events and artworks repeatedly. In order to get them back in the correct
|
||||||
time order, it is necessary to order them based on three variables:
|
time order, it is necessary to order them based on three variables:
|
||||||
`fileId`, `date.start` and `timeMs`. The file IDs therefore need to
|
`fileId.start`, `date.start` and `timeMs.start`. The file IDs therefore
|
||||||
sort in the correct order (again see below for example). I zero left padded
|
need to sort in the correct order (again see below for example). I zero
|
||||||
the log file names within the data frame using it as an identifier. These
|
left padded the log file names within the data frame using it as an
|
||||||
"file names" do not correspond exactly to the original raw log file names.
|
identifier. These "file names" do not correspond exactly to the original
|
||||||
This needs to be kept in mind when doing any kind of matching etc.
|
raw log file names. This needs to be kept in mind when doing any kind of
|
||||||
|
matching etc.
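
Zero left padding of the file names can be sketched as follows. This is a minimal sketch, assuming all file names follow the pattern `year_month_day-hour_minute_second.log`; the function name `pad_file_id()` and the unpadded example name are made up for illustration. The month is deliberately left as it is (0 to 11), since only the sort order is addressed here.

```r
# Hypothetical helper (not part of the package): zero left pad every
# component of a raw log file name so that the names sort chronologically.
pad_file_id <- function(file_name) {
  stem  <- sub("\\.log$", "", file_name)   # drop the file extension
  parts <- strsplit(stem, "[_-]")          # year, month, day, hour, min, sec
  vapply(parts, function(p) {
    p <- sprintf("%02d", as.integer(p))    # pad to at least two digits
    paste0(paste(p[1:3], collapse = "_"), "-",
           paste(p[4:6], collapse = "_"), ".log")
  }, character(1))
}

# Invented example names: the second one is not zero padded
files <- c("2016_11_15-12_12_57.log", "2016_11_15-9_2_7.log")
sort(pad_file_id(files))
```

After padding, plain alphabetical sorting of the identifiers agrees with the time order within the naming scheme, which is what the ordering by `fileId.start` relies on.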
|
||||||
|
|
||||||
```
|
```
|
||||||
## what it looked like before left padding
|
## what it looked like before left padding
|
||||||
@ -376,16 +313,16 @@ This needs to be kept in mind when doing any kind of matching etc.
|
|||||||
|
|
||||||
## Timestamps repeat
|
## Timestamps repeat
|
||||||
|
|
||||||
The time stamps in the `date` variable record year, month, day, hour,
|
The timestamps in the `date` variable record year, month, day, hour,
|
||||||
minute and seconds. Since one second is not a very short time interval for
|
minute and seconds. Since one second is not a very short time interval for
|
||||||
a move on a touch display, this is not fine grained enough to bring events
|
a move on a touch display, this is not fine grained enough to bring events
|
||||||
into the correct order, meaning there are events from the same log file
|
into the correct order, meaning there are events from the same log file
|
||||||
having the same time stamp and even events from different log files having
|
having the same timestamp and even events from different log files having
|
||||||
the same time stamp. The log files get written about every 10 minutes
|
the same timestamp. The log files get written about every 10 minutes
|
||||||
(which can easily be seen when looking at the file names of the raw log
|
(which can easily be seen when looking at the file names of the raw log
|
||||||
files). So in order to get events in the correct order, it is necessary to
|
files). So in order to get events in the correct order, it is necessary to
|
||||||
first order by file ID, within file ID then sort by time stamp `date` and
|
first order by file ID, within file ID then sort by timestamp `date` and
|
||||||
then within these more coarse grained time stamps sort be `timeMs`. But as
|
then within these more coarse grained timestamps sort by `timeMs`. But as
|
||||||
explained above, `timeMs` can only be sorted within one file ID, since they
|
explained above, `timeMs` can only be sorted within one file ID, since they
|
||||||
do not increase consistently over log files, but have a new setoff for each
|
do not increase consistently over log files, but have a new offset for each
|
||||||
raw log file.
|
raw log file.
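
The three-level ordering can be sketched with `order()`. This is a minimal sketch on invented rows (file IDs, timestamps and `timeMs` values are made up); only the column names `fileId`, `date` and `timeMs` are taken from the description above.

```r
# Invented example rows in scrambled order
dat <- data.frame(
  fileId = c("2016_11_15-12_22_57.log", "2016_11_15-12_12_57.log",
             "2016_11_15-12_12_57.log", "2016_11_15-12_12_57.log"),
  date   = as.POSIXct(c("2016-12-15 12:23:01", "2016-12-15 12:13:05",
                        "2016-12-15 12:13:05", "2016-12-15 12:13:02"),
                      tz = "UTC"),
  timeMs = c(1500, 8200, 8050, 5000)
)

# Sort by file ID first, then by the coarse timestamp, then by timeMs,
# which is only comparable within one file ID
dat_sorted <- dat[order(dat$fileId, dat$date, dat$timeMs), ]
dat_sorted$timeMs
```

In the sorted result `timeMs` increases within the first file and starts again at a small value in the second file, which is why `timeMs` alone cannot be used to order events across log files.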
|
||||||
@ -394,64 +331,67 @@ raw log file.
|
|||||||
|
|
||||||
The display of the Multi-Touch-Table is a 4K-display with 3840 x 2160
|
The display of the Multi-Touch-Table is a 4K-display with 3840 x 2160
|
||||||
pixels. When you plot the start and stop coordinates, the display is
|
pixels. When you plot the start and stop coordinates, the display is
|
||||||
clearly to distinguish. However, a lot of points are outside of the display
|
clearly distinguishable. However, a lot of points are outside of the
|
||||||
range. This can happen, when the art objects are scaled and then moved to
|
display range. This can happen, when the art objects are scaled and then
|
||||||
the very edge of the table. Then it will record pixels outside of the
|
moved to the very edge of the table. Then it will record pixels outside of
|
||||||
table. These are actually valid data points and I will leave them as is.
|
the table. These are actually valid data points and I will leave them as
|
||||||
|
is.
|
||||||
|
|
||||||
|
```{r xycoord}
|
||||||
|
datlogs <- read.table("code/results/event_logfiles_2024-02-21_16-07-33.csv", sep = ";",
|
||||||
|
header = TRUE)
|
||||||
|
|
||||||
```{r}
|
|
||||||
par(mfrow = c(1, 2))
|
par(mfrow = c(1, 2))
|
||||||
plot(y.start ~ x.start, dat2)
|
plot(y.start ~ x.start, datlogs)
|
||||||
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
|
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
|
||||||
plot(y.stop ~ x.stop, dat2)
|
plot(y.stop ~ x.stop, datlogs)
|
||||||
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
|
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
|
||||||
|
|
||||||
aggregate(cbind(x.start, x.stop, y.start, y.stop) ~ 1, dat2, mean)
|
aggregate(cbind(x.start, x.stop, y.start, y.stop) ~ 1, datlogs, mean)
|
||||||
```
|
```
|
||||||
|
|
||||||
## Pop-ups from glossar cannot be assigned to a specific artwork
|
## Pop-ups from glossar cannot be assigned to a specific item
|
||||||
|
|
||||||
All the information, pictures and texts for the topics and pop-ups are
|
All the information, pictures and texts for the topics and pop-ups are
|
||||||
stored in
|
stored in `/data/haum/ContentEyevisit/eyevisit_cards_light/<item_number>`.
|
||||||
`/Logfiles/ContentEyevisit/eyevisit_cards_light/<artwork_number>`. Among
|
Among other things, each folder contains XML-files with the information
|
||||||
other things, each folder contains XML-files with the information about any
|
about any technical terms that can be opened from the hypertexts on the
|
||||||
technical terms that can be opened from the hypertexts on the topic cards.
|
topic cards. Often this information is item dependent and then the
|
||||||
Often these information are artwork dependent and then the corresponding
|
corresponding XML-file is in the folder for this item. Sometimes, however,
|
||||||
XML-file is in the folder for this artwork. Sometimes, however, more
|
more general terms can be opened. In order to avoid multiple files
|
||||||
general terms can be opened. In order to avoid multiple files containing
|
containing the same information, these were stored in a folder called
|
||||||
the same information, these were stored in a folder called `glossar` and
|
`glossar` and get accessed from there. The raw log files only contain the
|
||||||
get accessed from there. The raw log files only contain the path to this
|
path to this glossar entry and did not record from which item it was
|
||||||
glossar entry and did not record from which artwork it was accessed. I
|
accessed. I tried to assign these glossar entries to the correct items. The
|
||||||
tried to assign these glossar entries to the correct artworks. The (very
|
(very heuristic) approach was this:
|
||||||
heuristic) approach was this:
|
|
||||||
|
|
||||||
1. Create a lookup table with all XML-file names (possible pop-ups) from
|
1. Create a lookup table with all XML-file names (possible pop-ups) from
|
||||||
the glossar folder and what artworks possibly call them. This was stored
|
the glossar folder and what items possibly call them. This was stored
|
||||||
as an `RData` object for easier handling but should maybe be stored in a
|
as an `RData` object for easier handling but should maybe be stored in a
|
||||||
more interoperable format.
|
more interoperable format.
|
||||||
|
|
||||||
2. I went through all possible pop-ups in this lookup table and stored the
|
2. I went through all possible pop-ups in this lookup table and stored the
|
||||||
artworks that are associated with it.
|
items that are associated with it.
|
||||||
|
|
||||||
3. I created a sub data frame without move events (since they can never be
|
3. I created a sub data frame without move events (since they can never be
|
||||||
associated with a pop-up) and went through every line and looked up if
|
associated with a pop-up) and went through every line and looked up if
|
||||||
an artwork and a topic card had been opened. If this was the case and a
|
an item and a topic card had been opened. If this was the case and a
|
||||||
glossar entry came up before the artwork was closed again, I assigned
|
glossar entry came up before the item was closed again, I assigned
|
||||||
this artwork to this glossar entry.
|
this item to the glossar entry.
|
||||||
|
|
||||||
This is heuristic since it is possible that several topic cards from
|
This is heuristic since it is possible that several topic cards from
|
||||||
different artworks are opened simultaneously and the glossar pop-up could
|
different items are opened simultaneously and the glossar pop-up could
|
||||||
be opened from either one (it could even be more than two, of course). In
|
be opened from either one (it could even be more than two, of course). In
|
||||||
these cases the artwork that was opened closest to the glossar pop-up has
|
these cases the item that was opened closest to the glossar pop-up has
|
||||||
been assigned, but this can never be completely error free.
|
been assigned, but this can never be completely error free.
|
||||||
|
|
||||||
And this heuristic only assigns a little more than half of the glossar
|
And this heuristic only assigns a little more than half of the glossar
|
||||||
entries. Since my heuristic only looks for the last artwork that has been
|
entries. Since my heuristic only looks for the last item that has been
|
||||||
opened and if this artwork is a possible candidate it misses all glossar
|
opened and if this item is a possible candidate it misses all glossar
|
||||||
pop-ups where another artwork has been opened in between. This is still an
|
pop-ups where another item has been opened in between. This is still an
|
||||||
open TODO to write a more elaborate algorithm.
|
open TODO to write a more elaborate algorithm.
|
||||||
|
|
||||||
All glossar pop-ups that do not get matched with an artwork are removed
|
All glossar pop-ups that do not get matched with an item are removed
|
||||||
from the data set with a warning if the argument `glossar = TRUE` is set.
|
from the data set with a warning if the argument `glossar = TRUE` is set.
|
||||||
Otherwise the glossar entries will be ignored completely.
|
Otherwise the glossar entries will be ignored completely.
|
||||||
|
|
||||||
@ -473,232 +413,89 @@ gets extracted by the algorithm.
|
|||||||
In order to investigate user behavior on a more fine grained level, it will
|
In order to investigate user behavior on a more fine grained level, it will
|
||||||
be necessary to come up with a more elaborate approach. A better, still
|
be necessary to come up with a more elaborate approach. A better, still
|
||||||
simple approach, could be to use this kind of time limit and additionally
|
simple approach, could be to use this kind of time limit and additionally
|
||||||
look at the distance between artworks interacted with within one time
|
look at the distance between items interacted with within one time window.
|
||||||
window. When artworks are far apart it seems plausible that more than one
|
When items are far apart it seems plausible that more than one person
|
||||||
person interacted with them. Very short time lapses between events on
|
interacted with them. Very short time lapses between events on different
|
||||||
different artworks could also be an indicator that more than one person is
|
items could also be an indicator that more than one person is interacting
|
||||||
interacting with the table.
|
with the table.
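
The time limit mentioned above (a new case starts once the table has not been touched for 20 seconds) can be sketched in a few lines of base R. This is only an illustration, not the code from the package: the helper name `assign_case` is hypothetical, and it assumes that the events are already sorted by time and that the gap is measured between consecutive start timestamps.

```r
# Hypothetical helper: start a new case whenever the gap to the previous
# event is `cutoff` seconds or longer (assumes time-ordered input).
assign_case <- function(date_start, cutoff = 20) {
  t <- as.numeric(date_start)        # seconds since epoch
  stopifnot(!is.unsorted(t))
  gap <- c(0, diff(t))               # seconds since the previous event
  cumsum(gap >= cutoff) + 1L         # case counter starts at 1
}

ts <- as.POSIXct("2016-12-15 12:00:00", tz = "UTC") + c(0, 5, 12, 40, 45, 70)
assign_case(ts)
# 1 1 1 2 2 3
```

The distance between items could be added as a second criterion inside the same loop-free structure, e.g., by combining `gap >= cutoff` with a logical vector that flags large jumps in position.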
|
||||||
|
|
||||||
## Assign a `trace` variable
|
## Assign a `path` variable
|
||||||
|
|
||||||
The `trace` variable is supposed to show one interaction trace with one
|
The `path` variable is supposed to show one interaction trace with one
|
||||||
artwork. Meaning it starts when an artwork is touched or flipped and stops
|
artwork. Meaning it starts when an artwork is touched or flipped and stops
|
||||||
when it is closed again. It is easy to assign a trace from flipping a card
|
when it is closed again. It is easy to assign a path from flipping a card
|
||||||
over opening (maybe several) topics and pop-ups for this artwork card until
|
over opening (maybe several) topics and pop-ups for this artwork card until
|
||||||
closing this card again. But one would like to assign the same trace to
|
closing this card again. But one would like to assign the same path to
|
||||||
move events surrounding this interaction. Again, this is not possible in an
|
move events surrounding this interaction. Again, this is not possible in an
|
||||||
algorithmic way but only heuristically. I used the `case` variable in order
|
algorithmic way but only heuristically.
|
||||||
to get meaningful units around the artworks.
|
|
||||||
|
|
||||||
If within one case only a single trace for a single artwork was opened, I
|
Again, I used a time cutoff for this. First, if a `move` event occurs, it
|
||||||
assigned this trace to the moves associated with this artwork. It (quite
|
is checked whether the same item has been flipped less than 20 seconds
|
||||||
often) happens that within one case one artwork is opened and closed
|
beforehand. If yes, the same path indicator is assigned to this `move`. If
|
||||||
several times, each time starting a new trace. I then assigned all the
|
not, a new "move indicator" is temporarily assigned. Then, a "backward
|
||||||
following move events to the trace beforehand. This is, of course,
|
pass" is applied, where it is checked if the same item is opened less than
|
||||||
arbitrary and could also be handled the other way around.
|
20 seconds _after_ the event occurs. If yes, that path indicator is
|
||||||
|
assigned. For all the remaining moves, a new path number is assigned. This
|
||||||
Another possibility is, that an artwork gets moved within one trace without
|
corresponds to items being moved without being flipped.
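
The cutoff logic described above can be sketched as follows. This is a simplified illustration under several assumptions, not the package code: the function name `assign_move_paths` is hypothetical, `flipCard` rows are assumed to carry a path number already, `date.start` is assumed to be numeric seconds (or `POSIXct`), and every remaining move gets its own new path number. The forward and backward passes are collapsed into one loop that prefers a flip before the move over a flip after it.

```r
# Hypothetical sketch of the 20-second cutoff for `move` events.
assign_move_paths <- function(dat, cutoff = 20) {
  t <- as.numeric(dat$date.start)
  flips <- which(dat$event == "flipCard")
  next_path <- if (length(flips) > 0) max(dat$path[flips]) + 1 else 1
  for (i in which(dat$event == "move")) {
    same <- flips[dat$item[flips] == dat$item[i]]   # flips of the same item
    lag <- t[i] - t[same]                           # >= 0: flip came first
    before <- same[lag >= 0 & lag < cutoff]
    after  <- same[lag < 0 & -lag < cutoff]
    if (length(before) > 0) {
      # forward rule: item was flipped less than `cutoff` seconds earlier
      dat$path[i] <- dat$path[before[which.max(t[before])]]
    } else if (length(after) > 0) {
      # backward rule: item is flipped less than `cutoff` seconds later
      dat$path[i] <- dat$path[after[which.min(t[after])]]
    } else {
      # item was moved without being flipped
      dat$path[i] <- next_path
      next_path <- next_path + 1
    }
  }
  dat
}

dat <- data.frame(
  event = c("move", "flipCard", "move", "move", "move"),
  item  = c("001", "001", "001", "002", "001"),
  date.start = c(0, 10, 15, 16, 100),
  path  = c(NA, 1, NA, NA, NA)
)
assign_move_paths(dat)$path
# 1 1 1 2 3
```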
|
||||||
being flipped. I then assigned a new trace to this move.
|
|
||||||
|
|
||||||
This overall worked very well even though it was based on the very
|
|
||||||
heuristic approach assigning a case when the table has not been touched for
|
|
||||||
20 seconds. It should be kept in mind that the trace assignments for the
|
|
||||||
moves will change when case is defined in a different way.
|
|
||||||
|
|
||||||
## A `move` event does not record any change
|
## A `move` event does not record any change
|
||||||
|
|
||||||
Most of the events in the log files are move events. Additionally, many of
|
Most of the events in the log files are move events. Additionally, many of
|
||||||
these move events are recorded but they do not indicate any change meaning
|
these move events are recorded but they do not indicate any change, meaning
|
||||||
the only difference is the time stamp. All other variables indicating moves
|
the only difference is the timestamp. All other variables indicating moves
|
||||||
like `x.start` and `x.stop`, `rotation.start` and `rotation.stop` etc. do
|
like `x.start` and `x.stop`, `rotation.start` and `rotation.stop` etc. do
|
||||||
not show any change. They represent about 2/3 of all move events. These
|
not show _any_ change. They represent about 2/3 of all move events. These
|
||||||
events are probably short touches of the table without an actual
|
events are probably short touches of the table without an actual
|
||||||
interaction. They were therefore removed from the data set.
|
interaction. They were therefore removed from the data set.
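
A minimal sketch of such a filter, assuming the wide data frame with the `*.start` / `*.stop` columns described in this document. The function name `drop_static_moves` is hypothetical and the package may implement this step differently.

```r
# Hypothetical helper: drop `move` events where position, scale, and
# rotation are identical at start and stop.
drop_static_moves <- function(dat) {
  pairs <- list(c("x.start", "x.stop"),
                c("y.start", "y.stop"),
                c("scale.start", "scale.stop"),
                c("rotation.start", "rotation.stop"))
  same <- lapply(pairs, function(p) dat[[p[1]]] == dat[[p[2]]])
  unchanged <- Reduce(`&`, same)
  # rows with missing values are kept, only clear non-changes are dropped
  static <- dat$event == "move" & !is.na(unchanged) & unchanged
  dat[!static, , drop = FALSE]
}

dat <- data.frame(
  event = c("move", "move", "flipCard"),
  x.start = c(10, 10, 10), x.stop = c(10, 12, 10),
  y.start = c(5, 5, 5),    y.stop = c(5, 5, 5),
  scale.start = c(0.3, 0.3, 0.3), scale.stop = c(0.3, 0.3, 0.3),
  rotation.start = c(13, 13, 13), rotation.stop = c(13, 13, 13)
)
nrow(drop_static_moves(dat))
# 2
```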
|
||||||
|
|
||||||
## Events that only close (`date.start` is NA)
|
|
||||||
|
|
||||||
It looks like there is some kind of log error for the events that do not
|
|
||||||
have a start stop. I was able to get rid of most by sorting for `popup` for
|
|
||||||
the openPopup events, but there are still some left (50 for the small data
|
|
||||||
set, which corresponds to 0.2 per mill). The following example shows that
|
|
||||||
artwork "501" gets closed (line 31030) while the pop-up `sommerbau.xml`
|
|
||||||
is still opened (line 31027). Then artwork "501" gets opened again
|
|
||||||
(line 31035) and after that the pop-up `sommerbau.xml` is closed (line
|
|
||||||
31040). This should not be possible and therefore (correctly) two events
|
|
||||||
are assigned: One where the pop-up was opened and then not closed (which is
|
|
||||||
common) and another one where the pop-up has no start.
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
dat[31000:31019,]
|
|
||||||
# Card gets flipped closed before pop-up closes --> log error!
|
|
||||||
```
|
|
||||||
|
|
||||||
I did not check all of these cases (for the complete data set this is
|
|
||||||
simply not possible by hand) but just excluded all events that do not have
|
|
||||||
a `date.start` since they are hard to interpret. Often they are log errors
|
|
||||||
but in some cases they might be resolvable.
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
# remove all events that do not have a `date.start`
|
|
||||||
dim(dat2[is.na(dat2$date.start), ])
|
|
||||||
dat2 <- dat2[!is.na(dat2$date.start), ]
|
|
||||||
```
|
|
||||||
|
|
||||||
In order to deal with these logging errors, I check the data for what I
|
|
||||||
call "fragmented traces". These are traces that cannot happen, when
|
|
||||||
everything is logged correctly, e.g., traces containing `flipCard ->
|
|
||||||
openPopup` or traces that only consist of `move`, `openTopic`, and
|
|
||||||
`openPopup` events. These fragmented traces are removed from the data. It
|
|
||||||
was not possible to check them all manually, but the 20 or more that I do
|
|
||||||
check in the raw log files were all some kind of logging error like above.
|
|
||||||
Most often a card was already closed again, before a topic card or pop-up
|
|
||||||
was recorded as being closed.
|
|
||||||
|
|
||||||
## Card indices go from 0 to 7 (instead of 0 to 5 as expected)
|
## Card indices go from 0 to 7 (instead of 0 to 5 as expected)
|
||||||
|
|
||||||
See `questions_number-of-cards.R` for more details.
|
In the beginning I thought that the number for topics was the index of
|
||||||
|
where the card was presented on the back of the item. But this is not
|
||||||
|
correct. It is the number of the topic. There are eight topics in total:
|
||||||
|
|
||||||
I wrote a function that for each artwork extracts the file names of the
|
|
||||||
possible topic cards and then looks up which topics have actually been
|
|
||||||
displayed on the back of the card. I added an index giving the ordering in
|
|
||||||
the index files.
|
|
||||||
|
|
||||||
The possible values in the variable `topicNumber` range from 0 to 7,
|
|
||||||
however, no artwork has more than six different numbers. So I just renamed
|
|
||||||
those numbers from 1 to the highest number, e.g., $0,1,2,4,5,6$ was changed
|
|
||||||
to $0\to 1,1\to 2,2\to 3,4\to 4,5\to 5,6\to 6$. Next I used the index to
|
|
||||||
assign topics and file names to the according pop-ups. This needs to be
|
|
||||||
cross checked with the programming, but seems the most plausible approach
|
|
||||||
with my current knowledge.
|
|
||||||
|
|
||||||
<!-- TODO: Ask Philipp -->
|
|
||||||
|
|
||||||
## Extracting topics from `index.xml` vs. `<artwork_number>.xml`
|
|
||||||
|
|
||||||
When I extract the topics from `index.html` I get different topics, than
|
|
||||||
when I get them from `<artwork>.html`. At first glance, it looks like using
|
|
||||||
`index.html` actually gives the wrong results.
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
artworks <- unique(dat2$artwork)
|
|
||||||
path <- "data/haum/ContentEyevisit/eyevisit_cards_light/"
|
|
||||||
topics <- extract_topics(artworks, rep("index.xml", length(artworks)), path)
|
|
||||||
topics2 <- extract_topics(artworks, paste0(artworks, ".xml"), path)
|
|
||||||
|
|
||||||
topics[!topics$file_name %in% topics2$file_name, ]
|
|
||||||
topics2[!topics2$file_name %in% topics$file_name, ]
|
|
||||||
```
|
```
|
||||||
|
Indices for topics:
|
||||||
|
0 artist
|
||||||
|
1 thema
|
||||||
|
2 komposition
|
||||||
|
3 leben des kunstwerks
|
||||||
|
4 details
|
||||||
|
5 licht und farbe
|
||||||
|
6 extra info
|
||||||
|
7 technik
|
||||||
|
```
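
Since the topic numbers start at 0 while R vectors are indexed from 1, a small lookup helper can make the mapping explicit. The names `topic_labels` and `topic_name` are hypothetical and only serve as an illustration of the table above.

```r
# Labels in the order of the topic numbers 0 to 7 listed above
topic_labels <- c("artist", "thema", "komposition", "leben des kunstwerks",
                  "details", "licht und farbe", "extra info", "technik")

# Hypothetical helper: translate topic numbers (0-based) into labels
topic_name <- function(topic) topic_labels[topic + 1L]

topic_name(c(0, 1, 7))
# "artist" "thema" "technik"
```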
|
||||||
|
On the back of items, there can be between 2 and 6 topic cards. Several of
|
||||||
|
these topic cards can be about the same topic, e.g., there can be two topic
|
||||||
|
cards assigned to the topic `thema`. It is impossible to find out if the
|
||||||
|
same topic card was opened several times or if different topic cards with
|
||||||
|
the same topic were opened from the same item. See example below for item
|
||||||
|
"001".
|
||||||
|
|
||||||
For artwork "031", `index.html` only defines 5 cards (the 6th is commented
|
```{r topics, echo = FALSE}
|
||||||
out), but `topicNumber` for this artwork has 6 different entries. I will
|
items <- sprintf("%03d", unique(datlogs$item))
|
||||||
therefore extract the topics from `<artwork>.html`. (This seems also better
|
topics <- extract_topics(items, xmlfiles = paste0(items, ".xml"),
|
||||||
compatible with other data sets like 8o8m.)
|
xmlpath = "data/haum/ContentEyevisit/eyevisit_cards_light/")
|
||||||
|
head(topics)
|
||||||
|
```
|
||||||
|
|
||||||
## New artworks "504" and "505" starting October 2022
|
## New artworks "504" and "505" starting October 2022
|
||||||
|
|
||||||
When I read in the complete data frame for the first time, all of a
|
When I read in the complete data frame for the first time, all of a
|
||||||
sudden there were 72 instead of 70 artworks. It seems like these two
|
sudden there were 72 instead of 70 items. It seems like these two
|
||||||
artworks appear on October 21, 2022.
|
artworks appear on October 21, 2022.
|
||||||
|
|
||||||
```{r}
|
```{r newitems}
|
||||||
dat0 <- read.table("data/haum/raw_logfiles_2023-09-23_01-31-30.csv",
|
summary(as.Date(datraw[datraw$item %in% c("504", "505"), "date"]))
|
||||||
sep = ";", header = TRUE)
|
|
||||||
dat0$date <- as.POSIXct(dat0$date)
|
|
||||||
dat0$glossar <- ifelse(dat0$artwork == "glossar", 1, 0)
|
|
||||||
|
|
||||||
# Remove irrelevant events
|
|
||||||
dat <- subset(dat0, !(dat0$event %in% c("Start Application",
|
|
||||||
"Show Application")))
|
|
||||||
|
|
||||||
summary(dat[dat$artwork %in% c("504", "505"), ])
|
|
||||||
```
|
```
|
||||||
|
|
||||||
The artworks seem to be have updated in general after October 21, 2022.
|
The artworks seem to have been updated in general after October 21, 2022. The
|
||||||
|
following table shows which items were presented in which years.
|
||||||
|
|
||||||
```{r}
|
```{r years}
|
||||||
art_after_oct2022 <- sort(unique(dat[dat$date >= "2022-10-21", "artwork"]))
|
xtabs(~ item + lubridate::year(date.start), datlogs)
|
||||||
art_before_oct2022 <- sort(unique(dat[dat$date <= "2022-10-21", "artwork"]))
|
|
||||||
# Removed artworks
|
|
||||||
art_before_oct2022[!art_before_oct2022 %in% art_after_oct2022]
|
|
||||||
# Additional artworks
|
|
||||||
art_after_oct2022[!art_after_oct2022 %in% art_before_oct2022]
|
|
||||||
```
|
```
|
||||||
|
|
||||||
The following table shows which artworks were presented in which years.
|
It shows that the artworks have been updated after the Corona pandemic. I
|
||||||
|
think, the table was also moved to a different location at that point.
|
||||||
```{r}
|
|
||||||
xtabs(~ artwork + lubridate::year(date), dat)
|
|
||||||
```
|
|
||||||
|
|
||||||
It strongly suggests that the artworks haven been updated after the Corona
|
|
||||||
pandemic. I think, the table was also moved to a different location at that
|
|
||||||
point. (Check with PG to make sure.)
|
|
||||||
|
|
||||||
# Optimizing resources used by the code
|
|
||||||
|
|
||||||
After I started trying out the functions on the complete data set, it
|
|
||||||
became obvious (not surprisingly `:)`) that this will not work --
|
|
||||||
especially for the move events. The reshape function cannot take a long
|
|
||||||
data frame with over 6 Million entries and convert it into a wide data
|
|
||||||
frame (at least not on my laptop). The code is supposed to work "out of the
|
|
||||||
box" for researchers, hence it *should* run on a regular (8 core) laptop.
|
|
||||||
So, I changed the reshaping so that it is done in batches on subsets of the
|
|
||||||
data for every `fileId` separately. This means that events that span over
|
|
||||||
two (or more) raw log files cannot be closed and will then be removed from
|
|
||||||
the data set. The function warns about this, but it is a random process
|
|
||||||
getting rid of these data and seems therefore not like a systematic
|
|
||||||
problem. Another reason why this is not bad, is that durations cannot be
|
|
||||||
calculated for events across log files anyways, because the time stamps do
|
|
||||||
not increase systematically over log files (see above).
|
|
||||||
|
|
||||||
UPDATE: By now, I close the events spanning more than one log file after
|
|
||||||
this has been done.
|
|
||||||
|
|
||||||
I meant to put the lists back together with `do.call(rbind, some_list)` but
|
|
||||||
this can also not handle big data sets. I therefore switched to
|
|
||||||
`dplyr::bind_rows(some_ist)` which is really fast and was developed
|
|
||||||
especially for this purpose. It means, that I have to depend on the dplyr
|
|
||||||
package (which I am not a big fan of, since I meant to keep the package
|
|
||||||
self-contained).
|
|
||||||
|
|
||||||
# Reading list
|
|
||||||
|
|
||||||
* @Arizmendi2022 [--]
|
|
||||||
* @Bannert2014 [x]
|
|
||||||
* @Bousbia2010 [--]
|
|
||||||
* @Cerezo2020
|
|
||||||
* @GerjetsSchwan2021 [x]
|
|
||||||
* @Goldhammer2020
|
|
||||||
* @Guenther2007
|
|
||||||
* @HuberBannert2023 [x]
|
|
||||||
* @Kroehne2018
|
|
||||||
* @SchwanGerjets2021 [x]
|
|
||||||
* @vanderAalst2016 [Chap. 2, x]
|
|
||||||
* @vanderAalst2016 [Chap. 3]
|
|
||||||
* @vanderAalst2016 [Chap. 5, x]
|
|
||||||
* @Wang2019
|
|
||||||
|
|
||||||
# Open stuff
|
|
||||||
|
|
||||||
* Angle from which people approach table in Braunschweig? Consider in
|
|
||||||
rotation variable?
|
|
||||||
* Time limit for `case` variable different for different events? (openTopic
|
|
||||||
should be opened the longest)
|
|
||||||
|
|
||||||
$\to$ I think this is not relevant since I am looking at time *between*
|
|
||||||
events!
|
|
||||||
|
|
||||||
# Stuff AK found interesting
|
|
||||||
|
|
||||||
* Pre/post corona
|
|
||||||
* Identify school classes
|
|
||||||
* How many persons are present at the table?
|
|
||||||
|
|
||||||
# Other potential questions
|
|
||||||
|
|
||||||
* "Bursts"
|
|
||||||
* 1st vs. 2nd half of the day
|
|
||||||
* Can we identify "types of art"? With clustering or something?
|
|
||||||
* Possible to estimate how many persons per day? Maybe average of certain
|
|
||||||
weekdays? ... ?
|
|
||||||
|
|
||||||
|
577
README.md
Normal file
@ -0,0 +1,577 @@
|
|||||||
|
Log data from the Multi-Touch Table at the HAUM
|
||||||
|
================
|
||||||
|
|
||||||
|
The Multi Touch Table at the Herzog-Anton-Ulrich-Museum (HAUM) in
|
||||||
|
Braunschweig gives visitors of the Museum the opportunity to interact
|
||||||
|
with about 70 artworks and 3 virtual cards containing information about
|
||||||
|
the museum and its layout. The table was installed at the institute in
|
||||||
|
October 2016 and since November 2016 log files from interactions of
|
||||||
|
visitors of the museum have been collected. These log files are in an
|
||||||
|
unstructured format and cannot be easily analyzed. The purpose of the
|
||||||
|
following document is to describe how the data have been transformed
|
||||||
|
and which decisions have been made along the way.
|
||||||
|
|
||||||
|
# Data structure
|
||||||
|
|
||||||
|
The log files contain lines that indicate the beginning and end of
|
||||||
|
possible activities that can be performed when interacting with the
|
||||||
|
artworks on the table. The layout of the table looks like pictures have
|
||||||
|
been tossed on a large table. Every artwork is visible at the start
|
||||||
|
configuration. People can move the pictures on the table, they can be
|
||||||
|
scaled and rotated. Additionally, the virtual picture cards can be
|
||||||
|
flipped in order to find more information about the artwork on the “back”
|
||||||
|
of the card. One has to press a little `i` for more information in one
|
||||||
|
of the bottom corners of the card. On the back of the card two to six
|
||||||
|
information cards can be found with a teaser text about a certain topic.
|
||||||
|
These topic cards can be opened and a hypertext with detailed
|
||||||
|
information opens. Within these hypertexts certain technical terms can
|
||||||
|
be clicked for lay people to get more information. This also opens up a
|
||||||
|
pop-up. The events encoded in the raw log files therefore have the
|
||||||
|
following structure.
|
||||||
|
|
||||||
|
"Start Application" --> Start Application
|
||||||
|
"Show Application"
|
||||||
|
"Transform start" --> Move
|
||||||
|
"Transform stop"
|
||||||
|
"Show Info" --> Flip Card
|
||||||
|
"Show Front"
|
||||||
|
"Artwork/OpenCard" --> Open Topic
|
||||||
|
"Artwork/CloseCard"
|
||||||
|
"ShowPopup" --> Open Popup
|
||||||
|
"HidePopup"
|
||||||
|
|
||||||
|
The right side shows what events can be extracted from these raw lines.
|
||||||
|
The “Start Application” is not an event in the original sense since it
|
||||||
|
only indicates if the table was started or maybe reset itself. This is
|
||||||
|
not an interaction with the table and therefore not interesting in
|
||||||
|
itself. All “Start Application” and “Show Application” are therefore
|
||||||
|
excluded from the data when further processed and are only in the raw
|
||||||
|
log files.
|
||||||
|
|
||||||
|
# Parsing the raw log files
|
||||||
|
|
||||||
|
The first step is to parse the raw log files that are stored by the
|
||||||
|
application as text files in a rather unstructured format to a format
|
||||||
|
that can be read by common statistics software packages. The data are
|
||||||
|
therefore transferred to a spread sheet format. The following section
|
||||||
|
describes what problems were encountered while doing this.
|
||||||
|
|
||||||
|
## Corrupt lines
|
||||||
|
|
||||||
|
When reading the files containing the raw logs into R, a warning appears
|
||||||
|
that says
|
||||||
|
|
||||||
|
Warning messages:
|
||||||
|
incomplete final line found on '2016/2016_11_18-11_31_0.log'
|
||||||
|
incomplete final line found on '2016/2016_11_18-11_38_30.log'
|
||||||
|
incomplete final line found on '2016/2016_11_18-11_40_36.log'
|
||||||
|
...
|
||||||
|
|
||||||
|
When you open these files, it looks like the last line contains some
|
||||||
|
binary content. It is unclear why and how this happens. So when reading
|
||||||
|
the data, these lines were removed. A warning will be given that
|
||||||
|
indicates how many files have been affected.
|
||||||
|
|
||||||
|
## Extracted variables from raw log files
|
||||||
|
|
||||||
|
The following variables (columns in the data frame) are extracted from
|
||||||
|
the raw log file:
|
||||||
|
|
||||||
|
- `fileId`: Containing the zero-left-padded file name of the raw log
|
||||||
|
file the data line has been extracted from
|
||||||
|
|
||||||
|
- `folder`: The folder names in which the raw log files have been
|
||||||
|
organized. For the HAUM data set, the data are sorted by year
|
||||||
|
(folders 2016, 2017, 2018, 2019, 2020, 2021, 2022, and 2023).
|
||||||
|
|
||||||
|
- `date`: Extracted timestamp from the raw log file in the format
|
||||||
|
`yyyy-mm-dd hh:mm:ss`.
|
||||||
|
|
||||||
|
- `timeMs`: Containing a timestamp in Milliseconds that restarts with
|
||||||
|
every new raw log file.
|
||||||
|
|
||||||
|
- `event`: Start and stop event tags. See above for possible values.
|
||||||
|
|
||||||
|
- `item`: Identifier of the different items. This is a three-digit
|
||||||
|
(left-padded) number. The numbers of the items correspond to the
|
||||||
|
folder names in `/ContentEyevisit/eyevisit_cards_light/` and were
|
||||||
|
originally taken from the museum's catalogue.
|
||||||
|
|
||||||
|
- `popup`: Name of the pop-up opened. This is only interesting for
|
||||||
|
“openPopup” events.
|
||||||
|
|
||||||
|
- `topic`: The number of the topic card that has been opened at the back
|
||||||
|
of the item card. See below for a more detailed description of what
|
||||||
|
these numbers mean.
|
||||||
|
|
||||||
|
- `x`: Value of x-coordinate in pixel on the 4K-Display
|
||||||
|
($3840 \times 2160$)
|
||||||
|
|
||||||
|
- `y`: Value of y-coordinate in pixel
|
||||||
|
|
||||||
|
- `scale`: Number in 128 bit that indicates how much the card has been
|
||||||
|
scaled
|
||||||
|
|
||||||
|
- `rotation`: Degree of rotation in start configuration.
|
||||||
|
|
||||||
|
<!-- TODO: Nach welchem Zeitintervall resettet sich der Tisch wieder in die
|
||||||
|
Ausgangskonfiguration? -> PM needs to look it up -->
|
||||||
|
|
||||||
|
## Variables after “closing of events”
|
||||||
|
|
||||||
|
The raw log data consist of start and stop events for each event type.
|
||||||
|
After preprocessing four event types are extracted: `move`, `flipCard`,
|
||||||
|
`openTopic`, and `openPopup`. Except for the `move` events, which can
|
||||||
|
occur at any time when interacting with an item card on the table, the
|
||||||
|
events have a hierarchical order: An item card first needs to be flipped
|
||||||
|
(`flipCard`), then the topic cards on the back of the card can be opened
|
||||||
|
(`openTopic`), and finally pop-ups on these topic cards can be opened
|
||||||
|
(`openPopup`). This implies that the event `openPopup` can only be
|
||||||
|
present for a certain item, if the card has already been flipped (i.e.,
|
||||||
|
an event `flipCard` for the same item has already occurred).
|
||||||
|
|
||||||
|
After preprocessing, the data frame is now in a wide format with columns
|
||||||
|
for the start and the stop of each event and contains the following
|
||||||
|
variables:
|
||||||
|
|
||||||
|
- `fileId.start` / `fileId.stop`: See above.
|
||||||
|
|
||||||
|
- `date.start` / `date.stop`: See above.
|
||||||
|
|
||||||
|
- `folder`: Containing the folder name (see above)
|
||||||
|
|
||||||
|
- `case`: A numerical variable indicating cases in the data. A “case”
|
||||||
|
indicates an interaction interval and could be defined in different
|
||||||
|
ways. Right now a new case begins, when no event occurred for 20
|
||||||
|
seconds or longer.
|
||||||
|
|
||||||
|
- `path`: A path is defined as one interaction with one item. A path can
|
||||||
|
either start with a `flipCard` event or when an item has been touched
|
||||||
|
for the first time within this case. A path ends with the item card
|
||||||
|
being flipped closed again or with the last movement of the card within
|
||||||
|
this case. One case can contain several paths with the same item when
|
||||||
|
the item is flipped open and flipped closed again several times within
|
||||||
|
a short time.
|
||||||
|
|
||||||
|
- `glossar`: An indicator variable with values 0/1 that tracks if a
|
||||||
|
pop-up has been opened from the glossar folder. These pop-ups can be
|
||||||
|
assigned to the wrong item since it is not possible to do this
|
||||||
|
algorithmically. It is possible that two items are flipped open that
|
||||||
|
could both link to the same pop-up from a glossar. The indicator
|
||||||
|
variable is left as a variable, so that these pop-ups can be easily
|
||||||
|
deleted from the data. Right now, glossar entries can be ignored
|
||||||
|
completely by setting an argument and this is done by default. Using
|
||||||
|
the pop-ups from the glossar will need a lot more love, before it
|
||||||
|
behaves satisfactorily.
|
||||||
|
|
||||||
|
- `event`: Indicating the event. Can take the values `move`, `flipCard`,
|
||||||
|
`openTopic`, and `openPopup`.
|
||||||
|
|
||||||
|
- `item`: Identifier of the different artworks and information cards.
|
||||||
|
This is a three-digit (left-padded) number. See above.
|
||||||
|
|
||||||
|
- `timeMs.start` / `timeMs.stop`: See above.
|
||||||
|
|
||||||
|
- `duration`: Calculated by $timeMs.stop - timeMs.start$ in
|
||||||
|
Milliseconds. Needs to be adjusted for events spanning more than one
|
||||||
|
log file by adding $600,000 \times$ the number of additional log files. See
|
||||||
|
below for details.
|
||||||
|
|
||||||
|
- `topic`: See above.
|
||||||
|
|
||||||
|
- `popup`: See above.
|
||||||
|
|
||||||
|
- `x.start` / `x.stop`: See above.
|
||||||
|
|
||||||
|
- `y.start` / `y.stop`: See above.
|
||||||
|
|
||||||
|
- `distance`: Euclidean distance calculated from $(x.start, y.start)$
|
||||||
|
and $(x.stop, y.stop)$.
|
||||||
|
|
||||||
|
- `scale.start` / `scale.stop`: See above.
|
||||||
|
|
||||||
|
- `scaleSize`: Relative scaling of item card, calculated by
|
||||||
|
$\frac{scale.stop}{scale.start}$.
|
||||||
|
|
||||||
|
- `rotation.start` / `rotation.stop`: See above.
|
||||||
|
|
||||||
|
- `rotationDegree`: Difference of rotation from $rotation.stop$ to
|
||||||
|
$rotation.start$.
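
The derived variables `distance`, `scaleSize`, and `rotationDegree` from the list above can be computed roughly as follows. This is a sketch based on the formulas given in the list, with a hypothetical function name `add_derived`. The sign convention for `rotationDegree` (stop minus start) is an assumption.

```r
# Hypothetical helper: add the derived movement variables to a wide
# data frame that has the *.start / *.stop columns described above.
add_derived <- function(dat) {
  # Euclidean distance between start and stop position (in pixels)
  dat$distance <- sqrt((dat$x.stop - dat$x.start)^2 +
                       (dat$y.stop - dat$y.start)^2)
  # relative scaling, values > 1 mean the card was enlarged
  dat$scaleSize <- dat$scale.stop / dat$scale.start
  # assumed convention: stop minus start
  dat$rotationDegree <- dat$rotation.stop - dat$rotation.start
  dat
}

ev <- data.frame(x.start = 100, x.stop = 103, y.start = 200, y.stop = 204,
                 scale.start = 0.25, scale.stop = 0.5,
                 rotation.start = 10, rotation.stop = 25)
add_derived(ev)[, c("distance", "scaleSize", "rotationDegree")]
# distance 5, scaleSize 2, rotationDegree 15
```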
|
||||||
|
|
||||||
|
## How unclosed events are handled
|
||||||
|
|
||||||
|
Events do not necessarily need to be completed. A person can, e.g.,
|
||||||
|
leave the table and not flip the item card closed again. For `flipCard`,
|
||||||
|
`openTopic`, and `openPopup` the data frame contains `NA` when the event
|
||||||
|
does not complete. For `move` events it happens quite often that a start
|
||||||
|
event follows a start event and a stop event follows a stop event.
|
||||||
|
Technically a move event cannot *not* be finished and the number of
|
||||||
|
events without a start or stop indicates that the time resolution was not
|
||||||
|
sufficient to catch all these events accurately. Double start and stop
|
||||||
|
`move` events have therefore been deleted from the data set.
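
One way to flag such double start and stop events is to keep only those raw lines where a start is directly followed by a stop. The sketch below assumes a time-ordered vector of raw event tags for a single item. The function name `keep_paired_moves` is hypothetical and this is not necessarily how the package does it.

```r
# Hypothetical helper: TRUE for "Transform start" lines that are directly
# followed by a "Transform stop" and for the matching stop lines.
keep_paired_moves <- function(event) {
  n <- length(event)
  nxt <- c(event[-1], NA)     # tag of the following line
  prv <- c(NA, event[-n])     # tag of the preceding line
  (event == "Transform start" & !is.na(nxt) & nxt == "Transform stop") |
    (event == "Transform stop" & !is.na(prv) & prv == "Transform start")
}

ev <- c("Transform start", "Transform start", "Transform stop",
        "Transform stop", "Transform start", "Transform stop")
keep_paired_moves(ev)
# FALSE TRUE TRUE FALSE TRUE TRUE
```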
|
||||||
|
|
||||||
|
## Additional meta data
|
||||||
|
|
||||||
|
For the HAUM data, I added meta data on state holidays and school
|
||||||
|
vacations.
|
||||||
|
|
||||||
|
This led to the following additional variables:
|
||||||
|
|
||||||
|
- `holiday`
|
||||||
|
|
||||||
|
- `vacations`
|
||||||
|
|
||||||
|
# Problems and how I handled them
|
||||||
|
|
||||||
|
This lists some problems with the log data that required decisions.
|
||||||
|
These decisions influence the outcome and maybe even the data quality.
|
||||||
|
Hence, I tried to document how I handled these problems and explain the
|
||||||
|
decisions I made.
|
||||||
|
|
||||||
|
## Weird behavior of `timeMs` and neg. `duration` values
|
||||||
|
|
||||||
|
`timeMs` resets itself every time a new log file starts. This means that
|
||||||
|
the durations of events spanning more than one log file must be
|
||||||
|
adjusted. Instead of just calculating $timeMs.stop - timeMs.start$,
|
||||||
|
`timeMs.start` must be subtracted from the maximum duration of the log
|
||||||
|
file where the event started ($600,000 ms$) and the `timeMs.stop` must
|
||||||
|
be added. If the event spans more than two log files, a multiple of
|
||||||
|
$600,000$ must be taken, e.g. for three log files it must be:
|
||||||
|
$2 \times 600,000 - timeMs.start + timeMs.stop$ and so on.
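
The adjustment can be written as a small function. This is a sketch of the formula above, assuming that every log file covers exactly 600,000 ms. The function name `adjust_duration` and the argument `n_files` (number of log files the event spans) are hypothetical.

```r
# Hypothetical helper: duration in ms for an event that spans `n_files`
# consecutive log files of `file_ms` milliseconds each.
adjust_duration <- function(timeMs.start, timeMs.stop,
                            n_files = 1, file_ms = 600000) {
  (n_files - 1) * file_ms - timeMs.start + timeMs.stop
}

adjust_duration(621, 677)                   # same log file: 56
adjust_duration(599671, 621, n_files = 2)   # two log files: 950
adjust_duration(599671, 621, n_files = 3)   # three log files: 600950
```

For events within a single log file the function reduces to $timeMs.stop - timeMs.start$.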
|
||||||
|
|
||||||
|
![](README_files/figure-gfm/timems-1.png)<!-- -->
|
||||||
|
|
||||||
|
The boxplot shows that we have a continuous range of values within one
|
||||||
|
log file but that `timeMs` does not increase over log files. I kept
|
||||||
|
`timeMs.start` and `timeMs.stop` and also `fileId.start` and
|
||||||
|
`fileId.stop` in the data frame, so it is clear when events span more
|
||||||
|
than one log file.
|
||||||
|
|
||||||
|
<!--
|
||||||
|
Infos from the programmer:
|
||||||
|
|
||||||
|
"Bin außerdem gerade den Code von damals durchgegangen. Das Logging läuft
|
||||||
|
so: Mit Start der Anwendung wird alle 10 Minuten ein neues Logfile
|
||||||
|
erstellt. Die Startzeit, von der aus die Duration berechnet wird, wird
|
||||||
|
jeweils neu gesetzt. Duration ist also nicht "Dauer seit Start der
|
||||||
|
Anwendung" sondern "Dauer seit Restart des Loggers". Deine Vermutung ist
|
||||||
|
also richtig - es sollte keine Durations >10 Minuten geben. Der erste
|
||||||
|
Eintrag eines Logfiles kann alles zwischen 0 und 10 Minuten sein (je
|
||||||
|
nachdem, ob der Tisch zum Zeitpunkt des neuen Logging-Intervalls in
|
||||||
|
Benutzung war). Wenn ein Case also über 2+ Logs verteilt ist, musst du auf
|
||||||
|
die Duration jeweils 10 Minuten pro Logfile nach dem ersten addieren, damit
|
||||||
|
es passt."
|
||||||
|
-->
|
||||||
|
|
||||||
|
## Left padding of file IDs
|
||||||
|
|
||||||
|
The file names of the raw log files are automatically generated and
|
||||||
|
contain a timestamp. This timestamp is not well formed. First, it
|
||||||
|
contains an incorrect month. The months go from 0 to 11, which means
|
||||||
|
that the file name `2016_11_15-12_12_57.log` was collected on December
|
||||||
|
15, 2016 at 12:12 pm. Another problem is that the file names are not
|
||||||
|
zero left padded, e.g., `2016_11_15-12_2_57.log`. This file was
|
||||||
|
collected on December 15, 2016 at 12:02 pm and therefore before the file
|
||||||
|
above. But most sorting algorithms will sort these files in the order
|
||||||
|
shown below. In order to preprocess the data and close events that
|
||||||
|
belong together, the data need to be sorted by events and artworks
|
||||||
|
repeatedly. In order to get them back in the correct time order, it is
|
||||||
|
necessary to order them based on three variables: `fileId.start`,
|
||||||
|
`date.start` and `timeMs.start`. The file IDs therefore need to sort in
|
||||||
|
the correct order (again see below for example). I zero left padded the
|
||||||
|
log file names within the data frame using it as an identifier. These
|
||||||
|
“file names” do not correspond exactly to the original raw log file
|
||||||
|
names. This needs to be kept in mind when doing any kind of matching
|
||||||
|
etc.
|
||||||
|
|
||||||
|
## what it looked like before left padding
|
||||||
|
# 1422 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:56 599671 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26874254
|
||||||
|
# 1423 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57 621 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26523465
|
||||||
|
# 1424 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57 677 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997736 13.26239605
|
||||||
|
# 1425 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57 774 Transform start 076 076.xml NA 2092.25 2008.00 0.2999345 13.26239605
|
||||||
|
# 1426 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57 850 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997107 13.26223362
|
||||||
|
# 1427 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:57 599916 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997771 13.26523465
|
||||||
|
|
||||||
|
## what it looks like now
|
||||||
|
# 1422 2016_11_15-12_02_57.log 2016-12-15 12:12:56 599671 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26874254
|
||||||
|
# 1423 2016_11_15-12_02_57.log 2016-12-15 12:12:57 599916 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997771 13.26523465
|
||||||
|
# 1424 2016_11_15-12_12_57.log 2016-12-15 12:12:57 621 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26523465
|
||||||
|
# 1425 2016_11_15-12_12_57.log 2016-12-15 12:12:57 677 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997736 13.26239605
|
||||||
|
# 1426 2016_11_15-12_12_57.log 2016-12-15 12:12:57 774 Transform start 076 076.xml NA 2092.25 2008.00 0.2999345 13.26239605
|
||||||
|
# 1427 2016_11_15-12_12_57.log 2016-12-15 12:12:57 850 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997107 13.26223362
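
The zero padding of the file IDs could be sketched roughly as follows.
This is only an illustration and not necessarily how it is done in the
package: the helper `pad_file_id()` is a hypothetical name, and it
assumes that every file name consists of six numeric fields separated by
`_` and `-`. It deliberately leaves the zero-based month untouched.

``` r
# Hypothetical helper (an assumption, not taken from the mtt package):
# left-pad every numeric field of a raw log file name with zeros to at
# least two digits. The four-digit year is not changed by "%02d".
pad_file_id <- function(file_name) {
  stem <- sub("\\.log$", "", basename(file_name))
  vapply(strsplit(stem, "[_-]"), function(parts) {
    p <- sprintf("%02d", as.integer(parts))
    paste0(paste(p[1:3], collapse = "_"), "-",
           paste(p[4:6], collapse = "_"), ".log")
  }, character(1))
}

raw <- c("../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log",
         "../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log")
padded <- pad_file_id(raw)

# After padding, alphabetical sorting agrees with the time order
sort(padded)
```

With the padded IDs, the 12:02 file sorts before the 12:12 file, which
is what the ordering by file ID relies on.
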

## Timestamps repeat

The timestamps in the `date` variable record year, month, day, hour,
minute and second. Since one second is not a very short time interval
for a move on a touch display, this is not fine grained enough to bring
events into the correct order: there are events from the same log file
having the same timestamp and even events from different log files
having the same timestamp. The log files get written about every 10
minutes (which can easily be seen when looking at the file names of the
raw log files). So in order to get events in the correct order, it is
necessary to first order by file ID, then within file ID sort by the
timestamp `date`, and then within these coarser-grained timestamps sort
by `timeMs`. But as explained above, `timeMs` can only be sorted within
one file ID, since the values do not increase consistently across log
files but restart from a new offset for each raw log file.
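
A minimal sketch of this three-level ordering with base R is given
below. The toy data frame and its column names `fileId`, `date` and
`timeMs` are assumptions for illustration; the values are taken from the
example rows in the section on left padding.

``` r
# Toy data (illustration only), deliberately not in time order
dat <- data.frame(
  fileId = c("2016_11_15-12_12_57.log", "2016_11_15-12_02_57.log",
             "2016_11_15-12_12_57.log", "2016_11_15-12_02_57.log"),
  date   = as.POSIXct(c("2016-12-15 12:12:57", "2016-12-15 12:12:57",
                        "2016-12-15 12:12:57", "2016-12-15 12:12:56"),
                      tz = "UTC"),
  timeMs = c(677, 599916, 621, 599671)
)

# Order by file ID first, then by timestamp, then by milliseconds;
# timeMs is only comparable within one file ID
dat_sorted <- dat[order(dat$fileId, dat$date, dat$timeMs), ]
dat_sorted
```

Sorting by `timeMs` alone would put the two events of the later file
first, because their millisecond counter has just been reset.
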

## x,y-coordinates outside of display range

The display of the Multi-Touch-Table is a 4K display with 3840 x 2160
pixels. When you plot the start and stop coordinates, the display is
clearly distinguishable. However, a lot of points are outside of the
display range. This can happen when the art objects are scaled and then
moved to the very edge of the table: in that case, pixels outside of the
table are recorded. These are actually valid data points and I will
leave them as is.

``` r
datlogs <- read.table("code/results/event_logfiles_2024-02-21_16-07-33.csv", sep = ";",
                      header = TRUE)

par(mfrow = c(1, 2))
plot(y.start ~ x.start, datlogs)
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
plot(y.stop ~ x.stop, datlogs)
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
```

![](README_files/figure-gfm/xycoord-1.png)<!-- -->

``` r
aggregate(cbind(x.start, x.stop, y.start, y.stop) ~ 1, datlogs, mean)
```

    ##    x.start   x.stop  y.start   y.stop
    ## 1 1978.202 1975.876 1137.481 1133.494
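
To get a rough idea of how many points lie outside of the display, the
coordinates can be compared with the display limits. The sketch below
uses made-up toy coordinates instead of `datlogs`, so the resulting
proportion says nothing about the real data.

``` r
# Toy coordinates in pixels (made up for illustration)
xy <- data.frame(x.start = c(100, -20, 3900, 1978),
                 y.start = c(50, 500, 1000, 2200))

# TRUE if a point lies outside the 3840 x 2160 display
outside <- xy$x.start < 0 | xy$x.start > 3840 |
           xy$y.start < 0 | xy$y.start > 2160

table(outside)
mean(outside)   # proportion of points outside the display
```
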

## Pop-ups from glossar cannot be assigned to a specific item

All the information, pictures and texts for the topics and pop-ups are
stored in
`/data/haum/ContentEyevisit/eyevisit_cards_light/<item_number>`. Among
other things, each folder contains XML files with the information about
any technical terms that can be opened from the hypertexts on the topic
cards. Often this information is item dependent, and then the
corresponding XML file is in the folder for this item. Sometimes,
however, more general terms can be opened. In order to avoid multiple
files containing the same information, these were stored in a folder
called `glossar` and get accessed from there. The raw log files only
contain the path to this glossar entry and do not record from which item
it was accessed. I tried to assign these glossar entries to the correct
items. The (very heuristic) approach was this:

1. Create a lookup table with all XML file names (possible pop-ups)
   from the glossar folder and which items possibly call them. This was
   stored as an `RData` object for easier handling but should maybe be
   stored in a more interoperable format.

2. I went through all possible pop-ups in this lookup table and stored
   the items that are associated with each of them.

3. I created a sub data frame without move events (since they can never
   be associated with a pop-up), went through every line and looked up
   whether an item and a topic card had been opened. If this was the
   case and a glossar entry came up before the item was closed again, I
   assigned this item to the glossar entry.

This is heuristic since it is possible that several topic cards from
different items are opened simultaneously and the glossar pop-up could
be opened from either one (it could even be more than two, of course).
In these cases the item that was opened closest to the glossar pop-up
has been assigned, but this can never be completely error free.

Moreover, this heuristic only assigns a little more than half of the
glossar entries. Since it only looks at the last item that has been
opened and checks whether this item is a possible candidate, it misses
all glossar pop-ups where another item has been opened in between.
Writing a more elaborate algorithm is still an open TODO.

All glossar pop-ups that do not get matched with an item are removed
from the data set with a warning if the argument `glossar = TRUE` is
set. Otherwise the glossar entries will be ignored completely.

## Assign a `case` variable based on “time heuristic”

One thing needed in order to work with the data set and use it for
machine learning algorithms like process mining is a variable that
tries to identify a case. A case variable structures the data frame in
a way that navigation behavior can actually be investigated. However, we
do not know whether several people are standing around the table
interacting with it or just one very active person. The simplest way to
define a case variable is to use a time limit between events. This means
that when the table has not been interacted with for, e.g., 20 seconds,
then it is assumed that a person moved on and a new person started
interacting with the table. This is the easiest heuristic and the one
implemented at the moment. Process mining shows that this simple
approach works in the sense that the correct process gets extracted by
the algorithm.

In order to investigate user behavior on a more fine-grained level, it
will be necessary to come up with a more elaborate approach. A better
but still simple approach could be to use this kind of time limit and
additionally look at the distance between items interacted with within
one time window. When items are far apart, it seems plausible that more
than one person interacted with them. Very short time lapses between
events on different items could also be an indicator that more than one
person is interacting with the table.
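
The simple time heuristic can be sketched as follows. This is a minimal
illustration under the assumption that the events are already in the
correct time order; the function name `assign_case()` and its arguments
are hypothetical and not necessarily what the package uses.

``` r
# Hypothetical sketch: start a new case whenever the gap to the previous
# event is longer than `cutoff` seconds. `time` must be sorted.
assign_case <- function(time, cutoff = 20) {
  gap <- c(0, diff(as.numeric(time)))   # seconds since the previous event
  cumsum(gap > cutoff) + 1L
}

# Toy timestamps (made up): gaps of 5, 7, 28, 5 and 55 seconds
time <- as.POSIXct("2016-12-15 12:00:00", tz = "UTC") +
  c(0, 5, 12, 40, 45, 100)
case <- assign_case(time)
case
```

Each gap above the cutoff increases the case counter by one, so the six
toy events end up in three cases.
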

## Assign a `path` variable

The `path` variable is supposed to show one interaction trace with one
artwork. That is, it starts when an artwork is touched or flipped and
stops when it is closed again. It is easy to assign a path from flipping
a card, through opening (maybe several) topics and pop-ups for this
artwork card, until closing this card again. But one would like to
assign the same path to move events surrounding this interaction. Again,
this is not possible with an exact algorithm but only heuristically.

Again, I used a time cutoff for this. First, if a `move` event occurs,
it is checked whether the same item has been flipped less than 20
seconds beforehand. If yes, the same path indicator is assigned to this
`move`. If not, a temporary new “move indicator” is assigned. Then, a
“backward pass” is applied, where it is checked whether the same item is
opened less than 20 seconds *after* the event occurs. If yes, that path
indicator is assigned. All remaining moves are assigned a new path
number. These correspond to items being moved without being flipped.

## A `move` event does not record any change

Most of the events in the log files are move events. Many of these move
events are recorded although they do not indicate any change, meaning
the only difference is the timestamp. All other variables indicating
moves, like `x.start` and `x.stop`, `rotation.start` and `rotation.stop`
etc., do not show *any* change. These events represent about 2/3 of all
move events. They are probably short touches of the table without an
actual interaction and were therefore removed from the data set.
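
Such events can be identified by comparing all start values with the
corresponding stop values. The sketch below uses a made-up toy data
frame; which variables are actually compared in the preprocessing is an
assumption here.

``` r
# Toy move events (made up for illustration)
moves <- data.frame(
  x.start = c(100, 200, 300), x.stop = c(100, 250, 300),
  y.start = c(50, 60, 70),    y.stop = c(50, 60, 70),
  rotation.start = c(0, 10, 5), rotation.stop = c(0, 10, 7)
)

from <- as.matrix(moves[, c("x.start", "y.start", "rotation.start")])
to   <- as.matrix(moves[, c("x.stop", "y.stop", "rotation.stop")])

# TRUE if none of the start/stop pairs differ
no_change <- rowSums(from != to) == 0

moves_kept <- moves[!no_change, ]
moves_kept
```
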

## Card indices go from 0 to 7 (instead of 0 to 5 as expected)

In the beginning I thought that the number for topics was the index of
the position where the card was presented on the back of the item. But
this is not correct. It is the number of the topic. There are eight
topics in total:

    Indices for topics:
    0 artist
    1 thema
    2 komposition
    3 leben des kunstwerks
    4 details
    5 licht und farbe
    6 extra info
    7 technik

On the back of items, there can be between 2 and 6 topic cards. Several
of these topic cards can be about the same topic, e.g., there can be two
topic cards assigned to the topic `thema`. It is impossible to find out
whether the same topic card was opened several times or whether
different topic cards with the same topic were opened from the same
item. See the example below for item “001”.

    ##   item            file_name                topic
    ## 1  001 001_dargestellte.xml                thema
    ## 2  001       001_thema1.xml                thema
    ## 3  001        001_leben.xml leben des kunstwerks
    ## 4  001       001_leben3.xml leben des kunstwerks
    ## 5  001       001_thema2.xml                thema
    ## 6  001        001_thema.xml                thema
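
Translating the topic indices into topic names amounts to a simple
lookup. In the sketch below, the vector `topic_index` is made up for
illustration and the object names are assumptions.

``` r
# Topic names in the order of their indices 0 to 7
topics <- c("artist", "thema", "komposition", "leben des kunstwerks",
            "details", "licht und farbe", "extra info", "technik")

# Made-up indices as they might appear in the logs
topic_index <- c(1, 1, 3, 3, 1, 1)

# The indices are zero-based, R vectors are one-based
topic_name <- topics[topic_index + 1]
topic_name
```
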

## New artworks “504” and “505” starting October 2022

When I read in the complete data frame for the first time, all of a
sudden there were 72 instead of 70 items. It seems like these two
artworks first appear on October 21, 2022.

``` r
summary(as.Date(datraw[datraw$item %in% c("504", "505"), "date"]))
```

    ##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max.
    ## "2022-10-21" "2023-01-11" "2023-03-08" "2023-03-09" "2023-05-21" "2023-07-05"

The artworks seem to have been updated in general after October 21,
2022. The following table shows which items were presented in which
years.

``` r
xtabs(~ item + lubridate::year(date.start), datlogs)
```

    ##      lubridate::year(date.start)
    ## item  2016  2017  2018  2019  2020  2022  2023
    ## 1      277  4082  1912  1434   424   394  1315
    ## 3      485  6730  3126  2356   528   457  1124
    ## 19     714  8656  4028  2743   660   698  1595
    ## 20     595  8461  3996  2983   938   657  1355
    ## 24     497  6638  2912  2251   649   439  1028
    ## 27     567  5959  3112  2318   651   711  1324
    ## 28     601  9329  4394  3056   778   762  1570
    ## 29     425  6865  3830  2365   516   615  1174
    ## 31     289  4118  2051  1218   291   296   675
    ## 32     562  7016  3477  2253   726   766  1647
    ## 33     509  4936  2242  1449   555   358   666
    ## 36     434  4505  2276  1668   373   387   976
    ## 37     242  4478  2182  1554   339   423  1168
    ## 38     480  4617  2144  1397   371   381   784
    ## 39     395  3227  1313  1003   237   161   622
    ## 41     282  3329  1303  1022   225   209   701
    ## 42     203  3113  1307   903   242   191   421
    ## 43     115  2420  1089   806   176   219   486
    ## 45    1491 13561  5924  4474   966   585  1828
    ## 46     903  9181  5340  3812   961   944  1648
    ## 47     306  4949  2395  1510   750   297   675
    ## 48     723 10455  5384  4162  1328   948  2031
    ## 49     433  4326  2124  1414   434   431   809
    ## 51     564  7837  4577  2991   884   659  1370
    ## 52     447  5021  2104  1729   471   349   840
    ## 54     424  5068  2816  2008   529   370   918
    ## 55     358  4859  2069  1428   341   403  1303
    ## 57     860 14264  6625  5092  1410  1221  2714
    ## 60     555  6865  3539  2336   639   586  1415
    ## 62     547  6736  3803  2210   795   633  1322
    ## 63     251  3677  1827  1241   300   282   527
    ## 66     552  6004  2774  1977   505   373   932
    ## 69     394  3730  1827  1438   272   206   680
    ## 70     226  3766  1843   973   293   268   703
    ## 71     557  6160  2490  1846   570   323   839
    ## 72     426  6194  2857  2129   508   635  1553
    ## 73     432  6125  2880  1821   583   395   939
    ## 75     258  5885  2418  1562   369   257   645
    ## 76     861 12435  6253  4214  1753  1153  2268
    ## 77     816  8595  4197  2897   699   674  1452
    ## 78     410  5632  2498  1924   394   408   850
    ## 80    1650 25687 12429  7782  1975  1712  4433
    ## 83     644  8618  4720  3026   987  1027  2294
    ## 84     184  2121  1231   759   231   254   465
    ## 87     149  1618   722   632    99     0     0
    ## 88     513  6996  3493  2272   539   533  1420
    ## 89     214  2204   950   723   156     0     0
    ## 90     281  3756  1372  1143   403   320   932
    ## 93     613  8528  4224  3015   696  1174  2058
    ## 98     462  6662  3265  2565   704   670  1453
    ## 99     180  4162  1653  1454   363   411   868
    ## 101    414  4209  1859  1282   392   411   981
    ## 103    677  8758  4366  3165  1045   909  1871
    ## 104    423  5256  2381  1865   463   467   933
    ## 107    181  2101  1106   788   205   146   339
    ## 109    321  4001  1619  1106   292   188   453
    ## 110    489  5846  2785  2008   494   387   923
    ## 125    640  8435  4519  3334   926     0     0
    ## 129    598 11322  5046  3369   910  1131  1682
    ## 145    419  7821  3945  2694   706   740  1396
    ## 176    507  8465  3968  2787   687   552  1544
    ## 180    516  7563  3720  2765   585   550  1272
    ## 183    377  4014  1819  1741   346   251   675
    ## 187    340  4222  2165  1753   319   312   734
    ## 197    426  7710  3603  2510   671   602  1217
    ## 229    303  4872  2360  1891   482   389  1005
    ## 231    271  3606  1851  1239   318   236   467
    ## 501   1915 15968  7849  5060  1157   890  2989
    ## 502   1212 14550  7111  4749  1105   883  2752
    ## 503   1308 15218  8632  6399  1626   870  2558
    ## 504      0     0     0     0     0   363   662
    ## 505      0     0     0     0     0   426  1533

It shows that the artworks have been updated after the Corona pandemic.
I think the table was also moved to a different location at that point.

BIN README_files/figure-gfm/timems-1.png (Normal file; binary file not shown; size: 6.2 KiB)
BIN README_files/figure-gfm/xycoord-1.png (Normal file; binary file not shown; size: 12 KiB)