Updated README.Rmd and exported as github_document
parent 37e67bfa69
commit 9762c61a8d

README.Rmd (539)
							@ -1,46 +1,38 @@
 | 
				
			|||||||
---
 | 
					---
 | 
				
			||||||
title: "Background information about MTT data"
 | 
					title: "Log data from the Multi-Touch Table at the HAUM"
 | 
				
			||||||
author: "Nora Wickelmaier"
 | 
					output: github_document
 | 
				
			||||||
date: "`r Sys.Date()`"
 | 
					 | 
				
			||||||
output: 
 | 
					 | 
				
			||||||
  html_document:
 | 
					 | 
				
			||||||
    number_sections: true
 | 
					 | 
				
			||||||
    toc: true
 | 
					 | 
				
			||||||
---
 | 
					---
 | 
				
			||||||
 | 
					
 | 
				
			||||||
```{r, include = FALSE}
 | 
					```{r, include = FALSE}
 | 
				
			||||||
# setwd("C:/Users/nwickelmaier/Nextcloud/Documents/MDS/2023ss/60100_master_thesis")
 | 
					devtools::load_all("../../../../software/mtt")
 | 
				
			||||||
devtools::load_all("../../../software/mtt")
 | 
					 | 
				
			||||||
```
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
# Log data from the Multi-Touch Table at the HAUM
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
The Multi Touch Table at the Herzog-Anton-Ulrich-Museum (HAUM) in
 | 
					The Multi Touch Table at the Herzog-Anton-Ulrich-Museum (HAUM) in
 | 
				
			||||||
Braunschweig gives visitors of the Museum the opportunity to interact with
 | 
					Braunschweig gives visitors of the Museum the opportunity to interact with
 | 
				
			||||||
67 artworks and 3 tiles containing information about the museum and its
 | 
					about 70 artworks and 3 virtual cards containing information about the
 | 
				
			||||||
layout. The table was installed at the institute in October 2016 and since
 | 
					museum and its layout. The table was installed at the institute in October
 | 
				
			||||||
November 2016 log files from interactions of visitors of the museum have
 | 
					2016 and since November 2016 log files from interactions of visitors of the
 | 
				
			||||||
been collected. These log files are in an unstructured format and cannot be
 | 
					museum have been collected. These log files are in an unstructured format
 | 
				
			||||||
easily analyzed. The purpose of the following document is to describe how
 | 
					and cannot be easily analyzed. The purpose of the following document is to
 | 
				
			||||||
the data have been transformed and which decisions have been made along
 | 
					describe how the data have been transformed and which decisions have been
 | 
				
			||||||
the way.
 | 
					made along the way.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
# Data structure
 | 
					# Data structure
 | 
				
			||||||
 | 
					
 | 
				
			||||||
The log files contain lines that indicate the beginning and end of possible
 | 
					The log files contain lines that indicate the beginning and end of possible
 | 
				
			||||||
actions that can be performed when interacting with the artworks on the
 | 
					activities that can be performed when interacting with the artworks on the
 | 
				
			||||||
table. The layout of the table looks like 70 pictures have been tossed on a
 | 
					table. The layout of the table looks like pictures have been tossed on a
 | 
				
			||||||
large table. Every artwork is visible at the start configuration. People
 | 
					large table. Every artwork is visible at the start configuration. People
 | 
				
			||||||
can move the pictures on the table, they can be scaled and rotated.
 | 
					can move the pictures on the table, they can be scaled and rotated.
 | 
				
			||||||
Additionally, the virtual picture cards can be flipped in order to find
 | 
					Additionally, the virtual picture cards can be flipped in order to find
 | 
				
			||||||
more information of the artwork on the "back" of the card. One has to press
 | 
					more information of the artwork on the "back" of the card. One has to press
 | 
				
			||||||
a little `i` for more information in one of the bottom corners of the card.
 | 
					a little `i` for more information in one of the bottom corners of the card.
 | 
				
			||||||
On the back of the card two (?) to six information cards can be found with
 | 
					On the back of the card two to six information cards can be found with a
 | 
				
			||||||
a teaser text about a certain topic. These topic cards can be opened and a
 | 
					teaser text about a certain topic. These topic cards can be opened and a
 | 
				
			||||||
hypertext with detailed information pops up. Within these hypertexts
 | 
					hypertext with detailed information opens. Within these hypertexts certain
 | 
				
			||||||
certain technical terms can be clicked for lay people to get more
 | 
					technical terms can be clicked for lay people to get more information. This
 | 
				
			||||||
information. This also opens up a pop-up. The events encoded in the raw log
 | 
					also opens up a pop-up. The events encoded in the raw log files therefore
 | 
				
			||||||
files therefore have the following structure.
 | 
					have the following structure.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
```
 | 
					```
 | 
				
			||||||
"Start Application"     --> Start Application
 | 
					"Start Application"     --> Start Application
 | 
				
			||||||
@ -100,32 +92,32 @@ raw log file:
 | 
				
			|||||||
  organized in. For the HAUM data set, the data are sorted by year (folders
 | 
					  organized in. For the HAUM data set, the data are sorted by year (folders
 | 
				
			||||||
  2016, 2017, 2018, 2019, 2020, 2021, 2022, and 2023).
 | 
					  2016, 2017, 2018, 2019, 2020, 2021, 2022, and 2023).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `data`: Extracted time stamp from the raw log file in the format
 | 
					* `date`: Extracted timestamp from the raw log file in the format
 | 
				
			||||||
  `yyyy-mm-dd hh:mm:ss`.
 | 
					  `yyyy-mm-dd hh:mm:ss`.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `timeMs`: Containing a time stamp in Milliseconds that restarts with
 | 
					* `timeMs`: Containing a timestamp in Milliseconds that restarts with
 | 
				
			||||||
  every new raw log file.
 | 
					  every new raw log file.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `event`: Start and stop event tags. See above for possible values.
 | 
					* `event`: Start and stop event tags. See above for possible values.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `artwork`: Identifier of the different artworks. This is a 3 digit
 | 
					* `item`: Identifier of the different items. This is a three-digit
 | 
				
			||||||
  (left-padded) number. The numbers of the artworks correspond to the
 | 
					  (left-padded) number. The numbers of the items correspond to the
 | 
				
			||||||
  folder names in `/ContentEyevisit/eyevisit_cards_light/` and were
 | 
					  folder names in `/ContentEyevisit/eyevisit_cards_light/` and were
 | 
				
			||||||
  originally taken from the museum's catalogue.
 | 
					  originally taken from the museum's catalogue.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `popup`: Name of the pop-up opened. This is only interestin for
 | 
					* `popup`: Name of the pop-up opened. This is only interesting for
 | 
				
			||||||
  "openPopup" events.
 | 
					  "openPopup" events.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `topicNumber`: The number of the topic card that has been opened at the back of
 | 
					* `topic`: The number of the topic card that has been opened at the back of
 | 
				
			||||||
  the artwork card. See below for a more detailed description of what these
 | 
					  the item card. See below for a more detailed description of what these
 | 
				
			||||||
  numbers possibly mean.
 | 
					  numbers mean.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `x`: Value of x-coordinate in pixel on the 4K-Display ($3840 \times 2160$)
 | 
					* `x`: Value of x-coordinate in pixel on the 4K-Display ($3840 \times 2160$)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `y`: Value of y-coordinate in pixel
 | 
					* `y`: Value of y-coordinate in pixel
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `scale`: Number in 128 bit that indicates how much the artwork card has
 | 
					* `scale`: Number in 128 bit that indicates how much the card has been
 | 
				
			||||||
  been scaled (????)
 | 
					  scaled
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `rotation`: Degree of rotation in start configuration.
 | 
					* `rotation`: Degree of rotation in start configuration.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
@ -134,43 +126,45 @@ raw log file:
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
## Variables after "closing of events"
 | 
					## Variables after "closing of events"
 | 
				
			||||||
 | 
					
 | 
				
			||||||
The raw log data consists of start and stop events for each event type.
 | 
					The raw log data consist of start and stop events for each event type.
 | 
				
			||||||
After preprocessing for event types are extracted: `move`, `flipCard`,
 | 
					After preprocessing four event types are extracted: `move`, `flipCard`,
 | 
				
			||||||
`openTopic`, and `openPopup`. Except for the `move` events, which can occur
 | 
					`openTopic`, and `openPopup`. Except for the `move` events, which can occur
 | 
				
			||||||
at any time when interacting with an artwork card on the table, the events
 | 
					at any time when interacting with an item card on the table, the events
 | 
				
			||||||
have a hierachical order: An artwork card first needs to be flipped
 | 
					have a hierarchical order: An item card first needs to be flipped
 | 
				
			||||||
(`flipCard`), then the topic cards on the back of the card can be opened
 | 
					(`flipCard`), then the topic cards on the back of the card can be opened
 | 
				
			||||||
(`openTopic`), and finally pop-ups on these topic cards can be opened
 | 
					(`openTopic`), and finally pop-ups on these topic cards can be opened
 | 
				
			||||||
(`openPopup`). This implies that the event `openPopup` can only be present
 | 
					(`openPopup`). This implies that the event `openPopup` can only be present
 | 
				
			||||||
for a certain artwork, if the card has already been flipped (i.e., an event
 | 
					for a certain item, if the card has already been flipped (i.e., an event
 | 
				
			||||||
`flipCard` for the same artwork has already occurred).
 | 
					`flipCard` for the same item has already occurred).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
After preprocessing, the data frame is now in a wide format with columns
 | 
					After preprocessing, the data frame is now in a wide format with columns
 | 
				
			||||||
for the start and the stop of each event and contains the following
 | 
					for the start and the stop of each event and contains the following
 | 
				
			||||||
variables:
 | 
					variables:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `folder`: Containing the folder name (see above)
 | 
					* `fileId.start` / `fileId.stop`: See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `eventId`: A numerical variable that indicates the number of the event.
 | 
					* `date.start` / `date.stop`: See above.
 | 
				
			||||||
  Starts at 1 and ends with the total number of events, counting up by 1.
 | 
					
 | 
				
			||||||
 | 
					* `folder`: Containing the folder name (see above)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `case`: A numerical variable indicating cases in the data. A "case"
 | 
					* `case`: A numerical variable indicating cases in the data. A "case"
 | 
				
			||||||
  indicates an interaction interval and could be defined in different ways.
 | 
					  indicates an interaction interval and could be defined in different ways.
 | 
				
			||||||
  Right now a new case begins, when no event occured for 20 seconds.
 | 
					  Right now a new case begins, when no event occurred for 20 seconds or
 | 
				
			||||||
 | 
					  longer.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `trace`: A trace is defined as one interaction with one artwork. A trace
 | 
					* `path`: A path is defined as one interaction with one item. A path
 | 
				
			||||||
  can either start with a `flipCard` event or when an artwork has been
 | 
					  can either start with a `flipCard` event or when an item has been
 | 
				
			||||||
  touched for the first time within this case. A trace ends with the
 | 
					  touched for the first time within this case. A path ends with the
 | 
				
			||||||
  artwork card being flipped close again or with the last movement of the
 | 
					  item card being flipped close again or with the last movement of the
 | 
				
			||||||
  card within this case. One case can contain several traces with the same
 | 
					  card within this case. One case can contain several paths with the same
 | 
				
			||||||
  artwork when the artwork is flipped open and slipped close again several
 | 
					  item when the item is flipped open and flipped close again several
 | 
				
			||||||
  times within a short time.
 | 
					  times within a short time.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `glossar`: An indicator variable with values 0/1 that tracks if a pop-up
 | 
					* `glossar`: An indicator variable with values 0/1 that tracks if a pop-up
 | 
				
			||||||
  has been opened from the glossar folder. These pop-ups can be assigned to
 | 
					  has been opened from the glossar folder. These pop-ups can be assigned to
 | 
				
			||||||
  the wronge artwork since it is not possible to do this algorithmically.
 | 
					  the wrong item since it is not possible to do this algorithmically.
 | 
				
			||||||
  It is possible that two artworks are flipped open that could both link to
 | 
					  It is possible that two items are flipped open that could both link to
 | 
				
			||||||
  the same popup from a glossar. The indicator variable is left as a
 | 
					  the same pop-up from a glossar. The indicator variable is left as a
 | 
				
			||||||
  variable, so that these pop-ups can be easily deleted from the data.
 | 
					  variable, so that these pop-ups can be easily deleted from the data.
 | 
				
			||||||
  Right now, glossar entries can be ignored completely by setting an
 | 
					  Right now, glossar entries can be ignored completely by setting an
 | 
				
			||||||
  argument and this is done by default. Using the pop-ups from the glossar
 | 
					  argument and this is done by default. Using the pop-ups from the glossar
 | 
				
			||||||
@ -179,20 +173,16 @@ variables:
 | 
				
			|||||||
* `event`: Indicating the event. Can take the values `move`, `flipCard`,
 | 
					* `event`: Indicating the event. Can take the values `move`, `flipCard`,
 | 
				
			||||||
  `openTopic`, and `openPopup`.
 | 
					  `openTopic`, and `openPopup`.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `artwork`: Identifier of the different artworks. This is a 3 digit
 | 
					* `item`: Identifier of the different artworks and information cards. This
 | 
				
			||||||
  (left-padded) number. See above.
 | 
					  is a three-digit (left-padded) number. See above.
 | 
				
			||||||
 | 
					 | 
				
			||||||
* `fileId.start` / `fileId.stop`: See above.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
* `date.start` / `date.stop`: See above.
 | 
					 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `timeMs.start` / `timeMs.stop`: See above.
 | 
					* `timeMs.start` / `timeMs.stop`: See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `duration`: Calculated by $timeMs.stop - timeMs.start$ in Milliseconds.
 | 
					* `duration`: Calculated by $timeMs.stop - timeMs.start$ in Milliseconds.
 | 
				
			||||||
  Needs to be adjusted for events spanning more than one log file by a
 | 
					  Needs to be adjusted for events spanning more than one log file by a
 | 
				
			||||||
  factor of $600,000 \times #logfiles$. See below for details.
 | 
					  factor of $600,000 \times \text{number of logfiles}$. See below for details.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `topicNumber`: See above.
 | 
					* `topic`: See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `popup`: See above.
 | 
					* `popup`: See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
@ -200,11 +190,12 @@ variables:
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
* `y.start` / `y.stop`: See above.
 | 
					* `y.start` / `y.stop`: See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `distance`: Euclidean distance calculated from $(x.start, y.start)$ and $(x.stop, y.stop)$.
 | 
					* `distance`: Euclidean distance calculated from $(x.start, y.start)$ and
 | 
				
			||||||
 | 
					  $(x.stop, y.stop)$.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `scale.start` / `scale.stop`: See above.
 | 
					* `scale.start` / `scale.stop`: See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `scaleSize`: Relative scaling of artwork card, calculated by
 | 
					* `scaleSize`: Relative scaling of item card, calculated by
 | 
				
			||||||
  $\frac{scale.stop}{scale.start}$.
 | 
					  $\frac{scale.stop}{scale.start}$.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `rotation.start` / `rotation.stop`: See above.
 | 
					* `rotation.start` / `rotation.stop`: See above.
 | 
				
			||||||
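The derived variables `distance` and `scaleSize` described in the list above can be illustrated with a few lines of base R. This is a minimal sketch on made-up values, not code from the `mtt` package; the data frame `events` and its contents are assumptions for illustration, only the column names follow the list.

```r
# Toy data frame with the start/stop columns named as in the list above
# (the values are invented for illustration)
events <- data.frame(
  x.start = c(100, 500), y.start = c(100, 400),
  x.stop  = c(400, 500), y.stop  = c(500, 400),
  scale.start = c(1.0, 0.8), scale.stop = c(1.5, 0.4)
)

# Euclidean distance between (x.start, y.start) and (x.stop, y.stop)
events$distance <- sqrt((events$x.stop - events$x.start)^2 +
                        (events$y.stop - events$y.start)^2)

# Relative scaling: values > 1 mean enlarged, values < 1 mean shrunk
events$scaleSize <- events$scale.stop / events$scale.start

events[, c("distance", "scaleSize")]
```

A card that was not moved at all gets a distance of 0, as in the second row of the toy data.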
@ -215,60 +206,26 @@ variables:
 | 
				
			|||||||
## How unclosed events are handled
 | 
					## How unclosed events are handled
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Events do not necessarily need to be completed. A person can, e.g., leave
 | 
					Events do not necessarily need to be completed. A person can, e.g., leave
 | 
				
			||||||
the table and not flip the artwork card close again. For `flipCard`,
 | 
					the table and not flip the item card close again. For `flipCard`,
 | 
				
			||||||
`openTopic`, and `openPopup` the data frame contains `NA` when the event
 | 
					`openTopic`, and `openPopup` the data frame contains `NA` when the event
 | 
				
			||||||
does not complete. For `move` events is happens quite often that a start
 | 
					does not complete. For `move` events it happens quite often that a start
 | 
				
			||||||
event follows a start event and a stop event follows a stop event.
 | 
					event follows a start event and a stop event follows a stop event.
 | 
				
			||||||
Technically a move event cannot *not* be finished and the number of events
 | 
					Technically a move event cannot *not* be finished and the number of events
 | 
				
			||||||
without a start or stop indicated that the time resolution was not
 | 
					without a start or stop indicate that the time resolution was not
 | 
				
			||||||
sufficient to catch all these events accurately. Double start and stop
 | 
					sufficient to catch all these events accurately. Double start and stop
 | 
				
			||||||
`move`events have therefore been deleted from the data set.
 | 
					`move` events have therefore been deleted from the data set.
 | 
				
			||||||
 | 
					 | 
				
			||||||
<!--
 | 
					 | 
				
			||||||
## How a case is defined
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
* Find out whether more than one person is standing at the table?
 | 
					 | 
				
			||||||
  - Sliding window in which the number of artworks is counted? Or how far
 | 
					 | 
				
			||||||
    apart touched artworks are from each other?
 | 
					 | 
				
			||||||
  - You can already "see" this in the logs, but how can I
 | 
					 | 
				
			||||||
    extract it automatically? What is my definition of
 | 
					 | 
				
			||||||
    "interaction boost"?
 | 
					 | 
				
			||||||
  - No matter how we do it, does it work on the "event log data"?
 | 
					 | 
				
			||||||
-->
 | 
					 | 
				
			||||||
 | 
					
 | 
				
			||||||
## Additional meta data
 | 
					## Additional meta data
 | 
				
			||||||
 | 
					
 | 
				
			||||||
For the HAUM data, I added meta data on state holidays and school
 | 
					For the HAUM data, I added meta data on state holidays and school
 | 
				
			||||||
vacations. Additionally, the topic categories of the topic cards were
 | 
					vacations. 
 | 
				
			||||||
extracted from the XML files and added to the data frame.
 | 
					 | 
				
			||||||
 | 
					
 | 
				
			||||||
This led to the following additional variables:
 | 
					This led to the following additional variables:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `topicIndex`
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
* `topicFile`
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
* `topic`
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
* `state` (Niedersachsen for complete HAUM data set)
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
* `stateCode` (NI)
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
* `holiday`
 | 
					* `holiday`
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `vacations`
 | 
					* `vacations`
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `stateCodeVacations`
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
<!--
 | 
					 | 
				
			||||||
  - Metadata on artworks like, name, artist, type of artwork, epoch, etc.
 | 
					 | 
				
			||||||
  - School vacations and holidays
 | 
					 | 
				
			||||||
  - Special exhibits at the museum
 | 
					 | 
				
			||||||
  - Number of visitors per day (follow up with Sven again?)
 | 
					 | 
				
			||||||
  - Age structure of visitors per day?
 | 
					 | 
				
			||||||
  - ... ????
 | 
					 | 
				
			||||||
-->
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
# Problems and how I handled them
 | 
					# Problems and how I handled them
 | 
				
			||||||
 | 
					
 | 
				
			||||||
This lists some problems with the log data that required decisions. These
 | 
					This lists some problems with the log data that required decisions. These
 | 
				
			||||||
@ -287,33 +244,12 @@ event spans more than two log files, a multiple of $600,000$ must be taken,
 | 
				
			|||||||
e.g. for three log files it must be: $2 \times 600,000 - timeMs.start +
 | 
					e.g. for three log files it must be: $2 \times 600,000 - timeMs.start +
 | 
				
			||||||
timeMs.stop$ and so on.
 | 
					timeMs.stop$ and so on.
 | 
				
			||||||
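The adjustment described above can be written as a small helper. This is a minimal sketch and not code from the `mtt` package: the function name `adjust_duration()` and the argument `n_files` (the number of raw log files an event spans) are assumptions made for illustration, and the nominal log file length of $600,000$ ms is taken from the text.

```r
# Hypothetical helper: duration of an event whose start and stop may lie in
# different raw log files. `n_files` counts the log files the event spans
# (1 = start and stop in the same file).
adjust_duration <- function(time_start, time_stop, n_files, file_ms = 600000) {
  stopifnot(all(n_files >= 1))
  (n_files - 1) * file_ms - time_start + time_stop
}

adjust_duration(1000, 4000, n_files = 1)    # same file: plain difference
adjust_duration(590000, 5000, n_files = 2)  # spans two log files
adjust_duration(590000, 5000, n_files = 3)  # spans three log files
```

For `n_files = 3` this reduces to $2 \times 600,000 - timeMs.start + timeMs.stop$, the case given in the text.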
 | 
					
 | 
				
			||||||
```{r, results = FALSE, fig.show = TRUE}
 | 
					```{r timems, echo = FALSE, results = FALSE, fig.show = TRUE}
 | 
				
			||||||
# Read data
 | 
					# Read data
 | 
				
			||||||
dat0 <- read.table("data/haum/raw_logfiles_small_2023-09-26_13-50-20.csv", sep = ";",
 | 
					datraw <- read.table("code/results/raw_logfiles_2024-02-21_16-07-33.csv", sep = ";",
 | 
				
			||||||
                   header = TRUE)
 | 
					                     header = TRUE)
 | 
				
			||||||
dat0$date <- as.POSIXct(dat0$date)
 | 
					 | 
				
			||||||
dat0$glossar <- ifelse(dat0$artwork == "glossar", 1, 0)
 | 
					 | 
				
			||||||
 | 
					
 | 
				
			||||||
# Remove irrelevant events
 | 
					plot(timeMs ~ as.factor(fileId), datraw[1:5000,], xlab = "fileId")
 | 
				
			||||||
dat <- subset(dat0, !(dat0$event %in% c("Start Application",
 | 
					 | 
				
			||||||
                                        "Show Application")))
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
# Add trace variable
 | 
					 | 
				
			||||||
artworks <- unique(stats::na.omit(dat$artwork))
 | 
					 | 
				
			||||||
artworks <- artworks[artworks != "glossar"]
 | 
					 | 
				
			||||||
glossar_files <- unique(subset(dat, dat$artwork == "glossar")$popup)
 | 
					 | 
				
			||||||
glossar_dict <- create_glossardict(artworks, glossar_files,
 | 
					 | 
				
			||||||
                    xmlpath = "data/haum/ContentEyevisit/eyevisit_cards_light/")
 | 
					 | 
				
			||||||
dat1 <- add_trace(dat, glossar_dict)
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
# Close events
 | 
					 | 
				
			||||||
dat2 <- rbind(close_events(dat1, "move", rm_nochange_moves = TRUE),
 | 
					 | 
				
			||||||
              close_events(dat1, "flipCard", rm_nochange_moves = TRUE),
 | 
					 | 
				
			||||||
              close_events(dat1, "openTopic", rm_nochange_moves = TRUE),
 | 
					 | 
				
			||||||
              close_events(dat1, "openPopup", rm_nochange_moves = TRUE))
 | 
					 | 
				
			||||||
dat2 <- dat2[order(dat2$fileId.start, dat2$date.start, dat2$timeMs.start), ]
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
plot(timeMs ~ as.factor(fileId), dat[1:5000,], xlab = "fileId")
 | 
					 | 
				
			||||||
```
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
The boxplot shows that we have a continuous range of values within one log
 | 
					The boxplot shows that we have a continuous range of values within one log
 | 
				
			||||||
@ -322,7 +258,7 @@ file but that `timeMs` does not increase over log files. I kept
 | 
				
			|||||||
in the data frame, so it is clear when events span more than one log file.
 | 
					in the data frame, so it is clear when events span more than one log file.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
<!--
 | 
					<!--
 | 
				
			||||||
Infos from Philipp:
 | 
					Infos from the programmer:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
"I also just went through the code from back then. The logging works like
 | 
					"I also just went through the code from back then. The logging works like
 | 
				
			||||||
this: from the start of the application, every 10 minutes a new log file
 | 
					this: from the start of the application, every 10 minutes a new log file
 | 
				
			||||||
@ -340,7 +276,7 @@ es passt."
 | 
				
			|||||||
## Left padding of file IDs
 | 
					## Left padding of file IDs
 | 
				
			||||||
 | 
					
 | 
				
			||||||
The file names of the raw log files are automatically generated and contain
 | 
					The file names of the raw log files are automatically generated and contain
 | 
				
			||||||
a time stamp. This time stamp is not well formed. First, it contains an
 | 
					a timestamp. This timestamp is not well formed. First, it contains an
 | 
				
			||||||
incorrect month. The months go from 0 to 11 which means that the file name
 | 
					incorrect month. The months go from 0 to 11 which means that the file name
 | 
				
			||||||
`2016_11_15-12_12_57.log` was collected on December 15, 2016 at 12:12 pm.
 | 
					`2016_11_15-12_12_57.log` was collected on December 15, 2016 at 12:12 pm.
 | 
				
			||||||
Another problem is that the file names are not zero left padded, e.g.,
 | 
					Another problem is that the file names are not zero left padded, e.g.,
 | 
				
			||||||
@ -350,11 +286,12 @@ will sort these files in the order shown below. In order to preprocess the
 | 
				
			|||||||
data and close events that belong together, the data need to be sorted by
 | 
					data and close events that belong together, the data need to be sorted by
 | 
				
			||||||
events and artworks repeatedly. In order to get them back in the correct
 | 
					events and artworks repeatedly. In order to get them back in the correct
 | 
				
			||||||
time order, it is necessary to order them based on three variables:
 | 
					time order, it is necessary to order them based on three variables:
 | 
				
			||||||
`fileId`, `date.start` and `timeMs`. The file IDs therefore need to
 | 
					`fileId.start`, `date.start` and `timeMs.start`. The file IDs therefore
 | 
				
			||||||
sort in the correct order (again see below for example). I zero left padded
 | 
					need to sort in the correct order (again see below for example). I zero
 | 
				
			||||||
the log file names within the data frame using it as an identifier. These
 | 
					left padded the log file names within the data frame using it as an
 | 
				
			||||||
"file names" do not correspond exactly to the original raw log file names.
 | 
					identifier. These "file names" do not correspond exactly to the original
 | 
				
			||||||
This needs to be kept in mind when doing any kind of matching etc.
 | 
					raw log file names. This needs to be kept in mind when doing any kind of
 | 
				
			||||||
 | 
					matching etc.
 | 
				
			||||||
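Both file name problems can be sketched in base R. The function `parse_file_id()` below is hypothetical and not part of the `mtt` package; the unpadded input name `2016_11_15-12_2_57.log` is made up for illustration, and the time zone `"UTC"` is an assumption. The month is shifted by one because the raw names count months from 0 to 11.

```r
# Hypothetical helper: zero left pad a raw log file name and recover the
# actual collection time (month index in the file name runs from 0 to 11)
parse_file_id <- function(x) {
  # Split "yyyy_m_d-h_m_s.log" into its six numeric parts
  parts <- strsplit(sub("\\.log$", "", x), "[_-]")
  m <- do.call(rbind, lapply(parts, as.integer))
  padded <- sprintf("%04d_%02d_%02d-%02d_%02d_%02d.log",
                    m[, 1], m[, 2], m[, 3], m[, 4], m[, 5], m[, 6])
  # Add 1 to the month to get the calendar month
  collected <- ISOdatetime(m[, 1], m[, 2] + 1, m[, 3],
                           m[, 4], m[, 5], m[, 6], tz = "UTC")
  data.frame(fileId = padded, collected = collected,
             stringsAsFactors = FALSE)
}

res <- parse_file_id(c("2016_11_15-12_2_57.log", "2016_11_15-12_12_57.log"))
res
```

The padded `fileId` keeps the month as written in the raw name, so it still differs from the real calendar date; only the `collected` column applies the correction.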
 | 
					
 | 
				
			||||||
```
 | 
					```
 | 
				
			||||||
## what it looked like before left padding
 | 
					## what it looked like before left padding
 | 
				
			||||||
@ -376,16 +313,16 @@ This needs to be kept in mind when doing any kind of matching etc.
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
## Timestamps repeat
 | 
					## Timestamps repeat
 | 
				
			||||||
 | 
					
 | 
				
			||||||
The time stamps in the `date` variable record year, month, day, hour,
 | 
					The timestamps in the `date` variable record year, month, day, hour,
 | 
				
			||||||
minute and seconds. Since one second is not a very short time interval for
 | 
					minute and seconds. Since one second is not a very short time interval for
 | 
				
			||||||
a move on a touch display, this is not fine grained enough to bring events
 | 
					a move on a touch display, this is not fine grained enough to bring events
 | 
				
			||||||
into the correct order, meaning there are events from the same log file
 | 
					into the correct order, meaning there are events from the same log file
 | 
				
			||||||
having the same time stamp and even events from different log files having
 | 
					having the same timestamp and even events from different log files having
 | 
				
			||||||
the same time stamp. The log files get written about every 10 minutes
 | 
					the same timestamp. The log files get written about every 10 minutes
 | 
				
			||||||
(which can easily be seen when looking at the file names of the raw log
 | 
					(which can easily be seen when looking at the file names of the raw log
 | 
				
			||||||
files). So in order to get events in the correct order, it is necessary to
 | 
					files). So in order to get events in the correct order, it is necessary to
 | 
				
			||||||
first order by file ID, within file ID then sort by time stamp `date` and
 | 
					first order by file ID, within file ID then sort by timestamp `date` and
 | 
				
			||||||
then within these more coarse grained time stamps sort by `timeMs`. But as
 | 
					then within these more coarse grained timestamps sort by `timeMs`. But as
 | 
				
			||||||
explained above, `timeMs` can only be sorted within one file ID, since they
 | 
					explained above, `timeMs` can only be sorted within one file ID, since they
 | 
				
			||||||
do not increase consistently over log files, but have a new offset for each
 | 
					do not increase consistently over log files, but have a new offset for each
 | 
				
			||||||
raw log file.
 | 
					raw log file.
 | 
				
			||||||
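The three-level ordering described above can be sketched with `order()` from base R. The data frame below is invented for illustration (it is not taken from the HAUM logs) and assumes that the file IDs are already zero left padded.

```r
# Toy events: two share the same second within one file, and timeMs
# restarts in the second file
events <- data.frame(
  fileId = c("2016_11_15-12_22_57.log", "2016_11_15-12_12_57.log",
             "2016_11_15-12_12_57.log", "2016_11_15-12_12_57.log"),
  date = as.POSIXct(c("2016-12-15 12:23:01", "2016-12-15 12:13:05",
                      "2016-12-15 12:13:05", "2016-12-15 12:13:02"),
                    tz = "UTC"),
  timeMs = c(4000, 8200, 8100, 5000),
  stringsAsFactors = FALSE
)

# Order by file ID first, then by the coarse timestamp, then by timeMs
events_sorted <- events[order(events$fileId, events$date, events$timeMs), ]
events_sorted
```

Sorting by `timeMs` alone would place the event from the second file first, because its counter started again near zero.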
@ -394,64 +331,67 @@ raw log file.
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
The display of the Multi-Touch-Table is a 4K-display with 3840 x 2160
 | 
					The display of the Multi-Touch-Table is a 4K-display with 3840 x 2160
 | 
				
			||||||
pixels. When you plot the start and stop coordinates, the display is
 | 
					pixels. When you plot the start and stop coordinates, the display is
 | 
				
			||||||
clearly to distinguish. However, a lot of points are outside of the display
 | 
					clearly distinguishable. However, a lot of points are outside of the
 | 
				
			||||||
range. This can happen, when the art objects are scaled and then moved to
 | 
					display range. This can happen, when the art objects are scaled and then
 | 
				
			||||||
the very edge of the table. Then it will record pixels outside of the
 | 
					moved to the very edge of the table. Then it will record pixels outside of
 | 
				
			||||||
table. These are actually valid data points and I will leave them as is.
 | 
					the table. These are actually valid data points and I will leave them as
 | 
				
			||||||
 | 
					is.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					```{r xycoord}
 | 
				
			||||||
 | 
					datlogs <- read.table("code/results/event_logfiles_2024-02-21_16-07-33.csv", sep = ";",
 | 
				
			||||||
 | 
					                      header = TRUE)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
```{r}
 | 
					 | 
				
			||||||
par(mfrow = c(1, 2))
 | 
					par(mfrow = c(1, 2))
 | 
				
			||||||
plot(y.start ~ x.start, dat2)
 | 
					plot(y.start ~ x.start, datlogs)
 | 
				
			||||||
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
 | 
					abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
 | 
				
			||||||
plot(y.stop ~ x.stop, dat2)
 | 
					plot(y.stop ~ x.stop, datlogs)
 | 
				
			||||||
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
 | 
					abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
aggregate(cbind(x.start, x.stop, y.start, y.stop) ~ 1, dat2, mean)
 | 
					aggregate(cbind(x.start, x.stop, y.start, y.stop) ~ 1, datlogs, mean)
 | 
				
			||||||
```
 | 
					```
 | 
				
			||||||
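To get a rough idea of how many points fall outside of the display, the coordinates can be compared with the display range of 3840 x 2160 pixels. This is only a sketch on invented toy coordinates; with the real data, the same comparison would be applied to `x.start` and `y.start` (and to the stop coordinates) in `datlogs`.

```r
# Toy coordinates (hypothetical values), using the column names from above
pts <- data.frame(x.start = c(100, -50, 3900, 2000),
                  y.start = c(100, 500, 1000, 2300))

# TRUE if a point lies outside of the 3840 x 2160 display
outside <- pts$x.start < 0 | pts$x.start > 3840 |
           pts$y.start < 0 | pts$y.start > 2160

# Proportion of points outside of the display range
mean(outside)
```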
 | 
					
 | 
				
			||||||
## Pop-ups from glossar cannot be assigned to a specific artwork
 | 
					## Pop-ups from glossar cannot be assigned to a specific item
 | 
				
			||||||
 | 
					
 | 
				
			||||||
All the information, pictures and texts for the topics and pop-ups are
 | 
					All the information, pictures and texts for the topics and pop-ups are
 | 
				
			||||||
stored in
 | 
					stored in `/data/haum/ContentEyevisit/eyevisit_cards_light/<item_number>`.
 | 
				
			||||||
`/Logfiles/ContentEyevisit/eyevisit_cards_light/<artwork_number>`. Among
 | 
					Among other things, each folder contains XML-files with the information
 | 
				
			||||||
other things, each folder contains XML-files with the information about any
 | 
					about any technical terms that can be opened from the hypertexts on the
 | 
				
			||||||
technical terms that can be opened from the hypertexts on the topic cards.
 | 
					topic cards. Often this information is item dependent and then the
 | 
				
			||||||
Often this information is artwork dependent and then the corresponding
 | 
					corresponding XML-file is in the folder for this item. Sometimes, however,
 | 
				
			||||||
XML-file is in the folder for this artwork. Sometimes, however, more
 | 
					more general terms can be opened. In order to avoid multiple files
 | 
				
			||||||
general terms can be opened. In order to avoid multiple files containing
 | 
					containing the same information, these were stored in a folder called
 | 
				
			||||||
the same information, these were stored in a folder called `glossar` and
 | 
					`glossar` and get accessed from there. The raw log files only contain the
 | 
				
			||||||
get accessed from there. The raw log files only contain the path to this
 | 
					path to this glossar entry and did not record from which item it was
 | 
				
			||||||
glossar entry and did not record from which artwork it was accessed. I
 | 
					accessed. I tried to assign these glossar entries to the correct items. The
 | 
				
			||||||
tried to assign these glossar entries to the correct artworks. The (very
 | 
					(very heuristic) approach was this:
 | 
				
			||||||
heuristic) approach was this:
 | 
					 | 
				
			||||||
 | 
					
 | 
				
			||||||
1. Create a lookup table with all XML-file names (possible pop-ups) from
 | 
					1. Create a lookup table with all XML-file names (possible pop-ups) from
 | 
				
			||||||
   the glossar folder and what artworks possibly call them. This was stored
 | 
					   the glossar folder and what items possibly call them. This was stored
 | 
				
			||||||
   as an `RData` object for easier handling but should maybe be stored in a
 | 
					   as an `RData` object for easier handling but should maybe be stored in a
 | 
				
			||||||
   more interoperable format.
 | 
					   more interoperable format.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
2. I went through all possible pop-ups in this lookup table and stored the
 | 
					2. I went through all possible pop-ups in this lookup table and stored the
 | 
				
			||||||
   artworks that are associated with it.
 | 
					   items that are associated with it.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
3. I created a sub data frame without move events (since they can never be
 | 
					3. I created a sub data frame without move events (since they can never be
 | 
				
			||||||
   associated with a pop-up) and went through every line and looked up if
 | 
					   associated with a pop-up) and went through every line and looked up if
 | 
				
			||||||
   an artwork and a topic card had been opened. If this was the case and a
 | 
					   an item and a topic card had been opened. If this was the case and a
 | 
				
			||||||
   glossar entry came up before the artwork was closed again, I assigned
 | 
					   glossar entry came up before the item was closed again, I assigned
 | 
				
			||||||
   this artwork to this glossar entry.
 | 
					   this item to the glossar entry.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
This is heuristic since it is possible that several topic cards from
 | 
					This is heuristic since it is possible that several topic cards from
 | 
				
			||||||
different artworks are opened simultaneously and the glossar pop-up could
 | 
					different items are opened simultaneously and the glossar pop-up could
 | 
				
			||||||
be opened from either one (it could even be more than two, of course). In
 | 
					be opened from either one (it could even be more than two, of course). In
 | 
				
			||||||
these cases the artwork that was opened closest to the glossar pop-up has
 | 
					these cases the item that was opened closest to the glossar pop-up has
 | 
				
			||||||
been assigned, but this can never be completely error free.
 | 
					been assigned, but this can never be completely error free.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
And this heuristic only assigns a little more than half of the glossar
 | 
					And this heuristic only assigns a little more than half of the glossar
 | 
				
			||||||
entries. Since my heuristic only looks for the last artwork that has been
 | 
					entries. Since my heuristic only looks for the last item that has been
 | 
				
			||||||
opened and checks whether this artwork is a possible candidate, it misses all glossar
 | 
					opened and checks whether this item is a possible candidate, it misses all glossar
 | 
				
			||||||
pop-ups where another artwork has been opened in between. This is still an
 | 
					pop-ups where another item has been opened in between. This is still an
 | 
				
			||||||
open TODO to write a more elaborate algorithm.
 | 
					open TODO to write a more elaborate algorithm.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
All glossar pop-ups that do not get matched with an artwork are removed
 | 
					All glossar pop-ups that do not get matched with an item are removed
 | 
				
			||||||
from the data set with a warning if the argument `glossar = TRUE` is set.
 | 
					from the data set with a warning if the argument `glossar = TRUE` is set.
 | 
				
			||||||
Otherwise the glossar entries will be ignored completely.
 | 
					Otherwise the glossar entries will be ignored completely.
 | 
				
			||||||
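The core of the heuristic can be sketched as follows. This is a strongly simplified illustration and not the code used in the package: the lookup table `lut`, the toy events, the column name `popup` and the function `assign_glossar()` are all hypothetical, and the check whether a topic card is still open is left out.

```r
# Hypothetical lookup table: glossar file -> items that can call it
lut <- list("sommerbau.xml" = c("501", "502"),
            "barock.xml"    = c("080"))

# Toy events in temporal order; glossar pop-ups carry "glossar" as item
events <- data.frame(
  event = c("flipCard", "openTopic", "openPopup", "flipCard", "openPopup"),
  item  = c("501", "501", "glossar", "031", "glossar"),
  popup = c(NA, NA, "sommerbau.xml", NA, "barock.xml"),
  stringsAsFactors = FALSE
)

assign_glossar <- function(events, lut) {
  last_item <- NA_character_
  for (i in seq_len(nrow(events))) {
    if (events$item[i] != "glossar") {
      # remember the item that was interacted with last
      last_item <- events$item[i]
    } else if (!is.na(last_item) &&
               last_item %in% lut[[events$popup[i]]]) {
      # last item is a possible candidate for this glossar entry
      events$item[i] <- last_item
    }
  }
  events
}

res <- assign_glossar(events, lut)
res
```

In this toy example the first glossar pop-up is assigned to item "501", while the second one stays unassigned because "031" is not a candidate for `barock.xml`. Unassigned entries like this one are the ones that get removed with a warning.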
 | 
					
 | 
				
			||||||
@ -473,232 +413,89 @@ gets extracted by the algorithm.
 | 
				
			|||||||
In order to investigate user behavior on a more fine grained level, it will
 | 
					In order to investigate user behavior on a more fine grained level, it will
 | 
				
			||||||
be necessary to come up with a more elaborate approach. A better, still
 | 
					be necessary to come up with a more elaborate approach. A better, still
 | 
				
			||||||
simple approach, could be to use this kind of time limit and additionally
 | 
					simple approach, could be to use this kind of time limit and additionally
 | 
				
			||||||
look at the distance between artworks interacted with within one time
 | 
					look at the distance between items interacted with within one time window.
 | 
				
			||||||
window. When artworks are far apart it seems plausible that more than one
 | 
					When items are far apart it seems plausible that more than one person
 | 
				
			||||||
person interacted with them. Very short time lapses between events on
 | 
					interacted with them. Very short time lapses between events on different
 | 
				
			||||||
different artworks could also be an indicator that more than one person is
 | 
					items could also be an indicator that more than one person is interacting
 | 
				
			||||||
interacting with the table.
 | 
					with the table.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
## Assign a `trace` variable
 | 
					## Assign a `path` variable
 | 
				
			||||||
 | 
					
 | 
				
			||||||
The `trace` variable is supposed to show one interaction trace with one
 | 
					The `path` variable is supposed to show one interaction trace with one
 | 
				
			||||||
artwork. Meaning it starts when an artwork is touched or flipped and stops
 | 
					artwork. Meaning it starts when an artwork is touched or flipped and stops
 | 
				
			||||||
when it is closed again. It is easy to assign a trace from flipping a card
 | 
					when it is closed again. It is easy to assign a path from flipping a card
 | 
				
			||||||
over opening (maybe several) topics and pop-ups for this artwork card until
 | 
					over opening (maybe several) topics and pop-ups for this artwork card until
 | 
				
			||||||
closing this card again. But one would like to assign the same trace to
 | 
					closing this card again. But one would like to assign the same path to
 | 
				
			||||||
move events surrounding this interaction. Again, this is not possible in an
 | 
					move events surrounding this interaction. Again, this is not possible in an
 | 
				
			||||||
algorithmic way but only heuristically. I used the `case` variable in order
 | 
					algorithmic way but only heuristically.
 | 
				
			||||||
to get meaningful units around the artworks.
 | 
					 | 
				
			||||||
 | 
					
 | 
				
			||||||
If within one case only a single trace for a single artwork was opened, I
 | 
					Again, I used a time cutoff for this. First, if a `move` event occurs, it
 | 
				
			||||||
assigned this trace to the moves associated with this artwork. It (quite
 | 
					is checked whether the same item has been flipped less than 20 seconds
 | 
				
			||||||
often) happens that within one case one artwork is opened and closed
 | 
					beforehand. If yes, the same path indicator is assigned to this `move`. If
 | 
				
			||||||
several times, each time starting a new trace. I then assigned all the
 | 
					not, a new "move indicator" is temporarily assigned. Then, a "backward
 | 
				
			||||||
following move events to the trace beforehand. This is, of course,
 | 
					pass" is applied, where it is checked if the same item is opened less than
 | 
				
			||||||
arbitrary and could also be handled the other way around.
 | 
					20 seconds _after_ the event occurs. If yes, that path indicator is
 | 
				
			||||||
 | 
					assigned. For all the remaining moves, a new path number is assigned. This
 | 
				
			||||||
Another possibility is, that an artwork gets moved within one trace without
 | 
					corresponds to items being moved without being flipped.
 | 
				
			||||||
being flipped. I then assigned a new trace to this move.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
This overall worked very well even though it was based on the very
 | 
					 | 
				
			||||||
heuristic approach assigning a case when the table has not been touched for
 | 
					 | 
				
			||||||
20 seconds. It should be kept in mind that the trace assignments for the
 | 
					 | 
				
			||||||
moves will change when case is defined in a different way.
 | 
					 | 
				
			||||||
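The following sketch illustrates the idea of assigning moves to paths with a 20 second cutoff. It is a simplified illustration on invented toy data, not the implementation in the package: the column `time` (in seconds) is hypothetical, and every `flipCard` event simply starts a new path.

```r
# Toy events (hypothetical values); `time` is in seconds
events <- data.frame(
  event = c("move", "flipCard", "move", "move", "move"),
  item  = c("001", "001", "001", "002", "001"),
  time  = c(0, 10, 15, 100, 200),
  stringsAsFactors = FALSE
)
cutoff <- 20

# Every flip starts a new path
events$path <- NA_integer_
flips <- which(events$event == "flipCard")
events$path[flips] <- seq_along(flips)

for (i in which(events$event == "move")) {
  same <- events$item[flips] == events$item[i]
  lag  <- events$time[i] - events$time[flips]
  # same item flipped less than `cutoff` seconds before the move
  before <- flips[same & lag >= 0 & lag < cutoff]
  # "backward pass": same item flipped less than `cutoff` seconds after
  after  <- flips[same & lag < 0 & -lag < cutoff]
  if (length(before) > 0) {
    events$path[i] <- events$path[max(before)]
  } else if (length(after) > 0) {
    events$path[i] <- events$path[min(after)]
  }
}

# Remaining moves (items moved without being flipped) get new path numbers
rest <- which(is.na(events$path))
events$path[rest] <- length(flips) + seq_along(rest)
events
```

Changing `cutoff` changes which moves are attached to a flip, so the resulting path numbers depend directly on this choice.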
 | 
					
 | 
				
			||||||
## A `move` event does not record any change
 | 
					## A `move` event does not record any change
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Most of the events in the log files are move events. Additionally, many of
 | 
					Most of the events in the log files are move events. Additionally, many of
 | 
				
			||||||
these move events are recorded but they do not indicate any change meaning
 | 
					these move events are recorded but they do not indicate any change, meaning
 | 
				
			||||||
the only difference is the time stamp. All other variables indicating moves
 | 
					the only difference is the timestamp. All other variables indicating moves
 | 
				
			||||||
like `x.start` and `x.stop`, `rotation.start` and `rotation.stop` etc. do
 | 
					like `x.start` and `x.stop`, `rotation.start` and `rotation.stop` etc. do
 | 
				
			||||||
not show any change. They represent about 2/3 of all move events. These
 | 
					not show _any_ change. They represent about 2/3 of all move events. These
 | 
				
			||||||
events are probably short touches of the table without an actual
 | 
					events are probably short touches of the table without an actual
 | 
				
			||||||
interaction. They were therefore removed from the data set.
 | 
					interaction. They were therefore removed from the data set.
 | 
				
			||||||
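Removing these events could look like the sketch below. The toy data are invented; only the start and stop pairs named above are compared, and any further pairs in the real data would have to be added to the condition in the same way.

```r
# Toy events (hypothetical values): first move shows no change at all
events <- data.frame(
  event          = c("move", "move", "flipCard"),
  x.start        = c(10, 10, NA), x.stop        = c(10, 50, NA),
  y.start        = c(20, 20, NA), y.stop        = c(20, 20, NA),
  rotation.start = c(0, 0, NA),   rotation.stop = c(0, 0, NA),
  stringsAsFactors = FALSE
)

# A move without change has identical start and stop values everywhere
no_change <- events$event == "move" &
  events$x.start == events$x.stop &
  events$y.start == events$y.stop &
  events$rotation.start == events$rotation.stop
no_change[is.na(no_change)] <- FALSE

events_clean <- events[!no_change, ]
events_clean
```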
 | 
					
 | 
				
			||||||
## Events that only close (`date.start` is NA)
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
It looks like there is some kind of log error for the events that do not
 | 
					 | 
				
			||||||
have a start stop. I was able to get rid of most by sorting for `popup` for
 | 
					 | 
				
			||||||
the openPopup events, but there are still some left (50 for the small data
 | 
					 | 
				
			||||||
set, which corresponds to 0.2 per mill). The following example shows that
 | 
					 | 
				
			||||||
artwork "501" gets closed (line 31030) while the pop-up `sommerbau.xml`
 | 
					 | 
				
			||||||
is still opened (line 31027). Then artwork "501" gets opened again
 | 
					 | 
				
			||||||
(line 31035) and after that the pop-up `sommerbau.xml` is closed (line
 | 
					 | 
				
			||||||
31040). This should not be possible and therefore (correctly) two events
 | 
					 | 
				
			||||||
are assigned: One where the pop-up was opened and then not closed (which is
 | 
					 | 
				
			||||||
common) and another one where the pop-up has no start.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
```{r}
 | 
					 | 
				
			||||||
dat[31000:31019,]
 | 
					 | 
				
			||||||
# Card gets flipped closed before pop-up closes --> log error!
 | 
					 | 
				
			||||||
```
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
I did not check all of these cases (for the complete data set this is
 | 
					 | 
				
			||||||
simply not possible by hand) but just excluded all events that do not have
 | 
					 | 
				
			||||||
a `date.start` since they are hard to interpret. Often they are log errors
 | 
					 | 
				
			||||||
but in some cases they might be resolvable.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
```{r}
 | 
					 | 
				
			||||||
# remove all events that do not have a `date.start`
 | 
					 | 
				
			||||||
dim(dat2[is.na(dat2$date.start), ])
 | 
					 | 
				
			||||||
dat2 <- dat2[!is.na(dat2$date.start), ]
 | 
					 | 
				
			||||||
```
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
In order to deal with these logging errors, I check the data for what I
 | 
					 | 
				
			||||||
call "fragmented traces". These are traces that cannot happen, when
 | 
					 | 
				
			||||||
everything is logged correctly, e.g., traces containing `flipCard ->
 | 
					 | 
				
			||||||
openPopup` or traces that only consist of `move`, `openTopic`, and
 | 
					 | 
				
			||||||
`openPopup` events. These fragmented traces are removed from the data. It
 | 
					 | 
				
			||||||
was not possible to check them all manually, but the 20 or more that I do
 | 
					 | 
				
			||||||
check in the raw log files were all some kind of logging error like above.
 | 
					 | 
				
			||||||
Most often a card was already closed again, before a topic card or pop-up
 | 
					 | 
				
			||||||
was recorded as being closed.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
## Card indices go from 0 to 7 (instead of 0 to 5 as expected)
 | 
					## Card indices go from 0 to 7 (instead of 0 to 5 as expected)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
See `questions_number-of-cards.R` for more details.
 | 
					In the beginning I thought that the number for topics was the index of
 | 
				
			||||||
 | 
					where the card was presented on the back of the item. But this is not
 | 
				
			||||||
 | 
					correct. It is the number of the topic. There are eight topics in total:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
I wrote a function that for each artwork extracts the file names of the
 | 
					 | 
				
			||||||
possible topic cards and then looks up which topics have actually been
 | 
					 | 
				
			||||||
displayed on the back of the card. I added an index giving the ordering in
 | 
					 | 
				
			||||||
the index files.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
The possible values in the variable `topicNumber` range from 0 to 7,
 | 
					 | 
				
			||||||
however, no artwork has more than six different numbers. So I just renamed
 | 
					 | 
				
			||||||
those numbers from 1 to the highest number, e.g., $0,1,2,4,5,6$ was changed
 | 
					 | 
				
			||||||
to $0\to 1,1\to 2,2\to 3,4\to 4,5\to 5,6\to 6$. Next I used the index to
 | 
					 | 
				
			||||||
assign topics and file names to the according pop-ups. This needs to be
 | 
					 | 
				
			||||||
cross checked with the programming, but seems the most plausible approach
 | 
					 | 
				
			||||||
with my current knowledge.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
<!-- TODO: Ask Philipp -->
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
## Extracting topics from `index.xml` vs. `<artwork_number>.xml`
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
When I extract the topics from `index.html` I get different topics, than
 | 
					 | 
				
			||||||
when I get them from `<artwork>.html`. At first glance, it looks like using
 | 
					 | 
				
			||||||
`index.html` actually gives the wrong results.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
```{r}
 | 
					 | 
				
			||||||
artworks <- unique(dat2$artwork)
 | 
					 | 
				
			||||||
path <- "data/haum/ContentEyevisit/eyevisit_cards_light/"
 | 
					 | 
				
			||||||
topics <- extract_topics(artworks, rep("index.xml", length(artworks)), path)
 | 
					 | 
				
			||||||
topics2 <- extract_topics(artworks, paste0(artworks, ".xml"), path)
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
topics[!topics$file_name %in% topics2$file_name, ]
 | 
					 | 
				
			||||||
topics2[!topics2$file_name %in% topics$file_name, ]
 | 
					 | 
				
			||||||
```
 | 
					```
 | 
				
			||||||
 | 
					Indices for topics:
 | 
				
			||||||
 | 
					0   artist
 | 
				
			||||||
 | 
					1   thema
 | 
				
			||||||
 | 
					2   komposition
 | 
				
			||||||
 | 
					3   leben des kunstwerks
 | 
				
			||||||
 | 
					4   details
 | 
				
			||||||
 | 
					5   licht und farbe
 | 
				
			||||||
 | 
					6   extra info
 | 
				
			||||||
 | 
					7   technik
 | 
				
			||||||
 | 
					```
 | 
				
			||||||
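Since the topic numbers start at 0, they can be translated into topic names with a simple lookup vector. This is only a sketch: the vector `topic_names` is built from the list above and the `topicNumber` values are invented.

```r
# Topic names in the order of their indices 0 to 7
topic_names <- c("artist", "thema", "komposition", "leben des kunstwerks",
                 "details", "licht und farbe", "extra info", "technik")

# Toy topic numbers (hypothetical values)
topicNumber <- c(0, 1, 1, 7)

# R indexes from 1, hence the shift by one
topic <- topic_names[topicNumber + 1]
topic
```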
 | 
					On the back of items, there can be between 2 and 6 topic cards. Several of
 | 
				
			||||||
 | 
					these topic cards can be about the same topic, e.g., there can be two topic
 | 
				
			||||||
 | 
					cards assigned to the topic `thema`. It is impossible to find out if the
 | 
				
			||||||
 | 
					same topic card was opened several times or if different topic cards with
 | 
				
			||||||
 | 
					the same topic were opened from the same item. See example below for item
 | 
				
			||||||
 | 
					"001".
 | 
				
			||||||
 | 
					
 | 
				
			||||||
For artwork "031", `index.html` only defines 5 cards (the 6th is commented
 | 
					```{r topics, echo = FALSE}
 | 
				
			||||||
out), but `topicNumber` for this artwork has 6 different entries. I will
 | 
					items <- sprintf("%03d", unique(datlogs$item))
 | 
				
			||||||
therefore extract the topics from `<artwork>.html`. (This seems also better
 | 
					topics <- extract_topics(items, xmlfiles = paste0(items, ".xml"),
 | 
				
			||||||
compatible with other data sets like 8o8m.)
 | 
					                         xmlpath = "data/haum/ContentEyevisit/eyevisit_cards_light/")
 | 
				
			||||||
 | 
					head(topics)
 | 
				
			||||||
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
## New artworks "504" and "505" starting October 2022
 | 
					## New artworks "504" and "505" starting October 2022
 | 
				
			||||||
 | 
					
 | 
				
			||||||
When I read in the complete data frame for the first time, all of a
 | 
					When I read in the complete data frame for the first time, all of a
 | 
				
			||||||
sudden there were 72 instead of 70 artworks. It seems like these two
 | 
					sudden there were 72 instead of 70 items. It seems like these two
 | 
				
			||||||
artworks appear on October 21, 2022.
 | 
					artworks appear on October 21, 2022.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
```{r}
 | 
					```{r newitems}
 | 
				
			||||||
dat0 <- read.table("data/haum/raw_logfiles_2023-09-23_01-31-30.csv",
 | 
					summary(as.Date(datraw[datraw$item %in% c("504", "505"), "date"]))
 | 
				
			||||||
                   sep = ";", header = TRUE)
 | 
					 | 
				
			||||||
dat0$date <- as.POSIXct(dat0$date)
 | 
					 | 
				
			||||||
dat0$glossar <- ifelse(dat0$artwork == "glossar", 1, 0)
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
# Remove irrelevant events
 | 
					 | 
				
			||||||
dat <- subset(dat0, !(dat0$event %in% c("Start Application",
 | 
					 | 
				
			||||||
                                        "Show Application")))
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
summary(dat[dat$artwork %in% c("504", "505"), ])
 | 
					 | 
				
			||||||
```
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
The artworks seem to have been updated in general after October 21, 2022.
 | 
					The artworks seem to have been updated in general after October 21, 2022. The
 | 
				
			||||||
 | 
					following table shows which items were presented in which years.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
```{r}
 | 
					```{r years}
 | 
				
			||||||
art_after_oct2022 <- sort(unique(dat[dat$date >= "2022-10-21", "artwork"]))
 | 
					xtabs(~ item + lubridate::year(date.start), datlogs)
 | 
				
			||||||
art_before_oct2022 <- sort(unique(dat[dat$date <= "2022-10-21", "artwork"]))
 | 
					 | 
				
			||||||
# Removed artworks
 | 
					 | 
				
			||||||
art_before_oct2022[!art_before_oct2022 %in% art_after_oct2022]
 | 
					 | 
				
			||||||
# Additional artworks
 | 
					 | 
				
			||||||
art_after_oct2022[!art_after_oct2022 %in% art_before_oct2022]
 | 
					 | 
				
			||||||
```
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
The following table shows which artworks were presented in which years.
 | 
					It shows that the artworks have been updated after the Corona pandemic. I
 | 
				
			||||||
 | 
					think the table was also moved to a different location at that point.
 | 
				
			||||||
```{r}
 | 
					 | 
				
			||||||
xtabs(~ artwork + lubridate::year(date), dat)
 | 
					 | 
				
			||||||
```
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
It strongly suggests that the artworks have been updated after the Corona
 | 
					 | 
				
			||||||
pandemic. I think, the table was also moved to a different location at that
 | 
					 | 
				
			||||||
point. (Check with PG to make sure.)
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
# Optimizing resources used by the code
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
After I started trying out the functions on the complete data set, it
 | 
					 | 
				
			||||||
became obvious (not surprisingly `:)`) that this will not work --
 | 
					 | 
				
			||||||
especially for the move events. The reshape function cannot take a long
 | 
					 | 
				
			||||||
data frame with over 6 Million entries and convert it into a wide data
 | 
					 | 
				
			||||||
frame (at least not on my laptop). The code is supposed to work "out of the
 | 
					 | 
				
			||||||
box" for researchers, hence it *should* run on a regular (8 core) laptop.
 | 
					 | 
				
			||||||
So, I changed the reshaping so that it is done in batches on subsets of the
 | 
					 | 
				
			||||||
data for every `fileId` separately. This means that events that span over
 | 
					 | 
				
			||||||
two (or more) raw log files cannot be closed and will then be removed from
 | 
					 | 
				
			||||||
the data set. The function warns about this, but it is a random process
 | 
					 | 
				
			||||||
getting rid of these data and seems therefore not like a systematic
 | 
					 | 
				
			||||||
problem. Another reason why this is not bad, is that durations cannot be
 | 
					 | 
				
			||||||
calculated for events across log files anyways, because the time stamps do
 | 
					 | 
				
			||||||
not increase systematically over log files (see above).
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
UPDATE: By now, I close the events spanning more than one log file after
 | 
					 | 
				
			||||||
this has been done.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
I meant to put the lists back together with `do.call(rbind, some_list)` but
 | 
					 | 
				
			||||||
this can also not handle big data sets. I therefore switched to
 | 
					 | 
				
			||||||
`dplyr::bind_rows(some_list)` which is really fast and was developed
 | 
					 | 
				
			||||||
especially for this purpose. It means, that I have to depend on the dplyr
 | 
					 | 
				
			||||||
package (which I am not a big fan of, since I meant to keep the package
 | 
					 | 
				
			||||||
self-contained).
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
# Reading list
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
* @Arizmendi2022 [--]
 | 
					 | 
				
			||||||
* @Bannert2014 [x]
 | 
					 | 
				
			||||||
* @Bousbia2010 [--]
 | 
					 | 
				
			||||||
* @Cerezo2020
 | 
					 | 
				
			||||||
* @GerjetsSchwan2021 [x]
 | 
					 | 
				
			||||||
* @Goldhammer2020
 | 
					 | 
				
			||||||
* @Guenther2007
 | 
					 | 
				
			||||||
* @HuberBannert2023 [x]
 | 
					 | 
				
			||||||
* @Kroehne2018
 | 
					 | 
				
			||||||
* @SchwanGerjets2021 [x]
 | 
					 | 
				
			||||||
* @vanderAalst2016 [Chap. 2, x]
 | 
					 | 
				
			||||||
* @vanderAalst2016 [Chap. 3]
 | 
					 | 
				
			||||||
* @vanderAalst2016 [Chap. 5, x]
 | 
					 | 
				
			||||||
* @Wang2019
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
# Open stuff
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
* Angle from which people approach table in Braunschweig? Consider in
 | 
					 | 
				
			||||||
  rotation variable?
 | 
					 | 
				
			||||||
* Time limit for `case` variable different for different events? (openTopic
 | 
					 | 
				
			||||||
  should be opened the longest)
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
  $\to$ I think this is not relevant since I am looking at time *between*
 | 
					 | 
				
			||||||
  events!
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
# Stuff AK found interesting
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
* Pre/post corona
 | 
					 | 
				
			||||||
* Identify school classes
 | 
					 | 
				
			||||||
* How many persons are present at the table?
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
# Other potential questions
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
* "Bursts"
 | 
					 | 
				
			||||||
* 1st vs. 2nd half of the day
 | 
					 | 
				
			||||||
* Can we identify "types of art"? With clustering or something?
 | 
					 | 
				
			||||||
* Possible to estimate how many persons per day? Maybe average of certain
 | 
					 | 
				
			||||||
  weekdays? ... ?
 | 
					 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
				
			|||||||
							
								
								
									
										577
									
								
								README.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							@ -0,0 +1,577 @@
 | 
				
			|||||||
 | 
					Log data from the Multi-Touch Table at the HAUM
 | 
				
			||||||
 | 
					================
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The Multi Touch Table at the Herzog-Anton-Ulrich-Museum (HAUM) in
 | 
				
			||||||
 | 
					Braunschweig gives visitors of the Museum the opportunity to interact
 | 
				
			||||||
 | 
					with about 70 artworks and 3 virtual cards containing information about
 | 
				
			||||||
 | 
					the museum and its layout. The table was installed at the institute in
 | 
				
			||||||
 | 
					October 2016 and since November 2016 log files from interactions of
 | 
				
			||||||
 | 
					visitors of the museum have been collected. These log files are in an
 | 
				
			||||||
 | 
					unstructured format and cannot be easily analyzed. The purpose of the
 | 
				
			||||||
 | 
					following document is to describe how the data have been transformed
 | 
				
			||||||
 | 
					and which decisions have been made along the way.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					# Data structure
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The log files contain lines that indicate the beginning and end of
 | 
				
			||||||
 | 
					possible activities that can be performed when interacting with the
 | 
				
			||||||
 | 
					artworks on the table. The layout of the table looks like pictures have
 | 
				
			||||||
 | 
					been tossed on a large table. Every artwork is visible at the start
 | 
				
			||||||
 | 
					configuration. People can move the pictures on the table, they can be
 | 
				
			||||||
 | 
					scaled and rotated. Additionally, the virtual picture cards can be
 | 
				
			||||||
 | 
					flipped in order to find more information about the artwork on the “back”
 | 
				
			||||||
 | 
					of the card. One has to press a little `i` for more information in one
 | 
				
			||||||
 | 
					of the bottom corners of the card. On the back of the card two to six
 | 
				
			||||||
 | 
					information cards can be found with a teaser text about a certain topic.
 | 
				
			||||||
 | 
					These topic cards can be opened and a hypertext with detailed
 | 
				
			||||||
 | 
					information opens. Within these hypertexts certain technical terms can
 | 
				
			||||||
 | 
					be clicked for lay people to get more information. This also opens up a
 | 
				
			||||||
 | 
					pop-up. The events encoded in the raw log files therefore have the
 | 
				
			||||||
 | 
					following structure.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    "Start Application"     --> Start Application
 | 
				
			||||||
 | 
					    "Show Application"
 | 
				
			||||||
 | 
					    "Transform start"       --> Move
 | 
				
			||||||
 | 
					    "Transform stop"
 | 
				
			||||||
 | 
					    "Show Info"             --> Flip Card
 | 
				
			||||||
 | 
					    "Show Front"
 | 
				
			||||||
 | 
					    "Artwork/OpenCard"      --> Open Topic
 | 
				
			||||||
 | 
					    "Artwork/CloseCard"
 | 
				
			||||||
 | 
					    "ShowPopup"             --> Open Popup
 | 
				
			||||||
 | 
					    "HidePopup"
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The right side shows what events can be extracted from these raw lines.
 | 
				
			||||||
 | 
					The “Start Application” is not an event in the original sense since it
 | 
				
			||||||
 | 
					only indicates if the table was started or maybe reset itself. This is
 | 
				
			||||||
 | 
					not an interaction with the table and therefore not interesting in
 | 
				
			||||||
 | 
					itself. All “Start Application” and “Show Application” are therefore
 | 
				
			||||||
 | 
					excluded from the data when further processed and are only in the raw
 | 
				
			||||||
 | 
					log files.
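
The mapping above can be written down as a small lookup table. This is a minimal sketch and not the package's own code: the object names `event_map`, `raw_tags`, and `recoded` are made up for illustration, and the toy tags stand in for the `event` column of a parsed log file.

```r
# Raw log tags and the event type they start or stop.
event_map <- data.frame(
  raw   = c("Transform start", "Transform stop", "Show Info", "Show Front",
            "Artwork/OpenCard", "Artwork/CloseCard", "ShowPopup", "HidePopup"),
  event = rep(c("move", "flipCard", "openTopic", "openPopup"), each = 2),
  type  = rep(c("start", "stop"), times = 4),
  stringsAsFactors = FALSE
)

# Toy tags as they might appear in a raw log file (made up).
raw_tags <- c("Start Application", "Show Application",
              "Transform start", "Transform stop", "Show Info")

# "Start Application" and "Show Application" are not in the table and drop out.
keep    <- raw_tags %in% event_map$raw
recoded <- event_map$event[match(raw_tags[keep], event_map$raw)]
```

Keeping the mapping in a data frame makes it easy to look up both the event type and whether a tag opens or closes that event.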
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					# Parsing the raw log files
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The first step is to parse the raw log files that are stored by the
 | 
				
			||||||
 | 
					application as text files in a rather unstructured format to a format
 | 
				
			||||||
 | 
					that can be read by common statistics software packages. The data are
 | 
				
			||||||
 | 
					therefore transferred to a spreadsheet format. The following section
 | 
				
			||||||
 | 
					describes what problems were encountered while doing this.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Corrupt lines
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					When reading the files containing the raw logs into R, a warning appears
 | 
				
			||||||
 | 
					that says
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    Warning messages:
 | 
				
			||||||
 | 
					      incomplete final line found on '2016/2016_11_18-11_31_0.log'
 | 
				
			||||||
 | 
					      incomplete final line found on '2016/2016_11_18-11_38_30.log'
 | 
				
			||||||
 | 
					      incomplete final line found on '2016/2016_11_18-11_40_36.log'
 | 
				
			||||||
 | 
					      ...
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					When you open these files, it looks like the last line contains some
 | 
				
			||||||
 | 
					binary content. It is unclear why and how this happens. So when reading
 | 
				
			||||||
 | 
					the data, these lines were removed. A warning will be given that
 | 
				
			||||||
 | 
					indicates how many files have been affected.
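
Dropping such a line could look roughly like this. It is a sketch under assumptions, not the implementation used here: the helper `read_log_lines()` is hypothetical, and it assumes that a corrupt last line can be recognized by the missing newline at the end of the file.

```r
# Hypothetical helper: read a raw log file and drop an incomplete final line.
read_log_lines <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  lines <- readLines(path, warn = FALSE, skipNul = TRUE)
  # A complete file ends with a newline byte (0x0a).
  incomplete <- length(bytes) > 0 && bytes[length(bytes)] != as.raw(0x0a)
  if (incomplete && length(lines) > 0) {
    lines <- lines[-length(lines)]
  }
  list(lines = lines, dropped = incomplete)
}

# Toy file: two valid lines followed by a few binary bytes without a newline.
tmp <- tempfile(fileext = ".log")
writeBin(c(charToRaw("line one\nline two\n"), as.raw(c(0xff, 0xfe, 0x01))), tmp)
res <- read_log_lines(tmp)
```

Summing the `dropped` flag over all files gives the number of affected files that can be reported in a single warning.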
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Extracted variables from raw log files
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The following variables (columns in the data frame) are extracted from
 | 
				
			||||||
 | 
					the raw log file:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `fileId`: Containing the zero-left-padded file name of the raw log
 | 
				
			||||||
 | 
					  file the data line has been extracted from
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `folder`: The folder names in which the raw log files have been
					  organized. For the HAUM data set, the data are sorted by year
 | 
				
			||||||
 | 
					  (folders 2016, 2017, 2018, 2019, 2020, 2021, 2022, and 2023).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `date`: Extracted timestamp from the raw log file in the format
 | 
				
			||||||
 | 
					  `yyyy-mm-dd hh:mm:ss`.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `timeMs`: Containing a timestamp in Milliseconds that restarts with
 | 
				
			||||||
 | 
					  every new raw log file.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `event`: Start and stop event tags. See above for possible values.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `item`: Identifier of the different items. This is a three-digit
 | 
				
			||||||
 | 
					  (left-padded) number. The numbers of the items correspond to the
 | 
				
			||||||
 | 
					  folder names in `/ContentEyevisit/eyevisit_cards_light/` and were
 | 
				
			||||||
 | 
					  originally taken from the museum's catalogue.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `popup`: Name of the pop-up opened. This is only interesting for
 | 
				
			||||||
 | 
					  “openPopup” events.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `topic`: The number of the topic card that has been opened at the back
 | 
				
			||||||
 | 
					  of the item card. See below for a more detailed description of what
 | 
				
			||||||
 | 
					  these numbers mean.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `x`: Value of x-coordinate in pixel on the 4K-Display
 | 
				
			||||||
 | 
					  ($3840 \times 2160$)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `y`: Value of y-coordinate in pixel
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `scale`: Number in 128 bit that indicates how much the card has been
 | 
				
			||||||
 | 
					  scaled
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `rotation`: Degree of rotation in start configuration.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					<!-- TODO: After what time interval does the table reset itself to the
					  start configuration? -> PM needs to look it up -->
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Variables after “closing of events”
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The raw log data consist of start and stop events for each event type.
 | 
				
			||||||
 | 
					After preprocessing four event types are extracted: `move`, `flipCard`,
 | 
				
			||||||
 | 
					`openTopic`, and `openPopup`. Except for the `move` events, which can
 | 
				
			||||||
 | 
					occur at any time when interacting with an item card on the table, the
 | 
				
			||||||
 | 
					events have a hierarchical order: An item card first needs to be flipped
 | 
				
			||||||
 | 
					(`flipCard`), then the topic cards on the back of the card can be opened
 | 
				
			||||||
 | 
					(`openTopic`), and finally pop-ups on these topic cards can be opened
 | 
				
			||||||
 | 
					(`openPopup`). This implies that the event `openPopup` can only be
 | 
				
			||||||
 | 
					present for a certain item, if the card has already been flipped (i.e.,
 | 
				
			||||||
 | 
					an event `flipCard` for the same item has already occurred).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					After preprocessing, the data frame is now in a wide format with columns
 | 
				
			||||||
 | 
					for the start and the stop of each event and contains the following
 | 
				
			||||||
 | 
					variables:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `fileId.start` / `fileId.stop`: See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `date.start` / `date.stop`: See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `folder`: Containing the folder name (see above)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `case`: A numerical variable indicating cases in the data. A “case”
 | 
				
			||||||
 | 
					  indicates an interaction interval and could be defined in different
 | 
				
			||||||
 | 
					  ways. Right now a new case begins, when no event occurred for 20
 | 
				
			||||||
 | 
					  seconds or longer.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `path`: A path is defined as one interaction with one item. A path can
 | 
				
			||||||
 | 
					  either start with a `flipCard` event or when an item has been touched
 | 
				
			||||||
 | 
					  for the first time within this case. A path ends with the item card
 | 
				
			||||||
 | 
					  being flipped closed again or with the last movement of the card within
 | 
				
			||||||
 | 
					  this case. One case can contain several paths with the same item when
 | 
				
			||||||
 | 
					  the item is flipped open and flipped closed again several times within
 | 
				
			||||||
 | 
					  a short time.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `glossar`: An indicator variable with values 0/1 that tracks if a
 | 
				
			||||||
 | 
					  pop-up has been opened from the glossar folder. These pop-ups can be
 | 
				
			||||||
 | 
					  assigned to the wrong item since it is not possible to do this
 | 
				
			||||||
 | 
					  algorithmically. It is possible that two items are flipped open that
 | 
				
			||||||
 | 
					  could both link to the same pop-up from a glossar. The indicator
 | 
				
			||||||
 | 
					  variable is left as a variable, so that these pop-ups can be easily
 | 
				
			||||||
 | 
					  deleted from the data. Right now, glossar entries can be ignored
 | 
				
			||||||
 | 
					  completely by setting an argument and this is done by default. Using
 | 
				
			||||||
 | 
					  the pop-ups from the glossar will need a lot more love, before it
 | 
				
			||||||
 | 
					  behaves satisfactorily.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `event`: Indicating the event. Can take the values `move`, `flipCard`,
 | 
				
			||||||
 | 
					  `openTopic`, and `openPopup`.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `item`: Identifier of the different artworks and information cards.
 | 
				
			||||||
 | 
					  This is a three-digit (left-padded) number. See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `timeMs.start` / `timeMs.stop`: See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `duration`: Calculated by $timeMs.stop - timeMs.start$ in
 | 
				
			||||||
 | 
					  Milliseconds. For events spanning more than one log file, an offset of
					  $600,000 \times (\text{number of log files spanned} - 1)$ must be added. See
 | 
				
			||||||
 | 
					  below for details.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `topic`: See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `popup`: See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `x.start` / `x.stop`: See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `y.start` / `y.stop`: See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `distance`: Euclidean distance calculated from $(x.start, y.start)$
 | 
				
			||||||
 | 
					  and $(x.stop, y.stop)$.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `scale.start` / `scale.stop`: See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `scaleSize`: Relative scaling of item card, calculated by
 | 
				
			||||||
 | 
					  $\frac{scale.stop}{scale.start}$.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `rotation.start` / `rotation.stop`: See above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `rotationDegree`: Difference of rotation from $rotation.stop$ to
 | 
				
			||||||
 | 
					  $rotation.start$.
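
The derived variables in this list can be computed directly from the start and stop columns. The following is a minimal sketch on made-up toy values. It assumes that `rotationDegree` is taken as stop minus start, which the description above does not state explicitly, and it leaves out the `duration` adjustment for events spanning several log files.

```r
# Toy data in the wide format described above (values are made up).
dat <- data.frame(
  x.start = c(0, 100),        x.stop = c(3, 100),
  y.start = c(0, 200),        y.stop = c(4, 200),
  scale.start = c(0.3, 0.5),  scale.stop = c(0.6, 0.5),
  rotation.start = c(10, 45), rotation.stop = c(25, 45)
)

# Euclidean distance between start and stop position.
dat$distance <- sqrt((dat$x.stop - dat$x.start)^2 +
                     (dat$y.stop - dat$y.start)^2)
# Relative scaling of the item card.
dat$scaleSize <- dat$scale.stop / dat$scale.start
# Assumption: difference taken as stop minus start.
dat$rotationDegree <- dat$rotation.stop - dat$rotation.start
```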
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## How unclosed events are handled
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Events do not necessarily need to be completed. A person can, e.g.,
 | 
				
			||||||
 | 
					leave the table and not flip the item card closed again. For `flipCard`,
 | 
				
			||||||
 | 
					`openTopic`, and `openPopup` the data frame contains `NA` when the event
 | 
				
			||||||
 | 
					does not complete. For `move` events it happens quite often that a start
 | 
				
			||||||
 | 
					event follows a start event and a stop event follows a stop event.
 | 
				
			||||||
 | 
					Technically a move event cannot *not* be finished and the number of
 | 
				
			||||||
 | 
					events without a start or stop indicates that the time resolution was not
 | 
				
			||||||
 | 
					sufficient to catch all these events accurately. Double start and stop
 | 
				
			||||||
 | 
					`move` events have therefore been deleted from the data set.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Additional meta data
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					For the HAUM data, I added meta data on state holidays and school
 | 
				
			||||||
 | 
					vacations.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					This led to the following additional variables:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `holiday`
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					- `vacations`
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					# Problems and how I handled them
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					This lists some problems with the log data that required decisions.
 | 
				
			||||||
 | 
					These decisions influence the outcome and maybe even the data quality.
 | 
				
			||||||
 | 
					Hence, I tried to document how I handled these problems and explain the
 | 
				
			||||||
 | 
					decisions I made.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Weird behavior of `timeMs` and neg. `duration` values
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					`timeMs` resets itself every time a new log file starts. This means that
 | 
				
			||||||
 | 
					the durations of events spanning more than one log file must be
 | 
				
			||||||
 | 
					adjusted. Instead of just calculating $timeMs.stop - timeMs.start$,
 | 
				
			||||||
 | 
					`timeMs.start` must be subtracted from the maximum duration of the log
 | 
				
			||||||
 | 
					file where the event started ($600,000$ ms) and the `timeMs.stop` must
 | 
				
			||||||
 | 
					be added. If the event spans more than two log files, a multiple of
 | 
				
			||||||
 | 
					$600,000$ must be taken, e.g. for three log files it must be:
 | 
				
			||||||
 | 
					$2 \times 600,000 - timeMs.start + timeMs.stop$ and so on.
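
The adjustment can be sketched as a small function. The name `adjust_duration()` and its arguments are hypothetical. The sketch assumes that every log file covers exactly $600,000$ ms and that the number of log files an event spans is already known.

```r
# Hypothetical helper: duration in ms for an event that spans `files_spanned`
# log files (1 = start and stop lie in the same file).
adjust_duration <- function(timeMs.start, timeMs.stop, files_spanned = 1,
                            file_length = 600000) {
  (files_spanned - 1) * file_length - timeMs.start + timeMs.stop
}

# Same log file: the plain difference.
d_same <- adjust_duration(621, 677)
# Start near the end of one file, stop early in the next one (made-up pairing).
d_two <- adjust_duration(599671, 621, files_spanned = 2)
# Three log files: 2 * 600000 - timeMs.start + timeMs.stop.
d_three <- adjust_duration(599671, 621, files_spanned = 3)
```

With `files_spanned = 1` the function reduces to $timeMs.stop - timeMs.start$, so it can be applied to all events alike.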
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					(Figure: boxplot of `timeMs` by log file.)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The boxplot shows that we have a continuous range of values within one
 | 
				
			||||||
 | 
					log file but that `timeMs` does not increase over log files. I kept
 | 
				
			||||||
 | 
					`timeMs.start` and `timeMs.stop` and also `fileId.start` and
 | 
				
			||||||
 | 
					`fileId.stop` in the data frame, so it is clear when events span more
 | 
				
			||||||
 | 
					than one log file.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					<!--
					Info from the programmer:
					
					"I also just went through the code from back then. Logging works like
					this: once the application starts, a new log file is created every 10
					minutes. The start time from which the duration is calculated is reset
					each time. So duration is not "time since the start of the application"
					but "time since the restart of the logger". Your assumption is therefore
					correct - there should be no durations >10 minutes. The first entry of a
					log file can be anything between 0 and 10 minutes (depending on whether
					the table was in use at the time of the new logging interval). So if a
					case is spread over 2+ logs, you have to add 10 minutes to the duration
					for each log file after the first one to make it fit."
					-->
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Left padding of file IDs
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The file names of the raw log files are automatically generated and
 | 
				
			||||||
 | 
					contain a timestamp. This timestamp is not well formed. First, it
 | 
				
			||||||
 | 
					contains an incorrect month. The months run from 0 to 11, which means
 | 
				
			||||||
 | 
					that the file name `2016_11_15-12_12_57.log` was collected on December
 | 
				
			||||||
 | 
					15, 2016 at 12:12 pm. Another problem is that the file names are not
 | 
				
			||||||
 | 
					zero left padded, e.g., `2016_11_15-12_2_57.log`. This file was
 | 
				
			||||||
 | 
					collected on December 15, 2016 at 12:02 pm and therefore before the file
 | 
				
			||||||
 | 
					above. But most sorting algorithms will sort these files in the order
 | 
				
			||||||
 | 
					shown below. In order to preprocess the data and close events that
 | 
				
			||||||
 | 
					belong together, the data need to be sorted by events and artworks
 | 
				
			||||||
 | 
					repeatedly. In order to get them back in the correct time order, it is
 | 
				
			||||||
 | 
					necessary to order them based on three variables: `fileId.start`,
 | 
				
			||||||
 | 
					`date.start` and `timeMs.start`. The file IDs therefore need to sort in
 | 
				
			||||||
 | 
					the correct order (again see below for example). I zero left padded the
 | 
				
			||||||
 | 
					log file names within the data frame and use them as an identifier. These
 | 
				
			||||||
 | 
					“file names” do not correspond exactly to the original raw log file
 | 
				
			||||||
 | 
					names. This needs to be kept in mind when doing any kind of matching
 | 
				
			||||||
 | 
					etc.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    ## what it looked like before left padding
 | 
				
			||||||
 | 
					    # 1422  ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:56  599671 Transform start     076 076.xml   NA 2092.25 2008.00 0.3000000   13.26874254
 | 
				
			||||||
 | 
					    # 1423 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57     621 Transform start     076 076.xml   NA 2092.25 2008.00 0.3000000   13.26523465
 | 
				
			||||||
 | 
					    # 1424 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57     677  Transform stop     076 076.xml   NA 2092.25 2008.00 0.2997736   13.26239605
 | 
				
			||||||
 | 
					    # 1425 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57     774 Transform start     076 076.xml   NA 2092.25 2008.00 0.2999345   13.26239605
 | 
				
			||||||
 | 
					    # 1426 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57     850  Transform stop     076 076.xml   NA 2092.25 2008.00 0.2997107   13.26223362
 | 
				
			||||||
 | 
					    # 1427  ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:57  599916  Transform stop     076 076.xml   NA 2092.25 2008.00 0.2997771   13.26523465
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    ## what it looks like now
 | 
				
			||||||
 | 
					    # 1422 2016_11_15-12_02_57.log 2016-12-15 12:12:56  599671 Transform start     076 076.xml   NA 2092.25 2008.00 0.3000000   13.26874254
 | 
				
			||||||
 | 
					    # 1423 2016_11_15-12_02_57.log 2016-12-15 12:12:57  599916  Transform stop     076 076.xml   NA 2092.25 2008.00 0.2997771   13.26523465
 | 
				
			||||||
 | 
					    # 1424 2016_11_15-12_12_57.log 2016-12-15 12:12:57     621 Transform start     076 076.xml   NA 2092.25 2008.00 0.3000000   13.26523465
 | 
				
			||||||
 | 
					    # 1425 2016_11_15-12_12_57.log 2016-12-15 12:12:57     677  Transform stop     076 076.xml   NA 2092.25 2008.00 0.2997736   13.26239605
 | 
				
			||||||
 | 
					    # 1426 2016_11_15-12_12_57.log 2016-12-15 12:12:57     774 Transform start     076 076.xml   NA 2092.25 2008.00 0.2999345   13.26239605
 | 
				
			||||||
 | 
					    # 1427 2016_11_15-12_12_57.log 2016-12-15 12:12:57     850  Transform stop     076 076.xml   NA 2092.25 2008.00 0.2997107   13.26223362
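
Zero left padding of this kind can be sketched as follows. The function `pad_file_id()` is hypothetical and only assumes the file name pattern `yyyy_m_d-h_m_s.log` seen above. Like the listing above, it strips the folder path and leaves the zero-based month untouched.

```r
# Hypothetical helper: zero left pad all parts of a raw log file name.
pad_file_id <- function(x) {
  stem  <- sub("\\.log$", "", basename(x))
  parts <- strsplit(stem, "[_-]")
  vapply(parts, function(p) {
    p <- as.integer(p)
    sprintf("%04d_%02d_%02d-%02d_%02d_%02d.log",
            p[1], p[2], p[3], p[4], p[5], p[6])
  }, character(1))
}

raw_names <- c("../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log",
               "../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log")
file_ids <- pad_file_id(raw_names)
sorted_ids <- sort(file_ids)
```

After padding, a plain character sort puts the file from 12:02 before the file from 12:12.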
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Timestamps repeat
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The timestamps in the `date` variable record year, month, day, hour,
 | 
				
			||||||
 | 
					minute and seconds. Since one second is not a very short time interval
 | 
				
			||||||
 | 
					for a move on a touch display, this is not fine grained enough to bring
 | 
				
			||||||
 | 
					events into the correct order, meaning there are events from the same
 | 
				
			||||||
 | 
					log file having the same timestamp and even events from different log
 | 
				
			||||||
 | 
					files having the same timestamp. The log files get written about every
 | 
				
			||||||
 | 
					10 minutes (which can easily be seen when looking at the file names of
 | 
				
			||||||
 | 
					the raw log files). So in order to get events in the correct order, it
 | 
				
			||||||
 | 
					is necessary to first order by file ID, within file ID then sort by
 | 
				
			||||||
 | 
					timestamp `date` and then within these more coarse grained timestamps
 | 
				
			||||||
 | 
					sort by `timeMs`. But as explained above, `timeMs` can only be sorted
 | 
				
			||||||
 | 
					within one file ID, since they do not increase consistently over log
 | 
				
			||||||
 | 
					files, but restart with a new offset for each raw log file.
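
In base R this three-level sort is a single call to `order()`. The sketch below uses a made-up toy data frame with the raw-data column names `fileId`, `date`, and `timeMs` and assumes that the file IDs are already zero left padded.

```r
# Toy data, deliberately out of order (values follow the pattern of the raw logs).
dat <- data.frame(
  fileId = c("2016_11_15-12_12_57.log", "2016_11_15-12_02_57.log",
             "2016_11_15-12_12_57.log", "2016_11_15-12_02_57.log"),
  date   = c("2016-12-15 12:12:57", "2016-12-15 12:12:57",
             "2016-12-15 12:12:57", "2016-12-15 12:12:56"),
  timeMs = c(677, 599916, 621, 599671),
  stringsAsFactors = FALSE
)

# Sort by file ID first, then by the coarse timestamp, then by timeMs.
dat <- dat[order(dat$fileId, dat$date, dat$timeMs), ]
```

Sorting by `timeMs` alone would put the two events from the second log file first, because `timeMs` restarts with every file.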
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## x,y-coordinates outside of display range
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The display of the Multi-Touch-Table is a 4K-display with 3840 x 2160
 | 
				
			||||||
 | 
					pixels. When you plot the start and stop coordinates, the display is
 | 
				
			||||||
 | 
					clearly distinguishable. However, a lot of points are outside of the
 | 
				
			||||||
 | 
					display range. This can happen, when the art objects are scaled and then
 | 
				
			||||||
 | 
					moved to the very edge of the table. Then it will record pixels outside
 | 
				
			||||||
 | 
					of the table. These are actually valid data points and I will leave them
 | 
				
			||||||
 | 
					as is.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					``` r
 | 
				
			||||||
 | 
					datlogs <- read.table("code/results/event_logfiles_2024-02-21_16-07-33.csv", sep = ";",
 | 
				
			||||||
 | 
					                      header = TRUE)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					par(mfrow = c(1, 2))
 | 
				
			||||||
 | 
					plot(y.start ~ x.start, datlogs)
 | 
				
			||||||
 | 
					abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
 | 
				
			||||||
 | 
					plot(y.stop ~ x.stop, datlogs)
 | 
				
			||||||
 | 
					abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
 | 
				
			||||||
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					(Figure: scatter plots of start and stop coordinates with the display boundaries in blue.)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					``` r
 | 
				
			||||||
 | 
					aggregate(cbind(x.start, x.stop, y.start, y.stop) ~ 1, datlogs, mean)
 | 
				
			||||||
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    ##    x.start   x.stop  y.start   y.stop
 | 
				
			||||||
 | 
					    ## 1 1978.202 1975.876 1137.481 1133.494
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Pop-ups from glossar cannot be assigned to a specific item
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					All the information, pictures and texts for the topics and pop-ups are
 | 
				
			||||||
 | 
					stored in
 | 
				
			||||||
 | 
					`/data/haum/ContentEyevisit/eyevisit_cards_light/<item_number>`. Among
 | 
				
			||||||
 | 
					other things, each folder contains XML-files with the information about
 | 
				
			||||||
 | 
					any technical terms that can be opened from the hypertexts on the topic
 | 
				
			||||||
 | 
					cards. Often this information is item dependent and then the
 | 
				
			||||||
 | 
					corresponding XML-file is in the folder for this item. Sometimes,
 | 
				
			||||||
 | 
					however, more general terms can be opened. In order to avoid multiple
 | 
				
			||||||
 | 
					files containing the same information, these were stored in a folder
 | 
				
			||||||
 | 
					called `glossar` and get accessed from there. The raw log files only
 | 
				
			||||||
 | 
					contain the path to this glossar entry and did not record from which
 | 
				
			||||||
 | 
					item it was accessed. I tried to assign these glossar entries to the
 | 
				
			||||||
 | 
					correct items. The (very heuristic) approach was this:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					1.  Create a lookup table with all XML-file names (possible pop-ups)
 | 
				
			||||||
 | 
					    from the glossar folder and what items possibly call them. This was
 | 
				
			||||||
 | 
					    stored as an `RData` object for easier handling but should maybe be
 | 
				
			||||||
 | 
					    stored in a more interoperable format.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2.  I went through all possible pop-ups in this lookup table and stored
 | 
				
			||||||
 | 
					    the items that are associated with it.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					3.  I created a sub data frame without move events (since they can never
 | 
				
			||||||
 | 
					    be associated with a pop-up) and went through every line and looked
 | 
				
			||||||
 | 
					    up if an item and a topic card had been opened. If this was the case
 | 
				
			||||||
 | 
					    and a glossar entry came up before the item was closed again, I
 | 
				
			||||||
 | 
					    assigned this item to the glossar entry.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					This is heuristic since it is possible that several topic cards from
 | 
				
			||||||
 | 
					different items are opened simultaneously and the glossar pop-up could
 | 
				
			||||||
 | 
					be opened from either one (it could even be more than two, of course).
 | 
				
			||||||
 | 
					In these cases the item that was opened closest to the glossar pop-up
 | 
				
			||||||
 | 
					has been assigned, but this can never be completely error free.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					And this heuristic only assigns a little more than half of the glossar
 | 
				
			||||||
 | 
					entries. Since my heuristic only looks for the last item that has been
 | 
				
			||||||
 | 
					opened and if this item is a possible candidate it misses all glossar
 | 
				
			||||||
 | 
					pop-ups where another item has been opened in between. This is still an
 | 
				
			||||||
 | 
					open TODO to write a more elaborate algorithm.
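
A strongly simplified version of this heuristic is sketched below to make the limitation concrete. It is not the code used for the data set: the function `assign_glossar()`, the toy events, and the lookup list are made up, and the sketch only tracks the most recently opened topic card.

```r
# Hypothetical sketch: assign glossar pop-ups to the last opened item, but
# only if that item is a possible candidate according to the lookup table.
assign_glossar <- function(events, lookup) {
  last_item <- NA_character_
  assigned  <- rep(NA_character_, nrow(events))
  for (i in seq_len(nrow(events))) {
    ev <- events$event[i]
    if (ev == "Artwork/OpenCard") {
      last_item <- events$item[i]
    } else if (ev == "Show Front" && identical(events$item[i], last_item)) {
      last_item <- NA_character_          # item card was closed again
    } else if (ev == "ShowPopup" && !is.na(last_item) &&
               last_item %in% lookup[[events$popup[i]]]) {
      assigned[i] <- last_item
    }
  }
  assigned
}

# Toy lookup: the glossar entry "term.xml" can be called from items 001 and 002.
lookup <- list("term.xml" = c("001", "002"))
events <- data.frame(
  event = c("Artwork/OpenCard", "ShowPopup", "Artwork/OpenCard", "ShowPopup"),
  item  = c("001", NA, "076", NA),
  popup = c(NA, "term.xml", NA, "term.xml"),
  stringsAsFactors = FALSE
)
assigned <- assign_glossar(events, lookup)
```

The second pop-up stays unassigned because item 076 was opened in between and is not a candidate for this glossar entry, which is exactly the kind of case the heuristic misses.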
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					All glossar pop-ups that do not get matched with an item are removed
 | 
				
			||||||
 | 
					from the data set with a warning if the argument `glossar = TRUE` is
 | 
				
			||||||
 | 
					set. Otherwise the glossar entries will be ignored completely.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Assign a `case` variable based on “time heuristic”
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					One thing needed in order to work with the data set and use it for
 | 
				
			||||||
 | 
					machine learning algorithms like process mining, is a variable that
 | 
				
			||||||
 | 
					tries to identify a case. A case variable will structure the data frame
 | 
				
			||||||
 | 
					in a way that navigation behavior can actually be investigated. However,
 | 
				
			||||||
 | 
					we do not know if several people are standing around the table
 | 
				
			||||||
 | 
					interacting with it or just one very active person. The simplest way to
 | 
				
			||||||
 | 
					define a case variable is to just use a time limit between events. This
 | 
				
			||||||
 | 
					means that when the table has not been interacted with for, e.g., 20
 | 
				
			||||||
 | 
					seconds, then it is assumed that a person moved on and a new person
 | 
				
			||||||
 | 
					started interacting with the table. This is the easiest heuristic and
 | 
				
			||||||
 | 
					implemented at the moment. Process mining shows that this simple
 | 
				
			||||||
 | 
					approach works in a way that the correct process gets extracted by the
 | 
				
			||||||
 | 
					algorithm.
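
The time heuristic can be sketched in a few lines. The function name `assign_case()` is hypothetical, and the sketch assumes that the gap is measured between the start times of consecutive events in an already sorted data frame. Measuring from the stop time of the previous event would be an equally plausible choice.

```r
# Hypothetical helper: start a new case whenever the gap to the previous
# event is `cutoff` seconds or longer.
assign_case <- function(start, cutoff = 20) {
  gap <- c(Inf, diff(as.numeric(start)))   # first event always opens a case
  cumsum(gap >= cutoff)
}

# Toy start times (made up): gaps of 5, 25, 1, and 29 seconds.
start <- as.POSIXct("2016-12-15 12:00:00", tz = "UTC") + c(0, 5, 30, 31, 60)
case  <- assign_case(start)
```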
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					In order to investigate user behavior on a more fine grained level, it
 | 
				
			||||||
 | 
					will be necessary to come up with a more elaborate approach. A better,
 | 
				
			||||||
 | 
					still simple approach could be to use this kind of time limit and
 | 
				
			||||||
 | 
					additionally look at the distance between items interacted with within
 | 
				
			||||||
 | 
					one time window. When items are far apart it seems plausible that more
 | 
				
			||||||
 | 
					than one person interacted with them. Very short time lapses between
 | 
				
			||||||
 | 
					events on different items could also be an indicator that more than one
 | 
				
			||||||
 | 
					person is interacting with the table.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Assign a `path` variable
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The `path` variable is supposed to show one interaction trace with one
 | 
				
			||||||
 | 
					artwork. Meaning it starts when an artwork is touched or flipped and
 | 
				
			||||||
 | 
					stops when it is closed again. It is easy to assign a path from flipping
 | 
				
			||||||
 | 
					a card over opening (maybe several) topics and pop-ups for this artwork
 | 
				
			||||||
 | 
					card until closing this card again. But one would like to assign the
 | 
				
			||||||
 | 
					same path to move events surrounding this interaction. Again, this is
 | 
				
			||||||
 | 
					not possible in an algorithmic way but only heuristically.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Again, I used a time cutoff for this. First, if a `move` event occurs,
 | 
				
			||||||
 | 
					it is checked, if the same item has been flipped less than 20 seconds
 | 
				
			||||||
 | 
					beforehand. If yes, the same path indicator is assigned to this `move`.
 | 
				
			||||||
 | 
					If not, temporarily a new “move indicator” is assigned. Then, a
 | 
				
			||||||
 | 
					“backward pass” is applied, where it is checked if the same item is
 | 
				
			||||||
 | 
					opened less than 20 seconds *after* the event occurs. If yes, that path
 | 
				
			||||||
 | 
					indicator is assigned. For all the remaining moves, a new path number is
 | 
				
			||||||
 | 
					assigned. This corresponds to items being moved without being flipped.
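
The forward and backward pass can be sketched as below. This is an illustration under assumptions, not the actual implementation: `assign_move_paths()` is a hypothetical function, time is given in seconds in a made-up `start` column, `flipCard` events already carry a path number, and the 20 seconds are measured from the start of the flip.

```r
# Hypothetical sketch: give move events the path of a flip of the same item
# that starts less than `cutoff` seconds before or after the move.
assign_move_paths <- function(dat, cutoff = 20) {
  flips     <- dat[dat$event == "flipCard", ]
  next_path <- max(dat$path, na.rm = TRUE) + 1
  for (i in which(dat$event == "move")) {
    same   <- flips[flips$item == dat$item[i], ]
    lag    <- dat$start[i] - same$start      # >= 0: flip came before the move
    before <- same$path[lag >= 0 & lag < cutoff]
    after  <- same$path[lag < 0 & -lag < cutoff]
    if (length(before) > 0) {
      dat$path[i] <- before[length(before)]  # most recent flip before the move
    } else if (length(after) > 0) {
      dat$path[i] <- after[1]                # first flip after the move
    } else {
      dat$path[i] <- next_path               # item was moved without a flip
      next_path   <- next_path + 1
    }
  }
  dat
}

# Toy data sorted by time (values are made up).
dat <- data.frame(
  event = c("move", "flipCard", "move", "move", "move"),
  item  = c("001", "001", "001", "002", "001"),
  start = c(90, 100, 110, 200, 500),
  path  = c(NA, 1, NA, NA, NA),
  stringsAsFactors = FALSE
)
res <- assign_move_paths(dat)
```

The first three events end up in the same path, while the move of item 002 and the late move of item 001 each get a new path number.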
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## A `move` event does not record any change
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Most of the events in the log files are move events. Additionally, many
 | 
				
			||||||
 | 
					of these move events are recorded but they do not indicate any change,
 | 
				
			||||||
 | 
					meaning the only difference is the timestamp. All other variables
 | 
				
			||||||
 | 
					indicating moves like `x.start` and `x.stop`, `rotation.start` and
 | 
				
			||||||
 | 
					`rotation.stop` etc. do not show *any* change. They represent about 2/3
 | 
				
			||||||
 | 
					of all move events. These events are probably short touches of the table
 | 
				
			||||||
 | 
					without an actual interaction. They were therefore removed from the data
 | 
				
			||||||
 | 
					set.
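
Filtering these events could look like this. The sketch uses made-up toy values and assumes that the wide-format column names from above are present. Comparisons that yield `NA` are treated as "changed" so that no event is dropped by accident.

```r
# Toy data: one move without any change, one real move, one flip (made up).
dat <- data.frame(
  event = c("move", "move", "flipCard"),
  x.start = c(100, 100, NA),       x.stop = c(100, 150, NA),
  y.start = c(200, 200, NA),       y.stop = c(200, 200, NA),
  scale.start = c(0.3, 0.3, NA),   scale.stop = c(0.3, 0.3, NA),
  rotation.start = c(13, 13, NA),  rotation.stop = c(13, 13, NA),
  stringsAsFactors = FALSE
)

# A move is unchanged if position, scale, and rotation are all identical.
unchanged <- dat$event == "move" &
  dat$x.start == dat$x.stop & dat$y.start == dat$y.stop &
  dat$scale.start == dat$scale.stop &
  dat$rotation.start == dat$rotation.stop
unchanged[is.na(unchanged)] <- FALSE

dat_clean <- dat[!unchanged, ]
```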
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Card indices go from 0 to 7 (instead of 0 to 5 as expected)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					In the beginning I thought that the number for topics was the index of
 | 
				
			||||||
 | 
					where the card was presented on the back of the item. But this is not
 | 
				
			||||||
 | 
					correct. It is the number of the topic. There are eight topics in total:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    Indices for topics:
 | 
				
			||||||
 | 
					    0   artist
 | 
				
			||||||
 | 
					    1   thema
 | 
				
			||||||
 | 
					    2   komposition
 | 
				
			||||||
 | 
					    3   leben des kunstwerks
 | 
				
			||||||
 | 
					    4   details
 | 
				
			||||||
 | 
					    5   licht und farbe
 | 
				
			||||||
 | 
					    6   extra info
 | 
				
			||||||
 | 
					    7   technik
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					On the back of items, there can be between 2 and 6 topic cards. Several
 | 
				
			||||||
 | 
					of these topic cards can be about the same topic, e.g., there can be two
 | 
				
			||||||
 | 
					topic cards assigned to the topic `thema`. It is impossible to find out
 | 
				
			||||||
 | 
					if the same topic card was opened several times or if different topic
 | 
				
			||||||
 | 
					cards with the same topic were opened from the same item. See the example
 | 
				
			||||||
 | 
					below for item “001”.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    ##   item            file_name                topic
 | 
				
			||||||
 | 
					    ## 1  001 001_dargestellte.xml                thema
 | 
				
			||||||
 | 
					    ## 2  001       001_thema1.xml                thema
 | 
				
			||||||
 | 
					    ## 3  001        001_leben.xml leben des kunstwerks
 | 
				
			||||||
 | 
					    ## 4  001       001_leben3.xml leben des kunstwerks
 | 
				
			||||||
 | 
					    ## 5  001       001_thema2.xml                thema
 | 
				
			||||||
 | 
					    ## 6  001        001_thema.xml                thema
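The topic indices listed above can be kept as a simple lookup vector, and counting the cards per topic within an item shows where the logs are ambiguous. The helper `topic_name()` is hypothetical; the card list for item “001” is copied from the output above.

```r
# Topic indices as listed above (0-based in the log files)
topics <- c("artist", "thema", "komposition", "leben des kunstwerks",
            "details", "licht und farbe", "extra info", "technik")

# Hypothetical helper: translate a 0-based topic index into its topic name
topic_name <- function(index) topics[index + 1]

topic_name(c(1, 3, 7))

# Topic cards of item "001", copied from the output above
cards001 <- data.frame(
  file_name = c("001_dargestellte.xml", "001_thema1.xml", "001_leben.xml",
                "001_leben3.xml", "001_thema2.xml", "001_thema.xml"),
  topic     = c("thema", "thema", "leben des kunstwerks",
                "leben des kunstwerks", "thema", "thema")
)

# Cards per topic: counts > 1 mark topics that cannot be told apart in the logs
cards_per_topic <- table(cards001$topic)
cards_per_topic
```

For item “001”, four cards share the topic `thema` and two share `leben des kunstwerks`, so a logged topic index alone does not identify which card was opened.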
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## New artworks “504” and “505” starting October 2022
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					When I read in the complete data frame for the first time, all of a
 | 
				
			||||||
 | 
					sudden there were 72 instead of 70 items. It seems like these two
 | 
				
			||||||
 | 
					artworks first appear on October 21, 2022.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					``` r
 | 
				
			||||||
 | 
					summary(as.Date(datraw[datraw$item %in% c("504", "505"), "date"]))
 | 
				
			||||||
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    ##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
 | 
				
			||||||
 | 
					    ## "2022-10-21" "2023-01-11" "2023-03-08" "2023-03-09" "2023-05-21" "2023-07-05"
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The artworks seem to have been updated in general after October 21, 2022.
 | 
				
			||||||
 | 
					The following table shows which items were presented in which years.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					``` r
 | 
				
			||||||
 | 
					xtabs(~ item + lubridate::year(date.start), datlogs)
 | 
				
			||||||
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    ##      lubridate::year(date.start)
 | 
				
			||||||
 | 
					    ## item   2016  2017  2018  2019  2020  2022  2023
 | 
				
			||||||
 | 
					    ##   1     277  4082  1912  1434   424   394  1315
 | 
				
			||||||
 | 
					    ##   3     485  6730  3126  2356   528   457  1124
 | 
				
			||||||
 | 
					    ##   19    714  8656  4028  2743   660   698  1595
 | 
				
			||||||
 | 
					    ##   20    595  8461  3996  2983   938   657  1355
 | 
				
			||||||
 | 
					    ##   24    497  6638  2912  2251   649   439  1028
 | 
				
			||||||
 | 
					    ##   27    567  5959  3112  2318   651   711  1324
 | 
				
			||||||
 | 
					    ##   28    601  9329  4394  3056   778   762  1570
 | 
				
			||||||
 | 
					    ##   29    425  6865  3830  2365   516   615  1174
 | 
				
			||||||
 | 
					    ##   31    289  4118  2051  1218   291   296   675
 | 
				
			||||||
 | 
					    ##   32    562  7016  3477  2253   726   766  1647
 | 
				
			||||||
 | 
					    ##   33    509  4936  2242  1449   555   358   666
 | 
				
			||||||
 | 
					    ##   36    434  4505  2276  1668   373   387   976
 | 
				
			||||||
 | 
					    ##   37    242  4478  2182  1554   339   423  1168
 | 
				
			||||||
 | 
					    ##   38    480  4617  2144  1397   371   381   784
 | 
				
			||||||
 | 
					    ##   39    395  3227  1313  1003   237   161   622
 | 
				
			||||||
 | 
					    ##   41    282  3329  1303  1022   225   209   701
 | 
				
			||||||
 | 
					    ##   42    203  3113  1307   903   242   191   421
 | 
				
			||||||
 | 
					    ##   43    115  2420  1089   806   176   219   486
 | 
				
			||||||
 | 
					    ##   45   1491 13561  5924  4474   966   585  1828
 | 
				
			||||||
 | 
					    ##   46    903  9181  5340  3812   961   944  1648
 | 
				
			||||||
 | 
					    ##   47    306  4949  2395  1510   750   297   675
 | 
				
			||||||
 | 
					    ##   48    723 10455  5384  4162  1328   948  2031
 | 
				
			||||||
 | 
					    ##   49    433  4326  2124  1414   434   431   809
 | 
				
			||||||
 | 
					    ##   51    564  7837  4577  2991   884   659  1370
 | 
				
			||||||
 | 
					    ##   52    447  5021  2104  1729   471   349   840
 | 
				
			||||||
 | 
					    ##   54    424  5068  2816  2008   529   370   918
 | 
				
			||||||
 | 
					    ##   55    358  4859  2069  1428   341   403  1303
 | 
				
			||||||
 | 
					    ##   57    860 14264  6625  5092  1410  1221  2714
 | 
				
			||||||
 | 
					    ##   60    555  6865  3539  2336   639   586  1415
 | 
				
			||||||
 | 
					    ##   62    547  6736  3803  2210   795   633  1322
 | 
				
			||||||
 | 
					    ##   63    251  3677  1827  1241   300   282   527
 | 
				
			||||||
 | 
					    ##   66    552  6004  2774  1977   505   373   932
 | 
				
			||||||
 | 
					    ##   69    394  3730  1827  1438   272   206   680
 | 
				
			||||||
 | 
					    ##   70    226  3766  1843   973   293   268   703
 | 
				
			||||||
 | 
					    ##   71    557  6160  2490  1846   570   323   839
 | 
				
			||||||
 | 
					    ##   72    426  6194  2857  2129   508   635  1553
 | 
				
			||||||
 | 
					    ##   73    432  6125  2880  1821   583   395   939
 | 
				
			||||||
 | 
					    ##   75    258  5885  2418  1562   369   257   645
 | 
				
			||||||
 | 
					    ##   76    861 12435  6253  4214  1753  1153  2268
 | 
				
			||||||
 | 
					    ##   77    816  8595  4197  2897   699   674  1452
 | 
				
			||||||
 | 
					    ##   78    410  5632  2498  1924   394   408   850
 | 
				
			||||||
 | 
					    ##   80   1650 25687 12429  7782  1975  1712  4433
 | 
				
			||||||
 | 
					    ##   83    644  8618  4720  3026   987  1027  2294
 | 
				
			||||||
 | 
					    ##   84    184  2121  1231   759   231   254   465
 | 
				
			||||||
 | 
					    ##   87    149  1618   722   632    99     0     0
 | 
				
			||||||
 | 
					    ##   88    513  6996  3493  2272   539   533  1420
 | 
				
			||||||
 | 
					    ##   89    214  2204   950   723   156     0     0
 | 
				
			||||||
 | 
					    ##   90    281  3756  1372  1143   403   320   932
 | 
				
			||||||
 | 
					    ##   93    613  8528  4224  3015   696  1174  2058
 | 
				
			||||||
 | 
					    ##   98    462  6662  3265  2565   704   670  1453
 | 
				
			||||||
 | 
					    ##   99    180  4162  1653  1454   363   411   868
 | 
				
			||||||
 | 
					    ##   101   414  4209  1859  1282   392   411   981
 | 
				
			||||||
 | 
					    ##   103   677  8758  4366  3165  1045   909  1871
 | 
				
			||||||
 | 
					    ##   104   423  5256  2381  1865   463   467   933
 | 
				
			||||||
 | 
					    ##   107   181  2101  1106   788   205   146   339
 | 
				
			||||||
 | 
					    ##   109   321  4001  1619  1106   292   188   453
 | 
				
			||||||
 | 
					    ##   110   489  5846  2785  2008   494   387   923
 | 
				
			||||||
 | 
					    ##   125   640  8435  4519  3334   926     0     0
 | 
				
			||||||
 | 
					    ##   129   598 11322  5046  3369   910  1131  1682
 | 
				
			||||||
 | 
					    ##   145   419  7821  3945  2694   706   740  1396
 | 
				
			||||||
 | 
					    ##   176   507  8465  3968  2787   687   552  1544
 | 
				
			||||||
 | 
					    ##   180   516  7563  3720  2765   585   550  1272
 | 
				
			||||||
 | 
					    ##   183   377  4014  1819  1741   346   251   675
 | 
				
			||||||
 | 
					    ##   187   340  4222  2165  1753   319   312   734
 | 
				
			||||||
 | 
					    ##   197   426  7710  3603  2510   671   602  1217
 | 
				
			||||||
 | 
					    ##   229   303  4872  2360  1891   482   389  1005
 | 
				
			||||||
 | 
					    ##   231   271  3606  1851  1239   318   236   467
 | 
				
			||||||
 | 
					    ##   501  1915 15968  7849  5060  1157   890  2989
 | 
				
			||||||
 | 
					    ##   502  1212 14550  7111  4749  1105   883  2752
 | 
				
			||||||
 | 
					    ##   503  1308 15218  8632  6399  1626   870  2558
 | 
				
			||||||
 | 
					    ##   504     0     0     0     0     0   363   662
 | 
				
			||||||
 | 
					    ##   505     0     0     0     0     0   426  1533
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					It shows that the artworks have been updated after the Corona pandemic.
 | 
				
			||||||
 | 
					I think the table was also moved to a different location at that point.
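Which items were added or removed can also be read off such a cross table programmatically. The sketch below uses made-up toy data and a hypothetical `year` column; in the table above, the same logic would flag items 87, 89, and 125 (no events from 2022 onwards) and items 504 and 505 (no events before 2022).

```r
# Toy data: one row per logged event (values are made up for illustration)
toy_items <- data.frame(
  item = c("87", "87", "504", "80", "80", "80"),
  year = c(2016, 2019, 2022, 2016, 2022, 2023)
)

tab <- xtabs(~ item + year, toy_items)
years <- as.numeric(colnames(tab))

# Event counts per item before 2022 and from 2022 onwards
pre  <- rowSums(tab[, years <  2022, drop = FALSE])
post <- rowSums(tab[, years >= 2022, drop = FALSE])

added   <- names(pre)[pre == 0]    # items that only appear from 2022 on
removed <- names(post)[post == 0]  # items that no longer appear from 2022 on
added
removed
```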
 | 
				
			||||||
							
								
								
									
										
README_files/figure-gfm/timems-1.png: binary file not shown (size after: 6.2 KiB)
							
								
								
									
										
README_files/figure-gfm/xycoord-1.png: binary file not shown (size after: 12 KiB)