Open questions
Understanding the data
- What unit do x and y have? Pixels? --> yes
- What unit does scale have? --> some kind of bit; does not matter when calculating a ratio
- Is rotation really in degrees? --> yes
- After which time interval does the table reset to its initial configuration? --> PM needs to look it up
Table software
- Is there documentation for the images beyond the XML files, such as a manual?
- Is there perhaps still a tablet somewhere with the application installed?
- What do the colors of the topic cards mean? --> can be seen in the XML files
Event Logs
- How do we handle events that were not "closed"? Just delete them?
- For Transform I lean towards yes, because otherwise they are of no interest at all
- For flipCard I am not so sure... But then no duration can be calculated; it would be NA
 
- I would definitely delete moves/scales/rotations without any change
- It is not possible (or I do not know how) to identify events that belong together in a one-to-one way
- Use a heuristic? Simply remove duplicate Transform start and Transform stop events?
- The data are not "error-free"; e.g., there are Transform events whose end was not logged
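The deletion of moves/scales/rotations without any change could look roughly like this. It is only a sketch on made-up data. It assumes a wide format with one row per Transform start/stop pair, the x.start/x.stop and y.start/y.stop columns used further below, and hypothetical scale.start/scale.stop and rotation.start/rotation.stop columns. no_change() is a hypothetical helper.

```r
# Made-up wide-format transform events (one row per start/stop pair);
# the scale.* and rotation.* column names are assumptions
moves <- data.frame(
  trace          = 1:4,
  x.start        = c(100, 200, 300, 400),
  x.stop         = c(150, 200, 300, NA),
  y.start        = c(50, 60, 70, 80),
  y.stop         = c(50, 60, 90, NA),
  scale.start    = c(0.3, 0.3, 0.3, 0.3),
  scale.stop     = c(0.3, 0.3, 0.3, NA),
  rotation.start = c(10, 20, 30, 40),
  rotation.stop  = c(10, 20, 30, NA)
)

# TRUE if nothing changed between start and stop (small tolerance, because
# scale and rotation are logged as floating point numbers); NA if a stop is missing
no_change <- function(df, tol = 1e-8) {
  abs(df$x.stop - df$x.start) < tol &
    abs(df$y.stop - df$y.start) < tol &
    abs(df$scale.stop - df$scale.start) < tol &
    abs(df$rotation.stop - df$rotation.start) < tol
}

# `%in% TRUE` turns NA into FALSE, so unclosed events are kept for now
moves_clean <- moves[!(no_change(moves) %in% TRUE), ]
moves_clean$trace
```

Events whose stop was not logged (trace 4) are kept here. This leaves the decision about unclosed events as a separate step.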
 
- How do I identify an "interaction unit"?
- What is a "case"?
- Roughly via time intervals?
- Any other ideas?
 
- Find out whether more than one person is standing at the table?
- A sliding window in which the number of artworks is counted? Or how far apart the touched artworks are from each other?
- One can "see" this in the logs, but how can I extract it automatically? What is my definition of an "interaction boost"?
- No matter how we do it, is it based on the "event log data"?
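A sliding window could be sketched like this. It assumes a timestamp column date and an artwork column. The helper count_artworks(), the 30-second window and the data are made up. For each event, the sketch counts the distinct artworks touched within the preceding window. Higher counts might hint at several people at the table.

```r
# Made-up events: timestamp and touched artwork
events <- data.frame(
  date    = as.POSIXct("2016-12-15 11:00:00", tz = "UTC") + c(0, 5, 10, 12, 60, 65),
  artwork = c("051", "051", "057", "076", "076", "076"),
  stringsAsFactors = FALSE
)

# For each event, count the distinct artworks touched within the preceding
# `window` seconds (including the event itself)
count_artworks <- function(date, artwork, window = 30) {
  t <- as.numeric(date)
  vapply(seq_along(t), function(i) {
    in_window <- t > t[i] - window & t <= t[i]
    length(unique(artwork[in_window]))
  }, integer(1))
}

events$n_artworks <- count_artworks(events$date, events$artwork)
events$n_artworks
```

The sketch compares every event with every other event. That is fine for a small example but slow for the full data.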
 
- Enrich the log data with further metadata? What would be interesting?
- Metadata on artworks, e.g., name, artist, type of artwork, epoch, etc.
- School vacations and holidays
- Special exhibits at the museum
- Number of visitors per day
- Age structure of visitors per day?
- ... ????
 
HAUM
- Follow up with Sven again about visitor numbers?
Problems and how I handled them
This section lists problems with the log data that required decisions. These decisions influence the results and possibly even the data quality. Hence, I document how I handled each problem and explain the decisions I made.
Weird behavior of time_ms and negative duration values
I think negative duration values occur when an event starts in one
log file and ends in another one. The variable time_ms seems to be
continuous within one log file but not across several log files.
dat_all[which(dat_all$duration < 0), ][1:5, 1:10]
# flipCard
## trace 56
dat3[dat3$trace == 56,]
dat[dat$fileid == "2016_11_15-11_12_57.log" & dat$date == "2016-12-15 11:17:26", ]
dat[dat$fileid == "2016_11_15-11_42_57.log" & dat$date == "2016-12-15 11:46:19", ]
#dat[309:1405, ]
tmp <- dat[300:1405, ]
tmp[tmp$artwork == "051", ]
## -> was closed correctly, but does it belong together?
## trace 61
dat3[dat3$trace == 61,]
dat[dat$fileid == "2016_11_15-11_12_57.log" & dat$date == "2016-12-15 11:17:52", ]
dat[dat$fileid == "2016_11_15-11_42_57.log" & dat$date == "2016-12-15 11:46:19", ]
tmp <- dat[350:1408, ]
tmp[tmp$artwork == "057", ]
## -> was closed correctly, but does it belong together?
# openTopic
dat_all[which(dat_all$duration < 0), ][100:105, 1:10]
# trace 2052
dat4[dat4$trace == 2052,]
dat[dat$fileid == "2016_11_17-14_12_10.log" & dat$date == "2016-12-17 14:21:51", ]
dat[dat$fileid == "2016_11_17-14_22_10.log" & dat$date == "2016-12-17 14:22:25", ]
tmp <- dat[23801:23950, ]
tmp[tmp$artwork == "502", ]
plot(time_ms ~ as.factor(fileid), dat[1:5000,])
The boxplot shows a continuous range of values within each log file,
but time_ms does not increase across log files.
Since there seems to be no consistent way to fix this, I will set
negative durations to NA. I will keep time_ms.start and time_ms.stop
in the data frame, so it is clear why durations are missing. Maybe it
would also be useful to keep logfileid.start and logfileid.stop in the
data, if only to check this theory.
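Here is a minimal sketch of this step on made-up data. It assumes a wide format with time_ms.start and time_ms.stop. The columns fileid.start and fileid.stop are hypothetical stand-ins for the logfileid.start/logfileid.stop idea, and they allow a quick check of the theory.

```r
# Made-up events in wide format; fileid.start/fileid.stop are assumed names
ev <- data.frame(
  event         = c("flipCard", "openTopic", "move"),
  fileid.start  = c("2016_11_15-11_12_57.log", "2016_11_17-14_12_10.log",
                    "2016_11_15-12_12_57.log"),
  fileid.stop   = c("2016_11_15-11_12_57.log", "2016_11_17-14_22_10.log",
                    "2016_11_15-12_12_57.log"),
  time_ms.start = c(1000, 590000, 621),
  time_ms.stop  = c(4500, 15000, 677),
  stringsAsFactors = FALSE
)

ev$duration <- ev$time_ms.stop - ev$time_ms.start
# time_ms restarts in every log file, so negative durations are not usable;
# set them to NA but keep time_ms.start and time_ms.stop for reference
ev$duration[which(ev$duration < 0)] <- NA

# Check the theory: every NA duration should span two different log files
na_dur <- is.na(ev$duration)
all(ev$fileid.start[na_dur] != ev$fileid.stop[na_dur])
```

Using which() avoids problems if duration already contains NAs.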
Part of the problem was that the timestamps in the log file names are not
left-padded with zeros (e.g., 2016_11_15-12_2_57.log instead of
2016_11_15-12_02_57.log). However, padding them fixed only three move events,
since it only fixed irregularities within one log file.
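The left padding could be done with a small helper like the hypothetical pad_fileid() below. It assumes that every file name follows the pattern year_month_day-hour_minute_second.log seen in the output below. The example paths have the same form as that output.

```r
# Hypothetical helper: left-pad all date/time components of a log file name
# to two digits (four for the year); assumes year_month_day-hour_minute_second.log
pad_fileid <- function(fileid) {
  vapply(fileid, function(f) {
    stem <- sub("\\.log$", "", basename(f))        # drop directory and extension
    nums <- as.integer(strsplit(stem, "[_-]")[[1]])
    sprintf("%04d_%02d_%02d-%02d_%02d_%02d.log",
            nums[1], nums[2], nums[3], nums[4], nums[5], nums[6])
  }, character(1), USE.NAMES = FALSE)
}

files <- c("../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log",
           "../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log")
pad_fileid(files)
sort(pad_fileid(files))
```

After padding, the alphabetical order of the file names matches their chronological order.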
table(dat_all[dat_all$duration < 0, "event"])
#  flipCard      move openPopup openTopic
#       562       100        34       284
dat[dat$event %in% c("Transform start", "Transform stop"), ][1100:1300,]
# --> got fixed by left padding... but only three altogether!!
dat_all[735, ]
## what it looked like before left padding
# 1422  ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:56  599671 Transform start     076 076.xml   NA 2092.25 2008.00 0.3000000   13.26874254
# 1423 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57     621 Transform start     076 076.xml   NA 2092.25 2008.00 0.3000000   13.26523465
# 1424 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57     677  Transform stop     076 076.xml   NA 2092.25 2008.00 0.2997736   13.26239605
# 1425 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57     774 Transform start     076 076.xml   NA 2092.25 2008.00 0.2999345   13.26239605
# 1426 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57     850  Transform stop     076 076.xml   NA 2092.25 2008.00 0.2997107   13.26223362
# 1427  ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:57  599916  Transform stop     076 076.xml   NA 2092.25 2008.00 0.2997771   13.26523465
## what it looks like now
# 1422 2016_11_15-12_02_57.log 2016-12-15 12:12:56  599671 Transform start     076 076.xml   NA 2092.25 2008.00 0.3000000   13.26874254
# 1423 2016_11_15-12_02_57.log 2016-12-15 12:12:57  599916  Transform stop     076 076.xml   NA 2092.25 2008.00 0.2997771   13.26523465
# 1424 2016_11_15-12_12_57.log 2016-12-15 12:12:57     621 Transform start     076 076.xml   NA 2092.25 2008.00 0.3000000   13.26523465
# 1425 2016_11_15-12_12_57.log 2016-12-15 12:12:57     677  Transform stop     076 076.xml   NA 2092.25 2008.00 0.2997736   13.26239605
# 1426 2016_11_15-12_12_57.log 2016-12-15 12:12:57     774 Transform start     076 076.xml   NA 2092.25 2008.00 0.2999345   13.26239605
# 1427 2016_11_15-12_12_57.log 2016-12-15 12:12:57     850  Transform stop     076 076.xml   NA 2092.25 2008.00 0.2997107   13.26223362
time_ms does not increase from log file to log file
tmp1 <- dat[!duplicated(dat$fileid), c("fileid", "time_ms", "event")]
tmp2 <- dat[!duplicated(dat$fileid, fromLast = TRUE), c("fileid", "time_ms", "event")]
tmp <- rbind(tmp1, tmp2)
tmp <- tmp[order(tmp$fileid), ]
head(tmp, 50)
plot(time_ms ~ as.factor(fileid), dat[1:2000, ], xlab = "fileid")
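This observation can also be checked automatically. The following sketch uses made-up values. For each pair of consecutive files, it compares the first time_ms of a file with the last time_ms of the previous file, and TRUE marks a reset. It assumes zero-padded file names and that rows are in logged order within each file.

```r
# Made-up excerpt of two consecutive log files (rows in logged order)
dat_toy <- data.frame(
  fileid  = rep(c("2016_11_15-12_02_57.log", "2016_11_15-12_12_57.log"), each = 3),
  time_ms = c(1200, 300500, 599916, 621, 677, 850),
  stringsAsFactors = FALSE
)

# First and last time_ms per file; tapply() orders the files alphabetically,
# which is chronological once the file names are zero-padded
first_ms <- tapply(dat_toy$time_ms, dat_toy$fileid, function(x) x[1])
last_ms  <- tapply(dat_toy$time_ms, dat_toy$fileid, function(x) x[length(x)])

# Reset: a file starts with a smaller time_ms than the previous file ended with
resets <- first_ms[-1] < last_ms[-length(last_ms)]
resets
```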
x,y coordinates outside of the display range
The display is a 4K display with 3840 x 2160 pixels. When you plot the start and stop coordinates, the outline of the display is clearly visible. However, many points lie outside of the display range. This can happen when an artwork is scaled and then moved to the very edge of the table; the log then records coordinates outside of the table. These are valid data points, so I will leave them as they are.
par(mfrow = c(1, 2))
plot(y.start ~ x.start, dat)
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
plot(y.stop ~ x.stop, dat)
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
aggregate(cbind(x.start, x.stop, y.start, y.stop) ~ 1, dat, mean)
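Since these points stay in the data, it can still help to know how many there are. This sketch uses made-up coordinates and a hypothetical helper outside_display(). It takes the 3840 x 2160 pixels of the display as limits. The points are only flagged, not removed.

```r
# Made-up start coordinates
coords <- data.frame(
  x.start = c(100, 3900, 2092.25, -50),
  y.start = c(200, 1000, 2008, 500)
)

# TRUE for points outside the visible display area (3840 x 2160 pixels)
outside_display <- function(x, y, width = 3840, height = 2160) {
  x < 0 | x > width | y < 0 | y > height
}

coords$outside <- with(coords, outside_display(x.start, y.start))
# Share of start coordinates outside the display
mean(coords$outside)
```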
Timestamps repeat
Popups from the glossary cannot be assigned to a specific artwork
Assign a case variable based on a time heuristic
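Here is a minimal sketch of such a time heuristic, based on the time between consecutive events. A new case starts whenever the gap to the previous event exceeds a threshold. The helper assign_case(), the data and the threshold of 60 seconds are assumptions for illustration. The events must be sorted by time.

```r
# Made-up, time-sorted event timestamps
dat_case <- data.frame(
  date = as.POSIXct("2016-12-15 11:00:00", tz = "UTC") + c(0, 10, 25, 200, 210, 900)
)

# Start a new case whenever the gap to the previous event exceeds `max_gap`
# seconds; 60 s is a placeholder, not a validated value
assign_case <- function(date, max_gap = 60) {
  gap <- c(Inf, diff(as.numeric(date)))   # the first event always opens a case
  cumsum(gap > max_gap)
}

dat_case$case <- assign_case(dat_case$date)
dat_case$case
```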
A move event does not record any change
Add moves to trace variable
openPopup does not close correctly
The sorting had to include popup; otherwise, nested events could not be
closed correctly.
# TODO: Some correct entries are not closed:
df[df$trace == 1843, ]
# WHY???
# --> Wrong eventid!
dat5[dat5$trace == 1843, ]
openPopup_wide[openPopup_wide$trace == 1843, ]
Events that only close (date.start is NA)
It looks like there is some kind of logging error for events that have a
stop but no start. I was able to get rid of most of them by including popup
in the sorting for the openPopup events, but some are still left (50 for the
small data set, which corresponds to 0.2 per mille).
# remove all events that do not have a `date.start`
dim(dat_all[is.na(dat_all$date.start), ])
dat_all <- dat_all[!is.na(dat_all$date.start), ]
# TODO: Find out how it can be that there is only a `date.stop`
## --> happens, when event is not properly closed, see here:
df[df$trace == 1843, ]
dat_openPopup[dat_openPopup$trace == 1843, ]
## --> still 50 (small data set) left, and some really do not seem to be
## opened! Must be a log error
# --> others should be closed!
dat[31000:31019,]     # this one e.g.
# --> Actually NOT! card gets flipped before! Again - log error!
I will probably just remove them.
Think about whether the functions should give warning messages about these deletions.
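One option is a helper that removes the rows and reports what it removed. It is sketched here as the hypothetical drop_unstarted(), applied to made-up data. It does the same as the filter on date.start above, but it also emits a warning with the number and share (per mille) of removed events.

```r
# Hypothetical helper: drop events without a start and warn about it
drop_unstarted <- function(df) {
  missing_start <- is.na(df$date.start)
  n <- sum(missing_start)
  if (n > 0) {
    warning(sprintf("Removed %d event(s) without date.start (%.2f per mille).",
                    n, 1000 * n / nrow(df)), call. = FALSE)
  }
  df[!missing_start, , drop = FALSE]
}

# Made-up events; the second one only has a stop
toy <- data.frame(
  date.start = as.POSIXct(c("2016-12-15 11:00:00", NA, "2016-12-15 11:05:00"), tz = "UTC"),
  date.stop  = as.POSIXct(c("2016-12-15 11:00:10", "2016-12-15 11:01:00",
                            "2016-12-15 11:05:30"), tz = "UTC")
)
res <- drop_unstarted(toy)   # emits a warning about one removed event
nrow(res)
```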
Card indices go from 0 to 7 (instead of 0 to 5 as expected)
See questions_number-of-cards.R for details.
Extracting topics
When I extract the topics from index.xml, I get different topics than
when I extract them from <artwork>.xml. At first glance, it looks like using
index.xml actually gives the wrong results.
topics <- extract_topics(artworks, "index.xml", path)
topics2 <- extract_topics(artworks, paste0(artworks, ".xml"), path)
topics[!topics$file_name %in% topics2$file_name, ]
# artwork       file_name   topic index
#     072  072_artist.xml  artist     1
#     073  073_artist.xml  artist     1
#     110 110_technik.xml technik     2
topics2[!topics2$file_name %in% topics$file_name, ]
# artwork          file_name      topic index
#     031  031_vergleich.xml extra info     6
#     033    033_technik.xml    technik     2
#     055 055_vergleich4.xml extra info     5
#     063     063_thema3.xml      thema     3
#     063 063_extrainfo1.xml      thema     4
#     072    072_artist2.xml     artist     1
#     073    073_artist2.xml     artist     1
#     099    099_technik.xml    technik     2
#     110 110_technikneu.xml    technik     2
For artwork 031, index.xml only defines 5 cards (the 6th is commented
out), but topicNumber for this artwork has 6 different entries. I will
therefore extract the topics from <artwork>.xml. (This also seems more
compatible with other data sets like 8o8m.)
Reading list
- @Arizmendi2022 [$-$]
- @Bannert2014 [x]
- @Bousbia2010 [$-$]
- @Cerezo2020
- @GerjetsSchwan2021 [x]
- @Goldhammer2020
- @Guenther2007
- @HuberBannert2023 [x]
- @Kroehne2018
- @SchwanGerjets2021 [x]
- @vanderAalst2016 [Chap. 2, x]
- @vanderAalst2016 [Chap. 3]
- @vanderAalst2016 [Chap. 5, x]
- @Wang2019
Open stuff
- Angle from which people approach the table in Braunschweig? Take it into account in the rotation variable?
- Different time limits for the case variable for different events? (openTopic events should stay open the longest) --> I think this is not relevant, since I am looking at the time between events!
Stuff AK found interesting
- Pre/post corona
- Identify school classes
- How many persons are present at the table?
Other potential questions
- "Bursts"
- 1st vs. 2nd half of the day
- Can we identify "types of art"? With clustering or something?
- Possible to estimate how many persons per day? Maybe average of certain weekdays? ... ?