# Open questions

## Understanding the data

* What unit do x and y have? Pixels? --> yes
* What unit does scale have? --> some kind of bit; does not matter when
  calculating a ratio
* Is rotation really in degrees? --> yes
* After what time interval does the table reset to its initial
  configuration? --> PM needs to look it up

## Table software

* Is there documentation for the images that goes beyond the xml files?
  Something like a manual?
* Is there perhaps still a tablet somewhere with the application on it?
* What do the colors of the topic cards mean? --> can be seen in the xml
  files

## Event logs

* How do we deal with events that were "not closed"? Simply delete them?
  - For Transform I lean towards yes, because they are completely
    uninteresting otherwise
  - For flipCard I am not so sure... But then no duration can be
    calculated; it would be NA
* Moves/scales/rotations without any change I would delete in any case
* It is not possible (or I do not know how) to uniquely identify events
  that belong together
  - Proceed heuristically? Simply drop duplicated Transformation starts and
    stops?
  - The data are not "error-free"; e.g., there are Transformation events
    where the end was not logged
* How do I identify a "unit of interaction"?
  - What is a "case"?
  - Rather roughly via time intervals?
  - Any other idea?
* Find out whether more than one person is standing at the table?
  - A sliding window in which the number of artworks is counted? Or how far
    apart the touched artworks are?
  - You can already "see" this in the logs - but how can I extract it
    automatically? What is my definition of an "interaction boost"?
  - However we do it, does it work on the "event log data"?
* Enrich the log data with further metadata? What would be interesting?
  - Metadata on artworks like name, artist, type of artwork, epoch, etc.
  - School vacations and holidays
  - Special exhibits at the museum
  - Number of visitors per day
  - Age structure of visitors per day?
  - ... ????

## HAUM

* Follow up with Sven again about the visitor numbers?

# Problems and how I handled them

This lists some problems with the log data that required decisions. These
decisions influence the outcome and maybe even the data quality. Hence, I
tried to document how I handled these problems and explain the decisions I
made.

## Weird behavior of `time_ms` and negative `duration` values

I think the negative duration values happen when an event starts in one
log file and completes in another one. The variable `time_ms` seems to be
continuous within one log file but not over several log files.

```{r}
# first few events with a negative duration
dat_all[which(dat_all$duration < 0), ][1:5, 1:10]

# flipCard
## trace 56
dat3[dat3$trace == 56, ]

# start and stop rows of this trace come from two different log files
dat[dat$fileid == "2016_11_15-11_12_57.log" & dat$date == "2016-12-15 11:17:26", ]
dat[dat$fileid == "2016_11_15-11_42_57.log" & dat$date == "2016-12-15 11:46:19", ]

#dat[309:1405, ]

tmp <- dat[300:1405, ]
tmp[tmp$artwork == "051", ]
## -> was closed correctly, but does it belong together?


## trace 61
dat3[dat3$trace == 61, ]

dat[dat$fileid == "2016_11_15-11_12_57.log" & dat$date == "2016-12-15 11:17:52", ]
dat[dat$fileid == "2016_11_15-11_42_57.log" & dat$date == "2016-12-15 11:46:19", ]

tmp <- dat[350:1408, ]
tmp[tmp$artwork == "057", ]
## -> was closed correctly, but does it belong together?


# openTopic
dat_all[which(dat_all$duration < 0), ][100:105, 1:10]

## trace 2052
dat4[dat4$trace == 2052, ]

dat[dat$fileid == "2016_11_17-14_12_10.log" & dat$date == "2016-12-17 14:21:51", ]
dat[dat$fileid == "2016_11_17-14_22_10.log" & dat$date == "2016-12-17 14:22:25", ]

tmp <- dat[23801:23950, ]
tmp[tmp$artwork == "502", ]

# distribution of time_ms per log file
plot(time_ms ~ as.factor(fileid), dat[1:5000, ])
```

The boxplot shows that we have a continuous range of values within one log
file but that `time_ms` does not increase over log files.
<!--
TODO: I will probably update how events are closed, and the names of these
data frames, especially `dat3` and `dat4`, will have to be adjusted.
-->

Since it does not seem possible to fix this in a consistent way, I will set
negative durations to `NA`. I will keep `time_ms.start` and `time_ms.stop`
in the data frame, so it is clear why there are no durations. Maybe it
would also be useful to keep `logfileid.start` and `logfileid.stop` in the
data? Maybe just for checking this theory...
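
A minimal sketch of this decision on made-up data. The data frame `toy` and
all of its values are assumptions for illustration; only the column names
are taken from the text above. It sets negative durations to `NA` and then
cross-tabulates them against whether start and stop were logged in
different files, which is one way the theory could be checked.

```r
# Toy data: column names follow the text above, all values are made up
toy <- data.frame(
  event           = c("flipCard", "move", "openTopic"),
  logfileid.start = c("a.log", "a.log", "b.log"),
  logfileid.stop  = c("a.log", "b.log", "b.log"),
  time_ms.start   = c(1000, 599671, 2000),
  time_ms.stop    = c(1500, 621, 2600)
)
toy$duration <- toy$time_ms.stop - toy$time_ms.start

# Negative durations cannot be interpreted, so set them to NA
toy$duration[which(toy$duration < 0)] <- NA

# Check the theory: NA durations should only occur when an event
# starts in one log file and stops in another
crosses_files <- toy$logfileid.start != toy$logfileid.stop
table(crosses_files, duration_na = is.na(toy$duration))
```

If the theory holds, all counts in the resulting table lie on the diagonal.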

Part of the problem was that the timestamps that are part of the log file
names are not zero-left-padded. But fixing this repaired only three `move`
events, since it only fixed irregularities *within* one log file.
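
The padding could look roughly like the sketch below. `pad_fileid()` is a
hypothetical helper, not the function actually used in this project; it
assumes that every file name has the form
`year_month_day-hour_minute_second.log`, as in the examples in these notes.

```r
# Hypothetical helper: zero-left-pad all date and time parts of a log
# file name so that sorting by file name equals sorting by time
pad_fileid <- function(fileid) {
  stem  <- sub("\\.log$", "", basename(fileid))
  parts <- strsplit(stem, "[_-]")
  vapply(parts, function(p) {
    p <- sprintf("%02d", as.integer(p))
    paste0(paste(p[1:3], collapse = "_"), "-",
           paste(p[4:6], collapse = "_"), ".log")
  }, character(1))
}

ids <- c("../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log",
         "../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log")
pad_fileid(ids)
```

Without padding, `12_12_57` sorts before `12_2_57` as a character string,
which is what interleaved the rows of the two files.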

```{r}
table(dat_all[dat_all$duration < 0, "event"])

#  flipCard      move openPopup openTopic
#       562       100        34       284


dat[dat$event %in% c("Transform start", "Transform stop"), ][1100:1300, ]
# --> got fixed by left padding... but only three all together!!

dat_all[735, ]

## what it looked like before left padding
# 1422 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:56 599671 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26874254
# 1423 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57 621 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26523465
# 1424 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57 677 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997736 13.26239605
# 1425 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57 774 Transform start 076 076.xml NA 2092.25 2008.00 0.2999345 13.26239605
# 1426 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_12_57.log 2016-12-15 12:12:57 850 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997107 13.26223362
# 1427 ../data/haum_logs_2016-2023/_2016b/2016_11_15-12_2_57.log 2016-12-15 12:12:57 599916 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997771 13.26523465

## what it looks like now
# 1422 2016_11_15-12_02_57.log 2016-12-15 12:12:56 599671 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26874254
# 1423 2016_11_15-12_02_57.log 2016-12-15 12:12:57 599916 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997771 13.26523465
# 1424 2016_11_15-12_12_57.log 2016-12-15 12:12:57 621 Transform start 076 076.xml NA 2092.25 2008.00 0.3000000 13.26523465
# 1425 2016_11_15-12_12_57.log 2016-12-15 12:12:57 677 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997736 13.26239605
# 1426 2016_11_15-12_12_57.log 2016-12-15 12:12:57 774 Transform start 076 076.xml NA 2092.25 2008.00 0.2999345 13.26239605
# 1427 2016_11_15-12_12_57.log 2016-12-15 12:12:57 850 Transform stop 076 076.xml NA 2092.25 2008.00 0.2997107 13.26223362
```

`time_ms` does not increase from log file to log file:

```{r}
# first and last row of every log file
tmp1 <- dat[!duplicated(dat$fileid), c("fileid", "time_ms", "event")]
tmp2 <- dat[!duplicated(dat$fileid, fromLast = TRUE), c("fileid", "time_ms", "event")]
tmp <- rbind(tmp1, tmp2)
tmp <- tmp[order(tmp$fileid), ]
head(tmp, 50)

plot(time_ms ~ as.factor(fileid), dat[1:2000, ], xlab = "fileid")
```

## x,y-coordinates outside of display range

The display is a 4K display with 3840 x 2160 pixels. When you plot the
start and stop coordinates, the display can be clearly distinguished.
However, a lot of points are outside of the display range. This can happen
when the art objects are scaled and then moved to the very edge of the
table; pixels outside of the table are then recorded. These are actually
valid data points and I will leave them as is.

```{r}
par(mfrow = c(1, 2))
plot(y.start ~ x.start, dat)
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)
plot(y.stop ~ x.stop, dat)
abline(v = c(0, 3840), h = c(0, 2160), col = "blue", lwd = 2)


aggregate(cbind(x.start, x.stop, y.start, y.stop) ~ 1, dat, mean)
```
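
To get a rough idea of how many points fall outside the display, a sketch
like the following might help. `outside_display()` and the toy coordinates
are assumptions for illustration; only the display size of 3840 x 2160
pixels comes from the text above.

```r
# Hypothetical helper: TRUE for points outside the display area
outside_display <- function(x, y, width = 3840, height = 2160) {
  x < 0 | x > width | y < 0 | y > height
}

# Made-up coordinates; the real data would use x.start / y.start
# (or x.stop / y.stop)
toy <- data.frame(x.start = c(100, -50, 3900, 2000),
                  y.start = c(100, 500, 1000, 2200))
toy$outside <- outside_display(toy$x.start, toy$y.start)

# share of points outside the display
mean(toy$outside)
```

The points are only flagged here, not removed, since they are considered
valid.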

## Timestamps repeat

## Popups from glossar cannot be assigned to a specific artwork

## Assign a case variable based on "time heuristic"

## A `move` event does not record any change

## Add moves to `trace` variable

## openPopup does not close correctly

The sorting had to include `popup`; otherwise nested events could not be
closed correctly.

```{r}
# TODO: Some correct entries are not closed:
df[df$trace == 1843, ]
# WHY???
# --> Wrong eventid!
dat5[dat5$trace == 1843, ]
openPopup_wide[openPopup_wide$trace == 1843, ]
```

## Events that only close (`date.start` is NA)

It looks like there is some kind of log error for the events that do not
have a start. I was able to get rid of most of them by sorting for `popup`
for the openPopup events, but there are still some left (50 for the small
data set, which corresponds to 0.2 per mille).

```{r}
# remove all events that do not have a `date.start`
dim(dat_all[is.na(dat_all$date.start), ])
dat_all <- dat_all[!is.na(dat_all$date.start), ]
# TODO: Find out how it can be that there is only a `date.stop`
## --> happens when an event is not properly closed, see here:
df[df$trace == 1843, ]
dat_openPopup[dat_openPopup$trace == 1843, ]
## --> still 50 (small data set) left, and some really do not seem to be
## opened! Must be a log error
# --> others should be closed!
dat[31000:31019, ] # this one e.g.
# --> Actually NOT! card gets flipped before! Again - log error!
```

Will probably just get rid of them!

Think about whether you want to give warning messages about these
deletions in the functions.
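
One way such a warning could look is sketched below. `drop_unopened()` is a
hypothetical function name and the toy data are made up; only the column
`date.start` is taken from the code above.

```r
# Hypothetical helper: remove events without a start and warn about it
drop_unopened <- function(dat) {
  unopened <- is.na(dat$date.start)
  if (any(unopened)) {
    warning(sum(unopened), " event(s) without `date.start` removed",
            call. = FALSE)
  }
  dat[!unopened, , drop = FALSE]
}

# Made-up example: the second event only has a stop
toy <- data.frame(
  trace      = 1:3,
  date.start = c("2016-12-15 11:17:26", NA, "2016-12-15 11:17:52"),
  date.stop  = c("2016-12-15 11:17:30", "2016-12-15 11:17:40",
                 "2016-12-15 11:18:01")
)
toy_clean <- suppressWarnings(drop_unopened(toy))
nrow(toy_clean)
```

In the example the warning is suppressed to keep the output quiet; calling
`drop_unopened(toy)` directly reports how many events were dropped.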

## Card indices go from 0 to 7 (instead of 0 to 5 as expected)

See `questions_number-of-cards.R` for details.

## Extracting topics

When I extract the topics from `index.xml`, I get different topics than
when I get them from `<artwork>.xml`. At first glance, it looks like using
`index.xml` actually gives the wrong results.

```
topics <- extract_topics(artworks, "index.xml", path)
topics2 <- extract_topics(artworks, paste0(artworks, ".xml"), path)

topics[!topics$file_name %in% topics2$file_name, ]
# artwork  file_name            topic       index
# 072      072_artist.xml       artist      1
# 073      073_artist.xml       artist      1
# 110      110_technik.xml      technik     2
topics2[!topics2$file_name %in% topics$file_name, ]
# artwork  file_name            topic       index
# 031      031_vergleich.xml    extra info  6
# 033      033_technik.xml      technik     2
# 055      055_vergleich4.xml   extra info  5
# 063      063_thema3.xml       thema       3
# 063      063_extrainfo1.xml   thema       4
# 072      072_artist2.xml      artist      1
# 073      073_artist2.xml      artist      1
# 099      099_technik.xml      technik     2
# 110      110_technikneu.xml   technik     2
```

For artwork 031, `index.xml` only defines 5 cards (the 6th is commented
out), but `topicNumber` for this artwork has 6 different entries. I will
therefore extract the topics from `<artwork>.xml`. (This also seems to be
more compatible with other data sets like 8o8m.)

# Reading list

* @Arizmendi2022 [$-$]
* @Bannert2014 [x]
* @Bousbia2010 [$-$]
* @Cerezo2020
* @GerjetsSchwan2021 [x]
* @Goldhammer2020
* @Guenther2007
* @HuberBannert2023 [x]
* @Kroehne2018
* @SchwanGerjets2021 [x]
* @vanderAalst2016 [Chap. 2, x]
* @vanderAalst2016 [Chap. 3]
* @vanderAalst2016 [Chap. 5, x]
* @Wang2019

# Open stuff

* Angle from which people approach table in Braunschweig? Consider in
  rotation variable?
* Time limit for `case` variable different for different events? (openTopic
  should be opened the longest)
  --> I think this is not relevant since I am looking at time *between*
  events!
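
To make the "time *between* events" idea concrete, here is a minimal sketch
of a time heuristic for the `case` variable. `assign_case()`, the threshold
of 20 seconds, and the toy times are all assumptions for illustration and
not the values used in the analysis. A new case starts whenever the pause
between the stop of one event and the start of the next exceeds the
threshold, so the duration of the events themselves plays no role.

```r
# Hypothetical helper: start and stop are times in seconds, sorted by
# start; gap is the maximum pause (in seconds) within one case
assign_case <- function(start, stop, gap = 20) {
  n <- length(start)
  if (n == 0) return(integer(0))
  # pause between the stop of event i - 1 and the start of event i
  pause <- start[-1] - stop[-n]
  cumsum(c(TRUE, pause > gap))
}

# Made-up example: pauses of 2, 28, and 1955 seconds
toy_start <- c(0, 5, 40, 2000)
toy_stop  <- c(3, 12, 45, 2010)
toy_case  <- assign_case(toy_start, toy_stop, gap = 20)
toy_case
```

Because only the pauses are compared to the threshold, a long openTopic
event does not need a different time limit than a short move.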

# Stuff AK found interesting

* Pre/post corona
* Identify school classes
* How many persons are present at the table?

# Other potential questions

* "Bursts"
* 1st vs. 2nd half of the day
* Can we identify "types of art"? With clustering or something?
* Possible to estimate how many persons per day? Maybe average of certain
  weekdays? ... ?