Analysing Foosball Data Using R - Who is the Ultimate Player?

It’s your first day at D2D CRC and you’re asked to choose a foosball player to team with. Based on the data set, who do you choose and why?

This is the question we asked a bunch of budding data scientists in a recent competition. This blog post details one of the responses received from a graduate from Carnegie Mellon University.

Please note the results of the analysis have been summarised for the purpose of the blog therefore not all of the results are shown.

BY YULIN ZHOU

Before processing the data in R, I translated Json formatted data into a .xlsx file. The following code loads data from the .xlsx file and displays the first 5 lines of the data.

library(readxl)
foosball <- read_excel("Desktop/foosball.xlsx",
    col_types = c("numeric", "text", "text",
        "text", "text", "text", "text"))
foosball
## # A tibble: 121 × 7
##       id          date Winner1 Winner2 Loser1 Loser2
description
##    <dbl>         <chr>   <chr>   <chr>  <chr>  <chr>
<chr>
## 1    120 1504580225440   Terry  Grant N    Dave   Nick B    Terry &   Grant N def.Dave & Nick B
## 2    119 1504239659312   Dave     Trent  Nick B  Grant N     Dave & Trent def. Nick B & Grant N
## 3    118 1504234112954   Dave      Dave    Ross     Ross                         Dave def. Ross
## 4    117 1504234107766   Dave  Andrew N   Terry     Hugh      Dave & Andrew N def. Terry & Hugh
## 5    116 1503469714954  Terry    Nick L Grant N   Nick B   Terry & Nick L def. Grant N & Nick B
## 6    115 1503458184492  Terry   Grant N    Dave   Asanga     Terry & Grant N def. Dave & Asanga
## 7    114 1503457553052  Terry    Asanga    Dave Andrew N    Terry & Asanga def. Dave & Andrew N
## 8    113 1502933247955   Dave    Asanga   Blake  Scott W     Dave & Asanga def. Blake & Scott W
## 9    112 1502854505582   Dave   Scott W Grant N   Nick L   Dave & Scott W def. Grant N & Nick L
## 10   111 1502433531954   Dave   Grant N    Hugh   Nick B      Dave & Grant N def. Hugh & Nick B
## # ... with 111 more rows

As shown above, for each game, there are two winners and two losers. When there is a case of one person VS two people or one person VS one person both winners or losers are the same person. Let’s extract the records for these special games first.

oneVStwo = subset(foosball, (Winner1==Winner2 | Loser2==Loser1) & (Winner1!=Winner
2 | Loser2!=Loser1))
oneVSone = subset(foosball, Winner1==Winner2 & Loser2==Loser1)
oneVSone
## # A tibble: 9 × 7
##       id         date Winner1 Winner2 Loser1 Loser2       description
##   <dbl>         <chr>  <chr>    <chr>  <chr>  <chr>             <chr>
## 1   118 1504234112954   Dave     Dave   Ross   Ross    Dave def. Ross
## 2    93 1501482440924  Brodie  Brodie  Blake  Blake Brodie def. Blake
## 3    89 1501140194829   Terry   Terry Nick B Nick B Terry def. Nick B
## 4    88 1501047668404  Nick B  Nick B  Terry  Terry Nick B def. Terry
## 5    51 1499651298167  Nick B  Nick B   Hugh   Hugh  Nick B def. Hugh
## 6    49 1499412238495   Terry   Terry   Hugh   Hugh   Terry def. Hugh
## 7    33 1498706893655    Ross    Ross   Dave   Dave    Ross def. Dave
## 8    32 1498706882114    Dave    Dave  Aaron  Aaron   Dave def. Aaron
## 9    22 1498459882177    Dave    Dave   Ross   Ross    Dave def. Ross
oneVStwo
## # A tibble: 7 × 7
##      id         date Winner1 Winner2  Loser1 Loser2                description
##  <dbl>         <chr>   <chr>   <chr>   <chr>  <chr>                      <chr>
## 1 94 1501482448767      Dave    Dave   Blake Brodie   Dave def. Blake & Brodie
## 2 92 1501222569620      Dave Brodie    Blake  Blake   Dave & Brodie def. Blake
## 3 76 1500521176415    Brodie   Hugh    Blake  Blake   Brodie & Hugh def. Blake
## 4 75 1500521163514      Hugh   Hugh    Blake Brodie   Hugh def. Blake & Brodie
## 5 69 1500419836214      Hugh   Hugh  Grant N Asanga Hugh def. Grant N & Asanga
## 6 64 1500002599809      Dave   Dave    Blake Brodie   Dave def. Blake & Brodie
## 7 63 1500002588364      Dave Brodie    Blake  Blake   Dave & Brodie def. Blake

The purpose of this analysis is to find the best player to team with. Let’s see who likes to play alone and who likes to team up with people and who is open to any type of player arrangement. I named people who like to play oneVSone as “lonely wolves” and people who like to play oneVStwo as “brave fighters”.

lonely_wolves_winners = as.list(oneVSone[,"Winner1"])
lonely_wolves_losers = as.list(oneVSone[,"Loser1"])

brave_fighters_losers = subset(oneVStwo, Loser1 == Loser2)$Loser1
brave_fighters_winners = subset(oneVStwo, Winner1 == Winner2)$Winner1
lonely_wolves_winners
## $Winner1
## [1] "Dave" "Brodie" "Terry" "Nick B" "Nick B" "Terry" "Ross" "Dave" "Dave"
lonely_wolves_losers
## $Loser1
## [1] "Ross" "Blake" "Nick B" "Terry" "Hugh" "Hugh" "Dave" "Aaron" "Ro
ss"
brave_fighters_winners
## [1] "Dave" "Hugh" "Hugh" "Dave"
brave_fighters_losers
## [1] "Blake" "Blake" "Blake"

Summary:

  • Blake likes to verse two people however he loses often (3 out of 3 times). He even lost his only oneVSone game as well, poor Blake.
  • Dave and Hugh each won twice as brave fighters.
  • Dave played oneVSone games and won three out of four times.
  • Ross played oneVSone games and won one out of three times.
  • Terry played oneVSone games and won one out of two times.
  • Nick B played oneVSone games and won two out of three times.
  • Hugh played oneVSone games twice and lost it all.

An interesting insight! Hugh wins when playing two people on his own but loses when playing one person. Who did he play the games with? It seems that Hugh lost the game to Nick B and Terry respectively. Hugh won the game against Blake and Brodie and Grant N and Asanga.

So far, Dave is the best candidate since he has strong individual ability though he seems to be a lonely wolf. I am not sure if I should team up with a lonely wolf. Let’s see if Dave is open to playing in a team too.

team_players = subset(foosball, Winner1 != Winner2 & Loser1 != Loser2)
team_players_winners = team_players[,3:4]
team_players_losers = team_players[,5:6]
team_players_winners
## # A tibble: 105 × 2
##    Winner1  Winner2
##      <chr>    <chr>
## 1   Terry   Grant N
## 2    Dave     Trent
## 3    Dave  Andrew N
## 4   Terry    Nick L
## 5   Terry   Grant N
## 6   Terry    Asanga
## 7    Dave    Asanga
## 8    Dave   Scott W
## 9    Dave   Grant N
## 10 Nick B     Trent
## # ... with 95 more rows
team_players_losers
## # A tibble: 105 × 2
##     Loser1   Loser2
##      <chr>    <chr>
## 1     Dave   Nick B
## 2   Nick B  Grant N
## 3    Terry     Hugh
## 4  Grant N   Nick B
## 5     Dave   Asanga
## 6     Dave Andrew N
## 7    Blake  Scott W
## 8  Grant N   Nick L
## 9    Hugh    Nick B
## 10   Dave   Scott W
## # ... with 95 more rows

I stored the winners and losers name for twoVStwo games in team_players_winners and team_players_losers respectively. Looking at the results, I can confirm that Dave is open to playing with team members too. He is not a lonely wolf! In fact, Brodie, Terry, Nick B, Ross, Blake, Hugh and Aaron (those who played oneVSone or oneVStwo games) all have experiences playing with Dave in their team.

The tricky thing is, when playing in a team, personal ability does not work as simply as 1 + 1 = 2. The combination of two people can either bring the best out of the team or supress individual performances. Let’s find out the all-conquering team, “lose-all-the-time” teams and interesting twoVStwo combinations.

#set the winner1 and winner2 sequence from "smaller" text to "larger" text to make
sure A & B and B & A are considered the same player combination
for (i in c(1:nrow(team_players))){
  if (team_players[i,3]>team_players[i,4]){
    temp = team_players[i,3]
    team_players[i,3] = team_players[i,4]
    team_players[i,4] = temp
  }
}
# do the same thing for losers
for (i in c(1:nrow(team_players))){
  if (team_players[i,5]>team_players[i,6]){
   temp = team_players[i,5]
   team_players[i,5] = team_players[i,6]
   team_players[i,6] = temp
  }
}
team_players$Winner_team = paste( team_players$Winner1, team_players$Winner2, sep
= " and " )
team_players$Loser_team = paste( team_players$Loser1, team_players$Loser2, sep = "
and " )
count_team_winners = table(team_players$Winner_team)
count_team_losers = table(team_players$Loser_team)


sort(count_team_winners,decreasing = TRUE)[0:25]

From the results we found that Grant N and Nick B tend to team up together, but they lost 12 games out of 20. Hugh and Nick B are also a frequent combination too and win 8 out of 15 games.

The team combination that does not perform too well are Andrew F and Blake.

  • Andrew F and Blake (lost 4 out of 4 games)
  • Blake and Brodie (lost 4 out of 4 games)
  • Andrew F and Asanga (lost 3 out of 3 games)
  • Asanga and Terry (lost 6 out of 10 games)

The team combinations that perform well are:

  • Blake and Dave (won 7 out of 7 games)
  • Hugh and Terry (won 7 out of 9 games)
  • Brodie and Dave (won 5 out of 5 games)
  • Dave and Scott W (won 4 out of 6 games)

Dave is still looking to be the best candidate because of his excellent performance in the team plays. Note that Asanga and Dave won three out of six games. This team combination’s performance is not as strong as Dave’s other team combination.

From the data we can pick out a few individuals that have a lower or higher probability to win in a team play. Individuals that have higher probability to win in a team are Dave and Hugh. Individuals that have lower probability to win in a team are Andrew F, Blake, Brodie and Asanga.

Now that we know what team combination has a high probability to win, who tends to team up together and what team combination has high probability to lose. Let’s find out who wins and loses the most either in a team or alone.

The code counts the number of  times a person wins either in a team or alone and stores the data in team_win_count, team_lose_count, indi_win_count and indi_lose_count in team_count. I sorted the dataframe team_count by column team_win_count (descending) then by column Indi_win_count (descending), then by column team_lose_count (ascending) then by column Indi_lose_count (ascending). The higher a user ranks, the better he/she plays. I added another column called win_prob as the probability of this person winning all the games he/she played.

From team_count we can tell that:

  • Dave, Terry, Hugh, Nick B, Grant N, Blake, Brodie and Asanga are frequent players.
  • Dave, Terry and Hugh win as at least twice as much as they lose. Well done!

From team_count_prob we can tell that:

  • Jie played once and he won which makes his wining probability 100%. Good strategy to keep your record pretty.
  • Dave, Hugh and Terry play a lot and have high winning rates.
  • Andrew N does not play as much but has high winning rate too.
  • Andrew F played 16 times and lost 14 of them. This is so sad.

The best candidate is Dave. What if Dave is not at work that day or he does not feel like playing? We need some backup options. Let’s scale the dimension of the data and see who are “close” to Dave.

From the plot, it looks like there are three to four clusters. Let’s see how Kmeans clusters these players.

From the plot where we set the cluster number as four, we can tell that Dave, Hugh and Terry are in a cluster. Let’s call this cluster “absolute winners”. Grant N, Blake and Nick B are in the same cluster. This cluster should be called “so-so performance”. In the green cluster, these people have a relatively low winning rate. This cluster is “not-too-good”. In the red cluster, these people do not play very much. Let’s call them “not-too-keen”

Now we know that if Dave is not available, I should choose Terry or Hugh to team up with.

We also know that foosball is a game that you can get better at if you practice more. Let’s have a look at who improves over time (assume that the date is incrementing along with game id). We are only going to look at Dave, Hugh, Terry, Grant N, Blake and Nick B.

From the plot we can tell that Dave’s performance is generally stable over the time. Terry’s performance had a sharp drop but increased all the way back to normal afterwards. Hugh’s performance increased dramatically in the early stages but dropped a little afterwards. Grant N’s performance seems to be constantly decreasing then stayed at a stable level. Blake’s performance is not so good in comparison to others. His performance stays at a stable (low) level. Nick B’s performance is increasingly stable, although he still has a long way to go before he catches up with Dave.

From the analysis above, I can conclude that my ideal teammate is Dave because:

  1. He loves to play foosball (he is a frequent player)
  2. He performs well in both team and individual games
  3. His performance is stably increasing

In case Dave is not available, my choice of teammate would be Terry or Hugh because they both are frequent players and they have good performance in team games.  Hugh and Terry currently have the same level of performance (yellow and green line merged at the end of the plot). Hugh and Terry together are a very strong team (won 7 out of 9 team games).

Maybe I would be better to team up with one of them to break their powerful team combination?