Berlin in 2:01:39

BMW Berlin-Marathon 2018 World Record set by Eliud Kipchoge

A Comparative Data Analysis of Eliud Kipchoge’s World Record at the 2018 Berlin Marathon

A Brief Overview

In the time since Eliud Kipchoge’s Breaking2 attempt on the Monza race track in Italy, which secured him a place in the history books with a time of 2 hours and 25 seconds over the marathon distance, it has seemed inevitable that he would go on to break the marathon world-record, held by [Dennis Kimetto](https://www.iaaf.org/news/report/marathon-world-record-dennis-kimetto-berlin) since 2014 with a time of 2:02:57. And so, as Sunday morning approached, the talk seemed less about whether the current world-record would fall and more about by how much.

Since 1998 there have been 7 new male marathon world-records set on Berlin’s fast and flat course, 3 of them since 2010. Would we see another? Sure enough, as if to order, Eliud Kipchoge served the latest world-record to the Berlin crowds, crossing the line in an incredible 2:01:39 and smashing the previous record by 1 minute and 18 seconds. We haven’t seen a record-breaking margin of more than a minute since Derek Clayton lowered the world record from 2:12:00 to 2:09:37 in 1967, a time that is currently ranked 1,203 in the world. Kipchoge’s run was all the more impressive because the 33-year-old Kenyan achieved it while running the last 17 km on his own, after the last of his pacers, not to mention the rest of the field, drifted away long before the finish.

sub_210 %>% 
  filter(runnier == "Derek Clayton") %>% pull(rank)
## [1] "1203" "2212"

2:01:39 in Comparison

How does Kipchoge’s new record compare with previous WRs? The great thing about Berlin is that it attracts the best of the best, so, not surprisingly, quite a few WRs have been established there. This means we can perform a like-for-like comparison by setting Kipchoge’s race against the other fastest Berlin times in our dataset. The five fastest, Kipchoge’s included, are:

#' AN INTERACTIVE TABLE TO SHARE RESULTS, MAYBE BASED IN SHINY ...
top_times %>% arrange(rank) %>%
  filter(str_detect(marathon, "Berlin")) %>% 
  head(5) %>% 
  knitr::kable()
| rank | runner | home | marathon | km_5 | km_10 | km_15 | km_20 | km_21.1 | km_25 | km_30 | km_35 | km_40 | km_42.2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Eliud Kipchoge | Kenya | Berlin 2018 | 00:14:24 | 00:29:01 | 00:43:38 | 00:57:56 | 01:01:06 | 01:12:24 | 01:26:45 | 01:41:01 | 01:55:32 | 02:01:39 |
| 2 | Dennis Kimetto | Kenya | Berlin 2014 | 00:14:42 | 00:29:24 | 00:44:10 | 00:58:36 | 01:01:45 | 01:13:08 | 01:27:38 | 01:41:47 | 01:56:29 | 02:02:57 |
| 3 | Kenenisa Bekele | Ethiopia | Berlin 2016 | 00:14:21 | 00:29:00 | 00:43:37 | 00:58:02 | 01:01:11 | 01:12:47 | 01:27:30 | 01:42:01 | 01:56:55 | 02:03:03 |
| 5 | Wilson Kipsang | Kenya | Berlin 2016 | 00:14:20 | 00:29:00 | 00:43:37 | 00:58:02 | 01:01:11 | 01:12:47 | 01:27:26 | 01:41:56 | 01:56:56 | 02:03:13 |
| 6 | Emmanuel Matai | Kenya | Berlin 2014 | 00:14:42 | 00:29:24 | 00:44:09 | 00:58:35 | 01:01:46 | 01:13:08 | 01:27:37 | 01:41:48 | 01:56:36 | 02:03:13 |

Figure 1 shows the pacing (in decimal mins/km) for all 4 WRs across each of the 5 km segments of the race (5 km, 10 km, …, 40 km), and the final 2.195 km segment. In each case the dashed line reflects the average pace for the runner in question. Based on the timing data released by the Berlin Marathon, Kipchoge ran the first 5 km in 14 minutes and 24 seconds, or about 2:53 mins/km, and he ran between 2:52 mins/km and 2:56 mins/km until the final 2.195 km stretch, which he dispatched at just under 2:48 mins/km pace, faster than any of the last three male WR holders managed in Berlin.
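The pace figures quoted above can be checked directly against the cumulative splits in the table. What follows is a minimal base-R sketch, not the `pace_calculator()` helper used in this post (its internals are not shown here); `to_seconds()` and `format_pace()` are hypothetical helpers introduced only for this illustration.

```r
# Kipchoge's cumulative times at each timing mat, copied from the table above.
marks_km <- c(5, 10, 15, 20, 25, 30, 35, 40, 42.195)
cum_time <- c("00:14:24", "00:29:01", "00:43:38", "00:57:56", "01:12:24",
              "01:26:45", "01:41:01", "01:55:32", "02:01:39")

# Hypothetical helper: convert "HH:MM:SS" strings to seconds.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Hypothetical helper: format seconds per km as "M:SS".
format_pace <- function(sec_per_km) {
  total <- as.integer(round(sec_per_km))
  sprintf("%d:%02d", total %/% 60L, total %% 60L)
}

# Time and distance covered in each segment, then pace in seconds per km.
segment_seconds <- diff(c(0, to_seconds(cum_time)))
segment_km      <- diff(c(0, marks_km))
pace_sec_per_km <- segment_seconds / segment_km

data.frame(km = marks_km, pace = format_pace(pace_sec_per_km))
```

Under these assumptions the first 5 km comes out at roughly 2:53 per km and the final 2.195 km at roughly 2:47 per km, in line with the figures quoted above.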

berlin_top <- top_times %>% arrange(rank) %>%
  filter(str_detect(marathon, "Berlin")) %>% 
  #' WE CAN ADD MORE RUNNERS IF WE CAN GET ALL THE PLOTS TO BE SIDE TO SIDE
  #' ELIUD HAS TWO RECORDS IN THE TOP 6
  slice(c(1:5,7)) %>% select(rank) %>% 
  inner_join(top_times_splits, by = "rank") %>%
  mutate(runner = factor(runner, levels = berlin_best))

berlin_top %>% select(runner, split_num, distance_ran, seconds_split) %>%
  #' CALCULATE ESTIMATED PACE
  pace_calculator(distance = "distance_ran", 
                  metric = "km", 
                  seconds = "seconds_split") %>%
  #' CALCULATE OVERALL PACE
  group_by(runner) %>%
  mutate(overall_pace = sum(seconds_split) / 26.2,
         overall_pace = 
           as.numeric(paste(trunc(overall_pace / 60), 
                            round(overall_pace %% 60), sep = "."))) %>%
  ungroup() %>%
  #' VISUALIZATION OF SPLITS
  ggplot(aes(split_num, mi_pace)) + 
  stat_mean_line(color="red", aes(split_num, overall_pace)) +
  geom_point() +
  scale_x_discrete(breaks=c(10, 20, 30, 40)) +
  #' IT WOULD BE NICE FOR THE RUNNERS TO BE ORDERED BY FINISHING POSITION
  facet_wrap(~ runner, ncol = 6) + 
   labs(title = "Berlin's Top Runners", 
       subtitle = "Figure 1 - Less than 3 Seconds Per Mile Distinguishes the World's Best Marathoners", 
       x = "Kilometer Splits",
       y = "Per Mile Pace") +
  pretty_theme_avg_pace

2:01:39 by the Splits

Kipchoge, and Kimetto before him, secured their world-records with negative splits, running the second half of the race more than 30 seconds faster than the first half. In contrast, Kipsang and Musyoki secured their records with slight positive splits, running the second half about 10 seconds slower than the first; see Figure 2. Since 2010 about half of the Berlin winners have run negative splits, averaging a 30-second difference, and half have run a positive split, averaging a 45-second difference.
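The half-split arithmetic is simple enough to check by hand. The sketch below is a base-R illustration using only the `km_21.1` and `km_42.2` values for Kipchoge and Kimetto from the table above; `to_seconds()` is a hypothetical helper, and the data frame is typed in by hand rather than taken from `top_times`.

```r
# Halfway and finishing times copied from the table above.
halves <- data.frame(
  runner  = c("Eliud Kipchoge", "Dennis Kimetto"),
  km_21.1 = c("01:01:06", "01:01:45"),
  km_42.2 = c("02:01:39", "02:02:57"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert "HH:MM:SS" strings to seconds.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

halves$first_half  <- to_seconds(halves$km_21.1)
halves$second_half <- to_seconds(halves$km_42.2) - halves$first_half

# Negative values mean the second half was faster (a negative split).
halves$half_split <- halves$second_half - halves$first_half

halves[, c("runner", "first_half", "second_half", "half_split")]
```

Both runners come out at a half split of -33 seconds, which matches the "more than 30 seconds faster" description.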

top_times %>% arrange(rank) %>%
  filter(str_detect(marathon, "Berlin")) %>%
  #' WE CAN ADD MORE RUNNERS IF WE CAN GET ALL THE PLOTS TO BE SIDE TO SIDE
   slice(c(1:5,7)) %>% select(rank) %>% 
  inner_join(top_times, by = "rank") %>%
  mutate(runner = factor(runner, 
                         levels = berlin_best)) %>%
  mutate(
  first_half = as.numeric(km_21.1),
  second_half = as.numeric(km_42.2 - km_21.1),
  half_split = second_half - first_half
  ) %>%
  #' BAR CHART OF HALF SPLITS (NEGATIVE = FASTER SECOND HALF)
  ggplot(aes(runner, half_split)) +
  #' THE LEGEND IS KEPT: IT IS THE ONLY RUNNER KEY ONCE THE X AXIS LABELS ARE REMOVED
  geom_bar(stat = "identity", aes(fill = runner)) +
  geom_text(aes(
    label = paste(half_split),
    vjust = ifelse(half_split >= 0, 0, 1)
  )) +
  scale_y_continuous() +
  theme(
    axis.title.x = element_blank(), #' REMOVES RUNNER FROM THE AXIS TITLE
    axis.text.x = element_blank(),  #' REMOVES NAMES FROM THE X AXIS
    axis.ticks.x = element_blank()  #' REMOVES THE TICK MARKS
  ) +
  #' MODIFY TITLES
  labs(title = "Berlin's Top Runners", 
       subtitle = "Figure 2 - Kipsang, the King of Positive Splits", 
       x = "",
       y = "Half Split (Seconds)") +
  scale_fill_manual(values=c("#0062B4FF", "#0062B4FF", "#FF4C4CFF", "#FF0000FF", "#4883B4FF", "#FF9999FF")) + 
  pretty_theme

2:01:39 from Behind

Finally, let’s replay these five Berlin WRs in the same (virtual) race, to get a better sense of how the lead pack might have looked had these incredible runners toed the line together at the height of their performance, to run their WR races. Obviously we cannot account for the additional competitive tension that this might have introduced, but we can at least compare their pacing and timing information to get a better sense of how this lead pack might have developed.
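One way to picture the virtual race is to add a runner who holds a perfectly even world-record pace and to measure everyone against that runner. The sketch below does this for Kipchoge alone in base R. It assumes the even pace is 2:01:39 spread over 42.195 km, which differs slightly from the post's own code (that uses `"2:01:38"` and rounded split labels); `to_seconds()` is a hypothetical helper.

```r
# Kipchoge's cumulative times at each timing mat, copied from the table above.
marks_km <- c(5, 10, 15, 20, 25, 30, 35, 40, 42.195)
kipchoge <- c("00:14:24", "00:29:01", "00:43:38", "00:57:56", "01:12:24",
              "01:26:45", "01:41:01", "01:55:32", "02:01:39")

# Hypothetical helper: convert "HH:MM:SS" strings to seconds.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Assumption: the virtual runner covers every kilometer at the same pace
# and finishes in exactly 2:01:39.
wr_seconds      <- to_seconds("02:01:39")
even_pace       <- wr_seconds / 42.195   # seconds per km
virtual_seconds <- even_pace * marks_km

# Positive = ahead of the even-paced runner, negative = behind.
seconds_vs_even <- virtual_seconds - to_seconds(kipchoge)

data.frame(km = marks_km, seconds_vs_even = round(seconds_vs_even, 1))
```

The same subtraction applied to each runner's splits gives the kind of "seconds to a world record" series that the plotting code builds from `berlin_top`.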

berlin_top %>%
  select(runner, split_num, time) %>%
  #' CREATE A PROPER COLUMN NAME
  mutate(runner = str_replace_all(runner, " ", "_")) %>%
  spread(runner, time) %>%
  select(split_num, Eliud_Kipchoge, everything()) %>%
  #' DEVELOP A VIRTUAL RUNNER THAT RUNS AN EVEN WR PACE
  mutate(Virtual_Runner = 
           round(
           (period_to_seconds(hms("2:01:38")) / 42.195) * 
           as.numeric(as.character(split_num)))) %>%
  #' TAKE THE DIFFERENCE OF EACH RUNNER'S SPLIT COMPARED TO WORLD RECORD PACE
  mutate_at(.vars = str_replace_all(berlin_best, " ", "_"),
            funs(!! .$Virtual_Runner - .)) %>%
  #' THE VIRTUAL RUNNER SHOULD ALWAYS BE ON WR PACE
  mutate(Virtual_Runner = 0) -> berlin_wr_pace


berlin_wr_pace %>% 
  gather(-split_num, key = "runner", value = "second_to_wr") %>%
  mutate_at("runner", ~ str_replace_all(.,"_", " ")) -> berlin_wr_pace_long

berlin_wr_pace_long %>%
  mutate(runner = factor(runner, levels = c("Virtual Runner", berlin_best))) %>%
  ggplot(aes(x = split_num, y = second_to_wr, group = runner)) + 
  geom_line(aes(linetype = runner)) +
  geom_point() + 
  labs(title = "Seconds Behind a World Record", 
       subtitle = "Figure 3 - Kipchoge Separates Himself from the Virtual Pack After the Half Marathon", 
       x = "Kilometer Split",
       y = "Seconds to a World Record") +
  labs(linetype = "Runner") +
  pretty_theme_seconds

Using Kipchoge’s WR run as the baseline, Figure 3 shows the number of seconds each recent world-record holder was behind Kipchoge at the end of each 5 km race segment (and at the finish line) in this virtual race. Kipchoge leads from the start, stays in front, and gradually but steadily extends his lead after the 15 km mark. Kipsang stays in touch with Kipchoge during the first 15 km of the race, getting to within just over 7 seconds by the 15 km mark, but after this he begins to drop back. Meanwhile, Kimetto, after spending the first 20 km in 4th position (by 20 km he is 40 seconds behind Kipchoge), moves into second place and starts to recover some ground to get within 35 seconds of Kipchoge by the end of 35 km. Despite the beginnings of what might have been a late surge by Kimetto, Kipchoge is too strong and extends his now unassailable lead all the way to the end, finishing a full 78 seconds ahead of Kimetto, just over 94 seconds ahead of Kipsang, and almost 119 seconds ahead of Musyoki.

To put this another way, we can estimate how far behind (in distance rather than time) each of Kimetto, Kipsang, and Musyoki would have been when Kipchoge crossed the line, based on their average race paces. The results of this are shown in Figure 4: Kimetto would have finished 446m (almost half a kilometer) behind Kipchoge; Kipsang would have been just under 600m behind; and Musyoki would have been 675m back. Not even close!
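The conversion from a time gap to a distance gap is a single multiplication: the runner's average speed in meters per second times the number of seconds they finished behind. The sketch below checks the Kimetto figure in base R using the finishing times from the table above; `to_seconds()` and `meters_behind()` are hypothetical helpers, and using the average speed over the whole race is the same simplifying assumption made in the text.

```r
# Hypothetical helper: convert "HH:MM:SS" strings to seconds.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Hypothetical helper: estimated distance (in meters) still to run when the
# winner finishes, assuming the runner moves at their own average race speed.
meters_behind <- function(finish_time, winner_time, race_m = 42195) {
  finish_s <- to_seconds(finish_time)
  gap_s    <- finish_s - to_seconds(winner_time)
  mps      <- race_m / finish_s   # average speed in meters per second
  gap_s * mps
}

# Kimetto (Berlin 2014) against Kipchoge (Berlin 2018).
kimetto_gap_m <- meters_behind("02:02:57", "02:01:39")
round(kimetto_gap_m)
```

With a 78-second gap and an average speed of about 5.72 m/s, this gives roughly 446 m, matching the figure quoted for Kimetto.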

berlin_mps <- berlin_top %>%
  group_by(runner) %>%
  summarise(finishing_time = max(time),
         mps = 42195 / finishing_time) %>%
  ungroup() %>%
  select(- finishing_time)
   
  
berlin_wr_pace_long %>%
  filter(runner != "Virtual Runner") %>%
  mutate(runner = factor(runner, levels = berlin_best)) %>%
  left_join(berlin_mps, by = "runner") %>%
  filter(split_num == "42.2") %>%
  mutate(meter_to_wr = second_to_wr * (-mps)) %>%
  ggplot(aes(x = fct_rev(runner), y = meter_to_wr, fill = meter_to_wr)) +
  geom_bar(stat="identity") +
  scale_fill_continuous(low = "#FFCCCCFF", high = "#FF2626FF") +
  labs(title = "Meters to a World Record", 
       subtitle = "Figure 4 - On a 400 Meter Track, Kipchoge would have Lapped the World's Best", 
       x = "Runner",
       y = "Meters to a World Record") +
  coord_flip() + 
  pretty_theme_meters

By any objective measure Kipchoge’s Berlin world-record race was nothing short of stunning. He obliterated Kimetto’s 2014 record, set on the same course, and his incredibly disciplined negative split is all the more impressive because he did it largely on his own, after dropping the last of his pacers shortly after the halfway point. It seems right and proper that on September 16th 2018 the fastest marathoner also ran the fastest marathon.

Leonardo Palomera
Data Scientist