Digging into Spark Scheduler Delay

I posted the other day about the Event Timeline visualisation you can get in the Stages view of the Spark Application UI. What I didn’t cover was the Event Timeline you can get when you click through to the Stage details page. The Stage details page lists all of the tasks executed as part of the stage’s processing; a task is the actual unit of work processed by a Spark executor, and there is one task for each partition.
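
By default the number of partitions, and therefore the number of tasks, comes from the context’s default parallelism. You can check that value straight from the spark-shell; what you get back depends on your master URL and configuration, so don’t expect my numbers:

// the default partition count makeRDD will use, and hence how many tasks a count will run
sc.defaultParallelism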

Just like on the Stages screen, there is an option to display an Event Timeline that shows where and when each task is run. An example is shown in the diagram below; it was produced by running the following code:


val rdd = sc.makeRDD(1 to 1000)  // distribute the numbers 1 to 1000 across the executors

rdd.count                        // action that triggers a job, with one task per partition

[Figure: Tasks Event Timeline]
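
The number of bars on the timeline is simply the number of partitions in the RDD. If you want the chart to show a predictable number of tasks, makeRDD also takes an explicit slice count; the 16 below is an arbitrary choice:

// ask for exactly 16 partitions, so the stage runs exactly 16 tasks
val rdd16 = sc.makeRDD(1 to 1000, 16)
rdd16.count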

Each task is a bar on the chart, and different sections of the bar are colour-coded for the different phases of the task’s execution:

  • Scheduler Delay
  • Task Deserialization Time
  • Shuffle Read Time
  • Executor Computing Time
  • Shuffle Write Time
  • Result Serialization Time
  • Getting Result Time

What’s interesting in the diagram is that 8 of the tasks executed much more quickly than the others, and the key difference is the Scheduler Delay (40 ms vs 600 ms). The reason is that I have two executors running: one on my laptop (the same machine that is running the driver) and one on a machine connected over my wifi. The wifi is pretty variable, with ping times ranging from 30 ms to 500 ms. It looks very much like Scheduler Delay includes time spent waiting for communication between the driver and the executor. So if you see similar symptoms, with tasks on different executors experiencing different levels of scheduler delay, then network wait time may be the cause.
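
If you’d rather pull those numbers out programmatically than read them off the chart, a SparkListener gives you the per-task metrics behind the timeline. The sketch below approximates the UI’s scheduler-delay figure as the task’s wall-clock duration minus the phases Spark reports; the exact formula the UI uses varies between Spark versions, and the class name here is just for illustration.

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Approximate "Scheduler Delay" for each completed task: wall-clock duration
// minus the deserialization, run and result-serialization times Spark reports.
class SchedulerDelayListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val info = taskEnd.taskInfo
    val metrics = taskEnd.taskMetrics
    if (info != null && metrics != null && info.successful) {
      val accounted = metrics.executorDeserializeTime +
        metrics.executorRunTime +
        metrics.resultSerializationTime
      val schedulerDelay = math.max(0L, info.duration - accounted)
      println(s"task ${info.taskId} on ${info.host}: ~${schedulerDelay} ms scheduler delay")
    }
  }
}

// register on the driver before running the job
sc.addSparkListener(new SchedulerDelayListener)
rdd.count

Grouping that output by host should make a driver-local executor versus a wifi-connected one stand out, the same pattern as in the timeline above.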
