How to kill a Spark job/stage from an AWS EMR notebook

Alex Luo
3 min read · Aug 21, 2021


Welcome to my first ever Medium post!

In this post, I am going to demonstrate how to kill a job/stage (not the whole notebook session or Spark session) when you are working in an AWS EMR notebook. When you are running an EMR step, you can cancel the step from either the AWS console or the CLI. If you want to kill the Spark session/application, you can SSH into the master node and run yarn application -kill application_xxx.
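For reference, here is a minimal CLI sketch of those two options; the cluster, step and application IDs are placeholders, and depending on your EMR version, cancelling an already-running step may need extra options:

# cancel an EMR step from the AWS CLI (IDs below are placeholders)
aws emr cancel-steps --cluster-id j-XXXXXXXXXXXXX --step-ids s-XXXXXXXXXXXXX

# kill the whole Spark application from the master node via YARN
yarn application -kill application_1629500000000_0001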

But working from an EMR notebook is not as straightforward.

I will use EMR 6.2, Spark 3.0.1 and the Docker YARN runtime here. Other versions might have a different way of doing this.

What’s the usual way to kill a Spark job?

According to the Spark documentation (https://spark.apache.org/docs/latest/web-ui.html), there is a kill button for active stages in the Spark web UI. This allows you to kill a long-running job/stage while you are developing, for easier debugging.


AWS EMR notebook

However, this button is missing from Spark on AWS EMR, on both the persistent Spark history server and the Spark UI hosted on the master node, even with "spark.ui.killEnabled": "true" configured manually!

This makes the process of developing and debugging Spark on an EMR notebook extremely frustrating. I often had to restart the notebook kernel, or even the notebook instance, to kill a long-running job. Life would have been easier if I could simply kill a running job and move on to fixing what’s inefficient in the code.
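For context, one way to set that option from an EMR notebook is sparkmagic’s %%configure cell magic, run before the Spark session is created (a sketch only; even with this set, the kill button stays missing):

%%configure -f
{"conf": {"spark.ui.killEnabled": "true"}}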

Below are the steps to kill a running Spark job/stage from an EMR notebook:

  1. set up SSH tunnelling between your machine and the EMR master node (see here)
  2. configure SwitchyOmega for your Chrome browser (see here)
  3. try to open one of these web UIs to check that you set up SSH tunnelling successfully (see fig 1)
  4. open the Livy web UI in Chrome, click the running Livy session (which is the session ID), sort the statements by ID, and find the statement you want to kill (see fig 2, 3)
  5. in another terminal window, SSH into your EMR master node, for example
    ssh -i ~/xxx.pem hadoop@ip-xx-xx-xx-xx.ap-southeast-2.compute.internal (see here)
  6. use the Livy REST API to cancel that statement (which is just the Spark job): run curl -X POST localhost:8998/sessions/{session ID}/statements/{statement ID}/cancel to kill the job (a fuller sketch of these Livy requests follows the figures below). When the cancel succeeds, you will see an error (fig 4) in your EMR notebook. The job is now killed and you are free to run another cell without recreating the whole Spark session (see here for other useful Livy HTTP requests)
fig 1
fig 2
fig 3
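Putting step 6 together, here is a rough sketch of the Livy REST calls, run from the SSH session on the master node; Livy listens on port 8998 by default, and the session ID (0) and statement ID (3) are placeholders you would replace with the IDs from the Livy web UI:

# list Livy sessions to confirm the session ID
curl localhost:8998/sessions

# list the statements in that session to find the statement ID
curl localhost:8998/sessions/0/statements

# cancel the statement, i.e. the running Spark job
curl -X POST localhost:8998/sessions/0/statements/3/cancel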

Thank you for reading. If you know an easier or more straightforward way to do this, leave a comment to let me know!
