Hey, EMR question (I asked it on Stack Overflow as well):
In an EMR cluster, I run multiple Spark steps.
Steps may or may not have the same name.
I want to monitor the number of failed steps, grouped by the step name.
EMR triggers EventBridge events for a step status change, but I want numbers: the goal is to trigger an alarm if more than (say) 5 steps with the same name failed within (say) the last hour.
I was hoping to get a CloudWatch metric counting failed steps, with the step name as a dimension. Can I achieve that?
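
In case there is no built-in metric, the fallback I have in mind is a small Lambda behind the EventBridge rule that publishes a custom metric per failed step. This is only a rough sketch: the namespace `Custom/EMR`, the metric name `FailedSteps`, and the dimension name `StepName` are placeholders I made up, and I am assuming the step status change event carries the step name and state under `detail.name` and `detail.state`.

```python
from typing import Optional

# Placeholder names, not anything EMR or CloudWatch defines.
NAMESPACE = "Custom/EMR"
METRIC_NAME = "FailedSteps"


def build_metric_datum(event: dict) -> Optional[dict]:
    """Return a CloudWatch metric datum for a failed step, or None otherwise."""
    if event.get("detail-type") != "EMR Step Status Change":
        return None
    detail = event.get("detail", {})
    # Assumption: the step state and name live under detail.state / detail.name.
    if detail.get("state") != "FAILED":
        return None
    return {
        "MetricName": METRIC_NAME,
        "Dimensions": [
            {"Name": "StepName", "Value": detail.get("name", "unknown")}
        ],
        "Value": 1,
        "Unit": "Count",
    }


def handler(event, context):
    """Lambda entry point: publish one data point per failed step."""
    datum = build_metric_datum(event)
    if datum is None:
        return None
    # Imported here so build_metric_datum can be exercised without AWS access.
    import boto3

    boto3.client("cloudwatch").put_metric_data(
        Namespace=NAMESPACE, MetricData=[datum]
    )
    return datum
```

An alarm on the `Sum` statistic of that metric with a 3600-second period and a threshold of 5 would then cover the "more than 5 failures in the last hour" case, though it needs one alarm per step name.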