Rerun failed Parallel Runs | Feature Requests

under review

Justin Houck

We have parallelism set to 28 to run our tests. If a single parallel run fails, we currently have to rerun the entire job. It would be great to just re-run the parallel run that failed.

CCI-I-1329

January 22, 2020

Nick Chursin

https://ideas.circleci.com/cloud-feature-requests/p/test-rerun-ui-button-on-job-dashboard

Sebastian Lerner

Hi folks, we have a feature currently in Closed Preview (beta) that we believe addresses the root of the issue in this post: https://circleci.com/docs/rerun-failed-tests-only/. We're hoping to make the feature available to all in an Open Preview soon. You can start making the minor updates to your config.yml now to be prepared for when the feature is available to all users; details are in the docs linked above.

If you have a use case that requires you to rerun the full parallel run instead of just running the failed tests, I'd love to hear about it.

Gregory Haddow

Sebastian Lerner: would love to give this a try. We have our own workaround for this currently, which still requires us to rerun all of the tests assigned to an individual test node in a parallel job. I am interested in seeing how this will work with our pipeline, but I am concerned the partial run may be problematic for coverage and other test metrics. Is there an option that will duplicate our own behaviour, which is to rerun only the failed nodes but still run all tests assigned to each of those nodes?

Roman Ivanov

Sebastian Lerner: please activate this for me or my company too.

Donald Tyler

Sebastian Lerner: thanks for the update. This feature addresses some of the scenarios that this feature request is needed for, but not all.
A node can fail for reasons other than a failed test, including but not limited to:
* Problems with third-party services, e.g. pulling images from GCR, Docker Hub, etc.
* An issue with CircleCI's infrastructure
So we would still like the ability to explicitly request that a certain node within the parallel job be rerun from scratch, not just its failed tests.

Bastian Krol

Sebastian Lerner: I have set up our tests as described in the docs (https://github.com/instana/nodejs/pull/779), but I don't see the "Rerun failed tests only" option. I assume it is still in closed beta? Is there any chance you could activate this for our account (github/instana) as well?

Sebastian Lerner

Hey folks, an FYI that the "rerun failed tests only" functionality is now available to any CircleCI user. Feel free to reach out if there are any questions. https://discuss.circleci.com/t/product-launch-re-run-failed-tests-only-circleci-tests-run/47775/51

Donald Tyler: thanks for clarifying, that makes total sense. This is something we're evaluating how to enable; unfortunately, it is not trivial.

Matt Rubin

Please add this. It's a huge product deficiency and pain point in our pre-merge testing.

Liam Sharp

Totally agree. We're using 20 machines on a job that takes about 15 mins. If 1 of the parallel runs fails (due to some flakiness out of our control), rerunning just that parallel run vs. all 20 is the difference between £0.15 and £3 in terms of credits, so maybe this is why this hasn't been addressed yet.

Jeff Fairley

Hi everyone. I just wanted to share that I've been using a job matrix rather than parallelism for the stated issues. (screenshot attached)

Using <<parameters.index>> in my jobs has been a great functional equivalent to $CIRCLE_NODE_INDEX.

If you need the typical environments provided with parallelism (maybe for the circleci tests cli command), they can be provided like so:

echo 'export CIRCLE_NODE_INDEX=<<parameters.index>>' >> $BASH_ENV
echo 'export CIRCLE_NODE_TOTAL=<<parameters.total>>' >> $BASH_ENV

I hope this helps others, and I hope CircleCI implements individual parallel job restart soon!
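
For anyone who wants to try the job-matrix approach described in this comment, here is a minimal sketch of what such a config might look like. It is not taken from the original poster's setup: the job name test, the cimg/base:stable image, the shard count of 4, and the ./run-tests.sh script are all assumptions for illustration. The idea is that each matrix entry becomes its own job, so rerunning the workflow from failed should only rerun the shard that failed.

```yaml
version: 2.1

jobs:
  test:
    # index/total stand in for the values parallelism would normally provide.
    parameters:
      index:
        type: integer
      total:
        type: integer
    docker:
      - image: cimg/base:stable   # assumed image; use your own
    steps:
      - checkout
      - run:
          name: Export parallelism-style variables
          command: |
            echo 'export CIRCLE_NODE_INDEX=<<parameters.index>>' >> $BASH_ENV
            echo 'export CIRCLE_NODE_TOTAL=<<parameters.total>>' >> $BASH_ENV
      - run:
          name: Run this shard's tests
          # ./run-tests.sh is a hypothetical script that reads the two
          # variables exported above to decide which tests to run.
          command: ./run-tests.sh

workflows:
  main:
    jobs:
      - test:
          matrix:
            parameters:
              # One job per index; total is a single-value list so every
              # generated job receives the same shard count.
              index: [0, 1, 2, 3]
              total: [4]
```

The trade-off is that the shard count is fixed in the index list rather than set by a single parallelism value, so changing the number of shards means editing both index and total.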

William Tait

+1 Thanks for the update Dawit. It would be great to hear from the CircleCI team about some plans that address the original feature request (specifically for parallel runs)

Dawit Gebregziabher

Hi everyone, thanks for the feedback. We understand this is an issue and we have been exploring solutions. We are currently working on a feature to help your jobs fail fast in the event of a failure in your test suite. This will save time and credits and should be available in Preview soon. I'll be following up here with updates. 
In the future, we plan to explore solutions like rerunning only failed tests and running failed tests first. While these features won't necessarily be scoped at the parallel run level, the goal is to improve overall test suite run efficiency. 
As always, your feedback is very important to us so please continue to upvote and comment here with your feedback and questions!

Jake Cozart

Dawit Gebregziabher: For what it's worth... we really just want the ability to re-run any job step (failed or succeeded). Our use case is that we use CircleCI to kick off deployments (it spins up a machine with the build number and talks to AWS to kick off code deploy / monitor for success). Occasionally when deploying there is a manual step (migrating the database). Once that is completed, we would like to just kick off that step again to finish the deploy. Or it would be nice to kick off a build step on a completed run to roll back software if needed.

We can already do this with SSH but it leaves a machine running in the background that we have to kill in CircleCI. If we could simply re-run a step without SSH that would be amazing!

Mike LaRocca

Dawit Gebregziabher: Yeah, this solution worries me a bit because you are limiting data gathering. Fail fast means you only get signal on a specific % of your tests. I want to know the whole picture but only want to follow up on what needs to be followed up on.

I'm sure it's an opt-in feature but not sure it really addresses the root cause at all (aside from cost savings)

Chang Wang

Dawit Gebregziabher: This doesn't quite help our situation. 
* we unfortunately have some flakey e2e tests
* different parts of the test might flake on each run
* would like to rerun only the failed parallel runs and prevent the successful ones from rerunning

Donald Tyler

Dawit Gebregziabher: Thanks for the info, but unfortunately I am in the same situation as Chang Wang. We want to retry flaky tests, which failing fast won't help with.

jake.scott@flockfreight.com

+1 can you all please do the right thing here? This results in a colossal waste of time and energy (both in terms of human energy and electrical). I don't want to have to manually implement retryable parallel jobs using a dynamic config.

ismail.jattioui1@gmail.com

+ 1 pls

ismail.jattioui1@gmail.com

Barbara Nichols: hi, are there any updates on this, please?

Donald Tyler

This feature is desperately needed. Your customers are wasting SOOOOO many credits by re-running unnecessary nodes. My guess is that's why this hasn't been implemented yet, as doing so would lose you money. But this is definitely an anti-pattern and is toxic to your customers. Please fix!
