add runtime backend detection and skip CUDA tests when the CUDA backend is unavailable
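
A minimal sketch of the idea, not the code from this change: detect the usable backends once at runtime, then skip CUDA-only tests when CUDA is not among them. The names `detect_backends`, `AVAILABLE_BACKENDS`, and `FeatureTests` are hypothetical, and checking for `nvidia-smi` on `PATH` is only an assumed stand-in for whatever detection the project actually performs.

```python
import shutil
import unittest


def detect_backends(which=shutil.which):
    """Return the set of backends that appear usable on this machine.

    Hypothetical helper. Assumption: an `nvidia-smi` executable on PATH is
    treated as a cheap proxy for "a CUDA driver is installed". The real
    detection logic may query the runtime or the library itself instead.
    """
    backends = {"cpu"}  # assumption: the CPU backend is always available
    if which("nvidia-smi") is not None:
        backends.add("cuda")
    return backends


# Detect once at import time so every test sees the same answer.
AVAILABLE_BACKENDS = detect_backends()


class FeatureTests(unittest.TestCase):
    def test_cpu_path(self):
        # Runs everywhere, because "cpu" is always reported.
        self.assertIn("cpu", AVAILABLE_BACKENDS)

    @unittest.skipUnless("cuda" in AVAILABLE_BACKENDS, "CUDA backend not detected")
    def test_cuda_path(self):
        # Reported as skipped (not failed) on machines without CUDA.
        self.assertIn("cuda", AVAILABLE_BACKENDS)
```

Running this with `python -m unittest` on a machine without a GPU should report `test_cuda_path` as skipped rather than failed, so CI runners without CUDA can stay green. Passing `which` as a parameter keeps the detection function easy to exercise with a fake lookup.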
