Report transient deployment trigger errors via API field #4941
Discussed on IRC, but to recap: the change needs to originate in the deployment config generator, and the caller (e.g. the configChangeController) needs to be able to interpret that error and generate a more meaningful message. The current approach won't handle reporting the actual causes for a non-existent initial deployment, which is the primary use case.
Talked more on IRC, and I'm now thinking it might make more sense to have the deploymentController do the state inspection you've got here every time it reconciles a DC. This would ensure the message is set upon initial DC creation (which is the primary use case), and also ensure the message is cleared/updated as the triggers mutate the config. So basically, when handling a DC, check to see "does a deployment for this DC.LatestVersion exist?" and if not, figure out some possible reasons why and update the message. If there is a deployment for the current version, clear the message.
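A rough sketch of that flow, using simplified stand-in types rather than the real deployapi objects (field and helper names here are illustrative only, not the actual OpenShift API):

```go
package main

import "fmt"

// DeploymentConfig is a simplified stand-in for deployapi.DeploymentConfig;
// Details stands in for the status.details.message field discussed above.
type DeploymentConfig struct {
	Name          string
	LatestVersion int
	Details       string
}

// reconcileDetails sketches the suggestion: if a deployment exists for the
// current LatestVersion, clear the message; otherwise record likely causes.
func reconcileDetails(config *DeploymentConfig, deploymentExists func(*DeploymentConfig) bool, inspect func(*DeploymentConfig) string) {
	if deploymentExists(config) {
		config.Details = ""
		return
	}
	config.Details = inspect(config)
}

func main() {
	dc := &DeploymentConfig{Name: "frontend", LatestVersion: 1}
	reconcileDetails(dc,
		func(*DeploymentConfig) bool { return false }, // no deployment found for version 1
		func(*DeploymentConfig) string { return "image stream tag frontend:latest does not exist" },
	)
	fmt.Println(dc.Details)
}
```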
@ironcladlou makes sense, will update accordingly
Thinking about this more, I believe we should do that kind of inspection not only when there is no deployment for the current dc version, but also when the latest deployment has failed. WDYT? P.S. in your first paragraph, you probably meant the deploymentConfig controller.
Makes sense to me... if the deployment is visible and we see it failed, I don't see why not try to add more info to the config status. That might even be a good time to go look up the deployer pod and, if it still exists, get some info from that as well to add to the message. The more clues we can collect and provide to the user, the better IMO.
You're right, I meant to say this work should happen in the deploymentConfig controller.
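For the failed-deployment case, a hedged sketch of what folding deployer pod information into the message could look like; the types and the lookup below are invented for illustration and are not the real kapi/deployapi types:

```go
package sketch

import "fmt"

// Simplified stand-ins; not the real deployment or pod API types.
type deployment struct {
	Name   string
	Failed bool
}

type deployerPod struct {
	Found  bool
	Reason string // e.g. the terminated deployer container's reason or message
}

// failedDeploymentDetails sketches the idea from the thread: when the latest
// deployment failed, also look up its deployer pod and, if it still exists,
// fold whatever it reports into the message shown to the user.
func failedDeploymentDetails(d deployment, lookupDeployer func(name string) deployerPod) string {
	if !d.Failed {
		return ""
	}
	msg := fmt.Sprintf("The latest deployment %s failed.", d.Name)
	if pod := lookupDeployer(d.Name); pod.Found {
		msg += fmt.Sprintf(" Deployer pod reported: %s", pod.Reason)
	}
	return msg
}
```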
@ironcladlou is this closer to what you were thinking? What other failures can we find out based on a dc? Inspect the template maybe? I could also see an unschedulable node as a possible failure but I am not sure about having a node client in here. Thoughts?
    // failureInspection inspects the given deployment configuration for any failure causes
    // and updates the status of the configuration accordingly
    func (c *DeploymentConfigController) failureInspection(config *deployapi.DeploymentConfig, outErr error) error {
How about making this function just return the message and do the update from the calling side? Makes it easier to test and less side-effecty. I don't like doing the update in multiple places within the function, via defer, etc.
Agreed, the code is cleaner now that the inspection happens once at start. Updated
Need to add tests
    }

    // Handle processes config and creates a new deployment if necessary.
    func (c *DeploymentConfigController) Handle(config *deployapi.DeploymentConfig) error {
    	// Inspect a deployment configuration every time the controller reconciles it
    	if details := c.failureInspection(config); len(details) > 0 {
    		c.detailsUpdate(config, details)
Closer... there are a few mods to fit the general reconciliation model of other controllers. Something like:

    details := c.findDetails(config, client)
    c.updateDetails(config, details)

The `findDetails` function could immediately check to see if there's a deployment for `config`, and if so, return an empty string (to cause details to be cleared). It could then do all its other checks. Then `findDetails` needs to only be called once, from here (the one on L94 would not be necessary).

The `updateDetails` function would update config if `details != config.details.Message`. That handles setting new messages and clearing outdated ones, and handles no-op'ing if there's no difference.

You should also ditch all those Namespacers and just use a single `client.Interface`. The test support works just as well, but with a lot less code.
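A minimal sketch of that findDetails/updateDetails shape, with stand-in types instead of the real deployapi types and client.Interface (names mirror the suggestion above, not the merged implementation):

```go
package sketch

// DeploymentConfig is a stand-in; Details approximates status.details.message.
type DeploymentConfig struct {
	LatestVersion int
	Details       string
}

type deploymentConfigController struct {
	deploymentExists func(*DeploymentConfig) bool   // is there a deployment for LatestVersion?
	inspectTriggers  func(*DeploymentConfig) string // summarize missing image streams, etc.
	updateConfig     func(*DeploymentConfig) error  // persist via the API client
}

// findDetails returns "" when a deployment exists for the current version
// (so stale messages get cleared) and otherwise a summary of likely causes.
func (c *deploymentConfigController) findDetails(config *DeploymentConfig) string {
	if c.deploymentExists(config) {
		return ""
	}
	return c.inspectTriggers(config)
}

// updateDetails writes the message only when it differs from what is already
// stored: it sets new messages, clears outdated ones, and no-ops otherwise.
func (c *deploymentConfigController) updateDetails(config *DeploymentConfig, details string) error {
	if details == config.Details {
		return nil
	}
	config.Details = details
	return c.updateConfig(config)
}
```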
> Then `findDetails` needs to only be called once, from here (the one on L94 would not be necessary).

I was aware of it but wanted to keep population and cleaning in separate places. I guess though it doesn't really matter. Updating...
@ironcladlou I am not sure what to do with errors returned from the clients when querying for resources. Pass those as the details into the DC? Let the controller handle them? Or ignore them?
@ironcladlou comments addressed, unit tests for findDetails added, PTAL
@ironcladlou updated the messages and tests (please use table tests with test names :)).
I don't see a reason to block this pull on
    	}, nil
    },
    },
    {
The nesting here renders like a flock of geese on my screen
lol
I will add mock funcs to reduce it
lol.
I was just kidding, don't sweat it
@ironcladlou @deads2k I think this is ready. The oc status fix will be my next PR.
This LGTM, thanks a lot for working on it.
Added the warning to oc status too. We could definitely traverse the graph and find out the cause sooner than the dc controller, but then we would be inconsistent. I think the right way is for status to follow what's reported by the controller.
This means that we have to report multiple bad triggers. Separate line for each trigger?
I do agree that this work really belongs in the graph code. I brought this up before we started, and @smarterclayton and I decided to do it ASAP inside the controller to get started. Please correct me if I misunderstood the preferred initial direction. My impression is since we're building a freeform message, it shouldn't be a big deal to swap in a different rendering from the graph library later.
The controller doesn't have to use the graph library, especially if the controller is checking conditions directly as part of its core logic.
You want to do status as a followup? I do think we should handle multiple bad triggers as markers for warnings and then separately mark a deployment that can't succeed on the same line. Something like "#1 deployment waiting on image or update (image stream doesn't exist)"
@ironcladlou what's your opinion on reporting multiple bad triggers in the details field, or on the can't-succeed message due to a missing image? I have updated findDetails accordingly.
I think adding some sort of line per trigger would be just fine... I see no harm in making the message longer/better formatted for that.
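A hedged sketch of that one-line-per-trigger idea; the trigger shape and the message wording below are illustrative only, not the exact merged messages:

```go
package sketch

import (
	"fmt"
	"strings"
)

// imageTrigger is an illustrative stand-in for an image change trigger;
// Exists records whether the referenced image stream could be found.
type imageTrigger struct {
	Stream string
	Tag    string
	Exists bool
}

// triggerDetails builds one line per problematic trigger so the details
// message stays readable however many triggers a config has.
func triggerDetails(triggers []imageTrigger) string {
	var lines []string
	for _, t := range triggers {
		if !t.Exists {
			lines = append(lines, fmt.Sprintf(
				"The image trigger for image stream tag %s:%s will have no effect because the image stream does not exist.",
				t.Stream, t.Tag))
		}
	}
	return strings.Join(lines, "\n")
}
```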
Removed status changes and squashed down
    glog.V(2).Infof("Error while trying to get image stream %q: %v", name, err)
    if errors.IsNotFound(err) {
    	details = fmt.Sprintf("The image trigger for istag/%s will have no effect because is/%s does not exist.", istag, name)
    }
Don't use "istag/" syntax in messages from a controller. Say "image stream tag foo:bar"
updated
[test]
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin/5761/)
Not trying to re-open this issue for this pull, but why not keep the istag/ syntax?
Because istag/ is a specific construct of one client (the CLI) and has no meaning outside of it.
Squashed down to one commit and updated the commit message
This commit adds failure inspection in the deploymentConfig controller. Every time a deploymentConfig is reconciled (every two minutes), we try to find any details about possible failures and report them to the user via the status.details.message field in the deploymentConfig. Users can use `oc describe dc/config` and see if there is any warning in their config. `oc status` should follow up these changes and report likewise.
Evaluated for origin test up to 0b38d81
This still LGTM but I think @deads2k and @smarterclayton will need to take another look to see if they're okay with the latest verbiage.
I'm fine with the messages. |
[merge]
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/3621/) (Image: devenv-fedora_2465)
Evaluated for origin merge up to 0b38d81 |
Merged by openshift-bot
This PR adds failure inspection in the deploymentConfig controller. Every time a deploymentConfig is reconciled (every two minutes), we try to find any details about possible failures and report them to the user via the status.details.message field in the deploymentConfig. Users can use `oc describe dc/config` and see if there is any warning in their config. `oc status` should follow up these changes and report likewise.

@ironcladlou @deads2k @smarterclayton PTAL
Fixes #3526
Fixes #3311