diff --git a/install_config/aggregate_logging.adoc b/install_config/aggregate_logging.adoc
index 106556a828ed..dd35b7caf41c 100644
--- a/install_config/aggregate_logging.adoc
+++ b/install_config/aggregate_logging.adoc
@@ -473,6 +473,64 @@ $ oc delete secret logging-fluentd logging-elasticsearch \
     logging-kibana-ops-proxy
 ----
+
+== Upgrading the EFK Stack
+
+=== OpenShift Origin 1.1 to 1.2 / OpenShift Enterprise 3.1 to 3.2
+
+To upgrade your EFK stack to newer images, the following steps help
+minimize disruption to your log data.
+
+Scale down your Fluentd instances to 0:
+
+----
+$ oc scale dc/logging-fluentd --replicas=0
+----
+
+Wait until they have terminated. This gives them time to flush their
+current buffers and send any logs they were processing to Elasticsearch,
+which helps prevent loss of data.
+
+You can scale down your Kibana instances at this time as well:
+
+----
+$ oc scale dc/logging-kibana --replicas=0
+$ oc scale dc/logging-kibana-ops --replicas=0 (if applicable)
+----
+
+Once your Fluentd and Kibana pods are confirmed to have terminated, you can
+safely scale down the Elasticsearch pods:
+
+----
+$ oc scale dc/logging-es-{unique_name} --replicas=0
+$ oc scale dc/logging-es-ops-{unique_name} --replicas=0 (if applicable)
+----
+
+Once your Elasticsearch pods are confirmed to have terminated, pull in the latest
+EFK images as described link:../install_config/upgrading/manual_upgrades.html#importing-the-latest-images[here],
+replacing the default namespace with the namespace where logging was installed.
+
+With the latest images in your repository, you can begin to scale back up.
+Scale Elasticsearch back up incrementally so that the cluster has time to rebuild:
+
+----
+$ oc scale dc/logging-es-{unique_name} --replicas=1
+----
+
+Tail the logs of the resulting pod to ensure that it recovered its indices
+correctly and that there were no errors. If that is successful, do the same
+for the operations cluster, if one was previously used.
+
+Once all Elasticsearch nodes have recovered their indices, scale the cluster
+back up to the size it was before the maintenance. Check the logs of the
+Elasticsearch members to verify that they have correctly joined the cluster
+and recovered.
+
+You can now scale Kibana and Fluentd back up to their previous state. Because
+Fluentd was shut down and allowed to push its remaining records to Elasticsearch
+in the previous steps, it can now pick up where it left off with no loss of
+logs, as long as the log files that were not yet read are still available on the node.
+
 
 == Troubleshooting Kibana
 
 Using the Kibana console with OpenShift can cause problems that are easily
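
For the image import step referenced in the patch above, a minimal sketch of pulling the newer logging images on an OpenShift Origin node is shown below; the `openshift/origin-logging-*` image names and the `latest` tag are assumptions for illustration, and OpenShift Enterprise installations use different image names, so follow the linked manual upgrade steps for your environment.

----
# Hypothetical example: pull the newer Origin logging images onto each node
# that runs EFK pods (adjust image names and tags to match your installation).
$ docker pull docker.io/openshift/origin-logging-fluentd:latest
$ docker pull docker.io/openshift/origin-logging-elasticsearch:latest
$ docker pull docker.io/openshift/origin-logging-kibana:latest
$ docker pull docker.io/openshift/origin-logging-auth-proxy:latest
----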