58 changes: 58 additions & 0 deletions install_config/aggregate_logging.adoc
@@ -473,6 +473,64 @@ $ oc delete secret logging-fluentd logging-elasticsearch \
logging-kibana-ops-proxy
----


== Upgrading the EFK Stack

Reviewer: Can we call out specifically that this is 3.1 -> 3.2? Mostly as a placeholder -- when it comes to it, 3.2 -> 3.3 will be either more complicated or automated.

Author: Will update.

=== OpenShift Origin 1.1 to 1.2 / OpenShift Enterprise 3.1 to 3.2

To upgrade your EFK stack to newer images, follow the steps below to
minimize disruption to your log data.

Scale down your Fluentd instances to 0.

----
$ oc scale dc/logging-fluentd --replicas=0
----

Wait until the Fluentd pods have terminated. This gives them time to flush
their current buffers and send any logs they were still processing to
Elasticsearch, which helps prevent loss of data.
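
For example, you can watch for the Fluentd pods to terminate before moving on.
This is a minimal check, assuming the Fluentd pods carry the default
`component=fluentd` label:

----
$ oc get pods -l component=fluentd -w
----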

You can scale down your Kibana instances at this time as well.

----
$ oc scale dc/logging-kibana --replicas=0
$ oc scale dc/logging-kibana-ops --replicas=0 (if applicable)
----

Once your Fluentd and Kibana pods are confirmed to be terminated, we can
safely scale down the Elasticsearch pods.

----
$ oc scale dc/logging-es-{unique_name} --replicas=0
$ oc scale dc/logging-es-ops-{unique_name} --replicas=0 (if applicable)
----

Once your ES pods are confirmed to be terminated, we can pull in the latest
EFK images as described link:../install_config/upgrading/manual_upgrades.html#importing-the-latest-images[here],
replacing the default namespace with the namespace where logging was installed.
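
As a sketch only: if the logging images in your project are tracked by image
streams (the stream names and the `logging` project name below are assumptions;
adjust them to match your installation), re-importing might look like:

----
$ oc import-image -n logging logging-elasticsearch
$ oc import-image -n logging logging-fluentd
$ oc import-image -n logging logging-kibana
----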

With the latest images in your repository, we can begin to scale back up.
Scale ES back up incrementally so that the cluster has time to rebuild.

----
$ oc scale dc/logging-es-{unique_name} --replicas=1
----

Tail the logs of the resulting pod to ensure that it was able to recover its
indices correctly and that there were no errors. If that is successful, we can
do the same for the operations cluster, if one was previously used.
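
For example, to follow the logs of the new ES pod (substitute the actual pod
name reported by `oc get pods`):

----
$ oc logs -f <es_pod_name>
----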

Once all ES nodes have recovered their indices, we can scale the cluster back
up to the size it was prior to the maintenance. It is recommended to check the
logs of the ES members to verify that they have correctly joined the cluster
and recovered.
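
As a sketch, assuming each remaining ES node has its own
logging-es-{unique_name} deployment configuration, as in the scale-down step
above, this might look like:

----
$ oc scale dc/logging-es-{unique_name} --replicas=1 (repeat for each remaining ES deployment)
$ oc logs -f <es_pod_name>
----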

We can now scale Kibana and Fluentd back up to their previous state. Since
Fluentd was shut down and allowed to push its remaining records to ES in the
previous steps, it can now pick back up where it left off with no loss of
logs -- so long as the log files that were not yet read in are still available
on the node.
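
As a sketch, restore the previous replica counts (the counts below are
placeholders; use the values your deployment ran with before the upgrade):

----
$ oc scale dc/logging-fluentd --replicas=<previous_count>
$ oc scale dc/logging-kibana --replicas=<previous_count>
$ oc scale dc/logging-kibana-ops --replicas=<previous_count> (if applicable)
----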

== Troubleshooting Kibana

Using the Kibana console with OpenShift can cause problems that are easily