Description
Overview of the Issue
When a service has a service-router configuration, reaching it through the Consul API gateway results in an HTTP 503 error.
This does not affect services that have only a service-resolver configuration, or requests that reach the service through a normal Envoy Connect proxy.
I'm trying to do blue-green deployments with Nomad and Consul in our test environment. We use the Consul API gateway for ingress routing and are looking at Consul service routing to switch traffic on deployment.
Reproduction Steps
Run a Consul agent locally with consul agent -dev, then run the following script to set up the Consul configs.
# The official nginx image listens on port 80 inside the container
docker run --rm --name nginx -p 8080:80 -d nginx
# Not sure if this is necessary, but my cluster has it
cat <<EOF | consul config write /dev/stdin
Kind = "proxy-defaults"
Name = "global"
Config {
  protocol = "http"
}
EOF
# Create apigw config
cat <<EOF | consul config write /dev/stdin
Kind = "api-gateway"
Name = "api-gateway"
Listeners = [
  {
    Port     = 8081
    Name     = "http"
    Protocol = "http"
  }
]
EOF
# Register our web service
consul services register -name=web -port=8080
# Route everything to web
cat <<EOF | consul config write /dev/stdin
Kind = "http-route"
Name = "web"
// Rules define how requests will be routed
Rules = [
  {
    Matches = [
      {
        Path = {
          Match = "prefix"
          Value = "/"
        }
      }
    ]
    Services = [
      {
        Name = "web"
      }
    ]
  }
]
Parents = [
  {
    Kind        = "api-gateway"
    Name        = "api-gateway"
    SectionName = "http"
  }
]
EOF
# Probably doesn't need service-intentions, but keep it here just in case
cat <<EOF | consul config write /dev/stdin
Kind = "service-intentions"
Name = "web"
Sources = [
  {
    Name   = "api-gateway"
    Action = "allow"
  }
]
EOF
# Resolver config. Only blue (untagged) is actually used here
# Note: the backticks here are escaped for bash.
cat <<EOF | consul config write /dev/stdin
Kind = "service-resolver"
Name = "web"
DefaultSubset = "blue"
Subsets = {
  blue = {
    Filter      = "Service.Tags is empty or \`blue\` in Service.Tags"
    OnlyPassing = true
  }
  green = {
    Filter      = "\`green\` in Service.Tags"
    OnlyPassing = true
  }
  beta = {
    Filter      = "\`beta\` in Service.Tags"
    OnlyPassing = true
  }
}
EOF
cat <<EOF | consul config write /dev/stdin
Kind = "service-router"
Name = "web"
Routes = [
  {
    Match {}
    Destination {
      Service = "web"
      # Just to mark that this config is in effect
      ResponseHeaders = {
        Add = {
          "x-match" = "1"
        }
      }
    }
  }
]
EOF
Run envoy proxy with consul connect envoy -gateway=api -service=api-gateway -register -- --log-level debug
Send HTTP request with curl -v http://127.0.0.1:8081
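To make the result of that request easier to compare between runs, the response can be reduced to its status code and whether the x-match header from the service-router is present. This is only a rough sketch: summarize_response is a hypothetical helper name, and it assumes the input is the output of curl -si (status line first, then headers).

```shell
# summarize_response (hypothetical helper): read `curl -si` output on stdin and
# print the HTTP status code plus whether an x-match response header is present.
summarize_response() {
  awk '
    { sub(/\r$/, "") }                            # drop the CR from CRLF line endings
    NR == 1 { status = $2 }                       # e.g. "HTTP/1.1 503 Service Unavailable"
    $0 == "" { exit }                             # blank line: headers are over
    tolower($0) ~ /^x-match:/ { matched = "yes" } # header added by the service-router
    END { printf "status=%s x-match=%s\n", status, (matched ? matched : "no") }
  '
}

# Live usage against the gateway listener on port 8081 from the script above:
#   curl -si http://127.0.0.1:8081 | summarize_response
```

Running it once with the service-router in place and once after deleting it should show whether the 503 follows the router config.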
Envoy logs the following messages, which might be useful:
[2025-08-13 16:47:44.380][720135][debug][router] [source/common/router/router.cc:522] [Tags: "ConnectionId":"1","StreamId":"16551638292998683569"] unknown cluster 'web.default.dc1.internal.48dee96d-3403-b310-c8d5-205cf35791a7.consul'
[2025-08-13 16:47:44.380][720135][debug][http] [source/common/http/filter_manager.cc:1040] [Tags: "ConnectionId":"1","StreamId":"16551638292998683569"] Preparing local reply with details cluster_not_found
Meanwhile, Consul's xDS-related log line after writing the service-router or restarting the gateway is:
2025-08-13T16:42:09.966-0400 [DEBUG] agent.envoy.xds: generating cluster for: service_id=api-gateway xdsVersion=v3 cluster=blue.web.default.dc1.internal.48dee96d-3403-b310-c8d5-205cf35791a7.consul
Note the blue. prefix in the cluster name on the Consul side and the lack of it on the Envoy side.
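One way to double-check the mismatch without reading debug logs is to list the clusters Envoy actually holds via its admin interface. The sketch below assumes the admin listener is at 127.0.0.1:19000 (which I believe is the default for consul connect envoy; adjust it if -admin-bind was set) and that /clusters prints lines of the form name::field::value. cluster_names is a hypothetical helper name.

```shell
# cluster_names (hypothetical helper): read Envoy admin /clusters output on stdin
# and print each distinct cluster name (the part before the first "::").
cluster_names() {
  awk -F'::' 'NF > 1 { print $1 }' | sort -u
}

# Live usage (assumes the Envoy admin interface is at 127.0.0.1:19000):
#   curl -s http://127.0.0.1:19000/clusters | cluster_names | grep 'web\.'
```

If only the blue.web... name shows up while the router log asks for web..., that would match the two log lines above.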
Run consul config delete -kind service-router -name web to delete the service router. In my test environment (on AWS) this restores connectivity to the service, but I can't get it to work locally: the local dev setup returns no healthy upstream. I probably made a mistake somewhere in this reproduction script.
Consul info for both Client and Server
Client & Server info (dev agent)
agent:
    check_monitors = 0
    check_ttls = 0
    checks = 0
    services = 2
build:
    prerelease =
    revision =
    version = 1.21.3
    version_metadata =
consul:
    acl = disabled
    bootstrap = false
    known_datacenters = 1
    leader = true
    leader_addr = 127.0.0.1:8300
    server = true
raft:
    applied_index = 134
    commit_index = 134
    fsm_pending = 0
    last_contact = 0
    last_log_index = 134
    last_log_term = 2
    last_snapshot_index = 0
    last_snapshot_term = 0
    latest_configuration = [{Suffrage:Voter ID:4051693c-c0e0-ae1e-29e5-39807c8c73d8 Address:127.0.0.1:8300}]
    latest_configuration_index = 0
    num_peers = 0
    protocol_version = 3
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Leader
    term = 2
runtime:
    arch = amd64
    cpu_count = 14
    goroutines = 228
    max_procs = 14
    os = linux
    version = go1.24.5
serf_lan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 1
    event_time = 2
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 1
    members = 1
    query_queue = 0
    query_time = 1
serf_wan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 0
    event_time = 1
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 1
    members = 1
    query_queue = 0
    query_time = 1
Operating system and Environment details
Arch Linux, x86_64.