Service with service-router configured cannot get routed by consul api gateway #22575

@Holi0317

Overview of the Issue

When a service has a service-router configuration, reaching it through the Consul API gateway results in an HTTP 503 error.

This does not affect services that have only a service-resolver configuration, nor does it affect reaching the service through a normal Envoy Connect proxy.

I'm trying to do blue-green deployments with Nomad and Consul in our test environment. We use the Consul API gateway for ingress routing and are looking at Consul service routing to switch traffic on deployment.


Reproduction Steps

Run a Consul agent locally with consul agent -dev, then run the following script to set up the Consul config entries.

# The official nginx image listens on port 80 inside the container
docker run --rm --name nginx -p 8080:80 -d nginx

# Not sure whether this is necessary, but my cluster has it
cat <<EOF | consul config write /dev/stdin
Kind = "proxy-defaults"
Name = "global"

Config {
  protocol = "http"
}
EOF

# Create apigw config
cat <<EOF | consul config write /dev/stdin
Kind = "api-gateway"
Name = "api-gateway"

Listeners = [
  {
    Port     = 8081
    Name     = "http"
    Protocol = "http"
  }
]
EOF

# Register our web service
consul services register -name=web -port=8080

# Route everything to web
cat <<EOF | consul config write /dev/stdin
Kind = "http-route"
Name = "web"

// Rules define how requests will be routed
Rules = [
  {
    Matches = [
      {
        Path = {
          Match = "prefix"
          Value = "/"
        }
      }
    ]
    Services = [
      {
        Name = "web"
      }
    ]
  }
]

Parents = [
  {
    Kind        = "api-gateway"
    Name        = "api-gateway"
    SectionName = "http"
  }
]
EOF

# service-intentions is probably not needed, but it is kept here just in
# case
cat <<EOF | consul config write /dev/stdin
Kind = "service-intentions"
Name = "web"

Sources = [
  {
    Name   = "api-gateway"
    Action = "allow"
  }
]
EOF

# Resolver config. Only blue (untagged) is actually used here.
# Note: the backticks are escaped for bash.
cat <<EOF | consul config write /dev/stdin
Kind = "service-resolver"
Name = "web"

DefaultSubset = "blue"

Subsets = {
  blue = {
    Filter      = "Service.Tags is empty or \`blue\` in Service.Tags"
    OnlyPassing = true
  }
  green = {
    Filter      = "\`green\` in Service.Tags"
    OnlyPassing = true
  }
  beta = {
    Filter      = "\`beta\` in Service.Tags"
    OnlyPassing = true
  }
}
EOF

cat <<EOF | consul config write /dev/stdin
Kind = "service-router"
Name = "web"

Routes = [
  {
    Match {}

    Destination {
      Service = "web"

      # Marker header to confirm that this config is in effect
      ResponseHeaders = {
        Add = {
          "x-match" = "1"
        }
      }
    }
  }
]
EOF

Run envoy proxy with consul connect envoy -gateway=api -service=api-gateway -register -- --log-level debug

Send HTTP request with curl -v http://127.0.0.1:8081

Envoy logs the following messages, which might be useful:

[2025-08-13 16:47:44.380][720135][debug][router] [source/common/router/router.cc:522] [Tags: "ConnectionId":"1","StreamId":"16551638292998683569"] unknown cluster 'web.default.dc1.internal.48dee96d-3403-b310-c8d5-205cf35791a7.consul'
[2025-08-13 16:47:44.380][720135][debug][http] [source/common/http/filter_manager.cc:1040] [Tags: "ConnectionId":"1","StreamId":"16551638292998683569"] Preparing local reply with details cluster_not_found

Meanwhile, the xDS-related log line that Consul emits after writing the service-router or restarting the gateway is:

2025-08-13T16:42:09.966-0400 [DEBUG] agent.envoy.xds: generating cluster for: service_id=api-gateway xdsVersion=v3 cluster=blue.web.default.dc1.internal.48dee96d-3403-b310-c8d5-205cf35791a7.consul

Note the blue. prefix on the cluster name that Consul generates, and its absence from the cluster name that Envoy tries to route to.
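The mismatch can be made explicit with a small bash sketch. The two cluster names are copied verbatim from the log lines above; the helper name subset_prefix is hypothetical and only compares strings, it does not talk to Consul or Envoy.

```shell
#!/usr/bin/env bash
# Cluster name Envoy tried to route to (from the "unknown cluster" log line).
envoy_cluster='web.default.dc1.internal.48dee96d-3403-b310-c8d5-205cf35791a7.consul'
# Cluster name Consul generated over xDS (from the agent.envoy.xds log line).
consul_cluster='blue.web.default.dc1.internal.48dee96d-3403-b310-c8d5-205cf35791a7.consul'

# subset_prefix (hypothetical helper) prints the leading label that the
# generated name carries and the requested name lacks. It prints nothing
# if the generated name is not "<prefix>.<requested>".
subset_prefix() {
  local requested=$1 generated=$2
  case "$generated" in
    *".$requested") printf '%s\n' "${generated%".$requested"}" ;;
  esac
}

subset_prefix "$envoy_cluster" "$consul_cluster"
```

With the names from this issue, the only difference is the resolver's default subset name in front of the otherwise identical cluster name.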

Run consul config delete -kind service-router -name web to delete the service router. In my test environment (on AWS) this restores connectivity to the service, but I can't get it to work locally: the local dev setup returns no healthy upstream. I guess I made a mistake somewhere in this reproduction script.
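To compare what Envoy actually holds before and after deleting the router, the cluster names can be listed from the Envoy admin endpoint. This is a rough sketch, not part of the original reproduction: it assumes the admin interface is on localhost:19000 (which I believe is the default -admin-bind for consul connect envoy) and that each line of /clusters output starts with the cluster name followed by "::". The helper name list_clusters is hypothetical.

```shell
#!/usr/bin/env bash
# list_clusters (hypothetical helper) reads Envoy /clusters text output on
# stdin and prints each distinct cluster name once, sorted.
# Assumption: every stats line has the form "<cluster name>::<rest>".
list_clusters() {
  awk -F'::' 'NF > 1 { print $1 }' | sort -u
}

# Example usage against a running gateway (assumed admin address):
#   curl -s http://localhost:19000/clusters | list_clusters
```

If the gateway only lists blue.web... while the route still points at web..., that would match the cluster_not_found reply in the Envoy log.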

Consul info for both Client and Server

Client & Server info (dev agent)
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 2
build:
	prerelease = 
	revision = 
	version = 1.21.3
	version_metadata = 
consul:
	acl = disabled
	bootstrap = false
	known_datacenters = 1
	leader = true
	leader_addr = 127.0.0.1:8300
	server = true
raft:
	applied_index = 134
	commit_index = 134
	fsm_pending = 0
	last_contact = 0
	last_log_index = 134
	last_log_term = 2
	last_snapshot_index = 0
	last_snapshot_term = 0
	latest_configuration = [{Suffrage:Voter ID:4051693c-c0e0-ae1e-29e5-39807c8c73d8 Address:127.0.0.1:8300}]
	latest_configuration_index = 0
	num_peers = 0
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 2
runtime:
	arch = amd64
	cpu_count = 14
	goroutines = 228
	max_procs = 14
	os = linux
	version = go1.24.5
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 1
	event_time = 2
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1

Operating system and Environment details

Arch Linux, x86_64.
