Skip to content

Conversation

krijohs
Copy link

@krijohs krijohs commented Dec 9, 2024

Change to use metadata instead of context.WithValue to ensure each proxy watcher client has a new stream created with its token.
Previously context.WithValue resulted in streamKeyFromCtx returning an empty string in the clientv3 watcher, causing stream reuse.
When new clients connected to proxy after the token expired (token for the initial client which connected) the reused stream's context would still contain the expired token. This caused auth failures when isWatchPermitted on cluster checked the stream's context resulting in hanging proxy watcher clients.

Issue can be reproduced by setting a low --auth-token-ttl on cluster and connect 1 client first to proxy and then connect a second one after token expired.

@k8s-ci-robot
Copy link

Hi @krijohs. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@krijohs
Copy link
Author

krijohs commented Dec 19, 2024

Added test case which reproduces the issue, with included change it passes but without fails

@krijohs krijohs force-pushed the grpcproxy-client-auth-token branch from 3d3b143 to 8d9e89f Compare December 19, 2024 10:48
@krijohs
Copy link
Author

krijohs commented Jan 16, 2025

Have anyone had the chance to have a look at this? pinging from reveiwers in owners file, @fuweid @ivanvc

@ivanvc
Copy link
Member

ivanvc commented Jan 18, 2025

Hi @krijohs, thanks for your pull request. Ideally, we would want to discuss the issue and possible solutions before a pull request. Could you please open an issue so other members with more expertise in this area can jump in?

Thanks again.

@krijohs
Copy link
Author

krijohs commented Jan 22, 2025

Hello @ivanvc ok, got it will open an issue so possible solutions can be discussed, thanks.

Copy link

stale bot commented Apr 26, 2025

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 26, 2025
@github-actions github-actions bot closed this May 18, 2025
@ivanvc ivanvc reopened this Aug 14, 2025
@siyuanfoundation
Copy link
Contributor

/reopen

@ivanvc
Copy link
Member

ivanvc commented Aug 14, 2025

/ok-to-test

@github-actions github-actions bot removed the stale label Aug 15, 2025
@krijohs krijohs force-pushed the grpcproxy-client-auth-token branch from 8d9e89f to 78a893d Compare August 16, 2025 07:41
Copy link

codecov bot commented Aug 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.96%. Comparing base (431a65a) to head (5edb2be).
⚠️ Report is 110 commits behind head on main.

Additional details and impacted files
Files with missing lines Coverage Δ
server/proxy/grpcproxy/util.go 41.93% <100.00%> (+21.93%) ⬆️

... and 77 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #19033      +/-   ##
==========================================
- Coverage   69.21%   64.96%   -4.26%     
==========================================
  Files         419      419              
  Lines       34745    34770      +25     
==========================================
- Hits        24049    22588    -1461     
- Misses       9300    10815    +1515     
+ Partials     1396     1367      -29     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 431a65a...5edb2be. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@krijohs
Copy link
Author

krijohs commented Aug 18, 2025

/retest

@krijohs krijohs force-pushed the grpcproxy-client-auth-token branch from 100f331 to c6735a5 Compare August 18, 2025 10:40
@ivanvc
Copy link
Member

ivanvc commented Aug 20, 2025

@krijohs, could you please rebase your branch with the latest upstream main branch? I'm having issues running the tests because the base is outdated. Thanks :)

@krijohs krijohs force-pushed the grpcproxy-client-auth-token branch from c6735a5 to 873b3bc Compare August 21, 2025 08:47
@krijohs
Copy link
Author

krijohs commented Aug 21, 2025

@ivanvc sure no problem, just rebased and pushed

@krijohs
Copy link
Author

krijohs commented Aug 21, 2025

/retest

Copy link
Member

@ivanvc ivanvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although not an expert on this part of the code, I ran the tests without the fix to util.go, and they failed.

/cc @ahrtr @serathius

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ivanvc, krijohs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

require.NoError(t, clusterCli.Put(ctx, "/test/1", "test", config.PutOptions{}))
require.NoError(t, err)

time.Sleep(time.Second * 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unnecessary.

Suggested change
time.Sleep(time.Second * 2)

Copy link
Author

@krijohs krijohs Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, was long time ago i submitted PR, but this sleep is to ensure the token has expired when the subsequent watchers connect, to be able to reproduce issue in the e2e test

Comment on lines +40 to +41
md := metadata.Pairs(rpctypes.TokenFieldNameGRPC, token)
return metadata.NewOutgoingContext(ctx, md)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good.

It look like the previous implementation is wrong. All the usage on rpctypes.TokenFieldNameGRPC should be via the package google.golang.org/grpc/metadata (e.g,metadata.FromIncomingContext, metadata.NewOutgoingContext). However, grpcproxy gets the client auto token using grpc/metadata package, but re-add it using raw context WithValue method.

It seems that the grpcprocy couldn't pass the client auth token to the backend etcdserver at all. Could anyone double confirm this? thx

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From testing i noticed that the proxy did pass auth tokens to the etcd server successfully for the initial client that connected. That is, the first client connecting to the proxy worked fine and could authenticate and establish the watch. But after the initial client's token expired subsequent clients that tried to connect could not.
This can be reproduced with the e2e test

@krijohs krijohs force-pushed the grpcproxy-client-auth-token branch from 873b3bc to b1fc707 Compare September 9, 2025 13:11
@krijohs
Copy link
Author

krijohs commented Sep 9, 2025

/retest

@krijohs krijohs force-pushed the grpcproxy-client-auth-token branch from b1fc707 to 5edb2be Compare September 9, 2025 19:54
@krijohs
Copy link
Author

krijohs commented Sep 10, 2025

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants