-
Notifications
You must be signed in to change notification settings - Fork 411
MSC4186: Simplified Sliding Sync #4186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
806e455
to
8cc34df
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation requirements:
- Client (ideally multiple)
- Server
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current implementations to my knowledge:
- Client implementation: matrix-rust-sdk (PR)
- Server implementation: has been implemented in Synapse across 10s of PRs. (Perhaps @MadLittleMods, the implementer, can give better link(s) here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synapse server implementation
Maybe it's easiest to point to the main directories where the code lives:
element-hq/synapse
->synapse/rest/client/sync.py#L782
element-hq/synapse
->synapse/handlers/sliding_sync/
element-hq/synapse
->tests/rest/client/sliding_sync/
The PRs between @MadLittleMods and @erikjohnston with the ~A-Sync label are also a pretty good encapsulation:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conduwuit has implemented simplified sliding sync in https://github.com/girlbossceo/conduwuit/pull/666
|
||
The current `/sync` endpoint scales badly as the number of rooms on an account increases. It scales badly because all | ||
rooms are returned to the client, incremental syncs are unbounded and slow down based on how long the user has been | ||
offline, and clients cannot opt-out of a large amount of extraneous data such as receipts. On large accounts with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't data be removed with filters? (See RoomFilter
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, I believe you can opt-out of receipts with RoomFilter.ephemeral.not_types = [ "m.read.*" ]
or similar. When I was working on filtering support in grapevine, we had a fast path for filters like this that would skip reading any of the receipt data from the db server-side. That work hasn't been finished yet, and I have no idea if synapse has a similar fast-path.
One of the difficulties with v3 sync filtering in general is that performance can vary wildly wildly depending on server implementation details. The spec doesn't indicate which fast-paths should exist, and whether or not you hit a fast-path can also affect how many events are returned in the sync window (matrix-org/matrix-spec#1887).
// The position to use as the `since` token in the next sliding sync request. | ||
// c.f. Connections. | ||
"pos": "<opaque string>", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there was an effort to standardize pagination terminology?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently SCT is leaning towards: from
and next_batch
instead of pos
everywhere.
// The "heroes" for the room, if there is no room name. Only sent initially and when it changes. | ||
"heroes": [ | ||
{"user_id":"@alice:example.com","displayname":"Alice","avatar_url":"mxc://..."}, | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the heroes
membership state events need to be included in required_state
response when requesting required_state: ["m.room.member", "$LAZY"]
(lazy-loading room members)?
Sync v2 says this:
When lazy-loading room members is enabled, the membership events for the heroes MUST be included in the
state
, unless they are redundant. When the list of users changes, the server notifies the client by sending a fresh list of heroes. If there are no changes since the last sync, this field may be omitted.
But I think that's because m.heroes
in Sync v2 is only a list of user ID's. Whereas, in the sliding sync response here, we already have all of the info necessary in these stripped events.
One alternative is to match Sync v2 and only list the user ID's in heroes
and include the membership events in required_state
.
"heroes": [ | ||
{"user_id":"@alice:example.com","displayname":"Alice","avatar_url":"mxc://..."}, | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One weird corner with the current spec: If the room doesn't have a name set and the user is invited/knock to the room, we don't have any heroes
to provide here. This is because heroes
membership isn't included in the stripped state that the server receives when they are invited/knocked over federation.
Related to matrix-org/matrix-spec#380
This was solved for partial-joins via MSC3943
Not necessarily a blocker but just calling out something that won't work completely until that spec issue is solved.
# Synapse 1.114.0 (2024-09-02) This release enables support for [MSC4186](matrix-org/matrix-spec-proposals#4186) — Simplified Sliding Sync. This allows using the upcoming releases of the Element X mobile apps without having to run a Sliding Sync Proxy. ### Features - Enable native sliding sync support ([MSC3575](matrix-org/matrix-spec-proposals#3575) and [MSC4186](matrix-org/matrix-spec-proposals#4186)) by default. ([\#17648](element-hq/synapse#17648)) # Synapse 1.114.0rc3 (2024-08-30) ### Bugfixes - Fix regression in v1.114.0rc2 that caused workers to fail to start. ([\#17626](element-hq/synapse#17626)) # Synapse 1.114.0rc2 (2024-08-30) ### Features - Improve cross-signing upload when using [MSC3861](matrix-org/matrix-spec-proposals#3861) to use a custom UIA flow stage, with web fallback support. ([\#17509](element-hq/synapse#17509)) - Make `hash_password` script accept password input from stdin. ([\#17608](element-hq/synapse#17608)) ### Bugfixes - Fix hierarchy returning 403 when room is accessible through federation. Contributed by Krishan (@kfiven). ([\#17194](element-hq/synapse#17194)) - Fix content-length on federation `/thumbnail` responses. ([\#17532](element-hq/synapse#17532)) - Fix authenticated media responses using a wrong limit when following redirects over federation. ([\#17543](element-hq/synapse#17543)) ### Internal Changes - MSC3861: load the issuer and account management URLs from OIDC discovery. ([\#17407](element-hq/synapse#17407)) - Refactor sliding sync class into multiple files. ([\#17595](element-hq/synapse#17595)) - Store sliding sync per-connection state in the database. ([\#17599](element-hq/synapse#17599)) - Make the sliding sync `PerConnectionState` class immutable. ([\#17600](element-hq/synapse#17600)) - Add support to `@tag_args` for standalone functions. ([\#17604](element-hq/synapse#17604)) - Speed up incremental syncs in sliding sync by adding some more caching. ([\#17606](element-hq/synapse#17606)) - Always return the user's own read receipts in sliding sync. ([\#17617](element-hq/synapse#17617)) - Replace `isort` and `black` with `ruff`. ([\#17620](element-hq/synapse#17620)) - Refactor sliding sync code to move room list logic out into a separate class. ([\#17622](element-hq/synapse#17622)) ### Updates to locked dependencies * Bump attrs from 23.2.0 to 24.2.0. ([\#17609](element-hq/synapse#17609)) * Bump cryptography from 42.0.8 to 43.0.0. ([\#17584](element-hq/synapse#17584)) * Bump phonenumbers from 8.13.43 to 8.13.44. ([\#17610](element-hq/synapse#17610)) * Bump pygithub from 2.3.0 to 2.4.0. ([\#17612](element-hq/synapse#17612)) * Bump pyyaml from 6.0.1 to 6.0.2. ([\#17611](element-hq/synapse#17611)) * Bump sentry-sdk from 2.12.0 to 2.13.0. ([\#17585](element-hq/synapse#17585)) * Bump serde from 1.0.206 to 1.0.208. ([\#17581](element-hq/synapse#17581)) * Bump serde from 1.0.208 to 1.0.209. ([\#17613](element-hq/synapse#17613)) * Bump serde_json from 1.0.124 to 1.0.125. ([\#17582](element-hq/synapse#17582)) * Bump serde_json from 1.0.125 to 1.0.127. ([\#17614](element-hq/synapse#17614)) * Bump types-jsonschema from 4.23.0.20240712 to 4.23.0.20240813. ([\#17583](element-hq/synapse#17583)) * Bump types-setuptools from 71.1.0.20240726 to 71.1.0.20240818. ([\#17586](element-hq/synapse#17586)) # Synapse 1.114.0rc1 (2024-08-20) ### Features - Add a flag to `/versions`, `org.matrix.simplified_msc3575`, to indicate whether experimental sliding sync support has been enabled. ([\#17571](element-hq/synapse#17571)) - Handle changes in `timeline_limit` in experimental sliding sync. ([\#17579](element-hq/synapse#17579)) - Correctly track read receipts that should be sent down in experimental sliding sync. ([\#17575](element-hq/synapse#17575), [\#17589](element-hq/synapse#17589), [\#17592](element-hq/synapse#17592)) ### Bugfixes - Start handlers for new media endpoints when media resource configured. ([\#17483](element-hq/synapse#17483)) - Fix timeline ordering (using `stream_ordering` instead of topological ordering) in experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17510](element-hq/synapse#17510)) - Fix experimental sliding sync implementation to remember any updates in rooms that were not sent down immediately. ([\#17535](element-hq/synapse#17535)) - Better exclude partially stated rooms if we must await full state in experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17538](element-hq/synapse#17538)) - Handle lower-case http headers in `_Mulitpart_Parser_Protocol`. ([\#17545](element-hq/synapse#17545)) - Fix fetching federation signing keys from servers that omit `old_verify_keys`. Contributed by @tulir @ Beeper. ([\#17568](element-hq/synapse#17568)) - Fix bug where we would respond with an error when a remote server asked for media that had a length of 0, using the new multipart federation media endpoint. ([\#17570](element-hq/synapse#17570)) ### Improved Documentation - Clarify default behaviour of the [`auto_accept_invites.worker_to_run_on`](https://element-hq.github.io/synapse/develop/usage/configuration/config_documentation.html#auto-accept-invites) option. ([\#17515](element-hq/synapse#17515)) - Improve docstrings for profile methods. ([\#17559](element-hq/synapse#17559)) ### Internal Changes - Add more tracing to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17514](element-hq/synapse#17514)) - Fixup comment in sliding sync implementation. ([\#17531](element-hq/synapse#17531)) - Replace override of deprecated method `HTTPAdapter.get_connection` with `get_connection_with_tls_context`. ([\#17536](element-hq/synapse#17536)) - Fix performance of device lists in `/key/changes` and sliding sync. ([\#17537](element-hq/synapse#17537), [\#17548](element-hq/synapse#17548)) - Bump setuptools from 67.6.0 to 72.1.0. ([\#17542](element-hq/synapse#17542)) - Add a utility function for generating random event IDs. ([\#17557](element-hq/synapse#17557)) - Speed up responding to media requests. ([\#17558](element-hq/synapse#17558), [\#17561](element-hq/synapse#17561), [\#17564](element-hq/synapse#17564), [\#17566](element-hq/synapse#17566), [\#17567](element-hq/synapse#17567), [\#17569](element-hq/synapse#17569)) - Test github token before running release script steps. ([\#17562](element-hq/synapse#17562)) - Reduce log spam of multipart files. ([\#17563](element-hq/synapse#17563)) - Refactor per-connection state in experimental sliding sync handler. ([\#17574](element-hq/synapse#17574)) - Add histogram metrics for sliding sync processing time. ([\#17593](element-hq/synapse#17593)) ### Updates to locked dependencies * Bump bytes from 1.6.1 to 1.7.1. ([\#17526](element-hq/synapse#17526)) * Bump lxml from 5.2.2 to 5.3.0. ([\#17550](element-hq/synapse#17550)) * Bump phonenumbers from 8.13.42 to 8.13.43. ([\#17551](element-hq/synapse#17551)) * Bump regex from 1.10.5 to 1.10.6. ([\#17527](element-hq/synapse#17527)) * Bump sentry-sdk from 2.10.0 to 2.12.0. ([\#17553](element-hq/synapse#17553)) * Bump serde from 1.0.204 to 1.0.206. ([\#17556](element-hq/synapse#17556)) * Bump serde_json from 1.0.122 to 1.0.124. ([\#17555](element-hq/synapse#17555)) * Bump sigstore/cosign-installer from 3.5.0 to 3.6.0. ([\#17549](element-hq/synapse#17549)) * Bump types-pyyaml from 6.0.12.20240311 to 6.0.12.20240808. ([\#17552](element-hq/synapse#17552)) * Bump types-requests from 2.31.0.20240406 to 2.32.0.20240712. ([\#17524](element-hq/synapse#17524)) # Synapse 1.113.0 (2024-08-13) No significant changes since 1.113.0rc1. # Synapse 1.113.0rc1 (2024-08-06) ### Features - Track which rooms have been sent to clients in the experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17447](element-hq/synapse#17447)) - Add Account Data extension support to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17477](element-hq/synapse#17477)) - Add receipts extension support to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17489](element-hq/synapse#17489)) - Add typing notification extension support to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17505](element-hq/synapse#17505)) ### Bugfixes - Update experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint to handle invite/knock rooms when filtering. ([\#17450](element-hq/synapse#17450)) - Fix a bug introduced in v1.110.0 which caused `/keys/query` to return incomplete results, leading to high network activity and CPU usage on Matrix clients. ([\#17499](element-hq/synapse#17499)) ### Improved Documentation - Update the [`allowed_local_3pids`](https://element-hq.github.io/synapse/v1.112/usage/configuration/config_documentation.html#allowed_local_3pids) config option's msisdn address to a working example. ([\#17476](element-hq/synapse#17476)) ### Internal Changes - Change sliding sync to use their own token format in preparation for storing per-connection state. ([\#17452](element-hq/synapse#17452)) - Ensure we don't send down negative `bump_stamp` in experimental sliding sync endpoint. ([\#17478](element-hq/synapse#17478)) - Do not send down empty room entries down experimental sliding sync endpoint. ([\#17479](element-hq/synapse#17479)) - Refactor Sliding Sync tests to better utilize the `SlidingSyncBase`. ([\#17481](element-hq/synapse#17481), [\#17482](element-hq/synapse#17482)) - Add some opentracing tags and logging to the experimental sliding sync implementation. ([\#17501](element-hq/synapse#17501)) - Split and move Sliding Sync tests so we have some more sane test file sizes. ([\#17504](element-hq/synapse#17504)) - Update the `limited` field description in the Sliding Sync response to accurately describe what it actually represents. ([\#17507](element-hq/synapse#17507)) - Easier to understand `timeline` assertions in Sliding Sync tests. ([\#17511](element-hq/synapse#17511)) - Reset the sliding sync connection if we don't recognize the per-connection state position. ([\#17529](element-hq/synapse#17529)) ### Updates to locked dependencies * Bump bcrypt from 4.1.3 to 4.2.0. ([\#17495](element-hq/synapse#17495)) * Bump black from 24.4.2 to 24.8.0. ([\#17522](element-hq/synapse#17522)) * Bump phonenumbers from 8.13.39 to 8.13.42. ([\#17521](element-hq/synapse#17521)) * Bump ruff from 0.5.4 to 0.5.5. ([\#17494](element-hq/synapse#17494)) * Bump serde_json from 1.0.120 to 1.0.121. ([\#17493](element-hq/synapse#17493)) * Bump serde_json from 1.0.121 to 1.0.122. ([\#17525](element-hq/synapse#17525)) * Bump towncrier from 23.11.0 to 24.7.1. ([\#17523](element-hq/synapse#17523)) * Bump types-pyopenssl from 24.1.0.20240425 to 24.1.0.20240722. ([\#17496](element-hq/synapse#17496)) * Bump types-setuptools from 70.1.0.20240627 to 71.1.0.20240726. ([\#17497](element-hq/synapse#17497))
// Flag which only returns rooms which have an `m.room.encryption` state event. If unset, | ||
// both encrypted and unencrypted rooms are returned. If false, only unencrypted rooms | ||
// are returned. If true, only encrypted rooms are returned. | ||
"is_invite": true|false|null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will knocks and left rooms be handled? Maybe this should be a string and be the current membership of the user?
#18964) Regressed in #18900 (comment) (see conversation there for more context) ### How is this a regression? > To give this an update with more hindsight; this logic *was* redundant with the early return and it is safe to remove this complexity :white_check_mark: > > It seems like this actually has to do with completed vs incomplete deferreds... > > To explain how things previously worked *without* the early-return shortcut: > > With the normal case of **incomplete awaitable**, we store the `calling_context` and the `f` function is called and runs until it yields to the reactor. Because `f` follows the logcontext rules, it sets the `sentinel` logcontext. Then in `run_in_background(...)`, we restore the `calling_context`, store the current `ctx` (which is `sentinel`) and return. When the deferred completes, we restore `ctx` (which is `sentinel`) before yielding to the reactor again (all good :white_check_mark:) > > With the other case where we see a **completed awaitable**, we store the `calling_context` and the `f` function is called and runs to completion (no logcontext change). *This is where the shortcut would kick in but I'm going to continue explaining as if we commented out the shortcut.* -- Then in `run_in_background(...)`, we restore the `calling_context`, store the current `ctx` (which is same as the `calling_context`). Because the deferred is already completed, our extra callback is called immediately and we restore `ctx` (which is same as the `calling_context`). Since we never yield to the reactor, the `calling_context` is perfect as that's what we want again (all good :white_check_mark:) > > --- > > But this also means that our early-return shortcut is no longer just an optimization and is *necessary* to act correctly in the **completed awaitable** case as we want to return with the `calling_context` and not reset to the `sentinel` context. I've updated the comment in #18964 to explain the necessity as it's currently just described as an optimization. > > But because we made the same change to `run_coroutine_in_background(...)` which didn't have the same early-return shortcut, we regressed the correct behavior ❌ . This is being fixed in #18964 > > > *-- @MadLittleMods, #18900 (comment) ### How did we find this problem? Spawning from @wrjlewis [seeing](https://matrix.to/#/!SGNQGPGUwtcPBUotTL:matrix.org/$h3TxxPVlqC6BTL07dbrsz6PmaUoZxLiXnSTEY-QYDtA?via=jki.re&via=matrix.org&via=element.io) `Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost` in the logs: <details> <summary>More logs</summary> ``` synapse.http.request_metrics - 222 - ERROR - sentinel - Trying to stop RequestMetrics in the sentinel context. 2025-09-23 14:43:19,712 - synapse.util.metrics - 212 - WARNING - sentinel - Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost 2025-09-23 14:43:19,713 - synapse.rest.client.sync - 851 - INFO - sentinel - Client has disconnected; not serializing response. 2025-09-23 14:43:19,713 - synapse.http.server - 825 - WARNING - sentinel - Not sending response to request <XForwardedForRequest at 0x7f23e8111ed0 method='POST' uri='/_matrix/client/unstable/org.matrix.simplified_msc3575/sync?pos=281963%2Fs929324_147053_10_2652457_147960_2013_25554_4709564_0_164_2&timeout=30000' clientproto='HTTP/1.1' site='8008'>, already dis connected. 2025-09-23 14:43:19,713 - synapse.access.http.8008 - 515 - INFO - sentinel - 92.40.194.87 - 8008 - {@me:wi11.co.uk} Processed request: 30.005sec/-8.041sec (0.001sec, 0.000sec) (0.000sec/0.002sec/2) 0B 200! "POST /_matrix/client/unstable/org.matrix.simplified_msc3575/ ``` </details> From the logs there, we can see things relating to `typing.get_new_events` and `/_matrix/client/unstable/org.matrix.simplified_msc3575/sync` which led me to trying out Sliding Sync with the typing extension enabled and allowed me to reproduce the problem locally. Sliding Sync is a unique scenario as it's the only place we use `gather_optional_coroutines(...)` -> `run_coroutine_in_background(...)` (introduced in #17884) to exhibit this behavior. ### Testing strategy 1. Configure Synapse to enable [MSC4186](matrix-org/matrix-spec-proposals#4186): Simplified Sliding Sync which is actually under [MSC3575](matrix-org/matrix-spec-proposals#3575) ```yaml experimental_features: msc3575_enabled: true ``` 1. Start synapse: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Make a Sliding Sync request with one of the extensions enabled ```http POST http://localhost:8008/_matrix/client/unstable/org.matrix.simplified_msc3575/sync { "lists": {}, "room_subscriptions": { "!FlgJYGQKAIvAscfBhq:my.synapse.linux.server": { "required_state": [], "timeline_limit": 1 } }, "extensions": { "typing": { "enabled": true } } } ``` 1. Open your homeserver logs and notice warnings about `Starting ... from sentinel context: metrics will be lost`
A simpler alternative is to use `/messages` to fetch the history. This has two main problems: 1) clients generally want | ||
to preload history for multiple rooms at once, and 2) `/messages` can be slow if it tries to backfill over federation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Connection exhaustion makes sense, and I guess having a separate API that is bulk /messages is essentially this API so...
Rooms are ordered by last activity, based on when the last event in the room was received by the server. The exact | ||
ordering is determined by the server implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works but do we want to standardize the language? -> matrix-org/matrix-spec#1334 (comment)
Previous discussion: #4186 (comment)
Rooms are ordered by last activity, based on when the last event in the room was received by the server. The exact | |
ordering is determined by the server implementation. | |
The order of rooms is always done based on the arrival time of the most recent event for the room on the homeserver. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to take a little bit of care here, as "arrival time" correlates but isn't exactly the ordering we use in Synapse for example. In more distributed HS implementations this may also depend on the order in which a particular instance or whatever saw the events.
Rooms are ordered by last activity, based on when the last event in the room was received by the server. The exact | ||
ordering is determined by the server implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exact ordering is determined by the server implementation.
What does this mean? I'm struggling to come up with how servers would put their own spin on things
| `membership` | `string` | No | The current membership of the user, or omitted if user not in room (for peeking). | | ||
| `lists` | `[string]` | No | The name of the lists that match this room. The field is omitted if it doesn't match any list and is included only due to a subscription. | | ||
|
||
#### Joined/left/banned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#### Joined/left/banned | |
#### Joined/left/banned (assuming previously joined) |
A user could have been invite
-> ban
in which case we wouldn't have these fields and should act like an invited room.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh, good point. What do we send down in the transition from invite
-> leave
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #4186 (comment)
We should include that information ^
0 to ensure the server doesn't block waiting for new data. This can easily happen if the app starts and sends the first | ||
request with a `since` parameter, if the client shows a spinner but doesn't set a timeout then the request may take a | ||
long time to return (if there were no updates to return). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related info added to the MSC:
matrix-spec-proposals/proposals/4186-simplified-sliding-sync.md
Lines 560 to 561 in 985e71f
The client can keep increasing the list range in increments to pull in the full list of rooms. The client uses the | |
returned `count` for the list to know when to stop expanding the list. |
#18964) Regressed in #18900 (comment) (see conversation there for more context) ### How is this a regression? > To give this an update with more hindsight; this logic *was* redundant with the early return and it is safe to remove this complexity :white_check_mark: > > It seems like this actually has to do with completed vs incomplete deferreds... > > To explain how things previously worked *without* the early-return shortcut: > > With the normal case of **incomplete awaitable**, we store the `calling_context` and the `f` function is called and runs until it yields to the reactor. Because `f` follows the logcontext rules, it sets the `sentinel` logcontext. Then in `run_in_background(...)`, we restore the `calling_context`, store the current `ctx` (which is `sentinel`) and return. When the deferred completes, we restore `ctx` (which is `sentinel`) before yielding to the reactor again (all good :white_check_mark:) > > With the other case where we see a **completed awaitable**, we store the `calling_context` and the `f` function is called and runs to completion (no logcontext change). *This is where the shortcut would kick in but I'm going to continue explaining as if we commented out the shortcut.* -- Then in `run_in_background(...)`, we restore the `calling_context`, store the current `ctx` (which is same as the `calling_context`). Because the deferred is already completed, our extra callback is called immediately and we restore `ctx` (which is same as the `calling_context`). Since we never yield to the reactor, the `calling_context` is perfect as that's what we want again (all good :white_check_mark:) > > --- > > But this also means that our early-return shortcut is no longer just an optimization and is *necessary* to act correctly in the **completed awaitable** case as we want to return with the `calling_context` and not reset to the `sentinel` context. I've updated the comment in #18964 to explain the necessity as it's currently just described as an optimization. > > But because we made the same change to `run_coroutine_in_background(...)` which didn't have the same early-return shortcut, we regressed the correct behavior ❌ . This is being fixed in #18964 > > > *-- @MadLittleMods, #18900 (comment) ### How did we find this problem? Spawning from @wrjlewis [seeing](https://matrix.to/#/!SGNQGPGUwtcPBUotTL:matrix.org/$h3TxxPVlqC6BTL07dbrsz6PmaUoZxLiXnSTEY-QYDtA?via=jki.re&via=matrix.org&via=element.io) `Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost` in the logs: <details> <summary>More logs</summary> ``` synapse.http.request_metrics - 222 - ERROR - sentinel - Trying to stop RequestMetrics in the sentinel context. 2025-09-23 14:43:19,712 - synapse.util.metrics - 212 - WARNING - sentinel - Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost 2025-09-23 14:43:19,713 - synapse.rest.client.sync - 851 - INFO - sentinel - Client has disconnected; not serializing response. 2025-09-23 14:43:19,713 - synapse.http.server - 825 - WARNING - sentinel - Not sending response to request <XForwardedForRequest at 0x7f23e8111ed0 method='POST' uri='/_matrix/client/unstable/org.matrix.simplified_msc3575/sync?pos=281963%2Fs929324_147053_10_2652457_147960_2013_25554_4709564_0_164_2&timeout=30000' clientproto='HTTP/1.1' site='8008'>, already dis connected. 2025-09-23 14:43:19,713 - synapse.access.http.8008 - 515 - INFO - sentinel - 92.40.194.87 - 8008 - {@me:wi11.co.uk} Processed request: 30.005sec/-8.041sec (0.001sec, 0.000sec) (0.000sec/0.002sec/2) 0B 200! "POST /_matrix/client/unstable/org.matrix.simplified_msc3575/ ``` </details> From the logs there, we can see things relating to `typing.get_new_events` and `/_matrix/client/unstable/org.matrix.simplified_msc3575/sync` which led me to trying out Sliding Sync with the typing extension enabled and allowed me to reproduce the problem locally. Sliding Sync is a unique scenario as it's the only place we use `gather_optional_coroutines(...)` -> `run_coroutine_in_background(...)` (introduced in #17884) to exhibit this behavior. ### Testing strategy 1. Configure Synapse to enable [MSC4186](matrix-org/matrix-spec-proposals#4186): Simplified Sliding Sync which is actually under [MSC3575](matrix-org/matrix-spec-proposals#3575) ```yaml experimental_features: msc3575_enabled: true ``` 1. Start synapse: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Make a Sliding Sync request with one of the extensions enabled ```http POST http://localhost:8008/_matrix/client/unstable/org.matrix.simplified_msc3575/sync { "lists": {}, "room_subscriptions": { "!FlgJYGQKAIvAscfBhq:my.synapse.linux.server": { "required_state": [], "timeline_limit": 1 } }, "extensions": { "typing": { "enabled": true } } } ``` 1. Open your homeserver logs and notice warnings about `Starting ... from sentinel context: metrics will be lost`
Co-authored-by: Patrick Cloke <[email protected]>
Co-authored-by: Eric Eastwood <[email protected]>
Co-authored-by: Eric Eastwood <[email protected]>
Rendered