Skip to content

Conversation

erikjohnston
Copy link
Member

@erikjohnston erikjohnston commented Aug 30, 2024

@erikjohnston erikjohnston marked this pull request as ready for review August 30, 2024 09:59
@anoadragon453 anoadragon453 added feature Suggestion for a significant extension which needs considerable consideration proposal A matrix spec change proposal kind:core MSC which is critical to the protocol's success labels Aug 30, 2024
@uhoreg uhoreg added the client-server Client-Server API label Aug 30, 2024
@turt2live turt2live added the needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. label Aug 30, 2024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation requirements:

  • Client (ideally multiple)
  • Server

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current implementations to my knowledge:

  • Client implementation: matrix-rust-sdk (PR)
  • Server implementation: has been implemented in Synapse across 10s of PRs. (Perhaps @MadLittleMods, the implementer, can give better link(s) here.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conduwuit has implemented simplified sliding sync in https://github.com/girlbossceo/conduwuit/pull/666


The current `/sync` endpoint scales badly as the number of rooms on an account increases. It scales badly because all
rooms are returned to the client, incremental syncs are unbounded and slow down based on how long the user has been
offline, and clients cannot opt-out of a large amount of extraneous data such as receipts. On large accounts with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't data be removed with filters? (See RoomFilter)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, I believe you can opt-out of receipts with RoomFilter.ephemeral.not_types = [ "m.read.*" ] or similar. When I was working on filtering support in grapevine, we had a fast path for filters like this that would skip reading any of the receipt data from the db server-side. That work hasn't been finished yet, and I have no idea if synapse has a similar fast-path.

One of the difficulties with v3 sync filtering in general is that performance can vary wildly wildly depending on server implementation details. The spec doesn't indicate which fast-paths should exist, and whether or not you hit a fast-path can also affect how many events are returned in the sync window (matrix-org/matrix-spec#1887).

Comment on lines 234 to 236
// The position to use as the `since` token in the next sliding sync request.
// c.f. Connections.
"pos": "<opaque string>",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there was an effort to standardize pagination terminology?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently SCT is leaning towards: from and next_batch instead of pos everywhere.

// The "heroes" for the room, if there is no room name. Only sent initially and when it changes.
"heroes": [
{"user_id":"@alice:example.com","displayname":"Alice","avatar_url":"mxc://..."},
],
Copy link
Contributor

@MadLittleMods MadLittleMods Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the heroes membership state events need to be included in required_state response when requesting required_state: ["m.room.member", "$LAZY"] (lazy-loading room members)?

Sync v2 says this:

When lazy-loading room members is enabled, the membership events for the heroes MUST be included in the state, unless they are redundant. When the list of users changes, the server notifies the client by sending a fresh list of heroes. If there are no changes since the last sync, this field may be omitted.

But I think that's because m.heroes in Sync v2 is only a list of user ID's. Whereas, in the sliding sync response here, we already have all of the info necessary in these stripped events.

One alternative is to match Sync v2 and only list the user ID's in heroes and include the membership events in required_state.

Comment on lines 256 to 258
"heroes": [
{"user_id":"@alice:example.com","displayname":"Alice","avatar_url":"mxc://..."},
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One weird corner with the current spec: If the room doesn't have a name set and the user is invited/knock to the room, we don't have any heroes to provide here. This is because heroes membership isn't included in the stripped state that the server receives when they are invited/knocked over federation.

Related to matrix-org/matrix-spec#380

This was solved for partial-joins via MSC3943

Not necessarily a blocker but just calling out something that won't work completely until that spec issue is solved.

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Sep 14, 2024
# Synapse 1.114.0 (2024-09-02)

This release enables support for
[MSC4186](matrix-org/matrix-spec-proposals#4186) —
Simplified Sliding Sync. This allows using the upcoming releases of the Element
X mobile apps without having to run a Sliding Sync Proxy.


### Features

- Enable native sliding sync support ([MSC3575](matrix-org/matrix-spec-proposals#3575) and [MSC4186](matrix-org/matrix-spec-proposals#4186)) by default. ([\#17648](element-hq/synapse#17648))




# Synapse 1.114.0rc3 (2024-08-30)

### Bugfixes

- Fix regression in v1.114.0rc2 that caused workers to fail to start. ([\#17626](element-hq/synapse#17626))




# Synapse 1.114.0rc2 (2024-08-30)

### Features

- Improve cross-signing upload when using [MSC3861](matrix-org/matrix-spec-proposals#3861) to use a custom UIA flow stage, with web fallback support. ([\#17509](element-hq/synapse#17509))
- Make `hash_password` script accept password input from stdin. ([\#17608](element-hq/synapse#17608))

### Bugfixes

- Fix hierarchy returning 403 when room is accessible through federation. Contributed by Krishan (@kfiven). ([\#17194](element-hq/synapse#17194))
- Fix content-length on federation `/thumbnail` responses. ([\#17532](element-hq/synapse#17532))
- Fix authenticated media responses using a wrong limit when following redirects over federation. ([\#17543](element-hq/synapse#17543))

### Internal Changes

- MSC3861: load the issuer and account management URLs from OIDC discovery. ([\#17407](element-hq/synapse#17407))
- Refactor sliding sync class into multiple files. ([\#17595](element-hq/synapse#17595))
- Store sliding sync per-connection state in the database. ([\#17599](element-hq/synapse#17599))
- Make the sliding sync `PerConnectionState` class immutable. ([\#17600](element-hq/synapse#17600))
- Add support to `@tag_args` for standalone functions. ([\#17604](element-hq/synapse#17604))
- Speed up incremental syncs in sliding sync by adding some more caching. ([\#17606](element-hq/synapse#17606))
- Always return the user's own read receipts in sliding sync. ([\#17617](element-hq/synapse#17617))
- Replace `isort` and `black` with `ruff`. ([\#17620](element-hq/synapse#17620))
- Refactor sliding sync code to move room list logic out into a separate class. ([\#17622](element-hq/synapse#17622))



### Updates to locked dependencies

* Bump attrs from 23.2.0 to 24.2.0. ([\#17609](element-hq/synapse#17609))
* Bump cryptography from 42.0.8 to 43.0.0. ([\#17584](element-hq/synapse#17584))
* Bump phonenumbers from 8.13.43 to 8.13.44. ([\#17610](element-hq/synapse#17610))
* Bump pygithub from 2.3.0 to 2.4.0. ([\#17612](element-hq/synapse#17612))
* Bump pyyaml from 6.0.1 to 6.0.2. ([\#17611](element-hq/synapse#17611))
* Bump sentry-sdk from 2.12.0 to 2.13.0. ([\#17585](element-hq/synapse#17585))
* Bump serde from 1.0.206 to 1.0.208. ([\#17581](element-hq/synapse#17581))
* Bump serde from 1.0.208 to 1.0.209. ([\#17613](element-hq/synapse#17613))
* Bump serde_json from 1.0.124 to 1.0.125. ([\#17582](element-hq/synapse#17582))
* Bump serde_json from 1.0.125 to 1.0.127. ([\#17614](element-hq/synapse#17614))
* Bump types-jsonschema from 4.23.0.20240712 to 4.23.0.20240813. ([\#17583](element-hq/synapse#17583))
* Bump types-setuptools from 71.1.0.20240726 to 71.1.0.20240818. ([\#17586](element-hq/synapse#17586))

# Synapse 1.114.0rc1 (2024-08-20)

### Features

- Add a flag to `/versions`, `org.matrix.simplified_msc3575`, to indicate whether experimental sliding sync support has been enabled. ([\#17571](element-hq/synapse#17571))
- Handle changes in `timeline_limit` in experimental sliding sync. ([\#17579](element-hq/synapse#17579))
- Correctly track read receipts that should be sent down in experimental sliding sync. ([\#17575](element-hq/synapse#17575), [\#17589](element-hq/synapse#17589), [\#17592](element-hq/synapse#17592))

### Bugfixes

- Start handlers for new media endpoints when media resource configured. ([\#17483](element-hq/synapse#17483))
- Fix timeline ordering (using `stream_ordering` instead of topological ordering) in experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17510](element-hq/synapse#17510))
- Fix experimental sliding sync implementation to remember any updates in rooms that were not sent down immediately. ([\#17535](element-hq/synapse#17535))
- Better exclude partially stated rooms if we must await full state in experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17538](element-hq/synapse#17538))
- Handle lower-case http headers in `_Mulitpart_Parser_Protocol`. ([\#17545](element-hq/synapse#17545))
- Fix fetching federation signing keys from servers that omit `old_verify_keys`. Contributed by @tulir @ Beeper. ([\#17568](element-hq/synapse#17568))
- Fix bug where we would respond with an error when a remote server asked for media that had a length of 0, using the new multipart federation media endpoint. ([\#17570](element-hq/synapse#17570))

### Improved Documentation

- Clarify default behaviour of the
  [`auto_accept_invites.worker_to_run_on`](https://element-hq.github.io/synapse/develop/usage/configuration/config_documentation.html#auto-accept-invites)
  option. ([\#17515](element-hq/synapse#17515))
- Improve docstrings for profile methods. ([\#17559](element-hq/synapse#17559))

### Internal Changes

- Add more tracing to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17514](element-hq/synapse#17514))
- Fixup comment in sliding sync implementation. ([\#17531](element-hq/synapse#17531))
- Replace override of deprecated method `HTTPAdapter.get_connection` with `get_connection_with_tls_context`. ([\#17536](element-hq/synapse#17536))
- Fix performance of device lists in `/key/changes` and sliding sync. ([\#17537](element-hq/synapse#17537), [\#17548](element-hq/synapse#17548))
- Bump setuptools from 67.6.0 to 72.1.0. ([\#17542](element-hq/synapse#17542))
- Add a utility function for generating random event IDs. ([\#17557](element-hq/synapse#17557))
- Speed up responding to media requests. ([\#17558](element-hq/synapse#17558), [\#17561](element-hq/synapse#17561), [\#17564](element-hq/synapse#17564), [\#17566](element-hq/synapse#17566), [\#17567](element-hq/synapse#17567), [\#17569](element-hq/synapse#17569))
- Test github token before running release script steps. ([\#17562](element-hq/synapse#17562))
- Reduce log spam of multipart files. ([\#17563](element-hq/synapse#17563))
- Refactor per-connection state in experimental sliding sync handler. ([\#17574](element-hq/synapse#17574))
- Add histogram metrics for sliding sync processing time. ([\#17593](element-hq/synapse#17593))



### Updates to locked dependencies

* Bump bytes from 1.6.1 to 1.7.1. ([\#17526](element-hq/synapse#17526))
* Bump lxml from 5.2.2 to 5.3.0. ([\#17550](element-hq/synapse#17550))
* Bump phonenumbers from 8.13.42 to 8.13.43. ([\#17551](element-hq/synapse#17551))
* Bump regex from 1.10.5 to 1.10.6. ([\#17527](element-hq/synapse#17527))
* Bump sentry-sdk from 2.10.0 to 2.12.0. ([\#17553](element-hq/synapse#17553))
* Bump serde from 1.0.204 to 1.0.206. ([\#17556](element-hq/synapse#17556))
* Bump serde_json from 1.0.122 to 1.0.124. ([\#17555](element-hq/synapse#17555))
* Bump sigstore/cosign-installer from 3.5.0 to 3.6.0. ([\#17549](element-hq/synapse#17549))
* Bump types-pyyaml from 6.0.12.20240311 to 6.0.12.20240808. ([\#17552](element-hq/synapse#17552))
* Bump types-requests from 2.31.0.20240406 to 2.32.0.20240712. ([\#17524](element-hq/synapse#17524))

# Synapse 1.113.0 (2024-08-13)

No significant changes since 1.113.0rc1.




# Synapse 1.113.0rc1 (2024-08-06)

### Features

- Track which rooms have been sent to clients in the experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17447](element-hq/synapse#17447))
- Add Account Data extension support to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17477](element-hq/synapse#17477))
- Add receipts extension support to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17489](element-hq/synapse#17489))
- Add typing notification extension support to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17505](element-hq/synapse#17505))

### Bugfixes

- Update experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint to handle invite/knock rooms when filtering. ([\#17450](element-hq/synapse#17450))
- Fix a bug introduced in v1.110.0 which caused `/keys/query` to return incomplete results, leading to high network activity and CPU usage on Matrix clients. ([\#17499](element-hq/synapse#17499))

### Improved Documentation

- Update the [`allowed_local_3pids`](https://element-hq.github.io/synapse/v1.112/usage/configuration/config_documentation.html#allowed_local_3pids) config option's msisdn address to a working example. ([\#17476](element-hq/synapse#17476))

### Internal Changes

- Change sliding sync to use their own token format in preparation for storing per-connection state. ([\#17452](element-hq/synapse#17452))
- Ensure we don't send down negative `bump_stamp` in experimental sliding sync endpoint. ([\#17478](element-hq/synapse#17478))
- Do not send down empty room entries down experimental sliding sync endpoint. ([\#17479](element-hq/synapse#17479))
- Refactor Sliding Sync tests to better utilize the `SlidingSyncBase`. ([\#17481](element-hq/synapse#17481), [\#17482](element-hq/synapse#17482))
- Add some opentracing tags and logging to the experimental sliding sync implementation. ([\#17501](element-hq/synapse#17501))
- Split and move Sliding Sync tests so we have some more sane test file sizes. ([\#17504](element-hq/synapse#17504))
- Update the `limited` field description in the Sliding Sync response to accurately describe what it actually represents. ([\#17507](element-hq/synapse#17507))
- Easier to understand `timeline` assertions in Sliding Sync tests. ([\#17511](element-hq/synapse#17511))
- Reset the sliding sync connection if we don't recognize the per-connection state position. ([\#17529](element-hq/synapse#17529))



### Updates to locked dependencies

* Bump bcrypt from 4.1.3 to 4.2.0. ([\#17495](element-hq/synapse#17495))
* Bump black from 24.4.2 to 24.8.0. ([\#17522](element-hq/synapse#17522))
* Bump phonenumbers from 8.13.39 to 8.13.42. ([\#17521](element-hq/synapse#17521))
* Bump ruff from 0.5.4 to 0.5.5. ([\#17494](element-hq/synapse#17494))
* Bump serde_json from 1.0.120 to 1.0.121. ([\#17493](element-hq/synapse#17493))
* Bump serde_json from 1.0.121 to 1.0.122. ([\#17525](element-hq/synapse#17525))
* Bump towncrier from 23.11.0 to 24.7.1. ([\#17523](element-hq/synapse#17523))
* Bump types-pyopenssl from 24.1.0.20240425 to 24.1.0.20240722. ([\#17496](element-hq/synapse#17496))
* Bump types-setuptools from 70.1.0.20240627 to 71.1.0.20240726. ([\#17497](element-hq/synapse#17497))
// Flag which only returns rooms which have an `m.room.encryption` state event. If unset,
// both encrypted and unencrypted rooms are returned. If false, only unencrypted rooms
// are returned. If true, only encrypted rooms are returned.
"is_invite": true|false|null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will knocks and left rooms be handled? Maybe this should be a string and be the current membership of the user?

MadLittleMods added a commit to element-hq/synapse that referenced this pull request Sep 24, 2025
#18964)

Regressed in
#18900 (comment)
(see conversation there for more context)


### How is this a regression?

> To give this an update with more hindsight; this logic *was* redundant
with the early return and it is safe to remove this complexity
:white_check_mark:
> 
> It seems like this actually has to do with completed vs incomplete
deferreds...
> 
> To explain how things previously worked *without* the early-return
shortcut:
> 
> With the normal case of **incomplete awaitable**, we store the
`calling_context` and the `f` function is called and runs until it
yields to the reactor. Because `f` follows the logcontext rules, it sets
the `sentinel` logcontext. Then in `run_in_background(...)`, we restore
the `calling_context`, store the current `ctx` (which is `sentinel`) and
return. When the deferred completes, we restore `ctx` (which is
`sentinel`) before yielding to the reactor again (all good
:white_check_mark:)
> 
> With the other case where we see a **completed awaitable**, we store
the `calling_context` and the `f` function is called and runs to
completion (no logcontext change). *This is where the shortcut would
kick in but I'm going to continue explaining as if we commented out the
shortcut.* -- Then in `run_in_background(...)`, we restore the
`calling_context`, store the current `ctx` (which is same as the
`calling_context`). Because the deferred is already completed, our extra
callback is called immediately and we restore `ctx` (which is same as
the `calling_context`). Since we never yield to the reactor, the
`calling_context` is perfect as that's what we want again (all good
:white_check_mark:)
> 
> ---
> 
> But this also means that our early-return shortcut is no longer just
an optimization and is *necessary* to act correctly in the **completed
awaitable** case as we want to return with the `calling_context` and not
reset to the `sentinel` context. I've updated the comment in
#18964 to explain the
necessity as it's currently just described as an optimization.
> 
> But because we made the same change to
`run_coroutine_in_background(...)` which didn't have the same
early-return shortcut, we regressed the correct behavior ❌ . This is
being fixed in #18964
>
>
> *-- @MadLittleMods,
#18900 (comment)

### How did we find this problem?

Spawning from @wrjlewis
[seeing](https://matrix.to/#/!SGNQGPGUwtcPBUotTL:matrix.org/$h3TxxPVlqC6BTL07dbrsz6PmaUoZxLiXnSTEY-QYDtA?via=jki.re&via=matrix.org&via=element.io)
`Starting metrics collection 'typing.get_new_events' from sentinel
context: metrics will be lost` in the logs:

<details>
<summary>More logs</summary>

```
synapse.http.request_metrics - 222 - ERROR - sentinel - Trying to stop RequestMetrics in the sentinel context.
2025-09-23 14:43:19,712 - synapse.util.metrics - 212 - WARNING - sentinel - Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost
2025-09-23 14:43:19,713 - synapse.rest.client.sync - 851 - INFO - sentinel - Client has disconnected; not serializing response.
2025-09-23 14:43:19,713 - synapse.http.server - 825 - WARNING - sentinel - Not sending response to request <XForwardedForRequest at 0x7f23e8111ed0 method='POST' uri='/_matrix/client/unstable/org.matrix.simplified_msc3575/sync?pos=281963%2Fs929324_147053_10_2652457_147960_2013_25554_4709564_0_164_2&timeout=30000' clientproto='HTTP/1.1' site='8008'>, already dis
connected.
2025-09-23 14:43:19,713 - synapse.access.http.8008 - 515 - INFO - sentinel - 92.40.194.87 - 8008 - {@me:wi11.co.uk} Processed request: 30.005sec/-8.041sec (0.001sec, 0.000sec) (0.000sec/0.002sec/2) 0B 200! "POST /_matrix/client/unstable/org.matrix.simplified_msc3575/
```

</details>

From the logs there, we can see things relating to
`typing.get_new_events` and
`/_matrix/client/unstable/org.matrix.simplified_msc3575/sync` which led
me to trying out Sliding Sync with the typing extension enabled and
allowed me to reproduce the problem locally. Sliding Sync is a unique
scenario as it's the only place we use `gather_optional_coroutines(...)`
-> `run_coroutine_in_background(...)` (introduced in
#17884) to exhibit this
behavior.


### Testing strategy

1. Configure Synapse to enable
[MSC4186](matrix-org/matrix-spec-proposals#4186):
Simplified Sliding Sync which is actually under
[MSC3575](matrix-org/matrix-spec-proposals#3575)
    ```yaml
    experimental_features:
      msc3575_enabled: true
    ```
1. Start synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
 1. Make a Sliding Sync request with one of the extensions enabled
    ```http
POST
http://localhost:8008/_matrix/client/unstable/org.matrix.simplified_msc3575/sync
    {
      "lists": {},
      "room_subscriptions": {
            "!FlgJYGQKAIvAscfBhq:my.synapse.linux.server": {
                "required_state": [],
                "timeline_limit": 1
            }
        },
        "extensions": {
            "typing": {
                "enabled": true
            }
        }
    }
    ```
1. Open your homeserver logs and notice warnings about `Starting ...
from sentinel context: metrics will be lost`
@turt2live turt2live moved this from Ready for general review to Proposed for FCP readiness in Spec Core Team Workflow Sep 24, 2025
Comment on lines +592 to +593
A simpler alternative is to use `/messages` to fetch the history. This has two main problems: 1) clients generally want
to preload history for multiple rooms at once, and 2) `/messages` can be slow if it tries to backfill over federation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Connection exhaustion makes sense, and I guess having a separate API that is bulk /messages is essentially this API so...

Comment on lines +110 to +111
Rooms are ordered by last activity, based on when the last event in the room was received by the server. The exact
ordering is determined by the server implementation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works but do we want to standardize the language? -> matrix-org/matrix-spec#1334 (comment)

Previous discussion: #4186 (comment)

Suggested change
Rooms are ordered by last activity, based on when the last event in the room was received by the server. The exact
ordering is determined by the server implementation.
The order of rooms is always done based on the arrival time of the most recent event for the room on the homeserver.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to take a little bit of care here, as "arrival time" correlates but isn't exactly the ordering we use in Synapse for example. In more distributed HS implementations this may also depend on the order in which a particular instance or whatever saw the events.

Comment on lines +110 to +111
Rooms are ordered by last activity, based on when the last event in the room was received by the server. The exact
ordering is determined by the server implementation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exact ordering is determined by the server implementation.

What does this mean? I'm struggling to come up with how servers would put their own spin on things

| `membership` | `string` | No | The current membership of the user, or omitted if user not in room (for peeking). |
| `lists` | `[string]` | No | The name of the lists that match this room. The field is omitted if it doesn't match any list and is included only due to a subscription. |

#### Joined/left/banned
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#### Joined/left/banned
#### Joined/left/banned (assuming previously joined)

A user could have been invite -> ban in which case we wouldn't have these fields and should act like an invited room.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh, good point. What do we send down in the transition from invite -> leave?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #4186 (comment)

We should include that information ^

0 to ensure the server doesn't block waiting for new data. This can easily happen if the app starts and sends the first
request with a `since` parameter, if the client shows a spinner but doesn't set a timeout then the request may take a
long time to return (if there were no updates to return).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related info added to the MSC:

The client can keep increasing the list range in increments to pull in the full list of rooms. The client uses the
returned `count` for the list to know when to stop expanding the list.

anoadragon453 pushed a commit to element-hq/synapse that referenced this pull request Sep 25, 2025
#18964)

Regressed in
#18900 (comment)
(see conversation there for more context)


### How is this a regression?

> To give this an update with more hindsight; this logic *was* redundant
with the early return and it is safe to remove this complexity
:white_check_mark:
> 
> It seems like this actually has to do with completed vs incomplete
deferreds...
> 
> To explain how things previously worked *without* the early-return
shortcut:
> 
> With the normal case of **incomplete awaitable**, we store the
`calling_context` and the `f` function is called and runs until it
yields to the reactor. Because `f` follows the logcontext rules, it sets
the `sentinel` logcontext. Then in `run_in_background(...)`, we restore
the `calling_context`, store the current `ctx` (which is `sentinel`) and
return. When the deferred completes, we restore `ctx` (which is
`sentinel`) before yielding to the reactor again (all good
:white_check_mark:)
> 
> With the other case where we see a **completed awaitable**, we store
the `calling_context` and the `f` function is called and runs to
completion (no logcontext change). *This is where the shortcut would
kick in but I'm going to continue explaining as if we commented out the
shortcut.* -- Then in `run_in_background(...)`, we restore the
`calling_context`, store the current `ctx` (which is same as the
`calling_context`). Because the deferred is already completed, our extra
callback is called immediately and we restore `ctx` (which is same as
the `calling_context`). Since we never yield to the reactor, the
`calling_context` is perfect as that's what we want again (all good
:white_check_mark:)
> 
> ---
> 
> But this also means that our early-return shortcut is no longer just
an optimization and is *necessary* to act correctly in the **completed
awaitable** case as we want to return with the `calling_context` and not
reset to the `sentinel` context. I've updated the comment in
#18964 to explain the
necessity as it's currently just described as an optimization.
> 
> But because we made the same change to
`run_coroutine_in_background(...)` which didn't have the same
early-return shortcut, we regressed the correct behavior ❌ . This is
being fixed in #18964
>
>
> *-- @MadLittleMods,
#18900 (comment)

### How did we find this problem?

Spawning from @wrjlewis
[seeing](https://matrix.to/#/!SGNQGPGUwtcPBUotTL:matrix.org/$h3TxxPVlqC6BTL07dbrsz6PmaUoZxLiXnSTEY-QYDtA?via=jki.re&via=matrix.org&via=element.io)
`Starting metrics collection 'typing.get_new_events' from sentinel
context: metrics will be lost` in the logs:

<details>
<summary>More logs</summary>

```
synapse.http.request_metrics - 222 - ERROR - sentinel - Trying to stop RequestMetrics in the sentinel context.
2025-09-23 14:43:19,712 - synapse.util.metrics - 212 - WARNING - sentinel - Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost
2025-09-23 14:43:19,713 - synapse.rest.client.sync - 851 - INFO - sentinel - Client has disconnected; not serializing response.
2025-09-23 14:43:19,713 - synapse.http.server - 825 - WARNING - sentinel - Not sending response to request <XForwardedForRequest at 0x7f23e8111ed0 method='POST' uri='/_matrix/client/unstable/org.matrix.simplified_msc3575/sync?pos=281963%2Fs929324_147053_10_2652457_147960_2013_25554_4709564_0_164_2&timeout=30000' clientproto='HTTP/1.1' site='8008'>, already dis
connected.
2025-09-23 14:43:19,713 - synapse.access.http.8008 - 515 - INFO - sentinel - 92.40.194.87 - 8008 - {@me:wi11.co.uk} Processed request: 30.005sec/-8.041sec (0.001sec, 0.000sec) (0.000sec/0.002sec/2) 0B 200! "POST /_matrix/client/unstable/org.matrix.simplified_msc3575/
```

</details>

From the logs there, we can see things relating to
`typing.get_new_events` and
`/_matrix/client/unstable/org.matrix.simplified_msc3575/sync` which led
me to trying out Sliding Sync with the typing extension enabled and
allowed me to reproduce the problem locally. Sliding Sync is a unique
scenario as it's the only place we use `gather_optional_coroutines(...)`
-> `run_coroutine_in_background(...)` (introduced in
#17884) to exhibit this
behavior.


### Testing strategy

1. Configure Synapse to enable
[MSC4186](matrix-org/matrix-spec-proposals#4186):
Simplified Sliding Sync which is actually under
[MSC3575](matrix-org/matrix-spec-proposals#3575)
    ```yaml
    experimental_features:
      msc3575_enabled: true
    ```
1. Start synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
 1. Make a Sliding Sync request with one of the extensions enabled
    ```http
POST
http://localhost:8008/_matrix/client/unstable/org.matrix.simplified_msc3575/sync
    {
      "lists": {},
      "room_subscriptions": {
            "!FlgJYGQKAIvAscfBhq:my.synapse.linux.server": {
                "required_state": [],
                "timeline_limit": 1
            }
        },
        "extensions": {
            "typing": {
                "enabled": true
            }
        }
    }
    ```
1. Open your homeserver logs and notice warnings about `Starting ...
from sentinel context: metrics will be lost`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00-weekly-pings Tracking for weekly pings in the SCT office. 00 to make it first in the labels list. client-server Client-Server API feature Suggestion for a significant extension which needs considerable consideration kind:core MSC which is critical to the protocol's success matrix-2.0 Required for Matrix 2.0 needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. proposal A matrix spec change proposal
Projects
Status: Proposed for FCP readiness
Development

Successfully merging this pull request may close these issues.