-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[PM-25138] Optimize code paths for creating default collections #6308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[PM-25138] Optimize code paths for creating default collections #6308
Conversation
New Issues (1)Checkmarx found the following issues in this Pull Request
|
|
// Offload some work from SQL Server by pre-sorting before insert. | ||
// This lets us use the SqlBulkCopy.ColumnOrderHints to improve performance. | ||
collectionUsers = collectionUsers | ||
.OrderBy(cu => cu.CollectionId) | ||
.ThenBy(cu => cu.OrganizationUserId) | ||
.ToList(); | ||
|
||
using var bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.KeepIdentity, transaction); | ||
bulkCopy.DestinationTableName = "[dbo].[CollectionUser]"; | ||
bulkCopy.BatchSize = 500; | ||
bulkCopy.BulkCopyTimeout = 120; | ||
bulkCopy.EnableStreaming = true; | ||
bulkCopy.ColumnOrderHints.Add("CollectionId", SortOrder.Ascending); | ||
bulkCopy.ColumnOrderHints.Add("OrganizationUserId", SortOrder.Ascending); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this will apply to the "import vault" code path as well, but these seem like sensible defaults for both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rkac-bw this implements your suggestions to pre-sort the data. However, I am getting test failures:
Microsoft.Data.SqlClient.SqlException : Cannot bulk load. The bulk data stream was incorrectly specified as sorted or the data violates a uniqueness constraint imposed by the target table. Sort order incorrect for the following two rows: primary key of first row: (0564afea-b76a-427d-bcf5-b354000c067b), primary key of second row: (aad77ce5-5dd6-444b-b11a-b354000c067b).
The docs say that sort order must match the clustered index definition, but best I can tell it does (Collection
uses Id
as its clustered index, CollectionUser
uses OrganizationId
and OrganizationUserId
).
However, just through trial and error I have found that sorting (and providing SqlBulkCopyColumnOrderHint
s) for just Collection.OrganizationId
and either CollectionUser.OrganizationId
or OrganizationUserId
(but not both) works fine. This doesn't seem to match the documentation though. Any advice?
🎟️ Tracking
https://bitwarden.atlassian.net/browse/PM-25138
📔 Objective
We currently use the same bulk method to create default collections for 1 user or for many users. However, that bulk creation method is optimized for the top end of town: it uses SqlBulkCopy, intended for large orgs. This is causing deadlock issues in tests. dbops advice is that it's best for >500 inserts at once, so we need to rethink about where and how this is used.
This PR switches the "single user" code paths back to using single user sprocs to avoid this overhead. This reverts some of the changes in #6153 which switched over to the bulk method.
It also uses
GenerateComb
and pre-sorts SqlBulkCopy data per dbops advice.There is a follow-up ticket to write a bulk sproc (not using SqlBulkCopy) for the 1-500 range.
📸 Screenshots
⏰ Reminders before review
🦮 Reviewer guidelines
:+1:
) or similar for great changes:memo:
) or ℹ️ (:information_source:
) for notes or general info:question:
) for questions:thinking:
) or 💭 (:thought_balloon:
) for more open inquiry that's not quite a confirmed issue and could potentially benefit from discussion:art:
) for suggestions / improvements:x:
) or:warning:
) for more significant problems or concerns needing attention:seedling:
) or ♻️ (:recycle:
) for future improvements or indications of technical debt:pick:
) for minor or nitpick changes