-
Notifications
You must be signed in to change notification settings - Fork 186
LowCardinality column implementation #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Minor refactoing of columns_ut.cpp: moved timers to utils.h Basic support of LowCardinality column, with draft serialization. LowCardinality implementation LowCardinality fixes: Fixed ColumnLowCardinality::Load to update unique items map Updated ColumnLowCardinality::Swap to validate nested type of other column. Minor fix: moved line around Tests for ColumnLowCardinality moved performance-tests to performance_tests.cpp Added performance test for ColumnLowCardinality(String) and ColumnLowCardinality(FixedString) Added type parser unit-tests for LowCardinality Added client tests for LowCardinality Bumped revision up to indicate that LowCardinality is supported DUMP of current work Implemented ItemView struct, using it as return type of Column::GetItem() Fixed various bugs in ColumnLowCardinality: mostly serialization and deserialization-wise More tests, for LowCardinaility in isolation, unit tests, performance tests, integration tests. Fixed, updated and cleaned up test cases
… has values Updated tests
Moved ItemView to separate header Renamed ColumnLowCardinalityWrapper and renamed it into ColumnLowCardinalityT Udpdated tests accordingly
Comparison of populating, inserting and selecting 1'000'000 items performance-wise:
Please note:
|
Since it is not present in older versions of XCode
Since there is no clickhouse server for MacOS started
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First bunch of comments. Still in progress.
* client_version_patch in ClientInfo * every overload of Column::Swap() now throws an exception on attempt to swap with instance of wrong type. * ColumnNullable::Swap() throws is nested column is of unexpected type. * PausableTimer is now started in paused mode. * few whitespace changes here and there
Updated and fixed all mentioned issues + fixed inconsistent whitespace (at least in code added/updated in this PR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please also revisit and review the Timers code and make it solid and bulletproof and impossible to misuse, as it is arguably the most important code in this PR :) since it measures the performance of a feature whose sole existence is based on the performance benefits it gives.
// however some day they may. | ||
|
||
// prefix | ||
uint64_t key_version; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless it is a conscious performance-motivated decision backed by data, vars are better to always be initialized. (Here and several other places.)
return; | ||
|
||
dictionary_column_->Swap(*col->dictionary_column_); | ||
index_column_.swap(col->index_column_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be index_column_->Swap(*col->index_column_);
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn’t matter here, but for consistency, let’s update it.
|
||
timer.Start(); | ||
column.Save(&ostr); | ||
ostr.Flush(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really want to measure flush too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, since lots of interesting things can happen there, also I am measuring the whole process, the way it is in real cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, however potentially LowCardinality + flush, and regular type + flush numbers may differ insignificantly, if flush is heavy. While isolated LowCardinality may be like several times slower than a regular type, and that difference just dwarfed by flush overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is quite the opposite: serialization/deserialization of LowCardinality
columns is MUCH faster (up to 100 times faster than bare String
and up to 3 times faster than bare FixedSring
columns)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry, my bad. But the idea remains the same, the observer wont be able to properly see that degree of difference.
* Using simpler timer * More clean measurments of SELECT * minor whitespace/braces fixes
…m ColumnLowCardinality + Other minor fixes: mostly whitespace.
After the most recent update to implementation and measurements:
Tested against localhost server, so the IO lag is diminishable. Tested on 1M entries, time is in microseconds. |
LGTM |
Implemented LowCardinality(String) and LowCardinality(FixedString) with native binary serialization and a multitude of tests.
To enable efficient LowCardinality implementation, added some methods to the
Column
interface (and implementation in most derived classes), and addedItemView
class (see the documentation in sources).Also, to enable receiving LC columns from the server, updated client TCP protocol revision to 54405 (
DBMS_MIN_REVISION_WITH_LOW_CARDINALITY_TYPE
). That also forced me to update client/server protocol a bit:Known limitations: