-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Lower graph build memory by processing GTFS shapes more efficiently #6752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lower graph build memory by processing GTFS shapes more efficiently #6752
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## dev-2.x #6752 +/- ##
=============================================
+ Coverage 71.83% 71.87% +0.03%
Complexity 19130 19130
=============================================
Files 2074 2075 +1
Lines 77784 77754 -30
Branches 7940 7939 -1
=============================================
+ Hits 55879 55888 +9
+ Misses 19090 19054 -36
+ Partials 2815 2812 -3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
f587b18
to
1643897
Compare
1643897
to
7ddd7f2
Compare
application/src/main/java/org/opentripplanner/gtfs/mapping/CompactShape.java
Show resolved
Hide resolved
1342b98
to
f7b2aa7
Compare
I'm trying to skip OBA altogether in a follow up. |
I've now reworked the |
Got it. So for
That readme also says "Trove is quite stable and continues to be useable," which is my experience. It's a simple enough concept that as a library it's basically "done" and doesn't need much maintenance. If I was starting from scratch I'd probably use Fastutil but Trove still works flawlessly everywhere I've used it. I think TIntHashSet and TIntHashMap are still used in various places in OTP so it shouldn't be objectionable. |
Yes, that seems good. I was thinking more of using a sort function that sorts one array based on another, and I thought there was one in Apache Commons but I must have been mistaken. What you did with the map looks like it would work fine, though as you said it takes more memory. Since the ShapePoints are used once and thrown away, couldn't CompactShape just have a method that builds a temporary list of all ShapePoints for that CompactShape, sorts that list, then returns that throwaway list of throwaway objects (or an iterator for that list)? This would increase the memory consumption from one single ShapePoint at a time to all ShapePoints for one shape at a time, but that's still very small and bounded. |
Here's an example of what I mean: This is untested but I think it would have memory consumption closer to your original version. Seems quite readable/maintainable too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, only some minor comments.
application/src/main/java/org/opentripplanner/model/ShapePoint.java
Outdated
Show resolved
Hide resolved
...ation/src/main/java/org/opentripplanner/graph_builder/module/geometry/GeometryProcessor.java
Outdated
Show resolved
Hide resolved
@abyrd I've taken your version and modified it a little bit: I removed the intermediate list generation and went from stream directly to iterator. It probably still materializes a collection somewhere but maybe the standard lib uses some optimization tricks. |
I've merge this, @abyrd, since I believe that I satisfied your requests. If not, add your comments and I will send a follow-up. |
Summary
Lowers the graph build memory by packing GTFS shape points more tightly and getting rid of a lot of duplicated copies of all shape points which would take up large amounts of RAM.
The core of the memory savings are in packing coordinates tightly into an array rather than lists of shape points.
Results
One of my feeds contained 60 million (!) shape points and there I could lower the required memory from 21 to 9GB!
In Helsinki I can see a more modest decrease from 9 to 6GB of memory. However, this is still 30%.
Unit tests
Added and updated.
Documentation
Javadoc.
Bumping the serialization version id
The actual graph entities are not changed and therefore it's not necessary.