
Badger datastore performance with English snapshot #85

@lidel

Badger shows consistent issues in go-ipfs 0.8.0 when importing the ~300 GB unpacked wikipedia_en_all_maxi_2021-02 snapshot.

Issues

  • oom-killer stopped the process at 30%, then again at 60%
    • unable to open the resulting repo with 32 GB of RAM (churns until oom-killer kicks in)
    • unable to open with RAM limited to 20 GB; errors after ~30s (see the sketch after this list):
      $ firejail --noprofile --rlimit-as=20000000000 ipfs daemon
      Error: Opening table: "/media/1tb/projects/wikipedia/ipfs-repo/en/badgerds/189654.sst": Unable to map file: "189654.sst": cannot allocate memory
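
The "Unable to map file" error suggests the process runs out of address space while Badger memory-maps every .sst table on open. As a hedged illustration of the underlying knob (this is the badger v1 API that go-ipfs 0.8.0 ships, not a go-ipfs config option), the sketch below opens a badgerds directory with FileIO loading modes, which trade mmap pressure for plain reads; the path is the one from my error message and is only an example.

```go
package main

import (
	"log"

	"github.com/dgraph-io/badger"
	"github.com/dgraph-io/badger/options"
)

func main() {
	// Example path taken from the error above; adjust for your repo.
	dir := "/media/1tb/projects/wikipedia/ipfs-repo/en/badgerds"

	opts := badger.DefaultOptions(dir)
	// Load SST tables and the value log via plain file IO instead of
	// mmap, so opening the store does not require mapping every table
	// into the process address space at once.
	opts.TableLoadingMode = options.FileIO
	opts.ValueLogLoadingMode = options.FileIO

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatalf("opening badger: %v", err)
	}
	defer db.Close()
	log.Println("repo opened without memory-mapping tables")
}
```

This is a sketch of the trade-off only; wiring these options into an ipfs repo would have to go through go-ds-badger, and whether that helps throughput here is exactly what needs testing.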

How to reproduce

Reproduction of relevant import steps:

  1. go-ipfs 0.8.0 from https://dist.ipfs.io/#go-ipfs
  2. zimdump from https://download.openzim.org/nightly/2021-02-12/zim-tools_linux-x86_64-2021-02-12.tar.gz
  3. download wikipedia_en_all_maxi_2021-02.zim (~80GB)
  4. unpack the ZIM (requires ~300GB, ~20 000 000 files):
      $ zimdump dump --dir=wikipedia_en_all_maxi_2021-02 wikipedia_en_all_maxi_2021-02.zim
  5. add the unpacked archive to a fresh ipfs repo with the badger backend (see the sketch after this list):
      $ ipfs init -p badgerds --empty-repo
      $ ipfs config --json 'Experimental.ShardingEnabled' true
      $ ipfs add -r --cid-version 1 --pin=false --offline -Qp ./wikipedia_en_all_maxi_2021-02/
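
Why 32 GB of RAM is not enough becomes clearer if you measure how much table data Badger would try to memory-map when the repo opens. Below is a small hedged helper of my own (not part of go-ipfs or zim-tools) that sums the .sst files in a badgerds directory; with the default MemoryMap loading mode in badger v1, each of these tables gets mapped on open. The path is again just my example.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	// Example badgerds path from the error above; adjust for your repo.
	dir := "/media/1tb/projects/wikipedia/ipfs-repo/en/badgerds"

	var total int64
	var tables int
	err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		// Count every SST table; these are what badger memory-maps
		// under its default table loading mode.
		if !info.IsDir() && strings.HasSuffix(path, ".sst") {
			tables++
			total += info.Size()
		}
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d tables, %.1f GiB to map on open\n",
		tables, float64(total)/(1<<30))
}
```

Running this against a repo that fails to open would tell us whether the mapped table volume alone plausibly exceeds the available RAM/rlimit.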

It would be useful if someone reproduced the memory issue so we know it's not specific to my box.

Things to try

Metadata

Labels

  • P1 - High: Likely tackled by core team if no one steps up
  • dif/expert - Extensive knowledge (implications, ramifications) required
  • effort/weeks - Estimated to take multiple weeks
  • kind/bug - A bug in existing code (including security flaws)
  • need/analysis - Needs further analysis before proceeding
  • snapshots - issues related to snapshot creation and updates
