[ENH] (somewhat) systematic evaluation tool for finding optimal hyperparams. #67

@Davidyz

It'd be nice to have a benchmarking mechanism that tells us whether a modification to the retrieval process or chunking method produces better retrieval results.

Initial plan:
Use a large and powerful reranker as the grader. This grader should produce a score for each keyword-result pair, allowing us to directly measure how well the reranker that VectorCode uses during the query process performs.

  • The grader model should be sufficiently large and able to handle very long contexts (long enough to fit the largest single file in the test repository);
  • Change as few hyperparameters as possible in each run;
  • Potentially implement a grid-search-like procedure for tuning the hyperparams;
  • We'd be able to evaluate changes to the chunking/reranking mechanism.
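
As a rough illustration of the plan above, here is a minimal sketch of a grader-scored grid search. Everything in it is an assumption for discussion, not existing VectorCode code: `retrieve` stands in for whatever runs a query with a given set of hyperparameters, `grade` stands in for the large reranker used as the grader, and the mean score over all keyword-result pairs is just one possible way to aggregate.

```python
from itertools import product
from statistics import mean
from typing import Callable, Mapping, Sequence

# Hypothetical interfaces (assumptions, not VectorCode APIs):
#   retrieve(query, **params) -> retrieved texts for that configuration
#   grade(query, document)    -> relevance score, higher is better
Retriever = Callable[..., Sequence[str]]
Grader = Callable[[str, str], float]


def evaluate(
    retrieve: Retriever,
    grade: Grader,
    queries: Sequence[str],
    params: Mapping[str, object],
) -> float:
    """Mean grader score over every query-result pair for one configuration."""
    scores = [
        grade(query, doc)
        for query in queries
        for doc in retrieve(query, **params)
    ]
    # An empty result set scores 0.0 rather than raising.
    return mean(scores) if scores else 0.0


def grid_search(
    retrieve: Retriever,
    grade: Grader,
    queries: Sequence[str],
    grid: Mapping[str, Sequence[object]],
) -> list[tuple[float, dict[str, object]]]:
    """Score every combination in `grid`; return (score, params), best first."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((evaluate(retrieve, grade, queries, params), params))
    # Sort on the score only, so the parameter dicts are never compared.
    results.sort(key=lambda item: item[0], reverse=True)
    return results
```

To change as few hyperparameters as possible per run, the grid can hold a single value for every parameter except the one being studied, which turns the search into a one-dimensional sweep. The same `evaluate` function could also compare two chunking or reranking implementations by passing a different `retrieve` callable while keeping the grader and queries fixed.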

Metadata

Labels

enhancement: New feature or request
