Skip to content

Conversation

declark1
Copy link
Collaborator

@declark1 declark1 commented Oct 1, 2024

This PR is a consolidation of several changes related to clients.

  • Refactors clients, including splitting detector implementations into standalone clients; see Refactor clients to better align with requirements #217 for additional details.
    • TextContextChatDetectorClient is a placeholder and not fully implemented yet
  • Adds OpenAiClient with chat completions and completions support
  • Adds GenerationProvider::OpenAi variant
  • Adds ChatGenerationConfig (OrchestratorConfig.chat_generation field)
  • Adds DetectorType (DetectorConfig.type field)
  • Requires that only the hostname (as opposed to url) is set for client services (ServiceConfig.hostname field) and adds hostname validation
  • Health check related tweaks / simplifications
    • The /info response structure is changed as clients are no longer grouped by type. Example of new format with all changes:
      {
          "services": {
              "generation": {
        	      "status": "HEALTHY"
              },
              "chunker1": {
        	      "status": "HEALTHY"
              },
              "detector1": {
        	      "status": "UNKNOWN",
        	      "code": 404,
        	      "reason": "Not Found"
              },
              "detector2": {
        	      "status": "UNKNOWN",
        	      "code": 404,
        	      "reason": "Not Found"
              }
          }
      }
  • Moved clients::Error and HttpClient and related items out of clients.rs to separate files

Closes #217
Closes #165
Closes #142
Closes #191
Closes #194
Closes #198

@declark1 declark1 changed the title Client refactor (DRAFT) Client refactor Oct 2, 2024
@declark1 declark1 mentioned this pull request Oct 3, 2024
2 tasks
declark1 and others added 3 commits October 3, 2024 11:37
@declark1 declark1 marked this pull request as ready for review October 4, 2024 18:59
@declark1 declark1 requested review from mdevino and pscoro October 4, 2024 19:00
Signed-off-by: declark1 <[email protected]>
Copy link
Collaborator

@pscoro pscoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. I had one idea for a potential improvement that I was digging into on Friday but I think it may be over-engineered, I'll mention it anyway:

Every client implementor seems to have exactly one "inner client" thats either a tonic or http client. The tonic grpc clients need to be cloned in every handler function, and the http ones are accessed by reference. We could potentially add an associated Inner type and a client() to access it in the Client trait. For clients with an HTTP inner client, this is implemented to return &HttpClient and for tonic a clone of the client, e.g. NlpServiceClient<LoadBalancedChannel>. This on its own doesn't sound too bad but the catch is that it will introduce the need for AnyClient, because the ClientMap:

pub struct ClientMap(HashMap<String, Box<dyn Client>>);

will now need the client to specify the Inner type (e.g. Client<Inner = &HttpClient>), so instead we can use something like:

pub struct ClientMap(HashMap<String, Box<dyn AnyClient>>);

Setting up this type-erased blanket-implemented trait is doable and there may be more use cases for it in the future but for now I think it may just add more confusing code to the project without adding much value, all it would really add as benefit for now is that any client just needs to call client() to get their inner client, and in the case of grpc this means we could get rid of the clone line thats at the top of every handler.

…e OpenAI-specific items, drop Completions API.

Signed-off-by: declark1 <[email protected]>
Copy link
Collaborator

@gkumbhat gkumbhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactor Dan. Left a few comments, suggestions and questions


/// Health status determined for or returned by a client service.
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "UPPERCASE")]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

panic!("Unexpected error during client health probing: {}", e);
});
info!("Probing client health...");
let client_health = self.client_health(true).await;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be donw in a tokio task so that it doesn't block startup ? (we can address it in future, but just wanted to make a note here)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume Orchestrator::on_start_up() tasks, i.e. checks, should "complete" prior to startup, but I don't really have an opinion on this. @pscoro any thoughts on this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On one side, its good to check if everything is wired up correctly, but then in case connections and this verification takes time, it will halt orchestrator bootup and look like orchestrator is having issue.

So I think we should make this async and put out error levels logs if this startup check fails for some downstreams.

(can be handled in separate PR though)

Copy link
Collaborator Author

@declark1 declark1 Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do recall that we wanted to leverage health checks as an extension of config validation, so that we catch misconfigured services at start-up. I believe this was the original intention of running probes at start-up.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, thats accurate, but then we realized that this has a unwanted side-effect that it will make orchestrator's startup to fail when even 1 of the downstream is not responding. And so it looks like orchestrator is having issue (from looking at deployment), but the real problem is one of the detectors.

So the idea pivoted to orchestrator doing this check, but instead of blocking its own startup / boot, it will expose this information with API call.

Copy link
Collaborator

@pscoro pscoro Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume Orchestrator::on_start_up() tasks, i.e. checks, should "complete" prior to startup, but I don't really have an opinion on this. @pscoro any thoughts on this?

This brings up a good point, the naming of that function may have some legacy from when we used to want orchestrator to fatally error if the start up health check failed. Now since we decided that the orchestrator does not treat its clients as dependencies, there is no behaviour currently in that function that needs to be checked synchronously before really starting the orchestrator.

I think on_startup() should imply behaviour executed before start up, maybe in the future this could include some tasks that need to run synchronously. I think what we should do is run the call to client_health in a new tokio task thats started inside the on_start_up(). Thoughts on this?

Copy link
Collaborator

@pscoro pscoro Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do recall that we wanted to leverage health checks as an extension of config validation, so that we catch misconfigured services at start-up. I believe this was the original intention of running probes at start-up.

This would still catch misconfig errors at start up time. Since we are already decided that the state of clients (as they are known via their configs) does not affect the liveliness of the orchestrator, and are only reported as non-fatal errors, I dont think there is functionally any difference between:

  • starting an async health check, then starting the orchestrator & recving requests, then handling and reporting the response of the health check without crashing on failure, and
  • starting a sync health check, waiting for the response, then handling and reporting the response of the health check without crashing on failure, and then starting the orchestrator

declark1 and others added 6 commits October 7, 2024 14:38
Co-authored-by: Gaurav Kumbhat <[email protected]>
Signed-off-by: Dan Clark <[email protected]>
Co-authored-by: Gaurav Kumbhat <[email protected]>
Signed-off-by: Dan Clark <[email protected]>
Co-authored-by: Gaurav Kumbhat <[email protected]>
Signed-off-by: Dan Clark <[email protected]>
Co-authored-by: Gaurav Kumbhat <[email protected]>
Signed-off-by: Dan Clark <[email protected]>
…drop GenerationProvider::OpenAi variant

Signed-off-by: declark1 <[email protected]>
Copy link
Collaborator

@mdevino mdevino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me!
Having detectors in separate files make it much easier to navigate the code.

I left a few comments around breaking down large functions to start a dicussion, but I'd suggest leaving these for a separate PR (assuming we agree on these changes).

…alth_client to detector clients and OpenAiClient

Co-authored-by: Paul Scoropan <[email protected]>

Signed-off-by: declark1 <[email protected]>
Copy link
Collaborator

@gkumbhat gkumbhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks for all the updates and aligning clients design with goals.

@gkumbhat gkumbhat merged commit b78edf0 into foundation-model-stack:main Oct 10, 2024
2 checks passed
@gkumbhat gkumbhat deleted the client-refactor branch October 10, 2024 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants