I have this method in my controller that streams a reply out in chunks:
```csharp
[HttpPost]
public async Task Chat([FromBody] ChatRequest request)
{
    Response.Headers.Add("Content-Type", "text/event-stream");
    Response.Headers.Add("Cache-Control", "no-cache, no-store, must-revalidate");
    Response.Headers.Add("Pragma", "no-cache");
    Response.Headers.Add("Expires", "0");

    try
    {
        var responseStream = _chatService.ChatAsync(request.UserInput!, request.History!);
        await foreach (var chunk in responseStream)
        {
            Console.WriteLine(chunk);
            await Response.WriteAsync($"data: {chunk}\n\n");
            await Response.Body.FlushAsync();
        }
    }
    catch (Exception ex)
    {
        await Response.WriteAsync($"data: Error: {ex.Message}\n\n");
        await Response.Body.FlushAsync();
    }
    finally
    {
        Console.WriteLine("END");
        await Response.WriteAsync($" --{DateTime.Now.ToString("yyyy/MM/dd HH:mm:ss")} \n\n");
        await Response.Body.FlushAsync();
    }
}
```

Using Postman, this works as expected: I receive each word of the message one by one.
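For reference, the raw stream that Postman shows is a sequence of SSE `data:` lines followed by the timestamp trailer from the `finally` block; roughly like this (the words and timestamp here are illustrative, not actual output):

```
data: Hello

data: world

 --2024/01/01 12:00:00
```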
On the client side, I created a page and a service that call this API endpoint:
```csharp
public async IAsyncEnumerable<string> ChatAsync(
    string userInput,
    string history,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    var request = new ChatRequest { UserInput = userInput, History = history };
    var httpRequest = new HttpRequestMessage(HttpMethod.Post, "api/Chats")
    {
        Content = JsonContent.Create(request)
    };
    httpRequest.Headers.Accept.Add(
        new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("text/event-stream"));

    var response = await _httpClient.SendAsync(
        httpRequest, HttpCompletionOption.ResponseHeadersRead, cancellationToken);
    response.EnsureSuccessStatusCode();

    using var stream = await response.Content.ReadAsStreamAsync(cancellationToken);
    using var reader = new StreamReader(stream);

    while (!reader.EndOfStream)
    {
        cancellationToken.ThrowIfCancellationRequested();
        var line = await reader.ReadLineAsync();
        Console.WriteLine($"Read line: {line}");
        if (string.IsNullOrEmpty(line)) continue;

        string chunk;
        if (line.StartsWith("data: ", StringComparison.OrdinalIgnoreCase))
        {
            chunk = line.Substring(6);
        }
        else
        {
            chunk = line;
        }
        yield return chunk;
    }
}
```

Here is the code-behind for my page:
```csharp
private async Task SendMessage()
{
    if (string.IsNullOrWhiteSpace(userInput) || isSending) return;

    isSending = true;
    messages.Add(new ChatMessage { Content = userInput, IsUser = true });
    var currentUserInput = userInput;
    userInput = "";

    var botMessage = new ChatMessage { Content = "", IsUser = false };
    messages.Add(botMessage);
    StateHasChanged();

    try
    {
        Console.WriteLine("Starting to receive chunks...");
        await foreach (var chunk in ChatService.ChatAsync(currentUserInput, history, cts.Token))
        {
            Console.WriteLine($"Received chunk: {chunk}");
            botMessage.Content += chunk;
            StateHasChanged();
            await InvokeAsync(() => JSRuntime.InvokeVoidAsync("scrollToBottom", "chatMessages"));
        }
        Console.WriteLine("Finished receiving chunks.");
    }
    catch (OperationCanceledException)
    {
        botMessage.Content += " [Message was interrupted]";
    }
    catch (Exception ex)
    {
        botMessage.Content = $"An error occurred: {ex.Message}";
        Console.WriteLine($"Exception occurred: {ex}");
    }

    history += $"User: {currentUserInput}\nBot: {botMessage.Content}\n";
    isSending = false;
    StateHasChanged();
}
```

Why can't I get the message chunk by chunk, word by word? What happens instead is that the client waits for the whole reply and only then renders it. I do receive the complete reply; the only problem is that I need each word rendered in the UI as soon as it arrives from the API, just like a typical generative-AI chat. If I leave it as it is (wait for the whole reply, then render), a lengthy reply takes a long time before anything appears in the UI, which makes for a bad user experience.
I am using Blazor WebAssembly (ASP.NET Core hosted) on .NET 6.
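One thing I am wondering about is whether the browser-side `HttpClient` buffers the whole response body before handing it to my `StreamReader`. Below is a sketch of what I understand enabling per-chunk response streaming on the request would look like; I have not confirmed this is the fix, and I am assuming the `SetBrowserResponseStreamingEnabled` extension method (from `Microsoft.AspNetCore.Components.WebAssembly.Http`) is the relevant knob here:

```csharp
using Microsoft.AspNetCore.Components.WebAssembly.Http; // Blazor WASM request extensions

var httpRequest = new HttpRequestMessage(HttpMethod.Post, "api/Chats")
{
    Content = JsonContent.Create(request)
};

// Ask the browser's fetch layer not to buffer the response body,
// so ReadAsStreamAsync can yield data as it arrives.
httpRequest.SetBrowserResponseStreamingEnabled(true);

var response = await _httpClient.SendAsync(
    httpRequest, HttpCompletionOption.ResponseHeadersRead, cancellationToken);
```

Is something like this required in Blazor WASM, or is the buffering happening somewhere else?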