
Conversation


@surajmn1 surajmn1 commented Dec 16, 2025

merge after #30

Summary by CodeRabbit

  • New Features

    • Real-time server-sent events (SSE) progress streaming for uploads and refresh operations with multi-step status messages and percent progress.
  • User Experience

    • Progress-driven dialogs and persistent toasts show live status, disable closing during operations, and consolidate error reporting for retry-friendly flows.
    • Some transient success toasts removed in favor of persistent progress/success indicators.
  • UI Improvements

    • Condensed validation warnings and clearer rejected-row summaries for easier review.



coderabbitai bot commented Dec 16, 2025

Walkthrough

Added Server-Sent Events (SSE) to backend upload/refresh handlers (database and S3), introduced a frontend SSE client and SSE-based mutation hooks, and updated multiple UI components to stream incremental progress, errors, and final results; also simplified rejected-row validation logic.

Changes

Cohort / File(s) Summary
Server — Database SSE handlers
server/interfaces/http/routes/source/database/create.go, server/interfaces/http/routes/source/database/refresh.go
Replaced synchronous JSON responses with SSE streaming; added SSEEvent/SSEData types, SSE channels, sendEvent/handleFailure helpers, goroutine-based async processing emitting per-step status/error/complete events, and stream writer + headers.
Server — S3 SSE handlers
server/interfaces/http/routes/source/s3/create.go, server/interfaces/http/routes/source/s3/refresh.go
Implemented SSE streaming for S3 upload/refresh: new SSE types and channel, async goroutine processing with granular progress and error events, buffered writer streaming to clients, and SSE-based error signaling.
Frontend — SSE client utility
web/src/lib/sse-client.ts
Added SSEEvent interface and fetchWithSSE to consume server SSE-like streams, parse events, invoke onEvent callbacks, handle auth/org headers, and return final payload from complete event.
Frontend — SSE mutation hooks
web/src/lib/mutations/dataset/source-database-dataset.ts, web/src/lib/mutations/dataset/source-dataset.ts, web/src/lib/mutations/dataset/refresh-dataset.ts, web/src/lib/mutations/dataset/refresh-dataset-database.ts
Added SSE variants (useSourceDatabaseDatasetSSE, useSourceDatasetSSE, useRefreshDatasetSSE, useRefreshDatabaseDatasetSSE) using fetchWithSSE, accepting onProgress callbacks and forwarding auth tokens; some original hooks retained.
Frontend — Upload/Refresh UI integration
web/src/components/dataset/database-source-form.tsx, web/src/components/dataset/dataset-upload-wizard.tsx, web/src/components/dataset/dataset-header.tsx, web/src/components/dataset/file-refresh-wizard.tsx
Wired SSE hooks into UI: added uploadProgress/dialogProgress/currentProgress state, progress bars/spinner, dynamic status messages, onProgress handlers updating UI/toasts, prevented dialog closes mid-progress, and adapted some payload shape usage (res.dataset).
Frontend — Minor UX/toast cleanup
web/src/components/dataset/csv-validation-uppy.tsx, web/src/components/dataset/unified-uploader.tsx
Removed specific success-toasts in upload-success handlers; other behaviors unchanged.
Frontend — Validation simplification
web/src/lib/validation/validate-file.ts
Simplified rejected-row retrieval: single information_schema check, limited query (200 rows), minimal RejectedRow shape, and streamlined error handling/logging.
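
To make the client-side contract concrete, here is a minimal sketch of the pattern fetchWithSSE implements as summarized above: read the response body stream, parse `data:` lines, invoke an onEvent callback per event, and resolve with the payload of the final complete event. The function name consumeSSE and every detail below are illustrative assumptions, not the PR's actual code.

```typescript
interface SSEEvent<T = unknown> {
  type: string;
  message?: string;
  data?: T;
}

// Consume an SSE-style body: emit each parsed event, throw on an
// error event, and return the data of the final "complete" event.
export async function consumeSSE<T>(
  body: ReadableStream<Uint8Array>,
  onEvent: (e: SSEEvent<T>) => void,
): Promise<T | undefined> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let final: T | undefined;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial line for the next read
    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const event = JSON.parse(line.slice(6)) as SSEEvent<T>;
      onEvent(event);
      if (event.type === "error") throw new Error(event.message);
      if (event.type === "complete") final = event.data;
    }
  }
  return final;
}
```

A caller would pass its progress handler as onEvent and treat the resolved value as the final payload, mirroring how the SSE mutation hooks forward onProgress.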

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HTTP as HTTP Handler
    participant SSE as Stream Writer
    participant Worker as Background Goroutine
    participant DB as Database
    participant AI as AI/External Service

    Client->>HTTP: POST /source/... (upload / refresh)
    HTTP->>Client: 200 + headers (text/event-stream)
    HTTP->>SSE: start streaming loop
    HTTP->>Worker: spawn goroutine to perform steps
    activate Worker
    Worker->>SSE: event(status_update, "validating")
    SSE->>Client: data: {type:"status_update", message:"validating"}
    Worker->>DB: check/create tables or start ingest
    Worker->>SSE: event(status_update, "table_created" / "ingesting")
    SSE->>Client: data: {type:"status_update", message:"table_created"}
    Worker->>DB: fetch metrics / verify
    Worker->>AI: generate descriptions / schema (if applicable)
    Worker->>SSE: event(status_update, "ai_summary_generated")
    SSE->>Client: data: {type:"status_update", message:"ai_summary_generated"}
    Worker->>DB: create dataset/source, upload schema
    alt success
      Worker->>SSE: event(complete, "done", data:{dataset,...})
      SSE->>Client: data: {type:"complete", message:"done", data:{...}}
    else error
      Worker->>SSE: event(error, "reason")
      SSE->>Client: data: {type:"error", message:"reason"}
    end
    deactivate Worker
    HTTP->>Client: close stream

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Focus review on:
    • SSE lifecycle, buffering, and flush behavior in server handlers (create.go, refresh.go for DB and S3).
    • Consistency between server SSEEvent payloads and frontend SSEEvent parsing in web/src/lib/sse-client.ts.
    • Auth/header propagation and org context handling in fetchWithSSE and new hooks.
    • UI state management and cancellation/cleanup paths in components using progress (database-source-form, file-refresh-wizard, dataset-header).
    • Validation changes in validate-file.ts and consumers that expect the previous richer RejectedRow shape.

Suggested reviewers

  • Nirlep5252

Poem

🐰 I hopped along the data stream,
sending tiny status beams,
progress ticks and error signs,
little hops through coded lines,
the rabbit cheered — "Events! Redeems!"

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Title check ⚠️ Warning: The PR title focuses on a specific fix for row validation error rendering, but the changeset is predominantly about implementing Server-Sent Events (SSE) streaming across multiple upload and refresh workflows; the row validation fix is minor by comparison. Resolution: update the title to reflect the primary changes, e.g. 'Implement SSE-based streaming for database and S3 upload/refresh flows'.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 33.33%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (1 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/upload-error-rendering



claude bot commented Dec 16, 2025

Claude finished @surajmn1's task.


PR Review: Row Validation Errors Rendering Fix

Comprehensive review completed for this PR that fixes row validation errors not rendering properly and adds streaming support for DB-based uploads.

Review Progress

  • Analyze validation error rendering fixes
  • Review new SSE client implementation
  • Check backend streaming changes
  • Examine frontend component updates
  • Review security and performance considerations
  • Check code quality and best practices
  • Provide comprehensive feedback

🔍 Detailed Code Review

Positive Findings

1. Excellent Validation Error Rendering Improvements

The validation error display has been significantly improved across components:

  • Enhanced Error Interface (validate-file.ts:22-32): New RejectedRow interface provides comprehensive error context
  • Detailed Error Display (dataset-upload-wizard.tsx:1123-1155): Validation warnings show specific row numbers, column names, and error messages
  • User-Friendly Warning Dialogs (dataset-upload-wizard.tsx:1518-1594): Clear dialog explaining data type issues with option to proceed
  • Consistent Error Handling: Pattern implemented across multiple upload components

2. Robust SSE Implementation

New streaming client (sse-client.ts) provides:

  • Clean TypeScript interfaces for event types
  • Proper authentication/authorization handling
  • Progressive loading capabilities
  • Environment-aware configuration

3. Comprehensive Backend Streaming

Database source endpoints demonstrate excellent patterns:

  • Progressive status updates via SSE events
  • Comprehensive resource cleanup on failures (create.go:44-76)
  • Proper error propagation and handling
  • Good separation of concerns

4. Improved User Experience

  • Real-time Progress Feedback: Users see detailed progress during long operations
  • Better Error Context: Specific validation errors instead of generic failures
  • Graceful Degradation: Operations continue with warnings when possible

⚠️ Issues Requiring Attention

1. Critical: SSE Error Handling Vulnerability

Location: web/src/lib/sse-client.ts:86-90

} catch (e) {
  if (e instanceof Error && e.message !== "Unexpected end of JSON input") {
     console.error("SSE Parse Error", e);
  }
  if (line.includes('"type":"error"')) throw e; 
}

Issues:

  • Silently ignoring "Unexpected end of JSON input" could mask connection issues
  • String-based error detection is fragile and could miss malformed events
  • No validation of parsed JSON structure before processing

Recommended Fix:

} catch (e) {
  console.warn("SSE Parse Error", { error: e, line, context: 'SSE_STREAM' });
  // Only re-throw if this is a properly formatted error event
  if (line.trim().startsWith('data: ')) {
    try {
      const parsed = JSON.parse(line.slice(6));
      if (parsed?.type === 'error') throw e;
    } catch {}
  }
}

2. Performance: Memory Usage in Validation

Location: Multiple files use LIMIT 1000000

  • validate-file.ts:234: CSV validation processing 1M rows
  • validate-file.ts:285: Parquet validation
  • validate-file.ts:348: JSON validation

Concern: Processing 1M rows in the browser can cause memory pressure and block the UI.

Recommendation: Implement adaptive limits based on available memory:

const getValidationLimit = () => {
  const memoryMB = (performance as any)?.memory?.usedJSHeapSize / (1024 * 1024) || 0;
  return memoryMB > 100 ? 100000 : 1000000; // Reduce limit if memory usage is high
};

3. Progress Tracking Accuracy

Location: dataset-upload-wizard.tsx:462, dataset-header.tsx:344

currentProgress = Math.min(currentProgress + 10, 90);

Issue: Artificial progress increments don't reflect actual completion and may mislead users.

Recommendation: Use actual operation progress when available, or implement time-based estimates.
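
One hedged way to implement the time-based estimate suggested here is to derive progress from elapsed time against an expected duration instead of fixed +10 jumps. The exponential curve, 90% cap, and function name below are assumptions for illustration, not the PR's code.

```typescript
// Time-based progress estimate: approaches `cap` smoothly as elapsed time
// grows, so the bar never stalls at an arbitrary step boundary.
export function estimateProgress(
  elapsedMs: number,
  expectedMs: number,
  cap = 90,
): number {
  // 1 - e^(-t/T) rises quickly at first and levels off as it nears 1
  return Math.round(cap * (1 - Math.exp(-elapsedMs / expectedMs)));
}
```

A component could call this from a setInterval tick and still snap to 100 when the complete event arrives.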

🔒 Security Assessment

Strong Points

  • SQL Injection Prevention: Random table names and parameterized queries
  • Authentication: Proper token handling with organization-based access
  • Input Validation: Comprehensive request validation on backend
  • Resource Cleanup: Prevents resource leaks and unauthorized access

⚠️ Areas to Monitor

  • Rate Limiting: No visible rate limiting on SSE endpoints
  • File Size Limits: Large file processing could enable DoS attacks

Performance Analysis

Optimizations

  • Client-side Validation: Reduces server load using DuckDB WASM
  • Streaming Responses: Prevents timeout issues on large operations
  • Efficient Sampling: ORDER BY random() LIMIT 50 for AI analysis

⚠️ Potential Issues

  • UI Blocking: Heavy validation tasks could impact responsiveness
  • Memory Usage: Large file processing without web workers
  • Database Load: Random sampling expensive on large tables

📋 Code Quality Assessment

Strengths

  • TypeScript Usage: Excellent type safety and interfaces
  • Error Boundaries: Good error isolation and handling
  • Code Organization: Clear separation of concerns
  • Consistent Patterns: Similar error handling across components

⚠️ Improvements Needed

  • Documentation: Complex SSE logic needs more inline comments
  • Test Coverage: No visible tests for error scenarios
  • Logging: Some error paths lack sufficient logging context

🧪 Missing Test Coverage

Critical scenarios that should have tests:

  • SSE client error handling and reconnection
  • Validation error rendering edge cases
  • Resource cleanup failure scenarios
  • Memory pressure during large file validation

📝 Prioritized Recommendations

🚨 High Priority (Pre-merge)

  1. Fix SSE error handling to prevent silent failures and improve error detection
  2. Add timeout handling for long-running SSE operations
  3. Improve progress tracking accuracy where possible
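
For item 2, a timeout could be layered on with an AbortController, assuming fetchWithSSE is extended to forward an AbortSignal to fetch(); that extension is hypothetical, as the current client does not accept one. The helper name below is invented for this sketch.

```typescript
// Run an SSE operation with a deadline: the signal is aborted after `ms`,
// which would cancel the underlying fetch if the signal is forwarded to it.
export async function withSSETimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(
    () => controller.abort(new Error(`SSE operation timed out after ${ms}ms`)),
    ms,
  );
  try {
    return await run(controller.signal);
  } finally {
    clearTimeout(timer); // avoid a stray abort after completion
  }
}
```

Usage would wrap the hook's fetchWithSSE call, e.g. `withSSETimeout((signal) => fetchWithSSE(url, { ...init, signal }, onProgress, token), 120_000)`.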

Medium Priority (Next Sprint)

  1. Implement memory monitoring for large file validation
  2. Add retry logic for failed cleanup operations
  3. Consider web workers for heavy validation tasks

📚 Low Priority (Future)

  1. Add comprehensive unit tests for error scenarios
  2. Implement rate limiting for SSE endpoints
  3. Add performance monitoring for validation operations

🎯 Final Assessment

This PR represents a significant improvement in user experience and system robustness. The validation error rendering fixes address a real usability issue, and the streaming capabilities add valuable real-time feedback.

Key Achievements:

  • ✅ Fixed validation error rendering across multiple components
  • ✅ Added robust streaming architecture with proper cleanup
  • ✅ Improved user experience with real-time progress feedback
  • ✅ Maintained strong security and type safety practices

Critical Issue:

  • ⚠️ SSE error handling needs improvement to prevent silent failures

Recommendation:

✅ Approve with requested changes - Fix the SSE error handling issue, then this PR significantly improves the platform and is ready to ship.

The core functionality is solid and the improvements are valuable. Address the error handling edge case and this will be an excellent addition to the codebase.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (24)
server/interfaces/http/routes/source/database/create.go (4)

199-200: Avoid hardcoded sleep for waiting on table creation.

Using a fixed 2-second delay is fragile—it may be insufficient under load or wastefully long otherwise. Consider polling with exponential backoff or using a callback/notification mechanism from the OLAP service.

-	sendEvent("status_update", "Waiting for table creation...", nil)
-	time.Sleep(2 * time.Second) // Wait for the table to be created in OLAP
+	sendEvent("status_update", "Waiting for table creation...", nil)
+	// TODO: Replace with polling/retry mechanism or callback from OLAP service
+	if err := h.waitForTableReady(tableName); err != nil {
+		h.logger.Error("Table creation timeout", zap.Error(err), zap.String("table_name", tableName))
+		handleFailure(fmt.Errorf("table creation timeout: %w", err))
+		return
+	}

230-238: Quote the table identifier for defense in depth.

While tableName is internally generated, directly interpolating identifiers into SQL queries is a risky pattern. If the generation logic changes or is reused, this could become a SQL injection vector. Consider quoting the identifier.

-	rows, err := h.olapSvc.ExecuteQuery(fmt.Sprintf("select * from %s order by random() limit 50", tableName))
+	rows, err := h.olapSvc.ExecuteQuery(fmt.Sprintf(`select * from "%s" order by random() limit 50`, tableName))

Alternatively, if olapSvc supports parameterized identifiers or has a dedicated method for safe identifier handling, prefer that approach.


264-278: Typo: datasetDesciptiondatasetDescription.

-	datasetDesciption, err := h.aiSvc.GenerateDatasetDescription(
+	datasetDescription, err := h.aiSvc.GenerateDatasetDescription(
 		body.Alias,
 		columnNames,
 		descriptions,
 		rowsString,
 		summaryString,
 	)
 	if err != nil {
 		h.logger.Error("Error generating dataset description", zap.Error(err))
 		h.cleanupResources(cleanup)
 		handleFailure(fmt.Errorf("error generating dataset description: %w", err))
 		return
 	}
 	// override the description provided by user for now
-	body.Description = datasetDesciption
+	body.Description = datasetDescription

158-161: Handle client disconnection to avoid wasted work.

The goroutine doesn't check for client disconnection. If a client drops the connection mid-stream, the goroutine continues executing all operations (AI calls, database writes), wasting resources. Consider passing a context and checking for cancellation at key points.

+	// Get context for cancellation detection
+	reqCtx := ctx.Context()
+
 	// Start async upload process
 	go func() {
 		defer close(sseChan)
+
+		// Helper to check if client disconnected
+		checkCancelled := func() bool {
+			select {
+			case <-reqCtx.Done():
+				h.logger.Info("Client disconnected, aborting operation")
+				return true
+			default:
+				return false
+			}
+		}

Then call checkCancelled() before expensive operations (AI calls, database writes) and return early if true.

web/src/lib/validation/validate-file.ts (1)

622-622: Extract the hardcoded limit to a named constant.

The LIMIT 200 is hardcoded and appears in the query. Consider extracting it to a named constant at the file level for better maintainability.

+// Maximum number of rejected rows to retrieve for display
+const MAX_REJECTED_ROWS_PREVIEW = 200;
+
 async function getRejectedRows(
   conn: duckdb.AsyncDuckDBConnection,
 ): Promise<{ rejectedRows: RejectedRow[]; rejectedRowCount: number }> {
   try {
     // ... existing code ...
     const rejectedQ = await conn.query(`
       SELECT *
       FROM reject_errors
       ORDER BY file_id, line
-      LIMIT 200;
+      LIMIT ${MAX_REJECTED_ROWS_PREVIEW};
     `);
server/interfaces/http/routes/source/s3/refresh.go (3)

128-139: Transaction commit/rollback logic relies on err variable state at defer time.

The defer closure captures err by reference. Since err is reassigned throughout the goroutine, the final value at defer execution determines commit vs rollback. This works correctly, but the pattern is subtle and could be fragile if future code paths return early without setting err.

Consider using a named success boolean set explicitly before the completion event for clearer intent:

+		success := false
 		// Ensure transactions are committed or rolled back
 		defer func() {
-			if err != nil {
+			if !success {
 				h.logger.Error("Rolling back transactions due to error", zap.Error(err))
 				olapTx.Rollback()
 				storeTx.Rollback(txCtx)
 			} else {
 				h.logger.Info("Committing transactions")
 				olapTx.Commit()
 				storeTx.Commit(txCtx)
 			}
 		}()

Then set success = true before the completion event at line 239.


167-168: Hardcoded sleep for data commit synchronization is fragile.

The 2-second sleep may be insufficient under load or excessive in normal conditions. Consider using a polling mechanism or event-based synchronization if the OLAP service supports it.


78-84: Error event format may not be parsed correctly by the frontend SSE client.

The handleFailure helper emits event: error\ndata: ... but the frontend fetchWithSSE only processes lines starting with data: . The event: error line is ignored. The error is still detected via event.type === 'error' in the JSON payload, so this works, but the event: line is redundant.

For consistency with sendEvent, consider simplifying:

 	handleFailure := func(failErr error) {
 		errMsg := failErr.Error()
-		errorPayload, _ := json.Marshal(map[string]string{"type": "error", "message": errMsg})
-		errorMsg := fmt.Sprintf("event: error\ndata: %s\n\n", errorPayload)
-		sseChan <- SSEData{Data: []byte(errorMsg)}
+		sendEvent("error", errMsg, nil)
 	}
server/interfaces/http/routes/source/s3/create.go (1)

17-28: SSEEvent and SSEData types are duplicated across packages.

These types are also defined in server/interfaces/http/routes/source/database/create.go (per relevant snippets). Extract to a shared package to avoid duplication and ensure consistency:

// e.g., server/interfaces/http/routes/common/sse.go
package common

type SSEEvent struct {
    Type    string `json:"type"`
    Message string `json:"message"`
    Data    any    `json:"data,omitempty"`
}

type SSEData struct {
    Data  []byte
    Error error
}
server/interfaces/http/routes/source/database/refresh.go (1)

214-215: Same hardcoded sleep pattern as other handlers.

Consider extracting this to a configuration constant or implementing proper synchronization.

web/src/lib/mutations/dataset/refresh-dataset.ts (1)

73-106: Hook pattern doesn't integrate with React Query mutation lifecycle.

The useRefreshDatasetSSE returns a plain async function rather than using createMutation. This means:

  1. No automatic loading/error state tracking
  2. No mutation cache integration
  3. No retry logic or callbacks (onSuccess, onError, onSettled)

Per coding guidelines, React Query should be used for server state management. Consider wrapping with useMutation from React Query:

export const useRefreshDatasetSSE = () => {
  const { accessToken } = useAuth();
  
  return useMutation({
    mutationKey: ["refresh-dataset-sse"],
    mutationFn: async ({
      datasetName,
      projectId,
      s3Url,
      source,
      ignoreErrors,
      onProgress,
    }: RefreshDatasetSSEVariables) => {
      return fetchWithSSE(
        '/source/s3/refresh',
        {
          method: 'POST',
          body: JSON.stringify({
            dataset_name: datasetName,
            project_id: projectId,
            file_path: s3Url,
            ignore_errors: ignoreErrors ?? true,
            source: source,
          }),
        },
        onProgress,
        accessToken
      );
    },
  });
};

As per coding guidelines, use React Query for server state management.

web/src/lib/mutations/dataset/refresh-dataset-database.ts (1)

66-93: Same hook pattern issues as useRefreshDatasetSSE.

This hook has the same concerns:

  1. Plain async function return instead of React Query mutation integration
  2. Access token captured at hook render time (potential staleness)

Apply the same refactoring suggestions as useRefreshDatasetSSE for consistency and proper React Query integration. As per coding guidelines, use React Query for server state management.

web/src/lib/mutations/dataset/source-dataset.ts (1)

59-103: Consider adding explicit return type for type safety.

The useSourceDatasetSSE hook returns fetchWithSSE result without an explicit type, which loses type safety. Additionally, SSEEvent is used without a generic parameter.

-export const useSourceDatasetSSE = () => {
+interface SourceDatasetSSEResult {
+  dataset: Dataset;
+  summary: {
+    dataset_name: string;
+    summary: Record<string, string>[];
+  };
+}
+
+export const useSourceDatasetSSE = () => {
   const { accessToken } = useAuth();
   return async ({
     ...
-    onProgress: (event: SSEEvent) => void
+    onProgress: (event: SSEEvent<SourceDatasetSSEResult>) => void
   }) => {
     ...
-    return await fetchWithSSE(
+    return await fetchWithSSE<SourceDatasetSSEResult>(
       '/source/s3/upload',
       ...
     );
   };
 };
web/src/lib/mutations/dataset/source-database-dataset.ts (1)

4-15: Interface design differs from useSourceDatasetSSE pattern.

The onProgress callback is included in SourceDatabaseDatasetParams interface, while useSourceDatasetSSE defines it inline in the function parameters. Consider aligning the patterns for consistency across SSE hooks.

web/src/components/dataset/dataset-header.tsx (2)

342-366: Progress increment logic may not accurately reflect actual progress.

The progress increments by a fixed 10% per status_update event and caps at 90%. If the server sends many events, progress will quickly hit 90% and stall; if it sends few events, progress may appear slower than actual work. Consider using event.progress if the server provides it, or adjusting the increment based on expected event count.

 onProgress: (event: SSEEvent) => {
   if (event.type === 'status_update') {
-    currentProgress = Math.min(currentProgress + 10, 90);
+    // Prefer server-provided progress if available
+    currentProgress = event.progress != null 
+      ? Math.min(event.progress, 90) 
+      : Math.min(currentProgress + 10, 90);
     
     if (isDialogFlow) {

147-147: Variable rename from toast to toasty is functional but consider alternatives.

Renaming destructured toast to toasty to avoid shadowing the imported toast from sonner works, but is slightly confusing. Consider renaming the sonner import instead for clarity.

-import { toast } from "sonner"; 
+import { toast as sonnerToast } from "sonner"; 
 ...
-const { toast : toasty } = useToast();
+const { toast } = useToast();

Then use sonnerToast for the sonner calls (lines 325, 354, 380, 382, 394, 396).

web/src/components/dataset/database-source-form.tsx (2)

85-93: Same progress increment pattern — consider extracting to shared utility.

This progress calculation logic (Math.min(currentProgress + 10, 90)) is duplicated in dataset-header.tsx. Consider extracting a shared utility or using server-provided progress values.


96-112: Complex type assertions for response handling.

The response structure handling with multiple type assertions is defensive but hard to follow. Consider defining a proper response type interface.

+interface DatabaseUploadResponse {
+  data?: {
+    dataset: {
+      id: string;
+      alias: string;
+    };
+  };
+  dataset?: {
+    id: string;
+    alias: string;
+  };
+}

 // In handleSubmit:
-const result = (response as unknown as Record<string, unknown>)?.data || response;
-const typedResult = result as Record<string, unknown>;
-const dataset = typedResult?.dataset as Record<string, unknown> | undefined;
+const typedResponse = response as DatabaseUploadResponse;
+const dataset = typedResponse?.data?.dataset || typedResponse?.dataset;
web/src/lib/sse-client.ts (2)

22-22: Inconsistent indentation on line 22.

The const isAuthEnabled line has inconsistent indentation compared to surrounding code.

   // Add Auth & Org Headers (Reusing logic from api-client)
-const isAuthEnabled = String(process.env.NEXT_PUBLIC_ENABLE_AUTH).trim() === "true";
+  const isAuthEnabled = String(process.env.NEXT_PUBLIC_ENABLE_AUTH).trim() === "true";
   const orgId = getGlobalOrganizationId();

60-93: SSE parsing may fail with chunked JSON split across reads.

The current implementation splits chunks by newline and parses each data: line as complete JSON. If a large JSON payload is split across multiple chunks (mid-line), parsing will fail. The "Unexpected end of JSON input" check suggests this is known but silently ignored.

Consider buffering incomplete lines:

 const reader = response.body.getReader();
 const decoder = new TextDecoder();
 let resultData: T | undefined;
+let buffer = '';

 while (true) {
   const { done, value } = await reader.read();
   if (done) break;

-  const chunk = decoder.decode(value, { stream: true });
-  const lines = chunk.split('\n');
+  buffer += decoder.decode(value, { stream: true });
+  const lines = buffer.split('\n');
+  // Keep the last incomplete line in the buffer
+  buffer = lines.pop() || '';

   for (const line of lines) {
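
For reference, the buffering approach suggested in the diff can be shown as a complete standalone unit; createSSELineParser and the event shape are invented names for this sketch, not code from the PR.

```typescript
export interface ParsedSSEEvent {
  type: string;
  message?: string;
  data?: unknown;
}

// Accumulate chunks and parse only complete lines; a JSON payload split
// across reads is held in `buffer` until its terminating newline arrives.
export function createSSELineParser(onEvent: (e: ParsedSSEEvent) => void) {
  let buffer = "";
  const emit = (line: string) => {
    if (line.startsWith("data: ")) onEvent(JSON.parse(line.slice(6)));
  };
  return {
    push(chunk: string) {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? ""; // keep the trailing partial line
      lines.forEach(emit);
    },
    // Call once the stream ends, in case the last line had no newline
    flush() {
      if (buffer) emit(buffer);
      buffer = "";
    },
  };
}
```

This isolates the line-splitting concern, which also makes the split-payload case straightforward to unit test.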
web/src/components/dataset/file-refresh-wizard.tsx (2)

58-73: Unused datasetId prop in component signature.

The datasetId is declared in FileRefreshWizardProps (line 60) but is never destructured or used from props. Instead, it's obtained from useParams() on line 70. This creates a confusing API where callers must pass datasetId but it's ignored.

Either remove datasetId from the props interface and rely solely on useParams(), or use the prop and remove the useParams() call for consistency.

 export interface FileRefreshWizardProps {
   projectId: string;
-  datasetId: string;
   onRefreshComplete: () => void;
 }

482-497: Remove unnecessary curly braces around JSX.

The curly braces wrapping RefreshFileUploader on lines 482 and 497 serve no purpose and can be removed for cleaner code.

-              {
-                <RefreshFileUploader
-                  ref={uploaderRef}
-                  existingSchema={existingSchemaData?.schema as ColumnInfo[]}
-                  onValidationStateChange={handleValidationStateChange}
-                  onUploadSuccess={handleUploadComplete}
-                  onUploadError={(msg) => {
-                    // Update toast if active, otherwise just set error
-                    toast.error("Upload Failed", {
-                      id: REFRESH_TOAST_ID,
-                      description: msg,
-                    });
-                    setApiError(msg);
-                  }}
-                />
-              }
+              <RefreshFileUploader
+                ref={uploaderRef}
+                existingSchema={existingSchemaData?.schema as ColumnInfo[]}
+                onValidationStateChange={handleValidationStateChange}
+                onUploadSuccess={handleUploadComplete}
+                onUploadError={(msg) => {
+                  // Update toast if active, otherwise just set error
+                  toast.error("Upload Failed", {
+                    id: REFRESH_TOAST_ID,
+                    description: msg,
+                  });
+                  setApiError(msg);
+                }}
+              />
web/src/components/dataset/dataset-upload-wizard.tsx (2)

1272-1286: Remove commented-out code block.

This commented JSX block for uploadProgress appears to be dead code. If it's intended for future implementation, consider removing it and tracking the feature in a separate issue or TODO comment instead.

-              {/* {uploadProgress.status === 'processing' && (
-      <div className="space-y-3 p-4 border rounded-lg bg-secondary/10">
-        <div className="flex items-center justify-between text-sm">
-          <span className="font-medium flex items-center gap-2">
-            <Loader2 className="h-4 w-4 animate-spin text-primary" />
-            {uploadProgress.message}
-          </span>
-          <span className="text-muted-foreground">{uploadProgress.percentage}%</span>
-        </div>
-        <Progress value={uploadProgress.percentage} className="h-2" />
-        <p className="text-xs text-muted-foreground">
-          Processing your data. Large datasets may take a moment.
-        </p>
-      </div>
-    )} */}
-                           <Button
+              <Button

324-341: Hardcoded timeout for store update is fragile.

The 2-second hardcoded wait on line 325 assumes the store will be updated within that timeframe. This is a fragile pattern that could cause race conditions on slow connections or unnecessary delays on fast ones.

Consider using a polling approach with a maximum timeout, or restructuring the flow so the upload response is passed directly rather than relying on store synchronization timing.

-          // Wait a bit longer for the store to update with the upload response
-          await new Promise((resolve) => setTimeout(resolve, 2000));
+          // Poll for the upload response with timeout
+          const maxAttempts = 10;
+          const pollInterval = 200;
+          for (let i = 0; i < maxAttempts; i++) {
+            const newUploadResponse = useUploadStore.getState().uploadResponse;
+            if (newUploadResponse && typeof newUploadResponse === "object") {
+              break;
+            }
+            await new Promise((resolve) => setTimeout(resolve, pollInterval));
+          }
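The suggested polling loop can be factored into a reusable helper. The name `pollUntil` and the default timings below are illustrative, not from the codebase; tune them to the real store-update latency.

```typescript
// Generic poll-until helper factoring out the loop suggested above.
// Returns the first defined value from `read`, or undefined on timeout.
async function pollUntil<T>(
  read: () => T | undefined,
  { maxAttempts = 10, intervalMs = 200 } = {},
): Promise<T | undefined> {
  for (let i = 0; i < maxAttempts; i++) {
    const value = read();
    if (value !== undefined) return value;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return undefined; // caller decides how to surface the timeout
}
```

The caller would then replace the fixed 2-second wait with `await pollUntil(() => useUploadStore.getState().uploadResponse)`.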
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2948d39 and f8ed7a5.

📒 Files selected for processing (16)
  • server/interfaces/http/routes/source/database/create.go (4 hunks)
  • server/interfaces/http/routes/source/database/refresh.go (3 hunks)
  • server/interfaces/http/routes/source/s3/create.go (5 hunks)
  • server/interfaces/http/routes/source/s3/refresh.go (3 hunks)
  • web/src/components/dataset/csv-validation-uppy.tsx (0 hunks)
  • web/src/components/dataset/database-source-form.tsx (6 hunks)
  • web/src/components/dataset/dataset-header.tsx (12 hunks)
  • web/src/components/dataset/dataset-upload-wizard.tsx (11 hunks)
  • web/src/components/dataset/file-refresh-wizard.tsx (10 hunks)
  • web/src/components/dataset/unified-uploader.tsx (0 hunks)
  • web/src/lib/mutations/dataset/refresh-dataset-database.ts (2 hunks)
  • web/src/lib/mutations/dataset/refresh-dataset.ts (2 hunks)
  • web/src/lib/mutations/dataset/source-database-dataset.ts (2 hunks)
  • web/src/lib/mutations/dataset/source-dataset.ts (2 hunks)
  • web/src/lib/sse-client.ts (1 hunks)
  • web/src/lib/validation/validate-file.ts (1 hunks)
💤 Files with no reviewable changes (2)
  • web/src/components/dataset/csv-validation-uppy.tsx
  • web/src/components/dataset/unified-uploader.tsx
🧰 Additional context used
📓 Path-based instructions (4)
web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx}: Use Next.js 15 with App Router and React 19 for the web frontend framework
Use TypeScript as the language for the web frontend
Generate and use types from API responses for type safety
Use dynamic imports and code splitting for optimal performance
Load DuckDB WASM on-demand and use web workers for client-side SQL execution

Files:

  • web/src/lib/mutations/dataset/refresh-dataset-database.ts
  • web/src/lib/mutations/dataset/refresh-dataset.ts
  • web/src/lib/sse-client.ts
  • web/src/components/dataset/dataset-header.tsx
  • web/src/lib/mutations/dataset/source-dataset.ts
  • web/src/lib/mutations/dataset/source-database-dataset.ts
  • web/src/lib/validation/validate-file.ts
  • web/src/components/dataset/database-source-form.tsx
  • web/src/components/dataset/file-refresh-wizard.tsx
  • web/src/components/dataset/dataset-upload-wizard.tsx
web/src/lib/{queries,mutations}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (web/CLAUDE.md)

Use React Query (TanStack Query) for server state management

Files:

  • web/src/lib/mutations/dataset/refresh-dataset-database.ts
  • web/src/lib/mutations/dataset/refresh-dataset.ts
  • web/src/lib/mutations/dataset/source-dataset.ts
  • web/src/lib/mutations/dataset/source-database-dataset.ts
web/src/components/**/*.{ts,tsx}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/components/**/*.{ts,tsx}: Use TailwindCSS with Radix UI components for styling instead of custom CSS
Use React Hook Form with Zod validation for form handling
Use pre-configured queries and mutations from lib/queries and lib/mutations for API calls
Wrap features in error boundaries to handle errors gracefully
Always handle loading and error states in UI components
Use Next.js Image component for image optimization

Files:

  • web/src/components/dataset/dataset-header.tsx
  • web/src/components/dataset/database-source-form.tsx
  • web/src/components/dataset/file-refresh-wizard.tsx
  • web/src/components/dataset/dataset-upload-wizard.tsx
web/src/components/dataset/**upload**.{ts,tsx}

📄 CodeRabbit inference engine (web/CLAUDE.md)

Use Uppy for dataset file uploads in upload components

Files:

  • web/src/components/dataset/dataset-upload-wizard.tsx
🧠 Learnings (14)
📓 Common learnings
Learnt from: CR
Repo: factly/gopie PR: 0
File: chat-server/CLAUDE.md:0-0
Timestamp: 2025-12-02T11:48:34.955Z
Learning: Applies to chat-server/app/api/v1/routers/**/*.py : Use Server-Sent Events for streaming real-time query progress responses
📚 Learning: 2025-12-02T11:48:50.411Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2025-12-02T11:48:50.411Z
Learning: Applies to web/src/components/**/*.{ts,tsx} : Use pre-configured queries and mutations from lib/queries and lib/mutations for API calls

Applied to files:

  • web/src/lib/mutations/dataset/refresh-dataset-database.ts
  • web/src/lib/mutations/dataset/refresh-dataset.ts
  • web/src/components/dataset/dataset-header.tsx
  • web/src/lib/mutations/dataset/source-dataset.ts
  • web/src/lib/mutations/dataset/source-database-dataset.ts
  • web/src/components/dataset/file-refresh-wizard.tsx
  • web/src/components/dataset/dataset-upload-wizard.tsx
📚 Learning: 2025-12-02T11:48:50.411Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2025-12-02T11:48:50.411Z
Learning: Applies to web/src/lib/{queries,mutations}/**/*.{ts,tsx} : Use React Query (TanStack Query) for server state management

Applied to files:

  • web/src/lib/mutations/dataset/refresh-dataset-database.ts
  • web/src/lib/mutations/dataset/refresh-dataset.ts
  • web/src/lib/mutations/dataset/source-dataset.ts
  • web/src/lib/mutations/dataset/source-database-dataset.ts
  • web/src/components/dataset/dataset-upload-wizard.tsx
📚 Learning: 2025-07-29T09:36:39.908Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-29T09:36:39.908Z
Learning: Applies to web/**/*.{js,jsx,ts,tsx} : Integrate DuckDB WASM for client-side SQL execution in the Next.js frontend

Applied to files:

  • web/src/lib/mutations/dataset/refresh-dataset-database.ts
  • web/src/components/dataset/dataset-header.tsx
  • web/src/lib/mutations/dataset/source-database-dataset.ts
  • web/src/components/dataset/dataset-upload-wizard.tsx
📚 Learning: 2025-12-02T11:48:34.955Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: chat-server/CLAUDE.md:0-0
Timestamp: 2025-12-02T11:48:34.955Z
Learning: Applies to chat-server/app/api/v1/routers/**/*.py : Use Server-Sent Events for streaming real-time query progress responses

Applied to files:

  • server/interfaces/http/routes/source/database/create.go
  • server/interfaces/http/routes/source/s3/create.go
  • server/interfaces/http/routes/source/database/refresh.go
📚 Learning: 2025-07-29T09:36:39.908Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-29T09:36:39.908Z
Learning: Applies to server/**/*.go : Integrate DuckDB for OLAP queries in the Go backend server

Applied to files:

  • server/interfaces/http/routes/source/database/create.go
📚 Learning: 2025-07-29T09:36:39.908Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-29T09:36:39.908Z
Learning: Applies to server/**/*.go : Store dataset files in S3-compatible storage (MinIO) in the Go backend server

Applied to files:

  • server/interfaces/http/routes/source/database/create.go
  • server/interfaces/http/routes/source/s3/refresh.go
  • server/interfaces/http/routes/source/s3/create.go
  • server/interfaces/http/routes/source/database/refresh.go
📚 Learning: 2025-07-29T09:36:39.908Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-29T09:36:39.908Z
Learning: Applies to server/**/*.go : Use PostgreSQL with SQLC for type-safe queries in the Go backend server

Applied to files:

  • server/interfaces/http/routes/source/database/create.go
📚 Learning: 2025-12-02T11:48:50.411Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2025-12-02T11:48:50.411Z
Learning: Applies to web/src/components/dataset/**upload**.{ts,tsx} : Use Uppy for dataset file uploads in upload components

Applied to files:

  • web/src/lib/mutations/dataset/refresh-dataset.ts
  • web/src/components/dataset/dataset-header.tsx
  • web/src/lib/mutations/dataset/source-dataset.ts
  • web/src/lib/mutations/dataset/source-database-dataset.ts
  • web/src/components/dataset/database-source-form.tsx
  • web/src/components/dataset/file-refresh-wizard.tsx
  • web/src/components/dataset/dataset-upload-wizard.tsx
📚 Learning: 2025-12-02T11:48:50.411Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2025-12-02T11:48:50.411Z
Learning: Applies to web/src/components/dataset/sql-editor.{ts,tsx} : Use Monaco Editor for SQL editing with syntax highlighting in SQL editor components

Applied to files:

  • web/src/components/dataset/dataset-header.tsx
  • web/src/lib/mutations/dataset/source-database-dataset.ts
  • web/src/components/dataset/database-source-form.tsx
  • web/src/components/dataset/dataset-upload-wizard.tsx
📚 Learning: 2025-07-29T09:36:39.908Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-29T09:36:39.908Z
Learning: Applies to web/**/*.{js,jsx,ts,tsx} : Use Monaco Editor for SQL editing in the Next.js frontend

Applied to files:

  • web/src/components/dataset/dataset-header.tsx
  • web/src/components/dataset/dataset-upload-wizard.tsx
📚 Learning: 2025-12-02T11:48:50.411Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2025-12-02T11:48:50.411Z
Learning: Applies to web/src/**/*.{ts,tsx} : Load DuckDB WASM on-demand and use web workers for client-side SQL execution

Applied to files:

  • web/src/components/dataset/dataset-header.tsx
  • web/src/components/dataset/dataset-upload-wizard.tsx
📚 Learning: 2025-12-02T11:48:50.411Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2025-12-02T11:48:50.411Z
Learning: Applies to web/src/lib/stores/**/*.{ts,tsx} : Use Zustand for global state management (auth, chat, SQL, visualization stores)

Applied to files:

  • web/src/components/dataset/dataset-header.tsx
📚 Learning: 2025-07-29T09:36:39.908Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-29T09:36:39.908Z
Learning: Applies to web/**/*.{js,jsx,ts,tsx} : Use Zustand for global state management and React Query for server state in the Next.js frontend

Applied to files:

  • web/src/components/dataset/file-refresh-wizard.tsx
  • web/src/components/dataset/dataset-upload-wizard.tsx
🧬 Code graph analysis (10)
web/src/lib/mutations/dataset/refresh-dataset-database.ts (1)
web/src/lib/sse-client.ts (1)
  • fetchWithSSE (10-96)
web/src/lib/mutations/dataset/refresh-dataset.ts (1)
web/src/lib/sse-client.ts (1)
  • fetchWithSSE (10-96)
web/src/lib/sse-client.ts (1)
web/src/lib/api-client.ts (1)
  • getGlobalOrganizationId (42-44)
web/src/components/dataset/dataset-header.tsx (2)
web/src/lib/mutations/dataset/refresh-dataset-database.ts (1)
  • useRefreshDatabaseDatasetSSE (66-93)
web/src/lib/queries/dataset/check-timestamp-column.ts (1)
  • useCheckTimestampColumn (16-42)
web/src/lib/mutations/dataset/source-dataset.ts (1)
web/src/lib/sse-client.ts (1)
  • fetchWithSSE (10-96)
web/src/lib/mutations/dataset/source-database-dataset.ts (1)
web/src/lib/sse-client.ts (1)
  • fetchWithSSE (10-96)
server/interfaces/http/routes/source/s3/create.go (3)
server/interfaces/http/routes/source/database/create.go (2)
  • SSEData (25-28)
  • SSEEvent (18-22)
server/domain/models/dataset.go (1)
  • CreateDatasetParams (84-108)
server/infrastructure/postgres/gen/datasets.sql.go (1)
  • CreateDatasetParams (32-45)
web/src/components/dataset/database-source-form.tsx (1)
web/src/lib/mutations/dataset/source-database-dataset.ts (1)
  • useSourceDatabaseDatasetSSE (17-41)
web/src/components/dataset/file-refresh-wizard.tsx (2)
web/src/lib/mutations/dataset/refresh-dataset.ts (1)
  • useRefreshDatasetSSE (73-106)
web/src/components/dataset/refresh-file-uploader.tsx (1)
  • RefreshFileUploader (64-402)
web/src/components/dataset/dataset-upload-wizard.tsx (2)
web/src/lib/mutations/dataset/source-dataset.ts (1)
  • useSourceDatasetSSE (59-103)
web/src/lib/sse-client.ts (1)
  • SSEEvent (3-8)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: claude-review
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (25)
server/interfaces/http/routes/source/database/create.go (4)

17-28: LGTM!

The SSE type definitions are well-structured for streaming progress updates. SSEEvent provides a clean JSON structure, and SSEData appropriately wraps channel communication with error handling capability.


135-156: LGTM!

The SSE channel setup and helper functions are well-designed. The buffer size of 10 is reasonable, and the helpers reduce code duplication. Note: ignoring json.Marshal errors (lines 145, 153) is acceptable here since these simple structs won't fail marshaling.


226-229: Verify cleanup logic: hasSummary set prematurely?

Setting cleanup.hasSummary = true after GetDatasetSummary (line 228) seems premature—this appears to compute/fetch the summary, not persist it. The actual persistence happens at CreateDatasetSummary (line 345). If cleanup runs before line 345, DeleteDatasetSummary may be called on a non-existent record.

Consider moving this assignment to after the summary is actually persisted:

 	summaryBytes, _ := json.Marshal(datasetSummary)
 	summaryString := string(summaryBytes)
-	cleanup.hasSummary = true

And keep only the assignment at line 353 (after CreateDatasetSummary succeeds). Please verify the semantics of GetDatasetSummary vs CreateDatasetSummary to confirm.


380-406: LGTM!

The SSE streaming setup follows best practices—correct headers, proper use of SetBodyStreamWriter, and appropriate error handling for write/flush operations. Based on learnings, this aligns with the recommendation to use Server-Sent Events for streaming real-time progress responses.

web/src/lib/validation/validate-file.ts (2)

607-616: LGTM!

The early-exit pattern efficiently handles the case where no validation errors occurred, avoiding unnecessary queries.


644-646: LGTM!

Graceful error handling that returns empty results instead of throwing ensures the validation process doesn't fail completely when rejected row retrieval encounters issues.

server/interfaces/http/routes/source/s3/refresh.go (1)

252-268: Client disconnection doesn't stop the background refresh operation.

If w.Write or w.Flush fails (client disconnected), the stream writer returns but the goroutine continues processing. This is acceptable if server-side completion is desired regardless of client presence, but wastes resources otherwise.

Consider using a context with cancellation passed to the goroutine if early termination on disconnect is preferred.

server/interfaces/http/routes/source/s3/create.go (2)

189-202: LGTM!

The progressive resource cleanup pattern is well-implemented. Resources are tracked incrementally and cleaned up appropriately on failure paths.


295-319: SSE streaming implementation is consistent with refresh.go.

The streaming pattern matches the refresh handler. The same considerations about client disconnection and error event format apply here.

server/interfaces/http/routes/source/database/refresh.go (1)

240-268: No rollback for partial failures after dataset update.

If summary creation (line 251) or AI agent upload (line 260) fails, the dataset record is already updated with new metrics. This may leave the system in an inconsistent state where dataset metadata is current but summary/schema are stale or missing.

Consider whether a transactional approach (similar to s3/refresh.go) or compensating actions are needed for consistency.

web/src/lib/mutations/dataset/refresh-dataset.ts (1)

73-75: Access token captured at hook call time may become stale.

The accessToken is captured when the hook renders, but the returned async function may be called later. If the token refreshes in between, a stale token is used.

Consider accessing the token inside the async function, or use a ref to always get the current value:

export const useRefreshDatasetSSE = () => {
  const { getAccessToken } = useAuth(); // if available
  return async (...) => {
    const accessToken = await getAccessToken();
    return fetchWithSSE(..., accessToken);
  };
};
web/src/lib/mutations/dataset/refresh-dataset-database.ts (1)

79-91: LGTM on the fetchWithSSE integration.

The SSE request configuration correctly maps the mutation parameters to the expected backend payload format, and properly passes the access token for authorization.

web/src/lib/mutations/dataset/source-database-dataset.ts (1)

17-41: LGTM — SSE hook implementation is correct.

The hook properly extracts the access token and delegates to fetchWithSSE. Consider adding an explicit return type for improved type safety, similar to the suggestion for useSourceDatasetSSE.

web/src/components/dataset/dataset-header.tsx (2)

370-375: Good use of Promise.all for parallel query invalidations.

Consolidating query invalidations into a single Promise.all improves efficiency by running them concurrently rather than sequentially.


971-975: Good UX: Preventing dialog close during active refresh.

Preventing the dialog from closing while a refresh is in progress avoids confusing state and potential data inconsistency.

web/src/components/dataset/database-source-form.tsx (1)

144-164: Progress UI during upload looks good.

The progress UI provides clear feedback with spinner, message, percentage, and progress bar. The informational text about process duration is helpful.

web/src/lib/sse-client.ts (1)

3-8: SSEEvent interface is well-defined.

The interface properly captures the expected event types with appropriate optional fields for data and progress.
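A minimal sketch of the shape described above. The exact union of event types is an assumption based on the events this review mentions (status_update, error, complete):

```typescript
// Assumed SSEEvent shape; field names beyond type/data/progress are
// inferred from the review, not copied from web/src/lib/sse-client.ts.
interface SSEEvent {
  type: "status_update" | "error" | "complete";
  message?: string;
  data?: unknown;
  progress?: number;
}

// A terminal event ends the stream; status updates keep it open.
function isTerminalEvent(event: SSEEvent): boolean {
  return event.type === "error" || event.type === "complete";
}
```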

web/src/components/dataset/file-refresh-wizard.tsx (3)

195-230: SSE progress handling looks good.

The SSE integration correctly updates the persistent toast with incremental progress. The pattern of capping progress at 90% before completion and using a persistent toast ID ensures a smooth user experience.


247-260: Error handling with toast transition is correct.

The error handling properly converts the persistent loading toast to an error toast with a finite duration, ensuring proper cleanup. The error message is captured and displayed appropriately.


463-480: New validation state alerts are well-structured.

The no_schema_to_compare and validation_unavailable states now have corresponding alert components with appropriate styling and messaging, improving the user experience for edge cases.

web/src/components/dataset/dataset-upload-wizard.tsx (5)

164-165: Good extraction of rejected rows into a computed variable.

Creating rejected as a derived variable from validationResult?.rejectedRows ?? [] with a fallback to an empty array is a clean pattern that simplifies the rendering logic and prevents potential undefined access issues.


1123-1156: Validation error rendering fix looks correct.

The validation warnings now properly render with:

  1. Stable composite keys using ${err.rowNumber}-${err.columnName}
  2. The errorMessage field is displayed correctly
  3. Proper fallback with empty array prevents crashes

This addresses the PR objective of fixing row validation errors not rendering properly.


1544-1559: Dialog validation error rendering is correct but key can be simplified.

The rendering logic is correct. The key ${error.rowNumber}-${error.columnName}-${index} is unique but the index suffix is redundant if rowNumber + columnName is already unique per error. This is a minor nit.


459-476: SSE progress callback pattern is consistent with other components.

The onProgress callback correctly updates the toast with SSE status messages. The artificial progress increment (capped at 90%) provides good UX feedback while waiting for actual completion.


450-481: Incorrect response structure assumption masks runtime logic error.

The fetchWithSSE function returns Promise<unknown> (no type parameter passed), but the code asserts it as { dataset: Dataset } while the actual response structure is { data: { dataset: Dataset, summary: {...} } }. This means the runtime check if (!res?.dataset?.id) accesses the wrong property path—it should be res?.data?.dataset?.id.

To fix: Pass a proper type parameter to fetchWithSSE<Response>() where Response is properly defined, or validate the SSE response structure using Zod (per coding guidelines for API response validation) to ensure type safety at both compile and runtime.
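If pulling in Zod is undesirable, a hand-rolled runtime guard for the response shape described above could look like the following. The `{ data: { dataset: { id } } }` shape is assumed from this comment, not from generated API types:

```typescript
// Hedged sketch: narrow an unknown SSE payload to the expected shape
// before touching dataset.id, instead of an unchecked type assertion.
interface CompleteResponse {
  data: { dataset: { id: string } };
}

function isCompleteResponse(value: unknown): value is CompleteResponse {
  if (typeof value !== "object" || value === null) return false;
  const data = (value as { data?: unknown }).data;
  if (typeof data !== "object" || data === null) return false;
  const dataset = (data as { dataset?: unknown }).dataset;
  if (typeof dataset !== "object" || dataset === null) return false;
  return typeof (dataset as { id?: unknown }).id === "string";
}
```

With this guard, `if (!isCompleteResponse(res))` can bail out early, and inside the guarded branch `res.data.dataset.id` is type-safe at both compile time and runtime.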

Comment on lines +136 to 140
		if strings.HasPrefix(source.ConnectionString, "postgres") {
			driver = "postgres"
		} else {
			driver = "mysql"
		}

⚠️ Potential issue | 🟡 Minor

Driver detection may incorrectly default to MySQL.

The check only matches postgres prefix. Consider handling postgresql:// as well, or adding explicit validation:

-		if strings.HasPrefix(source.ConnectionString, "postgres") {
+		if strings.HasPrefix(source.ConnectionString, "postgres://") || strings.HasPrefix(source.ConnectionString, "postgresql://") {
 			driver = "postgres"
-		} else {
+		} else if strings.HasPrefix(source.ConnectionString, "mysql://") {
 			driver = "mysql"
+		} else {
+			handleFailure(fmt.Errorf("unsupported database driver in connection string"))
+			return
 		}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In server/interfaces/http/routes/source/database/refresh.go around lines 136 to
140, the current logic defaults to "mysql" whenever the connection string
doesn't start with "postgres", which can misdetect "postgresql://" or other
schemes; update the detection to explicitly handle both "postgres" and
"postgresql" prefixes (or better, parse the connection string with url.Parse and
switch on the URL scheme) and if the scheme is neither postgres/postgresql nor
mysql, return an error or validation failure instead of silently defaulting to
"mysql".

Comment on lines +234 to +244
 <DialogFooter className="pt-4">
   <DialogClose asChild>
     <Button type="button" variant="outline" onClick={onCloseDialog}>
       Cancel
     </Button>
   </DialogClose>
-  <Button type="submit">
-    Create Dataset
-  </Button>
+  <Button type="submit" disabled={sourceDatabaseDataset.isPending}>
+    {sourceDatabaseDataset.isPending ? "Creating..." : "Create Dataset"}
+  </Button>
 </DialogFooter>
 </form>

⚠️ Potential issue | 🟡 Minor

Form lacks ability to cancel during upload.

When uploadProgress is set, the form is hidden and there's no cancel button in the progress UI. If the upload takes too long or the user wants to abort, they have no option except closing the browser tab. Consider adding a cancel mechanism or at minimum a note about closing the dialog.

🤖 Prompt for AI Agents
In web/src/components/dataset/database-source-form.tsx around lines 234-244, the
form footer hides any cancel action while uploadProgress is set so users cannot
abort an ongoing upload; update the footer rendering to show a Cancel/Abort
button (or a close icon) whenever uploadProgress is active, wiring that button
to an abort handler (prefer an injected abortUpload prop or store an
AbortController for the current upload and call controller.abort()), and then
call onCloseDialog or appropriate cleanup after abort; additionally, display a
short hint near the progress UI that the user can cancel to stop the upload.
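The abort wiring suggested above can be sketched as follows. The names (`createAbortableUpload`, `doFetch`) are hypothetical; the point is to keep one AbortController per in-flight upload so a Cancel button can call `abort()` and the dialog can close cleanly:

```typescript
// Minimal sketch: pair an upload-starting function with an
// AbortController so the progress UI can expose a working Cancel.
function createAbortableUpload<T>(
  doFetch: (signal: AbortSignal) => Promise<T>,
) {
  const controller = new AbortController();
  return {
    start: () => doFetch(controller.signal),
    cancel: () => controller.abort(),
    signal: controller.signal,
  };
}
```

The component would pass `signal` through to `fetch` (or `fetchWithSSE`, if it accepts one) and call `cancel()` from the footer button before invoking `onCloseDialog`.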

}
}

return resultData as T;

⚠️ Potential issue | 🟡 Minor

Return type assertion may return undefined as T.

If no complete event is received before the stream ends, resultData remains undefined but is cast to T. This could cause runtime issues in callers expecting a valid value.

-  return resultData as T;
+  if (resultData === undefined) {
+    throw new Error('SSE stream ended without a complete event');
+  }
+  return resultData;
🤖 Prompt for AI Agents
In web/src/lib/sse-client.ts around line 95, the code returns resultData as T
which may be undefined if no 'complete' event was received; update the function
to either (A) validate resultData before returning and throw a descriptive
runtime error if it's undefined (e.g., "SSE stream ended without complete
event") or (B) change the function return type to T | undefined and propagate
that change to callers; implement one of these fixes and update any call sites
or types accordingly so callers handle the absent value instead of receiving an
unsafe asserted T.
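Option (A) can be sketched as a small parser, independent of the fetch plumbing. The `data: {...}` line format and the `complete` event name are assumptions based on this review:

```typescript
interface SSEEvent {
  type: string;
  data?: unknown;
}

// Parse newline-delimited "data: {...}" SSE payloads and return the
// payload of the final `complete` event; throw if none arrived.
function extractCompletePayload(rawStream: string): unknown {
  let resultData: unknown;
  let sawComplete = false;
  for (const line of rawStream.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const event = JSON.parse(line.slice("data: ".length)) as SSEEvent;
    if (event.type === "complete") {
      resultData = event.data;
      sawComplete = true;
    }
  }
  if (!sawComplete) {
    throw new Error("SSE stream ended without a complete event");
  }
  return resultData;
}
```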

Comment on lines +631 to +637
 rejectedRows.push({
   rowNumber: Number(row.line || 0),
   columnName,
-  expectedType,
-  actualValue,
-  errorMessage,
-});
-
-// Get total count of rejected rows
-const countQuery = await conn.query(
-  `SELECT COUNT(*) as total FROM reject_errors`
-);
-const totalCount = Number(countQuery.toArray()[0]?.total || 0);
+  expectedType: "",
+  actualValue: "",
+  errorMessage: errorMsg,
+});

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the validate-file.ts file and examine the relevant section
find web/src -name "validate-file.ts" -type f | head -5

Repository: factly/gopie

Length of output: 95


🏁 Script executed:

#!/bin/bash
# Search for RejectedRow interface definition
rg -n "interface RejectedRow" web/src/ -A 10

Repository: factly/gopie

Length of output: 765


🏁 Script executed:

#!/bin/bash
# Search for components that consume rejectedRows
rg -n "rejectedRows" web/src/ --type=ts --type=tsx | head -20

Repository: factly/gopie

Length of output: 83


🏁 Script executed:

#!/bin/bash
# Search for components that use or display rejectedRows
rg -l "rejectedRows" web/src --type=ts | head -10

Repository: factly/gopie

Length of output: 243


🏁 Script executed:

#!/bin/bash
# Get the actual implementation in validate-file.ts around lines 631-637
sed -n '610,650p' web/src/lib/validation/validate-file.ts

Repository: factly/gopie

Length of output: 1228


🏁 Script executed:

#!/bin/bash
# Search for imports or references to RejectedRow interface
rg -n "RejectedRow\|rejectedRows" web/src --type=ts -A 2 | head -50

Repository: factly/gopie

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Check csv-validation-uppy.tsx for how it displays rejectedRows
rg -n "expectedType\|actualValue\|rejectedRows" web/src/components/dataset/csv-validation-uppy.tsx -B 2 -A 2

Repository: factly/gopie

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Check dataset-upload-wizard.tsx for how it displays rejectedRows
rg -n "expectedType\|actualValue\|rejectedRows" web/src/components/dataset/dataset-upload-wizard.tsx -B 2 -A 2

Repository: factly/gopie

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Look for any table or display component that renders the validation errors
rg -n "RejectedRow\|rejected.*map\|rejected.*render" web/src/components/dataset/ -A 5

Repository: factly/gopie

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Search for where rejectedRows is actually used in components
rg -n "rejectedRows" web/src/components/dataset/ -B 5 -A 5

Repository: factly/gopie

Length of output: 9286


🏁 Script executed:

#!/bin/bash
# Look for table or list rendering patterns in csv-validation-uppy.tsx
head -200 web/src/components/dataset/csv-validation-uppy.tsx | tail -150

Repository: factly/gopie

Length of output: 4851


🏁 Script executed:

#!/bin/bash
# Search more broadly for validation error display
rg -n "errorMessage\|error_message" web/src/components/dataset/csv-validation-uppy.tsx -B 3 -A 3

Repository: factly/gopie

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Get lines 789-831 from csv-validation-uppy.tsx to see the complete displayMessage logic
sed -n '789,831p' web/src/components/dataset/csv-validation-uppy.tsx

Repository: factly/gopie

Length of output: 2129


🏁 Script executed:

#!/bin/bash
# Also check url-uploader.tsx logic for displayMessage
sed -n '652,680p' web/src/components/dataset/url-uploader.tsx

Repository: factly/gopie

Length of output: 1767


Empty expectedType and actualValue fields reduce error message clarity for users.

The expectedType and actualValue fields in RejectedRow objects (lines 631-637) are hardcoded to empty strings, but the UI components in csv-validation-uppy.tsx and url-uploader.tsx render these fields and have logic to display them when available. With empty values, users see degraded error messages:

  • actualValue always displays as "is empty" instead of the problematic value
  • expectedType always displays as "a valid type" instead of the actual expected type

Users lose critical context when understanding why rows were rejected. Consider:

  1. Parse error_message: Extract type or value information from DuckDB's error message string
  2. Query table schema: Retrieve expected types by querying information_schema.columns for the target table
  3. Make fields optional: If values cannot be reliably populated, mark them as optional in the interface and update UI consumers accordingly
🤖 Prompt for AI Agents
In web/src/lib/validation/validate-file.ts around lines 631-637 the RejectedRow
objects are populated with empty expectedType and actualValue which degrades UI
error clarity; update the code to populate these fields by (1) attempting to
parse DuckDB's error message to extract the offending value and expected type,
and set actualValue/expectedType when parsing succeeds, (2) if parsing fails,
query information_schema.columns for the target table to get the expected type
for the column and set expectedType, and (3) if neither approach yields reliable
values, change the RejectedRow type to make expectedType and actualValue
optional and adjust callers (csv-validation-uppy.tsx and url-uploader.tsx) to
handle absent fields gracefully.
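Option (1) could be sketched as below. The message format ("Could not convert string 'X' to 'TYPE'") is an assumption about DuckDB's wording and should be verified against the DuckDB version in use; unparsable messages fall back to the current empty-string behavior:

```typescript
// Hypothetical parser: recover actualValue/expectedType from a
// DuckDB-style conversion error message for RejectedRow display.
function parseRejectError(errorMessage: string): {
  actualValue: string;
  expectedType: string;
} {
  const match = errorMessage.match(
    /Could not convert string '([^']*)' to '([^']+)'/,
  );
  return {
    actualValue: match?.[1] ?? "",
    expectedType: match?.[2] ?? "",
  };
}
```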

@claude

claude bot commented Dec 16, 2025

Claude Code is working…

I'll analyze this and get back to you.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
web/src/components/dataset/dataset-upload-wizard.tsx (1)

243-526: Decompose this long function for maintainability.

The handleCreateDataset function is 283 lines long, handling S3 URL extraction, upload triggering, URL parsing, SSE communication, and toast management. This violates the single responsibility principle and makes the code harder to test and maintain.

Consider extracting these helper functions:

// Extract S3 URL from upload response
function extractS3Url(uploadResponse: unknown): string {
  // Lines 277-288, 331-342 logic
}

// Parse and normalize S3 URL
function normalizeS3Url(uploadURL: string): string {
  // Lines 362-432 logic
}

// Create progress toast updater
function createToastUpdater(toastId: string) {
  return {
    updateProgress: (message: string, progress: number) => {
      toast.loading(
        <div className="flex flex-col gap-2 w-full min-w-[250px]">
          <div className="flex justify-between text-sm">
            <span className="font-medium">{message}</span>
            <span className="text-muted-foreground">{progress}%</span>
          </div>
          <Progress value={progress} className="h-2" />
        </div>,
        { id: toastId }
      );
    },
    // ... other toast helpers
  };
}

Then refactor handleCreateDataset to orchestrate these helpers, reducing it to ~100 lines.

🧹 Nitpick comments (1)
web/src/components/dataset/dataset-upload-wizard.tsx (1)

459-476: Consider progress tracking limitations.

The progress increments by 10% on each status_update event and caps at 90%, which doesn't reflect actual server-side progress. While this provides visual feedback, users might expect the percentage to represent real completion.

Consider these options:

  1. Use event.progress if the server provides it (line 5 of web/src/lib/sse-client.ts shows that SSEEvent declares an optional progress?: number field)
  2. Add a comment explaining the progress is estimated
  3. Remove percentage display and show only a pulsing progress bar

Example using server progress:

         onProgress: (event: SSEEvent) => {
           if (event.type === 'status_update') {
-            // Increment progress artificially (capped at 90%)
-            currentProgress = Math.min(currentProgress + 10, 90);
+            // Use server-provided progress or fall back to artificial increment
+            currentProgress = event.progress ?? Math.min(currentProgress + 10, 90);
             
             // 6. UPDATE TOAST with SSE Message
             toast.loading(
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f8ed7a5 and 40ca5e3.

📒 Files selected for processing (1)
  • web/src/components/dataset/dataset-upload-wizard.tsx (11 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx}: Use Next.js 15 with App Router and React 19 for the web frontend framework
Use TypeScript as the language for the web frontend
Generate and use types from API responses for type safety
Use dynamic imports and code splitting for optimal performance
Load DuckDB WASM on-demand and use web workers for client-side SQL execution

Files:

  • web/src/components/dataset/dataset-upload-wizard.tsx
web/src/components/**/*.{ts,tsx}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/components/**/*.{ts,tsx}: Use TailwindCSS with Radix UI components for styling instead of custom CSS
Use React Hook Form with Zod validation for form handling
Use pre-configured queries and mutations from lib/queries and lib/mutations for API calls
Wrap features in error boundaries to handle errors gracefully
Always handle loading and error states in UI components
Use Next.js Image component for image optimization

Files:

  • web/src/components/dataset/dataset-upload-wizard.tsx
web/src/components/dataset/**upload**.{ts,tsx}

📄 CodeRabbit inference engine (web/CLAUDE.md)

Use Uppy for dataset file uploads in upload components

Files:

  • web/src/components/dataset/dataset-upload-wizard.tsx
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
Repo: factly/gopie PR: 0
File: chat-server/CLAUDE.md:0-0
Timestamp: 2025-12-02T11:48:34.955Z
Learning: Applies to chat-server/app/api/v1/routers/**/*.py : Use Server-Sent Events for streaming real-time query progress responses
📚 Learning: 2025-12-02T11:48:50.411Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2025-12-02T11:48:50.411Z
Learning: Applies to web/src/components/dataset/**upload**.{ts,tsx} : Use Uppy for dataset file uploads in upload components

Applied to files:

  • web/src/components/dataset/dataset-upload-wizard.tsx
📚 Learning: 2025-12-02T11:48:50.411Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2025-12-02T11:48:50.411Z
Learning: Applies to web/src/components/dataset/sql-editor.{ts,tsx} : Use Monaco Editor for SQL editing with syntax highlighting in SQL editor components

Applied to files:

  • web/src/components/dataset/dataset-upload-wizard.tsx
📚 Learning: 2025-12-02T11:48:50.411Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2025-12-02T11:48:50.411Z
Learning: Applies to web/src/components/**/*.{ts,tsx} : Use pre-configured queries and mutations from lib/queries and lib/mutations for API calls

Applied to files:

  • web/src/components/dataset/dataset-upload-wizard.tsx
📚 Learning: 2025-07-29T09:36:39.908Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-29T09:36:39.908Z
Learning: Applies to web/**/*.{js,jsx,ts,tsx} : Use Zustand for global state management and React Query for server state in the Next.js frontend

Applied to files:

  • web/src/components/dataset/dataset-upload-wizard.tsx
📚 Learning: 2025-07-29T09:36:39.908Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-29T09:36:39.908Z
Learning: Applies to web/**/*.{js,jsx,ts,tsx} : Integrate DuckDB WASM for client-side SQL execution in the Next.js frontend

Applied to files:

  • web/src/components/dataset/dataset-upload-wizard.tsx
📚 Learning: 2025-12-02T11:48:50.411Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2025-12-02T11:48:50.411Z
Learning: Applies to web/src/lib/{queries,mutations}/**/*.{ts,tsx} : Use React Query (TanStack Query) for server state management

Applied to files:

  • web/src/components/dataset/dataset-upload-wizard.tsx
📚 Learning: 2025-07-29T09:36:39.908Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-29T09:36:39.908Z
Learning: Applies to web/**/*.{js,jsx,ts,tsx} : Use Monaco Editor for SQL editing in the Next.js frontend

Applied to files:

  • web/src/components/dataset/dataset-upload-wizard.tsx
📚 Learning: 2025-12-02T11:48:50.411Z
Learnt from: CR
Repo: factly/gopie PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2025-12-02T11:48:50.411Z
Learning: Applies to web/src/**/*.{ts,tsx} : Load DuckDB WASM on-demand and use web workers for client-side SQL execution

Applied to files:

  • web/src/components/dataset/dataset-upload-wizard.tsx
🧬 Code graph analysis (1)
web/src/components/dataset/dataset-upload-wizard.tsx (2)
web/src/lib/mutations/dataset/source-dataset.ts (1)
  • useSourceDatasetSSE (59-103)
web/src/lib/sse-client.ts (1)
  • SSEEvent (3-8)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: claude-review
🔇 Additional comments (5)
web/src/components/dataset/dataset-upload-wizard.tsx (5)

43-46: LGTM: Clean SSE integration imports.

The new imports for SSE support are well-organized and properly typed.


79-79: LGTM: SSE hook integration.

The hook usage follows the established pattern for mutations in this codebase.


164-164: LGTM: Clean derived state.

This simplifies access to rejected rows throughout the component and handles the undefined case properly.


1123-1164: LGTM: Clean validation error rendering.

The validation error display properly limits to 5 items and provides a clear summary count. The use of the derived rejected array simplifies the code and improves readability.


1552-1567: LGTM: Consistent validation error display.

The dialog maintains consistency with Step 2's validation rendering pattern, properly limiting display and showing a summary count.

Comment on lines +477 to 481
}) as { dataset: Dataset };

-if (!res?.data.dataset.id) {
+if (!res?.dataset?.id) {
throw new Error("Invalid response from server: Dataset ID not found.");
}

⚠️ Potential issue | 🔴 Critical

Unsafe type assertion could cause runtime errors.

Line 477 uses a type assertion as { dataset: Dataset } without validating the response shape. If the SSE endpoint returns an unexpected structure, this will cause runtime errors.

Apply this diff to add runtime validation:

-      }) as { dataset: Dataset };
+      });

-      if (!res?.dataset?.id) {
+      // Validate response shape
+      if (!res || typeof res !== 'object') {
+        throw new Error("Invalid response from server: Expected an object.");
+      }
+      
+      const response = res as Record<string, unknown>;
+      if (!response.dataset || typeof response.dataset !== 'object') {
+        throw new Error("Invalid response from server: Dataset object not found.");
+      }
+      
+      const dataset = response.dataset as Dataset;
+      if (!dataset.id) {
         throw new Error("Invalid response from server: Dataset ID not found.");
       }

Then update the subsequent usage:

       setCreatedDataset({
-        ...res.dataset,
+        ...dataset,
         formatDisplay,
         columnMappings: alter_column_names,
         columnDescriptions: column_descriptions,
         validationResult,
       });

-       toast.success(`Dataset ${res.dataset.alias} (${formatDisplay}) created successfully`, {
+       toast.success(`Dataset ${dataset.alias} (${formatDisplay}) created successfully`, {
         id: toastId,
         duration: 4000, 
       });

Committable suggestion skipped: line range outside the PR's diff.
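An alternative to the inline checks in the diff above is a small type guard. The full `Dataset` shape is not reproduced here; this sketch asserts only the `id` field the component actually depends on:

```typescript
// Minimal stand-in for the app's Dataset type; only id is asserted below.
interface Dataset {
  id: string;
  alias?: string;
}

// Narrow an unknown SSE payload to { dataset: Dataset } without a blind cast.
function hasDataset(res: unknown): res is { dataset: Dataset } {
  if (!res || typeof res !== "object") return false;
  const dataset = (res as { dataset?: unknown }).dataset;
  return (
    !!dataset &&
    typeof dataset === "object" &&
    typeof (dataset as { id?: unknown }).id === "string"
  );
}

// Usage sketch:
// if (!hasDataset(res)) {
//   throw new Error("Invalid response from server: Dataset ID not found.");
// }
// setCreatedDataset({ ...res.dataset /* now typed */ });
```

A guard like this keeps the validation reusable across the create and refresh flows instead of repeating the checks at each call site.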

Comment on lines +1280 to +1294
{/* {uploadProgress.status === 'processing' && (
<div className="space-y-3 p-4 border rounded-lg bg-secondary/10">
<div className="flex items-center justify-between text-sm">
<span className="font-medium flex items-center gap-2">
<Loader2 className="h-4 w-4 animate-spin text-primary" />
{uploadProgress.message}
</span>
<span className="text-muted-foreground">{uploadProgress.percentage}%</span>
</div>
<Progress value={uploadProgress.percentage} className="h-2" />
<p className="text-xs text-muted-foreground">
Processing your data. Large datasets may take a moment.
</p>
</div>
)} */}

⚠️ Potential issue | 🟡 Minor

Remove commented code.

This large commented block appears to be old upload progress UI that was replaced by the SSE toast implementation. Commented code should be removed to improve readability—git history preserves this for reference if needed.

Apply this diff:

-              {/* {uploadProgress.status === 'processing' && (
-      <div className="space-y-3 p-4 border rounded-lg bg-secondary/10">
-        <div className="flex items-center justify-between text-sm">
-          <span className="font-medium flex items-center gap-2">
-            <Loader2 className="h-4 w-4 animate-spin text-primary" />
-            {uploadProgress.message}
-          </span>
-          <span className="text-muted-foreground">{uploadProgress.percentage}%</span>
-        </div>
-        <Progress value={uploadProgress.percentage} className="h-2" />
-        <p className="text-xs text-muted-foreground">
-          Processing your data. Large datasets may take a moment.
-        </p>
-      </div>
-    )} */}
                            <Button
🤖 Prompt for AI Agents
In web/src/components/dataset/dataset-upload-wizard.tsx around lines 1280 to
1294, remove the entire commented-out JSX block that renders the old upload
progress UI (the multi-line comment starting with {/* and ending with */}) so
the file no longer contains dead commented code; rely on git history if you need
to restore it.

@claude

claude bot commented Dec 16, 2025

Claude finished @surajmn1's task — View job


PR Review: Row Validation Errors Rendering Fix

Comprehensive review completed for this PR that fixes row validation errors not rendering properly and adds streaming support for DB-based uploads.

Review Progress

  • Analyze validation error rendering fixes
  • Review new SSE client implementation
  • Check backend streaming changes
  • Examine frontend component updates
  • Review security and performance considerations
  • Check code quality and best practices
  • Provide comprehensive feedback

🔍 Detailed Code Review

Positive Findings

1. Excellent Validation Error Rendering Improvements

The validation error display has been significantly improved across components:

  • Enhanced Error Interface (validate-file.ts:22-32): New RejectedRow interface provides comprehensive error context
  • Detailed Error Display (dataset-upload-wizard.tsx:1123-1155): Validation warnings show specific row numbers, column names, and error messages
  • User-Friendly Warning Dialogs (dataset-upload-wizard.tsx:1518-1594): Clear dialog explaining data type issues with option to proceed
  • Consistent Error Handling: Pattern implemented across multiple upload components

2. Robust SSE Implementation

New streaming client (sse-client.ts) provides:

  • Clean TypeScript interfaces for event types
  • Proper authentication/authorization handling
  • Progressive loading capabilities
  • Environment-aware configuration

3. Comprehensive Backend Streaming

Database source endpoints demonstrate excellent patterns:

  • Progressive status updates via SSE events
  • Comprehensive resource cleanup on failures (create.go:44-76)
  • Proper error propagation and handling
  • Good separation of concerns

4. Improved User Experience

  • Real-time Progress Feedback: Users see detailed progress during long operations
  • Better Error Context: Specific validation errors instead of generic failures
  • Graceful Degradation: Operations continue with warnings when possible

⚠️ Issues Requiring Attention

1. Critical: SSE Error Handling Vulnerability

Location: web/src/lib/sse-client.ts:86-90

} catch (e) {
  if (e instanceof Error && e.message !== "Unexpected end of JSON input") {
     console.error("SSE Parse Error", e);
  }
  if (line.includes('"type":"error"')) throw e; 
}

Issues:

  • Silently ignoring "Unexpected end of JSON input" could mask connection issues
  • String-based error detection is fragile and could miss malformed events
  • No validation of parsed JSON structure before processing

Recommended Fix:

} catch (e) {
  console.warn("SSE Parse Error", { error: e, line, context: "SSE_STREAM" });
  // Only re-throw if this is a properly formatted error event.
  // The re-throw must happen outside the inner try, otherwise the
  // empty catch would swallow it.
  const trimmed = line.trim();
  if (trimmed.startsWith("data: ")) {
    let parsed: { type?: string } | null = null;
    try {
      parsed = JSON.parse(trimmed.slice(6));
    } catch {
      // ignore secondary parse failure
    }
    if (parsed?.type === "error") throw e;
  }
}

2. Performance: Memory Usage in Validation

Location: Multiple files use LIMIT 1000000

  • validate-file.ts:234: CSV validation processing 1M rows
  • validate-file.ts:285: Parquet validation
  • validate-file.ts:348: JSON validation

Concern: Processing 1M rows in browser can cause memory pressure and UI blocking.

Recommendation: Implement adaptive limits based on available memory:

const getValidationLimit = () => {
  const memoryMB = (performance as any)?.memory?.usedJSHeapSize / (1024 * 1024) || 0;
  return memoryMB > 100 ? 100000 : 1000000; // Reduce limit if memory usage is high
};

3. Progress Tracking Accuracy

Location: dataset-upload-wizard.tsx:462, dataset-header.tsx:344

currentProgress = Math.min(currentProgress + 10, 90);

Issue: Artificial progress increments don't reflect actual completion and may mislead users.

Recommendation: Use actual operation progress when available, or implement time-based estimates.
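One shape a time-based estimate could take, as a sketch: approach a cap asymptotically based on elapsed time, so the bar moves continuously without ever claiming completion. The 30-second expected duration and 90% cap are illustrative defaults, not values from the PR.

```typescript
// Time-based progress estimate: 1 - e^(-t/T) rises quickly at first and
// flattens toward the cap, so the bar neither stalls nor overpromises.
function estimateProgress(
  startedAtMs: number,
  nowMs: number,
  expectedDurationMs = 30_000,
  cap = 90,
): number {
  const elapsed = Math.max(0, nowMs - startedAtMs);
  const fraction = 1 - Math.exp(-elapsed / expectedDurationMs);
  return Math.round(cap * fraction);
}
```

This could be driven by a timer between SSE events and overridden whenever the server supplies a real `event.progress` value.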

🔒 Security Assessment

Strong Points

  • SQL Injection Prevention: Random table names and parameterized queries
  • Authentication: Proper token handling with organization-based access
  • Input Validation: Comprehensive request validation on backend
  • Resource Cleanup: Prevents resource leaks and unauthorized access

⚠️ Areas to Monitor

  • Rate Limiting: No visible rate limiting on SSE endpoints
  • File Size Limits: Large file processing could enable DoS attacks

Performance Analysis

Optimizations

  • Client-side Validation: Reduces server load using DuckDB WASM
  • Streaming Responses: Prevents timeout issues on large operations
  • Efficient Sampling: ORDER BY random() LIMIT 50 for AI analysis

⚠️ Potential Issues

  • UI Blocking: Heavy validation tasks could impact responsiveness
  • Memory Usage: Large file processing without web workers
  • Database Load: Random sampling expensive on large tables

📋 Code Quality Assessment

Strengths

  • TypeScript Usage: Excellent type safety and interfaces
  • Error Boundaries: Good error isolation and handling
  • Code Organization: Clear separation of concerns
  • Consistent Patterns: Similar error handling across components

⚠️ Improvements Needed

  • Documentation: Complex SSE logic needs more inline comments
  • Test Coverage: No visible tests for error scenarios
  • Logging: Some error paths lack sufficient logging context

🧪 Missing Test Coverage

Critical scenarios that should have tests:

  • SSE client error handling and reconnection
  • Validation error rendering edge cases
  • Resource cleanup failure scenarios
  • Memory pressure during large file validation

📝 Prioritized Recommendations

🚨 High Priority (Pre-merge)

  1. Fix SSE error handling to prevent silent failures and improve error detection
  2. Add timeout handling for long-running SSE operations
  3. Improve progress tracking accuracy where possible

Medium Priority (Next Sprint)

  1. Implement memory monitoring for large file validation
  2. Add retry logic for failed cleanup operations
  3. Consider web workers for heavy validation tasks
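The web-worker migration could start by isolating the validation step behind a message protocol, so the same handler runs identically inside a worker or inline on the main thread. The message shapes and the "reject rows containing null" rule below are stand-ins, not the project's actual validation logic:

```typescript
// Hypothetical message protocol for a validation worker.
interface ValidateRequest {
  kind: "validate";
  rows: Array<Record<string, unknown>>;
  limit: number;
}

interface ValidateResult {
  kind: "result";
  checked: number;
  rejected: number;
}

// Pure handler: testable on the main thread, trivially wired into a worker.
// "Invalid" here just means a row with any null value, as a stand-in rule.
function handleValidationMessage(msg: ValidateRequest): ValidateResult {
  const slice = msg.rows.slice(0, msg.limit);
  const rejected = slice.filter((row) =>
    Object.values(row).some((v) => v === null),
  ).length;
  return { kind: "result", checked: slice.length, rejected };
}

// In a worker file this would be wired roughly as:
// self.onmessage = (e) => self.postMessage(handleValidationMessage(e.data));
```

Keeping the handler pure makes the eventual worker wrapper a thin shell and lets the validation logic be unit-tested without a browser.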

📚 Low Priority (Future)

  1. Add comprehensive unit tests for error scenarios
  2. Implement rate limiting for SSE endpoints
  3. Add performance monitoring for validation operations

🎯 Final Assessment

This PR represents a significant improvement in user experience and system robustness. The validation error rendering fixes address a real usability issue, and the streaming capabilities add valuable real-time feedback.

Key Achievements:

  • ✅ Fixed validation error rendering across multiple components
  • ✅ Added robust streaming architecture with proper cleanup
  • ✅ Improved user experience with real-time progress feedback
  • ✅ Maintained strong security and type safety practices

Critical Issue:

  • ⚠️ SSE error handling needs improvement to prevent silent failures

Recommendation:

✅ Approve with requested changes - Fix the SSE error handling issue, then this PR significantly improves the platform and is ready to ship.

The core functionality is solid and the improvements are valuable. Address the error handling edge case and this will be an excellent addition to the codebase.


View job run: https://github.com/factly/gopie/actions/runs/20268567433


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
web/src/components/dataset/dataset-upload-wizard.tsx (2)

477-477: Duplicate: Unsafe type assertion flagged in previous review.

This type assertion without runtime validation was already identified in a previous review comment. The suggested fix includes proper runtime validation of the response structure before casting.


1280-1294: Duplicate: Remove commented code as previously flagged.

This commented-out upload progress UI was already identified for removal in a previous review. Dead code should be deleted to improve maintainability—git history preserves it if needed.

🧹 Nitpick comments (1)
web/src/components/dataset/dataset-upload-wizard.tsx (1)

459-476: Consider smoothing progress increments based on event count.

The current logic increments progress by 10% per SSE event (capped at 90%). If the backend sends many status updates quickly, progress could jump rapidly and feel jarring.

Consider a more gradual increment or time-based smoothing:

-            // Increment progress artificially (capped at 90%)
-            currentProgress = Math.min(currentProgress + 10, 90);
+            // Increment progress gradually (capped at 90%)
+            const increment = (90 - currentProgress) * 0.2; // 20% of remaining progress
+            currentProgress = Math.min(currentProgress + increment, 90);

This creates an easing curve that approaches the cap geometrically — each step covers a fixed fraction of the remaining distance — which feels smoother to users.
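The suggested curve can be pulled into a small helper for reuse (the 0.2 factor and 90 cap are the values from the diff above):

```typescript
// Geometric easing toward a cap: each step covers 20% of the remaining
// distance, so early events move the bar visibly and later ones slow down.
function easeProgress(current: number, cap = 90, factor = 0.2): number {
  const next = current + (cap - current) * factor;
  return Math.min(next, cap);
}
```

Successive calls starting from 0 yield roughly 18, 32.4, 43.9, and so on, never exceeding 90.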

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 40ca5e3 and 6224a02.

📒 Files selected for processing (1)
  • web/src/components/dataset/dataset-upload-wizard.tsx (11 hunks)
🧰 Additional context used, path-based instructions, and learnings: same as the previous review.
🔇 Additional comments (5)
web/src/components/dataset/dataset-upload-wizard.tsx (5)

43-46: LGTM! Clean SSE integration imports.

The new imports properly support the SSE-based upload flow with progress tracking.


79-79: LGTM! SSE hook properly initialized.

The useSourceDatasetSSE hook correctly replaces the previous mutation approach for streaming upload progress.

Based on learnings, using Server-Sent Events for streaming real-time progress is the recommended pattern.


164-164: LGTM! Clean derived state pattern.

Extracting rejected as a derived value improves readability and eliminates repetitive validationResult?.rejectedRows ?? [] checks throughout the component.


1123-1164: LGTM! Clear and concise validation error display.

The refactored rendering using the rejected array provides a clean, user-friendly display of validation errors with appropriate limiting (5 items max) and summary for additional issues.


1552-1569: LGTM! Consistent validation warning display.

The dialog rendering matches the pattern used in Step 2, providing a consistent user experience when displaying validation errors.
