Dataset Management

Overview Description

The Dataset Management feature provides administrative operations for the datasets used in product analysis. Its key function is retrying dataset analysis when a previous attempt failed or produced incomplete results, so that analysis runs on completely processed data.

This feature is primarily intended for administrators and analysts who need to ensure the quality and availability of datasets for product analysis. It helps maintain system integrity by allowing recovery from processing issues without requiring complete re-creation of datasets.

Swagger Link

API: Retry Dataset API

Case Documentation

Case 1: Retrying Dataset Analysis

Description

The user initiates a retry of the analysis for a dataset whose previous processing failed or produced incomplete results.

Sequence Diagram

sequenceDiagram
    participant User
    participant API as WishlistDatasetController
    participant Service as ProductAnalysisService
    participant Repo as WishlistDatasetHistoryRepository
    participant Auth as Authentication
    participant JP as JP API

    Note over User,JP: Step 1: Request Dataset Retry
    User->>API: GET /api/v1/general/{wldh_slug}/retry
    
    Note over API,Repo: Step 2: Validate Dataset
    API->>Repo: findBySlug(wldh_slug)
    Repo-->>API: wishlistDatasetHistory
    
    Note over API,Auth: Step 3: Validate User Permission
    API->>Auth: get_logged_in_user()
    Auth-->>API: user
    API->>API: Verify user belongs to same group
    
    Note over API,Service: Step 4: Initiate Dataset Retry
    API->>Service: retryDataset(wishlistDatasetHistory, dataset_id)
    Service->>JP: Update dataset status and queue for reprocessing
    JP-->>Service: Operation result
    
    Note over API,User: Step 5: Return Response
    API-->>User: 200 OK (Success message)

Steps

Step 1: Request Dataset Retry

  • Description: User makes a request to retry dataset analysis
  • Request: GET /api/v1/general/{wldh_slug}/retry
  • Parameters:
    • Path parameter: wldh_slug - unique identifier for the dataset history
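
As an illustration of Step 1, the following minimal sketch builds the retry request on the client side. The base URL, the bearer-token header, and the function name `build_retry_request` are assumptions made for the example; only the path `/api/v1/general/{wldh_slug}/retry` and the GET method come from this document.

```python
from urllib.parse import quote
from urllib.request import Request


def build_retry_request(base_url: str, wldh_slug: str, token: str) -> Request:
    """Build (but do not send) the GET request for the retry endpoint."""
    # Percent-encode the slug so unusual characters cannot alter the path.
    path = f"/api/v1/general/{quote(wldh_slug, safe='')}/retry"
    return Request(
        base_url.rstrip("/") + path,
        # Assumption: bearer-token auth; this document does not name the scheme.
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


# Hypothetical host and slug, used only to show the resulting URL.
req = build_retry_request("https://example.invalid", "abc-123", "my-token")
print(req.get_method(), req.full_url)
```

The request object can then be passed to `urllib.request.urlopen` to actually call the endpoint.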

Step 2: Validate Dataset

  • Description: System validates that the requested dataset exists
  • Action: Find dataset history by slug in the repository
  • Potential errors: 404 "Dataset not found"

Step 3: Validate User Permission

  • Description: System validates that the user has permission to retry the dataset
  • Action: Check if logged-in user belongs to the same group as the dataset
  • Potential errors: 403 "Unauthorized access"
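
Steps 2 and 3 can be sketched roughly as follows. This is not the controller's actual code: the in-memory dictionary standing in for the repository, the `group_id` fields, and the exception class names are all hypothetical. The sketch only shows the order of the checks (look up by slug first, then compare groups).

```python
from dataclasses import dataclass
from typing import Dict


class DatasetNotFound(Exception):
    """Raised when no dataset history matches the slug (404 in the API)."""


class UnauthorizedAccess(Exception):
    """Raised when the user is in a different group (403 in the API)."""


@dataclass(frozen=True)
class DatasetHistory:
    slug: str
    group_id: int  # hypothetical field: the group that owns the dataset


@dataclass(frozen=True)
class User:
    name: str
    group_id: int  # hypothetical field: the group the user belongs to


def find_by_slug(repo: Dict[str, DatasetHistory], slug: str) -> DatasetHistory:
    """Step 2: the dataset history must exist."""
    history = repo.get(slug)
    if history is None:
        raise DatasetNotFound("Dataset not found")
    return history


def check_same_group(user: User, history: DatasetHistory) -> None:
    """Step 3: the logged-in user must belong to the dataset's group."""
    if user.group_id != history.group_id:
        raise UnauthorizedAccess("Unauthorized access")


repo = {"abc-123": DatasetHistory(slug="abc-123", group_id=7)}
history = find_by_slug(repo, "abc-123")
check_same_group(User(name="analyst", group_id=7), history)  # passes silently
```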

Step 4: Initiate Dataset Retry

  • Description: System initiates the retry process for the dataset
  • Action: Call service method to retry dataset analysis
  • Operations performed:
    • Reset dataset processing status
    • Clear previous error flags
    • Queue dataset for reprocessing
    • Update timestamp information
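
The four operations listed above can be sketched as a single function. In the real flow the service delegates to the JP API. Here an in-memory record and a `deque` stand in for it, and the field names (`status`, `error_flags`, `updated_at`) and status values (`"FAILED"`, `"PENDING"`) are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Deque, List, Optional


@dataclass
class DatasetRecord:
    dataset_id: int
    status: str = "FAILED"  # hypothetical status value
    error_flags: List[str] = field(default_factory=list)
    updated_at: Optional[datetime] = None


def retry_dataset(
    record: DatasetRecord,
    queue: Deque[int],
    now: Optional[datetime] = None,
) -> DatasetRecord:
    """Apply the four retry operations to an in-memory record."""
    record.status = "PENDING"            # reset dataset processing status
    record.error_flags.clear()           # clear previous error flags
    queue.append(record.dataset_id)      # queue dataset for reprocessing
    record.updated_at = now or datetime.now(timezone.utc)  # update timestamp
    return record


work_queue: Deque[int] = deque()
failed = DatasetRecord(dataset_id=42, error_flags=["PARSE_ERROR"])
retry_dataset(failed, work_queue)
```

Note that only processing metadata changes here. The dataset content itself is not touched, which matches the note that a retry reprocesses the existing data.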

Step 5: Return Response

  • Description: System returns the result of the retry operation
  • Response:
    • Success: 200 OK with success message
    • Error: 403, 404, or 500 status code with the matching message listed under Error Handling

Database Related Tables & Fields

View: Analysis API Specification

Error Handling

  • Log:
    • Dataset retry failures are logged to the application log
  • Error Detail:

    Status Code | Error Message                        | Description
    ------------+--------------------------------------+-------------------------------------------------------------
    404         | "Dataset not found"                  | When the specified dataset slug does not exist
    404         | "Wishlist group not found"           | When the associated wishlist group cannot be found
    403         | "Unauthorized access"                | When the user doesn't belong to the same group as the dataset
    500         | "Failed to retry dataset"            | When the retry operation fails for technical reasons
    500         | Generic error with exception message | When unexpected errors occur during processing
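
One way to keep these codes and messages consistent is a single lookup table from exception type to response. The following is a sketch under assumptions: the exception class names and the `to_response` helper are hypothetical, and only the status codes and messages are taken from the error details above.

```python
from typing import Dict, Tuple, Type


class DatasetNotFound(Exception):
    pass


class WishlistGroupNotFound(Exception):
    pass


class UnauthorizedAccess(Exception):
    pass


class RetryFailed(Exception):
    pass


# Exception type -> (HTTP status code, error message).
ERROR_TABLE: Dict[Type[Exception], Tuple[int, str]] = {
    DatasetNotFound: (404, "Dataset not found"),
    WishlistGroupNotFound: (404, "Wishlist group not found"),
    UnauthorizedAccess: (403, "Unauthorized access"),
    RetryFailed: (500, "Failed to retry dataset"),
}


def to_response(exc: Exception) -> Tuple[int, str]:
    """Translate an exception into (status code, message)."""
    for exc_type, response in ERROR_TABLE.items():
        if isinstance(exc, exc_type):
            return response
    # Any other exception: generic 500 carrying the exception message.
    return 500, str(exc)


print(to_response(UnauthorizedAccess()))
```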

Additional Notes

  • Dataset retry should only be used when necessary, as it may consume significant system resources
  • Multiple consecutive retries may indicate underlying data issues that should be addressed
  • Consider implementing a cooldown period between retry attempts for the same dataset
  • The system maintains a history of retry attempts for audit and troubleshooting purposes
  • Dataset retry does not modify the original dataset content, only reprocesses the existing data
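
The cooldown suggested in the notes above could look roughly like this. It is a sketch of one possible design, not an existing part of the system: the `RetryCooldown` class, the 10-minute window, and the in-memory storage are all assumptions. A real deployment would more likely derive the last retry time from the stored retry history.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

DEFAULT_COOLDOWN = timedelta(minutes=10)  # assumed value, not from the spec


class RetryCooldown:
    """Reject retries for a slug that was retried too recently."""

    def __init__(self, cooldown: timedelta = DEFAULT_COOLDOWN) -> None:
        self._cooldown = cooldown
        self._last_retry: Dict[str, datetime] = {}

    def try_acquire(self, wldh_slug: str, now: Optional[datetime] = None) -> bool:
        """Return True and record the attempt if a retry is allowed now."""
        now = now or datetime.now(timezone.utc)
        last = self._last_retry.get(wldh_slug)
        if last is not None and now - last < self._cooldown:
            return False  # still inside the cooldown window
        self._last_retry[wldh_slug] = now
        return True


limiter = RetryCooldown()
start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
first = limiter.try_acquire("abc-123", now=start)
second = limiter.try_acquire("abc-123", now=start + timedelta(minutes=5))
third = limiter.try_acquire("abc-123", now=start + timedelta(minutes=10))
```

In this example the first attempt is allowed, the second (5 minutes later) is rejected, and the third (10 minutes after the first) is allowed again. Cooldowns are tracked per slug, so other datasets are unaffected.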