Dataset Management

Overview Description

The Dataset Management feature provides administrative operations for the datasets used in product analysis. Its key function is retrying dataset analysis when a previous attempt failed or produced incomplete results, so that the data used for analysis stays reliable and complete.

This feature is intended primarily for administrators and analysts who are responsible for the quality and availability of those datasets. It lets them recover from processing issues without re-creating a dataset from scratch.

Swagger Link

API: Retry Dataset API

Case Documentation

Case 1: Retrying Dataset Analysis

Description

User initiates a retry of the analysis process for a dataset that previously encountered processing issues.

Sequence Diagram

sequenceDiagram
    participant User
    participant API as WishlistDatasetController
    participant Service as ProductAnalysisService
    participant Repo as WishlistDatasetHistoryRepository
    participant Auth as Authentication
    participant JP as JP API

    Note over User,JP: Step 1: Request Dataset Retry
    User->>API: GET /api/v1/general/{wldh_slug}/retry
    
    Note over API,Repo: Step 2: Validate Dataset
    API->>Repo: findBySlug(wldh_slug)
    Repo-->>API: wishlistDatasetHistory
    
    Note over API,Auth: Step 3: Validate User Permission
    API->>Auth: get_logged_in_user()
    Auth-->>API: user
    API->>API: Verify user belongs to same group
    
    Note over API,Service: Step 4: Initiate Dataset Retry
    API->>Service: retryDataset(wishlistDatasetHistory, dataset_id)
    Service->>JP: Update dataset status and queue for reprocessing
    JP-->>Service: Operation result
    
    Note over API,User: Step 5: Return Response
    API-->>User: 200 OK (Success message)

Steps

Step 1: Request Dataset Retry

Description: User makes a request to retry dataset analysis
Request: GET /api/v1/general/{wldh_slug}/retry
Parameters:
- Path parameter: wldh_slug - unique identifier for the dataset history
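
A minimal client-side sketch of this request. Only the path template comes from this step; the base URL and the bearer-token header are assumptions, since this document does not say where the API is hosted or how the user authenticates.

```python
from urllib.parse import quote
from urllib.request import Request

# Path template taken from Step 1.
RETRY_PATH = "/api/v1/general/{wldh_slug}/retry"


def build_retry_request(base_url: str, wldh_slug: str, token: str) -> Request:
    """Build (but do not send) the GET request for a dataset retry.

    base_url and token are hypothetical inputs used for illustration.
    """
    # Percent-encode the slug so it stays a single path segment.
    path = RETRY_PATH.format(wldh_slug=quote(wldh_slug, safe=""))
    return Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="GET",
    )
```

The returned object can be passed to urllib.request.urlopen to send the request.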

Step 2: Validate Dataset

Description: System validates that the requested dataset exists
Action: Find dataset history by slug in the repository
Potential errors: 404 "Dataset not found"

Step 3: Validate User Permission

Description: System validates that the user has permission to retry the dataset
Action: Check if logged-in user belongs to the same group as the dataset
Potential errors: 403 "Unauthorized access"
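
Steps 2 and 3 can be sketched together as a single validation function. This is an illustration, not the controller's actual code: the in-memory dictionary stands in for WishlistDatasetHistoryRepository, and the group_id fields and the ApiError class are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class ApiError(Exception):
    """Carries the HTTP status and message to return to the caller."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status
        self.message = message


@dataclass
class DatasetHistory:
    slug: str
    group_id: int  # assumed field linking the dataset to a group


@dataclass
class User:
    name: str
    group_id: int  # assumed field linking the user to a group


def validate_retry(
    histories: Dict[str, DatasetHistory], slug: str, user: Optional[User]
) -> DatasetHistory:
    """Return the dataset history if it exists and shares the user's group."""
    # Step 2: the dataset history must exist.
    history = histories.get(slug)
    if history is None:
        raise ApiError(404, "Dataset not found")
    # Step 3: the logged-in user must belong to the same group.
    if user is None or user.group_id != history.group_id:
        raise ApiError(403, "Unauthorized access")
    return history
```

Checking existence before permission mirrors the order of the steps; a deployment that must not reveal which slugs exist could return the same error for both cases.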

Step 4: Initiate Dataset Retry

Description: System initiates the retry process for the dataset
Action: Call service method to retry dataset analysis
Operations performed:
- Reset dataset processing status
- Clear previous error flags
- Queue dataset for reprocessing
- Update timestamp information
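
The four operations above can be sketched as follows. The status values, field names, and in-memory queue are assumptions made for illustration; in the real flow the service delegates the status update and queuing to the JP API, whose interface is not described here.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Deque, Optional


@dataclass
class DatasetState:
    slug: str
    status: str = "FAILED"               # hypothetical status value
    error_message: Optional[str] = None  # hypothetical error flag
    updated_at: Optional[datetime] = None


def retry_dataset(
    state: DatasetState, queue: Deque[str], now: Optional[datetime] = None
) -> DatasetState:
    """Apply the four retry operations in the order listed above."""
    if now is None:
        now = datetime.now(timezone.utc)
    state.status = "PENDING"    # reset dataset processing status
    state.error_message = None  # clear previous error flags
    queue.append(state.slug)    # queue dataset for reprocessing
    state.updated_at = now      # update timestamp information
    return state
```

Passing the timestamp in as a parameter keeps the function deterministic in tests.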

Step 5: Return Response

Description: System returns the result of the retry operation
Response:
- Success: 200 OK with success message
- Error: Status code and error message as listed under Error Detail

Database Related Tables & Fields

View: Analysis API Specification

Error Handling

Log
- Dataset retry failures logged to application log

Error Detail:

Status Code	Error Message	Description
404	"Dataset not found"	When the specified dataset slug does not exist
404	"Wishlist group not found"	When the associated wishlist group cannot be found
403	"Unauthorized access"	When the user doesn't belong to the same group as the dataset
500	"Failed to retry dataset"	When the retry operation fails for technical reasons
500	Generic error with exception message	When unexpected errors occur during processing
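
One way to keep responses consistent with the table above is a single mapping from exception type to status code and message. The exception class names below are hypothetical; only the status codes and messages come from the table.

```python
from typing import Dict, Tuple, Type


class DatasetNotFound(Exception):
    pass


class WishlistGroupNotFound(Exception):
    pass


class UnauthorizedAccess(Exception):
    pass


class RetryFailed(Exception):
    pass


# Known errors and the (status, message) pair each one produces.
ERROR_TABLE: Dict[Type[Exception], Tuple[int, str]] = {
    DatasetNotFound: (404, "Dataset not found"),
    WishlistGroupNotFound: (404, "Wishlist group not found"),
    UnauthorizedAccess: (403, "Unauthorized access"),
    RetryFailed: (500, "Failed to retry dataset"),
}


def to_response(exc: Exception) -> Tuple[int, str]:
    """Translate an exception into the (status, message) pair to return."""
    known = ERROR_TABLE.get(type(exc))
    if known is not None:
        return known
    # Last table row: unexpected errors return 500 with the exception message.
    return 500, str(exc)
```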

Additional Notes

- Dataset retry should only be used when necessary, as it may consume significant system resources.
- Multiple consecutive retries may indicate underlying data issues that should be addressed.
- Consider implementing a cooldown period between retry attempts for the same dataset.
- The system maintains a history of retry attempts for audit and troubleshooting purposes.
- Dataset retry does not modify the original dataset content; it only reprocesses the existing data.
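
The cooldown suggested in the notes above could be sketched as below. It is not part of the current feature; the class name, the in-memory storage, and the choice of period are all assumptions, and a real deployment would more likely derive the last attempt from the stored retry history.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


class RetryCooldown:
    """Allow at most one retry per dataset slug within a fixed period."""

    def __init__(self, period: timedelta) -> None:
        self.period = period
        self._last_attempt: Dict[str, datetime] = {}

    def try_acquire(self, slug: str, now: Optional[datetime] = None) -> bool:
        """Return True and record the attempt if the slug is not cooling down."""
        if now is None:
            now = datetime.now(timezone.utc)
        last = self._last_attempt.get(slug)
        if last is not None and now - last < self.period:
            return False  # still inside the cooldown window
        self._last_attempt[slug] = now
        return True
```

A rejected attempt does not reset the window, so repeated requests cannot postpone the next allowed retry indefinitely.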
