Dataset Management
Overview Description
The Dataset Management feature provides administrative capabilities for maintaining datasets used in product analysis. Its key function is retrying dataset analysis when a previous attempt failed or produced incomplete results, which keeps the data used for analysis reliable and complete.
This feature is intended primarily for administrators and analysts who need to keep datasets available and accurate for product analysis. It lets them recover from processing failures without re-creating the dataset from scratch.
Swagger Link
API: Retry Dataset API
Case Documentation
Case 1: Retrying Dataset Analysis
Description
User initiates a retry of the analysis process for a dataset that previously encountered processing issues.
Sequence Diagram
sequenceDiagram
participant User
participant API as WishlistDatasetController
participant Service as ProductAnalysisService
participant Repo as WishlistDatasetHistoryRepository
participant Auth as Authentication
participant JP as JP API
Note over User,JP: Step 1: Request Dataset Retry
User->>API: GET /api/v1/general/{wldh_slug}/retry
Note over API,Repo: Step 2: Validate Dataset
API->>Repo: findBySlug(wldh_slug)
Repo-->>API: wishlistDatasetHistory
Note over API,Auth: Step 3: Validate User Permission
API->>Auth: get_logged_in_user()
Auth-->>API: user
API->>API: Verify user belongs to same group
Note over API,Service: Step 4: Initiate Dataset Retry
API->>Service: retryDataset(wishlistDatasetHistory, dataset_id)
Service->>JP: Update dataset status and queue for reprocessing
JP-->>Service: Operation result
Note over API,User: Step 5: Return Response
API-->>User: 200 OK (Success message)
Steps
Step 1: Request Dataset Retry
- Description: User makes a request to retry dataset analysis
- Request: GET /api/v1/general/{wldh_slug}/retry
- Parameters:
  - Path parameter: wldh_slug (unique identifier for the dataset history)
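As a minimal client-side sketch using only the Python standard library, the Step 1 request can be built as follows. Percent-encoding the slug is an assumption about good practice rather than documented behavior, and the host in the usage note is a placeholder; authentication headers are not shown because the mechanism is not documented here.

```python
from urllib.parse import quote
from urllib.request import Request


def build_retry_path(wldh_slug: str) -> str:
    """Return the retry endpoint path for a dataset history slug."""
    # safe="" encodes every reserved character (including "/"),
    # so the slug always stays a single path segment.
    return f"/api/v1/general/{quote(wldh_slug, safe='')}/retry"


def build_retry_request(base_url: str, wldh_slug: str) -> Request:
    """Build (but do not send) a GET request for the retry endpoint."""
    return Request(base_url.rstrip("/") + build_retry_path(wldh_slug), method="GET")
```

For example, build_retry_request("https://api.example.com", "my-slug") targets https://api.example.com/api/v1/general/my-slug/retry; passing it to urllib.request.urlopen would send it.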
Step 2: Validate Dataset
- Description: System validates that the requested dataset exists
- Action: Find dataset history by slug in the repository
- Potential errors: Dataset not found
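The Step 2 lookup can be sketched as below. The in-memory repository is a hypothetical stand-in for WishlistDatasetHistoryRepository, and the dataset_id and group_id fields are assumptions about what the history record carries; only the "Dataset not found" message comes from this document.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


class DatasetNotFoundError(Exception):
    """Raised when no dataset history exists for the given slug (HTTP 404)."""


@dataclass
class WishlistDatasetHistory:
    slug: str
    dataset_id: int  # hypothetical field
    group_id: int    # hypothetical field


class InMemoryWishlistDatasetHistoryRepository:
    """Hypothetical stand-in for the real repository, keyed by slug."""

    def __init__(self, records: Iterable[WishlistDatasetHistory] = ()):
        self._by_slug: Dict[str, WishlistDatasetHistory] = {r.slug: r for r in records}

    def find_by_slug(self, slug: str) -> Optional[WishlistDatasetHistory]:
        # Returns None when the slug is unknown, mirroring a "find" lookup.
        return self._by_slug.get(slug)


def load_dataset_history(repo: InMemoryWishlistDatasetHistoryRepository,
                         wldh_slug: str) -> WishlistDatasetHistory:
    """Fetch the dataset history or raise DatasetNotFoundError."""
    history = repo.find_by_slug(wldh_slug)
    if history is None:
        raise DatasetNotFoundError("Dataset not found")
    return history
```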
Step 3: Validate User Permission
- Description: System validates that the user has permission to retry the dataset
- Action: Check if logged-in user belongs to the same group as the dataset
- Potential errors: Unauthorized access
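The Step 3 membership check might look like the sketch below. The LoggedInUser type and its group_ids field are hypothetical; in the documented flow the user comes from get_logged_in_user() and the group from the dataset history record. Modelling membership as a set also covers users who belong to more than one group, which is itself an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet


class UnauthorizedAccessError(Exception):
    """Raised when the user does not belong to the dataset's group (HTTP 403)."""


@dataclass(frozen=True)
class LoggedInUser:
    user_id: int
    # Hypothetical: IDs of the wishlist groups this user belongs to.
    group_ids: FrozenSet[int]


def ensure_same_group(user: LoggedInUser, dataset_group_id: int) -> None:
    """Raise UnauthorizedAccessError unless the user is in the dataset's group."""
    if dataset_group_id not in user.group_ids:
        raise UnauthorizedAccessError("Unauthorized access")
```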
Step 4: Initiate Dataset Retry
- Description: System initiates the retry process for the dataset
- Action: Call service method to retry dataset analysis
- Operations performed:
- Reset dataset processing status
- Clear previous error flags
- Queue dataset for reprocessing
- Update timestamp information
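The four Step 4 operations can be illustrated with the sketch below, applied in the order listed. In the documented flow, ProductAnalysisService delegates the status update and queueing to the JP API; here a plain list stands in for the queue, and the DatasetState fields and the "FAILED"/"PENDING" status values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class DatasetState:
    dataset_id: int
    status: str = "FAILED"  # hypothetical status value
    error_flag: bool = True
    error_message: Optional[str] = None
    updated_at: Optional[datetime] = None


def retry_dataset(state: DatasetState, queue: List[int],
                  now: Optional[datetime] = None) -> DatasetState:
    """Reset the dataset and queue it for reprocessing (illustrative only)."""
    state.status = "PENDING"       # reset processing status
    state.error_flag = False       # clear previous error flags
    state.error_message = None
    queue.append(state.dataset_id) # queue for reprocessing
    # Update timestamp information (UTC); "now" is injectable for testing.
    state.updated_at = now if now is not None else datetime.now(timezone.utc)
    return state
```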
Step 5: Return Response
- Description: System returns the result of the retry operation
- Response:
  - Success: 200 OK with a success message
  - Error: appropriate error message and status code (see Error Detail)
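A sketch of how Step 5 could map outcomes to responses follows. The status codes and messages come from the Error Detail table; the success message wording, the {"message": ...} body shape, and the run_retry callable (standing in for Steps 2 to 4) are assumptions. Server-side failures are written to a standard Python logger, mirroring the note that retry failures go to the application log.

```python
import logging
from typing import Callable, Dict, Tuple

logger = logging.getLogger("dataset_retry")


class DatasetNotFoundError(Exception): ...
class WishlistGroupNotFoundError(Exception): ...
class UnauthorizedAccessError(Exception): ...
class RetryFailedError(Exception): ...


# Status codes and messages taken from the Error Detail table.
ERROR_RESPONSES = {
    DatasetNotFoundError: (404, "Dataset not found"),
    WishlistGroupNotFoundError: (404, "Wishlist group not found"),
    UnauthorizedAccessError: (403, "Unauthorized access"),
    RetryFailedError: (500, "Failed to retry dataset"),
}


def handle_retry(wldh_slug: str,
                 run_retry: Callable[[str], None]) -> Tuple[int, Dict[str, str]]:
    """Run the retry and translate the outcome into (status, body)."""
    try:
        run_retry(wldh_slug)
    except Exception as exc:
        for exc_type, (status, message) in ERROR_RESPONSES.items():
            if isinstance(exc, exc_type):
                if status >= 500:
                    # Only server-side failures are logged with a traceback.
                    logger.exception("Dataset retry failed for %s", wldh_slug)
                return status, {"message": message}
        # Unexpected error: generic 500 carrying the exception message.
        logger.exception("Unexpected error while retrying %s", wldh_slug)
        return 500, {"message": str(exc)}
    return 200, {"message": "Dataset retry initiated"}  # hypothetical wording
```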
Database Related Tables & Fields
View: Analysis API Specification
Error Handling
- Log:
  - Dataset retry failures are logged to the application log.
- Error Detail:
| Status Code | Error Message | Description |
|---|---|---|
| 404 | "Dataset not found" | When the specified dataset slug does not exist |
| 404 | "Wishlist group not found" | When the associated wishlist group cannot be found |
| 403 | "Unauthorized access" | When the user does not belong to the same group as the dataset |
| 500 | "Failed to retry dataset" | When the retry operation fails for technical reasons |
| 500 | Generic error with exception message | When an unexpected error occurs during processing |
Additional Notes
- Dataset retry should only be used when necessary, as it may consume significant system resources
- Multiple consecutive retries may indicate underlying data issues that should be addressed
- Consider implementing a cooldown period between retry attempts for the same dataset
- The system maintains a history of retry attempts for audit and troubleshooting purposes
- Dataset retry does not modify the original dataset content; it only reprocesses the existing data
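The suggested cooldown between retries of the same dataset could be enforced with a small guard like the one below. This is a hypothetical sketch: the 10-minute default is arbitrary, the in-memory dictionary is not the system's actual retry history, and only accepted attempts are recorded.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


class RetryCooldownError(Exception):
    """Raised when a retry is requested before the cooldown has elapsed."""


@dataclass
class RetryGuard:
    cooldown: timedelta = timedelta(minutes=10)  # arbitrary illustrative default
    # Hypothetical in-memory history: slug -> timestamps of accepted retries.
    history: Dict[str, List[datetime]] = field(default_factory=dict)

    def check_and_record(self, wldh_slug: str,
                         now: Optional[datetime] = None) -> int:
        """Record a retry attempt and return the attempt count, or raise."""
        now = now if now is not None else datetime.now(timezone.utc)
        attempts = self.history.setdefault(wldh_slug, [])
        if attempts and now - attempts[-1] < self.cooldown:
            raise RetryCooldownError(f"Retry for {wldh_slug} is cooling down")
        attempts.append(now)
        return len(attempts)
```

A rising attempt count for one slug is a cheap signal for the underlying data issues mentioned above.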