Dataset Creation
Description
The Dataset Creation feature integrates with the TV Python API to run product analysis on wishlist data. It converts a wishlist's products, categories, and search queries into a dataset for machine learning analysis of product performance, market trends, and competitive positioning.
The feature connects wishlist management to analytics and provides:
- Automated dataset preparation from wishlist data
- TV Python API integration for advanced analysis
- Real-time status monitoring and progress tracking
- Comprehensive error handling and retry mechanisms
- Historical dataset tracking and management
- Training status management and scheduling
The system keeps data consistent by using transaction-safe operations and pushes real-time status updates to the client through Pusher.
Activity Diagram
---
config:
theme: base
layout: dagre
flowchart:
curve: linear
htmlLabels: true
themeVariables:
edgeLabelBackground: "transparent"
---
flowchart TB
%% Main components
UserRequest[User Request]
Database[(Database)]
subgraph Controllers
WishlistController[WishlistToGroupController]
end
subgraph Services
AnalysisService(ProductAnalysisService)
DatasetService(WishlistDatasetHistoryService)
end
subgraph ExternalAPIs
TVPythonAPI((TV Python API))
PusherService((Pusher Service))
end
subgraph Middleware
AuthMiddleware{AuthMiddleware}
end
UserRequest --- Step1[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>1</span>
<p style='margin-top: 8px'>Authentication & Wishlist Validation</p>
</div>
]
Step1 --> AuthMiddleware
AuthMiddleware --> WishlistController
WishlistController --- Step2[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>2</span>
<p style='margin-top: 8px'>Dataset Preparation & Validation</p>
</div>
]
Step2 --> AnalysisService
AnalysisService --> Database
AnalysisService --- Step3[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>3</span>
<p style='margin-top: 8px'>TV Python API Integration</p>
</div>
]
Step3 --> TVPythonAPI
TVPythonAPI --- Step4[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>4</span>
<p style='margin-top: 8px'>Dataset History & Status Tracking</p>
</div>
]
Step4 --> DatasetService
DatasetService --> Database
DatasetService --- Step5[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>5</span>
<p style='margin-top: 8px'>Real-time Monitoring & Updates</p>
</div>
]
Step5 --> PusherService
TVPythonAPI -.-> DatasetService
PusherService --- Step6[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>6</span>
<p style='margin-top: 8px'>Result Processing & Training Status</p>
</div>
]
Step6 --> WishlistController
%% Styling
style UserRequest fill:#e6f3ff,stroke:#0066cc,stroke-width:2px
style Database fill:#ffe6cc,stroke:#ff9900,stroke-width:2px
style TVPythonAPI fill:#fcd9d9,stroke:#cc3333,stroke-width:2px
style PusherService fill:#fcd9d9,stroke:#cc3333,stroke-width:2px
style Controllers fill:#e6f3ff
style Services fill:#f0f8e6
style ExternalAPIs fill:#fcd9d9
style Middleware fill:#f5f0ff
style Step1 fill:transparent,stroke:transparent,stroke-width:1px
style Step2 fill:transparent,stroke:transparent,stroke-width:1px
style Step3 fill:transparent,stroke:transparent,stroke-width:1px
style Step4 fill:transparent,stroke:transparent,stroke-width:1px
style Step5 fill:transparent,stroke:transparent,stroke-width:1px
style Step6 fill:transparent,stroke:transparent,stroke-width:1px
Detailed Dataflow Dependencies
Step-by-Step Process
Step 1: Authentication & Wishlist Validation
- Description: Validates user authentication, wishlist access, and subscription status for dataset creation
- Action: Authenticate user, validate wishlist ownership, check subscription validity, verify group membership
- Input: User credentials, wishlist slug, subscription details, group membership data
- Output: Validated user context with wishlist access and subscription confirmation
- Dependencies: User authentication service, wishlist repository, subscription validation
- External Services: Authentication provider, subscription billing system
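The checks in Step 1 can be sketched as one function that runs them in order and stops at the first failure. This is a minimal illustration, not the controller's actual code: `UserContext`, `Wishlist`, their fields, and `validate_access` are hypothetical names, and the status codes are the ones listed under Error Handling for Case 1.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Set, Tuple


@dataclass
class Wishlist:
    # Hypothetical shape: only the fields the checks need.
    slug: str
    group_id: int


@dataclass
class UserContext:
    # Hypothetical shape of an authenticated user.
    user_id: int
    group_ids: Set[int]
    subscription_expires: Optional[date]


def validate_access(user: Optional[UserContext],
                    wishlist: Optional[Wishlist],
                    today: date) -> Tuple[int, str]:
    """Return (http_status, message); 200 means every check passed."""
    if user is None:
        return 401, "Invalid authentication"
    # A missing wishlist and a wishlist outside the user's groups look the same
    # to the caller, matching the "not found or access denied" wording.
    if wishlist is None or wishlist.group_id not in user.group_ids:
        return 404, "Wishlist not found or access denied"
    if user.subscription_expires is None or user.subscription_expires < today:
        return 402, "Subscription expired or invalid"
    return 200, "OK"
```

Returning the same 404 for "missing" and "not yours" avoids revealing whether a slug exists to users outside the group.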
Step 2: Dataset Preparation & Validation
- Description: Prepares wishlist data for TV Python API integration and validates dataset requirements
- Action: Aggregate wishlist products/categories/queries, validate data completeness, check existing datasets
- Input: Wishlist data, product details, category information, search queries
- Output: Formatted dataset request with validation results and conflict checking
- Dependencies: Wishlist data services, dataset history service, data validation
- External Services: Product validation APIs, data integrity services
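As a rough sketch of the aggregation in Step 2, the function below merges a wishlist's products, categories, and queries into one payload and reports completeness problems. The payload keys and the function name `build_dataset_request` are assumptions made for illustration; the real request format is defined by the TV Python API and is not shown in this document.

```python
from typing import Dict, Iterable, List, Tuple


def build_dataset_request(wishlist_slug: str,
                          product_ids: Iterable[int],
                          categories: Iterable[str],
                          queries: Iterable[str]) -> Tuple[Dict, List[str]]:
    """Aggregate wishlist items into one payload and collect validation errors."""
    payload = {
        "wishlist": wishlist_slug,
        # Deduplicate and sort so the same wishlist always yields the same payload.
        "products": sorted(set(product_ids)),
        "categories": sorted(set(categories)),
        # Drop blank queries and surrounding whitespace.
        "queries": sorted({q.strip() for q in queries if q.strip()}),
    }
    errors = []
    if not (payload["products"] or payload["categories"] or payload["queries"]):
        errors.append("wishlist has no products, categories, or queries")
    return payload, errors
```

A caller would send the payload only when the error list is empty.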
Step 3: TV Python API Integration
- Description: Sends dataset creation request to TV Python API with proper formatting and error handling
- Action: Format data for TV Python API, send HTTP request, handle API responses, manage timeouts
- Input: Formatted dataset request, API credentials, configuration parameters
- Output: Dataset creation response with tracking ID and initial status
- Dependencies: TV Python API client, HTTP service, configuration management
- External Services: TV Python API, dataset storage service, API gateway
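The request and response handling in Step 3 might look like the sketch below. The transport is passed in as a function so the example runs without network access; in practice it would wrap an HTTP client. The `/dataset/create` path comes from the Case 1 sequence diagram, while the `dataset_id` response field, the `post` signature, and `DatasetApiError` are assumptions.

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical transport signature: post(path, body, timeout) -> (status, text)
Transport = Callable[[str, str, float], Tuple[int, str]]


class DatasetApiError(Exception):
    """Raised when the call fails or the response cannot be used."""


def create_dataset(payload: Dict, post: Transport, timeout: float = 30.0):
    """Send the payload and return the dataset ID found in the response."""
    try:
        status, text = post("/dataset/create", json.dumps(payload), timeout)
    except TimeoutError as exc:
        raise DatasetApiError("request timed out") from exc
    if status >= 400:
        raise DatasetApiError(f"API returned HTTP {status}")
    try:
        body = json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        raise DatasetApiError("response is not valid JSON") from exc
    if not isinstance(body, dict) or "dataset_id" not in body:
        raise DatasetApiError("response has no dataset_id")
    return body["dataset_id"]
```

Folding every failure mode into one exception type lets the caller map it to a single error response, such as the 503 listed for Case 1.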
Step 4: Dataset History & Status Tracking
- Description: Creates dataset history record and initializes real-time status monitoring
- Action: Create history record, initialize status tracking, setup monitoring, configure notifications
- Input: Dataset response, tracking ID, wishlist context, user preferences
- Output: Dataset history record with monitoring configuration and notification setup
- Dependencies: Dataset history service, status tracking service, notification service
- External Services: Database cluster, monitoring systems, notification queue
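To illustrate the transaction-safe history insert in Step 4, the sketch below uses an in-memory SQLite database as a stand-in for the real database, whose engine is not stated here. Column names follow the `wishlist_dataset_histories` table in the ER diagram; the default status value `0` and the function name `create_history` are assumptions.

```python
import sqlite3


def create_history(conn, wishlist_to_group_id, dataset_id, slug, status=0):
    """Insert one history row and return its ID.

    Using the connection as a context manager commits on success and rolls
    back if the insert raises, so no partial row is left behind.
    """
    with conn:
        cur = conn.execute(
            "INSERT INTO wishlist_dataset_histories "
            "(wishlist_to_group_id, dataset_id, slug, status) "
            "VALUES (?, ?, ?, ?)",
            (wishlist_to_group_id, dataset_id, slug, status),
        )
    return cur.lastrowid


# Stand-in schema for the example only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wishlist_dataset_histories ("
    "id INTEGER PRIMARY KEY, "
    "wishlist_to_group_id INTEGER NOT NULL, "
    "dataset_id INTEGER NOT NULL, "
    "slug TEXT NOT NULL, "
    "status INTEGER NOT NULL)"
)
```

Calling `create_history(conn, 7, 42, "summer")` stores the tracking ID returned by the analysis API next to the wishlist it belongs to.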
Step 5: Real-time Monitoring & Progress Updates
- Description: Monitors dataset analysis progress with real-time updates and status notifications
- Action: Poll TV Python API status, update progress, send Pusher notifications, handle status changes
- Input: Dataset tracking ID, polling configuration, notification preferences
- Output: Real-time status updates, progress notifications, completion alerts
- Dependencies: Polling service, notification service, status management
- External Services: TV Python API status endpoints, Pusher real-time service, monitoring dashboard
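The polling loop in Step 5 can be outlined as follows. This is a sketch under assumptions: the status names, the `fetch_status` and `notify` callables, and the polling limits are all hypothetical. In the real system `fetch_status` would call the TV Python API status endpoint and `notify` would publish through Pusher.

```python
import time
from typing import Callable

# Hypothetical status names; the real API's values are not documented here.
TERMINAL_STATUSES = {"completed", "failed"}


def monitor_dataset(dataset_id: int,
                    fetch_status: Callable[[int], str],
                    notify: Callable[[int, str], None],
                    interval: float = 5.0,
                    max_polls: int = 60,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until a terminal status is seen, notifying on every status change."""
    last_status = None
    for _ in range(max_polls):
        status = fetch_status(dataset_id)
        if status != last_status:
            # Notify only on changes so clients are not flooded with repeats.
            notify(dataset_id, status)
            last_status = status
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval)
    # Gave up before the analysis reached a terminal status.
    return "timeout"
```

Capping the number of polls keeps a stuck analysis from holding a worker indefinitely; the caller decides how to treat the `"timeout"` result.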
Step 6: Result Processing & Training Status Management
- Description: Processes analysis completion, updates training status, and manages result availability
- Action: Process completion status, update training configuration, manage result access, handle errors
- Input: Analysis completion data, training preferences, error information
- Output: Updated training status, result availability, error reports
- Dependencies: Training status service, result processing, error management
- External Services: Result storage service, training scheduler, error reporting systems
Database Related Tables & Fields
Database: gb_console
erDiagram
wishlist_dataset_histories {
bigint id PK
bigint wishlist_to_group_id FK
bigint dataset_id
string slug
integer status "Status of the dataset"
}
wishlist_to_groups {
bigint id PK
string name "Name of the wishlist"
string slug "Slug of the wishlist"
integer status "0: Inactive, 1: Active, 3: Canceled"
}
wishlist_dataset_histories }|--|| wishlist_to_groups : "belongs to"
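For readers who prefer code to diagrams, the two tables above can be mirrored as plain data classes. The field names and the wishlist status values (0, 1, 3) come from the diagram; the class names and the `histories_for` helper are illustrative only, and the dataset history `status` is left as a bare integer because its values are not enumerated.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class WishlistStatus(IntEnum):
    INACTIVE = 0
    ACTIVE = 1
    CANCELED = 3


@dataclass
class WishlistToGroup:
    id: int
    name: str
    slug: str
    status: WishlistStatus


@dataclass
class WishlistDatasetHistory:
    id: int
    wishlist_to_group_id: int  # foreign key to WishlistToGroup.id
    dataset_id: int
    slug: str
    status: int  # meaning of each value is not specified in the diagram


def histories_for(wishlist: WishlistToGroup,
                  histories: List[WishlistDatasetHistory]
                  ) -> List[WishlistDatasetHistory]:
    """Follow the 'belongs to' relation from a wishlist to its history rows."""
    return [h for h in histories if h.wishlist_to_group_id == wishlist.id]
```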
Case Documentation
Case 1: Create Dataset for Analysis
API: Create Dataset
Sequence Diagram
sequenceDiagram
participant Client
participant Controller as WishlistToGroupController
participant Service as ProductAnalysisService
participant Repository as WishlistToGroupRepository
participant HistoryRepo as WishlistDatasetHistoryRepository
participant TVPythonAPI as TV Python API
participant Pusher as Pusher Service
participant Database
Note over Client,Database: Create Dataset for Analysis Flow
rect rgb(255, 255, 200)
Note right of Client: Authentication Phase
Client->>Controller: POST /api/v1/wishlist-to-group/{slug}/createAnalysis
Controller->>Controller: Authenticate User
Controller->>Repository: findBySlug(slug)
Repository->>Database: SELECT wishlist
Database-->>Repository: Wishlist Record
Repository-->>Controller: Wishlist or null
end
rect rgb(200, 230, 255)
Note right of Controller: Validation Phase
Controller->>Controller: Validate Group Access
Controller->>Controller: Check Subscription Status
Controller->>Controller: Validate Dataset History
alt Validation Failed
Controller-->>Client: Error Response
end
end
rect rgb(200, 255, 255)
Note right of Controller: Dataset Preparation
Controller->>Service: createDatasetForWishlistToGroup(wishlist)
Service->>Service: Prepare Dataset Request
Service->>Service: Aggregate Wishlist Data
Service->>Service: Format for TV Python API
end
rect rgb(255, 230, 200)
Note right of Service: TV Python API Integration
Service->>TVPythonAPI: POST /dataset/create
TVPythonAPI->>TVPythonAPI: Process Dataset Request
TVPythonAPI-->>Service: Dataset Creation Response
Service->>Service: Extract Dataset ID
end
rect rgb(230, 200, 255)
Note right of Service: History & Status Tracking
Service->>HistoryRepo: create(datasetHistory)
HistoryRepo->>Database: INSERT dataset_history
Database-->>HistoryRepo: Created Record
Service->>Service: Initialize Status Monitoring
end
rect rgb(200, 255, 200)
Note right of Service: Success Response
Service-->>Controller: Success Result
Controller->>Pusher: Send Initial Notification
Pusher->>Client: Real-time Status Update
Controller-->>Client: JSON Response with Dataset ID
end
rect rgb(255, 200, 200)
Note right of Service: Error Handling
alt TV Python API Error
TVPythonAPI-->>Service: API Error Response
Service->>Service: Log Error Details
Service-->>Controller: API Error
else Database Error
Database-->>HistoryRepo: Database Error
HistoryRepo-->>Service: Creation Failed
Service-->>Controller: Database Error
else Subscription Error
Controller->>Controller: Subscription Invalid
Controller-->>Client: Subscription Error
end
Controller-->>Client: Error Response
end
Steps
- Authentication & Authorization: Validate user and wishlist access
- Subscription Validation: Check subscription status and validity
- Dataset History Check: Verify that no dataset is already being processed for the wishlist
- Data Preparation: Aggregate and format wishlist data for TV Python API
- API Integration: Send dataset creation request to TV Python API
- History Creation: Create dataset history record with tracking ID
- Status Monitoring: Initialize real-time status monitoring
- Notification Setup: Configure Pusher notifications for progress updates
Error Handling
- 401 Unauthorized: Invalid authentication
- 404 Not Found: Wishlist not found or access denied
- 402 Payment Required: Subscription expired or invalid
- 409 Conflict: Dataset already processing
- 503 Service Unavailable: TV Python API unavailable
- 500 Internal Server Error: Database or system errors
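One way to keep the status codes above consistent is a single mapping from failure type to HTTP status. The exception classes and `error_response` below are hypothetical names for illustration; only the status codes come from the list above.

```python
class AuthError(Exception):
    pass


class WishlistNotFound(Exception):
    pass


class SubscriptionInvalid(Exception):
    pass


class DatasetAlreadyProcessing(Exception):
    pass


class AnalysisApiUnavailable(Exception):
    pass


# Ordered list so more specific subclasses could be placed first if added later.
STATUS_BY_ERROR = [
    (AuthError, 401),
    (WishlistNotFound, 404),
    (SubscriptionInvalid, 402),
    (DatasetAlreadyProcessing, 409),
    (AnalysisApiUnavailable, 503),
]


def error_response(exc: Exception) -> dict:
    """Translate an exception into a response body with its HTTP status."""
    for error_type, status in STATUS_BY_ERROR:
        if isinstance(exc, error_type):
            return {"status": status, "error": str(exc) or error_type.__name__}
    # Anything unrecognized is reported as a generic server error.
    return {"status": 500, "error": "Internal Server Error"}
```

Unknown exceptions fall through to 500 without echoing their message, which avoids leaking internal details to the client.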
Case 2: Get Dataset Histories
Sequence Diagram
sequenceDiagram
participant Client
participant Controller as WishlistToGroupController
participant Service as WishlistDatasetHistoryService
participant Repository as WishlistDatasetHistoryRepository
participant Database
Note over Client,Database: Get Dataset Histories Flow
rect rgb(255, 255, 200)
Note right of Client: Authentication Phase
Client->>Controller: GET /api/v1/wishlist-to-group/{slug}/histories
Controller->>Controller: Authenticate User
Controller->>Controller: Validate Wishlist Access
end
rect rgb(200, 230, 255)
Note right of Controller: Validation Phase
Controller->>Controller: Validate Query Parameters
Controller->>Service: getWishlistHistoriesById(params, wishlist)
Service->>Service: Apply Filters and Sorting
end
rect rgb(230, 200, 255)
Note right of Service: Database Query
Service->>Repository: findWhere(conditions)
Repository->>Database: SELECT with pagination
Database-->>Repository: History Records
Repository-->>Service: Paginated Collection
end
rect rgb(200, 255, 200)
Note right of Service: Success Response
Service-->>Controller: Paginated Histories
Controller->>Controller: Transform to Resources
Controller-->>Client: JSON Response with Pagination
end
Steps
- Authentication: Validate user session and wishlist access
- Parameter Validation: Validate pagination and filter parameters
- Query Execution: Execute paginated query with proper filtering
- Data Transformation: Transform results to API resource format
- Response Generation: Return paginated results with metadata
Error Handling
- 401 Unauthorized: Invalid authentication
- 404 Not Found: Wishlist not found or access denied
- 400 Bad Request: Invalid query parameters
- 500 Internal Server Error: Database or system errors
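The parameter validation and pagination steps for this case can be sketched as below. The parameter names `page` and `per_page`, their defaults, and the shape of the `meta` block are assumptions; a `ValueError` here corresponds to the 400 Bad Request case.

```python
from typing import Dict, List, Tuple


def parse_pagination(params: Dict, max_per_page: int = 100) -> Tuple[int, int]:
    """Return (page, per_page) or raise ValueError for invalid input."""
    try:
        page = int(params.get("page", 1))
        per_page = int(params.get("per_page", 15))
    except (TypeError, ValueError):
        raise ValueError("page and per_page must be integers") from None
    if page < 1 or not 1 <= per_page <= max_per_page:
        raise ValueError("page or per_page is out of range")
    return page, per_page


def paginate(items: List, page: int, per_page: int) -> Dict:
    """Slice one page out of the items and attach pagination metadata."""
    total = len(items)
    start = (page - 1) * per_page
    return {
        "data": items[start:start + per_page],
        "meta": {
            "page": page,
            "per_page": per_page,
            "total": total,
            # Ceiling division; an empty result still reports one page.
            "last_page": max(1, -(-total // per_page)),
        },
    }
```

In the real service the slicing would happen in the database query rather than in memory; the in-memory version only shows the arithmetic.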
Case 3: Update Training Status
Sequence Diagram
sequenceDiagram
participant Client
participant Controller as WishlistToGroupController
participant Service as WishlistToGroupService
participant Repository as WishlistToGroupRepository
participant Database
Note over Client,Database: Update Training Status Flow
rect rgb(255, 255, 200)
Note right of Client: Authentication Phase
Client->>Controller: PATCH /api/v1/wishlist-to-group/{slug}/update-tranning-status
Controller->>Controller: Authenticate User
Controller->>Controller: Validate Wishlist Access
end
rect rgb(200, 230, 255)
Note right of Controller: Validation Phase
Controller->>Controller: Validate Request Data
Controller->>Controller: Check Dataset Status
Controller->>Controller: Validate Training Permissions
end
rect rgb(200, 255, 255)
Note right of Controller: Status Update
Controller->>Service: changeTranningStatus(data, wishlist)
Service->>Service: Validate Status Transition
Service->>Repository: update(wishlist, data)
Repository->>Database: UPDATE training_status
Database-->>Repository: Update Result
end
rect rgb(200, 255, 200)
Note right of Repository: Success Response
Repository-->>Service: Update Success
Service-->>Controller: Success Result
Controller-->>Client: Success Response
end
Steps
- Authentication & Authorization: Validate user and wishlist access
- Request Validation: Validate training status update data
- Dataset Status Check: Verify dataset is ready for training updates
- Status Transition: Validate and execute status transition
- Database Update: Update training status in database
- Response Generation: Return success confirmation
Error Handling
- 401 Unauthorized: Invalid authentication
- 404 Not Found: Wishlist not found or access denied
- 400 Bad Request: Invalid status transition or data
- 409 Conflict: Dataset not ready for training updates
- 500 Internal Server Error: Database or system errors
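The status transition check could be modeled as a small lookup table, as in the sketch below. The status names and the allowed transitions are invented for illustration, since the actual training states are not listed in this document; only the 400 and 409 outcomes are taken from the list above.

```python
from typing import Tuple

# Hypothetical training states and the moves allowed between them.
ALLOWED_TRANSITIONS = {
    "not_scheduled": {"scheduled"},
    "scheduled": {"training", "not_scheduled"},
    "training": {"completed", "failed"},
    "completed": {"scheduled"},
    "failed": {"scheduled"},
}


def change_training_status(current: str, requested: str,
                           dataset_ready: bool) -> Tuple[int, str]:
    """Return (http_status, resulting_status) for a requested transition."""
    if requested not in ALLOWED_TRANSITIONS:
        return 400, current  # unknown status value in the request
    if not dataset_ready:
        return 409, current  # dataset not ready for training updates
    if requested not in ALLOWED_TRANSITIONS.get(current, set()):
        return 400, current  # known status, but not reachable from here
    return 200, requested
```

Keeping the rules in one table makes it easy to review which transitions are legal without reading through branching logic.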