Dataset Creation

Description

The Dataset Creation feature connects wishlist management to the TV Python API. It turns wishlist data (products, categories, and search queries) into datasets for machine learning analysis of product performance, market trends, and competitive positioning.

The feature provides:

  • Automated dataset preparation from wishlist data
  • TV Python API integration for advanced analysis
  • Real-time status monitoring and progress tracking
  • Comprehensive error handling and retry mechanisms
  • Historical dataset tracking and management
  • Training status management and scheduling

The system preserves data integrity through transaction-safe database operations and pushes status updates to the client in real time through Pusher.

Activity Diagram

---
config:
  theme: base
  layout: dagre
  flowchart:
    curve: linear
    htmlLabels: true
  themeVariables:
    edgeLabelBackground: "transparent"
---
flowchart TB
    %% Main components
    UserRequest[User Request]
    Database[(Database)]
    
    subgraph Controllers
        WishlistController[WishlistToGroupController]
    end
    
    subgraph Services
        AnalysisService(ProductAnalysisService)
        DatasetService(WishlistDatasetHistoryService)
    end
    
    subgraph ExternalAPIs
        TVPythonAPI((TV Python API))
        PusherService((Pusher Service))
    end
    
    subgraph Middleware
        AuthMiddleware{AuthMiddleware}
    end
    
    UserRequest --- Step1[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>1</span>
            <p style='margin-top: 8px'>Authentication & Wishlist Validation</p>
        </div>
    ]
    Step1 --> AuthMiddleware
    AuthMiddleware --> WishlistController

    WishlistController --- Step2[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>2</span>
            <p style='margin-top: 8px'>Dataset Preparation & Validation</p>
        </div>
    ]
    Step2 --> AnalysisService
    AnalysisService --> Database

    AnalysisService --- Step3[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>3</span>
            <p style='margin-top: 8px'>TV Python API Integration</p>
        </div>
    ]
    Step3 --> TVPythonAPI

    TVPythonAPI --- Step4[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>4</span>
            <p style='margin-top: 8px'>Dataset History & Status Tracking</p>
        </div>
    ]
    Step4 --> DatasetService
    DatasetService --> Database

    DatasetService --- Step5[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>5</span>
            <p style='margin-top: 8px'>Real-time Monitoring & Updates</p>
        </div>
    ]
    Step5 --> PusherService
    TVPythonAPI -.-> DatasetService

    PusherService --- Step6[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>6</span>
            <p style='margin-top: 8px'>Result Processing & Training Status</p>
        </div>
    ]
    Step6 --> WishlistController

    %% Styling
    style UserRequest fill:#e6f3ff,stroke:#0066cc,stroke-width:2px
    style Database fill:#ffe6cc,stroke:#ff9900,stroke-width:2px
    style TVPythonAPI fill:#fcd9d9,stroke:#cc3333,stroke-width:2px
    style PusherService fill:#fcd9d9,stroke:#cc3333,stroke-width:2px
    style Controllers fill:#e6f3ff
    style Services fill:#f0f8e6
    style ExternalAPIs fill:#fcd9d9
    style Middleware fill:#f5f0ff
    style Step1 fill:transparent,stroke:transparent,stroke-width:1px
    style Step2 fill:transparent,stroke:transparent,stroke-width:1px
    style Step3 fill:transparent,stroke:transparent,stroke-width:1px
    style Step4 fill:transparent,stroke:transparent,stroke-width:1px
    style Step5 fill:transparent,stroke:transparent,stroke-width:1px
    style Step6 fill:transparent,stroke:transparent,stroke-width:1px

Detailed Dataflow Dependencies

Step-by-Step Process

Step 1: Authentication & Wishlist Validation

  • Description: Validates user authentication, wishlist access, and subscription status for dataset creation
  • Action: Authenticate user, validate wishlist ownership, check subscription validity, verify group membership
  • Input: User credentials, wishlist slug, subscription details, group membership data
  • Output: Validated user context with wishlist access and subscription confirmation
  • Dependencies: User authentication service, wishlist repository, subscription validation
  • External Services: Authentication provider, subscription billing system
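
The checks in Step 1 can be sketched as a single guard function. This is a minimal illustration, not the project's implementation: the `User`, `Wishlist`, and `AccessError` types and their fields are hypothetical, and the status codes are taken from the Error Handling list of Case 1.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Optional


class AccessError(Exception):
    """Raised when a Step 1 check fails; carries the HTTP status to return."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


@dataclass
class User:  # hypothetical shape, for illustration only
    id: int
    group_ids: FrozenSet[int]                    # groups the user belongs to
    subscription_expires_at: Optional[datetime]  # None means no subscription


@dataclass
class Wishlist:  # hypothetical shape, for illustration only
    slug: str
    group_id: int


def validate_access(user: Optional[User], wishlist: Optional[Wishlist],
                    now: datetime) -> Wishlist:
    """Run the checks in order: authentication, wishlist access, subscription."""
    if user is None:
        raise AccessError(401, "Invalid authentication")
    # A missing wishlist and a foreign wishlist both map to 404,
    # so the response does not reveal whether the slug exists.
    if wishlist is None or wishlist.group_id not in user.group_ids:
        raise AccessError(404, "Wishlist not found or access denied")
    expires = user.subscription_expires_at
    if expires is None or expires <= now:
        raise AccessError(402, "Subscription expired or invalid")
    return wishlist
```

A caller would pass the current time explicitly, for example `validate_access(user, wishlist, datetime.now(timezone.utc))`, which keeps the function deterministic in tests.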

Step 2: Dataset Preparation & Validation

  • Description: Prepares wishlist data for TV Python API integration and validates dataset requirements
  • Action: Aggregate wishlist products/categories/queries, validate data completeness, check existing datasets
  • Input: Wishlist data, product details, category information, search queries
  • Output: Formatted dataset request with validation results and conflict checking
  • Dependencies: Wishlist data services, dataset history service, data validation
  • External Services: Product validation APIs, data integrity services
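
As a rough sketch of the aggregation and completeness checks in Step 2, the function below deduplicates the wishlist contents and refuses to build a request when the wishlist is empty or a dataset is already being processed. The payload keys (`wishlist_slug`, `products`, `categories`, `queries`) are assumptions; the actual TV Python API request format is not specified in this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WishlistData:  # hypothetical container for the aggregated wishlist
    slug: str
    product_ids: List[int] = field(default_factory=list)
    category_ids: List[int] = field(default_factory=list)
    queries: List[str] = field(default_factory=list)


def build_dataset_request(data: WishlistData, has_active_dataset: bool) -> dict:
    """Validate the wishlist and return a normalized request payload."""
    # Conflict check: only one dataset may be processing per wishlist.
    if has_active_dataset:
        raise ValueError("a dataset is already processing for this wishlist")

    # Deduplicate and sort so identical wishlists yield identical payloads.
    products = sorted(set(data.product_ids))
    categories = sorted(set(data.category_ids))
    queries = sorted({q.strip() for q in data.queries if q.strip()})

    # Completeness check: at least one source of data is required.
    if not (products or categories or queries):
        raise ValueError("wishlist has no products, categories, or queries")

    return {
        "wishlist_slug": data.slug,
        "products": products,
        "categories": categories,
        "queries": queries,
    }
```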

Step 3: TV Python API Integration

  • Description: Sends dataset creation request to TV Python API with proper formatting and error handling
  • Action: Format data for TV Python API, send HTTP request, handle API responses, manage timeouts
  • Input: Formatted dataset request, API credentials, configuration parameters
  • Output: Dataset creation response with tracking ID and initial status
  • Dependencies: TV Python API client, HTTP service, configuration management
  • External Services: TV Python API, dataset storage service, API gateway
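
The timeout and retry handling in Step 3 might look like the sketch below. To stay self-contained it takes the HTTP call as an injected `send` function, which stands in for the `POST /dataset/create` request. The `dataset_id` response key, the retry count, and the exponential backoff are assumptions, not documented behavior of the TV Python API.

```python
import time
from typing import Callable


class ApiUnavailable(Exception):
    """Raised when every attempt fails; a controller could map this to 503."""


def create_dataset(send: Callable[[dict], dict], payload: dict,
                   retries: int = 3, backoff: float = 0.5,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Send the dataset request, retrying transient network failures."""
    last_error = None
    for attempt in range(retries):
        try:
            response = send(payload)
            # "dataset_id" is a hypothetical key for the tracking ID.
            return int(response["dataset_id"])
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < retries - 1:
                # Exponential backoff: 0.5 s, 1 s, 2 s, ...
                sleep(backoff * 2 ** attempt)
    raise ApiUnavailable(
        f"TV Python API unreachable after {retries} attempts"
    ) from last_error
```

Injecting `send` and `sleep` keeps the retry logic testable without network access or real delays.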

Step 4: Dataset History & Status Tracking

  • Description: Creates dataset history record and initializes real-time status monitoring
  • Action: Create history record, initialize status tracking, setup monitoring, configure notifications
  • Input: Dataset response, tracking ID, wishlist context, user preferences
  • Output: Dataset history record with monitoring configuration and notification setup
  • Dependencies: Dataset history service, status tracking service, notification service
  • External Services: Database cluster, monitoring systems, notification queue

Step 5: Real-time Monitoring & Progress Updates

  • Description: Monitors dataset analysis progress with real-time updates and status notifications
  • Action: Poll TV Python API status, update progress, send Pusher notifications, handle status changes
  • Input: Dataset tracking ID, polling configuration, notification preferences
  • Output: Real-time status updates, progress notifications, completion alerts
  • Dependencies: Polling service, notification service, status management
  • External Services: TV Python API status endpoints, Pusher real-time service, monitoring dashboard
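
The polling loop of Step 5 can be illustrated as follows. The status names (`completed`, `failed`), the event name `dataset.status`, the polling interval, and the poll limit are all hypothetical. `get_status` stands in for the TV Python API status endpoint and `notify` for a Pusher trigger call.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}  # hypothetical status names


def monitor_dataset(get_status: Callable[[int], str],
                    notify: Callable[[str, dict], None],
                    dataset_id: int,
                    interval: float = 5.0,
                    max_polls: int = 120,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the dataset reaches a terminal status, notifying on changes."""
    last_status = None
    for _ in range(max_polls):
        status = get_status(dataset_id)
        # Only push a notification when the status actually changed.
        if status != last_status:
            notify("dataset.status", {"dataset_id": dataset_id, "status": status})
            last_status = status
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval)
    raise TimeoutError(f"dataset {dataset_id} did not finish in {max_polls} polls")
```

Notifying only on status changes keeps Pusher traffic proportional to the number of transitions rather than the number of polls.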

Step 6: Result Processing & Training Status Management

  • Description: Processes analysis completion, updates training status, and manages result availability
  • Action: Process completion status, update training configuration, manage result access, handle errors
  • Input: Analysis completion data, training preferences, error information
  • Output: Updated training status, result availability, error details
  • Dependencies: Training status service, result processing, error management
  • External Services: Result storage service, training scheduler, error reporting systems

Database Related Tables & Fields

Database: gb_console

erDiagram
    wishlist_dataset_histories {
        bigint id PK
        bigint wishlist_to_group_id FK
        bigint dataset_id
        string slug
        integer status "status of the dataset"
    }
    
    wishlist_to_groups {
        bigint id PK
        string name "Name of the wishlist"
        string slug "Slug of the wishlist"
        integer status "0: Inactive, 1: Active, 3: Canceled"
    }
    
    wishlist_dataset_histories }|--|| wishlist_to_groups : "belongs to"
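
To make the relationship above concrete, the sketch below recreates the two tables in an in-memory SQLite database and inserts a history row inside a transaction. SQLite and the `INTEGER`/`TEXT` column types are stand-ins chosen so the example runs anywhere; the production database engine and any additional columns or constraints are not described in this document.

```python
import sqlite3

SCHEMA = """
CREATE TABLE wishlist_to_groups (
    id     INTEGER PRIMARY KEY,
    name   TEXT,     -- name of the wishlist
    slug   TEXT,     -- slug of the wishlist
    status INTEGER   -- 0: Inactive, 1: Active, 3: Canceled
);
CREATE TABLE wishlist_dataset_histories (
    id                   INTEGER PRIMARY KEY,
    wishlist_to_group_id INTEGER REFERENCES wishlist_to_groups(id),
    dataset_id           INTEGER,
    slug                 TEXT,
    status               INTEGER  -- status of the dataset
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_history(conn: sqlite3.Connection, wishlist_id: int,
                dataset_id: int, slug: str, status: int) -> int:
    """Insert one history row; the transaction rolls back on any error."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO wishlist_dataset_histories "
            "(wishlist_to_group_id, dataset_id, slug, status) "
            "VALUES (?, ?, ?, ?)",
            (wishlist_id, dataset_id, slug, status),
        )
    return cur.lastrowid
```

With foreign keys enabled, inserting a history row for a wishlist that does not exist raises `sqlite3.IntegrityError`, which mirrors the "belongs to" constraint in the diagram.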

Case Documentation

Case 1: Create Dataset for Analysis

API: Create Dataset

Sequence Diagram

sequenceDiagram
    participant Client
    participant Controller as WishlistToGroupController
    participant Service as ProductAnalysisService
    participant Repository as WishlistToGroupRepository
    participant HistoryRepo as WishlistDatasetHistoryRepository
    participant TVPythonAPI as TV Python API
    participant Pusher as Pusher Service
    participant Database
    
    Note over Client,Database: Create Dataset for Analysis Flow
    
    rect rgb(255, 255, 200)
    Note right of Client: Authentication Phase
    Client->>Controller: POST /api/v1/wishlist-to-group/{slug}/createAnalysis
    Controller->>Controller: Authenticate User
    Controller->>Repository: findBySlug(slug)
    Repository->>Database: SELECT wishlist
    Database-->>Repository: Wishlist Record
    Repository-->>Controller: Wishlist or null
    end
    
    rect rgb(200, 230, 255)
    Note right of Controller: Validation Phase
    Controller->>Controller: Validate Group Access
    Controller->>Controller: Check Subscription Status
    Controller->>Controller: Validate Dataset History
    alt Validation Failed
        Controller-->>Client: Error Response
    end
    end
    
    rect rgb(200, 255, 255)
    Note right of Controller: Dataset Preparation
    Controller->>Service: createDatasetForWishlistToGroup(wishlist)
    Service->>Service: Prepare Dataset Request
    Service->>Service: Aggregate Wishlist Data
    Service->>Service: Format for TV Python API
    end
    
    rect rgb(255, 230, 200)
    Note right of Service: TV Python API Integration
    Service->>TVPythonAPI: POST /dataset/create
    TVPythonAPI->>TVPythonAPI: Process Dataset Request
    TVPythonAPI-->>Service: Dataset Creation Response
    Service->>Service: Extract Dataset ID
    end
    
    rect rgb(230, 200, 255)
    Note right of Service: History & Status Tracking
    Service->>HistoryRepo: create(datasetHistory)
    HistoryRepo->>Database: INSERT dataset_history
    Database-->>HistoryRepo: Created Record
    Service->>Service: Initialize Status Monitoring
    end
    
    rect rgb(200, 255, 200)
    Note right of Service: Success Response
    Service-->>Controller: Success Result
    Controller->>Pusher: Send Initial Notification
    Pusher->>Client: Real-time Status Update
    Controller-->>Client: JSON Response with Dataset ID
    end
    
    rect rgb(255, 200, 200)
    Note right of Service: Error Handling
    alt TV Python API Error
        TVPythonAPI-->>Service: API Error Response
        Service->>Service: Log Error Details
        Service-->>Controller: API Error
    else Database Error
        Database-->>HistoryRepo: Database Error
        HistoryRepo-->>Service: Creation Failed
        Service-->>Controller: Database Error
    else Subscription Error
        Controller->>Controller: Subscription Invalid
        Controller-->>Client: Subscription Error
    end
    Controller-->>Client: Error Response
    end

Steps

  1. Authentication & Authorization: Validate user and wishlist access
  2. Subscription Validation: Check subscription status and validity
  3. Dataset History Check: Verify that no dataset is currently being processed for the wishlist
  4. Data Preparation: Aggregate and format wishlist data for TV Python API
  5. API Integration: Send dataset creation request to TV Python API
  6. History Creation: Create dataset history record with tracking ID
  7. Status Monitoring: Initialize real-time status monitoring
  8. Notification Setup: Configure Pusher notifications for progress updates

Error Handling

  • 401 Unauthorized: Invalid authentication
  • 404 Not Found: Wishlist not found or access denied
  • 402 Payment Required: Subscription expired or invalid
  • 409 Conflict: Dataset already processing
  • 503 Service Unavailable: TV Python API unavailable
  • 500 Internal Server Error: Database or system errors
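
One way to read the error list above is as a mapping from the stage that failed to the status code returned. The sketch below covers the 409, 503, and 500 cases with injected callables in place of the real service and repository. The function name, the response bodies, and the `200` success code are assumptions made for illustration.

```python
from typing import Callable, Tuple


def create_analysis(has_active_dataset: bool,
                    call_api: Callable[[], int],
                    save_history: Callable[[int], None]) -> Tuple[int, dict]:
    """Run the create-dataset flow and return (HTTP status, JSON body)."""
    # 409: another dataset is still processing for this wishlist.
    if has_active_dataset:
        return 409, {"error": "Dataset already processing"}

    # 503: the TV Python API could not be reached.
    try:
        dataset_id = call_api()
    except (TimeoutError, ConnectionError):
        return 503, {"error": "TV Python API unavailable"}

    # 500: the history record could not be written.
    try:
        save_history(dataset_id)
    except Exception:
        return 500, {"error": "Database or system error"}

    return 200, {"dataset_id": dataset_id}
```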

Case 2: Get Dataset Histories

API: Get Dataset Histories

Sequence Diagram

sequenceDiagram
    participant Client
    participant Controller as WishlistToGroupController
    participant Service as WishlistDatasetHistoryService
    participant Repository as WishlistDatasetHistoryRepository
    participant Database
    
    Note over Client,Database: Get Dataset Histories Flow
    
    rect rgb(255, 255, 200)
    Note right of Client: Authentication Phase
    Client->>Controller: GET /api/v1/wishlist-to-group/{slug}/histories
    Controller->>Controller: Authenticate User
    Controller->>Controller: Validate Wishlist Access
    end
    
    rect rgb(200, 230, 255)
    Note right of Controller: Validation Phase
    Controller->>Controller: Validate Query Parameters
    Controller->>Service: getWishlistHistoriesById(params, wishlist)
    Service->>Service: Apply Filters and Sorting
    end
    
    rect rgb(230, 200, 255)
    Note right of Service: Database Query
    Service->>Repository: findWhere(conditions)
    Repository->>Database: SELECT with pagination
    Database-->>Repository: History Records
    Repository-->>Service: Paginated Collection
    end
    
    rect rgb(200, 255, 200)
    Note right of Service: Success Response
    Service-->>Controller: Paginated Histories
    Controller->>Controller: Transform to Resources
    Controller-->>Client: JSON Response with Pagination
    end

Steps

  1. Authentication: Validate user session and wishlist access
  2. Parameter Validation: Validate pagination and filter parameters
  3. Query Execution: Execute paginated query with proper filtering
  4. Data Transformation: Transform results to API resource format
  5. Response Generation: Return paginated results with metadata

Error Handling

  • 401 Unauthorized: Invalid authentication
  • 404 Not Found: Wishlist not found or access denied
  • 400 Bad Request: Invalid query parameters
  • 500 Internal Server Error: Database or system errors
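
The parameter validation and pagination steps of this case can be sketched as below. The query parameter names (`page`, `per_page`), their defaults, the upper bound, and the `meta` layout are hypothetical; the document does not list the actual parameters of the histories endpoint.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PageParams:
    page: int
    per_page: int


def parse_page_params(query: dict, max_per_page: int = 100) -> PageParams:
    """Validate pagination input; a ValueError would map to 400 Bad Request."""
    try:
        page = int(query.get("page", 1))
        per_page = int(query.get("per_page", 15))
    except (TypeError, ValueError):
        raise ValueError("page and per_page must be integers") from None
    if page < 1 or not 1 <= per_page <= max_per_page:
        raise ValueError("page or per_page out of range")
    return PageParams(page, per_page)


def paginate(items: List[dict], params: PageParams) -> dict:
    """Slice an already filtered and sorted list and attach metadata."""
    start = (params.page - 1) * params.per_page
    return {
        "data": items[start:start + params.per_page],
        "meta": {
            "page": params.page,
            "per_page": params.per_page,
            "total": len(items),
        },
    }
```

In a real service the slicing would happen in the database query rather than in memory; the in-memory version only shows the arithmetic.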

Case 3: Update Training Status

API: Update Training Status

Sequence Diagram

sequenceDiagram
    participant Client
    participant Controller as WishlistToGroupController
    participant Service as WishlistToGroupService
    participant Repository as WishlistToGroupRepository
    participant Database
    
    Note over Client,Database: Update Training Status Flow
    
    rect rgb(255, 255, 200)
    Note right of Client: Authentication Phase
    Client->>Controller: PATCH /api/v1/wishlist-to-group/{slug}/update-tranning-status
    Controller->>Controller: Authenticate User
    Controller->>Controller: Validate Wishlist Access
    end
    
    rect rgb(200, 230, 255)
    Note right of Controller: Validation Phase
    Controller->>Controller: Validate Request Data
    Controller->>Controller: Check Dataset Status
    Controller->>Controller: Validate Training Permissions
    end
    
    rect rgb(200, 255, 255)
    Note right of Controller: Status Update
    Controller->>Service: changeTranningStatus(data, wishlist)
    Service->>Service: Validate Status Transition
    Service->>Repository: update(wishlist, data)
    Repository->>Database: UPDATE training_status
    Database-->>Repository: Update Result
    end
    
    rect rgb(200, 255, 200)
    Note right of Repository: Success Response
    Repository-->>Service: Update Success
    Service-->>Controller: Success Result
    Controller-->>Client: Success Response
    end

Steps

  1. Authentication & Authorization: Validate user and wishlist access
  2. Request Validation: Validate training status update data
  3. Dataset Status Check: Verify dataset is ready for training updates
  4. Status Transition: Validate and execute status transition
  5. Database Update: Update training status in database
  6. Response Generation: Return success confirmation

Error Handling

  • 401 Unauthorized: Invalid authentication
  • 404 Not Found: Wishlist not found or access denied
  • 400 Bad Request: Invalid status transition or data
  • 409 Conflict: Dataset not ready for training updates
  • 500 Internal Server Error: Database or system errors
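
The status transition check can be modeled as a small lookup table, as in the sketch below. The training status names and the allowed transitions are invented for illustration, since this document does not enumerate them. The check order follows the Steps list: request validation, then dataset readiness, then the transition itself.

```python
from typing import Dict, Set


class Conflict(Exception):
    """Dataset not ready for training updates; maps to 409 Conflict."""


# Hypothetical statuses and transitions, for illustration only.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "not_trained": {"scheduled"},
    "scheduled": {"training", "not_trained"},
    "training": {"trained", "not_trained"},
    "trained": {"scheduled"},
}


def change_training_status(current: str, requested: str,
                           dataset_ready: bool) -> str:
    """Return the new status, or raise ValueError (400) / Conflict (409)."""
    # Request validation: both statuses must be known values.
    if current not in ALLOWED_TRANSITIONS or requested not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown training status: {current!r} -> {requested!r}")
    # Dataset status check: the dataset must be ready before any update.
    if not dataset_ready:
        raise Conflict("dataset not ready for training updates")
    # Status transition: only transitions listed in the table are accepted.
    if requested not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"invalid transition: {current} -> {requested}")
    return requested
```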