Dataset Workflow - Detailed Sequence Diagrams

Wishlist Creation Sequence

sequenceDiagram
    participant User
    participant TrendApp as trend-viewer-app
    participant TrendAPI as trend-viewer-api
    participant WishlistController as WishlistToGroupController
    participant WishlistService as WishlistToGroupService
    participant GBConsole as gb_console DB
    participant Logger
    participant Slack
    
    Note over User,Slack: Wishlist Creation Flow (Phase 1)
    
    rect rgb(200, 255, 200)
    Note right of User: Happy Case - Wishlist Registration
    
    User->>TrendApp: Input wishlist data
    TrendApp->>TrendAPI: POST /api/v1/general/wishlist-to-group
    TrendAPI->>WishlistController: Handle request
    WishlistController->>WishlistService: validateAndCreate()
    
    rect rgb(255, 255, 200)
    Note right of WishlistService: Optional - Data Validation
    WishlistService->>WishlistService: Validate input data
    WishlistService->>WishlistService: Check subscription limits
    end
    
    WishlistService->>GBConsole: Create wishlist_to_groups
    Note over GBConsole: INSERT wishlist_to_groups<br/>(status: 1 | Active)
    
    WishlistService->>GBConsole: Create summary tables
    Note over GBConsole: INSERT summary_wishlist_products<br/>(crawl_status: 0 | New)<br/>INSERT summary_wishlist_categories<br/>INSERT summary_wishlist_search_queries
    
    GBConsole-->>WishlistService: Return created IDs
    WishlistService-->>WishlistController: Return success
    WishlistController-->>TrendAPI: Return response
    
    rect rgb(230, 200, 255)
    Note right of TrendAPI: Success Monitoring
    TrendAPI->>Logger: Log wishlist creation
    TrendAPI->>Slack: Send creation notification
    end
    
    TrendAPI-->>TrendApp: Return success response
    TrendApp-->>User: Show success message
    end
    
    rect rgb(255, 200, 200)
    Note right of WishlistService: Error Handling
    rect rgb(255, 230, 230)
    alt Validation Error
        WishlistService->>Logger: Log validation error
        WishlistService->>Slack: Send validation failure
        WishlistService-->>WishlistController: Return error response
    else Database Error
        WishlistService->>Logger: Log database error
        WishlistService->>Slack: Send database failure
        WishlistService->>GBConsole: Rollback transaction
    end
    end
    end
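
The two inserts and the rollback branch in the diagram above can be sketched as a single transaction. This is a minimal illustration, not the actual WishlistToGroupService code: it uses Python's built-in sqlite3 in place of the gb_console database, the column names (name, input, wishlist_to_group_id) and the function names are hypothetical, and only the table names and the status values 1 (Active) and 0 (New) come from the diagram.

```python
import sqlite3

STATUS_ACTIVE = 1      # wishlist_to_groups.status, as shown in the diagram
CRAWL_STATUS_NEW = 0   # summary_wishlist_products.crawl_status, as shown in the diagram


def create_schema(conn):
    # Hypothetical minimal columns; the real tables will have more.
    conn.executescript(
        """
        CREATE TABLE wishlist_to_groups (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            status INTEGER NOT NULL
        );
        CREATE TABLE summary_wishlist_products (
            id INTEGER PRIMARY KEY,
            wishlist_to_group_id INTEGER NOT NULL,
            input TEXT NOT NULL,
            crawl_status INTEGER NOT NULL
        );
        """
    )


def create_wishlist(conn, name, product_inputs):
    """Insert a wishlist and its product summaries atomically."""
    # Stand-in for the validation step in the diagram.
    if not name or not product_inputs:
        raise ValueError("name and at least one product are required")
    # 'with conn' commits on success and rolls back if an exception escapes,
    # which mirrors the "Database Error -> Rollback transaction" branch.
    with conn:
        cur = conn.execute(
            "INSERT INTO wishlist_to_groups (name, status) VALUES (?, ?)",
            (name, STATUS_ACTIVE),
        )
        wishlist_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO summary_wishlist_products "
            "(wishlist_to_group_id, input, crawl_status) VALUES (?, ?, ?)",
            [(wishlist_id, item, CRAWL_STATUS_NEW) for item in product_inputs],
        )
    return wishlist_id
```

Wrapping both inserts in one transaction means a failure while writing the summary rows also discards the parent wishlist_to_groups row, so no half-created wishlist is left for the crawler to pick up.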

Data Crawling Sequence

sequenceDiagram
    participant TrendBackend as trend-viewer-backend
    participant SendToCrawler as SendToCrawlerCommand
    participant PLGApi as PLG API
    participant CrawlerDB as Crawler DB
    participant GBConsole as gb_console DB
    participant Logger
    participant Slack
    
    Note over TrendBackend,Slack: Data Crawling Flow (Phase 2)
    
    rect rgb(200, 230, 255)
    Note right of TrendBackend: Happy Case - Scheduled Crawling
    
    TrendBackend->>SendToCrawler: Execute crawl command
    SendToCrawler->>GBConsole: Query summary tables
    Note over GBConsole: WHERE crawl_status = 0 (New)
    
    GBConsole-->>SendToCrawler: Return pending items
    SendToCrawler->>GBConsole: Update crawl_status = 1 (InProgress)
    
    SendToCrawler->>PLGApi: Send crawl configurations
    PLGApi->>CrawlerDB: Store crawl requests
    Note over CrawlerDB: crawler_v2.configs
    
    rect rgb(255, 255, 200)
    Note right of PLGApi: Optional - Multiple Products Processing
    PLGApi->>PLGApi: Execute crawling process
    PLGApi->>CrawlerDB: Store successful crawled data
    Note over CrawlerDB: analyzer_v2.products<br/>analyzer_v2.reviews<br/>analyzer_v2.review_sentences
    PLGApi->>CrawlerDB: Update crawl status & errors
    Note over CrawlerDB: crawler_v2.configs (status updates)
    end
    
    SendToCrawler->>GBConsole: Update crawl_status = 2 (Success)
    
    rect rgb(230, 200, 255)
    Note right of SendToCrawler: Success Monitoring
    SendToCrawler->>Logger: Log crawl success
    SendToCrawler->>Slack: Send crawl completion notification
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of PLGApi: Error Handling
    rect rgb(255, 230, 230)
    alt Crawl Error
        PLGApi->>CrawlerDB: Log crawl errors & failures
        Note over CrawlerDB: crawler_v2.configs
        PLGApi->>Logger: Log crawl error
        PLGApi->>Slack: Send crawl failure notification
        PLGApi-->>SendToCrawler: Return error status
        SendToCrawler->>GBConsole: Update crawl_status = 3 (Error)
    else Timeout Error
        SendToCrawler->>Logger: Log timeout error
        SendToCrawler->>GBConsole: Update crawl_status = 3 (Error)
    end
    end
    end
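
The crawl_status values in the diagram above form a small state machine: 0 (New), 1 (InProgress), 2 (Success), 3 (Error). The sketch below encodes only the transitions the diagram shows. Treating Success and Error as terminal is an assumption (the diagram does not say whether failed items are retried), and the names CrawlStatus, ALLOWED and transition are hypothetical.

```python
from enum import IntEnum


class CrawlStatus(IntEnum):
    NEW = 0
    IN_PROGRESS = 1
    SUCCESS = 2
    ERROR = 3


# Transitions read off the diagram; anything else is rejected in this sketch.
# Assumption: SUCCESS and ERROR are terminal (no retry path is modeled).
ALLOWED = {
    CrawlStatus.NEW: {CrawlStatus.IN_PROGRESS},
    CrawlStatus.IN_PROGRESS: {CrawlStatus.SUCCESS, CrawlStatus.ERROR},
    CrawlStatus.SUCCESS: set(),
    CrawlStatus.ERROR: set(),
}


def transition(current, target):
    """Return the new status, or raise if the diagram shows no such change."""
    current, target = CrawlStatus(current), CrawlStatus(target)
    if target not in ALLOWED[current]:
        raise ValueError(
            f"illegal crawl_status change: {current.name} -> {target.name}"
        )
    return target
```

For example, transition(0, 1) returns CrawlStatus.IN_PROGRESS, while transition(0, 2) raises ValueError because an item cannot succeed before it has been sent to the crawler.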

Dataset Creation & Analysis Sequence

sequenceDiagram
    participant TrendBackend as trend-viewer-backend
    participant CreateDataset as CreateDatasetCommand
    participant CloudRunService as CloudRunJobService
    participant AnalyzerBatch as analyzer_batch
    participant TVDB as TV DB
    participant CrawlerDB as Crawler DB
    participant Logger
    participant Slack
    
    Note over TrendBackend,Slack: Dataset Creation & Analysis Flow (Phase 3-4)
    
    rect rgb(255, 230, 200)
    Note right of TrendBackend: Critical - Dataset Creation
    
    TrendBackend->>CreateDataset: Execute dataset:create
    CreateDataset->>TVDB: Check wishlist readiness
    Note over TVDB: gb_console.wishlist_to_groups<br/>WHERE status = 1 (Active)<br/>AND crawl_status = 2 (Success)
    
    CreateDataset->>CreateDataset: Validate conditions
    Note over CreateDataset: ✓ Crawl completed<br/>✓ Embedding ready (via PLG API)<br/>✓ Prediction ready (via PLG API)
    
    CreateDataset->>TVDB: Create dataset metadata
    Note over TVDB: ds_analyzer.datasets<br/>(status: 1 | Pending, progress: 0)
    
    CreateDataset->>TVDB: Create history record
    Note over TVDB: gb_console.wishlist_dataset_histories<br/>(status: 1, spvp_status: 1)
    
    CreateDataset->>CloudRunService: Trigger analyzer_batch
    CloudRunService->>AnalyzerBatch: Start Google Cloud Run job
    Note over AnalyzerBatch: python main.py --dataset_id={id}
    end
    
    rect rgb(200, 255, 200)
    Note right of AnalyzerBatch: Happy Case - ML Analysis Processing
    
    AnalyzerBatch->>TVDB: Update status = 2 (Processing)
    Note over TVDB: ds_analyzer.datasets (status: 1→2)
    
    AnalyzerBatch->>CrawlerDB: Load source data
    Note over CrawlerDB: analyzer_v2.products<br/>analyzer_v2.reviews<br/>analyzer_v2.review_sentences
    CrawlerDB-->>AnalyzerBatch: Return source data (progress: 0→25)
    
    AnalyzerBatch->>AnalyzerBatch: ML processing phase
    Note over AnalyzerBatch: K-means clustering (progress: 25→50)<br/>OpenAI GPT-4 labeling (progress: 50→75)<br/>Product similarity calc (progress: 75→85)
    
    AnalyzerBatch->>TVDB: Write analysis results (7 tables)
    Note over TVDB: ds_analyzer.products<br/>ds_analyzer.product_details<br/>ds_analyzer.product_similarities<br/>ds_analyzer.ai_viewpoints<br/>ds_analyzer.review_sentence_aivp<br/>ds_analyzer.reviews<br/>ds_analyzer.review_sentences<br/>(progress: 85→100)
    
    AnalyzerBatch->>TVDB: Complete analysis
    Note over TVDB: ds_analyzer.datasets<br/>(status: 3 | Completed, progress: 100)
    
    AnalyzerBatch-->>CloudRunService: Analysis completed
    CloudRunService-->>CreateDataset: Notify completion
    
    rect rgb(230, 200, 255)
    Note right of CreateDataset: Success Monitoring
    CreateDataset->>Logger: Log analysis completion
    CreateDataset->>Slack: Send analysis success notification
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of AnalyzerBatch: Error Handling
    rect rgb(255, 230, 230)
    alt Data Loading Error
        AnalyzerBatch->>TVDB: Set status = 9 (Failed), error_code = 1001
        Note over TVDB: ds_analyzer.datasets
        AnalyzerBatch->>Logger: Log data loading error
        AnalyzerBatch->>Slack: Send data loading failure
    else ML Processing Error
        AnalyzerBatch->>TVDB: Set status = 9 (Failed), error_code = 2001-2003
        Note over TVDB: ds_analyzer.datasets
        AnalyzerBatch->>Logger: Log ML processing error
        AnalyzerBatch->>Slack: Send ML processing failure
    else Data Writing Error
        AnalyzerBatch->>TVDB: Set status = 9 (Failed), error_code = 3001
        Note over TVDB: ds_analyzer.datasets
        AnalyzerBatch->>Logger: Log data writing error
        AnalyzerBatch->>Slack: Send data writing failure
        AnalyzerBatch->>TVDB: Rollback transaction
    end
    end
    end
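
The status, progress and error_code bookkeeping in the diagram above can be sketched as a staged pipeline. This is an illustration only, not the analyzer_batch implementation. The status values (1, 2, 3, 9), progress checkpoints and error codes 1001 and 3001 come from the diagram. The diagram gives only the range 2001-2003 for ML processing errors, so assigning one code to each ML step is an assumption, as are the stage names and the Dataset and run_pipeline helpers.

```python
from dataclasses import dataclass
from typing import Optional

STATUS_PENDING, STATUS_PROCESSING, STATUS_COMPLETED, STATUS_FAILED = 1, 2, 3, 9

# (stage name, progress once the stage finishes, error code if it fails)
# Assumption: 2001/2002/2003 map one-to-one onto the three ML steps.
STAGES = [
    ("load_source_data", 25, 1001),
    ("kmeans_clustering", 50, 2001),
    ("gpt4_labeling", 75, 2002),
    ("product_similarity", 85, 2003),
    ("write_results", 100, 3001),
]


@dataclass
class Dataset:
    status: int = STATUS_PENDING
    progress: int = 0
    error_code: Optional[int] = None


def run_pipeline(dataset, handlers):
    """Run each stage in order, recording progress or the failure code."""
    dataset.status = STATUS_PROCESSING
    for name, progress_after, error_code in STAGES:
        try:
            handlers[name]()  # hypothetical callable doing the real work
        except Exception:
            # Progress stays at the last completed checkpoint.
            dataset.status = STATUS_FAILED
            dataset.error_code = error_code
            return dataset
        dataset.progress = progress_after
    dataset.status = STATUS_COMPLETED
    return dataset
```

Keeping the stage table in one place makes the progress value and the error code for each step easy to audit against the diagram.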

SPVP Processing Sequence

sequenceDiagram
    participant TrendBackend as trend-viewer-backend
    participant GetStatus as GetDatasetStatusCommand
    participant UpdateSPVP as UpdateSPVPStatusCommand
    participant SPVPBatch as spvp_batch
    participant TVDB as TV DB
    participant Logger
    participant Slack
    
    Note over TrendBackend,Slack: SPVP Processing Flow (Phase 5)
    
    rect rgb(200, 230, 255)
    Note right of TrendBackend: Happy Case - Status Monitoring & SPVP Trigger
    
    TrendBackend->>GetStatus: Execute dataset:get-status
    GetStatus->>TVDB: Check dataset status
    Note over TVDB: ds_analyzer.datasets<br/>WHERE status = 3 (Completed)
    
    GetStatus->>TVDB: Update history status
    Note over TVDB: gb_console.wishlist_dataset_histories<br/>(status: 3, spvp_status: 2 | Analyzing)
    
    GetStatus->>SPVPBatch: Trigger SPVP processing
    SPVPBatch->>TVDB: Load review sentences
    Note over TVDB: ds_analyzer.review_sentences
    SPVPBatch->>TVDB: Load specific viewpoints & categories
    Note over TVDB: ds_analyzer.specific_viewpoints<br/>ds_analyzer.viewpoint_categories
    TVDB-->>SPVPBatch: Return sentences, viewpoints & categories
    
    SPVPBatch->>SPVPBatch: Qwen mapping process
    Note over SPVPBatch: Qwen model maps<br/>specific_viewpoints ↔ review_sentences
    
    SPVPBatch->>TVDB: Store mapping results
    Note over TVDB: ds_analyzer.review_sentence_spvp<br/>(sentence-viewpoint mappings)
    
    SPVPBatch->>TVDB: Update viewpoint progress
    Note over TVDB: ds_analyzer.specific_viewpoints<br/>(last_object_id updated)
    
    TrendBackend->>UpdateSPVP: Execute spvp:update-status
    UpdateSPVP->>TVDB: Check SPVP completion
    Note over TVDB: ds_analyzer.specific_viewpoints<br/>Check all viewpoints processed
    
    rect rgb(255, 255, 200)
    Note right of UpdateSPVP: Optional - Completion Check
    UpdateSPVP->>UpdateSPVP: Verify SPVP completion
    Note over UpdateSPVP: All viewpoints processed?
    end
    
    UpdateSPVP->>TVDB: Update final status
    Note over TVDB: gb_console.wishlist_dataset_histories<br/>(spvp_status: 3 | Completed)
    
    rect rgb(230, 200, 255)
    Note right of UpdateSPVP: Success Monitoring
    UpdateSPVP->>Logger: Log SPVP completion
    UpdateSPVP->>Slack: Send workflow completion notification
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of SPVPBatch: Error Handling
    rect rgb(255, 230, 230)
    alt SPVP Processing Error
        SPVPBatch->>TVDB: Set spvp mapping error
        Note over TVDB: ds_analyzer.review_sentence_spvp<br/>ds_analyzer.specific_viewpoints
        SPVPBatch->>Logger: Log SPVP error
        SPVPBatch->>Slack: Send SPVP failure notification
        UpdateSPVP->>TVDB: Set spvp_status = 9 (Failed)
        Note over TVDB: gb_console.wishlist_dataset_histories
    else SPVP Timeout
        UpdateSPVP->>Logger: Log SPVP timeout
        UpdateSPVP->>TVDB: Set spvp_status = 9 (Failed)
        Note over TVDB: gb_console.wishlist_dataset_histories
    end
    end
    end
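
The completion check and timeout branch in the diagram above can be sketched as a pure function that decides the next spvp_status. The values 2 (Analyzing), 3 (Completed) and 9 (Failed) come from the diagram. Everything else is an assumption made for illustration: the diagram does not define how last_object_id is interpreted, so this sketch treats it as the id of the last review sentence mapped for that viewpoint, and the function name, parameters and timeout handling are hypothetical.

```python
from datetime import datetime, timedelta

SPVP_ANALYZING, SPVP_COMPLETED, SPVP_FAILED = 2, 3, 9


def next_spvp_status(viewpoints, max_sentence_id, started_at, now, timeout):
    """Decide the spvp_status to write to wishlist_dataset_histories.

    viewpoints: list of dicts with a 'last_object_id' key (None if untouched).
    Assumption: a viewpoint is done once last_object_id reaches the highest
    review sentence id in the dataset. An empty list counts as complete.
    """
    all_done = all(
        vp["last_object_id"] is not None
        and vp["last_object_id"] >= max_sentence_id
        for vp in viewpoints
    )
    if all_done:
        return SPVP_COMPLETED
    if now - started_at > timeout:
        return SPVP_FAILED
    return SPVP_ANALYZING


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0)
    pending = [{"last_object_id": 10}, {"last_object_id": 7}]
    print(next_spvp_status(pending, 10, start,
                           start + timedelta(minutes=5), timedelta(hours=1)))
```

Checking completion before the timeout means a run that finishes late is still recorded as Completed rather than Failed. Whether the real command orders the checks this way is not stated in the source.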

Related Documentation

Core Workflow Components

Related System Documentation

Backend Services Documentation

  • trend-viewer-backend documentation - Complete backend documentation
    • Dataset Commands: link - Console commands referenced in the sequences
      • dataset:create - Dataset creation command (Phase 3)
      • dataset:get-status - Status monitoring command (Phase 5)
      • dataset:update-spvp-status - SPVP status update command (Phase 5)
    • Crawler Integration: link - PLG API integration details
      • SendToCrawlerCommand - Data crawling sequences (Phase 2)

Processing Services (documentation in progress)

  • analyzer_batch: Python ML/AI processing service
    • Sequence role: Phase 4 - ML analysis processing
    • Functions: K-means clustering, OpenAI GPT-4 labeling, product similarity calculation
  • spvp_batch: Python service that processes specific viewpoints using Qwen
    • Sequence role: Phase 5 - SPVP processing
    • Functions: Qwen mapping process, SPVP progress tracking