Dataset Workflow - Overview

Introduction

Dataset Workflow is a fully automated process that builds and analyzes data starting from wishlist_to_group. This document summarizes the main workflow for dataset creation and processing.

Related Workflows

This main workflow builds on the following prerequisite processes, which are documented separately:

  • Temp Wishlist to Group → Wishlist to Group conversion workflow: Wishlist System Overview
  • Detailed wishlist management: the wishlist_products, wishlist_categories, wishlist_search_queries, wishlist_product_reviews, and summary_wishlist_* tables
  • Backend Console Commands: detailed documentation is available in the trend-viewer-backend docs
    • Dataset Commands: link
    • Crawler Integration: link
  • Processing Services: documentation in development
    • analyzer_batch: Python ML/AI processing service (docs in development)
    • spvp_batch: Python Qwen-based specific-viewpoint processing (docs in development)

Note: This overview focuses on the core dataset creation flow, from wishlist_to_group through to completed analysis results.

PLG API - Interface with the Crawler

PLG API (Playground API) is the API gateway provided by the Crawler team for TV integration:

  • Crawl Management: TV submits crawl requests via PLG API
  • Data Validation: TV checks embedding/prediction status via PLG API
  • Error Tracking: PLG API maintains logs and failure records in the crawler_v2 DB
  • Success Storage: PLG API stores successfully crawled data in the analyzer_v2 DB

PLG API is managed by the Crawler team and operates directly on the crawler_v2 and analyzer_v2 databases.
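
As a rough illustration of this interface from the TV side, the Python sketch below submits a crawl request and checks validation status. The base URL, endpoint paths, and payload fields are assumptions made for illustration; the actual PLG API contract is owned by the Crawler team.

```python
import requests

# NOTE: the base URL, endpoint paths, and payload fields below are
# hypothetical illustrations of the PLG API behavior described above,
# not the documented API contract.
PLG_BASE_URL = "https://plg-api.example.internal"

def submit_crawl_request(wishlist_group_id: int, items: list[str]) -> dict:
    """Crawl Management: TV submits a crawl request via PLG API."""
    resp = requests.post(
        f"{PLG_BASE_URL}/crawl-requests",  # hypothetical endpoint
        json={"wishlist_group_id": wishlist_group_id, "items": items},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def check_validation_status(request_id: str) -> dict:
    """Data Validation: TV checks embedding/prediction status via PLG API."""
    resp = requests.get(
        f"{PLG_BASE_URL}/crawl-requests/{request_id}/validation",  # hypothetical
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```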

Overview Diagram

---
config:
  theme: base
  layout: dagre
  flowchart:
    curve: linear
    htmlLabels: true
  themeVariables:
    edgeLabelBackground: "transparent"
---
flowchart LR
    %% User Input
    User[User Input]
    
    %% Core Systems
    TrendApp[trend-viewer-app]
    TrendAPI[trend-viewer-api]
    TrendBackend[trend-viewer-backend]
    AnalyzerBatch[analyzer_batch]
    SPVPBatch[spvp_batch]
    PLGApi[PLG API]
    
    %% Databases
    GBConsole[(gb_console DB)]
    DSAnalyzer[(ds_analyzer DB)]
    AnalyzerV2[(analyzer_v2 DB)]
    CrawlerV2[(crawler_v2 DB)]

    %% Flow with numbered steps - Steps 1 and 2 vertical
    User --- Step1[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>1</span>
            <p style='margin-top: 8px'>Create Wishlist</p>
        </div>
    ]
    Step1 --> TrendApp
    TrendApp --> TrendAPI
    TrendAPI --> GBConsole
    
    GBConsole --- Step2[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #99cc66 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>2</span>
            <p style='margin-top: 8px'>Data Crawl</p>
        </div>
    ]
    Step2 --> TrendBackend
    TrendBackend --> PLGApi
    PLGApi --> AnalyzerV2
    PLGApi --> CrawlerV2
    
    %% Steps 3, 4, 5 horizontal flow (left to right)
    TrendBackend --- Step3[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #cc6699 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>3</span>
            <p style='margin-top: 8px'>Create Dataset</p>
        </div>
    ]
    Step3 --> DSAnalyzer
    
    TrendBackend --> Step4[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #ff9900 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>4</span>
            <p style='margin-top: 8px'>Analysis</p>
        </div>
    ]
    Step4 --> AnalyzerBatch
    AnalyzerV2 --> AnalyzerBatch
    AnalyzerBatch --> DSAnalyzer
    DSAnalyzer --> SPVPBatch
    
    AnalyzerBatch --> Step5[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #cc3366 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>5</span>
            <p style='margin-top: 8px'>SPVP Process</p>
        </div>
    ]
    Step5 --> SPVPBatch
    SPVPBatch --> DSAnalyzer

    %% Style step boxes as transparent
    style Step1 fill:transparent,stroke:transparent,stroke-width:1px
    style Step2 fill:transparent,stroke:transparent,stroke-width:1px
    style Step3 fill:transparent,stroke:transparent,stroke-width:1px
    style Step4 fill:transparent,stroke:transparent,stroke-width:1px
    style Step5 fill:transparent,stroke:transparent,stroke-width:1px

Sequence Diagram

sequenceDiagram
    participant User
    participant TrendApp as trend-viewer-app
    participant TrendAPI as trend-viewer-api
    participant TrendBackend as trend-viewer-backend
    participant PLG as PLG API
    participant Analyzer as analyzer_batch
    participant SPVP as spvp_batch
    participant TVDB as TV DB
    participant CrawlerDB as Crawler DB
    participant Logger
    participant Slack
    
    Note over User,Slack: Dataset Workflow Complete Flow
    
    rect rgb(200, 255, 200)
    Note right of User: Happy Case - Phase 1: Wishlist Creation
    
    User->>TrendApp: 1. Create wishlist_to_group
    TrendApp->>TrendAPI: POST /api/v1/general/wishlist-to-group
    TrendAPI->>TrendAPI: Validate and store data
    TrendAPI->>TVDB: Store wishlist data
    Note over TVDB: gb_console.wishlist_to_groups (status: 1 | Active)<br/>gb_console.summary_wishlist_* (crawl_status: 0 | New)
    
    rect rgb(230, 200, 255)
    Note right of TrendAPI: Success Monitoring
    TrendAPI->>Logger: Log wishlist creation
    TrendAPI->>Slack: Send creation notification
    end
    end
    
    rect rgb(200, 230, 255)
    Note right of TrendBackend: Happy Case - Phase 2: Data Crawling
    
    TrendBackend->>TVDB: Query pending items
    TVDB-->>TrendBackend: Return crawl queue
    TrendBackend->>PLG: Send crawl config
    PLG->>CrawlerDB: Store crawl requests
    Note over CrawlerDB: crawler_v2.configs
    
    rect rgb(255, 255, 200)
    Note right of PLG: Optional - Multiple Products
    PLG->>PLG: Execute crawling process
    PLG->>CrawlerDB: Store successful crawled data
    Note over CrawlerDB: analyzer_v2.products<br/>analyzer_v2.reviews<br/>analyzer_v2.review_sentences
    PLG->>CrawlerDB: Update crawl status & errors
    Note over CrawlerDB: crawler_v2.configs
    end
    
    PLG-->>TrendBackend: Return crawl status
    TrendBackend->>TVDB: Update crawl_status: 0→1→2 (Success)
    Note over TVDB: gb_console.summary_wishlist_*
    end
    
    rect rgb(255, 230, 200)
    Note right of TrendBackend: Critical - Phase 3: Dataset Creation
    
    TrendBackend->>TVDB: Check wishlist readiness
    TVDB-->>TrendBackend: Return ready wishlists
    TrendBackend->>PLG: Validate embedding & prediction
    PLG->>CrawlerDB: Check data completeness
    Note over CrawlerDB: analyzer_v2.products (embedding status)
    CrawlerDB-->>PLG: Return validation status
    PLG-->>TrendBackend: Validation completed
    
    TrendBackend->>TVDB: Create dataset metadata
    Note over TVDB: ds_analyzer.datasets (status: 1 | Pending, progress: 0)
    TrendBackend->>TVDB: Create history record
    Note over TVDB: gb_console.wishlist_dataset_histories (status: 1, spvp_status: 1)
    
    TrendBackend->>Analyzer: Trigger analyzer_batch
    Note over Analyzer: Google Cloud Run job started
    end
    
    rect rgb(200, 255, 200)
    Note right of Analyzer: Happy Case - Phase 4: ML Analysis
    
    Analyzer->>TVDB: Update status to Processing
    Note over TVDB: ds_analyzer.datasets (status: 1→2 | Processing, progress: 0→25)
    
    Analyzer->>CrawlerDB: Load source data directly
    Note over CrawlerDB: analyzer_v2.products<br/>analyzer_v2.reviews<br/>analyzer_v2.review_sentences
    CrawlerDB-->>Analyzer: Return products, reviews, sentences
    
    Analyzer->>Analyzer: ML Processing
    Note over Analyzer: - K-means Clustering (progress: 25→50)<br/>- OpenAI GPT-4 Labeling (progress: 50→75)<br/>- Product Similarity Calc (progress: 75→85)
    
    Analyzer->>TVDB: Write analysis results (7 tables)
    Note over TVDB: ds_analyzer.products<br/>ds_analyzer.product_details<br/>ds_analyzer.product_similarities<br/>ds_analyzer.ai_viewpoints<br/>ds_analyzer.review_sentence_aivp<br/>ds_analyzer.reviews<br/>ds_analyzer.review_sentences (progress: 85→100)
    
    Analyzer->>TVDB: Complete analysis
    Note over TVDB: ds_analyzer.datasets (status: 2→3 | Completed, progress: 100)
    
    Analyzer-->>TrendBackend: Analysis completed notification
    TrendBackend->>TVDB: Sync dataset status
    Note over TVDB: gb_console.wishlist_dataset_histories<br/>(status: 3, spvp_status: 2 | Analyzing)
    end
    
    rect rgb(200, 230, 255)
    Note right of TrendBackend: Happy Case - Phase 5: SPVP Processing
    
    TrendBackend->>SPVP: Trigger spvp_batch
    SPVP->>TVDB: Load review sentences
    Note over TVDB: ds_analyzer.review_sentences
    SPVP->>TVDB: Load specific viewpoints & categories
    Note over TVDB: ds_analyzer.specific_viewpoints<br/>ds_analyzer.viewpoint_categories
    TVDB-->>SPVP: Return sentences, viewpoints & categories
    
    SPVP->>SPVP: Qwen mapping process
    Note over SPVP: Qwen model maps<br/>specific_viewpoints ↔ review_sentences
    
    SPVP->>TVDB: Store mapping results
    Note over TVDB: ds_analyzer.review_sentence_spvp<br/>(sentence-viewpoint mappings)
    
    SPVP->>TVDB: Update viewpoint progress
    Note over TVDB: ds_analyzer.specific_viewpoints<br/>(last_object_id updated)
    
    SPVP-->>TrendBackend: SPVP completed
    TrendBackend->>TVDB: Final status update
    Note over TVDB: gb_console.wishlist_dataset_histories<br/>(spvp_status: 2→3 | Completed)
    
    rect rgb(230, 200, 255)
    Note right of TrendBackend: Success Monitoring
    TrendBackend->>Logger: Log workflow completion
    TrendBackend->>Slack: Send success notification
    TrendBackend-->>TrendAPI: Dataset ready notification
    TrendAPI-->>TrendApp: WebSocket status update
    TrendApp-->>User: Dataset available
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of TrendBackend: Error Handling
    rect rgb(255, 230, 230)
    alt Crawl Failure
        PLG->>CrawlerDB: Log crawl errors & failures
        Note over CrawlerDB: crawler_v2.error_logs
        PLG->>TVDB: Set crawl_status = 3 (Error)
        Note over TVDB: gb_console.summary_wishlist_*
        PLG->>Logger: Log crawl error
        PLG->>Slack: Send crawl failure notification
    else Analysis Failure  
        Analyzer->>TVDB: Set status = 9 (Failed), error_code
        Note over TVDB: ds_analyzer.datasets
        Analyzer->>TVDB: Update history status = 9
        Note over TVDB: gb_console.wishlist_dataset_histories
        Analyzer->>Logger: Log analysis error
        Analyzer->>Slack: Send analysis failure notification
    else SPVP Failure
        SPVP->>TVDB: Set spvp mapping error
        Note over TVDB: ds_analyzer.review_sentence_spvp<br/>ds_analyzer.specific_viewpoints
        SPVP->>TVDB: Set spvp_status = 9 (Failed)
        Note over TVDB: gb_console.wishlist_dataset_histories
        SPVP->>Logger: Log SPVP error
        SPVP->>Slack: Send SPVP failure notification
    end
    end
    end
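
For reference, the analyzer_batch progress checkpoints from Phase 4 are collected below. The progress values and stage order restate the sequence diagram; the constant and helper function are an illustrative sketch, not code from analyzer_batch itself.

```python
# Progress checkpoints for analyzer_batch (Phase 4 above).
# Values restate the sequence diagram; the names are illustrative.
ANALYZER_STAGES = [
    ("load_source_data", 25),         # status 1→2, progress 0→25
    ("kmeans_clustering", 50),        # progress 25→50
    ("gpt4_labeling", 75),            # progress 50→75
    ("product_similarity", 85),       # progress 75→85
    ("write_results_7_tables", 100),  # progress 85→100, status 2→3
]

def current_stage(progress: int) -> str | None:
    """Return the stage that completes at the next checkpoint, or None at 100."""
    for stage, checkpoint in ANALYZER_STAGES:
        if progress < checkpoint:
            return stage
    return None  # progress == 100: dataset Completed (status 3)
```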

System Components

Frontend

  • trend-viewer-app: Vue.js frontend for the user interface

API Layer

  • trend-viewer-api: Laravel REST API layer

Backend Services

  • trend-viewer-backend: Laravel backend that runs scheduled commands
  • PLG API: Playground API provided by the Crawler team for TV integration

Processing Services

  • analyzer_batch: Python ML/AI processing service
  • spvp_batch: Python Qwen-based specific-viewpoint processing (see the sketch after this list)
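
As a rough illustration of the spvp_batch mapping step (Phase 5 in the sequence diagram), the sketch below pairs review sentences with the specific viewpoints they express. The `qwen_classify` callable is a hypothetical stand-in for the real Qwen model invocation; only the output shape, matching ds_analyzer.review_sentence_spvp, is taken from this document.

```python
from typing import Callable

# Hypothetical sketch of the spvp_batch mapping step.  `qwen_classify`
# stands in for the real Qwen model call and is not a real API.
def map_sentences_to_viewpoints(
    sentences: list[tuple[int, str]],    # (review_sentence_id, text)
    viewpoints: list[tuple[int, str]],   # (specific_viewpoint_id, label)
    qwen_classify: Callable[[str, list[str]], list[str]],
) -> list[tuple[int, int]]:
    """Return (review_sentence_id, specific_viewpoint_id) pairs, the shape
    stored in ds_analyzer.review_sentence_spvp."""
    label_to_id = {label: vp_id for vp_id, label in viewpoints}
    mappings = []
    for sentence_id, text in sentences:
        # Ask the model which viewpoint labels this sentence expresses.
        for label in qwen_classify(text, list(label_to_id)):
            if label in label_to_id:
                mappings.append((sentence_id, label_to_id[label]))
    return mappings
```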

Databases

  • gb_console DB (MySQL): TV backend - user data, wishlist lifecycle, status tracking
  • crawler_v2 DB (MySQL): managed by the Crawler - crawl requests, logs, error tracking
  • analyzer_v2 DB (MySQL): managed by the Crawler - successfully crawled marketplace data
  • ds_analyzer DB (MySQL): TV analyzer - processed analysis results from analyzer_batch

Data Flow Summary

  1. User Input → gb_console DB (wishlist_to_groups, summary_wishlist_*)
  2. Crawl Requests → TV calls PLG API → the Crawler operates on the crawler_v2 DB (a sketch of this step follows the list)
  3. Successful Crawling → PLG API stores data in the analyzer_v2 DB (products, reviews, review_sentences)
  4. Dataset Creation → ds_analyzer DB (datasets metadata)
  5. Analysis → analyzer_batch reads directly from the analyzer_v2 DB → writes to the ds_analyzer DB (7 tables)
  6. SPVP Processing → ds_analyzer DB (specific_viewpoints updates)
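
To make step 2 concrete, a scheduled backend command could select uncrawled summary rows and mark them in progress before handing them to PLG API. The sketch below assumes a summary_wishlist_products table with an id column (the document only names the summary_wishlist_* family); the crawl_status values are the ones listed under Dataset Statuses below.

```python
import pymysql  # assumed MySQL client; gb_console is a MySQL database

# Sketch of step 2: pick up uncrawled rows from one summary_wishlist_* table.
# The table and column names are assumptions for illustration.
def fetch_crawl_queue(conn) -> list[dict]:
    with conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute(
            "SELECT id FROM summary_wishlist_products "
            "WHERE crawl_status = 0"  # 0 | New: not yet crawled
        )
        return cur.fetchall()

def mark_in_progress(conn, row_id: int) -> None:
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE summary_wishlist_products "
            "SET crawl_status = 1 "   # 1 | InProgress
            "WHERE id = %s",
            (row_id,),
        )
    conn.commit()
```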

Module List

| Name | Link | Description |
| --- | --- | --- |
| Database schema changes | Database Schema | Details of the database schema changes for the dataset workflow |
| Detailed sequence diagrams | Sequence Diagrams | Detailed sequence diagrams for each workflow phase |

Dataset Statuses

Dataset Status (ds_analyzer.datasets.status)

  • 1 | Pending: dataset created, waiting for analyzer_batch
  • 2 | Processing: analyzer_batch is running
  • 3 | Completed: analyzer_batch finished
  • 9 | Failed: analyzer_batch failed

SPVP Status (gb_console.wishlist_dataset_histories.spvp_status)

  • 1 | Pending: waiting for spvp_batch
  • 2 | Analyzing: spvp_batch is running
  • 3 | Completed: spvp_batch finished
  • 9 | Failed: spvp_batch failed

Crawl Status (gb_console.summary_wishlist_*.crawl_status)

  • 0 | New: not yet crawled
  • 1 | InProgress: being crawled by the Crawler Service
  • 2 | Success: crawl succeeded
  • 3 | Error: crawl failed
  • 4 | Canceled: crawl canceled

Successful dataset: status = 3 AND spvp_status = 3
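
A minimal sketch of these status codes as Python enums, ending with the success condition above; the enum and function names are illustrative and not taken from the TV codebase.

```python
from enum import IntEnum

# Status codes exactly as listed above; the enums themselves are a sketch.
class DatasetStatus(IntEnum):   # ds_analyzer.datasets.status
    PENDING = 1
    PROCESSING = 2
    COMPLETED = 3
    FAILED = 9

class SpvpStatus(IntEnum):      # gb_console.wishlist_dataset_histories.spvp_status
    PENDING = 1
    ANALYZING = 2
    COMPLETED = 3
    FAILED = 9

class CrawlStatus(IntEnum):     # gb_console.summary_wishlist_*.crawl_status
    NEW = 0
    IN_PROGRESS = 1
    SUCCESS = 2
    ERROR = 3
    CANCELED = 4

def is_successful_dataset(status: int, spvp_status: int) -> bool:
    """Successful dataset: status = 3 AND spvp_status = 3."""
    return (status == DatasetStatus.COMPLETED
            and spvp_status == SpvpStatus.COMPLETED)
```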