Scheduled Commands Documentation (English)

This documentation provides details about all scheduled commands in the system, organized by component.

System Components Overview

---
config:
  theme: base
  layout: dagre
  flowchart:
    curve: linear
    htmlLabels: true
  themeVariables:
    edgeLabelBackground: "transparent"
---
flowchart TD
    %% External Data Sources
    BigQuery[(GCP BigQuery)]
    
    %% Core Databases - Aligned horizontally
    subgraph CoreDatabases["Core Databases"]
        direction LR
        AnalyzerDB[(gb_analyzer)]
        ConsoleDB[(gb_console)]
    end
    
    %% External Services
    Crawler((PLG API))
    Dataset((TV Python API))
    OpenAI[[Open AI]]
    Notification((Notification System))

    %% Define connections with improved formatting
    BigQuery --- Step1[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>1</span>
            <p style='margin-top: 8px'>gcp:sync-*</p>
        </div>
    ]
    Step1 --> AnalyzerDB
    
    AnalyzerDB --- Step2[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #99cc66 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>2</span>
            <p style='margin-top: 8px'>localdb:sync-*</p>
        </div>
    ]
    Step2 --> AnalyzerDB
    
    ConsoleDB --- Step3[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #cc66cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>3</span>
            <p style='margin-top: 8px'>plg-api:sending-configs-to-crawler</p>
        </div>
    ]
    Step3 --> Crawler
    
    Crawler --- Step4[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #cc66cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>4</span>
            <p style='margin-top: 8px'>plg-api:sync-crawl-failed-from-crawler</p>
        </div>
    ]
    Step4 --> ConsoleDB
    
    AnalyzerDB --- Step5[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>5</span>
            <p style='margin-top: 8px'>analyzerdb:sync-crawl-success-from-analyzer</p>
        </div>
    ]
    Step5 --> ConsoleDB
    
    ConsoleDB --- Step6[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #ff9966 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>6</span>
            <p style='margin-top: 8px'>dataset:create</p>
        </div>
    ]
    Step6 --> Dataset
    
    Dataset --- Step7[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #ff9966 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>7</span>
            <p style='margin-top: 8px'>dataset:get-status</p>
        </div>
    ]
    Step7 --> ConsoleDB
    
    AnalyzerDB --- Step8[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #66cccc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>8</span>
            <p style='margin-top: 8px'>specific-viewpoint:process-batches</p>
        </div>
    ]
    Step8 --> OpenAI
    
    OpenAI --- Step9[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #66cccc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>9</span>
            <p style='margin-top: 8px'>specific-viewpoint:check-batch-status</p>
        </div>
    ]
    Step9 --> AnalyzerDB
    
    ConsoleDB --- Step10[
        <div style='text-align: center'>
            <span style='display: inline-block; background-color: #cc9966 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>10</span>
            <p style='margin-top: 8px'>notification:sale-one-month</p>
        </div>
    ]
    Step10 --> Notification

    %% Styling for components
    style BigQuery fill:#d2e3fc,stroke:#3366cc,stroke-width:2px
    style ConsoleDB fill:#e8f5e8,stroke:#2d8f2d,stroke-width:3px,stroke-dasharray: 5 5
    style AnalyzerDB fill:#e8e8f5,stroke:#4d4d99,stroke-width:3px,stroke-dasharray: 5 5
    style OpenAI fill:#fcf3d2,stroke:#cc9900,stroke-width:2px
    style Dataset fill:#fcd9d9,stroke:#cc6666,stroke-width:2px
    style Crawler fill:#fcd9f2,stroke:#cc66cc,stroke-width:2px
    style Notification fill:#d9f2f2,stroke:#66cccc,stroke-width:2px
    
    %% Subgraph styling with database theme
    style CoreDatabases fill:#f0f8ff,stroke:#4682b4,stroke-width:3px,stroke-dasharray: 10 5
    
    %% Transparent connection steps
    style Step1 fill:transparent,stroke:transparent,stroke-width:1px
    style Step2 fill:transparent,stroke:transparent,stroke-width:1px
    style Step3 fill:transparent,stroke:transparent,stroke-width:1px
    style Step4 fill:transparent,stroke:transparent,stroke-width:1px
    style Step5 fill:transparent,stroke:transparent,stroke-width:1px
    style Step6 fill:transparent,stroke:transparent,stroke-width:1px
    style Step7 fill:transparent,stroke:transparent,stroke-width:1px
    style Step8 fill:transparent,stroke:transparent,stroke-width:1px
    style Step9 fill:transparent,stroke:transparent,stroke-width:1px
    style Step10 fill:transparent,stroke:transparent,stroke-width:1px

System Architecture

The Trend Viewer backend is a Laravel-based application that integrates multiple data sources and services to provide trend analysis functionality. The system follows the repository pattern and is designed to clean, process, and analyze product and review data from various sources.

Key Components

Database Connections

  1. gb_analyzer (configured as 'mysql' in BaseModel.php)

    • Primary database for clean, processed data
    • Used by models extending BaseModel
    • Contains tables with clean data sourced from BigQuery and Crawler
    • Contains 't_' prefixed tables that are directly populated by the Crawler
  2. gb_console (configured as 'mysql_console' in BaseConsoleModel.php)

    • Business database for the TV (Trend Viewer) website
    • Used by models extending BaseConsoleModel in the Console namespace
    • Contains wishlist data, user data, and business logic data
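
The two base classes referenced above could be as simple as the following sketch (the file paths match those cited in this document; everything else is an assumption about the implementation). Any model extending one of these bases automatically reads from and writes to the corresponding database.

```php
<?php

// app/Models/BaseModel.php (sketch): base class for gb_analyzer models.
namespace App\Models;

use Illuminate\Database\Eloquent\Model;

abstract class BaseModel extends Model
{
    // gb_analyzer is exposed through the 'mysql' connection name.
    protected $connection = 'mysql';
}

// app/Models/Console/BaseConsoleModel.php (sketch): base class for gb_console models.
namespace App\Models\Console;

use Illuminate\Database\Eloquent\Model;

abstract class BaseConsoleModel extends Model
{
    // gb_console is exposed through the 'mysql_console' connection name.
    protected $connection = 'mysql_console';
}
```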

External Services Integration

  1. BigQuery

    • Source of raw data for products, reviews, rankings, etc.
    • Data is synced to gb_analyzer through scheduled commands
    • Sync status is tracked to ensure data integrity
  2. OpenAI

    • Used for specific viewpoint predictions on review sentences
    • Implements batch processing to efficiently analyze large datasets
    • Configurable prompts through AIPromptConfig model
  3. PLG API (Crawler Integration)

    • Interface for communication with the Crawler system
    • Sends summary data from gb_console to Crawler
    • Receives crawl status updates
    • Manages the creation and updating of crawl configurations (a client sketch follows this list)
  4. TV Python API (Analyzer API)

    • Interface for dataset operations
    • Creates datasets based on wishlist requirements
    • Retrieves dataset status information
    • Handles batch processing for data analysis
  5. Notification System

    • Processes data changes and threshold-based events
    • Delivers notifications to users through database and Pusher
    • Handles user preferences and group-based notification rules
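
Several of these integrations boil down to thin HTTP clients wrapped in service classes. As an illustration, a PLG API client for sending crawl configurations (Step 3) and fetching failed crawls (Step 4) might look roughly like this; the class name, endpoint paths, and payload shape are assumptions, not the actual API:

```php
<?php

namespace App\Services;

use Illuminate\Http\Client\Response;
use Illuminate\Support\Facades\Http;

// Hypothetical PLG API client; endpoint paths and payloads are illustrative only.
class PlgApiClient
{
    public function __construct(
        private readonly string $baseUrl,
        private readonly string $token,
    ) {
    }

    // Step 3: push crawl configurations built from gb_console summary data.
    public function sendConfigs(array $configs, string $mode = 'create'): Response
    {
        return Http::withToken($this->token)
            ->acceptJson()
            ->post("{$this->baseUrl}/crawl-configs", [
                'mode'    => $mode,      // 'create' or 'update'
                'configs' => $configs,
            ])
            ->throw();                   // convert 4xx/5xx responses into exceptions
    }

    // Step 4: pull crawls reported as failed since the given timestamp.
    public function fetchFailedCrawls(string $since): array
    {
        return Http::withToken($this->token)
            ->acceptJson()
            ->get("{$this->baseUrl}/crawls/failed", ['since' => $since])
            ->throw()
            ->json('data', []);
    }
}
```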

Architecture Patterns

  1. Repository Pattern

    • Each model has a corresponding repository interface and implementation
    • Base repository provides common database operations:
      • app/Contracts/Interfaces/BaseRepositoryInterface.php
      • app/Contracts/Repositories/BaseRepository.php
    • Specialized repositories implement business-specific logic
    • Repositories use interfaces to enable dependency injection and testability
  2. Service Layer

    • Services encapsulate business logic
    • Apply transformations and validations to data
    • Coordinate operations between multiple repositories
    • Handle integration with external APIs
  3. Command Pattern for Scheduled Tasks

    • Commands are organized by functional domain
    • Each command handles a specific task in the data pipeline
    • Common functionality extracted to base classes
    • Error handling and reporting built into commands
  4. Queue-based Processing

    • Jobs dispatched for asynchronous processing
    • Batched job processing for efficiency
    • Retry mechanisms for transient failures
    • Progress tracking and logging
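
For item 4, here is a sketch of how a command might chunk work into batched jobs with Laravel's Bus facade; the job class and chunk size are placeholders, and a matching job sketch appears later under Queue-based Processing:

```php
<?php

use App\Jobs\ProcessReviewChunk;   // hypothetical job, sketched later in this document
use Illuminate\Support\Facades\Bus;
use Illuminate\Support\Facades\Log;

// Placeholder IDs; in practice these would come from a repository query.
$reviewIds = range(1, 5_000);

// One job per 500-row chunk, tracked together as a named batch.
$jobs = collect($reviewIds)
    ->chunk(500)
    ->map(fn ($chunk) => new ProcessReviewChunk($chunk->values()->all()))
    ->all();

Bus::batch($jobs)
    ->name('sync-reviews')
    ->allowFailures()                                          // keep going if one chunk fails
    ->then(fn ($batch) => Log::info("Batch {$batch->id} finished"))
    ->catch(fn ($batch, $e) => Log::error($e->getMessage()))   // first failure in the batch
    ->dispatch();
```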

Key Data Models

gb_analyzer Models

  • Product: Core product information
  • Review: Product reviews
  • ReviewSentence: Individual sentences from reviews for analysis
  • CategoryRanking: Product rankings within categories
  • SearchQueryRanking: Product rankings for search queries
  • BatchJobViewpoint: Jobs for viewpoint analysis
  • ReviewSentenceSpecificViewpoint: Associations between sentences and viewpoints

gb_console Models

  • WishlistToGroup: Links wishlists to user groups
  • SummaryWishlistProduct, SummaryWishlistCategory: Summary data for crawler
  • WishlistDatasetHistory: History of dataset creation requests
  • AIPromptConfig: Configurations for OpenAI prompts
  • WishlistSpecificViewpoint: Specific viewpoints for analysis
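
The relationships between these models are not documented here; a plausible sketch of the review side on gb_analyzer (relationship names and foreign keys are guesses for illustration) might be:

```php
<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Relations\BelongsTo;
use Illuminate\Database\Eloquent\Relations\HasMany;

// Assumed relationships; the actual keys and method names may differ.
class Review extends BaseModel
{
    public function sentences(): HasMany
    {
        return $this->hasMany(ReviewSentence::class);
    }
}

class ReviewSentence extends BaseModel
{
    public function review(): BelongsTo
    {
        return $this->belongsTo(Review::class);
    }

    // Viewpoint classifications written back by the OpenAI pipeline (Step 9).
    public function specificViewpoints(): HasMany
    {
        return $this->hasMany(ReviewSentenceSpecificViewpoint::class);
    }
}
```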

Error Handling and Monitoring

  • ErrorHandleTrait: Provides consistent error handling across the system
  • Slack notification traits: Send alerts to appropriate channels
  • Job status tracking: Monitor and report on job progress
  • Logging: Detailed logging of operations and errors
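
A rough sketch of what a trait like ErrorHandleTrait could provide (the namespace, method name, and Slack wiring are assumptions):

```php
<?php

namespace App\Traits;   // assumed namespace

use Illuminate\Support\Facades\Log;
use Throwable;

// Illustrative only; the real ErrorHandleTrait may expose different methods.
trait ErrorHandleTrait
{
    protected function handleError(Throwable $e, string $context = ''): void
    {
        // Structured log entry so failures are searchable per command or job.
        Log::error("[{$context}] {$e->getMessage()}", [
            'exception' => get_class($e),
            'trace'     => $e->getTraceAsString(),
        ]);

        // The Slack notification traits mentioned above would hook in here to
        // alert the appropriate channel; report() also feeds the default
        // exception handler so nothing is silently swallowed.
        report($e);
    }
}
```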

Data Flow Summary

The system's data flow follows the numbered steps illustrated in the System Components Overview diagram:

  1. BigQuery to Analyzer DB (Step 1):

    • Command: gcp:sync-* (products, reviews, review_sentences)
    • Process: Raw data is extracted from BigQuery and imported into the Analyzer database
    • Implementation: BaseBigQuerySyncCommand fetches data which is then processed in batches by specialized jobs
    • Frequency: Every 30 minutes, plus a daily run to synchronize missed data
    • Key Feature: Efficient chunking system for handling large datasets
  2. LocalDB Internal Processing (Step 2):

    • Command: localdb:sync-* (product-category-rankings, product-search-query-rankings)
    • Process: Internal processing of 't_' prefixed tables (populated by Crawler) into clean format in Analyzer DB
    • Implementation: Transforms raw data into normalized structures for analysis
    • Frequency: Every 5 minutes
    • Key Feature: Data validation and transformation for downstream systems
  3. Console DB to Crawler (Step 3):

    • Command: plg-api:sending-configs-to-crawler
    • Process: Wishlist and summary data from Console DB sent to Crawler via PLG API
    • Implementation: SendToCrawler prepares configuration data for create/update operations
    • Frequency: Every 5 minutes
    • Key Feature: Dynamic configuration generation based on business rules
  4. Crawler Failed Status to Console DB (Step 4):

    • Command: plg-api:sync-crawl-failed-from-crawler
    • Process: Failed crawl information retrieved from Crawler API and recorded in Console DB
    • Implementation: Updates tracking records with failure details for accountability
    • Frequency: Every 30 minutes
    • Key Feature: Error tracking and diagnostics for failed crawl operations
  5. Analyzer Successful Crawls to Console DB (Step 5):

    • Command: analyzerdb:sync-crawl-success-from-analyzer
    • Process: Successful crawl data verified and status updated in Console DB
    • Implementation: Cross-database verification process for data integrity
    • Frequency: Every 5 minutes
    • Key Feature: State synchronization between operational and business databases
  6. Console DB to Dataset Creation (Step 6):

    • Command: dataset:create
    • Process: Wishlist data processed into datasets via TV Python API
    • Implementation: CreateDataset prepares product, category, and viewpoint data
    • Frequency: Every 30 minutes
    • Key Feature: Comprehensive data packaging for analysis systems
  7. Dataset Status Updates to Console DB (Step 7):

    • Command: dataset:get-status
    • Process: Dataset processing status retrieved from TV Python API and recorded in Console DB
    • Implementation: Status tracking and monitoring of dataset generation
    • Frequency: Every 5 minutes
    • Key Feature: Asynchronous job monitoring and status management
  8. Analyzer DB to OpenAI Processing (Step 8):

    • Command: specific-viewpoint:process-batches
    • Process: Review sentences and viewpoint definitions sent to OpenAI for analysis
    • Implementation: CreateBatchJobLogCommand with prompts from PromptConfigService
    • Frequency: Every 5 minutes
    • Key Feature: AI-powered classification of customer feedback
  9. OpenAI Results to Analyzer DB (Step 9):

    • Command: specific-viewpoint:check-batch-status
    • Process: Classification results retrieved from OpenAI and stored in Analyzer DB
    • Implementation: Processes batch results and creates viewpoint associations (sketched below)
    • Frequency: Every 5 minutes
    • Key Feature: Structured extraction and storage of AI analysis results
  10. Console DB to Notification System (Step 10):

    • Command: notification:sale-one-month
    • Process: Sales data changes trigger notifications through the Pusher service
    • Implementation: Threshold-based notification generation and delivery
    • Frequency: Daily
    • Key Feature: Real-time alerts for significant business metrics changes

This interconnected data flow enables a comprehensive trend analysis system that combines data from multiple sources, processes it through specialized services, and delivers valuable insights to end users.
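
As a concrete illustration of Steps 8 and 9, polling an OpenAI batch and reading its output could look roughly like the snippet below. It calls the public OpenAI Batch API directly via Laravel's Http facade; the real command may use a client library, and the config key and result parsing are assumptions:

```php
<?php

use Illuminate\Support\Facades\Http;

// Sketch of Step 9 (specific-viewpoint:check-batch-status).
// $batchId would come from a BatchJobViewpoint record created in Step 8.
$apiKey  = config('services.openai.key');   // assumed config key
$batchId = 'batch_abc123';                   // placeholder

$batch = Http::withToken($apiKey)
    ->get("https://api.openai.com/v1/batches/{$batchId}")
    ->throw()
    ->json();

if ($batch['status'] === 'completed') {
    // Completed batches expose a JSONL output file; each line holds one
    // classification result to persist as a ReviewSentenceSpecificViewpoint.
    $jsonl = Http::withToken($apiKey)
        ->get("https://api.openai.com/v1/files/{$batch['output_file_id']}/content")
        ->throw()
        ->body();

    foreach (array_filter(explode("\n", $jsonl)) as $line) {
        $result = json_decode($line, true);
        // ... map $result back to its review sentence and store the viewpoint.
    }
}
```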

Component Documentation

  • BigQuery Sync: Synchronizes data from Google BigQuery to the local database. Implements regular sync, status updates, and missed data recovery to ensure data integrity. View Documentation
  • Local DB Sync: Commands for maintaining local database consistency and processing the 't_' prefixed tables from the Crawler (localdb:sync-product-category-rankings, localdb:sync-product-search-query-rankings). Ensures data integrity and proper transformation from raw crawler data to clean application data. View Documentation
  • Crawler Integration: Commands for sending wishlist summary data to the Crawler (plg-api:sending-configs-to-crawler) and handling failed crawl synchronization (plg-api:sync-crawl-failed-from-crawler). Manages the creation and updating of crawl configurations and handles status updates. View Documentation
  • AnalyzerDB Integration: Commands for synchronizing successful crawl data from the Crawler back to gb_console (analyzerdb:sync-crawl-success-from-analyzer). Verifies and processes data from the 't_' prefixed tables and updates business data accordingly. View Documentation
  • Dataset: Commands for dataset generation (dataset:create) and status monitoring (dataset:get-status) via the TV Python API. Processes wishlist groups, product data, and category data to create analysis datasets and monitors their processing status. View Documentation
  • Specific Viewpoint: Commands for processing review sentences with OpenAI (specific-viewpoint:process-batches) and checking batch status (specific-viewpoint:check-batch-status). Uses batch processing for efficiency and configurable prompts through the AIPromptConfig model. View Documentation
  • Notification: Commands for managing and sending system notifications to users (notification:send). Handles various notification types including sales alerts and system updates. View Documentation

Frequency Overview

timeline
    title Command Schedule Timeline
    section High Frequency
        Every 5 minutes<br>(Ex. 08.00) : update bigquery-status
                                       : localdb sync-product-category-rankings
                                       : localdb sync-product-search-query-rankings
                                       : plg-api sending-configs-to-crawler --mode=create --data-type=*
                                       : plg-api sending-configs-to-crawler --mode=update --data-type=*
                                       : analyzerdb sync-crawl-success-from-analyzer --data-type=*
                                       : dataset get-status
                                       : specific-viewpoint process-batches
                                       : specific-viewpoint check-batch-status
    section Medium Frequency
        Every 30 minutes<br>(Ex. 08.30) : gcp sync-products
                                        : gcp sync-reviews
                                        : gcp sync-review_sentences
                                        : plg-api sync-crawl-failed-from-crawler --data-type=*
                                        : dataset create
    section Low Frequency
        Daily<br>(Ex. 00.00) : gcp sync-products --missed
                              : gcp sync-reviews --missed
                              : notification sale-one-month

Note on DataType values:

  • The * wildcard in --data-type=* represents the following data types:
    • SummaryProduct ('product'): Product summary data
    • SummaryProductReview ('reviews'): Product review data
    • SummaryCategory ('category_ranking_group'): Category ranking data
    • SummarySearchQuery ('sq_ranking_group'): Search query ranking data
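
These frequencies would normally be registered in Laravel's scheduler. The following Kernel excerpt is purely illustrative of how a few of the commands above could be wired up; it is not the project's actual app/Console/Kernel.php:

```php
<?php

namespace App\Console;

use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    protected function schedule(Schedule $schedule): void
    {
        // High frequency: every 5 minutes
        $schedule->command('localdb:sync-product-category-rankings')->everyFiveMinutes();
        // In practice one entry per data type; 'product' shown here as an example.
        $schedule->command('plg-api:sending-configs-to-crawler --mode=create --data-type=product')->everyFiveMinutes();
        $schedule->command('specific-viewpoint:check-batch-status')->everyFiveMinutes();

        // Medium frequency: every 30 minutes
        $schedule->command('gcp:sync-products')->everyThirtyMinutes();
        $schedule->command('dataset:create')->everyThirtyMinutes();

        // Low frequency: daily
        $schedule->command('gcp:sync-products --missed')->daily();
        $schedule->command('notification:sale-one-month')->daily();
    }
}
```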

Implementation Details

Repository Pattern Implementation

The system implements the repository pattern for data access:

  • Interfaces: Define the contract for repository operations in app/Contracts/Interfaces/

    • Base interface: BaseRepositoryInterface.php
    • Analyzer interfaces: app/Contracts/Interfaces/Analyzer/
    • Console interfaces: app/Contracts/Interfaces/Console/
  • Implementations: Provide concrete functionality in app/Contracts/Repositories/

    • Base repository: BaseRepository.php
    • Analyzer repositories: app/Contracts/Repositories/Analyzer/
    • Console repositories: app/Contracts/Repositories/Console/
  • Models: Define data structure and relationships:

    • gb_analyzer models extend app/Models/BaseModel.php
    • gb_console models extend app/Models/Console/BaseConsoleModel.php
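
A minimal sketch of one interface/repository pair following the directory layout above (the Product repository names and method are placeholders, not the real contract):

```php
<?php

// app/Contracts/Interfaces/Analyzer/ProductRepositoryInterface.php (assumed file)
namespace App\Contracts\Interfaces\Analyzer;

use Illuminate\Support\Collection;

interface ProductRepositoryInterface
{
    public function getByIds(array $ids): Collection;
}

// app/Contracts/Repositories/Analyzer/ProductRepository.php (assumed file)
namespace App\Contracts\Repositories\Analyzer;

use App\Contracts\Interfaces\Analyzer\ProductRepositoryInterface;
use App\Contracts\Repositories\BaseRepository;
use App\Models\Product;
use Illuminate\Support\Collection;

class ProductRepository extends BaseRepository implements ProductRepositoryInterface
{
    public function getByIds(array $ids): Collection
    {
        return Product::query()->whereIn('id', $ids)->get();
    }
}

// Bound in a service provider so commands and services can type-hint the interface:
// $this->app->bind(ProductRepositoryInterface::class, ProductRepository::class);
```

Binding the interface in a service provider is what keeps consumers decoupled from the concrete class and makes the pattern testable through dependency injection.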

Command Pattern for Scheduled Tasks

Commands are organized by functional domain, with each command handling a specific task in the data pipeline:

  • Base Command Classes: Provide shared functionality for related commands

    • Example: BaseBigQuerySyncCommand.php provides common BigQuery synchronization logic
  • Specialized Commands: Implement specific business logic

    • Example: SyncProduct.php, SyncReview.php for specific data types
  • Error Handling: Built-in error handling with ErrorHandleTrait

    • Consistent logging and notification across commands
    • Slack integration for alerts on critical issues
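
Put together, a specialized command might look like the following sketch. The real SyncProduct.php extends BaseBigQuerySyncCommand, whose API is not documented here, so this version builds directly on Laravel's Command and the error-handling trait sketched earlier:

```php
<?php

namespace App\Console\Commands;

use App\Traits\ErrorHandleTrait;   // assumed namespace for the trait
use Illuminate\Console\Command;
use Throwable;

// Illustrative shape only; the production command differs.
class SyncProduct extends Command
{
    use ErrorHandleTrait;

    protected $signature   = 'gcp:sync-products {--missed : Re-sync rows missed by earlier runs}';
    protected $description = 'Sync product data from BigQuery into gb_analyzer';

    public function handle(): int
    {
        try {
            // 1. Query BigQuery for rows added or updated since the last sync point.
            // 2. Chunk the result set and dispatch one job per chunk
            //    (see the batching sketch under Architecture Patterns).
            // 3. Record sync status so the daily --missed run can recover gaps.
            return self::SUCCESS;
        } catch (Throwable $e) {
            $this->handleError($e, 'gcp:sync-products');

            return self::FAILURE;
        }
    }
}
```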

Queue-based Processing

The system uses Laravel's queue system for efficient processing:

  • Job Dispatching: Commands create and dispatch jobs for asynchronous processing
  • Batched Processing: Jobs are chunked and processed in batches for efficiency
  • Retry Mechanisms: Jobs implement retry logic for handling transient failures
  • Monitoring: Job status tracking and detailed logging
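
A sketch of a queued job using these mechanisms (the class name, retry counts, and backoff values are illustrative; this is the hypothetical job referenced in the batching sketch earlier):

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Batchable;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;
use Throwable;

// Hypothetical chunk-processing job illustrating retries and monitoring.
class ProcessReviewChunk implements ShouldQueue
{
    use Batchable, Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 3;                  // retry transient failures up to 3 times
    public array $backoff = [30, 120, 600]; // wait 30s, 2m, 10m between attempts

    public function __construct(private array $reviewIds)
    {
    }

    public function handle(): void
    {
        if ($this->batch()?->cancelled()) {
            return;                         // skip work if the batch was cancelled
        }

        Log::info('Processing review chunk', ['count' => count($this->reviewIds)]);
        // ... transform and persist the chunk here.
    }

    public function failed(Throwable $e): void
    {
        // Final failure after all retries: surface it to monitoring and logs.
        Log::error('Review chunk permanently failed: ' . $e->getMessage());
    }
}
```

The $tries/$backoff pair absorbs transient failures, while failed() gives monitoring a clear signal once a chunk has failed permanently.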