Scheduled Commands Documentation (English)
This documentation provides details about all scheduled commands in the system, organized by component.
System Components Overview
```mermaid
---
config:
  theme: base
  layout: dagre
  flowchart:
    curve: linear
    htmlLabels: true
  themeVariables:
    edgeLabelBackground: "transparent"
---
flowchart TD
%% External Data Sources
BigQuery[(GCP BigQuery)]
%% Core Databases - Aligned horizontally
subgraph CoreDatabases["Core Databases"]
direction LR
AnalyzerDB[(gb_analyzer)]
ConsoleDB[(gb_console)]
end
%% External Services
Crawler((PLG API))
Dataset((TV Python API))
OpenAI[[Open AI]]
Notification((Notification System))
%% Define connections with improved formatting
BigQuery --- Step1[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>1</span>
<p style='margin-top: 8px'>gcp:sync-*</p>
</div>
]
Step1 --> AnalyzerDB
AnalyzerDB --- Step2[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #99cc66 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>2</span>
<p style='margin-top: 8px'>localdb:sync-*</p>
</div>
]
Step2 --> AnalyzerDB
ConsoleDB --- Step3[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #cc66cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>3</span>
<p style='margin-top: 8px'>plg-api:sending-configs-to-crawler</p>
</div>
]
Step3 --> Crawler
Crawler --- Step4[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #cc66cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>4</span>
<p style='margin-top: 8px'>plg-api:sync-crawl-failed-from-crawler</p>
</div>
]
Step4 --> ConsoleDB
AnalyzerDB --- Step5[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>5</span>
<p style='margin-top: 8px'>analyzerdb:sync-crawl-success-from-analyzer</p>
</div>
]
Step5 --> ConsoleDB
ConsoleDB --- Step6[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #ff9966 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>6</span>
<p style='margin-top: 8px'>dataset:create</p>
</div>
]
Step6 --> Dataset
Dataset --- Step7[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #ff9966 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>7</span>
<p style='margin-top: 8px'>dataset:get-status</p>
</div>
]
Step7 --> ConsoleDB
AnalyzerDB --- Step8[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #66cccc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>8</span>
<p style='margin-top: 8px'>specific-viewpoint:process-batches</p>
</div>
]
Step8 --> OpenAI
OpenAI --- Step9[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #66cccc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>9</span>
<p style='margin-top: 8px'>specific-viewpoint:check-batch-status</p>
</div>
]
Step9 --> AnalyzerDB
ConsoleDB --- Step10[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #cc9966 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>10</span>
<p style='margin-top: 8px'>notification:sale-one-month</p>
</div>
]
Step10 --> Notification
%% Styling for components
style BigQuery fill:#d2e3fc,stroke:#3366cc,stroke-width:2px
style ConsoleDB fill:#e8f5e8,stroke:#2d8f2d,stroke-width:3px,stroke-dasharray: 5 5
style AnalyzerDB fill:#e8e8f5,stroke:#4d4d99,stroke-width:3px,stroke-dasharray: 5 5
style OpenAI fill:#fcf3d2,stroke:#cc9900,stroke-width:2px
style Dataset fill:#fcd9d9,stroke:#cc6666,stroke-width:2px
style Crawler fill:#fcd9f2,stroke:#cc66cc,stroke-width:2px
style Notification fill:#d9f2f2,stroke:#66cccc,stroke-width:2px
%% Subgraph styling with database theme
style CoreDatabases fill:#f0f8ff,stroke:#4682b4,stroke-width:3px,stroke-dasharray: 10 5
%% Transparent connection steps
style Step1 fill:transparent,stroke:transparent,stroke-width:1px
style Step2 fill:transparent,stroke:transparent,stroke-width:1px
style Step3 fill:transparent,stroke:transparent,stroke-width:1px
style Step4 fill:transparent,stroke:transparent,stroke-width:1px
style Step5 fill:transparent,stroke:transparent,stroke-width:1px
style Step6 fill:transparent,stroke:transparent,stroke-width:1px
style Step7 fill:transparent,stroke:transparent,stroke-width:1px
style Step8 fill:transparent,stroke:transparent,stroke-width:1px
style Step9 fill:transparent,stroke:transparent,stroke-width:1px
style Step10 fill:transparent,stroke:transparent,stroke-width:1px
```
System Architecture
The Trend Viewer backend is a Laravel-based application that integrates multiple data sources and services to provide trend analysis functionality. The system follows the repository pattern and is designed to clean, process, and analyze product and review data from various sources.
Key Components
Database Connections
- `gb_analyzer` (configured as 'mysql' in `BaseModel.php`)
  - Primary database for clean, processed data
  - Used by models extending `BaseModel`
  - Contains tables with clean data sourced from BigQuery and Crawler
  - Contains 't_' prefixed tables that are directly populated by the Crawler
- `gb_console` (configured as 'mysql_console' in `BaseConsoleModel.php`)
  - Business database for the TV (Trend Viewer) website
  - Used by models extending `BaseConsoleModel` in the `Console` namespace (see the sketch after this list)
  - Contains wishlist data, user data, and business logic data
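This split is enforced at the model layer: each base class pins its Eloquent connection so everything extending it talks to the right database. A minimal sketch, assuming standard Laravel conventions (the real base classes almost certainly add shared behaviour, and each class lives in its own file in practice):

```php
<?php

// Sketch only: both base models shown together here; in the project they
// live in app/Models/BaseModel.php and app/Models/Console/BaseConsoleModel.php.

namespace App\Models {
    use Illuminate\Database\Eloquent\Model;

    abstract class BaseModel extends Model
    {
        // gb_analyzer: clean, processed data synced from BigQuery and the Crawler.
        protected $connection = 'mysql';
    }
}

namespace App\Models\Console {
    use Illuminate\Database\Eloquent\Model;

    abstract class BaseConsoleModel extends Model
    {
        // gb_console: business data for the Trend Viewer website.
        protected $connection = 'mysql_console';
    }
}
```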
External Services Integration
- BigQuery
  - Source of raw data for products, reviews, rankings, etc.
  - Data is synced to gb_analyzer through scheduled commands
  - Sync status is tracked to ensure data integrity
- OpenAI
  - Used for specific viewpoint predictions on review sentences
  - Implements batch processing to efficiently analyze large datasets
  - Configurable prompts through the `AIPromptConfig` model
- PLG API (Crawler Integration)
  - Interface for communication with the Crawler system
  - Sends summary data from gb_console to Crawler
  - Receives crawl status updates
  - Manages creation and updating of crawl configurations
- TV Python API (Analyzer API)
  - Interface for dataset operations
  - Creates datasets based on wishlist requirements
  - Retrieves dataset status information
  - Handles batch processing for data analysis
- Notification System
  - Processes data changes and threshold-based events
  - Delivers notifications to users through the database and Pusher
  - Handles user preferences and group-based notification rules
Database Structure
- `gb_analyzer` (configured as 'mysql' in `BaseModel.php`)
  - Primary database for clean, processed data
  - Contains tables with clean data sourced from BigQuery and Crawler
  - Contains 't_' prefixed tables that are directly populated by the Crawler
- `gb_console` (configured as 'mysql_console' in `BaseConsoleModel.php`)
  - Business database for the TV (Trend Viewer) website
  - Contains wishlist data, user data, and business logic data
Architecture Patterns
- Repository Pattern
  - Each model has a corresponding repository interface and implementation
  - Base repository provides common database operations: `app/Contracts/Interfaces/BaseRepositoryInterface.php` and `app/Contracts/Repositories/BaseRepository.php`
  - Specialized repositories implement business-specific logic
  - Repositories use interfaces to enable dependency injection and testability
- Service Layer
  - Services encapsulate business logic
  - Apply transformations and validations to data
  - Coordinate operations between multiple repositories
  - Handle integration with external APIs
- Command Pattern for Scheduled Tasks
  - Commands are organized by functional domain
  - Each command handles a specific task in the data pipeline
  - Common functionality is extracted to base classes
  - Error handling and reporting are built into commands
- Queue-based Processing
  - Jobs are dispatched for asynchronous processing
  - Batched job processing for efficiency
  - Retry mechanisms for transient failures
  - Progress tracking and logging
Key Data Models
gb_analyzer Models
- `Product`: Core product information
- `Review`: Product reviews
- `ReviewSentence`: Individual sentences from reviews for analysis
- `CategoryRanking`: Product rankings within categories
- `SearchQueryRanking`: Product rankings for search queries
- `BatchJobViewpoint`: Jobs for viewpoint analysis
- `ReviewSentenceSpecificViewpoint`: Associations between sentences and viewpoints
gb_console Models
- `WishlistToGroup`: Links wishlists to user groups
- `SummaryWishlistProduct`, `SummaryWishlistCategory`: Summary data for the Crawler
- `WishlistDatasetHistory`: History of dataset creation requests
- `AIPromptConfig`: Configurations for OpenAI prompts
- `WishlistSpecificViewpoint`: Specific viewpoints for analysis
Error Handling and Monitoring
- `ErrorHandleTrait`: Provides consistent error handling across the system
- Slack notification traits: Send alerts to appropriate channels
- Job status tracking: Monitor and report on job progress
- Logging: Detailed logging of operations and errors
Data Flow Summary
The system's data flow follows the numbered steps illustrated in the System Components Overview diagram:
- BigQuery to Analyzer DB (Step 1):
  - Command: `gcp:sync-*` (products, reviews, review_sentences)
  - Process: Raw data is extracted from BigQuery and imported into the Analyzer database
  - Implementation: `BaseBigQuerySyncCommand` fetches data, which is then processed in batches by specialized jobs (see the sketch below)
  - Frequency: Every 30 minutes with daily missed data synchronization
  - Key Feature: Efficient chunking system for handling large datasets
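As a rough illustration of the chunking approach, the sketch below shows how a `gcp:sync-products` style command could stream BigQuery rows and hand them to queued jobs in bounded chunks. The query, chunk size, and the `SyncProductChunkJob` class are assumptions made for the example, not the project's actual implementation.

```php
<?php

namespace App\Console\Commands\Gcp;

use App\Jobs\SyncProductChunkJob;            // hypothetical chunk job
use Google\Cloud\BigQuery\BigQueryClient;
use Illuminate\Console\Command;
use Illuminate\Support\LazyCollection;

// Shown standalone for brevity; in the real project this command extends
// BaseBigQuerySyncCommand rather than Command directly.
class SyncProduct extends Command
{
    protected $signature = 'gcp:sync-products {--missed : Re-sync rows missed by earlier runs}';

    protected $description = 'Sync product rows from BigQuery into gb_analyzer in chunks';

    // Assumes a configured BigQueryClient is bound in the service container.
    public function handle(BigQueryClient $bigQuery): int
    {
        // Illustrative query; the real command tracks sync status per table.
        $results = $bigQuery->runQuery(
            $bigQuery->query('SELECT * FROM `analytics.products` WHERE synced_at IS NULL')
        );

        // Stream rows lazily and dispatch one queued job per 500-row chunk,
        // so memory stays bounded even for very large result sets.
        LazyCollection::make(function () use ($results) {
            yield from $results->rows();
        })
            ->chunk(500)
            ->each(fn ($chunk) => SyncProductChunkJob::dispatch($chunk->all()));

        return self::SUCCESS;
    }
}
```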
- LocalDB Internal Processing (Step 2):
  - Command: `localdb:sync-*` (product-category-rankings, product-search-query-rankings)
  - Process: Internal processing of 't_' prefixed tables (populated by the Crawler) into clean format in the Analyzer DB
  - Implementation: Transforms raw data into normalized structures for analysis
  - Frequency: Every 5 minutes
  - Key Feature: Data validation and transformation for downstream systems
- Console DB to Crawler (Step 3):
  - Command: `plg-api:sending-configs-to-crawler`
  - Process: Wishlist and summary data from the Console DB is sent to the Crawler via the PLG API
  - Implementation: `SendToCrawler` prepares configuration data for create/update operations (see the sketch below)
  - Frequency: Every 5 minutes
  - Key Feature: Dynamic configuration generation based on business rules
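A simplified sketch of the sending step, assuming the PLG API accepts one JSON config payload per summary record; the endpoint path, payload shape, and class internals here are illustrative rather than the real `SendToCrawler` contract.

```php
<?php

namespace App\Services\Crawler;

use Illuminate\Support\Facades\Http;

class SendToCrawler
{
    public function __construct(private string $plgBaseUrl) {}

    /**
     * Push one crawl configuration per summary row to the PLG API.
     * $mode mirrors the command's --mode option: 'create' or 'update'.
     */
    public function send(array $summaries, string $mode = 'create'): void
    {
        foreach ($summaries as $summary) {
            Http::baseUrl($this->plgBaseUrl)
                ->acceptJson()
                ->post('/crawl-configs', [          // illustrative endpoint
                    'mode'      => $mode,
                    'data_type' => $summary['data_type'],
                    'config'    => $summary['payload'],
                ])
                ->throw();                          // surface PLG API errors to the caller
        }
    }
}
```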
- Crawler Failed Status to Console DB (Step 4):
  - Command: `plg-api:sync-crawl-failed-from-crawler`
  - Process: Failed crawl information is retrieved from the Crawler API and recorded in the Console DB
  - Implementation: Updates tracking records with failure details for accountability
  - Frequency: Every 30 minutes
  - Key Feature: Error tracking and diagnostics for failed crawl operations
- Analyzer Successful Crawls to Console DB (Step 5):
  - Command: `analyzerdb:sync-crawl-success-from-analyzer`
  - Process: Successful crawl data is verified and its status updated in the Console DB
  - Implementation: Cross-database verification process for data integrity
  - Frequency: Every 5 minutes
  - Key Feature: State synchronization between operational and business databases
- Console DB to Dataset Creation (Step 6):
  - Command: `dataset:create`
  - Process: Wishlist data is processed into datasets via the TV Python API
  - Implementation: `CreateDataset` prepares product, category, and viewpoint data
  - Frequency: Every 30 minutes
  - Key Feature: Comprehensive data packaging for analysis systems
- Dataset Status Updates to Console DB (Step 7):
  - Command: `dataset:get-status`
  - Process: Dataset processing status is retrieved from the TV Python API and recorded in the Console DB
  - Implementation: Status tracking and monitoring of dataset generation
  - Frequency: Every 5 minutes
  - Key Feature: Asynchronous job monitoring and status management
- Analyzer DB to OpenAI Processing (Step 8):
  - Command: `specific-viewpoint:process-batches`
  - Process: Review sentences and viewpoint definitions are sent to OpenAI for analysis
  - Implementation: `CreateBatchJobLogCommand` with prompts from `PromptConfigService` (see the sketch below)
  - Frequency: Every 5 minutes
  - Key Feature: AI-powered classification of customer feedback
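The batch step can be pictured roughly as below: review sentences are packed into a JSONL payload and submitted through the OpenAI Batch API (file upload, then batch creation). The class name, model choice, and payload details are assumptions; in the real system the prompt comes from `AIPromptConfig` via `PromptConfigService`, and the returned batch id would be recorded on `BatchJobViewpoint`.

```php
<?php

namespace App\Services\OpenAI;

use Illuminate\Support\Facades\Http;

class ViewpointBatchCreator
{
    public function create(array $sentences, string $prompt, string $apiKey): string
    {
        // One JSONL line per sentence, following the Batch API request format.
        $jsonl = collect($sentences)->map(fn (array $s) => json_encode([
            'custom_id' => (string) $s['id'],
            'method'    => 'POST',
            'url'       => '/v1/chat/completions',
            'body'      => [
                'model'    => 'gpt-4o-mini',   // illustrative model name
                'messages' => [
                    ['role' => 'system', 'content' => $prompt],
                    ['role' => 'user', 'content' => $s['text']],
                ],
            ],
        ]))->implode("\n");

        // Upload the JSONL input file, then create the batch that references it.
        $fileId = Http::withToken($apiKey)
            ->attach('file', $jsonl, 'viewpoints.jsonl')
            ->post('https://api.openai.com/v1/files', ['purpose' => 'batch'])
            ->json('id');

        return Http::withToken($apiKey)
            ->post('https://api.openai.com/v1/batches', [
                'input_file_id'     => $fileId,
                'endpoint'          => '/v1/chat/completions',
                'completion_window' => '24h',
            ])
            ->json('id');                       // batch id to poll later
    }
}
```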
- OpenAI Results to Analyzer DB (Step 9):
  - Command: `specific-viewpoint:check-batch-status`
  - Process: Classification results are retrieved from OpenAI and stored in the Analyzer DB
  - Implementation: Processes batch results and creates viewpoint associations (see the sketch below)
  - Frequency: Every 5 minutes
  - Key Feature: Structured extraction and storage of AI analysis results
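The companion status check might look like the sketch below: poll the batch and, once it reports `completed`, download the output file whose JSONL lines are keyed by `custom_id` (the review sentence id in the sketch above). Class and method names are illustrative; writing the results to `ReviewSentenceSpecificViewpoint` is omitted.

```php
<?php

namespace App\Services\OpenAI;

use Illuminate\Support\Facades\Http;

class ViewpointBatchChecker
{
    /** Returns decoded results when the batch is done, or null while it is still running. */
    public function check(string $batchId, string $apiKey): ?array
    {
        $batch = Http::withToken($apiKey)
            ->get("https://api.openai.com/v1/batches/{$batchId}")
            ->json();

        if (($batch['status'] ?? null) !== 'completed') {
            return null;    // still validating / in_progress / finalizing, or failed
        }

        // Each line of the output file is one JSON result keyed by custom_id.
        $content = Http::withToken($apiKey)
            ->get("https://api.openai.com/v1/files/{$batch['output_file_id']}/content")
            ->body();

        return collect(explode("\n", trim($content)))
            ->map(fn (string $line) => json_decode($line, true))
            ->all();
    }
}
```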
- Console DB to Notification System (Step 10):
  - Command: `notification:sale-one-month`
  - Process: Sales data changes trigger notifications through the Pusher service
  - Implementation: Threshold-based notification generation and delivery (see the sketch below)
  - Frequency: Daily
  - Key Feature: Real-time alerts for significant business metrics changes
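Since delivery goes through both the database and Pusher, the notification itself plausibly looks like a standard Laravel notification using the `database` and `broadcast` channels, as in this sketch; the class name and payload fields are assumptions.

```php
<?php

namespace App\Notifications;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Notifications\Messages\BroadcastMessage;
use Illuminate\Notifications\Notification;

class SaleOneMonthNotification extends Notification implements ShouldQueue
{
    use Queueable;

    public function __construct(private string $productName, private float $changeRate) {}

    public function via(object $notifiable): array
    {
        // Persist to the notifications table and broadcast via Pusher.
        return ['database', 'broadcast'];
    }

    public function toArray(object $notifiable): array
    {
        return [
            'product'     => $this->productName,
            'change_rate' => $this->changeRate,
        ];
    }

    public function toBroadcast(object $notifiable): BroadcastMessage
    {
        return new BroadcastMessage($this->toArray($notifiable));
    }
}
```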
This interconnected data flow enables a comprehensive trend analysis system that combines data from multiple sources, processes it through specialized services, and delivers valuable insights to end users.
Component Documentation
| Component | Description | Link |
|---|---|---|
| BigQuery Sync | Synchronizes data from Google BigQuery to the local database. Implements regular sync, status updates, and missed data recovery to ensure data integrity. | View Documentation |
| Local DB Sync | Commands for maintaining local database consistency and processing of 't_' prefixed tables from Crawler (localdb:sync-product-category-rankings, localdb:sync-product-search-query-rankings). Ensures data integrity and proper transformation from raw crawler data to clean application data. | View Documentation |
| Crawler Integration | Commands for sending wishlist summary data to Crawler (plg-api:sending-configs-to-crawler) and handling failed crawl synchronization (plg-api:sync-crawl-failed-from-crawler). Manages the creation and updating of crawl configurations and handles status updates. | View Documentation |
| AnalyzerDB Integration | Commands for synchronizing successful crawl data from Crawler back to gb_console (analyzerdb:sync-crawl-success-from-analyzer). Verifies and processes data from the 't_' prefixed tables and updates business data accordingly. | View Documentation |
| Dataset | Commands for dataset generation (dataset:create) and status monitoring (dataset:get-status) via TV Python API. Processes wishlist groups, product and category data to create analysis datasets and monitors their processing status. | View Documentation |
| Specific Viewpoint | Commands for processing review sentences with OpenAI (specific-viewpoint:process-batches) and checking batch status (specific-viewpoint:check-batch-status). Uses batch processing for efficiency and configurable prompts through the AIPromptConfig model. | View Documentation |
| Notification | Commands for managing and sending system notifications to users (notification:send). Handles various notification types including sales alerts and system updates. | View Documentation |
Frequency Overview
```mermaid
timeline
    title Command Schedule Timeline
    section High Frequency
        Every 5 minutes<br>(Ex. 08.00) : update bigquery-status
            : localdb sync-product-category-rankings
            : localdb sync-product-search-query-rankings
            : plg-api sending-configs-to-crawler --mode=create --data-type=*
            : plg-api sending-configs-to-crawler --mode=update --data-type=*
            : analyzerdb sync-crawl-success-from-analyzer --data-type=*
            : dataset get-status
            : specific-viewpoint process-batches
            : specific-viewpoint check-batch-status
    section Medium Frequency
        Every 30 minutes<br>(Ex. 08.30) : gcp sync-products
            : gcp sync-reviews
            : gcp sync-review_sentences
            : plg-api sync-crawl-failed-from-crawler --data-type=*
            : dataset create
    section Low Frequency
        Daily<br>(Ex. 00.00) : gcp sync-products --missed
            : gcp sync-reviews --missed
            : notification sale-one-month
```
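In Laravel these frequencies would typically be registered in the console kernel's `schedule()` method. The condensed sketch below shows representative entries only; overlap protection, queued runs, and environment guards are omitted, and the option values are copied from the timeline above where `*` stands for each data type listed in the note below.

```php
<?php

namespace App\Console;

use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    protected function schedule(Schedule $schedule): void
    {
        // High frequency: every 5 minutes
        $schedule->command('localdb:sync-product-category-rankings')->everyFiveMinutes();
        $schedule->command('plg-api:sending-configs-to-crawler --mode=create --data-type=*')->everyFiveMinutes();
        $schedule->command('specific-viewpoint:process-batches')->everyFiveMinutes();

        // Medium frequency: every 30 minutes
        $schedule->command('gcp:sync-products')->everyThirtyMinutes();
        $schedule->command('dataset:create')->everyThirtyMinutes();

        // Low frequency: daily
        $schedule->command('gcp:sync-products --missed')->dailyAt('00:00');
        $schedule->command('notification:sale-one-month')->dailyAt('00:00');
    }
}
```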
Note on DataType values:
`*` represents the following data types:
- `SummaryProduct` ('product'): Product summary data
- `SummaryProductReview` ('reviews'): Product review data
- `SummaryCategory` ('category_ranking_group'): Category ranking data
- `SummarySearchQuery` ('sq_ranking_group'): Search query ranking data
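If these mappings were modelled as a PHP backed enum, they might look like the sketch below; the enum name and placement are hypothetical, and the project may instead use class constants.

```php
<?php

namespace App\Enums;

// Hypothetical mapping of the data-type labels to their string values.
enum SummaryDataType: string
{
    case Product     = 'product';                 // SummaryProduct
    case Review      = 'reviews';                 // SummaryProductReview
    case Category    = 'category_ranking_group';  // SummaryCategory
    case SearchQuery = 'sq_ranking_group';        // SummarySearchQuery
}
```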
Implementation Details
Repository Pattern Implementation
The system implements the repository pattern for data access:
- Interfaces: Define the contract for repository operations in `app/Contracts/Interfaces/`
  - Base interface: `BaseRepositoryInterface.php`
  - Analyzer interfaces: `app/Contracts/Interfaces/Analyzer/`
  - Console interfaces: `app/Contracts/Interfaces/Console/`
- Implementations: Provide concrete functionality in `app/Contracts/Repositories/` (a binding sketch follows this list)
  - Base repository: `BaseRepository.php`
  - Analyzer repositories: `app/Contracts/Repositories/Analyzer/`
  - Console repositories: `app/Contracts/Repositories/Console/`
- Models: Define data structure and relationships
  - gb_analyzer models extend `app/Models/BaseModel.php`
  - gb_console models extend `app/Models/Console/BaseConsoleModel.php`
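Dependency injection relies on binding each interface to its concrete repository in a service provider. The sketch below shows one illustrative pairing; the provider name and the `Product` repository classes are assumptions based on the paths above, and the real provider likely binds many more pairs.

```php
<?php

namespace App\Providers;

use App\Contracts\Interfaces\Analyzer\ProductRepositoryInterface;
use App\Contracts\Repositories\Analyzer\ProductRepository;
use Illuminate\Support\ServiceProvider;

class RepositoryServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        // Services and commands type-hint the interface; the container
        // resolves it to the concrete Analyzer repository.
        $this->app->bind(
            ProductRepositoryInterface::class,
            ProductRepository::class
        );
    }
}
```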
Command Pattern for Scheduled Tasks
Commands are organized by functional domain, with each command handling a specific task in the data pipeline:
- Base Command Classes: Provide shared functionality for related commands
  - Example: `BaseBigQuerySyncCommand.php` provides common BigQuery synchronization logic (see the sketch after this list)
- Specialized Commands: Implement specific business logic
  - Example: `SyncProduct.php`, `SyncReview.php` for specific data types
- Error Handling: Built-in error handling with `ErrorHandleTrait`
  - Consistent logging and notification across commands
  - Slack integration for alerts on critical issues
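A minimal sketch of the base/specialized split, assuming a template-method style base class; the abstract method names and the error-handling details are illustrative, not the project's real API.

```php
<?php

namespace App\Console\Commands\Gcp;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\Log;
use Throwable;

abstract class BaseBigQuerySyncCommand extends Command
{
    public function handle(): int
    {
        try {
            $this->importRows($this->fetchRows());

            return self::SUCCESS;
        } catch (Throwable $e) {
            // In the real system ErrorHandleTrait would also raise a Slack alert here.
            Log::error("BigQuery sync failed for {$this->targetTable()}", ['exception' => $e]);

            return self::FAILURE;
        }
    }

    abstract protected function targetTable(): string;

    abstract protected function fetchRows(): iterable;

    abstract protected function importRows(iterable $rows): void;
}

class SyncReview extends BaseBigQuerySyncCommand
{
    protected $signature = 'gcp:sync-reviews {--missed}';

    protected $description = 'Sync reviews from BigQuery into gb_analyzer';

    protected function targetTable(): string
    {
        return 'reviews';
    }

    protected function fetchRows(): iterable
    {
        return [];   // query BigQuery for new review rows (omitted)
    }

    protected function importRows(iterable $rows): void
    {
        // upsert rows into gb_analyzer (omitted)
    }
}
```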
Queue-based Processing
The system uses Laravel's queue system for efficient processing:
- Job Dispatching: Commands create and dispatch jobs for asynchronous processing
- Batched Processing: Jobs are chunked and processed in batches for efficiency
- Retry Mechanisms: Jobs implement retry logic for handling transient failures
- Monitoring: Job status tracking and detailed logging
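Putting those pieces together, a chunk job such as the hypothetical `SyncProductChunkJob` used in the Step 1 sketch could combine batching and retries roughly like this:

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Batchable;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

// Illustrative job: payload shape and class name are assumptions.
class SyncProductChunkJob implements ShouldQueue
{
    use Batchable, Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 3;      // retry transient failures up to 3 times
    public int $backoff = 60;   // wait 60 seconds between attempts

    public function __construct(private array $rows) {}

    public function handle(): void
    {
        if ($this->batch()?->cancelled()) {
            return;             // stop early if the whole batch was cancelled
        }

        foreach ($this->rows as $row) {
            // Upsert each cleaned row into gb_analyzer (details omitted).
        }
    }
}
```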