BigQuery Regular Sync
Command Signatures
php artisan gcp:sync-products [--items-per-page=] [--all] [--missed]
php artisan gcp:sync-reviews [--items-per-page=] [--all] [--missed]
php artisan gcp:sync-review-sentences [--items-per-page=] [--all] [--missed]
Purpose
These commands synchronize product, review, and review-sentence data from BigQuery into the local database. They run on a regular schedule so that the local database stays current with BigQuery for analysis and presentation.
Sequence Diagrams
Products Sync
sequenceDiagram
participant System
participant Products as gcp:sync-products
participant BigQuery
participant ProductsTable as products table
participant ProductDetailsTable as product_details table
participant Redis
Note over System,Redis: Products Sync Flow
rect rgb(191, 223, 255)
Note right of System: Every 30 Minutes
System->>Products: Execute
Products->>BigQuery: Fetch Products Data (recent records)
BigQuery-->>Products: Return Products & Product Details Data
Products->>ProductsTable: Insert/Update Products Records
Products->>ProductDetailsTable: Insert/Update Product Details Records
Products->>Redis: Store Product IDs for Status Update
end
Reviews Sync
sequenceDiagram
participant System
participant Reviews as gcp:sync-reviews
participant BigQuery
participant ReviewsTable as reviews table
participant Redis
Note over System,Redis: Reviews Sync Flow
rect rgb(191, 223, 255)
Note right of System: Every 30 Minutes
System->>Reviews: Execute
Reviews->>BigQuery: Fetch Reviews Data (recent records)
BigQuery-->>Reviews: Return Reviews Data
Reviews->>ReviewsTable: Insert/Update Reviews Records
Reviews->>Redis: Store Review IDs for Status Update
end
Review Sentences Sync
sequenceDiagram
participant System
participant Sentences as gcp:sync-review-sentences
participant BigQuery
participant ReviewSentencesTable as review_sentences table
participant Redis
Note over System,Redis: Review Sentences Sync Flow
rect rgb(191, 223, 255)
Note right of System: Every 30 Minutes
System->>Sentences: Execute
Sentences->>BigQuery: Fetch Review Sentences Data (recent records)
BigQuery-->>Sentences: Return Review Sentences Data
Sentences->>ReviewSentencesTable: Insert/Update Review Sentences Records
Sentences->>Redis: Store Sentence IDs for Status Update
end
Implementation Details
Command Structure
All commands inherit from BaseBigQuerySyncCommand, which provides the following (a sketch follows the list):
- Query building with conditions for recent records
- Chunking of data for batch processing
- Error handling and logging
- Redis tracking for status updates
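A minimal sketch of the shape such a base command could take, assuming the Laravel console, queue, and Redis facilities listed under Dependencies; everything except the class name BaseBigQuerySyncCommand is illustrative rather than the actual implementation:
<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Redis;

abstract class BaseBigQuerySyncCommand extends Command
{
    // Each concrete command supplies its own query, batch job, and Redis tracking key.
    abstract protected function fetchRows(bool $all, bool $missed): array;
    abstract protected function dispatchChunk(array $rows): void;
    abstract protected function trackingKey(): string;

    public function handle(): int
    {
        $perPage = (int) ($this->option('items-per-page') ?: 500);

        try {
            // Recent records by default; --all drops the time constraint, --missed targets null-status rows.
            $rows = $this->fetchRows((bool) $this->option('all'), (bool) $this->option('missed'));

            // Chunk the result set and hand each batch to a queue job.
            foreach (array_chunk($rows, $perPage) as $chunk) {
                $this->dispatchChunk($chunk);

                // Remember processed IDs so the status update command can flag them in BigQuery later.
                Redis::sadd($this->trackingKey(), ...array_column($chunk, 'id'));
            }
        } catch (\Throwable $e) {
            Log::error(static::class . ' sync failed', ['exception' => $e]);

            return self::FAILURE;
        }

        return self::SUCCESS;
    }
}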
Parameters
- --items-per-page: Number of records to process per batch (default: 500)
- --all: Process all records without time constraints
- --missed: Process only records with null status (see the signature sketch after this list)
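These options would normally be declared in each command's $signature string; a standalone illustration (class name and option descriptions are hypothetical, and the real commands extend BaseBigQuerySyncCommand):
<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;

class SyncProductsExample extends Command
{
    protected $signature = 'gcp:sync-products
        {--items-per-page=500 : Number of records to process per batch}
        {--all : Process all records without time constraints}
        {--missed : Process only records whose status is still null}';

    protected $description = 'Sync product data from BigQuery into the local database';

    public function handle(): int
    {
        return self::SUCCESS;
    }
}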
Frequency
Every 30 minutes
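In a standard Laravel application this cadence is registered in the console scheduler; a sketch using the stock everyThirtyMinutes() helper (withoutOverlapping() is an assumed precaution, not confirmed for this codebase):
<?php

namespace App\Console;

use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    protected function schedule(Schedule $schedule): void
    {
        // Each sync command runs every 30 minutes; withoutOverlapping() keeps a slow
        // run from stacking with the next one.
        $schedule->command('gcp:sync-products')->everyThirtyMinutes()->withoutOverlapping();
        $schedule->command('gcp:sync-reviews')->everyThirtyMinutes()->withoutOverlapping();
        $schedule->command('gcp:sync-review-sentences')->everyThirtyMinutes()->withoutOverlapping();
    }
}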
Dependencies
- Google Cloud Platform access credentials
- BigQuery project and dataset configuration (see the config sketch after this list)
- Redis for tracking processed IDs
- Local database connection
- Queue system for background processing
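A hedged sketch of the configuration these dependencies imply; the file name and keys are assumptions, and only GOOGLE_APPLICATION_CREDENTIALS is a standard GCP variable:
<?php

// config/bigquery.php (hypothetical file and keys)
return [
    // Path to the GCP service account JSON used to authenticate BigQuery calls.
    'key_file' => env('GOOGLE_APPLICATION_CREDENTIALS'),

    // Project and dataset the sync commands read from.
    'project_id' => env('BIGQUERY_PROJECT_ID'),
    'dataset' => env('BIGQUERY_DATASET'),
];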
Output
Database Schema
erDiagram
products {
bigint id PK
integer mall_id "Mall identifier"
string mall_product_id "Product ID in mall system"
string jan_code "JAN code"
string input_type "Source of data"
string unique_key "System-generated unique key"
timestamp crawl_created_at "Timestamp from crawler"
timestamp bq_created_at "Timestamp from BigQuery"
timestamp created_at
timestamp updated_at
}
product_details {
bigint id PK
bigint product_id FK
string image "Product image URL"
string product_url "Product page URL"
string title "Product title"
double rating "Average rating"
integer num_reviews "Number of reviews"
string mall_shop_name "Shop name in mall"
string maker_name "Manufacturer name"
double base_price "Base price"
double shipping_fee "Shipping cost"
double price "Calculated final price"
integer sales_one_month "Monthly sales"
integer point "Points"
json coupon "Coupon information"
string ranking_description "Ranking description"
string unique_key "System-generated unique key"
timestamp crawl_created_at "Timestamp from crawler"
timestamp bq_created_at "Timestamp from BigQuery"
timestamp created_at
timestamp updated_at
}
reviews {
bigint id PK
integer mall_id "Mall identifier"
integer review_count "Number of reviews by user"
integer vote "Helpful votes"
tinyint type "Review type (1=Non-buyer, 2=Buyer, 3=Vine)"
string variant "Variant options"
string age "Age of buyer"
string gender "Gender of buyer"
text content "Review content"
string mall_product_id "Product ID in mall system"
string jan_code "JAN code"
double rating "Numerical rating"
string shop_name "Shop name"
date post_date "Date of review"
bigint crawl_review_id "Unique review ID from crawler"
timestamp crawl_created_at "Timestamp from crawler"
timestamp bq_created_at "Timestamp from BigQuery"
timestamp created_at
timestamp updated_at
}
review_sentences {
bigint id PK
bigint crawl_review_id "Related review ID"
bigint sentence_id "Unique sentence ID"
string mall_product_id "Product ID in mall system"
string jan_code "JAN code"
longtext content "Individual sentence content"
double sentiment_score "Sentiment analysis score"
date post_date "Date of review"
timestamp crawl_created_at "Timestamp from crawler"
timestamp bq_created_at "Timestamp from BigQuery"
timestamp created_at
timestamp updated_at
}
products ||--o{ product_details : "has many"
reviews ||--o{ review_sentences : "contains"
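The relationships above map naturally onto Eloquent models; a sketch assuming conventional model names and the keys shown in the diagram (product_details.product_id, and crawl_review_id linking reviews to review_sentences):
<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\HasMany;

class Product extends Model
{
    // One product row accumulates many detail snapshots over time.
    public function details(): HasMany
    {
        return $this->hasMany(ProductDetail::class);
    }
}

class Review extends Model
{
    // Sentences link back to their review via the crawler's review ID,
    // not the local primary key, per the schema above.
    public function sentences(): HasMany
    {
        return $this->hasMany(ReviewSentence::class, 'crawl_review_id', 'crawl_review_id');
    }
}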
Processing Flow
- Command is executed on schedule
- Data is fetched from BigQuery with appropriate conditions
- Data is chunked into batches and processed by queue jobs (see the job sketch after this list)
- Processed IDs are stored in Redis for status tracking
- Status update command periodically updates BigQuery status
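The queue-job half of this flow could look like the sketch below; the job class name, payload shape, and unique_key-based upsert are assumptions consistent with the schema above, not the actual job:
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\DB;

class UpsertProductsChunk implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(private array $rows)
    {
    }

    public function handle(): void
    {
        // Insert new rows and update existing ones, keyed on the system-generated unique_key.
        DB::table('products')->upsert(
            $this->rows,
            ['unique_key'],
            ['mall_id', 'mall_product_id', 'jan_code', 'input_type', 'crawl_created_at', 'bq_created_at']
        );
    }
}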
Error Handling
Logs
- Detailed start/end logs with job counts and record counts
- Error messages with stack traces
- Performance metrics for monitoring
Slack Notifications
- Critical errors during sync are reported to Slack channels (see the sketch after this list)
- Success notifications with summary statistics
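A hedged sketch of the logging and Slack reporting pattern described above; the log channel name and message wording are illustrative:
<?php

use Illuminate\Support\Facades\Log;

try {
    // ... chunk dispatching as described under Processing Flow ...
} catch (\Throwable $e) {
    // Structured log entry with the stack trace for later debugging.
    Log::error('gcp:sync-products failed', [
        'message' => $e->getMessage(),
        'trace' => $e->getTraceAsString(),
    ]);

    // Forward the same failure to Slack via a dedicated log channel
    // (assumes a "slack" channel is configured in config/logging.php).
    Log::channel('slack')->critical('gcp:sync-products failed: ' . $e->getMessage());
}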
Troubleshooting
Common Issues
- Missing Data: Check Redis queue and status update command
- Slow Sync: Verify batch sizes and database indexes
- Query Errors: Validate BigQuery table schema matches expected format
- Queue Backlog: Monitor queue workers and increase capacity if needed
Verification Steps
- Check logs for successful job completion
- Verify Redis contains IDs awaiting status updates (example checks after this list)
- Confirm updated records in local database
- Monitor BigQuery for updated status flags
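The Redis and database checks can be run from php artisan tinker; a sketch in which the Redis key name is a placeholder for whatever key the commands actually write:
<?php

use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Redis;

// Rows touched within the last sync window (expects activity if the schedule is running).
DB::table('products')->where('updated_at', '>=', now()->subMinutes(30))->count();

// IDs still waiting for the status update command (key name is a placeholder).
Redis::scard('bigquery:sync:products:pending');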