BigQuery Regular Sync

Command Signatures

php artisan gcp:sync-products [--items-per-page=] [--all] [--missed]
php artisan gcp:sync-reviews [--items-per-page=] [--all] [--missed]
php artisan gcp:sync-review-sentences [--items-per-page=] [--all] [--missed]

Purpose

These commands synchronize product data, review data, and review sentence data from BigQuery to the local database. They run on a regular schedule to ensure that the local database contains up-to-date information from BigQuery for analysis and presentation.

Sequence Diagrams

Products Sync

sequenceDiagram
    participant System
    participant Products as gcp:sync-products
    participant BigQuery
    participant ProductsTable as products table
    participant ProductDetailsTable as product_details table
    participant Redis
    
    Note over System,Redis: Products Sync Flow
    
    rect rgb(191, 223, 255)
    Note right of System: Every 30 Minutes
    System->>Products: Execute
    Products->>BigQuery: Fetch Products Data (recent records)
    BigQuery-->>Products: Return Products & Product Details Data
    Products->>ProductsTable: Insert/Update Products Records
    Products->>ProductDetailsTable: Insert/Update Product Details Records
    Products->>Redis: Store Product IDs for Status Update
    end

Reviews Sync

sequenceDiagram
    participant System
    participant Reviews as gcp:sync-reviews
    participant BigQuery
    participant ReviewsTable as reviews table
    participant Redis
    
    Note over System,Redis: Reviews Sync Flow
    
    rect rgb(191, 223, 255)
    Note right of System: Every 30 Minutes
    System->>Reviews: Execute
    Reviews->>BigQuery: Fetch Reviews Data (recent records)
    BigQuery-->>Reviews: Return Reviews Data
    Reviews->>ReviewsTable: Insert/Update Reviews Records
    Reviews->>Redis: Store Review IDs for Status Update
    end

Review Sentences Sync

sequenceDiagram
    participant System
    participant Sentences as gcp:sync-review-sentences
    participant BigQuery
    participant ReviewSentencesTable as review_sentences table
    participant Redis
    
    Note over System,Redis: Review Sentences Sync Flow
    
    rect rgb(191, 223, 255)
    Note right of System: Every 30 Minutes
    System->>Sentences: Execute
    Sentences->>BigQuery: Fetch Review Sentences Data (recent records)
    BigQuery-->>Sentences: Return Review Sentences Data
    Sentences->>ReviewSentencesTable: Insert/Update Review Sentences Records
    Sentences->>Redis: Store Sentence IDs for Status Update
    end

Implementation Details

Command Structure

All commands inherit from BaseBigQuerySyncCommand which provides:

  1. Query building with conditions for recent records
  2. Chunking of data for batch processing
  3. Error handling and logging
  4. Redis tracking for status updates

Parameters

  • --items-per-page: Number of records to process per batch (default: 500)
  • --all: Process all records without time constraints
  • --missed: Process only records with null status

Frequency

Every 30 minutes

Dependencies

  • Google Cloud Platform access credentials
  • BigQuery project and dataset configuration
  • Redis for tracking processed IDs
  • Local database connection
  • Queue system for background processing

Output

Database Schema

erDiagram
    products {
        bigint id PK
        integer mall_id "Mall identifier"
        string mall_product_id "Product ID in mall system"
        string jan_code "JAN code"
        string input_type "Source of data"
        string unique_key "System-generated unique key"
        timestamp crawl_created_at "Timestamp from crawler"
        timestamp bq_created_at "Timestamp from BigQuery"
        timestamp created_at
        timestamp updated_at
    }
    
    product_details {
        bigint id PK
        bigint product_id FK
        string image "Product image URL"
        string product_url "Product page URL"
        string title "Product title"
        double rating "Average rating"
        integer num_reviews "Number of reviews"
        string mall_shop_name "Shop name in mall"
        string maker_name "Manufacturer name"
        double base_price "Base price"
        double shipping_fee "Shipping cost"
        double price "Calculated final price"
        integer sales_one_month "Monthly sales"
        integer point "Points"
        json coupon "Coupon information"
        string ranking_description "Ranking description"
        string unique_key "System-generated unique key"
        timestamp crawl_created_at "Timestamp from crawler"
        timestamp bq_created_at "Timestamp from BigQuery"
        timestamp created_at
        timestamp updated_at
    }
    
    reviews {
        bigint id PK
        integer mall_id "Mall identifier"
        integer review_count "Number of reviews by user"
        integer vote "Helpful votes"
        tinyint type "Review type (1=Non-buyer, 2=Buyer, 3=Vine)"
        string variant "Variant options"
        string age "Age of buyer"
        string gender "Gender of buyer"
        text content "Review content"
        string mall_product_id "Product ID in mall system"
        string jan_code "JAN code"
        double rating "Numerical rating"
        string shop_name "Shop name"
        date post_date "Date of review"
        bigint crawl_review_id "Unique review ID from crawler"
        timestamp crawl_created_at "Timestamp from crawler"
        timestamp bq_created_at "Timestamp from BigQuery"
        timestamp created_at
        timestamp updated_at
    }
    
    review_sentences {
        bigint id PK
        bigint crawl_review_id "Related review ID"
        bigint sentence_id "Unique sentence ID"
        string mall_product_id "Product ID in mall system"
        string jan_code "JAN code"
        longtext content "Individual sentence content"
        double sentiment_score "Sentiment analysis score"
        date post_date "Date of review"
        timestamp crawl_created_at "Timestamp from crawler"
        timestamp bq_created_at "Timestamp from BigQuery"
        timestamp created_at
        timestamp updated_at
    }
    
    products ||--o{ product_details : "has many"
    reviews ||--o{ review_sentences : "contains"

Processing Flow

  1. Command is executed on schedule
  2. Data is fetched from BigQuery with appropriate conditions
  3. Data is chunked into batches and processed by queue jobs
  4. Processed IDs are stored in Redis for status tracking
  5. Status update command periodically updates BigQuery status

Error Handling

Logs

  • Detailed start/end logs with job counts and record counts
  • Error messages with stack traces
  • Performance metrics for monitoring

Slack Notifications

  • Critical errors during sync are reported to Slack channels
  • Success notifications with summary statistics

Troubleshooting

Common Issues

  1. Missing Data: Check Redis queue and status update command
  2. Slow Sync: Verify batch sizes and database indexes
  3. Query Errors: Validate BigQuery table schema matches expected format
  4. Queue Backlog: Monitor queue workers and increase capacity if needed

Verification Steps

  1. Check logs for successful job completion
  2. Verify Redis contains IDs awaiting status updates
  3. Confirm updated records in local database
  4. Monitor BigQuery for updated status flags