Sync Success

Command Signature

php artisan analyzerdb:sync-crawl-success-from-analyzer --data-type=SummaryProduct [--limit=100]
php artisan analyzerdb:sync-crawl-success-from-analyzer --data-type=SummaryProductReview [--limit=100]
php artisan analyzerdb:sync-crawl-success-from-analyzer --data-type=SummaryCategory [--limit=100]
php artisan analyzerdb:sync-crawl-success-from-analyzer --data-type=SummarySearchQuery [--limit=100]

Purpose

These commands verify whether records from the summary wishlist tables in the Console database (gb_console) have been successfully processed into the Analyzer database (gb_analyzer). When a matching Analyzer record is found whose crawl_created_at is at or after the Console record's updated_at, the corresponding Console record is updated to the Success crawl status, keeping crawl tracking accurate across both databases in the data processing pipeline.

Sequence Diagrams

Main Command Flow

sequenceDiagram
    participant System
    participant Command as analyzerdb:sync-crawl-success-from-analyzer
    participant Repository as SummaryWishlist*Repository
    participant ConsoleDB as gb_console
    participant Logger
    participant Slack
    
    Note over System,Slack: Command Execution Flow (Every 5 Minutes)
    
    rect rgb(200, 255, 200)
    Note right of System: Happy Case - Command Processing
    
    System->>Command: Execute with --data-type parameter
    Command->>Logger: Log command start with parameters
    Command->>Repository: chunkDataUpdateCrawlStatusSuccess()
    Repository->>ConsoleDB: Query records WHERE crawl_status = Crawling
    ConsoleDB-->>Repository: Return records needing verification
    Repository-->>Command: Return records in chunks (chunk size: --limit option)
    
    rect rgb(200, 230, 255)
    loop For each chunk of records
        Note right of Command: Batch Job Dispatch
        Command->>Command: Dispatch specialized job based on data type
        Command->>Logger: Log chunk processing start
    end
    end
    
    Command->>Logger: Log command completion
    Command->>Slack: Send processing summary notification
    end
    
    rect rgb(255, 200, 200)
    Note right of System: Error Handling
    rect rgb(255, 230, 230)
    alt Command Error Occurs
        Command->>Logger: Log command error details
        Command->>Slack: Send error notification
    end
    end
    end
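
The flow above can be summarized in code. Below is a minimal sketch of the command's dispatch loop; the job class names and the chunking method name come from the diagrams in this document, but the exact signatures and wiring are assumptions:

    use Illuminate\Support\Facades\Log;

    // Sketch only: real option handling and repository injection may differ.
    public function handle(): int
    {
        $dataType = $this->option('data-type');
        $limit    = (int) ($this->option('limit') ?: 100);

        Log::info('analyzerdb:sync-crawl-success-from-analyzer start', compact('dataType', 'limit'));

        // The repository yields chunks of Console records still in Crawling status.
        $this->repository->chunkDataUpdateCrawlStatusSuccess($limit, function ($records) use ($dataType) {
            Log::info('Processing chunk', ['count' => count($records)]);

            // Dispatch the specialized verification job for this data type.
            match ($dataType) {
                'SummaryProduct'       => SummaryProductJob::dispatch($records),
                'SummaryProductReview' => SummaryProductReviewJob::dispatch($records),
                'SummaryCategory'      => SummaryCategoryJob::dispatch($records),
                'SummarySearchQuery'   => SummarySearchQueryJob::dispatch($records),
            };
        });

        Log::info('analyzerdb:sync-crawl-success-from-analyzer done');
        return self::SUCCESS;
    }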

SummaryProduct Job Flow

sequenceDiagram
    participant Job as SummaryProductJob
    participant ProductRepo as ProductRepositoryInterface
    participant AnalyzerDB as gb_analyzer.products
    participant ConsoleRepo as SummaryWishlistProductRepository
    participant ConsoleDB as gb_console.summary_wishlist_products
    participant Logger
    participant Slack
    
    Note over Job,Slack: Product Verification Job Flow
    
    rect rgb(200, 255, 200)
    Note right of Job: Happy Case - Product Verification
    
    Job->>Logger: Log job start with record count
    Job->>Job: Extract mall_id, input_type, input from records
    Job->>ProductRepo: Query products with lookup keys
    ProductRepo->>AnalyzerDB: SELECT WHERE mall_id, input_type, input match
    Note right of AnalyzerDB: Time constraint: crawl_created_at >= updated_at
    AnalyzerDB-->>ProductRepo: Return matching products
    ProductRepo-->>Job: Return verified product records
    
    rect rgb(255, 255, 200)
    alt Products found in Analyzer DB
        Note right of Job: Success Update
        Job->>ConsoleRepo: updateByConditions(crawl_status = Success)
        ConsoleRepo->>ConsoleDB: UPDATE crawl_status = 2 WHERE id IN (...)
        ConsoleDB-->>ConsoleRepo: Confirm batch update
        ConsoleRepo-->>Job: Return update count
        
        rect rgb(230, 200, 255)
        Note right of Job: Success Monitoring
        Job->>Logger: Log successful updates with count
        Job->>Slack: Send success notification with statistics
        end
    else No matching products
        Note right of Job: No Action Required
        Job->>Logger: Log no updates needed
    end
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of Job: Error Handling
    rect rgb(255, 230, 230)
    alt Database Error Occurs
        Job->>Job: Detect deadlock/connection error
        Job->>Job: Retry with exponential backoff
        Job->>Logger: Log retry attempt with sleep time
        
        rect rgb(233, 196, 196)
        alt Max retries exceeded (5 attempts)
            Job->>Logger: Log final error with stack trace
            Job->>Slack: Send error notification
        end
        end
    end
    end
    end
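
A hedged sketch of the verification step above; the other three jobs follow the same pattern with their own lookup keys. It assumes the job holds its chunk of Console records in $this->records, and the connection names and input-to-column mapping are assumptions, while the time constraint and status value come from this document:

    use Illuminate\Support\Facades\DB;

    // Sketch only: the real job likely batches these lookups.
    $verifiedIds = [];
    foreach ($this->records as $record) {
        // Assumed mapping: JAN inputs match jan_code, others match mall_product_id.
        $column = $record->input_type === 'jan' ? 'jan_code' : 'mall_product_id';

        $found = DB::connection('analyzer')->table('products')      // assumed connection name
            ->where('mall_id', $record->mall_id)
            ->where('input_type', $record->input_type)
            ->where($column, $record->input)
            ->where('crawl_created_at', '>=', $record->updated_at)  // time constraint
            ->exists();

        if ($found) {
            $verifiedIds[] = $record->id;
        }
    }

    if ($verifiedIds !== []) {
        // Batch-update verified Console rows to Success (2).
        DB::connection('console')->table('summary_wishlist_products')  // assumed connection name
            ->whereIn('id', $verifiedIds)
            ->update(['crawl_status' => 2]);
    }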

SummaryProductReview Job Flow

sequenceDiagram
    participant Job as SummaryProductReviewJob
    participant ReviewRepo as ReviewRepositoryInterface
    participant AnalyzerDB as gb_analyzer.reviews
    participant ConsoleRepo as SummaryWishlistProductReviewRepository
    participant ConsoleDB as gb_console.summary_wishlist_product_reviews
    participant Logger
    participant Slack
    
    Note over Job,Slack: Product Review Verification Job Flow
    
    rect rgb(200, 255, 200)
    Note right of Job: Happy Case - Review Verification
    
    Job->>Logger: Log job start with record count
    Job->>Job: Extract product identifiers from related products
    Job->>ReviewRepo: Query reviews with product identifiers
    ReviewRepo->>AnalyzerDB: SELECT WHERE mall_id, product identifiers match
    Note right of AnalyzerDB: Time constraint: crawl_created_at >= updated_at
    AnalyzerDB-->>ReviewRepo: Return matching reviews
    ReviewRepo-->>Job: Return verified review records
    
    rect rgb(255, 255, 200)
    alt Reviews found in Analyzer DB
        Note right of Job: Success Update
        Job->>ConsoleRepo: updateBySummaryWishlistProductIds()
        ConsoleRepo->>ConsoleDB: UPDATE crawl_status = 2 WHERE summary_wishlist_product_id IN (...)
        ConsoleDB-->>ConsoleRepo: Confirm batch update
        ConsoleRepo-->>Job: Return update count
        
        rect rgb(230, 200, 255)
        Note right of Job: Success Monitoring
        Job->>Logger: Log successful updates with count
        Job->>Slack: Send success notification with statistics
        end
    else No matching reviews
        Note right of Job: No Action Required
        Job->>Logger: Log no updates needed
    end
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of Job: Error Handling
    rect rgb(255, 230, 230)
    alt Database Error Occurs
        Job->>Job: Detect deadlock/connection error
        Job->>Job: Retry with exponential backoff
        Job->>Logger: Log retry attempt with sleep time
        
        rect rgb(233, 196, 196)
        alt Max retries exceeded (5 attempts)
            Job->>Logger: Log final error with stack trace
            Job->>Slack: Send error notification
        end
        end
        end
    end
    end

SummaryCategory Job Flow

sequenceDiagram
    participant Job as SummaryCategoryJob
    participant CategoryRepo as TempCategoryRankingRepositoryInterface
    participant AnalyzerDB as gb_analyzer.t_category_rankings
    participant ConsoleRepo as SummaryWishlistCategoryRepository
    participant ConsoleDB as gb_console.summary_wishlist_categories
    participant Logger
    participant Slack
    
    Note over Job,Slack: Category Verification Job Flow
    
    rect rgb(200, 255, 200)
    Note right of Job: Happy Case - Category Verification
    
    Job->>Logger: Log job start with record count
    Job->>Job: Extract mall_id, category_id from records
    Job->>CategoryRepo: Query category rankings with lookup keys
    CategoryRepo->>AnalyzerDB: SELECT WHERE mall_id, category_id match
    Note right of AnalyzerDB: Time constraint: crawl_created_at >= updated_at
    AnalyzerDB-->>CategoryRepo: Return matching category rankings
    CategoryRepo-->>Job: Return verified category records
    
    rect rgb(255, 255, 200)
    alt Categories found in Analyzer DB
        Note right of Job: Success Update
        Job->>ConsoleRepo: updateByConditions(crawl_status = Success)
        ConsoleRepo->>ConsoleDB: UPDATE crawl_status = 2 WHERE id IN (...)
        ConsoleDB-->>ConsoleRepo: Confirm batch update
        ConsoleRepo-->>Job: Return update count
        
        rect rgb(230, 200, 255)
        Note right of Job: Success Monitoring
        Job->>Logger: Log successful updates with count
        Job->>Slack: Send success notification with statistics
        end
    else No matching categories
        Note right of Job: No Action Required
        Job->>Logger: Log no updates needed
    end
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of Job: Error Handling
    rect rgb(255, 230, 230)
    alt Database Error Occurs
        Job->>Job: Detect deadlock/connection error
        Job->>Job: Retry with exponential backoff
        Job->>Logger: Log retry attempt with sleep time
        
        rect rgb(233, 196, 196)
        alt Max retries exceeded (5 attempts)
            Job->>Logger: Log final error with stack trace
            Job->>Slack: Send error notification
        end
        end
    end
    end
    end

SummarySearchQuery Job Flow

sequenceDiagram
    participant Job as SummarySearchQueryJob
    participant SearchRepo as TempSearchQueryRankingRepositoryInterface
    participant AnalyzerDB as gb_analyzer.t_sq_rankings
    participant ConsoleRepo as SummaryWishlistSearchQueryRepository
    participant ConsoleDB as gb_console.summary_wishlist_search_queries
    participant Logger
    participant Slack
    
    Note over Job,Slack: Search Query Verification Job Flow
    
    rect rgb(200, 255, 200)
    Note right of Job: Happy Case - Search Query Verification
    
    Job->>Logger: Log job start with record count
    Job->>Job: Extract mall_id, keyword from records
    Job->>SearchRepo: Query search query rankings with lookup keys
    SearchRepo->>AnalyzerDB: SELECT WHERE mall_id, keyword match
    Note right of AnalyzerDB: Time constraint: crawl_created_at >= updated_at
    AnalyzerDB-->>SearchRepo: Return matching search query rankings
    SearchRepo-->>Job: Return verified search query records
    
    rect rgb(255, 255, 200)
    alt Search queries found in Analyzer DB
        Note right of Job: Success Update
        Job->>ConsoleRepo: updateByConditions(crawl_status = Success)
        ConsoleRepo->>ConsoleDB: UPDATE crawl_status = 2 WHERE id IN (...)
        ConsoleDB-->>ConsoleRepo: Confirm batch update
        ConsoleRepo-->>Job: Return update count
        
        rect rgb(230, 200, 255)
        Note right of Job: Success Monitoring
        Job->>Logger: Log successful updates with count
        Job->>Slack: Send success notification with statistics
        end
    else No matching search queries
        Note right of Job: No Action Required
        Job->>Logger: Log no updates needed
    end
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of Job: Error Handling
    rect rgb(255, 230, 230)
    alt Database Error Occurs
        Job->>Job: Detect deadlock/connection error
        Job->>Job: Retry with exponential backoff
        Job->>Logger: Log retry attempt with sleep time
        
        rect rgb(233, 196, 196)
        alt Max retries exceeded (5 attempts)
            Job->>Logger: Log final error with stack trace
            Job->>Slack: Send error notification
        end
        end
    end
    end
    end
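
All four jobs share the deadlock-retry behavior shown in the error-handling blocks. Below is a minimal sketch of that pattern; the detected MySQL error codes are assumptions, while the five-attempt cap and exponential backoff come from the diagrams:

    use Illuminate\Database\QueryException;

    private const MAX_ATTEMPTS = 5;

    private function withRetry(callable $operation): mixed
    {
        for ($attempt = 1; $attempt <= self::MAX_ATTEMPTS; $attempt++) {
            try {
                return $operation();
            } catch (QueryException $e) {
                // Assumed detection: MySQL 1213 = deadlock, 2006 = server has gone away.
                $retryable = in_array($e->errorInfo[1] ?? null, [1213, 2006], true);

                if (!$retryable || $attempt === self::MAX_ATTEMPTS) {
                    throw $e; // caller logs the final error and notifies Slack
                }

                $sleep = 2 ** $attempt; // backoff: 2, 4, 8, 16 seconds
                logger()->warning("Retrying after DB error (attempt {$attempt}, sleeping {$sleep}s)");
                sleep($sleep);
            }
        }
    }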

Database Schema

Console Database Tables (gb_console)

erDiagram
    summary_wishlist_products {
        bigint id PK
        string input "The input of the product"
        string input_type "The type of the input: jan, asin, rakuten_id"
        bigint mall_id FK "Foreign key to malls table"
        integer crawl_status "The status of the crawling (updated by SummaryProductJob)"
        timestamp updated_at "Used for time constraint verification"
    }
    
    summary_wishlist_product_reviews {
        bigint id PK
        bigint summary_wishlist_product_id FK "Foreign key to summary_wishlist_products"
        integer crawl_status "The status of the crawling (updated by SummaryProductReviewJob)"
        timestamp updated_at "Used for time constraint verification"
    }
    
    summary_wishlist_categories {
        bigint id PK
        string category_id "The category identifier"
        bigint mall_id FK "Foreign key to malls table"
        integer crawl_status "The status of the crawling (updated by SummaryCategoryJob)"
        timestamp updated_at "Used for time constraint verification"
    }
    
    summary_wishlist_search_queries {
        bigint id PK
        bigint mall_id FK "Foreign key to malls table"
        string keyword "The search keyword"
        integer crawl_status "The status of the crawling (updated by SummarySearchQueryJob)"
        timestamp updated_at "Used for time constraint verification"
    }
    
    summary_wishlist_products ||--o| summary_wishlist_product_reviews : "has one"

Analyzer Database Tables (gb_analyzer)

erDiagram
    products {
        bigint id PK
        bigint mall_id FK "Mall identifier (used for lookup)"
        string mall_product_id "Product ID from the mall"
        string jan_code "JAN code (nullable, used for lookup)"
        string input_type "Type of input: asin, jan, rakuten_id (used for lookup)"
        timestamp crawl_created_at "When the product was crawled (time constraint field)"
    }
    
    reviews {
        bigint id PK
        bigint mall_id FK "Mall identifier (used for lookup)"
        string mall_product_id "Product ID from mall (used for lookup)"
        string jan_code "JAN code (nullable, max 50 chars, used for lookup)"
        timestamp crawl_created_at "When the review was crawled (time constraint field)"
    }
    
    t_category_rankings {
        bigint id PK
        bigint mall_id FK "Mall identifier (used for lookup)"
        string category_id "Category identifier (used for lookup)"
        timestamp crawl_created_at "When the ranking was crawled (time constraint field)"
    }
    
    t_sq_rankings {
        bigint id PK
        string keyword "Search keyword (used for lookup, max 100 chars)"
        bigint mall_id FK "Mall identifier (used for lookup)"
        timestamp crawl_created_at "When the ranking was crawled (time constraint field)"
    }

Verification Lookup Keys

SummaryProduct Verification

  • Console Table: summary_wishlist_products
  • Analyzer Table: products
  • Lookup Keys:
    • mall_id (integer) - Must match exactly
    • input_type (string) - jan, asin, or rakuten_id
    • input (string) - The actual product identifier
  • Time Constraint: products.crawl_created_at >= summary_wishlist_products.updated_at
  • Update Field: crawl_status = 2 (Success)
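
The shape of this check is illustrated by the following SQL, assuming both schemas live on the same MySQL server so cross-database joins work. The SummaryCategory and SummarySearchQuery checks have the same shape, with (mall_id, category_id) and (mall_id, keyword) as their join keys:

    -- Illustrative only: jan_code is matched when input_type = 'jan';
    -- other input types match against mall_product_id (assumed mapping).
    SELECT swp.id
    FROM gb_console.summary_wishlist_products swp
    JOIN gb_analyzer.products p
      ON  p.mall_id    = swp.mall_id
      AND p.input_type = swp.input_type
      AND p.jan_code   = swp.input
    WHERE swp.crawl_status = 1                      -- Crawling
      AND p.crawl_created_at >= swp.updated_at;     -- time constraint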

SummaryProductReview Verification

  • Console Table: summary_wishlist_product_reviews
  • Analyzer Table: reviews
  • Lookup Keys:
    • mall_id (integer) - From related summary_wishlist_product
    • Product identifiers - Based on input_type from related product
  • Time Constraint: reviews.crawl_created_at >= summary_wishlist_product_reviews.updated_at
  • Update Field: crawl_status = 2 (Success)
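
Because the review table carries no mall or product identifiers of its own, the verification joins through the parent product record. An illustrative query under the same cross-database assumption as above:

    SELECT swpr.id
    FROM gb_console.summary_wishlist_product_reviews swpr
    JOIN gb_console.summary_wishlist_products swp
      ON swp.id = swpr.summary_wishlist_product_id
    JOIN gb_analyzer.reviews r
      ON  r.mall_id  = swp.mall_id
      AND r.jan_code = swp.input                    -- or mall_product_id, per input_type
    WHERE swpr.crawl_status = 1                     -- Crawling
      AND r.crawl_created_at >= swpr.updated_at;    -- time constraint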

SummaryCategory Verification

  • Console Table: summary_wishlist_categories
  • Analyzer Table: t_category_rankings
  • Lookup Keys:
    • mall_id (integer) - Must match exactly
    • category_id (string) - Must match exactly
  • Time Constraint: t_category_rankings.crawl_created_at >= summary_wishlist_categories.updated_at
  • Update Field: crawl_status = 2 (Success)

SummarySearchQuery Verification

  • Console Table: summary_wishlist_search_queries
  • Analyzer Table: t_sq_rankings
  • Lookup Keys:
    • mall_id (integer) - Must match exactly
    • keyword (string) - Must match exactly
  • Time Constraint: t_sq_rankings.crawl_created_at >= summary_wishlist_search_queries.updated_at
  • Update Field: crawl_status = 2 (Success)

CrawlStatus Enum Values

The crawl_status field in all Console tables uses the following enum values:

  • 0 (NotCrawled): Initial state for new records
  • 1 (Crawling): Record is currently being processed by crawler (filtered by sync commands)
  • 2 (Success): Processing completed successfully (updated by these commands)
  • 3 (Failed): Processing failed with errors
  • 4 (Error): System error occurred during processing
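
These values map naturally onto a PHP backed enum. The class below is a sketch; the actual enum name and namespace in the codebase are assumptions:

    // Hypothetical enum mirroring the values listed above.
    enum CrawlStatus: int
    {
        case NotCrawled = 0;
        case Crawling   = 1;
        case Success    = 2;
        case Failed     = 3;
        case Error      = 4;
    }

    // e.g. filtering records awaiting verification:
    // ->where('crawl_status', CrawlStatus::Crawling->value)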

Repository Interfaces Used

  • SummaryWishlistProductRepositoryInterface: Handles summary_wishlist_products operations
  • SummaryWishlistProductReviewRepositoryInterface: Handles summary_wishlist_product_reviews operations
  • SummaryWishlistCategoryRepositoryInterface: Handles summary_wishlist_categories operations
  • SummaryWishlistSearchQueryRepositoryInterface: Handles summary_wishlist_search_queries operations
  • ProductRepositoryInterface: Queries products table in Analyzer database
  • ReviewRepositoryInterface: Queries reviews table in Analyzer database
  • TempCategoryRankingRepositoryInterface: Queries t_category_rankings table in Analyzer database
  • TempSearchQueryRankingRepositoryInterface: Queries t_sq_rankings table in Analyzer database
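
The two repository methods named in the sequence diagrams suggest an interface along these lines; parameter and return types are assumptions:

    interface SummaryWishlistProductRepositoryInterface
    {
        // Iterate Crawling records in chunks of $limit, invoking $callback per chunk.
        public function chunkDataUpdateCrawlStatusSuccess(int $limit, callable $callback): void;

        // Batch-update records matching $conditions, returning the affected row count.
        public function updateByConditions(array $conditions, array $attributes): int;
    }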

Details

Parameters

  • --data-type: Required parameter specifying the type of data to verify
    • SummaryProduct: Verifies product data in summary_wishlist_products table
    • SummaryProductReview: Verifies product review data in summary_wishlist_product_reviews table
    • SummaryCategory: Verifies category data in summary_wishlist_categories table
    • SummarySearchQuery: Verifies search query data in summary_wishlist_search_queries table
  • --limit=N: Optional parameter to control batch size for processing (default: 100 records per chunk)

Frequency

Every 5 minutes for each data type (SummaryProduct, SummaryProductReview, SummaryCategory, SummarySearchQuery)
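
A sketch of how this cadence might be registered in Laravel's scheduler; the actual registration in this codebase may differ:

    // app/Console/Kernel.php
    protected function schedule(Schedule $schedule): void
    {
        $types = ['SummaryProduct', 'SummaryProductReview', 'SummaryCategory', 'SummarySearchQuery'];

        foreach ($types as $type) {
            $schedule->command("analyzerdb:sync-crawl-success-from-analyzer --data-type={$type}")
                ->everyFiveMinutes()
                ->withoutOverlapping(); // assumed guard against overlapping runs
        }
    }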

Dependencies

  • Console Database (gb_console): Must contain records in summary wishlist tables with crawl_status = Crawling
  • Analyzer Database (gb_analyzer): Must contain processed data in corresponding tables:
    • products table for SummaryProduct verification
    • reviews table for SummaryProductReview verification
    • t_category_rankings table for SummaryCategory verification
    • t_sq_rankings table for SummarySearchQuery verification
  • Laravel Queue System: For asynchronous job processing
  • Repository Interfaces: For database abstraction and batch operations

Output

Tables

  • summary_wishlist_products: Updates crawl_status field to 2 (Success)

    • Updated by: SummaryProductJob
    • Lookup method: Based on mall_id, input_type, and input fields
    • Time constraint: crawl_created_at >= updated_at
  • summary_wishlist_product_reviews: Updates crawl_status field to 2 (Success)

    • Updated by: SummaryProductReviewJob
    • Lookup method: Based on related product identifiers
    • Time constraint: crawl_created_at >= updated_at
  • summary_wishlist_categories: Updates crawl_status field to 2 (Success)

    • Updated by: SummaryCategoryJob
    • Lookup method: Based on mall_id and category_id fields
    • Time constraint: crawl_created_at >= updated_at
  • summary_wishlist_search_queries: Updates crawl_status field to 2 (Success)

    • Updated by: SummarySearchQueryJob
    • Lookup method: Based on mall_id and keyword fields
    • Time constraint: crawl_created_at >= updated_at

Services

  • Job Queue: Dispatches specialized verification jobs for each data type
  • Database Transactions: Ensures atomic updates with rollback capability
  • Slack Notifications: Sends success/error notifications with processing statistics
  • Logging System: Records detailed execution logs with timestamps and record counts

Error Handling

Log

The system generates comprehensive logs for monitoring and debugging:

  • Command Execution Logs: Start/end timestamps with data type and limit parameters
  • Progress Tracking: Number of chunks processed and total records verified
  • Success Updates: Count of records updated to Success status with specific IDs
  • Error Details: Database errors with file/line information and retry attempts
  • Time Constraint Violations: Records that don't meet timestamp requirements

Slack

Automated notifications are sent to configured Slack channels:

  • Success Notifications: Include processing statistics (records processed, updated count, execution time)
  • Error Notifications: Contain detailed error messages with context and stack traces
  • Retry Notifications: Alert when database errors trigger retry mechanisms
  • Final Failure Alerts: Sent when maximum retry attempts are exceeded

Troubleshooting

Check Data

  1. Verify Console Database Records:

    SELECT COUNT(*) FROM summary_wishlist_products WHERE crawl_status = 1; -- Crawling status
    SELECT COUNT(*) FROM summary_wishlist_products WHERE crawl_status = 2; -- Success status
    
  2. Check Analyzer Database Population:

    SELECT COUNT(*) FROM products WHERE crawl_created_at >= NOW() - INTERVAL 1 HOUR;
    SELECT COUNT(*) FROM reviews WHERE crawl_created_at >= NOW() - INTERVAL 1 HOUR;
    
  3. Validate Time Constraints:

    SELECT id, updated_at FROM summary_wishlist_products WHERE crawl_status = 1 ORDER BY updated_at DESC LIMIT 10;
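
    To see whether the time constraint is the blocker, compare each stuck record against the newest matching Analyzer row (illustrative; assumes both schemas on one server and jan_code as the identifier):

    SELECT swp.id, swp.updated_at, MAX(p.crawl_created_at) AS latest_crawl
    FROM gb_console.summary_wishlist_products swp
    LEFT JOIN gb_analyzer.products p
      ON p.mall_id = swp.mall_id AND p.jan_code = swp.input
    WHERE swp.crawl_status = 1
    GROUP BY swp.id, swp.updated_at
    HAVING latest_crawl IS NULL OR latest_crawl < swp.updated_at;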
    

Check Logs

  1. Laravel Application Logs: Check storage/logs/laravel.log for command execution details
  2. Job Queue Logs: Monitor failed jobs with the php artisan queue:failed command (see the commands below)
  3. Database Query Logs: Enable query logging to debug slow or failing database operations
  4. Slack Message History: Review notification channel for error patterns and success rates
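
For failed verification jobs, the standard Laravel queue commands apply:

    php artisan queue:failed         # list failed jobs with IDs and exceptions
    php artisan queue:retry all      # re-dispatch every failed job
    php artisan queue:retry <id>     # re-dispatch a single failed job by ID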

Common Issues and Solutions

  1. Database Deadlocks:

    • Symptom: Retry notifications in Slack
    • Solution: System automatically retries with exponential backoff (up to 5 attempts)
    • Prevention: Consider reducing batch size with --limit parameter
  2. No Matching Records:

    • Symptom: No status updates despite records in Crawling status
    • Solution: Verify Analyzer database contains recent data with proper timestamps
    • Check: Ensure crawler and analyzer systems are functioning properly
  3. Time Constraint Issues:

    • Symptom: Records remain in Crawling status despite data in Analyzer database
    • Solution: Check timestamp alignment between updated_at in Console and crawl_created_at in Analyzer
    • Fix: Verify system clocks are synchronized across databases
  4. High Processing Load:

    • Symptom: Command timeouts or slow execution
    • Solution: Reduce batch size using --limit=50 or lower
    • Monitor: Check database performance and queue worker capacity