Crawler Integration - Update Configurations

Command Signatures

php artisan plg-api:sending-configs-to-crawler --mode=update --data-type=SummaryProduct [--limit=100]
php artisan plg-api:sending-configs-to-crawler --mode=update --data-type=SummaryProductReview [--limit=100]
php artisan plg-api:sending-configs-to-crawler --mode=update --data-type=SummaryCategory [--limit=100]
php artisan plg-api:sending-configs-to-crawler --mode=update --data-type=SummarySearchQuery [--limit=100]

Purpose

These commands update existing crawl configurations in the Crawler system, one data type per invocation. Each run selects records in the corresponding summary wishlist table that were already sent (sending_status = Sent) and have since been modified, then sends the changed configuration data to the Crawler system through the Playground API as a bulk update.

Sequence Diagram

sequenceDiagram
    participant System
    participant Command as plg-api:sending-configs-to-crawler (update)
    participant Repository as SummaryWishlist*Repository
    participant Job as Summary*Job
    participant APIService as PlaygroundApiService
    participant Crawler as Crawler System
    participant Logger
    participant Slack
    
    Note over System,Slack: Crawler Configuration Update Flow (Every 5 Minutes)
    
    rect rgb(200, 255, 200)
    Note right of System: Happy Case - Normal Processing
    
    System->>Command: Execute with specific data type
    Command->>Logger: Log command start
    Command->>Command: Validate mode and data type parameters
    
    Command->>Repository: chunkDataSendToUpdate()
    Repository->>Repository: Query records WHERE sending_status = Sent AND needs update
    Repository-->>Command: Return chunk of records (max: limit option)
    
    rect rgb(200, 230, 255)
    alt Records Found
        Note right of Command: Job Processing
        Command->>Job: dispatch(records)
        
        Job->>Job: mapRecordsToData()
        Job->>APIService: bulkUpdate(mapped data)
        APIService->>Crawler: HTTP PUT /bulk-update
        Crawler-->>APIService: Response with updated configs
        APIService-->>Job: Return API response
        
        rect rgb(230, 200, 255)
        alt Success Response (200 OK)
            Note right of Job: Success Processing
            Job->>Repository: Keep sending_status as Sent
            Job->>Repository: Update crawl_config_id if changed
            Job->>Repository: Update crawl_status if needed
            Job->>Logger: Log success with statistics
            Job->>Slack: Send success notification
        else Bad Request (400)
            Note right of Job: Error Processing
            Job->>Repository: Update sending_status to Error
            Job->>Logger: Log error details
            Job->>Slack: Send error notification
        end
        end
    else No Records
        Note right of Command: No Data Scenario
        Command->>Logger: Log no records to process
    end
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of System: Error Handling
    rect rgb(255, 230, 230)
    alt Unexpected Error Occurs
        Command->>Logger: Log error details
        Command->>Slack: Send error notification with context
    end
    end
    end
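
The branches in the diagram can be condensed into a short sketch. This is plain PHP with stand-ins, not the project's code: processUpdateChunk, its callable parameters, the payload fields, and the string status values are hypothetical. The real flow uses chunkDataSendToUpdate(), a Summary*Job, and PlaygroundApiService, whose signatures this page does not show. Treating every non-200 response as an error is also an assumption, since the diagram only covers 200 and 400.

```php
<?php

/**
 * Hypothetical condensed version of the update flow for one chunk.
 *
 * @param array<int, array<string, mixed>> $records    Records selected for update.
 * @param callable                         $bulkUpdate Stand-in for the API call; returns an HTTP status code.
 * @param callable                         $log        Stand-in for the logger; receives a message string.
 * @return array<int, array<string, mixed>> Records with their resulting sending_status.
 */
function processUpdateChunk(array $records, callable $bulkUpdate, callable $log): array
{
    // "No Records" branch: log and stop.
    if ($records === []) {
        $log('No records to process');
        return [];
    }

    // Map rows to the payload sent to the Crawler (field choice is illustrative).
    $payload = array_map(
        fn (array $r): array => ['id' => $r['crawl_config_id'], 'schedule_id' => $r['schedule_id']],
        $records
    );

    $statusCode = $bulkUpdate($payload);

    // Assumption: anything other than 200 is handled like the 400 branch.
    $newStatus = $statusCode === 200 ? 'Sent' : 'Error';
    $log(sprintf('Bulk update returned %d for %d records', $statusCode, count($records)));

    return array_map(
        fn (array $r): array => array_merge($r, ['sending_status' => $newStatus]),
        $records
    );
}
```

Passing the API call and logger as callables keeps the sketch runnable without Laravel; in the application these roles belong to the job, the API service, and the logger shown in the diagram.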

Detail

Parameters

  • --mode=update: Required parameter specifying that existing configurations should be updated
  • --data-type: Required parameter specifying the type of data for which to update configurations
    • SummaryProduct: Product summary data
    • SummaryProductReview: Product review data
    • SummaryCategory: Category summary data
    • SummarySearchQuery: Search query summary data
  • --limit=N: Optional parameter to control the chunk size (default: 100)
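
The parameter rules above can be expressed as a small validation routine. The following is a minimal sketch in plain PHP, not the command's actual code: the function name validateUpdateOptions, the constant name, and the choice of exception are assumptions. Only the option names, the four data types, and the default limit of 100 come from this page.

```php
<?php

// Data types listed on this page for --data-type.
const SUPPORTED_DATA_TYPES = [
    'SummaryProduct',
    'SummaryProductReview',
    'SummaryCategory',
    'SummarySearchQuery',
];

/**
 * Hypothetical helper: validates and normalizes the command options.
 *
 * @param array<string, mixed> $options Raw option values keyed by option name.
 * @return array{mode: string, data_type: string, limit: int}
 */
function validateUpdateOptions(array $options): array
{
    $mode = $options['mode'] ?? null;
    if ($mode !== 'update') {
        throw new InvalidArgumentException('--mode must be "update" for this flow.');
    }

    $dataType = $options['data-type'] ?? null;
    if (!in_array($dataType, SUPPORTED_DATA_TYPES, true)) {
        throw new InvalidArgumentException('Unsupported --data-type.');
    }

    // --limit is optional; fall back to the documented default of 100.
    $limit = (int) ($options['limit'] ?? 100);
    if ($limit < 1) {
        throw new InvalidArgumentException('--limit must be a positive integer.');
    }

    return ['mode' => $mode, 'data_type' => $dataType, 'limit' => $limit];
}
```

For example, validateUpdateOptions(['mode' => 'update', 'data-type' => 'SummaryCategory']) returns a limit of 100, while an unknown data type raises an exception before any records are queried.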

Frequency

Every 5 minutes for each data type
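
In a Laravel application, a five-minute cadence is commonly registered with the task scheduler. The fragment below is a sketch only: it assumes the Laravel 11 style of defining schedules in routes/console.php and that the schedule lives in application code rather than in an external cron or orchestrator, neither of which this page confirms. The withoutOverlapping() call is an optional safeguard added for illustration, not something described here.

```php
<?php

use Illuminate\Support\Facades\Schedule;

// Assumption: one scheduled entry per data type, each running every 5 minutes.
$dataTypes = ['SummaryProduct', 'SummaryProductReview', 'SummaryCategory', 'SummarySearchQuery'];

foreach ($dataTypes as $dataType) {
    Schedule::command("plg-api:sending-configs-to-crawler --mode=update --data-type={$dataType} --limit=100")
        ->everyFiveMinutes()
        // Optional: skip a run if the previous one for this data type is still going.
        ->withoutOverlapping();
}
```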

Dependencies

  • Summary wishlist tables must contain records with sending_status = Sent that have been modified
  • Playground API service must be accessible
  • Valid API authentication tokens must be configured
  • Records must already have crawl_config_id values in the database from earlier create operations

Output

Tables

  • summary_wishlist_products: Updates sending_status, crawl_config_id, crawl_status
    • sending_status: Remains Sent for successful updates, changes to Error for failures
    • crawl_config_id: Updated if the Crawler returns a new ID
    • crawl_status: Updated based on Crawler response
  • summary_wishlist_product_reviews: Same field updates as products
  • summary_wishlist_categories: Same field updates as products
  • summary_wishlist_search_queries: Same field updates as products
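
The field rules above can be summarized as a pure function over one row. This is a sketch under assumptions: the function name applyUpdateResult, the string status values, and the response keys id and crawl_status are hypothetical, since this page does not show the Crawler response format or the integer codes stored in sending_status.

```php
<?php

/**
 * Hypothetical helper: returns the row as it would look after an update attempt.
 *
 * @param array<string, mixed>      $row      Current row from a summary_wishlist_* table.
 * @param int                       $httpCode HTTP status returned by the bulk update.
 * @param array<string, mixed>|null $config   Per-record data from the Crawler response, if any.
 * @return array<string, mixed>
 */
function applyUpdateResult(array $row, int $httpCode, ?array $config = null): array
{
    if ($httpCode !== 200) {
        // Failure: only sending_status changes.
        $row['sending_status'] = 'Error';
        return $row;
    }

    // Success: sending_status remains Sent.
    $row['sending_status'] = 'Sent';

    // crawl_config_id changes only when the Crawler returns a different ID.
    if (isset($config['id']) && $config['id'] !== ($row['crawl_config_id'] ?? null)) {
        $row['crawl_config_id'] = $config['id'];
    }

    // crawl_status follows the Crawler response when one is provided.
    if (isset($config['crawl_status'])) {
        $row['crawl_status'] = $config['crawl_status'];
    }

    return $row;
}
```

Because the function returns a new array instead of writing to the database, the before and after states of a row can be compared directly when reasoning about a failed update.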

Services

  • Playground API: Receives bulk update requests with modified crawler configurations
  • Crawler System: Updates existing crawl configurations based on summary data changes

Database Schema

erDiagram
    summary_wishlist_products {
        bigint id PK
        string input "The input of the product"
        string input_type "The type of the input: jan, asin, rakuten_id"
        bigint mall_id FK "Foreign key to malls table"
        integer schedule_id "The id of the schedule"
        integer schedule_priority "The priority of the schedule"
        integer sending_status "The status of the sending to crawler"
        bigint crawl_config_id "The id of the configs table from Crawler (nullable)"
        integer status "The status of the product"
    }
    
    summary_wishlist_product_reviews {
        bigint id PK
        bigint summary_wishlist_product_id FK "Foreign key to summary_wishlist_products (unique)"
        integer schedule_id "The id of the schedule"
        integer schedule_priority "The priority of the schedule"
        integer sending_status "The status of the sending to crawler"
        bigint crawl_config_id "The id of the configs table from Crawler (nullable)"
        integer status "The status of the product review"
    }
    
    summary_wishlist_categories {
        bigint id PK
        string category_id "The id of the category in the mall"
        bigint mall_id FK "Foreign key to malls table"
        integer schedule_id "The id of the schedule"
        integer schedule_priority "The priority of the schedule"
        integer sending_status "The status of the sending to crawler"
        bigint crawl_config_id "The id of the configs table from Crawler (nullable)"
        integer status "The status of the category"
    }
    
    summary_wishlist_search_queries {
        bigint id PK
        bigint mall_id FK "The id of the mall"
        string keyword "The keyword to search"
        integer schedule_id "The id of the schedule"
        integer schedule_priority "The priority of the schedule"
        integer sending_status "The status of the sending to crawler"
        bigint crawl_config_id "The id of the configs table from Crawler (nullable)"
        integer status "The status of the search query"
    }
    
    %% Relationships
    summary_wishlist_products ||--o{ summary_wishlist_product_reviews : "has reviews"

Error Handling

Log

  • Command execution start/end with data type and parameters
  • Success/failure of API calls with response codes
  • Record counts and batch processing information
  • Detailed error messages with file and line information for debugging

Slack

  • Success notifications with data type and processing statistics (records processed, configs updated)
  • Error notifications with detailed message and source information
  • Full error context including API response details and affected record counts

Troubleshooting

Check Data

  1. Verify summary_wishlist_* tables contain records with sending_status = Sent that have been modified
  2. Check that records have valid crawl_config_id values from previous create operations
  3. Ensure schedule_id and schedule_priority values are valid
  4. Validate that updated_at timestamps indicate recent modifications
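
The four checks above can be applied to a fetched row as follows. This is a minimal sketch: findUpdateBlockers, the 'Sent' string value, and the one-hour window for what counts as recent are assumptions. The updated_at column is taken from step 4 and is not listed in the schema on this page.

```php
<?php

/**
 * Hypothetical helper: lists reasons a row would not be picked up for an update.
 *
 * @param array<string, mixed> $row           Row from a summary_wishlist_* table.
 * @param int                  $now           Current Unix timestamp.
 * @param int                  $recentSeconds What counts as a recent modification (assumed: 1 hour).
 * @return string[] Empty when the row passes every check.
 */
function findUpdateBlockers(array $row, int $now, int $recentSeconds = 3600): array
{
    $problems = [];

    // Step 1: the row must already have been sent.
    if (($row['sending_status'] ?? null) !== 'Sent') {
        $problems[] = 'sending_status is not Sent';
    }

    // Step 2: a crawl_config_id from a previous create operation must exist.
    if (empty($row['crawl_config_id'])) {
        $problems[] = 'crawl_config_id is missing';
    }

    // Step 3: schedule fields must be integers.
    foreach (['schedule_id', 'schedule_priority'] as $field) {
        if (!is_int($row[$field] ?? null)) {
            $problems[] = "$field is not a valid integer";
        }
    }

    // Step 4: updated_at must parse and fall inside the recent window.
    $updatedAt = strtotime((string) ($row['updated_at'] ?? ''));
    if ($updatedAt === false || $now - $updatedAt > $recentSeconds) {
        $problems[] = 'updated_at does not indicate a recent modification';
    }

    return $problems;
}
```

An empty result means the row satisfies all four checks; otherwise each message points at the step that failed.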

Check Logs

  1. Monitor command execution logs for successful starts and completions
  2. Check API response logs for HTTP status codes and error messages
  3. Review Slack notifications for success/failure patterns
  4. Examine job queue logs for processing delays or failures
  5. Verify database update logs show proper status transitions
  6. Compare before/after configuration data to confirm updates were applied