Crawler Integration - Create Configurations

Command Signatures

php artisan plg-api:sending-configs-to-crawler --mode=create --data-type=SummaryProduct [--limit=100]
php artisan plg-api:sending-configs-to-crawler --mode=create --data-type=SummaryProductReview [--limit=100]
php artisan plg-api:sending-configs-to-crawler --mode=create --data-type=SummaryCategory [--limit=100]
php artisan plg-api:sending-configs-to-crawler --mode=create --data-type=SummarySearchQuery [--limit=100]

Purpose

These commands find records in the summary wishlist tables that still need a crawler configuration (sending_status is NotSent or Error) and send the configuration data for them to the Crawler system through the Playground API. Each data type has its own configuration format, so each data type is processed by its own command run.

Sequence Diagram

sequenceDiagram
    participant System
    participant Command as plg-api:sending-configs-to-crawler (create)
    participant Repository as SummaryWishlist*Repository
    participant Job as Summary*Job
    participant APIService as PlaygroundApiService
    participant Crawler as Crawler System
    participant Logger
    participant Slack
    
    Note over System,Slack: Crawler Configuration Creation Flow (Every 5 Minutes)
    
    rect rgb(200, 255, 200)
    Note right of System: Happy Case - Normal Processing
    
    System->>Command: Execute with specific data type
    Command->>Logger: Log command start
    Command->>Command: Validate mode and data type parameters
    
    Command->>Repository: chunkDataSendToCreate()
    Repository->>Repository: Query records WHERE sending_status IN (NotSent, Error)
    Repository-->>Command: Return chunk of records (max: limit option)
    
    rect rgb(200, 230, 255)
    alt Records Found
        Note right of Command: Job Processing
        Command->>Job: dispatch(records)
        
        Job->>Job: mapRecordsToData()
        Job->>APIService: bulkCreate(mapped data)
        APIService->>Crawler: HTTP POST /bulk-create
        Crawler-->>APIService: Response with created configs
        APIService-->>Job: Return API response
        
        rect rgb(230, 200, 255)
        alt Success Response (201 Created)
            Note right of Job: Success Processing
            Job->>Repository: Update sending_status to Sent
            Job->>Repository: Update crawl_config_id from response
            Job->>Repository: Update crawl_status to NotCrawled
            Job->>Logger: Log success with statistics
            Job->>Slack: Send success notification
        else Bad Request (400)
            Note right of Job: Error Processing
            Job->>Repository: Update sending_status to Error
            Job->>Logger: Log error details
            Job->>Slack: Send error notification
        end
        end
    else No Records
        Note right of Command: No Data Scenario
        Command->>Logger: Log no records to process
    end
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of System: Error Handling
    rect rgb(255, 230, 230)
    opt Unexpected Error Occurs
        Command->>Logger: Log error details
        Command->>Slack: Send error notification with context
    end
    end
    end
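
The command side of the diagram can be sketched as follows. This is a language-neutral illustration in Python, not the Laravel implementation: chunk_pending and run_create are hypothetical names, the status values are shown as labels rather than the real integer codes, and it assumes a single run walks through every pending chunk (the diagram only says that each chunk holds at most --limit records).

```python
from typing import Callable, Dict, Iterator, List

# Status labels as named in this document; the real column is an integer
# whose concrete values are not shown here.
PENDING = ("NotSent", "Error")


def chunk_pending(records: List[Dict], limit: int = 100) -> Iterator[List[Dict]]:
    """Yield lists of at most `limit` records that still need a configuration."""
    pending = [r for r in records if r["sending_status"] in PENDING]
    for start in range(0, len(pending), limit):
        yield pending[start:start + limit]


def run_create(records: List[Dict], limit: int,
               dispatch: Callable[[List[Dict]], None],
               log: Callable[[str], None]) -> int:
    """Dispatch one job per chunk; return the number of jobs dispatched."""
    log("command start")
    jobs = 0
    for chunk in chunk_pending(records, limit):
        dispatch(chunk)  # stands in for dispatching a Summary*Job with the chunk
        jobs += 1
    if jobs == 0:
        log("no records to process")
    return jobs
```

Passing dispatch and log as plain callables keeps the sketch independent of any queue or logging backend; in the real command these roles are played by the job class and the Logger shown in the diagram.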

Detail

Parameters

  • --mode=create: Required parameter specifying that new configurations should be created
  • --data-type: Required parameter specifying the type of data for which to create configurations
    • SummaryProduct: Product summary data
    • SummaryProductReview: Product review data
    • SummaryCategory: Category summary data
    • SummarySearchQuery: Search query summary data
  • --limit=N: Optional parameter to control the chunk size (default: 100)
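
The validation step ("Validate mode and data type parameters") can be illustrated with a small sketch. It uses Python's argparse purely for illustration; the real command is a Laravel artisan command, parse_options is a hypothetical name, and only the create mode is listed because it is the only mode this page describes.

```python
import argparse

# Data types accepted by --data-type, as listed above.
DATA_TYPES = [
    "SummaryProduct",
    "SummaryProductReview",
    "SummaryCategory",
    "SummarySearchQuery",
]


def parse_options(argv):
    """Parse and validate the three options described in this section."""
    parser = argparse.ArgumentParser(prog="plg-api:sending-configs-to-crawler")
    parser.add_argument("--mode", required=True, choices=["create"])
    parser.add_argument("--data-type", required=True, choices=DATA_TYPES)
    parser.add_argument("--limit", type=int, default=100)  # chunk size
    return parser.parse_args(argv)
```

For example, parse_options(["--mode=create", "--data-type=SummaryCategory"]) yields limit 100, while an unknown data type makes argparse exit with a usage error.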

Frequency

Every 5 minutes for each data type

Dependencies

  • Summary wishlist tables must contain records with sending_status = NotSent or Error
  • Playground API service must be accessible
  • Valid API authentication tokens must be available

Output

Tables

  • summary_wishlist_products: Updates sending_status, crawl_config_id, crawl_status
    • sending_status: Changes from NotSent/Error to Sent on success; set to Error when the Crawler returns 400 Bad Request
    • crawl_config_id: Populated with ID from Crawler response
    • crawl_status: Set to NotCrawled for successful creations
  • summary_wishlist_product_reviews: Same field updates as products
  • summary_wishlist_categories: Same field updates as products
  • summary_wishlist_search_queries: Same field updates as products
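
The field updates above can be summarized in a short sketch. It is an illustration under assumptions, not the job's actual code: apply_bulk_create_response is a hypothetical name, rows are plain dictionaries, status values are shown as labels, and the way the Crawler response maps config IDs to records (here a dictionary keyed by record id) is assumed because the response format is not documented on this page.

```python
def apply_bulk_create_response(rows, status_code, created_ids):
    """Update rows in place according to the bulk-create response.

    created_ids: assumed mapping of record id -> Crawler config id.
    """
    if status_code == 201:
        for row in rows:
            row["sending_status"] = "Sent"
            row["crawl_config_id"] = created_ids[row["id"]]
            row["crawl_status"] = "NotCrawled"
    elif status_code == 400:
        # Only sending_status changes; the other fields are left untouched.
        for row in rows:
            row["sending_status"] = "Error"
    else:
        # Other codes are not described here; treat them as unexpected errors.
        raise ValueError(f"unexpected status code: {status_code}")
    return rows
```

Rows marked Error stay eligible for the next run, since the selection query picks up both NotSent and Error.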

Services

  • Playground API: Receives bulk create requests with crawler configurations
  • Crawler System: Creates new crawl configurations based on summary data

Database Schema

erDiagram
    summary_wishlist_products {
        bigint id PK
        string input "The input of the product"
        string input_type "The type of the input: jan, asin, rakuten_id"
        bigint mall_id FK "Foreign key to malls table"
        integer schedule_id "The id of the schedule"
        integer schedule_priority "The priority of the schedule"
        integer sending_status "The status of the sending to crawler"
        bigint crawl_config_id "The id of the configs table from Crawler (nullable)"
        integer status "The status of the product"
    }
    
    summary_wishlist_product_reviews {
        bigint id PK
        bigint summary_wishlist_product_id FK "Foreign key to summary_wishlist_products (unique)"
        integer schedule_id "The id of the schedule"
        integer schedule_priority "The priority of the schedule"
        integer sending_status "The status of the sending to crawler"
        bigint crawl_config_id "The id of the configs table from Crawler (nullable)"
        integer status "The status of the product review"
    }
    
    summary_wishlist_categories {
        bigint id PK
        string category_id "The id of the category in the mall"
        bigint mall_id FK "Foreign key to malls table"
        integer schedule_id "The id of the schedule"
        integer schedule_priority "The priority of the schedule"
        integer sending_status "The status of the sending to crawler"
        bigint crawl_config_id "The id of the configs table from Crawler (nullable)"
        integer status "The status of the category"
    }
    
    summary_wishlist_search_queries {
        bigint id PK
        bigint mall_id FK "Foreign key to malls table"
        string keyword "The keyword to search"
        integer schedule_id "The id of the schedule"
        integer schedule_priority "The priority of the schedule"
        integer sending_status "The status of the sending to crawler"
        bigint crawl_config_id "The id of the configs table from Crawler (nullable)"
        integer status "The status of the search query"
    }
    
    %% Relationships
    summary_wishlist_products ||--o| summary_wishlist_product_reviews : "has review"
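
Because summary_wishlist_product_id is marked unique, a product can have at most one row in summary_wishlist_product_reviews. The sketch below demonstrates that constraint with an in-memory SQLite database. SQLite is used only to keep the example self-contained, the tables are reduced to the two columns involved, and add_review is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE summary_wishlist_products (id INTEGER PRIMARY KEY);
CREATE TABLE summary_wishlist_product_reviews (
    id INTEGER PRIMARY KEY,
    summary_wishlist_product_id INTEGER UNIQUE
        REFERENCES summary_wishlist_products(id)
);
INSERT INTO summary_wishlist_products (id) VALUES (1);
INSERT INTO summary_wishlist_product_reviews (summary_wishlist_product_id) VALUES (1);
""")


def add_review(product_id):
    """Try to add a review row; return False if the unique key rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO summary_wishlist_product_reviews "
                "(summary_wishlist_product_id) VALUES (?)",
                (product_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Here add_review(1) is rejected because product 1 already has a review row.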

Error Handling

Log

  • Command execution start/end with data type and parameters
  • Success/failure of API calls with response codes
  • Record counts and batch processing information
  • Detailed error messages with file and line information for debugging

Slack

  • Success notifications with data type and processing statistics (records processed, configs created)
  • Error notifications with detailed message and source information
  • Full error context including API response details and affected record counts

Troubleshooting

Check Data

  1. Verify summary_wishlist_* tables contain records with sending_status = NotSent or Error
  2. Check that records have valid schedule_id and schedule_priority values
  3. Ensure mall_id references exist in the malls table
  4. Validate input_type values for products (jan, asin, rakuten_id)
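
Checks 2 to 4 can be run as queries. The sketch below shows one possible form for the products table, written against Python's sqlite3 module so it is self-contained; the production database and its SQL dialect may differ. find_problem_products is a hypothetical helper, and "valid" schedule values are assumed to mean non-NULL because no other rule is stated here.

```python
import sqlite3

# One query per data check; each returns the ids of offending product rows.
CHECKS = {
    "missing_schedule": """
        SELECT id FROM summary_wishlist_products
        WHERE schedule_id IS NULL OR schedule_priority IS NULL
        ORDER BY id""",
    "unknown_mall": """
        SELECT p.id FROM summary_wishlist_products p
        LEFT JOIN malls m ON m.id = p.mall_id
        WHERE m.id IS NULL
        ORDER BY p.id""",
    "bad_input_type": """
        SELECT id FROM summary_wishlist_products
        WHERE input_type IS NULL
           OR input_type NOT IN ('jan', 'asin', 'rakuten_id')
        ORDER BY id""",
}


def find_problem_products(conn: sqlite3.Connection) -> dict:
    """Return {check name: [product ids]} for the data checks listed above."""
    return {name: [row[0] for row in conn.execute(sql)]
            for name, sql in CHECKS.items()}
```

An empty list for every check means the product rows pass these data checks; the same pattern applies to the other summary_wishlist_* tables with their own columns.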

Check Logs

  1. Monitor command execution logs for successful starts and completions
  2. Check API response logs for HTTP status codes and error messages
  3. Review Slack notifications for success/failure patterns
  4. Examine job queue logs for processing delays or failures
  5. Verify database update logs show proper status transitions