Dataset Create

Command Signature

php artisan dataset:create {wishlist_slug?}

Purpose

The dataset:create command identifies eligible wishlist groups with active subscriptions, collects analyzed data from the Console (gb_console) and Analyzer (gb_analyzer) databases, validates that data against minimum-count thresholds, and sends a structured dataset creation request to the TV Python API. Wishlists without enough successfully processed data are not submitted, so datasets used for machine learning analysis are built only from sufficient data.

Sequence Diagram

Step 1: Command Initialization and Wishlist Eligibility

sequenceDiagram
    participant System
    participant CreateCommand as dataset:create
    participant WishlistRepo as WishlistToGroupRepository
    participant ConsoleDB[(gb_console.wishlist_to_groups)]
    participant Logger
    participant Slack
    
    Note over System,Slack: Command Initialization (Every 30 Minutes)
    
    rect rgb(200, 255, 200)
    Note right of System: Happy Case - Command Startup
    System->>CreateCommand: Execute Command
    CreateCommand->>Logger: Log command start with wishlist_slug parameter
    
    CreateCommand->>WishlistRepo: getEligibleWishlists(wishlist_slug?)
    WishlistRepo->>ConsoleDB: Query active wishlists with subscriptions
    Note right of WishlistRepo: WHERE status=1 AND admin_status=1 AND subscription active
    ConsoleDB-->>WishlistRepo: Return eligible wishlist groups
    WishlistRepo-->>CreateCommand: Return active wishlists with subscriptions
    CreateCommand->>Logger: Log eligible wishlists count
    end
    
    rect rgb(255, 200, 200)
    Note right of System: Error Handling
    rect rgb(255, 230, 230)
    alt Database Connection Error
        WishlistRepo->>Logger: Log database connection error
        WishlistRepo->>Slack: Send database error notification
    else No Eligible Wishlists
        CreateCommand->>Logger: Log no wishlists to process
    else Invalid Wishlist Slug
        CreateCommand->>Logger: Log invalid wishlist slug error
        CreateCommand->>Slack: Send invalid parameter notification
    end
    end
    end
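
The eligibility rule noted in the diagram (status = 1, admin_status = 1, active subscription) can be sketched in plain PHP. This is only an illustration: the function name filterEligibleWishlists and the row shape are hypothetical, and the documented flow applies the rule in SQL through WishlistToGroupRepository rather than in memory.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory version of the eligibility check.
 * Assumed row shape: ['slug' => string, 'status' => int,
 * 'admin_status' => int, 'subscription_status' => string].
 */
function filterEligibleWishlists(array $rows): array
{
    return array_values(array_filter(
        $rows,
        static fn (array $row): bool =>
            $row['status'] === 1
            && $row['admin_status'] === 1
            && $row['subscription_status'] === 'active'
    ));
}

// Example rows (invented for illustration).
$rows = [
    ['slug' => 'kitchen', 'status' => 1, 'admin_status' => 1, 'subscription_status' => 'active'],
    ['slug' => 'garden', 'status' => 1, 'admin_status' => 0, 'subscription_status' => 'active'],
    ['slug' => 'audio', 'status' => 1, 'admin_status' => 1, 'subscription_status' => 'canceled'],
];

$eligible = filterEligibleWishlists($rows);
echo count($eligible), " eligible wishlist(s)\n";
```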

Step 2: Data Collection from Multiple Sources

sequenceDiagram
    participant CreateCommand as dataset:create
    participant ProductRepo as ProductRepository
    participant CategoryRepo as CategoryRankingRepository
    participant SearchRepo as SearchQueryRankingRepository
    participant ViewpointRepo as ReviewSentenceRepository
    participant AnalyzerDB[(gb_analyzer)]
    participant ConsoleDB[(gb_console)]
    participant Logger
    participant Slack
    
    Note over CreateCommand,Slack: Data Collection Phase
    
    rect rgb(200, 255, 200)
    Note right of CreateCommand: Happy Case - Data Retrieval
    
    loop For each Eligible Wishlist
        CreateCommand->>Logger: Log wishlist processing start
        
        rect rgb(200, 230, 255)
        Note right of CreateCommand: Product Data Collection
        CreateCommand->>ProductRepo: getProductsForWishlist(wishlistId)
        ProductRepo->>AnalyzerDB: Query products with rankings WHERE crawl_status = 2
        AnalyzerDB-->>ProductRepo: Return product data with rankings
        ProductRepo-->>CreateCommand: Return product data with rankings
        CreateCommand->>Logger: Log product data count
        end
        
        rect rgb(200, 230, 255)
        Note right of CreateCommand: Category Data Collection
        CreateCommand->>CategoryRepo: getCategoriesForWishlist(wishlistId)
        CategoryRepo->>AnalyzerDB: Query category_rankings with product relationships
        AnalyzerDB-->>CategoryRepo: Return category rankings
        CategoryRepo-->>CreateCommand: Return category rankings
        CreateCommand->>Logger: Log category data count
        end
        
        rect rgb(200, 230, 255)
        Note right of CreateCommand: Search Query Data Collection
        CreateCommand->>SearchRepo: getSearchQueriesForWishlist(wishlistId)
        SearchRepo->>AnalyzerDB: Query search_query_rankings with product relationships
        AnalyzerDB-->>SearchRepo: Return search query rankings
        SearchRepo-->>CreateCommand: Return search query rankings
        CreateCommand->>Logger: Log search query data count
        end
        
        rect rgb(200, 230, 255)
        Note right of CreateCommand: Viewpoint Data Collection
        CreateCommand->>ViewpointRepo: getSpecificViewpoints(wishlistId)
        ViewpointRepo->>AnalyzerDB: Query review_sentences with viewpoint associations
        AnalyzerDB-->>ViewpointRepo: Return viewpoint associations
        ViewpointRepo-->>CreateCommand: Return viewpoint associations
        CreateCommand->>Logger: Log viewpoint data count
        end
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of CreateCommand: Error Handling
    rect rgb(255, 230, 230)
    alt Insufficient Product Data
        CreateCommand->>Logger: Log insufficient product data
        CreateCommand->>CreateCommand: Skip wishlist processing
    else Database Query Error
        CreateCommand->>Logger: Log database query error
        CreateCommand->>Slack: Send data collection error notification
    else Data Integrity Error
        CreateCommand->>Logger: Log data integrity validation error
        CreateCommand->>Slack: Send data integrity error notification
    end
    end
    end
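
The collection loop above could be sketched as follows. The callables stand in for the four repository methods, and the function name collectWishlistData and the array keys are assumptions made for this sketch. It returns null when no product data is found, mirroring the "Insufficient Product Data" branch that skips the wishlist.

```php
<?php
declare(strict_types=1);

/**
 * Collects the four data sets for one wishlist.
 * Each $sources entry is a callable standing in for a repository method
 * (hypothetical: the real repositories query gb_analyzer).
 * Returns null when no product data is found so the caller can skip the wishlist.
 */
function collectWishlistData(int $wishlistId, array $sources): ?array
{
    $data = [];
    foreach (['products', 'categories', 'search_queries', 'viewpoints'] as $key) {
        $data[$key] = $sources[$key]($wishlistId);
    }

    return $data['products'] === [] ? null : $data;
}

// Stub sources with invented data.
$sources = [
    'products' => static fn (int $id): array => [['id' => 10, 'crawl_status' => 2]],
    'categories' => static fn (int $id): array => [],
    'search_queries' => static fn (int $id): array => [],
    'viewpoints' => static fn (int $id): array => [],
];

$data = collectWishlistData(1, $sources);
echo $data === null ? "skipped\n" : count($data['products']) . " product(s) collected\n";
```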

Step 3: Data Threshold Validation

sequenceDiagram
    participant CreateCommand as dataset:create
    participant ConfigService as Configuration Service
    participant Logger
    participant Slack
    
    Note over CreateCommand,Slack: Data Threshold Validation Process
    
    rect rgb(200, 255, 200)
    Note right of CreateCommand: Happy Case - Threshold Validation
    
    rect rgb(255, 255, 200)
    Note right of CreateCommand: Threshold Checking
    CreateCommand->>ConfigService: getMinProductSuccessCount()
    ConfigService-->>CreateCommand: Return MIN_PRODUCT_SUCCESS_COUNT (2)
    
    CreateCommand->>ConfigService: getMinNewProductSuccessCount()
    ConfigService-->>CreateCommand: Return MIN_NEW_PRODUCT_SUCCESS_COUNT (1)
    
    CreateCommand->>CreateCommand: validateDataThresholds()
    CreateCommand->>CreateCommand: countSuccessfulProducts()
    CreateCommand->>CreateCommand: countSuccessfulCategories()
    CreateCommand->>CreateCommand: countSuccessfulSearchQueries()
    
    CreateCommand->>Logger: Log threshold validation results
    end
    
    rect rgb(200, 230, 255)
    alt Thresholds Met
        Note right of CreateCommand: Validation Successful
        CreateCommand->>CreateCommand: prepareDatasetRequest()
        CreateCommand->>Logger: Log threshold validation success
    else Thresholds Not Met
        Note right of CreateCommand: Validation Failed
        CreateCommand->>CreateCommand: updateWishlistWithError()
        CreateCommand->>Logger: Log threshold validation failure with details
        CreateCommand->>Slack: Send threshold failure notification
    end
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of CreateCommand: Error Handling
    rect rgb(255, 230, 230)
    alt Configuration Error
        CreateCommand->>Logger: Log configuration retrieval error
        CreateCommand->>Slack: Send configuration error notification
    else Validation Logic Error
        CreateCommand->>Logger: Log validation logic error
        CreateCommand->>Slack: Send validation error notification
    end
    end
    end
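
A minimal sketch of the threshold check, assuming PHP 7.4 or later. The defaults (2 and 1) come from the values shown in the diagram. The is_new flag used to identify new products is an assumption, since this document does not say how new products are marked, and the function names are hypothetical.

```php
<?php
declare(strict_types=1);

const CRAWL_STATUS_SUCCESS = 2; // crawl_status = 2 means Success in this document

/** Counts rows whose crawl_status is Success. */
function countSuccessful(array $rows): int
{
    return count(array_filter(
        $rows,
        static fn (array $row): bool => ($row['crawl_status'] ?? null) === CRAWL_STATUS_SUCCESS
    ));
}

/**
 * Returns a list of human-readable failures; an empty list means the thresholds are met.
 * The 'is_new' flag is an assumption made for this sketch.
 */
function validateThresholds(array $products, int $minSuccess = 2, int $minNewSuccess = 1): array
{
    $success = countSuccessful($products);
    $newSuccess = countSuccessful(array_filter(
        $products,
        static fn (array $row): bool => ($row['is_new'] ?? false) === true
    ));

    $failures = [];
    if ($success < $minSuccess) {
        $failures[] = "successful products: {$success} < {$minSuccess}";
    }
    if ($newSuccess < $minNewSuccess) {
        $failures[] = "successful new products: {$newSuccess} < {$minNewSuccess}";
    }

    return $failures;
}
```

Returning the failure messages rather than a boolean keeps the actual and required counts available for the log entry and the Slack notification.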

Step 4: TV Python API Integration

sequenceDiagram
    participant CreateCommand as dataset:create
    participant AnalyzerAPI as TV Python API
    participant APIService as AnalyzerApiService
    participant Logger
    participant Slack
    
    Note over CreateCommand,Slack: TV Python API Dataset Creation
    
    rect rgb(200, 255, 200)
    Note right of CreateCommand: Happy Case - API Integration
    
    rect rgb(200, 230, 255)
    Note right of CreateCommand: API Request Preparation
    CreateCommand->>APIService: prepareDatasetRequest(structuredData)
    APIService->>APIService: validateRequestStructure()
    APIService->>APIService: addVersioningHeaders()
    APIService-->>CreateCommand: Return formatted request
    CreateCommand->>Logger: Log API request preparation
    end
    
    rect rgb(255, 255, 200)
    Note right of CreateCommand: API Submission
    CreateCommand->>AnalyzerAPI: createDataset(structuredData)
    AnalyzerAPI->>AnalyzerAPI: Validate request payload
    AnalyzerAPI->>AnalyzerAPI: Create dataset in ML pipeline
    AnalyzerAPI-->>CreateCommand: Return dataset ID and status
    CreateCommand->>Logger: Log successful API submission with dataset_id
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of CreateCommand: Error Handling
    rect rgb(255, 230, 230)
    alt API Authentication Error
        AnalyzerAPI->>Logger: Log authentication failure
        AnalyzerAPI->>Slack: Send API authentication error
        AnalyzerAPI->>CreateCommand: Return authentication error
    else API Service Error
        AnalyzerAPI->>Logger: Log API service error with response
        AnalyzerAPI->>Slack: Send API service error notification
        AnalyzerAPI->>CreateCommand: Return service error
    else Request Validation Error
        AnalyzerAPI->>Logger: Log request validation failure
        AnalyzerAPI->>Slack: Send request format error
        AnalyzerAPI->>CreateCommand: Return validation error
    else API Timeout Error
        AnalyzerAPI->>Logger: Log API timeout error
        AnalyzerAPI->>Slack: Send timeout error notification
        AnalyzerAPI->>CreateCommand: Return timeout error
    end
    end
    end
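
The four error branches above could be distinguished by HTTP status, as in this sketch (PHP 8.0 or later for match). The status-to-category mapping and the function name classifyApiFailure are assumptions for illustration; this document does not state which status codes the TV Python API returns.

```php
<?php
declare(strict_types=1);

/**
 * Maps a failed API call to one of the four error branches in the diagram.
 * $httpStatus is null when no response arrived at all (treated as a timeout here).
 * Only call this for failed requests; 2xx responses are not handled.
 */
function classifyApiFailure(?int $httpStatus): string
{
    return match (true) {
        $httpStatus === null => 'timeout',
        in_array($httpStatus, [401, 403], true) => 'authentication',
        in_array($httpStatus, [400, 422], true) => 'request_validation',
        in_array($httpStatus, [408, 504], true) => 'timeout',
        default => 'service',
    };
}

foreach ([401, 422, 504, 500, null] as $status) {
    echo var_export($status, true), ' => ', classifyApiFailure($status), "\n";
}
```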

Step 5: Database Record Creation and Logging

sequenceDiagram
    participant CreateCommand as dataset:create
    participant HistoryRepo as WishlistDatasetHistoryRepository
    participant LogService as DatasetCreationLogService
    participant ConsoleDB[(gb_console)]
    participant Logger
    participant Slack
    
    Note over CreateCommand,Slack: Database Record Management
    
    rect rgb(200, 255, 200)
    Note right of CreateCommand: Happy Case - Record Creation
    
    rect rgb(200, 230, 255)
    Note right of CreateCommand: Dataset History Creation
    CreateCommand->>HistoryRepo: createDatasetHistory(datasetId, config)
    HistoryRepo->>ConsoleDB: BEGIN TRANSACTION
    HistoryRepo->>ConsoleDB: INSERT INTO wishlist_dataset_histories
    ConsoleDB-->>HistoryRepo: Return history record ID
    HistoryRepo->>ConsoleDB: COMMIT TRANSACTION
    HistoryRepo-->>CreateCommand: Confirm history record created
    CreateCommand->>Logger: Log dataset history creation success
    end
    
    rect rgb(200, 230, 255)
    Note right of CreateCommand: Event Logging
    CreateCommand->>LogService: logSuccessEvent(wishlist, dataset)
    LogService->>ConsoleDB: INSERT INTO wishlist_dataset_creation_logs
    ConsoleDB-->>LogService: Return log record ID
    LogService-->>CreateCommand: Confirm event logged
    CreateCommand->>Logger: Log success event creation
    end
    
    rect rgb(230, 200, 255)
    Note right of CreateCommand: Success Monitoring
    CreateCommand->>Logger: Log dataset creation success with statistics
    CreateCommand->>Slack: Send dataset creation success notification
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of CreateCommand: Error Handling
    rect rgb(255, 230, 230)
    alt Database Transaction Error
        HistoryRepo->>ConsoleDB: ROLLBACK TRANSACTION
        HistoryRepo->>Logger: Log transaction rollback error
        HistoryRepo->>Slack: Send database error notification
        HistoryRepo->>CreateCommand: Return transaction error
    else Log Creation Failure
        LogService->>Logger: Log creation failure with details
        LogService->>Slack: Send log creation failure notification
        LogService->>CreateCommand: Return log error
    else Record Constraint Error
        CreateCommand->>Logger: Log constraint violation error
        CreateCommand->>Slack: Send constraint error notification
    end
    end
    end
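
The begin, commit, and rollback sequence above follows the usual transaction-wrapper pattern. The sketch below (PHP 8.0 or later) uses a hypothetical RecordingConnection that only records calls, so it runs without a database; it is not the repository's actual implementation.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for a database connection; it only records calls. */
final class RecordingConnection
{
    public array $calls = [];

    public function begin(): void
    {
        $this->calls[] = 'BEGIN';
    }

    public function commit(): void
    {
        $this->calls[] = 'COMMIT';
    }

    public function rollBack(): void
    {
        $this->calls[] = 'ROLLBACK';
    }
}

/**
 * Runs $work inside a transaction: commits on success,
 * rolls back and rethrows on any failure.
 */
function runInTransaction(RecordingConnection $db, callable $work): mixed
{
    $db->begin();
    try {
        $result = $work();
        $db->commit();

        return $result;
    } catch (Throwable $e) {
        $db->rollBack();
        throw $e;
    }
}
```

Rethrowing after the rollback lets the caller log the failure event and send the Slack notification, as the error branch in the diagram describes.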

Step 6: Notification and User Communication

sequenceDiagram
    participant CreateCommand as dataset:create
    participant NotificationService as DatasetNotificationService
    participant PusherService as Pusher Service
    participant EmailService as Email Service
    participant ConsoleDB[(gb_console.notifications)]
    participant Logger
    participant Slack
    
    Note over CreateCommand,Slack: Notification and Communication
    
    rect rgb(200, 255, 200)
    Note right of CreateCommand: Happy Case - Notification Delivery
    
    rect rgb(200, 230, 255)
    Note right of CreateCommand: In-App Notifications
    CreateCommand->>NotificationService: sendSuccessNotification()
    NotificationService->>ConsoleDB: INSERT INTO notifications
    ConsoleDB-->>NotificationService: Return notification ID
    NotificationService->>PusherService: Send real-time notification
    PusherService-->>NotificationService: Confirm delivery
    NotificationService-->>CreateCommand: Confirm notification sent
    CreateCommand->>Logger: Log in-app notification success
    end
    
    rect rgb(200, 230, 255)
    Note right of CreateCommand: Email Notifications
    CreateCommand->>EmailService: sendDatasetCreationEmail(wishlist, dataset)
    EmailService->>EmailService: Prepare email template
    EmailService->>EmailService: Send to group members
    EmailService-->>CreateCommand: Confirm email sent
    CreateCommand->>Logger: Log email notification success
    end
    
    rect rgb(230, 200, 255)
    Note right of CreateCommand: Administrative Monitoring
    CreateCommand->>Slack: Send admin success notification with metrics
    CreateCommand->>Logger: Log administrative notification success
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of CreateCommand: Error Handling
    rect rgb(255, 230, 230)
    alt Notification Storage Error
        NotificationService->>Logger: Log notification storage error
        NotificationService->>Slack: Send notification error alert
    else Pusher Service Error
        PusherService->>Logger: Log Pusher service error
        PusherService->>CreateCommand: Fallback to database-only notification
    else Email Service Error
        EmailService->>Logger: Log email service error
        EmailService->>Slack: Send email error notification
    end
    end
    end
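
The Pusher fallback in the error branch above can be sketched like this. The function name deliverInAppNotification and the two callables (stand-ins for the database insert and the Pusher call) are hypothetical. A storage failure is left to propagate, since the diagram treats it as a separate error.

```php
<?php
declare(strict_types=1);

/**
 * Stores the notification first, then attempts real-time delivery.
 * If the push fails, the stored record is kept (database-only fallback).
 * $store and $push are hypothetical stand-ins for the real services.
 */
function deliverInAppNotification(array $notification, callable $store, callable $push): string
{
    $id = $store($notification); // exceptions here propagate to the caller

    try {
        $push($id, $notification);

        return 'stored_and_pushed';
    } catch (RuntimeException $e) {
        return 'stored_only';
    }
}

$store = static fn (array $n): int => 1;
$failingPush = static function (int $id, array $n): void {
    throw new RuntimeException('Pusher unavailable');
};

echo deliverInAppNotification(['title' => 'Dataset created'], $store, $failingPush), "\n";
```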

Step 7: Command Completion and Error Recovery

sequenceDiagram
    participant CreateCommand as dataset:create
    participant LogService as DatasetCreationLogService
    participant System
    participant Logger
    participant Slack
    
    Note over CreateCommand,Slack: Command Completion and Error Recovery
    
    rect rgb(200, 255, 200)
    Note right of CreateCommand: Happy Case - Successful Completion
    
    rect rgb(230, 200, 255)
    Note right of CreateCommand: Command Summary
    CreateCommand->>Logger: Log command completion with overall statistics
    CreateCommand->>Logger: Log total wishlists processed and success rate
    CreateCommand->>Logger: Log execution time and performance metrics
    CreateCommand->>Slack: Send command completion summary with metrics
    end
    end
    
    rect rgb(255, 200, 200)
    Note right of CreateCommand: Error Handling and Recovery
    rect rgb(255, 230, 230)
    alt API Error Recovery
        CreateCommand->>Logger: Log API error details with retry information
        CreateCommand->>LogService: logFailureEvent(wishlist, apiError)
        CreateCommand->>CreateCommand: Roll back transaction
        CreateCommand->>Slack: Send API error alert with resolution steps
    else Database Error Recovery
        CreateCommand->>Logger: Log database error details
        CreateCommand->>CreateCommand: Roll back transaction
        CreateCommand->>Slack: Send database error notification
        CreateCommand->>LogService: logFailureEvent(wishlist, databaseError)
    else Threshold Validation Failure
        CreateCommand->>CreateCommand: updateWishlistWithError()
        CreateCommand->>LogService: logFailureEvent(wishlist, thresholdError)
        CreateCommand->>Logger: Log threshold failure with specific counts
    else Critical System Error
        CreateCommand->>Logger: Log critical system error
        CreateCommand->>Slack: Send critical error alert
        CreateCommand->>System: Halt further processing
    end
    end
    end
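
The completion summary (wishlists processed, success rate, execution time) could be computed as in the sketch below. The outcome labels, array keys, and function name summarizeRun are assumptions for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Builds the end-of-run summary.
 * $outcomes maps a wishlist slug to 'success' or 'failure' (labels assumed here).
 * Timestamps are in seconds, for example from microtime(true).
 */
function summarizeRun(array $outcomes, float $startedAt, float $finishedAt): array
{
    $total = count($outcomes);
    $succeeded = count(array_filter(
        $outcomes,
        static fn (string $outcome): bool => $outcome === 'success'
    ));

    return [
        'total' => $total,
        'succeeded' => $succeeded,
        'failed' => $total - $succeeded,
        // Avoid division by zero when nothing was eligible.
        'success_rate' => $total === 0 ? 0.0 : round($succeeded / $total * 100, 1),
        'duration_seconds' => round($finishedAt - $startedAt, 3),
    ];
}
```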

Detail

Parameters

  • {wishlist_slug?}: Optional parameter to create a dataset for a specific wishlist
    • When provided, the command will only process the specified wishlist group
    • When omitted, all eligible wishlist groups will be processed
    • Must match an existing wishlist slug in the database
    • Useful for manual dataset creation or testing specific wishlists
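
The optional-slug behavior described above can be sketched as follows. The function name selectWishlists and the row shape are hypothetical. Throwing on an unknown slug corresponds to the "Invalid Wishlist Slug" branch in Step 1.

```php
<?php
declare(strict_types=1);

/**
 * Narrows the eligible wishlists to one slug when a slug is given.
 * With no slug, every eligible wishlist is returned unchanged.
 */
function selectWishlists(array $eligible, ?string $slug): array
{
    if ($slug === null) {
        return $eligible;
    }

    $matches = array_values(array_filter(
        $eligible,
        static fn (array $wishlist): bool => $wishlist['slug'] === $slug
    ));

    if ($matches === []) {
        throw new InvalidArgumentException("Unknown or ineligible wishlist slug: {$slug}");
    }

    return $matches;
}
```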

Frequency

  • Scheduled Execution: Every 30 minutes
    • Configured in routes/console.php using Laravel's scheduler
    • Example: Schedule::command('dataset:create')->everyThirtyMinutes();
  • Manual Execution: Can be triggered manually for specific wishlists
    • Used for testing or immediate dataset creation needs

Dependencies

Database Dependencies:

  • Console database (gb_console) connection for wishlist and dataset records
  • Analyzer database (gb_analyzer) connection for product, category, and search data
  • Minimum product success count: 2 (configurable via MIN_PRODUCT_SUCCESS_COUNT)
  • Minimum new product success count: 1 (configurable via MIN_NEW_PRODUCT_SUCCESS_COUNT)
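
As a sketch of how the two thresholds might be read, assuming they are exposed as environment variables with the names above (this document does not say whether they are environment variables or config keys), a small helper can fall back to the documented defaults. The helper name thresholdFromEnv is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Reads a non-negative integer threshold from the environment.
 * Falls back to $default when the variable is unset or not a plain integer.
 */
function thresholdFromEnv(string $name, int $default): int
{
    $raw = getenv($name);
    if ($raw === false || !ctype_digit($raw)) {
        return $default;
    }

    return (int) $raw;
}

$minProducts = thresholdFromEnv('MIN_PRODUCT_SUCCESS_COUNT', 2);
$minNewProducts = thresholdFromEnv('MIN_NEW_PRODUCT_SUCCESS_COUNT', 1);
echo "min products: {$minProducts}, min new products: {$minNewProducts}\n";
```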

External Service Dependencies:

  • TV Python API service availability and authentication
  • Valid API credentials configured in environment variables
  • Network connectivity to the API service
  • API version compatibility (configured under the analyzer_api.dataset.version key in config/analyzer_api.php)

System Dependencies:

  • Wishlist groups with an active subscription whose payment status is paid
  • Successfully processed product, category, and search data with crawl_status = 2 (Success)
  • Properly configured dataset settings in config/analyzer_api.php
  • Review sentence data for viewpoint-specific analysis

Output

Tables

The dataset creation command interacts with multiple database tables. For complete table structures, field definitions, and relationships, see the Database Schema section.

Primary Output Tables:

  • wishlist_dataset_histories: Creates new dataset tracking records with API dataset ID
  • wishlist_dataset_creation_logs: Logs all creation events (Request, Success, Failure)
  • wishlist_to_groups: Updates error messages and manual request timestamps

Command-Specific Operations:

  • Creates: New records in wishlist_dataset_histories with dataset ID from API
  • Logs: Success/failure events in wishlist_dataset_creation_logs with detailed data
  • Updates: Wishlist error messages and manual request flags

Services

TV Python API:

  • Dataset creation endpoint with structured request payload
  • Returns dataset ID and initial status for tracking
  • Handles configuration versioning and API compatibility

Notification Services:

  • In-app notifications via DatasetNotification class for users
  • Email notifications for dataset creation completion/failure
  • Slack alerts for administrators via DatasetSlackChannel
  • Real-time updates via Pusher for connected clients

Repository Services:

  • WishlistToGroupRepositoryInterface: Eligibility filtering with subscription validation
  • WishlistDatasetHistoryRepositoryInterface: Dataset history CRUD operations
  • DatasetCreationLogService: Event logging with structured data
  • ProductRepositoryInterface: Product data retrieval from Analyzer database
  • CategoryRankingRepositoryInterface: Category ranking data with product relationships
  • SearchQueryRankingRepositoryInterface: Search query ranking data with product relationships
  • ReviewSentenceRepositoryInterface: Specific viewpoint data for sentiment analysis

Error Handling

Log

The system generates comprehensive logs for troubleshooting dataset creation issues:

Dataset Creation Errors:

  • API communication failures with full request/response details
  • Data validation errors with specific threshold information and counts
  • Database transaction failures with rollback details and affected records
  • Configuration errors with missing parameter information and suggestions

Eligibility Validation Errors:

  • Subscription validation failures with subscription status details
  • Data threshold failures with actual vs required counts
  • Wishlist configuration errors with specific field validation issues

Log Locations:

  • Application logs: storage/logs/laravel.log with contextual information
  • Command-specific logs with execution statistics and performance metrics
  • Error logs with full stack traces and request details for debugging

Slack

Automated Slack notifications are sent via DatasetSlackChannel for operational monitoring:

Success Notifications:

  • Dataset creation completion with processing statistics and timing
  • Batch processing summaries with success/failure counts
  • Performance metrics including API response times and data volumes

Error Notifications:

  • API communication failures with error codes and retry information
  • Database operation failures with affected wishlist details
  • Configuration issues with resolution suggestions and documentation links
  • Critical system errors requiring immediate administrative attention

Notification Format:

  • Command name and execution timestamp for tracking
  • Error type and severity level for prioritization
  • Affected wishlist groups and dataset IDs for investigation
  • Suggested troubleshooting steps and documentation references
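
A message carrying the fields listed above might be assembled as in this sketch. The layout, the function name formatSlackAlert, and the parameter names are assumptions. The actual format produced by DatasetSlackChannel is not shown in this document.

```php
<?php
declare(strict_types=1);

/**
 * Builds a plain-text alert with command, timestamp, error type,
 * severity, and affected wishlists (hypothetical layout).
 */
function formatSlackAlert(
    string $command,
    string $errorType,
    string $severity,
    array $wishlistSlugs,
    DateTimeImmutable $at
): string {
    return implode("\n", [
        sprintf('[%s] %s at %s', strtoupper($severity), $command, $at->format(DATE_ATOM)),
        "Error type: {$errorType}",
        'Affected wishlists: ' . ($wishlistSlugs === [] ? 'none' : implode(', ', $wishlistSlugs)),
    ]);
}

echo formatSlackAlert(
    'dataset:create',
    'api_timeout',
    'critical',
    ['kitchen', 'garden'],
    new DateTimeImmutable('2024-01-01T00:00:00+00:00')
), "\n";
```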

Troubleshooting

Check Data

Verify Wishlist Eligibility:

-- Check active wishlist groups with valid subscriptions
SELECT wtg.id, wtg.name, wtg.status, wtg.admin_status, 
       wtg.training_schedule, wtg.manual_request_dataset_at,
       s.status as subscription_status, sh.payment_status
FROM wishlist_to_groups wtg
JOIN subscriptions s ON wtg.subscription_id = s.id
JOIN subscription_histories sh ON s.id = sh.subscription_id
WHERE wtg.status = 1 AND wtg.admin_status = 1 
AND s.status = 'active' AND sh.payment_status = 'paid'
AND (wtg.training_schedule != 'manual' OR wtg.manual_request_dataset_at IS NOT NULL);

Verify Data Thresholds:

-- Check successful product, category, and search query counts per wishlist group.
-- COUNT(DISTINCT ...) keeps the three LEFT JOINs from multiplying each other's rows
-- (assumes each summary table has a unique id column).
SELECT wtg.id, wtg.name,
       COUNT(DISTINCT CASE WHEN swp.crawl_status = 2 THEN swp.id END) as success_products,
       COUNT(DISTINCT CASE WHEN swc.crawl_status = 2 THEN swc.id END) as success_categories,
       COUNT(DISTINCT CASE WHEN swsq.crawl_status = 2 THEN swsq.id END) as success_search_queries
FROM wishlist_to_groups wtg
LEFT JOIN summary_wishlist_products swp ON wtg.id = swp.wishlist_to_group_id
LEFT JOIN summary_wishlist_categories swc ON wtg.id = swc.wishlist_to_group_id
LEFT JOIN summary_wishlist_search_queries swsq ON wtg.id = swsq.wishlist_to_group_id
GROUP BY wtg.id, wtg.name
HAVING success_products >= 2; -- MIN_PRODUCT_SUCCESS_COUNT

Check Recent Dataset Creation Attempts:

-- Review recent dataset creation events
SELECT wdcl.*, wtg.name as wishlist_name
FROM wishlist_dataset_creation_logs wdcl
JOIN wishlist_to_groups wtg ON wdcl.wishlist_to_group_id = wtg.id
WHERE wdcl.created_at > DATE_SUB(NOW(), INTERVAL 1 DAY)
ORDER BY wdcl.created_at DESC LIMIT 20;

Check Logs

Application Logs:

# Check recent dataset:create command logs
tail -f storage/logs/laravel.log | grep -E "dataset:create"

# Check API communication logs
grep "AnalyzerApiService.*dataset.*create" storage/logs/laravel.log | tail -20

# Check threshold validation errors
grep -E "(MIN_PRODUCT_SUCCESS_COUNT|MIN_NEW_PRODUCT_SUCCESS_COUNT)" storage/logs/laravel.log | tail -10

# Check error patterns
grep -E "(ERROR|CRITICAL)" storage/logs/laravel.log | grep "dataset:create" | tail -10

Database Logs:

-- Check failed dataset creation attempts
SELECT wdh.*, wtg.name as wishlist_name
FROM wishlist_dataset_histories wdh
JOIN wishlist_to_groups wtg ON wdh.wishlist_to_group_id = wtg.id
WHERE wdh.created_at > DATE_SUB(NOW(), INTERVAL 1 DAY)
AND wdh.dataset_id IS NULL
ORDER BY wdh.created_at DESC;

-- Check wishlist error messages
SELECT id, name, error_message, updated_at
FROM wishlist_to_groups
WHERE error_message IS NOT NULL
AND updated_at > DATE_SUB(NOW(), INTERVAL 1 DAY);

API Response Validation:

# Test API connectivity
curl -X GET "https://api.analyzer.example.com/health" \
  -H "Authorization: Bearer YOUR_API_TOKEN"

# Test dataset creation endpoint
curl -X POST "https://api.analyzer.example.com/datasets?version=1" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"dataset_name": "test", "dataset_type": 1, "settings": {}}'

Performance Monitoring:

  • Monitor command execution times in logs for performance degradation
  • Check database query performance for large wishlist datasets
  • Verify API response times and timeout configurations
  • Review memory usage during data collection phases for optimization opportunities
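
Execution time and memory for a single phase could be captured with a small wrapper like the sketch below. The function name measure and the returned keys are hypothetical; hrtime() and memory_get_peak_usage() are standard PHP functions.

```php
<?php
declare(strict_types=1);

/**
 * Runs one phase and reports its elapsed time and the process peak memory.
 * Peak memory covers the whole process so far, not only this phase.
 */
function measure(callable $phase): array
{
    $start = hrtime(true); // monotonic clock, nanoseconds
    $result = $phase();

    return [
        'result' => $result,
        'elapsed_ms' => (hrtime(true) - $start) / 1e6,
        'peak_memory_mb' => memory_get_peak_usage(true) / 1048576,
    ];
}

$stats = measure(static fn (): int => array_sum(range(1, 100000)));
printf("elapsed: %.2f ms, peak memory: %.1f MB\n", $stats['elapsed_ms'], $stats['peak_memory_mb']);
```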