Dataset Create
Command Signature
php artisan dataset:create {wishlist_slug?}
Purpose
The dataset:create command identifies eligible wishlist groups with active subscriptions, collects analyzed data from both the Console (gb_console) and Analyzer (gb_analyzer) databases, validates that data against minimum thresholds, and sends structured dataset creation requests to the TV Python API. Only wishlists with enough successfully processed data are used for dataset generation, which keeps the input quality high for the downstream machine learning analysis.
Sequence Diagram
Step 1: Command Initialization and Wishlist Eligibility
sequenceDiagram
participant System
participant CreateCommand as dataset:create
participant WishlistRepo as WishlistToGroupRepository
participant ConsoleDB as gb_console.wishlist_to_groups
participant Logger
participant Slack
Note over System,Slack: Command Initialization (Every 30 Minutes)
rect rgb(200, 255, 200)
Note right of System: Happy Case - Command Startup
System->>CreateCommand: Execute Command
CreateCommand->>Logger: Log command start with wishlist_slug parameter
CreateCommand->>WishlistRepo: getEligibleWishlists(wishlist_slug?)
WishlistRepo->>ConsoleDB: Query active wishlists with subscriptions
Note right of WishlistRepo: WHERE status=1 AND admin_status=1 AND subscription active
ConsoleDB-->>WishlistRepo: Return eligible wishlist groups
WishlistRepo-->>CreateCommand: Return active wishlists with subscriptions
CreateCommand->>Logger: Log eligible wishlists count
end
rect rgb(255, 200, 200)
Note right of System: Error Handling
rect rgb(255, 230, 230)
alt Database Connection Error
WishlistRepo->>Logger: Log database connection error
WishlistRepo->>Slack: Send database error notification
else No Eligible Wishlists
CreateCommand->>Logger: Log no wishlists to process
else Invalid Wishlist Slug
CreateCommand->>Logger: Log invalid wishlist slug error
CreateCommand->>Slack: Send invalid parameter notification
end
end
end
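The eligibility rules in Step 1 (status = 1, admin_status = 1, an active subscription, plus the manual-schedule condition used by the eligibility query in the Troubleshooting section) can be illustrated with a small in-memory filter. This is a minimal Python sketch, not the repository's implementation: `WishlistGroup`, `get_eligible_wishlists`, and `InvalidWishlistSlug` are hypothetical names, and the real filtering is done in SQL inside WishlistToGroupRepository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WishlistGroup:
    # Hypothetical in-memory stand-in for a gb_console.wishlist_to_groups row.
    id: int
    slug: str
    status: int
    admin_status: int
    subscription_active: bool
    training_schedule: str = "auto"
    manual_request_dataset_at: Optional[str] = None


class InvalidWishlistSlug(ValueError):
    """Raised when an explicit slug matches no wishlist group."""


def get_eligible_wishlists(groups: List[WishlistGroup],
                           wishlist_slug: Optional[str] = None) -> List[WishlistGroup]:
    """Apply the Step 1 eligibility rules, optionally narrowed to one slug."""
    if wishlist_slug is not None:
        groups = [g for g in groups if g.slug == wishlist_slug]
        if not groups:
            # "Invalid Wishlist Slug" branch: the real command logs and alerts Slack.
            raise InvalidWishlistSlug(wishlist_slug)
    return [
        g for g in groups
        if g.status == 1
        and g.admin_status == 1
        and g.subscription_active
        # Manual-schedule groups qualify only after a dataset was explicitly requested.
        and (g.training_schedule != "manual" or g.manual_request_dataset_at is not None)
    ]
```

An empty result corresponds to the "No Eligible Wishlists" branch, which is only logged.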
Step 2: Data Collection from Multiple Sources
sequenceDiagram
participant CreateCommand as dataset:create
participant ProductRepo as ProductRepository
participant CategoryRepo as CategoryRankingRepository
participant SearchRepo as SearchQueryRankingRepository
participant ViewpointRepo as ReviewSentenceRepository
participant AnalyzerDB as gb_analyzer
participant ConsoleDB as gb_console
participant Logger
participant Slack
Note over CreateCommand,Slack: Data Collection Phase
rect rgb(200, 255, 200)
Note right of CreateCommand: Happy Case - Data Retrieval
loop For each Eligible Wishlist
CreateCommand->>Logger: Log wishlist processing start
rect rgb(200, 230, 255)
Note right of CreateCommand: Product Data Collection
CreateCommand->>ProductRepo: getProductsForWishlist(wishlistId)
ProductRepo->>AnalyzerDB: Query products with rankings WHERE crawl_status = 2
AnalyzerDB-->>ProductRepo: Return product data with rankings
ProductRepo-->>CreateCommand: Return product data with rankings
CreateCommand->>Logger: Log product data count
end
rect rgb(200, 230, 255)
Note right of CreateCommand: Category Data Collection
CreateCommand->>CategoryRepo: getCategoriesForWishlist(wishlistId)
CategoryRepo->>AnalyzerDB: Query category_rankings with product relationships
AnalyzerDB-->>CategoryRepo: Return category rankings
CategoryRepo-->>CreateCommand: Return category rankings
CreateCommand->>Logger: Log category data count
end
rect rgb(200, 230, 255)
Note right of CreateCommand: Search Query Data Collection
CreateCommand->>SearchRepo: getSearchQueriesForWishlist(wishlistId)
SearchRepo->>AnalyzerDB: Query search_query_rankings with product relationships
AnalyzerDB-->>SearchRepo: Return search query rankings
SearchRepo-->>CreateCommand: Return search query rankings
CreateCommand->>Logger: Log search query data count
end
rect rgb(200, 230, 255)
Note right of CreateCommand: Viewpoint Data Collection
CreateCommand->>ViewpointRepo: getSpecificViewpoints(wishlistId)
ViewpointRepo->>AnalyzerDB: Query review_sentences with viewpoint associations
AnalyzerDB-->>ViewpointRepo: Return viewpoint associations
ViewpointRepo-->>CreateCommand: Return viewpoint associations
CreateCommand->>Logger: Log viewpoint data count
end
end
end
rect rgb(255, 200, 200)
Note right of CreateCommand: Error Handling
rect rgb(255, 230, 230)
alt Insufficient Product Data
CreateCommand->>Logger: Log insufficient product data
CreateCommand->>CreateCommand: Skip wishlist processing
else Database Query Error
CreateCommand->>Logger: Log database query error
CreateCommand->>Slack: Send data collection error notification
else Data Integrity Error
CreateCommand->>Logger: Log data integrity validation error
CreateCommand->>Slack: Send data integrity error notification
end
end
end
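The per-wishlist collection loop in Step 2 can be sketched as below, assuming each repository call is modelled as a plain function that returns rows as dictionaries. `collect_wishlist_data` and `collect_all` are hypothetical names. The sketch keeps only rows with crawl_status = 2, skips wishlists with no usable product data, and assumes a query failure is isolated to the wishlist that caused it (the diagram itself only specifies logging and a Slack alert).

```python
from typing import Callable, Dict, List, Tuple

CRAWL_SUCCESS = 2  # crawl_status value documented as "Success"
Fetcher = Callable[[int], List[dict]]


def collect_wishlist_data(wishlist_id: int, fetchers: Dict[str, Fetcher]) -> Dict[str, List[dict]]:
    """Call each source's fetcher and keep only successfully crawled rows."""
    collected = {}
    for source, fetch in fetchers.items():
        rows = fetch(wishlist_id)
        # Rows without a crawl_status (e.g. viewpoint associations) are kept as-is.
        collected[source] = [r for r in rows if r.get("crawl_status", CRAWL_SUCCESS) == CRAWL_SUCCESS]
    return collected


def collect_all(wishlist_ids: List[int], fetchers: Dict[str, Fetcher],
                log: Callable[[str], None]) -> Tuple[Dict[int, dict], List[int]]:
    results, failed = {}, []
    for wid in wishlist_ids:
        try:
            data = collect_wishlist_data(wid, fetchers)
        except Exception as exc:  # e.g. a database query error
            log(f"wishlist {wid}: data collection failed: {exc}")
            failed.append(wid)
            continue
        if not data.get("products"):
            log(f"wishlist {wid}: insufficient product data, skipping")
            continue
        results[wid] = data
        log(f"wishlist {wid}: " + ", ".join(f"{k}={len(v)}" for k, v in data.items()))
    return results, failed
```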
Step 3: Data Threshold Validation
sequenceDiagram
participant CreateCommand as dataset:create
participant ConfigService as Configuration Service
participant Logger
participant Slack
Note over CreateCommand,Slack: Data Threshold Validation Process
rect rgb(200, 255, 200)
Note right of CreateCommand: Happy Case - Threshold Validation
rect rgb(255, 255, 200)
Note right of CreateCommand: Threshold Checking
CreateCommand->>ConfigService: getMinProductSuccessCount()
ConfigService-->>CreateCommand: Return MIN_PRODUCT_SUCCESS_COUNT (2)
CreateCommand->>ConfigService: getMinNewProductSuccessCount()
ConfigService-->>CreateCommand: Return MIN_NEW_PRODUCT_SUCCESS_COUNT (1)
CreateCommand->>CreateCommand: validateDataThresholds()
CreateCommand->>CreateCommand: countSuccessfulProducts()
CreateCommand->>CreateCommand: countSuccessfulCategories()
CreateCommand->>CreateCommand: countSuccessfulSearchQueries()
CreateCommand->>Logger: Log threshold validation results
end
rect rgb(200, 230, 255)
alt Thresholds Met
Note right of CreateCommand: Validation Successful
CreateCommand->>CreateCommand: prepareDatasetRequest()
CreateCommand->>Logger: Log threshold validation success
else Thresholds Not Met
Note right of CreateCommand: Validation Failed
CreateCommand->>CreateCommand: updateWishlistWithError()
CreateCommand->>Logger: Log threshold validation failure with details
CreateCommand->>Slack: Send threshold failure notification
end
end
end
rect rgb(255, 200, 200)
Note right of CreateCommand: Error Handling
rect rgb(255, 230, 230)
alt Configuration Error
CreateCommand->>Logger: Log configuration retrieval error
CreateCommand->>Slack: Send configuration error notification
else Validation Logic Error
CreateCommand->>Logger: Log validation logic error
CreateCommand->>Slack: Send validation error notification
end
end
end
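A minimal sketch of the Step 3 threshold check, using the documented defaults MIN_PRODUCT_SUCCESS_COUNT = 2 and MIN_NEW_PRODUCT_SUCCESS_COUNT = 1. How the command decides that a product is "new" is not documented here, so the sketch assumes a hypothetical `is_new` flag on each product row. `validate_data_thresholds` and `ThresholdResult` are illustrative names; the error strings report actual versus required counts, which is what the threshold failure logs are described as containing.

```python
from dataclasses import dataclass, field
from typing import List

MIN_PRODUCT_SUCCESS_COUNT = 2
MIN_NEW_PRODUCT_SUCCESS_COUNT = 1


@dataclass
class ThresholdResult:
    passed: bool
    errors: List[str] = field(default_factory=list)


def validate_data_thresholds(products: List[dict],
                             min_products: int = MIN_PRODUCT_SUCCESS_COUNT,
                             min_new: int = MIN_NEW_PRODUCT_SUCCESS_COUNT) -> ThresholdResult:
    success = [p for p in products if p["crawl_status"] == 2]
    # "is_new" is a hypothetical flag standing in for the real "new product" rule.
    new_success = [p for p in success if p.get("is_new")]
    errors = []
    if len(success) < min_products:
        errors.append(f"successful products {len(success)} < required {min_products}")
    if len(new_success) < min_new:
        errors.append(f"successful new products {len(new_success)} < required {min_new}")
    return ThresholdResult(passed=not errors, errors=errors)
```

When `passed` is false, the command would take the "Thresholds Not Met" branch (updateWishlistWithError plus a Slack alert) and could store the joined error strings as the wishlist's error message.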
Step 4: TV Python API Integration
sequenceDiagram
participant CreateCommand as dataset:create
participant AnalyzerAPI as TV Python API
participant APIService as AnalyzerApiService
participant Logger
participant Slack
Note over CreateCommand,Slack: TV Python API Dataset Creation
rect rgb(200, 255, 200)
Note right of CreateCommand: Happy Case - API Integration
rect rgb(200, 230, 255)
Note right of CreateCommand: API Request Preparation
CreateCommand->>APIService: prepareDatasetRequest(structuredData)
APIService->>APIService: validateRequestStructure()
APIService->>APIService: addVersioningHeaders()
APIService-->>CreateCommand: Return formatted request
CreateCommand->>Logger: Log API request preparation
end
rect rgb(255, 255, 200)
Note right of CreateCommand: API Submission
CreateCommand->>AnalyzerAPI: createDataset(structuredData)
AnalyzerAPI->>AnalyzerAPI: Validate request payload
AnalyzerAPI->>AnalyzerAPI: Create dataset in ML pipeline
AnalyzerAPI-->>CreateCommand: Return dataset ID and status
CreateCommand->>Logger: Log successful API submission with dataset_id
end
end
rect rgb(255, 200, 200)
Note right of CreateCommand: Error Handling
rect rgb(255, 230, 230)
alt API Authentication Error
AnalyzerAPI->>Logger: Log authentication failure
AnalyzerAPI->>Slack: Send API authentication error
AnalyzerAPI->>CreateCommand: Return authentication error
else API Service Error
AnalyzerAPI->>Logger: Log API service error with response
AnalyzerAPI->>Slack: Send API service error notification
AnalyzerAPI->>CreateCommand: Return service error
else Request Validation Error
AnalyzerAPI->>Logger: Log request validation failure
AnalyzerAPI->>Slack: Send request format error
AnalyzerAPI->>CreateCommand: Return validation error
else API Timeout Error
AnalyzerAPI->>Logger: Log API timeout error
AnalyzerAPI->>Slack: Send timeout error notification
AnalyzerAPI->>CreateCommand: Return timeout error
end
end
end
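The request sent in Step 4 can be illustrated with Python's standard library, using the endpoint shape from the curl example in the Troubleshooting section (`POST /datasets?version=N` with a Bearer token and a JSON body). The host, `build_dataset_request`, and the status-code mapping in `classify_api_error` are assumptions for illustration; the actual AnalyzerApiService may use different endpoints, headers, or error semantics.

```python
import json
import urllib.request


def build_dataset_request(base_url: str, token: str, version: int, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the dataset creation endpoint."""
    url = f"{base_url.rstrip('/')}/datasets?version={version}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def classify_api_error(status: int) -> str:
    """Map an HTTP status to the Step 4 error branches (assumed mapping)."""
    if status in (401, 403):
        return "authentication"
    if status in (400, 422):
        return "request_validation"
    if status >= 500:
        return "service"
    return "unexpected"
```

To send the request, `urllib.request.urlopen(req, timeout=30)` can be used: a timeout surfaces as an exception from that call (the API Timeout Error branch), and a non-2xx response raises `urllib.error.HTTPError`, whose `code` attribute can be passed to `classify_api_error`.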
Step 5: Database Record Creation and Logging
sequenceDiagram
participant CreateCommand as dataset:create
participant HistoryRepo as WishlistDatasetHistoryRepository
participant LogService as DatasetCreationLogService
participant ConsoleDB as gb_console
participant Logger
participant Slack
Note over CreateCommand,Slack: Database Record Management
rect rgb(200, 255, 200)
Note right of CreateCommand: Happy Case - Record Creation
rect rgb(200, 230, 255)
Note right of CreateCommand: Dataset History Creation
CreateCommand->>HistoryRepo: createDatasetHistory(datasetId, config)
HistoryRepo->>ConsoleDB: BEGIN TRANSACTION
HistoryRepo->>ConsoleDB: INSERT INTO wishlist_dataset_histories
ConsoleDB-->>HistoryRepo: Return history record ID
HistoryRepo->>ConsoleDB: COMMIT TRANSACTION
HistoryRepo-->>CreateCommand: Confirm history record created
CreateCommand->>Logger: Log dataset history creation success
end
rect rgb(200, 230, 255)
Note right of CreateCommand: Event Logging
CreateCommand->>LogService: logSuccessEvent(wishlist, dataset)
LogService->>ConsoleDB: INSERT INTO wishlist_dataset_creation_logs
ConsoleDB-->>LogService: Return log record ID
LogService-->>CreateCommand: Confirm event logged
CreateCommand->>Logger: Log success event creation
end
rect rgb(230, 200, 255)
Note right of CreateCommand: Success Monitoring
CreateCommand->>Logger: Log dataset creation success with statistics
CreateCommand->>Slack: Send dataset creation success notification
end
end
rect rgb(255, 200, 200)
Note right of CreateCommand: Error Handling
rect rgb(255, 230, 230)
alt Database Transaction Error
HistoryRepo->>ConsoleDB: ROLLBACK TRANSACTION
HistoryRepo->>Logger: Log transaction rollback error
HistoryRepo->>Slack: Send database error notification
HistoryRepo->>CreateCommand: Return transaction error
else Log Creation Failure
LogService->>Logger: Log creation failure with details
LogService->>Slack: Send log creation failure notification
LogService->>CreateCommand: Return log error
else Record Constraint Error
CreateCommand->>Logger: Log constraint violation error
CreateCommand->>Slack: Send constraint error notification
end
end
end
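The BEGIN / INSERT / COMMIT / ROLLBACK pattern in Step 5 can be demonstrated with `sqlite3` from the Python standard library. The table columns are hypothetical minimal versions (the real definitions are in the Database Schema section), and this sketch deliberately writes the history row and its Success log entry in one transaction, whereas the diagram commits the history row before the log write. A UNIQUE constraint on `dataset_id` stands in for the Record Constraint Error branch; `create_schema` and `record_dataset` are illustrative names.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical minimal columns, for illustration only.
    conn.executescript("""
        CREATE TABLE wishlist_dataset_histories (
            id INTEGER PRIMARY KEY,
            wishlist_to_group_id INTEGER NOT NULL,
            dataset_id TEXT NOT NULL UNIQUE
        );
        CREATE TABLE wishlist_dataset_creation_logs (
            id INTEGER PRIMARY KEY,
            wishlist_to_group_id INTEGER NOT NULL,
            event TEXT NOT NULL
        );
    """)


def record_dataset(conn: sqlite3.Connection, wishlist_id: int, dataset_id: str):
    """Insert the history row and its Success log atomically; return the history ID or None."""
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            cur = conn.execute(
                "INSERT INTO wishlist_dataset_histories (wishlist_to_group_id, dataset_id) VALUES (?, ?)",
                (wishlist_id, dataset_id),
            )
            conn.execute(
                "INSERT INTO wishlist_dataset_creation_logs (wishlist_to_group_id, event) VALUES (?, 'Success')",
                (wishlist_id,),
            )
            return cur.lastrowid
    except sqlite3.IntegrityError:
        # Constraint violation: nothing from this attempt is persisted.
        return None
```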
Step 6: Notification and User Communication
sequenceDiagram
participant CreateCommand as dataset:create
participant NotificationService as DatasetNotificationService
participant PusherService as Pusher Service
participant EmailService as Email Service
participant ConsoleDB as gb_console.notifications
participant Logger
participant Slack
Note over CreateCommand,Slack: Notification and Communication
rect rgb(200, 255, 200)
Note right of CreateCommand: Happy Case - Notification Delivery
rect rgb(200, 230, 255)
Note right of CreateCommand: In-App Notifications
CreateCommand->>NotificationService: sendSuccessNotification()
NotificationService->>ConsoleDB: INSERT INTO notifications
ConsoleDB-->>NotificationService: Return notification ID
NotificationService->>PusherService: Send real-time notification
PusherService-->>NotificationService: Confirm delivery
NotificationService-->>CreateCommand: Confirm notification sent
CreateCommand->>Logger: Log in-app notification success
end
rect rgb(200, 230, 255)
Note right of CreateCommand: Email Notifications
CreateCommand->>EmailService: sendDatasetCreationEmail(wishlist, dataset)
EmailService->>EmailService: Prepare email template
EmailService->>EmailService: Send to group members
EmailService-->>CreateCommand: Confirm email sent
CreateCommand->>Logger: Log email notification success
end
rect rgb(230, 200, 255)
Note right of CreateCommand: Administrative Monitoring
CreateCommand->>Slack: Send admin success notification with metrics
CreateCommand->>Logger: Log administrative notification success
end
end
rect rgb(255, 200, 200)
Note right of CreateCommand: Error Handling
rect rgb(255, 230, 230)
alt Notification Storage Error
NotificationService->>Logger: Log notification storage error
NotificationService->>Slack: Send notification error alert
else Pusher Service Error
PusherService->>Logger: Log Pusher service error
PusherService->>CreateCommand: Fallback to database-only notification
else Email Service Error
EmailService->>Logger: Log email service error
EmailService->>Slack: Send email error notification
end
end
end
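The degradation path in Step 6 (persist the notification first, then treat Pusher and email failures as non-fatal) can be sketched with injected callables. `send_success_notification` and its parameters are hypothetical; in the Laravel command these roles belong to DatasetNotificationService, the Pusher service, and the email service, and Slack alerting is omitted here for brevity.

```python
from typing import Callable


def send_success_notification(
    store: Callable[[dict], int],
    push: Callable[[int, dict], None],
    email: Callable[[dict], None],
    log: Callable[[str], None],
    payload: dict,
) -> dict:
    """Persist the notification, then attempt real-time and email delivery."""
    result = {"notification_id": None, "pushed": False, "emailed": False}
    try:
        result["notification_id"] = store(payload)
    except Exception as exc:
        # Notification Storage Error: without a stored row there is nothing to push.
        log(f"notification storage error: {exc}")
    if result["notification_id"] is not None:
        try:
            push(result["notification_id"], payload)
            result["pushed"] = True
        except Exception as exc:
            # Pusher Service Error: the stored row remains, i.e. database-only delivery.
            log(f"pusher error, falling back to database-only notification: {exc}")
    try:
        email(payload)
        result["emailed"] = True
    except Exception as exc:
        log(f"email service error: {exc}")
    return result
```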
Step 7: Command Completion and Error Recovery
sequenceDiagram
participant CreateCommand as dataset:create
participant LogService as DatasetCreationLogService
participant System
participant Logger
participant Slack
Note over CreateCommand,Slack: Command Completion and Error Recovery
rect rgb(200, 255, 200)
Note right of CreateCommand: Happy Case - Successful Completion
rect rgb(230, 200, 255)
Note right of CreateCommand: Command Summary
CreateCommand->>Logger: Log command completion with overall statistics
CreateCommand->>Logger: Log total wishlists processed and success rate
CreateCommand->>Logger: Log execution time and performance metrics
CreateCommand->>Slack: Send command completion summary with metrics
end
end
rect rgb(255, 200, 200)
Note right of CreateCommand: Error Handling and Recovery
rect rgb(255, 230, 230)
alt API Error Recovery
CreateCommand->>Logger: Log API error details with retry information
CreateCommand->>LogService: logFailureEvent(wishlist, apiError)
CreateCommand->>CreateCommand: Rollback transaction
CreateCommand->>Slack: Send API error alert with resolution steps
else Database Error Recovery
CreateCommand->>Logger: Log database error details
CreateCommand->>CreateCommand: Rollback transaction
CreateCommand->>Slack: Send database error notification
CreateCommand->>LogService: logFailureEvent(wishlist, databaseError)
else Threshold Validation Failure
CreateCommand->>CreateCommand: updateWishlistWithError()
CreateCommand->>LogService: logFailureEvent(wishlist, thresholdError)
CreateCommand->>Logger: Log threshold failure with specific counts
else Critical System Error
CreateCommand->>Logger: Log critical system error
CreateCommand->>Slack: Send critical error alert
CreateCommand->>System: Halt further processing
end
end
end
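The completion and recovery behaviour in Step 7 amounts to a loop in which per-wishlist failures (API, database, threshold) are recorded and processing continues, while a critical error halts the run; the summary then reports totals, success rate, and execution time. The sketch assumes that split. The exception classes and `run_command` are hypothetical, and any transaction rollback is assumed to happen inside `process` before the exception propagates.

```python
import time


class ThresholdError(Exception): pass
class ApiError(Exception): pass
class DatabaseError(Exception): pass
class CriticalError(Exception): pass


def run_command(wishlists, process, log_failure, log):
    """Process each wishlist; recoverable errors are logged, critical errors stop the run."""
    started = time.monotonic()
    succeeded, failed = 0, 0
    for w in wishlists:
        try:
            process(w)
            succeeded += 1
        except (ThresholdError, ApiError, DatabaseError) as exc:
            failed += 1
            # Stands in for DatasetCreationLogService::logFailureEvent.
            log_failure(w, type(exc).__name__, str(exc))
        except CriticalError as exc:
            failed += 1
            log(f"critical error on {w}: {exc}; halting")
            break
    total = succeeded + failed
    summary = {
        "processed": total,
        "succeeded": succeeded,
        "success_rate": succeeded / total if total else 0.0,
        "elapsed_s": round(time.monotonic() - started, 3),
    }
    log(f"dataset:create finished: {summary}")
    return summary
```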
Detail
Parameters
{wishlist_slug?}: Optional parameter to create a dataset for a specific wishlist
- When provided, the command will only process the specified wishlist group
- When omitted, all eligible wishlist groups will be processed
- Must match an existing wishlist slug in the database
- Useful for manual dataset creation or testing specific wishlists
Frequency
- Scheduled Execution: Every 30 minutes
  - Configured in routes/console.php using Laravel's scheduler
  - Example: Schedule::command('dataset:create')->everyThirtyMinutes();
- Manual Execution: Can be triggered manually for specific wishlists
  - Used for testing or immediate dataset creation needs
Dependencies
Database Dependencies:
- Console database (gb_console) connection for wishlist and dataset records
- Analyzer database (gb_analyzer) connection for product, category, and search data
- Minimum product success count: 2 (configurable via MIN_PRODUCT_SUCCESS_COUNT)
- Minimum new product success count: 1 (configurable via MIN_NEW_PRODUCT_SUCCESS_COUNT)
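If the two thresholds are supplied as environment variables (an assumption; they may equally be defined in a Laravel config file), loading them with the documented defaults and rejecting malformed values could look like this. `Thresholds` and `load_thresholds` are hypothetical names; a `ValueError` here corresponds to the Configuration Error branch in Step 3.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Thresholds:
    min_products: int
    min_new_products: int


def load_thresholds(env: Mapping[str, str] = os.environ) -> Thresholds:
    """Read thresholds, falling back to the documented defaults (2 and 1)."""
    def read(name: str, default: int) -> int:
        raw = env.get(name)
        if raw is None or raw == "":
            return default
        try:
            value = int(raw)
        except ValueError:
            raise ValueError(f"{name} must be an integer, got {raw!r}") from None
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
        return value

    return Thresholds(
        min_products=read("MIN_PRODUCT_SUCCESS_COUNT", 2),
        min_new_products=read("MIN_NEW_PRODUCT_SUCCESS_COUNT", 1),
    )
```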
External Service Dependencies:
- TV Python API service availability and authentication
- Valid API credentials configured in environment variables
- Network connectivity to the API service
- API version compatibility (configured via the analyzer_api.dataset.version key in config/analyzer_api.php)
System Dependencies:
- Active subscriptions for wishlist groups with paid status
- Successfully processed product, category, and search data with crawl_status = 2 (Success)
- Properly configured dataset settings in config/analyzer_api.php
- Review sentence data for specific viewpoint analysis
Output
Tables
The dataset creation command interacts with multiple database tables. For complete table structures, field definitions, and relationships, see the Database Schema section.
Primary Output Tables:
- wishlist_dataset_histories: Creates new dataset tracking records with the API dataset ID
- wishlist_dataset_creation_logs: Logs all creation events (Request, Success, Failure)
- wishlist_to_groups: Updates error messages and manual request timestamps
Command-Specific Operations:
- Creates: New records in wishlist_dataset_histories with the dataset ID from the API
- Logs: Success/failure events in wishlist_dataset_creation_logs with detailed data
- Updates: Wishlist error messages and manual request flags
Services
TV Python API:
- Dataset creation endpoint with structured request payload
- Returns dataset ID and initial status for tracking
- Handles configuration versioning and API compatibility
Notification Services:
- In-app notifications via the DatasetNotification class for users
- Email notifications for dataset creation completion/failure
- Slack alerts for administrators via DatasetSlackChannel
- Real-time updates via Pusher for connected clients
Repository Services:
- WishlistToGroupRepositoryInterface: Eligibility filtering with subscription validation
- WishlistDatasetHistoryRepositoryInterface: Dataset history CRUD operations
- DatasetCreationLogService: Event logging with structured data
- ProductRepositoryInterface: Product data retrieval from the Analyzer database
- CategoryRankingRepositoryInterface: Category ranking data with product relationships
- SearchQueryRankingRepositoryInterface: Search query ranking data with product relationships
- ReviewSentenceRepositoryInterface: Specific viewpoint data for sentiment analysis
Error Handling
Log
The system generates comprehensive logs for troubleshooting dataset creation issues:
Dataset Creation Errors:
- API communication failures with full request/response details
- Data validation errors with specific threshold information and counts
- Database transaction failures with rollback details and affected records
- Configuration errors with missing parameter information and suggestions
Eligibility Validation Errors:
- Subscription validation failures with subscription status details
- Data threshold failures with actual vs required counts
- Wishlist configuration errors with specific field validation issues
Log Locations:
- Application logs: storage/logs/laravel.log with contextual information
- Command-specific logs with execution statistics and performance metrics
- Error logs with full stack traces and request details for debugging
Slack
Automated Slack notifications are sent via DatasetSlackChannel for operational monitoring:
Success Notifications:
- Dataset creation completion with processing statistics and timing
- Batch processing summaries with success/failure counts
- Performance metrics including API response times and data volumes
Error Notifications:
- API communication failures with error codes and retry information
- Database operation failures with affected wishlist details
- Configuration issues with resolution suggestions and documentation links
- Critical system errors requiring immediate administrative attention
Notification Format:
- Command name and execution timestamp for tracking
- Error type and severity level for prioritization
- Affected wishlist groups and dataset IDs for investigation
- Suggested troubleshooting steps and documentation references
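A message carrying the fields listed under Notification Format can be assembled as plain text before it is handed to Slack. This is an illustrative Python sketch only: `format_slack_alert` is a hypothetical helper, not the DatasetSlackChannel API, and the layout is an assumption.

```python
from datetime import datetime, timezone
from typing import List, Optional


def format_slack_alert(error_type: str, severity: str, wishlists: List[str],
                       dataset_ids: List[str], steps: List[str],
                       now: Optional[datetime] = None) -> str:
    """Render command name, timestamp, error/severity, affected items and next steps."""
    now = now or datetime.now(timezone.utc)
    lines = [
        f"*dataset:create* at {now.strftime('%Y-%m-%d %H:%M:%S %Z')}",
        f"Error: {error_type} (severity: {severity})",
        f"Wishlists: {', '.join(wishlists) or '-'}",
        f"Dataset IDs: {', '.join(dataset_ids) or '-'}",
    ]
    if steps:
        lines.append("Suggested steps:")
        lines.extend(f"  {i}. {s}" for i, s in enumerate(steps, 1))
    return "\n".join(lines)
```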
Troubleshooting
Check Data
Verify Wishlist Eligibility:
-- Check active wishlist groups with valid subscriptions
SELECT wtg.id, wtg.name, wtg.status, wtg.admin_status,
wtg.training_schedule, wtg.manual_request_dataset_at,
s.status as subscription_status, sh.payment_status
FROM wishlist_to_groups wtg
JOIN subscriptions s ON wtg.subscription_id = s.id
JOIN subscription_histories sh ON s.id = sh.subscription_id
WHERE wtg.status = 1 AND wtg.admin_status = 1
AND s.status = 'active' AND sh.payment_status = 'paid'
AND (wtg.training_schedule != 'manual' OR wtg.manual_request_dataset_at IS NOT NULL);
Verify Data Thresholds:
-- Count successfully crawled (crawl_status = 2) products, categories and search queries per wishlist group.
-- Correlated subqueries avoid the row multiplication (and inflated counts) caused by
-- LEFT JOINing all three summary tables at once.
SELECT *
FROM (
  SELECT wtg.id, wtg.name,
    (SELECT COUNT(*) FROM summary_wishlist_products swp
      WHERE swp.wishlist_to_group_id = wtg.id AND swp.crawl_status = 2) AS success_products,
    (SELECT COUNT(*) FROM summary_wishlist_categories swc
      WHERE swc.wishlist_to_group_id = wtg.id AND swc.crawl_status = 2) AS success_categories,
    (SELECT COUNT(*) FROM summary_wishlist_search_queries swsq
      WHERE swsq.wishlist_to_group_id = wtg.id AND swsq.crawl_status = 2) AS success_search_queries
  FROM wishlist_to_groups wtg
) counts
WHERE counts.success_products >= 2; -- MIN_PRODUCT_SUCCESS_COUNT
Check Recent Dataset Creation Attempts:
-- Review recent dataset creation events
SELECT wdcl.*, wtg.name as wishlist_name
FROM wishlist_dataset_creation_logs wdcl
JOIN wishlist_to_groups wtg ON wdcl.wishlist_to_group_id = wtg.id
WHERE wdcl.created_at > DATE_SUB(NOW(), INTERVAL 1 DAY)
ORDER BY wdcl.created_at DESC LIMIT 20;
Check Logs
Application Logs:
# Check recent dataset:create command logs
tail -f storage/logs/laravel.log | grep -E "dataset:create"
# Check API communication logs
grep "AnalyzerApiService.*dataset.*create" storage/logs/laravel.log | tail -20
# Check threshold validation errors
grep -E "(MIN_PRODUCT_SUCCESS_COUNT|MIN_NEW_PRODUCT_SUCCESS_COUNT)" storage/logs/laravel.log | tail -10
# Check error patterns
grep -E "(ERROR|CRITICAL)" storage/logs/laravel.log | grep "dataset:create" | tail -10
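When the grep patterns above get unwieldy, the same filtering can be done with a short script. The sketch assumes Laravel's default single-line Monolog format (`[timestamp] channel.LEVEL: message context`) and, like the grep commands, that command log messages contain the literal text `dataset:create`. Continuation lines such as stack traces are skipped rather than attached to their entry; `dataset_errors` is a hypothetical helper name.

```python
import re
from typing import Iterable, Iterator, Tuple

# Default Laravel/Monolog line: "[datetime] channel.LEVEL: message {context} [extra]"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<channel>[\w-]+)\.(?P<level>[A-Z]+): (?P<msg>.*)$")


def dataset_errors(lines: Iterable[str],
                   levels=("ERROR", "CRITICAL")) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, level, message) for dataset:create entries at the given levels."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        # Stack-trace and other continuation lines do not match and are ignored.
        if m and m["level"] in levels and "dataset:create" in m["msg"]:
            yield m["ts"], m["level"], m["msg"]
```

Usage: pass an open file handle, for example `dataset_errors(open("storage/logs/laravel.log", encoding="utf-8"))`, and iterate over the results.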
Database Logs:
-- Check failed dataset creation attempts
SELECT wdh.*, wtg.name as wishlist_name
FROM wishlist_dataset_histories wdh
JOIN wishlist_to_groups wtg ON wdh.wishlist_to_group_id = wtg.id
WHERE wdh.created_at > DATE_SUB(NOW(), INTERVAL 1 DAY)
AND wdh.dataset_id IS NULL
ORDER BY wdh.created_at DESC;
-- Check wishlist error messages
SELECT id, name, error_message, updated_at
FROM wishlist_to_groups
WHERE error_message IS NOT NULL
AND updated_at > DATE_SUB(NOW(), INTERVAL 1 DAY);
API Response Validation:
# Test API connectivity
curl -X GET "https://api.analyzer.example.com/health" \
-H "Authorization: Bearer YOUR_API_TOKEN"
# Test dataset creation endpoint
curl -X POST "https://api.analyzer.example.com/datasets?version=1" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"dataset_name": "test", "dataset_type": 1, "settings": {}}'
Performance Monitoring:
- Monitor command execution times in logs for performance degradation
- Check database query performance for large wishlist datasets
- Verify API response times and timeout configurations
- Review memory usage during data collection phases for optimization opportunities
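To put numbers on execution time and memory per phase, a small timing wrapper helps. The sketch below is Python (`time.perf_counter` and `tracemalloc`); in the PHP command the analogous measurements would come from `microtime(true)` and `memory_get_peak_usage()`. The `measure` helper and the phase name are hypothetical, `tracemalloc` tracks only Python allocations, and the wrapper should not be nested because the inner block stops tracing.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def measure(label: str, sink: list):
    """Record wall-clock time and peak traced memory for the wrapped block."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        sink.append({"phase": label, "elapsed_s": elapsed, "peak_bytes": peak})
```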