BigQuery Status Updates
Command Signature
php artisan update:bigquery-status
Purpose
This command updates the status of records in BigQuery tables after they have been processed by the sync commands. It reads data from Redis where processed record IDs are stored, then updates the corresponding records in BigQuery by setting their status to "Synced".
Sequence Diagram
sequenceDiagram
participant System
participant StatusCommand as update:bigquery-status
participant Redis
participant BigQuery
Note over System,BigQuery: BigQuery Status Update Flow
rect rgb(200, 255, 200)
Note right of System: Every 5 Minutes
System->>StatusCommand: Execute
StatusCommand->>Redis: Fetch Pending Updates
Redis-->>StatusCommand: Return Product IDs, Date Range, and Data Type
StatusCommand->>BigQuery: Update Table Status (Products/Reviews/Sentences)
BigQuery-->>StatusCommand: Confirm Update
StatusCommand->>Redis: Clear Processed Data
Redis-->>StatusCommand: Confirm Cleared
end
Diagram Explanation
- System Execution: The scheduler triggers the
update:bigquery-statuscommand every 5 minutes. - Fetch Pending Updates: The command retrieves pending data from Redis including:
- Product IDs that need status updates
- Date range for filtering
- Data type (Products, Reviews, or Review Sentences)
- Update Table Status: Based on data type, updates are performed on the corresponding BigQuery tables:
- For Product data: Updates status in product tables
- For Review data: Updates status in review tables
- For Review Sentence data: Updates status in sentence tables
- All updates change status field to "Synced"
- Confirmation: BigQuery confirms successful status updates
- Clear Processed Data: After successful updates, processed entries are removed from Redis
- Completion: Redis confirms data clearing, completing the cycle
Implementation Details
Process Flow
- Command fetches data from Redis (product IDs, date range, data type)
- Based on data type, it constructs appropriate queries for BigQuery tables
- Updates are performed in batches (default 100 records per batch)
- Successful updates are tracked and Redis data is cleared after completion
- The command processes multiple Redis entries per run (configured limit)
Parameters
No command-line parameters. Configuration is handled through environment variables:
GCP_BQ_MAX_NUMBER_EXECUTE_UPDATE_STATUS: Maximum number of Redis entries to process per run
Frequency
Every 5 minutes
Dependencies
- Redis for storing product IDs and metadata
- BigQuery access for updating status
- BigQuery table schemas with status fields
- Previous successful execution of sync commands
Output
The command doesn't directly output to database tables but updates status fields in BigQuery tables:
- Updates
statusfield to"Synced"for processed records in:- BigQuery products table
- BigQuery reviews table
- BigQuery review sentences table
- BigQuery product category rankings table
- BigQuery product search query rankings table
Error Handling
Batch Processing
- Updates are processed in batches to prevent timeouts
- If any batch fails, other batches continue processing
- Success rate is calculated and reported
Logging
- Detailed logs of start/end of processing for each table
- Progress logs for multi-batch operations
- Error logs with file and line information for debugging
Slack Notifications
- Error notifications sent to BigQuery Slack channel
- Success rate reporting for partial success scenarios
- Low success rate alerts (< 80% success) trigger warnings
Troubleshooting
Common Issues
- Redis Connection Issues: Check Redis connection and data format
- BigQuery Permission Issues: Verify project and dataset permissions
- Timeouts: Consider reducing batch size or increasing timeout limits
- Missing IDs: Ensure sync commands are storing IDs properly in Redis
Verification Steps
- Check Redis to confirm data is being stored correctly
- Verify BigQuery records have updated status after processing
- Monitor logs for batch processing success rates
- Check if Redis entries are being cleared after successful processing