BigQuery Missed Data Sync
Command Signatures
php artisan gcp:sync-products --missed [--items-per-page=]
php artisan gcp:sync-reviews --missed [--items-per-page=]
php artisan gcp:sync-review-sentences --missed [--items-per-page=]
Purpose
These commands ensure that any data missed during regular synchronization is eventually synchronized from BigQuery to the local database. They specifically target records with null status values in BigQuery tables, regardless of creation time. This serves as a safety net to maintain data integrity and completeness.
Sequence Diagrams
Products Missed Data Sync
sequenceDiagram
participant System
participant MissedProducts as gcp:sync-products --missed
participant BigQuery
participant ProductsTable as products table
participant ProductDetailsTable as product_details table
participant Redis
Note over System,Redis: Products Missed Data Sync Flow
rect rgb(255, 200, 200)
Note right of System: Daily
System->>MissedProducts: Execute
MissedProducts->>BigQuery: Query Products WHERE status IS NULL
BigQuery-->>MissedProducts: Return Missed Products Data
MissedProducts->>ProductsTable: Insert/Update Products Records
MissedProducts->>ProductDetailsTable: Insert/Update Product Details Records
MissedProducts->>Redis: Store Product IDs for Status Update
end
Reviews Missed Data Sync
sequenceDiagram
participant System
participant MissedReviews as gcp:sync-reviews --missed
participant BigQuery
participant ReviewsTable as reviews table
participant Redis
Note over System,Redis: Reviews Missed Data Sync Flow
rect rgb(255, 200, 200)
Note right of System: Daily
System->>MissedReviews: Execute
MissedReviews->>BigQuery: Query Reviews WHERE status IS NULL
BigQuery-->>MissedReviews: Return Missed Reviews Data
MissedReviews->>ReviewsTable: Insert/Update Reviews Records
MissedReviews->>Redis: Store Review IDs for Status Update
end
Review Sentences Missed Data Sync
sequenceDiagram
participant System
participant MissedSentences as gcp:sync-review-sentences --missed
participant BigQuery
participant ReviewSentencesTable as review_sentences table
participant Redis
Note over System,Redis: Review Sentences Missed Data Sync Flow
rect rgb(255, 200, 200)
Note right of System: Daily
System->>MissedSentences: Execute
MissedSentences->>BigQuery: Query Review Sentences WHERE status IS NULL
BigQuery-->>MissedSentences: Return Missed Sentences Data
MissedSentences->>ReviewSentencesTable: Insert/Update Review Sentences Records
MissedSentences->>Redis: Store Sentence IDs for Status Update
end
Implementation Details
Parameters
- --missed: Flag that modifies the query to select all records with null status, regardless of creation time
- --items-per-page=N: Optional parameter to control batch size (default: 500)
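As a rough illustration of what the --missed flag changes, the following Python sketch builds the selection query with and without the time constraint. It is not the project's Laravel code: the function name `build_sync_query`, the `created_at` column, and the `since` bound are assumptions made for this example. Only the `status IS NULL` condition and the default of 500 come from this document.

```python
from typing import Dict, Optional, Tuple

DEFAULT_ITEMS_PER_PAGE = 500  # default batch size stated in this document


def build_sync_query(
    table: str, missed: bool, since: Optional[str] = None
) -> Tuple[str, Dict[str, str]]:
    """Return (sql, named_parameters) for one sync run.

    Assumptions: `created_at` is the column regular sync filters on and
    `since` is its lower bound. Only `status IS NULL` comes from the source.
    """
    sql = f"SELECT * FROM `{table}` WHERE status IS NULL"
    params: Dict[str, str] = {}
    if not missed:
        if since is None:
            raise ValueError("regular sync requires a lower time bound")
        # BigQuery named query parameters use the @name syntax.
        sql += " AND created_at >= @since"
        params["since"] = since
    return sql, params
```

With `missed=True` the function returns only the null-status condition, which is why a missed run can pick up records of any age.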
Frequency
Daily - scheduled to run during low-traffic periods
Dependencies
- Google Cloud Platform access credentials
- BigQuery project and dataset configuration
- Redis for tracking processed IDs
- Queue workers for processing jobs
- Update status command for marking records as processed
Processing Flow
- Command queries BigQuery for records with null status, without the time constraint that regular sync applies
- Data is chunked into batches and processed via queue jobs
- Same transformation and storage logic as regular sync is applied
- Processed IDs are stored in Redis
- Status update command later marks these records as processed
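The flow above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the actual implementation: batches are handled inline instead of being dispatched to queue workers, a plain `set` stands in for Redis, and the names `chunk`, `process_missed`, and the `id` field are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Set


def chunk(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive batches of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch


def process_missed(
    rows: Iterable[dict],
    store: Callable[[List[dict]], None],
    processed_ids: Set[int],
    items_per_page: int = 500,
) -> int:
    """Store each batch, record its IDs, and return the number of batches."""
    batches = 0
    for batch in chunk(rows, items_per_page):
        store(batch)  # stand-in for the insert/update step
        processed_ids.update(row["id"] for row in batch)  # stand-in for Redis
        batches += 1
    return batches
```

For example, 1,200 rows with the default batch size produce three batches (500, 500, and 200 rows), and all 1,200 IDs end up in the tracking set for the later status update.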
Error Handling
Job Structure
- Uses same error handling as regular sync
- Retries failed jobs based on Laravel queue configuration
- Records detailed errors for troubleshooting
Logging
- Detailed logs with job counts and record counts
- Error stack traces for debugging
- Cross-references to BigQuery tables and conditions
Notifications
- Slack alerts for critical failures
- Success notifications with processing statistics
Troubleshooting
Common Issues
- Large Data Volumes: Missed data sync can process large volumes if regular sync has been failing
- Queue Overload: Monitor queue size and worker capacity during missed data sync
- Resource Contention: Schedule missed sync during off-peak hours to avoid performance impact
Performance Optimization
- Adjust batch sizes via the --items-per-page parameter
- Increase queue worker count before running missed data sync
- Monitor system resource usage during execution
- Consider staggering missed sync commands rather than running simultaneously
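To reason about the effect of batch size on queue load, a small Python estimate can help. It assumes one queue job per batch, which this document implies but does not state outright; `estimate_jobs` is a hypothetical helper, not part of the commands.

```python
import math


def estimate_jobs(record_count: int, items_per_page: int = 500) -> int:
    """Estimate queue jobs, assuming exactly one job per batch."""
    if items_per_page <= 0:
        raise ValueError("items_per_page must be positive")
    if record_count < 0:
        raise ValueError("record_count must not be negative")
    return math.ceil(record_count / items_per_page)
```

Under that assumption, 120,000 missed records yield 240 jobs at the default batch size and 60 jobs with `--items-per-page=2000`. Larger batches mean fewer jobs but more work and memory per job.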
Verification Steps
- Compare the count of null-status records in BigQuery before and after sync
- Check Redis for successfully processed record IDs
- Verify local database record counts and content
- Review status update command logs for successful batch processing
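The first two checks can be combined into a simple reconciliation. The Python sketch below is only an illustration: the `reconcile` helper and its field names are hypothetical, the three counts must be gathered separately, and it assumes that no new null-status records arrive in BigQuery between the two counts.

```python
from typing import Dict


def reconcile(null_before: int, null_after: int, processed_ids: int) -> Dict[str, int]:
    """Compare the drop in null-status records with the tracked ID count."""
    cleared = null_before - null_after
    return {
        "cleared_in_bigquery": cleared,
        "processed_ids": processed_ids,
        # Positive value: IDs processed locally but not yet marked in BigQuery.
        "pending_status_update": processed_ids - cleared,
    }
```

A positive `pending_status_update` usually just means the status update command has not yet run for those IDs. A negative value suggests that one of the counts was taken at the wrong time or that the assumption about concurrent writes does not hold.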