Dataset Workflow - Database Changes
Database Schema & Changes
---
config:
theme: base
layout: dagre
flowchart:
curve: linear
htmlLabels: true
themeVariables:
edgeLabelBackground: "transparent"
---
flowchart TB
%%ds_analyzer.
subgraph GBConsoleDB ["gb_console DB (MySQL)"]
direction TB
WTG["wishlist_to_groups<br/>📝 name, group_id<br/>📊 status: 1 | Active"]
subgraph SummaryTables ["Summary Wishlist Tables"]
direction LR
SWP["summary_wishlist_products<br/>📝 input, input_type<br/>📊 crawl_status: 0→1→2"]
SWPR["summary_wishlist_product_reviews<br/>📝 summary_wishlist_product_id<br/>📊 crawl_status: 0→1→2"]
SWC["summary_wishlist_categories<br/>📝 category_id, mall_id<br/>📊 crawl_status: 0→1→2"]
SWSQ["summary_wishlist_search_queries<br/>📝 keyword, mall_id<br/>📊 crawl_status: 0→1→2"]
end
WDH["wishlist_dataset_histories<br/>📝 dataset_id, config<br/>📊 status: 1→2→3<br/>📊 spvp_status: 1→2→3"]
end
subgraph AnalyzerV2DB ["analyzer_v2 DB (MySQL)"]
direction LR
CP["products<br/>📝 name, price, description<br/>📊 Raw marketplace data"]
CR["reviews<br/>📝 content, rating<br/>📊 User reviews"]
CRS["review_sentences<br/>📝 sentence, embedding<br/>📊 Processed sentences"]
end
subgraph DSAnalyzerDB ["ds_analyzer DB (MySQL)"]
direction TB
DS["datasets<br/>📝 name, settings<br/>📊 status: 1→2→3<br/>📊 progress: 0→100"]
subgraph AnalyzerBatchTables ["analyzer_batch creates"]
direction LR
ADP["products<br/>📝 processed_name<br/>📊 Dataset products"]
ADPD["product_details<br/>📝 specifications<br/>📊 Product metadata"]
PS["product_similarities<br/>📝 similarity_score<br/>📊 Similarity matrix"]
AV["ai_viewpoints<br/>📝 name, description<br/>📊 AI-generated labels"]
RSAIVP["review_sentence_aivp<br/>📝 sentence-ai_viewpoint mappings<br/>📊 Clustering results"]
ADR["reviews<br/>📝 sentiment_overall<br/>📊 Dataset reviews"]
ARS["review_sentences<br/>📝 cluster_id, sentiment<br/>📊 Clustered sentences"]
end
subgraph SPVPBatchTables ["spvp_batch manages"]
direction LR
SV["specific_viewpoints<br/>📝 last_object_id<br/>📊 SPVP progress tracking"]
VC["viewpoint_categories<br/>📝 category_name<br/>📊 SPVP categories"]
RSSPVP["review_sentence_spvp<br/>📝 sentence-viewpoint mappings<br/>📊 Qwen results"]
end
end
%% Phase 1: Wishlist Creation
WTG --- Step1[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #6699cc !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>1</span>
<p style='margin-top: 8px'>Create Wishlist Items</p>
</div>
]
Step1 --> SummaryTables
%% Phase 2: Data Crawling
SummaryTables --- Step2[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #99cc66 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>2</span>
<p style='margin-top: 8px'>Crawl Marketplace</p>
</div>
]
Step2 --> CP
CP --> CR
CR --> CRS
%% Phase 3: Dataset Creation
SummaryTables --- Step3[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #cc6699 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>3</span>
<p style='margin-top: 8px'>Create Dataset Metadata</p>
</div>
]
Step3 --> DS
DS --> WDH
%% Phase 4: ML Analysis
CRS --- Step4[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #ff9900 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>4</span>
<p style='margin-top: 8px'>Analyze & Cluster</p>
</div>
]
Step4 --> AnalyzerBatchTables
%% Phase 5: SPVP Processing
ARS --- Step5[
<div style='text-align: center'>
<span style='display: inline-block; background-color: #cc3366 !important; color:white; width: 28px; height: 28px; line-height: 28px; border-radius: 50%; font-weight: bold'>5</span>
<p style='margin-top: 8px'>SPVP Processing</p>
</div>
]
Step5 --> SPVPBatchTables
SV -.-> WDH
%% Styling per api-guide.md
style Step1 fill:transparent,stroke:transparent,stroke-width:1px
style Step2 fill:transparent,stroke:transparent,stroke-width:1px
style Step3 fill:transparent,stroke:transparent,stroke-width:1px
style Step4 fill:transparent,stroke:transparent,stroke-width:1px
style Step5 fill:transparent,stroke:transparent,stroke-width:1px
style GBConsoleDB fill:#fcf3d2
style SummaryTables fill:#faf4de
style AnalyzerV2DB fill:#d2e3fc
style DSAnalyzerDB fill:#d9f2d9
style AnalyzerBatchTables fill:#e6ffe6
style SPVPBatchTables fill:#ffe6f2
style WTG fill:#ffe6cc,stroke:#ff9900,stroke-width:2px
style WDH fill:#d9f2d9,stroke:#339933,stroke-width:2px
style SWP fill:#fffae8,stroke:#ebdba2,stroke-width:2px
style SWPR fill:#fffae8,stroke:#ebdba2,stroke-width:2px
style SWC fill:#fffae8,stroke:#ebdba2,stroke-width:2px
style SWSQ fill:#fffae8,stroke:#ebdba2,stroke-width:2px
style CP fill:#e6f0ff,stroke:#6699cc,stroke-width:1px
style CR fill:#e6f0ff,stroke:#6699cc,stroke-width:1px
style CRS fill:#e6f0ff,stroke:#6699cc,stroke-width:1px
style DS fill:#ffe6cc,stroke:#ff9900,stroke-width:2px
%% analyzer_batch tables
style ADP fill:#d9f2d9,stroke:#339933,stroke-width:2px
style ADPD fill:#d9f2d9,stroke:#339933,stroke-width:2px
style PS fill:#d9f2d9,stroke:#339933,stroke-width:2px
style AV fill:#d9f2d9,stroke:#339933,stroke-width:2px
style RSAIVP fill:#d9f2d9,stroke:#339933,stroke-width:2px
style ADR fill:#d9f2d9,stroke:#339933,stroke-width:2px
style ARS fill:#d9f2d9,stroke:#339933,stroke-width:2px
%% spvp_batch tables
style SV fill:#ffccdd,stroke:#cc3366,stroke-width:2px
style VC fill:#ffccdd,stroke:#cc3366,stroke-width:2px
style RSSPVP fill:#ffccdd,stroke:#cc3366,stroke-width:2px
Key Database Tables
gb_console DB - Business Logic
| Table |
Key Fields |
Status Values |
wishlist_to_groups |
name, group_id, subscription_id |
status: 1 | Active |
wishlist_dataset_histories |
dataset_id, config |
status: 1→2→3, spvp_status: 1→2→3 |
summary_wishlist_products |
input, input_type, mall_id |
crawl_status: 0→1→2 |
summary_wishlist_product_reviews |
summary_wishlist_product_id, schedule_id |
crawl_status: 0→1→2 |
summary_wishlist_categories |
category_id, mall_id |
crawl_status: 0→1→2 |
summary_wishlist_search_queries |
keyword, mall_id |
crawl_status: 0→1→2 |
analyzer_v2 DB - Raw Crawled Data
| Table |
Key Fields |
Service Responsible |
products |
name, description, price, mall_id |
Crawler Service |
reviews |
product_id, content, rating |
Crawler Service |
review_sentences |
review_id, sentence, embedding |
Crawler Service + PLG API |
ds_analyzer DB - Analysis Results
| Table |
Key Fields |
Service Responsible |
datasets |
name, settings, status, progress, error_code |
trend-viewer-backend |
analyzer_batch creates 7 tables:
| Table |
Key Fields |
Description |
products |
dataset_id, crawler_product_id, processed_name |
Dataset products with processed data |
product_details |
product_id, specifications, metadata |
Product detailed specifications |
product_similarities |
dataset_id, mall_product_id_a, product_id_b, similarity_score |
Product similarity matrix |
ai_viewpoints |
dataset_id, cluster_id, name, description |
AI-generated cluster labels |
review_sentence_aivp |
sentence_id, dataset_id, ai_viewpoint_id, mall_product_id |
AI Viewpoint ↔ Sentence mappings |
reviews |
dataset_id, product_id, sentiment_overall |
Dataset reviews with sentiment |
review_sentences |
dataset_id, ai_viewpoint_id, cluster_id, is_noise |
Clustered review sentences |
spvp_batch manages 3 tables:
| Table |
Key Fields |
Description |
specific_viewpoints |
viewpoint_category_id, last_object_id |
SPVP progress tracking |
viewpoint_categories |
category_name, description |
SPVP category definitions |
review_sentence_spvp |
sentence_id, viewpoint_ordinal_number, mall_product_id, viewpoint_category_id |
Specific Viewpoint ↔ Sentence mappings |
Detailed Data Changes
Phase 1: Wishlist Creation (trend-viewer-api → gb_console)
-- Create main wishlist
INSERT INTO gb_console.wishlist_to_groups (
name, group_id, subscription_id,
status = 1 -- 1 | Active
);
-- Store crawl items - Products
INSERT INTO gb_console.summary_wishlist_products (
wishlist_to_group_id, input, input_type, mall_id,
crawl_status = 0 -- 0 | New
);
-- Store crawl items - Product Reviews
INSERT INTO gb_console.summary_wishlist_product_reviews (
summary_wishlist_product_id, schedule_id,
crawl_status = 0 -- 0 | New
);
-- Store crawl items - Categories
INSERT INTO gb_console.summary_wishlist_categories (
category_id, mall_id, wishlist_to_group_id,
crawl_status = 0 -- 0 | New
);
-- Store crawl items - Search Queries
INSERT INTO gb_console.summary_wishlist_search_queries (
keyword, mall_id, wishlist_to_group_id,
crawl_status = 0 -- 0 | New
);
Phase 2: Data Crawling (trend-viewer-backend → Crawler Service → analyzer_v2)
-- Update crawl status for all summary tables
UPDATE gb_console.summary_wishlist_products
SET crawl_status = 1 -- 1 | InProgress
WHERE crawl_status = 0;
UPDATE gb_console.summary_wishlist_product_reviews
SET crawl_status = 1 -- 1 | InProgress
WHERE crawl_status = 0;
UPDATE gb_console.summary_wishlist_categories
SET crawl_status = 1 -- 1 | InProgress
WHERE crawl_status = 0;
UPDATE gb_console.summary_wishlist_search_queries
SET crawl_status = 1 -- 1 | InProgress
WHERE crawl_status = 0;
-- Raw data from Crawler Service - Products & Reviews
INSERT INTO analyzer_v2.products (name, description, price, mall_id);
INSERT INTO analyzer_v2.reviews (product_id, content, rating);
INSERT INTO analyzer_v2.review_sentences (review_id, sentence, embedding);
-- Complete crawl for all summary tables
UPDATE gb_console.summary_wishlist_*
SET crawl_status = 2 -- 2 | Success
WHERE crawl_completed;
Phase 3: Dataset Creation (trend-viewer-backend → ds_analyzer)
-- Create dataset metadata
INSERT INTO ds_analyzer.datasets (
name, group_id, viewpoint_category_id,
status = 1, -- 1 | Pending
progress = 0,
settings = JSON_OBJECT('product_ids', [...])
);
-- Create history record
INSERT INTO gb_console.wishlist_dataset_histories (
wishlist_to_group_id, dataset_id,
status = 1, -- 1 | Pending
spvp_status = 1 -- 1 | Pending
);
Phase 4: ML Analysis (analyzer_batch → ds_analyzer)
-- Update progress
UPDATE ds_analyzer.datasets
SET status = 2, progress = 50 -- 2 | Processing
WHERE id = dataset_id;
-- analyzer_batch creates 7 complete tables
-- 1. Products & Details
INSERT INTO ds_analyzer.products (dataset_id, crawler_product_id, processed_name);
INSERT INTO ds_analyzer.product_details (product_id, specifications, metadata);
-- 2. Product Similarities
INSERT INTO ds_analyzer.product_similarities (dataset_id, mall_product_id_a, mall_product_id_b, similarity_score);
-- 3. AI Viewpoints & Clustering
INSERT INTO ds_analyzer.ai_viewpoints (dataset_id, cluster_id, name, description);
INSERT INTO ds_analyzer.review_sentence_spvp (
sentence_id,
viewpoint_ordinal_number,
mall_product_id,
viewpoint_category_id
);
-- 4. Reviews & Sentences with sentiment analysis
INSERT INTO ds_analyzer.reviews (dataset_id, product_id, sentiment_overall);
INSERT INTO ds_analyzer.review_sentences (dataset_id, ai_viewpoint_id, cluster_id, is_noise);
-- Complete analysis
UPDATE ds_analyzer.datasets
SET status = 3, progress = 100 -- 3 | Completed
WHERE id = dataset_id;
-- Sync status
UPDATE gb_console.wishlist_dataset_histories
SET status = 3, spvp_status = 2 -- 2 | Analyzing (SPVP)
WHERE dataset_id = dataset_id;
Phase 5: SPVP Processing (spvp_batch → ds_analyzer)
-- Load data for SPVP processing
SELECT * FROM ds_analyzer.review_sentences WHERE dataset_id = ?;
SELECT * FROM ds_analyzer.specific_viewpoints WHERE viewpoint_category_id = ?;
SELECT * FROM ds_analyzer.viewpoint_categories WHERE id = ?;
-- Qwen mapping process: specific_viewpoints ↔ review_sentences
-- Store mapping results
INSERT INTO ds_analyzer.review_sentence_spvp (
sentence_id,
viewpoint_ordinal_number,
mall_product_id,
viewpoint_category_id
);
-- Update progress tracking
UPDATE ds_analyzer.specific_viewpoints
SET last_object_id = max_processed_sentence_id
WHERE viewpoint_category_id = dataset_viewpoint_category;
-- SPVP completion check
UPDATE gb_console.wishlist_dataset_histories
SET spvp_status = 3 -- 3 | Completed
WHERE dataset_id = dataset_id
AND all_specific_viewpoints_processed();
Service Responsibilities
trend-viewer-api
- Creates:
gb_console.wishlist_to_groups, gb_console.summary_wishlist_*
- Updates: Status fields for user-facing operations
trend-viewer-backend
- Creates:
ds_analyzer.datasets, gb_console.wishlist_dataset_histories
- Updates: Crawl status sync, dataset status monitoring
- Triggers: analyzer_batch, spvp_batch
- Scheduled crawl management
- Dataset creation logic
- Status synchronization
Crawler Service
- Creates:
analyzer_v2.products, analyzer_v2.reviews, analyzer_v2.review_sentences
- Coordinates with: PLG API for embedding/prediction
analyzer_batch
- Reads from:
analyzer_v2.* tables (products, reviews, review_sentences)
- Creates: 7 tables in
ds_analyzer:
products - Dataset products with processed data
product_details - Product specifications & metadata
product_similarities - Product similarity matrix
ai_viewpoints - AI-generated cluster labels
review_sentence_aivp - AI Viewpoint ↔ Sentence mappings
reviews - Dataset reviews with sentiment analysis
review_sentences - Clustered review sentences
- Updates:
ds_analyzer.datasets (status, progress)
- Processing: ML clustering (K-means), AI labeling (OpenAI GPT-4), Product similarity calculation, Sentiment analysis
- Architecture: Data writing with distributed transactions
spvp_batch
- Reads from:
ds_analyzer.review_sentences, ds_analyzer.specific_viewpoints, ds_analyzer.viewpoint_categories
- Creates:
ds_analyzer.review_sentence_spvp (sentence-viewpoint mappings)
- Updates:
ds_analyzer.specific_viewpoints (last_object_id), gb_console.wishlist_dataset_histories (spvp_status)
- Qwen-based sentence to specific viewpoint mapping
- Progress tracking through last_object_id
- Final completion status updates
PLG API
- Embedding status checks
- Prediction readiness verification
- Crawler service coordination
Critical Status Fields
Dataset Lifecycle Status
-- trend-viewer-backend creates dataset
ds_analyzer.datasets.status = 1 | Pending
-- analyzer_batch processes
ds_analyzer.datasets.status = 2 | Processing
ds_analyzer.datasets.progress = 0 → 100
-- analyzer_batch completes
ds_analyzer.datasets.status = 3 | Completed
-- Status sync to gb_console
gb_console.wishlist_dataset_histories.status = 3 | Complete
gb_console.wishlist_dataset_histories.spvp_status = 2 | Analyzing
-- spvp_batch completes
gb_console.wishlist_dataset_histories.spvp_status = 3 | Completed
Error Tracking
-- analyzer_batch errors
ds_analyzer.datasets.status = 9 | Failed
ds_analyzer.datasets.error_code = 1001-3003
ds_analyzer.datasets.progress = -1
Status Values Reference
Dataset Status (ds_analyzer.datasets.status)
- 1 | Pending: trend-viewer-backend creates dataset, waiting for analyzer_batch
- 2 | Processing: analyzer_batch is processing
- 3 | Completed: analyzer_batch completed
- 9 | Failed: analyzer_batch failed
SPVP Status (gb_console.wishlist_dataset_histories.spvp_status)
- 1 | Pending: Waiting for spvp_batch
- 2 | Analyzing: spvp_batch is processing
- 3 | Completed: spvp_batch completed
- 9 | Failed: spvp_batch failed
Crawl Status (gb_console.summary_wishlist_*.crawl_status)
- 0 | New: Not crawled yet, waiting for trend-viewer-backend schedule
- 1 | InProgress: Crawler Service is crawling
- 2 | Success: Crawler Service crawl successful
- 3 | Error: Crawler Service crawl failed
- 4 | Canceled: Crawl canceled
Error Recovery by Service
trend-viewer-backend
- Crawl sync failures: Automatic retry with scheduled jobs
- Dataset creation failures: Manual trigger available
- Status sync issues: Reconciliation jobs
analyzer_batch
- Processing failures: Full rollback, error_code tracking
- Resource issues: Container restart, scaling
- OpenAI API failures: Retry with backoff
spvp_batch
- Processing stuck: Manual reset last_object_id
- Completion detection failures: Manual status update
Related Documentation
Core Workflow Components
Related Systems Documentation
Backend Services Documentation
- trend-viewer-backend Documentation - Complete backend documentation
- Dataset Commands: link - Console commands for dataset management
- Crawler Integration: link - PLG API integration details
Processing Services (Documentation in Development)
analyzer_batch: Python ML/AI processing service
- Functions: ML clustering, AI labeling, Product similarity analysis
spvp_batch: Python Qwen-based specific viewpoint processing
- Functions: SPVP mapping, Qwen model integration, Progress tracking