# Video Chunk Splitting Specification

This document defines the principles and data structures used to split videos into chunks in the Momentry Core system.

---

## 1. Chunk Overview

### 1.1 Design Principles

1. **Overlap allowed**: chunks of different types may overlap (for example, a sentence chunk and a time-based chunk).
2. **Frame accuracy**: time coordinates resolve to individual video frames.
3. **Multiple chunk types**: splitting by sentence, by scene, and by fixed duration is supported.

### 1.2 Chunk Types

| Type | Description | Can overlap? |
|------|-------------|--------------|
| **Sentence** | Split by spoken sentence | ✅ May overlap other types |
| **Cut** | Split by scene cut | ✅ May overlap other types |
| **TimeBased** | Split by fixed duration | ✅ May overlap other types |

---

## 2. Time Coordinate System

### 2.1 Time Format

All times are expressed in **seconds** as floating-point numbers, with **microsecond** precision:

```json
{
  "start_time": 10.5,
  "end_time": 15.75
}
```

### 2.2 Frame Calculation

```
frame_number = floor(time_in_seconds * fps)
time_at_frame = frame_number / fps
```

**Example**:
- Video frame rate: 24/1 (24 fps)
- Time: 10.5 s
- Frame: floor(10.5 * 24) = 252
- Check: 252 / 24 = 10.5 s ✅

### 2.3 Frame Information Structure

```json
{
  "start_time": 10.5,
  "start_frame": 252,
  "end_time": 15.75,
  "end_frame": 378,
  "fps": "24/1",
  "fps_value": 24.0
}
```
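
The two formulas in 2.2 map directly to code. The Rust sketch below is for illustration only. `try_parse_fps`, `time_to_frame`, and `frame_to_time` are hypothetical helper names, and the sketch assumes the `fps` string is always a `numerator/denominator` ratio such as `"24/1"` or the NTSC rate `"30000/1001"`.

```rust
/// Parse a rational frame rate such as "24/1" or "30000/1001".
/// Returns None for malformed input or a zero denominator.
pub fn try_parse_fps(fps: &str) -> Option<f64> {
    let (num, den) = fps.split_once('/')?;
    let num: f64 = num.trim().parse().ok()?;
    let den: f64 = den.trim().parse().ok()?;
    if den == 0.0 {
        return None;
    }
    Some(num / den)
}

/// frame_number = floor(time_in_seconds * fps)
pub fn time_to_frame(time_secs: f64, fps_value: f64) -> i64 {
    (time_secs * fps_value).floor() as i64
}

/// time_at_frame = frame_number / fps
pub fn frame_to_time(frame: i64, fps_value: f64) -> f64 {
    frame as f64 / fps_value
}
```

At `"30000/1001"` (about 29.97 fps), `time_to_frame(10.0, fps)` gives 299, and converting that frame back gives about 9.977 s. The round trip returns the start time of the frame that contains `t`, not `t` itself.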
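
---

## 3. Three Chunking Methods

### 3.1 Sentence (Sentence Splitting)

**Principles**:
- Based on ASR (automatic speech recognition) output
- Each recognized sentence becomes one chunk
- The text content comes from the ASR output

**Example**:

```
ASR output:
[
  {"start": 10.0, "end": 15.0, "text": "Hello world"},
  {"start": 15.0, "end": 20.0, "text": "This is a test"},
  {"start": 20.0, "end": 25.5, "text": "Processing video"}
]

Converted to chunks:
┌──────────────────────────────────────────────────┐
│ sentence_0000: 10.0s - 15.0s  "Hello world"      │
├──────────────────────────────────────────────────┤
│ sentence_0001: 15.0s - 20.0s  "This is a test"   │
├──────────────────────────────────────────────────┤
│ sentence_0002: 20.0s - 25.5s  "Processing video" │
└──────────────────────────────────────────────────┘
```

### 3.2 Cut (Scene Splitting)

**Principles**:
- Based on shot changes in the video (scene change / cut detection)
- Detected with ffmpeg or Python (`scenedetect`)
- Each scene becomes one chunk

**Detection method**:

```bash
# Detect scene changes with ffmpeg
ffmpeg -i input.mp4 -filter:v "select='gt(scene,0.3)',showinfo" -f null -
```

`showinfo` logs each frame that passes the `select` filter to stderr, together with its `pts_time`. Those timestamps are the cut points. The `scene` score ranges from 0 to 1, and `0.3` is the detection threshold.

**Example**:

```
Scene detection result:
[
  {"start": 0.0, "end": 45.2, "scene_id": 1},
  {"start": 45.2, "end": 120.5, "scene_id": 2},
  {"start": 120.5, "end": 180.0, "scene_id": 3}
]

Converted to chunks:
┌────────────────────────────────────────┐
│ cut_0000: 0.0s - 45.2s (Scene 1)       │
├────────────────────────────────────────┤
│ cut_0001: 45.2s - 120.5s (Scene 2)     │
├────────────────────────────────────────┤
│ cut_0002: 120.5s - 180.0s (Scene 3)    │
└────────────────────────────────────────┘
```

### 3.3 TimeBased (Fixed-Duration Splitting)

**Principles**:
- The video is split into segments of a fixed length
- The default length is **10 seconds** per chunk
- The last chunk may be shorter than 10 seconds
- **Overlap is supported** (configurable in seconds); consecutive chunks start `duration - overlap` seconds apart

**Parameters**:

| Parameter | Default | Description |
|-----------|---------|-------------|
| duration | 10.0 | Length of each chunk (seconds) |
| overlap | 0.0 | Overlap between consecutive chunks (seconds); must be smaller than `duration` |

**Example** (no overlap):

```
Video length: 35 s, duration=10

Chunks:
┌────────────────────────────────────────────┐
│ time_based_0000: 0.0s - 10.0s              │
├────────────────────────────────────────────┤
│ time_based_0001: 10.0s - 20.0s             │
├────────────────────────────────────────────┤
│ time_based_0002: 20.0s - 30.0s             │
├────────────────────────────────────────────┤
│ time_based_0003: 30.0s - 35.0s (only 5 s)  │
└────────────────────────────────────────────┘
```

**Example** (with overlap, overlap=2):

```
Video length: 35 s, duration=10, overlap=2

Chunks:
┌─────────────────────────────────────────────────────────┐
│ time_based_0000: 0.0s - 10.0s                           │
├─────────────────────────────────────────────────────────┤
│ time_based_0001: 8.0s - 18.0s  (2 s overlap)            │
├─────────────────────────────────────────────────────────┤
│ time_based_0002: 16.0s - 26.0s (2 s overlap)            │
├─────────────────────────────────────────────────────────┤
│ time_based_0003: 24.0s - 34.0s (2 s overlap)            │
├─────────────────────────────────────────────────────────┤
│ time_based_0004: 32.0s - 35.0s (2 s overlap, only 3 s)  │
└─────────────────────────────────────────────────────────┘
```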
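
In both examples, chunk `i` starts at `i * (duration - overlap)` and ends at `min(start + duration, video_duration)`. The std-only Rust sketch below reproduces those boundaries. `time_based_bounds` is a hypothetical name. The sketch also stops as soon as a chunk reaches the end of the video, which is its own assumption rather than a rule stated above. Without that stop, a 34 s video with `overlap=2` would get an extra `32.0s - 34.0s` chunk lying entirely inside `24.0s - 34.0s`.

```rust
/// Compute (start, end) pairs, in seconds, for TimeBased chunks.
/// Returns None unless duration > 0 and 0 <= overlap < duration,
/// because a non-positive step would never advance.
pub fn time_based_bounds(video_duration: f64, duration: f64, overlap: f64) -> Option<Vec<(f64, f64)>> {
    let step = duration - overlap;
    if duration <= 0.0 || overlap < 0.0 || step <= 0.0 {
        return None;
    }
    let mut bounds = Vec::new();
    let mut i: u32 = 0;
    loop {
        // i * step instead of repeated addition avoids accumulating rounding error
        let start = i as f64 * step;
        if start >= video_duration {
            break;
        }
        let end = (start + duration).min(video_duration);
        bounds.push((start, end));
        if end >= video_duration {
            break; // this chunk already reaches the end of the video
        }
        i += 1;
    }
    Some(bounds)
}
```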

---

## 4. Chunk Data Structure

### 4.1 Basic Structure

```json
{
  "uuid": "1636719dc31f78ac",
  "chunk_id": "sentence_0001",
  "chunk_index": 1,
  "chunk_type": "sentence",
  "start_time": 10.5,
  "start_frame": 252,
  "end_time": 15.75,
  "end_frame": 378,
  "fps": "24/1",
  "fps_value": 24.0,
  "content": {
    "text": "Hello world, this is a test"
  },
  "metadata": {
    "source": "asr",
    "confidence": 0.95,
    "language": "en"
  }
}
```

### 4.2 Field Reference

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `uuid` | String | ✅ | Video UUID (16 characters) |
| `chunk_id` | String | ✅ | Chunk ID, unique within a video |
| `chunk_index` | Integer | ✅ | Chunk index (0-based) |
| `chunk_type` | String | ✅ | Type: `sentence` / `cut` / `time_based` |
| `start_time` | Float | ✅ | Start time (seconds) |
| `start_frame` | Integer | ✅ | Start frame number |
| `end_time` | Float | ✅ | End time (seconds) |
| `end_frame` | Integer | ✅ | End frame number |
| `fps` | String | ✅ | Frame rate as a ratio (e.g., "24/1") |
| `fps_value` | Float | ✅ | Frame rate as a number (e.g., 24.0) |
| `content` | Object | ✅ | Type-specific content (see 4.3) |
| `metadata` | Object | ❌ | Additional information (see 4.4) |

### 4.3 Content Structure

The structure of `content` depends on `chunk_type`:

#### Sentence Content

```json
{
  "content": {
    "text": "Hello world, this is a test message",
    "text_normalized": "hello world this is a test message",
    "word_count": 7,
    "char_count": 34
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
| `text` | String | Original recognized text |
| `text_normalized` | String | Normalized text (lowercase, punctuation removed) |
| `word_count` | Integer | Number of words |
| `char_count` | Integer | Number of characters (the example's 34 is the length of `text_normalized`; `text` has 35) |
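
The spec describes `text_normalized` only as lowercase with punctuation removed. The Rust sketch below follows that reading. `SentenceStats` and `sentence_stats` are hypothetical names, and collapsing repeated whitespace is an extra assumption. Characters are counted on the normalized text, which reproduces the example's `char_count` of 34.

```rust
pub struct SentenceStats {
    pub text_normalized: String,
    pub word_count: usize,
    pub char_count: usize,
}

/// Build the Sentence content fields from raw ASR text.
/// Assumption: normalize = lowercase, drop punctuation, collapse whitespace.
pub fn sentence_stats(text: &str) -> SentenceStats {
    let lowered = text.to_lowercase();
    // Keep letters, digits and whitespace; drop punctuation such as ',' or '，'
    let stripped: String = lowered
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect();
    // Collapse runs of whitespace into single spaces
    let words: Vec<&str> = stripped.split_whitespace().collect();
    let text_normalized = words.join(" ");
    SentenceStats {
        word_count: words.len(),
        // chars(), not len(): len() counts UTF-8 bytes and overcounts CJK text
        char_count: text_normalized.chars().count(),
        text_normalized,
    }
}
```

Counting with `chars()` rather than `len()` matters for Chinese transcripts, where each character takes three UTF-8 bytes. The whitespace-based `word_count` is only meaningful for space-delimited languages: an unsegmented Chinese sentence counts as a single word.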

#### Cut Content

```json
{
  "content": {
    "scene_id": 2,
    "scene_number": 2,
    "transition_type": "cut",
    "scene_change_score": 0.95
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
| `scene_id` | Integer | Scene ID |
| `scene_number` | Integer | Scene number |
| `transition_type` | String | Transition type: `cut` / `dissolve` / `fade` |
| `scene_change_score` | Float | Scene change score (0-1) |

#### TimeBased Content

```json
{
  "content": {
    "duration": 10.0,
    "is_last": false,
    "segment_number": 3,
    "total_segments": 10
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
| `duration` | Float | Length of this chunk (seconds) |
| `is_last` | Boolean | Whether this is the last chunk |
| `segment_number` | Integer | Segment number (1-based, i.e. `chunk_index + 1`) |
| `total_segments` | Integer | Total number of segments |

### 4.4 Metadata Structure

```json
{
  "metadata": {
    "source": "asr",
    "confidence": 0.95,
    "language": "en",
    "model": "tiny",
    "created_at": "2026-03-16T10:00:00Z"
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
| `source` | String | Source: `asr` / `scene_detect` / `time_based` |
| `confidence` | Float | Confidence (0-1) |
| `language` | String | Language code |
| `model` | String | Model used |
| `created_at` | String | Creation time (ISO 8601) |

---

## 5. Chunk ID Naming Convention

### 5.1 Format

```
{chunk_type}_{chunk_index:04}
```

| Type | Prefix | Example |
|------|--------|---------|
| Sentence | `sentence_` | `sentence_0001` |
| Cut | `cut_` | `cut_0001` |
| TimeBased | `time_based_` | `time_based_0001` |

### 5.2 Numbering Rules

- Numbering starts at **0**
- The index is zero-padded to at least **4 digits**
- Indices increase in chronological order
- Each chunk type is numbered independently, so `sentence_0000` and `cut_0000` can coexist in the same video
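
Formatting an ID takes one `format!` call, but parsing it back has a pitfall. The `time_based` prefix itself contains an underscore, so the string must be split at the *last* underscore. Below is a Rust sketch with hypothetical names (`chunk_id`, `parse_chunk_id`) and a std-only copy of `ChunkType`.

```rust
/// Std-only mirror of the ChunkType enum in section 7.1.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChunkType {
    Sentence,
    Cut,
    TimeBased,
}

impl ChunkType {
    pub fn as_str(self) -> &'static str {
        match self {
            ChunkType::Sentence => "sentence",
            ChunkType::Cut => "cut",
            ChunkType::TimeBased => "time_based",
        }
    }

    pub fn from_prefix(prefix: &str) -> Option<Self> {
        match prefix {
            "sentence" => Some(ChunkType::Sentence),
            "cut" => Some(ChunkType::Cut),
            "time_based" => Some(ChunkType::TimeBased),
            _ => None,
        }
    }
}

/// {chunk_type}_{chunk_index:04}
pub fn chunk_id(chunk_type: ChunkType, index: u32) -> String {
    format!("{}_{:04}", chunk_type.as_str(), index)
}

/// Inverse of chunk_id. Splits at the LAST underscore because the
/// "time_based" prefix itself contains one.
pub fn parse_chunk_id(id: &str) -> Option<(ChunkType, u32)> {
    let (prefix, digits) = id.rsplit_once('_')?;
    let chunk_type = ChunkType::from_prefix(prefix)?;
    let index = digits.parse().ok()?;
    Some((chunk_type, index))
}
```

`{:04}` only sets a minimum width. Index 12345 therefore becomes `cut_12345`, which still parses correctly.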

---

## 6. Database Schema

### 6.1 PostgreSQL Table

```sql
CREATE TABLE chunks (
    id BIGSERIAL PRIMARY KEY,
    uuid VARCHAR(16) NOT NULL,
    chunk_id VARCHAR(64) NOT NULL,
    chunk_index INTEGER NOT NULL,
    chunk_type VARCHAR(32) NOT NULL,
    start_time DOUBLE PRECISION NOT NULL,
    start_frame BIGINT NOT NULL,
    end_time DOUBLE PRECISION NOT NULL,
    end_frame BIGINT NOT NULL,
    fps VARCHAR(16) NOT NULL,
    fps_value DOUBLE PRECISION NOT NULL,
    content JSONB NOT NULL,
    metadata JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(uuid, chunk_id)
);

-- Indexes
CREATE INDEX idx_chunks_uuid ON chunks(uuid);
CREATE INDEX idx_chunks_type ON chunks(chunk_type);
CREATE INDEX idx_chunks_time ON chunks(start_time, end_time);
CREATE INDEX idx_chunks_uuid_type ON chunks(uuid, chunk_type);
```

### 6.2 Query Examples

```sql
-- All chunks of a video
SELECT * FROM chunks WHERE uuid = '1636719dc31f78ac';

-- Chunks of a specific type
SELECT * FROM chunks WHERE uuid = '1636719dc31f78ac' AND chunk_type = 'sentence';

-- Chunks that overlap the time window 20.0s-30.0s
SELECT * FROM chunks
WHERE uuid = '1636719dc31f78ac'
  AND start_time <= 30.0 AND end_time >= 20.0;

-- Same window, all chunk types, ordered by type and then by index
SELECT * FROM chunks
WHERE uuid = '1636719dc31f78ac'
  AND start_time <= 30.0 AND end_time >= 20.0
ORDER BY chunk_type, chunk_index;
```
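
The range queries above use the closed-interval overlap test `start_time <= t1 AND end_time >= t0` for a window `[t0, t1]`. One consequence is that chunks touching only the edge of the window also match. The Rust sketch below mirrors that condition and a strict variant (both function names are hypothetical). In SQL, the strict variant reads `start_time < 30.0 AND end_time > 20.0`.

```rust
/// Inclusive test used by the SQL in 6.2:
/// chunk [start, end] intersects window [t0, t1] iff start <= t1 && end >= t0.
pub fn overlaps(start: f64, end: f64, t0: f64, t1: f64) -> bool {
    start <= t1 && end >= t0
}

/// Strict variant: chunks that merely touch the window edge are excluded.
pub fn overlaps_strict(start: f64, end: f64, t0: f64, t1: f64) -> bool {
    start < t1 && end > t0
}
```

Take the 35 s no-overlap TimeBased chunks from 3.3 and the window 20.0s-30.0s. The inclusive test matches `10.0s - 20.0s`, `20.0s - 30.0s`, and `30.0s - 35.0s`. The strict test matches only `20.0s - 30.0s`.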

---

## 7. Rust Data Structures

### 7.1 Chunk Definition

```rust
use serde::{Deserialize, Serialize};

// Serialized as "sentence" / "cut" / "time_based", the same strings as as_str()
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum ChunkType {
    Sentence,
    Cut,
    TimeBased,
}

impl ChunkType {
    pub fn as_str(&self) -> &'static str {
        match self {
            ChunkType::Sentence => "sentence",
            ChunkType::Cut => "cut",
            ChunkType::TimeBased => "time_based",
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Chunk {
    pub uuid: String,
    pub chunk_id: String,
    pub chunk_index: u32,
    pub chunk_type: ChunkType,
    pub start_time: f64,
    pub start_frame: i64,
    pub end_time: f64,
    pub end_frame: i64,
    pub fps: String,
    pub fps_value: f64,
    pub content: serde_json::Value,
    pub metadata: Option<serde_json::Value>,
}
```

### 7.2 Creating a Chunk

```rust
/// Minimal helper assumed by Chunk::new (not defined elsewhere in this spec):
/// converts a rational frame rate such as "24/1" or "30000/1001" to f64.
/// Panics on malformed input.
fn parse_fps(fps: &str) -> f64 {
    let (num, den) = fps.split_once('/').expect("fps must look like \"num/den\"");
    let num: f64 = num.parse().expect("invalid fps numerator");
    let den: f64 = den.parse().expect("invalid fps denominator");
    num / den
}

impl Chunk {
    pub fn new(
        uuid: String,
        chunk_index: u32,
        chunk_type: ChunkType,
        start_time: f64,
        end_time: f64,
        fps: &str,
        content: serde_json::Value,
    ) -> Self {
        let fps_value = parse_fps(fps);
        // frame_number = floor(time_in_seconds * fps), see section 2.2
        let start_frame = (start_time * fps_value).floor() as i64;
        let end_frame = (end_time * fps_value).floor() as i64;
        // {chunk_type}_{chunk_index:04}, see section 5
        let chunk_id = format!("{}_{:04}", chunk_type.as_str(), chunk_index);

        Self {
            uuid,
            chunk_id,
            chunk_index,
            chunk_type,
            start_time,
            start_frame,
            end_time,
            end_frame,
            fps: fps.to_string(),
            fps_value,
            content,
            metadata: None,
        }
    }
}
```
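
Several `Chunk` fields are derived from others: `chunk_id` comes from the type and index, and the frame numbers come from the times and `fps_value`. A consistency check before writing to the database is therefore cheap. The std-only Rust sketch below is hypothetical. `ChunkTimes` is a reduced stand-in for `Chunk`, and the rules it checks are the ones implied by sections 2.2 and 5, not an existing validation routine.

```rust
/// Reduced, hypothetical stand-in for `Chunk` (section 7.1).
#[derive(Debug, Clone, Copy)]
pub struct ChunkTimes<'a> {
    pub chunk_id: &'a str,
    pub chunk_type: &'a str,
    pub chunk_index: u32,
    pub start_time: f64,
    pub end_time: f64,
    pub start_frame: i64,
    pub end_frame: i64,
    pub fps_value: f64,
}

/// Check the invariants implied by sections 2.2 and 5.
pub fn validate(c: &ChunkTimes) -> Result<(), String> {
    if !matches!(c.chunk_type, "sentence" | "cut" | "time_based") {
        return Err(format!("unknown chunk_type {:?}", c.chunk_type));
    }
    // `!(x > 0.0)` also rejects NaN
    if !(c.fps_value > 0.0) {
        return Err("fps_value must be positive".to_string());
    }
    if !(c.start_time >= 0.0 && c.start_time <= c.end_time) {
        return Err(format!("invalid time range {}..{}", c.start_time, c.end_time));
    }
    let expected_id = format!("{}_{:04}", c.chunk_type, c.chunk_index);
    if c.chunk_id != expected_id {
        return Err(format!("chunk_id {} should be {}", c.chunk_id, expected_id));
    }
    // frame_number = floor(time_in_seconds * fps)
    let frame = |t: f64| (t * c.fps_value).floor() as i64;
    if c.start_frame != frame(c.start_time) || c.end_frame != frame(c.end_time) {
        return Err("frame numbers do not match floor(time * fps)".to_string());
    }
    Ok(())
}
```

The example in 4.1 (`sentence_0001`, 10.5 s to 15.75 s, frames 252 to 378 at 24 fps) passes these checks.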

---

## 8. Time Splitter Implementation

### 8.1 TimeBasedSplitter

```rust
pub struct TimeBasedSplitter {
    pub duration: f64, // length of each chunk (seconds)
    pub overlap: f64,  // overlap between consecutive chunks (seconds)
}

impl TimeBasedSplitter {
    pub fn new(duration: f64, overlap: f64) -> Self {
        // The step (duration - overlap) must be positive, otherwise split() never advances
        assert!(
            duration > 0.0 && overlap >= 0.0 && overlap < duration,
            "require duration > 0 and 0 <= overlap < duration"
        );
        Self { duration, overlap }
    }

    /// `fps` is the rational frame-rate string, e.g. "24/1" or "30000/1001",
    /// so fractional rates are not truncated.
    pub fn split(&self, uuid: &str, video_duration: f64, fps: &str) -> Vec<Chunk> {
        let step = self.duration - self.overlap;

        // Compute all (start, end) pairs first so total_segments is known
        let mut bounds = Vec::new();
        let mut index: u32 = 0;
        loop {
            // index * step instead of repeated addition avoids floating-point drift
            let start = index as f64 * step;
            if start >= video_duration {
                break;
            }
            let end = (start + self.duration).min(video_duration);
            bounds.push((start, end));
            if end >= video_duration {
                break; // this chunk already reaches the end of the video
            }
            index += 1;
        }

        let total = bounds.len();
        bounds
            .iter()
            .enumerate()
            .map(|(i, &(start, end))| {
                Chunk::new(
                    uuid.to_string(),
                    i as u32,
                    ChunkType::TimeBased,
                    start,
                    end,
                    fps,
                    serde_json::json!({
                        "duration": end - start,
                        "is_last": i + 1 == total,
                        "segment_number": i + 1,
                        "total_segments": total,
                    }),
                )
            })
            .collect()
    }
}
```

### 8.2 Usage Example

```rust
// Time splitter: 10 s chunks, no overlap
let splitter = TimeBasedSplitter::new(10.0, 0.0);
let chunks = splitter.split(&uuid, video_duration, "24/1");

// Time splitter: 10 s chunks, 2 s overlap
let splitter = TimeBasedSplitter::new(10.0, 2.0);
let chunks = splitter.split(&uuid, video_duration, "24/1");
```

---

## 9. Processing Pipeline

### 9.1 Full Pipeline

```
1. Register (register the video)
   └── Obtain UUID, video_duration, fps

2. Probe (probe the video)
   └── Obtain streams, format, fps

3. Generate Sentence chunks
   └── Read the ASR output
   └── Create one chunk per segment

4. Generate Cut chunks
   └── Run scene detection
   └── Create one chunk per scene

5. Generate TimeBased chunks
   └── Use TimeBasedSplitter
   └── Create one chunk per time segment

6. Save to the database
   └── Batch insert into PostgreSQL
```
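
Step 6 writes all chunks of a video in batches. One way to do this is a multi-row `INSERT` with numbered placeholders that the PostgreSQL driver binds positionally. The Rust sketch below only builds the SQL text, and `batch_insert_sql` is a hypothetical name. The `ON CONFLICT (uuid, chunk_id) DO NOTHING` clause relies on the table's `UNIQUE(uuid, chunk_id)` constraint and is suggested here to make re-running the step idempotent; it is not part of the spec.

```rust
/// Columns written per chunk; `id` and `created_at` are filled in by PostgreSQL.
const COLUMNS: [&str; 12] = [
    "uuid", "chunk_id", "chunk_index", "chunk_type",
    "start_time", "start_frame", "end_time", "end_frame",
    "fps", "fps_value", "content", "metadata",
];

/// Build one multi-row INSERT with placeholders $1, $2, ... for `rows` chunks.
pub fn batch_insert_sql(rows: usize) -> String {
    assert!(rows > 0, "a batch needs at least one row");
    let n = COLUMNS.len();
    let values: Vec<String> = (0..rows)
        .map(|r| {
            // Row r uses placeholders r*n+1 ..= r*n+n
            let ph: Vec<String> = (1..=n).map(|c| format!("${}", r * n + c)).collect();
            format!("({})", ph.join(", "))
        })
        .collect();
    format!(
        "INSERT INTO chunks ({}) VALUES {} ON CONFLICT (uuid, chunk_id) DO NOTHING",
        COLUMNS.join(", "),
        values.join(", ")
    )
}
```

PostgreSQL accepts at most 65,535 bind parameters per statement, so with 12 columns one batch can hold at most 5,461 chunks. If reprocessing a video should overwrite existing rows, use `ON CONFLICT ... DO UPDATE` with the `EXCLUDED` values instead of `DO NOTHING`.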

### 9.2 Output Example

```
Video: 35 s, FPS: 24

Sentence chunks (3):
  sentence_0000: 0.0s - 10.0s (240 frames)
  sentence_0001: 10.0s - 20.0s (240 frames)
  sentence_0002: 20.0s - 35.0s (360 frames)

Cut chunks (3):
  cut_0000: 0.0s - 15.0s (360 frames)
  cut_0001: 15.0s - 28.0s (312 frames)
  cut_0002: 28.0s - 35.0s (168 frames)

TimeBased chunks (5, 2 s overlap):
  time_based_0000: 0.0s - 10.0s (240 frames)
  time_based_0001: 8.0s - 18.0s (240 frames)
  time_based_0002: 16.0s - 26.0s (240 frames)
  time_based_0003: 24.0s - 34.0s (240 frames)
  time_based_0004: 32.0s - 35.0s (72 frames)
```

---

## 10. Related Documents

- [JSON_OUTPUT_SPEC.md](./JSON_OUTPUT_SPEC.md) - JSON output specification
- [RUST_DEVELOPMENT.md](./RUST_DEVELOPMENT.md) - Rust development guidelines
- [AGENTS.md](../AGENTS.md) - Development guidelines