# Video Chunk Splitting Specification

This document defines the splitting rules and data structures for video chunks in the Momentry Core system.

---

## 1. Chunk Overview

### 1.1 Design Principles

1. **Overlap allowed**: Chunks of different types may overlap (for example, a sentence chunk and a time-based chunk)
2. **Frame accuracy**: Time coordinates are accurate to the video frame
3. **Multiple classifications**: Three splitting methods are supported: sentence, scene, and time

### 1.2 Chunk Types

| Type | Description | Overlap |
|------|------|------------|
| **Sentence** | Split by sentence | ✅ May overlap other types |
| **Cut** | Split by scene | ✅ May overlap other types |
| **TimeBased** | Split by fixed time length | ✅ May overlap other types |

---

## 2. Time Coordinate System

### 2.1 Time Format

All times are expressed in **seconds**, with **microsecond** precision (floating point):

```json
{
  "start_time": 10.5,
  "end_time": 15.75
}
```

### 2.2 Frame Calculation

```
frame_number = floor(time_in_seconds * fps)
time_at_frame = frame_number / fps
```

**Example**:
- Video FPS: 24/1 (24 fps)
- Time: 10.5 s
- Frame: floor(10.5 * 24) = 252
- Check: 252 / 24 = 10.5 s ✅

### 2.3 Frame Information Structure

```json
{
  "start_time": 10.5,
  "start_frame": 252,
  "end_time": 15.75,
  "end_frame": 378,
  "fps": "24/1",
  "fps_value": 24.0
}
```

---

## 3. Three Splitting Methods

### 3.1 Sentence (Sentence Splitting)

**Rules**:
- Based on ASR (speech recognition) results
- Each recognized sentence becomes one chunk
- Text content comes from the ASR output

**Example**:
```
ASR output:
[
  {"start": 10.0, "end": 15.0, "text": "Hello world"},
  {"start": 15.0, "end": 20.0, "text": "This is a test"},
  {"start": 20.0, "end": 25.5, "text": "Processing video"}
]

Converted to chunks:
┌──────────────────────────────────────────────────┐
│ sentence_0000: 10.0s - 15.0s "Hello world"       │
├──────────────────────────────────────────────────┤
│ sentence_0001: 15.0s - 20.0s "This is a test"    │
├──────────────────────────────────────────────────┤
│ sentence_0002: 20.0s - 25.5s "Processing video"  │
└──────────────────────────────────────────────────┘
```

### 3.2 Cut (Scene Splitting)

**Rules**:
- Based on shot changes in the video (scene change / cut detection)
- Detected with ffmpeg or Python (scenedetect)
- Each scene becomes one chunk

**Detection method**:
```bash
# Detect scene changes with ffmpeg
ffmpeg -i input.mp4 -filter:v "select='gt(scene,0.3)',showinfo" -f null -
```

**Example**:
```
Scene detection result:
[
  {"start": 0.0, "end": 45.2, "scene_id": 1},
  {"start": 45.2, "end": 120.5, "scene_id": 2},
  {"start": 120.5, "end": 180.0, "scene_id": 3}
]

Converted to chunks:
┌────────────────────────────────────────┐
│ cut_0000: 0.0s - 45.2s (Scene 1)       │
├────────────────────────────────────────┤
│ cut_0001: 45.2s - 120.5s (Scene 2)     │
├────────────────────────────────────────┤
│ cut_0002: 120.5s - 180.0s (Scene 3)    │
└────────────────────────────────────────┘
```

### 3.3 TimeBased (Fixed-Length Splitting)

**Rules**:
- Split at a fixed time length
- Each chunk is **10 seconds** long by default
- The last chunk may be shorter than 10 seconds
- **Overlap is supported** (the overlap length in seconds is configurable)

**Parameters**:

| Parameter | Default | Description |
|------|--------|------|
| duration | 10.0 | Length of each chunk (seconds) |
| overlap | 0.0 | Overlap length (seconds) |

**Example** (no overlap):
```
Video length: 35 s, duration=10

Chunks:
┌──────────────────────────────────────────────────┐
│ time_based_0000: 0.0s - 10.0s                    │
├──────────────────────────────────────────────────┤
│ time_based_0001: 10.0s - 20.0s                   │
├──────────────────────────────────────────────────┤
│ time_based_0002: 20.0s - 30.0s                   │
├──────────────────────────────────────────────────┤
│ time_based_0003: 30.0s - 35.0s (under 10 s)      │
└──────────────────────────────────────────────────┘
```

**Example** (with overlap, overlap=2):
```
Video length: 35 s, duration=10, overlap=2

Chunks:
┌──────────────────────────────────────────────────┐
│ time_based_0000: 0.0s - 10.0s                    │
├──────────────────────────────────────────────────┤
│ time_based_0001: 8.0s - 18.0s (2 s overlap)      │
├──────────────────────────────────────────────────┤
│ time_based_0002: 16.0s - 26.0s (2 s overlap)     │
├──────────────────────────────────────────────────┤
│ time_based_0003: 24.0s - 34.0s (2 s overlap)     │
├──────────────────────────────────────────────────┤
│ time_based_0004: 32.0s - 35.0s (overlap, short)  │
└──────────────────────────────────────────────────┘
```

---

## 4. Chunk Data Structure

### 4.1 Basic Structure

```json
{
  "uuid": "1636719dc31f78ac",
  "chunk_id": "sentence_0001",
  "chunk_index": 1,
  "chunk_type": "sentence",
  "start_time": 10.5,
  "start_frame": 252,
  "end_time": 15.75,
  "end_frame": 378,
  "fps": "24/1",
  "fps_value": 24.0,
  "content": {
    "text": "Hello world, this is a test"
  },
  "metadata": {
    "source": "asr",
    "confidence": 0.95,
    "language": "en"
  }
}
```

### 4.2 Field Reference

| Field | Type | Required | Description |
|------|------|------|------|
| `uuid` | String | ✅ | Video UUID (16 characters) |
| `chunk_id` | String | ✅ | Unique chunk ID |
| `chunk_index` | Integer | ✅ | Chunk index (starts at 0) |
| `chunk_type` | String | ✅ | Type: sentence/cut/time_based |
| `start_time` | Float | ✅ | Start time (seconds) |
| `start_frame` | Integer | ✅ | Start frame number |
| `end_time` | Float | ✅ | End time (seconds) |
| `end_frame` | Integer | ✅ | End frame number |
| `fps` | String | ✅ | FPS as a rational string (e.g. "24/1") |
| `fps_value` | Float | ✅ | FPS as a number (e.g. 24.0) |
| `content` | Object | ✅ | Content (see below) |
| `metadata` | Object | ❌ | Additional information (see below) |

### 4.3 Content Structure

The structure of `content` depends on `chunk_type`:

#### Sentence Content

```json
{
  "content": {
    "text": "Hello world, this is a test message",
    "text_normalized": "hello world this is a test message",
    "word_count": 7,
    "char_count": 34
  }
}
```

| Field | Type | Description |
|------|------|------|
| `text` | String | Original recognized text |
| `text_normalized` | String | Normalized text (lowercase, punctuation removed) |
| `word_count` | Integer | Word count |
| `char_count` | Integer | Character count (the example value 34 corresponds to `text_normalized`) |

#### Cut Content

```json
{
  "content": {
    "scene_id": 2,
    "scene_number": 2,
    "transition_type": "cut",
    "scene_change_score": 0.95
  }
}
```

| Field | Type | Description |
|------|------|------|
| `scene_id` | Integer | Scene ID |
| `scene_number` | Integer | Scene number |
| `transition_type` | String | Transition type: cut/dissolve/fade |
| `scene_change_score` | Float | Scene change score (0-1) |

#### TimeBased Content

```json
{
  "content": {
    "duration": 10.0,
    "is_last": false,
    "segment_number": 3,
    "total_segments": 10
  }
}
```

| Field | Type | Description |
|------|------|------|
| `duration` | Float | Length (seconds) |
| `is_last` | Boolean | Whether this is the last chunk |
| `segment_number` | Integer | Segment number |
| `total_segments` | Integer | Total number of segments |

### 4.4 Metadata Structure

```json
{
  "metadata": {
    "source": "asr",
    "confidence": 0.95,
    "language": "en",
    "model": "tiny",
    "created_at": "2026-03-16T10:00:00Z"
  }
}
```

| Field | Type | Description |
|------|------|------|
| `source` | String | Source: asr/scene_detect/time_based |
| `confidence` | Float | Confidence (0-1) |
| `language` | String | Language code |
| `model` | String | Model used |
| `created_at` | String | Creation time (ISO 8601) |

---

## 5. Chunk ID Naming Convention

### 5.1 Format

```
{chunk_type}_{chunk_index:04}
```

| Type | Prefix | Example |
|------|------|------|
| Sentence | `sentence_` | `sentence_0001` |
| Cut | `cut_` | `cut_0001` |
| TimeBased | `time_based_` | `time_based_0001` |

### 5.2 Numbering Rules

- Numbering starts at **0**
- Numbers are zero-padded to **4 digits**
- Numbers increase in chronological order

---

## 6. Database Schema

### 6.1 PostgreSQL Table

```sql
CREATE TABLE chunks (
    id BIGSERIAL PRIMARY KEY,
    uuid VARCHAR(16) NOT NULL,
    chunk_id VARCHAR(64) NOT NULL,
    chunk_index INTEGER NOT NULL,
    chunk_type VARCHAR(32) NOT NULL,
    start_time DOUBLE PRECISION NOT NULL,
    start_frame BIGINT NOT NULL,
    end_time DOUBLE PRECISION NOT NULL,
    end_frame BIGINT NOT NULL,
    fps VARCHAR(16) NOT NULL,
    fps_value DOUBLE PRECISION NOT NULL,
    content JSONB NOT NULL,
    metadata JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(uuid, chunk_id)
);

-- Indexes
CREATE INDEX idx_chunks_uuid ON chunks(uuid);
CREATE INDEX idx_chunks_type ON chunks(chunk_type);
CREATE INDEX idx_chunks_time ON chunks(start_time, end_time);
CREATE INDEX idx_chunks_uuid_type ON chunks(uuid, chunk_type);
```

### 6.2 Query Examples

```sql
-- All chunks of a video
SELECT * FROM chunks WHERE uuid = '1636719dc31f78ac';

-- Chunks of a specific type
SELECT * FROM chunks
WHERE uuid = '1636719dc31f78ac' AND chunk_type = 'sentence';

-- Chunks that overlap the 20-30 s time range
SELECT * FROM chunks
WHERE uuid = '1636719dc31f78ac'
  AND start_time <= 30.0
  AND end_time >= 20.0;

-- All chunks that overlap the 20-30 s time range (mixed types), grouped by type
SELECT * FROM chunks
WHERE uuid = '1636719dc31f78ac'
  AND start_time <= 30.0
  AND end_time >= 20.0
ORDER BY chunk_type, chunk_index;
```
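The time-range queries above select every chunk whose interval intersects the query window, with inclusive boundaries: a chunk that ends exactly at 20.0 s still matches a 20-30 s window. The same predicate can be sketched in Rust as follows. `TimeRange`, `overlaps`, and `overlapping_ids` are hypothetical names introduced only for this illustration and are not part of Momentry Core.

```rust
/// Hypothetical helper type for illustration; not part of Momentry Core.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimeRange {
    pub start: f64, // seconds
    pub end: f64,   // seconds
}

impl TimeRange {
    /// Mirrors the SQL predicate
    /// `start_time <= query.end AND end_time >= query.start`.
    /// Both comparisons are inclusive, as in the queries above.
    pub fn overlaps(&self, query: &TimeRange) -> bool {
        self.start <= query.end && self.end >= query.start
    }
}

/// Returns the IDs of the chunks whose range overlaps `query`, in input order.
pub fn overlapping_ids<'a>(
    chunks: &[(&'a str, TimeRange)],
    query: &TimeRange,
) -> Vec<&'a str> {
    chunks
        .iter()
        .filter(|(_, range)| range.overlaps(query))
        .map(|(id, _)| *id)
        .collect()
}
```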

---

## 7. Rust Data Structures

### 7.1 Chunk Definition

```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum ChunkType {
    Sentence,
    Cut,
    TimeBased,
}

impl ChunkType {
    /// Returns the string used in `chunk_type` and as the chunk ID prefix.
    pub fn as_str(&self) -> &'static str {
        match self {
            ChunkType::Sentence => "sentence",
            ChunkType::Cut => "cut",
            ChunkType::TimeBased => "time_based",
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Chunk {
    pub uuid: String,
    pub chunk_id: String,
    pub chunk_index: u32,
    pub chunk_type: ChunkType,
    pub start_time: f64,
    pub start_frame: i64,
    pub end_time: f64,
    pub end_frame: i64,
    pub fps: String,
    pub fps_value: f64,
    pub content: serde_json::Value,
    pub metadata: Option<serde_json::Value>,
}
```

### 7.2 Creating a Chunk

```rust
impl Chunk {
    pub fn new(
        uuid: String,
        chunk_index: u32,
        chunk_type: ChunkType,
        start_time: f64,
        end_time: f64,
        fps: &str,
        content: serde_json::Value,
    ) -> Self {
        // `parse_fps` is not defined in this document; it is assumed to turn a
        // rational string such as "24/1" into its numeric value (24.0).
        let fps_value = parse_fps(fps);
        // frame_number = floor(time_in_seconds * fps), as in section 2.2
        let start_frame = (start_time * fps_value).floor() as i64;
        let end_frame = (end_time * fps_value).floor() as i64;
        // Chunk ID format: {chunk_type}_{chunk_index:04}
        let chunk_id = format!("{}_{:04}", chunk_type.as_str(), chunk_index);

        Self {
            uuid,
            chunk_id,
            chunk_index,
            chunk_type,
            start_time,
            start_frame,
            end_time,
            end_frame,
            fps: fps.to_string(),
            fps_value,
            content,
            metadata: None,
        }
    }
}
```

---

## 8. Time Splitter Implementation

### 8.1 TimeBasedSplitter

```rust
pub struct TimeBasedSplitter {
    pub duration: f64, // Length of each chunk (seconds)
    pub overlap: f64,  // Overlap length (seconds)
}

impl TimeBasedSplitter {
    pub fn new(duration: f64, overlap: f64) -> Self {
        Self { duration, overlap }
    }

    pub fn split(&self, uuid: &str, video_duration: f64, fps: f64) -> Vec<Chunk> {
        // The step must be positive, otherwise the loop below never terminates.
        assert!(self.overlap >= 0.0 && self.overlap < self.duration);

        let mut chunks = Vec::new();
        let step = self.duration - self.overlap;
        let mut current_time = 0.0;
        let mut index = 0;

        while current_time < video_duration {
            // Clamp the last chunk to the end of the video.
            let end_time = (current_time + self.duration).min(video_duration);

            let chunk = Chunk::new(
                uuid.to_string(),
                index,
                ChunkType::TimeBased,
                current_time,
                end_time,
                // Note: the cast drops any fractional part of `fps`
                // (29.97 becomes "29/1").
                &format!("{}/1", fps as u32),
                serde_json::json!({
                    "duration": end_time - current_time,
                    "is_last": end_time >= video_duration,
                    "segment_number": index + 1,
                }),
            );

            chunks.push(chunk);
            current_time += step;
            index += 1;
        }

        chunks
    }
}
```

### 8.2 Usage Example

```rust
// Create a time splitter (10 s, no overlap)
let splitter = TimeBasedSplitter::new(10.0, 0.0);
let chunks = splitter.split(&uuid, video_duration, 24.0);

// Create a time splitter (10 s, 2 s overlap)
let splitter = TimeBasedSplitter::new(10.0, 2.0);
let chunks = splitter.split(&uuid, video_duration, 24.0);
```

---

## 9. Processing Flow

### 9.1 Full Flow

```
1. Register (register the video)
   └── Obtain UUID, video_duration, fps

2. Probe (probe the video)
   └── Obtain streams, format, fps

3. Generate Sentence Chunks
   └── Read the ASR output
   └── Create a chunk for each segment

4. Generate Cut Chunks
   └── Run scene detection
   └── Create a chunk for each scene

5. Generate TimeBased Chunks
   └── Use TimeBasedSplitter
   └── Create a chunk for each time segment

6. Save to the database
   └── Batch write to PostgreSQL
```

### 9.2 Output Example

Frame counts below are the length of each chunk in frames (`end_frame - start_frame`).

```
Video: 35 s, FPS: 24

Sentence Chunks (3):
  sentence_0000: 0.0s - 10.0s (240 frames)
  sentence_0001: 10.0s - 20.0s (240 frames)
  sentence_0002: 20.0s - 35.0s (360 frames)

Cut Chunks (3):
  cut_0000: 0.0s - 15.0s (360 frames)
  cut_0001: 15.0s - 28.0s (312 frames)
  cut_0002: 28.0s - 35.0s (168 frames)

TimeBased Chunks (5, duration 10 s, 2 s overlap):
  time_based_0000: 0.0s - 10.0s (240 frames)
  time_based_0001: 8.0s - 18.0s (240 frames)
  time_based_0002: 16.0s - 26.0s (240 frames)
  time_based_0003: 24.0s - 34.0s (240 frames)
  time_based_0004: 32.0s - 35.0s (72 frames)
```

---

## 10. Related Documents

- [JSON_OUTPUT_SPEC.md](./JSON_OUTPUT_SPEC.md) - JSON output specification
- [RUST_DEVELOPMENT.md](./RUST_DEVELOPMENT.md) - Rust development guidelines
- [AGENTS.md](../AGENTS.md) - Development guidelines