momentry_core_0_1/docs/CHUNK_SPEC.md
accusys 26f73ab620 docs: Add video chunk specification
- Define three chunk types: Sentence, Cut, TimeBased
- Support overlapping chunks
- Frame-accurate timestamps
- Include content and metadata structures
- Add PostgreSQL schema
- Document Rust data structures and splitter implementation
2026-03-16 15:59:15 +08:00


Video Chunk Splitting Specification

This document defines the principles and data structures for splitting videos into chunks in the Momentry Core system.


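1. Chunk Overview

1.1 Design Principles

  1. Overlap allowed: chunks of different types may overlap (e.g., a sentence chunk and a time-based chunk).
  2. Frame accuracy: time coordinates are accurate to a single video frame.
  3. Multiple splitting methods: sentence, scene (cut), and fixed-duration (time-based) splitting are all supported.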

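1.2 Chunk Types

| Type | Description | Overlap |
|------|-------------|---------|
| Sentence | Sentence segmentation | May overlap other types |
| Cut | Scene cuts | May overlap other types |
| TimeBased | Fixed-duration segments | May overlap other types |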

2. Time Coordinate System

2.1 Time Format

All times are expressed in seconds as floating-point numbers, with microsecond precision.

{
  "start_time": 10.5,
  "end_time": 15.75
}

2.2 Frame Calculation

frame_number = floor(time_in_seconds * fps)
time_at_frame = frame_number / fps

Example:

  • Video FPS: 24/1 (24 fps)
  • Time: 10.5 seconds
  • Frame: floor(10.5 * 24) = 252
  • Check: 252 / 24 = 10.5 seconds
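The two formulas translate directly into code. A minimal std-only Rust sketch, assuming `fps` has already been converted to a float; the function names are illustrative, not part of Momentry Core:

```rust
/// Frame that contains `time_in_seconds`: floor(time * fps).
fn time_to_frame(time_in_seconds: f64, fps: f64) -> i64 {
    (time_in_seconds * fps).floor() as i64
}

/// Start time of `frame_number`: frame_number / fps.
fn frame_to_time(frame_number: i64, fps: f64) -> f64 {
    frame_number as f64 / fps
}
```

With the example values, `time_to_frame(10.5, 24.0)` is 252 and `frame_to_time(252, 24.0)` is 10.5. Because the frame is floored, any time inside a frame maps back to that frame's start: `time_to_frame(10.51, 24.0)` is also 252.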

2.3 Frame Information Structure

{
  "start_time": 10.5,
  "start_frame": 252,
  "end_time": 15.75,
  "end_frame": 378,
  "fps": "24/1",
  "fps_value": 24.0
}
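`fps` keeps the rational form reported for the video (for example `24/1`, or `30000/1001` for 29.97 fps), while `fps_value` holds its decimal value. The sketch below shows one way to fill this structure from a rational FPS string. `parse_fps` and `FrameInfo` are illustrative names, and returning `None` for malformed strings or a zero denominator is an assumption:

```rust
#[derive(Debug, PartialEq)]
struct FrameInfo {
    start_time: f64,
    start_frame: i64,
    end_time: f64,
    end_frame: i64,
    fps: String,
    fps_value: f64,
}

/// Parse "num/den" (or a bare number) into a float; None if malformed or den == 0.
fn parse_fps(fps: &str) -> Option<f64> {
    match fps.split_once('/') {
        Some((num, den)) => {
            let num: f64 = num.trim().parse().ok()?;
            let den: f64 = den.trim().parse().ok()?;
            if den == 0.0 { None } else { Some(num / den) }
        }
        None => fps.trim().parse().ok(),
    }
}

/// Build the frame information block, using frame = floor(time * fps).
fn frame_info(start_time: f64, end_time: f64, fps: &str) -> Option<FrameInfo> {
    let fps_value = parse_fps(fps)?;
    Some(FrameInfo {
        start_time,
        start_frame: (start_time * fps_value).floor() as i64,
        end_time,
        end_frame: (end_time * fps_value).floor() as i64,
        fps: fps.to_string(),
        fps_value,
    })
}
```

`frame_info(10.5, 15.75, "24/1")` reproduces the JSON above: frames 252 and 378, with `fps_value` 24.0.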

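3. Splitting Methods

3.1 Sentence (sentence segmentation)

Principles:

  • Based on ASR (automatic speech recognition) results
  • Each recognized sentence becomes one chunk
  • The text content comes from the ASR output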

Example:

ASR output:
[
  {"start": 10.0, "end": 15.0, "text": "Hello world"},
  {"start": 15.0, "end": 20.0, "text": "This is a test"},
  {"start": 20.0, "end": 25.5, "text": "Processing video"}
]

Converted to chunks:
┌──────────────────────────────────────────────────┐
│ chunk_0000: 10.0s - 15.0s "Hello world"          │
├──────────────────────────────────────────────────┤
│ chunk_0001: 15.0s - 20.0s "This is a test"       │
├──────────────────────────────────────────────────┤
│ chunk_0002: 20.0s - 25.5s "Processing video"     │
└──────────────────────────────────────────────────┘
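Here is a sketch of the conversion, assuming ASR segments arrive as `(start, end, text)` records like the JSON above. `AsrSegment` and `SentenceChunk` are hypothetical std-only stand-ins for the full `Chunk` type in section 7, and indices start at 0 as section 5.2 specifies:

```rust
struct AsrSegment {
    start: f64,
    end: f64,
    text: String,
}

#[derive(Debug, PartialEq)]
struct SentenceChunk {
    chunk_id: String,
    chunk_index: u32,
    start_time: f64,
    end_time: f64,
    text: String,
}

/// One chunk per ASR segment, ids sentence_0000, sentence_0001, ...
fn sentence_chunks(segments: &[AsrSegment]) -> Vec<SentenceChunk> {
    let mut sorted: Vec<&AsrSegment> = segments.iter().collect();
    // Sort by start time so indices follow chronological order (section 5.2).
    sorted.sort_by(|a, b| a.start.total_cmp(&b.start));
    sorted
        .into_iter()
        .enumerate()
        .map(|(i, seg)| SentenceChunk {
            chunk_id: format!("sentence_{:04}", i),
            chunk_index: i as u32,
            start_time: seg.start,
            end_time: seg.end,
            text: seg.text.clone(),
        })
        .collect()
}
```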

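3.2 Cut (scene segmentation)

Principles:

  • Based on shot changes in the video (scene change / cut detection)
  • Detected with ffmpeg or with Python (PySceneDetect, package scenedetect)
  • Each scene becomes one chunk

Detection method:

# Detect scene changes with ffmpeg (scene-change score threshold: 0.3)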
ffmpeg -i input.mp4 -filter:v "select='gt(scene,0.3)',showinfo" -f null -

Example:

Scene detection result:
[
  {"start": 0.0, "end": 45.2, "scene_id": 1},
  {"start": 45.2, "end": 120.5, "scene_id": 2},
  {"start": 120.5, "end": 180.0, "scene_id": 3}
]

Converted to chunks:
┌──────────────────────────────────────────────────┐
│ chunk_0000: 0.0s - 45.2s (Scene 1)               │
├──────────────────────────────────────────────────┤
│ chunk_0001: 45.2s - 120.5s (Scene 2)             │
├──────────────────────────────────────────────────┤
│ chunk_0002: 120.5s - 180.0s (Scene 3)            │
└──────────────────────────────────────────────────┘
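The ffmpeg command in this section writes one `showinfo` log line to stderr for each frame the `select` filter passes, and each line carries a `pts_time:<seconds>` field that marks a cut. The sketch below turns those cut times into scene ranges like the ones above. It assumes the `pts_time:` token format printed by current ffmpeg builds, and both function names are hypothetical:

```rust
/// Extract every pts_time value from ffmpeg showinfo log output.
fn parse_cut_times(stderr_log: &str) -> Vec<f64> {
    stderr_log
        .lines()
        .filter_map(|line| {
            let rest = line.split("pts_time:").nth(1)?;
            rest.split_whitespace().next()?.parse().ok()
        })
        .collect()
}

/// Turn cut times into (scene_id, start, end) ranges covering [0, video_duration].
/// Scene ids start at 1, as in the example above.
fn scenes_from_cuts(cuts: &[f64], video_duration: f64) -> Vec<(u32, f64, f64)> {
    // Boundaries: 0, every cut strictly inside the video, and the end of the video.
    let mut bounds: Vec<f64> = cuts
        .iter()
        .copied()
        .filter(|&t| t > 0.0 && t < video_duration)
        .collect();
    bounds.push(0.0);
    bounds.push(video_duration);
    bounds.sort_by(|a, b| a.total_cmp(b));
    bounds.dedup(); // drop duplicate cuts so no zero-length scenes appear
    bounds
        .windows(2)
        .enumerate()
        .map(|(i, w)| (i as u32 + 1, w[0], w[1]))
        .collect()
}
```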

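3.3 TimeBased (fixed-duration segmentation)

Principles:

  • Split into segments of fixed length
  • Default: 10 seconds per chunk
  • The last chunk may be shorter than 10 seconds
  • Overlap is supported (configurable via overlap, in seconds)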

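Parameters:

| Parameter | Default | Description |
|-----------|---------|-------------|
| duration | 10.0 | Length of each chunk (seconds) |
| overlap | 0.0 | Overlap between consecutive chunks (seconds); must be less than duration |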

Example (no overlap):

Video duration: 35 s, duration=10

Chunks:
┌──────────────────────────────────────────────────┐
│ chunk_0000: 0.0s - 10.0s                         │
├──────────────────────────────────────────────────┤
│ chunk_0001: 10.0s - 20.0s                        │
├──────────────────────────────────────────────────┤
│ chunk_0002: 20.0s - 30.0s                        │
├──────────────────────────────────────────────────┤
│ chunk_0003: 30.0s - 35.0s (shorter than 10 s)    │
└──────────────────────────────────────────────────┘

Example (with overlap, overlap=2):

Video duration: 35 s, duration=10, overlap=2

Chunks:
┌──────────────────────────────────────────────────┐
│ chunk_0000: 0.0s - 10.0s                         │
├──────────────────────────────────────────────────┤
│ chunk_0001: 8.0s - 18.0s (2 s overlap)           │
├──────────────────────────────────────────────────┤
│ chunk_0002: 16.0s - 26.0s (2 s overlap)          │
├──────────────────────────────────────────────────┤
│ chunk_0003: 24.0s - 34.0s (2 s overlap)          │
├──────────────────────────────────────────────────┤
│ chunk_0004: 32.0s - 35.0s (2 s overlap, < 10 s)  │
└──────────────────────────────────────────────────┘
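Both examples follow one rule. Chunk k starts at k × (duration - overlap) and ends at whichever comes first: start + duration or the end of the video. Chunks are produced as long as the start is still inside the video. Below is a std-only sketch of that rule; the function name is illustrative. It also checks that overlap is less than duration, because otherwise the step is not positive and the sequence never advances:

```rust
/// (start, end) pairs for fixed-duration chunks with optional overlap.
/// Returns None when the parameters cannot make progress.
fn time_ranges(video_duration: f64, duration: f64, overlap: f64) -> Option<Vec<(f64, f64)>> {
    let step = duration - overlap;
    if duration <= 0.0 || overlap < 0.0 || step <= 0.0 {
        return None;
    }
    let mut ranges = Vec::new();
    let mut k = 0u32;
    loop {
        // Multiply instead of accumulating `start += step` to avoid float drift.
        let start = k as f64 * step;
        if start >= video_duration {
            break;
        }
        ranges.push((start, (start + duration).min(video_duration)));
        k += 1;
    }
    Some(ranges)
}
```

For 35 s with overlap=2 the step is 8 s, so the chunks start at 0, 8, 16, 24 and 32, which gives the five chunks shown above.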

4. Chunk Data Structure

4.1 Basic Structure

{
  "uuid": "1636719dc31f78ac",
  "chunk_id": "sentence_0001",
  "chunk_index": 1,
  "chunk_type": "sentence",
  "start_time": 10.5,
  "start_frame": 252,
  "end_time": 15.75,
  "end_frame": 378,
  "fps": "24/1",
  "fps_value": 24.0,
  "content": {
    "text": "Hello world, this is a test"
  },
  "metadata": {
    "source": "asr",
    "confidence": 0.95,
    "language": "en"
  }
}

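4.2 Field Descriptions

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| uuid | String | Yes | Video UUID (16 characters) |
| chunk_id | String | Yes | Chunk ID, unique within a video |
| chunk_index | Integer | Yes | Chunk index (0-based) |
| chunk_type | String | Yes | Type: sentence / cut / time_based |
| start_time | Float | Yes | Start time (seconds) |
| start_frame | Integer | Yes | Start frame number |
| end_time | Float | Yes | End time (seconds) |
| end_frame | Integer | Yes | End frame number |
| fps | String | Yes | FPS as a rational string (e.g., "24/1") |
| fps_value | Float | Yes | FPS as a number (e.g., 24.0) |
| content | Object | Yes | Type-specific content (see 4.3) |
| metadata | Object | No | Additional information (see 4.4) |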
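A chunk can be checked against the constraints implied by this table and by the section 5 naming rule before it is written. The std-only sketch below covers a subset of the fields. `ChunkFields` and `validate` are hypothetical, and rejecting `end_time < start_time` is an assumption that the spec implies but does not state:

```rust
struct ChunkFields<'a> {
    uuid: &'a str,
    chunk_id: &'a str,
    chunk_index: u32,
    chunk_type: &'a str,
    start_time: f64,
    end_time: f64,
    start_frame: i64,
    end_frame: i64,
}

fn validate(c: &ChunkFields<'_>) -> Result<(), String> {
    if c.uuid.chars().count() != 16 {
        return Err(format!("uuid must be 16 characters, got {:?}", c.uuid));
    }
    if !matches!(c.chunk_type, "sentence" | "cut" | "time_based") {
        return Err(format!("unknown chunk_type {:?}", c.chunk_type));
    }
    // chunk_id must follow {chunk_type}_{chunk_index:04} (section 5.1).
    let expected_id = format!("{}_{:04}", c.chunk_type, c.chunk_index);
    if c.chunk_id != expected_id {
        return Err(format!("chunk_id {:?} should be {:?}", c.chunk_id, expected_id));
    }
    if c.start_time < 0.0 || c.end_time < c.start_time {
        return Err("expected 0 <= start_time <= end_time".to_string());
    }
    if c.end_frame < c.start_frame {
        return Err("end_frame must not precede start_frame".to_string());
    }
    Ok(())
}
```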

4.3 Content Structure

The structure of content depends on chunk_type:

Sentence Content

{
  "content": {
    "text": "Hello world, this is a test message",
    "text_normalized": "hello world this is a test message",
    "word_count": 7,
    "char_count": 34
  }
}
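| Field | Type | Description |
|-------|------|-------------|
| text | String | Original recognized text |
| text_normalized | String | Normalized text (lowercase, punctuation removed) |
| word_count | Integer | Number of words |
| char_count | Integer | Number of characters |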
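For `text_normalized`, the spec says only "lowercase, punctuation removed". The std-only sketch below implements one possible reading: lowercase, keep only letters, digits and whitespace, then collapse whitespace. In the example, `char_count` (34) equals the length of `text_normalized`; the raw `text` is 35 characters because of the comma. This sketch therefore counts the normalized text, but which string the field should count is an assumption to confirm. Splitting on whitespace for `word_count` works for space-delimited languages such as English; an unspaced Chinese sentence would count as one word.

```rust
struct SentenceStats {
    text_normalized: String,
    word_count: usize,
    char_count: usize,
}

fn sentence_stats(text: &str) -> SentenceStats {
    // Lowercase and keep only letters, digits and whitespace (Unicode-aware).
    let kept: String = text
        .to_lowercase()
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect();
    // Collapse runs of whitespace into single spaces.
    let words: Vec<&str> = kept.split_whitespace().collect();
    let text_normalized = words.join(" ");
    SentenceStats {
        word_count: words.len(),
        char_count: text_normalized.chars().count(),
        text_normalized,
    }
}
```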

Cut Content

{
  "content": {
    "scene_id": 2,
    "scene_number": 2,
    "transition_type": "cut",
    "scene_change_score": 0.95
  }
}
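| Field | Type | Description |
|-------|------|-------------|
| scene_id | Integer | Scene ID |
| scene_number | Integer | Scene number |
| transition_type | String | Transition type: cut / dissolve / fade |
| scene_change_score | Float | Scene change score (0-1) |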

TimeBased Content

{
  "content": {
    "duration": 10.0,
    "is_last": false,
    "segment_number": 3,
    "total_segments": 10
  }
}
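| Field | Type | Description |
|-------|------|-------------|
| duration | Float | Duration (seconds) |
| is_last | Boolean | Whether this is the last chunk |
| segment_number | Integer | Segment number (1-based) |
| total_segments | Integer | Total number of segments |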

4.4 Metadata Structure

{
  "metadata": {
    "source": "asr",
    "confidence": 0.95,
    "language": "en",
    "model": "tiny",
    "created_at": "2026-03-16T10:00:00Z"
  }
}
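| Field | Type | Description |
|-------|------|-------------|
| source | String | Source: asr / scene_detect / time_based |
| confidence | Float | Confidence (0-1) |
| language | String | Language code |
| model | String | Model used |
| created_at | String | Creation time (ISO 8601) |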

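5. Chunk ID Naming Convention

5.1 Format

{chunk_type}_{chunk_index:04}

| Type | Prefix | Example |
|------|--------|---------|
| Sentence | sentence_ | sentence_0001 |
| Cut | cut_ | cut_0001 |
| TimeBased | time_based_ | time_based_0001 |

5.2 Numbering Rules

  • Start at 0
  • Zero-pad to 4 digits
  • Increase in chronological order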
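The `time_based` prefix contains an underscore itself, so a parser must split a chunk ID at its last underscore, not its first. Here is a std-only sketch; the function names are illustrative. Note that `{:04}` sets a minimum width, so index 10000 formats as `sentence_10000` and still parses:

```rust
fn format_chunk_id(chunk_type: &str, chunk_index: u32) -> String {
    format!("{}_{:04}", chunk_type, chunk_index)
}

/// Split "time_based_0012" into ("time_based", 12); None if malformed.
fn parse_chunk_id(chunk_id: &str) -> Option<(&str, u32)> {
    // Split at the LAST underscore so "time_based" stays intact.
    let (chunk_type, index) = chunk_id.rsplit_once('_')?;
    if !matches!(chunk_type, "sentence" | "cut" | "time_based")
        || index.len() < 4
        || !index.chars().all(|c| c.is_ascii_digit())
    {
        return None;
    }
    Some((chunk_type, index.parse().ok()?))
}
```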

6. Database Schema

6.1 PostgreSQL Table

CREATE TABLE chunks (
    id BIGSERIAL PRIMARY KEY,
    uuid VARCHAR(16) NOT NULL,
    chunk_id VARCHAR(64) NOT NULL,
    chunk_index INTEGER NOT NULL,
    chunk_type VARCHAR(32) NOT NULL,
    start_time DOUBLE PRECISION NOT NULL,
    start_frame BIGINT NOT NULL,
    end_time DOUBLE PRECISION NOT NULL,
    end_frame BIGINT NOT NULL,
    fps VARCHAR(16) NOT NULL,
    fps_value DOUBLE PRECISION NOT NULL,
    content JSONB NOT NULL,
    metadata JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(uuid, chunk_id)
);

-- Indexes
CREATE INDEX idx_chunks_uuid ON chunks(uuid);
CREATE INDEX idx_chunks_type ON chunks(chunk_type);
CREATE INDEX idx_chunks_time ON chunks(start_time, end_time);
CREATE INDEX idx_chunks_uuid_type ON chunks(uuid, chunk_type);

6.2 Query Examples

-- All chunks of a video
SELECT * FROM chunks WHERE uuid = '1636719dc31f78ac';

-- Chunks of a specific type
SELECT * FROM chunks WHERE uuid = '1636719dc31f78ac' AND chunk_type = 'sentence';

-- Chunks overlapping a time range (here 20.0 s to 30.0 s)
SELECT * FROM chunks
WHERE uuid = '1636719dc31f78ac'
AND start_time <= 30.0 AND end_time >= 20.0;

-- All chunks overlapping a time range (mixed types), ordered by type and index
SELECT * FROM chunks 
WHERE uuid = '1636719dc31f78ac' 
AND start_time <= 30.0 AND end_time >= 20.0
ORDER BY chunk_type, chunk_index;
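The time-range queries use the standard interval-overlap test. A chunk [start_time, end_time] intersects the window [20.0, 30.0] exactly when start_time <= 30.0 and end_time >= 20.0, so chunks that only touch the window at an endpoint are included. The sketch below applies the same predicate in Rust to chunks already loaded into memory; `Span` is a hypothetical stand-in for `Chunk`:

```rust
#[derive(Debug, Clone, PartialEq)]
struct Span {
    chunk_type: &'static str,
    chunk_index: u32,
    start_time: f64,
    end_time: f64,
}

/// Chunks overlapping [from, to], ordered like `ORDER BY chunk_type, chunk_index`.
fn overlapping(chunks: &[Span], from: f64, to: f64) -> Vec<Span> {
    let mut hits: Vec<Span> = chunks
        .iter()
        .filter(|c| c.start_time <= to && c.end_time >= from)
        .cloned()
        .collect();
    hits.sort_by_key(|c| (c.chunk_type, c.chunk_index));
    hits
}
```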

7. Rust Data Structures

7.1 Chunk Definition

use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum ChunkType {
    Sentence,
    Cut,
    TimeBased,
}

impl ChunkType {
    pub fn as_str(&self) -> &'static str {
        match self {
            ChunkType::Sentence => "sentence",
            ChunkType::Cut => "cut",
            ChunkType::TimeBased => "time_based",
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Chunk {
    pub uuid: String,
    pub chunk_id: String,
    pub chunk_index: u32,
    pub chunk_type: ChunkType,
    pub start_time: f64,
    pub start_frame: i64,
    pub end_time: f64,
    pub end_frame: i64,
    pub fps: String,
    pub fps_value: f64,
    pub content: serde_json::Value,
    pub metadata: Option<serde_json::Value>,
}

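7.2 Creating a Chunk

impl Chunk {
    pub fn new(
        uuid: String,
        chunk_index: u32,
        chunk_type: ChunkType,
        start_time: f64,
        end_time: f64,
        fps: &str,
        content: serde_json::Value,
    ) -> Self {
        let fps_value = parse_fps(fps);
        // frame_number = floor(time_in_seconds * fps), as defined in section 2.2
        let start_frame = (start_time * fps_value).floor() as i64;
        let end_frame = (end_time * fps_value).floor() as i64;
        // chunk_id = {chunk_type}_{chunk_index:04}, as defined in section 5.1
        let chunk_id = format!("{}_{:04}", chunk_type.as_str(), chunk_index);

        Self {
            uuid,
            chunk_id,
            chunk_index,
            chunk_type,
            start_time,
            start_frame,
            end_time,
            end_frame,
            fps: fps.to_string(),
            fps_value,
            content,
            metadata: None,
        }
    }
}

// parse_fps is called above but not defined elsewhere in this spec; this is an
// assumed implementation. It accepts "num/den" (e.g. "24/1", "30000/1001") or a
// bare number, and panics on malformed input because Chunk::new returns Self.
fn parse_fps(fps: &str) -> f64 {
    let value: f64 = match fps.split_once('/') {
        Some((num, den)) => {
            let num: f64 = num.trim().parse().expect("invalid FPS numerator");
            let den: f64 = den.trim().parse().expect("invalid FPS denominator");
            num / den
        }
        None => fps.trim().parse().expect("invalid FPS value"),
    };
    assert!(value > 0.0, "FPS must be positive: {}", fps);
    value
}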

8. Time Splitter Implementation

8.1 TimeBasedSplitter

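pub struct TimeBasedSplitter {
    pub duration: f64, // length of each chunk (seconds)
    pub overlap: f64,  // overlap between consecutive chunks (seconds)
}

impl TimeBasedSplitter {
    pub fn new(duration: f64, overlap: f64) -> Self {
        // The step (duration - overlap) must be positive, otherwise split() never advances.
        assert!(duration > 0.0, "duration must be positive");
        assert!(
            (0.0..duration).contains(&overlap),
            "overlap must be >= 0 and < duration"
        );
        Self { duration, overlap }
    }

    /// `fps` is the rational FPS string of the video (e.g. "24/1" or "30000/1001").
    /// Passing it through unchanged avoids truncating 29.97 fps to 29.
    pub fn split(&self, uuid: &str, video_duration: f64, fps: &str) -> Vec<Chunk> {
        let step = self.duration - self.overlap;

        // Start times are computed as index * step instead of accumulated with
        // `+= step`, so rounding errors do not build up over long videos.
        let starts: Vec<f64> = (0u32..)
            .map(|i| i as f64 * step)
            .take_while(|&t| t < video_duration)
            .collect();
        let total_segments = starts.len();

        starts
            .iter()
            .enumerate()
            .map(|(i, &start_time)| {
                let index = i as u32;
                let end_time = (start_time + self.duration).min(video_duration);
                // Only the final chunk is marked is_last, even if an earlier
                // overlapping chunk already reaches the end of the video.
                let is_last = i + 1 == total_segments;
                Chunk::new(
                    uuid.to_string(),
                    index,
                    ChunkType::TimeBased,
                    start_time,
                    end_time,
                    fps,
                    serde_json::json!({
                        "duration": end_time - start_time,
                        "is_last": is_last,
                        "segment_number": index + 1,
                        "total_segments": total_segments,
                    }),
                )
            })
            .collect()
    }
}

8.2 Usage Example

// Time splitter: 10 s chunks, no overlap
let splitter = TimeBasedSplitter::new(10.0, 0.0);
let chunks = splitter.split(&uuid, video_duration, "24/1");

// Time splitter: 10 s chunks, 2 s overlap
let splitter = TimeBasedSplitter::new(10.0, 2.0);
let chunks = splitter.split(&uuid, video_duration, "24/1");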

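9. Processing Pipeline

9.1 Full Pipeline

1. Register (register the video)
   └── Obtain UUID, video_duration, fps

2. Probe (probe the video)
   └── Obtain streams, format, fps

3. Generate Sentence Chunks
   └── Read the ASR output
       └── Create one chunk per segment

4. Generate Cut Chunks
   └── Run scene detection
       └── Create one chunk per scene

5. Generate TimeBased Chunks
   └── Use TimeBasedSplitter
       └── Create one chunk per time segment

6. Store in the database
   └── Batch insert into PostgreSQL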

9.2 Example Output

Video: 35 s, FPS: 24

Sentence Chunks (3):
  sentence_0000: 0.0s - 10.0s (240 frames)
  sentence_0001: 10.0s - 20.0s (240 frames)
  sentence_0002: 20.0s - 35.0s (360 frames)

Cut Chunks (3):
  cut_0000: 0.0s - 15.0s (360 frames)
  cut_0001: 15.0s - 28.0s (312 frames)
  cut_0002: 28.0s - 35.0s (168 frames)

TimeBased Chunks (5, 2 s overlap):
  time_based_0000: 0.0s - 10.0s (240 frames)
  time_based_0001: 8.0s - 18.0s (240 frames)
  time_based_0002: 16.0s - 26.0s (240 frames)
  time_based_0003: 24.0s - 34.0s (240 frames)
  time_based_0004: 32.0s - 35.0s (72 frames)
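A chunk's frame count is end_frame - start_frame, where both frames are computed as floor(time × fps) as in section 2.2. The small sketch below formats one line of such a listing; `describe` is a hypothetical helper:

```rust
/// Format "<chunk_id>: <start>s - <end>s (<n> frames)".
fn describe(chunk_id: &str, start_time: f64, end_time: f64, fps: f64) -> String {
    let start_frame = (start_time * fps).floor() as i64;
    let end_frame = (end_time * fps).floor() as i64;
    format!(
        "{}: {:.1}s - {:.1}s ({} frames)",
        chunk_id,
        start_time,
        end_time,
        end_frame - start_frame
    )
}
```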

10. Related Documents