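Video Chunk Splitting Specification
This document defines the splitting rules and data structures for video chunks in the Momentry Core system.
1. Chunk Overview
1.1 Design Principles
- Overlap allowed: chunks of different types may overlap (for example, a sentence chunk and a time-based chunk)
- Frame accuracy: time coordinates are accurate to the video frame
- Multiple classifications: three splitting methods are supported (sentence, scene, and time)
1.2 Chunk Types
| Type | Description | Overlap allowed |
|---|---|---|
| Sentence | Split by sentence | ✅ May overlap with other types |
| Cut | Split by scene | ✅ May overlap with other types |
| TimeBased | Split by fixed duration | ✅ May overlap with other types |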
2. Time Coordinate System
2.1 Time Format
All times are expressed in seconds as floating-point numbers, with microsecond precision:
{
  "start_time": 10.5,
  "end_time": 15.75
}
2.2 Frame Calculation
frame_number = floor(time_in_seconds * fps)
time_at_frame = frame_number / fps
Example:
- Video FPS: 24/1 (24 fps)
- Time: 10.5 s
- Frame: floor(10.5 * 24) = 252
- Check: 252 / 24 = 10.5 s ✅
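The two formulas above can be sketched in Rust as follows. This is a minimal illustration rather than the project's implementation: the function names `time_to_frame` and `frame_to_time`, and the choice to pass the frame rate as a numerator/denominator pair (matching the "24/1" notation), are assumptions.

```rust
/// Frame index containing `time_s`: floor(time * fps), with fps = num / den.
/// (Hypothetical helper; the name and signature are assumptions.)
pub fn time_to_frame(time_s: f64, fps_num: u32, fps_den: u32) -> i64 {
    let fps = f64::from(fps_num) / f64::from(fps_den);
    (time_s * fps).floor() as i64
}

/// Start time in seconds of `frame`: frame / fps.
pub fn frame_to_time(frame: i64, fps_num: u32, fps_den: u32) -> f64 {
    frame as f64 * f64::from(fps_den) / f64::from(fps_num)
}
```

With the example values, `time_to_frame(10.5, 24, 1)` gives 252 and `frame_to_time(252, 24, 1)` gives 10.5.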
2.3 Frame Information Structure
{
  "start_time": 10.5,
  "start_frame": 252,
  "end_time": 15.75,
  "end_frame": 378,
  "fps": "24/1",
  "fps_value": 24.0
}
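3. Three Splitting Methods
3.1 Sentence (Sentence Splitting)
Rules:
- Based on ASR (automatic speech recognition) results
- Each recognized sentence becomes one chunk
- The text content comes from the ASR output
Example:
ASR output: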
[
{"start": 10.0, "end": 15.0, "text": "Hello world"},
{"start": 15.0, "end": 20.0, "text": "This is a test"},
{"start": 20.0, "end": 25.5, "text": "Processing video"}
]
Converted to chunks (IDs follow the zero-based naming rule in section 5):
┌──────────────────────────────────────────────────┐
│ sentence_0000: 10.0s - 15.0s "Hello world"       │
├──────────────────────────────────────────────────┤
│ sentence_0001: 15.0s - 20.0s "This is a test"    │
├──────────────────────────────────────────────────┤
│ sentence_0002: 20.0s - 25.5s "Processing video"  │
└──────────────────────────────────────────────────┘
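The mapping from ASR segments to sentence chunks can be sketched as below. This is a simplified illustration: `AsrSegment` and `SentenceChunk` are hypothetical types that carry only the fields needed here (not the full chunk structure), and the IDs are assumed to follow the zero-based `{chunk_type}_{chunk_index:04}` rule from section 5.

```rust
/// One ASR segment, mirroring the JSON fields of the ASR output (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct AsrSegment {
    pub start: f64,
    pub end: f64,
    pub text: String,
}

/// Simplified chunk record for illustration; not the full Chunk structure.
#[derive(Debug, Clone, PartialEq)]
pub struct SentenceChunk {
    pub chunk_id: String,
    pub start_time: f64,
    pub end_time: f64,
    pub text: String,
}

/// Map each ASR segment to exactly one chunk, preserving input order.
pub fn sentence_chunks(segments: &[AsrSegment]) -> Vec<SentenceChunk> {
    segments
        .iter()
        .enumerate()
        .map(|(i, s)| SentenceChunk {
            // Assumed naming: zero-based index, zero-padded to 4 digits.
            chunk_id: format!("sentence_{:04}", i),
            start_time: s.start,
            end_time: s.end,
            text: s.text.clone(),
        })
        .collect()
}
```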
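3.2 Cut (Scene Splitting)
Rules:
- Based on shot changes in the video (scene change / cut detection)
- Detected with ffmpeg or Python (scenedetect)
- Each scene becomes one chunk
Detection method:
# Detect scene changes with ffmpeg
ffmpeg -i input.mp4 -filter:v "select='gt(scene,0.3)',showinfo" -f null -
Example:
Scene detection result: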
[
{"start": 0.0, "end": 45.2, "scene_id": 1},
{"start": 45.2, "end": 120.5, "scene_id": 2},
{"start": 120.5, "end": 180.0, "scene_id": 3}
]
Converted to chunks (IDs follow the zero-based naming rule in section 5):
┌────────────────────────────────────────┐
│ cut_0000: 0.0s - 45.2s (Scene 1)       │
├────────────────────────────────────────┤
│ cut_0001: 45.2s - 120.5s (Scene 2)     │
├────────────────────────────────────────┤
│ cut_0002: 120.5s - 180.0s (Scene 3)    │
└────────────────────────────────────────┘
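A detector typically reports the timestamps at which a cut occurs, so those points still have to be turned into scene intervals like the ones above. The sketch below shows one way to do that. The `Scene` type and `scenes_from_cuts` function are hypothetical, and the input is assumed to be a sorted list of cut timestamps in seconds plus the total video duration.

```rust
/// Scene interval in seconds; `scene_id` starts at 1 as in the detection result above.
#[derive(Debug, Clone, PartialEq)]
pub struct Scene {
    pub scene_id: u32,
    pub start: f64,
    pub end: f64,
}

/// Turn sorted cut timestamps into contiguous scenes covering [0, video_duration].
/// Cut points that do not advance past the current scene start, or that lie at
/// or beyond the end of the video, are skipped.
pub fn scenes_from_cuts(cuts: &[f64], video_duration: f64) -> Vec<Scene> {
    let mut scenes = Vec::new();
    let mut start = 0.0;
    for &cut in cuts {
        if cut <= start || cut >= video_duration {
            continue;
        }
        let scene_id = scenes.len() as u32 + 1;
        scenes.push(Scene { scene_id, start, end: cut });
        start = cut;
    }
    // The final scene runs from the last cut to the end of the video.
    if video_duration > start {
        let scene_id = scenes.len() as u32 + 1;
        scenes.push(Scene { scene_id, start, end: video_duration });
    }
    scenes
}
```

For cuts at 45.2 s and 120.5 s in a 180-second video, this yields the three scenes of the example.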
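3.3 TimeBased (Fixed-Duration Splitting)
Rules:
- Split at a fixed time length
- By default, each chunk is 10 seconds long
- The last chunk may be shorter than 10 seconds
- Overlap is supported (configurable in seconds)
Parameters:
| Parameter | Default | Description |
|---|---|---|
| duration | 10.0 | Length of each chunk (seconds) |
| overlap | 0.0 | Overlap between consecutive chunks (seconds); must be smaller than duration |
Example (no overlap):
Video length: 35 seconds, duration=10
Chunks: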
┌──────────────────────────────────────────────────────┐
│ time_based_0000: 0.0s - 10.0s                        │
├──────────────────────────────────────────────────────┤
│ time_based_0001: 10.0s - 20.0s                       │
├──────────────────────────────────────────────────────┤
│ time_based_0002: 20.0s - 30.0s                       │
├──────────────────────────────────────────────────────┤
│ time_based_0003: 30.0s - 35.0s (shorter than 10 s)   │
└──────────────────────────────────────────────────────┘
Example (with overlap, overlap=2):
Video length: 35 seconds, duration=10, overlap=2
Chunks:
┌──────────────────────────────────────────────────────┐
│ time_based_0000: 0.0s - 10.0s                        │
├──────────────────────────────────────────────────────┤
│ time_based_0001: 8.0s - 18.0s (2 s overlap)          │
├──────────────────────────────────────────────────────┤
│ time_based_0002: 16.0s - 26.0s (2 s overlap)         │
├──────────────────────────────────────────────────────┤
│ time_based_0003: 24.0s - 34.0s (2 s overlap)         │
├──────────────────────────────────────────────────────┤
│ time_based_0004: 32.0s - 35.0s (overlap + short)     │
└──────────────────────────────────────────────────────┘
4. Chunk Data Structure
4.1 Basic Structure
{
  "uuid": "1636719dc31f78ac",
  "chunk_id": "sentence_0001",
  "chunk_index": 1,
  "chunk_type": "sentence",
  "start_time": 10.5,
  "start_frame": 252,
  "end_time": 15.75,
  "end_frame": 378,
  "fps": "24/1",
  "fps_value": 24.0,
  "content": {
    "text": "Hello world, this is a test"
  },
  "metadata": {
    "source": "asr",
    "confidence": 0.95,
    "language": "en"
  }
}
4.2 Field Descriptions
| Field | Type | Required | Description |
|---|---|---|---|
| uuid | String | ✅ | Video UUID (16 characters) |
| chunk_id | String | ✅ | Chunk ID, unique within a video |
| chunk_index | Integer | ✅ | Chunk index (starts at 0) |
| chunk_type | String | ✅ | Type: sentence/cut/time_based |
| start_time | Float | ✅ | Start time (seconds) |
| start_frame | Integer | ✅ | Start frame number |
| end_time | Float | ✅ | End time (seconds) |
| end_frame | Integer | ✅ | End frame number |
| fps | String | ✅ | FPS as a fraction (e.g. "24/1") |
| fps_value | Float | ✅ | FPS as a number (e.g. 24.0) |
| content | Object | ✅ | Content (see below) |
| metadata | Object | ❌ | Additional information (see below) |
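Because the frame fields are derived from the time fields, a record can be checked for internal consistency before it is stored. The following is a sketch of such a check under stated assumptions: `ChunkTiming` and `validate_timing` are hypothetical names, only the timing fields are modelled, and frames are assumed to follow the floor rule from section 2.2.

```rust
/// Timing fields of a chunk (a subset of the structure above, for illustration).
pub struct ChunkTiming {
    pub start_time: f64,
    pub start_frame: i64,
    pub end_time: f64,
    pub end_frame: i64,
    pub fps_value: f64,
}

/// Check that the timing fields agree with each other.
pub fn validate_timing(c: &ChunkTiming) -> Result<(), String> {
    // Written as a negation so that NaN is rejected as well.
    if !(c.fps_value > 0.0) {
        return Err("fps_value must be positive".to_string());
    }
    if c.start_time < 0.0 || c.end_time <= c.start_time {
        return Err("require 0 <= start_time < end_time".to_string());
    }
    // Assumed rule: frame = floor(time * fps).
    let expected_start = (c.start_time * c.fps_value).floor() as i64;
    let expected_end = (c.end_time * c.fps_value).floor() as i64;
    if c.start_frame != expected_start || c.end_frame != expected_end {
        return Err(format!(
            "frame mismatch: expected {}..{}",
            expected_start, expected_end
        ));
    }
    Ok(())
}
```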
4.3 Content Structure
The structure of content depends on chunk_type:
Sentence Content
{
  "content": {
    "text": "Hello world, this is a test message",
    "text_normalized": "hello world this is a test message",
    "word_count": 7,
    "char_count": 34
  }
}
| Field | Type | Description |
|---|---|---|
| text | String | Original recognized text |
| text_normalized | String | Normalized text (lowercase, punctuation removed) |
| word_count | Integer | Number of words |
| char_count | Integer | Number of characters (the example value, 34, matches the length of text_normalized) |
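The derived fields can be computed from the raw ASR text roughly as follows. This is a sketch, not the project's normalizer: `SentenceContent` and `sentence_content` are hypothetical names, `char_count` is assumed to count the normalized text (which is what the example value suggests), and whitespace-based word splitting is an assumption that will not suit languages written without spaces.

```rust
/// Derived fields of a Sentence chunk's content (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct SentenceContent {
    pub text: String,
    pub text_normalized: String,
    pub word_count: usize,
    pub char_count: usize,
}

pub fn sentence_content(text: &str) -> SentenceContent {
    // Lowercase, then keep only letters, digits and whitespace.
    let cleaned: String = text
        .to_lowercase()
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect();
    // Collapse runs of whitespace into single spaces.
    let words: Vec<&str> = cleaned.split_whitespace().collect();
    let text_normalized = words.join(" ");
    SentenceContent {
        text: text.to_string(),
        word_count: words.len(),
        // Assumption: characters are counted on the normalized text.
        char_count: text_normalized.chars().count(),
        text_normalized,
    }
}
```

Applied to "Hello world, this is a test message", this reproduces the example: 7 words and 34 characters.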
Cut Content
{
  "content": {
    "scene_id": 2,
    "scene_number": 2,
    "transition_type": "cut",
    "scene_change_score": 0.95
  }
}
| Field | Type | Description |
|---|---|---|
| scene_id | Integer | Scene ID |
| scene_number | Integer | Scene number |
| transition_type | String | Transition type: cut/dissolve/fade |
| scene_change_score | Float | Scene change score (0-1) |
TimeBased Content
{
  "content": {
    "duration": 10.0,
    "is_last": false,
    "segment_number": 3,
    "total_segments": 10
  }
}
| Field | Type | Description |
|---|---|---|
| duration | Float | Duration (seconds) |
| is_last | Boolean | Whether this is the last chunk |
| segment_number | Integer | Segment number |
| total_segments | Integer | Total number of segments |
4.4 Metadata Structure
{
  "metadata": {
    "source": "asr",
    "confidence": 0.95,
    "language": "en",
    "model": "tiny",
    "created_at": "2026-03-16T10:00:00Z"
  }
}
| Field | Type | Description |
|---|---|---|
| source | String | Source: asr/scene_detect/time_based |
| confidence | Float | Confidence (0-1) |
| language | String | Language code |
| model | String | Model used |
| created_at | String | Creation time (ISO 8601) |
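5. Chunk ID Naming Convention
5.1 Format
{chunk_type}_{chunk_index:04}
| Type | Prefix | Example |
|---|---|---|
| Sentence | sentence_ | sentence_0001 |
| Cut | cut_ | cut_0001 |
| TimeBased | time_based_ | time_based_0001 |
5.2 Numbering Rules
- Starts at 0 (the first chunk of a type is, for example, sentence_0000)
- Zero-padded to 4 digits
- Increases in chronological order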
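The naming rule can be sketched as a pair of helpers. Both function names are hypothetical. Note that parsing has to split on the last underscore, since the `time_based` prefix contains an underscore itself.

```rust
/// Build a chunk ID such as "time_based_0001" from a type prefix and an index.
pub fn format_chunk_id(chunk_type: &str, chunk_index: u32) -> String {
    format!("{}_{:04}", chunk_type, chunk_index)
}

/// Split a chunk ID back into (type prefix, index).
/// Returns None if the suffix is not at least 4 ASCII digits.
pub fn parse_chunk_id(chunk_id: &str) -> Option<(&str, u32)> {
    // Split on the LAST underscore: "time_based_0001" -> ("time_based", "0001").
    let (prefix, digits) = chunk_id.rsplit_once('_')?;
    if prefix.is_empty() || digits.len() < 4 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    Some((prefix, digits.parse().ok()?))
}
```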
6. Database Schema
6.1 PostgreSQL Table
CREATE TABLE chunks (
    id BIGSERIAL PRIMARY KEY,
    uuid VARCHAR(16) NOT NULL,
    chunk_id VARCHAR(64) NOT NULL,
    chunk_index INTEGER NOT NULL,
    chunk_type VARCHAR(32) NOT NULL,
    start_time DOUBLE PRECISION NOT NULL,
    start_frame BIGINT NOT NULL,
    end_time DOUBLE PRECISION NOT NULL,
    end_frame BIGINT NOT NULL,
    fps VARCHAR(16) NOT NULL,
    fps_value DOUBLE PRECISION NOT NULL,
    content JSONB NOT NULL,
    metadata JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(uuid, chunk_id)
);
-- Indexes
CREATE INDEX idx_chunks_uuid ON chunks(uuid);
CREATE INDEX idx_chunks_type ON chunks(chunk_type);
CREATE INDEX idx_chunks_time ON chunks(start_time, end_time);
CREATE INDEX idx_chunks_uuid_type ON chunks(uuid, chunk_type);
6.2 Query Examples
-- All chunks of a video
SELECT * FROM chunks WHERE uuid = '1636719dc31f78ac';
-- Chunks of a specific type
SELECT * FROM chunks WHERE uuid = '1636719dc31f78ac' AND chunk_type = 'sentence';
-- Chunks that overlap the time range 20.0 s to 30.0 s
SELECT * FROM chunks
WHERE uuid = '1636719dc31f78ac'
  AND start_time <= 30.0 AND end_time >= 20.0;
-- All chunks that overlap the time range (mixed types), ordered by type
SELECT * FROM chunks
WHERE uuid = '1636719dc31f78ac'
  AND start_time <= 30.0 AND end_time >= 20.0
ORDER BY chunk_type, chunk_index;
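The time-range condition in the last two queries is an interval-overlap test. For reference, the same predicate in Rust (the function name `overlaps` is hypothetical) looks like this:

```rust
/// True when the chunk [start, end] intersects the query window [q_start, q_end].
/// Mirrors the SQL condition: start_time <= q_end AND end_time >= q_start.
pub fn overlaps(start: f64, end: f64, q_start: f64, q_end: f64) -> bool {
    start <= q_end && end >= q_start
}
```

Because both comparisons are inclusive, a chunk that merely touches the window (for example one ending exactly at 20.0 s) also matches. Strict comparisons would exclude such boundary cases, if that is the behavior wanted.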
7. Rust Data Structures
7.1 Chunk Definition
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum ChunkType {
    Sentence,
    Cut,
    TimeBased,
}
impl ChunkType {
    pub fn as_str(&self) -> &'static str {
        match self {
            ChunkType::Sentence => "sentence",
            ChunkType::Cut => "cut",
            ChunkType::TimeBased => "time_based",
        }
    }
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Chunk {
    pub uuid: String,
    pub chunk_id: String,
    pub chunk_index: u32,
    pub chunk_type: ChunkType,
    pub start_time: f64,
    pub start_frame: i64,
    pub end_time: f64,
    pub end_frame: i64,
    pub fps: String,
    pub fps_value: f64,
    pub content: serde_json::Value,
    pub metadata: Option<serde_json::Value>,
}
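`as_str` covers the direction from enum to string. When reading the `chunk_type` column back from the database without going through serde, the reverse mapping is also needed. One possible sketch is a `FromStr` implementation. The enum is repeated here without the serde derives so that the block compiles with the standard library alone, and the `String` error type is an assumption.

```rust
use std::str::FromStr;

// Repeated without serde derives so this sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ChunkType {
    Sentence,
    Cut,
    TimeBased,
}

impl FromStr for ChunkType {
    type Err = String;

    /// Parse the snake_case names used in JSON and in the chunk_type column.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "sentence" => Ok(ChunkType::Sentence),
            "cut" => Ok(ChunkType::Cut),
            "time_based" => Ok(ChunkType::TimeBased),
            other => Err(format!("unknown chunk_type: {}", other)),
        }
    }
}
```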
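7.2 Creating a Chunk
impl Chunk {
    /// Build a chunk, deriving the frame numbers and chunk_id from the inputs.
    /// `parse_fps` is not shown in this document; it is expected to turn a
    /// string such as "24/1" into its numeric value (24.0).
    pub fn new(
        uuid: String,
        chunk_index: u32,
        chunk_type: ChunkType,
        start_time: f64,
        end_time: f64,
        fps: &str,
        content: serde_json::Value,
    ) -> Self {
        let fps_value = parse_fps(fps);
        // frame = floor(time * fps), as defined in section 2.2
        let start_frame = (start_time * fps_value).floor() as i64;
        let end_frame = (end_time * fps_value).floor() as i64;
        // e.g. "time_based_0003"
        let chunk_id = format!("{}_{:04}", chunk_type.as_str(), chunk_index);
        Self {
            uuid,
            chunk_id,
            chunk_index,
            chunk_type,
            start_time,
            start_frame,
            end_time,
            end_frame,
            fps: fps.to_string(),
            fps_value,
            content,
            metadata: None,
        }
    }
}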
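`Chunk::new` calls a `parse_fps` helper that this document does not define. The sketch below shows what such a helper might do. It is named `try_parse_fps` to make clear that it is hypothetical, and it returns an `Option` rather than a bare `f64`, so a caller like `Chunk::new` would have to decide how to handle malformed input.

```rust
/// Parse an FPS string such as "24/1", "30000/1001" or "25" into a number.
/// Returns None for malformed input or a rate that is not finite and positive.
/// (Hypothetical helper; not the project's `parse_fps`.)
pub fn try_parse_fps(fps: &str) -> Option<f64> {
    let (num, den) = match fps.split_once('/') {
        // Fraction form: "numerator/denominator".
        Some((n, d)) => (
            n.trim().parse::<f64>().ok()?,
            d.trim().parse::<f64>().ok()?,
        ),
        // Plain number: treat the denominator as 1.
        None => (fps.trim().parse::<f64>().ok()?, 1.0),
    };
    let value = num / den;
    // Rejects division by zero (infinite or NaN) and non-positive rates.
    if value.is_finite() && value > 0.0 {
        Some(value)
    } else {
        None
    }
}
```

Keeping the fraction form matters for NTSC-style rates: "30000/1001" evaluates to roughly 29.97, which cannot be written exactly as a whole-number rate over 1.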
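8. Time-Based Splitter Implementation
8.1 TimeBasedSplitter
pub struct TimeBasedSplitter {
    pub duration: f64, // length of each chunk (seconds)
    pub overlap: f64,  // overlap between consecutive chunks (seconds)
}
impl TimeBasedSplitter {
    pub fn new(duration: f64, overlap: f64) -> Self {
        // step = duration - overlap must be positive, otherwise split() never terminates.
        assert!(duration > 0.0, "duration must be positive");
        assert!((0.0..duration).contains(&overlap), "overlap must satisfy 0 <= overlap < duration");
        Self { duration, overlap }
    }
    pub fn split(&self, uuid: &str, video_duration: f64, fps: f64) -> Vec<Chunk> {
        let mut chunks = Vec::new();
        let step = self.duration - self.overlap;
        let mut current_time = 0.0;
        let mut index = 0;
        while current_time < video_duration {
            // The last chunk is clamped to the end of the video.
            let end_time = (current_time + self.duration).min(video_duration);
            let chunk = Chunk::new(
                uuid.to_string(),
                index,
                ChunkType::TimeBased,
                current_time,
                end_time,
                // Only exact for integer frame rates: 29.97 would be written as "30/1".
                &format!("{}/1", fps.round() as u32),
                serde_json::json!({
                    "duration": end_time - current_time,
                    "is_last": end_time >= video_duration,
                    "segment_number": index + 1,
                }),
            );
            chunks.push(chunk);
            // Stop once the end of the video is covered, so exactly one chunk has is_last = true.
            if end_time >= video_duration {
                break;
            }
            current_time += step;
            index += 1;
        }
        // total_segments (see section 4.3) is only known after the loop.
        let total = chunks.len();
        for chunk in &mut chunks {
            chunk.content["total_segments"] = serde_json::json!(total);
        }
        chunks
    }
}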
8.2 Usage Example
// Create a time-based splitter (10 s chunks, no overlap)
let splitter = TimeBasedSplitter::new(10.0, 0.0);
let chunks = splitter.split(&uuid, video_duration, 24.0);
// Create a time-based splitter (10 s chunks, 2 s overlap)
let splitter = TimeBasedSplitter::new(10.0, 2.0);
let chunks = splitter.split(&uuid, video_duration, 24.0);
9. Processing Pipeline
9.1 Full Pipeline
1. Register (register the video)
   └── Obtain UUID, video_duration, fps
2. Probe (probe the video)
   └── Obtain streams, format, fps
3. Generate Sentence Chunks
   └── Read the ASR output
   └── Create one chunk per segment
4. Generate Cut Chunks
   └── Run scene detection
   └── Create one chunk per scene
5. Generate TimeBased Chunks
   └── Use TimeBasedSplitter
   └── Create one chunk per time window
6. Save to the database
   └── Batch write to PostgreSQL
9.2 Output Example
Video: 35 seconds, FPS: 24
Sentence Chunks (3):
  sentence_0000: 0.0s - 10.0s (240 frames)
  sentence_0001: 10.0s - 20.0s (240 frames)
  sentence_0002: 20.0s - 35.0s (360 frames)
Cut Chunks (3):
  cut_0000: 0.0s - 15.0s (360 frames)
  cut_0001: 15.0s - 28.0s (312 frames)
  cut_0002: 28.0s - 35.0s (168 frames)
TimeBased Chunks (5, duration 10 s, 2 s overlap):
  time_based_0000: 0.0s - 10.0s (240 frames)
  time_based_0001: 8.0s - 18.0s (240 frames)
  time_based_0002: 16.0s - 26.0s (240 frames)
  time_based_0003: 24.0s - 34.0s (240 frames)
  time_based_0004: 32.0s - 35.0s (72 frames)
10. Related Documents
- JSON_OUTPUT_SPEC.md - JSON output specification
- RUST_DEVELOPMENT.md - Rust development guidelines
- AGENTS.md - Development guidelines