docs: Add video chunk specification
- Define three chunk types: Sentence, Cut, TimeBased
- Support overlapping chunks
- Frame-accurate timestamps
- Include content and metadata structures
- Add PostgreSQL schema
- Document Rust data structures and splitter implementation
docs/CHUNK_SPEC.md
# Video Chunk Splitting Specification

This document defines the splitting rules and data structures for video chunks in the Momentry Core system.

---

## 1. Chunk Overview

### 1.1 Design Principles

1. **Overlap allowed**: Chunks of different types may overlap (for example, a sentence chunk and a time-based chunk).
2. **Frame accuracy**: Time coordinates are accurate to the video frame.
3. **Multiple classifications**: Three splitting methods are supported: by sentence, by scene, and by time.

### 1.2 Chunk Types

| Type | Description | Overlap |
|------|-------------|---------|
| **Sentence** | Split by spoken sentence | ✅ May overlap other types |
| **Cut** | Split by scene cut | ✅ May overlap other types |
| **TimeBased** | Split by fixed duration | ✅ May overlap other types |

---

## 2. Time Coordinate System

### 2.1 Time Format

All times are expressed in **seconds** as floating-point numbers, with **microsecond** precision:

```json
{
  "start_time": 10.5,
  "end_time": 15.75
}
```

### 2.2 Frame Calculation

```
frame_number = floor(time_in_seconds * fps)
time_at_frame = frame_number / fps
```

**Example**:
- Video FPS: 24/1 (24 fps)
- Time: 10.5 s
- Frame: floor(10.5 * 24) = 252
- Check: 252 / 24 = 10.5 s ✅

### 2.3 Frame Information Structure

```json
{
  "start_time": 10.5,
  "start_frame": 252,
  "end_time": 15.75,
  "end_frame": 378,
  "fps": "24/1",
  "fps_value": 24.0
}
```
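
The frame fields above follow directly from the two formulas in section 2.2. The sketch below shows one way to express them in Rust; the `Fps` type and its method names are hypothetical and exist only for illustration. Keeping the rate as a numerator/denominator pair mirrors the `"24/1"` string form. Note that with non-integer rates, floating-point rounding can make a round trip through `time_at` and `frame_at` land one frame early, so treat this as a starting point rather than a definitive implementation.

```rust
/// Frame rate as a rational number, e.g. 24/1 or 30000/1001.
/// (Hypothetical helper type, not part of the specification.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Fps {
    pub num: u32,
    pub den: u32,
}

impl Fps {
    /// Frames per second as a float (the `fps_value` field).
    pub fn value(self) -> f64 {
        self.num as f64 / self.den as f64
    }

    /// frame_number = floor(time_in_seconds * fps)
    pub fn frame_at(self, time_in_seconds: f64) -> i64 {
        (time_in_seconds * self.value()).floor() as i64
    }

    /// time_at_frame = frame_number / fps
    pub fn time_at(self, frame_number: i64) -> f64 {
        frame_number as f64 * self.den as f64 / self.num as f64
    }
}
```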
---

## 3. Three Splitting Methods

### 3.1 Sentence (Sentence Splitting)

**Principles**:
- Based on ASR (automatic speech recognition) results
- Each recognized sentence becomes one chunk
- The text content comes from the ASR output

**Example**:

```
ASR output:
[
  {"start": 10.0, "end": 15.0, "text": "Hello world"},
  {"start": 15.0, "end": 20.0, "text": "This is a test"},
  {"start": 20.0, "end": 25.5, "text": "Processing video"}
]

Converted to chunks:
┌────────────────────────────────────────────────────┐
│ sentence_0000: 10.0s - 15.0s "Hello world"         │
├────────────────────────────────────────────────────┤
│ sentence_0001: 15.0s - 20.0s "This is a test"      │
├────────────────────────────────────────────────────┤
│ sentence_0002: 20.0s - 25.5s "Processing video"    │
└────────────────────────────────────────────────────┘
```
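
As a rough illustration of the conversion above, the following sketch maps ASR segments to sentence chunks. The `AsrSegment` and `SentenceChunk` types and the `sentence_chunks` function are hypothetical names introduced here, and skipping empty or zero-length segments is an assumption rather than a documented rule. IDs follow the `{chunk_type}_{chunk_index:04}` format from section 5, numbered from 0.

```rust
/// One ASR segment, mirroring the `start` / `end` / `text` fields above.
#[derive(Debug, Clone, PartialEq)]
pub struct AsrSegment {
    pub start: f64,
    pub end: f64,
    pub text: String,
}

/// Minimal sentence chunk (hypothetical type, only for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct SentenceChunk {
    pub chunk_id: String,
    pub start_time: f64,
    pub end_time: f64,
    pub text: String,
}

/// Turns each usable ASR segment into one chunk, in input order.
pub fn sentence_chunks(segments: &[AsrSegment]) -> Vec<SentenceChunk> {
    segments
        .iter()
        // Assumption: drop segments with no text or an empty time range.
        .filter(|s| !s.text.trim().is_empty() && s.end > s.start)
        .enumerate()
        .map(|(index, s)| SentenceChunk {
            chunk_id: format!("sentence_{:04}", index),
            start_time: s.start,
            end_time: s.end,
            text: s.text.trim().to_string(),
        })
        .collect()
}
```

Because the index is assigned after filtering, the resulting IDs stay contiguous even when some segments are dropped.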
### 3.2 Cut (Scene Cut)

**Principles**:
- Based on shot changes in the video (scene change / cut detection)
- Detected with ffmpeg or Python (scenedetect)
- Each scene becomes one chunk

**Detection method**:

```bash
# Detect scene changes with ffmpeg
ffmpeg -i input.mp4 -filter:v "select='gt(scene,0.3)',showinfo" -f null -
```

**Example**:

```
Scene detection result:
[
  {"start": 0.0, "end": 45.2, "scene_id": 1},
  {"start": 45.2, "end": 120.5, "scene_id": 2},
  {"start": 120.5, "end": 180.0, "scene_id": 3}
]

Converted to chunks:
┌────────────────────────────────────────┐
│ cut_0000: 0.0s - 45.2s (Scene 1)       │
├────────────────────────────────────────┤
│ cut_0001: 45.2s - 120.5s (Scene 2)     │
├────────────────────────────────────────┤
│ cut_0002: 120.5s - 180.0s (Scene 3)    │
└────────────────────────────────────────┘
```
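
A detector typically reports the timestamps at which cuts occur rather than complete scene ranges. The sketch below shows one possible way to turn a list of cut timestamps plus the video duration into the scene list shown above. The `Scene` type and `scenes_from_cuts` function are hypothetical, and parsing the detector output into timestamps is assumed to happen elsewhere.

```rust
/// A detected scene covering [start, end] in seconds (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Scene {
    pub scene_id: u32,
    pub start: f64,
    pub end: f64,
}

/// Builds scenes from cut timestamps (seconds) and the video duration.
/// Cuts outside (0, video_duration) are ignored and duplicates collapsed.
pub fn scenes_from_cuts(cuts: &[f64], video_duration: f64) -> Vec<Scene> {
    if !(video_duration > 0.0) {
        return Vec::new();
    }
    let mut cuts: Vec<f64> = cuts
        .iter()
        .copied()
        .filter(|&t| t > 0.0 && t < video_duration)
        .collect();
    cuts.sort_by(|a, b| a.total_cmp(b));
    cuts.dedup();

    let mut scenes = Vec::new();
    let mut start = 0.0;
    // The video end acts as the final boundary, closing the last scene.
    let boundaries = cuts.iter().chain(std::iter::once(&video_duration));
    for (i, &boundary) in boundaries.enumerate() {
        scenes.push(Scene { scene_id: i as u32 + 1, start, end: boundary });
        start = boundary;
    }
    scenes
}
```

With cuts at 45.2 s and 120.5 s in a 180 s video, this yields the three scenes from the example, with `scene_id` starting at 1.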
### 3.3 TimeBased (Fixed-Duration Splitting)

**Principles**:
- Split at a fixed time length
- Each chunk is **10 seconds** long by default
- The last chunk may be shorter than 10 seconds
- **Overlap is supported** (the overlap length in seconds is configurable)

**Parameters**:

| Parameter | Default | Description |
|-----------|---------|-------------|
| duration | 10.0 | Length of each chunk (seconds) |
| overlap | 0.0 | Overlap length (seconds) |

**Example** (no overlap):

```
Video length: 35 s, duration=10

Chunks:
┌────────────────────────────────────────────────────┐
│ time_based_0000: 0.0s - 10.0s                      │
├────────────────────────────────────────────────────┤
│ time_based_0001: 10.0s - 20.0s                     │
├────────────────────────────────────────────────────┤
│ time_based_0002: 20.0s - 30.0s                     │
├────────────────────────────────────────────────────┤
│ time_based_0003: 30.0s - 35.0s (shorter than 10 s) │
└────────────────────────────────────────────────────┘
```

**Example** (with overlap, overlap=2):

```
Video length: 35 s, duration=10, overlap=2

Chunks:
┌────────────────────────────────────────────────────┐
│ time_based_0000: 0.0s - 10.0s                      │
├────────────────────────────────────────────────────┤
│ time_based_0001: 8.0s - 18.0s (2 s overlap)        │
├────────────────────────────────────────────────────┤
│ time_based_0002: 16.0s - 26.0s (2 s overlap)       │
├────────────────────────────────────────────────────┤
│ time_based_0003: 24.0s - 34.0s (2 s overlap)       │
├────────────────────────────────────────────────────┤
│ time_based_0004: 32.0s - 35.0s (overlap + short)   │
└────────────────────────────────────────────────────┘
```
---

## 4. Chunk Data Structure

### 4.1 Basic Structure

```json
{
  "uuid": "1636719dc31f78ac",
  "chunk_id": "sentence_0001",
  "chunk_index": 1,
  "chunk_type": "sentence",
  "start_time": 10.5,
  "start_frame": 252,
  "end_time": 15.75,
  "end_frame": 378,
  "fps": "24/1",
  "fps_value": 24.0,
  "content": {
    "text": "Hello world, this is a test"
  },
  "metadata": {
    "source": "asr",
    "confidence": 0.95,
    "language": "en"
  }
}
```

### 4.2 Field Reference

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `uuid` | String | ✅ | Video UUID (16 characters) |
| `chunk_id` | String | ✅ | Unique chunk ID |
| `chunk_index` | Integer | ✅ | Chunk index (starts at 0) |
| `chunk_type` | String | ✅ | Type: sentence/cut/time_based |
| `start_time` | Float | ✅ | Start time (seconds) |
| `start_frame` | Integer | ✅ | Start frame number |
| `end_time` | Float | ✅ | End time (seconds) |
| `end_frame` | Integer | ✅ | End frame number |
| `fps` | String | ✅ | FPS as a fraction (e.g. "24/1") |
| `fps_value` | Float | ✅ | FPS as a number (e.g. 24.0) |
| `content` | Object | ✅ | Content (see below) |
| `metadata` | Object | ❌ | Additional information (see below) |

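
The fields above carry some redundancy: the frame numbers should agree with the times and `fps_value` under the formula in section 2.2. A consumer might want to check this before storing a chunk. The following is a minimal sketch of such a check; the `ChunkFields` type and `validate` function are hypothetical, and the rules that `fps_value` is positive and `end_time` is greater than `start_time` are assumptions not stated in this specification.

```rust
/// Subset of the chunk fields needed for a consistency check
/// (hypothetical type, only for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkFields {
    pub uuid: String,
    pub start_time: f64,
    pub start_frame: i64,
    pub end_time: f64,
    pub end_frame: i64,
    pub fps_value: f64,
}

/// Returns Ok(()) when the fields are mutually consistent.
pub fn validate(c: &ChunkFields) -> Result<(), String> {
    let uuid_len = c.uuid.chars().count();
    if uuid_len != 16 {
        return Err(format!("uuid must be 16 characters, got {}", uuid_len));
    }
    // Assumption: a usable frame rate is finite and positive.
    if !(c.fps_value.is_finite() && c.fps_value > 0.0) {
        return Err("fps_value must be a positive number".to_string());
    }
    // Assumption: chunks have a non-empty, non-negative time range.
    if !(c.start_time >= 0.0 && c.end_time > c.start_time) {
        return Err("expected 0 <= start_time < end_time".to_string());
    }
    // frame_number = floor(time_in_seconds * fps), as in section 2.2.
    let expected_start = (c.start_time * c.fps_value).floor() as i64;
    let expected_end = (c.end_time * c.fps_value).floor() as i64;
    if c.start_frame != expected_start || c.end_frame != expected_end {
        return Err(format!(
            "frames should be {}..{}, got {}..{}",
            expected_start, expected_end, c.start_frame, c.end_frame
        ));
    }
    Ok(())
}
```

The example chunk in section 4.1 passes this check: 10.5 s and 15.75 s at 24 fps give frames 252 and 378.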
### 4.3 Content Structure

The structure of `content` depends on `chunk_type`:

#### Sentence Content

```json
{
  "content": {
    "text": "Hello world, this is a test message",
    "text_normalized": "hello world this is a test message",
    "word_count": 7,
    "char_count": 34
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
| `text` | String | Original recognized text |
| `text_normalized` | String | Normalized text (lowercase, punctuation removed) |
| `word_count` | Integer | Number of words |
| `char_count` | Integer | Number of characters (in the example, 34 is the length of `text_normalized`) |

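
The derived fields in the table above could be computed from `text` roughly as follows. This is a sketch under stated assumptions: the `SentenceContent` type and `sentence_content` function are hypothetical, words are split on whitespace (which does not suit languages written without spaces, such as Chinese), and `char_count` is taken over the normalized text because that is what reproduces the value 34 in the example.

```rust
/// Content of a sentence chunk (hypothetical Rust mirror of the JSON above).
#[derive(Debug, Clone, PartialEq)]
pub struct SentenceContent {
    pub text: String,
    pub text_normalized: String,
    pub word_count: usize,
    pub char_count: usize,
}

/// Derives the normalized text and the counts from the raw ASR text.
pub fn sentence_content(text: &str) -> SentenceContent {
    // Keep letters, digits and whitespace; lowercase everything else away.
    let cleaned: String = text
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .flat_map(|c| c.to_lowercase())
        .collect();
    // Collapse runs of whitespace into single spaces.
    let words: Vec<&str> = cleaned.split_whitespace().collect();
    let text_normalized = words.join(" ");
    SentenceContent {
        text: text.to_string(),
        word_count: words.len(),
        // Assumption: characters are counted on the normalized text.
        char_count: text_normalized.chars().count(),
        text_normalized,
    }
}
```

Note that removing punctuation also turns contractions such as "don't" into "dont"; a production normalizer may want to treat apostrophes differently.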
#### Cut Content

```json
{
  "content": {
    "scene_id": 2,
    "scene_number": 2,
    "transition_type": "cut",
    "scene_change_score": 0.95
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
| `scene_id` | Integer | Scene ID |
| `scene_number` | Integer | Scene number |
| `transition_type` | String | Transition type: cut/dissolve/fade |
| `scene_change_score` | Float | Scene change score (0-1) |

#### TimeBased Content

```json
{
  "content": {
    "duration": 10.0,
    "is_last": false,
    "segment_number": 3,
    "total_segments": 10
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
| `duration` | Float | Length (seconds) |
| `is_last` | Boolean | Whether this is the last chunk |
| `segment_number` | Integer | Segment number |
| `total_segments` | Integer | Total number of segments |

### 4.4 Metadata Structure

```json
{
  "metadata": {
    "source": "asr",
    "confidence": 0.95,
    "language": "en",
    "model": "tiny",
    "created_at": "2026-03-16T10:00:00Z"
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
| `source` | String | Source: asr/scene_detect/time_based |
| `confidence` | Float | Confidence (0-1) |
| `language` | String | Language code |
| `model` | String | Model used |
| `created_at` | String | Creation time (ISO 8601) |

---

## 5. Chunk ID Naming Convention

### 5.1 Format

```
{chunk_type}_{chunk_index:04}
```

| Type | Prefix | Example |
|------|--------|---------|
| Sentence | `sentence_` | `sentence_0001` |
| Cut | `cut_` | `cut_0001` |
| TimeBased | `time_based_` | `time_based_0001` |

### 5.2 Numbering Rules

- Numbering starts at **0**
- Indices are zero-padded to **4 digits**
- Indices increase in chronological order

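
The naming rules above can be sketched as a pair of helper functions. The names `format_chunk_id` and `parse_chunk_id` are hypothetical, and rejecting unknown prefixes or fewer than four digits when parsing is an assumption about how strict a reader should be. One detail worth knowing: `{:04}` is a minimum width, so an index above 9999 produces a longer ID and plain string sorting of IDs stops matching numeric order at that point.

```rust
/// Builds an ID such as "time_based_0001" from a type name and index.
pub fn format_chunk_id(chunk_type: &str, chunk_index: u32) -> String {
    format!("{}_{:04}", chunk_type, chunk_index)
}

/// Splits an ID back into its type name and index.
/// Returns None for unknown prefixes or malformed indices.
pub fn parse_chunk_id(chunk_id: &str) -> Option<(&str, u32)> {
    // Split at the last underscore so "time_based" stays intact.
    let (chunk_type, index) = chunk_id.rsplit_once('_')?;
    if !matches!(chunk_type, "sentence" | "cut" | "time_based") {
        return None;
    }
    // Assumption: require at least the 4 zero-padded digits.
    if index.len() < 4 || !index.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    Some((chunk_type, index.parse().ok()?))
}
```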
---

## 6. Database Schema

### 6.1 PostgreSQL Table

```sql
CREATE TABLE chunks (
    id BIGSERIAL PRIMARY KEY,
    uuid VARCHAR(16) NOT NULL,
    chunk_id VARCHAR(64) NOT NULL,
    chunk_index INTEGER NOT NULL,
    chunk_type VARCHAR(32) NOT NULL,
    start_time DOUBLE PRECISION NOT NULL,
    start_frame BIGINT NOT NULL,
    end_time DOUBLE PRECISION NOT NULL,
    end_frame BIGINT NOT NULL,
    fps VARCHAR(16) NOT NULL,
    fps_value DOUBLE PRECISION NOT NULL,
    content JSONB NOT NULL,
    metadata JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(uuid, chunk_id)
);

-- Indexes
CREATE INDEX idx_chunks_uuid ON chunks(uuid);
CREATE INDEX idx_chunks_type ON chunks(chunk_type);
CREATE INDEX idx_chunks_time ON chunks(start_time, end_time);
CREATE INDEX idx_chunks_uuid_type ON chunks(uuid, chunk_type);
```

### 6.2 Query Examples

```sql
-- All chunks of a video
SELECT * FROM chunks WHERE uuid = '1636719dc31f78ac';

-- Chunks of a specific type
SELECT * FROM chunks WHERE uuid = '1636719dc31f78ac' AND chunk_type = 'sentence';

-- Chunks that overlap the 20-30 s time range
SELECT * FROM chunks
WHERE uuid = '1636719dc31f78ac'
  AND start_time <= 30.0 AND end_time >= 20.0;

-- All chunks that overlap the 20-30 s time range (mixed types), grouped by type
SELECT * FROM chunks
WHERE uuid = '1636719dc31f78ac'
  AND start_time <= 30.0 AND end_time >= 20.0
ORDER BY chunk_type, chunk_index;
```
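
The time-range condition in the last two queries is the standard interval-intersection test: a chunk matches when it starts no later than the range ends and ends no earlier than the range starts. The same predicate, written as a small Rust function (the name `overlaps` is hypothetical), makes the boundary behaviour easier to see. Because the comparisons are inclusive, a chunk that only touches the range at a single instant, such as 10.0-20.0 s against 20-30 s, is also returned; switching to `<` and `>` would exclude such chunks.

```rust
/// True when the chunk [chunk_start, chunk_end] intersects the query
/// range [range_start, range_end]. Mirrors the SQL condition
/// `start_time <= range_end AND end_time >= range_start`.
pub fn overlaps(chunk_start: f64, chunk_end: f64, range_start: f64, range_end: f64) -> bool {
    chunk_start <= range_end && chunk_end >= range_start
}
```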
---

## 7. Rust Data Structures

### 7.1 Chunk Definition

```rust
use serde::{Deserialize, Serialize};

/// The three chunk types; serialized as "sentence", "cut", "time_based".
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum ChunkType {
    Sentence,
    Cut,
    TimeBased,
}

impl ChunkType {
    /// Returns the snake_case name used in `chunk_type` and in chunk IDs.
    pub fn as_str(&self) -> &'static str {
        match self {
            ChunkType::Sentence => "sentence",
            ChunkType::Cut => "cut",
            ChunkType::TimeBased => "time_based",
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Chunk {
    pub uuid: String,
    pub chunk_id: String,
    pub chunk_index: u32,
    pub chunk_type: ChunkType,
    pub start_time: f64,
    pub start_frame: i64,
    pub end_time: f64,
    pub end_frame: i64,
    pub fps: String,
    pub fps_value: f64,
    pub content: serde_json::Value,
    pub metadata: Option<serde_json::Value>,
}
```
### 7.2 Creating a Chunk

```rust
impl Chunk {
    pub fn new(
        uuid: String,
        chunk_index: u32,
        chunk_type: ChunkType,
        start_time: f64,
        end_time: f64,
        fps: &str,
        content: serde_json::Value,
    ) -> Self {
        // `parse_fps` is not defined in this document; it is expected to
        // turn a fraction such as "24/1" into the number 24.0.
        let fps_value = parse_fps(fps);
        // frame_number = floor(time_in_seconds * fps), see section 2.2.
        let start_frame = (start_time * fps_value).floor() as i64;
        let end_frame = (end_time * fps_value).floor() as i64;
        let chunk_id = format!("{}_{:04}", chunk_type.as_str(), chunk_index);

        Self {
            uuid,
            chunk_id,
            chunk_index,
            chunk_type,
            start_time,
            start_frame,
            end_time,
            end_frame,
            fps: fps.to_string(),
            fps_value,
            content,
            metadata: None,
        }
    }
}
```
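
`Chunk::new` calls `parse_fps`, which this document does not define. The sketch below is one possible implementation, not the project's actual one. It accepts either a fraction such as `"30000/1001"` or a plain number such as `"25"`. The fallible variant `try_parse_fps` is a hypothetical addition, and returning `0.0` for invalid input is an assumption made only so that the signature matches the call site above; returning an error to the caller would be a reasonable alternative.

```rust
/// Parses "num/den" or a plain number into frames per second.
/// Returns None for malformed, non-finite, zero or negative rates.
pub fn try_parse_fps(fps: &str) -> Option<f64> {
    let fps = fps.trim();
    let value = match fps.split_once('/') {
        Some((num, den)) => {
            let num: f64 = num.trim().parse().ok()?;
            let den: f64 = den.trim().parse().ok()?;
            if den == 0.0 {
                return None;
            }
            num / den
        }
        None => fps.parse().ok()?,
    };
    if value.is_finite() && value > 0.0 {
        Some(value)
    } else {
        None
    }
}

/// Infallible wrapper matching the call in `Chunk::new`.
/// Assumption: invalid input maps to 0.0.
pub fn parse_fps(fps: &str) -> f64 {
    try_parse_fps(fps).unwrap_or(0.0)
}
```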
---

## 8. Time Splitter Implementation

### 8.1 TimeBasedSplitter

```rust
pub struct TimeBasedSplitter {
    pub duration: f64, // Length of each chunk (seconds)
    pub overlap: f64,  // Overlap between consecutive chunks (seconds)
}

impl TimeBasedSplitter {
    pub fn new(duration: f64, overlap: f64) -> Self {
        Self { duration, overlap }
    }

    pub fn split(&self, uuid: &str, video_duration: f64, fps: f64) -> Vec<Chunk> {
        let mut chunks = Vec::new();
        let step = self.duration - self.overlap;
        // A non-positive step would never advance, so return no chunks
        // instead of looping forever (overlap must be less than duration).
        if step <= 0.0 || video_duration <= 0.0 {
            return chunks;
        }
        // Note: non-integer frame rates (e.g. 29.97) are truncated here.
        let fps_str = format!("{}/1", fps as u32);
        let mut current_time = 0.0;
        let mut index: u32 = 0;

        while current_time < video_duration {
            let end_time = (current_time + self.duration).min(video_duration);
            let is_last = end_time >= video_duration;

            let chunk = Chunk::new(
                uuid.to_string(),
                index,
                ChunkType::TimeBased,
                current_time,
                end_time,
                &fps_str,
                serde_json::json!({
                    "duration": end_time - current_time,
                    "is_last": is_last,
                    "segment_number": index + 1,
                }),
            );
            chunks.push(chunk);

            // Stop once a chunk reaches the end of the video; otherwise an
            // overlapping step could emit a trailing chunk that lies
            // entirely inside the previous one.
            if is_last {
                break;
            }
            current_time += step;
            index += 1;
        }

        // Fill in `total_segments` (section 4.3) now that the count is known.
        let total = chunks.len();
        for chunk in &mut chunks {
            chunk.content["total_segments"] = serde_json::json!(total);
        }
        chunks
    }
}
```
### 8.2 Usage Example

```rust
// Assumes `uuid: String` and `video_duration: f64` are already in scope.

// Create a time splitter (10 s, no overlap)
let splitter = TimeBasedSplitter::new(10.0, 0.0);
let chunks = splitter.split(&uuid, video_duration, 24.0);

// Create a time splitter (10 s, 2 s overlap)
let splitter = TimeBasedSplitter::new(10.0, 2.0);
let chunks = splitter.split(&uuid, video_duration, 24.0);
```

---

## 9. Processing Flow

### 9.1 Complete Flow

```
1. Register (register the video)
   └── Obtain UUID, video_duration, fps

2. Probe (probe the video)
   └── Obtain streams, format, fps

3. Generate Sentence chunks
   └── Read the ASR output
   └── Create a chunk for each segment

4. Generate Cut chunks
   └── Run scene detection
   └── Create a chunk for each scene

5. Generate TimeBased chunks
   └── Use TimeBasedSplitter
   └── Create a chunk for each time window

6. Save to the database
   └── Batch write to PostgreSQL
```
### 9.2 Example Output

Frame counts are the number of frames each chunk spans (end frame minus start frame).

```
Video: 35 s, FPS: 24

Sentence Chunks (3):
  sentence_0000: 0.0s - 10.0s (240 frames)
  sentence_0001: 10.0s - 20.0s (240 frames)
  sentence_0002: 20.0s - 35.0s (360 frames)

Cut Chunks (3):
  cut_0000: 0.0s - 15.0s (360 frames)
  cut_0001: 15.0s - 28.0s (312 frames)
  cut_0002: 28.0s - 35.0s (168 frames)

TimeBased Chunks (5, 2 s overlap):
  time_based_0000: 0.0s - 10.0s (240 frames)
  time_based_0001: 8.0s - 18.0s (240 frames)
  time_based_0002: 16.0s - 26.0s (240 frames)
  time_based_0003: 24.0s - 34.0s (240 frames)
  time_based_0004: 32.0s - 35.0s (72 frames)
```

---

## 10. Related Documents

- [JSON_OUTPUT_SPEC.md](./JSON_OUTPUT_SPEC.md) - JSON output specification
- [RUST_DEVELOPMENT.md](./RUST_DEVELOPMENT.md) - Rust development guidelines
- [AGENTS.md](../AGENTS.md) - Development guidelines