2501.07365

A) 🎯 Research Motivation and Problem Definition

Existing problems:

  • In e-commerce search, semantic retrieval has mostly been studied using text alone
  • Product images are a key factor in purchase decisions, but their effect on dense retrieval has not been studied systematically
  • Limitations of prior multimodal work:
    • Score functions such as MaxSim have scalability problems on large indexes
    • Independently trained text and visual encoders are misaligned

Research goals:

  • Compare text-only and multimodal representations
  • Propose a scalable solution based on cosine similarity
  • Validate real-world performance on an index of millions of products

B) 🏗️ Model Architecture

B.1) Text-Only Baseline

  • Query encoder + product text encoder (2-tower)
  • Trained with the NT-Xent loss
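The NT-Xent objective can be pictured as a softmax over the candidates in a batch. Each query should score its own product above every other product in that batch. Below is a minimal pure-Python sketch. It assumes cosine similarity, a temperature of 0.1, and only the query-to-product direction. None of these settings come from the paper, and symmetric variants also add the product-to-query term.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nt_xent_loss(queries, products, temperature=0.1):
    """In-batch NT-Xent loss (query -> product direction only).

    queries[i] and products[i] form the i-th positive pair; the other
    products in the batch act as negatives for query i.
    temperature=0.1 is an assumed value, not taken from the paper.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in products]
        # Numerically stable log-sum-exp over all in-batch candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-softmax probability of the positive product.
        total += log_denominator - logits[i]
    return total / len(queries)


# Usage: matched pairs give a near-zero loss, swapped pairs a large one.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```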

B.2) 4-Tower Multimodal Model

Query Tower → Query Embedding
Product Text Tower → Text Embedding   ─┬→ Fusion → Product Embedding
Product Image Tower → Image Embedding ─┘
(+ Optional: Query Image Tower)
  • Text and image are encoded by separate encoders
  • A fusion module combines them (concatenation, MLP, etc.)
  • A pre-trained visual encoder can be used

B.3) 3-Tower Multimodal Model

Query Tower → Query Embedding
Product Text Tower → Text Embedding   ─┬→ Fusion → Product Embedding
Product Image Tower → Image Embedding ─┘
  • Removes the Query Image Tower from the 4-tower model
  • Lighter-weight structure
  • With fine-tuning, it can reach performance close to the 4-tower model

C) 🔬 Experimental Setup

Dataset:

  • E-commerce dataset (millions of products)
  • Trained on query-product positive pairs
  • In-batch negatives + additional hard negatives (3 per query)
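The notes do not say how the three hard negatives per query were obtained. This sketch assumes one common strategy purely for illustration: take the highest-scoring products that are not the labeled positive. `mine_hard_negatives` and the toy catalog are hypothetical names and data.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mine_hard_negatives(query_emb, positive_id, catalog, k=3):
    """Return ids of the k catalog products most similar to the query,
    skipping the labeled positive (hypothetical mining strategy)."""
    scored = (
        (cosine(query_emb, emb), pid)
        for pid, emb in catalog.items()
        if pid != positive_id
    )
    return [pid for _, pid in heapq.nlargest(k, scored)]


catalog = {
    "p0": [1.0, 0.0],  # the labeled positive
    "p1": [0.9, 0.1],
    "p2": [0.8, 0.2],
    "p3": [0.7, 0.3],
    "p4": [0.0, 1.0],  # easy negative, not selected here
}
hard = mine_hard_negatives([1.0, 0.0], "p0", catalog, k=3)
```

One caveat of this strategy is that mined products can be relevant but unlabeled, so they may act as false negatives during training.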

Evaluation metrics:

  • Purchase Recall: recall of the products that actually led to a purchase
  • Relevance Accuracy: accuracy of search relevance
  • Exclusive match analysis (products retrieved only by the multimodal model)

Infrastructure:

  • Cosine-similarity scoring (compatible with ANN indexers)
  • Efficient search over a large index (millions of products)
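The scalability contrast with MaxSim noted under the research motivation comes down to how many vectors each product needs. The sketch below is an illustration, not the paper's code. Cosine scoring uses one vector per product, so the catalog fits a standard nearest-neighbor index. MaxSim, the late-interaction score used by ColBERT-style models, compares every query token vector with every product token vector at query time.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_score(query_vec, product_vec):
    """Single-vector score: one embedding per product, so the whole catalog
    can be stored in a standard (approximate) nearest-neighbor index."""
    return dot(normalize(query_vec), normalize(product_vec))


def maxsim_score(query_tokens, product_tokens):
    """Late-interaction MaxSim: for each query token vector, take its best
    match among the product token vectors, then sum. Token vectors are assumed
    to be L2-normalized already. Every product must keep all of its token
    vectors, which is what makes this expensive at catalog scale."""
    return sum(max(dot(q, d) for d in product_tokens) for q in query_tokens)


single = cosine_score([3.0, 4.0], [6.0, 8.0])
multi = maxsim_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
```

For N products with T tokens each, MaxSim has to store N·T vectors instead of N.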

D) 📊 Main Results

D.1) Key Findings

  1. Multimodal > Text-only

    • Improvement in either purchase recall or relevance accuracy
  2. Advantages of the 4-tower model

    • Integrating a pre-trained visual encoder noticeably improves the relevance score
  3. Practicality of the 3-tower model

    • Reaches performance close to the 4-tower model with fine-tuning
    • More efficient, with fewer parameters
  4. Exclusive match analysis

    • Some products are retrieved only by the multimodal model
    • → Image information complements what the text cannot express

E) 💡 Summary of Contributions

  1. Systematic comparison of text-only, 3-tower, and 4-tower models → insight into language-visual alignment
  2. Integrating a pre-trained visual encoder contributes substantially in the 4-tower model
  3. With fine-tuning, the 3-tower model gets close to the 4-tower model (with a slight performance drop)
  4. Practical search over a large index (millions of products) using cosine similarity
  5. Effects evaluated on both metrics, purchase recall and relevance accuracy

F) Embedding Dimension

The paper does not state specific dimension values, but it does describe the model structure:

  • BiBERT (text-only baseline): 2-layer Transformer, 4 attention heads
  • Uses the text/image encoders of a CLIP model
  • The final embedding dimension is written only as "N"

Since the model is CLIP-based, the dimension is probably 512 or 768. Confirming the exact value would require checking the paper's main text.


G) Text-only vs. Multimodal: Comparison of Effects

G.1) Key Figures

Model | Relevance (Exact+Substitute) | Purchase Recall (Recall@100)
CLIP alone | 71.9% | 46% (low)
BiBERT (Text-only) | 83.0% (baseline) | 78.1% (baseline)
4-tower Multimodal (cat) | 83.6% (improved) | 78.6% (improved)

G.2) Exclusive Match Analysis (products retrieved only by the multimodal model)

  • 60 to 80 exclusive matches per query
  • They account for 10 to 20% of recall against the ground truth
  • Precision is also high: ~50% exact matches, ~30% substitute matches

H) Key Findings

  1. CLIP alone gives reasonable relevance (71.9% Exact+Substitute, below BiBERT's 83.0%) but low purchase recall (46%)
    • → image similarity ≠ purchase intent
  2. Where multimodal beats text-only:
    • It improves relevance accuracy or purchase recall
    • Optimizing both at once is hard (there is a trade-off)
  3. With fine-tuning and added hard negatives:
    • Relevance precision improves (fewer irrelevant results)
    • However, recall drops

I) 📏 Embedding Dimension

The paper does not state specific dimension values.

  • The equations write the dimension only as N (q ∈ ℝᴺ, d ∈ ℝᴺ)
  • BiBERT: 2-layer Transformer, 4 attention heads
  • CLIP: uses a model pre-trained on 5 billion images

The final dimension depends on the fusion method:

  • Concatenation: BiBERT dim + CLIP dim
  • MLP fusion: MLP output dim
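The dimension arithmetic above can be checked with a small sketch. The paper only writes N, so the sizes below are hypothetical: 256 for BiBERT, 512 for CLIP, and 128 for the MLP output. The MLP is also reduced to a single dense layer with ReLU purely for illustration.

```python
import random


def concat_fusion(text_emb, image_emb):
    """Concatenation: output dim = text dim + image dim."""
    return list(text_emb) + list(image_emb)


def mlp_fusion(text_emb, image_emb, weight, bias):
    """One dense layer + ReLU over the concatenated input.

    weight has shape (out_dim, text_dim + image_dim), so the output dim is
    len(weight), independent of the input dims.
    """
    x = concat_fusion(text_emb, image_emb)
    if any(len(row) != len(x) for row in weight):
        raise ValueError("weight rows must match the concatenated input size")
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weight, bias)]


# Hypothetical sizes: the paper only writes the dimension as N.
TEXT_DIM, IMAGE_DIM, MLP_OUT_DIM = 256, 512, 128
rng = random.Random(0)
text = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]
image = [rng.gauss(0.0, 1.0) for _ in range(IMAGE_DIM)]
weight = [[rng.gauss(0.0, 0.05) for _ in range(TEXT_DIM + IMAGE_DIM)]
          for _ in range(MLP_OUT_DIM)]
bias = [0.0] * MLP_OUT_DIM

cat_vec = concat_fusion(text, image)             # 256 + 512 = 768 dimensions
mlp_vec = mlp_fusion(text, image, weight, bias)  # 128 dimensions
```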

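J) 📊 Text-only vs. Multimodal Performance Comparison (Table 2)

Model | Recall@100 | Exact | Substitute | Irrelevant
BiBERT (Text-only) | 78.1% | 52.7% | 30.3% | 13.6%
CLIP alone | 46% | 45.4% | 26.5% | 25.4%
4tMM cat | 78.6% | 52.5% | 31.1% | 14%
4tMM α-cat | 78.5% | 51.9% | 31.2% | 14.5%
4tMM (BiBERT+MLP joint) | 73.3% | 54% | 26.8% | 11.9%
3tMM (BiBERT+MLP joint) | 73.1% | 53.8% | 26.8% | 12.1%

4tMM / 3tMM = 4-tower / 3-tower multimodal model. The Complement label is not shown, so the label columns do not sum to 100%.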
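The percentage-point changes in K.1 and K.2 follow from Table 2 by plain subtraction and addition, as does the Exact+Substitute total for CLIP alone in G.1 (71.9%). This short script just recomputes that arithmetic from the table values; it adds no new results.

```python
# Table 2 values: (Recall@100, Exact, Substitute, Irrelevant), in percent.
TABLE2 = {
    "BiBERT (Text-only)": (78.1, 52.7, 30.3, 13.6),
    "CLIP alone": (46.0, 45.4, 26.5, 25.4),
    "4tMM cat": (78.6, 52.5, 31.1, 14.0),
    "4tMM alpha-cat": (78.5, 51.9, 31.2, 14.5),
    "4tMM (BiBERT+MLP joint)": (73.3, 54.0, 26.8, 11.9),
    "3tMM (BiBERT+MLP joint)": (73.1, 53.8, 26.8, 12.1),
}


def compare_to_baseline(table, baseline="BiBERT (Text-only)"):
    """Percentage-point changes vs. the baseline, plus Exact+Substitute."""
    b_recall, b_exact, _, b_irrelevant = table[baseline]
    summary = {}
    for name, (recall, exact, substitute, irrelevant) in table.items():
        summary[name] = {
            "recall_pp": round(recall - b_recall, 1),
            "exact_pp": round(exact - b_exact, 1),
            "irrelevant_pp": round(irrelevant - b_irrelevant, 1),
            "exact_plus_substitute": round(exact + substitute, 1),
        }
    return summary


results = compare_to_baseline(TABLE2)
print(results["4tMM (BiBERT+MLP joint)"])
```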

K) 🎯 Summary of Key Results

K.1) Concatenation Only

  • Recall: 78.1% → 78.6% (+0.5 pp)
  • Relevance is roughly unchanged, with small mixed shifts (Exact 52.7% → 52.5%, Substitute 30.3% → 31.1%, Irrelevant 13.6% → 14%)

K.2) Joint Training (BiBERT and MLP trained together)

  • Exact: 52.7% → 54% (+1.3 pp)
  • Irrelevant: 13.6% → 11.9% (-1.7 pp) ← a notable gain
  • In exchange, Recall drops: 78.1% → 73.3% (-4.8 pp)

K.3) A Trade-off Exists

Recall ↔ relevance precision: in these results, a gain in one comes with a loss in the other.


L) 🔍 Exclusive Match Analysis (Table 4): Products Found Only by the Multimodal Model

Model | Exclusive products per query | Net Recall | Net Exact | Net Irrelevant
4tMM (joint) | 60 | 56.2% | 57.6% | 11.4%
3tMM (joint) | 59 | 56.2% | 57.6% | 11.5%

→ The products retrieved only by the multimodal models are also high quality (Exact 57.6%, Irrelevant 11.4%)
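Here, an exclusive match is a product that appears in the multimodal model's top-100 for a query but not in the text-only model's top-100. The sketch below computes that set difference and the average count per query. The function names and toy ids are hypothetical, and it does not attempt to reproduce how the paper defines "Net Recall".

```python
def exclusive_matches(multimodal_top, text_only_top):
    """Products in the multimodal top-k that the text-only top-k missed,
    kept in the multimodal ranking order."""
    text_set = set(text_only_top)
    return [pid for pid in multimodal_top if pid not in text_set]


def mean_exclusive_count(runs):
    """Average number of exclusive matches per query.

    runs: {query: (multimodal_top_ids, text_only_top_ids)}
    """
    counts = [len(exclusive_matches(mm, txt)) for mm, txt in runs.values()]
    return sum(counts) / len(counts)


runs = {
    "q1": (["a", "b", "c", "d"], ["a", "c", "x", "y"]),
    "q2": (["e", "f", "g", "h"], ["e", "f", "g", "z"]),
}
per_query = {q: exclusive_matches(mm, txt) for q, (mm, txt) in runs.items()}
average = mean_exclusive_count(runs)
```

The next step is to label these exclusive products with the annotation model and compute label shares. That yields quality figures of the kind shown in the table.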


M) ✅ Conclusion: Is Multimodal More Effective Than Text-only?

Aspect | Result
Relevance precision | ✅ Multimodal is better (Exact↑, Irrelevant↓)
Purchase recall | ❌ Drops for the jointly trained models (78.1% → 73.3%)
Exclusive match quality | ✅ Finds high-quality products that text alone misses

M.1) The Paper's Conclusion

“Multimodal models show larger potential on improving relevance accuracy (higher exact, lower irrelevant) than purchase prediction.”

In other words, the paper's key finding is that multimodal representations help relevance accuracy more than purchase prediction.

N) 📏 How the Evaluation Metrics Are Measured

N.1) Recall@100

Recall@100 = (number of actually purchased products in the top-100 predictions) / (total number of actual purchases for that query)
  • Computed per query, then averaged over all queries
  • Ground truth = actual purchase records
  • In other words: “how well did the model find the products that led to purchases?”
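The per-query computation and averaging described above can be sketched as follows. Skipping queries with no purchases is an assumption, since recall is undefined for them; the notes don't say how the paper handles such queries.

```python
def recall_at_k(ranked_ids, purchased_ids, k=100):
    """Share of a query's purchased products that appear in its top-k list.
    Returns None when the query has no purchases (recall is undefined)."""
    if not purchased_ids:
        return None
    top_k = set(ranked_ids[:k])
    hits = sum(1 for pid in purchased_ids if pid in top_k)
    return hits / len(purchased_ids)


def mean_recall_at_k(results, k=100):
    """Average per-query Recall@k.

    results: {query: (ranked_product_ids, purchased_product_ids)}
    Queries without purchases are skipped (an assumption).
    """
    scores = [recall_at_k(ranked, bought, k) for ranked, bought in results.values()]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores)


results = {
    "q1": (["a", "b", "c"], {"b", "z"}),  # 1 of 2 purchases retrieved in top-2
    "q2": (["d", "e"], {"d"}),            # 1 of 1 retrieved
}
score = mean_recall_at_k(results, k=2)
```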

N.2) Exact / Substitute / Complement / Irrelevant

  • Labeled by a separate relevance annotation model
  • Each query-product pair is classified into one of four labels:
Label | Meaning | Example (query: “iPhone 15 case”)
Exact | Exactly the product wanted | A case made for the iPhone 15
Substitute | A product that can stand in for it | An iPhone 14 case (if compatible)
Complement | A complementary product | An iPhone 15 tempered-glass screen protector
Irrelevant | Not relevant | A Galaxy case
  • Compute the percentage (%) of each label among the top-100 predictions
  • Good model = Exact↑, Irrelevant↓
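Computing the label shares for one query's top-100 list can be sketched like this. The `annotate` callable is a hypothetical stand-in for the relevance annotation model, meaning any function that maps a product id to one of the four labels. These notes don't say whether the paper averages the shares per query or pools all pairs.

```python
from collections import Counter

LABELS = ("Exact", "Substitute", "Complement", "Irrelevant")


def label_distribution(ranked_ids, annotate, k=100):
    """Percentage of each relevance label among a query's top-k results.

    annotate: hypothetical stand-in for the annotation model,
    mapping a product id to one of LABELS.
    """
    top = ranked_ids[:k]
    counts = Counter(annotate(pid) for pid in top)
    return {label: 100.0 * counts[label] / len(top) for label in LABELS}


toy_labels = {"a": "Exact", "b": "Exact", "c": "Substitute", "d": "Irrelevant"}
shares = label_distribution(["a", "b", "c", "d"], toy_labels.get)
```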

O) 🔀 α-cat (Alpha-weighted Concatenation)

Defined in Eq. 4 of the paper:

f_α-cat(v₁, v₂) = (α · v₁) ⊕ ((1-α) · v₂),  α ∈ (0, 1)

where ⊕ denotes vector concatenation.

O.1) Plain Concatenation (cat)

[BiBERT_emb, CLIP_emb]  ← simply concatenate

O.2) α-weighted Concatenation (α-cat)

[α × BiBERT_emb, (1-α) × CLIP_emb]  ← weight each part, then concatenate

Intent: adjust the relative importance of the BiBERT and CLIP embeddings

  • α = 0.7 puts more weight on text (BiBERT)
  • α = 0.3 puts more weight on the image (CLIP)
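Eq. 4 has a useful consequence, worked out here as a sketch rather than taken from the paper. It rests on two assumptions: both parts are L2-normalized, and the same α is applied on the query and product sides. Under those assumptions, the cosine of the α-cat vectors equals a fixed weighted average of the text and image cosines, (α²·s_text + (1-α)²·s_image) / (α² + (1-α)²).

With α = 0.7, text therefore gets about 84% of the weight (0.49 / 0.58). Plain cat behaves like α = 0.5, because cosine ignores the overall scale. This may partly explain why cat and α-cat score almost the same in Table 2. The toy vectors are hypothetical.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def alpha_cat(v1, v2, alpha):
    """Eq. 4: (alpha * v1) concatenated with ((1 - alpha) * v2), alpha in (0, 1)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    return [alpha * x for x in v1] + [(1.0 - alpha) * x for x in v2]


# Toy unit-norm embeddings (hypothetical values): text part, image part.
q_text, q_img = l2_normalize([1.0, 2.0]), l2_normalize([3.0, -1.0])
d_text, d_img = l2_normalize([2.0, 1.0]), l2_normalize([1.0, 1.0])

alpha = 0.7
fused = cosine(alpha_cat(q_text, q_img, alpha), alpha_cat(d_text, d_img, alpha))

# Closed form under the unit-norm, same-alpha assumptions.
w_text, w_img = alpha ** 2, (1 - alpha) ** 2
closed_form = (w_text * cosine(q_text, d_text) + w_img * cosine(q_img, d_img)) / (w_text + w_img)

# Plain concatenation matches alpha = 0.5 (cosine ignores the common factor 0.5).
plain = cosine(q_text + q_img, d_text + d_img)
half = cosine(alpha_cat(q_text, q_img, 0.5), alpha_cat(d_text, d_img, 0.5))
```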

O.3) Experimental Results (Table 2)

Fusion | Recall@100 | Exact
cat | 78.6% | 52.5%
α-cat | 78.5% | 51.9%

→ No meaningful difference: α-cat performs almost the same as plain concatenation.


P) “Sentinel”: An Internal Model

  • A query-product relevance classification model built internally at Amazon
  • Input: a (query, product) pair
  • Output: one of Exact / Substitute / Complement / Irrelevant
Sentinel("iPhone 15 case", product A) → "Exact"
Sentinel("iPhone 15 case", product B) → "Irrelevant"

Q) Why Use Such a Model?

Q.1) Problem

  • 3.38M products × 38K queries ≈ 128 billion pairs
  • Labeling them one by one by hand is impossible

Q.2) Solution

  • Humans label a subset of the data → train a relevance classification model
  • This model then labels the rest of the data automatically

R) Summary

Term | Description
Recall@100 | Based on purchase records: how many actually purchased products appear in the top 100
Exact/Substitute/… | Share of each relevance label assigned by the annotation model
α-cat | Multiply the two embeddings by weights α and (1-α), then concatenate

Item | Details
Model name | Sentinel (Amazon internal model)
Role | Automatically classifies the relevance of query-product pairs
Output | 4 classes (Exact/Substitute/Complement/Irrelevant)
Architecture | Not disclosed in the paper (presumably cross-encoder-based)

In short, it is an automated evaluation model that replaces human labeling.

Dataset | Distinct Products
Training | 581,158 (about 580K)
Evaluation | 3,384,067 (about 3.38M)

→ The scale is millions, not hundreds of millions.

R.1) For Reference: Amazon's Actual Scale

  • Total Amazon products: over 300 million
  • This paper presumably experimented on a sample from a specific marketplace or category

S) ANN (Approximate Nearest Neighbor)

S.1) Paper, Page 4

“using KNN (k-nearest neighbors) algorithm in FAISS library for top-100 relevant products retrieval”

Query Embedding → FAISS Index (3.38M products) → Top-100 retrieval
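The retrieval step above amounts to a top-k search by cosine similarity. The brute-force sketch below shows what an exact KNN search computes; it is not FAISS code. In FAISS itself, the usual equivalent is an `IndexFlatIP` over vectors normalized with `faiss.normalize_L2`. Which index the paper actually used is not stated.

```python
import heapq
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_index(product_embs):
    """Store L2-normalized vectors so that inner product equals cosine."""
    return {pid: l2_normalize(v) for pid, v in product_embs.items()}


def search(index, query_emb, k=100):
    """Exact (brute-force) top-k product ids by cosine similarity."""
    q = l2_normalize(query_emb)
    scored = ((sum(a * b for a, b in zip(q, v)), pid) for pid, v in index.items())
    return [pid for _, pid in heapq.nlargest(k, scored)]


index = build_index({"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]})
top = search(index, [1.0, 0.1], k=2)
```

Brute force costs one dot product per product per query, about 3.38M for this index. Approximate indexes such as IVF or HNSW save time at exactly that step.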

S.2) FAISS Features

  • Vector search library built by Meta (Facebook)
  • Supports GPU acceleration
  • Multiple index types: IVF, HNSW, PQ, etc.

The paper does not specify the index type (IVF, HNSW, etc.). It is presumably a default configuration or an IVF-family index. However, the quoted passage says "KNN", so an exact (flat) index is also possible.


T) Summary

Question | Answer
Deployed in production? | ❌ Offline experiments only (no mention of deployment)
Number of products | 3.38M (not hundreds of millions)
ANN | Uses FAISS (specific index type not stated)