Entity-Aware Summarization for Reliable Sponsored Search
Tsinghua University · Microsoft AI · Microsoft Research
AI-generated summaries often misalign with actual webpage content, leading to user dissatisfaction and retrieval inaccuracies.
LLMs may generate compelling but inaccurate summaries that misrepresent the webpage and mislead users.
Current models struggle to accurately represent critical entities like brands, products, or features essential for alignment.
Webpages covering multiple items make it challenging to establish relevance between specific queries and content.
A structured three-step process that ensures AI-generated summaries align with webpage content and user intent.
Extract and categorize crucial entities from webpage content:
Anticipate potential user queries by analyzing entity relationships:
Create concise, engaging summaries that:
Our comprehensive training approach combines supervised fine-tuning with preference optimization.
Fine-tune LLaMA3.1-8B on GPT-4o-generated entity-aware summaries
Train a dense retrieval model on the summaries and queries
Apply Direct Preference Optimization (DPO) for query alignment
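The preference-optimization stage can be illustrated with the standard per-pair DPO objective. The sketch below is a minimal illustration, not the authors' training code: it assumes each training pair consists of a "chosen" summary (better aligned with the query) and a "rejected" one, and that summed token log-probabilities under the policy and a frozen reference model are already available. The function names and the `beta=0.1` default are illustrative choices.

```python
import math


def _softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    All inputs are sequence-level log-probabilities (assumed precomputed).
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals softplus(-m).
    return _softplus(-margin)
```

When the policy equals the reference model the margin is zero and the loss is log 2; the loss falls as the policy assigns relatively more probability to the chosen summary than the reference does.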
Our entity-aware approach improves entity F1 over the base model for every entity type and outperforms both the raw-webpage and basic-summary baselines on every reported retrieval metric.
| Entity Type | Base Model F1 | Our Model F1 | Improvement |
|---|---|---|---|
| Brand | 0.26 | 0.57 | +119% |
| Product | 0.21 | 0.44 | +110% |
| Feature | 0.22 | 0.26 | +18% |
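For readers who want to reproduce numbers of this kind, the sketch below shows one common way to score entity extraction and to derive the Improvement column. It assumes exact matching after lowercasing and whitespace trimming; the source does not state which matching scheme was used, so treat `entity_f1` as a hypothetical helper rather than the evaluation actually run.

```python
def entity_f1(predicted, gold):
    """Set-based F1 between predicted and gold entity strings.

    Assumption: entities match only if equal after strip() and lower().
    """
    pred = {e.strip().lower() for e in predicted}
    ref = {e.strip().lower() for e in gold}
    if not pred or not ref:
        return 0.0
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(base, ours):
    """Relative gain of `ours` over `base`, in percent."""
    return (ours - base) / base * 100.0
```

Applied to the table, `relative_improvement(0.26, 0.57)` rounds to 119, matching the Brand row; the Product and Feature rows round to 110 and 18 in the same way.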
| Method | Recall@50 | nDCG@50 | MRR@10 | Average |
|---|---|---|---|---|
| Raw Webpage | 76.84 | 41.61 | 31.87 | 58.75 |
| Basic Summary | 83.78 | 54.37 | 45.49 | 67.98 |
| Our Method | 84.70 | 54.64 | 46.19 | 68.50 |
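The three retrieval metrics in the table can be computed per query as sketched below and then averaged over queries. This is a minimal sketch assuming binary relevance labels and scores in the range 0 to 1 (the table appears to report them multiplied by 100); the function names are illustrative and the source does not describe its evaluation code.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant documents that appear in the top k."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant document in the top k, else 0."""
    relevant = set(relevant)
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """nDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    relevant = set(relevant)
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1)
              if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, with `ranked = ["d3", "d1", "d7", "d2"]` and `relevant = {"d1", "d2"}`, Recall@3 is 0.5 and MRR@10 is 0.5, because the first relevant document sits at rank 2.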
Our entity-aware summarization framework improves AI-driven information retrieval for sponsored search.
By focusing on critical entities and aligning summaries with user queries, we improve both entity accuracy (Brand F1 rises from 0.26 to 0.57) and retrieval relevance (Recall@50 rises from 76.84 with raw webpages to 84.70). This work is a step toward more trustworthy and effective AI-generated content in commercial search environments.