What You See Is What You Get

Entity-Aware Summarization for Reliable Sponsored Search

Xiao Liang, Xinyu Hu, Simiao Zuo, et al.

Tsinghua University · Microsoft AI · Microsoft Research

The Challenge

AI-generated summaries often misalign with actual webpage content, leading to user dissatisfaction and retrieval inaccuracies.

🎯

Misalignment Issue

LLMs may generate compelling but inaccurate summaries that don't match the actual webpage content, misleading users.

🏷️

Lack of Entity Awareness

Current models struggle to accurately represent critical entities like brands, products, or features essential for alignment.

🔍

Poor Query-Document Relevance

Webpages covering multiple items make it challenging to establish relevance between specific queries and content.

Our Solution: Entity-Aware Summarization

A structured three-step process that ensures AI-generated summaries align with webpage content and user intent.

Webpage Entity Tagging

Extract and categorize crucial entities from webpage content:

Brand names
Product information
Key features
Pricing details
Target audience
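
The five entity categories above could be held in a small record per webpage. The following is a minimal sketch, not the authors' schema: the class name `PageEntities`, its field names, and the helper `all_entities` are assumptions made for illustration, and the sample values are taken from the Nike Air Max 90 example used later on this page.

```python
from dataclasses import dataclass, field


@dataclass
class PageEntities:
    """Entities tagged on one webpage (hypothetical field names)."""
    brands: list[str] = field(default_factory=list)
    products: list[str] = field(default_factory=list)
    features: list[str] = field(default_factory=list)
    prices: list[str] = field(default_factory=list)
    audiences: list[str] = field(default_factory=list)

    def all_entities(self) -> set[str]:
        """Return every tagged entity, lowercased for later matching."""
        groups = (self.brands, self.products, self.features,
                  self.prices, self.audiences)
        return {entity.lower() for group in groups for entity in group}


# Values drawn from the Nike Air Max 90 example on this page.
page = PageEntities(
    brands=["Nike"],
    products=["Air Max 90"],
    features=["cushioned sole", "breathable mesh"],
    prices=["$120"],
    audiences=["runners", "casual wearers"],
)
print(sorted(page.all_entities()))
```

Keeping the categories as separate fields, rather than one flat list, makes it possible to report coverage per entity type, as the results table does for Brand, Product, and Feature.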

Query Reflection

Anticipate potential user queries by analyzing entity relationships:

Generate 10 diverse search queries
Assess entity coverage
Ensure relevance to user intent
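
One simple way to "assess entity coverage" is to count how many generated queries mention each tagged entity and flag the entities no query touches. The sketch below assumes case-insensitive substring matching, which is a simplification and not necessarily how the paper measures coverage. The function name `entity_query_counts` and the sample queries are hypothetical.

```python
from collections import Counter


def entity_query_counts(entities: list[str], queries: list[str]) -> Counter:
    """Count, per entity, how many queries mention it (substring match)."""
    counts = Counter({entity: 0 for entity in entities})
    for query in queries:
        query_lower = query.lower()
        for entity in entities:
            if entity.lower() in query_lower:
                counts[entity] += 1
    return counts


entities = ["Nike", "Air Max 90", "breathable mesh"]
queries = [  # hypothetical generated queries
    "nike air max 90 price",
    "air max 90 for running",
    "best cushioned sneakers",
]
counts = entity_query_counts(entities, queries)
# Entities that no query mentions are candidates for extra queries.
uncovered = [entity for entity in entities if counts[entity] == 0]
print(counts, uncovered)
```

Here "breathable mesh" is mentioned by no query, so a further query targeting that feature would widen coverage.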

Entity-Aware Summary Generation

Create concise, engaging summaries that:

Highlight key entities
Address user queries
Maintain accuracy

Example Transformation:

Webpage: Nike Air Max 90 product page
Basic Summary: "Check out all different Nike shoes"
Entity-Aware Summary: "Nike's Air Max 90 offers superior comfort and iconic style for runners and casual wearers. Priced at $120, features cushioned sole and breathable mesh."
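
The gap between the two summaries above can be made concrete by checking how many of the page's entities each one mentions. This is a rough sketch using substring matching. The helper `entity_recall` and the hand-written entity list are assumptions for illustration, and this simple recall is not the Entity Coverage F1 reported in the results.

```python
def entity_recall(summary: str, entities: list[str]) -> float:
    """Fraction of page entities that appear verbatim in the summary."""
    if not entities:
        return 0.0
    text = summary.lower()
    hits = sum(1 for entity in entities if entity.lower() in text)
    return hits / len(entities)


# Entities read off the example page by hand (an assumption).
page_entities = [
    "Nike", "Air Max 90", "cushioned sole", "breathable mesh",
    "$120", "runners", "casual wearers",
]
basic = "Check out all different Nike shoes"
entity_aware = (
    "Nike's Air Max 90 offers superior comfort and iconic style for "
    "runners and casual wearers. Priced at $120, features cushioned "
    "sole and breathable mesh."
)
print(entity_recall(basic, page_entities))         # 1 of 7 entities
print(entity_recall(entity_aware, page_entities))  # 7 of 7 entities
```

The basic summary names only the brand, while the entity-aware summary mentions all seven listed entities.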

Training Pipeline

Our comprehensive training approach combines supervised fine-tuning with preference optimization.

Step 1

Fine-tune LLaMA3.1-8B on GPT-4o generated entity-aware summaries

Step 2

Train dense retrieval model using summaries and queries

Step 3

Apply Direct Preference Optimization (DPO) for query alignment
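
For readers unfamiliar with DPO, the standard per-pair objective is the negative log-sigmoid of a scaled log-probability margin between a preferred and a dispreferred output, measured relative to a frozen reference model. The sketch below shows that standard loss for one preference pair. How the preferred and dispreferred summaries are chosen here, and the value of `beta`, are not stated on this page, so the inputs and the default `beta=0.1` are assumptions.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, loss is log(2).
print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
# Policy favors the chosen summary more than the reference does: lower loss.
print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

The loss falls as the policy assigns relatively more probability to the preferred summary than the reference model does, which is what pushes the summarizer toward query-aligned outputs.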

Impressive Results

Our entity-aware approach outperforms both the raw-webpage and basic-summary baselines on every retrieval metric reported below, and raises entity coverage F1 for every entity type.

82.31%

Average Relative Improvement in Entity Coverage F1 Score

7.86 points

Increase in Recall@50 over Raw Webpage (76.84 → 84.70)

9.75 points

Average Improvement Across All Metrics over Raw Webpage (58.75 → 68.50)

Entity Type	Base Model F1	Our Model F1	Improvement
Brand	0.26	0.57	+119%
Product	0.21	0.44	+110%
Feature	0.22	0.26	+18%
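
The Improvement column is a relative gain, (ours - base) / base. As a quick check of the arithmetic, the snippet below recomputes it from the F1 values in the table. The variable names are illustrative. The mean of the three relative gains comes out at 82.31%, which matches the headline entity coverage figure, suggesting that figure is an unweighted average over the three entity types.

```python
# (base F1, our F1) per entity type, copied from the table above.
f1_scores = {
    "Brand": (0.26, 0.57),
    "Product": (0.21, 0.44),
    "Feature": (0.22, 0.26),
}

# Relative improvement in percent: (ours - base) / base * 100.
relative_gain = {
    entity_type: (ours - base) / base * 100
    for entity_type, (base, ours) in f1_scores.items()
}
mean_gain = sum(relative_gain.values()) / len(relative_gain)

for entity_type, gain in relative_gain.items():
    print(f"{entity_type}: +{gain:.0f}%")   # +119%, +110%, +18%
print(f"Mean: +{mean_gain:.2f}%")           # +82.31%
```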

Retrieval Performance Comparison

Method	Recall@50	nDCG@50	MRR@10	Average
Raw Webpage	76.84	41.61	31.87	58.75
Basic Summary	83.78	54.37	45.49	67.98
Our Method	84.70	54.64	46.19	68.50
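
For reference, the three metrics in the table can be computed per query as sketched below and then averaged over all queries. This sketch assumes binary relevance and a ranked list without duplicates. The paper's exact evaluation setup is not given on this page, and the function names and sample document IDs are hypothetical.

```python
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1 / rank of the first relevant document in the top k, else 0.

    MRR@k is the mean of this value over all queries.
    """
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG: discounted gain divided by the ideal gain."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1)
              if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["d3", "d1", "d7", "d2"]   # hypothetical retrieval result
relevant = {"d1", "d2"}             # hypothetical ground truth
print(recall_at_k(ranked, relevant, 3))           # 0.5
print(reciprocal_rank_at_k(ranked, relevant, 3))  # 0.5
print(round(ndcg_at_k(ranked, relevant, 4), 3))   # 0.651
```

Recall@50 rewards finding relevant pages anywhere in the top 50, while nDCG@50 and MRR@10 also reward ranking them early, which is why the three columns can move by different amounts.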

Conclusion

Our entity-aware summarization framework represents a significant advancement in AI-driven information retrieval for sponsored search.

By focusing on critical entities and aligning summaries with user queries, we improve both entity coverage and retrieval relevance over raw-webpage and basic-summary baselines. This work paves the way for more trustworthy and effective AI-generated content in commercial search environments.

Key Contributions:

✓ Novel three-step entity-aware summarization process
✓ Integration of Direct Preference Optimization for query alignment
✓ Consistent improvements across all reported evaluation metrics
✓ Practical framework for real-world sponsored search applications