What You See Is What You Get

Entity-Aware Summarization for Reliable Sponsored Search

Xiao Liang, Xinyu Hu, Simiao Zuo, et al.

Tsinghua University · Microsoft AI · Microsoft Research

The Challenge

AI-generated summaries often misalign with actual webpage content, leading to user dissatisfaction and retrieval inaccuracies.

Misalignment Issue

LLMs may generate compelling but inaccurate summaries that don't match the actual webpage content, misleading users.

๐Ÿท๏ธ

Lack of Entity Awareness

Current models struggle to accurately represent critical entities, such as brands, products, and features, that are essential for alignment.

Poor Query-Document Relevance

Webpages covering multiple items make it challenging to establish relevance between specific queries and content.

Our Solution: Entity-Aware Summarization

A structured three-step process that ensures AI-generated summaries align with webpage content and user intent.

Step 1: Webpage Entity Tagging

Extract and categorize crucial entities from webpage content:

  • Brand names
  • Product information
  • Key features
  • Pricing details
  • Target audience
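
One way to hold the output of step 1 is a small record per webpage. The sketch below assumes the tagging LLM is asked to return a JSON object keyed by the five categories above; the key names, `EntityTags`, and `parse_entity_tags` are hypothetical and not the paper's actual format.

```python
import json
from dataclasses import dataclass, field

# Hypothetical key names mirroring the five entity categories above.
ENTITY_CATEGORIES = ("brand", "product", "features", "pricing", "audience")

@dataclass
class EntityTags:
    brand: list = field(default_factory=list)
    product: list = field(default_factory=list)
    features: list = field(default_factory=list)
    pricing: list = field(default_factory=list)
    audience: list = field(default_factory=list)

    def all_entities(self) -> list:
        """Flatten every category into one list, in category order."""
        out = []
        for cat in ENTITY_CATEGORIES:
            out.extend(getattr(self, cat))
        return out

def parse_entity_tags(llm_output: str) -> EntityTags:
    """Parse an (assumed) JSON tagging response into EntityTags.

    Unknown keys are ignored, a single string is wrapped in a list,
    blank strings are dropped, and missing categories stay empty.
    """
    raw = json.loads(llm_output)
    kwargs = {}
    for cat in ENTITY_CATEGORIES:
        value = raw.get(cat, [])
        if isinstance(value, str):
            value = [value]
        kwargs[cat] = [v.strip() for v in value if v.strip()]
    return EntityTags(**kwargs)
```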

Step 2: Query Reflection

Anticipate potential user queries by analyzing entity relationships:

  • Generate 10 diverse search queries
  • Assess entity coverage
  • Ensure relevance to user intent
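
The entity-coverage check in step 2 can be sketched as follows, assuming an entity counts as covered when it appears case-insensitively as a substring of at least one generated query. The matching rule and the name `query_entity_coverage` are hypothetical; a production system might use fuzzy or embedding-based matching instead.

```python
def query_entity_coverage(entities, queries):
    """Return (coverage_ratio, uncovered_entities) for a set of queries.

    An entity is covered if it occurs, case-insensitively, as a
    substring of at least one query (an illustrative rule only).
    """
    if not entities:
        return 1.0, []
    lowered = [q.lower() for q in queries]
    uncovered = [e for e in entities
                 if not any(e.lower() in q for q in lowered)]
    return 1 - len(uncovered) / len(entities), uncovered
```

For example, with the entities `["Nike", "Air Max 90", "breathable mesh", "$120"]` and the queries `["nike air max 90 price", "best cushioned running shoes"]`, the coverage is 0.5 and the uncovered entities are `"breathable mesh"` and `"$120"`, which could be used as a signal to generate additional queries.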

Step 3: Entity-Aware Summary Generation

Create concise, engaging summaries that:

  • Highlight key entities
  • Address user queries
  • Maintain accuracy
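
Steps 1 and 2 feed step 3. A minimal sketch of assembling that input into a single generation prompt follows; the wording, the 40-word limit, and the name `build_summary_prompt` are hypothetical and not the prompt used in the paper.

```python
def build_summary_prompt(entities, queries, max_words=40):
    """Combine step-1 entities and step-2 queries into one step-3 prompt.

    The instruction text is illustrative only.
    """
    entity_lines = "\n".join(f"- {e}" for e in entities)
    query_lines = "\n".join(f"{i}. {q}" for i, q in enumerate(queries, start=1))
    return (
        f"Write a summary of at most {max_words} words for this webpage.\n"
        "Mention these key entities exactly as written:\n"
        f"{entity_lines}\n"
        "The summary should help answer these likely user queries:\n"
        f"{query_lines}\n"
        "Only state facts that the webpage supports."
    )
```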

Example Transformation:
Webpage: Nike Air Max 90 product page
Basic Summary: "Check out all different Nike shoes"
Entity-Aware Summary: "Nike's Air Max 90 offers superior comfort and iconic style for runners and casual wearers. Priced at $120, features cushioned sole and breathable mesh."
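
The difference between the two summaries can be made concrete with a simple entity check. The entity list below is a hypothetical tagging of the Nike Air Max 90 page, and case-insensitive substring matching is an assumed, deliberately simple rule.

```python
def covered_entities(summary, entities):
    """Entities that appear (case-insensitively) in the summary text."""
    text = summary.lower()
    return [e for e in entities if e.lower() in text]

# Hypothetical step-1 output for the example page.
entities = ["Nike", "Air Max 90", "$120", "cushioned sole", "breathable mesh"]

basic = "Check out all different Nike shoes"
entity_aware = ("Nike's Air Max 90 offers superior comfort and iconic style for "
                "runners and casual wearers. Priced at $120, features cushioned "
                "sole and breathable mesh.")

print(covered_entities(basic, entities))         # ['Nike']
print(covered_entities(entity_aware, entities))  # all five entities
```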

Training Pipeline

Our comprehensive training approach combines supervised fine-tuning with preference optimization.

Step 1

Fine-tune LLaMA3.1-8B on entity-aware summaries generated by GPT-4o
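
A common way to build such distillation examples is to compute the loss only on the teacher's summary tokens, not on the prompt. The sketch assumes that masking choice, which the summary above does not confirm; the token ids and `build_sft_example` are hypothetical. `-100` is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, and Hugging Face causal LM models shift labels internally, so the labels stay aligned with `input_ids` here.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def build_sft_example(prompt_ids, target_ids, eos_id):
    """Concatenate prompt and teacher-summary token ids for SFT.

    Prompt positions get IGNORE_INDEX in the labels, so only the
    GPT-4o summary (plus EOS) contributes to the training loss.
    """
    input_ids = list(prompt_ids) + list(target_ids) + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(target_ids) + [eos_id]
    return input_ids, labels
```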

Step 2

Train a dense retrieval model on queries paired with their summaries
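
The training objective for the retriever is not specified here. A widely used choice for dense retrieval is a contrastive (InfoNCE) loss with in-batch negatives; the NumPy sketch below assumes that objective and a temperature of 0.05, both of which are illustrative rather than the paper's settings.

```python
import numpy as np

def info_nce_loss(query_emb: np.ndarray, summary_emb: np.ndarray,
                  temperature: float = 0.05) -> float:
    """Contrastive loss with in-batch negatives.

    Row i of summary_emb is the positive summary for query i; every
    other row in the batch serves as a negative for that query.
    """
    # L2-normalize so dot products are cosine similarities
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    s = summary_emb / np.linalg.norm(summary_emb, axis=1, keepdims=True)
    logits = q @ s.T / temperature                       # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize exp
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives lie on the diagonal; average their negative log-likelihood
    return float(-np.mean(np.diag(log_probs)))
```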

Step 3

Apply Direct Preference Optimization (DPO) for query alignment
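
For reference, the standard DPO objective for a prompt x with a preferred response y_w and a rejected response y_l is loss = -log σ(β[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]). How preference pairs are built for query alignment is not described here; one plausible assumption is that the summary better aligned with the target queries is the preferred one. The β value of 0.1 is an illustrative default.

```python
import math

def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, from summed sequence log-probs."""
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) equals softplus(-margin)
    return _softplus(-margin)
```

When the policy and reference assign identical log-probabilities the loss is log 2; it falls below log 2 as the policy shifts probability toward the preferred summary.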

Impressive Results

Our entity-aware approach outperforms the baselines on every reported metric.

  • 82.31% improvement in entity coverage F1 score over the base model
  • +7.86 points in Recall@50 over raw-webpage retrieval
  • +9.75 points in the average across all metrics over raw-webpage retrieval
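
The exact definition of entity coverage F1 is not given here. A minimal sketch, assuming a set-based comparison of normalized entity strings between the entities in a summary and a reference entity list (the function name and normalization are hypothetical):

```python
def entity_f1(predicted, gold):
    """Set-based F1 between predicted and reference entity strings.

    Strings are stripped and lowercased before comparison. Returns 0.0
    if either set is empty or nothing overlaps.
    """
    pred = {e.strip().lower() for e in predicted}
    ref = {e.strip().lower() for e in gold}
    tp = len(pred & ref)
    if not pred or not ref or tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For instance, a summary mentioning 2 of 4 reference entities and nothing else has precision 1.0, recall 0.5, and F1 of about 0.667.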

Entity Coverage F1 by Entity Type

Entity Type   Base Model F1   Our Model F1   Improvement
Brand         0.26            0.57           +119%
Product       0.21            0.44           +110%
Feature       0.22            0.26           +18%
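
The per-type improvements are relative gains, (new - old) / old. Recomputing them from the F1 scores and averaging over the three entity types gives 82.31%, which suggests the headline entity-coverage figure is this mean relative gain.

```python
# F1 scores (base model, our model) from the table above
entity_f1_scores = {
    "Brand": (0.26, 0.57),
    "Product": (0.21, 0.44),
    "Feature": (0.22, 0.26),
}

# Relative improvement in percent for each entity type
gains = {k: (new - old) / old * 100 for k, (old, new) in entity_f1_scores.items()}
for k, g in gains.items():
    print(f"{k}: +{g:.0f}%")    # Brand: +119%, Product: +110%, Feature: +18%

mean_gain = sum(gains.values()) / len(gains)
print(f"Mean relative gain: {mean_gain:.2f}%")  # Mean relative gain: 82.31%
```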

Retrieval Performance Comparison

Method          Recall@50   nDCG@50   MRR@10   Average (all metrics)
Raw Webpage     76.84       41.61     31.87    58.75
Basic Summary   83.78       54.37     45.49    67.98
Our Method      84.70       54.64     46.19    68.50
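
The retrieval metrics in the table are standard ranking measures. The sketch below computes them for a single query under binary relevance; graded relevance, tie handling, and the evaluation data are not specified here, and the table values are presumably per-query averages scaled to percentages.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """nDCG@k with binary relevance (gain 1 for relevant items)."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(ranked[:k], start=1)
              if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```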

Conclusion

Our entity-aware summarization framework represents a significant advancement in AI-driven information retrieval for sponsored search.

By focusing on critical entities and aligning summaries with user queries, we improve both entity coverage and retrieval relevance over the baselines. This work paves the way for more trustworthy and effective AI-generated content in commercial search environments.

Key Contributions:

  • Novel three-step entity-aware summarization process
  • Integration of Direct Preference Optimization for query alignment
  • Substantial improvements across all evaluation metrics
  • Practical framework for real-world sponsored search applications