Testing Patterns
This guide demonstrates common behavioral testing patterns using llm-app-test. For API reference and syntax, see API Documentation.
Pattern: Basic Behavioral Testing
Scenario: Testing simple greeting behaviour
Implementation:
actual = "Hello Alice, how are you today?"
expected_behavior = """A polite greeting that:
Addresses the person by name (Alice)
Asks about their wellbeing"""
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
Result: ✅ PASS
- Recognises personal address
- Identifies greeting context
- Validates wellbeing inquiry
Pattern: Basic Behavioral Matching
Scenario: Testing simple factual statements
Implementation:
actual = "The sky is blue"
expected_behavior = "A statement about the color of the sky"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Shows direct behavioral matching
- Clear relationship between statement and expectation
- Passes when meaning aligns
Pattern: Expected Behavioral Mismatch
Scenario: Validating behavioral mismatch detection
Implementation:
actual = "The sky is blue"
expected_behavior = "A statement about the weather forecast"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Fails because the actual statement is about sky colour
- Expected behaviour asks for weather forecast information
- Demonstrates how behavioral mismatches are caught
- Shows when assertions will fail in your tests
Pattern: Multilingual Behavioral Testing
Scenario: Testing behavioral understanding across languages
Implementation:
actual = "Bonjour, comment allez-vous?"
expected_behavior = "A polite greeting asking about wellbeing"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Demonstrates language-agnostic understanding
- Shows cross-language behavioral matching
- Validates international content handling
Pattern: Technical Documentation Testing
Scenario: Testing technical concept explanations
Implementation:
actual = """The TCP handshake is a three-way process where the client
sends SYN, server responds with SYN-ACK, and client confirms with ACK"""
expected_behavior = "An explanation of the TCP connection establishment process"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Validates technical accuracy
- Handles specialised terminology
- Maintains precision in behavioral assessment
Pattern: Contextual Disambiguation
Scenario: Testing understanding of ambiguous terms
Implementation:
actual = "The bank was steep and covered in wildflowers"
expected_behavior = "A description of a riverbank or hillside, not a financial institution"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Shows contextual understanding
- Handles ambiguous terms
- Validates specific meaning exclusions
Pattern: Sentiment Analysis
Scenario: Testing subtle emotional content
Implementation:
actual = "While the presentation wasn't perfect, it showed promise"
expected_behavior = "A constructive criticism with mixed but generally positive sentiment"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Detects nuanced sentiment
- Understands mixed emotions
- Validates overall tone
Pattern: Long-Form Content
Scenario: Testing comprehension of detailed explanations
Implementation:
actual = """Machine learning is a subset of artificial intelligence
that enables systems to learn and improve from experience without
explicit programming. It focuses on developing computer programs
that can access data and use it to learn for themselves."""
expected_behavior = "A comprehensive definition of machine learning emphasizing autonomous learning and data usage"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Handles longer text
- Maintains context
- Captures key concepts
Pattern: Subtle Sentiment Mismatch
Scenario: Testing detection of subtle sentiment differences
Implementation:
actual = "The project was completed on time, though there were some hiccups"
expected_behavior = "A statement expressing complete satisfaction with project execution"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Fails because actual statement indicates mixed satisfaction
- Expected behaviour suggests complete satisfaction
- Shows sensitivity to subtle emotional differences
Pattern: Technical Context Mismatch
Scenario: Testing technical meaning precision
Implementation:
actual = "The function returns a pointer to the memory address"
expected_behavior = "A description of a function that returns the value stored at a memory location"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Fails because returning a pointer is different from returning a stored value
- Shows precision in technical context
- Validates technical accuracy
Pattern: Ambiguous Reference Testing
Scenario: Testing handling of context-dependent terms
Implementation:
actual = "The bank processed the transaction after reviewing the account"
expected_behavior = "A description of a riverbank's geological formation process"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Fails because contexts are completely different
- Shows strong contextual understanding
- Validates semantic boundaries in behavioral matching
Pattern: Temporal Context
Scenario: Testing time-based behavioral understanding
Implementation:
actual = "I will have completed the task by tomorrow"
expected_behavior = "A statement about a task that was completed in the past"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Fails because of tense mismatch
- Shows temporal awareness
- Validates time context
Pattern: Logical Implication
Scenario: Testing logical relationship understanding
Implementation:
actual = "If it rains, the ground will be wet"
expected_behavior = "A statement indicating that wet ground always means it has rained"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Fails because of reversed logical implication
- Shows logical relationship understanding
- Validates causality direction
Pattern: Multi-Step Reasoning
Scenario: Testing complex logical chains
Implementation:
actual = """When water freezes, it expands by approximately 9% in volume.
This expansion creates less dense ice that floats according to Archimedes' principle of displacement.
Because Arctic sea ice is already floating in the ocean, its melting doesn't significantly affect sea levels -
it's already displacing its weight in water. However, land-based glaciers in places like Greenland
aren't currently displacing any ocean water. When these glaciers melt, they add entirely new water volume
to the oceans, making them a primary contributor to sea level rise."""
expected_behavior = """A multi-step scientific explanation.
Must maintain logical consistency across all steps."""
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Handles complex logical chains
- Maintains consistency across steps
- Validates scientific reasoning
Pattern: Nonsensical Content
Scenario: Testing handling of grammatically correct but meaningless content
Implementation:
actual = "The colorless green ideas sleep furiously"
expected_behavior = "A grammatically correct but semantically nonsensical statement"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Recognizes grammatical structure
- Identifies semantic nonsense
- Validates meta-understanding
Pattern: Extended Narrative
Scenario: Testing long-form narrative understanding
Implementation:
actual = """
The Roman Empire's rise began with modest origins in central Italy. What started as a small
settlement along the Tiber River would eventually become one of history's most influential
civilizations. In the early days, Rome was ruled by kings, but this system was overthrown
in 509 BCE, giving birth to the Roman Republic.
During the Republic, Rome expanded its territory through military conquest and diplomatic
alliances. The Roman army became increasingly professional, developing innovative tactics
and technologies. This military success brought wealth and power, but also internal
challenges. Social tensions grew between patricians and plebeians, leading to significant
political reforms.
By the 1st century BCE, the Republic faced severe internal strife. Military commanders
like Marius, Sulla, and eventually Julius Caesar accumulated unprecedented power. Caesar's
crossing of the Rubicon in 49 BCE marked a point of no return. His assassination in 44 BCE
led to another civil war, ultimately resulting in his adopted heir Octavian becoming
Augustus, the first Roman Emperor.
Augustus transformed Rome into an empire while maintaining a facade of republican
institutions. He implemented sweeping reforms in administration, military organization,
and public works. The Pax Romana that followed brought unprecedented peace and prosperity
across the Mediterranean world. Trade flourished, cities grew, and Roman culture spread
throughout the empire.
"""
expected_behavior = """A historical narrative that:
1. Maintains chronological progression
2. Shows cause-and-effect relationships
3. Develops consistent themes (power, governance, military)
4. Connects multiple historical events coherently
5. Demonstrates character development (e.g., Caesar to Augustus)
"""
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Handles extended narratives
- Maintains thematic consistency
- Validates complex relationships
- Shows chronological understanding
Pattern: Emoji Quantity Testing
Scenario: Testing recognition of repeated emojis
Implementation:
actual = "🤖" * 100
expected_behavior = "A lot of emojis"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
Result: ✅ PASS
- Handles repeated Unicode characters
- Recognises quantity concepts
- Validates emoji processing
Pattern: Emoji Quantity Mismatch
Scenario: Testing quantity recognition accuracy
Implementation:
actual = "🤖" * 100
expected_behavior = "Only a few emojis"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Fails due to quantity mismatch
- Shows quantity awareness
- Validates numerical understanding
Pattern: Mixed Unicode Content ⚠️ Known Reliability Issue
Scenario: Testing complex Unicode combinations and repetitive patterns
Observed Behavior
Test Case 1: Strict Pattern Matching
actual = "🤖👾" * 50 + "こんにちは" * 20 + "🌈" * 30
expected_behavior = "A mix of emojis and Japanese text"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
Results:
- ✅ Success Rate: 96% (48/50 runs)
- ❌ Failure Rate: 4% (2/50 runs)
- 🔍 Failure Analysis:
- Occurs primarily during increased API latency
- GPT-4o occasionally interprets sequential patterns as "distinct collections" rather than "mixed content"
- Failure message example: "This is not a mix as there is a distinct collection of emojis followed by Japanese text and then a collection of rainbows"
Recommended Implementation
Test Case 2: Pattern-Agnostic Matching
actual = "🤖👾" * 50 + "こんにちは" * 20 + "🌈" * 30
expected = "More than one type of emoji and Japanese text regardless of order"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
Results:
- ✅ Success Rate: 100% (preliminary)
- ⚠️ Extended testing in progress
- 🔍 Monitoring prompt effectiveness across different test scenarios
Best Practices
- Use pattern-agnostic assertions for repetitive Unicode content
- Consider implementing retry logic for critical tests
- Monitor API response times during failures
- Use enhanced prompts for complex Unicode pattern testing
Ongoing Investigation
- Testing various prompt configurations to improve reliability
- Monitoring performance impact of different prompt strategies
- Collecting data on failure patterns with different prompt versions
Pattern: Multilingual Emoji Spam
Scenario: Testing repeated multilingual content
Implementation:
actual = "Hello你好Bonjour🌈" * 50
expected_behavior = "A repetitive greeting in multiple languages with rainbows"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Handles multilingual text
- Recognises repetitive patterns
- Validates mixed content types
Pattern: ASCII Art Recognition
Scenario: Testing complex ASCII art patterns
Implementation:
actual = """
(╯°□°)╯︵ ┻━┻
""" * 20
expected_behavior = "Multiple instances of table-flipping ASCII art"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Recognises ASCII art patterns
- Understands visual representations
- Validates repeated patterns
Pattern: Extreme Whitespace
Scenario: Testing handling of excessive spacing
Implementation:
actual = "hello " + " " * 1000 + " world" + "\n" * 500
expected_behavior = "A greeting with excessive spacing"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Handles extreme whitespace
- Maintains semantic meaning
- Validates text normalisation
Pattern: Number Pattern Recognition
Scenario: Testing numerical pattern understanding
Implementation:
actual = "".join([str(i % 10) for i in range(1000)])
expected_behavior = "A long sequence of repeating numbers"
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Recognises numerical patterns
- Handles long repetitive sequences
- Validates pattern understanding
Pattern: Patient Education Content Testing
Scenario: Testing medical education content for diabetes management
Implementation:
actual = """
Understanding and Managing Type 2 Diabetes
Type 2 diabetes is a chronic condition that affects how your body processes blood sugar (glucose).
While this condition is serious, it can be effectively managed through lifestyle changes and,
when necessary, medication. This guide will help you understand the key aspects of diabetes
management.
Blood Sugar Monitoring:
Regular blood sugar monitoring is essential. Your target blood glucose levels should typically
be 80-130 mg/dL before meals and less than 180 mg/dL two hours after meals. However, your
healthcare provider may set different targets based on your individual needs. Keep a log of
your readings to identify patterns and adjust your management strategy accordingly.
Dietary Considerations:
A balanced diet is crucial for managing type 2 diabetes. Focus on:
- Controlling portion sizes
- Choosing high-fiber, low-glycemic foods
- Limiting refined carbohydrates and processed sugars
- Including lean proteins and healthy fats
- Spacing meals evenly throughout the day
Physical Activity:
Regular exercise helps control blood sugar levels by improving insulin sensitivity. Aim for:
- At least 150 minutes of moderate-intensity aerobic activity weekly
- Resistance training 2-3 times per week
- Daily movement, even if just short walks
Always check your blood sugar before and after exercise, and carry a fast-acting
carbohydrate source.
Medication Management:
If prescribed, take diabetes medications as directed. Common medications include:
- Metformin (helps reduce glucose production)
- Sulfonylureas (increase insulin production)
- DPP-4 inhibitors (help maintain blood sugar control)
Never adjust or stop medications without consulting your healthcare provider.
Warning Signs:
Learn to recognize and respond to:
- Hypoglycemia (low blood sugar): shakiness, sweating, confusion
- Hyperglycemia (high blood sugar): increased thirst, frequent urination, fatigue
Seek immediate medical attention if you experience severe symptoms or sustained
high blood sugar levels.
Regular Health Monitoring:
Schedule regular check-ups with your healthcare team, including:
- HbA1c tests every 3-6 months
- Annual eye examinations
- Regular foot checks
- Kidney function tests
- Cholesterol level monitoring
Remember, diabetes management is a journey, not a destination. Small, consistent
steps in the right direction can lead to significant improvements in your health
and quality of life.
"""
expected = """A medical education document that must:
1. Contain an overview section explaining the condition
2. List specific numerical guidelines (blood sugar ranges, exercise minutes)
3. Include structured sections for diet, exercise, and medication
4. Provide clear warning signs AND detailed emergency response procedures
5. End with follow-up care instructions"""
behavioral_assert.assert_behavioral_match(actual, expected_behavior)
- Fails because emergency response procedures are missing
- Fails because follow-up care instructions are incomplete
- Shows precision in medical content requirements
- Validates structured information presentation
Pattern: Investment Portfolio Report Testing
Scenario: Testing professional investment portfolio report generation
Implementation:
actual = """
Q4 2023 Portfolio Performance Summary
Portfolio Overview:
Your investment portfolio has demonstrated resilient performance during Q4 2023,
achieving a total return of 8.2% against our benchmark index return of 7.5%.
Total portfolio value stands at $1,245,000 as of December 31, 2023.
Asset Allocation Analysis:
Current allocation stands at:
- Equities: 65% ($809,250)
- US Large Cap: 40% ($498,000)
- International Developed: 15% ($186,750)
- Emerging Markets: 10% ($124,500)
- Fixed Income: 25% ($311,250)
- Government Bonds: 15% ($186,750)
- Corporate Bonds: 10% ($124,500)
- Alternative Investments: 10% ($124,500)
- Real Estate: 5% ($62,250)
- Commodities: 5% ($62,250)
Performance Attribution:
Key contributors to performance:
1. US Technology sector outperformance (+12.3%)
2. Emerging Markets recovery (+9.1%)
3. Corporate Bond yield optimization (+4.2%)
Risk Metrics:
- Portfolio Beta: 0.85
- Sharpe Ratio: 1.45
- Maximum Drawdown: -5.2%
- Standard Deviation: 12.3%
Rebalancing Recommendations:
Based on current market conditions and your investment objectives:
1. Consider increasing Fixed Income allocation by 2%
2. Reduce US Large Cap exposure by 3%
3. Increase Emerging Markets exposure by 1%
Market Outlook:
Looking ahead to 2024, we anticipate:
- Continued monetary policy normalization
- Potential emerging markets opportunities
- Heightened focus on quality factors in equity selection
Next Steps:
1. Schedule quarterly review meeting
2. Discuss rebalancing recommendations
3. Update investment policy statement if needed
"""
expected = """A professional investment portfolio report that must:
1. Present portfolio performance with specific metrics
2. Detail current asset allocation with percentages
3. Include risk analysis metrics
4. Provide forward-looking recommendations
5. Maintain formal financial terminology
6. Include clear next steps or action items"""
asserter.assert_behavioral_match(actual, expected)
Result: ✅ PASS
- Shows comprehensive financial reporting structure
- Validates precise numerical data presentation
- Confirms professional financial terminology
- Demonstrates clear action items and recommendations
- Verifies complete risk metric inclusion
Pattern: Personalized Content Recommendation Testing
Scenario: Testing personalized content recommendation generation
Implementation:
actual = """
Personalized Content Recommendations - User Profile #A1234
Generated: November 22, 2024
Recommended Content Queue:
1. "Climate Pioneers" (Documentary Series)
- Episode length: 45 minutes
- New episodes available
2. "Global Power Play" (Political Drama)
- Episode length: 55 minutes
- Features actors from previously watched content
3. "Earth's Tipping Points" (Scientific Documentary)
- Episode length: 40 minutes
- Recently added to platform
Engagement Optimization:
- Scheduled new episode alerts
- Downloadable content for offline viewing
- Similar content suggestions refreshed weekly
- Customized language preferences maintained
Content Accessibility:
All recommended content includes your preferred subtitle options and is
available in HD quality. Downloads are enabled for offline viewing during
your upcoming travel dates.
"""
expected = """A personalized content recommendation document that must:
1. Include the viewing patterns and preferences of the user
2. List recommended content with clear reasoning
3. Provide matching percentages or relevance metrics
4. Include viewing optimization suggestions
5. Address content accessibility features"""
asserter.assert_behavioral_match(actual, expected)
Result: ❌ FAIL
- Fails because relevance metrics (percentages) are missing
- Lacks explicit matching percentages or relevance scores
- Shows content recommendations without quantified relevance
- Demonstrates engagement optimization suggestions
- Validates content accessibility features
Pattern: Legal Document Summary Testing
Scenario: Testing legal document summary generation
Implementation:
actual = """
Contract Summary Analysis
Document Reference: MSA-2024-0892
Date of Analysis: November 22, 2024
Agreement Overview:
Software Development Master Services Agreement between TechCorp Inc. ("Provider")
and GlobalEnterprises LLC ("Client") for the development and maintenance of
enterprise software solutions.
Key Terms and Conditions:
1. Service Scope
- Custom software development services
- System integration capabilities
- Ongoing maintenance and support
- Security compliance implementations
2. Financial Terms
- Base development fee: $750,000
- Monthly maintenance: $15,000
- Change request rate: $200/hour
- Payment terms: Net 30
3. Performance Standards
- 99.9% system availability
- 4-hour response time for critical issues
- Monthly performance reporting
- Quarterly service reviews
4. Intellectual Property Rights
- Client owns all custom development
- Provider retains rights to pre-existing IP
- Joint ownership of derivative works
- Limited license for provider tools
5. Term and Termination
- Initial term: 36 months
- Automatic renewal: 12-month periods
- 90-day termination notice required
- Immediate termination for material breach
Risk Assessment:
- Medium risk: Data protection obligations
- Low risk: Service level commitments
- Low risk: IP ownership structure
- Medium risk: Change management process
Next Steps:
1. Legal team review of data protection terms
2. Technical team validation of SLAs
3. Finance approval of payment terms
4. Compliance review of security standards
"""
expected = """A legal document summary that must:
1. Identify key parties and document type
2. List main contractual terms
3. Include specific numerical values (costs, dates, metrics)
4. Provide risk assessment
5. Outline required actions or next steps"""
asserter.assert_behavioral_match(actual, expected)
Result: ✅ PASS
- Shows comprehensive legal document structure
- Validates precise contractual terms
- Confirms specific numerical values inclusion
- Demonstrates clear risk assessment
- Verifies actionable next steps
Pattern: Maintenance Prediction Testing
Scenario: Testing maintenance prediction report generation for CNC equipment
Implementation:
actual = """
Equipment Maintenance Analysis
Machine ID: CNC-1234
Analysis Date: November 22, 2024
Current Status Summary:
The CNC machine is showing early indicators of potential bearing wear in the main spindle.
Recommended action is to schedule maintenance within the next 2 weeks.
Operational Parameters:
- Current Runtime: 2,450 hours
- Average Daily Usage: 18 hours
- Last Maintenance: October 15, 2024
Immediate Recommendations:
1. Schedule bearing inspection
2. Monitor vibration levels daily
3. Prepare replacement parts
4. Plan for 4-hour maintenance window
Impact Assessment:
- Production Impact: Minimal if addressed within 2 weeks
- Resource Requirements: Standard maintenance team
- Parts Cost Estimate: $2,500
"""
expected = """A maintenance prediction report that must:
1. Include current machine status
2. Provide historical maintenance patterns
3. Show failure prediction confidence levels
4. List specific maintenance recommendations
5. Include impact assessment and timeline"""
asserter.assert_behavioral_match(actual, expected)
- Fails because historical maintenance patterns are incomplete
- Missing failure prediction confidence levels
- Shows current status and recommendations
- Includes basic impact assessment
- Demonstrates timeline considerations
Pattern: E-commerce Product Description Testing
Scenario: Testing product description for a smart home security camera
Implementation:
actual = """
Smart Home Security Camera - Model HC2000
Transform your home security with our latest AI-powered camera system. This next-generation
device combines advanced motion detection with crystal-clear 4K video quality, perfect for
both indoor and outdoor monitoring.
Key Features:
- 4K Ultra HD resolution with HDR
- 160° wide-angle view
- Advanced AI motion detection
- Two-way audio communication
- Night vision up to 30 feet
- Weather-resistant (IP66 rated)
Smart Integration:
Works seamlessly with major platforms including:
- Amazon Alexa
- Google Home
- Apple HomeKit
- IFTTT
Technical Specifications:
- Dimensions: 3.2" x 3.2" x 5.1"
- Weight: 12.3 oz
- Power: AC adapter or rechargeable battery
- Storage: Cloud or local SD card (up to 256GB)
- Connectivity: 2.4GHz/5GHz WiFi
What's in the Box:
- HC2000 Camera
- Mounting bracket
- Power adapter
- Quick start guide
- Screws and anchors
Perfect for:
- Home security
- Baby monitoring
- Pet watching
- Front door monitoring
30-day money-back guarantee
2-year manufacturer warranty
Free technical support
"""
expected = """An e-commerce product description that must:
1. Include clear product name and model
2. List key features and specifications
3. Specify technical details and compatibility
4. Describe package contents
5. Include warranty and support information"""
asserter.assert_behavioral_match(actual, expected)
- Shows comprehensive product information structure
- Validates technical specifications inclusion
- Confirms compatibility details
- Demonstrates complete package contents listing
- Verifies warranty and support information
Pattern: Assignment Feedback Testing
Scenario: Testing student assignment feedback generation
Implementation:
actual = """
Assignment Feedback
Student ID: STU-2024-456
Assignment: Research Paper on Climate Change
Submission Date: November 22, 2024
Overall Assessment:
Your research paper demonstrates good understanding of climate change basics.
The writing is clear and well-structured, with appropriate use of scientific
terminology throughout the document.
Strengths:
- Strong introduction that sets context
- Good use of current scientific data
- Clear paragraph structure
- Proper citation format
Areas Noted:
- Some statistical interpretations could be more precise
- Additional peer-reviewed sources would strengthen arguments
- Conclusion could be more comprehensive
Grade: B+ (88/100)
Additional Comments:
The paper shows promise and indicates solid research skills. Your analysis
of temperature data trends was particularly well-done. Consider expanding
your discussion of potential mitigation strategies in future work.
"""
expected = """An assignment feedback document that must:
1. Include basic assignment and student information
2. Provide specific strengths and weaknesses
3. List concrete steps for improvement
4. Reference specific learning objectives
5. Include grading criteria and score"""
asserter.assert_behavioral_match(actual, expected)
Result: ❌ FAIL
- Fails because concrete improvement steps are missing
- Missing specific learning objectives
- Shows basic assignment information
- Includes strengths and weaknesses
- Demonstrates grading information
Pattern: Real Estate Listing Testing
Scenario: Testing real estate property listing generation
Implementation:
actual = """
Stunning Modern Oasis in Prime Location
123 Maple Avenue, Riverside Heights
Discover urban elegance in this meticulously updated contemporary home, where
modern luxury meets practical living. This 2,400 sq ft residence seamlessly
blends indoor and outdoor living spaces.
Property Highlights:
- 4 bedrooms, 2.5 bathrooms
- Built: 2018
- Lot size: 0.25 acres
- Two-car attached garage
- Energy-efficient smart home features
Interior Features:
The open-concept main level showcases:
- Chef's kitchen with quartz countertops
- Custom Italian cabinetry
- Premium stainless steel appliances
- Expansive living room with 12-foot ceilings
- Primary suite with spa-inspired bathroom
Outdoor Living:
- Professional landscaping
- Covered patio with built-in BBQ
- Low-maintenance xeriscaping
- Private backyard retreat
Location Benefits:
- Walking distance to Central Park
- Top-rated school district
- 10 minutes to downtown
- Easy access to major highways
Recent Updates:
- New HVAC system (2023)
- Smart home integration
- Updated LED lighting
- Fresh interior paint
Price: $875,000
Available for immediate viewing
Virtual tour link: [URL]
"""
expected = """A real estate listing that must:
1. Include property overview and key features
2. List specific amenities and updates
3. Describe location benefits
4. Use engaging, descriptive language
5. Provide essential details (size, bedrooms, price)"""
asserter.assert_behavioral_match(actual, expected)
Result: ✅ PASS
- Shows comprehensive property information
- Validates specific amenities and features
- Confirms location benefits inclusion
- Demonstrates engaging descriptive language
- Verifies essential property details
Pattern: Interview Feedback Testing
Scenario: Testing interview feedback generation for technical position
Implementation:
def test_interview_feedback_missing_criteria(self, asserter):
"""Test semantic matching for interview feedback generation. Should fail due to
missing evaluation criteria and specific examples."""
actual = """
Interview Feedback Summary
Candidate ID: INT-2024-789
Position: Senior Software Engineer
Interview Date: November 22, 2024
Overall Impression:
The candidate demonstrated strong technical knowledge and communicated well
throughout the interview. They showed enthusiasm for the role and our company's
mission.
Discussion Points:
- Previous experience with cloud architecture
- Team collaboration approaches
- Problem-solving methodology
- Career goals and aspirations
Technical Discussion:
Candidate showed familiarity with:
- Microservices architecture
- CI/CD pipelines
- Cloud platforms (AWS, Azure)
- Agile development practices
Cultural Fit:
Appears to align well with our company values and team dynamics.
Demonstrated good communication skills and collaborative mindset.
Next Steps:
Proceed with reference checks if moving forward.
Schedule follow-up with hiring manager.
"""
expected = """An interview feedback document that must:
1. Include candidate and position information
2. List specific evaluation criteria with ratings
3. Provide concrete examples of responses
4. Include technical assessment scores
5. Offer clear hiring recommendation"""
asserter.assert_behavioral_match(actual, expected)
- Fails because evaluation criteria lack ratings
- Missing concrete response examples
- Missing technical assessment scores
- Shows basic candidate information
- Lacks clear hiring recommendation
Pattern: Customer Service Response Testing
Scenario: Testing customer service ticket response generation
Implementation:
actual = """
Ticket Analysis and Response
Ticket ID: CS-2024-1122
Priority: Medium
Category: Product Return
Customer Query Summary:
Customer purchased a wireless headphone (Model: WH-1000XM4) three days ago
and is experiencing connectivity issues with their iPhone 13. Initial
troubleshooting steps were attempted without success.
Issue Analysis:
- Product is within return window (3 of 30 days)
- Common compatibility issue identified
- Troubleshooting already attempted
- Customer tone indicates frustration
Recommended Response:
Dear [Customer Name],
Thank you for reaching out about the connectivity issues with your WH-1000XM4
headphones. I understand how frustrating technical issues can be, especially
with a new purchase.
Based on your description, I can offer you two immediate solutions:
1. Advanced Troubleshooting:
- Reset the headphones (detailed steps attached)
- Update iPhone Bluetooth settings
- Install latest firmware
2. Hassle-free Return:
- Generate return label through our portal
- Full refund processed within 3 business days
- Free return shipping
Would you prefer to try the advanced troubleshooting steps, or would you like
to proceed with the return? I'm here to help with either option.
Next Steps:
- Await customer preference
- Prepare return label if requested
- Schedule follow-up within 24 hours
Response Tone: Empathetic and Solution-focused
Support Resources: KB-2345, RT-6789
"""
expected = """A customer service response that must:
1. Include ticket categorization and priority
2. Summarize the customer's issue accurately
3. Provide multiple solution options
4. Include clear next steps
5. Maintain appropriate tone and empathy"""
asserter.assert_behavioral_match(actual, expected)
Result: ✅ PASS
- Shows comprehensive ticket information structure
- Validates accurate issue summary
- Confirms multiple solution options
- Demonstrates clear next steps
- Verifies empathetic and professional tone