Technical Documentation: AI Calendar Tools

Purpose: This documentation is designed for developers and future Replit Agent sessions to quickly understand how the AI-powered school calendar analyzer and parenting plan analyzer are implemented.

Architecture Overview

Core Components

main.py
├── OpenAI Client Setup (get_openai_client)
├── PDF Text Extraction (extract_text_from_pdf)
├── School Calendar Analyzer
│   ├── Two-Pass Analysis System
│   ├── Date Merging Logic
│   └── Date Inference Engine
├── Parenting Plan Analyzer
│   ├── Basic Analysis (SYSTEM_PROMPT)
│   └── Enhanced Analysis with Form Snapshot (ENHANCED_PARENTING_PLAN_PROMPT)
└── Drafting Audit Report Generator
        

AI Integration

The system uses OpenAI's GPT-4o model via two possible configurations:

  1. User API Key (OPENAI_API_KEY) - Direct connection to OpenAI, works in all environments
  2. Replit Integration (AI_INTEGRATIONS_OPENAI_API_KEY + AI_INTEGRATIONS_OPENAI_BASE_URL) - Works in both development and production

Priority: User API key is checked first, then Replit integration as fallback.

Troubleshooting: AI Features Hanging in Production

Symptom: Features like "Extracting calendar dates..." or "Analyzing parenting plan..." spin indefinitely in production but work in development.

Root Cause: The Replit integration branch in get_openai_client() may be gated on a "localhost" check that only passes in development, which prevents the integration from initializing in production.

Fix (December 2024): Ensure the Replit integration check in get_openai_client() does NOT require "localhost" in the base URL. The condition should be:

# CORRECT - works in both dev and production:
if replit_api_key and replit_base_url:

# WRONG - only works in development:
if replit_api_key and replit_base_url and "localhost" in replit_base_url:

Why: In development, the base URL contains "localhost". In production, Replit provides a different proxy URL (e.g., https://proxy.replit.com/...). Both are valid.

Verification: Test with curl http://localhost:5000/test_openai - should return {"success": true}.
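The client-selection rule (user key first, then the Replit integration, with no "localhost" requirement) can be sketched as a pure function over the environment. resolve_openai_config is a hypothetical helper, not the actual body of get_openai_client(). It returns credentials instead of a client, so the priority logic can be checked without network access.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_openai_config(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Tuple[str, Optional[str]]]:
    """Hypothetical helper: pick (api_key, base_url) using the documented priority."""
    env = os.environ if env is None else env

    # 1. A user-supplied key wins and talks to OpenAI directly (None = SDK default URL).
    user_key = env.get("OPENAI_API_KEY")
    if user_key:
        return user_key, None

    # 2. Replit integration as fallback. Deliberately no "localhost" check:
    #    production uses a non-localhost proxy URL that must also be accepted.
    replit_key = env.get("AI_INTEGRATIONS_OPENAI_API_KEY")
    replit_url = env.get("AI_INTEGRATIONS_OPENAI_BASE_URL")
    if replit_key and replit_url:
        return replit_key, replit_url

    # Neither configuration is present.
    return None
```

The returned pair maps onto the openai package's client constructor, e.g. OpenAI(api_key=key, base_url=url). Passing base_url=None lets the SDK choose its default base URL.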

PDF Processing

Text extraction uses a two-tier approach:

  1. Primary: pdfplumber for native text extraction
  2. Fallback: pytesseract + pdf2image for OCR on scanned documents
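A minimal sketch of the two-tier fallback follows. Several details are assumptions: the decision rule (fall back to OCR when native extraction yields fewer than min_chars characters after stripping surrounding whitespace), the 50-character threshold, and the helper names. The library calls are the standard pdfplumber, pdf2image, and pytesseract entry points. The extractors are injectable, so the fallback logic can be exercised without those libraries installed.

```python
from typing import Callable, Optional

Extractor = Callable[[str], str]


def _pdfplumber_text(pdf_path: str) -> str:
    """Tier 1: native text layer via pdfplumber."""
    import pdfplumber  # imported lazily so tests can inject fakes instead

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() returns None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def _ocr_text(pdf_path: str) -> str:
    """Tier 2: rasterize each page and OCR it."""
    import pytesseract
    from pdf2image import convert_from_path

    images = convert_from_path(pdf_path)
    return "\n".join(pytesseract.image_to_string(image) for image in images)


def extract_text_two_tier(
    pdf_path: str,
    native: Optional[Extractor] = None,
    ocr: Optional[Extractor] = None,
    min_chars: int = 50,  # hypothetical threshold for "native extraction failed"
) -> str:
    native = native or _pdfplumber_text
    ocr = ocr or _ocr_text
    text = native(pdf_path) or ""
    if len(text.strip()) >= min_chars:
        return text
    # Scanned PDFs usually have no text layer, so fall back to OCR.
    return ocr(pdf_path)
```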

School Calendar Analyzer

Two-Pass Analysis System

The school calendar analyzer uses two passes to maximize accuracy: an AI pass that extracts raw dates, followed by a deterministic Python pass that merges and names them.

Pass 1: Raw Date Extraction (AI)

Function: extract_raw_calendar_dates(text)

Prompt: SCHOOL_CALENDAR_RAW_EXTRACTION_PROMPT

The AI extracts every marked date from the calendar with:

  • Date and optional end date (for ranges)
  • Label (exact text from calendar)
  • Category (holiday, break, teacher_day, student_holiday, early_release, other)
  • isStudentDayOff flag
  • Visual indicator description (shading, colors, etc.)
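For illustration, here is a hypothetical normalizer for one Pass 1 entry. Only isStudentDayOff and the category values come from this document. The keys date, endDate, and label are assumed names for the fields listed above. teacher_planning is included because the student-holiday rules refer to it.

```python
from datetime import date
from typing import Any, Dict

# Categories listed for Pass 1, plus teacher_planning (used by the student-holiday rules).
ALLOWED_CATEGORIES = {
    "holiday", "break", "teacher_day", "teacher_planning",
    "student_holiday", "early_release", "other",
}


def normalize_raw_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: parse one AI-extracted entry into typed fields."""
    start = date.fromisoformat(entry["date"])  # assumed key name
    # A single-day entry has no end date; treat it as a one-day range.
    end = date.fromisoformat(entry["endDate"]) if entry.get("endDate") else start
    if end < start:
        raise ValueError(f"end date {end} precedes start date {start}")

    category = str(entry.get("category", "other")).lower()
    if category not in ALLOWED_CATEGORIES:
        category = "other"  # unknown labels from the model degrade gracefully

    return {
        "start": start,
        "end": end,
        "label": entry.get("label", ""),
        "category": category,
        "isStudentDayOff": bool(entry.get("isStudentDayOff", False)),
    }
```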

Pass 2: Merge and Normalize (Python)

Function: merge_and_normalize_breaks(raw_result)

Deterministic Python logic that:

  1. Parses dates and filters for student days off
  2. Sorts entries chronologically
  3. Merges adjacent dates (gap ≤ 1 day, or Friday-Monday patterns with gap ≤ 3)
  4. Handles month boundary merging (end of month to start of next)
  5. Normalizes break names based on month and label content

Key Merging Logic

# Merge conditions (lines ~1707-1724 in main.py):
should_merge = False

# Adjacent or same day
if gap_days <= 1:
    should_merge = True

# Weekend spanning (Friday to Monday patterns)
elif gap_days <= 3:
    if current_end.weekday() == 4 and entry_start.weekday() == 0:  # Fri to Mon
        should_merge = True
    # ... additional weekend patterns

# Month boundary (late month to early next month)
if current_end.month != entry_start.month:
    if current_end.day >= 28 and entry_start.day <= 5:
        if gap_days <= 5:
            should_merge = True
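The excerpt above shows only the merge predicate. Below is a self-contained sketch of the surrounding loop, which sorts the ranges and then folds each entry into the previous range when a merge condition holds. It implements the three conditions shown but omits the elided "additional weekend patterns". The function name, the (start, end) tuple input, and the definition of gap_days as calendar days from the current end to the next start are assumptions. It also covers the Christmas-into-January case: Dec 22-31, Jan 1, and Jan 2 collapse into one range.

```python
from datetime import date
from typing import List, Tuple

DateRange = Tuple[date, date]


def merge_date_ranges(ranges: List[DateRange]) -> List[DateRange]:
    """Hypothetical sketch of the Pass 2 merge loop over (start, end) ranges."""
    merged: List[DateRange] = []
    for entry_start, entry_end in sorted(ranges):
        if merged:
            current_start, current_end = merged[-1]
            # Assumed definition: days from the current range's end to the next start.
            gap_days = (entry_start - current_end).days
            adjacent = gap_days <= 1  # same, overlapping, or next day
            fri_to_mon = (
                gap_days <= 3
                and current_end.weekday() == 4  # Friday
                and entry_start.weekday() == 0  # Monday
            )
            month_boundary = (
                current_end.month != entry_start.month
                and current_end.day >= 28
                and entry_start.day <= 5
                and gap_days <= 5
            )
            if adjacent or fri_to_mon or month_boundary:
                merged[-1] = (current_start, max(current_end, entry_end))
                continue
        merged.append((entry_start, entry_end))
    return merged
```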
        

Break Naming Convention

Function: get_break_name(month, label, is_multi_day)

Month | Standard Name | Notes
October | Fall Break | Any multi-day break in October
November | Thanksgiving Break | Around Thanksgiving Day
December | Christmas Break | ANY break in December, even if labeled "Winter Break"
February | Winter Break | Usually around Presidents Day
March/April | Spring Break | Multi-day breaks in these months
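The table can be read as a small decision function. The sketch below mirrors the get_break_name(month, label, is_multi_day) signature, but its body is a hypothetical reconstruction from the table. Three choices are assumptions: requiring is_multi_day for February, accepting a "thanksgiving" label for single November days, and returning the original label when no rule applies.

```python
def get_break_name(month: int, label: str, is_multi_day: bool) -> str:
    """Hypothetical reconstruction of the month-based naming table."""
    text = label.lower()
    if month == 12:
        return "Christmas Break"  # ANY December break, even if labeled "Winter Break"
    if month == 10 and is_multi_day:
        return "Fall Break"
    if month == 11 and (is_multi_day or "thanksgiving" in text):
        return "Thanksgiving Break"
    if month == 2 and is_multi_day:
        return "Winter Break"  # "Winter Break" is reserved for February
    if month in (3, 4) and is_multi_day:
        return "Spring Break"
    return label  # no naming rule applies; keep the calendar's own label
```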

Student Holiday Detection

A date is marked as a student day off if any of these conditions are true:

  • isStudentDayOff == true in the AI response
  • Label or notes contain "student holiday" or "student day off"
  • Category is teacher_day/teacher_planning AND (isStudentDayOff OR label contains "student holiday")
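A sketch of these checks follows. It assumes entries carry label, notes, and isStudentDayOff keys, where notes is an assumed name. As written, the third condition appears to be implied by the first two: any teacher day it accepts already has isStudentDayOff set or "student holiday" in its label. The sketch therefore checks only the first two. If the real code treats teacher days differently, this would need adjusting.

```python
from typing import Any, Dict

DAY_OFF_PHRASES = ("student holiday", "student day off")


def is_student_day_off(entry: Dict[str, Any]) -> bool:
    """Hypothetical sketch of the student-day-off rules above."""
    # Rule 1: the AI flagged the day explicitly.
    if entry.get("isStudentDayOff"):
        return True
    # Rule 2: the phrase appears in the label or in the notes (PDF extraction
    # sometimes puts "(Student Holiday)" on a separate line from the label).
    label = (entry.get("label") or "").lower()
    notes = (entry.get("notes") or "").lower()
    return any(p in label or p in notes for p in DAY_OFF_PHRASES)
```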

Key principle: A break ends on the last day BEFORE school resumes. Any Student Holiday following a break extends that break.

CRITICAL: Christmas Break Extension into January

Problem: This is the most common bug - failing to append January student holidays to Christmas Break.

Example from Gwinnett County 2025-26:

December 22-31: Winter Break (School Holidays)
January 1: Winter Break (School Holidays)
January 2: Teacher Planning/Staff Development [#8-9] (Student Holiday)  <-- THIS MUST BE INCLUDED

CORRECT: Christmas Break = Dec 22 - Jan 2
WRONG: Christmas Break = Dec 22 - Jan 1  (missing Jan 2)

Root Cause: In PDF parsing, "(Student Holiday)" may appear on a separate line from "Teacher Planning", causing the AI to miss it.

Fix locations:

  • SCHOOL_CALENDAR_RAW_EXTRACTION_PROMPT - Rules 11-14 explicitly handle January dates
  • SCHOOL_CALENDAR_SYSTEM_PROMPT - "CHRISTMAS BREAK EXTENSION INTO JANUARY" section
  • merge_and_normalize_breaks() - Python merge logic handles adjacent days

DO NOT MODIFY these sections without understanding the full extraction + merge pipeline.

Parenting Plan Analyzer

Two Analysis Modes

Basic Analysis (No Form Snapshot)

Function: analyze_with_openai(text)

Prompt: SYSTEM_PROMPT

Extracts scheduling information without school calendar context.

Enhanced Analysis (With Form Snapshot)

Function: analyze_with_openai(text, form_snapshot)

Prompt: ENHANCED_PARENTING_PLAN_PROMPT

When the frontend provides a form snapshot (including school calendar dates), the AI can:

  • Apply date correction rules from the parenting plan
  • Compute adjusted date ranges based on provisions like "begins when school dismisses"
  • Return corrected dateFields with reasoning

Form Snapshot Structure

{
  "dateFields": [
    {
      "name": "christmas_break_start_even",
      "label": "Christmas Break Start (Even Years)",
      "currentValue": "2026-12-21"
    },
    // ... more date fields
  ],
  "schoolCalendar": {
    "schoolName": "Gwinnett County",
    "holidays": [...]
  }
}
        

Response Structure

{
  "parentA": "Mother's Name",
  "parentB": "Father's Name",
  "weeklySchedule": { ... },
  "holidaySchedule": {
    "christmasBreak": {
      "evenYears": { "option": "Parent A", "reasoning": "..." },
      "oddYears": { "option": "Parent B", "reasoning": "..." }
    },
    // ... other holidays
  },
  "summerSchedule": { ... },
  "detectedRules": [
    {
      "breakName": "Christmas Break",
      "provision": "begins when school dismisses for the break",
      "effect": "Start date adjusted to Friday before official start"
    }
  ],
  "correctedDateFields": [
    {
      "name": "christmas_break_start_even",
      "originalValue": "2026-12-21",
      "correctedValue": "2026-12-18",
      "reasoning": "School dismisses Friday Dec 18 for break starting Dec 21"
    }
  ],
  "shortSummary": "...",
  "longSummary": "...",
  "confidence": "high|medium|low"
}
        

Date Correction UI

Fields with corrected values receive the CSS class ai-adjusted, which triggers a pulse animation that draws attention to the AI-corrected values.

Date Inference System

Function: infer_missing_years(calendar_result)

Purpose

School calendars typically show only one academic year. This function infers dates for adjacent years to support 24-month calendar generation.

Inference Methods

  1. Federal Holidays - Uses official rules (e.g., MLK Day = 3rd Monday in January)
  2. Template-Based Pattern Matching - For breaks like Fall Break and Winter Break (February), preserves exact weekday + week + duration pattern
  3. Christmas Break - Maintains relative position to Dec 25
  4. Thanksgiving - Anchors to 4th Thursday in November
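Methods 1, 3, and 4 reduce to two small date operations: finding the nth weekday of a month, and shifting a break so it keeps its offset from an anchor date. The sketch below uses the get_nth_weekday_of_month signature listed under Key Functions. shift_with_anchor is a hypothetical helper. Treating "relative position" as a fixed day offset from the anchor (Dec 25, or the 4th Thursday in November) is an assumption about how these breaks are carried forward.

```python
from datetime import date, timedelta
from typing import Tuple


def get_nth_weekday_of_month(year: int, month: int, weekday: int, n: int) -> date:
    """nth (1-based) occurrence of weekday (Mon=0 ... Sun=6) in the month.

    Assumes the occurrence exists (always true for n <= 4).
    """
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def shift_with_anchor(
    source_start: date, source_end: date, source_anchor: date, target_anchor: date
) -> Tuple[date, date]:
    """Hypothetical: keep the break's day offset and length relative to an anchor."""
    start = target_anchor + (source_start - source_anchor)
    return start, start + (source_end - source_start)


MONDAY, THURSDAY = 0, 3
mlk_2026 = get_nth_weekday_of_month(2026, 1, MONDAY, 3)              # 3rd Monday in January
thanksgiving_2026 = get_nth_weekday_of_month(2026, 11, THURSDAY, 4)  # 4th Thursday in November
christmas_2026 = shift_with_anchor(
    date(2025, 12, 22), date(2026, 1, 2), date(2025, 12, 25), date(2026, 12, 25)
)
```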

Template-Based Break Inference (Critical Algorithm)

Important: For Fall Break and Winter Break (February), the inference must preserve the exact pattern from the source calendar.

Function: infer_break_by_pattern(target_year, source_holiday)

What Gets Preserved

  1. Nth Full Week - Which full week of the month (e.g., 2nd full Monday-Sunday week of October)
  2. Weekday Offset - Which day of the week the break starts (e.g., Thursday = weekday 3)
  3. Duration - The number of days from start to end, i.e. (end - start).days (e.g., 4 for a Thursday-through-Monday break that spans 5 calendar days)

Algorithm

  1. Extract source start weekday: source_weekday = source_start.weekday() (Mon=0, Sun=6)
  2. Determine which Nth full week the source falls in: nth_full_week = get_nth_full_week(source_start)
  3. Calculate duration: duration = (source_end - source_start).days
  4. Find the Monday of that Nth full week in the target year: week_monday = get_nth_full_week_start(target_year, month, nth_full_week)
  5. Apply weekday offset: inferred_start = week_monday + timedelta(days=source_weekday)
  6. Apply duration: inferred_end = inferred_start + timedelta(days=duration)
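The steps above translate into a short sketch. It defines the Nth full week as the Nth Monday-Sunday week starting on or after the 1st of the month, which reproduces the inferred dates in both worked examples. Under this definition, Oct 9, 2025 falls in the 1st full week. Two cases are handled by assumption: a trailing week that runs into the next month, and dates in the partial week before the first Monday. The documented infer_break_by_pattern(target_year, source_holiday) takes a holiday record; the hypothetical infer_break_dates here takes the source start and end dates directly.

```python
from datetime import date, timedelta
from typing import Tuple


def _first_full_week_monday(year: int, month: int) -> date:
    """First Monday on or after the 1st: the start of the 1st full week."""
    first = date(year, month, 1)
    return first + timedelta(days=(0 - first.weekday()) % 7)


def get_nth_full_week(dt: date) -> int:
    """1-based full-week index of dt; 0 if dt precedes the month's first Monday.

    Simplification: a trailing week that spills into the next month still counts.
    """
    week_monday = dt - timedelta(days=dt.weekday())
    first_monday = _first_full_week_monday(dt.year, dt.month)
    if week_monday < first_monday:
        return 0
    return (week_monday - first_monday).days // 7 + 1


def get_nth_full_week_start(year: int, month: int, nth: int) -> date:
    """Monday of the nth full week."""
    return _first_full_week_monday(year, month) + timedelta(weeks=nth - 1)


def infer_break_dates(target_year: int, source_start: date, source_end: date) -> Tuple[date, date]:
    source_weekday = source_start.weekday()          # step 1 (Mon=0, Sun=6)
    nth_full_week = get_nth_full_week(source_start)  # step 2
    if nth_full_week == 0:
        raise ValueError("break starts before the month's first full week")
    duration = (source_end - source_start).days      # step 3
    week_monday = get_nth_full_week_start(target_year, source_start.month, nth_full_week)  # step 4
    inferred_start = week_monday + timedelta(days=source_weekday)     # step 5
    return inferred_start, inferred_start + timedelta(days=duration)  # step 6
```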

Example: Fall Break

Source (2025) | Pattern Extracted | Inferred (2026)
Oct 9-13, 2025 | Thursday of 1st full week, 4 days | Oct 8-12, 2026

Why Oct 8? October 2025 begins on a Wednesday, so its 1st full Monday-Sunday week runs Oct 6-12, and Oct 9, 2025 is the Thursday of that week. October 2026 begins on a Thursday, so its 1st full week starts Monday Oct 5. The Thursday of that week is Oct 8, and adding the 4-day duration gives Oct 8-12, 2026.

Example: Winter Break (February)

Source (2026) | Pattern Extracted | Inferred (2027)
Feb 12-16, 2026 | Thursday of 2nd full week, 4 days | Feb 11-15, 2027

Key Functions

  • get_nth_weekday_of_month(year, month, weekday, n) - Gets nth occurrence of weekday
  • get_nth_full_week(dt) - Determines which full week a date falls in
  • get_nth_full_week_start(year, month, nth) - Gets Monday of the Nth full week in a month
  • infer_break_by_pattern(target_year, source_holiday) - Applies template to infer break dates
  • get_federal_holiday_date(name, year) - Returns official federal holiday dates
  • infer_christmas_break(year, source_holiday) - Special handling for Christmas Break

Marking Inferred Dates

Inferred entries include "inferred": true in their JSON and are displayed with a star symbol in the UI.

Visual Shading Extraction (Fallback)

Function: add_missing_breaks_from_shading(calendar_result, shading_info)

Purpose

Some school calendar PDFs have complex layouts (e.g., two-column formats) that produce garbled text when extracted. This function supplements text-based extraction by detecting visually shaded day cells in the calendar grid.

Two-Column Layout Handling

Function: extract_shading_from_calendar(pdf_path)

  • Month headers are assigned to left or right page halves based on x-position
  • Left half: x < page_width / 2
  • Right half: x >= page_width / 2
  • Day cells are associated with the nearest month header in their half
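A sketch of the half-page assignment follows. It assumes headers and day cells arrive as (label, x, y) tuples with y increasing down the page. It also interprets "nearest month header" as the closest header above the cell in the same half. The function name and input shapes are hypothetical, not the actual extract_shading_from_calendar internals.

```python
from typing import List, Tuple

Header = Tuple[str, float, float]   # (month name, x, y)
DayCell = Tuple[int, float, float]  # (day number, x, y)


def _page_half(x: float, page_width: float) -> str:
    return "left" if x < page_width / 2 else "right"


def assign_days_to_months(
    headers: List[Header], day_cells: List[DayCell], page_width: float
) -> List[Tuple[str, int]]:
    """Hypothetical: pair each day cell with the closest month header above it in its half."""
    assigned = []
    for day, x, y in day_cells:
        side = _page_half(x, page_width)
        above = [
            (header_y, month)
            for month, header_x, header_y in headers
            if _page_half(header_x, page_width) == side and header_y <= y
        ]
        if not above:
            continue  # no header above this cell in its half; skip it
        _, month = max(above)  # largest y still above the cell = nearest header
        assigned.append((month, day))
    return assigned
```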

Weekend-Aware Gap Bridging (Critical Algorithm)

Important: This is a critical business rule. Do not modify without understanding the full logic.

Business Rule: When reconstructing break date ranges from detected shaded days, gaps between detected days are ONLY bridged if ALL gap days fall on a weekend (Saturday or Sunday).

Rationale

Children don't attend school on weekends. If school is closed Friday and Monday, it's logically one continuous break (the weekend is implicitly included). However, if we detect Monday and Thursday as shaded, the gap includes Tuesday and Wednesday—actual school days—so those should NOT be merged.

Helper Function

from datetime import datetime

def is_weekend_gap(year, month, day1, day2):
    """Check if all days between day1 and day2 (exclusive) are weekends."""
    if day2 <= day1 + 1:
        return True  # Consecutive days, no gap
    for d in range(day1 + 1, day2):
        try:
            dt = datetime(year, month, d)
            if dt.weekday() < 5:  # Mon=0, Sun=6; < 5 means weekday
                return False
        except ValueError:
            return False
    return True
        

Algorithm

  1. Group detected shaded days by month
  2. Sort days within each month
  3. Determine the year for each month from schoolYear (Jan-Jun = second year, Jul-Dec = first year)
  4. Build consecutive runs:
    • If next day is immediately consecutive (d == end + 1) → extend run
    • If gap contains ONLY weekend days → extend run
    • If gap contains ANY weekday → start new run
  5. Create breaks from runs with at least 2 days
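Steps 3-5 can be sketched as follows. year_for_month assumes schoolYear is a string that begins with the four-digit first year (for example "2025-2026"). Both function names and the min_days parameter are hypothetical. The weekend test is an inline equivalent of is_weekend_gap.

```python
from datetime import date
from typing import Iterable, List, Tuple


def year_for_month(school_year: str, month: int) -> int:
    """Jan-Jun belong to the second calendar year, Jul-Dec to the first."""
    first_year = int(school_year[:4])
    return first_year + 1 if month <= 6 else first_year


def _only_weekend_between(year: int, month: int, prev_day: int, next_day: int) -> bool:
    # True when every day strictly between the two is Saturday (5) or Sunday (6).
    return all(date(year, month, d).weekday() >= 5 for d in range(prev_day + 1, next_day))


def build_break_runs(
    year: int, month: int, days: Iterable[int], min_days: int = 2
) -> List[Tuple[date, date]]:
    """Group shaded days into runs, bridging weekend-only gaps."""
    runs: List[Tuple[date, date]] = []
    run: List[int] = []

    def close_run() -> None:
        if len(run) >= min_days:
            runs.append((date(year, month, run[0]), date(year, month, run[-1])))

    for d in sorted(set(days)):
        if run and (d == run[-1] + 1 or _only_weekend_between(year, month, run[-1], d)):
            run.append(d)  # consecutive, or gap is weekend-only: extend
        else:
            close_run()    # gap contains a weekday: start a new run
            run = [d]
    close_run()
    return runs
```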

Example: February 2026

Detected Day | Day of Week | Gap Analysis
Feb 6 | Friday | (first detected day)
Feb 10 | Tuesday | Gap (7, 8, 9) = Sat, Sun, Mon → includes weekday → NEW RUN
Feb 12 | Thursday | Gap (11) = Wed → weekday → NEW RUN
Feb 13 | Friday | Consecutive → extend run
Feb 16 | Monday | Gap (14, 15) = Sat, Sun → all weekend → BRIDGE

Result: Winter Break detected as February 12-16, 2026 (Thursday through Monday, bridging the weekend)

Shading Detection Notes

  • Detects cells with gray/dark background colors
  • Aggregates individual digit characters into complete day numbers
  • Associates each day number with the correct month based on column position
  • This is a fallback mechanism—primary extraction is text-based AI analysis

Early Release Days (Critical Exclusion)

Important: Early Release days are NOT student days off.

Business Rule: Early Release days (e.g., "Early Release for High School Exams") are days when students are still in school but dismissed early. These should NEVER be merged with adjacent breaks.

Example

Date | Label | Is Day Off?
Dec 17-19 | Early Release for High School Exams | NO - students still in school
Dec 22-31 | Winter Break (School Holidays) | YES - Christmas Break

Result: Christmas Break starts December 22, NOT December 17.

Implementation

  • AI prompt explicitly instructs: Early Release = isStudentDayOff: false and category: "early_release"
  • Merge logic skips entries with "early release" in the label (continue statement)
  • Early Release entries are not added to the date_entries list that feeds the merge algorithm
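The exclusion can be sketched as a filter that runs before merging. The label check comes first, so an Early Release entry is dropped even if the model wrongly sets isStudentDayOff to true. The function name and dict-shaped entries are assumptions.

```python
from typing import Any, Dict, List


def select_merge_candidates(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical: keep student days off, never Early Release days."""
    candidates = []
    for entry in entries:
        if "early release" in (entry.get("label") or "").lower():
            continue  # students are in school; never feed these to the merge step
        if entry.get("isStudentDayOff"):
            candidates.append(entry)
    return candidates
```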

Date Range Boundary Rules (Critical)

Important: Date ranges must respect hyphen-bound boundaries. Do not extend ranges based on adjacent unrelated text.

Business Rule: When a calendar entry shows a date range like "6-10 Spring Break", the extraction must use ONLY those dates (April 6-10), even if other numbers appear in nearby text (e.g., "13 Students Return").

Problem Scenario

Calendar Text | Incorrect Extraction | Correct Extraction
"6-10 Spring Break (School Holidays) ... 13 Students Return" | April 6-13 (wrong!) | April 6-10 (correct)

The "13" from "Students Return" is unrelated—it indicates when school resumes, not the break end date.

AI Prompt Rules (19-22)

  • Rule 19: Only use the hyphen-bound date range from the label
  • Rule 20: Do NOT extend ranges by including unrelated numbers from nearby entries
  • Rule 21: Use EXACTLY the stated start/end days when hyphen notation is present (e.g., "22-31", "6-10")
  • Rule 22: When shading shows specific days, use ONLY those shaded days as the range

Validation Priority

  1. Explicit hyphen-bound range in text (e.g., "6-10") → use exactly those days
  2. Visual shading confirmation → cross-reference with text extraction
  3. If text and shading conflict → prefer shading (visual truth over OCR errors)
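Priorities 1 and 3 can be sketched together. The range is read only from a hyphen pair at the start of the label, and shaded days override it when the two disagree. The regular expression, function names, and day-number inputs are assumptions for illustration. Priority 2 (cross-referencing) is reduced here to a simple comparison.

```python
import re
from typing import Optional, Sequence, Tuple

# Leading "D-D" or "DD-DD" only; numbers later in the text are never read.
HYPHEN_RANGE = re.compile(r"^\s*(\d{1,2})\s*-\s*(\d{1,2})\b")


def parse_label_range(label: str) -> Optional[Tuple[int, int]]:
    match = HYPHEN_RANGE.match(label)
    if not match:
        return None
    return int(match.group(1)), int(match.group(2))


def resolve_range(
    text_range: Optional[Tuple[int, int]], shaded_days: Sequence[int]
) -> Optional[Tuple[int, int]]:
    """Prefer the shaded span when it conflicts with the text range."""
    if shaded_days:
        shaded = (min(shaded_days), max(shaded_days))
        if shaded != text_range:
            return shaded
    return text_range
```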

Current/Ongoing Break Handling (Critical)

Important: When the analyzer runs during an active break, inference can produce a third occurrence of that break, which needs its own storage.

Business Rule: When extracting a school calendar during an ongoing break (e.g., running the analyzer on Dec 26 during Christmas break), the system may infer three occurrences of the same break type with the same even/odd parity.

Problem Scenario

Uploaded Calendar | Inferred Breaks | Issue
2026-27 School Year (Christmas Dec 21-31, 2026) | Dec 15, 2025 - Jan 1, 2026 (odd) and Dec 20-27, 2027 (odd) | Both are odd years → one overwrites the other

Solution: Consolidated Current Break Fields

  • Three hidden form fields store the ongoing break: currentBreakStart, currentBreakEnd, currentBreakType
  • Only one break can be active at a time, so a single set of fields suffices
  • applyExtractedDates() checks if today falls within a break's date range
  • If a slot conflict occurs (same even/odd parity), the ongoing break is stored in "current" fields
  • handleCurrentBreak() uses currentBreakType to look up the correct parent selector

Flow

  1. Extract holidays from uploaded calendar
  2. Infer missing years (backward and forward)
  3. For each break, check if today falls within the date range
  4. If two breaks share the same even/odd slot, store the current break separately
  5. During calendar generation, apply both even/odd breaks AND current breaks
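The slot logic can be illustrated in Python, even though applyExtractedDates() and handleCurrentBreak() live in the frontend. This sketch is hypothetical and rests on three assumptions. A break's parity comes from its start year. Non-ongoing breaks claim their (type, parity) slot first. An ongoing break that collides is diverted to a single current value standing in for currentBreakStart, currentBreakEnd, and currentBreakType.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

Break = Tuple[date, date, str]  # (start, end, break_type)


def _parity(d: date) -> str:
    return "even" if d.year % 2 == 0 else "odd"


def assign_break_slots(
    breaks: List[Break], today: date
) -> Tuple[Dict[Tuple[str, str], Tuple[date, date]], Optional[Break]]:
    """Hypothetical: fill even/odd slots, diverting a colliding ongoing break."""
    slots: Dict[Tuple[str, str], Tuple[date, date]] = {}
    ongoing: List[Break] = []
    for start, end, break_type in breaks:
        if start <= today <= end:
            ongoing.append((start, end, break_type))  # decide after other breaks claim slots
        else:
            slots[(break_type, _parity(start))] = (start, end)

    current: Optional[Break] = None  # only one break can be active at a time
    for start, end, break_type in ongoing:
        key = (break_type, _parity(start))
        if key in slots:
            current = (start, end, break_type)  # slot conflict: store as the current break
        else:
            slots[key] = (start, end)
    return slots, current


slots, current = assign_break_slots(
    [
        (date(2025, 12, 15), date(2026, 1, 1), "christmas"),
        (date(2026, 12, 21), date(2026, 12, 31), "christmas"),
        (date(2027, 12, 20), date(2027, 12, 27), "christmas"),
    ],
    today=date(2025, 12, 26),
)
# current holds the 2025 break; the odd slot keeps Dec 20-27, 2027
```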

API Endpoints

POST /extract_school_calendar

Analyzes uploaded school calendar PDF.

Input: PDF file (multipart form)
Process: Extract text → Two-pass analysis → Infer missing years
Output: JSON with holidays array, omittedHolidays, metadata

POST /analyze_document

Analyzes uploaded parenting plan PDF.

Input: PDF file + optional formSnapshot (JSON string)
Process: Extract text → AI analysis (basic or enhanced)
Output: JSON with scheduling information, date corrections, summaries

POST /generate_audit_report

Generates drafting audit of parenting plan.

Input: PDF file
Process: Extract text → Audit analysis with specialized prompt
Output: JSON with findings categorized by severity

Key Conventions

Break Naming Rules (Critical)

  • Christmas Break = ANY break in December (even if document says "Winter Break")
  • Winter Break = Breaks in February only
  • Fall Break = October breaks (even if labeled differently)

Date Formatting

All dates use YYYY-MM-DD format (ISO 8601).

Break End Date Logic

A break ends on the last day BEFORE school resumes. If a Teacher Planning Day (Student Holiday) follows a break, it extends that break.

File Size Limits

PDF uploads are limited to 10MB.

Error Handling

API endpoints track the current step and return failed_at_step in error responses for debugging.
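A framework-neutral sketch of step tracking follows: each step is named, and the name of the step that raised is returned as failed_at_step. The run_steps helper, the step names, and the success/error/result keys are assumptions. Only failed_at_step comes from this document.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_steps(steps: List[Step]) -> Dict[str, Any]:
    """Run steps in order, passing each result to the next; report where a failure happened."""
    current_step = None
    result: Any = None
    try:
        for name, func in steps:
            current_step = name  # recorded before the call so a failure is attributed correctly
            result = func(result)
    except Exception as exc:  # broad on purpose: any failure should surface its step
        return {"success": False, "error": str(exc), "failed_at_step": current_step}
    return {"success": True, "result": result}
```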

Protected Files

Per user preferences, do not modify without explicit approval:

  • auth.py
  • payments.py
  • models.py
  • config.py
  • extensions.py