Skip to content

N+1 Query Pattern in get_data_for_bar API #772

@mrrobot47

Description

@mrrobot47

Severity: High

Category: Database/API

File: next_pms/timesheet/api/timesheet.py

Lines: 361-373

Description

The get_data_for_bar API (used for timesheet visualization) uses get_doc() in a list comprehension to load Timesheet Detail records, creating an N+1 query pattern.

Code Evidence

timesheet_logs = frappe.get_list(
    "Timesheet",
    filters={
        "employee": employee,
        "start_date": ["in", dates],
        "docstatus": ["!=", 2],
    },
    fields=["time_logs.name"],  # Just gets names
    ignore_permissions=employee_has_higher_access(employee, ptype="read"),
)
if not timesheet_logs:
    return [data, total_hours]

# N+1 Pattern - get_doc per timesheet log
timesheet_logs = [frappe.get_doc("Timesheet Detail", ts.name) for ts in timesheet_logs]

Impact

For an employee viewing 5 days with 20 timesheet entries:

  • 1 initial query to get log names
  • 20 additional get_doc() queries to load full details

Total: 21 queries per API call

This API is likely called frequently for the timesheet bar chart visualization.

Proposed Solution

Use frappe.get_all() with the required fields instead of get_doc():

def get_data_for_bar(employee, dates, week):
    data = {}
    total_hours = 0
    
    # Get timesheet logs with all needed fields in one query
    timesheet_logs = frappe.get_all(
        "Timesheet Detail",
        filters={
            "parent": ["in", frappe.get_all(
                "Timesheet",
                filters={
                    "employee": employee,
                    "start_date": ["in", dates],
                    "docstatus": ["!=", 2],
                },
                pluck="name",
            )],
        },
        fields=["name", "task", "hours", "from_time", "to_time", "description", 
                "activity_type", "parent", "parenttype"],
    )
    
    if not timesheet_logs:
        return [data, total_hours]
    
    # Batch fetch task details
    task_ids = list(set([ts.task for ts in timesheet_logs if ts.task]))
    task_details = {}
    if task_ids:
        tasks = frappe.get_all(
            "Task",
            filters={"name": ["in", task_ids]},
            fields=[
                "name", "subject", "project.project_name as project_name",
                "project", "custom_is_billable", "expected_time",
                "actual_time", "status", "_liked_by",
            ],
        )
        task_details = {task["name"]: task for task in tasks}
    
    # Process logs with pre-fetched data
    for log in timesheet_logs:
        total_hours += log.hours
        if not log.task:
            continue
        task = task_details.get(log.task)
        if not task:
            continue
        # ... rest of processing

Alternative: Use JOINs

# Single query with JOINs
Timesheet = frappe.qb.DocType("Timesheet")
TimesheetDetail = frappe.qb.DocType("Timesheet Detail")
Task = frappe.qb.DocType("Task")

query = (
    frappe.qb.from_(TimesheetDetail)
    .inner_join(Timesheet)
    .on(TimesheetDetail.parent == Timesheet.name)
    .left_join(Task)
    .on(TimesheetDetail.task == Task.name)
    .where(Timesheet.employee == employee)
    .where(Timesheet.start_date.isin(dates))
    .where(Timesheet.docstatus != 2)
    .select(
        TimesheetDetail.name,
        TimesheetDetail.task,
        TimesheetDetail.hours,
        Task.subject,
        Task.project,
        Task.custom_is_billable,
    )
)

Estimated Performance Gain

  • 90%+ reduction in queries (from 21+ to 2-3 queries)
  • Faster timesheet view loading

Verification

Monitor queries when loading the timesheet bar chart for an employee with 20+ entries.

Metadata

Metadata

Assignees

No one assigned

    Labels

    highHigh severityperformancePerformance optimization

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions