Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve algolia.py input data validation #1046

Merged
Merged
Show file tree
Hide file tree
Changes from 24 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
59b6e20
feat: add validation for search parameters in Algolia API
ahmedxgouda Mar 7, 2025
652b93e
fix: ensure query validation only occurs if query is provided.
ahmedxgouda Mar 7, 2025
f648725
feat: refactor search parameter validation into dedicated functions
ahmedxgouda Mar 7, 2025
356384d
Merge remote-tracking branch 'upstream/main' into feature/improve-alg…
ahmedxgouda Mar 7, 2025
553f128
Revert "feat: refactor search parameter validation into dedicated fun…
ahmedxgouda Mar 7, 2025
126ba14
Reapply "feat: refactor search parameter validation into dedicated fu…
ahmedxgouda Mar 7, 2025
0bdecb8
Revert "Merge remote-tracking branch 'upstream/main' into feature/imp…
ahmedxgouda Mar 7, 2025
c9b22de
fix: reorder import statements in validators.py
ahmedxgouda Mar 7, 2025
b909229
feat: add module docstring for search parameter validators
ahmedxgouda Mar 7, 2025
bc31967
feat: refactor the returned lambda function parameter name to be unli…
ahmedxgouda Mar 7, 2025
2208c9e
Merge branch 'main' into feature/improve-algolia-input-validation
ahmedxgouda Mar 7, 2025
025ac0b
Resolve conflicts.
ahmedxgouda Mar 8, 2025
4fd7b04
Merge branch 'main' into feature/improve-algolia-input-validation
ahmedxgouda Mar 8, 2025
7483d64
Resolve conflicts manually.
ahmedxgouda Mar 8, 2025
29885fc
Refactoring for the checks.
ahmedxgouda Mar 8, 2025
c299653
Merge branch 'main' into feature/improve-algolia-input-validation
ahmedxgouda Mar 8, 2025
4078f01
Fix formatting in algolia.py for consistency
ahmedxgouda Mar 8, 2025
8294e1b
Refactor the validations to use warlus operator, and improving the pa…
ahmedxgouda Mar 9, 2025
6fbe8de
Merge branch 'main' into feature/improve-algolia-input-validation
ahmedxgouda Mar 9, 2025
c2c66e9
Separate the validation logic from the HTTP response logic.
ahmedxgouda Mar 9, 2025
0518f12
Merge branch 'main' into feature/improve-algolia-input-validation
arkid15r Mar 11, 2025
623aea9
Refactor validation logic in Algolia API to raise ValidationError for…
ahmedxgouda Mar 11, 2025
407f92a
Merge branch 'main' into feature/improve-algolia-input-validation
ahmedxgouda Mar 11, 2025
1f2b483
Merge branch 'main' into feature/improve-algolia-input-validation
ahmedxgouda Mar 11, 2025
f97a6f6
Update code
arkid15r Mar 11, 2025
a1288d2
Update hitsPerPage validation message and add tests for invalid Algol…
ahmedxgouda Mar 12, 2025
1161e00
Merge branch 'main' into feature/improve-algolia-input-validation
ahmedxgouda Mar 12, 2025
575ec75
Create validators_test.py and add the relevant validations.
ahmedxgouda Mar 13, 2025
e476e5c
Merge branch 'main' into feature/improve-algolia-input-validation
ahmedxgouda Mar 13, 2025
b6a64ce
Refactoring for the checks.
ahmedxgouda Mar 13, 2025
a7a75fb
Merge branch 'main' into pr/ahmedxgouda/1046
arkid15r Mar 14, 2025
5a6cc75
Update code
arkid15r Mar 14, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 11 additions & 4 deletions backend/apps/core/api/algolia.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,11 +6,13 @@
from algoliasearch.http.exceptions import AlgoliaException
from django.conf import settings
from django.core.cache import cache
from django.core.exceptions import ValidationError
from django.http import JsonResponse

from apps.common.index import IndexBase
from apps.common.utils import get_user_ip_address
from apps.core.utils.params_mapping import get_params_for_index
from apps.core.validators import validate_search_params

CACHE_PREFIX = "algolia_proxy"
CACHE_TTL_IN_SECONDS = 3600 # 1 hour
Expand Down Expand Up @@ -50,19 +52,24 @@ def algolia_search(request):
try:
data = json.loads(request.body)

try:
validate_search_params(data)
except ValidationError as error:
return JsonResponse({"error": error.message}, status=400)

facet_filters = data.get("facetFilters", [])
index_name = data.get("indexName")

limit = data.get("hitsPerPage", 25)
page = data.get("page", 1)
ip_address = get_user_ip_address(request)
limit = int(data.get("hitsPerPage", 25))
page = int(data.get("page", 1))
query = data.get("query", "")

cache_key = f"{CACHE_PREFIX}:{index_name}:{query}:{page}:{limit}"
if index_name == "chapters":
cache_key = f"{cache_key}:{ip_address}"

result = cache.get(cache_key)
if result is not None:
if result := cache.get(cache_key):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted this for more strict check we had before.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not review anything BTW. It is a glitch in GitHub pull requests extension in VS Code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think it is more strict?

return JsonResponse(result)

result = get_search_results(
Expand Down
82 changes: 82 additions & 0 deletions backend/apps/core/validators.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
"""Validators for the search parameters of the Algolia endpoint."""

import re

from django.core.exceptions import ValidationError
from django.core.validators import validate_slug


def validate_index_name(index_name):
"""Validate index name."""
if not index_name or not isinstance(index_name, str):
message = "indexName is required and must be a string."
raise ValidationError(message)
try:
validate_slug(index_name)
except ValidationError:
message = (
"Invalid indexName value provided. "
"Only alphanumeric characters hyphens and underscores are allowed."
)
raise ValidationError(message) from None


def validate_limit(limit):
"""Validate limit."""
if not isinstance(limit, int):
message = "hitsPerPage must be an integer."
raise ValidationError(message)

max_limit = 100
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use 1000 in some cases.

min_limit = 1

if limit < min_limit or limit > max_limit:
message = "hitsPerPage value must be between 1 and 100."
raise ValidationError(message)


def validate_page(page):
"""Validate page."""
if not isinstance(page, int):
message = "page must be an integer."
raise ValidationError(message)

if page <= 0:
message = "page value must be greater than 0."
raise ValidationError(message)


def validate_query(query):
"""Validate query."""
if query:
if not isinstance(query, str):
message = "query must be a string."
raise ValidationError(message)
if not re.match(r"^[a-zA-Z0-9-_ ]*$", query):
message = (
"Invalid query value provided. "
"Only alphanumeric characters, hyphens, spaces and underscores are allowed."
)
raise ValidationError(message)


def validate_facet_filters(facet_filters):
"""Validate facet filters."""
if not isinstance(facet_filters, list):
message = "facetFilters must be a list."
raise ValidationError(message)


def validate_search_params(data):
"""Validate search parameters."""
index_name = data.get("indexName")
limit = data.get("hitsPerPage", 25)
page = data.get("page", 1)
query = data.get("query", "")
facet_filters = data.get("facetFilters", [])

validate_index_name(index_name)
validate_limit(limit)
validate_page(page)
validate_query(query)
validate_facet_filters(facet_filters)