|
| 1 | +# Handling File Uploads in GraphQL |
| 2 | + |
| 3 | +GraphQL was not designed with file uploads in mind. While it’s technically possible to implement them, doing so requires |
| 4 | +extending the transport layer and introduces several risks, both in security and reliability. |
| 5 | + |
| 6 | +This guide explains why file uploads via GraphQL are problematic and presents safer alternatives. |
| 7 | + |
| 8 | +## Why uploads are challenging |
| 9 | + |
| 10 | +The [GraphQL specification](https://spec.graphql.org/draft/) is transport-agnostic and serialization-agnostic (though HTTP and JSON are the most prevalent combination seen in the community). |
| 11 | +GraphQL was designed to work with relatively small requests from clients, not to carry binary data.
| 12 | + |
| 13 | +File uploads, by contrast, typically carry binary data such as images and PDFs, which many encodings, including JSON, cannot represent directly.
| 14 | +One option is to embed the binary data in the existing encoding (e.g. as a base64-encoded string within the JSON), but this inflates the payload by roughly a third and is unsuitable for larger files because it does not easily support streamed processing.
| 15 | +Instead, `multipart/form-data` is a common choice for transferring binary data, but it brings its own set of complexities.
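
To put a number on the cost of base64, the following minimal sketch computes the encoded size of a payload. The function name `base64Length` is made up for this illustration, and the figure ignores any JSON escaping or transport framing.

```typescript
// Length in characters of the padded base64 encoding of `byteLength` bytes:
// every group of up to 3 input bytes becomes 4 output characters.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const tenMiB = 10 * 1024 * 1024;
const encodedLength = base64Length(tenMiB);
// Relative growth of the payload caused by the encoding alone.
const overhead = encodedLength / tenMiB - 1;
console.log(
  `${tenMiB} bytes -> ${encodedLength} base64 characters (+${(overhead * 100).toFixed(1)}%)`,
);
```

On top of the size increase, the whole string normally has to be parsed before the file can be decoded, which is why this approach does not stream well.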
| 16 | + |
| 17 | +Supporting uploads over GraphQL usually involves adopting community conventions, the most prevalent of which is the |
| 18 | +[GraphQL multipart request specification](https://github.com/jaydenseric/graphql-multipart-request-spec). |
| 19 | +This specification has been successfully implemented in many languages and frameworks, but users |
| 20 | +implementing it must pay very close attention to ensure that they do not introduce |
| 21 | +security or reliability concerns. |
| 22 | + |
| 23 | +## Risks to be aware of |
| 24 | + |
| 25 | +### Memory exhaustion from repeated variables |
| 26 | + |
| 27 | +GraphQL operations allow the same variable to be referenced multiple times. If a file upload variable is reused, the underlying |
| 28 | +stream may be read multiple times or prematurely drained. This can result in incorrect behavior or memory exhaustion. |
| 29 | + |
| 30 | +A safe practice is to use trusted documents or a validation rule to ensure each upload variable is referenced exactly once. |
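
As a rough illustration of the "referenced exactly once" check, the sketch below counts textual uses of a variable in a single-operation document. It is deliberately naive: it works on the raw text, ignores string literals, and the function names are invented for this example. A production server would more likely implement this as a validation rule over the parsed AST.

```typescript
// Naive, text-based sketch. A "$file" inside a string argument would be
// miscounted; an AST-based validation rule does not have this problem.
const GRAPHQL_NAME = /^[_A-Za-z][_0-9A-Za-z]*$/;

function countVariableUses(document: string, variableName: string): number {
  if (!GRAPHQL_NAME.test(variableName)) {
    throw new Error(`Invalid GraphQL variable name: ${variableName}`);
  }
  // Drop `#` comments so commented-out usages are not counted.
  const source = document.replace(/#[^\n\r]*/g, "");
  // Match `$name` as a whole token, skipping its definition (`$name: Type`).
  const usage = new RegExp("\\$" + variableName + "(?![_0-9A-Za-z])(?!\\s*:)", "g");
  const matches = source.match(usage);
  return matches === null ? 0 : matches.length;
}

// Returns the upload variables that are not referenced exactly once.
function findMisusedUploadVariables(document: string, uploadVariables: string[]): string[] {
  return uploadVariables.filter((name) => countVariableUses(document, name) !== 1);
}

const reusedDocument = `
  mutation ($file: Upload!) {
    avatar: uploadAvatar(file: $file) { id }
    banner: uploadBanner(file: $file) { id }
  }
`;
// `uploadAvatar` and `uploadBanner` are hypothetical fields.
console.log(findMisusedUploadVariables(reusedDocument, ["file"]));
```

A request whose document yields a non-empty list would be rejected before any upload stream is handed to a resolver.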
| 31 | + |
| 32 | +### Stream leaks on failed operations |
| 33 | + |
| 34 | +GraphQL executes in phases: validation, then execution. If validation fails or an authorization check prematurely terminates execution, uploaded |
| 35 | +file streams may never be consumed. If your server buffers or retains these streams, it can cause memory leaks. |
| 36 | + |
| 37 | +To avoid this, ensure that all streams are terminated when the request finishes, whether or not they were consumed in resolvers. |
| 38 | +An alternative to consider is writing incoming files to temporary storage immediately, and passing references (like filenames) into |
| 39 | +resolvers. Ensure this storage is cleaned up after request completion, regardless of success or failure. |
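
One way to guarantee cleanup is to wrap the whole operation in a `try`/`finally`. The sketch below assumes each upload exposes a `destroy()` method, as Node.js readable streams do; `UploadStream` and `withUploadCleanup` are names invented for this example.

```typescript
// Minimal shape of an upload stream; Node.js readable streams expose `destroy()`.
interface UploadStream {
  destroy(): void;
}

// Runs `execute` and destroys every upload stream afterwards, whether the
// operation succeeded, was rejected early, or threw part-way through.
async function withUploadCleanup<T>(
  uploads: UploadStream[],
  execute: () => Promise<T>,
): Promise<T> {
  try {
    return await execute();
  } finally {
    for (const upload of uploads) {
      try {
        upload.destroy();
      } catch {
        // A failure to close one stream must not prevent closing the others.
      }
    }
  }
}
```

The same wrapper shape applies to the temporary-storage approach: replace `destroy()` with deletion of the temporary file.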
| 40 | + |
| 41 | +### Cross-Site Request Forgery (CSRF) |
| 42 | + |
| 43 | +`multipart/form-data` is one of the content types that qualifies as a “simple” request under CORS, so browsers send it cross-origin without a preflight check. Without
| 44 | +explicit CSRF protection (for example, requiring a custom request header that forces a preflight), your GraphQL server may unknowingly accept uploads from malicious origins.
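
A common mitigation is to accept a request only if a browser would have had to preflight it. The sketch below is one possible form of that check, not a complete CSRF defence; the header name `x-graphql-preflight` is hypothetical, and header keys are assumed to be lowercased already.

```typescript
// Content types a browser may send cross-origin without a preflight.
const SIMPLE_CONTENT_TYPES = [
  "application/x-www-form-urlencoded",
  "multipart/form-data",
  "text/plain",
];

// Hypothetical header name; any non-safelisted header forces a preflight.
const PREFLIGHT_HEADER = "x-graphql-preflight";

type RequestHeaders = Record<string, string | undefined>;

// Returns true when a browser could only have sent this request cross-origin
// after a successful CORS preflight.
function passedPreflight(headers: RequestHeaders): boolean {
  const rawContentType = headers["content-type"] ?? "";
  const [mediaType = ""] = rawContentType.split(";");
  const contentType = mediaType.trim().toLowerCase();
  if (contentType !== "" && SIMPLE_CONTENT_TYPES.indexOf(contentType) === -1) {
    return true;
  }
  // Simple content types (including uploads) must carry the custom header.
  const marker = headers[PREFLIGHT_HEADER];
  return marker !== undefined && marker !== "";
}
```

Upload clients would then send the custom header with every multipart request, and the server would reject requests for which `passedPreflight` returns `false`.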
| 45 | + |
| 46 | +### Oversized or excess payloads |
| 47 | + |
| 48 | +Attackers may submit very large uploads or include extraneous files under unused variable names. Servers that accept and |
| 49 | +buffer these can be overwhelmed. |
| 50 | + |
| 51 | +Enforce request size caps and reject any files not explicitly referenced in the `map` field of the multipart payload.
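
The checks above can be sketched as a single function over the parsed `map` field and the file parts received. The types, limit names, and `checkUploadParts` are assumptions made for this example; a real server would apply the same limits while streaming and abort early rather than inspect parts after the fact.

```typescript
interface FilePart {
  fieldName: string; // multipart field name, e.g. "0"
  size: number; // bytes received for this part
}

interface UploadLimits {
  maxFiles: number;
  maxFileSize: number; // bytes per file
  maxTotalSize: number; // bytes across all files
}

// `map` is the parsed `map` field of the multipart request: file field name ->
// list of operation paths, e.g. { "0": ["variables.file"] }.
function checkUploadParts(
  map: Record<string, string[]>,
  parts: FilePart[],
  limits: UploadLimits,
): string[] {
  const errors: string[] = [];
  if (parts.length > limits.maxFiles) {
    errors.push(`Too many files: ${parts.length} > ${limits.maxFiles}`);
  }
  let total = 0;
  for (const part of parts) {
    // Own-property check so names like "constructor" are not accepted.
    const paths = Object.prototype.hasOwnProperty.call(map, part.fieldName)
      ? map[part.fieldName]
      : undefined;
    if (paths === undefined || paths.length === 0) {
      errors.push(`File "${part.fieldName}" is not referenced in map`);
    }
    if (part.size > limits.maxFileSize) {
      errors.push(`File "${part.fieldName}" exceeds the per-file limit`);
    }
    total += part.size;
  }
  if (total > limits.maxTotalSize) {
    errors.push("Total upload size exceeds the request limit");
  }
  return errors;
}
```

An empty result means the request is within limits; any entry should cause the request to be rejected and its streams closed.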
| 52 | + |
| 53 | +### Untrusted file metadata |
| 54 | + |
| 55 | +Information such as file names, MIME types, and contents should never be trusted. To mitigate risk: |
| 56 | + |
| 57 | +- Sanitize filenames to prevent path traversal or injection issues. |
| 58 | +- Sniff file types independently of declared MIME types, and reject mismatches. |
| 59 | +- Validate file contents. Be aware of format-specific exploits like zip bombs or maliciously crafted PDFs. |
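
The first two mitigations can be sketched as follows. The helper names are invented for this example, the signature table covers only three formats, and a matching signature says nothing about whether the rest of the file is well-formed, so content validation is still required.

```typescript
// Reduce a client-supplied name to a safe base name: drop any directory
// components, replace unexpected characters, and strip leading dots.
function sanitizeFilename(raw: string, maxLength: number = 100): string {
  const segments = raw.split(/[\\/]/);
  const base = segments[segments.length - 1] ?? "";
  const cleaned = base.replace(/[^A-Za-z0-9._-]/g, "_").replace(/^\.+/, "");
  return cleaned.slice(0, maxLength) || "upload";
}

// Leading "magic bytes" for a few formats; extend for the types you accept.
const SIGNATURES: { mime: string; bytes: number[] }[] = [
  { mime: "image/png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: "image/jpeg", bytes: [0xff, 0xd8, 0xff] },
  { mime: "application/pdf", bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] },
];

// Returns the detected type from the first bytes of the file, or null.
function sniffMimeType(head: Uint8Array): string | null {
  for (const { mime, bytes } of SIGNATURES) {
    if (head.length >= bytes.length && bytes.every((b, i) => head[i] === b)) {
      return mime;
    }
  }
  return null;
}

// True only when the sniffed type is known and equals the declared type.
function declaredTypeMatches(declared: string, head: Uint8Array): boolean {
  const sniffed = sniffMimeType(head);
  return sniffed !== null && sniffed === declared.trim().toLowerCase();
}
```

Storing files under a server-generated name and keeping the sanitized original only as display metadata avoids relying on the client-supplied name at all.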
| 60 | + |
| 61 | +## Recommendation: Use signed URLs |
| 62 | + |
| 63 | +The most secure and scalable approach is to avoid uploading files through GraphQL entirely. Instead: |
| 64 | + |
| 65 | +1. Use a GraphQL mutation to request a signed upload URL from your storage provider (e.g., Amazon S3). |
| 66 | +2. Upload the file directly from the client using that URL. |
| 67 | +3. Submit a second mutation to associate the uploaded file with your application’s data (or use an automatically triggered process, such as AWS Lambda, to do the same).
| 68 | + |
| 69 | +Retain unassociated uploads only for a short period, so that an attacker who completes only steps 1 and 2 cannot exhaust your storage.
| 70 | +When processing the file upload (step 3), the file should be moved to more permanent storage as appropriate. |
| 71 | + |
| 72 | +This separates responsibilities cleanly, protects your server from binary data handling, and aligns with best practices for |
| 73 | +modern web architecture. |
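
The three-step flow and the short retention window can be sketched with an in-memory registry. Everything here is an assumption for illustration: `UploadRegistry`, the `SignUrl` callback (which would wrap your storage provider's SDK), and the sequential IDs, which a real system would replace with unguessable ones and persistent storage.

```typescript
interface PendingUpload {
  ownerId: string;
  expiresAt: number; // epoch milliseconds
  attached: boolean;
}

// Hypothetical signer: in practice this wraps your storage provider's SDK.
type SignUrl = (objectKey: string, expiresAt: number) => string;

class UploadRegistry {
  private pending = new Map<string, PendingUpload>();
  private nextId = 1;
  private signUrl: SignUrl;
  private ttlMs: number;

  constructor(signUrl: SignUrl, ttlMs: number) {
    this.signUrl = signUrl;
    this.ttlMs = ttlMs;
  }

  // Step 1: what a "create upload URL" mutation resolver would call.
  createUploadUrl(ownerId: string, now: number): { uploadId: string; url: string } {
    const uploadId = `upload-${this.nextId++}`; // use a random ID in practice
    const expiresAt = now + this.ttlMs;
    this.pending.set(uploadId, { ownerId, expiresAt, attached: false });
    return { uploadId, url: this.signUrl(`tmp/${uploadId}`, expiresAt) };
  }

  // Step 3: what an "attach upload" mutation resolver would call.
  attachUpload(uploadId: string, ownerId: string, now: number): boolean {
    const upload = this.pending.get(uploadId);
    if (upload === undefined || upload.ownerId !== ownerId) return false;
    if (upload.attached || now > upload.expiresAt) return false;
    upload.attached = true; // here the file would move to permanent storage
    return true;
  }

  // Periodic cleanup: forget expired uploads that were never attached.
  // In practice the temporary object would be deleted from storage too.
  purgeExpired(now: number): number {
    let purged = 0;
    this.pending.forEach((upload, uploadId) => {
      if (!upload.attached && now > upload.expiresAt) {
        this.pending.delete(uploadId);
        purged++;
      }
    });
    return purged;
  }
}
```

Step 2, the upload itself, happens directly between the client and the storage provider, so it does not appear in the server code at all.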
| 74 | + |
| 75 | +## If you still choose to support uploads |
| 76 | + |
| 77 | +If your application truly requires file uploads through GraphQL, proceed with caution. At a minimum, you should: |
| 78 | + |
| 79 | +- Use a well-maintained implementation of the |
| 80 | +[GraphQL multipart request spec](https://github.com/jaydenseric/graphql-multipart-request-spec). |
| 81 | +- Enforce a rule that upload variables are only referenced once. |
| 82 | +- Stream uploads to disk or cloud storage; avoid buffering them in memory.
| 83 | +- Ensure that streams are always terminated when the request ends, whether or not they were consumed. |
| 84 | +- Apply strict request size limits and validate all fields. |
| 85 | +- Treat file names, types, and contents as untrusted data. |