22 commits
e2e577f
Add link checking instructions to README.md
csells Dec 20, 2025
c517c59
Add AI Best Practices write-up
csells Dec 20, 2025
54e5e12
Update src/content/ai-best-practices/structure-output.md
csells Dec 20, 2025
feb6603
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
1f47d59
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
0f936d0
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
1f7ee53
Update src/content/ai-best-practices/mode-of-interaction.md
csells Dec 20, 2025
6a7f19a
Update src/content/ai-best-practices/developer-experience.md
csells Dec 20, 2025
8589c2c
Update src/content/ai-best-practices/tool-calls-aka-function-calls.md
csells Dec 20, 2025
2854b89
Update src/content/ai-best-practices/tool-calls-aka-function-calls.md
csells Dec 20, 2025
2c286ff
Applying gemini-code-assist feedback
csells Dec 20, 2025
ab9bc73
Merge branch 'main' into ai-best-practices
csells Dec 20, 2025
be58d64
Update src/content/ai-best-practices/tool-calls-aka-function-calls.md
csells Dec 20, 2025
10e111b
Update src/content/ai-best-practices/tool-calls-aka-function-calls.md
csells Dec 20, 2025
3d4426d
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
b81e7e2
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
0fc24c1
Update src/content/ai-best-practices/developer-experience.md
csells Dec 20, 2025
4a30b60
Update src/content/ai-best-practices/developer-experience.md
csells Dec 20, 2025
97493a7
Update README.md
csells Dec 20, 2025
bdbe64e
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
b5795d7
replace curly quotes with straight quotes
csells Dec 21, 2025
00f436a
Merge branch 'main' into ai-best-practices
csells Dec 21, 2025
11 changes: 11 additions & 0 deletions README.md
@@ -197,6 +197,17 @@ on the [Flutter contributors Discord][]!

[Flutter contributors Discord]: https://github.com/flutter/flutter/blob/main/docs/contributing/Chat.md

### Check links

If you've made changes to the content and you'd like to make sure the site
builds and that the links resolve properly, then run the following command:

```console
# build the site and check links
dart run dash_site build && dart run dash_site check-link-references
```

If this script reports any errors or warnings, then address those issues and
rerun the command.


### Refresh code excerpts

A build that fails with the error
166 changes: 166 additions & 0 deletions src/content/ai-best-practices/developer-experience.md
@@ -0,0 +1,166 @@
---
title: Developer experience
description: >
Learn how to use spec-driven development and Gemini to plan, code, and
iterate on high-quality Flutter applications.
prev:
title: Mode of interaction
path: /ai-best-practices/mode-of-interaction
---


Generative AI is not just useful for implementing features in your app; it’s
also useful for generating the code to implement those features.

Unfortunately, it’s not as easy as prompting an AI coding agent to “build a
Flutter app that solves crossword puzzles.” I’m sure that prompt would yield
something, but I doubt very much that it would give us the powerful AI-assisted,
user-validated combination the Crossword Companion provides.

With better prompting, however, the sample app was implemented with Gemini 2.5
Pro for the bulk of the functionality and Gemini 3 Pro Preview to add the final
touches. The process to get the best results from both models was the same:

- Plan
- Code
- Validate
- Iterate

### Plan

The goal of the planning process is to kick off the coding process with enough
detail to let the agent know what you have in mind. The Crossword Companion
planning process was started with the following prompt:

```plaintext
I'd like to create a file called requirements.md in the plans folder at the root of the project. here's a description of the project:

The application will be an open-source sample hosted on GitHub in the flutter/demos directory. It aims to demonstrate the use of Flutter, Firebase AI Logic, and Gemini to produce an agentic workflow that can solve a small crossword puzzle (one with a size under 10x10)....lots more description of the app along with a sample puzzle screenshot...
Ask any questions you may have before you get started.
```

This prompt, with a little bit of Q&A, manual edits by a human, and some updates
during the coding process, yielded [the requirements file][requirements].

Before jumping into architectural design, the Gemini CLI was asked to initialize
the GEMINI.md rules file and then to update it with a list of architectural
principles:

```plaintext
DRY (Don’t Repeat Yourself) – eliminate duplicated logic by extracting shared utilities and modules.

Separation of Concerns – each module should handle one distinct responsibility.

Single Responsibility Principle (SRP) – every class/module/function/file should have exactly one reason to change.

Clear Abstractions & Contracts – expose intent through small, stable interfaces and hide implementation details.

Low Coupling, High Cohesion – keep modules self-contained, minimize cross-dependencies.

Scalability & Statelessness – design components to scale horizontally and prefer stateless services when possible.

Observability & Testability – build in logging, metrics, tracing, and ensure components can be unit/integration tested.

KISS (Keep It Simple, Stupid) – keep solutions as simple as possible.

YAGNI (You're Not Gonna Need It) – avoid speculative complexity or over-engineering.
```

The GEMINI.md file is loaded into every new prompt you create with Gemini; it
provides the set of rules you want it to remember for any activity. Gemini was
running inside of an empty Flutter app project, so the `/init` command
documented how to build, test and run it, which was useful during coding.

If you’re building something more than a sample, I also recommend adding
something for test-driven development:

```markdown
- **TDD (Test-Driven Development)** - write the tests first; the implementation
code isn't done until the tests pass.
```

This helps to build guardrails to ensure the coding agent is writing solid code
over time.

With the requirements and rules in place, prompting for the design.md file was
next:

```plaintext
great. i'd like to work on the design with you to be created in a design.md file to be stored in the plans folder. please use the @GEMINI.md and @requirements.md files as input. ask any questions you may have before you get started.
```

After inspecting and editing the generated app design, Gemini was prompted to
break it down into [tasks][tasks-spec]:

```plaintext
please read the files in the @specs folder and create a corresponding tasks.md file in the same folder that lays out a set of tasks and subtasks representing the functionality of this app. lay out the top-level tasks as minimal new functionality that the user can see in the running app, step-by-step as each top-level task is completed. each top-level task should include sub-tasks for creating and running tests and updating the @README.md with a description of the current functionality of the app. ask any questions you may have before you get started.
```

All of this happens before any code is written. You don’t have to split things
into separate files, but by carefully considering the requirements, the design
and the task breakdown, you’re helping the agent to provide results that meet
your expectations. This is called “Spec-Driven Development” and it’s currently
the best way we know of to upgrade your process from “vibe coding” to
“AI-assisted software development.”

Also, the sentence that says “ask any questions you may have before you get
started” gives the agent a chance to clarify anything that it doesn’t
understand instead of just making up answers as it goes. It’s also useful to
help you decide on details you might not otherwise have considered.

### Code

With the requirements, rules, design and tasks in place, kicking off the coding
part is easy:

```plaintext
Read the @tasks.md file and implement the first milestone.
```

You can watch the coding agent at work, jumping in to correct it along the way,
or just let it go. Either way, when it’s done, it’s time to check its work.

### Validate

At this point, you have some code and (in the world outside of samples) some
tests. To validate, ask yourself some questions:

- Does the analyzer show it to be free of errors? Of warnings?
- Does the app run?
- Does it have the features you asked for? Do they work?
- Do the tests pass?
- Does the code pass your review?

The answers to these questions form the input to the next phase.

### Iterate

Gather the issues that need to be addressed and hand the ones that need fixing
back to the coding agent, iterating between its coding and your validation until
you get to a good place from a functional point of view.

Now take another pass through validation from an architectural principles point
of view, spinning up a new agent to check the code. By clearing out the agent’s
context, you remove the biases the original agent gathered while choosing what
code to write in the first place. To ground it in just the code changes the
agent has just made, use a prompt like this:

```plaintext
Use git diff to find the new code and check it against the architectural principles listed here: @GEMINI.md. Make recommendations for important improvements.
```

Doing this a few times keeps the code in good shape for AI agents and humans
alike.

[requirements]:
{{site.github}}/flutter/demos/blob/main/crossword_companion/specs/requirements.md
[tasks-spec]:
{{site.github}}/flutter/demos/blob/main/crossword_companion/specs/tasks.md
61 changes: 61 additions & 0 deletions src/content/ai-best-practices/index.md
@@ -0,0 +1,61 @@
---
title: Flutter AI best practices
description: >
Learn best practices for building AI-powered Flutter apps using guardrails to
verify and correct AI-generated data.
next:
title: Prompting
path: /ai-best-practices/prompting
---


Flutter and AI go well together on multiple levels. If you’re using AI to
generate Flutter code, you only have to generate the code for a single app to
target multiple platforms. And if you’re harnessing Gemini to implement features
in your app, the Firebase AI Logic SDK makes that simple, with an easy-to-use
API, and secure, by keeping the API keys out of your code.

If you’re new to AI for either of these two use cases, you should know: as good
as it is (and the Gemini 3 Pro Preview is *very* good), AI still makes mistakes.
If you’re using AI to write your code, then you can use guardrails to keep AI on
track using tools like the Flutter analyzer and unit tests.

But what do you do when you’re using AI to implement the features in your app,
knowing that sometimes it’s going to get things wrong? Or, to quote a friend of
mine:

***Morgan’s Law***
*“Eventually, due to the nature of sampling from a probability distribution,
[AI] will fail to do the thing that must be done.”*
*–Brett Morgan, Flutter Developer Relations Engineer, July 2025.*

The good news is that, just as you can use developer tools to build guardrails
around the AI writing your code, you can use Flutter to build guardrails around
the AI you use to implement your features. The [Crossword Companion
app][crossword-app] was built to demonstrate these techniques.

<img
src="/assets/images/docs/ai-best-practices/crossword-companion-app-interface-showin.png"
alt="Crossword Companion app interface showing a 5-step setup process starting
with selecting a crossword image.">

The goal of the Crossword Companion app is not to help you cheat at
mini-crosswords – although it’s darn good at that – but to illustrate how to
channel the power of AI using Flutter. As an example, the first thing you do
when running the app is upload the screenshot of a mini-crossword puzzle. When
you press the **Next** button, the AI uses that image to infer the size,
contents and clues of the puzzle:

<img
src="/assets/images/docs/ai-best-practices/crossword-companion-app-showing-a-5x5-gr.png"
alt="Crossword Companion app showing a 5x5 grid with settings incorrectly
displaying 4 rows and 5 columns.">

Notice that while the crossword puzzle is a 5x5 grid, the AI says it’s 4x5.
Because we know that mistakes happen (apparently AIs are only human, too), we
built the app to allow the user to verify and correct the AI-generated data.
That’s important; bad data leads to bad results.

This write-up is not about the app in detail but rather about the best
practices to use when you’re building your own AI apps with Flutter. So let’s
get to it!



[crossword-app]: {{site.github}}/flutter/demos/tree/main/crossword_companion
108 changes: 108 additions & 0 deletions src/content/ai-best-practices/mode-of-interaction.md
@@ -0,0 +1,108 @@
---
title: Mode of interaction
description: >
Learn to balance LLM capabilities with traditional code and implement
guardrails to manage nondeterministic AI behavior.
prev:
title: Tool calls (aka function calls)
path: /ai-best-practices/tool-calls-aka-function-calls
next:
title: Developer experience
path: /ai-best-practices/developer-experience
---


It’s a mistake to think of a request to an LLM in the same way as calling a
function. Given the same set of inputs in the same order, a function acts
predictably. We can write tests and inject faults and harden a function for a
wide variety of inputs.

An LLM is not like that. A better way to think about it is as if the LLM were a
user, treating the data we get from it as such. Like a user, an LLM is
nondeterministic, often wrong (partially or wholly), and sometimes plain random.
To guard our apps under these conditions, we need to build the same guardrails
around LLM input as we do around user input.
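
To make that concrete, here's a minimal sketch in plain Dart of validating a
model's reply the same way you'd validate form input. The JSON shape, field
names, and size limits are hypothetical, not the sample app's actual API:

```dart
import 'dart:convert';

class GridInfo {
  GridInfo(this.rows, this.cols);
  final int rows;
  final int cols;
}

/// Returns null instead of throwing when the model's reply is malformed,
/// just as a form validator would reject bad user input.
GridInfo? parseGridReply(String reply) {
  Object? decoded;
  try {
    decoded = jsonDecode(reply);
  } on FormatException {
    return null; // Not JSON at all: a classic LLM failure mode.
  }
  if (decoded is! Map<String, Object?>) return null;
  final rows = decoded['rows'];
  final cols = decoded['cols'];
  if (rows is! int || cols is! int) return null;
  // Domain guardrail: the sample only accepts small puzzles.
  if (rows < 1 || rows > 10 || cols < 1 || cols > 10) return null;
  return GridInfo(rows, cols);
}

void main() {
  assert(parseGridReply('{"rows": 5, "cols": 5}') != null);
  assert(parseGridReply('Sure! The grid is 5x5.') == null);
  assert(parseGridReply('{"rows": 50, "cols": 5}') == null);
}
```

The point isn't the specific checks; it's that every path the model's output
can take through your app passes through validation first.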

If we can do that successfully, then we can bring extraordinary abilities to
apps in the form of problem solving and creativity that can rival that of a
human.

### Separation of concerns

LLMs are good at some things and bad at others; the key is to bring them into
your apps for the good while mitigating the bad. As an example, let’s consider
the task list in the Crossword Companion:

<img
src="/assets/images/docs/ai-best-practices/crossword-task-list-showing-solved-clues.png"
alt="Crossword task list showing solved clues in green with confidence
percentages and unsolved clues in red">

The task list is the set of clues that need solving. The goal is to use colors
and solutions in the task list to show progress during the solving process. The
initial implementation provided the model with a tool for managing the task
list, asking it to provide updates on progress as it went. Flash could not solve
the puzzle this way, but Pro could. Unfortunately, it solved it in big chunks,
only remembering to update the task list once or twice with a big delay in
between. No amount of prompting could convince it to update the tasks as it
went. You’ll see the same behavior with modern AI agents managing their own task
lists; that’s just where we are in the evolution of LLMs at the moment.

So how do we get consistent, deterministic updates of the task list? Take task
management out of the LLM’s hands and handle it in the code.
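
As a sketch of what that looks like, the loop below owns the task list and
updates it exactly once per clue; only `solveClue` talks to the model. The
names here are illustrative, not the Crossword Companion's actual code:

```dart
enum TaskState { pending, solved, failed }

class ClueTask {
  ClueTask(this.clue);
  final String clue;
  TaskState state = TaskState.pending;
  String? answer;
}

/// The code, not the model, owns the loop: every clue gets exactly one
/// status update, in order, no matter how the model behaves.
Future<void> solveAll(
  List<ClueTask> tasks,
  Future<String?> Function(String clue) solveClue,
  void Function(ClueTask task) onUpdate,
) async {
  for (final task in tasks) {
    final answer = await solveClue(task.clue);
    task
      ..answer = answer
      ..state = answer == null ? TaskState.failed : TaskState.solved;
    onUpdate(task); // The UI repaints here, once per clue, guaranteed.
  }
}

Future<void> main() async {
  final tasks = [ClueTask('Feline pet'), ClueTask('Opposite of up')];
  var updates = 0;
  await solveAll(
    tasks,
    (clue) async => clue == 'Feline pet' ? 'CAT' : null, // Fake solver.
    (_) => updates++,
  );
  assert(updates == tasks.length);
}
```

The model still does what it's good at (solving each clue), but progress
reporting is now deterministic because it never left your code.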

To generalize, before applying an LLM solution to a problem you’re facing, ask
yourself whether an LLM is the best tool for the job. Is human-like problem
solving and creativity worth the tradeoff in unpredictability?

The answer to that question comes with experimentation. Here are some examples
from the sample:

| Task | LLM Suitability | Code Suitability |
| ------------------------------------------------- | ------------------------------------------------------------ | --------------------------------------------------------------------------------- |
| **Parsing the grid for size, contents and clues** | Great for an LLM by using vision and language understanding | Difficult to write the code to do this |
| **Validating grid contents** | Possible to do with another LLM checking the work | Easier for a human to glance at and adjust |
| **Handling the task list** | An LLM is unlikely to do this consistently | Easy to write the code to loop through a task list, updating as it goes |
| **Solving each clue** | Great for an LLM using language understanding and generation | Difficult to do given real world clues that depend on word play, names, and slang |
| **Resolving conflicts** | An LLM is inconsistent on this kind of looping | Easy for a human to glance at and adjust |

It’s a judgment call for sure, but if you can reasonably write the code to do
it, your results will be predictable. However, if writing the code would be
unreasonably difficult, then consider an LLM, knowing you’ll have to build the
guardrails like we did in the sample.

### Ask vs agent

Code vs. LLM isn’t the only pivot to consider. Models operate in roughly two
modes: “ask” and “agent”.

An LLM is in “ask” mode when we prompt it without giving it tools to effect
change in the world – for example, no tools at all, or tools just for looking
up data. Both the crossword inference model and the clue solver model run in
ask mode, using tools only for additional data.

On the other hand, when we give an LLM a set of tools that allow it to operate
on our behalf in the world – like reading and writing files, executing bash
commands, loading web pages, calling web APIs, and so on – that LLM is in
“agent” mode.

### Guardrails

The difference between ask and agent mode is not the model you choose or the
prompts you give it, but the tools you supply. The combination of the tools and
the agentic loop described in the Tool calls section allows an LLM to call any
number of those tools as often as it decides. Giving it that power puts the
responsibility on you to treat it as unpredictable – more like a person than a
program.

You do that the same way that you validate user input, by building up a suite of
tests to see how your app works against LLM responses. Give real LLMs a wide
variety of prompts and mock the tools to evaluate how the LLM is using them.
Like your first user testing experience, your first LLM testing results might
surprise you. Use that data to build the guardrails you need to harden your app.
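
One way to start is with a fake tool that records every call, so a test can
assert how the model used it. The `RecordingTool` shape below is a hypothetical
sketch, not Firebase AI Logic's actual API:

```dart
typedef ToolArgs = Map<String, Object?>;

/// A fake tool that records its calls and returns canned data, so each
/// test run is reproducible and tool usage can be asserted afterward.
class RecordingTool {
  RecordingTool(this.name, this.cannedResult);
  final String name;
  final ToolArgs cannedResult;
  final List<ToolArgs> calls = [];

  ToolArgs call(ToolArgs args) {
    calls.add(Map.of(args));
    return cannedResult;
  }
}

void main() {
  final dictionary = RecordingTool('lookup_word', {'exists': true});

  // Stand-in for the agentic loop: the model asks for two lookups.
  dictionary.call({'word': 'CAT'});
  dictionary.call({'word': 'QZX'});

  // Guardrail checks: the tool was used, and only with expected arguments.
  assert(dictionary.calls.length == 2);
  assert(dictionary.calls.every((c) => c.containsKey('word')));
}
```

Wire a recording tool like this into real prompts against a real model, and
the recorded calls become the data that drives your guardrails.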

In the sample, we didn’t have to guard against harm, but we did have to guard
against imperfect results. Extensive testing against real-world data is what
led us to institute human-in-the-loop guards against solving an invalid puzzle
or accepting conflicting solutions. In this way, Flutter and Firebase AI Logic
make the perfect combination to harness the power of an LLM and bring unique
capabilities to your apps.
