Commits
22 commits
e2e577f
Add link checking instructions to README.md
csells Dec 20, 2025
c517c59
Add AI Best Practices write-up
csells Dec 20, 2025
54e5e12
Update src/content/ai-best-practices/structure-output.md
csells Dec 20, 2025
feb6603
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
1f47d59
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
0f936d0
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
1f7ee53
Update src/content/ai-best-practices/mode-of-interaction.md
csells Dec 20, 2025
6a7f19a
Update src/content/ai-best-practices/developer-experience.md
csells Dec 20, 2025
8589c2c
Update src/content/ai-best-practices/tool-calls-aka-function-calls.md
csells Dec 20, 2025
2854b89
Update src/content/ai-best-practices/tool-calls-aka-function-calls.md
csells Dec 20, 2025
2c286ff
Applying gemini-code-assist feedback
csells Dec 20, 2025
ab9bc73
Merge branch 'main' into ai-best-practices
csells Dec 20, 2025
be58d64
Update src/content/ai-best-practices/tool-calls-aka-function-calls.md
csells Dec 20, 2025
10e111b
Update src/content/ai-best-practices/tool-calls-aka-function-calls.md
csells Dec 20, 2025
3d4426d
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
b81e7e2
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
0fc24c1
Update src/content/ai-best-practices/developer-experience.md
csells Dec 20, 2025
4a30b60
Update src/content/ai-best-practices/developer-experience.md
csells Dec 20, 2025
97493a7
Update README.md
csells Dec 20, 2025
bdbe64e
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
b5795d7
replace curly quotes with straight quotes
csells Dec 21, 2025
00f436a
Merge branch 'main' into ai-best-practices
csells Dec 21, 2025
16 changes: 16 additions & 0 deletions README.md
@@ -197,6 +197,22 @@ on the [Flutter contributors Discord][]!

[Flutter contributors Discord]: https://github.com/flutter/flutter/blob/main/docs/contributing/Chat.md

### Check links

If you've made changes to the content and you'd like to make sure the site
builds and that the links resolve properly, then run the following commands:

```terminal
# build the site with the updated content
dart run dash_site build
# check the links
dart run dash_site check-link-references
```

If these commands report any errors or warnings, then address those issues and
rerun them.

### Refresh code excerpts

A build that fails with the error
127 changes: 127 additions & 0 deletions src/content/ai-best-practices/developer-experience.md
@@ -0,0 +1,127 @@
---
title: Developer experience
description: >
Learn how to use spec-driven development and Gemini to plan, code, and iterate on high-quality Flutter applications.
prev:
title: Mode of interaction
path: /ai-best-practices/mode-of-interaction
---


Generative AI is not just useful for powering features in your app; it's also useful for generating the code that implements those features.

Unfortunately, it's not as simple as prompting an AI coding agent to "build a Flutter app that solves crossword puzzles." I'm sure that prompt would yield something, but I doubt very much that it would give us the powerful AI-assisted, user-validated combination the Crossword Companion provides.
Contributor review comment:

medium

This paragraph, and many others in the new documentation files, exceeds the 80-character line length limit mentioned in the contribution guidelines ("semantic line breaks"). Please apply semantic line breaks to improve readability.

Suggested change
Unfortunately, it’s just as easy as prompting an AI coding agent to “build a Flutter app that solves crossword puzzles.” I’m sure that prompt would yield something, but I doubt very much that it would give us the powerful AI-assisted, user-validated combination the Crossword Companion provides.
Unfortunately, it’s just as easy as prompting an AI coding agent to
“build a Flutter app that solves crossword puzzles.”
I’m sure that prompt would yield something,
but I doubt very much that it would give us the powerful AI-assisted,
user-validated combination the Crossword Companion provides.


With better prompting, however, the sample app was implemented with Gemini 2.5 Pro for the bulk of the functionality and Gemini 3 Pro Preview to add the final touches. The process to get the best results from both models was the same:

- Plan
- Code
- Validate
- Iterate

### Plan

The goal of the planning process is to kick off the coding process with enough detail to let the agent know what you have in mind. The Crossword Companion planning process was started with the following prompt:

```plaintext
I'd like to create a file called requirements.md in the plans folder at the root of the project. here's a description of the project:
The application will be an open-source sample hosted on GitHub in the flutter/demos directory. It aims to demonstrate the use of Flutter, Firebase AI Logic, and Gemini to produce an agentic workflow that can solve a small crossword puzzle (one with a size under 10x10). ...lots more description of the app along with a sample puzzle screenshot...
Ask any questions you may have before you get started.
```

This prompt, with a little bit of Q&A, manual edits by a human, and some updates during the coding process, yielded [the requirements file][requirements].

Before jumping into architectural design, the Gemini CLI was asked to initialize the GEMINI.md rules file and then to update it with a list of architectural principles:

```plaintext
DRY (Don’t Repeat Yourself) – eliminate duplicated logic by extracting shared utilities and modules.
Separation of Concerns – each module should handle one distinct responsibility.
Single Responsibility Principle (SRP) – every class/module/function/file should have exactly one reason to change.
Clear Abstractions & Contracts – expose intent through small, stable interfaces and hide implementation details.
Low Coupling, High Cohesion – keep modules self-contained, minimize cross-dependencies.
Scalability & Statelessness – design components to scale horizontally and prefer stateless services when possible.
Observability & Testability – build in logging, metrics, tracing, and ensure components can be unit/integration tested.
KISS (Keep It Simple, Stupid) – keep solutions as simple as possible.
YAGNI (You're Not Gonna Need It) – avoid speculative complexity or over-engineering.
```

The GEMINI.md file is loaded into every new prompt you create with Gemini; it provides the set of rules you want it to remember for any activity. Gemini was running inside of an empty Flutter app project, so the `/init` command documented how to build, test and run it, which was useful during coding.
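For reference, a rules file along these lines might start out like this. This is a sketch, not the sample's actual GEMINI.md; the headings and wording are invented for illustration:

```markdown
# Project rules

## Build, test, and run
- `flutter run` to run the app; `flutter test` to run the tests.
- `flutter analyze` must report no errors or warnings before a task is done.

## Architectural principles
- DRY – eliminate duplicated logic by extracting shared utilities.
- SRP – every class/module/function/file has exactly one reason to change.
- KISS / YAGNI – keep solutions simple; avoid speculative complexity.
```

Because the file is prepended to every prompt, keep it short and declarative; long rules files dilute the instructions the agent actually needs.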

If you're building something more than a sample, I also recommend adding something for test-driven development:

```markdown
- **TDD (Test-Driven Development)** - write the tests first; the implementation
code isn't done until the tests pass.
```

This helps to build guardrails to ensure the coding agent is writing solid code over time.

With the requirements and rules in place, prompting for the design.md file was next:

```plaintext
great. i'd like to work on the design with you to be created in a design.md file to be stored in the plans folder. please use the @GEMINI.md and @requirements.md files as input. ask any questions you may have before you get started.
```

After inspecting and editing the generated app design, Gemini was prompted to break it down into [tasks][tasks-spec]:

```plaintext
please read the files in the @specs folder and create a corresponding tasks.md file in the same folder that lays out a set of tasks and subtasks representing the functionality of this app. lay out the top-level tasks as minimal new functionality that the user can see in the running app, step-by-step as each top-level task is completed. each top-level task should include sub-tasks for creating and running tests and updating the @README.md with a description of the current functionality of the app. ask any questions you may have before you get started.
```

All of this happens before any code is written. You don’t have to split things into separate files, but by carefully considering the requirements, the design and the task breakdown, you’re helping the agent to provide results that meet your expectations. This is called “Spec-Driven Development” and it’s currently the best way we know of to upgrade your process from “vibe coding” to “AI-assisted software development.”

Also, the sentence "ask any questions you may have before you get started" gives the agent a chance to clarify anything it doesn't understand instead of just making up answers as it goes. It's also useful to help you decide on details you might not otherwise have considered.

### Code

With the requirements, rules, design and tasks in place, kicking off the coding part is easy:

```plaintext
Read the @tasks.md file and implement the first milestone.
```

You can watch the coding agent at work, jumping in to correct it as it goes, or just let it run. Either way, when it's done, it's time to check its work.

### Validate

At this point, you have some code and (in the world outside of samples) some tests. To validate, ask yourself some questions:

- Does the analyzer show it to be free of errors? Of warnings?
- Does the app run?
- Does it have the features you asked for? Do they work?
- Do the tests pass?
- Does the code pass your review?

The answers to these questions form the input to the next phase.
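Most of these checks map directly onto standard Flutter tooling you can run between iterations; a typical sequence looks like this:

```terminal
# static analysis: errors and warnings
flutter analyze
# run the test suite
flutter test
# run the app for a manual check of the features
flutter run
```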

### Iterate

Gather the issues that need to be addressed and hand the ones that need fixing back to the coding agent, iterating between it coding and your validation until you get to a good place from a functional point of view.

Now take another pass through validation from an architectural-principles point of view, spinning up a new agent to check the code. By clearing out the agent's context, you remove the biases the original agent accumulated while choosing what code to write in the first place. To ground it in just the code changes that were made, use a prompt like this:

```plaintext
Use git diff to find the new code and check it against the architectural principles listed here: @GEMINI.md. Make recommendations for important improvements.
```

Doing this a few times keeps the code in good shape for AI agents and humans alike.

[requirements]: {{site.github}}/flutter/demos/blob/main/crossword_companion/specs/requirements.md
[tasks-spec]: {{site.github}}/flutter/demos/blob/main/crossword_companion/specs/tasks.md
31 changes: 31 additions & 0 deletions src/content/ai-best-practices/index.md
@@ -0,0 +1,31 @@
---
title: Flutter AI best practices
description: >
Learn best practices for building AI-powered Flutter apps using guardrails to verify and correct AI-generated data.
next:
title: Prompting
path: /ai-best-practices/prompting
---


Flutter and AI go well together on multiple levels. If you’re using AI to generate Flutter code, you only have to generate the code for a single app to target multiple platforms. And if you’re harnessing Gemini to implement features in your app, the Firebase AI Logic SDK makes that simple, with an easy-to-use API, and secure, by keeping the API keys out of your code.

If you’re new to AI for either of these two use cases, you should know: as good as it is (and the Gemini 3 Pro Preview is *very* good), AI still makes mistakes. If you’re using AI to write your code, then you can use guardrails to keep AI on track using tools like the Flutter analyzer and unit tests.

But what do you do when you’re using AI to implement the features in your app, knowing that sometimes it’s going to get things wrong? Or, to quote a friend of mine:

***Morgan’s Law***
*“Eventually, due to the nature of sampling from a probability distribution, [AI] will fail to do the thing that must be done.”*
*–Brett Morgan, Flutter Developer Relations Engineer, July 2025.*
Contributor review comment:

medium

There's a stray backslash \ before the closing asterisk. This seems to be an artifact from the document conversion and should be removed.

Suggested change
*–Brett Morgan, Flutter Developer Relations Engineer, July, 2025\.*
*–Brett Morgan, Flutter Developer Relations Engineer, July, 2025.*


The good news is that, just as you can use developer tools to build guardrails around the AI writing your code, you can use Flutter to build guardrails around the AI you use to implement your features. The [Crossword Companion app][crossword-app] was built to demonstrate these techniques.

<img src="/assets/images/docs/ai-best-practices/crossword-companion-app-interface-showin.png" alt="Crossword Companion app interface showing a 5-step setup process starting with selecting a crossword image.">

The goal of the Crossword Companion app is not to help you cheat at mini-crosswords – although it's darn good at that – but to illustrate how to channel the power of AI using Flutter. As an example, the first thing you do when running the app is upload the screenshot of a mini-crossword puzzle. When you press the **Next** button, the AI uses that image to infer the size, contents and clues of the puzzle:

<img src="/assets/images/docs/ai-best-practices/crossword-companion-app-showing-a-5x5-gr.png" alt="Crossword Companion app showing a 5x5 grid with settings incorrectly displaying 4 rows and 5 columns.">
Notice that while the crossword puzzle is a 5x5 grid, the AI says it’s 4x5. Because we know that mistakes happen (apparently AIs are only human, too), we built the app to allow the user to verify and correct the AI-generated data. That’s important; bad data leads to bad results.
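One cheap guardrail is a structural consistency check on the AI-generated data before showing it for user review. Here's a minimal Dart sketch; the `ParsedPuzzle` type and its fields are invented for illustration, not the sample's actual model:

```dart
// Sketch only: a hypothetical consistency check on AI-parsed puzzle data.
// The type and field names are invented, not from the Crossword Companion.
class ParsedPuzzle {
  ParsedPuzzle({required this.rows, required this.cols, required this.grid});

  final int rows;
  final int cols;
  final List<List<String>> grid; // one string per cell

  /// Returns problems the user should review before solving starts.
  List<String> validate() => [
        if (grid.length != rows)
          'Model reported $rows rows but returned ${grid.length}',
        for (final (i, row) in grid.indexed)
          if (row.length != cols)
            'Row $i has ${row.length} cells, expected $cols',
      ];
}
```

Any problems this reports become exactly the kind of issue the verify-and-correct UI asks the user to fix.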

This write-up, then, is not about the app in detail but about the best practices to use when building your own AI apps with Flutter. So let's get to it!



[crossword-app]: {{site.github}}/flutter/demos/tree/main/crossword_companion
59 changes: 59 additions & 0 deletions src/content/ai-best-practices/mode-of-interaction.md
@@ -0,0 +1,59 @@
---
title: Mode of interaction
description: >
Learn to balance LLM capabilities with traditional code and implement guardrails to manage nondeterministic AI behavior.
prev:
title: Tool calls (aka function calls)
path: /ai-best-practices/tool-calls-aka-function-calls
next:
title: Developer experience
path: /ai-best-practices/developer-experience
---


It’s a mistake to think of a request to an LLM in the same way as calling a function. Given the same set of inputs in the same order, a function acts predictably. We can write tests and inject faults and harden a function for a wide variety of inputs.

An LLM is not like that. A better way to think about it is as if the LLM were a user, and to treat the data we get from it as such. Like a user, an LLM is nondeterministic, often wrong (partially or wholly), and sometimes plain random. To guard our apps under these conditions, we need to build the same guardrails around LLM input as we do around user input.

If we can do that successfully, then we can bring extraordinary abilities to apps in the form of problem solving and creativity that can rival that of a human.

### Separation of concerns

LLMs are good at some things and bad at others; the key is to bring them into your apps for the good while mitigating the bad. As an example, let’s consider the task list in the Crossword Companion:

<img src="/assets/images/docs/ai-best-practices/crossword-task-list-showing-solved-clues.png" alt="Crossword task list showing solved clues in green with confidence percentages and unsolved clues in red">

The task list is the set of clues that need solving. The goal is to use colors and solutions in the task list to show progress during the solving process. The initial implementation provided the model with a tool for managing the task list, asking it to provide updates on progress as it went. Flash could not solve the puzzle this way, but Pro could. Unfortunately, it solved it in big chunks, only remembering to update the task list once or twice with a big delay in between. No amount of prompting could convince it to update the tasks as it went. You’ll see the same behavior with modern AI agents managing their own task lists; that’s just where we are in the evolution of LLMs at the moment.

So how do we get consistent, deterministic updates of the task list? Take task management out of the LLM’s hands and handle it in the code.
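A minimal sketch of that split looks like this. The `ClueTask` type and the `solveClue` callback are hypothetical names invented for illustration; the point is that the loop and the status updates live in ordinary Dart, and only the one-clue solving step is delegated to the LLM:

```dart
// Sketch only: ClueTask, TaskStatus, solveClue, and onTaskUpdated are
// hypothetical names, not the Crossword Companion's actual code.
enum TaskStatus { pending, solved, failed }

class ClueTask {
  ClueTask(this.clue);
  final String clue;
  TaskStatus status = TaskStatus.pending;
  String? answer;
}

Future<void> solveAll(
  List<ClueTask> tasks,
  Future<String?> Function(String clue) solveClue, // the LLM call goes here
  void Function() onTaskUpdated, // e.g. triggers a UI rebuild
) async {
  // The loop is plain code, so the task list updates deterministically
  // after every clue -- the LLM is only ever asked to solve one clue.
  for (final task in tasks) {
    final answer = await solveClue(task.clue);
    task
      ..answer = answer
      ..status = answer == null ? TaskStatus.failed : TaskStatus.solved;
    onTaskUpdated();
  }
}
```

With this shape, the task-list UI always reflects real progress, no matter how the model behaves inside `solveClue`.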

To generalize, before applying an LLM solution to a problem you're facing, ask yourself whether an LLM is the best tool for the job. Is human-like problem solving and creativity worth the tradeoff in unpredictability?

The answer to that question comes with experimentation. Here are some examples from the sample:

| Task | LLM Suitability | Code Suitability |
| ----- | ----- | ----- |
| **Parsing the grid for size, contents and clues** | Great for an LLM by using vision and language understanding | Difficult to write the code to do this |
| **Validating grid contents** | Possible to do with another LLM checking the work | Easier for a human to glance at and adjust |
| **Handling the task list** | An LLM is unlikely to do this consistently | Easy to write the code to loop through a task list, updating as it goes |
| **Solving each clue** | Great for an LLM using language understanding and generation | Difficult to do given real world clues that depend on word play, names, and slang |
| **Resolving conflicts** | An LLM is inconsistent on this kind of looping | Easy for a human to glance at and adjust |

It's a judgment call for sure, but if you can reasonably write the code to do it, your results will be predictable. However, if writing the code would be unreasonably difficult, then consider an LLM, knowing you'll have to build the guardrails, as we did in the sample.

### Ask vs agent

There's more than one pivot to consider besides code vs. LLM. Models operate in roughly two modes: "ask" and "agent".

An LLM is in "ask" mode when we prompt it without giving it tools to effect change in the world – for example, no tools at all, or tools only for looking up data. Both the crossword inference model and the clue solver model run in ask mode, using tools only for additional data.

On the other hand, when we give an LLM a set of tools that allow it to operate on our behalf in the world – like reading and writing files, executing bash commands, loading web pages, calling web APIs, and so on – that LLM is in “agent” mode.

### Guardrails

The difference between ask and agent mode is not the model you choose or the prompts you give it, but the tools you supply. The combination of the tools and the agentic loop described in the Tool calls section allows an LLM to call any number of those tools as often as it decides. Giving it that power puts the responsibility on you to treat it as unpredictable – more like a person than a program.

You do that the same way that you validate user input, by building up a suite of tests to see how your app works against LLM responses. Give real LLMs a wide variety of prompts and mock the tools to evaluate how the LLM is using them. Like your first user testing experience, your first LLM testing results might surprise you. Use that data to build the guardrails you need to harden your app.

In the sample, we didn't have to guard against harm, but we did have to guard against imperfect results. Extensive testing against real-world data led us to institute human-in-the-loop guards against attempting to solve an invalid puzzle and against conflicting solutions. In this way, Flutter and Firebase AI Logic make the perfect combination to harness the power of an LLM and bring unique capabilities to your apps.
