22 commits
e2e577f
Add link checking instructions to README.md
csells Dec 20, 2025
c517c59
Add AI Best Practices write-up
csells Dec 20, 2025
54e5e12
Update src/content/ai-best-practices/structure-output.md
csells Dec 20, 2025
feb6603
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
1f47d59
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
0f936d0
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
1f7ee53
Update src/content/ai-best-practices/mode-of-interaction.md
csells Dec 20, 2025
6a7f19a
Update src/content/ai-best-practices/developer-experience.md
csells Dec 20, 2025
8589c2c
Update src/content/ai-best-practices/tool-calls-aka-function-calls.md
csells Dec 20, 2025
2854b89
Update src/content/ai-best-practices/tool-calls-aka-function-calls.md
csells Dec 20, 2025
2c286ff
Applying gemini-code-assist feedback
csells Dec 20, 2025
ab9bc73
Merge branch 'main' into ai-best-practices
csells Dec 20, 2025
be58d64
Update src/content/ai-best-practices/tool-calls-aka-function-calls.md
csells Dec 20, 2025
10e111b
Update src/content/ai-best-practices/tool-calls-aka-function-calls.md
csells Dec 20, 2025
3d4426d
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
b81e7e2
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
0fc24c1
Update src/content/ai-best-practices/developer-experience.md
csells Dec 20, 2025
4a30b60
Update src/content/ai-best-practices/developer-experience.md
csells Dec 20, 2025
97493a7
Update README.md
csells Dec 20, 2025
bdbe64e
Update src/content/ai-best-practices/prompting.md
csells Dec 20, 2025
b5795d7
replace curly quotes with straight quotes
csells Dec 21, 2025
00f436a
Merge branch 'main' into ai-best-practices
csells Dec 21, 2025
11 changes: 11 additions & 0 deletions README.md
@@ -197,6 +197,17 @@ on the [Flutter contributors Discord][]!

[Flutter contributors Discord]: https://github.com/flutter/flutter/blob/main/docs/contributing/Chat.md

### Check links

If you've made changes to the content and you'd like to make sure the site
builds and that the links resolve properly, then run the following command:

```console
# build the site and check links
dart run dash_site build && dart run dash_site check-link-references
```

If this script reports any errors or warnings, then address those issues and
rerun the command.


### Refresh code excerpts

A build that fails with the error
166 changes: 166 additions & 0 deletions src/content/ai-best-practices/developer-experience.md
@@ -0,0 +1,166 @@
---
title: Developer experience
description: >
Learn how to use spec-driven development and Gemini to plan, code, and
iterate on high-quality Flutter applications.
prev:
title: Mode of interaction
path: /ai-best-practices/mode-of-interaction
---


Generative AI is not just useful for implementing features in your app; it’s
also useful for generating the code to implement those features.

Unfortunately, it’s not as easy as prompting an AI coding agent to “build a
Flutter app that solves crossword puzzles.” I’m sure that prompt would yield
something, but I doubt very much that it would give us the powerful AI-assisted,
user-validated combination the Crossword Companion provides.

With better prompting, however, the sample app was implemented with Gemini 2.5
Pro for the bulk of the functionality and Gemini 3 Pro Preview to add the final
touches. The process to get the best results from both models was the same:

- Plan
- Code
- Validate
- Iterate

### Plan

The goal of the planning process is to kick off the coding process with enough
detail to let the agent know what you have in mind. The Crossword Companion
planning process was started with the following prompt:

```plaintext
I'd like to create a file called requirements.md in the plans folder at the root of the project. here's a description of the project:

The application will be an open-source sample hosted on GitHub in the flutter/demos directory. It aims to demonstrate the use of Flutter, Firebase AI Logic, and Gemini to produce an agentic workflow that can solve a small crossword puzzle (one with a size under 10x10)....lots more description of the app along with a sample puzzle screenshot...
Ask any questions you may have before you get started.
```

This prompt, with a little bit of Q&A, manual edits by a human, and some updates
during the coding process, yielded [the requirements file][requirements].

Before jumping into architectural design, the Gemini CLI was asked to initialize
the GEMINI.md rules file and then to update it with a list of architectural
principles:

```plaintext
DRY (Don’t Repeat Yourself) – eliminate duplicated logic by extracting shared utilities and modules.

Separation of Concerns – each module should handle one distinct responsibility.

Single Responsibility Principle (SRP) – every class/module/function/file should have exactly one reason to change.

Clear Abstractions & Contracts – expose intent through small, stable interfaces and hide implementation details.

Low Coupling, High Cohesion – keep modules self-contained, minimize cross-dependencies.

Scalability & Statelessness – design components to scale horizontally and prefer stateless services when possible.

Observability & Testability – build in logging, metrics, tracing, and ensure components can be unit/integration tested.

KISS (Keep It Simple, Stupid) – keep solutions as simple as possible.

YAGNI (You're Not Gonna Need It) – avoid speculative complexity or over-engineering.
```

The GEMINI.md file is loaded into every new prompt you create with Gemini; it
provides the set of rules you want it to remember for any activity. Gemini was
running inside of an empty Flutter app project, so the `/init` command
documented how to build, test and run it, which was useful during coding.

If you’re building something more than a sample, I also recommend adding
something for test-driven development:

```markdown
- **TDD (Test-Driven Development)** - write the tests first; the implementation
code isn't done until the tests pass.
```

This helps to build guardrails to ensure the coding agent is writing solid code
over time.

With the requirements and rules in place, prompting for the design.md file was
next:

```plaintext
great. i'd like to work on the design with you to be created in a design.md file to be stored in the plans folder. please use the @GEMINI.md and @requirements.md files as input. ask any questions you may have before you get started.
```

After inspecting and editing the generated app design, Gemini was prompted to
break it down into [tasks][tasks-spec]:

```plaintext
please read the files in the @specs folder and create a corresponding tasks.md file in the same folder that lays out a set of tasks and subtasks representing the functionality of this app. lay out the top-level tasks as minimal new functionality that the user can see in the running app, step-by-step as each top-level task is completed. each top-level task should include sub-tasks for creating and running tests and updating the @README.md with a description of the current functionality of the app. ask any questions you may have before you get started.
```

All of this happens before any code is written. You don’t have to split things
into separate files, but by carefully considering the requirements, the design
and the task breakdown, you’re helping the agent to provide results that meet
your expectations. This is called “Spec-Driven Development” and it’s currently
the best way we know of to upgrade your process from “vibe coding” to
“AI-assisted software development.”

Also, the sentence that says “ask any questions you may have before you get
started” gives the agent a chance to clarify anything that it doesn’t
understand instead of just making up answers as it goes. It’s also useful to
help you decide on details you might not otherwise have considered.

### Code

With the requirements, rules, design and tasks in place, kicking off the coding
part is easy:

```plaintext
Read the @tasks.md file and implement the first milestone.
```

You can watch the coding agent at work, jumping in to correct it along the way,
or just let it go. Either way, when it’s done, it’s time to check its work.

### Validate

At this point, you have some code and (in the world outside of samples) some
tests. To validate, ask yourself some questions:

- Does the analyzer show it to be free of errors? Of warnings?
- Does the app run?
- Does it have the features you asked for? Do they work?
- Do the tests pass?
- Does the code pass your review?

The answers to these questions form the input to the next phase.

### Iterate

Gather the issues that need to be addressed and hand the ones that need fixing
back to the coding agent, iterating between its coding and your validation until
you get to a good place from a functional point of view.

Now take another pass through validation from an architectural principles point
of view, spinning up a new agent to check the code. By clearing out the agent’s
context, you remove the biases the original agent gathered while choosing what
code to write in the first place. To ground it in just the code changes the
agent has just made, use a prompt like this:

```plaintext
Use git diff to find the new code and check it against the architectural principles listed here: @GEMINI.md. Make recommendations for important improvements.
```

Doing this a few times keeps the code in good shape for AI agents and humans
alike.

[requirements]:
{{site.github}}/flutter/demos/blob/main/crossword_companion/specs/requirements.md
[tasks-spec]:
{{site.github}}/flutter/demos/blob/main/crossword_companion/specs/tasks.md
61 changes: 61 additions & 0 deletions src/content/ai-best-practices/index.md
@@ -0,0 +1,61 @@
---
title: Flutter AI best practices
description: >
Learn best practices for building AI-powered Flutter apps using guardrails to
verify and correct AI-generated data.
next:
title: Prompting
path: /ai-best-practices/prompting
---


Flutter and AI go well together on multiple levels. If you’re using AI to
generate Flutter code, you only have to generate the code for a single app to
target multiple platforms. And if you’re harnessing Gemini to implement features
in your app, the Firebase AI Logic SDK makes that simple, with an easy-to-use
API, and secure, by keeping the API keys out of your code.

If you’re new to AI for either of these two use cases, you should know: as good
as it is (and the Gemini 3 Pro Preview is *very* good), AI still makes mistakes.
If you’re using AI to write your code, then you can use guardrails to keep AI on
track using tools like the Flutter analyzer and unit tests.

But what do you do when you’re using AI to implement the features in your app,
knowing that sometimes it’s going to get things wrong? Or, to quote a friend of
mine:

***Morgan’s Law***
*“Eventually, due to the nature of sampling from a probability distribution,
[AI] will fail to do the thing that must be done.”*
*–Brett Morgan, Flutter Developer Relations Engineer, July 2025.*

The good news is that, just as you can use developer tools to build guardrails
around the AI writing your code, you can use Flutter to build guardrails around
the AI you use to implement your features. The [Crossword Companion
app][crossword-app] was built to demonstrate these techniques.

<img
src="/assets/images/docs/ai-best-practices/crossword-companion-app-interface-showin.png"
alt="Crossword Companion app interface showing a 5-step setup process starting
with selecting a crossword image.">

The goal of the Crossword Companion app is not to help you cheat at
mini-crosswords – although it’s darn good at that – but to illustrate how to
channel the power of AI using Flutter. As an example, the first thing you do
when running the app is upload the screenshot of a mini-crossword puzzle. When
you press the **Next** button, the AI uses that image to infer the size,
contents and clues of the puzzle:

<img
src="/assets/images/docs/ai-best-practices/crossword-companion-app-showing-a-5x5-gr.png"
alt="Crossword Companion app showing a 5x5 grid with settings incorrectly
displaying 4 rows and 5 columns.">

Notice that while the crossword puzzle is a 5x5 grid, the AI says it’s 4x5.
Because we know that mistakes happen (apparently AIs are only human, too), we
built the app to allow the user to verify and correct the AI-generated data.
That’s important; bad data leads to bad results.

This write-up is not about the app in detail but rather about the best
practices to use when you’re building your own AI apps with Flutter. So let’s
get to it!



[crossword-app]: {{site.github}}/flutter/demos/tree/main/crossword_companion
108 changes: 108 additions & 0 deletions src/content/ai-best-practices/mode-of-interaction.md
@@ -0,0 +1,108 @@
---
title: Mode of interaction
description: >
Learn to balance LLM capabilities with traditional code and implement
guardrails to manage nondeterministic AI behavior.
prev:
title: Tool calls (aka function calls)
path: /ai-best-practices/tool-calls-aka-function-calls
next:
title: Developer experience
path: /ai-best-practices/developer-experience
---


It’s a mistake to think of a request to an LLM in the same way as calling a
function. Given the same set of inputs in the same order, a function acts
predictably. We can write tests and inject faults and harden a function for a
wide variety of inputs.

An LLM is not like that. A better way to think about it is as if the LLM were a
user, treating the data we get from it as such. Like a user, an LLM is
nondeterministic, often wrong (partially or wholly), and sometimes plain random.
To guard our apps under these conditions, we need to build the same guardrails
around LLM input as we do around user input.
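
To make that concrete, here's a minimal sketch in plain Dart of validating a
model's reply the same way you'd validate form input. The JSON shape, field
names, and size limits are hypothetical, not the sample app's actual API:

```dart
import 'dart:convert';

class GridInfo {
  GridInfo(this.rows, this.cols);
  final int rows;
  final int cols;
}

/// Returns null instead of throwing when the model's reply is malformed,
/// just as a form validator would reject bad user input.
GridInfo? parseGridReply(String reply) {
  Object? decoded;
  try {
    decoded = jsonDecode(reply);
  } on FormatException {
    return null; // Not JSON at all: a classic LLM failure mode.
  }
  if (decoded is! Map<String, Object?>) return null;
  final rows = decoded['rows'];
  final cols = decoded['cols'];
  if (rows is! int || cols is! int) return null;
  // Domain guardrail: the sample only accepts small puzzles.
  if (rows < 1 || rows > 10 || cols < 1 || cols > 10) return null;
  return GridInfo(rows, cols);
}

void main() {
  assert(parseGridReply('{"rows": 5, "cols": 5}') != null);
  assert(parseGridReply('Sure! The grid is 5x5.') == null);
  assert(parseGridReply('{"rows": 50, "cols": 5}') == null);
}
```

The point isn't the specific checks; it's that every path the model's output
can take through your app passes through validation first.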

If we can do that successfully, then we can bring extraordinary abilities to
apps in the form of problem solving and creativity that can rival that of a
human.

### Separation of concerns

LLMs are good at some things and bad at others; the key is to bring them into
your apps for the good while mitigating the bad. As an example, let’s consider
the task list in the Crossword Companion:

<img
src="/assets/images/docs/ai-best-practices/crossword-task-list-showing-solved-clues.png"
alt="Crossword task list showing solved clues in green with confidence
percentages and unsolved clues in red">

The task list is the set of clues that need solving. The goal is to use colors
and solutions in the task list to show progress during the solving process. The
initial implementation provided the model with a tool for managing the task
list, asking it to provide updates on progress as it went. Flash could not solve
the puzzle this way, but Pro could. Unfortunately, it solved it in big chunks,
only remembering to update the task list once or twice with a big delay in
between. No amount of prompting could convince it to update the tasks as it
went. You’ll see the same behavior with modern AI agents managing their own task
lists; that’s just where we are in the evolution of LLMs at the moment.

So how do we get consistent, deterministic updates of the task list? Take task
management out of the LLM’s hands and handle it in the code.
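
As a sketch of what that looks like, the loop below owns the task list and
updates it exactly once per clue; only `solveClue` talks to the model. The
names here are illustrative, not the Crossword Companion's actual code:

```dart
enum TaskState { pending, solved, failed }

class ClueTask {
  ClueTask(this.clue);
  final String clue;
  TaskState state = TaskState.pending;
  String? answer;
}

/// The code, not the model, owns the loop: every clue gets exactly one
/// status update, in order, no matter how the model behaves.
Future<void> solveAll(
  List<ClueTask> tasks,
  Future<String?> Function(String clue) solveClue,
  void Function(ClueTask task) onUpdate,
) async {
  for (final task in tasks) {
    final answer = await solveClue(task.clue);
    task
      ..answer = answer
      ..state = answer == null ? TaskState.failed : TaskState.solved;
    onUpdate(task); // The UI repaints here, once per clue, guaranteed.
  }
}

Future<void> main() async {
  final tasks = [ClueTask('Feline pet'), ClueTask('Opposite of up')];
  var updates = 0;
  await solveAll(
    tasks,
    (clue) async => clue == 'Feline pet' ? 'CAT' : null, // Fake solver.
    (_) => updates++,
  );
  assert(updates == tasks.length);
}
```

The model still does what it's good at (solving each clue), but progress
reporting is now deterministic because it never left your code.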

To generalize, before applying an LLM solution to a problem you’re facing, ask
yourself whether an LLM is the best tool for the job. Is human-like problem
solving and creativity worth the tradeoff in unpredictability?

The answer to that question comes with experimentation. Here are some examples
from the sample:

| Task | LLM Suitability | Code Suitability |
| ------------------------------------------------- | ------------------------------------------------------------ | --------------------------------------------------------------------------------- |
| **Parsing the grid for size, contents and clues** | Great for an LLM by using vision and language understanding | Difficult to write the code to do this |
| **Validating grid contents** | Possible to do with another LLM checking the work | Easier for a human to glance at and adjust |
| **Handling the task list** | An LLM is unlikely to do this consistently | Easy to write the code to loop through a task list, updating as it goes |
| **Solving each clue** | Great for an LLM using language understanding and generation | Difficult to do given real world clues that depend on word play, names, and slang |
| **Resolving conflicts** | An LLM is inconsistent on this kind of looping | Easy for a human to glance at and adjust |

It’s a judgment call for sure, but if you can reasonably write the code to do
it, your results will be predictable. However, if writing the code would be
unreasonably difficult, then consider an LLM, knowing you’ll have to build the
guardrails like we did in the sample.

### Ask vs agent

Code vs. LLM isn’t the only pivot to consider. Models operate in roughly two
modes: “ask” and “agent”.

An LLM is in “ask” mode when we prompt it without giving it tools to effect
change in the world – for example, no tools at all, or tools just for looking
up data. Both the crossword inference model and the clue solver model run in
ask mode, using tools only for additional data.

On the other hand, when we give an LLM a set of tools that allow it to operate
on our behalf in the world – like reading and writing files, executing bash
commands, loading web pages, calling web APIs, and so on – that LLM is in
“agent” mode.

### Guardrails

The difference between ask and agent mode is not the model you choose or the
prompts you give it, but the tools you supply. The combination of the tools and
the agentic loop described in the Tool calls section allows an LLM to call any
number of those tools as often as it decides. Giving it that power puts the
responsibility on you to treat it as unpredictable – more like a person than a
program.

You do that the same way that you validate user input, by building up a suite of
tests to see how your app works against LLM responses. Give real LLMs a wide
variety of prompts and mock the tools to evaluate how the LLM is using them.
Like your first user testing experience, your first LLM testing results might
surprise you. Use that data to build the guardrails you need to harden your app.
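
One way to start is with a fake tool that records every call, so a test can
assert how the model used it. The `RecordingTool` shape below is a hypothetical
sketch, not Firebase AI Logic's actual API:

```dart
typedef ToolArgs = Map<String, Object?>;

/// A fake tool that records its calls and returns canned data, so each
/// test run is reproducible and tool usage can be asserted afterward.
class RecordingTool {
  RecordingTool(this.name, this.cannedResult);
  final String name;
  final ToolArgs cannedResult;
  final List<ToolArgs> calls = [];

  ToolArgs call(ToolArgs args) {
    calls.add(Map.of(args));
    return cannedResult;
  }
}

void main() {
  final dictionary = RecordingTool('lookup_word', {'exists': true});

  // Stand-in for the agentic loop: the model asks for two lookups.
  dictionary.call({'word': 'CAT'});
  dictionary.call({'word': 'QZX'});

  // Guardrail checks: the tool was used, and only with expected arguments.
  assert(dictionary.calls.length == 2);
  assert(dictionary.calls.every((c) => c.containsKey('word')));
}
```

Wire a recording tool like this into real prompts against a real model, and
the recorded calls become the data that drives your guardrails.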

In the sample, we didn’t have to guard against harm, but we did have to guard
against imperfect results. Extensive testing against real-world data is what
led us to institute human-in-the-loop guards against solving an invalid puzzle
or accepting conflicting solutions. In this way, Flutter and Firebase AI Logic
make the perfect combination to harness the power of an LLM and bring unique
capabilities to your apps.
