# Add AI best practices and Crossword Companion docs #12853
---
title: Developer experience
description: >
  Learn how to use spec-driven development and Gemini to plan, code, and
  iterate on high-quality Flutter applications.
prev:
  title: Mode of interaction
  path: /ai-best-practices/mode-of-interaction
---
Generative AI is not just useful for implementing features in your app; it’s
also useful for generating the code to implement those features.

Unfortunately, it’s not as easy as prompting an AI coding agent to “build a
Flutter app that solves crossword puzzles.” I’m sure that prompt would yield
something, but I doubt very much that it would give us the powerful AI-assisted,
user-validated combination the Crossword Companion provides.
With better prompting, however, the sample app was implemented with Gemini 2.5
Pro for the bulk of the functionality and Gemini 3 Pro Preview to add the final
touches. The process to get the best results from both models was the same:

- Plan
- Code
- Validate
- Iterate
### Plan

The goal of the planning process is to kick off the coding process with enough
detail to let the agent know what you have in mind. The Crossword Companion
planning process was started with the following prompt:
```plaintext
I'd like to create a file called requirements.md in the plans folder at the root of the project. here's a description of the project:

The application will be an open-source sample hosted on GitHub in the flutter/demos directory. It aims to demonstrate the use of Flutter, Firebase AI Logic, and Gemini to produce an agentic workflow that can solve a small crossword puzzle (one with a size under 10x10)....lots more description of the app along with a sample puzzle screenshot...
Ask any questions you may have before you get started.
```
This prompt, with a little bit of Q&A, manual edits by a human, and some updates
during the coding process, yielded [the requirements file][requirements].

Before jumping into architectural design, the Gemini CLI was asked to initialize
the GEMINI.md rules file and then to update it with a list of architectural
principles:
```plaintext
DRY (Don’t Repeat Yourself) – eliminate duplicated logic by extracting shared utilities and modules.

Separation of Concerns – each module should handle one distinct responsibility.

Single Responsibility Principle (SRP) – every class/module/function/file should have exactly one reason to change.

Clear Abstractions & Contracts – expose intent through small, stable interfaces and hide implementation details.

Low Coupling, High Cohesion – keep modules self-contained, minimize cross-dependencies.

Scalability & Statelessness – design components to scale horizontally and prefer stateless services when possible.

Observability & Testability – build in logging, metrics, tracing, and ensure components can be unit/integration tested.

KISS (Keep It Simple, Sir) - keep solutions as simple as possible.

YAGNI (You're Not Gonna Need It) – avoid speculative complexity or over-engineering.
```
The GEMINI.md file is loaded into every new prompt you create with Gemini; it
provides the set of rules you want it to remember for any activity. Gemini was
running inside of an empty Flutter app project, so the `/init` command
documented how to build, test and run it, which was useful during coding.

If you’re building something more than a sample, I also recommend adding
something for test-driven development:
```markdown
- **TDD (Test-Driven Development)** - write the tests first; the implementation
  code isn't done until the tests pass.
```

This helps to build guardrails to ensure the coding agent is writing solid code
over time.
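As a minimal sketch of what such a TDD guardrail can look like in practice (a hypothetical example, not code from the Crossword Companion), the checks come first and the function isn’t done until they all pass:

```dart
// Hypothetical example: under TDD, these checks are written before the
// function body exists, and the implementation isn't complete until
// they all pass.
bool isValidGridSize(int rows, int cols) {
  // This sample targets mini-crosswords under 10x10.
  return rows > 0 && cols > 0 && rows < 10 && cols < 10;
}

void main() {
  final checks = {
    'accepts 5x5': isValidGridSize(5, 5),
    'rejects zero rows': !isValidGridSize(0, 5),
    'rejects 12 rows': !isValidGridSize(12, 5),
  };
  checks.forEach((name, passed) {
    if (!passed) throw StateError('failed: $name');
  });
  print('all checks pass'); // prints only if every check holds
}
```

In a real project these checks would live in the `test/` folder using `package:test`, which is what the agent runs when a task calls for creating and running tests.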
With the requirements and rules in place, prompting for the design.md file was
next:

```plaintext
great. i'd like to work on the design with you to be created in a design.md file to be stored in the plans folder. please use the @GEMINI.md and @requirements.md files as input. ask any questions you may have before you get started.
```
After inspecting and editing the generated app design, Gemini was prompted to
break it down into [tasks][tasks-spec]:

```plaintext
please read the files in the @specs folder and create a corresponding tasks.md file in the same folder that lays out a set of tasks and subtasks representing the functionality of this app. lay out the top-level tasks as minimal new functionality that the user can see in the running app, step-by-step as each top-level task is completed. each top-level task should include sub-tasks for creating and running tests and updating the @README.md with a description of the current functionality of the app. ask any questions you may have before you get started.
```
All of this happens before any code is written. You don’t have to split things
into separate files, but by carefully considering the requirements, the design
and the task breakdown, you’re helping the agent to provide results that meet
your expectations. This is called “Spec-Driven Development” and it’s currently
the best way we know of to upgrade your process from “vibe coding” to
“AI-assisted software development.”

Also, the sentence that says “ask any questions you may have before you get
started” is a great way for the agent to clarify anything that it doesn’t
understand instead of just making up the answers as it goes. It’s also useful to
help you to decide on details you might not otherwise have considered.
### Code

With the requirements, rules, design and tasks in place, kicking off the coding
part is easy:

```plaintext
Read the @tasks.md file and implement the first milestone.
```

You can watch the coding agent at work, jumping in to correct it as it works, or
just let it go. Either way, when it’s done, it’s time to check its work.
### Validate

At this point, you have some code and (in the world outside of samples) some
tests. To validate, ask yourself some questions:

- Does the analyzer show it to be free of errors? Of warnings?
- Does the app run?
- Does it have the features you asked for? Do they work?
- Do the tests pass?
- Does the code pass your review?

The answers to these questions form the input to the next phase.
### Iterate

Gather the issues that need to be addressed and hand the ones that need fixing
back to the coding agent, iterating between it coding and your validation until
you get to a good place from a functional point of view.

Now take another pass through validation from an architectural principles point
of view, spinning up a new agent to check the code. By clearing out the agent’s
context, you remove the biases the original agent gathered choosing what code to
write in the first place. To ground it on just the code changes the agent has
just made, use a prompt like this:

```plaintext
Use git diff to find the new code and check it against the architectural principles listed here: @GEMINI.md. Make recommendations for important improvements.
```

Doing this a few times keeps the code in good shape for AI agents and humans
alike.
[requirements]: {{site.github}}/flutter/demos/blob/main/crossword_companion/specs/requirements.md
[tasks-spec]: {{site.github}}/flutter/demos/blob/main/crossword_companion/specs/tasks.md
---
title: Flutter AI best practices
description: >
  Learn best practices for building AI-powered Flutter apps using guardrails to
  verify and correct AI-generated data.
next:
  title: Prompting
  path: /ai-best-practices/prompting
---
Flutter and AI go well together on multiple levels. If you’re using AI to
generate Flutter code, you only have to generate the code for a single app to
target multiple platforms. And if you’re harnessing Gemini to implement features
in your app, the Firebase AI Logic SDK makes that simple, with an easy-to-use
API, and secure, by keeping the API keys out of your code.

If you’re new to AI for either of these two use cases, you should know: as good
as it is (and the Gemini 3 Pro Preview is *very* good), AI still makes mistakes.
If you’re using AI to write your code, then you can use guardrails to keep AI on
track using tools like the Flutter analyzer and unit tests.
But what do you do when you’re using AI to implement the features in your app,
knowing that sometimes it’s going to get things wrong? Or, to quote a friend of
mine:

***Morgan’s Law***
*“Eventually, due to the nature of sampling from a probability distribution,
[AI] will fail to do the thing that must be done.”*
*–Brett Morgan, Flutter Developer Relations Engineer, July 2025.*
The good news is that, just as you can use developer tools to build guardrails
around the AI writing your code, you can use Flutter to build guardrails around
the AI you use to implement your features. The [Crossword Companion
app][crossword-app] was built to demonstrate these techniques.

<img
  src="/assets/images/docs/ai-best-practices/crossword-companion-app-interface-showin.png"
  alt="Crossword Companion app interface showing a 5-step setup process starting
  with selecting a crossword image.">

The goal of the Crossword Companion app is not to help you cheat at
mini-crosswords – although it’s darn good at that – but to illustrate how to
channel the power of AI using Flutter. As an example, the first thing you do
when running the app is upload the screenshot of a mini-crossword puzzle. When
you press the **Next** button, the AI uses that image to infer the size,
contents and clues of the puzzle:

<img
  src="/assets/images/docs/ai-best-practices/crossword-companion-app-showing-a-5x5-gr.png"
  alt="Crossword Companion app showing a 5x5 grid with settings incorrectly
  displaying 4 rows and 5 columns.">

Notice that while the crossword puzzle is a 5x5 grid, the AI says it’s 4x5.
Because we know that mistakes happen (apparently AIs are only human, too), we
built the app to allow the user to verify and correct the AI-generated data.
That’s important; bad data leads to bad results.

This write-up is not about the app in detail but rather about the best
practices to use when you’re building your own AI apps with Flutter. So let’s
get to it!
[crossword-app]: {{site.github}}/flutter/demos/tree/main/crossword_companion
---
title: Mode of interaction
description: >
  Learn to balance LLM capabilities with traditional code and implement
  guardrails to manage nondeterministic AI behavior.
prev:
  title: Tool calls (aka function calls)
  path: /ai-best-practices/tool-calls-aka-function-calls
next:
  title: Developer experience
  path: /ai-best-practices/developer-experience
---
It’s a mistake to think of a request to an LLM in the same way as calling a
function. Given the same set of inputs in the same order, a function acts
predictably. We can write tests and inject faults and harden a function for a
wide variety of inputs.

An LLM is not like that. A better way to think about it is as if the LLM were a
user and to treat the data we get from them as such. Like a user, an LLM is
nondeterministic, often wrong (partially or wholly) and sometimes plain random.
To guard our apps under these conditions, we need to build the same guardrails
around LLM input as we do around user input.

If we can do that successfully, then we can bring extraordinary abilities to
apps in the form of problem solving and creativity that can rival that of a
human.
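To make “guardrails around LLM input” concrete, here’s a minimal sketch of treating an LLM response the way you’d treat user input: parse defensively and reject anything malformed or out of range. The `rows` and `cols` field names are illustrative assumptions, not the app’s actual schema.

```dart
import 'dart:convert';

// Treat LLM output like user input: parse defensively, validate ranges,
// and return null rather than trusting a malformed reply. The 'rows' and
// 'cols' field names are illustrative, not the app's real schema.
({int rows, int cols})? parseGridSize(String llmResponse) {
  try {
    final data = jsonDecode(llmResponse);
    if (data is! Map) return null;
    final rows = data['rows'];
    final cols = data['cols'];
    if (rows is! int || cols is! int) return null;
    // Mini-crosswords in this sample are under 10x10.
    if (rows <= 0 || cols <= 0 || rows >= 10 || cols >= 10) return null;
    return (rows: rows, cols: cols);
  } on FormatException {
    // Like a user, an LLM can send back anything at all.
    return null;
  }
}

void main() {
  print(parseGridSize('{"rows": 5, "cols": 5}') != null); // true
  print(parseGridSize('Sure! The grid is 5x5.') != null); // false
  print(parseGridSize('{"rows": 40, "cols": 5}') != null); // false
}
```

Validation in code catches the malformed cases; letting the user review the parsed values, as the Crossword Companion does, catches the plausible-but-wrong ones.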
### Separation of concerns

LLMs are good at some things and bad at others; the key is to bring them into
your apps for the good while mitigating the bad. As an example, let’s consider
the task list in the Crossword Companion:

<img
  src="/assets/images/docs/ai-best-practices/crossword-task-list-showing-solved-clues.png"
  alt="Crossword task list showing solved clues in green with confidence
  percentages and unsolved clues in red">
The task list is the set of clues that need solving. The goal is to use colors
and solutions in the task list to show progress during the solving process. The
initial implementation provided the model with a tool for managing the task
list, asking it to provide updates on progress as it went. Flash could not solve
the puzzle this way, but Pro could. Unfortunately, it solved it in big chunks,
only remembering to update the task list once or twice with a big delay in
between. No amount of prompting could convince it to update the tasks as it
went. You’ll see the same behavior with modern AI agents managing their own task
lists; that’s just where we are in the evolution of LLMs at the moment.

So how do we get consistent, deterministic updates of the task list? Take task
management out of the LLM’s hands and handle it in the code.
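A sketch of what that can look like (the class and method names are illustrative assumptions, not the app’s actual code): the task list is plain Dart state, and the tool-call handler updates it every time the model reports a solution, so progress stays current no matter how the model batches its work.

```dart
// Illustrative sketch: the task list is owned by app code, not by the LLM.
class ClueTask {
  final String clue;
  String? answer;
  ClueTask(this.clue);
  bool get solved => answer != null;
}

class TaskList {
  final List<ClueTask> tasks;
  TaskList(Iterable<String> clues)
      : tasks = [for (final clue in clues) ClueTask(clue)];

  // Called from the tool-call handler each time the model reports a
  // solution, so every update is deterministic and immediate.
  void recordSolution(String clue, String answer) {
    for (final task in tasks.where((t) => t.clue == clue)) {
      task.answer = answer;
    }
  }

  int get solvedCount => tasks.where((t) => t.solved).length;
}

void main() {
  final list = TaskList(['1A: Feline pet', '1D: Opposite of cold']);
  list.recordSolution('1A: Feline pet', 'CAT');
  print('${list.solvedCount}/${list.tasks.length} solved'); // 1/2 solved
}
```

In a real Flutter app, a structure like this would live in something like a `ChangeNotifier` so the task-list widgets rebuild on every update.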
To generalize, before applying an LLM solution to a problem you’re facing, ask
yourself whether an LLM is the best tool for the job. Is human-like problem
solving and creativity worth the tradeoff in unpredictability?