Add AI best practices and Crossword Companion docs #12853
---
title: Developer experience
description: >
  Learn how to use spec-driven development and Gemini to plan, code, and
  iterate on high-quality Flutter applications.
prev:
  title: Mode of interaction
  path: /ai-best-practices/mode-of-interaction
---
Generative AI is not just useful for implementing features in your app; it’s also useful for generating the code to implement those features.
Unfortunately, it’s not as easy as prompting an AI coding agent to
“build a Flutter app that solves crossword puzzles.”
I’m sure that prompt would yield something,
but I doubt very much that it would give us the powerful AI-assisted,
user-validated combination the Crossword Companion provides.
---
title: Flutter AI best practices
description: >
  Learn best practices for building AI-powered Flutter apps using guardrails
  to verify and correct AI-generated data.
next:
  title: Prompting
  path: /ai-best-practices/prompting
---
Flutter and AI go well together on multiple levels. If you’re using AI to generate Flutter code, you only have to generate the code for a single app to target multiple platforms. And if you’re harnessing Gemini to implement features in your app, the Firebase AI Logic SDK makes that simple, with an easy-to-use API, and secure, by keeping the API keys out of your code.
If you’re new to AI for either of these two use cases, you should know: as good as it is (and the Gemini 3 Pro Preview is *very* good), AI still makes mistakes. If you’re using AI to write your code, you can use guardrails to keep it on track with tools like the Flutter analyzer and unit tests.
But what do you do when you’re using AI to implement the features in your app, knowing that sometimes it’s going to get things wrong? Or, to quote a friend of mine:
***Morgan’s Law***
*“Eventually, due to the nature of sampling from a probability distribution, [AI] will fail to do the thing that must be done.”*
*–Brett Morgan, Flutter Developer Relations Engineer, July 2025.*
The good news is that, just as you can use developer tools to build guardrails around the AI writing your code, you can use Flutter to build guardrails around the AI you use to implement your features. The [Crossword Companion app][crossword-app] was built to demonstrate these techniques.

<img src="/assets/images/docs/ai-best-practices/crossword-companion-app-interface-showin.png" alt="Crossword Companion app interface showing a 5-step setup process starting with selecting a crossword image.">

The goal of the Crossword Companion app is not to help you cheat at mini-crosswords – although it’s darn good at that – but to illustrate how to channel the power of AI using Flutter. As an example, the first thing you do when running the app is upload a screenshot of a mini-crossword puzzle. When you press the **Next** button, the AI uses that image to infer the size, contents, and clues of the puzzle:

<img src="/assets/images/docs/ai-best-practices/crossword-companion-app-showing-a-5x5-gr.png" alt="Crossword Companion app showing a 5x5 grid with settings incorrectly displaying 4 rows and 5 columns.">

Notice that while the crossword puzzle is a 5x5 grid, the AI says it’s 4x5. Because we know that mistakes happen (apparently AIs are only human, too), we built the app to allow the user to verify and correct the AI-generated data. That’s important; bad data leads to bad results.
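That verify-and-correct idea can be sketched in plain Dart. This is an illustrative sketch, not the app's actual code: `GridInfo`, `validateGrid`, and the cell layout are hypothetical names, and the point is simply to cross-check the AI-inferred dimensions against the data the model actually returned before trusting them.

```dart
// Hypothetical types for illustration; not the Crossword Companion's API.
class GridInfo {
  GridInfo({required this.rows, required this.columns});
  int rows;
  int columns;
}

/// Returns null if the AI-inferred dimensions match the returned cells;
/// otherwise returns corrected dimensions for the user to confirm.
GridInfo? validateGrid(GridInfo inferred, List<List<String>> cells) {
  final actualRows = cells.length;
  final actualColumns = cells.isEmpty ? 0 : cells.first.length;
  if (inferred.rows == actualRows && inferred.columns == actualColumns) {
    return null; // The AI got it right; nothing to correct.
  }
  return GridInfo(rows: actualRows, columns: actualColumns);
}
```

A UI built on a check like this might pre-populate the corrected values in the settings screen for the user to confirm, rather than silently trusting either the inferred size or the cell data.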
This write-up is not about the app in detail, but rather about the best practices to use when you’re building your own AI apps with Flutter. So let’s get to it!
[crossword-app]: {{site.github}}/flutter/demos/tree/main/crossword_companion
---
title: Mode of interaction
description: >
  Learn to balance LLM capabilities with traditional code and implement
  guardrails to manage nondeterministic AI behavior.
prev:
  title: Tool calls (aka function calls)
  path: /ai-best-practices/tool-calls-aka-function-calls
next:
  title: Developer experience
  path: /ai-best-practices/developer-experience
---
It’s a mistake to think of a request to an LLM the same way as a function call. Given the same set of inputs in the same order, a function acts predictably. We can write tests, inject faults, and harden a function for a wide variety of inputs.
An LLM is not like that. A better way to think about an LLM is as if it were a user, and to treat the data we get from it accordingly. Like a user, an LLM is nondeterministic, often wrong (partially or wholly), and sometimes plain random. To protect our apps under these conditions, we need to build the same guardrails around LLM input as we do around user input.
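Treating an LLM reply like form input can be sketched in a few lines of Dart. The reply shape here is an assumption for illustration: suppose the clue-solver model was asked to return JSON like `{"answer": "RAVEN", "confidence": 0.9}`. Like a user, it might wrap the JSON in markdown fences, add prose, or return the wrong types, so the parser rejects anything out of bounds instead of trusting it.

```dart
import 'dart:convert';

/// Parses a solver reply defensively, the way you'd validate user input.
/// Returns null for anything malformed rather than letting bad data through.
({String answer, double confidence})? parseSolverReply(String raw) {
  // Strip the optional ```json fences models sometimes add.
  final cleaned = raw
      .replaceAll(RegExp(r'^```(json)?', multiLine: true), '')
      .replaceAll('```', '')
      .trim();
  try {
    final decoded = jsonDecode(cleaned);
    if (decoded is! Map<String, dynamic>) return null;
    final answer = decoded['answer'];
    final confidence = decoded['confidence'];
    if (answer is! String || answer.isEmpty) return null;
    if (confidence is! num || confidence < 0 || confidence > 1) return null;
    return (answer: answer.toUpperCase(), confidence: confidence.toDouble());
  } on FormatException {
    return null; // Malformed JSON: reject, just like bad user input.
  }
}
```

A null result feeds the same error path a bad form submission would: re-prompt the model, or surface the failure to the user.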
If we can do that successfully, then we can bring extraordinary abilities to our apps in the form of problem solving and creativity that can rival that of a human.
### Separation of concerns
LLMs are good at some things and bad at others; the key is to bring them into your apps for the good while mitigating the bad. As an example, let’s consider the task list in the Crossword Companion:
<img src="/assets/images/docs/ai-best-practices/crossword-task-list-showing-solved-clues.png" alt="Crossword task list showing solved clues in green with confidence percentages and unsolved clues in red">
The task list is the set of clues that need solving. The goal is to use colors and solutions in the task list to show progress during the solving process. The initial implementation provided the model with a tool for managing the task list, asking it to report its progress as it went. Flash could not solve the puzzle this way, but Pro could. Unfortunately, it solved the puzzle in big chunks, only remembering to update the task list once or twice, with a big delay in between. No amount of prompting could convince it to update the tasks as it went. You’ll see the same behavior with modern AI agents managing their own task lists; that’s just where we are in the evolution of LLMs at the moment.
So how do we get consistent, deterministic updates of the task list? Take task management out of the LLM’s hands and handle it in code.
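The shape of that fix looks something like the following sketch. `Clue`, `ClueStatus`, and the callbacks are illustrative stand-ins, not the app's actual code: the code owns the loop, makes one solver call per clue, and updates the list deterministically after every call, so the UI never depends on the model remembering to report progress.

```dart
// Illustrative names; not the Crossword Companion's actual API.
enum ClueStatus { pending, solved, failed }

class Clue {
  Clue(this.text);
  final String text;
  ClueStatus status = ClueStatus.pending;
  String? answer;
}

/// The code, not the LLM, drives the task list: one solver call per clue,
/// with a guaranteed progress update after each one.
Future<void> solveAll(
  List<Clue> clues,
  Future<String?> Function(String clue) solveClue, // one LLM call per clue
  void Function() onProgress, // e.g. notifyListeners() in a ChangeNotifier
) async {
  for (final clue in clues) {
    final answer = await solveClue(clue.text);
    clue.answer = answer;
    clue.status = answer == null ? ClueStatus.failed : ClueStatus.solved;
    onProgress(); // The UI updates after every clue, guaranteed.
  }
}
```

The model is still doing the creative work (solving each clue), but the bookkeeping it was unreliable at is now plain, testable code.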
To generalize: before applying an LLM solution to a problem you’re facing, ask yourself whether an LLM is the best tool for the job. Is human-like problem solving and creativity worth the tradeoff in unpredictability?
The answer to that question comes with experimentation. Here are some examples from the sample:
| Task | LLM suitability | Code suitability |
|---|---|---|
| **Parsing the grid for size, contents, and clues** | Great for an LLM, using vision and language understanding | Difficult to write the code to do this |
| **Validating grid contents** | Possible with another LLM checking the work | Easier for a human to glance at and adjust |
| **Handling the task list** | An LLM is unlikely to do this consistently | Easy to write code that loops through a task list, updating as it goes |
| **Solving each clue** | Great for an LLM, using language understanding and generation | Difficult to do given real-world clues that depend on wordplay, names, and slang |
| **Resolving conflicts** | An LLM is inconsistent at this kind of looping | Easy for a human to glance at and adjust |
It’s a judgment call for sure, but if you can reasonably write the code to do something, your results will be predictable. However, if writing the code would be unreasonably difficult, then consider an LLM, knowing you’ll have to build guardrails like we did in the sample.
### Ask vs agent
Code vs. LLM isn’t the only pivot to consider. Models operate in roughly two modes: “ask” and “agent”.
An LLM is in “ask” mode when we prompt it without giving it tools to effect change in the world: no tools at all, or tools only for looking up data. Both the crossword inference model and the clue-solver model run in ask mode, using tools only for additional data.
On the other hand, when we give an LLM a set of tools that allow it to operate on our behalf in the world – like reading and writing files, executing bash commands, loading web pages, calling web APIs, and so on – that LLM is in “agent” mode.
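A bare-bones sketch of what agent mode implies follows. `askModel`, `ModelTurn`, and the tool map are hypothetical stand-ins for whatever your SDK (Firebase AI Logic, for example) provides; the point is the loop: the model may request any tool, any number of times, until it produces an answer, so the loop itself needs guardrails like a step budget and an unknown-tool check.

```dart
// Hypothetical agentic-loop skeleton; adapt it to your SDK's actual types.
sealed class ModelTurn {}

class FinalAnswer extends ModelTurn {
  FinalAnswer(this.text);
  final String text;
}

class ToolCall extends ModelTurn {
  ToolCall(this.name, this.args);
  final String name;
  final Map<String, Object?> args;
}

Future<String> runAgent(
  Future<ModelTurn> Function(List<String> history) askModel,
  Map<String, Future<String> Function(Map<String, Object?>)> tools,
  String prompt, {
  int maxSteps = 10, // guardrail: never let the loop run unbounded
}) async {
  final history = <String>[prompt];
  for (var step = 0; step < maxSteps; step++) {
    switch (await askModel(history)) {
      case FinalAnswer(text: final text):
        return text;
      case ToolCall(name: final name, args: final args):
        final tool = tools[name];
        // Guardrail: the model may name a tool that doesn't exist.
        final result =
            tool == null ? 'error: unknown tool "$name"' : await tool(args);
        history.add('tool $name returned: $result');
    }
  }
  throw StateError('Agent exceeded $maxSteps steps');
}
```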
### Guardrails
The difference between ask and agent mode is not the model you choose or the prompts you give it, but the tools you supply. The combination of the tools and the agentic loop described in the Tool calls section allows an LLM to call any number of those tools as often as it decides. Giving it that power puts the responsibility on you to treat it as unpredictable: more like a person than a program.
You do that the same way you validate user input: by building up a suite of tests to see how your app behaves against LLM responses. Give real LLMs a wide variety of prompts and mock the tools to evaluate how the LLM is using them. Like your first user-testing experience, your first LLM testing results might surprise you. Use that data to build the guardrails you need to harden your app.
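One way to mock a tool for this kind of testing is a fake that records every call, so the test can assert on how the model actually used it. This is a minimal sketch with illustrative names, not the sample's test code:

```dart
/// A fake tool that records every call and returns a canned reply,
/// so tests can assert on how the model used it.
class RecordingTool {
  final calls = <Map<String, Object?>>[];

  Future<String> call(Map<String, Object?> args) async {
    calls.add(args);
    return '{"definition": "stubbed lookup result"}'; // canned reply
  }
}
```

In a test, you'd wire `RecordingTool.call` in as the tool implementation, run a real prompt through the model, and then assert on `calls`: that the tool was invoked at all, with sensible arguments, and within a call budget.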
In the sample, we didn’t have to guard against harm, but we did have to guard against imperfect results. Extensive testing against real-world data led to human-in-the-loop guards against attempting to solve an invalid puzzle or accepting conflicting solutions. In this way, Flutter and Firebase AI Logic make the perfect combination to harness the power of an LLM and bring unique capabilities to your apps.