Skip to content

Conversation

@clangenb
Copy link

@clangenb clangenb commented Feb 28, 2025

So our goal is to prevent unnecessary restarts as our restarts incur minimal offline time, which we absolutely want to prevent. Hence, this PR introduces the following optimizations:

  • Only redeploy the stack iff the git revision and the particular stack's docker compose file have changed.
  • Additionally, we persist both into an SQL light DB such that a restart of swarm cd won't cause a restart of all stacks.

@clangenb clangenb changed the title Persist current revision to a litesql db and only update the stack if the revision changed Persist current revision to a SQL-lite db and only update the stack if the revision changed Feb 28, 2025
Copy link
Collaborator

@sanzoghenzo sanzoghenzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

I've added my personal opinions on this: functionally it is OK (though I still have to test it), but I would reword some things.

@clangenb
Copy link
Author

clangenb commented Mar 4, 2025

Cool, thanks a lot for taking a look at it! I have added your requested changes. 👍

Copy link
Collaborator

@sanzoghenzo sanzoghenzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the edits!
these are my last ramblings, I'll let you decide if they make sense 😉

@clangenb
Copy link
Author

clangenb commented Mar 6, 2025

Thanks again for the detailed review. I have added the changes according to your feedback. 🚀

Copy link
Collaborator

@sanzoghenzo sanzoghenzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for being so pedantic, but I ask you to do one last change to the docs to be as explicit as possible.

One thing I'm wondering just now, though: is it important to persist the database to disk?
Couldn't we just use an in-memory instance of the sqlite db?
I mean, the added complexity of the storage on disk will only save you a few re-deployment when you restart swarm-cd (which should be a zero operation since nothing changed), while the big benefit from this (as I see it) is to avoid the re-deployment every two minutes.

@clangenb
Copy link
Author

clangenb commented Mar 7, 2025

sorry for being so pedantic, but I ask you to do one last change to the docs to be as explicit as possible.

One thing I'm wondering just now, though: is it important to persist the database to disk? Couldn't we just use an in-memory instance of the sqlite db? I mean, the added complexity of the storage on disk will only save you a few re-deployment when you restart swarm-cd (which should be a zero operation since nothing changed), while the big benefit from this (as I see it) is to avoid the re-deployment every two minutes.

No worries, being pedantic creates better code. In our case, we explicitly want to avoid redeployments due to restarts of swarm, as it will cause an unnecessary restart of the deployed service, which does incur minimal downtime - sometimes at unpredictable times.

However, maybe we could maybe disable that feature by default, and setting the env var will enable it. What do you think?

@sanzoghenzo
Copy link
Collaborator

as it will cause an unnecessary restart of the deployed service

Oh, now I'm the non-pro in this domain 😅 I thoight that re-issuing a docker stack deploy with the same exact file wouldn't result in a redeploy...

I see that there's the --resolve-image flag that defaults to always, and can be set to changed to only recreate the container if the image hash changes (be it a new tag or an update of the same tag)... maybe it can be enough to avoid or at least reduce the restarts?

This is not to turn down your contribution, it is just to analyze the situation and make it as efficient as possible (event if this is a small, simple addition, no code is the best code 🤣 )

@clangenb
Copy link
Author

clangenb commented Mar 8, 2025

hahaha, no worries. I am equally interested in finding the best aka most slim solution here. Let's have @mosonyi chime in here. He is the pro, and if he agrees that your suggestion might fix our issue, then I happy.

@clangenb
Copy link
Author

clangenb commented Mar 22, 2025

Thanks to your pedantic feedback the code has been improved much more. :) It should be ready for another round of review.

Copy link
Collaborator

@sanzoghenzo sanzoghenzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

Only one note on the updateStack function: the logger is initialized to show the stack name as extra parameter, so there's no need to add it to the message... but it doesn't hurt, either 😉

@clangenb
Copy link
Author

This looks great!

Only one note on the updateStack function: the logger is initialized to show the stack name as extra parameter, so there's no need to add it to the message... but it doesn't hurt, either 😉

Thanks a lot for the continuously reat feedback! Yeah, I noticed that, but I find it hard to see to which stack the logs belong to, as the logger appends the stack name instead of prepending. This is why I did it like that.

If you have a better idea, I am happy to listen. 😄

@sanzoghenzo
Copy link
Collaborator

I find it hard to see to which stack the logs belong to, as the logger appends the stack name instead of prepending. This is why I did it like that.

If you have a better idea, I am happy to listen. 😄

You're right, it is more readable, and I don't know go that well to propose better ways.

Now that all is good, I'll try to give it a spin before merging

@sanzoghenzo
Copy link
Collaborator

I'm sorry I didn't have the time to test this, and I don't believe I can spend time on this project for a while...
@m-adawi should I go ahead and merge it anyway, or would you like to give it a spin first?

Copy link
Collaborator

@sanzoghenzo sanzoghenzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi there, sorry for the radio silence!

I finally got some time to test it, but I realized there still are some things I would do differently, mainly to concentrate the configuration in a single place and to avoid high coupling in the code.

Comment on lines +41 to +46
func getDBFilePath() string {
if path := os.Getenv("SWARMCD_DB"); path != "" {
return path
}
return "/data/revisions.db" // Default path
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this to the config module?

viper could also handle environment variables, but it is not enabled yet.
It should be a matter of adding the db field to the Config struct and the following lines to LoadConfigs

configViper.SetEnvPrefix("SWARMCD")
configViper.AutomaticEnv()
configViper.SetDefault("db", "/data/revisions.db")

Comment on lines +52 to +56
stackDb, err := initStackDB(getDBFilePath(), swarmStack.name)
if err != nil {
return nil, fmt.Errorf("failed to open database: %s", err)
}
defer stackDb.close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something here, but isn't it better to initialize the database when swarmcd starts and just pass the db to updateStack? This will be a first step towards dependency injection to perform test with a fake database, or simplify the swap to another DB if needed.

Comment on lines +48 to +73
type stackDB struct {
db *sql.DB
stackName string
}

// Ensure database and table exist
func initStackDB(dbFile string, stackName string) (*stackDB, error) {
db, err := initSqlDB(dbFile)
if err != nil {
return nil, err
}

return &stackDB{db: db, stackName: stackName}, nil
}

func (stackDb *stackDB) saveLastDeployedMetadata(stackMetadata *stackMetadata) error {
return saveLastDeployedMetadata(stackDb.db, stackDb.stackName, stackMetadata)
}

func (stackDb *stackDB) loadLastDeployedMetadata() (*stackMetadata, error) {
return loadLastDeployedMetadata(stackDb.db, stackDb.stackName)
}

func (stackDb *stackDB) close() error {
return stackDb.db.Close()
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is really needed, it seems to me that it adds complexity just to avoid repeating the name of the stack on 3 LOC

@clangenb
Copy link
Author

Hey there, thanks for looking into this. I have put in some considerable effort into this, which goes beyond this PR.

I guess, it is better to extract some new PRs based on my current state if there is still interest in this. :)

@sanzoghenzo
Copy link
Collaborator

sanzoghenzo commented Oct 21, 2025

Hey there, thanks for looking into this. I have put in some considerable effort into this, which goes beyond this PR.

Oh wow, now I see the other branches on your fork.. you've done a tremendous job!
Again, sorry for ignoring this 😅

I guess, it is better to extract some new PRs based on my current state if there is still interest in this. :)

Of course! Should we close this and start with a new PR? Or could you push your latest edits on this branch? (We will sqash everything in a single feature commit, so the lengthy commit history is not a problem!)

@vhaudiquet
Copy link
Contributor

This should fix #126, and I'm really grateful for your work ! :)

@clangenb
Copy link
Author

Sorry I was really busy the last two weeks. I will try to follow up soon. I guess, I will do separate PR, but keep this open for visibility until the new one is in place.

@vhaudiquet
Copy link
Contributor

vhaudiquet commented Nov 18, 2025

@clangenb Any updates on this? I would really love to try that! Is the version on your branch ready or does it need more work? I could build it myself in the mean time

@clangenb
Copy link
Author

Thanks for remindinding me. I should have time this today. 😃

@clangenb
Copy link
Author

clangenb commented Nov 20, 2025

Sorry, I could not use the time window I thought that I have yesterday. But I invite you to check downstream's develop branch: https://github.com/clangenb/swarm-cd/tree/develop

I have been using it in production since a year now without any issues. It contains this fix, and also quite some other improvements.

@vhaudiquet
Copy link
Contributor

I have to say, thank you! I've been running your branch for a week now and it has been wonderful.
The only thing I miss is that swarm-cd used to automatically update images that I tagged :latest, which is not the case anymore. I've heard that this wasn't really a feature, and wasn't really reliable anyway, so I won't mind losing that for now. I guess that could be fully implemented as a feature later on, if needed.

@clangenb
Copy link
Author

Cool, glad to hear that! I will still try to upstream this eventually, but it could be that it needs to wait until the Christmastime.

Yeah, the latest has always been kind of tricky. It should be possible to implement though, although I would still prefer to have an explicit re-pull of the image in my case. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants