rfc: add initial draft of pcap-cf #1309
Co-Authored-By: Claude <[email protected]>
A few comments and points of discussion.
General:
- I'm missing the "opt-in" nature and control of the feature and a discussion of possible security risks when this feature is enabled overall. The "attacker" in this case is a rogue operator and that might be more of an organizational problem. But the app's operator shouldn't be able to override an org policy, etc.
- Because this is an analog to bosh pcap, maybe we can explain the functionality and one of its main benefits (multiplexing of traffic captured across multiple instances) a bit earlier? Right now it's in the very last sentence of the very last section (before the references). If it's important it should be repeated.
Application developers frequently need to perform in-depth troubleshooting
of their applications when deployed via Cloud Foundry buildpacks. Currently,
there is no possibility for app developers to perform privileged debugging
actions such as packet captures (tcpdump) within their application
containers.
The lack of possibility to inspect network traffic (see comment below) is the core problem, because that requires elevated privileges, at least for network analysis capabilities.
So maybe the next paragraph could be the start and we end with: "Currently this is not possible as such network analysis requires elevated privileges, which cannot be gained." (or something like that)
I'm not sure what you want me to change. Is the first sentence of the paragraph the issue as it mentions just generic troubleshooting whereas the RFC should focus on the network part?
this comment was mostly together with the next paragraph that you want to drop now. So combined it could be something like this:
- Application developers frequently need to perform in-depth troubleshooting
- of their applications when deployed via Cloud Foundry buildpacks. Currently,
- there is no possibility for app developers to perform privileged debugging
- actions such as packet captures (tcpdump) within their application
- containers.
+ Application developers frequently need to perform in-depth troubleshooting
+ of their applications when deployed via Cloud Foundry buildpacks. Network
+ analysis tasks, such as packet captures (tcpdump), connectivity checks and
+ performance checks, require elevated privileges as the captured data may be sensitive.
+ Currently, there is no possibility for app developers to perform any privileged
+ actions within their application containers, which also excludes such network analysis tasks.
While [RFC-0019 (pcap-bosh)](rfc-0019-pcap-bosh.md) addresses packet capture
at the BOSH infrastructure level for operators, there remains a gap for
application developers who need similar capabilities scoped to their
individual applications.
This is also quite LLM-y.
The core message: We have the functionality for BOSH already and having it for CF is desirable / a logical next step.
But this does clearly state the issue, no? The next step comes in the proposal section of the RFC, this is just identifying the problem.
I think the 'remains a gap' thing triggered my LLM detector.
It's two groups of people: platform operators (bosh pcap) and app developers / operators (cf pcap). So the 'gap' is a conceptual one, as one of those groups has no ability yet and would like to gain it.
My main point:
- While [RFC-0019 (pcap-bosh)](rfc-0019-pcap-bosh.md) addresses packet capture
- at the BOSH infrastructure level for operators, there remains a gap for
- application developers who need similar capabilities scoped to their
- individual applications.
+ [RFC-0019 (pcap-bosh)](rfc-0019-pcap-bosh.md) addresses packet capture
+ at the BOSH infrastructure level for platform operators. Application developers
+ and operators, an equally important group of users, need similar capabilities,
+ scoped to their individual applications.
The challenge is providing this functionality while maintaining the security
model of Cloud Foundry, where applications run in isolated, unprivileged
containers.
Can we highlight here that this could be achieved by not generally elevating privileges but elevating privileges for clearly defined, narrow use cases?
Or rather: "Elevated privileges can easily be misused and it is paramount (or some less fancy word) that the security model of CF remains intact, while giving app developers and operators the choice and tools to be able to assess network traffic."
This also alludes to this feature being opt-in. You might not want to enable it permanently, and you may not want to enable it for some specific apps at all.
Maybe something like this?
- The challenge is providing this functionality while maintaining the security
- model of Cloud Foundry, where applications run in isolated, unprivileged
- containers.
+ While platform traffic can be secured from eavesdropping, e.g. via mTLS, access to
+ the network traffic in a container will contain sensitive and possibly private
+ information.
+ Elevated privileges of any kind can easily be misused and it is paramount that the
+ security model of Cloud Foundry remains intact, where applications usually run in
+ isolated, unprivileged containers.
+ Any solution must consider that network capture is a privilege that has to be
+ enabled explicitly, not given by default and can be forbidden altogether.

#### Option 1: `sudo` Access

Adding a platform switch to make the `vcap` user a sudoer was considered but
general sudo is dangerous. But you can limit what commands can be run via sudo. Has this been explored in more detail?
Because tcpdump is still in `cflinuxfsX` as is, this might be equally dangerous or extremely hard to verify. Still worth mentioning that there's a possibility beyond `sudo == vcap acts as full root`.
Yes, this was discussed but discarded. It opens up a whole array of security issues caused by misconfiguration as creating a secure sudo config is not trivial.
> Yes, this was discussed but discarded.
This whole section is 'discussed but discarded'. My request was to maybe mention this with a half sentence, i.e. "we considered limited sudo for specific invocations but it's not viable."
applications

## Implementation Considerations

It is customary to write a small intro paragraph that outlines what the next bits will contain.
Something like: based on the proposed solution for a custom packet capturing tool, the following sections describe
- the technical detail for said tool
- a description of the cf cli plugin `pcap`, its functionality and user experience.
Added a sentence, please check again.
Maybe a general note on the opt-in: there were no extensive discussions on what that "opt-in" would look like. In my opinion it should just be the existing SSH access and not a dedicated flag. Mainly because having SSH access already puts you in a position of being able to do a lot and we have this setup as a feature flag on the different levels of CF.
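To make the "existing SSH access as the opt-in" idea concrete, here is a rough sketch of the checks an operator could already run today. It only prints the commands (a dry run) so it is safe to paste anywhere. `cf space-ssh-allowed`, `cf ssh-enabled` and `cf disable-ssh` are existing cf CLI commands; the app and space names and the helper function names are placeholders made up for illustration.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the cf CLI commands that show whether SSH
# (and therefore, under this proposal, packet capture) is possible for an app.
# App and space names are placeholders, not taken from the RFC.

ssh_gate_checks() {
  local app="$1" space="$2"
  # Space-level switch: SSH must be allowed for the space.
  echo "cf space-ssh-allowed ${space}"
  # App-level switch: SSH must be enabled for the app itself.
  echo "cf ssh-enabled ${app}"
}

# Revoking the app-level switch would also revoke capture access.
ssh_gate_revoke() {
  local app="$1"
  echo "cf disable-ssh ${app}"
}

ssh_gate_checks myapp myspace
ssh_gate_revoke myapp
```

If a dedicated capture flag is added later, it would sit next to these switches rather than replace them; that part is still open in the discussion.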
the various lifecycle archives that are added to the final app container and
the necessary capabilities (`CAP_NET_RAW` and `CAP_NET_ADMIN`) will be assigned
to the executable via file capabilities. This allows regular users to gain those
capabilities when executing the binary.
- capabilities when executing the binary.
+ capabilities when executing the binary.
+ The functionality is limited to network capturing with the aforementioned scope
+ of selected network interface, filtering expression in pcap-filter format and
+ length of captured packets (snaplen). This reduces the attack surface, compared
+ to invoking a full `tcpdump`.
(if it's important, repeat it ;-) )
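For readers less familiar with file capabilities, a minimal sketch of what "assigned to the executable via file capabilities" could look like with the standard `setcap` and `getcap` tools. The binary name and path are hypothetical (the RFC does not name them), and the script only prints the commands because `setcap` needs root.

```shell
#!/usr/bin/env bash
# Hypothetical install path; the RFC does not name the binary or its location.
PCAP_BIN="${PCAP_BIN:-/tmp/lifecycle/pcap-capture}"

# Build the setcap invocation that grants only the two capabilities named in
# the RFC. "+ep" adds them to the file's effective and permitted sets.
grant_capture_caps() {
  local bin="$1"
  echo "setcap cap_net_raw,cap_net_admin+ep ${bin}"
}

# Build the getcap invocation that can be used to verify the result.
show_capture_caps() {
  local bin="$1"
  echo "getcap ${bin}"
}

# Dry run: print the commands instead of executing them.
grant_capture_caps "$PCAP_BIN"
show_capture_caps "$PCAP_BIN"
```

Any user who can execute the file gains those two capabilities for that process only, which is why the narrow feature set of the tool matters.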
Similar to the `bosh pcap` command a `cf pcap` command will be added. Like its
predecessor it will connect to the desired instances via SSH and execute the new
packet capturing tool and stream back the captured packets via stdout. If there
are multiple streams, the CLI will merge them and write them out to a single
file in the pcap format.
- Similar to the `bosh pcap` command a `cf pcap` command will be added. Like its
- predecessor it will connect to the desired instances via SSH and execute the new
- packet capturing tool and stream back the captured packets via stdout. If there
- are multiple streams, the CLI will merge them and write them out to a single
- file in the pcap format.
+ Similar to the `bosh pcap` command, a `cf pcap` command will be added. Like its
+ counterpart, it will connect to the desired instances via SSH, execute the new
+ packet capturing tool and stream back the captured packets via stdout and thus via SSH.
+ If there are multiple streams, the CLI will merge them and write them out to a single
+ file in the pcap format.
"predecessor" makes it look like bosh pcap might be going away.
I see your point. That said, capturing network traffic might show data that is not even stored plain-text in a database that you could gain access to via SSH. So maybe something worth discussing more extensively. For future discussions and votes: My opinion is that network capture is a privilege beyond SSH.
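To illustrate the `cf pcap` flow discussed above (one SSH session per instance, capture streamed over stdout, streams merged into one pcap file), here is a rough manual equivalent built from existing tools. `cf ssh` with `-i` and `-c` and Wireshark's `mergecap` are real; the in-container tool name `pcap-capture` and the way it takes its filter are assumptions for illustration only. The script prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical name of the in-container capture tool; the RFC does not name it.
CAPTURE_TOOL="${CAPTURE_TOOL:-pcap-capture}"

# Print the cf ssh command that would stream one instance's capture
# from the tool's stdout into a local per-instance file.
capture_cmd() {
  local app="$1" index="$2" filter="$3"
  printf "cf ssh %s -i %s -c '%s \"%s\"' > %s-%s.pcap\n" \
    "$app" "$index" "$CAPTURE_TOOL" "$filter" "$app" "$index"
}

# Print the mergecap command that would combine the per-instance files,
# ordered by timestamp, into a single file in pcap format.
merge_cmd() {
  local out="$1"
  shift
  echo "mergecap -F pcap -w ${out} $*"
}

# Dry run for a two-instance app.
for i in 0 1; do
  capture_cmd myapp "$i" "tcp port 80"
done
merge_cmd myapp.pcap myapp-0.pcap myapp-1.pcap
```

The proposed CLI would do the merging on the fly instead of via intermediate files, so this only approximates the end result.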
- Would a user be able to see unencrypted traffic contents?
- What roles would be able to use this command for a given app?
- Would this feature always be on, or would there be a foundation wide flag?
- Could there be a `cf event` to log when this action is taken?
[meta]: #meta
- Name: Integrate pcap feature for Cloud Foundry applications
- Start Date: 2025-09-10
- Author(s): @domdom82 @maxmoehl @peanball @ameowlia @mariash
- - Author(s): @domdom82 @maxmoehl @peanball @ameowlia @mariash
+ - Author(s): @domdom82 @maxmoehl @peanball
😅 We are happy to comment on the RFC, but we're not the authors here.
```bash
# Capture HTTP traffic for myapp
cf pcap myapp --interface eth0 --filter "tcp port 80" --snaplen 1500

# Capture specific instance with custom filter
cf pcap myapp --instance 1 --filter "host database.example.com"
```
Can you give some examples of what the output would look like?
Some replies to Amelia's questions. @maxmoehl, please correct me if I'm wrong.
1. If they capture on the interface
2. This needs to be fleshed out in the RFC a bit. I think it depends on the scenario. In a test org or space a group of developers may want to capture stuff, in a production org maybe only select people after prior approval.
3. Foundation wide general flag, and ideally additional per org/space/app(?) flag.
4. That is a good idea, for traceability purposes.
My current proposal (though I should maybe clarify this a bit more) does not introduce any additional permissions. As long as a user can SSH into the app they will be able to initiate a capture. That includes capturing plain-text traffic.
This is where it gets tricky. Foundational flag we can somehow make work via diego-release, it would need to control whether the binary is injected or not which makes this complicated as the injected binaries right now come via a BOSH package which is not configurable in any way. I would prefer not to add org/space/app flags beyond the SSH one.
This comes down to the feature just being SSH with a special binary. There won't be any interaction with the CF API beyond SSH, so no special audit log will be written. I will also spend some time today addressing the remaining comments.
🚀 link for easy viewing