import { Callout } from '@/components/Callout'
import ExportedImage from "next-image-export-optimizer";
At its core, the goal of Pelican's authorization system is simple: give the right people access to the right data, and protect the data from everyone else. Whether data is private, public, or shared with a specific collaboration, Pelican ensures that only authorized users can access it. To achieve this in a distributed environment without sacrificing performance, Pelican relies on a modern, token-based architecture that separates the responsibility of verifying identity from the responsibility of granting access.
While often used interchangeably in casual conversation, Authentication and Authorization (often abbreviated AuthN and AuthZ) are distinct concepts in computer security.
- Authentication (AuthN) is the process of verifying who you are. It's used to confirm your identity and typically involves providing some kind of secret or unique thing that only you possess. Common authentication methods include passwords, certificates, or federated identity providers like CILogon.
- Authorization (AuthZ) answers the question of what you are allowed to do. It determines your permissions and is typically managed through policies, roles, or access tokens.
Part of what makes these concepts confusing is that in order to determine what you're allowed to do, we usually first have to determine who you are; before granting authorization, you have to be authenticated.
This is true, but Pelican draws a clear line between the two by leaving it up to external identity providers to handle the "who" (Authentication), while Pelican focuses exclusively on the "what" (Authorization). Pelican does not manage usernames, passwords, or user accounts. Instead, it relies on the OpenID Connect (OIDC) standard to integrate with trusted third-party identity providers like CILogon, university logins, or Google.
In a distributed environment like the OSDF, Federated Authentication is notoriously difficult. It requires that every service in the federation trusts and understands the identity providers of every user. If a user from University A wants to access data at Laboratory B, Laboratory B must be able to verify University A's credentials directly. Scaling this "mesh of trust" across hundreds of institutions is complex and fragile.
Pelican addresses this by relying on Federated Authorization. Instead of passing user identities around, we pass tokens with capabilities. When a user authenticates with their home institution, that institution's identity provider gives information about who the user is to a token issuer. The token issuer then decides what the user is allowed to do and creates a token explaining those permissions. Importantly, this token doesn't say "This is User X"; it says "The bearer of this token is allowed to read /data/project-y".
The services in the federation (Origins and Caches) do not need to know who the user is; they only need to trust the Issuer that signed the token and verify that the token grants the necessary permissions. This decouples the identity verification from resource access, allowing Pelican to protect data without ever needing to store or manage sensitive user credentials.
In the context of Pelican and modern web security, an Authorization Token is a portable, digital credential that grants access to specific resources. While tokens can be used for authentication, Pelican uses them exclusively for authorization to specify what the token holder is allowed to do.
Sometimes these tokens are also referred to as Bearer Tokens because we don't make any assertions about who is presenting the tokens, only about what the token says the bearer/presenter is allowed to do. This is a sure sign that we're working with an authorization framework, not an authentication framework.
Since there is a distinction between authentication tokens and authorization tokens, whenever these docs refer simply to "tokens", we're talking about "authorization" tokens.
A good analogy to understand how tokens work is to think of them like concert tickets:
Importantly, if you tried to use a ticket for a different band or you scribbled all over the ticket (tampered with it), you'd be denied entry by the security guard. These same principles apply to tokens, which is why they work well for guarding protected resources. While tokens cannot be revoked the way certificates can be, their short lifetimes minimize the potential for leaking data by building in an automatic expiration.
Ultimately, the security guarantees of tokens come from public/private key cryptography, which lets the token issuer sign tokens using a secret while the public key is openly available for anyone to verify the signature on the token. In practice, the service that signs the token and the service that hosts the public keys for verification are often split. For more information about this, see the section on "Issuers".
So far we've discussed what role tokens play and how they work at a high level, but we haven't yet discussed what a real token looks like.
Pelican uses a specific type of token called a JSON Web Token (JWT). A JWT is an open standard (RFC 7519) that defines a compact and self-contained way for securely transmitting information between parties as a JSON object.
JWTs consist of three parts:
- Header: Describes how the token was signed (e.g., "We used the RS256 algorithm").
- Payload: Contains the "Claims" — the actual data/permissions. This includes:
  - `iss` (Issuer): Who created this token?
  - `aud` (Audience): Who do we expect this token to be presented to?
  - `exp` (Expiration): When does this token expire?
  - `iat` (Issued At): When was this token created?
  - `nbf` (Not Before): When does the lifetime of this token start?
  - `scope` (Scope): Which actions can be performed on which resources, i.e. what is this token allowed to do?
  - `jti` (JWT Identifier): A unique identifier that is specific to this token.
  - Other fields can be defined according to specific token profiles (see "Token Profiles" for more information).
- Signature: The cryptographic proof that ensures the token was created by the right issuer and hasn't been altered. Signatures do not decode to JSON.
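
As a rough illustration of how a service might use the time and audience claims listed above, the following sketch checks `nbf`, `exp`, and `aud` on an already-decoded payload. The function name, the `leeway` parameter, and the choice to reject tokens that carry no `aud` claim are assumptions made for this example, not a description of Pelican's own validation code.

```python
import time


def claims_currently_valid(payload, expected_audience, now=None, leeway=0):
    """Return True if the token's lifetime and audience claims look acceptable.

    `payload` is the decoded JSON payload of a JWT (a dict).
    `leeway` is a hypothetical tolerance, in seconds, for clock skew.
    """
    now = time.time() if now is None else now

    # nbf and exp bound the token's lifetime (seconds since the Unix epoch).
    if "nbf" in payload and now + leeway < payload["nbf"]:
        return False
    if "exp" in payload and now - leeway >= payload["exp"]:
        return False

    # RFC 7519 allows aud to be a single string or a list of strings.
    aud = payload.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    # Assumption for this sketch: a token without a matching aud is rejected.
    return expected_audience in audiences


example_payload = {
    "iss": "https://issuer.example",           # hypothetical issuer
    "aud": "https://my-origin.example:8443",   # hypothetical audience
    "nbf": 1000,
    "exp": 2000,
    "scope": "storage.read:/data/project-y",
}
```

Passing an explicit `now` makes the check easy to exercise: the payload above is accepted at `now=1500` and rejected at `now=2500`.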
To pass these tokens around, each part of the token is base64url-encoded and the parts are strung together using periods (.), i.e. Header.Payload.Signature.
Because the parts are only encoded, not encrypted, anyone who holds a token can decode it and read its contents. However, they cannot change that information. Changing the payload would invalidate the Signature, causing the token to be rejected by Pelican.
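
To make the three-part structure concrete, here is a minimal Python sketch that splits a token on its periods and decodes the header and payload. It does not verify the signature, so it is only suitable for inspecting a token, never for deciding whether to trust one. The helper names and the demo claims (including the `issuer.example` URL and the placeholder signature) are invented for illustration.

```python
import base64
import json


def b64url_encode(raw: bytes) -> str:
    # JWT segments use URL-safe base64 with the trailing "=" padding removed.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # Restore the padding that was stripped before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_unverified(token: str):
    """Return (header, payload) as dicts WITHOUT checking the signature."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload


# Build a demo token by hand; the signature is a placeholder, not a real one.
demo_header = {"alg": "RS256", "typ": "JWT"}
demo_payload = {"iss": "https://issuer.example", "scope": "storage.read:/data/project-y"}
demo_token = ".".join([
    b64url_encode(json.dumps(demo_header).encode()),
    b64url_encode(json.dumps(demo_payload).encode()),
    b64url_encode(b"not-a-real-signature"),
])
```

Calling `decode_unverified(demo_token)` returns the same two dictionaries that went in, which shows that reading a token requires no secret at all.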
If you're working with JWTs, it's useful to know about a few tools that help you decode them for interpreting what they say.
Both the SciTokens demo site and jwt.io are great web-based portals for converting tokens to/from their encoded and decoded forms. In particular, jwt.io is nice because you can hover your mouse over the token's timestamps to see them in human-readable form. Both websites operate purely in your browser, so any tokens you input won't be sent anywhere -- this lets you use the sites without worrying about anyone stealing your secrets!
In general, most of the JWTs you'll need to create to work with Pelican can be created using Pelican's command line token tools.
However, other command line libraries like the ones provided at https://demo.scitokens.org/ and the htgettoken tool can be used to create/decode generic JWTs.
JWT is a generic framework for packing information into a string that's easy to pass between web services.
While some of the core JSON keys like iss, exp, etc. are present in every JWT, some people have extended the keys/semantics of JWTs to communicate specific information relevant in their ecosystems.
Specifications that describe the extra contents of a JWT are called profiles, and they typically describe which "scopes" are recognized and which other fields are mandatory.
The two profiles used in the Pelican ecosystem are the WLCG Profile and the SciTokens profile.
The Worldwide LHC Computing Grid (WLCG) auth group maintains the WLCG token profile.
Some of the WLCG scopes/capabilities Pelican uses include:
- storage.read:/path/to/resource: grants the ability to read the specified resource
- storage.create:/path/to/resource: grants the ability to create but not modify the specified resource
- storage.modify:/path/to/resource: a superset of storage.create, grants the ability to modify and delete the specified resource
See WLCG's Token Profile documentation for more information about other WLCG token requirements and available capabilities/scopes.
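
The sketch below shows one way to test whether a WLCG-style `scope` claim permits an action on a path. It assumes that a scope's path covers the named resource and everything beneath it, and it encodes the "`storage.modify` is a superset of `storage.create`" relationship described above. The function names and the `IMPLIED` table are hypothetical and do not reflect how Pelican or XRootD implement the check.

```python
import posixpath

# Which requested actions each capability satisfies (illustrative only).
IMPLIED = {
    "storage.read": {"storage.read"},
    "storage.create": {"storage.create"},
    "storage.modify": {"storage.modify", "storage.create"},
}


def path_within(granted: str, requested: str) -> bool:
    """True if `requested` is `granted` itself or lies beneath it."""
    granted = posixpath.normpath(granted)
    requested = posixpath.normpath(requested)
    if granted == "/":
        return requested.startswith("/")
    # Compare whole path components so /data/a does not match /data/ab.
    return requested == granted or requested.startswith(granted + "/")


def scope_permits(scope_claim: str, action: str, path: str) -> bool:
    """Check a space-separated scope claim against one action and path."""
    for entry in scope_claim.split():
        capability, sep, granted_path = entry.partition(":")
        if not sep:
            continue  # not a path-scoped capability
        if action in IMPLIED.get(capability, set()) and path_within(granted_path, path):
            return True
    return False


example_scope = "storage.read:/data/project-y storage.modify:/scratch"
```

With `example_scope`, a read of `/data/project-y/file.txt` is permitted, while a read of `/data/project-yy/file.txt` is not, because matching is done per path component rather than per character.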
The SciTokens Profile is another option Pelican understands how to work with, although Pelican prefers the WLCG profile.
Some of the SciTokens scopes/capabilities Pelican uses include:
- read:/path/to/resource: grants the ability to read the specified resource
- write:/path/to/resource: grants the ability to create/modify the specified resource
See the SciTokens site for generic documentation and the SciTokens Claims specification for more information about other SciTokens requirements and available capabilities/scopes.
Be careful using tokens that let you modify object contents after the object has been written -- Pelican objects should be treated as immutable, so these tokens can get you in trouble!

While discussing what tokens are and how they work, it's been unavoidable to briefly discuss the concept of issuers, but up until now we haven't rigorously explained what issuers are and why it's difficult to precisely describe them.
At its simplest, an Issuer is the authority that creates, signs, and manages tokens. It possesses a cryptographic private key that only it has, which lets other services verify that tokens are coming from the right source. In our concert ticket analogy, the Issuer is a combination of the ticket booth and the clipboard; it is the only entity trusted to print valid tickets, and it provides a means for others to double check a ticket's validity.
Part of what makes discussing the term "Issuer" difficult is that it is heavily overloaded. In fact, an Issuer may refer to two related-yet-distinct things, as evidenced by the fact that our analogy needed both a ticket booth and a clipboard to describe it:
1. A service that creates tokens when provided with identifiers from a trusted identity provider (ticket booth)
2. A service that others can use to look up the public keys linked to the token creator's private keys (clipboard)

These two functions can live in the same place, but it's often beneficial to split them up.
To distinguish between these two concepts, we'll refer to 1 as the "Get Issuer" (you "get" tokens from it) and 2 as the "Verify Issuer" (you use it to "verify" that a token is legitimate).
The "Get Issuer" is used by clients to fetch any tokens that are needed to perform an action on a resource (e.g. get/put /foo/bar).
This is the private, secure side of the issuer. Before a user can get a token, they must prove who they are to an identity provider. Once that provider validates the user's identity, it passes identifiers to the "Get Issuer" like which groups the user belongs to. Finally, the "Get Issuer", which is configured to know which tokens can be created for which groups, provides a cryptographically-signed token for the request.
This is an important distinction: Issuers are *not* the same as Identity Providers. Rather, Identity Providers authenticate a user and then pass the user's identifiers along to the issuer. The issuer is configured to know which tokens it can create based on those identifiers.

In Pelican, the "Get Issuer" is often a service called OA4MP (OAuth for Many People) built into the Origin. Other services can be configured to issue tokens as well, like HTCondor's CredMon.
However the "Get Issuer" is set up, the Origin must be configured to trust it by providing the source of truth for the public keys that can be used to verify tokens it creates, which leads us to...
The "Verify Issuer" is used by Origins, Caches, or any other service that needs to check the validity of tokens. It is the public-facing side of an Issuer and must make any public keys associated with the "Get Issuer's" private keys publicly accessible.
The only thing you need to interact with a "Verify Issuer" is its URL and knowledge of its key discovery protocol. Because the entire issuer scheme follows the OpenID Connect (OIDC) specification, the "Verify Issuer's" public keys can be fetched according to:
1. Given the "Verify Issuer's" URL, append the path `/.well-known/openid-configuration` and download the JSON hosted there. For example, if the URL is `https://osg-htc.org`, you'd fetch `https://osg-htc.org/.well-known/openid-configuration` and parse it as JSON.
2. This JSON will provide a key named `"jwks_uri"` whose value is another URL. The public keys live as a JSON Web Key Set (JWKS) behind this URL.
3. By fetching the keys pointed to by the `"jwks_uri"`, you can check whether any given token was in fact signed by the corresponding "Get Issuer".
Crucially, the "Verify" side does not know or care who the user is or who is asking for the keys. It simply publishes the mathematical keys required to check the signature of the token ("hologram" on the ticket).
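
The key discovery steps described above can be sketched in a few lines of Python using only the standard library. The function names are hypothetical, and the `fetch` parameter exists only so the lookup can be exercised without a network connection. Verifying a token's signature against the returned keys would need a cryptography library and is not shown here.

```python
import json
from urllib.request import urlopen


def fetch_json(url: str, timeout: float = 10.0):
    """Download a URL and parse the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def discover_public_keys(issuer_url: str, fetch=fetch_json):
    """Follow the discovery steps and return the issuer's list of public keys."""
    # Step 1: fetch the OpenID configuration document from the well-known path.
    config_url = issuer_url.rstrip("/") + "/.well-known/openid-configuration"
    config = fetch(config_url)
    # Step 2: the document names the URL where the key set lives.
    jwks = fetch(config["jwks_uri"])
    # Step 3: a JWKS document holds its keys in a top-level "keys" array.
    return jwks["keys"]
```

For example, `discover_public_keys("https://osg-htc.org")` would request `https://osg-htc.org/.well-known/openid-configuration` first and then whatever URL that document lists under `"jwks_uri"`.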
At the end of the day, the user provides a token to an Origin or Cache along with a request to perform an action on a resource (read /foo).
But it's the Origin/Cache that has to decide whether it will fulfill the request and let the user do what they want.
Because this process involves stringing together multiple services, it's worth taking a moment to analyze how trust is bootstrapped between each of them.
- The Storage Provider Trusts the Origin and the Identity Provider: The entire chain of trust starts with the owner of some underlying storage trusting the Origin to enforce whatever access policies it provides. Crucially, because the Storage Provider defines the policies that map a user's identity to their permissions (e.g. "Bob can read /foo"), they must fundamentally trust the Identity Provider to accurately verify that identity. If the Identity Provider cannot be trusted to say "This is Bob", nobody can safely grant "Bob" access.
- The "Issuer" Trusts the Identity Provider: When users authenticate with the Identity Provider, the "Get Issuer" trusts the identifiers it is handed. This trust is typically established when the Issuer is registered as a client with the Identity Provider. The Issuer trusts that the Identity Provider has rigorously verified the user's credentials (password, MFA) before asserting their identity and attributes.
- Origins Trust their Issuers (both "Get" and "Verify") and Federation Central Services (Director/Registry): The Origin is explicitly configured with which Issuers it trusts for which namespaces. This is done by providing the "Verify Issuer" URL that hosts a set of public keys in the Origin's exports. The Origin trusts that if a token bears the signature of a trusted Issuer, the permissions inside that token are valid. Furthermore, Origins send information about their namespaces' Issuers to the Federation's Director, which the Origin also trusts.
- The Director/Registry Can Verify Origins/Caches: When Origins and Caches join a federation, they register their identities with the Registry and provide a public key corresponding to a private key they possess. Origins/Caches advertise who they are and what they do to the Director service, which uses the Registry to verify their identities. The Director can hand out the information in these advertisements, and anyone who's part of the federation trusts the information because they trust the Director. Because Origins trust Central Services and Central Services trust Caches, Origins transitively trust that Caches will respect their access policies.
- Caches Trust the Director/Registry: Because Caches can grant access to copies of namespaced data, they must know which Issuers are trusted for which namespaces. Each Origin/namespace and their Issuers are advertised to the Director by the Origin. Because both the Cache and the Origin trust the Director, the Cache trusts each namespace Issuer it learns about via the Director.
By chaining these trust relationships together transitively, we create a system where a Cache can serve an object to a user without ever knowing who the user is or directly contacting the user's home institution.
XRootD is the service at Origins and Caches that serves requests for data. It's also the service that receives tokens and decides whether the token is sufficient to permit the request.
XRootD does this using two frameworks, the "Authorization Database File" (often abbreviated as "authfile") and the XRootD-SciTokens plugin. Note that the SciTokens plugin works with multiple token formats (WLCG, SciTokens).
When Origins/Caches start up, they parse the policies provided in their configuration (Origins) or discovered via the Director (Caches) to generate configuration files for both of these frameworks. Because Caches usually serve multiple federation namespaces, a Cache's authfile/SciTokens configuration is a union over the policies of the federation's namespaces as presented by Origins.
Additionally, these generated configuration files may be merged with admin-supplied files by specifying the Xrootd.Authfile and Xrootd.ScitokensConfig config parameters.
If you're setting up an Origin to serve protected data and you can use tokens to download directly via the Origin but not via Caches, double check that you're not relying on admin-supplied authfile/SciTokens configuration!
Whenever an Origin/Cache receives a request, it first consults the SciTokens configuration and then falls back to the authfile if access cannot be granted via the SciTokens plugin.
The SciTokens config file's basic contents include a list of Issuer URLs along with which namespaces (base paths) those Issuers should be used for.
<Callout type="example">
Below is a sample of a generated Origin SciTokens configuration:
```ini
#
# Copyright (C) 2024, Pelican Project, Morgridge Institute for Research
#
# Licensed under the Apache License, Version 2.0 (the "License"); you
# may not use this file except in compliance with the License. You may
# obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

[Global]
audience_json = ["https://my-origin.com:8443"]

[Issuer Origin https://my-origin:8440 and Built-in Monitoring]
issuer = https://my-origin:8440
base_path = /my-namespace, /pelican/monitoring

[Issuer Federation-based Monitoring]
issuer = https://osg-htc.org
base_path = /pelican/monitoring
default_user = xrootd
```
</Callout>
For more information about XRootD's SciTokens plugin configuration, see https://github.com/xrootd/xrootd/tree/master/src/XrdSciTokens.
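
When debugging, it can help to see which Issuers a generated SciTokens configuration associates with each base path. The sketch below reads the sample configuration with Python's standard `configparser` module. This only approximates how the XRootD plugin reads the file, and the function names are made up for this example.

```python
import configparser
import json

SAMPLE_CONFIG = """\
[Global]
audience_json = ["https://my-origin.com:8443"]

[Issuer Origin https://my-origin:8440 and Built-in Monitoring]
issuer = https://my-origin:8440
base_path = /my-namespace, /pelican/monitoring

[Issuer Federation-based Monitoring]
issuer = https://osg-htc.org
base_path = /pelican/monitoring
default_user = xrootd
"""


def _parse(config_text):
    # Only "=" separates keys from values here, so URLs containing ":" stay intact.
    parser = configparser.ConfigParser(delimiters=("=",))
    parser.read_string(config_text)
    return parser


def issuers_by_base_path(config_text):
    """Map each base path to the list of issuer URLs configured for it."""
    parser = _parse(config_text)
    mapping = {}
    for section in parser.sections():
        if not section.startswith("Issuer "):
            continue
        issuer = parser[section]["issuer"]
        # base_path holds a comma-separated list of namespace prefixes.
        for base_path in parser[section]["base_path"].split(","):
            mapping.setdefault(base_path.strip(), []).append(issuer)
    return mapping


def accepted_audiences(config_text):
    """Return the audiences listed in the [Global] section."""
    return json.loads(_parse(config_text)["Global"]["audience_json"])
```

For the sample, `/pelican/monitoring` maps to both Issuers, while `/my-namespace` maps only to `https://my-origin:8440`.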
### Authfile Configuration
Each line in an authfile at Origins/Caches maps some kind of identifier to a list of path:privilege pairs.
While admin-supplied authfiles can be quite complicated, a Pelican-generated authfile that hasn't been merged with anything will only use the `l` (list) and `r` (read) privileges for a given path.
A `-` is used before each set of privileges to subtract those privileges, while the absence of a `-` means the privileges are granted.
Pelican-generated authfiles are never used to grant write privileges -- this is always done via the SciTokens configuration.
<Callout type="example">
Below is a sample of a generated Origin authfile configuration:
```text
u * /my-prefix-auth -lr /.well-known lr /my-prefix lr
```
</Callout>
This authfile allows any user to list/read the contents of the `/my-prefix` namespace with `/my-prefix lr`, but prevents all users from listing/reading the contents of `/my-prefix-auth` with `/my-prefix-auth -lr`.
The /.well-known path for the Origin's public keys is also exposed via XRootD.
For more information about XRootD Authfiles, see https://xrootd.web.cern.ch/doc/dev56/sec_config.htm#_Toc119617472
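
To illustrate how a line like the sample above breaks down into path:privilege pairs, here is a small Python sketch. It handles only the simple shape of a Pelican-generated line (an identifier type, an identifier, then alternating paths and privilege strings). Real XRootD authfiles support more syntax than this, and the function name and return format are assumptions made for the example.

```python
def parse_authfile_line(line: str):
    """Split one simple authfile line into (id_type, identifier, rules).

    Each rule is (path, privileges, granted), where `granted` is False
    when the privilege string starts with "-" (privileges subtracted).
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected an identifier type and an identifier")
    id_type, identifier, rest = fields[0], fields[1], fields[2:]
    if len(rest) % 2 != 0:
        raise ValueError("paths and privilege strings must come in pairs")

    rules = []
    # Walk the remaining fields two at a time: path, then privileges.
    for path, privileges in zip(rest[0::2], rest[1::2]):
        granted = not privileges.startswith("-")
        rules.append((path, frozenset(privileges.lstrip("-")), granted))
    return id_type, identifier, rules


sample_line = "u * /my-prefix-auth -lr /.well-known lr /my-prefix lr"
```

Parsing `sample_line` yields three rules: `l` and `r` are subtracted for `/my-prefix-auth` and granted for both `/.well-known` and `/my-prefix`.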