"x509: certificate signed by unknown authority" when using HOSA on top of oc cluster up --metrics #164

Open
metmajer opened this issue Apr 15, 2017 · 11 comments

Comments

@metmajer

I have deployed HOSA according to the documentation in https://github.com/hawkular/hawkular-openshift-agent/blob/master/README.adoc#running-inside-openshift on my local oc cluster up --metrics cluster. Instead of deploying the agent to the default project, I modified the instructions to deploy to openshift-infra via -n openshift-infra.

Unfortunately, HOSA keeps reporting an "x509: certificate signed by unknown authority" error when connecting to https://hawkular-metrics for both Get and Post operations:

I0415 12:33:11.607507       1 prometheus_metrics_collector.go:103] DEBUG: Told to collect [22] Prometheus metrics from [http://172.17.0.2:8080/metrics]
W0415 12:33:11.628654       1 metrics_storage.go:149] Failed to store metrics. err=Post https://hawkular-metrics/hawkular/metrics/counters/raw: x509: certificate signed by unknown authority
…
I0415 13:32:40.237395       1 discovery.go:264] DEBUG: Detected a new pod that was added: localhost/default/persistent-volume-setup-b516p/42a7b212-21d7-11e7-b0f7-12d904fd13ae
I0415 13:32:40.237412       1 discovery.go:145] DEBUG: Changed pod [localhost/default/persistent-volume-setup-b516p/42a7b212-21d7-11e7-b0f7-12d904fd13ae] does not have volume [hawkular-openshift-agent]
W0415 13:32:40.239585       1 metrics_storage.go:101] Failed to determine if metric definition [pod/9d07d697-21d7-11e7-b0f7-12d904fd13ae/custom/hawkular_openshift_agent_metric_data_points_collected_total] of type [counter] in tenant [openshift-infra] exists. err=Get https://hawkular-metrics/hawkular/metrics/counters/pod%2F9d07d697-21d7-11e7-b0f7-12d904fd13ae%2Fcustom%2Fhawkular_openshift_agent_metric_data_points_collected_total: x509: certificate signed by unknown authority
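
For anyone triaging similar output, here is a minimal sketch that summarizes which endpoints fail the trust check. It assumes the log format shown above (an `err=Get` or `err=Post` followed by the URL); the function name `summarize_x509_failures` is made up for illustration and is not part of HOSA.

```shell
# Read agent log text on stdin and print one line per distinct
# "<METHOD> <scheme://host>" pair that failed with the x509 trust error,
# followed by the number of occurrences.
summarize_x509_failures() {
  grep 'x509: certificate signed by unknown authority' \
    | sed -n -E 's#.*err=(Get|Post) (https?://[^/ ]+).*#\1 \2#p' \
    | sort | uniq -c | awk '{print $2, $3, $1}'
}
```

It could be fed from something like `oc logs <agent-pod> | summarize_x509_failures`, where `<agent-pod>` stands for the name of your agent pod.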

I am using oc cluster up --metrics with the following version of the oc cli:

$ oc version
oc v3.6.0-alpha.0+0343989
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://127.0.0.1:8443
openshift v3.6.0-alpha.0+0343989
kubernetes v1.5.2+43a9be4

I wanted to try out the most recent https://github.com/openshift/origin/releases/tag/v3.6.0-alpha.1 release, but found that the metrics-deployer component is currently borked: openshift/origin#13777.

@jmazzitelli
Contributor

Can you look at the Origin Metrics docs and confirm you followed these directions with the configuration files specified in those docs (I am just wondering if the HOSA docs are somehow outdated due to something changing recently in Origin):

https://docs.openshift.org/latest/install_config/cluster_metrics.html#deploying-hawkular-openshift-agent

What is happening is that your Origin Metrics server is using a self-signed certificate and HOSA is not accepting it. I don't know what changed, and I've never seen this before; I just wonder whether your config differs from what is expected.

If all else fails, and just to get you up and running (while still using a self-signed certificate), I think the following should work around the problem. In your agent's YAML configuration, where it points to your Origin Metrics server (aka "hawkular_server"), set the tls section to indicate that you want to ignore this error:

hawkular_server:
  tls:
    skip_certificate_validation: true

This feature was added recently via #161 but should be in the latest HOSA release.

Though, again, I don't know why you would need this unless something changed recently in Origin Metrics.
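
To illustrate what the error means in general (this is not HOSA's code path, which uses Go's TLS stack), here is a minimal local sketch with the openssl CLI. It creates a throwaway self-signed certificate, which is rejected by default because its issuer is not in the trust store, and accepted once that issuer is explicitly trusted. The function names and the CN are made up for illustration.

```shell
# Create a throwaway self-signed certificate and key in directory $1.
# The CN is a hypothetical name, not a real cluster hostname.
make_self_signed_cert() {
  openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout "$1/key.pem" -out "$1/cert.pem" \
    -subj "/CN=hawkular-metrics.example" -days 1 >/dev/null 2>&1
}

# Verify certificate $1 against the system default trust store only.
# Expected to fail for a certificate whose issuer is unknown.
verify_with_default_trust() {
  openssl verify "$1" >/dev/null 2>&1
}

# Verify certificate $1 while explicitly trusting the CA bundle $2.
verify_with_ca_file() {
  openssl verify -CAfile "$2" "$1" >/dev/null 2>&1
}
```

After `make_self_signed_cert "$dir"`, calling `verify_with_default_trust "$dir/cert.pem"` should return non-zero, while `verify_with_ca_file "$dir/cert.pem" "$dir/cert.pem"` should succeed. Skipping validation makes the error go away, but trusting the signing CA keeps the check in place.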

@metmajer
Author

@jmazzitelli Thank you for your quick reply, this sounds like a valuable workaround. Looking at https://github.com/openshift/origin-metrics/blob/master/hawkular-agent/hawkular-openshift-agent.yaml, I am wondering where your suggestion would fit in best. The document uses environment variables, such as HAWKULAR_SERVER_URL, but not the hawkular_server map. Would you please suggest how to modify it?

Also, I want to confirm that I've followed the approach listed here: https://docs.openshift.org/latest/install_config/cluster_metrics.html#deploying-hawkular-openshift-agent.

@metmajer
Author

@jmazzitelli I now understand how this needs to be done. Here's how I've adapted https://github.com/openshift/origin-metrics/blob/master/hawkular-agent/hawkular-openshift-agent-configmap.yaml:

data:
  config.yaml: |
    hawkular_server:
      url: "https://metrics-openshift-infra.1.2.3.4.nip.io"
      tls:
        skip_certificate_validation: true

What I don't understand, though, is why the same example doesn't work using the cluster-internal hostname https://hawkular-metrics.openshift-infra.svc.cluster.local.

W0415 20:26:37.967163       1 metrics_storage.go:101] Failed to determine if metric definition [pod/d282524e-2219-11e7-a78d-12d904fd13ae/custom/process_start_time_seconds] of type [gauge] in tenant [default] exists. err=Get https://hawkular-metrics.openshift-infra.svc.cluster.local/hawkular/metrics/gauges/pod%2Fd282524e-2219-11e7-a78d-12d904fd13ae%2Fcustom%2Fprocess_start_time_seconds: dial tcp: no suitable address found

The example only works when I use the cluster-external hostname.

@metmajer
Author

I want to add that the wget examples in https://docs.openshift.org/latest/install_config/cluster_metrics.html#deploying-hawkular-openshift-agent contain a mistake. The example downloads a GitHub HTML page, whereas what you want here is to download the raw file contents. The following would be correct:

wget https://raw.githubusercontent.com/openshift/origin-metrics/master/hawkular-agent/hawkular-openshift-agent-configmap.yaml
wget https://raw.githubusercontent.com/openshift/origin-metrics/master/hawkular-agent/hawkular-openshift-agent.yaml
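
The mapping from a GitHub "blob" page URL to the raw file URL is mechanical, so it can be sketched as a small helper. This assumes the URL has the usual `https://github.com/<owner>/<repo>/blob/<ref>/<path>` shape; `to_raw_url` is a made-up name for illustration.

```shell
# Convert a github.com "blob" page URL ($1) into the matching
# raw.githubusercontent.com URL. URLs that do not match the expected
# shape are printed unchanged.
to_raw_url() {
  printf '%s\n' "$1" \
    | sed -E 's#^https://github\.com/([^/]+)/([^/]+)/blob/#https://raw.githubusercontent.com/\1/\2/#'
}
```

For example, `wget "$(to_raw_url https://github.com/openshift/origin-metrics/blob/master/hawkular-agent/hawkular-openshift-agent.yaml)"` would fetch the YAML itself rather than the HTML page.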

@jmazzitelli
Contributor

The example downloads a GitHub HTML page, whereas what you want here is to download the raw file contents.

Yeah, I mentioned that earlier on the hawkular-dev mailing list. But to make it official, I filed an issue on the origin-metrics repo: openshift/origin-metrics#333

@jmazzitelli
Contributor

What I don't understand, though, is why the same example doesn't work using the cluster-internal hostname https://hawkular-metrics.openshift-infra.svc.cluster.local.

I wonder if this is happening because you moved the agent to the openshift-infra rather than default project? Maybe @mwringe can shed some light on this?

I should probably run this on the new 3.6 - because you are hitting things I've never seen before.

@metmajer
Author

I wonder if this is happening because you moved the agent to the openshift-infra rather than default project? Maybe @mwringe can shed some light on this?

I actually had the agent deployed both in the default and in the openshift-infra projects.

@metmajer
Author

metmajer commented Apr 16, 2017

For the sake of completeness: when running an oc cluster up --metrics cluster, the hawkular-metrics-account secret, containing hawkular-metrics.username and hawkular-metrics.password, is missing. However, this information is required for the above-mentioned HOSA deployment. I've stripped and modified the following deployment from https://github.com/openshift/origin-metrics/blob/master/deployer/scripts/hawkular.sh#L75-L92:

HAWKULAR_HOSA_PROJECT=default # will monitor all pods

HAWKULAR_METRICS_USERNAME=hawkular
# 17 random characters drawn from A-Z, a-z, 0-9 and "-"
HAWKULAR_METRICS_PASSWORD=$(openssl rand -base64 512 | tr -dc A-Z-a-z-0-9 | head -c 17)

# Read a here-document from stdin into the variable named by $1.
# read returns non-zero when it reaches end of input, hence "|| true".
define() {
  IFS= read -r -d '' "$1" || true
}

oc login -u system:admin
oc project "$HAWKULAR_HOSA_PROJECT"

echo "Creating the Hawkular Metrics User Account Secrets"
# printf '%s' keeps a trailing newline out of the base64-encoded values.
define HAWKULAR_ACCOUNT_SECRET <<EOF
{
  "apiVersion": "v1",
  "kind": "Secret",
  "metadata":
  {
    "name": "hawkular-metrics-account",
    "labels":
    {
      "metrics-infra": "hawkular-metrics"
    }
  },
  "data":
  {
    "hawkular-metrics.username": "$(printf '%s' "$HAWKULAR_METRICS_USERNAME" | base64)",
    "hawkular-metrics.password": "$(printf '%s' "$HAWKULAR_METRICS_PASSWORD" | base64)"
  }
}
EOF

echo "$HAWKULAR_ACCOUNT_SECRET" | oc create -f - -n "$HAWKULAR_HOSA_PROJECT"
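
A note on encoding values for the data field of a Secret: both `echo` and a bash here-string (`<<<`) append a newline, so piping a value into base64 that way encodes one extra byte. A minimal sketch of the difference follows; the two function names are made up for illustration.

```shell
# Encode $1 exactly as given, with no trailing newline.
encode_secret_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# For comparison: encode $1 plus a trailing newline, which is what
# `echo "$1" | base64` or `base64 <<< "$1"` would feed to base64.
encode_with_trailing_newline() {
  printf '%s\n' "$1" | base64 | tr -d '\n'
}
```

With the username from the script, `encode_secret_value hawkular` prints `aGF3a3VsYXI=`, while `encode_with_trailing_newline hawkular` prints `aGF3a3VsYXIK`. Whether the extra newline matters depends on how the consumer reads the secret, so treat this as something to check rather than a confirmed cause of any problem in this thread.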

In case you want to try out a recent 3.6.0 cluster, this could be helpful for you.

The above script is not necessary. I have double-checked and the hawkular-metrics-account secret is available in the openshift-infra project upon oc cluster up --metrics.

@mwringe
Contributor

mwringe commented Apr 19, 2017

Yeah, we have made some changes in Origin Metrics lately that rearrange how our certificates are used. We will need a new update of HOSA to take this into account.

@jmazzitelli
Contributor

@jpkrohling - for some reason I can't assign you this ticket. But it's yours :)

@jpkrohling jpkrohling self-assigned this Apr 19, 2017
@jmazzitelli
Contributor

This will be addressed in PR #167

@jpkrohling jpkrohling removed their assignment Oct 9, 2018
4 participants