Commit d49bc17

Merge pull request #217 from max-rocket-internet/readme_update
Adding helm chart info to readme and fixing some formatting
2 parents cf9b6cf + 4ecb267 commit d49bc17

1 file changed: +26 -4 lines changed

README.md

Lines changed: 26 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,5 @@
11
# node-problem-detector
2+
23
[![Build Status](https://travis-ci.org/kubernetes/node-problem-detector.svg?branch=master)](https://travis-ci.org/kubernetes/node-problem-detector) [![Go Report Card](https://goreportcard.com/badge/github.com/kubernetes/node-problem-detector)](https://goreportcard.com/report/github.com/kubernetes/node-problem-detector)
34

45
node-problem-detector aims to make various node problems visible to the upstream
@@ -12,6 +13,7 @@ Now it is running as a
1213
enabled by default in the GCE cluster.
1314

1415
# Background
16+
1517
There are many node problems that could affect the pods running on the
1618
node such as:
1719
* Infrastructure daemon issues: ntp service down;
@@ -29,6 +31,7 @@ layers. Once upstream layers have the visibility to those problems, we can discu
2931
[remedy system](#remedy-systems).
3032

3133
# Problem API
34+
3235
node-problem-detector uses `Event` and `NodeCondition` to report problems to
3336
apiserver.
3437
* `NodeCondition`: Permanent problem that makes the node unavailable for pods should
@@ -37,6 +40,7 @@ be reported as `NodeCondition`.
3740
should be reported as `Event`.
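
The split between the two report types can be pictured with a small Go sketch. This is only an illustration under stated assumptions, not the project's actual code: the `Problem` type, its `Permanent` field, the `reportKind` helper, and the reason `ExamplePermanentFault` are all hypothetical, and the sketch assumes that any problem that is not permanent is reported as an `Event`.

```go
package main

import "fmt"

// Problem is a hypothetical, simplified description of a detected problem.
type Problem struct {
	Reason    string
	Permanent bool // true if the problem makes the node unavailable for pods
}

// reportKind returns the name of the API object used to report p.
// Assumption: everything that is not permanent is reported as an Event.
func reportKind(p Problem) string {
	if p.Permanent {
		return "NodeCondition"
	}
	return "Event"
}

func main() {
	problems := []Problem{
		{Reason: "ExamplePermanentFault", Permanent: true}, // hypothetical reason
		{Reason: "KernelOops", Permanent: false},
	}
	for _, p := range problems {
		fmt.Printf("%s -> %s\n", p.Reason, reportKind(p))
	}
}
```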
3841

3942
# Problem Daemon
43+
4044
A problem daemon is a sub-daemon of node-problem-detector. It monitors a specific
4145
kind of node problems and reports them to node-problem-detector.
4246

@@ -57,7 +61,9 @@ List of supported problem daemons:
5761
| [CustomPluginMonitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/custom-plugin-monitor.json) | On-demand (according to the user's configuration) | A custom plugin monitor for node-problem-detector to invoke and check various node problems with user-defined check scripts. See proposal [here](https://docs.google.com/document/d/1jK_5YloSYtboj-DtfjmYKxfNnUxCAvohLnsH5aGCAYQ/edit#). |
5862

5963
# Usage
64+
6065
## Flags
66+
6167
* `--version`: Print current version of node-problem-detector.
6268
* `--address`: The address to bind the node problem detector server.
6369
* `--port`: The port to bind the node problem detector server. Use 0 to disable.
@@ -81,6 +87,7 @@ For example, to run without auth, use the following config:
8187
* `--hostname-override`: A customized node name used for node-problem-detector to update conditions and emit events. node-problem-detector gets the node name first from `hostname-override`, then from the `NODE_NAME` environment variable, and finally falls back to `os.Hostname`.
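
The lookup order for the node name described under `--hostname-override` could be sketched in Go roughly as follows. This is a minimal sketch, not the project's implementation: `resolveNodeName` and its injected `getenv` and `hostnameFn` parameters are hypothetical names, introduced so that the fallback chain is easy to exercise in isolation.

```go
package main

import (
	"fmt"
	"os"
)

// resolveNodeName follows the documented order: the --hostname-override
// value first, then the NODE_NAME environment variable, then the OS hostname.
// getenv and hostnameFn are injected (hypothetical design) for testability.
func resolveNodeName(override string, getenv func(string) string, hostnameFn func() (string, error)) (string, error) {
	if override != "" {
		return override, nil
	}
	if name := getenv("NODE_NAME"); name != "" {
		return name, nil
	}
	return hostnameFn()
}

func main() {
	// An empty override stands in for the flag not being set.
	name, err := resolveNodeName("", os.Getenv, os.Hostname)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine node name:", err)
		os.Exit(1)
	}
	fmt.Println(name)
}
```

Passing the environment lookup and hostname function as parameters keeps the precedence logic free of global state.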
8288

8389
## Build Image
90+
8491
* `go get` or `git clone` node-problem-detector repo into `$GOPATH/src/k8s.io` or `$GOROOT/src/k8s.io`
8592
using one of the commands below:
8693
* `cd $GOPATH/src/k8s.io && git clone git@github.com:kubernetes/node-problem-detector.git`
@@ -97,17 +104,29 @@ You should download the systemd develop files first. For Ubuntu, `libsystemd-jou
97104
be installed. For Debian, `libsystemd-dev` package should be installed.
98105

99106
## Push Image
107+
100108
`make push` uploads the Docker image to a registry. By default, the image will be uploaded to
101109
`staging-k8s.gcr.io`. It's easy to modify the `Makefile` to push the image
102110
to another registry.
103111

104-
## Start DaemonSet
105-
* Edit [node-problem-detector.yaml](https://github.com/kubernetes/node-problem-detector/blob/master/deployment/node-problem-detector.yaml) to fit your environment: Set `log` volume to your system log directory. (Used by SystemLogMonitor). For **kubernetes <1.9** use [node-problem-detector-old.yaml](https://github.com/kubernetes/node-problem-detector/blob/master/deployment/node-problem-detector-old.yaml)
106-
* If needed, you can use [ConfigMap](https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/)
107-
to overwrite the `config/`, Edit [node-problem-detector-config.yaml](https://github.com/kubernetes/node-problem-detector/blob/master/deployment/node-problem-detector-config.yaml) to fit your environment. and create the ConfigMap with `kubectl create -f node-problem-detector-config.yaml`.
112+
## Installation
113+
114+
The easiest way to install node-problem-detector into your cluster is to use the [Helm](https://helm.sh/) [chart](https://github.com/helm/charts/tree/master/stable/node-problem-detector):
115+
116+
```
117+
helm install stable/node-problem-detector
118+
```
119+
120+
Alternatively, to install node-problem-detector manually:
121+
122+
* Edit [node-problem-detector.yaml](deployment/node-problem-detector.yaml) to fit your environment. Set `log` volume to your system log directory (used by SystemLogMonitor). For Kubernetes versions older than 1.9, use [node-problem-detector-old.yaml](deployment/node-problem-detector-old.yaml).
123+
124+
* If needed, you can use a [ConfigMap](https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/) to overwrite the `config` directory inside the pod. Edit [node-problem-detector-config.yaml](deployment/node-problem-detector-config.yaml) as required and create the `ConfigMap` with `kubectl create -f node-problem-detector-config.yaml`.
125+
108126
* Create the DaemonSet with `kubectl create -f node-problem-detector.yaml`.
109127

110128
## Start Standalone
129+
111130
To run node-problem-detector standalone, you should set `inClusterConfig` to `false` and
112131
teach node-problem-detector how to access apiserver with `apiserver-override`.
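
As a rough illustration, the value passed to `--apiserver-override` could be assembled as below. This is a sketch under assumptions: it assumes `inClusterConfig` is supplied as a URL query parameter on the override address, and the `overrideURL` helper, the host `127.0.0.1`, and the port `8080` are placeholders that do not come from this README.

```go
package main

import (
	"fmt"
	"net/url"
)

// overrideURL builds an apiserver address with inClusterConfig=false.
// Assumption: inClusterConfig is passed as a query parameter of the URL.
func overrideURL(host string, port int) string {
	u := url.URL{
		Scheme: "http",
		Host:   fmt.Sprintf("%s:%d", host, port),
	}
	q := url.Values{}
	q.Set("inClusterConfig", "false")
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	// Placeholder address; substitute your API server's IP and port.
	fmt.Println("--apiserver-override=" + overrideURL("127.0.0.1", 8080))
}
```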
113132

@@ -119,6 +138,7 @@ node-problem-detector --apiserver-override=http://APISERVER_IP:APISERVER_INSECUR
119138
For more scenarios, see [here](https://github.com/kubernetes/heapster/blob/master/docs/source-configuration.md#kubernetes)
120139

121140
## Try It Out
141+
122142
You can try node-problem-detector in a running cluster by injecting messages into the logs that node-problem-detector is watching. For example, assume node-problem-detector is using [KernelMonitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json). On your workstation, run ```kubectl get events -w```. On the node, run ```sudo sh -c "echo 'kernel: BUG: unable to handle kernel NULL pointer dereference at TESTING' >> /dev/kmsg"```. Then you should see the ```KernelOops``` event.
123143

124144
When adding new rules or developing node-problem-detector, it is probably easier to test it on the local workstation in the standalone mode. For the API server, an easy way is to use ```kubectl proxy``` to make a running cluster's API server available locally. You will get some errors because your local workstation is not recognized by the API server. But you should still be able to test your new rules regardless.
@@ -139,6 +159,7 @@ For example, to test [KernelMonitor](https://github.com/kubernetes/node-problem-
139159
- For [KernelMonitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json) message injection, all messages should have the ```kernel: ``` prefix (note the space after ```:```).
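
When generating test messages from a script or tool, the prefix rule can be enforced with a small helper. The following Go sketch is only an illustration: `kernelMessage` is a hypothetical function, and it merely formats the line without writing to any log.

```go
package main

import (
	"fmt"
	"strings"
)

// kernelPrefix includes the trailing space required after the colon.
const kernelPrefix = "kernel: "

// kernelMessage returns msg with the "kernel: " prefix, adding the prefix
// (or the missing space after the colon) when it is not already there.
func kernelMessage(msg string) string {
	if strings.HasPrefix(msg, kernelPrefix) {
		return msg
	}
	return kernelPrefix + strings.TrimPrefix(msg, "kernel:")
}

func main() {
	fmt.Println(kernelMessage("BUG: unable to handle kernel NULL pointer dereference at TESTING"))
}
```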
140160

141161
# Remedy Systems
162+
142163
A _remedy system_ is a process or processes designed to attempt to remedy problems
143164
detected by the node-problem-detector. Remedy systems observe events and/or node
144165
conditions emitted by the node-problem-detector and take action to return the
@@ -156,6 +177,7 @@ Kubernetes cluster to a healthy state. The following remedy systems exist:
156177
for an example production use case for Draino.
157178

158179
# Links
180+
159181
* [Design Doc](https://docs.google.com/document/d/1cs1kqLziG-Ww145yN6vvlKguPbQQ0psrSBnEqpy0pzE/edit?usp=sharing)
160182
* [Slides](https://docs.google.com/presentation/d/1bkJibjwWXy8YnB5fna6p-Ltiy-N5p01zUsA22wCNkXA/edit?usp=sharing)
161183
* [Plugin Interface Proposal](https://docs.google.com/document/d/1jK_5YloSYtboj-DtfjmYKxfNnUxCAvohLnsH5aGCAYQ/edit#)
