Running Kubernetes with SELinux Enabled on Fedora 33/CentOS

I have been looking into ways to run Kubernetes with security more firmly in mind. This time I tried running Kubernetes with SELinux left enabled.

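Target versions

This article assumes Fedora 33, and CentOS 7.9 and 8.3 with updates applied. I have not verified CentOS Stream 8 or other RHEL clones.

Addendum

The same procedure also worked on Fedora 34 Beta. Because runc has been updated in Fedora 34 Beta, I confirmed that switching to Control Group v1 is no longer necessary there.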

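OS installation

Set up the OS. Installing from the Fedora Server ISO uses XFS as the file system. Btrfs reportedly can be somewhat unreliable when used in a Kubernetes environment.

Checking SELinux

Unless you have deliberately changed the setting, SELinux should be enabled by default, but confirm that it is in "enforcing" mode just in case.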

# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Memory protection checking:     actual (secure)
Max kernel policy version:      33
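
If this check is going to be repeated on several nodes, it can be scripted. The following is a minimal sketch, assuming the sestatus output format shown above; selinux_is_enforcing is a hypothetical helper name introduced only for this illustration.

```shell
# Reads `sestatus` output on stdin and succeeds only when the
# "Current mode" line reports enforcing.
# selinux_is_enforcing is a hypothetical helper, not part of any package.
selinux_is_enforcing() {
  awk '$1 == "Current" && $2 == "mode:" { mode = $3 }
       END { exit !(mode == "enforcing") }'
}
```

It could be used as `sestatus | selinux_is_enforcing || echo "SELinux is not enforcing" >&2`.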

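kubeadm (or rather the kubelet) does not officially support SELinux, and the official instructions have you switch SELinux to permissive mode.

This does not seem to be covered in the documentation yet, but SELinux support in kubeadm is discussed regularly, so I expect it will be documented eventually.

This time I work around the problem with the method described in the following issue. Run these commands before running kubeadm init.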

https://github.com/kubernetes/kubeadm/issues/1654

# mkdir -p /var/lib/etcd/
# mkdir -p /etc/kubernetes/pki/
# chcon -R -t svirt_sandbox_file_t /var/lib/etcd/
# chcon -R -t svirt_sandbox_file_t /etc/kubernetes/
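
To confirm that the relabeling took effect, the type field of the resulting context can be inspected. This is a rough sketch; context_type is a hypothetical helper name, and it assumes the usual user:role:type:level context layout.

```shell
# Print the type field (third colon-separated field) of an SELinux context.
# context_type is a hypothetical helper used only for this illustration.
context_type() {
  printf '%s\n' "$1" | cut -d: -f3
}
```

For example, `context_type "$(stat -c %C /var/lib/etcd)"` prints the type of the directory. On systems where svirt_sandbox_file_t is defined as an alias, the name reported may be container_file_t instead, so treat either as acceptable.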

Swap

CentOS

CentOS 7 and 8 use the conventional swap mechanism, so disable swap the usual way.

# vi /etc/fstab  # comment out the swap partition or swap file entry
# swapoff -a
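
The manual fstab edit can also be done non-interactively. The following is a minimal sketch, assuming swap entries are the lines whose third field (the file system type) is swap; comment_out_swap is a hypothetical helper name.

```shell
# Comment out active swap entries in fstab-formatted text read from stdin.
# A line counts as a swap entry when its third field is "swap" and it is
# not already commented out. All other lines pass through unchanged.
comment_out_swap() {
  awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }'
}
```

One cautious way to use it is `comment_out_swap < /etc/fstab > /tmp/fstab.new`, then review the result before copying it over /etc/fstab.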

Fedora 33

Fedora 33 configures swap-on-zram by default. Disable it.

# yum remove zram-generator && swapoff -a

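Dealing with the Control Group v2 issue

CentOS

CentOS 7.x and 8.x need no special handling.

Fedora 31-33

The runc package shipped up to Fedora 33 does not support Control Group v2, so either switch back to v1 or use crun, which does support Control Group v2. This example switches back to v1. See "Fedora 31 and Control Group v2" for details. Fedora 34 and later should no longer need this step.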

# grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0" && reboot
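
After the reboot, it can be useful to confirm which hierarchy is actually in use. A common check is the file system type mounted at /sys/fs/cgroup: cgroup2fs indicates the unified (v2) hierarchy and tmpfs indicates v1. The sketch below wraps that mapping; cgroup_version is a hypothetical helper name.

```shell
# Map the file system type of /sys/fs/cgroup to a cgroup version.
# cgroup2fs -> unified hierarchy (v2), tmpfs -> legacy hierarchy (v1).
cgroup_version() {
  case "$1" in
    cgroup2fs) echo v2 ;;
    tmpfs)     echo v1 ;;
    *)         echo unknown ;;
  esac
}
```

Running `cgroup_version "$(stat -fc %T /sys/fs/cgroup)"` on the node should print v1 once the kernel argument above has taken effect.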

For reference, the following article shows roughly how to use crun instead.

ytooyama.hatenadiary.jp

Addendum

I am still checking the details, but a runc update appears to be on its way. Once the next version arrives, switching from Control Group v2 to v1 should no longer be necessary.

# dnf changelog runc
メタデータの期限切れの最終確認: 0:42:38 時間前の 2021年04月06日 15時14分00秒 に実施しました。
Listing all changelogs
runc-2:1.0.0-375.dev.git12644e6.fc33.x86_64 の Changelogs
* 月  4月 05 00時00分00秒 2021 Peter Hunt <pehunt@redhat.com> - 2:1.0.0-375.dev.git12644e6
- bump to v1.0.0-rc93

* 月  4月 05 00時00分00秒 2021 Peter Hunt <pehunt@redhat.com> - 2:1.0.0-374.dev.git7e3c3e8
- Patch: revert https://github.com/opencontainers/runc/pull/2773
...

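Opening ports

The examples below assume the following address ranges.

10.244.0.0/16: IP address range used by the CNI
192.168.0.0/24: range of the physical network

When running everything on a single machine, merge both sets of commands and run them.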

master

firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --zone trusted --add-source 10.244.0.0/16 --permanent
firewall-cmd --zone trusted --add-source 192.168.0.0/24 --permanent
firewall-cmd --zone=public --add-masquerade
firewall-cmd --permanent --zone=public --add-masquerade
firewall-cmd --reload

worker

firewall-cmd --permanent --add-port=30000-32767/tcp
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --zone trusted --add-source 10.244.0.0/16 --permanent
firewall-cmd --zone trusted --add-source 192.168.0.0/24 --permanent
firewall-cmd --zone=public --add-masquerade
firewall-cmd --permanent --zone=public --add-masquerade
firewall-cmd --reload
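
For the single-machine case, the merged set of commands might look like the following sketch. It prints the commands by default and only executes them when APPLY=1 is set; both the run wrapper and the APPLY switch are hypothetical conveniences added for this illustration. The runtime-only --add-masquerade call is left out on the assumption that the final --reload applies the permanent setting.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 (hypothetical switch for this sketch) and run as root to apply.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Union of the master and worker rules for a single-node cluster.
single_node_firewall() {
  for port in 6443/tcp 10250/tcp 30000-32767/tcp; do
    run firewall-cmd --permanent --add-port="$port"
  done
  for cidr in 10.244.0.0/16 192.168.0.0/24; do
    run firewall-cmd --permanent --zone=trusted --add-source="$cidr"
  done
  run firewall-cmd --permanent --zone=public --add-masquerade
  run firewall-cmd --reload
}

single_node_firewall
```

Printing first makes it easy to review the rules before committing them, which matters here because the trusted zone accepts all traffic from the listed sources.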

Depending on your configuration, additional ports may need to be opened, for example when using Calico.

docs.projectcalico.org

Installing the CRI

Install CRI-O, following "Kubernetes Docs - Container runtimes".

Prerequisite settings

Configure loading of the overlay and br_netfilter kernel modules, along with the required sysctl parameters.

# Create the .conf file to load the modules at bootup
cat <<EOF | sudo tee /etc/modules-load.d/crio.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# Set up required sysctl params, these persist across reboots.
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

sudo sysctl --system
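
To double-check that both modules were loaded, the list in /proc/modules can be compared against the required names. This is a rough sketch; missing_modules is a hypothetical helper name, and it assumes the modules are built as loadable modules (a module compiled into the kernel does not appear in /proc/modules).

```shell
# Reads /proc/modules-formatted text on stdin and prints each required
# module name (given as arguments) that is not present in it.
missing_modules() {
  loaded=$(awk '{ print $1 }')
  for mod in "$@"; do
    printf '%s\n' "$loaded" | grep -qx -- "$mod" || echo "$mod"
  done
}
```

Running `missing_modules overlay br_netfilter < /proc/modules` should print nothing when both modules are loaded.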

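CentOS

On CentOS, set up CRI-O using packages from an external package repository. Install it according to the CentOS section of "Kubernetes Docs - CRI-O".

Fedora 33

On Fedora, set up CRI-O using a DNF module. Check which versions are available, enable the module stream, and then install the CRI-O package. This article assumes version 1.20 for both Kubernetes and CRI-O (as a rule, use matching versions), so run the following.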

# dnf module list cri-o
# dnf module enable cri-o:1.20
# dnf install cri-o

Installing kubeadm

Run the following commands to install kubeadm and the related tools.

# cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF

# yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes

Changing the kubelet configuration

Add cgroup-driver=systemd to the kubelet settings.

# vi /etc/sysconfig/kubelet 
KUBELET_EXTRA_ARGS=--cgroup-driver=systemd

Start the CRI-O service.

# systemctl enable --now crio

Start the kubelet service.

# systemctl enable --now kubelet

Creating the Kubernetes cluster

Run the following to create the cluster. Specify the node's IP address for --control-plane-endpoint.

# kubeadm init --kubernetes-version 1.20.5 --pod-network-cidr=10.244.0.0/16 --control-plane-endpoint=<Public IP Address>

When the cluster is created successfully, output like the following appears.

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a Pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  /docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join <control-plane-host>:<control-plane-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash>

To make the kubectl command usable, either run the following or create ~/.kube/config as described in the output above.

# export KUBECONFIG=/etc/kubernetes/admin.conf

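Setting up the cluster nodes

Running on a single machine

To use the master node as a worker as well (including the case of running the whole cluster on one machine), run the following command so that Pods can be scheduled on it.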

# kubectl taint nodes --all node-role.kubernetes.io/master-

Running with one master and N workers

On each node that has gone through the same initial setup, run as root the command that was printed when kubeadm init completed.

You can now join any number of machines by running the following on each node
as root:

  kubeadm join <control-plane-host>:<control-plane-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash>

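Setting up the CNI

kubeadm is essentially a tool for creating a cluster, so it does nothing beyond that automatically. To provide networking for Pods, set up a network add-on in the cluster. That said, in most cases this just means applying a YAML file. This time I installed Calico by following "Install Calico networking and network policy for on-premises deployments".

The YAML defines the Pod CIDR, so I changed it to 10.244.0.0/16 before running the command to install the Calico CNI. Depending on the size of the cluster, you may need to consider a different installation method.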

# kubectl apply -f calico.yaml

After roughly 10 minutes, confirm that the node has become Ready.

# kubectl get nodes
NAME                 STATUS   ROLES                  AGE     VERSION
fedora.localdomain   Ready    control-plane,master   3h13m   v1.20.5
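
Rather than waiting a fixed time, the readiness check can be polled. The following is a minimal sketch that parses the kubectl get nodes output shown above; all_nodes_ready is a hypothetical helper name.

```shell
# Reads `kubectl get nodes` output on stdin. Succeeds only if at least one
# node is listed and every node's STATUS column is exactly "Ready".
all_nodes_ready() {
  awk 'NR > 1 { seen = 1; if ($2 != "Ready") bad = 1 }
       END { exit (bad || !seen) }'
}
```

It could be used as `until kubectl get nodes | all_nodes_ready; do sleep 10; done`. kubectl also has a built-in alternative, `kubectl wait --for=condition=Ready node --all --timeout=600s`, which avoids parsing text output.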

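Creating a Pod in the default state

Recent versions of CRI-O restrict the default Linux capabilities quite tightly, so attempts to do something slightly unusual get blocked. SELinux is also enabled on the host, which makes things safer still.

For example, the "stealing a shell by abusing HostPath" attack that I wrote about earlier on the following blog can no longer be carried out (as also mentioned in the CRI-O section of that article).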

tech.virtualtech.jp

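Configuring a security context for a Pod or container

Following "Configure a Security Context for a Pod or Container", I tried configuring a security context for a Pod or container.

That page gives a variety of concrete examples; here I try assigning a security label with SELinux. Create the following YAML and create a Pod from it.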

# cat security-context-selinux.yaml
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo-selinux
spec:
  containers:
  - name: sec-ctx-selinux
    image: gcr.io/google-samples/node-hello:1.0
    securityContext:
      seLinuxOptions:
        level: "s0:c123,c456"

# kubectl create -f security-context-selinux.yaml

Running the following should show that the specified label is set on the processes inside the Pod.

# kubectl get -f security-context-selinux.yaml
NAME                            READY   STATUS    RESTARTS   AGE
security-context-demo-selinux   1/1     Running   0          59s
# kubectl exec -it security-context-demo-selinux -- sh
# ps auZ
LABEL                           USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
system_u:system_r:container_t:s0:c123,c456 root 12 0.0  0.0  4340   760 pts/0    Ss   02:21   0:00 sh
system_u:system_r:container_t:s0:c123,c456 root 18 0.0  0.0 17504  2116 pts/0    R+   02:21   0:00 ps auZ
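
To compare the result against the value in the manifest, the level portion of the label can be extracted. This sketch assumes the user:role:type:level layout seen in the ps auZ output above; context_level is a hypothetical helper name.

```shell
# Print the MLS/MCS level (fourth colon-separated field onward) of an
# SELinux context string, e.g. the LABEL column of `ps auZ`.
context_level() {
  printf '%s\n' "$1" | cut -d: -f4-
}
```

Passing the label from the output above, `context_level 'system_u:system_r:container_t:s0:c123,c456'`, yields s0:c123,c456, which matches the level set under seLinuxOptions.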

I confirmed that the other examples described on the following page also work as expected.

Configure a Security Context for a Pod or Container

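Incidentally, I dug through documentation, books, and other literature on using SELinux with Kubernetes, but found nothing that covers it in much detail. Now that I have a working environment, it looks like I will need to investigate it thoroughly from here.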