Server information:

Server | IP address | Notes
master | 172.16.0.4 | master node
I. System Settings
1. Set the kernel parameters in /etc/sysctl.d/k8s.conf on all machines.
cat <<EOF > /etc/sysctl.d/k8s.conf
# https://github.com/moby/moby/issues/31208
# ipvsadm -l --timeout
# Fix long-lived connection timeouts in ipvs mode; any value below 900 works
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv4.neigh.default.gc_stale_time = 120
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
net.ipv4.ip_forward = 1
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.netfilter.nf_conntrack_max = 2310720
fs.inotify.max_user_watches = 89100
fs.may_detach_mounts = 1
fs.file-max = 52706963
fs.nr_open = 52706963
net.bridge.bridge-nf-call-arptables = 1
vm.swappiness = 0
vm.overcommit_memory = 1
vm.panic_on_oom = 0
EOF
sysctl --system
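If sysctl --system reports unknown keys such as net.bridge.bridge-nf-call-iptables, the br_netfilter kernel module is probably not loaded yet. A minimal fix (the module name is standard; the modules-load.d path assumes a systemd-based distro):

modprobe br_netfilter
# load automatically on boot
echo br_netfilter > /etc/modules-load.d/br_netfilter.conf
# re-apply the parameters
sysctl --system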
2. Check whether the system kernel and modules are suitable for running Docker (Linux only):
curl -fsSL https://raw.githubusercontent.com/docker/docker/master/contrib/check-config.sh | bash
3. Install Docker
All machines need the Docker CE container engine; the releases with year-based version numbers are recommended.
To see which Docker versions Kubernetes supports, go to https://github.com/kubernetes/kubernetes, open the changelog for the target version, and search for "The list of validated docker versions remain".
Here we install with Docker's official install script:
export VERSION=18.06
curl -fsSL "https://get.docker.com/" | bash -s -- --mirror Aliyun
On all machines, configure a registry mirror and set Docker to use the systemd cgroup driver in its startup parameters; systemd is the official recommendation, see https://kubernetes.io/docs/setup/cri/
mkdir -p /etc/docker/
vim /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["https://fz5yth0r.mirror.aliyuncs.com"],
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
Enable Docker to start on boot.
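The commands themselves aren't shown in the original; presumably the standard systemd pair, plus a quick check that the cgroup driver change took effect:

systemctl enable docker
systemctl restart docker
docker info | grep -i 'cgroup driver'   # expect: Cgroup Driver: systemd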
4. Remember: all machines must have NTP configured.
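The guide leaves NTP setup to the reader. A minimal sketch using chrony, assuming a RHEL/CentOS-style system (package and service names differ on other distros):

yum install -y chrony
systemctl enable chronyd
systemctl start chronyd
chronyc sources    # verify upstream time servers are reachable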
II. Download and Unpack the Required Software
wget https://github.com/etcd-io/etcd/releases/download/v3.3.12/etcd-v3.3.12-linux-amd64.tar.gz
wget https://dl.k8s.io/v1.13.4/kubernetes-server-linux-amd64.tar.gz
wget https://dl.k8s.io/v1.13.4/kubernetes-client-linux-amd64.tar.gz
wget https://dl.k8s.io/v1.13.4/kubernetes-node-linux-amd64.tar.gz
wget https://github.com/coreos/flannel/releases/download/v0.11.0/flannel-v0.11.0-linux-amd64.tar.gz
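The unpack step itself isn't shown; a sketch matching the filenames above (the flannel tarball extracts into the current directory, hence the dedicated folder):

tar -xzf etcd-v3.3.12-linux-amd64.tar.gz
tar -xzf kubernetes-server-linux-amd64.tar.gz
tar -xzf kubernetes-client-linux-amd64.tar.gz
tar -xzf kubernetes-node-linux-amd64.tar.gz
mkdir -p flannel
tar -xzf flannel-v0.11.0-linux-amd64.tar.gz -C flannel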
cp kubernetes/server/bin/{kube-apiserver,kube-controller-manager,kube-scheduler,kubectl} /usr/local/sbin/
cp kubernetes/node/bin/{kubectl,kubelet,kube-proxy} /usr/local/sbin/
cp etcd-v3.3.12-linux-amd64/{etcd,etcdctl} /usr/local/sbin/
cp flannel/{flanneld,mk-docker-opts.sh} /usr/local/sbin/
III. Install etcd
vim /lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=/etc/etcd/etcd.conf
#User=etcd
# set GOMAXPROCS to number of processors
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/local/sbin/etcd \
  --name=\"${ETCD_NAME}\" \
  --data-dir=\"${ETCD_DATA_DIR}\" \
  --listen-client-urls=\"${ETCD_LISTEN_CLIENT_URLS}\" \
  --initial-advertise-peer-urls=\"${ETCD_INITIAL_ADVERTISE_PEER_URLS}\" \
  --listen-peer-urls=\"${ETCD_LISTEN_PEER_URLS}\" \
  --advertise-client-urls=\"${ETCD_ADVERTISE_CLIENT_URLS}\" \
  --initial-cluster=\"${ETCD_INITIAL_CLUSTER}\" \
  --initial-cluster-token=\"${ETCD_INITIAL_CLUSTER_TOKEN}\" \
  --initial-cluster-state=\"${ETCD_INITIAL_CLUSTER_STATE}\""
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Notes:
name: the node name.
data-dir: the node's data storage directory.
listen-client-urls: the addresses the node serves clients on, e.g. http://ip:2379,http://127.0.0.1:2379; clients connect here to interact with etcd.
initial-advertise-peer-urls: the peer address this node advertises, announced to the other members of the cluster.
listen-peer-urls: the listen URLs used for communication with other nodes.
advertise-client-urls: the client address this node advertises, announced to the other members of the cluster.
initial-cluster: information on all nodes in the cluster, in the form node1=http://ip1:2380,node2=http://ip2:2380,… . Note that node1 here is the name set by that node's --name, and ip1:2380 is the value of its --initial-advertise-peer-urls.
initial-cluster-token: the token used when creating the cluster; keep it unique per cluster. That way, re-creating a cluster generates new cluster and node UUIDs even if the configuration is identical; otherwise multiple clusters could conflict with one another and cause unpredictable errors.
initial-cluster-state: new when creating a fresh cluster; existing when joining a cluster that already exists.
Create the required directories:
mkdir -p /etc/etcd/ /var/lib/etcd/
Create the configuration file:
vim /etc/etcd/etcd.conf
# Data storage directory
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
# Cluster peer URL; use this machine's IP address
ETCD_LISTEN_PEER_URLS="http://172.16.0.4:2380"
# URLs for external clients; use this machine's IP address
ETCD_LISTEN_CLIENT_URLS="http://172.16.0.4:2379,http://127.0.0.1:2379"
# etcd node name
ETCD_NAME="master"
# Peer URL advertised to the other cluster members; use this machine's IP address
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://172.16.0.4:2380"
# Client URL advertised to external clients; use this machine's IP address
ETCD_ADVERTISE_CLIENT_URLS="http://172.16.0.4:2379"
# Initial cluster member list: every node's name and corresponding IP address
ETCD_INITIAL_CLUSTER="master=http://172.16.0.4:2380"
# Cluster token
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-1"
# Initial cluster state; new means creating a new cluster
ETCD_INITIAL_CLUSTER_STATE="new"
Start the service and check it:
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd
systemctl status etcd
List the members:
etcdctl member list
On each node, check the cluster health:
etcdctl cluster-health
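For this single-member cluster the output should look roughly like the following (the member ID will differ):

member 8e9e05c52164694d is healthy: got healthy result from http://172.16.0.4:2379
cluster is healthy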
Problem: the cluster check shows only the local node, with no information about the other nodes.
The new cluster configuration does not take effect after restarting the service: the etcd service has already initialized its database. Delete all files under /var/lib/etcd/default.etcd/member/, then restart the service on every node; errors reported during the restarts can be ignored until all nodes are back up. Afterwards, check that the service is healthy on every node and restart any that failed to come up.
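A sketch of that recovery, run on each node (this destroys the local etcd data, so it is only appropriate when re-initializing the cluster):

systemctl stop etcd
rm -rf /var/lib/etcd/default.etcd/member/*
systemctl start etcd
etcdctl member list    # confirm all members appear once every node is back up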
IV. Install the Master
1. Create kube-apiserver.service
vim /lib/systemd/system/kube-apiserver.service
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
After=etcd.service

[Service]
EnvironmentFile=/etc/kubernetes/apiserver
ExecStart=/usr/local/sbin/kube-apiserver $KUBE_API_ARGS
Restart=on-failure
Type=notify
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Create the kube-apiserver.service configuration file:
vim /etc/kubernetes/apiserver
KUBE_API_ARGS="--storage-backend=etcd3 --etcd-servers=http://127.0.0.1:2379 --insecure-bind-address=0.0.0.0 --insecure-port=8080 --service-cluster-ip-range=169.169.0.0/16 --service-node-port-range=1-65535 --admission-control=NamespaceLifecycle,LimitRanger,ResourceQuota --logtostderr=false --log-dir=/var/log/kubernetes/log --v=2"
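The --log-dir directory is never created in the guide; presumably it must exist before the services start (the same path is used by all the components below):

mkdir -p /var/log/kubernetes/log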
2. Create kube-controller-manager.service
vim /lib/systemd/system/kube-controller-manager.service
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
EnvironmentFile=/etc/kubernetes/controller-manager
ExecStart=/usr/local/sbin/kube-controller-manager $KUBE_CONTROLLER_MANAGER_ARGS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Create the kube-controller-manager configuration file:
vim /etc/kubernetes/controller-manager
KUBE_CONTROLLER_MANAGER_ARGS="--master=http://127.0.0.1:8080 --logtostderr=true --log-dir=/var/log/kubernetes/log --v=2"
3. Create kube-scheduler.service
vim /lib/systemd/system/kube-scheduler.service
[Unit]
Description=Kubernetes Scheduler Plugin
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
EnvironmentFile=/etc/kubernetes/scheduler
ExecStart=/usr/local/sbin/kube-scheduler $KUBE_SCHEDULER_ARGS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Create the kube-scheduler configuration file:
vim /etc/kubernetes/scheduler
KUBE_SCHEDULER_ARGS="--master=http://127.0.0.1:8080 --logtostderr=false --log-dir=/var/log/kubernetes/log --v=2"
4. Enable the services on boot and start them
systemctl daemon-reload
systemctl enable kube-apiserver.service
systemctl enable kube-controller-manager.service
systemctl enable kube-scheduler.service
systemctl start kube-apiserver.service
systemctl start kube-controller-manager.service
systemctl start kube-scheduler.service
5. Verify the master was installed successfully
kubectl get componentstatuses
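If the master is healthy, the output should resemble the following (exact formatting varies by version):

NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health": "true"}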
6. Check the service status
systemctl status kube-apiserver.service
systemctl status kube-controller-manager.service
systemctl status kube-scheduler.service
V. Install the Node
1. Create the kubelet.service unit
vim /lib/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
# create this directory manually
WorkingDirectory=/var/kubeletwork
EnvironmentFile=/etc/kubernetes/kubelet
ExecStart=/usr/local/sbin/kubelet $KUBELET_ARGS
Restart=on-failure

[Install]
WantedBy=multi-user.target
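As the comment notes, the unit's WorkingDirectory must be created by hand before the service starts:

mkdir -p /var/kubeletwork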
Create the kubelet configuration file:
vim /etc/kubernetes/kubelet
KUBELET_ARGS="--cgroup-driver=systemd --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --hostname-override=172.16.0.4 --logtostderr=true --log-dir=/var/log/kubernetes/log --v=2 --address=172.16.0.4 --port=10250 --fail-swap-on=false --pod-infra-container-image=zengshaoyong/pod-infrastructure"
vim /etc/kubernetes/kubelet.kubeconfig
current-context: test-context
apiVersion: v1
clusters:
- cluster:
    api-version: v1
    server: http://172.16.0.4:8080
  name: test-cluster
contexts:
- context:
    cluster: test-cluster
    namespace: default
  name: test-context
kind: Config
preferences:
  colors: true
2. Create the kube-proxy.service unit
vim /lib/systemd/system/kube-proxy.service
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=networking.service
Requires=networking.service

[Service]
EnvironmentFile=/etc/kubernetes/proxy
ExecStart=/usr/local/sbin/kube-proxy $KUBE_PROXY_ARGS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Create the kube-proxy configuration file:
vim /etc/kubernetes/proxy
KUBE_PROXY_ARGS="--master=http://172.16.0.4:8080 --hostname-override=172.16.0.4 --v=2 --logtostderr=true --log-dir=/var/log/kubernetes/log"
3. Enable the services on boot and start them
systemctl daemon-reload
systemctl enable kubelet
systemctl enable kube-proxy
systemctl start kubelet
systemctl start kube-proxy
4. Check the service status
systemctl status kubelet
systemctl status kube-proxy
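Once kubelet is running it should register the node with the API server; a quick check from the master:

kubectl get nodes    # expect node 172.16.0.4 in Ready state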
5. Common problems
F0406 23:01:02.939036 23516 server.go:261] failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
Fix: the kubelet cgroup driver must match Docker's. Edit /etc/docker/daemon.json

vim /etc/docker/daemon.json

and change

"exec-opts": ["native.cgroupdriver=cgroupfs"],

to:

"exec-opts": ["native.cgroupdriver=systemd"],
Then restart Docker:
systemctl daemon-reload
systemctl restart docker
VI. Install flannel
1. Write flannel's network configuration into etcd:
etcdctl set /etc/kubernetes/network/config '{"Network": "172.17.0.0/16"}'
or, using the /coreos.com/network prefix that the flanneld unit below expects:
etcdctl --endpoints http://127.0.0.1:2379 set /coreos.com/network/config '{"Network": "172.17.0.0/16", "SubnetLen": 24, "SubnetMin": "172.17.0.0","SubnetMax": "172.17.20.0", "Backend": {"Type": "vxlan"}}'
Notes:
Network: the address pool flannel allocates from.
SubnetLen: the mask length of the docker0 subnet assigned to each host.
SubnetMin: the lowest subnet that can be assigned.
SubnetMax: the highest subnet that can be assigned. In the example above, each host receives a /24 subnet, assignable from 172.17.0.0/24 through 172.17.20.0/24, so the range supports at most 21 hosts.
Backend: how packets are forwarded; the default is udp. host-gw performs best but cannot cross host network segments.
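To confirm the network configuration was written, it can be read back (assuming the second, /coreos.com/network form above):

etcdctl --endpoints http://127.0.0.1:2379 get /coreos.com/network/config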
2. Create the flanneld.service unit
vim /etc/systemd/system/flanneld.service
[Unit]
Description=Flanneld
Documentation=https://github.com/coreos/flannel
After=network.target
Before=docker.service

[Service]
User=root
ExecStart=/usr/local/sbin/flanneld \
  --etcd-endpoints="http://172.16.0.4:2379" \
  --iface=172.16.0.4 \
  --ip-masq=true \
  --etcd-prefix=/coreos.com/network
ExecStartPost=/usr/local/sbin/mk-docker-opts.sh
Restart=on-failure
Type=notify
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
3. Enable the service on boot and start it
systemctl daemon-reload
systemctl enable flanneld
systemctl start flanneld
systemctl status flanneld
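Once flanneld is up it writes its subnet lease to /run/flannel/subnet.env; reading it is a quick sanity check (the subnet value will differ per host):

cat /run/flannel/subnet.env
# FLANNEL_NETWORK=172.17.0.0/16
# FLANNEL_SUBNET=172.17.0.1/24
# FLANNEL_MTU=1450
# FLANNEL_IPMASQ=true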
VII. Restart docker, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet, and kube-proxy
On the master, run:
systemctl restart docker.service
systemctl restart kube-apiserver.service
systemctl restart kube-controller-manager.service
systemctl restart kube-scheduler.service
On the node, run:
systemctl restart docker.service
systemctl restart kubelet.service
systemctl restart kube-proxy.service
VIII. Check the Network Interfaces
Run ifconfig and check that the docker0 interface and the flannel interface (flannel.1 with the vxlan backend configured above) are on the same IP address segment.

/usr/local/sbin/mk-docker-opts.sh   # the script converts /run/flannel/subnet.env and generates /run/docker_opts.env
cat /run/docker_opts.env
DOCKER_OPT_BIP="--bip=172.17.0.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=false"
DOCKER_OPT_MTU="--mtu=1450"
DOCKER_OPTS=" --bip=172.17.0.1/24 --ip-masq=false --mtu=1450"
Edit the docker.service unit file: add

After=flanneld.service

and

EnvironmentFile=/run/docker_opts.env

then change the dockerd start line to:

ExecStart=/usr/bin/dockerd $DOCKER_OPTS -H fd://
vim /lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target docker.socket firewalld.service
After=flanneld.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
EnvironmentFile=/run/docker_opts.env
ExecStart=/usr/bin/dockerd $DOCKER_OPTS -H fd://
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
# restart the docker process if it exits prematurely
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s

[Install]
WantedBy=multi-user.target
Reload and restart the Docker service:
systemctl daemon-reload
systemctl restart docker
systemctl status docker
IX. Check the Status of All Services
systemctl status etcd
systemctl status kube-apiserver
systemctl status kube-controller-manager
systemctl status kube-scheduler
systemctl status kubelet
systemctl status kube-proxy
systemctl status flanneld
X. Create a Test Application
1. Create the application:
kubectl run nginx --image=nginx --replicas=2 --port=80 --expose=true
Check the pods:
kubectl get pods
Check the services:
kubectl get svc
2. Expose the service for external access
An application created with kubectl run gets a virtual (cluster) IP by default and cannot be reached from outside the cluster. A NodePort-type service is needed to map a node port to the container:
kubectl expose deployment nginx --type=NodePort --name=nginx-nodeport
kubectl get deployment
kubectl get pods
kubectl get svc
3. Access the application through the NodePort service
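The guide doesn't show this step; a sketch that looks up the assigned node port (nginx-nodeport is the service created above) and requests the page from the node's IP:

NODE_PORT=$(kubectl get svc nginx-nodeport -o jsonpath='{.spec.ports[0].nodePort}')
curl http://172.16.0.4:${NODE_PORT}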
XI. Miscellaneous
1. Configure IP forwarding
vim /etc/sysctl.conf
net.ipv4.ip_forward=1
net.ipv4.conf.all.rp_filter=0
net.ipv4.conf.default.rp_filter=0
sysctl -p
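A quick check that forwarding is now enabled:

sysctl net.ipv4.ip_forward    # expect: net.ipv4.ip_forward = 1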