Getting started
This section lists the different ways to set up and run Kubernetes.
When you install Kubernetes, choose an installation type based on: ease of maintenance, security,
control, available resources, and expertise required to operate and manage a cluster.
You can deploy a Kubernetes cluster on a local machine, in the cloud, or in an on-prem datacenter, or you can choose a managed Kubernetes cluster. There are also custom solutions across a wide range of cloud providers and bare metal environments.
Learning environment
If you're learning Kubernetes, use the tools supported by the Kubernetes community, or tools in the ecosystem to set up a Kubernetes cluster on a local machine.
Production environment
When evaluating a solution for a production environment, consider which aspects of operating a Kubernetes cluster (or abstractions) you want to manage yourself or offload to a provider.
The Kubernetes Partners page includes a list of Certified Kubernetes providers.
1 - Release notes and version skew
1.1 - v1.21 Release Notes
v1.21.0
Documentation
Downloads for v1.21.0
Source Code
filename | sha512 hash
-------- | -----------
kubernetes.tar.gz | 19bb76a3fa5ce4b9f043b2a3a77c32365ab1fcb902d8dd6678427fb8be8f49f64a5a03dc46aaef9c7dadee05501cf83412eda46f0edacbb8fc1ed0bf5fb79142
kubernetes-src.tar.gz | f942e6d6c10007a6e9ce21e94df597015ae646a7bc3e515caf1a3b79f1354efb9aff59c40f2553a8e3d43fe4a01742241f5af18b69666244906ed11a22e3bc49
Client Binaries
Server Binaries
Node Binaries
Changelog since v1.20.0
What's New (Major Themes)
Deprecation of PodSecurityPolicy
PSP as an admission controller resource is being deprecated. Deployed PodSecurityPolicies will keep working until version 1.25, their target removal from the codebase. A new feature, with a working title of "PSP replacement policy", is being developed in KEP-2579. To learn more, read PodSecurityPolicy Deprecation: Past, Present, and Future.
Kubernetes API Reference Documentation
The API reference is now generated with gen-resourcesdocs and is moving to the Kubernetes API reference section.
Kustomize Updates in Kubectl
Kustomize version in kubectl had a jump from v2.0.3 to v4.0.5. Kustomize is now treated as a library and future updates will be less sporadic.
Default Container Labels
A Pod with multiple containers can use the kubectl.kubernetes.io/default-container label to have a container preselected for kubectl commands. More can be read in KEP-2227.
Immutable Secrets and ConfigMaps
Immutable Secrets and ConfigMaps graduate to GA. This feature allows users to specify that the contents of a particular Secret or ConfigMap are immutable for the object's lifetime. For such instances, the kubelet will not watch or poll for changes, reducing apiserver load.
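As a minimal sketch, marking a ConfigMap immutable only requires the top-level immutable field; the name and data below are placeholders, not taken from the release notes:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config        # placeholder name
data:
  LOG_LEVEL: "info"       # placeholder data
# Once set to true, data/binaryData can no longer be changed; the only way
# to "update" the contents is to delete and recreate the object.
immutable: true
```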
Structured Logging in Kubelet
Kubelet has adopted structured logging, thanks to community effort in accomplishing this within the release timeline. Structured logging in the project remains an ongoing effort -- for folks interested in participating, keep an eye on, or chime in to, the mailing list discussion.
Storage Capacity Tracking
Traditionally, the Kubernetes scheduler was based on the assumptions that additional persistent storage is available everywhere in the cluster and has infinite capacity. Topology constraints addressed the first point, but up to now pod scheduling was still done without considering that the remaining storage capacity may not be enough to start a new pod. Storage capacity tracking addresses that by adding an API for a CSI driver to report storage capacity and uses that information in the Kubernetes scheduler when choosing a node for a pod. This feature serves as a stepping stone for supporting dynamic provisioning for local volumes and other volume types that are more capacity constrained.
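For illustration, CSIStorageCapacity objects of roughly the following shape are published by a CSI driver deployment rather than written by hand; the driver topology key, StorageClass name, and quantities are hypothetical, and the beta storage.k8s.io/v1beta1 API mentioned in the API Change section below is assumed:

```yaml
apiVersion: storage.k8s.io/v1beta1
kind: CSIStorageCapacity
metadata:
  name: capacity-node-a          # hypothetical; real objects get generated names
  namespace: kube-system
# Which nodes this capacity figure applies to.
nodeTopology:
  matchLabels:
    topology.hostpath.csi/node: node-a   # hypothetical topology key/value
storageClassName: fast-local             # hypothetical StorageClass
# Remaining capacity the driver reports for this class/topology; the
# scheduler compares this against a pending PVC's requested size.
capacity: 100Gi
```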
Generic Ephemeral Volumes
The generic ephemeral volumes feature allows any existing storage driver that supports dynamic provisioning to be used as an ephemeral volume with the volume's lifecycle bound to the Pod. It can be used to provide scratch storage that is different from the root disk, for example persistent memory, or a separate local disk on that node. All StorageClass parameters for volume provisioning are supported. All features supported with PersistentVolumeClaims are supported, such as storage capacity tracking, snapshots and restore, and volume resizing.
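A minimal sketch of a Pod requesting a generic ephemeral volume; the names, image, StorageClass, and size are placeholders. The PVC created from the template is owned by the Pod and deleted with it:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-example            # placeholder name
spec:
  containers:
  - name: app
    image: busybox:1.33            # placeholder image
    command: ["sleep", "3600"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: scratch-storage    # placeholder class
          resources:
            requests:
              storage: 1Gi
```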
CSI Service Account Token
CSI Service Account Token feature moves to Beta in 1.21. This feature improves the security posture and allows CSI drivers to receive pods' bound service account tokens. This feature also provides a knob to re-publish volumes so that short-lived volumes can be refreshed.
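A sketch of how a CSI driver deployment might opt in via its CSIDriver object; the driver name and audience are hypothetical, and the field names assume the storage.k8s.io/v1 CSIDriver API:

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: secrets.example.csi.driver       # hypothetical driver name
spec:
  # Ask kubelet to pass bound service account tokens for these audiences
  # to the driver when publishing a volume.
  tokenRequests:
  - audience: "vault.example.com"        # hypothetical audience
    expirationSeconds: 3600
  # The "knob" mentioned above: periodically re-publish the volume so
  # short-lived tokens or volume contents can be refreshed.
  requiresRepublish: true
```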
CSI Health Monitoring
The CSI health monitoring feature is being released as a second Alpha in Kubernetes 1.21. This feature enables CSI Drivers to share abnormal volume conditions from the underlying storage systems with Kubernetes so that they can be reported as events on PVCs or Pods. This feature serves as a stepping stone towards programmatic detection and resolution of individual volume health issues by Kubernetes.
Known Issues
TopologyAwareHints feature falls back to default behavior
The feature gate currently falls back to the default behavior in most cases. Enabling the feature gate will add hints to EndpointSlices, but functional differences are only observed in non-dual-stack kube-proxy implementations. The fix will be available in coming releases.
Urgent Upgrade Notes
(No, really, you MUST read this before you upgrade)
- Kube-proxy's IPVS proxy mode no longer sets the net.ipv4.conf.all.route_localnet sysctl parameter. Upgraded nodes will have net.ipv4.conf.all.route_localnet set to 1 but new nodes will inherit the system default (usually 0). If you relied on any behavior requiring net.ipv4.conf.all.route_localnet, you must ensure it is enabled as kube-proxy will no longer set it automatically. This change helps to further mitigate CVE-2020-8558. (#92938, @lbernail) [SIG Network and Release]
- Kubeadm: during "init" an empty cgroupDriver value in the KubeletConfiguration is now always set to "systemd" unless the user is explicit about it. This requires existing machine setups to configure the container runtime to use the "systemd" driver. Documentation on this topic can be found here: https://kubernetes.io/docs/setup/production-environment/container-runtimes/. When upgrading existing clusters / nodes using "kubeadm upgrade" the old cgroupDriver value is preserved, but in 1.22 this change will also apply to "upgrade". For more information on migrating to the "systemd" driver or remaining on the "cgroupfs" driver see: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/configure-cgroup-driver/. (#99471, @neolit123) [SIG Cluster Lifecycle]
- PVs newly provisioned by the EBS plugin will no longer use the deprecated "failure-domain.beta.kubernetes.io/zone" and "failure-domain.beta.kubernetes.io/region" labels. They will use the "topology.kubernetes.io/zone" and "topology.kubernetes.io/region" labels instead. (#99130, @ayberk) [SIG Cloud Provider, Storage and Testing]
- PVs newly provisioned by the OpenStack Cinder plugin will no longer use the deprecated "failure-domain.beta.kubernetes.io/zone" and "failure-domain.beta.kubernetes.io/region" labels. They will use the "topology.kubernetes.io/zone" and "topology.kubernetes.io/region" labels instead. (#99719, @jsafrane) [SIG Cloud Provider and Storage]
- PVs newly provisioned by gce-pd will no longer have the beta FailureDomain label. The gce-pd volume plugin will start to use the GA topology label instead. (#98700, @Jiawei0227) [SIG Cloud Provider, Storage and Testing]
- OpenStack Cinder CSI migration is on by default; the Cinder CSI driver must be installed on clusters on OpenStack for Cinder volumes to work. (#98538, @dims) [SIG Storage]
- Remove the alpha CSIMigrationXXComplete flag and add the alpha InTreePluginXXUnregister flag. Deprecate the CSIMigrationvSphereComplete flag; it will be removed in v1.22. (#98243, @Jiawei0227)
- Remove the storage metric storage_operation_errors_total, since we already have storage_operation_status_count. Also add a new status field to storage_operation_duration_seconds, so that storage operation latency is reported for every status. (#98332, @JornShen) [SIG Instrumentation and Storage]
- The metric storage_operation_errors_total is not removed, but is marked deprecated, and the metric storage_operation_status_count is marked deprecated. In both cases the storage_operation_duration_seconds metric can be used to recover equivalent counts (using status=fail-unknown in the case of storage_operations_errors_total). (#99045, @mattcary)
- The ServiceNodeExclusion, NodeDisruptionExclusion and LegacyNodeRoleBehavior features have been promoted to GA. ServiceNodeExclusion and NodeDisruptionExclusion are now unconditionally enabled, while LegacyNodeRoleBehavior is unconditionally disabled. To prevent control plane nodes from being added to load balancers automatically, upgrading users need to add the "node.kubernetes.io/exclude-from-external-load-balancers" label to control plane nodes. (#97543, @pacoxu)
Changes by Kind
Deprecation
- Aborting the drain command in a list of nodes will be deprecated. The new behavior will make the drain command go through all nodes even if one or more nodes fail during the drain. For now, users can try this behavior by enabling the --ignore-errors flag. (#98203, @yuzhiquan)
- Delete the deprecated service.beta.kubernetes.io/azure-load-balancer-mixed-protocols mixed protocol annotation in favor of the MixedProtocolLBService feature (#97096, @nilo19) [SIG Cloud Provider]
- Deprecate the topologyKeys field in Service. This capability will be replaced with upcoming work around Topology Aware Subsetting and Service Internal Traffic Policy. (#96736, @andrewsykim) [SIG Apps]
- Kube-proxy: remove the deprecated --cleanup-ipvs flag of kube-proxy, and make the --cleanup flag always flush IPVS (#97336, @maaoBit) [SIG Network]
- Kubeadm: the deprecated command "alpha selfhosting pivot" is now removed. (#97627, @knight42)
- Kubeadm: graduate the command kubeadm alpha kubeconfig user to kubeadm kubeconfig user. The kubeadm alpha kubeconfig user command is now deprecated. (#97583, @knight42) [SIG Cluster Lifecycle]
- Kubeadm: the "kubeadm alpha certs" command is removed now, please use "kubeadm certs" instead. (#97706, @knight42) [SIG Cluster Lifecycle]
- Kubeadm: the deprecated kube-dns is no longer supported as an option. If "ClusterConfiguration.dns.type" is set to "kube-dns", kubeadm will now throw an error. (#99646, @rajansandeep) [SIG Cluster Lifecycle]
- Kubectl: the deprecated kubectl alpha debug command is removed. Use kubectl debug instead. (#98111, @pandaamanda) [SIG CLI]
- Official support to build kubernetes with docker-machine / remote docker is removed. This change does not affect building kubernetes with docker locally. (#97935, @adeniyistephen) [SIG Release and Testing]
- Remove the deprecated --generator, --replicas, --service-generator, --service-overrides, --schedule flags from kubectl run; deprecate --serviceaccount, --hostport, --requests, --limits in kubectl run (#99732, @soltysh)
- Remove the deprecated metrics "scheduling_algorithm_preemption_evaluation_seconds" and "binding_duration_seconds"; use "scheduler_framework_extension_point_duration_seconds" instead. (#96447, @chendave) [SIG Cluster Lifecycle, Instrumentation, Scheduling and Testing]
- Removing experimental Windows container Hyper-V support with Docker (#97141, @wawa0210) [SIG Node and Windows]
- Rename the metric etcd_object_counts to apiserver_storage_object_counts and mark it as stable. The original etcd_object_counts metric name is marked as "Deprecated" and will be removed in the future. (#99785, @erain) [SIG API Machinery, Instrumentation and Testing]
- The GA TokenRequest and TokenRequestProjection feature gates have been removed and are unconditionally enabled. Remove explicit use of those feature gates in CLI invocations. (#97148, @wawa0210) [SIG Node]
- The PodSecurityPolicy API is deprecated in 1.21, and will no longer be served starting in 1.25. (#97171, @deads2k) [SIG Auth and CLI]
- The batch/v2alpha1 CronJob type definitions and clients are deprecated and removed. (#96987, @soltysh) [SIG API Machinery, Apps, CLI and Testing]
- The export query parameter (inconsistently supported by API resources and deprecated in v1.14) is fully removed. Requests setting this query parameter will now receive a 400 status response. (#98312, @deads2k) [SIG API Machinery, Auth and Testing]
- audit.k8s.io/v1beta1 and audit.k8s.io/v1alpha1 audit policy configuration and audit events are deprecated in favor of audit.k8s.io/v1, available since v1.13. kube-apiserver invocations that specify alpha or beta policy configurations with --audit-policy-file, or explicitly request alpha or beta audit events with --audit-log-version / --audit-webhook-version, must update to use audit.k8s.io/v1 and accept audit.k8s.io/v1 events prior to v1.24. (#98858, @carlory) [SIG Auth]
- discovery.k8s.io/v1beta1 EndpointSlices are deprecated in favor of discovery.k8s.io/v1, and will no longer be served in Kubernetes v1.25. (#100472, @liggitt)
- The diskformat StorageClass parameter for the in-tree vSphere volume plugin is deprecated as of the v1.21 release. Please consider updating your StorageClasses and removing the diskformat parameter; the vSphere CSI Driver does not support the diskformat StorageClass parameter.
  vSphere releases lower than 67u3 are deprecated as of v1.21. Please consider upgrading vSphere to 67u3 or above; the vSphere CSI Driver requires a minimum of vSphere 67u3.
  VM hardware versions lower than 15 are deprecated as of v1.21. Please consider upgrading the node VM hardware version to 15 or above; the vSphere CSI Driver recommends setting the node VM hardware version to at least vmx-15.
  Multi-vCenter support is deprecated as of v1.21. If you have a Kubernetes cluster spanning multiple vCenter servers, please consider moving all Kubernetes nodes to a single vCenter server; the vSphere CSI Driver does not support Kubernetes deployments spanning multiple vCenter servers.
  Support for these deprecations will be available until Kubernetes v1.24. (#98546, @divyenpatel)
API Change
- PodAffinityTerm includes a namespaceSelector field to allow selecting eligible namespaces based on their labels.
- A new CrossNamespacePodAffinity quota scope API allows restricting which namespaces are allowed to use PodAffinityTerm with cross-namespace references via the namespaceSelector or namespaces fields. (#98582, @ahg-g) [SIG API Machinery, Apps, Auth and Testing]
- Add Probe-level terminationGracePeriodSeconds field (#99375, @ehashman) [SIG API Machinery, Apps, Node and Testing]
- Added a .spec.completionMode field to Job, with accepted values NonIndexed (default) and Indexed. This is an alpha field and is only honored by servers with the IndexedJob feature gate enabled (see the Job sketch after this list). (#98441, @alculquicondor) [SIG Apps and CLI]
- Adds support for endPort field in NetworkPolicy (#97058, @rikatz) [SIG Apps and Network]
- CSIServiceAccountToken graduates to Beta and enabled by default. (#99298, @zshihang)
- Cluster admins can now turn off the /debug/pprof and /debug/flags/v endpoints in the kubelet by setting enableProfilingHandler and enableDebugFlagsHandler to false in the Kubelet configuration file. The options enableProfilingHandler and enableDebugFlagsHandler can be set to true only when enableDebuggingHandlers is also set to true (see the kubelet configuration sketch after this list). (#98458, @SaranBalaji90)
- DaemonSets accept a MaxSurge integer or percent on their rolling update strategy that will launch the updated pod on nodes and wait for those pods to go ready before marking the old out-of-date pods as deleted. This allows workloads to avoid downtime during upgrades when deployed using DaemonSets. This feature is alpha and is behind the DaemonSetUpdateSurge feature gate. (#96441, @smarterclayton) [SIG Apps and Testing]
- Enable SPDY pings to keep connections alive, so that kubectl exec and kubectl port-forward won't be interrupted. (#97083, @knight42) [SIG API Machinery and CLI]
- FieldManager no longer owns fields that get reset before the object is persisted (e.g. "status wiping"). (#99661, @kevindelgado) [SIG API Machinery, Auth and Testing]
- Fixes server-side apply for APIService resources. (#98576, @kevindelgado)
- Generic ephemeral volumes are beta. (#99643, @pohly) [SIG API Machinery, Apps, Auth, CLI, Node, Storage and Testing]
- Hugepages request values are limited to integer multiples of the page size. (#98515, @lala123912) [SIG Apps]
- Implement the GetAvailableResources in the podresources API. (#95734, @fromanirh) [SIG Instrumentation, Node and Testing]
- IngressClass resource can now reference a resource in a specific namespace
for implementation-specific configuration (previously only Cluster-level resources were allowed).
This feature can be enabled using the IngressClassNamespacedParams feature gate. (#99275, @hbagdi)
- The Jobs API has a new .spec.suspend field that can be used to suspend and resume Jobs. This is an alpha field which is only honored by servers with the SuspendJob feature gate enabled (also shown in the Job sketch after this list). (#98727, @adtac)
- Kubelet Graceful Node Shutdown feature graduates to Beta and enabled by default. (#99735, @bobbypage)
- Kubernetes is now built using go1.15.7 (#98363, @cpanato) [SIG Cloud Provider, Instrumentation, Node, Release and Testing]
- Namespace API objects now have a kubernetes.io/metadata.name label matching their metadata.name field to allow selecting any namespace by its name using a label selector. (#96968, @jayunit100) [SIG API Machinery, Apps, Cloud Provider, Storage and Testing]
- A new field, "InternalTrafficPolicy", is added to Service. It specifies whether cluster-internal traffic should be routed to all endpoints or node-local endpoints only. "Cluster" routes internal traffic to a Service to all endpoints. "Local" routes traffic to node-local endpoints only, and traffic is dropped if no node-local endpoints are ready. The default value is "Cluster" (see the Service sketch after this list). (#96600, @maplain) [SIG API Machinery, Apps and Network]
- PodDisruptionBudget API objects can now contain conditions in status. (#98127, @mortent) [SIG API Machinery, Apps, Auth, CLI, Cloud Provider, Cluster Lifecycle and Instrumentation]
- PodSecurityPolicy only stores "generic" as allowed volume type if the GenericEphemeralVolume feature gate is enabled (#98918, @pohly) [SIG Auth and Security]
- Promote CronJobs to batch/v1 (#99423, @soltysh) [SIG API Machinery, Apps, CLI and Testing]
- Promote the Immutable Secrets/ConfigMaps feature to Stable. This allows setting the immutable field in a Secret or ConfigMap object to mark their contents as immutable. (#97615, @wojtek-t) [SIG Apps, Architecture, Node and Testing]
- Remove support for building Kubernetes with bazel. (#99561, @BenTheElder) [SIG API Machinery, Apps, Architecture, Auth, Autoscaling, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Network, Node, Release, Scalability, Scheduling, Storage, Testing and Windows]
- The scheduler extender filter interface can now report unresolvable failed nodes in the new field FailedAndUnresolvableNodes of the ExtenderFilterResult struct. Nodes in this map will be skipped in the preemption phase. (#92866, @cofyc) [SIG Scheduling]
- Services can specify loadBalancerClass to use a custom load balancer (#98277, @XudongLiuHarold)
- Storage capacity tracking (the CSIStorageCapacity feature) graduates to Beta and is enabled by default; the storage.k8s.io/v1alpha1/VolumeAttachment and storage.k8s.io/v1alpha1/CSIStorageCapacity objects are deprecated (#99641, @pohly)
- Support for Indexed Job: a Job that is considered completed when Pods associated to indexes from 0 to (.spec.completions-1) have succeeded. (#98812, @alculquicondor) [SIG Apps and CLI]
- The BoundServiceAccountTokenVolume feature has been promoted to beta, and enabled by default.
- This changes the tokens provided to containers at /var/run/secrets/kubernetes.io/serviceaccount/token to be time-limited, auto-refreshed, and invalidated when the containing pod is deleted.
- Clients should reload the token from disk periodically (once per minute is recommended) to ensure they continue to use a valid token. k8s.io/client-go versions v11.0.0+ and v0.15.0+ reload tokens automatically.
- By default, injected tokens are given an extended lifetime so they remain valid even after a new refreshed token is provided. The metric serviceaccount_stale_tokens_total can be used to monitor for workloads that are depending on the extended lifetime and are continuing to use tokens even after a refreshed token is provided to the container. If that metric indicates no existing workloads are depending on extended lifetimes, injected token lifetime can be shortened to 1 hour by starting kube-apiserver with --service-account-extend-token-expiration=false. (#95667, @zshihang) [SIG API Machinery, Auth, Cluster Lifecycle and Testing]
- The EndpointSlice Controllers are now GA. The EndpointSliceController will not populate the deprecatedTopology field and will only provide topology information through the zone and nodeName fields. (#99870, @swetharepakula)
- The Endpoints controller will now set the endpoints.kubernetes.io/over-capacity annotation to "warning" when an Endpoints resource contains more than 1000 addresses. In a future release, the controller will truncate Endpoints that exceed this limit. The EndpointSlice API can be used to support a significantly larger number of addresses. (#99975, @robscott) [SIG Apps and Network]
- The PodDisruptionBudget API has been promoted to policy/v1 with no schema changes. The only functional change is that an empty selector ({}) written to a policy/v1 PodDisruptionBudget now selects all pods in the namespace. The behavior of the policy/v1beta1 API remains unchanged. The policy/v1beta1 PodDisruptionBudget API is deprecated and will no longer be served in 1.25+. (#99290, @mortent) [SIG API Machinery, Apps, Auth, Autoscaling, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Scheduling and Testing]
- The EndpointSlice API is now GA. The EndpointSlice topology field has been removed from the GA API and will be replaced by a new per-Endpoint zone field. If the topology field was previously used, it will be converted into an annotation in the v1 resource. The discovery.k8s.io/v1alpha1 API is removed. (#99662, @swetharepakula)
- The controller.kubernetes.io/pod-deletion-cost annotation can be set to offer a hint on the cost of deleting a Pod compared to other pods belonging to the same ReplicaSet. Pods with lower deletion cost are deleted first. This is an alpha feature. (#99163, @ahg-g)
- The kube-apiserver now resets managedFields that got corrupted by a mutating admission controller. (#98074, @kwiesmueller)
- Topology Aware Hints are now available in alpha and can be enabled with the TopologyAwareHints feature gate. (#99522, @robscott) [SIG API Machinery, Apps, Auth, Instrumentation, Network and Testing]
- Users might specify the kubectl.kubernetes.io/default-exec-container annotation in a Pod to preselect a container for kubectl commands. (#97099, @pacoxu) [SIG CLI]
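As referenced in the Job entries above, here is a minimal sketch of a Job manifest combining the new alpha fields; the name, image, and counts are placeholders, and the IndexedJob and SuspendJob feature gates are assumed to be enabled:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-example            # placeholder name
spec:
  completions: 5
  parallelism: 2
  # Alpha in v1.21 (IndexedJob feature gate): each Pod is associated with a
  # completion index from 0 to completions-1.
  completionMode: Indexed
  # Alpha in v1.21 (SuspendJob feature gate): create the Job suspended,
  # then set this to false to let it start running.
  suspend: true
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.33        # placeholder image
        command: ["sh", "-c", "echo hello from an indexed pod"]
```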
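Similarly, a sketch of the Service field from the "InternalTrafficPolicy" entry above; the selector and ports are placeholders, and the corresponding alpha feature gate is assumed to be enabled:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend                    # placeholder name
spec:
  selector:
    app: backend                   # placeholder selector
  ports:
  - port: 80
    targetPort: 8080
  # "Cluster" (the default) routes internal traffic to all ready endpoints;
  # "Local" routes only to endpoints on the same node and drops traffic
  # when no node-local endpoint is ready.
  internalTrafficPolicy: Local
```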
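And a sketch of the kubelet configuration options from the debug-handlers entry above, using the field names given there:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# The two handler options below may only be set to true when this is true;
# here they are switched off to disable the corresponding endpoints.
enableDebuggingHandlers: true
enableProfilingHandler: false     # disables /debug/pprof
enableDebugFlagsHandler: false    # disables /debug/flags/v
```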
Feature
- A client-go metric, rest_client_exec_plugin_call_total, has been added to track total calls to client-go credential plugins. (#98892, @ankeesler) [SIG API Machinery, Auth, Cluster Lifecycle and Instrumentation]
- A new histogram metric to track the time it took to delete a job by the TTLAfterFinished controller (#98676, @ahg-g)
- The AWS cloud provider supports auto-discovering subnets without any kubernetes.io/cluster/<clusterName> tags. It also supports an additional service annotation, service.beta.kubernetes.io/aws-load-balancer-subnets, to manually configure the subnets. (#97431, @kishorj)
- Aborting the drain command in a list of nodes will be deprecated. The new behavior will make the drain command go through all nodes even if one or more nodes fail during the drain. For now, users can try this behavior by enabling the --ignore-errors flag. (#98203, @yuzhiquan)
- Add a --permit-address-sharing flag to kube-apiserver to listen with SO_REUSEADDR. While allowing listening on wildcard IPs like 0.0.0.0 and specific IPs in parallel, it avoids waiting for the kernel to release sockets in TIME_WAIT state, and hence considerably reduces kube-apiserver restart times under certain conditions. (#93861, @sttts)
- Add a csi_operations_seconds metric on kubelet that exposes CSI operation duration and status for node CSI operations. (#98979, @Jiawei0227) [SIG Instrumentation and Storage]
- Add a migrated field to the storage_operation_duration_seconds metric (#99050, @Jiawei0227) [SIG Apps, Instrumentation and Storage]
- Add flag --lease-reuse-duration-seconds for kube-apiserver to config etcd lease reuse duration. (#97009, @lingsamuel) [SIG API Machinery and Scalability]
- Add metric etcd_lease_object_counts for kube-apiserver to observe max objects attached to a single etcd lease. (#97480, @lingsamuel) [SIG API Machinery, Instrumentation and Scalability]
- Add support to generate client-side binaries for new darwin/arm64 platform (#97743, @dims) [SIG Release and Testing]
- Added ephemeral_volume_controller_create[_failures]_total counters to kube-controller-manager metrics (#99115, @pohly) [SIG API Machinery, Apps, Cluster Lifecycle, Instrumentation and Storage]
- Added support for installing arm64 node artifacts. (#99242, @liu-cong)
- Adds an alpha feature, VolumeCapacityPriority, which makes the scheduler prioritize nodes based on the best matching size of statically provisioned PVs across multiple topologies. (#96347, @cofyc) [SIG Apps, Network, Scheduling, Storage and Testing]
- Adds the ability to pass --strict-transport-security-directives to the kube-apiserver to set the HSTS header appropriately. Be sure you understand the consequences to browsers before setting this field. (#96502, @249043822) [SIG Auth]
- Adds two new metrics to CronJobs: a histogram to track the time difference between when a job is created and the expected time when it should be created, and a gauge for the missed schedules of a CronJob (#99341, @alaypatel07)
- Alpha implementation of Kubectl Command Headers: SIG CLI KEP 859 is enabled when the KUBECTL_COMMAND_HEADERS environment variable is set on the client command line. (#98952, @seans3)
- Base-images: Update to debian-iptables:buster-v1.4.0
- Uses iptables 1.8.5
- base-images: Update to debian-base:buster-v1.3.0
- cluster/images/etcd: Build etcd:3.4.13-2 image
- CRIContainerLogRotation graduates to GA and unconditionally enabled. (#99651, @umohnani8)
- Component owner can configure the allowlist of metric label with flag '--allow-metric-labels'. (#99385, @YoyinZyc) [SIG API Machinery, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation and Release]
- Component owner can configure the allowlist of metric label with flag '--allow-metric-labels'. (#99738, @YoyinZyc) [SIG API Machinery, Cluster Lifecycle and Instrumentation]
- EmptyDir memory backed volumes are sized as the minimum of pod allocatable memory on a host and an optional explicit user provided value. (#100319, @derekwaynecarr) [SIG Node]
- Enables Kubelet to check volume condition and log events to corresponding pods. (#99284, @fengzixu) [SIG Apps, Instrumentation, Node and Storage]
- EndpointSliceNodeName graduates to GA and thus will be unconditionally enabled -- NodeName will always be available in the v1beta1 API. (#99746, @swetharepakula)
- Export the NewDebuggingRoundTripper function and DebugLevel options in the k8s.io/client-go/transport package. (#98324, @atosatto)
- Kube-proxy iptables: new metric sync_proxy_rules_iptables_total that exposes the number of rules programmed per table in each iteration (#99653, @aojea) [SIG Instrumentation and Network]
- Kube-scheduler now logs plugin scoring summaries at --v=4 (#99411, @damemi) [SIG Scheduling]
- Kubeadm now includes CoreDNS v1.8.0. (#96429, @rajansandeep) [SIG Cluster Lifecycle]
- Kubeadm: IPv6DualStack feature gate graduates to Beta and enabled by default (#99294, @pacoxu)
- Kubeadm: print a warning to the user, as IPv6 site-local is deprecated (#99574, @pacoxu) [SIG Cluster Lifecycle and Network]
- Kubeadm: add support for certificate chain validation. When using kubeadm in external CA mode, this allows an intermediate CA to be used to sign the certificates. The intermediate CA certificate must be appended to each signed certificate for this to work correctly. (#97266, @robbiemcmichael) [SIG Cluster Lifecycle]
- Kubeadm: amend the node kernel validation to treat CGROUP_PIDS, FAIR_GROUP_SCHED as required and CFS_BANDWIDTH, CGROUP_HUGETLB as optional (#96378, @neolit123) [SIG Cluster Lifecycle and Node]
- Kubeadm: apply the "node.kubernetes.io/exclude-from-external-load-balancers" label on control plane nodes during "init", "join" and "upgrade" to preserve backwards compatibility with the legacy LB mode where nodes labeled as "master" were excluded. To opt out you can remove the label from a node. See #97543 and the linked KEP for more details. (#98269, @neolit123) [SIG Cluster Lifecycle]
- Kubeadm: if the user has customized their image repository via the kubeadm configuration, pass the custom pause image repository and tag to the kubelet via --pod-infra-container-image not only for Docker but for all container runtimes. This flag tells the kubelet that it should not garbage collect the image. (#99476, @neolit123) [SIG Cluster Lifecycle]
- Kubeadm: perform pre-flight validation on host/node name upon kubeadm init and kubeadm join, showing warnings on non-compliant names (#99194, @pacoxu)
- Kubectl version now writes a warning message to stderr if the client and server version difference exceeds the supported version skew of +/-1 minor version. (#98250, @brianpursley) [SIG CLI]
- Kubectl: Add a --use-protocol-buffers flag to kubectl top pods and nodes. (#96655, @serathius)
- Kubectl: kubectl get will now omit managed fields by default. Users can set --show-managed-fields to true to show managedFields when the output format is either json or yaml. (#96878, @knight42) [SIG CLI and Testing]
- Kubectl: a default container for kubectl commands can be preselected in a Pod using the kubectl.kubernetes.io/default-container annotation (see the Pod sketch at the end of this list) (#99833, @mengjiao-liu)
- Kubectl: add bash completion for comma-separated lists on kubectl get (#98301, @phil9909)
- Kubernetes is now built using go1.15.8 (#98834, @cpanato) [SIG Cloud Provider, Instrumentation, Release and Testing]
- Kubernetes is now built with Golang 1.16 (#98572, @justaugustus) [SIG API Machinery, Auth, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Node, Release and Testing]
- Kubernetes is now built with Golang 1.16.1 (#100106, @justaugustus) [SIG Cloud Provider, Instrumentation, Release and Testing]
- Metrics can now be disabled explicitly via a command line flag (i.e. '--disabled-metrics=metric1,metric2') (#99217, @logicalhan)
- A new admission controller, DenyServiceExternalIPs, is available. Clusters which do not need the Service externalIPs feature should enable this controller to be more secure. (#97395, @thockin)
- Overall, enabling the PreferNominatedNode feature will improve scheduling performance where preemption might frequently happen; but in theory, with PreferNominatedNode enabled, the pod might not be scheduled to the best candidate node in the cluster. (#93179, @chendave) [SIG Scheduling and Testing]
- Persistent Volumes formatted with the btrfs filesystem will now automatically resize when expanded. (#99361, @Novex) [SIG Storage]
- Port the devicemanager to Windows node to allow device plugins like directx (#93285, @aarnaud) [SIG Node, Testing and Windows]
- Removes cAdvisor JSON metrics (/stats/container, /stats/<podname>/<containername>, /stats/<namespace>/<podname>/<poduid>/<containername>) from the kubelet. (#99236, @pacoxu)
- Rename the metric etcd_object_counts to apiserver_storage_object_counts and mark it as stable. The original etcd_object_counts metric name is marked as "Deprecated" and will be removed in the future. (#99785, @erain) [SIG API Machinery, Instrumentation and Testing]
- Sysctls graduates to General Availability and thus unconditionally enabled. (#99158, @wgahnagl)
- The Kubernetes pause image manifest list now contains an image for Windows Server 20H2. (#97322, @claudiubelu) [SIG Windows]
- The NodeAffinity plugin implements the PreFilter extension, offering enhanced performance for Filter. (#99213, @AliceZhang2016) [SIG Scheduling]
- The CronJobControllerV2 feature flag graduates to Beta and is set to be enabled by default. (#98878, @soltysh)
- The EndpointSlice mirroring controller mirrors Endpoints annotations and labels to the generated EndpointSlices, and also ensures that updates to any of these fields are mirrored. The well-known annotation endpoints.kubernetes.io/last-change-trigger-time is skipped and not mirrored. (#98116, @aojea)
- The RunAsGroup feature has been promoted to GA in this release. (#94641, @krmayankk) [SIG Auth and Node]
- The ServiceAccountIssuerDiscovery feature has graduated to GA, and is unconditionally enabled. The ServiceAccountIssuerDiscovery feature gate will be removed in 1.22. (#98553, @mtaufen) [SIG API Machinery, Auth and Testing]
- The TTLAfterFinished feature flag is now beta and enabled by default (#98678, @ahg-g)
- The apimachinery util/net function used to detect the bind address, ResolveBindAddress(), takes into consideration global IP addresses on loopback interfaces when 1) the host has default routes, or 2) there are no global IPs on those interfaces, in order to support more complex network scenarios like BGP Unnumbered (RFC 5549) (#95790, @aojea) [SIG Network]
- The feature gate RootCAConfigMap graduated to GA in v1.21 and is therefore unconditionally enabled. The flag will be removed in the v1.22 release. (#98033, @zshihang)
- The pause image has been upgraded to v3.4.1 in kubelet and kubeadm for both Linux and Windows. (#98205, @pacoxu)
- Update the pause container to run as pseudo user and group 65535:65535. This implies the release of version 3.5 of the container images. (#97963, @saschagrunert) [SIG CLI, Cloud Provider, Cluster Lifecycle, Node, Release, Security and Testing]
- Update the latest validated version of Docker to 20.10 (#98977, @neolit123) [SIG CLI, Cluster Lifecycle and Node]
- Upgrade node local dns to 1.17.0 for better IPv6 support (#99749, @pacoxu) [SIG Cloud Provider and Network]
- Upgrades IPv6Dualstack to Beta and turns it on by default. New clusters or existing clusters are not affected until an actor starts adding secondary Pod and Service CIDR CLI flags as described here: IPv4/IPv6 Dual-stack (#98969, @khenidak)
- Users might specify the kubectl.kubernetes.io/default-container annotation in a Pod to preselect a container for kubectl commands. (#99581, @mengjiao-liu) [SIG CLI]
- When downscaling ReplicaSets, ready and creation timestamps are compared in a logarithmic scale. (#99212, @damemi) [SIG Apps and Testing]
- When the kubelet is watching a ConfigMap or Secret purely in the context of setting environment variables
for containers, only hold that watch for a defined duration before cancelling it. This change reduces the CPU
and memory usage of the kube-apiserver in large clusters. (#99393, @chenyw1990) [SIG API Machinery, Node and Testing]
- WindowsEndpointSliceProxying feature gate has graduated to beta and is enabled by default. This means kube-proxy will read from EndpointSlices instead of Endpoints on Windows by default. (#99794, @robscott) [SIG Network]
- kubectl wait ensures that observedGeneration >= generation to prevent stale state reporting. An example scenario can be found on CRD updates. (#97408, @KnicKnic)
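As referenced in the kubectl default-container entries above, a minimal sketch of a multi-container Pod that preselects a container for kubectl commands; names and images are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar                       # placeholder name
  annotations:
    # kubectl logs/exec and similar commands default to this container
    # unless -c/--container is passed explicitly.
    kubectl.kubernetes.io/default-container: app
spec:
  containers:
  - name: app                                  # the preselected container
    image: nginx:1.20                          # placeholder image
  - name: log-shipper
    image: busybox:1.33                        # placeholder image
    command: ["sh", "-c", "tail -f /dev/null"]
```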
Documentation
- Azure file migration graduates to beta, with the CSIMigrationAzureFile flag off by default as it requires installation of the AzureFile CSI Driver. Users should enable the CSIMigration and CSIMigrationAzureFile features and install the AzureFile CSI Driver to avoid disruption to existing Pod and PVC objects at that time. The Azure File CSI driver does not support using the same persistent volume with different fsgroups; when CSI migration is enabled for the azurefile driver, such a case is not supported. (There is a case we support where the volume is mounted with 0777 and is then readable/writable by everyone.) (#96293, @andyzhangx)
- Official support to build kubernetes with docker-machine / remote docker is removed. This change does not affect building kubernetes with docker locally. (#97935, @adeniyistephen) [SIG Release and Testing]
- Set the kubelet option --volume-stats-agg-period to a negative value to disable volume calculations. (#96675, @pacoxu) [SIG Node]
Failing Test
- Escape special characters such as [ and ] that exist in vSphere Windows paths (#98830, @liyanhui1228) [SIG Storage and Windows]
- Kube-proxy: fix a bug on UDP NodePort Services where stale connection tracking entries may blackhole the traffic directed to the NodePort (#98305, @aojea)
- Kubelet: fixes a bug in the HostPort dockershim implementation that caused the conformance test "HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol" to fail. (#98755, @aojea) [SIG Cloud Provider, Network and Node]
Bug or Regression
- AcceleratorStats will be available in the Summary API of kubelet when cri_stats_provider is used. (#96873, @ruiwen-zhao) [SIG Node]
- All data is no longer automatically deleted when a failure is detected during creation of the volume data file on a CSI volume. Now only the data file and volume path is removed. (#96021, @huffmanca)
- Clean ReplicaSet by revision instead of creation timestamp in deployment controller (#97407, @waynepeking348) [SIG Apps]
- Cleanup subnet in frontend IP configs to prevent huge subnet request bodies in some scenarios. (#98133, @nilo19) [SIG Cloud Provider]
- Client-go exec credential plugins will pass stdin only when interactive terminal is detected on stdin. This fixes a bug where previously it was checking if stdout is an interactive terminal. (#99654, @ankeesler)
- Cloud-controller-manager: routes controller should not depend on --allocate-node-cidrs (#97029, @andrewsykim) [SIG Cloud Provider and Testing]
- Cluster Autoscaler version bump to v1.20.0 (#97011, @towca)
- Creating a PVC with DataSource should fail for non-CSI plugins. (#97086, @xing-yang) [SIG Apps and Storage]
- EndpointSlice controller is now less likely to emit FailedToUpdateEndpointSlices events. (#99345, @robscott) [SIG Apps and Network]
- EndpointSlice controllers are less likely to create duplicate EndpointSlices. (#100103, @robscott) [SIG Apps and Network]
- EndpointSliceMirroring controller is now less likely to emit FailedToUpdateEndpointSlices events. (#99756, @robscott) [SIG Apps and Network]
- Ensure all vSphere nodes are tracked by the volume attach-detach controller (#96689, @gnufied)
- Ensure empty string annotations are copied over in rollbacks. (#94858, @waynepeking348)
- Ensure only one LoadBalancer rule is created when HA mode is enabled (#99825, @feiskyer) [SIG Cloud Provider]
- Ensure that client-go's EventBroadcaster is safe (non-racy) during shutdown. (#95664, @DirectXMan12) [SIG API Machinery]
- Explicitly pass KUBE_BUILD_CONFORMANCE=y in package-tarballs to re-enable building the conformance tarballs. (#100571, @puerco)
- Fix Azure file migration e2e test failure when CSIMigration is turned on. (#97877, @andyzhangx)
- Fix CSI-migrated inline EBS volumes failing to mount if their volumeID is prefixed by aws:// (#96821, @wongma7) [SIG Storage]
- Fix CVE-2020-8555 for Gluster client connections. (#97922, @liggitt) [SIG Storage]
- Fix NPE in ephemeral storage eviction (#98261, @wzshiming) [SIG Node]
- Fix PermissionDenied issue on SMB mount for Windows (#99550, @andyzhangx)
- Fix bug that would let the Horizontal Pod Autoscaler scale down despite at least one metric being unavailable/invalid (#99514, @mikkeloscar) [SIG Apps and Autoscaling]
- Fix cgroup handling for systemd with cgroup v2 (#98365, @odinuge) [SIG Node]
- Fix counting error in service/nodeport/loadbalancer quota check (#97451, @pacoxu) [SIG API Machinery, Network and Testing]
- Fix errors when accessing Windows container stats for Dockershim (#98510, @jsturtevant) [SIG Node and Windows]
- Fix kube-proxy container image architecture for non amd64 images. (#98526, @saschagrunert)
- Fix missing cadvisor machine metrics. (#97006, @lingsamuel) [SIG Node]
- Fix nil VMSS name when setting service to auto mode (#97366, @nilo19) [SIG Cloud Provider]
- Fix privileged config of Pod Sandbox which was previously ignored. (#96877, @xeniumlee)
- Fix the panic when kubelet registers if a node object already exists with no Status.Capacity or Status.Allocatable (#95269, @SataQiu) [SIG Node]
- Fix the regression with slow pod termination. Before this fix, pods could take up to an additional minute to terminate. This reverses the change that ensured CNI resources are cleaned up when the pod is removed on the API server. (#97980, @SergeyKanzhelev) [SIG Node]
- Fix to recover CSI volumes from certain dangling attachments (#96617, @yuga711) [SIG Apps and Storage]
- Fix: azure file latency issue for metadata-heavy workloads (#97082, @andyzhangx) [SIG Cloud Provider and Storage]
- Fixed Cinder volume IDs on OpenStack Train (#96673, @jsafrane) [SIG Cloud Provider]
- Fixed FibreChannel volume plugin corrupting filesystems on detach of multipath volumes. (#97013, @jsafrane) [SIG Storage]
- Fixed a bug in kubelet that will saturate CPU utilization after containerd got restarted. (#97174, @hanlins) [SIG Node]
- Fixed a bug that caused a smaller conntrack-max value to be used under the CPU static policy. (#99225, @xh4n3) (#99613, @xh4n3) [SIG Network]
- Fixed a bug on Kubernetes nodes where, when the policy of the INPUT chain in the filter table is not ACCEPT, the healthcheck nodeport would not work. Added iptables rules to allow healthcheck nodeport traffic. (#97824, @hanlins) [SIG Network]
- Fixed a bug where the kubelet cannot start on Btrfs. (#98042, @gjkim42) [SIG Node]
- Fixed a race condition on API server startup ensuring previously created webhook configurations are effective before the first write request is admitted. (#95783, @roycaihw) [SIG API Machinery]
- Fixed an issue with garbage collection failing to clean up namespaced children of an object also referenced incorrectly by cluster-scoped children (#98068, @liggitt) [SIG API Machinery and Apps]
- Fixed authentication_duration_seconds metric scope. Previously, it included whole apiserver request duration which yields inaccurate results. (#99944, @marseel)
- Fixed bug in CPUManager with race on container map access (#97427, @klueska) [SIG Node]
- Fixed bug that caused cAdvisor to incorrectly detect single-socket multi-NUMA topology. (#99315, @iwankgb) [SIG Node]
- Fixed cleanup of block devices when /var/lib/kubelet is a symlink. (#96889, @jsafrane) [SIG Storage]
- Fixed the namespace having no effect when exposing a Deployment with --dry-run=client. (#97492, @masap) [SIG CLI]
- Fixed provisioning of Cinder volumes migrated to CSI when StorageClass with AllowedTopologies was used. (#98311, @jsafrane) [SIG Storage]
- Fixes a bug of identifying the correct containerd process. (#97888, @pacoxu)
- Fixes add-on manager leader election to use leases instead of endpoints, similar to what kube-controller-manager does in 1.20 (#98968, @liggitt)
- Fixes connection errors when using --volume-host-cidr-denylist or --volume-host-allow-local-loopback (#98436, @liggitt) [SIG Network and Storage]
- Fixes a problem where an invalid selector on a PodDisruptionBudget leads to a nil pointer dereference that causes the controller manager to crash loop. (#98750, @mortent)
- Fixes spurious errors about IPv6 in kube-proxy logs on nodes with IPv6 disabled. (#99127, @danwinship)
- Fixing a bug where a failed node may not have the NoExecute taint set correctly (#96876, @howieyuen) [SIG Apps and Node]
- GCE Internal LoadBalancer sync loop will now release the ILB IP address upon sync failure. An error in ILB forwarding rule creation will no longer leak IP addresses. (#97740, @prameshj) [SIG Cloud Provider and Network]
- Ignore update pod with no new images in alwaysPullImages admission controller (#96668, @pacoxu) [SIG Apps, Auth and Node]
- Improve speed of vSphere PV provisioning and reduce number of API calls (#100054, @gnufied) [SIG Cloud Provider and Storage]
- KUBECTL_EXTERNAL_DIFF now accepts equal sign for additional parameters. (#98158, @dougsland) [SIG CLI]
- Kube-apiserver: an update of a pod with a generic ephemeral volume dropped that volume if the feature had been disabled since creating the pod with such a volume (#99446, @pohly) [SIG Apps, Node and Storage]
- Kube-proxy: remove deprecated --cleanup-ipvs flag of kube-proxy, and make --cleanup flag always to flush IPVS (#97336, @maaoBit) [SIG Network]
- Kubeadm installs etcd v3.4.13 when creating cluster v1.19 (#97244, @pacoxu)
- Kubeadm: Fixes a kubeadm upgrade bug that could cause a custom CoreDNS configuration to be replaced with the default. (#97016, @rajansandeep) [SIG Cluster Lifecycle]
- Kubeadm: Some text in the kubeadm upgrade plan output has changed. If you have scripts or other automation that parses this output, please review these changes and update your scripts to account for the new output. (#98728, @stmcginnis) [SIG Cluster Lifecycle]
- Kubeadm: fix a bug in the host memory detection code on 32bit Linux platforms (#97403, @abelbarrera15) [SIG Cluster Lifecycle]
- Kubeadm: fix a bug where "kubeadm join" would not properly handle missing names for existing etcd members. (#97372, @ihgann) [SIG Cluster Lifecycle]
- Kubeadm: fix a bug where "kubeadm upgrade" commands can fail if CoreDNS v1.8.0 is installed. (#97919, @neolit123) [SIG Cluster Lifecycle]
- Kubeadm: fix a bug where external credentials in an existing admin.conf prevented the CA certificate to be written in the cluster-info ConfigMap. (#98882, @kvaps) [SIG Cluster Lifecycle]
- Kubeadm: get k8s CI version markers from k8s infra bucket (#98836, @hasheddan) [SIG Cluster Lifecycle and Release]
- Kubeadm: skip validating pod subnet against node-cidr-mask when allocate-node-cidrs is set to be false (#98984, @SataQiu) [SIG Cluster Lifecycle]
- Kubectl logs: --ignore-errors is now honored by all containers, maintaining consistency with parallelConsumeRequest behavior. (#97686, @wzshiming)
- Kubectl-convert: Fix the no kind "Ingress" is registered for version error (#97754, @wzshiming)
- Kubectl: Fixed panic when describing an ingress backend without an API Group (#100505, @lauchokyip) [SIG CLI]
- Kubelet now cleans up orphaned volume directories automatically (#95301, @lorenz) [SIG Node and Storage]
- Kubelet.exe on Windows now checks that the process is running as administrator and that the executing user account is listed in the built-in Administrators group. This is the equivalent of checking that the process is running as uid 0. (#96616, @perithompson) [SIG Node and Windows]
- Kubelet: Fix kubelet panicking after receiving the wrong signal (#98200, @wzshiming) [SIG Node]
- Kubelet: Fix repeatedly acquiring the inhibit lock (#98088, @wzshiming) [SIG Node]
- Kubelet: Fixed a bug in getting the CPU count on Windows when the number of logical processors is more than 64 (#97378, @hwdef) [SIG Node and Windows]
- Limit each lease to a maximum of 1000 attached objects. (#98257, @lingsamuel)
- Mitigate CVE-2020-8555 for kube-up using GCE by preventing local loopback volume hosts. (#97934, @mattcary) [SIG Cloud Provider and Storage]
- On single-stack configured (IPv4 or IPv6, but not both) clusters, Services which are both headless (no clusterIP) and selectorless (empty or undefined selector) will report ipFamilyPolicy RequireDualStack and will have entries in ipFamilies[] for both IPv4 and IPv6. This is a change from alpha, but does not have any impact on the manually-specified Endpoints and EndpointSlices for the Service. (#99555, @thockin) [SIG Apps and Network]
- Performance regression #97685 has been fixed. (#97860, @MikeSpreitzer) [SIG API Machinery]
- Pod Log stats for windows now reports metrics (#99221, @jsturtevant) [SIG Node, Storage, Testing and Windows]
- Pod status updates faster when reacting to probe results. The first readiness probe will be called sooner after startup probes succeed, which marks the Pod as ready sooner. (#98376, @matthyx)
- Readjust kubelet_containers_per_pod_count buckets to only show metrics greater than 1. (#98169, @wawa0210)
- Remove CSI topology from migrated in-tree gcepd volume. (#97823, @Jiawei0227) [SIG Cloud Provider and Storage]
- Requests with invalid timeout parameters in the request URL now appear in the audit log correctly. (#96901, @tkashem) [SIG API Machinery and Testing]
- Resolve a "concurrent map read and map write" crashing error in the kubelet (#95111, @choury) [SIG Node]
- Resolves spurious Failed to list *v1.Secret or Failed to list *v1.ConfigMap messages in kubelet logs. (#99538, @liggitt) [SIG Auth and Node]
- ResourceQuota calculations for an entity now include Pod overhead (#99600, @gjkim42)
- Return zero time (midnight on Jan. 1, 1970) instead of a negative number when reporting startedAt and finishedAt of a not-started or running Pod when using dockershim as a runtime. (#99585, @Iceber)
- Reverts breaking change to inline AzureFile volumes; referenced secrets are now searched for in the same namespace as the pod as in previous releases. (#100563, @msau42)
- Scores from InterPodAffinity have stronger differentiation. (#98096, @leileiwan) [SIG Scheduling]
- Specifying the KUBE_TEST_REPO environment variable when e2e tests are executed will instruct the test infrastructure to load that image from a location within the specified repo, using a predefined pattern. (#93510, @smarterclayton) [SIG Testing]
- Static pods will be deleted gracefully. (#98103, @gjkim42) [SIG Node]
- Sync node status during kubelet node shutdown. Adds a pod admission handler that rejects new pods when the node is in the process of shutting down. (#98005, @wzshiming) [SIG Node]
- The calculation of pod UIDs for static pods has changed to ensure each static pod gets a unique value - this will cause all static pod containers to be recreated/restarted if an in-place kubelet upgrade from 1.20 to 1.21 is performed. Note that draining pods before upgrading the kubelet across minor versions is the supported upgrade path. (#87461, @bboreham) [SIG Node]
- The maximum number of ports allowed in EndpointSlices has been increased from 100 to 20,000 (#99795, @robscott) [SIG Network]
- Truncates a message if it hits the NoteLengthLimit when the scheduler records an event for the pod that indicates the pod has failed to schedule. (#98715, @carlory)
- Updated k8s.gcr.io/ingress-gce-404-server-with-metrics-amd64 to a version that serves /metrics endpoint on a non-default port. (#97621, @vbannai) [SIG Cloud Provider]
- Updates the commands kubectl kustomize {arg} and kubectl apply -k {arg} to use the same code as kustomize CLI v4.0.5 (#98946, @monopole)
- Use force unmount for NFS volumes if regular mount fails after 1 minute timeout (#96844, @gnufied) [SIG Storage]
- Use network.Interface.VirtualMachine.ID to get the bound VM. Skip standalone VMs when reconciling the LoadBalancer. (#97635, @nilo19) [SIG Cloud Provider]
- Using exec auth plugins with kubectl no longer results in warnings about constructing many client instances from the same exec auth config. (#97857, @liggitt) [SIG API Machinery and Auth]
- When a CNI plugin returns dual-stack pod IPs, kubelet will now try to respect the
"primary IP family" of the cluster by picking a primary pod IP of the same family
as the (primary) node IP, rather than assuming that the CNI plugin returned the IPs
in the order the administrator wanted (since some CNI plugins don't allow
configuring this). (#97979, @danwinship) [SIG Network and Node]
- When dynamically provisioning Azure File volumes for a premium account, the requested size will be set to 100GB if the request is initially lower than this value to accommodate Azure File requirements. (#99122, @huffmanca) [SIG Cloud Provider and Storage]
- When using Containerd on Windows, the C:\Windows\System32\drivers\etc\hosts file will now be managed by kubelet. (#83730, @claudiubelu)
- VolumeBindingArgs now allows BindTimeoutSeconds to be set to zero, where the value zero indicates no waiting for the checking of the volume binding operation. (#99835, @chendave) [SIG Scheduling and Storage]
- kubectl exec and kubectl attach now honor the --quiet flag, which suppresses output from the local binary that could be confused by a script with the remote command output (all non-failure output is hidden). In addition, exec and attach now print inline the list of alternate containers when defaulting to the first spec.container. (#99004, @smarterclayton) [SIG CLI]
Other (Cleanup or Flake)
- APIs for kubelet annotations and labels from k8s.io/kubernetes/pkg/kubelet/apis are now moved under k8s.io/kubelet/pkg/apis/ (#98931, @michaelbeaumont)
- The apiserver_request_duration_seconds metric is promoted to stable status. (#99925, @logicalhan) [SIG API Machinery, Instrumentation and Testing]
- Bump github.com/Azure/go-autorest/autorest to v0.11.12 (#97033, @patrickshan) [SIG API Machinery, CLI, Cloud Provider and Cluster Lifecycle]
- Clients are required to use go1.15.8+ or go1.16+ if the kube-apiserver has the goaway feature enabled, to avoid unexpected data race conditions. (#98809, @answer1991)
- Delete the deprecated service.beta.kubernetes.io/azure-load-balancer-mixed-protocols mixed protocol annotation in favor of the MixedProtocolLBService feature (#97096, @nilo19) [SIG Cloud Provider]
- EndpointSlice generation is now incremented when labels change. (#99750, @robscott) [SIG Network]
- Featuregate AllowInsecureBackendProxy graduates to GA and unconditionally enabled. (#99658, @deads2k)
- Increase timeout for pod lifecycle test to reach pod status=ready (#96691, @hh)
- Increased CSINodeIDMaxLength from 128 bytes to 192 bytes. (#98753, @Jiawei0227)
- Kube-apiserver: The OIDC authenticator no longer waits 10 seconds before attempting to fetch the metadata required to verify tokens. (#97693, @enj) [SIG API Machinery and Auth]
- Kube-proxy: Traffic from the cluster directed to ExternalIPs is always sent directly to the Service. (#96296, @aojea) [SIG Network and Testing]
- Kubeadm: change the default image repository for CI images from 'gcr.io/kubernetes-ci-images' to 'gcr.io/k8s-staging-ci-images' (#97087, @SataQiu) [SIG Cluster Lifecycle]
- Kubectl: The deprecated kubectl alpha debug command is removed. Use kubectl debug instead. (#98111, @pandaamanda) [SIG CLI]
- Kubelet command line flags related to dockershim are now showing deprecation message as they will be removed along with dockershim in future release. (#98730, @dims)
- Official support to build kubernetes with docker-machine / remote docker is removed. This change does not affect building kubernetes with docker locally. (#97618, @jherrera123) [SIG Release and Testing]
- Process start time on Windows now uses current process information (#97491, @jsturtevant) [SIG API Machinery, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation and Windows]
- Resolves flakes in the Ingress conformance tests due to conflicts with controllers updating the Ingress object (#98430, @liggitt) [SIG Network and Testing]
- The AttachVolumeLimit feature gate (GA since v1.17) has been removed and is now unconditionally enabled. (#96539, @ialidzhikov)
- The CSINodeInfo feature gate, GA since v1.17, is unconditionally enabled and can no longer be specified via the --feature-gates argument. (#96561, @ialidzhikov) [SIG Apps, Auth, Scheduling, Storage and Testing]
- The apiserver_request_total metric is promoted to stable status and no longer has a content-type dimension, so any alerts/charts which presume the existence of this will fail. This is, however, unlikely to be the case since it was effectively an unbounded dimension in the first place. (#99788, @logicalhan)
- The default delegating authorization options now allow unauthenticated access to healthz, readyz, and livez. A system:masters user connecting to an authz delegator will not perform an authz check. (#98325, @deads2k) [SIG API Machinery, Auth, Cloud Provider and Scheduling]
- The deprecated feature gates
CSIDriverRegistry
, BlockVolume
and CSIBlockVolume
are now unconditionally enabled and can no longer be specified in component invocations. (#98021, @gavinfish) [SIG Storage]
- The deprecated feature gates
RotateKubeletClientCertificate
, AttachVolumeLimit
, VolumePVCDataSource
and EvenPodsSpread
are now unconditionally enabled and can no longer be specified in component invocations. (#97306, @gavinfish) [SIG Node, Scheduling and Storage]
- The e2e suite can be instructed not to wait for pods in kube-system to be ready or for all nodes to be ready by passing
--allowed-not-ready-nodes=-1
when invoking the e2e.test program. This allows callers to run subsets of the e2e suite in scenarios other than perfectly healthy clusters. (#98781, @smarterclayton) [SIG Testing]
- The feature gates WindowsGMSA and WindowsRunAsUserName that are GA since v1.18 are now removed. (#96531, @ialidzhikov) [SIG Node and Windows]
- The new -gce-zones flag on the e2e.test binary instructs tests that check for information about how the cluster interacts with the cloud to limit their queries to the provided zone list. If not specified, the current behavior of asking the cloud provider for all available zones in multi zone clusters is preserved. (#98787, @smarterclayton) [SIG API Machinery, Cluster Lifecycle and Testing]
- Update cri-tools to v1.20.0 (#97967, @rajibmitra) [SIG Cloud Provider]
- Windows nodes on GCE will take longer to start due to dependencies installed at node creation time. (#98284, @pjh) [SIG Cloud Provider]
- apiserver_storage_objects (a newer version of etcd_object_counts) is promoted and marked as stable. (#100082, @logicalhan)
Uncategorized
- GCE L4 Loadbalancers now handle > 5 ports in service spec correctly. (#99595, @prameshj) [SIG Cloud Provider]
- The DownwardAPIHugePages feature is beta. Users may use the feature if all worker nodes in their cluster are at version 1.20 or newer. The feature will be enabled by default in all installations in 1.22 (see the example below). (#99610, @derekwaynecarr) [SIG Node]
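As a minimal sketch of what the beta DownwardAPIHugePages feature enables (names, image, and sizes below are illustrative assumptions, and the node is assumed to have 2Mi hugepages available), a container's hugepages limit can be exposed through the downward API:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-downward-demo          # hypothetical example name
spec:
  containers:
  - name: app
    image: busybox:1.33                  # illustrative image
    command: ["sh", "-c", "echo $HUGEPAGES_2MI_LIMIT && sleep 3600"]
    resources:
      requests:
        memory: 64Mi
        hugepages-2Mi: 128Mi
      limits:
        memory: 64Mi
        hugepages-2Mi: 128Mi             # hugepages requests and limits must be equal
    env:
    - name: HUGEPAGES_2MI_LIMIT
      valueFrom:
        resourceFieldRef:
          containerName: app
          resource: limits.hugepages-2Mi # exposed through the downward API by this feature
```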
Dependencies
Added
- github.com/go-errors/errors: v1.0.1
- github.com/gobuffalo/here: v0.6.0
- github.com/google/shlex: e7afc7f
- github.com/markbates/pkger: v0.17.1
- github.com/moby/spdystream: v0.2.0
- github.com/monochromegane/go-gitignore: 205db1a
- github.com/niemeyer/pretty: a10e7ca
- github.com/xlab/treeprint: a009c39
- go.starlark.net: 8dd3e2e
- golang.org/x/term: 6a3ed07
- sigs.k8s.io/kustomize/api: v0.8.5
- sigs.k8s.io/kustomize/cmd/config: v0.9.7
- sigs.k8s.io/kustomize/kustomize/v4: v4.0.5
- sigs.k8s.io/kustomize/kyaml: v0.10.15
Changed
Removed
- github.com/codegangsta/negroni: v1.0.0
- github.com/docker/spdystream: 449fdfc
- github.com/golangplus/bytes: 45c989f
- github.com/golangplus/fmt: 2a5d6d7
- github.com/gorilla/context: v1.1.1
- github.com/kr/pty: v1.1.5
- rsc.io/quote/v3: v3.1.0
- rsc.io/sampler: v1.3.0
- sigs.k8s.io/kustomize: v2.0.3+incompatible
v1.21.0-rc.0
Downloads for v1.21.0-rc.0
Source Code
filename |
sha512 hash |
kubernetes.tar.gz |
ef53a41955d6f8a8d2a94636af98b55d633fb8a5081517559039e019b3dd65c9d10d4e7fa297ab88a7865d772f3eecf72e7b0eeba5e87accb4000c91da33e148 |
kubernetes-src.tar.gz |
9335a01b50d351776d3b8d00c07a5233844c51d307e361fa7e55a0620c1cb8b699e43eacf45ae9cafd8cbc44752e6987450c528a5bede8204706b7673000b5fc |
Client binaries
Server binaries
Node binaries
Changelog since v1.21.0-beta.1
Urgent Upgrade Notes
(No, really, you MUST read this before you upgrade)
- Migrated pkg/kubelet/cm/cpuset/cpuset.go to structured logging. Exit code changed from 255 to 1. (#100007, @utsavoza) [SIG Instrumentation and Node]
Changes by Kind
API Change
- Add Probe-level terminationGracePeriodSeconds field (#99375, @ehashman) [SIG API Machinery, Apps, Node and Testing]
- CSIServiceAccountToken is Beta now (#99298, @zshihang) [SIG Auth, Storage and Testing]
- Discovery.k8s.io/v1beta1 EndpointSlices are deprecated in favor of discovery.k8s.io/v1, and will no longer be served in Kubernetes v1.25. (#100472, @liggitt) [SIG Network]
- FieldManager no longer owns fields that get reset before the object is persisted (e.g. "status wiping"). (#99661, @kevindelgado) [SIG API Machinery, Auth and Testing]
- Generic ephemeral volumes are beta. (#99643, @pohly) [SIG API Machinery, Apps, Auth, CLI, Node, Storage and Testing]
- Implement the GetAvailableResources in the podresources API. (#95734, @fromanirh) [SIG Instrumentation, Node and Testing]
- The Endpoints controller will now set the endpoints.kubernetes.io/over-capacity annotation to "warning" when an Endpoints resource contains more than 1000 addresses. In a future release, the controller will truncate Endpoints that exceed this limit. The EndpointSlice API can be used to support significantly larger numbers of addresses. (#99975, @robscott) [SIG Apps and Network]
- The PodDisruptionBudget API has been promoted to policy/v1 with no schema changes. The only functional change is that an empty selector ({}) written to a policy/v1 PodDisruptionBudget now selects all pods in the namespace (see the example after this list). The behavior of the policy/v1beta1 API remains unchanged. The policy/v1beta1 PodDisruptionBudget API is deprecated and will no longer be served in 1.25+. (#99290, @mortent) [SIG API Machinery, Apps, Auth, Autoscaling, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Scheduling and Testing]
- Topology Aware Hints are now available in alpha and can be enabled with the TopologyAwareHints feature gate. (#99522, @robscott) [SIG API Machinery, Apps, Auth, Instrumentation, Network and Testing]
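A minimal policy/v1 PodDisruptionBudget sketch illustrating the empty-selector behavior described in the PodDisruptionBudget item above (the name and minAvailable value are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: all-pods-pdb    # hypothetical example name
spec:
  minAvailable: 1
  selector: {}          # in policy/v1 an empty selector selects all pods in the namespace
```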
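And a sketch of the probe-level terminationGracePeriodSeconds field from the first item in this list (new in this release and gated; names, image, and values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-grace-demo                 # hypothetical example name
spec:
  terminationGracePeriodSeconds: 60      # pod-level default
  containers:
  - name: app
    image: busybox:1.33                  # illustrative image
    command: ["sh", "-c", "sleep 3600"]
    livenessProbe:
      exec:
        command: ["cat", "/tmp/healthy"] # illustrative check
      initialDelaySeconds: 10
      periodSeconds: 10
      terminationGracePeriodSeconds: 5   # probe-level override used when this probe fails
```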
Feature
- Add e2e test to validate performance metrics of volume lifecycle operations (#94334, @RaunakShah) [SIG Storage and Testing]
- EmptyDir memory backed volumes are sized as the minimum of pod allocatable memory on a host and an optional explicit user-provided value. (#100319, @derekwaynecarr) [SIG Node]
- Enables Kubelet to check volume condition and log events to corresponding pods. (#99284, @fengzixu) [SIG Apps, Instrumentation, Node and Storage]
- Introduce a churn operator to scheduler perf testing framework. (#98900, @Huang-Wei) [SIG Scheduling and Testing]
- Kubernetes is now built with Golang 1.16.1 (#100106, @justaugustus) [SIG Cloud Provider, Instrumentation, Release and Testing]
- Migrated pkg/kubelet/cm/devicemanager to structured logging (#99976, @knabben) [SIG Instrumentation and Node]
- Migrated pkg/kubelet/cm/memorymanager to structured logging (#99974, @knabben) [SIG Instrumentation and Node]
- Migrated pkg/kubelet/cm/topologymanager to structured logging (#99969, @knabben) [SIG Instrumentation and Node]
- Rename the metric etcd_object_counts to apiserver_storage_object_counts and mark it as stable. The original etcd_object_counts metric name is marked as "Deprecated" and will be removed in the future. (#99785, @erain) [SIG API Machinery, Instrumentation and Testing]
- Update pause container to run as pseudo user and group 65535:65535. This implies the release of version 3.5 of the container images. (#97963, @saschagrunert) [SIG CLI, Cloud Provider, Cluster Lifecycle, Node, Release, Security and Testing]
- Users might specify the kubectl.kubernetes.io/default-exec-container annotation in a Pod to preselect a container for kubectl commands (see the example after this list). (#99833, @mengjiao-liu) [SIG CLI]
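A minimal Pod sketch for the kubectl.kubernetes.io/default-exec-container annotation noted above (pod name, container names, and images are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: multi-container-demo                           # hypothetical example name
  annotations:
    kubectl.kubernetes.io/default-exec-container: app  # container selected by default for kubectl exec
spec:
  containers:
  - name: app
    image: busybox:1.33                                # illustrative images
    command: ["sleep", "3600"]
  - name: sidecar
    image: busybox:1.33
    command: ["sleep", "3600"]
```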
Bug or Regression
- Add ability to skip OpenAPI handler installation to the GenericAPIServer (#100341, @kevindelgado) [SIG API Machinery]
- Count pod overhead against an entity's ResourceQuota (#99600, @gjkim42) [SIG API Machinery and Node]
- EndpointSlice controllers are less likely to create duplicate EndpointSlices. (#100103, @robscott) [SIG Apps and Network]
- Ensure only one LoadBalancer rule is created when HA mode is enabled (#99825, @feiskyer) [SIG Cloud Provider]
- Fixed a race condition on API server startup ensuring previously created webhook configurations are effective before the first write request is admitted. (#95783, @roycaihw) [SIG API Machinery]
- Fixed authentication_duration_seconds metric. Previously it included whole apiserver request duration. (#99944, @marseel) [SIG API Machinery, Instrumentation and Scalability]
- Fixes issue where inline AzureFile secrets could not be accessed from the pod's namespace. (#100563, @msau42) [SIG Storage]
- Improve speed of vSphere PV provisioning and reduce number of API calls (#100054, @gnufied) [SIG Cloud Provider and Storage]
- Kubectl: Fixed panic when describing an ingress backend without an API Group (#100505, @lauchokyip) [SIG CLI]
- Kubectl: fix case of age column in describe node (#96963, @bl-ue) [SIG CLI]
- Kubelet.exe on Windows now checks that the process is running as administrator and that the executing user account is listed in the built-in administrators group. This is the equivalent of checking that the process is running as uid 0. (#96616, @perithompson) [SIG Node and Windows]
- Kubelet: Fixed a bug in getting the CPU count on Windows when the number of logical processors is more than 64. (#97378, @hwdef) [SIG Node and Windows]
- Pass KUBE_BUILD_CONFORMANCE=y to package-tarballs to re-enable building the conformance tarballs. (#100571, @puerco) [SIG Release]
- Pod log stats for Windows now report metrics (#99221, @jsturtevant) [SIG Node, Storage, Testing and Windows]
Other (Cleanup or Flake)
- A new storage E2E testsuite covers CSIStorageCapacity publishing if a driver opts into the test. (#100537, @pohly) [SIG Storage and Testing]
- Convert cmd/kubelet/app/server.go to structured logging (#98334, @wawa0210) [SIG Node]
- If the kube-apiserver has the goaway feature enabled, clients require golang 1.15.8 or 1.16+ to avoid an unexpected data race issue. (#98809, @answer1991) [SIG API Machinery]
- Increased CSINodeIDMaxLength from 128 bytes to 192 bytes. (#98753, @Jiawei0227) [SIG Apps and Storage]
- Migrate pkg/kubelet/pluginmanager to structured logging (#99885, @qingwave) [SIG Node]
- Migrate pkg/kubelet/preemption/preemption.go and pkg/kubelet/logs/container_log_manager.go to structured logging (#99848, @qingwave) [SIG Node]
- Migrate pkg/kubelet/(cri) to structured logging (#99006, @yangjunmyfm192085) [SIG Node]
- Migrate pkg/kubelet/(node, pod) to structured logging (#98847, @yangjunmyfm192085) [SIG Node]
- Migrate pkg/kubelet/(volume,container) to structured logging (#98850, @yangjunmyfm192085) [SIG Node]
- Migrate pkg/kubelet/kubelet_node_status.go to structured logging (#98154, @yangjunmyfm192085) [SIG Node and Release]
- Migrate pkg/kubelet/lifecycle,oom to structured logging (#99479, @mengjiao-liu) [SIG Instrumentation and Node]
- Migrate cmd/kubelet/+ pkg/kubelet/cadvisor/cadvisor_linux.go + pkg/kubelet/cri/remote/util/util_unix.go + pkg/kubelet/images/image_manager.go to structured logging (#99994, @AfrouzMashayekhi) [SIG Instrumentation and Node]
- Migrate pkg/kubelet/cm/container_manager_linux.go and pkg/kubelet/cm/container_manager_stub.go to structured logging (#100001, @shiyajuan123) [SIG Instrumentation and Node]
- Migrate pkg/kubelet/cm/cpumanager/{topology/topology.go, policy_none.go, cpu_assignment.go} to structured logging (#100163, @lala123912) [SIG Instrumentation and Node]
- Migrate pkg/kubelet/cm/cpumanager/state to structured logging (#99563, @jmguzik) [SIG Instrumentation and Node]
- Migrate pkg/kubelet/config to structured logging (#100002, @AfrouzMashayekhi) [SIG Instrumentation and Node]
- Migrate pkg/kubelet/kubelet.go to structured logging (#99861, @navidshaikh) [SIG Instrumentation and Node]
- Migrate pkg/kubelet/kubeletconfig to structured logging (#100265, @ehashman) [SIG Node]
- Migrate pkg/kubelet/kuberuntime to structured logging (#99970, @krzysiekg) [SIG Instrumentation and Node]
- Migrate pkg/kubelet/prober to structured logging (#99830, @krzysiekg) [SIG Instrumentation and Node]
- Migrate pkg/kubelet/winstats to structured logging (#99855, @hexxdump) [SIG Instrumentation and Node]
- Migrate probe log messages to structured logging (#97093, @aldudko) [SIG Instrumentation and Node]
- Migrate remaining kubelet files to structured logging (#100196, @ehashman) [SIG Instrumentation and Node]
- apiserver_storage_objects (a newer version of etcd_object_counts) is promoted and marked as stable. (#100082, @logicalhan) [SIG API Machinery, Instrumentation and Testing]
Dependencies
Added
Nothing has changed.
Changed
Removed
Nothing has changed.
v1.21.0-beta.1
Downloads for v1.21.0-beta.1
Source Code
filename |
sha512 hash |
kubernetes.tar.gz |
c9f4f25242e319e5d90f49d26f239a930aad69677c0f3c2387c56bb13482648a26ed234be2bfe2352508f35010e3eb6d3b127c31a9f24fa1e53ac99c38520fe4 |
kubernetes-src.tar.gz |
255357db8fa160cab2187658906b674a8b0d9b9a5b5f688cc7b69dc124f5da00362c6cc18ae9b80f7ddb3da6f64c2ab2f12fb9b63a4e063c7366a5375b175cda |
Client binaries
Server binaries
Node binaries
Changelog since v1.21.0-beta.0
Urgent Upgrade Notes
(No, really, you MUST read this before you upgrade)
- Kubeadm: during "init" an empty cgroupDriver value in the KubeletConfiguration is now always set to "systemd" unless the user is explicit about it. This requires existing machine setups to configure the container runtime to use the "systemd" driver. Documentation on this topic can be found here: https://kubernetes.io/docs/setup/production-environment/container-runtimes/. When upgrading existing clusters / nodes using "kubeadm upgrade" the old cgroupDriver value is preserved, but in 1.22 this change will also apply to "upgrade". For more information on migrating to the "systemd" driver or remaining on the "cgroupfs" driver see: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/configure-cgroup-driver/. (#99471, @neolit123) [SIG Cluster Lifecycle]
- Migrate pkg/kubelet/(dockershim, network) to structured logging. Exit code changed from 255 to 1. (#98939, @yangjunmyfm192085) [SIG Network and Node]
- Migrate pkg/kubelet/certificate to structured logging. Exit code changed from 255 to 1. (#98993, @SataQiu) [SIG Auth and Node]
- Newly provisioned PVs by EBS plugin will no longer use the deprecated "failure-domain.beta.kubernetes.io/zone" and "failure-domain.beta.kubernetes.io/region" labels. It will use "topology.kubernetes.io/zone" and "topology.kubernetes.io/region" labels instead. (#99130, @ayberk) [SIG Cloud Provider, Storage and Testing]
- Newly provisioned PVs by OpenStack Cinder plugin will no longer use the deprecated "failure-domain.beta.kubernetes.io/zone" and "failure-domain.beta.kubernetes.io/region" labels. It will use "topology.kubernetes.io/zone" and "topology.kubernetes.io/region" labels instead. (#99719, @jsafrane) [SIG Cloud Provider and Storage]
- OpenStack Cinder CSI migration is on by default; the Cinder CSI driver must be installed on clusters on OpenStack for Cinder volumes to work. (#98538, @dims) [SIG Storage]
- Package pkg/kubelet/server migrated to structured logging. Exit code changed from 255 to 1. (#99838, @adisky) [SIG Node]
- Pkg/kubelet/kuberuntime/kuberuntime_manager.go migrated to structured logging. Exit code changed from 255 to 1. (#99841, @adisky) [SIG Instrumentation and Node]
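To illustrate the kubeadm cgroup driver note at the top of this list, a minimal KubeletConfiguration fragment that sets the driver explicitly (a sketch, assuming the container runtime is also configured for the systemd driver):

```yaml
# Fragment of a kubeadm configuration file (e.g. passed via "kubeadm init --config").
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd   # kubeadm "init" now defaults an empty value to "systemd"
```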
Changes by Kind
Deprecation
- Kubeadm: the deprecated kube-dns is no longer supported as an option. If "ClusterConfiguration.dns.type" is set to "kube-dns" kubeadm will now throw an error. (#99646, @rajansandeep) [SIG Cluster Lifecycle]
- Remove the deprecated --generator, --replicas, --service-generator, --service-overrides and --schedule flags from kubectl run. Deprecate --serviceaccount, --hostport, --requests and --limits in kubectl run. (#99732, @soltysh) [SIG CLI and Testing]
- audit.k8s.io/v1beta1 and audit.k8s.io/v1alpha1 audit policy configuration and audit events are deprecated in favor of audit.k8s.io/v1, available since v1.13. kube-apiserver invocations that specify alpha or beta policy configurations with --audit-policy-file, or explicitly request alpha or beta audit events with --audit-log-version / --audit-webhook-version, must update to use audit.k8s.io/v1 and accept audit.k8s.io/v1 events prior to v1.24 (see the policy sketch after this list). (#98858, @carlory) [SIG Auth]
- The diskformat storage class parameter for the in-tree vSphere volume plugin is deprecated as of the v1.21 release. Please consider updating your StorageClass and removing the diskformat parameter; the vSphere CSI Driver does not support the diskformat StorageClass parameter.
vSphere releases older than 67u3 are deprecated as of v1.21. Please consider upgrading vSphere to 67u3 or above. The vSphere CSI Driver requires a minimum of vSphere 67u3.
VM Hardware versions lower than 15 are deprecated as of v1.21. Please consider upgrading the Node VM Hardware version to 15 or above. The vSphere CSI Driver recommends that the Node VM's Hardware version be set to at least vmx-15.
Multi-vCenter support is deprecated as of v1.21. If you have a Kubernetes cluster spanning multiple vCenter servers, please consider moving all Kubernetes nodes to a single vCenter Server. The vSphere CSI Driver does not support Kubernetes deployments spanning multiple vCenter servers.
Support for these deprecations will be available till Kubernetes v1.24. (#98546, @divyenpatel) [SIG Cloud Provider and Storage]
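A minimal audit.k8s.io/v1 policy sketch for the audit deprecation item above (the single rule shown is illustrative):

```yaml
apiVersion: audit.k8s.io/v1   # replaces the deprecated audit.k8s.io/v1alpha1 and v1beta1 versions
kind: Policy
rules:
- level: Metadata             # illustrative rule: log request metadata for all requests
```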
API Change
- PodAffinityTerm includes a namespaceSelector field to allow selecting eligible namespaces based on their labels.
- A new CrossNamespacePodAffinity quota scope API allows restricting which namespaces are allowed to use PodAffinityTerm with cross-namespace references via the namespaceSelector or namespaces fields. (#98582, @ahg-g) [SIG API Machinery, Apps, Auth and Testing]
- Add a default metadata name label for selecting any namespace by its name. (#96968, @jayunit100) [SIG API Machinery, Apps, Cloud Provider, Storage and Testing]
- Added the .spec.completionMode field to Job, with accepted values NonIndexed (default) and Indexed (see the example after this list). (#98441, @alculquicondor) [SIG Apps and CLI]
- Clarified NetworkPolicy policyTypes documentation (#97216, @joejulian) [SIG Network]
- DaemonSets accept a MaxSurge integer or percent on their rolling update strategy that will launch the updated pod on nodes and wait for those pods to go ready before marking the old out-of-date pods as deleted. This allows workloads to avoid downtime during upgrades when deployed using DaemonSets. This feature is alpha and is behind the DaemonSetUpdateSurge feature gate. (#96441, @smarterclayton) [SIG Apps and Testing]
- EndpointSlice API is now GA. The EndpointSlice topology field has been removed from the GA API and will be replaced by a new per Endpoint Zone field. If the topology field was previously used, it will be converted into an annotation in the v1 Resource. The discovery.k8s.io/v1alpha1 API is removed. (#99662, @swetharepakula) [SIG API Machinery, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Network and Testing]
- EndpointSlice Controllers are now GA. The EndpointSlice Controller will not populate the deprecatedTopology field and will only provide topology information through the zone and nodeName fields. (#99870, @swetharepakula) [SIG API Machinery, Apps, Auth, Network and Testing]
- IngressClass resource can now reference a resource in a specific namespace for implementation-specific configuration (previously only cluster-level resources were allowed). This feature can be enabled using the IngressClassNamespacedParams feature gate. (#99275, @hbagdi) [SIG API Machinery, CLI and Network]
- Introduce conditions for PodDisruptionBudget (#98127, @mortent) [SIG API Machinery, Apps, Auth, CLI, Cloud Provider, Cluster Lifecycle and Instrumentation]
- Jobs API has a new .spec.suspend field that can be used to suspend and resume Jobs (#98727, @adtac) [SIG API Machinery, Apps, Node, Scheduling and Testing]
- Kubelet Graceful Node Shutdown feature is now beta. (#99735, @bobbypage) [SIG Node]
- Limit the request value of hugepages to an integer multiple of the page size. (#98515, @lala123912) [SIG Apps]
- A new field, "internalTrafficPolicy", is added to Service. It specifies whether cluster-internal traffic should be routed to all endpoints or to node-local endpoints only. "Cluster" routes internal traffic to a Service to all endpoints. "Local" routes traffic to node-local endpoints only; traffic is dropped if no node-local endpoints are ready. The default value is "Cluster" (see the example after this list). (#96600, @maplain) [SIG API Machinery, Apps and Network]
- PodSecurityPolicy only stores "generic" as allowed volume type if the GenericEphemeralVolume feature gate is enabled (#98918, @pohly) [SIG Auth and Security]
- Promote CronJobs to batch/v1 (#99423, @soltysh) [SIG API Machinery, Apps, CLI and Testing]
- Remove support for building Kubernetes with bazel. (#99561, @BenTheElder) [SIG API Machinery, Apps, Architecture, Auth, Autoscaling, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Network, Node, Release, Scalability, Scheduling, Storage, Testing and Windows]
- Setting loadBalancerClass on a Service of type LoadBalancer is now available. Users who want to use a custom load balancer can specify loadBalancerClass to achieve it. (#98277, @XudongLiuHarold) [SIG API Machinery, Apps, Cloud Provider and Network]
- Storage capacity tracking (= the CSIStorageCapacity feature) is beta, storage.k8s.io/v1alpha1/VolumeAttachment and storage.k8s.io/v1alpha1/CSIStorageCapacity objects are deprecated (#99641, @pohly) [SIG API Machinery, Apps, Auth, Scheduling, Storage and Testing]
- Support for Indexed Job: a Job that is considered completed when Pods associated to indexes from 0 to (.spec.completions-1) have succeeded. (#98812, @alculquicondor) [SIG Apps and CLI]
- The apiserver now resets managedFields that got corrupted by a mutating admission controller. (#98074, @kwiesmueller) [SIG API Machinery and Testing]
- The controller.kubernetes.io/pod-deletion-cost annotation can be set to offer a hint on the cost of deleting a pod compared to other pods belonging to the same ReplicaSet. Pods with lower deletion cost are deleted first. This is an alpha feature. (#99163, @ahg-g) [SIG Apps]
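A minimal Indexed Job sketch for the .spec.completionMode item above (name, image, and counts are illustrative; the index is read from the annotation the Job controller sets on each Pod):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-job-demo       # hypothetical example name
spec:
  completions: 5
  parallelism: 2
  completionMode: Indexed      # each Pod gets an index from 0 to completions-1
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.33    # illustrative image
        command: ["sh", "-c", "echo processing index $JOB_COMPLETION_INDEX"]
        env:
        - name: JOB_COMPLETION_INDEX
          valueFrom:
            fieldRef:          # the controller records the index in this annotation
              fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
```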
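And a minimal Service sketch for the internalTrafficPolicy item above (alpha in this release behind a feature gate; name, selector, and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: node-local-demo          # hypothetical example name
spec:
  selector:
    app: demo
  ports:
  - port: 80
    targetPort: 8080
  internalTrafficPolicy: Local   # route in-cluster traffic only to endpoints on the same node
```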
Feature
- A client-go metric, rest_client_exec_plugin_call_total, has been added to track total calls to client-go credential plugins. (#98892, @ankeesler) [SIG API Machinery, Auth, Cluster Lifecycle and Instrumentation]
- Add --use-protocol-buffers flag to kubectl top pods and nodes (#96655, @serathius) [SIG CLI]
- Add support to generate client-side binaries for new darwin/arm64 platform (#97743, @dims) [SIG Release and Testing]
- Added ephemeral_volume_controller_create[_failures]_total counters to kube-controller-manager metrics (#99115, @pohly) [SIG API Machinery, Apps, Cluster Lifecycle, Instrumentation and Storage]
- Adds alpha feature VolumeCapacityPriority which makes the scheduler prioritize nodes based on the best matching size of statically provisioned PVs across multiple topologies. (#96347, @cofyc) [SIG Apps, Network, Scheduling, Storage and Testing]
- Adds two new metrics to CronJobs: a histogram to track the time difference between when a job is created and when it was expected to be created, and a gauge for the missed schedules of a CronJob (#99341, @alaypatel07) [SIG Apps and Instrumentation]
- Alpha implementation of Kubectl Command Headers: SIG CLI KEP 859, enabled when the KUBECTL_COMMAND_HEADERS environment variable is set on the client command line.
- To enable: export KUBECTL_COMMAND_HEADERS=1; kubectl ... (#98952, @seans3) [SIG API Machinery and CLI]
- Component owners can configure the allowlist of metric labels with the flag '--allow-metric-labels'. (#99738, @YoyinZyc) [SIG API Machinery, Cluster Lifecycle and Instrumentation]
- Disruption controller only sends one event per PodDisruptionBudget if scale can't be computed (#98128, @mortent) [SIG Apps]
- EndpointSliceNodeName will always be enabled, so NodeName will always be available in the v1beta1 API. (#99746, @swetharepakula) [SIG Apps and Network]
- Graduate CRIContainerLogRotation feature gate to GA. (#99651, @umohnani8) [SIG Node and Testing]
- Kube-proxy iptables: new metric sync_proxy_rules_iptables_total that exposes the number of rules programmed per table in each iteration (#99653, @aojea) [SIG Instrumentation and Network]
- Kube-scheduler now logs plugin scoring summaries at --v=4 (#99411, @damemi) [SIG Scheduling]
- Kubeadm: print a warning to the user, as ipv6 site-local is deprecated (#99574, @pacoxu) [SIG Cluster Lifecycle and Network]
- Kubeadm: apply the "node.kubernetes.io/exclude-from-external-load-balancers" label on control plane nodes during "init", "join" and "upgrade" to preserve backwards compatibility with the legacy LB mode where nodes labeled as "master" were excluded. To opt out you can remove the label from a node. See #97543 and the linked KEP for more details. (#98269, @neolit123) [SIG Cluster Lifecycle]
- Kubeadm: if the user has customized their image repository via the kubeadm configuration, pass the custom pause image repository and tag to the kubelet via --pod-infra-container-image not only for Docker but for all container runtimes. This flag tells the kubelet that it should not garbage collect the image. (#99476, @neolit123) [SIG Cluster Lifecycle]
- Kubeadm: promote IPv6DualStack feature gate to Beta (#99294, @pacoxu) [SIG Cluster Lifecycle]
- Kubectl version changed to write a warning message to stderr if the client and server version difference exceeds the supported version skew of +/-1 minor version. (#98250, @brianpursley) [SIG CLI]
- Kubernetes is now built with Golang 1.16 (#98572, @justaugustus) [SIG API Machinery, Auth, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Node, Release and Testing]
- Persistent Volumes formatted with the btrfs filesystem will now automatically resize when expanded. (#99361, @Novex) [SIG Storage]
- Remove cAdvisor json metrics api collected by Kubelet (#99236, @pacoxu) [SIG Node]
- Sysctls is now GA and locked to default (#99158, @wgahnagl) [SIG Node]
- The NodeAffinity plugin implements the PreFilter extension, offering enhanced performance for Filter. (#99213, @AliceZhang2016) [SIG Scheduling]
- The endpointslice mirroring controller mirrors endpoints annotations and labels to the generated endpoint slices, it also ensures that updates on any of these fields are mirrored.
The well-known annotation endpoints.kubernetes.io/last-change-trigger-time is skipped and not mirrored. (#98116, @aojea) [SIG Apps, Network and Testing]
- Update the latest validated version of Docker to 20.10 (#98977, @neolit123) [SIG CLI, Cluster Lifecycle and Node]
- Upgrade node local dns to 1.17.0 for better IPv6 support (#99749, @pacoxu) [SIG Cloud Provider and Network]
- Users might specify the kubectl.kubernetes.io/default-exec-container annotation in a Pod to preselect a container for kubectl commands. (#99581, @mengjiao-liu) [SIG CLI]
- When downscaling ReplicaSets, ready and creation timestamps are compared in a logarithmic scale. (#99212, @damemi) [SIG Apps and Testing]
- When the kubelet is watching a ConfigMap or Secret purely in the context of setting environment variables
for containers, only hold that watch for a defined duration before cancelling it. This change reduces the CPU
and memory usage of the kube-apiserver in large clusters. (#99393, @chenyw1990) [SIG API Machinery, Node and Testing]
- WindowsEndpointSliceProxying feature gate has graduated to beta and is enabled by default. This means kube-proxy will read from EndpointSlices instead of Endpoints on Windows by default. (#99794, @robscott) [SIG Network]
Bug or Regression
- Creating a PVC with DataSource should fail for non-CSI plugins. (#97086, @xing-yang) [SIG Apps and Storage]
- EndpointSlice controller is now less likely to emit FailedToUpdateEndpointSlices events. (#99345, @robscott) [SIG Apps and Network]
- EndpointSliceMirroring controller is now less likely to emit FailedToUpdateEndpointSlices events. (#99756, @robscott) [SIG Apps and Network]
- Fix --ignore-errors not taking effect when multiple logs are printed and not followed (#97686, @wzshiming) [SIG CLI]
- Fix bug that would let the Horizontal Pod Autoscaler scale down despite at least one metric being unavailable/invalid (#99514, @mikkeloscar) [SIG Apps and Autoscaling]
- Fix cgroup handling for systemd with cgroup v2 (#98365, @odinuge) [SIG Node]
- Fix smb mount PermissionDenied issue on Windows (#99550, @andyzhangx) [SIG Cloud Provider, Storage and Windows]
- Fixed a bug that caused a smaller value of conntrack-max to be used under the CPU static policy. (#99225, @xh4n3) (#99613, @xh4n3) [SIG Network]
- Fixed bug that caused cAdvisor to incorrectly detect single-socket multi-NUMA topology. (#99315, @iwankgb) [SIG Node]
- Fixes add-on manager leader election (#98968, @liggitt) [SIG Cloud Provider]
- Improved update time of pod statuses following new probe results. (#98376, @matthyx) [SIG Node and Testing]
- Kube-apiserver: an update of a pod with a generic ephemeral volume dropped that volume if the feature had been disabled since creating the pod with such a volume (#99446, @pohly) [SIG Apps, Node and Storage]
- Kubeadm: skip validating pod subnet against node-cidr-mask when allocate-node-cidrs is set to be false (#98984, @SataQiu) [SIG Cluster Lifecycle]
- On single-stack configured (IPv4 or IPv6, but not both) clusters, Services which are both headless (no clusterIP) and selectorless (empty or undefined selector) will report ipFamilyPolicy RequireDualStack and will have entries in ipFamilies[] for both IPv4 and IPv6. This is a change from alpha, but does not have any impact on the manually-specified Endpoints and EndpointSlices for the Service. (#99555, @thockin) [SIG Apps and Network]
- Resolves spurious Failed to list *v1.Secret or Failed to list *v1.ConfigMap messages in kubelet logs. (#99538, @liggitt) [SIG Auth and Node]
- Return zero time (midnight on Jan. 1, 1970) instead of negative number when reporting startedAt and finishedAt of the not started or a running Pod when using dockershim as a runtime. (#99585, @Iceber) [SIG Node]
- Stdin is now only passed to client-go exec credential plugins when it is detected to be an interactive terminal. Previously, it was passed to client-go exec plugins when stdout was detected to be an interactive terminal. (#99654, @ankeesler) [SIG API Machinery and Auth]
- The maximum number of ports allowed in EndpointSlices has been increased from 100 to 20,000 (#99795, @robscott) [SIG Network]
- Updates the commands
- When a CNI plugin returns dual-stack pod IPs, kubelet will now try to respect the
"primary IP family" of the cluster by picking a primary pod IP of the same family
as the (primary) node IP, rather than assuming that the CNI plugin returned the IPs
in the order the administrator wanted (since some CNI plugins don't allow
configuring this). (#97979, @danwinship) [SIG Network and Node]
- When using Containerd on Windows, the "C:\Windows\System32\drivers\etc\hosts" file will now be managed by kubelet. (#83730, @claudiubelu) [SIG Node and Windows]
- VolumeBindingArgs now allows BindTimeoutSeconds to be set to zero; a value of zero indicates that the volume binding check does not wait (see the scheduler configuration sketch after this list). (#99835, @chendave) [SIG Scheduling and Storage]
- kubectl exec and kubectl attach now honor the --quiet flag which suppresses output from the local binary that could be confused by a script with the remote command output (all non-failure output is hidden). In addition, exec and attach print inline the list of alternate containers when defaulting to the first spec.container. (#99004, @smarterclayton) [SIG CLI]
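A kube-scheduler configuration sketch for the VolumeBindingArgs note above (assuming the kubescheduler.config.k8s.io/v1beta1 configuration API; the profile layout is illustrative):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: VolumeBinding
    args:
      bindTimeoutSeconds: 0   # zero: do not wait when checking volume binding
```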
Other (Cleanup or Flake)
- Apiserver_request_duration_seconds is promoted to stable status. (#99925, @logicalhan) [SIG API Machinery, Instrumentation and Testing]
- Apiserver_request_total is promoted to stable status and no longer has a content-type dimension, so any alerts/charts which presume the existence of this dimension will fail. This is, however, unlikely to be the case since it was effectively an unbounded dimension in the first place. (#99788, @logicalhan) [SIG API Machinery, Instrumentation and Testing]
- EndpointSlice generation is now incremented when labels change. (#99750, @robscott) [SIG Network]
- Featuregate AllowInsecureBackendProxy is promoted to GA (#99658, @deads2k) [SIG API Machinery]
- Migrate pkg/kubelet/(eviction) to structured logging (#99032, @yangjunmyfm192085) [SIG Node]
- Migrate deployment controller log messages to structured logging (#97507, @aldudko) [SIG Apps]
- Migrate pkg/kubelet/cloudresource to structured logging (#98999, @sladyn98) [SIG Node]
- Migrate pkg/kubelet/cri/remote logs to structured logging (#98589, @chenyw1990) [SIG Node]
- Migrate pkg/kubelet/kuberuntime/kuberuntime_container.go logs to structured logging (#96973, @chenyw1990) [SIG Instrumentation and Node]
- Migrate pkg/kubelet/status to structured logging (#99836, @navidshaikh) [SIG Instrumentation and Node]
- Migrate pkg/kubelet/token to structured logging (#99264, @palnabarun) [SIG Auth, Instrumentation and Node]
- Migrate pkg/kubelet/util to structured logging (#99823, @navidshaikh) [SIG Instrumentation and Node]
- Migrate proxy/userspace/proxier.go logs to structured logging (#97837, @JornShen) [SIG Network]
- Migrate some kubelet/metrics log messages to structured logging (#98627, @jialaijun) [SIG Instrumentation and Node]
- Process start time on Windows now uses current process information (#97491, @jsturtevant) [SIG API Machinery, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation and Windows]
Uncategorized
- Migrate pkg/kubelet/stats to structured logging (#99607, @krzysiekg) [SIG Node]
- The DownwardAPIHugePages feature is beta. Users may use the feature if all worker nodes in their cluster are at version 1.20 or newer. The feature will be enabled by default in all installations in 1.22. (#99610, @derekwaynecarr) [SIG Node]
Dependencies
Added
- github.com/go-errors/errors: v1.0.1
- github.com/gobuffalo/here: v0.6.0
- github.com/google/shlex: e7afc7f
- github.com/markbates/pkger: v0.17.1
- github.com/monochromegane/go-gitignore: 205db1a
- github.com/niemeyer/pretty: a10e7ca
- github.com/xlab/treeprint: a009c39
- go.starlark.net: 8dd3e2e
- golang.org/x/term: 6a3ed07
- sigs.k8s.io/kustomize/api: v0.8.5
- sigs.k8s.io/kustomize/cmd/config: v0.9.7
- sigs.k8s.io/kustomize/kustomize/v4: v4.0.5
- sigs.k8s.io/kustomize/kyaml: v0.10.15
Changed
- dmitri.shuralyov.com/gpu/mtl: 666a987 → 28db891
- github.com/creack/pty: v1.1.7 → v1.1.9
- github.com/go-openapi/spec: v0.19.3 → v0.19.5
- github.com/go-openapi/strfmt: v0.19.3 → v0.19.5
- github.com/go-openapi/validate: v0.19.5 → v0.19.8
- github.com/google/cadvisor: v0.38.7 → v0.38.8
- github.com/kr/text: v0.1.0 → v0.2.0
- github.com/mattn/go-runewidth: v0.0.2 → v0.0.7
- github.com/olekukonko/tablewriter: a0225b3 → v0.0.4
- github.com/sergi/go-diff: v1.0.0 → v1.1.0
- golang.org/x/crypto: 7f63de1 → 5ea612d
- golang.org/x/exp: 6cc2880 → 85be41e
- golang.org/x/mobile: d2bd2a2 → e6ae53a
- golang.org/x/mod: v0.3.0 → ce943fd
- golang.org/x/net: 69a7880 → 3d97a24
- golang.org/x/sys: 5cba982 → a50acf3
- golang.org/x/time: 3af7569 → f8bda1e
- golang.org/x/tools: 113979e → v0.1.0
- gopkg.in/check.v1: 41f04d3 → 8fa4692
- gopkg.in/yaml.v2: v2.2.8 → v2.4.0
- k8s.io/kube-openapi: d219536 → 591a79e
- k8s.io/system-validators: v1.3.0 → v1.4.0
Removed
- github.com/codegangsta/negroni: v1.0.0
- github.com/golangplus/bytes: 45c989f
- github.com/golangplus/fmt: 2a5d6d7
- github.com/gorilla/context: v1.1.1
- github.com/kr/pty: v1.1.5
- sigs.k8s.io/kustomize: v2.0.3+incompatible
v1.21.0-beta.0
Downloads for v1.21.0-beta.0
Source Code
filename |
sha512 hash |
kubernetes.tar.gz |
69b73a03b70b0ed006e9fef3f5b9bc68f0eb8dc40db6cc04777c03a2cb83a008c783012ca186b1c48357fb192403dbcf6960f120924785e2076e215b9012d546 |
kubernetes-src.tar.gz |
9620fb6d37634271bdd423c09f33f3bd29e74298aa82c47dffc8cb6bd2ff44fa8987a53c53bc529db4ca96ec41503aa81cc8d0c3ac106f3b06c4720de933a8e6 |
Client binaries
Server binaries
Node binaries
Changelog since v1.21.0-alpha.3
Urgent Upgrade Notes
(No, really, you MUST read this before you upgrade)
- The metric storage_operation_errors_total is not removed, but is marked deprecated, and the metric storage_operation_status_count is marked deprecated. In both cases the storage_operation_duration_seconds metric can be used to recover equivalent counts (using status=fail-unknown in the case of storage_operation_errors_total). (#99045, @mattcary) [SIG Instrumentation and Storage]
Changes by Kind
Deprecation
- The batch/v2alpha1 CronJob type definitions and clients are deprecated and removed. (#96987, @soltysh) [SIG API Machinery, Apps, CLI and Testing]
API Change
- Cluster admins can now turn off /debug/pprof and /debug/flags/v endpoint in kubelet by setting enableProfilingHandler and enableDebugFlagsHandler to false in their kubelet configuration file. enableProfilingHandler and enableDebugFlagsHandler can be set to true only when enableDebuggingHandlers is also set to true. (#98458, @SaranBalaji90) [SIG Node]
- The BoundServiceAccountTokenVolume feature has been promoted to beta, and enabled by default.
- This changes the tokens provided to containers at /var/run/secrets/kubernetes.io/serviceaccount/token to be time-limited, auto-refreshed, and invalidated when the containing pod is deleted.
- Clients should reload the token from disk periodically (once per minute is recommended) to ensure they continue to use a valid token. k8s.io/client-go version v11.0.0+ and v0.15.0+ reload tokens automatically.
- By default, injected tokens are given an extended lifetime so they remain valid even after a new refreshed token is provided. The metric serviceaccount_stale_tokens_total can be used to monitor for workloads that are depending on the extended lifetime and are continuing to use tokens even after a refreshed token is provided to the container. If that metric indicates no existing workloads are depending on extended lifetimes, injected token lifetime can be shortened to 1 hour by starting kube-apiserver with --service-account-extend-token-expiration=false. (#95667, @zshihang) [SIG API Machinery, Auth, Cluster Lifecycle and Testing]
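For the kubelet debugging-handlers item at the top of this list, a minimal KubeletConfiguration sketch (values are illustrative):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
enableDebuggingHandlers: true    # must remain true for the two fields below to be honored
enableProfilingHandler: false    # turns off the kubelet's /debug/pprof endpoint
enableDebugFlagsHandler: false   # turns off the kubelet's /debug/flags/v endpoint
```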
Feature
- A new histogram metric to track the time it took to delete a job by the ttl-after-finished controller (#98676, @ahg-g) [SIG Apps and Instrumentation]
- AWS cloudprovider supports auto-discovering subnets without any kubernetes.io/cluster/ tags. It also supports additional service annotation service.beta.kubernetes.io/aws-load-balancer-subnets to manually configure the subnets. (#97431, @kishorj) [SIG Cloud Provider]
- Add --permit-address-sharing flag to kube-apiserver to listen with SO_REUSEADDR. While allowing listening on wildcard IPs like 0.0.0.0 and specific IPs in parallel, it avoids waiting for the kernel to release sockets in TIME_WAIT state, and hence considerably reduces kube-apiserver restart times under certain conditions. (#93861, @sttts) [SIG API Machinery]
- Add csi_operations_seconds metric on kubelet that exposes CSI operations duration and status for node CSI operations. (#98979, @Jiawei0227) [SIG Instrumentation and Storage]
- Add migrated field into storage_operation_duration_seconds metric (#99050, @Jiawei0227) [SIG Apps, Instrumentation and Storage]
- Add bash-completion for comma separated list on kubectl get (#98301, @phil9909) [SIG CLI]
- Added support for installing arm64 node artifacts. (#99242, @liu-cong) [SIG Cloud Provider]
- Feature gate RootCAConfigMap is graduated to GA in 1.21 and will be removed in 1.22. (#98033, @zshihang) [SIG API Machinery and Auth]
- Kubeadm: during "init" and "join" perform preflight validation on the host / node name and throw warnings if a name is not compliant (#99194, @pacoxu) [SIG Cluster Lifecycle]
- Kubectl: kubectl get will omit managed fields by default now. Users can set --show-managed-fields to true to show managedFields when the output format is either json or yaml. (#96878, @knight42) [SIG CLI and Testing]
- Metrics can now be disabled explicitly via a command line flag (i.e. '--disabled-metrics=bad_metric1,bad_metric2') (#99217, @logicalhan) [SIG API Machinery, Cluster Lifecycle and Instrumentation]
- TTLAfterFinished is now beta and enabled by default (#98678, @ahg-g) [SIG Apps and Auth]
- The RunAsGroup feature has been promoted to GA in this release. (#94641, @krmayankk) [SIG Auth and Node]
- Turn CronJobControllerV2 on by default. (#98878, @soltysh) [SIG Apps]
- UDP protocol support for Agnhost connect subcommand (#98639, @knabben) [SIG Testing]
- Upgrades IPv6Dualstack to Beta and turns it on by default. New and existing clusters will not be affected until users start adding secondary Pod and Service CIDR CLI flags as described here: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/563-dual-stack (#98969, @khenidak) [SIG API Machinery, Apps, Cloud Provider, Network and Node]
Documentation
- Fix ALPHA stability level reference link (#98641, @Jeffwan) [SIG Auth, Cloud Provider, Instrumentation and Storage]
Failing Test
- Escape special characters such as [, ], and spaces that can exist in vSphere Windows paths (#98830, @liyanhui1228) [SIG Storage and Windows]
- Kube-proxy: fix a bug on UDP NodePort Services where stale conntrack entries may blackhole the traffic directed to the NodePort. (#98305, @aojea) [SIG Network]
Bug or Regression
- Add missing --kube-api-content-type in kubemark hollow template (#98911, @Jeffwan) [SIG Scalability and Testing]
- Avoid duplicate error messages when running kubectl edit quota (#98201, @pacoxu) [SIG API Machinery and Apps]
- Cleanup subnet in frontend IP configs to prevent huge subnet request bodies in some scenarios. (#98133, @nilo19) [SIG Cloud Provider]
- Fix errors when accessing Windows container stats for Dockershim (#98510, @jsturtevant) [SIG Node and Windows]
- Fixes spurious errors about IPv6 in kube-proxy logs on nodes with IPv6 disabled. (#99127, @danwinship) [SIG Network and Node]
- Fixed a bug in identifying the containerd process in the method that ensures docker and containerd are in the correct containers with the proper OOM score set up. (#97888, @pacoxu) [SIG Node]
- Kubelet now cleans up orphaned volume directories automatically (#95301, @lorenz) [SIG Node and Storage]
- When dynamically provisioning Azure File volumes for a premium account, the requested size will be set to 100GB if the request is initially lower than this value to accommodate Azure File requirements. (#99122, @huffmanca) [SIG Cloud Provider and Storage]
Other (Cleanup or Flake)
- APIs for kubelet annotations and labels from k8s.io/kubernetes/pkg/kubelet/apis are now available under k8s.io/kubelet/pkg/apis/ (#98931, @michaelbeaumont) [SIG Apps, Auth and Node]
- Migrate pkg/kubelet/(pod, pleg) to structured logging (#98990, @gjkim42) [SIG Instrumentation and Node]
- Migrate pkg/kubelet/nodestatus to structured logging (#99001, @QiWang19) [SIG Node]
- Migrate pkg/kubelet/server logs to structured logging (#98643, @chenyw1990) [SIG Node]
- Migrate proxy/winkernel/proxier.go logs to structured logging (#98001, @JornShen) [SIG Network and Windows]
- Migrate scheduling_queue.go to structured logging (#98358, @tanjing2020) [SIG Scheduling]
- Several flags related to the deprecated dockershim which are present in the kubelet command line are now deprecated. (#98730, @dims) [SIG Node]
- The deprecated feature gates CSIDriverRegistry, BlockVolume and CSIBlockVolume are now unconditionally enabled and can no longer be specified in component invocations. (#98021, @gavinfish) [SIG Storage]
Dependencies
Added
Nothing has changed.
Changed
- sigs.k8s.io/structured-merge-diff/v4: v4.0.2 → v4.0.3
Removed
Nothing has changed.
v1.21.0-alpha.3
Downloads for v1.21.0-alpha.3
Source Code
filename |
sha512 hash |
kubernetes.tar.gz |
704ec916a1dbd134c54184d2652671f80ae09274f9d23dbbed312944ebeccbc173e2e6b6949b38bdbbfdaf8aa032844deead5efeda1b3150f9751386d9184bc8 |
kubernetes-src.tar.gz |
57db9e7560cfc9c10e7059cb5faf9c4bd5eb8f9b7964f44f000a417021cf80873184b774e7c66c80d4aba84c14080c6bc335618db3d2e5f276436ae065e25408 |
Client binaries
Server binaries
Node binaries
Changelog since v1.21.0-alpha.2
Urgent Upgrade Notes
(No, really, you MUST read this before you upgrade)
- Newly provisioned PVs by gce-pd will no longer have the beta FailureDomain label. gce-pd volume plugin will start to have GA topology label instead. (#98700, @Jiawei0227) [SIG Cloud Provider, Storage and Testing]
- Remove alpha CSIMigrationXXComplete flag and add alpha InTreePluginXXUnregister flag. Deprecate CSIMigrationvSphereComplete flag and it will be removed in 1.22. (#98243, @Jiawei0227) [SIG Node and Storage]
Changes by Kind
API Change
- Adds support for portRange / endPort in NetworkPolicy (see the sketch after this list) (#97058, @rikatz) [SIG Apps and Network]
- Fixes using server-side apply with APIService resources (#98576, @kevindelgado) [SIG API Machinery, Apps and Testing]
- Kubernetes is now built using go1.15.7 (#98363, @cpanato) [SIG Cloud Provider, Instrumentation, Node, Release and Testing]
- Scheduler extender filter interface now can report unresolvable failed nodes in the new field FailedAndUnresolvableNodes of the ExtenderFilterResult struct. Nodes in this map will be skipped in the preemption phase. (#92866, @cofyc) [SIG Scheduling]
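A NetworkPolicy sketch for the endPort item above (name, selector, CIDR, and ports are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-port-range-demo   # hypothetical example name
spec:
  podSelector:
    matchLabels:
      app: demo
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.0.0/24       # illustrative destination
    ports:
    - protocol: TCP
      port: 32000
      endPort: 32768            # together with port, allows the whole 32000-32768 range
```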
Feature
- A lease can only attach up to 10k objects. (#98257, @lingsamuel) [SIG API Machinery]
- Add the ignore-errors flag for drain, supporting non-breaking drains of a group of nodes (#98203, @yuzhiquan) [SIG CLI]
- Base-images: Update to debian-iptables:buster-v1.4.0
- Uses iptables 1.8.5
- base-images: Update to debian-base:buster-v1.3.0
- cluster/images/etcd: Build etcd:3.4.13-2 image
- Export NewDebuggingRoundTripper function and DebugLevel options in the k8s.io/client-go/transport package. (#98324, @atosatto) [SIG API Machinery]
- Kubectl wait ensures that observedGeneration >= generation if applicable (#97408, @KnicKnic) [SIG CLI]
- Kubernetes is now built using go1.15.8 (#98834, @cpanato) [SIG Cloud Provider, Instrumentation, Release and Testing]
- New admission controller "denyserviceexternalips" is available. Clusters which do not need the Service "externalIPs" feature should enable this controller and be more secure. (#97395, @thockin) [SIG API Machinery]
- Overall, enabling the PreferNominatedNode feature will improve the performance of scheduling where preemption might frequently happen; however, with PreferNominatedNode enabled, the pod might not be scheduled to the best candidate node in the cluster. (#93179, @chendave) [SIG Scheduling and Testing]
- Pause image upgraded to 3.4.1 in kubelet and kubeadm for both Linux and Windows. (#98205, @pacoxu) [SIG CLI, Cloud Provider, Cluster Lifecycle, Node, Testing and Windows]
- The ServiceAccountIssuerDiscovery feature has graduated to GA, and is unconditionally enabled. The ServiceAccountIssuerDiscovery feature-gate will be removed in 1.22. (#98553, @mtaufen) [SIG API Machinery, Auth and Testing]
Documentation
- Feat: Azure file migration goes beta in 1.21. Feature gates CSIMigration to Beta (on by default) and CSIMigrationAzureFile to Beta (off by default since it requires installation of the AzureFile CSI Driver).
The in-tree AzureFile plugin "kubernetes.io/azure-file" is now deprecated and will be removed in 1.23. Users should enable CSIMigration + CSIMigrationAzureFile features and install the AzureFile CSI Driver (https://github.com/kubernetes-sigs/azurefile-csi-driver) to avoid disruption to existing Pod and PVC objects at that time.
Users should start using the AzureFile CSI Driver directly for any new volumes. (#96293, @andyzhangx) [SIG Cloud Provider]
Failing Test
- Kubelet: the HostPort implementation in dockershim was not taking into consideration the HostIP field, causing the same HostPort to be unusable with different IP addresses.
This bug causes the conformance test "HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol" to fail. (#98755, @aojea) [SIG Cloud Provider, Network and Node]
Bug or Regression
- Fix NPE in ephemeral storage eviction (#98261, @wzshiming) [SIG Node]
- Fixed a bug where, on Kubernetes nodes whose INPUT chain policy in the filter table is not ACCEPT, the healthcheck nodeport would not work. Added iptables rules to allow healthcheck nodeport traffic. (#97824, @hanlins) [SIG Network]
- Fixed kube-proxy container image architecture for non amd64 images. (#98526, @saschagrunert) [SIG API Machinery, Release and Testing]
- Fixed provisioning of Cinder volumes migrated to CSI when StorageClass with AllowedTopologies was used. (#98311, @jsafrane) [SIG Storage]
- Fixes a panic in the disruption budget controller for PDB objects with invalid selectors (#98750, @mortent) [SIG Apps]
- Fixes connection errors when using --volume-host-cidr-denylist or --volume-host-allow-local-loopback (#98436, @liggitt) [SIG Network and Storage]
- If the user specifies an invalid timeout in the request URL, the request will be aborted with an HTTP 400.
- In cases where the client specifies a timeout in the request URL, the overall request deadline is now shortened, since the deadline is set up as soon as the request is received by the apiserver. (#96901, @tkashem) [SIG API Machinery and Testing]
- Kubeadm: Some text in the kubeadm upgrade plan output has changed. If you have scripts or other automation that parses this output, please review these changes and update your scripts to account for the new output. (#98728, @stmcginnis) [SIG Cluster Lifecycle]
- Kubeadm: fix a bug where external credentials in an existing admin.conf prevented the CA certificate from being written to the cluster-info ConfigMap. (#98882, @kvaps) [SIG Cluster Lifecycle]
- Kubeadm: fix bad token placeholder text in "config print *-defaults --help" (#98839, @Mattias-) [SIG Cluster Lifecycle]
- Kubeadm: get k8s CI version markers from k8s infra bucket (#98836, @hasheddan) [SIG Cluster Lifecycle and Release]
- Mitigate CVE-2020-8555 for kube-up using GCE by preventing local loopback volume hosts. (#97934, @mattcary) [SIG Cloud Provider and Storage]
- Remove CSI topology from migrated in-tree gcepd volume. (#97823, @Jiawei0227) [SIG Cloud Provider and Storage]
- Sync node status during kubelet node shutdown. Adds a pod admission handler that rejects new pods when the node is in the process of shutting down. (#98005, @wzshiming) [SIG Node]
- Truncates a message if it hits the NoteLengthLimit when the scheduler records an event for the pod that indicates the pod has failed to schedule. (#98715, @carlory) [SIG Scheduling]
- We will no longer automatically delete all data when a failure is detected during creation of the volume data file on a CSI volume. Now we will only remove the data file and volume path. (#96021, @huffmanca) [SIG Storage]
Other (Cleanup or Flake)
- Fix the description of command line flags that can override --config (#98254, @changshuchao) [SIG Scheduling]
- Migrate scheduler/taint_manager.go structured logging (#98259, @tanjing2020) [SIG Apps]
- Migrate staging/src/k8s.io/apiserver/pkg/admission logs to structured logging (#98138, @lala123912) [SIG API Machinery]
- Resolves flakes in the Ingress conformance tests due to conflicts with controllers updating the Ingress object (#98430, @liggitt) [SIG Network and Testing]
- The default delegating authorization options now allow unauthenticated access to healthz, readyz, and livez. A system:masters user connecting to an authz delegator will not perform an authz check. (#98325, @deads2k) [SIG API Machinery, Auth, Cloud Provider and Scheduling]
- The e2e suite can be instructed not to wait for pods in kube-system to be ready or for all nodes to be ready by passing --allowed-not-ready-nodes=-1 when invoking the e2e.test program. This allows callers to run subsets of the e2e suite in scenarios other than perfectly healthy clusters. (#98781, @smarterclayton) [SIG Testing]
- The feature gates WindowsGMSA and WindowsRunAsUserName that are GA since v1.18 are now removed. (#96531, @ialidzhikov) [SIG Node and Windows]
- The new -gce-zones flag on the e2e.test binary instructs tests that check for information about how the cluster interacts with the cloud to limit their queries to the provided zone list. If not specified, the current behavior of asking the cloud provider for all available zones in multi zone clusters is preserved. (#98787, @smarterclayton) [SIG API Machinery, Cluster Lifecycle and Testing]
Dependencies
Added
- github.com/moby/spdystream: v0.2.0
Changed
- github.com/NYTimes/gziphandler: 56545f4 → v1.1.1
- github.com/container-storage-interface/spec: v1.2.0 → v1.3.0
- github.com/go-logr/logr: v0.2.0 → v0.4.0
- github.com/gogo/protobuf: v1.3.1 → v1.3.2
- github.com/kisielk/errcheck: v1.2.0 → v1.5.0
- github.com/yuin/goldmark: v1.1.27 → v1.2.1
- golang.org/x/sync: cd5d95a → 67f06af
- golang.org/x/tools: c1934b7 → 113979e
- k8s.io/klog/v2: v2.4.0 → v2.5.0
- sigs.k8s.io/apiserver-network-proxy/konnectivity-client: v0.0.14 → v0.0.15
Removed
- github.com/docker/spdystream: 449fdfc
v1.21.0-alpha.2
Downloads for v1.21.0-alpha.2
Source Code
filename |
sha512 hash |
kubernetes.tar.gz |
6836f6c8514253fe0831fd171fc4ed92eb6d9a773491c8dc82b90d171a1b10076bd6bfaea56ec1e199c5f46c273265bdb9f174f0b2d99c5af1de4c99b862329e |
kubernetes-src.tar.gz |
d137694804741a05ab09e5f9a418448b66aba0146c028eafce61bcd9d7c276521e345ce9223ffbc703e8172041d58dfc56a3242a4df3686f24905a4541fcd306 |
Client binaries
Server binaries
Node binaries
Changelog since v1.21.0-alpha.1
Urgent Upgrade Notes
(No, really, you MUST read this before you upgrade)
- Remove storage metric storage_operation_errors_total, since we already have storage_operation_status_count. Also add a new field status for storage_operation_duration_seconds, so that the latency of all storage operation statuses can be observed. (#98332, @JornShen) [SIG Instrumentation and Storage]
Changes by Kind
Deprecation
- Remove the TokenRequest and TokenRequestProjection feature gates (#97148, @wawa0210) [SIG Node]
- Removing experimental windows container hyper-v support with Docker (#97141, @wawa0210) [SIG Node and Windows]
- The export query parameter (inconsistently supported by API resources and deprecated in v1.14) is fully removed. Requests setting this query parameter will now receive a 400 status response. (#98312, @deads2k) [SIG API Machinery, Auth and Testing]
API Change
- Enable SPDY pings to keep connections alive, so that kubectl exec and kubectl port-forward won't be interrupted. (#97083, @knight42) [SIG API Machinery and CLI]
Documentation
- Official support to build kubernetes with docker-machine / remote docker is removed. This change does not affect building kubernetes with docker locally. (#97935, @adeniyistephen) [SIG Release and Testing]
- Set the kubelet option --volume-stats-agg-period to a negative value to disable volume calculations. (#96675, @pacoxu) [SIG Node]
Bug or Regression
- Clean ReplicaSet by revision instead of creation timestamp in deployment controller (#97407, @waynepeking348) [SIG Apps]
- Ensure that client-go's EventBroadcaster is safe (non-racy) during shutdown. (#95664, @DirectXMan12) [SIG API Machinery]
- Fix azure file migration issue (#97877, @andyzhangx) [SIG Auth, Cloud Provider and Storage]
- Fix kubelet from panic after getting the wrong signal (#98200, @wzshiming) [SIG Node]
- Fix repeatedly acquire the inhibit lock (#98088, @wzshiming) [SIG Node]
- Fixed a bug where the kubelet could not start on Btrfs. (#98042, @gjkim42) [SIG Node]
- Fixed an issue with garbage collection failing to clean up namespaced children of an object also referenced incorrectly by cluster-scoped children (#98068, @liggitt) [SIG API Machinery and Apps]
- Fixed the namespace having no effect when exposing a deployment with --dry-run=client. (#97492, @masap) [SIG CLI]
- Fixing a bug where a failed node may not have the NoExecute taint set correctly (#96876, @howieyuen) [SIG Apps and Node]
- Indentation of the Resource Quota block in kubectl describe namespaces output is now correct. (#97946, @dty1er) [SIG CLI]
- KUBECTL_EXTERNAL_DIFF now accepts equal sign for additional parameters. (#98158, @dougsland) [SIG CLI]
- Kubeadm: fix a bug where "kubeadm join" would not properly handle missing names for existing etcd members. (#97372, @ihgann) [SIG Cluster Lifecycle]
- Kubelet should ignore cgroup driver check on Windows node. (#97764, @pacoxu) [SIG Node and Windows]
- Make podTopologyHints protected by lock (#95111, @choury) [SIG Node]
- Readjust kubelet_containers_per_pod_count bucket (#98169, @wawa0210) [SIG Instrumentation and Node]
- Scores from InterPodAffinity have stronger differentiation. (#98096, @leileiwan) [SIG Scheduling]
- Specifying the KUBE_TEST_REPO environment variable when e2e tests are executed will instruct the test infrastructure to load that image from a location within the specified repo, using a predefined pattern. (#93510, @smarterclayton) [SIG Testing]
- Static pods will be deleted gracefully. (#98103, @gjkim42) [SIG Node]
- Use network.Interface.VirtualMachine.ID to get the bound VM. Skip standalone VMs when reconciling the LoadBalancer. (#97635, @nilo19) [SIG Cloud Provider]
Other (Cleanup or Flake)
- Kubeadm: change the default image repository for CI images from 'gcr.io/kubernetes-ci-images' to 'gcr.io/k8s-staging-ci-images' (#97087, @SataQiu) [SIG Cluster Lifecycle]
- Migrate generic_scheduler.go and types.go to structured logging. (#98134, @tanjing2020) [SIG Scheduling]
- Migrate proxy/winuserspace/proxier.go logs to structured logging (#97941, @JornShen) [SIG Network]
- Migrate staging/src/k8s.io/apiserver/pkg/audit/policy/reader.go logs to structured logging. (#98252, @lala123912) [SIG API Machinery and Auth]
- Migrate staging\src\k8s.io\apiserver\pkg\endpoints logs to structured logging (#98093, @lala123912) [SIG API Machinery]
- Node (#96552, @pandaamanda) [SIG Apps, Cloud Provider, Node and Scheduling]
- The kubectl alpha debug command was scheduled to be removed in v1.21. (#98111, @pandaamanda) [SIG CLI]
- Update cri-tools to v1.20.0 (#97967, @rajibmitra) [SIG Cloud Provider]
- Windows nodes on GCE will take longer to start due to dependencies installed at node creation time. (#98284, @pjh) [SIG Cloud Provider]
Dependencies
Added
Nothing has changed.
Changed
Removed
Nothing has changed.
v1.21.0-alpha.1
Downloads for v1.21.0-alpha.1
Source Code
filename |
sha512 hash |
kubernetes.tar.gz |
b2bacd5c3fc9f829e6269b7d2006b0c6e464ff848bb0a2a8f2fe52ad2d7c4438f099bd8be847d8d49ac6e4087f4d74d5c3a967acd798e0b0cb4d7a2bdb122997 |
kubernetes-src.tar.gz |
518ac5acbcf23902fb1b902b69dbf3e86deca5d8a9b5f57488a15f185176d5a109558f3e4df062366af874eca1bcd61751ee8098b0beb9bcdc025d9a1c9be693 |
Client binaries
Server binaries
Node binaries
Changelog since v1.20.0
Urgent Upgrade Notes
(No, really, you MUST read this before you upgrade)
- Kube-proxy's IPVS proxy mode no longer sets the net.ipv4.conf.all.route_localnet sysctl parameter. Nodes upgrading will have net.ipv4.conf.all.route_localnet set to 1 but new nodes will inherit the system default (usually 0). If you relied on any behavior requiring net.ipv4.conf.all.route_localnet, you must ensure it is enabled as kube-proxy will no longer set it automatically. This change helps to further mitigate CVE-2020-8558. (#92938, @lbernail) [SIG Network and Release]
Changes by Kind
Deprecation
- Deprecate the topologyKeys field in Service. This capability will be replaced with upcoming work around Topology Aware Subsetting and Service Internal Traffic Policy. (#96736, @andrewsykim) [SIG Apps]
- Kubeadm: deprecated command "alpha selfhosting pivot" is removed now. (#97627, @knight42) [SIG Cluster Lifecycle]
- Kubeadm: graduate the command kubeadm alpha kubeconfig user to kubeadm kubeconfig user. The kubeadm alpha kubeconfig user command is deprecated now. (#97583, @knight42) [SIG Cluster Lifecycle]
- Kubeadm: the "kubeadm alpha certs" command is removed now, please use "kubeadm certs" instead. (#97706, @knight42) [SIG Cluster Lifecycle]
- Remove the deprecated metrics "scheduling_algorithm_preemption_evaluation_seconds" and "binding_duration_seconds"; use "scheduler_framework_extension_point_duration_seconds" instead. (#96447, @chendave) [SIG Cluster Lifecycle, Instrumentation, Scheduling and Testing]
- The PodSecurityPolicy API is deprecated in 1.21, and will no longer be served starting in 1.25. (#97171, @deads2k) [SIG Auth and CLI]
API Change
- Change the APIVersion proto name of BoundObjectRef from aPIVersion to apiVersion. (#97379, @kebe7jun) [SIG Auth]
- Promote the Immutable Secrets/ConfigMaps feature to Stable. This allows setting the Immutable field in a Secret or ConfigMap object to mark their contents as immutable. (#97615, @wojtek-t) [SIG Apps, Architecture, Node and Testing]
Feature
- Add the flag --lease-max-object-size and the metric etcd_lease_object_counts for kube-apiserver to configure and observe the maximum number of objects attached to a single etcd lease. (#97480, @lingsamuel) [SIG API Machinery, Instrumentation and Scalability]
- Add the flag --lease-reuse-duration-seconds for kube-apiserver to configure the etcd lease reuse duration. (#97009, @lingsamuel) [SIG API Machinery and Scalability]
- Adds the ability to pass --strict-transport-security-directives to the kube-apiserver to set the HSTS header appropriately. Be sure you understand the consequences to browsers before setting this field. (#96502, @249043822) [SIG Auth]
- Kubeadm now includes CoreDNS v1.8.0. (#96429, @rajansandeep) [SIG Cluster Lifecycle]
- Kubeadm: add support for certificate chain validation. When using kubeadm in external CA mode, this allows an intermediate CA to be used to sign the certificates. The intermediate CA certificate must be appended to each signed certificate for this to work correctly. (#97266, @robbiemcmichael) [SIG Cluster Lifecycle]
- Kubeadm: amend the node kernel validation to treat CGROUP_PIDS, FAIR_GROUP_SCHED as required and CFS_BANDWIDTH, CGROUP_HUGETLB as optional (#96378, @neolit123) [SIG Cluster Lifecycle and Node]
- The Kubernetes pause image manifest list now contains an image for Windows Server 20H2. (#97322, @claudiubelu) [SIG Windows]
- The apimachinery util/net function used to detect the bind address, ResolveBindAddress(), now takes into consideration global IP addresses on loopback interfaces when the host has default routes and there are no global IPs on those interfaces, in order to support more complex network scenarios like BGP Unnumbered RFC 5549. (#95790, @aojea) [SIG Network]
Bug or Regression
- Changelog
General
- Fix priority expander falling back to a random choice even though there is a higher priority option to choose
- Clone kubernetes/kubernetes in update-vendor.sh shallowly, instead of fetching all revisions
- Speed up binpacking by reducing the number of PreFilter calls (call once per pod instead of #pods*#nodes times)
- Speed up finding unneeded nodes by 5x+ in very large clusters by reducing the number of PreFilter calls
- Expose --max-nodes-total as a metric
- Errors in IncreaseSize changed from type apiError to cloudProviderError
- Make build-in-docker and test-in-docker work on Linux systems with SELinux enabled
- Fix an error where existing nodes were not considered as destinations while finding place for pods in scale-down simulations
- Remove redundant log lines and reduce severity around parsing kubeEnv
- Don't treat nodes created by virtual kubelet as nodes from non-autoscaled node groups
- Remove redundant logging around calculating node utilization
- Add configurable --network and --rm flags for docker in Makefile
- Subtract DaemonSet pods' requests from node allocatable in the denominator while computing node utilization
- Include taints by condition when determining if a node is unready/still starting
- Fix update-vendor.sh to work on OSX and zsh
- Add best-effort eviction for DaemonSet pods while scaling down non-empty nodes
- Add build support for ARM64
AliCloud
- Add missing daemonsets and replicasets to ALI example cluster role
Apache CloudStack
- Add support for Apache CloudStack
AWS
- Regenerate list of EC2 instances
- Fix pricing endpoint in AWS China Region
Azure
- Add optional jitter on initial VMSS VM cache refresh, keep the refreshes spread over time
- Serve from cache for the whole period of ongoing throttling
- Fix unwanted VMSS VMs cache invalidations
- Enforce setting the number of retries if cloud provider backoff is enabled
- Don't update capacity if VMSS provisioning state is updating
- Support allocatable resources overrides via VMSS tags
- Add missing stable labels in template nodes
- Proactively set instance status to deleting on node deletions
Cluster API
- Migrate interaction with the API from using internal types to using Unstructured
- Improve tests to work better with constrained resources
- Add support for node autodiscovery
- Add support for --cloud-config
- Update group identifier to use for Cluster API annotations
Exoscale
GCE
- Decrease the number of GCE Read Requests made while deleting nodes
- Base pricing of custom instances on their instance family type
- Add pricing information for missing machine types
- Add pricing information for different GPU types
- Ignore the new topology.gke.io/zone label when comparing groups
- Add missing stable labels to template nodes
HuaweiCloud
- Add auto scaling group support
- Implement node group by AS
- Implement getting desired instance number of node group
- Implement increasing node group size
- Implement TemplateNodeInfo
- Implement caching instances
IONOS
Kubemark
- Skip non-kubemark nodes while computing node infos for node groups.
Magnum
- Add Magnum support in the Cluster Autoscaler helm chart
Packet
- Allow empty nodepools
- Add support for multiple nodepools
- Add pricing support
Image
Image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.20.0
(#97011, @towca) [SIG Cloud Provider]
- AcceleratorStats will be available in the Summary API of kubelet when cri_stats_provider is used. (#96873, @ruiwen-zhao) [SIG Node]
- Add limited lines to log when having tail option (#93920, @zhouya0) [SIG Node]
- Avoid systemd-logind loading configuration warning (#97950, @wzshiming) [SIG Node]
- Cloud-controller-manager: routes controller should not depend on --allocate-node-cidrs (#97029, @andrewsykim) [SIG Cloud Provider and Testing]
- Copy annotations with empty value when deployment rolls back (#94858, @waynepeking348) [SIG Apps]
- Detach volumes from vSphere nodes not tracked by attach-detach controller (#96689, @gnufied) [SIG Cloud Provider and Storage]
- Fix kubectl label error when local=true is set. (#97440, @pandaamanda) [SIG CLI]
- Fix Azure file share not deleted issue when the namespace is deleted (#97417, @andyzhangx) [SIG Cloud Provider and Storage]
- Fix CVE-2020-8555 for Gluster client connections. (#97922, @liggitt) [SIG Storage]
- Fix counting error in service/nodeport/loadbalancer quota check (#97451, @pacoxu) [SIG API Machinery, Network and Testing]
- Fix kubectl-convert import known versions (#97754, @wzshiming) [SIG CLI and Testing]
- Fix missing cadvisor machine metrics. (#97006, @lingsamuel) [SIG Node]
- Fix nil VMSS name when setting service to auto mode (#97366, @nilo19) [SIG Cloud Provider]
- Fix the panic when kubelet registers if a node object already exists with no Status.Capacity or Status.Allocatable (#95269, @SataQiu) [SIG Node]
- Fix the regression with slow pod termination. Before this fix, pods could take additional time to terminate - up to one minute. This reverses the change that ensured CNI resources are cleaned up when the pod is removed from the API server. (#97980, @SergeyKanzhelev) [SIG Node]
- Fix to recover CSI volumes from certain dangling attachments (#96617, @yuga711) [SIG Apps and Storage]
- Fix: azure file latency issue for metadata-heavy workloads (#97082, @andyzhangx) [SIG Cloud Provider and Storage]
- Fixed Cinder volume IDs on OpenStack Train (#96673, @jsafrane) [SIG Cloud Provider]
- Fixed FibreChannel volume plugin corrupting filesystems on detach of multipath volumes. (#97013, @jsafrane) [SIG Storage]
- Fixed a bug in kubelet that would saturate CPU utilization after containerd is restarted. (#97174, @hanlins) [SIG Node]
- Fixed bug in CPUManager with race on container map access (#97427, @klueska) [SIG Node]
- Fixed cleanup of block devices when /var/lib/kubelet is a symlink. (#96889, @jsafrane) [SIG Storage]
- GCE Internal LoadBalancer sync loop will now release the ILB IP address upon sync failure. An error in ILB forwarding rule creation will no longer leak IP addresses. (#97740, @prameshj) [SIG Cloud Provider and Network]
- Ignore update pod with no new images in alwaysPullImages admission controller (#96668, @pacoxu) [SIG Apps, Auth and Node]
- Kubeadm now installs version 3.4.13 of etcd when creating a cluster with v1.19 (#97244, @pacoxu) [SIG Cluster Lifecycle]
- Kubeadm: avoid detection of the container runtime for commands that do not need it (#97625, @pacoxu) [SIG Cluster Lifecycle]
- Kubeadm: fix a bug in the host memory detection code on 32bit Linux platforms (#97403, @abelbarrera15) [SIG Cluster Lifecycle]
- Kubeadm: fix a bug where "kubeadm upgrade" commands can fail if CoreDNS v1.8.0 is installed. (#97919, @neolit123) [SIG Cluster Lifecycle]
- Performance regression #97685 has been fixed. (#97860, @MikeSpreitzer) [SIG API Machinery]
- Remove the deprecated --cleanup-ipvs flag of kube-proxy, and make the --cleanup flag always flush IPVS (#97336, @maaoBit) [SIG Network]
- The current version of the container image publicly exposed an IP serving a /metrics endpoint to the Internet. The new version of the container image serves the /metrics endpoint on a different port. (#97621, @vbannai) [SIG Cloud Provider]
- Use force unmount for NFS volumes if regular mount fails after 1 minute timeout (#96844, @gnufied) [SIG Storage]
- Users will see an increase in time for deletion of pods, and a guarantee that removal of a pod from the API server means deletion of all its resources from the container runtime. (#92817, @kmala) [SIG Node]
- Using exec auth plugins with kubectl no longer results in warnings about constructing many client instances from the same exec auth config. (#97857, @liggitt) [SIG API Machinery and Auth]
- Warning about using a deprecated volume plugin is logged only once. (#96751, @jsafrane) [SIG Storage]
Other (Cleanup or Flake)
- Bump github.com/Azure/go-autorest/autorest to v0.11.12 (#97033, @patrickshan) [SIG API Machinery, CLI, Cloud Provider and Cluster Lifecycle]
- Delete deprecated mixed protocol annotation (#97096, @nilo19) [SIG Cloud Provider]
- Kube-proxy: Traffic from the cluster directed to ExternalIPs is always sent directly to the Service. (#96296, @aojea) [SIG Network and Testing]
- Kubeadm: fix a whitespace issue in the output of the "kubeadm join" command shown as the output of "kubeadm init" and "kubeadm token create --print-join-command" (#97413, @SataQiu) [SIG Cluster Lifecycle]
- Kubeadm: improve the error messaging when the user provides an invalid discovery token CA certificate hash. (#97290, @neolit123) [SIG Cluster Lifecycle]
- Migrate log messages in pkg/scheduler/{scheduler.go,factory.go} to structured logging (#97509, @aldudko) [SIG Scheduling]
- Migrate proxy/iptables/proxier.go logs to structured logging (#97678, @JornShen) [SIG Network]
- Migrate some scheduler log messages to structured logging (#97349, @aldudko) [SIG Scheduling]
- NONE (#97167, @geegeea) [SIG Node]
- NetworkPolicy validation framework optimizations for rapidly verifying CNI's work correctly across several pods and namespaces (#91592, @jayunit100) [SIG Network, Storage and Testing]
- Official support to build kubernetes with docker-machine / remote docker is removed. This change does not affect building kubernetes with docker locally. (#97618, @jherrera123) [SIG Release and Testing]
- Scheduler plugin validation now provides all errors detected instead of the first one. (#96745, @lingsamuel) [SIG Node, Scheduling and Testing]
- Storage related e2e testsuite redesign & cleanup (#96573, @Jiawei0227) [SIG Storage and Testing]
- The OIDC authenticator no longer waits 10 seconds before attempting to fetch the metadata required to verify tokens. (#97693, @enj) [SIG API Machinery and Auth]
- The AttachVolumeLimit feature gate that is GA since v1.17 is now removed. (#96539, @ialidzhikov) [SIG Storage]
- The CSINodeInfo feature gate that is GA since v1.17 is unconditionally enabled, and can no longer be specified via the --feature-gates argument. (#96561, @ialidzhikov) [SIG Apps, Auth, Scheduling, Storage and Testing]
- The deprecated feature gates RotateKubeletClientCertificate, AttachVolumeLimit, VolumePVCDataSource and EvenPodsSpread are now unconditionally enabled and can no longer be specified in component invocations. (#97306, @gavinfish) [SIG Node, Scheduling and Storage]
- The ServiceNodeExclusion, NodeDisruptionExclusion and LegacyNodeRoleBehavior (locked to false) features have been promoted to GA.
To prevent control plane nodes from being added to load balancers automatically, upgrading users need to add the "node.kubernetes.io/exclude-from-external-load-balancers" label to control plane nodes (see the example below). (#97543, @pacoxu) [SIG API Machinery, Apps, Cloud Provider and Network]
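For clusters being upgraded, a minimal sketch of adding that label with kubectl (the node name is a placeholder):
# Repeat for each control plane node
kubectl label node <control-plane-node-name> node.kubernetes.io/exclude-from-external-load-balancers=""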
Uncategorized
- Adding Brazilian Portuguese translation for kubectl (#61595, @cpanato) [SIG CLI]
Dependencies
Added
Nothing has changed.
Changed
Removed
- rsc.io/quote/v3: v3.1.0
- rsc.io/sampler: v1.3.0
1.2 - Kubernetes version and version skew support policy
This document describes the maximum version skew supported between various Kubernetes components.
Specific cluster deployment tools may place additional restrictions on version skew.
Supported versions
Kubernetes versions are expressed as x.y.z,
where x is the major version, y is the minor version, and z is the patch version, following Semantic Versioning terminology.
For more information, see Kubernetes Release Versioning.
The Kubernetes project maintains release branches for the most recent three minor releases (1.22, 1.21, 1.20). Kubernetes 1.19 and newer receive approximately 1 year of patch support. Kubernetes 1.18 and older received approximately 9 months of patch support.
Applicable fixes, including security fixes, may be backported to those three release branches, depending on severity and feasibility.
Patch releases are cut from those branches at a regular cadence, plus additional urgent releases, when required.
The Release Managers group owns this decision.
For more information, see the Kubernetes patch releases page.
Supported version skew
kube-apiserver
In highly-available (HA) clusters, the newest and oldest kube-apiserver instances must be within one minor version.
Example:
- newest kube-apiserver is at 1.22
- other kube-apiserver instances are supported at 1.22 and 1.21
kubelet
kubelet must not be newer than kube-apiserver, and may be up to two minor versions older.
Example:
- kube-apiserver is at 1.22
- kubelet is supported at 1.22, 1.21, and 1.20
Note: If version skew exists between kube-apiserver instances in an HA cluster, this narrows the allowed kubelet versions.
Example:
- kube-apiserver instances are at 1.22 and 1.21
- kubelet is supported at 1.21 and 1.20 (1.22 is not supported because that would be newer than the kube-apiserver instance at version 1.21)
kube-controller-manager, kube-scheduler, and cloud-controller-manager
kube-controller-manager, kube-scheduler, and cloud-controller-manager must not be newer than the kube-apiserver instances they communicate with. They are expected to match the kube-apiserver minor version, but may be up to one minor version older (to allow live upgrades).
Example:
- kube-apiserver is at 1.22
- kube-controller-manager, kube-scheduler, and cloud-controller-manager are supported at 1.22 and 1.21
Note: If version skew exists between kube-apiserver instances in an HA cluster, and these components can communicate with any kube-apiserver instance in the cluster (for example, via a load balancer), this narrows the allowed versions of these components.
Example:
- kube-apiserver instances are at 1.22 and 1.21
- kube-controller-manager, kube-scheduler, and cloud-controller-manager communicate with a load balancer that can route to any kube-apiserver instance
- kube-controller-manager, kube-scheduler, and cloud-controller-manager are supported at 1.21 (1.22 is not supported because that would be newer than the kube-apiserver instance at version 1.21)
kubectl
kubectl is supported within one minor version (older or newer) of kube-apiserver.
Example:
- kube-apiserver is at 1.22
- kubectl is supported at 1.23, 1.22, and 1.21
Note: If version skew exists between kube-apiserver instances in an HA cluster, this narrows the supported kubectl versions.
Example:
- kube-apiserver instances are at 1.22 and 1.21
- kubectl is supported at 1.22 and 1.21 (other versions would be more than one minor version skewed from one of the kube-apiserver components)
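To see where an existing cluster sits relative to these skew rules, you can compare the kubectl, API server, and kubelet versions; a minimal sketch (flags and output formats may vary slightly between releases):
# Client (kubectl) and server (kube-apiserver) versions
kubectl version --short
# Kubelet version reported by each node (VERSION column)
kubectl get nodes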
Supported component upgrade order
The supported version skew between components has implications on the order in which components must be upgraded.
This section describes the order in which components must be upgraded to transition an existing cluster from version 1.21 to version 1.22.
kube-apiserver
Pre-requisites:
- In a single-instance cluster, the existing kube-apiserver instance is 1.21
- In an HA cluster, all kube-apiserver instances are at 1.21 or 1.22 (this ensures maximum skew of 1 minor version between the oldest and newest kube-apiserver instance)
- The kube-controller-manager, kube-scheduler, and cloud-controller-manager instances that communicate with this server are at version 1.21 (this ensures they are not newer than the existing API server version, and are within 1 minor version of the new API server version)
- kubelet instances on all nodes are at version 1.21 or 1.20 (this ensures they are not newer than the existing API server version, and are within 2 minor versions of the new API server version)
- Registered admission webhooks are able to handle the data the new kube-apiserver instance will send them:
  - ValidatingWebhookConfiguration and MutatingWebhookConfiguration objects are updated to include any new versions of REST resources added in 1.22 (or use the matchPolicy: Equivalent option available in v1.15+)
  - The webhooks are able to handle any new versions of REST resources that will be sent to them, and any new fields added to existing versions in 1.22
Upgrade kube-apiserver to 1.22
kube-controller-manager, kube-scheduler, and cloud-controller-manager
Pre-requisites:
- The kube-apiserver instances these components communicate with are at 1.22 (in HA clusters in which these control plane components can communicate with any kube-apiserver instance in the cluster, all kube-apiserver instances must be upgraded before upgrading these components)
Upgrade kube-controller-manager, kube-scheduler, and cloud-controller-manager to 1.22
kubelet
Pre-requisites:
- The kube-apiserver instances the kubelet communicates with are at 1.22
Optionally upgrade kubelet instances to 1.22 (or they can be left at 1.21 or 1.20)
Note: Before performing a minor version kubelet upgrade, drain pods from that node. In-place minor version kubelet upgrades are not supported.
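A minimal sketch of taking a node out of service for the kubelet upgrade and returning it afterwards (the node name is a placeholder):
# Evict pods before upgrading the kubelet on this node
kubectl drain <node-name> --ignore-daemonsets
# ...upgrade and restart the kubelet on the node...
# Allow pods to be scheduled on the node again
kubectl uncordon <node-name>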
Warning: Running a cluster with kubelet instances that are persistently two minor versions behind kube-apiserver is not recommended:
- they must be upgraded within one minor version of kube-apiserver before the control plane can be upgraded
- it increases the likelihood of running kubelet versions older than the three maintained minor releases
kube-proxy
kube-proxy must be the same minor version as kubelet on the node.
kube-proxy must not be newer than kube-apiserver.
kube-proxy must be at most two minor versions older than kube-apiserver.
Example:
If the kube-proxy version is 1.20:
- the kubelet version must be at the same minor version, 1.20.
- the kube-apiserver version must be between 1.20 and 1.22, inclusive.
2 - Learning environment
kind
kind lets you run Kubernetes on your local computer. This tool requires that you have Docker installed and configured.
The kind Quick Start page shows you what you need to do to get up and running with kind.
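As a quick illustration, assuming kind is already installed and Docker is running, a local cluster can be created and removed like this:
# Create a local cluster (the default context is named kind-kind)
kind create cluster
kubectl cluster-info --context kind-kind
# Remove it when you are done
kind delete cluster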
minikube
Like kind, minikube is a tool that lets you run Kubernetes locally. minikube runs a single-node Kubernetes cluster on your personal computer (including Windows, macOS and Linux PCs) so that you can try out Kubernetes, or for daily development work.
You can follow the official Get Started! guide if your focus is on getting the tool installed.
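A minimal sketch, assuming minikube is already installed:
# Start a single-node local cluster
minikube start
# Verify the node is Ready, then stop or delete the cluster when done
kubectl get nodes
minikube stop
minikube delete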
3 - Production environment
Create a production-quality Kubernetes cluster
A production-quality Kubernetes cluster requires planning and preparation.
If your Kubernetes cluster is to run critical workloads, it must be configured to be resilient.
This page explains steps you can take to set up a production-ready cluster,
or to uprate an existing cluster for production use.
If you're already familiar with production setup and want the links, skip to
What's next.
Production considerations
Typically, a production Kubernetes cluster environment has more requirements than a
personal learning, development, or test environment. A production environment may require
secure access by many users, consistent availability, and the resources to adapt
to changing demands.
As you decide where you want your production Kubernetes environment to live
(on premises or in a cloud) and the amount of management you want to take
on or hand to others, consider how your requirements for a Kubernetes cluster
are influenced by the following issues:
- Availability: A single-machine Kubernetes learning environment
has a single point of failure. Creating a highly available cluster means considering:
- Separating the control plane from the worker nodes.
- Replicating the control plane components on multiple nodes.
- Load balancing traffic to the cluster’s API server.
- Having enough worker nodes available, or able to quickly become available, as changing workloads warrant it.
- Scale: If you expect your production Kubernetes environment to receive a stable amount of
demand, you might be able to set up for the capacity you need and be done. However,
if you expect demand to grow over time or change dramatically based on things like
season or special events, you need to plan how to scale to relieve increased
pressure from more requests to the control plane and worker nodes or scale down to reduce unused
resources.
- Security and access management: You have full admin privileges on your own
Kubernetes learning cluster. But shared clusters with important workloads, and
more than one or two users, require a more refined approach to who and what can
access cluster resources. You can use role-based access control
(RBAC) and other
security mechanisms to make sure that users and workloads can get access to the
resources they need, while keeping workloads, and the cluster itself, secure.
You can set limits on the resources that users and workloads can access
by managing policies and
container resources.
Before building a Kubernetes production environment on your own, consider
handing off some or all of this job to
Turnkey Cloud Solutions
providers or other Kubernetes Partners.
Options include:
- Serverless: Just run workloads on third-party equipment without managing
a cluster at all. You will be charged for things like CPU usage, memory, and
disk requests.
- Managed control plane: Let the provider manage the scale and availability
of the cluster's control plane, as well as handle patches and upgrades.
- Managed worker nodes: Configure pools of nodes to meet your needs,
then the provider makes sure those nodes are available and ready to implement
upgrades when needed.
- Integration: There are providers that integrate Kubernetes with other
services you may need, such as storage, container registries, authentication
methods, and development tools.
Whether you build a production Kubernetes cluster yourself or work with
partners, review the following sections to evaluate your needs as they relate
to your cluster’s control plane, worker nodes, user access, and
workload resources.
Production cluster setup
In a production-quality Kubernetes cluster, the control plane manages the
cluster from services that can be spread across multiple computers
in different ways. Each worker node, however, represents a single entity that
is configured to run Kubernetes pods.
Production control plane
The simplest Kubernetes cluster has the entire control plane and worker node
services running on the same machine. You can grow that environment by adding
worker nodes, as reflected in the diagram illustrated in
Kubernetes Components.
If the cluster is meant to be available for a short period of time, or can be
discarded if something goes seriously wrong, this might meet your needs.
If you need a more permanent, highly available cluster, however, you should
consider ways of extending the control plane. By design, control plane
services running on a single machine are not highly available.
If keeping the cluster up and running
and ensuring that it can be repaired if something goes wrong is important,
consider these steps:
- Choose deployment tools: You can deploy a control plane using tools such
as kubeadm, kops, and kubespray. See
Installing Kubernetes with deployment tools
to learn tips for production-quality deployments using each of those deployment
methods. Different Container Runtimes
are available to use with your deployments.
- Manage certificates: Secure communications between control plane services
are implemented using certificates. Certificates are automatically generated
during deployment or you can generate them using your own certificate authority.
See PKI certificates and requirements for details.
- Configure load balancer for apiserver: Configure a load balancer
to distribute external API requests to the apiserver service instances running on different nodes. See
Create an External Load Balancer
for details.
- Separate and backup etcd service: The etcd services can either run on the
same machines as other control plane services or run on separate machines, for
extra security and availability. Because etcd stores cluster configuration data,
backing up the etcd database should be done regularly to ensure that you can
repair that database if needed.
See the etcd FAQ for details on configuring and using etcd.
See Operating etcd clusters for Kubernetes
and Set up a High Availability etcd cluster with kubeadm
for details.
- Create multiple control plane systems: For high availability, the
control plane should not be limited to a single machine. If the control plane
services are run by an init service (such as systemd), each service should run on at
least three machines. However, running control plane services as pods in
Kubernetes ensures that the replicated number of services that you request
will always be available.
The scheduler should be fault tolerant,
but not highly available. Some deployment tools set up the Raft
consensus algorithm to do leader election of Kubernetes services. If the
primary goes away, another service elects itself and takes over.
- Span multiple zones: If keeping your cluster available at all times is
critical, consider creating a cluster that runs across multiple data centers,
referred to as zones in cloud environments. Groups of zones are referred to as regions.
By spreading a cluster across
multiple zones in the same region, you can improve the chances that your
cluster will continue to function even if one zone becomes unavailable.
See Running in multiple zones for details.
- Manage on-going features: If you plan to keep your cluster over time,
there are tasks you need to do to maintain its health and security. For example,
if you installed with kubeadm, there are instructions to help you with
Certificate Management
and Upgrading kubeadm clusters.
See Administer a Cluster
for a longer list of Kubernetes administrative tasks.
To learn about available options when you run control plane services, see
kube-apiserver,
kube-controller-manager,
and kube-scheduler
component pages. For highly available control plane examples, see
Options for Highly Available topology,
Creating Highly Available clusters with kubeadm,
and Operating etcd clusters for Kubernetes.
See Backing up an etcd cluster
for information on making an etcd backup plan.
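As an illustration of the backup step, a snapshot can be taken with etcdctl; this is a minimal sketch, and the endpoint and certificate paths are placeholders that depend on how etcd was deployed (the paths shown match common kubeadm defaults):
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /var/backups/etcd-snapshot.db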
Production worker nodes
Production-quality workloads need to be resilient and anything they rely
on needs to be resilient (such as CoreDNS). Whether you manage your own
control plane or have a cloud provider do it for you, you still need to
consider how you want to manage your worker nodes (also referred to
simply as nodes).
- Configure nodes: Nodes can be physical or virtual machines. If you want to
create and manage your own nodes, you can install a supported operating system,
then add and run the appropriate
Node services. Consider:
- The demands of your workloads when you set up nodes by having appropriate memory, CPU, and disk speed and storage capacity available.
- Whether generic computer systems will do or you have workloads that need GPU processors, Windows nodes, or VM isolation.
- Validate nodes: See Valid node setup
for information on how to ensure that a node meets the requirements to join
a Kubernetes cluster.
- Add nodes to the cluster: If you are managing your own cluster you can
add nodes by setting up your own machines and either adding them manually or
having them register themselves to the cluster’s apiserver. See the
Nodes section for information on how to set up Kubernetes to add nodes in these ways.
- Add Windows nodes to the cluster: Kubernetes offers support for Windows
worker nodes, allowing you to run workloads implemented in Windows containers. See
Windows in Kubernetes for details.
- Scale nodes: Have a plan for expanding the capacity your cluster will
eventually need. See Considerations for large clusters
to help determine how many nodes you need, based on the number of pods and
containers you need to run. If you are managing nodes yourself, this can mean
purchasing and installing your own physical equipment.
- Autoscale nodes: Most cloud providers support
Cluster Autoscaler
to replace unhealthy nodes or grow and shrink the number of nodes as demand requires. See the
Frequently Asked Questions
for how the autoscaler works and
Deployment
for how it is implemented by different cloud providers. For on-premises, there
are some virtualization platforms that can be scripted to spin up new nodes
based on demand.
- Set up node health checks: For important workloads, you want to make sure
that the nodes and pods running on those nodes are healthy. Using the
Node Problem Detector
daemon, you can ensure your nodes are healthy.
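Independent of the Node Problem Detector, node conditions can be checked directly with kubectl; a minimal sketch (the node name is a placeholder):
# List nodes and their Ready status
kubectl get nodes
# Inspect detailed conditions (MemoryPressure, DiskPressure, PIDPressure, Ready) for one node
kubectl describe node <node-name>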
Production user management
In production, you may be moving from a model where you or a small group of
people are accessing the cluster to where there may potentially be dozens or
hundreds of people. In a learning environment or platform prototype, you might have a single
administrative account for everything you do. In production, you will want
more accounts with different levels of access to different namespaces.
Taking on a production-quality cluster means deciding how you
want to selectively allow access by other users. In particular, you need to
select strategies for validating the identities of those who try to access your
cluster (authentication) and deciding if they have permissions to do what they
are asking (authorization):
- Authentication: The apiserver can authenticate users using client
certificates, bearer tokens, an authenticating proxy, or HTTP basic auth.
You can choose which authentication methods you want to use.
Using plugins, the apiserver can leverage your organization’s existing
authentication methods, such as LDAP or Kerberos. See
Authentication
for a description of these different methods of authenticating Kubernetes users.
- Authorization: When you set out to authorize your regular users, you will probably choose between RBAC and ABAC authorization. See Authorization Overview to review different modes for authorizing user accounts (as well as service account access to your cluster):
- Role-based access control (RBAC): Lets you assign access to your cluster by allowing specific sets of permissions to authenticated users. Permissions can be assigned for a specific namespace (Role) or across the entire cluster (ClusterRole). Then using RoleBindings and ClusterRoleBindings, those permissions can be attached to particular users.
- Attribute-based access control (ABAC): Lets you create policies based on resource attributes in the cluster and will allow or deny access based on those attributes. Each line of a policy file identifies versioning properties (apiVersion and kind) and a map of spec properties to match the subject (user or group), resource property, non-resource property (/version or /apis), and readonly. See Examples for details.
As someone setting up authentication and authorization on your production Kubernetes cluster, here are some things to consider:
- Set the authorization mode: When the Kubernetes API server
(kube-apiserver)
starts, the supported authorization modes must be set using the --authorization-mode
flag. For example, that flag in the kube-apiserver.yaml file (in /etc/kubernetes/manifests)
could be set to Node,RBAC. This would allow Node and RBAC authorization for authenticated requests.
- Create user certificates and role bindings (RBAC): If you are using RBAC
authorization, users can create a CertificateSigningRequest (CSR) that can be
signed by the cluster CA. Then you can bind Roles and ClusterRoles to each user.
See Certificate Signing Requests
for details.
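(A minimal command-line sketch of this step follows this list.)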
- Create policies that combine attributes (ABAC): If you are using ABAC
authorization, you can assign combinations of attributes to form policies to
authorize selected users or groups to access particular resources (such as a
pod), namespace, or apiGroup. For more information, see
Examples.
- Consider Admission Controllers: Additional forms of authorization for
requests that can come in through the API server include
Webhook Token Authentication.
Webhooks and other special authorization types need to be enabled by adding
Admission Controllers
to the API server.
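Tying the RBAC bullet above to concrete commands, here is a minimal sketch; the CSR name, user, namespace, and resource choices are illustrative only:
# Approve a previously submitted CertificateSigningRequest
kubectl certificate approve <csr-name>
# Grant read-only access to Pods in the "dev" namespace to user "jane"
kubectl create role pod-reader --verb=get --verb=list --verb=watch --resource=pods --namespace=dev
kubectl create rolebinding jane-pod-reader --role=pod-reader --user=jane --namespace=dev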
Set limits on workload resources
Demands from production workloads can cause pressure both inside and outside
of the Kubernetes control plane. Consider these items when setting up for the
needs of your cluster's workloads:
- Set namespace limits: Set per-namespace quotas on things like memory and CPU. See
Manage Memory, CPU, and API Resources
for details. You can also set
Hierarchical Namespaces
for inheriting limits.
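(A minimal kubectl sketch of namespace quotas follows this list.)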
- Prepare for DNS demand: If you expect workloads to massively scale up,
your DNS service must be ready to scale up as well. See
Autoscale the DNS service in a Cluster.
- Create additional service accounts: User accounts determine what users can
do on a cluster, while a service account defines pod access within a particular
namespace. By default, a pod takes on the default service account from its namespace.
See Managing Service Accounts
for information on creating a new service account.
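As hedged illustrations of the namespace-limit and service-account items above (names and values are placeholders):
# Cap aggregate CPU and memory requests and limits in a namespace
kubectl create namespace team-a
kubectl create quota team-a-quota --hard=requests.cpu=4,requests.memory=8Gi,limits.cpu=8,limits.memory=16Gi --namespace=team-a
# Give a workload its own service account instead of the namespace default
kubectl create serviceaccount app-backend --namespace=team-a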
What's next
3.1 - Container runtimes
You need to install a
container runtime
into each node in the cluster so that Pods can run there. This page outlines
what is involved and describes related tasks for setting up nodes.
This page lists details for using several common container runtimes with
Kubernetes, on Linux:
Note: For other operating systems, look for documentation specific to your platform.
Cgroup drivers
Control groups are used to constrain resources that are allocated to processes.
When systemd is chosen as the init
system for a Linux distribution, the init process generates and consumes a root control group
(cgroup) and acts as a cgroup manager.
Systemd has a tight integration with cgroups and allocates a cgroup per systemd unit. It's possible
to configure your container runtime and the kubelet to use cgroupfs. Using cgroupfs alongside
systemd means that there will be two different cgroup managers.
A single cgroup manager simplifies the view of what resources are being allocated
and will by default have a more consistent view of the available and in-use resources.
When there are two cgroup managers on a system, you end up with two views of those resources.
In the field, people have reported cases where nodes that are configured to use cgroupfs
for the kubelet and Docker, but systemd
for the rest of the processes, become unstable under
resource pressure.
Changing the settings such that your container runtime and kubelet use systemd
as the cgroup driver
stabilized the system. To configure this for Docker, set native.cgroupdriver=systemd.
Caution: Changing the cgroup driver of a Node that has joined a cluster is a sensitive operation.
If the kubelet has created Pods using the semantics of one cgroup driver, changing the container
runtime to another cgroup driver can cause errors when trying to re-create the Pod sandbox
for such existing Pods. Restarting the kubelet may not solve such errors.
If you have automation that makes it feasible, replace the node with another using the updated
configuration, or reinstall it using automation.
Migrating to the systemd driver in kubeadm managed clusters
Follow this Migration guide if you wish to migrate to the systemd cgroup driver in existing kubeadm managed clusters.
Container runtimes
Caution:
This section links to third party projects that provide functionality required by Kubernetes. The Kubernetes project authors aren't responsible for these projects. This page follows CNCF website guidelines by listing projects alphabetically. To add a project to this list, read the content guide before submitting a change.
containerd
This section contains the necessary steps to use containerd as CRI runtime.
Use the following commands to install Containerd on your system:
Install and configure prerequisites:
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Setup required sysctl params, these persist across reboots.
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
# Apply sysctl params without reboot
sudo sysctl --system
Install containerd:
- Install the containerd.io package from the official Docker repositories. Instructions for setting up the Docker repository for your respective Linux distribution and installing the containerd.io package can be found at Install Docker Engine.
- Configure containerd:
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
- Restart containerd:
sudo systemctl restart containerd
Start a PowerShell session, set $Version to the desired version (ex: $Version=1.4.3), and then run the following commands:
- Download containerd:
curl.exe -L https://github.com/containerd/containerd/releases/download/v$Version/containerd-$Version-windows-amd64.tar.gz -o containerd-windows-amd64.tar.gz
tar.exe xvf .\containerd-windows-amd64.tar.gz
- Extract and configure:
Copy-Item -Path ".\bin\" -Destination "$Env:ProgramFiles\containerd" -Recurse -Force
cd $Env:ProgramFiles\containerd\
.\containerd.exe config default | Out-File config.toml -Encoding ascii
# Review the configuration. Depending on setup you may want to adjust:
# - the sandbox_image (Kubernetes pause image)
# - cni bin_dir and conf_dir locations
Get-Content config.toml
# (Optional - but highly recommended) Exclude containerd from Windows Defender Scans
Add-MpPreference -ExclusionProcess "$Env:ProgramFiles\containerd\containerd.exe"
- Start containerd:
.\containerd.exe --register-service
Start-Service containerd
Using the systemd cgroup driver
To use the systemd cgroup driver in /etc/containerd/config.toml with runc, set
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
...
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
If you apply this change, make sure to restart containerd again:
sudo systemctl restart containerd
When using kubeadm, manually configure the cgroup driver for kubelet.
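For example, a kubeadm configuration that sets the kubelet's cgroup driver to systemd could look like the following; this is a minimal sketch, and the Kubernetes version shown is only an example:
# kubeadm-config.yaml (illustrative); pass it with: kubeadm init --config kubeadm-config.yaml
cat <<EOF | tee kubeadm-config.yaml
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta2
kubernetesVersion: v1.21.0
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
EOF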
CRI-O
This section contains the necessary steps to install CRI-O as a container runtime.
Use the following commands to install CRI-O on your system:
Note: The CRI-O major and minor versions must match the Kubernetes major and minor versions.
For more information, see the
CRI-O compatibility matrix.
Install and configure prerequisites:
# Create the .conf file to load the modules at bootup
cat <<EOF | sudo tee /etc/modules-load.d/crio.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Set up required sysctl params, these persist across reboots.
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sudo sysctl --system
To install CRI-O on the following operating systems, set the environment variable OS to the appropriate value from the following table:
Operating system | $OS
Debian Unstable | Debian_Unstable
Debian Testing | Debian_Testing
Then, set $VERSION to the CRI-O version that matches your Kubernetes version.
For instance, if you want to install CRI-O 1.20, set VERSION=1.20.
You can pin your installation to a specific release.
To install version 1.20.0, set VERSION=1.20:1.20.0.
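For example, on Debian Testing with CRI-O 1.20 (values shown only as an illustration):
export OS=Debian_Testing
export VERSION=1.20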
Then run
cat <<EOF | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/ /
EOF
cat <<EOF | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable:cri-o:$VERSION.list
deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VERSION/$OS/ /
EOF
curl -L https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable:cri-o:$VERSION/$OS/Release.key | sudo apt-key --keyring /etc/apt/trusted.gpg.d/libcontainers.gpg add -
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/Release.key | sudo apt-key --keyring /etc/apt/trusted.gpg.d/libcontainers.gpg add -
sudo apt-get update
sudo apt-get install cri-o cri-o-runc
To install on the following operating systems, set the environment variable OS to the appropriate field in the following table:
Operating system | $OS
Ubuntu 20.04 | xUbuntu_20.04
Ubuntu 19.10 | xUbuntu_19.10
Ubuntu 19.04 | xUbuntu_19.04
Ubuntu 18.04 | xUbuntu_18.04
Then, set $VERSION to the CRI-O version that matches your Kubernetes version.
For instance, if you want to install CRI-O 1.20, set VERSION=1.20.
You can pin your installation to a specific release.
To install version 1.20.0, set VERSION=1.20:1.20.0.
Then run
cat <<EOF | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/ /
EOF
cat <<EOF | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable:cri-o:$VERSION.list
deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VERSION/$OS/ /
EOF
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/Release.key | sudo apt-key --keyring /etc/apt/trusted.gpg.d/libcontainers.gpg add -
curl -L https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable:cri-o:$VERSION/$OS/Release.key | sudo apt-key --keyring /etc/apt/trusted.gpg.d/libcontainers-cri-o.gpg add -
sudo apt-get update
sudo apt-get install cri-o cri-o-runc
To install on the following operating systems, set the environment variable OS to the appropriate field in the following table:
Operating system | $OS
CentOS 8 | CentOS_8
CentOS 8 Stream | CentOS_8_Stream
CentOS 7 | CentOS_7
Then, set $VERSION to the CRI-O version that matches your Kubernetes version.
For instance, if you want to install CRI-O 1.20, set VERSION=1.20.
You can pin your installation to a specific release.
To install version 1.20.0, set VERSION=1.20:1.20.0.
Then run
sudo curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable.repo https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/devel:kubic:libcontainers:stable.repo
sudo curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable:cri-o:$VERSION.repo https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable:cri-o:$VERSION/$OS/devel:kubic:libcontainers:stable:cri-o:$VERSION.repo
sudo yum install cri-o
sudo zypper install cri-o
Set $VERSION to the CRI-O version that matches your Kubernetes version.
For instance, if you want to install CRI-O 1.20, set VERSION=1.20.
You can find available versions with:
sudo dnf module list cri-o
CRI-O does not support pinning to specific releases on Fedora.
Then run
sudo dnf module enable cri-o:$VERSION
sudo dnf install cri-o
Start CRI-O:
sudo systemctl daemon-reload
sudo systemctl enable crio --now
Refer to the CRI-O installation guide
for more information.
cgroup driver
CRI-O uses the systemd cgroup driver by default. To switch to the cgroupfs
cgroup driver, either edit /etc/crio/crio.conf or place a drop-in
configuration in /etc/crio/crio.conf.d/02-cgroup-manager.conf, for example:
[crio.runtime]
conmon_cgroup = "pod"
cgroup_manager = "cgroupfs"
Please also note the changed conmon_cgroup, which has to be set to the value
pod when using CRI-O with cgroupfs. It is generally necessary to keep the
cgroup driver configuration of the kubelet (usually done via kubeadm) and CRI-O
in sync.
Docker
- On each of your nodes, install Docker for your Linux distribution as per Install Docker Engine. You can find the latest validated version of Docker in this dependencies file.
- Configure the Docker daemon, in particular to use systemd for the management of the container’s cgroups.
sudo mkdir /etc/docker
cat <<EOF | sudo tee /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2"
}
EOF
Note: overlay2 is the preferred storage driver for systems running Linux kernel version 4.0 or higher, or RHEL or CentOS using version 3.10.0-514 and above.
- Restart Docker and enable on boot:
sudo systemctl enable docker
sudo systemctl daemon-reload
sudo systemctl restart docker
Note: For more information refer to
3.2 - Installing Kubernetes with deployment tools
3.2.1 - Bootstrapping clusters with kubeadm
3.2.1.1 - Installing kubeadm
This page shows how to install the kubeadm toolbox.
For information how to create a cluster with kubeadm once you have performed this installation process, see the Using kubeadm to Create a Cluster page.
Before you begin
- A compatible Linux host. The Kubernetes project provides generic instructions for Linux distributions based on Debian and Red Hat, and those distributions without a package manager.
- 2 GB or more of RAM per machine (any less will leave little room for your apps).
- 2 CPUs or more.
- Full network connectivity between all machines in the cluster (public or private network is fine).
- Unique hostname, MAC address, and product_uuid for every node. See here for more details.
- Certain ports are open on your machines. See here for more details.
- Swap disabled. You MUST disable swap in order for the kubelet to work properly.
Verify the MAC address and product_uuid are unique for every node
- You can get the MAC address of the network interfaces using the command ip link or ifconfig -a
- The product_uuid can be checked by using the command sudo cat /sys/class/dmi/id/product_uuid
It is very likely that hardware devices will have unique addresses, although some virtual machines may have
identical values. Kubernetes uses these values to uniquely identify the nodes in the cluster.
If these values are not unique to each node, the installation process
may fail.
Check network adapters
If you have more than one network adapter, and your Kubernetes components are not reachable on the default
route, we recommend you add IP route(s) so Kubernetes cluster addresses go via the appropriate adapter.
Letting iptables see bridged traffic
Make sure that the br_netfilter module is loaded. This can be done by running lsmod | grep br_netfilter. To load it explicitly, call sudo modprobe br_netfilter.
As a requirement for your Linux Node's iptables to correctly see bridged traffic, you should ensure net.bridge.bridge-nf-call-iptables is set to 1 in your sysctl config, e.g.
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system
For more details please see the Network Plugin Requirements page.
Check required ports
Control-plane node(s)
Protocol | Direction | Port Range | Purpose | Used By
TCP | Inbound | 6443* | Kubernetes API server | All
TCP | Inbound | 2379-2380 | etcd server client API | kube-apiserver, etcd
TCP | Inbound | 10250 | kubelet API | Self, Control plane
TCP | Inbound | 10251 | kube-scheduler | Self
TCP | Inbound | 10252 | kube-controller-manager | Self
Worker node(s)
Protocol | Direction | Port Range | Purpose | Used By
TCP | Inbound | 10250 | kubelet API | Self, Control plane
TCP | Inbound | 30000-32767 | NodePort Services† | All
† Default port range for NodePort Services.
Any port numbers marked with * are overridable, so you will need to ensure any
custom ports you provide are also open.
Although etcd ports are included in control-plane nodes, you can also host your own
etcd cluster externally or on custom ports.
The pod network plugin you use (see below) may also require certain ports to be
open. Since this differs with each pod network plugin, please see the
documentation for the plugins about what port(s) those need.
Installing a runtime
To run containers in Pods, Kubernetes uses a
container runtime.
By default, Kubernetes uses the
Container Runtime Interface (CRI)
to interface with your chosen container runtime.
If you don't specify a runtime, kubeadm automatically tries to detect an installed
container runtime by scanning through a list of well known Unix domain sockets.
The following table lists container runtimes and their associated socket paths:
Container runtimes and their socket paths
Runtime | Path to Unix domain socket
Docker | /var/run/dockershim.sock
containerd | /run/containerd/containerd.sock
CRI-O | /var/run/crio/crio.sock
If both Docker and containerd are detected, Docker takes precedence. This is
needed because Docker 18.09 ships with containerd and both are detectable even if you only
installed Docker.
If any other two or more runtimes are detected, kubeadm exits with an error.
The kubelet integrates with Docker through the built-in dockershim CRI implementation.
See container runtimes for more information.
By default, kubeadm uses Docker as the container runtime.
Installing kubeadm, kubelet and kubectl
You will install these packages on all of your machines:
- kubeadm: the command to bootstrap the cluster.
- kubelet: the component that runs on all of the machines in your cluster and does things like starting pods and containers.
- kubectl: the command line util to talk to your cluster.
kubeadm will not install or manage kubelet or kubectl for you, so you will
need to ensure they match the version of the Kubernetes control plane you want
kubeadm to install for you. If you do not, there is a risk of a version skew occurring that
can lead to unexpected, buggy behaviour. However, one minor version skew between the
kubelet and the control plane is supported, but the kubelet version may never exceed the API
server version. For example, the kubelet running 1.7.0 should be fully compatible with a 1.8.0 API server,
but not vice versa.
For information about installing kubectl, see Install and set up kubectl.
Warning: These instructions exclude all Kubernetes packages from any system upgrades.
This is because kubeadm and Kubernetes require
special attention to upgrade.
For more information on version skews, see:
- Update the apt package index and install packages needed to use the Kubernetes apt repository:
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
- Download the Google Cloud public signing key:
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
- Add the Kubernetes apt repository:
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
- Update the apt package index, install kubelet, kubeadm and kubectl, and pin their version:
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF
# Set SELinux in permissive mode (effectively disabling it)
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
sudo systemctl enable --now kubelet
Notes:
- Setting SELinux in permissive mode by running setenforce 0 and sed ... effectively disables it.
This is required to allow containers to access the host filesystem, which is needed by pod networks for example.
You have to do this until SELinux support is improved in the kubelet.
- You can leave SELinux enabled if you know how to configure it, but it may require settings that are not supported by kubeadm.
Install CNI plugins (required for most pod networks):
CNI_VERSION="v0.8.2"
sudo mkdir -p /opt/cni/bin
curl -L "https://github.com/containernetworking/plugins/releases/download/${CNI_VERSION}/cni-plugins-linux-amd64-${CNI_VERSION}.tgz" | sudo tar -C /opt/cni/bin -xz
Define the directory to download command files into:
Note: The DOWNLOAD_DIR variable must be set to a writable directory.
If you are running Flatcar Container Linux, set DOWNLOAD_DIR=/opt/bin.
DOWNLOAD_DIR=/usr/local/bin
sudo mkdir -p $DOWNLOAD_DIR
Install crictl (required for kubeadm / Kubelet Container Runtime Interface (CRI))
CRICTL_VERSION="v1.17.0"
curl -L "https://github.com/kubernetes-sigs/cri-tools/releases/download/${CRICTL_VERSION}/crictl-${CRICTL_VERSION}-linux-amd64.tar.gz" | sudo tar -C $DOWNLOAD_DIR -xz
Install kubeadm, kubelet, kubectl and add a kubelet systemd service:
RELEASE="$(curl -sSL https://dl.k8s.io/release/stable.txt)"
cd $DOWNLOAD_DIR
sudo curl -L --remote-name-all https://storage.googleapis.com/kubernetes-release/release/${RELEASE}/bin/linux/amd64/{kubeadm,kubelet,kubectl}
sudo chmod +x {kubeadm,kubelet,kubectl}
RELEASE_VERSION="v0.4.0"
curl -sSL "https://raw.githubusercontent.com/kubernetes/release/${RELEASE_VERSION}/cmd/kubepkg/templates/latest/deb/kubelet/lib/systemd/system/kubelet.service" | sed "s:/usr/bin:${DOWNLOAD_DIR}:g" | sudo tee /etc/systemd/system/kubelet.service
sudo mkdir -p /etc/systemd/system/kubelet.service.d
curl -sSL "https://raw.githubusercontent.com/kubernetes/release/${RELEASE_VERSION}/cmd/kubepkg/templates/latest/deb/kubeadm/10-kubeadm.conf" | sed "s:/usr/bin:${DOWNLOAD_DIR}:g" | sudo tee /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Enable and start kubelet
:
systemctl enable --now kubelet
Note: The Flatcar Container Linux distribution mounts the
/usr
directory as a read-only filesystem.
Before bootstrapping your cluster, you need to take additional steps to configure a writable directory.
See the
Kubeadm Troubleshooting guide to learn how to set up a writable directory.
The kubelet is now restarting every few seconds, as it waits in a crashloop for
kubeadm to tell it what to do.
Configuring a cgroup driver
Both the container runtime and the kubelet have a property called
"cgroup driver", which is important
for the management of cgroups on Linux machines.
Warning: Matching the container runtime and kubelet cgroup drivers is required; otherwise the kubelet process will fail.
See Configuring a cgroup driver for more details.
Troubleshooting
If you are running into difficulties with kubeadm, please consult our troubleshooting docs.
What's next
3.2.1.2 - Troubleshooting kubeadm
As with any program, you might run into an error installing or running kubeadm.
This page lists some common failure scenarios and provides steps that can help you understand and fix the problem.
If your problem is not listed below, please take the following steps:
-
If you think your problem is a bug with kubeadm:
-
If you are unsure about how kubeadm works, you can ask on Slack in #kubeadm, or open a question on StackOverflow. Please include relevant tags like #kubernetes and #kubeadm so folks can help you.
Not possible to join a v1.18 Node to a v1.17 cluster due to missing RBAC
In v1.18 kubeadm added a check that prevents joining a Node if a Node with the same name already exists in the cluster. This required adding RBAC for the bootstrap-token user to be able to GET a Node object.
However, this causes an issue where kubeadm join from v1.18 cannot join a cluster created by kubeadm v1.17.
To work around the issue you have two options:
Execute kubeadm init phase bootstrap-token on a control-plane node using kubeadm v1.18. Note that this enables the rest of the bootstrap-token permissions as well.
or
Apply the following RBAC manually using kubectl apply -f ...:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: kubeadm:get-nodes
rules:
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: kubeadm:get-nodes
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: kubeadm:get-nodes
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:bootstrappers:kubeadm:default-node-token
ebtables
or some similar executable not found during installation
If you see the following warnings while running kubeadm init:
[preflight] WARNING: ebtables not found in system path
[preflight] WARNING: ethtool not found in system path
Then you may be missing ebtables, ethtool or a similar executable on your node. You can install them with the following commands:
- For Ubuntu/Debian users, run apt install ebtables ethtool.
- For CentOS/Fedora users, run yum install ebtables ethtool.
kubeadm blocks waiting for control plane during installation
If you notice that kubeadm init
hangs after printing out the following line:
[apiclient] Created API client, waiting for the control plane to become ready
This may be caused by a number of problems. The most common are:
-
network connection problems. Check that your machine has full network connectivity before continuing.
-
the default cgroup driver configuration for the kubelet differs from that used by Docker.
Check the system log file (e.g. /var/log/messages) or examine the output from journalctl -u kubelet. If you see something like the following:
error: failed to run Kubelet: failed to create kubelet:
misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
There are two common ways to fix the cgroup driver problem:
-
Install Docker again following instructions
here.
-
Change the kubelet config to match the Docker cgroup driver manually; refer to Configure cgroup driver used by kubelet on control-plane node.
-
control plane Docker containers are crashlooping or hanging. You can check this by running docker ps
and investigating each container by running docker logs
.
kubeadm blocks when removing managed containers
The following could happen if Docker halts and does not remove any Kubernetes-managed containers:
[preflight] Running pre-flight checks
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers
(block)
A possible solution is to restart the Docker service and then re-run kubeadm reset
:
sudo systemctl restart docker.service
sudo kubeadm reset
Inspecting the logs for docker may also be useful:
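For example, on systems where Docker runs as a systemd service:
journalctl -u docker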
Pods in RunContainerError, CrashLoopBackOff or Error state
Right after kubeadm init there should not be any pods in these states.
- If there are pods in one of these states right after kubeadm init, please open an issue in the kubeadm repo. coredns (or kube-dns) should be in the Pending state until you have deployed the network add-on.
- If you see Pods in the RunContainerError, CrashLoopBackOff or Error state after deploying the network add-on and nothing happens to coredns (or kube-dns), it's very likely that the Pod Network add-on that you installed is somehow broken. You might have to grant it more RBAC privileges or use a newer version. Please file an issue in the Pod Network provider's issue tracker and get the issue triaged there.
- If you install a version of Docker older than 1.12.1, remove the MountFlags=slave option when booting dockerd with systemd and restart docker. You can see the MountFlags in /usr/lib/systemd/system/docker.service. MountFlags can interfere with volumes mounted by Kubernetes, and put the Pods in CrashLoopBackOff state. The error happens when Kubernetes does not find /var/run/secrets/kubernetes.io/serviceaccount files.
coredns is stuck in the Pending state
This is expected and part of the design. kubeadm is network provider-agnostic, so the admin should install the pod network add-on of choice. You have to install a Pod Network before CoreDNS can be fully deployed. Hence the Pending state before the network is set up.
HostPort services do not work
The HostPort and HostIP functionality is available depending on your Pod Network provider. Please contact the author of the Pod Network add-on to find out whether HostPort and HostIP functionality are available.
Calico, Canal, and Flannel CNI providers are verified to support HostPort.
For more information, see the CNI portmap documentation.
If your network provider does not support the portmap CNI plugin, you may need to use the NodePort feature of services or use HostNetwork=true.
Pods are not accessible via their Service IP
-
Many network add-ons do not yet enable hairpin mode, which allows pods to access themselves via their Service IP. This is an issue related to CNI. Please contact the network add-on provider to get the latest status of their support for hairpin mode.
-
If you are using VirtualBox (directly or via Vagrant), you will need to ensure that hostname -i returns a routable IP address. By default the first interface is connected to a non-routable host-only network. A workaround is to modify /etc/hosts; see this Vagrantfile for an example.
TLS certificate errors
The following error indicates a possible certificate mismatch.
# kubectl get pods
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
-
Verify that the $HOME/.kube/config file contains a valid certificate, and regenerate a certificate if necessary. The certificates in a kubeconfig file are base64 encoded. The base64 --decode command can be used to decode the certificate and openssl x509 -text -noout can be used for viewing the certificate information.
-
Unset the KUBECONFIG environment variable using:
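unset KUBECONFIG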
Or set it to the default KUBECONFIG
location:
export KUBECONFIG=/etc/kubernetes/admin.conf
-
Another workaround is to overwrite the existing kubeconfig
for the "admin" user:
mv $HOME/.kube $HOME/.kube.bak
mkdir $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Default NIC When using flannel as the pod network in Vagrant
The following error might indicate that something was wrong in the pod network:
Error from server (NotFound): the server could not find the requested resource
-
If you're using flannel as the pod network inside Vagrant, then you will have to specify the default interface name for flannel.
Vagrant typically assigns two interfaces to all VMs. The first, for which all hosts are assigned the IP address 10.0.2.15, is for external traffic that gets NATed.
This may lead to problems with flannel, which defaults to the first interface on a host; as a result, all hosts think they have the same public IP address. To prevent this, pass the --iface eth1 flag to flannel so that the second interface is chosen.
Non-public IP used for containers
In some situations kubectl logs and kubectl run commands may return the following errors in an otherwise functional cluster:
Error from server: Get https://10.19.0.41:10250/containerLogs/default/mysql-ddc65b868-glc5m/mysql: dial tcp 10.19.0.41:10250: getsockopt: no route to host
-
This may be due to Kubernetes using an IP that cannot communicate with other IPs on the seemingly same subnet, possibly by policy of the machine provider.
-
DigitalOcean assigns a public IP to eth0 as well as a private one to be used internally as the anchor for their floating IP feature, yet kubelet will pick the latter as the node's InternalIP instead of the public one.
Use ip addr show to check for this scenario instead of ifconfig because ifconfig will not display the offending alias IP address. Alternatively, an API endpoint specific to DigitalOcean allows you to query the anchor IP from within the droplet:
curl http://169.254.169.254/metadata/v1/interfaces/public/0/anchor_ipv4/address
The workaround is to tell the kubelet which IP to use with --node-ip. When using DigitalOcean, it can be the public one (assigned to eth0) or the private one (assigned to eth1) should you want to use the optional private network. The KubeletExtraArgs section of the kubeadm NodeRegistrationOptions structure can be used for this; see the sketch below.
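For example, a minimal sketch of a kubeadm join configuration that sets the kubelet's node IP (the address below is a placeholder; substitute the IP you want this node to advertise):
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
nodeRegistration:
  kubeletExtraArgs:
    node-ip: "203.0.113.10"  # placeholder: the IP the kubelet should advertise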
Then restart kubelet:
systemctl daemon-reload
systemctl restart kubelet
coredns pods have CrashLoopBackOff or Error state
If you have nodes that are running SELinux with an older version of Docker, you might experience a scenario where the coredns pods are not starting. To solve that you can try one of the following options:
kubectl -n kube-system get deployment coredns -o yaml | \
sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
kubectl apply -f -
Another cause for CoreDNS to have CrashLoopBackOff is when a CoreDNS Pod deployed in Kubernetes detects a loop. A number of workarounds are available to avoid Kubernetes trying to restart the CoreDNS Pod every time CoreDNS detects the loop and exits.
Warning: Disabling SELinux or setting allowPrivilegeEscalation to true can compromise the security of your cluster.
etcd pods restart continually
If you encounter the following error:
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""
This issue appears if you run CentOS 7 with Docker 1.13.1.84.
This version of Docker can prevent the kubelet from executing into the etcd container.
To work around the issue, choose one of these options:
- Roll back to an earlier version of Docker, such as 1.13.1-75
yum downgrade docker-1.13.1-75.git8633870.el7.centos.x86_64 docker-client-1.13.1-75.git8633870.el7.centos.x86_64 docker-common-1.13.1-75.git8633870.el7.centos.x86_64
- Install one of the more recent recommended versions, such as 18.06:
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install docker-ce-18.06.1.ce-3.el7.x86_64
Cannot pass comma separated values to arguments inside a --component-extra-args flag
kubeadm init flags such as --component-extra-args allow you to pass custom arguments to a control-plane component like the kube-apiserver. However, this mechanism is limited due to the underlying type used for parsing the values (mapStringString).
If you decide to pass an argument that supports multiple, comma-separated values such as --apiserver-extra-args "enable-admission-plugins=LimitRanger,NamespaceExists", this flag will fail with flag: malformed pair, expect string=string. This happens because the list of arguments for --apiserver-extra-args expects key=value pairs and in this case NamespaceExists is considered as a key that is missing a value.
Alternatively, you can try separating the key=value pairs like so:
--apiserver-extra-args "enable-admission-plugins=LimitRanger,enable-admission-plugins=NamespaceExists"
but this will result in the key enable-admission-plugins only having the value of NamespaceExists.
A known workaround is to use the kubeadm configuration file.
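For example, a minimal sketch of a kubeadm configuration file that passes the comma-separated value above through the apiServer extraArgs field (see also the control plane customization section later in this document):
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
apiServer:
  extraArgs:
    # the full comma-separated list is preserved because it is a single YAML value
    enable-admission-plugins: LimitRanger,NamespaceExists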
kube-proxy scheduled before node is initialized by cloud-controller-manager
In cloud provider scenarios, kube-proxy can end up being scheduled on new worker nodes before
the cloud-controller-manager has initialized the node addresses. This causes kube-proxy to fail
to pick up the node's IP address properly and has knock-on effects to the proxy function managing
load balancers.
The following error can be seen in kube-proxy Pods:
server.go:610] Failed to retrieve node IP: host IP unknown; known addresses: []
proxier.go:340] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
A known solution is to patch the kube-proxy DaemonSet to allow scheduling it on control-plane
nodes regardless of their conditions, keeping it off of other nodes until their initial guarding
conditions abate:
kubectl -n kube-system patch ds kube-proxy -p='{ "spec": { "template": { "spec": { "tolerations": [ { "key": "CriticalAddonsOnly", "operator": "Exists" }, { "effect": "NoSchedule", "key": "node-role.kubernetes.io/master" } ] } } } }'
The tracking issue for this problem is here.
The NodeRegistration.Taints field is omitted when marshalling kubeadm configuration
Note: This issue only applies to tools that marshal kubeadm types (e.g. to a YAML configuration file). It will be fixed in kubeadm API v1beta2.
By default, kubeadm applies the node-role.kubernetes.io/master:NoSchedule taint to control-plane nodes.
If you prefer kubeadm to not taint the control-plane node, and set InitConfiguration.NodeRegistration.Taints to an empty slice, the field will be omitted when marshalling. When the field is omitted, kubeadm applies the default taint.
There are at least two workarounds:
-
Use the node-role.kubernetes.io/master:PreferNoSchedule
taint instead of an empty slice. Pods will get scheduled on masters, unless other nodes have capacity.
-
Remove the taint after kubeadm init exits:
kubectl taint nodes NODE_NAME node-role.kubernetes.io/master:NoSchedule-
/usr is mounted read-only on nodes
On Linux distributions such as Fedora CoreOS or Flatcar Container Linux, the directory /usr is mounted as a read-only filesystem.
For flex-volume support, Kubernetes components like the kubelet and kube-controller-manager use the default path of /usr/libexec/kubernetes/kubelet-plugins/volume/exec/, yet the flex-volume directory must be writable for the feature to work.
To work around this issue you can configure the flex-volume directory using the kubeadm configuration file.
On the primary control-plane Node (created using kubeadm init) pass the following file using --config:
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
nodeRegistration:
kubeletExtraArgs:
volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
controllerManager:
extraArgs:
flex-volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
On joining Nodes:
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
nodeRegistration:
kubeletExtraArgs:
volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
Alternatively, you can modify /etc/fstab to make the /usr mount writable, but please be advised that this modifies a design principle of the Linux distribution.
kubeadm upgrade plan prints out a context deadline exceeded error message
This error message is shown when upgrading a Kubernetes cluster with kubeadm while running an external etcd. It is not a critical bug and happens because older versions of kubeadm perform a version check on the external etcd cluster. You can proceed with kubeadm upgrade apply ....
This issue is fixed as of version 1.19.
kubeadm reset unmounts /var/lib/kubelet
If /var/lib/kubelet is being mounted, performing a kubeadm reset will effectively unmount it.
To work around the issue, re-mount the /var/lib/kubelet directory after performing the kubeadm reset operation.
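For example, assuming the mount point is declared in /etc/fstab:
sudo mount /var/lib/kubelet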
This is a regression introduced in kubeadm 1.15. The issue is fixed in 1.20.
Cannot use the metrics-server securely in a kubeadm cluster
In a kubeadm cluster, the metrics-server can be used insecurely by passing the --kubelet-insecure-tls flag to it. This is not recommended for production clusters.
If you want to use TLS between the metrics-server and the kubelet there is a problem, since kubeadm deploys a self-signed serving certificate for the kubelet. This can cause the following errors on the side of the metrics-server:
x509: certificate signed by unknown authority
x509: certificate is valid for IP-foo not IP-bar
See Enabling signed kubelet serving certificates
to understand how to configure the kubelets in a kubeadm cluster to have properly signed serving certificates.
Also see How to run the metrics-server securely.
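As a rough sketch of that approach (refer to the linked pages for the complete procedure), you enable serverTLSBootstrap in the kubelet configuration that kubeadm distributes to nodes, and then approve the resulting kubelet serving certificate CSRs (for example with kubectl certificate approve):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
serverTLSBootstrap: true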
3.2.1.3 - Creating a cluster with kubeadm
Using kubeadm, you can create a minimum viable Kubernetes cluster that conforms to best practices. In fact, you can use kubeadm to set up a cluster that will pass the Kubernetes Conformance tests. kubeadm also supports other cluster lifecycle functions, such as bootstrap tokens and cluster upgrades.
The kubeadm tool is good if you need:
- A simple way for you to try out Kubernetes, possibly for the first time.
- A way for existing users to automate setting up a cluster and test their application.
- A building block in other ecosystem and/or installer tools with a larger
scope.
You can install and use kubeadm
on various machines: your laptop, a set
of cloud servers, a Raspberry Pi, and more. Whether you're deploying into the
cloud or on-premises, you can integrate kubeadm
into provisioning systems such
as Ansible or Terraform.
Before you begin
To follow this guide, you need:
- One or more machines running a deb/rpm-compatible Linux OS; for example: Ubuntu or CentOS.
- 2 GiB or more of RAM per machine--any less leaves little room for your
apps.
- At least 2 CPUs on the machine that you use as a control-plane node.
- Full network connectivity among all machines in the cluster. You can use either a
public or a private network.
You also need to use a version of kubeadm that can deploy the version of Kubernetes that you want to use in your new cluster.
Kubernetes' version and version skew support policy applies to kubeadm as well as to Kubernetes overall. Check that policy to learn about what versions of Kubernetes and kubeadm are supported. This page is written for Kubernetes v1.22.
The kubeadm tool's overall feature state is General Availability (GA). Some sub-features are still under active development. The implementation of creating the cluster may change slightly as the tool evolves, but the overall implementation should be pretty stable.
Note: Any commands under kubeadm alpha are, by definition, supported on an alpha level.
Objectives
- Install a single control-plane Kubernetes cluster
- Install a Pod network on the cluster so that your Pods can
talk to each other
Instructions
Installing kubeadm on your hosts
See "Installing kubeadm".
Note: If you have already installed kubeadm, run apt-get update && apt-get upgrade
or yum update
to get the latest version of kubeadm.
When you upgrade, the kubelet restarts every few seconds as it waits in a crashloop for
kubeadm to tell it what to do. This crashloop is expected and normal.
After you initialize your control-plane, the kubelet runs normally.
Initializing your control-plane node
The control-plane node is the machine where the control plane components run, including
etcd (the cluster database) and the
API Server
(which the kubectl command line tool
communicates with).
- (Recommended) If you have plans to upgrade this single control-plane
kubeadm
cluster
to high availability you should specify the --control-plane-endpoint
to set the shared endpoint
for all control-plane nodes. Such an endpoint can be either a DNS name or an IP address of a load-balancer.
- Choose a Pod network add-on, and verify whether it requires any arguments to
be passed to
kubeadm init
. Depending on which
third-party provider you choose, you might need to set the --pod-network-cidr
to
a provider-specific value. See Installing a Pod network add-on.
- (Optional) Since version 1.14, kubeadm tries to detect the container runtime on Linux by using a list of well known domain socket paths. To use a different container runtime, or if there is more than one installed on the provisioned node, specify the --cri-socket argument to kubeadm init. See Installing runtime.
- (Optional) Unless otherwise specified,
kubeadm
uses the network interface associated
with the default gateway to set the advertise address for this particular control-plane node's API server.
To use a different network interface, specify the --apiserver-advertise-address=<ip-address>
argument
to kubeadm init
. To deploy an IPv6 Kubernetes cluster using IPv6 addressing, you
must specify an IPv6 address, for example --apiserver-advertise-address=fd00::101
- (Optional) Run
kubeadm config images pull
prior to kubeadm init
to verify
connectivity to the gcr.io container image registry.
To initialize the control-plane node run:
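kubeadm init <args>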
Considerations about apiserver-advertise-address and ControlPlaneEndpoint
While --apiserver-advertise-address
can be used to set the advertise address for this particular
control-plane node's API server, --control-plane-endpoint
can be used to set the shared endpoint
for all control-plane nodes.
--control-plane-endpoint
allows both IP addresses and DNS names that can map to IP addresses.
Please contact your network administrator to evaluate possible solutions with respect to such mapping.
Here is an example mapping:
192.168.0.102 cluster-endpoint
Where 192.168.0.102
is the IP address of this node and cluster-endpoint
is a custom DNS name that maps to this IP.
This will allow you to pass --control-plane-endpoint=cluster-endpoint
to kubeadm init
and pass the same DNS name to
kubeadm join
. Later you can modify cluster-endpoint
to point to the address of your load-balancer in an
high availability scenario.
Turning a single control plane cluster created without --control-plane-endpoint
into a highly available cluster
is not supported by kubeadm.
For more information about kubeadm init
arguments, see the kubeadm reference guide.
To configure kubeadm init
with a configuration file see Using kubeadm init with a configuration file.
To customize control plane components, including optional IPv6 assignment to liveness probe for control plane components and etcd server, provide extra arguments to each component as documented in custom arguments.
To run kubeadm init
again, you must first tear down the cluster.
If you join a node with a different architecture to your cluster, make sure that your deployed DaemonSets
have container image support for this architecture.
kubeadm init
first runs a series of prechecks to ensure that the machine
is ready to run Kubernetes. These prechecks expose warnings and exit on errors. kubeadm init
then downloads and installs the cluster control plane components. This may take several minutes.
After it finishes you should see:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a Pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join <control-plane-host>:<control-plane-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash>
To make kubectl work for your non-root user, run these commands, which are
also part of the kubeadm init
output:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root
user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
Warning: Kubeadm signs the certificate in the admin.conf
to have Subject: O = system:masters, CN = kubernetes-admin
.
system:masters
is a break-glass, super user group that bypasses the authorization layer (e.g. RBAC).
Do not share the admin.conf
file with anyone and instead grant users custom permissions by generating
them a kubeconfig file using the kubeadm kubeconfig user
command.
Make a record of the kubeadm join
command that kubeadm init
outputs. You
need this command to join nodes to your cluster.
The token is used for mutual authentication between the control-plane node and the joining
nodes. The token included here is secret. Keep it safe, because anyone with this
token can add authenticated nodes to your cluster. These tokens can be listed,
created, and deleted with the kubeadm token
command. See the
kubeadm reference guide.
Installing a Pod network add-on
Caution: This section contains important information about networking setup and
deployment order.
Read all of this advice carefully before proceeding.
You must deploy a
Container Network Interface
(CNI) based Pod network add-on so that your Pods can communicate with each other.
Cluster DNS (CoreDNS) will not start up before a network is installed.
-
Take care that your Pod network does not overlap with any of the host networks: you are likely to see problems if there is any overlap.
(If you find a collision between your network plugin's preferred Pod
network and some of your host networks, you should think of a suitable
CIDR block to use instead, then use that during kubeadm init
with
--pod-network-cidr
and as a replacement in your network plugin's YAML).
-
By default, kubeadm
sets up your cluster to use and enforce use of
RBAC (role based access
control).
Make sure that your Pod network plugin supports RBAC, and so do any manifests
that you use to deploy it.
-
If you want to use IPv6--either dual-stack, or single-stack IPv6 only
networking--for your cluster, make sure that your Pod network plugin
supports IPv6.
IPv6 support was added to CNI in v0.6.0.
Note: Kubeadm should be CNI agnostic and the validation of CNI providers is out of the scope of our current e2e testing.
If you find an issue related to a CNI plugin you should log a ticket in its respective issue
tracker instead of the kubeadm or kubernetes issue trackers.
Several external projects provide Kubernetes Pod networks using CNI, some of which also
support Network Policy.
See a list of add-ons that implement the
Kubernetes networking model.
You can install a Pod network add-on with the following command on the
control-plane node or a node that has the kubeconfig credentials:
kubectl apply -f <add-on.yaml>
You can install only one Pod network per cluster.
Once a Pod network has been installed, you can confirm that it is working by
checking that the CoreDNS Pod is Running
in the output of kubectl get pods --all-namespaces
.
And once the CoreDNS Pod is up and running, you can continue by joining your nodes.
If your network is not working or CoreDNS is not in the Running
state, check out the
troubleshooting guide
for kubeadm
.
Control plane node isolation
By default, your cluster will not schedule Pods on the control-plane node for security
reasons. If you want to be able to schedule Pods on the control-plane node, for example for a
single-machine Kubernetes cluster for development, run:
kubectl taint nodes --all node-role.kubernetes.io/master-
With output looking something like:
node "test-01" untainted
taint "node-role.kubernetes.io/master:" not found
taint "node-role.kubernetes.io/master:" not found
This will remove the node-role.kubernetes.io/master
taint from any nodes that
have it, including the control-plane node, meaning that the scheduler will then be able
to schedule Pods everywhere.
Joining your nodes
The nodes are where your workloads (containers and Pods, etc) run. To add new nodes to your cluster do the following for each machine:
- SSH to the machine
- Become root (e.g.
sudo su -
)
- Run the command that was output by
kubeadm init
. For example:
kubeadm join --token <token> <control-plane-host>:<control-plane-port> --discovery-token-ca-cert-hash sha256:<hash>
If you do not have the token, you can get it by running the following command on the control-plane node:
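kubeadm token list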
The output is similar to this:
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
8ewj1p.9r9hcjoqgajrj4gi 23h 2018-06-12T02:51:28Z authentication, The default bootstrap system:
signing token generated by bootstrappers:
'kubeadm init'. kubeadm:
default-node-token
By default, tokens expire after 24 hours. If you are joining a node to the cluster after the current token has expired,
you can create a new token by running the following command on the control-plane node:
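kubeadm token create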
The output is similar to this:
5didvk.d09sbcov8ph2amjw
If you don't have the value of --discovery-token-ca-cert-hash
, you can get it by running the following command chain on the control-plane node:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | \
openssl dgst -sha256 -hex | sed 's/^.* //'
The output is similar to:
8cb2de97839780a412b93877f8507ad6c94f73add17d5d7058e91741c9d5ec78
Note: To specify an IPv6 tuple for <control-plane-host>:<control-plane-port>
, IPv6 address must be enclosed in square brackets, for example: [fd00::101]:2073
.
The output should look something like:
[preflight] Running pre-flight checks
... (log output of join workflow) ...
Node join complete:
* Certificate signing request sent to control-plane and response
received.
* Kubelet informed of new secure connection details.
Run 'kubectl get nodes' on control-plane to see this machine join.
A few seconds later, you should notice this node in the output from kubectl get nodes
when run on the control-plane node.
(Optional) Controlling your cluster from machines other than the control-plane node
In order to get a kubectl on some other computer (e.g. laptop) to talk to your
cluster, you need to copy the administrator kubeconfig file from your control-plane node
to your workstation like this:
scp root@<control-plane-host>:/etc/kubernetes/admin.conf .
kubectl --kubeconfig ./admin.conf get nodes
Note: The example above assumes SSH access is enabled for root. If that is not the
case, you can copy the admin.conf
file to be accessible by some other user
and scp
using that other user instead.
The admin.conf
file gives the user superuser privileges over the cluster.
This file should be used sparingly. For normal users, it's recommended to generate a unique credential to which you grant privileges. You can do this with the kubeadm alpha kubeconfig user --client-name <CN> command. That command will print out a KubeConfig file to STDOUT which you should save to a file and distribute to your user. After that, grant privileges by using kubectl create (cluster)rolebinding.
(Optional) Proxying API Server to localhost
If you want to connect to the API Server from outside the cluster you can use
kubectl proxy
:
scp root@<control-plane-host>:/etc/kubernetes/admin.conf .
kubectl --kubeconfig ./admin.conf proxy
You can now access the API Server locally at http://localhost:8001/api/v1
Clean up
If you used disposable servers for your cluster, for testing, you can
switch those off and do no further clean up. You can use
kubectl config delete-cluster
to delete your local references to the
cluster.
However, if you want to deprovision your cluster more cleanly, you should
first drain the node
and make sure that the node is empty, then deconfigure the node.
Remove the node
Talking to the control-plane node with the appropriate credentials, run:
kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
Before removing the node, reset the state installed by kubeadm
:
The reset process does not reset or clean up iptables rules or IPVS tables. If you wish to reset iptables, you must do so manually:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
If you want to reset the IPVS tables, you must run the following command:
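ipvsadm -C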
Now remove the node:
kubectl delete node <node name>
If you wish to start over, run kubeadm init
or kubeadm join
with the
appropriate arguments.
Clean up the control plane
You can use kubeadm reset
on the control plane host to trigger a best-effort
clean up.
See the kubeadm reset
reference documentation for more information about this subcommand and its
options.
What's next
Feedback
Version skew policy
The kubeadm tool of version v1.22 may deploy clusters with a control plane of version v1.22 or v1.21. kubeadm v1.22 can also upgrade an existing kubeadm-created cluster of version v1.21.
Because we can't see into the future, kubeadm CLI v1.22 may or may not be able to deploy v1.23 clusters.
These resources provide more information on supported version skew between kubelets and the control plane, and other Kubernetes components:
Limitations
Cluster resilience
The cluster created here has a single control-plane node, with a single etcd database
running on it. This means that if the control-plane node fails, your cluster may lose
data and may need to be recreated from scratch.
Workarounds:
kubeadm deb/rpm packages and binaries are built for amd64, arm (32-bit), arm64, ppc64le, and s390x
following the multi-platform
proposal.
Multiplatform container images for the control plane and addons are also supported since v1.12.
Only some of the network providers offer solutions for all platforms. Please consult the list of
network providers above or the documentation from each provider to figure out whether the provider
supports your chosen platform.
Troubleshooting
If you are running into difficulties with kubeadm, please consult our troubleshooting docs.
3.2.1.4 - Customizing control plane configuration with kubeadm
FEATURE STATE: Kubernetes v1.12 [stable]
The kubeadm ClusterConfiguration
object exposes the field extraArgs
that can override the default flags passed to control plane
components such as the APIServer, ControllerManager and Scheduler. The components are defined using the following fields:
apiServer
controllerManager
scheduler
The extraArgs
field consist of key: value
pairs. To override a flag for a control plane component:
- Add the appropriate fields to your configuration.
- Add the flags to override to the field.
- Run
kubeadm init
with --config <YOUR CONFIG YAML>
.
For more details on each field in the configuration you can navigate to our
API reference pages.
Note: You can generate a ClusterConfiguration
object with default values by running kubeadm config print init-defaults
and saving the output to a file of your choice.
APIServer flags
For details, see the reference documentation for kube-apiserver.
Example usage:
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.16.0
apiServer:
extraArgs:
advertise-address: 192.168.0.103
anonymous-auth: "false"
enable-admission-plugins: AlwaysPullImages,DefaultStorageClass
audit-log-path: /home/johndoe/audit.log
ControllerManager flags
For details, see the reference documentation for kube-controller-manager.
Example usage:
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.16.0
controllerManager:
extraArgs:
cluster-signing-key-file: /home/johndoe/keys/ca.key
bind-address: 0.0.0.0
deployment-controller-sync-period: "50"
Scheduler flags
For details, see the reference documentation for kube-scheduler.
Example usage:
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.16.0
scheduler:
extraArgs:
bind-address: 0.0.0.0
config: /home/johndoe/schedconfig.yaml
kubeconfig: /home/johndoe/kubeconfig.yaml
3.2.1.5 - Options for Highly Available topology
This page explains the two options for configuring the topology of your highly available (HA) Kubernetes clusters.
You can set up an HA cluster:
- With stacked control plane nodes, where etcd nodes are colocated with control plane nodes
- With external etcd nodes, where etcd runs on separate nodes from the control plane
You should carefully consider the advantages and disadvantages of each topology before setting up an HA cluster.
Note: kubeadm bootstraps the etcd cluster statically. Read the etcd
Clustering Guide
for more details.
Stacked etcd topology
A stacked HA cluster is a topology where the distributed
data storage cluster provided by etcd is stacked on top of the cluster formed by the nodes managed by
kubeadm that run control plane components.
Each control plane node runs an instance of the kube-apiserver
, kube-scheduler
, and kube-controller-manager
.
The kube-apiserver
is exposed to worker nodes using a load balancer.
Each control plane node creates a local etcd member and this etcd member communicates only with
the kube-apiserver
of this node. The same applies to the local kube-controller-manager
and kube-scheduler
instances.
This topology couples the control planes and etcd members on the same nodes. It is simpler to set up than a cluster
with external etcd nodes, and simpler to manage for replication.
However, a stacked cluster runs the risk of failed coupling. If one node goes down, both an etcd member and a control
plane instance are lost, and redundancy is compromised. You can mitigate this risk by adding more control plane nodes.
You should therefore run a minimum of three stacked control plane nodes for an HA cluster.
This is the default topology in kubeadm. A local etcd member is created automatically
on control plane nodes when using kubeadm init
and kubeadm join --control-plane
.

External etcd topology
An HA cluster with external etcd is a topology where the distributed data storage cluster provided by etcd is external to the cluster formed by the nodes that run control plane components.
Like the stacked etcd topology, each control plane node in an external etcd topology runs an instance of the kube-apiserver
, kube-scheduler
, and kube-controller-manager
. And the kube-apiserver
is exposed to worker nodes using a load balancer. However, etcd members run on separate hosts, and each etcd host communicates with the kube-apiserver
of each control plane node.
This topology decouples the control plane and etcd member. It therefore provides an HA setup where
losing a control plane instance or an etcd member has less impact and does not affect
the cluster redundancy as much as the stacked HA topology.
However, this topology requires twice the number of hosts as the stacked HA topology.
A minimum of three hosts for control plane nodes and three hosts for etcd nodes are required for an HA cluster with this topology.

What's next
3.2.1.6 - Creating Highly Available clusters with kubeadm
This page explains two different approaches to setting up a highly available Kubernetes
cluster using kubeadm:
- With stacked control plane nodes. This approach requires less infrastructure. The etcd members
and control plane nodes are co-located.
- With an external etcd cluster. This approach requires more infrastructure. The
control plane nodes and etcd members are separated.
Before proceeding, you should carefully consider which approach best meets the needs of your applications
and environment. This comparison topic outlines the advantages and disadvantages of each.
If you encounter issues with setting up the HA cluster, please provide us with feedback
in the kubeadm issue tracker.
See also The upgrade documentation.
Caution: This page does not address running your cluster on a cloud provider. In a cloud
environment, neither approach documented here works with Service objects of type
LoadBalancer, or with dynamic PersistentVolumes.
Before you begin
For both methods you need this infrastructure:
- Three machines that meet kubeadm's minimum requirements for
the control-plane nodes
- Three machines that meet kubeadm's minimum
requirements for the workers
- Full network connectivity between all machines in the cluster (public or
private network)
- sudo privileges on all machines
- SSH access from one device to all nodes in the system
kubeadm
and kubelet
installed on all machines. kubectl
is optional.
For the external etcd cluster only, you also need:
- Three additional machines for etcd members
First steps for both methods
Create load balancer for kube-apiserver
Note: There are many configurations for load balancers. The following example is only one
option. Your cluster requirements may need a different configuration.
-
Create a kube-apiserver load balancer with a name that resolves to DNS.
-
In a cloud environment you should place your control plane nodes behind a TCP
forwarding load balancer. This load balancer distributes traffic to all
healthy control plane nodes in its target list. The health check for
an apiserver is a TCP check on the port the kube-apiserver listens on
(default value :6443
).
-
It is not recommended to use an IP address directly in a cloud environment.
-
The load balancer must be able to communicate with all control plane nodes
on the apiserver port. It must also allow incoming traffic on its
listening port.
-
Make sure the address of the load balancer always matches
the address of kubeadm's ControlPlaneEndpoint
.
-
Read the Options for Software Load Balancing
guide for more details.
-
Add the first control plane nodes to the load balancer and test the
connection:
nc -v LOAD_BALANCER_IP PORT
- A connection refused error is expected because the apiserver is not yet
running. A timeout, however, means the load balancer cannot communicate
with the control plane node. If a timeout occurs, reconfigure the load
balancer to communicate with the control plane node.
-
Add the remaining control plane nodes to the load balancer target group.
Stacked control plane and etcd nodes
Steps for the first control plane node
-
Initialize the control plane:
sudo kubeadm init --control-plane-endpoint "LOAD_BALANCER_DNS:LOAD_BALANCER_PORT" --upload-certs
-
You can use the --kubernetes-version
flag to set the Kubernetes version to use.
It is recommended that the versions of kubeadm, kubelet, kubectl and Kubernetes match.
-
The --control-plane-endpoint
flag should be set to the address or DNS and port of the load balancer.
-
The --upload-certs
flag is used to upload the certificates that should be shared
across all the control-plane instances to the cluster. If instead, you prefer to copy certs across
control-plane nodes manually or using automation tools, please remove this flag and refer to Manual
certificate distribution section below.
Note: The kubeadm init flags --config and --certificate-key cannot be mixed, therefore if you want to use the kubeadm configuration you must add the certificateKey field in the appropriate config locations (under InitConfiguration and JoinConfiguration: controlPlane).
Note: Some CNI network plugins require additional configuration, for example specifying the pod IP CIDR, while others do not. See the CNI network documentation. To add a pod CIDR pass the flag --pod-network-cidr, or if you are using a kubeadm configuration file set the podSubnet field under the networking object of ClusterConfiguration.
-
The output looks similar to:
...
You can now join any number of control-plane node by running the following command on each as a root:
kubeadm join 192.168.0.200:6443 --token 9vr73a.a8uxyaju799qwdjv --discovery-token-ca-cert-hash sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866 --control-plane --certificate-key f8902e114ef118304e561c3ecd4d0b543adc226b7a07f675f56564185ffe0c07
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use kubeadm init phase upload-certs to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.0.200:6443 --token 9vr73a.a8uxyaju799qwdjv --discovery-token-ca-cert-hash sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866
-
Copy this output to a text file. You will need it later to join control plane and worker nodes to the cluster.
-
When --upload-certs
is used with kubeadm init
, the certificates of the primary control plane
are encrypted and uploaded in the kubeadm-certs
Secret.
-
To re-upload the certificates and generate a new decryption key, use the following command on a control plane
node that is already joined to the cluster:
sudo kubeadm init phase upload-certs --upload-certs
-
You can also specify a custom --certificate-key
during init
that can later be used by join
.
To generate such a key you can use the following command:
kubeadm certs certificate-key
Note: The kubeadm-certs
Secret and decryption key expire after two hours.
Caution: As stated in the command output, the certificate key gives access to cluster sensitive data, keep it secret!
-
Apply the CNI plugin of your choice:
Follow these instructions
to install the CNI provider. Make sure the configuration corresponds to the Pod CIDR specified in the kubeadm configuration file if applicable.
In this example we are using Weave Net:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
-
Type the following and watch the pods of the control plane components get started:
kubectl get pod -n kube-system -w
Steps for the rest of the control plane nodes
Note: Since kubeadm version 1.15 you can join multiple control-plane nodes in parallel.
Prior to this version, you must join new control plane nodes sequentially, only after
the first node has finished initializing.
For each additional control plane node you should:
-
Execute the join command that was previously given to you by the kubeadm init
output on the first node.
It should look something like this:
sudo kubeadm join 192.168.0.200:6443 --token 9vr73a.a8uxyaju799qwdjv --discovery-token-ca-cert-hash sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866 --control-plane --certificate-key f8902e114ef118304e561c3ecd4d0b543adc226b7a07f675f56564185ffe0c07
- The
--control-plane
flag tells kubeadm join
to create a new control plane.
- The
--certificate-key ...
will cause the control plane certificates to be downloaded
from the kubeadm-certs
Secret in the cluster and be decrypted using the given key.
External etcd nodes
Setting up a cluster with external etcd nodes is similar to the procedure used for stacked etcd with the exception that you should set up etcd first, and you should pass the etcd information in the kubeadm config file.
Set up the etcd cluster
-
Follow these instructions to set up the etcd cluster.
-
Set up SSH as described here.
-
Copy the following files from any etcd node in the cluster to the first control plane node:
export CONTROL_PLANE="ubuntu@10.0.0.7"
scp /etc/kubernetes/pki/etcd/ca.crt "${CONTROL_PLANE}":
scp /etc/kubernetes/pki/apiserver-etcd-client.crt "${CONTROL_PLANE}":
scp /etc/kubernetes/pki/apiserver-etcd-client.key "${CONTROL_PLANE}":
- Replace the value of
CONTROL_PLANE
with the user@host
of the first control-plane node.
Set up the first control plane node
-
Create a file called kubeadm-config.yaml
with the following contents:
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "LOAD_BALANCER_DNS:LOAD_BALANCER_PORT"
etcd:
external:
endpoints:
- https://ETCD_0_IP:2379
- https://ETCD_1_IP:2379
- https://ETCD_2_IP:2379
caFile: /etc/kubernetes/pki/etcd/ca.crt
certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
Note: The difference between stacked etcd and external etcd here is that the external etcd setup requires
a configuration file with the etcd endpoints under the external
object for etcd
.
In the case of the stacked etcd topology this is managed automatically.
- Replace the following variables in the config template with the appropriate values for your cluster:
- `LOAD_BALANCER_DNS`
- `LOAD_BALANCER_PORT`
- `ETCD_0_IP`
- `ETCD_1_IP`
- `ETCD_2_IP`
The following steps are similar to the stacked etcd setup:
-
Run sudo kubeadm init --config kubeadm-config.yaml --upload-certs
on this node.
-
Write the output join commands that are returned to a text file for later use.
-
Apply the CNI plugin of your choice. The given example is for Weave Net:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
Steps for the rest of the control plane nodes
The steps are the same as for the stacked etcd setup:
- Make sure the first control plane node is fully initialized.
- Join each control plane node with the join command you saved to a text file. It's recommended
to join the control plane nodes one at a time.
- Don't forget that the decryption key from
--certificate-key
expires after two hours, by default.
Common tasks after bootstrapping control plane
Install workers
Worker nodes can be joined to the cluster with the command you stored previously
as the output from the kubeadm init
command:
sudo kubeadm join 192.168.0.200:6443 --token 9vr73a.a8uxyaju799qwdjv --discovery-token-ca-cert-hash sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866
Manual certificate distribution
If you choose to not use kubeadm init with the --upload-certs flag this means that you are going to have to manually copy the certificates from the primary control plane node to the joining control plane nodes.
There are many ways to do this. In the following example we are using ssh and scp:
SSH is required if you want to control all nodes from a single machine.
-
Enable ssh-agent on your main device that has access to all other nodes in
the system:
eval $(ssh-agent)
-
Add your SSH identity to the session:
ssh-add ~/.ssh/path_to_private_key
-
SSH between nodes to check that the connection is working correctly.
-
When you SSH to any node, make sure to add the -A
flag:
ssh -A 10.0.0.7
-
When using sudo on any node, make sure to preserve the environment so SSH
forwarding works:
sudo -E -s
-
After configuring SSH on all the nodes you should run the following script on the first control plane node after
running kubeadm init
. This script will copy the certificates from the first control plane node to the other
control plane nodes:
In the following example, replace CONTROL_PLANE_IPS
with the IP addresses of the
other control plane nodes.
USER=ubuntu # customizable
CONTROL_PLANE_IPS="10.0.0.7 10.0.0.8"
for host in ${CONTROL_PLANE_IPS}; do
scp /etc/kubernetes/pki/ca.crt "${USER}"@$host:
scp /etc/kubernetes/pki/ca.key "${USER}"@$host:
scp /etc/kubernetes/pki/sa.key "${USER}"@$host:
scp /etc/kubernetes/pki/sa.pub "${USER}"@$host:
scp /etc/kubernetes/pki/front-proxy-ca.crt "${USER}"@$host:
scp /etc/kubernetes/pki/front-proxy-ca.key "${USER}"@$host:
scp /etc/kubernetes/pki/etcd/ca.crt "${USER}"@$host:etcd-ca.crt
# Comment out this line if you are using external etcd
scp /etc/kubernetes/pki/etcd/ca.key "${USER}"@$host:etcd-ca.key
done
Caution: Copy only the certificates in the above list. kubeadm will take care of generating the rest of the certificates
with the required SANs for the joining control-plane instances. If you copy all the certificates by mistake,
the creation of additional nodes could fail due to a lack of required SANs.
-
Then on each joining control plane node you have to run the following script before running kubeadm join
.
This script will move the previously copied certificates from the home directory to /etc/kubernetes/pki
:
USER=ubuntu # customizable
mkdir -p /etc/kubernetes/pki/etcd
mv /home/${USER}/ca.crt /etc/kubernetes/pki/
mv /home/${USER}/ca.key /etc/kubernetes/pki/
mv /home/${USER}/sa.pub /etc/kubernetes/pki/
mv /home/${USER}/sa.key /etc/kubernetes/pki/
mv /home/${USER}/front-proxy-ca.crt /etc/kubernetes/pki/
mv /home/${USER}/front-proxy-ca.key /etc/kubernetes/pki/
mv /home/${USER}/etcd-ca.crt /etc/kubernetes/pki/etcd/ca.crt
# Comment out this line if you are using external etcd
mv /home/${USER}/etcd-ca.key /etc/kubernetes/pki/etcd/ca.key
3.2.1.7 - Set up a High Availability etcd cluster with kubeadm
Note: While kubeadm is being used as the management tool for external etcd nodes
in this guide, please note that kubeadm does not plan to support certificate rotation
or upgrades for such nodes. The long term plan is to empower the tool
etcdadm to manage these
aspects.
Kubeadm defaults to running a single member etcd cluster in a static pod managed
by the kubelet on the control plane node. This is not a high availability setup
as the etcd cluster contains only one member and cannot sustain any members
becoming unavailable. This task walks through the process of creating a high
availability etcd cluster of three members that can be used as an external etcd
when using kubeadm to set up a kubernetes cluster.
Before you begin
- Three hosts that can talk to each other over ports 2379 and 2380. This
document assumes these default ports. However, they are configurable through
the kubeadm config file.
- Each host must have docker, kubelet, and kubeadm installed.
- Each host should have access to the Kubernetes container image registry (k8s.gcr.io) or list/pull the required etcd image using kubeadm config images list/pull. This guide will set up etcd instances as static pods managed by a kubelet.
- Some infrastructure to copy files between hosts. For example
ssh
and scp
can satisfy this requirement.
Setting up the cluster
The general approach is to generate all certs on one node and only distribute
the necessary files to the other nodes.
Note: kubeadm contains all the necessary cryptographic machinery to generate the certificates described below; no other cryptographic tooling is required for this example.
-
Configure the kubelet to be a service manager for etcd.
Note: You must do this on every host where etcd should be running.
Since etcd was created first, you must override the service priority by creating a new unit file
that has higher precedence than the kubeadm-provided kubelet unit file.
cat << EOF > /etc/systemd/system/kubelet.service.d/20-etcd-service-manager.conf
[Service]
ExecStart=
# Replace "systemd" with the cgroup driver of your container runtime. The default value in the kubelet is "cgroupfs".
ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd
Restart=always
EOF
systemctl daemon-reload
systemctl restart kubelet
Check the kubelet status to ensure it is running.
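For example:
systemctl status kubelet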
-
Create configuration files for kubeadm.
Generate one kubeadm configuration file for each host that will have an etcd
member running on it using the following script.
# Update HOST0, HOST1, and HOST2 with the IPs or resolvable names of your hosts
export HOST0=10.0.0.6
export HOST1=10.0.0.7
export HOST2=10.0.0.8
# Create temp directories to store files that will end up on other hosts.
mkdir -p /tmp/${HOST0}/ /tmp/${HOST1}/ /tmp/${HOST2}/
ETCDHOSTS=(${HOST0} ${HOST1} ${HOST2})
NAMES=("infra0" "infra1" "infra2")
for i in "${!ETCDHOSTS[@]}"; do
HOST=${ETCDHOSTS[$i]}
NAME=${NAMES[$i]}
cat << EOF > /tmp/${HOST}/kubeadmcfg.yaml
apiVersion: "kubeadm.k8s.io/v1beta2"
kind: ClusterConfiguration
etcd:
    local:
        serverCertSANs:
        - "${HOST}"
        peerCertSANs:
        - "${HOST}"
        extraArgs:
            initial-cluster: ${NAMES[0]}=https://${ETCDHOSTS[0]}:2380,${NAMES[1]}=https://${ETCDHOSTS[1]}:2380,${NAMES[2]}=https://${ETCDHOSTS[2]}:2380
            initial-cluster-state: new
            name: ${NAME}
            listen-peer-urls: https://${HOST}:2380
            listen-client-urls: https://${HOST}:2379
            advertise-client-urls: https://${HOST}:2379
            initial-advertise-peer-urls: https://${HOST}:2380
EOF
done
-
Generate the certificate authority
If you already have a CA, the only required action is to copy the CA's crt and key files to /etc/kubernetes/pki/etcd/ca.crt and /etc/kubernetes/pki/etcd/ca.key. After those files have been copied,
proceed to the next step, "Create certificates for each member".
If you do not already have a CA, run this command on $HOST0 (where you
generated the configuration files for kubeadm):
kubeadm init phase certs etcd-ca
This creates two files:
/etc/kubernetes/pki/etcd/ca.crt
/etc/kubernetes/pki/etcd/ca.key
-
Create certificates for each member
kubeadm init phase certs etcd-server --config=/tmp/${HOST2}/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/tmp/${HOST2}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST2}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST2}/kubeadmcfg.yaml
cp -R /etc/kubernetes/pki /tmp/${HOST2}/
# cleanup non-reusable certificates
find /etc/kubernetes/pki -not -name ca.crt -not -name ca.key -type f -delete
kubeadm init phase certs etcd-server --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST1}/kubeadmcfg.yaml
cp -R /etc/kubernetes/pki /tmp/${HOST1}/
find /etc/kubernetes/pki -not -name ca.crt -not -name ca.key -type f -delete
kubeadm init phase certs etcd-server --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST0}/kubeadmcfg.yaml
# No need to move the certs because they are for HOST0
# clean up certs that should not be copied off this host
find /tmp/${HOST2} -name ca.key -type f -delete
find /tmp/${HOST1} -name ca.key -type f -delete
-
Copy certificates and kubeadm configs
The certificates have been generated and now they must be moved to their
respective hosts.
USER=ubuntu
HOST=${HOST1}
scp -r /tmp/${HOST}/* ${USER}@${HOST}:
ssh ${USER}@${HOST}
USER@HOST $ sudo -Es
root@HOST $ chown -R root:root pki
root@HOST $ mv pki /etc/kubernetes/
-
Ensure all expected files exist
The complete list of required files on $HOST0 is:
/tmp/${HOST0}
└── kubeadmcfg.yaml
---
/etc/kubernetes/pki
├── apiserver-etcd-client.crt
├── apiserver-etcd-client.key
└── etcd
├── ca.crt
├── ca.key
├── healthcheck-client.crt
├── healthcheck-client.key
├── peer.crt
├── peer.key
├── server.crt
└── server.key
On $HOST1:
$HOME
└── kubeadmcfg.yaml
---
/etc/kubernetes/pki
├── apiserver-etcd-client.crt
├── apiserver-etcd-client.key
└── etcd
├── ca.crt
├── healthcheck-client.crt
├── healthcheck-client.key
├── peer.crt
├── peer.key
├── server.crt
└── server.key
On $HOST2:
$HOME
└── kubeadmcfg.yaml
---
/etc/kubernetes/pki
├── apiserver-etcd-client.crt
├── apiserver-etcd-client.key
└── etcd
├── ca.crt
├── healthcheck-client.crt
├── healthcheck-client.key
├── peer.crt
├── peer.key
├── server.crt
└── server.key
-
Create the static pod manifests
Now that the certificates and configs are in place, it's time to create the
manifests. On each host, run the kubeadm command to generate a static manifest
for etcd.
root@HOST0 $ kubeadm init phase etcd local --config=/tmp/${HOST0}/kubeadmcfg.yaml
root@HOST1 $ kubeadm init phase etcd local --config=/home/ubuntu/kubeadmcfg.yaml
root@HOST2 $ kubeadm init phase etcd local --config=/home/ubuntu/kubeadmcfg.yaml
-
Optional: Check the cluster health
docker run --rm -it \
--net host \
-v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:${ETCD_TAG} etcdctl \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--endpoints https://${HOST0}:2379 endpoint health --cluster
...
https://[HOST0 IP]:2379 is healthy: successfully committed proposal: took = 16.283339ms
https://[HOST1 IP]:2379 is healthy: successfully committed proposal: took = 19.44402ms
https://[HOST2 IP]:2379 is healthy: successfully committed proposal: took = 35.926451ms
- Set ${ETCD_TAG} to the version tag of your etcd image, for example 3.4.3-0. To see the etcd image and tag that kubeadm uses, execute kubeadm config images list --kubernetes-version ${K8S_VERSION}, where ${K8S_VERSION} is for example v1.17.0.
- Set ${HOST0} to the IP address of the host you are testing.
What's next
Once you have a working 3 member etcd cluster, you can continue setting up a
highly available control plane using the external etcd method with
kubeadm.
3.2.1.8 - Configuring each kubelet in your cluster using kubeadm
FEATURE STATE: Kubernetes v1.11 [stable]
The lifecycle of the kubeadm CLI tool is decoupled from the
kubelet, which is a daemon that runs
on each node within the Kubernetes cluster. The kubeadm CLI tool is executed by the user when Kubernetes is
initialized or upgraded, whereas the kubelet is always running in the background.
Since the kubelet is a daemon, it needs to be maintained by some kind of an init
system or service manager. When the kubelet is installed using DEBs or RPMs,
systemd is configured to manage the kubelet. You can use a different service
manager instead, but you need to configure it manually.
Some kubelet configuration details need to be the same across all kubelets involved in the cluster, while
other configuration aspects need to be set on a per-kubelet basis to accommodate the different
characteristics of a given machine (such as OS, storage, and networking). You can manage the configuration
of your kubelets manually, but kubeadm now provides a KubeletConfiguration
API type for
managing your kubelet configurations centrally.
Kubelet configuration patterns
The following sections describe patterns to kubelet configuration that are simplified by
using kubeadm, rather than managing the kubelet configuration for each Node manually.
Propagating cluster-level configuration to each kubelet
You can provide the kubelet with default values to be used by kubeadm init
and kubeadm join
commands. Interesting examples include using a different CRI runtime or setting the default subnet
used by services.
If you want your services to use the subnet 10.96.0.0/12
as the default for services, you can pass
the --service-cidr
parameter to kubeadm:
kubeadm init --service-cidr 10.96.0.0/12
Virtual IPs for services are now allocated from this subnet. You also need to set the DNS address used
by the kubelet, using the --cluster-dns
flag. This setting needs to be the same for every kubelet
on every manager and Node in the cluster. The kubelet provides a versioned, structured API object
that can configure most parameters in the kubelet and push out this configuration to each running
kubelet in the cluster. This object is called
KubeletConfiguration
.
The KubeletConfiguration
allows the user to specify flags such as the cluster DNS IP addresses expressed as
a list of values to a camelCased key, illustrated by the following example:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
- 10.96.0.10
For more details on the KubeletConfiguration
have a look at this section.
Providing instance-specific configuration details
Some hosts require specific kubelet configurations due to differences in hardware, operating system,
networking, or other host-specific parameters. The following list provides a few examples.
-
The path to the DNS resolution file, as specified by the --resolv-conf
kubelet
configuration flag, may differ among operating systems, or depending on whether you are using
systemd-resolved
. If this path is wrong, DNS resolution will fail on the Node whose kubelet
is configured incorrectly.
-
The Node API object .metadata.name
is set to the machine's hostname by default,
unless you are using a cloud provider. You can use the --hostname-override
flag to override the
default behavior if you need to specify a Node name different from the machine's hostname.
-
Currently, the kubelet cannot automatically detect the cgroup driver used by the CRI runtime,
but the value of --cgroup-driver
must match the cgroup driver used by the CRI runtime to ensure
the health of the kubelet.
-
Depending on the CRI runtime your cluster uses, you may need to specify different flags to the kubelet.
For instance, when using Docker, you need to specify flags such as --network-plugin=cni
, but if you
are using an external runtime, you need to specify --container-runtime=remote
and specify the CRI
endpoint using the --container-runtime-endpoint=<path>
.
You can specify these flags by configuring an individual kubelet's configuration in your service manager,
such as systemd.
It is possible to configure the kubelet that kubeadm will start if a custom KubeletConfiguration
API object is passed with a configuration file, like so: kubeadm ... --config some-config-file.yaml.
By calling kubeadm config print init-defaults --component-configs KubeletConfiguration you can
see all the default values for this structure.
Also have a look at the
reference for the KubeletConfiguration
for more information on the individual fields.
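As a minimal sketch (not a complete configuration; the cgroupDriver and clusterDNS values shown here are illustrative assumptions, not required defaults), a kubeadm configuration file that embeds a custom KubeletConfiguration could look like this:
# kubeadm-config.yaml (illustrative sketch)
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.21.0
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
You would then pass this file to kubeadm, for example: kubeadm init --config kubeadm-config.yaml.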
Workflow when using kubeadm init
When you call kubeadm init
, the kubelet configuration is marshalled to disk
at /var/lib/kubelet/config.yaml
, and also uploaded to a ConfigMap in the cluster. The ConfigMap
is named kubelet-config-1.X
, where X
is the minor version of the Kubernetes version you are
initializing. A kubelet configuration file is also written to /etc/kubernetes/kubelet.conf
with the
baseline cluster-wide configuration for all kubelets in the cluster. This configuration file
points to the client certificates that allow the kubelet to communicate with the API server. This
addresses the need to
propagate cluster-level configuration to each kubelet.
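As a quick check, assuming a v1.21 cluster (so the ConfigMap is named kubelet-config-1.21), you can compare the uploaded configuration with the copy on disk:
kubectl get configmap -n kube-system kubelet-config-1.21 -o yaml
cat /var/lib/kubelet/config.yaml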
To address the second pattern of
providing instance-specific configuration details,
kubeadm writes an environment file to /var/lib/kubelet/kubeadm-flags.env
, which contains a list of
flags to pass to the kubelet when it starts. The flags are presented in the file like this:
KUBELET_KUBEADM_ARGS="--flag1=value1 --flag2=value2 ..."
In addition to the flags used when starting the kubelet, the file also contains dynamic
parameters such as the cgroup driver and whether to use a different CRI runtime socket
(--cri-socket
).
After marshalling these two files to disk, kubeadm attempts to run the following two
commands, if you are using systemd:
systemctl daemon-reload && systemctl restart kubelet
If the reload and restart are successful, the normal kubeadm init
workflow continues.
Workflow when using kubeadm join
When you run kubeadm join
, kubeadm uses the Bootstrap Token credential to perform
a TLS bootstrap, which fetches the credential needed to download the
kubelet-config-1.X
ConfigMap and writes it to /var/lib/kubelet/config.yaml
. The dynamic
environment file is generated in exactly the same way as kubeadm init
.
Next, kubeadm
runs the following two commands to load the new configuration into the kubelet:
systemctl daemon-reload && systemctl restart kubelet
After the kubelet loads the new configuration, kubeadm writes the
/etc/kubernetes/bootstrap-kubelet.conf
KubeConfig file, which contains a CA certificate and Bootstrap
Token. These are used by the kubelet to perform the TLS Bootstrap and obtain a unique
credential, which is stored in /etc/kubernetes/kubelet.conf
. When this file is written, the kubelet
has finished performing the TLS Bootstrap.
The kubelet drop-in file for systemd
kubeadm
ships with configuration for how systemd should run the kubelet.
Note that the kubeadm CLI command never touches this drop-in file.
This configuration file, installed by the kubeadm DEB or RPM package, is written to
/etc/systemd/system/kubelet.service.d/10-kubeadm.conf and is used by systemd.
It augments the basic kubelet.service for RPM or kubelet.service for DEB:
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf
--kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generate at runtime, populating
the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably,
# the user should use the .NodeRegistration.KubeletExtraArgs object in the configuration files instead.
# KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
This file specifies the default locations for all of the files managed by kubeadm for the kubelet.
- The KubeConfig file to use for the TLS Bootstrap is /etc/kubernetes/bootstrap-kubelet.conf, but it is only used if /etc/kubernetes/kubelet.conf does not exist.
- The KubeConfig file with the unique kubelet identity is /etc/kubernetes/kubelet.conf.
- The file containing the kubelet's ComponentConfig is /var/lib/kubelet/config.yaml.
- The dynamic environment file that contains KUBELET_KUBEADM_ARGS is sourced from /var/lib/kubelet/kubeadm-flags.env.
- The file that can contain user-specified flag overrides with KUBELET_EXTRA_ARGS is sourced from /etc/default/kubelet (for DEBs), or /etc/sysconfig/kubelet (for RPMs). KUBELET_EXTRA_ARGS is last in the flag chain and has the highest priority in the event of conflicting settings.
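For example, a minimal override in /etc/default/kubelet (or /etc/sysconfig/kubelet on RPM-based systems) might look like the following; the node IP is a made-up value used only for illustration:
KUBELET_EXTRA_ARGS="--node-ip=10.0.0.15"
After editing this file, restart the kubelet (systemctl restart kubelet) for the change to take effect.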
Kubernetes binaries and package contents
The DEB and RPM packages shipped with the Kubernetes releases are:
Package name | Description
--- | ---
kubeadm | Installs the /usr/bin/kubeadm CLI tool and the kubelet drop-in file for the kubelet.
kubelet | Installs the kubelet binary in /usr/bin and CNI binaries in /opt/cni/bin.
kubectl | Installs the /usr/bin/kubectl binary.
cri-tools | Installs the /usr/bin/crictl binary from the cri-tools git repository.
3.2.1.9 - Dual-stack support with kubeadm
FEATURE STATE: Kubernetes v1.21 [beta]
Your Kubernetes cluster can run in dual-stack networking mode, which means that cluster networking lets you use either address family. In a dual-stack cluster, the control plane can assign both an IPv4 address and an IPv6 address to a single Pod or a Service.
Before you begin
You need to have installed the kubeadm tool, following the steps from Installing kubeadm.
For each server that you want to use as a node, make sure it allows IPv6 forwarding. On Linux, you can set this by running sysctl -w net.ipv6.conf.all.forwarding=1 as the root user on each server.
You need to have an IPv4 and an IPv6 address range to use. Cluster operators typically
use private address ranges for IPv4. For IPv6, a cluster operator typically chooses a global
unicast address block from within 2000::/3
, using a range that is assigned to the operator.
You don't have to route the cluster's IP address ranges to the public internet.
The size of the IP address allocations should be suitable for the number of Pods and
Services that you are planning to run.
Note: If you are upgrading an existing cluster then, by default, the kubeadm upgrade command
changes the feature gate IPv6DualStack to true if that is not already enabled.
However, kubeadm does not support making modifications to the pod IP address range
(“cluster CIDR”) nor to the cluster's Service address range (“Service CIDR”).
Create a dual-stack cluster
To create a dual-stack cluster with kubeadm init
you can pass command line arguments
similar to the following example:
# These address ranges are examples
kubeadm init --pod-network-cidr=10.244.0.0/16,2001:db8:42:0::/56 --service-cidr=10.96.0.0/16,2001:db8:42:1::/112
To make things clearer, here is an example kubeadm configuration file kubeadm-config.yaml
for the primary dual-stack control plane node.
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
featureGates:
  IPv6DualStack: true
networking:
  podSubnet: 10.244.0.0/16,2001:db8:42:0::/56
  serviceSubnet: 10.96.0.0/16,2001:db8:42:1::/112
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "10.100.0.1"
  bindPort: 6443
nodeRegistration:
  kubeletExtraArgs:
    node-ip: 10.100.0.2,fd00:1:2:3::2
advertiseAddress in InitConfiguration specifies the IP address that the API Server will advertise it is listening on. The value of advertiseAddress equals the --apiserver-advertise-address flag of kubeadm init.
Run kubeadm to initiate the dual-stack control plane node:
kubeadm init --config=kubeadm-config.yaml
Currently, the kube-controller-manager flags --node-cidr-mask-size-ipv4|--node-cidr-mask-size-ipv6
are being left with default values. See enable IPv4/IPv6 dual stack.
Note: The --apiserver-advertise-address
flag does not support dual-stack.
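Once the control plane is up, one way (a sketch, not the only option) to confirm dual-stack allocation is to check that each Node received both an IPv4 and an IPv6 Pod CIDR:
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDRs}{"\n"}{end}'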
Join a node to dual-stack cluster
Before joining a node, make sure that the node has IPv6 routable network interface and allows IPv6 forwarding.
Here is an example kubeadm configuration file kubeadm-config.yaml
for joining a worker node to the cluster.
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: 10.100.0.1:6443
nodeRegistration:
  kubeletExtraArgs:
    node-ip: 10.100.0.3,fd00:1:2:3::3
Also, here is an example kubeadm configuration file kubeadm-config.yaml
for joining another control plane node to the cluster.
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
controlPlane:
  localAPIEndpoint:
    advertiseAddress: "10.100.0.2"
    bindPort: 6443
discovery:
  bootstrapToken:
    apiServerEndpoint: 10.100.0.1:6443
nodeRegistration:
  kubeletExtraArgs:
    node-ip: 10.100.0.4,fd00:1:2:3::4
advertiseAddress in JoinConfiguration.controlPlane specifies the IP address that the API Server will advertise it is listening on. The value of advertiseAddress equals the --apiserver-advertise-address flag of kubeadm join.
kubeadm join --config=kubeadm-config.yaml ...
Create a single-stack cluster
Note: Enabling the dual-stack feature doesn't mean that you need to use dual-stack addressing.
You can deploy a single-stack cluster that has the dual-stack networking feature enabled.
In 1.21 the IPv6DualStack
feature is Beta and the feature gate is defaulted to true
. To disable the feature you must configure the feature gate to false
. Note that once the feature is GA, the feature gate will be removed.
kubeadm init --feature-gates IPv6DualStack=false
To make things more clear, here is an example kubeadm configuration file kubeadm-config.yaml
for the single-stack control plane node.
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
featureGates:
  IPv6DualStack: false
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/16
What's next
3.2.2 - Installing Kubernetes with kops
This quickstart shows you how to easily install a Kubernetes cluster on AWS.
It uses a tool called kops
.
kops is an automated provisioning system:
- Fully automated installation
- Uses DNS to identify clusters
- Self-healing: everything runs in Auto-Scaling Groups
- Multiple OS support (Debian, Ubuntu 16.04 supported, CentOS & RHEL, Amazon Linux and CoreOS) - see the images.md
- High-Availability support - see the high_availability.md
- Can directly provision, or generate terraform manifests - see the terraform.md
Before you begin
Creating a cluster
(1/5) Install kops
Installation
Download kops from the releases page (it is also convenient to build from source):
On macOS, download the latest release with the command:
curl -LO https://github.com/kubernetes/kops/releases/download/$(curl -s https://api.github.com/repos/kubernetes/kops/releases/latest | grep tag_name | cut -d '"' -f 4)/kops-darwin-amd64
To download a specific version, replace the following portion of the command with the specific kops version.
$(curl -s https://api.github.com/repos/kubernetes/kops/releases/latest | grep tag_name | cut -d '"' -f 4)
For example, to download kops version v1.20.0 type:
curl -LO https://github.com/kubernetes/kops/releases/download/v1.20.0/kops-darwin-amd64
Make the kops binary executable.
chmod +x kops-darwin-amd64
Move the kops binary in to your PATH.
sudo mv kops-darwin-amd64 /usr/local/bin/kops
You can also install kops using Homebrew.
brew update && brew install kops
On Linux, download the latest release with the command:
curl -LO https://github.com/kubernetes/kops/releases/download/$(curl -s https://api.github.com/repos/kubernetes/kops/releases/latest | grep tag_name | cut -d '"' -f 4)/kops-linux-amd64
To download a specific version of kops, replace the following portion of the command with the specific kops version.
$(curl -s https://api.github.com/repos/kubernetes/kops/releases/latest | grep tag_name | cut -d '"' -f 4)
For example, to download kops version v1.20.0 type:
curl -LO https://github.com/kubernetes/kops/releases/download/v1.20.0/kops-linux-amd64
Make the kops binary executable
chmod +x kops-linux-amd64
Move the kops binary in to your PATH.
sudo mv kops-linux-amd64 /usr/local/bin/kops
You can also install kops using Homebrew.
brew update && brew install kops
(2/5) Create a route53 domain for your cluster
kops uses DNS for discovery, both inside the cluster and outside, so that you can reach the kubernetes API server
from clients.
kops has a strong opinion on the cluster name: it should be a valid DNS name. By doing so you will
no longer get your clusters confused, you can share clusters with your colleagues unambiguously,
and you can reach them without relying on remembering an IP address.
You can, and probably should, use subdomains to divide your clusters. As our example we will use
useast1.dev.example.com
. The API server endpoint will then be api.useast1.dev.example.com
.
A Route53 hosted zone can serve subdomains. Your hosted zone could be useast1.dev.example.com
,
but also dev.example.com
or even example.com
. kops works with any of these, so typically
you choose for organization reasons (e.g. you are allowed to create records under dev.example.com
,
but not under example.com
).
Let's assume you're using dev.example.com
as your hosted zone. You create that hosted zone using
the normal process, or
with a command such as aws route53 create-hosted-zone --name dev.example.com --caller-reference 1
.
You must then set up your NS records in the parent domain, so that records in the domain will resolve. Here,
you would create NS records in example.com
for dev
. If it is a root domain name you would configure the NS
records at your domain registrar (e.g. example.com
would need to be configured where you bought example.com
).
Verify your route53 domain setup (it is the #1 cause of problems!). You can double-check that
your cluster is configured correctly if you have the dig tool by running:
dig NS dev.example.com
You should see the 4 NS records that Route53 assigned your hosted zone.
(3/5) Create an S3 bucket to store your clusters state
kops lets you manage your clusters even after installation. To do this, it must keep track of the clusters
that you have created, along with their configuration, the keys they are using etc. This information is stored
in an S3 bucket. S3 permissions are used to control access to the bucket.
Multiple clusters can use the same S3 bucket, and you can share an S3 bucket between your colleagues that
administer the same clusters - this is much easier than passing around kubecfg files. But anyone with access
to the S3 bucket will have administrative access to all your clusters, so you don't want to share it beyond
the operations team.
So typically you have one S3 bucket for each ops team (and often the name will correspond
to the name of the hosted zone above!)
In our example, we chose dev.example.com
as our hosted zone, so let's pick clusters.dev.example.com
as
the S3 bucket name.
-
Export AWS_PROFILE
(if you need to select a profile for the AWS CLI to work)
-
Create the S3 bucket using aws s3 mb s3://clusters.dev.example.com
-
You can export KOPS_STATE_STORE=s3://clusters.dev.example.com
and then kops will use this location by default.
We suggest putting this in your bash profile or similar.
(4/5) Build your cluster configuration
Run kops create cluster
to create your cluster configuration:
kops create cluster --zones=us-east-1c useast1.dev.example.com
kops will create the configuration for your cluster. Note that it only creates the configuration; it does
not actually create the cloud resources - you'll do that in the next step with kops update cluster. This
gives you an opportunity to review the configuration or change it.
It prints commands you can use to explore further:
- List your clusters with:
kops get cluster
- Edit this cluster with:
kops edit cluster useast1.dev.example.com
- Edit your node instance group:
kops edit ig --name=useast1.dev.example.com nodes
- Edit your master instance group:
kops edit ig --name=useast1.dev.example.com master-us-east-1c
If this is your first time using kops, do spend a few minutes to try those out! An instance group is a
set of instances, which will be registered as kubernetes nodes. On AWS this is implemented via auto-scaling-groups.
You can have several instance groups, for example if you wanted nodes that are a mix of spot and on-demand instances, or
GPU and non-GPU instances.
(5/5) Create the cluster in AWS
Run "kops update cluster" to create your cluster in AWS:
kops update cluster useast1.dev.example.com --yes
That takes a few seconds to run, but then your cluster will likely take a few minutes to actually be ready.
kops update cluster
will be the tool you'll use whenever you change the configuration of your cluster; it
applies the changes you have made to the configuration to your cluster - reconfiguring AWS or kubernetes as needed.
For example, after you kops edit ig nodes, run kops update cluster --yes to apply your configuration, and
sometimes you will also have to run kops rolling-update cluster to roll out the configuration immediately.
Without --yes
, kops update cluster
will show you a preview of what it is going to do. This is handy
for production clusters!
Explore other add-ons
See the list of add-ons to explore other add-ons, including tools for logging, monitoring, network policy, visualization, and control of your Kubernetes cluster.
Cleanup
- To delete your cluster:
kops delete cluster useast1.dev.example.com --yes
What's next
3.2.3 - Installing Kubernetes with Kubespray
This quickstart helps to install a Kubernetes cluster hosted on GCE, Azure, OpenStack, AWS, vSphere, Packet (bare metal), Oracle Cloud Infrastructure (Experimental) or Baremetal with Kubespray.
Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS/Kubernetes clusters configuration management tasks. Kubespray provides:
- a highly available cluster
- composable attributes
- support for most popular Linux distributions
- Ubuntu 16.04, 18.04, 20.04
- CentOS/RHEL/Oracle Linux 7, 8
- Debian Buster, Jessie, Stretch, Wheezy
- Fedora 31, 32
- Fedora CoreOS
- openSUSE Leap 15
- Flatcar Container Linux by Kinvolk
- continuous integration tests
To choose a tool which best fits your use case, read this comparison to
kubeadm and kops.
Creating a cluster
(1/5) Meet the underlay requirements
Provision servers with the following requirements:
- Ansible v2.9 and python-netaddr are installed on the machine that will run Ansible commands
- Jinja 2.11 (or newer) is required to run the Ansible Playbooks
- The target servers must have access to the Internet in order to pull docker images. Otherwise, additional configuration is required (See Offline Environment)
- The target servers are configured to allow IPv4 forwarding
- Your ssh key must be copied to all the servers that are part of your inventory
- The firewalls are not managed; you'll need to implement your own rules as you did previously. To avoid any issues during deployment, you should disable your firewall
- If Kubespray is run from a non-root user account, a correct privilege escalation method should be configured on the target servers. Then the ansible_become flag or the command parameters --become or -b should be specified
Kubespray provides the following utilities to help provision your environment:
- Terraform scripts for the following cloud providers:
(2/5) Compose an inventory file
After you provision your servers, create an inventory file for Ansible. You can do this manually or via a dynamic inventory script. For more information, see "Building your own inventory".
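As one common approach (the exact paths depend on the Kubespray release you check out), the Kubespray repository ships an inventory builder script that generates a hosts file from a list of server IPs; the IP addresses below are placeholders:
# Copy the sample inventory and generate a hosts file from your server IPs
cp -rfp inventory/sample inventory/mycluster
declare -a IPS=(10.10.1.3 10.10.1.4 10.10.1.5)
CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}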
(3/5) Plan your cluster deployment
Kubespray provides the ability to customize many aspects of the deployment:
- Choice of deployment mode: kubeadm or non-kubeadm
- CNI (networking) plugins
- DNS configuration
- Choice of control plane: native/binary or containerized
- Component versions
- Calico route reflectors
- Component runtime options
- Certificate generation methods
Kubespray customizations can be made to a variable file. If you are getting started with Kubespray, consider using the Kubespray defaults to deploy your cluster and explore Kubernetes.
(4/5) Deploy a Cluster
Next, deploy your cluster:
Cluster deployment using ansible-playbook.
ansible-playbook -i your/inventory/inventory.ini cluster.yml -b -v \
--private-key=~/.ssh/private_key
Large deployments (100+ nodes) may require specific adjustments for best results.
(5/5) Verify the deployment
Kubespray provides a way to verify inter-pod connectivity and DNS resolution with Netchecker. Netchecker ensures the netchecker-agent pods can resolve DNS requests and ping each other within the default namespace. Those pods mimic the behavior of the rest of the workloads and serve as cluster health indicators.
Cluster operations
Kubespray provides additional playbooks to manage your cluster: scale and upgrade.
Scale your cluster
You can add worker nodes to your cluster by running the scale playbook. For more information, see "Adding nodes".
You can remove worker nodes from your cluster by running the remove-node playbook. For more information, see "Remove nodes".
Upgrade your cluster
You can upgrade your cluster by running the upgrade-cluster playbook. For more information, see "Upgrades".
Cleanup
You can reset your nodes and wipe out all components installed with Kubespray via the reset playbook.
Caution: When running the reset playbook, be sure not to accidentally target your production cluster!
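A reset invocation follows the same pattern as the cluster playbook; this is a sketch, so adjust the inventory path and private key to match your environment:
ansible-playbook -i your/inventory/inventory.ini reset.yml -b -v \
  --private-key=~/.ssh/private_key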
Feedback
What's next
Check out planned work on Kubespray's roadmap.
3.3 - Turnkey Cloud Solutions
This page provides a list of Kubernetes certified solution providers. From each
provider page, you can learn how to install and setup production
ready clusters.
3.4 - Windows in Kubernetes
3.4.1 - Intro to Windows support in Kubernetes
Windows applications constitute a large portion of the services and applications that run in many organizations. Windows containers provide a modern way to encapsulate processes and package dependencies, making it easier to use DevOps practices and follow cloud native patterns for Windows applications. Kubernetes has become the de facto standard container orchestrator, and the release of Kubernetes 1.14 includes production support for scheduling Windows containers on Windows nodes in a Kubernetes cluster, enabling a vast ecosystem of Windows applications to leverage the power of Kubernetes. Organizations with investments in Windows-based applications and Linux-based applications don't have to look for separate orchestrators to manage their workloads, leading to increased operational efficiencies across their deployments, regardless of operating system.
Windows containers in Kubernetes
To enable the orchestration of Windows containers in Kubernetes, include Windows nodes in your existing Linux cluster. Scheduling Windows containers in Pods on Kubernetes is similar to scheduling Linux-based containers.
In order to run Windows containers, your Kubernetes cluster must include multiple operating systems, with control plane nodes running Linux and workers running either Windows or Linux depending on your workload needs. Windows Server 2019 is the only Windows operating system supported, enabling Kubernetes Node on Windows (including kubelet, container runtime, and kube-proxy). For a detailed explanation of Windows distribution channels see the Microsoft documentation.
Note: The Kubernetes control plane, including the
master components, continues to run on Linux. There are no plans to have a Windows-only Kubernetes cluster.
Note: In this document, when we talk about Windows containers we mean Windows containers with process isolation. Windows containers with
Hyper-V isolation is planned for a future release.
Supported Functionality and Limitations
Supported Functionality
Windows OS Version Support
Refer to the following table for Windows operating system support in Kubernetes. A single heterogeneous Kubernetes cluster can have both Windows and Linux worker nodes. Windows containers have to be scheduled on Windows nodes and Linux containers on Linux nodes.
Kubernetes version | Windows Server LTSC releases | Windows Server SAC releases
--- | --- | ---
Kubernetes v1.17 | Windows Server 2019 | Windows Server ver 1809
Kubernetes v1.18 | Windows Server 2019 | Windows Server ver 1809, Windows Server ver 1903, Windows Server ver 1909
Kubernetes v1.19 | Windows Server 2019 | Windows Server ver 1909, Windows Server ver 2004
Kubernetes v1.20 | Windows Server 2019 | Windows Server ver 1909, Windows Server ver 2004
Note: We don't expect all Windows customers to update the operating system for their apps frequently. Upgrading your applications is what dictates and necessitates upgrading or introducing new nodes to the cluster. For the customers that chose to upgrade their operating system for containers running on Kubernetes, we will offer guidance and step-by-step instructions when we add support for a new operating system version. This guidance will include recommended upgrade procedures for upgrading user applications together with cluster nodes. Windows nodes adhere to Kubernetes
version-skew policy (node to control plane versioning) the same way as Linux nodes do today.
Pause Image
Microsoft maintains a Windows pause infrastructure container at mcr.microsoft.com/oss/kubernetes/pause:1.4.1
.
Compute
From an API and kubectl perspective, Windows containers behave in much the same way as Linux-based containers. However, there are some notable differences in key functionality which are outlined in the limitation section.
Key Kubernetes elements work the same way in Windows as they do in Linux. In this section, we talk about some of the key workload enablers and how they map to Windows.
-
Pods
A Pod is the basic building block of Kubernetes–the smallest and simplest unit in the Kubernetes object model that you create or deploy. You may not deploy Windows and Linux containers in the same Pod. All containers in a Pod are scheduled onto a single Node where each Node represents a specific platform and architecture. The following Pod capabilities, properties and events are supported with Windows containers:
- Single or multiple containers per Pod with process isolation and volume sharing
- Pod status fields
- Readiness and Liveness probes
- postStart & preStop container lifecycle events
- ConfigMap, Secrets: as environment variables or volumes
- EmptyDir
- Named pipe host mounts
- Resource limits
-
Controllers
Kubernetes controllers handle the desired state of Pods. The following workload controllers are supported with Windows containers:
- ReplicaSet
- ReplicationController
- Deployments
- StatefulSets
- DaemonSet
- Job
- CronJob
-
Services
A Kubernetes Service is an abstraction which defines a logical set of Pods and a policy by which to access them - sometimes called a micro-service. You can use services for cross-operating system connectivity. In Windows, services can utilize the following types, properties and capabilities:
- Service Environment variables
- NodePort
- ClusterIP
- LoadBalancer
- ExternalName
- Headless services
Pods, Controllers and Services are critical elements to managing Windows workloads on Kubernetes. However, on their own they are not enough to enable the proper lifecycle management of Windows workloads in a dynamic cloud native environment. We added support for the following features:
- Pod and container metrics
- Horizontal Pod Autoscaler support
- kubectl Exec
- Resource Quotas
- Scheduler preemption
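As a brief illustration of these building blocks, the sketch below deploys a Windows workload and relies on a nodeSelector so that its Pods are only scheduled onto Windows nodes; the Deployment name and container image are illustrative choices, not requirements:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: win-sample
spec:
  replicas: 1
  selector:
    matchLabels:
      app: win-sample
  template:
    metadata:
      labels:
        app: win-sample
    spec:
      # Schedule only onto Windows worker nodes
      nodeSelector:
        kubernetes.io/os: windows
      containers:
      - name: iis
        image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
        ports:
        - containerPort: 80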
Container Runtime
Docker EE
FEATURE STATE: Kubernetes v1.14 [stable]
Docker EE-basic 19.03+ is the recommended container runtime for all Windows Server versions. This works with the dockershim code included in the kubelet.
CRI-ContainerD
FEATURE STATE: Kubernetes v1.20 [stable]
ContainerD 1.4.0+ can also be used as the container runtime for Windows Kubernetes nodes.
Learn how to install ContainerD on a Windows node.
Caution: There is a
known limitation when using GMSA with ContainerD to access Windows network shares which requires a kernel patch. Updates to address this limitation are currently available for Windows Server, Version 2004 and will be available for Windows Server 2019 in early 2021. Check for updates on the
Microsoft Windows Containers issue tracker.
Persistent Storage
Kubernetes volumes enable complex applications, with data persistence and Pod volume sharing requirements, to be deployed on Kubernetes. Management of persistent volumes associated with a specific storage back-end or protocol includes actions such as: provisioning/de-provisioning/resizing of volumes, attaching/detaching a volume to/from a Kubernetes node and mounting/dismounting a volume to/from individual containers in a pod that needs to persist data. The code implementing these volume management actions for a specific storage back-end or protocol is shipped in the form of a Kubernetes volume plugin. The following broad classes of Kubernetes volume plugins are supported on Windows:
In-tree Volume Plugins
Code associated with in-tree volume plugins ship as part of the core Kubernetes code base. Deployment of in-tree volume plugins do not require installation of additional scripts or deployment of separate containerized plugin components. These plugins can handle: provisioning/de-provisioning and resizing of volumes in the storage backend, attaching/detaching of volumes to/from a Kubernetes node and mounting/dismounting a volume to/from individual containers in a pod. The following in-tree plugins support Windows nodes:
FlexVolume Plugins
Code associated with FlexVolume plugins ship as out-of-tree scripts or binaries that need to be deployed directly on the host. FlexVolume plugins handle attaching/detaching of volumes to/from a Kubernetes node and mounting/dismounting a volume to/from individual containers in a pod. Provisioning/De-provisioning of persistent volumes associated with FlexVolume plugins may be handled through an external provisioner that is typically separate from the FlexVolume plugins. The following FlexVolume plugins, deployed as powershell scripts on the host, support Windows nodes:
CSI Plugins
FEATURE STATE: Kubernetes v1.19 [beta]
Code associated with CSI plugins ship as out-of-tree scripts and binaries that are typically distributed as container images and deployed using standard Kubernetes constructs like DaemonSets and StatefulSets. CSI plugins handle a wide range of volume management actions in Kubernetes: provisioning/de-provisioning/resizing of volumes, attaching/detaching of volumes to/from a Kubernetes node and mounting/dismounting a volume to/from individual containers in a pod, backup/restore of persistent data using snapshots and cloning. CSI plugins typically consist of node plugins (that run on each node as a DaemonSet) and controller plugins.
CSI node plugins (especially those associated with persistent volumes exposed as either block devices or over a shared file-system) need to perform various privileged operations like scanning of disk devices, mounting of file systems, etc. These operations differ for each host operating system. For Linux worker nodes, containerized CSI node plugins are typically deployed as privileged containers. For Windows worker nodes, privileged operations for containerized CSI node plugins is supported using csi-proxy, a community-managed, stand-alone binary that needs to be pre-installed on each Windows node. Please refer to the deployment guide of the CSI plugin you wish to deploy for further details.
Networking
Networking for Windows containers is exposed through CNI plugins. Windows containers function similarly to virtual machines in regards to networking. Each container has a virtual network adapter (vNIC) which is connected to a Hyper-V virtual switch (vSwitch). The Host Networking Service (HNS) and the Host Compute Service (HCS) work together to create containers and attach container vNICs to networks. HCS is responsible for the management of containers whereas HNS is responsible for the management of networking resources such as:
- Virtual networks (including creation of vSwitches)
- Endpoints / vNICs
- Namespaces
- Policies (Packet encapsulations, Load-balancing rules, ACLs, NAT'ing rules, etc.)
The following service spec types are supported:
- NodePort
- ClusterIP
- LoadBalancer
- ExternalName
Network modes
Windows supports five different networking drivers/modes: L2bridge, L2tunnel, Overlay, Transparent, and NAT. In a heterogeneous cluster with Windows and Linux worker nodes, you need to select a networking solution that is compatible on both Windows and Linux. The following out-of-tree plugins are supported on Windows, with recommendations on when to use each CNI:
Network Driver | Description | Container Packet Modifications | Network Plugins | Network Plugin Characteristics
--- | --- | --- | --- | ---
L2bridge | Containers are attached to an external vSwitch. Containers are attached to the underlay network, although the physical network doesn't need to learn the container MACs because they are rewritten on ingress/egress. | MAC is rewritten to host MAC, IP may be rewritten to host IP using HNS OutboundNAT policy. | win-bridge, Azure-CNI, Flannel host-gateway uses win-bridge | win-bridge uses L2bridge network mode, connects containers to the underlay of hosts, offering best performance. Requires user-defined routes (UDR) for inter-node connectivity.
L2Tunnel | This is a special case of l2bridge, but only used on Azure. All packets are sent to the virtualization host where SDN policy is applied. | MAC rewritten, IP visible on the underlay network | Azure-CNI | Azure-CNI allows integration of containers with Azure vNET, and allows them to leverage the set of capabilities that Azure Virtual Network provides. For example, securely connect to Azure services or use Azure NSGs. See azure-cni for some examples
Overlay (Overlay networking for Windows in Kubernetes is in alpha stage) | Containers are given a vNIC connected to an external vSwitch. Each overlay network gets its own IP subnet, defined by a custom IP prefix. The overlay network driver uses VXLAN encapsulation. | Encapsulated with an outer header. | win-overlay, Flannel VXLAN (uses win-overlay) | win-overlay should be used when virtual container networks are desired to be isolated from underlay of hosts (e.g. for security reasons). Allows for IPs to be re-used for different overlay networks (which have different VNID tags) if you are restricted on IPs in your datacenter. This option requires KB4489899 on Windows Server 2019.
Transparent (special use case for ovn-kubernetes) | Requires an external vSwitch. Containers are attached to an external vSwitch which enables intra-pod communication via logical networks (logical switches and routers). | Packet is encapsulated either via GENEVE or STT tunneling to reach pods which are not on the same host. Packets are forwarded or dropped via the tunnel metadata information supplied by the ovn network controller. NAT is done for north-south communication. | ovn-kubernetes | Deploy via ansible. Distributed ACLs can be applied via Kubernetes policies. IPAM support. Load-balancing can be achieved without kube-proxy. NATing is done without using iptables/netsh.
NAT (not used in Kubernetes) | Containers are given a vNIC connected to an internal vSwitch. DNS/DHCP is provided using an internal component called WinNAT | MAC and IP is rewritten to host MAC/IP. | nat | Included here for completeness
As outlined above, the Flannel CNI meta plugin is also supported on Windows via the VXLAN network backend (alpha support ; delegates to win-overlay) and host-gateway network backend (stable support; delegates to win-bridge). This plugin supports delegating to one of the reference CNI plugins (win-overlay, win-bridge), to work in conjunction with Flannel daemon on Windows (Flanneld) for automatic node subnet lease assignment and HNS network creation. This plugin reads in its own configuration file (cni.conf), and aggregates it with the environment variables from the FlannelD generated subnet.env file. It then delegates to one of the reference CNI plugins for network plumbing, and sends the correct configuration containing the node-assigned subnet to the IPAM plugin (e.g. host-local).
For the node, pod, and service objects, the following network flows are supported for TCP/UDP traffic:
- Pod -> Pod (IP)
- Pod -> Pod (Name)
- Pod -> Service (Cluster IP)
- Pod -> Service (PQDN, but only if there are no ".")
- Pod -> Service (FQDN)
- Pod -> External (IP)
- Pod -> External (DNS)
- Node -> Pod
- Pod -> Node
IP address management (IPAM)
The following IPAM options are supported on Windows:
Load balancing and Services
On Windows, you can use the following settings to configure Services and load balancing behavior:
Windows Service Settings
Feature | Description | Supported Kubernetes version | Supported Windows OS build | How to enable
--- | --- | --- | --- | ---
Session affinity | Ensures that connections from a particular client are passed to the same Pod each time. | v1.20+ | Windows Server vNext Insider Preview Build 19551 (or higher) | Set service.spec.sessionAffinity to "ClientIP"
Direct Server Return (DSR) | Load balancing mode where the IP address fixups and the LBNAT occurs at the container vSwitch port directly; service traffic arrives with the source IP set as the originating pod IP. | v1.20+ | Windows Server 2019 | Set the following flags in kube-proxy: --feature-gates="WinDSR=true" --enable-dsr=true
Preserve-Destination | Skips DNAT of service traffic, thereby preserving the virtual IP of the target service in packets reaching the backend Pod. Also disables node-node forwarding. | v1.20+ | Windows Server, version 1903 (or higher) | Set "preserve-destination": "true" in service annotations and enable DSR in kube-proxy.
IPv4/IPv6 dual-stack networking | Native IPv4-to-IPv4 in parallel with IPv6-to-IPv6 communications to, from, and within a cluster | v1.19+ | Windows Server, version 2004 (or higher) | See IPv4/IPv6 dual-stack
Client IP preservation | Ensures that source IP of incoming ingress traffic gets preserved. Also disables node-node forwarding. | v1.20+ | Windows Server, version 2019 (or higher) | Set service.spec.externalTrafficPolicy to "Local" and enable DSR in kube-proxy
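For instance, a Service that opts into session affinity and client IP preservation (assuming DSR has already been enabled in kube-proxy as described in the table) could look like this sketch; the name, selector, and ports are placeholders:
apiVersion: v1
kind: Service
metadata:
  name: win-webserver
spec:
  type: LoadBalancer
  # Route a given client to the same backend Pod each time
  sessionAffinity: ClientIP
  # Preserve the client source IP (requires DSR in kube-proxy)
  externalTrafficPolicy: Local
  selector:
    app: win-sample
  ports:
  - port: 80
    targetPort: 80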
IPv4/IPv6 dual-stack
You can enable IPv4/IPv6 dual-stack networking for l2bridge
networks using the IPv6DualStack
feature gate. See enable IPv4/IPv6 dual stack for more details.
Note: On Windows, using IPv6 with Kubernetes requires Windows Server, version 2004 (kernel version 10.0.19041.610) or later.
Note: Overlay (VXLAN) networks on Windows do not support dual-stack networking today.
Limitations
Windows is only supported as a worker node in the Kubernetes architecture and component matrix. This means that a Kubernetes cluster must always include Linux master nodes, zero or more Linux worker nodes, and zero or more Windows worker nodes.
Resource Handling
Linux cgroups are used as a pod boundary for resource controls in Linux. Containers are created within that boundary for network, process and file system isolation. The cgroups APIs can be used to gather cpu/io/memory stats. In contrast, Windows uses a Job object per container with a system namespace filter to contain all processes in a container and provide logical isolation from the host. There is no way to run a Windows container without the namespace filtering in place. This means that system privileges cannot be asserted in the context of the host, and thus privileged containers are not available on Windows. Containers cannot assume an identity from the host because the Security Account Manager (SAM) is separate.
Resource Reservations
Memory Reservations
Windows does not have an out-of-memory process killer as Linux does. Windows always treats all user-mode memory allocations as virtual, and pagefiles are mandatory. The net effect is that Windows won't reach out of memory conditions the same way Linux does, and processes page to disk instead of being subject to out of memory (OOM) termination. If memory is over-provisioned and all physical memory is exhausted, then paging can slow down performance.
Keeping memory usage within reasonable bounds is possible using the kubelet parameters --kube-reserved and/or --system-reserved to account for memory usage on the node (outside of containers). This reduces NodeAllocatable.
Note: As you deploy workloads, use resource limits (must set only limits or limits must equal requests) on containers. This also subtracts from NodeAllocatable and prevents the scheduler from adding more pods once a node is full.
A best practice to avoid over-provisioning is to configure the kubelet with a system reserved memory of at least 2GB to account for Windows, Docker, and Kubernetes processes.
CPU Reservations
To account for Windows, Docker, and other Kubernetes host processes, it is recommended to reserve a percentage of CPU so they are able to respond to events. This value needs to be scaled based on the number of CPU cores available on the Windows node. To determine this percentage, a user should identify the maximum pod density for each of their nodes and monitor the CPU usage of the system services, choosing a value that meets their workload needs.
Keeping CPU usage within reasonable bounds is possible using the kubelet parameters --kube-reserved and/or --system-reserved to account for CPU usage on the node (outside of containers). This reduces NodeAllocatable.
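As a sketch (the exact amounts depend on your node size and workloads), these reservations can be passed to the kubelet on the Windows node, for example via KUBELET_EXTRA_ARGS:
KUBELET_EXTRA_ARGS="--kube-reserved=cpu=500m,memory=1Gi --system-reserved=cpu=500m,memory=2Gi"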
Feature Restrictions
- TerminationGracePeriod: not implemented
- Single file mapping: to be implemented with CRI-ContainerD
- Termination message: to be implemented with CRI-ContainerD
- Privileged Containers: not currently supported in Windows containers
- HugePages: not currently supported in Windows containers
- The existing node problem detector is Linux-only and requires privileged containers. In general, we don't expect this to be used on Windows because privileged containers are not supported
- Not all features of shared namespaces are supported (see API section for more details)
Difference in behavior of flags when compared to Linux
The behavior of the following kubelet flags is different on Windows nodes as described below:
- The --kube-reserved, --system-reserved, and --eviction-hard flags update Node Allocatable
- Eviction by using --enforce-node-allocatable is not implemented
- Eviction by using --eviction-hard and --eviction-soft is not implemented
- MemoryPressure Condition is not implemented
- There are no OOM eviction actions taken by the kubelet
- The kubelet running on the Windows node does not have memory restrictions. --kube-reserved and --system-reserved do not set limits on the kubelet or processes running on the host. This means the kubelet or a process on the host could cause memory resource starvation outside the node-allocatable and scheduler
- An additional flag to set the priority of the kubelet process is available on Windows nodes, called --windows-priorityclass. This flag allows the kubelet process to get more CPU time slices when compared to other processes running on the Windows host. More information on the allowable values and their meaning is available at Windows Priority Classes. In order for the kubelet to always have enough CPU cycles, it is recommended to set this flag to ABOVE_NORMAL_PRIORITY_CLASS or above
Storage
Windows has a layered filesystem driver to mount container layers and create a copy filesystem based on NTFS. All file paths in the container are resolved only within the context of that container.
- With Docker, volume mounts can only target a directory in the container, and not an individual file. This limitation does not exist with CRI-ContainerD.
- Volume mounts cannot project files or directories back to the host filesystem
- Read-only filesystems are not supported because write access is always required for the Windows registry and SAM database. However, read-only volumes are supported
- Volume user-masks and permissions are not available. Because the SAM is not shared between the host & container, there's no mapping between them. All permissions are resolved within the context of the container
As a result, the following storage functionality is not supported on Windows nodes
- Volume subpath mounts. Only the entire volume can be mounted in a Windows container.
- Subpath volume mounting for Secrets
- Host mount projection
- DefaultMode (due to UID/GID dependency)
- Read-only root filesystem. Mapped volumes still support readOnly
- Block device mapping
- Memory as the storage medium
- File system features like uid/gid, per-user Linux filesystem permissions
- NFS based storage/volume support
- Expanding the mounted volume (resizefs)
Networking
Windows Container Networking differs in some important ways from Linux networking. The Microsoft documentation for Windows Container Networking contains additional details and background.
The Windows host networking service and virtual switch implement namespacing and can create virtual NICs as needed for a pod or container. However, many configurations such as DNS, routes, and metrics are stored in the Windows registry database rather than /etc/... files as they are on Linux. The Windows registry for the container is separate from that of the host, so concepts like mapping /etc/resolv.conf from the host into a container don't have the same effect they would on Linux. These must be configured using Windows APIs run in the context of that container. Therefore CNI implementations need to call the HNS instead of relying on file mappings to pass network details into the pod or container.
The following networking functionality is not supported on Windows nodes
- Host networking mode is not available for Windows pods
- Local NodePort access from the node itself fails (works for other nodes or external clients)
- Accessing service VIPs from nodes will be available with a future release of Windows Server
- A single service can only support up to 64 backend pods / unique destination IPs
- Overlay networking support in kube-proxy is a beta feature. In addition, it requires KB4482887 to be installed on Windows Server 2019
- Local Traffic Policy in non-DSR mode
- Windows containers connected to overlay networks do not support communicating over the IPv6 stack. There is outstanding Windows platform work required to enable this network driver to consume IPv6 addresses and subsequent Kubernetes work in kubelet, kube-proxy, and CNI plugins.
- Outbound communication using the ICMP protocol via the win-overlay, win-bridge, and Azure-CNI plugin. Specifically, the Windows data plane (VFP) doesn't support ICMP packet transpositions. This means:
- ICMP packets directed to destinations within the same network (e.g. pod to pod communication via ping) work as expected and without any limitations
- TCP/UDP packets work as expected and without any limitations
- ICMP packets directed to pass through a remote network (e.g. pod to external internet communication via ping) cannot be transposed and thus will not be routed back to their source
- Since TCP/UDP packets can still be transposed, you can substitute ping <destination> with curl <destination> to debug connectivity to the outside world.
These features were added in Kubernetes v1.15:
CNI Plugins
- Windows reference network plugins win-bridge and win-overlay do not currently implement CNI spec v0.4.0 due to missing "CHECK" implementation.
- The Flannel VXLAN CNI has the following limitations on Windows:
- Node-pod connectivity isn't possible by design. It's only possible for local pods with Flannel v0.12.0 (or higher).
- We are restricted to using VNI 4096 and UDP port 4789. The VNI limitation is being worked on and will be overcome in a future release (open-source flannel changes). See the official Flannel VXLAN backend docs for more details on these parameters.
DNS
- ClusterFirstWithHostNet is not supported for DNS. Windows treats all names with a '.' as an FQDN and skips PQDN resolution.
- On Linux, you have a DNS suffix list, which is used when trying to resolve PQDNs. On Windows, there is only one DNS suffix: the DNS suffix associated with that pod's namespace (for example, mydns.svc.cluster.local). Windows can resolve FQDNs, and services or names resolvable with only that suffix. For example, a pod spawned in the default namespace has the DNS suffix default.svc.cluster.local. Inside a Windows pod, you can resolve both kubernetes.default.svc.cluster.local and kubernetes, but not the in-betweens, like kubernetes.default or kubernetes.default.svc.
- On Windows, there are multiple DNS resolvers that can be used. As these come with slightly different behaviors, using the Resolve-DNSName utility for name query resolution is recommended.
IPv6
Kubernetes on Windows does not support single-stack "IPv6-only" networking. However, dual-stack IPv4/IPv6 networking for pods and nodes with single-family services is supported. See IPv4/IPv6 dual-stack networking for more details.
Session affinity
Setting the maximum session sticky time for Windows services using service.spec.sessionAffinityConfig.clientIP.timeoutSeconds is not supported.
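For reference, this is the Service field in question; a minimal sketch (the service name and selector are illustrative) of the configuration whose sticky timeout is not honored on Windows:
apiVersion: v1
kind: Service
metadata:
  name: example-sticky-svc          # illustrative name
spec:
  selector:
    app: example                    # illustrative selector
  ports:
  - port: 80
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600          # setting this maximum session sticky time is not supported on Windows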
Security
Secrets are written in clear text on the node's volume (as compared to tmpfs/in-memory on Linux). This means customers have to do two things:
- Use file ACLs to secure the secrets file location
- Use volume-level encryption using BitLocker
RunAsUserName can be specified for Windows Pods or containers to execute the container processes as a node-default user. This is roughly equivalent to RunAsUser.
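A minimal sketch of how that field appears in a Pod spec; ContainerUser is one of the built-in Windows container accounts, and the pod and image names are illustrative:
apiVersion: v1
kind: Pod
metadata:
  name: run-as-username-example     # illustrative name
spec:
  securityContext:
    windowsOptions:
      runAsUserName: "ContainerUser"   # applies to all containers unless overridden per container
  containers:
  - name: app
    image: mcr.microsoft.com/windows/servercore:ltsc2019
    command: ["powershell.exe", "-command", "whoami; Start-Sleep -Seconds 3600"]
  nodeSelector:
    kubernetes.io/os: windows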
Linux specific pod security context privileges such as SELinux, AppArmor, Seccomp, Capabilities (POSIX Capabilities), and others are not supported.
In addition, as mentioned already, privileged containers are not supported on Windows.
API
There are no differences in how most of the Kubernetes APIs work for Windows. The subtleties around what's different come down to differences in the OS and container runtime. In certain situations, some properties on workload APIs such as Pod or Container were designed with an assumption that they are implemented on Linux, and they fail to run on Windows.
At a high level, these OS concepts are different:
- Identity - Linux uses userID (UID) and groupID (GID), which are represented as integer types. User and group names are not canonical - they are an alias in /etc/groups or /etc/passwd back to UID+GID. Windows uses a larger binary security identifier (SID) which is stored in the Windows Security Access Manager (SAM) database. This database is not shared between the host and containers, or between containers.
- File permissions - Windows uses an access control list based on SIDs, rather than a bitmask of permissions and UID+GID
- File paths - the convention on Windows is to use \ instead of /. The Go IO libraries accept both types of file path separators. However, when you're setting a path or command line that's interpreted inside a container, \ may be needed.
- Signals - Windows interactive apps handle termination differently, and can implement one or more of these:
- A UI thread handles well-defined messages including WM_CLOSE
- Console apps handle ctrl-c or ctrl-break using a Control Handler
- Services register a Service Control Handler function that can accept SERVICE_CONTROL_STOP control codes
Exit codes follow the same convention, where 0 is success and nonzero is failure. The specific error codes may differ across Windows and Linux. However, exit codes passed from the Kubernetes components (kubelet, kube-proxy) are unchanged.
V1.Container
- V1.Container.ResourceRequirements.limits.cpu and V1.Container.ResourceRequirements.limits.memory - Windows doesn't use hard limits for CPU allocations. Instead, a share system is used. The existing fields based on millicores are scaled into relative shares that are followed by the Windows scheduler. See kuberuntime/helpers_windows.go and the resource controls topic in the Microsoft docs. A sketch of how requests and limits are set appears after this list.
- Huge pages are not implemented in the Windows container runtime, and are not available. They require asserting a user privilege that's not configurable for containers.
- V1.Container.ResourceRequirements.requests.cpu and V1.Container.ResourceRequirements.requests.memory - Requests are subtracted from node available resources, so they can be used to avoid overprovisioning a node. However, they cannot be used to guarantee resources in an overprovisioned node. They should be applied to all containers as a best practice if the operator wants to avoid overprovisioning entirely.
- V1.Container.SecurityContext.allowPrivilegeEscalation - not possible on Windows, none of the capabilities are hooked up
- V1.Container.SecurityContext.Capabilities - POSIX capabilities are not implemented on Windows
- V1.Container.SecurityContext.privileged - Windows doesn't support privileged containers
- V1.Container.SecurityContext.procMount - Windows doesn't have a /proc filesystem
- V1.Container.SecurityContext.readOnlyRootFilesystem - not possible on Windows, write access is required for registry & system processes to run inside the container
- V1.Container.SecurityContext.runAsGroup - not possible on Windows, no GID support
- V1.Container.SecurityContext.runAsNonRoot - Windows does not have a root user. The closest equivalent is ContainerAdministrator which is an identity that doesn't exist on the node.
- V1.Container.SecurityContext.runAsUser - not possible on Windows, no integer UID support
- V1.Container.SecurityContext.seLinuxOptions - not possible on Windows, no SELinux
- V1.Container.terminationMessagePath - this has some limitations in that Windows doesn't support mapping single files. The default value is /dev/termination-log, which does work because it does not exist on Windows by default.
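As referenced above, here is a minimal sketch of how CPU and memory requests and limits are set on a Windows container (the container name, image, and values are illustrative; on Windows the CPU limit is converted to relative scheduler shares rather than enforced as a hard cap):
containers:
- name: app                                        # illustrative container
  image: mcr.microsoft.com/windows/servercore:ltsc2019
  resources:
    requests:
      cpu: 500m        # subtracted from node allocatable; helps avoid overprovisioning a node
      memory: 256Mi
    limits:
      cpu: "1"         # scaled into relative CPU shares on Windows, not a hard limit
      memory: 512Mi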
V1.Pod
- V1.Pod.hostIPC, V1.Pod.hostPID - host namespace sharing is not possible on Windows
- V1.Pod.hostNetwork - There is no Windows OS support to share the host network
- V1.Pod.dnsPolicy - ClusterFirstWithHostNet - is not supported because Host Networking is not supported on Windows.
- V1.Pod.podSecurityContext - see V1.PodSecurityContext below
- V1.Pod.shareProcessNamespace - this is a beta feature, and depends on Linux namespaces which are not implemented on Windows. Windows cannot share process namespaces or the container's root filesystem. Only the network can be shared.
- V1.Pod.terminationGracePeriodSeconds - this is not fully implemented in Docker on Windows, see: reference. The behavior today is that the ENTRYPOINT process is sent CTRL_SHUTDOWN_EVENT, then Windows waits 5 seconds by default, and finally shuts down all processes using the normal Windows shutdown behavior. The 5 second default is actually in the Windows registry inside the container, so it can be overridden when the container is built.
- V1.Pod.volumeDevices - this is a beta feature, and is not implemented on Windows. Windows cannot attach raw block devices to pods.
- V1.Pod.volumes - EmptyDir, Secret, ConfigMap, HostPath - all work and have tests in TestGrid
- V1.emptyDirVolumeSource - the Node default medium is disk on Windows. Memory is not supported, as Windows does not have a built-in RAM disk.
- V1.VolumeMount.mountPropagation - mount propagation is not supported on Windows.
V1.PodSecurityContext
None of the PodSecurityContext fields work on Windows. They're listed here for reference.
- V1.PodSecurityContext.SELinuxOptions - SELinux is not available on Windows
- V1.PodSecurityContext.RunAsUser - provides a UID, not available on Windows
- V1.PodSecurityContext.RunAsGroup - provides a GID, not available on Windows
- V1.PodSecurityContext.RunAsNonRoot - Windows does not have a root user. The closest equivalent is ContainerAdministrator which is an identity that doesn't exist on the node.
- V1.PodSecurityContext.SupplementalGroups - provides GID, not available on Windows
- V1.PodSecurityContext.Sysctls - these are part of the Linux sysctl interface. There's no equivalent on Windows.
Operating System Version Restrictions
Windows has strict compatibility rules, where the host OS version must match the container base image OS version. Only Windows containers with a container operating system of Windows Server 2019 are supported. Hyper-V isolation of containers, enabling some backward compatibility of Windows container image versions, is planned for a future release.
Getting Help and Troubleshooting
Start with the main Kubernetes troubleshooting documentation when troubleshooting your cluster. Some additional, Windows-specific troubleshooting help is included in this section. Logs are an important element of troubleshooting issues in Kubernetes. Make sure to include them any time you seek troubleshooting assistance from other contributors. Follow the instructions in the SIG-Windows contributing guide on gathering logs.
- How do I know start.ps1 completed successfully?
You should see kubelet, kube-proxy, and (if you chose Flannel as your networking solution) flanneld host-agent processes running on your node, with running logs being displayed in separate PowerShell windows. In addition to this, your Windows node should be listed as "Ready" in your Kubernetes cluster.
- Can I configure the Kubernetes node processes to run in the background as services?
Kubelet and kube-proxy are already configured to run as native Windows Services, offering resiliency by restarting the services automatically in the event of failure (for example, a process crash). You have two options for configuring these node components as services.
- As native Windows Services
Kubelet and kube-proxy can be run as native Windows Services using sc.exe.
# Create the services for kubelet and kube-proxy in two separate commands
sc.exe create <component_name> binPath= "<path_to_binary> --service <other_args>"
# Please note that if the arguments contain spaces, they must be escaped.
sc.exe create kubelet binPath= "C:\kubelet.exe --service --hostname-override 'minion' <other_args>"
# Start the services
Start-Service kubelet
Start-Service kube-proxy
# Stop the service
Stop-Service kubelet (-Force)
Stop-Service kube-proxy (-Force)
# Query the service status
Get-Service kubelet
Get-Service kube-proxy
- Using nssm.exe
You can also use alternative service managers like nssm.exe to run these processes (flanneld, kubelet, and kube-proxy) in the background for you. You can use this sample script, leveraging nssm.exe to register kubelet, kube-proxy, and flanneld.exe to run as Windows services in the background.
register-svc.ps1 -NetworkMode <Network mode> -ManagementIP <Windows Node IP> -ClusterCIDR <Cluster subnet> -KubeDnsServiceIP <Kube-dns Service IP> -LogDir <Directory to place logs>
# NetworkMode = The network mode l2bridge (flannel host-gw, also the default value) or overlay (flannel vxlan) chosen as a network solution
# ManagementIP = The IP address assigned to the Windows node. You can use ipconfig to find this
# ClusterCIDR = The cluster subnet range. (Default value 10.244.0.0/16)
# KubeDnsServiceIP = The Kubernetes DNS service IP (Default value 10.96.0.10)
# LogDir = The directory where kubelet and kube-proxy logs are redirected into their respective output files (Default value C:\k)
If the above referenced script is not suitable, you can manually configure nssm.exe using the following examples.
# Register flanneld.exe
nssm install flanneld C:\flannel\flanneld.exe
nssm set flanneld AppParameters --kubeconfig-file=c:\k\config --iface=<ManagementIP> --ip-masq=1 --kube-subnet-mgr=1
nssm set flanneld AppEnvironmentExtra NODE_NAME=<hostname>
nssm set flanneld AppDirectory C:\flannel
nssm start flanneld
# Register kubelet.exe
# Microsoft releases the pause infrastructure container at mcr.microsoft.com/oss/kubernetes/pause:1.4.1
nssm install kubelet C:\k\kubelet.exe
nssm set kubelet AppParameters --hostname-override=<hostname> --v=6 --pod-infra-container-image=mcr.microsoft.com/oss/kubernetes/pause:1.4.1 --resolv-conf="" --allow-privileged=true --enable-debugging-handlers --cluster-dns=<DNS-service-IP> --cluster-domain=cluster.local --kubeconfig=c:\k\config --hairpin-mode=promiscuous-bridge --image-pull-progress-deadline=20m --cgroups-per-qos=false --log-dir=<log directory> --logtostderr=false --enforce-node-allocatable="" --network-plugin=cni --cni-bin-dir=c:\k\cni --cni-conf-dir=c:\k\cni\config
nssm set kubelet AppDirectory C:\k
nssm start kubelet
# Register kube-proxy.exe (l2bridge / host-gw)
nssm install kube-proxy C:\k\kube-proxy.exe
nssm set kube-proxy AppDirectory c:\k
nssm set kube-proxy AppParameters --v=4 --proxy-mode=kernelspace --hostname-override=<hostname> --kubeconfig=c:\k\config --enable-dsr=false --log-dir=<log directory> --logtostderr=false
nssm.exe set kube-proxy AppEnvironmentExtra KUBE_NETWORK=cbr0
nssm set kube-proxy DependOnService kubelet
nssm start kube-proxy
# Register kube-proxy.exe (overlay / vxlan)
nssm install kube-proxy C:\k\kube-proxy.exe
nssm set kube-proxy AppDirectory c:\k
nssm set kube-proxy AppParameters --v=4 --proxy-mode=kernelspace --feature-gates="WinOverlay=true" --hostname-override=<hostname> --kubeconfig=c:\k\config --network-name=vxlan0 --source-vip=<source-vip> --enable-dsr=false --log-dir=<log directory> --logtostderr=false
nssm set kube-proxy DependOnService kubelet
nssm start kube-proxy
For initial troubleshooting, you can use the following flags in nssm.exe to redirect stdout and stderr to an output file:
nssm set <Service Name> AppStdout C:\k\mysvc.log
nssm set <Service Name> AppStderr C:\k\mysvc.log
For additional details, see official nssm usage docs.
- My Windows Pods do not have network connectivity
If you are using virtual machines, ensure that MAC spoofing is enabled on all the VM network adapter(s).
- My Windows Pods cannot ping external resources
Windows Pods do not have outbound rules programmed for the ICMP protocol today. However, TCP/UDP is supported. When trying to demonstrate connectivity to resources outside of the cluster, please substitute ping <IP> with corresponding curl <IP> commands.
If you are still facing problems, most likely your network configuration in cni.conf deserves some extra attention. You can always edit this static file. The configuration update will apply to any newly created Kubernetes resources.
One of the Kubernetes networking requirements (see Kubernetes model) is for cluster communication to occur without NAT internally. To honor this requirement, there is an ExceptionList for all the communication where we do not want outbound NAT to occur. However, this also means that you need to exclude the external IP you are trying to query from the ExceptionList. Only then will the traffic originating from your Windows pods be SNAT'ed correctly to receive a response from the outside world. In this regard, your ExceptionList in cni.conf should look as follows:
"ExceptionList": [
"10.244.0.0/16", # Cluster subnet
"10.96.0.0/12", # Service subnet
"10.127.130.0/24" # Management (host) subnet
]
- My Windows node cannot access NodePort service
Local NodePort access from the node itself fails. This is a known limitation. NodePort access works from other nodes or external clients.
- vNICs and HNS endpoints of containers are being deleted
This issue can be caused when the hostname-override parameter is not passed to kube-proxy. To resolve it, users need to pass the hostname to kube-proxy as follows:
C:\k\kube-proxy.exe --hostname-override=$(hostname)
- With flannel my nodes are having issues after rejoining a cluster
Whenever a previously deleted node is rejoined to the cluster, flanneld tries to assign a new pod subnet to the node. Users should remove the old pod subnet configuration files in the following paths:
Remove-Item C:\k\SourceVip.json
Remove-Item C:\k\SourceVipRequest.json
- After launching start.ps1, flanneld is stuck in "Waiting for the Network to be created"
There are numerous reports of this issue; most likely it is a timing issue for when the management IP of the flannel network is set. A workaround is to relaunch start.ps1 or relaunch flanneld manually as follows:
PS C:> [Environment]::SetEnvironmentVariable("NODE_NAME", "<Windows_Worker_Hostname>")
PS C:> C:\flannel\flanneld.exe --kubeconfig-file=c:\k\config --iface=<Windows_Worker_Node_IP> --ip-masq=1 --kube-subnet-mgr=1
- My Windows Pods cannot launch because of missing /run/flannel/subnet.env
This indicates that Flannel didn't launch correctly. You can either try to restart flanneld.exe or you can copy the files over manually from /run/flannel/subnet.env on the Kubernetes master to C:\run\flannel\subnet.env on the Windows worker node and modify the FLANNEL_SUBNET row to a different number. For example, if node subnet 10.244.4.1/24 is desired:
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.4.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true
- My Windows node cannot access my services using the service IP
This is a known limitation of the current networking stack on Windows. Windows Pods are able to access the service IP however.
- No network adapter is found when starting kubelet
The Windows networking stack needs a virtual adapter for Kubernetes networking to work. If the following commands return no results (in an admin shell), virtual network creation — a necessary prerequisite for Kubelet to work — has failed:
Get-HnsNetwork | ? Name -ieq "cbr0"
Get-NetAdapter | ? Name -Like "vEthernet (Ethernet*"
Often it is worthwhile to modify the InterfaceName parameter of the start.ps1 script, in cases where the host's network adapter isn't "Ethernet". Otherwise, consult the output of the start-kubelet.ps1 script to see if there are errors during virtual network creation.
- My Pods are stuck at "Container Creating" or restarting over and over
Check that your pause image is compatible with your OS version. The instructions assume that both the OS and the containers are version 1803. If you have a later version of Windows, such as an Insider build, you need to adjust the images accordingly. Please refer to Microsoft's Docker repository for images. Regardless, both the pause image Dockerfile and the sample service expect the image to be tagged as :latest.
- DNS resolution is not properly working
Check the DNS limitations for Windows in this section.
- kubectl port-forward fails with "unable to do port forwarding: wincat not found"
This was implemented in Kubernetes 1.15 by including wincat.exe in the pause infrastructure container mcr.microsoft.com/oss/kubernetes/pause:1.4.1. Be sure to use these versions or newer ones. If you would like to build your own pause infrastructure container, be sure to include wincat.
- My Kubernetes installation is failing because my Windows Server node is behind a proxy
If you are behind a proxy, the following PowerShell environment variables must be defined:
[Environment]::SetEnvironmentVariable("HTTP_PROXY", "http://proxy.example.com:80/", [EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("HTTPS_PROXY", "http://proxy.example.com:443/", [EnvironmentVariableTarget]::Machine)
- What is a pause container?
In a Kubernetes Pod, an infrastructure or "pause" container is first created to host the container endpoint. Containers that belong to the same pod, including infrastructure and worker containers, share a common network namespace and endpoint (same IP and port space). Pause containers are needed to accommodate worker containers crashing or restarting without losing any of the networking configuration.
The "pause" (infrastructure) image is hosted on Microsoft Container Registry (MCR). You can access it using mcr.microsoft.com/oss/kubernetes/pause:1.4.1. For more details, see the DOCKERFILE.
Further investigation
If these steps don't resolve your problem, you can get help running Windows containers on Windows nodes in Kubernetes through the SIG-Windows community channels, such as the SIG-Windows Slack mentioned below.
Reporting Issues and Feature Requests
If you have what looks like a bug, or you would like to make a feature request, please use the GitHub issue tracking system. You can open issues on GitHub and assign them to SIG-Windows. You should first search the list of issues in case it was reported previously; if so, comment with your experience on the issue and add additional logs. SIG-Windows Slack is also a great avenue to get some initial support and troubleshooting ideas prior to creating a ticket.
If filing a bug, please include detailed information about how to reproduce the problem, such as:
- Kubernetes version: kubectl version
- Environment details: Cloud provider, OS distro, networking choice and configuration, and Docker version
- Detailed steps to reproduce the problem
- Relevant logs
- Tag the issue sig/windows by commenting on the issue with /sig windows to bring it to a SIG-Windows member's attention
What's next
We have a lot of features in our roadmap. An abbreviated high level list is included below, but we encourage you to view our roadmap project and help us make Windows support better by contributing.
Hyper-V isolation
Hyper-V isolation is required to enable the following use cases for Windows containers in Kubernetes:
- Hypervisor-based isolation between pods for additional security
- Backwards compatibility allowing a node to run a newer Windows Server version without requiring containers to be rebuilt
- Specific CPU/NUMA settings for a pod
- Memory isolation and reservations
Hyper-V isolation support will be added in a later release and will require CRI-Containerd.
Deployment with kubeadm and cluster API
Kubeadm is becoming the de facto standard for users to deploy a Kubernetes
cluster. Windows node support in kubeadm is currently a work-in-progress but a
guide is available here.
We are also making investments in cluster API to ensure Windows nodes are
properly provisioned.
3.4.2 - Guide for scheduling Windows containers in Kubernetes
Windows applications constitute a large portion of the services and applications that run in many organizations. This guide walks you through the steps to configure and deploy a Windows container in Kubernetes.
Objectives
- Configure an example deployment to run Windows containers on the Windows node
- (Optional) Configure an Active Directory Identity for your Pod using Group Managed Service Accounts (GMSA)
Before you begin
- Create a Kubernetes cluster that includes a master and a worker node running Windows Server
- It is important to note that creating and deploying services and workloads on Kubernetes behaves in much the same way for Linux and Windows containers. Kubectl commands to interface with the cluster are identical. The example in the section below is provided to jumpstart your experience with Windows containers.
Getting Started: Deploying a Windows container
To deploy a Windows container on Kubernetes, you must first create an example application. The example YAML file below creates a simple webserver application. Create a service spec named win-webserver.yaml with the contents below:
apiVersion: v1
kind: Service
metadata:
  name: win-webserver
  labels:
    app: win-webserver
spec:
  ports:
  # the port that this service should serve on
  - port: 80
    targetPort: 80
  selector:
    app: win-webserver
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: win-webserver
  name: win-webserver
spec:
  replicas: 2
  selector:
    matchLabels:
      app: win-webserver
  template:
    metadata:
      labels:
        app: win-webserver
      name: win-webserver
    spec:
      containers:
      - name: windowswebserver
        image: mcr.microsoft.com/windows/servercore:ltsc2019
        command:
        - powershell.exe
        - -command
        - "<#code used from https://gist.github.com/19WAS85/5424431#> ; $listener = New-Object System.Net.HttpListener ; $listener.Prefixes.Add('http://*:80/') ; $listener.Start() ; $callerCounts = @{} ; Write-Host('Listening at http://*:80/') ; while ($listener.IsListening) { ;$context = $listener.GetContext() ;$requestUrl = $context.Request.Url ;$clientIP = $context.Request.RemoteEndPoint.Address ;$response = $context.Response ;Write-Host '' ;Write-Host('> {0}' -f $requestUrl) ; ;$count = 1 ;$k=$callerCounts.Get_Item($clientIP) ;if ($k -ne $null) { $count += $k } ;$callerCounts.Set_Item($clientIP, $count) ;$ip=(Get-NetAdapter | Get-NetIpAddress); $header='<html><body><H1>Windows Container Web Server</H1>' ;$callerCountsString='' ;$callerCounts.Keys | % { $callerCountsString+='<p>IP {0} callerCount {1} ' -f $ip[1].IPAddress,$callerCounts.Item($_) } ;$footer='</body></html>' ;$content='{0}{1}{2}' -f $header,$callerCountsString,$footer ;Write-Output $content ;$buffer = [System.Text.Encoding]::UTF8.GetBytes($content) ;$response.ContentLength64 = $buffer.Length ;$response.OutputStream.Write($buffer, 0, $buffer.Length) ;$response.Close() ;$responseStatus = $response.StatusCode ;Write-Host('< {0}' -f $responseStatus) } ; "
      nodeSelector:
        kubernetes.io/os: windows
Note: Port mapping is also supported, but for simplicity in this example the container port 80 is exposed directly to the service.
- Check that all nodes are healthy:
kubectl get nodes
- Deploy the service and watch for pod updates:
kubectl apply -f win-webserver.yaml
kubectl get pods -o wide -w
When the service is deployed correctly both Pods are marked as Ready. To exit the watch command, press Ctrl+C.
- Check that the deployment succeeded. To verify:
- Two containers per pod on the Windows node, use docker ps
- Two pods listed from the Linux master, use kubectl get pods
- Node-to-pod communication across the network, curl port 80 of your pod IPs from the Linux master to check for a web server response
- Pod-to-pod communication, ping between pods (and across hosts, if you have more than one Windows node) using docker exec or kubectl exec
- Service-to-pod communication, curl the virtual service IP (seen under kubectl get services) from the Linux master and from individual pods
- Service discovery, curl the service name with the Kubernetes default DNS suffix
- Inbound connectivity, curl the NodePort from the Linux master or machines outside of the cluster
- Outbound connectivity, curl external IPs from inside the pod using kubectl exec
Note: Windows container hosts are not able to access the IP of services scheduled on them due to current platform limitations of the Windows networking stack. Only Windows pods are able to access service IPs.
Observability
Capturing logs from workloads
Logs are an important element of observability; they enable users to gain insights into the operational aspect of workloads and are a key ingredient to troubleshooting issues. Because Windows containers and workloads inside Windows containers behave differently from Linux containers, users had a hard time collecting logs, limiting operational visibility. Windows workloads, for example, are usually configured to log to ETW (Event Tracing for Windows) or push entries to the application event log. LogMonitor, an open source tool by Microsoft, is the recommended way to monitor configured log sources inside a Windows container. LogMonitor supports monitoring event logs, ETW providers, and custom application logs, piping them to STDOUT for consumption by kubectl logs <pod>.
Follow the instructions in the LogMonitor GitHub page to copy its binaries and configuration files to all your containers and add the necessary entrypoints for LogMonitor to push your logs to STDOUT.
Using configurable Container usernames
Starting with Kubernetes v1.16, Windows containers can be configured to run their entrypoints and processes with different usernames than the image defaults. The way this is achieved is a bit different from the way it is done for Linux containers. Learn more about it here.
Managing Workload Identity with Group Managed Service Accounts
Starting with Kubernetes v1.14, Windows container workloads can be configured to use Group Managed Service Accounts (GMSA). Group Managed Service Accounts are a specific type of Active Directory account that provides automatic password management, simplified service principal name (SPN) management, and the ability to delegate the management to other administrators across multiple servers. Containers configured with a GMSA can access external Active Directory Domain resources while carrying the identity configured with the GMSA. Learn more about configuring and using GMSA for Windows containers here.
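At a high level, the cluster holds a GMSA credential spec resource and the Pod references it by name through the Windows security context options. A minimal sketch, assuming a GMSACredentialSpec named gmsa-webapp1 already exists in the cluster (the pod and image names are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: gmsa-example                # illustrative name
spec:
  securityContext:
    windowsOptions:
      gmsaCredentialSpecName: gmsa-webapp1   # references an existing GMSACredentialSpec resource
  containers:
  - name: app
    image: mcr.microsoft.com/windows/servercore:ltsc2019
    command: ["powershell.exe", "-command", "Start-Sleep -Seconds 3600"]
  nodeSelector:
    kubernetes.io/os: windows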
Taints and Tolerations
Users today need to use some combination of taints and node selectors in order to keep Linux and Windows workloads on their respective OS-specific nodes. This likely imposes a burden only on Windows users. The recommended approach is outlined below, with one of its main goals being that this approach should not break compatibility for existing Linux workloads.
Ensuring OS-specific workloads land on the appropriate container host
Users can ensure Windows containers can be scheduled on the appropriate host using Taints and Tolerations. All Kubernetes nodes today have the following default labels:
- kubernetes.io/os = [windows|linux]
- kubernetes.io/arch = [amd64|arm64|...]
If a Pod specification does not specify a nodeSelector like "kubernetes.io/os": windows, it is possible the Pod can be scheduled on any host, Windows or Linux. This can be problematic since a Windows container can only run on Windows and a Linux container can only run on Linux. The best practice is to use a nodeSelector.
However, we understand that in many cases users have a pre-existing large number of deployments for Linux containers, as well as an ecosystem of off-the-shelf configurations, such as community Helm charts, and programmatic Pod generation cases, such as with Operators. In those situations, you may be hesitant to make the configuration change to add nodeSelectors. The alternative is to use Taints. Because the kubelet can set Taints during registration, it could easily be modified to automatically add a taint when running on Windows only.
For example: --register-with-taints='os=windows:NoSchedule'
By adding a taint to all Windows nodes, nothing will be scheduled on them (that includes existing Linux Pods). In order for a Windows Pod to be scheduled on a Windows node, it would need both the nodeSelector to choose Windows, and the appropriate matching toleration.
nodeSelector:
  kubernetes.io/os: windows
  node.kubernetes.io/windows-build: '10.0.17763'
tolerations:
- key: "os"
  operator: "Equal"
  value: "windows"
  effect: "NoSchedule"
Handling multiple Windows versions in the same cluster
The Windows Server version used by each pod must match that of the node. If you want to use multiple Windows
Server versions in the same cluster, then you should set additional node labels and nodeSelectors.
Kubernetes 1.17 automatically adds a new label node.kubernetes.io/windows-build to simplify this. If you're running an older version, then it's recommended to add this label manually to Windows nodes.
This label reflects the Windows major, minor, and build number that need to match for compatibility. Here are values used today for each Windows Server version.
| Product Name | Build Number(s) |
| --- | --- |
| Windows Server 2019 | 10.0.17763 |
| Windows Server version 1809 | 10.0.17763 |
| Windows Server version 1903 | 10.0.18362 |
Simplifying with RuntimeClass
RuntimeClass can be used to simplify the process of using taints and tolerations. A cluster administrator can create a RuntimeClass object which is used to encapsulate these taints and tolerations.
- Save this file to runtimeClasses.yml. It includes the appropriate nodeSelector for the Windows OS, architecture, and version.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: windows-2019
handler: 'docker'
scheduling:
  nodeSelector:
    kubernetes.io/os: 'windows'
    kubernetes.io/arch: 'amd64'
    node.kubernetes.io/windows-build: '10.0.17763'
  tolerations:
  - effect: NoSchedule
    key: os
    operator: Equal
    value: "windows"
- Run kubectl create -f runtimeClasses.yml as a cluster administrator
- Add runtimeClassName: windows-2019 as appropriate to Pod specs
For example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: iis-2019
  labels:
    app: iis-2019
spec:
  replicas: 1
  template:
    metadata:
      name: iis-2019
      labels:
        app: iis-2019
    spec:
      runtimeClassName: windows-2019
      containers:
      - name: iis
        image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
        resources:
          limits:
            cpu: 1
            memory: 800Mi
          requests:
            cpu: .1
            memory: 300Mi
        ports:
        - containerPort: 80
  selector:
    matchLabels:
      app: iis-2019
---
apiVersion: v1
kind: Service
metadata:
  name: iis
spec:
  type: LoadBalancer
  ports:
  - protocol: TCP
    port: 80
  selector:
    app: iis-2019
4 - Best practices
4.1 - Considerations for large clusters
A cluster is a set of nodes (physical
or virtual machines) running Kubernetes agents, managed by the
control plane.
Kubernetes v1.22 supports clusters with up to 5000 nodes. More specifically,
Kubernetes is designed to accommodate configurations that meet all of the following criteria:
- No more than 100 pods per node
- No more than 5000 nodes
- No more than 150000 total pods
- No more than 300000 total containers
You can scale your cluster by adding or removing nodes. The way you do this depends
on how your cluster is deployed.
Cloud provider resource quotas
To avoid running into cloud provider quota issues, when creating a cluster with many nodes,
consider:
- Request a quota increase for cloud resources such as:
- Compute instances
- CPUs
- Storage volumes
- In-use IP addresses
- Packet filtering rule sets
- Number of load balancers
- Network subnets
- Log streams
- Gate the cluster scaling actions to bring up new nodes in batches, with a pause
between batches, because some cloud providers rate limit the creation of new instances.
Control plane components
For a large cluster, you need a control plane with sufficient compute and other
resources.
Typically you would run one or two control plane instances per failure zone,
scaling those instances vertically first and then scaling horizontally after reaching
the point of diminishing returns for (vertical) scaling.
You should run at least one instance per failure zone to provide fault-tolerance. Kubernetes
nodes do not automatically steer traffic towards control-plane endpoints that are in the
same failure zone; however, your cloud provider might have its own mechanisms to do this.
For example, using a managed load balancer, you configure the load balancer to send traffic
that originates from the kubelet and Pods in failure zone A, and direct that traffic only
to the control plane hosts that are also in zone A. If a single control-plane host or
endpoint in failure zone A goes offline, that means that all the control-plane traffic for
nodes in zone A is now being sent between zones. Running multiple control plane hosts in
each zone makes that outcome less likely.
etcd storage
To improve performance of large clusters, you can store Event objects in a separate
dedicated etcd instance.
When creating a cluster, you can (using custom tooling):
- start and configure an additional etcd instance
- configure the API server to use it for storing events
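A minimal sketch of the API server side of that setup, assuming a dedicated events etcd instance reachable at https://etcd-events:2379 (both etcd URLs are illustrative); the --etcd-servers-overrides flag routes Event objects to the dedicated instance:
# excerpt from a kube-apiserver static Pod manifest
spec:
  containers:
  - name: kube-apiserver
    image: k8s.gcr.io/kube-apiserver:v1.21.0
    command:
    - kube-apiserver
    - --etcd-servers=https://etcd-main:2379                        # primary etcd for all other resources
    - --etcd-servers-overrides=/events#https://etcd-events:2379    # store Event objects in the dedicated etcd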
Addon resources
Kubernetes resource limits
help to minimize the impact of memory leaks and other ways that pods and containers can
impact other components. These resource limits apply to
addon resources just as they apply to application workloads.
For example, you can set CPU and memory limits for a logging component:
...
containers:
- name: fluentd-cloud-logging
  image: fluent/fluentd-kubernetes-daemonset:v1
  resources:
    limits:
      cpu: 100m
      memory: 200Mi
Addons' default limits are typically based on data collected from experience running
each addon on small or medium Kubernetes clusters. When running on large
clusters, addons often consume more of some resources than their default limits.
If a large cluster is deployed without adjusting these values, the addon(s)
may continuously get killed because they keep hitting the memory limit.
Alternatively, the addon may run but with poor performance due to CPU time
slice restrictions.
To avoid running into cluster addon resource issues, when creating a cluster with
many nodes, consider the following:
- Some addons scale vertically - there is one replica of the addon for the cluster
or serving a whole failure zone. For these addons, increase requests and limits
as you scale out your cluster.
- Many addons scale horizontally - you add capacity by running more pods - but with
a very large cluster you may also need to raise CPU or memory limits slightly.
The VerticalPodAutoscaler can run in recommender mode to provide suggested
figures for requests and limits (see the sketch after this list).
- Some addons run as one copy per node, controlled by a DaemonSet: for example, a node-level log aggregator. Similar to
the case with horizontally-scaled addons, you may also need to raise CPU or memory
limits slightly.
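As referenced above, a minimal sketch of a VerticalPodAutoscaler in recommender-only mode for an addon Deployment (the target name coredns is illustrative, and the VPA components must already be installed in the cluster):
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: coredns-vpa                 # illustrative name
  namespace: kube-system
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns                   # illustrative addon target
  updatePolicy:
    updateMode: "Off"               # recommender mode: compute recommendations without applying them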
What's next
VerticalPodAutoscaler
is a custom resource that you can deploy into your cluster
to help you manage resource requests and limits for pods.
Visit Vertical Pod Autoscaler
to learn more about VerticalPodAutoscaler
and how you can use it to scale cluster
components, including cluster-critical addons.
The cluster autoscaler
integrates with a number of cloud providers to help you run the right number of
nodes for the level of resource demand in your cluster.
4.2 - Running in multiple zones
This page describes running Kubernetes across multiple zones.
Background
Kubernetes is designed so that a single Kubernetes cluster can run
across multiple failure zones, typically where these zones fit within
a logical grouping called a region. Major cloud providers define a region
as a set of failure zones (also called availability zones) that provide
a consistent set of features: within a region, each zone offers the same
APIs and services.
Typical cloud architectures aim to minimize the chance that a failure in
one zone also impairs services in another zone.
Control plane behavior
All control plane components
support running as a pool of interchangeable resources, replicated per
component.
When you deploy a cluster control plane, place replicas of
control plane components across multiple failure zones. If availability is
an important concern, select at least three failure zones and replicate
each individual control plane component (API server, scheduler, etcd,
cluster controller manager) across at least three failure zones.
If you are running a cloud controller manager then you should
also replicate this across all the failure zones you selected.
Note: Kubernetes does not provide cross-zone resilience for the API server
endpoints. You can use various techniques to improve availability for
the cluster API server, including DNS round-robin, SRV records, or
a third-party load balancing solution with health checking.
Node behavior
Kubernetes automatically spreads the Pods for
workload resources (such as Deployment
or StatefulSet)
across different nodes in a cluster. This spreading helps
reduce the impact of failures.
When nodes start up, the kubelet on each node automatically adds
labels to the Node object
that represents that specific kubelet in the Kubernetes API.
These labels can include
zone information.
If your cluster spans multiple zones or regions, you can use node labels
in conjunction with
Pod topology spread constraints
to control how Pods are spread across your cluster among fault domains:
regions, zones, and even specific nodes.
These hints enable the
scheduler to place
Pods for better expected availability, reducing the risk that a correlated
failure affects your whole workload.
For example, you can set a constraint to make sure that the
3 replicas of a StatefulSet are all running in different zones to each
other, whenever that is feasible. You can define this declaratively
without explicitly defining which availability zones are in use for
each workload.
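A minimal sketch of such a constraint on a StatefulSet's Pod template, assuming nodes carry the well-known topology.kubernetes.io/zone label (the names and image are illustrative):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db                  # illustrative name
spec:
  replicas: 3
  serviceName: example-db
  selector:
    matchLabels:
      app: example-db
  template:
    metadata:
      labels:
        app: example-db
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone   # spread replicas across zones
        whenUnsatisfiable: ScheduleAnyway          # soft constraint: spread whenever feasible
        labelSelector:
          matchLabels:
            app: example-db
      containers:
      - name: db
        image: k8s.gcr.io/pause:3.5                # placeholder image for the sketch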
Distributing nodes across zones
Kubernetes' core does not create nodes for you; you need to do that yourself,
or use a tool such as the Cluster API to
manage nodes on your behalf.
Using tools such as the Cluster API you can define sets of machines to run as
worker nodes for your cluster across multiple failure domains, and rules to
automatically heal the cluster in case of whole-zone service disruption.
Manual zone assignment for Pods
You can apply node selector constraints
to Pods that you create, as well as to Pod templates in workload resources
such as Deployment, StatefulSet, or Job.
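For example, a minimal sketch that pins a Deployment's Pods to one zone using the well-known zone label (the zone value us-east-1a and the other names are illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zonal-app                   # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: zonal-app
  template:
    metadata:
      labels:
        app: zonal-app
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: us-east-1a    # illustrative zone name
      containers:
      - name: app
        image: k8s.gcr.io/pause:3.5                # placeholder image for the sketch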
Storage access for zones
When persistent volumes are created, the PersistentVolumeLabel
admission controller
automatically adds zone labels to any PersistentVolumes that are linked to a specific
zone. The scheduler then ensures,
through its NoVolumeZoneConflict
predicate, that pods which claim a given PersistentVolume
are only placed into the same zone as that volume.
You can specify a StorageClass
for PersistentVolumeClaims that specifies the failure domains (zones) that the
storage in that class may use.
To learn about configuring a StorageClass that is aware of failure domains or zones,
see Allowed topologies.
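A minimal sketch of a StorageClass restricted to specific zones through allowedTopologies (the provisioner and zone names are illustrative; use the provisioner string documented by your CSI driver):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zonal-standard              # illustrative name
provisioner: ebs.csi.aws.com        # illustrative CSI provisioner
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:
    - us-east-1a                    # illustrative zones
    - us-east-1b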
Networking
By itself, Kubernetes does not include zone-aware networking. You can use a
network plugin
to configure cluster networking, and that network solution might have zone-specific
elements. For example, if your cloud provider supports Services with
type=LoadBalancer
, the load balancer might only send traffic to Pods running in the
same zone as the load balancer element processing a given connection.
Check your cloud provider's documentation for details.
For custom or on-premises deployments, similar considerations apply.
Service and
Ingress behavior, including handling
of different failure zones, does vary depending on exactly how your cluster is set up.
Fault recovery
When you set up your cluster, you might also need to consider whether and how
your setup can restore service if all the failure zones in a region go
off-line at the same time. For example, do you rely on there being at least
one node able to run Pods in a zone?
Make sure that any cluster-critical repair work does not rely
on there being at least one healthy node in your cluster. For example: if all nodes
are unhealthy, you might need to run a repair Job with a special
toleration so that the repair
can complete enough to bring at least one node into service.
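A hedged sketch of what such a repair Job might look like, tolerating the taints that the node lifecycle controller places on unhealthy nodes (the Job name, image, and repair command are illustrative):
apiVersion: batch/v1
kind: Job
metadata:
  name: cluster-repair              # illustrative name
spec:
  template:
    spec:
      tolerations:
      - key: node.kubernetes.io/not-ready
        operator: Exists            # omitting effect tolerates every taint effect
      - key: node.kubernetes.io/unreachable
        operator: Exists
      containers:
      - name: repair
        image: busybox:1.34         # illustrative image
        command: ["sh", "-c", "echo run cluster repair steps here"]
      restartPolicy: Never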
Kubernetes doesn't come with an answer for this challenge; however, it's
something to consider.
What's next
To learn how the scheduler places Pods in a cluster, honoring the configured constraints,
visit Scheduling and Eviction.
4.3 - Validate node setup
Node conformance test is a containerized test framework that provides a system
verification and functionality test for a node. The test validates whether the
node meets the minimum requirements for Kubernetes; a node that passes the test
is qualified to join a Kubernetes cluster.
Node Prerequisite
To run node conformance test, a node must satisfy the same prerequisites as a
standard Kubernetes node. At a minimum, the node should have the following
daemons installed:
- Container Runtime (Docker)
- Kubelet
To run the node conformance test, perform the following steps:
- Work out the value of the --kubeconfig option for the kubelet; for example: --kubeconfig=/var/lib/kubelet/config.yaml.
Because the test framework starts a local control plane to test the kubelet, use http://localhost:8080 as the URL of the API server.
There are some other kubelet command line parameters you may want to use:
- --pod-cidr: If you are using kubenet, you should specify an arbitrary CIDR to Kubelet, for example --pod-cidr=10.180.0.0/24.
- --cloud-provider: If you are using --cloud-provider=gce, you should remove the flag to run the test.
- Run the node conformance test with command:
# $CONFIG_DIR is the pod manifest path of your Kubelet.
# $LOG_DIR is the test output path.
sudo docker run -it --rm --privileged --net=host \
-v /:/rootfs -v $CONFIG_DIR:$CONFIG_DIR -v $LOG_DIR:/var/result \
k8s.gcr.io/node-test:0.2
Kubernetes also provides node conformance test docker images for other
architectures:
| Arch | Image |
| --- | --- |
| amd64 | node-test-amd64 |
| arm | node-test-arm |
| arm64 | node-test-arm64 |
Running Selected Test
To run specific tests, overwrite the environment variable FOCUS with the regular expression of tests you want to run.
sudo docker run -it --rm --privileged --net=host \
-v /:/rootfs:ro -v $CONFIG_DIR:$CONFIG_DIR -v $LOG_DIR:/var/result \
-e FOCUS=MirrorPod \ # Only run MirrorPod test
k8s.gcr.io/node-test:0.2
To skip specific tests, overwrite the environment variable SKIP with the regular expression of tests you want to skip.
sudo docker run -it --rm --privileged --net=host \
-v /:/rootfs:ro -v $CONFIG_DIR:$CONFIG_DIR -v $LOG_DIR:/var/result \
-e SKIP=MirrorPod \ # Run all conformance tests but skip MirrorPod test
k8s.gcr.io/node-test:0.2
Node conformance test is a containerized version of node e2e test.
By default, it runs all conformance tests.
Theoretically, you can run any node e2e test if you configure the container and mount required volumes properly. But it is strongly recommended to only run conformance tests, because running non-conformance tests requires much more complex configuration.
Caveats
- The test leaves some docker images on the node, including the node conformance
test image and images of containers used in the functionality
test.
- The test leaves dead containers on the node. These containers are created
during the functionality test.
4.4 - PKI certificates and requirements
Kubernetes requires PKI certificates for authentication over TLS.
If you install Kubernetes with kubeadm, the certificates that your cluster requires are automatically generated.
You can also generate your own certificates -- for example, to keep your private keys more secure by not storing them on the API server.
This page explains the certificates that your cluster requires.
How certificates are used by your cluster
Kubernetes requires PKI for the following operations:
- Client certificates for the kubelet to authenticate to the API server
- Server certificate for the API server endpoint
- Client certificates for administrators of the cluster to authenticate to the API server
- Client certificates for the API server to talk to the kubelets
- Client certificate for the API server to talk to etcd
- Client certificate/kubeconfig for the controller manager to talk to the API server
- Client certificate/kubeconfig for the scheduler to talk to the API server.
- Client and server certificates for the front-proxy
etcd also implements mutual TLS to authenticate clients and peers.
Where certificates are stored
If you install Kubernetes with kubeadm, certificates are stored in /etc/kubernetes/pki. All paths in this documentation are relative to that directory.
If you don't want kubeadm to generate the required certificates, you can create them in either of the following ways.
Single root CA
You can create a single root CA, controlled by an administrator. This root CA can then create multiple intermediate CAs, and delegate all further creation to Kubernetes itself.
Required CAs:
| path | Default CN | description |
| --- | --- | --- |
| ca.crt,key | kubernetes-ca | Kubernetes general CA |
| etcd/ca.crt,key | etcd-ca | For all etcd-related functions |
| front-proxy-ca.crt,key | kubernetes-front-proxy-ca | For the front-end proxy |
On top of the above CAs, it is also necessary to get a public/private key pair for service account management, sa.key and sa.pub.
All certificates
If you don't wish to copy the CA private keys to your cluster, you can generate all certificates yourself.
Required certificates:
| Default CN | Parent CA | O (in Subject) | kind | hosts (SAN) |
| --- | --- | --- | --- | --- |
| kube-etcd | etcd-ca | | server, client | localhost, 127.0.0.1 |
| kube-etcd-peer | etcd-ca | | server, client | <hostname>, <Host_IP>, localhost, 127.0.0.1 |
| kube-etcd-healthcheck-client | etcd-ca | | client | |
| kube-apiserver-etcd-client | etcd-ca | system:masters | client | |
| kube-apiserver | kubernetes-ca | | server | <hostname>, <Host_IP>, <advertise_IP>, [1] |
| kube-apiserver-kubelet-client | kubernetes-ca | system:masters | client | |
| front-proxy-client | kubernetes-front-proxy-ca | | client | |
[1]: any other IP or DNS name you contact your cluster on (as used by kubeadm: the load balancer stable IP and/or DNS name, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster, kubernetes.default.svc.cluster.local)
where kind maps to one or more of the x509 key usage types:
| kind | Key usage |
| --- | --- |
| server | digital signature, key encipherment, server auth |
| client | digital signature, key encipherment, client auth |
Note: Hosts/SAN listed above are the recommended ones for getting a working cluster; if required by a specific setup, it is possible to add additional SANs on all the server certificates.
Note: For kubeadm users only:
- The scenario where you are copying to your cluster CA certificates without private keys is referred to as external CA in the kubeadm documentation.
- If you are comparing the above list with a kubeadm generated PKI, please be aware that kube-etcd, kube-etcd-peer and kube-etcd-healthcheck-client certificates are not generated in the case of external etcd.
Certificate paths
Certificates should be placed in a recommended path (as used by kubeadm).
Paths should be specified using the given argument regardless of location.
| Default CN | recommended key path | recommended cert path | command | key argument | cert argument |
| --- | --- | --- | --- | --- | --- |
| etcd-ca | etcd/ca.key | etcd/ca.crt | kube-apiserver | | --etcd-cafile |
| kube-apiserver-etcd-client | apiserver-etcd-client.key | apiserver-etcd-client.crt | kube-apiserver | --etcd-keyfile | --etcd-certfile |
| kubernetes-ca | ca.key | ca.crt | kube-apiserver | | --client-ca-file |
| kubernetes-ca | ca.key | ca.crt | kube-controller-manager | --cluster-signing-key-file | --client-ca-file, --root-ca-file, --cluster-signing-cert-file |
| kube-apiserver | apiserver.key | apiserver.crt | kube-apiserver | --tls-private-key-file | --tls-cert-file |
| kube-apiserver-kubelet-client | apiserver-kubelet-client.key | apiserver-kubelet-client.crt | kube-apiserver | --kubelet-client-key | --kubelet-client-certificate |
| front-proxy-ca | front-proxy-ca.key | front-proxy-ca.crt | kube-apiserver | | --requestheader-client-ca-file |
| front-proxy-ca | front-proxy-ca.key | front-proxy-ca.crt | kube-controller-manager | | --requestheader-client-ca-file |
| front-proxy-client | front-proxy-client.key | front-proxy-client.crt | kube-apiserver | --proxy-client-key-file | --proxy-client-cert-file |
| etcd-ca | etcd/ca.key | etcd/ca.crt | etcd | | --trusted-ca-file, --peer-trusted-ca-file |
| kube-etcd | etcd/server.key | etcd/server.crt | etcd | --key-file | --cert-file |
| kube-etcd-peer | etcd/peer.key | etcd/peer.crt | etcd | --peer-key-file | --peer-cert-file |
| etcd-ca | | etcd/ca.crt | etcdctl | | --cacert |
| kube-etcd-healthcheck-client | etcd/healthcheck-client.key | etcd/healthcheck-client.crt | etcdctl | --key | --cert |
Same considerations apply for the service account key pair:
| private key path | public key path | command | argument |
| --- | --- | --- | --- |
| sa.key | | kube-controller-manager | --service-account-private-key-file |
| | sa.pub | kube-apiserver | --service-account-key-file |
You must manually configure these administrator and service accounts:
| filename | credential name | Default CN | O (in Subject) |
| --- | --- | --- | --- |
| admin.conf | default-admin | kubernetes-admin | system:masters |
| kubelet.conf | default-auth | system:node:<nodeName> (see note) | system:nodes |
| controller-manager.conf | default-controller-manager | system:kube-controller-manager | |
| scheduler.conf | default-scheduler | system:kube-scheduler | |
Note: The value of <nodeName> for kubelet.conf must match precisely the value of the node name provided by the kubelet as it registers with the apiserver. For further details, read Node Authorization.
- For each config, generate an x509 cert/key pair with the given CN and O.
- Run kubectl as follows for each config:
KUBECONFIG=<filename> kubectl config set-cluster default-cluster --server=https://<host ip>:6443 --certificate-authority <path-to-kubernetes-ca> --embed-certs
KUBECONFIG=<filename> kubectl config set-credentials <credential-name> --client-key <path-to-key>.pem --client-certificate <path-to-cert>.pem --embed-certs
KUBECONFIG=<filename> kubectl config set-context default-system --cluster default-cluster --user <credential-name>
KUBECONFIG=<filename> kubectl config use-context default-system
These files are used as follows:
| filename | command | comment |
| --- | --- | --- |
| admin.conf | kubectl | Configures administrator user for the cluster |
| kubelet.conf | kubelet | One required for each node in the cluster. |
| controller-manager.conf | kube-controller-manager | Must be added to the manifest in manifests/kube-controller-manager.yaml |
| scheduler.conf | kube-scheduler | Must be added to the manifest in manifests/kube-scheduler.yaml |