• trjordan 2 months ago

    I think this blog post is one turn of the crank away from a truth we're all about to learn: don't hand roll your own Kubernetes ingress.

    Dealing with the traffic handling between your users and your code is not a trivial problem. Like all good ops problems, you can fix it with good tools, deep knowledge of those tools, fine-grained observability, and smart people running all that.

    This has been the recipe for a couple of really successful SaaS offerings. Individual servers? Datadog. CDN? Akamai / Fastly.

    Disclaimer: I work at one of those companies, Turbine Labs, and we're trying to make ingress better. Here's a presentation from our CEO on Kubernetes ingress, and why the specification creates the problems that this blog post is trying to fix. https://www.slideshare.net/mobile/MarkMcBride11/beyond-ingre...

  • odammit 2 months ago

    This is a great read. I know a single cluster for all environments is somewhat popular, but it's always made me uncomfortable, both for the reasons stated in the article and for handling kube upgrades. I'd like to give upgrades a swing on a staging cluster ahead of time rather than go straight to prod or build out a cluster just to test an upgrade on.

    I tend to keep my staging and prod clusters identical, even names of services (no prod-web and stage-web, just web).

    I'll set them up in different AWS accounts to clearly separate them; the only differences are the DNS name of the cluster and who can access it.

    Edit: I suck at italicizing and grammar.

  • web007 2 months ago

    +100 to this. Why would any sane Op/Inf/SRE choose not to have at least account-level isolation - is it only a matter of cost due to under-utilization?

    I prefer to have everything 100% isolated for dev / qa / stage / prod, and have process and tooling in place to explicitly cross the streams. This comes from a history of pain with random dev-to-prod (or worse, prod-to-dev) access and dealing with "real companies" with things like audit requirements.

    Having them separate lets you do things like @odammit suggests: upgrade your cluster in staging without affecting your developers or customers.

    If you don't want to go that far, you can set up separate AWS accounts that are all tied together via an organization, and you can set up IAM roles and whatnot to share your API keys between accounts. That gives you at least some isolation, but still lets you GSD the same way as if you have a single account.

  • toomuchtodo 2 months ago

    > and you can set up IAM roles and whatnot to share your API keys between accounts. That gives you at least some isolation, but still lets you GSD the same way as if you have a single account.

    Do not do this. You are defeating the purpose of account level separation if you're sharing API keys between accounts. Each AWS environment should be totally segregated from the others (cross-account IAM permissions only if you must), limiting the blast radius in the event of human error or a malicious actor.
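
    If you really do need cross-account access, assuming a role with temporary credentials is the cleaner pattern; roughly something like this (the account ID and role name are just placeholders):

        # From the tooling account, assume a narrowly-scoped role in the target
        # account instead of copying that account's API keys around.
        aws sts assume-role \
          --role-arn arn:aws:iam::123456789012:role/staging-deployer \
          --role-session-name ci-deploy
        # Returns temporary credentials (AccessKeyId/SecretAccessKey/SessionToken)
        # that expire on their own, so nothing long-lived crosses the account boundary.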

    Source: Previously did devops/infra for 6 years, currently doing security

  • danielmartins 2 months ago

    > Why would any sane Op/Inf/SRE choose not to have at least account-level isolation - is it only a matter of cost due to under-utilization?

    In our particular case, yes, pretty much. We are a small company with a small development team, so even if I wanted to split accounts across different teams, we would end up with one account for 2-3 users, which doesn't make a lot of sense right now.

  • danielmartins 2 months ago

    > This is a great read. I know a single cluster for all environments is somewhat popular, but it's always made me uncomfortable, both for the reasons stated in the article and for handling kube upgrades. I'd like to give upgrades a swing on a staging cluster ahead of time rather than go straight to prod or build out a cluster just to test an upgrade on.

    I've been doing patch-level upgrades in place since the beginning and have never had a problem. For more sensitive upgrades, this is what I do: create a new cluster based on the current state in order to test the upgrade in a safe environment before applying it to production.

    And for even riskier upgrades, I go blue/green-style by creating a new cluster with the same stuff running in it and gradually shifting traffic to the new cluster.

  • hltbra 2 months ago

    Cool read. I don't use Kubernetes but I learned a few things from this blog post that are applicable to my ECS environment.

    The NGINX config part is tricky; it hadn't occurred to me that many programs try to be smart about machine resources in ways that won't work as expected in the container world. This was a good reminder. OP didn't mention which Linux distro he's using or which OS-level configs he changed at the end of the day; I'd like to see that (was there any config not mentioned in the post?).
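
    A quick way to see that mismatch from inside a container (assuming cgroup v1, which is what Docker/ECS used here; the paths differ on cgroup v2):

        # What most programs (e.g. nginx's "worker_processes auto") will detect:
        nproc                                      # reports the host's CPU count, not the container limit
        # What the container is actually allowed to use under cgroup v1:
        cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us    # -1 means "no limit"
        cat /sys/fs/cgroup/cpu/cpu.cfs_period_us   # quota / period = CPUs' worth of time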

    It's awesome that OP had lots of monitoring to guide him through the problem discovery and experimentation. I need more of this in my ECS setup. I haven't hopped on the Prometheus train yet, by the way.

  • danielmartins 2 months ago

    > OP didn't mention which Linux distro he's using or which OS-level configs he changed at the end of the day.

    I'm using Container Linux, and yes, I made a few modifications, but I intentionally left them out of the blog post since someone might be tempted to use them as-is.

    I'll share more details in that regard if more people seem interested.

  • robszumski 2 months ago

    I'd be interested to hear more.

  • hardwaresofton 2 months ago

    Shameless plug! The insights in this article are pretty deep, but if you're looking for just a clumsy step 1 to setting up the NGINX ingress controller on Kubernetes, check out what I wrote:

    https://vadosware.io/post/serving-http-applications-on-kuber...

    The most important thing I found out while working on the NGINX controller was that you can just jump into it and do some debugging by poking around at the NGINX configuration that's inside it. There's no insight in there as deep as what's in this article, but for those who are maybe new to Kubernetes, I hope it's helpful!
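
    For example, something along these lines works for that kind of poking around (the namespace and label depend on how you installed the controller, and <controller-pod> is a placeholder):

        # Find the ingress controller pod (label/namespace vary by install method)
        kubectl get pods -n kube-system -l app=nginx-ingress-controller

        # Dump the nginx.conf the controller generated from your Ingress resources
        kubectl exec -n kube-system <controller-pod> -- cat /etc/nginx/nginx.conf

        # Or the full effective config, includes and all
        kubectl exec -n kube-system <controller-pod> -- nginx -T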

  • Thaxll 2 months ago

    "Most Linux distributions do not provide an optimal configuration for running high load web servers out-of-the-box; double-check the values for each kernel param via sysctl -a."

    This is not true. If you run Debian / CentOS 7 / Ubuntu, the out-of-the-box settings are good. The thing you don't want to do is start modifying the network stack based on random blogs.

  • danielmartins 2 months ago

    > This is not true. If you run Debian / CentOS 7 / Ubuntu, the out-of-the-box settings are good. The thing you don't want to do is start modifying the network stack based on random blogs.

    I agree these are good defaults, but they are not meant to work well for all kinds of workloads. And yes, if things are working for you the way they are, that's okay; there's no need to change anything.

    On the other hand, I personally don't know anyone who runs production servers of any kind on top of unmodified Linux distros.
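
    If you're curious where your distro stands, these are a few of the params that tend to come up for busy proxies (just inspecting them here; sensible values depend entirely on the workload):

        # Cap on the accept queue a listening socket may request
        # (ties into the accept queue overflows discussed in the post)
        sysctl net.core.somaxconn
        # Backlog for half-open (SYN_RECV) connections
        sysctl net.ipv4.tcp_max_syn_backlog
        # Ephemeral port range available for upstream/outgoing connections
        sysctl net.ipv4.ip_local_port_range
        # System-wide file descriptor limit
        sysctl fs.file-max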

  • tinix 2 months ago

    > On the other hand, I personally don't know anyone who runs production servers of any kind on top of unmodified Linux distros.

    You are so, so, so lucky... lol. I say that as someone who has come across a desktop CentOS install on a server on multiple occasions, complete with a running X.org and like 3-4 desktop environments to choose from, along with ALL of the extras. KDE office apps, GNOME office apps, etc... HORRIBLE.

  • zrth 2 months ago

    Sounds interesting! Do you have URLs for more information about this? Would love to read good posts about that! My production servers have been running with standard parameters at every company so far. I feel I might be missing out!

  • manigandham 2 months ago

    > high load web servers

    Really? The distributions might work for the average site, but high load always requires tuning from the defaults, even on the latest distros.

  • manigandham 2 months ago

    NGINX also has their own ingress controller (in addition to the kubernetes community version): https://github.com/nginxinc/kubernetes-ingress

  • ultimoo 2 months ago

    Great read!

    >> "Let me start by saying that if you are not alerting on accept queue overflows, well, you should."

    Does anyone know how to effectively keep tabs on this in a Docker container running open source nginx? I have an external log/metrics monitoring server that could alert on this, but I'm asking more along the lines of how to get this information to the monitoring server.

  • foxylion 2 months ago

    In this case (the ingress controller), this is done with a Prometheus metrics exporter, so all the metrics are available in Prometheus.

    https://github.com/hnlq715/nginx-vts-exporter
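
    If you just want to eyeball the raw kernel counters behind those accept-queue alerts (on the host or inside the nginx container), they should show up in the TCP stats:

        # Cumulative counters since boot; a steadily increasing number is the red flag
        netstat -s | grep -i -E 'listen queue|SYNs to LISTEN'

        # Same counters via iproute2, if netstat isn't installed
        nstat -az TcpExtListenOverflows TcpExtListenDrops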

  • zaroth 2 months ago

    It sounded like there's a config directive to have the Ingress Controller push all its metrics into Prometheus?

  • guslees 2 months ago

    If it's helpful at all, here's a concrete example of a k8s nginx setup that exports-to/is-monitored-by prometheus: https://github.com/bitnami/kube-manifests/blob/master/common... (Start at https://engineering.bitnami.com/articles/an-example-of-real-... if you would prefer to approach that repo top-down)

  • zaroth 2 months ago

    Am I correct in assuming that Kube Service IP routing happens via iptables DNAT to get the request into the pod running the Ingress Controller, and that on top of that the Ingress Controller routes traffic to another Service IP, which also has to go through iptables DNAT?

  • danielmartins 2 months ago

    No. By default, the NGINX ingress controller routes traffic directly to pod IPs (the Service endpoints):

    https://github.com/kubernetes/ingress/tree/master/controller...

  • zaroth 2 months ago

    Thank you. So there is a DNAT to get to the Ingress Controller, but from there at least it's direct routing to the service endpoint(s)? Does that mean the Virtual IP given to the Service is basically bypassed when using an Ingress Controller?

    TLS termination at the Ingress Controller and by default unencrypted from there to the service endpoint?

    I found this useful: http://blog.wercker.com/troubleshooting-ingress-kubernetes

    Interesting discussion here: https://github.com/kubernetes/ingress/issues/257

    It seems like a lot of overhead before even starting to process a request!

  • danielmartins 2 months ago

    > TLS termination at the Ingress Controller and by default unencrypted from there to the service endpoint?

    We are doing TLS termination at the ELB (we're running on AWS).

    > Interesting discussion here: https://github.com/kubernetes/ingress/issues/257

    Great, thanks!

    Regarding ways of updating the NGINX upstreams without requiring a reload, I was just made aware of modules like ngx_dynamic_upstream[1]. I'm sure there are other ways to address this in a less disruptive way than reloading everything, so this is probably something that could be improved in the future.

    [1] https://github.com/cubicdaiya/ngx_dynamic_upstream

  • gtirloni 2 months ago

    May I ask how you are automating the ELB/TLS configuration and how that ties into the Ingress controller? Do you somehow specify which ELB it should use? We're in a similar situation.

  • danielmartins 2 months ago

    You can annotate any Service of type LoadBalancer in order to configure various aspects[1] of the associated ELB, including which ACM-managed certificate you want to attach to each listener port.

    [1] https://github.com/kubernetes/kubernetes/blob/master/pkg/clo...
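
    Roughly, it looks something like this on the ingress controller's Service (the name, selector, and certificate ARN below are just placeholders):

        kubectl apply -f - <<'EOF'
        apiVersion: v1
        kind: Service
        metadata:
          name: nginx-ingress-lb
          annotations:
            # TLS terminates at the ELB using this ACM certificate
            service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-east-1:123456789012:certificate/placeholder
            # Only the 443 listener gets the certificate
            service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
            # The ELB talks plain HTTP to the pods behind it
            service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
        spec:
          type: LoadBalancer
          selector:
            app: nginx-ingress-controller
          ports:
          - name: http
            port: 80
            targetPort: 80
          - name: https
            port: 443
            targetPort: 80
        EOF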

  • gtirloni 2 months ago

    Thanks a lot, this will save us quite some time.

  • rjcaricio 2 months ago

    Thanks for sharing your experience. I got some great insights to double-check in my current environment.

    Could you share in which version of NGINX you found the issue with the reloads? And in which version was the fix released?

    PS: I find it interesting/brave that you use a single cluster for several environments.

  • danielmartins 2 months ago

    > Could you share in which version of NGINX you found the issue with the reloads? And in which version was the fix released?

    I'm using 0.9.0-beta.13. I first reported this issue in an NGINX ingress PR[1], so the last couple of releases do not suffer from the bug I described in the blog post.

    > I find it interesting/brave that you use a single cluster for several environments.

    I'm not working for a big corporation, so dev/staging/prod "environments" are just three deployment pipelines to the same infrastructure.

    As of now, things are running smoothly as they are, but I might end up using different clusters for each environment in the future.

    [1] https://github.com/kubernetes/ingress/pull/1088

  • tostaki 2 months ago

    Great read! Especially the part on ingress class, which I didn't know about. Would you mind sharing some of your Grafana dashboards?

  • mindfulmonkey 2 months ago

    I still don't really understand the benefit of an Ingress controller versus just a Service > Nginx Deployment.

  • zimbatm 2 months ago

    It's the most confusing part of Kubernetes IMO. It's a load balancer with a very restricted feature set, so what is it good for?

    The main issue it tries to solve is how to get traffic from outside of the cluster to the inside. The ingress resource is also supposed to be orthogonal to the ingress controller, so the same resource works whether your app is deployed on AWS or GCP (in practice that's not true, though).

    With the nginx ingress controller, the main advantage I see is that you can share port 80 on the nodes between multiple Ingress resources.
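
    For example, both of these Ingress resources end up served by the same controller on the same node ports (hostnames, service names, and the class annotation value here are made up):

        kubectl apply -f - <<'EOF'
        apiVersion: extensions/v1beta1
        kind: Ingress
        metadata:
          name: app-one
          annotations:
            kubernetes.io/ingress.class: nginx   # which controller should pick this up
        spec:
          rules:
          - host: one.example.com
            http:
              paths:
              - backend:
                  serviceName: app-one
                  servicePort: 80
        ---
        apiVersion: extensions/v1beta1
        kind: Ingress
        metadata:
          name: app-two
          annotations:
            kubernetes.io/ingress.class: nginx
        spec:
          rules:
          - host: two.example.com
            http:
              paths:
              - backend:
                  serviceName: app-two
                  servicePort: 80
        EOF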

  • sandGorgon 2 months ago

    Ingress + overlay network confusion was the reason we moved from k8s to Docker Swarmkit.

    I still keep hoping for kubernetes kompose (https://github.com/kubernetes/kompose) to bring the simplicity of Docker Swarmkit to k8s.

    Or will Docker Infrakit bring creeping sophistication first and eat Kubernetes' lunch? (https://github.com/docker/infrakit/pull/601)

  • fulafel 2 months ago

    Why does everyone use reverse proxies? It seems complex and inefficient. Why not serve XHRs and other dynamic content from the app server(s) and static content from a static web server?

  • odammit 2 months ago

    Off the top of my head: load balancing, hiding details of app servers, compressing responses and multivariate testing.

    All of which could be done at the app server level sure, but then that would shift that complexity to your app and your developers.

    Oh and job security, obviously.

  • fulafel 2 months ago

    You could do all of those, except hiding app servers, with the client-based technique I outlined in the other comment nearby. It would just be a tweak to the rule that the frontend uses to choose the app server.

  • manigandham 2 months ago

    That's a simplistic scenario and does not apply at all here. Kubernetes is a container orchestration platform that can run thousands of containers over thousands of compute nodes, and directing traffic to them will require some sort of routing/proxy system.

  • fulafel 2 months ago

    We already have routing systems for large numbers of nodes in the internet technology stack; it's not obvious to me why we need another one at the HTTP layer.

  • manigandham 2 months ago

    Many of those routing systems are proxies, and they can apply at any layer.

  • fulafel 2 months ago

    I'm not sure I follow you. Do you mean that the routing systems at the lower networking layers can be thought of as proxies in the sense that they copy data in and copy data out? That's technically correct, but they're not conventionally called proxies.

  • manigandham 2 months ago

    Proxies, by definition, are intermediaries. Everything on the internet is connected by a giant network of proxies at some layer - NATs, gateways, firewalls, etc.

    Kubernetes runs a cluster of machines that act like a mini internet, with many containers running many apps. These apps communicate with each other across containers and machines through a series of proxies so that apps only have to worry about a single address or service name. Kubernetes does allow headless services, which publish all of the pod IPs under a DNS name if you want that, but it's not the common scenario.
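
    (For reference, a headless Service is literally just clusterIP: None; a minimal sketch with made-up names:)

        kubectl apply -f - <<'EOF'
        apiVersion: v1
        kind: Service
        metadata:
          name: web
        spec:
          clusterIP: None   # headless: no virtual IP, DNS returns the pod IPs directly
          selector:
            app: web
          ports:
          - port: 80
        EOF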

    Beyond just knowing endpoints, apps may need to worry about healthchecking, failover, load balancing, rate limiting, security, observability, routing decisions and more. It's far simpler to consolidate all this functionality rather than leaving it to every single app to implement it all over again.

    An ingress controller is in charge of running a specific proxy that deals with traffic into and out of the cluster rather than within it. There are several implementations other than nginx, and they all require various levels of tuning to fit the needs of the cluster, but it's an optimal solution since you might not want to, or be able to, control traffic on the other side.

  • philipcristiano 2 months ago

    What would you use to provide a single endpoint to multiple instances of an app server?

  • endorphone 2 months ago

    There are scenarios where your app servers might be varied as well -- I've leveraged reverse proxies in front of a PHP application that had parts in .NET and parts in Go, for instance.

    Technologies/competencies change as projects evolve, and being able to effortlessly reorganize and reroute is so profoundly powerful.

  • fulafel 2 months ago

    Sure, I'm sympathetic to this kind of "in the trenches" application of reverse proxies - just not doing it by default.

  • fulafel 2 months ago

    What about just exposing the multiple instances of the app server and having the frontend code select one for load balancing or failover purposes? There could be a load-balancing config read by the client, or you could have static rules in the frontend JS, like choosing a shard number based on a hash of the client IP address.

    Round-robin DNS might also work or complement this.

  • manigandham 2 months ago

    So your answer to not using a reverse proxy is to fake your own via client-side logic? It's far better to have a tested, reliable, dynamic, and scalable solution right next to the actual app servers instead.

    Almost everything on the internet is behind layers of proxies; it's not a bad thing and isn't much cause for concern.

  • fulafel 2 months ago

    I think your viewpoint might be somewhat inflexible if routing logic in the client & server looks like "faking a reverse proxy" to you. That's where the rest of the logic is, after all, and when designing systems we generally prefer to have the logic in fewer places.

    It's a proven design rule (the end-to-end principle) to prefer the smarts at the edges of your system, and the problems stemming from the reverse proxy described in the article count, in my book, as further evidence for this idea.

  • manigandham 2 months ago

    > prefer the smarts at the edges of your system

    That's exactly what reverse proxies do - leaving the internal apps free to just serve requests instead of worrying about the perimeter.

    The problems described in this article have nothing to do with reverse proxies, but rather with the ingress controller and its config settings.

  • fulafel 2 months ago

    Reverse proxies making routing decisions based on request content, and making server load-balancing decisions, is putting policy in the middle.