
Weekly Changelog @ Pixeesoft

Please support us by subscribing to our channel! Thanks a lot! 👏

Welcome

Hello and welcome to our Weekly Changelog, mapping out everything that happened last week. Here we summarize the week's work and link all the useful content you might have missed.

This week was a really eventful one - we continued our work on the 3rd Sprint outlined in its Sprint Planning. The focus remained on the Python services and it went pretty well. But one thing at a time - so let's get started!

First of all, we are FIRED UP because we will be going to Firebase Summit 🔥 in Madrid, Spain 🇪🇸 this September!

Firebase Summit 2019

If you're around, send us a DM or an e-mail and we can collab and network! 🤝

Standup 2

We are in the middle of the never-ending DevOps battle - how to get this thing up and running? The problem is that the Python code does not have any health check endpoints. It exposes a WebSocket interface on a single port and that's it. Since the Google L7 Load Balancer requires a health check, we're in a very perplexing situation:

A very perplexing situation

“The health of each backend instance is verified using an HTTP health check, an HTTPS health check, or an HTTP/2 health check.” — documentation

As I struggled with the deployment, I attempted to expose a Flask application on the same port, but since it is one application per process, that didn't work either.
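For the curious, this is roughly the kind of HTTP health-check endpoint the L7 Load Balancer expects - a hypothetical sketch, not the project's actual code (route, port and framework are illustrative):

```python
# A minimal sketch of an HTTP health check - endpoint name and port are
# illustrative, not taken from the original code.
from flask import Flask

app = Flask(__name__)

@app.route("/healthz")
def healthz():
    # Return 200 OK so the load balancer keeps this backend in rotation.
    return "ok", 200

if __name__ == "__main__":
    # The WebSocket service already owns its port, so this helper would have
    # to live on a separate port (or process) - which is exactly the problem.
    app.run(host="0.0.0.0", port=8081)
```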

NGINX comes to the rescue, though! According to their elevator pitch, NGINX Ingress Controller supports WebSockets right out of the box and does not require external health checks to see that a service is up and running. This might be the way to go once the Dockerfile is ready. Currently it’s a work in progress - I have spent a lot of time reading the code, getting my head around how things are structured.
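For illustration, an Ingress resource for a WebSocket backend might look roughly like this - host, service name and port are placeholders, and the exact apiVersion depends on the Kubernetes release in use:

```yaml
# A minimal sketch - names are placeholders, not the project's manifests.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: python-ws-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    # WebSocket connections are long-lived, so bump the proxy timeouts
    # well beyond NGINX's default 60-second read timeout.
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  rules:
    - host: ws.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: python-ws-service
                port:
                  number: 8080
```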

Two key things have come to light:

  1. gevent is the main coroutine package used for the development of all the Python services (a minimal illustration follows after this list).
  2. A Memcached server will have to be deployed, as it is used as an async communication channel between multiple services - something like a message queue, but more lightweight (at least at the time of development).
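For context, this is roughly what gevent's coroutine style looks like - a minimal, self-contained illustration rather than code from the actual services:

```python
# A minimal gevent illustration - not from the actual codebase.
from gevent import monkey
monkey.patch_all()  # make blocking standard-library calls cooperative

import gevent


def handle(client_id):
    # Placeholder work; in the real services these would be I/O-bound tasks.
    gevent.sleep(0.1)
    return client_id


# Spawn a few greenlets and wait for all of them to finish.
jobs = [gevent.spawn(handle, i) for i in range(5)]
gevent.joinall(jobs)
print([job.value for job in jobs])
```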

With that in mind, some issues have been rearranged and on we go!

Standup 3

Thanks to Anaconda, we were able to advance quickly and efficiently in packaging the Python code into the smallest possible container image. I managed to capture all necessary dependencies in requirements.txt, and now it's possible to reproduce the image on demand. There were some changes to the Python code, mainly around logging, and constants such as URLs and IP addresses had to change as well.
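A sketch of how such an image might be built from the pinned requirements.txt - the base image, paths and entry point here are assumptions, not the project's actual Dockerfile:

```dockerfile
# A hypothetical sketch - base image, paths and entry point are assumptions.
FROM python:3.7-slim

WORKDIR /app

# Install the pinned dependencies first, so Docker can cache this layer
# between code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the service code itself.
COPY . .

# Placeholder entry point for the WebSocket service.
CMD ["python", "main.py"]
```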

All of the changes need to be properly tested in the new environment, so that will be the goal for today - taking the NGINX Ingress Controller, deploying it in a small cluster and seeing how it behaves.

A nice thing I found out is that Swift Object Storage and MongoDB don’t need to be deployed at this stage (Data Feeder) if there are no edge devices registered in the database - lazy loading FTW!

Standup 4

In the previous standup I mentioned that I'd be working on deploying the service in a Kubernetes cluster with the NGINX Ingress Controller. I'm stoked to announce that it is DONE! However, there were (as usual) obstacles along the way.

Python specific obstacles

Because the code is a bit…rusty, there are some issues that just don't fall into the category of "yeah, I've seen this happen before". The packages gevent and greenlet had some problems with SSLv3, so I installed a newer version, but it ended up being too new and kept crashing. After some debugging and digging out some fossils on GitHub, I found out that I in fact had to install a version in between, one that fixed the issue with SSLv23. Mind. Blown. But yeah, it works now.

Packaging specific obstacles

An hour of productive debugging - the Docker container couldn't be reached at all, because the service was exposed on localhost rather than 0.0.0.0 inside the container.
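In other words, the fix boiled down to one bind address - shown here with a hypothetical Flask app, not the project's code:

```python
# A minimal illustration of the bind-address gotcha.
from flask import Flask

app = Flask(__name__)

# app.run(host="127.0.0.1", port=8080)  # only reachable from inside the container
app.run(host="0.0.0.0", port=8080)      # reachable through the container's published port
```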

If I had never come across this before in my life, missing it would be understandable. But I had. And I even said it LIVE 🔴 on one of our standups!

Facepalm

Take. Your. Own. Advice. Michal!

NGINX specific obstacles

Google Managed SSL Certificates cannot be provisioned with the NGINX Ingress Controller - since we're no longer using the L7 Load Balancer, we can't use a global IP address. So we had to deploy Jetstack's cert-manager instead.
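For reference, a cert-manager issuer for Let's Encrypt looks roughly like this - the e-mail, names and exact apiVersion vary by cert-manager release, so treat it as a sketch:

```yaml
# A minimal sketch of a Let's Encrypt issuer - names and e-mail are placeholders,
# and the apiVersion depends on the cert-manager release in use.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx
```

The Ingress then just references the issuer via an annotation (e.g. cert-manager.io/cluster-issuer: letsencrypt-prod), and cert-manager provisions the certificate into a Kubernetes Secret.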

But fortunately I managed to overcome all of the above and now back to integration testing to see if all the services are working well together!

Standup 5

Previously we talked about getting the Python component deployed and how we had to face some obstacles. Well, I am happy to announce that we faced some new ones, yet again!

As you know - we currently have no service discovery in place and we absolutely do not have any sort of versioning of contracts between services. It’s all just an opaque set of services.

So the Python component got deployed, but wouldn't talk to the other services! Some constants changed, some ACLs had to be modified to suit the new domains, but that just wasn't enough…

At one point I tried looking into the pod to see how the API service is caching user access tokens, but the short-lived ones are stored in an attached emptyDir volume that I couldn't find a way to access:

Slack emptyDir

Thank goodness I didn't actually follow up on this issue, because it was a dead end. I found that out by discovering a new problem: the iOS client stopped working. As I wanted to test the service integration, I opted for digging out yet another fossil - my old iOS code, which I hadn't touched since March 2015.

Hello? Is that Objective-C? Yeah, 2015 called.

The whole problem lay in the fact that there were some major differences in the way the tokens were cached - depending on whether or not each request carried a dev flag. Thanks to the iOS code I found that out and quickly managed to get everything up and running.

Since the iOS code needed some URL changes anyway, why not take one of the issues from the backlog and complete it? I'm talking about creating a git repository for the iOS client! We hadn't planned this one for this sprint, but since I'd already started working on the code, why not use this opportunity to get it out of the way…

Standup 6

This is the last standup in the current sprint and TGIF! All of the services are up and running, the components work together, we have an edge device connected to the service and it works in the most basic deployment. YES!

There was one key learning during the last mile of debugging:

Ingress only routes HTTP/S services and NOTHING else. The feeder talks to the eater using a tiny protocol over plain sockets, and that just wouldn't get through Ingress at all. So instead I had to expose the eater's protocol on a private IP accessible only from inside the network - not via Ingress, but simply through a LoadBalancer Service in the GKE cluster.
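Sketched out, such a Service might look like this - the name, port and selector are illustrative, and the internal-LB annotation differs between GKE versions:

```yaml
# A minimal sketch of an internal LoadBalancer Service - names and ports are
# placeholders; the annotation name varies across GKE versions.
apiVersion: v1
kind: Service
metadata:
  name: eater
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: eater
  ports:
    - name: feed-protocol
      protocol: TCP
      port: 9000
      targetPort: 9000
```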

Moving on to “done and done”:

Memcached is deployed as a single tiny instance from a Docker image and works like a charm!
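For completeness, a single-instance Memcached deployment can be as small as this sketch - the image tag, memory cap and labels are illustrative:

```yaml
# A minimal sketch of a tiny, single-replica Memcached deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memcached
spec:
  replicas: 1
  selector:
    matchLabels:
      app: memcached
  template:
    metadata:
      labels:
        app: memcached
    spec:
      containers:
        - name: memcached
          image: memcached:1.5-alpine
          args: ["-m", "64"]   # cap cache memory at 64 MB for this tiny instance
          ports:
            - containerPort: 11211
---
apiVersion: v1
kind: Service
metadata:
  name: memcached
spec:
  selector:
    app: memcached
  ports:
    - port: 11211
      targetPort: 11211
```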

And last but not least - a newly discovered issue - we'll need to deploy a Python notification service that the whole system requires. But that's for the next Sprint Planning 😎

So what now? Clean up the mess we created while experimenting, commit and push all the code, add missing documentation and prepare for the next Sprint Review!

Conclusion

That's it for the weekly changelog, thank you for being here with us on this incredible journey. Don't forget to subscribe on YouTube, follow us on Instagram, Twitter, Facebook or LinkedIn. And keep an eye on our open-source endeavors on GitHub as well!

See you out there! 👋