Static Sites and Apps On Your Own Dokku Server

https://ericmjl.github.io/essays-on-data-science/miscellaneous/static-sites-on-dokku/

Summary: In this essay, I’m going to share with you how you can deploy your own static sites and apps on a Dokku server.

Introduction

You’ve worked on this awesome Streamlit app, or a Panel dashboard, or a Plotly Dash web frontend for your data science work, and now you’ve decided to share the work. Or you’ve built documentation for the project, and you need to serve it up. Except, if your company doesn’t have a dedicated platform for apps, you’re stuck! That’s because you’ve now got to share it from your laptop/workstation and point your colleagues there (right… go ahead and email them a link to http://10.16.213.24:8501 right now…), and keep your computer running perpetually so that they can interact with the app.

Or, you copy/paste the docs into a separate hosted solution, and now the docs are divorced from your code, leading to documentation rot in the long run because it’s too troublesome to maintain two things simultaneously. That’s much too fragile. What you need is another hosting machine that is isolated from your development machine, so that you can develop in peace while the hosting machine reliably serves up the working, stable version of your work.

The specific thing that you really need is a “Platform as a Service” (PaaS). There are plenty of commercial offerings, but if you’re “cheap” and don’t mind learning some new concepts to help you get around the web, then this essay is for you. Here, I’m going to show you how to configure and deploy Dokku as a personal PaaS solution that you can use at work and for hobby projects. I’m then going to show you how to deploy a static site (which can be useful for hosting blogs and documentation), and finally I’ll show you how to deploy a Streamlit app, which you can use to show off a front-end to your fancy modelling work. Along the way, I hope to also point out the “generalizable” ideas behind the steps listed here, and give you a framework (at least, one useful one) for building things on the web.

But, but why?

If you’re part of a company…

Your organisation might not be equipped with modern PaaS tools that will enable you, a data scientist, to move quickly from local prototype to share-able prototype. If, however, you have access to bare metal cloud resources (i.e. just gimme a Linux box!), then as a hacker-type, you might be able to stand up your own PaaS and help demonstrate to your infrastructure team the value of having one.

If you’re doing this for your hobby projects…

You might be as cheap as I am, but need a bit more power than the restrictions on Heroku allow (512MB RAM, 30 minute timeouts, and a limited number of records in databases), and you don’t want to pay $7/month for each project.

Additionally, you might want a bit more control over your hosting options, but you don’t feel completely ready to go fiddling with containers and networking without a software stack to help out just a bit.

More generally…

You might like the idea of containers, but find it kind of troublesome to learn yet another thing that’s not trivial to configure, execute and debug (i.e. Docker). Dokku can be a bridge to get you there, as it automates much of the workflow surrounding Docker containers. It also comes with an API that both closely matches that of Heroku (which is famous for being very developer-friendly) and helps you handle proxy port mapping and domains easily.

Are you ready? Let’s go!

Pre-requisites

I’m assuming you know how to generate and use SSH keys to remotely access another machine. This is an incredibly useful thing to know how to do, and so I’d recommend that you pick this skill up. (As usual, DigitalOcean’s community pages have a great tutorial.)

I am also assuming that you have access to a Linux box of some kind with an IP address that you can expose to the “internet”. The “internet” in this case can mean the world wide web if you’re working on a personal project, or your organisation’s “intranet” if you’re planning on only letting those in your organisation access the sites and apps that you will build.

I’m also assuming familiarity with git, which I consider to be an indispensable tool in a data scientist’s toolkit.

This last point is not an assumption, but an exhortation: you should be building your data app prototype in accordance with 12-factor app principles. It’ll make (most of) your Engineering colleagues delight in working with you. (Yes, there are some esoteric types that don’t subscribe to 12-factor app principles…) If you’ve never heard of them, go check them out at 12factor.net. It’s a wonderful read that will change how you build your data app prototypes for the better. It will also make handing over the app to Engineering much easier in the long run, improving your relationship with the Engineering team that will take care of your app!

Set up a Box in the Cloud (optional if you have one)

If you don’t already have another computer that you have access to, or if you’re curious about how to get set up in the cloud, then follow along with these instructions.

To make things a bit more concrete, I’m going to rent a server on DigitalOcean (DO). If you don’t have a DigitalOcean account already, feel free to use my referral link to sign up (and get $100 in free credits that you can spend over 60 days). (Disclaimer: I benefit too, but your support helps me make more of this kind of content!) Once you’ve signed up and logged into the DigitalOcean cloud dashboard, you can set up a new Droplet.

To do so, click on the big green “Create” button, and set up a Droplet with the following settings:

- Ubuntu operating system
- “Standard” plan at $5/mo or $10/mo [1]
- Additional options: “monitoring”
- Authentication: you should use SSH keys (this is a pre-requisite for this essay).
- Hostname: Give it something memorable.
- Backups: Highly recommended. Give yourself peace of mind that you can roll back anytime.

Once you’re done with that, hit the big green “Create Droplet” button right at the bottom of the screen!

Once your droplet is set up, you can go ahead and click on the “Manage > Droplets” left menu item, and that will bring you to a place where you can see all of your rented computers.

Set up Dokku on your Shiny Rented Machine

Let’s now go ahead and set up Dokku on your shiny new droplet.

Run the Dokku installation commands

Dokku installation on Ubuntu is quite easy; the following instructions are taken from the Dokku docs. SSH into your machine, then type the following:

wget https://raw.githubusercontent.com/dokku/dokku/v0.20.4/bootstrap.sh;
# CHECK FOR LATEST DOKKU VERSION!
# http://dokku.viewdocs.io/dokku/getting-started/installation/
sudo DOKKU_TAG=v0.20.4 bash bootstrap.sh

Wrap up Dokku installation

If you’re on an Ubuntu installation, you’ll want to navigate to your Linux box’s IP address in your browser, and finish up the installation instructions, which involve letting your Dokku installation know about your SSH public keys. These keys are important, as they are literally the keys to letting you push your app to your Dokku box!

For those of you who have CentOS or other flavours of Linux, you will need to follow analogous instructions on the Dokku website. I have had experience following the CentOS instructions at work, and had to modify the installation commands a little bit to work with our internal proxies.

Test that Dokku is working

To test that your Dokku installation is working, type the following command:

dokku help

That should show the Dokku help menu, and you’ll know the installation has completed successfully!

Optionally set proxies for Docker

Now, because Dokku builds upon Docker, if you’re behind a corporate proxy, you might need to configure your Docker daemon proxies as well. You’ll then want to follow instructions on the Docker daemon documentation.

Those steps will generally be the same as what’s in the docs, though the specifics will change (e.g. your proxy server address).
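For orientation, the Docker documentation describes setting the daemon's proxy through a systemd drop-in file, typically /etc/systemd/system/docker.service.d/http-proxy.conf. The Python sketch below only renders the text of such a file. The helper name render_docker_proxy_dropin and the address proxy.example.com:3128 are placeholders made up for illustration, so treat this as a rough guide and check the Docker docs for your version before relying on it.

```python
def render_docker_proxy_dropin(http_proxy, https_proxy=None,
                               no_proxy=("localhost", "127.0.0.1")):
    """Return the text of a systemd drop-in that sets proxy variables for dockerd.

    All argument values are supplied by you; nothing here is Dokku-specific.
    """
    https_proxy = https_proxy or http_proxy  # reuse the HTTP proxy if none is given
    no_proxy_value = ",".join(no_proxy)
    lines = [
        "[Service]",
        f'Environment="HTTP_PROXY={http_proxy}"',
        f'Environment="HTTPS_PROXY={https_proxy}"',
        f'Environment="NO_PROXY={no_proxy_value}"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical proxy address; replace with your organisation's proxy.
    print(render_docker_proxy_dropin("http://proxy.example.com:3128"))
```

After saving the rendered text to the drop-in path, the Docker docs have you run sudo systemctl daemon-reload and sudo systemctl restart docker so that the daemon picks up the new environment.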

Configure domain names

The allure of Heroku is that it gives your app a name: myapp.herokuapp.com, or myshinyapp.herokuapp.com quite automagically. With Dokku and a bit of extra legwork, we can replicate this facility.

We’re going to set up your Dokku box such that its main domain will be mydomain.com, and apps will get a subdomain myapp.mydomain.com.

Register a domain name

Firstly, you’ll need a domain name from a domain name registrar. Cloudflare seems to be doing all of the right things at the moment, so their domain name registration service is something I’m not hesitant to recommend (at least as of 2020). For historical reasons, I’m currently using Google Domains. At work, we have an internal service that lets us register an internal domain name. What matters most is that you have the ability to assign a domain name that points to your Dokku machine’s IP address.

Go ahead and register a domain name that reflects who “you” are on the web. For myself, I have a personal domain name that I use. At work, I registered a name that reflects the research group that I work in. Make sure that the name “points”/“forwards” to the IP address of your Dokku box.

Enable subdomains!

To enable the ability to use subdomains like myapp.mydomain.com for each app, you’ll want to also configure the DNS settings. On your domain registrar, look for the place where you can customise “resource records”. On Google Domains, it’s under “DNS > Custom resource records”.

There, you’ll want to add an “A” record (as opposed to other options that you might see, like “CNAME”, “AAAA”, and others). The “name” should be *, literally just an asterisk. The IPv4 address should point to your Dokku machine. This is all that is needed.
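To make the effect of the * record a little more concrete, here is a simplified Python model of how a wildcard A record answers lookups. It is only a sketch: real DNS wildcard matching has more rules than this, the resolve function is a made-up helper, and 203.0.113.10 is a documentation-only address standing in for your Dokku box's IP.

```python
def resolve(hostname, a_records):
    """Look up an IPv4 address for hostname in a dict of A records.

    Exact names win; otherwise fall back to a wildcard entry such as
    "*.mydomain.com". Returns None if nothing matches.
    """
    hostname = hostname.lower()
    if hostname in a_records:
        return a_records[hostname]
    labels = hostname.split(".")
    # Try "*.<parent>", then "*.<grandparent>", and so on.
    for i in range(1, len(labels)):
        wildcard = "*." + ".".join(labels[i:])
        if wildcard in a_records:
            return a_records[wildcard]
    return None


# Entering the name "*" at the registrar corresponds to "*.mydomain.com" here.
records = {
    "mydomain.com": "203.0.113.10",
    "*.mydomain.com": "203.0.113.10",
}

if __name__ == "__main__":
    for name in ("mydomain.com", "myapp.mydomain.com", "docs.mydomain.com"):
        print(name, "->", resolve(name, records))
```

The point of the sketch is that every subdomain lands on the same machine, so the DNS provider never needs to know which apps exist; that decision is deferred to the Dokku box.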

To test whether your domain name is set up correctly, head over to the domain in your favourite browser. At this point, you should see the default NGINX landing page, as you have no apps deployed and no domains configured.

Deploy a test app

Heroku provides a “Python getting started” repository that we will use to check that the installation is working correctly. This one deploys reliably with all of the vanilla commands entered. Using it, I will also show you how to leverage your * A record to put in nice subdomains!

Clone the test app

Firstly, git clone Heroku’s python-getting-started repository to your laptop/local machine (i.e. not your Dokku box).

Next, cd into the repository:

cd python-getting-started

After that, add your Dokku box as a git remote to the repository:

git remote add dokku dokku@your-domain-name:python-getting-started

Be sure to replace your-domain-name with your newfangled domain that you registered.

Now, push the app to your Dokku box!

git push dokku master

Unlike your usual pushes to GitHub, GitLab or Bitbucket, you’ll see a series of remote outputs being beamed back to your terminal. What’s happening here is the build of the app! In particular, a Docker build is happening behind-the-scenes, so your app is completely self-contained and containerised on the Dokku box!

If everything went well, the last output beamed back to you should look like:

=====> Application deployed: http://mydomain.com:10161

Wonderful! Now let’s go back to the Dokku box and configure your app.

Configure the app domain and ports

We’re now going to configure Dokku to recognise which subdomains should point to which apps.

Firstly, get familiar with the Dokku domains command:

# On your Dokku box
dokku domains:help

That should list out all of the Dokku domains sub-commands.

Usage: dokku domains[:COMMAND]

Manage domains used by the proxy

Additional commands:
    domains:add <app> <domain> [<domain> ...]       Add domains to app
    domains:add-global <domain> [<domain> ...]      Add global domain names
    domains:clear <app>                             Clear all domains for app
    domains:clear-global                            Clear global domain names
    domains:disable <app>                           Disable VHOST support
    domains:enable <app>                            Enable VHOST support
    domains:remove <app> <domain> [<domain> ...]    Remove domains from app
    domains:remove-global <domain> [<domain> ...]   Remove global domain names
    domains:report [<app>|--global] [<flag>]        Displays a domains report for one or more apps
    domains:set <app> <domain> [<domain> ...]       Set domains for app
    domains:set-global <domain> [<domain> ...]      Set global domain names

You can report domains used for the app, python-getting-started:

# On your Dokku box
dokku domains:report python-getting-started

The output should look something like this:

$ dokku domains:report python-getting-started
=====> python-getting-started domains information
       Domains app enabled:           false
       Domains app vhosts:
       Domains global enabled:        false
       Domains global vhosts:

This tells us that python-getting-started has no domains configured for it. We can now set it:

# On your Dokku box
dokku domains:set python-getting-started python-getting-started.mydomain.com

The output will look like this:

-----> Added python-getting-started.mydomain.com to python-getting-started
-----> Configuring python-getting-started.mydomain.com...(using built-in template)
-----> Creating http nginx.conf
       Reloading nginx

Now, you should be able to go to http://python-getting-started.mydomain.com, and the page that gets loaded should be the “Getting Started with Python on Heroku” landing page!
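Conceptually, what happened is name-based virtual hosting: every subdomain reaches the same box, and the proxy picks an app based on the hostname in the request. The toy Python model below illustrates that idea only. It is not how Dokku or nginx is implemented (nginx does this through generated configuration), and the VhostRouter class is a hypothetical name invented for this sketch.

```python
class VhostRouter:
    """Toy model of name-based virtual hosting: Host header -> app name."""

    def __init__(self):
        self._app_for_domain = {}

    def set_domains(self, app, *domains):
        # Loosely mirrors `dokku domains:set`: replace whatever the app had before.
        self._app_for_domain = {
            d: a for d, a in self._app_for_domain.items() if a != app
        }
        for domain in domains:
            self._app_for_domain[domain.lower()] = app

    def route(self, host_header):
        # Strip an optional ":port" suffix; hostnames are case-insensitive.
        hostname = host_header.split(":", 1)[0].lower()
        return self._app_for_domain.get(hostname)


if __name__ == "__main__":
    router = VhostRouter()
    router.set_domains("python-getting-started",
                       "python-getting-started.mydomain.com")
    print(router.route("python-getting-started.mydomain.com"))
    print(router.route("unconfigured.mydomain.com"))
```

A hostname with no configured app returns None in this model, which roughly corresponds to landing on whatever default page the proxy serves.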

If everything deployed correctly up till this point, you’re good to go with deploying a data app on your Dokku machine!

Deploy your data app

Deploying the python-getting-started app should have given you:

- the confidence that your Dokku installation is working correctly,
- firsthand experience configuring Dokku, and
- a taste of the workflow for deploying an app.

Now, we’re going to apply that to a Streamlit app. I’ve chosen Streamlit because it’s got the easiest programming model amongst all of the dashboard/app development frameworks that I’ve seen; in fact, I was able to stand up an explainer on the Beta distribution in under 3 hours, the bulk of which was spent on writing prose, not figuring out how to program with Streamlit.

Build a streamlit app (skip if you already have an app)

If you don’t have a Streamlit app, here’s one that you can use as a starter, which simply displays some text and a button:

# app.py
import streamlit as st

"""
# First Streamlit App!

This is a dummy streamlit app.
"""

finished = st.button("Click me!")
if finished:
    st.balloons()

Save it as app.py in your project directory.

Now, you can run the app with Streamlit:

# On your local machine
streamlit run app.py

You should see the following show up in your terminal:

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://<your-ip-address>:8501

You can go to the local URL and confirm that the app is running correctly, and that it does exactly what's expected.

Add project-specific configuration files for Dokku

Now, we need to add a few configuration files that Dokku will recognise.

requirements.txt

Firstly, make sure you have a requirements.txt file in the project root directory, in which you specify all of the requirements for your app to run.

# requirements.txt
streamlit==0.57.3  # pinning version numbers is good for apps.
# put more below as necessary, e.g.:
numpy==1.16.0

With a requirements.txt file, Dokku (and Heroku) will automagically recognise that you have a Python app. Dokku will then create a Docker container equipped with Python, and install all of the dependencies in there. Declarative configuration FTW!
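Since pinning matters for reproducible deploys, a quick check before pushing can catch requirements that were left unpinned. The Python sketch below is a deliberately simple line scanner, not a full parser of the requirements file format (for instance, it would mishandle URLs that contain a # fragment), and unpinned_requirements is a made-up helper name.

```python
def unpinned_requirements(text):
    """Return the requirement lines in `text` that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    example = """\
streamlit==0.57.3  # pinned
numpy
pandas>=1.0
"""
    print(unpinned_requirements(example))
```

To run it against a real project, read the file first, e.g. with pathlib.Path("requirements.txt").read_text(), and pass the resulting string in.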

Procfile

Next, you need a Procfile in the project root directory:

# Procfile
web: streamlit run app.py

The general syntax in the Procfile is:

<process_type>: <command>

The command is always a single line, and tells Dokku/Heroku what commands to execute in order to run the app. In our case, we simply execute the same command that we used to run the app locally for development purposes.

For our purposes on Dokku, the process_type is always "web". Heroku, on the other hand, can handle other process types. Since we're dealing with Dokku and not Heroku, don't bother changing process_type.

Configure git Remote with Dokku

Now, let's configure your Dokku remote.

# On your local machine
git remote add dokku dokku@mydomain.com:streamlit-app

Remember two points! Firstly, change mydomain.com to your domain. Secondly, you can use any app name you want; it doesn't have to be streamlit-app. A convention that has helped me is to have a 1:1 mapping between app and project repository folder name. It means one less thing to be confused about.

Once you're done configuring the remote, push it up!

# On your local machine
git push dokku master

The same build commands will take place. While they are taking place, go ahead and open a new Terminal, and SSH into the Dokku box. We're going to configure the new app on Dokku!

Configure Dokku Subdomain

Let's start with the subdomain name first. For this tutorial, I'm going to use the domain name streamlit-app.mydomain.com. Let's configure the app streamlit-app with that domain name:

# On Dokku box
dokku domains:set streamlit-app streamlit-app.mydomain.com

Configure Dokku port mapping

Next, we have to configure the port mapping that Dokku's proxy server will recognise. By default, port 80 on the "hosting box" (i.e. the machine Dokku is running on) is mapped to port 5000 on the "container box" (i.e. the container the app is running in).
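Dokku writes such a mapping as three colon-separated fields, scheme:host-port:container-port, so the default reads http:80:5000. As a rough illustration of how to read and edit one of these entries, here is a small Python sketch; PortMapping, parse_port_mapping and format_port_mapping are names invented for this example and are not part of Dokku.

```python
from typing import NamedTuple


class PortMapping(NamedTuple):
    scheme: str          # e.g. "http"
    host_port: int       # port on the machine running Dokku
    container_port: int  # port the app listens on inside its container


def parse_port_mapping(entry):
    """Split an entry like 'http:80:5000' into its three fields."""
    scheme, host_port, container_port = entry.split(":")
    return PortMapping(scheme, int(host_port), int(container_port))


def format_port_mapping(mapping):
    """Turn a PortMapping back into the 'scheme:host:container' form."""
    return f"{mapping.scheme}:{mapping.host_port}:{mapping.container_port}"


if __name__ == "__main__":
    default = parse_port_mapping("http:80:5000")
    # Keep the scheme and host port, change only the container port.
    print(format_port_mapping(default._replace(container_port=8501)))
```

Only the last field usually needs to change from app to app, because the host side stays on port 80 while each framework listens on its own port inside the container.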
You can see the current port mapping with:

# On Dokku box
dokku proxy:report streamlit-app

That will give you something like:

=====> streamlit-app proxy information
       Proxy enabled:       true
       Proxy port map:      http:80:5000
       Proxy type:          nginx

Now, because streamlit is going to be run on port 8501 (in the container) by default, we need to change the port mapping from http:80:5000 to http:80:8501. To do so:

# On Dokku box
dokku proxy:ports-set streamlit-app http:80:8501

Putting these two configurations together, i.e. setting the subdomain and port mapping, we have now told Dokku, "Each time you get a request for http://streamlit-app.mydomain.com, forward it to port 8501 on the streamlit-app container."

Test it out!

Well, we now can test it out. Go ahead and head over to your app URL, and see if the app works for you!

Debugging

If things look like they're crashing, how do you debug? Well, you always should know how to look at the logs:

dokku logs my_app_name -t

That will keep the logs updating in the terminal as you refresh the page. Use the information in the logs to help you debug. Also, see if you can reproduce the error in the logs locally.

Additionally, if you get nginx errors, you can look at the nginx logs to help you debug proxy errors as they show up:

dokku nginx:access-logs my_app_name -t

Look at the logs and dig through for anything that might help you with your Google searches. Follow this pattern, and soon enough, you'll become an expert at debugging your web apps!

Deploy a static site

Now that you've seen how to deploy an app that's powered by a container behind-the-scenes, let's now figure out how to deploy a static site that is built upon every deploy. It's essentially the same. We have configuration files (in this case, slightly different ones) that declare what kind of environment we need. We basically treat the static site generator as an "app" that generates the HTML pages that we serve up freshly on each build.
For this example, I'm going to use mkdocs, as it is also easy to use to build sites, and can be extended with some pretty awesome templates (like mkdocs-material) for responsive docs generated from Markdown files. If you've got another static site builder (I have used Lektor, sphinx, and Nikola before), the places where we use mkdocs commands can be easily replaced by the relevant ones for your situation.

Build a static site (skip if you already have one)

If you don't already have a static site, then feel free to use the following example. In your project root directory, create a docs/ directory, and then place a dummy index.md in there:

# Index Page

Hello world!

Now, in the project root directory, create a mkdocs.yml file, in which you configure mkdocs to build the static site:

# mkdocs.yml
site_name: Marshmallow Generator

This is a minimal mkdocs configuration. Now, make sure you have mkdocs installed in the Python environment that you're using. It's available on PyPI:

# On your local machine
pip install -U mkdocs

Once installation has finished, you can now command mkdocs to build the static site to view locally:

# On your local machine
mkdocs serve

If you can successfully view the static site on your local machine, i.e. you see the contents of index.md show up as a beautifully rendered HTML page, you're good to move on!

Add project-specific configuration files for Dokku

We're now going to add the necessary configuration files to work with Dokku.

Firstly, we have to add in a .static file in the project root directory. This file tells Dokku that the site that is going to be built is a static site. To do so in the terminal, you only have to touch the file at the command line:

# On your local machine
touch .static

Secondly, we have to add a .buildpacks file, where we specify that we are using two "buildpacks": one to provide the environment to run the commands that build the site, and another to build the site and serve up the static site files.
In the case of our dummy mkdocs static site, we need the following in .buildpacks:

https://github.com/heroku/heroku-buildpack-python.git
https://github.com/dokku/buildpack-nginx.git

They have to go in that order, so that the first one is used for building, and the second one is used for serving the site.

Thirdly, instead of a Procfile, we add an app.json file that contains the command for building the static site.

{
  "scripts": {
    "dokku": {
      "predeploy": "cd /app/www && mkdocs build"
    }
  }
}

OK, we just created a bunch of files, but I haven't explained how they're interacting with Dokku. There are definitely some opinionated things that we'll have to unpack.

Firstly, the Dokku buildpack-nginx buildpack makes the opinionated assumption that your repository will be copied over into the Docker container's /app/www directory. That is why we have the cd /app/www command. Then, we follow it up with a mkdocs build, which you can change depending on what static site generator you're using.

Secondly, the predeploy key tells Dokku to execute the commands in the value (i.e. cd /app/www && mkdocs build) before starting up the nginx server that points to the static site files.

As you probably can grok by now, the static sites are being built upon every deploy. This saves you from having to build the site locally and then push it up, which is both in line with how git is supposed to be used (you only git push files that are generated by hand) and in line with the continuous deployment philosophy.

Finally, we still need our requirements.txt file to be populated with whatever is needed to build the docs locally:

# requirements.txt
mkdocs==1.1
# put other dependencies below!

Now that we're done, let's configure our remotes once again.
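If you would rather script the creation of these files than type them by hand, the Python sketch below writes .static, .buildpacks and app.json with the same contents as shown in this section. The function name write_static_site_config and its build_command parameter are hypothetical conveniences, not something Dokku provides; swap the build command for your own static site generator.

```python
import json
import tempfile
from pathlib import Path

# Order matters: the Python buildpack builds the site, the nginx one serves it.
BUILDPACKS = [
    "https://github.com/heroku/heroku-buildpack-python.git",
    "https://github.com/dokku/buildpack-nginx.git",
]


def write_static_site_config(project_root, build_command="mkdocs build"):
    """Write .static, .buildpacks and app.json into project_root."""
    root = Path(project_root)
    (root / ".static").touch()  # empty marker file
    (root / ".buildpacks").write_text("\n".join(BUILDPACKS) + "\n")
    app_json = {
        "scripts": {
            "dokku": {"predeploy": f"cd /app/www && {build_command}"}
        }
    }
    (root / "app.json").write_text(json.dumps(app_json, indent=2) + "\n")


if __name__ == "__main__":
    # Demonstrate in a throwaway directory rather than the current project.
    with tempfile.TemporaryDirectory() as tmp:
        write_static_site_config(tmp)
        print((Path(tmp) / "app.json").read_text())
```

Generating app.json through the json module has the small side benefit that the file is always valid JSON, which is easy to get wrong when editing nested braces by hand.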
Configure git Remote with Dokku

As with the Streamlit app, go ahead and configure the git remote with Dokku:

# On your local machine
git remote add dokku dokku@mydomain.com:my-static-site

Now, push up to Dokku!

# On your local machine
git push dokku master

Configure Dokku

As with the Streamlit app, let's now configure the domains:

# On your Dokku box
dokku domains:set my-static-site my-static-site.mydomain.com

Unlike the app, we don't have to configure ports, because they will be mapped correctly by default.

Finally, we need to configure nginx to point to the directory in which the index.html page is generated. In the case of mkdocs, that is the site/ directory in the project root directory. We'll now configure it:

# On your Dokku box
dokku config:set my-static-site NGINX_ROOT='site'

Warning: You'll want to change site to whatever the output directory is for the static site generator you use!

Alrighty, go ahead and visit your static site to confirm that it's running!

Debugging

As with the Streamlit app above, debugging is done in exactly the same way, using the two commands:

# Inspect application logs
dokku logs my_app_name -t
# Inspect nginx logs
dokku nginx:access-logs my_app_name -t

The Framework

I have a habit of categorising things into a "framework" to help me anchor how I debug things, and I hope to share my framework for domains, apps, and Dokku with you.

Firstly, we organised our Dokku box + domain name such that the Dokku box was referenced by the domain name, while individual apps got subdomains. We got subdomains for free by configuring a * on the DNS provider's A records, which forwarded all subdomains to the Dokku box.

Secondly, we configured each app on the Dokku box to resolve which subdomain points to it. In this way, subdomains need not be set on our DNS provider.

Thirdly, we configured both static sites and dynamic data apps, using a collection of configuration files.
For our data apps, it was primarily a Procfile and requirements.txt. For our static sites, it was a .buildpacks file, an app.json file, and requirements.txt. Each has its purpose, but together they tell Dokku how to configure the environment in which apps are built.

Cheatsheet of Commands

Here's a cheatsheet of commands we used in this essay, to help you with getting set up.

Domain Name Registration

- Register your domain.
- Add a custom resource record * pointing to your Dokku box's IP address.

Streamlit Commands

# Run streamlit app
streamlit run app.py

Git commands

# Add dokku remote
git remote add dokku dokku@mydomain.com:streamlit-app
# Push master branch to Dokku box
git push dokku master

Interacting with proxies

# View port forwarding for app
dokku proxy:report streamlit-app
# Set port forwarding for app
dokku proxy:ports-set streamlit-app http:80:8501

The syntax for the ports is:

<scheme>:<host-port>:<container-port>

For port forwarding, if you follow the general framework we're using here, you should only have to configure the container port.

Interacting with domains

# View domains for an app
dokku domains:report streamlit-app
# Set domains for an app
dokku domains:set streamlit-app streamlit-app.mydomain.com

Again, if you follow the framework we have used here, then you should only need to configure <app-name>.mydomain.com.

Config Files

Procfile for apps:

web: streamlit run app.py

mkdocs.yml for MkDocs config:

# mkdocs.yml
site_name: Marshmallow Generator

Create .static for static sites:

touch .static

.buildpacks for static sites and multi-buildpack apps:

https://github.com/heroku/heroku-buildpack-python.git
https://github.com/dokku/buildpack-nginx.git

app.json for static sites:

{
  "scripts": {
    "dokku": {
      "predeploy": "cd /app/www && mkdocs build"
    }
  }
}

[1] I am paying for the $10/mo plan, as the extra RAM seems to help.
