/var/

Various programming stuff


Setting up Postgres on Windows for development

Introduction

To install Postgresql for a production server on Windows you’d usually go to the official website and use the download link. This will give you an executable installer that would install Postgresql on your server and help you configure it.

However, since I only use Windows for development (and never run anything in production on it), I’ve found that there’s a much better and easier way to install Postgresql for development on Windows, which I’ll describe in this post.

If you want to avoid reading the whole post, you can just follow the steps described in the TL;DR below; however, I’d recommend reading everything to understand what’s going on.

Downloading the server

First, click the zip archives link on the official website and then download the zip archive of the Postgres version you want to install. Right now there are archives for every current version, like 14.5, 13.8, 12.12 etc. Let’s get the latest one, 14.5.

This will give you a zip file named postgresql-14.5-1-windows-x64-binaries.zip which contains a single folder named pgsql. I’ll extract that folder, rename it to pgsql145 and move it to c:\progr (I keep stuff there to avoid putting everything on C:). Now you should have a folder named c:\progr\pgsql145 that contains a bunch of folders named bin, doc, include etc.

Setting up the server

Now we are ready to set up Postgresql. Open a command line and move to the pgsql145\bin folder:

cd c:\progr\pgsql145\bin

The bin folder contains all executables of your server and client, like psql.exe (the command-line client), pg_dump.exe (backup), initdb.exe (create a new DB cluster), createdb.exe/dropdb.exe/createuser.exe/dropuser.exe (create/drop database/user; these can also be run from SQL) and postgres.exe, which is the actual server executable.

Our first step is to create a database cluster using initdb. We need to pass it a folder that will contain the data of our cluster. So we’ll run it like:

initdb.exe -D c:\progr\pgsql145\data

(also you could run initdb.exe -D ..\data, since we are on the bin folder). We’ll get output similar to:

The files belonging to this database system will be owned by user "serafeim".
This user must also own the server process.

The database cluster will be initialized with locale "Greek_Greece.1252".
The default database encoding has accordingly been set to "WIN1252".
The default text search configuration will be set to "greek".

Data page checksums are disabled.

fixing permissions on existing directory c:/progr/pgsql145/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... windows
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Europe/Bucharest
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D ^"c^:^\progr^\pgsql145^\data^" -l logfile start

And now we’ll have a folder named c:\progr\pgsql145\data that contains files like pg_hba.conf, pg_ident.conf, postgresql.conf and various folders that will keep our database server data. All these can be configured but we’re going to keep using the default config since it fits our needs!

Notice that:

  • The files of our database belong to the “serafeim” role. This role is automatically created by initdb. This is the same username that I’m using to log in to windows (i.e my home folder is c:\users\serafeim\ folder) so this will be different for you. If you wanted to use a different user name or the classic postgres you could pass it to initdb with the -U parameter, for example: initdb.exe -D c:\progr\pgsql145\data_postgres -U postgres.
  • By default “trust” authentication has been configured. This means, quoting the postgres trust authentication page, that “[…] PostgreSQL assumes that anyone who can connect to the server is authorized to access the database with whatever database user name they specify (even superuser names)”. So local connections will always be accepted with the username we are passing. We’ll see how this works in a minute.
  • The default database encoding will be WIN1252 (on my system). We’ll talk about that a little more later (hint: it’s better to pass -E utf-8 to set your cluster encoding to utf-8)

Starting the server

We could use the pg_ctl.exe executable, as proposed by initdb, to start the server as a background process. However, for our purposes it’s better to start the server as a foreground process in a dedicated window. So we’ll run postgres.exe directly like:

postgres.exe -D c:\progr\pgsql145\data

or, from the bin directory we could run postgres.exe -D ..\data. The output will be

2022-09-20 09:34:10.184 EEST [10648] LOG:  starting PostgreSQL 14.5, compiled by Visual C++ build 1914, 64-bit
2022-09-20 09:34:10.189 EEST [10648] LOG:  listening on IPv6 address "::1", port 5432
2022-09-20 09:34:10.189 EEST [10648] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2022-09-20 09:34:10.330 EEST [3084] LOG:  database system was shut down at 2022-09-20 09:34:08 EEST
2022-09-20 09:34:10.369 EEST [10648] LOG:  database system is ready to accept connections

Success! Our server is running and listening on port 5432 of 127.0.0.1 (and of ::1, its IPv6 counterpart). This means that it accepts connections only from our local machine (which is what we want for our purposes). We can now connect to it using the psql.exe client. Open another cmd, go to C:\progr\pgsql145\bin and run psql.exe. You’ll probably get an error similar to psql: error: connection to server at "localhost" (::1), port 5432 failed: FATAL:  database "serafeim" does not exist (unless your Windows username is postgres).

By default psql.exe tries to connect with a role with the username of your Windows user and to a database named after the user you are connecting with. Our database server has a role named serafeim (it is created by default by the initdb as described before) but it doesn’t have a database named serafeim! Let’s connect to the postgres database instead by passing it as a parameter psql postgres:

C:\progr\pgsql145\bin>psql postgres
psql (14.5)
WARNING: Console code page (437) differs from Windows code page (1252)
        8-bit characters might not work correctly. See psql reference
        page "Notes for Windows users" for details.
Type "help" for help.

postgres=# select version();
                          version
------------------------------------------------------------
PostgreSQL 14.5, compiled by Visual C++ build 1914, 64-bit
(1 row)

Success!
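The way psql picks its defaults can be summed up in a few lines of Python. This is only a simplified sketch of the rule described above: the psql_target function and its parameters are names I made up for illustration, and it ignores environment variables such as PGUSER and PGDATABASE, which the real client also consults.

```python
import getpass


def psql_target(user=None, dbname=None, os_user=None):
    """Return the (role, database) pair psql would try, in a simplified model.

    Hypothetical helper for illustration only: environment variables such as
    PGUSER and PGDATABASE are deliberately ignored here.
    """
    # Without -U, the role defaults to the operating-system user name.
    role = user or os_user or getpass.getuser()
    # Without a database argument, the database defaults to the role name.
    database = dbname or role
    return role, database


if __name__ == "__main__":
    # Plain "psql" as the Windows user serafeim: role and database match.
    print(psql_target(os_user="serafeim"))
    # "psql postgres": same role, but the database is given explicitly.
    print(psql_target(dbname="postgres", os_user="serafeim"))
```

The first call models the failing attempt (a database named after the user), the second one the working psql postgres call.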

Let’s create a sample user and database to make sure that everything’s working fine: run createuser.exe koko and createdb kokodb, then connect to kokodb as koko with psql -U koko kokodb.

kokodb=> create table kokotable(foo varchar);
CREATE TABLE
kokodb=> insert into kokotable values('kokoko');
INSERT 0 1
kokodb=> select * from kokotable;
  foo
--------
kokoko
(1 row)

Everything’s working fine! In the meantime, we should get useful output on our dedicated postgres window, like 2022-09-20 09:36:01.899 EEST [9704] FATAL:  database "serafeim" does not exist. To stop the server, just press Ctrl+C on that window and you should get output similar to:

2022-09-20 09:46:45.178 EEST [10648] LOG:  background worker "logical replication launcher" (PID 7860) exited with exit code 1
2022-09-20 09:46:45.185 EEST [10048] LOG:  shutting down
2022-09-20 09:46:45.278 EEST [10648] LOG:  database system is shut down

I usually add a pg.bat file to my c:\progr\pgsql145\ folder that will start the database with its data folder. Its only content is bin\postgres.exe -D data

So let’s create the pg.bat like this:

c:\>cd c:\progr\pgsql145

c:\progr\pgsql145>copy con pg.bat
bin\postgres.exe -D data
^Z
        1 file(s) copied.

c:\progr\pgsql145>pg.bat
2022-09-20 09:49:53.642 EEST [11660] LOG:  starting PostgreSQL 14.5, compiled by Visual C++ build 1914, 64-bit
...

One final thing to notice is that, since we use the trust authentication there’s no check for the password, so if we tried to pass a password like psql -U koko -W kokodb it will work no matter what password we type.

Encoding stuff

The default encoding situation

You may have noticed before that the default encoding for databases will be WIN1252 (or some other similar 8-bit character set). You never want that (I guess this default is there for compatibility reasons); you want utf-8 encoding. The best option is to pass the proper encoding to initdb, like:

initdb -D ..\datautf8 -E utf-8

This will create a new cluster with utf-8 encoding. All databases created on that cluster will be utf-8 by default.

If you’ve already got a non-utf-8 cluster, you should force utf-8 for your new database instead:

createdb -E utf-8 -T template0 dbutf8

Notice that I also passed the -T template0 parameter to use the template0 template database. If I tried to run createdb -E utf-8 dbutf8 (so it would use the template1) I’d get an error similar to:

createdb: error: database creation failed: ERROR:  new encoding (UTF8) is incompatible with the encoding of the template database (WIN1252)
HINT:  Use the same encoding as in the template database, or use template0 as template.
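To see why an 8-bit encoding is a problem, here’s a small Python sketch. It’s only an illustration under my own assumptions: I’m using Python’s cp1252 codec as a stand-in for WIN1252, the can_encode helper is a name I made up, and the Greek sample text is arbitrary.

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    sample = "καλημέρα"  # arbitrary Greek text, used only for illustration
    for encoding in ("cp1252", "utf-8"):
        # Only the encoding name and the result are printed, not the text.
        print(encoding, can_encode(sample, encoding))
```

The Greek text can’t be represented in the 8-bit code page, while utf-8 handles it without any trouble.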

About the psql codepage warning

You may (or may not) have noticed a warning similar to this when starting psql:

WARNING: Console code page (437) differs from Windows code page (1252)
      8-bit characters might not work correctly. See psql reference
      page "Notes for Windows users" for details.

Some more info about this can be found in the psql reference page and this SO issue. To avoid this warning you’ll use chcp 1252 to set the console code page to 1252 before running psql.

I have to warn you though that using psql.exe from the Windows console will be problematic anyway because of its poor unicode support. You can use it fine as long as you write only ascii characters, but I’d avoid anything else.

That’s why I’d recommend using a graphical database client like for example dbeaver.

A TL;DR walkthrough

Here are the steps to follow to get a working postgresql server on windows:

  1. Download the postgresql windows binaries of the version you want from the zip archives page and extract it to a folder, let’s name it pgsql.
  2. Go to pgsql\bin folder on a command line
  3. Run initdb.exe -D ..\data -E utf-8 from inside the pgsql\bin folder to create a new database cluster with utf-8 encoding in the data directory
  4. Run postgres.exe -D ..\data to start the database server
  5. Go to pgsql\bin folder on another command line
  6. Run psql postgres to connect to the postgres database with a role named after your Windows username
  7. Profit!

Conclusion

Using the above steps you can easily set up a postgres database server on Windows for development. Some advantages of the method proposed here are:

  • Since you configure the data directory you can have as many clusters as you want (run initdb with different data directories and pass them to postgres)
  • Since nothing is installed globally, you can have as many postgresql versions as you want, each one having its own data directory. Then you’ll start the one you want each time! For example I’ve got Postgresql 12, 13 and 14.5.
  • Using the trust authentication makes it easy to connect with whatever user
  • Running the database from postgres.exe so it has a dedicated window makes it easy to know what the database is doing, to peek at the logs and to stop it (using ctrl+c)
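If you keep several versions or clusters around, a tiny helper can build the right command line for each one. The following is just a sketch under my own assumptions: the postgres_command function is a name I made up, the pgsql13 folder and the 5433 port are hypothetical, and I’m passing -p so that two servers could run at the same time without fighting over port 5432.

```python
from pathlib import PureWindowsPath


def postgres_command(install_dir, data_dir="data", port=5432):
    """Build the argument list for starting postgres.exe in the foreground.

    Hypothetical helper: a relative data_dir is resolved against install_dir,
    the same way pg.bat uses "data" relative to the version folder.
    """
    install = PureWindowsPath(install_dir)
    data = PureWindowsPath(data_dir)
    if not data.is_absolute():
        data = install / data
    return [str(install / "bin" / "postgres.exe"), "-D", str(data), "-p", str(port)]


if __name__ == "__main__":
    # The 14.5 server from this post, on the default port.
    print(postgres_command(r"c:\progr\pgsql145"))
    # A second, hypothetical install on another port.
    print(postgres_command(r"c:\progr\pgsql13", port=5433))
```

The returned list can be passed as-is to something like subprocess.run if you’d rather start the server from a script than from a bat file.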

Better Django inlines

Django has a feature called inlines which can be used to edit multiple related objects at once. You’ll get a single view that will contain a single html form that includes a different Django form for each object, edit any of them and submit them all to be saved.

This feature is heavily used when you have objects that have a parent-child relation between them. For example, a book and a testimonial for each book. Each testimonial will belong to a single book and from a UX point of view it seems better to be able to edit all testimonials for each book at the same time.

The biggest disadvantage of inlines is that, because of how Django works, their interface is very primitive: For adding new objects, you need to define the number of empty forms that will be included for each inline. The user can fill them up and press save. Then the objects will be created and the user will get new empty forms to fill. To understand this better, let’s suppose that you have defined 3 empty forms (which is the default) and the user wants to create 10 inline objects. The flow will be:

  • The user sees the 3 empty forms and fills them with data.
  • The user presses save to POST the data.
  • The user sees the 3 new objects and another 3 empty forms.
  • The user fills the 3 empty forms with data.
  • The user presses save to POST the data.
  • The user sees the 6 new objects and another 3 empty forms.
  • The user fills the 3 empty forms with data.
  • The user presses save to POST the data.
  • The user sees the 9 new objects and another 3 empty forms.
  • The user fills 1 empty form with data.
  • The user presses save to POST the data.
  • The user sees the 10 new objects and another 3 empty forms.

As you can see, the user keeps filling up the available forms and pressing save just to get new empty forms. This makes the experience very problematic and confuses users that are not familiar with it. Also, when deleting objects, the user will see a delete checkbox for each object, which they need to select before pressing save to actually delete the object. This is also counter-intuitive because it’s not easy for the user to understand that the object will only be deleted when the form is saved.
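The number of round trips in the flow above is simple arithmetic, sketched here in Python. The saves_needed function is a name I made up for illustration; it assumes the user fills every available empty form before each save.

```python
import math


def saves_needed(total_objects, extra_forms=3):
    """Number of save (POST) round trips needed to create total_objects.

    Hypothetical helper: assumes every empty form is filled before saving.
    """
    if extra_forms < 1:
        raise ValueError("at least one empty form is needed to add objects")
    return math.ceil(total_objects / extra_forms)


if __name__ == "__main__":
    # 10 objects with 3 empty forms per page, as in the flow above.
    print(saves_needed(10, 3))
    # With a single empty form every object costs one round trip.
    print(saves_needed(10, 1))
```

With the better approach described in this article the user can add all the forms on the client side and save only once.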

In this article I’ll present a way to improve the experience of inlines: We’ll have a way to add new inlines without the need to save the form all the time. Also we’ll be able to improve the behavior of the delete button so it has a better UX.

The work in this article is published in this github repository: https://github.com/spapas/inlinesample.

The project implements a Book model containing multiple testimonials and editions. For each book you use inlines to add/edit the Book, its testimonials and editions in the same form. Also, I have included two ways to add/edit a book: Using the traditional django inlines way and using the javascript way I propose here.

Models

The models used in this project are the following:

from django.db import models


class Book(models.Model):
    title = models.CharField(max_length=256)
    author = models.CharField(max_length=256)


class Edition(models.Model):
    book = models.ForeignKey(Book, on_delete=models.CASCADE)
    publisher = models.CharField(max_length=256)
    year = models.IntegerField()
    pages = models.IntegerField()


class Testimonial(models.Model):
    book = models.ForeignKey(Book, on_delete=models.CASCADE)
    name = models.CharField(max_length=256)
    testimonial = models.TextField()

As you can see they are very simple; each edition and testimonial has a foreign key to a book.

Views

For the views I’m going to use the django-extra-views package that provides a bunch of useful inline-related Class Based Views:

from django.views.generic import ListView
from extra_views import (
    CreateWithInlinesView,
    UpdateWithInlinesView,
    InlineFormSetFactory,
)
from . import models


class BookListView(ListView):
    model = models.Book

    def get_queryset(self):
        return super().get_queryset().prefetch_related("edition_set", "testimonial_set")


class EditionInline(InlineFormSetFactory):
    model = models.Edition
    fields = ["publisher", "year", "pages"]
    factory_kwargs = {"extra": 1}


class TestimonialInline(InlineFormSetFactory):
    model = models.Testimonial
    fields = ["name", "testimonial"]
    factory_kwargs = {"extra": 1}


class BetterMixin:
    def get_template_names(self):
        if self.request.GET.get("better"):
            return ["books/book_better_form.html"]
        return super().get_template_names()

    def get_success_url(self):
        return "/"


class BookCreateView(BetterMixin, CreateWithInlinesView):
    model = models.Book
    inlines = [EditionInline, TestimonialInline]
    fields = ["title", "author"]


class BookUpdateView(BetterMixin, UpdateWithInlinesView):
    model = models.Book
    inlines = [EditionInline, TestimonialInline]
    fields = ["title", "author"]

As you can see, for starters we add a BookListView that will be mapped to the / URL. This displays a table with all the books along with links to add a new or edit an existing book using both the traditional and better approach.

Then we define two classes inheriting from InlineFormSetFactory: EditionInline and TestimonialInline. These classes define our inlines: We set a model for them, the fields that will be displayed and pass extra parameters if needed. In this case we pass factory_kwargs = {"extra": 1} to have a single extra form for each inline. If we didn’t pass this Django would create 3 extra forms for each inline. Notice that if we were only using the better inlines we’d pass 0 to the extra parameter since it’s not really needed here. However because we use the same inlines for both the traditional and the better inlines I’m using 1 here (or else we wouldn’t be able to add new objects on the traditional approach).

Then we define a BetterMixin; the only thing it does is to return a different html template if the user visits the better views and to override the get_success_url method to return to “/”. As you can understand from this, the only difference between the traditional and better approach is the template.

Finally, we’ve got two views for adding/editing a new book. We inherit from CreateWithInlinesView and UpdateWithInlinesView and set their model, inlines and fields attributes to the correct values.

Traditional templates

The traditional book_form.html template is like this:

{% extends "base.html" %}
{% load crispy_forms_tags %}
{% block html_title %}Book form{% endblock%}
{% block page_title %}Book form{% endblock%}

{% block content %}
    <form method='POST'>
        {% csrf_token %}
        <div class="card w-full bg-base-100 shadow-xl card-bordered card-compact border border-gray-900">
            <div class="card-body">
                <h2 class="card-title">Book</h2>
                {{ form|crispy }}
            </div>
        </div>

        {% include "partials/_inline_set_simple.html" with formset=inlines.0 title="Editions" %}
        {% include "partials/_inline_set_simple.html" with formset=inlines.1 title="Testimonials" %}

        <input type='submit' class='btn bg-blue-600' value='Save'>
        <a href='/' class='btn bg-gray-600'>Back</a>
    </form>
{% endblock %}

I’m using tailwind css for the templates. As you can see we get two important context variables: form and inlines. The form is the main object form (book) and inlines is the list of inline formsets (editions and testimonials). Notice that I’m using a partial template for each of the inlines to improve re-usability. The _inline_set_simple.html is like this:

{% load crispy_forms_tags %}

<div class="card w-full bg-base-100 shadow-xl card-bordered card-compact border border-gray-900">
  <div class="card-body">
    <h2 class="card-title">{{ title }}</h2>
    {{ formset.management_form }}
    {% for form in formset %}
      <div class='flex border rounded p-1 m-1'>
        {% for field in form %}
          <div class='flex-col mx-2 my-2'>
            {{ field|as_crispy_field }}
          </div>
        {% endfor %}
      </div>
    {% endfor %}
  </div>
</div>

This uses the django-crispy-forms package to improve form handling. See this article for a tutorial on using django-crispy-forms.

Notice that I’m passing formset=inlines.0 and formset=inlines.1, so each inline will have a management_form that is used internally by django and a bunch of forms (1 for each object). Each form will have the fields we defined for that inline with the addition of the delete checkbox.

This is enough to get the basic function. The user will get the following form when adding a new book:

The traditional book form

As we already discussed, the user fills the info and presses save if he wants to add more testimonials or editions.

Better templates

Let’s now take a peek at the book_better_form.html template:

{% extends "base.html" %}
{% load crispy_forms_tags static %}
{% block html_title %}Book better form{% endblock%}
{% block page_title %}Book better form{% endblock%}

{% block content %}
    <form method='POST'>
        {% csrf_token %}
        <div class="card w-full bg-base-100 shadow-xl card-bordered card-compact border border-gray-900">
            <div class="card-body">
                <h2 class="card-title">Book</h2>
                {{ form|crispy }}
            </div>
        </div>

        {% include "partials/_inline_set.html" with inline_name='edition_set' inline_title="Editions" formset=inlines.0 %}
        {% include "partials/_inline_set.html" with inline_name='testimonial_set' inline_title="Testimonials" formset=inlines.1 %}

        <input type='submit' class='btn bg-blue-600' value='Save'>
        <a href='/' class='btn bg-gray-600'>Back</a>
    </form>

<script src="{% static 'inline-editor.js' %}"></script>

{% endblock %}

This is similar to the book_form.html with the following differences:

  • We include the partials/_inline_set.html partial template passing it the inline_name which is used to identify the inline. We also pass it the actual inline formset object and a title.
  • We include some custom javascript called inline-editor.js that is used to handle the inline formset.

Notice here that we need to use the correct inline_name and not whatever we want! Usually it will be child_name_set but to be sure we can easily find it by taking a peek at the management form django will generate for us (we’ll see something like testimonial_set-TOTAL_FORMS, so we know that the name is testimonial_set).
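As a rough guide, the naming convention can be sketched like this in Python. This is an assumption-laden sketch and not Django code: the two helper functions are names I made up, and the rule only holds when the foreign key has no related_name and the formset uses its default prefix.

```python
# Field names rendered by a Django formset management form.
MANAGEMENT_FIELDS = ("TOTAL_FORMS", "INITIAL_FORMS", "MIN_NUM_FORMS", "MAX_NUM_FORMS")


def default_inline_name(child_model_name):
    """Guess the inline name for a child model class name.

    Hypothetical helper: assumes no related_name on the foreign key.
    """
    return child_model_name.lower() + "_set"


def management_field_names(inline_name):
    """Names of the hidden inputs the management form will contain."""
    return [f"{inline_name}-{field}" for field in MANAGEMENT_FIELDS]


if __name__ == "__main__":
    name = default_inline_name("Testimonial")
    print(name)
    print(management_field_names(name))
```

If the names printed here don’t match what you see in the rendered html, trust the html.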

The partial _inline_set.html is a little more complex:

<div id='better_inline_{{ inline_name }}' class="card w-full bg-base-100 shadow-xl card-bordered card-compact border border-gray-900">
    <div class="card-body">
        <h2 class="card-title">
            {{ inline_title }}
            <button class='btn btn-primary bg-blue-500' type="button" id="add-form-{{ inline_name }}">Add</button>
        </h2>
        {% if formset.non_form_errors %}
            <div class="alert alert-danger">{{ formset.non_form_errors }}</div>
        {% endif %}

        <template id="empty-form-{{ inline_name }}">
            <div class='flex border border-primary rounded p-1 m-1 inline-form'>
                {% for field in formset.empty_form %}
                    {% include "partials/_inline_field.html" %}
                {% endfor %}
            </div>
        </template>

        {{ formset.management_form }}

        {% for form in formset %}
            <div class='flex border rounded p-1 m-1 inline-form'>
                {% for field in form %}
                    {% include "partials/_inline_field.html" %}
                {% endfor %}
            </div>
        {% empty %}
            <div class='flex p-1 m-1 inline-form'></div>
        {% endfor %}
    </div> <!-- card body -->
</div><!-- card -->

We use the inline_name we passed to generate a unique id for this inline so we can reference it in the javascript. Then we have an add new form button. We also add an empty form template that we’ll copy over when adding a new form. The formset.empty_form is generated by django. After we include the management_form we enumerate the forms using a for loop. Notice the empty div <div class='flex p-1 m-1 inline-form'></div> that is rendered when there are no forms; we need it to mark the place where new forms will be added, as will be explained later. The same inline-form class is used on the empty template and on the existing forms.

This uses the _inline_field.html partial template which is like this:

{% load widget_tweaks %}
{% load crispy_forms_tags %}

{% if field.field.widget.input_type == 'hidden' %}
    {{ field }}
{% else %}
    <div class='flex-col my-1 mx-2'>
        {% if "DELETE" in field.name  %}
            {{ field|add_class:""|attr:"onclick:delClick(this)"|as_crispy_field }}
        {% elif field.name == "testimonial" %}
            {{ field|attr:"rows:2"|as_crispy_field }}
        {% else %}
            {{ field|as_crispy_field }}
        {% endif %}
    </div>
{% endif %}

In this template we add an onclick handler called delClick that runs when the user clicks the delete checkbox. We could also do various other stuff, like hide the delete checkbox and add a delete button instead, but I’m leaving that as an exercise to the reader.

Better templates js

Let’s now take a peek at the actual javascript. First of all we define a function named inlineEditor:

function inlineEditor(inlineSetName) {
    let tmpl = document.querySelector('#empty-form-' + inlineSetName);
    let counter = document.querySelector('[name=' + inlineSetName + '-TOTAL_FORMS]')

    document.querySelector('#add-form-' + inlineSetName).addEventListener('click', ev => {
        ev.preventDefault()

        let newForm = tmpl.content.cloneNode(true);
        newForm.querySelectorAll('[id*=__prefix__]').forEach(el => {
            el.id = el.id.replace('__prefix__', counter.value);
            if (el.name) el.name = el.name.replace('__prefix__', counter.value);
        });

        newForm.querySelectorAll('[for*=__prefix__]').forEach(el => {
            el.htmlFor = el.htmlFor.replace('__prefix__', counter.value);
        })

        counter.value = 1 + Number(counter.value);
        let last_element_selector = 'form #better_inline_' + inlineSetName + ' .inline-form:last-of-type'
        document.querySelector(last_element_selector).insertAdjacentElement('afterend', newForm.children[0])
    })
}

This function initially saves the empty form template and the number of forms in the inline. The initial number of forms is provided by the django management form. Then we add a click event to the add button for that particular inline. When the user clicks the add button we’ll add a new empty form to the end of the existing forms. This works like this:

Each field of the inline forms has a name of the form inline_name-NUMBER-field_name (and an id that is the same name prefixed with id_), so for example for the publisher of the first edition form we’ll get something like edition_set-0-publisher. The empty form has the string __prefix__ instead of the number so it will be edition_set-__prefix__-publisher. To create the new form we clone the empty form template and replace the __prefix__ on the elements with the correct number (based on the total number of forms). Then we increase the number of forms and insert the new form next to the element matched by the last_element_selector we define there. As you can see this selector will find the last element that is inside our inline and has a class of inline-form. That’s why we need the inline-form class in all three cases, as we discussed above.

Beyond this, we also have the implementation of delClick that adds a red border class (border-red-500) to the form of the checkbox that was clicked (notice the chain of parentElement lookups):
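The renaming logic itself is independent of the DOM, so here’s the same idea as a small Python sketch. It’s only an illustration of the bookkeeping and not the code used in the project: the materialize and add_form functions are names I made up.

```python
def materialize(template_name, index):
    """Replace the __prefix__ placeholder with the index of the new form."""
    return template_name.replace("__prefix__", str(index))


def add_form(template_names, total_forms):
    """Return the field names of a new form and the updated form counter.

    Hypothetical helper: forms are numbered from 0, so the new form takes
    the current total as its index and the counter grows by one.
    """
    new_names = [materialize(name, total_forms) for name in template_names]
    return new_names, total_forms + 1


if __name__ == "__main__":
    empty_form = [
        "edition_set-__prefix__-publisher",
        "edition_set-__prefix__-year",
        "edition_set-__prefix__-pages",
    ]
    # Two forms already exist, so the new one gets index 2.
    names, total = add_form(empty_form, 2)
    print(names, total)
```

The updated counter is what has to be written back to the TOTAL_FORMS input, otherwise django will ignore the forms that were added on the client.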

function delClick(el) {
    if(el.checked) {
        el.parentElement.parentElement.parentElement.classList.add('border-red-500')
    } else {
        el.parentElement.parentElement.parentElement.classList.remove('border-red-500')
    }
}

Finally, we generate the inlineEditors when the dom is loaded:

document.addEventListener('DOMContentLoaded', function(event) {
    inlineEditor('edition_set');
    inlineEditor('testimonial_set');
})

Please notice that here we also need to use the correct name of the inlines (both here and in the template).

Conclusion

Using the better approach our book form will be like this:

The better book form

Now the user can click the add button and a new form will be added at the end of the current list of forms. Also, when the user ticks the delete checkbox there’ll be a red border around the form to be deleted.

Before finishing this tutorial I’d like to point out some things that you need to be extra careful about, especially since you are probably going to use your own html structure:

  • Don’t forget to use the correct name for the inlines in the partial template and when initializing it with inlineEditor
  • Make sure to add the inline-form class to an empty div if there are no forms in the inline, to the existing forms of the inline and to the empty template
  • Be careful on where you’ll add classes to the delClick handler; it depends on the structure of your html

Using clojure from Windows

In this small article I’m going to post a guide on how to install and use clojure from Windows using good ol’ cmd.exe.

Unfortunately, most guides on the official clojure site have instructions on using Clojure from Windows through Powershell or WSL. For my own reasons I hate both these approaches and only use the cmd.exe to interact with the Windows command line.

There are more or less two approaches to using clojure: using leiningen or using the clj tools. The clojure official guide seems to be biased towards the clj tools. However I think that leiningen may be easier for new users. I’ll cover both approaches here.

Warning: Before doing anything else please make sure to install Java. You need a version of java that is at least 1.8. Try running java -version in cmd.exe to make sure you have java and it is the correct version.
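If you want to check the version from a script instead of reading it yourself, something like the following Python sketch could work. It rests on my own assumptions: the two functions are names I made up, and the version strings in the demo are samples of the two common formats rather than output captured for this post. Keep in mind that java -version writes to stderr, so that’s the stream you’d have to capture.

```python
import re


def java_major_version(version_output):
    """Extract the major Java version from the text printed by java -version.

    Hypothetical helper: handles the legacy "1.8.0_281" scheme (Java 8)
    and the newer "17.0.2" scheme (Java 17).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    first, second = match.group(1), match.group(2)
    # In the legacy scheme the real major version is the second component.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def is_supported(version_output, minimum=8):
    """True if the reported version is at least Java 1.8 (that is, Java 8)."""
    return java_major_version(version_output) >= minimum


if __name__ == "__main__":
    for sample in ('java version "1.8.0_281"', 'openjdk version "17.0.2" 2022-01-18'):
        print(sample, "->", java_major_version(sample), is_supported(sample))
```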

Leiningen

To install leiningen you just download the lein.bat file from their page and put it in a folder in your PATH. You’ll then run lein and it will download all dependencies and install itself!

To start a clojure repl to be able to play with clojure you write lein repl. If everything went smoothly you should see a prompt and if you write (+ 1 2) you should get 3. To exit press ctrl+d or write exit.

To start a new project you’ll use lein new [template name] [project name]. For example, to create a new app you’ll write: lein new app leinapp. You’ll get a new directory called leinapp. The important stuff in this directory are:

  • project.clj: The basic descriptor of your project; here you can set various attrs of your project and also add dependencies
  • src\leinapp: The source directory of your project. This is where you’ll put your code.
  • test\leinapp: Add tests here

There should be a core.clj file inside your src\leinapp folder. The main function is the entry point of the app. Try running lein run from the project folder and you should get the output of the main function.

Add this to the end of the core.clj to define a foo function:

(defn foo []
  "bar")

And run lein repl. You should get a repl command prompt for your application in the leinapp.core namespace (if you named your app leinapp). Type (foo) and you should see "bar".

To create a standalone jar with your code (called an uberjar) you can use lein uberjar. This will create a file named target\uberjar\leinapp-0.1.0-SNAPSHOT-standalone.jar. Then try java -jar target\uberjar\leinapp-0.1.0-SNAPSHOT-standalone.jar (notice I’m still in the leinapp project folder) and you’ll see the output of main!

clj

Using clj is a more modern approach to clojure development. As I said before, the official clojure page seems to be biased towards this approach. The problem is that it seems to require Powershell to run, as you can see on the clj on Windows page.

Thankfully, the good people at the clojurians slack pointed me to the deps.clj project. This is an implementation of clj in clojure and can be installed natively on Windows by downloading the .zip from the releases page. This zip should contain a deps.exe file. Put that executable in your path. You can also rename it to clj.exe if you want. Also, if you have Powershell installed you can run this command from cmd.exe to install it automatically: PowerShell -Command "iwr -useb https://raw.githubusercontent.com/borkdude/deps.clj/master/install.ps1 | iex"

You can now run deps and you should get a clojure repl similar to lein repl.

To create a new project skeleton you can use the deps-new project. To install it run the following command from cmd.exe: deps -Ttools install io.github.seancorfield/deps-new "{:git/tag """v0.4.9"""}" :as new (please notice that there are various problems with the quoting on windows but this command should work fine).

To create a new app run: deps -Tnew app :name organization/depsapp and you’ll get your app in the depsapp folder. If you want a similar form as with lein, try deps -Tnew app :name depsapp/core :target-dir depsapp. Now the depsapp folder will contain:

  • deps.edn: The basic descriptor of your project; here you can set various attributes of your project and also add dependencies. This more or less replaces the project.clj we got from leiningen.
  • src\depsapp: The source directory of your project. This is where you’ll put your code.
  • test\depsapp: Add tests here

To run the project, try: deps -M -m depsapp.core or deps -M:run-m, or use deps -X:run-x to directly run the greet function (run-m and run-x are aliases defined in deps.edn; take a peek).

To start a REPL, run deps. Notice this will start on the user namespace, so you’ll need to do something like:

user=> (require 'depsapp.core)
nil
user=> (depsapp.core/foo)
"bar"

to run a (foo) function that you’ve added in the core.clj file.

To run the tests use: deps -T:build test.

To create the uberjar you’ll run: deps -T:build ci (tests must pass). Then execute it directly using java -jar target\core-0.1.0-SNAPSHOT.jar.

Also, notice that it’s really simple to create a new project with deps without the deps-new. For example, create a folder named manualapp and in this folder create a deps.edn file containing just the string {}. Then add another folder named src with a hello.clj file containing something like:

(ns hello)

(defn foo []
  "bar")

(defn run [opts]
  (println "Hello world"))

You can then open a REPL on the project using deps or run the run function using deps -X hello/run.

VSCode integration

Both leiningen and clj projects can easily be used with VSCode. First of all, install the calva package in your VSCode. Then, open your clojure project in VSCode and press ctrl+shift+p to bring up the command palette. Here write “Jack” (from jack-in) and select it (this also has the shortcut ctrl+alt+c ctrl+alt+j). Select the correct project type (leiningen or deps.edn). A repl will be opened to the side; you can then go to your core.clj file and press ctrl+alt+c enter to load the current file.

Then you can move to the repl on the side and run the function with (foo) or run (-main). Also you can write (foo) in your source file and press ctrl+enter to execute it and see the result; the ctrl+enter will execute the form where your cursor is. See this for more.

PDFs in Django like it’s 2022!

In a previous article I had written a very comprehensive guide on how to render PDFs in Django using tools like reportlab and xhtml2pdf. Although these tools are still valid and work fine I have decided that usually it’s way too much pain to set them up to work properly.

Another common way to generate PDFs in the Python world is using the weasyprint library. Unfortunately this library has way too many requirements and installing it on Windows is worse than putting needles in your eyes. I don’t like needles in my eyes, thank you very much.

There are various other ways to generate PDFs like using a report server like Jasper or SQL Server Reporting Services but these are too “enterprise-y” for most people and require another server, a learning curve, etc.

I was actually so disappointed by the status of PDF generation today that in some recent projects instead of the PDF file I generated an HTML page with a nice pdf-printing stylesheet and instructed the users to print it as PDF (from the browser) so as to generate the PDF themselves!

However, recently I found another way to generate PDFs in my Django projects which I’d like to share with you: Using the wkhtmltopdf tool. The wkhtmltopdf is a command line program that has binaries for more or less every operating system. It’s a single binary that you can download and put it in a directory, you don’t need to run another server or any fancy installation. Only an executable. To use it? You call it like wkhtmltopdf http://google.com google.pdf and it will download the url and generate the pdf! It’s as simple as that! This tool is old and heavily used but only recently I researched its integration with Django.

Please notice that there’s actually a django-wkhtmltopdf library for integrating wkhtmltopdf with django. However I didn’t have good results while trying to use it (maybe because of my Windows dev environment). Also, implementing the integration myself allowed me to more easily understand what’s happening and better debug wkhtmltopdf. However YMMV; after you read this small post to understand how the integration works you can try django-wkhtmltopdf to see if it works in your case.

In any case, the first thing you need to do is download and install wkhtmltopdf for your platform and save its full path in your settings.py like this:

# For linux
WKHTMLTOPDF_CMD = '/usr/local/bin/wkhtmltopdf'

# or for windows
WKHTMLTOPDF_CMD = r'c:\util\wkhtmltopdf.exe'

Notice that I’m using the full path. I have observed that even if you put the binary in a directory in the system PATH it won’t be picked up (at least in my case), thus I recommend using the full path.

Now, let’s suppose we’ve got a DetailView (let’s call it SampleDetailView) that we’d like to render as PDF. We can use the following CBV for that:

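from subprocess import check_output
from tempfile import NamedTemporaryFile
import os

# settings and HttpResponse are used below, so they must be imported too
from django.conf import settings
from django.http import HttpResponse
from django.template import Context, Template
from django.template.loader import get_template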

class SamplePdfDetailView(SampleDetailView):
  def get_resp_from_file(self, filename, context):
      template = get_template(filename)
      resp = template.render(context)
      return resp

  def get_resp_from_string(self, template_str, context):
      template = Template(template_str)
      resp = template.render(Context(context))
      return resp

  def render_to_response(self, context):
      context['pdf'] = True
      # We can use either
      resp = self.get_resp_from_string("<h1>Hello, world! {{ object }}</h1>", context)
      # or
      # resp = self.get_resp_from_file('test_pdf.html', context)

      tempfile = NamedTemporaryFile(mode='w+b', buffering=-1,
                                    suffix='.html', prefix='tmp',
                                    dir=None, delete=False)

      tempfile.write(resp.encode('utf-8'))
      tempfile.flush()
      tempfile.close()
      cmd = [
          settings.WKHTMLTOPDF_CMD,
          '--page-size', 'A4', '--encoding', 'utf-8',
          '--footer-center', '[page] / [topage]',
          '--enable-local-file-access',  tempfile.name, '-']
      # print(" ".join(cmd))
      out = check_output(cmd)

      os.remove(tempfile.name)
      return HttpResponse(out, content_type='application/pdf')

We can put the pdf view on our url patterns right next to our DetailView i.e:

[
  ...
  path(
      "detail/<int:pk>/",
      permission_required("core.user")(
          views.SampleDetailView.as_view()
      ),
      name="detail",
  ),
  path(
      "detail/<int:pk>/pdf/",
      permission_required("core.user")(
          views.SamplePdfDetailView.as_view()
      ),
      name="detail_pdf",
  ),
  ...
]

Let’s try to understand how this works: First of all notice that we have two options, either create a PDF from an html string or from a normal template file. For the first option we pass the full html string to the get_resp_from_string and the context and we’ll get the rendered html (i.e the context will be applied to the template). For the second option we pass the filename of a django template and the context. Notice that there’s a small difference on how the template.render() method is called in the two methods.

After that we’ve got an html file saved in the resp string. We want to give this to wkhtmltopdf so as to be converted to PDF. To do that we first create a temporary file using the NamedTemporaryFile class and write the resp to it. Then we call wkhtmltopdf passing it this temporary file. Notice we use the subprocess.check_output function that will capture the output of the command and return it.

Finally we delete the temporary file and return the pdf as an HttpResponse.
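One weakness of the flow described above is that the temporary file stays behind when wkhtmltopdf fails, because check_output raises before os.remove is reached. Here is a minimal sketch of the same steps wrapped in try/finally. The html_to_pdf name and its run parameter are my own inventions for illustration (run defaults to check_output, as in the view), and the footer option is left out for brevity:

```python
import os
from subprocess import check_output
from tempfile import NamedTemporaryFile


def html_to_pdf(html, wkhtmltopdf_cmd, run=check_output):
    """Render an HTML string to PDF bytes using the wkhtmltopdf binary.

    `run` is injectable only so the function can be exercised without
    wkhtmltopdf installed; it receives the argument list and must
    return the bytes the command wrote to stdout.
    """
    # delete=False: wkhtmltopdf has to open the file by name after we
    # close it, which would not work on Windows otherwise.
    tmp = NamedTemporaryFile(mode="w+b", suffix=".html", delete=False)
    try:
        tmp.write(html.encode("utf-8"))
        tmp.close()
        cmd = [
            wkhtmltopdf_cmd,
            "--page-size", "A4", "--encoding", "utf-8",
            "--enable-local-file-access", tmp.name, "-",
        ]
        return run(cmd)
    finally:
        tmp.close()  # harmless if the file is already closed
        os.remove(tmp.name)  # runs on success and on failure
```

In a view you would then return HttpResponse(html_to_pdf(resp, settings.WKHTMLTOPDF_CMD), content_type='application/pdf').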

We call the wkhtmltopdf like this:

c:\util\wkhtmltopdf.exe --page-size A4 --encoding utf-8 --footer-center "[page] / [topage]" --enable-local-file-access C:\Users\serafeim\AppData\Local\Temp\tmp_lh5r6f9.html -

The page-size can be changed to letter if you are in the US. The encoding should be utf-8. The --footer-center option adds a footer to the PDF page with the current page and the total number of pages. The --enable-local-file-access option is very important since it allows wkhtmltopdf to access local files (in the filesystem) and not only remote ones. After that we’ve got the full path of our temporary file, followed by the - which means that the pdf output will go to stdout (so we’ll capture it with check_output).
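To keep these options in one place, the argument list can be built by a small helper. This is only a sketch: build_wkhtmltopdf_cmd and its parameters are hypothetical names of mine, and the paths in the example call are placeholders. Printing the list with shlex.join instead of " ".join keeps the footer argument (which contains spaces) quoted:

```python
import shlex


def build_wkhtmltopdf_cmd(binary, html_path, page_size="A4", quiet=False):
    """Return the wkhtmltopdf argument list described above."""
    cmd = [binary]
    if quiet:
        # --quiet hides the progress messages wkhtmltopdf prints.
        cmd.append("--quiet")
    cmd += [
        "--page-size", page_size,
        "--encoding", "utf-8",
        "--footer-center", "[page] / [topage]",
        "--enable-local-file-access",
        html_path,  # input file
        "-",        # "-" sends the PDF to stdout
    ]
    return cmd


cmd = build_wkhtmltopdf_cmd("/usr/local/bin/wkhtmltopdf", "/tmp/page.html")
# shlex.join applies POSIX shell quoting, so the printed line can be
# pasted into a Linux shell without the footer argument falling apart.
print(shlex.join(cmd))
```

shlex.join quotes for POSIX shells only; for cmd.exe the quoting rules are different, so on Windows you would wrap the footer argument in double quotes by hand.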

Notice that there’s a commented out print command before the check_output call. If you have problems you can call this command from your command line to debug the wkhtmltopdf command (don’t forget to comment out the os.remove line to keep the temporary file). Also, wkhtmltopdf will output some stuff while rendering the command, for example something like:

Loading pages (1/6)
Counting pages (2/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)
Done

You can pass the --quiet option to hide this output. However the output is useful to see what wkhtmltopdf is doing in case there are problems so I recommend leaving it on while developing. Let’s take a look at a problematic output:

Loading pages (1/6)
Error: Failed to load file:///static/bootstrap/css/bootstrap.min.css, with network status code 203 and http status code 0 - Error opening /static_edla/bootstrap/css/govgr_bootstrap.min.css: The system cannot find the path specified.
[...]
Counting pages (2/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)
Done

The above output means that our template tries to load a css file that wkhtmltopdf can’t find and errors out! To understand this error, I had a line like this in my template:

<link href="{% static 'bootstrap/css/bootstrap.min.css' %}" rel="stylesheet">

which will be converted to a link like /static/bootstrap/css/bootstrap.min.css. However, notice that I tell wkhtmltopdf to render a file from my temporary directory; it doesn’t know where that link points to! Because of this you need to be extra careful to include everything in your HTML-pdf template and not use any external links. So all styles must be inlined in the template using <style> tags and all images must be converted to data images with base64, something like:

<img src='data:image/png;base64,...'>

To do that in python for a dynamic image you can use something like:

import base64

def convert_to_data(image):
  return 'data:image/xyz;base64,{}'.format(base64.b64encode(image).decode('utf-8'))

and then use that as your image src (notice I’m using image/xyz here for an arbitrary image; please use the correct image type if you know it, i.e image/png or image/jpeg).
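If you don’t know the image type in advance, one option is to look at the first bytes of the image, since the common formats start with a fixed signature. The following is a rough sketch that only covers PNG, JPEG and GIF; the default_type fallback is my own choice and not something the browser or wkhtmltopdf requires:

```python
import base64

# Leading "magic" bytes of a few common image formats.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def convert_to_data(image, default_type="application/octet-stream"):
    """Build a data: URI from raw image bytes, guessing the MIME type."""
    mime = default_type
    for signature, candidate in _SIGNATURES:
        if image.startswith(signature):
            mime = candidate
            break
    encoded = base64.b64encode(image).decode("ascii")
    return "data:{};base64,{}".format(mime, encoded)
```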

If you’ve got a static image you want to include you can convert it to base64 using an online service like this, or read it with python and convert it:

import base64

with open('static/images/image.png', 'rb') as image:
  print(base64.b64encode(image.read()).decode('utf-8'))
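The stylesheets can be inlined in a similar, automated way instead of being copied by hand. Below is a rough sketch that replaces local <link rel="stylesheet"> tags in the rendered HTML with <style> blocks read from disk. The inline_stylesheets name and its static_url / static_root parameters are assumptions of mine (in a Django project you would probably pass settings.STATIC_URL and the folder holding your collected static files). It uses regular expressions, so it only copes with simple, well-formed tags, and any url(...) references inside the CSS are not rewritten:

```python
import os
import re

# Matches whole <link ...> tags; attribute order is not assumed.
_LINK_RE = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
_HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def inline_stylesheets(html, static_url, static_root):
    """Replace local stylesheet <link> tags with <style> blocks.

    Meant for templates you wrote yourself: the href is mapped to a
    file path without any sanitising.
    """

    def replace(match):
        tag = match.group(0)
        href = _HREF_RE.search(tag)
        if "stylesheet" not in tag.lower() or href is None:
            return tag
        url = href.group(1)
        if not url.startswith(static_url):
            return tag  # leave remote or unknown links untouched
        relative = url[len(static_url):].split("/")
        path = os.path.join(static_root, *relative)
        with open(path, encoding="utf-8") as css:
            return "<style>\n{}\n</style>".format(css.read())

    return _LINK_RE.sub(replace, html)
```

You would call it on the resp string right before writing it to the temporary file.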

Instead of a DetailView we could use the same approach for any kind of CBV. If you are going to use the PDF response in multiple CBVs I recommend extracting the functionality to a mixin and inheriting from that as well (see my CBV guide for more).

Finally, the big question in the room is: why should I convert my template to a file and pass that to wkhtmltopdf? Can’t I use the URL of my view, i.e pass wkhtmltopdf something like http://example.com/app/detail/321/?

By all means you can! This will also enable you to avoid using inline styles and media!! However keep in mind that the usual case is that this view will not be public but will need an authenticated user to access it; wkhtmltopdf tries to access it as an anonymous client and doesn’t have any rights to it, so you’ll get a 404 or 403 error! Of course you can start an adventure on authenticating it somehow (and maybe doing something stupid) or you can just follow my lead and render it to a file :)

A forward and reverse proxy primer for the layman

A few days ago I wrote an answer on HN where I explained as simply as possible how forward and reverse proxies work and what the difference between them is. In this article I’m going to extend this answer a bit to make it a full post and clarify some things even more.

Forward and reverse proxying is an important concept that a lot of technical people aren’t familiar with. HTTP proxying is the process of forwarding (HTTP) requests from one server to another. So when an HTTP client issues a request to the server, the request will pass through the proxy server and be forwarded to the destination server (called the origin server). This explanation is true both for forward and reverse proxying.

Forward Proxy

A forward proxy is used when an HTTP Client (i.e a browser) wants to access resources in the internet but isn’t allowed to connect directly to the public internet so instead uses the proxy.

Usually companies don’t allow unrestricted access to the internet from their internal network. Thus the internal users would need to use a proxy to access the internet. This is the concept of the forward proxy. What happens is that when an internal user wants to access an internet resource (i.e www.google.com) her client (i.e browser) will ask a specific server (the proxy server) for that resource. The client needs to be configured properly with the address of the proxy server.

So instead of connecting to http://www.google.com directly, the browser will ask the proxy for it (conceptually something like http://proxy.company.com/?url=www.google.com) and the proxy will fetch the results and return them to you. If the browser tries to access https://www.google.com without a configured proxy server it will get a network error.

Here’s an image that explains this:

Forward proxy

The internal client can access the internal web server directly without problems. However she cannot access the internet server directly so she needs to use the proxy to access it.

One thing that needs to be made crystal clear is that the fact that your browser works with the proxy does not mean that any other HTTP clients you use will also work. For example, you may want to run curl or wget to download some files from an external server; these programs will not work without setting a proxy (usually by setting the http_proxy and https_proxy environment variables or by passing a parameter). Also, the proxy only works for HTTP requests. If you are in a private network without external access you will not be able to access non-HTTP resources. For example you will not be able to access your non-company mail server (which uses either IMAP or POP3) from behind your company’s network. Typically, you’ll use a web client for accessing your mail.
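The same goes for HTTP clients you write yourself. As a small illustration, here is how a Python program using the standard library can be pointed at a forward proxy. The proxy address is a placeholder and build_proxy_opener is just a name I picked for the sketch:

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(handler)


# proxy.company.com:3128 is a made-up address, not a real proxy.
opener = build_proxy_opener("http://proxy.company.com:3128")
# opener.open("http://www.google.com") would now go through the proxy.

# Without an explicit handler, urllib looks up the proxy settings
# itself; on most systems that means the http_proxy / https_proxy
# environment variables mentioned above.
print(urllib.request.getproxies())
```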

So it seems that using a proxy heavily restricts the internal users usage of internet. What are the advantages of using a forward proxy?

  • Security: Since the internal computers of a company will not have internet access there’s no easy way for attackers to access these computers.
  • Content moderation: The company through the proxy can block access to various internet sites (i.e social network, gaming etc) that the users shouldn’t access during work.
  • Caching: The proxy server can have a cache so when multiple users access the same internet resource it will be downloaded only once, saving the company’s bandwidth.

Especially the security thing is so important that almost all corporate (or university etc) networks will use a proxy server and never allow direct access to the internet.

A well known, open source forward proxy server is Squid.

Reverse proxy

A reverse proxy is an HTTP server that “proxies” (i.e forwards) some (or all) requests it receives to a different HTTP server and returns the answer back. For example, a company may have a couple of HTTP servers in its internal network. These servers have private addresses and cannot be accessed through the internet. To allow external users to access these servers, the company will configure a reverse proxy server that will forward the requests to the internal servers as seen in the picture:

Reverse proxy

What happens is that the proxy server will forward requests that fulfill some specific criteria to other web servers. The criteria may be requests that have

  • a specific host (forward the requests that have a hostname of www.server1.company.com to the internal server named server1 and www.server2.company.com to the internal server named server2)
  • or a specific port (forward requests in the port 81 to server1 and requests in the port 82 to server2)
  • or even a particular path (forward requests with the path www.company.com/server1 to server1 and requests with the path www.company.com/server2 to server2)

or even other criteria that may be decided.
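To make the criteria a bit more concrete, here is a toy model of the routing decision. It is not how any real reverse proxy is configured; Rule, RULES and pick_backend are names I made up, and the rules simply mirror the host / port / path examples above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    backend: str
    host: Optional[str] = None
    port: Optional[int] = None
    path_prefix: Optional[str] = None

    def matches(self, host, port, path):
        # A criterion left as None is simply not checked.
        return (
            (self.host is None or self.host == host)
            and (self.port is None or self.port == port)
            and (self.path_prefix is None
                 or path.startswith(self.path_prefix))
        )


RULES = [
    Rule("server1", host="www.server1.company.com"),
    Rule("server2", host="www.server2.company.com"),
    Rule("server1", port=81),
    Rule("server2", port=82),
    Rule("server1", host="www.company.com", path_prefix="/server1"),
    Rule("server2", host="www.company.com", path_prefix="/server2"),
]


def pick_backend(host, port, path, rules=RULES):
    """Return the backend of the first matching rule, or None."""
    for rule in rules:
        if rule.matches(host, port, path):
            return rule.backend
    return None
```

The first matching rule wins, so the order of the rules matters; real servers have their own precedence rules for this.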

Let’s see some examples of reverse proxying:

  • A characteristic example of reverse proxy is the well-known 3-tier architecture (web server / app server / database server). The web server is used to serve all requests but it “proxies” (forwards) some of the requests to the app server. This is used because the web server cannot serve dynamic replies but can serve static replies like for example files.
  • Offloading the SSL (https) security to a particular web server. This server will store the private key of your certificate and terminate the SSL connections. It will then forward the requests to the internal web servers using plain HTTP.
  • An HTTP load balancer will proxy the requests to a set of other servers based on some algorithm to share the load (i.e the HAProxy software load balancer or even a hardware load balancer)
  • A reverse proxy can be used to act as a security and DOS “shield” for your web servers. It will check the requests for common attack patterns and forward them to your servers only if they are safe
  • A reverse proxy can be used for caching; it will return cached versions of resources if they are available to avoid overloading the application servers
  • A CDN (content delivery network) is more or less a set of glorified reverse proxy servers that act as a first step for serving the user’s requests (based on the geographic location) also offering security protection and caching (this is what akamai or cloudflare do)
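As an illustration of the “some algorithm” part of the load balancer example, the simplest one is round robin: each new request goes to the next server in a fixed rotation. This is a toy sketch with made-up names; real load balancers also track server health, weights and so on:

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def next_backend(self):
        return next(self._cycle)


balancer = RoundRobinBalancer(["server1", "server2", "server3"])
picked = [balancer.next_backend() for _ in range(5)]
# picked == ["server1", "server2", "server3", "server1", "server2"]
```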

As can be seen from the previous examples there are a lot of apps that do reverse proxying, for example apache HTTP, nginx, HAProxy, varnish cache et al.

Notice that while there’s only one forward proxy, there could be a (large) chain of reverse proxies when accessing a remote server. Let’s take a look at a rather complex scenario: A user in a corporate network will access an application in another network. In this case the user’s request may pass through:

forward proxy (squid) -> security server / CDN (akamai) -> ssl termination (nginx) -> caching (varnish) -> web server (nginx again) -> app server (tomcat or gunicorn or IIS etc) as can be seen on the following image:

Reverse proxy

Notice that in this case (which is not uncommon) there are five (5) servers between your client and the application server!

One common problem with this is that unless all the intermediate servers are configured properly (by properly modifying and passing the X-Forwarded-For header) you won’t be able to retrieve the IP of the user that did the initial request.
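On the application side, reading that header needs some care because the client can send its own X-Forwarded-For value. A common approach is to count from the right and only trust as many entries as you have proxies of your own. Here is a sketch under the assumption that every one of your proxies appends the address of the peer it received the request from; client_ip and trusted_proxies are names I chose for illustration:

```python
def client_ip(x_forwarded_for, remote_addr, trusted_proxies):
    """Return the client address as seen by the outermost trusted proxy.

    x_forwarded_for: raw header value (may be None or empty)
    remote_addr:     address of the peer that connected to this server
    trusted_proxies: number of proxies we run in front of this server
    """
    raw = (x_forwarded_for or "").split(",")
    hops = [part.strip() for part in raw if part.strip()]
    hops.append(remote_addr)
    # The last `trusted_proxies` entries are our own proxies; the entry
    # just before them is the address the outermost proxy saw. Anything
    # further left was sent by the client and cannot be trusted.
    index = len(hops) - 1 - trusted_proxies
    return hops[max(index, 0)]
```

For example, with two proxies of your own, client_ip("203.0.113.7, 10.0.0.1", "10.0.0.2", 2) gives "203.0.113.7", and a spoofed extra entry at the front of the header does not change the result.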