Matt Healy - PropTech CTO
Technology Leader, Solutions Architect and Business Analyst
Perth, Western Australia
 

AWS CodeCommit is Amazon Web Services' Git hosting service and part of their suite of developer tools. It can be thought of as comparable to GitHub or GitLab, although, to be completely honest, both of those services are ahead of CodeCommit in terms of usability and feature set. It does, however, have the advantage of being closely tied to other AWS services, in particular IAM for permission management.

CodeCommit allows for the usual pull request workflow. Something annoying about this, however, is that you can only view outstanding pull requests against a single repository at a time. There is no way in the AWS Console to get an overview of all outstanding pull requests across all code repositories in your account (within the current region).

As someone who is responsible for performing code reviews, I found this quite frustrating: I couldn't see at a glance how many PRs were waiting for my review, and so I started to fall behind on those reviews. As you may know from my earlier blog entries, one thing I love about AWS is that almost anything can be scripted using their API or CLI interfaces.

Using the excellent Boto3 Python library, I put together the script below, which I find useful for summarising all outstanding pull requests in the account.


import boto3
import argparse

parser = argparse.ArgumentParser()

parser.add_argument(
    '--repo', metavar='repo', type=str,
    help='Optionally filter by repository')

args = parser.parse_args()
filterRepository = args.repo

client = boto3.client('codecommit')
region = client.meta.region_name

pullRequests = []

# Note: list_repositories (and list_pull_requests below) paginate their
# results via 'nextToken'; for brevity we only handle the first page here.
resp = client.list_repositories()
repositories = resp['repositories']

for r in repositories:

    if filterRepository:
        if r['repositoryName'] != filterRepository:
            continue

    resp = client.list_pull_requests(
        repositoryName=r['repositoryName'],
        pullRequestStatus='OPEN',
    )

    pullRequestIds = resp['pullRequestIds']
    for p in pullRequestIds:
        pullRequests.append(p)

for i in pullRequests:

    resp = client.get_pull_request(
        pullRequestId=i
    )

    pr = resp['pullRequest']

    title = pr['title']
    description = pr.get('description')
    lastActivity = pr['lastActivityDate']
    created = pr['creationDate']
    authorArn = pr['authorArn']

    targets = pr['pullRequestTargets']

    for t in targets:

        repo = t['repositoryName']

        link = 'https://{}.console.aws.amazon.com/codesuite/'.format(region) + \
               'codecommit/repositories/{}/pull-requests/'.format(repo) + \
               '{}?region={}'.format(i, region)

        print("\nLink:\n{}".format(link))

        print("\nRepo: {}".format(t['repositoryName']))
        print("Source branch: {}".format(t['sourceReference']))
        print("Target branch: {}\n".format(t['destinationReference']))

    print("Created: {}".format(created))
    print("Last Activity: {}".format(lastActivity))
    print("Author: {}\n".format(authorArn))

    print("Title: {}".format(title))
    print("Description: {}\n".format(description))

    print("------------------------------------------------------")

 

I recently wrote an API endpoint for a project at VaultRealEstate which required me to generate a month-by-month commission breakdown for a sales agent. The API endpoint accepts an arbitrary start and end month and returns a JSON object showing all the distinct months in that date range and the commission performance for each of those months (for example, for displaying a bar chart).

I found that this was actually not a trivial problem: the built-in Python datetime module can't really solve it nicely on its own. Third-party libraries like dateutil can, and normally I would immediately reach for one. However, this project is hosted on AWS Lambda, and I'm conscious of the deployment package growing with each dependency introduced, so I prefer to only introduce a dependency when it's necessary.
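
For comparison, here is roughly what the dateutil route I decided against could look like (a sketch, not code from the project; rrule's 'until' is inclusive, so we filter out the end month ourselves):

from datetime import datetime
from dateutil.rrule import rrule, MONTHLY

start = datetime(year=2017, month=10, day=1)
end = datetime(year=2018, month=3, day=1)

# Keep months >= start and < end; rrule includes 'until' itself
# when it lands exactly on an occurrence, so filter it out
months = [
    d.strftime("%Y-%m-01")
    for d in rrule(MONTHLY, dtstart=start, until=end)
    if d < end
]
# ['2017-10-01', '2017-11-01', '2017-12-01', '2018-01-01', '2018-02-01']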

I found the following StackOverflow answer which seemed to suit my needs:

https://stackoverflow.com/a/34898764/272193

Adapting the answer for my use case results in the snippet below.

from datetime import datetime, timedelta
from collections import OrderedDict

# Sample start and end dates
start = datetime(year=2017, month=10, day=1)
end = datetime(year=2018, month=3, day=1)

# Get list of months >= start and < end

months = OrderedDict(
    ((start + timedelta(days=d)).strftime("%Y-%m-01"), 0)
    for d in range((end - start).days)
)
# OrderedDict([('2017-10-01', 0), ('2017-11-01', 0), ('2017-12-01', 0), ('2018-01-01', 0), ('2018-02-01', 0)])
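
Continuing the snippet above, a minimal sketch of how the months dict can then be used to aggregate commission figures (the sales data here is made up; the real endpoint reads it from the database):

# Hypothetical sales data: (sale date, commission amount)
sales = [
    (datetime(year=2017, month=10, day=15), 1500.00),
    (datetime(year=2017, month=10, day=28), 900.00),
    (datetime(year=2018, month=2, day=3), 2100.00),
]

# Bucket each commission into its first-of-month key
for sale_date, amount in sales:
    key = sale_date.strftime("%Y-%m-01")
    if key in months:
        months[key] += amount

# months now maps every month in the range to a total, ready to be
# serialised as JSON for the bar chart:
# OrderedDict([('2017-10-01', 2400.0), ('2017-11-01', 0), ('2017-12-01', 0),
#              ('2018-01-01', 0), ('2018-02-01', 2100.0)])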

 

In my last blog post we looked at how to deploy our Flask application using Gunicorn on Amazon's EC2 service. That blog post was more focused on getting a very simple test case up and running, but one thing we didn't cover in detail was how best to manage our Gunicorn process.

If you recall from the previous post, we set up our server with Nginx acting as the web server listening for traffic on port 80, which forwarded requests to our Flask application server (Gunicorn) running on port 8000. This works well, except for the fact that we aren't really looking after our Gunicorn process. A Gunicorn process can die because of a coding error, or perhaps some other external factor. We also want our Gunicorn process to start again in the event of a reboot.

To kick things off, let's install supervisord:

[ec2-user@ip-172-31-6-157 ~]$ sudo pip install supervisor --pre

We need to set the configuration for supervisor. First, run the following command:

[ec2-user@ip-172-31-6-157 ~]$ echo_supervisord_conf

This should print out a sample configuration file to your terminal. Let's use this as the basis for our configuration.

[ec2-user@ip-172-31-6-157 ~]$ sudo bash -c '/usr/local/bin/echo_supervisord_conf > /etc/supervisord.conf'
[ec2-user@ip-172-31-6-157 ~]$ sudo vi /etc/supervisord.conf

At the very bottom of the file, add the following block and adjust to suit your application.

[program:myapp]
command = /home/apps/.virtualenvs/myapp/bin/python /home/apps/.virtualenvs/myapp/bin/gunicorn app:app -b localhost:8000
directory = /home/apps/myapp
user = apps
autostart = true              ; start at supervisord start (default: true)
autorestart = true            ; whether/when to restart (default: unexpected)

Save the file, and now let's start supervisor. We want supervisor to start automatically at boot time, so we will need an init script for this. Supervisor doesn't usually come packaged with an init script, but a suitable one is available as a Gist, which we can download directly:

[ec2-user@ip-172-31-6-157 ~]$ cd /etc/init.d
[ec2-user@ip-172-31-6-157 init.d]$ sudo bash -c 'wget https://gist.githubusercontent.com/MattHealy/a3772c19b6641eb3157d/raw/06932656e8a173e91e978468a10d837a69a1ecfa/supervisord'
[ec2-user@ip-172-31-6-157 init.d]$ sudo chmod +x supervisord
[ec2-user@ip-172-31-6-157 init.d]$ sudo chkconfig --add supervisord
[ec2-user@ip-172-31-6-157 init.d]$ sudo /etc/init.d/supervisord start

The above commands ensure that every time the machine is restarted, supervisor will start automatically, and in turn will start our Gunicorn process for serving our Flask app.
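
From here you can manage the process with supervisorctl (this assumes the default [supervisorctl] section from the sample configuration is left enabled; the status output below is illustrative):

[ec2-user@ip-172-31-6-157 ~]$ sudo /usr/local/bin/supervisorctl status
myapp                            RUNNING   pid 2301, uptime 0:01:23

After editing /etc/supervisord.conf, running supervisorctl reread followed by supervisorctl update will pick up the changes without restarting unrelated programs.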


 

This blog post explains how to get your Flask app up and running on Amazon EC2. This tutorial assumes you can use a Unix-like terminal environment (e.g. Linux or Mac OS X).

Firstly, within your AWS Management Console, you need to create an EC2 instance. Click the "Launch Instance" button.

Let's choose "Amazon Linux" as our base machine image.

Now you can choose your machine type - to keep things within the free tier let's choose a t2.micro instance. All new Amazon Web Services customers can use the "free tier" to run certain services for 12 months at no cost.

We'll now continue with "Review and Launch".

From here, we can click "Edit Security Groups" and define our firewall rules. We want to allow SSH from anywhere so we can get in and modify our instance, and also allow HTTP traffic (Port 80) to our instance from the Internet.

Go ahead and launch the instance, ensuring you either already have a key pair file or you create a new key pair.

After your instance finishes booting, you can SSH in to continue with the deployment. View the instance details to get your public DNS address, then SSH in:

ssh -i /path/to/your/keyfile ec2-user@your_public_dnsname_here

Now we want to create a user which will run our Flask app. It is a good idea to run the app as a separate user, and definitely not as root. Running a service as root is dangerous: if the service were compromised somehow (e.g. through a bug in our code), the attacker would have access to the whole system.

[ec2-user@ip-172-31-6-157 ~]$ sudo /usr/sbin/useradd apps

Change to the apps user:

[ec2-user@ip-172-31-6-157 ~]$ sudo su apps
[apps@ip-172-31-6-157 ec2-user]$ cd ~
[apps@ip-172-31-6-157 ~]$ mkdir myapp
[apps@ip-172-31-6-157 ~]$ cd myapp

Now upload the code for your Flask app - you could do this by cloning from an existing Git repository, but for this tutorial we'll just create a simple test app. Use your favourite text editor (mine is Vim) to create app.py:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    return "It works!"

if __name__ == '__main__':
    app.run(debug=True)

We'll run our Flask app in a virtual environment, so let's install virtualenvwrapper.

[apps@ip-172-31-6-157 myapp]$ exit
[ec2-user@ip-172-31-6-157 ~]$ sudo easy_install pip
[ec2-user@ip-172-31-6-157 ~]$ sudo pip install virtualenvwrapper
[ec2-user@ip-172-31-6-157 ~]$ sudo su apps
[apps@ip-172-31-6-157 ec2-user]$ cd
[apps@ip-172-31-6-157 ~]$ vi .bashrc

Add the following lines:

export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_VIRTUALENV_ARGS='--no-site-packages'

source /usr/bin/virtualenvwrapper.sh

This allows us to access the virtualenvwrapper command line tools easily. Let's reload our .bashrc profile and create a virtual environment.

[apps@ip-172-31-6-157 ~]$ . .bashrc
[apps@ip-172-31-6-157 ~]$ mkvirtualenv myapp

Now install your project's dependencies - typically this will be from running pip install -r requirements.txt, but for this example I'll simply install Flask:

(myapp)[apps@ip-172-31-6-157 ~]$ pip install Flask

We don't want to use the Flask development server, as it is not suitable for a production environment, so let's also install Gunicorn to serve our Python code.

(myapp)[apps@ip-172-31-6-157 ~]$ pip install gunicorn

We'll need a web server installed on our instance, because we're going to forward requests from port 80 to our Flask app running internally. It is best practice to have a proper web server handle the port 80 requests, because the application server we are using (Gunicorn) is designed to serve Python code, not to face the Internet directly.

(myapp)[apps@ip-172-31-6-157 ~]$ exit
[ec2-user@ip-172-31-6-157 ~]$ sudo yum install nginx
[ec2-user@ip-172-31-6-157 ~]$ sudo vi /etc/nginx/nginx.conf

Replace this line:

user  nginx;

with this:

user  apps;

and in the http block, add this line (EC2's long public DNS names can exceed nginx's default server name hash bucket size):

server_names_hash_bucket_size 128;

And now let's define a server block for our site:

[ec2-user@ip-172-31-6-157 ~]$ sudo vi /etc/nginx/conf.d/virtual.conf

Paste in the below:

server {
    listen       80;
    server_name  your_public_dnsname_here;

    location / {
        proxy_pass http://127.0.0.1:8000;
    }
}
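
Optionally (not required for this tutorial), it is common to also pass a few standard proxy headers through to the application, so that Flask sees the original client address and requested host rather than Gunicorn's local view. The same location block with those headers added would look like this:

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }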

Start the web server:

[ec2-user@ip-172-31-6-157 myapp]$ sudo /etc/rc.d/init.d/nginx start

And finally, let's start our Gunicorn process to serve our Flask app:

[ec2-user@ip-172-31-6-157 myapp]$ sudo su apps
[apps@ip-172-31-6-157 ~]$ cd ~/myapp
[apps@ip-172-31-6-157 myapp]$ workon myapp
(myapp)[apps@ip-172-31-6-157 myapp]$ gunicorn app:app -b localhost:8000 &

This will set our Gunicorn process off running in the background, which will work fine for our purposes here. An improvement that can be made here is to run Gunicorn via Supervisor. Supervisor can look after our Gunicorn processes, making sure they are restarted if anything goes wrong and started at boot time. I'll be writing a follow-up post about implementing Supervisor later on.
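
Before heading to the browser, you can quickly check that Gunicorn is answering locally (the response is simply what our test app returns):

(myapp)[apps@ip-172-31-6-157 myapp]$ curl http://127.0.0.1:8000
It works!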

Now, if you visit your public DNS name in your web browser, you should see the "It works!" message from our test app.

Congratulations! You have now successfully deployed your Flask app to an Amazon Web Services EC2 server.


 

Always be learning

Almost anything you do in programming can be used as a learning experience. There's a fine line between doing everything from scratch (re-inventing the wheel) and standing on the shoulders of others so you can reach further. There is so much out there in the way of frameworks, scaffolding and tools to help you achieve what you want to achieve. While it's great to be able to use these fantastic tools, it is also extremely beneficial to have some understanding of what's going on under the surface.

I've been hand-coding in Perl for pretty much my whole career, without using an existing framework, so I've been developing my side projects after hours using modern programming languages and frameworks to broaden my skill set and stay relevant in the world of web development. For the past few months I've been developing with the Flask framework for Python. The very blog you're reading now was written by me using Flask. Yes, I could have just used an "off the shelf" blogging solution such as WordPress, but in doing so I wouldn't really learn anything.

Along the way I've run into small problems, mostly caused by not being totally familiar with what I'm doing in Python and Flask, but these too have presented interesting learning challenges. As an example, the blog entries on this site are created and formatted using Markdown syntax. I'm also running the content through Python's Bleach library to sanitise the markup. So far, so good.

I then wanted to truncate the blog entries for display on the home page of the blog. This isn't as simple as just taking a slice of the string representing the content, because we might end up with broken and un-nested HTML tags. I was lucky enough to stumble across an interesting snippet, the HTMLAbbrev class, which appeared to hold all the answers I was looking for. The code uses a Python class derived from the HTMLParser library to build up a tag stack, truncate the HTML content and ensure the tag stack is closed correctly.

I implemented the code on my site, pushed it to production, and all was well.

The next day I decided to alter my code to allow <img /> tags for displaying images within my blog posts. I edited a blog post on my test environment, reloaded the home page - and everything broke.

Exception: end tag u'p' does not match stack: [u'p', u'img']

What? I thought the HTMLAbbrev code was supposed to take care of this for me?

Digging deeper into this, I found that the HTMLAbbrev class was overriding the methods handle_starttag, handle_endtag and handle_startendtag, inherited from the HTMLParser base class. handle_startendtag is similar to handle_starttag but applies to self-closed empty tags such as <img />. OK, this is somewhere to start looking.
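
To illustrate the distinction, here is a minimal sketch (written for Python 3's html.parser module; the original snippet used the Python 2 HTMLParser module, but the callbacks behave the same way):

from html.parser import HTMLParser

class TagLogger(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("starttag:", tag)

    def handle_endtag(self, tag):
        print("endtag:", tag)

    def handle_startendtag(self, tag, attrs):
        print("startendtag:", tag)

# HTML5-style void tag: delivered via handle_starttag, so a naive
# tag stack pushes 'img' and never pops it
TagLogger().feed('<p><img src="a.jpg"></p>')
# starttag: p
# starttag: img
# endtag: p

# XHTML-style self-closed tag: delivered via handle_startendtag,
# which a stack-based truncator can safely leave off the stack
TagLogger().feed('<p><img src="a.jpg" /></p>')
# starttag: p
# startendtag: img
# endtag: p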

The HTML output of my blog post was coming out as

<img class="float-left" height="200" src="http://s3-ap-southeast-2.amazonaws.com/matthealy-blog/1024px-Cable_closet_bh.jpg">

which isn't correct XHTML style. Perhaps this is why the HTMLParser module couldn't process my HTML?

Backtracking through the code to find out where the responsibility lay, I isolated the problem to Bleach itself. "Surely this has come up with others before me?" I thought. I checked the project's GitHub Issues page and found a relevant closed issue.

It looked promising, but the issue was closed due to lack of interest! (This is the good and bad thing about open source software: if something is broken, feel free to fix it yourself!) A commenter in that thread mentioned that they were using Beautiful Soup to tidy up the HTML and make sure it is formatted as XHTML. I installed the package, ran my HTML content through Beautiful Soup, called soup.prettify(), and hey presto, we have valid XHTML and the HTMLAbbrev class can once again handle my blog posts.

I thought that all was finally working as expected, but no, there was one more hurdle to overcome! It turns out that soup.prettify() adds a whole bunch of extra whitespace around your HTML elements, making anchor links look funny. I found an article providing the answer: use str(soup) instead of soup.prettify().
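
A minimal sketch of the fix (using BeautifulSoup 4 with the built-in parser; the sample markup is made up):

from bs4 import BeautifulSoup

html = '<p>Server room <img src="photo.jpg"></p>'
soup = BeautifulSoup(html, "html.parser")

print(soup.prettify())
# <p>
#  Server room
#  <img src="photo.jpg"/>
# </p>
# The void tag is now self-closed (valid XHTML), but prettify() has
# introduced extra whitespace around every element.

print(str(soup))
# <p>Server room <img src="photo.jpg"/></p>
# Same XHTML-style output, without the added whitespace.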

Finally, I ended up with the desired result, with the added satisfaction of nutting out a few little problems along the way.