Posts


Mar. 1, 2022

Making Animations and Videos in R

This post details how to make animated graphs in R. For the sake of simplicity (and pretty graphics) I don’t use any real data in this post but instead create some random movement patterns using the ambient library, which is explained in the first section. For this we will need:

    library(ambient)
    library(dplyr)
    library(ggplot2)
    library(data.table)
    library(magick) # you may need to install imagemagick separately
    library(gapminder) # you may need to install ffmpeg separately

Now let’s create some random noise images using the ambient library:

Jun. 30, 2021

Making Bubble Maps with Folium

This post details how to make interactive bubble maps in Python with Folium. But before we jump into using Folium, let’s generate some fake data with random geographic locations.

    import pandas as pd
    import random
    import sys
    import math

    # A long and lat point around which the random data will be generated
    # In this case a point in central London.
    latitude = 51.45
    longitude = -0.10

    # A function which will generate a random dataset around our long and lat
    def random_data(nlat, nlong, nrows, nrange):
        df = pd.
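The excerpt above cuts off before the body of the data-generating function, so here is a minimal sketch of the general idea using only the standard library. The function name `random_points`, its parameters, and the `value` field are assumptions made for illustration; this is not the post's `random_data` function.

```python
import random

# Centre point taken from the excerpt (central London).
latitude = 51.45
longitude = -0.10


def random_points(lat, lon, nrows, spread, seed=None):
    """Return nrows dicts whose lat/lon are jittered uniformly
    within +/- spread degrees of the centre point (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        {
            "lat": lat + rng.uniform(-spread, spread),
            "lon": lon + rng.uniform(-spread, spread),
            # An arbitrary value that could later drive the bubble size.
            "value": rng.randint(1, 100),
        }
        for _ in range(nrows)
    ]


points = random_points(latitude, longitude, nrows=50, spread=0.05, seed=42)
print(points[0])
```

Passing a seed keeps the fake data reproducible between runs, which helps when comparing map styles.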

Feb. 28, 2021

Configuring Python to work like ESS mode in Spacemacs

I have found one of the best packages in Emacs to be ESS mode, which allows me to work seamlessly with R. I much prefer the setup and workflow of ESS to RStudio, where I had previously done most of my R coding. However, I spend most of my time coding in Python in Jupyter notebooks. I have tried on several occasions to switch to using Jupyter notebooks inside Emacs, but it has never quite clicked for me and I’ve always gone back to just working in a browser.

Aug. 25, 2019

Optimal Data Decompression in R

I’ve recently encountered a situation where I am working with very large datasets in a constrained environment. As a result, the only practical option has been to store the datasets in a compressed format, and then load them into R to start on the data analysis. The problem is that when you’re working with datasets in the 10 GB+ range on a normal desktop, loading the data into memory (thankfully there is still enough of that available!

May. 23, 2017

Monitoring memory usage in jupyter notebooks

I recently encountered a series of memory errors when using pandas in the LSE high-powered computing environment. While detailing the problem isn’t going to be of particular interest to anyone, I thought that a quick rundown of how to monitor memory usage in a Jupyter notebook may be of use to a few people out there. While there are a few different ways of doing this, I found that using the memory_profiler package was by far the easiest option.
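As one of the "different ways" mentioned above, here is a minimal sketch that uses the standard library's `tracemalloc` module instead of the package the post goes on to use. It only tracks memory allocated by Python itself, and the `data` workload is a made-up stand-in for a real pandas operation.

```python
import tracemalloc

# Start tracing Python memory allocations.
tracemalloc.start()

# Hypothetical workload: build a moderately large nested list.
data = [list(range(1000)) for _ in range(1000)]

# get_traced_memory() returns (current, peak) sizes in bytes.
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
```

In a notebook, the same start/measure/stop pattern can be wrapped around a single cell to see roughly how much memory that cell allocates.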

Mar. 23, 2017

Working with deciles and timeseries in python and pandas

Recently I’ve been working with the Land Registry’s price paid data set, looking at shifts in prices in different areas of the market. One of the ways I’ve been segmenting the massive amounts of data into something more manageable has been to look at specific deciles, say the top 10% of the market. Deciles and the like have been put to great use recently in the literature on income and wealth, with ‘the 1%’, a phrase we all now instantly ‘get’, being the perfect example.
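To make the idea of "the top 10% of the market" concrete, here is a small sketch using the standard library's `statistics.quantiles`. The `prices` list is invented for illustration and is not Land Registry data; the post itself works with pandas.

```python
import statistics

# Hypothetical sale prices (in thousands): the integers 1 to 100.
prices = list(range(1, 101))

# n=10 returns the nine cut points that split the data into deciles.
cut_points = statistics.quantiles(prices, n=10)

# The last cut point is the threshold for the top decile.
top_decile_threshold = cut_points[-1]

# Keep only the sales above that threshold.
top_decile = [p for p in prices if p > top_decile_threshold]

print(top_decile_threshold, len(top_decile))
```

With 100 evenly spaced values, the filter keeps the ten highest prices, which is the segment the excerpt describes.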

Nov. 21, 2015

Academic PDF management with Zotero

Nowadays there is a plethora of reference managers competing for the attention of academics. While each has its own pros and cons, Zotero stands out as the best tool to use for managing a digital library of PDFs. Zotero is free, open source and maintained by academics – a real advantage at a time when most other reference managers are now run for profit or owned by large journal publishers. Moreover, Zotero is perfectly suited to helping you seamlessly manage your PDFs in a manner which suits you, without the risk of costly upgrades, of being locked out when you change institution, or of the company behind the product going bust.