Dear reader, first of all, thanks for stopping by and showing interest in the Stata Guide! The Guide would not have been possible without the support and encouragement of the community. I have received countless messages, comments, suggestions, and feedback on various articles, all of which have tremendously helped improve the content. I wanted to write this short article to explain the motivation behind all of this.

Why Stata?

In the field of microeconometrics, and economics in general, Stata is the go-to language. Across the dozens of projects I have been involved in at institutions like LUMS, the World Bank, DFID, USAID, J-PAL, CERP, and others, Stata was, and still is, the core software used for analysis. While the current trend may be toward open-source, data-science-oriented languages like R and Python, my academic/research circle uses only Stata, and this is unlikely to change in the next few years. I myself started using Stata around 2003 during my Master’s program. It was a major leap from the other software available at the time, like TSP and SPSS. So for me personally, there is also a lock-in effect. …


In this guide, learn how to add arrows to line graphs in Stata, as shown in the figure below:

[Figure: line graph with arrows at the end of each line]

While the idea of an arrow at the end of a line seems banal, it poses an interesting challenge: the arrowhead has to be oriented along the line, which involves calculating angles in Stata. The core principles behind calculating angles for arrows can be used for many other interesting visualizations, which will be covered in subsequent guides.
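To make the angle problem concrete, here is a minimal sketch in Python. It is not the guide's Stata code. The sketch assumes the arrow should follow the last segment of the line as it appears on screen. Under that assumption, the data differences must first be scaled by the axis ranges and by the plot region's aspect ratio before taking the angle with `atan2`. The function names and the `aspect`, `length`, and `spread_deg` parameters are hypothetical.

```python
import math

def screen_angle(x0, y0, x1, y1, x_range, y_range, aspect=1.0):
    """Angle (in degrees) of the segment (x0, y0) -> (x1, y1) as drawn on screen.

    x_range, y_range: axis spans (max - min) in data units.
    aspect: plot-region height / width (hypothetical parameter for this sketch).
    """
    dx = (x1 - x0) / x_range            # share of the plot width
    dy = (y1 - y0) / y_range * aspect   # share of the plot height, in width units
    return math.degrees(math.atan2(dy, dx))

def arrowhead_wings(x1, y1, angle_deg, x_range, y_range, aspect=1.0,
                    length=0.03, spread_deg=25):
    """Data coordinates of the two arrowhead wing ends for a tip at (x1, y1).

    length: wing length as a share of the plot width (hypothetical).
    spread_deg: half-angle of the arrowhead (hypothetical).
    """
    wings = []
    for sign in (1, -1):
        theta = math.radians(angle_deg + sign * spread_deg)
        # Step backwards from the tip in screen units, then convert back to data units.
        wx = x1 - length * math.cos(theta) * x_range
        wy = y1 - length * math.sin(theta) * y_range / aspect
        wings.append((wx, wy))
    return wings

# Last segment of a hypothetical line from (9, 40) to (10, 50),
# on axes spanning 0-10 (x) and 0-100 (y), in a square plot region.
angle = screen_angle(9, 40, 10, 50, x_range=10, y_range=100)
print(round(angle, 2))  # 45.0
print(arrowhead_wings(10, 50, angle, x_range=10, y_range=100))
```

Without the scaling step, the raw data slope of 10 would suggest a near-vertical arrow, even though the segment is drawn at 45 degrees. Stata also provides an `atan2(y, x)` function, so the same arithmetic should carry over to `generate` statements.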

The guide is split into two parts. Part I covers the fundamentals of angles in Stata, while Part II applies them to the actual COVID-19 data used in previous guides. …


In this Stata guide, learn how to create the following Stream graph using publicly available COVID-19 information on daily cases and deaths from the Our World in Data (OWID) database:

[Figure: stream graph of daily COVID-19 cases and deaths from OWID data]

This guide on stream graphs is a follow-up to the stacked-area graphs guide. It is recommended that you go through the earlier guide first, since the essential building blocks are the same and the earlier guide explains the steps carefully. …
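The link between the two graph types can be sketched in a few lines of Python. This is an illustration, not the guide's Stata code. It assumes the common "silhouette" layout. In that layout, a stream graph is a stacked-area graph whose baseline is shifted down by half the stack's total at each x value, so the stack is centered on zero. The function name is hypothetical.

```python
from itertools import accumulate

def stream_layers(values):
    """Lower/upper bounds of each layer at one x position for a stream graph.

    values: non-negative layer values (e.g. cases per group on one date).
    A stacked-area graph would use a baseline of 0; the 'silhouette'
    baseline used here shifts the stack down by half its total height.
    """
    baseline = -sum(values) / 2
    uppers = [baseline + c for c in accumulate(values)]  # running totals, shifted
    lowers = [baseline] + uppers[:-1]                    # each layer starts where the last ends
    return list(zip(lowers, uppers))

# Three hypothetical layers on one date: the stack runs from -5 to 5.
print(stream_layers([2, 4, 4]))  # [(-5.0, -3.0), (-3.0, 1.0), (1.0, 5.0)]
```

To build the full graph, repeat this for every date and draw each (lower, upper) pair as a range area. In Stata, one `twoway rarea` layer per group would be one way to draw them.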


In this guide, we will learn how to make the following bar graph in Stata, with custom color schemes, automated legends, and labels:

[Figure: bar graph with a custom color scheme, automated legend, and labels]

There are two challenges in automating bar graphs. First, information is lost when a dataset is collapsed or reshaped. For example, variable and value labels, which would typically be used to label graphs, are dropped. Yet reshaping the data is required, in one form or another, to make bars of different colors. Second, while the graph above looks straightforward to make, fixing individual bars or legends by hand quickly becomes cumbersome. Because bar graphs plot summary statistics, usually the sum or mean of different variables, legend labels need to be fixed manually. …


In the world of data visualization, color schemes are essential for making graphics stand out. An important part of learning how to use colors is understanding the color wheel and the color harmonies that help define color palettes, or schemes. In the first part of this guide, we briefly introduce these topics. In the second part, we learn how to define our own color scheme and generate Stata graphs that are fully personalized, customized, and automated:

[Figure: Stata graph with a custom color scheme]

Custom color schemes can be used in any type of graph. For example, in this guide, we will learn how to apply a custom palette to a line graph shown…


In this guide, we will learn how to make the following hex map in Stata using data from the recent 2020 United States presidential election:

[Figure: hex map of US states, 2020 presidential election data]

This guide is more limited in its application than other guides, since it uses the maptile package, a wrapper for the spmap package that I have discussed before here and here. Michael Stepner, the creator of maptile, has also uploaded the hex map of US states here, along with a host of other geographic boundaries. Since the number of available maps is limited, applications are also limited to the templates provided. In theory, one can convert and generate other hex maps for Stata as well, but this requires some effort. And since the program was last updated three years ago, customization options for hex plots are also limited, especially the ability to change the labels and outlines of hex tiles. …


In the world of data visualization, an enormous amount of thinking has gone into defining what makes a good infographic. At the end of the day, visualizations need to tell a story in an aesthetically pleasing way, and to achieve this, elements like colors and fonts need to fit together neatly.

In Stata graphs, fonts are often overlooked, even though the software provides the functionality to fully customize and use them.

This guide introduces the basics of font usage in Stata and shows, step by step, how to incorporate fonts in your graphs. …


In this guide, learn how to make ridgeline plots (also known as joy plots) in Stata using the publicly available COVID-19 dataset from Our World in Data. By the end of the guide, we will be able to make the graph below:

[Figure: ridgeline plot of COVID-19 data]

This guide builds on the concepts introduced in previous guides. Here we learn how to use Lowess curves to generate the graph above.

The guide goes step by step through the idea behind the construction of ridgeline plots. Advanced users can find the core code at the end of the guide.

Preamble

A basic knowledge of Stata is assumed. If you are using this guide for the first time, and are new to Stata, then Guide 1 and Guide 2 are highly recommended. …


In this guide we will learn how to make the following doubling time graph in Stata:

[Figure: COVID-19 doubling time graph]

Doubling time graphs became popular early in the COVID-19 pandemic through John Burn-Murdoch of the Financial Times (FT), an early mover in visualizing the spread of the virus. The figures were promoted daily on Twitter. The graph is no longer available on the FT COVID-19 webpage but is discussed here and here.

The graph was also not without controversy. There was significant debate around the use of log scales, which are not easy to interpret, and the use of absolute cases instead of per capita cases. The graph also shows the doubling time relative to the 10th or 100th case, which loses some of its value as the pandemic progresses. This differs from a rolling doubling time based on the last 10 or 14 days, which makes more sense later in the pandemic. …
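The difference between the two measures can be made concrete. Assume exponential growth. If cumulative cases rise from N₀ to Nₜ over t days, the implied doubling time is t · ln 2 / ln(Nₜ / N₀). The Python sketch below computes this in two ways: from a fixed threshold case, and over a rolling 14-day window. The function names are hypothetical and the case counts are made up.

```python
import math

def doubling_time(n_start, n_end, days):
    """Doubling time (days) implied by exponential growth from n_start to n_end."""
    if n_start <= 0 or days <= 0 or n_end <= n_start:
        return math.inf  # no growth (or unusable input): cases never double
    return days * math.log(2) / math.log(n_end / n_start)

def doubling_since_threshold(cases, threshold=100):
    """Doubling time measured from the first day cumulative cases reach `threshold`."""
    start = next((i for i, c in enumerate(cases) if c >= threshold), None)
    if start is None:
        raise ValueError("threshold never reached")
    return doubling_time(cases[start], cases[-1], len(cases) - 1 - start)

def rolling_doubling_time(cases, window=14):
    """Doubling time over the last `window` days only."""
    return doubling_time(cases[-1 - window], cases[-1], window)

# Made-up cumulative cases: doubling every 2 days for 10 days, then every 10 days.
cases = [100 * 2 ** (d / 2) for d in range(11)]
cases += [3200 * 2 ** (d / 10) for d in range(1, 15)]

print(round(doubling_since_threshold(cases, 100), 2))  # 3.75
print(round(rolling_doubling_time(cases, 14), 2))      # 10.0
```

The threshold-based figure still reports a doubling time of about 4 days, because it is dominated by the early fast growth. The rolling window instead reflects the current 10-day pace.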


Eurostat is the official statistical office of the European Union (EU) and provides data on EU member countries. The database is extensive, covering socio-economic, environmental, trade, mobility, regional, and a host of other indicators.

Navigating the database can be daunting. The same, or similar, variables exist across different datasets, several subsets of larger datasets exist, and some datasets contain variables derived from other datasets. Therefore, for research or any data-related project, it is not unusual to extract and collate information from different files into one large database for analysis. …

About

Asjad Naqvi

Here you will find information on Stata, COVID-19, and data visualizations.
