Monday, 19 March 2018

Quick and clever data sparsity / density tool

Hi all,

Just a quick post today to share a clever Python tool I’m using for data sparsity / density analysis.

  • Data sparsity : number or percentage of cells that are empty.
  • Data density : number or percentage of cells that contain information.
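
To make the two definitions concrete, here is a minimal sketch using pandas on a made-up table. The column names and values are purely illustrative, and I am assuming that empty cells are stored as NaN / None.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: NaN / None mark the empty cells.
df = pd.DataFrame({
    "city": ["Paris", "Lyon", None, "Nice"],
    "temperature": [12.5, np.nan, np.nan, 18.0],
    "humidity": [0.71, 0.64, 0.80, 0.77],
})

total_cells = df.size                        # rows x columns
empty_cells = int(df.isna().sum().sum())     # cells with no information
filled_cells = total_cells - empty_cells     # cells with information

sparsity = empty_cells / total_cells         # share of empty cells
density = filled_cells / total_cells         # share of filled cells

# Same idea per column: share of filled cells in each column.
density_per_column = df.notna().mean()

print(f"sparsity: {sparsity:.0%}, density: {density:.0%}")
print(density_per_column)
```

Sparsity and density always add up to 100%, so either one is enough to describe a table or a column.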


It’s quite common to find tools or libraries that analyse data and deliver indicators. What I wanted was a data visualization tool that displays a meaningful picture of data density / sparsity.

Here comes “missingno”, developed by Aleksey Bilogur, a really talented data analyst from NYC, and available on GitHub.

No more bla-bla, here is what you can get with simple Python code within your Jupyter notebook.


You can clearly see the amount of data available for each column. Note the nice sparkline on the right, showing “missing data bursts”.

Different plots are available. Have a look at this “heatmap” showing nullity correlation: how strongly the presence or absence of one variable correlates with the presence or absence of another.
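
To give an idea of what nullity correlation means in numbers, here is a rough pandas-only sketch. It reflects my understanding of the idea, not necessarily missingno's exact implementation, and the data is made up: each column is turned into a 0/1 "is missing" indicator, and those indicators are then correlated with each other.

```python
import numpy as np
import pandas as pd

# Hypothetical data, built so that the relationships are obvious.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],
    "b": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],        # missing exactly where "a" is
    "c": [np.nan, 2.0, np.nan, 4.0, np.nan, np.nan],  # filled exactly where "a" is missing
    "d": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],              # always filled
})

# 1 where a cell is missing, 0 where it is filled.
nullity = df.isna().astype(int)

# Columns that are always filled (or always empty) never vary,
# so their correlation is undefined: leave them out.
nullity = nullity.loc[:, nullity.nunique() > 1]

# Pearson correlation between the missing-value indicators.
nullity_corr = nullity.corr()
print(nullity_corr)
```

A value close to 1 means two columns tend to be missing together, a value close to -1 means one is filled when the other is missing, and a value near 0 means no visible link.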


Bar charts, geoplots and dendrograms are also available.

Definitely a must-have tool for all Python and data enthusiasts.
