2D-KDE

What’s this?

A macro that draws a two-dimensional kernel density estimation (2D-KDE) plot using SAS GRAPH.

The 2D-KDE is shown as a contour plot with marginal KDE curves.

Input data

key   variable   type
1     group      numeric or character
2     x          numeric
      y          numeric

Applying a format to the group variable is recommended.
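
A minimal sketch of attaching a format to the group variable before calling the macro. The dataset work.mydata, the variable grp, the format name grpf, and the labels are hypothetical placeholders and are not part of SAS Plotter. The benefit is presumably that the legend shows the formatted values instead of raw codes.

```sas
/* Hypothetical format: numeric group codes shown as text labels */
proc format;
   value grpf
      1 = 'Treatment A'
      2 = 'Treatment B'
      3 = 'Placebo';
run;

/* Hypothetical input data with a group variable and two numeric variables */
data work.mydata;
   input grp x y;
   format grp grpf.;         /* attach the format to the group variable */
   label grp = 'Treatment';  /* variable label */
   datalines;
1 5.1 3.5
1 4.9 3.0
2 6.2 2.9
2 5.9 3.2
3 6.7 3.1
3 6.3 2.5
;
run;
```

According to the legendtitle parameter described on this page, the label of the group variable is used as the legend title by default, so setting a label here also gives the legend a readable title.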

Syntax

ods graphics / < graphics option > ;
ods listing gpath=< output path >;

%macro kde2d(
         data=,
         group=None,
         x=,
         y=,
         xlabel=x,
         ylabel=y,
         univar_grid=401,
         bw_method=sjpi,
         bw_adjust=1,
         univar_style=line,
         bivar_grid=60,
         bivar_nlevel=10,
         bivar_style=line,
         thresh=0,
         legend=true,
         legendtitle=#,
         scatter=false,
         rug=false,
         palette=sns,
         note=,
         deletedata=true
);

Parameters

  • data : dataset name (required)

    Input dataset. The keep, rename, and where dataset options are available.

  • group : variable name (optional)

    Group variable. The default is None (no grouping). If the group variable is not set, the legend parameter is ignored.

  • x : variable name (required)

    Numeric variable for the x-axis.

  • y : variable name (required)

    Numeric variable for the y-axis.

  • xlabel : string (optional)

    Label string for the x-axis. The default is “x”. To hide the label, set the parameter to an empty value as below.

    xlabel=,

  • ylabel : string (optional)

    Label string for the y-axis. The default is “y”. To hide the label, set the parameter to an empty value as below.

    ylabel=,

  • univar_grid : integer (optional)

    The number of grid points for the univariate KDE. The default is 401 (the default of proc kde).

  • bw_method : keyword (optional)

    The bandwidth estimation method for the univariate KDE. The default is “sjpi” (the default of proc kde).

    The following method keywords are available.

    • sjpi (Sheather-Jones plug-in)

    • snr (simple normal reference)

    • snrq (simple normal reference that uses the interquartile range)

    • srot (Silverman’s rule of thumb)

    • os (oversmoothed)

  • bw_adjust : numeric (optional)

    The bandwidth multiplier for the univariate KDE. Increasing it makes the curve smoother. The default is 1.

  • univar_style : keyword (optional)

    The style of the univariate KDE. The following style keywords are available. The default is “line”.

    • line

    • fill

  • bivar_grid : integer (optional)

    The number of grid points for the bivariate KDE. The default is 60 (the default of proc kde).

  • bivar_nlevel : integer (optional)

    The number of contour levels for the bivariate KDE. The default is 10.

  • bivar_style : keyword (optional)

    The style of the bivariate KDE. The following style keywords are available. The default is “line”.

    • line

    • fill

    • linefill

  • thresh : numeric (0 to 1, optional)

    The density threshold for the bivariate KDE. Grid points whose density is below the threshold are removed from the bivariate KDE plot. The default is 0.

  • scatter : bool (optional)

    If True, the scatter plot is displayed. The default is False.

  • rug : bool (optional)

    If True, the rug plot is displayed on the univariate KDE plot. The default is False.

  • legend : bool (optional)

    If True, the legend of the group items is displayed. The default is True.

  • legendtitle : text (optional)

    The title of the legend. The default is the label of the group variable.

  • palette : keyword (optional)

    Color palette for fills, lines, and markers. The following palettes are available; see the color palette section of the introduction page. The default is “SNS” (Seaborn default palette).

    • SAS

    • SNS (Seaborn)

    • STATA

    • TABLEAU

  • note : statement (optional)

    Text entry statements to insert into the graph template, which display a title or footnote in the output image. The default is “” (nothing is displayed).

  • deletedata : bool (optional)

    If True, the temporary datasets and catalogs generated by the macro are deleted at the end of execution. The default is True.
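
Several defaults above are described as the defaults of proc kde, which suggests that the univariate and bivariate parameters correspond to PROC KDE options. The sketch below calls PROC KDE directly with comparable settings. The mapping of macro parameters to these options is an assumption and is not taken from the macro source; sashelp.iris and the output dataset names are used only for illustration.

```sas
proc kde data=sashelp.iris;
   /* univariate KDE: compare with bw_method, bw_adjust, and univar_grid */
   univar sepallength / method=srot bwm=1.5 ngrid=401 out=work.uni_x;
   /* bivariate KDE: compare with bivar_grid */
   bivar sepallength sepalwidth / ngrid=60 out=work.bi_xy;
run;
```

Here method= selects the bandwidth estimation method, bwm= multiplies the estimated bandwidth, and ngrid= sets the number of grid points per axis.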

example

The example output can be reproduced by running the following code after loading SAS Plotter.

code

ods listing gpath="your output path";
filename exam url "https://github.com/Superman-jp/SAS_Plotter/raw/main/example/ridgeline_example.sas" encoding='UTF-8';
%include exam;

grouped KDE plot

If the group parameter is set, the univariate and bivariate KDE plots are generated for each group. The group colors depend on the color palette set by the palette parameter.

raw data

filename raw url "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv";

PROC IMPORT OUT= WORK.penguines
            DATAFILE= raw
            DBMS=CSV REPLACE;
   GETNAMES=YES;
   DATAROW=2;
   guessingrows=max;
RUN;
ods listing gpath=<your path>;
ods graphics /  height=24cm width=24cm imagename="groupedKDE" imagefmt=svg;

title "grouped KDE plot";
ods graphics /  height=24cm width=24cm imagename="groupedKDE" imagefmt=png;
%kde2d(
   data=penguines,
   x=bill_length_mm,
   y=bill_depth_mm,
   group=species,
   univar_style=line,
   bivar_nlevel=10,
   bivar_style=line,
   xlabel=bill_length (mm),
   ylabel=bill_depth (mm),
   legend=true,
   note=%nrstr(entrytitle 'your title here';
               entryfootnote halign=left 'your footnote here';
            entryfootnote halign=left 'your footnote here 2';)

);
[Figure: grouped KDE plot (_images/groupedKDE1.svg)]
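
When no grouping is needed, the group parameter can be left at its default. The following is a minimal sketch, assuming the penguines dataset imported above and that SAS Plotter is already loaded. It uses only parameters documented on this page; the bw_adjust value and the image name ungroupedKDE are arbitrary choices for illustration.

```sas
title "ungrouped KDE plot";
ods graphics / height=24cm width=24cm imagename="ungroupedKDE" imagefmt=png;

/* group is omitted, so its default (None) applies and legend is ignored. */
/* Empty xlabel= and ylabel= hide the axis labels.                        */
/* bw_adjust=1.5 gives smoother marginal curves than the default of 1.    */
%kde2d(
   data=penguines,
   x=bill_length_mm,
   y=bill_depth_mm,
   xlabel=,
   ylabel=,
   bw_adjust=1.5
);
```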

Style of KDE plot

The univariate KDE plot supports two styles, LINE and FILL. The style is set by the univar_style parameter.

The bivariate KDE plot supports three styles, LINE, FILL, and LINEFILL. The style is set by the bivar_style parameter.

Because the transparency of the fill cannot be adjusted in the FILL and LINEFILL styles, the LINE style is recommended when grouped KDE plots overlap.

code

title "KDE plot with linefill style";
ods graphics /  height=24cm width=24cm imagename="linefillKDE" imagefmt=png;

%kde2d(
   data=penguines(where=(species='Chinstrap')),
   x=bill_length_mm,
   y=bill_depth_mm,
   group=species,
   univar_style=fill,
   bivar_nlevel=10,
   bivar_style=linefill,
   bivar_grid=100,
   xlabel=bill_length (mm),
   ylabel=bill_depth (mm),
   legend=true,
   legendtitle=
);
[Figure: KDE plot with linefill style (_images/linefillKDE1.svg)]

Display individual data

Individual data points can also be visualized. This macro supports two visualization methods, the scatter plot and the rug plot.

A scatter plot displays the data points for two numeric variables; the points are overlaid as dots on the bivariate KDE plot. A rug plot displays the data points for one numeric variable; the points are overlaid as short bars (like a barcode) at the bottom of the univariate KDE plot.

Because the transparency of the fill cannot be adjusted in the FILL and LINEFILL styles, the LINE style is recommended when a scatter plot or rug plot is displayed.

code

title "KDE plot with individual data";
ods graphics /  height=24cm width=24cm imagename="rugscatterKDE" imagefmt=png;

%kde2d(
   data=penguines(where=(species='Chinstrap')),
   x=bill_length_mm,
   y=bill_depth_mm,
   group=species,
   univar_style=line,
   bivar_nlevel=10,
   bivar_style=line,
   bivar_grid=100,
   xlabel=bill_length (mm),
   ylabel=bill_depth (mm),
   legend=true,
   rug=true,
   scatter=true
);
[Figure: KDE plot with individual data (_images/rugscatterKDE1.svg)]

Thresh parameter

Grid points whose density is below the value of the thresh parameter are removed from the bivariate KDE plot. If the style is “FILL” or “LINEFILL”, areas where the density is below the threshold are not filled. The contour levels are computed from the remaining densities, that is, those not below the threshold.

code

title "KDE plot using thresh parameter";
ods graphics /  height=24cm width=24cm imagename="threshKDE" imagefmt=png;

%kde2d(
      data=penguines(where=(species='Chinstrap')),
      x=bill_length_mm,
      y=bill_depth_mm,
      group=species,
      univar_style=line,
      bivar_nlevel=10,
      bivar_style=linefill,
      bivar_grid=100,
      xlabel=bill_length (mm),
      ylabel=bill_depth (mm),
      thresh=0.002,
      legend=true,
      rug=true,
      scatter=true
);
[Figure: KDE plot using thresh parameter (_images/threshKDE1.svg)]
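
The idea of removing low-density grid points can be sketched outside the macro as follows. This is an illustration only and is not the macro's implementation: it assumes the threshold is compared with the raw density values written by PROC KDE, which this page does not state. sashelp.iris, the dataset names, and the macro variable thresh are hypothetical choices.

```sas
/* Bivariate KDE on a 100 x 100 grid; OUT= holds one row per grid point */
proc kde data=sashelp.iris;
   bivar sepallength sepalwidth / ngrid=100 out=work.bi_xy;
run;

/* Hypothetical threshold, same value as in the example above */
%let thresh = 0.002;

data work.bi_thresh;
   set work.bi_xy;
   /* keep only grid points whose estimated density is not below the threshold */
   if density >= &thresh;
run;
```

Contour levels computed from work.bi_thresh would then be based only on the densities that survive the filter.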