Custom table summarizing outcomes

1. Exploratory analysis - Custom tables, summary statistics

In this vignette we will discuss about how to customize the summary statistics using ExpCustomStat function from SmartEDA. The output of this function returns matrix object containing descriptive information on all input variables for each level/combination of levels in categorical/group variable. Also, while running the analysis we can filter row/cases of the data. We can apply the filters at individual variable level or complete data like base subsetting.

Function definition:

ExpCustomStat(data,Cvar=NULL,Nvar=NULL,stat=NULL,gpby=TRUE,filt=NULL,dcast=FALSE)

Key functionalities of ExpCustomStat are:

  1. Categorical data descriptive statistics (Frequencies, Proportions)
  2. Numerical data descriptive statistics (Mean, Median, Sum, Variance etc..)
  3. Comparison of numerical data based on categorical data
  4. Filter rows/cases where conditions are true. Options to apply filters at variable level or complete data set like base subsetting
  5. Options to calculate basic statistics like Mean, Median, Std.Dev, Variance, Count, Proportions, Quantiles, IQR, Percentages of Shares (PS) for numerical data

1.1 Usage of ExpCustomStat function

Will open the carseats data from ISLR package and drive different types of use cases using ExpCustomStat function.

In this vignette, we will be using a simulated data set containing sales of child car seats at 400 different stores.

Data Source ISLR package.

Function source SmartEDA package

Carseats data from ISLR package:

options(width = 150)
CData = ISLR::Carseats
head(CData,5)
##   Sales CompPrice Income Advertising Population Price ShelveLoc Age Education Urban  US
## 1  9.50       138     73          11        276   120       Bad  42        17   Yes Yes
## 2 11.22       111     48          16        260    83      Good  65        10   Yes Yes
## 3 10.06       113     35          10        269    80    Medium  59        12   Yes Yes
## 4  7.40       117    100           4        466    97    Medium  55        14   Yes Yes
## 5  4.15       141     64           3        340   128       Bad  38        13   Yes  No

2. Categorical summaries

Categorical summaries to describe the distribution for a qualitative variables.

  • Count (Frequency) the number of subjects in each group
  • The proportions in each group (Relative frequency)
  • Tabulation or custom tables for more than two categorical variables

2.1. Frequency table

The number of observations for particular category

  • Base: All carseats data
  • Table: Calculating the count and proportions of "US", "Urban" and "ShelveLoc" variable
ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc","Education"),gpby=FALSE)
##      Level  Group_by Count  Prop
##     <fctr>    <char> <int> <num>
##  1:    Yes        US   258 64.50
##  2:     No        US   142 35.50
##  3:    Yes     Urban   282 70.50
##  4:     No     Urban   118 29.50
##  5:    Bad ShelveLoc    96 24.00
##  6:   Good ShelveLoc    85 21.25
##  7: Medium ShelveLoc   219 54.75
##  8:     17 Education    49 12.25
##  9:     10 Education    48 12.00
## 10:     12 Education    49 12.25
## 11:     14 Education    40 10.00
## 12:     13 Education    43 10.75
## 13:     16 Education    47 11.75
## 14:     15 Education    36  9.00
## 15:     18 Education    40 10.00
## 16:     11 Education    48 12.00

OR we can use similar analysis using ExpCTable function from same package, this functions includes cumulative percentages and Total

ExpCTable(Carseats,Target=NULL,clim=5,nlim=15,round=2,bin=NULL,per=F)
##     Variable  Valid Frequency Percent CumPercent
## 1  ShelveLoc    Bad        96   24.00      24.00
## 2  ShelveLoc   Good        85   21.25      45.25
## 3  ShelveLoc Medium       219   54.75     100.00
## 4  ShelveLoc  TOTAL       400      NA         NA
## 5      Urban     No       118   29.50      29.50
## 6      Urban    Yes       282   70.50     100.00
## 7      Urban  TOTAL       400      NA         NA
## 8         US     No       142   35.50      35.50
## 9         US    Yes       258   64.50     100.00
## 10        US  TOTAL       400      NA         NA
## 11 Education     10        48   12.00      12.00
## 12 Education     11        48   12.00      24.00
## 13 Education     12        49   12.25      36.25
## 14 Education     13        43   10.75      47.00
## 15 Education     14        40   10.00      57.00
## 16 Education     15        36    9.00      66.00
## 17 Education     16        47   11.75      77.75
## 18 Education     17        49   12.25      90.00
## 19 Education     18        40   10.00     100.00
## 20 Education  TOTAL       400      NA         NA
  • Base: All carseats data
  • Table: Calculating the count and proportions of "US", "Urban" and "ShelveLoc" variable
ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc"),gpby=FALSE)
##     Level  Group_by Count  Prop
##    <fctr>    <char> <int> <num>
## 1:    Yes        US   258 64.50
## 2:     No        US   142 35.50
## 3:    Yes     Urban   282 70.50
## 4:     No     Urban   118 29.50
## 5:    Bad ShelveLoc    96 24.00
## 6:   Good ShelveLoc    85 21.25
## 7: Medium ShelveLoc   219 54.75

2.2. Crosstabulation (more than one categorical variable)

To produce cross tables which calculate counts and proportions for each combination of categorical variables we can use ExpCustomStat

NOTE: For crosstabulation change input gpby=TRUE

  • Base: All carseats data
  • Table: Calculating the count and proportions of "US" Vs "Urban"
ExpCustomStat(Carseats,Cvar=c("US","Urban"),gpby=TRUE,filt=NULL)
##        US  Urban Count  Prop
##    <fctr> <fctr> <int> <num>
## 1:    Yes    Yes   186  46.5
## 2:     No    Yes    96  24.0
## 3:    Yes     No    72  18.0
## 4:     No     No    46  11.5

We can also produce multidimensional tables based on three or more categorical variables

  • Base: All carseats data
  • Table: Calculating the count and proportions of "US" Vs "Urban" Vs "ShelveLoc"
ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc"),gpby=TRUE,filt=NULL)
##         US  Urban ShelveLoc Count  Prop
##     <fctr> <fctr>    <fctr> <int> <num>
##  1:    Yes    Yes       Bad    51 12.75
##  2:    Yes    Yes      Good    39  9.75
##  3:    Yes    Yes    Medium    96 24.00
##  4:     No    Yes       Bad    23  5.75
##  5:    Yes     No       Bad    11  2.75
##  6:     No    Yes    Medium    55 13.75
##  7:     No     No    Medium    29  7.25
##  8:    Yes     No    Medium    39  9.75
##  9:     No    Yes      Good    18  4.50
## 10:    Yes     No      Good    22  5.50
## 11:     No     No      Good     6  1.50
## 12:     No     No       Bad    11  2.75

2.3. Adding filters to tables

If we want to understand the number of stores in US and location is Urban for Population size greater than 150

  • Base: Those stores having their average age of the local Population is > 150
  • Table: Calculating the count and proportions of "US" Vs "Urban"
ExpCustomStat(Carseats,Cvar=c("US","Urban"),gpby=TRUE,filt="Population>150")
##        US  Urban         Filter Count  Prop
##    <fctr> <fctr>         <char> <int> <num>
## 1:    Yes    Yes Population>150   138 48.25
## 2:     No    Yes Population>150    63 22.03
## 3:    Yes     No Population>150    53 18.53
## 4:     No     No Population>150    32 11.19
  • Base: Those stores having their average age of the local Population > 150 and Urban==“Yes”
  • Table: Calculating the count and proportions of "US" Vs "ShelveLoc"
ExpCustomStat(Carseats,Cvar=c("US","ShelveLoc"),gpby=TRUE,filt="Urban=='Yes' & Population>150")
##        US ShelveLoc                        Filter Count  Prop
##    <fctr>    <fctr>                        <char> <int> <num>
## 1:    Yes       Bad Urban=='Yes' & Population>150    40 19.90
## 2:    Yes      Good Urban=='Yes' & Population>150    31 15.42
## 3:    Yes    Medium Urban=='Yes' & Population>150    67 33.33
## 4:     No       Bad Urban=='Yes' & Population>150    13  6.47
## 5:     No    Medium Urban=='Yes' & Population>150    37 18.41
## 6:     No      Good Urban=='Yes' & Population>150    13  6.47

3. Numerical summaries

Numerical summaries to describe the distribution for quantitative variables.

  • Measure of location (Mean, median)
  • Measure of variation (Variance, Std.Dev, Range)
  • Measure of shapes (Skewness and Kurtosis)
  • Other summary statistics

3.1. Numerical variable summary

  • Base: All carseats data
  • Table: Descriptive summary for Price, Population, Sales, CompPrice, Income
options(width = 150)
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var','sd','min','max','IQR'))
##     Attribute Count       mean       sum          var         sd   min    max    IQR
##        <char> <int>      <num>     <num>        <num>      <num> <num>  <num>  <num>
## 1: Population   400 264.840000 105936.00 21719.813935 147.376436    10 509.00 259.50
## 2:      Sales   400   7.496325   2998.53     7.975626   2.824115     0  16.27   3.93
## 3:  CompPrice   400 124.975000  49990.00   235.147243  15.334512    77 175.00  20.00
## 4:     Income   400  68.657500  27463.00   783.218239  27.986037    21 120.00  48.25
  • Base: All carseats data
  • Table: Tukey’s five number summary (minimum, lower-hinge, median, upper-hinge, maximum) for Price, Population, Sales, CompPrice, Income
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('min','p0.25','median','p0.75','max'))
##     Attribute   min  p0.25 median  p0.75    max
##        <char> <num>  <num>  <num>  <num>  <num>
## 1: Population    10 139.00 272.00 398.50 509.00
## 2:      Sales     0   5.39   7.49   9.32  16.27
## 3:  CompPrice    77 115.00 125.00 135.00 175.00
## 4:     Income    21  42.75  69.00  91.00 120.00

3.2. Adding filters to complete data (like base subset)

Filter rows/cases of complete dataset where conditions are true

  • Base: Whether the store is in urban (Urban ==Yes)
  • Table: Descriptive summary for Price, Population, Sales, CompPrice, Income.
options(width = 150)
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var','min','median','max'),filt="Urban=='Yes'")
##     Attribute       Filter Count       mean      sum          var   min median    max
##        <char>       <char> <int>      <num>    <num>        <num> <num>  <num>  <num>
## 1: Population Urban=='Yes'   282 259.886525 73288.00 20913.488857 12.00 274.00 508.00
## 2:      Sales Urban=='Yes'   282   7.468191  2106.03     8.044141  0.37   7.42  16.27
## 3:  CompPrice Urban=='Yes'   282 125.634752 35429.00   246.545834 77.00 125.00 175.00
## 4:     Income Urban=='Yes'   282  69.343972 19555.00   743.706885 21.00  70.00 120.00
  • Base: Those store having their average age of the local population > 150 and Urban==“Yes”
  • Table: Descriptive summary for Price, Population, Sales, CompPrice, Income.
options(width=150)
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','median','IQR'),filt="Urban=='Yes' & Population>150")
##     Attribute                        Filter Count       mean      sum median    IQR
##        <char>                        <char> <int>      <num>    <num>  <num>  <num>
## 1: Population Urban=='Yes' & Population>150   201 332.641791 66861.00 331.00 165.00
## 2:      Sales Urban=='Yes' & Population>150   201   7.573383  1522.25   7.49   3.74
## 3:  CompPrice Urban=='Yes' & Population>150   201 125.134328 25152.00 125.00  19.00
## 4:     Income Urban=='Yes' & Population>150   201  68.223881 13713.00  69.00  47.00

3.3. Filter out unique value from all the numeric variables

This will be useful when we need to exclude redundant values like ‘999’ or ‘9999’ or ‘-9’ or ‘-1111’, or ‘888’ etc from each selected variable.

Eg:dat = data.frame(x = c(23,24,34,999,12,12,23,999,45), y = c(1,3,4,999,0,999,0,8,999,0)

Exclude 999:

x = c(23,24,34,12,12,23,45) y = c(1,3,4,0,0,8,0)

data_sam = Carseats[,]
data_sam[sample(1:400,30),"Sales"] <- 999
data_sam[sample(1:400,20),"CompPrice"] <- -9
data_sam[sample(1:400,45),"Income"] <- 999
ExpCustomStat(data_sam,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','min'),filt="All %ni% c(999,-9)")
##     Attribute                   Filter Count       mean       sum   min
##        <char>                   <char> <int>      <num>     <num> <num>
## 1: Population Population%ni% c(999,-9)   400 264.840000 105936.00    10
## 2:      Sales      Sales%ni% c(999,-9)   370   7.499865   2774.95     0
## 3:  CompPrice  CompPrice%ni% c(999,-9)   380 124.771053  47413.00    77
## 4:     Income     Income%ni% c(999,-9)   355  68.932394  24471.00    21

3.4. Adding filters at variable level

Different filters for each numeric variable. For example, below are the conditions (logic) for each variable summary analysis.

"Population" - Consider only Good ShelveLoc (the quality of the shelving location for the car seats at each site) ShelveLoc=='Good'

"Sales" - Inculde only those store belongs to Urban location (Urban==Yes)

"CompPrice" - Exclude Price is greater than 150

"Education" - All stores

"Income" - Inculde only stores in US (US==Yes)

Table: Descriptive summary for Price, Population, Sales, CompPrice, Income based on the filters.

options(width = 150)
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Education","Income"),stat = c('Count','mean','sum','var','sd','IQR','median'),filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150^All^US=='Yes'"))
##     Attribute            Filter Count       mean      sum          var         sd      IQR median
##        <char>            <char> <int>      <num>    <num>        <num>      <num>    <num>  <num>
## 1: Population ShelveLoc=='Good'    85 267.047059 22699.00 16193.473950 127.253581 177.0000 272.00
## 2:      Sales      Urban=='Yes'   282   7.468191  2106.03     8.044141   2.836219   3.9175   7.42
## 3:  CompPrice        Price>=150    32 141.875000  4540.00   181.596774  13.475785  18.2500 142.50
## 4:  Education               All   400  13.900000  5560.00     6.867168   2.620528   4.0000  14.00
## 5:     Income         US=='Yes'   258  70.515504 18193.00   782.849953  27.979456  48.0000  70.00

4. Numerical summaries by category

Descriptive summary for numerical variable by group level.

4.1. Variable summary report (One group variable)

  • Base: All carseats data
  • Table: Descriptive summary for Price, Population, Sales by group (Urban, ShelveLoc)
options(width = 150)
ExpCustomStat(Carseats,Cvar = c("Urban","ShelveLoc"), Nvar=c("Population","Sales"), stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max'),gpby=FALSE)
##      Level  Attribute  Group_by Count  Prop       mean   min    P0.25 median    p0.75    max
##     <fctr>     <char>    <char> <int> <num>      <num> <num>    <num>  <num>    <num>  <num>
##  1:    Yes Population     Urban   282 70.50 259.886525 12.00 139.0000 274.00 376.7500 508.00
##  2:     No Population     Urban   118 29.50 276.677966 10.00 144.0000 271.00 408.0000 509.00
##  3:    Yes      Sales     Urban   282 70.50   7.468191  0.37   5.3750   7.42   9.2925  16.27
##  4:     No      Sales     Urban   118 29.50   7.563559  0.00   5.4400   7.67   9.3350  14.90
##  5:    Bad Population ShelveLoc    96 24.00 275.291667 10.00 145.5000 296.00 400.5000 501.00
##  6:   Good Population ShelveLoc    85 21.25 267.047059 14.00 176.0000 272.00 353.0000 503.00
##  7: Medium Population ShelveLoc   219 54.75 259.401826 12.00 124.0000 261.00 405.0000 509.00
##  8:    Bad      Sales ShelveLoc    96 24.00   5.522917  0.37   4.0525   5.21   7.4625  11.67
##  9:   Good      Sales ShelveLoc    85 21.25  10.214000  3.58   8.3300  10.50  11.9600  16.27
## 10: Medium      Sales ShelveLoc   219 54.75   7.306575  0.00   5.6250   7.38   8.7750  13.36

4.2. Variable summary report (More than One group variable)

  • Base: All carseats data
  • Table: Descriptive summary for CompPrice, Income by group (Urban){PS: Percentages of shares}
options(width = 150)
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','min','max','IQR','sd'), gpby = TRUE)
##      Urban     US ShelveLoc Attribute Count  Prop      mean   sum    PS   min   max   IQR        sd
##     <fctr> <fctr>    <fctr>    <char> <int> <num>     <num> <num> <num> <num> <num> <num>     <num>
##  1:    Yes    Yes       Bad CompPrice    51 12.75 124.88235  6369 12.74    93   161 17.00 14.768408
##  2:    Yes    Yes      Good CompPrice    39  9.75 127.00000  4953  9.91    89   156 22.00 15.998355
##  3:    Yes    Yes    Medium CompPrice    96 24.00 125.05208 12005 24.01    85   175 20.00 15.410434
##  4:    Yes     No       Bad CompPrice    23  5.75 125.52174  2887  5.78    86   157 21.50 17.367340
##  5:     No    Yes       Bad CompPrice    11  2.75 122.45455  1347  2.69    88   149  8.50 17.494934
##  6:    Yes     No    Medium CompPrice    55 13.75 126.00000  6930 13.86    77   159 20.00 16.611018
##  7:     No     No    Medium CompPrice    29  7.25 121.86207  3534  7.07    89   154 21.00 16.012772
##  8:     No    Yes    Medium CompPrice    39  9.75 126.33333  4927  9.86    96   150 18.50 14.411862
##  9:    Yes     No      Good CompPrice    18  4.50 126.94444  2285  4.57    95   157 21.00 16.049454
## 10:     No    Yes      Good CompPrice    22  5.50 122.36364  2692  5.39   107   147 16.25 11.986284
## 11:     No     No      Good CompPrice     6  1.50 126.50000   759  1.52    96   142 13.75 16.610238
## 12:     No     No       Bad CompPrice    11  2.75 118.36364  1302  2.60   106   131 15.00  9.080449
## 13:    Yes    Yes       Bad    Income    51 12.75  76.31373  3892 14.17    28   119 40.50 26.657449
## 14:    Yes    Yes      Good    Income    39  9.75  66.89744  2609  9.50    21   117 49.00 28.163802
## 15:    Yes    Yes    Medium    Income    96 24.00  68.83333  6608 24.06    25   120 41.25 27.070538
## 16:    Yes     No       Bad    Income    23  5.75  67.52174  1553  5.65    25   114 34.00 25.414508
## 17:     No    Yes       Bad    Income    11  2.75  72.72727   800  2.91    21   120 52.50 33.400871
## 18:    Yes     No    Medium    Income    55 13.75  66.14545  3638 13.25    22   118 42.50 26.857248
## 19:     No     No    Medium    Income    29  7.25  62.44828  1811  6.59    22   120 60.00 32.077569
## 20:     No    Yes    Medium    Income    39  9.75  69.05128  2693  9.81    21   119 55.00 30.884291
## 21:    Yes     No      Good    Income    18  4.50  69.72222  1255  4.57    24   113 62.25 31.765326
## 22:     No    Yes      Good    Income    22  5.50  72.31818  1591  5.79    22   115 35.00 27.425026
## 23:     No     No      Good    Income     6  1.50  53.83333   323  1.18    30    78 40.25 22.542552
## 24:     No     No       Bad    Income    11  2.75  62.72727   690  2.51    25   105 17.00 23.524842
##      Urban     US ShelveLoc Attribute Count  Prop      mean   sum    PS   min   max   IQR        sd

4.3. Variable summary report (More than One group variable) with filter

  • Base: Stores in Urban location only
  • Table: Descriptive summary for CompPrice, Income by group (Urban){PS: Percentages of shares}
options(width = 150)
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','median','IQR'), gpby = TRUE,filt="Urban=='Yes'")
##      Urban     US ShelveLoc Attribute       Filter Count  Prop      mean   sum    PS median   IQR
##     <fctr> <fctr>    <fctr>    <char>       <char> <int> <num>     <num> <num> <num>  <num> <num>
##  1:    Yes    Yes       Bad CompPrice Urban=='Yes'    51 18.09 124.88235  6369 17.98  125.0 17.00
##  2:    Yes    Yes      Good CompPrice Urban=='Yes'    39 13.83 127.00000  4953 13.98  128.0 22.00
##  3:    Yes    Yes    Medium CompPrice Urban=='Yes'    96 34.04 125.05208 12005 33.88  125.0 20.00
##  4:    Yes     No       Bad CompPrice Urban=='Yes'    23  8.16 125.52174  2887  8.15  125.0 21.50
##  5:    Yes     No    Medium CompPrice Urban=='Yes'    55 19.50 126.00000  6930 19.56  127.0 20.00
##  6:    Yes     No      Good CompPrice Urban=='Yes'    18  6.38 126.94444  2285  6.45  122.0 21.00
##  7:    Yes    Yes       Bad    Income Urban=='Yes'    51 18.09  76.31373  3892 19.90   81.0 40.50
##  8:    Yes    Yes      Good    Income Urban=='Yes'    39 13.83  66.89744  2609 13.34   69.0 49.00
##  9:    Yes    Yes    Medium    Income Urban=='Yes'    96 34.04  68.83333  6608 33.79   69.0 41.25
## 10:    Yes     No       Bad    Income Urban=='Yes'    23  8.16  67.52174  1553  7.94   67.0 34.00
## 11:    Yes     No    Medium    Income Urban=='Yes'    55 19.50  66.14545  3638 18.60   69.0 42.50
## 12:    Yes     No      Good    Income Urban=='Yes'    18  6.38  69.72222  1255  6.42   73.5 62.25
  • Base: Removing redundant values
  • Table: Descriptive summary for CompPrice, Income by group (Urban){PS: Percentages of shares}
options(width = 150)
data_sam = Carseats[,]
data_sam[sample(1:400,30),"Sales"] <- 888
data_sam[sample(1:400,20),"CompPrice"] <- 999
data_sam[sample(1:400,45),"Income"] <- 999
ExpCustomStat(data_sam,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("Sales","CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS'), gpby = TRUE,filt="All %ni% c(888,999)")
##      Urban     US ShelveLoc Attribute                   Filter Count  Prop       mean      sum    PS
##     <fctr> <fctr>    <fctr>    <char>                   <char> <int> <num>      <num>    <num> <num>
##  1:    Yes    Yes       Bad     Sales     Sales%ni% c(888,999)    49 13.24   5.426735   265.91  9.57
##  2:    Yes    Yes      Good     Sales     Sales%ni% c(888,999)    35  9.46  11.139714   389.89 14.03
##  3:    Yes    Yes    Medium     Sales     Sales%ni% c(888,999)    92 24.86   7.554891   695.05 25.01
##  4:    Yes     No       Bad     Sales     Sales%ni% c(888,999)    22  5.95   5.401818   118.84  4.28
##  5:     No    Yes       Bad     Sales     Sales%ni% c(888,999)    10  2.70   6.043000    60.43  2.17
##  6:     No     No    Medium     Sales     Sales%ni% c(888,999)    25  6.76   6.525600   163.14  5.87
##  7:     No    Yes    Medium     Sales     Sales%ni% c(888,999)    38 10.27   7.723421   293.49 10.56
##  8:    Yes     No    Medium     Sales     Sales%ni% c(888,999)    51 13.78   7.024706   358.26 12.89
##  9:    Yes     No      Good     Sales     Sales%ni% c(888,999)    16  4.32   9.654375   154.47  5.56
## 10:     No    Yes      Good     Sales     Sales%ni% c(888,999)    18  4.86  10.288889   185.20  6.66
## 11:     No     No      Good     Sales     Sales%ni% c(888,999)     5  1.35   8.984000    44.92  1.62
## 12:     No     No       Bad     Sales     Sales%ni% c(888,999)     9  2.43   5.455556    49.10  1.77
## 13:    Yes    Yes       Bad CompPrice CompPrice%ni% c(888,999)    50 13.16 125.480000  6274.00 13.21
## 14:    Yes    Yes      Good CompPrice CompPrice%ni% c(888,999)    36  9.47 126.416667  4551.00  9.58
## 15:    Yes    Yes    Medium CompPrice CompPrice%ni% c(888,999)    91 23.95 124.945055 11370.00 23.94
## 16:    Yes     No       Bad CompPrice CompPrice%ni% c(888,999)    22  5.79 125.863636  2769.00  5.83
## 17:     No    Yes       Bad CompPrice CompPrice%ni% c(888,999)    11  2.89 122.454545  1347.00  2.84
## 18:    Yes     No    Medium CompPrice CompPrice%ni% c(888,999)    52 13.68 125.826923  6543.00 13.78
## 19:     No     No    Medium CompPrice CompPrice%ni% c(888,999)    26  6.84 123.653846  3215.00  6.77
## 20:     No    Yes    Medium CompPrice CompPrice%ni% c(888,999)    37  9.74 125.324324  4637.00  9.76
## 21:    Yes     No      Good CompPrice CompPrice%ni% c(888,999)    17  4.47 125.176471  2128.00  4.48
## 22:     No    Yes      Good CompPrice CompPrice%ni% c(888,999)    22  5.79 122.363636  2692.00  5.67
## 23:     No     No      Good CompPrice CompPrice%ni% c(888,999)     5  1.32 132.600000   663.00  1.40
## 24:     No     No       Bad CompPrice CompPrice%ni% c(888,999)    11  2.89 118.363636  1302.00  2.74
## 25:    Yes    Yes       Bad    Income    Income%ni% c(888,999)    48 13.52  74.291667  3566.00 14.71
## 26:    Yes    Yes      Good    Income    Income%ni% c(888,999)    35  9.86  68.657143  2403.00  9.91
## 27:    Yes    Yes    Medium    Income    Income%ni% c(888,999)    86 24.23  69.046512  5938.00 24.49
## 28:    Yes     No       Bad    Income    Income%ni% c(888,999)    18  5.07  69.722222  1255.00  5.18
## 29:     No    Yes       Bad    Income    Income%ni% c(888,999)     9  2.54  77.888889   701.00  2.89
## 30:    Yes     No    Medium    Income    Income%ni% c(888,999)    50 14.08  64.500000  3225.00 13.30
## 31:     No     No    Medium    Income    Income%ni% c(888,999)    26  7.32  62.346154  1621.00  6.69
## 32:    Yes     No      Good    Income    Income%ni% c(888,999)    15  4.23  64.800000   972.00  4.01
## 33:     No    Yes      Good    Income    Income%ni% c(888,999)    20  5.63  72.200000  1444.00  5.96
## 34:     No     No      Good    Income    Income%ni% c(888,999)     5  1.41  49.000000   245.00  1.01
## 35:     No    Yes    Medium    Income    Income%ni% c(888,999)    34  9.58  67.764706  2304.00  9.50
## 36:     No     No       Bad    Income    Income%ni% c(888,999)     9  2.54  63.444444   571.00  2.36
##      Urban     US ShelveLoc Attribute                   Filter Count  Prop       mean      sum    PS

Different base for each numeric variable.

"Population" - Consider only Good ShelveLoc (the quality of the shelving location for the car seats at each site) ShelveLoc=='Good'

"Sales" - Inculde only those store belongs to Urban location (Urban==Yes)

"CompPrice" - Exclude Price is greater than 150

ExpCustomStat(Carseats,Cvar = c("Urban","US"), Nvar=c("Population","Sales","CompPrice"), stat = c('Count','Prop','mean','sum','var','IQR'), filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150"))
##      Urban     US  Attribute            Filter Count  Prop       mean      sum          var      IQR
##     <fctr> <fctr>     <char>            <char> <int> <num>      <num>    <num>        <num>    <num>
##  1:    Yes    Yes Population ShelveLoc=='Good'    39 45.88 277.538462 10824.00 18259.676113 176.0000
##  2:    Yes     No Population ShelveLoc=='Good'    18 21.18 219.888889  3958.00 13625.633987 155.5000
##  3:     No    Yes Population ShelveLoc=='Good'    22 25.88 283.318182  6233.00 15122.512987 205.5000
##  4:     No     No Population ShelveLoc=='Good'     6  7.06 280.666667  1684.00 13183.066667 125.2500
##  5:    Yes    Yes      Sales      Urban=='Yes'   186 65.96   7.710968  1434.24     8.734665   4.0775
##  6:    Yes     No      Sales      Urban=='Yes'    96 34.04   6.997813   671.79     6.445127   3.4175
##  7:    Yes    Yes  CompPrice        Price>=150    16 50.00 142.187500  2275.00   223.362500  17.5000
##  8:     No    Yes  CompPrice        Price>=150     7 21.88 138.428571   969.00   180.952381  16.0000
##  9:    Yes     No  CompPrice        Price>=150     7 21.88 143.285714  1003.00   156.571429  14.5000
## 10:     No     No  CompPrice        Price>=150     2  6.25 146.500000   293.00   112.500000   7.5000

5. Resahpe data

Reshapes a grouped data

options(width = 150)
ExpCustomStat(Carseats,Cvar = c("Urban"), Nvar=c("Population","Sales"), stat = c('Count','Prop'),gpby=TRUE,dcast=TRUE)
## Row value   : Attribute
## Column value: Urban
## Statistics  : Count + Prop
## Key: <Attribute>
##     Attribute Count_No Count_Yes Prop_No Prop_Yes
##        <char>    <int>     <int>   <num>    <num>
## 1: Population      118       282    29.5     70.5
## 2:      Sales      118       282    29.5     70.5
Example scripts
##Frequency table for categorical variables
ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc"),gpby=FALSE)

##Crosstabulation between categorical variables
ExpCustomStat(Carseats,Cvar=c("US","Urban"),gpby=TRUE,filt=NULL)
ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc"),gpby=TRUE,filt=NULL)

##Adding filters for custom tables
ExpCustomStat(Carseats,Cvar=c("US","Urban"),gpby=TRUE,filt="Population>150")
ExpCustomStat(Carseats,Cvar=c("US","ShelveLoc"),gpby=TRUE,filt="Urban=='Yes' & Population>150")

## Numeric variable summary
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var','min','max'))
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('min','p0.25','median','p0.75','max'))

## Adding filters for complete data (like base Subset)
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var'),filt="Urban=='Yes'")
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum'),filt="Urban=='Yes' & Population>150")

## Filter unique value from all the numeric variables
ExpCustomStat(data_sam,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','min'),filt="All %ni% c(999,-9)")

## Adding filters at variable level
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Education","Income"),stat = c('Count','mean','sum','var','sd','IQR','median'),filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150^ ^US=='Yes'"))

##Numerical summaries by category
##Variable summary report (One group variable)
ExpCustomStat(Carseats,Cvar = c("Urban","ShelveLoc"), Nvar=c("Population","Sales"), stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max'),gpby=FALSE)

##Variable summary report (More than One group variable)
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','min','max','IQR','sd'), gpby = TRUE)

##Variable summary report (More than One group variable) with filter
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','P0.25','median','p0.75'), gpby = TRUE,filt="Urban=='Yes'")
ExpCustomStat(data_sam,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("Sales","CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS'), gpby = TRUE,filt="All %ni% c(888,999)")
ExpCustomStat(Carseats,Cvar = c("Urban","US"), Nvar=c("Population","Sales","CompPrice"), stat = c('Count','Prop','mean','sum','var','min','max'), filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150"))
References