In this vignette we will discuss about how to customize the summary
statistics using ExpCustomStat function from
SmartEDA
. The output of this function returns matrix object
containing descriptive information on all input variables for each
level/combination of levels in categorical/group variable. Also, while
running the analysis we can filter row/cases of the data. We can apply
the filters at individual variable level or complete data like base
subsetting.
Function definition:
ExpCustomStat(data,Cvar=NULL,Nvar=NULL,stat=NULL,gpby=TRUE,filt=NULL,dcast=FALSE)
Key functionalities of ExpCustomStat
are:
ExpCustomStat
functionWill open the carseats data from ISLR package and drive different
types of use cases using ExpCustomStat
function.
In this vignette, we will be using a simulated data set containing sales of child car seats at 400 different stores.
Data Source ISLR package.
Function source SmartEDA package
Carseats data from ISLR package:
## Sales CompPrice Income Advertising Population Price ShelveLoc Age Education Urban US
## 1 9.50 138 73 11 276 120 Bad 42 17 Yes Yes
## 2 11.22 111 48 16 260 83 Good 65 10 Yes Yes
## 3 10.06 113 35 10 269 80 Medium 59 12 Yes Yes
## 4 7.40 117 100 4 466 97 Medium 55 14 Yes Yes
## 5 4.15 141 64 3 340 128 Bad 38 13 Yes No
Categorical summaries to describe the distribution for a qualitative variables.
The number of observations for particular category
"US", "Urban" and "ShelveLoc"
variable## Level Group_by Count Prop
## <fctr> <char> <int> <num>
## 1: Yes US 258 64.50
## 2: No US 142 35.50
## 3: Yes Urban 282 70.50
## 4: No Urban 118 29.50
## 5: Bad ShelveLoc 96 24.00
## 6: Good ShelveLoc 85 21.25
## 7: Medium ShelveLoc 219 54.75
## 8: 17 Education 49 12.25
## 9: 10 Education 48 12.00
## 10: 12 Education 49 12.25
## 11: 14 Education 40 10.00
## 12: 13 Education 43 10.75
## 13: 16 Education 47 11.75
## 14: 15 Education 36 9.00
## 15: 18 Education 40 10.00
## 16: 11 Education 48 12.00
OR we can use similar analysis using ExpCTable
function
from same package, this functions includes cumulative percentages and
Total
## Variable Valid Frequency Percent CumPercent
## 1 ShelveLoc Bad 96 24.00 24.00
## 2 ShelveLoc Good 85 21.25 45.25
## 3 ShelveLoc Medium 219 54.75 100.00
## 4 ShelveLoc TOTAL 400 NA NA
## 5 Urban No 118 29.50 29.50
## 6 Urban Yes 282 70.50 100.00
## 7 Urban TOTAL 400 NA NA
## 8 US No 142 35.50 35.50
## 9 US Yes 258 64.50 100.00
## 10 US TOTAL 400 NA NA
## 11 Education 10 48 12.00 12.00
## 12 Education 11 48 12.00 24.00
## 13 Education 12 49 12.25 36.25
## 14 Education 13 43 10.75 47.00
## 15 Education 14 40 10.00 57.00
## 16 Education 15 36 9.00 66.00
## 17 Education 16 47 11.75 77.75
## 18 Education 17 49 12.25 90.00
## 19 Education 18 40 10.00 100.00
## 20 Education TOTAL 400 NA NA
"US", "Urban" and "ShelveLoc"
variable## Level Group_by Count Prop
## <fctr> <char> <int> <num>
## 1: Yes US 258 64.50
## 2: No US 142 35.50
## 3: Yes Urban 282 70.50
## 4: No Urban 118 29.50
## 5: Bad ShelveLoc 96 24.00
## 6: Good ShelveLoc 85 21.25
## 7: Medium ShelveLoc 219 54.75
To produce cross tables which calculate counts and proportions for each combination of categorical variables we can use ExpCustomStat
NOTE: For crosstabulation change input gpby
=TRUE
"US" Vs "Urban"
## US Urban Count Prop
## <fctr> <fctr> <int> <num>
## 1: Yes Yes 186 46.5
## 2: No Yes 96 24.0
## 3: Yes No 72 18.0
## 4: No No 46 11.5
We can also produce multidimensional tables based on three or more categorical variables
"US" Vs "Urban" Vs "ShelveLoc"
## US Urban ShelveLoc Count Prop
## <fctr> <fctr> <fctr> <int> <num>
## 1: Yes Yes Bad 51 12.75
## 2: Yes Yes Good 39 9.75
## 3: Yes Yes Medium 96 24.00
## 4: No Yes Bad 23 5.75
## 5: Yes No Bad 11 2.75
## 6: No Yes Medium 55 13.75
## 7: No No Medium 29 7.25
## 8: Yes No Medium 39 9.75
## 9: No Yes Good 18 4.50
## 10: Yes No Good 22 5.50
## 11: No No Good 6 1.50
## 12: No No Bad 11 2.75
If we want to understand the number of stores in US
and
location is Urban
for Population
size greater
than 150
Population
is > 150"US" Vs "Urban"
## US Urban Filter Count Prop
## <fctr> <fctr> <char> <int> <num>
## 1: Yes Yes Population>150 138 48.25
## 2: No Yes Population>150 63 22.03
## 3: Yes No Population>150 53 18.53
## 4: No No Population>150 32 11.19
Population
> 150 and Urban
==“Yes”"US" Vs "ShelveLoc"
## US ShelveLoc Filter Count Prop
## <fctr> <fctr> <char> <int> <num>
## 1: Yes Bad Urban=='Yes' & Population>150 40 19.90
## 2: Yes Good Urban=='Yes' & Population>150 31 15.42
## 3: Yes Medium Urban=='Yes' & Population>150 67 33.33
## 4: No Bad Urban=='Yes' & Population>150 13 6.47
## 5: No Medium Urban=='Yes' & Population>150 37 18.41
## 6: No Good Urban=='Yes' & Population>150 13 6.47
Numerical summaries to describe the distribution for quantitative variables.
options(width = 150)
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var','sd','min','max','IQR'))
## Attribute Count mean sum var sd min max IQR
## <char> <int> <num> <num> <num> <num> <num> <num> <num>
## 1: Population 400 264.840000 105936.00 21719.813935 147.376436 10 509.00 259.50
## 2: Sales 400 7.496325 2998.53 7.975626 2.824115 0 16.27 3.93
## 3: CompPrice 400 124.975000 49990.00 235.147243 15.334512 77 175.00 20.00
## 4: Income 400 68.657500 27463.00 783.218239 27.986037 21 120.00 48.25
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('min','p0.25','median','p0.75','max'))
## Attribute min p0.25 median p0.75 max
## <char> <num> <num> <num> <num> <num>
## 1: Population 10 139.00 272.00 398.50 509.00
## 2: Sales 0 5.39 7.49 9.32 16.27
## 3: CompPrice 77 115.00 125.00 135.00 175.00
## 4: Income 21 42.75 69.00 91.00 120.00
Filter rows/cases of complete dataset where conditions are true
options(width = 150)
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var','min','median','max'),filt="Urban=='Yes'")
## Attribute Filter Count mean sum var min median max
## <char> <char> <int> <num> <num> <num> <num> <num> <num>
## 1: Population Urban=='Yes' 282 259.886525 73288.00 20913.488857 12.00 274.00 508.00
## 2: Sales Urban=='Yes' 282 7.468191 2106.03 8.044141 0.37 7.42 16.27
## 3: CompPrice Urban=='Yes' 282 125.634752 35429.00 246.545834 77.00 125.00 175.00
## 4: Income Urban=='Yes' 282 69.343972 19555.00 743.706885 21.00 70.00 120.00
options(width=150)
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','median','IQR'),filt="Urban=='Yes' & Population>150")
## Attribute Filter Count mean sum median IQR
## <char> <char> <int> <num> <num> <num> <num>
## 1: Population Urban=='Yes' & Population>150 201 332.641791 66861.00 331.00 165.00
## 2: Sales Urban=='Yes' & Population>150 201 7.573383 1522.25 7.49 3.74
## 3: CompPrice Urban=='Yes' & Population>150 201 125.134328 25152.00 125.00 19.00
## 4: Income Urban=='Yes' & Population>150 201 68.223881 13713.00 69.00 47.00
This will be useful when we need to exclude redundant values like ‘999’ or ‘9999’ or ‘-9’ or ‘-1111’, or ‘888’ etc from each selected variable.
Eg:dat = data.frame(x = c(23,24,34,999,12,12,23,999,45), y = c(1,3,4,999,0,999,0,8,999,0)
Exclude 999:
x = c(23,24,34,12,12,23,45) y = c(1,3,4,0,0,8,0)
data_sam = Carseats[,]
data_sam[sample(1:400,30),"Sales"] <- 999
data_sam[sample(1:400,20),"CompPrice"] <- -9
data_sam[sample(1:400,45),"Income"] <- 999
ExpCustomStat(data_sam,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','min'),filt="All %ni% c(999,-9)")
## Attribute Filter Count mean sum min
## <char> <char> <int> <num> <num> <num>
## 1: Population Population%ni% c(999,-9) 400 264.84000 105936.00 10
## 2: Sales Sales%ni% c(999,-9) 370 7.51200 2779.44 0
## 3: CompPrice CompPrice%ni% c(999,-9) 380 124.93421 47475.00 77
## 4: Income Income%ni% c(999,-9) 355 68.42254 24290.00 21
Different filters for each numeric variable. For example, below are the conditions (logic) for each variable summary analysis.
"Population" - Consider only Good ShelveLoc (the quality of the shelving location for the car seats at each site) ShelveLoc=='Good'
"Sales" - Inculde only those store belongs to Urban location (Urban==Yes)
"CompPrice" - Exclude Price is greater than 150
"Education" - All stores
"Income" - Inculde only stores in US (US==Yes)
Table: Descriptive summary for Price, Population, Sales, CompPrice, Income based on the filters.
options(width = 150)
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Education","Income"),stat = c('Count','mean','sum','var','sd','IQR','median'),filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150^All^US=='Yes'"))
## Attribute Filter Count mean sum var sd IQR median
## <char> <char> <int> <num> <num> <num> <num> <num> <num>
## 1: Population ShelveLoc=='Good' 85 267.047059 22699.00 16193.473950 127.253581 177.0000 272.00
## 2: Sales Urban=='Yes' 282 7.468191 2106.03 8.044141 2.836219 3.9175 7.42
## 3: CompPrice Price>=150 32 141.875000 4540.00 181.596774 13.475785 18.2500 142.50
## 4: Education All 400 13.900000 5560.00 6.867168 2.620528 4.0000 14.00
## 5: Income US=='Yes' 258 70.515504 18193.00 782.849953 27.979456 48.0000 70.00
Descriptive summary for numerical variable by group level.
options(width = 150)
ExpCustomStat(Carseats,Cvar = c("Urban","ShelveLoc"), Nvar=c("Population","Sales"), stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max'),gpby=FALSE)
## Level Attribute Group_by Count Prop mean min P0.25 median p0.75 max
## <fctr> <char> <char> <int> <num> <num> <num> <num> <num> <num> <num>
## 1: Yes Population Urban 282 70.50 259.886525 12.00 139.0000 274.00 376.7500 508.00
## 2: No Population Urban 118 29.50 276.677966 10.00 144.0000 271.00 408.0000 509.00
## 3: Yes Sales Urban 282 70.50 7.468191 0.37 5.3750 7.42 9.2925 16.27
## 4: No Sales Urban 118 29.50 7.563559 0.00 5.4400 7.67 9.3350 14.90
## 5: Bad Population ShelveLoc 96 24.00 275.291667 10.00 145.5000 296.00 400.5000 501.00
## 6: Good Population ShelveLoc 85 21.25 267.047059 14.00 176.0000 272.00 353.0000 503.00
## 7: Medium Population ShelveLoc 219 54.75 259.401826 12.00 124.0000 261.00 405.0000 509.00
## 8: Bad Sales ShelveLoc 96 24.00 5.522917 0.37 4.0525 5.21 7.4625 11.67
## 9: Good Sales ShelveLoc 85 21.25 10.214000 3.58 8.3300 10.50 11.9600 16.27
## 10: Medium Sales ShelveLoc 219 54.75 7.306575 0.00 5.6250 7.38 8.7750 13.36
options(width = 150)
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','min','max','IQR','sd'), gpby = TRUE)
## Urban US ShelveLoc Attribute Count Prop mean sum PS min max IQR sd
## <fctr> <fctr> <fctr> <char> <int> <num> <num> <num> <num> <num> <num> <num> <num>
## 1: Yes Yes Bad CompPrice 51 12.75 124.88235 6369 12.74 93 161 17.00 14.768408
## 2: Yes Yes Good CompPrice 39 9.75 127.00000 4953 9.91 89 156 22.00 15.998355
## 3: Yes Yes Medium CompPrice 96 24.00 125.05208 12005 24.01 85 175 20.00 15.410434
## 4: Yes No Bad CompPrice 23 5.75 125.52174 2887 5.78 86 157 21.50 17.367340
## 5: No Yes Bad CompPrice 11 2.75 122.45455 1347 2.69 88 149 8.50 17.494934
## 6: Yes No Medium CompPrice 55 13.75 126.00000 6930 13.86 77 159 20.00 16.611018
## 7: No No Medium CompPrice 29 7.25 121.86207 3534 7.07 89 154 21.00 16.012772
## 8: No Yes Medium CompPrice 39 9.75 126.33333 4927 9.86 96 150 18.50 14.411862
## 9: Yes No Good CompPrice 18 4.50 126.94444 2285 4.57 95 157 21.00 16.049454
## 10: No Yes Good CompPrice 22 5.50 122.36364 2692 5.39 107 147 16.25 11.986284
## 11: No No Good CompPrice 6 1.50 126.50000 759 1.52 96 142 13.75 16.610238
## 12: No No Bad CompPrice 11 2.75 118.36364 1302 2.60 106 131 15.00 9.080449
## 13: Yes Yes Bad Income 51 12.75 76.31373 3892 14.17 28 119 40.50 26.657449
## 14: Yes Yes Good Income 39 9.75 66.89744 2609 9.50 21 117 49.00 28.163802
## 15: Yes Yes Medium Income 96 24.00 68.83333 6608 24.06 25 120 41.25 27.070538
## 16: Yes No Bad Income 23 5.75 67.52174 1553 5.65 25 114 34.00 25.414508
## 17: No Yes Bad Income 11 2.75 72.72727 800 2.91 21 120 52.50 33.400871
## 18: Yes No Medium Income 55 13.75 66.14545 3638 13.25 22 118 42.50 26.857248
## 19: No No Medium Income 29 7.25 62.44828 1811 6.59 22 120 60.00 32.077569
## 20: No Yes Medium Income 39 9.75 69.05128 2693 9.81 21 119 55.00 30.884291
## 21: Yes No Good Income 18 4.50 69.72222 1255 4.57 24 113 62.25 31.765326
## 22: No Yes Good Income 22 5.50 72.31818 1591 5.79 22 115 35.00 27.425026
## 23: No No Good Income 6 1.50 53.83333 323 1.18 30 78 40.25 22.542552
## 24: No No Bad Income 11 2.75 62.72727 690 2.51 25 105 17.00 23.524842
## Urban US ShelveLoc Attribute Count Prop mean sum PS min max IQR sd
options(width = 150)
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','median','IQR'), gpby = TRUE,filt="Urban=='Yes'")
## Urban US ShelveLoc Attribute Filter Count Prop mean sum PS median IQR
## <fctr> <fctr> <fctr> <char> <char> <int> <num> <num> <num> <num> <num> <num>
## 1: Yes Yes Bad CompPrice Urban=='Yes' 51 18.09 124.88235 6369 17.98 125.0 17.00
## 2: Yes Yes Good CompPrice Urban=='Yes' 39 13.83 127.00000 4953 13.98 128.0 22.00
## 3: Yes Yes Medium CompPrice Urban=='Yes' 96 34.04 125.05208 12005 33.88 125.0 20.00
## 4: Yes No Bad CompPrice Urban=='Yes' 23 8.16 125.52174 2887 8.15 125.0 21.50
## 5: Yes No Medium CompPrice Urban=='Yes' 55 19.50 126.00000 6930 19.56 127.0 20.00
## 6: Yes No Good CompPrice Urban=='Yes' 18 6.38 126.94444 2285 6.45 122.0 21.00
## 7: Yes Yes Bad Income Urban=='Yes' 51 18.09 76.31373 3892 19.90 81.0 40.50
## 8: Yes Yes Good Income Urban=='Yes' 39 13.83 66.89744 2609 13.34 69.0 49.00
## 9: Yes Yes Medium Income Urban=='Yes' 96 34.04 68.83333 6608 33.79 69.0 41.25
## 10: Yes No Bad Income Urban=='Yes' 23 8.16 67.52174 1553 7.94 67.0 34.00
## 11: Yes No Medium Income Urban=='Yes' 55 19.50 66.14545 3638 18.60 69.0 42.50
## 12: Yes No Good Income Urban=='Yes' 18 6.38 69.72222 1255 6.42 73.5 62.25
options(width = 150)
data_sam = Carseats[,]
data_sam[sample(1:400,30),"Sales"] <- 888
data_sam[sample(1:400,20),"CompPrice"] <- 999
data_sam[sample(1:400,45),"Income"] <- 999
ExpCustomStat(data_sam,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("Sales","CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS'), gpby = TRUE,filt="All %ni% c(888,999)")
## Urban US ShelveLoc Attribute Filter Count Prop mean sum PS
## <fctr> <fctr> <fctr> <char> <char> <int> <num> <num> <num> <num>
## 1: Yes Yes Bad Sales Sales%ni% c(888,999) 48 12.97 5.454792 261.83 9.53
## 2: Yes Yes Good Sales Sales%ni% c(888,999) 36 9.73 10.831111 389.92 14.19
## 3: Yes Yes Medium Sales Sales%ni% c(888,999) 87 23.51 7.528851 655.01 23.83
## 4: Yes No Bad Sales Sales%ni% c(888,999) 22 5.95 5.187273 114.12 4.15
## 5: No Yes Bad Sales Sales%ni% c(888,999) 10 2.70 6.119000 61.19 2.23
## 6: Yes No Medium Sales Sales%ni% c(888,999) 53 14.32 6.950755 368.39 13.40
## 7: No No Medium Sales Sales%ni% c(888,999) 29 7.84 6.440690 186.78 6.80
## 8: No Yes Medium Sales Sales%ni% c(888,999) 34 9.19 7.914412 269.09 9.79
## 9: Yes No Good Sales Sales%ni% c(888,999) 17 4.59 9.106471 154.81 5.63
## 10: No Yes Good Sales Sales%ni% c(888,999) 18 4.86 10.119444 182.15 6.63
## 11: No No Good Sales Sales%ni% c(888,999) 6 1.62 8.968333 53.81 1.96
## 12: No No Bad Sales Sales%ni% c(888,999) 10 2.70 5.133000 51.33 1.87
## 13: Yes Yes Bad CompPrice CompPrice%ni% c(888,999) 46 12.11 124.173913 5712.00 12.04
## 14: Yes Yes Good CompPrice CompPrice%ni% c(888,999) 39 10.26 127.000000 4953.00 10.44
## 15: Yes Yes Medium CompPrice CompPrice%ni% c(888,999) 93 24.47 125.279570 11651.00 24.55
## 16: Yes No Bad CompPrice CompPrice%ni% c(888,999) 22 5.79 125.954545 2771.00 5.84
## 17: No Yes Bad CompPrice CompPrice%ni% c(888,999) 11 2.89 122.454545 1347.00 2.84
## 18: Yes No Medium CompPrice CompPrice%ni% c(888,999) 50 13.16 126.260000 6313.00 13.30
## 19: No No Medium CompPrice CompPrice%ni% c(888,999) 26 6.84 120.846154 3142.00 6.62
## 20: No Yes Medium CompPrice CompPrice%ni% c(888,999) 38 10.00 126.263158 4798.00 10.11
## 21: Yes No Good CompPrice CompPrice%ni% c(888,999) 18 4.74 126.944444 2285.00 4.81
## 22: No Yes Good CompPrice CompPrice%ni% c(888,999) 21 5.53 121.523810 2552.00 5.38
## 23: No No Good CompPrice CompPrice%ni% c(888,999) 5 1.32 126.200000 631.00 1.33
## 24: No No Bad CompPrice CompPrice%ni% c(888,999) 11 2.89 118.363636 1302.00 2.74
## 25: Yes Yes Bad Income Income%ni% c(888,999) 43 12.11 78.534884 3377.00 13.71
## 26: Yes Yes Good Income Income%ni% c(888,999) 33 9.30 65.606061 2165.00 8.79
## 27: Yes Yes Medium Income Income%ni% c(888,999) 88 24.79 68.988636 6071.00 24.65
## 28: Yes No Medium Income Income%ni% c(888,999) 49 13.80 67.857143 3325.00 13.50
## 29: No No Medium Income Income%ni% c(888,999) 24 6.76 61.375000 1473.00 5.98
## 30: No Yes Medium Income Income%ni% c(888,999) 35 9.86 70.771429 2477.00 10.06
## 31: No Yes Bad Income Income%ni% c(888,999) 10 2.82 68.700000 687.00 2.79
## 32: Yes No Good Income Income%ni% c(888,999) 17 4.79 71.882353 1222.00 4.96
## 33: No Yes Good Income Income%ni% c(888,999) 21 5.92 72.095238 1514.00 6.15
## 34: No No Good Income Income%ni% c(888,999) 5 1.41 58.600000 293.00 1.19
## 35: No No Bad Income Income%ni% c(888,999) 10 2.82 63.000000 630.00 2.56
## 36: Yes No Bad Income Income%ni% c(888,999) 20 5.63 69.650000 1393.00 5.66
## Urban US ShelveLoc Attribute Filter Count Prop mean sum PS
Different base for each numeric variable.
"Population" - Consider only Good ShelveLoc (the quality of the shelving location for the car seats at each site) ShelveLoc=='Good'
"Sales" - Inculde only those store belongs to Urban location (Urban==Yes)
"CompPrice" - Exclude Price is greater than 150
ExpCustomStat(Carseats,Cvar = c("Urban","US"), Nvar=c("Population","Sales","CompPrice"), stat = c('Count','Prop','mean','sum','var','IQR'), filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150"))
## Urban US Attribute Filter Count Prop mean sum var IQR
## <fctr> <fctr> <char> <char> <int> <num> <num> <num> <num> <num>
## 1: Yes Yes Population ShelveLoc=='Good' 39 45.88 277.538462 10824.00 18259.676113 176.0000
## 2: Yes No Population ShelveLoc=='Good' 18 21.18 219.888889 3958.00 13625.633987 155.5000
## 3: No Yes Population ShelveLoc=='Good' 22 25.88 283.318182 6233.00 15122.512987 205.5000
## 4: No No Population ShelveLoc=='Good' 6 7.06 280.666667 1684.00 13183.066667 125.2500
## 5: Yes Yes Sales Urban=='Yes' 186 65.96 7.710968 1434.24 8.734665 4.0775
## 6: Yes No Sales Urban=='Yes' 96 34.04 6.997813 671.79 6.445127 3.4175
## 7: Yes Yes CompPrice Price>=150 16 50.00 142.187500 2275.00 223.362500 17.5000
## 8: No Yes CompPrice Price>=150 7 21.88 138.428571 969.00 180.952381 16.0000
## 9: Yes No CompPrice Price>=150 7 21.88 143.285714 1003.00 156.571429 14.5000
## 10: No No CompPrice Price>=150 2 6.25 146.500000 293.00 112.500000 7.5000
Reshapes a grouped data
options(width = 150)
ExpCustomStat(Carseats,Cvar = c("Urban"), Nvar=c("Population","Sales"), stat = c('Count','Prop'),gpby=TRUE,dcast=TRUE)
## Row value : Attribute
## Column value: Urban
## Statistics : Count + Prop
## Key: <Attribute>
## Attribute Count_No Count_Yes Prop_No Prop_Yes
## <char> <int> <int> <num> <num>
## 1: Population 118 282 29.5 70.5
## 2: Sales 118 282 29.5 70.5
##Frequency table for categorical variables
ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc"),gpby=FALSE)
##Crosstabulation between categorical variables
ExpCustomStat(Carseats,Cvar=c("US","Urban"),gpby=TRUE,filt=NULL)
ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc"),gpby=TRUE,filt=NULL)
##Adding filters for custom tables
ExpCustomStat(Carseats,Cvar=c("US","Urban"),gpby=TRUE,filt="Population>150")
ExpCustomStat(Carseats,Cvar=c("US","ShelveLoc"),gpby=TRUE,filt="Urban=='Yes' & Population>150")
## Numeric variable summary
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var','min','max'))
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('min','p0.25','median','p0.75','max'))
## Adding filters for complete data (like base Subset)
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var'),filt="Urban=='Yes'")
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum'),filt="Urban=='Yes' & Population>150")
## Filter unique value from all the numeric variables
ExpCustomStat(data_sam,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','min'),filt="All %ni% c(999,-9)")
## Adding filters at variable level
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Education","Income"),stat = c('Count','mean','sum','var','sd','IQR','median'),filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150^ ^US=='Yes'"))
##Numerical summaries by category
##Variable summary report (One group variable)
ExpCustomStat(Carseats,Cvar = c("Urban","ShelveLoc"), Nvar=c("Population","Sales"), stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max'),gpby=FALSE)
##Variable summary report (More than One group variable)
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','min','max','IQR','sd'), gpby = TRUE)
##Variable summary report (More than One group variable) with filter
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','P0.25','median','p0.75'), gpby = TRUE,filt="Urban=='Yes'")
ExpCustomStat(data_sam,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("Sales","CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS'), gpby = TRUE,filt="All %ni% c(888,999)")
ExpCustomStat(Carseats,Cvar = c("Urban","US"), Nvar=c("Population","Sales","CompPrice"), stat = c('Count','Prop','mean','sum','var','min','max'), filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150"))