class: center, middle, inverse, title-slide

.title[
# Quantitative Finance and Financial Programming
]
.subtitle[
## L02 Data Visualization with ggplot2 | Pre-class Preparation
]
.author[
### 曾永艺
]
.institute[
### School of Management, Xiamen University
]
.date[
### 2023-09-18
]

---
class: middle, hide_logo
background-image: url(imgs/logo-ggplot2.png)
background-size: 10em
background-position: 90% 50%


- ## A Picture Is Worth a Thousand Words

- ## The Layered Grammar of Graphics

- ## Mapping Data to the Aesthetics of Geometric Objects

- ## Statistical Transformations, Position Adjustments, Coordinates, Facets

- ## Labels, Annotations, Scales, Themes, and More

---
### A Picture Is Worth a Thousand Words

<img src="L02_Visualization_Prepare_files/figure-html/datasaurus-1.png" width="85%" style="display: block; margin: auto;" />

---
class: inverse, center, middle
background-image: url(imgs/logo-ggplot2.svg)
background-size: 12%
background-position: 14% 50%

# `ggplot2`<sup>.font60[v3.4.3]</sup> Quick Start

---
layout: true

### .bold[1 The Layered Grammar of Graphics]

---

.font130[_Data Visualization_: mapping .bold[data] to the .bold[aesthetics of geometric objects]]

--

.font110[

- Data (.red[data]): what we want to visualize; it contains the variables
- Geometric objects (.red[geom]etries): the shapes used to represent the data, such as points, bars, and lines
- Aesthetics (.red[aes]thetics): visual properties of the geometric objects, such as `x` and `y` position, colour, and shape

]

--

<img src="imgs/visualization-stat-point.png" width="100%" style="display: block; margin: auto;" />

---

.font130[The `ggplot2` code template]

--

.code150[

```r
*ggplot(data = <DATA>) +
* <GEOM_FUNCTION>(
*   mapping = aes(<MAPPINGS>),
    stat = <STAT>,
    position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>
```

]

---
layout: true

### .bold[2 Mapping .red[Data] to the Aesthetics of Geometric Objects]

---

```r
library(tidyverse)
```

--

```r
mpg # print(mpg)
```

```
#> # A tibble: 234 × 11
#>   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
#>   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
#> 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compact
#> 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compact
#> 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compact
#> # ℹ 231 more rows
```

--

```r
# Open the help page for mpg, a data set that ships with the ggplot2 package
?mpg # help(mpg)
```

--

1. `displ`: a car's engine size, in litres.
2. `hwy`: a car's fuel efficiency on the highway, in miles per gallon (mpg).
3. `class`: "type" of car ...
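
Before plotting, it can help to look at just the three variables listed above. The following is a minimal sketch, assuming only that `ggplot2` is installed; `mpg_sub` is a hypothetical name introduced here for illustration and is not used elsewhere in these slides.

```r
library(ggplot2)  # provides the built-in mpg data set

# Keep only the three variables described above
# (mpg_sub is an illustrative name, not part of the original slides)
mpg_sub <- mpg[, c("displ", "hwy", "class")]

str(mpg_sub)            # column types: displ is double, hwy is integer, class is character
summary(mpg_sub$displ)  # spread of engine displacement, in litres
table(mpg_sub$class)    # number of cars in each car type
```

Knowing that `displ` and `hwy` are numeric while `class` is categorical suggests which aesthetics suit each variable: positions for the first two, and colour or shape for the third.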
---
layout: true

### .bold[2 .red[Mapping] Data .red[to the Aesthetics of Geometric Objects]]

---
count: false

```r
ggplot()
```

<img src="L02_Visualization_Prepare_files/figure-html/mapping1-1-1.png" width="60%" style="display: block; margin: auto;" />

---
count: false

```r
ggplot(data = mpg) # data set
```

<img src="L02_Visualization_Prepare_files/figure-html/mapping1-2-1.png" width="60%" style="display: block; margin: auto;" />

---
count: false

```r
ggplot(data = mpg, # data set
       mapping = aes(x = displ, y = hwy)) # mappings: variables -> x and y positions
```

<img src="L02_Visualization_Prepare_files/figure-html/mapping1-3-1.png" width="60%" style="display: block; margin: auto;" />

---
count: false

```r
ggplot(data = mpg, # data set
       mapping = aes(x = displ, y = hwy)) + # mappings: variables -> x and y positions
  geom_point() # geometric object
```

<img src="L02_Visualization_Prepare_files/figure-html/mapping1-4-1.png" width="60%" style="display: block; margin: auto;" />

---

```r
ggplot(data = mpg) +
  geom_point(
*   mapping = aes(x = displ, y = hwy, colour = class) # 3 mappings
  )
```

<img src="L02_Visualization_Prepare_files/figure-html/mapping2-1.png" width="60%" style="display: block; margin: auto;" />

---

```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class))
```

```
#> Warning: The shape palette can deal with a maximum of 6 discrete values because more than
#> 6 becomes difficult to discriminate; you have 7. Consider specifying shapes
#> manually if you must have them.
```

```
#> Warning: Removed 62 rows containing missing values (`geom_point()`).
```

<img src="L02_Visualization_Prepare_files/figure-html/mapping3-1.png" width="60%" style="display: block; margin: auto;" />

---

.font120[
- R has 25 built-in point shapes; in `ggplot2` they can be specified by name or by number
]

.pull-left[
<img src="L02_Visualization_Prepare_files/figure-html/shapes-name-1.png" width="95%" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="L02_Visualization_Prepare_files/figure-html/shapes-id-1.png" width="95%" style="display: block; margin: auto;" />
]

--

.font120[
- Documentation on how to specify aesthetics: .content-box-yellow.font80[`vignette("ggplot2-specs")`]
]

---

```r
ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
*   shape = "triangle", size = 3, colour = "red", alpha = 0.3
    # Aesthetics that do not depend on a variable ("the same for every point",
    # i.e. not mapped) should be set outside aes()
  )
```

<img src="L02_Visualization_Prepare_files/figure-html/mapping4-1.png" width="55%" style="display: block; margin: auto;" />

---

```r
# layered!
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
* geom_smooth(mapping = aes(x = displ, y = hwy)) # layered!
```

<img src="L02_Visualization_Prepare_files/figure-html/mapping5-1.png" width="58%" style="display: block; margin: auto;" />

---

```r
ggplot(
  data = mpg,
* mapping = aes(x = displ, y = hwy) # shared mappings, moved up into ggplot()
) +
  geom_point() +
  geom_smooth()
```

<img src="L02_Visualization_Prepare_files/figure-html/mapping6-1.png" width="56%" style="display: block; margin: auto;" />

---

```r
*ggplot(mpg, aes(displ, hwy)) + # common argument names omitted
* geom_point(aes(colour = class)) + # mapping used by this layer only
* geom_smooth(data = filter(mpg, class == "pickup")) # a different data set
```

<img src="L02_Visualization_Prepare_files/figure-html/mapping7-1.png" width="65%" style="display: block; margin: auto;" />

---
layout: true

### .bold[3 .red[Statistical Transformations], Position Adjustments, Coordinates, Facets]

---

```r
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
```

--

<img src="L02_Visualization_Prepare_files/figure-html/stat2-1.png" width="65%" style="display: block; margin: auto;" />

.red.font110[Where does the "count" on the y-axis come from?
😅]

---

<img src="imgs/visualization-stat-bar.png" width="100%" style="display: block; margin: auto;" />

--

.font110[The code below produces the same result as `geom_bar()`]

```r
ggplot(data = diamonds) +
* stat_count(mapping = aes(x = cut))
```

---

```r
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut,
                         y = after_stat(prop), # ?aes_eval
                         group = 1)) # ?aes_group_order
```

<img src="L02_Visualization_Prepare_files/figure-html/stat3-2-1.png" width="60%" style="display: block; margin: auto;" />

---

```r
ggplot(data = diamonds %>% group_by(cut) %>% count()) + # group and count the data beforehand
  geom_bar(
    mapping = aes(x = cut, y = n),
*   stat = "identity"
  )
```

<img src="L02_Visualization_Prepare_files/figure-html/stat4-1.png" width="58%" style="display: block; margin: auto;" />

---
layout: true

### .bold[3 Statistical Transformations, .red[Position Adjustments], Coordinates, Facets]

---

```r
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity)) # geom_bar() defaults to position = "stack"
```

<img src="L02_Visualization_Prepare_files/figure-html/pos1-1.png" width="65%" style="display: block; margin: auto;" />

---

```r
ggplot(data = diamonds) +
  geom_bar(
    mapping = aes(x = cut, fill = clarity),
*   position = "dodge" # position = position_dodge(width = 0.9)
  )
```

<img src="L02_Visualization_Prepare_files/figure-html/pos2-1.png" width="58%" style="display: block; margin: auto;" />

---

```r
ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
*   position = "jitter" # same as `ggplot() + geom_jitter()`
  )
```

<img src="L02_Visualization_Prepare_files/figure-html/pos3-1.png" width="58%" style="display: block; margin: auto;" />

---
layout: true

### .bold[3 Statistical Transformations, Position Adjustments, .red[Coordinates], Facets]

---

```r
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
* coord_flip() # since ggplot2 3.3.0 you can simply swap the x and y mappings instead
```

<img src="L02_Visualization_Prepare_files/figure-html/coord1-1.png" width="65%" style="display: block; margin: auto;" />

---

```r
bar <- ggplot(data = diamonds) + # store the plot object as bar; inspect it in the Environment pane
  geom_bar(
    mapping = aes(x = cut, fill = cut),
    show.legend = FALSE, width = 1
  ) +
  theme(aspect.ratio = 1) +
  labs(x = NULL, y = NULL)
*bar + coord_polar()
```

<img src="L02_Visualization_Prepare_files/figure-html/coord2-1.png" width="35%" style="display: block; margin: auto;" />

---
layout: true

### .bold[3 Statistical Transformations, Position Adjustments, Coordinates, .red[Facets]]

---

```r
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(data = mutate(mpg, class = NULL), colour = "grey") +
  geom_point() +
* facet_wrap(vars(class), nrow = 2) # facet_wrap(~ class, nrow = 2)
```

<img src="L02_Visualization_Prepare_files/figure-html/facet1-1.png" width="65%" style="display: block; margin: auto;" />

---

```r
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(data = transform(mpg, cyl = NULL), colour = "grey") +
  geom_point() +
* facet_grid(rows = vars(drv), cols = vars(cyl)) # facet_grid(drv ~ cyl)
```

<img src="L02_Visualization_Prepare_files/figure-html/facet2-1.png" width="65%" style="display: block; margin: auto;" />

---
layout: true

### .bold[4 .red[Labels], Annotations, Scales, Themes, and More]

---

```r
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(se = FALSE) +
* labs( # title, subtitle, caption
*   title = "Fuel efficiency",
*   subtitle = "... generally decreases with engine size",
*   caption = "Data from fueleconomy.gov"
* )
```

<img src="L02_Visualization_Prepare_files/figure-html/labs1-1.png" width="50%" style="display: block; margin: auto;" />

---

```r
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(se = FALSE) +
* labs( # aesthetic = label
*   x = "Engine displacement (L)",
*   y = "Highway fuel economy (mpg)",
*   colour = "Car type"
* )
```

<img src="L02_Visualization_Prepare_files/figure-html/labs2-1.png" width="50%" style="display: block; margin: auto;" />

---
layout: true

### .bold[4 Labels, .red[Annotations], Scales, Themes, and More]

---

```r
# Build a helper data set: the car with the highest hwy in each class
best <- mpg %>%
  group_by(class) %>%
  filter(row_number(desc(hwy)) == 1)
```

--

.pull-left[

```r
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
* geom_text(
    aes(label = model),
    data = best
  )
```

<img src="L02_Visualization_Prepare_files/figure-html/label1-1.png" width="85%" style="display: block; margin: auto;" />

]

--

.pull-right[

```r
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
* geom_label(
    aes(label = model),
    data = best,
    nudge_y = 2,
    alpha = 0.5
  )
```

<img src="L02_Visualization_Prepare_files/figure-html/label2-1.png" width="85%" style="display: block; margin: auto;" />

]

---

```r
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
* geom_point(size = 3, shape = 1, data = best) +
* ggrepel::geom_label_repel(aes(label = model), data = best) # install ggrepel
```

<img src="L02_Visualization_Prepare_files/figure-html/label3-1.png" width="65%" style="display: block; margin: auto;" />

---
layout: true

### .bold[4 Labels, Annotations, .red[Scales], Themes, and More]

---

```r
ggplot(diamonds, aes(carat, price)) +
  geom_bin2d() +
* scale_x_log10() +
* scale_y_log10()
```

<img src="L02_Visualization_Prepare_files/figure-html/scale1-1.png" width="65%" style="display: block; margin: auto;" />

---

```r
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
* scale_y_continuous(
    name = "Highway fuel economy (mpg)",
    breaks = seq(0, 50, by = 10),
    limits = c(0, 50)
  )
```

<img src="L02_Visualization_Prepare_files/figure-html/scale2-1.png" width="55%" style="display: block; margin: auto;" />

---

```r
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv, shape = drv)) +
* scale_colour_brewer(palette = "Set1") # RColorBrewer::display.brewer.all()
```

<img src="L02_Visualization_Prepare_files/figure-html/scale3-1.png" width="65%" style="display: block; margin: auto;" />

---

```r
ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point() +
* theme(legend.position = "bottom") +
  guides(
*   colour = guide_legend(nrow = 1, override.aes = list(size = 3))
  )
```

<img src="L02_Visualization_Prepare_files/figure-html/scale4-1.png" width="55%" style="display: block; margin: auto;" />

---
layout: true

### .bold[4 Labels, Annotations, Scales, .red[Themes], and More]

---

.pull-left[

```r
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(se = FALSE) +
* theme_minimal()
```

<img src="L02_Visualization_Prepare_files/figure-html/theme1-1.png" width="100%" style="display: block; margin: auto;" />

]

--

.pull-right[

```r
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(se = FALSE) +
* ggthemes::theme_stata() # install it!
```

<img src="L02_Visualization_Prepare_files/figure-html/theme2-1.png" width="100%" style="display: block; margin: auto;" />

]

---

.code80[

```r
theme(line, rect, text, title, aspect.ratio,
      axis.title, axis.title.x, axis.title.x.top, axis.title.x.bottom,
      axis.title.y, axis.title.y.left, axis.title.y.right,
      axis.text, axis.text.x, axis.text.x.top, axis.text.x.bottom,
      axis.text.y, axis.text.y.left, axis.text.y.right,
      axis.ticks, axis.ticks.x, axis.ticks.x.top, axis.ticks.x.bottom,
      axis.ticks.y, axis.ticks.y.left, axis.ticks.y.right,
      axis.ticks.length, axis.ticks.length.x, axis.ticks.length.x.top, axis.ticks.length.x.bottom,
      axis.ticks.length.y, axis.ticks.length.y.left, axis.ticks.length.y.right,
      axis.line, axis.line.x, axis.line.x.top, axis.line.x.bottom,
      axis.line.y, axis.line.y.left, axis.line.y.right,
      legend.background, legend.margin, legend.spacing, legend.spacing.x, legend.spacing.y,
      legend.key, legend.key.size, legend.key.height, legend.key.width,
      legend.text, legend.text.align, legend.title, legend.title.align,
      legend.position, legend.direction, legend.justification,
      legend.box, legend.box.just, legend.box.margin, legend.box.background, legend.box.spacing,
      panel.background, panel.border, panel.spacing, panel.spacing.x, panel.spacing.y,
      panel.grid, panel.grid.major, panel.grid.minor,
      panel.grid.major.x, panel.grid.major.y, panel.grid.minor.x, panel.grid.minor.y,
      panel.ontop,
      plot.background, plot.title, plot.title.position, plot.subtitle,
      plot.caption, plot.caption.position, plot.tag, plot.tag.position, plot.margin,
      strip.background, strip.background.x, strip.background.y, strip.clip, strip.placement,
      strip.text, strip.text.x, strip.text.x.bottom, strip.text.x.top,
      strip.text.y, strip.text.y.left, strip.text.y.right,
      strip.switch.pad.grid, strip.switch.pad.wrap,
      ..., complete = FALSE, validate = TRUE)
```

]

---
layout: false
class: center middle
background-image: url(imgs/xaringan.png)
background-size: 12%
background-position: 50% 40%

<br><br><br><br><br><br><br>
<hr color='#f00' size='2px' width='80%'>
<br>

.Large.red[_**These web slides are powered by the R package [{{`xaringan`}}](https://github.com/yihui/xaringan)!**_]