Add post about duckplyr performance.
parent 6eefaf6b90
commit b5118b2835
@ -7,6 +7,18 @@ $highlight-grey: #7d828a;
$midnightblue: #2c3e50;
$typewriter: hsl(172, 100%, 36%);

// Scroll to Top Default colors

$stt-stroke:#CCC;
$stt-circle:#3b3e48;
$stt-arrow:#018574;

kbd {
  font-size: 0.9em !important;
  color: inherit;
  background-color: $midnightblue;
}

// Fonts
$fonts: "IBM Plex Sans Light", "Segoe UI", Candara, sans-serif;
$code-fonts: "IBM Plex Mono", Consolas, "Andale Mono WT", "Andale Mono", Menlo, Monaco, monospace;

@ -46,6 +46,9 @@ expiryDate = ["expiryDate"]
# Categories are disabled by default.
# category = "categories"

[markup.goldmark.renderer]
unsafe = true

# Enable to get proper Mathjax support
#[markup]
# [markup.goldmark]

@ -0,0 +1,74 @@
|
||||
<?xml version='1.0' encoding='UTF-8' ?>
|
||||
<svg xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink' class='svglite' width='576.00pt' height='360.00pt' viewBox='0 0 576.00 360.00'>
|
||||
<defs>
|
||||
<style type='text/css'><![CDATA[
|
||||
.svglite line, .svglite polyline, .svglite polygon, .svglite path, .svglite rect, .svglite circle {
|
||||
fill: none;
|
||||
stroke: #000000;
|
||||
stroke-linecap: round;
|
||||
stroke-linejoin: round;
|
||||
stroke-miterlimit: 10.00;
|
||||
}
|
||||
.svglite text {
|
||||
white-space: pre;
|
||||
}
|
||||
]]></style>
|
||||
</defs>
|
||||
<rect width='100%' height='100%' style='stroke: none; fill: #292E32;'/>
|
||||
<defs>
|
||||
<clipPath id='cpMC4wMHw1NzYuMDB8MC4wMHwzNjAuMDA='>
|
||||
<rect x='0.00' y='0.00' width='576.00' height='360.00' />
|
||||
</clipPath>
|
||||
</defs>
|
||||
<g clip-path='url(#cpMC4wMHw1NzYuMDB8MC4wMHwzNjAuMDA=)'>
|
||||
<rect x='0.000000000000064' y='0.00' width='576.00' height='360.00' style='stroke-width: 1.07; stroke: #FFFFFF; fill: #292E32;' />
|
||||
</g>
|
||||
<defs>
|
||||
<clipPath id='cpNTkuMTB8NDIxLjE5fDQ2LjQxfDMwMi45OQ=='>
|
||||
<rect x='59.10' y='46.41' width='362.09' height='256.58' />
|
||||
</clipPath>
|
||||
</defs>
|
||||
<g clip-path='url(#cpNTkuMTB8NDIxLjE5fDQ2LjQxfDMwMi45OQ==)'>
|
||||
<polyline points='59.10,302.99 421.19,302.99 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
|
||||
<polyline points='59.10,238.06 421.19,238.06 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
|
||||
<polyline points='59.10,173.14 421.19,173.14 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
|
||||
<polyline points='59.10,108.22 421.19,108.22 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
|
||||
<circle cx='151.61' cy='98.97' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<circle cx='177.19' cy='91.03' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<circle cx='174.46' cy='99.26' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<circle cx='296.36' cy='101.41' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<circle cx='320.05' cy='105.05' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<circle cx='353.79' cy='91.38' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='325.12,89.89 331.35,100.67 318.89,100.67 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='327.97,92.20 334.20,102.98 321.75,102.98 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='327.74,96.20 333.97,106.98 321.52,106.98 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='167.19,87.42 173.41,98.20 160.96,98.20 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='190.42,81.99 196.65,92.77 184.20,92.77 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='165.99,82.14 172.22,92.92 159.77,92.92 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
</g>
|
||||
<g clip-path='url(#cpMC4wMHw1NzYuMDB8MC4wMHwzNjAuMDA=)'>
|
||||
<polyline points='59.10,302.99 59.10,46.41 ' style='stroke-width: 2.13; stroke: #3B3E48; stroke-linecap: butt;' />
|
||||
<text x='54.17' y='306.91' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='6.29px' lengthAdjust='spacingAndGlyphs'>0</text>
|
||||
<text x='54.17' y='241.99' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='12.59px' lengthAdjust='spacingAndGlyphs'>10</text>
|
||||
<text x='54.17' y='177.07' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='12.59px' lengthAdjust='spacingAndGlyphs'>20</text>
|
||||
<text x='54.17' y='112.14' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='12.59px' lengthAdjust='spacingAndGlyphs'>30</text>
|
||||
<polyline points='56.36,302.99 59.10,302.99 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
|
||||
<polyline points='56.36,238.06 59.10,238.06 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
|
||||
<polyline points='56.36,173.14 59.10,173.14 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
|
||||
<polyline points='56.36,108.22 59.10,108.22 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
|
||||
<polyline points='59.10,302.99 421.19,302.99 ' style='stroke-width: 2.13; stroke: #3B3E48; stroke-linecap: butt;' />
|
||||
<polyline points='157.85,305.73 157.85,302.99 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
|
||||
<polyline points='322.44,305.73 322.44,302.99 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
|
||||
<text x='157.85' y='315.77' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='26.52px' lengthAdjust='spacingAndGlyphs'>dplyr</text>
|
||||
<text x='322.44' y='315.77' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='44.47px' lengthAdjust='spacingAndGlyphs'>duckplyr</text>
|
||||
<text x='240.15' y='329.01' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='36.24px' lengthAdjust='spacingAndGlyphs'>Library</text>
|
||||
<text transform='translate(36.20,174.70) rotate(-90)' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='40.16px' lengthAdjust='spacingAndGlyphs'>Time (s)</text>
|
||||
<rect x='432.15' y='143.95' width='115.50' height='61.50' style='stroke-width: 1.07; stroke: none; fill: #3B3E48;' />
|
||||
<text x='437.63' y='158.61' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='104.54px' lengthAdjust='spacingAndGlyphs'>Laptop power mode</text>
|
||||
<circle cx='446.27' cy='174.05' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='446.27,184.14 452.50,194.92 440.04,194.92 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<text x='460.39' y='177.19' style='font-size: 8.80px;fill: #FFFFFF; font-family: "Arial";' textLength='37.56px' lengthAdjust='spacingAndGlyphs'>balanced</text>
|
||||
<text x='460.39' y='194.47' style='font-size: 8.80px;fill: #FFFFFF; font-family: "Arial";' textLength='53.76px' lengthAdjust='spacingAndGlyphs'>performance</text>
|
||||
<text x='59.10' y='37.76' style='font-size: 13.20px;fill: #FFFFFF; font-family: "Arial";' textLength='224.18px' lengthAdjust='spacingAndGlyphs'>Time elapsed with dplyr vs. duckplyr</text>
|
||||
</g>
|
||||
</svg>
@ -0,0 +1,87 @@
|
||||
<?xml version='1.0' encoding='UTF-8' ?>
|
||||
<svg xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink' class='svglite' width='576.00pt' height='360.00pt' viewBox='0 0 576.00 360.00'>
|
||||
<defs>
|
||||
<style type='text/css'><![CDATA[
|
||||
.svglite line, .svglite polyline, .svglite polygon, .svglite path, .svglite rect, .svglite circle {
|
||||
fill: none;
|
||||
stroke: #000000;
|
||||
stroke-linecap: round;
|
||||
stroke-linejoin: round;
|
||||
stroke-miterlimit: 10.00;
|
||||
}
|
||||
.svglite text {
|
||||
white-space: pre;
|
||||
}
|
||||
]]></style>
|
||||
</defs>
|
||||
<rect width='100%' height='100%' style='stroke: none; fill: #292E32;'/>
|
||||
<defs>
|
||||
<clipPath id='cpMC4wMHw1NzYuMDB8MC4wMHwzNjAuMDA='>
|
||||
<rect x='0.00' y='0.00' width='576.00' height='360.00' />
|
||||
</clipPath>
|
||||
</defs>
|
||||
<g clip-path='url(#cpMC4wMHw1NzYuMDB8MC4wMHwzNjAuMDA=)'>
|
||||
<rect x='0.000000000000064' y='0.00' width='576.00' height='360.00' style='stroke-width: 1.07; stroke: #FFFFFF; fill: #292E32;' />
|
||||
</g>
|
||||
<defs>
|
||||
<clipPath id='cpNTkuMTB8NDIxLjE5fDQ2LjQxfDI3OS4yMw=='>
|
||||
<rect x='59.10' y='46.41' width='362.09' height='232.82' />
|
||||
</clipPath>
|
||||
</defs>
|
||||
<g clip-path='url(#cpNTkuMTB8NDIxLjE5fDQ2LjQxfDI3OS4yMw==)'>
|
||||
<polyline points='59.10,279.23 421.19,279.23 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
|
||||
<polyline points='59.10,225.86 421.19,225.86 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
|
||||
<polyline points='59.10,172.48 421.19,172.48 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
|
||||
<polyline points='59.10,119.11 421.19,119.11 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
|
||||
<polyline points='59.10,65.74 421.19,65.74 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
|
||||
<circle cx='127.08' cy='111.51' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<circle cx='142.09' cy='105.00' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<circle cx='132.50' cy='111.77' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<circle cx='246.17' cy='113.52' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<circle cx='223.06' cy='116.55' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<circle cx='260.64' cy='105.27' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='249.90,102.73 256.13,113.51 243.67,113.51 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='243.36,104.64 249.58,115.42 237.13,115.42 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='218.81,108.00 225.03,118.79 212.58,118.79 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='119.28,100.75 125.50,111.54 113.05,111.54 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='114.27,96.29 120.49,107.07 108.04,107.07 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='131.88,96.40 138.10,107.19 125.65,107.19 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<circle cx='335.41' cy='90.10' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<circle cx='351.34' cy='85.38' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<circle cx='336.95' cy='85.21' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='344.58,92.55 350.81,103.33 338.36,103.33 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='349.73,94.20 355.96,104.98 343.51,104.98 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='344.11,90.98 350.33,101.77 337.88,101.77 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
</g>
|
||||
<g clip-path='url(#cpMC4wMHw1NzYuMDB8MC4wMHwzNjAuMDA=)'>
|
||||
<polyline points='59.10,279.23 59.10,46.41 ' style='stroke-width: 2.13; stroke: #3B3E48; stroke-linecap: butt;' />
|
||||
<text x='54.17' y='283.15' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='6.29px' lengthAdjust='spacingAndGlyphs'>0</text>
|
||||
<text x='54.17' y='229.78' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='12.59px' lengthAdjust='spacingAndGlyphs'>10</text>
|
||||
<text x='54.17' y='176.41' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='12.59px' lengthAdjust='spacingAndGlyphs'>20</text>
|
||||
<text x='54.17' y='123.04' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='12.59px' lengthAdjust='spacingAndGlyphs'>30</text>
|
||||
<text x='54.17' y='69.67' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='12.59px' lengthAdjust='spacingAndGlyphs'>40</text>
|
||||
<polyline points='56.36,279.23 59.10,279.23 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
|
||||
<polyline points='56.36,225.86 59.10,225.86 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
|
||||
<polyline points='56.36,172.48 59.10,172.48 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
|
||||
<polyline points='56.36,119.11 59.10,119.11 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
|
||||
<polyline points='56.36,65.74 59.10,65.74 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
|
||||
<polyline points='59.10,279.23 421.19,279.23 ' style='stroke-width: 2.13; stroke: #3B3E48; stroke-linecap: butt;' />
|
||||
<polyline points='126.99,281.97 126.99,279.23 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
|
||||
<polyline points='240.15,281.97 240.15,279.23 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
|
||||
<polyline points='353.30,281.97 353.30,279.23 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
|
||||
<text x='126.99' y='292.01' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='26.52px' lengthAdjust='spacingAndGlyphs'>dplyr</text>
|
||||
<text x='240.15' y='292.01' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='44.47px' lengthAdjust='spacingAndGlyphs'>duckplyr</text>
|
||||
<text x='353.30' y='292.01' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='44.47px' lengthAdjust='spacingAndGlyphs'>duckplyr</text>
|
||||
<text x='353.30' y='303.89' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='22.26px' lengthAdjust='spacingAndGlyphs'>with</text>
|
||||
<text x='353.30' y='315.77' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='101.25px' lengthAdjust='spacingAndGlyphs'>`as_duckplyr_tibble`</text>
|
||||
<text x='240.15' y='329.01' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='36.24px' lengthAdjust='spacingAndGlyphs'>Library</text>
|
||||
<text transform='translate(36.20,162.82) rotate(-90)' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='40.16px' lengthAdjust='spacingAndGlyphs'>Time (s)</text>
|
||||
<rect x='432.15' y='132.07' width='115.50' height='61.50' style='stroke-width: 1.07; stroke: none; fill: #3B3E48;' />
|
||||
<text x='437.63' y='146.73' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='104.54px' lengthAdjust='spacingAndGlyphs'>Laptop power mode</text>
|
||||
<circle cx='446.27' cy='162.17' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<polygon points='446.27,172.26 452.50,183.04 440.04,183.04 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
|
||||
<text x='460.39' y='165.31' style='font-size: 8.80px;fill: #FFFFFF; font-family: "Arial";' textLength='37.56px' lengthAdjust='spacingAndGlyphs'>balanced</text>
|
||||
<text x='460.39' y='182.59' style='font-size: 8.80px;fill: #FFFFFF; font-family: "Arial";' textLength='53.76px' lengthAdjust='spacingAndGlyphs'>performance</text>
|
||||
<text x='59.10' y='37.76' style='font-size: 13.20px;fill: #FFFFFF; font-family: "Arial";' textLength='224.18px' lengthAdjust='spacingAndGlyphs'>Time elapsed with dplyr vs. duckplyr</text>
|
||||
</g>
|
||||
</svg>
127
content/posts/2024-08-26-duckplyr-performance/index.md
Normal file
@ -0,0 +1,127 @@
|
||||
---
title: Performance experiments with duckplyr in R
description: I recently became aware of the 'duckplyr' library for R. Here are the results of my experimenting with it and benchmarking it against `dplyr`.
date: 2024-08-26T19:00:00+0200
draft: false
# ShowLastmod: true
toc: false
scrolltotop: true
tags:
  - R
  - statistics
---
|
||||
I recently became aware of the [duckplyr][] library for R, which can take the
place of the tidyverse's [dplyr][] library but uses the [DuckDB][] database
under the hood. Without really knowing anything about how dplyr works or whether
DuckDB would improve my workflow at all, I decided to perform an experiment. I am
currently analyzing two datasets, one with ~80k records and ~70 variables and
one with ~60k records and ~100 variables. Both datasets are wrangled with
[Tidyverse][]-foo in multiple ways and finally combined. The wrangling of the
data involves things like `rowwise()` and `c_across()`, which I know from
experience are quite 'expensive' operations.
|
||||
|
||||
In order to get the execution times of my code, I did this repeatedly:

1. Restart R (by pressing <kbd>CTRL</kbd> <kbd>SHIFT</kbd> <kbd>F10</kbd>).
2. Run

   ```r
   system.time(rmarkdown::render("my_file.Rmd"))
   ```

3. Record the user time and the system time elapsed.
4. Repeat twice, for three runs in total.
|
||||
|
||||
I did this with both the "balanced power mode" and the "performance mode" on my
[laptop][]. During execution of the code, I left the laptop alone in order not
to interfere with the timing.
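
The repeated timing could in principle be scripted. The following is only a sketch in base R: `time_runs()` and the toy `workload()` are hypothetical names introduced here for illustration, and, unlike the manual procedure, the sketch does not restart R between runs, so it does not reproduce fresh-session conditions.

```r
# Hypothetical helper (not part of the original workflow): call `f` `n` times
# and collect the timings reported by system.time() for each call.
time_runs <- function(f, n = 3) {
  runs <- lapply(seq_len(n), function(i) {
    tm <- system.time(f())
    data.frame(
      run     = i,
      user    = tm[["user.self"]],
      system  = tm[["sys.self"]],
      elapsed = tm[["elapsed"]]
    )
  })
  do.call(rbind, runs)
}

# Toy workload standing in for rmarkdown::render("my_file.Rmd").
workload <- function() sum(sqrt(seq_len(1e6)))

timings <- time_runs(workload, n = 3)
print(timings)
```

Replacing `workload` with a function that renders the actual document would give one row of user, system and elapsed time per run, but any caching within the session would then carry over from one run to the next.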
|
||||
|
||||
This is the result of my benchmarking:

{{< figure src="benchmarking1.svg" >}}

The times are user times. I left out the system times, which are in the range of
2-3 seconds.
|
||||
|
||||
Not really mind-boggling, right? It occurred to me that I had better double-check
that `duckplyr` was really being used. Indeed, this was _not_ the case:

```r
> class(clinical_data)
[1] "tbl_df"     "tbl"        "data.frame"
```

`clinical_data` was missing the `duckplyr_df` class. How come?
|
||||
|
||||
I import the raw data from Excel files (don't ask...) into tibbles, and
evidently, this prevents `duckplyr` from seeing the data frames. So I piped the
data frames through `as_duckplyr_tibble()` explicitly, and this got me the right
classes:

```r
> class(clinical_data)
[1] "duckplyr_df" "tbl_df"      "tbl"         "data.frame"
```
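
A check like this can be made explicit with `inherits()`. This is a minimal sketch in base R: `uses_duckplyr()` is a hypothetical helper, and the second data frame only imitates the class vector shown above. It does not go through `as_duckplyr_tibble()` and gains no DuckDB backing.

```r
# Hypothetical helper: TRUE if the object carries the "duckplyr_df" class.
uses_duckplyr <- function(x) inherits(x, "duckplyr_df")

plain <- data.frame(id = 1:3)

# Simulation only: prepend the class name by hand to mimic the class vector
# printed above. A real conversion would go through as_duckplyr_tibble().
tagged <- structure(plain, class = c("duckplyr_df", class(plain)))

uses_duckplyr(plain)
uses_duckplyr(tagged)
```

Putting something like `stopifnot(uses_duckplyr(clinical_data))` near the top of an analysis would make a missing class fail loudly instead of going unnoticed.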
|
||||
|
||||
However, this did not really speed up the execution either.

{{< figure src="benchmarking2.svg" >}}

I looked around my R Markdown chunks and their outputs, but I did not find any
warning that `duckplyr` had to fall back to `dplyr`'s methods. Such fallbacks
could have explained the absence of a noticeable difference.
|
||||
|
||||
Here are the mean times and standard deviations (in seconds) for the
benchmarking runs.

```r
> runs_table
# A tibble: 6 × 4
# Groups:   library, power_mode [6]
  library                            power_mode   mean    sd
  <chr>                              <chr>       <dbl> <dbl>
1 dplyr                              balanced     31.8 0.722
2 dplyr                              performance  32.6 0.477
3 duckplyr                           balanced     31.4 1.10
4 duckplyr                           performance  31.3 0.495
5 duckplyr with `as_duckplyr_tibble` balanced     36.0 0.517
6 duckplyr with `as_duckplyr_tibble` performance  33.6 0.303
```
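
To put these numbers into perspective, the means can be expressed relative to the `dplyr` baseline in the same power mode. The sketch below re-enters the six rows from the table above by hand; the column names `dplyr_mean` and `rel_diff_pct` and the object names are made up for this illustration.

```r
# The six rows of the table above, re-entered by hand.
runs <- data.frame(
  library    = rep(c("dplyr", "duckplyr", "duckplyr with `as_duckplyr_tibble`"),
                   each = 2),
  power_mode = rep(c("balanced", "performance"), times = 3),
  mean       = c(31.8, 32.6, 31.4, 31.3, 36.0, 33.6),
  sd         = c(0.722, 0.477, 1.10, 0.495, 0.517, 0.303)
)

# Baseline: the dplyr mean for each power mode.
baseline <- runs[runs$library == "dplyr", c("power_mode", "mean")]
names(baseline)[2] <- "dplyr_mean"

# Attach the baseline to every row and compute the relative difference in %.
cmp <- merge(runs, baseline, by = "power_mode")
cmp$rel_diff_pct <- round(100 * (cmp$mean - cmp$dplyr_mean) / cmp$dplyr_mean, 1)

cmp <- cmp[order(cmp$library, cmp$power_mode),
           c("library", "power_mode", "mean", "sd", "rel_diff_pct")]
print(cmp, row.names = FALSE)
```

Plain `duckplyr` comes out a few percent faster than `dplyr` in both power modes, which, with only three runs per group, is of the same order as the run-to-run spread. The explicit `as_duckplyr_tibble()` variant is slower than `dplyr` in both modes.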
|
||||
|
||||
So at least for my (!!!) use case, using `duckplyr` instead of `dplyr` did
not make any practical difference, and I can also leave my laptop's performance
mode alone. When it comes to optimizing performance, you can't just buy a
solution off the shelf; you always have to try and find the best solution for
your specific problem.

Your mileage will vary, of course. The people who develop `duckplyr` are
brilliant, and the fact that it does not work for me says more about me and my
work than it does about `duckplyr`.
|
||||
|
||||
## The duckplyr demo dataset

As a case in point, the [duckplyr demo repository][duckplyr-demo] contains a
taxi dataset. The ZIP file alone is a ~1.7 GB download. Unzipped, the files
take up 2.4 GB. With about 21 million records (24 variables), this dataset
is _considerably_ larger than mine.
|
||||
|
||||
Here are the results from running `dplyr/run_all_queries.R` and
`duckplyr/run_all_queries.R` on my ThinkPad P14s (performance mode in F40 KDE):

| Library  |   q01 |   q02 |   q03 |    q04 |
|----------|------:|------:|------:|-------:|
| dplyr    | 3.4 s | 3.9 s | 9.1 s | 14.3 s |
| duckplyr | 4.3 s | 4.4 s | 9.4 s | 14.8 s |

I should add that execution times vary with each run, but the big picture stays
the same.
|
||||
|
||||
Maybe I'm missing the point and it's not about execution times, after all.

`¯\_(ツ)_/¯`
|
||||
|
||||
[dplyr]: https://dplyr.tidyverse.org
[duckdb]: https://duckdb.org
[duckplyr]: https://duckplyr.tidyverse.org
[duckplyr-demo]: https://github.com/Tmonster/duckplyr_demo
[laptop]: {{< ref "/posts/2024-08-05-linux-on-thinkpad-P14s-Gen-5" >}}
[tidyverse]: https://tidyverse.org