Add post about duckplyr performance.

daniel 2024-08-28 14:44:13 +02:00
parent 6eefaf6b90
commit b5118b2835
5 changed files with 303 additions and 0 deletions


@ -7,6 +7,18 @@ $highlight-grey: #7d828a;
$midnightblue: #2c3e50;
$typewriter: hsl(172, 100%, 36%);
// Scroll to Top Default colors
$stt-stroke:#CCC;
$stt-circle:#3b3e48;
$stt-arrow:#018574;
kbd {
font-size: 0.9em !important;
color: inherit;
background-color: $midnightblue;
}
// Fonts
$fonts: "IBM Plex Sans Light", "Segoe UI", Candara, sans-serif;
$code-fonts: "IBM Plex Mono", Consolas, "Andale Mono WT", "Andale Mono", Menlo, Monaco, monospace;


@ -46,6 +46,9 @@ expiryDate = ["expiryDate"]
# Categories are disabled by default.
# category = "categories"
[markup.goldmark.renderer]
unsafe = true
# Enable to get proper Mathjax support
#[markup]
# [markup.goldmark]


@ -0,0 +1,74 @@
<?xml version='1.0' encoding='UTF-8' ?>
<svg xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink' class='svglite' width='576.00pt' height='360.00pt' viewBox='0 0 576.00 360.00'>
<defs>
<style type='text/css'><![CDATA[
.svglite line, .svglite polyline, .svglite polygon, .svglite path, .svglite rect, .svglite circle {
fill: none;
stroke: #000000;
stroke-linecap: round;
stroke-linejoin: round;
stroke-miterlimit: 10.00;
}
.svglite text {
white-space: pre;
}
]]></style>
</defs>
<rect width='100%' height='100%' style='stroke: none; fill: #292E32;'/>
<defs>
<clipPath id='cpMC4wMHw1NzYuMDB8MC4wMHwzNjAuMDA='>
<rect x='0.00' y='0.00' width='576.00' height='360.00' />
</clipPath>
</defs>
<g clip-path='url(#cpMC4wMHw1NzYuMDB8MC4wMHwzNjAuMDA=)'>
<rect x='0.000000000000064' y='0.00' width='576.00' height='360.00' style='stroke-width: 1.07; stroke: #FFFFFF; fill: #292E32;' />
</g>
<defs>
<clipPath id='cpNTkuMTB8NDIxLjE5fDQ2LjQxfDMwMi45OQ=='>
<rect x='59.10' y='46.41' width='362.09' height='256.58' />
</clipPath>
</defs>
<g clip-path='url(#cpNTkuMTB8NDIxLjE5fDQ2LjQxfDMwMi45OQ==)'>
<polyline points='59.10,302.99 421.19,302.99 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
<polyline points='59.10,238.06 421.19,238.06 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
<polyline points='59.10,173.14 421.19,173.14 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
<polyline points='59.10,108.22 421.19,108.22 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
<circle cx='151.61' cy='98.97' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<circle cx='177.19' cy='91.03' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<circle cx='174.46' cy='99.26' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<circle cx='296.36' cy='101.41' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<circle cx='320.05' cy='105.05' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<circle cx='353.79' cy='91.38' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='325.12,89.89 331.35,100.67 318.89,100.67 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='327.97,92.20 334.20,102.98 321.75,102.98 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='327.74,96.20 333.97,106.98 321.52,106.98 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='167.19,87.42 173.41,98.20 160.96,98.20 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='190.42,81.99 196.65,92.77 184.20,92.77 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='165.99,82.14 172.22,92.92 159.77,92.92 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
</g>
<g clip-path='url(#cpMC4wMHw1NzYuMDB8MC4wMHwzNjAuMDA=)'>
<polyline points='59.10,302.99 59.10,46.41 ' style='stroke-width: 2.13; stroke: #3B3E48; stroke-linecap: butt;' />
<text x='54.17' y='306.91' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='6.29px' lengthAdjust='spacingAndGlyphs'>0</text>
<text x='54.17' y='241.99' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='12.59px' lengthAdjust='spacingAndGlyphs'>10</text>
<text x='54.17' y='177.07' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='12.59px' lengthAdjust='spacingAndGlyphs'>20</text>
<text x='54.17' y='112.14' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='12.59px' lengthAdjust='spacingAndGlyphs'>30</text>
<polyline points='56.36,302.99 59.10,302.99 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
<polyline points='56.36,238.06 59.10,238.06 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
<polyline points='56.36,173.14 59.10,173.14 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
<polyline points='56.36,108.22 59.10,108.22 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
<polyline points='59.10,302.99 421.19,302.99 ' style='stroke-width: 2.13; stroke: #3B3E48; stroke-linecap: butt;' />
<polyline points='157.85,305.73 157.85,302.99 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
<polyline points='322.44,305.73 322.44,302.99 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
<text x='157.85' y='315.77' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='26.52px' lengthAdjust='spacingAndGlyphs'>dplyr</text>
<text x='322.44' y='315.77' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='44.47px' lengthAdjust='spacingAndGlyphs'>duckplyr</text>
<text x='240.15' y='329.01' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='36.24px' lengthAdjust='spacingAndGlyphs'>Library</text>
<text transform='translate(36.20,174.70) rotate(-90)' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='40.16px' lengthAdjust='spacingAndGlyphs'>Time (s)</text>
<rect x='432.15' y='143.95' width='115.50' height='61.50' style='stroke-width: 1.07; stroke: none; fill: #3B3E48;' />
<text x='437.63' y='158.61' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='104.54px' lengthAdjust='spacingAndGlyphs'>Laptop power mode</text>
<circle cx='446.27' cy='174.05' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='446.27,184.14 452.50,194.92 440.04,194.92 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<text x='460.39' y='177.19' style='font-size: 8.80px;fill: #FFFFFF; font-family: "Arial";' textLength='37.56px' lengthAdjust='spacingAndGlyphs'>balanced</text>
<text x='460.39' y='194.47' style='font-size: 8.80px;fill: #FFFFFF; font-family: "Arial";' textLength='53.76px' lengthAdjust='spacingAndGlyphs'>performance</text>
<text x='59.10' y='37.76' style='font-size: 13.20px;fill: #FFFFFF; font-family: "Arial";' textLength='224.18px' lengthAdjust='spacingAndGlyphs'>Time elapsed with dplyr vs. duckplyr</text>
</g>
</svg>



@ -0,0 +1,87 @@
<?xml version='1.0' encoding='UTF-8' ?>
<svg xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink' class='svglite' width='576.00pt' height='360.00pt' viewBox='0 0 576.00 360.00'>
<defs>
<style type='text/css'><![CDATA[
.svglite line, .svglite polyline, .svglite polygon, .svglite path, .svglite rect, .svglite circle {
fill: none;
stroke: #000000;
stroke-linecap: round;
stroke-linejoin: round;
stroke-miterlimit: 10.00;
}
.svglite text {
white-space: pre;
}
]]></style>
</defs>
<rect width='100%' height='100%' style='stroke: none; fill: #292E32;'/>
<defs>
<clipPath id='cpMC4wMHw1NzYuMDB8MC4wMHwzNjAuMDA='>
<rect x='0.00' y='0.00' width='576.00' height='360.00' />
</clipPath>
</defs>
<g clip-path='url(#cpMC4wMHw1NzYuMDB8MC4wMHwzNjAuMDA=)'>
<rect x='0.000000000000064' y='0.00' width='576.00' height='360.00' style='stroke-width: 1.07; stroke: #FFFFFF; fill: #292E32;' />
</g>
<defs>
<clipPath id='cpNTkuMTB8NDIxLjE5fDQ2LjQxfDI3OS4yMw=='>
<rect x='59.10' y='46.41' width='362.09' height='232.82' />
</clipPath>
</defs>
<g clip-path='url(#cpNTkuMTB8NDIxLjE5fDQ2LjQxfDI3OS4yMw==)'>
<polyline points='59.10,279.23 421.19,279.23 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
<polyline points='59.10,225.86 421.19,225.86 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
<polyline points='59.10,172.48 421.19,172.48 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
<polyline points='59.10,119.11 421.19,119.11 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
<polyline points='59.10,65.74 421.19,65.74 ' style='stroke-width: 1.07; stroke: #3B3E48; stroke-linecap: butt;' />
<circle cx='127.08' cy='111.51' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<circle cx='142.09' cy='105.00' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<circle cx='132.50' cy='111.77' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<circle cx='246.17' cy='113.52' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<circle cx='223.06' cy='116.55' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<circle cx='260.64' cy='105.27' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='249.90,102.73 256.13,113.51 243.67,113.51 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='243.36,104.64 249.58,115.42 237.13,115.42 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='218.81,108.00 225.03,118.79 212.58,118.79 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='119.28,100.75 125.50,111.54 113.05,111.54 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='114.27,96.29 120.49,107.07 108.04,107.07 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='131.88,96.40 138.10,107.19 125.65,107.19 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<circle cx='335.41' cy='90.10' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<circle cx='351.34' cy='85.38' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<circle cx='336.95' cy='85.21' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='344.58,92.55 350.81,103.33 338.36,103.33 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='349.73,94.20 355.96,104.98 343.51,104.98 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='344.11,90.98 350.33,101.77 337.88,101.77 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
</g>
<g clip-path='url(#cpMC4wMHw1NzYuMDB8MC4wMHwzNjAuMDA=)'>
<polyline points='59.10,279.23 59.10,46.41 ' style='stroke-width: 2.13; stroke: #3B3E48; stroke-linecap: butt;' />
<text x='54.17' y='283.15' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='6.29px' lengthAdjust='spacingAndGlyphs'>0</text>
<text x='54.17' y='229.78' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='12.59px' lengthAdjust='spacingAndGlyphs'>10</text>
<text x='54.17' y='176.41' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='12.59px' lengthAdjust='spacingAndGlyphs'>20</text>
<text x='54.17' y='123.04' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='12.59px' lengthAdjust='spacingAndGlyphs'>30</text>
<text x='54.17' y='69.67' text-anchor='end' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='12.59px' lengthAdjust='spacingAndGlyphs'>40</text>
<polyline points='56.36,279.23 59.10,279.23 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
<polyline points='56.36,225.86 59.10,225.86 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
<polyline points='56.36,172.48 59.10,172.48 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
<polyline points='56.36,119.11 59.10,119.11 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
<polyline points='56.36,65.74 59.10,65.74 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
<polyline points='59.10,279.23 421.19,279.23 ' style='stroke-width: 2.13; stroke: #3B3E48; stroke-linecap: butt;' />
<polyline points='126.99,281.97 126.99,279.23 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
<polyline points='240.15,281.97 240.15,279.23 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
<polyline points='353.30,281.97 353.30,279.23 ' style='stroke-width: 1.07; stroke: #333333; stroke-linecap: butt;' />
<text x='126.99' y='292.01' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='26.52px' lengthAdjust='spacingAndGlyphs'>dplyr</text>
<text x='240.15' y='292.01' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='44.47px' lengthAdjust='spacingAndGlyphs'>duckplyr</text>
<text x='353.30' y='292.01' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='44.47px' lengthAdjust='spacingAndGlyphs'>duckplyr</text>
<text x='353.30' y='303.89' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='22.26px' lengthAdjust='spacingAndGlyphs'>with</text>
<text x='353.30' y='315.77' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='101.25px' lengthAdjust='spacingAndGlyphs'>`as_duckplyr_tibble`</text>
<text x='240.15' y='329.01' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='36.24px' lengthAdjust='spacingAndGlyphs'>Library</text>
<text transform='translate(36.20,162.82) rotate(-90)' text-anchor='middle' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='40.16px' lengthAdjust='spacingAndGlyphs'>Time (s)</text>
<rect x='432.15' y='132.07' width='115.50' height='61.50' style='stroke-width: 1.07; stroke: none; fill: #3B3E48;' />
<text x='437.63' y='146.73' style='font-size: 11.00px;fill: #FFFFFF; font-family: "Arial";' textLength='104.54px' lengthAdjust='spacingAndGlyphs'>Laptop power mode</text>
<circle cx='446.27' cy='162.17' r='4.62' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<polygon points='446.27,172.26 452.50,183.04 440.04,183.04 ' style='stroke-width: 0.71; stroke: none; fill: #FFFFFF; fill-opacity: 0.80;' />
<text x='460.39' y='165.31' style='font-size: 8.80px;fill: #FFFFFF; font-family: "Arial";' textLength='37.56px' lengthAdjust='spacingAndGlyphs'>balanced</text>
<text x='460.39' y='182.59' style='font-size: 8.80px;fill: #FFFFFF; font-family: "Arial";' textLength='53.76px' lengthAdjust='spacingAndGlyphs'>performance</text>
<text x='59.10' y='37.76' style='font-size: 13.20px;fill: #FFFFFF; font-family: "Arial";' textLength='224.18px' lengthAdjust='spacingAndGlyphs'>Time elapsed with dplyr vs. duckplyr</text>
</g>
</svg>



@ -0,0 +1,127 @@
---
title: Performance experiments with duckplyr in R
description: I recently became aware of the `duckplyr` library for R. Here are the results of my experiments with it and of benchmarking it against `dplyr`.
date: 2024-08-26T19:00:00+0200
draft: false
# ShowLastmod: true
toc: false
scrolltotop: true
tags:
- R
- statistics
---
I recently became aware of the [duckplyr][] library for R, which takes the place
of tidyverse's [dplyr][] library but uses the [DuckDB] database under the hood.
Without really knowing anything about how dplyr works, or whether using DuckDB
would improve my workflow at all, I decided to run an experiment. I am
currently analyzing two datasets, one with ~80k records and ~70 variables and
one with ~60k records and ~100 variables. Both datasets are wrangled with
[Tidyverse][]-foo in multiple ways and finally combined. The wrangling
involves things like `rowwise()` and `c_across()`, which I know from
experience to be quite 'expensive' operations.
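For readers unfamiliar with the pattern, here is a minimal sketch of the kind of `rowwise()`/`c_across()` computation meant above. It sits next to a vectorised `rowSums()` equivalent that usually avoids the per-row overhead. The toy columns `x`, `y` and `z` are made up for illustration, and `pick()` assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Toy data; the column names are made up for illustration.
df <- tibble(id = 1:4, x = c(1, 2, 3, 4), y = c(10, 20, 30, 40), z = c(5, 5, 5, 5))

# Row-wise version: the expression is evaluated separately for every row,
# which tends to get slow as the number of rows grows.
rowwise_sum <- df |>
  rowwise() |>
  mutate(total = sum(c_across(x:z))) |>
  ungroup()

# Vectorised version: rowSums() handles all rows in a single call.
# pick() requires dplyr >= 1.1.0.
vectorised_sum <- df |>
  mutate(total = rowSums(pick(x:z)))
```

Both versions give the same totals. How much the vectorised form saves in a real pipeline depends on the data and would need to be measured.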
In order to get the execution times of my code, I did this repeatedly:
1. Restart R (by pressing <kbd>CTRL</kbd> <kbd>SHIFT</kbd> <kbd>F10</kbd>).
2. Run
```r
system.time(rmarkdown::render("my_file.Rmd"))
```
3. Record the user time and the system time.
4. Repeat steps 1-3 twice more, for three runs per configuration.
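The timing loop above was done by hand. A minimal base R sketch of how the bookkeeping could be automated is shown below. `time_runs()` is a hypothetical helper, not part of any package. Unlike the manual procedure, it does not restart R between runs, so effects carried over within one session may make later runs look different.

```r
# Hypothetical helper: call `fn` `n` times and collect the user, system and
# elapsed times reported by system.time() into one data frame.
time_runs <- function(fn, n = 3) {
  runs <- lapply(seq_len(n), function(i) {
    t <- system.time(fn())
    data.frame(
      run = i,
      user = t[["user.self"]],
      system = t[["sys.self"]],
      elapsed = t[["elapsed"]]
    )
  })
  do.call(rbind, runs)
}

# Usage with the file name from the post (runs all in the same R session):
# time_runs(function() rmarkdown::render("my_file.Rmd"), n = 3)
```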
I did this in both the "balanced" and the "performance" power modes of my
[laptop][]. While the code was running, I left the laptop alone so as not to
interfere with the timing.
This is the result of my benchmarking:
{{< figure src="benchmarking1.svg" >}}
The times shown are user times. I left out the system times, which were in the
range of 2-3 seconds.
Not really mind-boggling, right? It occurred to me that I had better double-check
that `duckplyr` was really being used. Indeed, it was _not_:
```r
> class(clinical_data)
[1] "tbl_df" "tbl" "data.frame"
```
`clinical_data` was missing the `duckplyr_df` class. How come?
I import the raw data from Excel files (don't ask...) into tibbles, and
evidently, this prevents `duckplyr` from seeing the data frames. So I piped the
data frames through `as_duckplyr_tibble()` explicitly, and this got me the right
classes:
```r
> class(clinical_data)
[1] "duckplyr_df" "tbl_df" "tbl" "data.frame"
```
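A check like this can be scripted, which is handy when several imported data frames need to be verified at once. This is a minimal sketch. `check_duckplyr()` is a hypothetical helper that only tests for the `duckplyr_df` class seen in the `class()` output. The commented usage assumes the Excel files are read with `readxl::read_excel()`, which the post does not state, and the file name is a placeholder.

```r
# Hypothetical helper: report, for each data frame passed in, whether it
# carries the "duckplyr_df" class shown by class() in the post.
check_duckplyr <- function(...) {
  dfs <- list(...)
  vapply(dfs, inherits, logical(1), what = "duckplyr_df")
}

# Usage sketch (file name and object names are placeholders):
# clinical_data <- readxl::read_excel("clinical_data.xlsx") |>
#   duckplyr::as_duckplyr_tibble()
# check_duckplyr(clinical = clinical_data)
```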
However, this did not speed up the execution either; if anything, it was slightly slower.
{{< figure src="benchmarking2.svg" >}}
I looked through my RMarkdown chunks and their output, but I did not find any
warning that `duckplyr` had to fall back to `dplyr`'s methods. Such a fallback
could have explained the absence of a noticeable difference.
Here are the mean times and standard deviations (in seconds) of the benchmarking runs:
```r
> runs_table
# A tibble: 6 × 4
# Groups:   library, power_mode [6]
  library                            power_mode   mean    sd
  <chr>                              <chr>       <dbl> <dbl>
1 dplyr                              balanced     31.8 0.722
2 dplyr                              performance  32.6 0.477
3 duckplyr                           balanced     31.4 1.10
4 duckplyr                           performance  31.3 0.495
5 duckplyr with `as_duckplyr_tibble` balanced     36.0 0.517
6 duckplyr with `as_duckplyr_tibble` performance  33.6 0.303
```
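A table like `runs_table` can be derived from the individual runs. The post does not show how it was computed, so the following base R sketch is only one possible way. `summarise_runs()` is a hypothetical helper. It expects a data frame with the (assumed) columns `library`, `power_mode` and `user`, and returns the mean and standard deviation for each combination.

```r
# Hypothetical helper: mean and standard deviation of the user time for
# every combination of library and power mode.
summarise_runs <- function(runs) {
  means <- aggregate(user ~ library + power_mode, data = runs, FUN = mean)
  sds <- aggregate(user ~ library + power_mode, data = runs, FUN = sd)
  names(means)[names(means) == "user"] <- "mean"
  names(sds)[names(sds) == "user"] <- "sd"
  merge(means, sds, by = c("library", "power_mode"))
}
```

In dplyr, the same summary could be written as `group_by(library, power_mode) |> summarise(mean = mean(user), sd = sd(user))`.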
So, at least for my (!!!) use case, using `duckplyr` instead of `dplyr` did not
make any practical difference, and I can also leave my laptop's performance
mode alone. When it comes to optimizing performance, you can't just buy a
solution off the shelf; you always have to find the best solution for your
specific problem.
Your mileage may vary, of course. The people who develop `duckplyr` are
brilliant, and the fact that it does not speed up my code says more about me and
my work than about `duckplyr`.
## The duckplyr demo dataset
As a case in point, the [duckplyr demo repository][duckplyr-demo] contains a
taxi dataset. The ZIP file alone is a ~1.7 GB download; unzipped, the files
take up 2.4 GB. With about 21 million records (24 variables), this dataset
is _considerably_ larger than mine.
Here are the results from running `dplyr/run_all_queries.R` and
`duckplyr/run_all_queries.R` on my ThinkPad P14s (performance mode in F40 KDE):
| Library | q01 | q02 | q03 | q04 |
|----------|------:|------:|------:|-------:|
| dplyr | 3.4 s | 3.9 s | 9.1 s | 14.3 s |
| duckplyr | 4.3 s | 4.4 s | 9.4 s | 14.8 s |
I should add that execution times vary with each run, but the big picture stays
the same.
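To put the demo timings in relative terms, the ratio of duckplyr time to dplyr time can be computed directly from the table. The numbers below are copied from it, and nothing else is assumed. Since execution times vary between runs, the ratios describe this particular run only.

```r
# Timings (in seconds) copied from the table of demo queries.
timings <- data.frame(
  query = c("q01", "q02", "q03", "q04"),
  dplyr = c(3.4, 3.9, 9.1, 14.3),
  duckplyr = c(4.3, 4.4, 9.4, 14.8)
)

# A ratio above 1 means duckplyr took longer than dplyr for that query.
timings$ratio <- round(timings$duckplyr / timings$dplyr, 2)
timings
```

For the four queries this gives ratios of 1.26, 1.13, 1.03 and 1.03, so duckplyr was roughly 3% to 26% slower here.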
Maybe I'm missing the point and it's not about execution times, after all.
`¯\_(ツ)_/¯`
[dplyr]: https://dplyr.tidyverse.org
[duckdb]: https://duckdb.org
[duckplyr]: https://duckplyr.tidyverse.org
[duckplyr-demo]: https://github.com/Tmonster/duckplyr_demo
[laptop]: {{< ref "/posts/2024-08-05-linux-on-thinkpad-P14s-Gen-5" >}}
[tidyverse]: https://tidyverse.org