Parallel Grouped Aggregation in DuckDB
TL;DR: DuckDB has a fully parallelized aggregate hash table that can efficiently aggregate over millions of groups. Grouped aggregations are a core data analysis command. They are particularly important for large-scale data analysis (“OLAP”) because they are used to compute statistical summaries of huge tables. DuckDB contains a highly optimized parallel aggregation capability for fast…
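Here is a minimal Python sketch of two-phase parallel grouped aggregation, assuming a SUM over (key, value) rows. In the first phase, each worker pre-aggregates its chunk into thread-local, hash-partitioned tables. In the second phase, each partition is merged independently. The partition count and function names are hypothetical. This illustrates the general technique and is not DuckDB's implementation, which is written in C++. Because of Python's GIL, the threads here show the structure rather than a real speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4  # hypothetical partition count, not a DuckDB setting


def local_aggregate(chunk):
    """Phase 1: pre-aggregate one chunk of (key, value) rows into
    thread-local hash tables, split into partitions by hash(key)."""
    partitions = [defaultdict(int) for _ in range(NUM_PARTITIONS)]
    for key, value in chunk:
        partitions[hash(key) % NUM_PARTITIONS][key] += value
    return partitions


def merge_partition(tables):
    """Phase 2: merge one partition across all workers' local tables.
    Partitions hold disjoint key sets, so no locking is needed."""
    merged = defaultdict(int)
    for table in tables:
        for key, total in table.items():
            merged[key] += total
    return merged


def parallel_group_sum(rows, num_workers=4):
    """Equivalent of SELECT key, SUM(value) ... GROUP BY key (hypothetical helper)."""
    chunk_size = max(1, len(rows) // num_workers)
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        local_tables = list(pool.map(local_aggregate, chunks))
        # Gather partition p from every worker, then merge partitions in parallel.
        per_partition = [[tables[p] for tables in local_tables]
                         for p in range(NUM_PARTITIONS)]
        merged = list(pool.map(merge_partition, per_partition))
    result = {}
    for part in merged:
        result.update(part)  # partitions are disjoint, so update never collides
    return result


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(sorted(parallel_group_sum(rows).items()))  # [('a', 4), ('b', 7), ('c', 4)]
```

Partitioning by hash is what lets the merge phase run in parallel: every key lands in exactly one partition, so each partition can be finalized by a different worker without coordination.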
Tag Archives: Database
Jailer: A tool for database subsetting, schema and data browsing
It exports consistent, referentially intact row-sets from relational databases. It removes obsolete data without violating integrity. It is DBMS-agnostic (it uses JDBC) and platform-independent, and it generates DbUnit datasets, hierarchically structured XML, and topologically sorted SQL-DML.
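Topologically sorted DML means ordering tables so that every row a foreign key points to is inserted before the row that references it. Here is a minimal Python sketch using the standard library's `graphlib`, assuming a hypothetical four-table schema. Jailer itself is a Java tool and may handle details such as cyclic foreign keys differently. This sketch simply raises `graphlib.CycleError` on a cycle.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables its foreign keys reference.
fk_parents = {
    "customer": set(),
    "product": set(),
    "orders": {"customer"},
    "order_item": {"orders", "product"},
}


def insert_order(graph):
    """Tables ordered so referenced (parent) tables come before the tables
    that reference them; INSERTs in this order never violate a foreign key."""
    return list(TopologicalSorter(graph).static_order())


def delete_order(graph):
    """Reverse of the insert order: delete referencing rows first."""
    return list(reversed(insert_order(graph)))


print(insert_order(fk_parents))
print(delete_order(fk_parents))
```

Several valid orders usually exist (here `customer` and `product` may come in either order). Any of them is safe, as long as each parent precedes its children.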
Are You Sure You Want to Use MMAP in Your Database Management System?
Memory-mapped (MMAP) file I/O is an OS-provided feature that maps the contents of a file on secondary storage into a program’s address space. The program then accesses pages through pointers as if the file resided entirely in memory. The OS transparently loads pages only when the program references them and automatically evicts pages if memory…
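Python's standard `mmap` module shows the access pattern described above. Once a file is mapped, the program indexes into it like an in-memory byte array, with no explicit `read()` calls. Here is a minimal sketch, assuming a temporary file of a few OS pages. The function name `page_markers` is hypothetical.

```python
import mmap
import tempfile


def page_markers(num_pages=4):
    """Write a file of `num_pages` OS pages (page i filled with byte i, so
    num_pages must be <= 256), map it, and read the first byte of each page
    through the mapping."""
    with tempfile.TemporaryFile() as f:
        for i in range(num_pages):
            f.write(bytes([i]) * mmap.PAGESIZE)
        f.flush()
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Plain indexing: the OS faults each page in on first reference
            # and decides on its own when to evict it again.
            return [mm[p * mmap.PAGESIZE] for p in range(num_pages)]


print(page_markers())  # [0, 1, 2, 3]
```

The program never says which pages to load or evict. That convenience is exactly the control a database system gives up when it relies on MMAP instead of its own buffer pool.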