{"id":112000,"date":"2025-01-02T22:21:39","date_gmt":"2025-01-02T22:21:39","guid":{"rendered":"https:\/\/www.timestored.com\/b\/?p=112000"},"modified":"2025-01-02T22:28:54","modified_gmt":"2025-01-02T22:28:54","slug":"the-big-data-trends-of-2024","status":"publish","type":"post","link":"https:\/\/www.timestored.com\/b\/the-big-data-trends-of-2024\/","title":{"rendered":"The Big Data Events of 2024"},"content":{"rendered":"<div id=\"attachment_112001\" style=\"width: 310px\" class=\"wp-caption alignright\"><img aria-describedby=\"caption-attachment-112001\" loading=\"lazy\" class=\"wp-image-112001 size-medium\" src=\"https:\/\/www.timestored.com\/b\/wp-content\/uploads\/2025\/01\/pyarrowdowloads-300x157.png\" alt=\"pyarrow downloads\" width=\"300\" height=\"157\" srcset=\"https:\/\/www.timestored.com\/b\/wp-content\/uploads\/2025\/01\/pyarrowdowloads-300x157.png 300w, https:\/\/www.timestored.com\/b\/wp-content\/uploads\/2025\/01\/pyarrowdowloads-768x402.png 768w, https:\/\/www.timestored.com\/b\/wp-content\/uploads\/2025\/01\/pyarrowdowloads.png 820w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><p id=\"caption-attachment-112001\" class=\"wp-caption-text\">PyArrow downloads<\/p><\/div>\n<ol>\n<li><strong>Open source tools are now as performant as pre-existing commercial offerings<\/strong> for data analysis and in many ways offer more features.<br \/>\nProof: See the time-series benchmarks and note how many are open source: <a href=\"https:\/\/www.timestored.com\/data\/time-series-database-benchmarks\">https:\/\/www.timestored.com\/data\/time-series-database-benchmarks<\/a><\/li>\n<li><strong>Everyone has discovered that column-oriented storage<\/strong> and vector execution is the secret to fast analytics.<\/li>\n<li><strong>Arrow format has won.<\/strong> It is now a cornerstone technology used across Python (pandas, Polars), DuckDB and R.<br \/>\n<a
href=\"https:\/\/www.reddit.com\/r\/datascience\/comments\/12dbhsg\/pandas_20_is_going_live_and_apache_arrow_will\/\">Pandas 2.0 adds Arrow-backed data types<\/a>, <a href=\"https:\/\/duckdb.org\/2021\/12\/03\/duck-arrow.html\">DuckDB quacks Arrow<\/a>, <a href=\"https:\/\/community.questdb.com\/t\/whats-the-eta-of-supporting-of-arrow-adbc\/202\">QuestDB will support Arrow<\/a>, <a href=\"https:\/\/www.influxdata.com\/blog\/introduction-apache-arrow\/\">InfluxDB<\/a> (2023), and <a href=\"https:\/\/github.com\/pola-rs\/polars\">Polars<\/a> is built upon Apache Arrow.<\/li>\n<li><strong>Apache Parquet has won<\/strong> as the lowest common denominator for basic data storage.<br \/>\n<a href=\"https:\/\/questdb.com\/blog\/questdb-release-8-1-0\/\">QuestDB queries Parquet<\/a>, <a href=\"https:\/\/duckdb.org\/2021\/06\/25\/querying-parquet.html\">DuckDB supports Parquet (2021)<\/a>, <a href=\"https:\/\/clickhouse.com\/blog\/apache-parquet-clickhouse-local-querying-writing\">ClickHouse<\/a> queries and writes it, and <a href=\"https:\/\/docs.greptime.com\/contributor-guide\/datanode\/data-persistence-indexing\/\">GreptimeDB<\/a> uses Arrow and Parquet.<\/li>\n<li>Iceberg vs Delta Lake vs Hudi: <strong>Iceberg won<\/strong>.
<a href=\"https:\/\/aws.amazon.com\/about-aws\/whats-new\/2024\/12\/amazon-s3-tables-apache-iceberg-tables-analytics-workloads\/\">AWS announcement<\/a>.<\/li>\n<\/ol>\n<h2>Trends of 2024<\/h2>\n<p><strong>DuckDB is on course to become the de facto column-oriented database<\/strong> against which all others will be compared.<br \/>\nClickHouse won over a number of enterprises, but the difficulty of deploying it and getting started now looks like the key factor that held it back.<\/p>\n<div id=\"attachment_112002\" style=\"width: 835px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-112002\" loading=\"lazy\" class=\"wp-image-112002 size-full\" src=\"https:\/\/www.timestored.com\/b\/wp-content\/uploads\/2025\/01\/duckdbdownloads.png\" alt=\"DuckDB Downloads\" width=\"825\" height=\"428\" srcset=\"https:\/\/www.timestored.com\/b\/wp-content\/uploads\/2025\/01\/duckdbdownloads.png 825w, https:\/\/www.timestored.com\/b\/wp-content\/uploads\/2025\/01\/duckdbdownloads-300x156.png 300w, https:\/\/www.timestored.com\/b\/wp-content\/uploads\/2025\/01\/duckdbdownloads-768x398.png 768w\" sizes=\"(max-width: 825px) 100vw, 825px\" \/><p id=\"caption-attachment-112002\" class=\"wp-caption-text\">DuckDB Downloads<\/p><\/div>\n<div id=\"attachment_112005\" style=\"width: 754px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-112005\" loading=\"lazy\" class=\"wp-image-112005 size-full\" src=\"https:\/\/www.timestored.com\/b\/wp-content\/uploads\/2025\/01\/duckdb-stars.png\" alt=\"DuckDB Stars\" width=\"744\" height=\"491\" srcset=\"https:\/\/www.timestored.com\/b\/wp-content\/uploads\/2025\/01\/duckdb-stars.png 744w, https:\/\/www.timestored.com\/b\/wp-content\/uploads\/2025\/01\/duckdb-stars-300x198.png 300w\" sizes=\"(max-width: 744px) 100vw, 744px\" \/><p id=\"caption-attachment-112005\" class=\"wp-caption-text\">DuckDB Stars<\/p><\/div>\n<h2>Underlying Factors<\/h2>\n<p>Why have SQL and Python won?
In many ways these are terrible languages (Python&#8217;s GIL, SQL&#8217;s set theory), yet they won. I can&#8217;t name all the reasons, but here are some that I believe worked in their favour:<\/p>\n<ol>\n<li><strong>Open Source + Free = Hard to beat.<\/strong> We&#8217;ve seen open-source companies (license disputes notwithstanding) take over every area. VCs and startups have realised that making big money selling dev tools requires solving two problems, technology and distribution, and the harder one is now distribution. The important thing is getting your product into the hands and heads of as many people as possible. Once there, you can withhold all useful enterprise features and charge for them, assuming AWS doesn&#8217;t try the same trick. I do wonder if this is causing the death of small but otherwise viable software businesses.<\/li>\n<li>Google became a second brain, one that works on keyword search. Languages that rely on heavy overloading are harder to search than languages with many distinct function names. <strong>Google makes it easier to find uniquely named functions<\/strong> like the ones Python has. Does anyone still read the manual, never mind the 500+ page language bibles that were the only way to learn a language 20 years ago?<\/li>\n<li><strong>AI<\/strong> &#8211; It hasn&#8217;t been a factor to date, but AI offers a benefit similar to Google&#8217;s, only stronger. The more data and usage a language has, the better the chance that AI can write your code or your queries. Will this <strong>reinforce the advantage that fully expanded syntax and popularity already provide?<\/strong> APL could become even <a href=\"https:\/\/www.sacrideo.us\/is-apl-dead\/\">more dead<\/a> than it already is.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>Open source tools are now as performant as pre-existing commercial offerings for data analysis and in many ways offer more features.
Proof: See the time-series benchmarks and note how many are open source: https:\/\/www.timestored.com\/data\/time-series-database-benchmarks Everyone has discovered that column-oriented storage and vector execution is the secret to fast analytics. Arrow format has won. It is [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0},"categories":[115,23],"tags":[],"_links":{"self":[{"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/posts\/112000"}],"collection":[{"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/comments?post=112000"}],"version-history":[{"count":3,"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/posts\/112000\/revisions"}],"predecessor-version":[{"id":112007,"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/posts\/112000\/revisions\/112007"}],"wp:attachment":[{"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/media?parent=112000"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/categories?post=112000"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/tags?post=112000"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}