Changelog #20: Data Source descriptions and beta testing of Parquet ingestion
Document your Data Sources by adding descriptions just like you do for pipes and nodes
Here's a quick round-up of the latest product enhancements on our journey to making realtime data analytics delightful.
Data Source descriptions
In a large workspace that contains many Data Sources, you may want more information than just the name of the Data Source. Documentation matters. Now you can add a description to a Data Source like you already do with pipes, nodes and endpoints.
This feature is available through the UI and the CLI. Any new descriptions will propagate to shared sources.
Early beta support for Parquet files
At Tinybird we aim to capture and transform large amounts of data, whatever its origin or format. In addition to CSV and NDJSON, we're working on accepting Parquet files. Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. It is also commonly used as an interchange format between data tools.
Our team is now testing Parquet ingestion into Tinybird. After further testing, the new format will be covered in our docs. If you'd like to join the beta, reach out to us on Slack.
CLI updates!
tb push --subset - We've added the --subset option to tb push, to be used with --populate, so you can populate using only a subset of your data, between 0% and 10% of it. Now you can quickly validate a materialized view against a fraction of your total dataset. For example, you can check that everything works with a single month's worth of data even if you have several years' worth.
Data Source description - We've added the option of adding a description for a Data Source, as you already could for pipes, thereby improving the documentation of your data project.
Endpoints from materialized Data Sources - We've fixed the code so that nodes whose type is materialized can no longer be published as endpoints. Logically, the endpoint should depend on the target Data Source of the materialized node, not on the node itself.
Check out the latest command-line updates in the changelog.
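As a rough illustration of how tb push --subset pairs with --populate, here is a minimal Python sketch that assembles the command line. It rests on assumptions not stated above: that the subset is passed as a fraction (0.1 for 10%), and the helper name build_push_command and the file name my_view.pipe are hypothetical. Check tb push --help for the exact value format before relying on it.

```python
from typing import List


def build_push_command(pipe_file: str, subset: float) -> List[str]:
    """Build a `tb push` argument list that populates with a subset of the data.

    Assumption: --subset takes a fraction, so 0.1 stands for 10% of the data.
    """
    # The changelog describes subsets between 0% and 10%.
    if not 0 < subset <= 0.1:
        raise ValueError("subset must be greater than 0 and at most 0.1 (10%)")
    # --subset is only meaningful together with --populate.
    return ["tb", "push", pipe_file, "--populate", "--subset", str(subset)]


# Hypothetical pipe file name, used for illustration only.
print(" ".join(build_push_command("my_view.pipe", 0.1)))
# prints: tb push my_view.pipe --populate --subset 0.1
```

The resulting list can be handed to subprocess.run on a machine where the Tinybird CLI is installed and authenticated.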
ClickHouse improvements
groupSortedArray - This new aggregation function was added to ClickHouse. groupSortedArray(n)(param1) returns an array with the first n values of the field param1, sorted by itself. groupSortedArray(n)(param1, param2) returns an array with the first n values of param1, sorted by param2 (a field or an expression). This aggregation function is useful if, for example, you want a materialized view that keeps the two most recent values of a field.
ASOF join - The ASOF join performance improvement is now included in Tinybird. The ClickHouse community made this join twice as fast.
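To make the groupSortedArray behavior described above more concrete, here is a small Python sketch of its semantics. This is not ClickHouse's implementation. It assumes ascending sort order, and the function name group_sorted_array and the sample data are made up for illustration.

```python
import heapq
from typing import Any, List, Optional, Sequence


def group_sorted_array(
    n: int,
    values: Sequence[Any],
    sort_keys: Optional[Sequence[Any]] = None,
) -> List[Any]:
    """Return the first n values, sorted by themselves or by sort_keys.

    Assumption: ascending order, mirroring the description in the text.
    """
    if sort_keys is None:
        # One-argument form: the values are ordered by themselves.
        return heapq.nsmallest(n, values)
    # Two-argument form: order the values by a second field or expression.
    pairs = heapq.nsmallest(n, zip(sort_keys, values), key=lambda pair: pair[0])
    return [value for _, value in pairs]


# One-argument form: the two smallest values.
print(group_sorted_array(2, [5, 1, 4, 2]))  # prints [1, 2]

# Two-argument form: the two most recent values, obtained by sorting
# on the negated timestamp so that the newest rows come first.
values = ["a", "b", "c", "d"]
timestamps = [10, 40, 20, 30]
print(group_sorted_array(2, values, [-t for t in timestamps]))  # prints ['b', 'd']
```

Negating the timestamp is one way to express "most recent first" when the sort is ascending. In SQL, the equivalent would be an expression passed as the second parameter.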
Community Slack
Join the conversation and our community in Slack. We’d love to hear what you’re building with Tinybird!