Data Science Simplified
Keep up with the latest tools and best practices in data science in 2 minutes.
Latest short posts
The default collect method in Polars processes your data as a single batch, which means that all the data must fit into your available memory.
If your data requires more memory than you have available, use the streaming mode to process it in batches. To use streaming mode, simply pass the streaming=True argument to the collect method.
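To make the idea of batch processing concrete, here is a minimal plain-Python sketch. It is not Polars code and does not reflect how Polars implements streaming internally. The helper names (iter_batches, streaming_mean) and the tiny in-memory CSV are hypothetical. The sketch only shows why an aggregate can be computed without ever holding more than one batch in memory.

```python
import csv
import io
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable."""
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


def streaming_mean(rows, column, batch_size=2):
    """Compute a column mean while holding only one batch in memory."""
    total = 0.0
    count = 0
    for batch in iter_batches(rows, batch_size):
        # Only the running total and count survive between batches.
        total += sum(float(row[column]) for row in batch)
        count += len(batch)
    return total / count if count else float("nan")


# Hypothetical in-memory CSV standing in for a file too large to load at once.
data = io.StringIO("price\n10\n20\n30\n40\n50\n")
mean_price = streaming_mean(csv.DictReader(data), "price", batch_size=2)
print(mean_price)
```

Because csv.DictReader reads rows lazily, peak memory here depends on batch_size rather than on the size of the input, which is the same trade-off the streaming mode is meant to offer.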
Ensuring data quality and reliability is crucial in dbt projects.
DataPilot’s Power User VSCode extension provides a seamless testing experience in dbt, enabling you to catch data issues early in the development process. With the extension, you can:
- Validate your dbt code against best practices
- Troubleshoot using real-time column lineage
- Generate dbt tests with ease
On their own, LLMs are limited in how much context they can retain and in their ability to take actions.
Phidata enhances LLMs by integrating three essential components:
- Memory: Enables LLMs to have long-term conversations by storing chat history in a database.
- Knowledge: Provides LLMs with business context by storing information in a vector database.
- Tools: Enable LLMs to take actions like pulling data from an API, sending emails or querying a database.
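The three components above can be sketched in plain Python. This is an illustrative toy, not Phidata's API: the ToyAssistant class and all of its methods are hypothetical, an in-memory SQLite table stands in for the chat-history database, a dict with keyword matching stands in for a vector database, and no real LLM is called.

```python
import sqlite3


class ToyAssistant:
    """Hypothetical wrapper showing memory, knowledge, and tools around an LLM."""

    def __init__(self, knowledge, tools):
        # Memory: chat history stored in a database (in-memory SQLite here).
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE history (role TEXT, content TEXT)")
        # Knowledge: a plain dict stands in for a vector database.
        self.knowledge = knowledge
        # Tools: named callables the assistant is allowed to run.
        self.tools = tools

    def remember(self, role, content):
        self.db.execute("INSERT INTO history VALUES (?, ?)", (role, content))

    def history(self):
        query = "SELECT role, content FROM history ORDER BY rowid"
        return self.db.execute(query).fetchall()

    def lookup(self, query):
        """Keyword match as a stand-in for embedding similarity search."""
        lowered = query.lower()
        return [text for key, text in self.knowledge.items() if key in lowered]

    def build_prompt(self, user_message):
        """Combine retrieved context and chat history into the text an LLM would see."""
        self.remember("user", user_message)
        lines = [f"context: {c}" for c in self.lookup(user_message)]
        lines += [f"{role}: {content}" for role, content in self.history()]
        return "\n".join(lines)

    def call_tool(self, name, **kwargs):
        """Run a registered tool and record the result in memory."""
        result = self.tools[name](**kwargs)
        self.remember("tool", f"{name} -> {result}")
        return result


# Made-up knowledge entry and tool, purely for illustration.
assistant = ToyAssistant(
    knowledge={"refund": "Refunds are processed within 5 days."},
    tools={"add": lambda a, b: a + b},
)
prompt = assistant.build_prompt("What is the refund policy?")
total = assistant.call_tool("add", a=2, b=3)
print(prompt)
print(assistant.history())
```

In a real setup, the prompt would be sent to an LLM, and the model's response would decide which tool to call; here the tool call is made by hand to keep the sketch self-contained.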