![Should you be doing statistical calculations in SQL](https://codingsight.com/wp-content/uploads/2019/07/3-Calculate-the-Median-with-Ranking-Function-2.png)
There are things that just are not possible in SQL, or at least are extremely difficult and convoluted.
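As a rough illustration of "difficult and convoluted", consider the median. This is an assumed example (the text above does not name one), sketched with Python's built-in `sqlite3` module and a hypothetical `measurements` table. It also assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3
import statistics


def sql_median(values):
    """Compute the median in SQL using ranking (window) functions."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, used only for this illustration.
    conn.execute("CREATE TABLE measurements (value REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?)", [(v,) for v in values]
    )
    query = """
        WITH ranked AS (
            SELECT value,
                   ROW_NUMBER() OVER (ORDER BY value) AS rn,
                   COUNT(*) OVER () AS n
            FROM measurements
        )
        -- Integer division picks the middle row (odd n)
        -- or the two middle rows (even n), which are then averaged.
        SELECT AVG(value)
        FROM ranked
        WHERE rn IN ((n + 1) / 2, (n + 2) / 2)
    """
    (median,) = conn.execute(query).fetchone()
    conn.close()
    return median


data = [3, 1, 4, 1, 5]
# The general-purpose language needs a single call for the same result.
assert sql_median(data) == statistics.median(data)
```

The SQL version needs a CTE, two window functions and an integer-division trick, while a general-purpose language typically reduces this to one call (`median(x)` in R, `statistics.median` in Python).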
Should you be doing statistical calculations in SQL?
SQL is built on relational algebra, introduced by Edgar Codd in the 1970s, and this logic for how to join, merge, retrieve, filter and intersect data has truly stood the test of time. For many analysts, SQL is the basis of how they think about tabular data. I have a love-hate relationship with SQL, not least because it was the first coding language I learnt, so I have somewhat nostalgic feelings towards it. With the advent of ELT over ETL, and great tools like Dataform and dbt, SQL pipelines running in your data warehouse have become incredibly powerful. This makes a SQL environment ideal for data pipelines.
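A minimal sketch of those relational operations (join, filter, intersect), run through Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are hypothetical, invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows, used only for this illustration.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada', 'UK'), (2, 'Grace', 'US'), (3, 'Edgar', 'UK');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Join + filter: total order value per UK customer who has ordered.
uk_order_totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.country = 'UK'
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Intersect: ids that are both UK customers and present in orders.
uk_customers_with_orders = conn.execute("""
    SELECT id FROM customers WHERE country = 'UK'
    INTERSECT
    SELECT customer_id FROM orders
    ORDER BY 1
""").fetchall()

print(uk_order_totals)           # [('Ada', 65.0)]
print(uk_customers_with_orders)  # [(1,)]
```

Each query is a direct composition of set-style operations on tables, which is the sense in which SQL reflects relational algebra.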
![Should you be doing statistical calculations in SQL](https://i.stack.imgur.com/hUl0X.png)
BigQuery is also very easy to use, and integrates well with other GCP products I might use for data pipelines - this is also true for Redshift in the AWS ecosystem.
![Should you be doing statistical calculations in SQL](https://image2.slideserve.com/4457472/what-statistical-analysis-should-i-use1-l.jpg)
I regularly use SQL and R to perform data analysis and manage my data transformations and data pipelines. I love both languages, and have used them for a number of years now. Which is better is a tough question to answer, as they both have their own strengths and weaknesses for different tasks. In this post I’ll go through my perceived strengths and weaknesses of each language, and where each should and perhaps shouldn’t be used.

SQL is the lingua franca for data manipulation and transformation, as well as permanent storage and management. SQL has come a long way since its initial development by IBM in the 1970s, and is now quite capable of performing relatively sophisticated analysis.

SQL’s 3 biggest strengths (as I see them) are performance, ease of use and scalability. Even a small database running on your laptop will be able to perform calculations and transformations on datasets far bigger than would be possible in desktop spreadsheet applications such as Excel or Google Sheets. For a start, there are no row limits (like the 1,048,576 row limit in Excel or the 10 million cell limit in Google Sheets). As well as this, databases such as PostgreSQL and MySQL, and modern data warehouses such as BigQuery, Redshift and Snowflake, are highly optimised to perform these types of row-by-row calculations.

If I want to perform analysis on a large dataset, I will almost always try to move my data into something like BigQuery. Its ability to crunch through data quickly and cost-effectively is fantastic.
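To make the point about a small laptop database concrete, here is a minimal sketch that loads more rows than Excel's 1,048,576 row limit into an in-memory SQLite database and aggregates them. The `events` table and its synthetic values are assumptions made up for this example. SQLite stands in for "a small database", and timings will vary by machine.

```python
import sqlite3

EXCEL_ROW_LIMIT = 1_048_576
n_rows = EXCEL_ROW_LIMIT + 100_000  # deliberately more than Excel can hold

conn = sqlite3.connect(":memory:")
# Hypothetical table filled with synthetic data for illustration only.
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, bucket INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    ((i, i % 10, i * 0.5) for i in range(1, n_rows + 1)),
)

(row_count,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()

# A typical grouped aggregation over every row in the table.
per_bucket = conn.execute("""
    SELECT bucket, COUNT(*) AS n, AVG(amount) AS avg_amount
    FROM events
    GROUP BY bucket
    ORDER BY bucket
""").fetchall()

print(row_count > EXCEL_ROW_LIMIT)  # True
print(len(per_bucket))              # 10
```

On typical hardware this should finish in a few seconds, and the same query would run unchanged on a table many times larger.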