Speeding up Databricks SQL queries
Retrieving data from a data warehouse is a common operation for any data scientist. In August 2021, Databricks released a blog post describing how [Databricks] achieved high-bandwidth... »
If you have experience building machine learning models on tabular data, you will know that gradient-boosting-based algorithms like catboost, lightgbm and xgboost are... »
The RandomForest algorithm was introduced by Breiman back in 2001 (paper). In 2022, it is still commonly used by many data scientists. The only... »
In the post Using MkDocs for technical reporting I explained how MkDocs works and why it’s a good choice for writing technical reports. In this post... »
skorecard is an open-source Python package that provides scikit-learn compatible tools for bucketing categorical and numerical features and building traditional credit risk acceptance models (scorecards)... »