Dataflow Tools

Authors
  • Dataflow Team

This documentation is a guide to the tools and platforms we use for machine learning, data science, and hackathons.

Essential ML Tools

  • numpy: A fundamental package for scientific computing with Python, providing support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.
  • pandas: A powerful data manipulation and analysis library for Python, providing data structures like DataFrames for handling structured data.
  • polars: A fast DataFrame library implemented in Rust, designed for high performance and memory efficiency, particularly useful for large datasets.
  • XGBoost: An optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable, widely used for structured/tabular data.
  • scikit-learn: A machine learning library for Python that provides simple and efficient tools for data mining and data analysis, built on NumPy, SciPy, and Matplotlib.
  • matplotlib: A plotting library for Python, built on NumPy, used for creating static, animated, and interactive visualizations.
  • seaborn: A statistical data visualization library based on Matplotlib, providing a high-level interface for drawing attractive and informative statistical graphics.
  • plotly: A graphing library for Python for creating interactive, publication-quality graphs, with support for a wide range of chart types.
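  • TensorFlow: An open-source, end-to-end platform for building, training, and deploying machine learning models.
  • PyTorch: An open-source deep learning framework built around tensors, automatic differentiation, and dynamic computation graphs.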
  • Keras: A high-level neural networks API written in Python. Keras 3 can run on top of TensorFlow, JAX, or PyTorch; the CNTK and Theano backends supported by older releases are no longer maintained.
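
As a rough illustration of how a few of the libraries above fit together, here is a minimal sketch that generates data with numpy, holds it in a pandas DataFrame, and fits a scikit-learn classifier. The data is synthetic, and the column names (x1, x2, label) and the choice of logistic regression are assumptions made only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, made-up data: two numeric features and a binary label
# that is 1 whenever the two features sum to a positive number.
rng = np.random.default_rng(seed=0)
features = rng.normal(size=(200, 2))
labels = (features[:, 0] + features[:, 1] > 0).astype(int)

# pandas holds the table; the column names are hypothetical.
df = pd.DataFrame(features, columns=["x1", "x2"])
df["label"] = labels

# Hold out 25% of the rows to estimate how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    df[["x1", "x2"]], df["label"], test_size=0.25, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

The same DataFrame could be passed to seaborn or plotly for plotting, or the classifier swapped for XGBoost, without changing the overall shape of the workflow.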

ML Platforms (Notebooks, Datasets, Models, and More)

  • Kaggle: A platform for data science competitions, datasets, and collaborative coding, offering a wide range of public datasets and a cloud-based Jupyter notebook environment.
  • Hugging Face: A platform for sharing and collaborating on machine learning models, particularly in natural language processing (NLP) and computer vision.
  • Google Colab: A free Jupyter notebook environment that runs entirely in the cloud, allowing users to write and execute Python code in the browser with access to powerful hardware like GPUs.
  • Jupyter Notebooks: An open-source web application that allows you to create and share documents containing live code, equations, visualizations, and narrative text.
  • Groq: A platform for low-latency inference with AI models, running on purpose-built chips called Language Processing Units (LPUs).
  • AI Datasets Platform in Kazakhstan by Astana Hub: A platform offering government datasets, open-source data, and more for kickstarting AI projects.
  • DSML.kz: Kazakhstan's largest AI community, fostering knowledge sharing and professional growth in AI/ML.

Hackathons List

ML Problem Solving Platforms

ML Hands-On Learning Platforms

ML Cheatsheets

Programming Languages

  • Python: The primary language for machine learning and data science.
  • C++: Used for performance-critical components and libraries.

Backend Frameworks

  • Flask: A lightweight framework for building simple and scalable web applications.
  • FastAPI: A framework for building high-performance APIs, driven by Python type hints.
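
To give a sense of what serving a model from one of these frameworks looks like, here is a minimal Flask sketch. The /predict route, the "values" field, and the predict_score helper are hypothetical names chosen for this example; predict_score simply averages its inputs and stands in for a real model. Flask's built-in test client is used so the endpoint can be exercised without starting a server.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_score(values):
    """Hypothetical stand-in for a real model: returns the mean of the inputs."""
    return sum(values) / len(values)


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on a missing or invalid JSON body.
    payload = request.get_json(silent=True) or {}
    values = payload.get("values")
    is_valid = (
        isinstance(values, list)
        and len(values) > 0
        and all(isinstance(v, (int, float)) for v in values)
    )
    if not is_valid:
        return jsonify(error="'values' must be a non-empty list of numbers"), 400
    return jsonify(score=predict_score(values))


# Call the endpoint in-process with Flask's test client.
with app.test_client() as client:
    response = client.post("/predict", json={"values": [1.0, 2.0, 3.0]})
    print(response.status_code, response.get_json())
```

A FastAPI version would look broadly similar, with the request body typically declared as a typed model instead of being validated by hand.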

Web Development

  • Streamlit: For creating quick and interactive data-driven web applications.
  • Gradio: A user-friendly framework for building machine learning demos and interfaces. A single app can serve as a UI for users, an API for software, and an MCP server for LLMs.

DevOps Tools

  • Docker: For containerization of applications, ensuring consistency across environments.
  • MLflow: For managing the machine learning lifecycle, including experimentation, reproducibility, and deployment.
  • Git: Version control system for tracking changes in code and collaborating with team members.
  • GitHub: Platform for hosting and managing Git repositories.
  • Weights & Biases: A tool for experiment tracking, model management, and collaboration in machine learning projects.

Cloud Computing

AI Tools

No Code AI Dev Tools

  • Teachable Machine by Google: A web-based tool that makes it easy to create machine learning models without any coding, allowing users to train models using their own data.
  • Roboflow: A platform for building and deploying computer vision models without extensive coding.
  • Hugging Face AutoTrain: A tool that simplifies the process of training machine learning models, particularly in natural language processing and computer vision, without requiring deep technical expertise.
  • Firebase Studio by Google: A browser-based, AI-assisted development environment for prototyping, building, and deploying applications on Firebase, with little or no hand-written code.
  • Lovable: A no-code, AI-powered platform for developing applications.
  • Bolt.new: A no-code, AI-powered platform for developing applications.