Dataflow Tools
- Authors: Dataflow Team
This documentation is a guide to the tools and platforms we use for machine learning, data science, and hackathons.
Table of Contents
- Essential ML Tools
- ML Platforms (Notebooks, Datasets, Models, and More)
- Hackathons List
- ML Problem Solving Platforms
- ML Hands-On Learning Platforms
- ML Cheatsheets
- Programming Languages
- Machine Learning Frameworks
- Backend Frameworks
- Web Development
- DevOps Tools
- Cloud Computing
- AI Tools
- No Code AI Dev Tools
Essential ML Tools
- numpy: A fundamental package for scientific computing with Python, providing support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.
- pandas: A powerful data manipulation and analysis library for Python, providing data structures like DataFrames for handling structured data.
- polars: A fast DataFrame library implemented in Rust, designed for high performance and memory efficiency, particularly useful for large datasets.
- XGBoost: An optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable, widely used for structured/tabular data.
- scikit-learn: A machine learning library for Python that provides simple and efficient tools for data mining and data analysis, built on NumPy, SciPy, and Matplotlib.
- matplotlib: A plotting library for Python and its numerical mathematics extension NumPy, used for creating static, animated, and interactive visualizations in Python.
- seaborn: A statistical data visualization library based on Matplotlib, providing a high-level interface for drawing attractive and informative statistical graphics.
- plotly: A graphing library for Python that makes interactive, publication-quality graphs online, supporting a wide range of chart types.
- TensorFlow: A comprehensive open-source platform for machine learning.
- PyTorch: A flexible and powerful deep learning framework.
- Keras: A high-level neural networks API written in Python. Keras 3 runs on top of TensorFlow, JAX, or PyTorch; earlier versions also supported the now-discontinued CNTK and Theano backends.
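To make the roles of these libraries more concrete, here is a minimal sketch that uses NumPy to generate synthetic data and scikit-learn to train and score a classifier. The dataset, the function name `train_demo`, and all parameter values are illustrative assumptions, not taken from any Dataflow project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_demo(n_per_class: int = 200, seed: int = 0) -> float:
    """Train a classifier on two synthetic 2-D blobs and return test accuracy."""
    rng = np.random.default_rng(seed)
    # Two Gaussian clusters centred at (-2, -2) and (+2, +2); purely synthetic data.
    class0 = rng.normal(loc=-2.0, scale=1.0, size=(n_per_class, 2))
    class1 = rng.normal(loc=2.0, scale=1.0, size=(n_per_class, 2))
    X = np.vstack([class0, class1])
    y = np.concatenate(
        [np.zeros(n_per_class, dtype=int), np.ones(n_per_class, dtype=int)]
    )

    # Hold out 25% of the rows for evaluation, keeping the classes balanced.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = LogisticRegression()
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


if __name__ == "__main__":
    print(f"test accuracy: {train_demo():.3f}")
```

The same split, fit, and score pattern carries over to other estimators, including the scikit-learn compatible wrappers that XGBoost provides.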
ML Platforms (Notebooks, Datasets, Models, and More)
- Kaggle: A platform for data science competitions, datasets, and collaborative coding, offering a wide range of public datasets and a cloud-based Jupyter notebook environment.
- Hugging Face: A platform for sharing and collaborating on machine learning models, particularly in natural language processing (NLP) and computer vision.
- Google Colab: A free Jupyter notebook environment that runs entirely in the cloud, allowing users to write and execute Python code in the browser with access to powerful hardware like GPUs.
- Jupyter Notebooks: An open-source web application that allows you to create and share documents containing live code, equations, visualizations, and narrative text.
- Groq: A low-latency inference platform for AI models, running on purpose-built chips called Language Processing Units (LPUs).
- AI Datasets Platform in Kazakhstan by Astana Hub: Access government datasets, open source data, and more to kickstart your AI projects.
- DSML.kz: Kazakhstan's largest AI community, fostering knowledge sharing and professional growth in AI/ML.
Hackathons List
ML Problem Solving Platforms
ML Hands-On Learning Platforms
- Google ML Crash Course
- Yandex ML Trainings 1.0 (ML Fundamentals)
- Yandex ML Trainings 2.0 (NLP)
- Yandex ML Trainings 3.0 (Computer Vision)
- Yandex ML Handbook
- FreeCodeCamp ML
- Coursera ML Courses by DeepLearning.AI
- Kaggle Learn
ML Cheatsheets
Programming Languages
- Python: The primary language for machine learning and data science.
- C++: Used for performance-critical components and libraries.
Backend Frameworks
- Flask: A lightweight framework for building simple and scalable web applications.
- FastAPI: A modern framework for building high-performance APIs, with request validation driven by Python type hints.
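As a rough illustration of how a model might be served from one of these frameworks, the sketch below exposes a single prediction endpoint in Flask. The `/predict` route, the `features` field, and the `predict_score` function are hypothetical placeholders; a real service would load a trained model instead of averaging its inputs.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_score(features: list) -> float:
    """Hypothetical stand-in for a trained model: returns the mean of the inputs."""
    return sum(features) / len(features)


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    # Reject missing, empty, or non-numeric input with a 400 response.
    if (
        not isinstance(features, list)
        or not features
        or not all(isinstance(x, (int, float)) for x in features)
    ):
        return jsonify(error="'features' must be a non-empty list of numbers"), 400
    return jsonify(score=predict_score(features))


if __name__ == "__main__":
    app.run(debug=True)  # development server only, not for production
```

An equivalent FastAPI service would typically declare the request body as a Pydantic model, so the manual validation shown here would be handled by the framework.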
Web Development
- Streamlit: For creating quick and interactive data-driven web applications.
- Gradio: A user-friendly framework for building machine learning demos and interfaces. A single app can serve as a web UI for users, an API for other software, and an MCP server for LLMs.
DevOps Tools
- Docker: For containerization of applications, ensuring consistency across environments.
- MLflow: For managing the machine learning lifecycle, including experimentation, reproducibility, and deployment.
- Git: Version control system for tracking changes in code and collaborating with team members.
- GitHub: Platform for hosting and managing Git repositories.
- Weights & Biases: A tool for experiment tracking, model management, and collaboration in machine learning projects.
Cloud Computing
- AWS (Amazon Web Services): Used for scalable cloud computing solutions.
- Microsoft Azure: Provides cloud services for building, testing, deploying, and managing applications.
AI Tools
- NotebookLM by Google
- Perplexity AI
- ChatGPT by OpenAI
- Claude by Anthropic
- Google AI Studio (Gemini) and the Gemini Cookbook by Google
- Gemini AI Chat by Google
- Grok by xAI
No Code AI Dev Tools
- Teachable Machine by Google: A web-based tool that makes it easy to create machine learning models without any coding, allowing users to train models using their own data.
- Roboflow: A platform for building and deploying computer vision models without extensive coding.
- Hugging Face AutoTrain: A tool that simplifies training machine learning models, particularly in natural language processing and computer vision, without requiring deep technical expertise.
- Firebase Studio by Google: A browser-based, AI-assisted development environment for prototyping, building, and deploying applications on Firebase with little or no hand-written code.
- Lovable: A no-code, AI-powered application development platform.
- Bolt.new: A no-code, AI-powered application development platform.