Dataflow Everything
- Authors: Dataflow
This documentation serves as a comprehensive guide to the resources we use for machine learning, data science, and hackathons.
Table of Contents
- Essential ML Tools
- ML Platforms (Notebooks, Datasets, Models, and More)
- Hackathons List
- ML Problem Solving Platforms
- ML Hands-On Learning Platforms
- ML Cheatsheets
- Programming Languages
- Machine Learning Frameworks
- Fast-Prototyping Frameworks
- DevOps Tools
- Free Cloud GPU Providers
- AI Tools
- No Code AI Dev Tools
- ML/AI YouTube Channels
Essential ML Tools
- numpy: A fundamental package for scientific computing with Python, providing support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.
- numba: A JIT compiler that translates a subset of Python and NumPy code into fast machine code.
- pandas: A powerful data manipulation and analysis library for Python, providing data structures like DataFrames for handling structured data.
- polars: A fast DataFrame library implemented in Rust, designed for high performance and memory efficiency, particularly useful for large datasets.
- XGBoost: An optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable, widely used for structured/tabular data.
- YOLO (You Only Look Once): A family of real-time object detection models.
- scikit-learn: A machine learning library for Python that provides simple and efficient tools for data mining and data analysis, built on NumPy, SciPy, and Matplotlib.
- matplotlib: A plotting library for Python, built on NumPy, used for creating static, animated, and interactive visualizations.
- seaborn: A statistical data visualization library based on Matplotlib, providing a high-level interface for drawing attractive and informative statistical graphics.
- plotly: A graphing library for Python that makes interactive, publication-quality graphs online, supporting a wide range of chart types.
- TensorFlow: A comprehensive open-source platform for machine learning.
- PyTorch: A flexible and powerful deep learning framework.
- cuDF: A GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
- cuML: A suite of GPU-accelerated machine learning algorithms.
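As a quick illustration of the array and DataFrame workflows described in the list above, here is a minimal sketch using NumPy and pandas. The data values and the column names (`team`, `score`) are made up for the example and do not come from any real dataset.

```python
import numpy as np
import pandas as pd

# Vectorized arithmetic on a 2-D array: no explicit Python loops.
values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
column_means = values.mean(axis=0)   # mean of each column
centered = values - column_means     # broadcasting subtracts the means row by row

# A small DataFrame with hypothetical columns, aggregated per group.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [10, 20, 30, 50],
})
mean_by_team = df.groupby("team")["score"].mean()

print(centered)
print(mean_by_team)
```

The same pattern (load into arrays or DataFrames, then apply vectorized or grouped operations) carries over to polars and cuDF, although their APIs differ in the details.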
ML Platforms (Notebooks, Datasets, Models, and More)
- Kaggle: A platform for data science competitions, datasets, and collaborative coding, offering a wide range of public datasets and a cloud-based Jupyter notebook environment.
- Hugging Face: A platform for sharing and collaborating on machine learning models, particularly in natural language processing (NLP) and computer vision.
- Google Colab: A free Jupyter notebook environment that runs entirely in the cloud, allowing users to write and execute Python code in the browser with access to powerful hardware like GPUs.
- Jupyter Notebooks: An open-source web application that allows you to create and share documents containing live code, equations, visualizations, and narrative text.
- Groq: Very fast inference for AI models, running on purpose-built chips called Language Processing Units (LPUs).
- AI Datasets Platform in Kazakhstan by Astana Hub: Access government datasets, open source data, and more to kickstart your AI projects.
- DSML.kz: Kazakhstan's largest AI community, fostering knowledge sharing and professional growth in AI/ML.
Hackathons List
ML Problem Solving Platforms
- DeepML: A platform for practicing machine learning problems and improving your skills.
- CodeRun: A platform by Yandex for solving machine learning problems.
- LeetGPU: A platform for solving coding problems on GPUs (CUDA).
- HackerEarth ML: A platform for practicing machine learning problems and improving your skills.
ML Hands-On Learning Platforms
- Google ML Crash Course: A free, self-paced course by Google covering the basics of machine learning with practical exercises.
- Yandex ML Trainings 1.0 (ML Fundamentals): Hands-on training by Yandex in machine learning fundamentals.
- Yandex ML Trainings 2.0 (NLP): Hands-on training by Yandex in natural language processing.
- Yandex ML Trainings 3.0 (Computer Vision): Hands-on training by Yandex in computer vision.
- Yandex ML Handbook: A comprehensive guide covering various machine learning topics and techniques.
- FreeCodeCamp ML: A comprehensive curriculum covering various machine learning topics using Python.
- Coursera ML Courses by DeepLearning.AI: Courses on machine learning and deep learning by Andrew Ng and team.
- Kaggle Learn: Hands-on tutorials and courses on various data science and machine learning topics.
ML Cheatsheets
- Stanford CS 229 - Machine Learning Cheatsheet: A comprehensive cheatsheet covering key concepts and techniques in machine learning.
Programming Languages
- Python: The primary language for machine learning and data science.
- C++: Used for performance-critical components and libraries.
Fast-Prototyping Frameworks
- FastAPI: For building high-performance APIs.
- Streamlit: For creating quick and interactive data-driven web applications.
- Gradio: A user-friendly framework for building machine learning demos and interfaces, serving the same app as a UI for users, an API for software, and an MCP server for LLMs.
DevOps Tools
- Docker: For containerization of applications, ensuring consistency across environments.
- MLflow: For managing the machine learning lifecycle, including experimentation, reproducibility, and deployment.
- Git: Version control system for tracking changes in code and collaborating with team members.
- GitHub: Platform for hosting and managing Git repositories.
- Weights & Biases: A tool for experiment tracking, model management, and collaboration in machine learning projects.
Free Cloud GPU Providers
- Kaggle Kernels: Free access to GPUs for running Jupyter notebooks.
- Google Colab: Free access to GPUs and TPUs for running Jupyter notebooks.
- SageMaker by Amazon: Free tier available for building, training, and deploying machine learning models.
- Gradient by Paperspace: Free tier available for running Jupyter notebooks with GPU support.
AI Tools
- NotebookLM by Google: Useful for research and brainstorming.
- Perplexity AI: AI-powered search engine that provides concise and accurate answers to user queries.
- ChatGPT by OpenAI: Advanced conversational AI model capable of understanding and generating human-like text.
- Claude by Anthropic: An AI assistant designed to be helpful, honest, and harmless.
- Google AI Studio (Gemini) by Google + Google Gemini Cookbook by Google: A platform for building and deploying AI applications using Google's Gemini models.
- Gemini AI Chat by Google: Chat interface for interacting with Google's Gemini models.
- Grok by xAI: AI assistant integrated with X (formerly Twitter) for enhanced user experience.
No Code AI Dev Tools
- Teachable Machine by Google: A web-based tool that makes it easy to create machine learning models without any coding, allowing users to train models using their own data.
- Roboflow: A platform for building and deploying computer vision models without extensive coding.
- Hugging Face AutoTrain: A tool that simplifies the process of training machine learning models, particularly in natural language processing and computer vision, without requiring deep technical expertise.
- Firebase Studio by Google: A no-code development environment for building and deploying applications on Firebase.
- Lovable: A no-code, AI-powered application development platform.
- Bolt.new: A no-code, AI-powered application development platform.