Skip to content

scalabl3/ray-fundamentals

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Ray Core Demonstration

Complete preparation materials for understanding Ray Core fundamentals.

📋 Overview

This repository contains 5 comprehensive Python files that progressively build your Ray Core expertise:

  1. 01-ray-core-fundamentals.py - Ray basics, initialization, tasks, ObjectRefs
  2. 02-ray-core-advanced-tasks.py - Task dependencies, error handling, performance patterns
  3. 03-ray-core-actors.py - Stateful computing, actor patterns, lifecycle management
  4. 04-ray-core-debugging.py - Troubleshooting, performance debugging, distributed issues
  5. 05-ray-core-scenarios.py - Complete scenarios and challenges

Technical Details

Technical Skills Covered

  • Ray Fundamentals: Initialization, tasks, actors, ObjectRefs
  • Distributed Computing: Task graphs, parallelization patterns
  • Performance Optimization: Resource management, bottleneck identification
  • Error Handling: Fault tolerance, debugging distributed systems
  • Real-world Patterns: ML training, data pipelines, monitoring systems

Question Types

  • Implementation challenges (build X with Ray)
  • Debugging scenarios (fix this slow/broken Ray app)
  • Architecture design (how would you build a distributed system?)
  • Performance optimization (this Ray app is slow, make it faster)
  • Troubleshooting (tasks are hanging/failing, debug it)

🚀 Quick Start

Prerequisites

pip install ray numpy psutil

Run All Tests

python run_all_tests.py

Run Individual Files

python 01-ray-core-fundamentals.py
python 02-ray-core-advanced-tasks.py
python 03-ray-core-actors.py
python 04-ray-core-debugging.py
python 05-ray-core-scenarios.py

📚 Learning Path

Lesson Timeline

Lesson 1: Fundamentals

  • Run and understand 01-ray-core-fundamentals.py
  • Master Ray initialization, tasks, ObjectRefs
  • Practice basic parallel patterns
  • Key concepts: ray.remote, ray.get(), ray.put()

Lesson 2: Advanced Tasks & Actors

  • Study 02-ray-core-advanced-tasks.py
  • Learn task dependencies and chaining
  • Master 03-ray-core-actors.py
  • Understand when to use actors vs tasks
  • Key concepts: Stateful computation, parameter servers

Lesson 3: Debugging & Performance

  • Work through 04-ray-core-debugging.py
  • Learn performance debugging techniques
  • Practice with Ray Dashboard
  • Key concepts: Memory management, resource monitoring

Lesson 4: Scenarios

  • Complete 05-ray-core-scenarios.py
  • Practice explaining solutions
  • Review common questions
  • Mock practice

🎪 File Details

01 - Fundamentals

Core Concepts:

  • Ray initialization and shutdown
  • @ray.remote decorator
  • ObjectRefs vs actual values
  • ray.get() and ray.put()
  • Basic parallel execution

Key Points:

  • When remote functions return ObjectRefs
  • How to avoid killing parallelism with early ray.get()
  • Proper resource specification
  • Common initialization mistakes

02 - Advanced Tasks

Core Concepts:

  • Task dependencies and chaining
  • Dynamic task creation
  • Error handling patterns
  • Performance optimization
  • Resource-aware scheduling

Key Points:

  • Passing ObjectRefs between tasks
  • Building task graphs
  • Handling failures gracefully
  • Map-reduce patterns with Ray

03 - Actors

Core Concepts:

  • Stateful distributed computing
  • Actor lifecycle management
  • Actor-task interaction
  • Performance considerations
  • Common anti-patterns

Key Points:

  • When to use actors vs tasks
  • Parameter server pattern
  • Avoiding actor bottlenecks
  • Proper resource cleanup

04 - Debugging

Core Concepts:

  • Memory and object store issues
  • Performance profiling
  • Distributed debugging
  • Ray Dashboard usage
  • Common error patterns

Key Points:

  • Diagnosing slow Ray applications
  • Memory leak detection
  • Using Ray monitoring tools
  • Timeout and hanging task handling

05 - Challenge Scenarios

Complete Challenges:

  • Distributed data processing pipeline
  • ML parameter server implementation
  • Real-time monitoring system
  • Performance debugging exercise

Simulation:

  • End-to-end system design
  • Debugging broken applications
  • Optimization challenges
  • Architecture discussions

🛠 Common Questions

Implementation Questions

  1. "Implement a distributed data processing pipeline"

    • Multi-stage processing with validation, transformation, aggregation
    • Load balancing across components
    • Error handling and recovery
  2. "Build a parameter server for distributed ML training"

    • Coordinate multiple workers
    • Handle parameter updates
    • Monitor training progress
  3. "Create a real-time monitoring system"

    • High-frequency metric collection
    • Alert detection and management
    • Scalable architecture design

Debugging Questions

  1. "This Ray application is slow - debug and optimize it"

    • Identify sequential ray.get() calls
    • Find resource bottlenecks
    • Optimize task/actor patterns
  2. "Tasks are hanging - what's wrong?"

    • Check resource availability
    • Look for blocking operations
    • Use ray.wait() for timeouts
  3. "Ray is running out of memory - fix it"

    • Monitor object store usage
    • Check for memory leaks in actors
    • Use ray.put() for large shared data

Architecture Questions

  1. "When would you use actors vs tasks?"

    • Actors: Stateful operations, resource management, coordination
    • Tasks: Stateless computation, maximum parallelism
  2. "How do you handle failures in distributed Ray applications?"

    • Task-level error handling with try/catch
    • Actor supervision and restart
    • Graceful degradation strategies
  3. "How do you optimize Ray application performance?"

    • Avoid sequential ray.get() calls
    • Use appropriate resource specifications
    • Monitor and debug with Ray Dashboard

Tips

  • ✅ Run all test files successfully

  • ✅ Practice implementing scenarios from scratch

  • ✅ Understand the "why" behind each pattern

  • ✅ Review Ray Dashboard functionality

  • ✅ Prepare questions about their Ray usage

  • ✅ Ask clarifying questions about requirements

  • ✅ Start with simple solution, then optimize

  • ✅ Explain your thought process clearly

  • ✅ Consider error cases and edge conditions

  • ✅ Discuss trade-offs and alternatives

  • ✅ Use proper Ray patterns and best practices

  • ✅ Handle errors gracefully

  • ✅ Write clean, commented code

  • ✅ Consider scalability and performance

  • ✅ Test your solution

Troubleshooting

Common Setup Issues

Ray won't initialize:

# Check if already initialized
if not ray.is_initialized():
    ray.init()

Import errors:

pip install ray numpy psutil

Tests timing out:

  • Reduce data sizes in examples
  • Check available CPU/memory resources
  • Ensure Ray Dashboard isn't blocking ports

Memory errors:

  • Monitor system memory usage
  • Reduce array sizes in examples
  • Check for running Ray processes: ray stop

Getting Help

  1. Ray Documentation: https://docs.ray.io/
  2. Ray GitHub: https://github.com/ray-project/ray
  3. Ray Discourse: https://discuss.ray.io/
  4. Ray Dashboard: http://localhost:8265 (when Ray is running)

📊 Performance Benchmarks

When you run the test files, expect these approximate runtimes:

  • Fundamentals: ~30 seconds
  • Advanced Tasks: ~45 seconds
  • Actors: ~40 seconds
  • Debugging: ~50 seconds
  • Scenarios: ~60 seconds

If any file takes significantly longer, check for resource constraints or hanging tasks.

🎉 Success Criteria

You're ready when you can:

Implement from scratch:

  • Basic Ray tasks and actors
  • Task dependency chains
  • Error handling patterns
  • Performance optimization

Debug effectively:

  • Identify common bottlenecks
  • Use Ray Dashboard
  • Handle memory issues
  • Fix hanging tasks

Explain clearly:

  • When to use actors vs tasks
  • How Ray scheduling works
  • Performance trade-offs
  • Architecture decisions

Handle scenarios:

  • Design distributed systems
  • Optimize slow applications
  • Build fault-tolerant services
  • Scale Ray applications

🚀 Good Luck!

You now have comprehensive Ray Core preparation materials. Practice implementing each pattern, understand the concepts deeply.

About

Learn the basics of Ray core

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages