Python Multiprocessing

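Python multiprocessing is a way of achieving parallelism in Python by running multiple processes at the same time on different CPUs or cores of a computer. Each process has its own interpreter and its own memory, so in standard CPython builds the processes are not held back by the global interpreter lock (GIL) the way threads are. This makes multiprocessing an effective way to speed up CPU-bound tasks that can be split into smaller, independent parts.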

Multiprocessing in Python is achieved using the multiprocessing module, which provides an API for spawning and managing new processes. The module includes a Process class that can be used to create new processes and a Pool class that can be used to manage a pool of worker processes.

Here is an example of how to use the Process class to create and start a new process:

from multiprocessing import Process

def my_function():
    print("Hello from process")

if __name__ == "__main__":
    p = Process(target=my_function)
    p.start()
    p.join()

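In this example, a new process is created using the Process class, and the target argument is set to the function that will run in the new process. The start() method launches the process, and the join() method makes the main process wait until the new process has finished. The if __name__ == "__main__": guard matters: with the spawn start method (the default on Windows and macOS), the child process imports the main module, and without the guard it would run the process-creation code again.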
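A target function usually needs input. The following is a minimal sketch of passing positional arguments through the args tuple and checking each process's exit code after join(). The names greet and run_workers are made up for illustration and are not part of the multiprocessing module.

```python
from multiprocessing import Process


def greet(name, times):
    # Runs in the child process; `name` and `times` arrive via `args`.
    for _ in range(times):
        print(f"Hello from {name}")


def run_workers(names):
    # Hypothetical helper: one process per name, each greeting twice.
    workers = [Process(target=greet, args=(name, 2), name=name) for name in names]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    # exitcode is 0 when the target returned normally.
    return [worker.exitcode for worker in workers]


if __name__ == "__main__":
    print(run_workers(["worker-a", "worker-b"]))
```

The order in which the greetings are printed can vary between runs, because the processes are scheduled independently.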

The Pool class is used when you have a set of tasks to perform in parallel. Here is an example of how to use the Pool class to parallelize a loop:

from multiprocessing import Pool

def square(x):
    return x*x

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(square, [1, 2, 3, 4, 5])
        print(results)

In this example, a Pool of 4 worker processes is created, and the map() method applies the square() function to each element in the list. The map() method blocks until all the tasks are completed and returns the results as a list, in the same order as the inputs.
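Besides map(), the Pool class has other ways to submit work. The sketch below shows two of them under simple assumptions: starmap() for functions that take more than one argument, and apply_async() for submitting a single task without blocking. The function names power and compute are illustrative only.

```python
from multiprocessing import Pool


def power(base, exponent):
    return base ** exponent


def compute():
    # Hypothetical helper that exercises starmap() and apply_async().
    with Pool(processes=2) as pool:
        # starmap() unpacks each tuple into the function's arguments.
        powers = pool.starmap(power, [(2, 3), (3, 2), (10, 0)])
        # apply_async() returns immediately; get() waits for the result.
        pending = pool.apply_async(power, (2, 10))
        single = pending.get(timeout=30)
    return powers, single


if __name__ == "__main__":
    print(compute())
```

Like map(), starmap() returns its results in input order. The timeout passed to get() is an arbitrary safety margin chosen for this sketch.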

Multiprocessing in Python can be a powerful tool for achieving parallelism and speeding up CPU-bound tasks, but it also comes with overhead and complexity: starting processes takes time, and arguments and results have to be pickled and copied between processes. It is important to understand these trade-offs and to use multiprocessing only where the work per task is large enough to justify them.

Python Multiprocessing Classes:

In Python’s multiprocessing module, there are several classes that can be used for creating and managing processes. Here are some of the most commonly used classes:

  1. Process: This is the most basic class in the multiprocessing module. It is used to create a new process by passing a target function or method to the constructor. The Process class provides methods for starting and stopping the process, and for waiting for the process to finish.
  2. Pool: This class is used to create a pool of worker processes, which can be used to execute multiple tasks in parallel. The Pool class provides methods for submitting tasks to the pool, waiting for tasks to complete, and terminating the pool.
  3. Queue: This class provides a simple way to communicate between processes using a queue that is safe to use from multiple threads and processes. Processes can put items into the queue using the put() method, and get items from the queue using the get() method. Items must be picklable, because they are serialized when they pass between processes.
  4. Value and Array: These provide a way to share data between processes through shared memory. Value stores a single value, while Array stores a fixed-length array of values of one type. By default both are wrapped with a lock that protects individual reads and writes, but compound operations such as counter.value += 1 are not atomic and should be done while holding the lock returned by get_lock().
  5. Manager: Calling Manager() starts a server process that holds shared objects like lists, dictionaries, and namespaces, and returns a manager object for working with them. The manager provides methods for creating new shared objects, which other processes access through proxies, and for shutting down the server process.

These classes are just some of the many tools available in the multiprocessing module. By using these classes, you can create and manage processes, share data between processes, and coordinate the execution of multiple tasks in parallel.
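As a rough illustration of Queue-based communication, here is a small sketch in which worker processes read numbers from one queue and write squares to another. The None sentinel used to stop the workers is a convention chosen for this example, not something the module requires, and worker and square_all are hypothetical names.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # Assumed stop marker for this sketch.


def worker(tasks, results):
    # Keep pulling items until the stop marker arrives.
    while True:
        item = tasks.get()
        if item is SENTINEL:
            break
        results.put(item * item)


def square_all(numbers, n_workers=2):
    tasks, results = Queue(), Queue()
    workers = [Process(target=worker, args=(tasks, results)) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for n in numbers:
        tasks.put(n)
    for _ in workers:
        tasks.put(SENTINEL)  # One stop marker per worker.
    # Read every result before joining, so no worker is left blocked
    # trying to flush data into a queue that nobody is reading.
    squares = [results.get() for _ in numbers]
    for w in workers:
        w.join()
    return sorted(squares)  # Arrival order is not guaranteed, so sort.


if __name__ == "__main__":
    print(square_all([1, 2, 3, 4]))
```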

Python Multiprocessing Example:

Here is an example of how to use the multiprocessing module in Python to parallelize a task:

import multiprocessing

def square(x):
    return x * x

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(square, [1, 2, 3, 4, 5])
    print(results)

In this example, the square function takes an integer and returns its square. We create a Pool object with 4 worker processes using the with statement, which shuts the pool down when the block exits. We then use the map method to apply the square function to each element in the list [1, 2, 3, 4, 5]. The map method returns a list of the results in input order, which we store in the results variable and print after the pool has been closed.

When we run the script, the output will be:

[1, 4, 9, 16, 25]

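This example demonstrates how little code it takes to parallelize a task with the multiprocessing module. By creating a Pool object and using the map method, we distributed the calls to square across the worker processes. For a function this cheap, the cost of starting processes and passing data between them will usually outweigh any gain, so the example illustrates the API rather than a real speedup. The benefit appears when each call does a substantial amount of CPU-bound work.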
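Whether a pool actually beats a plain loop depends on how much work each task does, so it is worth measuring. The sketch below times both approaches on a deliberately slow prime-counting function. The function names and the input sizes are assumptions made for this illustration, and the timings will differ from machine to machine.

```python
import time
from multiprocessing import Pool


def count_primes(limit):
    # Deliberately naive trial division, so that each call is CPU-heavy.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_parallel(limits, processes):
    with Pool(processes=processes) as pool:
        return pool.map(count_primes, limits)


def timed(fn):
    # Returns the function's result and the elapsed wall-clock seconds.
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare(limits, processes=4):
    serial, t_serial = timed(lambda: [count_primes(x) for x in limits])
    # The parallel timing includes the cost of starting the pool.
    parallel, t_parallel = timed(lambda: run_parallel(limits, processes))
    return serial, parallel, t_serial, t_parallel


if __name__ == "__main__":
    serial, parallel, t_serial, t_parallel = compare([200_000] * 8)
    print(f"same results: {serial == parallel}")
    print(f"sequential: {t_serial:.2f}s, pool: {t_parallel:.2f}s")
```

Shrinking the inputs (for example to [100] * 8) is a quick way to see the point at which the pool's overhead dominates.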

Commonly Used Functions of Multiprocessing:

The multiprocessing module in Python provides several classes and factory functions for creating and managing processes. Here are some commonly used ones, shown with their most important arguments:

  1. Process(target, args): This creates a new process object but does not start it; the process begins running only when its start() method is called. The target argument is a callable object that will be called in the new process, and the args argument is a tuple of arguments to pass to the target function.
  2. Pool(processes): This creates a pool of worker processes that can be used to execute tasks in parallel. The processes argument is the number of worker processes to create in the pool; if it is omitted, the pool uses the number of CPUs reported by the operating system.
  3. Queue(): This function creates a new queue object that can be used for inter-process communication. Processes can put items into the queue using the put() method, and get items from the queue using the get() method.
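  4. Value(typecode, value): This function creates a shared value object that can be accessed by multiple processes. The typecode argument specifies the type of the value (e.g. 'i' for a signed integer or 'd' for a double), and the value argument is the initial value of the shared variable. By default the object carries a lock, available through get_lock(), which should be held around read-modify-write operations such as +=.
  5. Array(typecode, size): This function creates a shared array object that can be accessed by multiple processes. The typecode argument specifies the type of the elements (e.g. 'i' for a signed integer). The second argument is either the length of the array, in which case the elements start at zero, or a sequence of initial values.
  6. Lock(): This function creates a new lock object that can be used to synchronize access to shared resources. A lock can be acquired using the acquire() method and released using the release() method, or used in a with statement so that it is released even if an error occurs.
  7. Manager(): This function starts a server process and returns a manager object that can be used to create and manage shared objects like lists, dictionaries, and namespaces. Other processes work with these objects through proxies, which is more flexible than Value and Array but slower, because every operation is a round trip to the server process. Individual method calls on a proxy are handled one at a time, but an update made of several calls (read, then write) still needs a lock.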
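To make the shared-memory items in the list more concrete, here is a minimal sketch that combines Value, Array, and the lock that comes with a Value. The helper names record and run_counter are hypothetical, and the worker and repeat counts are arbitrary.

```python
from multiprocessing import Array, Process, Value


def record(counter, slots, index, repeats):
    for _ in range(repeats):
        # += is a read followed by a write, so hold the lock around it.
        with counter.get_lock():
            counter.value += 1
    # Each worker writes only to its own slot in the array.
    slots[index] = repeats


def run_counter(n_workers=4, repeats=1000):
    counter = Value("i", 0)         # Shared signed int, starts at 0.
    slots = Array("i", n_workers)   # Shared int array, zero-initialized.
    workers = [
        Process(target=record, args=(counter, slots, i, repeats))
        for i in range(n_workers)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value, slots[:]  # Slicing copies the array into a list.


if __name__ == "__main__":
    print(run_counter())
```

Without the get_lock() block, increments from different processes can overwrite each other and the final count may come out lower than expected.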

These are just a few examples of the many functions provided by the multiprocessing module. By using these functions, you can create and manage processes, share data between processes, and coordinate the execution of multiple tasks in parallel.
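As one more illustration, the following sketch uses a Manager to share a dictionary between processes. It assumes a simple word-counting task, and the names tally and count_words are invented for the example. A manager-created lock guards the update, since reading a count and writing it back are two separate proxy calls.

```python
from multiprocessing import Manager, Process


def tally(words, counts, lock):
    for word in words:
        # get() and the assignment are separate calls to the manager's
        # server process, so the pair is protected with a lock.
        with lock:
            counts[word] = counts.get(word, 0) + 1


def count_words(chunks):
    # The with block shuts the manager's server process down on exit.
    with Manager() as manager:
        counts = manager.dict()
        lock = manager.Lock()
        workers = [Process(target=tally, args=(chunk, counts, lock)) for chunk in chunks]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        # Copy into a plain dict before the manager goes away.
        return counts.copy()


if __name__ == "__main__":
    print(count_words([["a", "b", "a"], ["b", "c"]]))
```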