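Python multiprocessing is a way of achieving parallelism in Python by running multiple processes simultaneously on different CPUs or cores of a computer. Each process has its own Python interpreter and memory space, so CPU-bound code is not serialized by the Global Interpreter Lock (GIL) the way it is with threads. This makes multiprocessing an effective way to speed up CPU-bound tasks that can be split into smaller, independent parts.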
Multiprocessing in Python is achieved using the `multiprocessing` module, which provides an API for spawning and managing new processes. The module includes a `Process` class that can be used to create new processes and a `Pool` class that can be used to manage a pool of worker processes.
Here is an example of how to use the `Process` class to create and start a new process:
```python
from multiprocessing import Process

def my_function():
    print("Hello from process")

if __name__ == "__main__":
    p = Process(target=my_function)
    p.start()  # launch the child process
    p.join()   # wait for the child to finish
```
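In this example, a new process is created using the `Process` class, and the `target` argument is set to the function that will be run in the new process. The `start()` method is then called to start the new process, and the `join()` method is called to wait for the new process to complete before continuing with the main process. The `if __name__ == "__main__":` guard is important: with the "spawn" start method (the default on Windows and macOS), each child process re-imports the main module, and the guard keeps that import from trying to create new processes again.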
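To pass arguments to the target, `Process` also accepts an `args` tuple. The sketch below starts several processes before joining any of them, so they actually run at the same time, and then checks each child's `exitcode`. The function name `greet` and the names passed to it are hypothetical, chosen only for illustration. Note the trailing comma in `(name,)`: without it, `args` would be a plain string rather than a one-element tuple.

```python
import os
from multiprocessing import Process

def greet(name):
    # Runs inside the child process; os.getpid() returns the child's own PID
    print(f"Hello {name} from process {os.getpid()}")

if __name__ == "__main__":
    # args must be a tuple, hence the trailing comma in (name,)
    procs = [Process(target=greet, args=(name,)) for name in ("alpha", "beta", "gamma")]
    for p in procs:
        p.start()  # launch every child before waiting, so they run concurrently
    for p in procs:
        p.join()   # block until this child has finished
    # exitcode is 0 when the target function returned without raising
    exit_codes = [p.exitcode for p in procs]
    print(exit_codes)  # [0, 0, 0]
```

The order of the "Hello" lines can vary from run to run, because the operating system schedules the children independently.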
The `Pool` class is used when you have a set of tasks to perform in parallel. Here is an example of how to use the `Pool` class to parallelize a loop:
```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(square, [1, 2, 3, 4, 5])
    print(results)
```
In this example, a `Pool` of 4 processes is created, and the `map()` method is used to apply the `square()` function to each element in the list. The `map()` method blocks until all the tasks are completed and returns the results in a list, in the same order as the input.
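`map()` only works with single-argument functions. As a sketch of two other standard `Pool` methods: `starmap()` unpacks each tuple into positional arguments, and `apply_async()` submits a single task and returns an `AsyncResult` immediately. The `power` function is a hypothetical example.

```python
from multiprocessing import Pool

def power(base, exp):
    return base ** exp

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # starmap unpacks each tuple into power(base, exp); order is preserved
        powers = pool.starmap(power, [(2, 3), (3, 2), (10, 0)])
        # apply_async returns at once; get() waits for the result
        pending = pool.apply_async(power, (5, 2))
        # Collect inside the with block: leaving it terminates the workers
        single = pending.get(timeout=10)
    print(powers)  # [8, 9, 1]
    print(single)  # 25
```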
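Multiprocessing in Python can be a powerful tool for achieving parallelism and speeding up CPU-bound tasks, but it also comes with overhead and complexity: starting processes takes time, and the arguments and results passed between processes are pickled and copied. For tasks that do little work per call, this overhead can outweigh the gain, and I/O-bound work is usually better served by threads or `asyncio`. It is important to understand these trade-offs and to use multiprocessing where each task does enough CPU-bound work to pay for them.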
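One practical limitation follows from this: a `Pool` pickles each task (a reference to the function plus its arguments) before sending it to a worker. The sketch below uses only the standard `pickle` module to show which callables survive that step. Module-level functions are pickled by reference to their module and name, while a lambda has no importable name and cannot be pickled.

```python
import pickle

def square(x):
    return x * x

# Module-level functions are pickled by reference (module name plus
# qualified name), so they can be sent to worker processes
restored = pickle.loads(pickle.dumps(square))
print(restored(4))  # 16

# A lambda has no importable name, so pickling it fails; this is why
# something like pool.map(lambda x: x * x, data) raises an error
try:
    pickle.dumps(lambda x: x * x)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False
print(lambda_picklable)  # False
```

In practice this means that functions passed to `Pool` methods should be defined at the top level of a module.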
Python Multiprocessing Classes:
In Python’s `multiprocessing` module, there are several classes that can be used for creating and managing processes. Here are some of the most commonly used classes:
- `Process`: This is the most basic class in the `multiprocessing` module. It is used to create a new process by passing a target function or method to the constructor. The `Process` class provides methods for starting the process (`start()`), stopping it (`terminate()`), and waiting for it to finish (`join()`).
- `Pool`: This class is used to create a pool of worker processes, which can be used to execute multiple tasks in parallel. The `Pool` class provides methods for submitting tasks to the pool, waiting for tasks to complete, and terminating the pool.
- `Queue`: This class provides a simple way to communicate between processes using a queue that is safe to use from multiple processes and threads. Processes can put items into the queue using the `put()` method and get items from the queue using the `get()` method.
- `Value` and `Array`: These classes provide a way to share data between processes through shared memory. `Value` is used for storing a single value, while `Array` is used for storing an array of values. By default, both are wrapped with a lock so that individual reads and writes are synchronized, but compound operations such as `counter.value += 1` are not atomic and must be protected explicitly, for example with the object's `get_lock()`.
- `Manager`: This class provides a way to create a server process that can manage shared objects like lists, dictionaries, and namespaces. The `Manager` class provides methods for creating new shared objects, accessing existing shared objects through proxies, and terminating the server process (`shutdown()`).
These classes are just some of the many tools available in the `multiprocessing` module. By using these classes, you can create and manage processes, share data between processes, and coordinate the execution of multiple tasks in parallel.
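As an illustration of `Queue`, here is a minimal producer/consumer sketch. The function names and the choice of `None` as an end-of-stream marker (`SENTINEL`) are assumptions made for this example, not something `Queue` requires. The parent reads the result before calling `join()`, because a process that has put items on a queue may wait for them to be flushed before it can exit.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # hypothetical end-of-stream marker chosen for this sketch

def producer(q, items):
    for item in items:
        q.put(item)   # each item is pickled and sent through a pipe
    q.put(SENTINEL)   # tell the consumer there is nothing more

def consumer(q, out):
    total = 0
    while True:
        item = q.get()  # blocks until an item is available
        if item is SENTINEL:
            break
        total += item
    out.put(total)      # report the result back to the parent

if __name__ == "__main__":
    work, results = Queue(), Queue()
    procs = [
        Process(target=producer, args=(work, range(1, 11))),
        Process(target=consumer, args=(work, results)),
    ]
    for p in procs:
        p.start()
    total = results.get()  # read before join() so the consumer can exit
    for p in procs:
        p.join()
    print(total)  # 55
```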
Python Multiprocessing Example:
Here is an example of how to use the `multiprocessing` module in Python to parallelize a task:
```python
import multiprocessing

def square(x):
    return x * x

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(square, [1, 2, 3, 4, 5])
    print(results)
```
In this example, the `square` function takes an integer and returns its square. We create a `Pool` object with 4 worker processes using the `with` statement, which ensures the pool is shut down when the block exits. On exit, the `with` statement calls the pool's `terminate()` method, which is safe here because `map` has already returned every result. We then use the `map` method to apply the `square` function to each element in the list `[1, 2, 3, 4, 5]`. The `map` method returns a list of the results, which we store in the `results` variable.
When we run the script, the output will be:
```
[1, 4, 9, 16, 25]
```
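This example demonstrates how easy it is to use the `multiprocessing` module to parallelize a task. By creating a `Pool` object and using the `map` method, we were able to execute the `square` function on each element of the list in parallel. For a function as cheap as `square` on five inputs, however, the cost of starting the workers and passing data between processes outweighs the work itself, so this version is likely slower than a plain loop. A significant speedup appears only when each call does substantial CPU-bound work.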
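To see where the speedup does appear, the sketch below times a deliberately CPU-heavy function (a hypothetical `cpu_heavy`, summing squares in pure Python) run serially and through a `Pool`. `Pool()` with no arguments uses `os.cpu_count()` workers, and `chunksize=4` sends tasks to workers in batches of four to reduce per-task overhead. The input sizes are arbitrary assumptions; timings depend on the machine and core count, so no output is shown.

```python
import time
from multiprocessing import Pool

def cpu_heavy(n):
    # Deliberately CPU-bound: pure-Python loop with no I/O
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [500_000] * 16

    start = time.perf_counter()
    serial = [cpu_heavy(n) for n in inputs]
    serial_time = time.perf_counter() - start

    start = time.perf_counter()
    with Pool() as pool:  # defaults to os.cpu_count() worker processes
        # chunksize batches tasks so each pickle round trip carries more work
        parallel = pool.map(cpu_heavy, inputs, chunksize=4)
    parallel_time = time.perf_counter() - start

    print(f"serial: {serial_time:.2f}s, pool: {parallel_time:.2f}s")
```

On a multi-core machine the pool version should finish noticeably faster here, while the same comparison with `square` on five numbers would typically favor the plain loop.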
Commonly Used Functions of Multiprocessing:
The `multiprocessing` module in Python provides several functions and class constructors for creating and managing processes. Here are some commonly used ones:
- `Process(target, args)`: This creates a new process object. The `target` argument is a callable object that will be called in the new process, and the `args` argument is a tuple of arguments to pass to the target function. Constructing the object does not run anything; the process starts only when its `start()` method is called.
- `Pool(processes)`: This creates a pool of worker processes that can be used to execute tasks in parallel. The `processes` argument is the number of worker processes to create in the pool; if it is omitted or `None`, the number returned by `os.cpu_count()` is used.
- `Queue()`: This creates a new queue object that can be used for inter-process communication. Processes can put items into the queue using the `put()` method and get items from the queue using the `get()` method.
- `Value(typecode, value)`: This creates a shared value object that can be accessed by multiple processes. The `typecode` argument specifies the type of the value (e.g. `'i'` for a C `int`, `'d'` for a C `double`), and the `value` argument is the initial value of the shared variable.
- `Array(typecode, size)`: This creates a shared array object that can be accessed by multiple processes. The `typecode` argument specifies the element type (e.g. `'i'` for a C `int`). The second argument is either an integer size, in which case the array starts zero-filled, or a sequence used to initialize the array.
- `Lock()`: This creates a new lock object that can be used to synchronize access to shared resources. A lock can be acquired using the `acquire()` method and released using the `release()` method, or used as a context manager in a `with` statement.
- `Manager()`: This creates a new manager object, backed by a separate server process, that can be used to create and manage shared objects like lists, dictionaries, and namespaces. Each operation on a shared object is forwarded to the server process, so processes can share ordinary Python data structures without handling shared memory directly. Compound updates such as `d[key] += 1` are still not atomic and need a lock, which the manager can also provide.
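The sketch below combines `Value`, `Array`, and `Lock`. Every worker increments a shared counter 1,000 times while holding the counter's built-in lock (`get_lock()`), writes its own slot of a shared array, and uses a separate `Lock` only to keep its print line from interleaving with the others. The worker function `work` and the loop counts are hypothetical choices for illustration.

```python
from multiprocessing import Array, Lock, Process, Value

def work(index, counter, squares, print_lock):
    squares[index] = index * index  # each process writes only its own slot
    for _ in range(1000):
        # "+=" is a read followed by a write, so hold the Value's built-in
        # lock to stop other processes from interleaving between the two
        with counter.get_lock():
            counter.value += 1
    with print_lock:  # a plain Lock used as a context manager
        print(f"worker {index} done")

if __name__ == "__main__":
    counter = Value("i", 0)  # shared C int, initial value 0
    squares = Array("i", 4)  # shared array of 4 C ints, zero-filled
    print_lock = Lock()
    procs = [Process(target=work, args=(i, counter, squares, print_lock)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # 4000
    print(squares[:])     # [0, 1, 4, 9]
```

Without the `get_lock()` block, the final count could come out lower than 4000, because two processes could read the same old value before either writes back.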
These are just a few examples of the many functions provided by the `multiprocessing` module. By using these functions, you can create and manage processes, share data between processes, and coordinate the execution of multiple tasks in parallel.
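As a final sketch, here is `Manager()` sharing a dictionary between processes. `manager.dict()` returns a proxy whose method calls run in the manager's server process, and `manager.Lock()` guards the one read-modify-write update. The function `record` and the `"total"` key are hypothetical names for this example.

```python
from multiprocessing import Manager, Process

def record(shared, lock, key):
    shared[key] = len(key)  # a single proxy call, handled by the server
    with lock:
        # get-then-set is two separate proxy calls, so guard it with a lock
        shared["total"] = shared.get("total", 0) + 1

if __name__ == "__main__":
    with Manager() as manager:  # starts the manager's server process
        shared = manager.dict()  # proxy to a dict living in that process
        lock = manager.Lock()
        procs = [Process(target=record, args=(shared, lock, k)) for k in ("a", "bb", "ccc")]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        snapshot = shared.copy()  # plain dict copy, taken before shutdown
    print(sorted(snapshot.items()))  # [('a', 1), ('bb', 2), ('ccc', 3), ('total', 3)]
```

The copy is taken inside the `with` block because the proxies stop working once the manager's server process shuts down.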