Concurrency and parallelism

Photo by Hikmet on Unsplash

You wake up in the morning and need to get ready for the office. A chat notification on your mobile asks you to send an email with some urgently required information.

You go to the kitchen, start the burner, and pour in the milk, water, and chai patti to prepare tea. The tea will take about five minutes to be ready. Instead of waiting in the kitchen for those five minutes, you log in to your computer to finish the email task.

The tea is getting ready (thread 1) while you simultaneously send the email (thread 2); once the email is sent, you switch your attention back to the kitchen. Real-world scenarios like this are what motivated multithreading APIs in programming languages.

The difference between parallelism and concurrency

Cut a brinjal with your left hand and a tomato with your right hand, using two knives simultaneously. This is parallelism.

Cut both the brinjal and the tomato with one hand, switching between the two. Time-slicing the cutting task between the brinjal and the tomato is concurrency.

Pros of Concurrency

Concurrency leads to more responsive systems that do not keep us waiting, by working on tasks in a time-sliced manner. When one task waits for something, say a page load, the CPU scheduler jumps to other tasks; this decreases the overall runtime.

Suppose you have 100 units of work to be performed by 100 individuals, one task per individual. If the tasks are executed sequentially and one individual fails, the next cannot continue. But when 100 individuals work concurrently, one failure-prone individual's task has little impact and can be completed by another individual. With this notion, fault-tolerant and resilient software can be built, with no single point of failure.

With one CPU core, threads execute concurrently, i.e. in a time-sliced manner. Threads execute in parallel when there are multiple cores, with one thread handled per core.

The benefit of concurrency increases when there are more CPU cores. With one core (one hand), you cut the tomato and the brinjal in a time-sliced manner: the tomato takes time t and the brinjal another t, for a total of 2t. With two cores (two hands), one core cuts the tomato in time t while, in that same time t, the other core cuts the brinjal; the total time taken is t. One CPU core takes 2t to perform the same actions that two CPU cores complete in t.

Concurrency isn’t always great.

Not all code can be made concurrent. Only pieces of code that execute independently of each other can run concurrently; pieces of code that depend on each other cannot.

Concurrency is all about time-slicing different tasks. Given task A, task B, and task C: first perform task A for a few milliseconds, then switch to task B, then to task C for a few milliseconds, and then back to task A. The CPU scheduler decides how to time-slice the given tasks; the sequence in which tasks are executed is not always the same. The scheduler is responsible for ordering the tasks and deciding their priority, which determines when context switches between tasks happen.

A context switch is a standard operating-systems concept. Which task execution switches to is not deterministic or fixed: the next time around, the CPU scheduler might schedule the same set of tasks in a different order.

Take an example: calculate the product of all the numbers in an array.

  • The array has 100 numbers. Calculating the product of the array elements takes time t. This is the scenario of one CPU core with one thread.
  • Divide the array into 4 parts of 25 elements each. Each part takes t/4 time: t/4 for part 1, t/4 for part 2, t/4 for part 3, t/4 for part 4. Four threads on one CPU core, with concurrency executing each thread in a time-sliced manner. Can this bring the total down to t/4? Is the time taken to execute all the threads really t/4?
  • Let's dig deeper to see whether scenario #2 takes t/4 to complete the execution of the 4 threads. Here, all four tasks are CPU intensive; none of them waits for the network or the disk. Task A takes t/4, task B takes t/4, task C takes t/4, and task D takes t/4. The four tasks are executed in a time-sliced manner, each taking t/4 on its own, so the total time to complete all 4 tasks is [t/4 + t/4 + t/4 + t/4] = t.
  • The total time to complete all 4 tasks executed concurrently on a 1-core CPU is t, not t/4, because all 4 tasks are CPU intensive. Purely CPU-intensive concurrent tasks/threads executing on one CPU core provide no time saving.
  • This is not a scenario where task A and task B wait for the network or the disk while context switches let task C and task D run. Concurrent tasks or threads that are not purely CPU intensive can provide a time saving: while a non-CPU-intensive task waits, the CPU is free for another thread, which cumulatively saves time across the set of threads/tasks.
  • CPU-intensive tasks spend most of their time on computation and little on disk I/O or network data exchange. Non-CPU-intensive tasks require not only the CPU but also a fair amount of network communication and disk I/O. All of this is in the context of a single CPU core executing the set of tasks or threads.
  • Nowadays, machines are no longer single core; they are multi-core. Threads can end up executing on multiple cores in parallel instead of in a time-sliced manner.
  • Summary: a machine with a single-core CPU will take total time t to execute all 4 tasks; each task's share is t/4, and the single core forces the scheduler to execute all 4 tasks in a time-sliced manner with the required number of context switches. A machine with multiple CPU cores will take total time t/4 to execute all 4 tasks: each task takes t/4, and all 4 run at the same time on 4 different cores, saving time.
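The array-splitting idea above can be sketched with Python's `multiprocessing.Pool`, which sidesteps the single-core limitation by using separate processes. This is a minimal illustration, not the article's own code; the function name `chunk_product` is an assumption chosen for this sketch.

```python
import math
from multiprocessing import Pool


def chunk_product(chunk):
    """Multiply the elements of one chunk (runs in a worker process)."""
    return math.prod(chunk)


if __name__ == "__main__":
    numbers = list(range(1, 101))  # an array of 100 numbers

    # Sequential: one core walks the whole array, taking time t.
    sequential = math.prod(numbers)

    # Parallel: split into 4 parts of 25 elements each; with 4 cores,
    # each part can run on its own core in roughly t/4.
    parts = [numbers[i:i + 25] for i in range(0, 100, 25)]
    with Pool(processes=4) as pool:
        partial_products = pool.map(chunk_product, parts)
    parallel = math.prod(partial_products)

    print(sequential == parallel)  # True: same answer either way
```

The combine step (multiplying the four partial products) is cheap compared with the chunk work, which is why the overall time approaches t/4 on four cores.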

The benefit of concurrency depends on two factors:

  • Nature of the task — whether the task/thread is CPU intensive or non-CPU intensive
  • Nature of the hardware — whether the machine is single-core or multi-core

With a multi-core CPU, analyze the problem and decide on the optimal number of threads. Blindly throwing a large number of threads at a problem can lead to disaster.

A machine may have 32 or 64 cores. Far more threads than that (say, 1,000) can force the scheduler into constant context switching between threads. This defeats the purpose of concurrency, to the point where non-threaded, sequential code performs better. So be judicious when leveraging concurrency.

How does it work in Python?

Python’s multi-threading (https://docs.python.org/3/library/threading.html) is limited by the Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at a time. Python threads behave as if running on a single core, even on multi-core systems. This means CPU-bound tasks do not benefit from true parallelism with threads. However, multi-threading works well for I/O-bound tasks (like network or file operations). For CPU-bound parallelism, use multiprocessing. This is concurrency.
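The I/O-bound benefit can be seen with a small sketch that simulates I/O waits using `time.sleep` (during a sleep, just as during a real network or disk wait, the GIL is released so other threads can run). The function name `fake_io_task` and the 0.2-second wait are assumptions for this illustration.

```python
import threading
import time


def fake_io_task():
    """Simulate an I/O-bound task: the thread just waits,
    as it would for a network reply or a disk read."""
    time.sleep(0.2)


if __name__ == "__main__":
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io_task) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start

    # The four 0.2s waits overlap instead of adding up to 0.8s,
    # because a sleeping thread releases the GIL.
    print(f"Elapsed: {elapsed:.2f}s")  # roughly 0.2s, not 0.8s
```

Replace `time.sleep` with a CPU-bound loop and the overlap disappears: the GIL serializes the computation and the threads take roughly as long as running sequentially.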

Python’s multiprocessing (https://docs.python.org/3/library/multiprocessing.html) module enables the execution of multiple processes concurrently, allowing for true parallelism and leveraging multiple CPU cores. This differs from multi-threading, which is subject to Python's Global Interpreter Lock (GIL) and thus cannot achieve true parallel execution of CPU-bound tasks. This is parallelism.

Concurrent thread program

import threading
import time


def print_numbers():
    for i in range(1, 6):
        print(f"Numbers thread: {i}")
        time.sleep(1)


def print_letters():
    for letter in ["A", "B", "C", "D", "E"]:
        print(f"Letters thread: {letter}")
        time.sleep(1)


def main_calculation():
    for i in range(1, 6):
        result = i * i
        print(f"Main thread calculation: {i}^2 = {result}")
        time.sleep(1)


# Create threads
t1 = threading.Thread(target=print_numbers)
t2 = threading.Thread(target=print_letters)

# Start threads
t1.start()
t2.start()

# Main thread does its own calculation concurrently
main_calculation()

# Wait for both threads to finish
t1.join()
t2.join()

print("All threads have finished.")

# Disclaimer: This code was generated with assistance from GitHub Copilot,
# an AI-powered coding assistant
/home/pradip/workspace/concurrency/1_threading_demo.py 
Numbers thread: 1
Letters thread: A
Main thread calculation: 1^2 = 1
Numbers thread: 2
Letters thread: B
Main thread calculation: 2^2 = 4
Numbers thread: 3
Letters thread: C
Main thread calculation: 3^2 = 9
Numbers thread: 4Letters thread: D

Main thread calculation: 4^2 = 16
Letters thread: E
Numbers thread: 5
Main thread calculation: 5^2 = 25
All threads have finished.

Is starting a thread the same as a function call? There is a main thread by default. Can we return a value calculated in a created thread to the main thread?

The answer is a big no!

Let’s understand why.

Each thread has a separate call stack

Scenario 1: by default there is a main thread. The main thread calls function A, then function A calls function B. The call stack looks like this:

Call Stack: [Main | function_A | function_B]

The value returned by function B can be captured in function A. The value returned by function A can be captured in the main program.

Scenario 2: by default there is a main thread. The main thread calls function A. The main thread creates thread 1, and thread 1 calls function B.

Two call stacks get created: one for the main thread, and another for thread 1.

[Main, function_A]

[thread_1, function_B]

A value in the call stack of thread 1 cannot be returned to the call stack of the main thread. Values can only be returned from the top function of a call stack to the function below it in the same call stack.

Important: if there are 10 threads running, there will be 10 different call stacks. A call stack is private to its thread; it cannot be shared between threads. Functions triggered by a thread do not appear in the call stack of the main thread.

If an exception is thrown by a function called in thread 1, that exception cannot be caught in the main thread.

# Example: Exception in a thread is not caught by the main thread

import threading
import time


def worker():
    time.sleep(1)
    raise ValueError("Exception from thread!")


def main():
    try:
        t = threading.Thread(target=worker)
        t.start()
        t.join()
    except ValueError as e:
        print(f"Caught exception in main: {e}")
    else:
        print("No exception caught in main thread.")


if __name__ == "__main__":
    main()

# Output:
# Exception in thread Thread-1:
# Traceback (most recent call last):
# ...
# ValueError: Exception from thread!
# No exception caught in main thread.

# Disclaimer: This code was generated with assistance from GitHub Copilot,
# an AI-powered coding assistant
/home/pradip/workspace/concurrency/2_exception_in_thread.py 
Exception in thread Thread-1 (worker):
Traceback (most recent call last):
File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
self.run()
File "/usr/lib/python3.12/threading.py", line 1010, in run
self._target(*self._args, **self._kwargs)
File "/home/pradip/workspace/concurrency/2_exception_in_thread.py", line 8, in worker
raise ValueError("Exception from thread!")
ValueError: Exception from thread!
No exception caught in main thread.

In programs where multiple threads are created and we want to simulate function-like return values, shared variables are used instead. These shared variables can be read or updated across threads.

Threads do not return values to the main thread because, as soon as a thread is created, a new call stack is created, and a value returned by a function in one call stack cannot be read in another. Similarly, an exception raised in a thread cannot be handled in the main thread.
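That said, if you do need a worker's return value or exception back in the main thread, the standard library's `concurrent.futures` wraps the shared-variable plumbing for you: `Future.result()` hands back the return value and re-raises any exception in the calling thread. A minimal sketch (the function `square` is an assumption for this illustration):

```python
import concurrent.futures


def square(n):
    """Worker function: returns a value, or raises on bad input."""
    if n < 0:
        raise ValueError("negative input")
    return n * n


if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future = executor.submit(square, 5)
        # result() transfers the worker thread's return value to the main thread
        print(future.result())  # 25

        bad = executor.submit(square, -1)
        try:
            bad.result()  # the worker's exception is re-raised here
        except ValueError as e:
            print(f"Caught in main thread: {e}")
```

Under the hood this is still the same picture: the executor stores the result (or exception) in a shared object, and `result()` reads it from the main thread's side. The separate-call-stack rule is not violated, just wrapped.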

You cannot expect the behavior of a sequential program while working with threads.

(more content to be added…)

Feel free to reach out on LinkedIn for networking.
