24 Concurrent Computing
Hello students,
In this blog, we focus on concurrent computing. The word 'concurrent' means 'at the same time'. In the computing world, when we execute two programs at the same time, we say they run concurrently.
Concurrency can be achieved by executing two programs on two different processors; if the processor has multiple cores, by executing the two programs on different cores; or by running the two programs on two different computers at the same time.
Throughput Computation
When a single program involves a large amount of computation to arrive at a result, parallel computation techniques are used to break the program down into independent work units (computational units) and execute them concurrently. Such problems are called throughput computing problems.
Examples of throughput computing include scientific modelling, video encoding, gaming, machine learning, deep learning and much more.
Multitasking, Multithreading, Multiprocessing are all ways of trying to execute more work units at one time.
Multitasking runs several programs on one machine by rapidly switching the processor between them, so even a single core appears to do many things at once.
Multithreading uses multiple threads of execution within a single process.
Multiprocessing uses multiple processors or cores to execute different machine instructions, which may belong to different programs or different threads.
Hyperthreading is a hardware technique that lets a single core within a processor execute multiple threads simultaneously.
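The difference between threads and processes can be seen in a minimal Python sketch (the `work_unit` name is hypothetical): threads live inside one process and share its memory, while `multiprocessing` would spawn separate processes that the OS can schedule onto different cores.

```python
import threading
from multiprocessing import cpu_count

results = []

def work_unit(name):
    # Stand-in for any independent piece of work.
    results.append(name)

# Multithreading: several threads share one process (and its memory),
# so they can all append to the same list.
threads = [threading.Thread(target=work_unit, args=(f"thread-{i}",))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))   # all four thread names
print(cpu_count())       # cores available for multiprocessing
```

For true multiprocessing, `multiprocessing.Process` would be used instead of `threading.Thread`, giving each work unit its own process and memory space.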
Concurrent computing covers solutions that use multithreading techniques to execute more computations at one time, especially for throughput computing problems.
The best way to solve a high-throughput computation is to decompose it (break it down) into many smaller computations so that it can be completed faster. In short, it is a "divide and conquer" strategy.
There are two types of decomposition one looks at in software programs.
- Domain Decomposition
- Functional Decomposition
Domain Decomposition
In Domain Decomposition, the computation is big because of the large amount of data involved, but it can be broken down into smaller, similar and hence repetitive computations.
For example, if we have to feed 10000 people, the time to feed everyone shrinks as we add helpers.
100 helpers can perform the same small, similar, repetitive computation of feeding the 10000 people at the same time.
In the programming world, a "for" loop whose iterations are independent is a so-called "embarrassingly parallel problem", and we can apply parallel computation to such loops.
for (i = 1 to 10000)
    feedperson()
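A minimal Python sketch of this domain decomposition might look like the following (the `feed_person` function is hypothetical, standing in for the repeated work unit). For CPU-bound work in Python, a `ProcessPoolExecutor` would be swapped in to use multiple cores; a thread pool keeps the sketch simple.

```python
from concurrent.futures import ThreadPoolExecutor

def feed_person(person_id):
    # Hypothetical work unit: "feeding" one person.
    # Each call is independent of every other call.
    return f"fed person {person_id}"

# Domain decomposition: the same small computation repeated over
# 10000 items, spread across a pool of workers (the "helpers").
with ThreadPoolExecutor(max_workers=100) as pool:
    results = list(pool.map(feed_person, range(1, 10001)))

print(len(results))  # 10000
```

`pool.map` preserves input order, so the results come back as if the loop had run sequentially, only faster when the work units are genuinely independent.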
Functional Decomposition
In Functional Decomposition, the computation is big because it involves many independent units of work that are functionally different.
For example, to feed 10000 people at location B, we need to transport 10000 food packets from a warehouse at location A to location B, but the transport vehicle can carry only 100 food packets at a time.
Transporting food and feeding people are functionally different units of work, and hence this is a functional decomposition category of parallel computing problem.
function transportFood()
{
    for (i = 1 to 100)
        transportPackets(count=100);
}

function feedFood()
{
    for (i = 1 to 10000)
        feedperson();
}
The above two functions are different units of work and hence can be executed in parallel, on different threads.
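In Python, the two functions could be placed on separate threads roughly as follows (a sketch; the function bodies are hypothetical placeholders for the real work):

```python
import threading

done = []

def transport_food():
    # Functional unit 1 (hypothetical): 100 trips, 100 packets per trip.
    for trip in range(100):
        pass  # carry one batch of packets
    done.append("transport")

def feed_food():
    # Functional unit 2 (hypothetical): feed 10000 people.
    for person in range(10000):
        pass  # feed one person
    done.append("feed")

# Functional decomposition: two *different* units of work,
# each given its own thread so they run concurrently.
t1 = threading.Thread(target=transport_food)
t2 = threading.Thread(target=feed_food)
t1.start(); t2.start()
t1.join(); t2.join()

print(sorted(done))  # ['feed', 'transport']
```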
The problem of computation and communication
There is one problem in the above example.
Even though transportFood() and feedFood() are functionally decomposed computations, there is an invisible dependency between the two functions.
We cannot feed if no food has been transported.
Hence the transportFood() computation needs to communicate to the feedFood() computation that food has arrived.
Decomposition works fastest when the computations are not dependent on each other and do not need to interact or communicate with each other.
When two computations have to run concurrently and yet communicate, this can lead to typical deadlock issues, where one computation waits for the other to finish, or both wait on each other.
It also creates a performance issue, as the communication slows the computation down.
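One common way to handle such a dependency safely is a producer/consumer queue: the feeding side blocks until food has actually arrived, with no busy-waiting and no risk of both sides waiting on each other. A minimal Python sketch, reusing the (hypothetical) names from the example above:

```python
import queue
import threading

packets = queue.Queue()
TRIPS, PACKETS_PER_TRIP = 100, 100
fed = 0

def transport_food():
    # Producer: each trip delivers one batch of packets.
    for trip in range(TRIPS):
        packets.put(PACKETS_PER_TRIP)
    packets.put(None)  # sentinel: no more food is coming

def feed_food():
    # Consumer: get() blocks until food has actually arrived,
    # so the dependency is communicated through the queue.
    global fed
    while True:
        batch = packets.get()
        if batch is None:
            break
        fed += batch

t1 = threading.Thread(target=transport_food)
t2 = threading.Thread(target=feed_food)
t1.start(); t2.start()
t1.join(); t2.join()

print(fed)  # 10000
```

The communication still costs time on every `put`/`get`, which is exactly the computation-versus-communication trade-off discussed here.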
The communication delay may be reduced by using hardware architectures that allow memory or other resources to be shared between the computations, for example:
- A Uniform Memory Access (UMA) based multiprocessor, where memory is shared between all cores or processors.
- The NUMA (Non-Uniform Memory Access) architecture, which additionally provides a separate local memory for each processor.
- A dual-SIMD cascade architecture, where output from a computation executed on one processor core is placed continuously and immediately as input to another computation on another core.
Such hardware architectures enhance the performance of concurrent computations that need to communicate with each other.
Cloud providers like AWS provide EC2 machines suited to compute-intensive workloads, powered by Graviton and Graviton2 processors.
Graviton processors have 64-bit cores. The cores are powerful enough to support web server computation, container workloads such as containerized microservices, and data log processing on their own.
The second-generation Graviton2 also supports gaming, in-memory caching, open-source databases and machine learning.
The power of the processor and the processor architecture enables high-throughput computation and reduces the time spent on communication between computations.
Graviton2 delivers 7x the performance, 4x more compute cores, 5x faster memory and 2x larger caches, and supports a dual-SIMD (Single Instruction, Multiple Data) processor architecture as well.
Cloud computing exam papers usually ask questions such as:
- Differentiate between computation and communication in the context of cloud computing.
- What are multi-core processors? Why are they used in cloud computing?
For more details refer to presentation 1 , presentation 2, presentation 3 on Concurrent computing.
Thank you.