Wednesday, December 16, 2009

Dynamic Function Placement for Data-intensive Cluster Computing

Application partitioning is difficult because of:
  • variation in application behavior
  • variability in resource availability
  • variability in workload mixes
Effective use of cluster resources requires:
  1. load balancing
  2. proper partitioning of functionality among producers and consumers
In Abacus, function placement is based only on black-box monitoring, which relieves programmers of the burden of deciding where each function should run.

Abacus consists of a programming model and a runtime system. In the Abacus programming model, programmers define their components as explicitly migratable, functionally independent components or objects.

Anchored elements need to be explicitly defined in the graph of the application. I think this is required because, when the application is modeled as a graph, these components must be marked properly.
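
To make the idea of anchored and mobile objects concrete, here is a minimal Python sketch of an application graph. The names (Component, anchored_to, AppGraph) and the three example components are hypothetical and are not taken from Abacus itself; the sketch only illustrates marking which objects are pinned to a node and which may move.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Component:
    name: str
    # Node the component is pinned to; None means it is a mobile object.
    # (Hypothetical field, not part of the Abacus API.)
    anchored_to: Optional[str] = None

    @property
    def mobile(self) -> bool:
        return self.anchored_to is None


@dataclass
class AppGraph:
    components: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (producer, consumer) pairs

    def add(self, component: Component) -> None:
        self.components[component.name] = component

    def connect(self, producer: str, consumer: str) -> None:
        self.edges.append((producer, consumer))

    def mobile_components(self) -> list:
        # Only these are candidates for migration.
        return [c.name for c in self.components.values() if c.mobile]


graph = AppGraph()
graph.add(Component("console", anchored_to="client"))
graph.add(Component("filter"))  # mobile: may run on client or server
graph.add(Component("storage", anchored_to="server"))
graph.connect("storage", "filter")
graph.connect("filter", "console")
print(graph.mobile_components())
```

In this sketch only the unanchored filter would ever be considered for migration; the console and storage objects stay where they are declared.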

Abacus components:
  1. Migration and Location Transparent Invocation Component (Binding Manager)
  2. Resource Monitoring and Management Component (Resource Manager)

The Resource Manager uses notifications to collect monitoring information (monitoring and profiling happen at runtime).
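
As a rough illustration of notification-based monitoring, the Python sketch below accumulates per-edge statistics each time one object invokes another. The class name, the notify method, and its parameters are assumptions made for this example; they do not reflect the actual Abacus interfaces.

```python
from collections import defaultdict


class ResourceManager:
    """Toy collector of black-box statistics (hypothetical interface)."""

    def __init__(self):
        self.bytes_moved = defaultdict(int)  # (caller, callee) -> total bytes
        self.calls = defaultdict(int)        # (caller, callee) -> invocation count

    def notify(self, caller: str, callee: str, nbytes: int) -> None:
        # Called on every invocation between two objects at runtime.
        edge = (caller, callee)
        self.bytes_moved[edge] += nbytes
        self.calls[edge] += 1


rm = ResourceManager()
rm.notify("filter", "storage", 4096)
rm.notify("filter", "storage", 4096)
rm.notify("console", "filter", 128)
print(dict(rm.bytes_moved))
```

The point of the design is that the application code never reports anything itself; the statistics come from observing invocations.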

The server calculates the best net benefit to determine whether a migration is worth doing (that is, whether it meets the minimum requirements for migrating). Code mobility and dynamic linking are sidestepped in this model.
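
The net-benefit check can be sketched roughly as follows. This is a simplified illustration, assuming that benefit is the estimated time saved by the new placement minus a one-time migration cost, and that a migration must beat a fixed threshold; the function names, parameters, and the formula itself are assumptions for this example, not the formula used by Abacus.

```python
def net_benefit(current_time: float, candidate_time: float,
                migration_cost: float) -> float:
    # Estimated time saved by the new placement, minus the cost of moving.
    return current_time - candidate_time - migration_cost


def should_migrate(current_time: float, candidate_time: float,
                   migration_cost: float, threshold: float = 0.0) -> bool:
    # Migrate only when the net benefit clears the minimum threshold.
    return net_benefit(current_time, candidate_time, migration_cost) > threshold


print(should_migrate(10.0, 6.0, 1.0, threshold=2.0))
print(should_migrate(10.0, 9.5, 1.0))
```

A threshold above zero keeps the system from moving objects back and forth for marginal gains.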

Mobile Objects are defined by the programmer.

Cluster characteristics critical for function placement:
  • Communication bandwidth between nodes
  • Relative processor speed among nodes
  • Workload characteristics (e.g., bytes moved among functions, instructions executed by each function)
Data-intensive applications are those that selectively filter, mine, sort, or otherwise manipulate large data sets. Their parallel computations are spread across the source/sink servers.
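
To see how these characteristics interact, here is a back-of-the-envelope Python sketch that compares running a filter on the client with running it on the server. The cost model (compute time plus network transfer time) and all the numbers are illustrative assumptions, not measurements or formulas from Abacus.

```python
def placement_time(instructions: float, node_speed: float,
                   bytes_over_network: float, bandwidth: float) -> float:
    # Compute time on the chosen node plus time to ship data over the network.
    return instructions / node_speed + bytes_over_network / bandwidth


# Hypothetical workload: a filter reads 1 GB and emits 10 MB.
INPUT_BYTES = 1e9
OUTPUT_BYTES = 1e7
INSTRUCTIONS = 5e9
BANDWIDTH = 1e8  # bytes per second between client and server

estimates = {
    # On the client, the full input crosses the network; the client CPU is faster.
    "client": placement_time(INSTRUCTIONS, 2e9, INPUT_BYTES, BANDWIDTH),
    # On the server, only the filtered output crosses the network.
    "server": placement_time(INSTRUCTIONS, 1e9, OUTPUT_BYTES, BANDWIDTH),
}
best = min(estimates, key=estimates.get)
print(estimates, best)
```

With these made-up numbers the slower server still wins, because filtering near the data avoids moving the large input across the network. A faster network or a less selective filter could reverse the outcome, which is why placement has to adapt.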

"Programmable Storage Services" is the name they consider as a potential alternative to "Cloud".

The difference between Coign and Abacus is that Coign relies on the profiling history of functions/components to make decisions, while Abacus tries to make them at runtime.

Equanimity dynamically balances the load between a single client and its servers. Abacus extends this to real-world clusters, which involve resource contention, resource heterogeneity, and workload variation.

Abacus adapts placement dynamically, based on resource usage and availability.

The two applications used in Abacus:
  1. The file system
  2. The search application

Goals for Abacus:
  1. improve overall performance

Parameters measured:
  • Data Flow Graph
  • Memory Consumption
  • Instructions Executed per Byte
  • Stall Time
