Sunday, March 16, 2008

FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment

A. Adya, W. J. Bolosky, M. Castro, R. Chaiken, G. Cermak, J. R. Douceur, J. Howell, J. R. Lorch, M. Theimer, R. P. Wattenhofer, "FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment", 5th OSDI, Dec 2002. http://citeseer.ist.psu.edu/adya02farsite.html

------------------

Farsite: secure, scalable file system

logically functions as a centralized file server but is physically distributed among a set of untrusted computers

Randomized Replication => availability
cryptographic techniques => secrecy of file content (confidentiality)
Byzantine-fault-tolerant => integrity

scalable => distributed hint mechanism
high performance => locally caching data, lazily propagating file updates, varying the duration and granularity of content leases
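The availability mechanism above can be sketched in a few lines. This is a hypothetical illustration, not Farsite's actual placement algorithm: the point is that replica locations are randomized per file, so an attacker cannot predict which machines to target.

```python
import random

def place_replicas(file_id, machines, r=3):
    """Pick r distinct machines at random to hold replicas of a file.

    Randomized (rather than deterministic) placement makes it hard for
    an attacker to predict, and thus target, all replicas of a file.
    Seeding from the file id keeps the sketch deterministic per file.
    """
    rng = random.Random(file_id)
    return rng.sample(machines, r)

machines = [f"m{i}" for i in range(20)]
replicas = place_replicas("file-42", machines, r=3)
```

With r replicas on independently failing machines, the file stays available unless all r fail at once, which is why modest replication suffices given the low correlated-failure rate of desktop machines.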

Farsite:


  • central file server


    • shared namespace

    • location-transparent access

    • reliable data storage

  • local desktop filesystems


    • low cost

    • privacy from nosy sysadmins

    • resistance to geographically localized faults

Security is provided virtually -- through cryptography, randomized replication, and Byzantine fault tolerance -- rather than through physical security of dedicated servers.

The goal: harness the collective resources of loosely coupled, insecure, and unreliable machines to provide a logically centralized, secure, and reliable file-storage service.

cryptography and replication to preserve the confidentiality and integrity of file data

Directory metadata is relatively small, and it must be comprehensible and revisable directly by the system, so Byzantine-fault-tolerant replication is used for it.

Farsite's intended workload and machine characteristics are those observed on desktop machines.
workload


  1. high access locality

  2. low persistent update rate

  3. a pattern of read/write sharing that is sequential

Machine Characteristics


  1. high fail-stop rate

  2. low but significant rate of malicious or opportunistic subversion

Administration in Farsite consists of configuring a minimal system, authenticating new users and machines, and signing certificates.

Farsite is intended to run on desktop workstations (~10^5 machines), none of which are dedicated servers, connected by a high-bandwidth, low-latency network whose topology can be ignored.

Fundamental technology trends for Farsite:


  1. a general increase in unused disk capacity (disk capacity is increasing faster than disk usage; this enables replication for reliability)

  2. a decrease in the computational cost of cryptographic operations (this enables distributed security)

The system allows the flexibility of multiple roots each of which can be regarded as the name of a virtual file server that is collaboratively created by the participating machines.

The security of any distributed system is an issue of managing trust.

The security components that rely on redundancy need to trust that an apparently distinct set of machines is truly distinct and not a single malicious machine pretending to be many => Sybil attack

The certificates


  1. namespace certificate : associating the root with a set of machines managing the root metadata

  2. user certificate: associating a user with his personal public key so that his identity can be validated

  3. machine certificate: associating a machine with its own public key to establish the validity of a machine

Machine certificates in Farsite are not signed directly by CAs but rather by users whose certificates designate them as authorized to certify machines.
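The delegation chain -- CA signs a user certificate, and that user signs machine certificates -- can be sketched as follows. For brevity this sketch uses HMAC as a stand-in for real public-key signatures (an assumption; Farsite uses asymmetric cryptography), and the certificate fields are hypothetical:

```python
import hmac, hashlib

def sign(key: bytes, payload: bytes) -> bytes:
    # HMAC stands in here for a real public-key signature.
    return hmac.new(key, payload, hashlib.sha256).digest()

ca_key = b"ca-secret"            # CA key: kept offline in Farsite
user_key = b"alice-secret"

# The CA certifies a user as authorized to certify machines...
user_cert = {"subject": "alice", "role": "machine-certifier"}
user_sig = sign(ca_key, repr(user_cert).encode())

# ...and that user, not the CA, then certifies a machine.
machine_cert = {"subject": "machine-17", "pubkey": "mk17"}
machine_sig = sign(user_key, repr(machine_cert).encode())

# Verifying the machine certificate means walking the chain back to the CA.
chain_ok = (
    hmac.compare_digest(user_sig, sign(ca_key, repr(user_cert).encode()))
    and user_cert["role"] == "machine-certifier"
    and hmac.compare_digest(machine_sig, sign(user_key, repr(machine_cert).encode()))
)
```

The design point is delegation: the CA only needs to vouch for users, and authorized users handle the high-volume task of enrolling machines.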

A user's private key is encrypted with a symmetric key derived from the user's password and then stored in a globally readable directory in Farsite. CA private keys are kept offline, because the entire security of Farsite depends on their secrecy.
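A minimal sketch of this scheme, assuming PBKDF2 for password-based key derivation and a toy XOR keystream in place of a real cipher (both are illustrative choices, not Farsite's actual primitives):

```python
import hashlib

def derive_key(password: str, salt: bytes) -> bytes:
    # PBKDF2 turns a low-entropy password into a symmetric key.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def xor_stream(key: bytes, data: bytes) -> bytes:
    # Toy XOR keystream cipher -- illustration only, NOT a secure cipher.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

salt = b"farsite-user-salt"                         # hypothetical salt
key = derive_key("hunter2", salt)
private_key = b"-----user private key bytes-----"
stored_blob = xor_stream(key, private_key)          # safe to store publicly
recovered = xor_stream(derive_key("hunter2", salt), stored_blob)
```

Because only the ciphertext is globally readable, the user can fetch their private key from any machine, but an attacker who reads the blob still has to guess the password.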

Each machine in Farsite may play three roles


  1. client: a machine that directly interacts with the user

  2. directory group : a set of machines that collectively manage file information

  3. file host: a machine that stores encrypted replicas of file contents




Automating Product-Line Variant Selection for Mobile Devices

White, J., Schmidt, D. C., Wuchner, E., and Nechypurenko, A. 2007. Automating Product-Line Variant Selection for Mobile Devices. In Proceedings of the 11th international Software Product Line Conference (September 10 - 14, 2007). International Conference on Software Product Line. IEEE Computer Society, Washington, DC, 129-140. DOI= http://dx.doi.org/10.1109/SPLC.2007.12

PLAs are a promising approach to help developers manage the complexity of variability between mobile devices.

PLAs can be retargeted for different requirement sets by leveraging common capabilities, patterns, and architectural styles.

The design of a PLA is typically guided by Scope, Commonality, and Variability (SCV) analysis [7].

With the large array of device types and rapid development speed of new devices and capabilities, the system will not be able to know about all device types a priori.

The problems with the existing component-based and feature-based models are the following:
  • no ability to consider resource consumption constraints, such as consumed memory
  • no architecture for how a device discovery service would be used to characterize a device's nonfunctional characteristics (OS, RAM, etc.)
  • feature selection too slow to support dynamic software delivery to mobile devices
Contributions by the paper:
  • Scatter’s graphical requirement and resource specification mechanisms and show how they facilitate the capture and analysis of a wide variety of requirement types
  • how Scatter transforms requirement specifications into a format that can be operated on by a constraint solver
  • the automated variant selection engine, based on a Constraint Logic Programming Finite Domain (CLP(FD)) solver
  • how PLA constraints impact variant selection time for a constraint-based variant selection engine.
  • PLA design rules that we have gleaned from our experiments that help to improve variant selection time when using a constraint-based approach.
The three key challenges associated with creating an automated variant selector in pervasive environments:
  • Unknown device signatures (to respond to devices with different capabilities)
  • Variant Cost Optimization (the cost associated with the selected variants should be examined before orchestration, selection, and composition of the variants)
  • Limited selection time (the time for selecting the appropriate set of variants should be reasonable compared to the time that the user remains in the context where s/he needs that type of service)
In traditional PLA, software developers decide about the set of variants to be selected, configured, and organized to work together.

In pervasive environments there are two problems with manual component selection:
  • the target device signatures are not known ahead of time
  • variant selection must be done on demand
The solution is to capture a formal model of the PLA's commonalities and variabilities so that selection can be automated, along with a model of non-functional requirements to prevent deploying components on devices whose underlying infrastructure cannot satisfy them.
Scatter has the following features:
  • a graphical modeling tool that defines a domain-specific modeling language to visually model the components of the interface, the dependencies and composition rules of components, and the nonfunctional requirements of each component
  • a compiler to convert the graphical notation to a Prolog knowledge base and a CSP
  • a remoting mechanism for a device discovery service that communicates the discovered devices to Scatter's variant selection engine
  • a variant selection engine, based on a Prolog constraint solver, that selects a correct and optimal variant for a product
A key challenge in pervasive environments is that variant selection must take into account requirements based on business and context data.

At one extreme, a tool can limit the types of constraints that can be solved to a small subset that is considered most important. At the other extreme, a tool can allow developers to capture any type of constraint, but provide no guarantee of having a way of deducing a variant that satisfies them.

The strategy is to allow the datasources to change while the types of constraints remain constant.

The type of constraints as they have classified:
  • Software Stack on the device
  • Resource consumption constraints
  • hardware capability constraints
  • business/location based constraint
What does this mean? The restrictions imposed by the specification format are only on the types of comparisons that can be done, not on the data that the comparisons are based upon.

A SOAP-based Web service and a CORBA remoting mechanism remotely communicate device characteristics as they are discovered; devices report (key, value) pairs to Scatter. (How does the device know which information it should provide in order to get the component it is looking for? There should be another agent installed on the device that is able to report this information.)

A rule is specified that only allows a component to be deployed on a device, if for every local nonfunctional requirement on the component, a resource is present that satisfies the requirement.
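That deployment rule is simple to state as a predicate. The component requirements and device signature below are hypothetical examples, not Scatter's actual schema:

```python
def deployable(requirements, device):
    """A component may be deployed on a device only if, for every
    nonfunctional requirement, the device reports a resource
    that satisfies it."""
    return all(
        key in device and check(device[key])
        for key, check in requirements.items()
    )

# Hypothetical requirements: each maps a device key to a predicate.
reqs = {"OS": lambda v: v == "SymbianOS", "RAM_MB": lambda v: v >= 16}

# A discovered device signature, as (key, value) pairs.
device = {"OS": "SymbianOS", "RAM_MB": 32, "CPU_MHz": 200}
ok = deployable(reqs, device)
```

A device that omits a required key, or reports an insufficient value, is rejected before variant selection even considers the component.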

A CSP is a problem that involves finding a labeling (a set of values) for a set of variables that adhere to a set of labeling rules (constraints).

A variant becomes a binary string where the ith position represents if the ith component is present.
  • Nonfunctional requirements. Components with mismatched nonfunctional requirements are completely eliminated from the chain of composition.
  • Prune using low-granularity requirements. Rely on the footprints that various classes of variants provide
  • Limit resource tightness. Filter out unessential resource consumptive components
  • Create service classes: annotate components with the class from which they must be selected. The more components carry such annotations, the quicker a decision maker can find the required components that it is looking for.
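The binary-string encoding of a variant can be sketched as a brute-force CSP. The components, footprints, and dependency below are hypothetical; a real engine would use a CLP(FD) solver rather than enumeration, but the labeling-under-constraints idea is the same:

```python
from itertools import product

# Hypothetical PLA: 4 components with memory footprints and one dependency.
footprint = [40, 25, 10, 30]     # MB consumed by each component
provides_ui = 0                  # component 0 is the UI
needs_ui = 1                     # component 1 depends on component 0
device_ram = 80                  # resource limit reported by the device

def valid(variant):
    # Composition rule: selecting component 1 requires selecting component 0.
    if variant[needs_ui] and not variant[provides_ui]:
        return False
    # Resource constraint: total footprint must fit in device RAM.
    return sum(f for f, on in zip(footprint, variant) if on) <= device_ram

# A variant is a binary string; the ith position says whether the
# ith component is present. Enumerate all labelings, keep feasible ones.
feasible = [v for v in product([0, 1], repeat=4) if valid(v)]
```

The pruning rules listed above all serve to shrink this search space, since exhaustive labeling grows as 2^n in the number of components.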
Resource constraints are a key requirement type in mobile devices with limited capabilities.

The whole approach is based on CONSTRAINT-BASED SOLVER AUTOMATION

A key challenge of automating product variant selection is debugging mistakes in the product line specification.

Thursday, March 13, 2008

The Sybil Attack

Douceur, J.R. “The Sybil attack” in First International Workshop Peer-to-Peer Systems, IPTPS, 2002 Cambridge, MA, USA, March 7-8, 2002, pp. 251-260.

The goal:
  • To show that Sybil attacks are always possible in the absence of a logically centralized authority.
  • The impracticality of establishing distinct identities in a large-scale distributed system.

Peer-to-Peer systems commonly rely on the existence of multiple independent remote entities to mitigate the threat of hostile peers. There are two methods to do so:
  • Replicating computational or storage tasks among several remote sites to protect against integrity violation
  • Fragmenting tasks among several remote sites to protect against privacy violation
If the local entity has no direct physical knowledge of remote entities, it perceives them only as informational abstractions that we call identities.

The forging of multiple identities is called a Sybil attack.

In the absence of a trusted identification authority (or unrealistic assumptions about the resources available to an attacker), a Sybil attack can severely compromise the initial generation of identities, thereby undermining the chain of vouchers.

faulty entities (deceptive) : The entities capable of performing any arbitrary behavior except as limited by explicit resource constraints

correct entities (honest): entities abiding by the rules of any protocol we define

message: an uninterpreted finite-length bit string whose meaning is determined either by an explicit protocol or by an implicit agreement among a set of entities

Each entity e attempts to present an identity i to other entities in the system. An entity l accepts i if e is able to present identity i to l successfully.

A secure hash of a public key is a straightforward and unforgeable identity. The key pair can also be used to establish a symmetric key for a communication session.
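This identity scheme is one line of code. A sketch with SHA-256 (the key bytes are a placeholder):

```python
import hashlib

def identity_from_pubkey(pubkey: bytes) -> str:
    # The identity is just a secure hash of the public key: anyone can
    # recompute and verify it, but finding another key with the same
    # hash would require breaking the hash function.
    return hashlib.sha256(pubkey).hexdigest()

ident = identity_from_pubkey(b"-----alice public key-----")
```

Note that unforgeable does not mean scarce: nothing stops one machine from generating thousands of key pairs, each with a perfectly valid hash identity, which is exactly the Sybil attack.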


Three sources of information about another entity are:
  • a trusted agency
  • itself
  • other (untrusted) entities. (why is it considered untrusted, you can establish trust to some degree but does it still keep it untrusted?)
Direct validation:
  • Even when severely resource constrained, a faulty entity can counterfeit a constant number of multiple identities.
  • Each correct entity must simultaneously validate all the identities it is presented; otherwise, a faulty entity can counterfeit an unbounded number of identities.
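A toy model makes the second point concrete. Suppose a resource challenge (say, a computation) can only be answered by one machine's worth of resources at a time; the numbers and the model are illustrative, not from the paper:

```python
def identities_validated(num_claimed, attacker_capacity, simultaneous):
    """Count how many of an attacker's claimed identities survive a
    resource challenge, given the machines' worth of resources the
    attacker actually controls."""
    if simultaneous:
        # All identities are challenged at once, so the attacker's
        # real capacity caps how many challenges can be answered.
        return min(num_claimed, attacker_capacity)
    # Challenged one at a time, the same resources can be reused
    # to answer every challenge in turn.
    return num_claimed

seq = identities_validated(100, attacker_capacity=1, simultaneous=False)
sim = identities_validated(100, attacker_capacity=1, simultaneous=True)
```

Sequential validation lets a single machine pass for 100 identities; only simultaneous challenges bound the count by the attacker's actual resources.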