Agent Technology for Data Analysis
WORKSHOP ON SCIENTIFIC DATA MANAGEMENT PROBLEMS AND SOLUTIONS
Motivation and Disclaimer
- Many efforts to use supernetworks to link supercomputers to transfer huge datasets
- Few efforts to make effective use of existing real-world networks
- Allow university users to access remote data
- I am not an agent technology expert
- I’m hoping some of you are!
- We do have a prototype application
Outline
- Why Java
- For Agent Technology?
- For Data Analysis?
- Analysis Studio application
What Problem are we trying to solve?
- Widely distributed users who need access to petabyte datasets
- Many university users with mediocre networks
- Most universities have no way to handle petabyte data samples
- Physicist needs unfettered access to data
- Would like effective use of desktop machine
- Canned analysis won’t do
- CPU/data access requirements are infinite
Faster networks?
- Faster networks will not solve our problems anytime soon
- No matter how fast networks are, they are always saturated
- As networks become saturated, latency becomes high
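To put rough numbers on this, the sketch below computes how long a bulk copy of a petabyte-scale sample would take. The 1 Gbit/s sustained rate is an assumption chosen purely for illustration (it is not a figure from this talk), and `TransferTime` and `transferSeconds` are hypothetical names. Even at that optimistic rate, one petabyte takes roughly three months to move.

```java
public class TransferTime {

    /** Seconds needed to move `bytes` over a link sustaining `bitsPerSecond`. */
    public static double transferSeconds(double bytes, double bitsPerSecond) {
        return bytes * 8.0 / bitsPerSecond;
    }

    public static void main(String[] args) {
        double petabyte = 1e15;     // 1 PB, decimal definition
        double assumedLink = 1e9;   // assumption: 1 Gbit/s sustained, no protocol overhead
        double days = transferSeconds(petabyte, assumedLink) / 86_400.0;
        System.out.printf("1 PB at the assumed rate: about %.0f days%n", days);
    }
}
```

The same helper can be called with any other size and rate to compare alternatives.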
Why Agent Technology?
- By encapsulating the user’s analysis code as a “user agent”, we can send it to the data; wide-area network bandwidth requirements become trivial
- Analysis modules are typically small (of order kBytes)
- HEP output is typically histograms (binned) and scatterplots, which are both small
- Possible to do GUI-based analysis of large datasets over a 28.8 kbit/s modem connection
- Gives the user the impression that the analysis is running locally
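The idea in the bullets above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the Java Analysis Studio design: `UserAgent`, `Histogram`, `runNearData` and `serializedSize` are hypothetical names, the "data server" is a local method, and the events are random numbers. It shows that the only thing that needs to travel back to the user is a small serialized histogram, however many events the agent looked at.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Random;

public class AgentSketch {

    /** Fixed-width 1D histogram: the only object shipped back to the user. */
    public static class Histogram implements Serializable {
        private static final long serialVersionUID = 1L;
        private final double min;
        private final double max;
        private final long[] bins;

        public Histogram(int nBins, double min, double max) {
            this.min = min;
            this.max = max;
            this.bins = new long[nBins];
        }

        public void fill(double x) {
            if (x < min || x >= max) {
                return; // out-of-range values are dropped in this sketch
            }
            int i = (int) ((x - min) / (max - min) * bins.length);
            bins[Math.min(i, bins.length - 1)]++; // guard against rounding up
        }

        public long entries() {
            long total = 0;
            for (long b : bins) {
                total += b;
            }
            return total;
        }
    }

    /** Hypothetical agent contract: user code that is sent to the data. */
    public interface UserAgent extends Serializable {
        void processEvent(double[] event, Histogram out);
    }

    /** Stand-in for the remote data server; events are random numbers here. */
    public static Histogram runNearData(UserAgent agent, int nEvents, long seed) {
        Random rng = new Random(seed);
        Histogram result = new Histogram(50, 0.0, 1.0);
        for (int i = 0; i < nEvents; i++) {
            double[] event = { rng.nextDouble(), rng.nextDouble() };
            agent.processEvent(event, result);
        }
        return result;
    }

    /** Size in bytes of the Java-serialized form, i.e. what would cross the network. */
    public static int serializedSize(Serializable obj) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj);
            }
            return buffer.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        UserAgent agent = (event, out) -> out.fill(event[0]);
        Histogram h = runNearData(agent, 100_000, 42L);
        System.out.println("entries filled:     " + h.entries());
        System.out.println("bytes to send back: " + serializedSize(h));
    }
}
```

The size of the returned histogram depends on the number of bins, not on the number of events processed, which is why the result stays small as the dataset grows.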
Why Java for Agent Technology?
- Java produces machine independent bytecodes
- Trivial to move from one machine to another
- Network handling and Remote Method Invocation (RMI, cf. CORBA) built in
- (Remote) dynamic loading built in
- Multithreaded servers easy to write
- Built-in Java “Sandbox” can be used to restrict agents
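As an illustration of the multithreading point, here is a minimal sketch of a server core that runs several agents concurrently against the same local data. It uses the standard `java.util.concurrent` thread pool; the `Agent` interface and the `serve` method are hypothetical names, and the network layer (RMI or sockets), dynamic class loading and sandboxing are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AgentServerSketch {

    /** Hypothetical agent contract: reduce a block of local data to a small result. */
    public interface Agent {
        long[] analyze(double[] data);
    }

    /** Runs every agent as a pool task on the same read-only data; results keep submission order. */
    public static List<long[]> serve(List<Agent> agents, double[] data, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<long[]>> pending = new ArrayList<>();
            for (Agent a : agents) {
                Callable<long[]> task = () -> a.analyze(data);
                pending.add(pool.submit(task));
            }
            List<long[]> results = new ArrayList<>();
            for (Future<long[]> f : pending) {
                results.add(f.get()); // blocks until that agent has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] data = { 0.5, -1.0, 2.5, 3.0, -0.25 };
        List<Agent> agents = new ArrayList<>();
        agents.add(d -> new long[] { d.length });      // counts all values
        agents.add(d -> {                              // counts positive values
            long n = 0;
            for (double x : d) {
                if (x > 0) {
                    n++;
                }
            }
            return new long[] { n };
        });
        for (long[] r : serve(agents, data, 2)) {
            System.out.println("agent result: " + r[0]);
        }
    }
}
```

Because the agents only read the shared array, no locking is needed in this sketch; a real server would also have to restrict what untrusted agent code may do.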
Why Java for Data Analysis?
- Easy to learn yet very powerful, fully OO language
- Very wide industry support
- Just In Time compilation = Fast
- Dynamic Optimization = Faster
- Very fast code, load, test, fix cycle
- Built-in debugger, including remote debugging
- Numerical functionality good
- Java Grande Forum enhancing numerical support
“Java Analysis Studio”
Demo
Network Performance
More Information
- Java Analysis Studio
- http://www-sldnt.slac.stanford.edu/jas
- Java Grande Forum (numeric computing in Java)
- http://www.javagrande.org/
- Desktop access to remote resources
- http://www-fp.mcs.anl.gov/~gregor/datorr/