SINN and XQuery: Results and Implementation



Ähnliche Dokumente
Creating OpenSocial Gadgets. Bastian Hofmann

Mash-Up Personal Learning Environments. Dr. Hendrik Drachsler

LiLi. physik multimedial. Links to e-learning content for physics, a database of distributed sources

Long-term archiving of medical data new certified cloud-based solution offers high security and legally approved data management

GridMate The Grid Matlab Extension

SARA 1. Project Meeting

PhysNet and its Mirrors

Ressourcenmanagement in Netzwerken SS06 Vorl. 12,

Exploring the knowledge in Semi Structured Data Sets with Rich Queries

Einsatz einer Dokumentenverwaltungslösung zur Optimierung der unternehmensübergreifenden Kommunikation

Algorithms for graph visualization

Cloud Computing in der öffentlichen Verwaltung

CNC ZUR STEUERUNG VON WERKZEUGMASCHINEN (GERMAN EDITION) BY TIM ROHR

Tube Analyzer LogViewer 2.3

From Mapping to Metadata, From Simple to Enterprise Portals? - A one Stop Solution using Portlet Technology*

Approx. 2,000 frontlist books and 18,000 backlist books; list will be updated periodically

NEWSLETTER. FileDirector Version 2.5 Novelties. Filing system designer. Filing system in WinClient

Virtual PBX and SMS-Server

vcdm im Wandel Vorstellung des neuen User Interfaces und Austausch zur Funktionalität V

Order Ansicht Inhalt

Overall Coordination- and Communication Platform. for electronic and standardised Data-Exchange between. Ports and Hinterland in Rail-Traffic

Die Bedeutung neurowissenschaftlicher Erkenntnisse für die Werbung (German Edition)

Word-CRM-Upload-Button. User manual

Scan leaflet and let yourself be surprised! Augmented Reality Enter a new dimension.

ColdFusion 8 PDF-Integration

TomTom WEBFLEET Tachograph

DAS ERSTE MAL UND IMMER WIEDER. ERWEITERTE SONDERAUSGABE BY LISA MOOS

Bayerisches Landesamt für Statistik und Datenverarbeitung Rechenzentrum Süd. z/os Requirements 95. z/os Guide in Lahnstein 13.

Fachübersetzen - Ein Lehrbuch für Theorie und Praxis

ELBA2 ILIAS TOOLS AS SINGLE APPLICATIONS

Tuning des Weblogic /Oracle Fusion Middleware 11g. Jan-Peter Timmermann Principal Consultant PITSS

Praktikum Entwicklung Mediensysteme (für Master)

EVANGELISCHES GESANGBUCH: AUSGABE FUR DIE EVANGELISCH-LUTHERISCHE LANDESKIRCHE SACHSEN. BLAU (GERMAN EDITION) FROM EVANGELISCHE VERLAGSAN

Customer-specific software for autonomous driving and driver assistance (ADAS)

Using TerraSAR-X data for mapping of damages in forests caused by the pine sawfly (Dprion pini) Dr. Klaus MARTIN

Prof. Dr. Margit Scholl, Mr. RD Guldner Mr. Coskun, Mr. Yigitbas. Mr. Niemczik, Mr. Koppatz (SuDiLe GbR)

Service Discovery in Home Environments

Routing in WSN Exercise

PONS DIE DREI??? FRAGEZEICHEN, ARCTIC ADVENTURE: ENGLISCH LERNEN MIT JUSTUS, PETER UND BOB

Browser- gestützte Visualisierung komplexer Datensätze: Das ROAD System

Austrian Learning Network 4 Schools

Symbio system requirements. Version 5.1

Mitglied der Leibniz-Gemeinschaft

Webbasierte Exploration von großen 3D-Stadtmodellen mit dem 3DCityDB Webclient

MATHEMATIK - MODERNE - IDEOLOGIE. EINE KRITISCHE STUDIE ZUR LEGITIMITAT UND PRAXIS DER MODERNEN MATHEMATIK (THEORIE UND METHODE) FROM UVK

p^db=`oj===pìééçêíáåñçêã~íáçå=

HIR Method & Tools for Fit Gap analysis

Ein Stern in dunkler Nacht Die schoensten Weihnachtsgeschichten. Click here if your download doesn"t start automatically

Cycling and (or?) Trams

Industrial USB3.0 Miniature Camera with color and monochrome sensor

Oracle Integration Cloud Service

Product Lifecycle Manager

Exercise (Part II) Anastasia Mochalova, Lehrstuhl für ABWL und Wirtschaftsinformatik, Kath. Universität Eichstätt-Ingolstadt 1

LISA IT (Informatik) In Austria pupils must learn the following during their time at school: Schülerinnen und Schüler sollen wissen:

FACHKUNDE FüR KAUFLEUTE IM GESUNDHEITSWESEN FROM THIEME GEORG VERLAG

Karlsruhe Institute of Technology Die Kooperation von Forschungszentrum Karlsruhe GmbH und Universität Karlsruhe (TH)

prorm Budget Planning promx GmbH Nordring Nuremberg

Wie man heute die Liebe fürs Leben findet

JONATHAN JONA WISLER WHD.global

XML Template Transfer Transfer project templates easily between systems

Microsoft Azure Fundamentals MOC 10979

Exercise (Part V) Anastasia Mochalova, Lehrstuhl für ABWL und Wirtschaftsinformatik, Kath. Universität Eichstätt-Ingolstadt 1

Corporate Digital Learning, How to Get It Right. Learning Café

PONS DIE DREI??? FRAGEZEICHEN, ARCTIC ADVENTURE: ENGLISCH LERNEN MIT JUSTUS, PETER UND BOB

SAP Simple Service Request. Add-on for SAP Solution Manager by SAP Consulting SAP Deutschland SE & Co. KG

WP2. Communication and Dissemination. Wirtschafts- und Wissenschaftsförderung im Freistaat Thüringen

FAHRZEUGENTWICKLUNG IM AUTOMOBILBAU FROM HANSER FACHBUCHVERLAG DOWNLOAD EBOOK : FAHRZEUGENTWICKLUNG IM AUTOMOBILBAU FROM HANSER FACHBUCHVERLAG PDF

Markus BöhmB Account Technology Architect Microsoft Schweiz GmbH

H.1 FORMI: An RMI Extension for Adaptive Applications H.1 FORMI: An RMI Extension for Adaptive Applications

Prediction Market, 28th July 2012 Information and Instructions. Prognosemärkte Lehrstuhl für Betriebswirtschaftslehre insbes.

Employment and Salary Verification in the Internet (PA-PA-US)

prorm Workload Workload promx GmbH Nordring Nuremberg

new possibilities for the worldwide market

How to access licensed products from providers who are already operating productively in. General Information Shibboleth login...

Brandbook. How to use our logo, our icon and the QR-Codes Wie verwendet Sie unser Logo, Icon und die QR-Codes. Version 1.0.1

ETHISCHES ARGUMENTIEREN IN DER SCHULE: GESELLSCHAFTLICHE, PSYCHOLOGISCHE UND PHILOSOPHISCHE GRUNDLAGEN UND DIDAKTISCHE ANSTZE (GERMAN

BIW Wahlpflichtmodul. Einführung in Solr, Pipeline und REST. Philipp Schaer, TH Köln (University of Applied Sciences), Cologne, Germany

Java Tools JDK. IDEs. Downloads. Eclipse. IntelliJ. NetBeans. Java SE 8 Java SE 8 Documentation

Software development with continuous integration

Max und Moritz: Eine Bubengeschichte in Sieben Streichen (German Edition)

Distributed testing. Demo Video

Ihr Dienstleister für individuelle Softwareentwicklung und IT-Beratung

Username and password privileges. Rechteverwaltung. Controlling User Access. Arten von Rechten Vergabe und Entzug von Rechten DBS1 2004

German translation: technology

Beschwerdemanagement / Complaint Management

Lies doch mal! 2: 50 wichtige Jugendbücher (German Edition)

"What's in the news? - or: why Angela Merkel is not significant

Ingenics Project Portal

Schöpfung als Thema des Religionsunterrichts in der Sekundarstufe II (German Edition)

Anbindung ekey an VASERControl

Killy Literaturlexikon: Autoren Und Werke Des Deutschsprachigen Kulturraumes 2., Vollstandig Uberarbeitete Auflage (German Edition)

Was heißt Denken?: Vorlesung Wintersemester 1951/52. [Was bedeutet das alles?] (Reclams Universal-Bibliothek) (German Edition)

Network-Aligned content delivery through collaborative optimization

BVM-Tutorial 2010: BlueBerry A modular, cross-platform, C++ application framework

Engineering the Factory of the Future Now.Next.Beyond. Heiko Schwindt VP Automation & Electrification Solutions, Bosch Rexroth

Sinn und Aufgabe eines Wissenschaftlers: Textvergleich zweier klassischer Autoren (German Edition)

Open Archives Initiative - Protocol for Metadata Harvesting (OAI-PMH)

Handbuch der therapeutischen Seelsorge: Die Seelsorge-Praxis / Gesprächsführung in der Seelsorge (German Edition)

Transkript:

SINN and XQuery: Results and Implementation Thomas Severiens Thomas.Severiens@ISN-Oldenburg.de Michael Schlenker Michael.Schlenker@ISN-Oldenburg.de Institute for Science Networking Oldenburg GmbH SINN03, Oldenburg, 17.-19.09.2003

Content! Information Sources and Retrieval Mechanisms! Query-Language! Searching for Physics! Distributed Network! User Benefit! Implementation DXQ Structure! Implementation DXQ User-Interface! Implementation DXQ Examples 1

Information Sources and Retrieval Mechanisms! Google: Fulltext Search on Distributed, Online, Free Information! PhysNet: Fulltext Search on Distributed, Professional, Online, Free Information! PhysDoc: Fulltext Search on Distributed, Professional, Online, Free Publications (Articles, PrePrints,...)! Inspec, Abstract-Services, Publishers, etc.: Metadata- & Abstract Search on (Distributed), Professional, (Online), Publications 2

Information Sources and Retrieval Mechanisms! Google: Simple Search, easy to use, not optimized for structured search! PhysNet: Simple Search, easy to use, structured search not implemented! PhysDoc: Structured Search, easy to use, metadata search implemented, booleans! Inspec, Abstract-Services, Publishers, etc.: Query Language, for professional users, several easier to use web-interfaces! PhysDoc-SINN: XML-Query, Professional Query Language, as web-service for other applications, e.g. user-interfaces 3

XML-Query! Query-Language, optimized for highly structured search on highly structured data (XML).! Query is XML, Data is XML, Results are XML! Own datamodel and datatypes (closely leaned upon XML-Schema) (but Schema is buggy, so what to do?)! Complete programming language! Was optimized for database-world, could be adopted for necessities of internet-retrieval! Problems: Namespace-Handling, Casting (solved on Sept. 3rd 2003) 4

Distributed Searchengines! Sharing Indexes: A Index-files B A and B have similar content User may ask A or B, getting similar results + For data, which is valid over long periods - For dynamic data - Broad bandwidth between A and B required + User needs connection to A or B only 5

Distributed Searchengines! Sharing Queries: User Distributor A B A and B may have different content User asks Distributor to distribute queries (agents) + For dynamic data - Results depend on connectivity + A and B share computing load - Problem: ranking, merging algorithm, doublets 6

Distributed Searchengines! Combination: User Distributor A Index-files B A and B share parts of their index-files, to optimize availability, redundancy of data, computing load of participating servers. XML-Query allows the user to program merging algorithms, to be executed by the distributor. XML-Query allows to send complex queries into the system. Let s scale this model onto PhysDoc 7

PhysDoc-Search today! Harvest-Software based network of searchengines (without DXQ-Software installed) User User User Interface Interface Interface Broker Index Broker Index Broker Gatherer Gatherer Gatherer Gatherer Gatherer Gatherer Gatherer 8

PhysDoc the next step! How to re-use the existing network - Network of software - Network of organizations - Network of people Offering work power Offering computer power! SINN: Use the existing distributed workforce to implement a new, better, more intelligent search facility. 9

PhysDoc-Search Step 1! All software for step 1 is ready for implementation! User Interface XQD User Interface XQD XDP XDP XDP XML-DB XML-DB XML-DB Broker Index Broker Index Broker Gatherer Gatherer Gatherer Gatherer Gatherer Gatherer Gatherer 10

DXQ Benefit for the User! DXQ: Distributed XML-Query! What are the benefits for the users? - Queries may be highly structured - XML-structured results Better User-Interfaces possible - Same redundancy of data - Higher system-performance, due to load-information exchange - Reduced local computing load, due to sharing of workforce implemented 11

DXQ A closer view! For more information on the protocol: arxiv.org/abs/cs.dc/0309022! XQD (Distributor) and XDP (Provider) exchange queries, results and status information. User Interface XQD XDP XML-DB XDP XML-DB 12

PhysDoc-Search Step 2! Most of the software is ready for implementation! All software will be available soon. User Interface XQD User Interface XQD XDP XDP XML-DB XML-DB Cache Cache XQD XDP XDP XDP XML-DB Cache?!? XML-DB XML-DB Broker Index Broker Gatherer Gatherer Gatherer Gatherer Gatherer Gatherer Gatherer 13

PhysDoc-Search Step 3! Much work to do, post-sinn perspective! Replace SOIF by XML User Interface XQD User Interface XQD User Interface XQD XQD XQD XDP XML-DB Cache XML-Agent XDP XDP XML-DB XML-DB Cache Cache XML-Agent XML-Agent XML-Agent XML-Agent XML-Agent XML-Agen XDP XDP XDP XDP XML-DB XML-DB XML-DB XML-DB Cache Cache Cache Cache 14

XDP: Problems to be solved! XML-Database: Choose database, which supports native XML! XML-Database: Choose database, which supports XML- Query! XML-Processing results nearly always in very high computing load! Find work-arounds... XDP XML-DB 15

XQD Implementation! Handles communication with User Clients! Handles communication with Data Providers! Aggregates results via predefined algorithms or user supplied XML-Query programs XQD Client Interface XDP Interface Galax XML-Query Processor 16

Galax XML-Query Processor! Open Source! Provides various easy to use language bindings (C, Java, OCaml)! XML-Projection feature to reduce memory consumption Galax XML-Query Processor http://db.bell-labs.com/galax 17

XDP- Implementation! Communicates with XQD via DXQP! Provides XML-Query interface to the database or uses an existing XML-Query interface XDP XQD Interface XML-Query Processor Database XML DOM 18

XML DOM memory problems! XML Document Object Model (DOM) uses large amounts of memory, especially most Java libraries - Jdom: 25x source xml document - Tdom: 3x source xml document! XML-Query operates on the DOM! Source xml documents for the search index are in the some hundred megabytes range 19

Solutions for the Memory Problem SAX Stream Processing! Low Complexity! Document is reparsed for each XML-Query! Very low memory consumption Persistent DOM! High Complexity! Document is parsed once into a database! Medium memory consumption Not useful for XML-Query on large documents. Usable for XML-Query on large documents. 20

XDP: Persistent DOM! Use a database for persistence and efficient storage of the index! Provide a virtual DOM style access to the database! Plug the virtual DOM into the XML-Query processor! Virtual DOM support for Galax is in current development 21

DXQ Client Implementation! Provide functionality to send queries into the DXQ network! Provide functionality to introspect XQDs! Handle the DXQ protocol details for the user 22

DXQ Implementations! C and Tcl based client implementations are available, with simple UI examples! A C based XQD implementation is available using Galax as query processor! A C based XDP implementation is available using Galax as query processor 23

DXQ Protocol! DXQP is a message based protocol! DXQP can be implemented via any message exchange mechanism (HTTP, Sockets, SMTP,...)! DXQ is Unicode based, so non-us character sets are supported 24

DXQP Message Example DXQP-1.0 XML-QUERY Msg-From: dxqp://metasearch.isn-oldenburg.de/dxq-xqd/ Msg-To: dxqp://physnet-mirror.isn-oldenburg.de:8750/ Transaction-ID: 1 Content-Length: 23 let $a :=.//author return $a 25

DXQ Tcl Client Basic Example package require Tcl 8.4 package require ::dxqp::client package require ::dxqp::tcp-transport set c [::dxqp::client::dxqclient] set t ::dxqp::tcp-transport::transport set xqd dxqp://harvest.physik.uni-oldenburg.de:8750/ set query {<result>\{ for $r in //row where $r/id < 2 return $ \}</result>} puts [$c queryxqd $t $xqd $query concatenate] 26

DXQ C Client Web UI Screenshot von Christians CGI client 27

Thank you for your Attention Thomas Severiens Thomas.Severiens@ISN-Oldenburg.de Michael Schlenker Michael.Schlenker@ISN-Oldenburg.de For DXQ-Protocol: arxiv.org/abs/cs.dc/0309022 For the DXQ-Software: / For XML-Query: www.w3c.org/xml/query 28