PIDALION: Implementation issues of a Java-based Multimedia Search Engine over the web
Dimitris E. Charilas, Ourania I. Markaki
National Technical University of Athens, Department of Electrical and Computer Engineering,
Heroon Polytechneiou 9, Athens, 15773, Greece
Phone: (+30) 210-772 2078  E-mail: omarkaki@gmail
Keywords: multimedia content, queries, content-based retrieval, multimedia crawler, metadata, image histogram, hierarchical presentation
Abstract - Fuelled by the rapid expansion of broadband connectivity and increasing interest in online multimedia-rich applications, the volume of digital multimedia content has skyrocketed. Among other things, this growth compounds the need for more effective methods of searching multimedia information. The automated web search engines currently in use rely only on text descriptions and, as a result, return poor-quality matches for multimedia content. Internet users therefore still lack the services of a true multimedia search engine. The scope of this paper is to present an implementation approach, in the Java programming language, for a personalized web-based multimedia search engine. This approach combines the characteristics of current search engines with new, innovative features which at the same time guarantee quick system response and better search results. The paper gives an analytical presentation of all the components required to form a multimedia search engine, as well as indications on how to implement key algorithms and functions.
1. INTRODUCTION
The web creates new challenges for information retrieval. The amount of information on the web is growing rapidly, and so is the number of new users inexperienced in the art of web research. It is estimated that 1-2 exabytes (millions of terabytes) of new information are created on the Web each year, and this amount is anticipated to grow by a factor of 10 over the following two years. Automated search engines that rely on keyword matching usually return too many low-quality matches, and the situation is worse for multimedia content. The most popular search engine, Google [1], relies only on keywords to search for images and holds no information on their semantic content. Content-based image retrieval (CBIR) systems try to solve this problem. Many CBIR systems have recently been proposed and implemented in the literature. Examples include the QBIC system [2], which exploits colour information; the PicToSeek system [3], which combines colour and shape invariant features to perform image retrieval; and Virage [4], which allows users to manually adjust the importance of the extracted descriptors according to their own perception. Fuzzy organization of the descriptors is proposed in [5] to increase retrieval precision at a given recall value, while 3D searching is discussed in [6]. Applications of content-based retrieval systems are examined in [7], a system for music access is proposed in [8], and personalized retrieval is examined in [9].
Last but not least, MARVEL, the latest and most intelligent content-based search engine, developed by IBM Research (USA) in 2004 [10], tries to increase retrieval precision by incorporating semantic annotation into the media volumes. However, all of these approaches have only static, local access to the system's database and therefore cannot retrieve content from the web [11]. Furthermore, the aforementioned works focus on algorithms for efficient content-based retrieval rather than on the practical issues of implementing a large-scale multimedia search engine over the Web. Several techniques for making distributed multimedia content searchable have been proposed so far. [12] describes techniques such as checking outgoing links, analyzing the referring page, mining the media file for textual information, and exploiting metadata based on the Dublin Core metadata model or the MPEG-7 standard.
This paper describes a multimedia search engine that combines features from existing search engines and enhances their functionality through innovative algorithms and mechanisms. Our goal is not only to describe the system's architecture and the interconnection of its parts, but also to explain how the algorithms can be implemented in Java code. The proposed system, named PIDALION, runs in a Windows environment, while the JavaServer Pages (JSP) and Java Servlets technologies are adopted to ensure the system's interoperability and dynamic behaviour. The system's database runs on SQL Server 2000. One of the key features of the proposed search engine is the provision of fully personalized retrieval services: users of PIDALION may share their personal content either with all web users or within groups, and may maintain a personal profile in which their preferences are stored. Personalized retrieval is achieved through the creation of social groups and the use of dynamic relevance feedback mechanisms, which tailor the system's behaviour to the current user's preferences.
978-1-4244-4530-1/09/$25.00 ©2009 IEEE
This paper is organized as follows: Section 2 presents the system's architecture, briefly explaining the role of each main component. Sections 3 to 7 present the functionality, architecture and key features and innovations of each component, with key algorithms given in pseudo-code. Finally, Section 8 summarizes the issues covered in this paper and proposes future extensions.
2. SYSTEM OVERVIEW
Fig. 1. Interconnection between subsystems
The platform described in this paper consists of the following subsystems:
• The multimedia crawling subsystem, whose role is to index multimedia content and handle updates to the index
• The multimedia metadata subsystem, which extracts metadata from multimedia content according to the MPEG-7 descriptors, thereby achieving interoperability
• The retrieval and display subsystem, which is responsible for scanning the database for multimedia content that matches specific criteria and forwarding it to the interface subsystem
• The interface subsystem, which enables interaction and communication between the user and the system, provides a functional presentation of the retrieved content and allows the composition of complex queries
• The multimedia database subsystem, which covers the need to store large amounts of metadata and thumbnails, as well as user profiles and preferences

The way in which these subsystems interact and cooperate is depicted in Figure 1.
3. THE MULTIMEDIA CRAWLING SUBSYSTEM
This subsystem is responsible for locating and indexing multimedia content. Its architecture must therefore address both the detection of new web pages and the storage of the relevant information in the system's database. In Google [15], web crawling is carried out by several distributed crawlers: a URL server sends the crawlers lists of URLs to be fetched, and the fetched web pages are then sent to the store server, which compresses them and stores them in a repository [13]. In PIDALION, multimedia crawling is implemented through Java sockets that transfer web pages and multimedia content. The architecture and functionality of the multimedia crawling subsystem are presented in Figure 2. As shown there, two application scenarios are possible: content indexing from Web servers and content indexing from home personal computers.
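The text states only that Java sockets transfer pages and media. A minimal sketch of such a transfer follows, assuming plain HTTP/1.0 with no HTTPS, redirects, chunked encoding or status-code checks (all of which a production crawler would need). The class SocketFetcher and its methods are hypothetical names, not part of PIDALION.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: HTTP/1.0 GET over a plain socket (no TLS, no redirects). */
final class SocketFetcher {

    /** Downloads host:port/path and returns the response body (status line and headers removed). */
    static byte[] fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            socket.setSoTimeout(10_000); // fail instead of hanging on an unresponsive server
            String request = "GET " + path + " HTTP/1.0\r\n"
                    + "Host: " + host + "\r\n"
                    + "Connection: close\r\n"
                    + "\r\n";
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // With HTTP/1.0 the server closes the connection after the body,
            // so reading until end-of-stream yields the complete response.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                response.write(chunk, 0, n);
            }
            return stripHeaders(response.toByteArray());
        }
    }

    /** Returns the bytes after the first CRLF CRLF, or an empty array if there is none. */
    static byte[] stripHeaders(byte[] response) {
        for (int i = 0; i + 3 < response.length; i++) {
            if (response[i] == '\r' && response[i + 1] == '\n'
                    && response[i + 2] == '\r' && response[i + 3] == '\n') {
                return Arrays.copyOfRange(response, i + 4, response.length);
            }
        }
        return new byte[0];
    }
}
```

Because the body is returned as raw bytes, the same call can serve both HTML pages and image files.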
Fig. 2. Architecture of the multimedia crawling subsystem
3.1 Content indexing from web servers
Content indexing from web servers concerns multimedia information that is distributed and available on-line throughout the Internet. In the proposed architecture, once a new web page has been detected and registered, it is up to the multimedia crawling subsystem to access it and fetch its content. More specifically, once new multimedia content has been located, the system's daemon scans all relevant location records in the database, identifies those not previously accessed, extracts the necessary content through sockets and stores it in the appropriate form in the database. This process is described through pseudo-code in the following lines.
    Search database for unchecked locations
    while (an unchecked location is found) {
        Open Java socket and download web page
        Parse web page:
            - make a list of image links
            - make a list of links to other web pages
        while (list has more image links) {
            Open Java socket and download image
            Process (and store) image
            Update system database
        }
        Update database (set web page to "checked")
        Add web page links to database as unchecked locations
    }
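The pseudo-code above maps naturally onto a work queue. The following Java sketch is one possible reading of it, not the project's code: the database is replaced by an in-memory queue and set, page download and image processing are passed in as functions (for instance a socket-based fetch), and links are found with simple regular expressions that assume absolute URLs. A real crawler would use an HTML parser and resolve relative links, e.g. with java.net.URI.resolve. CrawlLoop and its members are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical in-memory version of the crawl loop (queue and sets stand in for database tables). */
final class CrawlLoop {
    private static final Pattern IMG =
            Pattern.compile("<img[^>]*\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);
    private static final Pattern HREF =
            Pattern.compile("<a[^>]*\\shref\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    private final Deque<String> unchecked = new ArrayDeque<>(); // locations still to visit
    private final Set<String> known = new HashSet<>();          // every location ever queued
    private final List<String> checked = new ArrayList<>();     // pages already processed

    CrawlLoop(String seedUrl) {
        addUnchecked(seedUrl);
    }

    private void addUnchecked(String url) {
        if (known.add(url)) {
            unchecked.add(url); // never queue the same page twice
        }
    }

    /** Returns the first capture group of every match of p in html. */
    static List<String> extract(Pattern p, String html) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    /** fetchPage maps a URL to its HTML; processImage downloads, processes and stores one image. */
    List<String> run(Function<String, String> fetchPage, Consumer<String> processImage, int maxPages) {
        while (!unchecked.isEmpty() && checked.size() < maxPages) {
            String page = unchecked.poll();
            String html = fetchPage.apply(page);
            for (String img : extract(IMG, html)) {
                processImage.accept(img);
            }
            checked.add(page); // set web page to "checked"
            for (String link : extract(HREF, html)) {
                addUnchecked(link); // new pages enter as unchecked locations
            }
        }
        return checked;
    }
}
```

The maxPages bound is an added safeguard for the sketch; the loop in the pseudo-code runs until no unchecked location remains.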
3.2 Content Indexing from Home PCs - Social Group Scenario
In this case, the multimedia content is not on-line at all times but resides on the users' home personal computers. This indexing approach is particularly convenient for social groups, an interesting web application that allows users to share multimedia content among friends and groups of common interest. Content indexing is accomplished here through a Java application that enables home users to scan a directory on their PC and upload their personal multimedia content to the system server. When users download and run the application, they are prompted to specify the directory (remote from the server's point of view) where the multimedia content to be indexed lies. This content is then automatically located, sent to the system server, processed and stored, and a personalized multimedia index is created for this purpose.
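The directory scan performed by the home-PC application could be sketched with java.nio.file, assuming Java 9 or later. The class name and the list of accepted extensions are illustrative assumptions (the paper does not list the supported formats), and the upload of each file to the system server is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical scanner for the home-PC indexing application. */
final class HomeDirectoryScanner {
    // Assumed whitelist of media extensions (lower case, without the dot).
    private static final Set<String> MEDIA_EXTENSIONS =
            Set.of("jpg", "jpeg", "png", "gif", "bmp", "avi", "mpg", "mov");

    /** Recursively lists the media files under root, in a stable (sorted) order. */
    static List<Path> findMedia(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) { // stream must be closed to release directory handles
            return paths.filter(Files::isRegularFile)
                        .filter(p -> MEDIA_EXTENSIONS.contains(extension(p)))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    /** Lower-cased extension of the file name, or "" if it has none. */
    static String extension(Path p) {
        String name = p.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```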
4. THE MULTIMEDIA METADATA SUBSYSTEM
The functionality of this subsystem is primarily related to metadata extraction from multimedia content retrieved from the web or submitted by users. The multimedia metadata extraction module is activated, first, each time new multimedia content is identified and, second, each time a user performs a query by example. Multimedia metadata are encoded according to the MPEG-7 standard to ensure interoperability among different types of distributed content.
4.1 Metadata Extraction
As far as image processing and visual descriptor extraction are concerned, a new Image object is first created. Its pixels are acquired by creating a PixelGrabber object and calling its grabPixels() method, after which the width and height of the image are also available. Once the pixels have been acquired, masks are applied to isolate the R, G and B values, yielding one byte (8 bits) per colour component of each pixel. The RGB values of the extracted pixels are then transformed into HSV values, from which the image histogram is constructed. To make the histogram fully independent of the image size, its values are normalized by dividing them by the total number of pixels in the original image.
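The extraction steps above can be put together as follows. This is a sketch under assumptions: it uses the PixelGrabber constructor that grabs the whole image (w = h = -1) and forces the default ARGB colour model, converts with java.awt.Color.RGBtoHSB, and quantizes each of H, S and V into 256 bins (the count given in Section 7.2). The class name HsvHistogram is hypothetical.

```java
import java.awt.Color;
import java.awt.Image;
import java.awt.image.PixelGrabber;

/** Hypothetical normalized HSV histogram extractor. */
final class HsvHistogram {
    static final int BINS = 256; // bins per HSV component

    /** Returns hist[c][bin] for c = 0 (H), 1 (S), 2 (V); each row sums to 1. */
    static double[][] compute(Image img) throws InterruptedException {
        // w = h = -1 grabs the whole image; forceRGB = true delivers default-ARGB ints.
        PixelGrabber grabber = new PixelGrabber(img, 0, 0, -1, -1, true);
        if (!grabber.grabPixels()) {
            throw new IllegalStateException("pixel grab failed");
        }
        int width = grabber.getWidth();   // known only after grabPixels()
        int height = grabber.getHeight();
        int[] pixels = (int[]) grabber.getPixels();

        double[][] hist = new double[3][BINS];
        float[] hsv = new float[3];
        for (int i = 0; i < width * height; i++) {
            int argb = pixels[i];
            int r = (argb >> 16) & 0xff; // each mask isolates one 8-bit channel
            int g = (argb >> 8) & 0xff;
            int b = argb & 0xff;
            Color.RGBtoHSB(r, g, b, hsv); // H, S, V each in [0, 1]
            for (int c = 0; c < 3; c++) {
                hist[c][Math.min(BINS - 1, (int) (hsv[c] * BINS))]++;
            }
        }
        double total = (double) width * height;
        for (double[] row : hist) {
            for (int k = 0; k < BINS; k++) {
                row[k] /= total; // normalization makes the histogram size-independent
            }
        }
        return hist;
    }
}
```

For example, HsvHistogram.compute(ImageIO.read(file)) on a loaded image would yield three rows of 256 values, each summing to 1 regardless of the image's dimensions.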
4.2 Query process
This task relates to the use of the search engine by internet users and consists of extracting metadata from the multimedia content that users submit through the system's interface when composing a query. In this case the metadata need not be stored; they are simply forwarded to the retrieval and display subsystem, which performs the search.
4.3 Indexing Process
For images, two kinds of data are extracted and stored: image metadata and an image thumbnail. For video files, key frames are extracted first, and the metadata extraction process is then repeated for each key frame, which is treated as an independent image. More specifically, a video processor is activated to analyze the video content and extract appropriate key frames, based on the video summarization algorithms of [14]. This algorithm was selected because a) it is extremely fast (real-time processing) and b) the number of key frames to be extracted need not be known in advance (it is estimated automatically from the video content). The Video Processor has been implemented using the Java Media Framework (JMF) API (java.sun/products/java-media/jmf). First, a media locator is created, together with a processor that is used as a player to play back the media. The Video Indexer accesses individual video frames through a "pass-through" codec inserted into the data flow path: as data pass through this codec, a callback is invoked for each frame of video data. While the processor is in the configured state, two codecs, PreAccessCodec and PostAccessCodec, are set on the video track; these codecs give access to the individual video frames of the media file.
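The summarization algorithm of [14] is not described here, so the sketch below substitutes a much simpler technique, named plainly: a frame becomes a key frame when its histogram differs from the previous key frame by more than a threshold (L1 distance). It is included only to illustrate how the number of key frames can follow the video content rather than being fixed in advance. The per-frame histograms are assumed to come from the pass-through codec callback; KeyFrameSelector and the threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative threshold-based key-frame selector; NOT the algorithm of [14]. */
final class KeyFrameSelector {

    /** L1 distance between two histograms of equal length. */
    static double l1(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) {
            d += Math.abs(a[i] - b[i]);
        }
        return d;
    }

    /** Returns the indices of frames whose histogram departs from the last key frame by more than threshold. */
    static List<Integer> select(List<double[]> frameHistograms, double threshold) {
        List<Integer> keys = new ArrayList<>();
        double[] lastKey = null;
        for (int i = 0; i < frameHistograms.size(); i++) {
            double[] h = frameHistograms.get(i);
            if (lastKey == null || l1(h, lastKey) > threshold) {
                keys.add(i);   // content changed enough: start a new key frame
                lastKey = h;
            }
        }
        return keys;
    }
}
```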
4.4 Type of Metadata
The proposed search engine adopts colour and texture visual descriptors of the MPEG-7 standard. For colour information, the scalable colour descriptor (SCD) and the dominant colour descriptor (DCD) are adopted, while for texture information the homogeneous and non-homogeneous texture descriptors are used. Other types of metadata, such as file type, category and textual information, are used to further improve the precision of multimedia retrieval. The system also uses metadata to maintain social groups and personalized indices. In this search engine, the thematic category of multimedia content is determined manually in order to minimize erroneous categorizations. A more complicated approach is proposed in [15], where semantic representations at different abstraction levels are used to create hierarchical groups. MARVEL [11] also supports automatic annotation based on patterns.
5. THE RETRIEVAL AND DISPLAY SUBSYSTEM
The retrieval and display subsystem is responsible both for accessing the database, executing complex queries and selecting the registrations that best match the given search criteria, and for organizing and presenting the retrieved results.
5.1 The Retrieval functionality
Once the registrations that match the search criteria are determined, the retrieval and display subsystem draws the corresponding thumbnails, organizes their layout and presents them to the user. The retrieval process is more complex, and thus requires further analysis, when the user provides more than one search criterion. One approach to handling multiple search criteria would be to find the set of registrations that satisfies each criterion separately and then compute the intersection of these sets, i.e. the registrations that satisfy all criteria. This is, however, the least efficient strategy, since it is time-consuming and significantly reduces the system's efficiency. Instead, multiple search criteria are incorporated into complex queries composed according to a hierarchical structure. This structure ranges from the easiest-to-check criterion, which imposes the lowest computational load (bottom), to the most complex and time-consuming one (top). As a result, the texture and colour histogram criteria are at the top of the hierarchy, since for each registration that may match them the system has to examine three histogram vectors of ten values each, i.e. thirty parameters per registration. In short, this retrieval approach rejects as many registrations as possible using easy-to-check criteria, so that the most complex criteria are applied only to a limited group of registrations. Figure 3 illustrates the hierarchical structure of the search criteria.
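The paper composes the criteria into database queries; the following in-memory Java analogue is only a sketch of the same cheapest-first idea. The class CriteriaHierarchy, the numeric cost values and the use of L1 distance for the histogram criterion are assumptions, not details given in the paper.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical in-memory illustration of the criteria hierarchy. */
final class CriteriaHierarchy {

    /** Hypothetical registration: cheap scalar fields plus the 3 x 10 clustered histogram values. */
    static final class Registration {
        final String id;
        final String fileType;
        final String category;
        final double[] histogram; // 30 values

        Registration(String id, String fileType, String category, double[] histogram) {
            this.id = id;
            this.fileType = fileType;
            this.category = category;
            this.histogram = histogram;
        }
    }

    /** A search criterion with an estimated relative cost (hypothetical units). */
    static final class Criterion {
        final int cost;
        final Predicate<Registration> test;

        Criterion(int cost, Predicate<Registration> test) {
            this.cost = cost;
            this.test = test;
        }
    }

    static List<Registration> search(List<Registration> all, List<Criterion> criteria) {
        // Bottom of the hierarchy first: cheapest criteria are evaluated earliest.
        List<Criterion> ordered = criteria.stream()
                .sorted(Comparator.comparingInt((Criterion c) -> c.cost))
                .collect(Collectors.toList());
        // allMatch stops at the first failing criterion, so the expensive histogram
        // check runs only on registrations that passed every cheaper one.
        return all.stream()
                .filter(r -> ordered.stream().allMatch(c -> c.test.test(r)))
                .collect(Collectors.toList());
    }

    /** L1 distance between two histogram vectors of equal length. */
    static double l1(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) {
            d += Math.abs(a[i] - b[i]);
        }
        return d;
    }
}
```

In a database setting, the same effect relies on the query planner or on the explicit ordering of conditions in the query.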
So far only the simplest search scenario, which involves submitting an image and searching among image registrations, has been analyzed. However, PIDALION also makes it possible to search for video files, or even to submit a video as a prototype for retrieving either images or other video files. There are therefore four possible search modes:
1. Submission of image and search for images
2. Submission of image and search for videos
3. Submission of video and search for images
4. Submission of video and search for videos
Handling queries for video files is even more complex. In the fourth scenario, the video file uploaded by the user is split into frames, and each frame is treated as an independent image. A video registration in the database is considered a match only if at least one of its frames is similar to a frame of the prototype. Scenarios 2 and 3 obviously derive from the combination of cases 1 and 4.
Fig. 3. Criteria hierarchy
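The video matching rule of Section 5.1 (a stored video matches if at least one of its frames is similar to a prototype frame) reduces to an any-pair test. In the sketch below, which assumes L1 distance and a fixed similarity threshold (neither is specified in the paper), an image is simply a one-frame list, so the same method also covers search modes 1 to 3. VideoMatch is a hypothetical name.

```java
import java.util.List;

/** Hypothetical implementation of the "at least one similar frame" rule. */
final class VideoMatch {

    /** L1 distance between two frame histograms of equal length. */
    static double l1(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) {
            d += Math.abs(a[i] - b[i]);
        }
        return d;
    }

    /**
     * True if any stored frame lies within threshold of any prototype frame.
     * Pass a single-element list for an image (prototype or stored item).
     */
    static boolean matches(List<double[]> prototypeFrames, List<double[]> storedFrames, double threshold) {
        for (double[] p : prototypeFrames) {
            for (double[] s : storedFrames) {
                if (l1(p, s) <= threshold) {
                    return true; // one similar frame is enough
                }
            }
        }
        return false;
    }
}
```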
5.2 The display functionality
To enable non-linear access to the retrieved content, and thus reduce the time users need to reach it, the results are organized into 8 groups, each defined by a particular set of algebraic relations involving dominant colour or histogram reference values. These reference values are either the dominant colour or histogram values of the file submitted by the user (in the case of query by example), or the mean values calculated over the retrieved results (in the case of simple queries). The final presentation of the retrieved content includes just one element from each of these groups, which serves as a link to the rest of the group's results, forming a hierarchical structure that lets users browse the content of interest to them. Non-linear presentation of the retrieved content is one of the most important and innovative features of PIDALION, since almost all current search engines have adopted serial presentation.

6. THE INTERFACE SUBSYSTEM
The system's interface is designed to permit easy navigation among the various services supported, without requiring users to have particular skills or knowledge. The user interface is a dynamically changing environment and has therefore been implemented using the JSP (JavaServer Pages) and Java Servlets technologies, which enable HTML pages to be produced dynamically through Java code. The main functionalities of the interface subsystem include:
a) multimedia content declaration
b) composition of queries and retrieval of results
The system's interface interacts with the retrieval and display subsystem and presents the retrieved results in clusters, following the presentation scheme that the latter provides. An additional innovative feature of PIDALION is the possibility of evaluating the retrieved results: by clicking the checkbox below each retrieved registration, users can give the system feedback on the types of results that appeal to them or interest them most.
c) browsing multimedia content at a remote IP address
This service gives users access to multimedia content located at a remote IP address. To facilitate browsing, the multimedia content is organized in advance in a hierarchical structure.
d) user's personal profile
7. THE MULTIMEDIA DATABASE SUBSYSTEM
7.1 Database structure
Since the database interacts with most subsystems and plays an important role in storing and retrieving metadata, it is analyzed as a separate component of the search engine. Although the storage space holding the information needed for the system's operation is a single unit, to simplify the analysis the database is assumed to be divided into the following general sectors, each named after the type of data it contains:
Multimedia content location: this sector is required by the multimedia crawling subsystem; it stores the sites where new multimedia content is located, along with the data submitted to the system by interested users.
Metadata: this sector contains all the metadata extracted from the multimedia content; the retrieval subsystem refers to it to detect the set of registrations that match the search criteria each time a user submits a query.
Thumbnails: the thumbnails sector, whose function is self-explanatory, contains two fields: the thumbnail's name and its serial number. The thumbnail records stored here are forwarded to the retrieval and display subsystem whenever necessary.
User Profiles: each user of the search engine may have a record in this sector of the database, so that the system knows the user's preferences, which makes relevance feedback possible.
Social Groups Info: to keep the content of social groups invisible to unauthorized users, the information concerning social groups is stored in a separate database sector. This sector contains information on authorized users, indexed and shared content, etc.
7.2 Index organization for Real-time Response
Instead of storing the colour histogram as it is extracted from the multimedia content, the system stores its difference from a reference vector. Initially, 256 bins are extracted for each chromatic component of every image or video frame processed. Storing and comparing all this information (768 values per image) would inevitably overload the system and increase response time. For this reason, the extracted bins are grouped into 10 clusters, each of which, except the last, contains 26 original bins. Each registration therefore consists of 30 bins in total, which enables fast search through numerous registrations. The system's quick response is also ensured by SQL queries that are precompiled in the form of stored procedures. Precompiled SQL statements execute faster in the database and, moreover, make it harder for attackers to corrupt data or access content they are not authorized to access (for example through SQL injection), thus contributing to the system's security.
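The reduction from 256 bins to 10 clusters per component can be checked arithmetically: nine clusters of 26 bins cover 234 bins, leaving 22 for the last cluster, and 3 components × 10 clusters give the 30 stored values. A minimal sketch, assuming that a cluster's value is the sum of its normalized bins (the paper does not say how bins are merged); BinClustering is a hypothetical name.

```java
/** Hypothetical 256-to-10 bin clustering for one HSV component. */
final class BinClustering {
    static final int CLUSTER_WIDTH = 26; // 9 clusters x 26 bins + a last cluster of 22 bins = 256
    static final int CLUSTERS = 10;

    /** Sums 256 normalized bins into 10 coarse clusters. */
    static double[] cluster(double[] bins256) {
        if (bins256.length != 256) {
            throw new IllegalArgumentException("expected 256 bins, got " + bins256.length);
        }
        double[] out = new double[CLUSTERS];
        for (int i = 0; i < bins256.length; i++) {
            // i / 26 is at most 9 for i <= 255; the min() is only a guard.
            out[Math.min(i / CLUSTER_WIDTH, CLUSTERS - 1)] += bins256[i];
        }
        return out;
    }
}
```

Applying cluster() to the H, S and V rows of a histogram and concatenating the results gives the 30-value vector per registration.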
The search criteria provided by the user limit the search range, especially in the case of complex queries. The search space can be limited further through scalable sorting of metadata: similar metadata are stored close together, which reduces the system's response time. More specifically, the registrations are sorted by the distance of their histogram vector from a reference vector. Every time a user submits a new image, the same distance is calculated for it, and the search range is limited to registrations with similar distances. Since two similar histogram vectors necessarily have similar distances from the reference vector (by the triangle inequality), registrations whose distance differs substantially can be discarded without comparing their full vectors. The sorting process takes place off-line to avoid overloading the system. Moreover, since sorting should not interrupt users' queries, it takes place in a secondary database, which also serves as a backup: the secondary database draws the registrations from the primary database, sorts them and returns them, while the retrieval and display subsystem uses the primary database to handle queries.
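The distance-sorted organization can be illustrated with an in-memory index. The sketch assumes L1 distance and a fixed search radius eps. Because L1 is a metric, the triangle inequality gives |d(q, ref) - d(x, ref)| <= d(q, x), so any registration x within eps of the query q must have its stored key inside [d(q, ref) - eps, d(q, ref) + eps]; only that window of the sorted keys needs a full comparison. ReferenceDistanceIndex is a hypothetical class; in PIDALION the sorted data live in the database rather than in arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical index of histogram vectors sorted by their distance to a reference vector. */
final class ReferenceDistanceIndex {
    private final double[] reference;
    private final double[][] vectors; // registrations, sorted by distance to reference
    private final double[] keys;      // the corresponding distances, ascending

    /** Builds the index (the sorting step that the paper runs off-line). */
    ReferenceDistanceIndex(double[] reference, List<double[]> registrations) {
        this.reference = reference;
        this.vectors = registrations.toArray(new double[0][]);
        Arrays.sort(vectors, Comparator.comparingDouble((double[] v) -> l1(v, reference)));
        this.keys = new double[vectors.length];
        for (int i = 0; i < vectors.length; i++) {
            keys[i] = l1(vectors[i], reference);
        }
    }

    static double l1(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) {
            d += Math.abs(a[i] - b[i]);
        }
        return d;
    }

    /** Index of the first key >= x (lower-bound binary search). */
    private int lowerBound(double x) {
        int lo = 0;
        int hi = keys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid] < x) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** All registrations within eps of the query; only the candidate window is compared in full. */
    List<double[]> rangeSearch(double[] query, double eps) {
        double dq = l1(query, reference);
        List<double[]> hits = new ArrayList<>();
        for (int i = lowerBound(dq - eps); i < keys.length && keys[i] <= dq + eps; i++) {
            if (l1(query, vectors[i]) <= eps) {
                hits.add(vectors[i]);
            }
        }
        return hits;
    }
}
```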
8. CONCLUSIONS AND FUTURE WORK
In this paper, we described a multimedia search engine architecture and presented implementation issues as they arise from the Greek-funded project "PIDALION". The description of the system's components and their functionality shows how PIDALION combines the advantages of other search engines, enhances those features and allows scalability. The proposed search engine focuses on a) efficient organization of multimedia indices to allow real-time response even for large-scale multimedia content, b) hierarchical, content-based visualization of retrieved results to enable easy and efficient navigation among multimedia content, c) intelligent query composition to minimize response time and best match search criteria, d) fusion of text, semantic and visual descriptors to enhance content searching and interoperable
