Searchlight

This page is probably outdated

The information on this page might not be valid for the current state of ReactOS.
A Wiki Administrator should look at this page and decide or discuss what to do with it.


Find your files instantly.

Goal

The goal of Searchlight is to provide a tiny, fast and reliable search tool. For ordinary users it offers a simple user interface; for experts it offers extended search queries, a metadata and content database for faster search results, and a simple system-wide API.

  • Searchlight will be a search tool for ReactOS, enabling the user to search documents, multimedia files, chat logs, email and contact lists in a similar way to other desktop search applications.
  • Searchlight will be fast.
  • Searchlight starts returning results before you finish typing. Because it tracks all file changes, it can keep the data store up to date.

Views

Views (GUI):

  • Quick search: for normal users
  • Advanced search: regular expressions, query builder, extended metadata, etc.
  • Search by category: let the user select a category, e.g. "conversations", "contacts", "documents", "media"
  • Timeline: like the WinFS demo app: list all indexed files on a timeline
  • Smart/Dynamic Folders/Filters: queries saved to an XML file, which the explorer displays as a folder
  • Search by example: text files, images, etc.
  • Search by drawing

Search

Search query interfaces (see below).

Standard Search

A simple textbox, with no other visible GUI objects that could confuse the user.

Simple regular expressions and boolean logic will be composed by the program logic, similar to web search engines or e.g. "Spotlight":

  • space characters => AND
  • "|" => OR
  • "+" => "="
  • "-" => "!="
  • support for "(" and ")"
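The operator mapping listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `translate`, the tokenizer, and the choice to match every term against a single `File.Name` field are hypothetical and are not part of the Searchlight design.

```python
import re

# Tokens: "(", ")", "|" or a run of other non-space characters.
TOKEN = re.compile(r'\(|\)|\||[^\s()|]+')

def translate(query, field="File.Name"):
    """Turn a standard-search string into an explicit boolean expression.

    Spaces become AND, "|" becomes OR, a leading "+" means "=",
    a leading "-" means "!=", and parentheses are passed through.
    """
    out = []
    operand_before = False  # True when the previous token ended an operand
    for tok in TOKEN.findall(query):
        if tok == "|":
            out.append("OR")
            operand_before = False
        elif tok == ")":
            out.append(")")
            operand_before = True
        elif tok == "(":
            if operand_before:
                out.append("AND")  # implicit AND between operands
            out.append("(")
            operand_before = False
        else:
            word = tok.lstrip("+-")
            if not word:
                continue  # ignore a lone "+" or "-"
            if operand_before:
                out.append("AND")
            op = "!=" if tok.startswith("-") else "="
            out.append(f'{field}{op}"{word}"')
            operand_before = True
    return " ".join(out)

print(translate("*.jpg -games +wallpaper"))
print(translate("(cat | dog) -bird"))
```

A real implementation would also have to decide which metadata field each term applies to (name, extension, content, and so on); the sketch leaves that out on purpose.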

To combine some advanced features with the simple, minimalistic design, an option drop-down menu will let even an average user construct complex queries with metatags, etc. (using a small set of useful options).

Sample Queries:

  1. taskmgr.exe
  2. react*.pdf
  3. *.jpg -games +wallpaper

Another great feature will be query by sentences.

Sample Sentences:

  1. Show me all images which are smaller than 10x10 pixels.
  2. Show me all applications which are older than 5 years.
  3. Show me all meetings for tomorrow.
  4. Show me all contacts I talked to today.
  5. Show me all emails I got today.
  6. Copy all mp3 with 5 stars to volume x.

The sentence will be parsed and a valid SQL statement will be sent to the SQLite database.

Quick Search

Quick search and standard search are almost the same thing and share the same interface. Quick search is meant for the many people who don't want to learn even a simple search query syntax.

So what's quick search exactly?

"tom yesterday email" is such a quick search. It consists of three words, two of which are keywords (yesterday and email). This example query means: "Show me all emails I got from or wrote to Tom yesterday." Searchlight will automatically construct a valid SQL query and show you the requested results.
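The keyword recognition described here could look roughly like the following sketch. The keyword tables, the function name `parse_quick_search` and the shape of the result are assumptions made for illustration; the page does not define them.

```python
from datetime import date, timedelta

# Hypothetical keyword tables; a real list would be larger and localized.
DATE_KEYWORDS = {"today": 0, "yesterday": -1, "tomorrow": 1}
KIND_KEYWORDS = {"email": "email", "emails": "email",
                 "image": "image", "images": "image"}

def parse_quick_search(text, today=None):
    """Split a quick-search string into free-text terms and recognized keywords."""
    today = today or date.today()
    result = {"terms": [], "date": None, "kind": None}
    for word in text.lower().split():
        if word in DATE_KEYWORDS:
            result["date"] = today + timedelta(days=DATE_KEYWORDS[word])
        elif word in KIND_KEYWORDS:
            result["kind"] = KIND_KEYWORDS[word]
        else:
            result["terms"].append(word)  # e.g. a person's name
    return result

print(parse_quick_search("tom yesterday email"))
```

The structured result could then be turned into an SQL query, with the free-text terms matched against fields such as sender or recipient.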

Advanced Search

It could be wise to combine the quick search and the advanced search. For example, by default only a simple textbox is visible. If the user clicks on the "advanced search" button, more GUI objects become visible.

A cool feature will be the query constructor, similar to the ones in iTunes/Spotlight and Windows Vista Search. The user can construct a query simply by choosing combobox values. Each new "line" is a new expression, displayed with some combobox objects.

An IntelliSense-like auto-completion feature would be great too (see wikipedia:IntelliSense).

Sample Queries:

  1. a) File.Name="taskmgr" AND File.Extension="exe"
     b) File.Filename="taskmgr.exe"
  2. File.Name="react*" AND File.Extension="pdf"
  3. File.Extension="jpg" AND File.Name!="*games*" AND File.Name="*wallpaper*"
  4. File.Extension="jpg" AND Picture.Height>=768 AND Picture.Width>=1024
  5. File.Category="Media" AND File.Size>="100000" AND ( Video.Description="*reactos*" OR Audio.Composer="*friedl*" ) SORT BY "File.Size"
  6. File.Date>=Date.Tomorrow AND File.Date<=Date.NextWeekend

All search queries will be parsed, and the resulting valid SQL statement will be sent to the SQLite3 database (library).
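As a rough sketch of this parsing step, the following Python function turns one comparison from the advanced syntax into a parameterized SQL fragment. It assumes the `metaname`/`metacontent` columns used in the sample SQL on this page; the function name, the regular expression, and the mapping of "*" to SQL `LIKE` wildcards are illustrative assumptions. Only "=" and "!=" with quoted values are handled; relational operators, AND/OR and SORT BY are left out.

```python
import re

# One comparison of the form  Field="value"  or  Field!="value".
COMPARISON = re.compile(r'^\s*([\w.]+)\s*(!=|=)\s*"([^"]*)"\s*$')

def comparison_to_sql(expr):
    """Return (sql_fragment, parameters) for a single comparison."""
    m = COMPARISON.match(expr)
    if m is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    field, op, value = m.groups()
    if "*" in value:
        # "*" is the user-facing wildcard; SQL LIKE uses "%".
        # Note: "%" and "_" already present in the value are not escaped here.
        sql_op = "LIKE" if op == "=" else "NOT LIKE"
        value = value.replace("*", "%")
    else:
        sql_op = op
    return f"metaname = ? AND metacontent {sql_op} ?", (field, value)

print(comparison_to_sql('File.Filename="taskmgr.exe"'))
print(comparison_to_sql('File.Name!="*games*"'))
```

Passing the values as parameters rather than pasting them into the SQL string keeps user input from being interpreted as SQL.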

Sample SQL queries:

  • SELECT fileid, metaname, metacontent FROM metadata WHERE metaname='File.Filename' AND metacontent='taskmgr.exe' ;
  • SELECT * FROM metadata md1, metadata md2 WHERE md1.metaname='Item.Size' AND md1.metacontent >= 1000 AND md1.filelist_fileid=md2.filelist_fileid AND md2.metaname='Item.IsArchive' AND md2.metacontent='true' LIMIT 1000 ;
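To try out queries of this kind, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The table layout is an assumption: a single `fileid` column is used (the samples above mention both `fileid` and `filelist_fileid`), all values are stored as text, and the sample rows are invented. Because the values are text, the size comparison uses an explicit `CAST` so that it is done numerically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (fileid INTEGER, metaname TEXT, metacontent TEXT)"
)
# Invented sample rows, one (name, value) pair per row.
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [
        (1, "File.Filename", "taskmgr.exe"),
        (1, "Item.Size", "250000"),
        (1, "Item.IsArchive", "true"),
        (2, "File.Filename", "notes.txt"),
        (2, "Item.Size", "900"),
        (2, "Item.IsArchive", "true"),
        (3, "File.Filename", "big.iso"),
        (3, "Item.Size", "5000"),
        (3, "Item.IsArchive", "false"),
    ],
)

def find_by_filename(name):
    """File ids whose File.Filename equals the given name."""
    cur = conn.execute(
        "SELECT fileid FROM metadata "
        "WHERE metaname = 'File.Filename' AND metacontent = ?",
        (name,),
    )
    return [row[0] for row in cur]

def find_large_archives(min_size):
    """File ids that are archives and at least min_size bytes (self-join)."""
    cur = conn.execute(
        """
        SELECT md1.fileid
        FROM metadata md1
        JOIN metadata md2 ON md1.fileid = md2.fileid
        WHERE md1.metaname = 'Item.Size'
          AND CAST(md1.metacontent AS INTEGER) >= ?
          AND md2.metaname = 'Item.IsArchive'
          AND md2.metacontent = 'true'
        ORDER BY md1.fileid
        LIMIT 1000
        """,
        (min_size,),
    )
    return [row[0] for row in cur]

print(find_by_filename("taskmgr.exe"))
print(find_large_archives(1000))
```

Each additional metadata condition needs another self-join in this layout, which is the price of storing metadata as generic name/value rows.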

Extend Searchlight

Further possibilities to extend Searchlight:

  • Plugins (additional file formats, etc.)
  • an optional Firefox Extension, to index web sites as you view them
  • Thumbnail extraction from documents, images, mp3 and other media, allowing quick previewing
  • support for removable media
  • web interface (like Google Desktop Search)
  • integrated Web search
  • possibility to sync files and their data stored in the databases
  • add a picture similarity matching library such as SIMPLIcity (Semantics-sensitive Integrated Matching for Picture Libraries; http://www-db.stanford.edu/~wangz/project/imsearch/SIMPLIcity/TPAMI/ ; example: http://www.airliners.net/similarity/)

Organization

Searchlight will also let you save searches as a virtual folder. When you open the folder, it runs the search to populate the folder with items. By running the search in real time, the virtual folder will be able to catch and display all the new files that meet the search criteria. Virtual folders don't copy your files, so you can safely delete a virtual folder without losing any data.

Searchlight's metatag feature will help you organize files better by letting you attach descriptive "tags" to a file, which makes it easier to find. Metatags are an order-of-magnitude improvement over the simple file/folder organization scheme that hasn't changed much since the DOS days. You can tag any file with just about any word. For instance, you might have some videos, photos and planning documents all related to multiple projects. Under the traditional file system, all these files might go into one main folder, with subfolders for each different project. Then you have to deal with the puzzle of sharing the same file across multiple projects.

With the tagging features of Searchlight, you can easily give files multiple attributes. When you search for "reactos project", you're sure to get all the files associated with that project on the first try. If you have files that are relevant to more than one project, just tag them to make sure they also show up when you search or create a virtual folder for "winehq project". You can use the built-in tags, such as author or rating (stars), or you can use your own custom keywords. Gathering your files should take no more than a single search, even if the actual files are spread throughout the system.
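The relationship between tags and virtual folders can be illustrated with a small in-memory sketch. The class names `TagIndex` and `VirtualFolder` are hypothetical; the point is only that a virtual folder stores a query, not files, and re-runs it each time it is opened.

```python
from collections import defaultdict

class TagIndex:
    """Maps each tag to the set of file paths carrying it."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)

    def tag(self, path, *tags):
        for t in tags:
            self._files_by_tag[t.lower()].add(path)

    def search(self, *tags):
        """Files that carry all of the given tags, sorted by path."""
        sets = [self._files_by_tag.get(t.lower(), set()) for t in tags]
        return sorted(set.intersection(*sets)) if sets else []

class VirtualFolder:
    """A saved search; deleting it never touches the files themselves."""

    def __init__(self, index, *tags):
        self.index = index
        self.tags = tags

    def open(self):
        # The search runs on every open, so new matches appear automatically.
        return self.index.search(*self.tags)

index = TagIndex()
index.tag("plan.doc", "reactos", "winehq")   # one file, two projects
index.tag("logo.png", "reactos")
folder = VirtualFolder(index, "reactos")
print(folder.open())
index.tag("demo.avi", "reactos")             # tagged after the folder was saved
print(folder.open())
```

A file tagged for two projects shows up in both virtual folders without being copied, which is the sharing problem the folder hierarchy cannot solve.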

Metadata

Metadata (Greek meta "over" and Latin data "information", literally "data about data") is data that describes other data. Generally, a set of metadata describes a single set of data, called a resource.

  • "Metadata is data on data"

The main purpose of metadata is to speed up and enrich searching for resources. In general, search queries using metadata can save users from performing more complex filter operations manually.

Metadata can be divided into at least three parts: primary, secondary and tertiary metadata.

Primary Metadata

Primary metadata is information about the data that is stored in the files themselves. It is what people usually know as "metadata". It is extracted from the files (plus, optionally, portions of the file content) to generate the first set of metadata.

Examples:

  • MP3 files have metadata tags in a format called ID3
  • HTML files usually have several metadata attributes (title, description, keywords, etc.)

Secondary Metadata

Secondary metadata is information about the data that is not stored in the files themselves. It can be stored as filesystem attributes or in one or more "sidecar" files that belong to the file. If you move the files to another computer, the metadata comes along for the ride.

Examples:

  • User scoring/rating of files (e.g. stars)
  • Tags: a file belongs to projects X and Y
  • Categories, etc.
  • additional metadata tags which cannot be stored in the file format for some reason

Tertiary Metadata

Tertiary metadata is information about the data that is generally not stored in the files themselves. Like secondary metadata, it can be stored as filesystem attributes or in one or more "sidecar" files that belong to the file, so that it comes along when the files are moved to another computer.

An analysis of the original file, using several different specialized methods, may generate additional metadata that is associated with the file, or at least with a portion of its content, and that may not exist in the file's original metadata or content. Tertiary metadata may consist of a broader description of the original file content.

Tertiary metadata may be assigned by predefined rules (e.g. "image, focal length > 200, sensitivity > 800, shutter speed > 1000" = "photo, night, action"), optionally using external information (e.g. Internet Movie Database, Wikipedia, GPS, etc.).

Examples:

  • OCR
  • voice recognition
  • GPS (external) resource
  • picture analysis (picture/video/etc.): color/shape, time (day vs. night), nature (photo/drawing/painting; still/action), type (portrait/person/landscape), text (OCR), etc.
  • sound analysis (music, etc.): type (rock, country, jazz, etc.), recognized text (voice recognition), etc.
  • additional metadata tags that are generated by analysis of the original file content
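The predefined-rule idea from this section (image with long focal length, high sensitivity and fast shutter speed implies "photo, night, action") can be sketched as a small rule table. The rule format, the key names and the function `derive_tags` are assumptions for illustration only; the thresholds are taken from the example in the text.

```python
# Each rule: required file type, lower bounds on primary metadata, tags to add.
RULES = [
    {
        "type": "image",
        "greater_than": {"focal_length": 200, "sensitivity": 800, "shutter_speed": 1000},
        "tags": ["photo", "night", "action"],
    },
]

def derive_tags(primary, rules=RULES):
    """Return tertiary tags for one file, given its primary metadata."""
    tags = []
    for rule in rules:
        if primary.get("type") != rule["type"]:
            continue
        # Every listed value must be present and strictly above its threshold.
        if all(
            key in primary and primary[key] > limit
            for key, limit in rule["greater_than"].items()
        ):
            tags.extend(rule["tags"])
    return tags

print(derive_tags({"type": "image", "focal_length": 300,
                   "sensitivity": 1600, "shutter_speed": 2000}))
```

Keeping the rules as data rather than code would let users or plugins add their own rules without changing the analyzer.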

Metadata Portability and Survivability

Important, important, important, important!

Most available systems and applications don't care about metadata portability and survivability. Today, a simple application can delete or overwrite all of a file's metadata. Edit a photo captured with a digital camera in, e.g., MS Paint and then try to read its XMP metadata, and you will see what we mean.

Don't rely on ratings in iTunes or similar applications, even if you would love to: all that work will be lost one day when moving between different computers and music libraries.

These disadvantages are a great chance for Searchlight. Metadata shouldn't be stored in an application's monolithic data store (database) but in the file itself, as a filesystem attribute, or next to the file (sidecar file).

An important feature is to allow the user to remove parts or all metadata information from the selected files.
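A sidecar file, including the removal of some or all metadata, might be handled along the following lines. This is a sketch under assumptions: the `.searchlight.json` suffix, the JSON format and the function names are invented for illustration and are not defined by the Searchlight design.

```python
import json
from pathlib import Path

SIDECAR_SUFFIX = ".searchlight.json"  # hypothetical naming convention

def sidecar_path(file_path):
    """Sidecar lives next to the file: photo.jpg -> photo.jpg.searchlight.json"""
    p = Path(file_path)
    return p.with_name(p.name + SIDECAR_SUFFIX)

def write_sidecar(file_path, metadata):
    text = json.dumps(metadata, indent=2, sort_keys=True)
    sidecar_path(file_path).write_text(text, encoding="utf-8")

def read_sidecar(file_path):
    sc = sidecar_path(file_path)
    if not sc.exists():
        return {}
    return json.loads(sc.read_text(encoding="utf-8"))

def strip_sidecar(file_path, keys=None):
    """Remove the given keys, or the whole sidecar when keys is None."""
    sc = sidecar_path(file_path)
    if keys is None:
        if sc.exists():
            sc.unlink()
        return
    remaining = {k: v for k, v in read_sidecar(file_path).items() if k not in keys}
    write_sidecar(file_path, remaining)
```

For example, `write_sidecar("photo.jpg", {"rating": 5, "tags": ["reactos"]})` followed by `strip_sidecar("photo.jpg", keys={"rating"})` would leave only the tags. A sidecar only stays with its file if both are moved together, so a file manager would need to be aware of the convention.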

Security

To provide maximum security, Searchlight has several built-in methods to protect its own database content.

First of all, the "Searchlight Framework" can only be accessed through the Searchlight API.

  • Users can only see search results for files they have access to; that is, the Searchlight Framework checks file permissions.
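As a minimal sketch of this permission check, search results can be filtered before they are returned to the caller. Python's `os.access` is used here only as a stand-in for a real permission check; a ReactOS implementation would presumably check the file's access control list for the requesting user instead. The function name `visible_results` is hypothetical.

```python
import os

def visible_results(paths):
    """Keep only the paths the current user is allowed to read."""
    return [p for p in paths if os.access(p, os.R_OK)]

# A path that does not exist (or is not readable) is silently dropped.
print(visible_results([os.curdir, "/definitely/not/here/secret.txt"]))
```

Filtering at query time, rather than at indexing time, means that a later change in file permissions takes effect immediately.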

Further possibilities to improve security:

  • Every application that tries to access the Searchlight Framework for the first time requires user interaction: the user has to allow access to the Searchlight Framework through a popup window.
  • A blacklist (updated weekly from reactos.org) denies access to applications that are on the list (checked by hash, e.g. MD5).

This means that the Searchlight Framework will be secure enough for daily use, and it is intended to be far more secure than e.g. Apple's Spotlight. To be safe, really important data should be encrypted. It will also be possible to exclude single directories as well as whole volumes from indexing.
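The two access rules above (blacklist by hash, first-use confirmation by the user) could be combined as in the following sketch. Everything here is an assumption for illustration: the class name `AccessPolicy`, the use of SHA-256 instead of the MD5 mentioned in the text, and the `ask_user` callback that stands in for the popup window.

```python
import hashlib

def digest(binary: bytes) -> str:
    """Hash of an application's binary; SHA-256 chosen for this sketch."""
    return hashlib.sha256(binary).hexdigest()

class AccessPolicy:
    def __init__(self, blacklist, ask_user):
        self.blacklist = set(blacklist)   # hashes of denied applications
        self.ask_user = ask_user          # callable(app_name) -> bool, the "popup"
        self.decisions = {}               # remembered user answers, keyed by hash

    def may_access(self, app_name, binary: bytes) -> bool:
        h = digest(binary)
        if h in self.blacklist:
            return False                  # blacklisted: never ask, always deny
        if h not in self.decisions:
            self.decisions[h] = self.ask_user(app_name)  # first access only
        return self.decisions[h]

prompts = []
policy = AccessPolicy(
    blacklist={digest(b"evil binary")},
    ask_user=lambda name: prompts.append(name) or True,  # user always says yes
)
print(policy.may_access("bad.exe", b"evil binary"))
print(policy.may_access("good.exe", b"good binary"))
print(policy.may_access("good.exe", b"good binary"))
print(prompts)
```

Keying the remembered decision on the hash rather than the file name means a modified binary has to be approved again.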

Differences

  • In contrast to desktop search apps and the vaporware WinFS, the Searchlight Framework is designed to keep metadata for all volumes, even removable storage media (e.g. USB hard disks).
  • This means that the Searchlight Framework creates and maintains a set of databases for each volume, stored in a specific directory under that volume's root directory.
  • The great benefit is, for example, that you will be able to search for system files, all your program data, etc., except temp files and the system trash.
  • By design, Searchlight will be useful for all search tasks, in contrast to desktop search apps which can only show results from your home folder, your emails and your website favourites.
  • It would also be possible to search not only the local computer but also network storage (requirement: Searchlight is preinstalled on all computers and a network protocol is used, so that each computer searches locally and sends the results to the computer that requested them).
  • Searchlight indexes files and content like other desktop search tools, but then goes beyond this and links the items based on their context.
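The per-volume database idea from the list above can be sketched as follows: each volume carries its own SQLite index in a fixed directory under its root, and a search simply visits every mounted volume. The directory name `.searchlight`, the file name `index.sqlite3`, the one-column `files` table and the function names are all hypothetical.

```python
import sqlite3
from pathlib import Path

INDEX_DIR = ".searchlight"     # hypothetical directory in each volume root
INDEX_FILE = "index.sqlite3"   # hypothetical database file name

def open_volume_index(volume_root):
    """Open (and create if needed) the index stored on the volume itself."""
    directory = Path(volume_root) / INDEX_DIR
    directory.mkdir(exist_ok=True)
    conn = sqlite3.connect(str(directory / INDEX_FILE))
    conn.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY)")
    return conn

def add_file(volume_root, relative_path):
    conn = open_volume_index(volume_root)
    try:
        with conn:  # commits the insert on success
            conn.execute(
                "INSERT OR REPLACE INTO files (path) VALUES (?)", (relative_path,)
            )
    finally:
        conn.close()

def search_all(volume_roots, like_pattern):
    """Query every volume's own index and merge the results."""
    hits = []
    for root in volume_roots:
        conn = open_volume_index(root)
        try:
            rows = conn.execute(
                "SELECT path FROM files WHERE path LIKE ? ORDER BY path",
                (like_pattern,),
            )
            hits.extend(str(Path(root) / row[0]) for row in rows)
        finally:
            conn.close()
    return hits
```

Because paths are stored relative to the volume root, the index stays valid when a removable volume is mounted at a different location or on another computer.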

The result is: better and more accurate search.

Why “Searchlight”?

A searchlight is an apparatus with reflectors for projecting a powerful beam of light of approximately parallel rays in a particular direction, usually devised so that it can be swiveled about.

Searchlight's goal is to provide better and more accurate search results. It will return search results as soon as you start typing and refine them on the fly as you add to your search criteria.

Author

Klemens Friedl