# Creating a parsing scheme

A parsing scheme tells the system how to retrieve data from a web page. To create a scheme, use the "New" button in the menu.


Once a scheme is created, it must be configured to load data from a site. Each scheme consists of:

  1. Scheme Blocks
  2. Scheme Pages
  3. Scheme Settings

# Blocks

To start crawling, the user must tell the system what data to get from a site. Parsing blocks serve this purpose. The user creates a block with a name and then chooses, in a dedicated editor, what data to retrieve. The system builds its JSON output from the created blocks. This JSON is updated each time the scheme changes and can be viewed by pressing the "Result JSON" button on the scheme page.

# Item block

This type of block retrieves data and saves it under a key. A block named "Title" stores data like this:

{
  "Title": "some value"
}

# Key block

This type of block only adds extra keys to the result JSON structure. A block named "Key" containing an item block named "Title" generates the following output:

{
  "Key": {
    "Title": "some value"
  }
}
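The nesting of key and item blocks can be pictured as a small tree that is folded into a dictionary. The following is a minimal sketch, not the system's actual implementation: the `Block` class, its `kind` labels and the `build` function are hypothetical names introduced only for illustration.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical in-memory model of a scheme block (not the real data model)."""
    name: str
    kind: str                      # "item" or "key" (labels assumed for this sketch)
    value: str = ""                # extracted text, used by item blocks only
    children: list[Block] = field(default_factory=list)


def build(blocks):
    """Fold a list of blocks into a nested dict that mirrors the result JSON."""
    out = {}
    for block in blocks:
        if block.kind == "item":
            # An item block stores its value under its own name.
            out[block.name] = block.value
        else:
            # A key block only adds a level of nesting around its children.
            out[block.name] = build(block.children)
    return out


scheme = [Block("Key", "key", children=[Block("Title", "item", value="some value")])]
result = build(scheme)
print(json.dumps(result, indent=2))
```

Printing `result` gives the same structure as the key block output shown above.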

# Object block

This type retrieves complex information from a site. A common example of such information is a table row: each row is an object and each column of the row is a field. Consider the following table:

| Company | Contact | Country |
| --- | --- | --- |
| Alfreds Futterkiste | Maria Anders | Germany |
| Centro comercial Moctezuma | Francisco Chang | Mexico |

Information from it can be retrieved via an object block named "Row" that contains three item blocks named "Company", "Contact" and "Country":

{
  "Row": {
    "Company": "Alfreds Futterkiste",
    "Contact": "Maria Anders",
    "Country": "Germany"
  }
}
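As a rough illustration of the mapping from a table row to an object, the sketch below pairs each cell with the name of the item block for its column. The `row_to_object` helper and the list-based table representation are assumptions made for this example; the real system works on the page itself through the block editor.

```python
# Column names correspond to the item blocks inside the "Row" object block.
fields = ["Company", "Contact", "Country"]

# The example table, one inner list per table row.
table = [
    ["Alfreds Futterkiste", "Maria Anders", "Germany"],
    ["Centro comercial Moctezuma", "Francisco Chang", "Mexico"],
]


def row_to_object(cells, field_names):
    """Pair each cell with its field name (hypothetical helper)."""
    return dict(zip(field_names, cells))


# With default settings only the first matching row is taken.
result = {"Row": row_to_object(table[0], fields)}
print(result)
```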

# Modifiers

# Is single

By default, this modifier is set to true for all blocks. This means that the block takes only the first entry found on a site. If it is set to false, all found entries are added to the output, and the value becomes an array. For example, the object block from the previous example, renamed to "Rows", will return

{
  "Rows": [
    {
      "Company": "Alfreds Futterkiste",
      "Contact": "Maria Anders",
      "Country": "Germany"
    },
    {
      "Company": "Centro comercial Moctezuma",
      "Contact": "Francisco Chang",
      "Country": "Mexico"
    }
  ]
}

with this modifier disabled.

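The effect of the modifier can be summarized as "first entry versus all entries". The sketch below is an assumption-based illustration: `apply_is_single` is a hypothetical name, and returning `None` when nothing is found is a guess, since the text does not describe that case.

```python
def apply_is_single(entries, is_single=True):
    """Return the first entry when is_single is true, otherwise all entries as a list."""
    if is_single:
        # Assumption: an empty result yields None; the real behavior is not documented here.
        return entries[0] if entries else None
    return list(entries)


rows = [
    {"Company": "Alfreds Futterkiste", "Contact": "Maria Anders", "Country": "Germany"},
    {"Company": "Centro comercial Moctezuma", "Contact": "Francisco Chang", "Country": "Mexico"},
]

single = {"Row": apply_is_single(rows)}                    # one object
many = {"Rows": apply_is_single(rows, is_single=False)}    # array of objects
print(single)
print(many)
```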

# Pages

Each parsing scheme has pages to parse. A newly created scheme contains only one page: the one used to create it. Other pages can be added in the "Pages" tab of the scheme. The list of all pages is taken from the sitemap of the site the scheme belongs to. The "Pages" tab has two modes for showing the pages of a site: "Selected" shows only pages already added to the scheme, and "Not selected" shows all pages of the site that have not been added. Both modes have a search filter that works as a mask. For example, "products/*/detail" filters the pages in the chosen mode and shows only those that start with "products/" and end with "/detail".
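The mask behaves much like a shell-style wildcard. As a sketch, Python's standard `fnmatch` module gives a comparable result; whether the system's own matcher treats `*` (or characters such as `?` and `[`) in exactly the same way is an assumption, and the page paths below are invented for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical page paths, as they might appear in a sitemap.
pages = [
    "products/42/detail",
    "products/42/reviews",
    "products/shoes/red/detail",
    "about",
]


def filter_pages(page_list, mask):
    """Keep only the pages whose path matches the wildcard mask."""
    return [p for p in page_list if fnmatchcase(p, mask)]


matched = filter_pages(pages, "products/*/detail")
print(matched)
```

In `fnmatch`, `*` also matches across `/`, so "products/shoes/red/detail" passes the filter. This agrees with the "starts with / ends with" description above.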

# Adding

Each page in "Not selected" mode has a button that adds it to the scheme. All pages in the current selection can also be added to the scheme at once via the "Select n pages" button.


# Removing

Each page in "Selected" mode has a button that removes it from the scheme. All pages in the current selection can also be removed from the scheme at once via the "Remove n pages" button.
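Bulk adding and removing can be thought of as set operations on the pages that pass the current filter. The following is only a sketch under that assumption: the function names, the use of `fnmatch` for the mask, and the example paths are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical sitemap and the single page the scheme was created from.
sitemap = {"products/1/detail", "products/2/detail", "products/2/reviews", "about"}
selected = {"products/1/detail"}


def select_pages(selected_pages, all_pages, mask):
    """Add every not-yet-selected page that matches the mask (like "Select n pages")."""
    not_selected = all_pages - selected_pages
    return selected_pages | {p for p in not_selected if fnmatchcase(p, mask)}


def remove_pages(selected_pages, mask):
    """Drop every selected page that matches the mask (like "Remove n pages")."""
    return {p for p in selected_pages if not fnmatchcase(p, mask)}


after_add = select_pages(selected, sitemap, "products/*/detail")
after_remove = remove_pages(after_add, "products/2/*")
print(sorted(after_add))
print(sorted(after_remove))
```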