Home|Products|Doc/VSQL|Community|Company
![]()
This document provides a brief overview of VectorSQL (VSQL).

VectorSQL is a vectorial dialect of SQL implemented as a library extension on top of the XJ vector programming language. In contrast to traditional SQL dialects, VectorSQL is interactive, function-based, vector-valued, and extensible:
·
function based: every command in VectorSQL is a
function (traditional SQL commands are statements). Functions have much simpler
syntax than staments, can be easily and consistently composed to produce larger
semantic units, and can be user defined.
·
interactive: VectorSQL code is highly synthetical,
i.e., it expresses larger concepts as compositions of smaller ones, and as a result,
it can be run interactively, enabling users to do exploratory what-if data
analysis (traditional SQL code is monolithic and compiled).
·
vectorized: every VectorSQL function can operate on
whole (single or multidimensional) arrays at once, without having to resort to
for or while loops.
·
extensible: new functions that you create are
first-class citizens in VectorSQL and perform just as fast as the native ones
provided in the core distribution.
VectorSQL is simple to learn and command of the language can comfortably be increased by gradually incorporating use of advanced XJ concepts (such as function rank, function composition, boxing and unboxing, dimension transposing, attaching, joining, and appending items) into the results of VectorSQL commands. This advanced functionality is useful when defining calculated columns, for example.
Anything that can be done with ANSI SQL (and this applies also to most common proprietary SQL vendor extensions) can be done with VectorSQL, frequently in a simpler manner. For any significant amount of SQL code, the equivalent VectorSQL is frequently not only up to 75% shorter (about a quarter of the actual number of LOC), but also simpler in structure (VectorSQL code is rarely nested) and thus easier to understand.
Exploratory Analysis refers to that style of continued interaction between a user and a DBMS where the results of running a query determines the nature and details of the next query to be performed. Exploratory analysis imposes, at the very least, the following requirements on a DBMS:
•
very fast query execution, so that most ad-hoc queries
complete in less than 5 seconds
•
interactive querying language, so that queries can be
comfortably and effectively entered piece-meal into a console
•
a way to keep track of previous results and to use
them in ulterior queries
The row selection index or NDX represents the current set of selected rows on the table, based on the immediately previous execution of WHERE and related VectorSQL commands. The NDX is simply a bit array of the same cardinality as the associated table. VectorSTAR provides the facilities to save these bit indexes to an ordinary file, which is typically quite small: the NDX for a dense selection out of a 10 million record table is barely 1.2MB uncompressed, and usually less than 200KB when compressed. Compression is done automatically by the SAVE_SELECTION command using a fast RLE algorithm. Actually, for a sparse selection, the NDX is stored as an index set (rather than bitset) and the result size in bytes is equal to the number of selected rows (i.e., for 10 rows out of 10 million, the NDX occupies a mere 10 bytes). A unique and very powerful capability of VectorSTAR is that this saved NDX can then be:
•
imported later into the same session
•
imported later into a future session by the same user
•
sent to a colleague (perhaps as a mail attachment?)
who imports it into their session
•
sent over a grid to a different node(s) where the
output will be produced and displayed
•
sent to a different node to be applied to some other
(usually conceptually related) table of the same cardinality
Whenever you do a WHERE command on a table, the effect of the command is that the selection set for the table is modified. Think of the selection set as a boolean vector with the same cardinality as its associated table. Entries marked 1 (TRUE) are currently selected (by SELECT_ROW commands that you have issued on the table), those marked 0 (FALSE) are not. The EVERY command resets the selection set for a table to all 1's.
The selection set is not persistent. It retains its value only while the session is running. Furthermore, if a user connects to VectorSTAR via a web browser, the default access mode does not guarantee them the exact same backend session process for every command they issue, thus preventing them from being able to refer to previous selection sets.
A user can save the current selection set to a file (note that this is not the same as saving the selected rows, it only saves their references) using the SAVE_SELECTION command. This file can then be sent to other users who can LOAD_SELECTION from that file and obtain a selection set on their session that is the same as if they had performed the same queries as the original user who created the selection set file.
The VectorSQL command set provides a nearly one-to-one functionality mapping to the ANSI standard SQL command set. Nevertheless, VectorSQL is a function-based dialect of SQL and as such, its structure is naturally sequential instead of nested (a fact that reportedly makes it easier to comprehend on complex queries) and, more notably, its order of execution is often "backwards" compared to that of SQL. In the end, although strict compatibility is impossible by design, functional compatibility is achieved in most cases.
A concrete example should help clarify the issue. Here is the code for calculating a business metric using a common commercial dialect of SQL:
SQL
INSERT INTO report_output
SELECT &&tech_id, &&datum_id, &&service_id, tmp.region, tmp.amount
FROM (SELECT tmp2.region, sum(tmp2.amount) amount
FROM (SELECT region,
round(sum(total_amnt),2) amount
FROM cdr
WHERE trunc(tx_time) = to_date ('&&rundate','YYYYMMDD')
GROUP BY region
UNION ALL
SELECT region_id region,
round((sum(abs(ADJ_AMNT))+nvl(sum(abs(DED_1_AMNT)),0)
+nvl(sum(abs(DED_2_AMNT)),0)),2) amount
FROM ahr
WHERE trunc(tx_time) = to_date('&&rundate','YYYYMMDD')
AND adj_type in (17,20)
GROUP BY region_id
)tmp2
GROUP BY tmp.region
)tmp;
Note both the nested structure and the fact that the order of LOC execution is not the same as their order of appearance. Note also the need for two temporary tables. Here is the equivalent in VectorSQL:
VectorSQL
FROM 'Cdr'
WHERE 'TxTime in_day ', RunDate
GROUP_BY 'Region'
SELECT INTO 'Tmp' 'group AS Region, round@sum TotalAmnt AS Amount'
FROM 'Ahr'
WHERE 'TxTime in_day ', RunDate
AND 'AdjType in 17, 20'
GROUP_BY 'RegionId'
Adj =. 'sum abs AdjAmnt Ded1Amnt Ded2Amnt'
SELECT UNION 'Tmp' 'group AS Region, round@sum ', Adj, ' AS Amount'
FROM 'Tmp'
GROUP_BY 'Region'
SELECT INTO 'ReportOutput' _
Tech, Datum, Service
group, sum Amount AS Amount
)
Things to note:
• VectorSQL is frequently shorter (sometimes significantly) although this is mostly related to the fact that some temporary tables that have to be calculated and stored on traditional RDBMS are not required on VectorSTAR. This is not the case for this query, however.
• Notice how the 2-level nesting of the original query is flattened by VectorSQL, as the cdr and ahr intermediate results do not need to be calculated from inside a subquery.
• Every SQL statement and function has a direct equivalent in VectorSQL.
• The nvl function is usually not needed.
Notice the use of the INTO modifier to SELECT, which is a shortcut for: INSERT INTO x ;; SELECT y . The use of _ in the last SELECT means "use the following lines up to the single ) as argument to SELECT".
Experience with a rather varied set of SQL programmer backgrounds and capabilities has shown that many SQL users have no trouble understanding the VectorSQL implementation of a SQL stored procedure. Frequently, the commonality of the command set is promptly perceived, leaving only the different ordering of clauses as a notable distinction .
There is a reason for the apparently "reverse" order of execution. Since VectorSQL is based on function composition, it has no complex "statements" and thus the ordering of the component functions (think "substatements", as in the FROM, WHERE, SORTED BY, etc., pieces of a full SELECT statement) has to match the required execution flow, whereas in SQL, where SELECT is a complex statement, the fixed syntatic ordering of its subclauses is only really the result of its original designer's choice. Note also that the ordering is not strictly "reverse" in many cases, as with the SORT_BY command, which frequently goes at the end in VectorSQL queries too.
In summary, existing SQL scrips can be reused in VectorSQL in the sense that their underlying logic won't have to be changed, but they will still require some (mostly mechanical) syntax transformation to accomodate the true-to-execution-flow ordering required by VectorSQL.
![]()