CSE-4/562 Spring 2019 - Practicum: Checkpoint 1

Practicum: Checkpoint 1

CSE-4/562 Spring 2019

February 8, 2019

Textbook: Ch. 16.1

Checkpoint 1

sif$ javac -cp build:jsqlparser.jar:evallib.jar -D build {all .java files} sif$ ls data/ R.csv      S.csv     T.csv sif$ head -n 2 data/R.csv 1|3|5 2|9|1 sif$ java -cp build:jsqlparser.jar:evallib.jar dubstep.Main
 - $> CREATE TABLE R(A int, B int, C int); $> SELECT A, C FROM R WHERE B < 5; 1|5 ... $>

Checkpoint 1

  • Your code compiled just the same as in Checkpoint 0.
  • Print a prompt '$>' at the start and after each command.
  • CREATE TABLE statements tell you the schema of each table.
  • Data lives in a '|'-separated file in 'data/[tablename].csv'
  • Print query results '|'-separated
→ SQL
CREATE TABLE PLAYERS(
ID string, 
FIRSTNAME string, 
LASTNAME string, 
FIRSTSEASON int, 
LASTSEASON int, 
WEIGHT int, 
BIRTHDATE date
);

SELECT FIRSTNAME, LASTNAME, 
   WEIGHT, BIRTHDATE 
FROM PLAYERS 
WHERE WEIGHT>200;
→ SQL CREATE TABLE SELECT
CREATE TABLE PLAYERS(
ID string, 
FIRSTNAME string, 
LASTNAME string, 
FIRSTSEASON int, 
LASTSEASON int, 
WEIGHT int, 
BIRTHDATE date
);

There is a table named "PLAYERS"...

  • ... with 7 attributes
  • ... who's attributes have the given types
  • ... with data in the file "data/PLAYERS.csv"

→ SQL CREATE TABLE Saved Schema .csv SELECT
ABDELAL01|Alaa|Abdelnaby|1990|1994|240|1968-06-24
ABDULKA01|Kareem|Abdul-jabbar|1969|1988|225|1947-04-16
ABDULMA01|Mahmo|Abdul-rauf|1990|2000|162|1969-03-09
ABDULTA01|Tariq|Abdul-wahad|1997|2002|223|1974-11-03
ABDURSH01|Shareef|Abdur-rahim|1996|2007|225|1976-12-11
ABERNTO01|Tom|Abernethy|1976|1980|220|1954-05-06
ABRAMJO01|John|Abramovic|1946|1947|195|1919-02-09
ACKERAL01|Alex|Acker|2005|2008|185|1983-01-21
ACKERDO01|Donald|Ackerman|1953|1953|183|1930-09-04
ACRESMA01|Mark|Acres|1987|1992|220|1962-11-15
ACTONCH01|Charles|Acton|1967|1967|210|1942-01-11
...		
→ SQL CREATE TABLE Saved Schema .csv SELECT Results

Example Queries

  1. SELECT A, B, ... FROM R (Project)
  2. SELECT A, B, ... FROM R WHERE ... (Project+Filter)
  3. SELECT A+B AS C, ... FROM R (Map)
  4. SELECT A+B AS C, ... FROM R WHERE ... (Map+Filter)
  5. SELECT SUM(A+B) AS C, ... FROM R (Aggregate)
  6. SELECT SUM(A+B) AS C, ... FROM R WHERE ... (Aggregate+Filter)
→ SQL CREATE TABLE Saved Schema .csv SELECT Results
    if(stmt instanceof Select){
      SelectBody bodyBase = ((Select)stmt).getSelectBody();
      if(bodyBase instanceof PlainSelect){
        PlainSelect body = (PlainSelect)bodyBase;
        ...
        body.getFromItem()
        body.getWhere()
        body.getSelectItems()
        ...
      }
    }
→ SQL CREATE TABLE Saved Schema .csv SELECT Iterators Results

Iterators

There are a number of data transformations that appear in more than one pattern. For example:

  1. Loading the CSV file in as data
  2. Filtering rows out of data
  3. Transforming (mapping) data into a new structure
  4. Summarizing (aggregating) data
  5. Printing output data

Suggestion: Abstract these steps out

Idea: Functions

  1. loadCSV(...) returns a table
  2. filter(condition, table) returns a table
  3. map(expressions, table) returns a table
  4. aggregate(aggregates, table) returns a row
  5. print(table)

Basic Mindset


  r = get_table("R")

  s = get_table("S")
  
  temp1 = apply_join(r, s, "R.B = S.B")
  
  temp2 = apply_select(temp1, "S.C = 10")
  
  result = apply_projection(temp2, "R.A")
    

Select

$$\sigma_{A \neq 3} R$$

AB
12
34
56

Select


                  def apply_select(input, condition)
                    result = []

                    for row in input:
                      if condition(row):
                        result += [row]

                    return result;
    

(All-At-Once)

Problem: A "table" can get very very big.

Better Idea: Iterators

hasNext()
Returns true if there are more rows to return
next()
Returns the next row
reset()
Resets the iterator back to the first row

All "functions" can be implemented as iterators that use constant space

Select

$$\sigma_{A \neq 3} R$$

AB
getNext()for row in input:
12return row;
getNext()for row in input:
34X
56return row;
getNext()for row in input:
Nonereturn None;

Example: Join (Naive)


			for r in R:
			  for s in S:
			    emit(merge(r, s))
		

Example: Join (Naive)


class NestedLoopJoinIterator implements RAIterator
{
  RAIterator R, S;
  Row current_r = null;
  public NestedLoopJoinIterator(RAIterator R, RAIterator S) {
    this.child = child; this.condition = condition; 
  }
  public Row next(){ 
  	if(!S.hasNext()) { S.reset(); current_r = null; }
    if(current_r == null){ current_r = R.next(); }
    return merge(S.next(), current_r);
  }
  
  /* ...? */
}
		

'|'-separated Value File Suggestions

  • Use BufferedReader's readline() method
  • Precompile your '|' splitter using java.util.regexp.Pattern.compile()
  • Parse everything upfront
  • Keep each row as an Array of PrimitiveValues
  • For Codd's sake, don't store entire tables in memory

Also, Use EvalLib's PrimitiveValue

SQL TypePrimitiveValue
stringStringValue
varcharStringValue
charStringValue
intLongValue
decimalDoubleValue
dateDateValue

EvalLib

How do you evaluate the A > 5 in
SELECT B FROM R WHERE A > 5?

More generally, how do you evaluate a JSqlParser Expression?


    Eval eval = new Eval(){ /* we'll get what goes here shortly */ }
		

    // Evaluate "1 + 2.0"
    PrimitiveValue result;
    result = 
    eval.eval(
      new Addition(
        new LongPrimitive(1),
        new DoublePrimitive(2.0)
      )
    ); 
    System.out.println("Result: "+result); // "Result: 3.0"
		

    // Evaluate "R.A >= 5"
    result =
    eval.eval(
      new GreaterThanEquals(
        new Column(new Table(null, "R"), "A"), // `R.A`
        new LongPrimitive(5)
      )
    );
		

Problem: What value should EvalLib give to R.A?

Your code needs to tell it...


  Eval eval = new Eval(){
    public PrimitiveValue eval(Column c){ 
      String colName = c.getColumnName();
      String tableName = c.getTable().getName();
      PrimitiveValue ret = /* look up colName.tableName */
      return ret;
    }
  }
		

Building Iterators From JSQLParser Statements


      SELECT [DISTINCT]
             target
      FROM source
      WHERE cond
      ORDER BY order_by
      LIMIT lim
      UNION nextselect
        

  RAIterator result = new TableIterator(plainSelect.getFromItem());
    

  if(plainSelect.getJoins() != null){
   for(Join join : plainSelect.getJoins()){
      /** Make sure Join is a simple join **/
      result = new NestedLoopJoinIterator(result, join.getFromItem());
    }
  }
    

  if(plainSelect.getWhere() != null){
    result = new FilterIterator(plainSelect.getWhere(), result);
  }
    

Questions?