Bartosz Bąbol

Software engineering

Fun With Scalameta

Introduction

My previous post was about the new inline macros in scalameta, which might suggest that scalameta is 'the new macros', and that statement is not true at all. At the current stage the new macros are just an experimental feature which might work, as we saw in the previous post, but something else deserves more attention in the current state of metaprogramming: Scalameta 1.0.0. That's the main dish. It came out in June this year and in my opinion it's a super interesting thing to learn.

What is Scalameta?

Scalameta is a framework for tokenizing and parsing code. I imagine this library as a kind of tool which you can gently "inject" into the compilation process of your program. With this tool you can do many different things with your code before it is sent for compilation. So I've divided this post into compiler stages; in each stage you've got a different set of tools which you might use.

That's my mental model; let's go back to a more formal explanation. Scalameta provides the developer with an API for tokenizing code, representing the AST, and a cool wrapper around it called quasiquotes, which you can use for constructing and deconstructing code. Sounds similar to macros? The new inline macros will use the same API as scalameta, so you can't go wrong with previewing it. At the end of this post I've included useful links which I encourage you to preview. Without further introduction, let's start with some easy examples.

TL;DR

My repo for this blog post is here on github.

First stage of compilation: Lexical Analysis - def tokenize

The first stage of compiling a program is lexical analysis. This analysis is responsible for grabbing everything as it is from the input. Look at the example below:

Main.scala
import scala.meta._

object Main extends App{
  val someCode =
    """
      def testMethod = {
        println("printing");
      }
    """.tokenize

  val tokens = someCode match {
    case Tokenized.Success(t) => t
    case Tokenized.Error(_, _, details) => throw new Exception(details)
  }
}

Ok, so after importing scalameta we've got a lot of implicits flying around our code and access to many cool features; one of them is the tokenize method on a string.

After unwrapping the Tokenized result as above, the tokenize method gives us a Tokens object. Check the source code here. It's a wrapper containing all of the tokens found in the given piece of code, and more than just that, as you can read in the source file. You can access the tokens by invoking the tokens method on the Tokens object.

Main.scala
  println(tokens.tokens)

The possible token types which the tokenizer can recognise can be found in the source code here. You can also see the structure of the tokens by invoking structure on Tokens.

Main.scala
println(tokens.structure)

The structure contains absolutely everything: spaces, new lines etc. It might be useful for debugging.
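If you prefer to inspect the tokens one by one, here is a small sketch of mine (not from the original listing), assuming Tokens behaves as a sequence of Token, as it does in current scalameta versions:

  // `tokens` is the Tokens value extracted in the first listing.
  // Printing each token with its runtime class makes whitespace and punctuation visible.
  tokens.foreach(t => println(s"${t.getClass.getSimpleName}: '${t.syntax}'"))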

If you want to preview a human readable result of your play with the tokens, you can use the syntax method:

Main.scala
println(tokens.syntax)

The goal of lexical analysis is to divide a program into words (tokens), and this is exactly what the tokenize method does. At the tokenizing stage our program doesn't understand the meaning of the code; it just previews the syntax built from the tokens, as the syntax call above shows. We can actually pass completely invalid code here:

Main.scala
object Main extends App{
  val someCode =
    """
      defte st Method = {
        println("printing";
    """.tokenize

    ...

And it will be successfully separated into different words.

There is an error case which I encountered while playing with this example. Tokenizing the code might return Tokenized.Error when a literal doesn't have a closing quote:

Main.scala
  val someCode =
    """
      defte st Method = {
        println("printing);
    """.tokenize

    ...

At first this seemed a little weird to me because I thought the tokenizer didn't check any rules, but in this particular example the string literal "printing" is treated as a single token, so there is no valid token for the unclosed "printing and this is why we get the error.

Tokenizing is a very low level operation which gives you a lot of information about what the code looks like. In the tokenized result you've got all the spaces, new lines, commas, separators etc. At this level you might look for specific tokens, see how the code is indented, modify it and so on. Let's look at an example:

Tokenization example 1

Let’s say that we just grabbed project written by somebody else who felt scala syntax in different(worse) way. We want to replace occurences of getOrElse(null) to orNull. Example:

 case class Scalameta() {
   def println() = sth.getOrElse(null)
   val x = Option(foo()).getOrElse(12)
   val y = {
     Option(bar()).getOrElse("foo") + Future(x).get(null)
   }
 }

to

 case class Scalameta() {
   def println() = sth.orNull
   val x = Option(foo()).getOrElse(12)
   val y = {
     Option(bar()).getOrElse("foo") + Future(x).get(null)
   }
 }

I encourage you to open the github sources from the previous links and try to tackle this problem. I don't want to spoil your fun with playing with Scalameta, but if you're interested in my quick solution you can look here.
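For intuition only, here is a rough token-level sketch of mine (not the linked solution). It folds over the printed form of every token, assuming Tokens behaves as a sequence, and collapses the getOrElse ( null ) sequence into orNull; it deliberately ignores edge cases such as comments between those tokens.

import scala.meta._

object GetOrElseNullRewrite extends App {
  val input = """def println() = sth.getOrElse(null)"""

  // Tokenize exactly as in the first listing.
  val tokens = input.tokenize match {
    case Tokenized.Success(t)           => t
    case Tokenized.Error(_, _, details) => throw new Exception(details)
  }

  // Fold over the textual form of every token (whitespace included) and
  // splice in "orNull" when the closing paren of "getOrElse(null" is reached.
  val rewritten = tokens.foldLeft(List.empty[String]) { (acc, token) =>
    (token.syntax, acc) match {
      case (")", "null" :: "(" :: "getOrElse" :: rest) => "orNull" :: rest
      case (text, _)                                   => text :: acc
    }
  }.reverse.mkString

  println(rewritten) // def println() = sth.orNull
}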

Using this API you can think of numerous analogous examples, like:

  • replace filter(…).headOption with find
  • replace find(…).isDefined with exists
  • replace “${saveRateSettingParam}” with “$saveRateSettingParam”

And whatever syntax rule you want.

There is a library called ScalaFmt which heavily operates on the token level; the creator of this library also gave a cool workshop about scalameta, so I encourage you to preview it. We will not dig deeper into tokenization in this post. The mentioned workshop has more info and some cool examples too. Links are at the end of the post.

Second stage of compilation: Parsing code - def parse[U]

So after tokenizing the code, the compiler needs to understand it. This is the parsing stage. The compiler needs a simpler structure than tokens, without redundant syntax like comments, spaces, commas, new lines etc. How do you parse code in scalameta? Check this trait Api. There is a method def parse[U], and if you track this method down you will end up in the Parse trait on Github.

So Parse is parameterized with a type T. What is type T? The hint is in object Parse. T in our example will be something which scalameta can read and parse. In object Parse you can see a lot of implicits for the types you can automatically parse. Let's choose Type as T:

  implicit lazy val parseType: Parse[Type] = toParse(_.parseType())

The method parse[T] returns the type Parsed (like Tokenized in the previous stage), which has two cases: Success and Error. Check the implementation here.

Let’s try to parse something, which might look like Type:

Main.scala
   val code = "List[String]".parse[Type]

   printResult[Type](code)

   def printResult[T](code: Parsed[T]): Unit = {
     code match {
       case Parsed.Success(tree) => println("Code is valid!")
       case Parsed.Error(pos, msg, details)  =>
         println(s"Pos: $pos, msg: $msg. More details: $details")
     }
   }

And the code is parsed as Success, which makes sense. Let's modify this line a bit.

Main.scala
   val code =
    """val l: List[String]= List()""".parse[Type]

   printResult[Type](code)

   ...

And suddenly our val code is now Parsed.Error(…). If you have read my previous posts from January about scala macros, you've probably noticed that the AST was represented by the Tree type. In scalameta the AST is more strict (maybe a better term is typesafe). That means you've got specific types of AST nodes, as you've seen in trait Parsed. Check this object to see what parsers you have available by default. Now we expect val code to be of type Stat, not Type. Let's change it:

Main.scala
   val code = """val l: List[String]= List()""".parse[Stat]

   printResult[Stat](code)
   ...

In the previous stage we could tokenize anything we wanted; we were operating on the token level, so spaces, characters etc. The parsing stage needs to know the meaning of our code, so we can't arbitrarily parse whatever we want. Types like Stat or Type will be with us all the time when we do something with scalameta. I think this is the biggest difference between the old API and the new one. Other examples:

Main.scala
     val code = """val a: List[String]= List()""".parse[Stat]
     val caseExpr = """case true => println("its true!")""".parse[Case]
     val term = """x + y""".parse[Term]
     val arg = """a: List[String]""".parse[Term.Arg]

     printResult[Stat](code)
     printResult[Case](caseExpr)
     printResult[Term](term)
     printResult[Term.Arg](arg)
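The target type really matters here. As a quick sanity check (my own sketch, not from the original post), the case clause that parses fine as a Case should end up in the error branch when you ask for a Stat, because a bare case clause is not a valid statement:

     // Reuses printResult from the earlier listing.
     val notAStat = """case true => println("its true!")""".parse[Stat]
     printResult[Stat](notAStat) // expected to print the error position and message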

Ok Parsed.Success what next? Tree!

So parsing code eventually gives us one of the AST types, like Stat or Type. After successfully parsing code, scalameta gives you a full API for building the AST. Moreover, it provides a cool wrapper around that API called quasiquotes, which drastically simplifies creating and deconstructing the AST.

Open the quasiquotes docs; they will be very helpful. Keep them open the whole time you do something with scalameta.

Let's modify our example a bit:

Main.scala
import scala.meta._

object Main extends App{
  ...
  val code =
    """case class Car[CarCompany](brand: CarCompany, color: Color, name: String){
         val owner: String = "John"
         def playRadio() = {
           "playing radio"
         }
         val capacity, speed = (5, 200)
         val oneVal = 45
      }
    """.parse[Stat]

  val q"..$mods class $tname[..$tparams] ..$mods2 (...$paramss) extends $template" = parseCode(code)

  template match {
    case template"{ ..$stats } with ..$ctorcalls { $param => ..$stats2 }" => stats2.map{
      case q"..$mods def $name[..$tparams](...$paramss): $tpe = $expr" => println(s"methodName: $name")
      case q"..$mods val ..$patsnel: $tpeopt = $expr" => println(s"value $patsnel equals to $expr")

    }
  }

  def parseCode[T](code: Parsed[T]): T = {
    code match {
      case Parsed.Success(tree) => tree
      case Parsed.Error(pos, msg, details)  => throw new Exception(msg)
    }
  }
}

Now def parseCode[T] returns the tree, so we can use the quasiquotes API to play with the parsed code. It's super easy to construct and deconstruct code using this API.

If you didn’t see syntax of quasiquotes before it might look a little weird for you, especially those .. and … signs. I will copy (and modify a bit) explanation of them from my previous post about macro annotations:

$name[..$tparams](...$paramss)

Ok, so we are extracting the method name, but what do those ".." and "..." signs mean?

Let’s start with ..$- this pattern expects List[meta.Type.Param]. And this is nice because our annotated method could take many type parameters.

And what about ...$: this pattern expects a List[List[meta.Term.Param]]. This is because our method can take multiple parameter lists, so it could look like this one:

private def foo[A, B, C](a: A, b: B)(c: C): A = {
  // body
  a
}
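To see what those splices actually capture, here is a small sketch of mine (not from the original post) that deconstructs such a method with the same def pattern used later in this post and prints the bound values:

import scala.meta._

val q"..$mods def $name[..$tparams](...$paramss): $tpe = $body" =
  q"private def foo[A, B, C](a: A, b: B)(c: C): A = a"

println(tparams) // List(A, B, C)                      - one flat list of type parameters
println(paramss) // List(List(a: A, b: B), List(c: C)) - one inner list per parameter list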

Let’s look closer at those lines:

Main.scala
  val q"..$mods class $tname[..$tparams] ..$mods2 (...$paramss) extends $template" = parseCode(code)

  template match {
    case template"{ ..$stats } with ..$ctorcalls { $param => ..$stats2 }" => stats2.map{
      case q"..$mods def $name[..$tparams](...$paramss): $tpe = $expr" => println(s"methodName: $name")
      case q"..$mods val ..$patsnel: $tpeopt = $expr" => println(s"val names: $patsnel")
    }
  }

In the above lines we deconstruct code. You can use pattern matching when you expect a specific type of code; I've copied those patterns from the docs. So deconstructing code is human readable and, moreover, some parsing errors are caught at compile time. I encourage you to preview the docs and try to construct and deconstruct different code lines. Quasiquotes hide most of the complexity of building/deconstructing code. But what if you want to dig deeper into your code and try to understand what is going on under the hood of this cool API?

We need to go deeper

Run this code:

Main.scala
  val constructedTree = q"""def foo = println("quasiquotes")"""

  println(constructedTree.show[Structure])

The printed result looks like this:

console
Defn.Def(Nil, Term.Name("foo"), Nil, Nil, None, Term.Apply(Term.Name("println"), Seq(Lit("quasiquotes"))))

And the printed result is the equivalent of the code built by the quasiquote. It gives you deeper insight into what's going on in your metaprogram. In some cases you have to use the constructors of scalameta types like Defn.Def or Term.Name to build the desired piece of code; we will see examples later. I hope you see what quasiquotes give you and how they hide complexity behind a human readable syntax.
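Since the structure printed above is literally the constructor call hiding behind the quasiquote, you can build the same tree by hand. Constructor signatures can vary between scalameta versions, so treat this as a sketch:

import scala.meta._

// The same tree as q"""def foo = println("quasiquotes")""", assembled from raw constructors.
val manualTree = Defn.Def(
  Nil,              // no modifiers
  Term.Name("foo"), // method name
  Nil,              // no type parameters
  Nil,              // no parameter lists
  None,             // no declared return type
  Term.Apply(Term.Name("println"), Seq(Lit("quasiquotes"))) // body
)

println(manualTree.syntax) // def foo = println("quasiquotes")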

Example no. 1 - Constants

Ok, so we know roughly what scalameta is and we've seen examples of the API usage; now let's try to use it in some examples. This is the case:

We are working on a project and we need some standards, for example:

  • Constant strings in our project always live in object Constants
  • If the value of some constant is assigned to 2 different vals, then we want to throw a warning or exception, whatever.

This is our object Constants and it clearly doesn't follow our rules; "ruby" is assigned to 2 different vals:

Constants.scala
object Constants {
  val java = "java"
  val scala = "scala"
  val ruby1 = "ruby"
  val ruby2 = "ruby"
}

Let’s check for possible solution:

ConstantsValidator.scala
import scala.meta._

object ConstantsValidator {
  case class Val(valName: scala.meta.Pat, valValue: String)

  def validate(source: Source) = source match {
    case source"..$stats" => stats.collect {
      case q"..$mods object ${Term.Name(name)} extends $template" => name match{
        case "Constants" => template match {
          case template"{ ..$stats2 } with ..$ctorcalls { $param => ..$stats3 }" =>{
            val vals: List[Val] = stats3.foldLeft(List[Val]()) {
              (acc, elem) => elem match {
                case q"..$mods2 val ..$patsnel: $tpeopt = $expr" => acc :+ Val(patsnel.head, expr.toString)
                case _ => acc
              }
            }
            vals.groupBy(_.valValue).foreach{ case
              (valueKey, listOfVals) => if (listOfVals.length > 1 ) throw new Exception(s"$valueKey is assigned more than once to different vals: ${listOfVals.map(_.valName)}")
            }
          }
        }
        case _ =>
      }
    }
  }
}

Invoke it in Main.scala:

Main.scala
ConstantsValidator.validate(new java.io.File("src/main/scala/Constants.scala").parse[Source].get)

and run the program. You should get an exception with an error message. To understand what's going on in the implementation, go to the quasiquotes docs and check how to deconstruct Source. Then I select only objects for further processing, and among them only the object named Constants. Then I do some logic with groupBy to find repetitions. I hope it's straightforward. Look at this line:

ConstantsValidator.scala
case q"..$mods object ${Term.Name(name)} extends $template" => name match{...}

If you look at the quasiquotes docs you will find different syntax:

case q"..$mods object $name extends $template" => name match{...}

But name in the above line is actually shorthand for Term.Name("someName"), and I'm interested in the string "someName" itself, so I've changed the expression to fit my needs. This is one of the examples where working with the bare Tree types is useful. Change the value "ruby" to something else to make the project compile again.
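To make that difference concrete, here is a tiny sketch of mine (not part of the validator): the plain unquote binds the whole Term.Name node, while the nested pattern reaches inside and binds the underlying String.

import scala.meta._

val tree: Stat = q"""object Constants { val ruby1 = "ruby" }"""

// Binds the whole Term.Name node:
val q"..$mods1 object $objName extends $template1" = tree
// Binds the String inside it, which is what the validator compares against:
val q"..$mods2 object ${Term.Name(str)} extends $template2" = tree

println(objName.syntax)     // Constants
println(str == "Constants") // true - a plain Scala String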

Example no. 2 - Name of the object

Let’s say that we want to set some naming convention in our project. We want all objects to start with uppercase letter. Our metaprogram should check this condition and if it will find object with lowercase first letter then it should replace it with proper one.

ConstantsValidator.scala
  // Note: this method also needs `import java.io.FileWriter` at the top of the file.
  def validateName(source: Source) = {
    val fixedFile: Source = source match {
      case source"..$stats" => source"..${buildNewStatements(stats)}"
    }

    val fw = new FileWriter("src/main/scala/Constants.scala")
    fw.write(fixedFile.syntax)
    fw.close

  }

  private def buildNewStatements(stats: scala.collection.immutable.Seq[Stat]): List[Stat] = {
    stats.foldLeft(List[Stat]())((acc, elem) => elem match {
      case q"..$mods object ${Term.Name(name)} extends $template" =>
        val isFirstLetterOfObjectLowercase = Character.isLowerCase(name.head)
        if(isFirstLetterOfObjectLowercase){
          val newName = name.head.toString.toUpperCase + name.tail
          val objectWithFixedName = q"..$mods object ${Term.Name(newName)} extends $template"
          acc :+ objectWithFixedName
        }else {
          acc :+ q"..$mods object ${Term.Name(name)} extends $template"
        }
      case whatever => acc :+ whatever
    })
  }

The idea is the same as in the previous example. We deconstruct the code piece by piece, modify some element and construct new, modified code. In this example we also save the modified code to a file, so if you run the code (and your constants object starts with a lowercase letter) you will see the code being replaced. What is worth noticing is that the code is saved with the same structure we wrote it in; that's one of the features of scalameta. Invoke this method to check it out:

Main.scala
ConstantsValidator.validateName(new java.io.File("src/main/scala/Constants.scala").parse[Source].get)

Example no. 3 - Code metrics

The next example is building a code review tool. Let's say that we want some information, some basic statistics about the Scala definitions in a project, for example the number of classes, objects etc.

CodeMetrics.scala
object CodeMetrics {
  // recursiveListFiles and file(...) are small helpers defined in the accompanying repo.
  val allScalaFiles = recursiveListFiles(file("src/")).map(_.parse[Source]).collect{
    case Parsed.Success(tree) => tree}.toList

  val counts = allScalaFiles.foldLeft(Counts.initial)((acc, file) => {
    file match {
      case source"..$whateverItIsInFile" => whateverItIsInFile.foldLeft(acc)((accInFile: Counts, elem) => elem match {
        case q"..$mods object $name extends $template" =>
          accInFile.incObjectNo
        case q"..$mods class $tname[..$tparams] (...$paramss) extends $template" =>
          accInFile.incClassNo
        case q"..$mods trait $tname[..$tparams] extends $template" =>
          accInFile.incTraitNo
        case q"package object $name extends $template" =>
          accInFile.incPackageObjNo
        case _ => accInFile
      })
    }
  })
  ...
}

Here Counts is the accumulator object which holds the data that is important for us; in our case the number of classes, objects etc.

Counts.scala
package model

case class Counts(classNo: Int, objectNo: Int, traitNo: Int, packageObjNo: Int) {
  def incClassNo      = this.copy(classNo      = this.classNo + 1)
  def incObjectNo     = this.copy(objectNo     = this.objectNo + 1)
  def incTraitNo      = this.copy(traitNo      = this.traitNo + 1)
  def incPackageObjNo = this.copy(packageObjNo = this.packageObjNo + 1)
}

object Counts {
  val initial = Counts(0, 0, 0, 0)
}

Invoke it to see the results. I've hardcoded the path to be src/.

Main.scala
println(CodeMetrics.counts)

You can see the full example here.

After getting a little bit familiar with the quasiquotes API, I hope this code is very straightforward for you.

Example no. 4 - Code review tool

I’ve extended a little bit example from previous paragraph. I wanted to build some nice UI with some statistics about objects. For example: - I want to see dependencies between types in my project - I want to see number of statements in specific type of object - I want to see type of object

For the sake of simplicity I'm interested in 3 Scala definition types: class, object and trait.

You can find my simple solution in this repo:

github

Summary

The presented examples are small and easy to implement. I wanted to give you some ideas of how, in my opinion, scalameta might be used. But you can go further with those ideas. In the Ruby on Rails community there is a popular principle called 'convention over configuration'. It means that if you follow some conventions, everything works out of the box. Examples:

  • controllers in MVC are in the 'controllers' folder
  • the name of a controller has to follow some rules (it's related to the route name)
  • the data layer has to be named according to the schema in the db

etc.

Imagine implementing these conventions in some Scala framework. The biggest advantage over Ruby is that those rules might be checked at compile time. It opens up brand new possibilities for the Scala ecosystem.

Last but not least: I encourage you to preview the links below. Thanks for reading, I hope you've found this post useful. Don't hesitate to text me if you've got some comments ;)

Useful links:
