CS CODEDOM Parser


CS CODEDOM Parser is a utility that parses C# source code and builds a CODEDOM tree from it (CODEDOM is a set of general classes that represent code; it is part of the .NET Framework, in the System.CodeDom namespace).

The current version (0.1) is limited: it parses code down to type members and their parameters, has very limited support for expressions, and does not parse the statements inside members. There are two main reasons why I have stopped at this level for now:

  • First: it was enough for my needs (I wanted to do some code analysis to enforce coding standards).

  • Second: CODEDOM is limited and cannot fully express C# code. For more details see the section CODEDOM Limitations below.

    On the other hand, it also parses source code comments, so it can be used to analyze the interdependencies of code and comments.
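
To make that level of detail concrete, here is a minimal sketch of the kind of tree involved: namespaces, types, members, member comments, and parameters, with no statements inside the method. The tree is built by hand rather than by the parser, and the names CodeDomTreeSketch, BuildSampleTree and Describe are made up for this illustration. It assumes the System.CodeDom classes are available (they ship with the .NET Framework; on newer .NET versions they come from the System.CodeDom NuGet package).

```csharp
using System;
using System.CodeDom;
using System.Collections.Generic;

public static class CodeDomTreeSketch
{
    // Hand-built stand-in for what a parser would produce for:
    //   namespace Demo { public class Greeter { /* Says hello. */ public void Greet(string name) { } } }
    public static CodeCompileUnit BuildSampleTree()
    {
        var method = new CodeMemberMethod
        {
            Name = "Greet",
            Attributes = MemberAttributes.Public | MemberAttributes.Final
        };
        method.Comments.Add(new CodeCommentStatement("Says hello."));
        method.Parameters.Add(new CodeParameterDeclarationExpression(typeof(string), "name"));

        var type = new CodeTypeDeclaration("Greeter") { IsClass = true };
        type.Members.Add(method);

        var ns = new CodeNamespace("Demo");
        ns.Types.Add(type);

        var unit = new CodeCompileUnit();
        unit.Namespaces.Add(ns);
        return unit;
    }

    // Walks the tree down to members, their comments and their parameters.
    public static List<string> Describe(CodeCompileUnit unit)
    {
        var lines = new List<string>();
        foreach (CodeNamespace ns in unit.Namespaces)
        {
            lines.Add("namespace " + ns.Name);
            foreach (CodeTypeDeclaration type in ns.Types)
            {
                lines.Add("  type " + type.Name);
                foreach (CodeTypeMember member in type.Members)
                {
                    lines.Add("    member " + member.Name);
                    foreach (CodeCommentStatement comment in member.Comments)
                        lines.Add("      comment " + comment.Comment.Text);

                    var method = member as CodeMemberMethod;
                    if (method == null) continue;
                    foreach (CodeParameterDeclarationExpression p in method.Parameters)
                        lines.Add("      parameter " + p.Type.BaseType + " " + p.Name);
                }
            }
        }
        return lines;
    }
}
```

Calling CodeDomTreeSketch.Describe(CodeDomTreeSketch.BuildSampleTree()) returns one line per node. Because comments hang off the same member nodes as the code, a coding-standard check can look at both in a single pass.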

The stability of this version is also low; it is an alpha version at best. Anybody who wants to help move this further is welcome.

The parser is based on the Mono C# compiler code. I looked around a little for available C# parsers and C# parser-building tools (I wanted a C# parser written in C#) and finally decided on Mono. For more details about how I used the Mono parser, and about the other possibilities I explored, see the section C# Parser tools.

CODEDOM Limitations

At first I thought it was a great idea to use a language-independent syntax tree, and CodeDom looked nice. If a code analysis tool is built on it, it can work for any .NET language: only the parser needs to change and the rest stays the same. Sounds cool. But after I got into CodeDom, I found that a lot of language features are missing (not just C# features, but features of basically any language), so it is not possible to represent the source code fully. The main problem is in expressions and statements, where CodeDom has a very limited set of classes: there is, for instance, no support for unary operations, and there are many more issues.
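
As an illustration of the unary-operation gap, the sketch below shows two generic ways a CodeDom user can still get a negation into a tree: rewriting -x as the binary expression 0 - x, or falling back to an opaque CodeSnippetExpression. Neither is what this parser does; the class and method names (UnaryWorkaroundSketch, NegateAsBinary, NegateAsSnippet, Render) are invented for the example. The binary form changes the shape of the original expression, and the snippet form is just a string that analysis tools cannot look inside, which is why neither is a real substitute for a unary expression class.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

public static class UnaryWorkaroundSketch
{
    // Option A: express -x as the binary expression (0 - x).
    // The result is structured, but it no longer mirrors the source text.
    public static CodeExpression NegateAsBinary(string variableName)
    {
        return new CodeBinaryOperatorExpression(
            new CodePrimitiveExpression(0),
            CodeBinaryOperatorType.Subtract,
            new CodeVariableReferenceExpression(variableName));
    }

    // Option B: keep the source text verbatim as an opaque snippet.
    // The text survives, but the tree has no structure below this node.
    public static CodeExpression NegateAsSnippet(string variableName)
    {
        return new CodeSnippetExpression("-" + variableName);
    }

    // Renders any CodeDom expression back to C# text.
    public static string Render(CodeExpression expression)
    {
        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromExpression(expression, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```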

I decided to continue with CodeDom despite its limitations, because it was enough for analyzing code against coding standards (at least for what I need now; it also makes it possible to keep comments and code in one tree, which is something I liked). It remains an open issue for future development.

Here is the list of issues I have found (and there are more):

CodeCompileUnit has no place for using directives or namespace members, so they are currently placed into the first, default namespace.

using_alias_directive - no support found
nested namespaces - no support found (so the parser flattens the namespace hierarchy)
variable declaration list (int i,j,k;) - no support; transformed into individual variable declarations
pointer_type - no support found
"jagged" array type (array of arrays) - MS CSharpCodeProvider reverses order of ranks
params keyword - not supported; the keyword is dropped during parsing, so the parameter becomes an ordinary array-type parameter
private modifier on nested delegate is not shown by CSharpCodeProvider (all other nested types work fine)
unsafe modifier - no support found
readonly modifier - no support found
volatile modifier - no support found
explicit interface implementation - not implemented yet (I think this can be done)
add and remove accessors for Event - no support found
virtual and override modifiers do not work in MS CSharpCodeProvider for events
Operator members and Destructors - no support found
Expressions - no unary expressions (operations) at all, only one-dimensional arrays, some operators not supported, and more
Attribute targets - no support found
Attributes on accessors - no support found

If a CodeCompileUnit contains custom attributes in global scope, CSharpCodeProvider prints them before the global using directives (this is because the using directives have to live in the first namespace).
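
The default-namespace arrangement mentioned at the top of this list can be sketched as follows. This is an illustration of the general CodeDom pattern, not code taken from the parser: a CodeNamespace with an empty name is added first and carries the using directives, and the named namespace follows it. The names DefaultNamespaceSketch, Build and Render, and the sample namespace and type, are made up for the example.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

public static class DefaultNamespaceSketch
{
    public static CodeCompileUnit Build()
    {
        // CodeCompileUnit has no Imports collection of its own, so the
        // global using directives go into a first, nameless namespace.
        var global = new CodeNamespace();
        global.Imports.Add(new CodeNamespaceImport("System"));
        global.Imports.Add(new CodeNamespaceImport("System.Text"));

        // A regular named namespace with one (empty) public class.
        var demo = new CodeNamespace("Demo");
        demo.Types.Add(new CodeTypeDeclaration("Widget"));

        var unit = new CodeCompileUnit();
        unit.Namespaces.Add(global);
        unit.Namespaces.Add(demo);
        return unit;
    }

    // Renders the whole compile unit back to C# source text.
    public static string Render(CodeCompileUnit unit)
    {
        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromCompileUnit(unit, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

In the text returned by DefaultNamespaceSketch.Render(DefaultNamespaceSketch.Build()), the using directives appear ahead of the Demo namespace block, because the nameless namespace is generated without a surrounding namespace declaration.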

C# Parser tools

I wanted to use an existing tool, so I looked around and found these interesting projects:

  • Mono project
    They are implementing a complete open source .NET platform (they modified the jay parser generator and used it to generate the parser).

  • Compiler Writing Tools using C#, from Malcolm Crowe of the University of Paisley
    Mr. Crowe has created a parser generator and a lexer generator in C#. I played with these tools quite a bit, but when I wanted to do something bigger, I got stuck.

  • C# grammar for flex/bison written by James Power of the National University of Ireland
    Contains scripts for the well-known tools bison and flex, which can generate a parser in C. I thought I could use them with some C# port of those tools, but I was not able to, so in the end I used the grammar from Mono.


  • jb2csharp
    This is a port of JB Parser and Lexer Generation for Java (which is itself a port of bison and flex). But the current version is alpha, and I was not able to get even their calculator example to work (although the authors claim it works).


  • CsLex from Brad Merrill
    It is a lexer generator.


  • I also looked at the MS Rotor project; the C# parser there is written in C++ (and it is not under an open source license).

So in the end I decided to use the Mono sources: I used their lexer, jay, and their jay grammar to generate my parser. In the jay grammar I use my own code to create CodeDom objects.

Description of package

The CS CODEDOM Parser package consists of:

  • CodeDom parser itself (/ directory)

  • NUnit tests for the parser (/NUnitTests directory)
    Contains a bunch of tests that I used to check the functionality of the parser. To run them you need NUnit.

  • testParser (/testParser directory)
    A simple command-line utility that tests the parser: it parses a file (whose name is supplied as a command-line parameter) and writes to stdout the code that CSharpCodeProvider (the CodeDom code generator for C#, in the Microsoft.CSharp namespace) generates from the resulting tree.

  • CodeTreeView (/CodeTreeView directory)
    A simple Windows application that opens a file and displays the CODEDOM tree in the left pane (a tree view control) and the original source in the right pane (a text box control). When you click a tree node, the text box scrolls to show the corresponding code. It is something like a very, very simple source code viewer.
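
The round trip that testParser performs (source file in, CodeDom tree, regenerated C# out) can be sketched roughly as below. This is not the utility's actual source. In particular, the parse delegate is a hypothetical stand-in for the parser's entry point, whose real name and signature are not shown here; RoundTripSketch and RoundTrip are likewise invented names. Only the CSharpCodeProvider call is the standard CodeDom API.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

public static class RoundTripSketch
{
    // path   - the C# file to read (testParser takes it from the command line)
    // parse  - HYPOTHETICAL: whatever turns source text into a CodeCompileUnit
    // output - where the regenerated code goes (Console.Out for stdout)
    public static void RoundTrip(
        string path,
        Func<TextReader, CodeCompileUnit> parse,
        TextWriter output)
    {
        CodeCompileUnit unit;
        using (var reader = new StreamReader(path))
        {
            unit = parse(reader);
        }

        using (var provider = new CSharpCodeProvider())
        {
            // "C" bracing puts each opening brace on its own line.
            var options = new CodeGeneratorOptions { BracingStyle = "C" };
            provider.GenerateCodeFromCompileUnit(unit, output, options);
        }
    }
}
```

Comparing the regenerated text with the original file is a quick way to see which constructs survive the trip and which fall under the limitations listed earlier.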

Licence

CS CODEDOM Parser and the tools included in this package are distributed under the GPL licence.

Download

You can download the source code as CsCodeParserSource.zip. Debug binaries are also part of the package.

The Future

The basic idea for future development is to extend CodeDom to support all language features, so that sources can be parsed completely. (The alternative is to leave CodeDom and build my own syntax tree, but I still like the idea of a language-independent tree structure that can be used for different tasks.)

Reporting of errors and warnings should be improved (unify codes and messages, unify error reporting, and have the Report class store reported errors).

The parser should also be improved to indicate the location of syntax elements in the source file more precisely.

Better separation between the parser and the CODEDOM builder is also needed.

If somebody likes the tool and wants to help improve it, they are welcome.

