Introduction
Its an ETL framework for .net, .net standard and core. Simple, intuitive Extract, transform and load (ETL) library for .NET. Extremely fast, flexible, and easy to use.
Cinchoo ETL is a code-based ETL framework for extracting data from multiple sources, transforming, and loading into your very own data warehouse in .NET environment. You can have data in your data warehouse in no time.
This article talks about generating CSV file from complex/netsed JSON using Cinchoo ETL framework. It is very simple to use and fewer lines of code. You can convert large files as the conversion process is stream based, quite fast and with low memory footprint. It supports many configurations. you can refer more there gitrepo.
Install
To install Cinchoo ETL (.NET Framework), run the following command in the Package Manager Console,
PM> Install-Package ChoETL
To install Cinchoo ETL (.NET Standard), run the following command in the Package Manager Console,
PM> Install-Package ChoETL.NETStandard
You can also search the respective name in nuget package manager.
Add namespace to the program.
Start with Simple JSON
Let's start with simple example of converting the below JSON input file.
- {
- "fruit":[
- {
- "name":"Apple",
- "binomial name":"Malus domestica",
- "major_producers":[
- "China",
- "United States",
- "Turkey"
- ],
- "nutrition":{
- "carbohydrates":"13.81g",
- "fat":"0.17g",
- "protein":"0.26g"
- }
- },
- {
- "name":"Orange",
- "binomial name":"Citrus x sinensis",
- "major_producers":[
- "Brazil",
- "United States",
- "India"
- ],
- "nutrition":{
- "carbohydrates":"11.75g",
- "fat":"0.12g",
- "protein":"0.94g"
- }
- } ]
- }
Convert JSON file to CSV format with following code.
- var sampleJson = File.ReadAllText("Folder Path/sample.json");
-
- StringBuilder stringBuilder = new StringBuilder();
- using (var fruitData = ChoJSONReader.LoadText(sampleJson)
- .WithJSONPath("$..fruit[*]")
- )
- {
- using (var w = new ChoCSVWriter(stringBuilder)
- .WithFirstLineHeader()
- .Configure(c => c.MaxScanRows = 1)
- .Configure(c => c.ThrowAndStopOnMissingField = false)
- )
- {
- w.Write(fruitData);
- }
- }
-
- File.WriteAllText("Folder Path/fruitData.csv", stringBuilder.ToString());
The CSV formatted output is following
- name,binomial name,major_producers_0,major_producers_1,major_producers_2,nutrition_carbohydrates,nutrition_fat,nutrition_protein
- Apple,Malus domestica,China,United States,Turkey,13.81g,0.17g,0.26g
- Orange,Citrus x sinensis,Brazil,United States,India,11.75g,0.12g,0.94g
Start with Complex/Nested/Dynamic JSON
Let's start with a simple example of converting the below JSON input file.
- {
- "data": {
- "getUsers": [
- {
- "UserProfileDetail": {
- "UserStatus": {
- "name": "User One"
- },
- "UserStatusDate": "2018-10-31T06:12:42+00:00",
- "EnrollId": "am**********************************",
- "lastDate": "2019-07-22T03:05:39.0245313-04:00"
- },
- "UserInformation": {
- "Id": 1111122,
- "firstName": "*****",
- "middleName": null,
- "lastName": "*****",
- "otherNames": null,
- "UserType": {
- "name": "CP"
- },
- "primaryState": "MA",
- "otherState": [
- "MA",
- "BA",
- "DL",
- "RJ"
- ],
- "UserLicense": [
- {
- "licenseState": "MA",
- "licenseNumber": "000000000",
- "licenseStatus": null,
- "aaaaaaaaaaaaaaaaa": "only one"
- }
- ],
- "Setting": "ADMINISTRATIVE",
- "primaryEmail": "*****@*****.com",
- "modifiedAt": null,
- "createdAt": null
- }
- },
- {
- "UserProfileDetail": {
- "UserStatus": {
- "name": "User Two"
- },
- "UserStatusDate": "2019-10-31T06:12:42+00:00",
- "EnrollId": "am**********************************",
- "lastDate": "2019-07-22T03:05:39.0245313-04:00"
- },
- "UserInformation": {
- "Id": 443333,
- "firstName": "*****",
- "middleName": "Jhon",
- "lastName": "*****",
- "otherNames": null,
- "UserType": {
- "name": "AP"
- },
- "primaryState": "AK",
- "otherState": [
- "MP",
- "CLT"
- ],
- "UserLicense": [
- {
- "licenseState": "KL",
- "licenseNumber": "000000220",
- "licenseStatus": "Valid"
- }
- ],
- "Setting": "ADMINISTRATIVE",
- "primaryEmail": "*****@*****.com",
- "modifiedAt": null,
- "createdAt": null
- }
- }
- ]
- },
- "errors": [
- {
- "message": "GraphQL.ExecutionError: 13614711 - NO__DATA",
- "extensions": {
- "code": "212"
- }
- },
- {
- "message": "GraphQL.ExecutionError: 13614712 - NO__DATA",
- "extensions": {
- "code": "212"
- }
- },
- {
- "message": "GraphQL.ExecutionError: Cannot return null for non-null type. Field: PrivilegeFlag, Type: Boolean!.\r\n at GraphQL.Execution.ExecutionStrategy.ValidateNodeResult(ExecutionContext context, ExecutionNode node)\r\n at GraphQL.Execution.ExecutionStrategy.ExecuteNodeAsync(ExecutionContext context, ExecutionNode node)",
- "locations": [
- {
- "line": 374,
- "column": 9
- }
- ],
- "path": [
- "getUsers",
- "3",
- "UserAffiliation",
- "0",
- "Affiliation",
- "admittingPrivilegeFlag"
- ],
- "extensions": {
- "code": "ID: 1454790"
- }
- },
- {
- "message": "GraphQL.ExecutionError: Cannot return null for non-null type. Field: admittingArrangementFlag, Type: Boolean!.\r\n at GraphQL.Execution.ExecutionStrategy.ValidateNodeResult(ExecutionContext context, ExecutionNode node)\r\n at GraphQL.Execution.ExecutionStrategy.ExecuteNodeAsync(ExecutionContext context, ExecutionNode node)",
- "locations": [
- {
- "line": 403,
- "column": 9
- }
- ],
- "path": [
- "getUsers",
- "3",
- "UserAffiliation",
- "0",
- "Affiliation",
- "admittingArrangementFlag"
- ],
- "extensions": {
- "code": "ID: 13614790"
- }
- }
- ]
- }
As JSON message is hierarchical and structural format, you will have to flatten out in order to product CSV file.
List out the difficulties/Challenge of json,
- Two different objects ‘Data’ and ‘Error’ and different types.
- The ‘Data’ object always dynamic
- 'otherState' have a different count in each data array
- The ‘error’ has different data.
Steps
- Create separate csv for ‘Data’ and ‘Error’.
- Set Configure > MaxScanRows based on ‘data’ count. Used to scan csv rows
- Set equal row count for each csv
- Merge csv and write it
The sample code is,
-
- var jsonString = File.ReadAllText("CAQH_Sample.json");
- StringBuilder csvPractitioners = new StringBuilder();
-
-
- using (var practitioners = ChoJSONReader.LoadText(jsonString)
- .WithJSONPath("$..getUsers[*]")
- .Configure(c => c.JsonSerializerSettings = new JsonSerializerSettings
- {
- DateParseHandling = DateParseHandling.None,
- DateTimeZoneHandling = DateTimeZoneHandling.Utc,
- Formatting = Formatting.Indented
- })
- )
- {
-
- var arrPractitioners = practitioners.ToArray();
- int practitionersCount = arrPractitioners.Length;
-
- using (var w = new ChoCSVWriter(csvPractitioners)
- .WithFirstLineHeader()
- .Configure(c => c.MaxScanRows = practitionersCount)
- .Configure(c => c.ThrowAndStopOnMissingField = false)
- )
- {
- w.Write(arrPractitioners);
- }
- }
-
-
- StringBuilder csvErrors = new StringBuilder();
- using (var errors = ChoJSONReader.LoadText(jsonString)
- .WithJSONPath("$..errors[*]")
- .WithField("errors_message", jsonPath: "$.message", isArray: false)
- .WithField("errors_extensions_code", jsonPath: "$.extensions.code", isArray: false)
- .WithField("errors_locations", jsonPath: "$.locations[*]", isArray: false)
- .WithField("errors_path", jsonPath: "$.path[*]"))
- {
- var arrError = errors.ToArray();
- int errorCount = arrError.Length;
-
- using (var w = new ChoCSVWriter(csvErrors)
- .WithFirstLineHeader()
- .Configure(c => c.MaxScanRows = errorCount)
- .Configure(c => c.ThrowAndStopOnMissingField = false)
- )
- {
- w.Write(arrError);
- }
- }
-
- char separator = ',';
-
- var objPractitioners = csvPractitioners.ToString().Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries).AsEnumerable();
-
- var contents = Regex.Replace(csvErrors.ToString(), "\"[^\"]*(?:\"\"[^\"]*)*\"", m => m.Value.Replace("\n", "").Replace("\r", ""));
- var objErrors = contents.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries).AsEnumerable();
-
- string defaultValue = string.Empty;
- int cnt = 0;
- if (objPractitioners.Count() < objErrors.Count())
- {
- cnt = objPractitioners.FirstOrDefault()?.Split(separator).Length ?? 0;
- }
- else
- {
- cnt = objErrors.FirstOrDefault()?.Split(separator).Length ?? 0;
- }
- if (cnt > 0)
- defaultValue = defaultValue.PadLeft(cnt - 1, separator);
-
- var fullString = objPractitioners.Merge(objErrors, defaultValue, (f, s) => string.Join(separator.ToString(), f, s)).ToArray();
- File.AppendAllLines("folderpath/OutputSample.csv", fullString);
Here Merge is Extensions methods,
- public static IEnumerable<T> Merge<T>(this IEnumerable<T> first,
- IEnumerable<T> second, T defaultValue, Func<T, T, T> operation)
- {
- using (var iter1 = first.GetEnumerator())
- using (var iter2 = second.GetEnumerator())
- {
- while (iter1.MoveNext())
- {
- if (iter2.MoveNext())
- {
- yield return operation(iter1.Current, iter2.Current);
- }
- else
- {
- yield return operation(iter1.Current, defaultValue);
- }
- }
- while (iter2.MoveNext())
- {
- yield return operation(defaultValue, iter2.Current);
- }
- }
- }
The output is,
- UserInformation_Id,UserInformation_firstName,UserInformation_UserType_name,UserInformation_primaryState,UserInformation_otherState_0,UserInformation_otherState_1,UserInformation_otherState_2,UserInformation_otherState_3,UserInformation_otherState_4,UserInformation_createdAt,UserInformation_lastUpdatedDate,errors_message,errors_extensions_code,errors_locations_line,errors_locations_column,errors_path_0,errors_path_1,errors_path_2,errors_path_3,errors_path_4,errors_path_5
- 1111122,*****1,CP,MA,MA,BA,,,,,2019-04-03T07:49:05.2827076-04:00,GraphQL.ExecutionError: 13614711 - NO__DATA,212,,,,,,,,
- 222222,*****2,CPP,MPA,KL,TN,DL,AP,RJ,,2019-12-03T07:50:05.2827076-05:00,GraphQL.ExecutionError: 13614712 - NO__DATA,212,,,,,,,,
- 33333,*****3,CPP,MPA,KL,TN,DL,AP,RJ,,2019-12-03T07:50:05.2827076-05:00,"GraphQL.ExecutionError: Cannot return null for non-null type. Field: PrivilegeFlag, Type: Boolean!. at GraphQL.Execution.ExecutionStrategy.ValidateNodeResult(ExecutionContext context, ExecutionNode node) at GraphQL.Execution.ExecutionStrategy.ExecuteNodeAsync(ExecutionContext context, ExecutionNode node)",ID: 1454790,374,9,getUsers,3,UserAffiliation,0,Affiliation,admittingPrivilegeFlag
- 44444,*****4,CPP,MPA,KL,TN,DL,AP,RJ,,2019-12-03T07:50:05.2827076-05:00,"GraphQL.ExecutionError: Cannot return null for non-null type. Field: admittingArrangementFlag, Type: Boolean!. at GraphQL.Execution.ExecutionStrategy.ValidateNodeResult(ExecutionContext context, ExecutionNode node) at GraphQL.Execution.ExecutionStrategy.ExecuteNodeAsync(ExecutionContext context, ExecutionNode node)",ID: 13614790,403,9,getUsers,3,UserAffiliation,0,Affiliation,admittingArrangementFlag
- 555555,*****3,CPP,MPA,KL,TN,DL,AP,RJ,,2019-12-03T07:50:05.2827076-05:00,,,,,,,,,,
Note Try with latest pre release version ChoETL