Recently I had to move a large number of articles from WordPress to .NET Core project as a part of the redevelopment of the existing web portal. I got an XML file that contains all the article's content with WordPress standard fields.
Problem
The problem I found in XML file was with the main content of the articles. All the content that has YouTube video was holding YouTube URLs instead of embed code. You can see the part of the content I received in the below screenshot.
Solution
To solve this problem, I choose RegEx. I prepared a RegEx pattern to match with YouTube video URL and replaced matched content with embedded code.
Here is my code that replaces YouTube video URL with embed code.
const string regexPattern = @"(?:https?:\/\/)?(?:www\.)?(?:(?:(?:youtube.com\/watch\?[^?]*v=|youtu.be\/)([\w\-]+))(?:[^\s?]+)?)";
const string embedCode = "<iframe width='750' height='422' src='https://www.youtube.com/embed/$1' frameborder='0' allowfullscreen='1' allow='accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture'></iframe>";
var regex = new Regex(regexPattern);
var resultContent = regex.Replace(Model.BodyContent.ToString(), embedCode);
You can see how it looks before and after in this screenshot,
Summary
I hope this will be useful to others also who are having similar issues during content migration from WordPress to .NET.
Thank you for reading.