Note: This function uses the regular expression library included in the .NET Framework.
GetFirstParagraph: This method calls the static Regex.Match method of the Regex class, which lives in the System.Text.RegularExpressions namespace.
Info: The pattern first matches the literal opening tag <p>, which is the letter p between the characters < and >. It then skips zero or more whitespace characters that follow the opening tag.
Finally: The lazy quantifier in (.+?) captures the minimum number of characters between the start tag and end tag, and the \s* before </p> leaves trailing whitespace out of the capture. Both tags must be found for the match to succeed.
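Example: The effect of capturing the minimum number of characters is easiest to see next to a greedy version of the same pattern. The following is a minimal sketch; the class name LazyVersusGreedy, the helper Capture and the sample string are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class LazyVersusGreedy
{
    // Hypothetical helper: returns capture group 1, or "" when nothing matches.
    public static string Capture(string html, string pattern)
    {
        Match m = Regex.Match(html, pattern);
        return m.Success ? m.Groups[1].Value : "";
    }

    static void Main()
    {
        // Sample input (an assumption): two paragraphs on a single line.
        string html = "<p> One </p><p> Two </p>";

        // Lazy quantifier (.+?): stops at the first closing tag.
        string lazy = Capture(html, @"<p>\s*(.+?)\s*</p>");

        // Greedy quantifier (.+): runs on to the last closing tag.
        string greedy = Capture(html, @"<p>\s*(.+)\s*</p>");

        Console.WriteLine("Lazy:   [" + lazy + "]");
        Console.WriteLine("Greedy: [" + greedy + "]");
    }
}
```

With the lazy form the capture is just the text of the first paragraph (One). The greedy form continues through the first closing tag and into the second paragraph, which is why the lazy form is the one used for extracting a single paragraph.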
C# program that matches paragraph from HTML
using System;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Read in an HTML file.
        string html = File.ReadAllText("Problem.html");
        // Get the first paragraph.
        Console.Write(GetFirstParagraph(html));
        // End.
        Console.ReadLine();
    }

    /// <summary>
    /// Get first paragraph between P tags.
    /// </summary>
    static string GetFirstParagraph(string file)
    {
        Match m = Regex.Match(file, @"<p>\s*(.+?)\s*</p>");
        if (m.Success)
        {
            return m.Groups[1].Value;
        }
        else
        {
            return "";
        }
    }
}
Output
This is the first paragraph...
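Variation: The pattern above only matches a lowercase <p> tag with no attributes, and because the dot does not match a newline by default, the paragraph text must sit on one line. The sketch below shows one possible way to relax those limits with RegexOptions.IgnoreCase and RegexOptions.Singleline. The class name LooseParagraph, the method name GetFirstParagraphLoose and the sample markup are hypothetical, and a regular expression is still not a full HTML parser, so treat this as an illustration rather than a robust solution.

```csharp
using System;
using System.Text.RegularExpressions;

class LooseParagraph
{
    // Hypothetical variant of GetFirstParagraph with a more tolerant pattern.
    public static string GetFirstParagraphLoose(string html)
    {
        // <p\b[^>]*>  opening tag, optionally with attributes (\b keeps <pre> out)
        // \s*(.+?)\s* lazy capture with surrounding whitespace skipped
        // </p>        closing tag
        // IgnoreCase accepts <P>; Singleline lets the dot match newlines.
        Match m = Regex.Match(
            html,
            @"<p\b[^>]*>\s*(.+?)\s*</p>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : "";
    }

    static void Main()
    {
        // Sample input (an assumption): uppercase tag, attribute, line breaks.
        string html = "<P class=\"intro\">\n  First\n  paragraph.\n</P><p>Second</p>";
        Console.WriteLine(GetFirstParagraphLoose(html));
    }
}
```

The captured text keeps the line break inside the paragraph, since only leading and trailing whitespace is skipped by the pattern.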