TheDeveloperBlog.com


C# Get Title From HTML With Regex

Get the HTML title from strings with Regex. Invoke the Regex.Match method.
Title from HTML. HTML documents have title elements, and the text in them is important: it is used for search-engine optimization and RSS feeds. This simple method extracts the TITLE element from an HTML document.
Example. We can extract the contents of the TITLE element from HTML. This helps you check that each page has the title you intended. After the code, we look at the parts of the Regex pattern in detail, and then at its limitations.

First: This console application reads the HTML file and gets the first TITLE element from it.

Then: The program prints the title to the console. The file Problem.html must be present in the current working directory, or File.ReadAllText will throw an exception.

Pattern: The regular expression looks for a <title> start tag and a </title> end tag. It skips any whitespace between the tags and the title text, so the captured title comes back trimmed.

C# program that gets TITLE element from HTML

using System;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Read in an HTML file.
        string html = File.ReadAllText("Problem.html");

        // Get the title of the HTML.
        Console.WriteLine(GetTitle(html));

        // End.
        Console.ReadLine();
    }

    /// <summary>
    /// Get title from an HTML string.
    /// </summary>
    static string GetTitle(string html)
    {
        // Group 1 captures the title text without the surrounding whitespace.
        Match m = Regex.Match(html, @"<title>\s*(.+?)\s*</title>");
        if (m.Success)
        {
            return m.Groups[1].Value;
        }
        else
        {
            return "";
        }
    }
}

Output

Title of the Page

Pattern:

@          Verbatim string literal: backslashes are not treated as escape characters.
\s*        Matches 0 or more whitespace characters.
(.+?)      Matches the title text lazily: stops as soon as it can.
\s*        Matches 0 or more whitespace characters.

Match      The object Regex.Match returns. Success tells whether the pattern matched.
Groups[1]  First capture group. Groups[0] is the entire match, tags included.
Value      String value of the Group.
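
To see the difference between Groups[0] and Groups[1], here is a minimal sketch that runs the same pattern on an in-memory string instead of Problem.html. The class GroupsDemo and the helper MatchTitle are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class GroupsDemo
{
    // Hypothetical helper: the same pattern as GetTitle, returning the whole Match.
    public static Match MatchTitle(string html) =>
        Regex.Match(html, @"<title>\s*(.+?)\s*</title>");

    static void Main()
    {
        // Extra spaces inside the tags show the two \s* parts at work.
        string html = "<html><head><title>   Title of the Page  </title></head></html>";
        Match m = MatchTitle(html);

        // Groups[0] is the entire match, including the tags and spaces.
        Console.WriteLine(m.Groups[0].Value);
        // prints: <title>   Title of the Page  </title>

        // Groups[1] is the first capture group: the trimmed title text.
        Console.WriteLine(m.Groups[1].Value);
        // prints: Title of the Page
    }
}
```

Because (.+?) is lazy, it stops before the trailing spaces, and the second \s* consumes them.
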
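Errors. This code is not flexible enough for some HTML documents. For example, the pattern will not match a title start tag that has attributes, such as <title lang="en">, and it will not match a title whose text is broken across lines, because . does not match a newline character by default. For typical XHTML, where tags are lowercase and the title tag usually has no attributes, the pattern should work.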
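
One way to accept attributes, sketched here as an assumption rather than the article's own method, is to replace the literal <title> with <title\b[^>]*>. The \b keeps it from matching a tag like <titlex>, and [^>]* skips any attributes. AttributeTitle is a hypothetical class name; this version still assumes lowercase tags and a single-line title.

```csharp
using System;
using System.Text.RegularExpressions;

class AttributeTitle
{
    // Hypothetical variant of GetTitle: <title\b[^>]*> allows attributes
    // in the start tag, such as <title lang="en">.
    public static string GetTitle(string html)
    {
        Match m = Regex.Match(html, @"<title\b[^>]*>\s*(.+?)\s*</title>");
        return m.Success ? m.Groups[1].Value : "";
    }

    static void Main()
    {
        // Both calls print "Title of the Page".
        Console.WriteLine(GetTitle("<head><title lang=\"en\">Title of the Page</title></head>"));
        Console.WriteLine(GetTitle("<head><title>Title of the Page</title></head>"));
    }
}
```
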

Also: The pattern assumes the tags are lowercase, so it will not find <TITLE>. Passing RegexOptions.IgnoreCase to Regex.Match removes this limitation.
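
A minimal sketch of the case-insensitive version, assuming you also want titles that wrap across lines: RegexOptions.IgnoreCase accepts <TITLE> and <Title>, and RegexOptions.Singleline lets . match newline characters. The class name CaseInsensitiveTitle is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

class CaseInsensitiveTitle
{
    // Hypothetical variant of GetTitle. IgnoreCase matches <TITLE>, <Title>, etc.;
    // Singleline makes . match newlines too, so a wrapped title is still captured.
    public static string GetTitle(string html)
    {
        Match m = Regex.Match(html, @"<title>\s*(.+?)\s*</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : "";
    }

    static void Main()
    {
        Console.WriteLine(GetTitle("<HEAD><TITLE>Title of the Page</TITLE></HEAD>"));
        // prints: Title of the Page
    }
}
```

The options combine with |, so either one can be dropped if you only need the other.
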

Paragraphs: You can use regular expressions like these to read other important elements, such as paragraphs, from your HTML.
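
Paragraph elements can be read the same way. This sketch assumes Regex.Matches, which returns every match rather than only the first, and a pattern shaped like the TITLE one; ParagraphDemo and GetParagraphs are hypothetical names. Inner tags such as <b> stay in the captured text, and nested or unclosed <p> elements are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class ParagraphDemo
{
    // Hypothetical helper: returns the inner text of every <p> element.
    // <p\b[^>]*> allows attributes but does not match <pre> or <param>.
    public static List<string> GetParagraphs(string html)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(html, @"<p\b[^>]*>\s*(.+?)\s*</p>",
                     RegexOptions.IgnoreCase | RegexOptions.Singleline))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    static void Main()
    {
        string html = "<body><p>First paragraph.</p>\n<p class=\"note\">Second paragraph.</p></body>";
        foreach (string p in GetParagraphs(html))
        {
            Console.WriteLine(p);
        }
    }
}
```
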

Summary. With Regex.Match, we can capture the contents of the TITLE and paragraph elements from HTML documents in C#. Every webmaster should know that the TITLE is important, and this helper method makes it easy to read from code.
© TheDeveloperBlog.com