TheDeveloperBlog.com


Python Urllib Usage: Urlopen, UrlParse

Use urllib: open an HTML file from an Internet site and parse a URL.
Urllib. Data is often stored on the Internet. In these locations a Python program cannot access it directly. It must instead make an external request. With urllib we make these requests: we download external files.
Urlopen. There are some complexities to opening a website in Python. First we must import the urlopen function from the urllib.request module. This is the first line of the Python file. The program next calls urlopen() and loops over the lines of the response.

Note: The argument is the location of the web page. Each line arrives as bytes, so we next call decode() on it. This converts the bytes to a str, using UTF-8 by default.

Tip: The last argument to print is end="". Each decoded line already ends in a newline, so this stops print from adding a second line break.

Python program that reads lines from Internet site

from urllib.request import urlopen

# Print first four lines of this site.
i = 0
for line in urlopen("http://www.dotnetCodex.com/"):
    # Decode.
    line = line.decode()
    # Print.
    print(i, line, end="")
    # See if past limit.
    if i == 3:
        break
    i += 1

Output

0 <!doctype html><html><head><link rel=canonical
1 href=http://www.dotnetCodex.com><link rel=stylesheet
2 href=1><title>The Dev Codes</title><meta
3 name=description
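The same idea can be written as a reusable function. This is only a sketch: the name read_first_lines and its parameters are made up for illustration, and falling back to UTF-8 when the server sends no charset is an assumption, not something urllib does for us.

```python
from itertools import islice
from urllib.request import urlopen


def read_first_lines(url, limit=4, fallback_encoding="utf-8"):
    """Return the first `limit` lines of the resource at `url` as str."""
    lines = []
    # The with-statement closes the connection when the block ends.
    with urlopen(url) as response:
        # Prefer the charset named in the Content-Type header, if any.
        encoding = response.headers.get_content_charset() or fallback_encoding
        # islice stops after `limit` lines, like the break in the loop above.
        for raw in islice(response, limit):
            # urlopen yields bytes; decode() turns them into text.
            lines.append(raw.decode(encoding, errors="replace"))
    return lines
```

Calling read_first_lines("http://www.dotnetCodex.com/") and printing each item with end="" should give the same kind of output as the program above, without the manual counter.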
Parse. Web locations usually begin with http or https. These are called URLs (a kind of URI). In Python we use the urlparse function from the urllib.parse module, which returns a ParseResult. Here we parse a URL: we choose the English Wikipedia.
Then we access some fields from the parsed URL object result. The scheme is the "http" part, with no punctuation. The netloc is the domain itself: it has periods but no leading or trailing punctuation.

And: The path is the location on the domain. We use it on the root page here, so the path is simply a forward slash, "/".

Tip: There are more fields on the ParseResult. You can just print the ParseResult and all the fields will be printed. This helps discovery.

Python program that uses urlparse

from urllib.parse import urlparse

# Parse this url.
result = urlparse("http://en.wikipedia.org/")

# Get some values from the ParseResult.
scheme = result.scheme
loc = result.netloc
path = result.path

# Print our values.
print(scheme)
print(loc)
print(path)

Output

http
en.wikipedia.org
/
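To see the remaining fields, we can print the whole ParseResult for a longer URL. The address below is a made-up example (nothing is fetched; urlparse only splits the string), chosen so that the port, query and fragment are all filled in.

```python
from urllib.parse import urlparse

# A hypothetical URL with a port, a query string and a fragment.
result = urlparse("https://example.com:8080/docs/page.html?lang=en#intro")

# Printing the ParseResult shows all six fields at once.
print(result)

# hostname and port are derived from netloc.
print(result.hostname)
print(result.port)

# ParseResult is a named tuple, so it converts to a dict of its fields.
fields = result._asdict()
print(fields["query"])
```

The first print shows ParseResult(scheme='https', netloc='example.com:8080', path='/docs/page.html', params='', query='lang=en', fragment='intro'). The next lines print example.com, 8080 and lang=en.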
Summary. Python finds heavy usage on the Internet. It works both on the server and on desktop systems. It can fetch external files or web pages. But the complexity of Python programs increases when external files are necessary. Standard Library: Python.org

Because: External files cause errors. Sometimes they are not found. Other times they are in an invalid format.
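These failures can be caught with the exception types in urllib.error. The sketch below is one possible approach, not the only one: the function name fetch_text, the 10-second timeout and the choice to return an error message instead of raising are all assumptions made for illustration.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_text(url, encoding="utf-8"):
    """Return (text, error); exactly one of the two is None."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode(encoding), None
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        # HTTPError is a subclass of URLError, so it is checked first.
        return None, f"HTTP error {err.code}"
    except URLError as err:
        # No usable response: unknown host, refused connection, missing file.
        return None, f"URL error: {err.reason}"
    except UnicodeDecodeError:
        # The bytes arrived, but they are not valid text in this encoding.
        return None, "invalid encoding"
```

A caller can then test the second value: if it is None, the text is safe to use; otherwise it describes what went wrong.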
