TheDeveloperBlog.com

Home | Contact Us

C-Sharp | Java | Python | Swift | GO | WPF | Ruby | Scala | F# | JavaScript | SQL | PHP | Angular | HTML

Talend Centralizing Excel File Metadata

Talend Centralizing Excel File Metadata with What is Talend, Installation, Open Studio, Working with Project, Studio Data Integration Feature, Studio Data Integration Architecture, Data Integration etc.

<< Back to TALEND

Centralizing Excel File Metadata

In this section, we will learn how to centralize Excel File Metadata in Talend Studio for Data integration platform.

Before going further in this chapter first, we will understand why we will use Excel Files.

If we want to connect to an Excel File, we will centralize the connection and schema information in the Repository for reusability.

To create the Excel File connection form the beginning:

  • Go to the Repository panel.
  • Then expand the Metadata and right-click on the File Excel, then select Create File Excel option in the popup menu, as we can see in the below image:

Repository → Metadata → File Excel → Create file Excel

Centralizing Excel File Metadata

Note: To use the centralized Excel File in our job, go to the basic setting view of the necessary components with its property typeset as build-in for opening the File Metadata setup window.

Then New Excel File window will open where both the File connection and schema definitions are completed in four steps:

  • Define General properties
  • Defining File path and Format
  • Define File Parsing Parameters
  • Checking and customizing the File schema

Step1: Defining General Properties

In the first step, we will fill all the necessary details like Name, Purpose, and Description.

We can also manage the version and status fields of a Repository item in the project setting dialog box.

Click on the Select button next to the Path field for selecting a folder under the File Excel node to hold our newly created File connection.

Note: we cannot select a folder if we are editing an existing connection, but we can drag and drop it to a new folder whenever we want.

After filling all the details of general properties, click on the Next button.

Centralizing Excel File Metadata

Step2: Defining File path and Format

In the next step, we will click on the Browse button to locate our File from the local system.

Note: if we are saving an excel File connection that is defined in a component, the File path will be automatically in the File field.

For example, we will select the New_Excel.xlsx File from our system.

  • Ensure that the Read excel2007 file format(xlsx) checkbox is selected if the uploaded file is an Excel 2007 File.
  • If the uploaded xlsx File is extremely large, then select Less memory consumed for large excel (Event mode) from the Generation Mode list to stop out-of-memory errors; otherwise, the Memory-consumed (User mode) is selected by default.
  • We have the File Viewer and Sheets setting area, where we can see the file content and the one or more sheets.
  • We can select the sheet which we want to view from the Please select sheet dropdown list, and the Preview table shows the content of the selected sheet.

Note: the File preview table shows the first sheet of the File, by default.

  • We can also select one or more sheets, which we want to upload from the Set sheets parameters
  • The resulting schema will represent the combination of the structures of all the selected sheets if we choose more than one sheet.
Centralizing Excel File Metadata
  • After that, click on the Next button to process further.

Step3: File Parsing Parameters

In this step, we will describe the multiple settings of File parsing to recover the File schema properly.

  • In the File Settings area, we can set the Encoding type based on the uploaded file.
  • In the Rows To Skip section, we can skip the row if the Excel file has the Header or Footer as we can see in the below screenshot:
Centralizing Excel File Metadata
  • The Metadata column setting area represents the columns, which has to be read in the file by filling the First column and Last column fields with integers.
    For Example: if we want to skip the First column as it may not have the correct data, which has to be processed, put 2 in the First column, and to set the second column of the file as the first column of the schema.
    But in our case, we put 1 in the First column, as we can see in the below image:
Centralizing Excel File Metadata
  • We did not need to parse all the rows of the file, mainly when we have uploaded a large file while retrieving the schema of an Excel file. We can set the desired number of rows by selecting the Limit checkbox in Limit Of Rows section.
Centralizing Excel File Metadata
  • If our excel file has the header row, select set handling row as column names checkbox to take into account heading names.
  • To see the effect and result view, on the viewer, click on the Refresh Preview button as we can see in the below image:
Centralizing Excel File Metadata

And, click on the Next button.

Step4: Checking and Customizing the File schema

In the last step, we will check and customize the File schema.

  • To customize the File schema, check the data type in the Type column, which is correct or not.
  • The Guess button is used to update and recover the Excel File schema.

Note: If we have customized schema, the Guess feature does not keep these changes.

  • After that, click on the Finish button to complete the process, as we can see in the below image:
Centralizing Excel File Metadata

To see the newly created Metadata in the Talend studio:

  • Go to the Repository panel, then go to Metadata.
  • After that, expand the File Excel node, and select the New_excel metadata, as we can see in the below screenshot:

Repository → Metadata → File Excel → New_excel

Centralizing Excel File Metadata

To reuse the Metadata as a new component or the existing component, simply drag the File connection or schema from the Repository's Metadata node and drop it to the design workspace window.

For modifying the existing File connection:

  • Go to the Repository panel, then go to the Metadata node.
  • After that, expand the File Excel, and right-click on the schema and select Edit File Excel as we can see in the below image:
Centralizing Excel File Metadata

For adding a new schema to an existing File connection:

  • Go to the Repository panel, and right-click on the File Excel.
  • Select Retrieve Schema from the popup menu in the Metadata, as we can see in the below image:
Centralizing Excel File Metadata




Related Links:


Related Links

Adjectives Ado Ai Android Angular Antonyms Apache Articles Asp Autocad Automata Aws Azure Basic Binary Bitcoin Blockchain C Cassandra Change Coa Computer Control Cpp Create Creating C-Sharp Cyber Daa Data Dbms Deletion Devops Difference Discrete Es6 Ethical Examples Features Firebase Flutter Fs Git Go Hbase History Hive Hiveql How Html Idioms Insertion Installing Ios Java Joomla Js Kafka Kali Laravel Logical Machine Matlab Matrix Mongodb Mysql One Opencv Oracle Ordering Os Pandas Php Pig Pl Postgresql Powershell Prepositions Program Python React Ruby Scala Selecting Selenium Sentence Seo Sharepoint Software Spellings Spotting Spring Sql Sqlite Sqoop Svn Swift Synonyms Talend Testng Types Uml Unity Vbnet Verbal Webdriver What Wpf