Software:Norconex Web Crawler: Difference between revisions

From HandWiki
}}
}}


'''Norconex Web Crawler''' is a [[Free and open-source software|free and open-source]] web crawling and web scraping software written in [[Java (programming language)|Java]] and released under an [[Software:Apache License|Apache License]]. It can export data to many repositories such as [[Software:Apache Solr|Apache Solr]], [[Software:Elasticsearch|Elasticsearch]],<ref>{{Cite web |date=12 April 2024 |title=Enhance Your Search Capabilities with Norconex Web Crawler: Indexing Data to Elasticsearch |url=https://ohtwadi.medium.com/enhance-your-search-capabilities-with-norconex-web-crawler-indexing-data-to-elasticsearch-1a3e7b7d3617 |website=Medium}}</ref> Microsoft Azure Cognitive Search, Amazon CloudSearch and more.<ref>{{cite web |title=Committers |url=https://opensource.norconex.com/committers/ |website=opensource.norconex.com}}</ref><ref>{{cite web |last1=Hoppa |first1=Jocelyn |title=Importing Data from the Web with Norconex & Neo4j |url=https://neo4j.com/blog/importing-data-from-the-web-norconex-neo4j/ |website=Graph Database & Analytics |language=en |date=10 February 2020}}</ref><ref>{{cite web |title=Deploy a Norconex HTTP Collector Indexer Plugin {{!}} Cloud Search |url=https://developers.google.com/cloud-search/docs/guides/norconex-http-connector |website=Google for Developers |language=en}}</ref>


The crawler can be run on its own or embedded in a [[Java (programming language)|Java]] application.<ref>{{cite web |last1=Valcheva |first1=Silvia |title=10 Best Open Source Web Crawlers: Web Data Extraction Software |url=https://www.intellspot.com/open-source-web-crawlers/ |website=Blog For Data-Driven Business |date=11 February 2018}}</ref><ref>{{cite web |title=Norconex HTTP Collector |url=https://www.softpedia.com/get/Internet/Other-Internet-Related/Norconex-HTTP-Collector.shtml |website=Softpedia |date=9 July 2023 |access-date=25 September 2023}}</ref>


Some key features are:  
== See also ==
* {{cite web |last1=Mitchell |first1=Pete |title=25 Best Free Web Crawler Tools |url=https://techcult.com/best-free-web-crawler-tools/ |access-date=2023-09-05 |website=TechCult |date=8 April 2022}}
* {{cite web |title=19 Best Web Crawling Tools for Efficient Data Extraction |url=https://crawlbase.com/blog/best-web-crawling-tools/ |access-date=2024-05-10 |website=Crawlbase}}


[[Category:Web crawlers]]


{{Sourceattribution|Norconex Web Crawler}}

Latest revision as of 18:41, 15 May 2024

Norconex Web Crawler
Other names: Norconex HTTP Collector
Developer(s): Norconex Inc.
Initial release: 2016
Stable release: 3.0.2 / 2022-01-05
Repository: GitHub Repository
Written in: Java
Operating system: Cross-platform
License: Apache License
Website: Norconex Web Crawler

Norconex Web Crawler is a free and open-source web crawling and web scraping software written in Java and released under an Apache License. It can export data to many repositories such as Apache Solr, Elasticsearch,[1] Microsoft Azure Cognitive Search, Amazon CloudSearch and more.[2][3][4]

The crawler can be run on its own or embedded in a Java application.[5][6]

Some key features are:

  • Multi-threaded crawling
  • Text extraction from a variety of file formats (HTML, PDF, Word, etc.)
  • Extraction of metadata associated with documents
  • Support for pages rendered with JavaScript
  • Incremental crawls
  • Support for external commands to parse or manipulate documents
  • Delivery of extracted data to a variety of repositories
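
As an illustration of the incremental-crawl feature listed above, the following is a minimal, self-contained sketch of one common way to detect unchanged documents between crawls: comparing a content checksum against the one stored from the previous run. It does not use or reproduce the Norconex API; the class name `IncrementalCrawlSketch`, the method `shouldProcess`, and the in-memory checksum store are all hypothetical and chosen only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; this is not Norconex code. */
public class IncrementalCrawlSketch {

    // Hypothetical store: URL -> checksum of the content seen on the previous crawl.
    private final Map<String, String> previousChecksums = new HashMap<>();

    /** Returns the hex-encoded SHA-256 digest of the given content. */
    public static String checksum(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /**
     * Records the current checksum for the URL and returns true when the
     * document is new or its content differs from the previous crawl.
     */
    public boolean shouldProcess(String url, String content) {
        String current = checksum(content);
        String previous = previousChecksums.put(url, current);
        return !current.equals(previous);
    }

    public static void main(String[] args) {
        IncrementalCrawlSketch crawl = new IncrementalCrawlSketch();
        String url = "https://example.com/";
        System.out.println(crawl.shouldProcess(url, "<p>version 1</p>")); // true: first visit
        System.out.println(crawl.shouldProcess(url, "<p>version 1</p>")); // false: unchanged
        System.out.println(crawl.shouldProcess(url, "<p>version 2</p>")); // true: modified
    }
}
```

In this sketch, a document that yields `false` would be skipped on a later crawl, so only new or modified content is parsed and sent to the target repository. A real crawler would persist the checksums between runs rather than keep them in memory.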

Organizations and products using Norconex Web Crawler include the Apache Solr ecosystem, the Department of National Defence, Universities Canada, and the U.S. Department of Education.[7][8]

History

Norconex Web Crawler was released as free and open-source software in 2013.[9]

References

Mentions in Academic Research

See also