Norconex Web Crawler

Norconex Web Crawler
Other names	Norconex HTTP Collector
Developer(s)	Norconex Inc.
Initial release	2016
Stable release	3.0.2 / 2022-01-05
Repository	GitHub Repository
Written in	Java
Operating system	Cross-platform
License	Apache License
Website	Norconex Web Crawler

Latest revision as of 18:41, 15 May 2024

Norconex Web Crawler is a free and open-source web crawling and web scraping software written in Java and released under the Apache License. It can export data to many repositories, including Apache Solr, Elasticsearch,^[1] Microsoft Azure Cognitive Search, and Amazon CloudSearch.^[2]^[3]^[4]

The crawler can run as a standalone application or be embedded in another Java application.^[5]^[6]

Key features include:

Multi-threaded crawling
Text extraction from a variety of file formats (HTML, PDF, Word, etc.)
Extraction of metadata associated with documents
Support for pages rendered with JavaScript
Incremental crawls
Support for external commands to parse or manipulate documents
Export of extracted data to a variety of repositories

Organizations and products using Norconex Web Crawler include the Apache Solr ecosystem, the Department of National Defence, Universities Canada, and the U.S. Department of Education.^[7]^[8]

History

Norconex Web Crawler was released as free and open-source software in 2013.^[9]

References

1. "Enhance Your Search Capabilities with Norconex Web Crawler: Indexing Data to Elasticsearch". Medium. 12 April 2024. https://ohtwadi.medium.com/enhance-your-search-capabilities-with-norconex-web-crawler-indexing-data-to-elasticsearch-1a3e7b7d3617.
2. "Committers". opensource.norconex.com. https://opensource.norconex.com/committers/.
3. Hoppa, Jocelyn (10 February 2020). "Importing Data from the Web with Norconex & Neo4j". Graph Database & Analytics. https://neo4j.com/blog/importing-data-from-the-web-norconex-neo4j/.
4. "Deploy a Norconex HTTP Collector Indexer Plugin | Cloud Search". Google for Developers. https://developers.google.com/cloud-search/docs/guides/norconex-http-connector.
5. Valcheva, Silvia (11 February 2018). "10 Best Open Source Web Crawlers: Web Data Extraction Software". Blog For Data-Driven Business. https://www.intellspot.com/open-source-web-crawlers/.
6. "Norconex HTTP Collector". Softpedia. 9 July 2023. https://www.softpedia.com/get/Internet/Other-Internet-Related/Norconex-HTTP-Collector.shtml. Retrieved 25 September 2023.

Mentions in Academic Research

Kancherla, Vinay (1 December 2014). "A Smart Web Crawler for a Concept Based Semantic Search Engine (pg. 18)". Master's Projects. doi:10.31979/etd.ubfy-s3es. https://scholarworks.sjsu.edu/etd_projects/380/. Retrieved 28 September 2023.
Horváth, Balázs (28 August 2017). "Recommendation Techniques for smart cities (pg. 12)". https://aaltodoc.aalto.fi/handle/123456789/27974. Retrieved 28 September 2023.
Wani, Mudasir Ahmad; Agarwal, Nancy; Jabin, Suraiya; Hussain, Syed Zeeshan (2018). "Design of iMacros-based Data Crawler and the Behavioral Analysis of Facebook Users". arXiv:1802.09566 [cs.SI].
Abbasi, Vahid. "Phonetic Analysis and Searching with Google Glass API". https://uub.primo.exlibrisgroup.com/discovery/fulldisplay?docid=alma991018494504807596&context=L&vid=46LIBRIS_UUB:UUB&lang=en&search_scope=MyInst_and_CI&adaptor=Local%20Search%20Engine&tab=Everything&query=creator,contains,vahid%20abbasi&offset=0.

See also

Mitchell, Pete (8 April 2022). "25 Best Free Web Crawler Tools". TechCult. https://techcult.com/best-free-web-crawler-tools/. Retrieved 5 September 2023.
"19 Best Web Crawling Tools for Efficient Data Extraction". Crawlbase. https://crawlbase.com/blog/best-web-crawling-tools/. Retrieved 10 May 2024.