API Documentation

The WWW Utils Module (tendril.utils.www)

This module provides utilities for dealing with the internet. All application code should access the internet through this module, since this is where support for proxies and caching is implemented.

This module provides three main approaches to accessing internet resources:

urllib-based access (tendril.utils.www.bare)

urlopen(url)

Opens the URL specified by the url parameter.

get_soup(url)

Gets a bs4-parsed soup for the URL specified by the parameter.

cached_fetcher

Subclass of CacheBase to handle caching of URL fetch responses.
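
A minimal usage sketch for this backend, assuming urlopen and get_soup are importable from tendril.utils.www.bare as listed above; the example URL is hypothetical:

    from tendril.utils.www.bare import urlopen, get_soup

    # Fetch a raw response; proxy handling and caching are applied by the
    # module, not by the caller.
    response = urlopen('http://example.com/catalog')
    content = response.read()

    # Fetch and parse in one step; returns a bs4.BeautifulSoup instance.
    soup = get_soup('http://example.com/catalog')
    for link in soup.find_all('a'):
        print(link.get('href'))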

requests-based access (tendril.utils.www.req)

get_soup_requests(url[, session])

Gets a bs4-parsed soup for the URL specified by the parameter, optionally using a provided requests session.

get_session([target, heuristic])

Gets a pre-configured requests session.
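
A minimal usage sketch for the requests-based backend, assuming the functions are importable from tendril.utils.www.req and that the optional session argument is passed as shown; the example URL is hypothetical:

    from tendril.utils.www.req import get_session, get_soup_requests

    # A pre-configured session (proxies, caching) that can be reused
    # across multiple requests.
    session = get_session()

    # Parse a page into a bs4.BeautifulSoup instance using that session.
    soup = get_soup_requests('http://example.com/catalog', session=session)
    print(soup.title)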

suds-based SOAP access (tendril.utils.www.soap)

get_soap_client(wsdl[, cache_requests, …])

Creates and returns a suds/SOAP client instance bound to the provided WSDL.
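
A minimal usage sketch for the SOAP backend, assuming get_soap_client is importable from tendril.utils.www.soap; the WSDL URL and the GetPartInfo operation are hypothetical placeholders for whatever the target service defines:

    from tendril.utils.www.soap import get_soap_client

    wsdl = 'http://example.com/service?wsdl'   # hypothetical WSDL URL
    client = get_soap_client(wsdl, cache_requests=True)

    # suds exposes the operations defined by the WSDL under client.service;
    # 'GetPartInfo' is a placeholder operation name.
    result = client.service.GetPartInfo('ABC-123')
    print(result)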

Caching Strategies

The backends provided by these modules have caching mechanisms built in to speed up access to internet-based resources.

Redirect Caching

Redirect caching speeds up network access by recording HTTP 301 and 302 redirects, so that subsequent requests can go directly to the final URL without repeating the redirect round trip. The redirect cache is stored as a pickled object in the INSTANCE_CACHE folder. The effect of this caching is far more apparent when a replicator cache is also used.
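
The following is an illustrative sketch of the idea only, not the module's actual implementation; the cache location and file name are hypothetical:

    import os
    import pickle

    INSTANCE_CACHE = '/tmp/tendril-cache'            # hypothetical location
    REDIR_CACHE = os.path.join(INSTANCE_CACHE, 'redirects.p')

    def load_redirect_cache():
        # Load the persisted {requested_url: final_url} mapping, if any.
        if os.path.exists(REDIR_CACHE):
            with open(REDIR_CACHE, 'rb') as f:
                return pickle.load(f)
        return {}

    def save_redirect_cache(cache):
        # Persist the mapping so later runs can skip the 301/302 round trip.
        os.makedirs(INSTANCE_CACHE, exist_ok=True)
        with open(REDIR_CACHE, 'wb') as f:
            pickle.dump(cache, f)

    # On a response with a 301/302, record cache[requested_url] = final_url;
    # on later fetches, look up requested_url first and request the target
    # directly.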

Redirect caching is only supported by the urllib-based backend (tendril.utils.www.bare), and is likely to be phased out entirely in the future.

Full Response Caching

This is a more typical kind of caching, which uses a backend-dependent mechanism to maintain a cache of full responses received.
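
As an illustration of what such a mechanism can look like for the requests backend, the sketch below wraps a session with the CacheControl library and a freshness heuristic; whether this module actually uses CacheControl, and where it keeps its cache, are assumptions made for the example:

    import requests
    from cachecontrol import CacheControl
    from cachecontrol.caches.file_cache import FileCache
    from cachecontrol.heuristics import ExpiresAfter

    # Wrap a plain session so that full responses are cached on disk and
    # treated as fresh for one day.
    session = CacheControl(requests.Session(),
                           cache=FileCache('/tmp/tendril-cache/requests'),
                           heuristic=ExpiresAfter(days=1))

    response = session.get('http://example.com/catalog')   # hypothetical URL
    print(response.status_code)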

Todo

Consider replacing uses of the urllib/urllib2 backend with requests and simplifying this module. Currently, the cache provided with the requests implementation is the main bottleneck and seems to cause a significant performance hit.