requests based www backend (tendril.utils.www.req)

This module provides a requests based backend for retrieving content from the web, with response caching handled by CacheControl. It exposes pre-configured requests sessions and a helper to retrieve bs4 parsed soups using them.

tendril.utils.www.req.requests_cache = <cachecontrol.caches.file_cache.FileCache object>

The module’s cachecontrol.caches.FileCache instance, which should be used whenever cached requests responses are desired. The cache is stored in the directory defined by tendril.config.REQUESTS_CACHE. Note that the cache is presently created with very permissive file permissions, which should probably be tightened.
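For reference, a minimal sketch of how such a cache instance can be constructed. The dirmode and filemode values shown are assumptions, chosen only to illustrate the weak permissions noted above:

    from cachecontrol.caches.file_cache import FileCache

    from tendril.config import REQUESTS_CACHE

    # Hypothetical construction of the module-level cache. The 0o777 / 0o666
    # modes are assumptions, illustrating the weak permissions noted above.
    requests_cache = FileCache(
        REQUESTS_CACHE,
        dirmode=0o777,   # cache directories created world-writable
        filemode=0o666,  # cached response files created world-writable
    )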

tendril.utils.www.req._get_requests_cache_adapter(heuristic)[source]

Given a heuristic, constructs and returns a cachecontrol.CacheControlAdapter attached to the instance’s requests_cache.
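A minimal sketch of what this helper amounts to, assuming the standard cachecontrol.CacheControlAdapter constructor and the module-level requests_cache described above:

    from cachecontrol import CacheControlAdapter

    def _get_requests_cache_adapter(heuristic):
        # Bind the given freshness heuristic to the shared file-backed cache.
        return CacheControlAdapter(cache=requests_cache, heuristic=heuristic)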

tendril.utils.www.req.get_session(target='http://', heuristic=None)[source]

Gets a pre-configured requests session.

This function configures the following behavior into the session:

  • Proxy settings are added to the session.

  • It is configured to use the instance’s requests_cache.

  • Permanent redirect caching is handled by CacheControl.

  • Temporary redirect caching is not supported.

Each module or class instance which uses this should subsequently maintain its own session, with whatever modifications it requires, within a scope which makes sense for the use case (and should probably close it when done).

The session returned from here uses the instance’s REQUESTS_CACHE with a single, though configurable, heuristic. If additional caches or heuristics need to be added, it is left to the caller to set them up, as in the sketch below.
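For instance, a caller needing a second, site-specific heuristic could mount an additional adapter on the returned session. This is a hypothetical illustration, not part of this module; the host and expiry are assumptions:

    from cachecontrol.heuristics import ExpiresAfter

    from tendril.utils.www.req import _get_requests_cache_adapter, get_session

    session = get_session()
    # Additionally cache responses from this (hypothetical) slow host for
    # six hours, alongside the default adapter mounted by get_session().
    session.mount(
        'https://slow.example.com/',
        _get_requests_cache_adapter(ExpiresAfter(hours=6)),
    )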

Note

The caching here seems to be fairly ineffective, particularly for the Digi-Key passive component search. The cause is not yet known.

Parameters
  • target – Defaults to 'http://'. A string containing a prefix for the targets that should be cached. Use this to set up site-specific heuristics.

  • heuristic (cachecontrol.heuristics.BaseHeuristic) – The heuristic to use for the cache adapter.

Return type

requests.Session
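A typical usage sketch, assuming a site-specific target prefix and a one-day expiry heuristic (the host is illustrative):

    from cachecontrol.heuristics import ExpiresAfter

    from tendril.utils.www.req import get_session

    # Cache responses from this (illustrative) host for one day.
    session = get_session(target='https://www.example.com/',
                          heuristic=ExpiresAfter(days=1))
    try:
        response = session.get('https://www.example.com/catalog')
        print(response.status_code)
    finally:
        # Close the session once its use-case scope ends, as recommended above.
        session.close()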

tendril.utils.www.req.get_soup_requests(url, session=None)[source]

Gets a bs4 parsed soup for the URL specified by the url parameter, using the lxml parser.

If a session (previously created from get_session()) is provided, this session is used and left open. If it is not, a new session is created for the request and closed before the soup is returned.

Using a caller-defined session allows re-use of a single session across multiple requests, thereby taking advantage of HTTP keep-alive to speed things up. It also provides a way for the caller to modify the cache heuristic, if needed.

Any exceptions encountered will be raised, and are left for the caller to handle. The assumption is that an HTTP or URL error is going to make the soup unusable anyway.
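A usage sketch reusing one session across multiple soups, as described above (the URLs are illustrative):

    from tendril.utils.www.req import get_session, get_soup_requests

    session = get_session(target='https://www.example.com/')
    try:
        # Both requests reuse the same keep-alive connection and cache.
        for page in ('https://www.example.com/parts',
                     'https://www.example.com/vendors'):
            soup = get_soup_requests(page, session=session)
            print(soup.title.string if soup.title else page)
    finally:
        session.close()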