pydrill Documentation Release 0.3.4 Wojciech Nowak Apr 24, 2018


Contents

1 pydrill
  1.1 Features
  1.2 Installation
  1.3 Sample usage
  1.4 Supported api calls

2 Installation

3 Usage
  3.1 Supported api calls

4 Contributing
  4.1 Types of Contributions
  4.2 Get Started!
  4.3 Pull Request Guidelines
  4.4 Tips

5 Credits
  5.1 Development Lead
  5.2 Contributors

6 History
  6.1 0.3.4 (2017-04-24)
  6.2 0.3.3 (2017-04-24)
  6.3 0.3.2 (2017-04-18)
  6.4 0.3.1 (2017-03-06)
  6.5 0.3 (2017-02-15)
  6.6 0.1.1 (2016-05-21)
  6.7 0.1.0 (2016-05-19)
  6.8 0.0.2 (2016-04-24)
  6.9 0.0.1 (2015-12-28)

7 Indices and tables


CHAPTER 1

pydrill

Python Driver for Apache Drill.

Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage

• Free software: MIT license

• Documentation: https://pydrill.readthedocs.org.

1.1 Features

• Python 2/3 compatibility

• Support for all REST API calls, including profiles, options, and metrics (see the docs for the full list)

• Mapping of results to internal Python types

• Compatibility with pandas DataFrames

• Drill authentication using PAM

1.2 Installation

Version from https://pypi.python.org/pypi/pydrill:

$ pip install pydrill

Latest version from git:

$ pip install git+https://github.com/PythonicNinja/pydrill.git


1.3 Sample usage

from pydrill.client import PyDrill
# ImproperlyConfigured is assumed to be importable from pydrill.exceptions.
from pydrill.exceptions import ImproperlyConfigured

drill = PyDrill(host='localhost', port=8047)

# Fail early if the Drill REST endpoint is not reachable.
if not drill.is_active():
    raise ImproperlyConfigured('Please run Drill first')

yelp_reviews = drill.query('''
  SELECT * FROM
  `dfs.root`.`./Users/macbookair/Downloads/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_review.json`
  LIMIT 5
''')

# Each result row is accessed by column name.
for result in yelp_reviews:
    print("%s: %s" % (result['type'], result['date']))

# pandas dataframe
df = yelp_reviews.to_dataframe()
print(df[df['stars'] > 3])

1.4 Supported api calls

class pydrill.client.PyDrill(host='localhost', port=8047, trasport_class=<class 'pydrill.transport.Transport'>, connection_class=<class 'pydrill.connection.requests_conn.RequestsHttpConnection'>, auth=None, **kwargs)

>>> drill = PyDrill(host='localhost', port=8047)
>>> drill.is_active()
True

is_active(timeout=2)

Parameters timeout – int

Returns boolean

metrics(timeout=10)

Get the current memory metrics.

Parameters timeout – int

Returns pydrill.client.Result

options(timeout=10)

List the name, default, and data type of the system and session options.

Parameters timeout – int

Returns pydrill.client.Result

perform_request(method, url, params=None, body=None)


plan(sql, timeout=10)

Parameters

• sql – string

• timeout – int

Returns pydrill.client.ResultQuery

profile(query_id, timeout=10)

Get the profile of the query that has the given query ID.

Parameters

• query_id – The UUID of the query in standard UUID format that Drill assigns to each query.

• timeout – int

Returns pydrill.client.Result

profile_cancel(query_id, timeout=10)

Cancel the query that has the given query ID.

Parameters

• query_id – The UUID of the query in standard UUID format that Drill assigns to each query.

• timeout – int

Returns pydrill.client.Result

profiles(timeout=10)

Get the profiles of running and completed queries.

Parameters timeout – int

Returns pydrill.client.Result

query(sql, timeout=10)

Submit a query and return results.

Parameters

• sql – string

• timeout – int

Returns pydrill.client.ResultQuery

stats(timeout=10)

Get Drillbit information, such as port numbers.

Parameters timeout – int

Returns pydrill.client.Stats

storage(timeout=10)

Get the list of storage plugin names and configurations.

Parameters timeout – int

Returns pydrill.client.Result

storage_delete(name, timeout=10)

Delete a storage plugin configuration.


Parameters

• name – The name of the storage plugin configuration to delete.

• timeout – int

Returns pydrill.client.Result

storage_detail(name, timeout=10)

Get the definition of the named storage plugin.

Parameters

• name – The assigned name in the storage plugin definition.

• timeout – int

Returns pydrill.client.Result

storage_enable(name, value=True, timeout=10)

Enable or disable the named storage plugin.

Parameters

• name – The assigned name in the storage plugin definition.

• value – Either True (to enable) or False (to disable).

• timeout – int

Returns pydrill.client.Result

storage_update(name, config, timeout=10)

Create or update a storage plugin configuration.

Parameters

• name – The name of the storage plugin configuration to create or update.

• config – Overwrites the existing configuration, if there is one, and therefore must include all required attributes and definitions.

• timeout – int

Returns pydrill.client.Result

threads(timeout=10)

Get the status of threads.

Parameters timeout – int

Returns pydrill.client.Result


CHAPTER 2

Installation

At the command line:

$ pip install pydrill

Or, if you have virtualenvwrapper installed:

$ mkvirtualenv pydrill
$ pip install pydrill

Latest version from git:

$ pip install git+https://github.com/PythonicNinja/pydrill.git


CHAPTER 3

Usage

To use pydrill in a project:

from pydrill.client import PyDrill

drill = PyDrill(host='localhost', port=8047)

You can also initialize via environment variables such as:

PYDRILL_HOST
PYDRILL_PORT
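To make the fallback behaviour concrete, here is a minimal sketch that resolves the host and port from these two variables and falls back to the documented defaults (localhost, 8047) when they are unset. The helper name drill_settings_from_env is hypothetical and not part of pydrill; how pydrill reads the variables internally is not shown here.

```python
import os


def drill_settings_from_env(environ=None):
    """Return (host, port) for a Drill connection.

    Hypothetical helper, not part of pydrill: it only mirrors the
    documented defaults of PyDrill(host='localhost', port=8047).
    """
    env = os.environ if environ is None else environ
    host = env.get("PYDRILL_HOST", "localhost")
    # Environment values are strings, so convert the port explicitly.
    port = int(env.get("PYDRILL_PORT", 8047))
    return host, port
```

The resulting pair can then be passed on explicitly, for example PyDrill(host=host, port=port).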

You can use Drill PAM authentication via auth param:

drill = PyDrill(auth='drill_user:drill_password')
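The auth value above is a single 'user:password' string. The sketch below shows one way to assemble it from credentials kept outside the source code. The helper names and the DRILL_USER / DRILL_PASSWORD variable names are assumptions made for illustration; pydrill does not define them.

```python
import os


def build_auth(username, password):
    """Join credentials into the 'user:password' form shown above.

    Hypothetical helper, not part of pydrill.
    """
    if not username:
        raise ValueError("username must not be empty")
    if ":" in username:
        # Assumption: the first colon separates user from password, so a
        # colon inside the username would make the string ambiguous.
        raise ValueError("username must not contain ':'")
    return "%s:%s" % (username, password)


def auth_from_env(environ=None):
    """Read credentials from DRILL_USER / DRILL_PASSWORD (assumed names)."""
    env = os.environ if environ is None else environ
    user = env.get("DRILL_USER")
    if user is None:
        # Matches the documented default PyDrill(auth=None).
        return None
    return build_auth(user, env.get("DRILL_PASSWORD", ""))
```

The result can be handed to the client as PyDrill(auth=auth_from_env()).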

To enable a specific storage plugin:

drill.storage_enable('mongo')

You can view all queries which were executed or are running:

drill.profiles()
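As a sketch of how the profiles call might be combined with profile_cancel, the helpers below collect the IDs of running queries and cancel them. Two things are assumed rather than documented here: that the returned Result exposes the parsed JSON as a data attribute, and that the payload follows the Drill REST layout with a runningQueries list whose entries carry a queryId key. Both function names are hypothetical.

```python
def running_query_ids(profiles_payload):
    """Extract the IDs of running queries from a profiles payload.

    Assumption: the payload is a dict with a 'runningQueries' list whose
    items carry a 'queryId' key.
    """
    running = profiles_payload.get("runningQueries", [])
    return [entry["queryId"] for entry in running if "queryId" in entry]


def cancel_running_queries(drill, timeout=10):
    """Cancel every running query and return the IDs that were cancelled.

    `drill` is any object offering profiles() and profile_cancel() as
    documented for PyDrill. Assumption: the Result returned by profiles()
    exposes the parsed JSON as `.data`.
    """
    payload = drill.profiles(timeout=timeout).data
    cancelled = []
    for query_id in running_query_ids(payload):
        drill.profile_cancel(query_id, timeout=timeout)
        cancelled.append(query_id)
    return cancelled
```

Because the client is passed in as an argument, the same functions can be exercised against a stub object before pointing them at a live Drillbit.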

To check if Drill is running:

if drill.is_active():
    your_code
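If Drill is still starting up, a single check may come too early. The following is a minimal sketch of a retry loop around is_active(); the function name, the attempts and delay parameters, and the injectable sleep argument are illustrative choices, not part of pydrill.

```python
import time


def wait_until_active(drill, attempts=5, delay=1.0, sleep=time.sleep):
    """Poll drill.is_active() until it returns True or attempts run out.

    `drill` is any object with an is_active() method, such as a PyDrill
    instance. Returns True as soon as Drill responds, False otherwise.
    """
    for attempt in range(attempts):
        if drill.is_active():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Passing sleep as a parameter keeps the loop easy to test without real waiting.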

Running a query only requires the SQL text:

employees = drill.query('''
  SELECT * FROM cp.`employee.json` LIMIT 5
''')

for employee in employees:
    print(employee)

If you would rather not write SQL queries by hand, try pydrill_dsl: https://pypi.python.org/pypi/pydrill_dsl

Support for pandas:

# pandas dataframe
df = employees.to_dataframe()
print(df[df['salary'] > 20000])

3.1 Supported api calls

class pydrill.client.PyDrill(host='localhost', port=8047, trasport_class=<class 'pydrill.transport.Transport'>, connection_class=<class 'pydrill.connection.requests_conn.RequestsHttpConnection'>, auth=None, **kwargs)

>>> drill = PyDrill(host='localhost', port=8047)
>>> drill.is_active()
True

is_active(timeout=2)

Parameters timeout – int

Returns boolean

metrics(timeout=10)

Get the current memory metrics.

Parameters timeout – int

Returns pydrill.client.Result

options(timeout=10)

List the name, default, and data type of the system and session options.

Parameters timeout – int

Returns pydrill.client.Result

perform_request(method, url, params=None, body=None)

plan(sql, timeout=10)

Parameters

• sql – string

• timeout – int

Returns pydrill.client.ResultQuery

profile(query_id, timeout=10)

Get the profile of the query that has the given query ID.

Parameters

• query_id – The UUID of the query in standard UUID format that Drill assigns to each query.

• timeout – int

Returns pydrill.client.Result

profile_cancel(query_id, timeout=10)

Cancel the query that has the given query ID.

Parameters

• query_id – The UUID of the query in standard UUID format that Drill assigns to each query.

• timeout – int

Returns pydrill.client.Result

profiles(timeout=10)

Get the profiles of running and completed queries.

Parameters timeout – int

Returns pydrill.client.Result

query(sql, timeout=10)

Submit a query and return results.

Parameters

• sql – string

• timeout – int

Returns pydrill.client.ResultQuery

stats(timeout=10)

Get Drillbit information, such as port numbers.

Parameters timeout – int

Returns pydrill.client.Stats

storage(timeout=10)

Get the list of storage plugin names and configurations.

Parameters timeout – int

Returns pydrill.client.Result

storage_delete(name, timeout=10)

Delete a storage plugin configuration.

Parameters

• name – The name of the storage plugin configuration to delete.

• timeout – int

Returns pydrill.client.Result

storage_detail(name, timeout=10)

Get the definition of the named storage plugin.

Parameters

• name – The assigned name in the storage plugin definition.

• timeout – int

Returns pydrill.client.Result

storage_enable(name, value=True, timeout=10)

Enable or disable the named storage plugin.

Parameters

• name – The assigned name in the storage plugin definition.

• value – Either True (to enable) or False (to disable).

• timeout – int

Returns pydrill.client.Result

storage_update(name, config, timeout=10)

Create or update a storage plugin configuration.

Parameters

• name – The name of the storage plugin configuration to create or update.

• config – Overwrites the existing configuration, if there is one, and therefore must include all required attributes and definitions.

• timeout – int

Returns pydrill.client.Result

threads(timeout=10)

Get the status of threads.

Parameters timeout – int

Returns pydrill.client.Result
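Because storage_update overwrites any existing configuration, the config argument has to be complete. The sketch below builds a full configuration for a file-system plugin and submits it. The field names (type, enabled, connection, workspaces, formats) are assumed to follow Drill's file storage plugin layout and should be checked against your Drill version. Whether storage_update expects this bare dict or a wrapper that also carries the plugin name is not stated in this documentation, so treat the call as an assumption. Both function names are hypothetical.

```python
def file_plugin_config(location, writable=False):
    """Build a complete file-system storage plugin configuration.

    Assumption: keys follow Drill's file storage plugin layout; nothing
    here is defined by pydrill itself.
    """
    return {
        "type": "file",
        "enabled": True,
        "connection": "file:///",
        "workspaces": {
            "root": {
                "location": location,
                "writable": writable,
                "defaultInputFormat": None,
            }
        },
        "formats": {
            "json": {"type": "json", "extensions": ["json"]},
        },
    }


def create_or_replace_plugin(drill, name, location, timeout=10):
    """Send a full configuration so no required attribute is dropped.

    `drill` is any object offering storage_update() as documented for
    PyDrill.
    """
    config = file_plugin_config(location)
    return drill.storage_update(name, config, timeout=timeout)
```

Building the whole dict in one place makes it harder to send a partial update by accident, which would otherwise replace the stored definition with an incomplete one.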


CHAPTER 4

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

4.1 Types of Contributions

4.1.1 Report Bugs

Report bugs at https://github.com/PythonicNinja/pydrill/issues.

If you are reporting a bug, please include:

• Your operating system name and version.

• Any details about your local setup that might be helpful in troubleshooting.

• Detailed steps to reproduce the bug.

4.1.2 Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” is open to whoever wants to implement it.

4.1.3 Implement Features

Look through the GitHub issues for features. Anything tagged with “feature” is open to whoever wants to implement it.


4.1.4 Write Documentation

pydrill could always use more documentation, whether as part of the official pydrill docs, in docstrings, or even on the web in blog posts, articles, and such.

4.1.5 Submit Feedback

The best way to send feedback is to file an issue at https://github.com/PythonicNinja/pydrill/issues.

If you are proposing a feature:

• Explain in detail how it would work.

• Keep the scope as narrow as possible, to make it easier to implement.

• Remember that this is a volunteer-driven project, and that contributions are welcome :)

4.2 Get Started!

Ready to contribute? Here’s how to set up pydrill for local development.

1. Fork the pydrill repo on GitHub.

2. Clone your fork locally:

$ git clone git@github.com:your_name_here/pydrill.git

3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:

$ mkvirtualenv pydrill
$ cd pydrill/
$ python setup.py develop

4. Create a branch for local development:

$ git checkout -b name-of-your-bugfix-or-feature

Now you can make your changes locally.

5. When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:

$ source ./run_docker # it will export PYDRILL_HOST and PYDRILL_PORT
or
$ export PYDRILL_HOST='127.0.0.1'
$ export PYDRILL_PORT='8047'
$ tox -e run-isort # it will update imports
$ tox -e check-isort # it will check if imports are correct
$ tox -e check-flake8 # it will check quality by flake8
$ tox -e py27 # run tests with py27
$ tox # run all

To get flake8 and tox, just pip install them into your virtualenv.

6. Commit your changes and push your branch to GitHub:

$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature

7. Submit a pull request through the GitHub website.

4.3 Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

1. The pull request should include tests.

2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.

3. The pull request should work for Python 2.6, 2.7, 3.3, 3.4 and 3.5, and for PyPy. Check https://travis-ci.org/PythonicNinja/pydrill/pull_requests and make sure that the tests pass for all supported Python versions.

4.4 Tips

To run a subset of tests:

$ tox -e py27 -- -k threads # runs only the tests whose names contain "threads"


CHAPTER 5

Credits

5.1 Development Lead

• Wojciech Nowak <[email protected]>

5.2 Contributors

None yet. Why not be the first?


CHAPTER 6

History

6.1 0.3.4 (2017-04-24)

• Updated pypi listing long_description

6.2 0.3.3 (2017-04-24)

• Fix pypi installation

6.3 0.3.2 (2017-04-18)

• Support for dtype on to_dataframe

6.4 0.3.1 (2017-03-06)

• Support for Drill Authentication using PAM

6.5 0.3 (2017-02-15)

• requests response encoding (utf-8)

• Python 3.6 support


6.6 0.1.1 (2016-05-21)

• Anaconda requirements fixed

6.7 0.1.0 (2016-05-19)

• First minor release

• Updated docs

6.8 0.0.2 (2016-04-24)

• First release on PyPI.

• Implementation of metrics/storage/options/stats

• Builds are tested by docker container with Apache Drill running

• support for pandas with ResultQuery.to_dataframe

6.9 0.0.1 (2015-12-28)

• Project start


CHAPTER 7

Indices and tables

• genindex

• modindex

• search


Index

I
is_active() (pydrill.client.PyDrill method)

M
metrics() (pydrill.client.PyDrill method)

O
options() (pydrill.client.PyDrill method)

P
perform_request() (pydrill.client.PyDrill method)
plan() (pydrill.client.PyDrill method)
profile() (pydrill.client.PyDrill method)
profile_cancel() (pydrill.client.PyDrill method)
profiles() (pydrill.client.PyDrill method)
PyDrill (class in pydrill.client)

Q
query() (pydrill.client.PyDrill method)

S
stats() (pydrill.client.PyDrill method)
storage() (pydrill.client.PyDrill method)
storage_delete() (pydrill.client.PyDrill method)
storage_detail() (pydrill.client.PyDrill method)
storage_enable() (pydrill.client.PyDrill method)
storage_update() (pydrill.client.PyDrill method)

T
threads() (pydrill.client.PyDrill method)
